Data Calculation
Statistics: The study of how to collect, analyze, summarize, and present data.
Calculus: A branch of mathematics that involves the study of rates of change and the changes between values that are related by a function.
Causation: When an action directly leads to an outcome, such as cause-effect relationship.
Correlation: The measure of the degree to which two variables change in relationship to each other.
Data Calculation in Spreadsheets
Formula
Array: A collection of values in spreadsheet cells.
Modulo: An operator (%) that returns the remainder when one number is divided by another.
SUMIF: A spreadsheet function that adds numeric data based on one condition.
AVERAGEIF: A spreadsheet function that returns the average of all cell values from a given range that meet a specified condition.
SUMPRODUCT: A function that multiplies arrays and returns the sum of those products.
MAXIFS: A spreadsheet function that returns the maximum value from a given range that meets a specified condition.
MINIFS: A spreadsheet function that returns the minimum value from a given range that meets a specified condition.
Pivot Table
Summary table: A table used to summarize statistical information about data.
Calculated field: A new field within a pivot table that carries out certain calculations based on the values of other fields.
Profit margin: A percentage that indicates how many cents of profit has been generated for each dollar of sale.
Data Calculation in SQL
GROUP BY: A SQL clause that groups rows that have the same values from a table into summary rows.
The simplest way to embed calculations in SQL:
SELECT
columnA,
columnB,
columnA/columnB AS columnX
FROM
table_name
Data Validation
Data validation process: The process of checking and rechecking the quality of data so that it is complete, accurate, secure, and consistent.
Underscores: Lines used to underline words and connect text characters.
Temporary Tables in SQL
Temporary table: A database table that is created and exists temporarily on a database server.
CONVERT: A SQL function that changes the unit of measurement of a value in data.
Create a temporary table using the WITH
clause:
WITH: A SQL clause that creates a temporary table that can be queried multiple times.
WITH trips_over_1_hr AS (
SELECT
*
FROM
bigquery-public-date.new_york.citibike_trips
WHERE
tripduration >= 60
)
## Count how many trips are 60+ munite long:
SELECT
COUNT(*) AS cnt
FROM
trips_over_1_hr
Use the SELECT INTO
statement:
SELECT INTO: A SQL clause that copies data from one table into a temporary table without adding the new table to the database.
SELECT
*
INTO
Africa_Sales
FROM
Global_Sales
WHERE
Region = "Africa"
Use the CREATE TABLE
clause:
CREATE TABLE: A SQL clause that adds a temporary table to a database that can be used by multiple people.
CREATE TABLE Africa_Sales AS
(
SELECT
*
FROM
Global_Sales
WHERE
Region = "Africa"
)
DROP TABLE: A SQL clause that removes a temporary table from a database.