Data Aggregation

Aggregation: The process of collecting or gathering many separate pieces into a whole.

Data aggregation: The process of gathering data from multiple sources and combining it into a single, summarized collection.

Using VLOOKUP in Spreadsheets

MATCH: A spreadsheet function used to locate the position of a specific lookup value.

VALUE: A spreadsheet function that converts a text string that represents a number to a numeric value.

Basic syntax: =VLOOKUP(”value_to_search”, sheet2!$A$1:$D$10, column_number, FALSE)

TRUE searches for approximate results, and FALSE searches for exact matches.

Absolute reference: A reference within a function that is locked so that rows and columns won’t change if the function is copied.

Using JOIN in SQL

JOIN: A SQL function that is used to combine rows from two or more tables based on a related column.

Example:

SELECT
    employees.name AS employee_name,
    employees.role AS employy_role,
    departments.name AS department_name
FROM
    employees
INNER JOIN
    deparments ON
    employees.deparment_id = deparments.deparment_id

Other JOINs: LEFT JOIN, RIGHT JOIN, and FULL OUTER JOIN

INNER JOIN: A SQL function that returns records with matching values in both tables.

LEFT JOIN: A SQL function that will return all the records from the left table and only the matching records from the right table.

RIGHT JOIN: A SQL function that will return all records from the right table and only the matching records from the left.

OUTER JOIN: A SQL function that combines RIGHT and LEFT JOIN to return all matching records in both tables.

Using COUNT and COUNT DISTINCT in SQL

COUNT DISTINCT: A SQL function that only returns the distinct values in a specified range.

Example:

SELECT
    orders.*,
    warehouse.warehouse_alias,
    warehouse.state
FROM 
    warehouse_orders.Orders AS orders
JOIN
    warehouse_orders.Warehouse AS warehouse ON
    orders.warehouse_id = warehouse.warehouse_id
LIMIT 100

SELECT
    COUNT(DISTINCT warehouse.state) AS num_states
FROM 
    warehouse_orders.Orders AS orders
JOIN
    warehouse_orders.Warehouse AS warehouse ON
    orders.warehouse_id = warehouse.warehouse_id
GROUP BY
    warehouse.state

Aliasing: Temporarily naming a table or column in a query to make it easier to read and write.

Work with Subqueries

Subquery: A SQL query that is nested inside a larger query.

Inner query: A SQL subquery that is inside of another SQL statement.

Outer query: A SQL statement containing a subquery.

Examples:

SELECT
    station_id,
    num_bikes_available
    (SELECT
        AVG(num_bikes_available)
    FROM bigquery-public-data.new_york.citibike_stations) AS avg_num_bikes_available
FROM 
    bigquery-public-data.new_york.citibike_stations
SELECT
    station_id,
    name,
    number_of_rides AS number_of_rides_starting_at_station
FROM
    (
        SELECT
            start_station_id
            COUNT(*) number_of_rides
        FROM
            bigquery-public-data.new_york.trips
        GROUP BY
            start_station_id
        )
AS station_num_trips
INNER JOIN
    bigquery-public-data.new_york.citibike_stations ON
    station_id = start_station_id
ORDER BY
    number_of_rides DESC
SELECT
    station_id,
    name
FROM
        bigquery-public-data.new_york.citibike_stations
WHERE
    station_id IN
    (
        SELECT
            start_station_id
        FROM
            bigquery-public-data.new_york.trips
        WHERE
            usertype = 'Subscriber'
    )

Use HAVING and CASE in subqueries.

HAVING: A SQL clause that adds a filter to a query instead of the underlying table that can only be used with aggregate functions.

Example:

SELECT
    Warehouse.warehouse_id,
    CONCAT(Warehouse.state, ': ', Warehouse.warehouse_alias) AS warehouse_name
    COUNT(Orders.order_id) AS number_of_orders,
    (
        SELECT
            COUNT(*)
        FROM warehouse_orders.Orders Orders
    )
    AS total_orders,
    CASE
        WHEN COUNT(Orders.order_id)/(SELECT COUNT(*) FROM warehouse_orders.Orders Orders) <= 0.20
        THEN "Fulfilled 0-20% of Orders"
        WHEN COUNT(Orders.order_id)/(SELECT COUNT(*) FROM warehouse_orders.Orders Orders) > 0.20
        AND COUNT(Orders.order_id)/(SELECT COUNT(*) FROM warehouse_orders.Orders Orders) <= 0.60
        THEN "Fulfilled 21-60% of Orders"
    ELSE "Fulfilled moree than 60% of Orders"
    END AS fulfillment_summary
FROM
    warehouse_orders.Warehouse Warehouse
LEFT JOIN
    warehouse_orders.Orders Orders ON
    Orders.warehouse_id = Warehouse.warehouse_id
GROUP BY
    Warehouse.warehouse_id,
    warehouse_name
HAVING 
    COUNT(Orders.order_id) > 0

results matching ""

    No results matching ""