Using SQL to Clean Data

SELECT
    name
FROM
    table_name
INSERT INTO xxx ()
VALUES ()
UPDATE table_name
SET variable_nanme1 = 'value1'
WHERE variable_name2 = 'value2'

To avoid duplicated data, we can use the DISTINCT function.

DISTINCT: A keyword that is added to a SQL SELECT statement to retrieve only non-duplicate entries.

SELECT
    DISTINCT variable_name
FROM
    table_name

Use the LENGTH or LEN function to check if the text string variable matches.

SELECT
    LENGTH(variable_name) AS column_name
FROM
    table_nanme
SELECT
    variable_nanme
FROM
    table_name
WHERE
    LENGTH(coulumn_nanme) > number

Use SUBSTR function to select data we want.

SUBSTR: A SQL function that extracts a substring from a string variable.

SELECT
    DISTINCT variable_name
FROM
    table_name
WHERE
    SUBSTR(column_nanme,start_letter_position,number_of_letter) = 'value'

Use the TRIM function to remove extra spaces.

SELECT
    DISTINCT variable_name
FROM
    table_nanme
WHERE
    TRIM(variable_name) = 'value'

Typecasting: Converting data from one type to another.

Use CAST function to convert data into different data types.

CAST: A SQL function that converts data from one datatype to another.

Float: A number that contains a decimal.

SELECT
    CAST(variable_name AS FLOAT64)
FROM
    table_name
ORDER BY
    CAST(variable_name AS FLOAT64) DESC
SELECT
    CAST(date AS date) AS date_only, variable_name2
FROM
    table_name
WHERE
    date BETWEEN '2020-12-01' AND '2020-12-31'

Use CONCAT function to combine two columns.

CONCAT: A SQL function that adds strings together to create new text strings that can be used as unique keys.

SELECT
    CONCAT(varibale_name1,variable_name2) AS new_name
FROM
    table_name
WHERE
    variable = 'value'

When some information is missing, COALESCE function gives a secondary reference.

COALESCE: A SQL function that returns non-null values in a list.

SELECT
    COALESCE(variable_name1,variable_name2) AS new_name
FROM
    table_name
WHERE
    variable = 'value'

results matching ""

    No results matching ""