Ways to Clean Data

Clean data: Data that is complete, correct, and relevant to the problem being solved.

Dirty data: Data that is incomplete, incorrect, or irrelevant to problem to be solved.

Data mapping: The process of matching fields from one data source to another.

Data merging: The process of combing two or more datasets into a single dataset.

Data validation: A tool for checking the accuracy and quality of data.

Why Cleaning Data is Important?

Compatibility: How well two or more datasets are able to work together.

Duplicate data: Any record that inadvertently shares data with another record.

Incomplete data: Data that is important fields.

Inconsistent data: Data that uses different formats to represent the same thing.

Incorrect/Inaccurate data: Data that is complete but inaccurate.

Outdated data: Any data that has been superseded by newer and more accurate information.

Null: An indication that a value does not exist in a dataset.

Data engineer: A professional who transforms data into a useful format for analysis and gives it a reliable infrastructure.

Data warehousing specialist: A professional who develops processes and procedures to effectively store and organize data.

Merger: An agreement that unites two organizations into a single new one.

Cleaning Data Using Spreadsheets

Syntax: The predetermined structure of a language that includes all required words, symbols, and punctuation, as well as their proper placement.

Conditional formatting: A spreadsheet tool that changes how cells appear when values meet specific conditions.

Remove duplicates: A spreadsheet tool that automatically searches for and eliminates duplicate entries from a spreadsheet.

Split: A function that divides text around a specified character and puts each fragment into a new, separate cell.

CONCATENATE: A spreadsheet function that joins together two or more text strings.

COUNTIF: A spreadsheet function that returns the number of cells that match a specified value.

Text string: A group of characters within a cell, most often composed of letters.

Substring: A smaller subset of a text string.

LEFT: A function that returns a set number of characters from the left side of a text string.

RIGHT: A function that returns a set number of characters from the right side of a text string.

Length: The number of characters in a text string.

LEN: A function that returns the length of a text string by counting the number of characters it contains.

MID: A function that returns a segment from the middle of a text string.

TRIM: A function that removes leading, trailing, and repeated spaces in data.

Delimiter: A character that indicates the beginning or end of a data item.

Field length: A tool for determining how many characters can be keyed into a spreadsheet field.

results matching ""

    No results matching ""