The Six Phases of Data Analysis
Phase 1: Ask
You need to identify your business task.
Business task: The question or problem data analysis resolves for a business.
You need to understand what should be done and what is expected.
Gap analysis: A method for examining and evaluating the current state of a process in order to identify opportunities for improvement in the future.
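As a rough illustration, a gap analysis can be reduced to comparing the current value of each metric with its target value and ranking the shortfalls. This is only a minimal sketch: the metric names and percentages below are hypothetical and were chosen for the example.

```python
# A minimal gap-analysis sketch: compare the current state of a process
# with a desired (target) state. All metrics and values are hypothetical.

def gap_analysis(current, target):
    """Return (metric, gap) pairs, where gap = target - current, largest gap first."""
    gaps = {name: target[name] - current[name] for name in target}
    # A positive gap is a shortfall; the biggest shortfalls are listed first
    # because they point to the biggest opportunities for improvement.
    return sorted(gaps.items(), key=lambda item: item[1], reverse=True)

# Hypothetical metrics, in percent.
current_state = {"on_time_delivery_pct": 82, "customer_satisfaction_pct": 75, "first_call_resolution_pct": 70}
target_state = {"on_time_delivery_pct": 95, "customer_satisfaction_pct": 80, "first_call_resolution_pct": 85}

for metric, gap in gap_analysis(current_state, target_state):
    print(f"{metric}: {gap} percentage points below target")
```

Sorting by gap size is one simple way to prioritize; a real analysis would also weigh how costly each gap is to close.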
You need to identify your stakeholders.
Stakeholders: People who invest time and resources into a project and are interested in its outcome.
Phase 2: Prepare
You need to prepare your data and decide what data to use.
Data design: How information is organized.
When preparing data, we need to identify our data strategy.
Data strategy: The management of the people, processes, and tools used in data analysis.
Considering fairness is also essential in this phase.
Fairness: A quality of data analysis that does not create or reinforce bias.
When we realize that our data is biased, we need to find a way to prevent the analysis from creating or reinforcing that bias.
For example, if we find that most riders in our data take the train during the daytime, we can collect data from more nighttime riders so that they are adequately represented.
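The rider example can be made concrete with a little arithmetic: how many more nighttime riders would need to be surveyed before they make up a target share of the responses? This is a minimal sketch under stated assumptions. The counts (800 daytime, 200 nighttime) and the 40% target share are hypothetical, and the helper name `extra_responses_needed` is made up for illustration.

```python
# A minimal sketch with hypothetical numbers: how many additional responses
# from an under-represented group are needed to reach a target share.

def extra_responses_needed(group_count, total_count, target_percent):
    """Smallest number of extra responses from one group so that the group makes up
    at least target_percent of all responses. target_percent must be below 100."""
    # Solve (group + x) / (total + x) >= p / 100 for x, using integers only:
    #   x >= (p * total - 100 * group) / (100 - p)
    numerator = target_percent * total_count - 100 * group_count
    if numerator <= 0:
        return 0  # the group already meets or exceeds the target share
    denominator = 100 - target_percent
    return -(-numerator // denominator)  # ceiling division

daytime, nighttime = 800, 200  # hypothetical survey responses
extra = extra_responses_needed(nighttime, daytime + nighttime, target_percent=40)
print(f"Survey at least {extra} more nighttime riders")
```

Collecting more data from the under-represented group, as the example suggests, is one remedy; reweighting the existing responses is another common option, not shown here.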
Phase 3: Process
Phase 4: Analyze
Using a Spreadsheet
Consider using a spreadsheet when you are examining a relatively small amount of data.
Attribute: A characteristic or quality of data used to label a column in a table.
Observation: The attributes that describe a piece of data contained in a row of a table.
Formula: A set of instructions used to perform a calculation using the data in a spreadsheet.
Function: A preset command that automatically performs a process or task using the data in a spreadsheet.
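To make the four terms concrete, the sketch below imitates a tiny spreadsheet in plain Python. It assumes a sheet with a header row and with product, price, quantity, and revenue in columns A to D, so the spreadsheet equivalents would be a formula such as =B2*C2 in each row and a function such as =SUM(D2:D4). The data is hypothetical.

```python
# A minimal sketch that mimics a small spreadsheet in plain Python.
# The keys ("product", "price", "quantity") play the role of attributes (column labels);
# each dictionary is one observation (one row). All values are hypothetical.
sales = [
    {"product": "Pen", "price": 2, "quantity": 10},
    {"product": "Notebook", "price": 5, "quantity": 4},
    {"product": "Stapler", "price": 12, "quantity": 1},
]

# Formula: an instruction you write yourself, like =B2*C2 filled down each row.
for row in sales:
    row["revenue"] = row["price"] * row["quantity"]

# Function: a preset command, like =SUM(D2:D4); here Python's built-in sum() plays that role.
total_revenue = sum(row["revenue"] for row in sales)
print(total_revenue)
```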
Using a Query Language
Sometimes, you need to use a query language to access a database.
Query: A request for data or information from a database.
Query language: A computer programming language used to communicate with a database.
The most commonly used query language is SQL (Structured Query Language). A basic SQL query has the following structure:
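SELECT column_1, column_2  -- the columns (fields) you want to retrieve
FROM table_name            -- the table that contains the data
WHERE condition;           -- an additional requirement each returned row must meet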
Sometimes we put an asterisk (*) after SELECT, which selects all columns of every row in the table that meets the additional requirement in the WHERE clause.
SELECT *
FROM movie
WHERE movie_genre = 'Action';
The query above returns every column of every row in the movie table whose movie_genre is “Action”.
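One way to try the movie query without a database server is Python's built-in `sqlite3` module with an in-memory database. This is a sketch only: the columns `title` and `release_year` and all of the rows are hypothetical; only the table name `movie` and the column `movie_genre` come from the example above. The string literal uses single quotes, which is standard SQL; the second query passes the value through a `?` placeholder instead of writing it into the SQL text.

```python
import sqlite3

# Build a small, hypothetical movie table in an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE movie (title TEXT, movie_genre TEXT, release_year INTEGER)")
conn.executemany(
    "INSERT INTO movie VALUES (?, ?, ?)",
    [
        ("Movie A", "Action", 2019),
        ("Movie B", "Comedy", 2020),
        ("Movie C", "Action", 2021),
    ],
)

# SELECT * returns every column; WHERE keeps only the rows whose genre is Action.
rows = conn.execute("SELECT * FROM movie WHERE movie_genre = 'Action'").fetchall()
for row in rows:
    print(row)

# Naming a column instead of * returns only that column.
titles = conn.execute("SELECT title FROM movie WHERE movie_genre = ?", ("Action",)).fetchall()
print(titles)

conn.close()
```

Without an ORDER BY clause, SQL does not guarantee the order of the returned rows, so add one when the order matters.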
Phase 5: Share
The Share phase focuses on visualizing the data so that the results of the analysis are presented more clearly.
Data visualization: The graphical representation of data.
When sharing the data, we need to consider its context.
Context: The condition in which something exists or happens.
Root cause: The reason why a problem occurs.
We also need to choose visualizations wisely. A well-chosen visualization makes the outcomes of the analysis more convincing and easier to follow, and it can reveal interesting findings and trends at a glance.
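As a small, dependency-free illustration of turning numbers into a graphical form, the sketch below draws a horizontal bar chart out of text. In practice a spreadsheet chart or a plotting library would normally be used instead. The ridership figures and the helper name `text_bar_chart` are hypothetical.

```python
# A minimal, dependency-free sketch of a horizontal bar chart drawn with text.
# The ridership figures are hypothetical.

def text_bar_chart(data, width=40):
    """Return one line per category; each bar's length is proportional to its value."""
    largest = max(data.values())
    label_width = max(len(label) for label in data)
    lines = []
    for label, value in data.items():
        bar = "#" * round(value / largest * width)  # scale so the largest value fills `width`
        lines.append(f"{label.ljust(label_width)} | {bar} {value}")
    return lines

riders_by_time = {"Morning": 1200, "Afternoon": 900, "Evening": 600, "Night": 300}
for line in text_bar_chart(riders_by_time):
    print(line)
```

Even this crude chart shows the daytime skew at a glance, which is harder to see in a list of raw numbers.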
Phase 6: Act
In this phase, the company adjusts its business strategy according to the conclusions drawn from the data analysis.
Data-driven decision-making: Using facts to guide business strategy.