Source of Data
Bad data source: A data source that fails one or more of the ROCCC criteria: reliable, original, comprehensive, current, and cited.
Good data source: A data source that is reliable, original, comprehensive, current, and cited (ROCCC).
Internal data: Data that lives within a company’s own systems.
External data: Data that lives and is generated outside of an organization.
Open data: Data that is available to the public.
First-party data: Data collected by an individual or group using their own resources.
Second-party data: Data collected by a group directly from its audience and then sold.
Third-party data: Data provided by outside sources that didn’t collect it directly.
Biases in Collecting Data
Bias: A conscious or subconscious preference in favor of or against a person, group of people, or thing.
Different biases
Confirmation bias: The tendency to search for or interpret information in a way that confirms pre-existing beliefs.
Data bias: When a preference in favor of or against a person, group of people, or thing systematically skews data analysis results in a certain direction.
Observer bias/Experimenter bias: The tendency for different people to observe things differently.
Interpretation bias: The tendency to interpret ambiguous situations in a positive or negative way.
Sampling bias: Over-representing or under-representing certain members of a population as a result of working with a sample that is not representative of the population as a whole.
Population: In data analytics, all possible data values in a dataset.
Sample: In data analytics, a segment of a population that is representative of the entire population.
Unbiased sampling: When the sample of the population being measured is representative of the population as a whole.
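Example (sampling bias vs. unbiased sampling): The short Python sketch below illustrates the sampling terms defined above. The population, the group labels ("urban", "rural"), and the helper name sample_share are all made up for illustration. It assumes the records happen to be stored with every urban record first, so taking the first 100 rows leaves the rural group out entirely, while a simple random sample gives every record an equal chance of being picked.

```python
import random

def sample_share(sample, group):
    """Return the fraction of the sample that belongs to the given group."""
    return sum(1 for member in sample if member == group) / len(sample)

# Hypothetical population: 700 urban records followed by 300 rural records.
population = ["urban"] * 700 + ["rural"] * 300

# Biased sample: the first 100 records. Because of how the data is ordered,
# every one of them is urban, so the rural group is under-represented.
biased_sample = population[:100]

# Simple random sample: each record has the same chance of selection.
rng = random.Random(0)  # fixed seed only so the sketch is repeatable
random_sample = rng.sample(population, 100)

biased_rural_share = sample_share(biased_sample, "rural")
random_rural_share = sample_share(random_sample, "rural")

print("Rural share in population:    0.3")
print("Rural share in biased sample:", biased_rural_share)
print("Rural share in random sample:", random_rural_share)
```

The rural share of the random sample will usually land near the population's 30%, though any single random sample can still differ somewhat by chance.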
Data Privacy and Data Ethics
Data privacy: Preserving a data subject’s information any time a data transaction occurs.
Data ethics: Well-founded standards of right and wrong that dictate how data is collected, shared, and used.
Ethics: Well-founded standards of right and wrong that prescribe what humans ought to do, usually in terms of rights, obligations, benefits to society, fairness, or specific virtues.
Consent: The aspect of data ethics that presumes an individual’s right to know how and why their personal data will be used before agreeing to provide it.
Currency: The aspect of data ethics that presumes individuals should be aware of financial transactions resulting from the use of their personal data and the scale of those transactions.
Openness: The aspect of data ethics that promotes the free access, usage, and sharing of data.
Ownership: The aspect of data ethics that presumes individuals own the raw data they provide and have primary control over its usage, processing, and sharing.
Transaction transparency: The aspect of data ethics that presumes all data-processing activities and algorithms should be explainable and understood by the individual who provides the data.
Data anonymization: The process of protecting people’s private or sensitive data by eliminating identifying information.
Data governance: A process for ensuring the formal management of a company’s data assets.
Data interoperability: The ability to integrate data from multiple sources and a key factor in the successful use of open data among companies and governments.
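Example (data anonymization): As a rough sketch of the data anonymization entry above, the Python snippet below drops identifying fields from a single record. The field names, the sample record, and the anonymize helper are hypothetical and chosen only for illustration. Which fields count as identifying depends on the dataset, and removing direct identifiers alone may not prevent re-identification when the remaining fields can be combined.

```python
# Hypothetical set of fields treated as identifying in this sketch.
IDENTIFYING_FIELDS = {"name", "email", "phone"}

def anonymize(record, identifying_fields=IDENTIFYING_FIELDS):
    """Return a copy of the record without the identifying fields."""
    return {
        field: value
        for field, value in record.items()
        if field not in identifying_fields
    }

# Made-up record for illustration only.
record = {
    "name": "Jane Doe",
    "email": "jane@example.com",
    "phone": "555-0100",
    "age_group": "30-39",
    "purchase_total": 42.5,
}

anonymized = anonymize(record)
print(anonymized)
```

The function builds a new dictionary rather than editing the original, so the source record stays unchanged for any process that is still permitted to use it.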