A Useful Equation - A Framework for Learning Things about the World

Expectations Notation

Treatment Effects and Expectations
- Treatment effect for an individual: $Y_1-Y_0$, the outcome with treatment ($Y_1$) minus the outcome without treatment ($Y_0$)
- Treatment effect for individual $i$ in a group (index notation): $Y_{1_i}-Y_{0_i}$
- Average treatment effect (ATE) for a population: $E[Y_{1_i}-Y_{0_i}]$
  
  Formally, the expectation is the mean of a variable over the whole population. If we could randomly sample a very large number of people from the population, the average of those draws would get arbitrarily close to the expectation (and would equal it if we could observe the entire population).
- When we turn to data, we try to estimate the expectation by taking averages; for a causal question, that means estimating the average treatment effect (ATE).
  - Why focus on the ATE:
    - Good reason: it is of interest to a research, policy, or organizational question
    - Less good reason: it is easier to think about and learn about
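
A minimal simulation sketch of the ideas above, assuming a made-up population in which (unlike real data) we can see both potential outcomes $Y_{0_i}$ and $Y_{1_i}$ for every person. The population size, the outcome distributions, and the average effect of 2 are hypothetical choices for illustration. The ATE is the mean of the individual effects, and the average over a large random sample lands close to it.

```python
import random
import statistics

rng = random.Random(0)

# Hypothetical population: in a simulation we can see BOTH potential
# outcomes for each person, which is never possible with real data.
N = 100_000
population = []
for _ in range(N):
    y0 = rng.gauss(10, 3)        # outcome without treatment, Y_0i
    effect = rng.gauss(2, 1)     # individual treatment effect, varies by person
    population.append((y0, y0 + effect))  # (Y_0i, Y_1i)

# Individual treatment effects Y_1i - Y_0i
effects = [y1 - y0 for y0, y1 in population]

# ATE = E[Y_1i - Y_0i]: the mean over the whole population
ate = statistics.fmean(effects)

# Averaging a large random sample of people approximates the expectation
sample_ate = statistics.fmean(rng.sample(effects, 5_000))

print(f"population ATE: {ate:.3f}")
print(f"sample average: {sample_ate:.3f}")
```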
Conditional Expectations: $E[X \mid C]$ is the expectation of some property $X$ of the population given the condition $C$, i.e., the average of $X$ among only those members for whom $C$ holds.
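
A tiny worked example of a conditional expectation, using a hypothetical five-person population of (region, income) pairs; the helper `conditional_expectation` is an illustrative name, not a library function. Conditioning simply restricts the average to the rows where $C$ holds.

```python
from statistics import fmean

# Hypothetical toy population: (region, income in $1,000s)
population = [
    ("urban", 50), ("urban", 70), ("rural", 30),
    ("rural", 40), ("urban", 60),
]

def conditional_expectation(rows, condition):
    """E[X | C]: average X over only the rows that satisfy condition C."""
    values = [x for region, x in rows if condition(region)]
    return fmean(values)

overall = fmean(x for _, x in population)                            # E[X]
urban = conditional_expectation(population, lambda r: r == "urban")  # E[X | urban]
rural = conditional_expectation(population, lambda r: r == "rural")  # E[X | rural]
print(overall, urban, rural)  # 50.0 60.0 35.0
```

Here $E[X] = 50$, while $E[X \mid \text{urban}] = 60$ and $E[X \mid \text{rural}] = 35$.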

The Useful Equation: $\text{Estimate}=\text{Estimand}+\text{Bias}+\text{Noise}$. Another way to put it: $\text{Correlation}=\text{Causation}+\text{Bias}+\text{Noise}$
- Estimate: What we see in the data
- Estimand: What we are interested in knowing
- Bias: The causal inference problem (often but not always)
- Noise: The statistical inference problem
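
A simulation sketch of the useful equation, under assumptions made up for illustration: a hidden confounder $U$ (think "motivation") makes treatment more likely and also raises the outcome, and the true effect of treatment (the estimand) is 1. The estimator is a naive difference in mean outcomes between treated and untreated units, i.e., a correlation. Because we control the simulation, we can approximate the estimator's expected value with one very large sample and split the gap between estimate and estimand into bias and noise. The function names are hypothetical.

```python
import random
import statistics

TRUE_EFFECT = 1.0        # the estimand (ATE), known only because we simulate
CONFOUNDER_EFFECT = 3.0  # hypothetical effect of the hidden confounder U on Y

def draw_sample(n, rng):
    """Hypothetical data: an unobserved confounder U raises both the chance
    of treatment D and the outcome Y."""
    rows = []
    for _ in range(n):
        u = rng.random() < 0.5
        d = rng.random() < (0.8 if u else 0.2)
        y = TRUE_EFFECT * d + CONFOUNDER_EFFECT * u + rng.gauss(0, 1)
        rows.append((d, y))
    return rows

def naive_estimate(rows):
    """Estimator: difference in mean outcomes, treated minus untreated."""
    treated = [y for d, y in rows if d]
    untreated = [y for d, y in rows if not d]
    return statistics.fmean(treated) - statistics.fmean(untreated)

rng = random.Random(42)
estimate = naive_estimate(draw_sample(500, rng))

# Approximate the estimator's expected value with one very large sample,
# so we can split (estimate - estimand) into bias and noise.
expected_estimate = naive_estimate(draw_sample(200_000, rng))
bias = expected_estimate - TRUE_EFFECT   # about 1.8 in this setup
noise = estimate - expected_estimate     # random, changes with every sample

print(f"estimate = {estimate:.2f}")
print(f"estimand = {TRUE_EFFECT:.2f}")
print(f"bias     = {bias:.2f}")
print(f"noise    = {noise:.2f}")
# By construction, estimate == estimand + bias + noise (up to rounding)
```

In this setup the bias works out to $3 \times (0.8 - 0.2) = 1.8$: treated units are 80% high-$U$ and untreated units only 20%, and high $U$ adds 3 to the outcome. Collecting more data shrinks the noise term but leaves the bias untouched.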
Estimand, Estimator, and Estimate
- Estimand: the thing we want to measure
- Estimator: the procedure we use to generate our estimate
- Estimate: A "guess" of the value of the estimand, formed by some method (i.e., the estimator)
The estimand for a causal claim is the ATE: $E[Y_{1_i}-Y_{0_i}]$
Properties of Estimators
- Bias: the estimator is systematically wrong on average.
  - If we ran it again on different people/units, we would be wrong again, in the same direction.
  - Unbiased = True/Correct on average
- Precision: the more consistent the hypothetical estimates we would get from repeating the estimator on new samples, the more precise the estimator.
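
A sketch of what "repeating the estimator" means, assuming a hypothetical population with mean 5: we draw many fresh samples and apply three illustrative estimators to each. The bias is the average of the estimates minus the truth; the spread (standard deviation) of the estimates measures precision. The "miscalibrated" estimator is a made-up stand-in for systematic measurement error.

```python
import random
import statistics

TRUE_MEAN = 5.0   # estimand: population mean (hypothetical)
N = 50            # sample size per repetition
REPS = 5_000      # how many times we hypothetically re-run the study

rng = random.Random(1)

def draw():
    return [rng.gauss(TRUE_MEAN, 2.0) for _ in range(N)]

estimators = {
    "mean of all 50": lambda xs: statistics.fmean(xs),
    "mean of first 5": lambda xs: statistics.fmean(xs[:5]),
    # hypothetical systematic measurement error: every reading is 0.5 too high
    "mean, miscalibrated": lambda xs: statistics.fmean(x + 0.5 for x in xs),
}

results = {name: [] for name in estimators}
for _ in range(REPS):
    sample = draw()
    for name, estimator in estimators.items():
        results[name].append(estimator(sample))

summary = {}
for name, estimates in results.items():
    bias = statistics.fmean(estimates) - TRUE_MEAN  # average miss
    spread = statistics.stdev(estimates)            # smaller spread = more precise
    summary[name] = (bias, spread)
    print(f"{name:22s} bias={bias:+.3f} spread={spread:.3f}")
```

Expect the mean of all 50 observations to be unbiased and precise, the mean of the first 5 to be unbiased but noisier, and the miscalibrated mean to be as precise as the full-sample mean but off by about 0.5 every time.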

Recall the useful equation. Why does an estimate $\neq$ the estimand? $\text{Estimate}=\text{Estimand}+\text{Bias}+\text{Noise}$
- Bias: misses in a particular direction
- Noise: spread around the target
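
One way to make the bias/noise split precise: write the estimate as $\hat{\theta}$ and the estimand as $\theta$ (notation introduced here for illustration), then add and subtract the estimator's expected value $E[\hat{\theta}]$:

$$
\hat{\theta} = \theta + \underbrace{\left(E[\hat{\theta}] - \theta\right)}_{\text{Bias}} + \underbrace{\left(\hat{\theta} - E[\hat{\theta}]\right)}_{\text{Noise}}
$$

The bias term is the average miss, so it points in one direction; the noise term averages to zero, $E\left[\hat{\theta} - E[\hat{\theta}]\right] = 0$, so it shows up only as spread.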
Sources of Bias:
- Sample is not representative of population of interest
- Systematic measurement error
- Response bias
  - Social desirability bias
  - Demand effects
- For causal claims: confounding and reverse causality
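
A sketch of the first source of bias, a non-representative sample, assuming a made-up population in which 20% of people are gym members who exercise more. Surveying only gym members misses in the same direction at every sample size, while a random sample closes in on the estimand.

```python
import random
import statistics

rng = random.Random(7)

# Hypothetical population: weekly exercise hours; 20% are gym members
# who exercise more on average.
population = []
for _ in range(100_000):
    gym_member = rng.random() < 0.2
    hours = rng.gauss(6.0 if gym_member else 2.0, 1.0)
    population.append((gym_member, hours))

all_hours = [h for _, h in population]
gym_hours = [h for member, h in population if member]
estimand = statistics.fmean(all_hours)  # true population mean, about 2.8

results = {}
for n in (100, 1_000, 10_000):
    random_sample = statistics.fmean(rng.sample(all_hours, n))  # representative
    gym_only = statistics.fmean(rng.sample(gym_hours, n))       # non-representative
    results[n] = (random_sample, gym_only)
    print(f"n={n:>6}: random {random_sample:.2f}, gym-only {gym_only:.2f}, "
          f"estimand {estimand:.2f}")
```

A bigger gym-only sample just gives a more precise estimate of the wrong quantity.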
Noise:
- Sampling Variation
- The role of "luck" depends on sample size
- As the sample gets larger $\rightarrow$ less noise
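
A sketch of how noise shrinks with sample size, assuming a hypothetical normal population with mean 50 and standard deviation 10. For each sample size we repeat the study many times and measure how spread out the resulting sample means are; the `theory` value is the standard result $\sigma/\sqrt{n}$ for the standard error of a sample mean.

```python
import random
import statistics

rng = random.Random(3)
TRUE_MEAN, SD = 50.0, 10.0  # hypothetical population
REPS = 1_000                # repeated studies per sample size

spread_by_n = {}
for n in (10, 100, 1_000):
    # One sample mean per repeated study
    means = [statistics.fmean(rng.gauss(TRUE_MEAN, SD) for _ in range(n))
             for _ in range(REPS)]
    spread_by_n[n] = statistics.stdev(means)  # noise: spread of the estimates
    print(f"n={n:>5}: spread = {spread_by_n[n]:.3f} "
          f"(theory: {SD / n ** 0.5:.3f})")
```

Multiplying the sample size by 100 cuts the spread by a factor of about 10, because noise falls with $\sqrt{n}$, not $n$.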