Regression
From Potential Outcomes to Regression
- The potential outcomes of every person $i$ in an experiment can be written as the average potential outcome in the population plus a residual or idiosyncratic component: $Y_{1i} = E[Y_{1i}] + \eta_{1i}$ and $Y_{0i} = E[Y_{0i}] + \eta_{0i}$.
- We can always express the observed outcome of person $i$ as follows: $Y_i = Y_{0i} + (Y_{1i} - Y_{0i})D_i$.
- Collecting terms, we have $Y_i = E[Y_{0i}] + \big(E[Y_{1i}] - E[Y_{0i}]\big)D_i + \big[\eta_{0i} + (\eta_{1i} - \eta_{0i})D_i\big]$.
- We call the first term $\alpha$, the second term $\beta D_i$, and the third term $\epsilon_i$, and notice that $\beta = E[Y_{1i}] - E[Y_{0i}]$ is the average treatment effect: $Y_i = \alpha + \beta D_i + \epsilon_i$.
- So, the observed outcome has a LINEAR relationship with the treatment status, where the intercept of the line is the mean potential outcome when untreated, and the slope of the line is the ATE
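A minimal simulation sketch (the variable names and parameter values below are hypothetical, not from the notes): with a randomized treatment dummy, regressing the observed outcome on the treatment recovers $E[Y_{0i}]$ as the intercept and the ATE as the slope.

```python
# Sketch: randomized treatment, so OLS of Y on D recovers E[Y0] and the ATE.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

y0 = 10 + rng.normal(0, 2, n)      # untreated potential outcomes, mean 10
y1 = y0 + 3                        # treated potential outcomes, ATE = 3
d = rng.integers(0, 2, n)          # randomized treatment status
y = y0 + (y1 - y0) * d             # observed outcome

# OLS of Y on a constant and D
X = np.column_stack([np.ones(n), d])
alpha_hat, beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
print(alpha_hat, beta_hat)         # roughly 10 (E[Y0]) and 3 (ATE)
```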
Regression Mechanics
- In the regression $Y_i = \alpha + \beta D_i + \gamma X_i + \epsilon_i$: $Y_i$, $D_i$, and $X_i$ are data in the world that we observe, and $\alpha$, $\beta$, and $\gamma$ are parameters.
- These are the quantities we would like to estimate:
- $\alpha$: the intercept or constant term
- $\beta$: the effect of the treatment $D_i$
- $\gamma$: the effect of the control variable $X_i$
- Error term/residual, $\epsilon_i$, which we hope/assume is uncorrelated with $D_i$.
- Effect of Treatment Vs. Effect of Control Variable
- There is nothing in the equation that distinguishes the treatment variable from the control variable.
- This distinction is conceptual and is driven by
- Research design
- Question
- Typically, we don't actually care about the value of $\gamma$, and it may or may not represent an effect of interest.
- What's important is that $\beta$ is an effect of interest, and we have a research design that allows us to estimate it in an unbiased way (see the sketch below).
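A sketch of the mechanics (hypothetical data-generating process and parameter values): OLS estimates $\beta$ and $\gamma$ with exactly the same machinery; the distinction between treatment and control variable is conceptual.

```python
# Sketch (hypothetical DGP): Y_i = alpha + beta*D_i + gamma*X_i + eps_i.
# OLS treats the treatment D and the control X identically; only the research
# question makes beta the coefficient of interest.
import numpy as np

rng = np.random.default_rng(1)
n = 50_000
x = rng.normal(0, 1, n)                 # control variable
d = rng.integers(0, 2, n)               # treatment, independent of x here
eps = rng.normal(0, 1, n)
y = 2.0 + 1.5 * d + 0.8 * x + eps       # alpha = 2.0, beta = 1.5, gamma = 0.8

design = np.column_stack([np.ones(n), d, x])
alpha_hat, beta_hat, gamma_hat = np.linalg.lstsq(design, y, rcond=None)[0]
print(alpha_hat, beta_hat, gamma_hat)   # roughly 2.0, 1.5, 0.8
```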
- Interpreting the Parameter $\beta$:
- Literally:
- Pretend that we know the data generating process (DGP)
- For each individual $i$, their outcome = common intercept $\alpha$ + $\beta D_i$ + $\gamma X_i$ + random noise $\epsilon_i$.
- Linear Approximation of Relationship:
- Acknowledge that we don't know the DGP
- We would still like to estimate $\beta$ as the average linear relationship between $D_i$ and $Y_i$, controlling for $X_i$.
- Regression always gives us the best linear approximation to the conditional expectation function (BLACEF), which may or may not be interesting.
- If $D_i$ is unrelated to the $Y_{1i}$'s and $Y_{0i}$'s after controlling for $X_i$, then the BLACEF is the effect of $D_i$ on $Y_i$.
- Predictions and Errors:
- When we run a regression, we are estimating the values of $\alpha$, $\beta$, and $\gamma$ that give us the best predictions of $Y_i$.
- Estimates: $\hat{\alpha}$, $\hat{\beta}$, and $\hat{\gamma}$
- Our prediction of $Y_i$ for each individual is $\hat{Y}_i = \hat{\alpha} + \hat{\beta} D_i + \hat{\gamma} X_i$.
- Take our estimates of $\alpha$, $\beta$, and $\gamma$ and plug in $D_i$ and $X_i$ to get a $\hat{Y}_i$ for each individual.
- The error associated with each prediction can be written as $e_i = Y_i - \hat{Y}_i$.
- We could square this error and then sum these squares for the entire population.
- This total squared error indicates the extent to which the regression fits the data, with a value of $0$ indicating a perfect fit and higher values indicating a worse fit (see the sketch below).
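A small sketch (toy numbers and hypothetical estimates) of forming predictions, prediction errors, and the sum of squared errors:

```python
# Sketch (toy numbers, hypothetical estimates): form predictions
# Yhat_i = a + b*D_i + c*X_i, prediction errors e_i = Y_i - Yhat_i,
# and the sum of squared errors.
import numpy as np

def sum_squared_errors(a, b, c, y, d, x):
    y_hat = a + b * d + c * x    # predicted outcome for each individual
    e = y - y_hat                # prediction error for each individual
    return np.sum(e ** 2)        # 0 would indicate a perfect fit

y = np.array([3.1, 5.0, 2.8, 6.2])   # observed outcomes
d = np.array([0, 1, 0, 1])           # treatment status
x = np.array([0.5, 0.2, 0.1, 0.9])   # control variable
print(sum_squared_errors(2.9, 2.0, 0.5, y, d, x))
```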
- Fitting Regression
- We have a measure of how well our estimates fit the data: Sum of Squared Errors.
- We could compare two different sets of estimates and see which fits the data better.
- More generally, we could try to find the values of $\alpha$, $\beta$, and $\gamma$ that minimize this sum of squared errors.
- This is exactly what regression does: Ordinary Least Squares (OLS)
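A sketch (hypothetical DGP) checking that the OLS estimates achieve a smaller sum of squared errors than perturbed alternatives:

```python
# Sketch (hypothetical DGP): OLS minimizes the sum of squared errors, so any
# other candidate coefficients fit the data weakly worse.
import numpy as np

rng = np.random.default_rng(2)
n = 1_000
d = rng.integers(0, 2, n)
x = rng.normal(0, 1, n)
y = 1.0 + 2.0 * d + 0.5 * x + rng.normal(0, 1, n)

design = np.column_stack([np.ones(n), d, x])
ols_coefs = np.linalg.lstsq(design, y, rcond=None)[0]

def sse(coefs):
    return np.sum((y - design @ coefs) ** 2)

print(sse(ols_coefs))                        # smallest achievable SSE
print(sse(ols_coefs + [0.1, -0.1, 0.1]))     # perturbed coefficients fit worse
```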
- Interpreting the Equation
- Let's focus on a simple linear regression: $Y_i = \alpha + \beta_1 X_{1i} + \cdots + \beta_k X_{ki} + \epsilon_i$
- $\alpha$: intercept, or the value of $Y_i$ when all the $X$'s are $0$.
- Sometimes all the $X$'s being zero is meaningless, or an inappropriate extrapolation.
- $\beta_k$: slope, or how much the average $Y$ changes for a 1-unit change in the affiliated $X_k$.
- Parameters of the regression model: unknown quantities we have to estimate using our data in order to fit our proposed regression model. In this case, $\alpha$ and the $\beta$'s.
- $\hat{Y}_i$ is the predicted average outcome for any value of the $X$'s.
- Accuracy of the predictions depends on the fit of the model.
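- For instance (hypothetical numbers): if the fitted line is $\hat{Y}_i = 2 + 3X_{1i}$, the predicted average outcome is $2$ when $X_{1i} = 0$ and rises by $3$ for each 1-unit increase in $X_{1i}$; whether the prediction at $X_{1i} = 0$ is meaningful depends on whether zero is a plausible value of $X_{1i}$.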
Omitted Variable Bias
- Thought Experiment:
- $D_i$: Treatment
- $X_i$: Omitted variable
- We might ask, hypothetically, how biased our results will be if we fail to control for $X_i$.
- Long regression: includes $X_i$: $Y_i = \alpha^L + \beta^L D_i + \gamma X_i + \epsilon_i$
- Short regression: omits $X_i$: $Y_i = \alpha^S + \beta^S D_i + \epsilon^S_i$
- $\beta^S$ will differ from $\beta^L$ as long as $\gamma \neq 0$ and $X_i$ is correlated with $D_i$.
- If $\gamma = 0$, then there is no need to include $X_i$ in the regression.
- If $\gamma \neq 0$, then $\epsilon^S_i$ in the short regression is all the unexplained variation in $Y_i$ not captured by $D_i$. If $X_i$ is unrelated to $D_i$, then this unexplained variation includes the variation in $X_i$. So, $\epsilon^S_i$ in the short regression will simply equal $\gamma X_i+\epsilon_i$ in the long regression.
- In other words, if $\gamma=0$ or $X_i$ is uncorrelated with $D_i$, then $\beta^S = \beta^L$.
- Quantifying the Bias
- Specifically, we can quantify the bias associated with failing to include $X_i$ in the regression. Consider $X_i = \pi_0 + \pi D_i + u_i$.
- This is a regression of the control variable on the treatment variable.
- $\pi$ is a measure of the correlation between $X_i$ and $D_i$ - it's the slope coefficient relating changes in $D_i$ to changes in $X_i$.
- This need not have a causal interpretation.
- It turns out that the bias associated with excluding $X_i$ from the regression is $\beta^S-\beta^L=\pi\gamma$. We sometimes call this omitted variable bias.
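A numerical sketch (hypothetical DGP) confirming the identity $\beta^S - \beta^L = \pi\gamma$:

```python
# Sketch (hypothetical DGP): check the OLS identity beta_S - beta_L = pi*gamma.
import numpy as np

rng = np.random.default_rng(3)
n = 200_000
d = rng.integers(0, 2, n)                            # treatment
x = 0.7 * d + rng.normal(0, 1, n)                    # control correlated with treatment
y = 1.0 + 2.0 * d + 1.5 * x + rng.normal(0, 1, n)    # true beta = 2.0, gamma = 1.5

def ols(design, outcome):
    return np.linalg.lstsq(design, outcome, rcond=None)[0]

ones = np.ones(n)
long_coefs = ols(np.column_stack([ones, d, x]), y)   # long regression: Y on D and X
beta_L, gamma = long_coefs[1], long_coefs[2]
beta_S = ols(np.column_stack([ones, d]), y)[1]       # short regression: Y on D only
pi = ols(np.column_stack([ones, d]), x)[1]           # auxiliary regression: X on D

print(beta_S - beta_L)   # equals pi * gamma (up to floating point)
print(pi * gamma)
```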
- Omitted Variable Bias
- $\pi$: relationship between $X_i$ and $D_i$ (control variable and treatment)
- $\gamma$: relationship between $X_i$ and $Y_i$ (control variable and outcome)
- The short regression, leaving out $X_i$, will be biased if
- The control variable $X_i$ is correlated with the treatment variable $D_i$, and
- The control variable $X_i$ influences the outcome variable $Y_i$.
- OVB and Observability
- If we cannot observe $X_i$, we cannot include it in the regression.
- The OVB equation gives us a way to think about the direction and extent of the bias:
| Bias $\pi\gamma$ | $\gamma$ positive | $\gamma$ negative |
|---|---|---|
| $\pi$ positive | Positive bias | Negative bias |
| $\pi$ negative | Negative bias | Positive bias |
- Since the bias equals $\pi\gamma$, it is positive ($\beta^S > \beta^L$) when $\pi$ and $\gamma$ have the same sign, and negative when they have opposite signs.
Skepticism and Wrap-Up
- Dealing with Confounders
- In principle, we could collect data on each of these factors and try to include them in the regression, which would hopefully yield better estimates of the effect of interest.
- Everything above extends to regressions with many control variables.
- Reason for Skepticism
- Regression-based causal inference is predicated on the assumption that when key observed variables have been made equal across treatment and control groups, selection bias from the things we can't see is also mostly eliminated.
- In order for regression with controls to give us the treatment effect, we have to assume that we have controlled for all relevant differences between treatment and control groups.
- This assumption is sometimes called the conditional-independence assumption or the selection-on-observables assumption
- Two big problems:
- How do we know we have controlled for all the relevant differences?
- Some of the differences are unobservable (and do not have good proxies)
- A meta-problem: we cannot test our assumptions