Regression

From Potential Outcomes to Regression

  • The potential outcomes of every person in an experiment can be seen as equal to the average potential outcomes in the population, plus a residual or idiosyncratic component:

$$Y_{0_i}=E[Y_0]+r_{0_i}$$

$$Y_{1_i}=E[Y_1]+r_{1_i}$$

  • We can always express the observed outcome of person $i$ as follows:

$$Y_i=[Y_{1_i}\cdot T_i]+[Y_{0_i}\cdot(1-T_i)]$$

$$Y_i=[(E[Y_1]+r_{1_i})\cdot T_i]+[(E[Y_0]+r_{0_i})\cdot(1-T_i)]$$

$$Y_i=[T_iE[Y_1]+T_ir_{1_i}]+[(1-T_i)E[Y_0]+(1-T_i)r_{0_i}]$$

$$Y_i=T_iE[Y_1]+(1-T_i)E[Y_0]+T_ir_{1_i}+(1-T_i)r_{0_i}$$

$$Y_i=T_iE[Y_1]+E[Y_0]-T_iE[Y_0]+T_ir_{1_i}+(1-T_i)r_{0_i}$$

$$Y_i=E[Y_0]+T_i(E[Y_1]-E[Y_0])+T_ir_{1_i}+(1-T_i)r_{0_i}$$

Collecting terms, we have

$$Y_i={\color{red}{E[Y_0]}}+{\color{green}{[E[Y_1]-E[Y_0]]}}\cdot T_i+{\color{purple}{[r_{1_i}\cdot T_i+r_{0_i}\cdot(1-T_i)]}}$$

  • We call the first term ${\color{red}\alpha}$, the second term ${\color{green}\beta}$, and the third term ${\color{purple}r_i}$, and notice that $\beta=\text{ATE}$:

$$Y_i={\color{red}\alpha}+{\color{green}\beta}T_i+{\color{purple}r_i}$$
  • So, the observed outcome has a LINEAR relationship with the treatment status, where the intercept of the line is the mean potential outcome when untreated, and the slope of the line is the ATE
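
To make this concrete, here is a minimal simulation sketch (the numbers and variable names are illustrative, not from the notes): with randomly assigned treatment, regressing the observed $Y$ on $T$ recovers $E[Y_0]$ as the intercept and the ATE as the slope.

```python
# Minimal sketch (illustrative values): simulate a randomized experiment and
# check that the OLS slope of Y on T recovers the ATE.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Potential outcomes: population means plus idiosyncratic residuals r0, r1
y0 = 10 + rng.normal(0, 2, n)          # Y_0i = E[Y_0] + r_0i
y1 = 13 + rng.normal(0, 2, n)          # Y_1i = E[Y_1] + r_1i, so ATE = 3

t = rng.integers(0, 2, n)              # random assignment: T independent of the residuals
y = t * y1 + (1 - t) * y0              # observed outcome

# OLS of Y on T (with an intercept): the slope estimates beta = ATE,
# the intercept estimates alpha = E[Y_0]
beta_hat, alpha_hat = np.polyfit(t, y, 1)
print(alpha_hat, beta_hat)             # approximately 10 and 3
```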

Regression Mechanics

$$Y_i=\alpha+\beta T_i+\gamma X_i+r_i$$

  • $Y$, $T$, and $X$ are data in the world that we observe, and $\alpha$, $\beta$, and $\gamma$ are parameters.
  • These are the quantities we would like to estimate:
    • $\alpha$: the intercept or constant term
    • $\beta$: the effect of the treatment
    • $\gamma$: the effect of the control variable
  • The error term, or residual, $r$, which we hope (and assume) is uncorrelated with $T$.
    • Effect of Treatment Vs. Effect of Control Variable
  • There is nothing in the equation that distinguishes the treatment variable from the control variable.
  • This distinction is conceptual and is driven by
    • Research design
    • Question
  • Typically, we don't actually care about the value of $\gamma$, and it may or may not represent an effect of interest.
  • What's important is that $\beta$ is an effect of interest, and we have a research design that allows us to estimate it in an unbiased way.
    • Interpreting Parameters:
  • Literally:
    • Pretend that we know the data generating process (DGP)
    • For each individual $i$, their outcome = common intercept + $\beta T_i$ + $\gamma X_i$ + random noise.
  • Linear Approximation of Relationship:
    • Acknowledge that we don't know the DGP
    • We would still like to estimate $\beta$ as the average linear relationship between $Y$ and $T$, controlling for $X$.
  • Regression always gives us the best linear approximation to the conditional expectation function (BLACEF), which may or may not be interesting.
  • If $T$ is unrelated to the $Y_1$'s and $Y_0$'s after controlling for $X$, then the BLACEF is the effect of $T$ on $Y$.
    • Predictions and Errors:
  • When we run a regression, we are estimating the values of $\alpha$, $\beta$, and $\gamma$ that give us the best predictions of $Y$.
    • These estimates are written $\hat{\alpha}$, $\hat{\beta}$, and $\hat{\gamma}$.
  • Our prediction $\hat{Y_i}$ for each individual is $\hat{\alpha}+\hat{\beta}T_i+\hat{\gamma}X_i$.
    • Take our estimates $\hat{\alpha}$, $\hat{\beta}$, and $\hat{\gamma}$ and plug in $T_i$ and $X_i$ to get a $\hat{Y_i}$ for each individual.
  • The error associated with each prediction can be written as

$$Y_i-\hat{Y_i}=Y_i-(\hat{\alpha}+\hat{\beta}T_i+\hat{\gamma}X_i)$$
  • We could square this error and then sum these squares for the entire population.
  • This total squared error indicates the extent to which the regression fits the data, with a value of $0$ indicating a perfect fit, and higher values indicating worse fit.
    • Fitting Regression
  • We have a measure of how well our estimates fit the data: Sum of Squared Errors.
  • We could compare two different sets of estimates and see which fits the data better.
  • More generally, we could find the values $\hat{\alpha}$, $\hat{\beta}$, and $\hat{\gamma}$ that minimize this sum of squared errors.
    • This is exactly what regression does: Ordinary Least Squares (OLS); see the sketch after this list.
    • Interpreting the Equation
  • Let's focus on a simple linear regression: $Y=\alpha+\beta X$
  • $\alpha$: the intercept, or the value of $Y$ when all $X$'s are $0$.
    • Sometimes all $X$'s being zero is meaningless, or an inappropriate extrapolation.
  • $\beta$: the slope, or how much the average $Y$ changes for a 1-unit change in the associated $X$.
  • Parameters of the regression model: unknown quantities we have to estimate using our data in order to fit our proposed regression model. In this case, $\alpha$ and $\beta$.
  • $(\alpha+\beta X)$ is the predicted average outcome for any value of the $X$'s.
    • Accuracy of the predictions depends on the fit of the model.
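
Below is a minimal sketch of this fitting-and-interpreting step (the data-generating values are assumed for illustration): it builds a design matrix with an intercept, $T$, and $X$, solves the least-squares problem, and reads off the fitted coefficients and the sum of squared errors.

```python
# Minimal sketch (assumed DGP values): fit alpha, beta, gamma by minimizing
# the sum of squared errors, which is what OLS does.
import numpy as np

rng = np.random.default_rng(1)
n = 10_000

x = rng.normal(0, 1, n)                                # control variable X
t = rng.integers(0, 2, n)                              # treatment T
y = 2.0 + 1.5 * t + 0.8 * x + rng.normal(0, 1, n)      # assumed DGP: alpha=2, beta=1.5, gamma=0.8

# Design matrix: a column of ones (intercept), T, and X.
D = np.column_stack([np.ones(n), t, x])

# Least squares: choose coefficients minimizing ||y - D @ b||^2.
coef, *_ = np.linalg.lstsq(D, y, rcond=None)
alpha_hat, beta_hat, gamma_hat = coef

y_hat = D @ coef                                       # predicted Y for each individual
sse = np.sum((y - y_hat) ** 2)                         # sum of squared errors at the minimum
print(alpha_hat, beta_hat, gamma_hat, sse)             # coefficients near (2.0, 1.5, 0.8)
```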

Omitted Variable Bias

  • Thought Experiment:
    • $T$: Treatment
    • $X$: Omitted variable
    • We might ask, hypothetically, how biased our results will be if we fail to control for $X$.
    • Long regression (includes $X$): $$Y_i=\alpha^L+\beta^L T_i+\gamma X_i+\epsilon_i$$
    • Short regression (omits $X$): $$Y_i=\alpha^S+\beta^S T_i+\xi_i$$
    • $\beta^S\neq\beta^L$ as long as $\gamma\neq0$ and $\text{Cov}(X,T)\neq0$.
      • If $\gamma=0$, then there is no need to include $X$ in the regression.
      • If $\text{Cov}(X,T)=0$, then $\xi_i$ in the short regression is all the unexplained variation in $Y_i$ not captured by $\alpha^S+\beta^S T_i$. Since $X$ is unrelated to $T$, that unexplained variation includes the variation in $X_i$, so $\xi_i$ in the short regression will simply equal $\gamma X_i+\epsilon_i$ in the long regression.
    • In other words, $\beta^S=\beta^L$ if $\gamma=0$ or $\text{Cov}(X,T)=0$.
  • Quantifying the Bias
    • Specifically, we can quantify the bias associated with failing to include $X$ in the regression. Consider $X_i=\tau+\pi T_i+\mu_i$.
      • This is a regression of the control variable on the treatment variable.
        • $\pi$ is a measure of the correlation between $X$ and $T$ - it's the slope coefficient relating changes in $T$ to changes in $X$.
        • This need not have a causal interpretation.
      • It turns out that the bias associated with excluding $X$ from the regression is $\beta^S-\beta^L=\pi\gamma$. We sometimes call this omitted variable bias.
    • Omitted Variable Bias: $$\beta^S-\beta^L=\pi\gamma$$
      • $\pi$: relationship between $X$ and $T$ (control variable and treatment)
      • $\gamma$: relationship between $X$ and $Y$ (control variable and outcome)
    • The short regression, leaving out $X$, will be biased if
      • the control variable $X$ is correlated with the treatment variable $T$, and
      • the control variable $X$ influences the outcome variable $Y$.
  • OVB and Observability
    • If we cannot observe $X$, we cannot include it in the regression.
    • The OVB equation gives us a way to think about the direction and extent of the bias:
|                    | Positive $\pi$ | Negative $\pi$ |
| ------------------ | -------------- | -------------- |
| Positive $\gamma$  | $+$            | $-$            |
| Negative $\gamma$  | $-$            | $+$            |
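
A short simulation sketch (all parameter values here are made up) can confirm the formula: the gap between the short- and long-regression coefficients on $T$ equals the product of $\pi$ (from a regression of $X$ on $T$) and $\gamma$ (from the long regression).

```python
# Minimal sketch (made-up parameter values): check the OVB formula
# beta_S - beta_L = pi * gamma in a simulated dataset.
import numpy as np

rng = np.random.default_rng(2)
n = 200_000

t = rng.normal(0, 1, n)                                   # treatment (continuous for simplicity)
x = 0.5 + 0.7 * t + rng.normal(0, 1, n)                   # X = tau + pi*T + mu, with pi = 0.7
y = 1.0 + 2.0 * t + 3.0 * x + rng.normal(0, 1, n)         # long-regression DGP, with gamma = 3.0

def ols(design, outcome):
    """Return the least-squares coefficients of outcome on the design matrix."""
    coef, *_ = np.linalg.lstsq(design, outcome, rcond=None)
    return coef

ones = np.ones(n)
long_fit = ols(np.column_stack([ones, t, x]), y)          # [alpha_L, beta_L, gamma]
short_fit = ols(np.column_stack([ones, t]), y)            # [alpha_S, beta_S]
aux_fit = ols(np.column_stack([ones, t]), x)              # [tau, pi]

beta_L, gamma_hat = long_fit[1], long_fit[2]
beta_S = short_fit[1]
pi_hat = aux_fit[1]

print(beta_S - beta_L)        # bias of the short regression
print(pi_hat * gamma_hat)     # matches the bias: the OVB formula holds in-sample
```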

Skepticism and Wrap-Up

  • Dealing with Confounders
    • In principle, we could collect data on each of these factors and try to include them in the regression, which would hopefully yield better estimates of the effect of interest.
    • Everything above extends to regressions with many control variables.
  • Reason for Skepticism
    • Regression-based causal inference is predicated on the assumption that, once key observed variables have been made equal across treatment and control groups, selection bias from the things we can't see is also mostly eliminated.
    • In order for regression with controls to allow us to estimate the treatment effect, we have to make an assumption about controlling for all relevant differences.
      • This assumption is sometimes called the conditional-independence assumption or the selection-on-observables assumption
    • Two big problems:
      • How do we know we have controlled for all the relevant differences?
      • Some of the differences are unobservable (and do not have good proxies)
      • A meta-problem: we cannot test our assumptions
