Regression Discontinuity Design (RDD)

Small difference in the running variable -> large difference in treatment

Introduction

Regression Discontinuity Design (RDD) helps us identify causal effects under reasonable assumptions in some specific circumstances:
- Situations where an abrupt cutoff on a running variable splits units into exposed/treated and unexposed/untreated groups
- If there's just a little bit of randomness in the running variable, this can be thought of as a random-like assignment
  - Whether a unit ends up just below or just above the cutoff is "random"
  - So the groups near the threshold are very similar and don't differ systematically on potential confounders -> "apples-to-apples" comparison.
Terms:
- $X$ is a continuous variable that determines whether or not a unit receives the treatment (for example, whether a student receives a scholarship)
  - Forcing Variable or Running Variable: the continuous variable determining treatment assignment
- We can always rescale $X$ to set the threshold at any value we like
  - Set the threshold at 0, so $T=1$ for all $X>0$ and $T=0$ for all $X<0$
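
As a minimal sketch of this rescaling, the snippet below centers a raw running variable at a cutoff and applies the sharp assignment rule. The cutoff of 70 and the scores are made-up illustrative values, not data from the notes.

```python
# Hypothetical example: a scholarship awarded to students scoring above 70.
CUTOFF = 70.0  # assumed threshold on the raw score scale

raw_scores = [55.0, 68.5, 69.9, 70.1, 74.0, 88.0]  # made-up scores

# Center the running variable so the threshold sits at 0.
x = [score - CUTOFF for score in raw_scores]

# Sharp assignment rule: T = 1 when X > 0, T = 0 when X < 0.
t = [1 if xi > 0 else 0 for xi in x]

for score, xi, ti in zip(raw_scores, x, t):
    print(f"raw={score:5.1f}  X={xi:6.1f}  T={ti}")
```
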
Treatment Effects:
- As always, one particular value of $Y_1$ or $Y_0$ exists for every observation in our sample.
- If we could observe them, then we could calculate $E[Y_{1_i}-Y_{0_i}]$
- Instead of trying to estimate $E[Y_{1_i}-Y_{0_i}]$, we will, with some weaker assumptions, estimate $E[Y_{1_i}-Y_{0_i}\mid X_i=0]$
  - This is called the Local Average Treatment Effect (LATE): the average treatment effect for units whose running variable is right at the threshold for treatment.
- To obtain the LATE, we cannot just look at units with exactly $X_i=0$. Instead, we run two regressions, one on the treated and one on the untreated units "near" the threshold.
  - $\lim_{X\to0+}E[Y_1\mid X]$ : the limit of the expected value of the outcome in the treated group as we approach the threshold from above.
  - $\lim_{X\to0-}E[Y_0\mid X]$ : the limit of the expected value of the outcome in the untreated group as we approach the threshold from below.
  - Then, $\text{LATE}=\lim_{X\to0+}E[Y_1\mid X]-\lim_{X\to0-}E[Y_0\mid X]$
  - With some assumptions, we can interpret this as the effect on the outcome of the exposure/treatment
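
The difference of one-sided limits can be illustrated with a small noise-free sketch. The functions `y0` and `y1` below are hypothetical potential-outcome curves chosen for illustration. In real data only `observed` is available, and the limits have to be estimated.

```python
def y0(x):
    """Hypothetical expected untreated potential outcome at running value x."""
    return 2.0 + 0.5 * x

def y1(x):
    """Hypothetical expected treated potential outcome at running value x."""
    return 5.0 + 0.8 * x

def observed(x):
    """We only ever see Y_1 above the threshold and Y_0 below it."""
    return y1(x) if x > 0 else y0(x)

true_effect_at_threshold = y1(0.0) - y0(0.0)

# Approach the threshold from both sides: the observed gap converges
# to the treatment effect at X = 0.
gaps = []
for eps in (1.0, 0.1, 0.01, 0.001):
    gap = observed(eps) - observed(-eps)
    gaps.append(gap)
    print(f"eps={eps:<6} observed gap={gap:.4f}")

print("effect at threshold:", true_effect_at_threshold)
```

Away from the threshold the gap mixes the treatment effect with the slope of the outcome in $X$, which is why the estimand is defined through limits.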

The Continuity Assumption

For the LATE to equal the causal treatment effect (at the threshold), we must make the continuity assumption:
- $\lim_{X\to0+}E[Y_{1_i}\mid X]=\lim_{X\to0-}E[Y_{1_i}\mid X]$
- $\lim_{X\to0-}E[Y_{0_i}\mid X]=\lim_{X\to0+}E[Y_{0_i}\mid X]$
  - $\lim_{X\to0-}E[Y_{1_i}\mid X]$ and $\lim_{X\to0+}E[Y_{0_i}\mid X]$ are unobservable, by the fundamental problem of causal inference (we can't observe both $Y_{1_i}$ and $Y_{0_i}$ for the same unit).
- In other words, the expected potential outcomes should be smooth, continuous functions of the running variable across the threshold, for both the treated and untreated potential outcomes.
  - No other risk factors for the outcome change sharply at the threshold; otherwise any difference could be due to those rather than the exposure (confounding).
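
One way to see why continuity licenses the causal reading is to write the argument out step by step. This is a sketch in the notation above. It assumes $E[Y_{1_i}\mid X]$ and $E[Y_{0_i}\mid X]$ are both continuous at $X=0$, so each limit equals the value at the threshold.

$$
\begin{aligned}
\text{LATE} &= \lim_{X\to0+}E[Y_{1_i}\mid X]-\lim_{X\to0-}E[Y_{0_i}\mid X] \\
&= \lim_{X\to0+}E[Y_{1_i}\mid X]-\lim_{X\to0+}E[Y_{0_i}\mid X] && \text{(continuity of } E[Y_{0_i}\mid X]\text{)} \\
&= \lim_{X\to0+}E[Y_{1_i}-Y_{0_i}\mid X] && \text{(linearity of expectation)} \\
&= E[Y_{1_i}-Y_{0_i}\mid X_i=0] && \text{(continuity at } X=0\text{)}
\end{aligned}
$$
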
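Continuity Violation:
- The running variable can be associated with the outcome (and potential outcomes); this alone is not a problem as long as the association is smooth at the threshold.
- Units can have some control over their running variable. If they can sort themselves to one side of the threshold, continuity may be violated.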
Assess the Continuity Assumption and Avoid Violations
- Subject matter expertise:
- Look at measurable pre-treatment characteristics and see if they are continuous at the threshold.
  - If instead there is bunching or clumping of units near the threshold on either side, we may have a continuity issue.
  - Statistical tests for sorting
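
A crude check for sorting can be sketched as follows: count units in a narrow window just below and just above the threshold and ask whether the split is compatible with a fair coin. This is a simplified stand-in for a formal density test, not a replacement for one. The window `h`, the helper names, and both datasets are illustrative assumptions.

```python
from math import comb

def binomial_two_sided_p(k, n):
    """Exact two-sided p-value for k successes in n fair coin flips."""
    probs = [comb(n, i) * 0.5 ** n for i in range(n + 1)]
    # Sum the probability of every outcome no more likely than the observed one.
    return min(1.0, sum(p for p in probs if p <= probs[k] * (1 + 1e-9)))

def sorting_check(x, h):
    """Count units within h of the threshold on each side and test the split."""
    below = sum(1 for xi in x if -h <= xi < 0)
    above = sum(1 for xi in x if 0 < xi <= h)
    return below, above, binomial_two_sided_p(above, below + above)

# Made-up running variables: one evenly spread, one bunched just above 0.
x_balanced = [i / 100 for i in range(-50, 51) if i != 0]
x_bunched = [-i / 100 for i in range(1, 11)] + [i / 100 for i in range(1, 41)]

for name, x in (("balanced", x_balanced), ("bunched", x_bunched)):
    below, above, p = sorting_check(x, h=0.5)
    print(f"{name}: below={below} above={above} p-value={p:.5f}")
```

A small p-value only flags an imbalance in counts near the threshold. Whether that reflects manipulation is still a subject-matter judgment.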

Methods to Estimate LATE

Option 1: Naïve Binned Averages
- Calculate the mean outcome in a small bin on either side of the threshold and take the difference
- Issue: choosing bin width
  - Has a potentially large effect on the LATE estimate
  - Assuming our outcome is correlated with our running variable (which it typically is), any nonzero bin width will bias our LATE estimate
  - As we stretch further out, the bin average drifts further from the value right at the threshold.
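
The bin-width problem can be seen in a small noise-free sketch. The data-generating line, the true jump of 3, and the bin widths are all assumptions chosen for illustration.

```python
from statistics import mean

TRUE_LATE = 3.0  # assumed jump at the threshold (simulation only)

# Evenly spaced running variable on [-1, 1], excluding the threshold itself.
x = [i / 1000 for i in range(-1000, 1001) if i != 0]
# Outcome rises with X (slope 2) and jumps by TRUE_LATE for treated units.
y = [1 + 2 * xi + TRUE_LATE * (xi > 0) for xi in x]

def binned_late(x, y, width):
    """Difference in mean outcome between the bins just above and just below 0."""
    above = [yi for xi, yi in zip(x, y) if 0 < xi <= width]
    below = [yi for xi, yi in zip(x, y) if -width <= xi < 0]
    return mean(above) - mean(below)

estimates = {w: binned_late(x, y, w) for w in (1.0, 0.1, 0.01)}
for w, est in estimates.items():
    print(f"bin width={w:<5} estimate={est:.3f} bias={est - TRUE_LATE:+.3f}")
```

Because the outcome slopes upward in $X$, every bin width overstates the jump here, and the bias shrinks roughly in proportion to the width.
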
Better choice: directly estimate limits using regression
Option 2: Local Linear Regression
- Choose a small bin (bandwidth) on each side of the threshold and fit a straight-line regression within each
  - Can be done in a single regression model with interaction terms
- The coefficient on the exposure indicator represents the jump (or discontinuity) in the fitted outcome at the threshold. This is the LATE.
- Key assumption: the outcome changes approximately linearly within these bins.
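
A minimal sketch of local linear regression on simulated data follows. It fits a separate line on each side of the threshold, which gives the same fit as the single interacted model $Y = \beta_0 + \beta_1 X + \beta_2 T + \beta_3 XT$, with $\beta_2$ equal to the difference in intercepts. The simulated coefficients, the noise level, and the bandwidth `h` are illustrative assumptions.

```python
import random

def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope

rng = random.Random(0)  # fixed seed so the sketch is reproducible
TRUE_LATE = 3.0         # assumed jump at the threshold (simulation only)

x = [rng.uniform(-1, 1) for _ in range(4000)]
# Treated units (x > 0) get a jump of TRUE_LATE and a steeper slope.
y = [1 + 2 * xi + (TRUE_LATE + 1.5 * xi) * (xi > 0) + rng.gauss(0, 0.5)
     for xi in x]

h = 0.5  # bandwidth: an arbitrary illustrative choice
below = [(xi, yi) for xi, yi in zip(x, y) if -h <= xi < 0]
above = [(xi, yi) for xi, yi in zip(x, y) if 0 < xi <= h]

a0, b0 = fit_line(*zip(*below))  # intercept and slope for untreated units
a1, b1 = fit_line(*zip(*above))  # intercept and slope for treated units

# With X centered at 0, each intercept is the fitted outcome at the threshold.
late_hat = a1 - a0
print(f"estimated LATE = {late_hat:.3f} (simulated truth = {TRUE_LATE})")
```

Centering the running variable at the threshold is what makes the intercept difference the quantity of interest. With an uncentered $X$ the two lines would need to be evaluated at the cutoff instead.
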
Option 3: Polynomial Regression
- Fit a polynomial regression with a discontinuity at the threshold
  - Can use the entire data or just small bins on either side
  - Can use different polynomials for each side
- The coefficient on the exposure indicator represents the jump (or discontinuity) in the fitted outcome at the threshold. This is the LATE.
- Far more flexible, but also more complex.
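
The polynomial variant can be sketched in the same way: fit a separate quadratic on each side using all of the data and compare the fitted values at the threshold. The helper `polyfit` below is a small hand-written least-squares routine, not a library function. The curved data-generating process and the degree are assumptions for illustration.

```python
import random

def polyfit(xs, ys, degree):
    """Least-squares polynomial coefficients [c0, c1, ...] via normal equations."""
    m = degree + 1
    # Augmented normal-equation matrix [X'X | X'y].
    a = [[sum(x ** (i + j) for x in xs) for j in range(m)]
         + [sum(y * x ** i for x, y in zip(xs, ys))] for i in range(m)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(m):
            if r != col:
                f = a[r][col] / a[col][col]
                a[r] = [v - f * p for v, p in zip(a[r], a[col])]
    return [a[i][m] / a[i][i] for i in range(m)]

rng = random.Random(1)  # fixed seed so the sketch is reproducible
TRUE_LATE = 3.0         # assumed jump at the threshold (simulation only)

x = [rng.uniform(-1, 1) for _ in range(2000)]
# Curved outcome with a jump and a slope change for treated units (x > 0).
y = [1 + 2 * xi - xi ** 2 + (TRUE_LATE + 0.5 * xi) * (xi > 0) + rng.gauss(0, 0.3)
     for xi in x]

below = [(xi, yi) for xi, yi in zip(x, y) if xi < 0]
above = [(xi, yi) for xi, yi in zip(x, y) if xi > 0]

coef_below = polyfit(*zip(*below), degree=2)
coef_above = polyfit(*zip(*above), degree=2)

# c0 is the fitted value at X = 0, so the jump is the difference in c0.
late_poly = coef_above[0] - coef_below[0]
print(f"estimated LATE = {late_poly:.3f} (simulated truth = {TRUE_LATE})")
```

Higher degrees add flexibility but make the fitted value at the boundary more variable, which is one face of the extra complexity noted above.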
