Regression Discontinuity Design (RDD)

  • Small difference in the running variable -> large difference in treatment

Introduction

  • Regression Discontinuity Design (RDD) helps us identify causal effects with reasonable assumptions in some specific circumstances:
    • Situations where abrupt cutoffs on a running variable split units into exposed/treated and unexposed/untreated groups
    • If there's just a little bit of randomness in the running variable, this can be thought of as a random-like assignment
      • Where we end up just below or just above the cutoff is "random"
      • So these groups near the threshold are very similar and don't differ systematically on potential confounders -> "apples-to-apples" comparison.
  • Terms:
    • $X$ is a continuous variable that determines whether or not a unit receives the treatment (e.g. whether a student receives a scholarship)
      • Forcing Variable or Running Variable: the continuous variable determining treatment assignment
    • We can always rescale $X$ to set the threshold at any value we like
      • Set the threshold at 0, so $T=1$ for all $X>0$ and $T=0$ for all $X<0$
  • Treatment Effects:
    • As always, one particular value of $Y_1$ and $Y_0$ exists for every observation in our sample.
    • If we could observe them, then we could calculate $E[Y_{1_i}-Y_{0_i}]$
    • Instead of trying to estimate $E[Y_{1_i}-Y_{0_i}]$, we will, with some weaker assumptions, estimate $E[Y_{1_i}-Y_{0_i}\mid X_i=0]$
      • This is called the Local Average Treatment Effect (LATE): the average treatment effect for units whose running variable value is right at the threshold for treatment.
    • To obtain the LATE, we cannot just look at those with exactly $X_i=0$. Instead, we run two regressions, one on the treated and one on the untreated units "near" the threshold.
      • $\lim_{X\to0+}E[Y_1\mid X]$: the limit of the expected value of the outcome in the treated group as we approach the threshold from above.
      • $\lim_{X\to0-}E[Y_0\mid X]$: the limit of the expected value of the outcome in the untreated group as we approach the threshold from below.
      • Then, $\text{LATE}=\lim_{X\to0+}E[Y_1\mid X]-\lim_{X\to0-}E[Y_0\mid X]$
      • With some assumptions, we can interpret this as the effect on the outcome of the exposure/treatment
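
The step from the limit difference to a causal effect can be written out explicitly. The sketch below assumes the continuity condition stated in the next section (the conditional means of both potential outcomes are continuous in $X$ at 0) and uses only the symbols already defined above:

$$
\begin{aligned}
\text{LATE} &= \lim_{X\to0+}E[Y_1\mid X]-\lim_{X\to0-}E[Y_0\mid X] \\
&= \lim_{X\to0+}E[Y_1\mid X]-\lim_{X\to0+}E[Y_0\mid X] && \text{(continuity of } E[Y_0\mid X] \text{ at } 0\text{)} \\
&= \lim_{X\to0+}E[Y_1-Y_0\mid X] \\
&= E[Y_1-Y_0\mid X=0] && \text{(continuity of both conditional means at } 0\text{)}
\end{aligned}
$$

The second line is where the assumption does the work: the observable left-hand limit of $E[Y_0\mid X]$ stands in for the right-hand limit, which we can never observe because every unit above the threshold is treated.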

The Continuity Assumption

  • For the LATE to equal the causal treatment effect (at the threshold), we must make the continuity assumption:
    • $\lim_{X\to0+}E[Y_{1_i}\mid X]=\lim_{X\to0-}E[Y_{1_i}\mid X]$
    • $\lim_{X\to0-}E[Y_{0_i}\mid X]=\lim_{X\to0+}E[Y_{0_i}\mid X]$
      • $\lim_{X\to0-}E[Y_{1_i}\mid X]$ and $\lim_{X\to0+}E[Y_{0_i}\mid X]$ are unobservable, by the fundamental problem of causal inference (we can't observe both $Y_{1_i}$ and $Y_{0_i}$ for the same unit).
    • In other words, the expected potential outcomes must be smooth, continuous functions of the running variable across the threshold, in both the treated and untreated states.
      • No other risk factors for the outcome change sharply at the threshold - otherwise any differences could be due to those rather than exposure (confounding).
  • Continuity Violation:
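    • The running variable can be associated with the outcome (and potential outcomes); this alone is not a violation, as long as the association is smooth across the threshold.
    • Units can have some control over their running variable, but continuity is violated if they can manipulate it precisely enough to sort themselves onto one side of the threshold.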
  • Assess the Continuity Assumption and Avoid Violations
    • Subject matter expertise:
    • Look at measurable pre-treatment characteristics and see if they are continuous at the threshold.
      • If instead there is bunching or clumping near the threshold on either side, we may have a continuity issue.
      • Statistical tests for sorting
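
One crude way to look for the bunching described above is to count units in a narrow window on each side of the threshold and ask whether the split is plausibly 50/50. The sketch below is an illustration only, not the specific test these notes have in mind: the function name `binomial_sorting_check`, the window width, and the simulated data are all assumptions, and formal density tests (such as the McCrary test) are more refined.

```python
import math
import random


def binomial_sorting_check(x, bandwidth):
    """Count units just below and just above the threshold (0) and return an
    exact two-sided binomial p-value for H0: each side is equally likely."""
    below = sum(1 for v in x if -bandwidth <= v < 0)
    above = sum(1 for v in x if 0 < v <= bandwidth)
    n = below + above
    log_half_n = n * math.log(0.5)
    # Binomial(n, 0.5) probabilities, computed on the log scale for stability.
    pmf = [
        math.exp(math.lgamma(n + 1) - math.lgamma(k + 1)
                 - math.lgamma(n - k + 1) + log_half_n)
        for k in range(n + 1)
    ]
    observed = pmf[above]
    # Sum the probabilities of all outcomes no more likely than the observed one.
    p_value = min(1.0, sum(p for p in pmf if p <= observed * (1 + 1e-9)))
    return below, above, p_value


# Hypothetical running variable with no manipulation.
rng = random.Random(0)
smooth = [rng.uniform(-1.0, 1.0) for _ in range(2000)]

# Hypothetical manipulation: most units just below 0 nudge themselves above it.
sorted_x = [abs(v) if -0.05 < v < 0 and rng.random() < 0.8 else v for v in smooth]

print("smooth (below, above, p):", binomial_sorting_check(smooth, 0.05))
print("sorted (below, above, p):", binomial_sorting_check(sorted_x, 0.05))
```

A small p-value suggests more units on one side of the cutoff than a smooth density would produce. The 50/50 benchmark is only reasonable when the window is narrow enough that the density of the running variable is roughly flat inside it.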

Methods to Estimate LATE

  • Option 1: Naïve Binned Averages
    • Calculate the mean outcome in a small bin on either side of the threshold and take the difference
    • Issue: choosing bin width
      • Has potentially large effect on LATE estimate
      • Assuming our outcome is correlated with our running variable (which it typically is), any nonzero bin width will bias our LATE
      • As we stretch further out, the average will be further biased from what it actually is right at the threshold.
  • Better choice: directly estimate limits using regression
  • Option 2: Local Linear Regression
    • Choose small bins on each side of the threshold and fit a straight-line (linear) regression in each
      • Can be done in a single regression model with interaction terms
    • The coefficient of our exposure represents the jump (or discontinuity) at the threshold. This is the LATE.
    • Key assumption: does the outcome change linearly in these bins?
  • Option 3: Polynomial Regression
    • Fit a polynomial regression with a discontinuity at the threshold
      • Can use entire data or just small bins on either side
      • Can use different polynomials for each side
    • The coefficient of our exposure represents the jump (or discontinuity) at the threshold. This is the LATE.
    • Far more flexible, but also more complex.
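
The three approaches above can be compared on simulated data where the jump is known. This is a minimal sketch under stated assumptions: the data-generating process, the jump size `true_late`, the bandwidths, the polynomial degree, and the function names are all hypothetical choices made for illustration, and the fits use plain least squares via NumPy.

```python
import numpy as np

# Simulated sharp RDD (all numbers here are hypothetical, chosen for illustration).
rng = np.random.default_rng(0)
n = 4000
x = rng.uniform(-1.0, 1.0, n)        # running variable, threshold rescaled to 0
t = (x > 0).astype(float)            # T = 1 for X > 0, T = 0 for X < 0
true_late = 2.0                      # size of the jump built into the data
y = 1.0 + 1.5 * x + 0.8 * x**2 + true_late * t + rng.normal(0.0, 0.5, n)


def binned_means(x, y, h):
    """Naive estimate: difference in mean outcome between (0, h] and [-h, 0)."""
    above = (x > 0) & (x <= h)
    below = (x < 0) & (x >= -h)
    return y[above].mean() - y[below].mean()


def local_linear(x, y, h):
    """Local linear: one regression with an interaction, using only |X| <= h."""
    keep = np.abs(x) <= h
    xs, ys = x[keep], y[keep]
    ts = (xs > 0).astype(float)
    # Columns: intercept, T, X, T*X. The T coefficient is the jump at X = 0.
    design = np.column_stack([np.ones_like(xs), ts, xs, ts * xs])
    beta, *_ = np.linalg.lstsq(design, ys, rcond=None)
    return beta[1]


def polynomial(x, y, degree):
    """Polynomial in X with separate coefficients on each side of the threshold."""
    t = (x > 0).astype(float)
    powers = [x**p for p in range(1, degree + 1)]
    columns = [np.ones_like(x), t] + powers + [t * p for p in powers]
    beta, *_ = np.linalg.lstsq(np.column_stack(columns), y, rcond=None)
    return beta[1]


print("true jump:               ", true_late)
print("binned means, h = 0.5:   ", round(binned_means(x, y, 0.5), 3))
print("binned means, h = 0.1:   ", round(binned_means(x, y, 0.1), 3))
print("local linear, h = 0.25:  ", round(local_linear(x, y, 0.25), 3))
print("polynomial, degree 2:    ", round(polynomial(x, y, 2), 3))
```

In this setup the outcome rises with the running variable, so the wide-bin average should overshoot the built-in jump by roughly the slope times the bin width, while the two regression estimates should land near it. The bandwidth `h` and the polynomial degree are tuning choices, not values taken from these notes.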
