Bias and Noise

Confounder

Confounder: a confounder is something that directly affects both treatment status and outcome.

  • Other terms:
    • Common Cause: Common cause of outcome and treatment
    • Apples-to-oranges: Groups are not comparable in potential outcomes
    • Omitted Variable Bias (OVB): There is a missing variable
  • Three conditions make a variable a confounder:
    • Associated with the treatment
    • Cause of the outcome separate from any effects it has on the treatment
    • Not along a causal path between treatment and outcome (i.e., not a mediator/mechanism).
    • That is, it's not part of a causal chain like Exposure -> "Confounder" -> Outcome. It is something outside the causal chain.
  • Confounding is strictly a problem for causal questions.
    • It can make things look associated even when there is no causal effect (see the sketch after this list).
    • Irrelevant for description or prediction
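
A minimal simulation can make this concrete. The sketch below is illustrative Python (NumPy assumed; all numbers are made up): `z` is a confounder that drives both treatment and outcome, the treatment has no causal effect at all, yet the treated and untreated groups differ in average outcome.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Z is a confounder: it drives both treatment status and the outcome.
z = rng.normal(size=n)

# Treatment is more likely when Z is high, but T has NO causal effect on Y.
t = (rng.uniform(size=n) < 1 / (1 + np.exp(-z))).astype(int)

# The outcome depends only on Z (plus noise); the true effect of T is zero.
y = 2 * z + rng.normal(size=n)

# Yet a naive comparison of group means looks like a treatment "effect".
print(f"naive difference in means: {y[t == 1].mean() - y[t == 0].mean():.2f}")
print("true causal effect of T:   0")
```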

Average Treatment Effects and Bias

  • The Ideal
    • Potential Outcomes
      • $Y$: the outcome we observe
      • $T$: a binary variable indicating whether treatment is taken
    • $Y_{1i}=$ outcome observed for unit $i$ if $T=1$.
    • $Y_{0i}=$ outcome observed for unit $i$ if $T=0$.
    • $Y_{1i}-Y_{0i}$ is the causal effect of $T$ for unit $i$ - cannot be observed
    • The Average Treatment Effect (ATE) is $E[Y_{1i}-Y_{0i}]$.
    • We can also think about ATEs for sub-populations: $E[Y_{1i}-Y_{0i}\mid Z]$
    • The causal effect among those who got the treatment: $\text{ATT}=E[Y_{1i}-Y_{0i}\mid T=1]$
      • Expectations are linear, so we can rewrite the formula as $E[Y_{1i}-Y_{0i}\mid T=1]=E[Y_{1i}\mid T=1]-E[Y_{0i}\mid T=1]$
      • However, $E[Y_{0i}\mid T=1]$ is not an observable quantity. It is counterfactual.
    • How does this quantity of interest relate to the quantity that we can estimate?
      • What we can estimate: $E[Y_1\mid T=1]-E[Y_0\mid T=0]$.
        • This is our estimate -> a correlation
      • What we are interested in: $E[Y_1\mid T=1]-E[Y_0\mid T=1]$
        • This is our estimand -> a causal effect.
      • Since $\text{Correlation}=\text{True Effect}+\text{Bias}$ (ignoring noise for now), we get $$E[Y_1\mid T=1]-E[Y_0\mid T=0]=E[Y_1\mid T=1]-E[Y_0\mid T=1]+\text{Bias}$$
      • So we have $\text{Bias}=\text{What we estimate}-\text{What we are interested in}$: $$\begin{aligned} \text{Bias}&=E[Y_1\mid T=1]-E[Y_0\mid T=0]-E[Y_1\mid T=1]+E[Y_0\mid T=1]\\ &=E[Y_0\mid T=1]-E[Y_0\mid T=0] \end{aligned}$$
      • In other words, if these two groups have different $Y_0$'s, then we have bias.
      • The bias is the difference between the average outcome the treatment group would have had without the treatment and the average outcome among the control group. It is non-zero whenever $E[Y_0\mid T=1]\neq E[Y_0\mid T=0]$. The sketch below makes this concrete.
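
In a simulation we can generate both $Y_0$ and $Y_1$ for every unit, so the unobservable terms become computable. The hypothetical sketch below (all parameter values invented for illustration) builds in selection on $Y_0$ and verifies that the naive comparison equals the ATT plus the bias term $E[Y_0\mid T=1]-E[Y_0\mid T=0]$.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1_000_000

# Potential outcomes: in a simulation, unlike real data, we see BOTH.
y0 = rng.normal(loc=10, scale=2, size=n)
y1 = y0 + 3                    # true individual effect = 3 for everyone

# Selection on Y0: units with higher untreated outcomes get treated more often.
t = (y0 + rng.normal(size=n) > 10).astype(int)

y = np.where(t == 1, y1, y0)   # what we would actually observe

naive = y[t == 1].mean() - y[t == 0].mean()       # estimate (a correlation)
att = (y1 - y0)[t == 1].mean()                    # estimand (ATT = 3 here)
bias = y0[t == 1].mean() - y0[t == 0].mean()      # E[Y0|T=1] - E[Y0|T=0]

print(f"naive = {naive:.3f}, ATT = {att:.3f}, ATT + bias = {att + bias:.3f}")
```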

Randomness and Omniscient Powers

  • What does "apples-to-apples" mean?
    • For the purpose of estimating the ATT, we need there to be no difference between the treatment and control groups
      • on average
      • in $Y_0$
    • Note that in general, $\text{ATT}\neq\text{ATE}$. For a comparison of treatment and control to give an unbiased estimate of the ATE, we need a similar condition on $Y_1$.
  • What we would like to do: Randomize
    • If we could randomize, then the only difference between units would be whether they received treatment
    • In particular, units would have the same potential outcomes if we randomly assigned them to two groups:
      • $E[Y_0\mid A]$ - expected outcome if not treated for population A
      • $E[Y_0\mid B]$ - expected outcome if not treated for population B.
      • $E[Y_0\mid A]-E[Y_0\mid B]=0$
    • Letting $A=$ Treated and $B=$ Untreated, we have $$\text{Bias}=E[Y_0\mid T=1]-E[Y_0\mid T=0]=0,$$ as the sketch after this list illustrates.
    • Confounding: DAGs:
      • Directed Acyclic Graphs (DAGs) are very useful for causal questions and help to identify confounding and other biases.
      • Graph = some points connected by lines
      • Directed = those lines are arrows that indicate causation
      • Acyclic = you cannot follow the arrows and get back to where you started.
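
To illustrate the point about randomization, here is a hypothetical sketch reusing the earlier simulated potential outcomes (again, all numbers are invented): when $T$ is a coin flip rather than selection on $Y_0$, the bias term vanishes and the difference in group means recovers the true effect.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 1_000_000

y0 = rng.normal(loc=10, scale=2, size=n)
y1 = y0 + 3                    # true effect = 3

# Randomized assignment: a coin flip, independent of Y0 and Y1.
t = rng.integers(0, 2, size=n)
y = np.where(t == 1, y1, y0)

# The bias term E[Y0|T=1] - E[Y0|T=0] is now ~0 ...
print(f"bias term:           {y0[t == 1].mean() - y0[t == 0].mean():+.4f}")
# ... so the naive difference in means recovers the true effect.
print(f"difference in means: {y[t == 1].mean() - y[t == 0].mean():.3f}")
```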

Review of Terms

$$\text{Estimate}=\text{Estimand}+\text{Bias}+\text{Noise}$$

  • Terms
    • Estimate: what we see in the data
    • Estimand: what we are interested in knowing about
    • Estimator: the procedure we use to generate our estimate
    • Bias: Systematic error in estimate
    • Noise: Random error in estimate
  • Properties of Estimators:
    • Unbiasedness: An estimator is unbiased if, by repeating our estimation procedure over and over again an infinite number of times, the average value of our estimates would equal the estimand.
    • Precision: An estimator is precise if, by repeating our estimation procedure over and over again, the various estimates would be close to each other.
      • Precision is a relative term (need to compare).
      • The more consistent the hypothetical estimates from repeating the estimator, the more precise the estimate.
    • Efficiency: An estimate/estimator is efficient if, by repeating our estimation procedure over and over again, the various estimates would be close to the estimand on average.
      • The closer to the estimand, the more efficient the estimate
      • The combination of unbiasedness and precision implies efficiency, although an estimator can be efficient but biased, unbiased but inefficient, unbiased but imprecise, precise but biased, etc. (see the sketch below).
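
These properties can be seen by repeating an estimation procedure many times in simulation. A sketch with two hypothetical estimators (parameter values made up): both the full-sample mean and the "first observation" estimator are unbiased for the population mean, but the sample mean is far more precise.

```python
import numpy as np

rng = np.random.default_rng(3)
true_mean, n, reps = 5.0, 100, 10_000

est_mean = np.empty(reps)    # estimator 1: the full-sample average
est_first = np.empty(reps)   # estimator 2: just the first observation

for r in range(reps):
    sample = rng.normal(loc=true_mean, scale=2.0, size=n)
    est_mean[r] = sample.mean()
    est_first[r] = sample[0]

# Both averages sit near 5.0 (unbiased); the spreads differ a lot (precision).
print(f"sample mean: avg={est_mean.mean():.3f}, sd={est_mean.std():.3f}")
print(f"first obs:   avg={est_first.mean():.3f}, sd={est_first.std():.3f}")
```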

Law of Large Numbers (LLN)

  • Law of Large Numbers (LLN): The sample average can be made arbitrarily close to the true population average by making the sample large enough.
    • This implies that the average estimators (both the full-sample average and the "discard one observation, then average" estimator) are unbiased.
    • The speed at which they converge is the difference in efficiency.
    • The LLN ensures the sample average converges as the number of observations grows large.
  • Consistent Estimators:
    • Consistency: An estimate/estimator is consistent if it converges exactly to the estimand as the sample size grows to infinity.
      • Estimators can be biased (for a finite sample size) but consistent, as in the sketch after this list.
      • They could also be unbiased but inconsistent, although this is less common.
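
A classic example of "biased but consistent" (my choice of example, sketched below) is the variance estimator that divides by $n$ instead of $n-1$: at any finite $n$ its expectation is $\frac{n-1}{n}$ times the true variance, so it is biased, but the bias vanishes as the sample grows.

```python
import numpy as np

rng = np.random.default_rng(4)
true_var, reps = 4.0, 5_000   # variance of N(0, 2) is 4

for n in (5, 50, 5000):
    # The divide-by-n variance estimator (ddof=0), averaged over many samples.
    ests = np.array([rng.normal(0, 2, size=n).var(ddof=0) for _ in range(reps)])
    # Its expectation is ((n-1)/n) * true_var: biased, but the bias -> 0.
    print(f"n={n:5d}: mean estimate = {ests.mean():.3f} (truth {true_var})")
```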

Statistical Inference and Noise

Standard Error: the standard error is the standard deviation of the sampling distribution of our estimator.

  • Standard error measures: if we repeated our estimator an infinite number of times, how far would it be from the true estimand on average?
  • If the estimator is unbiased, the standard error tells us how far our estimate is from the truth in expectation.
  • If our estimator is biased, then the standard error may not be particularly interesting, but it still gives us a measure of precision.
  • Everything has a standard error:
    • All estimates/statistics (numbers derived from samples of a population):
      • Single continuous measures (e.g. sample mean)
      • Single proportions (e.g. sample proportion)
      • Differences and ratios of proportions and continuous measures
      • Associations and measures of correlation
      • Regression coefficients
    • All are noisy
    • All have variability
    • All have a standard error, but with a different formula (see the sketch below)
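
One way to see this is that the same simulation recipe yields a standard error for any statistic. A hypothetical sketch (sample sizes and coefficients invented): compute the sample mean and the sample correlation over many repeated samples; the spread of each statistic is its standard error, and the two differ.

```python
import numpy as np

rng = np.random.default_rng(5)
n, reps = 200, 20_000

means = np.empty(reps)
corrs = np.empty(reps)
for r in range(reps):
    x = rng.normal(size=n)
    y = 0.5 * x + rng.normal(size=n)     # y correlated with x by construction
    means[r] = y.mean()
    corrs[r] = np.corrcoef(x, y)[0, 1]

# The SD of each statistic's sampling distribution is its standard error.
print(f"SE of the sample mean:        {means.std():.4f}")
print(f"SE of the sample correlation: {corrs.std():.4f}")
```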
  • Analytic Standard Error for a Proportion:
    • Suppose the sample size is $N$ and the outcome is a proportion $p$; then the standard error is approximately equal to $$\sqrt{\dfrac{p(1-p)}{N}}.$$
    • Bigger sample -> lower standard error
    • Less underlying variation -> smaller standard error
  • Subtleties: We don't know $p$!
    • We can use our estimate $\hat{p}$ of $p$ from the survey/sample to substitute for $p$.
    • Problems arise if $N$ is really small and/or $p$ is really close to $1$ or $0$.
    • The problem is then that our estimate $\hat{p}$ may be misleading. The sketch below checks the formula against simulation.
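
The analytic formula is easy to check by simulation. A sketch under assumed values of $p$ and $N$ (both invented): draw many samples, compute the sample proportion each time, and compare the spread of those estimates to $\sqrt{p(1-p)/N}$, along with the plug-in version that substitutes $\hat{p}$.

```python
import numpy as np

rng = np.random.default_rng(6)
p, N, reps = 0.3, 400, 50_000

# Sampling distribution of the sample proportion p-hat.
phats = rng.binomial(N, p, size=reps) / N

analytic_se = np.sqrt(p * (1 - p) / N)
print(f"analytic SE:  {analytic_se:.4f}")   # sqrt(0.3 * 0.7 / 400) ~ 0.0229
print(f"simulated SE: {phats.std():.4f}")   # spread of the simulated p-hats

# In practice p is unknown, so plug in the estimate from a single sample.
phat = phats[0]
print(f"plug-in SE:   {np.sqrt(phat * (1 - phat) / N):.4f}")
```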
