Bias and Noise
Confounder
Confounder: a confounder is something that directly affects both treatment status and the outcome.
- Other terms:
- Common Cause: Common cause of outcome and treatment
- Apples-to-oranges: Groups are not comparable in potential outcomes
- Omitted Variable Bias (OVB): A relevant variable is missing from the analysis
- Three conditions make a variable a confounder:
- Associated with the treatment
- Cause of the outcome separate from any effects it has on the treatment
- Not along a causal path between treatment and outcome (i.e., not a mediator/mechanism).
- That is, it's not part of a causal chain like Exposure -> "Confounder" -> Outcome. It is something outside the causal chain.
- Confounding is strictly a problem for causal questions.
- It can make things look associated even when there is no causal effect.
- Irrelevant for description or prediction
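A quick simulation makes the point concrete: a confounder can manufacture an association even when the true causal effect is zero. The setup below (a confounder $Z$, treatment $D$, outcome $Y$, and all effect sizes) is invented purely for illustration:

```python
import random

random.seed(0)

# Hypothetical setup: a confounder Z raises both the chance of treatment D
# and the outcome Y; the true causal effect of D on Y is ZERO.
n = 100_000
treated_y, control_y = [], []
for _ in range(n):
    z = random.gauss(0, 1)                       # confounder
    d = 1 if z + random.gauss(0, 1) > 0 else 0   # Z raises P(treatment)
    y = 2 * z + random.gauss(0, 1)               # Z raises Y; D does nothing
    (treated_y if d else control_y).append(y)

naive_diff = sum(treated_y) / len(treated_y) - sum(control_y) / len(control_y)
print(f"naive treated-control difference: {naive_diff:.2f}")  # far from zero
```

The naive treated-control comparison comes out large and positive even though treatment does nothing, which is exactly the apples-to-oranges problem.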
Average Treatment Effects and Bias
- The Ideal
- Potential Outcomes
- $Y_i$: the outcome we observe
- $D_i$: a binary variable indicating whether treatment is taken
- $Y_{1i}$: outcome observed for unit $i$ if $D_i = 1$.
- $Y_{0i}$: outcome observed for unit $i$ if $D_i = 0$.
- $Y_{1i} - Y_{0i}$ is the causal effect of $D_i$ for unit $i$ - it cannot be observed
- The Average Treatment Effect (ATE) is $E[Y_{1i} - Y_{0i}]$.
- We can also think about ATEs for sub-populations
- The causal effect among those who got the treatment: $E[Y_{1i} - Y_{0i} \mid D_i = 1]$ (the average treatment effect on the treated, or ATT)
- Expectations are linear, so we can re-write the formula as $E[Y_{1i} \mid D_i = 1] - E[Y_{0i} \mid D_i = 1]$
- However, $E[Y_{0i} \mid D_i = 1]$ is not an observable quantity. It is counterfactual.
- How does this quantity of interest relate to the quantity that we can estimate?
- What we can estimate: $E[Y_i \mid D_i = 1] - E[Y_i \mid D_i = 0]$.
- This is our estimate -> a correlation
- What we are interested in: $E[Y_{1i} - Y_{0i} \mid D_i = 1]$
- This is our estimand -> a causal effect.
- Since $E[Y_i \mid D_i = 1] = E[Y_{1i} \mid D_i = 1]$ and $E[Y_i \mid D_i = 0] = E[Y_{0i} \mid D_i = 0]$ (ignore noise for now), we can add and subtract $E[Y_{0i} \mid D_i = 1]$
- So, we have $E[Y_i \mid D_i = 1] - E[Y_i \mid D_i = 0] = \underbrace{E[Y_{1i} - Y_{0i} \mid D_i = 1]}_{\text{ATT}} + \underbrace{E[Y_{0i} \mid D_i = 1] - E[Y_{0i} \mid D_i = 0]}_{\text{bias}}$
- In other words, if these two groups have different $E[Y_{0i}]$'s, then we have bias.
- The bias term is the difference between the average outcome among the control group and the average outcome we would have obtained among the treatment group had they not gotten the treatment. The bias is non-zero whenever $E[Y_{0i} \mid D_i = 1] \neq E[Y_{0i} \mid D_i = 0]$.
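The decomposition above can be checked numerically. In a simulation we get to play the omniscient role and generate both potential outcomes, so the ATT and the bias term are directly computable. The data-generating process below (an "ability" confounder driving selection into treatment, and a constant treatment effect of 1) is made up for illustration:

```python
import random

random.seed(1)

# Generate both potential outcomes for every unit (possible only in simulation).
n = 100_000
y0s, y1s, ds = [], [], []
for _ in range(n):
    ability = random.gauss(0, 1)
    y0 = ability + random.gauss(0, 1)   # untreated potential outcome
    y1 = y0 + 1.0                       # constant treatment effect of 1
    d = 1 if ability > 0 else 0         # high-ability units select into treatment
    y0s.append(y0); y1s.append(y1); ds.append(d)

def mean(xs):
    return sum(xs) / len(xs)

# Naive comparison uses observed outcomes: y1 for treated, y0 for control.
naive = mean([y1 for y1, d in zip(y1s, ds) if d]) - mean([y0 for y0, d in zip(y0s, ds) if not d])
att   = mean([y1 - y0 for y1, y0, d in zip(y1s, y0s, ds) if d])
bias  = mean([y0 for y0, d in zip(y0s, ds) if d]) - mean([y0 for y0, d in zip(y0s, ds) if not d])

print(f"naive = {naive:.2f}, ATT = {att:.2f}, bias = {bias:.2f}")
```

The naive difference in means equals the ATT plus the bias term exactly (the identity is algebraic, not approximate), and here the bias term is large because the treated group has a much better $Y_{0}$ on average.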
Randomness and Omniscient Powers
- What does "apples-to-apples" mean?
- For the purpose of estimating the ATT, we need no difference between the treatment and control groups, on average, in $Y_{0i}$: that is, $E[Y_{0i} \mid D_i = 1] = E[Y_{0i} \mid D_i = 0]$.
- Note that in general, $E[Y_{1i} - Y_{0i} \mid D_i = 1] \neq E[Y_{1i} - Y_{0i}]$ (the ATT need not equal the ATE). For a comparison of the treatment and control groups to give an unbiased estimate of the ATE, we need a similar condition on $Y_{1i}$: $E[Y_{1i} \mid D_i = 1] = E[Y_{1i} \mid D_i = 0]$.
- What we would like to do: Randomize
- If we could randomize, then the only difference between units would be whether they received treatment
- In particular, units would have the same potential outcomes, on average, if we randomly assigned them to two groups:
- $E[Y_{0}^{A}]$ - expected outcome if not treated for population A
- $E[Y_{0}^{B}]$ - expected outcome if not treated for population B.
- Random assignment implies $E[Y_{0}^{A}] = E[Y_{0}^{B}]$. Just let A = Treated and B = Untreated; then $E[Y_{0i} \mid D_i = 1] = E[Y_{0i} \mid D_i = 0]$, so the bias term is zero and the difference in means is an unbiased estimate of the ATE.
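Randomization can be illustrated with the same kind of confounded world as before: once treatment is assigned by coin flip, the difference in means recovers the true effect. The true effect (1.0) and the data-generating process are invented for illustration:

```python
import random

random.seed(2)

# "Ability" still affects the outcome, but treatment is now a coin flip,
# so ability no longer differs between the groups on average.
n = 200_000
treated, control = [], []
for _ in range(n):
    ability = random.gauss(0, 1)
    y0 = ability + random.gauss(0, 1)   # untreated potential outcome
    if random.random() < 0.5:           # randomized assignment
        treated.append(y0 + 1.0)        # true treatment effect = 1.0
    else:
        control.append(y0)

diff = sum(treated) / len(treated) - sum(control) / len(control)
print(f"difference in means: {diff:.3f}")   # close to the true effect, 1.0
```

With random assignment the bias term vanishes, so the same naive comparison that failed under self-selection now works.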
- Confounding: DAGs:
- Directed Acyclic Graphs (DAGs) are very useful for causal questions and help to identify confounding and other biases.
- Graph = some points connected by lines
- Directed = those lines are arrows that indicate causation
- Acyclic = cannot follow the arrows and get back to where you started.
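As a minimal sketch of the "acyclic" requirement, the classic confounding DAG (Z -> D, Z -> Y, D -> Y) can be encoded as an adjacency dict and checked for cycles with a depth-first search; the representation and function names here are illustrative, not from any particular library:

```python
# The confounding DAG: Z points into both treatment D and outcome Y.
dag = {"Z": ["D", "Y"], "D": ["Y"], "Y": []}

def is_acyclic(graph):
    """DFS cycle check: an edge back to a node on the current path means a cycle."""
    WHITE, GRAY, BLACK = 0, 1, 2          # unvisited / on current path / done
    color = {node: WHITE for node in graph}

    def visit(node):
        color[node] = GRAY
        for child in graph.get(node, []):
            if color[child] == GRAY:       # back edge -> cycle
                return False
            if color[child] == WHITE and not visit(child):
                return False
        color[node] = BLACK
        return True

    return all(visit(n) for n in graph if color[n] == WHITE)

print(is_acyclic(dag))                       # the confounding triangle is a valid DAG
print(is_acyclic({"A": ["B"], "B": ["A"]}))  # A -> B -> A is cyclic, so not a DAG
```

Following arrows in the first graph never returns to the starting node, which is exactly what "acyclic" demands.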
Review of Terms
- Terms
- Estimate: what we see in the data
- Estimand: what we are interested in knowing about
- Estimator: the procedure we use to generate our estimate
- Bias: Systematic error in estimate
- Noise: Random error in estimate
- Properties of Estimators:
- Unbiasedness: An estimator is unbiased if, by repeating our estimation procedure over and over again an infinite number of times, the average value of our estimates would equal the estimand.
- Precision: An estimator is precise if, by repeating our estimation procedure over and over again, the various estimates would be close to each other.
- Precision is a relative term (need to compare).
- The more consistent the hypothetical estimates from repeating the estimator, the more precise the estimate.
- Efficiency: An estimate/estimator is efficient if, by repeating our estimation procedure over and over again, the various estimates would be close to the estimand on average.
- The closer to the estimand, the more efficient the estimate
- The combination of unbiasedness and precision implies efficiency, although an estimate can be efficient but biased, unbiased but inefficient, unbiased but imprecise, precise but biased, etc.
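These properties can be seen side by side in a simulation that repeats each estimation procedure many times. The population (normal with mean 10) and the three estimators are chosen only for illustration:

```python
import random
import statistics

random.seed(3)

TRUE_MEAN = 10.0   # the estimand

def draw_sample(n=25):
    return [random.gauss(TRUE_MEAN, 3) for _ in range(n)]

# Repeat each estimation procedure many times and inspect the estimates.
reps = 2000
full_mean = [statistics.mean(draw_sample()) for _ in range(reps)]       # unbiased and precise
first_obs = [draw_sample()[0] for _ in range(reps)]                     # unbiased but imprecise
shifted   = [statistics.mean(draw_sample()) + 1 for _ in range(reps)]   # precise but biased

for name, ests in [("full mean", full_mean), ("first obs", first_obs), ("shifted", shifted)]:
    print(f"{name}: average = {statistics.mean(ests):.2f}, sd = {statistics.stdev(ests):.2f}")
```

The "full mean" estimator averages to the estimand with little spread; using only the first observation also averages to the estimand but with far more spread; adding 1 to the mean is tightly clustered around the wrong value.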
Law of Large Numbers (LLN)
- Law of Large Numbers (LLN): The sample average can be made arbitrarily close to the true population average by making the sample large enough.
- This implies that the averaging estimators (both the full-sample average and the discard-one-then-average version) converge to the estimand; both are also unbiased.
- The speed at which they converge is the difference in efficiency.
- The LLN ensures convergence of the sample average as the sample size grows large.
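A sketch of the LLN in action, using a Bernoulli(0.5) population chosen purely for illustration:

```python
import random

random.seed(4)

# The sample proportion of "heads" converges to the true probability, 0.5,
# as the sample size grows.
for n in [10, 1_000, 100_000]:
    sample_mean = sum(random.random() < 0.5 for _ in range(n)) / n
    print(f"n = {n:>7}: sample mean = {sample_mean:.4f}")
```

Small samples bounce around; by n = 100,000 the sample mean sits within a fraction of a percentage point of 0.5.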
- Consistent Estimators:
- Consistency: An estimate/estimator is consistent if it converges exactly to the estimand as the sample size grows to infinity.
- Estimators can be biased (for a finite sample size) but consistent.
- They could also be unbiased but inconsistent, although this is less common.
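The "biased but consistent" case can be illustrated with the plug-in variance estimator, which divides by $n$ instead of $n - 1$: it is biased downward by the factor $(n-1)/n$ in small samples but converges to the true variance as $n$ grows. The population here (normal with variance 4) is invented for illustration:

```python
import random

random.seed(5)

TRUE_VAR = 4.0   # variance of the simulated population

def plug_in_var(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)   # divides by n, not n - 1

# Average the estimator over many repeated samples at each sample size.
avgs = {}
for n in [2, 5_000]:
    reps = [plug_in_var([random.gauss(0, 2) for _ in range(n)]) for _ in range(500)]
    avgs[n] = sum(reps) / len(reps)
    print(f"n = {n}: average estimate = {avgs[n]:.2f} (truth = {TRUE_VAR})")
```

At n = 2 the estimator averages to roughly half the true variance (the bias), while at n = 5,000 the bias factor $(n-1)/n$ is essentially 1 and the estimates cluster at the truth (the consistency).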
Statistical Inference and Noise
Standard Error: the standard error is the standard deviation of the sampling distribution of our estimator.
- The standard error measures: if we repeated our estimation procedure an infinite number of times, how far would our estimates be from the true estimand on average?
- If the estimator is unbiased, the standard error tells us how far our estimate is from the truth in expectation.
- If our estimator is biased, then the standard error may not be particularly interesting, but it still gives us a measure of precision.
- Everything has a standard error:
- All estimates/statistics (numbers derived from samples from a population):
- Single continuous measures (e.g. sample mean)
- Single proportions (e.g. sample proportion)
- Differences and ratios of proportions and continuous measures
- Associations and measures of correlation
- Regression coefficients
- All are noisy
- All have variability
- All have a standard error, but with a different formula
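When no convenient formula exists, a standard error can still be obtained by resampling. Here is a bootstrap sketch for the sample median, a statistic with no simple analytic SE formula; all the numbers are invented for illustration:

```python
import random
import statistics

random.seed(6)

# One observed sample from a hypothetical standard-normal population.
sample = [random.gauss(0, 1) for _ in range(200)]

# Bootstrap: resample with replacement, recompute the statistic each time,
# and take the standard deviation of the resampled estimates.
boot_medians = [statistics.median(random.choices(sample, k=len(sample)))
                for _ in range(2_000)]
se_median = statistics.stdev(boot_medians)
print(f"bootstrap SE of the median: {se_median:.3f}")
```

The same recipe works for any statistic in the list above, which is the sense in which "everything has a standard error" even when the formula differs.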
- Analytic Standard Error for a Proportion:
- Suppose the sample size is $n$ and the outcome is a proportion $p$; the standard error is approximately equal to $\sqrt{\frac{p(1-p)}{n}}$
- Bigger sample -> Lower standard error
- Less underlying variation -> smaller standard error
- Subtleties: We don't know $p$!
- We can use our estimate ($\hat{p}$) of $p$ from the survey/sample to substitute for $p$.
- Problems arise if $n$ is really small or $\hat{p}$ is really close to $0$ or $1$.
- The problem is that our estimate of the standard error is then misleading.
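A quick check of the analytic formula against a simulated sampling distribution, with $p = 0.3$ and $n = 400$ chosen purely for illustration:

```python
import math
import random

random.seed(7)

p, n = 0.3, 400
analytic_se = math.sqrt(p * (1 - p) / n)   # sqrt(p(1-p)/n)

# Simulate the sampling distribution of the sample proportion p-hat.
reps = 5_000
phats = [sum(random.random() < p for _ in range(n)) / n for _ in range(reps)]
mean_phat = sum(phats) / reps
sim_se = (sum((x - mean_phat) ** 2 for x in phats) / (reps - 1)) ** 0.5

print(f"analytic SE: {analytic_se:.4f}, simulated SE: {sim_se:.4f}")
```

The analytic formula and the standard deviation of the simulated $\hat{p}$'s agree closely, which is what "standard deviation of the sampling distribution" means in practice.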