Bias and Noise

Confounder

Confounder: a confounder is something that directly affects both treatment status and outcome.

  • Other terms:
    • Common Cause: Common cause of outcome and treatment
    • Apples-to-oranges: Groups are not comparable in potential outcomes
    • Omitted Variable Bias (OVB): There is a missing variable
  • Three conditions make a variable a confounder:
    • Associated with the treatment
    • Cause of the outcome separate from any effects it has on the treatment
    • Not along a causal path between treatment and outcome (i.e., not a mediator/mechanism).
    • That is, it's not part of a causal chain like Exposure -> "Confounder" -> Outcome. It is something outside the causal chain.
  • Confounding is strictly a problem for causal questions.
    • It can make things look associated even when there is no causal effect (see the sketch after this list).
    • Irrelevant for description or prediction
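
A minimal simulation can make this concrete. The sketch below is illustrative Python (NumPy assumed; all numbers are made up): `z` is a confounder that drives both treatment and outcome, the treatment has no causal effect at all, yet the treated and untreated groups differ in average outcome.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Z is a confounder: it drives both treatment status and the outcome.
z = rng.normal(size=n)

# Treatment is more likely when Z is high, but T has NO causal effect on Y.
t = (rng.uniform(size=n) < 1 / (1 + np.exp(-z))).astype(int)

# The outcome depends only on Z (plus noise); the true effect of T is zero.
y = 2 * z + rng.normal(size=n)

# Yet a naive comparison of group means looks like a treatment "effect".
print(f"naive difference in means: {y[t == 1].mean() - y[t == 0].mean():.2f}")
print("true causal effect of T:   0")
```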

Average Treatment Effects and Bias

  • The Ideal
    • Potential Outcomes
      • $Y$: the outcome we observe
      • $T$: a binary variable indicating whether treatment is taken
    • $Y_{1i}=$ outcome observed for unit $i$ if $T=1$.
    • $Y_{0i}=$ outcome observed for unit $i$ if $T=0$.
    • $Y_{1i}-Y_{0i}$ is the causal effect of $T$ for unit $i$ - cannot be observed
    • The Average Treatment Effect (ATE) is $E[Y_{1i}-Y_{0i}]$.
    • We can also think about ATEs for sub-populations: $E[Y_{1i}-Y_{0i}\mid Z]$
    • The causal effect among those who got the treatment: $\text{ATT}=E[Y_{1i}-Y_{0i}\mid T=1]$
      • Expectations are linear, so we can rewrite the formula as $E[Y_{1i}-Y_{0i}\mid T=1]=E[Y_{1i}\mid T=1]-E[Y_{0i}\mid T=1]$
      • However, $E[Y_{0i}\mid T=1]$ is not an observable quantity. It is counterfactual.
    • How does this quantity of interest relate to the quantity that we can estimate?
      • What we can estimate: $E[Y_1\mid T=1]-E[Y_0\mid T=0]$.
        • This is our estimate -> a correlation
      • What we are interested in: $E[Y_1\mid T=1]-E[Y_0\mid T=1]$
        • This is our estimand -> a causal effect.
      • Since $\text{Correlation}=\text{True Effect}+\text{Bias}$ (ignoring noise for now), we get $$E[Y_1\mid T=1]-E[Y_0\mid T=0]=E[Y_1\mid T=1]-E[Y_0\mid T=1]+\text{Bias}$$
      • So we have $\text{Bias}=\text{What we estimate}-\text{What we are interested in}$: $$\begin{aligned} \text{Bias}&=E[Y_1\mid T=1]-E[Y_0\mid T=0]-E[Y_1\mid T=1]+E[Y_0\mid T=1]\\ &=E[Y_0\mid T=1]-E[Y_0\mid T=0] \end{aligned}$$
      • In other words, if these two groups have different $Y_0$'s, then we have bias.
      • The bias is the difference between the average outcome the treatment group would have had without the treatment and the average outcome among the control group. It is non-zero whenever $E[Y_0\mid T=1]\neq E[Y_0\mid T=0]$. The sketch below makes this concrete.
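
In a simulation we can generate both $Y_0$ and $Y_1$ for every unit, so the unobservable terms become computable. The hypothetical sketch below (all parameter values invented for illustration) builds in selection on $Y_0$ and verifies that the naive comparison equals the ATT plus the bias term $E[Y_0\mid T=1]-E[Y_0\mid T=0]$.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1_000_000

# Potential outcomes: in a simulation, unlike real data, we see BOTH.
y0 = rng.normal(loc=10, scale=2, size=n)
y1 = y0 + 3                    # true individual effect = 3 for everyone

# Selection on Y0: units with higher untreated outcomes get treated more often.
t = (y0 + rng.normal(size=n) > 10).astype(int)

y = np.where(t == 1, y1, y0)   # what we would actually observe

naive = y[t == 1].mean() - y[t == 0].mean()       # estimate (a correlation)
att = (y1 - y0)[t == 1].mean()                    # estimand (ATT = 3 here)
bias = y0[t == 1].mean() - y0[t == 0].mean()      # E[Y0|T=1] - E[Y0|T=0]

print(f"naive = {naive:.3f}, ATT = {att:.3f}, ATT + bias = {att + bias:.3f}")
```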

Randomness and Omniscient Powers

  • What does "apples-to-apples" mean?
    • For the purpose of estimating the ATT, we need there to be no difference between the treatment and control groups
      • on average
      • in $Y_0$
    • Note that in general, $\text{ATT}\neq\text{ATE}$. For a comparison of treatment and control to give an unbiased estimate of the ATE, we need a similar condition on $Y_1$.
  • What we would like to do: Randomize
    • If we could randomize, then the only difference between units would be whether they received treatment
    • In particular, units would have the same potential outcomes if we randomly assigned them to two groups:
      • $E[Y_0\mid A]$ - expected outcome if not treated for population A
      • $E[Y_0\mid B]$ - expected outcome if not treated for population B.
      • $E[Y_0\mid A]-E[Y_0\mid B]=0$
    • Letting $A=$ Treated and $B=$ Untreated, we have $$\text{Bias}=E[Y_0\mid T=1]-E[Y_0\mid T=0]=0,$$ as the sketch after this list illustrates.
    • Confounding: DAGs:
      • Directed Acyclic Graphs (DAGs) are very useful for causal questions and help to identify confounding and other biases.
      • Graph = some points connected by lines
      • Directed = those lines are arrows that indicate causation
      • Acyclic = you cannot follow the arrows and get back to where you started.
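
To illustrate the point about randomization, here is a hypothetical sketch reusing the earlier simulated potential outcomes (again, all numbers are invented): when $T$ is a coin flip rather than selection on $Y_0$, the bias term vanishes and the difference in group means recovers the true effect.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 1_000_000

y0 = rng.normal(loc=10, scale=2, size=n)
y1 = y0 + 3                    # true effect = 3

# Randomized assignment: a coin flip, independent of Y0 and Y1.
t = rng.integers(0, 2, size=n)
y = np.where(t == 1, y1, y0)

# The bias term E[Y0|T=1] - E[Y0|T=0] is now ~0 ...
print(f"bias term:           {y0[t == 1].mean() - y0[t == 0].mean():+.4f}")
# ... so the naive difference in means recovers the true effect.
print(f"difference in means: {y[t == 1].mean() - y[t == 0].mean():.3f}")
```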

Review of Terms

$$\text{Estimate}=\text{Estimand}+\text{Bias}+\text{Noise}$$

  • Terms
    • Estimate: what we see in the data
    • Estimand: what we are interested in knowing about
    • Estimator: the procedure we use to generate our estimate
    • Bias: Systematic error in estimate
    • Noise: Random error in estimate
  • Properties of Estimators:
    • Unbiasedness: An estimator is unbiased if, by repeating our estimation procedure over and over again an infinite number of times, the average value of our estimates would equal the estimand.
    • Precision: An estimator is precise if, by repeating our estimation procedure over and over again, the various estimates would be close to each other.
      • Precision is a relative term (need to compare).
      • The more consistent the hypothetical estimates from repeating the estimator, the more precise the estimate.
    • Efficiency: An estimate/estimator is efficient if, by repeating our estimation procedure over and over again, the various estimates would be close to the estimand on average.
      • The closer to the estimand, the more efficient the estimate
      • The combination of unbiasedness and precision implies efficiency, although an estimator can be efficient but biased, unbiased but inefficient, unbiased but imprecise, precise but biased, etc. (see the sketch below).
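
These properties can be seen by repeating an estimation procedure many times in simulation. A sketch with two hypothetical estimators (parameter values made up): both the full-sample mean and the "first observation" estimator are unbiased for the population mean, but the sample mean is far more precise.

```python
import numpy as np

rng = np.random.default_rng(3)
true_mean, n, reps = 5.0, 100, 10_000

est_mean = np.empty(reps)    # estimator 1: the full-sample average
est_first = np.empty(reps)   # estimator 2: just the first observation

for r in range(reps):
    sample = rng.normal(loc=true_mean, scale=2.0, size=n)
    est_mean[r] = sample.mean()
    est_first[r] = sample[0]

# Both averages sit near 5.0 (unbiased); the spreads differ a lot (precision).
print(f"sample mean: avg={est_mean.mean():.3f}, sd={est_mean.std():.3f}")
print(f"first obs:   avg={est_first.mean():.3f}, sd={est_first.std():.3f}")
```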

Law of Large Numbers (LLN)

  • Law of Large Numbers (LLN): The sample average can be made arbitrarily close to the true population average by making the sample large enough.
    • This implies that the average estimators (both the full-sample average and the "discard one observation, then average" estimator) are unbiased.
    • The speed at which they converge is the difference in efficiency.
    • The LLN ensures the sample average converges as the number of observations grows large.
  • Consistent Estimators:
    • Consistency: An estimate/estimator is consistent if it converges exactly to the estimand as the sample size grows to infinity.
      • Estimators can be biased (for a finite sample size) but consistent, as in the sketch after this list.
      • They could also be unbiased but inconsistent, although this is less common.
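
A classic example of "biased but consistent" (my choice of example, sketched below) is the variance estimator that divides by $n$ instead of $n-1$: at any finite $n$ its expectation is $\frac{n-1}{n}$ times the true variance, so it is biased, but the bias vanishes as the sample grows.

```python
import numpy as np

rng = np.random.default_rng(4)
true_var, reps = 4.0, 5_000   # variance of N(0, 2) is 4

for n in (5, 50, 5000):
    # The divide-by-n variance estimator (ddof=0), averaged over many samples.
    ests = np.array([rng.normal(0, 2, size=n).var(ddof=0) for _ in range(reps)])
    # Its expectation is ((n-1)/n) * true_var: biased, but the bias -> 0.
    print(f"n={n:5d}: mean estimate = {ests.mean():.3f} (truth {true_var})")
```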

Statistical Inference and Noise

Standard Error: the standard error is the standard deviation of the sampling distribution of our estimator.

  • Standard error measures: if we repeated our estimator an infinite number of times, how far would it be from the true estimand on average?
  • If the estimator is unbiased, the standard error tells us how far our estimate is from the truth in expectation.
  • If our estimator is biased, then the standard error may not be particularly interesting, but it still gives us a measure of precision.
  • Everything has a standard error:
    • All estimates/statistics (numbers derived from samples of a population):
      • Single continuous measures (e.g. sample mean)
      • Single proportions (e.g. sample proportion)
      • Differences and ratios of proportions and continuous measures
      • Associations and measures of correlation
      • Regression coefficients
    • All are noisy
    • All have variability
    • All have a standard error, but with a different formula (see the sketch below)
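
One way to see this is that the same simulation recipe yields a standard error for any statistic. A hypothetical sketch (sample sizes and coefficients invented): compute the sample mean and the sample correlation over many repeated samples; the spread of each statistic is its standard error, and the two differ.

```python
import numpy as np

rng = np.random.default_rng(5)
n, reps = 200, 20_000

means = np.empty(reps)
corrs = np.empty(reps)
for r in range(reps):
    x = rng.normal(size=n)
    y = 0.5 * x + rng.normal(size=n)     # y correlated with x by construction
    means[r] = y.mean()
    corrs[r] = np.corrcoef(x, y)[0, 1]

# The SD of each statistic's sampling distribution is its standard error.
print(f"SE of the sample mean:        {means.std():.4f}")
print(f"SE of the sample correlation: {corrs.std():.4f}")
```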
  • Analytic Standard Error for a Proportion:
    • Suppose the sample size is $N$ and the outcome is a proportion $p$; then the standard error is approximately equal to $$\sqrt{\dfrac{p(1-p)}{N}}.$$
    • Bigger sample -> lower standard error
    • Less underlying variation -> smaller standard error
  • Subtleties: We don't know $p$!
    • We can use our estimate $\hat{p}$ of $p$ from the survey/sample to substitute for $p$.
    • Problems arise if $N$ is really small and/or $p$ is really close to $1$ or $0$.
    • The problem is then that our estimate $\hat{p}$ may be misleading. The sketch below checks the formula against simulation.
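
The analytic formula is easy to check by simulation. A sketch under assumed values of $p$ and $N$ (both invented): draw many samples, compute the sample proportion each time, and compare the spread of those estimates to $\sqrt{p(1-p)/N}$, along with the plug-in version that substitutes $\hat{p}$.

```python
import numpy as np

rng = np.random.default_rng(6)
p, N, reps = 0.3, 400, 50_000

# Sampling distribution of the sample proportion p-hat.
phats = rng.binomial(N, p, size=reps) / N

analytic_se = np.sqrt(p * (1 - p) / N)
print(f"analytic SE:  {analytic_se:.4f}")   # sqrt(0.3 * 0.7 / 400) ~ 0.0229
print(f"simulated SE: {phats.std():.4f}")   # spread of the simulated p-hats

# In practice p is unknown, so plug in the estimate from a single sample.
phat = phats[0]
print(f"plug-in SE:   {np.sqrt(phat * (1 - phat) / N):.4f}")
```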
