Reversion to the Mean
Galton's Regression
- Problem set up: Like father like son?
- Galton collected data on the heights of parents and their adult children.
- Made a scatter plot
- Found the line of best fit.
- He expected that the line would be a 45 degree line:
- Intercept should be 0
- Slope should be 1
- That seems reasonable at first thought because it implies that our best guess about the height of a child would be their parents' height.
- However, the line of best fit is not a 45 degree line.
- The slope was less than 1 (but positive)
- Explained:
- Tall parents tend to have children that are taller than average but shorter than them.
- Short parents tend to have children that are shorter than average but taller than them.
- Galton referred to this phenomenon as regression to mediocrity.
- Why regression to mediocrity?
- Key: a person's height is determined by multiple things.
- For simplified analysis, suppose that height is only influenced by:
- Genes from parents
- The temperature of the day they were born
- Now, Height = Genes + Temperature. We also assume that the gene effect and the temperature effect are independent of each other.
- If we find someone extremely tall, the most likely explanation is that they both inherited genes for tallness from their parents and were born on a very hot day.
- However, their children are more likely to be born on a day with average temperature, so they will tend to be shorter than their parents.
- But since the children still inherit the genes for tallness, they will tend to be taller than average.
- Signal + Noise = Reversion to Mean:
- Any outcome that is partly a function of signal (genes) and partly a function of noise (temperature) will show reversion to the mean.
- Extreme observations probably arise from both extreme signals and extreme noise.
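The claim about extreme observations can be checked with a small simulation. This is only a sketch under stated assumptions: the function name `extreme_decomposition`, the standard-normal distributions, the population size, and the 1% cutoff are all hypothetical choices made for illustration, not part of Galton's data.

```python
import random
import statistics

def extreme_decomposition(n=100_000, top_fraction=0.01, seed=0):
    """Simulate height = genes + temperature and inspect the tallest people.

    Both components are hypothetical deviations from the population
    average, drawn independently from a standard normal distribution.
    """
    rng = random.Random(seed)
    people = []
    for _ in range(n):
        genes = rng.gauss(0, 1)        # signal
        temperature = rng.gauss(0, 1)  # noise
        people.append((genes + temperature, genes, temperature))

    # Sort by height (first tuple element), tallest first, and keep the top slice.
    people.sort(reverse=True)
    k = max(1, int(n * top_fraction))
    tallest = people[:k]

    mean_genes = statistics.fmean(g for _, g, _ in tallest)
    mean_temperature = statistics.fmean(t for _, _, t in tallest)
    return mean_genes, mean_temperature

mean_genes, mean_temperature = extreme_decomposition()
print(f"tallest 1%: mean genes = {mean_genes:.2f}, "
      f"mean temperature = {mean_temperature:.2f}")
```

Under these assumptions both averages come out well above zero, so the tallest people are on average extreme in both signal and noise. Only the signal part is passed on to their children.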
Estimate = Estimand + Bias + Noise
- We will ignore Bias for now.
- In our setting, the estimate is the observed height, the estimand (the signal) is the genes, and the noise is the temperature. So,
Height = Genes + Temperature
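The Height = Genes + Temperature model also reproduces Galton's finding that the line of best fit has a positive slope below 1. The sketch below is an illustration under assumptions, not Galton's method: `fitted_slope` is a hypothetical helper, the child is assumed to inherit exactly the parent's gene component, and the noise is redrawn independently for each generation.

```python
import random

def fitted_slope(n=50_000, noise_sd=1.0, seed=1):
    """Least-squares slope of child height on parent height.

    Assumption: parent and child share one gene value (signal) and each
    gets an independent temperature draw (noise) with spread `noise_sd`.
    """
    rng = random.Random(seed)
    parents, children = [], []
    for _ in range(n):
        genes = rng.gauss(0, 1)
        parents.append(genes + rng.gauss(0, noise_sd))
        children.append(genes + rng.gauss(0, noise_sd))

    mean_p = sum(parents) / n
    mean_c = sum(children) / n
    # Slope of the best-fit line = covariance(parent, child) / variance(parent).
    cov = sum((p - mean_p) * (c - mean_c) for p, c in zip(parents, children))
    var = sum((p - mean_p) ** 2 for p in parents)
    return cov / var

slope = fitted_slope()
print(f"slope with noise:    {slope:.3f}")
print(f"slope without noise: {fitted_slope(noise_sd=0.0):.3f}")
```

With equal spread for genes and noise, the slope in this model is Var(Genes) / (Var(Genes) + Var(Noise)) = 0.5 in expectation. Setting `noise_sd=0.0` removes the noise and recovers the 45 degree line that Galton expected.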
Reversion to Mean Examples
- Golf: Is a player who scored higher than average in round 1 more likely to score higher or lower in round 2?
- Such a player is likely to score lower in round 2 than in round 1, but still higher than average.
- This is because score = skill + luck: the skill carries over to round 2, but the luck is redrawn.
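The golf example can be sketched the same way. Everything numeric here is a hypothetical assumption for illustration: the function name `two_rounds`, the skill distribution centred on 72, the spread of the luck term, and the choice of the top 10% of round-1 scores.

```python
import random
import statistics

def two_rounds(n_players=20_000, seed=2):
    """Simulate two rounds where score = skill + luck."""
    rng = random.Random(seed)
    rounds = []
    for _ in range(n_players):
        skill = rng.gauss(72, 2)          # persistent part, same in both rounds
        round1 = skill + rng.gauss(0, 3)  # luck is redrawn every round
        round2 = skill + rng.gauss(0, 3)
        rounds.append((round1, round2))

    overall = statistics.fmean(r1 for r1, _ in rounds)

    # Select the players with the highest round-1 scores.
    rounds.sort(reverse=True)
    top = rounds[: n_players // 10]
    top_round1 = statistics.fmean(r1 for r1, _ in top)
    top_round2 = statistics.fmean(r2 for _, r2 in top)
    return overall, top_round1, top_round2

overall, top_round1, top_round2 = two_rounds()
print(f"overall average:           {overall:.1f}")
print(f"top group, round 1:        {top_round1:.1f}")
print(f"same players, round 2:     {top_round2:.1f}")
```

Under these assumptions the selected players' round-2 average falls between the overall average and their own round-1 average, which is the pattern described above.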
- Medicine: Does vitamin C really help us recover from a cold?
- No!
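- We do feel better after taking vitamin C, but only because we were going to get better anyway: we tend to reach for a remedy when we feel worst, and the following days are likely to be closer to normal.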
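A simulation in which vitamin C has no effect at all still shows people improving after taking it. This is a sketch under loud assumptions: `cold_improvement`, the severity scale, the threshold of 8, and the coin flip that decides who takes the vitamin are all hypothetical, and the model has no vitamin C term anywhere.

```python
import random
import statistics

def cold_improvement(n=40_000, seed=3):
    """Average improvement for vitamin takers and non-takers.

    Severity = illness (signal) + daily fluctuation (noise). Vitamin C is
    assumed to have zero effect, so tomorrow's severity ignores it.
    """
    rng = random.Random(seed)
    takers, non_takers = [], []
    for _ in range(n):
        illness = rng.gauss(5, 1)
        today = illness + rng.gauss(0, 2)
        tomorrow = illness + rng.gauss(0, 2)
        if today > 8:  # only people who feel awful consider a remedy
            # A coin flip decides who takes vitamin C among the very sick.
            group = takers if rng.random() < 0.5 else non_takers
            group.append(today - tomorrow)  # positive means feeling better
    return statistics.fmean(takers), statistics.fmean(non_takers)

taker_gain, non_taker_gain = cold_improvement()
print(f"improvement with vitamin C:    {taker_gain:.2f}")
print(f"improvement without vitamin C: {non_taker_gain:.2f}")
```

Both groups improve by a similar amount in this model, because both were selected on an unusually bad day. The improvement comes from reversion to the mean, not from the vitamin.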
- Where should we not expect reversion to the mean?