Reversion to the Mean

Galton's Regression

  • Problem setup: like father, like son?
    • Galton collected data on the heights of parents and their adult children.
      • Made a scatter plot
      • Found the line of best fit.
    • He expected the line to be a 45-degree line:
      • Intercept should be 0
      • Slope should be 1
    • At first glance this seems reasonable, because it implies that our best guess for a child's height is their parents' height.
  • However, the line of best fit was not a 45-degree line.
    • The slope was less than 1 (but positive)
    • Explained:
      • Tall parents tend to have children that are taller than average but shorter than them.
      • Short parents tend to have children that are shorter than average but taller than them.
    • Galton referred to this phenomenon as regression to mediocrity.
  • Why regression to mediocrity?
    • Key: a person's height is determined by multiple factors.
    • For a simplified analysis, suppose that height is influenced only by:
      • Genes from parents
      • The temperature of the day they were born
    • Now, Height = Genes + Temperature. This additive model treats the gene effect and the temperature effect as independent of each other.
    • If we find someone extremely tall, they most likely inherited tall genes from their parents and were also born on an unusually hot day.
      • However, their children are more likely to be born on a day with average temperature, so they will likely be shorter than their parents.
      • But since their children still inherit the tall genes, they will tend to be taller than average.
  • Signal + Noise = Reversion to the Mean:
    • Any outcome that is partly a function of signal (genes) and partly of noise (temperature) will show reversion to the mean.
    • Extreme observations probably arise from both extreme signals and extreme noise. In general, ${\color{red}{\text{Estimate}}} = {\color{blue}{\text{Estimand}}} + {\color{purple}{\text{Bias}}} + {\color{green}{\text{Noise}}}$.
      • We will ignore Bias for now.
      • In our setting, the estimate is the height, and the signal is the genes, which is the estimand. The noise is just the temperature. So ${\color{red}{\text{Height}}} = {\color{blue}{\text{Genes}}} + {\color{green}{\text{Temperature}}}$.
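A small simulation can check why Galton's slope comes out below 1. It is a minimal sketch under assumptions that are not in the notes: gene and temperature effects are normally distributed and independent, a child inherits the parent's gene effect unchanged (the notes' simplification), and every parameter value (MEAN_HEIGHT, GENE_SD, TEMP_SD, N_FAMILIES) is made up for illustration. Under these assumptions the least-squares slope is $\text{Cov}(\text{parent}, \text{child}) / \text{Var}(\text{parent}) = \sigma_G^2 / (\sigma_G^2 + \sigma_T^2)$, where $\sigma_G$ and $\sigma_T$ are the standard deviations of the gene and temperature effects; it falls below 1 whenever the temperature noise has positive variance.

```python
import numpy as np

# Hypothetical parameters (cm), chosen for illustration only; not Galton's data.
MEAN_HEIGHT = 170.0  # average gene effect
GENE_SD = 6.0        # spread of the inherited signal
TEMP_SD = 4.0        # spread of the day-of-birth noise
N_FAMILIES = 100_000

rng = np.random.default_rng(seed=0)

# Height = Genes + Temperature, as in the notes' toy model.
# Simplification: the child inherits the parent's gene effect unchanged,
# and each person gets an independent, mean-zero temperature effect.
genes = rng.normal(MEAN_HEIGHT, GENE_SD, N_FAMILIES)
parent = genes + rng.normal(0.0, TEMP_SD, N_FAMILIES)
child = genes + rng.normal(0.0, TEMP_SD, N_FAMILIES)

# Least-squares line of best fit: child = intercept + slope * parent.
slope, intercept = np.polyfit(parent, child, deg=1)
theory = GENE_SD**2 / (GENE_SD**2 + TEMP_SD**2)
print(f"fitted slope {slope:.3f} vs. theoretical {theory:.3f}")

# Tallest 1% of parents: their children are shorter than them but taller than average.
tall = parent >= np.quantile(parent, 0.99)
print(f"tall parents {parent[tall].mean():.1f}, their children {child[tall].mean():.1f}, "
      f"everyone {child.mean():.1f}")
```

Raising TEMP_SD relative to GENE_SD pulls the fitted slope further below 1, since more of each parent's deviation from the mean is noise that the child does not share.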

Reversion to the Mean: Examples

  • Golf: Is a player who scored high in round 1 more likely to score higher or lower in round 2?
    • Such players are likely to score lower in round 2, but still higher than average.
    • This is because their score = skill + luck.
  • Medicine: Does vitamin C really help us recover from a cold?
    • Not necessarily.
    • Even though we feel better after taking vitamin C, this may be because we tend to take it when we feel worst, and we would be getting better anyway.
  • Where should we not expect reversion to the mean?
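The golf example (score = skill + luck) can be checked the same way. This is a sketch under assumptions not in the notes: skill and luck are normally distributed and independent, each player draws fresh luck in round 2, and scores are measured relative to the field average. The function name reversion_demo and all parameter values are hypothetical.

```python
import numpy as np


def reversion_demo(skill_sd, luck_sd, n_players=50_000, top_frac=0.05, seed=1):
    """Score = skill + luck, measured in two rounds with fresh luck each round.

    Returns (overall round-2 mean, top group's round-1 mean, top group's round-2 mean),
    where the top group is the top_frac of players ranked by round-1 score.
    """
    rng = np.random.default_rng(seed)
    skill = rng.normal(0.0, skill_sd, n_players)          # persistent signal
    round1 = skill + rng.normal(0.0, luck_sd, n_players)  # signal + noise
    round2 = skill + rng.normal(0.0, luck_sd, n_players)  # same skill, new luck
    top = round1 >= np.quantile(round1, 1 - top_frac)
    return round2.mean(), round1[top].mean(), round2[top].mean()


overall, top_r1, top_r2 = reversion_demo(skill_sd=3.0, luck_sd=3.0)
print(f"overall mean {overall:.2f}; top group: round 1 {top_r1:.2f}, round 2 {top_r2:.2f}")
```

The top group's round-2 average falls toward the overall mean but stays above it. The vitamin C example has the same structure: read "skill" as a person's typical cold severity, "luck" as day-to-day fluctuation, and the top group as people who take vitamin C on their worst days; their next-day severity drops on average even though the code contains no treatment effect. Setting luck_sd=0.0 removes the noise, and the top group's two rounds then coincide.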
