Prediction
What is prediction? Why is it different?
- Prediction is everywhere:
- Basically all modern AI systems employ prediction at their core
- Success of AI is largely a function of three advances:
- Computation
- Data
- Algorithms
- Some algorithms/Forms of Prediction:
- Machine Learning
- Linear regression, logistic regression, PCA, LASSO, random forests, K-NN, etc.
- Artificial Intelligence
- Deep Learning
- Why prediction?
- Prediction seeks to find a best guess for an outcome given some meaningfully associated data.
- What is machine learning?
- The study of algorithms that improve through repeated experience
- Given observations of an outcome and some important correlates, build models that predict outcomes.
- Machine learning is mostly concerned with prediction.
- Prediction vs. Explanatory Modeling:
- Different in goals:
- Explanatory Modeling (Causal Inference): What is the effect of X on Y?
- Prediction: Can we forecast what Y will be given X?
- Causal Inference: We want to know the effect of X on Y, so we
- Run an experiment randomizing X
- Use a DiD or Regression Discontinuity
- Run a regression controlling for all possible confounders.
- Use modeling to estimate unobservable potential outcomes, control for confounders, and quantify uncertainty around our estimated effect.
- Predictive Recipe: find a model for X and Y that minimizes my prediction error
$\text{Prediction Error} = \sum_{i=1}^{N} f(y_i - \hat{y}_i)$
- In words, over N observations, how far off from the truth am I with the predictions produced by my model? (A short computational sketch appears at the end of this block of notes.)
- Explanation can aid in some types of prediction problems but is not necessary in many cases.
- Even if we do not know the causal effect (or even if the causal effect is zero), there might be predictive information in knowing a treatment status.
- The key is that these features contain information about the outcome we are trying to predict.
- Rule of Thumb: avoid noise, add new information
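As a concrete illustration of the prediction-error formula above, here is a minimal sketch that computes it for a squared loss, f(e) = e². The outcome values, predicted values, and the choice of squared loss are all illustrative assumptions, not something the notes prescribe.

```python
import numpy as np

# Hypothetical observed outcomes and model predictions (illustrative values only).
y = np.array([3.1, 4.0, 5.2, 6.8, 8.1])
y_hat = np.array([2.9, 4.3, 5.0, 7.1, 7.8])

# Prediction Error = sum over i of f(y_i - y_hat_i); here f is the squared loss.
squared_errors = (y - y_hat) ** 2
prediction_error = squared_errors.sum()

# Dividing by N gives the familiar mean squared error (MSE).
mse = squared_errors.mean()
print(f"sum of squared errors: {prediction_error:.3f}, MSE: {mse:.3f}")
```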
- The main predictive question:
- Can I predict the value of Y given some correlates for an observation that was not included in the modeling step?
- Sometimes, we care about future observations that have not yet occurred (forecasting).
- Other times, we care about within-sample prediction ability.
- Types of Prediction:
- Static Prediction
- Key assumption: Environment is static (does not change)
- No policy change
- No Adaptation
- Dynamic prediction:
- Environment may change due to policy changes or adaptation.
Good and Bad Prediction
- What makes for good predictors:
- Avoid Redundancy
- Avoid pure noise
- Precise information reduces noise
- Predictors from a variety of domains provide different information
- Evaluating Predictive Performance:
- The higher the order of the polynomial, the less error we have in the training data (illustrated in the sketch after these bullets).
- Overfitting occurs when we tune a predictive model only on the data we observe.
- Since the goal of prediction is to minimize the prediction errors, a model that only accounts for the observed data will produce fits that are too specific for general prediction.
- We need to balance fitting the observed data with generality - favor parsimonious models over complex models while still avoiding underfitting.
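To see the point about polynomial order concretely, the sketch below fits polynomials of increasing degree to simulated noisy data and compares in-sample and held-out error. The data-generating process, the hold-out of 12 observations, and the particular degrees are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data: a quadratic signal plus noise (illustrative assumption).
x = np.linspace(-3, 3, 60)
y = 1.0 + 2.0 * x - 0.5 * x**2 + rng.normal(scale=2.0, size=x.size)

# Randomly hold out 12 of the 60 observations as a test set.
test_idx = rng.choice(x.size, size=12, replace=False)
train_mask = np.ones(x.size, dtype=bool)
train_mask[test_idx] = False

for degree in (1, 2, 5, 10):
    # Fit a polynomial of the given degree on the training data only.
    coefs = np.polyfit(x[train_mask], y[train_mask], deg=degree)
    train_mse = np.mean((np.polyval(coefs, x[train_mask]) - y[train_mask]) ** 2)
    test_mse = np.mean((np.polyval(coefs, x[test_idx]) - y[test_idx]) ** 2)
    # Training error keeps shrinking as the degree grows; test error typically
    # bottoms out near the true degree and then rises as the model overfits.
    print(f"degree {degree:2d}: train MSE {train_mse:.2f}, test MSE {test_mse:.2f}")
```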
- Overfitting/Bias-Variance Trade-off
- Overfitting: fitting the noise in the data (variance); the tension with underfitting is the bias-variance trade-off in machine learning parlance.
- Overreacts to every tiny random perturbation in the data.
- Excellent, maybe perfect fit to in-sample data
- Models noise -> won't extrapolate/generalize well to new out-of-sample data.
- Underfitting: misses the true patterns/relationships in the data (bias)
- Underreacts to real changes
- Won't describe in-sample or out-of-sample data well.
- Overfitting can occur in two circumstances:
- Model is too flexible to observed data
- Model includes many predictors that are only loosely related to the outcome - especially problematic when the predictors are correlated with one another (kitchen sink models).
How to avoid overfitting?
- Evaluating Predictive Performance:
- The prediction recipe:
- Collect predictors and outcomes for some number of observations.
- Hold out some portion of the observations as a test set.
- Find the optimal coefficients for the predictive model on the training subset of the data.
- Evaluate the prediction error on the test set.
- Sometimes, there is a natural training/test split: e.g. time intervals
- Other times, we need to randomly split the data into training and test sets, e.g., randomly hold out 20% of the data as a test set and fit the model on the other 80%. Quantify predictive performance on the held-out test set. (A sketch of this recipe follows below.)
- This is better than including all data in the predictive model from the start because it guards against overfitting to the data - if the model fits the training set well but doesn't fit the test set, then the model's predictions aren't very good.
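A minimal sketch of this train/test recipe, assuming scikit-learn is available and using plain linear regression on simulated data (both are illustrative choices rather than anything the notes require):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)

# Step 1: collect predictors X and outcomes y (simulated here for illustration).
X = rng.normal(size=(200, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=1.0, size=200)

# Step 2: randomly hold out 20% of the observations as a test set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Step 3: find the optimal coefficients on the training subset only.
model = LinearRegression().fit(X_train, y_train)

# Step 4: evaluate the prediction error on the held-out test set.
test_mse = mean_squared_error(y_test, model.predict(X_test))
print(f"held-out test MSE: {test_mse:.3f}")
```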
- Cross-Validation:
- The cross-validation prediction recipe:
- Collect predictors and outcomes for some number of observations.
- Divide the data into K equally sized folds.
- Treating fold 1 as the test set, use the remaining folds as the training set to find the predictive model that minimizes prediction error in fold 1.
- Repeat for each fold, treating them as the test set in turn.
- Average the predictions of each "best model" to get the K-fold cross-validated prediction.
- K-fold cross-validation addresses both the problem of selecting a test set and the problem of overfitting by ensuring that every observation is held out at one point or another.
- Prevents overfitting by averaging over the models - what's good for one may not be good for another, so find a happy medium by averaging them all together (sketched below).
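A sketch of K-fold cross-validation on the same kind of simulated data, again assuming scikit-learn. Note that the standard `cross_val_score` workflow shown here averages the per-fold test errors (rather than the models' predictions) to summarize performance; K = 5 is an illustrative choice.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score

rng = np.random.default_rng(2)

# Simulated predictors and outcome (illustrative only).
X = rng.normal(size=(200, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=1.0, size=200)

# Divide the data into K = 5 equally sized folds.
folds = KFold(n_splits=5, shuffle=True, random_state=0)

# Each fold serves as the test set once; the model is refit on the other folds.
scores = cross_val_score(
    LinearRegression(), X, y, cv=folds, scoring="neg_mean_squared_error"
)

# Average the per-fold errors to get the cross-validated estimate of prediction error.
print(f"5-fold cross-validated MSE: {-scores.mean():.3f}")
```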
- Avoid Spurious Predictors
- Overfitting can also happen if our model includes many predictors that are only loosely related to the outcome.
- Especially problematic when the predictors are correlated with one another (kitchen sink models).
- Is the predictor really related to Y, or is the correlation in this data due to chance/noise?
- Signal vs. Noise
- Let Z be an unimportant predictor
- When finding the model that minimizes prediction error, random chance dictates that there will be a relationship between Z and Y.
- If we try to apply this estimate to new observations, the spurious correlation between Z and Y can lead predictions to be off (see the sketch below).
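The signal-vs-noise point can be checked directly. In the sketch below (simulated data, illustrative only), Z has no relationship to Y by construction, yet the fitted model still assigns it a nonzero coefficient in-sample, and carrying that spurious coefficient to held-out data tends to hurt predictions.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(3)

# X drives the outcome; Z is pure noise, unrelated to y by construction.
n = 100
X = rng.normal(size=(n, 1))
Z = rng.normal(size=(n, 1))
y = 2.0 * X[:, 0] + rng.normal(scale=1.0, size=n)

XZ = np.hstack([X, Z])
XZ_train, XZ_test, y_train, y_test = train_test_split(
    XZ, y, test_size=0.3, random_state=0
)

# Model that includes the spurious predictor Z alongside X.
with_z = LinearRegression().fit(XZ_train, y_train)
# Model that uses only the real predictor X.
without_z = LinearRegression().fit(XZ_train[:, :1], y_train)

print("coefficient on Z (nonzero purely by chance):", with_z.coef_[1])
print("test MSE with Z:   ", mean_squared_error(y_test, with_z.predict(XZ_test)))
print("test MSE without Z:", mean_squared_error(y_test, without_z.predict(XZ_test[:, :1])))
```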
- A final thought on preventing overfitting:
- A data scientist who has domain knowledge can prevent overfitting by only including predictors and potential models that make sense in the context of the problem.
- Even if the problem at hand isn't directly explanatory in nature, predictive models benefit from the same considerations that make a good causal model:
- Only include predictors that meaningfully correlate with the outcome of interest.
- Avoid having many correlated predictors.
- Carefully consider functional forms for the predictive model - use a best-fit line when a line will do.