Prediction

What is prediction? Why is it different?

  • Prediction is everywhere:
    • Basically all modern AI systems employ prediction at their core
    • Success of AI is largely a function of three advances:
      • Computation
      • Data
      • Algorithms
  • Some algorithms/Forms of Prediction:
    • Machine Learning
      • Linear regression, logistic regression, PCA, LASSO, random forests, K-NN, etc.
    • Artificial Intelligence
    • Deep Learning
  • Why prediction?
    • Prediction seeks to find a best guess for an outcome given some meaningfully associated data.
      • e.g. Stock markets
  • What is machine learning?
    • The study of algorithms that improve through repeated experience
    • Given observations of an outcome and some meaningfully correlated predictors, build models that predict the outcome.
    • Machine learning is mostly concerned with prediction.
  • Prediction vs. Explanatory Modeling:
    • Different in goals:
      • Explanatory Modeling (Causal Inference): What is the effect of X on Y?
      • Prediction: Can we forecast what Y will be given X?
    • Causal Inference: We want to know the effect of X on Y, so we
      • Run an experiment randomizing X
      • Use a DiD or Regression Discontinuity
      • Run a regression controlling for all possible confounders.
      • Use modeling to estimate unobservable potential outcomes, control for confounders, and quantify uncertainty around our estimated effect.
    • Predictive Recipe: find a model for X and Y that minimizes my prediction error, $\text{Prediction Error} = \sum_{i=1}^{N} f(y_i - \hat{y}_i)$, where $f$ is some loss function (a short numerical sketch follows this bullet group).
      • In words, over N observations, how far off from the truth am I with the predictions produced by my model?
    • Explanation can aid in some types of prediction problems but is not necessary in many cases.
    • Even if we do not know the causal effect (or even if the causal effect is zero), there might be predictive information in knowing a treatment status.
      • The key is that these features contain information about the outcome we are trying to predict.
      • Rule of Thumb: avoid noise, add new information
    • The main predictive question:
      • Can I predict the value of Y given some correlates for an observation that was not included in the modeling step?
      • Sometimes, we care about future observations that have not yet occurred (forecasting).
      • Other times, we care about within-sample prediction ability.
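
A minimal numerical sketch of the prediction-error formula above, assuming a squared-error loss f(e) = e^2 and made-up outcomes and predictions (both choices are illustrative, not from the notes):

```python
import numpy as np

# Made-up outcomes and model predictions for N = 5 observations (illustrative only).
y = np.array([3.0, -1.0, 2.5, 0.0, 4.0])
y_hat = np.array([2.5, -0.5, 3.0, 0.5, 3.0])

# Prediction error with a squared-error loss, f(e) = e**2 (one common choice of f).
prediction_error = np.sum((y - y_hat) ** 2)
print(prediction_error)  # 0.25 * 4 + 1.0 = 2.0
```
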
  • Types of Prediction:
    • Static Prediction
      • Key assumption: the environment is static (does not change)
        • No policy change
        • No adaptation
    • Dynamic Prediction
      • The environment may change due to
        • Adaptation
        • Policy change

Good and Bad Prediction

  • What makes for good predictors:
    • Avoid Redundancy
    • Avoid pure noise
    • Precise information reduces noise
    • Predictors from a variety of domains provide different information
  • Evaluating Predictive Performance:
    • The higher the order of the polynomial, the less error we have on the training data (see the simulated sketch after this list).
    • Overfitting occurs when we tune a predictive model only on the data we observe.
    • Since the goal of prediction is to minimize the prediction errors, a model that only accounts for the observed data will produce fits that are too specific for general prediction.
    • We need to balance fitting the observed data with generality - favor parsimonious models over complex models while still avoiding underfitting.
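
A simulated sketch of this trade-off (the data-generating process, polynomial degrees, and random seed are illustrative assumptions): raising the polynomial degree keeps shrinking the training error, while the held-out error tells a different story.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data: a noisy linear relationship, split into training and test halves.
x = rng.uniform(-3, 3, size=60)
y = 2.0 * x + rng.normal(scale=2.0, size=60)
x_train, y_train, x_test, y_test = x[:40], y[:40], x[40:], y[40:]

# Training error shrinks as the polynomial degree grows;
# test error typically stops improving and may worsen.
for degree in (1, 3, 9):
    coefs = np.polyfit(x_train, y_train, deg=degree)
    train_mse = np.mean((np.polyval(coefs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coefs, x_test) - y_test) ** 2)
    print(f"degree {degree}: train MSE {train_mse:.2f}, test MSE {test_mse:.2f}")
```
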
  • Overfitting/Bias-Variance Trade-off
    • Overfitting: a.k.a. the bias-variance trade-off in machine learning parlance (figure: underfitting, overfitting, and a good balance).
      • Overreacts to every tiny random perturbation in the data.
      • Excellent, maybe perfect fit to in-sample data
      • Models noise -> won't extrapolate/generalize well to new out-of-sample data.
    • Underfitting: misses the true patterns/relationships in the data (bias)
      • Underreacts to real changes
      • Won't describe in-sample or out-of-sample data well.
    • Overfitting can occur in two circumstances:
      • Model is too flexible relative to the observed data
      • Model includes many predictors that are only loosely related to the outcome - especially problematic when the predictors are correlated with one another (kitchen sink models); see the sketch after this list.
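
A rough sketch of the second circumstance, the kitchen-sink model (the simulated data, the number of noise columns, and the use of scikit-learn are assumptions for illustration): one genuinely predictive column plus many mutually correlated noise columns tends to fit the training data better but predict held-out data worse.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(1)
n_train, n_test, n_noise = 40, 200, 15

def make_data(n):
    # One genuinely predictive column plus many mutually correlated noise columns.
    signal = rng.normal(size=(n, 1))
    common = rng.normal(size=(n, 1))
    noise_cols = common + rng.normal(scale=0.5, size=(n, n_noise))
    X = np.hstack([signal, noise_cols])
    y = 2.0 * signal[:, 0] + rng.normal(size=n)  # only the first column matters
    return X, y

X_train, y_train = make_data(n_train)
X_test, y_test = make_data(n_test)

for label, cols in [("signal only", [0]), ("kitchen sink", list(range(1 + n_noise)))]:
    model = LinearRegression().fit(X_train[:, cols], y_train)
    train_mse = mean_squared_error(y_train, model.predict(X_train[:, cols]))
    test_mse = mean_squared_error(y_test, model.predict(X_test[:, cols]))
    print(f"{label}: train MSE {train_mse:.2f}, test MSE {test_mse:.2f}")
```
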

How to avoid overfitting?

  • Evaluating Predictive Performance:
    • The prediction recipe:
      • Collect predictors and outcomes for some number of observations.
      • Hold out some portion of the observations as a test set.
      • Find the optimal coefficients for the predictive model on the training subset of the data.
      • Evaluate the prediction error on the test set.
    • Sometimes, there is a natural training/test split: e.g. time intervals
    • Other times, we need to randomly split the data into training and test sets: e.g. randomly hold out 20% of the data as a test set and fit the model on the other 80%, then quantify predictive performance on the held-out test set (see the sketch after this list).
      • This is better than fitting the predictive model on all of the data at once because it guards against overfitting - if the model fits the training set well but doesn't fit the test set, then the model's predictions aren't very good.
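
A minimal sketch of this recipe on simulated data, using scikit-learn's train_test_split (the data-generating process and the 80/20 split are illustrative assumptions):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Simulated predictors and outcome (illustrative stand-ins for real data).
X = rng.normal(size=(200, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(size=200)

# Randomly hold out 20% of observations as a test set; fit on the other 80%.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
model = LinearRegression().fit(X_train, y_train)

# Evaluate prediction error on the held-out test set only.
print("held-out test MSE:", mean_squared_error(y_test, model.predict(X_test)))
```
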
  • Cross-Validation:
    • The cross-validation prediction recipe:
      • Collect predictors and outcomes for some number of observations.
      • Divide the data into K equally sized folds.
      • Treating fold 1 as the test set, fit candidate models on the remaining folds and find the one that minimizes prediction error on fold 1.
      • Repeat for each fold, treating each fold as the test set in turn.
      • Average the per-fold results to get the K-fold cross-validated estimate of prediction error.
    • K-fold cross-validation addresses both the problem of choosing a test set and the problem of overfitting by ensuring that every observation is held out at one point or another.
    • It prevents overfitting by averaging across folds: what fits one fold well may not fit another, so the cross-validated estimate reflects a happy medium across all of them (see the sketch after this list).
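
A short sketch of the cross-validation recipe on simulated data, using scikit-learn's KFold and cross_val_score (the data and the choice of K = 5 are illustrative assumptions):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score

rng = np.random.default_rng(0)

# Illustrative data: three predictors and a noisy linear outcome.
X = rng.normal(size=(200, 3))
y = X @ np.array([1.0, 0.5, -1.5]) + rng.normal(size=200)

# K = 5 folds: each fold serves as the held-out test set exactly once.
kfold = KFold(n_splits=5, shuffle=True, random_state=0)
fold_errors = -cross_val_score(LinearRegression(), X, y, cv=kfold,
                               scoring="neg_mean_squared_error")

# Average across folds to get the cross-validated estimate of prediction error.
print("per-fold MSE:", np.round(fold_errors, 2))
print("5-fold cross-validated MSE:", round(fold_errors.mean(), 2))
```
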
  • Avoid Spurious Predictors
    • Overfitting can also happen if our model includes many predictors that are only loosely related to the outcome.
      • Especially problematic when the predictors are correlated with one another (kitchen sink models).
    • Is the predictor really related to Y, or is the correlation in this data due to chance/noise?
  • Signal vs. Noise
    • Let Z be an unimportant predictor.
    • When finding the model that minimizes prediction error, random chance dictates that there will be some apparent relationship between Z and Y.
    • If we try to apply this estimate to new observations, the spurious correlation between Z and Y can lead predictions to be off (see the sketch after this list).
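
A small simulation of this point (the sample sizes, seed, and use of a simple line fit are arbitrary illustrative choices): Z is generated with no relationship to Y at all, yet it picks up a chance in-sample correlation, and a line fit to that chance relationship adds nothing for new observations.

```python
import numpy as np

rng = np.random.default_rng(7)

def draw(n):
    # Z is pure noise: it has no relationship to Y in the data-generating process.
    Z = rng.normal(size=n)
    Y = rng.normal(size=n)
    return Z, Y

Z, Y = draw(30)

# In a small sample, Z still shows some nonzero correlation with Y purely by chance.
print("chance in-sample corr(Z, Y):", round(float(np.corrcoef(Z, Y)[0, 1]), 3))

# A line fit to that chance relationship looks like it helps in-sample ...
slope, intercept = np.polyfit(Z, Y, deg=1)
in_sample_mse = np.mean((Y - (slope * Z + intercept)) ** 2)
mean_only_mse = np.mean((Y - Y.mean()) ** 2)
print("in-sample MSE with Z:", round(float(in_sample_mse), 3),
      "vs. mean-only:", round(float(mean_only_mse), 3))

# ... but the spurious slope carries no real information about new observations.
Z_new, Y_new = draw(1000)
out_mse = np.mean((Y_new - (slope * Z_new + intercept)) ** 2)
out_mean_only = np.mean((Y_new - Y.mean()) ** 2)
print("out-of-sample MSE with Z:", round(float(out_mse), 3),
      "vs. mean-only:", round(float(out_mean_only), 3))
```
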
  • A final thought on preventing overfitting:
    • A data scientist who has domain knowledge can prevent overfitting by only including predictors and potential models that make sense in the context of the problem.
    • Even if the problem at hand isn't directly explanatory in nature, predictive models benefit from the same considerations that make a good causal model:
      • Only include predictors that meaningfully correlate with the outcome of interest.
      • Avoid having many correlated predictors.
      • Carefully consider functional forms for the predictive model - use a best-fit line when a line will do.
