Learning Stata
Transcript
3.3. Regression

the closer R² is to one, the better the fit. Unfortunately, it has a few drawbacks, the biggest of which is that it suffers from its own simplicity. It's easy to interpret: you don't need much statistical training to understand that a high R² is good: your model is explaining a lot of the variation in the data! You do, however, need at least some statistical training to realise that this is only good if your model is correctly specified. High R²s may erroneously arise from omitted variables, from mis-specifying the functional form, and, most overlooked of all, from having too many regressors, since increasing the number of explanatory variables always increases (or, in rare cases, leaves unchanged) the R². Mis-specifying the functional form and omitting variables affect all regression statistics; those statistics, however, are slightly more difficult to interpret and won't "improve" as you recklessly add more right-hand-side variables (in fact, they'll probably get worse), so there is less risk of drawing quick, superficial conclusions from them.

Moreover, microeconomists are generally interested in the relationship between a particular independent variable and the dependent variable; they are usually less interested in how well the overall model explains the data. Instead, they care about correctly specifying the model's functional form and including appropriate controls so as to reduce bias on the regressor of interest. Macroeconomists, who often attempt to predict an outcome, are more concerned with this statistic.

As mentioned, the R² is (weakly) increasing in the number of independent variables: the more controls you have, the more variation is "explained", if only by chance. The adjusted R², often written R̄², was created to account for this. As its name suggests, R̄² "adjusts" R² for the ratio of regressors to observations. Adding an independent variable with no explanatory power will decrease it; it increases only if the new variable improves the predictive power of the model by more than would be expected from chance alone. Of course, that still assumes the model is correctly specified.

Below the R̄² in the regression output is the root mean squared error (Root MSE). The MSE is the residual sum of squares (RSS) divided by the degrees of freedom (the sample size less the number of estimated parameters). Again, assuming the model is correctly specified, the MSE is an unbiased estimate of the error variance; the Root MSE is its square root.
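The weak-increase property of R² and the adjustment in R̄² can be illustrated numerically. The sketch below is not from the book and uses Python rather than Stata; all variable names and the simulated data are our own. It fits OLS by least squares twice, once with the true regressor and once with an added pure-noise regressor, and computes R² and R̄² directly from their definitions:

```python
# Illustration (not from the book): R^2 never falls when a regressor is
# added, while adjusted R^2 penalises regressors with no explanatory power.
import numpy as np

rng = np.random.default_rng(0)
n = 100
x = rng.normal(size=n)
y = 2.0 + 1.5 * x + rng.normal(size=n)   # true model involves only x
noise = rng.normal(size=n)               # an irrelevant extra regressor

def r2_stats(y, regressors):
    """Return (R2, adjusted R2) from an OLS fit with an intercept."""
    X = np.column_stack([np.ones(len(y))] + list(regressors))
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    rss = resid @ resid                  # residual sum of squares
    tss = ((y - y.mean()) ** 2).sum()    # total sum of squares
    r2 = 1 - rss / tss
    n_obs, k = X.shape                   # k counts the intercept too
    adj = 1 - (1 - r2) * (n_obs - 1) / (n_obs - k)
    return r2, adj

r2_small, adj_small = r2_stats(y, [x])
r2_big, adj_big = r2_stats(y, [x, noise])

# R^2 cannot decrease when a column is added to the design matrix,
# because least squares minimises the RSS over a larger subspace.
assert r2_big + 1e-12 >= r2_small
```

Whether `adj_big` comes out below `adj_small` depends on the random draw: the adjusted R² falls when the added regressor contributes less than chance would predict, which is the usual outcome for a pure-noise column but is not guaranteed in any single sample.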
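The MSE and Root MSE calculations can likewise be written out explicitly. This is a sketch under our own assumed names and simulated data, not the book's code; it divides the RSS by the degrees of freedom and takes the square root, which is the quantity Stata's regression output labels "Root MSE":

```python
# Sketch (assumed names, simulated data): MSE = RSS / (n - k),
# Root MSE = sqrt(MSE), for a simple OLS fit.
import numpy as np

rng = np.random.default_rng(1)
n = 200
x = rng.normal(size=n)
y = 1.0 + 0.5 * x + rng.normal(scale=2.0, size=n)  # true error s.d. is 2

X = np.column_stack([np.ones(n), x])    # intercept plus one regressor
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta

k = X.shape[1]                # number of estimated parameters (2 here)
rss = resid @ resid
mse = rss / (n - k)           # unbiased estimate of the error variance
root_mse = np.sqrt(mse)       # the "Root MSE" in the output header
```

With a true error standard deviation of 2, `root_mse` should land close to 2 in a sample of this size, reflecting that the MSE estimates the error variance.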