Common Mistakes in Multiple Regression

 

1.   The response variable, Y, doesn’t need to be normal, as this is not an assumption of multiple regression. In fact, Y will rarely be normal. What must be true is that the errors around a prediction Ŷ are normal, which one can check with a normal plot of the residuals. The errors must also have constant variance, which one can check with a plot of the residuals versus the predicted values. This plot should have no pattern. One should watch for a fanning out in this plot, which would indicate that a log transformation is needed. Doing a normal plot of Y will cost one points.

2.   Regression does not assume that the regressors have any particular distribution. So checking whether they are normally distributed with normal plots is not required and will cost one points. One should, however, use box plots to check for outliers in the regressors. (Just don’t do a normal plot.) One should probably leave these points in the data and, once the multiple regression is done, check for influential observations.

3.   It is a waste to do bivariate regressions with least squares fits. The fits give no useful information, so one should not do them if one doesn’t want to lose points. It is good, however, to look at the scatter plots of each regressor versus the response. They may give one insight into whether transformations are needed.

4.   It is a mistake to rely exclusively on stepwise regression to identify the model. The model that stepwise regression comes up with should be taken as a suggestion that one has to check using the tools taught in the class, such as the Effects Table and residual analysis. When one has a small number of regressors, one can select “all models” from the triangle menu on the stepwise output. The model selected this way will be the best in terms of R-Square for a given number of regressors. However, this is not the only criterion for a model being good. One must check it just as one must check the model selected by stepwise regression.

5.   Confusing the errors with the residuals
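The difference between errors and residuals in item 5 can be seen in a small simulation. The following is only a sketch under stated assumptions: the "true" coefficients, the sample size, and the helper names `solve_linear_system` and `fit_least_squares` are all made up for illustration and are not part of the class software. Because the data are simulated, the errors are known here, which is never the case with real data. Note also that Y itself is driven by uniformly distributed regressors, so it need not look normal even though the errors are.

```python
import random

def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b_i] for row, b_i in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def fit_least_squares(rows, y):
    """Least squares coefficients from the normal equations (X'X) b = X'y.
    Each row must already start with 1.0 for the intercept."""
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve_linear_system(xtx, xty)

random.seed(0)
TRUE_BETA = [2.0, 3.0, -1.5]  # hypothetical true intercept and slopes
n = 200

# Two regressors with non-normal (uniform) distributions, plus an intercept column.
rows = [[1.0, random.uniform(0, 10), random.uniform(0, 5)] for _ in range(n)]

# Errors: deviations of Y from the TRUE regression function (unobservable in practice).
errors = [random.gauss(0, 1) for _ in range(n)]
y = [sum(b * x for b, x in zip(TRUE_BETA, r)) + e for r, e in zip(rows, errors)]

# Residuals: deviations of Y from the FITTED values (what one actually plots).
beta_hat = fit_least_squares(rows, y)
fitted = [sum(b * x for b, x in zip(beta_hat, r)) for r in rows]
residuals = [yi - fi for yi, fi in zip(y, fitted)]

print("estimated coefficients:", [round(b, 3) for b in beta_hat])
print("sum of errors:   ", round(sum(errors), 3))
print("sum of residuals:", round(sum(residuals), 3))
```

With an intercept in the model, the residuals sum to zero (up to rounding) by construction of the least squares fit, while the errors generally do not. The residuals are what one uses in the normal plot and in the residual versus predicted plot, as stand-ins for the errors one cannot observe.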

 

Common Conceptual Mistakes

 

1.   Saying that a confidence interval contains a statistic, such as the sample mean, with a given confidence, instead of a parameter, such as the population mean.

2.   Confusing the terms: predictor variable, regressor, independent variable, dependent variable and response

3.   Confusing the concept of outlier and influential observation

4.   Confusing “estimating” with “predicting”
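Two of the conceptual points above, what a confidence interval is "confident" about and the difference between estimating and predicting, can be illustrated by simulation. This is a rough sketch, not a class procedure: the population mean, standard deviation, sample size, and number of repetitions are hypothetical, and the standard deviation is assumed known so that a simple z-interval can be used in place of the usual t-interval.

```python
import random
from statistics import NormalDist, fmean

random.seed(1)
MU, SIGMA, N, REPS = 10.0, 2.0, 25, 2000  # hypothetical population and design

z = NormalDist().inv_cdf(0.975)            # two-sided 95% normal quantile
half_ci = z * SIGMA / N ** 0.5             # half-width: interval for the mean
half_pi = z * SIGMA * (1 + 1 / N) ** 0.5   # half-width: interval for a new Y

hits_mu = hits_xbar = hits_new_ci = hits_new_pi = 0
for _ in range(REPS):
    sample = [random.gauss(MU, SIGMA) for _ in range(N)]
    xbar = fmean(sample)
    new_y = random.gauss(MU, SIGMA)        # a future observation

    hits_mu += abs(MU - xbar) <= half_ci       # does the CI contain the parameter?
    hits_xbar += abs(xbar - xbar) <= half_ci   # it always contains the statistic
    hits_new_ci += abs(new_y - xbar) <= half_ci  # CI misused to predict
    hits_new_pi += abs(new_y - xbar) <= half_pi  # prediction interval

rate_mu = hits_mu / REPS
rate_xbar = hits_xbar / REPS
rate_new_ci = hits_new_ci / REPS
rate_new_pi = hits_new_pi / REPS

print("CI contains population mean:", rate_mu)
print("CI contains sample mean:    ", rate_xbar)
print("CI contains a new Y:        ", rate_new_ci)
print("PI contains a new Y:        ", rate_new_pi)
```

The interval is centered on the sample mean, so it contains that statistic every time; the confidence statement only makes sense for the parameter. Likewise, an interval built for estimating a mean is far too narrow for predicting a single new observation, which needs the wider prediction interval.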