Chapter 12: Multiple Regression and Model Building  

An example (Butler Trucking) with simple regression

Butler Trucking: Here we have only one independent variable, x1 (number of kilometres travelled), and the dependent variable y is travel time. The fit gives r² = 0.60, so distance alone explains only 60% of the variation in travel time. Is the fit good?

Butler Trucking problem with multiple regression

We now include a second variable, x2 (number of deliveries made), to help explain the travel time. Here are the MegaStat results for this case, with comments on the output.
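To see what adding a second variable does to the fit, here is a minimal sketch in Python with NumPy. It is not the MegaStat output: the data values are made up for illustration, and the helper name `fit_ols` is hypothetical.

```python
import numpy as np

# Made-up numbers for illustration only (NOT the Butler Trucking data set).
km = np.array([160, 80, 150, 165, 85, 130, 120, 105, 145, 140], dtype=float)
deliveries = np.array([4, 3, 4, 2, 2, 2, 3, 4, 3, 2], dtype=float)
hours = np.array([9.1, 4.9, 8.8, 6.7, 4.1, 6.0, 7.2, 6.3, 7.7, 6.2])


def fit_ols(X, y):
    """Least-squares fit with an intercept; returns (coefficients, R^2)."""
    A = np.column_stack([np.ones(len(y)), X])      # prepend the intercept column
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)   # least-squares solution
    resid = y - A @ beta
    sse = resid @ resid                            # unexplained variation
    sst = np.sum((y - y.mean()) ** 2)              # total variation
    return beta, 1.0 - sse / sst


# Simple regression: travel time on distance only.
beta_one, r2_one = fit_ols(km, hours)

# Multiple regression: travel time on distance and number of deliveries.
beta_two, r2_two = fit_ols(np.column_stack([km, deliveries]), hours)

print("simple   R^2:", round(r2_one, 3))
print("multiple R^2:", round(r2_two, 3))
```

Adding a variable can never lower R² in a least-squares fit with an intercept, so the useful question is whether the increase is large enough to justify the extra variable.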

Multi-collinearity in multiple regression

Multi-collinearity is a serious problem in multiple regression. It arises when two or more independent variables are highly correlated with each other. Among other effects, it (i) inflates the standard errors of the coefficients, (ii) distorts the coefficient estimates, and (iii) makes the estimates unstable, so that removing a single data point may produce large changes in the coefficients (and even in their signs).

Multi-collinearity can be checked using the correlation matrix output of MegaStat. See this example, where distance and gasoline usage are highly correlated: even though the model as a whole is significant, the individual variables are not!
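The same check can be sketched outside MegaStat. The example below uses made-up distance and gasoline figures (assumptions, not the course data) and computes the correlation matrix together with the variance inflation factors (VIF), which are the diagonal entries of the inverse correlation matrix. The VIF is an extra diagnostic brought in here for illustration; the notes themselves rely only on the correlation matrix.

```python
import numpy as np

# Made-up numbers for illustration only (not the course data set).
km = np.array([160, 80, 150, 165, 85, 130, 120, 105, 145, 140], dtype=float)
litres = np.array([19.5, 10.2, 18.1, 20.3, 10.0, 15.9, 14.2, 13.1, 17.8, 16.6])

# Each column of X is one independent variable.
X = np.column_stack([km, litres])

# Pairwise correlations between the independent variables.
corr = np.corrcoef(X, rowvar=False)

# VIF_j is the j-th diagonal entry of the inverse correlation matrix.
vif = np.diag(np.linalg.inv(corr))

print(np.round(corr, 3))
print(np.round(vif, 1))
```

A correlation close to 1 between two independent variables, or a VIF well above 10 (a common rule of thumb), suggests that the individual coefficients and their t-tests should not be trusted even when the overall F-test is significant.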

Dummy variables

In the Butler Trucking problem, suppose deliveries are made with either a van (x3 = 1) or a pickup truck (x3 = 0). With such a dummy variable, beta3 is the additional time it takes to deliver goods with a van rather than a pickup truck, with distance and number of deliveries held fixed. The Excel file for this problem is here.
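The interpretation of beta3 can be illustrated with a small noise-free sketch. Everything here is an assumption made for the illustration: the data values, the "true" coefficients (including a van effect of 0.5 hours) and the variable names. Because the travel times are generated exactly from the equation, least squares recovers the van effect exactly.

```python
import numpy as np

# Made-up inputs for illustration only (not the course Excel file).
km = np.array([160, 80, 150, 165, 85, 130, 120, 105, 145, 140], dtype=float)
deliveries = np.array([4, 3, 4, 2, 2, 2, 3, 4, 3, 2], dtype=float)
van = np.array([1, 0, 1, 0, 1, 0, 0, 1, 1, 0], dtype=float)  # 1 = van, 0 = pickup

# Hypothetical "true" model, chosen only to show what beta3 means.
hours = 1.0 + 0.05 * km + 0.8 * deliveries + 0.5 * van

# Design matrix: intercept, distance, deliveries, dummy variable.
A = np.column_stack([np.ones(len(hours)), km, deliveries, van])
beta, *_ = np.linalg.lstsq(A, hours, rcond=None)

# beta[3] is the shift in travel time for a van versus a pickup truck,
# with distance and number of deliveries held fixed.
print(np.round(beta, 3))
```

With real (noisy) data the estimate of beta3 would come with a standard error and a t-test, just like the other coefficients.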

Real estate problem with multiple regression

Real Estate Data (again): This time we use LotSize (x1), SqrFt (x2), Bedrooms (x3) and Bathrooms (x4) as independent variables to predict the Price (y). The regression equation is obtained as

Price = 17.41 + 12.27*LotSize + 0.01*SqrFt + 55.03*Bedrooms + 20.55*Bathrooms.

So, if your home has LotSize = 1, SqrFt = 2,800, Bedrooms = 6 and Bathrooms = 4, then the predicted Price = $496,457.
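The prediction is just the regression equation evaluated at the home's values, which can be sketched as follows. The coefficients are the rounded ones printed in the equation; treating Price as being in thousands of dollars is an inference from the quoted dollar figure, not something the equation itself states.

```python
# Rounded coefficients as printed in the regression equation.
coef = {
    "Intercept": 17.41,
    "LotSize": 12.27,
    "SqrFt": 0.01,
    "Bedrooms": 55.03,
    "Bathrooms": 20.55,
}

# The home described in the text.
home = {"LotSize": 1, "SqrFt": 2800, "Bedrooms": 6, "Bathrooms": 4}

# Predicted price = intercept + sum of (coefficient * value).
price = coef["Intercept"] + sum(coef[name] * value for name, value in home.items())

print(round(price, 2))  # assumed to be in thousands of dollars
```

With these rounded coefficients the result is about 470.06, which does not match the quoted $496,457 exactly. The gap may come from the coefficients being rounded or truncated in the printed equation (the SqrFt coefficient is multiplied by 2,800, so its later decimals matter a lot), so the full-precision regression output is the better source for predictions.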

Also, note that R² = 0.93 and the p-value for the F-test is almost 0. What do these imply?
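As a hint: R² = 0.93 says that the four variables together explain 93% of the variation in price, and the F-test checks whether all slope coefficients could be zero at once. The overall F statistic can be written in terms of R² as F = (R²/k) / ((1 − R²)/(n − k − 1)), where k is the number of independent variables and n the number of observations. The sketch below evaluates this; the sample size n = 30 is hypothetical, since the number of homes is not stated here, and the function name is made up.

```python
def f_statistic(r2, n, k):
    """Overall F statistic for a regression with k predictors and n observations."""
    return (r2 / k) / ((1.0 - r2) / (n - k - 1))


# R^2 = 0.93 and k = 4 come from the text; n = 30 is a HYPOTHETICAL sample size.
F = f_statistic(0.93, n=30, k=4)

print(round(F, 1))
```

A large F value corresponds to a p-value near 0, which is evidence that at least one of the variables is useful for predicting price. It does not say that every individual variable is significant.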

Actual prices for homes listed in the Hamilton area

Actual real estate data for Hamilton homes.