Smith Fules sells heating oil to residential customers and would like to build a model to predict its customer’s oil consumption.
Oil customers are exposed to the risk of running out of fuel. Home heating oil suppliers therefore have to guarantee that the customer’s oil tank will not be allowed to run dry. Home heating oil industry try uses the concept of a degree-day, equal to the difference between the average daily temperature and 68 degree Fahrenheit. So if the average temperature on a given day is 50, the degree-days for that day will be 18. If the degree-day calculation results in a negative number, the degree-day number is recorded as 0.
By keeping track of the number of degree-days since the customer’s last oil fill, knowing the size of the customer’s oil tank, and estimating the customer’s oil consumptions as a function of the number of degree-days, the oil supplier can estimate when the customer is getting low on fuel and then resupply the customer. However, Smith has more than 2000 customers and computational burden of keeping track of all of these customers is enormous.
Smith wants to develop a consumption estimation model that is practical and reliable. To estimate customer oil use the following data are available:
Customer Oil Usage Degree Days Home Factor Number People
1 381 888 3 3
2 171 176 5 7
3 644 1073 5 4
4 19 126 2 4
5 394 645 5 5
6 153 326 4 6
7 7
1229 1 3
8 319 1218 2 4
9 40 570 2 1
10 121 334 1 7
11 243 738 3 3
12 200 1464 1 5
13 402 880 4 5
14 118 1134 1 5
15 319 1019 3 4
16 185 460 2 3
17 209 257 5 4
18 467 779 5 4
19 50 128 2 4
20 153 371 2 5
21 94 178 3 6
22 574 933 5 3
23 191 295 3 5
24 679 1358 4 5
25 305 626 4 5
26 85 237 2 7
27 87 813 1 6
28 170 385 3 5
29 92 678 1 4
30 35 54 2 3
31 60 314 1 5
32 507 898 4 3
33 148 966 1 6
34 83 84 5 3
35 318 919 3 4
36 85 379 1 4
37 245 512 3 4
38 56 355 2 3
39 303 759 3 3
40 10 777 1 4
(a) Use regression to see whether a statistically reliable oil consumption model can be estimated from the data. You should strive a model that is simple, and that does not violate any of the basic assumptions of the multiple regression such as multicollinearity and heteroscedasticity, but nevertheless has good predictive power.
Display the regression output as 'P6 Regression' in your Excel file.
Explain clearly if you use any transformations/interactions.
Explain why your model is valid. Reference any tables/figures that you need to make your point.
(b) Using the oil usage data, make one figure that shows as much of the insight from your regression in part a) as possible using Tableau. Copy and paste your figure with summary and caption. Try to represent visually the intuition that your regression gives you in the figure.