EBF 472 Problem Set #9: Inference for Regression | Complete Solution
- From Mathematics, Statistics
- ExpertT
- Rating : 109
- Grade : A+
- Questions : 1
- Solutions : 1026
- Blog : 0
- Earned : $53187.54
EBF 472: Quantitative Analysis in Earth Sciences
Problem Set #9: Inference for Regression
Due in class on Thursday, April 14
For this assignment, use the data provided to you in the Excel file “HW9 Data.xlsx”.
Go to the “Body Fat” worksheet.
This is the data from the in-class tutorial last Thursday. It has 100 observations of five variables:
Body Fat (dependent variable)
Age (independent variable)
Weight (independent variable)
Abdomen (independent variable)
Wrist (independent variable)
Use “linest()” in Excel to regress Body Fat against ONLY WEIGHT (one independent variable for this question). [Make sure you select 5 rows and 2 columns of cells, and enter ‘TRUE’ for the 3rd and 4th arguments.]
What are the values for the slope and intercept in the results?
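As a cross-check on the slope and intercept that linest() returns, the same least-squares fit can be sketched in a few lines of Python. The function name simple_ols and the five data points are made up for illustration; they are NOT the values in the “Body Fat” worksheet.

```python
from statistics import mean

def simple_ols(x, y):
    """Return (slope, intercept) of the least-squares line y = intercept + slope * x."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)                          # spread of x
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))    # co-variation of x and y
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept

# Hypothetical toy data (assumption, not the HW9 values)
weight = [150.0, 160.0, 170.0, 180.0, 190.0]
body_fat = [12.0, 15.0, 16.0, 20.0, 22.0]

slope, intercept = simple_ols(weight, body_fat)
print(f"slope = {slope:.4f}, intercept = {intercept:.4f}")
```

In the linest() output for one independent variable, the slope appears in the first row, first column, and the intercept in the first row, second column.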
Test the slope for statistical significance:
What is the standard error for the slope?
Calculate the t-statistic for the slope:
t = b / S.E.
What are the degrees of freedom for this regression?
df=n-k-1
(k = number of explanatory variables; n = number of samples).
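The standard error, t-statistic, and degrees of freedom can be sketched as follows for a one-variable regression. This assumes the usual formula S.E.(b) = sqrt( (SSError / df) / Sxx ), where Sxx is the sum of squared deviations of x from its mean; the function name and the toy data are hypothetical.

```python
import math

def slope_t_test(x, y):
    """Return (slope, std_err, t_stat, df) for a one-variable least-squares fit."""
    n, k = len(x), 1                      # k = 1 explanatory variable
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # Sum of squared residuals (SSError)
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    df = n - k - 1
    std_err = math.sqrt(sse / df / sxx)
    return slope, std_err, slope / std_err, df

# Hypothetical toy data (assumption, not the HW9 values)
weight = [150.0, 160.0, 170.0, 180.0, 190.0]
body_fat = [12.0, 15.0, 16.0, 20.0, 22.0]

b, se, t_stat, df = slope_t_test(weight, body_fat)
print(f"b = {b:.4f}, S.E. = {se:.5f}, t = {t_stat:.4f}, df = {df}")
```

For the homework data, n = 100 and k = 1, so df = 100 - 1 - 1 = 98.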
Calculate the exact p-value for your t-statistic value above and your degrees of freedom.
Use the Excel function for a 2-tailed t-test (it needs a non-negative t value, so pass the absolute value of your t-statistic):
=t.dist.2t( abs(<tstat>), <df>)
Remember that the Null Hypothesis for this test is that the slope is zero:
H_0: β=0
Given your p-value above, can you reject the null? If so, at what level of significance?
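To see what the 2-tailed p-value represents, here is a rough sketch that integrates the Student t density numerically with Simpson's rule, using only the Python standard library. It is an approximation chosen for illustration, not how Excel computes the value, and for very small p-values the rounding error of the integration limits its accuracy. The function names are hypothetical.

```python
import math

def t_pdf(x, df):
    """Density of the Student t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def t_two_tailed_p(t_stat, df, steps=20000):
    """Approximate P(|T| >= |t_stat|); steps must be even (Simpson's rule)."""
    t = abs(t_stat)
    if t == 0.0:
        return 1.0
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central = total * h / 3               # P(0 <= T <= |t_stat|)
    return max(0.0, 1.0 - 2.0 * central)  # both tails together

# Hypothetical example: t = 2.5 with 98 degrees of freedom
p_value = t_two_tailed_p(2.5, 98)
print(f"two-tailed p = {p_value:.4f}")
print("reject H_0 at the 5% level" if p_value < 0.05 else "fail to reject H_0 at the 5% level")
```

The sign of the t-statistic does not matter here because both tails are counted.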
What is the R² for this regression? What does this number mean, in words?
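One way to read this number: it is the fraction of the total variation in the dependent variable that the fitted line accounts for, i.e. 1 - SSError/SSTotal. A minimal sketch, using hypothetical toy data and the line fitted to that toy data (intercept -25.5, slope 0.25):

```python
def r_squared(y, predicted):
    """Return 1 - SSError / SSTotal for observed y and fitted values."""
    y_bar = sum(y) / len(y)
    ss_total = sum((yi - y_bar) ** 2 for yi in y)                    # variation around the mean
    ss_error = sum((yi - pi) ** 2 for yi, pi in zip(y, predicted))   # variation left unexplained
    return 1.0 - ss_error / ss_total

# Hypothetical toy data (assumption, not the HW9 values)
weight = [150.0, 160.0, 170.0, 180.0, 190.0]
body_fat = [12.0, 15.0, 16.0, 20.0, 22.0]
predicted = [-25.5 + 0.25 * w for w in weight]   # least-squares line for the toy data

r2 = r_squared(body_fat, predicted)
print(f"R^2 = {r2:.4f}")
```

In the linest() output, this value appears in the third row, first column.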
Test the regression for significance:
What is the F-statistic value given in the linest() results?
Confirm this value by calculating the following, using the other values in the results from linest:
F = (SSModel / k) / (SSError / (n - k - 1))
SSTotal = SSModel + SSError
Use the values for SSModel, SSError, n, and k from your regression and results, and calculate F. Ensure you get the same value as returned by linest().
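The calculation can be sketched as below. The sums of squares are hypothetical toy numbers, not the HW9 results; in the linest() output, SSModel and SSError are the two values in the fifth row, and F and the degrees of freedom are in the fourth row.

```python
def f_statistic(ss_model, ss_error, n, k):
    """F = (SSModel / k) / (SSError / (n - k - 1))."""
    return (ss_model / k) / (ss_error / (n - k - 1))

# Hypothetical toy values (assumption): 5 samples, 1 explanatory variable
ss_total = 64.0
ss_error = 1.5
ss_model = ss_total - ss_error        # from SSTotal = SSModel + SSError

f_stat = f_statistic(ss_model, ss_error, n=5, k=1)
print(f"F = {f_stat:.4f}")            # 125.0000 for these toy numbers
```

With a single explanatory variable, F equals the square of the slope's t-statistic, which gives a second way to confirm the value.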
Calculate the p-value for this F-statistic value. Use the Excel function f.dist.rt() to give you the upper-tail probability from the F-distribution:
= f.dist.rt( <Fstat>, k, n - k - 1)
The Null hypothesis is that all slopes are zero, or equivalently, that none of the variance in the dependent variable (body fat) is explained by the regression model.
For the p-value you calculated above, can you reject the null?
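For readers who want to check the upper-tail probability outside Excel, here is a sketch using only the Python standard library. It relies on the identity P(F > f) = I_x(d2/2, d1/2) with x = d2 / (d2 + d1·f), where I_x is the regularized incomplete beta function, evaluated here with a standard continued-fraction expansion. This is an illustrative implementation with hypothetical function names, not Excel's internal method.

```python
import math

def _beta_cf(a, b, x, max_iter=300, eps=3e-14):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step, then odd step, of the recurrence
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            delta = d * c
            h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h

def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_bt = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
              + a * math.log(x) + b * math.log1p(-x))
    bt = math.exp(log_bt)
    if x < (a + 1.0) / (a + b + 2.0):
        return bt * _beta_cf(a, b, x) / a
    return 1.0 - bt * _beta_cf(b, a, 1.0 - x) / b

def f_upper_tail(f_stat, d1, d2):
    """P(F > f_stat) for an F distribution with (d1, d2) degrees of freedom."""
    if f_stat <= 0.0:
        return 1.0
    return reg_inc_beta(d2 / 2.0, d1 / 2.0, d2 / (d2 + d1 * f_stat))

# Hypothetical example: F = 3.0 with k = 2 and n - k - 1 = 10
p_value = f_upper_tail(3.0, 2, 10)
print(f"upper-tail p = {p_value:.5f}")
```

The arguments mirror f.dist.rt(): the F value first, then k, then n - k - 1.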
Follow these steps to check the validity of the regression:
In an empty column next to your data (e.g. Column F), enter a formula for the predicted body fat:
= <intercept> + <slope> * <weight for this sample>
and paste down all 100 rows.
In the next empty column (e.g., Column G), enter a formula for the residual:
= <body fat for this sample> - <predicted body fat>
and paste down all 100 rows
Create a scatterplot with “predicted body fat” on the X-axis, and “residuals” on the Y axis, and paste the figure into your answers.
From the shape of the scatter, does this regression look OK, or is there a problem?
If there is a problem, identify what type of problem.
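The two worksheet columns described above can be sketched as follows. The data, intercept, and slope are hypothetical toy values (the line shown is the least-squares fit to the toy data, not to the HW9 data).

```python
def predicted_and_residuals(x, y, intercept, slope):
    """Return (predicted, residuals), one entry per sample."""
    predicted = [intercept + slope * xi for xi in x]        # the "Column F" formula
    residuals = [yi - pi for yi, pi in zip(y, predicted)]   # the "Column G" formula
    return predicted, residuals

# Hypothetical toy data and coefficients (assumption, not the HW9 values)
weight = [150.0, 160.0, 170.0, 180.0, 190.0]
body_fat = [12.0, 15.0, 16.0, 20.0, 22.0]
predicted, residuals = predicted_and_residuals(weight, body_fat, intercept=-25.5, slope=0.25)

print("predicted  residual")
for p, r in zip(predicted, residuals):
    print(f"{p:9.2f}  {r:8.2f}")
print(f"sum of residuals = {sum(residuals):.6f}")
```

For a least-squares fit with an intercept, the residuals sum to zero apart from rounding, which is a quick check that the formulas were pasted correctly. In the scatterplot, residuals that fan out as the predicted value grows suggest non-constant variance, and a curved band suggests the relationship is not linear; an even, patternless band around zero is what a valid regression should show.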
Repeat “Body Fat” regression using ALL independent variables.
Repeat all the steps above, but for a regression using all variables to predict body fat.
Use “linest()” to get the regression results, selecting body fat (A2:A101) for the “Ys” argument and selecting the other 4 columns (B2:E101) for the “Xs” argument.
[Make sure you select a range of empty cells 5 columns by 5 rows, and use Ctrl+Shift+Enter.]
What are the values of all coefficients from this regression?
Test all 4 slopes for significance using the t-test as above.
For each variable, give the value you calculate for the t-statistic, the p-value of that statistic, and whether you reject the null or not for that variable.
Age:
Weight:
Abdomen:
Wrist:
[Make sure you use the correct degrees of freedom when you get the p-value].
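The t-tests for several slopes at once can be sketched with the normal equations, (XᵀX)β = Xᵀy, solved here with a small Gauss-Jordan inverse so that only the Python standard library is needed. The function names and the two-variable toy data are hypothetical; the approach is a generic least-squares sketch, not a description of Excel's internals.

```python
import math

def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def ols(xs, y):
    """xs: one row of predictor values per sample. Returns (coefs, std_errs, t_stats, df);
    index 0 of each list is the intercept, then the predictors in column order."""
    n, k = len(y), len(xs[0])
    X = [[1.0] + list(row) for row in xs]          # prepend the intercept column
    p = k + 1
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(p)]
    inv = invert(xtx)
    coefs = [sum(inv[i][j] * xty[j] for j in range(p)) for i in range(p)]
    resid = [yi - sum(c * v for c, v in zip(coefs, r)) for r, yi in zip(X, y)]
    df = n - k - 1
    mse = sum(e * e for e in resid) / df           # SSError / df
    std_errs = [math.sqrt(mse * inv[i][i]) for i in range(p)]
    t_stats = [c / s for c, s in zip(coefs, std_errs)]
    return coefs, std_errs, t_stats, df

# Hypothetical toy data (assumption): columns are weight, abdomen
xs = [[150.0, 85.0], [160.0, 88.0], [170.0, 95.0],
      [180.0, 93.0], [190.0, 100.0], [200.0, 104.0]]
body_fat = [12.0, 15.0, 16.0, 20.0, 22.0, 25.0]

coefs, std_errs, t_stats, df = ols(xs, body_fat)
for name, b, se, t in zip(["Intercept", "Weight", "Abdomen"], coefs, std_errs, t_stats):
    print(f"{name:9s} b = {b:9.4f}  S.E. = {se:8.4f}  t = {t:8.3f}")
print("df =", df)
```

For the homework regression, n = 100 and k = 4, so every slope is tested with df = 100 - 4 - 1 = 95. Note that linest() lists the coefficients in reverse order of the X columns, with the intercept last, so match each value to its variable carefully.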
Give the F-statistic for this regression, calculate its p-value, and state whether you reject the null for this model or not.
In empty columns, use the coefficients to calculate the predicted value and the residual for each sample. Insert a scatterplot of Predicted Body Fat vs. Residual, and comment on whether you see any problems in the plot.
Go to the “Energy Consumption” Worksheet.
This worksheet contains 41 samples (1970-2010) of:
Annual energy consumption by the U.S. (dependent variable)
Energy Price (independent variable)
GDP per capita (independent variable)
Population (independent variable)
Use “linest()” to get the regression results, selecting Total Energy Cons for the “Ys” argument and selecting the other 3 variables for the “Xs” argument.
[Make sure you select a range of empty cells 4 columns by 5 rows, and use Ctrl+Shift+Enter.]
What are the values of all coefficients from this regression?
Does the sign (+ or -) of each slope coefficient make sense?
Test all 3 slopes for significance using the t-test.
For each variable, give the value you calculate for the t-statistic, the p-value of that statistic, and whether you reject the null or not for that variable.
Energy Price:
GDP per capita:
Population:
[Make sure you use the correct degrees of freedom when you get the p-value].
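Because linest() lays out its first two rows with the LAST X column first and the intercept last, it is easy to attach a coefficient to the wrong variable. The sketch below pairs the two rows with variable names and computes each t-statistic. The function name, the numbers in the two rows, and the assumed left-to-right column order (Energy Price, GDP per capita, Population) are all hypothetical; check the column order in your own worksheet.

```python
def label_linest_rows(x_names, coef_row, se_row):
    """Pair linest() row 1 (coefficients) and row 2 (standard errors) with names.
    x_names is in worksheet column order; the rows are in linest() order
    (last X variable first, intercept last)."""
    k = len(x_names)
    if len(coef_row) != k + 1 or len(se_row) != k + 1:
        raise ValueError("expected k + 1 values in each linest() row")
    labels = list(reversed(x_names)) + ["Intercept"]
    return {name: {"coef": b, "se": se, "t": b / se}
            for name, b, se in zip(labels, coef_row, se_row)}

# Hypothetical numbers (assumption, not the HW9 results)
x_names = ["Energy Price", "GDP per capita", "Population"]
coef_row = [0.5, 2.0, -3.0, 10.0]
se_row = [0.25, 0.5, 1.0, 4.0]

results = label_linest_rows(x_names, coef_row, se_row)
n, k = 41, len(x_names)
df = n - k - 1                      # 41 - 3 - 1 = 37
for name, r in results.items():
    print(f"{name:15s} b = {r['coef']:7.3f}  S.E. = {r['se']:6.3f}  t = {r['t']:6.2f}")
print("df =", df)
```

Each t-statistic is then converted to a p-value with df = 37 for this worksheet.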
Which variables are statistically significant for predicting energy consumption?
What is the value of R² for this regression? What does this tell you about the model?
Give the F-statistic for this regression, calculate its p-value, and state whether you reject the null for this model or not.
In empty columns, use the coefficients to calculate the predicted value and the residual for each sample. Insert a scatterplot of Predicted vs. Residual, and comment on whether you see any problems in the plot.
- Submitted On 16 Apr, 2016 04:37:16