
STAT 501 Final Exam | 3 Questions solved

                                          STAT 501 – Final Exam

 

  1. (5x2 = 10 points) State which of the following statements are true and which are false. For the statements that are false, explain why they are false.

 

  (a) In a logistic regression analysis where Y=1 represents survival and Y=0 represents death, the logit of the survival probability is the negative of the logit of the death probability.
  (b) In regression analysis, the method of ordinary least squares can be used in the presence of non-normal errors.
  (c) In multiple linear regression analysis, the width of a prediction interval for a future response of Y based on a single predictor X increases with the value of X.
  (d) The error terms in an AR(1) model have zero mean.
  (e) In model selection, the MSE (or S) criterion minimizes confidence/prediction interval widths, while the PRESS criterion evaluates model unbiasedness.
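Statement (a) turns on a symmetry of the logit function that can be checked numerically. The following is a minimal sketch, assuming the standard definition logit(p) = ln(p / (1 - p)); the helper name `logit` is illustrative and not part of the exam.

```python
import math

def logit(p: float) -> float:
    """Log-odds of a probability p, with 0 < p < 1."""
    return math.log(p / (1.0 - p))

# If p is the survival probability, the death probability is 1 - p.
for p in (0.1, 0.5, 0.8):
    survival_logit = logit(p)
    death_logit = logit(1.0 - p)
    # The two log-odds should differ only in sign.
    assert math.isclose(survival_logit, -death_logit, abs_tol=1e-12)
    print(f"p={p}: logit(p)={survival_logit:.4f}, logit(1-p)={death_logit:.4f}")
```

The check works because the odds of death are the reciprocal of the odds of survival, and the log of a reciprocal is the negative of the log.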

 

 

  2. (7x2 = 14 points) Fill in the blanks with terms from the list: non-normality, multicollinearity, heteroscedasticity, confidence intervals, prediction intervals. [Note: there are 7 blanks but only 5 terms, so you will have to use some terms more than once, and there may be some terms you do not use at all.]

 

  (a) A small p-value for the Ryan-Joiner test indicates ________.
  (b) A residual vs. fits plot with a non-random pattern around a horizontal line at zero indicates ________.
  (c) A large sample size ensures the validity of a confidence interval for a mean response even when errors exhibit ________.
  (d) ________ among predictors will lead to unreliable ________ for regression coefficients.
  (e) Weighted least squares estimation can be used in the presence of error ________ and ________.

 

 

 

  3. (4x3 = 12 points) The following ANOVA table is abstracted from a regression fit to the model: Y = β0 + β1 X1 + β2 X2 + β3 X3 + … + β10 X10 + ε.

 

Source          DF    SS
Regression      ____  110.53
Residual Error  28    ____
Total           ____  150

 

 

Source  DF  Seq SS
X1      1   0.10
X2      1   40
X3      1   1.0
X4      1   55
X5      1   2.5
X6      1   0.08
X7      1   6.5
X8      1   4.0
X9      1   0.4
X10     1   0.95

 

 

  (a) Calculate the three missing values in the upper table.
  (b) For the 10-predictor model, perform a hypothesis test at significance level 0.05 to determine whether predictors X7, X8, X9, and X10 are significantly linearly related to Y upon controlling for the remaining predictors X1-X6, using a general linear F test. Write the null and alternative hypotheses, the value of the test statistic, the decision rule, and the conclusion. [Note: an F-distribution table is provided on the last page of the exam.]
  (c) Later it was decided to consider a regression of Y on the first 4 predictors ONLY. Use information from both tables above to calculate adjusted R² for the model with only the first 4 predictors.
  (d) Given the information in both tables above, is it possible to test whether X1 and X3 can be dropped from the 4-predictor model? Give a brief argument supporting your answer. [You do not have to do a test, even if one is possible.]
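The arithmetic behind the ANOVA question can be sketched as follows. This is one possible working, not the tutor's solution: it uses only the numbers in the two tables, relies on the fact that sequential sums of squares for the last-entered predictors add up to their extra sum of squares, and assumes a critical value of 2.71 for F(0.05; 4, 28) taken from a standard F table, since the exam's own table is not reproduced on this page. All variable names are illustrative.

```python
# Values read from the two tables above.
ss_regression = 110.53
ss_total = 150.0
df_error = 28
n_predictors = 10
seq_ss = [0.10, 40, 1.0, 55, 2.5, 0.08, 6.5, 4.0, 0.4, 0.95]  # X1..X10, in order

# Missing entries of the upper table.
df_regression = n_predictors            # one df per predictor
df_total = df_regression + df_error     # n - 1
ss_error = ss_total - ss_regression

# General linear F test for X7..X10 given X1..X6.
# Because X7..X10 enter last, their sequential SS sum to SSR(X7..X10 | X1..X6).
extra_ss = sum(seq_ss[6:])
df_extra = 4
mse_full = ss_error / df_error
f_stat = (extra_ss / df_extra) / mse_full
f_crit = 2.71  # ASSUMPTION: F(0.05; 4, 28) from a standard table
reject_h0 = f_stat > f_crit

# Adjusted R^2 for the model with X1..X4 only.
ssr_4 = sum(seq_ss[:4])
sse_4 = ss_total - ssr_4
df_error_4 = df_total - 4
adj_r2_4 = 1 - (sse_4 / df_error_4) / (ss_total / df_total)

print(f"Regression DF={df_regression}, Total DF={df_total}, SSE={ss_error:.2f}")
print(f"F={f_stat:.3f}, critical={f_crit}, reject H0: {reject_h0}")
print(f"Adjusted R^2 (X1..X4) = {adj_r2_4:.4f}")
```

The same sequential-SS reasoning bears on the last part of the question: the tables give SSR(X1), SSR(X3 | X1, X2) and so on in entry order, which is not the same as the sum of squares for X1 and X3 given X2 and X4.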

 

  4. (4+9+4+4 = 21 points) Data from a local supermarket revealed that the deli usage of customers depends on their grocery bill and also on the time of shopping. To understand the link between these variables, a logistic regression model was fitted based on data from 890 sales records, which yielded the following.

 

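[Minitab logistic regression output not recovered from the source page. Surviving fragments: a table row for bill whose columns have run together ("bill 110.82410.824110.820.001"); an "Odds Ratio / 95% CI" table header; and a table titled "Odds Ratio for lunch=1 relative to lunch=0" with an "Odds Ratio / 95% CI" header. The remaining values are missing.]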

Here  is the estimated probability of deli usage, bill is the amount of the grocery bill and lunch is a binary variable that equals 1 for a store visit at lunchtime and 0 for a store visit at other times.

  (a) Is there any statistical evidence that shopping time is related to the odds of deli usage, and if so, do lunchtime visits have higher odds of deli usage than visits at other times?
  (b) Write the regression equation to estimate the logit of:
      (i) the probability of deli usage in terms of both predictors bill and lunch.
      (ii) the probability of deli usage for lunchtime shoppers and others separately.
      (iii) the probability of NO deli usage in terms of both predictors bill and lunch.
  (c) Use your answer in (b)(ii) above to find the value of bill at which the probability of using the deli is 0.80 for a lunchtime shopper.
  (d) Write a sentence that interprets the coefficient estimate for the predictor variable bill.
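The mechanics of solving for the bill amount that gives a target probability, and of reading an odds ratio off a coefficient, can be sketched as below. The coefficient values `B0`, `B_BILL`, and `B_LUNCH` are HYPOTHETICAL placeholders chosen only so the code runs; they are not the exam's Minitab estimates, which do not survive on this page. The function names are likewise illustrative.

```python
import math

# HYPOTHETICAL coefficients for illustration only (not the exam's estimates).
B0, B_BILL, B_LUNCH = -2.0, 0.02, 1.0

def logit_deli(bill: float, lunch: int) -> float:
    """Estimated log-odds of deli usage; lunch is 1 at lunchtime, else 0."""
    return B0 + B_BILL * bill + B_LUNCH * lunch

def prob_deli(bill: float, lunch: int) -> float:
    """Estimated probability of deli usage (inverse logit)."""
    return 1.0 / (1.0 + math.exp(-logit_deli(bill, lunch)))

def bill_for_probability(p: float, lunch: int) -> float:
    """Bill amount at which the estimated probability equals p."""
    target_logit = math.log(p / (1.0 - p))
    return (target_logit - B0 - B_LUNCH * lunch) / B_BILL

# The logit of NO deli usage is the negative of the logit of deli usage.
logit_no_deli = -logit_deli(50.0, 1)

# Bill at which a lunchtime shopper reaches probability 0.80.
bill_80 = bill_for_probability(0.80, lunch=1)

# Odds ratio for a one-unit increase in bill, holding lunch fixed.
odds_ratio_bill = math.exp(B_BILL)

print(f"bill for p=0.80 at lunchtime: {bill_80:.2f}")
print(f"odds ratio per unit of bill: {odds_ratio_bill:.4f}")
```

With the actual estimates substituted for the placeholders, the same three steps (set the logit to ln(0.8/0.2), solve for bill, exponentiate the bill coefficient) cover parts (b) through (d).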

 

  5. (7+7+8 = 22 points) The Minitab output shown below gives the results of a statistical analysis performed on a set of data consisting of 22 crop values (crop), along with fertilizer amounts (fert) and temperature (temp).

Coefficients

Term          Coef  SE Coef  T-Value  P-Value    VIF
Constant       104      104     1.00    0.329
fert          7.57     3.88     1.95    0.067  88.20
temp         3.591    0.855     4.20    0.001   1.61
fert*fert  -0.0821   0.0354    -2.32    0.032  90.92

Fitted Regression Equation

crop = 104 + 7.57 fert + 3.591 temp - 0.0821 fert*fert

fert  temp   Fit      95% CI              95% PI
50    37.64  412.701  (388.693, 436.709)  (345.607, 479.795)
20    40     366.490  (285.192, 447.788)  (263.852, 469.128)

Descriptive Statistics: fert, crop, temp

Variable   Mean  SE Mean  StDev  Minimum  Maximum
crop      376.4     13.4   62.9    270.0    460.0
fert      59.50     3.35  15.73    32.00    79.00
temp      37.64     1.34   6.29    27.00    46.00

 

  (a) Comment on the validity of the fitted regression equation above in terms of the statistical significance of each predictor (use α = 0.05) and regression pitfalls such as multicollinearity, outlier presence, error non-normality, and heteroscedasticity.
  (b) Propose an alternative population regression model that may remedy some of these pitfalls.
  (c) Comment on the validity of the 95% confidence intervals and prediction intervals computed for the following fert and temp settings:
      (i) 50 and 37.64
      (ii) 20 and 40
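Two quick checks on the Minitab output can be sketched as follows: whether each prediction setting lies inside the observed range of the predictors, and how much of each predictor's variation is explained by the others, using the identity VIF = 1 / (1 - R²). The coefficients and ranges are copied from the output above; the function names are illustrative, and a setting that passes the per-variable range check could still lie outside the joint region of the data.

```python
# Observed minimum and maximum, from the descriptive statistics.
OBSERVED_RANGE = {"fert": (32.00, 79.00), "temp": (27.00, 46.00)}

def fitted_crop(fert: float, temp: float) -> float:
    """Fitted equation with the rounded coefficients shown in the output."""
    return 104 + 7.57 * fert + 3.591 * temp - 0.0821 * fert * fert

def within_observed_range(fert: float, temp: float) -> bool:
    """True if both settings fall inside their marginal observed ranges."""
    lo_f, hi_f = OBSERVED_RANGE["fert"]
    lo_t, hi_t = OBSERVED_RANGE["temp"]
    return lo_f <= fert <= hi_f and lo_t <= temp <= hi_t

def r2_from_vif(vif: float) -> float:
    """R^2 of a predictor regressed on the other predictors."""
    return 1.0 - 1.0 / vif

for fert, temp in [(50, 37.64), (20, 40)]:
    status = "inside" if within_observed_range(fert, temp) else "outside"
    print(f"fert={fert}, temp={temp}: fit={fitted_crop(fert, temp):.1f}, {status} observed range")

for term, vif in [("fert", 88.20), ("temp", 1.61), ("fert*fert", 90.92)]:
    print(f"{term}: VIF={vif}, implied R^2={r2_from_vif(vif):.4f}")
```

The hand-computed fits differ slightly from the Fit column because the printed coefficients are rounded.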

 

 

 

  6. (6 points) The figure below gives the sample PACF for a time series of monthly sales (in thousands of dollars). Use the PACF plot to propose an appropriate time series model for yt = sales during the tth month.

 

  7. (5x3 = 15 points) In an experiment, a researcher compares three different metal alloys (say, A, B, and C) used to weld pipes. For each alloy, 10 welds are made. The response variable is the strength of the weld (Y). In addition to the type of alloy (A, B, or C), a quantitative predictor, the diameter of the weld (X), will be used in a regression model for predicting Y.
  (a) Write a population regression model (with no interaction) for predicting Y using X and alloy (A, B, or C) as predictors. Clearly define any necessary indicator variables.
  (b) Explain precisely what each regression coefficient measures in the model that you wrote for part (a).
  (c) What null hypothesis would be tested to determine whether there are differences among the alloys? Write the hypothesis in terms of the regression coefficients in part (a).
  (d) Explain how you would carry out the hypothesis test described in part (c). Do not forget to write down the degrees of freedom for the test you propose.
  (e) What term(s) should be added to the model to create a model with interactions? Write down the model.
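One possible indicator coding for the alloy question, and the degrees of freedom it implies for a test of alloy differences, can be sketched as below. The choice of alloy C as the baseline is an assumption made for illustration (any of the three could serve), and `design_row` is a hypothetical helper, not something defined in the exam.

```python
def design_row(diameter: float, alloy: str) -> list:
    """Row [1, X, I_A, I_B] of the design matrix, with alloy C as baseline."""
    if alloy not in ("A", "B", "C"):
        raise ValueError(f"unknown alloy: {alloy!r}")
    indicator_a = 1.0 if alloy == "A" else 0.0
    indicator_b = 1.0 if alloy == "B" else 0.0
    return [1.0, float(diameter), indicator_a, indicator_b]

# Sample size: 3 alloys with 10 welds each.
n = 3 * 10

# Full model: intercept, X, I_A, I_B. Reduced model under H0: intercept, X.
params_full = 4
params_reduced = 2

# General linear F test comparing the reduced and full models.
df_numerator = params_full - params_reduced    # number of coefficients tested
df_denominator = n - params_full               # error df of the full model

print(design_row(2.5, "A"), design_row(2.5, "C"))
print(f"F test df: ({df_numerator}, {df_denominator})")
```

Under this coding, an interaction model would add the products of X with each indicator, which raises the full-model parameter count and lowers the error degrees of freedom accordingly.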
[Solved] STAT 501 Final Exam | 3 Questions solved

  • Submitted On 10 May, 2015 12:28:25
The intercept term β_0 may or may not have a practical interpretation, depending on the range of the predictors. It has the usual interpretation that if all the predictors ...