Stats Final Exam Part 2 | Complete Solution

Stats Final Exam Part 2

Given the data set called County Demographic Information, construct a predictive model for the variable “Total Serious Crime” using some or all of the other variables in the set of data.
The model should be mathematically valid, accurate and reliable.
Total Serious Crime is Variable #8
Other Variables:
#2    Land Area
#3    Total Population
#4    Percent of Population aged 18-34
#5    Percent of Population 65 or over
#6    Number of Active Physicians
#7    Number of Hospital Beds
#9    Percent of High School Graduates
#10    Percent of Population with College Degrees
#11    Percent of Population below poverty level
#12    Unemployment Percent
#13    Per Capita Income
#14    Total Personal Income
#15    Geographic Region
Note: I am omitting the data set to simplify this problem; the following analyses use the data set described above, and you can assume the math is calculated correctly. I am testing whether you can identify which analytical techniques may be validly employed and how effective they are at building a model.
Variables 2 to 14 are numeric variables and variable 15 is categoric.
Analysis #1
In the given data set, we were asked to determine whether an accurate predictive model for Variable #8, Serious Crime, could be found using the attached data.
    Since Variable 15 was determined to be categoric, regression was not appropriate to use, so I used Analysis of Variance (ANOVA) to examine whether there was a significant relationship between Variables 8 and 15. The results (using Systat 13.0) are printed below.
    
Variables    Levels
VAR(15) (4 levels)    1.000    2.000    3.000    4.000

Dependent Variable    VAR(8)
N    440
Multiple R    0.110
Squared Multiple R    0.012

Estimates of Effects B = (X'X)^-1 X'Y
Factor    Level    VAR(8)
CONSTANT        28,017.368
VAR(15)    1    -4,931.339
VAR(15)    2    -6,236.627
VAR(15)    3    -1,026.394

Analysis of Variance
Source    Type III SS    df    Mean Squares    F-Ratio    p-Value
VAR(15)    1.795E+010    3    5.985E+009    1.774    0.151
Error    1.471E+012    436    3.374E+009         

ANOVA results suggest that Variable 15 is significantly related to Variable 8, but Variable 15 can only explain approximately 15.1% of the variation in Variable 8.
Therefore, I conclude that variable 15 is significantly related to variable 8 although variable 15 is only a minor factor in predicting variable 8.
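As a cross-check on the Systat output above, the squared multiple R and the F-ratio can be recomputed from the sums of squares in the ANOVA table. The following is a minimal Python sketch; the function name anova_summary is hypothetical, and the inputs are the rounded figures printed above, so small rounding differences are expected.

```python
import math

# Rounded figures copied from the ANOVA table of Analysis #1.
SS_EFFECT = 1.795e10   # Type III SS for VAR(15)
SS_ERROR = 1.471e12    # Error SS
DF_EFFECT = 3
DF_ERROR = 436


def anova_summary(ss_effect, ss_error, df_effect, df_error):
    """Return (R squared, multiple R, F) for a one-way ANOVA table."""
    r_squared = ss_effect / (ss_effect + ss_error)
    f_ratio = (ss_effect / df_effect) / (ss_error / df_error)
    return r_squared, math.sqrt(r_squared), f_ratio


r_squared, multiple_r, f_ratio = anova_summary(
    SS_EFFECT, SS_ERROR, DF_EFFECT, DF_ERROR
)
print(f"R squared  = {r_squared:.3f}")
print(f"Multiple R = {multiple_r:.3f}")
print(f"F ratio    = {f_ratio:.3f}")
```

The recomputed values agree with the printed Squared Multiple R (0.012), Multiple R (0.110) and F-Ratio (1.774) to within rounding.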

 


Analysis #2
Using Systat, I employed Multiple Linear Regression to attempt to create a predictive model, using all of the available variables as independent variables.
The results are shown below.
    
Dependent Variable    VAR(8)
N    440
Multiple R    0.919
Squared Multiple R    0.844
Adjusted Squared Multiple R    0.839
Standard Error of Estimate    23,367.069

Regression Coefficients B = (X'X)^-1 X'Y
Effect    Coefficient    Standard Error    Std. Coefficient    Tolerance    t    p-Value
CONSTANT    -50,925.731    35,344.226    0.000    .    -1.441    0.150
VAR(2)    -3.054    0.849    -0.081    0.719    -3.599    0.000
VAR(3)    0.234    0.020    2.422    0.008    11.560    0.000
VAR(4)    221.063    424.685    0.016    0.393    0.521    0.603
VAR(5)    32.120    380.640    0.002    0.539    0.084    0.933
VAR(6)    -5.189    3.150    -0.159    0.039    -1.647    0.100
VAR(7)    3.404    2.280    0.134    0.046    1.493    0.136
VAR(9)    -265.566    321.799    -0.032    0.244    -0.825    0.410
VAR(10)    140.915    373.505    0.019    0.152    0.377    0.706
VAR(11)    1,142.711    488.132    0.091    0.241    2.341    0.020
VAR(12)    -159.661    658.025    -0.006    0.526    -0.243    0.808
VAR(13)    2.335    0.699    0.163    0.154    3.339    0.001
VAR(14)    -7.070    0.946    -1.564    0.008    -7.475    0.000
VAR(15)    1,456.610    1,319.387    0.026    0.668    1.104    0.270

Analysis of Variance
Source    SS    df    Mean Squares    F-Ratio    p-Value
Regression    1.256E+012    13    9.664E+010    176.989    0.000
Residual    2.326E+011    426    5.460E+008         

Since the combined model had a p-value of 0.000, I concluded that this model could accurately predict variable 8, Total Serious Crime. The R-Squared value of approximately .84 suggests that the model explains about 84% of the variation in Serious Crime. Therefore, I conclude that this is a fairly accurate, valid, predictive model of Total Serious Crime.
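The summary statistics quoted for this regression follow from the ANOVA table by standard formulas. The sketch below recomputes them in Python from the rounded sums of squares shown above; the function name regression_summary is hypothetical, and the count of 13 predictors is read off the Regression df.

```python
# Rounded figures copied from the ANOVA table of Analysis #2.
SS_REGRESSION = 1.256e12
SS_RESIDUAL = 2.326e11
N_OBS = 440
N_PREDICTORS = 13  # Regression df; the constant is not counted


def regression_summary(ss_reg, ss_res, n, p):
    """Return (R squared, adjusted R squared, F) for a model with an intercept."""
    r_squared = ss_reg / (ss_reg + ss_res)
    adjusted = 1.0 - (1.0 - r_squared) * (n - 1) / (n - p - 1)
    f_ratio = (ss_reg / p) / (ss_res / (n - p - 1))
    return r_squared, adjusted, f_ratio


r_squared, adjusted, f_ratio = regression_summary(
    SS_REGRESSION, SS_RESIDUAL, N_OBS, N_PREDICTORS
)
print(f"R squared          = {r_squared:.3f}")
print(f"Adjusted R squared = {adjusted:.3f}")
print(f"F ratio            = {f_ratio:.1f}")
```

With these inputs the results match the printed 0.844 and 0.839, and the F ratio comes out close to the printed 176.989; the small gap is due to the sums of squares being rounded to four significant figures.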

Analysis #3
Since many of the individual independent variables in the previous regression model had p-values above .05, they were not significant factors. I discarded them, redid the regression analysis, and got the results listed below.
    
Dependent Variable    VAR(8)
N    440
Multiple R    0.918
Squared Multiple R    0.842
Adjusted Squared Multiple R    0.840
Standard Error of Estimate    23,274.901

Regression Coefficients B = (X'X)^-1 X'Y
Effect    Coefficient    Standard Error    Std. Coefficient    Tolerance    t    p-Value
CONSTANT    -63,890.789    10,233.100    0.000    .    -6.244    0.000
VAR(2)    -3.109    0.758    -0.083    0.894    -4.101    0.000
VAR(3)    0.250    0.016    2.580    0.013    15.282    0.000
VAR(11)    1,449.915    307.144    0.116    0.603    4.721    0.000
VAR(13)    2.460    0.469    0.171    0.341    5.250    0.000
VAR(14)    -7.899    0.787    -1.748    0.012    -10.037    0.000

Analysis of Variance
Source    SS    df    Mean Squares    F-Ratio    p-Value
Regression    1.254E+012    5    2.508E+011    462.898    0.000
Residual    2.351E+011    434    5.417E+008         

This model is a better predictive model than analysis #2 since it has a higher F-value, and therefore a smaller p-value. Also, each factor of the model has a p-value smaller than .05; this indicates that each component is significant in itself. The R-Squared value of .84 indicates that I can predict Variable 8 with approximately 84% accuracy, using only five variables and a constant.
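The Tolerance column in the output above can be converted to a variance inflation factor (VIF), which is simply the reciprocal of tolerance. The following Python sketch does this for the five retained variables; the cut-off of 10 is a common rule of thumb and is an assumption here, not something stated in the Systat output.

```python
# Tolerance values copied from the coefficient table of Analysis #3.
TOLERANCE = {
    "VAR(2)": 0.894,
    "VAR(3)": 0.013,
    "VAR(11)": 0.603,
    "VAR(13)": 0.341,
    "VAR(14)": 0.012,
}

VIF_THRESHOLD = 10.0  # assumed rule-of-thumb cut-off


def variance_inflation(tolerance):
    """VIF is the reciprocal of tolerance."""
    return {name: 1.0 / tol for name, tol in tolerance.items()}


vif = variance_inflation(TOLERANCE)
flagged = [name for name, value in vif.items() if value > VIF_THRESHOLD]

for name, value in vif.items():
    print(f"{name:8s} VIF = {value:6.1f}")
print("Above threshold:", flagged)
```

Under that assumed cut-off, VAR(3) (Total Population) and VAR(14) (Total Personal Income) are the two variables flagged.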

Analysis #4
Repeating the previous analysis, but deleting the constant allowed me to raise the R-Squared value to almost .87.
Dependent Variable    VAR(8)
N    440
Multiple R    0.932
Squared Multiple R    0.869
Adjusted Squared Multiple R    0.868
Standard Error of Estimate    23,381.775

Regression Coefficients B = (X'X)^-1 X'Y
Effect    Coefficient    Standard Error    Std. Coefficient    Tolerance    t    p-Value
VAR(2)    -3.010    0.763    -0.088    0.612    -3.942    0.000
VAR(3)    0.245    0.016    2.739    0.009    15.107    0.000
VAR(9)    -697.218    118.026    -0.846    0.015    -5.907    0.000
VAR(10)    496.913    209.212    0.174    0.056    2.375    0.018
VAR(11)    683.363    248.743    0.105    0.206    2.747    0.006
VAR(13)    1.727    0.472    0.511    0.015    3.657    0.000
VAR(14)    -7.658    0.780    -1.800    0.009    -9.818    0.000

Analysis of Variance
Source    SS    df    Mean Squares    F-Ratio    p-Value
Regression    1.576E+012    7    2.251E+011    411.714    0.000
Residual    2.367E+011    433    5.467E+008         

Using seven variables and no constant, I found a model in which each component had a low p-value (under .05) and the overall p-value was 0.000. My conclusion is similar to the one in analysis #3, but I would prefer this model because of its higher R-Squared value.
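One way to see where the higher R-Squared comes from is to compare the total sums of squares implied by the ANOVA tables of analyses #3 and #4. With a constant in the model, Regression SS plus Residual SS is the variation of VAR(8) about its mean; without a constant, the total appears to be measured about zero instead. The Python sketch below uses only the rounded figures printed above, and the variable names are hypothetical.

```python
# Rounded sums of squares copied from the ANOVA tables.
SS_REG_WITH_CONSTANT = 1.254e12   # Analysis #3
SS_RES_WITH_CONSTANT = 2.351e11
SS_REG_NO_CONSTANT = 1.576e12     # Analysis #4
SS_RES_NO_CONSTANT = 2.367e11

# Total SS implied by each table.
total_about_mean = SS_REG_WITH_CONSTANT + SS_RES_WITH_CONSTANT
total_about_zero = SS_REG_NO_CONSTANT + SS_RES_NO_CONSTANT

# R squared as printed for each analysis.
r2_analysis3 = SS_REG_WITH_CONSTANT / total_about_mean
r2_analysis4 = SS_REG_NO_CONSTANT / total_about_zero

# Analysis #4 residuals measured against the same baseline as Analysis #3.
r2_analysis4_about_mean = 1.0 - SS_RES_NO_CONSTANT / total_about_mean

print(f"Total SS about the mean: {total_about_mean:.4e}")
print(f"Total SS about zero:     {total_about_zero:.4e}")
print(f"Analysis #3 R squared:   {r2_analysis3:.3f}")
print(f"Analysis #4 R squared:   {r2_analysis4:.3f}")
print(f"Analysis #4 vs. mean:    {r2_analysis4_about_mean:.3f}")
```

The two totals differ, so the two R-Squared values are not measured against the same baseline. Measured against the total from analysis #3, the residuals of analysis #4 give a value of about 0.841 rather than 0.869.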

Analysis #5
Trying to optimize the model, I repeated the earlier analytical methods. I discarded the constant and tried to lower the number of variables. I was able to find a model (see the results listed below, and compare to analyses #3 and #4) that used only four variables. Each variable had a p-value under .05, the F-value was higher than in earlier models (therefore, the p-value for the overall model was lower), and the R-Squared value was still approximately .84.
    
Dependent Variable    VAR(8)
N    440
Multiple R    0.916
Squared Multiple R    0.840
Adjusted Squared Multiple R    0.839
Standard Error of Estimate    25,805.795

Regression Coefficients B = (X'X)^-1 X'Y
Effect    Coefficient    Standard Error    Std. Coefficient    Tolerance    t    p-Value
VAR(2)    -2.141    0.814    -0.062    0.656    -2.629    0.009
VAR(3)    0.088    0.002    0.979    0.644    41.013    0.000
VAR(11)    1,240.562    217.578    0.191    0.327    5.702    0.000
VAR(13)    -0.846    0.116    -0.251    0.314    -7.328    0.000

Analysis of Variance
Source    SS    df    Mean Squares    F-Ratio    p-Value
Regression    1.522E+012    4    3.805E+011    571.367    0.000
Residual    2.903E+011    436    6.659E+008         
                    
Therefore, I concluded that Model #5 was the preferred model since it only had four input variables and achieved approximately the same predictive accuracy. Thus I needed only four independent variables to predict variable #8 with accuracy of approximately 84%.
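Because R-Squared is not always computed against the same baseline, the Standard Error of Estimate offers another way to compare the four regression models: it is the square root of the residual mean square and is expressed in the units of VAR(8). The Python sketch below recomputes it from each ANOVA table; the dictionary layout and function name are hypothetical, and the inputs are the rounded figures printed in analyses #2 to #5.

```python
import math

# (residual SS, residual df, Standard Error of Estimate as printed)
MODELS = {
    "Analysis #2": (2.326e11, 426, 23367.069),
    "Analysis #3": (2.351e11, 434, 23274.901),
    "Analysis #4": (2.367e11, 433, 23381.775),
    "Analysis #5": (2.903e11, 436, 25805.795),
}


def standard_error_of_estimate(ss_residual, df_residual):
    """Square root of the residual mean square."""
    return math.sqrt(ss_residual / df_residual)


computed = {
    name: standard_error_of_estimate(ss, df)
    for name, (ss, df, _reported) in MODELS.items()
}

for name, value in computed.items():
    reported = MODELS[name][2]
    print(f"{name}: computed {value:10.1f}   printed {reported:10.1f}")

smallest = min(computed, key=computed.get)
print("Smallest standard error:", smallest)
```

The recomputed values agree with the printed ones to within a few units, which is consistent with the sums of squares being rounded. On this measure, analysis #3 has the smallest standard error and analysis #5 the largest.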

A)    Is each of the five analyses valid? (If not, why not?)
B)    Is each of the five analyses significant? (Why?)
C)    Is each of the five analyses accurate? (Why?)
D)    Which is the best predictive model and why?

 

Available Answer
  • Submitted On 18 Aug, 2015 11:37:49
Yes, all the five analyses are accurate, since all the analyses...