Math 1530 Capstone Project Part I | Complete Solution
Math 1530 Capstone Project Part I Fall 2012 Solution
Directions:
1. DO YOUR OWN WORK! It is academic misconduct to copy or seek assistance from other people, or to share your work with other students. Any academic misconduct on this project results in a grade of 0.
2. Capstone Project counts for 200 points of the total grade.
3. The project is due by __________ on _______, _______, 2012. No late projects will be accepted.
4. Start each problem on a new page.
5. Insert each graph in the appropriate place within the problem (not as an addendum at the back of the project or at the end of the problem).
6. Insert only the relevant portions of a Minitab display used to answer a question, not everything Minitab gives you in the hope that the right information is somewhere in what you copied into the document.
Here are the questions that were asked on the survey:
1. GENDER: Are you male or female? (Male, Female)
2. What are your birth month and year? (MONTH_BIRTH: Month; YEAR_BIRTH: Year)
3. ELECTION_VOTE: If you vote in the US Presidential Election this fall, which political party do you prefer? (Democrat (Barack Obama), Republican (Mitt Romney), Other Party)
4. ELECTION_WINNER: Who do you think will win the 2012 US Presidential Election? (Democrat (Barack Obama), Republican (Mitt Romney), Other)
5. WORK_HOURS: On average, how many hours per week will you be working at a paid job this semester?
6. FRIENDS_FB: How many Facebook friends do you have?
7. SHOE_SIZE: What is your shoe size?
8. HEIGHT: What is your height?
9. AGE_INSTRUCTOR: Guess the age (in years) of your Math 1530 instructor.
10. SALARY_EXPECTED: What is your expected salary (in dollars) for a secure job in Johnson City if you have finished your intended highest degree?
The following questions were included in the Research Assignment for MATH1530 students at the beginning of the semester. Imagine that you have finished your intended highest degree and have a secure job in Johnson City. You plan to buy a house in Johnson City area this year. Pick a home that you want to buy and input the information in the following questions.
11. TYPE_HOUSE: Which of the following best describes the type of the house? (Single family, Condominium, Town house, Apartment, Multifamily house, Other)
12. SCHOOL: Which elementary school does this house belong to? (Cherokee, Fairmont, Lake Ridge, Mountain View, North Side, South Side, Towne Acres, Woodland, Unknown)
13. YEAR_HOUSE: Which year was the house built?
14. PRICE_HOUSE: What is the listing price (in dollars) of this house?
15. NUMBER_BRS: How many bedrooms are in this house?
16. SF_FINISHED: What is the total finished square feet of this house?
17. How much in dollars are the property taxes of this house? (TAX_COUNTY: County tax; TAX_CITY: City tax)
A total of 791 students responded to the MATH1530 class survey. The data for 788 students were recorded. The Minitab worksheet MATH1530Fall12Survey.mtw includes the responses to some of the questions. Note that there are some missing values, denoted by an asterisk (*), in the data set.
The Minitab worksheet is set up as follows:
C1: ID
C2: GENDER
C6: ELECTION_VOTE
C7: ELECTION_WINNER
C8: WORK_HOURS
C9: AGE_INSTRUCTOR
C10: SALARY_EXPECTED ($)
C11: PRICE_HOUSE
C12: SF_FINISHED
C13: LN_PRICE (Natural Log of PRICE_HOUSE)
C14: LN_SF (Natural Log of SF_FINISHED)
C15: TYPE_HOUSE (Code the type into two categories: single family and other)
1. Variable type. Which of these questions from the class survey produced variables that are categorical and which are quantitative? Circle your answer.
a. ELECTION_VOTE Categorical Quantitative Neither
b. AGE_INSTRUCTOR Categorical Quantitative Neither
c. TYPE_HOUSE Categorical Quantitative Neither
d. PRICE_HOUSE Categorical Quantitative Neither
e. TAX_COUNTY Categorical Quantitative Neither
Note: A categorical variable places an individual into one of several groups or categories. A quantitative variable takes numerical values for which arithmetic operations such as adding and averaging make sense.
2. Age of MATH1530 instructors: Question 9 from the survey asked students to guess the age (in years) of their Math 1530 instructors.
a. Create a histogram for AGE_INSTRUCTOR and insert it here.
b. Which of the following best describes the shape of the distribution? Circle your answer.
Unimodal Bimodal Multimodal
c. Why do you think we have observed this shape?
The MATH1530 students who answered this question were guessing the ages of different instructors. The combined data should therefore show several peaks, reflecting the different ages of the MATH1530 instructors.
3. Students’ expected salary: Question 10 from the survey asked “What is your expected salary (in dollars) for a secure job in Johnson City if you have finished your intended highest degree?”
a. Create an appropriate display for students’ expected salary and insert it here.
b. Which of the following best describes the shape of the distribution? Circle your answer.
Skewed left Symmetric Skewed right
c. Are there any outliers in this data? Justify your answer.
Yes, the boxplot of the variable shows that there are many outliers (*).
To verify this, apply the 1.5 * IQR rule using the quartiles reported in part (d):
IQR = Q3 - Q1 = 95,000 - 50,000 = 45,000, so 1.5 * IQR = 1.5 * 45,000 = 67,500.
Lower fence = Q1 - 1.5 * IQR = 50,000 - 67,500 = -17,500.
Upper fence = Q3 + 1.5 * IQR = 95,000 + 67,500 = 162,500.
Therefore, any expected salary below -17,500 or above 162,500 is considered an outlier. The minimum is 10,000, so no value falls below the lower fence; all of the outliers are on the high side (the maximum is 850,000).
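The fence arithmetic can be double-checked with a few lines of Python. This is only a sketch of the 1.5 * IQR rule using the quartiles Minitab reported; the function name `iqr_fences` is illustrative and not part of the assignment.

```python
def iqr_fences(q1, q3, k=1.5):
    """Return (lower_fence, upper_fence) for the k * IQR outlier rule."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Quartiles of SALARY_EXPECTED as reported in the Minitab output.
lower, upper = iqr_fences(50_000, 95_000)
print(lower, upper)

# The reported maximum (850,000) lies above the upper fence, so it is flagged.
print(850_000 > upper)
```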
d. Use numerical measures appropriate for the shape to describe the center and spread.
Descriptive Statistics: SALARY_EXPECTED ($)
Variable             N    N*  Mean   SE Mean  StDev  Minimum  Q1     Median  Q3     Maximum
SALARY_EXPECTED ($)  704  84  87919  2985     79191  10000    50000  70000   95000  850000
Since there are outliers, the five-number summary should be used to describe the distribution:
Min = 10,000, Q1 =50,000, Median = 70,000, Q3 = 95,000, Max = 850,000
Note that the mean is larger than the median. This will typically be the case when the distribution is right skewed.
e. Create a side-by-side boxplot to compare the distributions of the expected salary for males and females. Insert the graph below. Comment based on the graph.
Graph>Boxplot>With Groups
Descriptive Statistics: SALARY_EXPECTED ($)
Variable             GENDER  N    N*  Mean   SE Mean  StDev  Minimum  Q1     Median  Q3      Maximum
SALARY_EXPECTED ($)  Female  436  56  83404  3311     69137  15000    50000  66920   90000   700000
SALARY_EXPECTED ($)  Male    268  28  95265  5677     92934  10000    50000  75000   100000  850000
Both distributions appear skewed to the right with many high outliers. The boxplot for females shows slightly more right skewness than the one for males. The median, Q3, and maximum are larger for males than for females (median 75,000 vs. 66,920; Q3 100,000 vs. 90,000; maximum 850,000 vs. 700,000). There are more outliers in the female group.
4. House listing price. The listing price of a house depends on many variables and one of them is the finished square footage. MATH1530 class survey asked students to select a home for sale that they want to buy in the Johnson City area this year assuming that they have finished their intended highest degree and have a secure job in Johnson City. Questions 14 and 16 asked students to input the listing price (in dollars) (PRICE_HOUSE) and the total finished square feet (SF_FINISHED) of the house. Assume the houses selected by the MATH1530 students are an SRS of all houses for sale in Johnson City this year. We are interested in studying the relationship between the listing price and the total finished square feet of a house and whether knowing a house’s finished square footage would explain the listing price.
In the dataset, there are a few houses with very large listing prices and finished square footage. In regression, sometimes, we also consider a natural logarithm transformation of both explanatory variable and response. Check http://en.wikipedia.org/wiki/Natural_logarithm for more details of natural logarithm transformation. Columns C13 (LN_PRICE) and C14 (LN_SF) are the natural logarithm of PRICE_HOUSE and SF_FINISHED, respectively.
a. Create appropriate plots to display the relationships between listing price and finished square footage and between the logarithm transformations of these two variables. Insert the plots here.
b. Does each of the plots show a positive association, a negative association, or no association between the two variables?
Both plots show a positive association between the two variables.
c. Which pair of variables is more appropriate to be fitted by a linear regression model, PRICE_HOUSE and SF_FINISHED or LN_PRICE and LN_SF? Explain.
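LN_PRICE and LN_SF. The scatterplot of LN_PRICE against LN_SF shows a clearer linear relationship. In the untransformed plot, the spread of PRICE_HOUSE grows as SF_FINISHED increases, whereas a linear regression model works best when the scatter about the line is roughly constant.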
Stat>Basic Statistics>Correlation
d. What is the correlation coefficient between the two variables you selected in Part (c)? r = 0.667
Note: if you selected PRICE_HOUSE and SF_FINISHED in Part (c), then the correlation is 0.649.
e. Obtain the least squares regression equation for the two variables you selected in Part (c). Insert it here. Stat>Regression>Fitted Line Plot
The regression equation is: LN_PRICE = 5.39 + 0.872 LN_SF
Note: if you selected PRICE_HOUSE and SF_FINISHED in Part (c), then the regression equation is PRICE_HOUSE = 38369 + 75.44 SF_FINISHED
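For readers without Minitab, the least-squares line and the correlation can be computed directly from their defining sums. The sketch below is a plain-Python illustration, not the Minitab procedure itself; the helper name `fit_line` and the five data pairs are hypothetical and are NOT taken from the survey worksheet.

```python
from math import sqrt


def fit_line(xs, ys):
    """Return (intercept, slope, r) of the least-squares line y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / sqrt(sxx * syy)  # correlation coefficient
    return intercept, slope, r


# Hypothetical (LN_SF, LN_PRICE) pairs, made up for illustration only.
ln_sf = [7.0, 7.3, 7.6, 7.9, 8.2]
ln_price = [11.4, 11.8, 12.0, 12.3, 12.5]

intercept, slope, r = fit_line(ln_sf, ln_price)
print(f"LN_PRICE = {intercept:.2f} + {slope:.3f} LN_SF  (r = {r:.3f})")
```

Running the same function on the actual C13 and C14 columns, with rows containing missing values removed, should give values close to those Minitab reports.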
f. (Bonus) Interpret the slope of the regression equation in part (e) in the context of the question.
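For the model with PRICE_HOUSE and SF_FINISHED: the listing price increases by $75.44 on average for each additional finished square foot.
For the model with LN_PRICE and LN_SF: the logarithm of the listing price increases by 0.872 units on average for each one-unit increase in the logarithm of the finished square footage. Equivalently, a 1% increase in finished square footage is associated with roughly a 0.87% increase in listing price.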
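Because both variables are logged, the slope 0.872 is easier to read on the original scale: multiplying the square footage by a factor c multiplies the predicted price by c raised to the power 0.872. The short sketch below illustrates this under the fitted equation from part (e); the function name `price_multiplier` is illustrative only.

```python
SLOPE = 0.872  # slope of LN_PRICE on LN_SF from part (e)


def price_multiplier(sf_factor, slope=SLOPE):
    """Factor by which the predicted price changes when SF is scaled by sf_factor."""
    return sf_factor ** slope


# A 1% larger house: predicted price rises by a little under 0.9%.
print(f"{(price_multiplier(1.01) - 1) * 100:.2f}% increase")

# A house twice as large: predicted price is less than twice as high.
print(f"{price_multiplier(2.0):.2f} times the price")
```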
g. How well does the regression equation fit the data? Explain. Justify your answer with appropriate plot(s) and summary statistics.
For LN_PRICE and LN_SF: The fitted line plot shows that the regression model fits the data fairly well, although there appear to be a couple of outliers. R-squared (R-Sq) describes the strength of the linear association between X and Y, and Minitab displays it on the fitted line plot: R-Sq = 44.5%. Therefore, 44.5% of the variation in LN_PRICE is explained by the least-squares regression on LN_SF.
For PRICE_HOUSE and SF_FINISHED: The fitted line plot shows that the regression model fits the data fairly well, although there are outliers present. The R-Sq value of 42.1% says that 42.1% of the variation in listing price is explained by the finished square footage of the house.
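The R-Sq values quoted here agree with the correlations from part (d): in a regression with a single explanatory variable, R-squared is the square of the correlation coefficient. A quick check, using an illustrative helper name:

```python
def r_squared_percent(r):
    """Convert a correlation coefficient to R-squared, expressed in percent."""
    return round(r ** 2 * 100, 1)


print(r_squared_percent(0.667))  # LN_PRICE vs. LN_SF
print(r_squared_percent(0.649))  # PRICE_HOUSE vs. SF_FINISHED
```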
Note: Another scatterplot that helps in judging whether the model makes sense is the residual plot, which shows how appropriate the regression model is. Recall that Residual = Observed value - Predicted value. The residual plot should not have any interesting features, such as direction or shape. It should stretch horizontally with about the same amount of scatter about the horizontal line at 0. There should be no bends and no outliers. We see that the plot below looks fairly good. In Minitab, go to Stat>Regression>Regression>Graphs>"Residuals versus fits".
h. Assume that there is a house with total finished area of 13,300 square feet and listing price of $26,900 because of poor condition. The natural logarithm is 9.5 for the area and 10.2 for the listing price. If this observation is added to the analysis,
will it be an outlier? YES
will it be an influential point? YES
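One rough way to see why this house would be an outlier is to compare its observed LN_PRICE with the value predicted by the fitted equation from part (e). The sketch below assumes the log-log equation LN_PRICE = 5.39 + 0.872 LN_SF; the function names are illustrative. It checks only the size of the residual. Confirming influence would require refitting the line with the point included, which needs the worksheet data and is not attempted here.

```python
INTERCEPT = 5.39  # from the fitted equation in part (e)
SLOPE = 0.872


def predict_ln_price(ln_sf):
    """Predicted LN_PRICE from the log-log regression equation."""
    return INTERCEPT + SLOPE * ln_sf


def residual(ln_sf, ln_price):
    """Residual = observed value - predicted value."""
    return ln_price - predict_ln_price(ln_sf)


# The house from part (h): LN_SF = 9.5, LN_PRICE = 10.2.
res = residual(9.5, 10.2)
print(f"predicted LN_PRICE = {predict_ln_price(9.5):.3f}, residual = {res:.3f}")
```

A residual near -3.5 on the log scale means the listing price is only about 3% of what the line predicts for a house of that size, far below the bulk of the data. Since 13,300 square feet is also an unusually large area, the point would likely pull the fitted line toward itself.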
i. (Bonus) Assume the finished square footage is 2500 for a house in Johnson City. Use the regression equation to predict the listing price of this house if it is on the market this year.
The natural logarithm of 2500 is 7.824.
LN_PRICE = 5.39 + 0.872 LN(2500) = 5.39 + (0.872)(7.824) = 12.2125
Thus the estimated listing price of this house is exp(12.2125), or approximately $201,300.
or from the other model:
PRICE_HOUSE = 38369 + 75.44 SF_FINISHED
Price of House = 38369 + 75.44(2500) = $226,969
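Both predictions can be reproduced from the rounded coefficients reported in part (e). This is a minimal sketch under that assumption; the function names are illustrative, and small differences from hand calculation come from rounding the logarithm.

```python
from math import exp, log


def predict_price_loglog(sf, intercept=5.39, slope=0.872):
    """Predict price from LN_PRICE = intercept + slope * LN_SF, then undo the log."""
    return exp(intercept + slope * log(sf))


def predict_price_linear(sf, intercept=38369, slope=75.44):
    """Predict price from PRICE_HOUSE = intercept + slope * SF_FINISHED."""
    return intercept + slope * sf


sf = 2500
print(f"log-log model: ${predict_price_loglog(sf):,.0f}")
print(f"linear model:  ${predict_price_linear(sf):,.0f}")
```

With the intercept 5.39 as printed in part (e), the log-log model gives a value near $201,000, while the linear model gives $226,969.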
j. Provide a scatter plot of the two variables you selected in Part (c) and add the categorical variable TYPE_HOUSE. Display the regression lines for the two groups.
k. What do the associations of the two variables you selected in Part (c) by Type of house look like?
Single Family: positive negative no association
Other: positive negative no association
l. To predict the listing price of a house using the finished square footage, would you rather include the type of the house in the model?
Yes. The two regression lines are quite different.