
Statistics 141 - Homework 6 complete solutions correct answers key


NO LATE SUBMISSIONS

Write a report showing the code, results and plots for the questions below.

Put a printed version in Charles Arnold's mailbox in the Statistics department office, 4th floor of the Mathematical Sciences Building, and send an electronic version to [email protected] with the subject STA141 Assignment 6.

Place the following text at the top of your report and sign it on the physical version you submit:

I certify that I have acknowledged any code that I used from any other person in the class, from Piazza or any Web site or book or other source. Any other work is my own.

1 UNIX Shell Tools

In this part of the assignment, you will use UNIX shell tools to process data outside of R and also to get data into R.

In the Data directory on the class Web site, there is a collection of CSV files for the monthly airline delay data from July 2012 to June 2013, inclusive. This is a compressed tar file, Airline2012_13.tar.gz. Within this archive, each file name is of the form year_Month.csv, e.g. 2013_January.csv.

Download this file and extract the files into a single directory. Use a shell command, not a point-and-click GUI (graphical user interface).
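The extraction step can be done with tar alone. The following is a minimal sketch, not the course's prescribed method: the function name extract_archive and the target directory name are hypothetical, and it assumes the archive stores the CSV files at its top level rather than inside a subdirectory.

```shell
#!/bin/sh
# Extract a gzip-compressed tar archive into a single target directory.
# $1: path to the archive
# $2: directory to extract into (created if it does not exist)
# Assumption: the archive holds its files at the top level; if it wraps
# them in a folder, they will land in that folder under $2.
extract_archive() {
    mkdir -p "$2" && tar -xzf "$1" -C "$2"
}
```

For example, `extract_archive Airline2012_13.tar.gz airline` followed by `ls airline | head` would list the first few extracted files, where `airline` is just an illustrative directory name.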

We want to count the number of flights for the 5 airports OAK, SFO, SMF, LAX and JFK. The tasks are simple to state.

i) Compute the number of outbound flights for each of the five airports OAK, SMF, LAX, SFO and JFK, and sort these counts from largest to smallest.

Perform the same computations in R. Compare the total time for each approach.

ii) Compute the total number of flights in and out of the five airports, i.e., the sum of both the inbound and outbound flights. You can do this however you want using a mix of the shell and R code. One way is to first obtain the lines in the files which involve any of these five airports. Then obtain a count for each pair of airports, i.e., ORIGIN, DESTINATION pairs. At most, how many will there be? Then read these counts by ORIGIN, DESTINATION pairs into R and compute the total number of flights for each of the 5 airports.
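One possible shape for task i), using only shell tools, is sketched below. It is an illustration under stated assumptions rather than a reference solution: the function name count_outbound is hypothetical, the files are assumed to share a header row with a column named exactly ORIGIN, and no field to the left of that column may contain an embedded comma (quoted fields with commas would shift the column that cut selects).

```shell
#!/bin/sh
# Count outbound flights per airport and print the counts, largest first.
# Usage: count_outbound FILE...
# Assumptions (hypothetical layout): every file starts with the same header
# row, one column is named exactly ORIGIN, and no field to the left of it
# contains an embedded comma.
count_outbound() {
    # Find the 1-based position of the ORIGIN column in the first header.
    col=$(head -n 1 "$1" | tr -d '"\r' | tr ',' '\n' |
          grep -n -x 'ORIGIN' | cut -d: -f1)
    [ -n "$col" ] || { echo "no ORIGIN column found" >&2; return 1; }
    # Pull out that column, keep the five airports, tally, sort descending.
    # Header rows drop out because "ORIGIN" is not one of the five codes.
    cat "$@" | cut -d, -f"$col" | tr -d '"\r' |
        grep -E -x 'OAK|SFO|SMF|LAX|JFK' |
        sort | uniq -c | sort -rn
}
```

A call such as `count_outbound airline/*.csv` prints one line per airport with its count first. In bash, prefixing the call with `time` gives an elapsed time to set against the equivalent R computation.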

Use only the UNIX shell tools to do i). For ii), use the shell tools to greatly reduce the data and then finish off the computations in R.
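The data-reduction half of task ii) might look like the sketch below, which emits one origin,destination,count line per pair for R to read (for instance with read.csv and header = FALSE). The function name pair_counts is hypothetical, and the sketch assumes columns named exactly ORIGIN and DEST, that ORIGIN appears before DEST in the header, and that neither column is preceded by a field containing an embedded comma.

```shell
#!/bin/sh
# Reduce the flight files to counts per (origin, destination) pair,
# keeping only pairs that involve at least one of the five airports.
# Usage: pair_counts FILE...
# Output: one "ORIGIN,DEST,count" line per pair.
# Assumptions (hypothetical layout): columns named ORIGIN and DEST exist,
# ORIGIN comes first, and no earlier field contains an embedded comma.
pair_counts() {
    header=$(head -n 1 "$1" | tr -d '"\r' | tr ',' '\n')
    o=$(printf '%s\n' "$header" | grep -n -x 'ORIGIN' | cut -d: -f1)
    d=$(printf '%s\n' "$header" | grep -n -x 'DEST' | cut -d: -f1)
    [ -n "$o" ] && [ -n "$d" ] || { echo "ORIGIN/DEST column not found" >&2; return 1; }
    # cut prints fields in file order, hence the ORIGIN-before-DEST assumption.
    cat "$@" | cut -d, -f"$o","$d" | tr -d '"\r' |
        grep -E '(^|,)(OAK|SFO|SMF|LAX|JFK)(,|$)' |
        sort | uniq -c |
        awk '{print $2 "," $1}'
}
```

Redirecting the output, e.g. `pair_counts airline/*.csv > pairs.csv`, leaves a small file for R, where the per-airport totals come from summing the counts in which the airport appears as either origin or destination.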

Work on a small subset of the data first to get the code working correctly. You can check the results by doing the equivalent computations in R. Then run it on the larger data set. Make certain to try this regardless of how powerful and capable your computer is. If your computer is not capable of running on the full data set, run it for different input sizes and show a plot of the time taken as a function of the number of lines processed.

Shell commands that may be useful include: sed, egrep, wc, sort, uniq, cut, man, ls, gunzip, tr, head, tail, echo, cat, xargs. You probably don't need them all.

2 Baseball, Databases and SQL

In this part of the assignment, you will gain experience with databases and SQL, and of course R, data manipulation and visualization.

2.1 Data

We will use data about many, many aspects of baseball. These data have been compiled by Sean Lahman and he has kindly made them available for use by many. Jeff Knecht has made the data, up to 2011, available as an SQLite database. It is available via cloning a git repository (https://github.com/jknecht/lahmann-2013.sqlite). You can also retrieve it from the class Web site at http://eeyore.ucdavis.edu/stat141/Data/lahman2013.sqlite. As we saw in class, there are 24 tables in this database. Each table has columns and rows. Documentation for each of the tables is available at http://seanlahman.com/files/database/readme2013.txt.

2.2 Software

You will need to install the RSQLite package, typically using install.packages().

2.3 Questions

You can answer these questions with a combination of SQL commands and R manipulation of the results, if necessary.

Give the answer and show the SQL and R code used to answer each question.

1. What years does the data cover? Are there data for each of these years?

2. How many (unique) people are included in the database? How many are players, managers, etc?

3. What team won the World Series in 2000?

4. What team lost the World Series each year?

5. Do you see a relationship between the number of games won in a season and winning the World Series?

6. In 2003, what were the three highest salaries? (We refer here to unique salaries, i.e., more than one player might be paid one of these salaries.)

7. For 1999, compute the total payroll of each of the different teams. Next compute the team payrolls for all years in the database for which we have salary information. Display these in a plot.

8. Study the change in salary over time. Have salaries kept up with inflation, fallen behind, or grown faster?

9. Compare payrolls for the teams that are in the same leagues, and then in the same divisions. Are there any interesting characteristics? Have certain teams always had top payrolls over the years? Is there a connection between payroll and performance?

10. Has the distribution of home runs for players increased over the years?

When answering the questions, try to summarize the results in convenient and informative form (e.g. tables and/or plots) that illustrate the key features.

2.4 Bonus Questions

Students who are looking for bonus points (e.g., to make up for other assignments) can compose additional questions and answer these. Make certain to explicitly state each question, indicate why it is interesting, and answer it using the data, providing conclusions, evidence and the code used to answer the question.

  • Submitted On 09 Apr, 2016 12:31:47