LAB ACTIVITY #3

For each part of this lab, SUBMIT your solution to everything asked in BOLD RED
in Problems 1-6 below. Include the respective R code and output.

Part 1  (Continuous Distributions in R)

In this part we will practice finding probabilities and percentiles for Uniform, Normal, and Exponential random variables using R. Every continuous distribution in R has a root name. For example, for the Uniform distribution the root name is unif, for Normal – norm, and for Exponential – exp. The following single-letter prefixes are attached to the root name to form the corresponding functions for each distribution:

    d      for "density", the density function (for continuous distributions this is the pdf)
    p      for "probability", the cumulative distribution function (this is the cdf)
    q      for "quantile", the inverse cdf (this allows you to find percentiles)
    r      for "random", generates random values that have the specified distribution
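As a quick illustration of how the four prefixes fit together, the sketch below uses the Normal family with arbitrary parameters (mean 2 and sd 1.5, chosen only for illustration, not taken from any problem in this lab). It checks that the `q` function undoes the `p` function, evaluates the `d` function at the mean, and draws a few values with the `r` function.

```r
# Illustrative parameters only (not taken from any problem in this lab)
mu = 2
sigma = 1.5

# q undoes p: the 30th percentile maps back to cumulative probability 0.3
x30 = qnorm(0.3, mu, sigma)
p_back = pnorm(x30, mu, sigma)

# d gives the pdf; at the mean, the normal pdf equals 1 / (sigma * sqrt(2*pi))
peak = dnorm(mu, mu, sigma)

# r draws random values; these 5 draws change from run to run
draws = rnorm(5, mu, sigma)

print(c(p_back = p_back, peak = peak))
print(draws)
```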

For a random variable that has a Uniform distribution on the interval from a to b:

dunif(x, a, b)     # f(x), the pdf at argument x of Unif(a, b) random variable
punif(x, a, b)     # F(x), the cdf at argument x of Unif(a, b) random variable  
qunif(p, a, b)     # x_α, or the 100*(1-α)th percentile of Unif(a, b) random variable, where p=1-α
runif(m, a, b)     # generates m random values from Unif(a, b) distribution
 
Similarly, for a Normal random variable with parameters µ and σ^2:

dnorm(x, mu, sigma)     # f(x), the pdf at argument x of N(µ, σ^2) random variable
pnorm(x, mu, sigma)     # F(x), the cdf at argument x of N(µ, σ^2) random variable
qnorm(p, mu, sigma)     # x_α, or the 100*(1-α)th percentile of N(µ, σ^2), where p=1-α
rnorm(m, mu, sigma)     # generates m random values from N(µ, σ^2) distribution
 
NOTE: In R you do not need to standardize normal random variables! You can work with any normal variable directly as long as you specify its mean µ and standard deviation σ.
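The sketch below illustrates the NOTE above. It uses arbitrary illustrative values, X ~ N(50, 8^2) and x = 60, and computes the same cdf value twice: once directly, and once after standardizing. Called without a mean or sd, `pnorm` uses the standard normal, which is why the standardized version also works.

```r
# Arbitrary illustrative values: X ~ N(mu = 50, sigma = 8), find P(X <= 60)
mu = 50
sigma = 8
x = 60

direct = pnorm(x, mu, sigma)      # work with X directly, no standardizing
z = (x - mu) / sigma              # standardized value, here z = 1.25
standardized = pnorm(z)           # pnorm defaults to mean = 0, sd = 1

print(c(direct = direct, standardized = standardized))
```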

For an Exponential random variable with parameter λ

dexp(x, lambda)     # f(x), the pdf at argument x of Exp(λ) random variable
pexp(x, lambda)     # F(x), the cdf at argument x of Exp(λ) random variable  
qexp(p, lambda)     # x_α, or the 100*(1-α)th percentile of Exp(λ), where p=1-α
rexp(m, lambda)     # generates m random values from Exp(λ) distribution
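Two details are easy to trip over here. R names the exponential parameter `rate`, and that parameter is the rate λ (so the mean is 1/λ), not the mean itself. The short check below compares `pexp` with the closed-form cdf F(x) = 1 − e^(−λx), using λ = 3 from the example that follows.

```r
lambda = 3   # illustrative rate, the same value used in the example below
x = 0.4

# In R the second argument of the *exp functions is named rate (this is lambda)
from_r = pexp(x, rate = lambda)

# Closed-form exponential cdf: F(x) = 1 - exp(-lambda * x) for x >= 0
closed_form = 1 - exp(-lambda * x)

print(c(from_r = from_r, closed_form = closed_form))
```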


Example:
a) Let X have Exponential distribution with parameter 3. Find P(0.1<X<0.4) and the quartiles of X.

Using the cdf of X, P(0.1<X<0.4)=F(0.4)-F(0.1)=0.439624

> pexp(0.4,3) - pexp(0.1,3)
[1] 0.439624

The lower quartile is the 25th percentile, or the value x_0.75 such that F(x_0.75) = 0.25. Because we need the argument of the cdf (not the cdf itself), we use the inverse cdf function, which gives x_0.75 = 0.09589402.

> qexp(0.25,3)
[1] 0.09589402

The median is the 50th percentile, or x_0.5 such that F(x_0.5) = 0.5. So x_0.5 = 0.2310491.

> qexp(0.5,3)
[1] 0.2310491

The upper quartile is the 75th percentile, or x_0.25 such that F(x_0.25) = 0.75. So x_0.25 = 0.4620981.

> qexp(0.75,3)
[1] 0.4620981
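Because `qexp` accepts a vector of probabilities, all three quartiles from this example can be obtained with a single call. The sketch below should reproduce the three values shown above.

```r
# qexp is vectorized over p, so the lower quartile, median, and upper quartile
# of Exp(3) all come from a single call
quartiles = qexp(c(0.25, 0.5, 0.75), 3)
print(quartiles)
```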

b) Let X have a Uniform distribution on [-10.3, 15.6]. Find P(X<12|X≥-3.5) and generate 4 values from this distribution.

Using the definition of conditional probability and the cdf of X,
$P(X<12 \mid X\ge -3.5)=\frac{P(X<12 \cap X\ge -3.5)}{P(X\ge -3.5)}=\frac{P(-3.5\le X<12)}{P(X\ge -3.5)}=\frac{F(12)-F(-3.5)}{1-F(-3.5)}=0.8115183$

> (punif(12,-10.3,15.6) - punif(-3.5,-10.3,15.6)) / (1 - punif(-3.5,-10.3,15.6))
[1] 0.8115183
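Because the Uniform density is flat, conditioning on X ≥ -3.5 leaves X uniform on [-3.5, 15.6]. The sketch below checks that the cdf-based answer above matches the simple length ratio (12 - (-3.5)) / (15.6 - (-3.5)) = 15.5 / 19.1. This property holds for the Uniform distribution specifically; it is not a general shortcut.

```r
a = -10.3
b = 15.6

# Answer via the cdf, as in the worked example
via_cdf = (punif(12, a, b) - punif(-3.5, a, b)) / (1 - punif(-3.5, a, b))

# Answer via the conditional distribution: given X >= -3.5, X ~ Unif(-3.5, b)
via_conditional = punif(12, -3.5, b)

print(c(via_cdf = via_cdf, via_conditional = via_conditional))
```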

The generated values are

> runif(4,-10.3,15.6)
[1]  1.840664 -3.343330  3.351622  1.127541

Note: The generated values are not going to be the same for different runs of this code.
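If you need the same "random" values on every run, for example so that a submitted output can be reproduced, you can fix R's random-number seed with `set.seed` before generating. In the sketch below, the seed value 2024 is arbitrary; any integer works.

```r
# Fixing the seed makes the pseudo-random draws reproducible
set.seed(2024)                      # arbitrary seed value
first = runif(4, -10.3, 15.6)

set.seed(2024)                      # reset to the same seed
second = runif(4, -10.3, 15.6)

print(identical(first, second))     # same seed, same draws
```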

c) Let X ~ N(30.6, 13.8), where 13.8 is the variance σ^2. Find P(X>28), P(X>28|X>25.1), the 92nd percentile, and x_0.85. Also generate 10 values from this distribution.

Using the cdf, P(X>28)=1-F(28)=0.758004

> 1 - pnorm(28, 30.6, sqrt(13.8))
[1] 0.758004

NOTE: The second parameter of the distribution (the third argument of pnorm) is the standard deviation, not the variance! Also, as mentioned above, there is no need to standardize when using R. (You still have to standardize when using a table.)

Using the definition of conditional probability, the fact that the event {X>28} is contained in {X>25.1}, and the cdf,
$P(X>28 \mid X>25.1)=\frac{P(X>28 \cap X>25.1)}{P(X>25.1)}=\frac{P(X>28)}{P(X>25.1)}=\frac{1-F(28)}{1-F(25.1)}=0.8145004$

> (1 - pnorm(28, 30.6, sqrt(13.8))) / (1 - pnorm(25.1, 30.6, sqrt(13.8)))
[1] 0.8145004
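`pnorm` also has a `lower.tail` argument. With `lower.tail = FALSE` it returns the upper-tail probability P(X > x) directly, so you do not need to write 1 - pnorm(...). This form is also more accurate far out in the tail. The sketch below redoes the two probabilities above this way.

```r
mu = 30.6
sigma = sqrt(13.8)   # sd = square root of the variance 13.8

# Upper-tail probabilities directly, instead of 1 - pnorm(...)
p_gt_28   = pnorm(28,   mu, sigma, lower.tail = FALSE)
p_gt_25_1 = pnorm(25.1, mu, sigma, lower.tail = FALSE)

cond = p_gt_28 / p_gt_25_1   # P(X > 28 | X > 25.1)
print(c(p_gt_28 = p_gt_28, cond = cond))
```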

The 92nd percentile, x_0.08, is the value such that F(x_0.08) = 0.92. Then x_0.08 = 35.81961.

> qnorm(0.92, 30.6, sqrt(13.8))
[1] 35.81961

Next, x_0.85 is the 15th percentile, that is, F(x_0.85) = 0.15. Then x_0.85 = 26.74982.

> qnorm(0.15, 30.6, sqrt(13.8))
[1] 26.74982
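In the x_α notation, α is the upper-tail area. That notation maps directly onto `qnorm` with `lower.tail = FALSE`, so you can request x_0.08 and x_0.85 by their subscripts instead of first converting them to 0.92 and 0.15. A minimal sketch:

```r
mu = 30.6
sigma = sqrt(13.8)

# x_alpha has upper-tail area alpha, so ask qnorm for the upper tail
x_0.08 = qnorm(0.08, mu, sigma, lower.tail = FALSE)   # same as qnorm(0.92, mu, sigma)
x_0.85 = qnorm(0.85, mu, sigma, lower.tail = FALSE)   # same as qnorm(0.15, mu, sigma)

print(c(x_0.08 = x_0.08, x_0.85 = x_0.85))
```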

And the 10 generated values are

> rnorm(10, 30.6, sqrt(13.8))
 [1] 29.48343 27.72339 38.01511 30.71013 35.83176 27.70189 37.64109 31.98042
 [9] 35.58243 34.76870


Now solve the problem below using what we’ve discussed.  

Problem 1:
a) The scores of a reference population on the Wechsler Intelligence Scale for Children (WISC) are normally distributed with mean 100 and variance 225. Find the following:

the probability that a randomly selected child has a score of 110 or more

the probability that a randomly selected child has a score of less than 88

the probability that a randomly selected child has a score between 88 and 110

the probability that a randomly selected child has a score of at least 105, given the score is below 135

the probability that a child scores within 2.5 standard deviations of the mean

the score that separates the top 5% of the population from the bottom 95%. What do we call this value?

the 98th percentile of the scores

the 34th percentile of the scores

the median score


b) Suppose messages arrive at a computer server according to a Poisson process with a rate of 6 per hour. Find the following:

the probability that there are no messages in the next five minutes

the probability that there are no messages in the next three minutes, if there were no messages in the previous 5 minutes

the probability that it will take between 5 and 10 minutes for the 11th message to arrive after the arrival of the 10th message

the average time until the next message arrives

the median of the time until the next message arrives
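Several items in (b) rely on two facts. First, the interarrival times in a Poisson process with rate λ are Exp(λ). Second, the exponential distribution is memoryless: P(T > s + t | T > s) = P(T > t). The sketch below checks that identity numerically. It uses a hypothetical rate of 2 per hour and hypothetical times so as not to duplicate Problem 1b. Keep the rate and the times in the same units (for example, convert minutes to hours).

```r
lambda = 2    # hypothetical rate: 2 events per hour (not the rate in Problem 1b)
s = 0.5       # hours already waited
t = 0.25      # additional hours

# P(T > s + t | T > s) = P(T > s + t) / P(T > s), since {T > s + t} is inside {T > s}
conditional = pexp(s + t, lambda, lower.tail = FALSE) / pexp(s, lambda, lower.tail = FALSE)

# Memorylessness says this equals P(T > t)
unconditional = pexp(t, lambda, lower.tail = FALSE)

print(c(conditional = conditional, unconditional = unconditional))
```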


c) Buses arrive at a certain stop at 15-minute intervals starting at 7:00 am. Suppose a passenger arrives at the stop at a time uniformly distributed between 7:00 and 7:30 am. Find the following:

the probability that he waits less than 5 minutes for a bus

the probability that he waits more than 10 minutes for a bus

 


Part 2  (Normal Approximation to Binomial)

In this part, let us consider how well the Normal approximation to the Binomial works. The R functions for Binomial distributions were discussed in Lab Activity #2, and the R functions for Normal distributions are discussed in Part 1 of this lab.

Problem 2:
Suppose that 42% of all drivers stop at an intersection having flashing red lights when no other cars are visible. Of 350 randomly selected drivers coming to an intersection under these conditions, let X be the number of those who stop among the 350 drivers. Suppose we want to find P(115 ≤ X ≤ 155).

what is the exact distribution of X?

use the exact distribution of X to compute the exact probability P(115 ≤ X ≤ 155) in R

Now let’s see how well the Normal approximation to the Binomial works.

check the condition(s) for the approximation.

compute the mean and the standard deviation of the approximate distribution of X.

now calculate the approximate probability P(115 ≤ X ≤ 155) using R and remember to use the continuity correction factor of 0.5!

comment on how well/poorly the Normal approximation to the Binomial works here
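The workflow above can be sketched on a hypothetical example (not Problem 2): X ~ Bin(100, 0.3), for which we want P(25 ≤ X ≤ 35). A commonly quoted rule of thumb for the approximation is that np and n(1 − p) both be at least 10, though the threshold varies between textbooks. The sketch prints those two quantities. It then compares the exact probability with the Normal approximation, computed both with and without the 0.5 continuity correction.

```r
# Hypothetical example (not Problem 2): X ~ Bin(n = 100, p = 0.3), P(25 <= X <= 35)
n = 100
p = 0.3

# Exact probability: P(X <= 35) - P(X <= 24)
exact = pbinom(35, n, p) - pbinom(24, n, p)

# Quantities used in the rule-of-thumb check
print(c(np = n * p, nq = n * (1 - p)))

# Normal approximation with mean np and sd sqrt(np(1 - p))
mu = n * p
sigma = sqrt(n * p * (1 - p))
approx_cc = pnorm(35.5, mu, sigma) - pnorm(24.5, mu, sigma)   # with continuity correction
approx_no = pnorm(35, mu, sigma) - pnorm(25, mu, sigma)       # without, for comparison

print(c(exact = exact, with_cc = approx_cc, without_cc = approx_no))
```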


Part 3   (Integration in R)

It is also useful to know how to integrate in R in order to find probabilities, expectations, etc. for continuous distributions. Let us illustrate how we use R to find integrals with an example.

Example:
Suppose the pdf of a continuous random variable X is $f(x)=\frac{|x|}{25}e^{x/5}$ if $x<0$, and 0 otherwise.
a) Let’s check that f(x) is a valid pdf.

First, f(x) ≥ 0 for every x, because |x| ≥ 0 and the exponential function is positive.
We also need to check that $\int_{-\infty}^{\infty} f(x)\,dx=1$. Since f(x) = 0 outside the support $S_X=\{x<0\}$, we only need to integrate over the support, that is, verify that $\int_{-\infty}^{0}\frac{|x|}{25}e^{x/5}\,dx=1$.

To find this integral in R, first we need to create this function that we will integrate:

f = function(x)            # this tells R to create a function f of one argument x
{                    # inside {} we specify the formula for the function
    abs(x)*exp(x/5)/25        # the formula for the function you need to integrate
}

Note that this can be also written as one line

f = function(x) { abs(x)*exp(x/5)/25 }

Now we tell R to integrate this function f from –∞ to 0:

integrate(f, -Inf, 0)             # Inf corresponds to the infinity

Note that the first argument of the integrate(f, a, b) command is the function being integrated; the second and third arguments are the lower and upper limits of integration, respectively.

The R output is:

> f = function(x)
+ {
+ abs(x)*exp(x/5)/25
+ }
> integrate(f, -Inf, 0)
1 with absolute error < 7.5e-06

Here the integral is equal to 1 (with a very small error of calculation < 7.5e-06). Thus, f(x) satisfies both conditions and is a pdf.

b) Find the probability that |X+3|<2. Also find the probability that |X+3|<20.

First, we rewrite the event in terms of X so that we can find its probability by integrating the pdf of X:
$P(|X+3|<2)=P(-2<X+3<2)=P(-5<X<-1)=\int_{-5}^{-1} f(x)\,dx=\int_{-5}^{-1}\frac{|x|}{25}e^{x/5}\,dx=0.246718$

Since the function we integrate is the same, we only need to use the integrate command:

> integrate(f, -5, -1)
0.246718 with absolute error < 2.7e-15

Now for
$P(|X+3|<20)=P(-20<X+3<20)=P(-23<X<17)=\int_{-23}^{17} f(x)\,dx=\int_{-23}^{0}\frac{|x|}{25}e^{x/5}\,dx=0.9437097$

Note that we integrate only over the part of the interval that lies inside the support, because the density f(x) is 0 outside the support. So the integration runs from -23 to 0 (not to 17).

> integrate(f, -23, 0)
0.9437097 with absolute error < 1e-14
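An alternative to trimming the limits by hand is to define the pdf on the whole real line, returning 0 outside the support. The sketch below does this with `ifelse`, which keeps the function vectorized as `integrate` requires. The name `f_full` is our own choice for illustration. With this definition, the original limits -23 and 17 can be passed to `integrate` unchanged.

```r
# pdf defined on the whole real line: (|x|/25) e^(x/5) for x < 0, and 0 otherwise.
# ifelse() keeps the function vectorized, which integrate() requires.
f_full = function(x) { ifelse(x < 0, abs(x)*exp(x/5)/25, 0) }

# With f_full, the limits from P(-23 < X < 17) can be used as they are
prob = integrate(f_full, -23, 17)$value

# Total area over the whole real line should be 1
total = integrate(f_full, -Inf, Inf)$value

print(c(prob = prob, total = total))
```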

c) Find the expectation and the variance of X.

For this random variable, the expectation is $E(X)=\int_{-\infty}^{\infty} x f(x)\,dx=\int_{-\infty}^{0} x\,\frac{|x|}{25}e^{x/5}\,dx=-10$.

The function we need to integrate has changed (we integrate xf(x) here, not f(x)), so we need to define it in R; let’s call it g(x).

g = function(x)
{
    x*abs(x)*exp(x/5)/25
}

And now we integrate it

integrate(g, -Inf, 0)        # because we integrate function g(x) here, g is the first argument

The R output is:

> g = function(x)
+ {
+ x*abs(x)*exp(x/5)/25
+ }
> integrate(g, -Inf, 0)
-10 with absolute error < 4.6e-07

Recall the variance formula $\mathrm{Var}(X)=E(X^2)-(E(X))^2=E(X^2)-(-10)^2=E(X^2)-100$.
Since $E(X^2)=\int_{-\infty}^{\infty} x^2 f(x)\,dx=\int_{-\infty}^{0} x^2\,\frac{|x|}{25}e^{x/5}\,dx$, we need to integrate $x^2\,\frac{|x|}{25}e^{x/5}$, so we define it in R; let’s call it h(x).

h = function(x)
{
    x^2*abs(x)*exp(x/5)/25
}

Now, to compute E(X^2), we use:

integrate(h, -Inf, 0)

The output for E(X^2) is:

> integrate(h, -Inf, 0)
150 with absolute error < 0.00019

Therefore Var(X)=150-100=50
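`integrate` returns a list rather than a plain number; its `$value` component holds the numeric result. Extracting `$value` lets R finish the variance calculation itself instead of retyping the printed values. A minimal sketch, redefining g and h as above so that the block runs on its own:

```r
g = function(x) { x*abs(x)*exp(x/5)/25 }      # x f(x)
h = function(x) { x^2*abs(x)*exp(x/5)/25 }    # x^2 f(x)

# integrate() returns a list; $value extracts the numeric result
EX   = integrate(g, -Inf, 0)$value
EX2  = integrate(h, -Inf, 0)$value
VarX = EX2 - EX^2

print(c(EX = EX, EX2 = EX2, VarX = VarX))
```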


Now solve the problem below.

Problem 3:
(Based on #5 from Section 3.3) The lifetime X, in months, of certain equipment is believed to have pdf $f(x)=\frac{1}{100}\,x\,e^{-x/10}$ if $x>0$, and 0 otherwise. Use R commands for the needed integrations to find

the probability that the lifetime of the equipment is more than 5 years

the mean of X

the variance of X

 


Part 4   (Scatterplots and correlations)

One of the things we’ve talked about is correlation. Recall that it measures the strength of the linear relationship between two random variables.

See the animated illustration of the relationship between scatterplots and the correlation coefficient at http://www.bolderstats.com/jmsl/doc/ (choose the CorrelationPicture option). Try different values to get a better feel for it.
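The same idea can be explored in R with simulated data. The sketch below uses hypothetical simulated data, not the graphs in Problem 4. It builds one variable with a theoretical correlation of 0.9 with x. It also builds a variable that depends strongly on x, but not linearly, and whose correlation with x is near 0. The seed and sample size are arbitrary. Use `plot(x, y_linear)` and `plot(x, y_curve)` to see the corresponding scatterplots.

```r
set.seed(1)          # arbitrary seed for reproducibility
n = 2000

x = rnorm(n)
z = rnorm(n)

# y_linear has theoretical correlation rho = 0.9 with x
rho = 0.9
y_linear = rho*x + sqrt(1 - rho^2)*z

# A strong but nonlinear (U-shaped) relationship with x
y_curve = x^2

r_linear = cor(x, y_linear)   # should be close to 0.9
r_curve  = cor(x, y_curve)    # should be close to 0, despite the strong dependence

print(c(r_linear = r_linear, r_curve = r_curve))
```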


Problem 4:
Consider the following four graphs

GRAPH A  [scatterplot]

GRAPH B  [scatterplot]

GRAPH C  [scatterplot]

GRAPH D  [scatterplot]

Here are four possible values of correlation:  0.90, 0.52, 0.003, –0.14.

Fill out the table below by matching these four numbers with the corresponding graph that you think has that correlation and provide a short explanation for each choice.

Graph    Correlation value    Why this value?
A
B
C
D


Part 5   (Central Limit Theorem)

The Central Limit Theorem is one of the most important and useful results in statistics. It is the theoretical basis for many statistical methods, in particular the confidence intervals and hypothesis tests that we will discuss later in the semester.

The CLT states: if $X_1,\dots,X_n$ are iid with mean $\mu$ and variance $\sigma^2$ and n is large enough (a common rule of thumb is n ≥ 30), then the approximate distributions of their sum and of their sample mean are, respectively,
$X_1+\dots+X_n \approx N(n\mu,\, n\sigma^2)$
$\bar{X}=\frac{X_1+\dots+X_n}{n} \approx N(\mu,\, \sigma^2/n)$.

Example:
Let’s verify how the CLT works by studying the distribution of the sample mean. Let n = 40 and let $X_1,\dots,X_{40}$ be iid Poisson(5). Study the distribution of $\bar{X}=(X_1+\dots+X_{40})/40$ based on 1000 simulated samples of size n.

For the Poisson(5) distribution, $\mu=E(X_i)=5$ and $\sigma^2=\mathrm{Var}(X_i)=5$. The CLT says that $\bar{X}$ is approximately normal with mean $\mu=5$ and variance $\sigma^2/n=5/40=0.125$. Let’s verify this.

We will simulate m = 1000 samples; each sample consists of n = 40 values, for which we compute the sample mean. This way we obtain 1000 sample means (one from each of the 1000 samples).

n = 40                  # sample size n
s_m = numeric(1000)     # vector of 1000 zeros that will hold the 1000 sample means (s_m = "sample mean")

for(i in 1:1000)        # step i: generate the ith sample of size n, compute its mean, store it in s_m[i]
{
    x = rpois(n, 5)     # generate n = 40 values from the Poisson(5) distribution
    s_m[i] = sum(x)/n   # sample average of the generated values, recorded as the ith entry of s_m
}
To get an idea of the distribution of the 1000 sample means, we construct their histogram and overlay a normal curve on it to see how far the histogram departs from normality:

hist(s_m, prob=TRUE, breaks=15, density=15)    # create histogram for the entries of s_m    

xnorm = seq(min(s_m),max(s_m),length=40)     # create the grid of points
ynorm = dnorm(xnorm, mean(s_m), sd(s_m))     # compute the normal density at these points
lines(xnorm, ynorm, col=2, lwd=2)            # add the normal curve to the histogram above

The resulting histogram with the normal curve is shown below (the plot will differ slightly from run to run, since each run generates different samples).
[Figure: histogram of s_m with the normal curve overlaid]
The mean and variance of the 1000 sample means we created are, respectively

> mean(s_m)
[1] 4.99675
> var(s_m)
[1] 0.1221428

These values are close to the theoretical values from the CLT: mean μ = 5 and variance σ^2/n = 0.125.
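A more compact way to build the same kind of vector s_m as the for-loop above uses `replicate`. This function evaluates an expression a given number of times and collects the results. The sketch below is an alternative to the loop, not a replacement for it, and the seed is arbitrary. It also prints the CLT variance 5/n next to the simulated mean and variance for comparison.

```r
set.seed(123)   # arbitrary seed so the run is reproducible
n = 40

# Draw a Poisson(5) sample of size n and take its mean, 1000 times
s_m = replicate(1000, mean(rpois(n, 5)))

print(c(mean = mean(s_m), var = var(s_m), clt_var = 5/n))
```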


Problem 5:
Verify how the CLT works for the sample mean when the sample size is n = 38 and $X_1,\dots,X_{38}$ are iid Exp(4). Study the distribution of $\bar{X}=(X_1+\dots+X_{38})/38$ based on 1000 simulated samples of size n = 38.

find the (theoretical) mean μ and variance σ^2 of the X_i’s

find the (theoretical) mean and variance of the sample mean (based on CLT)

include the histogram of the 1000 simulated sample means with the normal curve overlaid

report the mean and variance of the 1000 simulated sample means and compare them to the theoretical values


Problem 6:
Penn State Fleet, which operates and manages car rentals for Penn State employees, found that the tire lifetime for their vehicles has a mean of 50,000 miles and a standard deviation of 3500 miles. They consider a random sample of 50 vehicles. Find the following:

what is the approximate distribution, the mean, and the variance of the sample mean lifetime? (Hint: Use the CLT)

find the probability that the sample mean lifetime for these 50 vehicles exceeds 52,000 miles
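Here is a minimal sketch of the kind of CLT tail calculation used in this problem. It uses hypothetical numbers (population mean 100, standard deviation 15, sample size 36) chosen so as not to duplicate Problem 6. The sample mean is treated as approximately N(μ, σ²/n), so `pnorm` receives the standard error σ/√n as its sd argument, not σ.

```r
# Hypothetical population (not Problem 6): mean 100, sd 15; sample of size 36
mu = 100
sigma = 15
n = 36

se = sigma / sqrt(n)      # sd of the sample mean under the CLT: 15/6 = 2.5

# P(sample mean > 103), using the N(mu, sigma^2/n) approximation
p_exceed = pnorm(103, mu, se, lower.tail = FALSE)
print(p_exceed)
```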
