chapter 15 final (homework answers)

Upload: kelly-johnson

Post on 07-Aug-2018


  • 8/20/2019 Chapter 15 Final (Homework Answers)

    1/12

    Chapter 15

    15.1 The correlation is r = 0.994, and the least-squares linear regression equation is ŷ = −3.66 + 1.1969x, where y = humerus length and x = femur length. The scatterplot with the regression line below shows a strong, positive, linear relationship. Yes, femur length is a very good predictor of humerus length.

    [Figure: scatterplot with regression line. x-axis: Femur length (cm); y-axis: Humerus length (cm).]

     

    15.2 (a) The least-squares regression line is ŷ = 11.547 + 0.84042x, where y = height (inches) and x = arm span (inches). (b) Yes, the least-squares line is an appropriate model for the data because the residual plot shows an unstructured horizontal band of points centered at zero. Since 76 inches is within the range of arm spans examined in Mr. Shenk’s class, it is reasonable to predict the height of a student with a 76-inch arm span.
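As a rough illustration of the prediction discussed in Exercise 15.2, the fitted line can be evaluated at an arm span of 76 inches. The coefficients are the ones reported in part (a); the function name `predict_height` is our own label, not something from the text.

```python
# Coefficients reported in Exercise 15.2(a): y-hat = 11.547 + 0.84042 x
INTERCEPT = 11.547   # a, in inches
SLOPE = 0.84042      # b, inches of height per inch of arm span


def predict_height(arm_span_in):
    """Predicted height (inches) from arm span (inches) using the fitted line."""
    return INTERCEPT + SLOPE * arm_span_in


if __name__ == "__main__":
    # Predicted height for a 76-inch arm span, roughly 75.4 inches
    print(round(predict_height(76), 2))
```

The prediction is only as trustworthy as the fit, which is why the answer first checks the residual plot and the range of observed arm spans.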

    15.3 (a) The observations are independent because they come from 13 unrelated colonies. (b) The scatterplot of the residuals against the percent returning (below on the left) shows no systematic deviations from the linear pattern. (c) The spread may be slightly wider in the middle, but not markedly so. (d) The histogram (below on the right) shows no outliers or strong skewness, so there are no clear deviations from Normality.

    [Figure, left: residual plot. x-axis: Percent return; y-axis: Residual.]

    [Figure, right: histogram of residuals. x-axis: Residual; y-axis: Count.]

    15.4 (a) The observations are independent because they come from 16 different individuals. (b) The scatterplot of the residuals against nonexercise activity (below on the left) shows no systematic deviations from the linear pattern. One residual, about 1.6, is slightly larger than the others, but this is nothing to get overly concerned about. (c) The spread is slightly higher for larger values of nonexercise activity, but not markedly so. (d) The histogram (below on the right) shows no outliers and a slight skewness to the right, but this does not suggest a lack of Normality.

    [Figure, left: residual plot. x-axis: Nonexercise activity (calories); y-axis: Residual.]

    [Figure, right: histogram of residuals. x-axis: Residual; y-axis: Count.]

     

    15.5 (a) The slope parameter β represents the change in the mean humerus length when femur length increases by 1 cm. (b) The estimate of β is b = 1.1969, and the estimate of α is a = −3.66. (c) The residuals are −0.8226, −0.3668, 3.0425, −0.9420, and −0.9110, and their sum is 0.0001 (essentially 0, apart from rounding error). The standard deviation σ is estimated by s = √(Σresid²/(n − 2)) = √(11.79/3) = 1.982.
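The arithmetic in part (c) can be checked with a short sketch. It uses only the five residuals listed in Exercise 15.5; the function name `regression_standard_error` is ours.

```python
import math

# Residuals listed in Exercise 15.5(c)
residuals = [-0.8226, -0.3668, 3.0425, -0.9420, -0.9110]


def regression_standard_error(resids):
    """s = sqrt(sum of squared residuals / (n - 2))."""
    n = len(resids)
    return math.sqrt(sum(e * e for e in resids) / (n - 2))


resid_sum = sum(residuals)                 # within rounding error of 0
s = regression_standard_error(residuals)   # close to the reported 1.982

if __name__ == "__main__":
    print(f"sum of residuals = {resid_sum:.4f}")
    print(f"s = {s:.3f}")
```

Dividing by n − 2 rather than n reflects the two parameters (intercept and slope) estimated from the data.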

    15.6 (a) The scatterplot (below) shows a strong, positive linear relationship between x = speed (feet/second) and y = steps (per second). The correlation is r = 0.999 and the least-squares regression line is ŷ = 1.76608 + 0.080284x. (b) The residuals (rounded to 4 decimal places) are 0.0106, −0.0013, −0.0010, −0.0110, −0.0093, 0.0031, and 0.0088, and their sum is −0.0001 (essentially 0, except for rounding error). (c) The estimate of α is a = 1.76608, the estimate of β is b = 0.080284, and the estimate of σ is s = 0.00915 (from s = √(Σresid²/(n − 2)) with Σresid² ≈ 0.00041 and n − 2 = 5; the rounded residuals listed above give about 0.0091).

    [Figure: scatterplot. x-axis: Speed (ft/s); y-axis: Steps (per second).]

     

    15.7 (a) The scatterplot below shows a strong, positive linear relationship. (b) The slope β gives this rate. The estimate of β is listed as the coefficient of “year” in the output, b = 9.31868 tenths of a millimeter per year. (c) We are not able to make an inference for the tilt rate from a simple linear regression model, because the observations are not independent.

    [Figure: scatterplot. x-axis: Year (coded as last two digits); y-axis: Lean (coded from 2.9 meters).]

     

    15.8 (a) The least-squares regression line is ŷ = 0.12049 + 0.008569x, where y = the proportion of perch killed and x = the number of perch. The fact that the slope is positive tells us that as the number of perch increases, the proportion being killed by bass also increases. (b) The regression standard error is s = 0.1886, which estimates the standard deviation σ. (c) Who? The individuals are kelp perch. What? The response variable is the proportion of perch killed and the explanatory variable is the number of perch available (or in the pen); both variables are quantitative. Why? The researcher was interested in examining the relationship between predators and available prey. When, where, how, and by whom? Todd Anderson published the data obtained from the ocean floor off the coast of southern California in 2001. Graphs: The scatterplot provided clearly shows that the proportion of perch killed increases as the number of perch increases. Numerical summaries: The mean proportions of perch killed are 0.175, 0.283, 0.425, and 0.646, in order from smallest to largest number of perch available. Model: The least-squares regression model is provided in part (a). Interpretation: The data clearly support the predator-prey principle provided. (Students will soon learn how to formally test this hypothesis.) (d) Using df = 16 − 2 = 14 and t* = 2.145, a 95% confidence interval for β is 0.008569 ± 2.145 × 0.002456 = (0.0033, 0.0138). We are 95% confident that the proportion of perch killed increases on average by between 0.0033 and 0.0138 for each additional perch added to the pen.
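The interval in part (d) has the form b ± t* × SE_b. A minimal sketch of that calculation follows, assuming the critical value t* is looked up in a t table as in the text; the helper name `slope_confidence_interval` is ours.

```python
def slope_confidence_interval(b, se_b, t_star):
    """Return (lower, upper) for the interval b ± t* · SE_b."""
    margin = t_star * se_b
    return b - margin, b + margin


# Values from Exercise 15.8(d): b, SE_b, and t* for df = 14 at 95% confidence
lower, upper = slope_confidence_interval(0.008569, 0.002456, 2.145)

if __name__ == "__main__":
    print(f"({lower:.4f}, {upper:.4f})")  # (0.0033, 0.0138)
```

The same helper reproduces the Exercise 15.9 interval when called with b = −3.0771, SE_b = 0.8498, and t* = 2.101.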

     

    15.9 The regression equation is ŷ = 560.65 − 3.0771x, where y = calories and x = time. The scatterplot with regression line (below) shows that the longer a child remains at the table, the fewer calories he or she will consume. The conditions for inference are satisfied. Using df = 18 and t* = 2.101, a 95% confidence interval for β is −3.0771 ± 2.101 × 0.8498 = (−4.8625, −1.2917). With 95% confidence, we estimate that for every extra minute a child sits at the table, he or she will consume an average of between 1.29 and 4.86 fewer calories during lunch.

    Inference for Regression 315


    [Figure: scatterplot with regression line. x-axis: Time (average number of minutes); y-axis: Calories (average number).]

     

    15.10 (a) Excel’s 95% confidence interval for β is (0.0033, 0.0138). This matches the confidence interval calculated in Exercise 15.8. We are 95% confident that the proportion of perch killed increases on average by between 0.0033 and 0.0138 for each additional perch added to the pen. (b) See Exercise 15.8 part (d) for a verification using the Minitab output. Using df = 16 − 2 = 14 and t* = 2.145 with the Excel output, a 95% confidence interval for β is 0.0086 ± 2.145 × 0.0025 = (0.0032, 0.0140). (c) Using df = 16 − 2 = 14 and t* = 1.761, a 90% confidence interval for β is 0.0086 ± 1.761 × 0.0025 = (0.0042, 0.0130).

    15.11 (a) The least-squares regression line from the S-PLUS output is ŷ = −3.6596 + 1.1969x, where y = humerus length and x = femur length. (b) The test statistic is t = b/SE_b = 1.1969/0.0751 = 15.9374. (c) The test statistic t has df = 5 − 2 = 3. The largest value in Table D is 12.92. Since 15.9374 > 12.92, we know that P-value < 0.0005. (d) There is very strong evidence that β > 0, that is, the line is useful for predicting the length of the humerus given the length of the femur. (e) Using df = 3 and t* = 5.841, a 99% confidence interval for β is 1.1969 ± 5.841 × 0.0751 = (0.7582, 1.6356). We are 99% confident that for every extra centimeter in femur length, the length of the humerus will increase on average between 0.7582 cm and 1.6356 cm.
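Parts (b) and (c) amount to computing t = b/SE_b and comparing it with the largest tabled critical value. A small sketch follows; the names `t_statistic` and `LARGEST_TABLE_VALUE` are ours, and 12.92 is simply the Table D value quoted in part (c).

```python
def t_statistic(b, se_b):
    """t = b / SE_b for testing H0: beta = 0."""
    return b / se_b


t = t_statistic(1.1969, 0.0751)      # values from Exercise 15.11(b)
LARGEST_TABLE_VALUE = 12.92          # df = 3 critical value quoted in part (c)
beyond_table = t > LARGEST_TABLE_VALUE

if __name__ == "__main__":
    print(f"t = {t:.4f}, exceeds largest table value: {beyond_table}")
```

When t exceeds the largest tabled value, the table can only bound the P-value from above, which is why the answer reports P-value < 0.0005 rather than an exact figure.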

     

    15.12 (a) The value of r² = 0.998 or 99.8% is very close to one (or 100%), which indicates an almost perfect linear association. (b) The slope parameter β gives this rate. Using df = 5 and t* = 4.032, a 99% confidence interval for β is 0.080284 ± 4.032 × 0.0016 = (0.0738, 0.0867). We are 99% confident that the rate at which steps per second increase as running speed increases by 1 ft/s is on average between 0.0738 and 0.0867.

    15.13 (a) The scatterplot (below) with regression line shows a strong, positive linear association between the number of jet skis in use (explanatory variable) and the number of accidents (response variable). (b) We want to test H0: β = 0 (there is no association between number of jet skis in use and number of accidents) versus Ha: β > 0 (there is a positive association between number of jet skis in use and number of accidents). (c) The conditions are independence, the mean number of accidents should have a linear relationship with the number of jet skis in use, the standard deviation should be the same for each number of jet skis in use, and the number of accidents should follow a Normal distribution. The conditions are satisfied except for having independent observations, so we will proceed with caution. (d) LinRegTTest reports that t = 21.079 with df = 8 and P-value is 0.000. With the earlier caveat, there is very strong evidence to reject H0 and conclude that there is a significant positive association between number of accidents and number of jet skis in use. As the number of jet skis in use increases, the number of accidents significantly increases. (e) Using df = 8 and t* = 2.896, a 98% confidence interval for β is 0.0048 ± 2.896 × 0.0002 = (0.0042, 0.0054). With 98% confidence, we estimate that for every extra thousand jet skis in use, the number of accidents increases by a mean of between 4.2 and 5.4 per year.

    [Figure: scatterplot with regression line. x-axis: Number of jet skis in use; y-axis: Number of accidents.]

     

    15.14 (a) We want to test H0: β = 0 (there is no association between yearly consumption of wine and deaths from heart disease) versus Ha: β < 0 (there is a negative association between yearly consumption of wine and deaths from heart disease). The data are obtained from different nations, so independence seems reasonable. The other conditions of constant variance, linear relationship and Normality are also satisfied. The test statistic is t = −22.969/3.557 = −6.46 with df = 17 and P-value < 0.0005. Since the P-value is smaller than any reasonable significance level, say 1%, we reject H0. We have very strong evidence of a significant negative association between the consumption of wine and deaths from heart disease. (b) Using df = 17 and t* = 2.110, a 95% confidence interval for β is −22.969 ± 2.110 × 3.557 = (−30.4743, −15.4637). With 95% confidence, we estimate that the number of deaths from heart disease (per 100,000 people) decreases on average between 15.46 and 30.47 for each additional liter of wine consumed (per person).

    [Figure: scatterplot. x-axis: Wine consumption (liters per person); y-axis: Deaths from heart disease (per 100,000 people).]

     


     

    15.15 (a) The scatterplot below shows a moderately strong, positive linear association between y = number of beetle larvae clusters and x = number of beaver-caused stumps. (b) The least-squares regression line is ŷ = −1.286 + 11.894x. r² = 83.9%, so regression on stump counts explains 83.9% of the variation in the number of beetle larvae. (c) We want to test H0: β = 0 versus Ha: β ≠ 0. The conditions for inference are met, and the test statistic is t = 10.47 with df = 21. The output shows P-value = 0.000, so we have very strong evidence that beaver stump counts help explain beetle larvae counts.

    [Figure: scatterplot. x-axis: Number of beaver-caused stumps; y-axis: Number of beetle larvae clusters.]

     

    15.16 (a) The mean of the standardized residuals is 0.00174 and the standard deviation is 1.014. Since the residuals are standardized, we expect the mean and standard deviation to be close to 0 and 1, respectively. (b) A stemplot is shown below on the left. The distribution is slightly skewed to the left, but this is not unusual for a small data set. There are no striking departures from Normality. For a standard Normal distribution, we would expect 95% of the observations to fall between −2.0 and 2.0. Thus, −1.99 is quite reasonable. (c) The residual plot on the right below shows no obvious patterns.

    Stem-and-leaf of Residuals  N = 23
    Leaf Unit = 0.10

      3   -1  965
      5   -1  30
      6   -0  7
     10   -0  4422
     (4)   0  0224
      9    0  56789
      4    1  2233

    [Figure: residual plot. x-axis: Number of beaver-caused stumps; y-axis: Standardized residuals.]
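The stemplot in Exercise 15.16 can be read back into approximate residual values as a sanity check on part (a). This is only a sketch: it assumes the rows are read as stem and leaves with Leaf Unit = 0.10 and that leaves are truncated to one decimal, so the results approximate, rather than reproduce, the reported 0.00174 and 1.014. The row list and the function name `stemplot_values` are ours.

```python
import statistics

# (stem, leaves) rows as read from the stemplot; stems "-1" and "-0" are negative.
rows = [
    ("-1", "965"), ("-1", "30"), ("-0", "7"), ("-0", "4422"),
    ("0", "0224"), ("0", "56789"), ("1", "2233"),
]


def stemplot_values(rows, leaf_unit=0.1):
    """Expand (stem, leaves) rows into approximate data values."""
    values = []
    for stem, leaves in rows:
        sign = -1 if stem.startswith("-") else 1
        base = abs(int(stem)) * 10          # stem expressed in leaf units
        for leaf in leaves:
            values.append(sign * (base + int(leaf)) * leaf_unit)
    return values


values = stemplot_values(rows)
mean = statistics.mean(values)
sd = statistics.stdev(values)   # sample standard deviation (n - 1 divisor)

if __name__ == "__main__":
    print(len(values), round(mean, 3), round(sd, 3))
```

Because truncation pulls every value toward zero, the standard deviation recovered this way comes out slightly below the reported one.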

     

    CASE CLOSED!

    (1) Descriptive statistics for x = number of three-point shots taken and y = percent made are shown below. The average number of three-point shots taken per game is 15.684 and the standard deviation is 2.865. The average percent of three-point shots made per game is 35.379 and the standard deviation is 1.425. The correlation is r = −0.958 and the scatterplot below shows a negative association between these two variables. Notice that the cluster of points in the bottom right corner shows some positive association, but the overall association between x and y is clearly negative.

    Variable   N    Mean  StDev  Minimum      Q1  Median      Q3  Maximum
    Taken     19  15.684  2.865    9.200  13.800  17.100  17.700   18.300
    Percent   19  35.379  1.425   34.100  34.400  34.600  36.200   38.400

    [Figure: scatterplot. x-axis: Number of 3-pointers taken; y-axis: Percent of 3-pointers made.]

    (2) The least-squares regression line is ŷ = 42.8477 − 0.4762x with r² = 0.917 or 91.7%. The linear model provides a reasonably good fit for these data. However, the residual plot shows a clear pattern with positive residuals for small and large numbers of 3-pointers taken and negative residuals in between the two extremes.

    (3) The point is tagged as being influential because it may have a considerable impact on the regression line. Influential points often pull the regression line in their direction so the residuals tend to be small for influential points.

    (4) We want to test H0: β = 0 versus Ha: β ≠ 0. Independence is reasonable because the data are from different seasons. The linear relationship condition is met, but the constant variance condition and the Normality condition are both questionable, so we will proceed with caution. A histogram of the percent made below shows that the distribution is skewed to the right. The test statistic is t = −13.7 with df = 17 and P-value = 0.000. We have very strong evidence of a significant association between the number of three-pointers taken and the percent made.

    [Figure: histogram. x-axis: Percent of three-pointers made; y-axis: Count.]

    (5) Using df = 17 and t* = 2.110, a 95% confidence interval for β is −0.4762 ± 2.110 × 0.03475 = (−0.5495, −0.4029). With 95% confidence, we estimate that for every additional three-pointer taken, the percent made will decrease on average by between 0.40 and 0.55 percentage points.

    15.17 Regression of fuel consumption on speed gives b = −0.01466, SE_b = 0.02334, and t = −0.63 with df = 13 and P-value = 0.541. Thus, we have no evidence to suggest a straight-line relationship between speed and fuel use. The scatterplot below shows a strong relationship between speed and fuel use, but the relationship is not linear. See Exercise 3.9 for more details.

    [Figure: scatterplot. x-axis: Speed (km/h); y-axis: Fuel consumption.]

     

    15.18 Repeated measurements of Sarah’s height are clearly not independent.

    15.19 (a) The slope β tells us the mean change in the percent of forest lost for a 1 unit (1 cent per pound) increase in the price of coffee. The estimate of β is b = 0.05525 and the estimate of α is a = −1.0134. (b) This says that the straight-line relationship described by the least-squares line is very strong. r² = 0.907 indicates that about 91% of the total variation in the percent of forest lost is accounted for by the straight-line relationship with prices paid to coffee growers. (c) The P-value refers to the two-sided alternative: H0: β = 0 versus Ha: β ≠ 0. The small P-value indicates that we have very strong evidence of a significant association between the percent of forest lost and the price paid for coffee. (d) The residuals are −0.0988, 0.3934, −0.2800, −0.2053, and 0.1907, and their sum is 0. The standard deviation σ is estimated by s = √(0.3215/3) = 0.3274. (e) A scatterplot (on the left) and a residual plot (on the right) are shown below. Even though the number of observations is small, there are no obvious problems with the linear regression model. Coffee price appears to be a very good predictor of forest lost for this range of values.

    [Figure, left: scatterplot. x-axis: Price (cents per pound); y-axis: Forest lost (percent).]

    [Figure, right: residual plot. x-axis: Price (cents per pound); y-axis: Residual.]

     

    15.20 (a) The scatterplot below, with the regression line ŷ = 70.436874 + 274.7821x, shows a moderate, positive, linear association. The linear relationship explains r² = 0.493 or 49.3% of the variation in gate velocity. (b) We want to test H0: β = 0 versus Ha: β ≠ 0. The test statistic is t = 274.7821/88.17712 = 3.1163 with df = 10 and P-value = 0.011. (Table C indicates that 0.01 < P-value < 0.02.) Since the P-value < 0.05, we reject H0 and conclude that there is a significant linear relationship between thickness and gate velocity. The regression formula might be used as a rule of thumb for new workers to follow, but the wide spread in the scatterplot below suggests that there may be other factors that should be taken into account in choosing the gate velocity.

    [Figure: scatterplot with regression line. x-axis: Cylinder wall thickness (inches); y-axis: Gate velocity (ft/sec).]
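The two-sided P-value quoted in Exercise 15.20(b) can be approximated without a table by integrating the t density numerically. The sketch below uses Simpson's rule and only the standard library; the function names `t_pdf` and `upper_tail` and the step count are our own choices, not anything from the text or its software output.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def upper_tail(t, df, steps=20000):
    """P(T > t) for t >= 0, via Simpson's rule on [0, t] (steps must be even)."""
    h = t / steps
    total = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    # Half the probability lies above 0; subtract the area between 0 and t.
    return 0.5 - total * h / 3


t = 274.7821 / 88.17712               # b / SE_b from Exercise 15.20(b)
p_two_sided = 2 * upper_tail(abs(t), df=10)

if __name__ == "__main__":
    print(f"t = {t:.4f}, two-sided P = {p_two_sided:.3f}")
```

The result should land between the table bounds 0.01 and 0.02 cited in the answer; doubling the upper tail is what makes the test two-sided.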

     

    15.21 (a) A scatterplot with the regression line is shown below. r² = 0.992 or 99.2%. (b) The estimates of α, β, and σ are a = −2.3948 cm, b = 0.1585 cm/min, and s = 0.8059 cm. (c) The least-squares regression line is ŷ = −2.3948 + 0.1585x, where y = length and x = time.

    [Figure: scatterplot with regression line. x-axis: Time (min); y-axis: Length (cm).]

     

    15.22 (a) A scatterplot with the least-squares regression line ŷ = 3.5051 − 0.0034x is shown below. We want to test H0: β = 0 versus Ha: β < 0. The test statistic is t = −4.64 with df = 14 and P-value < 0.0005. We have very strong evidence that people with higher NEA gain less fat. (b) To find this interval, we need SE_b, which is given in the Minitab output below as 0.0007414. Using df = 14 and t* = 1.761, a 90% confidence interval for β is −0.00344 ± 1.761 × 0.0007414 = (−0.0047, −0.0021).

    [Figure: scatterplot with regression line. x-axis: NEA change (cal); y-axis: Fat gain (kg).]

    The regression equation is
    Fat gain (kg) = 3.51 - 0.00344 NEA change (cal)

    Predictor               Coef    SE Coef      T      P
    Constant              3.5051     0.3036  11.54  0.000
    NEA change (cal)  -0.0034415  0.0007414  -4.64  0.000

    S = 0.739853   R-Sq = 60.6%   R-Sq(adj) = 57.8%

     

    15.23 (a) A scatterplot is shown below. There is a moderate, positive, linear association between investment returns in the U.S. and investment returns overseas. (b) The test statistic is t = b/SE_b = 0.6181/0.2369 = 2.6091 with df = 25 and 0.01 < P-value < 0.02. Thus, we have fairly strong evidence that there is a significant linear relationship between the two returns. That is, the slope is nonzero. (c) r² = 0.214 or 21.4%, so only 21.4% of the variation in the overseas returns is explained by using linear regression with U.S. returns as the explanatory variable. Using this linear regression model for prediction will not be very useful in practice.

    [Figure: scatterplot. x-axis: U.S. return (%); y-axis: Overseas return (%).]
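Exercise 15.23 shows a slope that is statistically significant even though r² is small. The two quantities are linked by a standard identity for simple linear regression, t = r√(n − 2)/√(1 − r²), which these solutions do not state but which can be used as a cross-check. The sketch below applies it to the reported r² values; the function name `t_from_r_squared` and the `slope_sign` argument are ours.

```python
import math


def t_from_r_squared(r_squared, df, slope_sign=1):
    """t statistic for the slope, from r^2 and df = n - 2.

    Uses t = r * sqrt(n - 2) / sqrt(1 - r^2), with r taking the slope's sign.
    """
    r = slope_sign * math.sqrt(r_squared)
    return r * math.sqrt(df) / math.sqrt(1 - r_squared)


t_1523 = t_from_r_squared(0.214, df=25)                  # Exercise 15.23
t_1525 = t_from_r_squared(0.209, df=14, slope_sign=-1)   # Exercise 15.25

if __name__ == "__main__":
    print(round(t_1523, 2), round(t_1525, 2))
```

Both values agree with the reported test statistics (2.6091 and −1.92) to within rounding of r², which illustrates that with enough observations a weak linear relationship can still produce a significant slope.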

     

    15.24 (a) The residual plot (below on the left) shows that the variability about the regression line increases as the U.S. return increases. (b) The histogram (below on the right) indicates that the distribution of the residuals is skewed to the right. The outlier is from 1986, when the overseas return was much higher than our regression model predicts.

    [Figure, left: residual plot. x-axis: U.S. return (%); y-axis: Residual.]

    [Figure, right: histogram of residuals. x-axis: Residual; y-axis: Count.]

     

    15.25 (a) The scatterplot below (on the left) shows a weak, negative association between corn yield and weeds. The least-squares regression line is ŷ = 166.483 − 1.0987x, where y = corn yield (bushels per acre) and x = weeds (per meter). r² = 0.209 or 20.9%, so the linear relationship explains about 20.9% of the variation in yield. (b) The t statistic for testing H0: β = 0 versus Ha: β < 0 is t = −1.92 with df = 14 and P-value = 0.0375. Since 0.0375 < 0.05, there is sufficient evidence to conclude that more weeds reduce corn yields. (c) The small number of observations for each value of the explanatory variable (weeds/meter), the large variability in those observations, and the small value of r² will make prediction with this model imprecise. A residual plot below (on the right) also shows that the linear model is quite imprecise.

    [Figure, left: scatterplot. x-axis: Weeds (per meter); y-axis: Corn yield (bushels per acre).]

    [Figure, right: residual plot. x-axis: Weeds per meter; y-axis: Residual.]

     


    15.26 Using df = 21 and t* = 1.721, a 90% confidence interval for β is −9.6949 ± 1.721 × 1.8887 = (−12.9454, −6.4444). With 90% confidence, we estimate that for each one minute increase in time (a slower, more leisurely swim) the professor’s pulse will drop on average between 6 and 13 beats per minute. There is a negative relationship between the professor’s swimming time and heart rate. A scatterplot is shown below.

    [Figure: scatterplot. x-axis: Time (in minutes); y-axis: Pulse (beats per minute).]

     
