Practice P26. a. This plot shows the data for Dying Dice.

[Figure: scatterplot of Population versus Roll Number]

b. The transformed data show a nice linear trend.

[Figure: scatterplot of ln(population) versus Roll Number]

c. The equation of the line in part b is

$$\ln \hat{y} = 5.22 - 0.435x$$

or

$$\ln(\text{pop}) = 5.22 - 0.435 \cdot \text{roll number}$$

Because $e^{-0.435} \approx 0.647$, the rate of dying is estimated to be $1 - 0.647$, or about 0.353, or 35%, per time period. This rate is close to the theoretical probability of dying, set up to be 1/3.

d. The residual plot shows some curvature, indicating a death rate of a little more than 0.35 in the early stages and a little less than 0.35 in the later stages. This kind of pattern would not be unusual in data on real animals.
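The step from the fitted slope to a death rate in part c can be sketched in a few lines of Python. The function name is illustrative only and not from the student book; the only input taken from the solution is the slope, -0.435.

```python
import math

def per_period_decay_rate(slope):
    """Decay rate implied by a fitted line ln(y) = intercept + slope * x.

    Each one-unit step in x multiplies y by e**slope, so the fraction
    lost per step is 1 - e**slope.
    """
    return 1.0 - math.exp(slope)

# Slope reported in part c for the Dying Dice data.
rate = per_period_decay_rate(-0.435)
print(f"estimated death rate per roll: {rate:.3f}")
print(f"theoretical death rate:        {1 / 3:.3f}")
```

The same function applies to any log-linear decay fit, since only the slope enters the rate.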

[Figure: residual plot, Residual versus Roll Number]

P27. The Florida population shows a definite nonlinear trend that could represent exponential growth.

[Figure: scatterplot of Population versus Year]

a. The log transformation transforms the pattern to a linear one that can be summarized by a straight line.

[Figure: scatterplot of ln(Population) versus Year]

b. The equation of the line is

$$\ln(\text{pop}) = -54.9342 + 0.03583 \cdot \text{year}$$

so that

$$\text{pop} = e^{-54.9382}\,(e^{0.03583})^{\text{year}} \approx e^{-54.9382}\,(1.036)^{\text{year}}$$

for a growth rate of 3.6% per year, which is a high rate of growth.

c. The residual plot shows a pattern. Florida grew less rapidly than the model predicts up to about 1845, then grew more rapidly than predicted, then less, then more. A big jump in growth occurred between 1950 and 1960. Then in 2000, there was a big drop in growth.

[Figure: residual plot, Residual versus Year]

142 Section 3.5 Solutions Statistics in Action Instructor’s Guide, Volume 1 © 2008 Key Curriculum Press

P28.

 x      y      log y
 2      1000   3
 1      100    2
 0      10     1
-1      1      0

The plot does indeed show a straight line.

[Figure: plot of log y versus x]

The equation of this line is $\log y = 1 + x$, so the slope is 1 and the y-intercept is 1.

P29. The tables and plots are shown here.

a.

x_a    y_a     log y_a
6      1000    3
4      100     2
2      10      1
0      1       0

[Figure: plot of log y_a versus x_a]

The equation of the line is $\log y_a = 0 + 0.5x_a$, so the slope is 0.5 and the y-intercept is 0.

b.

x_b    y_b      log y_b
5      0.0001   -4
6      0.01     -2
8      100       2

[Figure: plot of log y_b versus x_b]

The equation of the line is $\log y_b = -14 + 2x_b$, so the slope is 2 and the y-intercept is $-14$.

P30. If $\log y = c + dx$, then by rules of logarithms, $y = (10^c)(10)^{dx} = (10^c)(10^d)^x = ab^x$, where $a = 10^c$ and $b = 10^d$. For P28, $y = 10(10)^x$. For P29a, $y = 1(10^{0.5})^x \approx 3.16^x$. For P29b, $y = 10^{-14}(10^2)^x = 10^{-14} \cdot 100^x$.
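The conversion in P30 can be checked numerically. This is a minimal sketch, assuming base-10 logs as in P28 and P29; the function name is made up for illustration, and the (c, d) pairs are the intercepts and slopes found above.

```python
def log10_line_to_exponential(c, d):
    """Convert log10(y) = c + d*x into y = a * b**x and return (a, b)."""
    return 10.0 ** c, 10.0 ** d

# Intercept c and slope d for the lines in P28, P29a, and P29b.
lines = {"P28": (1, 1), "P29a": (0, 0.5), "P29b": (-14, 2)}

for label, (c, d) in lines.items():
    a, b = log10_line_to_exponential(c, d)
    print(f"{label}: y = {a:g} * {b:.4g}^x")

# Spot check against the P29b table: x = 8 should give y close to 100.
a_b, b_b = log10_line_to_exponential(-14, 2)
y_at_8 = a_b * b_b ** 8
```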

P31. Using a ln( flight length) transformation gives the following printout, which agrees with the first part of Display 3.98 in the student book.

The regression equation is
ln(FlightLength) = 1.57 + 0.0120 speed

Predictor   Coef        Stdev       t-ratio   p
Constant    1.5730      0.3281       4.79     0.000
Speed       0.0119958   0.0007396   16.22     0.000

s = 0.2691   R-sq = 89.5%   R-sq(adj) = 89.1%

Analysis of Variance

SOURCE       DF   SS       MS       F        p
Regression    1   19.058   19.058   263.09   0.000
Error        31    2.246    0.072
Total        32   21.304

The answer is, then:

$$\ln(\text{flight length}) = 1.57 + 0.012 \cdot \text{speed}$$

$$\text{flight length} = e^{1.57}\,e^{0.012 \cdot \text{speed}} \approx 4.807\,(1.012)^{\text{speed}}$$

P32. A plot of the data shows a cluster at the left with little trend and three points at the right that, as a group, are potentially influential. The points in the residual plot show curvature.

[Figure: scatterplot of Consumption (g) versus Trips, with residual plot]

Taking the natural log of the consumption straightens the residual plot. However, these transformed points don’t form an elliptical cloud. Some fishers’ families eat essentially no fish, even if the person fishes as many as 11 times a month.


[Figure: scatterplot of ln(Consumption) versus Trips, with residual plot]

The equation of the regression line is $\ln(\text{consumption}) = 0.39 + 0.143 \cdot \text{trips}$, with a correlation of about 0.69.

As it happens, removing the three points to the right has a small effect on the slope, increasing it from 0.143 to 0.177. However, the correlation drops considerably, from 0.69 to 0.37, with the regression line now passing between the two remaining clusters of points.

P33. The prediction is $\ln(\text{weight}) = 3.29 \ln(75) - 10.2 \approx 4.005$. Solving,

$$\ln(\text{weight}) = 4.005$$

$$\text{weight} = e^{4.005} \approx 54.9 \text{ pounds (or 54.8 with no rounding)}$$
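The arithmetic in P33 can be reproduced as follows. This is a sketch only: the coefficients 3.29 and -10.2 and the input 75 come from the solution, while the function and argument names are placeholders, because this excerpt does not name the predictor.

```python
import math

def predict_weight(x, slope=3.29, intercept=-10.2):
    """Back-transform the fit ln(weight) = slope * ln(x) + intercept."""
    ln_weight = slope * math.log(x) + intercept
    return math.exp(ln_weight)

weight = predict_weight(75)
print(f"predicted weight: {weight:.1f} pounds")
```

Carrying full precision through the exponent, rather than rounding it to 4.005 first, accounts for the small difference between the two answers quoted in the solution.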

P34. a. There is a strong positive curved relationship between depth and velocity. The curve is concave down with the rate of change in velocity decreasing as the depth increases.

b. The curve is not exponential, but could be a power function with a power less than 1. Taking the log of both variables results in this plot, shown with its residual plot. A reasonable model is $\ln(\text{velocity}) = 0.146 + 0.175 \ln(\text{depth})$. Solving for velocity gives $\text{velocity} = 1.157 \cdot \text{depth}^{0.175}$.

[Figure: Tidal Velocity scatterplot of ln(velocity) versus ln(depth) with fitted line lnVelocity = 0.175 lnDepth + 0.15, r² = 0.89, and residual plot]

P35. a. i. [Figure: plot of y_a versus x_a]

ii. The y-scale must be shrunk more for larger values of x than for smaller values of x. The cube root transformation will straighten them.

iii. [Figure: plot of y_a^(1/3) versus x_a]

b. i. [Figure: plot of y_b versus x_b]


ii. The y-scale must be shrunk more for smaller values of x than for larger values of x. The reciprocal transformation (power of $-1$) will work.

iii. [Figure: plot of y_b^(-1) versus x_b]

c. i. [Figure: plot of y_c versus x_c]

ii. The y-scale must be expanded more for larger values of x than for smaller values of x. A square transformation (power of 2) will straighten the points.

iii. [Figure: plot of y_c^2 versus x_c]

P36. For the points in P35a, the x-scale must be expanded. A cubic power transformation will straighten this relationship.

[Figure: plot of y_a versus x_a^3]

For the points in P35b, the x-scale must be shrunk a bit. A reciprocal transformation (power of $-1$) will do the trick.

[Figure: plot of y_b versus x_b^(-1)]

For the points in P35c, again, the x-scale must shrink. This time a square root transformation (power of 0.5) will straighten the points.

[Figure: plot of y_c versus x_c^0.5]

The exponents in P36 are the reciprocals of the exponents in P35.
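The reciprocal relationship between the P35 and P36 exponents can be illustrated numerically. The sketch below uses made-up points lying exactly on $y = x^3$ (not the data from the student book) and a hand-written correlation helper. It compares the raw correlation with the correlation after taking the cube root of y and after cubing x.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical points on y = x**3, standing in for a cubic pattern.
xs = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
ys = [x ** 3 for x in xs]

r_raw = pearson_r(xs, ys)                              # curved, so below 1
r_y_root = pearson_r(xs, [y ** (1 / 3) for y in ys])   # shrink y: power 1/3
r_x_cube = pearson_r([x ** 3 for x in xs], ys)         # expand x: power 3

print(round(r_raw, 4), round(r_y_root, 4), round(r_x_cube, 4))
```

Either transformation straightens the points; the power applied to x is the reciprocal of the power applied to y.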

P37. a. The area is proportional to the square of the radius ($y = \pi x^2$), so y would have to be raised to the power 1/2 (square root).

b. The volume is proportional to the cube of a side ($y = x^3$), so y would have to be raised to the power 1/3 (cube root).

c. The volume is proportional to the square of the diameter ($y = 8\pi (x/2)^2$ or $y = 2\pi x^2$), so y would have to be raised to the power 1/2 (square root).

P38. The following plots show the diameter plotted against the square root of age and the residuals from the regression line. The regression analysis is also provided. This transformation results in a scatterplot with less of a fan shape than the one for $\text{diameter}^2$ versus age. If you want to predict diameters along the full range of ages, this transformation will allow more even precision in the predictions.

[Figure: scatterplot of Diameter versus Sqrt(age) with fitted line Diameter = 1.47 · Sqrt(age) - 1.86, r² = 0.83, and residual plot]


Dependent variable is: Diameter
No Selector
R squared = 83.2%   R squared (adjusted) = 82.5%
s = 0.9121 with 27 - 2 = 25 degrees of freedom

Source       Sum of Squares   df   Mean Square   F-ratio
Regression   102.855           1   102.855       124
Residual      20.7969         25     0.831876

Variable   Coefficient   s.e. of Coeff   t-ratio   prob
Constant   -1.85727      0.6175          -3.01     0.0059
√Age        1.46516      0.1318          11.1      ≤ 0.0001

P39. a. The plot of the data suggests that you must expand the y-scale or shrink the x-scale, so the power transformation on x will have to be a power less than 1. Students may suggest a square root transformation on y because it has been successful in the past.

b. The log-log transformation yields a nearly linear plot.

[Figure: scatterplot of log(brain weight) versus log(body weight)]

c. The regression equation is $\log(\text{brain}) = 0.908 + 0.76 \log(\text{body})$, or $\text{brain} = 8.10\,(\text{body})^{0.76}$ (or 8.08 with no rounding). The slope of the line, 0.76, agrees with the insight that the x-scale must be transformed by a power less than 1.

The regression equation is
logBrain = 0.908 + 0.760 logBody

Predictor   Coef      Stdev     t-ratio   p
Constant    0.90754   0.04967   18.27     0.000
logBody     0.76020   0.03162   24.04     0.000

s = 0.3156   R-sq = 92.6%   R-sq(adj) = 92.5%

Analysis of Variance

SOURCE       DF   SS       MS       F        p
Regression    1   57.577   57.577   577.96   0.000
Error        46    4.583    0.100
Total        47   62.159

Exercises E55. The plot of the square root of flight length versus speed still retains obvious curvature; this transformation is less satisfactory than the log transformation.

[Figure: scatterplot of Sqrt(flight length) versus Speed (mi/h)]

E56. The plots of the data and the regression analysis are shown next.

[Figure: scatterplot of Population versus Roll Number; scatterplot of ln(population) versus Roll Number; residual plot, Residual versus Roll Number]

The equation of the regression line is $\ln y = 5.142 - 0.885x$. Because $e^{-0.885} \approx 0.413$, the estimated rate of decay is $1 - 0.413 = 0.587$. The curved, V-shaped residual plot shows that the rate of decay is greater than the estimated value during the first and last time periods and less than the estimated value over the middle time periods.

E57. The plot of the data shows curvature. Although the log-log transformation helps, it does not remove the curvature. (Neither will any other power transformation.) The residual plots suggest dividing the data into two groups, as the trends are more linear within each group. So a good way to model these data is to split them into two groups (perhaps with ages 2 through 8 in one group and 9 through 14 in the other). Then although the points in each group still have some curvature, the residuals are much smaller. The younger group has a weight gain per inch of height that is lower than the overall average, whereas the older group has a weight gain per inch of height that is higher than the average.


[Figure: Median Heights scatterplot of Weight (lb) versus Height (in.) with fitted line Weight = 2.76 Height - 80, r² = 0.96, and residual plot]

[Figure: scatterplot of ln(weight) versus ln(height), with residual plot]

Note on E58: Using the Fathom data file makes this question much easier. On the plot of cost/seat/mile versus flight length, highlight the points for one group. The same subjects will be highlighted on the other graphs and this question is more easily answered.

E58. The group on the right in the plot of cost/seat/mile versus flight length also happens to be the larger planes, judging from the numbers of seats. They use more fuel and are the planes with the highest flight speeds. (Refer to the scatterplot matrix of airline data for E8 on page 91 of this Instructor’s Guide.)

E59. a. There is a strong positive relationship between size of the hunting party and success rates. The plot looks fairly linear, but a look at the residual plot makes it apparent that there is some curvature in the data.

[Figure: scatterplot of Percent Successful versus Number of Chimps, with residual plot]

b. A line is not a bad fit here and would predict reasonably well for parties of 16 or fewer chimps because the residuals are small. However, a line is not the most appropriate model and would probably not work as well to predict for parties much more than 16 chimps strong.

c. A log-log transformation works pretty well. The model would be $\ln(\text{percent}) = 0.524 \ln(\text{chimps}) + 2.9575$, or $\text{percent} = 19.25 \cdot \text{chimps}^{0.524}$.

[Figure: scatterplot of ln(percent) versus ln(chimps), with residual plot]

Note that an exponential model fit using a log transformation does exactly the wrong thing, bending the plot in the wrong direction.

[Figure: scatterplot of ln(percent) versus Chimps]


d. The residual plot shown in part c shows a random scatter, which is good. There is more spread for the smaller hunting parties than for the larger ones, so a transformation that reduces this would be better.

E60. The data show a curved pattern of growth over time, which could perhaps be modeled as exponential growth as we often hear that population is growing exponentially.

[Figure: scatterplot of U.S. Population versus Year]

a. The next plot shows the growth in population for each decade. The increase in population wasn’t constant from decade to decade but increased in a linear way. When change in population grows linearly, the population grows quadratically. Thus, we can predict that an exponential model isn’t appropriate and that taking the square root of each population will linearize the original scatterplot.

[Figure: scatterplot of Population Growth versus Year]

b. Taking the log of the population overcompensates for the curve in the original data; the growth is not really exponential growth. A better transformation is the square root of the population. The plot of these data, along with the regression analysis and residual plot, is presented next. The residuals still have some pattern, as is expected for time series data, but it is not very pronounced. The regression equation is $\sqrt{\text{population}} = -138840 + 77.7 \cdot \text{year}$.

[Figure: scatterplot of Sqrt(U.S. population) versus Year with fitted line Sqrt(Pop) = 77.7 Year - 138800, r² = 1.00, and residual plot]

c. The pattern in the immigration data is quite cyclical, which is another common time series pattern. No simple power transformation will straighten this out. There is more than one “bend” in the data; power transformations only work well for a single bend.

[Figure: scatterplot of Immigration (thousands) versus Year]

E61. The scatterplot of the original data appears here.

[Figure: scatterplot of Price ($) versus Days in Advance]

Because you would expect the price to go up as flight time gets nearer, a reciprocal transformation of the price (or 1/price) might linearize data such as these. Actually, this transformation does a good job. The plot and residual plot are shown here.


[Figure: scatterplot of Reciprocal of Price versus Days in Advance, with residual plot]

The regression equation is $1/\text{price} = 0.00541 + 0.000039 \cdot \text{days}$. If you solve for price, you get

$$\text{price} = \frac{1}{0.00541 + 0.000039 \cdot \text{days}}$$

That is, the number of days affects the price very little according to this model. (As students will learn later, the slope of the regression equation is not significantly different from 0.)

You get a linear relationship with some negative trend by plotting (ln(days), ln(price)). First, substitute 1/2 day for 0 days before taking logs. This regression equation is $\ln(\text{price}) = 5.65 - 0.166 \ln(\text{days})$. The plot and residual plot are shown next. If you solve the equation for the price, you get $\text{price} = 284.29 \cdot \text{days}^{-0.166}$. Notice that for the range 10 days to 30 days, when most of the purchases were made, the range of prices is only from $161.64 to $193.98. Once again, days accounts for little of the variation in price. (And, again, the slope of the regression line is not statistically significant.)

[Figure: scatterplot of ln(price) versus ln(days), with residual plot]
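The price range quoted for purchases made 10 to 30 days ahead follows directly from the fitted log-log model. Here is a minimal check, using only the two reported coefficients (5.65 and -0.166); the function name is illustrative.

```python
import math

def predicted_price(days, intercept=5.65, slope=-0.166):
    """Back-transform the fit ln(price) = intercept + slope * ln(days)."""
    return math.exp(intercept + slope * math.log(days))

price_10 = predicted_price(10)
price_30 = predicted_price(30)
print(f"10 days ahead: ${price_10:.2f}")
print(f"30 days ahead: ${price_30:.2f}")
```

Tripling the lead time changes the predicted price by only about $32, which is the sense in which days accounts for little of the variation.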

Here’s the reason no transformation will give us a good model of predicting price from day: Five of the passengers paid a lot more for their tickets than did the other passengers, and they bought them 3, 4, 8, 9, and 9 days before the flight. (See the previous scatterplot.) But other passengers bought their tickets even closer to flight time and paid just about the same as passengers who bought their tickets months before. If the five passengers who paid extremely high prices are left out, the relationship is reasonably linear but flat. The correlation between days and price for the remaining passengers is 0.034, or practically nonexistent. Thus, the best model is to say that there is no relationship between the day these passengers bought their tickets and the price they paid, with the exception of five passengers who bought their tickets within 9 days of the flight and who paid more than double any other passenger.

E62. a. It appears that brain oxygen versus body mass could be modeled by exponential decay, but a quick check will show that a log transformation does little to straighten the plot. The log-log transformation does well, once again.

[Figure: scatterplot of Oxygen Use in Brain versus Body Mass (kg); scatterplot of ln(brain oxygen) versus ln(body mass)]

The equation of this regression line is

$$\ln(\text{brain oxygen}) = 3.26 - 0.07 \ln(\text{body mass})$$

which implies that ln(brain oxygen) decreases, on the average, by 0.07 units for every 1 unit increase in ln(body mass).


b. Using the log-log transformation, the relationship between lung oxygen consumption and body mass has a similar linear trend except for a couple of stray points.

[Figure: scatterplot of ln(lung oxygen) versus ln(body mass)]

The equation of this line is

$$\ln(\text{lung oxygen}) = 1.976 - 0.0951 \ln(\text{body mass})$$

which implies that ln(lung oxygen) decreases, on the average, by 0.0951 units for every 1 unit increase in ln(body mass).

c. If this theory is true and oxygen consumption depends on the relative size of the organ, then the lung oxygen consumption should decrease less rapidly than the brain oxygen consumption. But the data show that the lung oxygen consumption decreases more rapidly than that of the brain. There must be another explanation as to why the brain seems to use more oxygen, relative to its size, than do other organs.

E63. a. The scatterplot of these data shows a marked decrease in the birthrate as the GNP increases. The relationship is nonlinear but does not look like exponential decay.

[Figure: scatterplot of GNP versus Birthrate]

b. The log transformation works quite well here and gives a plot that seems appropriate for a regression line. The residual plot looks rather like random scatter and further supports this choice of a statistical model.

[Figure: scatterplot of log(GNP) versus Birthrate with fitted line log(GNP) = -0.0674 Birthrate + 1.87, r² = 0.60, and residual plot]

Dependent variable is: logGNP
No Selector
R squared = 59.7%   R squared (adjusted) = 58.0%
s = 0.4383 with 25 - 2 = 23 degrees of freedom

Source       Sum of Squares   df   Mean Square   F-ratio
Regression   6.54964           1   6.54964       34.1
Residual     4.41913          23   0.192136

Variable    Coefficient   s.e. of Coeff   t-ratio   prob
Constant     1.86903      0.2276           8.21     ≤ 0.0001
Birthrate   -0.067360     0.0115          -5.84     ≤ 0.0001

The regression equation is

$$\log(\text{GNP}) = 1.87 - 0.0674 \cdot \text{birthrate}$$

or

$$\text{GNP} = 10^{1.87}\,(10^{-0.0674 \cdot \text{birthrate}}) \approx 74.13 \cdot 0.856^{\text{birthrate}}$$

To interpret the slope and intercept of the model we must use the linear version on the log scale. log(GNP) decreases, on the average, 0.0674 units for every 1 unit increase in birthrate.

E64. a. The decreasing trend has a slight curvature, especially toward the later years, so perhaps a log transformation (exponential decay) will work. This would make the interpretation of the results quite easy.

[Figure: scatterplot of 18+ versus Year]


[Figure: scatterplot of ln(18+) versus Year, with residual plot]

The residual plot still has a good deal of curvature. Another look at the original plot shows that the curve seems to be approaching 21, not 0, as an exponential decay function would.

[Figure: scatterplot of 18+ versus Year]

An exponential model could be used by subtracting 21 from the percentage before taking the log. This graph shows the results.

[Figure: scatterplot of ln(18+ - 21) versus Year, with residual plot]
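Why subtracting the floor matters can be seen on synthetic data. The sketch below generates made-up percentages that decay toward 21 at the rate reported in this solution; the starting excess of 30 points and the five-year spacing are arbitrary assumptions, not the smoking data. It then compares the step-to-step changes in ln(y) with those in ln(y - 21).

```python
import math

FLOOR = 21.0       # asymptote suggested by the original plot
RATE = -0.05645    # slope quoted in the solution for ln(18+ - 21)
START = 30.0       # hypothetical starting excess above the floor

years = list(range(0, 40, 5))   # years since the start, arbitrary spacing
percent = [FLOOR + START * math.exp(RATE * t) for t in years]

def steps(values):
    """Successive differences of a sequence."""
    return [later - earlier for earlier, later in zip(values, values[1:])]

plain = steps([math.log(p) for p in percent])            # ln(y)
shifted = steps([math.log(p - FLOOR) for p in percent])  # ln(y - 21)

# Equal steps mean the points lie on a straight line.
print("spread of steps, ln(y):     ", max(plain) - min(plain))
print("spread of steps, ln(y - 21):", max(shifted) - min(shifted))
```

Only the shifted log produces equal steps, which is what a straight line on the transformed plot requires.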

This exponential decay model fits well and shows that the level of smoking above 21% is decreasing at a rate of about 5.5% per year because $e^{-0.05645} \approx 0.945$. (Remember, this rate of decrease is a percent of a percent because the original measurements are percentages.)

b. This plot poses a difficulty because the trend changes abruptly about 1991. One equation will not work.

[Figure: scatterplot of 18–24 versus Year, with residual plot]

The rate of smoking seems to decrease linearly until about 1990, then it begins increasing linearly. These plots show lines for the two parts of the plot separately.

[Figure: scatterplots of 18–24 versus Year with fitted lines and residual plots, one for 1965 to about 1990 and one for 1990 onward]

The regression equation for the first plot is $\text{percentage} = 2311 - 1.1492 \cdot \text{year}$, and for the second plot it is $\text{percentage} = -872.00 + 0.514 \cdot \text{year}$.

c. The pattern of decrease for the 65 and older category is much more linear; in fact, the log transformation will make things worse instead of better.


[Figure: scatterplot of 65+ versus Year]

The regression equation is $\text{percentage} = 1062 - 0.5255 \cdot \text{year}$.

Predictor   Coef       Stdev     t-ratio   p
Constant    1061.83    53.06      20.01    0.000
Year        -0.52550   0.02666   -19.71    0.000

s = 1.100   R-sq = 95.8%   R-sq(adj) = 95.6%

Analysis of Variance

SOURCE       DF   SS       MS       F        p
Regression    1   470.16   470.16   388.57   0.000
Error        17    20.57     1.21
Total        18   490.73

This nearly constant (linear) rate of decrease amounts to about half a percentage point per year.

E65. a. See the plots below. The amount of CO₂ is definitely increasing over the years, and the upward curvature makes it reasonable to suspect exponential growth. But note that the log transformation is not much help here.

[Figure: scatterplot of CO₂ versus Year with fitted line; scatterplot of ln(CO₂) versus Year]

b. The fitted line appears in the first plot in part a. The residual plot from the original data shows that CO₂ increased at a rate lower than the overall average from 1967 to about 1994 and at a higher rate than the overall average before 1967 and after 1994.

[Figure: residual plot, resid(CO₂) versus Year]

c. The pattern of the residuals suggests an abrupt change around 1976. A better way to model these data might be to use two straight lines with different slopes, one line covering the period from 1959 to about 1976 and the other from about 1977 to 2003. The first two plots below show the regression line and residual plots for years up to 1976 and for years after 1976, respectively. The third plot shows the two lines on the original plot.

[Figure: scatterplot of CO₂ versus Year for years up to 1976, with fitted line and residual plot]

[Figure: scatterplot of CO₂ versus Year for years after 1976, with fitted line and residual plot]

[Figure: scatterplot of CO₂ versus Year for all years, showing both fitted lines]


The regression equation for the first plot is $\text{CO}_2 = -1559.3 + 0.957 \cdot \text{year}$. For the second plot it is $\text{CO}_2 = -2776.7 + 1.573 \cdot \text{year}$.

Another possibility would be to recognize that although an exponential model has an asymptote at zero, the CO₂ level in the atmosphere was never near 0. One estimate is that pre-industrial levels of CO₂ in the atmosphere were around 250 ppm. We can adjust for this by taking the natural log of (CO₂ level − 250).

[Figure: scatterplot of Adjusted ln(CO₂) versus Year, with residual plot]

The regression equation for this plot is $\ln(\text{CO}_2 - 250) = -25.281 + 0.0150 \cdot \text{year}$. Once again, notice that the residual plots have an oscillating pattern typical of time series data.

d. The linear model gives an average increase of about 1.57 ppm CO₂ per year for years after 1976. Using the exponential model with an asymptote at 250 ppm, the amount of CO₂ in the atmosphere above 250 ppm is multiplied by $e^{0.0150} \approx 1.015$ each year, for a growth rate of about 1.5% per year.
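The adjusted model in parts c and d can be back-transformed to the original ppm scale. This sketch assumes the fitted line describes ln(CO₂ − 250), as the adjustment in part c indicates; the coefficients are the reported ones, and the function and constant names are illustrative.

```python
import math

BASELINE_PPM = 250.0   # pre-industrial estimate quoted in the solution
INTERCEPT = -25.281    # reported intercept of the adjusted fit
SLOPE = 0.0150         # reported slope of the adjusted fit

def predicted_co2(year):
    """CO2 in ppm from the fit ln(CO2 - 250) = INTERCEPT + SLOPE * year."""
    return BASELINE_PPM + math.exp(INTERCEPT + SLOPE * year)

# Only the excess above the baseline grows by a constant factor each year.
growth_factor = math.exp(SLOPE)
excess_ratio = (predicted_co2(2001) - BASELINE_PPM) / (predicted_co2(2000) - BASELINE_PPM)

print(f"yearly growth factor for the excess: {growth_factor:.4f}")
print(f"predicted CO2 in 2000: {predicted_co2(2000):.1f} ppm")
```

Note that the 1.5% rate applies to the amount above 250 ppm, not to the total concentration.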

E66. The plot of average SAT math score versus percentage taking exam shows a decreasing trend with a curvature. A log-log transformation straightens this out nicely, and the regression analysis of ln(SAT math score) versus ln(percentage taking exam) provides a good model for prediction, although the left end of this line is pulled down a bit by a couple of states that have both low percentages and relatively low scores.

[Figure: scatterplot of Score versus Percent]

[Figure: scatterplot of ln(score) versus ln(percent), with residual plot]

The complete regression analysis is shown here.

The regression equation is
lnScore = 6.46 - 0.0509 lnPercent

Predictor   Coef        Stdev      t-ratio   p
Constant     6.45586    0.01351    477.99    0.000
lnPercent   -0.050937   0.003967   -12.84    0.000

s = 0.02921   R-sq = 77.5%   R-sq(adj) = 77.0%

Analysis of Variance

SOURCE       DF   SS        MS        F        p
Regression    1   0.14072   0.14072   164.89   0.000
Error        48   0.04096   0.00085
Total        49   0.18168
