Practice P26. a. This plot shows the data for Dying Dice.
[Scatterplot: Population versus Roll Number]
b. The transformed data show a nice linear trend.
[Scatterplot: ln(population) versus Roll Number]
c. The equation of the line in part b is
$\ln y = 5.22 - 0.435x$
or
$\ln(\text{pop}) = 5.22 - 0.435 \cdot \text{roll number}$
Because $e^{-0.435} \approx 0.647$, the rate of dying is estimated to be $1 - 0.647$, or about 0.353, or 35%, per time period. This rate is close to the theoretical probability of dying, set up to be 1/3.
d. The residual plot shows some curvature, indicating a death rate of a little more than 0.35 in the early stages and a little less than 0.35 in the later stages. This kind of pattern would not be unusual in data on real animals.
[Residual plot: Residual versus Roll Number]
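As a numerical check of P26 part c, the slope on the log scale can be converted back to a per-roll survival factor and death rate. This is only a sketch; the variable names are illustrative, and the slope is the rounded value reported above.

```python
import math

slope = -0.435  # slope of ln(population) versus roll number, from part c

# Each roll multiplies the population by e**slope; the rest "die".
survival_factor = math.exp(slope)
death_rate = 1 - survival_factor

print(round(survival_factor, 3), round(death_rate, 3))
```

The same two lines apply to any exponential model fitted with a natural-log transformation: exponentiate the slope to get the multiplier per unit of x.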
P27. The Florida population shows a definite nonlinear trend that could represent exponential growth.
[Scatterplot: Population versus Year]
a. The log transformation transforms the pattern to a linear one that can be summarized by a straight line.
[Scatterplot: ln(Population) versus Year]
b. The equation of the line is
$\ln(\text{pop}) = -54.9342 + 0.03583 \cdot \text{year}$
so that
$\text{pop} = e^{-54.9382} (e^{0.03583})^{\text{year}} = e^{-54.9382} (1.036)^{\text{year}}$
for a growth rate of 3.6% per year, which is a high rate of growth.
c. The residual plot shows a pattern. Florida grew less rapidly than the model predicts up to about 1845, then grew more rapidly than predicted, then less, then more. A big jump in growth occurred between 1950 and 1960. Then in 2000, there was a big drop in growth.
[Residual plot: Residual versus Year]
142 Section 3.5 Solutions Statistics in Action Instructor’s Guide, Volume 1 © 2008 Key Curriculum Press
P28.
x     y      log y
2     1000   3
1     100    2
0     10     1
−1    1      0
The plot does indeed show a straight line.
[Scatterplot: log y versus x]
The equation of this line is $\log y = 1 + x$, so the slope is 1 and the y-intercept is 1.
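The line for P28 can be confirmed by fitting a least-squares line to (x, log y) directly. The sketch below uses only the four table values and the standard slope and intercept formulas; the variable names are illustrative.

```python
import math

x_values = [2, 1, 0, -1]
y_values = [1000, 100, 10, 1]
log_y = [math.log10(y) for y in y_values]

n = len(x_values)
mean_x = sum(x_values) / n
mean_log_y = sum(log_y) / n

# Least-squares slope and intercept for log y on x.
slope = (sum((x - mean_x) * (ly - mean_log_y) for x, ly in zip(x_values, log_y))
         / sum((x - mean_x) ** 2 for x in x_values))
intercept = mean_log_y - slope * mean_x

print(round(slope, 6), round(intercept, 6))
```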
P29. The tables and plots are shown here.
a.
x_a   y_a    log y_a
6     1000   3
4     100    2
2     10     1
0     1      0
[Scatterplot: log y_a versus x_a]
The equation of the line is $\log y_a = 0 + 0.5x_a$, so the slope is 0.5 and the y-intercept is 0.
b.
x_b   y_b      log y_b
5     0.0001   −4
6     0.01     −2
8     100      2
[Scatterplot: log y_b versus x_b]
The equation of the line is $\log y_b = -14 + 2x_b$, so the slope is 2 and the y-intercept is −14.
P30. If $\log y = c + dx$, then by rules of logarithms, $y = (10^c)(10)^{dx} = (10^c)(10^d)^x = ab^x$, where $a = 10^c$ and $b = 10^d$. For P28, $y = 10(10)^x$. For P29a, $y = 1(10^{0.5})^x \approx 3.16^x$. For P29b, $y = 10^{-14}(10^2)^x = 10^{-14} \cdot 100^x$.
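The conversion in P30 can be sketched as a small helper that turns the intercept and slope of a base-10 log line into the constants of the exponential form. The function name is hypothetical; the three (c, d) pairs are the ones found in P28 and P29.

```python
def log10_line_to_exponential(c, d):
    """Return (a, b) such that log10(y) = c + d*x is equivalent to y = a * b**x."""
    return 10 ** c, 10 ** d

for label, c, d in [("P28", 1, 1), ("P29a", 0, 0.5), ("P29b", -14, 2)]:
    a, b = log10_line_to_exponential(c, d)
    print(label, "a =", a, "b =", round(b, 2))
```

Evaluating `a * b**x` at any x in the tables should reproduce the listed y value, which is a quick way to catch a sign error in c or d.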
P31. Using a ln( flight length) transformation gives the following printout, which agrees with the first part of Display 3.98 in the student book.
The regression equation is
ln(FlightLength) = 1.57 + 0.0120 speed

Predictor   Coef        Stdev       t-ratio   p
Constant    1.5730      0.3281      4.79      0.000
Speed       0.0119958   0.0007396   16.22     0.000

s = 0.2691   R-sq = 89.5%   R-sq(adj) = 89.1%

Analysis of Variance

SOURCE       DF   SS       MS       F        p
Regression   1    19.058   19.058   263.09   0.000
Error        31   2.246    0.072
Total        32   21.304

The answer is, then:
$\ln(\text{flight length}) = 1.57 + 0.012 \cdot \text{speed}$
$\text{flight length} = e^{1.57} e^{0.012 \cdot \text{speed}} \approx 4.807(1.012)^{\text{speed}}$
P32. A plot of the data shows a cluster at the left with little trend and three points at the right that, as a group, are potentially influential. The points in the residual plot show curvature.
[Scatterplot: Consumption (g) versus Trips, with residual plot]
Taking the natural log of the consumption straightens the residual plot. However, these transformed points don’t form an elliptical cloud. Some fishers’ families eat essentially no fish, even if the person fishes as many as 11 times a month.
[Scatterplot: ln(Consumption) versus Trips, with residual plot]
The equation of the regression line is $\ln(\text{consumption}) = 0.39 + 0.143 \cdot \text{trips}$, with a correlation of about 0.69.
As it happens, removing the three points to the right has a small effect on the slope, increasing it from 0.143 to 0.177. However, the correlation drops considerably, from 0.69 to 0.37, with the regression line now passing between the two remaining clusters of points.
P33. The prediction is $\ln(\text{weight}) = 3.29 \ln(75) - 10.2 \approx 4.005$. Solving,
$\ln(\text{weight}) = 4.005$
$\text{weight} = e^{4.005} \approx 54.9$ pounds (or 54.8 with no rounding)
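The arithmetic in P33 can be reproduced without intermediate rounding, which shows where the alternative answer of 54.8 comes from. A minimal sketch, using only the coefficients quoted above:

```python
import math

# ln(weight) = 3.29 * ln(75) - 10.2, using the coefficients quoted in P33
ln_weight = 3.29 * math.log(75) - 10.2
weight = math.exp(ln_weight)

print(round(ln_weight, 3), round(weight, 1))
```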
P34. a. There is a strong positive curved relationship between depth and velocity. The curve is concave down with the rate of change in velocity decreasing as the depth increases.
b. The curve is not exponential, but could be a power function with a power less than 1. Taking the log of both variables results in this plot, shown with its residual plot. A reasonable model is $\ln(\text{velocity}) = 0.146 + 0.175 \ln(\text{depth})$. Solving for velocity gives $\text{velocity} = 1.157 \cdot \text{depth}^{0.175}$.
[Tidal Velocity scatterplot: ln(velocity) versus ln(depth), with fitted line lnVelocity = 0.175 lnDepth + 0.15, r² = 0.89, and residual plot]
P35. a. i. [Scatterplot: y_a versus x_a]
ii. The y-scale must be shrunk more for larger values of x than for smaller values of x. The cube root transformation will straighten them.
iii. [Scatterplot: y_a^(1/3) versus x_a]
b. i. [Scatterplot: y_b versus x_b]
ii. The y-scale must be shrunk more for smaller values of x than for larger values of x. The reciprocal transformation (power of −1) will work.
iii. [Scatterplot: y_b^(−1) versus x_b]
c. i. [Scatterplot: y_c versus x_c]
ii. The y-scale must be expanded more for larger values of x than for smaller values of x. A square transformation (power of 2) will straighten the points.
iii. [Scatterplot: y_c^2 versus x_c]
P36. For the points in P35a, the x-scale must be expanded. A cubic power transformation will straighten this relationship.
[Scatterplot: y_a versus x_a^3]
For the points in P35b, the x-scale must be shrunk a bit. A reciprocal transformation (power of −1) will do the trick.
[Scatterplot: y_b versus x_b^(−1)]
For the points in P35c, again, the x-scale must shrink. This time a square root transformation (power of 0.5) will straighten the points.
[Scatterplot: y_c versus x_c^(0.5)]
The exponents in P36 are the reciprocals of the exponents in P35.
P37. a. The area is proportional to the square of the radius ($y = \pi x^2$), so y would have to be raised to the power $\frac{1}{2}$ (square root).
b. The volume is proportional to the cube of a side ($y = x^3$), so y would have to be raised to the power $\frac{1}{3}$ (cube root).
c. The volume is proportional to the square of the diameter ($y = 8\pi (x/2)^2$ or $y = 2\pi x^2$), so y would have to be raised to the power $\frac{1}{2}$ (square root).
P38. The following plots show the diameter plotted against the square root of age and the residuals from the regression line. The regression analysis is also provided. This transformation results in a scatterplot with less of a fan shape than the one for diameter² versus age. If you want to predict diameters along the full range of ages, this transformation will allow more even precision in the predictions.
[Scatterplot: Diameter versus Sqrt(age), with fitted line Diameter = 1.47 · Sqrt(age) − 1.86, r² = 0.83, and residual plot]
Dependent variable is: Diameter
No Selector
R squared = 83.2%   R squared (adjusted) = 82.5%
s = 0.9121 with 27 − 2 = 25 degrees of freedom

Source       Sum of Squares   df   Mean Square   F-ratio
Regression   102.855          1    102.855       124
Residual     20.7969          25   0.831876

Variable   Coefficient   s.e. of Coeff   t-ratio   prob
Constant   −1.85727      0.6175          −3.01     0.0059
√Age       1.46516       0.1318          11.1      ≤ 0.0001
P39. a. The plot of the data suggests that you must expand the y-scale or shrink the x-scale, so the power transformation on x will have to be a power less than 1. Students may suggest a square root transformation on y because it has been successful in the past.
b. The log-log transformation yields a nearly linear plot.
[Scatterplot: log(brain weight) versus log(body weight)]
c. The regression equation is $\log(\text{brain}) = 0.908 + 0.76 \log(\text{body})$, or $\text{brain} = 8.10 (\text{body})^{0.76}$ (or 8.08 with no rounding). The slope of the line, 0.76, agrees with the insight that the x-scale must be transformed by a power less than 1.
The regression equation is
logBrain = 0.908 + 0.760 logBody

Predictor   Coef      Stdev     t-ratio   p
Constant    0.90754   0.04967   18.27     0.000
logBody     0.76020   0.03162   24.04     0.000

s = 0.3156   R-sq = 92.6%   R-sq(adj) = 92.5%

Analysis of Variance

SOURCE       DF   SS       MS       F        p
Regression   1    57.577   57.577   577.96   0.000
Error        46   4.583    0.100
Total        47   62.159
Exercises
E55. The plot of the square root of flight length versus speed still retains obvious curvature; this transformation is less satisfactory than the log transformation.
[Scatterplot: Sqrt(flight length) versus Speed (mi/h)]
E56. The plots of the data and the regression analysis are shown next.
[Scatterplot: Population versus Roll Number]
[Scatterplot: ln(population) versus Roll Number]
[Residual plot: Residual versus Roll Number]
The equation of the regression line is $\ln y = 5.142 - 0.885x$. Because $e^{-0.885} \approx 0.413$, the estimated rate of decay is $1 - 0.413 = 0.587$. The curved, V-shaped residual plot shows that the rate of decay is greater than the estimated value during the first and last time periods and less than the estimated value over the middle time periods.
E57. The plot of the data shows curvature. Although the log-log transformation helps, it does not remove the curvature. (Neither will any other power transformation.) The residual plots suggest dividing the data into two groups, as the trends are more linear within each group. So a good way to model these data is to split them into two groups (perhaps with ages 2 through 8 in one group and 9 through 14 in the other). Then although the points in each group still have some curvature, the residuals are much smaller. The younger group has a weight gain per inch of height that is lower than the overall average, whereas the older group has a weight gain per inch of height that is higher than the average.
[Median Heights scatterplot: Weight (lb) versus Height (in.), with fitted line Weight = 2.76 Height − 80, r² = 0.96, and residual plot]
[Scatterplot: ln(weight) versus ln(height), with residual plot]
Note on E58: Using the Fathom data file makes this question much easier. On the plot of cost/seat/mile versus flight length, highlight the points for one group. The same subjects will be highlighted on the other graphs and this question is more easily answered.
E58. The group on the right in the plot of cost/seat/mile versus flight length also happens to be the larger planes, judging from the numbers of seats, and they use more fuel and are the planes with the highest flight speeds. (Refer to the scatterplot matrix of airline data for E8 on page 91 of this Instructor’s Guide.)
E59. a. There is a strong positive relationship between size of the hunting party and success rates. The plot looks fairly linear, but a look at the residual plot makes it apparent that there is some curvature in the data.
[Scatterplot: Percent Successful versus Number of Chimps, with residual plot]
b. A line is not a bad fit here and would predict reasonably well for parties of 16 or fewer chimps because the residuals are small. However, a line is not the most appropriate model and would probably not work as well to predict for parties much more than 16 chimps strong.
c. A log-log transformation works pretty well. The model would be $\ln(\text{percent}) = 0.524 \ln(\text{chimps}) + 2.9575$, or $\text{percent} = 19.25 \cdot \text{chimps}^{0.524}$.
[Scatterplot: ln(percent) versus ln(chimps), with residual plot]
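The step from the log-log line in E59 part c to the power model can be checked numerically. The sketch below back-transforms the intercept and confirms that the power form and the log-scale form give the same predictions; the function name and the party sizes tried are illustrative only.

```python
import math

# ln(percent) = slope * ln(chimps) + intercept, from part c
intercept, slope = 2.9575, 0.524

# Back-transform: percent = e**intercept * chimps**slope
coefficient = math.exp(intercept)

def predicted_percent(chimps):
    return coefficient * chimps ** slope

for chimps in (1, 4, 16):
    on_log_scale = math.exp(intercept + slope * math.log(chimps))
    print(chimps, round(predicted_percent(chimps), 1), round(on_log_scale, 1))
```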
Note that an exponential model fit using a log transformation does exactly the wrong thing, bending the plot in the wrong direction.
[Scatterplot: ln(percent) versus Chimps]
d. The residual plot shown in part c shows a random scatter, which is good. There is more spread for the smaller hunting parties than for the larger ones, so a transformation that reduces this would be better.
E60. The data show a curved pattern of growth over time, which could perhaps be modeled as exponential growth as we often hear that population is growing exponentially.
[Scatterplot: U.S. Population versus Year]
a. The next plot shows the growth in population for each decade. The increase in population wasn’t constant from decade to decade but increased in a linear way. When change in population grows linearly, the population grows quadratically. Thus, we can predict that an exponential model isn’t appropriate and that taking the square root of each population will linearize the original scatterplot.
[Scatterplot: Population Growth versus Year]
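The reasoning in E60 part a (linear growth in the increments implies quadratic growth in the total, which a square root straightens) can be illustrated with made-up numbers. The sketch below uses a hypothetical population that is a perfect square of a linear function of time, which is the special case where the square root is exactly linear; it is not the census data.

```python
import math

# Hypothetical population whose square root is exactly linear in time.
populations = [(3 + 2 * t) ** 2 for t in range(8)]

# Period-to-period growth: first differences of the population.
growth = [later - earlier for earlier, later in zip(populations, populations[1:])]

# Growth that increases linearly has constant second differences.
growth_steps = [later - earlier for earlier, later in zip(growth, growth[1:])]

# The square roots rise by the same amount every period, so they fall on a line.
root_steps = [math.sqrt(later) - math.sqrt(earlier)
              for earlier, later in zip(populations, populations[1:])]

print(growth)
print(growth_steps)
print(root_steps)
```

For a general quadratic the square root is only approximately linear, which is consistent with the mild pattern left in the residuals.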
b. Taking the log of the population overcompensates for the curve in the original data; the growth is not really exponential growth. A better transformation is the square root of the population. The plot of these data, along with the regression analysis and residual plot, is presented next. The residuals still have some pattern, as is expected for time series data, but it is not very pronounced. The regression equation is $\sqrt{\text{population}} = -138840 + 77.7 \cdot \text{year}$.
[Scatterplot: Sqrt(U.S. population) versus Year, with fitted line Sqrt(Pop) = 77.7 Year − 138800, r² = 1.00, and residual plot]
c. The pattern in the immigration data is quite cyclical, which is another common time series pattern. No simple power transformation will straighten this out. There is more than one “bend” in the data; power transformations only work well for a single bend.
[Scatterplot: Immigration (thousands) versus Year]
E61. The scatterplot of the original data appears here.
[Scatterplot: Price ($) versus Days in Advance]
Because you would expect the price to go up as flight time gets nearer, a reciprocal transformation of the price (or 1/price) might linearize data such as these. Actually, this transformation does a good job. The plot and residual plot are shown here.
[Scatterplot: Reciprocal of Price versus Days in Advance, with residual plot]
The regression equation is $\frac{1}{\text{price}} = 0.00541 + 0.000039 \cdot \text{days}$. If you solve for price, you get
$\text{price} = \frac{1}{0.00541 + 0.000039 \cdot \text{days}}$
That is, the number of days affects the price very little according to this model. (As students will learn later, the slope of the regression equation is not significantly different from 0.)
You get a linear relationship with some negative trend by plotting (ln(days), ln(price)). First, substitute $\frac{1}{2}$ day for 0 days before taking logs. This regression equation is $\ln(\text{price}) = 5.65 - 0.166 \ln(\text{days})$. The plot and residual plot are shown next. If you solve the equation for the price, you get $\text{price} = 284.29 \cdot \text{days}^{-0.166}$. Notice that for the range 10 days to 30 days, when most of the purchases were made, the range of prices is only from $161.64 to $193.98. Once again, days accounts for little of the variation in price. (And, again, the slope of the regression line is not statistically significant.)
[Scatterplot: ln(price) versus ln(days), with residual plot]
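The price range quoted for the log-log model in E61 follows directly from the back-transformed equation. A short sketch, with a hypothetical function name, evaluates it at 10 and 30 days:

```python
def predicted_price(days):
    """Back-transformed log-log model: price = 284.29 * days**(-0.166)."""
    return 284.29 * days ** -0.166

for days in (10, 30):
    print(days, round(predicted_price(days), 2))
```

Tripling the lead time changes the predicted price by only about $32, which is the sense in which days accounts for little of the variation.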
Here’s the reason no transformation will give us a good model of predicting price from day: Five of the passengers paid a lot more for their tickets than did the other passengers, and they bought them 3, 4, 8, 9, and 9 days before the flight. (See the previous scatterplot.) But other passengers bought their tickets even closer to flight time and paid just about the same as passengers who bought their tickets months before. If the five passengers who paid extremely high prices are left out, the relationship is reasonably linear but flat. The correlation between days and price for the remaining passengers is 0.034, or practically nonexistent. Thus, the best model is to say that there is no relationship between the day these passengers bought their tickets and the price they paid, with the exception of five passengers who bought their tickets within 9 days of the flight and who paid more than double any other passenger.
E62. a. It appears that brain oxygen versus body mass could be modeled by exponential decay, but a quick check will show that a log transformation does little to straighten the plot. The log-log transformation does well, once again.
[Scatterplot: Oxygen Use in Brain versus Body Mass (kg)]
[Scatterplot: ln(brain oxygen) versus ln(body mass)]
The equation of this regression line is
$\ln(\text{brain oxygen}) = 3.26 - 0.07 \ln(\text{body mass})$
which implies that ln(brain oxygen) decreases, on the average, by 0.07 units for every 1 unit increase in ln(body mass).
b. Using the log-log transformation, the relationship between lung oxygen consumption and body mass has a similar linear trend except for a couple of stray points.
[Scatterplot: ln(lung oxygen) versus ln(body mass)]
The equation of this line is
$\ln(\text{lung oxygen}) = 1.976 - 0.0951 \ln(\text{body mass})$
which implies that ln(lung oxygen) decreases, on the average, by 0.0951 units for every 1 unit increase in ln(body mass).
c. If this theory is true and oxygen consumption depends on the relative size of the organ, then the lung oxygen consumption should decrease less rapidly than the brain oxygen consumption. But the data show that the lung oxygen consumption decreases more rapidly than that of the brain. There must be another explanation as to why the brain seems to use more oxygen, relative to its size, than do other organs.
E63. a. The scatterplot of these data shows a marked decrease in the birthrate as the GNP increases. The relationship is nonlinear but does not look like exponential decay.
[Scatterplot: GNP versus Birthrate]
b. The log transformation works quite well here and gives a plot that seems appropriate for a regression line. The residual plot looks rather like random scatter and further supports this choice of a statistical model.
[Scatterplot: log(GNP) versus Birthrate, with fitted line log(GNP) = −0.0674 Birthrate + 1.87, r² = 0.60, and residual plot]
Dependent variable is: logGNP
No Selector
R squared = 59.7%   R squared (adjusted) = 58.0%
s = 0.4383 with 25 − 2 = 23 degrees of freedom

Source       Sum of Squares   df   Mean Square   F-ratio
Regression   6.54964          1    6.54964       34.1
Residual     4.41913          23   0.192136

Variable    Coefficient   s.e. of Coeff   t-ratio   prob
Constant    1.86903       0.2276          8.21      ≤ 0.0001
Birthrate   −0.067360     0.0115          −5.84     ≤ 0.0001
The regression equation is
$\log(\text{GNP}) = 1.87 - 0.0674 \cdot \text{birthrate}$
or
$\text{GNP} = 10^{1.87} (10^{-0.0674 \cdot \text{birthrate}}) \approx 74.13 \cdot 0.856^{\text{birthrate}}$
To interpret the slope and intercept of the model we must use the linear version on the log scale. log(GNP) decreases, on the average, 0.0674 units for every 1 unit increase in birthrate.
E64. a. The decreasing trend has a slight curvature, especially toward the later years, so perhaps a log transformation (exponential decay) will work. This would make the interpretation of the results quite easy.
[Scatterplot: 18+ versus Year]
[Scatterplot: ln(18+) versus Year, with residual plot]
The residual plot still has a good deal of curvature. Another look at the original plot shows that the curve seems to be approaching 21, not 0, as an exponential decay function would.
[Scatterplot: 18+ versus Year]
An exponential model could be used by subtracting 21 from the percentage before taking the log. This graph shows the results.
[Scatterplot: ln(18+ − 21) versus Year, with residual plot]
This exponential decay model fits well and shows that the level of smoking above 21% is decreasing at a rate of about 5.5% per year because $e^{-0.05645} \approx 0.945$. (Remember, this rate of decrease is a percent of a percent because the original measurements are percentages.)
b. This plot poses a difficulty because the trend changes abruptly about 1991. One equation will not work.
[Scatterplot: 18–24 versus Year, with residual plot]
The rate of smoking seems to decrease linearly until about 1990, then it begins increasing linearly. These plots show lines for the two parts of the plot separately.
[Scatterplots: 18–24 versus Year for 1965–1995 and for 1990–2002, each with its fitted line and residual plot]
The regression equation for the first plot is $\text{percentage} = 2311 - 1.1492 \cdot \text{year}$, and for the second plot it is $\text{percentage} = -872.00 + 0.514 \cdot \text{year}$.
c. The pattern of decrease for the 65 and older category is much more linear; in fact, the log transformation will make things worse instead of better.
[Scatterplot: 65+ versus Year]
The regression equation is percentage = 1062 − 0.5255 year.

Predictor   Coef       Stdev     t-ratio   p
Constant    1061.83    53.06     20.01     0.000
Year        −0.52550   0.02666   −19.71    0.000

s = 1.100   R-sq = 95.8%   R-sq(adj) = 95.6%

Analysis of Variance

SOURCE       DF   SS       MS       F        p
Regression   1    470.16   470.16   388.57   0.000
Error        17   20.57    1.21
Total        18   490.73
This nearly constant (linear) rate of decrease amounts to about half a percentage point per year.
E65. a. See the plots below. The amount of CO₂ is definitely increasing over the years, and the upward curvature makes it reasonable to suspect exponential growth. But note that the log transformation is not much help here.
[Scatterplot: CO₂ versus Year]
[Scatterplot: ln(CO₂) versus Year]
b. The fitted line appears in the first plot in part a. The residual plot from the original data shows that CO₂ increased at a rate lower than the overall average from 1967 to about 1994 and at a higher rate than the overall average before 1967 and after 1994.
[Residual plot: resid(CO₂) versus Year]
c. The pattern of the residuals suggests an abrupt change around 1976. A better way to model these data might be to use two straight lines with different slopes, one line covering the period from 1959 to about 1976 and the other from about 1977 to 2003. The first two plots below show the regression line and residual plots for years up to 1976 and for years after 1976, respectively. The third plot shows the two lines on the original plot.
[Scatterplots: CO₂ versus Year for 1958–1978 and for 1975–2005, each with its regression line and residual plot]
[Scatterplot: CO₂ versus Year, 1955–2005, with both fitted lines]
The regression equation for the first plot is $\text{CO}_2 = -1559.3 + 0.957 \cdot \text{year}$. For the second plot it is $\text{CO}_2 = -2776.7 + 1.573 \cdot \text{year}$. Another possibility would be to recognize that although an exponential model has an asymptote at zero, the CO₂ level in the atmosphere was never near 0. One estimate is that pre-industrial levels of CO₂ in the atmosphere were around 250 ppm. We can adjust for this by taking the natural log of (CO₂ level − 250).
[Scatterplot: Adjusted ln(CO₂) versus Year, with residual plot]
The regression equation for this plot is $\ln(\text{CO}_2 - 250) = -25.281 + 0.0150 \cdot \text{year}$. Once again, notice that the residual plots have an oscillating pattern typical of time series data.
d. The linear model gives an average increase of about 1.57 ppm CO₂ per year for years after 1976. Using the exponential model with an asymptote at 250 ppm, the amount of CO₂ in the atmosphere above 250 ppm is multiplied by $e^{0.0150} \approx 1.015$ each year, for a growth rate of about 1.5% per year.
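The shifted exponential model in E65 can be sketched in a few lines: add the 250 ppm baseline back after exponentiating, and read the yearly growth factor off the slope. The function and variable names are illustrative; the coefficients are the rounded values quoted above, so predictions are approximate.

```python
import math

baseline = 250                        # assumed pre-industrial level, in ppm
intercept, slope = -25.281, 0.0150    # ln(CO2 - 250) = intercept + slope * year

def predicted_co2(year):
    """Back-transform the adjusted log model to a CO2 level in ppm."""
    return baseline + math.exp(intercept + slope * year)

# The amount above the baseline is multiplied by this factor each year.
growth_factor = math.exp(slope)

print(round(growth_factor, 3))
print(round(predicted_co2(1980), 1))
```

Only the excess over 250 ppm grows by about 1.5% per year; the total CO₂ level grows at a smaller percentage rate.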
E66. The plot of average SAT math score versus percentage taking exam shows a decreasing trend with a curvature. A log-log transformation straightens this out nicely, and the regression analysis of ln(SAT math score) versus ln(percentage taking exam) provides a good model for
prediction, although the left end of this line is pulled down a bit by a couple of states that have both low percentages and relatively low scores.
[Scatterplot: Score versus Percent]
[Scatterplot: ln(score) versus ln(percent), with residual plot]
The complete regression analysis is shown here.
The regression equation is
lnScore = 6.46 − 0.0509 lnPercent

Predictor   Coef        Stdev      t-ratio   p
Constant    6.45586     0.01351    477.99    0.000
lnPercent   −0.050937   0.003967   −12.84    0.000

s = 0.02921   R-sq = 77.5%   R-sq(adj) = 77.0%

Analysis of Variance

SOURCE       DF   SS        MS        F        p
Regression   1    0.14072   0.14072   164.89   0.000
Error        48   0.04096   0.00085
Total        49   0.18168