stat answers

Upload: ssckp86

Post on 04-Apr-2018


  • 7/29/2019 Stat Answers


    Answers to Exercises

    Part I. Design of Experiments

    Chapter 2. Observational Studies

    Set A, page 20

    1. False. The population got bigger too. You need to look at the number of deaths relative to total population size. The population in 2000 was about 281 million, and in 1970 it was about 203 million: 2.4 out of 281 is smaller than 1.9 out of 203, so the death rate was lower in 2000. There was a very considerable increase in life expectancy between 1970 and 2000.

    Comment. Between 1970 and 2000, the population got older, on average, so the reduction in death rates is even more impressive.
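A quick way to check the comparison in exercise 1 is to turn the counts into rates. The Python sketch below uses only the figures quoted in the answer (deaths and population, both in millions); it illustrates the arithmetic and is not part of the original answer key.

```python
# Deaths and total population in millions, as quoted in the answer above.
deaths = {1970: 1.9, 2000: 2.4}
population = {1970: 203, 2000: 281}


def death_rate_percent(year):
    """Deaths as a percentage of the total population in that year."""
    return 100 * deaths[year] / population[year]


for year in (1970, 2000):
    print(year, f"{death_rate_percent(year):.2f}%")
# 1970 0.94%
# 2000 0.85%
```

More deaths, but a smaller share of a bigger population: the rate went down.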

    2. The basic facts: richer families are more likely to volunteer for the experiment, and their children more vulnerable to polio (section 1 of chapter 1).
    (a) From line 1 of the table, the polio rates in the two vaccine groups were about the same. If (for example) the consent group in the NFIP study had been richer, their rate would have been higher.
    (b) From line 3 of the table, the polio rates in the two no-consent groups were about the same.
    (c) From line 2 of the table, the polio rate in the NFIP control group was quite a bit lower than the rate in the other control group.
    (d) The no-consent group is predominantly lower-income, and the children are more resistant to polio. The NFIP control group has a range of incomes, including the more vulnerable children from the higher-income families.
    (e) The ones who consent are different from the ones who don't consent (p. 4).

    Comment on (c). The NFIP controls had a whole range of family backgrounds. The controls in the randomized experiment were from families who consented to participate. These families were richer, and their children more vulnerable to polio. The NFIP design was biased against the vaccine.

    3. Children who were vaccinated might engage in more risky behavior, a bias against the vaccine. On the other hand, the placebo effect goes in favor of the vaccine. (The similarity of rates in line 1 of table 1, p. 6, suggests biases are small.)

    4. No, because the experimental areas were selected in those parts of the country most at risk from polio. See section 1 of chapter 1.

    5. The people who broke the blind found out whether or not they were getting vitamin C. The ones who knew they were getting vitamin C for prevention tended to get fewer colds. Those on vitamin C for therapy tended to get shorter colds. This is the placebo effect. Blinding is important.

    6. 558/1,045 ≈ 53%, and 1,813/2,695 ≈ 67%. Adherence is lower in the nicotinic acid group. Something went wrong with the randomization or the blind. (For example, nicotinic acid might have unpleasant side effects, which cause subjects to stop taking it.)
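The adherence percentages in exercise 6 come from two divisions. A small Python sketch of that arithmetic; the label "comparison group" is a placeholder of ours, since the answer names only the nicotinic acid group.

```python
# (adherent subjects, group size) as quoted in exercise 6.
groups = {
    "nicotinic acid": (558, 1_045),
    "comparison group": (1_813, 2_695),  # placeholder label, not named in the text
}

adherence = {name: 100 * a / n for name, (a, n) in groups.items()}
for name, pct in adherence.items():
    print(f"{name}: {pct:.0f}% adherent")
# nicotinic acid: 53% adherent
# comparison group: 67% adherent
```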


    A-44 ANSWERS TO EXERCISES (pages 22-23)

    7. In trial (i), something must have gone wrong with the randomization. The difference between 49.3% and 69.0% shows that the treatment group smoked less to begin with, which would bias any further comparisons. The difference cannot be due to the treatment, because baseline data say what the subjects were like before assignment to treatment or control. (More about this in chapter 27.)

    8. Option (ii) explains the association, option (i) does not. Choose (ii). See p. 20.

    9. (a) Yes: 39 deaths from breast cancer in the treatment group, versus 63 in the control group.
    (b) The death rate in the treatment group (screened and refused together) is about the same as the death rate in the control group because screening has little impact on deaths from causes other than breast cancer.
    (c) Compare (A) the control group with (B) those who refused screening in the treatment group. Group A includes women who would accept screening as well as those who would refuse. On average, then, group A is richer than group B. Neither group is affected by screening, and group A has a higher death rate from breast cancer.
    (d) Most deaths are from causes other than breast cancer; those rates are not affected by screening. However, the women who refuse screening are poorer and more vulnerable to most diseases. That is why their death rates are higher.

    Comments. (i) In part (a), you should compare the whole treatment group with the whole control group. This is the "intention to treat" principle. It is conservative, that is, it understates the benefit of screening. (If all the women had come in for screening, the benefit would have been higher.) You should not compare the "examined" with the "refused" or with the controls: that comparison is biased against treatment, see exercise 10(a).

    (ii) The Salk vaccine field trial could have been organized like HIP: (1) define a study population of, say, 1,000,000 children; (2) randomize half of them to treatment and half to control, where treatment is the invitation to come in and be vaccinated; (3) compare polio rates for the whole treatment group versus the whole control group. In this setup, it would not be legitimate to compare just the vaccinated children with the controls; you would have to compare the whole treatment group with the whole control group. The design actually used in the Salk field trial was better, because of the blinding (section 1 of chapter 1); however, this seems to have been a relatively minor issue for HIP, and the design they used is substantially easier to manage.

    10. (a) This is not a good comparison. There is a bias against screening. The comparison between the "examined" and "refused" groups is observational, even though the context is an experiment: it is the women who decide whether to be examined or not. This is just like adherence to protocol in the clofibrate trial (section 2). There are confounding variables, like income and education, to worry about. These matter. The comparison is biased against screening because the women who come in for examination are richer, and more vulnerable to breast cancer.
    (b) This is not a good theory: the overall death rate in the treatment group from diseases other than breast cancer is about the same as that in the control group, and the reduction in breast cancer death rate is due to screening.
    (c) False. Screening detects breast cancers which are there and would otherwise be detected later. That is the point of screening.


    Comments. (i) In the HIP trial, the number of deaths from other causes is large, and subject to moderately large chance effects, so the difference 837 - 879 = -42 is not such a reliable statistic. More about this in chapter 27. The comparison of 1.1 and 1.5 in 10(a) is very unreliable, because the number of breast cancers is so small: 23 and 16. However, the difference between 39 and 63 in 9(a) is hard to explain as a chance variation.

    (ii) In part 10(c), within the treatment group, the screened women had a higher incidence rate of diagnosed breast cancer, compared to the women who refused. The two main reasons: (1) screening detects cancers; (2) breast cancer, like polio and unlike most other diseases, hits the rich harder than it hits the poor, and the rich are more likely to accept screening.

    (iii) The benefits of mammography for women age 50-70 are now generally recognized; there remains some question whether the benefits extend to women below the age of 50. For references, see note 14 to chapter 2.

    11. The women who have been exposed to herpes are the ones who are more active sexually; this evidence is not convincing. (See example 2 on p. 16.)

    Comment. In the 1970s, herpes (HSV-2) was thought to be causal. In the 1980s, new evidence from molecular biology suggested that HSV was not a primary causal agent, and implicated strains of human papilloma virus (HPV-16, 18). For references, see note 4 to chapter 2.

    12. If a woman has already aborted in a previous pregnancy (and is therefore more at risk in her current pregnancy), a physician is likely to tell her to cut down on exercise. In this instance, exercise is a marker of good health, not a cause.

    13. False. Altogether, 900 out of 2,000 men are admitted, or 45%, while 360 out of 1,100 women are admitted, or 33%. This is because women tend to apply to department B, which is harder to get into. See section 4.
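The overall admission rates in exercise 13 can be checked directly. This Python sketch covers only the overall figures quoted here; the department-by-department counts that explain the gap are not reproduced in this answer, so the sketch does not attempt the breakdown.

```python
def admission_rate(admitted, applied):
    """Percentage of applicants who were admitted."""
    return 100 * admitted / applied


men = admission_rate(900, 2_000)
women = admission_rate(360, 1_100)
print(f"men {men:.0f}%, women {women:.0f}%")  # men 45%, women 33%
```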

    14. (a) 39 out of 398 is like 40 out of 400, or 10 out of 100, or 10%. (b) 25% (c) 25% (d) 50%

    15. (a) 10%. That's spread over a $10,000 range, so for the next three parts, guess about 1% in each $1,000 range. (b) 1% (c) 1% (d) 2%

    Part II. Descriptive Statistics

    Chapter 3. The Histogram

    Set A, page 33

    1. (a) 2% (b) 3% (c) 4%
    2. More between $10,000 and $11,000.
    3. (a) B (b) 20% (c) 70% (d) 5% (e) 15% (f) 15%
    4. (a) Well over 50%. (b) Well under 50%. (c) About 50%.
    5. Class (b).


    6. There were more in the range 90 to 100.
    7. A (ii), B (i), C (iii)
    8. The figure does not adjust for inflation, so the comparison is not a good one.

    Comment. In 1973, a dollar bought roughly 4 times as much as in 2004. The figure below compares the 2004 histogram with the 1973 histogram, corrected for this change in purchasing power. Family income went up by a factor of about 4 in "nominal" dollars, but in "real" dollars (corrected for inflation) there was not that much improvement. (We shifted the 2004 histogram to the right a little; data on the consumer price index are from Statistical Abstract, 1993, table 756; 2003, table 713; table 690 in the latter publication suggests about a 15% increase in real family income over the period 1980-2000; price indices are not the most reliable of statistics, because they may not reflect quality improvements.)

    [Figure: family income histograms, 1973 (solid) and 2004 (dashed); horizontal axis INCOME (THOUSANDS OF DOLLARS), 0 to 200.]

    Set B, page 38

    1. The 1991 histogram is shown in figure 5 on p. 39, and the reason for the spikes is discussed on that page.
    2. Smooths out the graph between 0 and 8.
    3. The educational level went up. For example, more people finished high school and went on to college in 1991 than in 1970.

    Comment. In this century, there has been a remarkable and steady increase in the educational level of the population. In 1940, only 25% of the population age 25+ had finished high school. By 1993, this percentage was up to 80%, and still climbing. In that year, about 7% of the population age 25+ had completed a master's degree or better. In 2005, about 85% of the population age 25+ had a high school degree, and 9% had a master's degree or better.

    4. Went up.

    Set C, page 41

    1. 15% per $100.
    2. Option (ii) is the answer, because (i) doesn't have units, and (iii) has the wrong units for density.
    3. 1,750, 2,000, 1, 0.5. The idea on density: if you spread 10 percent evenly over 1 cm = 10 mm, there is 1 percent in each mm, that is, 1 percent per mm.
    4. (a) 1.5% per cigarette × 10 cigarettes = 15%. (b) 30% (c) 30% + 20% = 50% (d) 10% (e) 3.5%
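The density idea in Set C reduces to one rule: the percentage in a block is its height (percent per unit) times its width (in units). A minimal Python sketch using the numbers from exercises 3 and 4(a); the function name is ours, chosen for illustration.

```python
def percent_in_block(density, width):
    """Area of a histogram block: density (percent per unit) times width (units)."""
    return density * width


# Exercise 4(a): 1.5% per cigarette over a 10-cigarette interval.
print(percent_in_block(1.5, 10))  # 15.0

# Exercise 3: 10 percent spread evenly over 1 cm = 10 mm gives the density.
density_per_mm = 10 / 10
print(density_per_mm, "percent per mm")  # 1.0 percent per mm
```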


    Set D, page 44

    1. (a) qualitative (b) qualitative (c) quantitative, continuous (d) quantitative, continuous (e) quantitative, discrete
    2. (a) Number of children is a discrete variable.
    (b) [Figure: histograms of NUMBER OF CHILDREN, 0 to 6, for the HS and BA groups.]

    (c) Better-educated women have fewer children.

    Set E, page 46

    1. On the whole, the mothers with four children have higher blood pressures. Causality is not proved: there is the confounding factor of age. The mothers with four children are older. (After controlling for age, the Drug Study found there was no association left between number of children and blood pressure.)
    2. Left: adds 10 mm. Right: adds 10%.

    Set F, page 48

    1. (a) 7% (b) 5% (c) The users tend to have higher blood pressures.
    2. Use of the pill is associated with an increase in blood pressure of several mm.
    3. The younger women have slightly higher blood pressures.

    Comment. This is a definite anomaly. Most U.S. studies show that systolic blood pressure goes up with age. By comparison, the younger women in the Contraceptive Drug Study have blood pressures which are too high, while the older women have blood pressures which are too low. This probably results from bias in the procedure used to measure blood pressures at the multiphasic, which tended to minimize the prevalence of blood pressures above 140 mm.

    Chapter 4. The Average and the Standard Deviation

    Set A, page 60

    1. [Figure: number lines for parts (a), (b), and (c), with the list entries marked by crosses and the average marked by an arrow.]


    Comment. With two numbers, the average is halfway between. If you add bigger numbers to the list, the average moves up. (Smaller numbers move it down.) The average is always somewhere between the smallest and biggest number on the list.
    2. If the average is 1, the list consists of ten 1's. If the average is 3, the list consists of ten 3's. The average cannot be 4: it has to be between 1 and 3.
    3. The average of (ii) is bigger; it has the large entry 11.
    4. (10 × 66 inches + 77 inches)/11 = 67 inches = 5 feet 7 inches. Or reason this way: the new person is 11 inches taller than the old average. So he adds 11 inches/11 = 1 inch to the average.
    5. 5 feet 6½ inches. As the number of people in the room goes up, each additional person has less of an effect on the average.
    6. 5 feet 6 inches + 22 inches = 7 feet 4 inches: it's a giraffe.
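Exercises 4 and 5 rest on the same update rule: the new average is the old total plus the newcomer, divided by the new count. A short Python sketch; the room sizes in the loop are illustrative values chosen for this example, not taken from the exercises.

```python
def new_average(old_average, n, new_value):
    """Average after one value joins a list of n values with the given average."""
    return (n * old_average + new_value) / (n + 1)


print(new_average(66, 10, 77))  # 67.0, i.e. 5 feet 7 inches

# The same 77-inch newcomer matters less in a bigger room (illustrative sizes).
for n in (10, 20, 100):
    print(n, round(new_average(66, n, 77) - 66, 2))
```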

    7. The Rocky Mountains are at the right end, Kansas is around 0 (sea level), and the Marianas trench is at the left end.

    8. The conclusion does not follow: the data are cross-sectional, not longitudinal. The men with higher diastolic blood pressures are likely to die earlier; they will not be represented in the graph. Furthermore, men with higher blood pressure are more likely to be put on medications that reduce blood pressure.

    9. During the recessions, firms tend to lay off the workers with lowest seniority, who are also the lowest paid. This raises the average wage of those left on the payroll. When the recession ends, these low-paid workers are rehired.

    Comment. It matters who is included in an average, and who is excluded.

    Set B, page 65

    1. (a) 50 (b) 25 (c) 40
    2. (a) median = average (b) median = average (c) median is to the left of the average: the long right-hand tail at work.


    3. 20
    4. The average has to be bigger than the median, so guess 25. (The exact answer is 27.)
    5. The average: long right-hand tail.
    6. (a) 1 (b) 10 (c) 5 (d) 5 ("Size" means, neglecting signs.)

    Set C, page 67

    1. (a) average = 0, r.m.s. size = 4 (b) average = 0, r.m.s. size = 10. On the whole, the numbers in list (b) are bigger in size.

    2. (a) 10 (to one decimal place, the exact answer is 9.0). (b) 20 (to one decimal place, the exact answer is 19.8). (c) 1 (to one decimal place, the exact answer is 1.3). The average of the lists is 0; the r.m.s. operation wipes out the signs.

    3. For both lists, it's 7; all the entries have the same size, 7.
    4. The r.m.s. size is 3.2.
    5. The r.m.s. size is 3.1.

    Comment. The r.m.s. in exercise 5 is smaller than in exercise 4. There is a reason. Suppose we are going to compare each number on a list to some common value. The r.m.s. size of the amounts off depends on this value. For some values the r.m.s. is larger, for others the r.m.s. is smaller. When is the r.m.s. smallest? It can be proved mathematically that the r.m.s. size of the amounts off is smallest for the average.
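The claim in the comment can be checked numerically. This Python sketch scans candidate values on a grid and reports which one gives the smallest r.m.s. size of the amounts off. The list is a hypothetical example, not the list from exercise 4 or 5, and a grid search only demonstrates the result; it does not prove it.

```python
import math


def rms(values):
    """Root-mean-square size of a list of numbers."""
    return math.sqrt(sum(v * v for v in values) / len(values))


def rms_off(values, c):
    """r.m.s. size of the amounts off when each entry is compared to c."""
    return rms([v - c for v in values])


data = [1, 3, 4, 5, 7]            # hypothetical list for illustration
average = sum(data) / len(data)   # 4.0

candidates = [k / 10 for k in range(81)]   # 0.0, 0.1, ..., 8.0
best = min(candidates, key=lambda c: rms_off(data, c))
print(best, average, rms_off(data, best))  # 4.0 4.0 2.0
```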

    6. The errors are way bigger than 3.6, which is supposed to be the r.m.s. size. Something is wrong with the computer.

    Set D, page 70

    1. (a) 170 cm is 24 cm above average, the SD is 8 cm, so 24 cm represents 3 SDs. (b) 2 cm is 0.25 SDs. (c) 1.5 × 8 = 12 cm, the boy is 146 - 12 = 134 cm tall. (d) shortest, 146 - 18 = 128 cm; tallest, 146 + 18 = 164 cm.
    2. (a) 150 cm: about average; 4 cm is only 0.5 SDs. 130 cm: unusually short; 16 cm is 2 SDs. 165 cm: unusually tall. 140 cm: about average.
    (b) About 68% were in the range 138 to 154 cm (ave ± 1 SD), and 95% were in the range 130 to 162 cm (ave ± 2 SD).
    3. biggest, (iii); smallest, (ii).
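The Set D answers all convert between centimeters and SDs from the average. A Python sketch, with the average of 146 cm taken from exercise 1 (170 cm is 24 cm above average) and the SD of 8 cm; the helper names are ours.

```python
AVERAGE_CM = 146  # 170 cm is 24 cm above average
SD_CM = 8


def sds_from_average(height_cm):
    """How many SDs a height is above (+) or below (-) the average."""
    return (height_cm - AVERAGE_CM) / SD_CM


def height_from_sds(z):
    """The height that lies z SDs from the average."""
    return AVERAGE_CM + z * SD_CM


print(sds_from_average(170))                    # 3.0
print(height_from_sds(-1.5))                    # 134.0
print(height_from_sds(-1), height_from_sds(1))  # 138 154, ave ± 1 SD
print(height_from_sds(-2), height_from_sds(2))  # 130 162, ave ± 2 SD
for h in (150, 130, 165, 140):
    print(h, sds_from_average(h))
```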

    Comment. All three lists have the same average of 50 and the same range, 0 to 100. But in list (iii), more of the numbers are further away from 50. In list (ii), more of the numbers are closer to 50. There is more to "spread" than the range.
    4. (a) 1, since all deviations from the average of 50 are 1 in size. (b) 2 (c) 2 (d) 2 (e) 10


    Comment. The SD says how far off average the entries are, on the whole. Just ask yourself whether the amounts off are on the whole more like 1, 2, or 10 in size.

    5. 25 years. The average is maybe 30 years, so if 5 years were the answer, many people would be 4 SDs away from the average; with 50 years, everybody would be within 1 SD of the average.
    6. (a) (i) (b) (ii) (c) (v)
    7. In trial (i), something went wrong: the treatment group is much heavier than the control group. (See exercise 7 on p. 22.)
    8. The averages and SDs should be about the same, but the investigator with the bigger sample is likely to get the tallest man, as well as the shortest. The bigger the sample, the bigger the range. The SD and the range measure different things.
    9. Guess the average, 69 inches. You have about 1/3 of a chance to be off by more than one SD, which is 3 inches.

    10. 3 inches. The SD is the r.m.s. deviation from average.

    Set E, page 72

    1. The SD of (ii) is larger; in fact, the SD of (i) is 1, the SD of (ii) is 2.
    2. No, the SD is different from the average absolute deviation, so the method is wrong.
    3. No, the 0 does count, so the method is wrong.
    4. (a) All three classes have the same average, 50. (b) Class B has the biggest SD; there are more students far away from average. (c) All three classes have the same range. There is more to spread than the range; see exercise 3 on p. 70.
    5. (a) (i) average = 4; deviations = -3, -1, 0, 1, 3; SD = 2. (ii) average = 9; deviations = -3, -1, 0, 1, 3; SD = 2.
    (b) List (ii) is obtained from list (i) by adding 5 to each entry. This adds 5 to the average, but does not affect the deviations from the average. So, it does not affect the SD. Adding the same number to each entry on a list does not affect the SD.
    6. (a) (i) average = 4; deviations = -3, -1, 0, 1, 3; SD = 2. (ii) average = 12; deviations = -9, -3, 0, 3, 9; SD = 6.
    (b) List (ii) is obtained from list (i) by multiplying each entry by 3. This multiplies the average by 3. It also multiplies the deviations from the average by a factor of 3, so it multiplies the SD by a factor of 3. Multiplying each entry on a list by the same positive number just multiplies the SD by that number.
    7. (a) (i) average = 2; deviations = 3, -6, 1, -3, 5; SD = 4. (ii) average = -2; deviations = -3, 6, -1, 3, -5; SD = 4.
    (b) List (ii) is obtained from list (i) by changing the sign of each entry. This changes the sign of the average and all the deviations from the average, but does not affect the SD.
    8. (a) This would increase the average by $250 but leave the SD alone. (b) This would increase the average and SD by 5%.
    9. The r.m.s. size is 17, and the SD is 0.
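Exercises 5 to 7 can be replayed in Python. Each list below is rebuilt from the average and deviations quoted in the answers (entry = average + deviation). The standard library's `statistics.pstdev` divides by the number of entries, which matches the SD used in this book.

```python
from statistics import mean, pstdev

base = [1, 3, 4, 5, 7]            # average 4, deviations -3, -1, 0, 1, 3
shifted = [x + 5 for x in base]   # exercise 5(ii): add 5 to each entry
scaled = [3 * x for x in base]    # exercise 6(ii): multiply each entry by 3
signed = [5, -4, 3, -1, 7]        # exercise 7(i): average 2, deviations 3, -6, 1, -3, 5
flipped = [-x for x in signed]    # exercise 7(ii): change every sign

for name, xs in [("base", base), ("shifted", shifted), ("scaled", scaled),
                 ("signed", signed), ("flipped", flipped)]:
    print(f"{name:8} average = {mean(xs):g}, SD = {pstdev(xs):g}")
```

Adding a constant moves the average but not the SD; multiplying by 3 multiplies both; changing signs flips the average and leaves the SD alone.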


    10. The SD is much smaller than the r.m.s. size. See p. 72.
    11. No.
    12. Yes; for instance, the list 1, 1, 16 has an average of 6 and an SD of about 7.

    Chapter 5. The Normal Approximation for Data

    Set A, page 82

    1. (a) 60 is 10 above average; that's 1 SD. So 60 is +1 in standard units. Similarly, 45 is -0.5 and 75 is +2.5.
    (b) 0 corresponds to the average, 50. The score which is 1.5 in standard units is 1.5 SDs above average; that's 1.5 × 10 = 15 points above average, or 65 points. The score 22 is -2.8 in standard units.
    2. The average is 10; the SD is 2.
    (a) In standard units, the list is +1.5, -0.5, +0.5, -1.5, 0.
    (b) The converted list has an average of 0 and an SD of 1. (This is always so: when converted to standard units, any list will average out to 0 and the SD will be 1.)
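Standard units are just (value - average)/SD. A Python sketch for Set A. The list in exercise 2 is not printed in this answer key, so the sketch rebuilds it from the stated average, SD, and standard units (entry = 10 + 2 × z), which gives 13, 9, 11, 7, 10.

```python
from statistics import mean, pstdev


def to_standard_units(x, average, sd):
    """How many SDs x is above (+) or below (-) the average."""
    return (x - average) / sd


# Exercise 1: average 50, SD 10.
print([to_standard_units(x, 50, 10) for x in (60, 45, 75)])  # [1.0, -0.5, 2.5]

# Exercise 2: the list implied by average 10, SD 2 and the quoted standard units.
xs = [13, 9, 11, 7, 10]
z = [to_standard_units(x, mean(xs), pstdev(xs)) for x in xs]
print(z)                   # [1.5, -0.5, 0.5, -1.5, 0.0]
print(mean(z), pstdev(z))  # 0.0 1.0
```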

    Set B, page 84

    1. (a) 11% (b) 34% (c) 79% (d) 25% (e) 43% (f) 13%
    2. (a) 1 (b) 1.15
    3. (a) 1.65 (b) 1.30. It's NOT the same z as in (a). If the area to the left of z is 90%, then the area between -z and z is 80%.
    4. (a) 100% - 39% = 61%. (b) impossible without further information
    5. (a) 58% ÷ 2 = 29% (b) 50% - 29% = 21%. (c) impossible without further information.

    Set C, page 88

    1. (a) 66 inches is (66 - 64.3)/2.6 ≈ 0.65 in standard units, where 64.3 inches is the average. Percent = shaded area to the left of 0.65 under the normal curve ≈ 74%. [Figure: normal curve with the area to the left of 66 in shaded; average 64.3 in.]
    (b) 69% (c) 0.2 of 1%.
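The table lookups in Sets B and C can be reproduced with `statistics.NormalDist` from the Python standard library. This is a sketch, not the book's method: the book reads a two-decimal normal table, so exact values can differ in the last digit (1.64 and 1.28 here, against 1.65 and 1.30 from the table). For Set C, the average of 64.3 inches and SD of 2.6 inches are read from the damaged worked figure and should be treated as assumptions.

```python
from statistics import NormalDist

std = NormalDist()  # standard normal curve: average 0, SD 1


def central_area(z):
    """Area under the normal curve between -z and z."""
    return std.cdf(z) - std.cdf(-z)


# Set B, 2(b): z = 1.15 leaves about 75% between -z and z.
print(f"{central_area(1.15):.0%}")      # 75%

# Set B, 3(a): 90% between -z and z, so 95% lies to the left of z.
print(round(std.inv_cdf(0.95), 2))      # 1.64 (table: 1.65)

# Set B, 3(b): 90% to the left of z, i.e. 80% between -z and z.
print(round(std.inv_cdf(0.90), 2))      # 1.28 (table: 1.30)

# Set C, 1(a): heights of 66 inches or less (assumed average 64.3, SD 2.6).
heights = NormalDist(mu=64.3, sigma=2.6)
print(f"{heights.cdf(66):.0%}")         # 74%
```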


    2. (a) 77% (b) 69%
    3. In figure 2, the percentage of women with heights between 61 inches and 66 inches is exactly equal to the area under the histogram and approximately equal to the area under the normal curve.

    Set D, page 89

    1. (a) 75% (b) $29,000 (c) 75%. Reason: 90% - 10% = 80% are in the range $15,000 to $135,000; and $15,000 to $125,000 is about the same range but a little smaller.
    2. 5, 95.
    3. $7,000.
    4. The area to the left of the 25th percentile has to be 25% of the total area, so the 25th percentile must be quite a bit smaller than 25 mm.
    5. (a) It has fatter tails. (b) The interquartile range is about 15.

    Set E, page 92

    1. She was 2.15 SDs above average, at the 98th percentile.
    2. The score is 0.85 SDs above average, which is 0.85 × 100 = 85 points above average. That's 535 + 85 = 620.
    3. 2.75 points: 0.50 SDs below average.

    Set F, page 93

    1. (a) The average is 5/9 × (98.6 - 32) = 37.0. The SD is 5/9 × 0.3 ≈ 0.17. (b) In standard units, the change of scale washes out, so the answer is 1.5.
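Set F turns on the difference between a shift and a change of scale. A Python sketch of the conversion C = 5/9 × (F - 32): the shift by 32 moves the average but not the SD, the factor 5/9 rescales both, and standard units come through unchanged. The reading 1.5 SDs above average is chosen to match part (b).

```python
import math


def f_to_c(f):
    """Convert degrees Fahrenheit to degrees Celsius."""
    return 5 / 9 * (f - 32)


avg_f, sd_f = 98.6, 0.3
avg_c = f_to_c(avg_f)   # the average gets both the shift and the rescaling
sd_c = 5 / 9 * sd_f     # only the factor 5/9 touches the SD
print(f"{avg_c:.1f} {sd_c:.2f}")  # 37.0 0.17

x_f = avg_f + 1.5 * sd_f          # a reading 1.5 SDs above average
z_f = (x_f - avg_f) / sd_f
z_c = (f_to_c(x_f) - avg_c) / sd_c
print(math.isclose(z_f, z_c), round(z_c, 6))  # True 1.5
```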

    Chapter 7. Plotting Points and Lines

    Set A, page 111

    1. A = (1, 2), B = (4, 4), C = (5, 3), D = (5, 1), E = (3, 0).
    2. x up by 3, y up by 2.
    3. Point D.


    Set B, page 112

    1. The four points all lie on a line. [Figure: the four points plotted.]
    2. The maverick is (1, 2) and it is above the line.
    3. The points all lie on a line.
        x: 1, 2, 3, 4
        y: 3, 5, 7, 9
    [Figure: the points plotted.]
    4. (1, 2) is out; (2, 1) is in.
    5. (1, 2) is in; (2, 1) is out.
    6. (1, 2) is in; (2, 1) is out.

    Set C, page 114

    1. Slope and intercept for the lines in figs. 16, 17, and 18. [Table garbled in extraction; surviving entries: -1/4 in per lb; 51 in; -10; 10.]
    Note: In Figure 18, the axes cross at (2, 2).


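    Set D, page 115

    1. [Figure: lines (a), (b), and (c) plotted.]
    2. On the line.
    3. On the line.
    4. Above the line.
    5. [Figure: the line plotted.]
    6. [Figure: the line plotted.]

    Set E, page 116

    1. (a) slope 2, intercept 1, height at x = 2 is 5. (b) slope 1/2, intercept 2, height at x = 2 is 3. [Figure: the two lines plotted.]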


    2. (a) y = 4x + 1 (b) y = -(1/4)x + 4 (c) y = -(1/2)x + 2
    3. They are all on the line y = 2x. [Figure: the points plotted.]
    4. They are all on the line y = x. [Figure: the points plotted.]
    5. (a) on the line. (b) above the line. (c) below the line.
    6. All three statements are true. If you understand exercises 4, 5, and 6, you are in good shape for part III.
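The line exercises use two small operations: the height of y = slope × x + intercept at a given x, and whether a point is on, above, or below the line. A Python sketch; `Fraction` keeps slopes like 1/2 exact so the "on the line" test is not spoiled by rounding. The slopes and intercepts come from Set E, exercise 1; the three test points at the end are hypothetical.

```python
from fractions import Fraction


def height(slope, intercept, x):
    """Height of the line y = slope * x + intercept at x."""
    return slope * x + intercept


def side_of_line(point, slope, intercept):
    """Return 'on', 'above', or 'below' for a point relative to the line."""
    x, y = point
    line_y = height(slope, intercept, x)
    if y == line_y:
        return "on"
    return "above" if y > line_y else "below"


# Set E, exercise 1: heights at x = 2.
print(height(2, 1, 2))               # 5
print(height(Fraction(1, 2), 2, 2))  # 3

# Hypothetical points checked against the line y = 2x.
for p in [(1, 2), (1, 3), (1, 1)]:
    print(p, side_of_line(p, 2, 0))
```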

    Part III. Correlation and Regression

    Chapter 8. Correlation

    Set A, page 122

    1. (a) shortest father, 59 inches; his son, 65 inches. (b) tallest father, 75 inches; his son, 70 inches. (c) 76 inches, 64 inches. (d) two: 69 inches, 70 inches. (e) ave = 68 inches. (f) SD = 3 inches.
    2.
        x: 1, 2, 3, 4, 4
        y: 4, 3, 1, 1, 2


    3. (a) ave x = 1.5 (b) SD of x = 0.5 (c) ave y = 2 (d) SD of y = 1.5
    4. [Figure: scatter diagram.]
    5. (a) A, B, F (b) C, G, H (c) ave ≈ 50 (d) SD ≈ 25 (e) ave ≈ 30 (f) False. (g) False, the association is negative.
    6. (a) 75 (b) 10 (c) 20 (d) The final. (e) The final. (f) True.

    Set B, page 128

    1. (a) Negative. The older the car, the lower the price. (b) Negative. The heavier the car, the less efficient.
    2. Left: ave x = 3.0, SD x = 1.0, ave y = 1.5, SD y = 0.5, positive correlation. Right: ave x = 3.0, SD x = 1.0, ave y = 1.5, SD y = 0.5, negative correlation.
    3. The left-hand diagram has correlation closer to 0; it's less like a line.
    4. The correlation is about 0.5.
    5. The correlation is nearly 0.

    Comment. Psychologists call this "attenuation." If you restrict the range of one variable, that usually cuts the correlation down.
    6. (a) All the points on the scatter diagram would lie on a line sloping up, so the correlation would be 1. (b) Close to 1; this is like part (a), with some noise thrown into the data.

    Comment. In the March 2005 Current Population Survey, the correlation between the ages of the husbands and wives was about 0.93; the husbands were, on average, 2.3 years older than their wives.

    7. (a) Nearly -1: the older you are, the earlier you were born; but there is some fuzz, depending on whether your birthday is before or after the day of the questionnaire. (b) Somewhat positive.
    8. (a) Somewhat positive. Although wife's income must be less than family income, the two are positively associated. (b) Nearly -1. If family income is practically constant, the more the wife makes, the less the husband can make.

    Comment. In the March 2005 Current Population Survey, the correlation between wife's income and total income was about 0.70. Among families with total income in the range $80,000 to $90,000, the correlation between husband's income and wife's income was about -0.98.

    9. False: see p. 126.


    Set C, page 131

    1. (a) True. (b) False.
    2. Dashed.
    3. He is one SD above average in height and must weigh 140 + 20 = 160 pounds.
    4. (a) Yes. (b) No. (c) Yes.

    Set D, page 134

    1. (a) ave of x = 4, SD of x = 2; ave of y = 4, SD of y = 2.

        Standard units
          x       y     Product
        -1.5     1.0    -1.50
        -1.0     1.5    -1.50
        -0.5     0.5    -0.25
         0.0     0.0     0.00
         0.5    -0.5    -0.25
         1.0    -1.5    -1.50
         1.5    -1.0    -1.50

    r = average of products ≈ -0.93
    (b) r = 0.82, by calculation.
    (c) No calculation is necessary: r = -1. The points all lie on a line sloping down, y = 8 - x.
    2. About 50%.
    3. About 25%.
    4. About 5%.

    Chapter 9. More about Correlation

    Set A, page 143

    1. (a) About the same. (b) The maximum has to be bigger than the minimum.
    2. No: the correlation between x and y is the same as the correlation between y and x.
    3. r stays the same.
    4. r stays the same.
    5. r changes.
    6. (a) Up. (b) Down. (c) Reverses the sign.
    7. (a) 1 (b) Goes down. (c) r will be less than 1, because of measurement error.
    8. The correlation would go down (to about 0.25, in fact).
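The recipe in Set D, r = average of the products of x and y in standard units, is easy to code. In this Python sketch the data are rebuilt from the standard units in exercise 1(a) together with ave 4 and SD 2 (value = 4 + 2 × z). The extra lines check the rules used in Chapter 9: switching x and y, or shifting and rescaling a variable, leaves r alone, while changing the sign of one variable reverses it.

```python
from statistics import mean, pstdev


def standard_units(values):
    """Convert a list to standard units (SD computed with divisor n)."""
    avg, sd = mean(values), pstdev(values)
    return [(v - avg) / sd for v in values]


def correlation(xs, ys):
    """r = average of (x in standard units) * (y in standard units)."""
    products = [a * b for a, b in zip(standard_units(xs), standard_units(ys))]
    return mean(products)


x = [1, 2, 3, 4, 5, 6, 7]
y = [6, 7, 5, 4, 3, 1, 2]
r = correlation(x, y)
print(round(r, 2))                                   # -0.93

print(round(correlation(y, x), 2))                   # switch x and y: -0.93
print(round(correlation([v + 1 for v in x], y), 2))  # shift x: -0.93
print(round(correlation(x, [3 * v for v in y]), 2))  # rescale y: -0.93
print(round(correlation([-v for v in x], y), 2))     # change sign of x: 0.93
print(round(correlation(x, [8 - v for v in x]), 2))  # part (c), y = 8 - x: -1.0
```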


    9. The correlation for the whole year is bigger; for example, it will be very cold in the winter and very hot in the summer, in both cities.

    Comment. This is another example of "attenuation" (exercise 5 on p. 130). In the scatter diagram below, the crosses show the data for June 2005 (r = 0.42); the dots show the data for days in other months; the correlation for all 365 days is 0.92. Focusing on June restricts the range of the temperatures, and attenuates (weakens) the correlation.

    [Figure: scatter diagram of daily temperatures in the two cities, WASHINGTON on the horizontal axis, both axes 20 to 100; crosses for June 2005, dots for the other days.]

    10. Data set (iii) is the same as (ii), with x and y switched; so r is 0.7857. Data set (iv) comes from (i), by adding 1 to each x-value, so r is 0.8571. Data set (v) comes from (i) by doubling each y-value, so r is 0.8571 too. Data set (vi) comes from (ii) by subtracting 1 from each x-value, and multiplying each y-value by 3, so r is 0.7857.

    Set B, page 145

    1. Each diagram separately has correlation near 0.6. But all together, things look much more like a line, and the correlation is closer to 0.9: this is attenuation in reverse.
    2. Somewhat more than 0.67. This is like the previous exercise: when you put all the children together, the data are much more linear. Also see exercise 9 on p. 144.
    3. Yes; the only difference is a change of scale.
    4. Yes; it's like any of the diagrams in the previous exercise, so r ≈ 0.7.

    Set C, page 148

    1. (i) should be summarized using r; (ii) and (iii) should not.
    2. False: like diagram (iii) in exercise 1.
    3. Nearly 1. There is a strong association, but the relationship is quadratic, not linear, so the correlation cannot be 1.
    4. Both are false. You need to look at the scatter diagram to check for outliers or non-linearity.


    Set D, page 149

    1. (a) Diagram is not given. (b) True. (c) This cannot be determined from the data (but is true by other studies).
    2. No. This correlation might well exaggerate the strength of the relationship: it's based on rates.

    Set E, page 152

    1. Duration is only measured to the nearest 2 million years; this variable is not easy to determine very accurately.
    2. Yes, and this would exaggerate the strength of the association.
    3. (a) True. (b) True. (c) True. (d) False. Moral: association is not the same as causation.
    4. Probably, but this doesn't follow from the data. It could be, for example, that people who have trouble reading watch more television, so causality runs in the other direction. After all, the correlation between x and y equals the correlation between y and x.
    5. The best explanation is the association between coffee drinking and cigarette smoking. Coffee drinkers are likelier to smoke, and smoking causes heart trouble.
    6. This is an observational study, not a controlled experiment, and plotting points from the fifties or seventies on the graph just makes a mud pie.

    [Figure: The Phillips "Curve" for the period 1949-74.]


    final will only be about 0.5 SDs above average on the final, that is, 0.5 × 15 = 7.5 points. So, the estimated average score on the final for this group is 60 + 7.5 = 67.5.

    Comment. The regression estimates always lie on a line: the regression line. More about this in chapter 12.

    2. (a) 190 pounds (b) 173 pounds (c) -68 pounds (d) -206 pounds.

    Comment on (c). This is getting ridiculous, but the Public Health Service didn't run into any little men 2 feet tall, so the regression line doesn't pay much attention to this possibility. The regression line should be trusted less and less the further away it gets from the center of the scatter diagram.

    3. False. Think of the scatter diagram for the heights and weights of all the men. Take a vertical strip over 69 inches, representing all the men whose height was just about average. Their average weight should be just about the overall average. But the men aged 45-74 are represented by a different collection of points, some of which are in the strip, and many of which aren't. The regression line says how average weight depends on height, not age. (The older men actually weigh a little more than average: middle-age spread has set in.)

    [Figure: two scatter diagrams with Height on the horizontal axis. Left: the men of average height. Right: the men aged 45 to 74.]

    4. These women have completed 12 years of schooling, which is 2 years below average. They are 2/2.4 ≈ 0.83 SDs below average in schooling. The estimate is that they are below average in income, but not by 0.83 SDs, only by r × 0.83 ≈ 0.28 SDs of income. In dollars, that's 0.28 × $26,000 ≈ $7,300. Their average income is estimated as overall average - $7,300 = $32,000 - $7,300 = $24,700.
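The regression method in exercise 4 can be written as one function: find how many SDs x is from its average, multiply by r, and convert back into the units of y. In this Python sketch, r ≈ 0.34 is inferred from the answer's "r × 0.83 ≈ 0.28"; it is not a value stated in this text, so treat it as an assumption.

```python
def regression_estimate(x, ave_x, sd_x, ave_y, sd_y, r):
    """Estimate the average y for cases with this x, by the regression method."""
    z_x = (x - ave_x) / sd_x       # SDs above (+) or below (-) average in x
    return ave_y + r * z_x * sd_y  # r times as many SDs of y


# Schooling: 12 years, average 14, SD 2.4. Income: average $32,000, SD $26,000.
estimate = regression_estimate(12, 14, 2.4, 32_000, 26_000, r=0.34)  # r assumed
print(round(estimate, -2))  # 24600.0
```

Carrying full precision gives about $24,600; the answer's $24,700 comes from rounding the intermediate steps to 0.28 SDs and $7,300.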

    5. The points must all lie on the SD line, which slopes down; the rate is one SD of y per SD of x.


    Set B, page 1631. (a) True: the graph of averages slopes upward. Generally, men with higher in

    comes have wives with higher incomes. People often choose mates with similar educational levels and family backgrounds, which tends to bring incomesinto line as well.

    (b) Chance error. The data are from a sample, and there are only 4 couples behind the dot.
    (c) The regression estimates would be a little too low: the line runs below the dots.

    2. [Figure: scatter diagram with the regression line and the SD line.] The crosses fall on the solid regression line; the dashed line is the SD line.

    3. For the two diagrams on the left, the SD line is dashed and the regression line is solid. For the two on the right, the SD line is solid and the regression line is dashed. Moral: the regression line isn't as steep as the SD line.

    4.

    (a)-(c) [Figure: a scatter diagram, and a graph of averages with the regression line.]


    (d) [Figure: two panels, each with an axis running from 1 to 4.]

    Set C, page 167

    1. (a) 67.5 (b) 45 (c) 60 (d) 60

    This exercise is about individuals; exercise 1 on p. 161 was about groups. The arithmetic for parts (a)-(c) is the same; see pp. 165-66.

    2. (a) 79% (b) 38% (c) 50% (d) 50%

    Work for (a): [Figure: normal curves; 90% of the area lies to the left of z, so 80% lies between -z and z, giving z ≈ 1.3.] In standard units, his SAT score was 1.3. The regression prediction for his first-year score is 0.6 × 1.3 ≈ 0.8 in standard units. [Figure: normal curve with 79% of the area to the left of 0.8.] This corresponds to a percentile rank of 79%. In example 2, the predicted percentile rank was only 69%, which is closer to 50%. That is because the correlation was lower in example 2. There is more regression to the mean in example 2.
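    The work for 2(a) can be checked with a short sketch. It assumes, as the work does, that both sets of scores follow the normal curve and that r = 0.6; `predicted_percentile` is a hypothetical helper built on Python's standard `statistics.NormalDist`. Rounding each z-score to one decimal, as a printed normal table does, reproduces the 79%; carrying full precision gives about 78%.

```python
from statistics import NormalDist

def predicted_percentile(percentile, r, decimals=None):
    """Regression method for one individual, assuming both variables
    follow the normal curve.

    percentile -- percentile rank on x, as a fraction (0.90 = 90th)
    r          -- correlation between x and y
    decimals   -- if given, round each z-score like a printed normal table
    """
    normal = NormalDist()              # standard normal curve
    z = normal.inv_cdf(percentile)     # x in standard units
    if decimals is not None:
        z = round(z, decimals)
    z_pred = r * z                     # predicted y in standard units
    if decimals is not None:
        z_pred = round(z_pred, decimals)
    return normal.cdf(z_pred)          # back to a percentile rank

print(round(100 * predicted_percentile(0.90, 0.6, decimals=1)))  # 79
print(round(100 * predicted_percentile(0.90, 0.6)))              # 78
```

The only difference between the two printed answers is when the rounding happens; the method is the same.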

    3. (a) The SD line: dashed. (b) The regression line: solid.

    4. (a) There is a minimum age for marriage.
    (b) Age is reported as a whole year; there are a lot of husbands aged 30, but none aged 30.33; likewise for the wives.

    5. False. The regression line says how average weight depends on height, not on age. See exercise 3 on p. 161.

    Set D, page 174

    1. No, this looks like the regression effect. Imagine a controlled experiment. At one airport, the instructors discuss the ratings with the pilots. At another, the instructors keep the ratings to themselves. Even at the second airport, the ratings on the two landings will not be identical: differences come in. So the regression effect appears: on the average, the bottom group improves a bit, and the top group falls back. That is probably all the air force saw in their data.

    2. No. It looks like the tutoring had an effect: regression would only take them closer to the average, but they got to the other side.

    3. The sons of the 61-inch fathers are taller, on the average, than the sons of the 62-inch fathers. This is just chance variation. By the luck of the draw, Pearson got too many families where the father was 61 inches tall and the son was extra tall.

    Comment. There were only 8 families where the father was about 61 inches tall, and 15 where the father was 62 inches: lots of room for chance error.

    Set E, page 175

    1. False. There are two completely different groups of men here (see the diagram below). The ones who are 63 inches tall are in the vertical strip. They average 138 pounds in weight, as shown by the cross. The ones who weighed 138 pounds are in the horizontal strip. Their average height is shown by a heavy dot, and it's a lot more than 63 inches. Remember, there are two regression lines: one for weight on height, one for height on weight.

    [Figure: scatter diagram of weight against height (in), with a vertical strip at 63 inches and a horizontal strip at 138 pounds, showing the SD line, the line that predicts weight from height, and the line that predicts height from weight.]

    2. False. The fathers only average 69 inches; you have to use the other line.

    3. False. This is just like exercises 1 and 2. (A typical student at the 69th percentile of the first-year tests should be at the 58th percentile on the SAT; use the other line.)

    Chapter 11. The R.M.S. Error for Regression

    Set A, page 184

    1. B is tall and chubby, while D is short and skinny.

    2. (a) False. (b) True.

    3. Prediction errors = -7, 1, 3, -1, 4; r.m.s. error ≈ 3.9.

    4. (a) 0.2 (b) 1 (c) 5.

    5. A few thousand dollars.

    6. The one with the smaller r.m.s. error should be used, as it will be more accurate overall.
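    The r.m.s. error in exercise 3 is the square root of the mean of the squared prediction errors. A minimal Python sketch; the helper name `rms` is illustrative:

```python
import math

def rms(values):
    """Root-mean-square: square the entries, take the mean, then the square root."""
    return math.sqrt(sum(v * v for v in values) / len(values))

errors = [-7, 1, 3, -1, 4]      # prediction errors from exercise 3
print(round(rms(errors), 1))    # 3.9
```

The squares are 49, 1, 9, 1, and 16; their mean is 15.2, and the square root of that is about 3.9.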


    7. (a) 8 points: one r.m.s. error. (b) 16 points: two r.m.s. errors.

    8. (a) $20,000. (b) The horizontal line. See p. 183.

    Set B, page 187

    1. √(1 - 0.6²) × 10 = 8 points.

    2. (a) Guess the average, 65.
    (b) 10. If you use the regression line, the r.m.s. error is given by the formula (exercise 1). If you use the average, the r.m.s. error is the SD. (See exercises 9-10 on p. 71.)
    (c) Use the regression line, and the r.m.s. error is given by the formula as 8 points (exercise 1).

    3. Generally, it helps to have more information. The r.m.s. error will be smaller for person B, by the factor √(1 - 0.6²) = 0.8. See p. 186.

    Set C, page 189

    1. (a) (iii) (b) (ii) (c) (i)

    2. (a) (i), (ii) (b) not used (c) (iii)

    3. (a) SD of y ≈ 1 (b) SD of residuals ≈ 0.6

    (c) SD of y in strip ≈ 0.6, about the same as the SD of the residuals.

    Comment. The vertical scatter in the strip is about the same as the r.m.s. error of the regression line, but the vertical scatter in the whole diagram is a lot more than the vertical scatter in the strip.

    Set D, page 193

    1. (a) True.
    (b) True; the scatter diagram is homoscedastic, so the subjects are off the regression line by similar amounts in each vertical strip.
    (c) False, because the scatter diagram is heteroscedastic; 9 points is a sort of average amount off, but the prediction errors are going to be bigger with high scores.

    2. (a) √(1 - 0.5²) × 2.7 ≈ 2.3 inches.
    (b) 71 inches: regression method.
    (c) 2.3 inches. The scatter diagram is homoscedastic, so the sons' heights are off the regression line by similar amounts, for any father's height. The amount off is the r.m.s. error of the line.
    (d) The prediction is 68 inches, and it is likely to be off by 2.3 inches or so.

    3. (a) √(1 - 0.37²) × $20,000 ≈ $18,600.
    (b) $24,500: regression method.
    (c) This cannot be determined from the information given. The $18,600 is sort of the average amount off the line. But the scatter diagram is heteroscedastic, so the amount off the line changes from strip to strip. The spread in incomes is larger for more highly educated people, so the amount off will be larger than $18,600.
    (d) The prediction is $7,100. The amount off cannot be determined, but will be less than $18,600.


    4. The husband is between 20 and 30 years of age.

    5. (a) 50, 15 (b) 50, 15 (c) 0.95 (d) 25, 5
    (e) 0.5: attenuation. See exercise 9 on p. 144 and exercises 1-2 on pp. 145-146.

    6. (a) The SD for all the wives is much bigger. That is the main point of exercises 4-6. See the comments below.
    (b) The two SDs are about the same.

    Comments. If you just take the families where the husband is 20 to 30 years of age, the wives are going to be much more similar in age: their SD drops from about 15 years to about 5 years. If you take the husbands born in March, that does not cut down the variability in the ages of their wives. Smaller samples do not generally have smaller SDs (exercise 8 on p. 71). But if you restrict the range of x, that will generally reduce the SD of y.

    7. (a) 68 inches, the average.
    (b) 3 inches, the SD.
    (c) Regression. If one twin is 6 ft 6 in, guess 6 ft 5½ in for the other one.
    (d) √(1 - 0.95²) × 3 ≈ 0.9 inches.

    Comments. (i) If r = 1, you should guess that the height of the second twin equals the height of the first one. But r is a little less than 1. So you regress the second twin back toward the mean, a little bit.
    (ii) The answer to (d) is quite a bit smaller than the answer to (b). When r = 0.95, there is quite a large reduction in r.m.s. error when you use the regression line.
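    Set B and Set D above all use the same formula: r.m.s. error of the regression line = √(1 - r²) × SD of y. A minimal sketch of that formula, applied to the (r, SD) pairs quoted in the answers; the function name is illustrative:

```python
import math

def rms_error_of_regression(r, sd_y):
    """r.m.s. error of the regression line: sqrt(1 - r**2) times the SD of y."""
    return math.sqrt(1 - r ** 2) * sd_y

print(round(rms_error_of_regression(0.6, 10), 1))        # Set B, ex. 1: 8.0 points
print(round(rms_error_of_regression(0.5, 2.7), 1))       # Set D, ex. 2(a): 2.3 inches
print(round(rms_error_of_regression(0.37, 20_000), -2))  # Set D, ex. 3(a): 18600.0 dollars
print(round(rms_error_of_regression(0.95, 3), 1))        # Set D, ex. 7(d): 0.9 inches
```

The formula gives one overall figure. As Set D, exercise 3(c) points out, it describes the typical error in every strip only when the scatter diagram is homoscedastic.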

    Set E, page 197

    1. (a) [Figure: normal curve with the average, 63 in, and 68 in marked; percent ≈ 2%.]
    (b) New average ≈ 63.9 inches, new SD ≈ 2.4 inches. [Figure: normal curve with the new average, 63.9 in, and 68 in marked; in standard units, (68 - 63.9)/2.4 ≈ 1.70; percent ≈ 4%.]

    2. (a) 14% (b) 33%

    3. (a) 38% (b) 60%

    Chapter 12. The Regression Line

    Set A, page 207

    1. (a) $2,000 × 8 + $5,000 = $21,000
    (b) $2,000 × 12 + $5,000 = $29,000
    (c) $2,000 × 16 + $5,000 = $37,000


    2. (a) 240 ounces = 15 pounds (b) 20 ounces.
    (c) 3 ounces of nitrogen yields 18 lb 12 oz of rice; 4 ounces of nitrogen yields 20 pounds of rice.
    (d) Controlled.
    (e) Yes. The line fits quite well (r = 0.95), and 3 ounces is close to a value that was used.
    (f) No. That's too far away from the amounts used.

    3. (a) Predicted son's height = 0.5 × father's height + 35 inches.
    (b) Predicted father's height = 0.5 × son's height + 33.5 inches.

    Comment. There are two regression lines: one predicts son's height from father's height, the other predicts father's height from son's height (section 5 of chapter 10).

    4. This testimony is an overstatement. Associations in the data may be due to confounding. Without doing the experiment, or working very hard at the observational data, you can't be sure what the impact of interventions will be.

    Set B, page 210

    1. With 12 years of education, height is predicted as 69.75 inches; with 16 years, height is predicted as 70.75 inches. Going to college clearly has no effect on height. This observational study picked up a correlation between height and education due to some third factor in family background.

    2. 439.16 cm, 439.26 cm. Hanging a bigger weight on the wire makes it stretch more. You can trust the regression line in exercise 2 because it is based on an experiment. In exercise 1, the line was fitted to data from an observational study.

    3. (a) 540 + 110 = 650 (b) 540 (c) Greater than (p. 208).

    4. (a) 540 (b) 540 (c) Greater than (p. 208).

    Comment. If you use the average value of y to predict y, the r.m.s. error is the SD of y; see p. 183.

    5. The regression line makes the smallest r.m.s. error (p. 208).
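    Exercise 3 of Set A gives both regression lines for the father-son data. Both lines pass through the point of averages, and the product of the two slopes is r² (each slope is r times a ratio of SDs, and the ratios cancel). The sketch below recovers the averages and r from the two equations alone. It assumes r is positive, which both positive slopes imply, and uses no other data.

```python
import math

# Son on father:  son    = a1 * father + b1   (Set A, exercise 3(a))
# Father on son:  father = a2 * son    + b2   (Set A, exercise 3(b))
a1, b1 = 0.5, 35.0
a2, b2 = 0.5, 33.5

# Both lines go through (average father, average son). Substitute the
# first line into the second and solve for the father's average:
#   father = a2 * (a1 * father + b1) + b2
avg_father = (a2 * b1 + b2) / (1 - a1 * a2)
avg_son = a1 * avg_father + b1

# slope1 * slope2 = (r * SD_son / SD_father) * (r * SD_father / SD_son) = r**2
r = math.sqrt(a1 * a2)

print(avg_father, avg_son, r)   # 68.0 69.0 0.5
```

Because both slopes equal r here, the two SDs must be equal. The answer key does not give their common value.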

    Part IV. Probability

    Chapter 13. What Are the Chances?

    Set A, page 225

    1. (a) (vi) (b) (iii) (c) (iv) (d) (i) (e) (ii) (f) (v) (g) (vi)

    2. About 500.

    3. About 1,000.

    4. About 14.

    5. Box (ii), because [l] pays more than [1], and the other ticket is the same.

    Set B, page 227

    1. (a) The question is about the second ticket, not the first: see part (a) of example 2. The answer is 1/4.


    (b) 1/3; there are 3 tickets left after [1] is drawn.

    2. (a) 1/4 (b) 1/4. With replacement, the box stays the same.

    3. (a) 1/2 (b) 1/2. The chances for the 5th toss of the penny do not depend on the results of the first 4 tosses.

    4. (a) 1/52 (b) 1/48. This is like example 2 on p. 226.

    Set C, page 229

    1. (a) 12/51 (b) 13/52 × 12/51 = 1/17 ≈ 6%.

    2. (a) 1/6 (b) 1/6 × 1/6 × 1/6 = 1/216 ≈ 1/2 of 1%.

    3. (a) 4/52 (b) 4/52 × 4/51 × 4/50 ≈ 5/10,000.

    Comment. In this exercise, the cards are dependent; in exercise 2, the rolls were independent.

    4. "At least one ace" is the better option: you would choose an exam in which you had to get at least one question right out of six, over an exam in which you had to get all six right.
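    The products in Set C are the multiplication rule: the chance of the first event, times the conditional chance of the next one given the ones before. A sketch with Python's exact `fractions.Fraction`, so that 1/17 and 1/216 come out exactly rather than as rounded decimals:

```python
from fractions import Fraction

# Multiplication rule, term by term, as in the answers above
ex1b = Fraction(13, 52) * Fraction(12, 51)                  # cards: dependent
ex2b = Fraction(1, 6) * Fraction(1, 6) * Fraction(1, 6)     # die rolls: independent
ex3b = Fraction(4, 52) * Fraction(4, 51) * Fraction(4, 50)  # cards: dependent

print(ex1b, f"{float(ex1b):.1%}")                    # 1/17 5.9%
print(ex2b, f"{float(ex2b):.2%}")                    # 1/216 0.46%
print(ex3b, f"{float(ex3b) * 10_000:.1f} in 10,000")  # 8/16575 4.8 in 10,000
```

With the cards, the second factor has 51 in the denominator because one card is already gone. With the die, every factor stays 1/6.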

    5. This is fine; it's the multiplication rule.

    6. The coin has to land "tails, heads"; the chance is 1/4.

    7. (a) 1/8
    (b) 1 - 1/8 = 7/8
    (c) 7/8; you get at least one tail when you don't get three heads: so (b) and (c) are the same.
    (d) 7/8; just switch heads and tails in (c).

    Set D, page 232

    1. (a) independent: if you get a white ticket, there is 1 chance in 3 to get "1" and 2 chances in 3 to get "2"; if you get the black ticket, the chances for the numbers stay the same.
    (b) independent
    (c) dependent: with the white tickets, there is only 1 chance in 3 to get "2"; with the black tickets, there are 2 chances in 3.

    2. (a,b) independent (c) dependent

    Comment. This kind of box will come up again in chapter 27. Here is the argument for (a). Suppose you draw a ticket, and see the first number is 4 but don't see the second number: the chance that the second number will be 3 is 1/2. Likewise if the first number is 1. That is independence.

    3. Ten years is 520 weeks, so the chance is (999,999/1,000,000)^520 ≈ 0.9995.

    Comment. In the New York State Lotto, your chance of winning something is about 1/12,000,000.

    4. This is false. It's like saying someone doesn't have a temperature because you can't find the thermometer. To figure out whether two things are independent or not, you pretend to know how the first one turned out, and then see if the chances for the second change. The emphasis is on the word "pretend."
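    Exercise 3 applies the multiplication rule 520 times over. A minimal sketch, assuming, as the answer does, that the weekly drawings are independent and that each week's ticket has 1 chance in 1,000,000 of winning:

```python
# Weekly drawings assumed independent, each ticket with 1 chance in 1,000,000.
p_lose_one_week = 999_999 / 1_000_000
weeks = 52 * 10                          # ten years
p_never_win = p_lose_one_week ** weeks   # multiplication rule, repeated
print(weeks, round(p_never_win, 4))      # 520 0.9995
```

So the chance of winning at least once in ten years is only about 1 - 0.9995 = 0.0005.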


    5. (a) 5% (b) 20%

    To figure (a) out, suppose you have 80 men and 20 women in the class. You also have 15 cards marked "freshman" and 85 cards marked "sophomore." You want to give out a card to each student, so that as few women as possible get "sophomore." The strategy is to give a sophomore card to each man; you are left with 5, which have to go to 5 women. The 15 freshman cards go to the other 15 women.

    Comment. If year and sex are independent, the percentage of sophomore women would be 85% of 20% = 17%, between the two extremes.
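    The card-dealing argument in exercise 5 generalizes. In a class of 100, the smallest possible number of sophomore women is whatever is left after every man gets a sophomore card. The largest is the smaller of the two groups. A sketch; the function name is hypothetical:

```python
def sophomore_women_range(pct_women, pct_sophomores):
    """Smallest and largest possible percentage of sophomore women in a
    class of 100, knowing only the percentage of women and of sophomores."""
    pct_men = 100 - pct_women
    # Fewest: every man gets a sophomore card; any left over go to women.
    smallest = max(0, pct_sophomores - pct_men)
    # Most: every woman gets a sophomore card, if there are enough.
    largest = min(pct_women, pct_sophomores)
    return smallest, largest

print(sophomore_women_range(20, 85))   # (5, 20)

# Under independence: 85% of the 20% who are women.
independent = 85 * 20 / 100
print(independent)                     # 17.0
```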

    6. Same as previous exercise: the chance of getting a sophomore woman equals the percentage of sophomore women in the class.

    7. False. The calculation assumes that the percentage of women is the same across all age groups, and it isn't: women live longer than men. (Actually, women age 85 and over accounted for nearly 1.1% of the U.S. population in 2002.)

    8. If the subject draws the ace of spades from the small pile, he has 13 chances in 52 to draw a spade from the big deck, and win the prize. Likewise if he draws the deuce of clubs. Or any other card. So the answer is 13/52 = 1/4.

    Chapter 14. More about Chance

    Set A, page 240

    1. The chance is 4/36.

    2. There are 25 possible results; for 5 of them, the sum is 6. So the chance is 5/25. (The figure is not shown.)

    3. Most often, 7; least often, 2 and 12. (Use figure 1 to get the chance of each total, as in exercise 1.)

    4. (a) 2/4 (b) 2/6 (c) 3/6

    Set B, page 242

    1. False. The question is about the number of children who had either cookies or ice cream, including the gluttons who had both. The number depends on the choices made by the children, and two possibilities are shown in the table.

        Cookies only   Ice cream only   Both   Neither
             12              17           0       21
              3               8           9       30

    In the first case, 12 children had cookies only, 17 children had ice cream only, 0 had both, and 21 had neither. So 12 + 17 = 29 had cookies or ice cream. The second line shows another possibility, where 9 children had both cookies and ice cream. In this situation, the number with cookies or ice cream is 3 + 8 + 9 = 20. Just as a check: the number with cookies is 3 + 9 = 12, and the number with ice cream is 8 + 9 = 17, as given in the problem. But the number with cookies or ice cream is not 12 + 17, because the addition double counts the 9 gluttons. The number who had cookies or ice cream depends on the number of gluttons who had both.
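    The correction in exercise 1 is the general rule for "or" when the two groups can overlap: add the two counts, then subtract the overlap once. A sketch using the counts from the table; the function name is illustrative:

```python
def cookies_or_ice_cream(n_cookies, n_ice_cream, n_both):
    """Number who had cookies or ice cream (or both). Adding the two
    counts counts the children who had both twice, so subtract them once."""
    return n_cookies + n_ice_cream - n_both

print(cookies_or_ice_cream(12, 17, 0))   # 29: first line of the table
print(cookies_or_ice_cream(12, 17, 9))   # 20: second line of the table
```

The plain sum 12 + 17 is right only when nobody had both, that is, when the two groups are mutually exclusive.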


    2. (a) 4/20 (b) 8/20 (c) 12/20 (d) 14/20

    Comment. (4 + 8 + 12)/20 gives the wrong answer to (d), by double-counting some dots and triple-counting others.

    3. They are the same.

    4. False. Simply adding the two chances double counts the chance of 81:] . See example 5 on p. 242.

    5. False. There is 1 chance in 10 of getting ITJ on any particular draw, but these events are not mutually exclusive.

    6. True. 100% - (10% + 20%) = 70%. Use the addition rule, and p. 223 for the subtraction.

    Set C, page 246

    1. (a) 1/52 of the contestants step forward.
    (b) 1/52 of the contestants step forward; example 2 in chapter 13.
    (c) The ones who got both the ace of hearts on the first card and the king of hearts on the second card step forward twice. (In terms of getting the weekend, that's overkill.) The fraction who step forward twice is 1/52 × 1/51.
    (d) False; as (c) shows, the events aren't mutually exclusive, so addition double counts the chance that both occur.

    Comment. The chance in (d) is 1/52 + 1/52 - 1/52 × 1/51.

    2. (a) 1/52 of the contestants step forward.
    (b) 1/52 of the contestants step forward.
    (c) If you get the ace of hearts on the first card, you can't get it on the second card; nobody steps forward twice.
    (d) True; as (c) shows, the events are mutually exclusive, so addition is legitimate.

    Comment. In exercise 2, the two ways to win are mutually exclusive; not so in exercise 1. Addition is legitimate in exercise 2, not in 1.

    3. (a,b) True; see example 2 in chapter 13.
    (c) False. "Top card is the jack of clubs" and "bottom card is the jack of diamonds" aren't mutually exclusive, so you can't add the chances.
    (d) True. "Top card is the jack of clubs" and "bottom card is the jack of clubs" are mutually exclusive.
    (e,f) False; these events aren't independent, so you need the conditional chances.


    4. (a) False; 1/2 × 1/3 = 1/6, but A and B may be dependent: you need the conditional chance of B given A.
    (b) True; see section 4 of chapter 13.
    (c) False. ("Mutually exclusive" implies dependence, and the chance is actually 0.)
    (d) False; 1/2 + 1/3 = 5/6, but you can't add the chances because A and B may not be mutually exclusive.
    (e) False; if they're independent, they have some chance of happening together, so they can't be mutually exclusive: don't add the chances.
    (f) True.

    Comment. If you have trouble with exercises 3 and 4, look at example 6, p. 244.

    5. See example 2 in chapter 13. (a) 4/52 (b) 4/51 (c) 4/52 × 4/51

    Set D, page 250

    1. (a) (i) (b) (i), (ii) (c) (iii) (d) (ii), (iii) (e) (i), (ii) (f) (i)

    2. Bets (a) and (f) say the same thing in different language. So do (b) and (e). Bet (d) is better than (c).

    3. (a) 3/4 (b) 3/4 (c) 9/16 (d) 9/16 (e) 1 - 9/16 = 7/16

    4. (a) Chance of no aces = (5/6)^3 ≈ 58%, so chance of at least one ace ≈ 42%. Like de Méré, with 3 rolls instead of 4.
    (b) 67% (c) 89%

    5. 1 - (35/36)^36 ≈ 64%

    6. The chance that the point 17 will not come up in 22 throws is (31/32)^22 ≈ 49.7%. The chance that it will come up in 22 throws is therefore 100% - 49.7% = 50.3%. So this wager (laid at even money) was also favorable to the Master of the Ball. Poor Adventurers.

    7. The chance of surviving 50 missions is (0.98)^50 ≈ 36%. Deighton is adding chances for events that are not mutually exclusive.

    Chapter 15. The Binomial Coefficients

    Set A, page 258

    1. The number is 4.

    2. The number is 6.

    3. (a) (5/6)^4 = 625/1,296 ≈ 48%
    (b) 4(1/6)(5/6)^3 = 500/1,296 ≈ 39%
    (c) 6(1/6)^2(5/6)^2 = 150/1,296 ≈ 12%
    (d) 4(1/6)^3(5/6) = 20/1,296 ≈ 1.5%
    (e) (1/6)^4 = 1/1,296 ≈ 0.08 of 1%
    (f) Addition rule: (150 + 20 + 1)/1,296 ≈ 13%.
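    Each line of exercise 3 is the binomial formula: the binomial coefficient, times (1/6) raised to the number of aces, times (5/6) raised to the number of other faces. A sketch with exact fractions that reports each chance as a numerator over 1,296; `binomial_chance` is an illustrative name:

```python
from fractions import Fraction
from math import comb

def binomial_chance(n, k, p):
    """Chance of exactly k successes in n independent trials,
    each with chance p of success (the binomial formula)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

p_ace = Fraction(1, 6)
chances = {k: binomial_chance(4, k, p_ace) for k in range(5)}

for k, chance in chances.items():
    # number of aces, numerator over 1,296, chance as a percentage
    print(k, chance * 1296, f"{float(chance):.1%}")

part_f = sum(chances[k] for k in (2, 3, 4))   # (150 + 20 + 1)/1,296
print(part_f * 1296)                          # 171
```

The five chances add up to 1,296/1,296, because exactly one of the counts 0 through 4 must happen.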

    4. This is the same as exercise 3(a-c). Rolling an ace is like drawing a red marble,while 2 through 6 correspond to green. To see why, imagine two people, A and B,performing different chance experiments:


    A rolls a die four times and counts the number of aces. B draws four times at random with replacement from the box | [R] [G] [G] [G] [G] [G] | and counts the number of R's.

    The equipment is different, but as far as the chance of getting any particular number of reds is concerned, the two experiments are equivalent.

    There are four rolls, just as there are four draws. The rolls are independent; so are the draws. Each roll has 1 chance in 6 to contribute one to the count (ace); similarly for each draw (red).

    5. The chance of getting exactly 5 heads is 10!/(5! 5!) × (1/2)^10 = 252/1,024 ≈ 25%. The chance of getting exactly 4 heads is 10!/(4! 6!) × (1/2)^10 = 210/1,024 ≈ 21%. The chance of getting exactly 6 heads is the same. By the addition rule, the chance of getting 4 through 6 heads is 672/1,024 ≈ 66%.

    6. You need the chance of getting 7, 8, 9, or 10 heads when a coin is tossed 10 times. Use the binomial formula, and the addition rule:

    10!/(7! 3!) × (1/2)^10 + 10!/(8! 2!) × (1/2)^10 + 10!/(9! 1!) × (1/2)^10 + 10!/(10! 0!) × (1/2)^10 = 176/1,024 ≈ 17%.

    Comment. Looks like chance, not vitamins.
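    For a fair coin, every sequence of 10 tosses has the same chance, (1/2)^10 = 1/1,024, so the chances in exercises 5 and 6 reduce to sums of binomial coefficients over 1,024. A sketch using Python's `math.comb`:

```python
from math import comb

tosses = 10
sequences = 2 ** tosses                        # 1,024 equally likely sequences

print(comb(10, 5), comb(10, 4), comb(10, 6))   # 252 210 210

four_to_six = sum(comb(tosses, k) for k in range(4, 7))     # exercise 5
seven_or_more = sum(comb(tosses, k) for k in range(7, 11))  # exercise 6
print(four_to_six, f"{four_to_six / sequences:.0%}")        # 672 66%
print(seven_or_more, f"{seven_or_more / sequences:.0%}")    # 176 17%
```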

    Part V. Chance Variability

    Chapter 16. The Law of Averages

    Set A, page 277

    1. The error is 50 in absolute terms, 5% in percentage terms.

    2. The error is 1,000 in absolute terms, 1/10 of 1% in percentage terms. Compare this with the previous exercise: the chance error has gone up in absolute terms (from 50 to 1,000) but down in percentage terms (from 5% to 1/10 of 1%).

    3. False. The chance stays at 50%. See p. 274.

    4. (a) Ten tosses. As the number of tosses goes up, you are more and more likely to be close to 50% heads, and less and less likely to be above 60% heads. Here, chance variability in the percentages helps you, so a small number of tosses is better than a large number.
    (b) One hundred tosses: now chance variability in the percentages hurts you, because you want to be close to 50%. With more tosses, there is less chance variability in the percentages. More tosses are better.
    (c) One hundred tosses; like (b).
    (d) Ten tosses. As the number of tosses goes up, there is less and less chance for the number of heads to exactly equal the expected number. Let's take a more extreme case: suppose you toss the coin 1,000,000 times. The chance of getting exactly 500,000 heads (rather than 500,001 or 500,043 or 499,997 or some other number close to 500,000) is quite slim.
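    Part 4(d) can be made concrete by computing the chance of exactly n/2 heads as n grows. The sketch works on the log scale with `math.lgamma` (so that log n! = lgamma(n + 1)). This is an implementation choice, not a method from the text: it lets the million-toss case fit comfortably in a float.

```python
from math import exp, lgamma, log

def chance_exactly_half(n):
    """Chance of exactly n/2 heads in n tosses of a fair coin (n even).
    Computes C(n, n/2) / 2**n on the log scale, using lgamma(m + 1) = log(m!)."""
    half = n // 2
    log_chance = lgamma(n + 1) - 2 * lgamma(half + 1) - n * log(2)
    return exp(log_chance)

for n in (10, 100, 10_000, 1_000_000):
    print(n, f"{chance_exactly_half(n):.4%}")
```

For 10 tosses this is 252/1,024, about 25%. For a million tosses it falls below 1 in 1,000, which is the "quite slim" of part (d).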


    5. Option (i) is better. This is just like exercise 4(a).

    6. Option (ii); the reason is chance error.

    7. It's about the same with or without replacement.

    8. Same. Both have 50% I-II's and 50% [!] s.

    9. Eventually, the chance error would be large and negative. Then, it would get positive again. In absolute terms, the swings get wilder and wilder.

    Set B, page 280

    1. 47 × 1 + 53 × 2 = 153.

    2. (a) 100, 200 (b) 50, 50 (c) 50 × 1 + 50 × 2 = 150.

    3. (a) 100, 900. (b) 33 × 1 + 33 × 2 + 33 × 9 ≈ 400.

    Comment. 400 isn't halfway between 100 and 900.

    4. Guess 500 in all three cases; (iii) is best, (i) worst.

    5. The chance for "1" is 1 in 10; the chance for "3 or less" is 3 in 10; the chance for "4 or more" is 7 in 10: there are 7 numbers from 4 through 10 inclusive. Drawing at random from boxes is discussed in chapters 13-14.

    6. Box (i) is better: it has fewer -1s, and the same 2.

    7. Options (i) and (ii) do it. Your net gain is the sum of your wins and losses, taking signs into account.

    Set C, page 284

    1. (i) and (ii) are the same. (iii) means that all ten draws must be "1," which is worse than (i).

    2. Option (i) is no good; the sum of the draws is unrelated to the net gain. Option (ii) is no good; it says you win $17 with 2 chances in 36 on a single play, but your chances are 2 in 38. Option (iii) is right. If in doubt, review example 1 on p. 283.

    3. Your net gain is like the sum of 10 draws made at random with replacement from the box I ticket 215 tickets rn 1. This is a terrible game.

    Chapter 17. The Expected Value and Standard Error

    Set A, page 290

    1. (a) 100 × 2 = 200 (b) -25 (c) 0 (d) 6 6 ~

    Comment on (d). The "expected value" need not be one of the possible values. It's like saying that the average family has 2.1 children. This is sensible, even though the "average family" is a statistical fiction.

    2. This is the same as the expected value for the sum of two draws from the box | [1] [2] [3] [4] [5] [6] |. So the answer is 2 × 3.5 = 7 squares.


    3. The model is given on pp. 283-284. The average of the numbers in the box is

    ($35 - $37)/38 = -$2/38 ≈ -$0.05

    (To compute the average, you have to add up the tickets in the box; the [+$35] ticket adds $35 to the total, but the 37 [-$1]'s take $37 away; then you have to divide by the number of tickets in the box, which is 38.) The expected net gain is equal to 100 × (-$0.05) = -$5. You can expect to lose around $5.

    4. The box is on p. 283. The average of the box is

    ($18 - $20)/38 = -$2/38 ≈ -$0.05

    (The average is the total of the numbers in the box, divided by 38; the 18 tickets marked "+$1" contribute $18 to the total, while the 20 tickets marked "-$1" take $20 away.) The expected net gain is 100 × (-$0.05) = -$5.

    Comment. Exercises 3 and 4 show that with either bet (number or red-or-black), you can expect to lose 1/19 of your stake on each play.
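    Exercises 3 and 4 (and exercise 6 below) all come down to the average of a box of tickets. A sketch with exact fractions, where a box is written as a dictionary from ticket value to number of tickets. That representation is a convenience chosen here, not notation from the text. Both bets average -1/19 of a dollar per play, which the answers round to -$0.05.

```python
from fractions import Fraction

def box_average(box):
    """Average of a box written as {ticket value: number of tickets}."""
    total = sum(value * count for value, count in box.items())
    return Fraction(total, sum(box.values()))

single_number = {35: 1, -1: 37}   # $1 on a single number (exercise 3)
red = {1: 18, -1: 20}             # $1 on red (exercise 4)

for name, box in [("single number", single_number), ("red", red)]:
    average = box_average(box)
    # average per play, then expected net gain in 100 plays
    print(name, average, f"{float(100 * average):.2f}")

# Exercise 6: the payoff x on red that makes the average 0, from 18x - 20 = 0
fair_payoff = Fraction(20, 18)
print(fair_payoff, f"{float(fair_payoff):.2f}")   # 10/9 1.11
```

Without rounding, the expected net gain in 100 plays is -$5.26 for either bet, close to the -$5 in the answers.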

    5. -$50. Moral: the more you play, the more you lose.

    6. The average of the box is (18x - $20)/38. To be fair, this has to equal 0. The equation is 18x - $20 = 0. So x ≈ $1.11. They should pay you $1.11.

    7. The Master of the Ball should have paid 31 pounds, just as the Adventurers thought. Moral: the Adventurers may have the fun, but it is the Master of the Ball who has the profit.

    Set B, page 293

    1. (a) The average of the box is 4; the SD is 2. So the expected value for the sum is 100 × 4 = 400; the SE for the sum is √100 × 2 = 20.
    (b) Around 400, give or take 20 or so.
    (c) Guess 400, off by 20 or so. Parts (b) and (c) interpret the numbers in (a).

    2. The net gain is like the sum of 100 draws from the box | [+$1] [-$1] |. The average of the box is $0; the SD is $1. The sum of 100 draws has expected value $0; the SE for the sum is √100 × $1 = $10. So your net gain will be around $0, give or take $10 or so.

    3. With option (ii), the numbers are too close to 50; no number is more than 5 away. With option (iii), the numbers alternate much too regularly. Option (i) is it.

    4. The expected value is 150, the observed value is 157, the chance error is 7, the standard error is 10.

    5. Multiplying the number of draws by 4 multiplies the expected value by 4 and the SE by √4 = 2. The expected value for the sum of 100 draws is 4 × 50 = 200, and the SE is 2 × 10 = 20.

    6. (a) is true, (b) is false: the expected value for the sum of the draws can be computed exactly, as number of draws × average of box. (c) is false, (d) is true: the sum will be off its expected value, and the SE tells you by about how much.

    7. Yes. The chance is small, but positive. If you wait long enough, events of smallprobability do happen.


    Set C, page 296

    1. (a) Smallest, 100; largest, 400.
    (b) The average of the box is 2; the SD is 1. The sum has an expected value of 100 × 2 = 200; the SE for the sum is √100 × 1 = 10. The sum will be around 200, give or take 10 or so.
    (c) [Figure: normal curve with the expected value, 200, and 250 marked; 250 is 5 in standard units. Chance = shaded area ≈ 0%.]

    2. (a) Largest, 900; smallest, 100. (b) Chance ≈ 68%.

    3. (a) The expected value is 0, so the sum is around 0, and your best hope is chance variability in the sum: you want the sum to be far from its expected value. Chance variability goes up with the number of draws; choose 100.
    (b) Same as (a).
    (c) Now chance variability in the sum works against you, because you want the sum to be close to its expected value; choose 10.

    4. (i) Expected value for sum = 500, SE for sum = 30.
    (ii) Expected value for sum = 500, SE for sum = 20.
    Both sums will be around 500, but sum (i) will be further away. In (a) and (b), chance variability helps: choose (i). In (c), chance variability hurts: choose (ii).

    5. 98%.

    6. Either they win $25,000 (with chance 20/38 ≈ 53%) or they lose $25,000 (with chance 18/38 ≈ 47%). The answer is 50%.

    Comment. The casino is much happier with a lot of small bets, where the profit is almost guaranteed, than with one big bet, where there is a lot of risk.

    7. One number will pay off $35,000, but the other 37 will lose, so the gambler loses $2,000 for sure.

    Comment. The casino likes the gamblers to spread their bets.

    8. Option (ii) is right; the SE doesn't go up by a full factor of 2, but only by √2 ≈ 1.4.

    Set D, page 299

    1. (a) No, replace the 5 by 7 - (-2) = 9. (b) Yes. (c) Yes. (d) No: the list shows 3 different numbers, so the short-cut doesn't apply.

    2. The net gain is like the sum of 100 draws from the box | [+$2] [-$1] [-$1] [-$1] |. The average of the box is ($2 - $1 - $1 - $1)/4 = -$0.25. The SD is [$2 - (-$1)] × √(1/4 × 3/4) ≈ $1.30. The net gain in 100 plays will be around 100 × (-$0.25) = -$25, give or take √100 × $1.30 = $13 or so.

    3. (a) From the point of view of the house, a dollar bet on the house special is like one draw from the box


    | 5 tickets [-$6], 33 tickets [+$1] |

    The average of the box is [5 × (-$6) + 33 × $1]/38 ≈ $0.08. So the house expects to make about 8 cents per dollar bet. As far as the house is concerned, this is a great bet.
    (b) The player's net gain is like the sum of 100 draws at random with replacement from the same box with the signs reversed:

    | 5 tickets [+$6], 33 tickets [-$1] |

    The average of the box is -$0.08; the SD is [$6 - (-$1)] × √(5/38 × 33/38) ≈ $2.37. The player's expected net gain in 100 plays is -$8, give or take $24 or so.

    [Figure: normal curve with the expected value marked. Chance = shaded area ≈ 36%.]
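    Set D uses two formulas. The short-cut gives the SD of a box with only two kinds of tickets. For the sum of the draws, expected value = number of draws × average of box, and SE = √(number of draws) × SD of box. A sketch that reproduces exercises 2 and 3(b); the helper names are illustrative:

```python
import math

def two_value_box_sd(big, small, frac_big):
    """Short-cut SD for a box holding only two different numbers:
    (big - small) * sqrt(fraction of bigs * fraction of smalls)."""
    return (big - small) * math.sqrt(frac_big * (1 - frac_big))

def sum_of_draws(n, box_avg, box_sd):
    """Expected value and SE for the sum of n draws made with replacement."""
    return n * box_avg, math.sqrt(n) * box_sd

# Exercise 2: box [+$2] [-$1] [-$1] [-$1], 100 plays
avg2 = (2 - 1 - 1 - 1) / 4
sd2 = two_value_box_sd(2, -1, 1 / 4)
ev2, se2 = sum_of_draws(100, avg2, sd2)
print(f"{ev2:.0f} {se2:.0f}")              # -25 13

# Exercise 3(b): player's box, 5 tickets [+$6] and 33 tickets [-$1], 100 plays
avg3 = (5 * 6 - 33 * 1) / 38
sd3 = two_value_box_sd(6, -1, 5 / 38)
ev3, se3 = sum_of_draws(100, avg3, sd3)
print(f"{sd3:.2f} {ev3:.0f} {se3:.0f}")    # 2.37 -8 24
```

The short-cut applies only because each box holds just two different numbers. Set D, exercise 1(d) shows a list where it does not apply.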

    4. The expected net gain in 100 one-dollar bets on a section is -$5; the SE is $14. The expected net gain in 100 bets on red is -$5; the SE is $10. Options (i) and (ii) have the same expected net gain. But (i) has the bigger SE, that is, more variability: (a) is false, (b) and (c) are true.

    Set E, page 303

    1. (a) You can't add up words, so box (i) is out. With box (iii), you get 2 chances in 3 to go up each time, and it should only be 1 in 2. Box (ii) is the one.
    (b) Average of box = 0.5 and SD of box = 0.5 too. The sum of 16 draws has an expected value of 16 × 0.5 = 8; the SE is √16 × 0.5 = 2. The number of heads will be around 8, give or take 2 or so.

    2. New box: | [0] [0] [0] [0] [1] |. It's 3 SEs; the chance is about 99.7%.

    3. New box: | [0] [1] |. It's 1 SE or more; the chance is about 16%.

    4.
        Group of      Observed   Expected   Chance   Standard
        100 tosses    value      value      error    error
        1-100         44         50         -6       5
        101-200       54         50         +4       5
        201-300       48         50         -2       5
        301-400       53         50         +3       5

    5. Expect about 68 (example 5 on p. 301); actually, you see 69.

    6. (a,b) About 99.7%: it's 3 SEs.

    Comment. When the number of tosses goes up from 10,000 to 1,000,000, the percentage of heads gets closer to 50%: the 99.7%-interval shrinks from 50% ± 1.5% to 50% ± 0.15%.

    7. Expected is 30, observed is 33, chance error is 3, SE is about 3.5.

    8. Put in five 0's and five 1's. Tell it to draw 1,000 times.

    9. It's fine. The number of aces isn't supposed to be 16.67 exactly; it's only supposed to be around 16.67.


    Chapter 18. The Normal Approximation for Probability Histograms

    Set A, page 312

    1. Between 70 and 80 inclusive.

    2. (a) Between 6.5 and 10.5.
    (b) Between 6.5 and 7.5: the left and right edges of the rectangle over 7.

    3. (a) 7
    (b) 7: tallest bar in 2nd panel.
    (c) No, this is just chance variation. In fact 4 is less likely than 5, as the probability histogram in the bottom panel shows.
    (d) (iii). The top panel is an empirical histogram: it shows observed percentages, not chances.

    4. (a) 3, 6
    (b) Bottom panel: the probability histogram shows chances. The values 2 and 3 are equally likely for the product.
    (c) Look at the second panel: 3 appeared more often. Chance variation again.
    (d) The value 14 is impossible for the product. Reason: there are only two ways to factor 14, as 1 × 14 or 2 × 7; no die can show 7 or 14.
    (e) The bottom panel is a probability histogram, so areas under it represent chances: 11.1% is the chance of getting a product of 6 when you roll a pair of dice.

    5. A goes with (i) and B with (ii). B is lower, more spread out, and farther to the right. Box (ii) has a bigger average and a bigger SD.

    6. False. The probability histogram for the sum tells you the chances for the sum. It doesn't tell you how the draws turned out. The shaded area represents the chance that the sum will be in the range from 5 to 10 inclusive. (The box had 85 tickets marked 0, 2 tickets marked 1, and 13 tickets marked 2.)

    Set B, page 318

    1. (i) Exactly 6 heads. (ii) 3 to 7 heads exclusive. (iii) 3 to 7 heads inclusive.

    2. The area between 51.5 and 52.5 under the histogram gives the exact chance. The normal curve is only an approximation (but a very good one).

    3. The expected number of heads is 50; the SE is 5. You want the area of the rectangle over 60 in figure 3, p. 315.

    [Figure: the rectangle over 60 runs from 59.5 to 60.5, which is 1.9 to 2.1 in standard units. Chance = shaded area under the normal curve ≈ 1.085%.]

    Comment. The exact chance is 1.084%.
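The two numbers in exercise 3 of Set B, the exact chance and the normal approximation, can both be computed directly. Here is a minimal sketch in Python. It uses the standard `math` module; the normal area is written in terms of `erf`, and `normal_area_below` is an illustrative helper name.

```python
from math import comb, erf, sqrt

def normal_area_below(z):
    """Area under the standard normal curve to the left of z."""
    return 0.5 * (1 + erf(z / sqrt(2)))

n, p, k = 100, 0.5, 60                  # 100 tosses of a coin, exactly 60 heads
exact = comb(n, k) * p**k * (1 - p) ** (n - k)

ev = n * p                              # expected number of heads: 50
se = sqrt(n) * sqrt(p * (1 - p))        # SE for the number of heads: 5
# The rectangle over 60 runs from 59.5 to 60.5; convert both edges to standard units.
z_lo, z_hi = (k - 0.5 - ev) / se, (k + 0.5 - ev) / se
approx = normal_area_below(z_hi) - normal_area_below(z_lo)

print(f"exact {exact:.3%}, normal approximation {approx:.3%}")
```

The printed values should agree with the 1.084% and 1.085% quoted above.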


    6. (i) 100 (ii) 400 (iii) 900. The histograms get closer to the normal curve as the number of draws goes up.

    7. Choose (i).
    Comment. Chances are given by areas under probability histograms. Often, the corresponding area under the normal curve is a good approximation, but not here: the curve is much higher than the histogram, so the area under the curve is much bigger than the area under the histogram.

    8. Most likely, 105; least likely, 101; expected value, 100.
    Comment. There is a trough in this histogram near the expected value. (With 100 draws the trough has disappeared.)

    9. (a) Much smaller than 50%. The value 276,000 is 0.276 million, about half-way between the 0.2 and the 0.4 on the horizontal axis. The area to the right of this point is much smaller than 50%. (This histogram has a very long right-hand tail, and the expected value is a lot bigger than the median.)

    (b) 1,000,000/100 = 10,000
    (c) 400,000 to 410,000 is a lot more likely, relatively speaking. The box just to the right of 400,000 is relatively much higher than the box just to the left. Products have quite irregular probability histograms.

    Part VI. Sampling
    Chapter 19. Sample Surveys
    Set A, page 349

    1. The population consists of all undergraduates registered in the current term. The parameter is the percentage of these undergraduates living at home.
    2. (a) This is a probability method. It is perfectly definite, chance enters in a planned way (when you choose that random starting point between 1 and 100), and nobody has any discretion as to who gets in the sample.
    (b) The method is different from simple random sampling. For instance, two people whose names are adjacent on the list have no chance to get into the sample together. (Simple random samples are defined in section 4.)
    (c) The sample is unbiased: each person has an equal chance of getting into the sample.
    3. Choose (ii). See pp. 334, 339, and 342.
    4. The population and the sample are the same, namely, all men age 18 in the Netherlands in 1968; there is no room for sampling error.
    5. Doing a survey by telephone could introduce bias, because telephone subscribers are probably different from non-subscribers. However, the percentage of non-subscribers is so small that this bias can usually be ignored. (If you are estimating small percentages, or are interested in the sort of people who might not have telephones, this bias can matter.) Using telephone books would introduce serious bias, since there are many unlisted numbers. See section 7.
    Comment. About 95% of households in the U.S. have telephones, according to Statistical Abstract, 2006, table 1117. The corresponding figure in 1980 was 93%.
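The two claims in exercise 2, (b) and (c), can be checked by enumerating every possible starting point. Here is a minimal sketch in Python. It assumes the method takes the name at the random start and then every 100th name after it. The 1,000-name list and the function names are hypothetical; the text only fixes the random start between 1 and 100.

```python
import random
from collections import Counter

def systematic_sample(names, start, step=100):
    """The name in position `start` (counting from 1) and every step-th name after it."""
    return names[start - 1::step]

def draw_systematic_sample(names, step=100, rng=random):
    """Pick the random starting point between 1 and step, then sample systematically."""
    return systematic_sample(names, rng.randint(1, step), step)

names = [f"person {i}" for i in range(1, 1001)]   # hypothetical list of 1,000 names

# Enumerate all 100 equally likely starting points.
samples = [systematic_sample(names, s) for s in range(1, 101)]
appearances = Counter(name for smp in samples for name in smp)

# Each person turns up under exactly one start, so everyone has the same 1-in-100 chance.
print(set(appearances.values()))
# Adjacent names never land in the same sample, which a simple random sample would allow.
print(any(names[0] in smp and names[1] in smp for smp in samples))
```

Enumerating the starts, rather than simulating them, gives the equal-chance property (c) exactly.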


    6. No. You might expect the respondents interviewed by blacks to be much more critical. (And they were.)
    7. No, this parish might have been quite different from the rest of the South. (It was: Plaquemines is sugar country, and sugar required more highly skilled labor than cotton.)
    8. No. First, the ETS judgment about "representative" schools may have been biased. Next, the schools may not have used good methods to draw a sample of their own students.

    Comment. There are about 3,600 institutions of higher learning in the U.S., including junior colleges, community colleges, and teachers' colleges. About 1,000 of them are very small, altogether enrolling only 10% of the student population. At the other end, there are about 100 schools with enrollments over 20,000, and these account for about one third of the student population.
    9. Quite a bit different from. Non-respondents generally differ from respondents, and early respondents probably differ from late ones. (In the study, the percentage with TB was quite a bit higher among the last 200 respondents: perhaps those people did not want to have their illness confirmed.)

    10. A description of the sample design would be more reassuring than a sales pitch followed by a disclaimer.
    11. With 200 replies out of 20,000 questionnaires, non-response bias is an overwhelming problem. With 200 responses out of 400 questionnaires, the response rate is adequate to show something important: a substantial fraction of high-school biology teachers hold creationist views.
    12. False. The serious problem is non-response bias. Additional people brought into the sample to build it back up to planned size are likely to differ from non-respondents, and do not fix the problem of non-response bias.
    Chapter 20. Chance Errors in Sampling
    Set A, page 361
    1.
    population                          box
    population percentage               40%
    sample                              draws
    sample size                         1,000
    sample number                       number of 1's among the draws
    sample percentage                   percentage of 1's among the draws
    denominator for sample percentage   1,000

    2. The box model: make 400 draws from a box with 10,000 1's and 15,000 0's. The average of the box is 0.40, and the SD is about 0.5, so the expected value for the sum is 400 x 0.4 = 160 and the SE for the sum is √400 x 0.5 ≈ 10.
    (a) EV for number = 160 and SE for number = 10.
    (b) EV for percent = (160/400) x 100% = 40%, and SE for percent = (10/400) x 100% = 2.5%.
    (c) 40%, 2.5%.

    Comments. (i) Parts (b) and (c) call for the same numbers; in part (c) you have to interpret the results. (ii) The expected value for the sample percentage is the population percentage (p. 359).
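The arithmetic in exercise 2 follows one pattern for any 0-1 box. Here is a minimal sketch in Python. The helper name `zero_one_box` is illustrative. It uses the shortcut SD of a 0-1 box, which is the square root of (fraction of 1's) x (fraction of 0's), and assumes draws with replacement.

```python
from math import sqrt

def zero_one_box(n_draws, frac_ones):
    """EV and SE for the number and the percentage of 1's among the draws."""
    sd_box = sqrt(frac_ones * (1 - frac_ones))   # shortcut SD for a 0-1 box
    ev_number = n_draws * frac_ones
    se_number = sqrt(n_draws) * sd_box
    # Convert the number of 1's to a percentage of the draws.
    ev_percent = 100 * ev_number / n_draws
    se_percent = 100 * se_number / n_draws
    return ev_number, se_number, ev_percent, se_percent

# Exercise 2: 400 draws from a box with 10,000 1's and 15,000 0's.
ev_n, se_n, ev_p, se_p = zero_one_box(400, 10_000 / 25_000)
print(f"number: {ev_n:.0f} give or take {se_n:.1f}; percent: {ev_p:.0f}% give or take {se_p:.2f}%")
```

The exact SD of this box is about 0.49, so the code gives an SE of about 9.8 (2.45%). The answer rounds the SD to 0.5, which gives the 10 and 2.5% above.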


    3. The SE for the number of heads is √10,000 x 0.5 = 50. The SE for the percent is (50/10,000) x 100% = 0.5 of 1%.
    4. (a) and (b) are both true.
    Comment. When drawing at random from a 0-1 box, the EV for the percentage of 1's among the draws equals the percentage of 1's in the box. This is so whether the draws are made with or without replacement. The equality is exact.
    5. False. They forgot to change the box. The number of 1's is like the sum of 400 draws from the box | 0 | 0 | 0 | 1 | 0 |.
    6. 10% ± 1%. The number of red marbles in the sample is 90 ± 9. If the number is 1 SE too high, it's 90 + 9: now convert to percent out of 900. The SE for a percentage is added to or subtracted from the expected value, not multiplied.
    7. The total distance advanced equals the total number of spots thrown. This is like the sum of 200 draws (at random with replacement) from the box | 1 | 2 | 3 | 4 | 5 | 6 |.
    The average of this box is 3.5, and the SD is 1.7. So he can expect to advance around 200 x 3.5 = 700 squares, give or take √200 x 1.7 ≈ 24 squares or so.
    8. Sherlock Holmes is forgetting about chance error.
    Set B, page 366
    1. (a) The expected value for the percentage of reds in the sample equals the percentage of reds in the population. (Population = box, sample = draws.) See p. 359.
    (b) As the number of draws goes up, the SE for the number of reds in the sample goes up but the SE for the percentage of reds goes down. See p. 360.
    2. The first thing to do is to set up a box model. There should be 30,000 tickets in the box, one for each registered voter; 12,000 are marked 1 (Democrat) and 18,000 are marked 0. The number of Democrats in the sample is like the sum of 1,000 draws from the box. The fraction of 1's in the box is 0.4. The expected value for the sum is 1,000 x 0.4 = 400. The SD of the box is √(0.4 x 0.6) ≈ 0.49. The SE for the sum is √1,000 x 0.49 ≈ 15.
    (a) The expected value for the percent is 400 out of 1,000, or 40%. The SE for the percent is 15 out of 1,000, or 1.5%. (No surprise about the expected value: 40% of the registered voters are Democrats.)
    (b) The percentage of Democrats in the sample will be around 40.0%, give or take 1.5% or so. Parts (a) and (b) require the same calculations; in (b), you have to interpret the results.
    (c) This is 0.67 SE, so the chance is about 48%.
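For a box that is not 0-1, such as the die in exercise 7 of Set A, you need the actual average and SD of the tickets. Here is a minimal sketch in Python. It uses the standard `statistics` module, with `pstdev` because the SD of a box is computed over all its tickets. The helper name `sum_of_draws` is illustrative.

```python
from math import sqrt
from statistics import mean, pstdev

def sum_of_draws(box, n_draws):
    """EV and SE for the sum of n_draws draws made at random with replacement from box."""
    avg, sd = mean(box), pstdev(box)   # average and SD of the tickets in the box
    return n_draws * avg, sqrt(n_draws) * sd

ev, se = sum_of_draws([1, 2, 3, 4, 5, 6], 200)   # exercise 7: 200 throws of a die
print(f"about {ev:.0f} squares, give or take {se:.0f} or so")
```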

    3. (a) There should be 100,000 tickets in the box, one for each person in the population, of which 60,000 are marked 1 (married) and 40,000 are marked 0. The number of married people in the sample is like the sum of 1,600 draws from the box. The expected value for the sum is 1,600 x 0.6 = 960. The SD of the box is √(0.6 x 0.4) ≈ 0.5. The SE for the sum is √1,600 x 0.5 = 20. The number of married people in the sample will be 960, give or take 20 or so. Now 960 out of 1,600 is 60%, and 20 out of 1,600 is 1.25%. So 60% of the people in the sample will be married, give or take 1.25% or so.


    [Figure: normal curve centered at the expected value of 60%; the area below 58% (-1.6 in standard units) is shaded. Chance = shaded area ≈ 5%.]

    (b) There should be 100,000 tickets in the box, of which 10,000 are marked 1 (income over $75,000) and the other 90,000 are marked 0. There are 1,600 draws. The chance is about 9%.
    (c) The box has 100,000 tickets, of which 20,000 are marked 1 (college degree) and the other 80,000 are marked 0. There are 1,600 draws. The chance is about 68%.
    4. The shaded area represents the chance of drawing a sample in which 22% or more of the sample persons earn more than $50,000 a year.
    5. (a) the chance that the sample will have 88 high earners
    (b) the chance that the sample will have 22% high earners
    (c) 88 is 22% of 400, so the same chance is described in two different ways. No coincidence at all.
    Set C, page 370
    1. Option (iii) is right. That is the point of the section.
    2.
    Number of draws    SE for percentage of 1's among draws
    2,500              1%
    25,000             0.27 of 1%
    100,000            0%

    Comment. After 100,000 draws, there are no more tickets in the box, and no uncertainty about the percentage of 1's among the draws.
    3. The sample size should be 2,500.
    4. The SE is the same for all three boxes, because all three have the same fraction of 1's, so the same SD.
    5. SE with replacement = 20%; SE without replacement = correction factor x 20% ≈ 16%.

    Comment. This is an artificial example where the number of draws is a large fraction of the number of tickets in the box, so the correction factor really kicks in.
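The table in exercise 2 of Set C can be reproduced with the correction factor for drawing without replacement. Here is a minimal sketch in Python. It assumes a box of 100,000 tickets, half of them 1's. The half is inferred from the 1% SE at 2,500 draws and is not stated in the answer. The helper name `se_percent` is illustrative.

```python
from math import sqrt

def se_percent(n_draws, n_tickets, frac_ones):
    """SE for the percentage of 1's among draws made WITHOUT replacement."""
    sd_box = sqrt(frac_ones * (1 - frac_ones))
    se_with = 100 * sd_box / sqrt(n_draws)            # with-replacement SE, in percent
    correction = sqrt((n_tickets - n_draws) / (n_tickets - 1))
    return se_with * correction

# Assumed box: 100,000 tickets, half of them 1's.
for n in (2_500, 25_000, 100_000):
    print(f"{n:>7,} draws: SE about {se_percent(n, 100_000, 0.5):.2f}%")
```

At 2,500 draws the correction factor is about 0.99, so the SE rounds to the 1% in the table. At 100,000 draws it is exactly 0.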

    Chapter 21. The Accuracy of Percentages
    Set A, page 379
    1. (a) observed (b, c) estimated from the data

    Comment. There is a big difference between chapter 20 and chapter 21. In chapter 20, you knew the composition of the box, and could compute the expected value and SE exactly. Here, the composition of the box has to be estimated from the data. In chapter 20, you reason forward, from the box to the draws. Here, you reason backward, from the draws to the box.


    2. The first step is to set up the model. (We need the box model to compute the SE for the sum of draws.) There are 100,000 tickets in the box, some marked 1 (currently enrolled in college) and the others 0 (not enrolled). Then 500 draws are made from the box to get the sample. The number of college students in the sample is like the sum of the draws. The fraction of 1's in the box is unknown, but can be estimated by the fraction of 1's observed in the sample, which is 194/500 = 0.388. So the SD of the box is estimated as √(0.388 x 0.612) ≈ 0.49. The SE for the sum is √500 x 0.49 ≈ 11. The 11 is the likely size of the chance error in the 194. The SE for the percentage of 1's is (11/500) x 100% = 2.2%. The percentage of persons 18-24 in the town who are college students is estimated as 38.8%. The estimate is likely to be off by 2.2% or so. The estimate is 38.8%, and the give-or-take number is 2.2%.
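The bootstrap step in exercise 2 uses the sample fraction in place of the unknown box fraction. Here is a minimal sketch in Python. It also forms an approximate 95% confidence interval as the estimate plus or minus 2 SEs, the rule behind the ±4.4% in Set B. The helper name `bootstrap_estimate` is illustrative.

```python
from math import sqrt

def bootstrap_estimate(count, n_draws):
    """Estimate a population percentage and its SE from a simple random sample."""
    frac = count / n_draws                    # observed fraction of 1's stands in for the box
    sd_box = sqrt(frac * (1 - frac))          # bootstrap estimate of the SD of the box
    se_count = sqrt(n_draws) * sd_box         # SE for the number of 1's
    return 100 * frac, 100 * se_count / n_draws

est, se = bootstrap_estimate(194, 500)       # exercise 2: 194 students among 500 people
low, high = est - 2 * se, est + 2 * se       # approximate 95% confidence interval
print(f"estimate {est:.1f}%, SE {se:.1f}%, 95% interval {low:.1f}% to {high:.1f}%")
```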

    3. The estimate is 48%, give or take 5% or so.
    4. The estimate is 2.8%, give or take 0.8 of 1% or so.
    5. The estimate is 46.8%, give or take 2.5% or so.
    6. No. Most people work for the few large establishments.
    7. SE = 2%.
    8. (a) 18.0% ± 1.9% (b) 21.0% ± 2.0% (c) 24.5% ± 2.2%

    Comment. The third person is off by a couple of SEs in estimating the percentage of 1's in the box; even so, the estimated standard error is only off by 0.2 of 1%. The bootstrap method is good at estimating SEs.
    9.
                        Known      Estimated from the data
    Observed value      30.8%      N/A
    Expected value      N/A        30.8%
    SE                  N/A        1.5%
    SD of box           N/A        0.46
    Number of draws     1,000      N/A

    Set B, page 383
    1. (a) observed (b, c) estimated from the data
    See exercise 1 on p. 379.
    2. (a) 38.8% ± 4.4% (b) 38.8% ± 6.6% (c) 38.8% ± 3.3%

    Comments. As the confidence level goes up, the confidence interval gets longer. However, as the sample size goes up, the confidence interval gets shorter.
    3. (a) Expect 1 red marble among the draws, give or take 1 or so.
    (b) It is impossible to draw fewer than 0 red marbles, so the chance is 0.
    (c) About 16%.
    (d) No. If the probability histogram looks like the normal curve, then the chance of drawing fewer than 0 red marbles can be read off the curve. Since 16% ≠ 0% (see (b) and (c)), the histogram does not look like the curve.

    Comment. The histogram is shown below.
    [Figure: probability histogram for the number of red marbles among the draws, values 0 to 6.]
    4. False. The normal approximation cannot be used here. As best we can estimate from the sample, 1% of the marbles in the box are red, and 99% are blue. This is the box in exercise 3. The probability histogram for the percentage of reds among 100 marbles drawn from this box does not look like the normal curve. (With 100 draws out of 10,000, there is little difference between sampling with or without replacement.) If the sample were bigger, or the box were less lopsided, the normal curve would be fine.
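The lopsided histogram in exercises 3 and 4 can be computed exactly from the binomial formula. Here is a minimal sketch in Python. It treats the 100 draws as made with replacement, which the answer says makes little difference here. The helper name `binomial_chances` is illustrative.

```python
from math import comb

def binomial_chances(n_draws, frac_ones, max_count):
    """Exact chances of getting 0, 1, ..., max_count 1's in n_draws draws with replacement."""
    return [comb(n_draws, k) * frac_ones**k * (1 - frac_ones) ** (n_draws - k)
            for k in range(max_count + 1)]

# Exercise 3 box: 1% red. Chances for the number of reds among 100 draws.
chances = binomial_chances(100, 0.01, 6)
for k, chance in enumerate(chances):
    print(f"{k} reds: {chance:.1%}")
```

About 37% of the chance sits at 0 reds and about 37% at 1 red, and nothing lies below 0. A normal curve centered at 1 with spread 1 would put 16% of its area there.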

    Set C, page 386
    1. Probabilities are used when reasoning from the box to the draws; confidence levels are used when reasoning from the draws to the box.
    2. (a) The chance error is in the observed value.
    (b) The confidence interval is for the population percentage.
    3. (a) 18.0% ± 3.8%, covers.
    (b) 21.0% ± 4.0%, covers.
    (c) 24.5% ± 4.4%, just misses.
    4. (a) True.
    (b) False. The EV is computed exactly; the chance error is in the sample percentage of reds, not in the expected value.
    (c) True.
    (d) False. Confidence intervals are for parameters, not sample data. See pp. 385-386.
    (e) True.
    Comment on (b). The SE tells you the likely size of the chance error in the percentage of reds among the draws. The 50%, however, is a property of the box and does not depend on how the draws turn out: there is no chance error in the 50%. For instance, if you draw 100 times and get 53 reds, the sample percentage of reds is 53%, and the chance error in the 53% is +3%. If you get 42 reds, the percentage of reds among the draws is 42%, and the chance error in the 42% is -8%. But the expected value stays the same, no matter how the draws turn out. Also see exercise 6 on p. 294.

    5. (a) True. (b) True. (c) True. (d) True.
    (e) False; the sample percentage is 53%, so you don't need a confidence interval for that.
    6. (a) True.

    (b) True.
    (c) False. The sample percentage is known, and in the interval.
    (d) False. If you view the interval as fixed, the chance is either 0 or 1. Moral: the chances are in the sampling procedure, not the population. That is why statisticians use the term "confidence interval."
    7. False. The SE for the percentage measures the likely size of the difference between one sample percentage and the population percentage, not the difference between two sample percentages.
    Comment. The SE for the difference between two sample percentages has to be bigger, because both are subject to chance variability; by contrast, the population percentage isn't varying. See chapter 27 for more about the difference between two sample percentages.

    8. True. Probabilities are used when you reason forward, from the box to the draws; confidence levels are used when reasoning backward, from the draws to the box: see pp. 385-386.
    Set D, page 388
    1. Theory says, watch out for this man. What population is he talking about? Why are his students like a simple random sample from the population? Until he can answer these questions, don't pay much attention to the SEs he calculates.
    2. This is not a simple random sample: you are guaranteed to get 25 students from each class, and a simple random sample won't do that. The procedure does not apply.
    Set E, page 390
    1. This isn't a simple random sample, so the formulas don't apply.
    2. This is fine.
    3. (a) "altered voter enthusiasm"
    (b) Chance variation: the Gallup Poll is based on a random sample.
    (c) As table 2 shows, chance errors of several percentage points are quite possible. Maybe late September is not such a good guide to early November after all. (On the other hand, Bush did win.)
    Chapter 22. Measuring Employment and Unemployment
    Set A, page 403
    1. (a) True.
    (b) False. The Bureau would divide up the sample into groups, by race, age, and so on, then weight up each group separately; section 4.
    2. 151.4 million ± 0.1 million; section 5.
    3. This is a simple random sample of households, and the inference is about households. The SD of the box is estimated as √(0.80 x 0.20) = 0.40. The SE for the sum is √100 x 0.40 = 4. The SE for the percentage is 4%.
    4. This is a simple random sample of households, but a cluster sample of people. (The household is the cluster.) The inference is about people. So, you need more information to estimate the SE: the formulas for simple random samples do not apply (section 5).
    Comment on exercises 3 and 4. In exercise 3, you have a simple random sample of households, and make an inference about households: the percent where all occupants are vaccinated. In exercise 4, you are making an inference about people from a cluster sample of people.

    5. The SE for the percentage is only 0.2 of 1%, so a discrepancy of 55% - 52% = 3% is almost impossible to explain as a chance error. People like to say they voted, even if they didn't.

    6. The one for white males; it is based on a lot more people.
    Chapter 23. The Accuracy of Averages
    Set A, page 413
    1. (a) 7,611/100 = 76.11 (b) 73.94 x 100 = 7,394
    2. The SE for the average is 1. The answer to (a) is almost 100%. The answer to (b) is 68%. Don't confuse the SE for the average of the draws with the SD of the box.
    3. (a) False. (b) True.
    To repeat, do not mix up the SE for the average of the draws with the SD of the box.
    4. (a) The expected value for the average of the draws equals the average of the box.
    (b) As the number of draws goes up, the SE for the sum of the draws goes up but the SE for the average of the draws goes down.

    5. The SE for the sum of the draws is √100 x 20 = 200. The SE for the average is 200/100 = 2. The average of the draws will be around 50, give or take 2 or so. This is still true if the draws are made without replacement, because only a small fraction of the tickets in the box are drawn out. On the other hand, if you draw 100 tickets at random without replacement from a box of 100 tickets, the SE is 0.
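The two steps in exercise 5 are: find the SE for the sum, then divide by the number of draws. Here is a minimal sketch in Python, assuming draws with replacement. The helper name `average_of_draws` is illustrative.

```python
from math import sqrt

def average_of_draws(box_average, box_sd, n_draws):
    """EV and SE for the average of n_draws draws made at random with replacement."""
    se_sum = sqrt(n_draws) * box_sd        # SE for the sum of the draws
    return box_average, se_sum / n_draws   # dividing by n converts sum to average

# Exercise 5: box average 50, box SD 20, 100 draws.
ev_avg, se_avg = average_of_draws(50, 20, 100)
print(f"average of the draws: around {ev_avg}, give or take {se_avg:.0f} or so")
```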

    6. The chance that the average of the draws is between 2.25 and 2.75.
    7. The percentage of times [] came up in the 50 draws.
    8. (a) The chance that the sum will be 90.
    (b) The chance that the average will be 3.6.
    (c) 3.6 = 90/25, so the same chance is described in two different ways. No coincidence at all. See exercise 5 on p. 366.
    9. (a), (c), (e) are true; (b), (d), (f) are false. You know the contents of the box; you can compute the expected value for the average without error; however, there is chance error in the average of the draws. See exercise 6 on p. 294, exercises 4-6 on pp. 386-387.
    10. The average of the draws is just their sum, divided by 25 (the number of draws). So 25 changes to 1, 50 to 2, and 55 to 55/25 = 2.2.
    Set B, page 420
    1.
    population            box
    population average    average of the box
    sample                draws
    sample average        average of the draws
    sample size           number of draws

    2. (a) "SD of box" makes sense; "SE for box" does not.
    (b) "SE for average of draws" makes sense; "SE for average of box" does not.
    The term "SD" applies to a list of numbers; "SE" applies to a chance process. The tickets in the box (and their average) are fixed, but the draws are random.


    3. (a, b) Estimated from the sample. The SD of the sample is $19,000; this is used to estimate the SD of the box. The SE is based on the estimated SD, so it too is an estimate. If you do not know what is in the box, you have to estimate the SD and the SE from the data.
    (c) observed.
    4. 95% of 50 ≈ 48.
    5. (a) Each organization takes its sample average as the center of its confidence interval. The sample averages are different, because of chance variation.
    (b) The sample SDs are different (chance variation), so the estimated SEs are different. That is why the lengths of the intervals are different.
    (c) 49.
    6. The box has 30,000 tickets, one for each registered student, showing his or her age. The data are like 900 draws from the box; the sample average is like the average of the draws. The SD of the box is estimated as 4.5 years, the SE for the sum of the draws is √900 x 4.5 = 135 years, and the SE for the average is 135/900 = 0.15 years.

    (a) Estimate is 22.3 years, off by 0.15 years or so.
    (b) The interval is 22.3 ± 0.3 years.
    7. (a) The interval is $568 ± $24. Even though the data don't follow the normal curve, the probability histogram for the average of the draws does.
    (b) False: $24 is the SE for the average of the draws, not the SD of the box.
    8. False. The SE for the average gives the likely size of the difference between the sample average and the population average, not the difference between two sample averages. So $18 is the wrong margin of error. See exercise 7 on p. 387.
    9. The probability histogram is about chances for the sample average; it is not about data. Here, the probability histogram is given. Part (a) asks for +1 in standard units, relative to the probability histogram. We need the center and spread of this histogram. The center is the expected value for the sample average, which equals the average of the box. This is given: it is $61,700. The spread is the SE for the sample average. This can be worked out exactly, because the problem gives the SD of the box. This is $50,000. So the SE for the sum of the draws is √625 x $50,000 = $1,250,000. The SE for the average of the draws is $1,250,000/625 = $2,000. And +1 in standard units is $61,700 + $2,000 = $63,700. That is the answer to (a).
    In part (b), you are being asked to see where $58,700 fits, on the axis of the probability histogram. It comes in below the expected value: $58,700 is below $61,700. So, $58,700 is on the negative part of the axis. In fact, this value is $3,000 below the expected value. And 1 SE is $2,000. So $58,700 is -1.5 in standard units. That is the answer to (b).
    Comments. (i) The key point: in this problem, the average and SD of the box are given.
    (ii) A typical sample average is around 1 SE away from the population average. Our sample average was 1.5 SE too low. We didn't get enough rich people in the sample.
    (iii) Look at figure 1 on p. 411. The histogram is about the process of drawing at random and taking the average; it is not about any particular set of draws. If you draw 25 tickets and their average happens to be 3.2, that doesn't change the histogram. This exercise illustrates the same point, in a more complicated setting.
    (iv) You would use the SD of $50,000 to convert to standard units relative to a data histogram, for the incomes of all 25,000 families in the town. The SD of $49,000 works relative to another data histogram, for the incomes of the 625 sample families.
    Set C, page 423
    1.
    Number     EV for sum   SE for sum   EV for average   SE for average
    of draws   of draws     of draws     of draws         of draws
    25         75           10           3.0              0.4
    100        300          20           3.0              0.2
    400        1,200        40           3.0              0.1
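The table in exercise 1 of Set C and the standard-units calculation in exercise 9 of Set B follow the same two formulas for the sum and the average. Here is a minimal sketch in Python, assuming draws with replacement. The box average of 3 and SD of 2 are inferred from the table's first row; the answer does not state them. The helper name `sum_and_average` is illustrative.

```python
from math import sqrt

def sum_and_average(box_average, box_sd, n_draws):
    """EV and SE for the sum and for the average of n_draws draws with replacement."""
    se_sum = sqrt(n_draws) * box_sd
    return n_draws * box_average, se_sum, box_average, se_sum / n_draws

# Set C, exercise 1: the table is consistent with a box whose average is 3 and SD is 2.
table = {n: sum_and_average(3, 2, n) for n in (25, 100, 400)}
for n, row in table.items():
    print(n, *row)

# Set B, exercise 9: box average $61,700, box SD $50,000, 625 draws.
_, _, center, spread = sum_and_average(61_700, 50_000, 625)
plus_one_su = center + spread              # +1 in standard units, in dollars
z_58700 = (58_700 - center) / spread       # $58,700 converted to standard units
print(plus_one_su, z_58700)
```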

    2. (a) True. The expected value for the average of the draws equals the average of the box (p. 410).
    (b) Can't tell; you need the SD of the box.
    3. (a) Estimated from the data; you would need the average of the box to compute the expected value exactly.
    (b) To compute the SE exactly, you need the SD of the box; even to estimate it, you would need the SD of the draws.

    Comment. The expected value applies to the process of drawing at random, rather than any particular set of draws. For example, suppose you draw 25 times at ran-do