mat 540 statistical concepts for research


Chapter 2

ORGANIZATION AND DESCRIPTION OF DATA

2.1 (a) The percentage in other classes is 100 − 32.7 − 12.8 − 12.5 − 12.1 − 8.2 = 21.7%

(b) [Bar chart: percent of waste (vertical axis, 0 to 35) for the categories Paper, Yard, Food, Plastic, Metals, Other.]

(c) The percentage of waste that is paper and paperboard is: 32.7 %

The percentage of waste in the top two categories is: 32.7 + 12.8 = 45.5%

The percentage in the top five categories is:

32.7 + 12.8 + 12.5 + 12.1 + 8.2 = 78.3%


2.2 The frequency table for blood type is

Blood type   Frequency   Relative Frequency
O            16          16/40 = 0.40
A            18          18/40 = 0.45
B             4           4/40 = 0.10
AB            2           2/40 = 0.05
Total        40          1.00
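The relative frequencies in the table above are each count divided by the total. A minimal Python sketch of that calculation follows; the function name `relative_frequency_table` and the expansion of the counts into raw labels are illustrative assumptions, not part of the original solution.

```python
from collections import Counter


def relative_frequency_table(observations):
    """Return {category: (frequency, relative frequency)} for a list of labels."""
    counts = Counter(observations)
    n = len(observations)
    return {category: (freq, freq / n) for category, freq in counts.items()}


# Blood-type counts from the table above, expanded into 40 raw labels
# (the original raw data are not reproduced here, only the counts).
blood_types = ["O"] * 16 + ["A"] * 18 + ["B"] * 4 + ["AB"] * 2
table = relative_frequency_table(blood_types)

for category, (freq, rel) in table.items():
    print(f"{category:>2}  {freq:>2}  {rel:.2f}")
```

The relative frequencies should always sum to 1, apart from rounding, which is a quick check on any such table.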

2.3 The frequency table for number of activities is

Number of Activities Frequency Relative Frequency

0 7 7/40 = 0.175

1 10 10/40 = 0.25

2 13 13/40 = 0.325

3 5 5/40 = 0.125

4 2 2/40 = 0.05

5 1 1/40 = 0.025

6 1 1/40 = 0.025

7 1 1/40 = 0.025

Total 40 1.00

This is the relative frequency histogram:


2.4 The frequency table for number of crashes per month is

Number of Crashes Frequency Relative Frequency

0 5 5/59 = 0.085

1 12 12/59 = 0.203

2 11 11/59 = 0.186

3 14 14/59 = 0.237

4 8 8/59 = 0.136

5 8 8/59 = 0.136

6 1 1/59 = 0.017

Total 59 1.00 (rounding error)

This is the relative frequency histogram:

2.5 (a) The table of relative frequencies for workers in the department is

Mode of Transportation   Frequency   Relative Frequency
Drive alone              25          25/40 = 0.625
Car pool                  3           3/40 = 0.075
Ride bus                  7           7/40 = 0.175
Other                     5           5/40 = 0.125
Total                    40          1.000


(b) The pie chart for workers in the department is

2.6 The table of relative frequencies for the money raised (in million dollars) is

Source                         Frequency   Relative Frequency
Individuals and bequests       117         117/207 = 0.565
Industry and business           24          24/207 = 0.116
Foundations and associations    66          66/207 = 0.319
Total                          207         1.000

The pie chart for the university fund drive is

2.7 There are overlapping classes in the grouping. A report of 3 stolen bicycles will fall

in two classes.

2.8 There is a gap. A report of 6 complaints in one week does not fall in any class. The

last class should be 6 or more.


2.9 There is a gap. The response 5 close friends does not fall in any class. The last

class should be 5 or more.

2.10 The first class should be “less than 175 pounds”. Otherwise, a light weight kicker

cannot be assigned to a class.

2.11 (a) Yes. (b) Yes. (c) Yes. (d) No. (e) No.

2.12 The frequency table of the survey response is

Response   Frequency   Relative Frequency
1          14          14/50 = 0.28
2          13          13/50 = 0.26
3           7           7/50 = 0.14
4          16          16/50 = 0.32
Total      50          1.00

2.13 (a) The relative frequencies are 0.18 , 0.48, 0.26, and 0.08 for 0, 1, 2, and 3 bags,

respectively.

(b) Nearly one-half of the passengers check exactly one bag. The longest tail is to

the right.

(c) The proportion of passengers who fail to check a bag is 9/50 = 0.18.


2.14 The dot diagram of meter readings is

[Dot plot of measurements: horizontal axis from 422 to 472.]

2.15 The dot diagram of amounts of radiation leakage is

2.16 The dot diagram of number of bad checks received is

[Dot plot: number of bad checks received, horizontal axis from 3 to 8.]

2.17 (a) The dot diagram of number of CFUs is

(b) There is a long tail to the right with one extremely large value of 1600 CFU

units.

(c) There is one day, so the proportion is 1/15 = 0.067.

2.18 (a) The frequency distribution of tornado fatalities is given in the table below.

Class Interval Frequency Relative Frequency

[0, 25) 2 2/58 = 0.035

[25, 50) 19 19/58 = 0.333

[50, 75) 18 18/58 = 0.316

[75, 100) 7 7/58 = 0.123

[100, 150) 5 5/58 = 0.088

[150, 200) 2 2/58 = 0.035

[200, 250) 1 1/58 = 0.018

[250, 550) 3 3/58 = 0.053


(b) The relative frequency histogram is given below.

[Relative frequency histogram: relative frequency (vertical axis, 0.0 to 0.3) versus number of deaths (class boundaries 0, 25, 50, 75, 100, 150, 200, 250).]

(c) The proportion of years having 49 or fewer tornado fatalities is

0.035 + 0.333 = 0.368.

(d) There is a long tail to the right due to the fact that the last class interval is much

wider than the others yet still exhibits a low frequency of observations.


2.19 (a) In the following frequency distribution of lizard speed (in meters per second),

the left endpoint is included in the class interval but not the right endpoint.


Class Interval   Frequency   Relative Frequency
0.45 to 0.90      2          0.067
0.90 to 1.35      6          0.200
1.35 to 1.80     11          0.367
1.80 to 2.25      5          0.167
2.25 to 2.70      6          0.200


(b) All of the class intervals are of length 0.45 so we can graph rectangles whose

heights are the relative frequency. The histogram is

2.20 In the following frequency distribution of order of earthquake magnitudes (as given

on the Richter scale), the left endpoint is not included in the class interval, but the

right one is.


(6.0, 6.3] 12 12/55 = 0.218

(6.3, 6.6] 15 15/55 = 0.273

(6.6, 6.9] 10 10/55 = 0.182

(6.9, 7.2] 10 10/55 = 0.182

(7.2, 7.5] 5 5/55 = 0.091

(7.5, 7.8] 2 2/55 = 0.036

(7.8, 8.1] 1 1/55 = 0.018


The class intervals all have the same length so we take the option of making the

height of a rectangle equal to the relative frequency. The histogram is

[Frequency histogram: frequency (vertical axis, 0 to 16) versus order of earthquake magnitude (class boundaries 6.3, 6.6, 6.9, 7.2, 7.5, 7.8, 8.1).]

2.21 This time, the frequency distribution is given by


(6.0, 6.3] 12 12/55 = 0.218

(6.3, 6.6] 15 15/55 = 0.273

(6.6, 6.9] 10 10/55 = 0.182

(6.9, 7.2] 10 10/55 = 0.182

(7.2, 7.9] 8 8/55 = 0.145


The corresponding frequency histogram is as follows:

[Frequency histogram: frequency (vertical axis, 0 to 16) versus order of earthquake magnitude (class boundaries 6.3, 6.6, 6.9, 7.2, and a final class above 7.2).]


2.22 The stem-and-leaf display of the scores is

9 58

10 6

11 559

12 6

13 135678

14 344557

15 2478

16 01222567

17 14688

18 24

19 04

2.23 The stem-and-leaf display of the amount of iron present in the oil is

0 6

1 2234455567777889

2 000000222445567799

3 022444566

4 1167

5 12

2.24 The corresponding measurements are

246 268 293 319 344 371 382 397 405 426 443 490 504 568 613

2.25 The double-stem display of the amount of iron present in the oil is

0 6

1 22344

1 55567777889

2 00000022244

2 5567799

3 022444

3 566

4 11

4 67

5 12


2.26 The corresponding measurements are

18 20 20 21 22 22 23 23 24 24 24 25 25 25 26 26 27 29 30

2.27 The five-stem display of the Consumer Price Index in 2001 for the given cities is

15 5

15

15 9

16

16

16

16 7

16 8

17 0

17 22333

17 4

17 677

17 88

18 11

18 2

18

18 67

18

19 011

2.28 (a) The median is 5. The sample mean is

$\bar{x} = \frac{3 + 7 + 4 + 11 + 5}{5} = \frac{30}{5} = 6$

(b) The median is 3. The sample mean is

$\bar{x} = \frac{3 + 1 + 7 + 3 + 1}{5} = \frac{15}{5} = 3$

2.29 (a) The median is 3. The sample mean is

$\bar{x} = \frac{2 + 5 + 1 + 4 + 3}{5} = \frac{15}{5} = 3$

(b) The mean is

$\bar{x} = \frac{26 + 30 + 38 + 32 + 26 + 31}{6} = \frac{183}{6} = 30.5$

The ordered measurements are: 26, 26, 30, 31, 32, 38

median $= \frac{30 + 31}{2} = 30.5$

(c) The sample mean is

$\bar{x} = \frac{-1 + 2 + 0 + 1 + 4 - 1 + 2}{7} = 1$

The ordered measurements are: −1, −1, 0, 1, 2, 2, 4.

The median is 1.

2.30 The sample mean is

$\bar{x} = \frac{6.3 + 6.9 + 5.7 + 5.4 + 5.6 + 5.5 + 6.6 + 6.5}{8} = \frac{48.5}{8} = 6.0625$

The ordered measurements are: 5.4, 5.5, 5.6, 5.7, 6.3, 6.5, 6.6, 6.9.

median $= \frac{5.7 + 6.3}{2} = \frac{12}{2} = 6$

2.31 (a) $\bar{x} = 3810/15 = 254$.

(b) The ordered observations are:

10 20 50 60 80 90 90 110

140 180 260 340 380 400 1600

So, the median is 110 CFU units. The one very large observation makes the

sample mean much larger. Hence, the sample median is better to use in this

instance.
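The effect of the single large observation on the mean, but not on the median, can be checked numerically. The sketch below uses the 15 ordered CFU observations listed above with Python's standard `statistics` module; the variable names are illustrative.

```python
from statistics import mean, median

# Ordered CFU observations from part (b).
cfu = [10, 20, 50, 60, 80, 90, 90, 110,
       140, 180, 260, 340, 380, 400, 1600]

x_bar = mean(cfu)      # pulled upward by the value 1600
med = median(cfu)      # the 8th of 15 ordered observations

# Dropping the extreme value shows how much the mean depends on it.
trimmed = [x for x in cfu if x != 1600]
x_bar_trimmed = mean(trimmed)
med_trimmed = median(trimmed)

print(x_bar, med)
print(round(x_bar_trimmed, 1), med_trimmed)
```

Removing one observation moves the mean by almost 100 units while the median shifts only from 110 to 100, which is why the median is the more stable summary here.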

2.32 (a) The ordered monthly incomes are: 2275 2350 2425 2450 2475 2650 4700.

$\bar{x} = 19325/7 = 2760.7$, median $= 2450$.

(b) For a typical salary, the median is better. Only one person earns more than the

mean.

2.33 The mean is 956/12 = 79.67. The claim ignores variability and is not true. It is

certainly unpleasant with a daily maximum temperature of 105°F in July.


2.34 The sample mean is

$\bar{x} = \frac{85 + 82 + 77 + 83 + 80 + 77 + 94}{7} = \frac{578}{7} = 82.57$ cases

The ordered sales times are: 77, 77, 80, 82, 83, 85, 94

median $= 82$

2.35 (a) $\bar{x} = 212/25 = 8.48$

(b) The sample median is 8. Since the sample mean and median are about the

same, either of them can be used as an indication of radiation leakage.

2.36 The mean, 10.30, is one measure of central tendency and the median, 10.00, is

another. These values may be interpreted as follows. On average, there were 10.3

reports of aggravated assault at the 27 universities. Thirteen of the universities had

at least 10 such reports while 13 recorded at most 10 such reports. At least one

school logged exactly 10 reports.

2.37 The mean, 118.05, is one measure of central tendency and the median, 117.00, is

another. The value 118.05 tells us that, on average, a baby weighed 118.05

ounces. The median tells us that about half of the babies weighed at least 117

ounces while roughly half weighed at most 117 ounces.

2.38 (a) $\bar{x} = \frac{0(7) + 1(10) + 2(13) + 3(5) + 4(2) + 5(1) + 6(1) + 7(1)}{40} = 1.925$ (activities)

(b) Sample median is 2 activities

(c) The large observations of 5, 6, and 7 activities did not drastically affect the

computation of the mean in this instance.
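The frequency-weighted mean and the median in 2.38 can be reproduced from the frequency table alone. The following is a minimal sketch; the helper names `grouped_mean` and `grouped_median` are hypothetical, and the median uses the usual convention of averaging the two middle values when the sample size is even.

```python
def grouped_mean(freq):
    """Mean of a sample given as {value: frequency}."""
    n = sum(freq.values())
    return sum(value * count for value, count in freq.items()) / n


def grouped_median(freq):
    """Median of a sample given as {value: frequency}."""
    # Expand the table back into an ordered list of observations.
    data = sorted(v for v, count in freq.items() for _ in range(count))
    n = len(data)
    mid = n // 2
    if n % 2 == 1:
        return data[mid]
    return (data[mid - 1] + data[mid]) / 2


# Frequency table from Exercise 2.3: number of activities and counts.
activities = {0: 7, 1: 10, 2: 13, 3: 5, 4: 2, 5: 1, 6: 1, 7: 1}

print(grouped_mean(activities))    # 77 / 40
print(grouped_median(activities))  # average of the 20th and 21st values
```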

2.39 (a) $\bar{x} = \frac{1(7) + 2(9) + 3(6) + 4(5) + 5(3)}{30} = 2.6$ (returns)

(b) Sample median is 2 returns.

2.40 (a) Sample median $= (240 + 248)/2 = 244$ (seconds).

(b) $\bar{x} = 1239/6 = 206.5$ (seconds).

2.41 (a) $\bar{x} = 271/40 = 6.775$ days.

(b) Sample median $= (6 + 7)/2 = 6.5$. Both the sample mean and the sample

median give a good indication of the amount of mineral lost.

2.42 (a) Sample median for males $= (45.8 + 48.3)/2 = 47.05$.

(b) Sample median for females $= 30.3$.

(c) Sample median for the combined set of males and females $= 38.6$.


2.43 Sample median $= (176 + 187)/2 = 181.5$ (minutes).

2.44 In Exercise 2.43, the sample mean $= 1862/10 = 186.2$ (minutes). The total time for

10 games is $10\bar{x} = 1862$ minutes and this is meaningful. However, 10 × median

ignores the actual times of the long games and is therefore meaningless.

2.45 (a) The dot diagram for the diameters (in feet) of the Indian mounds in southern

Wisconsin is

(b) $\bar{x} = 346/13 = 26.62$. Sample median $= 24$.

(c) $13/4 = 3.25$, so we count in 4 observations. $Q_1 = 22$ and $Q_3 = 30$.

2.46 $40/4 = 10$, an integer, so we average the 10th and 11th observations to get the first

quartile: $Q_1 = (1 + 1)/2 = 1$. Similarly, we average the 20th and 21st observations to

get the median, or $Q_2 = (2 + 2)/2 = 2$ days. Finally, we average the 30th and 31st

observations to get the third quartile: $Q_3 = (3 + 3)/2 = 3$.

2.47 (a) Median $= (152 + 154)/2 = 153$.

(b) $40/4 = 10$, so we need to count in 10 observations. The 11th smallest

observation also satisfies the definition. This yields $Q_1 = \frac{135 + 136}{2} = 135.5$.

Using a similar approach, we find that $Q_3 = \frac{166 + 167}{2} = 166.5$.

2.48 $\bar{x} = 2283/25 = 91.32$ calls per shift.

2.49 The ordered data are

50 57 68 69 72 73 73 80 82 91

92 93 94 96 96 100 102 104 105 106

108 109 118 118 127

Since the number of observations is 25, the median or second quartile is the 13th

ordered observation in the list. The first quartile is the 7th ordered observation and

the third quartile is the 19th ordered observation:

$Q_1 = 73$, $Q_2 = 94$, $Q_3 = 105$

2.50 (a) The ordered data are

0.50 0.76 1.02 1.04 1.20 1.24 1.28 1.29 1.36 1.49

1.55 1.56 1.57 1.57 1.63 1.70 1.72 1.78 1.78 1.92

1.94 2.10 2.11 2.17 2.47 2.52 2.54 2.57 2.66 2.67

Since the number of observations is 30, the median or second quartile is the

average of the 15th and 16th in the list. Sample median $= (1.63 + 1.70)/2 = 1.665$

meters per second. Because $30/4 = 7.5$, the first quartile is the 8th

ordered observation, and because $(0.75)(30) = 22.5$, the third quartile is the 23rd

ordered observation:

$Q_1 = 1.29$, $Q_2 = 1.665$, $Q_3 = 2.11$

(b) Since $0.9(30) = 27$, the 90th percentile is the average of the 27th and 28th

observations in the ordered list. Sample 90th percentile $= (2.54 + 2.57)/2 = 2.555$.

2.51 (a) The ordered observations are

10 20 50 60 80 90 90 110

140 180 260 340 380 400 1600

Since the sample size is 15, the median is the 8th ordered observation, 110. To

obtain $Q_1$, we find $15/4 = 3.75$, so the first quartile is the 4th ordered

observation in the ordered list. To obtain $Q_3$, we find $0.75(15) = 11.25$, so that

the third quartile is the 12th ordered observation:

$Q_1 = 60$, $Q_3 = 340$

(b) The 90th percentile requires us to count in at least $0.9(15) = 13.5$, or 14,

observations. The 90th sample percentile $= 400$.
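The counting rule used in 2.46 through 2.51 (if np is an integer, average the np-th and (np+1)-th ordered observations; otherwise round np up and take that observation) can be sketched as a small function. The name `sample_percentile` is hypothetical, and the rule is the one applied in these solutions, which differs slightly from the default conventions of most statistical software.

```python
import math


def sample_percentile(data, p):
    """Sample 100p-th percentile (0 < p < 1) by the counting rule used above."""
    ordered = sorted(data)
    n = len(ordered)
    k = n * p
    if abs(k - round(k)) < 1e-9:
        # n*p is an integer: average the k-th and (k+1)-th ordered values.
        k = int(round(k))
        return (ordered[k - 1] + ordered[k]) / 2
    # Otherwise round up and take that ordered value (1-based position).
    return ordered[math.ceil(k) - 1]


# CFU observations from Exercise 2.51.
cfu = [10, 20, 50, 60, 80, 90, 90, 110,
       140, 180, 260, 340, 380, 400, 1600]

q1 = sample_percentile(cfu, 0.25)
q2 = sample_percentile(cfu, 0.50)
q3 = sample_percentile(cfu, 0.75)
p90 = sample_percentile(cfu, 0.90)
print(q1, q2, q3, p90)
```

The small tolerance in the integer check guards against floating-point products such as 0.9 × 30 landing a hair away from 27.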

2.52 (a) The mean of the original data set is

$\bar{x} = \frac{4 + 8 + 8 + 7 + 9 + 6}{6} = \frac{42}{6} = 7$

Adding $c = 4$ to the original data set we get: 8, 12, 12, 11, 13, 10. The mean of

the new data set is

$\frac{8 + 12 + 12 + 11 + 13 + 10}{6} = \frac{66}{6} = 11$

which equals $\bar{x} + c = 7 + 4$. Multiplying the original data set by $d = 2$ we get:

8, 16, 16, 14, 18, 12. The mean of the new data set is

$\frac{8 + 16 + 16 + 14 + 18 + 12}{6} = \frac{84}{6} = 14$

which equals $d \times \bar{x} = 2(7)$.

(b) The median of the original data set is

median $= \frac{7 + 8}{2} = 7.5$

When $c = 4$ is added to the original data set, the median of the new data set is

median of $(x + 4) = \frac{11 + 12}{2} = 11.5$

which equals (median $+\, c$) $= 7.5 + 4$. When the original data set is multiplied

by $d = 2$, the median of the new data set is

median of $2x = \frac{14 + 16}{2} = 15$

which equals ($d \times$ median) $= 2(7.5)$.
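The two properties verified by hand in 2.52 (adding a constant shifts the mean and median by that constant; multiplying by a constant scales them by it) can also be checked in a few lines. This is only a numerical check on the one data set above, not a proof; the variable names are illustrative.

```python
from statistics import mean, median

x = [4, 8, 8, 7, 9, 6]   # original data set from Exercise 2.52
c, d = 4, 2              # shift and scale constants used above

shifted = [xi + c for xi in x]
scaled = [d * xi for xi in x]

# Property (i): center of (x + c) equals center of x, plus c.
shift_ok = (mean(shifted) == mean(x) + c) and (median(shifted) == median(x) + c)

# Property (ii): center of (d * x) equals d times the center of x.
scale_ok = (mean(scaled) == d * mean(x)) and (median(scaled) == d * median(x))

print(mean(shifted), median(shifted), shift_ok)
print(mean(scaled), median(scaled), scale_ok)
```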

2.53 (a) The ordered data are 73, 74, 76, 76, 80. The median is 76°F and the mean is

$\bar{x} = 379/5 = 75.8$°F.

(b) The mean of $(F - 32)$ is $\bar{x} - 32$ by property (i) of Exercise 2.52 with $c = -32$.

By property (ii),

mean of $\frac{5}{9}(F - 32) = \frac{5}{9}(\text{mean of } (F - 32)) = \frac{5}{9}(\bar{x} - 32) = \frac{5}{9}(75.8 - 32) = 24.33$°C

By similar properties for the median,

median of $\frac{5}{9}(F - 32) = \frac{5}{9}(\text{median of } (F) - 32) = \frac{5}{9}(76 - 32) = 24.44$°C

2.54 (a) Company A. The average is highest and a superior machinist would earn above

the median.

(b) Company B. A medium quality machinist would be paid near the median.

Company B has the higher median.


2.55 (a)

(b) Lake Apopka: $\bar{x} = 67/5 = 13.40$. Lake Woodruff: $\bar{x} = 454/9 = 50.44$.

(c) From the dot diagrams, the males in Lake Apopka have lower levels of

testosterone and their sample mean is only about one-third of that for males in

(un-contaminated) Lake Woodruff. This finding is consistent with the

environmentalists’ concern that the contamination has affected the testosterone

levels and the reproductive abilities.

2.56 (a)


(b) Males: $\bar{x} = 67/5 = 13.40$. Females: $\bar{x} = 144/11 = 13.09$.

(c) The dot diagrams of the amount of testosterone seem to be quite similar for

males and females although there is a gap in the male diagram. The two means

are nearly the same which suggests that the insecticide contamination has

pushed hormone concentrations far out of balance because, ordinarily, males

should have higher testosterone concentrations.

2.57 (a) We carry out all necessary calculations in the following table. The mean is

$\bar{x} = 12/3 = 4$.

        $x$    $x - \bar{x}$    $(x - \bar{x})^2$
         7      3               9
         2     −2               4
         3     −1               1
Total   12      0              14

(b) The variance and the standard deviation are

$s^2 = \frac{14}{3 - 1} = 7$, $s = \sqrt{7} = 2.646$


$\bar{x} = 15/3 = 5$.

        $x$    $x - \bar{x}$    $(x - \bar{x})^2$
         4     −1               1
         9      4              16
         2     −3               9
Total   15      0              26

$s^2 = \frac{26}{3 - 1} = 13$, $s = \sqrt{13} = 3.606$


$\bar{x} = 32/4 = 8$.

        $x$    $x - \bar{x}$    $(x - \bar{x})^2$
         8      0               0
         6     −2               4
        14      6              36
         4     −4              16
Total   32      0              56

$s^2 = \frac{56}{4 - 1} = 18.667$, $s = \sqrt{18.667} = 4.320$


$\bar{x} = 9.5/5 = 1.9$.

        $x$    $x - \bar{x}$    $(x - \bar{x})^2$
        2.5     0.6            0.36
        1.7    −0.2            0.04
        2.1     0.2            0.04
        1.5    −0.4            0.16
        1.7    −0.2            0.04
Total   9.5     0.0            0.64

$s^2 = \frac{0.64}{5 - 1} = 0.16$, $s = \sqrt{0.16} = 0.40$

2.61 We carry out all necessary calculations in the following table.

        $x$    $x^2$
         8     64
         3      9
         4     16
Total   15     89

The variance is

$s^2 = \frac{1}{n - 1}\left[\sum x^2 - \frac{(\sum x)^2}{n}\right] = \frac{1}{2}\left[89 - \frac{15^2}{3}\right] = \frac{1}{2}(89 - 75) = 7$

2.62 We carry out all necessary calculations in the following table.

        $x$    $x^2$
         8     64
         6     36
        14    196
         4     16
Total   32    312

The variance is

$s^2 = \frac{1}{n - 1}\left[\sum x^2 - \frac{(\sum x)^2}{n}\right] = \frac{1}{3}\left[312 - \frac{32^2}{4}\right] = \frac{1}{3}(312 - 256) = \frac{56}{3} = 18.667$
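The shortcut formula used in 2.61 and 2.62 gives the same variance as the definition based on squared deviations from the mean (as in 2.57). A minimal sketch comparing the two follows; the function names are illustrative, and both divide by n − 1.

```python
def variance_by_definition(data):
    """s^2 = sum of (x - mean)^2, divided by (n - 1)."""
    n = len(data)
    x_bar = sum(data) / n
    return sum((x - x_bar) ** 2 for x in data) / (n - 1)


def variance_by_shortcut(data):
    """s^2 = [sum of x^2 - (sum of x)^2 / n], divided by (n - 1)."""
    n = len(data)
    sum_x = sum(data)
    sum_x2 = sum(x * x for x in data)
    return (sum_x2 - sum_x ** 2 / n) / (n - 1)


data_261 = [8, 3, 4]        # Exercise 2.61
data_262 = [8, 6, 14, 4]    # Exercise 2.62

print(variance_by_definition(data_261), variance_by_shortcut(data_261))
print(variance_by_definition(data_262), variance_by_shortcut(data_262))
```

The shortcut form needs only the two running totals, which is why it suits hand calculation; with floating-point data of large magnitude the deviation form is usually the numerically safer choice.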

2.63 (a) $s^2 = (34 - 12^2/5)/4 = 1.30$.

(b) $s^2 = (19 - (-7)^2/6)/5 = 2.167$.

(c) $s^2 = (499 - 59^2/7)/6 = 0.286$.


2.64 (a) Many factors could explain the difference in apartment rents. One possible

factor is simply that different landlords may charge different rents. Other

factors are the size of the apartment, the proximity of the apartment to key

locations such as parks or public transportation, and whether utilities such as

water and electricity are included.

(b) $s^2 = (3{,}836{,}675 - 5155^2/7)/6 = 6730.95$.

(c) $s = \sqrt{6730.95} = 82.04$

2.65 $s = \sqrt{(9726 - 346^2/13)/12} = 6.5643$.

2.66 (a) $s^2 = (142 - 26^2/5)/4 = 1.70$.

(b) $s = \sqrt{1.70} = 1.304$

2.67 (a) $s^2 = (3{,}140{,}900 - 3810^2/15)/14 = 155{,}225.7$.

(b) $s = \sqrt{155225.7} = 393.99$.

(c) $s^2 = (580900 - 2210^2/14)/13 = 17{,}848.9$, so $s = \sqrt{17848.9} = 133.6$. The

single very large value greatly inflates the standard deviation.

2.68 (a) $s^2 = (2410 - 212^2/25)/24 = 25.51$.

(b) $s = \sqrt{25.51} = 5.05$.

2.69 (a) $\bar{x} = 1862/10 = 186.2$.

(b) $s^2 = (353796 - 1862^2/10)/9 = 787.96$.

(c) $s = \sqrt{787.96} = 28.07$.

2.70 (a) $\bar{x} = 62/50 = 1.24$ bags.

(b) $s^2 = (112 - 62^2/50)/49 = 0.71673$, so that $s = 0.847$.

2.71 (a) Median $= 68.4$.

(b) $\bar{x} = 478.4/7 = 68.343$.

(c) $s^2 = (32730.34 - 478.4^2/7)/6 = 5.853$. Hence $s = 2.419$.

2.72 (a) The measure of variation displayed is 7.61, the sample standard deviation. The

sample variance is $s^2 = 7.61^2 = 57.9121$.

(b) The interquartile range is $Q_3 - Q_1 = 14.00 - 5.00 = 9.00$. This means the center

half of the data span an interval of length 9.

(c) Any value greater than 7.61 would correspond to greater variation.


2.73 (a) The measure of variation displayed is 15.47, the sample standard deviation.

The sample variance is $s^2 = 15.47^2 = 239.321$.

(b) The interquartile range is $Q_3 - Q_1 = 131.00 - 106.00 = 25.00$. This means the

center half of the data span an interval of length 25 ounces.

(c) Any value smaller than 15.47 would correspond to smaller variation.

2.74 (i) For the observations 5, 9, 9, 8, 10, 7, we have $\bar{x} = 8$, $s^2 = 3.2$ and $s = 1.789$. Adding $c = 4$

to the observations $x$, we have 9, 13, 13, 12, 14, 11. The sample mean and

variance of the new data set are

mean of $(x + 4) = \frac{9 + 13 + 13 + 12 + 14 + 11}{6} = 12$

variance of $(x + 4) = \frac{9 + 1 + 1 + 0 + 4 + 1}{6 - 1} = \frac{16}{5} = 3.2$

So the standard deviation of the new data set $x + 4$ is $\sqrt{3.2} = 1.789$, which is the

same as the standard deviation of $x$.

(ii) Multiply the observations $x$ by $d = 2$. We get 10, 18, 18, 16, 20, 14. The

sample mean and variance of the new data set are

mean of $2x = \frac{10 + 18 + 18 + 16 + 20 + 14}{6} = 16$

variance of $2x = \frac{36 + 4 + 4 + 0 + 16 + 4}{6 - 1} = \frac{4(9 + 1 + 1 + 0 + 4 + 1)}{5} = 4(3.2) = 12.8$

So the standard deviation of the new data set $2x$ is $2\sqrt{3.2} = 3.578$, or $d$ times the

standard deviation of $x$.
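The two results in 2.74 (a shift leaves the standard deviation unchanged; scaling by d multiplies it by |d|) can be confirmed numerically on the same six observations. This is a check on one data set rather than a derivation; `stdev` from the standard library uses the n − 1 divisor, matching the solutions above.

```python
from statistics import stdev

x = [5, 9, 9, 8, 10, 7]   # observations from Exercise 2.74
c, d = 4, 2

s = stdev(x)                               # sqrt(16 / 5)
s_shifted = stdev([xi + c for xi in x])    # data set x + 4
s_scaled = stdev([d * xi for xi in x])     # data set 2x

print(round(s, 3), round(s_shifted, 3), round(s_scaled, 3))
```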

2.75 Using the data set in Exercise 2.22, in Exercise 2.47, we determined that $Q_1 = 135.5$

and $Q_3 = 166.5$. Hence,

Interquartile range $= Q_3 - Q_1 = 166.5 - 135.5 = 31.0$ points.

2.76 We determined in an earlier exercise (2.46) that $Q_1 = 1$ and $Q_3 = 3$. Hence,

Interquartile range $= Q_3 - Q_1 = 3 - 1 = 2$ days.


2.77 No. Typically, the middle half of a data set is much more concentrated than the

combination of the two quarters, one in each tail. As an example, for the water

quality data of Exercise 2.17, the range is $1600 - 10 = 1590$ because of one

extremely large observation. From the quartiles determined in Exercise 2.51, the

interquartile range is $340 - 60 = 280$. The range is nearly six times as large as the

interquartile range.

2.78 (a) $\bar{x} = 150.125$ and $2s = 49.354$, so $\bar{x} \pm 2s$ is the interval (100.771, 199.479).

This interval contains 38 observations, or proportion 0.95 of the observations.

And $\bar{x} \pm 3s$ is the interval (76.094, 224.156), which contains proportion 1 of the

observations.

(b) The empirical guidelines suggest proportion 0.95 in the interval $\bar{x} \pm 2s$ and we

observed 0.95. They suggest proportion 0.997 for the interval $\bar{x} \pm 3s$ and we

observed 1.000. The agreement is excellent.

2.79 (a) $\bar{x} = 6.775$ and $s = \sqrt{19.4096} = 4.406$.

(b) The proportions of the observations are given in the following table:

              $\bar{x} \pm s$        $\bar{x} \pm 2s$        $\bar{x} \pm 3s$
Interval:     (2.369, 11.181)    (−2.037, 15.587)    (−6.443, 19.993)
Proportion:   26/40 = 0.65       38/40 = 0.95        40/40 = 1.00
Guidelines:   0.68               0.95                0.997

(c) We observe a good agreement with the proportions suggested by the empirical

guideline.

2.80 (a) $\bar{x} = 51.71/30 = 1.724$ and $s = \sqrt{(98.641 - 51.71^2/30)/29} = 0.5727$.

              $\bar{x} \pm s$       $\bar{x} \pm 2s$      $\bar{x} \pm 3s$
Interval:     (1.151, 2.297)    (.579, 2.869)     (.006, 3.442)
Proportion:   20/30 = 0.667     29/30 = 0.967     30/30 = 1.00
Guidelines:   0.68              0.95              0.997


guideline.

2.81 (a) $\bar{x} = 2.6$ and $s = \sqrt{1.69655} = 1.3025$.

              $\bar{x} \pm s$         $\bar{x} \pm 2s$       $\bar{x} \pm 3s$
Interval:     (1.2975, 3.9025)    (−0.005, 5.205)    (−1.3075, 6.5075)
Proportion:   15/30 = 0.50        30/30 = 1.00       30/30 = 1.00
Guidelines:   0.68                0.95               0.997


guideline.


2.82 (a) The z-values of 350 and 620 are

$z = \frac{350 - 490}{120} = -1.167$, $z = \frac{620 - 490}{120} = 1.083$.

(b) For the z-score of 2.4, the raw score is obtained by solving the equation

$z = \frac{x - 210}{50} = 2.4$, so $x = 210 + 50(2.4) = 330$.
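Standardizing a value and converting a z-score back to the raw scale are inverse operations, as the two parts of 2.82 illustrate. A minimal sketch, with hypothetical helper names `z_score` and `raw_score` and the center and spread values taken from the solution above:

```python
def z_score(x, center, spread):
    """Standardize x: how many 'spread' units x lies from 'center'."""
    return (x - center) / spread


def raw_score(z, center, spread):
    """Invert the standardization: recover x from its z-score."""
    return center + spread * z


# Part (a): center 490 and spread 120, as used in the solution.
z_350 = z_score(350, 490, 120)
z_620 = z_score(620, 490, 120)

# Part (b): center 210 and spread 50, z-score 2.4.
x_b = raw_score(2.4, 210, 50)

print(round(z_350, 3), round(z_620, 3), x_b)
```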

2.83 (a) $z = \frac{102 - 118.05}{15.47} = -1.037$ (b) $z = \frac{144 - 118.05}{15.47} = 1.677$

2.84 (a), (b) The boxplots for salaries in City A and City B are shown below.


(c) There is a greater difference between the cities with respect to the higher salaries.

For instance, any salary above the median in City B is greater than the 75th

percentile in City A.

2.85 For males, the minimum and the maximum horizontal velocity of a thrown ball are

25.2 and 59.9 respectively. The quartiles are:

$Q_1 = (38.6 + 39.1)/2 = 38.85$,

median $= (45.8 + 48.3)/2 = 47.05$,

$Q_3 = (49.9 + 51.7)/2 = 50.8$.

For females, the minimum and the maximum horizontal velocity of a thrown ball

are 19.4 and 53.7 respectively. The quartiles are

$Q_1 = 25.7$, median $= 30.3$, $Q_3 = 33.5$.

The boxplots of the male and female throwing speeds are

Comparing the two boxplots, we can see that males throw the ball faster than

females.

2.86 (a) The differences, arranged in order, between 2007 and 1992 Consumer Price

Index are

13 13 14 18 20 20 21 21 21 22 23 24 25 25 26 26

27 33 33 34 35 36 37 41

$Q_1 = \frac{20 + 21}{2} = 20.5$, $Q_2 = \frac{24 + 25}{2} = 24.5$, $Q_3 = \frac{33 + 33}{2} = 33$

The five-number summary is: 13, 20.5, 24.5, 33, 41

Alternatively, from Minitab:

Descriptive Statistics:
Variable   N   N*  Mean   SE Mean  StDev  Minimum  Q1     Median  Q3     Maximum
C1         24  0   25.33  1.59     7.81   13.00    20.25  24.50   33.00  41.00

(b) [Boxplot of increases in Consumer Price Indices: axis from 10 to 40.]

2.87 (a) $\bar{x} = \frac{608}{24} = 25.33$ and $s = \sqrt{\frac{16806 - (608)^2/24}{24 - 1}} = 7.811$

(b) Since $\bar{x} - 2s = 25.33 - 2(7.811) = 9.708$ and $\bar{x} + 2s = 25.33 + 2(7.811) = 40.952$,

only the increase of 41 for Honolulu lies outside the interval. The proportion

23/24 = 0.958 lies within the interval.
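The comparison with the empirical guidelines in 2.78 through 2.87 always follows the same steps: compute the mean and standard deviation, form the interval, and count the observations inside it. The sketch below does this for the 24 ordered CPI increases listed in 2.86; the function name `proportion_within` is hypothetical.

```python
from statistics import mean, stdev

# Ordered differences between the 2007 and 1992 CPI, from Exercise 2.86.
increases = [13, 13, 14, 18, 20, 20, 21, 21, 21, 22, 23, 24,
             25, 25, 26, 26, 27, 33, 33, 34, 35, 36, 37, 41]


def proportion_within(data, center, spread, k):
    """Proportion of observations strictly inside center ± k * spread."""
    low, high = center - k * spread, center + k * spread
    inside = sum(1 for x in data if low < x < high)
    return inside / len(data)


x_bar = mean(increases)
s = stdev(increases)          # n - 1 divisor, as in the solution

for k, guideline in [(1, 0.68), (2, 0.95), (3, 0.997)]:
    observed = proportion_within(increases, x_bar, s, k)
    print(f"within {k}s: observed {observed:.3f}, guideline {guideline}")
```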


2.88 (a) Using the ordered data set from Example 5, we have

min $= 4.5$, max $= 10.0$

$Q_1 = 6.3$, $Q_2 = 7.3$, $Q_3 = 8.3$

(b) The box plot depicting this data set is as follows:

[Box plot of hours of sleep: axis from 4 to 10.]

2.89 From Exercise 2.38, we know that

$\bar{x} = 1.925$ and $s = \sqrt{\frac{249 - (77)^2/40}{40 - 1}} = 1.607$

2.90 (a) $\bar{x} = 296/14 = 21.14$

(b) $s = \sqrt{\frac{12078 - 296^2/14}{14 - 1}} = 21.16$

(c) The dot plot is given below

(d) All are losses except for two gains in the 1998 and 2002 elections.

−8 −5 4 5 8 11 12 16 26 30 43 47 52 55

Median $= (12 + 16)/2 = 14$ seats lost.

(b) The maximum number of seats lost, 55, occurred when Harry S. Truman was

President. The minimum number, −8 (a gain), occurred during G.W. Bush’s

term as President.

(c) range $= 55 - (-8) = 63$

2.92

The process appears to be in statistical control. The pattern is nearly a horizontal

band with one possible low value.

2.93

The value 215 from the second pay period looks high and 194 from the fifth period

is possibly high.


2.94 We calculate $\bar{x} = 2283/25 = 91.32$ and $s = \sqrt{9281.44/24} = 19.67$, so the upper

limit is $\bar{x} + 2s = 130.66$ and the lower limit is $\bar{x} - 2s = 51.98$.

Only the value 50 calls for worker 20 is out of control.
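The control limits in 2.94 can be recomputed from the raw data. The sketch below assumes the 25 ordered values listed in Exercise 2.49 are the same calls-per-shift data (their sum, 2283, matches); because that list is sorted, the worker numbers are not recoverable here, only the values.

```python
from statistics import mean, stdev

# Ordered calls-per-shift values from Exercise 2.49 (assumed to be the 2.94 data).
calls = [50, 57, 68, 69, 72, 73, 73, 80, 82, 91,
         92, 93, 94, 96, 96, 100, 102, 104, 105, 106,
         108, 109, 118, 118, 127]

x_bar = mean(calls)
s = stdev(calls)                      # n - 1 divisor

lower = x_bar - 2 * s                 # lower control limit
upper = x_bar + 2 * s                 # upper control limit

# Any observation outside [lower, upper] is flagged as out of control.
out_of_control = [x for x in calls if x < lower or x > upper]

print(round(x_bar, 2), round(s, 2))
print(round(lower, 2), round(upper, 2), out_of_control)
```

The limits printed here can differ from the hand calculation in the second decimal place because the solution rounds s to 19.67 before doubling it.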

2.95 We calculate $\bar{x} = 2501/26 = 96.2$ and $s = \sqrt{65254/25} = 51.1$, so the upper limit is

$\bar{x} + 2s = 198.4$ and the lower limit is $\bar{x} - 2s = -6.0$, which we take as 0.

Only the value 215 from the second pay period is out of control.

2.96

[Time series plot of exchange rate: exchange rate (vertical axis, 1.0 to 1.6) versus year (1992 to 2007).]

The process appears to be in statistical control between 1994 and 2003, but then

begins to taper off from 2003 to 2007.

2.97

[I chart of exchange rate: individual value (vertical axis, 1.0 to 1.8) versus observation (1 to 15), with center line $\bar{X} = 1.3544$, UCL = 1.7918, and LCL = 0.917.]

The process appears to be in statistical control.


2.98 We re-calculate without the outlier 5326.

$\bar{x} = \frac{18329}{15} = 1221.9$ and $s = \sqrt{\frac{5195138}{14}} = 609.16$

so the upper limit is $\bar{x} + 2s = 2440.2$ and the lower limit is $\bar{x} - 2s = 3.6$. All of the

points are within the control limits.

2.99 (a) The relative frequencies of the occupation groups are:

Relative Frequency

2007 2000

Goods Producing 0.139 0.161

Service (Private) 0.722 0.702

Government 0.139 0.136

Total 1.000 0.999

(b) The proportions of persons in private service occupations and government have

increased while the proportion in goods producing has decreased from 2000 to

2007.

2.100 (a) The frequency table of “intended major” of the students is:

Intended major Frequency Relative Frequency

Biological Science 18 0.367

Humanities 4 0.082

Physical Sciences 9 0.184

Social Science 18 0.367

Total 49 1.000


(b) The frequency table of “year in college” of the students is:

Year Frequency Relative Frequency

1 4 0.082

2 10 0.204

3 20 0.408

4 15 0.306

Total 49 1.000

The histogram is:

[Relative frequency histogram: relative frequency (vertical axis, 0 to 0.45) versus year in college (1, 2, 3, 4).]

2.101 The dot diagrams of heights for the male and female students are


2.102 The frequency table of the causes for power outage is:

Frequency Table for Causes of Outage

Cause Frequency

Trees and limbs 12

Animals 9

Lightning 3

Wind storm 1

Fuse 1

Unknown 4

The Pareto chart for the cause of outage is

2.103 (a) Yes. The exact number of lunches is the sum of the frequencies of the first

three classes.

(b) Yes. The exact number of lunches is the sum of the frequencies of the last

three classes.

(c) No.

2.104 The sample mean and sample standard deviation are:

$\bar{x} = \frac{143 + 131 + 101 + 143 + 111}{5} = 125.8$

$s = \sqrt{\frac{1452.8}{5 - 1}} = 19.1$ millimeters.

2.105 (a) The mean, 227.4, is one measure of central tendency and the median, 232.5, is

another. These values may be interpreted as follows. On average, the 20

grizzly bears weigh 227.4 pounds apiece. Half of the grizzly bears sampled

weighed at least 232.5 pounds while half weighed at most 232.5 pounds.

(b) The sample standard deviation is 82.7 pounds.

(c) The z score for a weight of 320 pounds is

$z = \frac{320 - 227.4}{82.7} = 1.12$

2.106 (a) Median $= (64 + 67)/2 = 65.5$.

(b) We count in $38/4 = 9.5$, or 10, observations to find $Q_1 = 50$ and $Q_3 = 79$.

(c) The proportion of students who scored below 70 is 20/38 = 0.526.

The proportion of students who scored 80 or over is 9/38 = 0.237.

2.107 (a) Sample median $= (9 + 9)/2 = 9$.

(b) $\bar{x} = 271/30 = 9.033$.

(c) The sample variance is $s^2 = \frac{1}{29}\left[2561 - \frac{271^2}{30}\right] = 3.895$.

2.108 (a) The double stem display is

4 23

4 6677899

5 0011112244444

5 555566677778

6 0111244

6 589

(b) Median $= \frac{54 + 55}{2} = 54.5$, $Q_1 = 50.5$, $Q_3 = 57.5$.

2.109 (a) $\bar{x} = 7$, $s = 2$

(b) By the properties, the new data set $x + 100$ has sample mean $(7 + 100) = 107$

and standard deviation 2. By direct calculation, we verify

$\bar{x} = \frac{106 + 108 + 104 + 109 + 108}{5} = 107$

$s^2 = \frac{(106 - 107)^2 + (108 - 107)^2 + (104 - 107)^2 + (109 - 107)^2 + (108 - 107)^2}{4} = 4$

(c) By the properties, the new data set $-3x$ has sample mean $-3(7) = -21$

and standard deviation $|-3|s = 3(2) = 6$. By direct calculation, we verify

$\bar{x} = \frac{-18 - 24 - 12 - 27 - 24}{5} = -21$

$s^2 = \frac{(3)^2 + (-3)^2 + (9)^2 + (-6)^2 + (-3)^2}{4} = 9 \cdot \frac{1 + 1 + 9 + 4 + 1}{4} = 9(4) = 36$

2.110 (a) For the heights of males, $\bar{x} = 69.61$, $s = 2.97$.

(b) For the heights of females, $\bar{x} = 66.14$, $s = 2.60$.

(c) For the heights of males, median $= 70$, $Q_1 = 68$, $Q_3 = 72$.

(d) For the heights of females, median $= 66.5$, $Q_1 = 65$, $Q_3 = 67$.


2.111 (a) The dot diagrams are

(b) From the dot diagrams we can see the number of flies (grape juice) is centered

at about 11 and the number of flies (regular food) is centered near 25. The

spread looks about the same.

(c) Regular food: $\bar{x} = 25.1$, $s = 6.84$.

Grape juice: $\bar{x} = 11.05$, $s = 6194$.

2.112 (a) The dot diagram of the usage times per ounce of toothpaste is

(b) The relative frequency of usage times that do not exceed .80 is

(number of usage times less than or equal to .80)/24 = 12/24 = 0.50.

(c) $\bar{x} = 0.794$ and $s = 0.115$.

(d) Median $= 0.805$, $Q_1 = 0.755$ and $Q_3 = 0.86$.

2.113 (a) $\bar{x} = 5.38$ and $s = 3.42$.

(b) Median $= 5$.

(c) Range $= 13 - 0 = 13$.

2.114 (a) 0.75 since 96.0 is the third quartile.

(b) 0.50 since 84.0 is the median.

(c) 0.95 in the interval $\bar{x} \pm 2s$, or 59.5 to 101.5, if the frequency distribution is nearly

bell-shaped.

(d) 0.50 since $Q_1 = 75.5$ and $Q_3 = 96.0$.

(e) 0.997 in the interval $\bar{x} \pm 3s$, or 49.0 to 112.0, if the frequency distribution is

nearly bell-shaped.

2.115 (a) Median $= 4.505$, $Q_1 = 4.30$ and $Q_3 = 4.70$.

(b) 90th percentile $= (4.80 + 5.07)/2 = 4.935$.

(c) $\bar{x} = 4.5074$ and $s = 0.368$.

(d) The boxplot of acid rain in Wisconsin is

2.116 (a) and (b). We have $\bar{x} = 4.507$ and $s = 0.368$ so

              $\bar{x} \pm s$       $\bar{x} \pm 2s$      $\bar{x} \pm 3s$
Interval:     (4.139, 4.875)    (3.771, 5.243)    (3.403, 5.611)
Proportion:   38/50 = 0.76      46/50 = 0.92      50/50 = 1.000
Guidelines:   0.68              0.95              0.997

(c) The observed proportions are somewhat close to those suggested by the

empirical guidelines. However, the proportion between one and two standard

deviations, $(46 - 38)/50 = 0.16$, is noticeably smaller than the expected

$0.95 - 0.68 = 0.27$.


2.117 (a) Median $= 6.7$, $Q_1 = 6.4$ and $Q_3 = 7.1$.

(b) $\bar{x} = 370.4/55 = 6.7345$ and $s = \sqrt{\left[2506.22 - (370.4)^2/55\right]/54} = 0.466$.

(c) The boxplot of the data is

2.118 (a)

Class Interval      Frequency   Relative Frequency
(−4500, −1600]       1          0.026
(−1600, −850]        1          0.026
(−850, −250]         2          0.051
(−250, 0]            8          0.205
(0, 250]            13          0.333
(250, 850]           6          0.154
(850, 1650]          5          0.128
(1650, 2450]         3          0.077
Total               39          1.0000

(b) The frequency and relative frequency histograms are:

[Frequency histogram, "Yearly Changes in Dow Jones Averages": frequency (vertical axis, 0 to 14) versus class boundaries −1600, −850, −250, 0, 250, 850, 1650, 2450.]

[Relative frequency histogram, "Yearly Changes in Dow Jones Averages": relative frequency (vertical axis, 0 to 0.35) versus changes in DJ average (classes 1 to 8).]

(Here, the classes were renumbered 1 to 8, consecutively, for simplicity.)

(c) Since the value 0 is included in the interval (−250, 0], it is unclear how many

of the 8 observations contained in that interval are negative. It is, however,

unlikely that the Dow Jones average at the end of one year was exactly the

same as that in the previous year. It seems safe to assume that all 8

differences in the interval (−250, 0] are negative. The proportion of changes

that are negative is then $(1 + 1 + 2 + 8)/39 = 0.308$.

(d) The distribution is roughly bell-shaped centered around class 5.

2.119 (a) [Time plot of winning times in minutes: winning time (vertical axis, 3.6 to 4.2) versus year (1964 to 2008, every four years).]


(b) It is not reasonable, because a frequency distribution would not show the

systematic decrease of the winning times over the years, which is the main

feature of these observations.

2.120 The mode is 1 bag, since the sample has twenty-four 1’s.

2.121 We calculate

$\bar{x} = \frac{4167}{50} = 83.34$ and $s = \sqrt{\frac{419{,}411 - 4167^2/50}{49}} = 38.368$

2.122 (a) The time plot for this data set is as follows:

[Time plot for deaths due to lightning: number of deaths (vertical axis, 50 to 200) versus year (1959 to 2004).]

(b) The number of deaths due to lightning is steadily declining over the indicated

time period. The mean and standard deviation, while important, would not

capture this trend. The 5-number summary would be an important supplement

to report when describing this data set.

2.123 (a) The partial MINITAB output is


(b) The partial MINITAB output for the acid rain data.

(Note that MINITAB uses a slightly different convention for determining $Q_1$

and $Q_3$.)

2.124 (a) The histogram and box plot reveal a longer right hand tail. The five

observations 4749, 4846, 4949, 5005, and 5157 seem to be detached and

larger than expected for a bell-shaped pattern.

(b) Textbook scheme: $Q_1 = 3470$. MINITAB scheme: $Q_1 = 3457.5$.

The scheme used by the text yields a slightly higher value.

2.125 The partial MINITAB output for the data set in Table 4.

2.126 The MINITAB commands and partial output of the final times to run 1.5 miles in

Data Bank D.5.

2.127 The mean and standard deviation given by MINITAB are the rounded off values

of the answer given by SAS.


2.128 (a) Freshwater growth of male salmon: $\bar{x} = 98.350$, $s = 30.03$,

median $= 98.5$, $Q_1 = 80$ and $Q_3 = 118$. The frequency table for freshwater

growth of male salmon is:

Class Interval   Frequency   Relative Frequency
40-60             6          0.150
60-80             4          0.100
80-100           11          0.275
100-120          10          0.250
120-140           6          0.150
140-160           3          0.075
Total            40          1.000

Because the cells have equal width, we take the option of using relative

frequency for the vertical scale. The histogram is shown below.

(b) Freshwater growth of female salmon: $\bar{x} = 114.825$, $s = 22.22$,

median $= 115.5$, $Q_1 = 98.5$ and $Q_3 = 132.5$. The frequency table for

freshwater growth of female salmon is:

Class Interval   Frequency   Relative Frequency
60-80             2          0.050
80-100            9          0.225
100-120          12          0.300
120-140          13          0.325
140-160           3          0.075
160-180           1          0.025
Total            40          1.000

Because the cells have equal width, we take the option of using relative

frequency for the vertical scale. The histogram is shown below.


(c) The box plots for freshwater growth of male and female salmon are:

2.129 (a) The histogram of the alligator data is

(b)

$\bar{x} = \frac{4035}{37} = 109.1$, $s = \sqrt{\frac{155672}{37 - 1}} = 65.8$


16 18 19 19 24 29 32 33 46 58

68 72 78 82 82 83 99 101 109 110

114 118 125 134 140 141 142 143 163 170

184 194 200 220 220 221 228

Median $= 109$, $Q_1 = 58$ and $Q_3 = 143$.

(b) We count in 34 positions to find the 90th percentile 220= . The only two

observations that are higher were taken from females.


2.131 (a)

(b) The ordered observations are

75.3 75.7 75.9 75.9 76.2 76.3 76.4 76.4 76.6 76.6

76.7 76.9 76.9 77.0 77.0 77.1 77.4 77.4 77.4 77.4

77.4 77.5 77.6 77.6 77.8 77.9 77.9 77.9 77.9 77.9

78.0 78.1 78.3 78.4 78.4 78.5 79.1 79.2 80.0 80.4

There are 40 observations, so the median $= (77.4 + 77.4)/2 = 77.4$. The first

quartile is the average of the $40/4 = 10$th and 11th observations in the sorted

list: $Q_1 = (76.6 + 76.7)/2 = 76.65$. Similarly, the third quartile is the average of the

30th and 31st observations: $Q_3 = (77.9 + 78.0)/2 = 77.95$.

(c) The interval $\bar{x} \pm s$, or (76.36, 78.56), has relative frequency 30/40 = 0.75

compared to 0.683. The interval $\bar{x} \pm 2s$, or (75.26, 79.66), has relative

frequency 38/40 = 0.95 compared to 0.95. The interval $\bar{x} \pm 3s$, or

(74.16, 80.76), has relative frequency 1 compared to 0.997. The agreement is

quite good.
