Copyright © 2013, 2010 and 2007 Pearson Education, Inc. Chapter 3: Numerically Summarizing Data

Upload: deirdre-shields

Post on 12-Jan-2016


TRANSCRIPT

Page 1:

Copyright © 2013, 2010 and 2007 Pearson Education, Inc.

Chapter 3: Numerically Summarizing Data

Page 2:
Page 3:

Section 3.1: Measures of Central Tendency

Page 4:

The arithmetic mean of a variable is computed by adding all the values in the data set and then dividing by the number of observations, denoted N for a population or n for a sample.

Page 5:

The population mean, μ (pronounced "mew"), is computed using all the individuals in a population. The population mean is a parameter.

The sample mean, x̄ (pronounced "x-bar"), is computed using sample data. It is a statistic.

Page 6:

If x1, x2, …, xN are the N observations of a variable from a population, then the population mean, μ, is:

$$\mu = \frac{x_1 + x_2 + \cdots + x_N}{N} = \frac{\sum x_i}{N}$$

Page 7:

If x1, x2, …, xn are the n observations of a variable from a sample, then the sample mean, x̄, is:

$$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{\sum x_i}{n}$$

Page 8:

EXAMPLE Computing a Population Mean and a Sample Mean

The following data represent the travel times (min) to work for all seven employees of a company.

23, 36, 23, 18, 5, 26, 43

(a) Compute the population mean of this data.

(b) Then, take a simple random sample of n = 3 employees. Compute the sample mean.

(c) Then, take a second simple random sample of n = 3 employees. Again compute the sample mean.

Page 9:

EXAMPLE Computing a Population Mean

(a)

$$\mu = \frac{\sum x_i}{N} = \frac{x_1 + x_2 + \cdots + x_7}{7} = \frac{23 + 36 + 23 + 18 + 5 + 26 + 43}{7} = \frac{174}{7} \approx 24.9 \text{ minutes}$$

Page 10:

EXAMPLE Computing a Sample Mean

(b) Obtain a simple random sample of size n = 3 from the population of seven employees. Use this simple random sample to determine a sample mean. Find a second simple random sample and determine that sample mean.

Employee: 1, 2, 3, 4, 5, 6, 7

Time (min): 23, 36, 23, 18, 5, 26, 43

First sample: 5, 36, 26

$$\bar{x} = \frac{5 + 36 + 26}{3} \approx 22.3$$

Second sample: 36, 23, 26

$$\bar{x} = \frac{36 + 23 + 26}{3} \approx 28.3$$

Hint: Use Calc RandInt

Recall, pop mean = 24.9 min
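
The same computation can be sketched in Python. This is a minimal illustration, not part of the original slides: the variable names are assumptions, and `random.sample` stands in for the calculator's RandInt when drawing a simple random sample.

```python
import random

# Travel times (min) for all seven employees: the whole population.
travel_times = [23, 36, 23, 18, 5, 26, 43]

# Population mean: sum of all N values divided by N.
population_mean = sum(travel_times) / len(travel_times)
print(round(population_mean, 1))  # 24.9

# Sample mean: draw n = 3 employees without replacement, then average.
sample = random.sample(travel_times, 3)
sample_mean = sum(sample) / len(sample)
print(sample, round(sample_mean, 1))  # varies from run to run
```

Each run draws a different sample, so the sample mean changes while the population mean stays fixed at about 24.9 minutes.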

Page 11:

The median of a variable is the value that lies in the middle of the data when arranged (sorted) in ascending order (Calc:Stat:SortA).

It is the value for which there are an equal number of data values above and below it.

The median may or may not be an actual data value.

Page 12:

Finding the Median of a Data Set

1. Enter the data into the Calc as a List (Stat).

2. Sort the data in ascending order. Assign each data value a rank, starting at the minimum.

3. Determine the number of observations, n.

4. Determine the observation in the middle of the data set: Rank = (n+1)/2.

Page 13:

Steps in Finding the Median of a Data Set

• If n is odd, then the median is the actual data value in the middle of the data set.

• If n is even, then the median is the mean of the two middle observations.

Page 14:

EXAMPLE Computing a Median of a Data Set

The following data represent the travel times (in minutes) to work for all seven employees of a start-up web development company.

23, 36, 23, 18, 5, 26, 43

Determine the median of this data.

Step 1: Sort (A): 5, 18, 23, 23, 26, 36, 43

Step 2: $\frac{n + 1}{2} = \frac{7 + 1}{2} = 4$, so the median is the value at Rank = 4.

M = 23

Page 15:

EXAMPLE Computing a Median of a Data Set

Suppose the start-up company hires a new employee. The travel time of the new employee is 70 minutes. Determine the median of the "new" data set.

23, 36, 23, 18, 5, 26, 43, 70

Sorted: 5, 18, 23, 23, 26, 36, 43, 70

$\frac{n + 1}{2} = \frac{8 + 1}{2} = 4.5$, so the median is at Rank 4½, halfway between the 4th and 5th data values:

$$M = \frac{23 + 26}{2} = 24.5$$
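
The rank procedure for odd and even n can be sketched as follows. The function name `median_by_rank` is an assumption made for this illustration.

```python
def median_by_rank(data):
    """Median via the rank (n + 1) / 2 of the sorted data."""
    ordered = sorted(data)
    n = len(ordered)
    if n % 2 == 1:
        # Odd n: (n + 1) / 2 is a whole number; convert to a 0-based index.
        return ordered[(n + 1) // 2 - 1]
    # Even n: the rank ends in .5, so average the two neighbouring values.
    lower = ordered[n // 2 - 1]
    upper = ordered[n // 2]
    return (lower + upper) / 2

print(median_by_rank([23, 36, 23, 18, 5, 26, 43]))      # 23
print(median_by_rank([23, 36, 23, 18, 5, 26, 43, 70]))  # 24.5
```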

Page 16:

EXAMPLE Computing a Median of a Data Set

The following data represent the travel times (in minutes) to work for all seven employees of a start-up company.

23, 36, 23, 18, 5, 26, 43

Suppose a new employee is hired who has a 130 min commute.

How does this impact the value of the mean and median?

Mean before new hire: 24.9 minutes

Mean after new hire: 38 minutes

Median before new hire: 23 minutes

Median after new hire: 24.5 minutes

Page 17:

A numerical summary of data (mean, median, etc.) is said to be resistant if "extreme" data values (very large or very small) do not affect its value significantly.
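
A short sketch of resistance, using the travel-time data and the 130-minute new hire from the earlier example. The variable names are assumptions; `statistics.mean` and `statistics.median` are from Python's standard library.

```python
import statistics

before = [23, 36, 23, 18, 5, 26, 43]
after = before + [130]  # new hire with a 130-minute commute

mean_shift = statistics.mean(after) - statistics.mean(before)
median_shift = statistics.median(after) - statistics.median(before)

print(round(mean_shift, 1))  # 13.1: the mean is not resistant
print(median_shift)          # 1.5: the median barely moves
```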

Page 18:

Page 19:

EXAMPLE Describing the Shape of the Distribution

The following data represent the asking price ($) of homes for sale in Lincoln, NE.

Source: http://www.homeseekers.com

79,995 128,950 149,900 189,900

99,899 130,950 151,350 203,950

105,200 131,800 154,900 217,500

111,000 132,300 159,900 260,000

120,000 134,950 163,300 284,900

121,700 135,500 165,000 299,900

125,950 138,500 174,850 309,900

126,900 147,500 180,000 349,900

Page 20:

The mean asking price is $168,320 and the median asking price is $148,700.

Because the mean is noticeably larger than the median, we would conjecture (estimate) that the distribution is skewed right.
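
The comparison can be checked with a few lines of Python. This is only a sketch: comparing the mean with the median is a rule of thumb for conjecturing skewness, not a proof of shape, and the variable names are assumptions.

```python
import statistics

# Asking prices ($) of the 32 homes listed in the example.
asking_prices = [
    79995, 128950, 149900, 189900,
    99899, 130950, 151350, 203950,
    105200, 131800, 154900, 217500,
    111000, 132300, 159900, 260000,
    120000, 134950, 163300, 284900,
    121700, 135500, 165000, 299900,
    125950, 138500, 174850, 309900,
    126900, 147500, 180000, 349900,
]

mean_price = statistics.mean(asking_prices)
median_price = statistics.median(asking_prices)
print(round(mean_price), median_price)  # 168320 148700.0

# Rule of thumb: mean well above the median suggests a long right tail.
conjecture = "skewed right" if mean_price > median_price else "not skewed right"
print(conjecture)  # skewed right
```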

Page 21:

[Figure: histogram titled "Asking Price of Homes in Lincoln, NE"; horizontal axis: Asking Price ($100,000 to $350,000); vertical axis: Frequency (0 to 12)]

Page 22:

The mode of a variable is the most frequently occurring observation of the variable (if there is one).

If no data value occurs more than once, we say the data have no mode.

A set of data can have no mode, one mode, or more than one mode.
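
The three cases (no mode, one mode, more than one mode) can be sketched with a tally. The function name `modes` is an assumption for this illustration; `collections.Counter` is from the standard library.

```python
from collections import Counter

def modes(data):
    """Return every value tied for the highest count, or [] if nothing repeats."""
    counts = Counter(data)
    top = max(counts.values())
    if top == 1:
        return []  # no value occurs more than once: no mode
    return [value for value, count in counts.items() if count == top]

print(modes([23, 36, 23, 18, 5, 26, 43]))  # [23]
print(modes([1, 2, 3]))                    # []
print(modes([1, 1, 2, 2, 3]))              # [1, 2]
```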

Page 23:

EXAMPLE Finding the Mode of a Data Set

The data on the next slide represent the Vice Presidents of the United States and their state of birth.

Find the mode, if there is one.

Page 24:

[Table: Vice Presidents of the United States and their state of birth. Only the final row survives in this transcript: Joe Biden, Pennsylvania.]

Page 25:

The mode is New York.

Page 26:

Tally data to determine most frequent observation

Page 27:
Page 28:

Section 3.2: Measures of Dispersion (Variation)

Page 29:

The range, R, of a variable is the difference between the largest data value and the smallest data value.

Range = R = max value – min value

Page 30:

EXAMPLE Finding the Range of a Set of Data

The following data represent the travel times (min) to work for all seven employees of a start-up company:

23, 36, 23, 18, 5, 26, 43

Find the range of the data.

Range = max – min = 43 – 5 = 38 minutes

Page 31:

The population standard deviation of a variable is the square root of the mean of the squared deviations from the population mean.

The population standard deviation is symbolically represented by "σ" (lowercase Greek letter sigma).

Page 32:

Computing Population Standard Deviation

The following data represent the travel times (min) to work for all seven employees of a start-up company.

23, 36, 23, 18, 5, 26, 43

Compute the population standard deviation of this data.

Hint: First, put the data into a TI-84 List, then find the mean = 24.85714 min

Page 33:

xi    μ    xi – μ    (xi – μ)²

23 24.85714 -1.85714 3.44898

36 24.85714 11.14286 124.1633

23 24.85714 -1.85714 3.44898

18 24.85714 -6.85714 47.02041

5 24.85714 -19.8571 394.3061

26 24.85714 1.142857 1.306122

43 24.85714 18.14286 329.1633

Σ(xi – μ)² = 902.8571

$$\sigma = \sqrt{\frac{\sum (x_i - \mu)^2}{N}} = \sqrt{\frac{902.8571}{7}} \approx 11.36 \text{ minutes}$$
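
The table's columns map directly onto a few lines of Python. A minimal sketch with assumed variable names:

```python
import math

travel_times = [23, 36, 23, 18, 5, 26, 43]
N = len(travel_times)
mu = sum(travel_times) / N  # population mean, about 24.85714

# One squared deviation per row of the table.
squared_deviations = [(x - mu) ** 2 for x in travel_times]

# Mean of the squared deviations, then the square root.
sigma = math.sqrt(sum(squared_deviations) / N)

print(round(sum(squared_deviations), 4))  # 902.8571
print(round(sigma, 2))                    # 11.36
```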

Page 34:

The sample standard deviation, "s", of a variable uses the same computation, except that we use the sample mean x̄ in place of μ and divide by (n – 1) instead of N.

"n" is the sample size and "N" is the population size of the data set.

Page 35:

EXAMPLE Computing a Sample Standard Deviation

Here are the results of a random sample of three times taken from the travel times (min) to work for all seven employees of a start-up company:

5, 26, 36

Find the sample standard deviation “s”.

Page 36:

$$s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}} = \sqrt{\frac{500.66667}{2}} \approx 15.82 \text{ minutes}$$

Recall, the Pop std dev (using all 7 times) was 11.36 min.
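
A sketch of the sample computation for the sample 5, 26, 36, with the standard library's `statistics.stdev` (which also divides by n – 1) as a cross-check. Variable names are assumptions.

```python
import math
import statistics

sample = [5, 26, 36]
n = len(sample)
x_bar = sum(sample) / n

# Sum of squared deviations from the sample mean.
sum_sq = sum((x - x_bar) ** 2 for x in sample)

# Divide by n - 1 (not n) before taking the square root.
s = math.sqrt(sum_sq / (n - 1))

print(round(sum_sq, 5))  # 500.66667
print(round(s, 2))       # 15.82
print(math.isclose(s, statistics.stdev(sample)))  # True
```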

Page 37:

If you have final exam grades for two different math classes (A & B), and the data are more dispersed (larger range) for one class (A) than for the other, then the standard deviation of that class (A) will typically be larger than that of the other.

Page 38:

The variance of a variable is the square of the standard deviation.

The math symbol for the population variance is σ², and for the sample variance is s².

Page 39:

EXAMPLE Computing a Population Variance

The following data represent the travel times (min) to work for all seven employees of a start-up web development company.

23, 36, 23, 18, 5, 26, 43

Compute the population and sample variance of this data.

Page 40:

EXAMPLE Computing a Population Variance

Recall from earlier that the pop standard deviation was σ = 11.36 min, so the pop variance is σ² = 11.36² ≈ 129.05 min². (This squares the rounded σ; the unrounded value is 902.8571/7 ≈ 128.98 min².)

From before, the sample std dev (for the sample 5, 26, 36) was s = 15.82 min, so the sample variance is s² = 15.82² ≈ 250.27 min². (Unrounded: 500.66667/2 ≈ 250.33 min².)
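
As a quick check, the variances can be computed directly rather than by squaring a rounded standard deviation. This sketch uses `statistics.pvariance` and `statistics.variance` from the standard library; the small differences from 129.05 and 250.27 come from rounding σ and s before squaring.

```python
import statistics

population = [23, 36, 23, 18, 5, 26, 43]  # all seven employees
sample = [5, 26, 36]                      # the sample of three

pop_var = statistics.pvariance(population)  # divides by N
samp_var = statistics.variance(sample)      # divides by n - 1

print(round(pop_var, 2))   # 128.98
print(round(samp_var, 2))  # 250.33
```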

Page 41:

TI-84

The calculator will give two std devs: "σ" for a Pop and "s" for a sample.

It will not give any variance or range statistics.

Page 42:

The Empirical Rule: 68 – 95 – 99.7

If a distribution is roughly bell shaped, then:

1. Approx 68% of the data will lie within 1 standard deviation of the mean.

2. Approx 95% of the data will lie within 2 standard deviations of the mean.

3. Approx 99.7% of the data will lie within 3 standard deviations of the mean.

Page 43:

Page 44:

EXAMPLE Using the Empirical Rule

The following data represent the blood HDL cholesterol levels of all 54 female patients of Dr. Dracula.

41 48 43 38 35 37 44 44 44
62 75 77 58 82 39 85 55 54
67 69 69 70 65 72 74 74 74
60 60 60 61 62 63 64 64 64
54 54 55 56 56 56 57 58 59
45 47 47 48 48 50 52 52 53

Page 45:

Using a TI-84 we find: μ = 57.4 and σ = 11.7

Page 46:

[Figure: bell curve with the horizontal axis marked from μ – 3σ to μ + 3σ: 22.3, 34.0, 45.7, 57.4, 69.1, 80.8, 92.5]

According to the Empirical Rule, approximately 99.7% of the patients will have HDL within 3 standard deviations of the mean (between 22.3 and 92.5).

According to the Empirical Rule, 13.5% + 34% + 34% = 81.5% of all patients will have HDL between 34.0 and 69.1 (from 2 standard deviations below the mean to 1 standard deviation above it).

Actually, 45 of the 54, or 83.3%, of his patients have HDL between 34.0 and 69.1.
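
The actual percentage can be counted from the raw data. A sketch, assuming the 54 values as listed in the example; `share_between` is a helper name introduced only for this illustration.

```python
import statistics

hdl = [
    41, 48, 43, 38, 35, 37, 44, 44, 44,
    62, 75, 77, 58, 82, 39, 85, 55, 54,
    67, 69, 69, 70, 65, 72, 74, 74, 74,
    60, 60, 60, 61, 62, 63, 64, 64, 64,
    54, 54, 55, 56, 56, 56, 57, 58, 59,
    45, 47, 47, 48, 48, 50, 52, 52, 53,
]

mu = statistics.mean(hdl)
sigma = statistics.pstdev(hdl)  # population SD: all 54 patients are included
print(round(mu, 1), round(sigma, 1))  # 57.4 11.7

def share_between(data, low, high):
    """Fraction of the data lying in the closed interval [low, high]."""
    return sum(low <= x <= high for x in data) / len(data)

# From 2 SD below the mean to 1 SD above: Empirical Rule predicts about 81.5%.
actual = share_between(hdl, mu - 2 * sigma, mu + sigma)
print(round(100 * actual, 1))  # 83.3
```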

Page 47:

Chebychev's Theorem

For any type of distribution (Normal or not), at least

$$\left(1 - \frac{1}{k^2}\right) \cdot 100\%$$

of the data lie within "k" std devs of the mean (k > 1).

Page 48:

EXAMPLE Using Chebychev's Theorem

(a) Using the data from the HDL blood example, use Chebychev's Theorem to determine the percentage of patients that have HDL levels within 3 std dev (SD) of the mean.

at least $\left(1 - \frac{1}{3^2}\right) \cdot 100\% \approx 88.9\%$

(b) Determine the actual percentage of his patients that had HDL between 34 and 80.8. These bounds are 2 SD either side of the mean, where the theorem guarantees at least 75%.

52/54 ≈ 0.96 ≈ 96%
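
The bound itself is a one-line formula. A minimal sketch; the function name `chebyshev_lower_bound` is an assumption.

```python
def chebyshev_lower_bound(k):
    """Minimum fraction of data within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("the theorem applies only for k > 1")
    return 1 - 1 / k ** 2

print(round(100 * chebyshev_lower_bound(3), 1))  # 88.9
print(round(100 * chebyshev_lower_bound(2), 1))  # 75.0
```

The bound is a guaranteed minimum for any distribution, so actual percentages (such as the 96% observed between 34 and 80.8) are usually higher.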

Page 49:

Section 3.3: Measures of Central Tendency/Dispersion from Grouped Data

Page 50:

We have discussed computing statistics from raw data, but often the only available data have already been summarized into frequency distributions called “grouped data”.

We cannot find exact values of the mean/std dev without raw data, but we can approximate these measures using the following techniques….

Page 51:

1. Approximate the Mean of a Variable from a Frequency Distribution

$$\bar{x} = \frac{\sum x_i f_i}{\sum f_i} = \frac{x_1 f_1 + x_2 f_2 + \cdots + x_n f_n}{f_1 + f_2 + \cdots + f_n}$$

where xi is the midpoint or value of the ith class, fi is the frequency of the ith class, and n is the number of classes.

Page 52:

EXAMPLE Approximating the Mean from a Frequency Distribution

The National Survey of Student Engagement is a 2007 survey asking freshman college students how much time they spend preparing for class each week.

Hours 0 1-5 6-10 11-15 16-20 21-25 26-30 31-35

Freq 0 130 250 230 180 100 60 50

Approximate the mean number of hours spent preparing for class each week.

Page 53:

Time (hr)    Frequency fi    Midpoint xi    xi fi

0 0 0 0

1 - 5 130 3 390

6 - 10 250 8 2000

11 - 15 230 13 2990

16 - 20 180 18 3240

21 - 25 100 23 2300

26 – 30 60 28 1680

31 – 35 50 33 1650

Σ fi = 1000    Σ xi fi = 14,250

$$\bar{x} = \frac{\sum x_i f_i}{\sum f_i} = \frac{14{,}250}{1000} = 14.25$$
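
The table's arithmetic can be reproduced as follows. A sketch with assumed names; each class is written as (lower limit, upper limit, frequency).

```python
# (lower limit, upper limit, frequency) for each class of study hours.
classes = [
    (0, 0, 0), (1, 5, 130), (6, 10, 250), (11, 15, 230),
    (16, 20, 180), (21, 25, 100), (26, 30, 60), (31, 35, 50),
]

midpoints = [(low + high) / 2 for low, high, _ in classes]
freqs = [f for _, _, f in classes]

sum_xf = sum(x * f for x, f in zip(midpoints, freqs))
sum_f = sum(freqs)
approx_mean = sum_xf / sum_f

print(sum_xf, sum_f)  # 14250.0 1000
print(approx_mean)    # 14.25
```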

Page 54:

2. The weighted mean, x̄w, of a variable is found by multiplying each value of the variable by its corresponding weight, adding these products, and dividing by the sum of the weights.

$$\bar{x}_w = \frac{\sum w_i x_i}{\sum w_i} = \frac{w_1 x_1 + w_2 x_2 + \cdots + w_n x_n}{w_1 + w_2 + \cdots + w_n}$$

where wi is the weight of the ith observation and xi is the value of the ith observation.

Page 55:

EXAMPLE Computing a Weighted Mean

Kayla goes to the Nut store and creates her own snack mix. She combines 1 pound of raisins, 2 pounds of chocolate-covered peanuts, and 1.5 pounds of cashews. The raisins cost $1.25 per pound, the chocolate-covered peanuts cost $3.25 per pound, and the cashews cost $5.40 per pound.

What is the mean cost per pound of this mix?

$$\bar{x}_w = \frac{1(\$1.25) + 2(\$3.25) + 1.5(\$5.40)}{1 + 2 + 1.5} = \frac{\$15.85}{4.5} \approx \$3.52$$
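
The snack-mix calculation as a short sketch. The function name `weighted_mean` and the variable names are assumptions; the weights here are the pounds of each ingredient.

```python
def weighted_mean(values, weights):
    """Sum of weight * value, divided by the sum of the weights."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)

price_per_pound = [1.25, 3.25, 5.40]  # raisins, peanuts, cashews
pounds = [1, 2, 1.5]

print(round(weighted_mean(price_per_pound, pounds), 2))  # 3.52
```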

Page 56:

Approximate the Standard Deviation of a Variable from a Frequency Distribution

Population Standard Deviation:

$$\sigma = \sqrt{\frac{\sum (x_i - \mu)^2 f_i}{\sum f_i}}$$

Sample Standard Deviation:

$$s = \sqrt{\frac{\sum (x_i - \bar{x})^2 f_i}{\left(\sum f_i\right) - 1}}$$

where xi is the midpoint of the ith class and fi is the frequency of the ith class.
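
Both formulas can be sketched in one function. The name `grouped_std` and its `sample` flag are assumptions for this illustration; the mean used is the grouped approximation Σ xi fi / Σ fi, and the study-hours data are reused only as an example input.

```python
import math

def grouped_std(midpoints, freqs, sample=False):
    """Approximate SD from class midpoints and frequencies."""
    total = sum(freqs)
    mean = sum(x * f for x, f in zip(midpoints, freqs)) / total
    # Each squared deviation is weighted by its class frequency.
    weighted_sq = sum(f * (x - mean) ** 2 for x, f in zip(midpoints, freqs))
    divisor = total - 1 if sample else total
    return math.sqrt(weighted_sq / divisor)

midpoints = [0, 3, 8, 13, 18, 23, 28, 33]
freqs = [0, 130, 250, 230, 180, 100, 60, 50]

print(grouped_std(midpoints, freqs))               # population form
print(grouped_std(midpoints, freqs, sample=True))  # sample form
```

With 1000 observations the two results are nearly identical, since dividing by 999 instead of 1000 changes little.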

Page 57:

Section 3.4: Measures of Position and Outliers

Page 58:

A z-score describes how many std dev's a data value is from the mean (above or below). There is both a population z-score and a sample z-score:

Population z-score:

$$z = \frac{x - \mu}{\sigma}$$

Sample z-score:

$$z = \frac{x - \bar{x}}{s}$$

The z-scores of a data set have a mean of 0 and a standard deviation of 1.

Page 59:

EXAMPLE Using Z-Scores

The mean height of adult males is 69.1 inches with a standard deviation of 2.8 inches. The mean height of adult females is 63.7 inches with a standard deviation of 2.7 inches.

Who is relatively taller: a man whose height is 83 inches, or a woman whose height is 76 inches?

(In other words, which one is further from the mean of their gender?)

Page 60:

$$z_{kg} = \frac{83 - 69.1}{2.8} \approx 4.96$$

$$z_{cp} = \frac{76 - 63.7}{2.7} \approx 4.56$$

The man's height is 4.96 standard deviations above the male mean. The woman's height is 4.56 standard deviations above the female mean.

The man is relatively taller because his height is further above the mean height of men, measured in standard deviations, than hers is above that of women.
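
The comparison in code, as a minimal sketch. The function and variable names are assumptions.

```python
def z_score(x, mean, std_dev):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / std_dev

z_man = z_score(83, 69.1, 2.8)
z_woman = z_score(76, 63.7, 2.7)

print(round(z_man, 2), round(z_woman, 2))  # 4.96 4.56
print("man" if z_man > z_woman else "woman", "is relatively taller")
```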

Page 61:

The kth percentile, denoted Pk, of a set of data is a value such that "k" percent of the observations are less than or equal to the value.

Page 62:

EXAMPLE Interpret a Percentile

The Graduate Record Examination (GRE) is an exam required for admission to many U.S. graduate schools.

The University of Pittsburgh Graduate School of Public Health requires a GRE score no less than the 70th percentile for admission into their Human Genetics Master of Science program.

Interpret this P70 admissions requirement.

Page 63:

EXAMPLE Interpret a Percentile

The 70th percentile is the score such that 70% of the individuals who took the exam scored worse, and 30% of the individuals scored the same or better.

In order to be admitted to this program, an applicant must score higher than 70% of the people who take the GRE.

Or, the individual’s score must be in the top 30%.

Page 64:

EXAMPLE Percentile

The following are scores on a Stats exam:

42, 50, 59, 62, 68, 73, 76, 81, 86, 90, 94, 100

What is the percentile value of the 81 score?

7 of the 12 scores are below 81, and 7/12 ≈ 0.58, so 81 is the score $P_{58}$

… and 94 is the score $P_{83}$ (10 of the 12 scores are below 94, and 10/12 ≈ 0.83).
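
The slide's arithmetic (scores below the value, divided by n) can be sketched as below. The function name `percentile_of` is an assumption, and this "strictly below" counting is only one of several percentile conventions in use.

```python
def percentile_of(score, data):
    """Percent of observations strictly below `score`, rounded to a whole number."""
    below = sum(x < score for x in data)
    return round(100 * below / len(data))

scores = [42, 50, 59, 62, 68, 73, 76, 81, 86, 90, 94, 100]

print(percentile_of(81, scores))  # 58
print(percentile_of(94, scores))  # 83
```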

Page 65:

"Quartiles" divide data into fourths, or four equal parts.

• The 1st quartile, Q1, divides the bottom 25% of the data from the top 75%. The 1st quartile is equivalent to the 25th percentile.

• The 2nd quartile, Q2, divides the bottom 50% of the data from the top 50%. It is equivalent to the 50th percentile, which is equivalent to the median.

• The 3rd quartile, Q3, divides the bottom 75% of the data from the top 25%. It is equivalent to the 75th percentile.

Page 66:

Finding Quartiles

Step 1 Arrange the data in ascending order.

Step 2 Determine the median, M, or second quartile, Q2 .

Step 3 Divide the data set into halves: the observations below (to the left of) M and the observations above M. The first quartile, Q1 , is the median of the bottom half, and the third quartile, Q3 , is the median of the top half.

Page 67:

EXAMPLE Finding and Interpreting Quartiles

A group of Brigham Young University students collected data on the speed of vehicles traveling through a construction zone on a state highway, where the posted speed was 25 mph.

The recorded speed of 14 randomly selected vehicles is given below:

20, 24, 27, 28, 29, 30, 32, 33, 34, 36, 38, 39, 40, 40

Find and interpret the quartiles for speed in the construction zone.

Page 68:

EXAMPLE Finding and Interpreting Quartiles

n = 14 observations; Median = Q2 = (32 + 33)/2 = 32.5 mph

The median of the bottom half of the data is Q1:

20, 24, 27, 28, 29, 30, 32

The median of these seven observations is 28. Therefore, Q1 = 28.

The median of the top half of the data (33, 34, 36, 38, 39, 40, 40) is the third quartile, Q3, and Q3 = 38.
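
The median-of-halves steps can be sketched as follows. The function name `quartiles` is an assumption; when n is odd this sketch leaves the median out of both halves, which is one common convention and may differ from other software.

```python
import statistics

def quartiles(data):
    """Return (Q1, M, Q3) using the median of each half of the sorted data."""
    ordered = sorted(data)
    half = len(ordered) // 2
    bottom = ordered[:half]   # observations below the median position
    top = ordered[-half:]     # observations above the median position
    return (statistics.median(bottom),
            statistics.median(ordered),
            statistics.median(top))

speeds = [20, 24, 27, 28, 29, 30, 32, 33, 34, 36, 38, 39, 40, 40]
print(quartiles(speeds))  # (28, 32.5, 38)
```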

Page 69:

Interpretation:

• 25% of the speeds are less than or equal to 28 mph and 75% of the speeds are greater than 28 mph.

• 50% of the speeds are less than or equal to the median, 32.5 mph, and 50% of the speeds are greater.

• 75% of the speeds are less than or equal to 38 mph, and 25% of the speeds are greater.

Page 70:

The interquartile range, IQR, is the range of the middle 50% of the data observations.

The IQR is the difference between the third and first quartiles: IQR = Q3 – Q1

In the vehicle speed problem, the IQR = 38 – 28 = 10 mph, and 50% of the observed speeds lie between 28 and 38 mph.

Page 71:

EXAMPLE Determining and Interpreting the Interquartile Range

Determine and interpret the interquartile range of the speed data.

Q1 = 28    Q3 = 38

$$IQR = Q_3 - Q_1 = 38 - 28 = 10$$

The range of the middle 50% of car speeds traveling through the construction zone is 10 miles per hour.

Page 72:

Suppose a 15th car travels through the construction zone at 100 mph. How does this value impact the mean, median, standard deviation, and interquartile range?

Without 15th car With 15th car

Mean 32.1 mph 36.7 mph

Median 32.5 mph 33 mph

Standard deviation 6.2 mph 18.5 mph

IQR 10 mph 11 mph

Page 73:

Checking for “Outliers” by Using Quartiles

1. Compute the interquartile range.

2. Determine the fences. Fences serve as cutoff points for determining outliers.

Lower Fence = Q1 – 1.5(IQR): 28-15 = 13 mph

Upper Fence = Q3 + 1.5(IQR): 38+15 = 53 mph

3. Any data value outside the fence is called an “outlier” (asterisked) and does not qualify as a min/max value.
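
The fence check can be sketched in a few lines. The function name `fences` is an assumption; the quartiles are the ones from the 14-car speed data (Q1 = 28, Q3 = 38), and the 100 mph car from the earlier slide is appended to show it being flagged.

```python
def fences(q1, q3):
    """Lower and upper cutoff points: 1.5 IQRs beyond each quartile."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

speeds = [20, 24, 27, 28, 29, 30, 32, 33, 34, 36, 38, 39, 40, 40, 100]

lower, upper = fences(28, 38)
outliers = [x for x in speeds if x < lower or x > upper]

print(lower, upper)  # 13.0 53.0
print(outliers)      # [100]
```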

Page 74:

Section 3.5: The 5-Number Summary and Boxplots

Page 75:

The Five-number summary of a data set consists of the Min data value, Q1, the Median, Q3, and the Max data value as follows:

Min    Q1    M    Q3    Max

Page 76:

EXAMPLE Obtaining the Five-Number Summary

Every six months, the US Federal Reserve Board conducts a survey of credit card bank plans in the U.S.

The following data are the interest rates charged by 10 randomly selected banks that issue credit cards, from the July 2005 survey.

Determine the Five-number summary of the data.

Page 77:

EXAMPLE Obtaining the Five-Number Summary

Institution Rate

Pulaski Bank and Trust Company 6.5%

Rainier Pacific Savings Bank 12.0%

Wells Fargo Bank NA 14.4%

Firstbank of Colorado 14.4%

Lafayette Ambassador Bank 14.3%

Infibank 13.0%

United Bank, Inc. 13.3%

First National Bank of The Mid-Cities 13.9%

Bank of Louisiana 9.9%

Bar Harbor Bank and Trust Company 14.5%

Source: http://www.federalreserve.gov/pubs/SHOP/survey.htm

Page 78:

EXAMPLE Obtaining the Five-Number Summary

Enter the % data into a TI-84 List:

12.0, 13.3, 13.9, 14.3, 14.4, 14.5, 9.9, 6.5, 13.0, 14.4

1-Var Stats will give the 5-Number Summary as follows:

Five-number Summary (Min, Q1, M, Q3, Max):

6.5% 12.0% 13.6% 14.4% 14.5%

Page 79:

The interquartile range (IQR) is 14.4% – 12.0% = 2.4%

The lower and upper fences are:

Lower Fence = Q1 – 1.5(IQR) = 12 – 1.5(2.4) = 8.4%

Upper Fence = Q3 + 1.5(IQR) = 14.4 + 1.5(2.4) = 18.0%

5-N: 6.5% 12.0% 13.6% 14.4% 14.5%

[Figure: boxplot of the rates with the fences marked as brackets; 6.5% lies below the lower fence and is marked as an outlier (*)]
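
The summary and fences for the bank rates can be reproduced with the standard library. A sketch with assumed variable names; values are rounded to one decimal place because floating-point arithmetic on these rates is not exact.

```python
import statistics

rates = [12.0, 13.3, 13.9, 14.3, 14.4, 14.5, 9.9, 6.5, 13.0, 14.4]

ordered = sorted(rates)
half = len(ordered) // 2
q1 = statistics.median(ordered[:half])   # median of the bottom half
q3 = statistics.median(ordered[-half:])  # median of the top half
median = statistics.median(ordered)

summary = tuple(round(v, 1) for v in (ordered[0], q1, median, q3, ordered[-1]))
print(summary)  # (6.5, 12.0, 13.6, 14.4, 14.5)

iqr = q3 - q1
lower_fence = round(q1 - 1.5 * iqr, 1)
upper_fence = round(q3 + 1.5 * iqr, 1)
outliers = [r for r in ordered if r < lower_fence or r > upper_fence]
print(lower_fence, upper_fence, outliers)  # 8.4 18.0 [6.5]
```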

Page 80:

The bank credit card rate boxplot indicates that the distribution is skewed left.

Use a boxplot and quartiles to describe the shape of a distribution.

Page 81:

END

CHAP 3

Summarizing Numerical Data

Page 82: