Hildebrand, Ott & Gray, Basic Statistical Ideas for Managers, 2nd edition, Chapter 2. Copyright © 2005 Brooks/Cole, a division of Thomson Learning, Inc.

Chapter 2: Summarizing Data

Hildebrand, Ott and Gray

Basic Statistical Ideas for Managers

Second Edition

Learning Objectives for Ch. 2

• Understanding the different ways of classifying data

• Understanding the difference between descriptive statistics and inferential statistics

• Graphing data to reveal the basic pattern of data

• Understanding how the sample mean, trimmed sample mean and sample median measure the central tendency of data

• Understanding how the interquartile range and standard deviation measure the variability of data

• Understanding when and how the Empirical Rule should be used

• Understanding the idea of outliers

• Understanding the need for calculators and statistical software

• Understanding the basic idea of statistical quality control

Introduction

• A variable is a measurable characteristic of an entity.

Exercise 2.60:

An office supply company does a third of its business supplying local government and school districts. This business is done by competitive bids. Each potential sale requires a clerk to prepare a bid form. The firm had no real idea of how much effort the bid preparation required, so the bid clerk was asked to record the start and stop times for a sample of 65 bids. The data were recorded two ways: minutes spent per bid (MINPRBID in the output below) and bids per hour (BIDPERHR = 60/MINPRBID).

CASE   MINPRBID   BIDPERHR
   1    155.000      0.387
 ...        ...        ...
  65    126.000      0.476

• MINPRBID is a variable.

• BIDPERHR is a variable.
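The relationship BIDPERHR = 60/MINPRBID can be checked with a short sketch. The function name `bids_per_hour` is hypothetical and introduced only for illustration; the two inputs are the cases visible in the listing (cases 1 and 65).

```python
def bids_per_hour(minutes_per_bid: float) -> float:
    """Convert minutes spent on one bid into bids completed per hour."""
    if minutes_per_bid <= 0:
        raise ValueError("minutes_per_bid must be positive")
    return 60.0 / minutes_per_bid


# Cases 1 and 65 from the listing; the other 63 cases are not shown in the slides.
for case, minutes in [(1, 155.0), (65, 126.0)]:
    print(case, minutes, round(bids_per_hour(minutes), 3))
```

Rounded to three decimals, the results agree with the BIDPERHR column of the listing.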

Example: The monthly percentage return for IBM (RIBM), recorded from November 2000 through September 2003, is a variable.

• Both examples illustrate observational data as opposed to experimental data.

• Exercise 2.60 illustrates cross-sectional data.

• The RIBM example illustrates time-series data.

• R^DJI and RIBM are quantitative variables.

• The use of the values 1, 2 and 3 to designate a voter’s political affiliation (Independent, Democrat, Republican) is a qualitative variable.

• Descriptive Statistics vs. Inferential Statistics

• Descriptive Statistics: data summarization.

• Inferential Statistics: use of sample data to make inferences about a population parameter.

• Population: the collection of objects upon which measurements could be taken.

• Sample: a subset of the population.

• An example illustrating how Descriptive Statistics assists in the Inferential Statistics process follows.

Example: A poll to determine the percentage favoring a national policy

• Treat the data as a sample from a population.

• Population. Parameter of interest: percentage of people favoring the national policy = ?

• Sample. Of 500 people randomly chosen, 300 support the policy. Sample percentage: 300/500 = 60%.

• [Descriptive statistics] Using numerical and graphical tools to summarize the sample of 500 people chosen.

• [Inferential statistics] Using the sample percentage to make an inference about the unknown population percentage.

Section 2.1

The Distribution of Values of a Variable

(Graphical Procedures)

Example: Monthly percentage returns for IBM and ^DJI.

• A partial listing of RIBM and R^DJI

Monthly Percentage Returns for IBM and ^DJI

Month     RIBM(%) Data     RIBM(%) Sorted (Ordered)     R^DJI(%) Data    R^DJI(%) Sorted (Ordered)
Nov-00    Y1     -4.95     Y(1)    -22.65               Y1     -5.07     Y(1)    -12.37
Dec-00    Y2     -9.10     Y(2)    -19.46               Y2      3.59     Y(2)    -11.08
Jan-01    Y3     31.77     Y(3)    -10.84               Y3      0.92     Y(3)     -6.87
Feb-01    Y4    -10.70     Y(4)    -10.81               Y4     -3.60     Y(4)     -6.23
Mar-01    Y5     -3.72     Y(5)    -10.70               Y5     -5.87     Y(5)     -5.87
Apr-01    Y6     19.71     Y(6)    -10.50               Y6      8.67     Y(6)     -5.48
…         …                …                            …                …
Mar-03    Y29     0.62     Y(29)     7.71               Y29     1.28     Y(29)     3.59
Apr-03    Y30     8.24     Y(30)     8.24               Y30     6.11     Y(30)     4.37
May-03    Y31     3.89     Y(31)    10.31               Y31     4.37     Y(31)     5.94
Jun-03    Y32    -6.28     Y(32)    17.83               Y32     1.53     Y(32)     6.11
Jul-03    Y33    -1.52     Y(33)    19.71               Y33     2.76     Y(33)     8.56
Aug-03    Y34     1.14     Y(34)    31.77               Y34     1.97     Y(34)     8.67
Sep-03    Y35     7.71     Y(35)    35.39               Y35    -1.49     Y(35)    10.60

• RIBM and R^DJI will be used to illustrate the graphical and numerical procedures that follow.

• Histogram: a graphical display of quantitative data.

• Horizontal axis shows intervals or classes (adjacent, mutually exclusive).

• Vertical axis shows frequency or relative frequency per class.
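The counting behind a histogram can be sketched in a few lines. This is a minimal illustration, not Minitab's binning algorithm: the function name `class_frequencies`, the class width of 8 and the starting point of -24 are assumptions chosen for the example, and the data are only the 13 RIBM values that appear in the partial listing, not the full 35-month series.

```python
import math


def class_frequencies(values, start, width, n_classes):
    """Count observations in adjacent, mutually exclusive classes
    [start + k*width, start + (k+1)*width) for k = 0 .. n_classes-1."""
    counts = [0] * n_classes
    for v in values:
        k = math.floor((v - start) / width)
        if 0 <= k < n_classes:
            counts[k] += 1
    return counts


# RIBM values visible in the partial listing (assumed classes: width 8 from -24).
ribm_partial = [-4.95, -9.10, 31.77, -10.70, -3.72, 19.71,
                0.62, 8.24, 3.89, -6.28, -1.52, 1.14, 7.71]

counts = class_frequencies(ribm_partial, start=-24, width=8, n_classes=8)
relative = [c / len(ribm_partial) for c in counts]  # relative frequency per class
print(counts)
```

Plotting `counts` against the classes gives a frequency histogram; plotting `relative` gives a relative-frequency histogram with the same shape.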

• Side-by-side histograms

[Figure: Side-by-Side Histograms of R^DJI and RIBM. Two panels (R^DJI, RIBM) on the same scales; horizontal axis ticks from -24 to 32 in steps of 8; vertical axis: Frequency, 0 to 20.]

• Is there a “typical” return for either ^DJI or IBM? Which intervals have the greatest frequency?

• Which return has more variability?

• Are the histograms “mound-shaped”?

• Procedure to obtain side-by-side histograms using Minitab:

  - Click on Graph → Histograms → Simple

  - Click on “OK”

  - In “Graph Variables” box, enter desired variables: R^DJI and RIBM

  - Click on “Labels”, enter title

  - Click on “OK”

  - Click on “Multiple Graphs”

  - Under “Show Graph Variables”, select “In separate panels of the same graph”

  - Under “Same Scales for Graphs”, select “Same Y” and “Same X including bins”

  - Click on “OK”

• Superimposed histograms facilitate comparison of R^DJI and RIBM.

[Figure: Superimposed Histograms of RIBM(%) and R^DJI(%). Horizontal axis: Data, ticks from -24 to 32 in steps of 8; vertical axis: Frequency, 0 to 20; legend: R^DJI, RIBM.]

• For what interval(s) did R^DJI dominate RIBM?

• Procedure to obtain superimposed histograms using Minitab:

  - Click on Graph → Histograms → With Outline and Groups

  - Click “OK”

  - In “Graph Variables” box, enter desired variables: R^DJI and RIBM

  - Click on “Labels”, enter title

  - Click on “OK”

  - Click on “Multiple Graphs”

  - Under “Show Graph Variables”, select “Overlaid on the same graph”

  - Click on “OK”

• Stem-and-Leaf Diagram

• Truncate data if necessary.

• Separate each truncated value into a stem component and a leaf component.

• Leaves are displayed individually in ascending order for each stem.

Example: Monthly percentage returns for IBM.

RIBM        Truncated Value
-22.6463    -22
-19.4639    -19
-10.8393    -10
…           …

• In general, the first digit is placed in the stem and the second digit is the leaf.

• Sometimes, the first two digits go in the stem. Example: Suppose the data range from 170 to 240.

• For a reasonable number of groups, we sometimes split the stem. Example: Suppose the data range from 20 to 40.
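The truncate-then-split step can be sketched as follows. The helper `stem_and_leaf_parts` is a hypothetical name, and the sketch assumes a leaf unit of 1.0 with one-digit leaves; it illustrates the idea only and does not reproduce how Minitab chooses stems or splits them.

```python
import math


def stem_and_leaf_parts(value, leaf_unit=1.0):
    """Truncate `value` toward zero to a whole number of leaf units,
    then split the result into a stem label and a one-digit leaf."""
    truncated = math.trunc(value / leaf_unit)
    sign = "-" if value < 0 else ""          # keeps a "-0" stem for values in (-10, 0)
    stem, leaf = divmod(abs(truncated), 10)  # last digit is the leaf
    return truncated, f"{sign}{stem}", leaf


# The three RIBM values shown in the truncation example.
for y in (-22.6463, -19.4639, -10.8393):
    print(y, stem_and_leaf_parts(y))
```

Truncation, rather than rounding, is what makes -10.8393 contribute a leaf of 0 on the -1 stem.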

Example: Monthly percentage returns for IBM and ^DJI.

1) RIBM

Stem-and-leaf of RIBM N = 35

Leaf Unit = 1.0

1 -2 2

2 -1 9

6 -1 0000

11 -0 98876

(8) -0 44332210

16 0 001134

10 0 67778

5 1 0

4 1 79

2 2

2 2

2 3 1

1 3 5

2) R^DJI

Stem-and-leaf of R^DJI N = 35

Leaf Unit = 1.0

1 -1 2

2 -1 1

2 -0

4 -0 66

9 -0 55554

13 -0 3332

17 -0 1100

(8) 0 00111111

10 0 2223

6 0 45

4 0 6

3 0 88

1 1 0

• The stem-and-leaf diagram is a histogram turned on its side with more refinement.

• Procedure to obtain stem-and-leaf plot using Minitab:

  - Click on Graph → Stem-and-Leaf

  - In “Graph Variables” box, enter R^DJI and RIBM

  - Click on “OK”

• The boxplot, another graphical procedure, is presented as part of Section 2.4.

• Graphical displays can reveal the following:

• Shape of the distribution (symmetric or skewed)

• Existence of a “typical value”

• Degree of variation

• Presence of outliers

Section 2.2

Two-Variable Summaries

• Both Variables are Qualitative

Exercise 9.41:

A personnel director for a large, research-oriented firm categorizes colleges and universities as most desirable, good, adequate, and undesirable for purposes of hiring their graduates. Data are collected on 156 recent graduates and each is rated by a supervisor as outstanding, average or poor. It has been suggested that the school type makes a difference in the rating decision of the supervisor.

• The data are presented in a two-way frequency table or cross-tabulation table:

                   Supervisor’s Rating
School Type        Outstanding   Average   Poor
Most Desirable          21          25       2
Good                    20          36      10
Adequate                 4          14       7
Undesirable              3           8       6

• Both factors or variables, “school type” and “rating” are qualitative.

• Several approaches can be used to determine if there is a relationship between school type and rating.

• One approach is to look at the row percentages for each type of school. These are shown below.

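(Counts, with row percentages on the line beneath; the Total column's percentage is the share of all 156 graduates.)

               Outstanding   Average    Poor     Total
Most                21          25        2        48
                   43.8        52.1      4.2     30.77
Good                20          36       10        66
                   30.3        54.5     15.2     42.31
Adequate             4          14        7        25
                   16.0        56.0     28.0     16.03
Undesirable          3           8        6        17
                   17.6        47.1     35.3     10.90
Total               48          83       25       156
                  30.77       53.21    16.03    100.00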
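The row percentages in the table can be reproduced from the counts with a short sketch. The names `counts` and `row_percentages` are hypothetical; the data are the counts from the cross-tabulation for Exercise 9.41.

```python
# Counts by school type, in the order Outstanding, Average, Poor.
counts = {
    "Most Desirable": [21, 25, 2],
    "Good": [20, 36, 10],
    "Adequate": [4, 14, 7],
    "Undesirable": [3, 8, 6],
}


def row_percentages(table):
    """Express each cell as a percentage of its row total, to one decimal."""
    result = {}
    for school, row in table.items():
        total = sum(row)
        result[school] = [round(100 * c / total, 1) for c in row]
    return result


row_pct = row_percentages(counts)
for school, pcts in row_pct.items():
    print(f"{school:15s}", pcts)
```

Each row sums to roughly 100%, which is what lets the four school types be compared even though they contain different numbers of graduates.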

• There appears to be a relation between the two factors, since there is a tendency for the “outstanding” percentage to decrease and the “poor” percentage to increase as one moves from the “Most Desirable” to the “Undesirable” school types.

• This is shown in the Excel “100% Stacked Column” graph that follows.

• The blue columns (Outstanding Rating) generally increase as one goes from Undesirable to Most Desirable type of school from which to recruit. The white columns (Poor Rating) decrease as one goes from Undesirable to Most Desirable type of school.

[Figure: Excel “100% Stacked Column” graph of the row percentages. Horizontal axis: school type (Undesirable, Adequate, Good, Most Desirable); vertical axis: 0% to 100%; series: Outstanding, Average, Poor.]

• Another approach is to look at the column percentages for each type of rating.

• These column percentages can be viewed as “given that a person has received a specific rating, what is the chance that the individual graduated from a certain type of school?”

• This is shown in the Excel “100% Stacked Column” graph that follows.

• The green columns (Most Desirable Type of School) decrease as one goes from Outstanding to Poor supervisory rating. The blue columns (Undesirable Type of School) increase as one goes from Outstanding to Poor supervisory rating.

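[Figure: Excel “100% Stacked Column” graph of the column percentages. Horizontal axis: supervisor’s rating (Outstanding, Average, Poor); vertical axis: 0% to 100%; series: Undesirable, Adequate, Good, Most Desirable.]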

• Another perspective is obtained by using Excel’s “3-D Column” graph.

• Average ratings are greatest for graduates from “Good” schools.

• Outstanding ratings are greatest for graduates from “Most Desirable” schools.

• This problem will be revisited and reanalyzed in Chapter 9.

[Figure: Excel “3-D Column” graph of the rating counts. Axes: school type (Undesirable, Adequate, Good, Most Desirable) and rating (Outstanding, Average, Poor); vertical axis: count, 0 to 40.]

• Both Variables are Quantitative

• Suppose one variable is labeled X and the other Y. A scatterplot is a graph of the (X,Y) pairs and is used to assess the simultaneous behavior of two quantitative variables.

• The scatterplot was introduced in Chapter 1 to assess whether RIBM (the Y variable) increases, stays constant, or decreases as R^DJI (the X variable) increases. The scatterplot follows.

• It is seen that as R^DJI increases, RIBM tends to increase.

[Figure: Scatterplot of RIBM vs. R^DJI. Horizontal axis: R^DJI, -15 to 10; vertical axis: RIBM, -30 to 40.]

Section 2.3

On the Average: Typical Values

(Numerical Methods for Summarizing Data)

• Data: $y_1, y_2, \ldots, y_n$ ($n$ denotes the size of the sample)

Measures of Central Tendency or Location

• Sample Mean

$$\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$$
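As a minimal sketch, the sample mean formula translates directly into code. The function name `sample_mean` and the three illustrative values are assumptions made for this example; they are not the RIBM or R^DJI data.

```python
def sample_mean(ys):
    """Sample mean: the sum of the observations divided by the sample size n."""
    n = len(ys)
    if n == 0:
        raise ValueError("sample must contain at least one observation")
    return sum(ys) / n


# Three illustrative values (hypothetical).
print(sample_mean([7, 3, 2]))
```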

Example: Monthly percentage returns for IBM and ^DJI.

Descriptive Statistics: RIBM, R^DJI

Variable    N   N*     Mean   SE Mean   TrMean   StDev   Variance   Minimum       Q1
RIBM       35    0    0.437      2.08   -0.315   12.30     151.21    -22.65    -8.24
R^DJI      35    0   -0.341     0.894   -0.251   5.287     27.948   -12.369   -4.399

Variable   Median      Q3   Maximum    Range     IQR
RIBM        -1.52    7.09     35.39    58.03   15.33
R^DJI       0.194   2.764    10.605   22.973   7.164

• RIBM: $\bar{y}$ = 0.437

• R^DJI: $\bar{y}$ = -0.341

Using only the mean return, IBM is the preferable investment.

[Past returns may not reflect future returns.]

• Procedure to obtain descriptive statistics using Minitab:

  - Click on Stat → Basic Statistics → Display Descriptive Statistics

  - In “Variables” box, enter R^DJI and RIBM

  - Click on “Statistics”

  - Select additional statistics of interest, such as Interquartile Range

  - Click on “OK”

• Trimmed Mean (TRMEAN)

• Trim off the largest 5%, for example, and the smallest 5% of the observations and then calculate the sample mean of the remaining 90% of the data values.

• Purpose: Minimize the effect of unusual observations.

Example: Monthly percentage returns for IBM and ^DJI.

• Trimmed mean for RIBM = -0.315

• Trimmed mean for R^DJI = -0.251

• The trimmed mean for IBM differs greatly from $\bar{y}$ = 0.437 because of the two very high returns for IBM.
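The trimming procedure can be sketched as follows. The function name `trimmed_mean` is hypothetical, and rounding the number of trimmed values to the nearest integer is an assumption of this sketch; statistical packages may handle fractional trim counts differently. The data are made up to show the effect of one unusual observation.

```python
def trimmed_mean(ys, proportion=0.05):
    """Mean after dropping the k smallest and k largest observations,
    where k is proportion * n rounded to the nearest integer (an assumption)."""
    n = len(ys)
    k = int(proportion * n + 0.5)
    if 2 * k >= n:
        raise ValueError("trimming would remove every observation")
    kept = sorted(ys)[k:n - k]
    return sum(kept) / len(kept)


# Hypothetical data: 19 ordinary values and one very large one.
data = list(range(1, 20)) + [1000]
print(sum(data) / len(data))   # ordinary mean, pulled up by the extreme value
print(trimmed_mean(data))      # 5% trimmed mean, one value dropped from each end
```

With n = 20 and 5% trimming, one value is removed from each end, so the extreme value no longer affects the result.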

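• Sample Median

The middle value after the data are arranged in ascending order: $y_{(1)}, y_{(2)}, \ldots, y_{(n)}$

$$\text{Median} = \begin{cases} y_{\left(\frac{n+1}{2}\right)}, & \text{if } n \text{ is odd;} \\[4pt] \frac{1}{2}\left[\, y_{\left(\frac{n}{2}\right)} + y_{\left(\frac{n}{2}+1\right)} \right], & \text{if } n \text{ is even.} \end{cases}$$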
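A minimal sketch of the median rule, covering both the odd and even cases. The function name `sample_median` is hypothetical, and the example values are illustrative rather than taken from the return data.

```python
def sample_median(ys):
    """Middle ordered value if n is odd; average of the two middle
    ordered values if n is even."""
    ordered = sorted(ys)
    n = len(ordered)
    if n == 0:
        raise ValueError("sample must contain at least one observation")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                       # y_((n+1)/2) in 1-based notation
    return 0.5 * (ordered[mid - 1] + ordered[mid])  # average of y_(n/2) and y_(n/2+1)


print(sample_median([7, 3, 2]))      # odd n
print(sample_median([1, 2, 3, 4]))   # even n
```

For n = 35 the rule picks the (35 + 1)/2 = 18th ordered observation.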

Example: Monthly percentage returns for IBM and ^DJI.

n = 35 ⇒ (n+1)/2 = 18

Median = y(18) {18th ordered observation}

• For RIBM: y(18) = -1.52

• For R^DJI: y(18) = 0.19

Using only the median return, ^DJI is the preferable investment.

• Sample Mode

Don’t use as a measure of central tendency.

Section 2.4

Measuring Variability

• Sample Range (denoted by R)

R = Largest observation - Smallest observation

Example: Monthly percentage returns for IBM and ^DJI

• RIBM: R = 35.39 - (-22.65) = 58.04 from the rounded values shown (Minitab reports 58.03)

• R^DJI: R = 10.60 - (-12.37) = 22.97

• Primary area of application: Statistical Process Control

• “The range is very sensitive to outliers …” (H, O & G)

• “... as the sample size increases, the range tends to increase …” (H, O & G)

• Sample Variance (denoted by $s^2$)

• Preliminary concept: How do you measure distance between two points a and b on the real number line?

Measure of distance: $(b - a)^2$

• Sample variance:

$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (y_i - \bar{y})^2$$

The sample variance measures the squared distance between each observation and the sample mean.

• The divisor (n – 1) is the “degrees of freedom”.

Example: Let n = 3. Suppose we have three data values: $y_1 = 7,\ y_2 = 3,\ y_3 = 2 \Rightarrow \bar{y} = 4$.

The building blocks of $s^2$ are deviations: $y_1 - \bar{y} = 3,\ y_2 - \bar{y} = -1,\ y_3 - \bar{y} = -2$.

• There is one constraint on these deviations: $\sum (y_i - \bar{y}) = 0$.

• For this example, the degrees of freedom = 3 - 1 = 2.

• You have freedom to specify any 2 of the 3 deviations.

• Once you specify any two of the deviations, the third deviation has to be a value so that all deviations add to 0.

• In general, the degrees of freedom = n - 1.
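The degrees-of-freedom argument can be checked numerically with the three-value example (data values 7, 3 and 2, as read from the deviations 3, -1 and -2). The function names `deviations` and `sample_variance` are hypothetical; this is a sketch of the definition, not of any package's implementation.

```python
def deviations(ys):
    """Deviation of each observation from the sample mean."""
    mean = sum(ys) / len(ys)
    return [y - mean for y in ys]


def sample_variance(ys):
    """Sum of squared deviations divided by the degrees of freedom, n - 1."""
    n = len(ys)
    if n < 2:
        raise ValueError("need at least two observations")
    return sum(d * d for d in deviations(ys)) / (n - 1)


ys = [7, 3, 2]
devs = deviations(ys)
print(devs, sum(devs))                 # the deviations always add to 0
third = -(devs[0] + devs[1])           # any two deviations fix the third
print(third == devs[2])
print(sample_variance(ys))             # (9 + 1 + 4) / 2
```

Because the deviations must sum to zero, only n - 1 of them carry independent information, which is why the divisor is n - 1 rather than n.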

• Sample Variance

RIBM: $s^2$ = 151.21        R^DJI: $s^2$ = 27.948

• Sample Standard Deviation: $s = \sqrt{s^2}$

RIBM: s = 12.30%        R^DJI: s = 5.287%

Using only the standard deviation as the criterion, ^DJI is the preferable investment.

• Linking the histogram with the sample mean and sample standard deviation.

• Empirical Rule:

For a set of measurements having a mound-shaped histogram, the interval

$\bar{y} \pm 1s$ contains approximately 68% of the measurements;

$\bar{y} \pm 2s$ contains approximately 95% of the measurements;

$\bar{y} \pm 3s$ contains approximately all of the measurements.

• The approximation may be poor if the data are severely skewed or bimodal, or contain outliers.

Example: Monthly percentage returns for IBM and ^DJI.

• For the RIBM and R^DJI data, determine the intervals $\bar{y} \pm s$ and $\bar{y} \pm 2s$.

• Then determine the actual percentage of observations within each of these intervals.

• How do these results compare with the percentages specified by the Empirical Rule?

• Do the results correspond or disagree?

• State the reason for your answer.

RIBM                                          Actual Percentage   E.R. Percentage   Percentage Difference
$\bar{y} \pm s$  = [-11.86%, 12.74%]          29/35 = 82.9%       68%               21.9%
$\bar{y} \pm 2s$ = [-24.16%, 25.04%]          33/35 = 94.3%       95%               -0.74%

R^DJI                                         Actual Percentage   E.R. Percentage   Percentage Difference
$\bar{y} \pm s$  = [-5.63%, 4.95%]            25/35 = 71.4%       68%               5.04%
$\bar{y} \pm 2s$ = [-10.92%, 10.23%]          32/35 = 91.4%       95%               -3.75%
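The interval endpoints and the "Percentage Difference" column can be recomputed from the summary statistics. This is a sketch under stated assumptions: the function names are hypothetical, the percentage difference is taken to be (actual - E.R.)/E.R. × 100 (which matches the RIBM figures), and the mean and standard deviation are the rounded Minitab values for RIBM. `percent_within` needs the full data set, which the slides list only in part, so it is shown on made-up values.

```python
def empirical_rule_interval(mean, s, k):
    """Interval mean ± k*s."""
    return (mean - k * s, mean + k * s)


def percent_within(values, low, high):
    """Percentage of observations lying in [low, high]."""
    inside = sum(1 for v in values if low <= v <= high)
    return 100.0 * inside / len(values)


def percentage_difference(actual, expected):
    """Relative difference between actual and Empirical Rule percentages
    (assumed definition: difference divided by the E.R. percentage)."""
    return 100.0 * (actual - expected) / expected


# RIBM summary statistics from the Minitab output (rounded).
mean_ribm, s_ribm = 0.437, 12.30
one_s = empirical_rule_interval(mean_ribm, s_ribm, 1)
two_s = empirical_rule_interval(mean_ribm, s_ribm, 2)
print([round(x, 2) for x in one_s], [round(x, 2) for x in two_s])
print(round(percentage_difference(82.9, 68), 1))
print(percent_within([1, 2, 3, 4], 2, 3))   # made-up data: half the values fall inside
```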

• For the RIBM data set, the actual percentage within one standard deviation of the mean does not correspond with the value specified by the Empirical Rule. Look at the histogram for RIBM: it is not mound-shaped, so we would not expect the E.R. to work well, and strictly it should not be applied to the RIBM data. Even so, the correspondence between the actual percentage and that specified by the E.R. is fairly good for the two-standard-deviation interval. The message here is that the E.R. can give results close to the actual percentage even when its mound-shape condition is not met.

• If the actual and E.R. percentages do not correspond, this is a signal that the histogram is not mound-shaped.

• For the ^DJI data set, the actual percentage and that specified by the E.R. are closer as would be expected, because the histogram for R^DJI is closer to being mound-shaped.

• A fuller explanation of the E.R. is provided in Chapter 5.

• Quartiles

• Separate data into 4 sections

• First Quartile = Q1 = $y_{(k)}$, where k = (n + 1)/4

• Third Quartile = Q3 = $y_{(3k)}$

Example: Monthly percentage returns for IBM and ^DJI.

k = (35+1)/4 = 9 ⇒ Q1 = $y_{(9)}$, Q3 = $y_{(27)}$

       RIBM     R^DJI
Q1    -8.24    -4.399
Q3     7.09     2.764

• How are the quartiles used to measure variability?
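The quartile positions k and 3k can be sketched as below. The function names are hypothetical, and the sketch assumes that n + 1 is divisible by 4, as it is for n = 35; other sample sizes need an interpolation rule that the slides do not give. The example uses the integers 1 to 35 as stand-in data so that each ordered value equals its own position.

```python
def quartile_positions(n):
    """Positions k and 3k of Q1 and Q3 among the ordered observations,
    with k = (n + 1) / 4. Assumes n + 1 is divisible by 4."""
    if (n + 1) % 4 != 0:
        raise ValueError("this sketch only handles n + 1 divisible by 4")
    k = (n + 1) // 4
    return k, 3 * k


def quartiles(ys):
    """Return (Q1, Q3) as the k-th and 3k-th ordered observations."""
    ordered = sorted(ys)
    k, k3 = quartile_positions(len(ordered))
    return ordered[k - 1], ordered[k3 - 1]   # convert 1-based positions to 0-based


print(quartile_positions(35))
q1, q3 = quartiles(list(range(1, 36)))       # stand-in data, not the returns
print(q1, q3, q3 - q1)
```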

• Interquartile Range (IQR)

IQR = Q3 – Q1

Example: Monthly percentage returns for IBM and ^DJI.

For RIBM, IQR = 7.09 – (- 8.24) = 15.33

For R^DJI, IQR = 2.764 – (- 4.399) = 7.164

• Does the value of the IQR change if the smallest (largest) observation gets smaller (larger)?

• The IQR is used in the boxplot.

• The boxplot uses 5 numbers to represent the data distribution.

[Figure: annotated boxplot. The box spans the interquartile range, from Q1 (25th percentile) to Q3 (75th percentile), with the median marked inside the box. The left-hand whisker extends to the smallest observation that is not an outlier; the right-hand whisker extends to the largest observation that is not an outlier. Outliers are plotted individually.]


2.4 Measuring Variability

Example: Monthly percentage returns for IBM and ^DJI.

• RIBM has greater variation since its IQR is wider.

[Figure: Side-by-Side Boxplots of R^DJI and RIBM. Vertical axis: Data, from -30 to 40.]


2.4 Measuring Variability

• Procedure to obtain side-by-side boxplots using Minitab:

• Click on Graph → Boxplot → Multiple Y’s (Simple)

• Click on “OK”

• In “Graph variables” box, enter columns where data is stored

• Select “Labels” and enter title

• Select “Data View” and choose desired options, such as “Median Symbol” and “Median Connect Line”

• Click on “OK”


2.4 Measuring Variability

• Determination of outliers and serious outliers.

• An outlier is an observation outside the inner fences.

• Lower Inner Fence (LIF) = Q1 – (1.5)(IQR)

• Upper Inner Fence (UIF) = Q3 + (1.5)(IQR)

Example: Monthly percentage returns for IBM and ^DJI.

For RIBM: LIF = -8.24 – (1.5)(15.33) = -31.235

UIF = 7.09 + (1.5)(15.33) = 30.085

For R^DJI: LIF = -4.399 – (1.5)(7.164) = -15.145

UIF = 2.764 + (1.5)(7.164) = 13.51

• A serious outlier is an observation outside the outer fences.

• For outer fences, replace (1.5) by (3.0).
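
The fence rules above can be sketched in Python. The quartiles are the RIBM values from the slide; the function names and the three observations being classified (5.0, 35.0, 60.0) are hypothetical and are not taken from the data set.

```python
def fences(q1, q3, multiplier=1.5):
    """Return (lower, upper) fences: Q1 - m * IQR and Q3 + m * IQR."""
    spread = q3 - q1  # IQR
    return q1 - multiplier * spread, q3 + multiplier * spread


def classify(y, q1, q3):
    """Label an observation using inner (1.5) and outer (3.0) fences."""
    inner_low, inner_high = fences(q1, q3, 1.5)
    outer_low, outer_high = fences(q1, q3, 3.0)
    if y < outer_low or y > outer_high:
        return "serious outlier"
    if y < inner_low or y > inner_high:
        return "outlier"
    return "not an outlier"


# RIBM quartiles as printed on the slide.
Q1_RIBM, Q3_RIBM = -8.24, 7.09

lif, uif = fences(Q1_RIBM, Q3_RIBM)
print("inner fences:", round(lif, 3), round(uif, 3))

# Hypothetical observations, used only to show the three possible labels.
for y in (5.0, 35.0, 60.0):
    print(y, "->", classify(y, Q1_RIBM, Q3_RIBM))
```

The inner fences computed this way agree with the LIF and UIF shown for RIBM; the outer fences use the same formula with 3.0 in place of 1.5.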


2.4 Measuring Variability

• Boxplot of RIBM and R^DJI with inner fences:

• RIBM has 2 outliers.

[Figure: Side-by-Side Boxplots of R^DJI and RIBM. Vertical axis: Data, from -30 to 40.]


Section 2.5

Calculators and Statistical Software


2.5 Calculators and Statistical Software

• Calculators and statistical software facilitate statistical analysis

• Software falls in two categories:

• Dedicated statistical software, such as Minitab®, SPSS®, and SAS®.

• Spreadsheet software.

• Should one use a spreadsheet or statistical software?

• The interested reader should refer to the following web sites:

www.amstat.org/education/ASA_endorsement.html (in the section titled “Support”)

www-unix.oit.umass.edu/~evagold/excel.html

www.seismo.unr.edu//ftp/pub/updates/louie/mccullough.pdf


Section 2.6

Statistical Methods and Quality Improvement


2.6 Statistical Methods and Quality Improvement

• History of Quality

• 1931 – Economic Control of Quality of Manufactured Product, Walter Shewhart

• Concept of statistical variation was unveiled

• Process control charts were introduced

• 1985 – Out of the Crisis, Dr. W. Edwards Deming

• “14 Points for Management” were established


2.6 Statistical Methods and Quality Improvement

• Variation is part of any process

• Two types of variation:

• Common cause

• Inherent to every system

• Accounts for 80-90% of observed variation

• Special cause

• External sources

• Not inherent to system

• Accounts for 10-20% of observed variation

• A control chart is used to distinguish between common cause and special cause variation.


2.6 Statistical Methods and Quality Improvement

• Generic Control Chart

[Figure: Generic control chart. Horizontal axis: Time (0 to 5). Vertical axis: Value of Some Statistic. Horizontal lines mark the UCL (Upper Control Limit), CL (Center Line), and LCL (Lower Control Limit). Annotations: Common Cause Variation and Special Cause Variation.]

• If the statistic plots outside the control limits, special cause variation may be present.

• It could also be a false alarm.
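
The control-chart idea can be sketched in Python under some loudly stated assumptions. The slide does not say how the limits are set; this sketch assumes the common three-sigma convention (center line at the mean of an in-control baseline, limits three sample standard deviations away), which is a simplification of how limits are estimated in practice. All data and function names are hypothetical.

```python
from statistics import mean, stdev


def control_limits(baseline, width=3.0):
    """Return (LCL, CL, UCL) as mean -/+ width * sample standard deviation.

    Assumption: three-sigma limits; the slide does not specify the rule.
    """
    center = mean(baseline)
    spread = stdev(baseline)
    return center - width * spread, center, center + width * spread


def out_of_control(values, lcl, ucl):
    """Indices of points that plot outside the control limits."""
    return [i for i, v in enumerate(values) if v < lcl or v > ucl]


# Hypothetical baseline measurements taken while the process was stable.
baseline = [10.1, 9.8, 10.0, 10.2, 9.9, 10.0, 10.1, 9.9]
lcl, cl, ucl = control_limits(baseline)
print(f"LCL={lcl:.3f}  CL={cl:.3f}  UCL={ucl:.3f}")

# Hypothetical new observations plotted against those limits.
new_points = [10.0, 10.3, 9.7, 11.5, 10.1]
print("outside limits at positions:", out_of_control(new_points, lcl, ucl))
```

A flagged point is only a signal to look for a special cause; as noted above, it could also be a false alarm.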


Keywords: Chapter 2

• Descriptive Statistics

• Inferential Statistics

• Relative Frequency

• Histogram

• Mound-shaped

• Stem-and-Leaf Graph

• Boxplot

• Outlier

• Median

• Trimmed Mean

• Mean

• Variance

• Standard Deviation

• Empirical Rule

• Interquartile Range

• Control Chart


Summary of Chapter 2

• The different ways of classifying data

• The difference between descriptive statistics and inferential statistics

• Graphing data to understand how the values are distributed

• The rationale underlying the mean, trimmed mean and median as measures of central tendency

• The rationale underlying the standard deviation and interquartile range as measures of variability


Summary of Chapter 2

• How the Empirical Rule links the histogram with the mean and standard deviation

• An objective criterion used to determine outliers

• The need for calculators and statistical software in analyzing data

• The difference between common and special causes in statistical quality control

• How control charts distinguish between special and common causes


Summary of Chapter 2

Graphical and Numerical Methods for Summarizing Data

Graphical Methods

• Histogram

• Stem-and-leaf Plot

• Boxplot

• Objective method for determining outliers: inner fences Q1 – (1.5)(IQR) and Q3 + (1.5)(IQR)

Numerical Methods

Measures of central tendency

• Sample mean (ȳ)

• Trimmed mean

• Median

Measures of dispersion

• Sample standard deviation (s)

• IQR (Q3 – Q1)

Empirical Rule: links the histogram with the sample mean (ȳ) and the sample standard deviation (s)