Introduction to Business Statistics QM 120, Chapter 3 Slides.pdf · 2008. 3. 3.

DEPARTMENT OF QUANTITATIVE METHODS & INFORMATION SYSTEMS, Introduction to Business Statistics, QM 120, Chapter 3, Dr. Mohammad Zainal, Spring 2008

TRANSCRIPT

Page 1: Introduction to Business Statistics, QM 120, Chapter 3

DEPARTMENT OF QUANTITATIVE METHODS & INFORMATION SYSTEMS

Introduction to Business Statistics
QM 120

Chapter 3

Dr. Mohammad Zainal
Spring 2008

Page 2: Measures of central tendency for ungrouped data

Graphs are very helpful for describing the basic shape of a data distribution; “a picture is worth a thousand words.” There are limitations, however, to the use of graphs.

One way to overcome graph problems is to use numerical measures, which can be calculated for either a sample or a population.

Numerical descriptive measures associated with a population of measurements are called parameters; those computed from sample measurements are called statistics.

A measure of central tendency gives the center of a histogram or frequency distribution curve.

The measures are: mean, median, and mode.

QM-120, M. Zainal

Page 3: Measures of central tendency for ungrouped data

Data that give information on each member of the population or sample individually are called ungrouped data, whereas grouped data are presented in the form of a frequency distribution table.

Mean

The mean (average) is the most frequently used measure of central tendency.

The mean for ungrouped data is obtained by dividing the sum of all values by the number of values in the data set. Thus,

Mean for population: $\mu = \frac{\sum x}{N}$

Mean for sample: $\bar{x} = \frac{\sum x}{n}$

Page 4: Measures of central tendency for ungrouped data

Example: The following table gives the 2002 total payrolls of five MLB teams.

MLB team                2002 total payroll (millions of dollars)
Anaheim Angels          62
Atlanta Braves          93
New York Yankees        126
St. Louis Cardinals     75
Tampa Bay Devil Rays    34

[Figure: the values 34, 62, 75, 93, and 126 on a number line. The mean is a balancing point.]
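As a minimal sketch, the mean of the five payrolls in the example can be computed directly from the definition (sum of the values divided by their count). The variable names are illustrative and do not come from the slides.

```python
# 2002 total payrolls (millions of dollars) of the five MLB teams in the table
payrolls = [62, 93, 126, 75, 34]

# Mean = sum of all values divided by the number of values
mean_payroll = sum(payrolls) / len(payrolls)
print(mean_payroll)  # 78.0
```

The result, 78 million dollars, is the balancing point of the five values.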

Page 5: Measures of central tendency for ungrouped data

Sometimes a data set may contain a few very small or a few very large values. Such values are called outliers or extreme values.

We should be very cautious when using the mean. It may not always be the best measure of central tendency.

Example: The following table lists the 2000 populations (in thousands) of five Pacific states.

State         Population (thousands)
Washington    5894
Oregon        3421
Alaska        627
Hawaii        1212
California    33,872    (an outlier)

Excluding California:

$\text{Mean} = \frac{5894 + 3421 + 627 + 1212}{4} = 2788.5$

Including California:

$\text{Mean} = \frac{5894 + 3421 + 627 + 1212 + 33{,}872}{5} = 9005.2$
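The effect of the outlier can be reproduced with a short sketch. The dictionary and the helper name `mean` are assumptions made for illustration only.

```python
# 2000 populations (thousands) of the five Pacific states in the table
populations = {
    "Washington": 5894,
    "Oregon": 3421,
    "Alaska": 627,
    "Hawaii": 1212,
    "California": 33872,  # the outlier
}

def mean(values):
    """Sum of the values divided by how many there are."""
    values = list(values)
    return sum(values) / len(values)

mean_without_ca = mean(v for k, v in populations.items() if k != "California")
mean_all = mean(populations.values())

print(mean_without_ca)  # 2788.5
print(mean_all)         # 9005.2
```

Adding one extreme value more than triples the mean, which is why the mean should be used with caution when outliers are present.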

Page 6: Measures of central tendency for ungrouped data

Weighted Mean

Sometimes we may assign a weight (importance) to each observation before we calculate the mean.

A mean computed in this manner is referred to as a weighted mean, and it is given by

$\bar{x} = \frac{\sum W_i x_i}{\sum W_i}$

where $x_i$ is the value of observation i and $W_i$ is the weight for observation i.

Page 7: Measures of central tendency for ungrouped data

Example: Consider the following sample of four purchases of one stock in the KSE. Find the average cost of the stock.

Purchase    Price    Quantity
1           .300     5,000
2           .325     15,000
3           .350     10,000
4           .295     20,000
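One way to work this example is to treat each purchase price as an observation and the quantity bought as its weight. The sketch below assumes that reading; the function name `weighted_mean` is illustrative.

```python
# Prices and quantities of the four purchases in the table
prices = [0.300, 0.325, 0.350, 0.295]
quantities = [5000, 15000, 10000, 20000]

def weighted_mean(values, weights):
    """Sum of weight * value divided by the sum of the weights."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)

avg_cost = weighted_mean(prices, quantities)
print(round(avg_cost, 4))  # 0.3155
```

The unweighted mean of the four prices would be 0.3175; weighting by quantity pulls the average toward the price of the largest purchase.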

Page 8: Measures of central tendency for ungrouped data

Median

The median is the value of the middle term in a data set that has been ranked in increasing order.

Median = Value of the $\left(\frac{n + 1}{2}\right)$th term in a ranked data set

The value .5(n + 1) indicates the position in the ordered data set.

If n is even, we choose a value halfway between the two middle observations.

Example: Find the median for the following two sets of measurements:

2, 9, 11, 5, 6

2, 9, 11, 5, 6, 27
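The rule above can be sketched as a small function: rank the data, then take the middle term when n is odd or the value halfway between the two middle terms when n is even. The function name is illustrative.

```python
def median(values):
    """Median of ungrouped data: the ((n + 1) / 2)th term of the ranked data."""
    ranked = sorted(values)
    n = len(ranked)
    mid = n // 2
    if n % 2 == 1:
        return ranked[mid]                      # odd n: single middle term
    return (ranked[mid - 1] + ranked[mid]) / 2  # even n: halfway between the two

print(median([2, 9, 11, 5, 6]))      # 6
print(median([2, 9, 11, 5, 6, 27]))  # 7.5
```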

Page 9: Measures of central tendency for ungrouped data

Mode

The mode is the value that occurs with the highest frequency in a data set.

A data set with each value occurring only once has no mode.

A data set with only one value occurring with the highest frequency has only one mode; it is said to be unimodal.

A data set with two values that occur with the same (highest) frequency has two modes; it is said to be bimodal.

If more than two values in a data set occur with the same (highest) frequency, it is said to be multimodal.

Page 10: Measures of central tendency for ungrouped data

Example: You are given 8 measurements: 3, 5, 4, 6, 12, 5, 6, 7. Find

a) The mean. b) The median. c) The mode.
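A quick way to check the three answers is Python's standard `statistics` module. This is only a sketch; `statistics.multimode` requires Python 3.8 or later.

```python
import statistics

data = [3, 5, 4, 6, 12, 5, 6, 7]

mean = statistics.mean(data)        # 48 / 8 = 6
median = statistics.median(data)    # halfway between 5 and 6 = 5.5
modes = statistics.multimode(data)  # every value with the highest frequency

print(mean, median, modes)  # 6 5.5 [5, 6]
```

Both 5 and 6 occur twice, so this data set is bimodal.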

Relationships among the mean, median, and mode

Symmetric histogram: Mean = Median = Mode

Right-skewed histogram: Mean > Median > Mode

Left-skewed histogram: Mean < Median < Mode

Page 11: Measures of dispersion for ungrouped data

Range

Data sets may have the same center but look different because of the way the numbers are spread out from the center.

Example:

Company 1: 47 38 35 40 36 45 39

Company 2: 70 33 18 52 27

A measure of variability can help us create a mental picture of the spread of the data.

The range for ungrouped data:

Range = Largest value - Smallest value

The range, like the mean, is highly influenced by outliers.

The range is based on two values only.
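A short sketch makes the point of the example concrete: the two companies share the same mean but have very different ranges. The helper name `value_range` is illustrative.

```python
company_1 = [47, 38, 35, 40, 36, 45, 39]
company_2 = [70, 33, 18, 52, 27]

def value_range(values):
    """Range = largest value - smallest value."""
    return max(values) - min(values)

mean_1 = sum(company_1) / len(company_1)
mean_2 = sum(company_2) / len(company_2)

print(mean_1, value_range(company_1))  # 40.0 12
print(mean_2, value_range(company_2))  # 40.0 52
```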

Page 12: Measures of dispersion for ungrouped data

Variance and standard deviation

The standard deviation is the most used measure of dispersion. It tells us how closely the values of a data set are clustered around the mean.

In general, larger values of standard deviation indicate that values of that data set are spread over a relatively larger range around the mean, and vice versa.

Population: $\sigma^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{N}}{N}$, and sample: $s^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1}$

$\sigma = \sqrt{\sigma^2}$ and $s = \sqrt{s^2}$

Standard deviation is always non-negative.

Page 13: Measures of dispersion for ungrouped data

Example: Find the standard deviation of the data set in the table.

MLB team                2002 payroll (millions of dollars)
Anaheim Angels          62
Atlanta Braves          93
New York Yankees        126
St. Louis Cardinals     75
Tampa Bay Devil Rays    34
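The example does not say whether the five teams are to be treated as a sample or as a population, so the sketch below computes both versions with the shortcut formulas (sum of squares minus the squared sum over n). Variable names are illustrative.

```python
import math

payrolls = [62, 93, 126, 75, 34]
n = len(payrolls)

sum_x = sum(payrolls)                  # 390
sum_x2 = sum(x * x for x in payrolls)  # 35150

# Numerator shared by both formulas: sum(x^2) - (sum(x))^2 / n
numerator = sum_x2 - sum_x ** 2 / n    # 4730.0

sample_variance = numerator / (n - 1)  # 1182.5
population_variance = numerator / n    # 946.0

sample_sd = math.sqrt(sample_variance)          # about 34.39
population_sd = math.sqrt(population_variance)  # about 30.76

print(round(sample_sd, 2), round(population_sd, 2))
```

Dividing by n - 1 rather than n makes the sample version slightly larger than the population version.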

Page 14: Measures of dispersion for ungrouped data

In some situations we may be interested in a descriptive statistic that indicates how large the standard deviation is compared to the mean. This is the coefficient of variation (CV).

It is very useful when comparing two different samples with different means and standard deviations.

It is given by:

$CV = \left(\frac{\sigma}{\mu}\right) \times 100\%$
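As an illustration only, the CV formula can be applied to the five MLB payrolls from the earlier example, treating them as a population. That choice, and the function name, are assumptions rather than part of the slides.

```python
import statistics

def coefficient_of_variation(sd, mean):
    """CV = (standard deviation / mean) * 100, expressed as a percentage."""
    return sd / mean * 100

payrolls = [62, 93, 126, 75, 34]
mu = statistics.mean(payrolls)       # 78
sigma = statistics.pstdev(payrolls)  # population standard deviation, about 30.76

cv = coefficient_of_variation(sigma, mu)
print(f"CV = {cv:.1f}%")  # about 39.4%
```

Because the CV is unit-free, it can be compared across data sets measured on different scales.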

Page 15: Mean, variance, and standard deviation for grouped data

Mean for grouped data

Once we group the data, we no longer know the values of individual observations.

Thus, we find an approximation for the sum of these values.

Mean for population: $\mu = \frac{\sum mf}{N}$

Mean for sample: $\bar{x} = \frac{\sum mf}{n}$

where m is the midpoint and f is the frequency of a class.

Page 16: Mean, variance, and standard deviation for grouped data

Variance and standard deviation for grouped data

Population: $\sigma^2 = \frac{\sum m^2 f - \frac{(\sum mf)^2}{N}}{N}$

Sample: $s^2 = \frac{\sum m^2 f - \frac{(\sum mf)^2}{n}}{n - 1}$

where m is the midpoint and f is the frequency of a class.

$\sigma = \sqrt{\sigma^2}$ and $s = \sqrt{s^2}$

Page 17: Mean, variance, and standard deviation for grouped data

Example: The table below gives the frequency distribution of the daily commuting times (in minutes) from home to CBA for all 25 students in QMIS 120. Calculate the mean and the standard deviation of the daily commuting times.

Daily commuting time (min)    f
0 to less than 10             4
10 to less than 20            9
20 to less than 30            6
30 to less than 40            4
40 to less than 50            2
Total                         25
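Since the table covers all 25 students, the sketch below uses the population formulas for grouped data. The tuple layout and variable names are assumptions made for illustration.

```python
import math

# (lower limit, upper limit, frequency) for each class in the table
classes = [(0, 10, 4), (10, 20, 9), (20, 30, 6), (30, 40, 4), (40, 50, 2)]

midpoints = [(lo + hi) / 2 for lo, hi, _ in classes]  # 5, 15, 25, 35, 45
freqs = [f for _, _, f in classes]
N = sum(freqs)                                        # 25

sum_mf = sum(m * f for m, f in zip(midpoints, freqs))       # 535.0
sum_m2f = sum(m * m * f for m, f in zip(midpoints, freqs))  # 14825.0

mean = sum_mf / N                           # 21.4
variance = (sum_m2f - sum_mf ** 2 / N) / N  # 135.04
sd = math.sqrt(variance)                    # about 11.62

print(round(mean, 2), round(variance, 2), round(sd, 2))
```

These are approximations: every observation in a class is represented by the class midpoint.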

Page 18: Use of standard deviation

Z-Scores

We often are interested in the relative location of a data point $x_i$ with respect to the mean (how far or how close).

The z-score (standardized value) can be used to find the relative location of a data point $x_i$ compared to the center of the data.

It is given by

$Z_i = \frac{x_i - \bar{x}}{s}$

The z-score can be interpreted as the number of standard deviations $x_i$ is from the mean.

Page 19: Use of standard deviation

Example: Find the z-scores for the following data, where the mean is 44 and the standard deviation is 8.

Number of students in a class
46
54
42
46
32
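The z-scores for this example follow directly from the formula, using the mean and standard deviation given in the problem. The variable names are illustrative.

```python
class_sizes = [46, 54, 42, 46, 32]
mean, sd = 44, 8  # values stated in the example

# z = (x - mean) / sd: number of standard deviations each value is from the mean
z_scores = [(x - mean) / sd for x in class_sizes]
print(z_scores)  # [0.25, 1.25, -0.25, 0.25, -1.5]
```

A negative z-score means the class size is below the mean; the class of 32 is 1.5 standard deviations below it.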

Page 20: Use of standard deviation

Chebyshev's Theorem

Chebyshev's theorem allows you to understand how the value of a standard deviation can be applied to any data set.

Chebyshev's theorem: The fraction of any data set lying within k standard deviations of the mean is at least

$1 - \frac{1}{k^2}$

where k = a number greater than 1.

This theorem applies to all data sets, which include a sample or a population.

Chebyshev's theorem is very conservative, but it can be applied to a data set with any shape.

Page 21: Use of standard deviation

[Figure: panels labeled k = 1, k = 2, and k = 3]

Page 22: Use of standard deviation

Example: The average systolic blood pressure for 4000 women who were screened for high blood pressure was found to be 187 with a standard deviation of 22. Using Chebyshev's theorem, find at least what percentage of women in this group have a systolic blood pressure between 143 and 231.
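A sketch of the calculation: first express the interval endpoints as a number of standard deviations k from the mean, then apply the bound 1 - 1/k². The function name is illustrative.

```python
def chebyshev_lower_bound(k):
    """Minimum fraction of any data set within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("k must be greater than 1")
    return 1 - 1 / k ** 2

mean, sd = 187, 22
low, high = 143, 231

k_low = (mean - low) / sd    # 2.0
k_high = (high - mean) / sd  # 2.0, so the interval is mean +/- 2 standard deviations

fraction = chebyshev_lower_bound(k_high)
print(fraction)  # 0.75
```

So at least 75% of the women in this group have a systolic blood pressure between 143 and 231.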

Page 23: Use of standard deviation

Empirical Rule

The empirical rule gives more precise information about a data set than Chebyshev's theorem; however, it only applies to a data set that is bell-shaped.

Theorem:

1- 68% of the observations lie within one standard deviation of the mean.

2- 95% of the observations lie within two standard deviations of the mean.

3- 99.7% of the observations lie within three standard deviations of the mean.

Page 24: Use of standard deviation

Example: The age distribution of a sample of 5000 persons is bell-shaped with a mean of 40 years and a standard deviation of 12 years. Determine the approximate percentage of people who are 16 to 64 years old.
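The example can be sketched as a lookup: work out how many standard deviations the interval endpoints are from the mean, then read off the empirical-rule percentage. The helper below is hypothetical and only handles intervals of exactly 1, 2, or 3 standard deviations on each side.

```python
# Percentages stated by the empirical rule for k = 1, 2, 3 standard deviations
EMPIRICAL_RULE = {1: 68.0, 2: 95.0, 3: 99.7}

def empirical_rule_percentage(mean, sd, low, high):
    """Approximate % of bell-shaped data in [low, high] when it is mean +/- k*sd."""
    k_low = (mean - low) / sd
    k_high = (high - mean) / sd
    if k_low != k_high or k_low not in (1, 2, 3):
        raise ValueError("interval must be mean +/- 1, 2, or 3 standard deviations")
    return EMPIRICAL_RULE[int(k_low)]

pct = empirical_rule_percentage(mean=40, sd=12, low=16, high=64)
print(pct)  # 95.0
```

Ages 16 and 64 are each two standard deviations from 40, so approximately 95% of the people fall in that range.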

Page 25: Use of standard deviation

Detecting outliers

Sometimes a data set may have one or more observations that are unusually small or large.

Such an extreme value is called an outlier and can be detected using the z-score and the empirical rule for data with a bell-shaped distribution.

An experienced statistician may face the following situations and need to take an action:

Outlier                                                            Action
A data value that was incorrectly recorded                         Correct it before any further analysis
A data value that was incorrectly included                         Remove it before any further analysis
A data value that belongs to the data set and correctly recorded   Keep it!
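The slides do not state a specific cutoff. One common reading of the empirical rule, assumed here, is to flag values more than three standard deviations from the mean, since about 99.7% of bell-shaped data lie within that range. The data below are hypothetical and serve only to exercise the function.

```python
import statistics

def z_score_outliers(values, threshold=3.0):
    """Return values whose |z-score| exceeds the threshold (assumed cutoff: 3)."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation
    return [x for x in values if abs((x - mean) / sd) > threshold]

# Hypothetical readings, not taken from the slides
readings = [10, 12, 11, 13, 12, 11, 10, 12, 11, 13, 12, 11, 10, 12, 50]
flagged = z_score_outliers(readings)
print(flagged)  # [50]
```

Flagging a value is only the first step; whether to correct, remove, or keep it depends on which situation in the table applies.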

Page 26: Measure of position

Quartiles and interquartile range

A measure of position determines the rank of a single value in relation to other values in a sample or population.

Quartiles are three measures that divide a ranked data set into four equal parts.

Page 27: Measure of position

Calculating the quartiles

The second quartile is the median of a data set.

The first quartile, Q1, is the value of the middle term among observations that are less than the median.

The third quartile, Q3, is the value of the middle term among observations that are greater than the median.

Interquartile range (IQR) is the difference between the third quartile and the first quartile for a data set.

IQR = Interquartile range = Q3 - Q1

Page 28: Measure of position

Example: For the following data

75.3    82.2    85.8    88.7    94.1    102.1    79.0    97.1    104.2    119.3    81.3    77.1

Find:

The values of the three quartiles.

The interquartile range.

Where the value 104.2 falls in relation to these quartiles.
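The sketch below follows the method described on the previous slide: Q2 is the median, Q1 is the median of the observations below it, and Q3 is the median of the observations above it. Other quartile conventions exist and can give slightly different values; the function names are illustrative.

```python
def median(ranked):
    """Median of an already ranked list."""
    n = len(ranked)
    mid = n // 2
    return ranked[mid] if n % 2 else (ranked[mid - 1] + ranked[mid]) / 2

def quartiles(values):
    """Q1, Q2, Q3 using the median-of-halves method from the slides."""
    ranked = sorted(values)
    q2 = median(ranked)
    lower = [x for x in ranked if x < q2]  # observations less than the median
    upper = [x for x in ranked if x > q2]  # observations greater than the median
    return median(lower), q2, median(upper)

data = [75.3, 82.2, 85.8, 88.7, 94.1, 102.1, 79.0, 97.1, 104.2, 119.3, 81.3, 77.1]
q1, q2, q3 = quartiles(data)
iqr = q3 - q1

print(round(q1, 2), round(q2, 2), round(q3, 2), round(iqr, 2))  # 80.15 87.25 99.6 19.45
print(104.2 > q3)  # True: 104.2 lies above the third quartile
```

Since 104.2 is larger than Q3, it falls in the top 25% of the ranked data.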

Page 29: Measure of position

Percentile and percentile rank

Percentiles are the summary measures that divide a ranked data set into 100 equal parts.

The kth percentile is denoted by $P_k$, where k is an integer in the range 1 to 99.

The approximate value of the kth percentile is

$P_k$ = Value of the $\left(\frac{kn}{100}\right)$th term in a ranked data set

Page 30: Measure of position

Example: For the following data

75.3    82.2    85.8    88.7    94.1    102.1    79.0    97.1    104.2    119.3    81.3    77.1

Find the value of the 40th percentile.

Solution:
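The formula gives a position of kn/100 in the ranked data, which is usually not a whole number. The slides do not say how to handle that, so the sketch below assumes the position is rounded to the nearest whole term; this rounding choice and the function name are assumptions.

```python
def approximate_percentile(values, k):
    """Return (position, value) for the kth percentile using the kn/100 rule."""
    ranked = sorted(values)
    position = k * len(ranked) / 100  # the (kn/100)th term
    index = max(1, round(position))   # assumption: round to the nearest whole term
    return position, ranked[index - 1]

data = [75.3, 82.2, 85.8, 88.7, 94.1, 102.1, 79.0, 97.1, 104.2, 119.3, 81.3, 77.1]
position, p40 = approximate_percentile(data, 40)

print(position)  # 4.8
print(p40)       # 82.2
```

Under this assumption the 4.8th term is taken to be the 5th term of the ranked data, 82.2.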

Page 31: Box-and-whisker plot

A box-and-whisker plot gives a graphic presentation of data using five measures: Q1, Q2, Q3, and the smallest and largest values*.

It can help to visualize the center, the spread, and the skewness of a data set.

It can help in detecting outliers.

It is a very good tool for comparing more than one distribution.

Detecting an outlier:

Lower fence: Q1 - 1.5(IQR)

Upper fence: Q3 + 1.5(IQR)

If a data point is larger than the upper fence or smaller than the lower fence, it is considered to be an outlier.

Page 32: Box-and-whisker plot

To construct a box plot:

Draw a horizontal line representing the scale of the measurements.

Calculate the median, the upper and lower quartiles, and the IQR for the data set and mark them on the line.

Form a box just above the line with the left and right ends at Q1 and Q3.

Draw a vertical line through the box at the location of the median.

Mark any outliers with an asterisk (*) on the graph.

Extend horizontal lines called “whiskers” from the ends of the box to the smallest and largest observations that are not outliers.

[Figure: box plot showing the lower fence, Q1, Q2, Q3, and the upper fence, with outliers marked by *]

Page 33: Box-and-whisker plot

Example: Construct a box plot for the following data.

340 300 470 340 320 290 260 330
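Before drawing the plot, the quantities it needs can be computed as in the sketch below: the quartiles (median-of-halves method from the earlier slide), the IQR, the two fences, any outliers, and the whisker ends. No plotting library is used, and all names are illustrative.

```python
def median(ranked):
    """Median of an already ranked list."""
    n = len(ranked)
    mid = n // 2
    return ranked[mid] if n % 2 else (ranked[mid - 1] + ranked[mid]) / 2

data = [340, 300, 470, 340, 320, 290, 260, 330]
ranked = sorted(data)

q2 = median(ranked)                         # 325.0
q1 = median([x for x in ranked if x < q2])  # 295.0
q3 = median([x for x in ranked if x > q2])  # 340.0
iqr = q3 - q1                               # 45.0

lower_fence = q1 - 1.5 * iqr                # 227.5
upper_fence = q3 + 1.5 * iqr                # 407.5

# Points beyond the fences are outliers; whiskers stop at the most extreme
# observations that are still inside the fences.
outliers = [x for x in ranked if x < lower_fence or x > upper_fence]
inside = [x for x in ranked if lower_fence <= x <= upper_fence]
whiskers = (min(inside), max(inside))

print(q1, q2, q3, iqr)           # 295.0 325.0 340.0 45.0
print(lower_fence, upper_fence)  # 227.5 407.5
print(outliers, whiskers)        # [470] (260, 340)
```

The value 470 lies above the upper fence, so it would be marked with an asterisk, and the right whisker ends at 340, which coincides with Q3.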

Page 34: Measures of association between two variables

So far we have studied numerical methods to describe data with one variable.

Often decision makers are interested in the relationship between two variables.

To do so, we will use a descriptive measure called covariance.

Covariance assigns a numerical value to the linear relationship between two variables (see scatter diagram). It is given by

Sample covariance: $S_{xy} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{n - 1}$

Population covariance: $\sigma_{xy} = \frac{\sum (x_i - \mu_x)(y_i - \mu_y)}{N}$

Page 35: Measures of association between two variables

A big disadvantage of the covariance is that it depends on the units of measurement for x and y.

For the same data set, we will have two different covariance values depending on the units (e.g., height in meters or centimeters will make a big difference).

Pearson's correlation coefficient is a good remedy for that problem, as it can only range from -1 to 1.

It is given by

Population correlation coefficient: $\rho_{xy} = \frac{\sigma_{xy}}{\sigma_x \sigma_y}$

Sample correlation coefficient: $r_{xy} = \frac{S_{xy}}{S_x S_y}$

Page 36: Measures of association between two variables

Example: A golfer is interested in investigating the relationship, if any, between driving distance and 18-hole score.

Average Driving Distance (meters)    Average 18-Hole Score
277.6                                69
259.5                                71
269.1                                70
267.0                                70
255.6                                71
272.9                                69

Page 37: Measures of association between two variables

x        y
277.6    69
259.5    71
269.1    70
267.0    70
255.6    71
272.9    69
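The golfer data can be used to illustrate both formulas and the unit-dependence of the covariance. In the sketch below, the distances are also converted to centimeters (a conversion added here for illustration): the covariance is multiplied by 100 while the correlation coefficient stays the same. Function names are illustrative.

```python
import math

def sample_covariance(x, y):
    """S_xy = sum((x_i - x_bar) * (y_i - y_bar)) / (n - 1)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    return sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / (n - 1)

def sample_correlation(x, y):
    """r_xy = S_xy / (S_x * S_y), with S_x and S_y the sample standard deviations."""
    s_x = math.sqrt(sample_covariance(x, x))
    s_y = math.sqrt(sample_covariance(y, y))
    return sample_covariance(x, y) / (s_x * s_y)

distance_m = [277.6, 259.5, 269.1, 267.0, 255.6, 272.9]
score = [69, 71, 70, 70, 71, 69]
distance_cm = [d * 100 for d in distance_m]  # same data, different units

cov_m = sample_covariance(distance_m, score)    # about -7.08
cov_cm = sample_covariance(distance_cm, score)  # about -708
r_m = sample_correlation(distance_m, score)     # about -0.963
r_cm = sample_correlation(distance_cm, score)   # unchanged by the unit conversion

print(round(cov_m, 2), round(cov_cm, 1), round(r_m, 3), round(r_cm, 3))
```

The negative sign indicates that longer average drives go with lower (better) scores in this small sample.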

Page 38: Measures of association between two variables

Example: Find the covariance and Pearson's correlation coefficient for the following data.

Week    Number of commercials (x)    Sales in $ (y)
1       2                            50
2       5                            57
3       1                            41
4       3                            54
5       4                            54
6       1                            38
7       5                            63
8       3                            48
9       4                            59
10      2                            46
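A step-by-step sketch of this example, treating the ten weeks as a sample (an assumption, since the slide does not say which formulas to use). Variable names are illustrative.

```python
import math

commercials = [2, 5, 1, 3, 4, 1, 5, 3, 4, 2]    # x
sales = [50, 57, 41, 54, 54, 38, 63, 48, 59, 46]  # y
n = len(commercials)

x_bar = sum(commercials) / n  # 3.0
y_bar = sum(sales) / n        # 51.0

# Sample covariance: sum of products of deviations, divided by n - 1
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(commercials, sales)) / (n - 1)

# Sample standard deviations of x and y
s_x = math.sqrt(sum((x - x_bar) ** 2 for x in commercials) / (n - 1))
s_y = math.sqrt(sum((y - y_bar) ** 2 for y in sales) / (n - 1))

# Pearson's sample correlation coefficient
r_xy = s_xy / (s_x * s_y)

print(s_xy)            # 11.0
print(round(r_xy, 2))  # 0.93
```

A correlation of about 0.93 indicates a strong positive linear relationship between the number of commercials and sales.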