measures of dispersion - erho.weebly.com · measures of dispersion general classifications of...

18
1 Chapter 8 Measures of Dispersion Chapter 8. Measures of Dispersion Definition of Measures of Dispersion (page 231) A measure of dispersion is a descriptive summary measure that helps us characterize the data set in terms of how varied the observations are from each other. A small value indicates that the observations are not too different from each other; that is, there is a concentration of observations about the center of the distribution. On the other hand, a large value indicates that the observations are very different from each other or they are widely spread out from the center. The smallest possible value of a measure of dispersion should be 0. A zero measure should indicate the absence of variation.

Upload: others

Post on 11-Oct-2020

10 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Measures of Dispersion - erho.weebly.com · Measures of Dispersion General Classifications of Measures of Dispersion (page 232) Measures of Absolute Dispersion A measure of absolute

1

Chapter 8

Measures of Dispersion

Chapter 8. Measures of Dispersion

Definition of Measures of Dispersion (page 231)

A measure of dispersion is a descriptive summary measurethat helps us characterize the data set in terms of how variedthe observations are from each other.

A small value indicates that the observations are not toodifferent from each other; that is, there is a concentration ofobservations about the center of the distribution. On the otherhand, a large value indicates that the observations are verydifferent from each other or they are widely spread out from thecenter. The smallest possible value of a measure of dispersionshould be 0. A zero measure should indicate the absence ofvariation.

Page 2: Measures of Dispersion - erho.weebly.com · Measures of Dispersion General Classifications of Measures of Dispersion (page 232) Measures of Absolute Dispersion A measure of absolute

2

Chapter 8. Measures of Dispersion

Illustration

A={98, 98, 99, 99, 99,100, 100,100, 100, 100, 100, 100, 101, 101, 101, 102, 102}B={20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 160, 170, 180}

The means of both collections are equal to 100. Their medians are also equal to 100.

Which collection must have a higher measure of dispersion? For which collection is the mean a more reliable measure of central

tendency? (Reliable in the sense that if we repeatedly select an observation at random from the collection, its value is usually not too different from the mean.)

The measure of dispersion serves as a measure of the reliability of the mean or median as measures of central tendency.

Chapter 8. Measures of Dispersion

General Classifications of Measures of Dispersion (page 232)

Measures of Absolute DispersionA measure of absolute dispersion has the same unit as the observations. (Examples: range, interquartile range, standard deviation)

Measures of Relative DispersionA measure of relative dispersion has no unit and is therefore useful in comparing the variability of one distribution with another distribution. (Example: coefficient of variation)

Page 3: Measures of Dispersion - erho.weebly.com · Measures of Dispersion General Classifications of Measures of Dispersion (page 232) Measures of Absolute Dispersion A measure of absolute

3

Chapter 8. Measures of Dispersion

Definition of Range (page 232)

Definition 8.1The range is the distance between the maximum value and the minimum value.

Range = Maximum – Minimum

Sometimes the range is presented by stating the smallest and the largest values.

Chapter 8. Measures of Dispersion

Example 8.1 (page 233)

Given the weights of 5 rabbits (in pounds), find the range.

8 pounds 10 pounds 12 pounds 14 pounds 15 pounds lightest heaviest Solution: The maximum is 15 pounds and the minimum is 8 pounds. Thus, the range of the weights of the rabbits is Range = maximum – minimum = 15 - 8 = 7 pounds We can also say that the weights of the rabbits range from 8 to 15pounds.

Page 4: Measures of Dispersion - erho.weebly.com · Measures of Dispersion General Classifications of Measures of Dispersion (page 232) Measures of Absolute Dispersion A measure of absolute

4

Chapter 8. Measures of Dispersion

Approximating the Range from the FDT (pages 234-235)

Range = HCl LClUCL LCL

where UCLHCl = upper class limit of the last class LCLLCl = lower class limit of the first class Example 8.5: Age (in years) No. of women Lowest Class Interval 5 - 9 7 10 - 14 10 15 - 19 13 20 - 29 8 30 - 34 5 Highest Class Interval 35 - 39 3 64 Range = 39 - 5 = 34 years

Chapter 8. Measures of Dispersion

Characteristics of the Range (page 235)

It is a simple, easy-to-compute and easy-to-understand measure.

Weaknesses: It fails to communicate any information about the clustering

or the lack of clustering of the values in the middle of the distribution since it uses only the extreme values (minimum and maximum).

An outlier can greatly affect its value. It tends to be smaller for smaller collections than for larger

collections. It cannot be approximated from frequency distributions with

an open-ended class. It is not tractable mathematically.

Page 5: Measures of Dispersion - erho.weebly.com · Measures of Dispersion General Classifications of Measures of Dispersion (page 232) Measures of Absolute Dispersion A measure of absolute

5

Chapter 8. Measures of Dispersion

Definition of the Interquartile Range (IQR)

The interquartile range (IQR) is the difference between the third and first quartiles of the data set. That is,

IQR = Q3 – Q1

Chapter 8. Measures of Dispersion

Remarks about the IQR: The interquartile range reflects the variability of the

middle 50% of the observations in the array. The IQR may be viewed as the range for a trimmed

data set wherein the smallest 25% and the largest 25% of observations have been removed. This modified range addresses the weakness of the range’s sensitivity towards outliers.

A shortcoming of the IQR is that it could be 0 even if there is still some variation among the smallest 25% and largest 25% of all observations.

Page 6: Measures of Dispersion - erho.weebly.com · Measures of Dispersion General Classifications of Measures of Dispersion (page 232) Measures of Absolute Dispersion A measure of absolute

6

Chapter 8. Measures of Dispersion

Example

An economist studying the variation in family incomes in a community found that the first quartile income is P36,500 and the third quartile income is P120,000. Find the interquartile range.

IQR = Q3 – Q1 =120,000 - 36,500 = P83,500

Chapter 8. Measures of Dispersion

Definition of the Variance (page 237) Definition 8.2. The population variance is the mean of the squared deviations between each observed value and the mean.

Population Variance:

N

XN

ii

2

12

where Xi is the measure taken from the ith element of population is the population mean N is the population size

Sample Variance :

11

2

2

n

XXs

n

ii

where Xi is the measure taken from the ith element of sample X is the sample mean n is the sample size

Page 7: Measures of Dispersion - erho.weebly.com · Measures of Dispersion General Classifications of Measures of Dispersion (page 232) Measures of Absolute Dispersion A measure of absolute

7

Chapter 8. Measures of Dispersion

Remarks: (page 237)

The population variance is a parameter while the sample variance is a statistic. The squared difference of an observation from the mean gives us an idea on

how close this observation is to the mean. A large squared difference indicates that the observation and the mean are far from each other while a small squared difference indicates that the observation and the mean are close to each other. A small variance indicates that the observations are highly concentrated about the mean so that it is appropriate to use the mean to represent all of the values in the collection.

The sample variance is not the mean of the squared deviations of the observations from the mean. The denominator of the sample variance is not n (the size of the sample); rather, it is (n-1). The reason for using (n-1) as the divisor is that in Inferential Statistics, the corresponding statistic with n as the divisor tends to underestimate the population variance. Using the divisor (n-1) is used to make up for this tendency to underestimate.

The unit of the variance is the square of the units of measures in the data set. Thus, strictly speaking, the variance is not a measure of absolute dispersion. Often, it is desirable to return to the original units of measure and so it is the standard deviation that is presented.

Chapter 8. Measures of Dispersion

Definition of Standard Deviation (page 238)

The standard deviation is the positive square root of the variance.

The population standard deviation for a finite population with N elements, denoted by the Greek letter (lower case sigma) is:

.

N

XN

ii

1

2

The sample standard deviation for a sample with n elements, denoted by the letter s is:

11

2

n

XXs

n

ii

Page 8: Measures of Dispersion - erho.weebly.com · Measures of Dispersion General Classifications of Measures of Dispersion (page 232) Measures of Absolute Dispersion A measure of absolute

8

Chapter 8. Measures of Dispersion

Example 8.7 (pages 239-240)

Given the IQ of 7 students in the sample, compute for the sample standard deviation. Let Xi = weight of ith student in the sample, n = no.of students = 7. ( x = 107) iX 107iX 2107iX

100 -7 49 99 -8 64

110 3 9 105 -2 4 112 5 25 107 0 0 116 9 81

232107x 2i )(

s =

1-7

)107(7

1

2n

iiX 232 6.2

6

Chapter 8. Measures of Dispersion

Computational Formula of the Variance (page 241)

If the mean is a rounded figure then the propagation of rounding errors is very fast when we use the definitional formula to compute the variance. We can avoid this by using the following computational formula for the variance:

Population Variance:

22

1 122

N N

i ii i

N X X

N

Sample Variance:

22

i1 12s

( 1)

n n

ii i

n X X

n n

Page 9: Measures of Dispersion - erho.weebly.com · Measures of Dispersion General Classifications of Measures of Dispersion (page 232) Measures of Absolute Dispersion A measure of absolute

9

Chapter 8. Measures of Dispersion

Proof

2 2 2 2 2

2 1 1 1 1( ) ( 2 ) 2

N N N N

i i i i ii i i i

X X X X X N

N N N

2

2 1

2 2 2 2 1

1 1

2

N

iNi

iN Ni

i ii i

XX N

NX N N X N

N N N

2 22 2

22 1 1 1

1 112

N N N

i N NN i ii i i

i iii ii

X N X XN X XX

N NN N N

Chapter 8. Measures of Dispersion

Example (page 242)

Using the same data set on the IQ of 7 students in the sample, we compute the sample standard deviation using the computational formula. iX

2iX

100 10000 99 9801

110 12100 105 11025 112 12544 107 11449 116 13456

7

1

749n

iiX

7

1

2 80375n

iiX

27 72

21 1

77(80375) (749) 6.2

7(7 1) 7(6)

n n

i ii iX X

s

Page 10: Measures of Dispersion - erho.weebly.com · Measures of Dispersion General Classifications of Measures of Dispersion (page 232) Measures of Absolute Dispersion A measure of absolute

10

Chapter 8. Measures of Dispersion

Mathematical Properties of the Standard Deviation (page 247)

Property 1: If each observation of a set of data is transformed by the addition (or subtraction) of a constant c to each observation, the standard deviation of the new set of data is the same as the standard deviation of the original data set.

Proof: Original Data = {X1, X2,…, Xn} Transformed Data = {Y1, Y2,…, Yn} where Yi = Xi + c

22

1 1

( ) ( ) ( )

1 1

n n

i ii i

Y

Y Y X c X cs

n n

2

1( )

1

n

ii

X

X Xs

n

Chapter 8. Measures of Dispersion

Mathematical Properties of the Standard Deviation (page 248)

Property 2: If each observation of a set of data is transformed by the multiplication (or division) of a constant c to each observation, the standard deviation of the new set of data is equal to the standard deviation of the original data multiplied (or divided) by |c|.

Proof: Original Data = {X1, X2,…, Xn} Transformed Data = {Y1, Y2,…, Yn} where Yi = cXi

22

1 1( ) ( ) ( )

1 1

n n

i ii i

Y

Y Y cX cXs

n n

2 2 2

2 21 1( ) ( )

| |1 1

n n

i ii i

x x

c X X c X Xc s c s

n n

Page 11: Measures of Dispersion - erho.weebly.com · Measures of Dispersion General Classifications of Measures of Dispersion (page 232) Measures of Absolute Dispersion A measure of absolute

11

Example

The weights (in milligrams) of ants in a sample are as follows:Sample data = {3, 8, 35, 1, 50}

Its mean is 19.4 mg. and its standard deviation is 21.892 mg.

If each measurement is converted to grams (1000 mg=1 g), what will be the new mean? the standard deviation?

If each ant gained 2 mg. in weight, what will be the new mean? the standard deviation?

Chapter 8. Measures of Dispersion

Chapter 8. Measures of Dispersion

Characteristics of the Standard Deviation(page 247)

It uses every observation in its computation. It may be distorted by outliers. This is because squaring

large deviations from the mean will give more weight to these outliers.

It is amenable to algebraic treatment. It is always nonnegative. A value of 0 implies the absence of

variation. Level of measurement must at least be interval for the

standard deviation to be interpretable.

Page 12: Measures of Dispersion - erho.weebly.com · Measures of Dispersion General Classifications of Measures of Dispersion (page 232) Measures of Absolute Dispersion A measure of absolute

12

Chapter 8. Measures of Dispersion

Approximating the Variance from FDT (page 243-244)

Population Variance:

222

1 12 i 12

( )k kk

i i i ii ii i

N f X f Xf X

N N

Sample Variance:

2k22

1 12 i 1

( )

1 ( 1)

k k

i i i ii ii i

n f X f Xf X Xs

n n n

where iX = class mark of ith class interval fi = frequency of ith class interval k = number of classes N = number of observations in the population n = number of observations in the sample = population mean X = sample mean

Chapter 8. Measures of Dispersion

Example Given the frequency distribution of daily income in pesos received by a sample of 85 women, approximate the mean and standard deviation. Class Limits fi Xi fiXi fiXi

2

200 – 249 4 224.5 898.0 201,601.00 250 - 299 10 274.5 2,745.0 753,502.50 300 - 349 14 324.5 4,543.0 1,474,203.50 350 - 399 25 374.5 9,362.5 3,506,256.25 400 - 449 13 424.5 5,518.5 2,342,603.25 450 - 499 10 474.5 4,745.0 2,251,502.50 500 - 550 9 524.5 4,720.5 2,475,902.25 TOTALS 85 32,532.5 13,005,571.25

7

17

1

32,532.5 382.73585

i ii

ii

f XX

f

27 7

22

1 1 (85)(13,005,571.25) (32,532.5) 81.23( 1) (85)(84)

i i i ii i

n f X f Xs pesos

n n

Page 13: Measures of Dispersion - erho.weebly.com · Measures of Dispersion General Classifications of Measures of Dispersion (page 232) Measures of Absolute Dispersion A measure of absolute

13

Bianayme-Chebyshev Rule (page 249)

Chapter 8. Measures of Dispersion

The percentage of observed values that fall within distances of k standard deviations below and above the mean must be at least 100%1 2k

1 , whatsoever the shape of the data distribution. The

value k is any number greater than 1.

The Bienayme-Chebyshev rule gives us an idea how to characterize the data set using the standard deviation. We now state this rule for k = 2, 3, and 4 as follows:

At least 2

1 1 31 100% 1 100% 100% 75%2 4 4

of the observed values fall within distances

of 2 standard deviations below and above the mean.

At least 88.89%100%311 2

of the observed values fall within distances of 3 standard

deviations below and above the mean.

At least 93.75%100%4112

of the observed values fall within distances of 4 standard

deviations below and above the mean.

Example 8.15 (page 249)

Chapter 8. Measures of Dispersion

Suppose you have information that the mean weight of all female employees in a manufacturing industry is 120 pounds and the standard deviation of the weights is 8 pounds. Use the Bienayme-Chebyshev rule to determine the interval containing at least 75% of all the measures in the population. Solution: We solve for k from the equation, 0.75 = 1- 2

1k

. We find that k=2 so that the interval

we are looking for is: 2 = 120 (2)(8) = 120 16. That is, at least 75% of all female employees have weights ranging from 104 to 136 lbs.

Question: If the standard deviation had been smaller, say 4 lbs, which interval will contain at least 75% of all the measures in the population?

Page 14: Measures of Dispersion - erho.weebly.com · Measures of Dispersion General Classifications of Measures of Dispersion (page 232) Measures of Absolute Dispersion A measure of absolute

14

Chapter 8. Measures of Dispersion

Comparing the Variation of Observations of 2 or More Distributions

Consider the following sample of weights of ants in milligrams={3, 8, 35, 1, 50}. Its standard deviation is 21.892 mg.

This time consider this sample of weights of elephants in grams={6000000, 5999999, 5999998, 6000001, 6000002}. Its standard deviation is 1.581 grams.

Can we use the standard deviations of the two collections to compare the variation of the observations of these two collections?

We cannot use measures of absolute dispersion to compare the variation of the observations of two or more collections when (i) the units are different or (ii) the means are very different from each other.

Chapter 8. Measures of Dispersion

Definition of Measures of Relative Dispersion

Measures of relative dispersion are measures of dispersion that have no unit of measurement and are used to compare the scatter of one distribution with the scatter of another distribution.

Page 15: Measures of Dispersion - erho.weebly.com · Measures of Dispersion General Classifications of Measures of Dispersion (page 232) Measures of Absolute Dispersion A measure of absolute

15

Chapter 8. Measures of Dispersion

Definition of Coefficient of Variation (page 253)

The coefficient of variation (CV) is a measure of relative dispersion and is defined as:

population CV = %100x

where is the population standard deviation is the population mean

sample CV = %100xs x

where s is the sample standard deviation x is the sample mean Note: The coefficient of variation describes the standard deviation as a percentage of the mean. (Example: CV=10% indicates that the standard deviation is 10% of the mean). Consequently, the CV is not interpretable when the mean is negative and is undefined when the mean is 0.

Chapter 8. Measures of Dispersion

Example 8.17b (page 254)

Suppose we get the prices of an 80-gram pack of a certain brand of cracker nuts at 20 different grocery stores. The mean price of the 20 packs of cracker nuts is P9.50 with a standard deviation of P0.6. On the other hand, the weights of the 20 packs of cracker nuts have a mean content of 82 grams with a standard deviation of 3.5 grams. Can we say that prices are more variable than weight?

Solution:

CVprice = 0.6 100 6.32%9.5

x CVweight = 3.5 100 4.27%82

x

Page 16: Measures of Dispersion - erho.weebly.com · Measures of Dispersion General Classifications of Measures of Dispersion (page 232) Measures of Absolute Dispersion A measure of absolute

16

Chapter 8. Measures of Dispersion

Example 8.17c (page 255)

The foreign exchange rate is an indicator of the stability of the peso and an indicator of the economic performance. The level of the peso is dependent on the market forces and not on government policy. The government intervenes only through the Bangko Sentral ng Pilipinas when there are speculative elements in the market. Given below are the means and standard deviations of the quarterly P-$ exchange rate for the periods 1998 to 1999 and 2000 to 2001. Which of the two periods is the peso more stable? 1998-1999 2000-2001 Mean P40.4 P48.6 Standard Deviation P2.01 P1.21 Solution:

CV98-99 = 4.98%100%x40.42.01

CV00-01 = 2.49%100%x48.61.21

Chapter 8. Measures of Dispersion

Another important measure: The Standard Score (page 250)

The standard score or z-score indicates the relative position of an observation in the collection where the observed value came from It is used to compare two values from different collections that

(1) differ with respect to X or s, or both, or (2) are expressed in different units. It is also used to identify possible outliers. As a rule, if the

|standard score| > 3, then it is marked as a possible outlier. When all the observations in a collection are standardized then the mean and standard deviation of this collection of standard scores are 0 and 1, respectively.

Page 17: Measures of Dispersion - erho.weebly.com · Measures of Dispersion General Classifications of Measures of Dispersion (page 232) Measures of Absolute Dispersion A measure of absolute

17

Chapter 8. Measures of Dispersion

Definition of Standard Score (page 250)

Definition 8.4. The standard score or z-score measures how many standard deviations an observed value is above or below the mean.

Population Z-score = -X

where is the population mean is the population standard deviation

Sample Z-score = sX - X

where x is the sample mean s is the sample standard deviation

Chapter 8. Measures of Dispersion

Remarks (page 250)

A positive z-score measures the number of standard deviations an observation is above the mean, and a negative z-score measures the number of standard deviations an observation is below the mean. A z-score of 0 means that the observation is equal to the mean.

The z-score has no unit which makes it possible to compare the z-scores computed using different collections.

Page 18: Measures of Dispersion - erho.weebly.com · Measures of Dispersion General Classifications of Measures of Dispersion (page 232) Measures of Absolute Dispersion A measure of absolute

18

Chapter 8. Measures of Dispersion

Example 8.16 (page 250-251)

The mean grade in Statistics 101 is 70% and the standard deviation is 10%; whereas in Math 17, the mean grade is 80% and the standard deviation is 20%. a) Mark got a grade of 75% in Statistics 101 and a grade of 90% in Math 17. In which subject did Mark

perform better if we consider the grades of the other students in the two subjects?

Solution: 10175 70 0.5

10Statz , 17

90 80 0.520Mathz

If we consider the grades of the other students in the two subjects, Mark’s score in Stat 101 is just as good as his score in Math 17. Based on the z scores, Mark’s scores in both subjects are 0.5 standard deviations above their respective mean scores. b) Peter got a grade of 70% in both Statistics 101 and Math 17. In which subject did Peter perform better if

we consider the grades of the other students in the two subjects?

Solution: 10170 70 0

10Statz , 17

70 80 0.520Mathz

Peter relatively performed better in Statistics 101. Based on the z scores, Peter’s score in Stat 101 is equal to the mean score in Stat 101, while his score in Math 17 is 0.5 standard deviations below the mean. c) Paul got a grade of 100% in in Stat 101. Compute for the z score and interpret.

Solution: 100 70 3

10z

Paul’s score is above the mean. Its distance from the mean is thrice the size of the standard deviation. We may consider this an unusually high score when compared with other Stat 101 grades.

Chapter 8. Measures of Dispersion

Assignment

1. Using the data in Exercise no. 2 page 251, compute for the following:a) Meanb) Rangec) Standard deviation using the standard deviation mode of you calculatord) Standard deviation using the computational formula (show solution)e) coefficient of variationf) Use the Bianayme-Chebyshev rule to identify an interval that will enclose at least 75%

of the observations.g) What percentage of observations are actually contained within 2 standard deviations

from the mean.2. Using the fdt in Exercise no. 4 page 252, approximate the following:

a) Meanb) Rangec) Standard deviation using the computational formula (show all important steps)d) coefficient of variation

3. Answer Exercise no. 6, page 252. 4. Answer Exercise no. 1, page 255. Justify your answer with the appropriate statistics.