3.1 measures of central tendency · 2011. 12. 30. · 3.1 measures of central tendency •summation...


3.1 Measures of Central Tendency

• Summation Notation: $\sum_{i=1}^{n} x_i$ or $\sum x$ – Sum the observations on the variable that appears to the right of the summation symbol.

Example 1 Suppose the variable $x_i$ is used to represent the number of bathrooms in a given residence. Five residential properties are examined, and the value of $x_i$ is recorded for each. The observations are $x_1 = 2$, $x_2 = 1$, $x_3 = 3$, $x_4 = 2$, and $x_5 = 3$.

(a) Find $\sum_{i=1}^{5} x_i$ and $\sum_{i=1}^{5} x_i^2$.

(b) Find $\sum_{i=1}^{5} (x_i - 3)$ and $\sum_{i=1}^{5} (x_i - 3)^2$.

Solution: (a) The symbol $\sum_{i=1}^{5} x_i$ tells us to sum the $x_i$ values in the data set. Therefore,

$$\sum_{i=1}^{5} x_i = x_1 + x_2 + x_3 + x_4 + x_5 = 2 + 1 + 3 + 2 + 3 = 11$$

$$\sum_{i=1}^{5} x_i^2 = x_1^2 + x_2^2 + x_3^2 + x_4^2 + x_5^2 = 2^2 + 1^2 + 3^2 + 2^2 + 3^2 = 27.$$

(b) The symbol $\sum_{i=1}^{5} (x_i - 3)$ tells us to subtract 3 from each $x_i$ value and then sum. Therefore,

$$\begin{aligned}
\sum_{i=1}^{5} (x_i - 3) &= (x_1 - 3) + (x_2 - 3) + (x_3 - 3) + (x_4 - 3) + (x_5 - 3) \\
&= (2 - 3) + (1 - 3) + (3 - 3) + (2 - 3) + (3 - 3) \\
&= (-1) + (-2) + 0 + (-1) + 0 = -4
\end{aligned}$$

$$\begin{aligned}
\sum_{i=1}^{5} (x_i - 3)^2 &= (x_1 - 3)^2 + (x_2 - 3)^2 + (x_3 - 3)^2 + (x_4 - 3)^2 + (x_5 - 3)^2 \\
&= (2 - 3)^2 + (1 - 3)^2 + (3 - 3)^2 + (2 - 3)^2 + (3 - 3)^2 \\
&= (-1)^2 + (-2)^2 + (0)^2 + (-1)^2 + (0)^2 = 6.
\end{aligned}$$
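The four sums in Example 1 can be checked with a few lines of Python. This is only an illustrative sketch; the variable names are chosen here and are not part of the notes.

```python
# Observations from Example 1 (number of bathrooms per property).
x = [2, 1, 3, 2, 3]

sum_x = sum(x)                              # sum of x_i
sum_x_sq = sum(v ** 2 for v in x)           # sum of x_i squared
sum_dev = sum(v - 3 for v in x)             # sum of (x_i - 3)
sum_dev_sq = sum((v - 3) ** 2 for v in x)   # sum of (x_i - 3) squared

print(sum_x, sum_x_sq, sum_dev, sum_dev_sq)  # 11 27 -4 6
```

Note that the sum of squares (27) is not the square of the sum ($11^2 = 121$): the operation to the right of the summation symbol is applied to each observation before adding.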


Example 2 Find $\sum xy$ if the variables $x_i$, $y_i$ are given by

i        1    2    3    4    5
$x_i$   −1    0    1    4    6
$y_i$    3    8    2    3    6

Solution:

$$\begin{aligned}
\sum xy = \sum x_i y_i &= x_1 y_1 + x_2 y_2 + x_3 y_3 + x_4 y_4 + x_5 y_5 \\
&= (-1)(3) + (0)(8) + (1)(2) + (4)(3) + (6)(6) = 47
\end{aligned}$$

3.1.1 Mean

Definition 1 The mean of a set of measurements is defined to be the sum of the measurements

divided by the total number of measurements.

• Mean for ungrouped data

The mean for ungrouped data is obtained by dividing the sum of all values by the number of values

in the data set. Thus,

(a) Mean for population data: $\mu = \dfrac{x_1 + x_2 + x_3 + \dots + x_N}{N} = \dfrac{\sum_{i=1}^{N} x_i}{N}$

(b) Mean for sample data: $\bar{x} = \dfrac{x_1 + x_2 + x_3 + \dots + x_n}{n} = \dfrac{\sum_{i=1}^{n} x_i}{n}$

where $\sum x_i$ is the sum of all values, $N$ is the population size, $n$ is the sample size, $\mu$ is the population mean, and $\bar{x}$ is the sample mean.

Example 3 The monthly starting salaries for a sample of 12 Business School Graduates are as follows. Find the sample mean.

Graduate  Monthly Salary    Graduate  Monthly Salary    Graduate  Monthly Salary    Graduate  Monthly Salary
    1          2050             4          2080             7          2090            10          2525
    2          2150             5          1955             8          2330            11          2120
    3          2250             6          1910             9          2140            12          2080


Solution: The sample mean is

$$\bar{x} = \frac{x_1 + x_2 + x_3 + \dots + x_n}{n} = \frac{2050 + 2150 + \dots + 2080}{12} = \frac{25{,}680}{12} = 2140.$$

• Mean for grouped data

(a) Mean for population data: $\mu = \dfrac{\sum f_i m_i}{N}$

(b) Mean for sample data: $\bar{x} = \dfrac{\sum f_i m_i}{n}$

where $f_i$ denotes the frequency of class $i$ and $m_i$ denotes the midpoint of class $i$.

Example 4 Recall the frequency distribution in Example 2 (Chapter 2). Find the mean.

Audit Time (Days)    Frequency $f_i$    Class Midpoint $m_i$    $f_i m_i$
10-14                       4                   12                  48
15-19                       8                   17                 136
20-24                       5                   22                 110
25-29                       2                   27                  54
30-34                       1                   32                  32
Total                $\sum f_i = 20$                         $\sum f_i m_i = 380$

Hence, the sample mean is

$$\bar{x} = \frac{\sum f_i m_i}{\sum f_i} = \frac{380}{20} = 19.$$
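As a rough check of the grouped-data formula, the sketch below recomputes the mean of Example 4 from its (midpoint, frequency) pairs. The function name `grouped_mean` is an illustrative choice, not something defined in the notes.

```python
# Frequency table from Example 4 as (class midpoint m_i, frequency f_i) pairs.
classes = [(12, 4), (17, 8), (22, 5), (27, 2), (32, 1)]


def grouped_mean(table):
    """Return sum(f_i * m_i) / sum(f_i) for (midpoint, frequency) pairs."""
    total_freq = sum(f for _, f in table)
    if total_freq == 0:
        raise ValueError("total frequency must be positive")
    return sum(m * f for m, f in table) / total_freq


print(grouped_mean(classes))  # 380 / 20 = 19.0
```

Because each observation is replaced by its class midpoint, the result is an approximation to the mean of the raw data.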

3.1.2 Weighted Mean

Let $X_1, X_2, \dots, X_N$ be a set of $N$ values, and let $w_1, w_2, \dots, w_N$ be the weights assigned to them. The weighted mean is found by dividing the sum of the products of the values and their weights by the sum of the weights.

$$\mu_{\text{Weighted}} = \frac{w_1 X_1 + w_2 X_2 + \dots + w_N X_N}{w_1 + w_2 + \dots + w_N} = \frac{\sum w_i X_i}{\sum w_i}$$


Example 5 It is decided that six observations, 12, 20, 17, 5, 9 and 22, should be given the weights 10, 4, 6, 18, 16 and 3 respectively. What is the weighted mean?

Solution: The weighted mean is calculated as

$$\frac{\sum w_i X_i}{\sum w_i} = \frac{(10 \times 12) + (4 \times 20) + (6 \times 17) + (18 \times 5) + (16 \times 9) + (3 \times 22)}{10 + 4 + 6 + 18 + 16 + 3} = \frac{602}{57} \approx 10.6$$

Example 6 A mathematics class is divided into two sections, both of which are given the same test. Section 1 (41 students) has a mean score of 62, and section 2 (52 students) has a mean score of 68. Find the mean of the whole class correct to two decimal places.

Solution: Since $w_1 = 41$, $X_1 = 62$, $w_2 = 52$, $X_2 = 68$, we find

$$\bar{x} = \frac{w_1 X_1 + w_2 X_2}{w_1 + w_2} = \frac{(41)(62) + (52)(68)}{41 + 52} = \frac{6078}{93} \approx 65.35$$

Example 7 The examination results of an AD student are listed as follows:

Subject    Credit    Grade    Grade Point
I             3       B+         3.5
II            3       B          3
III           1       A+         4.5
IV            4       D+         1.5
V             5       D          1

Hence, the GPA of the student is calculated as follows:

$$\text{GPA} = \frac{3.5 \times 3 + 3 \times 3 + 4.5 \times 1 + 1.5 \times 4 + 1 \times 5}{3 + 3 + 1 + 4 + 5} = 2.1875 \approx 2.19$$
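The three weighted-mean examples above follow the same pattern, which can be sketched in Python as below. The helper `weighted_mean` is a hypothetical name introduced for illustration.

```python
def weighted_mean(values, weights):
    """Return sum(w_i * X_i) / sum(w_i)."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * x for x, w in zip(values, weights)) / total_weight


# Example 5: six observations with their assigned weights.
ex5 = weighted_mean([12, 20, 17, 5, 9, 22], [10, 4, 6, 18, 16, 3])
# Example 6: section means weighted by the number of students.
ex6 = weighted_mean([62, 68], [41, 52])
# Example 7: grade points weighted by credits.
ex7 = weighted_mean([3.5, 3, 4.5, 1.5, 1], [3, 3, 1, 4, 5])

print(ex5, ex6, ex7)
```

In Example 6 the weights are the section sizes, so the combined mean is pulled toward the larger section rather than sitting halfway between 62 and 68.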

3.1.3 Geometric Mean

The geometric mean of a set of values is defined as the $n$th root of the product of the $n$ values. The geometric mean of $X_1, X_2, \dots, X_n$ is given by

$$\bar{X}_G = \sqrt[n]{X_1 \times X_2 \times \dots \times X_n} = (X_1 \times X_2 \times \dots \times X_n)^{1/n}$$


Example 8 Find the geometric mean for the following set of data:

8, 18, 24, 36, 64.

Solution: By the formula, the geometric mean is

$$\sqrt[5]{8 \times 18 \times 24 \times 36 \times 64} = \sqrt[5]{7\,962\,624} = 24$$

Application – Geometric Mean Rate of Return

$$\bar{R}_G = [(1 + R_1) \times (1 + R_2) \times \dots \times (1 + R_n)]^{1/n} - 1$$

where $R_i$ is the rate of return in time period $i$.

Example 9 The price of a stock has risen by 6%, 13%, 11% and 15% in each of 4 successive years. Find the average percentage rise in the price of the stock.

Solution:

The geometric mean rate of return is given by

$$[(1 + 0.06) \times (1 + 0.13) \times (1 + 0.11) \times (1 + 0.15)]^{1/4} - 1 = (1.5290)^{1/4} - 1 = 1.112 - 1$$

The average rise = 0.112 = 11.2%.

This value of 11.2% can be interpreted as the constant increase necessary each year to produce the final-year price, given the starting price.
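Both the geometric mean and the geometric mean rate of return can be sketched in a few lines. The function names are illustrative, and `math.prod` assumes Python 3.8 or later.

```python
import math


def geometric_mean(values):
    """n-th root of the product of n positive values."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("values must be a non-empty sequence of positive numbers")
    return math.prod(values) ** (1 / len(values))


def geometric_mean_return(rates):
    """Average per-period rate: [prod(1 + R_i)]^(1/n) - 1."""
    return geometric_mean([1 + r for r in rates]) - 1


# Example 8 data and Example 9 yearly rises.
print(round(geometric_mean([8, 18, 24, 36, 64]), 6))
print(round(geometric_mean_return([0.06, 0.13, 0.11, 0.15]), 3))
```

For comparison, the arithmetic mean of the four yearly rises is 11.25%, slightly above the geometric rate; compounding 11.25% for four years would overshoot the actual final price.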

3.1.4 Median

Definition 2 The median of a set of measurements is defined to be the middle value when the measurements are arranged from lowest to highest.

Median for ungrouped data

• If there is an odd number of items, the median is the value of the middle item when all items

are arranged in ascending order.

• If there is an even number of items, the median is the average value of the two middle items

when all items are arranged in ascending order.


Example 10 Find the median of the given data: 54 42 46 32 46.

Solution: Arrange the data in ascending order:

32 42 46 46 54

Hence, the median is the 3rd term, 46.

Example 11 Compute the median for Example 3.

Solution: Arranging the 12 items in ascending order:

1910 1955 2050 2080 2080 2090 2120 2140 2150 2250 2330 2525

The two middle items are the 6th and 7th values, 2090 and 2120, so

$$\text{Median} = \frac{2090 + 2120}{2} = 2105.$$

Median for grouped data

For grouped data, the median can be found by first identifying the class containing the median, then applying the following formula:

$$\text{median} = l_1 + \left(\frac{n/2 - C}{f_m}\right)(l_2 - l_1)$$

where: $l_1$ is the lower class boundary of the median class;
$n$ is the total frequency;
$C$ is the cumulative frequency just before the median class;
$f_m$ is the frequency of the median class;
$l_2$ is the upper class boundary of the median class.

The median depends on the number of observations and their order, but not on the size of the extreme values. However, if the data are ungrouped and numerous, finding the median is tedious because the data must first be sorted. Note that the median may also be applied to qualitative data if they can be ranked.
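The grouped-median formula can be sketched as follows. The notes do not work a numerical example here, so the data below reuse the audit-time table from Example 4 with class boundaries at 9.5, 14.5, and so on; those boundaries are an assumption made for illustration, as is the function name.

```python
def grouped_median(classes):
    """classes: list of (lower_boundary, upper_boundary, frequency), ascending."""
    n = sum(f for _, _, f in classes)
    cumulative = 0  # C: cumulative frequency before the current class
    for lower, upper, freq in classes:
        if freq > 0 and cumulative + freq >= n / 2:
            # This is the median class; interpolate within it.
            return lower + ((n / 2 - cumulative) / freq) * (upper - lower)
        cumulative += freq
    raise ValueError("classes must contain at least one observation")


# Audit-time classes from Example 4 with ASSUMED boundaries (e.g. 14.5 to 19.5).
audit = [(9.5, 14.5, 4), (14.5, 19.5, 8), (19.5, 24.5, 5),
         (24.5, 29.5, 2), (29.5, 34.5, 1)]

print(grouped_median(audit))  # 14.5 + ((10 - 4) / 8) * 5 = 18.25
```

Here $n/2 = 10$, the cumulative frequency reaches 12 in the second class, so that class is the median class with $C = 4$ and $f_m = 8$.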

3.1.5 Mode

Definition 3 The mode of a set of measurements is defined to be the measurement that occurs most often (with the highest frequency). The mode may not exist, and even if it does exist it may not be unique. A distribution having only one mode is called unimodal.


Example 12 Refer to Example 5 (Chapter 2). Find the mode.

Automobile Purchase      Frequency
Chevrolet Cavalier            9
Ford Escort                  14
Toyota Echo                   8
Honda Accord                 11
Hyundai Excel                 8
Total                        50

The mode is the Ford Escort.

For grouped data, the mode can be found by first identifying the class with the largest frequency, called the modal class, then applying the following formula to the modal class:

$$\text{mode} = l_1 + \left(\frac{f_a}{f_a + f_b}\right)(l_2 - l_1)$$

where: $l_1$ is the lower class boundary of the modal class;
$f_a$ is the frequency of the modal class minus the frequency of the previous class, and is always positive;
$f_b$ is the frequency of the modal class minus the frequency of the following class, and is always positive;
$l_2$ is the upper class boundary of the modal class.
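A minimal sketch of the grouped-mode formula is shown below. As the notes give no numerical example, it reuses the audit-time table from Example 4 with assumed class boundaries (9.5, 14.5, ...). Treating a missing neighbouring class as having frequency 0 is also an assumption of this sketch.

```python
def grouped_mode(classes):
    """classes: list of (lower_boundary, upper_boundary, frequency), ascending."""
    freqs = [f for _, _, f in classes]
    k = freqs.index(max(freqs))  # first class with the largest frequency
    lower, upper, f_modal = classes[k]
    # ASSUMPTION: if the modal class is first or last, the missing neighbour counts as 0.
    f_prev = freqs[k - 1] if k > 0 else 0
    f_next = freqs[k + 1] if k < len(freqs) - 1 else 0
    f_a = f_modal - f_prev
    f_b = f_modal - f_next
    if f_a + f_b == 0:
        raise ValueError("modal class is not distinct from its neighbours")
    return lower + (f_a / (f_a + f_b)) * (upper - lower)


# Audit-time classes from Example 4 with ASSUMED boundaries.
audit = [(9.5, 14.5, 4), (14.5, 19.5, 8), (19.5, 24.5, 5),
         (24.5, 29.5, 2), (29.5, 34.5, 1)]

print(round(grouped_mode(audit), 2))  # 14.5 + (4 / 7) * 5, about 17.36
```

The modal class is 15-19 with $f_a = 8 - 4 = 4$ and $f_b = 8 - 5 = 3$, so the mode sits slightly past the middle of the class, toward the neighbour with the higher frequency.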

3.1.6 Characteristics of each measure of Central Tendency

• Mode

1. It is the most frequent or probable measurement in the data set.

2. There can be more than one mode for a data set.

3. It is not influenced by the extreme measurements.

4. Modes of subsets cannot be combined to determine the mode of the complete data set.

5. For grouped data, its value can change depending on the classes used.

6. It is applicable for both qualitative and quantitative data.


• Median

1. It is the central value: 50% of the measurements lie above it and 50% fall below it.

2. There is only one median for a data set.

3. Medians of subsets cannot be combined to determine the median of the complete data set.

4. For grouped data, its value is rather stable even when the data are organized into different classes.

5. It is applicable to quantitative data, and to qualitative data that can be ranked.

• Mean

1. It is the average of the measurements in a data set.

2. There is only one mean for a data set.

3. Its value is influenced by extreme measurements; trimming can help to reduce the degree of influence.

4. Means of subsets can be combined to determine the mean of the complete data set.

5. It is applicable to quantitative data only.

3.1.7 Percentiles

Definition 4 The pth percentile is a value such that at least p percent of the items take on this

value or less and at least (100− p) percent of the items take on this value or more.

• Calculating the pth Percentile

Step 1. Arrange the data in ascending order (rank order from smallest value to largest value).

Step 2. Compute an index $i$ as follows:

$$i = \left(\frac{p}{100}\right) n$$

where $p$ is the percentile of interest and $n$ is the number of items.


Step 3.

(a) If i is not an integer, round up. The next integer value greater than i denotes the position of

the pth percentile.

(b) If i is an integer, the pth percentile is the average of the data values in positions i and i + 1.

Example 13 Determine the 85th percentile for Example 3.

Solution: Step 1. Arrange the 12 data values in ascending order.

1910 1955 2050 2080 2080 2090 2120 2140 2150 2250 2330 2525

Step 2.

$$i = \left(\frac{p}{100}\right) n = \left(\frac{85}{100}\right) 12 = 10.2$$

Step 3. Since $i$ is not an integer, round up. The position of the 85th percentile is the next integer greater than 10.2, the 11th position. Hence, the 85th percentile is 2330.
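The three-step procedure can be sketched in Python as follows. The function name is illustrative, and the sketch assumes $p$ is a whole number strictly between 0 and 100 so that the integer test in Step 3 can be done exactly with integer arithmetic. Note that statistical libraries often use different interpolation rules and may return slightly different values.

```python
def percentile(data, p):
    """p-th percentile (integer 0 < p < 100) using the index rule i = (p / 100) * n."""
    if not data:
        raise ValueError("data must not be empty")
    if not 0 < p < 100:
        raise ValueError("p must satisfy 0 < p < 100")
    ordered = sorted(data)                  # Step 1: ascending order
    n = len(ordered)
    whole, remainder = divmod(p * n, 100)   # Step 2: i = whole + remainder / 100
    if remainder:
        # Step 3(a): i is not an integer, so round up to position whole + 1 (1-based).
        return ordered[whole]
    # Step 3(b): i is an integer, so average positions i and i + 1 (1-based).
    return (ordered[whole - 1] + ordered[whole]) / 2


# Monthly salaries from Example 3.
salaries = [2050, 2150, 2250, 2080, 1955, 1910, 2090, 2330, 2140, 2525, 2120, 2080]

print(percentile(salaries, 85))  # 2330
print(percentile(salaries, 50))  # 2105.0, the median found in Example 11
```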

3.1.8 Quartiles

Definition 5 The 25th, 50th, and 75th percentiles of the data are referred to as the first quartile, the second quartile, and the third quartile, respectively. The quartiles can be used to divide the data into four parts, with each part containing approximately 25% of the data.

Q1 = first quartile

Q2 = second quartile (Median)

Q3 = third quartile

Example 14 Find the quartiles for the given set of data.

−6.1, −2.8, −1.2, −0.7, 4.3, 5.5, 5.9, 6.5, 7.6, 8.3, 9.6, 9.8, 12.9, 13.1, 18.5

Solution: For $Q_1$,

$$i = \frac{25}{100} \times n = \frac{1}{4} \times 15 = 3.75, \quad \text{rounded up to } 4.$$

Hence, $Q_1$ = observation at the 4th position = −0.7.

For $Q_3$,

$$i = \frac{75}{100} \times n = \frac{3}{4} \times 15 = 11.25, \quad \text{rounded up to } 12.$$

Hence, $Q_3$ = observation at the 12th position = 9.8.

Example 15 Find the Q1 and Q3 for the Example 3. The ordered data are given as follows:

1910 1955 2050 2080 2080 2090 2120 2140 2150 2250 2330 2525


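Solution: For $Q_1$,

$$i = \frac{25}{100} \times n = \frac{1}{4} \times 12 = 3 \quad \text{(an integer)}$$

Hence, $Q_1$ = average of the observations at the 3rd and 4th positions $= \dfrac{2050 + 2080}{2} = 2065$.

For $Q_3$,

$$i = \frac{75}{100} \times n = \frac{3}{4} \times 12 = 9 \quad \text{(an integer)}$$

Hence, $Q_3$ = average of the observations at the 9th and 10th positions $= \dfrac{2150 + 2250}{2} = 2200$.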

3.2 Measures of Variation

3.2.1 Range

Definition 6 The range is the difference between the largest and smallest data values.

3.2.2 Interquartile Range

Definition 7 The interquartile range (IQR) is the difference between the third and first quartiles in a set of data.

IQR = Q3 −Q1

Example 16 Refer to Example 15. The range = 2525−1910 = 615 and the IQR = 2200−2065 =

135.

3.2.3 Variance and Standard Deviation

Definition 8 The variance is the average of the squared differences between each of the observations in a set of data and the mean. The standard deviation is the positive square root of the variance.

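• For ungrouped data

(a) Population: variance $\sigma^2 = \dfrac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$; standard deviation $\sigma = \sqrt{\sigma^2} = \sqrt{\dfrac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}$

(b) Sample: variance $s^2 = \dfrac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$; standard deviation $s = \sqrt{s^2} = \sqrt{\dfrac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$

• For grouped data

(a) Population: variance $\sigma^2 = \dfrac{\sum f_i (m_i - \mu)^2}{\sum f_i}$; standard deviation $\sigma = \sqrt{\sigma^2} = \sqrt{\dfrac{\sum f_i (m_i - \mu)^2}{\sum f_i}}$

(b) Sample: variance $s^2 = \dfrac{\sum f_i (m_i - \bar{x})^2}{\sum f_i - 1}$; standard deviation $s = \sqrt{s^2} = \sqrt{\dfrac{\sum f_i (m_i - \bar{x})^2}{\sum f_i - 1}}$

• Alternative Formula:

(a) Ungrouped data: $\sigma^2 = \dfrac{\sum_{i=1}^{N} x_i^2 - N \mu^2}{N}$ and $s^2 = \dfrac{\sum_{i=1}^{n} x_i^2 - n \bar{x}^2}{n - 1}$

(b) Grouped data: $\sigma^2 = \dfrac{\sum f_i m_i^2 - N \mu^2}{N}$ and $s^2 = \dfrac{\sum f_i m_i^2 - n \bar{x}^2}{n - 1}$, where $N = \sum f_i$ (population) or $n = \sum f_i$ (sample) is the total frequency.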

Example 17 Find the variance and standard deviation for the data given in Example 3.

Solution:

Monthly Salary    Sample Mean    Deviation About the Mean    Squared Deviation About the Mean
   ($x_i$)        ($\bar{x}$)       ($x_i - \bar{x}$)            ($x_i - \bar{x}$)$^2$
    2050             2140                −90                          8100
    2150             2140                 10                           100
    2250             2140                110                         12100
    2080             2140                −60                          3600
    1955             2140               −185                         34225
    1910             2140               −230                         52900
    2090             2140                −50                          2500
    2330             2140                190                         36100
    2140             2140                  0                             0
    2525             2140                385                        148225
    2120             2140                −20                           400
    2080             2140                −60                          3600
                                $\sum (x_i - \bar{x}) = 0$    $\sum (x_i - \bar{x})^2 = 301850$

Thus, the variance is

$$s^2 = \frac{\sum_{i=1}^{12} (x_i - \bar{x})^2}{n - 1} = \frac{301850}{11} = 27440.91$$

and the standard deviation is $s = \sqrt{s^2} = \sqrt{27440.91} = 165.7$.

Example 18 Redo the last example by using the alternative formula.

Solution:

$$\sum_{i=1}^{12} x_i = 2050 + 2150 + \dots + 2080 = 25680$$

$$\sum_{i=1}^{12} x_i^2 = (2050)^2 + (2150)^2 + \dots + (2080)^2 = 55257050$$

$$\bar{x} = \frac{\sum_{i=1}^{12} x_i}{n} = \frac{25680}{12} = 2140$$

$$s^2 = \frac{\sum_{i=1}^{12} x_i^2 - n (\bar{x})^2}{n - 1} = \frac{55257050 - 12 (2140)^2}{12 - 1} = 27440.91$$

$$s = \sqrt{27440.91} = 165.7$$

Monthly Salary ($x_i$)    $x_i^2$
2050                      $2050^2 = 4\,202\,500$
2150                      $2150^2 = 4\,622\,500$
...                       ...
2120                      $2120^2 = 4\,494\,400$
2080                      $2080^2 = 4\,326\,400$
$\sum x_i = 25680$        $\sum x_i^2 = 55257050$
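The agreement between the definitional formula and the alternative formula can be checked with a short sketch on the salary data of Example 3. The function names are illustrative choices.

```python
import math

# Monthly salaries from Example 3.
salaries = [2050, 2150, 2250, 2080, 1955, 1910, 2090, 2330, 2140, 2525, 2120, 2080]


def sample_variance(data):
    """Definition: sum of squared deviations from the mean, divided by n - 1."""
    n = len(data)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = sum(data) / n
    return sum((x - mean) ** 2 for x in data) / (n - 1)


def sample_variance_alt(data):
    """Alternative formula: (sum of x_i^2 - n * mean^2) / (n - 1)."""
    n = len(data)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = sum(data) / n
    return (sum(x * x for x in data) - n * mean ** 2) / (n - 1)


s2 = sample_variance(salaries)
s2_alt = sample_variance_alt(salaries)
print(round(s2, 2), round(s2_alt, 2), round(math.sqrt(s2), 1))  # 27440.91 27440.91 165.7
```

The alternative formula is convenient by hand or on a calculator because it needs only $\sum x_i$ and $\sum x_i^2$. In floating-point arithmetic it subtracts two large, nearly equal numbers, so the definitional form is usually the safer choice in code.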

Example 19 Find the variance and standard deviation for the data given in Example 4.

Solution: Recall that the mean is x̄ = 19. See Example 4.

Audit Time    Frequency    Class Midpoint    Deviation            Squared Deviation
(Days)          $f_i$          $m_i$         ($m_i - \bar{x}$)    ($m_i - \bar{x}$)$^2$    $f_i (m_i - \bar{x})^2$
10-14             4             12                −7                    49                      196
15-19             8             17                −2                     4                       32
20-24             5             22                 3                     9                       45
25-29             2             27                 8                    64                      128
30-34             1             32                13                   169                      169
                                                                   $\sum f_i (m_i - \bar{x})^2 =$ 570

The variance is

$$s^2 = \frac{\sum f_i (m_i - \bar{x})^2}{\sum f_i - 1} = \frac{570}{19} = 30$$

and the standard deviation is

$$s = \sqrt{s^2} = \sqrt{30} = 5.477226$$

Example 20 Redo the last example by using the alternative formula.

Solution: Recall that the mean is x̄ = 19. See Example 4.

Audit Time    Frequency    Class Midpoint
(Days)          $f_i$          $m_i$          $f_i m_i^2$
10-14             4             12               576
15-19             8             17              2312
20-24             5             22              2420
25-29             2             27              1458
30-34             1             32              1024
                                       $\sum f_i m_i^2 =$ 7790

The variance is

$$s^2 = \frac{\sum f_i m_i^2 - n (\bar{x})^2}{\sum f_i - 1} = \frac{7790 - (20)(19^2)}{20 - 1} = \frac{570}{19} = 30$$

and the standard deviation is

$$s = \sqrt{s^2} = \sqrt{30} = 5.477226$$
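The grouped-data calculation in the last two examples can be reproduced with the sketch below, which works from (midpoint, frequency) pairs. The function name is an illustrative assumption.

```python
import math

# Audit-time table as (class midpoint m_i, frequency f_i) pairs.
classes = [(12, 4), (17, 8), (22, 5), (27, 2), (32, 1)]


def grouped_sample_variance(table):
    """Return sum(f_i * (m_i - mean)^2) / (sum(f_i) - 1)."""
    n = sum(f for _, f in table)
    if n < 2:
        raise ValueError("total frequency must be at least 2")
    mean = sum(m * f for m, f in table) / n
    return sum(f * (m - mean) ** 2 for m, f in table) / (n - 1)


s2 = grouped_sample_variance(classes)
print(s2, round(math.sqrt(s2), 6))  # 30.0 5.477226
```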

3.3 Coefficient of Variation

The Coefficient of Variation is defined as follows:

$$\text{CV} = \frac{\text{Standard Deviation}}{\text{Mean}} \times 100\%$$

When to use CV:

1. The data are in different units.

2. The data are in the same units, but the means are far apart.

Example 21 A study of the test scores for an in-plant course in management principles and the

years of service of the employees enrolled in the course resulted in these statistics: The mean score

was 200; the standard deviation was 40. The mean number of years of service was 20 years; the

standard deviation was 2 years. Compare the relative dispersion in the two distributions using the

coefficient of variation.


Solution: The distributions are in different units (test scores and years of service). Therefore, they

are converted to coefficients of variation.

For the test scores:

$$\text{CV} = \frac{s}{\bar{x}} \times 100 = \frac{40}{200} \times 100 = 20 \text{ percent}$$

For years of service:

$$\text{CV} = \frac{s}{\bar{x}} \times 100 = \frac{2}{20} \times 100 = 10 \text{ percent}$$

Interpreting, there is more dispersion relative to the mean in the distribution of test scores compared with the distribution of years of service (because 20 > 10 percent).

The same procedure is used when the data are in the same units but the means are far apart. (See

the following example.)

Example 22 The variation in the annual incomes of executives is to be compared with the variation in incomes of unskilled employees. For a sample of executives, $\bar{x}$ = $500,000 and $s$ = $50,000. For a sample of unskilled employees, $\bar{x}$ = $22,000 and $s$ = $2,200. We are tempted to say that there is more dispersion in the annual incomes of the executives because $50,000 > $2,200. The means are so far apart, however, that we need to convert the statistics to coefficients of variation to make a meaningful comparison of the variation in annual incomes.

Solution:

For the executives:

$$\text{CV} = \frac{s}{\bar{x}} \times 100 = \frac{50{,}000}{500{,}000} \times 100 = 10 \text{ percent}$$

For the unskilled employees:

$$\text{CV} = \frac{s}{\bar{x}} \times 100 = \frac{2{,}200}{22{,}000} \times 100 = 10 \text{ percent}$$

There is no difference in the relative dispersion of the two groups.
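The two comparisons above reduce to the same one-line calculation, sketched here with an illustrative helper name.

```python
def coefficient_of_variation(std_dev, mean):
    """CV as a percentage: (standard deviation / mean) * 100."""
    if mean == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return std_dev / mean * 100


# Example 21: test scores versus years of service.
cv_scores = coefficient_of_variation(40, 200)
cv_service = coefficient_of_variation(2, 20)
# Example 22: executives versus unskilled employees.
cv_exec = coefficient_of_variation(50_000, 500_000)
cv_unskilled = coefficient_of_variation(2_200, 22_000)

print(cv_scores, cv_service, cv_exec, cv_unskilled)
```

Because the units cancel in the ratio, the CV is a pure percentage, which is what allows test scores to be compared with years and large incomes with small ones.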

3.4 Shape

An important property of a set of data is its shape, that is, the manner in which the data are distributed. The distribution of the data is either symmetrical or it is not. If the distribution of data is not symmetrical, it is called asymmetrical or skewed.

Mean > Median: positive or right-skewness

Mean = Median: symmetry or zero-skewness

Mean < Median: negative or left-skewness


Skewness is an abstract quantity that shows how the data pile up. A number of measures have been suggested to determine the skewness of a given distribution. One of the simplest is known as Pearson’s measure of skewness:

$$\text{Skewness} = \frac{\text{Mean} - \text{Mode}}{\text{Standard Deviation}} \approx \frac{3(\text{Mean} - \text{Median})}{\text{Standard Deviation}}$$

Positive skewness arises when the mean is increased by some unusually high values; negative skewness occurs when the mean is reduced by some extremely low values. Data are symmetrical when there are no really extreme values in a particular direction, so that low and high values balance each other out.

For data sets that are extremely skewed, be wary of using the mean as a measure of the “center” of

distribution. In this situation, a more meaningful measure of central tendency may be the median,

which is more resistant to the influence of extreme measurements.
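As an illustration, the median form of Pearson’s measure can be applied to the salary data of Example 3, using the mean (2140), the median (2105) and the sample variance (301850/11) computed in the earlier examples. The function name is a hypothetical choice for this sketch.

```python
def pearson_skewness(mean, median, std_dev):
    """Pearson's measure of skewness, median form: 3 * (mean - median) / s."""
    if std_dev <= 0:
        raise ValueError("standard deviation must be positive")
    return 3 * (mean - median) / std_dev


# Example 3 salaries: mean 2140, median 2105, s = sqrt(301850 / 11).
skew = pearson_skewness(2140, 2105, (301850 / 11) ** 0.5)
print(round(skew, 2))  # 0.63
```

The positive value agrees with the rule Mean > Median: the salaries are right-skewed, pulled upward by the single high value of 2525.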

3.5 Box-and-Whisker Plot (Box Plot)

A plot that shows the center, spread, and skewness of a data set. It is constructed by drawing a

box and two whiskers that use the median, the first quartile, the third quartile, and the smallest

and the largest values in the data set.


Example 23 The following data give the incomes (in thousands of dollars) for a sample of 13

households.

23 17 32 60 22 52 29 20 38 42 92 27 46

Construct a box-and-whisker plot for these data.

Solution: The following steps are performed to construct a box-and-whisker plot.

Step 1 First, rank the data in increasing order and calculate the values of the median, the first quartile, the third quartile, and the interquartile range. The ranked data are

17 20 22 23 27 29 32 38 42 46 52 60 92

Step 2 Determine the median, the quartiles, the smallest and the largest values in the given data set. These five values for our example are as follows.

Median: $i = \frac{13}{2} = 6.5 \Rightarrow$ 7th ordered data value = 32
First quartile $Q_1$: $i = \frac{13}{4} = 3.25 \Rightarrow$ 4th ordered data value = 23
Third quartile $Q_3$: $i = \frac{3(13)}{4} = 9.75 \Rightarrow$ 10th ordered data value = 46
Smallest value = 17
Largest value = 92

Step 3 Draw a horizontal line and mark the income levels on it such that all the values in the given

data set are covered. Above the horizontal line, draw a box with its left side at the position

of the first quartile and the right side at the position of the third quartile. Inside the box,

draw a vertical line at the position of the median.

Step 4 By drawing two lines, join the points of the smallest and the largest values to the box. These values are 17 and 92 in this example, as listed in Step 2. The two lines that join the box to these two values are called whiskers.

3.6 Uses of Standard Deviation

3.6.1 The Empirical Rule

For data having a bell-shaped distribution,

• Approximately 68% of the items will be within one standard deviation of the mean.

• Approximately 95% of the items will be within two standard deviations of the mean.

• Almost all (99.7%) of the items will be within three standard deviations of the mean.

Example 24 The age distribution of a sample of 5000 persons is bell-shaped with a mean of 40 years and a standard deviation of 12 years. Determine the approximate percentage of people who are 16 to 64 years old.

Solution: We will use the empirical rule to find the required percentage because the distribution of ages follows a bell-shaped curve. From the given information, for this distribution,

$\bar{x} = 40$ years and $s = 12$ years

Each of the two points, 16 and 64, is 24 units away from the mean. Dividing 24 by 12, we express the distance between each of the two points and the mean in terms of standard deviations. Thus, the distance between 16 and 40 and between 40 and 64 is each equal to $2s$. Because the area within two standard deviations of the mean is approximately 95% for a bell-shaped curve, approximately 95% of the people in the sample are 16 to 64 years old.

3.6.2 Chebyshev’s Theorem

At least $(1 - 1/k^2)$ of the items in any data set must be within $k$ standard deviations of the mean, where $k$ is any value greater than 1.

Example 25 For a statistics class, the mean for the midterm scores is 75 and the standard deviation is 8. Using Chebyshev’s theorem, find the minimum percentage of students who scored between 59 and 91.

Solution: Let $\mu$ and $\sigma$ be the mean and the standard deviation, respectively, of the midterm scores. Then, from the given information,

$\mu = 75$ and $\sigma = 8$

To find the percentage of students who scored between 59 and 91, the first step is to determine $k$. Each of the two points, 59 and 91, is 16 units away from the mean.

The value of $k$ is obtained by dividing the distance between the mean and each point by the standard deviation. Thus,

$$k = 16/8 = 2$$

$$1 - \frac{1}{k^2} = 1 - \frac{1}{(2)^2} = \frac{3}{4} = 0.75$$

Hence, according to Chebyshev’s theorem, at least 75% of the students scored between 59 and 91.
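The steps of this solution can be sketched in Python. The function name is illustrative, and the sketch assumes the interval is symmetric about the mean, as it is in the example.

```python
def chebyshev_lower_bound(mean, std_dev, low, high):
    """Minimum fraction of data within [low, high], for an interval symmetric about the mean."""
    if std_dev <= 0:
        raise ValueError("standard deviation must be positive")
    if abs((mean - low) - (high - mean)) > 1e-9:
        raise ValueError("interval must be symmetric about the mean")
    k = (high - mean) / std_dev  # distance from the mean in standard deviations
    if k <= 1:
        raise ValueError("Chebyshev's theorem requires k > 1")
    return 1 - 1 / k ** 2


# Midterm scores: mean 75, standard deviation 8, interval 59 to 91.
print(chebyshev_lower_bound(75, 8, 59, 91))  # 0.75
```

Unlike the empirical rule, this bound holds for any distribution, which is why it is weaker: for $k = 2$ it guarantees only 75%, compared with the approximately 95% expected for bell-shaped data.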

3.6.3 z-score

A z-score measures how many standard deviations an observation is above or below the mean.

$$z_i = \frac{x_i - \bar{x}}{s}$$

where $z_i$ = the z-score for item $i$, $\bar{x}$ = the sample mean, and $s$ = the sample standard deviation.

Example 26 Different typing skills are required for secretaries depending on whether one is working

in a law office, an accounting firm, or for a research mathematical group at a major university. In

order to evaluate candidates for these positions, an employment agency administers three distinct

standardized typing samples. A time penalty has been incorporated into the scoring of each sample

based on the number of typing errors. The mean and standard deviation for each test, together with

the score achieved by a recent applicant, are given as follows.

Sample        Applicant’s score    Mean       Standard deviation
Law           141 sec              180 sec    30 sec
Accounting    7 min                10 min     2 min
Scientific    33 min               26 min     5 min

For what type of position does this applicant seem to be best suited?

Solution: First we compute the z-score for each sample.

Law: $z = \dfrac{141 - 180}{30} = -1.3$

Accounting: $z = \dfrac{7 - 10}{2} = -1.5$

Scientific: $z = \dfrac{33 - 26}{5} = 1.4$

Since speed is of primary importance, we are looking for the z-score that represents the greatest number of standard deviations to the left of the mean, and in our case that would be −1.5. Therefore, this applicant ranks higher among typists in accounting firms than among typists in the other two areas, and consequently should be placed with an accounting firm.
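The comparison in this example can be sketched as follows. The dictionary layout and names are illustrative. Because the scores are times, the sketch picks the most negative z-score as the best result.

```python
def z_score(x, mean, std_dev):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    if std_dev <= 0:
        raise ValueError("standard deviation must be positive")
    return (x - mean) / std_dev


# (applicant's score, mean, standard deviation) for each typing sample.
tests = {
    "Law": (141, 180, 30),        # seconds
    "Accounting": (7, 10, 2),     # minutes
    "Scientific": (33, 26, 5),    # minutes
}

scores = {name: z_score(*values) for name, values in tests.items()}
best = min(scores, key=scores.get)  # lower time is better, so take the smallest z

print(scores)
print(best)  # Accounting
```

Standardizing removes the units, so a score in seconds can be compared directly with scores in minutes.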


3.7 Use Scientific Calculator to find the mean and standard deviation

Example 27 Use the calculator to find the mean and standard deviation of the data set:

1, 2, 5, 6, 8, 9, 10, 12, 14, 18.

Step    Function Keys               Description
1       MODE 3                      Change to the statistical mode
2       SHIFT AC                    Clear all old data
3       1 M+ DATA / RUN DATA        Input the first data
4       2 M+ DATA / RUN DATA        Input the second data
5       ...                         Continue to input the data
6       1 8 M+ DATA / RUN DATA      Input the final data
7       SHIFT 1                     $\mu$ or $\bar{x}$, population mean or sample mean
8       SHIFT 2                     $\sigma$, population standard deviation
9       SHIFT 3                     $s$, sample standard deviation
10      Kout 1                      $\sum x^2$, sum of squares of all data
11      Kout 2                      $\sum x$, sum of all data
12      Kout 3                      $n$, population size or sample size

Kout 1 = $\sum x^2$ = 975
Kout 2 = $\sum x$ = 85
Kout 3 = $n$ = 10
SHIFT 1 = $\bar{x}$ = 8.5
SHIFT 2 = $x\sigma_n$ = 5.0249378
SHIFT 3 = $x\sigma_{n-1}$ = 5.2967495

Example 28 Suppose that we want to find the mean and standard deviation of the following data:

Audit Time (Days)    Frequency $f_i$    Class Midpoint $m_i$
10-14                       4                   12
15-19                       8                   17
20-24                       5                   22
25-29                       2                   27
30-34                       1                   32

MODE 3
SHIFT AC
1 2 × 4 M+ DATA / RUN DATA
1 7 × 8 M+ DATA / RUN DATA
2 2 × 5 M+ DATA / RUN DATA
2 7 × 2 M+ DATA / RUN DATA
3 2 × 1 M+ DATA / RUN DATA

Kout 1 = $\sum x^2$ = 7790
Kout 2 = $\sum x$ = 380
Kout 3 = $n$ = 20
SHIFT 1 = $\bar{x}$ = 19
SHIFT 2 = $x\sigma_n$ = 5.338539126
SHIFT 3 = $x\sigma_{n-1}$ = 5.477225575

Remarks: If your calculator has no MODE 3 function, then you may use the MODE 2 function. But all the $x$ values will be moved to the $y$ values. E.g. $\sum x^2 = \sum y^2$ = Kout 4.