engineering statistic 4 lecture: measures of …uowa.edu.iq/filestorage/file_1553620932.pdf · 1-...


Post on 24-Mar-2020


Page 1: ENGINEERING STATISTIC 4 LECTURE: MEASURES OF …uowa.edu.iq/filestorage/file_1553620932.pdf · 1- Range: The range is the simplest of the three measures and is defined now. The range

In statistics, to describe a data set accurately, statisticians must know more than the measures of central tendency. Two data sets with the same mean may have completely different variation or dispersion, so the measures that help us know about the spread of a data set are called the measures of dispersion, such as:

1- Range.

2- Variance.

3- Standard deviation.

1- Range: The range is the simplest of the three measures. It is the highest value minus the lowest value. The symbol R is used for the range.

Disadvantages of the range:

a- It is based on two values only, the largest and the smallest.

b- Extremely large or extremely small data values can significantly affect the range.

Exp.(1): Calculate the range for the following data set: -9, 16, 10, 7, -7, 2, 0, 5

Sol: -9 -7 0 2 5 7 10 16

R = highest value - lowest value

R= 16 – (-9) = 25
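The range calculation above can be sketched in a few lines of Python. The function name `data_range` is a hypothetical helper, and the data values are taken from the sorted list in the solution.

```python
def data_range(values):
    """Return R = highest value - lowest value."""
    return max(values) - min(values)

# Values from Exp.(1), as they appear in the sorted solution line.
data = [-9, -7, 0, 2, 5, 7, 10, 16]
print(data_range(data))  # 25

# Disadvantage (b): a single extreme value changes the range a lot.
print(data_range(data + [100]))  # 109
```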

2- Variance and Standard Deviation.

a- Ungrouped data

Exp(2): Find the sample variance, the standard deviation, and the range for the amount of European auto sales for a sample of 6 years shown. The data are in millions of dollars.

11.2, 11.9, 12.0, 12.8, 13.4, 14.3

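Sol:

The formulas for ungrouped data are:

Population variance: \( \sigma^2 = \dfrac{\sum (x - \mu)^2}{N} \)

Sample variance: \( s^2 = \dfrac{\sum (x - \bar{x})^2}{n - 1} \)

Population standard deviation: \( \sigma = \sqrt{\sigma^2} \)

Sample standard deviation: \( s = \sqrt{s^2} \)

Range: R = highest value - lowest value

The sample mean is needed first:

\[ \bar{x} = \frac{\sum x}{n} = \frac{11.2 + 11.9 + 12.0 + 12.8 + 13.4 + 14.3}{6} = \frac{75.6}{6} = 12.6 \]

1- Variance: \( s^2 = \dfrac{\sum (x - \bar{x})^2}{n - 1} \)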

ENGINEERING STATISTIC

4th LECTURE: MEASURES OF DISPERSION AND POSITION


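\[ s^2 = \frac{(11.2-12.6)^2 + (11.9-12.6)^2 + (12.0-12.6)^2 + (12.8-12.6)^2 + (13.4-12.6)^2 + (14.3-12.6)^2}{5} = \frac{6.38}{5} = 1.276 \]

2- Standard deviation: \( s = \sqrt{1.276} \approx 1.13 \)

3- The range: R = 14.3 – 11.2 = 3.1

b- Grouped data

The formulas for grouped data, where \( x_m \) is the class midpoint and \( f \) is the class frequency, are:

Population variance: \( \sigma^2 = \dfrac{\sum f (x_m - \mu)^2}{N} \)

Sample variance: \( s^2 = \dfrac{\sum f (x_m - \bar{x})^2}{n - 1} \)

Population standard deviation: \( \sigma = \sqrt{\sigma^2} \)

Sample standard deviation: \( s = \sqrt{s^2} \)

Exp(3): Find the variance and the standard deviation for the data in this frequency distribution table. The data represent the number of miles that 20 runners ran during one week.

| Class boundaries | 5.5–10.5 | 10.5–15.5 | 15.5–20.5 | 20.5–25.5 | 25.5–30.5 | 30.5–35.5 | 35.5–40.5 |
|---|---|---|---|---|---|---|---|
| Freq. (f_i) | 1 | 2 | 3 | 5 | 4 | 3 | 2 |

Sol:

| i | Class boundaries | Freq. (f_i) | Midpoint x_m | f_i·x_m | (x_m − x̄)² | f_i(x_m − x̄)² |
|---|---|---|---|---|---|---|
| 1 | 5.5–10.5 | 1 | 8 | 8 | 272.25 | 272.25 |
| 2 | 10.5–15.5 | 2 | 13 | 26 | 132.25 | 264.5 |
| 3 | 15.5–20.5 | 3 | 18 | 54 | 42.25 | 126.75 |
| 4 | 20.5–25.5 | 5 | 23 | 115 | 2.25 | 11.25 |
| 5 | 25.5–30.5 | 4 | 28 | 112 | 12.25 | 49.00 |
| 6 | 30.5–35.5 | 3 | 33 | 99 | 72.25 | 216.75 |
| 7 | 35.5–40.5 | 2 | 38 | 76 | 182.25 | 364.5 |
| Sum | | 20 | | 490 | | 1305 |

\[ \bar{x} = \frac{\sum f \cdot x_m}{n} = \frac{490}{20} = 24.5 \]

\[ s^2 = \frac{\sum f (x_m - \bar{x})^2}{n - 1} = \frac{1305}{19} = 68.68 \]

\[ s = \sqrt{68.68} \approx 8.29 \]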
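The sample variance and standard deviation calculations above, for both ungrouped and grouped data, can be sketched in Python. This is only an illustration: the function names `sample_variance` and `grouped_sample_variance` are hypothetical, and the grouped version assumes each class is represented by its midpoint, as in the table.

```python
import math

def sample_variance(values):
    """Sample variance: sum of squared deviations from the mean over n - 1."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)

def grouped_sample_variance(boundaries, freqs):
    """Sample variance from class boundaries (lower, upper) and frequencies."""
    n = sum(freqs)
    mids = [(lo + hi) / 2 for lo, hi in boundaries]  # class midpoints x_m
    mean = sum(f * xm for f, xm in zip(freqs, mids)) / n
    return sum(f * (xm - mean) ** 2 for f, xm in zip(freqs, mids)) / (n - 1)

# Exp(2): European auto sales, in millions of dollars.
sales = [11.2, 11.9, 12.0, 12.8, 13.4, 14.3]
s2 = sample_variance(sales)
print(round(s2, 2), round(math.sqrt(s2), 1))  # 1.28 1.1

# Exp(3): miles run by 20 runners in one week.
boundaries = [(5.5, 10.5), (10.5, 15.5), (15.5, 20.5), (20.5, 25.5),
              (25.5, 30.5), (30.5, 35.5), (35.5, 40.5)]
freqs = [1, 2, 3, 5, 4, 3, 2]
g2 = grouped_sample_variance(boundaries, freqs)
print(round(g2, 2), round(math.sqrt(g2), 1))  # 68.68 8.3
```

Dividing by n - 1 rather than n is what distinguishes the sample formulas from the population formulas; replacing `n - 1` with `n` in either function would give the population variance.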


Coefficient of Variation

The coefficient of variation, denoted by CVar, is a statistic that allows you to compare standard deviations when the units are different. It is the standard deviation divided by the mean, and the result is expressed as a percentage.

For samples: \( \text{CVar} = \dfrac{s}{\bar{x}} \times 100 \)

For populations: \( \text{CVar} = \dfrac{\sigma}{\mu} \times 100 \)

Exp: The mean of the number of sales of cars over a 3-month period is 87, and the standard deviation is 5. The mean of the commissions is 5225 dollars, and the standard deviation is 773 dollars. Compare the variations of the two.

Sol: The coefficients of variation are:

Sales: \( \text{CVar} = \dfrac{s}{\bar{x}} \times 100 = \dfrac{5}{87} \times 100 = 5.75\% \)

Commissions: \( \text{CVar} = \dfrac{s}{\bar{x}} \times 100 = \dfrac{773}{5225} \times 100 = 14.8\% \)

Since the coefficient of variation is larger for commissions, the commissions are more variable than the sales.

Measures of Position

In addition to measures of central tendency and measures of variation, there are measures of position or location. These measures include:

1- Standard scores.

2- Percentiles.

3- Deciles and quartiles.

They are used to locate the relative position of a data value in the data set.

z score or standard score: it represents the number of standard deviations that a data value falls above or below the mean.

For samples, the formula is: \( z = \dfrac{x - \bar{x}}{s} \)

For populations, the formula is: \( z = \dfrac{x - \mu}{\sigma} \)

*** Note that if the z score is positive, the score is above the mean. If the z score is 0, the score is the same as the mean. And if the z score is negative, the score is below the mean.

Exp: A student scored 65 on a calculus test that had a mean of 50 and a standard deviation of 10; she scored 30 on a history test with a mean of 25 and a standard deviation of 5. Compare her relative positions on the two tests.

Sol:

Calculus: \( z_1 = \dfrac{x - \bar{x}}{s} = \dfrac{65 - 50}{10} = 1.5 \)

History: \( z_2 = \dfrac{x - \bar{x}}{s} = \dfrac{30 - 25}{5} = 1 \)

Since the z score for calculus is larger, her relative position in the calculus class is higher than her relative position in the history class.
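The coefficient of variation and z score formulas above can be checked with a short Python sketch. The function names `coefficient_of_variation` and `z_score` are hypothetical helpers, not part of any library.

```python
def coefficient_of_variation(std_dev, mean):
    """CVar: standard deviation divided by the mean, as a percentage."""
    return std_dev / mean * 100

def z_score(x, mean, std_dev):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / std_dev

# Car sales versus commissions.
print(round(coefficient_of_variation(5, 87), 2))      # 5.75
print(round(coefficient_of_variation(773, 5225), 1))  # 14.8

# Calculus score versus history score.
print(z_score(65, 50, 10), z_score(30, 25, 5))  # 1.5 1.0
```

Both quantities are unit-free, which is why they allow comparisons between data sets measured on different scales.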


Quartiles

As the name implies, quartiles divide the data set into four equal parts. The first quartile, Q1, is the 25th percentile; the second quartile, Q2, is the 50th percentile (the median); and the third quartile, Q3, is the 75th percentile. The difference between the third and first quartiles is the interquartile range (IQR).

For ungrouped data, the quartiles (Q1, Q2, and Q3) are calculated by:

1- Arrange the data in order from lowest to highest.

2- Find the median of the data values. This is the value for Q2.

3- Find the median of the data values that fall below Q2. This is the value for Q1.

4- Find the median of the data values that fall above Q2. This is the value for Q3.

Percentiles divide the data set into 100 equal groups. Each data set has 99 percentiles, and the data must be ranked in increasing order to compute them. The kth percentile is denoted by Pk, where k is an integer ranging from 1 to 99. For example, the 25th percentile, denoted by P25, is the numerical value such that at most 25% of the values are smaller than it and at most 75% are larger than it in an ordered data set.

[Figure: the ordered data from the minimum value to the maximum value, split by Q1, Q2 (the median), and Q3 into four parts, each containing 25% of the data.]

IQR = Q3 - Q1

Example : Find Q1, Q2, and Q3 for the data set 15, 13, 6, 5, 12, 50, 22, 18.

Sol:

1- Arrange the data in increasing order: 5, 6, 12, 13, 15, 18, 22, 50

Q2 is the median of all values → Q2 = (13+15)/2= 14

Q1 is the median of values ( 5, 6, 12, 13 ) → Q1 = (6+12)/2= 9.

Q3 is the median of values (15, 18, 22, 50) → Q3 = (18+22)/2= 20.

Example: The following are the ages of nine employees of an insurance company:

47 28 39 51 33 37 59 24 33

a- Find the values of the three quartiles.

b- Where does the age 28 fall in relation to the ages of these employees?

c- Find the interquartile range (IQR).

Sol:

a- Arrange the data in increasing order: 24, 28, 33, 33, 37, 39, 47, 51, 59

Q2 is the median of all values → Q2 = 37

Q1 is the median of values (24, 28, 33, 33) → Q1 = (28+33)/2 = 30.5.

Q3 is the median of values (39, 47, 51, 59) → Q3 = (47+51)/2 = 49.

b- The age 28 falls in the first 25% of the ages, since it is below Q1 = 30.5.

c- The interquartile range (IQR) = Q3 - Q1 = 49 - 30.5 = 18.5 years.
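The median-of-halves procedure for quartiles can be sketched in Python as follows. The function name `quartiles` is a hypothetical helper, and the sketch assumes that, for an odd number of values, the middle value is left out of both halves, as in the nine-employee example. Library routines often use other interpolation rules and may return slightly different quartiles.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, Q2, Q3) using the median of the lower and upper halves."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]
    # With an odd count, skip the middle value (which is Q2 itself).
    upper = data[half + 1:] if n % 2 else data[half:]
    return median(lower), median(data), median(upper)

print(quartiles([15, 13, 6, 5, 12, 50, 22, 18]))  # (9.0, 14.0, 20.0)

ages = [47, 28, 39, 51, 33, 37, 59, 24, 33]
q1, q2, q3 = quartiles(ages)
print(q1, q2, q3, q3 - q1)  # 30.5 37 49.0 18.5
```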


For ungrouped data, the percentile corresponding to a given value x is computed by using the formula:

\[ \text{Percentile} = \frac{(\text{number of values below } x) + 0.5}{\text{total number of values}} \times 100 \]

Finding the value corresponding to a given percentile:

Let p be the percentile and n the sample size.

Arrange the data in order.

Compute c = (n×p)/100.

If c is not a whole number, round up to the next whole number. If c is a whole number, use the value halfway between the cth and (c+1)th values.

The value of c is the position of the required percentile in the ordered data.

Deciles divide the distribution into 10 groups. They are denoted by D1, D2, etc.

Note that D1 corresponds to P10; D2 corresponds to P20; etc.

Deciles can be found by using the formulas given for percentiles. Percentiles, deciles, and quartiles are therefore related: each decile and each quartile is itself a percentile.

Example: A teacher gives a 20-point test to 10 students. Find the percentile rank of

a score of 12. Scores: 18, 15, 12, 6, 8, 2, 3, 5, 20, 10.

Sol:

Ordered set: 2, 3, 5, 6, 8, 10, 12, 15, 18, 20.

\[ \text{Percentile} = \frac{(\text{number of values below } x) + 0.5}{\text{total number of values}} \times 100 = \frac{6 + 0.5}{10} \times 100 = 65 \]

The score of 12 is at the 65th percentile: the student did better than 65% of the class.
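The percentile rank formula above can be sketched in Python. The function name `percentile_rank` is a hypothetical helper; it counts only the values strictly below x, as the formula does.

```python
def percentile_rank(values, x):
    """Percentile of x: (number of values below x + 0.5) / total * 100."""
    below = sum(1 for v in values if v < x)
    return (below + 0.5) * 100 / len(values)

scores = [18, 15, 12, 6, 8, 2, 3, 5, 20, 10]
print(percentile_rank(scores, 12))  # 65.0
```

The data do not need to be sorted first, because the count of values below x does not depend on their order.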

Example: For the following data set: 2, 3, 5, 6, 8, 10, 12, 15, 18, 20.

Find the values of the 25th and 80th percentile.

Sol:

a. n = 10, p = 25

c = (10×25)/100 = 2.5. Hence round up to c = 3.

Thus, the value of the 25th percentile is the value x = 5.

b. n = 10, p = 80

c = (10× 80)/100 = 8.

Thus the value of the 80th percentile is the average of the 8th and 9th values.

x = (15 + 18)/2 = 16.5.


Exp: The following are test scores for a particular math class. Find the sixth decile.

44 56 58 62 64 64 70 72 72 72

74 74 75 78 78 79 80 82 82 84

86 87 88 90 92 95 96 96 98 100

Sol: D6 = P60

n = 30, p = 60, c = (30×60)/100 = 18

Since c is a whole number, the average of the 18th and 19th items represents the 6th decile: D6 = (82 + 82)/2 = 82.
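The rule for finding the value at a given percentile, and its use for deciles, can be sketched in Python. The names `percentile_value` and `decile_value` are hypothetical helpers, and the sketch assumes p is an integer from 1 to 99. It follows the round-up and halfway rule from these notes, which is not the same interpolation that every statistics library uses.

```python
def percentile_value(values, p):
    """Value at the p-th percentile using c = n*p/100 (p an integer, 1..99)."""
    data = sorted(values)
    whole, remainder = divmod(len(data) * p, 100)
    if remainder:
        # c is not a whole number: round up, so take the (whole + 1)-th value.
        return data[whole]
    # c is a whole number: halfway between the c-th and (c + 1)-th values.
    return (data[whole - 1] + data[whole]) / 2

def decile_value(values, k):
    """D_k corresponds to P_(10k)."""
    return percentile_value(values, 10 * k)

small = [2, 3, 5, 6, 8, 10, 12, 15, 18, 20]
print(percentile_value(small, 25), percentile_value(small, 80))  # 5 16.5

scores = [44, 56, 58, 62, 64, 64, 70, 72, 72, 72,
          74, 74, 75, 78, 78, 79, 80, 82, 82, 84,
          86, 87, 88, 90, 92, 95, 96, 96, 98, 100]
print(decile_value(scores, 6))  # 82.0
```

Working with the integer product n×p avoids floating-point surprises when testing whether c is a whole number.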

Percentiles, deciles, and quartiles for grouped data: to find the value that corresponds to a specified position i of a quartile, percentile, or decile in grouped data, the following formulas must be used:

\[ Q_i = L + \frac{n\left(\frac{i}{4}\right) - F}{f_m} \times \Delta \]

\[ P_i = L + \frac{n\left(\frac{i}{100}\right) - F}{f_m} \times \Delta \]

\[ D_i = L + \frac{n\left(\frac{i}{10}\right) - F}{f_m} \times \Delta \]

Where:

Qi, Pi, Di are the quartile, percentile, and decile of position i.

L: the lower class boundary of the class containing position i.

n: the total number of data values.

F: the cumulative frequency of the class before the class containing position i.

fm: the frequency of the class containing position i.

Δ: the class width.

Deciles are denoted by D1, D2, D3, . . . , D9, and they correspond to P10, P20, P30, . . . , P90.

Quartiles are denoted by Q1, Q2, Q3, and they correspond to P25, P50, P75.

The median = P50 = Q2 = D5.


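Exp: The times taken by 20 workers in a factory to do a particular job are tabulated as follows. Find Q2, P70, and D4.

| i | Class boundaries | Freq. (f_i) | Cumulative freq. |
|---|---|---|---|
| 1 | 7.5–10.5 | 2 | 2 |
| 2 | 10.5–13.5 | 4 | 6 |
| 3 | 13.5–16.5 | 6 | 12 |
| 4 | 16.5–19.5 | 4 | 16 |
| 5 | 19.5–22.5 | 3 | 19 |
| 6 | 22.5–25.5 | 1 | 20 |

Sol:

For Q2: \( n\left(\frac{i}{4}\right) = 20 \times \frac{2}{4} = 10 \), so the class boundaries of Q2 are 13.5–16.5.

\[ Q_2 = L + \frac{n\left(\frac{2}{4}\right) - F}{f_m} \times \Delta = 13.5 + \frac{10 - 6}{6} \times 3 = 15.5 \]

For P70: \( n\left(\frac{i}{100}\right) = 20 \times \frac{70}{100} = 14 \), so the class boundaries of P70 are 16.5–19.5.

\[ P_{70} = L + \frac{n\left(\frac{70}{100}\right) - F}{f_m} \times \Delta = 16.5 + \frac{14 - 12}{4} \times 3 = 18 \]

For D4: \( n\left(\frac{i}{10}\right) = 20 \times \frac{4}{10} = 8 \), so the class boundaries of D4 are 13.5–16.5.

\[ D_4 = L + \frac{n\left(\frac{4}{10}\right) - F}{f_m} \times \Delta = 13.5 + \frac{8 - 6}{6} \times 3 = 14.5 \]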
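The three grouped-data formulas differ only in the divisor (4, 100, or 10), so they can be sketched as one Python function. The name `grouped_quantile` and its `parts` parameter are hypothetical, and the sketch assumes the position n×i/parts falls in the first class whose cumulative frequency reaches it, as in the worked example.

```python
def grouped_quantile(boundaries, freqs, i, parts):
    """Quantile i of `parts` (4 = quartile, 10 = decile, 100 = percentile)."""
    n = sum(freqs)
    position = n * i / parts
    cumulative = 0  # F: cumulative frequency before the current class
    for (lower, upper), f in zip(boundaries, freqs):
        if cumulative + f >= position:
            width = upper - lower  # class width
            return lower + (position - cumulative) * width / f
        cumulative += f
    raise ValueError("position lies beyond the last class")

boundaries = [(7.5, 10.5), (10.5, 13.5), (13.5, 16.5),
              (16.5, 19.5), (19.5, 22.5), (22.5, 25.5)]
freqs = [2, 4, 6, 4, 3, 1]

print(grouped_quantile(boundaries, freqs, 2, 4))     # Q2  -> 15.5
print(grouped_quantile(boundaries, freqs, 70, 100))  # P70 -> 18.0
print(grouped_quantile(boundaries, freqs, 4, 10))    # D4  -> 14.5
```

The calculation interpolates linearly inside the class, which assumes the values are spread evenly between the class boundaries.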