
Measures of Variability - National Tsing Hua University, mx.nthu.edu.tw/~chaoenyu/stat5.pdf · 2019. 11. 7.

Measures of Variability

Copyright © 2013 Pearson Education

Measures of variation give information on the spread or variability of the data values.

Figure: two distributions with the same center but different variation.

Measures of variation: range, interquartile range, variance, standard deviation, and coefficient of variation.

Ch. 2-1

2.2

Range

Simplest measure of variation

Difference between the largest and the smallest observations:

$$\text{Range} = X_{\text{largest}} - X_{\text{smallest}}$$

Example (dot plot on a 0 to 14 scale; the smallest observation is 1 and the largest is 14):

Range = 14 - 1 = 13

Ch. 2-2

Disadvantages of the Range

Ignores the way in which data are distributed

Sensitive to outliers

Example (ignores the distribution): two dot plots on a 7 to 12 scale spread their values differently, yet both have

Range = 12 - 7 = 5

Example (sensitive to outliers):

1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,5
Range = 5 - 1 = 4

1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,120
Range = 120 - 1 = 119

Ch. 2-3
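A minimal Python sketch of the range, applied to the two 25-value lists above, shows how a single outlier inflates it. The helper name `data_range` is hypothetical, chosen only for illustration.

```python
# Hypothetical helper illustrating Range = X_largest - X_smallest.
def data_range(values):
    """Return the largest observation minus the smallest one."""
    if not values:
        raise ValueError("the range of an empty data set is undefined")
    return max(values) - min(values)


# The two lists from the example: 11 ones, 8 twos, 4 threes, a 4, then 5 or 120.
base = [1] * 11 + [2] * 8 + [3] * 4 + [4]
without_outlier = base + [5]
with_outlier = base + [120]

print(data_range(without_outlier))  # 4
print(data_range(with_outlier))     # 119
```

Changing one of 25 values moves the range from 4 to 119, which is exactly the weakness listed above.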

Interquartile Range

Some outlier problems can be avoided by using the interquartile range (IQR)

It discards the high- and low-valued observations and calculates the range of the middle 50% of the data

Interquartile range = 3rd quartile – 1st quartile

IQR = Q3 – Q1

Ch. 2-4

Interquartile Range

The interquartile range (IQR) measures the spread in the middle 50% of the data.

It is defined as the difference between the observation at the third quartile and the observation at the first quartile:

IQR = Q3 - Q1

Ch. 2-5
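The following Python sketch computes the IQR for the two lists from the range example. It assumes Python's `statistics.quantiles` with its default `method="exclusive"`, which places Q1 and Q3 at positions 0.25(n + 1) and 0.75(n + 1) of the sorted data; other quartile conventions can give slightly different values. The helper name `iqr` is hypothetical.

```python
import statistics


# Hypothetical helper: IQR = Q3 - Q1, using statistics.quantiles with the
# default "exclusive" method (quartile positions 0.25(n+1) and 0.75(n+1)).
def iqr(values):
    q1, _median, q3 = statistics.quantiles(values, n=4)
    return q3 - q1


base = [1] * 11 + [2] * 8 + [3] * 4 + [4]
without_outlier = base + [5]
with_outlier = base + [120]

print(iqr(without_outlier))  # 1.5
print(iqr(with_outlier))     # 1.5  (the outlier does not change the IQR)
```

Unlike the range (4 versus 119), the IQR is identical for both lists because the outlier lies outside the middle 50% of the data.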

Box-and-Whisker Plot

A box-and-whisker plot is a graph that describes the shape of a distribution.

It is created from the five-number summary: the minimum value, Q1, the median, Q3, and the maximum.

The inner box shows the range from Q1 to Q3, with a line drawn at the median.

Two “whiskers” extend from the box. One whisker is the line from Q1 to the minimum; the other is the line from Q3 to the maximum value.

Ch. 2-6
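The five-number summary behind a box plot can be sketched in Python as below. The helper name is hypothetical, and the quartiles again assume `statistics.quantiles` with its default "exclusive" method; the data are the 25 values from the range example.

```python
import statistics


# Hypothetical helper returning (minimum, Q1, median, Q3, maximum).
# Quartiles come from statistics.quantiles with its default "exclusive"
# method; other quartile conventions can differ slightly.
def five_number_summary(values):
    q1, median, q3 = statistics.quantiles(values, n=4)
    return min(values), q1, median, q3, max(values)


data = [1] * 11 + [2] * 8 + [3] * 4 + [4, 5]
print(five_number_summary(data))  # (1, 1.0, 2.0, 2.5, 5)
```

The box would run from Q1 = 1.0 to Q3 = 2.5 with a line at the median 2.0, and the whiskers would reach the minimum 1 and the maximum 5.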

Box Plot

Ch. 2-7

Population Variance

Average of squared deviations of values from the mean (Karl Pearson 1893)

Population variance:

$$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$$

where μ = population mean
N = population size
x_i = ith value of the variable x

Ch. 2-8

Sample Variance

Average (approximately, because the sum is divided by n - 1 rather than n) of squared deviations of values from the mean

Sample variance:

$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$

Drawback: the variance is not in the same units as the original data values (it is in squared units).

Ch. 2-9
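Treating the eight values from the sample-standard-deviation calculation example (10, 12, 14, 15, 17, 18, 18, 24) either as a whole population or as a sample, this Python sketch shows that the two variance formulas differ only in the divisor: N versus n - 1.

```python
import statistics

# Data reused from the sample-standard-deviation calculation example.
data = [10, 12, 14, 15, 17, 18, 18, 24]

n = len(data)
mean = sum(data) / n                                     # 16.0
squared_deviations = sum((x - mean) ** 2 for x in data)  # 130.0

population_variance = squared_deviations / n   # divide by N
sample_variance = squared_deviations / (n - 1) # divide by n - 1

print(population_variance)        # 16.25
print(round(sample_variance, 3))  # 18.571

# Standard-library equivalents: statistics.pvariance (divisor N) and
# statistics.variance (divisor n - 1).
print(statistics.pvariance(data), statistics.variance(data))
```

Both results are in squared units of the data, which is the drawback noted above.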

Population Standard Deviation

Most commonly used measure of variation

Shows variation about the mean

Has the same units as the original data

Population standard deviation:

$$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}$$

Ch. 2-10

Sample Standard Deviation

Most commonly used measure of variation

Shows variation about the mean

Has the same units as the original data

Sample standard deviation:

$$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$$

Ch. 2-11

Calculation Example: Sample Standard Deviation

Sample data (x_i): 10 12 14 15 17 18 18 24

n = 8, mean x̄ = 16

$$s = \sqrt{\frac{(10 - \bar{x})^2 + (12 - \bar{x})^2 + (14 - \bar{x})^2 + \cdots + (24 - \bar{x})^2}{n - 1}}$$

$$= \sqrt{\frac{(10 - 16)^2 + (12 - 16)^2 + (14 - 16)^2 + \cdots + (24 - 16)^2}{8 - 1}} = \sqrt{\frac{130}{7}} = 4.3095$$

A measure of the “average” scatter around the mean

Ch. 2-12
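The hand calculation above can be checked with a short Python sketch that follows the sample formula step by step and compares it with `statistics.stdev`, which also uses the n - 1 divisor.

```python
import math
import statistics

data = [10, 12, 14, 15, 17, 18, 18, 24]  # sample data from the example

n = len(data)
mean = sum(data) / n                      # x-bar = 16.0
ss = sum((x - mean) ** 2 for x in data)   # sum of squared deviations = 130.0
s = math.sqrt(ss / (n - 1))               # sample standard deviation

print(round(s, 4))                        # 4.3095
print(round(statistics.stdev(data), 4))   # 4.3095 (library equivalent)
```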

Measuring variation

Figure: a distribution with a small standard deviation compared with a distribution with a large standard deviation.

Ch. 2-13

Comparing Standard Deviations

Figure: dot plots of three data sets on an 11 to 21 scale. Mean = 15.5 for each data set.

Data A: s = 3.338 (compare to the two cases below)

Data B: s = 0.926 (values are concentrated near the mean)

Data C: s = 4.570 (values are dispersed far from the mean)

Ch. 2-14

Advantages of Variance and Standard Deviation

Each value in the data set is used in the calculation

Values far from the mean are given extra weight (because deviations from the mean are squared)

Ch. 2-15

Coefficient of Variation

Measures relative variation

Always expressed as a percentage (%)

Shows variation relative to the mean

Can be used to compare two or more sets of data measured in different units

Population coefficient of variation:

$$CV = \frac{\sigma}{\mu} \times 100\%$$

Sample coefficient of variation:

$$CV = \frac{s}{\bar{x}} \times 100\%$$

Ch. 2-16

Comparing Coefficient of Variation

Stock A:
Average price last year = $50
Standard deviation = $5

$$CV_A = \frac{s}{\bar{x}} \times 100\% = \frac{\$5}{\$50} \times 100\% = 10\%$$

Stock B:
Average price last year = $100
Standard deviation = $5

$$CV_B = \frac{s}{\bar{x}} \times 100\% = \frac{\$5}{\$100} \times 100\% = 5\%$$

Both stocks have the same standard deviation, but stock B is less variable relative to its price.

Ch. 2-17
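A minimal Python sketch of the coefficient of variation, applied to the two stocks above. The helper name is hypothetical, and it assumes a nonzero mean, since the ratio is undefined otherwise.

```python
# Hypothetical helper: CV = (standard deviation / mean) x 100%.
def coefficient_of_variation(std_dev, mean):
    if mean == 0:
        raise ValueError("the coefficient of variation is undefined when the mean is 0")
    return std_dev / mean * 100


cv_a = coefficient_of_variation(5, 50)    # Stock A
cv_b = coefficient_of_variation(5, 100)   # Stock B
print(f"Stock A: {cv_a:.0f}%  Stock B: {cv_b:.0f}%")  # Stock A: 10%  Stock B: 5%
```

Because the CV divides out the scale of the mean, the same $5 standard deviation counts as half as much relative variation for the $100 stock.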

Chebychev’s Theorem

For any population with mean μ and standard deviation σ, and k > 1, the percentage of observations that fall within the interval

[μ ± kσ]

is at least

$$100\left[1 - \frac{1}{k^2}\right]\%$$

Ch. 2-18

Chebychev’s Theorem (continued)

Regardless of how the data are distributed, at least (1 - 1/k²) of the values will fall within k standard deviations of the mean (for k > 1).

Examples:

At least (1 - 1/1.5²) = 55.6% of the values lie within μ ± 1.5σ (k = 1.5)

At least (1 - 1/2²) = 75% of the values lie within μ ± 2σ (k = 2)

At least (1 - 1/3²) = 89% of the values lie within μ ± 3σ (k = 3)

Ch. 2-19
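The bound 1 - 1/k² is easy to tabulate, and a quick check on real numbers confirms it is only a floor. This Python sketch uses a hypothetical helper name and, as an assumption for the check, treats the eight values from the standard-deviation example as a population (divisor N).

```python
import statistics


def chebychev_bound(k):
    """Lower bound on the fraction of values within k standard deviations (k > 1)."""
    if k <= 1:
        raise ValueError("k must be greater than 1")
    return 1 - 1 / k ** 2


for k in (1.5, 2, 3):
    print(f"k = {k}: at least {chebychev_bound(k):.1%}")
# k = 1.5: at least 55.6%
# k = 2: at least 75.0%
# k = 3: at least 88.9%

# Sanity check on the eight sample values, treated as a population.
data = [10, 12, 14, 15, 17, 18, 18, 24]
mu, sigma = statistics.fmean(data), statistics.pstdev(data)
for k in (1.5, 2, 3):
    share = sum(abs(x - mu) <= k * sigma for x in data) / len(data)
    assert share >= chebychev_bound(k)  # the observed share never falls below the bound
```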

The Empirical Rule

If the data distribution is bell-shaped, then the interval

μ ± 1σ

contains about 68% of the values in the population or the sample.

Figure: bell-shaped curve with about 68% of the area between μ - 1σ and μ + 1σ.

Ch. 2-20

The Empirical Rule (continued)

μ ± 2σ contains about 95% of the values in the population or the sample

μ ± 3σ contains almost all (about 99.7%) of the values in the population or the sample

Figure: bell-shaped curve marking about 95% of the area between μ - 2σ and μ + 2σ and about 99.7% between μ - 3σ and μ + 3σ.

Ch. 2-21
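Assuming "bell-shaped" means a normal distribution, the empirical-rule percentages are rounded normal probabilities. This Python sketch computes them with the standard library's `statistics.NormalDist` (standard normal, mean 0 and standard deviation 1).

```python
from statistics import NormalDist

z = NormalDist()  # standard normal distribution
for k in (1, 2, 3):
    share = z.cdf(k) - z.cdf(-k)  # P(mu - k*sigma <= X <= mu + k*sigma)
    print(f"mu +/- {k} sigma: {share:.1%}")
# mu +/- 1 sigma: 68.3%
# mu +/- 2 sigma: 95.4%
# mu +/- 3 sigma: 99.7%
```

For a normal distribution these shares are much larger than Chebychev's guarantees (55.6%, 75%, 89%), which hold for any distribution.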

z-Score

A z-score shows the position of a value relative to the mean of the distribution. It indicates the number of standard deviations a value is from the mean.

A z-score greater than zero indicates that the value is greater than the mean.

A z-score less than zero indicates that the value is less than the mean.

A z-score of zero indicates that the value is equal to the mean.

Ch. 2-22

z-Score (continued)

If the data set is the entire population of data and the population mean, μ, and the population standard deviation, σ, are known, then for each value, x_i, the z-score associated with x_i is

$$z = \frac{x_i - \mu}{\sigma}$$

Ch. 2-23

z-Score (continued)

If intelligence is measured for a population using an IQ score, where the mean IQ score is 100 and the standard deviation is 15, what is the z-score for an IQ of 121?

$$z = \frac{x_i - \mu}{\sigma} = \frac{121 - 100}{15} = 1.4$$

A score of 121 is 1.4 standard deviations above the mean.

Ch. 2-24
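A minimal Python sketch of the z-score formula, reproducing the IQ example. The helper name `z_score` is hypothetical.

```python
# Hypothetical helper: z = (x - mu) / sigma, the number of standard deviations
# that x lies from the population mean.
def z_score(x, mu, sigma):
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return (x - mu) / sigma


print(z_score(121, 100, 15))  # 1.4  (above the mean)
print(z_score(100, 100, 15))  # 0.0  (equal to the mean)
```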

Weighted Mean and Measures of Grouped Data

The weighted mean of a set of data is

$$\bar{x} = \frac{\sum_{i=1}^{n} w_i x_i}{n} = \frac{w_1 x_1 + w_2 x_2 + \cdots + w_n x_n}{n}$$

where w_i is the weight of the ith observation and

$$n = \sum_i w_i$$

Use when data are already grouped into n classes, with w_i values in the ith class.

Ch. 2-25

2.3
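A minimal Python sketch of the weighted mean, dividing by the sum of the weights as in the definition above. The helper name and the example values and weights are made up for illustration.

```python
# Hypothetical helper: weighted mean = sum(w_i * x_i) / sum(w_i).
def weighted_mean(values, weights):
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)  # n = sum of the weights
    if total_weight == 0:
        raise ValueError("the weights must not sum to zero")
    return sum(w * x for w, x in zip(weights, values)) / total_weight


# Made-up example: value 2 appears once, 3 twice, 5 once.
print(weighted_mean([2, 3, 5], [1, 2, 1]))  # 3.25
```

With integer counts as weights, this equals the ordinary mean of the expanded data 2, 3, 3, 5.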

Approximations for Grouped Data

Suppose data are grouped into K classes, with frequencies f_1, f_2, . . ., f_K, and the midpoints of the classes are m_1, m_2, . . ., m_K

For a sample of n observations, the mean is

$$\bar{x} = \frac{\sum_{i=1}^{K} f_i m_i}{n} \quad \text{where} \quad n = \sum_{i=1}^{K} f_i$$

Ch. 2-26

Approximations for Grouped Data

Suppose data are grouped into K classes, with frequencies f_1, f_2, . . ., f_K, and the midpoints of the classes are m_1, m_2, . . ., m_K

For a sample of n observations, the variance is

$$s^2 = \frac{\sum_{i=1}^{K} f_i (m_i - \bar{x})^2}{n - 1}$$

Ch. 2-27
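The two grouped-data approximations can be sketched in Python as below. The frequency table is hypothetical, and the results only approximate what the raw data would give, because every observation in a class is treated as if it sat at the class midpoint.

```python
# Hypothetical helper: approximate sample mean and variance from grouped data.
def grouped_mean_variance(midpoints, frequencies):
    n = sum(frequencies)  # n = sum of f_i
    if n < 2:
        raise ValueError("need at least two observations")
    mean = sum(f * m for f, m in zip(frequencies, midpoints)) / n
    variance = sum(f * (m - mean) ** 2
                   for f, m in zip(frequencies, midpoints)) / (n - 1)
    return mean, variance


# Made-up table: classes 0-10, 10-20, 20-30 (midpoints 5, 15, 25)
# with frequencies 2, 5, 3.
mean, variance = grouped_mean_variance([5, 15, 25], [2, 5, 3])
print(mean, round(variance, 3))  # 16.0 54.444
```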

Measures of Relationships Between Variables

Two measures of the relationship between variables are:

Covariance: a measure of the direction of a linear relationship between two variables

Correlation coefficient: a measure of both the direction and the strength of a linear relationship between two variables

Ch. 2-28

2.4

Covariance

The covariance measures the strength of the linear relationship between two variables

The population covariance:

$$\operatorname{Cov}(x, y) = \sigma_{xy} = \frac{\sum_{i=1}^{N} (x_i - \mu_x)(y_i - \mu_y)}{N}$$

The sample covariance:

$$\operatorname{Cov}(x, y) = s_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n - 1}$$

Only concerned with the strength of the relationship

No causal effect is implied

Ch. 2-29

Interpreting Covariance

Covariance between two variables:

Cov(x,y) > 0: x and y tend to move in the same direction

Cov(x,y) < 0: x and y tend to move in opposite directions

Cov(x,y) = 0: x and y have no linear relationship (they are not necessarily independent)

Ch. 2-30

Coefficient of Correlation

Measures the relative strength of the linear relationship between two variables

Population correlation coefficient:

$$\rho = \frac{\operatorname{Cov}(x, y)}{\sigma_X \sigma_Y}$$

Sample correlation coefficient:

$$r = \frac{\operatorname{Cov}(x, y)}{s_X s_Y}$$

Ch. 2-31
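The sample covariance and the sample correlation coefficient can be sketched in Python as below. The helper names and the paired data are made up for illustration; on Python 3.10 or later, `statistics.covariance` and `statistics.correlation` compute the same two quantities.

```python
import math


# Hypothetical helper: sample covariance with divisor n - 1.
def sample_covariance(x, y):
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    x_bar, y_bar = sum(x) / n, sum(y) / n
    return sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)


# Hypothetical helper: r = Cov(x, y) / (s_x * s_y).
def sample_correlation(x, y):
    s_x = math.sqrt(sample_covariance(x, x))  # Cov(x, x) is the sample variance
    s_y = math.sqrt(sample_covariance(y, y))
    return sample_covariance(x, y) / (s_x * s_y)


x = [1, 2, 3, 4, 5]  # made-up paired data
y = [2, 4, 5, 4, 5]
print(sample_covariance(x, y))             # 1.5
print(round(sample_correlation(x, y), 4))  # 0.7746
```

The covariance (1.5) depends on the units of x and y, while r is unit free and always lies between -1 and 1.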

Features of Correlation Coefficient, r

Unit free

Ranges between -1 and 1

The closer to -1, the stronger the negative linear relationship

The closer to 1, the stronger the positive linear relationship

The closer to 0, the weaker any linear relationship

Ch. 2-32

Scatter Plots of Data with Various Correlation Coefficients

Figure: six scatter plots of Y against X, with r = -1, r = -.6, r = 0, r = +.3, r = +1, and r = 0.

Ch. 2-33

Interpreting the Result

r = .733

There is a relatively strong positive linear relationship between test score #1 and test score #2.

Students who scored high on the first test tended to score high on the second test.

Figure: scatter plot of test scores, Test #1 Score (horizontal axis, 70 to 100) against Test #2 Score (vertical axis, 70 to 100).

Ch. 2-34

Covariance

Let X and Y be discrete random variables with means μ_X and μ_Y.

The expected value of (X - μ_X)(Y - μ_Y) is called the covariance between X and Y. For discrete random variables,

$$\operatorname{Cov}(X, Y) = E[(X - \mu_X)(Y - \mu_Y)] = \sum_x \sum_y (x - \mu_X)(y - \mu_Y) P(x, y)$$

An equivalent expression is

$$\operatorname{Cov}(X, Y) = E(XY) - \mu_X \mu_Y = \sum_x \sum_y x y P(x, y) - \mu_X \mu_Y$$

Ch. 4-35

Correlation

The correlation between X and Y is:

$$\rho = \operatorname{Corr}(X, Y) = \frac{\operatorname{Cov}(X, Y)}{\sigma_X \sigma_Y}$$

-1 ≤ ρ ≤ 1

ρ = 0: no linear relationship between X and Y

ρ > 0: positive linear relationship between X and Y; when X is high (low), Y is likely to be high (low)

ρ = +1: perfect positive linear dependency

ρ < 0: negative linear relationship between X and Y; when X is high (low), Y is likely to be low (high)

ρ = -1: perfect negative linear dependency

Ch. 4-36

Covariance and Independence

The covariance measures the strength of the linear relationship between two variables

If two random variables are statistically independent, the covariance between them is 0

The converse is not necessarily true: a covariance of 0 does not imply that the variables are independent

Ch. 4-37
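To see why the converse fails, here is a small Python check using exact fractions. The joint distribution (X uniform on -1, 0, 1 and Y = X²) is a standard counterexample constructed for illustration; it does not come from the slides.

```python
from fractions import Fraction

# X is uniform on {-1, 0, 1} and Y = X**2, so Y is completely determined by X.
joint = {(-1, 1): Fraction(1, 3), (0, 0): Fraction(1, 3), (1, 1): Fraction(1, 3)}

mu_x = sum(x * p for (x, y), p in joint.items())  # 0
mu_y = sum(y * p for (x, y), p in joint.items())  # 2/3
cov = sum((x - mu_x) * (y - mu_y) * p for (x, y), p in joint.items())
print(cov)  # 0

# Independence would require P(x, y) = P(x) * P(y) for every pair; (0, 1) fails.
p_x0 = sum(p for (x, y), p in joint.items() if x == 0)  # 1/3
p_y1 = sum(p for (x, y), p in joint.items() if y == 1)  # 2/3
print(joint.get((0, 1), 0), p_x0 * p_y1)  # 0 2/9
```

The covariance is 0 because the relationship is purely nonlinear (a parabola), which covariance cannot detect.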

Example: Investment Returns

Return per $1,000 for two types of investments

P(x_i, y_i) | Economic condition | Passive Fund X | Aggressive Fund Y
.2 | Recession | -$25 | -$200
.5 | Stable Economy | +50 | +60
.3 | Expanding Economy | +100 | +350

E(x) = μ_x = (-25)(.2) + (50)(.5) + (100)(.3) = 50

E(y) = μ_y = (-200)(.2) + (60)(.5) + (350)(.3) = 95

Ch. 4-38

Computing the Standard Deviation for Investment Returns

P(x_i, y_i) | Economic condition | Passive Fund X | Aggressive Fund Y
0.2 | Recession | -$25 | -$200
0.5 | Stable Economy | +50 | +60
0.3 | Expanding Economy | +100 | +350

$$\sigma_X = \sqrt{(-25 - 50)^2 (0.2) + (50 - 50)^2 (0.5) + (100 - 50)^2 (0.3)} = 43.30$$

$$\sigma_Y = \sqrt{(-200 - 95)^2 (0.2) + (60 - 95)^2 (0.5) + (350 - 95)^2 (0.3)} = 193.71$$

Ch. 4-39

Covariance for Investment Returns

P(x_i, y_i) | Economic condition | Passive Fund X | Aggressive Fund Y
.2 | Recession | -$25 | -$200
.5 | Stable Economy | +50 | +60
.3 | Expanding Economy | +100 | +350

$$\operatorname{Cov}(X, Y) = (-25 - 50)(-200 - 95)(.2) + (50 - 50)(60 - 95)(.5) + (100 - 50)(350 - 95)(.3) = 8250$$

Ch. 4-40
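This Python sketch reproduces the investment-return calculations from the table: both means, both standard deviations, and the covariance computed both by definition and as E(XY) - μ_x μ_y. It also computes the correlation, which the slides do not report; it comes out at roughly 0.98.

```python
import math

# (probability, Fund X return, Fund Y return) per $1,000 invested,
# for recession, stable economy, and expanding economy.
outcomes = [(0.2, -25, -200), (0.5, 50, 60), (0.3, 100, 350)]

mu_x = sum(p * x for p, x, _ in outcomes)
mu_y = sum(p * y for p, _, y in outcomes)
sigma_x = math.sqrt(sum(p * (x - mu_x) ** 2 for p, x, _ in outcomes))
sigma_y = math.sqrt(sum(p * (y - mu_y) ** 2 for p, _, y in outcomes))

# Covariance by definition and by the equivalent form E(XY) - mu_x * mu_y.
cov_def = sum(p * (x - mu_x) * (y - mu_y) for p, x, y in outcomes)
cov_alt = sum(p * x * y for p, x, y in outcomes) - mu_x * mu_y
rho = cov_def / (sigma_x * sigma_y)

print(round(mu_x, 2), round(mu_y, 2))        # 50.0 95.0
print(round(sigma_x, 2), round(sigma_y, 2))  # 43.3 193.71
print(round(cov_def, 2), round(cov_alt, 2))  # 8250.0 8250.0
print(round(rho, 4))
```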

Portfolio Example

Investment X: μ_x = 50, σ_x = 43.30

Investment Y: μ_y = 95, σ_y = 193.71

σ_xy = 8250

Suppose 40% of the portfolio (P) is in Investment X and 60% is in Investment Y:

$$E(P) = .4(50) + (.6)(95) = 77$$

$$\sigma_P = \sqrt{(.4)^2 (43.30)^2 + (.6)^2 (193.71)^2 + 2(.4)(.6)(8250)} = 133.30$$

The portfolio return and portfolio variability are between the values for investments X and Y considered individually.

Ch. 4-41
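The portfolio formulas can be sketched in Python for any weight w placed in X (and 1 - w in Y). The helper name is hypothetical, and the inputs are the rounded values listed above (σ_x = 43.30, σ_y = 193.71, σ_xy = 8250).

```python
import math


# Hypothetical helper for a two-asset portfolio with weight w in X:
#   E(P)   = w*mu_x + (1 - w)*mu_y
#   Var(P) = w^2 sigma_x^2 + (1 - w)^2 sigma_y^2 + 2 w (1 - w) sigma_xy
def portfolio_stats(w, mu_x, mu_y, sigma_x, sigma_y, sigma_xy):
    mean = w * mu_x + (1 - w) * mu_y
    variance = (w ** 2 * sigma_x ** 2 + (1 - w) ** 2 * sigma_y ** 2
                + 2 * w * (1 - w) * sigma_xy)
    return mean, math.sqrt(variance)


mean, sd = portfolio_stats(0.4, 50, 95, 43.30, 193.71, 8250)
print(round(mean, 2), round(sd, 2))  # 77.0 133.3
```

Because the covariance is positive, the cross term 2w(1 - w)σ_xy adds to the portfolio variance rather than offsetting it.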

Interpreting the Results for Investment Returns

The aggressive fund has a higher expected return, but much more risk:

μ_y = 95 > μ_x = 50

but

σ_y = 193.71 > σ_x = 43.30

The covariance of 8250 indicates that the two investments are positively related and tend to vary in the same direction.

Ch. 4-42