Lecture 4 Variability: Standard Deviation

Post on 20-Dec-2015


Page 1: Lecture 4 Variability: Standard Deviation. Variability Reminder - How spread out the scores are…Range - How does the range of each of these distributions

Lecture 4

Variability: Standard Deviation

Page 2:

Variability

Reminder: variability describes how spread out the scores are.

Range: how does the range of each of these distributions vary? What about the interquartile range?

Measure of error: is our sample similar to the population, or is an individual score representative of its sample?

Page 3:

Standard Deviation

Standard deviation: the average distance on either side of the mean. The goal of the SD is to measure the standard, or typical, distance from the mean.

– Measuring every distance by hand is not practical with a large N, so we estimate the variance and standard deviation using equations.

[Figure: bar chart of height (in.), axis from 60 to 76, for Ben, Tom, Bill, James, and Matt]

• Mean = 70.8

• Ben is 66 in. tall. His deviation from the mean is -4.8.

• James is 75 in. tall. His deviation from the mean is +4.2.

Page 4:

Standard Deviation

How much scores typically vary around the mean; a measure of dispersion.

Usually 1/5 to 1/6 of the range.

Based on the mean, therefore:

– Requires at least interval data
– Sensitive to outliers
– Accounts for all scores in a distribution

[Figure: frequency (f) histogram of scores 1 through 9 with the mean (M) marked]

Page 5:

Logic of the Standard Deviation: Let's start by looking at the population

Step 1: Find the deviation of each score from the mean, $X - \mu$. Be sure to include both the sign (+/-) and the number.

| $X$ | $X - \mu$ |
|---|---|
| 65 | -14 |
| 90 | +11 |
| 84 | +5 |
| 76 | -3 |
| 81 | +2 |
| 98 | +19 |
| 82 | +3 |
| 56 | -23 |
| $\mu = 79$ | $\sum (X - \mu) = 0$ |

* Notice that the sum of the deviations = 0. This reflects the fact that the mean is a balancing point.

* Bonus: you can use this fact to check yourselves.
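The balancing-point check in Step 1 can be sketched in a few lines of Python. This is only an illustration using the eight scores from the Step 1 table; the variable names are ours, not part of the lecture.

```python
# Scores from the Step 1 table (treated as a population of N = 8).
scores = [65, 90, 84, 76, 81, 98, 82, 56]

mu = sum(scores) / len(scores)          # population mean
deviations = [x - mu for x in scores]   # signed distance of each score from the mean

for x, d in zip(scores, deviations):
    print(f"{x:>3}  {d:+.0f}")

# The positive and negative deviations cancel, so the sum is zero.
print("mean:", mu)
print("sum of deviations:", sum(deviations))
```

If the sum of the deviations is not zero (apart from rounding), the mean or one of the deviations was computed incorrectly.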

Page 6:

Step 2: Remember that the standard deviation is meant to be the average of the deviations, but a plain average won't work because the sum of our deviations = 0.

– Solution: get rid of the signs (+/-)
– Square each deviation

Square each deviation and sum them = Sum of Squared Deviations = SS

| $X$ | $X - \mu$ | $(X - \mu)^2$ |
|---|---|---|
| 65 | -14.4 | 207.4 |
| 90 | 10.6 | 112.4 |
| 84 | 4.6 | 21.2 |
| 76 | -3.4 | 11.6 |
| 81 | 1.6 | 2.6 |
| 98 | 18.6 | 346.0 |
| 82 | 2.6 | 6.8 |
| 59 | -20.4 | 416.2 |
| $\mu = 79.4$ | $\sum (X - \mu) = 0$ | $SS = 1123.9$ |

* Sum of Squared Deviations = SS

Page 7:

Step 3: Calculate the mean squared deviation = SS / N

This value is called the variance and is represented with the symbol MS or $\sigma^2$.

Variance will be important for inferential statistics, but it isn't the best descriptive statistic.

– It's hard to visualize variability with the variance alone.

For the eight scores from Step 2 ($\mu = 79.4$, $SS = 1123.9$):

$MS = \sigma^2 = SS / N = 1123.9 / 8 = 140.5$

Page 8:

Step 4: Correct for having squared all the deviations, because we want a value on the same scale as the mean that is easy to visualize:

– Standard deviation = $\sqrt{\text{variance}}$

$\sigma = \sqrt{140.5} = 11.9$

Standard deviation = the square root of the mean squared deviation

Conceptually, this is the typical distance from the mean: a score pulled at random from this distribution will be roughly 11.9 points away from the mean.
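Steps 2 through 4 can be traced with a short Python sketch. It assumes the eight scores used in the Step 2 table (65, 90, 84, 76, 81, 98, 82, 59) and treats them as a complete population; the variable names are illustrative.

```python
import math

# Scores from the Step 2 table, treated as a population.
scores = [65, 90, 84, 76, 81, 98, 82, 59]
n = len(scores)

mu = sum(scores) / n                       # population mean
ss = sum((x - mu) ** 2 for x in scores)    # Step 2: sum of squared deviations (SS)
variance = ss / n                          # Step 3: mean squared deviation
sd = math.sqrt(variance)                   # Step 4: undo the squaring

print(f"mean = {mu:.1f}")
print(f"SS = {ss:.1f}")
print(f"variance = {variance:.1f}")
print(f"SD = {sd:.1f}")
```

Working from the unrounded mean (79.375) gives SS = 1123.9, which matches the total on the slide even though the rounded entries in the table add up to a slightly different figure.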

Page 9:

Putting it Together

For the eight scores above: $\mu = 79.4$, $SS = 1123.9$, $\sigma = 11.9$

What can we say about a score of 91, which lies about 12 points from the mean?

What about a score of 49, which lies about 30 points from the mean?

Page 10:

Population Standard Deviation

REVIEW:

Variance = mean squared deviation = $\sigma^2$ (Greek lowercase sigma, squared) = $SS / N$

Standard deviation = $\sigma = \sqrt{SS / N}$

Computing SS:

– Definitional formula: $SS = \sum (X - \mu)^2$
  Shows exactly how scores vary about the mean (like we just did). Works best on whole numbers.
– Computational formula: $SS = \sum X^2 - [(\sum X)^2 / N]$
  Easier for calculations because it works directly with the scores, but less intuitive about the mean.

Page 11:

Formulas for Pop. SD and Variance

Variance = $SS / N$ (mean squared deviation)

Standard deviation = $\sqrt{SS / N}$

Denoted by the Greek letters $\sigma$ (standard deviation) and $\sigma^2$ (variance)

Page 12:

Let's Do It Together

| $X$ | $X - \mu$ | $(X - \mu)^2$ | $X^2$ |
|---|---|---|---|
| 24 | -17.4 | 302.8 | 576 |
| 24 | -17.4 | 302.8 | 576 |
| 28 | -13.4 | 179.6 | 784 |
| 32 | -9.4 | 88.4 | 1024 |
| 33 | -8.4 | 70.6 | 1089 |
| 48 | 6.6 | 43.6 | 2304 |
| 64 | 22.6 | 510.8 | 4096 |
| 42 | 0.6 | 0.36 | 1764 |
| 38 | -3.4 | 11.6 | 1444 |
| 67 | 25.6 | 655.4 | 4489 |
| 55 | 13.6 | 185 | 3025 |
| $\sum X = 455$ | $\sum (X - \mu) = 0$ | $SS = 2351$ | $\sum X^2 = 21171$ |

$(\sum X)^2 = 207025$, $\sigma^2 = 213.7$, $\sigma = 14.6$

Definitional: $SS = \sum (X - \mu)^2$

Computational: $SS = \sum X^2 - [(\sum X)^2 / N]$
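As a cross-check on the two SS formulas, here is a small Python sketch that applies both to the eleven scores from this slide. The function names are ours, chosen for illustration; both functions should return the same SS up to floating-point rounding.

```python
import math

def ss_definitional(scores):
    """SS as the sum of squared deviations from the mean."""
    mu = sum(scores) / len(scores)
    return sum((x - mu) ** 2 for x in scores)

def ss_computational(scores):
    """SS from the raw scores: sum of X^2 minus (sum of X)^2 / N."""
    n = len(scores)
    return sum(x * x for x in scores) - sum(scores) ** 2 / n

scores = [24, 24, 28, 32, 33, 48, 64, 42, 38, 67, 55]
n = len(scores)

ss_def = ss_definitional(scores)
ss_comp = ss_computational(scores)
variance = ss_comp / n          # population variance
sd = math.sqrt(variance)        # population standard deviation

print(f"definitional SS  = {ss_def:.2f}")
print(f"computational SS = {ss_comp:.2f}")
print(f"variance = {variance:.1f}, SD = {sd:.1f}")
```

The computational form avoids rounding the mean before squaring, which is why hand calculations with it tend to accumulate less rounding error.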

Page 13:

Another Example…

Find $\sigma$ for the following sets of numbers:

X = 1, 7, 7, 9

X = 1, 6, 1, 1, 1, 1

Then fill in $X^2$, $(\sum X)^2$, $\sigma^2$, and $\sigma$ for these scores:

X = 10, 15, 17, 21, 24, 31, 13

Definitional: $SS = \sum (X - \mu)^2$

Computational: $SS = \sum X^2 - [(\sum X)^2 / N]$

Page 14:

Samples vs. Populations

Rationale: Inferential statistics rely on samples to draw general conclusions about the population.

– PROBLEM: sample variability tends to be less than population variability.
– Thus, sample variability is biased. That is, it underestimates the population variability.

[Figure: sample scores (x) shown against the population distribution, with pop. variability and sample variability marked]

Page 15:

Terms

Biased: a sample statistic is said to be biased if, on average, it consistently underestimates or overestimates the population parameter.

Unbiased: a sample statistic is said to be unbiased if, on average, it is equal to the population parameter.

Page 16:

An Analogy for a Biased Stat

Imagine you were interested in studying learning in elementary school children.

– What if you chose as your sample child geniuses from computer and science camp?
– Could you generalize from your sample to the population of elementary school children?

A sample statistic for SD will be biased even with a representative sample, so we have to perform a correction.

Page 17:

Samples: $s$ and $s^2$

Changes in notation to reflect a sample:

– To calculate SS (same as for pop.):
  • (1) Find each deviation: $X - M$
  • (2) Square each deviation: $(X - M)^2$
  • (3) Sum the squared deviations: $SS = \sum (X - M)^2$

Correcting for the bias is done in the calculation of the mean squared deviation, or variance:

– Sample variance: $s^2 = SS / (n - 1)$
– Sample standard deviation: $s = \sqrt{SS / (n - 1)}$, or $s = \sqrt{s^2}$

Page 18:

Let's Do it Together

[Figure: frequency (f) histogram of X for scores 1 through 9]

| $X$ | $X^2$ |
|---|---|
| 4 | 16 |
| 5 | 25 |
| 6 | 36 |
| 6 | 36 |
| 6 | 36 |
| 7 | 49 |
| 7 | 49 |
| 7 | 49 |
| 8 | 64 |
| 8 | 64 |
| 8 | 64 |
| 8 | 64 |
| 9 | 81 |
| 9 | 81 |
| $\sum X = 98$ | $\sum X^2 = 714$ |

Ignoring the scores that fall exactly on the mean (M = 7), the smallest distance from the mean is 1 and the largest distance is 3, so the SD should be somewhere in between.

$SS = \sum X^2 - [(\sum X)^2 / n] = 714 - (98^2 / 14) = 28$

* NOTE: do not correct for bias in SS

$s^2$ or MS $= SS / (n - 1) = 28 / 13 = 2.2$

$s = \sqrt{2.2} = 1.5$
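The sample calculation can be reproduced with Python's standard `statistics` module, whose `variance` and `stdev` functions divide by n - 1. This is a sketch using the fourteen scores from this slide; the hand-rolled SS line mirrors the computational formula, and the variable names are ours.

```python
import math
import statistics

# The fourteen sample scores from the slide.
scores = [4, 5, 6, 6, 6, 7, 7, 7, 8, 8, 8, 8, 9, 9]
n = len(scores)

# Computational formula; SS itself is NOT corrected for bias.
ss = sum(x * x for x in scores) - sum(scores) ** 2 / n

s2 = ss / (n - 1)     # sample variance: the correction happens here
s = math.sqrt(s2)     # sample standard deviation

print(f"SS = {ss:.0f}, s^2 = {s2:.1f}, s = {s:.1f}")

# statistics.variance / statistics.stdev use the same n - 1 divisor.
print(statistics.variance(scores), statistics.stdev(scores))
```

For population values (dividing by N), the same module provides `pvariance` and `pstdev`.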

Page 19:

Start Easy: Find $s$

X = 5, 1, 5, 5

X = 1, 7, 1, 1

$SS = \sum X^2 - [(\sum X)^2 / n]$

• NOTE: do not correct for bias in SS

$s^2$ or MS $= SS / (n - 1)$

$s = \sqrt{s^2}$

Page 20:

A little more complex

| $X$ | $X^2$ |
|---|---|
| 322.84 | 104223.10 |
| 336.63 | 113319.68 |
| 368.80 | 136011.73 |
| 276.84 | 76638.36 |
| 512.20 | 262348.68 |
| 285.05 | 81251.07 |
| 239.68 | 57446.63 |
| 262.86 | 69094.12 |
| 302.13 | 91283.72 |
| 300.12 | 90071.31 |
| 326.62 | 106683.06 |
| 257.65 | 66383.09 |
| 429.81 | 184733.66 |
| 291.71 | 85093.66 |
| 263.15 | 69247.72 |
| 323.49 | 104644.41 |
| $\sum X = 5099.6$ | $\sum X^2 = 1698474.01$ |

$SS = \sum X^2 - [(\sum X)^2 / n] = 1698474.01 - (26005920.2 / 16) = 73104$

MS or $s^2 = SS / (n - 1) = 73104 / 15 = 4873.6$

$s = \sqrt{SS / (n - 1)} = 69.8$

Page 21:

Sample Variability and Degrees of Freedom: Why do we correct with n - 1?

(1) The deviations computed from a sample are not "real" deviations.

– Sampling error: the sample and the population are close, but not exact.
– SS is smaller for the sample (there is a mathematical proof).
– Using a sample mean places a restriction on the variability.

Deviations from the population mean, where $\mu = 8$:

| $X$ | $X - \mu$ | $(X - \mu)^2$ |
|---|---|---|
| 12 | +4 | 16 |
| 8 | 0 | 0 |
| 10 | +2 | 4 |

SS = 20

Deviations from the sample mean, where M = 10:

| $X$ | $X - M$ | $(X - M)^2$ |
|---|---|---|
| 12 | +2 | 4 |
| 8 | -2 | 4 |
| 10 | 0 | 0 |

SS = 8
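To see why SS computed around the sample mean comes out small, the sketch below recomputes SS for the three scores (12, 8, 10) around several candidate centers, including the sample mean and the population mean of 8 given on the slide. The helper name `sum_of_squares` is ours.

```python
def sum_of_squares(scores, center):
    """Sum of squared deviations of the scores from an arbitrary center."""
    return sum((x - center) ** 2 for x in scores)

sample = [12, 8, 10]
m = sum(sample) / len(sample)   # sample mean
mu = 8                          # population mean, taken from the slide

ss_about_m = sum_of_squares(sample, m)
ss_about_mu = sum_of_squares(sample, mu)

print("SS about the sample mean:    ", ss_about_m)
print("SS about the population mean:", ss_about_mu)

# No other center gives a smaller SS than the sample's own mean.
for center in [7, 8, 9, 10, 11, 12]:
    print(center, sum_of_squares(sample, center))
```

Because the sample mean is the center that minimizes SS, deviations measured from M systematically understate deviations measured from the true population mean, which is what the n - 1 correction compensates for.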

Page 22:

More about n - 1

The sample mean must be known before deviations and SS can be computed.

Take a sample of n = 3 with M = 10. As soon as the first two values are given (X = 12, 8), you know the last value must be 10.

n - 1 scores can vary; the last score is not free to vary.


Page 23:

Degrees of Freedom

df is commonly encountered as n - 1, where n is the number of scores in the sample.

It refers to the number of scores in a distribution that are free to vary once M and n are set.

Example: {5, 10, 15}; n = 3; M = 10

How many scores could you change and still have n = 3 and M = 10?

Answer: n - 1, or 2

So, $s^2 = SS / (n - 1) = SS / df$
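The idea that only n - 1 scores are free to vary can be shown with a tiny Python sketch: once n and M are fixed, the last score is forced by the others. The function name `last_score` is hypothetical, introduced only for this illustration.

```python
def last_score(free_scores, n, mean):
    """Return the one score that is forced once n - 1 scores and the mean are fixed."""
    if len(free_scores) != n - 1:
        raise ValueError("exactly n - 1 scores must be supplied")
    # The scores must sum to n * mean, so the last one is whatever is left over.
    return n * mean - sum(free_scores)

# Example from the slide: {5, 10, 15} with n = 3 and M = 10.
print(last_score([5, 10], n=3, mean=10))

# Change the two free scores and the third one has no choice.
print(last_score([12, 8], n=3, mean=10))
print(last_score([1, 2], n=3, mean=10))
```

Any two scores can be chosen freely, but the third is always determined, so df = n - 1 = 2.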

Page 24:

Cafeteria degrees of freedom: An analogy

You are 4th in line at the cafeteria to choose your dessert. The choices are a cheesecake, a piece of fruit, pumpkin pie, and a stale cookie.

– The first person chooses the cheesecake
– Next to go is the apple
– Then the pumpkin pie
– The last choice is restricted and can't vary.

You are stuck with the stale cookie.

Page 25:

Degrees of Freedom

Why n - 1?

– Because you are estimating $\mu$ from M. Once this is done, the estimate is fixed and cannot be changed. Therefore, you can only vary n - 1 scores with this fixed value.

This is the case whenever we are estimating a parameter from a statistic.

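Page 26:

A little more about biased stats

Population: N = 6 (0, 0, 3, 3, 9, 9), $\mu = 4$, $\sigma^2 = 14$

Take all possible n = 2 samples:

| Sample | First score | Second score | Mean | Biased variance (divide by n) | Unbiased variance (divide by n - 1) |
|---|---|---|---|---|---|
| 1 | 0 | 0 | 0 | 0 | 0 |
| 2 | 0 | 3 | 1.5 | 2.25 | 4.5 |
| 3 | 0 | 9 | 4.5 | 20.25 | 40.5 |
| 4 | 3 | 0 | 1.5 | 2.25 | 4.5 |
| 5 | 3 | 3 | 3 | 0 | 0 |
| 6 | 3 | 9 | 6 | 9 | 18 |
| 7 | 9 | 0 | 4.5 | 20.25 | 40.5 |
| 8 | 9 | 3 | 6 | 9 | 18 |
| 9 | 9 | 9 | 9 | 0 | 0 |
| Total | | | 36 | 63 | 126 |

Dividing each total by the 9 samples: the average sample mean is 36 / 9 = 4, which equals $\mu$; the average biased variance is 63 / 9 = 7, which underestimates $\sigma^2$; the average unbiased variance is 126 / 9 = 14, which equals $\sigma^2$.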
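The table of all possible n = 2 samples can be regenerated with a short Python sketch. It assumes, as the table does, that samples are ordered pairs drawn with replacement from the three distinct values 0, 3, and 9; the helper name `sample_stats` is ours.

```python
from itertools import product

values = [0, 3, 9]                          # distinct scores in the population
samples = list(product(values, repeat=2))   # all 9 ordered samples of n = 2

def sample_stats(sample):
    """Return (mean, biased variance SS/n, unbiased variance SS/(n - 1))."""
    n = len(sample)
    m = sum(sample) / n
    ss = sum((x - m) ** 2 for x in sample)
    return m, ss / n, ss / (n - 1)

stats = [sample_stats(s) for s in samples]

total_mean = sum(m for m, _, _ in stats)
total_biased = sum(b for _, b, _ in stats)
total_unbiased = sum(u for _, _, u in stats)

print("totals:", total_mean, total_biased, total_unbiased)
print("averages:",
      total_mean / len(stats),
      total_biased / len(stats),
      total_unbiased / len(stats))
```

Averaged over every possible sample, dividing SS by n gives half the population variance here, while dividing by n - 1 recovers it exactly.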

Page 27:

Properties of the Standard Deviation

Distribution:

– Homogeneous sample: data values are very similar = small $s^2$ and $s$.
– Heterogeneous sample: data values are dissimilar = big $s^2$ and $s$.

Helps make predictions about the amount of error in your sample: how close is your sample to the population?

Page 28:

Properties of the Standard Deviation

Transforming scores:

Adding or subtracting a constant does not change the SD.

[Figure: frequency (f) histograms of the original scores and of the same scores after a constant is added]

Another way to determine whether the SD is affected by a constant is to pick any two scores and calculate the distance between them both before and after the constant is applied.

E.g., you and a friend compare scores on an exam: your friend earned an 85 and you earned a 90. Later you find out that a 5-point curve was added to everyone's score. The scores are now 90 and 95, and the distance between them is still 5 points.

Page 29:

Properties of the Standard Deviation

Transforming scores:

Multiplying or dividing every score by a constant multiplies or divides the SD by that same constant.

Another way to determine whether the SD is affected by a constant is to pick any two scores and calculate the distance between them both before and after the constant is applied.

[Figure: frequency (f) histograms of the scores 1, 2, 3, 4 and of the same scores multiplied by 10 (10, 20, 30, 40); a distance of 1 becomes a distance of 10]
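Both transformation rules can be checked numerically. The sketch below uses the scores 1 through 4 from the figure, with a shift of 5 and a multiplier of 10 chosen for illustration, and the standard library's `statistics.pstdev` (population SD).

```python
import math
import statistics

scores = [1, 2, 3, 4]
shifted = [x + 5 for x in scores]    # add a constant to every score
scaled = [x * 10 for x in scores]    # multiply every score by a constant

sd = statistics.pstdev(scores)
sd_shifted = statistics.pstdev(shifted)
sd_scaled = statistics.pstdev(scaled)

print(f"original SD: {sd:.3f}")
print(f"after +5:    {sd_shifted:.3f}")   # unchanged
print(f"after x10:   {sd_scaled:.3f}")    # ten times larger

# The same pattern shows up in the distance between any two scores.
print(scores[1] - scores[0], shifted[1] - shifted[0], scaled[1] - scaled[0])
```

The same relationships hold for the sample SD (`statistics.stdev`), since the n - 1 divisor does not depend on the scores themselves.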

Page 30:

Factors that affect Variability

Extreme Scores:

– Range is most affected
– SD and variance somewhat affected
– SIR (semi-interquartile range) not affected

Sample Size:

– Range is directly related to sample size. This is unacceptable.
– SD, variance, and SIR unaffected by sample size

Open-ended Distributions:

– Cannot compute range, SD, or variance
– SIR is your only option

Page 31:

Relationship with other Statistics

SD is derived using information about the mean (distances); the two go hand-in-hand.

Interquartile range (and SIR) are based on percentiles, and so is the median (the median is the 50th percentile).

Range has no direct relationship with any other statistical measure.

Page 32:

Why we need to know this information

Variability influences how easy it is to see patterns in our data.

Estimate M for each sample:

| Sample 1 (X) | Sample 2 (X) |
|---|---|
| 34 | 26 |
| 35 | 10 |
| 36 | 64 |
| 35 | 40 |

Page 33:

Why we need to know this information

Keep the goal in mind:

– Research uses samples to infer information about the population
– Consider the data from two experiments and determine whether or not there appears to be a consistent difference

[Figure: frequency (f) distributions for Experiment 1 and Experiment 2 on an axis from 5 to 60; Talk therapy M = 20, Meditation M = 40]

Page 34:

Graphical Representation of the Standard Deviation

[Figure: frequency (f) histogram of scores 1 through 9, with a standard deviation of 1.58 marked]

Page 35:

Graphic Representation - Box Plots

Also called box-and-whisker plots.

Useful for:

– comparing distributions
– displaying variability

Box defines the interquartile range:

– Top line defines the third quartile
– Bottom line defines the first quartile

Whiskers extend out to the highest and lowest scores.

Median is often displayed by a line.

Page 36:

Graphic Representation - Boxplots

Page 37:

Pearson's Coefficient of Skew

Pearson's coefficient of skew tells us whether a distribution is positively or negatively skewed, and by how much. Values between -0.5 and +0.5 indicate an approximately symmetric (normal) distribution.

$s_3 = [3(M - \text{mdn})] / s$

Example: M = 20, s = 5, mdn = 24

$s_3 = [3(20 - 24)] / 5 = -2.4$

Negatively skewed
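The coefficient is simple enough to wrap in a small Python function. This is a sketch of the formula as written on the slide; the function name `pearson_skew` and the wording of the labels are ours, and the 0.5 cutoff is the rule of thumb stated above.

```python
def pearson_skew(mean, median, sd):
    """Pearson's coefficient of skew: 3 * (mean - median) / sd."""
    return 3 * (mean - median) / sd

def describe(skew, cutoff=0.5):
    """Label a skew value using the +/- 0.5 rule of thumb from the slide."""
    if abs(skew) <= cutoff:
        return "approximately symmetric"
    return "positively skewed" if skew > 0 else "negatively skewed"

# Worked example from the slide: M = 20, s = 5, mdn = 24.
skew = pearson_skew(mean=20, median=24, sd=5)
print(skew, describe(skew))
```

A mean below the median pulls the coefficient negative, reflecting a tail that stretches toward the low scores.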

Page 38:

Try one

M = 50, Mdn = 30, s = 7

$s_3 = [3(M - \text{mdn})] / s$

Page 39:

Putting it all together…

| X | f |
|---|---|
| 1 | 1 |
| 2 | 1 |
| 3 | 1 |
| 4 | 1 |
| 5 | 1 |
| 6 | 2 |
| 7 | 4 |
| 8 | 5 |
| 9 | 6 |
| 10 | 9 |
| 11 | 11 |
| 12 | 6 |
| 13 | 2 |

Find Pearson's coefficient of skew

$s_3 = [3(M - \text{mdn})] / s$

For this table s = 2.74
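One way to work from a frequency table is to expand it back into individual scores and reuse the usual formulas. The sketch below does that with Python's `statistics` module. Note one assumption: `statistics.median` returns the plain middle value, so a textbook method that interpolates the median within an interval may give a slightly different coefficient.

```python
import statistics

# Frequency table from the slide: score -> frequency.
freq = {1: 1, 2: 1, 3: 1, 4: 1, 5: 1, 6: 2, 7: 4,
        8: 5, 9: 6, 10: 9, 11: 11, 12: 6, 13: 2}

# Expand the table into one entry per observed score.
scores = [x for x, f in freq.items() for _ in range(f)]

n = len(scores)
m = statistics.mean(scores)
mdn = statistics.median(scores)   # simple (non-interpolated) median
s = statistics.stdev(scores)      # sample SD, divides by n - 1

skew = 3 * (m - mdn) / s

print(f"n = {n}, M = {m:.2f}, mdn = {mdn}, s = {s:.2f}")
print(f"Pearson's coefficient of skew = {skew:.2f}")
```

The sample SD computed this way agrees with the s = 2.74 given on the slide, which is a useful check that the table was expanded correctly.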

Page 40:

Homework: Chapter 4

1, 3, 4, 6, 8, 11, 12, 14, 19, 20, 23, 24, 25

Read IN THE LITERATURE pg 122-123.

Skim Chapter 6 pages 161 - 166; section on Probability.

** BRING YOUR TEXT BOOKS TO CLASS TOMORROW**