Biderman's Psychology 201 Handouts


PSY 201 Lecture Notes Measures of Variability and Shape

Variability

Variability refers to differences between score values

The larger the differences, the greater the variability

Example of low variability: Costs of year-old Toyota Camrys in $1000s in a small town

25 27 25 26 26 26 27

Example of large variability: Costs of year-old Toyota Camrys in $1000s in a large city

26 24 28 22 30 32 20

Dot plots

[Two dot plots, one for the small town prices and one for the big city prices, each drawn on an axis running from 20 to 40.]
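The two sets of prices can be drawn as rough text dot plots. The following is only a sketch: the function name `dot_plot` and the layout (three characters per $1000 step, axis labels every 5) are assumptions made for illustration, not part of the handout.

```python
from collections import Counter

small_town = [25, 27, 25, 26, 26, 26, 27]
big_city = [26, 24, 28, 22, 30, 32, 20]

def dot_plot(scores, low=20, high=40):
    """Return a text dot plot: one dot per score, stacked above its value."""
    counts = Counter(scores)
    rows = []
    # Build the plot from the tallest stack down to the axis.
    for level in range(max(counts.values()), 0, -1):
        cells = [" . " if counts[value] >= level else "   "
                 for value in range(low, high + 1)]
        rows.append("".join(cells).rstrip())
    # Axis row: label every fifth value, mark the others with a dash.
    axis = [f"{value:^3}" if value % 5 == 0 else " - "
            for value in range(low, high + 1)]
    rows.append("".join(axis).rstrip())
    return "\n".join(rows)

print("Small town prices")
print(dot_plot(small_town))
print()
print("Big city prices")
print(dot_plot(big_city))
```

The small town dots pile up over 25 to 27, while the big city dots are spread one apiece from 20 to 32.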

After examination of the numbers, we can see that there are bigger differences among the scores in the 2nd set than among the scores in the 1st. How should those differences be summarized?

The possible measures:

1. Range: Difference between largest score and smallest score.
2. Interquartile Range: Difference between the score at the third quartile and the score at the 1st quartile.
3. Variance
4. Standard Deviation

Biderman’s 201 Handouts Topic 4 (Numeric Measures II) -14 5/22/2023


The Range

Range = Largest value in the collection minus smallest value.

Problems with the Range

1. Quite variable from sample to sample, even if all samples are from the same population. How ironic that a measure of variability would be too variable.

2. May be restricted by ceiling or floor of the scale.

Much psychological measurement comes from scales to which persons respond on a 1-5 scale, often labeled 1=Strongly disagree, 2=Disagree, 3=Neutral, 4=Agree, 5=Strongly Agree.

“I support Richard M. Nixon.” in 1950s      “I support Richard M. Nixon.” in 1970s

Value   Freq                                Value   Freq
5       1                                   5       100
4       3                                   4       50
3       100                                 3       20
2       3                                   2       50
1       1                                   1       100

Range: 5-1=4                                Range: 5-1=4

So the range is not generally useful, although it is often reported.
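A short sketch makes the point concrete. It expands the two frequency tables above into lists of scores and compares the range with the sample standard deviation (one of the measures listed earlier and covered later in these notes). The helper names `expand` and `score_range` are made up for this illustration.

```python
from statistics import stdev

# Frequency tables from the two Nixon items: {response value: frequency}.
nixon_1950s = {5: 1, 4: 3, 3: 100, 2: 3, 1: 1}
nixon_1970s = {5: 100, 4: 50, 3: 20, 2: 50, 1: 100}

def expand(freq_table):
    """Turn a {value: frequency} table into a flat list of scores."""
    return [value for value, freq in freq_table.items() for _ in range(freq)]

def score_range(scores):
    """Largest value minus smallest value."""
    return max(scores) - min(scores)

scores_1950s = expand(nixon_1950s)
scores_1970s = expand(nixon_1970s)

range_1950s = score_range(scores_1950s)
range_1970s = score_range(scores_1970s)
sd_1950s = stdev(scores_1950s)   # divides by N-1
sd_1970s = stdev(scores_1970s)

print("1950s: range =", range_1950s, " SD =", round(sd_1950s, 2))
print("1970s: range =", range_1970s, " SD =", round(sd_1970s, 2))
```

Both ranges come out as 4 because the scale's floor and ceiling are reached in each decade, while the standard deviation is much larger for the 1970s responses.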


The Interquartile Range

Quartiles: Points identifying "quarters" of a distribution.

Conceptual Definitions

Q4 Fourth Quartile The value below which 4/4ths of the distribution falls.

Q3 Third Quartile The value below which 3/4ths of the distribution falls.

Q2 Second Quartile The value below which 2/4ths of the distribution falls.

Q1 First Quartile The value below which 1/4th of the distribution falls.

Q0 "Zeroth" Quartile The value below which 0/4ths of the distribution falls.

Operational Definitions

Q4 The largest score value in the distribution.

Q3 The median of the scores in the upper half of the distribution. (If N is odd, include the overall median in the upper half.)

Q2 The overall median of the collection. Compute using the median formula.

Q1 The median of the scores in the lower half of the distribution. (If N is odd, include the overall median in the lower half.)

Q0 The smallest score value in the distribution.

Interquartile Range: The distance (on the number line) between Q1 and Q3, that is, between the first quartile and the third quartile.

IQR = Q3 - Q1

Interpretation

The distance or interval size required to contain the middle 50% of the scores.

If the middle 50% is contained in a small area, the distribution is quite "crowded" - the scores are close to each other; the distribution has little variability.

If the middle 50% is contained in a wide area, the distribution is sparse - the scores are far from each other; the distribution has much variability.


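Example - A distribution with an even number of scores. (But you don’t have to compute by hand.)

75 65 50 45 40 40 35 35 30 30 30 25 25 10

Upper half of distribution: 75 65 50 45 40 40 35, whose median is Q3 = 45.

Lower half of distribution: 35 30 30 30 25 25 10, whose median is Q1 = 30.

So the interquartile range (IQR) for this distribution is 45 - 30 = 15.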

Example - A distribution with an odd number of scores.

65 50 45 40 35 35 30 25 25 20 15

Upper half of distribution: 65 50 45 40 35 35, whose median is Q3 = (45 + 40) / 2 = 42.5.

Lower half of distribution: 35 30 25 25 20 15, whose median is Q1 = (25 + 25) / 2 = 25.

Note that 35, the overall median, is included in both the lower and upper halves.

So, the interquartile range (IQR) for this distribution is 42.5 - 25 = 17.5
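The operational definitions above can be sketched in a few lines of Python. This is an illustration only: the function names `quartiles` and `iqr` are invented here, and the halving rule is the one stated in these notes (include the overall median in both halves when N is odd). Other programs use different quartile rules and may report slightly different values.

```python
from statistics import median

def quartiles(scores):
    """Return (Q1, Q2, Q3) using the halving rule from these notes."""
    ordered = sorted(scores)
    n = len(ordered)
    half = n // 2
    if n % 2 == 0:
        lower, upper = ordered[:half], ordered[half:]
    else:
        # N is odd: the overall median belongs to both halves.
        lower, upper = ordered[:half + 1], ordered[half:]
    return median(lower), median(ordered), median(upper)

def iqr(scores):
    """Interquartile range: Q3 - Q1."""
    q1, _, q3 = quartiles(scores)
    return q3 - q1

even_example = [75, 65, 50, 45, 40, 40, 35, 35, 30, 30, 30, 25, 25, 10]
odd_example = [65, 50, 45, 40, 35, 35, 30, 25, 25, 20, 15]

print("Even N:", quartiles(even_example), "IQR =", iqr(even_example))
print("Odd N: ", quartiles(odd_example), "IQR =", iqr(odd_example))
```

Run on the two examples, the sketch reproduces the hand results: an IQR of 15 for the first distribution and 17.5 for the second.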


Comparing Variability of ACTComp scores of Females and Males

Interestingly, Q1, Q2, and Q3 are identical for Males and Females in this sample. The interquartile range is 25 - 19 = 6 for both distributions. Go figure.

Males and females are about equally variable on most characteristics.

Showing the interquartile range in a boxplot.

The IQR is simply the distance between the top and bottom of the box.

[Boxplots of ACTComp scores for Females and Males, with Q1, Q2, and Q3 marked on the box.]

The Variance

The variance is the “average” of the squared differences of the scores from the mean.

Group        Symbol                           Formula

Population   $\sigma^2$ (sigma squared)       $\sigma^2 = \dfrac{\sum (X - \mu)^2}{N}$   So it’s a mean square.

Sample       $S^2$ or $s^2$ (ess squared)     $S^2 = \dfrac{\sum (X - \bar{X})^2}{N - 1}$

Note that the formula for the sample variance is different from the formula for the population variance.

The sample variance requires dividing the sum of squared differences by N-1, not N.

For this reason, the variance is almost, but not exactly, the average of the squared differences when computed from a sample.

It is exactly the average when computed from a population.

Hence the quotes around average in the definition above.


Computing the Variance using paper and pencil

The first set of Camry prices above

X     Mean   X-Mean   (X-Mean)^2

25    26     -1       1
27    26      1       1
25    26     -1       1
26    26      0       0
26    26      0       0
26    26      0       0
27    26      1       1

Sum of squared differences, $\sum (X - \bar{X})^2$ (called SS by Corty) = 4

Population Variance = $\sum (X - \mu)^2 / N$ = 4/7 = .57 We will never compute a population variance.

Sample Variance = $\sum (X - \bar{X})^2 / (N - 1)$ = 4/6 = .67 (Note - this is what SPSS computes.)

Graphical representation of the (X-Mean) differences for the first set of prices.

[Dot plot of the seven small town prices on a number line running from 20 to 32.]

Now the second set of Camry Prices

X     Mean   X-Mean   (X-Mean)^2

26    26      0       0
24    26     -2       4
28    26      2       4
22    26     -4       16
30    26      4       16
32    26      6       36
20    26     -6       36

Sum of squared differences = 112

Population Variance = $\sum (X - \mu)^2 / N$ = 112/7 = 16.

Sample Variance = $\sum (X - \bar{X})^2 / (N - 1)$ = 112/6 = 18.67

Graphical representation of the (X-Mean) differences for the second set of prices

[Dot plot of the seven big city prices on a number line running from 20 to 32.]
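The paper-and-pencil steps can be checked with a short script. This is a sketch, not the way SPSS does it internally: the helper `sum_of_squares` is a name chosen here, and the standard library functions `pvariance` (divide by N) and `variance` (divide by N-1) are used only as a cross-check.

```python
from statistics import mean, pvariance, variance

small_town = [25, 27, 25, 26, 26, 26, 27]
big_city = [26, 24, 28, 22, 30, 32, 20]

def sum_of_squares(scores):
    """Sum of squared differences of the scores from their mean (SS)."""
    m = mean(scores)
    return sum((x - m) ** 2 for x in scores)

for label, prices in [("Small town", small_town), ("Big city", big_city)]:
    ss = sum_of_squares(prices)
    n = len(prices)
    print(label)
    print("  SS                  =", ss)
    print("  Population variance =", round(ss / n, 2))        # divide by N
    print("  Sample variance     =", round(ss / (n - 1), 2))  # divide by N-1
    # Cross-check against the standard library.
    print("  Library check       =", round(pvariance(prices), 2),
          round(variance(prices), 2))
```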


What’s good and what’s bad about the variance

What’s good

1) The variance is “connected” to every score in the collection. This is generally regarded as a plus – changing the value of ANY score will change the variance.

2) The variance has good lineage – it’s part of the formula for the Normal Distribution.

3) The variance is a key quantity in many inferential statistics.

What’s bad

1) The variance is in squared units. So its value is not easily related to the individual score values.

So the variance is not a good DESCRIPTIVE measure of variability.

Computing the Variance using SPSS

Enter the data

Analyze -> Descriptives

Choose all the Options you wish


The Standard Deviation

The standard deviation is the square root of the variance.

Group        Symbol             Formula

Population   $\sigma$ (sigma)   $\sigma = \sqrt{\dfrac{\sum (X - \mu)^2}{N}}$

Sample       $S$                $S = \sqrt{\dfrac{\sum (X - \bar{X})^2}{N - 1}}$

Note that, as was the case for the variance, the formula for the sample standard deviation is different from the formula for the population standard deviation. The sample standard deviation requires dividing the sum of squared differences by N-1, not N.

FYI – Most computer programs automatically compute the “dividing by N-1” standard deviation. This is what SPSS does.


Memorize this formula


Computing the Standard Deviation using paper and pencil

Computation of the standard deviation involves
1) computing the variance,
2) taking the square root of the variance.

Consider the first set of Camry prices above

X     X-bar   X-X-bar   (X-X-bar)^2

25    26      -1        1
27    26       1        1
25    26      -1        1
26    26       0        0
26    26       0        0
26    26       0        0
27    26       1        1

Sum of squared differences = 4

Population variance (dividing by N) = 4/7 = .57

Sample variance (dividing by N-1) = 4/6 = .67

Population standard deviation = sqrt(.57) = .75

Sample standard deviation = sqrt(.67) = .82

Now the second set of Camry Prices – the big city prices

X     X-bar   X-X-bar   (X-X-bar)^2

26    26       0        0
24    26      -2        4
28    26       2        4
22    26      -4        16
30    26       4        16
32    26       6        36
20    26      -6        36

Sum of squared differences = 112

Population variance (dividing by N) = 112/7 = 16

Sample variance (dividing by N-1) = 112/6 = 18.67

Population standard deviation = sqrt(16) = 4.00

Sample standard deviation = sqrt(18.67) = 4.32
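The two steps (variance, then square root) can be sketched as follows, with the standard library's `pstdev` (divide by N) and `stdev` (divide by N-1) as a cross-check. The function name `standard_deviations` is an assumption of this sketch. One small caveat: a program takes the square root of the unrounded variance, so for the small town prices the population value comes out near .756, whereas the hand computation above starts from the rounded .57 and gets .75.

```python
from math import sqrt
from statistics import mean, pstdev, stdev

small_town = [25, 27, 25, 26, 26, 26, 27]
big_city = [26, 24, 28, 22, 30, 32, 20]

def standard_deviations(scores):
    """Return (population SD, sample SD) computed step by step."""
    m = mean(scores)
    ss = sum((x - m) ** 2 for x in scores)   # sum of squared differences
    n = len(scores)
    population_sd = sqrt(ss / n)             # step 1: variance, step 2: root
    sample_sd = sqrt(ss / (n - 1))
    return population_sd, sample_sd

for label, prices in [("Small town", small_town), ("Big city", big_city)]:
    pop_sd, samp_sd = standard_deviations(prices)
    print(label, "population SD =", round(pop_sd, 3),
          "sample SD =", round(samp_sd, 3))
    # Cross-check against the standard library.
    print("  library:", round(pstdev(prices), 3), round(stdev(prices), 3))
```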

Note: In each example above, the sum of squared differences and the variances repeat what we did when computing the variance. The square roots are the new stuff.

What’s good and what’s bad about the standard deviation

What’s good:

1) Connected to every score;
2) Good lineage;
3) Fits the data - the values of the standard deviation make sense.

What’s bad:

1) Inflated by skewness, outliers
2) What does the standard deviation mean?

Unfortunately, there is no simple, easy to digest, description of what the standard deviation represents.

If anything, it might be thought of as the “average” of the differences of the scores from the mean. If someone not familiar with statistics asks me what it is, that’s what I tell them.

But, lack of an interpretation of the number doesn’t prevent us from using that number.

Computing the Standard Deviation using SPSS

Enter the data

Analyze -> Descriptives

Choose the Options you wish to have SPSS compute


Computing the Standard Deviation using Excel

Enter the data into Excel.

Highlight a cell.

Formulas -> More Functions -> Statistical -> STDEV.S

Put the coordinates of the cells containing the data into one of the fields.

Marvel at your handiwork.


Two key facts about the standard deviation

1. For large (N bigger than 30) unimodal, symmetric (US) distributions with no outliers . . .

About 2/3 of the scores will be within one standard deviation of the mean, that is, between Mean-1SD and Mean+1SD.

2. For large unimodal, symmetric (US) distributions with no outliers . . .

About 95% of the scores will be within two standard deviations of the mean, that is, between Mean-2SD and Mean+2SD.

This means that if you know three things about the distribution: 1) that it’s unimodal and symmetric, 2) the mean, and 3) the standard deviation, you can tell pretty much where an individual score is placed in that distribution.

For example, Joe scores two standard deviations above the mean on a test.

What percent of the persons taking the test scored worse than Joe?

Fact 2 above says that 95% of the scores are between Mean-2SD and Mean+2SD, and all of those are at or below Joe’s score.

And of the remaining 5%, ½ of that, or 2 ½ % would be in the left hand tail of the distribution, way below Joe’s score and the other 2 ½% would be in the upper tail, above Joe’s score.

So the answer is approximately 97.5% of the scores would be below Joe’s.

TEST QUESTION.
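The two key facts and Joe's percentile can be checked numerically if we assume the distribution is exactly Normal. That assumption is stronger than "unimodal and symmetric" and is made here only for illustration. The sketch uses `statistics.NormalDist` from the Python standard library.

```python
from statistics import NormalDist

# Standard Normal distribution: mean 0, standard deviation 1,
# so a score of 2 means "two standard deviations above the mean".
standard_normal = NormalDist(mu=0, sigma=1)

within_one_sd = standard_normal.cdf(1) - standard_normal.cdf(-1)
within_two_sd = standard_normal.cdf(2) - standard_normal.cdf(-2)
below_joe = standard_normal.cdf(2)   # proportion scoring below Mean + 2SD

print("Within 1 SD of the mean:", round(within_one_sd, 3))
print("Within 2 SD of the mean:", round(within_two_sd, 3))
print("Below Joe's score:      ", round(below_joe, 3))
```

Under this assumption the proportions come out near .68, .95, and .98, in line with the "about 2/3", "about 95%", and "approximately 97.5%" figures above.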

[Figures: a unimodal, symmetric distribution with about 2/3 of the scores between Mean - SD and Mean + SD; the same distribution with 95% of the scores between Mean - 2SD and Mean + 2SD; and Joe's score at Mean + 2SD, with 95% of the scores in the middle and 2 ½% in each tail.]

Why do we care about Variability

1. It may be important to identify situations for which variability of opinion is high.

Example: Attitudes toward abortion.

Example: Attitudes toward parking meters in the Fort Wood area – large variability from ++ to --.

2. There may be situations in which it is important to have low (or high) variability.

For example, most teachers prefer low variability of entering ability when teaching classes such as this.

If variability is low, this means that the teacher’s presentation will likely be understood by everyone, if it’s chosen appropriately.

If variability is high, some may not understand and some may be bored.

The average depth of a river you must cross is 3’. If the standard deviation of depths is 6”, you’re OK. But if the standard deviation of depths is 2’, then it’s quite likely you’ll hit a hole that is 7’ deep.

3. Variability is part of individual differences that must be explained by psychology.

There is variability in almost every human characteristic. The discovery of explanations for that variability occupies much of the time of research psychologists.

Example

We measure Extraversion. Some people score high – they’re always “on” at parties and functions. Others score low – they’d rather be alone. Why?

If you’re an introvert, see the Susan Cain TED Talk on introverts on YouTube.

By the way, Susan Cain has a best-selling book about introverts.

4. Variability, as measured by the standard deviation, is used to assess the size of differences in means.

Conventions concerning the size of mean differences

.2 Standard deviations = Small difference

.5 Standard deviations = Medium difference

.8 Standard deviations = Large difference
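As a rough illustration of these conventions, a difference between two means can be expressed in standard deviation units by dividing it by the standard deviation. The function name and all the numbers below are hypothetical, and which standard deviation to divide by is left as an assumption of this sketch.

```python
def difference_in_sd_units(mean_a, mean_b, sd):
    """Express the difference between two means in standard deviation units."""
    return abs(mean_a - mean_b) / sd

# Hypothetical example: two group means of 105 and 100 on a test
# whose standard deviation is 10.
example = difference_in_sd_units(105, 100, 10)
print("Difference =", example, "standard deviations")
```

In this made-up case the difference is .5 standard deviations, which the conventions above would call a medium difference.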

Measures of distribution shape

Measures of skewness

A popular measure of skewness is the following, given by

Kirk, R. (1999). Statistics: An introduction. 4th Ed. New York: Harcourt Brace.

$\text{Skewness} = \dfrac{\sum (X - \bar{X})^3 / N}{S^3}$

In English: The sum of the cubed deviations of scores from the mean divided by N, then divided by the cube of the standard deviation.

Interpretation of values

Value of Skewness measure     Interpretation

Larger than 0                 Positively skewed distribution

0                             Symmetric distribution

Less than 0                   Negatively skewed distribution
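The skewness formula can be written out directly. This sketch assumes that S is the sample (N-1) standard deviation defined earlier in these notes. Statistical packages may apply small-sample adjustments, so their printed values need not match this formula exactly. The two skewed data sets are made up for illustration.

```python
from statistics import mean, stdev

def skewness(scores):
    """(Sum of cubed deviations / N) divided by the cube of S."""
    n = len(scores)
    m = mean(scores)
    s = stdev(scores)                 # assumption: S divides by N-1
    third_moment = sum((x - m) ** 3 for x in scores) / n
    return third_moment / s ** 3

big_city = [26, 24, 28, 22, 30, 32, 20]   # symmetric around 26
long_right_tail = [1, 1, 1, 2, 10]        # hypothetical data
long_left_tail = [1, 9, 10, 10, 10]       # hypothetical data

print("Big city prices:", skewness(big_city))
print("Long right tail:", round(skewness(long_right_tail), 2))
print("Long left tail: ", round(skewness(long_left_tail), 2))
```

The symmetric big city prices give a skewness of 0, the set with a long right tail gives a positive value, and its mirror image gives the same value with a negative sign.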


Example of the skewness statistic

1. Salaries from the Employee Data file.

Statistics: Current Salary

N Valid                    474
N Missing                  0
Skewness                   2.125
Std. Error of Skewness     .112

[Histogram of Current Salary, horizontal axis from $0 to $140,000, Frequency on the vertical axis. Mean = $34,419.57, Std. Dev. = $17,075.661, N = 474.]

2. Extroversion scores of 109 UTC students

Statistics: hext

N Valid                    109
N Missing                  1
Skewness                   -.220
Std. Error of Skewness     .231

[Histogram of hext, horizontal axis from 0.00 to 8.00, Frequency on the vertical axis. Mean = 4.4582, Std. Dev. = 0.95104, N = 109.]

Kurtosis

Kurtosis refers to the relationship of the shape of a distribution to the shape of the Normal Distribution.

Kirk gives the following measure of Kurtosis

$\text{Kurtosis} = \dfrac{\sum (X - \bar{X})^4 / N}{S^4} - 3$

In English: The sum of the fourth powers of the deviations of scores from the mean, divided by N, then divided by the fourth power of the standard deviation; then subtract 3.

Interpretation

Value of Kurtosis measure     Interpretation

Larger than 0                 More peaked than the Normal distribution.

0                             Same peakedness as the Normal distribution.

Less than 0                   Less peaked (flatter) than the Normal distribution.
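The kurtosis formula can be sketched the same way. As before, S is assumed to be the sample (N-1) standard deviation, packaged software may use an adjusted version of the formula, and both data sets below are invented for illustration: one is spread evenly across its range, the other is piled up at the center with two far-out scores.

```python
from statistics import mean, stdev

def kurtosis(scores):
    """(Sum of 4th-power deviations / N) divided by S to the 4th, minus 3."""
    n = len(scores)
    m = mean(scores)
    s = stdev(scores)                 # assumption: S divides by N-1
    fourth_moment = sum((x - m) ** 4 for x in scores) / n
    return fourth_moment / s ** 4 - 3

flat_scores = list(range(1, 101))          # evenly spread: 1, 2, ..., 100
peaked_scores = [-10] + [0] * 20 + [10]    # mostly at the center

print("Flat scores:  ", round(kurtosis(flat_scores), 2))
print("Peaked scores:", round(kurtosis(peaked_scores), 2))
```

The evenly spread scores give a value below 0 (flatter than the Normal distribution) and the piled-up scores give a value above 0 (more peaked).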


Examples of two distributions with different Kurtosis.

1. Scores of 1000 values from a uniform distribution.

According to the Kurtosis measure, the distribution is less peaked (flatter) than the Normal Distribution.

2. Conscientiousness scores of 547 UTC students . . .

The Conscientiousness scores are slightly more peaked than the normal distribution.
