numeric summaries and descriptive statistics

Post on 05-Jan-2016


populations vs. samples

• we want to describe both samples and populations

• the latter is a matter of inference…

“outliers”

• minority cases, so different from the majority that they merit separate consideration
  – are they errors?
  – are they indicative of a different pattern?

• think about possible outliers with care, but beware of mechanical treatments…

• significance of outliers depends on your research interests

summaries of distributions

• graphic vs. numeric
  – graphic may be better for visualization
  – numeric are better for statistical/inferential purposes

• resistance to outliers is usually an advantage in either case

general characteristics

• kurtosis [“peakedness”]

[figure: two curves labeled ‘leptokurtic’ and ‘platykurtic’; axes labeled D and X]

• skew (skewness)

[figure: two distributions labeled right (positive) skew and left (negative) skew; axes labeled D and X]
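The slides give no formula for skew or kurtosis. As a rough sketch, the R code below uses the simple moment-based coefficients, which are one common definition among several and are an assumption here, not the slides' method. The vectors `unit1` and `unit2` hold the rim diameters tabulated later in these slides; the names `unit1`, `unit2`, `skew` and `kurt` are ours.

```r
# Rim diameters (cm) from the unit 1 / unit 2 table later in the slides.
unit1 <- c(12.6, 11.6, 16.3, 13.1, 12.1, 26.9, 9.7, 11.5, 14.8, 13.5, 12.4, 13.6)
unit2 <- c(16.2, 16.4, 13.8, 13.2, 11.3, 14.0, 9.0, 12.5, 15.6, 11.2, 12.2, 15.5, 11.7)

# Moment-based skewness: mean cubed deviation scaled by the
# mean squared deviation raised to the power 3/2.
skew <- function(x) {
  d <- x - mean(x)
  mean(d^3) / mean(d^2)^1.5
}

# Moment-based kurtosis: mean fourth-power deviation scaled by the
# squared mean squared deviation (a normal curve gives 3).
kurt <- function(x) {
  d <- x - mean(x)
  mean(d^4) / mean(d^2)^2
}

skew1 <- skew(unit1)  # positive: tail toward large values (the 26.9 cm rim)
skew2 <- skew(unit2)  # negative: tail toward small values (the 9.0 cm rim)
print(c(unit1 = skew1, unit2 = skew2))
print(c(unit1 = kurt(unit1), unit2 = kurt(unit2)))
```

With samples this small the coefficients are unstable, so they are best read for sign and rough size only.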

central tendency

• measures of central tendency
  – provide a sense of the value expressed by multiple cases, overall…

• mean

• median

• mode

mean

• center of gravity

• evenly partitions the sum of all measurements among all cases; average of all measures

$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$

• crucial for inferential statistics

• mean is not very resistant to outliers

• a “trimmed mean” may be better for descriptive purposes

mean – pro and con

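mean

rim diameter (cm)

            unit 1   unit 2
              12.6     16.2
              11.6     16.4
              16.3     13.8
              13.1     13.2
              12.1     11.3
              26.9     14.0
               9.7      9.0
              11.5     12.5
              14.8     15.6
              13.5     11.2
              12.4     12.2
              13.6     15.5
                       11.7

n               12       13
total        168.1    172.6
total/n       14.0     13.3

back-to-back stem-and-leaf plot (stem = whole cm, leaf = tenths of a cm)

  unit 1 | stem | unit 2
       9 |  26  |
         |  25  |
         |  24  |
         |  23  |
         |  22  |
         |  21  |
         |  20  |
         |  19  |
         |  18  |
         |  17  |
       3 |  16  | 24
         |  15  | 56
       8 |  14  | 0       (unit 1 mean = 14.0)
     651 |  13  | 28      (unit 2 mean = 13.3)
     641 |  12  | 25
      65 |  11  | 237
         |  10  |
       7 |   9  | 0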

R: mean(x)
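A minimal sketch of the calculation behind the table, assuming the rim diameters are typed into two R vectors (the names `unit1` and `unit2` are ours). It computes the mean from its definition, compares it with `mean(x)`, and then drops the single 26.9 cm rim to show how weakly the mean resists one outlier.

```r
# Rim diameters (cm) transcribed from the unit 1 / unit 2 table.
unit1 <- c(12.6, 11.6, 16.3, 13.1, 12.1, 26.9, 9.7, 11.5, 14.8, 13.5, 12.4, 13.6)
unit2 <- c(16.2, 16.4, 13.8, 13.2, 11.3, 14.0, 9.0, 12.5, 15.6, 11.2, 12.2, 15.5, 11.7)

# The mean by its definition: sum of all measurements divided by n.
mean_by_hand <- sum(unit1) / length(unit1)
mean_builtin <- mean(unit1)

# Same calculation after leaving out the one rim above 20 cm (26.9).
mean_no_outlier <- mean(unit1[unit1 < 20])

print(c(by_hand = mean_by_hand, builtin = mean_builtin,
        no_outlier = mean_no_outlier))
print(mean(unit2))
```

Removing one case out of twelve moves the unit 1 mean by more than a centimeter, which is the "con" side of the slide title.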

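trimmed mean

rim diameter (cm), sorted

            unit 1   unit 2
               9.7      9.0
              11.5     11.2
              11.6     11.3
              12.1     11.7
              12.4     12.2
              12.6     12.5
              13.1     13.2
              13.5     13.8
              13.6     14.0
              14.8     15.5
              16.3     15.6
              26.9     16.2
                       16.4

n               10       11
total        131.5    147.2
total/n       13.2     13.4

(n and total exclude the lowest and the highest value in each unit)

  unit 1 | stem | unit 2
       9 |  26  |
         |  25  |
         |  24  |
         |  23  |
         |  22  |
         |  21  |
         |  20  |
         |  19  |
         |  18  |
         |  17  |
       3 |  16  | 24
         |  15  | 56
       8 |  14  | 0
     651 |  13  | 28      (trimmed means: unit 1 = 13.2, unit 2 = 13.4)
     641 |  12  | 25
      65 |  11  | 237
         |  10  |
       7 |   9  | 0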

R: mean(x, trim=.1)
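As a sketch of what `trim=.1` does with these data: R drops `floor(n * trim)` cases from each end of the sorted values before averaging, which for n = 12 and n = 13 is one case per end. The helper name `trimmed_by_hand` and the vector names are ours.

```r
# Rim diameters (cm) transcribed from the unit 1 / unit 2 table.
unit1 <- c(12.6, 11.6, 16.3, 13.1, 12.1, 26.9, 9.7, 11.5, 14.8, 13.5, 12.4, 13.6)
unit2 <- c(16.2, 16.4, 13.8, 13.2, 11.3, 14.0, 9.0, 12.5, 15.6, 11.2, 12.2, 15.5, 11.7)

# Trimmed mean spelled out: sort, drop k cases from each end, average the rest.
trimmed_by_hand <- function(x, trim) {
  s <- sort(x)
  k <- floor(length(s) * trim)   # cases removed from each end
  mean(s[(k + 1):(length(s) - k)])
}

tm1 <- mean(unit1, trim = 0.1)
tm2 <- mean(unit2, trim = 0.1)

print(c(unit1 = tm1, unit2 = tm2))
print(c(unit1 = trimmed_by_hand(unit1, 0.1),
        unit2 = trimmed_by_hand(unit2, 0.1)))
```

After trimming, the two units differ by about 0.2 cm, where the untrimmed means (14.0 and 13.3) suggested a larger gap.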

median

• 50th percentile…

• less useful for inferential purposes

• more resistant to effects of outliers…

median

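rim diameter (cm), sorted

            unit 1   unit 2
               9.7      9.0
              11.5     11.2
              11.6     11.3
              12.1     11.7
              12.4     12.2
              12.6     12.5
              13.1     13.2
              13.5     13.8
              13.6     14.0
              14.8     15.5
              16.3     15.6
              26.9     16.2
                       16.4

median       12.85     13.2

(unit 1 has 12 cases, so its median is the midpoint of the 6th and 7th values, 12.6 and 13.1; unit 2 has 13 cases, so its median is the 7th value)

  unit 1 | stem | unit 2
       9 |  26  |
         |  25  |
         |  24  |
         |  23  |
         |  22  |
         |  21  |
         |  20  |
         |  19  |
         |  18  |
         |  17  |
       3 |  16  | 24
         |  15  | 56
       8 |  14  | 0
     651 |  13  | 28      (unit 2 median = 13.20)
     641 |  12  | 25      (unit 1 median = 12.85)
      65 |  11  | 237
         |  10  |
       7 |   9  | 0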
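A short sketch of the resistance claim, using R's `median()` on the same rim diameters (vector names are ours). It compares how far the median and the mean of unit 1 move when the 26.9 cm rim is left out.

```r
# Rim diameters (cm) transcribed from the unit 1 / unit 2 table.
unit1 <- c(12.6, 11.6, 16.3, 13.1, 12.1, 26.9, 9.7, 11.5, 14.8, 13.5, 12.4, 13.6)
unit2 <- c(16.2, 16.4, 13.8, 13.2, 11.3, 14.0, 9.0, 12.5, 15.6, 11.2, 12.2, 15.5, 11.7)

med1 <- median(unit1)   # even n: midpoint of the two middle values
med2 <- median(unit2)   # odd n: the middle value itself

# Leave out the one rim above 20 cm and recompute both summaries.
unit1_trim <- unit1[unit1 < 20]
shift_median <- med1 - median(unit1_trim)
shift_mean   <- mean(unit1) - mean(unit1_trim)

print(c(median_unit1 = med1, median_unit2 = med2))
print(c(shift_in_median = shift_median, shift_in_mean = shift_mean))
```

The median shifts by a quarter of a centimeter while the mean shifts by more than one centimeter.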

mode

• the most numerous category

• for ratio data, often implies that data have been grouped in some way

• can be more or less created by the grouping procedure

• for theoretical distributions: simply the location of the peak on the frequency distribution

[figure: bar chart of frequencies by settlement category: isolated scatters, hamlets, villages, regional centers]

modal class = ‘hamlets’
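The point that a mode for ratio data "can be more or less created by the grouping procedure" can be sketched with the unit 2 rim diameters from the earlier slides. The 2 cm class width and the two starting points below are arbitrary choices made for illustration, not values from the slides; the vector and variable names are ours.

```r
# Unit 2 rim diameters (cm) from the earlier table.
unit2 <- c(16.2, 16.4, 13.8, 13.2, 11.3, 14.0, 9.0, 12.5, 15.6, 11.2, 12.2, 15.5, 11.7)

# Grouping A: 2 cm classes starting at 9 cm (left-closed intervals).
classes_a <- cut(unit2, breaks = seq(9, 17, by = 2), right = FALSE)
counts_a  <- table(classes_a)

# Grouping B: the same class width, but starting at 8 cm.
classes_b <- cut(unit2, breaks = seq(8, 18, by = 2), right = FALSE)
counts_b  <- table(classes_b)

# Modal class = label of the class with the highest count.
modal_a <- names(which.max(counts_a))
modal_b <- names(which.max(counts_b))

print(counts_a)
print(counts_b)
print(c(grouping_a = modal_a, grouping_b = modal_b))
```

Shifting the class boundaries by one centimeter moves the modal class, although the measurements themselves are unchanged.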

dispersion

• measures of dispersion
  – summarize degree of clustering of cases, esp. with respect to central tendency…

• range

• variance

• standard deviation

range

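rim diameter (cm), sorted

            unit 1   unit 2
               9.7      9.0
              11.5     11.2
              11.6     11.3
              12.1     11.7
              12.4     12.2
              12.6     12.5
              13.1     13.2
              13.5     13.8
              13.6     14.0
              14.8     15.5
              16.3     15.6
              26.9     16.2
                       16.4

(* … * marks the extent of the range for each unit)

    unit 1 | stem | unit 2
  *      9 |  26  |
  |        |  25  |
  |        |  24  |
  |        |  23  |
  |        |  22  |
  |        |  21  |
  |        |  20  |
  |        |  19  |
  |        |  18  |
  |        |  17  |
  |      3 |  16  | 24   *
  |        |  15  | 56   |
  |      8 |  14  | 0    |
  |    651 |  13  | 28   |
  |    641 |  12  | 25   |
  |     65 |  11  | 237  |
  |        |  10  |      |
  *      7 |   9  | 0    *

• would be better to use midspread…

R: range(x)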
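A small sketch comparing the range with the midspread (interquartile range) for the two units. `IQR()` uses R's default quantile rule, which is only one of several conventions, so hand-computed hinges may differ slightly. Vector and variable names are ours.

```r
# Rim diameters (cm) transcribed from the unit 1 / unit 2 table.
unit1 <- c(12.6, 11.6, 16.3, 13.1, 12.1, 26.9, 9.7, 11.5, 14.8, 13.5, 12.4, 13.6)
unit2 <- c(16.2, 16.4, 13.8, 13.2, 11.3, 14.0, 9.0, 12.5, 15.6, 11.2, 12.2, 15.5, 11.7)

# range() returns the smallest and largest value; diff() gives the width.
width1 <- diff(range(unit1))
width2 <- diff(range(unit2))

# Midspread: distance between the first and third quartiles.
mid1 <- IQR(unit1)
mid2 <- IQR(unit2)

print(rbind(unit1 = range(unit1), unit2 = range(unit2)))
print(c(range_width_unit1 = width1, range_width_unit2 = width2))
print(c(midspread_unit1 = mid1, midspread_unit2 = mid2))
```

By range, unit 1 looks far more dispersed than unit 2, but that width is set entirely by the 26.9 cm rim; by midspread, unit 1 is the more tightly clustered of the two.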

variance

• analogous to average deviation of cases from mean

• in fact, based on sum of squared deviations from the mean—“sum-of-squares”

$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$

R: var(x)
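A minimal sketch of the definition, assuming the unit 1 rim diameters from the earlier slides as data (the vector and variable names are ours). It builds the sum of squares step by step and checks the result against `var(x)`.

```r
# Unit 1 rim diameters (cm) from the earlier table.
unit1 <- c(12.6, 11.6, 16.3, 13.1, 12.1, 26.9, 9.7, 11.5, 14.8, 13.5, 12.4, 13.6)

n <- length(unit1)
deviations     <- unit1 - mean(unit1)   # each case minus the mean
sum_of_squares <- sum(deviations^2)     # the "sum-of-squares"
var_by_hand    <- sum_of_squares / (n - 1)

print(c(sum_of_squares = sum_of_squares,
        by_hand = var_by_hand,
        builtin = var(unit1)))
```

Note the divisor is n - 1 rather than n, so the result is only analogous to an average squared deviation.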

variance

• computational form:

$$s^2 = \frac{\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2 / n}{n - 1}$$

• note: units of variance are squared…

• this makes variance hard to interpret

• ex.: projectile point sample:
  – mean = 22.6 mm
  – variance = 38 mm²

• what does this mean???
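As a check on the computational form, the sketch below applies it to the unit 1 rim diameters from the earlier slides and compares the result with `var(x)` (vector and variable names are ours). It needs only the sum of the values and the sum of their squares, which is why it was convenient for hand calculation.

```r
# Unit 1 rim diameters (cm) from the earlier table.
unit1 <- c(12.6, 11.6, 16.3, 13.1, 12.1, 26.9, 9.7, 11.5, 14.8, 13.5, 12.4, 13.6)

n          <- length(unit1)
sum_x      <- sum(unit1)     # sum of the measurements
sum_x_sq   <- sum(unit1^2)   # sum of the squared measurements

# Computational form: no need to subtract the mean from every case.
var_computational <- (sum_x_sq - sum_x^2 / n) / (n - 1)

print(c(computational = var_computational, builtin = var(unit1)))
```

The result is in cm², which illustrates the interpretive problem raised on the slide. On a computer the definitional form is usually the safer choice, because subtracting two large, nearly equal sums can lose precision.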

standard deviation

• square root of variance:

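$$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$$

$$s = \sqrt{\frac{\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2 / n}{n - 1}}$$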

standard deviation

• units are in same units as base measurements

• ex.: projectile point sample:
  – mean = 22.6 mm
  – standard deviation = 6.2 mm

• mean +/- sd (16.4 to 28.8 mm)
  – should give at least some intuitive sense of where most of the cases lie, barring major effects of outliers
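A brief sketch tying the two examples together. The first part reproduces the projectile point interval from the summary figures on the slide (no raw data are given there). The second part applies `sd()` to the unit 1 rim diameters from the earlier slides and counts the cases inside mean +/- sd. Variable names are ours.

```r
# Projectile point summary figures quoted on the slides (mm, mm^2).
pp_mean <- 22.6
pp_var  <- 38
pp_sd   <- round(sqrt(pp_var), 1)          # square root of the variance
pp_interval <- c(pp_mean - pp_sd, pp_mean + pp_sd)
print(pp_interval)

# Unit 1 rim diameters (cm) from the earlier table.
unit1 <- c(12.6, 11.6, 16.3, 13.1, 12.1, 26.9, 9.7, 11.5, 14.8, 13.5, 12.4, 13.6)

s1    <- sd(unit1)                         # same as sqrt(var(unit1))
lower <- mean(unit1) - s1
upper <- mean(unit1) + s1

# How many cases fall inside mean +/- sd?
inside <- sum(unit1 >= lower & unit1 <= upper)
print(c(sd = s1, lower = lower, upper = upper, cases_inside = inside))
```

For unit 1 the interval is wide because the 26.9 cm rim inflates the standard deviation, which is the "major effects of outliers" caveat in practice.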
