Numeric Summaries and Descriptive Statistics
populations vs. samples
• we want to describe both samples and populations
• the latter is a matter of inference…
“outliers”
• minority cases, so different from the majority that they merit separate consideration
– are they errors?
– are they indicative of a different pattern?
• think about possible outliers with care, but beware of mechanical treatments…
• significance of outliers depends on your research interests
summaries of distributions
• graphic vs. numeric
– graphic may be better for visualization
– numeric are better for statistical/inferential purposes
• resistance to outliers is usually an advantage in either case
general characteristics
• kurtosis [“peakedness”]
[figure: density curves contrasting ‘leptokurtic’ (sharply peaked) and ‘platykurtic’ (flat-topped) distributions]
• skew (skewness)
[figure: two distributions, one with right (positive) skew and one with left (negative) skew]
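Skew and kurtosis can also be expressed as numbers. The sketch below uses simple moment-based definitions; `central_moment`, `skewness_m`, and `excess_kurtosis_m` are hypothetical helper names (base R has no built-in for either quantity), and the input values are made up for illustration, not taken from the slides.

```r
# Hypothetical helpers: base R has no built-in skewness or kurtosis function.
# k-th central moment: average k-th power of the deviations from the mean.
central_moment <- function(x, k) mean((x - mean(x))^k)

# Moment-based skewness: sign gives the direction of the longer tail.
skewness_m <- function(x) central_moment(x, 3) / central_moment(x, 2)^1.5

# Excess kurtosis: 0 for a normal curve, above 0 for 'leptokurtic',
# below 0 for 'platykurtic' shapes.
excess_kurtosis_m <- function(x) central_moment(x, 4) / central_moment(x, 2)^2 - 3

# Made-up values with a long right tail (not from the slides).
right_tailed <- c(1, 2, 2, 3, 3, 3, 4, 10)
left_tailed <- -right_tailed  # mirror image: long left tail

skewness_m(right_tailed)         # positive: right skew
skewness_m(left_tailed)          # negative: left skew
excess_kurtosis_m(right_tailed)
```

Other definitions (with small-sample corrections) exist, so values from a statistics package may differ slightly from these.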
central tendency
• measures of central tendency
– provide a sense of the value expressed by multiple cases, over all…
• mean
• median
• mode
mean
• center of gravity
• evenly partitions the sum of all measurements among all cases; average of all measures
$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$
mean – pro and con
• crucial for inferential statistics
• mean is not very resistant to outliers
• a “trimmed mean” may be better for descriptive purposes
mean
rim diameter (cm)

| | unit 1 | unit 2 |
|---|---|---|
| | 12.6 | 16.2 |
| | 11.6 | 16.4 |
| | 16.3 | 13.8 |
| | 13.1 | 13.2 |
| | 12.1 | 11.3 |
| | 26.9 | 14.0 |
| | 9.7 | 9.0 |
| | 11.5 | 12.5 |
| | 14.8 | 15.6 |
| | 13.5 | 11.2 |
| | 12.4 | 12.2 |
| | 13.6 | 15.5 |
| | | 11.7 |
| n | 12 | 13 |
| total | 168.1 | 172.6 |
| total/n | 14.0 | 13.3 |

back-to-back stem-and-leaf plot (stems in cm, leaves in tenths of a cm); the means fall at 14.0 (unit 1) and 13.3 (unit 2):

| unit 1 | stem | unit 2 |
|---:|:---:|:---|
| 9 | 26 | |
| | 25 | |
| | 24 | |
| | 23 | |
| | 22 | |
| | 21 | |
| | 20 | |
| | 19 | |
| | 18 | |
| | 17 | |
| 3 | 16 | 24 |
| | 15 | 56 |
| 8 | 14 | 0 |
| 651 | 13 | 28 |
| 641 | 12 | 25 |
| 65 | 11 | 237 |
| | 10 | |
| 7 | 9 | 0 |
R: mean(x)
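As a minimal sketch of the calculation on this slide, the rim diameters can be entered as two R vectors (the names `unit1` and `unit2` are chosen here for illustration) and averaged both by hand and with `mean()`. The last line drops the single 26.9 cm rim to show how far one extreme case pulls the unit 1 mean.

```r
# Rim diameters (cm) as listed on the slide.
unit1 <- c(12.6, 11.6, 16.3, 13.1, 12.1, 26.9, 9.7, 11.5, 14.8, 13.5, 12.4, 13.6)
unit2 <- c(16.2, 16.4, 13.8, 13.2, 11.3, 14.0, 9.0, 12.5, 15.6, 11.2, 12.2, 15.5, 11.7)

# Mean "by hand": total divided by number of cases.
sum(unit1) / length(unit1)   # 168.1 / 12, about 14.0
sum(unit2) / length(unit2)   # 172.6 / 13, about 13.3

# Same result with the built-in function.
mean(unit1)
mean(unit2)

# Effect of the one extreme rim (26.9 cm) on the unit 1 mean.
mean(unit1[unit1 < 20])      # 141.2 / 11, about 12.8
```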
trimmed mean
rim diameter (cm)
(values sorted; the lowest and highest value of each unit are trimmed: 9.7 and 26.9 in unit 1, 9.0 and 16.4 in unit 2)

| | unit 1 | unit 2 |
|---|---|---|
| | 9.7 | 9.0 |
| | 11.5 | 11.2 |
| | 11.6 | 11.3 |
| | 12.1 | 11.7 |
| | 12.4 | 12.2 |
| | 12.6 | 12.5 |
| | 13.1 | 13.2 |
| | 13.5 | 13.8 |
| | 13.6 | 14.0 |
| | 14.8 | 15.5 |
| | 16.3 | 15.6 |
| | 26.9 | 16.2 |
| | | 16.4 |
| n | 10 | 11 |
| total | 131.5 | 147.2 |
| total/n | 13.2 | 13.4 |

[figure: back-to-back stem-and-leaf plot of the rim diameters, with the trimmed means marked at 13.2 (unit 1) and 13.4 (unit 2)]
R: mean(x, trim=.1)
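The sketch below reproduces the trimmed totals from the table. With `trim = 0.1`, R drops `floor(n * 0.1)` cases from each end, which is one case per end for both units here. `trimmed_by_hand` is a hypothetical helper written only to make that step visible.

```r
# Rim diameters (cm) as listed on the slides.
unit1 <- c(12.6, 11.6, 16.3, 13.1, 12.1, 26.9, 9.7, 11.5, 14.8, 13.5, 12.4, 13.6)
unit2 <- c(16.2, 16.4, 13.8, 13.2, 11.3, 14.0, 9.0, 12.5, 15.6, 11.2, 12.2, 15.5, 11.7)

# Hypothetical helper: sort, drop k cases from each end, average the rest.
trimmed_by_hand <- function(x, k = 1) {
  s <- sort(x)
  mean(s[(k + 1):(length(s) - k)])
}

mean(unit1, trim = 0.1)   # 131.5 / 10 = 13.15
mean(unit2, trim = 0.1)   # 147.2 / 11, about 13.38

# The manual version gives the same values.
trimmed_by_hand(unit1)
trimmed_by_hand(unit2)
```

After trimming, the two units look much more alike than their untrimmed means (14.0 vs. 13.3) suggest.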
median
• 50th percentile…
• less useful for inferential purposes
• more resistant to effects of outliers…
median
rim diameter (cm)

| | unit 1 | unit 2 |
|---|---|---|
| | 9.7 | 9.0 |
| | 11.5 | 11.2 |
| | 11.6 | 11.3 |
| | 12.1 | 11.7 |
| | 12.4 | 12.2 |
| | 12.6 | 12.5 |
| | 13.1 | 13.2 |
| | 13.5 | 13.8 |
| | 13.6 | 14.0 |
| | 14.8 | 15.5 |
| | 16.3 | 15.6 |
| | 26.9 | 16.2 |
| | | 16.4 |
| median | 12.85 | 13.2 |

[figure: back-to-back stem-and-leaf plot of the rim diameters, with the medians marked at 12.85 (unit 1) and 13.20 (unit 2)]
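A short sketch of the same medians in R, using `median()`. Unit 1 has an even number of cases, so its median is the average of the two middle values (12.6 and 13.1). The last lines compare how the median and the mean of unit 1 react when the 26.9 cm rim is left out; the variable names are illustrative.

```r
# Rim diameters (cm) as listed on the slides.
unit1 <- c(12.6, 11.6, 16.3, 13.1, 12.1, 26.9, 9.7, 11.5, 14.8, 13.5, 12.4, 13.6)
unit2 <- c(16.2, 16.4, 13.8, 13.2, 11.3, 14.0, 9.0, 12.5, 15.6, 11.2, 12.2, 15.5, 11.7)

median(unit1)   # (12.6 + 13.1) / 2 = 12.85
median(unit2)   # 7th of 13 sorted values = 13.2

# Resistance to the outlier: drop the 26.9 cm rim and compare.
unit1_no_outlier <- unit1[unit1 < 20]
median(unit1) - median(unit1_no_outlier)   # median shifts by about 0.25 cm
mean(unit1) - mean(unit1_no_outlier)       # mean shifts by more than 1 cm
```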
mode
• the most numerous category
• for ratio data, often implies that data have been grouped in some way
• can be more or less created by the grouping procedure
• for theoretical distributions: simply the location of the peak on the frequency distribution
[figure: bar chart of categories (isolated scatters, hamlets, villages, regional centers); modal class = ‘hamlets’]
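To illustrate how the mode of ratio data depends on the grouping procedure, the sketch below bins the unit 2 rim diameters from the earlier slides three different ways and reports the modal class each time. `modal_class` is a hypothetical helper (base R has no function for the statistical mode), and the bin boundaries are arbitrary choices made for this example.

```r
# Unit 2 rim diameters (cm) as listed on the slides.
unit2 <- c(16.2, 16.4, 13.8, 13.2, 11.3, 14.0, 9.0, 12.5, 15.6, 11.2, 12.2, 15.5, 11.7)

# Hypothetical helper: group x into bins [a, b) and return the label of the
# most numerous bin (the first one, if several bins tie).
modal_class <- function(x, breaks) {
  counts <- table(cut(x, breaks = breaks, right = FALSE))
  names(counts)[which.max(counts)]
}

modal_class(unit2, seq(9, 17, by = 1))   # 1 cm bins
modal_class(unit2, seq(9, 17, by = 2))   # 2 cm bins starting at 9
modal_class(unit2, seq(8, 18, by = 2))   # 2 cm bins starting at 8
```

The three calls return three different classes for the same thirteen measurements, which is the sense in which the mode can be created by the grouping.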
dispersion
• measures of dispersion
– summarize degree of clustering of cases, esp. with respect to central tendency…
• range
• variance
• standard deviation
range

| unit 1 | unit 2 |
|---|---|
| 9.7 | 9.0 |
| 11.5 | 11.2 |
| 11.6 | 11.3 |
| 12.1 | 11.7 |
| 12.4 | 12.2 |
| 12.6 | 12.5 |
| 13.1 | 13.2 |
| 13.5 | 13.8 |
| 13.6 | 14.0 |
| 14.8 | 15.5 |
| 16.3 | 15.6 |
| 26.9 | 16.2 |
| | 16.4 |

[figure: back-to-back stem-and-leaf plot of the rim diameters, with the extremes of each unit marked: 9.7 to 26.9 cm (unit 1), 9.0 to 16.4 cm (unit 2)]
• would be better to use midspread…
R: range(x)
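A brief sketch comparing the range with the midspread for the two units. It assumes the midspread can be approximated by R's `IQR()`, which uses the default quantile definition; a midspread computed from hinges may differ slightly.

```r
# Rim diameters (cm) as listed on the slides.
unit1 <- c(12.6, 11.6, 16.3, 13.1, 12.1, 26.9, 9.7, 11.5, 14.8, 13.5, 12.4, 13.6)
unit2 <- c(16.2, 16.4, 13.8, 13.2, 11.3, 14.0, 9.0, 12.5, 15.6, 11.2, 12.2, 15.5, 11.7)

range(unit1)         # 9.7 26.9
range(unit2)         # 9.0 16.4

# Width of the range: dominated by the single 26.9 cm rim in unit 1.
diff(range(unit1))   # 17.2
diff(range(unit2))   # 7.4

# Midspread (interquartile range): spread of the middle half of the cases.
IQR(unit1)
IQR(unit2)
```

By the range, unit 1 looks far more dispersed; by the midspread, unit 2 is the more dispersed of the two, because the outlier no longer drives the result.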
variance
• analogous to average deviation of cases from mean
• in fact, based on sum of squared deviations from the mean—“sum-of-squares”
$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$
R: var(x)
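The definition can be followed step by step in R and checked against `var()`. This is a sketch using the unit 2 rim diameters from the earlier slides; the intermediate variable names are illustrative.

```r
# Unit 2 rim diameters (cm) as listed on the slides.
unit2 <- c(16.2, 16.4, 13.8, 13.2, 11.3, 14.0, 9.0, 12.5, 15.6, 11.2, 12.2, 15.5, 11.7)

n <- length(unit2)
deviations <- unit2 - mean(unit2)     # deviation of each case from the mean
sum_of_squares <- sum(deviations^2)   # "sum-of-squares"
s2 <- sum_of_squares / (n - 1)        # sample variance, in cm^2

s2
var(unit2)                            # built-in function gives the same value
```

Note the divisor is n - 1, not n, so the variance is only analogous to an average squared deviation.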
variance
• computational form:
$$s^2 = \frac{\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2 / n}{n - 1}$$
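The computational form follows from expanding the sum of squared deviations and substituting the definition of the mean, $\bar{x} = \sum_{i=1}^{n} x_i / n$. A short derivation, added here as a worked step rather than taken from the slides:

$$
\begin{aligned}
\sum_{i=1}^{n} (x_i - \bar{x})^2
  &= \sum_{i=1}^{n} x_i^2 - 2\bar{x}\sum_{i=1}^{n} x_i + n\bar{x}^2 \\
  &= \sum_{i=1}^{n} x_i^2 - \frac{2}{n}\left(\sum_{i=1}^{n} x_i\right)^2 + \frac{1}{n}\left(\sum_{i=1}^{n} x_i\right)^2 \\
  &= \sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}
\end{aligned}
$$

Dividing both sides by $n - 1$ gives the two equivalent expressions for $s^2$. The computational form needs only the running totals of $x_i$ and $x_i^2$, which made it convenient for hand calculation.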
• note: units of variance are squared…
• this makes variance hard to interpret
• ex.: projectile point sample:
– mean = 22.6 mm
– variance = 38 mm²
• what does this mean???
standard deviation
• square root of variance:
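$$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$$
$$s = \sqrt{\frac{\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2 / n}{n - 1}}$$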
standard deviation
• units are in same units as base measurements
• ex.: projectile point sample:
– mean = 22.6 mm
– standard deviation = 6.2 mm
• mean +/- sd (16.4 to 28.8 mm)
– should give at least some intuitive sense of where most of the cases lie, barring major effects of outliers
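The projectile point measurements themselves are not given, so the sketch below applies the same idea to the rim diameters from the earlier slides: `sd()` is the square root of `var()`, and mean +/- sd gives an interval whose coverage can be counted directly. The last lines show the outlier caveat using the 26.9 cm rim in unit 1. Variable names are illustrative.

```r
# Rim diameters (cm) as listed on the slides.
unit1 <- c(12.6, 11.6, 16.3, 13.1, 12.1, 26.9, 9.7, 11.5, 14.8, 13.5, 12.4, 13.6)
unit2 <- c(16.2, 16.4, 13.8, 13.2, 11.3, 14.0, 9.0, 12.5, 15.6, 11.2, 12.2, 15.5, 11.7)

# Standard deviation is the square root of the variance, back in cm.
sd(unit2)
sqrt(var(unit2))

# mean +/- sd interval and the share of cases falling inside it.
lower <- mean(unit2) - sd(unit2)
upper <- mean(unit2) + sd(unit2)
c(lower, upper)
share_inside <- mean(unit2 >= lower & unit2 <= upper)
share_inside

# Outlier caveat: one extreme rim inflates the unit 1 standard deviation.
sd(unit1)
sd(unit1[unit1 < 20])
```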