2. statistics lod lecture
TRANSCRIPT
-
8/12/2019 2. Statistics Lod Lecture
1/64
Handling Data and Figures of Merit
Data comes in different formats
Time plots, histograms, lists
But all can contain the same information about quality
What is meant by quality?
(figures of merit)
Precision, separation (selectivity), limits of detection, linear range
My weight (lbs)
day  weight   day  weight   day  weight
1    140      31   143.9    61   144
2    140.1    32   144      62   144.2
3    139.8    33   142.5    63   144.5
4    140.6    34   142.9    64   144.2
5    140      35   142.8    65   143.9
6    139.8    36   143.9    66   144.2
7    139.6    37   144      67   144.5
8    140      38   144.8    68   144.3
9    140.8    39   143.9    69   144.2
10   139.7    40   144.5    70   144.9
11   140.2    41   143.9    71   144
12   141.7    42   144      72   143.8
13   141.9    43   144.2    73   144
14   141.4    44   143.8    74   143.8
15   142.3    45   143.5    75   144
16   142.3    46   143.8    76   144.5
17   141.9    47   143.2    77   143.7
18   142.1    48   143.5    78   143.9
19   142.5    49   143.6    79   144
20   142.3    50   143.4    80   144.2
21   142.1    51   143.9    81   144
22   142.5    52   143.6    82   144.4
23   143.5    53   144      83   143.8
24   143      54   143.8    84   144.1
25   143.2    55   143.6
26   143      56   143.8
27   143.4    57   144
28   143.5    58   144.2
29   142.7    59   144
30   143.7    60   143.9
Plot as a function of the time the data was acquired:
[Figure: weight (lbs) vs. day]
Do not use curved lines to connect data points: that assumes you know more about the relationship of the data than you really do
Comments: background is white (less ink); font size is larger than the Excel default (use 14 or 16)
Bin refers to which groups of weights to cluster together. Like a grade curve, which lists the number of students who got between 95 and 100 pts: 95-100 would be a bin.
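Binning can be sketched in a few lines of Python. This is only an illustration: the 0.5 lb bin width and the helper name `bin_start` are assumptions, and only the first ten weights from the table are used.

```python
from collections import Counter
import math

# First ten daily weights (lbs) from the "My weight" table.
weights = [140, 140.1, 139.8, 140.6, 140, 139.8, 139.6, 140, 140.8, 139.7]

BIN_WIDTH = 0.5  # lbs; an assumed width, not taken from the lecture


def bin_start(value, width=BIN_WIDTH):
    """Return the lower edge of the bin that contains value."""
    return math.floor(value / width) * width


# Count how many observations fall into each bin.
counts = Counter(bin_start(w) for w in weights)

for edge in sorted(counts):
    print(f"{edge:.1f}-{edge + BIN_WIDTH:.1f} lbs: {counts[edge]}")
```

Plotting the counts against the bin edges gives the frequency chart (histogram).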
Assume my weight is a single, random, set of similar data
[Figure: histogram, # of observations vs. weight (lbs)]
Make a frequency chart (histogram) of the data
Create a model of my weight and determine the average weight and how consistent my weight is
[Figure: weight (lbs) vs. day]
[Figure: histogram, # of observations vs. weight (lbs), with the inflection point marked]
average = 143.11 lbs
s = 1.4 lbs
s = standard deviation = measure of the consistency, or similarity, of weights
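A minimal sketch of how the average and standard deviation could be computed, using Python's standard `statistics` module. Only days 1-30 of the table are used here, so the printed values are not expected to match the slide's numbers for the full data set.

```python
import statistics

# Daily weights (lbs) for days 1-30 of the "My weight" table.
weights = [
    140, 140.1, 139.8, 140.6, 140, 139.8, 139.6, 140, 140.8, 139.7,
    140.2, 141.7, 141.9, 141.4, 142.3, 142.3, 141.9, 142.1, 142.5, 142.3,
    142.1, 142.5, 143.5, 143, 143.2, 143, 143.4, 143.5, 142.7, 143.7,
]

mean = statistics.mean(weights)  # peak location (average)
s = statistics.stdev(weights)    # sample standard deviation (n - 1 denominator)

print(f"average = {mean:.2f} lbs, s = {s:.2f} lbs")
```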
Characteristics of the Model Population (Random, Normal)
Peak height, A
Peak location (mean or average), μ
Peak width, W, at baseline
Peak width at half height, W_1/2
Standard deviation, s, estimates the variation in an infinite population, σ
Related concepts
$f(x) = A\,e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}}$
[Figure: normal curve, amplitude vs. s]
Width is measured at the inflection point = s
W_1/2 = width at half height
Triangulated peak: base width is 2s < W < 4s
[Figure: normal curve, amplitude vs. s]
Area +/- 1s = 68.3%
Area +/- 2s = 95.4%
Area +/- 3s = 99.73%
pp ~ 6s
pp = peak to peak, or largest separation of measurements
Peak to peak is sometimes easier to see on the data vs. time plot
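The areas under a normal curve within a given number of standard deviations can be checked numerically. A minimal sketch using the standard library's `math.erf`; the function name `area_within` is an illustrative choice, not something from the lecture.

```python
import math


def area_within(k):
    """Fraction of a normal population lying within +/- k standard deviations."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(f"+/- {k}s: {100 * area_within(k):.2f}%")
```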
[Figure: weight (lbs) vs. day, with peak to peak marked from 139.5 to 144.9]
pp ~ 6s
s ~ pp/6 = (144.9 - 139.5)/6 ~ 0.9
(Calculated s = 1.4)
[Figure: histogram, # of observations vs. weight (lbs)]
[Figure: normal curve with its 1st and 2nd derivatives, amplitude vs. s]
Scale up the first derivative and second derivative to see better
There are some other important characteristics of a normal (random) population
[Figure: population (0th derivative), 1st derivative, and 2nd derivative, amplitude vs. s]
1st derivative: peak is at the inflection point; determines the std. dev.
2nd derivative: peak is at the inflection of the first derivative; should be symmetrical for a normal population; goes to zero at the std. dev.
Asymmetry can be determined from principal component analysis
A.F. (Alanah Fitch) = asymmetric factor
Comparing TWO populations of measurements
Is there a difference between my baseline weight and school weight? Can you detect a difference? Can you quantitate a difference?
[Figure: weight (lbs) vs. day, with Baseline, Vacation, and School Begins marked]
[Figure: histogram, # of observations vs. weight (lbs), with baseline and school populations marked]
Exact same information displayed differently, but now we divide the data into different measurement populations
Model of the data as two normal populations
[Figure: histograms, # of observations vs. weight (lbs), modeled as a single population and as two populations]
We have two models to describe the population of measurements of my weight.
In one we assume that all measurements fall into a single population.
In the second we assume that the measurements have sampled two different populations.
Which is the better model? How do we quantify better?
[Figure: histogram, # of observations vs. weight (lbs), with both models overlaid]
Compare how closely the measured data fits the model
Did I gain weight?
The red bars represent the difference between the two-population model and the data
The purple lines represent the difference between the single-population model and the data
Which model has less summed differences?
Anova: Single Factor (5% certainty)
SUMMARY
Groups     Count  Sum     Average  Variance
Column 1   12     277.41  23.1175  8.70360227
Column 2   12     345.72  28.81    6.50010909

ANOVA
Source of Variation  SS        df  MS        F           P-value   F crit
Between Groups       194.4273  1   194.4273  25.5762995  4.59E-05  4.300949
Within Groups        167.2408  22  7.601856
Total                361.6682  23
Test: is F
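The F value in the table can be reproduced from the sums of squares and degrees of freedom. A minimal sketch with the numbers copied from the table; comparing F against F crit is the usual reading of this output and is assumed here to be the intended test.

```python
# Values copied from the ANOVA table above.
ss_between, df_between = 194.4273, 1
ss_within, df_within = 167.2408, 22
f_crit = 4.300949

# Mean square = sum of squares / degrees of freedom.
ms_between = ss_between / df_between
ms_within = ss_within / df_within

# F is the ratio of the between-group to the within-group mean square.
f_stat = ms_between / ms_within

print(f"F = {f_stat:.4f}, F crit = {f_crit}")
print("F exceeds F crit" if f_stat > f_crit else "F does not exceed F crit")
```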
[Figure: frequency vs. length (cm), white and red treated as separate populations]
White, N=12, Sum sq diff=0.037, stdev=2.55
White, N=38, Sum sq diff=0.028, stdev=2.15
Red, N=12, Sum sq diff=0.11, stdev=3.27
Red, N=40, Sum sq diff=0.017, stdev=2.67
[Figure: frequency vs. length (cm), all data treated as a single population]
N=24, Sum sq diff=0.0449, stdev=3.96
N=78, Sum sq diff=0.108, stdev=4.05
In an Analysis of Variance you test the hypothesis that the sample is best described as a single population.
1. Create the expected frequency (Gaussian from normal error curve)
2. Measure the deviation between the histogram point and the expected frequency
3. Square to remove signs
4. SS = sum squares
5. Compare to expected SS, which scales with population size
6. If larger than expected, then you cannot explain the deviations assuming a single population
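Steps 1 to 4 above can be sketched as follows. This is an illustration only: the lengths are hypothetical (not the lecture's potato data), the 2 cm bin width is an assumption, and the sum runs only over bins that contain data. Steps 5 and 6 (the comparison against an expected SS) are not implemented.

```python
import math
import statistics
from collections import Counter

# Hypothetical lengths (cm); NOT the lecture's data.
lengths = [21.0, 22.5, 23.0, 23.5, 24.0, 24.5, 25.0, 26.0, 27.5, 28.0, 29.0, 30.5]
BIN_WIDTH = 2.0  # cm, an assumed bin width

mean = statistics.mean(lengths)
s = statistics.stdev(lengths)


def expected_frequency(x):
    """Step 1: fraction of observations expected in the bin centered at x."""
    density = math.exp(-0.5 * ((x - mean) / s) ** 2) / (s * math.sqrt(2 * math.pi))
    return density * BIN_WIDTH


# Observed fraction of observations in each occupied bin, keyed by bin center.
counts = Counter(math.floor(v / BIN_WIDTH) * BIN_WIDTH + BIN_WIDTH / 2 for v in lengths)
observed = {center: n / len(lengths) for center, n in counts.items()}

# Steps 2-4: deviation from the expected frequency, squared, then summed.
ss = sum((freq - expected_frequency(center)) ** 2 for center, freq in observed.items())
print(f"sum of squared differences = {ss:.4f}")
```

Repeating the calculation with the data split into two groups, each with its own mean and s, gives the sums to compare against the single-population value.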
[Figure: the two frequency vs. length (cm) plots from the previous slide]
[Figure: square difference (expected vs. measured) vs. length (cm)]
The square differences for an assumption of a single population are larger than for the assumption of two individual populations
There are other measurements which describe the two populations
Resolution of two peaks
$R = \dfrac{x_a - x_b}{\dfrac{W_a}{2} + \dfrac{W_b}{2}}$
$x_a$, $x_b$ = mean or average
$W_a$, $W_b$ = baseline width
8/12/2019 2. Statistics Lod Lecture
25/64
0
0.5
1
1.5
2
2.5
3
3.5
4
4.5
1 1.5 2 2.5 3 3.5 4
x
S i g n a l
xa xb
x xa b
W a
2W b
2
n this example
W W a b2 2
Peaks are baseline resolved when R > 1 R x x W W
a b a b 1 2 2:
-
8/12/2019 2. Statistics Lod Lecture
26/64
0
0.2
0.4
0.6
0.8
1
1.2
1.4
1.6
1.8
1 1.5 2 2.5 3 3.5 4
x
S i g n a l
xa xb
x xa b
W a
2W b
2
n this example
W W a b2 2
Peaks are just baselineresolved when R = 1 R x x
W W a b a b 1 2 2:
-
8/12/2019 2. Statistics Lod Lecture
27/64
0
0.2
0.4
0.6
0.8
1
1.2
1.4
1.6
1 1.5 2 2.5 3 3.5 4
x
S i g n a l
xa xb
x xa b
W a
2W b
2
n this example
W W a b2 2
Peaks are not baseline resolvedwhen R < 1 R x x W W
a b a b 1 2 2:
2008 Data
[Figure: frequency vs. length (cm)]
White, N=12, Sum sq diff=0.037
Red, N=12, Sum sq diff=0.11
What is the R for this data?
x W W p R W 12
R 1
Visually less resolved / Visually better resolved
[Figure: Comparison of 1978 Low Lead to 1979 High Lead, % measured vs. IQ Verbal (Series 2, Series 3)]
[Figure: Comparison of 1978 Low Lead to 1978 High Lead, % measured vs. IQ Verbal]
Anonymous 2009 student analysis of Needleman data
$W_a/2 \approx 112 - 70 \approx 42$
$W_b/2 \approx 130 - 95 \approx 35$
$x_a - x_b \approx 112 - 95 \approx 17$
$R = \dfrac{x_a - x_b}{\dfrac{W_a}{2} + \dfrac{W_b}{2}} \approx \dfrac{17}{42 + 35} \approx 0.22$
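The resolution calculation for this example can be sketched as a small function. The function and parameter names are illustrative; `abs()` is added here so that the order of the two peaks does not matter, which is an assumption beyond the slide's formula.

```python
def resolution(x_a, x_b, half_width_a, half_width_b):
    """R = |x_a - x_b| / (W_a/2 + W_b/2), using the half baseline widths."""
    return abs(x_a - x_b) / (half_width_a + half_width_b)


# Approximate values read from the slide's Needleman example.
r = resolution(x_a=112, x_b=95, half_width_a=42, half_width_b=35)
print(f"R ~ {r:.2f}")  # prints "R ~ 0.22"
```

Since R is well below 1, the two populations are not baseline resolved.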
Other measures of the quality of separation of the peaks
1. Limit of detection
2. Limit of quantification
3. Signal to noise (S/N)
[Figure: normal curves for the blank and for the limit of detection, amplitude vs. s; the means X blank and X limit of detection are 3s apart]
$x_{LOD} = x_{blank} + 3 s_{blank}$
More than 99.7% of the observations of the blank will lie below the mean of the first detectable signal (LOD)
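A minimal sketch of the signal limit of detection, x_LOD = x_blank + 3 s_blank. The blank readings below are hypothetical, in arbitrary signal units, and are not from the lecture.

```python
import statistics

# Hypothetical replicate blank readings (arbitrary signal units).
blank = [0.9, 1.1, 1.0, 1.2, 0.8]

x_blank = statistics.mean(blank)
s_blank = statistics.stdev(blank)

# Signal at the limit of detection: blank mean plus three blank standard deviations.
x_lod = x_blank + 3 * s_blank
print(f"x_blank = {x_blank:.3f}, s_blank = {s_blank:.3f}, x_LOD = {x_lod:.3f}")
```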
[Figure: blank and LOD normal curves summed together, amplitude vs. s; means 3s apart]
Two peaks are visible when all the data is summed together
Estimate the LOD (signal) of this data
[Figure: weight (lbs) vs. day]
[Figure: histogram, # of observations vs. weight (lbs)]
Other measures of the quality of separation of the peaks
1. Limit of detection
2. Limit of quantification
3. Signal to noise (S/N)
[Figure: normal curves for the blank and for the limit of quantification, amplitude vs. s; means 9s apart]
$x_{LOQ} = x_{blank} + 9 s_{blank}$ (your book suggests 10)
Limit of quantification requires absolute certainty that no blank is part of the
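The limit of quantification uses the same form as the limit of detection with a larger multiplier. A minimal sketch; the function name `signal_loq` and the blank statistics are hypothetical.

```python
def signal_loq(x_blank, s_blank, k=9):
    """Signal at the limit of quantification: x_blank + k * s_blank."""
    return x_blank + k * s_blank


# Hypothetical blank statistics (arbitrary signal units).
x_blank, s_blank = 1.0, 0.2

print(signal_loq(x_blank, s_blank))        # lecture's factor of 9
print(signal_loq(x_blank, s_blank, k=10))  # the book's factor of 10
```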
Estimate the LOQ (signal) of this data
[Figure: weight (lbs) vs. day]
[Figure: histogram, # of observations vs. weight (lbs)]
Other measures of the quality of separation of the peaks
1. Limit of detection
2. Limit of quantification
3. Signal to noise (S/N)
Signal = $x_{sample} - x_{blank}$
Noise = N = standard deviation, s
$\dfrac{S}{N} = \dfrac{x_{sample} - x_{blank}}{s} = \dfrac{x_{sample} - x_{blank}}{pp/6}$
(This assumes pp school ~ pp baseline)
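A minimal sketch of the S/N estimate with the noise taken as s ~ pp/6. The numbers are hypothetical and chosen only to show the arithmetic; they are not read from the weight plot.

```python
def signal_to_noise(x_sample, x_blank, peak_to_peak):
    """S/N with the noise estimated as s ~ pp / 6."""
    noise = peak_to_peak / 6
    return (x_sample - x_blank) / noise


# Hypothetical values (lbs), chosen only to illustrate the arithmetic.
print(signal_to_noise(x_sample=144.0, x_blank=140.0, peak_to_peak=1.2))
```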
Estimate the S/N of this data
[Figure: weight (lbs) vs. day, with Baseline, Vacation, School Begins, and Signal marked]
Peak to peak variation about the school mean ~ 6s, where s = N for noise
(This assumes pp school ~ pp baseline)
[Figure: histogram, # of observations vs. weight (lbs)]
[Figure: length (cm) vs. sample number]
Can you tell where the switch between red and white potatoes begins?
What is the signal (length of white)?
What is the background (length of red)?
What is the S/N?
Effect of sample size on the measurement
Error curve
Peak height grows with # of measurements.
+/- 1s always has the same proportion of the total number of measurements.
However, the actual value of s decreases as the population grows.
2008 Data
[Figure: red running length average and red running stdev vs. sample number]
[Figure: stdev of red length (cm) vs. sqrt(number of samples); trendline y = -0.8807x + 5.9303, R^2 = 0.9491]
$s_{sample} = \dfrac{s_{population}}{\sqrt{n_{sample}}}$
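The relation s_sample = s_population / sqrt(n_sample) can be tabulated quickly. A minimal sketch with a hypothetical population standard deviation; this quantity is commonly called the standard error of the mean.

```python
import math


def s_of_sample(s_population, n_sample):
    """s_sample = s_population / sqrt(n_sample)."""
    return s_population / math.sqrt(n_sample)


# Hypothetical population standard deviation (cm).
s_population = 3.0
for n in (1, 4, 9, 16):
    print(n, s_of_sample(s_population, n))
```

Quadrupling the number of samples halves the value.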
[Figure: frequency vs. length (cm)]
White, N=12, Sum sq diff=0.037, stdev=2.55
White, N=38, Sum sq diff=0.028, stdev=2.15
Red, N=12, Sum sq diff=0.11, stdev=3.27
Red, N=40, Sum sq diff=0.017, stdev=2.67
Calibration Curve
A calibration curve is based on a selected measurement that is linear in response to the concentration of the analyte.
Or a prediction of a measurement due to some change: can we predict my weight change if I had spent a longer time on vacation?
$y = a + bx$
$\text{lbs}_{Fitch} = a + b\,(\text{days on vacation})$
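A straight-line calibration y = a + bx can be fitted by ordinary least squares. A minimal sketch: the function name `fit_line` and the data points are hypothetical (the points lie exactly on a line so the result is easy to check), and extrapolating beyond the measured range is done here only to illustrate the prediction question.

```python
def fit_line(xs, ys):
    """Least-squares intercept a and slope b for y = a + b*x."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    b = sxy / sxx
    a = y_mean - b * x_mean
    return a, b


# Hypothetical points that lie exactly on y = 140 + 0.5x.
days = [0, 1, 2, 3, 4]
lbs = [140.0, 140.5, 141.0, 141.5, 142.0]

a, b = fit_line(days, lbs)
print(f"lbs = {a:.2f} + {b:.2f} * days")
print("predicted weight after 10 days:", a + b * 10)
```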
[Figure: histogram, # of observations vs. weight (lbs)]
[Figure: Fitch weight (lbs) vs. days on vacation; trendline y = 0.3542x + 140.04, R^2 = 0.7425]
Can get this by using trend line
This is just a trendline
[Figure: stdev of red length (cm) vs. sqrt(number of samples); trendline y = -0.8807x + 5.9303, R^2 = 0.9491]
From format data:
Sample  sqrt(#samples)  stdev
1       1               #DIV/0!
2       1.414213562     2.036468
3       1.732050808     4.475727
4       2               4.31441
5       2.236067977     3.844045
6       2.449489743     3.844604
7       2.645751311     3.735124
8       2.828427125     3.458414
9       3               3.235055
10      3.16227766      3.093053
11      3.31662479      2.935944
12      3.464101615     2.950187

SUMMARY OUTPUT
Regression Statistics
Multiple R         0.296113395
R Square           0.087683143
Adjusted R Square  -0.013685397
Standard Error     0.703143388
Observations       11

ANOVA
            df  SS           MS        F         Significance F
Regression  1   0.427662048  0.427662  0.864994  0.376617
Residual    9   4.449695616  0.494411
Total       10  4.877357664

              Coefficients  Standard Error  t Stat    P-value   Lower 95%
Intercept     3.884015711   0.514960076     7.542363  3.53E-05  2.719094
X Variable 1  -0.06235252   0.067042092     -0.93005  0.376617  -0.21401

Using the analysis data pack, you get an error associated with the intercept
In the best of all worlds you should have a series of blanks that determine the noise associated with the background
$x_{LOD} = x_{blank} + 3 s_{blank}$
Sometimes you forget, so to fall back and punt, estimate the standard deviation of the blank from the linear regression
But remember, in doing this you are acknowledging a failure to plan ahead in your analysis
$x_{LOD} = x_{blank} + b\,[\text{conc. LOD}]$
$[\text{conc. LOD}] = \dfrac{3 s_{blank}}{b}$
Extrapolation of the associated error can be obtained from the linear regression data
Sensitivity (slope) = b
Signal LOD: $x_{LOD} = x_{blank} + 3 s_{blank}$
$x_{blank} + 3 s_{blank} = x_{blank} + b\,[\text{conc. LOD}]$
The concentration LOD depends on BOTH the stdev of the blank and the sensitivity
!!Note!! Signal LOD ≠ Conc. LOD
We want Conc. LOD
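Converting the signal LOD into a concentration LOD divides 3 s_blank by the calibration slope b. A minimal sketch; the function name and the numbers are hypothetical.

```python
def concentration_lod(s_blank, slope):
    """[conc. LOD] = 3 * s_blank / b, where b is the calibration slope."""
    return 3 * s_blank / slope


# Hypothetical values: s_blank in signal units,
# slope in signal units per concentration unit.
print(concentration_lod(s_blank=0.2, slope=4.0))
```

A steeper slope (higher sensitivity) or a quieter blank both lower the concentration LOD.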
Selectivity
[Figure: mV vs. pH or pM for Pb2+ and H+; trendlines y = -31.143x - 74.333 (R^2 = 0.9994) and y = -41x - 118.5 (R^2 = 0.9872)]
Difference in slope is one measure of selectivity
In a perfect method the sensing device would have zero slope for the interfering species
Limit of linearity
5% deviation
Summary: Figures of Merit Thus Far
R = resolution
S/N
LOD = both signal and concentration
LOQ
LOL
Sensitivity (calibration curve slope)
Selectivity (essentially difference in slopes)
Can be expressed in terms of signal, but a better expression is in terms of concentration
Tests: Anova
Why is the limit of detection important?
Why has the limit of detection changed so much in the last 20 years?
The End
[Figure: two panels, % of measurements vs. verbal IQ (Needleman's data)]
Which of these two data sets would be likely to have a better numerical value for the ability to distinguish between two different populations?
2008 Data
[Figure: frequency vs. length (cm); height for normalized bell curve]
White, N=12, Sum sq diff=0.037
Red, N=12, Sum sq diff=0.11
8/12/2019 2. Statistics Lod Lecture
62/64
0
0.05
0.1
0.15
0.2
0.25
0.3
14 19 24 29 34 39
Length (cm)
F r e q u e n c y
White, N=12, Sum sq diff=0.037, stdev=2.55White, N=38, Sum sq diff=0.028, stdev=2.15
Red, N=12, Sum sq diff=0.11, stdev=3.27Red, N=40, Sum sq diff=0.017, stdev-2.67
Increasing the sample size decreases the std dev and increases separationOf the populations, notice that the means also change, will do so untilWe have a reasonable sample of the population
[Figure: % of measurements vs. verbal IQ]
[Figure: % of measurements vs. verbal IQ]