Phlebotomy Training for M-III Students: Statistical Analysis of Test Results
Richard A. McPherson, M.D., M.S.
Phlebotomy Training 2001-2008
• Exercise offered to third-year medical students as part of orientation every year since 2001.
• This is the third year that phlebotomy training has been mandatory.
• Other exercises offered in IV/Foley catheter placement.
Number of Students Submitting Blood Specimens Each Year
• 2001: 103
• 2002: 83
• 2003: 102
• 2004: 87
• 2005: 98
• 2006: 150
• 2007: 147
• 2008: 134
• Total: 904
[Bar chart: count of students submitting specimens, by year, 2001-2008]
Phlebotomy Training 2008
• Wednesday, July 23, 2008, in three separate sessions held in the Medical Sciences Building at 1, 2 and 3 PM.
• A total of 153 students attended the exercise, in which each student collected two tubes of blood from a partner.
• Specimens were successfully collected from 134 students and submitted to the laboratory for simple chemical and hematological measurements.
• Each student's own results were provided under a unique identifying number known only to that student.
[Histogram: count of students by age, 22 to 39 years]
Age: Frequencies (13 levels)
Level      Count   Prob
22 Years   1       0.00746
23 Years   15      0.11194
24 Years   33      0.24627
25 Years   31      0.23134
26 Years   21      0.15672
27 Years   12      0.08955
28 Years   6       0.04478
29 Years   6       0.04478
30 Years   3       0.02239
31 Years   2       0.01493
32 Years   2       0.01493
37 Years   1       0.00746
39 Years   1       0.00746
Total      134     1.00000
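Each Prob value is the level's count divided by the total count (134). A minimal Python sketch of that arithmetic, using the age counts from this table (the names `age_counts` and `relative_frequencies` are illustrative, not taken from the original analysis software):

```python
# Age counts from the 2008 frequency table (age in years -> number of students)
age_counts = {22: 1, 23: 15, 24: 33, 25: 31, 26: 21, 27: 12, 28: 6,
              29: 6, 30: 3, 31: 2, 32: 2, 37: 1, 39: 1}

def relative_frequencies(counts):
    """Return {level: count / total}, i.e., the 'Prob' column."""
    total = sum(counts.values())
    return {level: n / total for level, n in counts.items()}

probs = relative_frequencies(age_counts)
print(sum(age_counts.values()))   # 134
print(round(probs[24], 5))        # 0.24627
```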
[Bar chart: count of students by gender (F, M)]
Gender: Frequencies (2 levels)
Level   Count   Prob
F       66      0.49254
M       68      0.50746
Total   134     1.00000
[Bar chart: count of students by race]
Race: Frequencies (6 levels)
Level           Count   Prob
Asian-Pacific   30      0.22388
Black           4       0.02985
Hispanic        1       0.00746
Other           3       0.02239
Unknown         7       0.05224
White           89      0.66418
Total           134     1.00000
Specimens by Gender
Panel        Male   Female   Total
Chemistry    68     66       134
Hematology   67     62       129
Reasons to Test Student Specimens
• Courtesy to students for participation
• Teach interpretation of laboratory results (i.e., reference ranges) to students
• Evaluate current reference ranges for appropriateness
• Discover previously unknown medical conditions
– Students could opt out of having their blood tested.
• Demonstrate statistical applications
Goal 1. Descriptive Statistics
• Measures of central tendency
– Mean
– Median
– Mode
• Measures of dispersion
– Standard deviation
– Interquartile range (25th to 75th percentile range)
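As a sketch, these measures can be computed for the 2008 age distribution with Python's standard `statistics` module, after expanding the frequency table into one value per student. Quartile conventions differ between packages; `statistics.quantiles` is used here with its default "exclusive" method, so the IQR may not match other software exactly.

```python
import statistics

# Age counts from the 2008 frequency table (age in years -> number of students)
age_counts = {22: 1, 23: 15, 24: 33, 25: 31, 26: 21, 27: 12, 28: 6,
              29: 6, 30: 3, 31: 2, 32: 2, 37: 1, 39: 1}

# Expand the frequency table into one value per student (134 values)
ages = [age for age, n in sorted(age_counts.items()) for _ in range(n)]

# Central tendency
mean = statistics.mean(ages)
median = statistics.median(ages)   # 25
mode = statistics.mode(ages)       # 24

# Dispersion: sample SD (n - 1 denominator) and interquartile range
sd = statistics.stdev(ages)
q1, _, q3 = statistics.quantiles(ages, n=4)  # default 'exclusive' method
iqr = q3 - q1

print(f"mean={mean:.2f} median={median} mode={mode}")
print(f"SD={sd:.2f} Q1={q1} Q3={q3} IQR={iqr}")
```

The mean (about 25.7) sits above the median (25) and mode (24) because a few older students stretch the right tail.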
Before you get going with the analysis,
LOOK AT YOUR DATA!!!**#$!@$%
Strategies for Dealing with Non-normal Distributions
1. Check for outliers
– Extreme cases from errors of recording or entering data
– Individuals that clearly do not belong in the population sampled.
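One common screening rule, not named in the slides, is Tukey's fences: flag values more than 1.5 × IQR below the first quartile or above the third. A minimal sketch, assuming hypothetical MCV values (fL) in which 1000.0 and 18.9 mimic the Method 3 data-entry errors:

```python
import statistics

def tukey_outliers(values, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return sorted(v for v in values if v < low or v > high)

# Hypothetical MCV results (fL); 1000.0 and 18.9 stand in for entry errors
mcv = [84.4, 89.5, 93.6, 88.0, 91.2, 86.7, 1000.0, 18.9, 90.3, 87.1]
print(tukey_outliers(mcv))  # [18.9, 1000.0]
```

Flagged values are candidates for review against the original records, not automatic deletions.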
Example: Checking for Outliers
Four methods evaluated for erythrocyte mean cell volume (MCV) on 131 blood specimens
[Scatter plots comparing MCV methods (axes 60 to 120; one axis extends to 1100)]
Method 1 vs Method 3
Method 3 clearly has data entry errors of 1000.0 and 18.9
Quantiles
100.0%   maximum    1000.0
99.5%               1000.0
97.5%                117.3
90.0%                 99.5
75.0%    quartile     93.6
50.0%    median       89.5
25.0%    quartile     84.4
10.0%                 74.0
2.5%                  64.4
0.5%                  18.9
0.0%     minimum      18.9
Quantiles
100.0%   maximum    123.00
99.5%               123.00
97.5%               108.00
90.0%                98.16
75.0%    quartile    93.40
50.0%    median      88.80
25.0%    quartile    83.80
10.0%                74.64
2.5%                 64.98
0.5%                 58.90
0.0%     minimum     58.90
Method 3 edited to remove incorrect values: the distribution is more nearly normal
[Scatter plots comparing MCV methods after editing Method 3]
Outlier Trimming
• Remove the most extreme upper and lower percentiles of the data, e.g., the lowest and highest 0.5%, so that only values between the 0.5th and 99.5th percentiles are used
• Eliminates what is most likely to be severely atypical information or data entry errors
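The trimming step can be sketched as follows, assuming linear interpolation between ranks for the percentile cut points (one of several conventions) and a hypothetical data set of 199 ordinary values plus one gross error:

```python
def percentile(sorted_vals, p):
    """Linear-interpolation percentile (p in 0..100) of an already sorted list."""
    rank = p / 100 * (len(sorted_vals) - 1)
    lo = int(rank)
    hi = min(lo + 1, len(sorted_vals) - 1)
    frac = rank - lo
    return sorted_vals[lo] + frac * (sorted_vals[hi] - sorted_vals[lo])

def trim_central(values, lower=0.5, upper=99.5):
    """Keep only values between the lower and upper percentiles (inclusive)."""
    s = sorted(values)
    lo_cut, hi_cut = percentile(s, lower), percentile(s, upper)
    return [v for v in s if lo_cut <= v <= hi_cut]

# Hypothetical data: 199 ordinary values plus one gross error
data = list(range(1, 200)) + [1000]
kept = trim_central(data)
print(len(kept), min(kept), max(kept))  # 198 2 199
```

Trimming removes a fixed fraction from each tail whether or not those values are errors, so it is a blunter tool than reviewing individual outliers.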
Serum ALT values trimmed to the central 99 percent
[Histograms of serum ALT before and after trimming]
Strategies for Dealing with Non-normal Distributions
2. If results are skewed, transform them to a more nearly normal scale (logarithm, square root, etc.).
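To see why a transform helps, compare a skewness statistic before and after transforming. A sketch using the moment-based skewness m3 / m2^1.5 (one of several skewness definitions) on hypothetical right-skewed, ALT-like values; skewness near 0 indicates a more symmetric distribution:

```python
import math

def skewness(values):
    """Moment-based sample skewness: m3 / m2**1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

# Hypothetical right-skewed ALT-like values (units assumed to be U/L)
alt = [10, 12, 15, 18, 20, 25, 30, 40, 60, 120]

raw = skewness(alt)
sq = skewness([math.sqrt(v) for v in alt])   # square-root transform
lg = skewness([math.log(v) for v in alt])    # natural-log transform
print(f"raw={raw:.2f} sqrt={sq:.2f} log={lg:.2f}")
```

For these values the square root reduces the skew and the logarithm reduces it further; which transform works best depends on the data.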
[Normal quantile plots: ALT, log ALT, and square root of ALT]
2008 Student Hemoglobin Distribution
[Histogram: count by hemoglobin (g/dL), 11 to 17]
Quantiles
100.0%   maximum    17.3
99.5%               17.3
97.5%               16.9
90.0%               16.2
75.0%    quartile   15.1
50.0%    median     14.3
25.0%    quartile   13.4
10.0%               12.6
2.5%                11.6
0.5%                10.8
0.0%     minimum    10.8

Moments
Mean             14.3
Std Dev          1.32
Std Err Mean     0.12
upper 95% Mean   14.5
lower 95% Mean   14.1
N                129
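The Std Err Mean and the 95% limits in the Moments table follow from the mean, SD, and N. A sketch of that arithmetic; the t critical value of about 1.979 for df = 128 is hard-coded as an assumption, since Python's standard library has no t quantile function:

```python
import math

# Summary statistics reported for 2008 student hemoglobin (g/dL)
mean, sd, n = 14.3, 1.32, 129

sem = sd / math.sqrt(n)   # standard error of the mean
t_crit = 1.979            # approx. two-sided 95% t value for df = n - 1 = 128 (assumed)
lower, upper = mean - t_crit * sem, mean + t_crit * sem
print(f"SEM={sem:.2f}  95% CI=({lower:.1f}, {upper:.1f})")  # SEM=0.12  95% CI=(14.1, 14.5)
```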
[Normal quantile plot and histogram of hemoglobin (g/dL)]
Assessment of Normality of Distribution: Normal Quantile Plot
Parameters: Mean
• Formula for mean
$$\text{mean} = \bar{x} = \frac{x_1 + x_2 + x_3 + \cdots + x_n}{n} = \frac{\sum_{i=1}^{n} x_i}{n}$$
Parameters: Variance
• Formula for variance (variances of independent quantities are additive)
$$\text{variance} = s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$
Parameters: Standard Deviation
• Formula for standard deviation
$$s = \sqrt{s^2} = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$$
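The three formulas translate directly into code. A sketch on hypothetical hemoglobin values, cross-checked against Python's `statistics.variance`, which also divides by n - 1:

```python
import math
import statistics

def mean(xs):
    # x-bar = (x1 + x2 + ... + xn) / n
    return sum(xs) / len(xs)

def variance(xs):
    # s^2 = sum((xi - x-bar)^2) / (n - 1)
    xbar = mean(xs)
    return sum((x - xbar) ** 2 for x in xs) / (len(xs) - 1)

def std_dev(xs):
    # s = sqrt(s^2)
    return math.sqrt(variance(xs))

# Hypothetical hemoglobin values (g/dL)
hgb = [12.6, 13.4, 14.3, 15.1, 16.2]
print(mean(hgb), variance(hgb), std_dev(hgb))

# Cross-check against Python's standard library (also uses n - 1)
assert math.isclose(variance(hgb), statistics.variance(hgb))
```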
Goal 2. Comparative Statistics
• Parametric: assumes the data follow a distribution described by a formula with parameters (e.g., the normal distribution)
– Student t-test
– One-way analysis of variance
• Non-parametric: assumes no particular distribution
– Wilcoxon rank-sum test
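The Wilcoxon rank-sum test replaces values by their ranks in the pooled sample and asks whether one group's ranks are systematically higher. A minimal sketch (equivalent to the Mann-Whitney U test), assuming a large-sample normal approximation for the p-value with no tie or continuity correction, so small-sample results will differ from exact tables:

```python
import math
from statistics import NormalDist

def rank_sum_test(a, b):
    """Wilcoxon rank-sum (Mann-Whitney U) with a normal approximation.

    Tied values receive the average of the ranks they span. No tie or
    continuity correction is applied (a simplification).
    """
    pooled = sorted([(v, 0) for v in a] + [(v, 1) for v in b])
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1          # ranks are 1-based
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1
    w = sum(r for r, (_, g) in zip(ranks, pooled) if g == 0)  # rank sum of a
    n1, n2 = len(a), len(b)
    u = w - n1 * (n1 + 1) / 2                # Mann-Whitney U for group a
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mu) / sigma
    p = 2 * (1 - NormalDist().cdf(abs(z)))   # two-sided
    return w, u, z, p

# Hypothetical hemoglobin values (g/dL) in two small groups
print(rank_sum_test([13.1, 12.8, 13.9, 12.5], [15.0, 14.6, 15.8, 14.9]))
```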
Comparison of Hgb in Females vs Males
[Histograms: hemoglobin (g/dL) in females and in males, axis 10 to 18]
Assumptions for Use of t Test
• Similar numbers in each group
• Similar variances in each group
• Individuals in each group are independent of one another (random selection, non-biased)
• Values are normally distributed.
You want to draw a conclusion (inference) that generalizes to a population larger than your sample. Accordingly, the sample should be representative of that population.
Student’s t Test
• "Student" was the pseudonym of William Sealy Gosset (1876-1937), who developed statistical methods to solve problems at the brewery where he worked (Guinness, in Dublin). He published his work in 1908 in the journal Biometrika. He did not publish under his own name, so that the nature of his work on optimizing production conditions could remain a trade secret.
Student’s t Test
• Principle: Compare the difference between means to the amount of noise (scatter) in the measurements to judge whether the difference in means could be due to chance alone.
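$$t = \frac{\text{signal}}{\text{noise}} = \frac{\text{difference between group means}}{\text{variability of groups}} = \frac{(\text{mean of A}) - (\text{mean of B})}{\sqrt{\dfrac{\text{variance of A}}{\text{number of A}} + \dfrac{\text{variance of B}}{\text{number of B}}}}$$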
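The formula can be applied to the hemoglobin summary statistics. A sketch, assuming the group SDs can be back-calculated from the Std Error column of the confidence-interval table as SE × √n; the result (about 10.9) is close to the reported 10.873, with the gap attributable to the rounded means:

```python
import math

def t_statistic(mean_a, var_a, n_a, mean_b, var_b, n_b):
    """t = (mean A - mean B) / sqrt(var A / n A + var B / n B): signal over noise."""
    signal = mean_a - mean_b
    noise = math.sqrt(var_a / n_a + var_b / n_b)
    return signal / noise

# Group SDs back-calculated from the reported standard errors as SE * sqrt(n)
sd_m = 0.1163 * math.sqrt(67)   # males
sd_f = 0.1209 * math.sqrt(62)   # females
t = t_statistic(15.16, sd_m ** 2, 67, 13.33, sd_f ** 2, 62)
print(round(t, 2))
```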
Comparative Statistics: Student’s t Test
• Hemoglobin mean values: females 13.3 g/dL, males 15.2 g/dL
• Are these mean values truly different from one another?
• Student t-value of 10.873, df = 127, p-value < 0.0001: a difference at least this large would arise by chance alone less than once in 10,000 times if the true means were equal.
[Plot: HGB by gender (F, M)]
Confidence Intervals on Means of the Groups Being Compared: No Overlap
Level    Number   Mean    Std Error   Lower 95% CI   Upper 95% CI
Female   62       13.33   0.1209      13.093         13.572
Male     67       15.16   0.1163      14.927         15.387
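Each interval is mean ± t × SE. A sketch that recomputes the intervals from the rounded means and standard errors, assuming a t critical value of about 1.979 for df = 127, and checks whether they overlap; the results agree with the table to within about 0.003. Non-overlapping 95% intervals imply a significant difference, although overlapping intervals do not by themselves rule one out.

```python
def mean_ci(mean, se, t_crit=1.979):
    """95% CI as mean +/- t * SE; t_crit ~ 1.979 assumes df = 127."""
    return mean - t_crit * se, mean + t_crit * se

female = mean_ci(13.33, 0.1209)
male = mean_ci(15.16, 0.1163)

# Two intervals overlap if each one starts before the other ends
overlap = female[1] >= male[0] and male[1] >= female[0]
print(female, male, overlap)
```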
Comparison of WBC in Females vs Males
• WBC mean values: females 7.1, males 6.6
• Are these mean values truly different from one another?
• Student t-value of 1.787, df = 127, p-value = 0.0764: a difference at least this large would arise by chance alone about 1 time in 13 if the true means were equal.
[Plot: WBC by gender (F, M)]
If we suspect a gender-related difference, how can we show it to have statistical significance? Adjustables:
• Distance between means: use a more discriminating instrument, method, or principle of measurement
• Noise level: use a more precise method with less scatter in measurement
• Accept a higher Type I error rate
• Number of observations: increase N
Goal 3. Power Analysis
Do a power analysis to find the N at which significance (at the specified level) would be achieved under the conditions of a pilot study, provided your estimates of the mean difference (delta) and variance (noise level) are accurate.
Definitions of Statistical Power
• The likelihood of finding a statistically significant difference when a true difference exists. (Online Learning Center)
• The power of a statistical test is the probability that the test will reject a false null hypothesis (that it will not make a Type II error). As power increases, the chances of a Type II error decrease. The probability of a Type II error is referred to as the false negative rate (β). Therefore power is equal to 1 − β. (Wikipedia)
Formula for calculating sample size
• N = total number of subjects (both groups combined; N/2 in each group)
• Zα/2 = parameter for the chance of finding a difference by chance alone (α, usually set to 5 percent) = 1.96
• Zβ = parameter indicating the power of finding a difference (usually set to 80 percent) = 0.84
• δ = the difference between group means (usually obtained from a pilot study or by an informed guess)
• σ = common SD for both groups
$$N = \frac{4\sigma^2 \left(Z_{\alpha/2} + Z_{\beta}\right)^2}{\delta^2}$$
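The formula, and the matching normal-approximation power curve, can be sketched with `statistics.NormalDist`. Here δ is taken as the difference between the rounded WBC means (7.1 - 6.6 = 0.5) and σ as the Sigma in the power analysis table; both are assumptions for illustration. The table's Delta (0.2496) is about half this difference, so it appears to be an effect size on a different scale. Because the formula uses normal rather than t quantiles, it gives about 317 in total, slightly below the roughly 320 read from the power table.

```python
import math
from statistics import NormalDist

def total_sample_size(sigma, delta, alpha=0.05, power=0.80):
    """N = 4 * sigma^2 * (Z_alpha/2 + Z_beta)^2 / delta^2, total for both groups."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)            # 0.84 for 80% power
    return math.ceil(4 * sigma ** 2 * (z_alpha + z_beta) ** 2 / delta ** 2)

def approx_power(n_total, sigma, delta, alpha=0.05):
    """Normal-approximation power of a two-sided, two-group comparison (N/2 per group)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    shift = delta / (2 * sigma) * math.sqrt(n_total)  # delta / SE of the difference
    return NormalDist().cdf(shift - z_alpha)

# WBC example: sigma from the power table, delta = 7.1 - 6.6 (rounded means)
print(total_sample_size(1.586712, 0.5))
print(round(approx_power(319, 1.586712, 0.5), 3))
```

Solving `approx_power(...) = 0.80` for `n_total` gives back the sample-size formula, which is why the two functions agree.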
Power Analysis for WBC vs Gender
Alpha    Sigma      Delta      Number   Power
0.0500   1.586712   0.249608   129      0.4260
0.0500   1.586712   0.249608   139      0.4530
0.0500   1.586712   0.249608   149      0.4792
0.0500   1.586712   0.249608   159      0.5046
0.0500   1.586712   0.249608   169      0.5293
0.0500   1.586712   0.249608   179      0.5530
0.0500   1.586712   0.249608   189      0.5760
0.0500   1.586712   0.249608   199      0.5981
0.0500   1.586712   0.249608   209      0.6193
0.0500   1.586712   0.249608   219      0.6397
0.0500   1.586712   0.249608   229      0.6593
0.0500   1.586712   0.249608   239      0.6780
0.0500   1.586712   0.249608   249      0.6959
0.0500   1.586712   0.249608   259      0.7130
0.0500   1.586712   0.249608   269      0.7294
0.0500   1.586712   0.249608   279      0.7449
0.0500   1.586712   0.249608   289      0.7597
0.0500   1.586712   0.249608   299      0.7738
0.0500   1.586712   0.249608   309      0.7872
0.0500   1.586712   0.249608   319      0.7999
0.0500   1.586712   0.249608   329      0.8119
0.0500   1.586712   0.249608   339      0.8233
0.0500   1.586712   0.249608   349      0.8342
[Power Plot for Gender: power vs. number of subjects; Alpha = 0.05, Sigma = 1.58671, Delta = 0.24961]
Need a total of about 320 subjects to show significance at the 0.05 level with 80% power
Goal 4. How to fit a line
• Least squares regression minimizes the sum of the squared (vertical) distances from the data points to the line (best fit)
• y = ax + b
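A minimal least squares sketch in plain Python, computing the slope a and intercept b of y = ax + b from the means and sums of squares, plus R² from the residuals; the Hgb/Hct pairs are hypothetical:

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b; returns (a, b, r_squared)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx                 # slope
    b = my - a * mx               # intercept: line passes through (mean x, mean y)
    residuals = [y - (a * x + b) for x, y in zip(xs, ys)]
    ss_res = sum(r ** 2 for r in residuals)
    ss_tot = sum((y - my) ** 2 for y in ys)
    return a, b, 1 - ss_res / ss_tot

# Hypothetical Hgb (g/dL) and Hct (%) pairs
hgb = [12.0, 13.0, 14.0, 15.0, 16.0]
hct = [36.5, 38.2, 41.0, 43.1, 45.6]
a, b, r2 = fit_line(hgb, hct)
print(f"Hct = {b:.3f} + {a:.3f} x Hgb, R^2 = {r2:.4f}")
```

Plotting the `residuals` against x is the residual plot used to check homoscedasticity.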
[Scatter plot: HCT vs HGB]
Plot of residuals shows homoscedasticity (uniform scatter of the data over the entire range)
[Residual plot: residual vs HGB]
Hct = 8.890 + 2.284 × Hgb
• R² = 0.9069, so about 91% of the variation in Hct is explained by Hgb
• Intercept = 8.890
– t = 9.55, p < 0.0001
• Slope = 2.284
– t = 35.17, p < 0.0001
• Is this a great fit or what?
• What if Hgb = 0? Then Hct should be 0, not 8.890
So force the line through the origin.
[Scatter plot: HCT vs HGB with the line forced through the origin]
Hct = 0 + 2.901 × Hgb
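With the intercept fixed at 0, least squares has a single parameter, with the closed form a = Σxy / Σx². A sketch on hypothetical Hgb/Hct pairs; as with the slides' fit (2.284 rising to 2.901), forcing the line through the origin steepens the slope, because the line must pivot from the origin up to the data.

```python
def fit_through_origin(xs, ys):
    """Least squares slope for y = a*x with the intercept fixed at 0."""
    # Minimizing sum((y - a*x)^2) over a gives a = sum(x*y) / sum(x^2)
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

# Hypothetical Hgb (g/dL) and Hct (%) pairs
hgb = [12.0, 13.0, 14.0, 15.0, 16.0]
hct = [36.5, 38.2, 41.0, 43.1, 45.6]
slope = fit_through_origin(hgb, hct)
print(f"Hct = 0 + {slope:.3f} x Hgb")
```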
[Scatter plot: MCV vs RBC]
Normal Variants?
[Plot: uric acid by gender (F, M)]
Super Difference by Gender
Gender Different Analytes
[Plots by gender (F, M): sodium; carbon dioxide]
[Plots by gender (F, M): BUN; creatinine]
[Plots by gender (F, M): calcium; magnesium]
[Plots by gender (F, M): albumin; globulins]
[Plots by gender (F, M): AST; ALT]
[Plots by gender (F, M): alkaline phosphatase; total bilirubin]
Variation over Time: Platelets
[Plot: PLT by year, 2001-2008]
Variation over Time: WBC
[Plot: WBC by year, 2001-2008]
Glucose 2008: postprandial (1 to 4 PM)
[Histogram: count by glucose (mg/dL), 50 to 130]
Glucose over the Years
[Plot: glucose by year, 2001-2008]
Acknowledgements
Pathology faculty
• Roger Riley, MD
• Kim Sanford, MD
• Samuel B. Hunter, MD
Resident
• Saud Rahman, MD
Nurse
• Jennifer Anderson, RN
Phlebotomists
• Linda Walker, MT, Supervisor
• Charity Delacruz, CPT
• Rogelio Inocencio, MLT
• Jean Merritt, CPT
• Shirley White, CPT
Test ordering, set-up, and processing
• Caroline Greene, MT
• Susan Handwerk
• June Lee, MT, Evening Supervisor
• Kristina Nilsen, MT
• Millicent Smith, MT
• Karen Tinsley, MT