statistics excellent
![Page 1: Statistics excellent](https://reader038.vdocument.in/reader038/viewer/2022102618/54c66fb64a795913618b461c/html5/thumbnails/1.jpg)
Practical Applications of Statistical Methods in the
Clinical Laboratory
Roger L. Bertholf, Ph.D., DABCC
Associate Professor of Pathology
Director of Clinical Chemistry & Toxicology
UF Health Science Center/Jacksonville
![Page 2: Statistics excellent](https://reader038.vdocument.in/reader038/viewer/2022102618/54c66fb64a795913618b461c/html5/thumbnails/2.jpg)
“[Statistics are] the only tools by which an opening can be cut through the formidable thicket of difficulties that bars the path of those who pursue the Science of Man.”
[Sir] Francis Galton (1822-1911)
![Page 3: Statistics excellent](https://reader038.vdocument.in/reader038/viewer/2022102618/54c66fb64a795913618b461c/html5/thumbnails/3.jpg)
“There are three kinds of lies: Lies, damned lies, and
statistics”
Benjamin Disraeli (1804-1881)
![Page 4: Statistics excellent](https://reader038.vdocument.in/reader038/viewer/2022102618/54c66fb64a795913618b461c/html5/thumbnails/4.jpg)
What are statistics, and what are they used for?
• Descriptive statistics are used to characterize data
• Statistical analysis is used to distinguish between random and meaningful variations
• In the laboratory, we use statistics to monitor and verify method performance, and interpret the results of clinical laboratory tests
![Page 5: Statistics excellent](https://reader038.vdocument.in/reader038/viewer/2022102618/54c66fb64a795913618b461c/html5/thumbnails/5.jpg)
“Do not worry about your difficulties in mathematics, I assure you that mine are greater”
Albert Einstein (1879-1955)
![Page 6: Statistics excellent](https://reader038.vdocument.in/reader038/viewer/2022102618/54c66fb64a795913618b461c/html5/thumbnails/6.jpg)
“I don't believe in mathematics”
Albert Einstein
![Page 7: Statistics excellent](https://reader038.vdocument.in/reader038/viewer/2022102618/54c66fb64a795913618b461c/html5/thumbnails/7.jpg)
Summation function
$$\sum_{i=1}^{N} x_i = x_1 + x_2 + x_3 + \cdots + x_N$$
![Page 8: Statistics excellent](https://reader038.vdocument.in/reader038/viewer/2022102618/54c66fb64a795913618b461c/html5/thumbnails/8.jpg)
Product function
$$\prod_{i=1}^{N} x_i = x_1 \cdot x_2 \cdot x_3 \cdots x_N$$
![Page 9: Statistics excellent](https://reader038.vdocument.in/reader038/viewer/2022102618/54c66fb64a795913618b461c/html5/thumbnails/9.jpg)
The Mean (average)
The mean is a measure of the centrality of a set of data.
![Page 10: Statistics excellent](https://reader038.vdocument.in/reader038/viewer/2022102618/54c66fb64a795913618b461c/html5/thumbnails/10.jpg)
Mean (arithmetical)
$$\bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i$$
![Page 11: Statistics excellent](https://reader038.vdocument.in/reader038/viewer/2022102618/54c66fb64a795913618b461c/html5/thumbnails/11.jpg)
Mean (geometric)
$$\bar{x}_g = \sqrt[N]{x_1 \cdot x_2 \cdot x_3 \cdots x_N} = \sqrt[N]{\prod_{i=1}^{N} x_i}$$
![Page 12: Statistics excellent](https://reader038.vdocument.in/reader038/viewer/2022102618/54c66fb64a795913618b461c/html5/thumbnails/12.jpg)
Use of the Geometric mean:
The geometric mean is primarily used to average ratios or rates of change.
![Page 13: Statistics excellent](https://reader038.vdocument.in/reader038/viewer/2022102618/54c66fb64a795913618b461c/html5/thumbnails/13.jpg)
Mean (harmonic)
![Page 14: Statistics excellent](https://reader038.vdocument.in/reader038/viewer/2022102618/54c66fb64a795913618b461c/html5/thumbnails/14.jpg)
Example of the use of Harmonic mean:
Suppose you spend $6 on pills costing 30 cents per dozen, and $6 on pills costing 20 cents per dozen. What was the average price of the pills you bought?
![Page 15: Statistics excellent](https://reader038.vdocument.in/reader038/viewer/2022102618/54c66fb64a795913618b461c/html5/thumbnails/15.jpg)
Example of the use of Harmonic mean:
You spent $12 on 50 dozen pills, so the average cost is 12/50=0.24, or 24 cents.
This also happens to be the harmonic mean of 20 and 30:
$$\frac{2}{\frac{1}{30} + \frac{1}{20}} = 24$$
![Page 16: Statistics excellent](https://reader038.vdocument.in/reader038/viewer/2022102618/54c66fb64a795913618b461c/html5/thumbnails/16.jpg)
Root mean square (RMS)
$$x_{rms} = \sqrt{\frac{1}{N}\sum_{i=1}^{N} x_i^2} = \sqrt{\frac{x_1^2 + x_2^2 + x_3^2 + \cdots + x_N^2}{N}}$$
![Page 17: Statistics excellent](https://reader038.vdocument.in/reader038/viewer/2022102618/54c66fb64a795913618b461c/html5/thumbnails/17.jpg)
For the data set 1, 2, 3, 4, 5, 6, 7, 8, 9, 10:
Arithmetic mean 5.50
Geometric mean 4.53
Harmonic mean 3.41
Root mean square 6.20
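The four values quoted for this data set can be checked with a few lines of Python. This is a minimal sketch using only the standard library; the helper name `root_mean_square` is an illustrative choice, not something from the slides.

```python
import math
import statistics

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]


def root_mean_square(values):
    """Square root of the mean of the squared values."""
    return math.sqrt(sum(v * v for v in values) / len(values))


arithmetic = statistics.mean(data)
geometric = statistics.geometric_mean(data)  # N-th root of the product
harmonic = statistics.harmonic_mean(data)    # N divided by the sum of reciprocals
rms = root_mean_square(data)

print(f"Arithmetic mean  {arithmetic:.2f}")
print(f"Geometric mean   {geometric:.2f}")
print(f"Harmonic mean    {harmonic:.2f}")
print(f"Root mean square {rms:.2f}")
```

For positive data that are not all equal, the four means always fall in the order harmonic < geometric < arithmetic < RMS, which is the pattern seen in the values above.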
![Page 18: Statistics excellent](https://reader038.vdocument.in/reader038/viewer/2022102618/54c66fb64a795913618b461c/html5/thumbnails/18.jpg)
The Weighted Mean
$$\bar{x}_w = \frac{\sum_{i=1}^{N} x_i w_i}{\sum_{i=1}^{N} w_i}$$
![Page 19: Statistics excellent](https://reader038.vdocument.in/reader038/viewer/2022102618/54c66fb64a795913618b461c/html5/thumbnails/19.jpg)
Other measures of centrality
• Mode
![Page 20: Statistics excellent](https://reader038.vdocument.in/reader038/viewer/2022102618/54c66fb64a795913618b461c/html5/thumbnails/20.jpg)
The Mode
The mode is the value that occurs most often
![Page 21: Statistics excellent](https://reader038.vdocument.in/reader038/viewer/2022102618/54c66fb64a795913618b461c/html5/thumbnails/21.jpg)
Other measures of centrality
• Mode
• Midrange
![Page 22: Statistics excellent](https://reader038.vdocument.in/reader038/viewer/2022102618/54c66fb64a795913618b461c/html5/thumbnails/22.jpg)
The Midrange
The midrange is the mean of the highest and lowest values
![Page 23: Statistics excellent](https://reader038.vdocument.in/reader038/viewer/2022102618/54c66fb64a795913618b461c/html5/thumbnails/23.jpg)
Other measures of centrality
• Mode
• Midrange
• Median
![Page 24: Statistics excellent](https://reader038.vdocument.in/reader038/viewer/2022102618/54c66fb64a795913618b461c/html5/thumbnails/24.jpg)
The Median
The median is the value for which half of the remaining values are above and half are below it. For example, in an ordered array of 15 values, the 8th value is the median. If the array has 16 values, the median is the mean of the 8th and 9th values.
![Page 25: Statistics excellent](https://reader038.vdocument.in/reader038/viewer/2022102618/54c66fb64a795913618b461c/html5/thumbnails/25.jpg)
Example of the use of median vs. mean:
Suppose you’re thinking about building a house in a certain neighborhood, and the real estate agent tells you that the average (mean) size house in that area is 2,500 sq. ft. Astutely, you ask “What’s the median size?” The agent replies “1,800 sq. ft.”
What does this tell you about the sizes of the houses in the neighborhood?
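One way to explore the question is with a small made-up neighborhood. The house sizes below are invented purely for illustration (they are not from the slides); they were chosen so that the mean is 2,500 sq. ft. and the median is 1,800 sq. ft., as in the agent's answers.

```python
import statistics

# Hypothetical house sizes in sq. ft. (illustrative only):
# eight modest houses and one very large one.
sizes = [1500, 1600, 1700, 1800, 1800, 1900, 2000, 2200, 8000]

mean_size = statistics.mean(sizes)
median_size = statistics.median(sizes)

print("mean:  ", mean_size)
print("median:", median_size)
```

In this sketch a single very large house pulls the mean well above the median, while most houses are close to the median size. A mean much larger than the median is a hint that the distribution is skewed toward a few large values.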
![Page 26: Statistics excellent](https://reader038.vdocument.in/reader038/viewer/2022102618/54c66fb64a795913618b461c/html5/thumbnails/26.jpg)
Measuring variance
Two sets of data may have similar means, but otherwise be very dissimilar. For example, males and females have similar baseline LH concentrations, but there is much wider variation in females.
How do we express quantitatively the amount of variation in a data set?
![Page 27: Statistics excellent](https://reader038.vdocument.in/reader038/viewer/2022102618/54c66fb64a795913618b461c/html5/thumbnails/27.jpg)
Mean difference
$$\frac{1}{N}\sum_{i=1}^{N}(x_i - \bar{x}) = \frac{1}{N}\sum_{i=1}^{N} x_i - \frac{1}{N}\sum_{i=1}^{N}\bar{x} = \bar{x} - \bar{x} = 0$$
![Page 28: Statistics excellent](https://reader038.vdocument.in/reader038/viewer/2022102618/54c66fb64a795913618b461c/html5/thumbnails/28.jpg)
The Variance
$$V = \frac{1}{N}\sum_{i=1}^{N}(x_i - \bar{x})^2$$
![Page 29: Statistics excellent](https://reader038.vdocument.in/reader038/viewer/2022102618/54c66fb64a795913618b461c/html5/thumbnails/29.jpg)
The Variance
The variance is the mean of the squared differences between individual data points and the mean of the array.
Or, after simplifying, the mean of the squares minus the squared mean.
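The two descriptions of the variance can be compared numerically. The sketch below reuses the data set 1 through 10 from the earlier slide as an assumption of convenience, and also shows that the mean of the signed deviations is zero, which is why the deviations are squared.

```python
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
n = len(data)
mean = sum(data) / n

# Mean of the signed deviations from the mean: zero (up to rounding).
mean_deviation = sum(x - mean for x in data) / n

# Definition: mean of the squared deviations from the mean.
variance_by_definition = sum((x - mean) ** 2 for x in data) / n

# Shortcut: mean of the squares minus the squared mean.
variance_by_shortcut = sum(x * x for x in data) / n - mean ** 2

print(mean_deviation, variance_by_definition, variance_by_shortcut)
```

Both routes give the same variance. The shortcut form needs only running totals of x and x², which is convenient for hand calculation, although it can lose precision when the mean is large relative to the spread.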
![Page 30: Statistics excellent](https://reader038.vdocument.in/reader038/viewer/2022102618/54c66fb64a795913618b461c/html5/thumbnails/30.jpg)
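The Variance
$$\begin{aligned}
V &= \frac{1}{N}\sum_{i=1}^{N}(x_i - \bar{x})^2 \\
  &= \frac{1}{N}\sum_{i}\left(x_i^2 - 2x_i\bar{x} + \bar{x}^2\right) \\
  &= \frac{1}{N}\sum_{i} x_i^2 - 2\bar{x}\,\frac{1}{N}\sum_{i} x_i + \frac{1}{N}\sum_{i}\bar{x}^2 \\
  &= \overline{x^2} - 2\bar{x}^2 + \bar{x}^2 \\
  &= \overline{x^2} - \bar{x}^2
\end{aligned}$$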
![Page 31: Statistics excellent](https://reader038.vdocument.in/reader038/viewer/2022102618/54c66fb64a795913618b461c/html5/thumbnails/31.jpg)
The Variance
In what units is the variance?
Is that a problem?
![Page 32: Statistics excellent](https://reader038.vdocument.in/reader038/viewer/2022102618/54c66fb64a795913618b461c/html5/thumbnails/32.jpg)
The Standard Deviation
$$\sigma = \sqrt{V} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(x_i - \bar{x})^2}$$
![Page 33: Statistics excellent](https://reader038.vdocument.in/reader038/viewer/2022102618/54c66fb64a795913618b461c/html5/thumbnails/33.jpg)
The Standard Deviation
The standard deviation is the square root of the variance. Standard deviation is not the mean difference between individual data points and the mean of the array.
$$\frac{1}{N}\sum\left|x_i - \bar{x}\right| \neq \sqrt{\frac{1}{N}\sum(x_i - \bar{x})^2}$$
![Page 34: Statistics excellent](https://reader038.vdocument.in/reader038/viewer/2022102618/54c66fb64a795913618b461c/html5/thumbnails/34.jpg)
The Standard Deviation
In what units is the standard deviation?
Is that a problem?
![Page 35: Statistics excellent](https://reader038.vdocument.in/reader038/viewer/2022102618/54c66fb64a795913618b461c/html5/thumbnails/35.jpg)
The Coefficient of Variation*
$$CV = \frac{\sigma}{\bar{x}} \times 100$$
*Sometimes called the Relative Standard Deviation (RSD or %RSD)
![Page 36: Statistics excellent](https://reader038.vdocument.in/reader038/viewer/2022102618/54c66fb64a795913618b461c/html5/thumbnails/36.jpg)
Standard Deviation (or Error) of the Mean
The standard deviation of an average decreases in proportion to the reciprocal of the square root of the number of data points used to calculate the average.
$$\sigma_{\bar{x}} = \frac{\sigma_x}{\sqrt{N}}$$
![Page 37: Statistics excellent](https://reader038.vdocument.in/reader038/viewer/2022102618/54c66fb64a795913618b461c/html5/thumbnails/37.jpg)
Exercises
How many measurements must we average to improve our precision by a factor of 2?
![Page 38: Statistics excellent](https://reader038.vdocument.in/reader038/viewer/2022102618/54c66fb64a795913618b461c/html5/thumbnails/38.jpg)
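Answer
To improve precision by a factor of 2:
$$\frac{1}{\sqrt{N}} = \frac{1}{2} = 0.5 \quad\Rightarrow\quad \sqrt{N} = \frac{1}{0.5} = 2 \quad\Rightarrow\quad N = 2^2 = 4 \ \text{(quadruplicate)}$$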
![Page 39: Statistics excellent](https://reader038.vdocument.in/reader038/viewer/2022102618/54c66fb64a795913618b461c/html5/thumbnails/39.jpg)
Exercises
• How many measurements must we average to improve our precision by a factor of 2?
• How many to improve our precision by a factor of 10?
![Page 40: Statistics excellent](https://reader038.vdocument.in/reader038/viewer/2022102618/54c66fb64a795913618b461c/html5/thumbnails/40.jpg)
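Answer
To improve precision by a factor of 10:
$$\frac{1}{\sqrt{N}} = \frac{1}{10} = 0.1 \quad\Rightarrow\quad \sqrt{N} = \frac{1}{0.1} = 10 \quad\Rightarrow\quad N = 10^2 = 100 \ \text{times!}$$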
![Page 41: Statistics excellent](https://reader038.vdocument.in/reader038/viewer/2022102618/54c66fb64a795913618b461c/html5/thumbnails/41.jpg)
Exercises
• How many measurements must we average to improve our precision by a factor of 2?
• How many to improve our precision by a factor of 10?
• If an assay has a CV of 7%, and we decide to run samples in duplicate and average the measurements, what should the resulting CV be?
![Page 42: Statistics excellent](https://reader038.vdocument.in/reader038/viewer/2022102618/54c66fb64a795913618b461c/html5/thumbnails/42.jpg)
Answer
Improvement in CV by running duplicates:
$$CV_{dup} = \frac{CV}{\sqrt{2}} = \frac{7}{1.414} \approx 4.9\%$$
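The three exercises all follow from the 1/√N rule for the standard deviation of a mean. The sketch below is one way to express that rule in Python; the function names `cv_of_mean` and `replicates_needed` are hypothetical helpers introduced here for illustration.

```python
import math


def cv_of_mean(cv_single, n_replicates):
    """CV of the average of n replicates, given the CV of one measurement."""
    return cv_single / math.sqrt(n_replicates)


def replicates_needed(improvement_factor):
    """Smallest N for which the mean is `improvement_factor` times more precise."""
    return math.ceil(improvement_factor ** 2)


print(replicates_needed(2))    # factor of 2
print(replicates_needed(10))   # factor of 10
print(f"{cv_of_mean(7, 2):.2f}%")  # 7% assay run in duplicate
```

The last line evaluates to about 4.95%, which the slide reports as 4.9%. Because precision improves only as the square root of N, each further gain costs disproportionately more measurements.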
![Page 43: Statistics excellent](https://reader038.vdocument.in/reader038/viewer/2022102618/54c66fb64a795913618b461c/html5/thumbnails/43.jpg)
Population vs. Sample standard deviation
• When we speak of a population, we’re referring to the entire data set, which will have a mean μ:
Population mean:
$$\mu = \frac{1}{N}\sum_{i} x_i$$
![Page 44: Statistics excellent](https://reader038.vdocument.in/reader038/viewer/2022102618/54c66fb64a795913618b461c/html5/thumbnails/44.jpg)
Population vs. Sample standard deviation
• When we speak of a population, we’re referring to the entire data set, which will have a mean μ
• When we speak of a sample, we’re referring to a subset of the population, whose mean is customarily designated x̄ (“x-bar”)
• Which is used to calculate the standard deviation?
![Page 45: Statistics excellent](https://reader038.vdocument.in/reader038/viewer/2022102618/54c66fb64a795913618b461c/html5/thumbnails/45.jpg)
“Sir, I have found you an argument. I am not obliged to find you an understanding.”
Samuel Johnson (1709-1784)
![Page 46: Statistics excellent](https://reader038.vdocument.in/reader038/viewer/2022102618/54c66fb64a795913618b461c/html5/thumbnails/46.jpg)
Population vs. Sample standard deviation
$$\sigma = \sqrt{\frac{1}{N}\sum_{i}(x_i - \mu)^2}$$
$$s = \sqrt{\frac{1}{N-1}\sum_{i}(x_i - \bar{x})^2}$$
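Python's standard `statistics` module provides both forms, which makes the difference between dividing by N and by N − 1 easy to see. The data set below (1 through 10) is reused from the earlier slide as an assumption for illustration.

```python
import statistics

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

# Population standard deviation: divides the sum of squares by N.
sigma = statistics.pstdev(data)

# Sample standard deviation: divides the sum of squares by N - 1.
s = statistics.stdev(data)

print(f"population sigma = {sigma:.3f}")
print(f"sample s         = {s:.3f}")
```

The sample value is always the larger of the two, and the gap shrinks as N grows. When the data are a sample drawn from a larger population, `stdev` (N − 1) is the appropriate choice.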
![Page 47: Statistics excellent](https://reader038.vdocument.in/reader038/viewer/2022102618/54c66fb64a795913618b461c/html5/thumbnails/47.jpg)
Distributions
• Definition
![Page 48: Statistics excellent](https://reader038.vdocument.in/reader038/viewer/2022102618/54c66fb64a795913618b461c/html5/thumbnails/48.jpg)
Statistical (probability) Distribution
• A statistical distribution is a mathematically-derived probability function that can be used to predict the characteristics of certain applicable real populations
• Statistical methods based on probability distributions are parametric, since certain assumptions are made about the data
![Page 49: Statistics excellent](https://reader038.vdocument.in/reader038/viewer/2022102618/54c66fb64a795913618b461c/html5/thumbnails/49.jpg)
Distributions
• Definition
• Examples
![Page 50: Statistics excellent](https://reader038.vdocument.in/reader038/viewer/2022102618/54c66fb64a795913618b461c/html5/thumbnails/50.jpg)
Binomial distribution
The binomial distribution applies to events that have two possible outcomes. The probability of r successes in n attempts, when the probability of success in any individual attempt is p, is given by:
$$P(r; p, n) = \frac{n!}{r!\,(n-r)!}\,p^r (1-p)^{n-r}$$
![Page 51: Statistics excellent](https://reader038.vdocument.in/reader038/viewer/2022102618/54c66fb64a795913618b461c/html5/thumbnails/51.jpg)
Example
What is the probability that 10 of the 12 babies born one busy evening in your hospital will be girls?
![Page 52: Statistics excellent](https://reader038.vdocument.in/reader038/viewer/2022102618/54c66fb64a795913618b461c/html5/thumbnails/52.jpg)
Solution
$$P(10; 0.5, 12) = \frac{12!}{10!\,(12-10)!}\,0.5^{10}\,(1-0.5)^{12-10} = 0.016 \ \text{or} \ 1.6\%$$
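The same calculation can be sketched in Python with `math.comb` for the binomial coefficient. The function name `binomial_probability` is an illustrative choice, and p = 0.5 for a girl is the assumption used in the slide's solution.

```python
import math


def binomial_probability(r, n, p):
    """Probability of exactly r successes in n attempts with success probability p."""
    return math.comb(n, r) * p ** r * (1 - p) ** (n - r)


p_ten_girls = binomial_probability(10, 12, 0.5)
print(f"{p_ten_girls:.3f}")
```

Note that this is the probability of exactly 10 girls. The probability of 10 or more would also add the terms for r = 11 and r = 12.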
![Page 53: Statistics excellent](https://reader038.vdocument.in/reader038/viewer/2022102618/54c66fb64a795913618b461c/html5/thumbnails/53.jpg)
Distributions
• Definition
• Examples
  – Binomial
![Page 54: Statistics excellent](https://reader038.vdocument.in/reader038/viewer/2022102618/54c66fb64a795913618b461c/html5/thumbnails/54.jpg)
“God does arithmetic”
Karl Friedrich Gauss (1777-1855)
![Page 55: Statistics excellent](https://reader038.vdocument.in/reader038/viewer/2022102618/54c66fb64a795913618b461c/html5/thumbnails/55.jpg)
The Gaussian Distribution
What is the Gaussian distribution?
![Page 56: Statistics excellent](https://reader038.vdocument.in/reader038/viewer/2022102618/54c66fb64a795913618b461c/html5/thumbnails/56.jpg)
[Figure: a list of random numbers (63, 81, 36, 12, 28, 77, 95, 29, 61, 72, 24, 61, 85, etc.)]
![Page 57: Statistics excellent](https://reader038.vdocument.in/reader038/viewer/2022102618/54c66fb64a795913618b461c/html5/thumbnails/57.jpg)
[Figure: histogram of frequency (F) versus number, over the range 1 to 100]
![Page 58: Statistics excellent](https://reader038.vdocument.in/reader038/viewer/2022102618/54c66fb64a795913618b461c/html5/thumbnails/58.jpg)
[Figure: two lists of random numbers added pairwise to give a list of sums]
![Page 59: Statistics excellent](https://reader038.vdocument.in/reader038/viewer/2022102618/54c66fb64a795913618b461c/html5/thumbnails/59.jpg)
[Figure: histogram of frequency (F) versus number, over the range 2 to 200]
![Page 60: Statistics excellent](https://reader038.vdocument.in/reader038/viewer/2022102618/54c66fb64a795913618b461c/html5/thumbnails/60.jpg)
. . . etc.
![Page 61: Statistics excellent](https://reader038.vdocument.in/reader038/viewer/2022102618/54c66fb64a795913618b461c/html5/thumbnails/61.jpg)
[Figure: probability versus x]
![Page 62: Statistics excellent](https://reader038.vdocument.in/reader038/viewer/2022102618/54c66fb64a795913618b461c/html5/thumbnails/62.jpg)
The Gaussian Probability Function
The probability of x in a Gaussian distribution with mean μ and standard deviation σ is given by:
$$P(x; \mu, \sigma) = \frac{1}{\sigma\sqrt{2\pi}}\,e^{-(x-\mu)^2/2\sigma^2}$$
![Page 63: Statistics excellent](https://reader038.vdocument.in/reader038/viewer/2022102618/54c66fb64a795913618b461c/html5/thumbnails/63.jpg)
The Gaussian Distribution
• What is the Gaussian distribution?
• What types of data fit a Gaussian distribution?
![Page 64: Statistics excellent](https://reader038.vdocument.in/reader038/viewer/2022102618/54c66fb64a795913618b461c/html5/thumbnails/64.jpg)
“Like the ski resort full of girls hunting for husbands and husbands hunting for girls, the situation is not as symmetrical as it might seem.”
Alan Lindsay Mackay (1926- )
![Page 65: Statistics excellent](https://reader038.vdocument.in/reader038/viewer/2022102618/54c66fb64a795913618b461c/html5/thumbnails/65.jpg)
Are these Gaussian?
• Human height
• Outside temperature
• Raindrop size
• Blood glucose concentration
• Serum CK activity
• QC results
• Proficiency results
![Page 66: Statistics excellent](https://reader038.vdocument.in/reader038/viewer/2022102618/54c66fb64a795913618b461c/html5/thumbnails/66.jpg)
The Gaussian Distribution
• What is the Gaussian distribution?
• What types of data fit a Gaussian distribution?
• What is the advantage of using a Gaussian distribution?
![Page 67: Statistics excellent](https://reader038.vdocument.in/reader038/viewer/2022102618/54c66fb64a795913618b461c/html5/thumbnails/67.jpg)
Gaussian probability distribution
[Figure: Gaussian curve of probability versus x, with the x-axis marked at µ−3σ, µ−2σ, µ−σ, µ, µ+σ, µ+2σ, and µ+3σ; the central areas are labeled .67 and .95]
![Page 68: Statistics excellent](https://reader038.vdocument.in/reader038/viewer/2022102618/54c66fb64a795913618b461c/html5/thumbnails/68.jpg)
What are the odds of an observation . . .
• more than 1σ from the mean (+/-)
• more than 2σ greater than the mean
• more than 3σ from the mean
![Page 69: Statistics excellent](https://reader038.vdocument.in/reader038/viewer/2022102618/54c66fb64a795913618b461c/html5/thumbnails/69.jpg)
Some useful Gaussian probabilities

| Range | Probability | Odds (outside range) |
|---|---|---|
| +/- 1.00σ | 68.3% | 1 in 3 |
| +/- 1.64σ | 90.0% | 1 in 10 |
| +/- 1.96σ | 95.0% | 1 in 20 |
| +/- 2.58σ | 99.0% | 1 in 100 |
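These probabilities can be reproduced from the error function in Python's standard library. The sketch below assumes a two-sided range of ±kσ about the mean; `probability_within` is a hypothetical helper name.

```python
import math


def probability_within(k):
    """Probability that a Gaussian observation lies within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))


for k in (1.00, 1.64, 1.96, 2.58):
    inside = probability_within(k)
    outside = 1 - inside
    print(f"+/- {k:.2f} sigma: {inside:.1%} inside, about 1 in {1 / outside:.0f} outside")
```

The same function answers one-sided questions as well: the probability of an observation more than kσ above the mean is half of `1 - probability_within(k)`.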
![Page 70: Statistics excellent](https://reader038.vdocument.in/reader038/viewer/2022102618/54c66fb64a795913618b461c/html5/thumbnails/70.jpg)
Example
[Figure: two items labeled “This” and “That”]
![Page 71: Statistics excellent](https://reader038.vdocument.in/reader038/viewer/2022102618/54c66fb64a795913618b461c/html5/thumbnails/71.jpg)
[On the Gaussian curve] “Experimentalists think that it is a mathematical theorem while the mathematicians believe it to be an experimental fact.”
Gabriel Lippmann (1845-1921)
![Page 72: Statistics excellent](https://reader038.vdocument.in/reader038/viewer/2022102618/54c66fb64a795913618b461c/html5/thumbnails/72.jpg)
Distributions
• Definition
• Examples
  – Binomial
  – Gaussian
![Page 73: Statistics excellent](https://reader038.vdocument.in/reader038/viewer/2022102618/54c66fb64a795913618b461c/html5/thumbnails/73.jpg)
"Life is good for only two things, discovering
mathematics and teaching mathematics"
Siméon Poisson (1781-1840)
![Page 74: Statistics excellent](https://reader038.vdocument.in/reader038/viewer/2022102618/54c66fb64a795913618b461c/html5/thumbnails/74.jpg)
The Poisson Distribution
The Poisson distribution predicts the frequency of r events occurring randomly in time, when the expected frequency is μ:
$$P(r; \mu) = \frac{\mu^r e^{-\mu}}{r!}$$
![Page 75: Statistics excellent](https://reader038.vdocument.in/reader038/viewer/2022102618/54c66fb64a795913618b461c/html5/thumbnails/75.jpg)
Examples of events described by a Poisson distribution
• Lightning
• Accidents
• Laboratory?
![Page 76: Statistics excellent](https://reader038.vdocument.in/reader038/viewer/2022102618/54c66fb64a795913618b461c/html5/thumbnails/76.jpg)
A very useful property of the Poisson distribution
$$V(r) = \mu, \quad \text{so} \quad \sigma = \sqrt{\mu}$$
![Page 77: Statistics excellent](https://reader038.vdocument.in/reader038/viewer/2022102618/54c66fb64a795913618b461c/html5/thumbnails/77.jpg)
Using the Poisson distribution
How many counts must be collected in an RIA in order to ensure an analytical CV of 5% or less?
![Page 78: Statistics excellent](https://reader038.vdocument.in/reader038/viewer/2022102618/54c66fb64a795913618b461c/html5/thumbnails/78.jpg)
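Answer
Since $CV = \frac{\sigma}{\bar{x}}(100)$ and $\sigma = \sqrt{\bar{x}}$:
$$\frac{\sqrt{\bar{x}}}{\bar{x}} = \frac{1}{\sqrt{\bar{x}}} = 0.05 \quad\Rightarrow\quad \bar{x} = 400 \ \text{counts}$$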
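Because the variance of a Poisson count equals its mean, the CV of a count of N events is 100/√N percent, and the required number of counts follows directly. A minimal sketch, with `counts_for_cv` as a hypothetical helper name:

```python
import math


def counts_for_cv(target_cv_percent):
    """Minimum number of counts for which the Poisson counting CV is at or below the target."""
    return math.ceil((100 / target_cv_percent) ** 2)


print(counts_for_cv(5))  # counts needed for a CV of 5% or less
print(counts_for_cv(1))  # counts needed for a CV of 1% or less
```

This covers counting error only. Other sources of imprecision in the assay (pipetting, separation, and so on) add to the total CV and are not reduced by counting longer.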
![Page 79: Statistics excellent](https://reader038.vdocument.in/reader038/viewer/2022102618/54c66fb64a795913618b461c/html5/thumbnails/79.jpg)
Distributions
• Definition
• Examples
  – Binomial
  – Gaussian
  – Poisson
![Page 80: Statistics excellent](https://reader038.vdocument.in/reader038/viewer/2022102618/54c66fb64a795913618b461c/html5/thumbnails/80.jpg)
The Student’s t Distribution
When a small sample is selected from a large population, we sometimes have to make certain assumptions in order to apply statistical methods
![Page 81: Statistics excellent](https://reader038.vdocument.in/reader038/viewer/2022102618/54c66fb64a795913618b461c/html5/thumbnails/81.jpg)
Questions about our sample
• Is the mean of our sample, x̄, the same as the mean of the population, μ?
• Is the standard deviation of our sample, s, the same as the standard deviation for the population, σ?
• Unless we can answer both of these questions affirmatively, we don’t know whether our sample has the same distribution as the population from which it was drawn.
![Page 82: Statistics excellent](https://reader038.vdocument.in/reader038/viewer/2022102618/54c66fb64a795913618b461c/html5/thumbnails/82.jpg)
Recall that the Gaussian distribution is defined by the probability function:
$$P(x; \mu, \sigma) = \frac{1}{\sigma\sqrt{2\pi}}\,e^{-(x-\mu)^2/2\sigma^2}$$
Note that the exponential factor contains both μ and σ, both population parameters. The factor is often simplified by making the substitution:
$$z = \frac{(x - \mu)}{\sigma}$$
![Page 83: Statistics excellent](https://reader038.vdocument.in/reader038/viewer/2022102618/54c66fb64a795913618b461c/html5/thumbnails/83.jpg)
The variable z in the equation:
$$z = \frac{(x - \mu)}{\sigma}$$
is distributed according to a unit Gaussian, since it has a mean of zero and a standard deviation of 1.
![Page 84: Statistics excellent](https://reader038.vdocument.in/reader038/viewer/2022102618/54c66fb64a795913618b461c/html5/thumbnails/84.jpg)
Gaussian probability distribution
[Figure: unit Gaussian curve of probability versus z, with the z-axis marked from -3 to 3; the central areas are labeled .67 and .95]
![Page 85: Statistics excellent](https://reader038.vdocument.in/reader038/viewer/2022102618/54c66fb64a795913618b461c/html5/thumbnails/85.jpg)
But if we use the sample mean and standard deviation instead, we get:
$$t = \frac{(x - \bar{x})}{s}$$
and we’ve defined a new quantity, t, which is not distributed according to the unit Gaussian. It is distributed according to the Student’s t distribution.
![Page 86: Statistics excellent](https://reader038.vdocument.in/reader038/viewer/2022102618/54c66fb64a795913618b461c/html5/thumbnails/86.jpg)
Important features of the Student’s t distribution
• Use of the t statistic assumes that the parent distribution is Gaussian
• The degree to which the t distribution approximates a Gaussian distribution depends on N (the degrees of freedom)
• As N gets larger (above 30 or so), the differences between t and z become negligible
![Page 87: Statistics excellent](https://reader038.vdocument.in/reader038/viewer/2022102618/54c66fb64a795913618b461c/html5/thumbnails/87.jpg)
Application of Student’s t distribution to a sample mean
The Student’s t statistic can also be used to analyze differences between the sample mean and the population mean:
$$t = \frac{(\bar{x} - \mu)}{s/\sqrt{N}}$$
![Page 88: Statistics excellent](https://reader038.vdocument.in/reader038/viewer/2022102618/54c66fb64a795913618b461c/html5/thumbnails/88.jpg)
Comparison of Student’s t and Gaussian distributions
Note that, for a sufficiently large N (>30), t can be replaced with z, and a Gaussian distribution can be assumed
![Page 89: Statistics excellent](https://reader038.vdocument.in/reader038/viewer/2022102618/54c66fb64a795913618b461c/html5/thumbnails/89.jpg)
Exercise
The mean age of the 20 participants in one workshop is 27 years, with a standard deviation of 4 years. Next door, another workshop has 16 participants with a mean age of 29 years and standard deviation of 6 years.
Is the second workshop attracting older technologists?
![Page 90: Statistics excellent](https://reader038.vdocument.in/reader038/viewer/2022102618/54c66fb64a795913618b461c/html5/thumbnails/90.jpg)
Preliminary analysis
• Is the population Gaussian?
• Can we use a Gaussian distribution for our sample?
• What statistic should we calculate?
![Page 91: Statistics excellent](https://reader038.vdocument.in/reader038/viewer/2022102618/54c66fb64a795913618b461c/html5/thumbnails/91.jpg)
Solution
First, calculate the t statistic for the two means:
$$t = \frac{(\bar{x}_2 - \bar{x}_1)}{\sqrt{\dfrac{s_1^2}{N_1} + \dfrac{s_2^2}{N_2}}} = \frac{(29 - 27)}{\sqrt{\dfrac{4^2}{20} + \dfrac{6^2}{16}}} = 1.15$$
![Page 92: Statistics excellent](https://reader038.vdocument.in/reader038/viewer/2022102618/54c66fb64a795913618b461c/html5/thumbnails/92.jpg)
Solution, cont.
Next, determine the degrees of freedom:
$$N_{df} = N_1 + N_2 - 2 = 20 + 16 - 2 = 34$$
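The workshop comparison can be checked with a short calculation. This is a sketch of the two-sample t statistic as described in the exercise, pairing each workshop's standard deviation with its own number of participants; the function name `two_sample_t` is a hypothetical helper.

```python
import math


def two_sample_t(mean1, s1, n1, mean2, s2, n2):
    """t statistic for the difference between two sample means."""
    standard_error = math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)
    return (mean2 - mean1) / standard_error


# Workshop 1: 20 participants, mean 27 years, SD 4 years.
# Workshop 2: 16 participants, mean 29 years, SD 6 years.
t = two_sample_t(27, 4, 20, 29, 6, 16)
degrees_of_freedom = 20 + 16 - 2

print(f"t = {t:.2f}, df = {degrees_of_freedom}")
```

The resulting t is then compared against a tabulated critical value for the chosen confidence level and degrees of freedom.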
![Page 93: Statistics excellent](https://reader038.vdocument.in/reader038/viewer/2022102618/54c66fb64a795913618b461c/html5/thumbnails/93.jpg)
Statistical Tables

| df | t0.050 | t0.025 | t0.010 |
|---|---|---|---|
| - | - | - | - |
| 34 | 1.645 | 1.960 | 2.326 |
| - | - | - | - |
![Page 94: Statistics excellent](https://reader038.vdocument.in/reader038/viewer/2022102618/54c66fb64a795913618b461c/html5/thumbnails/94.jpg)
Conclusion
Since 1.15 is less than 1.645 (the t value corresponding to the 90% confidence limit), the difference between the mean ages for the participants in the two workshops is not significant.
![Page 95: Statistics excellent](https://reader038.vdocument.in/reader038/viewer/2022102618/54c66fb64a795913618b461c/html5/thumbnails/95.jpg)
The Paired t Test
Suppose we are comparing two sets of data in which each value in one set has a corresponding value in the other. Instead of calculating the difference between the means of the two sets, we can calculate the mean difference between data pairs.
![Page 96: Statistics excellent](https://reader038.vdocument.in/reader038/viewer/2022102618/54c66fb64a795913618b461c/html5/thumbnails/96.jpg)
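Instead of:
$$(\bar{x}_1 - \bar{x}_2)$$
we use:
$$\overline{(x_1 - x_2)} = \frac{1}{N}\sum_{i=1}^{N}(x_{1i} - x_{2i})$$
to calculate t:
$$t = \frac{\overline{(x_1 - x_2)}}{\sqrt{\dfrac{s_d^2}{N}}}$$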
![Page 97: Statistics excellent](https://reader038.vdocument.in/reader038/viewer/2022102618/54c66fb64a795913618b461c/html5/thumbnails/97.jpg)
Advantage of the Paired t
If the type of data permit paired analysis, the paired t test is much more sensitive than the unpaired t.
Why?
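One way to explore the question is to run both tests on the same numbers. The paired results below are invented for illustration only (they are not from the slides): five specimens spanning a wide concentration range, each measured by two hypothetical methods, with method B reading slightly higher every time.

```python
import math
import statistics

# Hypothetical paired results (illustrative only).
method_a = [4.0, 5.0, 6.0, 7.0, 8.0]
method_b = [4.2, 5.1, 6.3, 7.2, 8.2]
n = len(method_a)

# Paired t: based on the standard deviation of the pairwise differences.
differences = [b - a for a, b in zip(method_a, method_b)]
mean_difference = statistics.mean(differences)
s_d = statistics.stdev(differences)
t_paired = mean_difference / (s_d / math.sqrt(n))

# Unpaired t: based on the standard deviations of the two sets themselves.
standard_error = math.sqrt(
    statistics.variance(method_a) / n + statistics.variance(method_b) / n
)
t_unpaired = (statistics.mean(method_b) - statistics.mean(method_a)) / standard_error

print(f"paired t   = {t_paired:.2f}")
print(f"unpaired t = {t_unpaired:.2f}")
```

In this sketch the unpaired denominator is dominated by the spread between specimens, which has nothing to do with the method difference. Pairing removes that specimen-to-specimen variation, leaving only the variation of the differences, so a small but consistent bias stands out.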
![Page 98: Statistics excellent](https://reader038.vdocument.in/reader038/viewer/2022102618/54c66fb64a795913618b461c/html5/thumbnails/98.jpg)
Applications of the Paired t
• Method correlation
• Comparison of therapies
![Page 99: Statistics excellent](https://reader038.vdocument.in/reader038/viewer/2022102618/54c66fb64a795913618b461c/html5/thumbnails/99.jpg)
Distributions
• Definition
• Examples
  – Binomial
  – Gaussian
  – Poisson
  – Student’s t
![Page 100: Statistics excellent](https://reader038.vdocument.in/reader038/viewer/2022102618/54c66fb64a795913618b461c/html5/thumbnails/100.jpg)
The χ² (Chi-square) Distribution
There is a general formula that relates actual measurements to their predicted values:
$$\chi^2 = \sum_{i=1}^{N}\frac{[y_i - f(x_i)]^2}{\sigma_i^2}$$
![Page 101: Statistics excellent](https://reader038.vdocument.in/reader038/viewer/2022102618/54c66fb64a795913618b461c/html5/thumbnails/101.jpg)
The χ² (Chi-square) Distribution
A special (and very useful) application of the χ² distribution is to frequency data:
$$\chi^2 = \sum_{i=1}^{N}\frac{(n_i - f_i)^2}{f_i}$$
![Page 102: Statistics excellent](https://reader038.vdocument.in/reader038/viewer/2022102618/54c66fb64a795913618b461c/html5/thumbnails/102.jpg)
Exercise
In your hospital, you have had 83 cases of iatrogenic strep infection in your last 725 patients. St. Elsewhere, across town, reports 35 cases of strep in their last 416 patients.
Do you need to review your infection control policies?
![Page 103: Statistics excellent](https://reader038.vdocument.in/reader038/viewer/2022102618/54c66fb64a795913618b461c/html5/thumbnails/103.jpg)
Analysis
If your infection control policy is roughly as effective as St. Elsewhere’s, we would expect that the rates of strep infection for the two hospitals would be similar. The expected frequency, then, would be the average:
$$\frac{83 + 35}{725 + 416} = \frac{118}{1141} = 0.1034$$
![Page 104: Statistics excellent](https://reader038.vdocument.in/reader038/viewer/2022102618/54c66fb64a795913618b461c/html5/thumbnails/104.jpg)
Calculating χ²
First, calculate the expected frequencies at your hospital (f1) and St. Elsewhere (f2):
$$f_1 = 725 \times 0.1034 = 75 \ \text{cases}$$
$$f_2 = 416 \times 0.1034 = 43 \ \text{cases}$$
![Page 105: Statistics excellent](https://reader038.vdocument.in/reader038/viewer/2022102618/54c66fb64a795913618b461c/html5/thumbnails/105.jpg)
Calculating χ²
Next, we sum the squared differences between actual and expected frequencies, each divided by the expected frequency:
$$\chi^2 = \sum_{i}\frac{(n_i - f_i)^2}{f_i} = \frac{(83 - 75)^2}{75} + \frac{(35 - 43)^2}{43} = 2.34$$
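The calculation can be reproduced in a few lines. This sketch follows the slide's approach, which uses only the infected counts at each hospital, and compares the result against the 0.05 critical value of 3.841 that the slides quote for 1 degree of freedom. Variable names are illustrative.

```python
observed = [83, 35]    # strep cases: your hospital, St. Elsewhere
patients = [725, 416]  # patients at each hospital

# Pooled infection rate, used as the expected frequency for both hospitals.
pooled_rate = sum(observed) / sum(patients)
expected = [n * pooled_rate for n in patients]

chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
critical_value = 3.841  # chi-square for p = 0.05, as quoted in the slides

print(f"pooled rate = {pooled_rate:.4f}")
print(f"chi-square  = {chi_square:.2f}")
print("significant" if chi_square > critical_value else "not significant")
```

Because the code keeps the expected counts unrounded, the result comes out near 2.35 rather than the 2.34 obtained with expected counts rounded to 75 and 43. The conclusion is the same either way.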
![Page 106: Statistics excellent](https://reader038.vdocument.in/reader038/viewer/2022102618/54c66fb64a795913618b461c/html5/thumbnails/106.jpg)
Degrees of freedom
In general, when comparing k sample proportions, the degrees of freedom for χ² analysis are k - 1. Hence, for our problem, there is 1 degree of freedom.
![Page 107: Statistics excellent](https://reader038.vdocument.in/reader038/viewer/2022102618/54c66fb64a795913618b461c/html5/thumbnails/107.jpg)
Conclusion
A table of χ² values lists 3.841 as the χ² corresponding to a probability of 0.05 with 1 degree of freedom.
So the variation (χ² = 2.34) between strep infection rates at the two hospitals is within statistically predicted limits, and therefore is not significant.
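The arithmetic of this exercise can be checked with a few lines of Python. This is a minimal sketch: the first value reproduces the cases-only sum from the slides, while the second extends it to a full 2 × 2 table that also counts uninfected patients, which is an assumption not made on the slides. All variable names are illustrative.

```python
# Observed strep cases and patient counts from the exercise.
cases = [83, 35]        # your hospital, St. Elsewhere
patients = [725, 416]

# Pooled infection rate, used as the expected rate at both hospitals.
pooled_rate = sum(cases) / sum(patients)

# Expected number of cases at each hospital under the pooled rate.
expected = [n * pooled_rate for n in patients]

# Chi-square as computed on the slides: squared deviations of the case
# counts, each divided by its expected count.
chi2_cases = sum((o - e) ** 2 / e for o, e in zip(cases, expected))

# Assumption (not on the slides): a full 2 x 2 analysis also includes the
# uninfected patients as a second category.
non_cases = [n - c for n, c in zip(patients, cases)]
expected_non = [n - e for n, e in zip(patients, expected)]
chi2_full = chi2_cases + sum(
    (o - e) ** 2 / e for o, e in zip(non_cases, expected_non)
)

CRITICAL_1DF_P05 = 3.841  # tabulated value quoted in the conclusion

print(f"pooled rate             = {pooled_rate:.4f}")
print(f"chi-square (cases only) = {chi2_cases:.2f}")
print(f"chi-square (2 x 2)      = {chi2_full:.2f}")
print("significant" if chi2_full > CRITICAL_1DF_P05 else "not significant")
```

Without rounding the expected counts to 75 and 43, the cases-only value comes out slightly above the 2.34 shown on the slide, and the 2 × 2 value is a little larger again (about 2.6). Both remain below 3.841, so the conclusion is unchanged.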
![Page 108: Statistics excellent](https://reader038.vdocument.in/reader038/viewer/2022102618/54c66fb64a795913618b461c/html5/thumbnails/108.jpg)
Distributions
• Definition
• Examples
– Binomial
– Gaussian
– Poisson
– Student’s t
– χ²
![Page 109: Statistics excellent](https://reader038.vdocument.in/reader038/viewer/2022102618/54c66fb64a795913618b461c/html5/thumbnails/109.jpg)
The F distribution
• The F distribution predicts the expected differences between the variances of two samples
• This distribution has also been called Snedecor’s F distribution, Fisher distribution, and variance ratio distribution
![Page 110: Statistics excellent](https://reader038.vdocument.in/reader038/viewer/2022102618/54c66fb64a795913618b461c/html5/thumbnails/110.jpg)
The F distribution
The F statistic is simply the ratio of two variances
(by convention, the larger V is the numerator)
$$F = \frac{V_1}{V_2}$$
![Page 111: Statistics excellent](https://reader038.vdocument.in/reader038/viewer/2022102618/54c66fb64a795913618b461c/html5/thumbnails/111.jpg)
Applications of the F distribution
There are several ways the F distribution can be used. Applications of the F statistic are part of a more general type of statistical analysis called analysis of variance (ANOVA). We’ll see more about ANOVA later.
![Page 112: Statistics excellent](https://reader038.vdocument.in/reader038/viewer/2022102618/54c66fb64a795913618b461c/html5/thumbnails/112.jpg)
Example
You’re asked to do a “quick and dirty” correlation between three whole blood glucose analyzers. You prick your finger and measure your blood glucose four times on each of the analyzers.
Are the results equivalent?
![Page 113: Statistics excellent](https://reader038.vdocument.in/reader038/viewer/2022102618/54c66fb64a795913618b461c/html5/thumbnails/113.jpg)
Data
Analyzer 1 Analyzer 2 Analyzer 3
71 90 72
75 80 77
65 86 76
69 84 79
![Page 114: Statistics excellent](https://reader038.vdocument.in/reader038/viewer/2022102618/54c66fb64a795913618b461c/html5/thumbnails/114.jpg)
Analysis
The mean glucose concentrations for the three analyzers are 70, 85, and 76.
If the three analyzers are equivalent, then we can assume that all of the results are drawn from an overall population with mean μ and variance σ².
![Page 115: Statistics excellent](https://reader038.vdocument.in/reader038/viewer/2022102618/54c66fb64a795913618b461c/html5/thumbnails/115.jpg)
Analysis, cont.
Approximate μ by calculating the mean of the means:
$$\mu \approx \frac{70 + 85 + 76}{3} = 77$$
![Page 116: Statistics excellent](https://reader038.vdocument.in/reader038/viewer/2022102618/54c66fb64a795913618b461c/html5/thumbnails/116.jpg)
Analysis, cont.
Calculate the variance of the means:
$$V_{\bar{x}} = \frac{(70 - 77)^2 + (85 - 77)^2 + (76 - 77)^2}{3} = 38$$
![Page 117: Statistics excellent](https://reader038.vdocument.in/reader038/viewer/2022102618/54c66fb64a795913618b461c/html5/thumbnails/117.jpg)
Analysis, cont.
But what we really want is the variance of the population. Recall that:
$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{N}}$$
![Page 118: Statistics excellent](https://reader038.vdocument.in/reader038/viewer/2022102618/54c66fb64a795913618b461c/html5/thumbnails/118.jpg)
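Analysis, cont.
Since we just calculated
$$V_{\bar{x}} = \sigma_{\bar{x}}^2 = 38$$
we can solve for σ²:
$$V_{\bar{x}} = \sigma_{\bar{x}}^2 = \left(\frac{\sigma}{\sqrt{N}}\right)^2 = \frac{\sigma^2}{N}$$
$$\sigma^2 = N\sigma_{\bar{x}}^2 = 4 \times 38 = 152$$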
![Page 119: Statistics excellent](https://reader038.vdocument.in/reader038/viewer/2022102618/54c66fb64a795913618b461c/html5/thumbnails/119.jpg)
Analysis, cont.
So we now have an estimate of the population variance, which we’d like to compare to the real variance to see whether they differ. But what is the real variance?
We don’t know, but we can calculate the variance based on our individual measurements.
![Page 120: Statistics excellent](https://reader038.vdocument.in/reader038/viewer/2022102618/54c66fb64a795913618b461c/html5/thumbnails/120.jpg)
Analysis, cont.
If all the data were drawn from a larger population, we can assume that the variances are the same, and we can simply average the variances for the three data sets.
$$\frac{V_1 + V_2 + V_3}{3} = 14.4$$
![Page 121: Statistics excellent](https://reader038.vdocument.in/reader038/viewer/2022102618/54c66fb64a795913618b461c/html5/thumbnails/121.jpg)
Analysis, cont.
Now calculate the F statistic:
$$F = \frac{152}{14.4} = 10.6$$
![Page 122: Statistics excellent](https://reader038.vdocument.in/reader038/viewer/2022102618/54c66fb64a795913618b461c/html5/thumbnails/122.jpg)
Conclusion
A table of F values indicates that 4.26 is the limit for the F statistic at a 95% confidence level (when the appropriate degrees of freedom are selected). Our value of 10.6 exceeds that, so we conclude that there is significant variation between the analyzers.
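The analyzer comparison can be reproduced step by step with Python's standard `statistics` module. This is a sketch of the calculation as the slides present it; the variable names are illustrative, and the final line is an added assumption showing the textbook one-way ANOVA variant, which divides the variance of the means by k - 1 rather than k.

```python
from statistics import mean, pvariance, variance

analyzers = {
    "Analyzer 1": [71, 75, 65, 69],
    "Analyzer 2": [90, 80, 86, 84],
    "Analyzer 3": [72, 77, 76, 79],
}
n_per_group = 4

group_means = [mean(v) for v in analyzers.values()]    # 70, 85, 76

# Variance of the means, dividing by the number of means as on the slides.
var_of_means = pvariance(group_means)                   # 38

# Scale up to an estimate of the population variance: sigma^2 = N * V_xbar.
between_estimate = n_per_group * var_of_means           # 152

# Average of the within-analyzer sample variances (n - 1 denominator).
within_estimate = mean([variance(v) for v in analyzers.values()])

f_slides = between_estimate / within_estimate

# Assumption (not on the slides): textbook one-way ANOVA divides the
# variance of the means by k - 1 instead of k, giving a larger F.
f_textbook = n_per_group * variance(group_means) / within_estimate

F_CRITICAL = 4.26  # limit quoted in the conclusion

print(f"within-analyzer variance = {within_estimate:.1f}")
print(f"F (as on slides)         = {f_slides:.1f}")
print(f"F (k - 1 denominator)    = {f_textbook:.1f}")
print("significant" if f_slides > F_CRITICAL else "not significant")
```

Carrying full precision gives an F close to, but not exactly, the rounded 10.6 on the slide. Either way of computing the between-analyzer variance leaves F well above 4.26.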
![Page 123: Statistics excellent](https://reader038.vdocument.in/reader038/viewer/2022102618/54c66fb64a795913618b461c/html5/thumbnails/123.jpg)
Distributions
• Definition
• Examples
– Binomial
– Gaussian
– Poisson
– Student’s t
– χ²
– F
![Page 124: Statistics excellent](https://reader038.vdocument.in/reader038/viewer/2022102618/54c66fb64a795913618b461c/html5/thumbnails/124.jpg)
Unknown or irregular distribution
• Transform
![Page 125: Statistics excellent](https://reader038.vdocument.in/reader038/viewer/2022102618/54c66fb64a795913618b461c/html5/thumbnails/125.jpg)
Log transform
[Figure: probability distributions plotted against x and against log x]
![Page 126: Statistics excellent](https://reader038.vdocument.in/reader038/viewer/2022102618/54c66fb64a795913618b461c/html5/thumbnails/126.jpg)
Unknown or irregular distribution
• Transform
• Non-parametric methods
![Page 127: Statistics excellent](https://reader038.vdocument.in/reader038/viewer/2022102618/54c66fb64a795913618b461c/html5/thumbnails/127.jpg)
Non-parametric methods
• Non-parametric methods make no assumptions about the distribution of the data
• There are non-parametric methods for characterizing data, as well as for comparing data sets
• These methods are also called distribution-free, robust, or sometimes non-metric tests
![Page 128: Statistics excellent](https://reader038.vdocument.in/reader038/viewer/2022102618/54c66fb64a795913618b461c/html5/thumbnails/128.jpg)
Application to Reference Ranges
The concentrations of most clinical analytes are not usually distributed in a Gaussian manner. Why?
How do we determine the reference range (limits of expected values) for these analytes?
![Page 129: Statistics excellent](https://reader038.vdocument.in/reader038/viewer/2022102618/54c66fb64a795913618b461c/html5/thumbnails/129.jpg)
Application to Reference Ranges
• Reference ranges for normal, healthy populations are customarily defined as the “central 95%”.
• An entirely non-parametric way of expressing this is to eliminate the upper and lower 2.5% of data, and use the remaining upper and lower values to define the range.
• NCCLS recommends 120 values, dropping the two highest and two lowest.
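The trimming rule described above can be sketched in a few lines of Python. The function name `nonparametric_range` and the simulated data are illustrative assumptions; real reference values would come from a reference-range study.

```python
import random

def nonparametric_range(values, drop_each_end=2):
    """Return (low, high) after discarding the extreme values at each end.

    With 120 values and drop_each_end=2, the limits are the 3rd lowest
    and 3rd highest results, with no assumption about the distribution.
    """
    ordered = sorted(values)
    if len(ordered) <= 2 * drop_each_end:
        raise ValueError("not enough values to trim")
    return ordered[drop_each_end], ordered[-drop_each_end - 1]

# Simulated, skewed (log-normal) results standing in for 120 healthy subjects.
random.seed(1)
simulated = [random.lognormvariate(0, 0.5) for _ in range(120)]

low, high = nonparametric_range(simulated)
print(f"reference range: {low:.2f} to {high:.2f}")
```

Because the limits are read directly from the ranked data, the same function works whether the analyte is Gaussian, skewed, or irregular.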
![Page 130: Statistics excellent](https://reader038.vdocument.in/reader038/viewer/2022102618/54c66fb64a795913618b461c/html5/thumbnails/130.jpg)
Application to Reference Ranges
What happens when we want to compare one reference range with another? This is precisely what CLIA ‘88 requires us to do.
How do we do this?
![Page 131: Statistics excellent](https://reader038.vdocument.in/reader038/viewer/2022102618/54c66fb64a795913618b461c/html5/thumbnails/131.jpg)
“Everything should be made as simple as possible, but not
simpler.”
Albert Einstein
![Page 132: Statistics excellent](https://reader038.vdocument.in/reader038/viewer/2022102618/54c66fb64a795913618b461c/html5/thumbnails/132.jpg)
Solution #1: Simple comparison
Suppose we just do a small internal reference range study, and compare our results to the manufacturer’s range.
How do we compare them?
Is this a valid approach?
![Page 133: Statistics excellent](https://reader038.vdocument.in/reader038/viewer/2022102618/54c66fb64a795913618b461c/html5/thumbnails/133.jpg)
NCCLS recommendations
• Inspection Method: Verify reference populations are equivalent
• Limited Validation: Collect 20 reference specimens
– No more than 2 exceed range
– Repeat if failed
• Extended Validation: Collect 60 reference specimens; compare ranges.
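One way to see why the limited validation tolerates up to 2 results outside the range is to compute the binomial probability of passing. The sketch below assumes, as a simplification not stated on the slide, that each of the 20 specimens independently has a 5% chance of falling outside a correct central-95% range.

```python
from math import comb

def prob_at_most(k_max, n, p):
    """Binomial probability of observing k_max or fewer events in n trials."""
    return sum(
        comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(k_max + 1)
    )

# Chance that a correct 95% range passes the 20-specimen check.
p_pass = prob_at_most(2, 20, 0.05)
print(f"P(no more than 2 of 20 outside range) = {p_pass:.3f}")
```

Under that assumption, a valid range passes on the first attempt a little over 92% of the time, which is consistent with allowing one repeat when the check fails.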
![Page 134: Statistics excellent](https://reader038.vdocument.in/reader038/viewer/2022102618/54c66fb64a795913618b461c/html5/thumbnails/134.jpg)
Solution #2: Mann-Whitney*
Rank normal values (x1, x2, x3 ... xn) and the reference population (y1, y2, y3 ... yn) together:
x1, y1, x2, x3, y2, y3 ... xn, yn
Count the number of y values that follow each x, and call the sum Ux. Calculate Uy also.
*Also called the U test, rank sum test, or Wilcoxon’s test.
![Page 135: Statistics excellent](https://reader038.vdocument.in/reader038/viewer/2022102618/54c66fb64a795913618b461c/html5/thumbnails/135.jpg)
Mann-Whitney, cont.
It should be obvious that: Ux + Uy = NxNy
If the two distributions are the same, then on average:
Ux = Uy = ½NxNy
Large differences between Ux and Uy indicate that the distributions are not equivalent
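The counting procedure for Ux and Uy can be sketched directly. The function name, the tie-handling rule (a tie counts as one half), and the data values are assumptions for illustration only.

```python
def u_statistics(x, y):
    """Return (Ux, Uy) by counting, for each x, the y values that follow it."""
    # A y value "follows" an x when it is larger; ties count one half
    # (an assumption, since the slides do not discuss ties).
    u_x = sum((yj > xi) + 0.5 * (yj == xi) for xi in x for yj in y)
    # Every (x, y) pair contributes to exactly one of Ux or Uy.
    u_y = len(x) * len(y) - u_x
    return u_x, u_y

# Hypothetical in-house values (x) and reference-population values (y).
in_house = [4.1, 4.5, 4.8, 5.0, 5.6]
reference = [4.3, 4.9, 5.2, 5.8]

u_x, u_y = u_statistics(in_house, reference)
print(f"Ux = {u_x}, Uy = {u_y}, NxNy/2 = {len(in_house) * len(reference) / 2}")
```

For these made-up values Ux is 13 and Uy is 7, against an expected 10 each; judging whether such a difference is significant still requires a table of critical U values or a statistics package.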
![Page 136: Statistics excellent](https://reader038.vdocument.in/reader038/viewer/2022102618/54c66fb64a795913618b461c/html5/thumbnails/136.jpg)
“‘Obvious’ is the most dangerous word in mathematics.”
Eric Temple Bell (1883-1960)
![Page 137: Statistics excellent](https://reader038.vdocument.in/reader038/viewer/2022102618/54c66fb64a795913618b461c/html5/thumbnails/137.jpg)
Solution #3: Run test
In the run test, order the values in the two distributions as before:
x1, y1, x2, x3, y2, y3 ... xn, yn
Add up the number of runs (consecutive values from the same distribution). If the two data sets are randomly selected from one population, the x and y values will be well mixed, giving many short runs; a small number of long runs suggests that the distributions differ.
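Counting runs is mechanical once the two data sets are merged and sorted. The sketch below uses a hypothetical helper, `count_runs`, and made-up values, and assumes there are no ties between the two sets.

```python
def count_runs(x, y):
    """Count runs of consecutive same-source values in the merged ordering."""
    labelled = sorted([(v, "x") for v in x] + [(v, "y") for v in y])
    labels = [source for _, source in labelled]
    # A new run starts wherever the source label changes.
    return 1 + sum(a != b for a, b in zip(labels, labels[1:]))

# Interleaved values: the merged order is x y x x y x y x y.
mixed = count_runs([4.1, 4.5, 4.8, 5.0, 5.6], [4.3, 4.9, 5.2, 5.8])

# Completely separated values: the merged order is x x x y y y.
separated = count_runs([1, 2, 3], [4, 5, 6])

print(f"runs when interleaved: {mixed}")
print(f"runs when separated:   {separated}")
```

The interleaved example yields 8 runs and the separated example only 2; the observed count is then compared with tabulated limits for the two sample sizes.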
![Page 138: Statistics excellent](https://reader038.vdocument.in/reader038/viewer/2022102618/54c66fb64a795913618b461c/html5/thumbnails/138.jpg)
Solution #4: The Monte Carlo method
Sometimes, when we don’t know anything about a distribution, the best thing to do is independently test its characteristics.
![Page 139: Statistics excellent](https://reader038.vdocument.in/reader038/viewer/2022102618/54c66fb64a795913618b461c/html5/thumbnails/139.jpg)
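The Monte Carlo method
[Figure: circle inscribed in a square with sides x and y]
$$A_{sq} = xy = x^2$$
$$A_{cir} = \pi r^2 = \pi\left(\frac{x}{2}\right)^2$$
$$\frac{A_{cir}}{A_{sq}} = \frac{\pi}{4}$$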
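The area ratio above is the classic demonstration of the Monte Carlo idea: scatter random points over the square and count the fraction that land inside the circle. The sketch below is illustrative; the function name, the unit side length, and the fixed seed are assumptions.

```python
import random

def estimate_pi(n_points, seed=0):
    """Estimate pi from the fraction of random points inside the circle."""
    rng = random.Random(seed)
    inside = 0
    for _ in range(n_points):
        # Random point in a square of side 1 centered on the origin.
        x = rng.uniform(-0.5, 0.5)
        y = rng.uniform(-0.5, 0.5)
        # The inscribed circle has radius 1/2, so r^2 = 0.25.
        if x * x + y * y <= 0.25:
            inside += 1
    # inside / n_points approximates A_cir / A_sq = pi / 4.
    return 4 * inside / n_points

print(f"pi is approximately {estimate_pi(200_000):.3f}")
```

The estimate tightens only slowly as more points are used, which is typical of Monte Carlo methods: precision improves with the square root of the number of trials.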
![Page 140: Statistics excellent](https://reader038.vdocument.in/reader038/viewer/2022102618/54c66fb64a795913618b461c/html5/thumbnails/140.jpg)
The Monte Carlo method
[Figure: repeated samples of size N drawn from the reference population, each summarized by its mean and SD]
![Page 141: Statistics excellent](https://reader038.vdocument.in/reader038/viewer/2022102618/54c66fb64a795913618b461c/html5/thumbnails/141.jpg)
The Monte Carlo method
With the Monte Carlo method, we have simulated the test we wish to apply--that is, we have randomly selected samples from the parent distribution, and determined whether our in-house data are in agreement with the randomly-selected samples.
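A simulation of this kind might look like the sketch below. It compares only the sample mean, which is one of several statistics that could be examined; the function name, the choice of statistic, and the synthetic data are all assumptions rather than the procedure used in the lecture.

```python
import random
from statistics import mean

def monte_carlo_fraction(reference, in_house, n_trials=2000, seed=0):
    """Fraction of random reference samples whose mean lies at least as far
    from the reference mean as the in-house mean does."""
    rng = random.Random(seed)
    ref_mean = mean(reference)
    observed = abs(mean(in_house) - ref_mean)
    n = len(in_house)
    hits = 0
    for _ in range(n_trials):
        # Draw a random sample of the same size as the in-house study.
        sample = rng.sample(reference, n)
        if abs(mean(sample) - ref_mean) >= observed:
            hits += 1
    return hits / n_trials

# Synthetic reference population and a hypothetical small in-house study.
rng = random.Random(42)
reference_population = [rng.gauss(100, 15) for _ in range(1000)]
in_house_values = [rng.gauss(100, 15) for _ in range(20)]

fraction = monte_carlo_fraction(reference_population, in_house_values)
print(f"fraction of simulated samples at least as extreme: {fraction:.3f}")
```

A small fraction suggests that the in-house data are unlikely to be a random draw from the reference population; a large fraction gives no evidence of a difference.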
![Page 142: Statistics excellent](https://reader038.vdocument.in/reader038/viewer/2022102618/54c66fb64a795913618b461c/html5/thumbnails/142.jpg)
Analysis of paired data
• For certain types of laboratory studies, the data we gather are paired
• We typically want to know how closely the paired data agree
• We need quantitative measures of the extent to which the data agree or disagree
• Examples?
![Page 143: Statistics excellent](https://reader038.vdocument.in/reader038/viewer/2022102618/54c66fb64a795913618b461c/html5/thumbnails/143.jpg)
Examples of paired data
• Method correlation data
• Pharmacodynamic effects
• Risk analysis
• Pathophysiology
![Page 144: Statistics excellent](https://reader038.vdocument.in/reader038/viewer/2022102618/54c66fb64a795913618b461c/html5/thumbnails/144.jpg)
Correlation
[Figure: scatter plot of paired data; both axes run from 0 to 50]
![Page 145: Statistics excellent](https://reader038.vdocument.in/reader038/viewer/2022102618/54c66fb64a795913618b461c/html5/thumbnails/145.jpg)
Linear regression (least squares)
Linear regression analysis generates an equation for a straight line
y = mx + b
where m is the slope of the line and b is the value of y when x = 0 (the y-intercept).
The calculated equation minimizes the sum of the squared differences between the actual y values and the linear regression line.
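The slope and intercept follow from the standard closed-form least-squares expressions. The sketch below uses illustrative names and made-up comparison data; it is not tied to the data set plotted in the lecture.

```python
from statistics import mean

def least_squares(x, y):
    """Return slope m and intercept b of the least-squares line y = mx + b."""
    x_bar, y_bar = mean(x), mean(y)
    # Slope: sum of cross-deviations over sum of squared x deviations.
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    m = sxy / sxx
    # The fitted line always passes through the point (x_bar, y_bar).
    b = y_bar - m * x_bar
    return m, b

# Hypothetical method-comparison results (reference method x, new method y).
x_values = [5, 10, 20, 30, 40, 50]
y_values = [5.4, 10.1, 20.9, 30.6, 41.5, 51.3]

m, b = least_squares(x_values, y_values)
print(f"y = {m:.3f}x + {b:.3f}")
```

Note that this ordinary least-squares fit treats x as error-free.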
![Page 146: Statistics excellent](https://reader038.vdocument.in/reader038/viewer/2022102618/54c66fb64a795913618b461c/html5/thumbnails/146.jpg)
Correlation
[Figure: scatter plot of paired data with regression line; both axes run from 0 to 50]
y = 1.031x - 0.024
![Page 147: Statistics excellent](https://reader038.vdocument.in/reader038/viewer/2022102618/54c66fb64a795913618b461c/html5/thumbnails/147.jpg)
Covariance
Do x and y values vary in concert, or randomly?
$$\mathrm{cov}(x, y) = \frac{1}{N}\sum_i (y_i - \bar{y})(x_i - \bar{x})$$
![Page 148: Statistics excellent](https://reader038.vdocument.in/reader038/viewer/2022102618/54c66fb64a795913618b461c/html5/thumbnails/148.jpg)
• What if y increases when x increases?
• What if y decreases when x increases?
• What if y and x vary independently?
$$\mathrm{cov}(x, y) = \frac{1}{N}\sum_i (y_i - \bar{y})(x_i - \bar{x})$$
![Page 149: Statistics excellent](https://reader038.vdocument.in/reader038/viewer/2022102618/54c66fb64a795913618b461c/html5/thumbnails/149.jpg)
Covariance
It is clear that the greater the covariance, the stronger the relationship between x and y.
But . . . what about units?
e.g., if you measure glucose in mg/dL, and I measure it in mmol/L, who’s likely to have the higher covariance?
![Page 150: Statistics excellent](https://reader038.vdocument.in/reader038/viewer/2022102618/54c66fb64a795913618b461c/html5/thumbnails/150.jpg)
The Correlation Coefficient
$$\rho = \frac{\mathrm{cov}(x, y)}{\sigma_x \sigma_y} = \frac{\frac{1}{N}\sum_i (y_i - \bar{y})(x_i - \bar{x})}{\sigma_y \sigma_x}$$
$$-1 \le \rho \le 1$$
![Page 151: Statistics excellent](https://reader038.vdocument.in/reader038/viewer/2022102618/54c66fb64a795913618b461c/html5/thumbnails/151.jpg)
The Correlation Coefficient
• The correlation coefficient is a unitless quantity that roughly indicates the degree to which x and y vary in the same direction.
• ρ is useful for detecting relationships between parameters, but it is not a very sensitive measure of the spread.
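The effect of units on covariance, and its absence in the correlation coefficient, can be shown numerically. The paired glucose values below are made up, and the conversion factor of about 18 mg/dL per mmol/L is an approximation; function names are illustrative.

```python
from math import sqrt
from statistics import mean

def covariance(x, y):
    """Population covariance: mean product of deviations from the means."""
    x_bar, y_bar = mean(x), mean(y)
    return sum((yi - y_bar) * (xi - x_bar) for xi, yi in zip(x, y)) / len(x)

def correlation(x, y):
    """Covariance divided by the product of the standard deviations."""
    return covariance(x, y) / sqrt(covariance(x, x) * covariance(y, y))

MG_DL_PER_MMOL_L = 18.0  # approximate conversion factor for glucose

# Hypothetical paired glucose results from two methods, in mg/dL.
method_a = [70, 90, 110, 150, 200]
method_b = [72, 88, 113, 149, 205]

# The same results expressed in mmol/L.
a_mmol = [v / MG_DL_PER_MMOL_L for v in method_a]
b_mmol = [v / MG_DL_PER_MMOL_L for v in method_b]

cov_mg_dl = covariance(method_a, method_b)
cov_mmol = covariance(a_mmol, b_mmol)

print(f"covariance in mg/dL units:  {cov_mg_dl:.1f}")
print(f"covariance in mmol/L units: {cov_mmol:.2f}")
print(f"correlation, mg/dL:  {correlation(method_a, method_b):.4f}")
print(f"correlation, mmol/L: {correlation(a_mmol, b_mmol):.4f}")
```

The covariance shrinks by the square of the conversion factor when the units change, while the correlation coefficient is identical in both unit systems.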
![Page 152: Statistics excellent](https://reader038.vdocument.in/reader038/viewer/2022102618/54c66fb64a795913618b461c/html5/thumbnails/152.jpg)
Correlation
[Figure: scatter plot of paired data with regression line; both axes run from 0 to 50]
y = 1.031x - 0.024, ρ = 0.9986
![Page 153: Statistics excellent](https://reader038.vdocument.in/reader038/viewer/2022102618/54c66fb64a795913618b461c/html5/thumbnails/153.jpg)
Correlation
[Figure: scatter plot of paired data with regression line; both axes run from 0 to 50]
y = 1.031x - 0.024, ρ = 0.9894
![Page 154: Statistics excellent](https://reader038.vdocument.in/reader038/viewer/2022102618/54c66fb64a795913618b461c/html5/thumbnails/154.jpg)
Standard Error of the Estimate
The linear regression equation gives us a way to calculate an “estimated” y for any given x value, denoted by the symbol ŷ (y-hat):
$$\hat{y} = mx + b$$
![Page 155: Statistics excellent](https://reader038.vdocument.in/reader038/viewer/2022102618/54c66fb64a795913618b461c/html5/thumbnails/155.jpg)
Standard Error of the Estimate
Now what we are interested in is the average difference between the measured y and its estimate, ŷ:
$$s_{y/x} = \sqrt{\frac{1}{N}\sum_i (y_i - \hat{y}_i)^2}$$
![Page 156: Statistics excellent](https://reader038.vdocument.in/reader038/viewer/2022102618/54c66fb64a795913618b461c/html5/thumbnails/156.jpg)
Correlation
[Figure: scatter plot of paired data with regression line; both axes run from 0 to 50]
y = 1.031x - 0.024, ρ = 0.9986, sy/x = 1.83
![Page 157: Statistics excellent](https://reader038.vdocument.in/reader038/viewer/2022102618/54c66fb64a795913618b461c/html5/thumbnails/157.jpg)
Correlation
[Figure: scatter plot of paired data with regression line; both axes run from 0 to 50]
y = 1.031x - 0.024, ρ = 0.9894, sy/x = 5.32
![Page 158: Statistics excellent](https://reader038.vdocument.in/reader038/viewer/2022102618/54c66fb64a795913618b461c/html5/thumbnails/158.jpg)
Standard Error of the Estimate
If we assume that the errors in the y measurements are Gaussian (is that a safe assumption?), then the standard error of the estimate gives us the boundaries within which about 68% of the y values will fall.
±2sy/x defines the 95% boundaries.
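Given a fitted slope and intercept, the standard error of the estimate is the root-mean-square residual. The sketch below divides by N, following the formula in this lecture; many textbooks divide by N - 2 instead, and that difference is flagged here as an assumption to check against your own reference. The names and data are illustrative.

```python
from math import sqrt

def standard_error_of_estimate(x, y, m, b):
    """Root-mean-square distance of the y values from the line y = mx + b."""
    residuals = [yi - (m * xi + b) for xi, yi in zip(x, y)]
    # Divides by N as in the lecture; some texts use N - 2.
    return sqrt(sum(r * r for r in residuals) / len(residuals))

def limits_95(x_value, m, b, s_yx):
    """Approximate 95% band around the estimate: y-hat plus or minus 2 s."""
    y_hat = m * x_value + b
    return y_hat - 2 * s_yx, y_hat + 2 * s_yx

# Made-up data scattered around the line y = x.
x_values = [1, 2, 3, 4]
y_values = [2, 1, 4, 3]

s_yx = standard_error_of_estimate(x_values, y_values, m=1, b=0)
low, high = limits_95(10, m=1, b=0, s_yx=s_yx)
print(f"s(y/x) = {s_yx:.2f}; at x = 10, y is expected between {low} and {high}")
```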
![Page 159: Statistics excellent](https://reader038.vdocument.in/reader038/viewer/2022102618/54c66fb64a795913618b461c/html5/thumbnails/159.jpg)
Limitations of linear regression
• Assumes no error in x measurement
• Assumes that variance in y is constant throughout concentration range
![Page 160: Statistics excellent](https://reader038.vdocument.in/reader038/viewer/2022102618/54c66fb64a795913618b461c/html5/thumbnails/160.jpg)
Alternative approaches
• Weighted linear regression analysis can compensate for non-constant variance among y measurements
• Deming regression analysis takes into account variance in the x measurements
• Weighted Deming regression analysis allows for both
![Page 161: Statistics excellent](https://reader038.vdocument.in/reader038/viewer/2022102618/54c66fb64a795913618b461c/html5/thumbnails/161.jpg)
Evaluating method performance
• Precision
![Page 162: Statistics excellent](https://reader038.vdocument.in/reader038/viewer/2022102618/54c66fb64a795913618b461c/html5/thumbnails/162.jpg)
Method Precision
• Within-run: 10 or 20 replicates
– What types of errors does within-run precision reflect?
• Day-to-day: NCCLS recommends evaluation over 20 days
– What types of errors does day-to-day precision reflect?
![Page 163: Statistics excellent](https://reader038.vdocument.in/reader038/viewer/2022102618/54c66fb64a795913618b461c/html5/thumbnails/163.jpg)
Evaluating method performance
• Precision
• Sensitivity
![Page 164: Statistics excellent](https://reader038.vdocument.in/reader038/viewer/2022102618/54c66fb64a795913618b461c/html5/thumbnails/164.jpg)
Method Sensitivity
• The analytical sensitivity of a method refers to the lowest concentration of analyte that can be reliably detected.
• The most common definition of sensitivity is the analyte concentration that will result in a signal two or three standard deviations above background.
![Page 165: Statistics excellent](https://reader038.vdocument.in/reader038/viewer/2022102618/54c66fb64a795913618b461c/html5/thumbnails/165.jpg)
[Figure: signal versus time, with the signal/noise threshold marked]
![Page 166: Statistics excellent](https://reader038.vdocument.in/reader038/viewer/2022102618/54c66fb64a795913618b461c/html5/thumbnails/166.jpg)
Other measures of sensitivity
• Limit of Detection (LOD) is sometimes defined as the concentration producing an S/N > 3.
– In drug testing, LOD is customarily defined as the lowest concentration that meets all identification criteria.
• Limit of Quantitation (LOQ) is sometimes defined as the concentration producing an S/N > 5.
– In drug testing, LOQ is customarily defined as the lowest concentration that can be measured within ±20%.
![Page 167: Statistics excellent](https://reader038.vdocument.in/reader038/viewer/2022102618/54c66fb64a795913618b461c/html5/thumbnails/167.jpg)
Question
At an S/N ratio of 5, what is the minimum CV of the measurement?
If the S/N is 5, 20% of the measured signal is noise, which is random. Therefore, the CV must be at least 20%.
![Page 168: Statistics excellent](https://reader038.vdocument.in/reader038/viewer/2022102618/54c66fb64a795913618b461c/html5/thumbnails/168.jpg)
Evaluating method performance
• Precision
• Sensitivity
• Linearity
![Page 169: Statistics excellent](https://reader038.vdocument.in/reader038/viewer/2022102618/54c66fb64a795913618b461c/html5/thumbnails/169.jpg)
Method Linearity
• A linear relationship between concentration and signal is not absolutely necessary, but it is highly desirable. Why?
• CLIA ‘88 requires that the linearity of analytical methods be verified on a periodic basis.
![Page 170: Statistics excellent](https://reader038.vdocument.in/reader038/viewer/2022102618/54c66fb64a795913618b461c/html5/thumbnails/170.jpg)
Ways to evaluate linearity
• Visual/linear regression
![Page 171: Statistics excellent](https://reader038.vdocument.in/reader038/viewer/2022102618/54c66fb64a795913618b461c/html5/thumbnails/171.jpg)
[Figure: signal versus concentration]
![Page 172: Statistics excellent](https://reader038.vdocument.in/reader038/viewer/2022102618/54c66fb64a795913618b461c/html5/thumbnails/172.jpg)
Outliers
We can eliminate any point that differs from its nearest neighboring value by more than 0.765 (p=0.05) times the spread between the highest and lowest values (Dixon test).
Example: 4, 5, 6, 13
(13 - 4) × 0.765 = 6.89
Since 13 - 6 = 7 is greater than 6.89, the value 13 can be eliminated.
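The Dixon criterion in this example is easy to express as a function. The sketch below checks only the highest value and uses the 0.765 factor quoted on the slide as a default; in practice the critical factor depends on the number of values and should be taken from a Dixon table. The function name is illustrative.

```python
def dixon_high_outlier(values, critical=0.765):
    """Return True if the highest value fails the Dixon criterion.

    The gap between the highest value and its nearest neighbor is compared
    with `critical` times the full spread of the data.
    """
    ordered = sorted(values)
    spread = ordered[-1] - ordered[0]
    gap = ordered[-1] - ordered[-2]
    return gap > critical * spread

print(dixon_high_outlier([4, 5, 6, 13]))  # gap of 7 against a limit of 6.885
print(dixon_high_outlier([4, 5, 6, 12]))  # gap of 6 against a limit of 6.12
```

The first call reports the value 13 as an outlier, while the second leaves 12 in the data set, which shows how close this example sits to the rejection limit.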
![Page 173: Statistics excellent](https://reader038.vdocument.in/reader038/viewer/2022102618/54c66fb64a795913618b461c/html5/thumbnails/173.jpg)
Limitation of linear regression method
If the analytical method has a high variance (CV), it is likely that small deviations from linearity will not be detected due to the high standard error of the estimate
![Page 174: Statistics excellent](https://reader038.vdocument.in/reader038/viewer/2022102618/54c66fb64a795913618b461c/html5/thumbnails/174.jpg)
[Figure: signal versus concentration]
![Page 175: Statistics excellent](https://reader038.vdocument.in/reader038/viewer/2022102618/54c66fb64a795913618b461c/html5/thumbnails/175.jpg)
Ways to evaluate linearity
• Visual/linear regression
• Quadratic regression
![Page 176: Statistics excellent](https://reader038.vdocument.in/reader038/viewer/2022102618/54c66fb64a795913618b461c/html5/thumbnails/176.jpg)
Quadratic regression
Recall that, for linear data, the relationship between x and y can be expressed as
y = f(x) = a + bx
![Page 177: Statistics excellent](https://reader038.vdocument.in/reader038/viewer/2022102618/54c66fb64a795913618b461c/html5/thumbnails/177.jpg)
Quadratic regression
A curve is described by the quadratic equation:
y = f(x) = a + bx + cx²
which is identical to the linear equation except for the addition of the cx² term.
![Page 178: Statistics excellent](https://reader038.vdocument.in/reader038/viewer/2022102618/54c66fb64a795913618b461c/html5/thumbnails/178.jpg)
Quadratic regression
It should be clear that the smaller the x² coefficient, c, the closer the data are to linear (since the equation reduces to the linear form when c approaches 0).
What is the drawback to this approach?
![Page 179: Statistics excellent](https://reader038.vdocument.in/reader038/viewer/2022102618/54c66fb64a795913618b461c/html5/thumbnails/179.jpg)
Ways to evaluate linearity
• Visual/linear regression
• Quadratic regression
• Lack-of-fit analysis
![Page 180: Statistics excellent](https://reader038.vdocument.in/reader038/viewer/2022102618/54c66fb64a795913618b461c/html5/thumbnails/180.jpg)
Lack-of-fit analysis
• There are two components of the variation from the regression line
– Intrinsic variability of the method
– Variability due to deviations from linearity
• The problem is to distinguish between these two sources of variability
• What statistical test do you think is appropriate?
![Page 181: Statistics excellent](https://reader038.vdocument.in/reader038/viewer/2022102618/54c66fb64a795913618b461c/html5/thumbnails/181.jpg)
[Figure: signal versus concentration]
![Page 182: Statistics excellent](https://reader038.vdocument.in/reader038/viewer/2022102618/54c66fb64a795913618b461c/html5/thumbnails/182.jpg)
Lack-of-fit analysis
The ANOVA technique requires that method variance is constant at all concentrations. Cochran’s test is used to test whether this is the case.
$$\frac{V_L}{\sum_i V_i} < 0.5981 \quad (p = 0.05)$$
where V_L is the largest of the individual variances V_i.
![Page 183: Statistics excellent](https://reader038.vdocument.in/reader038/viewer/2022102618/54c66fb64a795913618b461c/html5/thumbnails/183.jpg)
Lack-of-fit method calculations
• Total sum of the squares: the variance calculated from all of the y values
• Linear regression sum of the squares: the variance of y values from the regression line
• Residual sum of the squares: difference between TSS and LSS
• Lack of fit sum of the squares: the RSS minus the pure error (sum of variances)
![Page 184: Statistics excellent](https://reader038.vdocument.in/reader038/viewer/2022102618/54c66fb64a795913618b461c/html5/thumbnails/184.jpg)
Lack-of-fit analysis
• The LOF is compared to the pure error to give the “G” statistic (which is actually F)
• If the LOF is small compared to the pure error, G is small and the method is linear
• If the LOF is large compared to the pure error, G will be large, indicating significant deviation from linearity
![Page 185: Statistics excellent](https://reader038.vdocument.in/reader038/viewer/2022102618/54c66fb64a795913618b461c/html5/thumbnails/185.jpg)
Significance limits for G
• 90% confidence = 2.49
• 95% confidence = 3.29
• 99% confidence = 5.42
![Page 186: Statistics excellent](https://reader038.vdocument.in/reader038/viewer/2022102618/54c66fb64a795913618b461c/html5/thumbnails/186.jpg)
“If your experiment needs statistics, you ought to have done
a better experiment.”
Ernest Rutherford (1871-1937)
![Page 187: Statistics excellent](https://reader038.vdocument.in/reader038/viewer/2022102618/54c66fb64a795913618b461c/html5/thumbnails/187.jpg)
Evaluating Clinical Performance of laboratory tests
• The clinical performance of a laboratory test defines how well it predicts disease
• The sensitivity of a test indicates the likelihood that it will be positive when disease is present
![Page 188: Statistics excellent](https://reader038.vdocument.in/reader038/viewer/2022102618/54c66fb64a795913618b461c/html5/thumbnails/188.jpg)
Clinical Sensitivity
If TP is the number of “true positives”, and FN is the number of “false negatives”, the sensitivity is defined as:
$$\text{Sensitivity} = \frac{TP}{TP + FN} \times 100$$
![Page 189: Statistics excellent](https://reader038.vdocument.in/reader038/viewer/2022102618/54c66fb64a795913618b461c/html5/thumbnails/189.jpg)
Example
Of 25 admitted cocaine abusers, 23 tested positive for urinary benzoylecgonine and 2 tested negative. What is the sensitivity of the urine screen?
$$\frac{23}{23 + 2} \times 100 = 92\%$$
![Page 190: Statistics excellent](https://reader038.vdocument.in/reader038/viewer/2022102618/54c66fb64a795913618b461c/html5/thumbnails/190.jpg)
Evaluating Clinical Performance of laboratory tests
• The clinical performance of a laboratory test defines how well it predicts disease
• The sensitivity of a test indicates the likelihood that it will be positive when disease is present
• The specificity of a test indicates the likelihood that it will be negative when disease is absent
![Page 191: Statistics excellent](https://reader038.vdocument.in/reader038/viewer/2022102618/54c66fb64a795913618b461c/html5/thumbnails/191.jpg)
Clinical Specificity
If TN is the number of “true negative” results, and FP is the number of falsely positive results, the specificity is defined as:
$$\text{Specificity} = \frac{TN}{TN + FP} \times 100$$
![Page 192: Statistics excellent](https://reader038.vdocument.in/reader038/viewer/2022102618/54c66fb64a795913618b461c/html5/thumbnails/192.jpg)
Example
What would you guess is the specificity of any particular clinical laboratory test? (Choose any one you want)
![Page 193: Statistics excellent](https://reader038.vdocument.in/reader038/viewer/2022102618/54c66fb64a795913618b461c/html5/thumbnails/193.jpg)
Answer
Since reference ranges are customarily set to include the central 95% of values in healthy subjects, we expect 5% of values from healthy people to be “abnormal”; this is the false positive rate.
Hence, the specificity of most clinical tests is no better than 95%.
![Page 194: Statistics excellent](https://reader038.vdocument.in/reader038/viewer/2022102618/54c66fb64a795913618b461c/html5/thumbnails/194.jpg)
Sensitivity vs. Specificity
• Sensitivity and specificity are inversely related.
![Page 195: Statistics excellent](https://reader038.vdocument.in/reader038/viewer/2022102618/54c66fb64a795913618b461c/html5/thumbnails/195.jpg)
[Figure: marker concentration in subjects without (-) and with (+) disease]
![Page 196: Statistics excellent](https://reader038.vdocument.in/reader038/viewer/2022102618/54c66fb64a795913618b461c/html5/thumbnails/196.jpg)
Sensitivity vs. Specificity
• Sensitivity and specificity are inversely related.
• How do we determine the best compromise between sensitivity and specificity?
![Page 197: Statistics excellent](https://reader038.vdocument.in/reader038/viewer/2022102618/54c66fb64a795913618b461c/html5/thumbnails/197.jpg)
Receiver Operating Characteristic
[Figure: ROC curve plotting true positive rate (sensitivity) against false positive rate (1 - specificity)]
![Page 198: Statistics excellent](https://reader038.vdocument.in/reader038/viewer/2022102618/54c66fb64a795913618b461c/html5/thumbnails/198.jpg)
Evaluating Clinical Performance of laboratory tests
• The sensitivity of a test indicates the likelihood that it will be positive when disease is present
• The specificity of a test indicates the likelihood that it will be negative when disease is absent
• The predictive value of a test indicates the probability that the test result correctly classifies a patient
![Page 199: Statistics excellent](https://reader038.vdocument.in/reader038/viewer/2022102618/54c66fb64a795913618b461c/html5/thumbnails/199.jpg)
Predictive Value
The predictive value of a clinical laboratory test takes into account the prevalence of a certain disease, to quantify the probability that a positive test is associated with the disease in a randomly-selected individual, or alternatively, that a negative test is associated with health.
![Page 200: Statistics excellent](https://reader038.vdocument.in/reader038/viewer/2022102618/54c66fb64a795913618b461c/html5/thumbnails/200.jpg)
Illustration
• Suppose you have invented a new screening test for Addison disease.
• The test correctly identified 98 of 100 patients with confirmed Addison disease (What is the sensitivity?)
• The test was positive in only 2 of 1000 patients with no evidence of Addison disease (What is the specificity?)
![Page 201: Statistics excellent](https://reader038.vdocument.in/reader038/viewer/2022102618/54c66fb64a795913618b461c/html5/thumbnails/201.jpg)
Test performance
• The sensitivity is 98.0%
• The specificity is 99.8%
• But Addison disease is a rare disorder--incidence = 1:10,000
• What happens if we screen 1 million people?
![Page 202: Statistics excellent](https://reader038.vdocument.in/reader038/viewer/2022102618/54c66fb64a795913618b461c/html5/thumbnails/202.jpg)
Analysis
• In 1 million people, there will be 100 cases of Addison disease.
• Our test will identify 98 of these cases (TP)
• Of the 999,900 non-Addison subjects, the test will be positive in 0.2%, or about 2,000 (FP).
![Page 203: Statistics excellent](https://reader038.vdocument.in/reader038/viewer/2022102618/54c66fb64a795913618b461c/html5/thumbnails/203.jpg)
Predictive value of the positive test
The predictive value is the % of all positives that are true positives:
$$\mathrm{PV}_{+} = \frac{TP}{TP + FP} \times 100 = \frac{98}{98 + 2000} \times 100 = 4.7\%$$
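The arithmetic on the last few slides can be reproduced with a short Python sketch. The variable names are illustrative and not part of the original presentation; the numbers are the ones given in the Addison disease illustration.

```python
# Screening scenario from the slides (variable names are illustrative).
population = 1_000_000
diseased = population // 10_000        # 1 case per 10,000 people -> 100
healthy = population - diseased        # 999,900 people without the disease

# Validation data: 98 of 100 cases detected; 2 of 1000 non-cases positive.
true_pos = diseased * 98 / 100         # 98 true positives
false_pos = healthy * 2 / 1000         # about 2,000 false positives

# Predictive value of a positive result: share of positives that are true.
ppv = 100 * true_pos / (true_pos + false_pos)
print(f"TP = {true_pos:.0f}, FP = {false_pos:.0f}, PV+ = {ppv:.1f}%")
```

Despite the high sensitivity and specificity, false positives outnumber true positives by roughly 20 to 1 because the healthy group is so much larger than the diseased group.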
![Page 204: Statistics excellent](https://reader038.vdocument.in/reader038/viewer/2022102618/54c66fb64a795913618b461c/html5/thumbnails/204.jpg)
What about the negative predictive value?
• TN = 999,900 - 2000 = 997,900
• FN = 100 × (1 - 0.98) = 2
$$\mathrm{PV}_{-} = \frac{TN}{TN + FN} \times 100 = \frac{997{,}900}{997{,}900 + 2} \times 100 \approx 100\%$$
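A similar sketch covers the negative predictive value. It assumes false negatives are the diseased cases the test misses (100 minus the 98 detected), which matches the count of 2 used in the efficiency calculation later in the deck. The variable names are illustrative.

```python
# Negative predictive value for the same screening scenario.
diseased, healthy = 100, 999_900
sensitivity, specificity = 0.98, 0.998

false_neg = diseased * (1 - sensitivity)   # about 2 missed cases
true_neg = healthy * specificity           # about 997,900 correct negatives

# Share of negative results that are truly disease-free.
npv = 100 * true_neg / (true_neg + false_neg)
print(f"TN = {true_neg:.0f}, FN = {false_neg:.0f}, PV- = {npv:.4f}%")
```

For a rare disease almost every negative result is correct, so PV- comes out very close to 100% and says little about the quality of the test.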
![Page 205: Statistics excellent](https://reader038.vdocument.in/reader038/viewer/2022102618/54c66fb64a795913618b461c/html5/thumbnails/205.jpg)
Summary of predictive value
Predictive value describes the usefulness of a clinical laboratory test in the real world.
Or does it?
![Page 206: Statistics excellent](https://reader038.vdocument.in/reader038/viewer/2022102618/54c66fb64a795913618b461c/html5/thumbnails/206.jpg)
Lessons about predictive value
• Even when you have a very good test, it is generally not cost-effective to screen for diseases that have a low prevalence in the general population. Exception?
• The higher the clinical suspicion, the better the predictive value of the test. Why?
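One way to explore the second question is to treat clinical suspicion as a higher pre-test probability and recompute the positive predictive value. The sketch below assumes the sensitivity and specificity from the Addison illustration; every prevalence other than 1 in 10,000 is a hypothetical value chosen for illustration, and the function name is not from the original slides.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """Return PV+ as a fraction, given test performance and prevalence."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)

# Hypothetical pre-test probabilities, from population screening (1 in
# 10,000) up to a patient group in which half actually have the disease.
for prevalence in (0.0001, 0.001, 0.01, 0.1, 0.5):
    ppv = positive_predictive_value(0.98, 0.998, prevalence)
    print(f"prevalence {prevalence:8.2%}  ->  PV+ {ppv:6.1%}")
```

As the fraction of tested patients who really have the disease rises, true positives grow relative to false positives, so the same test yields a much higher PV+.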
![Page 207: Statistics excellent](https://reader038.vdocument.in/reader038/viewer/2022102618/54c66fb64a795913618b461c/html5/thumbnails/207.jpg)
Efficiency
We can combine the PV+ and PV- to give a quantity called the efficiency:
The efficiency is the percentage of all patients that are classified correctly by the test result.
$$\mathrm{Efficiency} = \frac{TP + TN}{TP + FP + TN + FN} \times 100$$
![Page 208: Statistics excellent](https://reader038.vdocument.in/reader038/viewer/2022102618/54c66fb64a795913618b461c/html5/thumbnails/208.jpg)
Efficiency of our Addison screen
$$\frac{98 + 997{,}900}{98 + 2000 + 997{,}900 + 2} \times 100 = 99.8\%$$
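The efficiency calculation is easy to check in Python. The second case below is a hypothetical comparison that is not in the original slides: a "test" that reports every subject as negative, included to show how strongly true negatives dominate the efficiency when a disease is rare.

```python
def efficiency(tp, fp, tn, fn):
    """Percentage of all subjects classified correctly by the test."""
    return 100 * (tp + tn) / (tp + fp + tn + fn)

# Counts from the Addison screening example.
screen = efficiency(tp=98, fp=2000, tn=997_900, fn=2)

# Hypothetical comparison: every one of the 1,000,000 subjects is
# reported negative, so all 100 cases are missed.
always_negative = efficiency(tp=0, fp=0, tn=999_900, fn=100)

print(f"Addison screen:   {screen:.2f}%")
print(f"Always negative:  {always_negative:.2f}%")
```

Under these assumptions the useless always-negative rule scores slightly higher than the real test, which is one reason efficiency alone is a weak summary for screening rare conditions.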
![Page 209: Statistics excellent](https://reader038.vdocument.in/reader038/viewer/2022102618/54c66fb64a795913618b461c/html5/thumbnails/209.jpg)
“To call in the statistician after the experiment is done may be no more than asking him to perform a postmortem examination: he may be able to say what the experiment died of.”
Ronald Aylmer Fisher (1890-1962)
![Page 210: Statistics excellent](https://reader038.vdocument.in/reader038/viewer/2022102618/54c66fb64a795913618b461c/html5/thumbnails/210.jpg)
Application of Statistics to Quality Control
• We expect quality control values to fit a Gaussian distribution
• We can use Gaussian statistics to predict the variability in quality control values
• What sort of tolerance will we allow for variation in quality control values?
• Generally, we will question variations that have a statistical probability of less than 5%
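The 5% threshold corresponds roughly to control values more than 2 SD from the mean. A minimal sketch, assuming control values are exactly Gaussian, computes the two-sided tail probabilities with the standard library; the function name is illustrative.

```python
import math

def prob_outside(k):
    """Probability that a Gaussian value lies more than k SD from the mean."""
    # erfc(k / sqrt(2)) equals the two-sided tail area beyond +/- k SD.
    return math.erfc(k / math.sqrt(2))

for k in (1, 2, 3):
    p = prob_outside(k)
    print(f"beyond +/-{k} SD: {p:.4f} (about 1 in {1 / p:.0f})")
```

Under this assumption a value beyond ±2 SD turns up by chance in a little under 5% of measurements, and a value beyond ±3 SD in about 0.3%.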
![Page 211: Statistics excellent](https://reader038.vdocument.in/reader038/viewer/2022102618/54c66fb64a795913618b461c/html5/thumbnails/211.jpg)
“He uses statistics as a drunken man uses lamp posts --
for support rather than illumination.”
Andrew Lang (1844-1912)
![Page 212: Statistics excellent](https://reader038.vdocument.in/reader038/viewer/2022102618/54c66fb64a795913618b461c/html5/thumbnails/212.jpg)
Westgard’s rules
• $1_{2s}$: 1 in 20
• $1_{3s}$: 1 in 300
• $2_{2s}$: 1 in 400
• $R_{4s}$: 1 in 800
• $4_{1s}$: 1 in 600
• $10_{x}$: 1 in 1000
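The slide names the rules without defining them. The sketch below uses the commonly cited definitions, which are an assumption here and not taken from the slides: $1_{2s}$ and $1_{3s}$ flag a single value beyond ±2 or ±3 SD, $2_{2s}$ flags two consecutive values beyond 2 SD on the same side, $R_{4s}$ flags consecutive values beyond 2 SD on opposite sides, $4_{1s}$ flags four consecutive values beyond 1 SD on the same side, and $10_{x}$ flags ten consecutive values on the same side of the mean. The function name and the control data are hypothetical.

```python
def westgard_violations(values, mean, sd):
    """Return the rules violated by the most recent control value(s)."""
    z = [(v - mean) / sd for v in values]   # deviations in SD units
    violated = []

    if abs(z[-1]) > 2:
        violated.append("1_2s")
    if abs(z[-1]) > 3:
        violated.append("1_3s")
    if len(z) >= 2:
        a, b = z[-2], z[-1]
        if (a > 2 and b > 2) or (a < -2 and b < -2):
            violated.append("2_2s")
        if (a > 2 and b < -2) or (a < -2 and b > 2):
            violated.append("R_4s")
    if len(z) >= 4:
        last4 = z[-4:]
        if all(x > 1 for x in last4) or all(x < -1 for x in last4):
            violated.append("4_1s")
    if len(z) >= 10:
        last10 = z[-10:]
        if all(x > 0 for x in last10) or all(x < 0 for x in last10):
            violated.append("10_x")
    return violated

# Hypothetical control results for an assay with mean 100 and SD 1.
history = [100.5, 101.2, 102.3, 102.6]
print(westgard_violations(history, mean=100, sd=1))
```

In practice the rules are usually applied in sequence, with $1_{2s}$ acting as a warning that triggers the remaining checks; this sketch simply reports every rule that the latest values violate.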
![Page 213: Statistics excellent](https://reader038.vdocument.in/reader038/viewer/2022102618/54c66fb64a795913618b461c/html5/thumbnails/213.jpg)
Some examples
[Figure: control chart with horizontal lines at the mean and at ±1, ±2, and ±3 SD; plotted points not recoverable]
![Page 214: Statistics excellent](https://reader038.vdocument.in/reader038/viewer/2022102618/54c66fb64a795913618b461c/html5/thumbnails/214.jpg)
Some examples
[Figure: control chart with horizontal lines at the mean and at ±1, ±2, and ±3 SD; plotted points not recoverable]
![Page 215: Statistics excellent](https://reader038.vdocument.in/reader038/viewer/2022102618/54c66fb64a795913618b461c/html5/thumbnails/215.jpg)
Some examples
[Figure: control chart with horizontal lines at the mean and at ±1, ±2, and ±3 SD; plotted points not recoverable]
![Page 216: Statistics excellent](https://reader038.vdocument.in/reader038/viewer/2022102618/54c66fb64a795913618b461c/html5/thumbnails/216.jpg)
Some examples
[Figure: control chart with horizontal lines at the mean and at ±1, ±2, and ±3 SD; plotted points not recoverable]
![Page 217: Statistics excellent](https://reader038.vdocument.in/reader038/viewer/2022102618/54c66fb64a795913618b461c/html5/thumbnails/217.jpg)
“In science one tries to tell people, in such a way as to be understood by everyone, something that no one ever knew before. But in poetry, it's the exact opposite.”
Paul Adrien Maurice Dirac (1902-1984)