Use of Statistics in Evaluation Use of Statistics in Evaluation of Trace Evidenceof Trace Evidence
Robert D. KoonsRobert D. Koons
CFSRU, FBI AcademyCFSRU, FBI Academy
Quantico, VA 22135Quantico, VA 22135
Trace Evidence SymposiumTrace Evidence Symposium
8/15/2007, Clearwater Beach, Florida8/15/2007, Clearwater Beach, Florida
• Comparison of trace evidence is best thought of as a Comparison of trace evidence is best thought of as a process of elimination.process of elimination.
• Selection of features for comparison that provide the Selection of features for comparison that provide the best source discrimination is best source discrimination is alwaysalways a good idea. a good idea.
• Match criteria do not have to be statistically-based to Match criteria do not have to be statistically-based to be effective.be effective.
• Frequency of occurrence statistics for trace evidence Frequency of occurrence statistics for trace evidence can can almostalmost never be calculated for good discriminating never be calculated for good discriminating features.features.
• Databases are useful for making broad classification Databases are useful for making broad classification rules, but they are generally useless for calculating the rules, but they are generally useless for calculating the significance of a match.significance of a match.
My rules for comparison of trace My rules for comparison of trace evidenceevidence
Characteristics of Measurements Characteristics of Measurements on Trace Evidenceon Trace Evidence
• Data for many variables are “continuous”Data for many variables are “continuous”
• Data distributions are “often” unknownData distributions are “often” unknown– Frequency distributions are nonstandardFrequency distributions are nonstandard
– Across-sample distributions are unknownAcross-sample distributions are unknown
– Accuracy and precision of data depends on Accuracy and precision of data depends on analytical methodanalytical method
– Databases are both time and location dependentDatabases are both time and location dependent
• Forensic and scientific (statistical) issues may Forensic and scientific (statistical) issues may not be the samenot be the same
Significance of a MatchSignificance of a Match
Measured Values Increasing Measured Values Increasing
Source 1Source 1 Source 2Source 2QQ
Fisher’s RatioFisher’s Ratio
mm11 and m and m22 are the means of class 1 and class 2 are the means of class 1 and class 2
vv11 and v and v22 are the variances are the variances
F
(m m )
v v1 2
2
1 2
measure for linear discriminating power of a variablemeasure for linear discriminating power of a variable
Barium (µg/g)
0 5 10 15 20 25 30 35 40
Str
on
tiu
m (
µg
/g)
35
40
45
50
55
60
65
Barium (µg/g)
0 5 10 15 20 25 30 35 40
Iro
n (
µg
/g)
500
1000
1500
2000
2500
3000
3500
4000
4500
S K
T
S
K
T
Three sheets of float glassThree sheets of float glass
Truth TableTruth Table
SameSource
DifferentSources
Indistinguishable CorrectFalse
inclusionType IIerror
ExclusionFalse
exclusionType Ierror
Correct
Roles in Sample ComparisonRoles in Sample Comparison
Do the samples Do the samples match?match?
What is the significance?What is the significance?
StatisticsStatisticsAnalytical ScienceAnalytical Science
Some Match MethodologiesSome Match Methodologies
• tt-test -test – Welch’s modification?Welch’s modification?– Multivariate (Bonferroni) correction? Multivariate (Bonferroni) correction?
• Range overlap (many to many or one to many)Range overlap (many to many or one to many)• 22σσ, 3, 3σσ, etc. (or 2s, 3s, etc.), etc. (or 2s, 3s, etc.)• Continuous probabilistic approachContinuous probabilistic approach• Dimension reduction, then matchDimension reduction, then match• Cluster analysis Cluster analysis • Multivariate test (Hotelling’s tMultivariate test (Hotelling’s t22))• Discriminant analysis (PCA)Discriminant analysis (PCA)
300 400 500 600SAL
0
10
20
30
40
Count
59000 60000 61000 62000 63000 64000SCA
0
10
20
30
40
50
60
70
80
90
Count
900 1000 1100 1200SFE
0
10
20
30
40
50
60
Count
30 35 40 45SZR
0
10
20
30
40
50
Count
4 5 6 7 8 9SBA
0
10
20
30
40
50
60
70
80
90
100
Count
1.5171 1.5172 1.5173 1.5174 1.5175S
0
50
100
150
Count
Measured Distributions in a Sheet Measured Distributions in a Sheet of Float Glassof Float Glass
Refractive Index Aluminum BariumRefractive Index Aluminum Barium
Calcium Iron ZincCalcium Iron Zinc
Fiber No 42Fiber No 42
Fiber No 52Fiber No 52
A*
-14 -12 -10 -8 -6 -4 -2
B*
-7
-6
-5
-4
-3
-2
-1
A*
-14 -12 -10 -8 -6 -4 -2
B*
-7
-6
-5
-4
-3
-2
-1
A*
-14
-12
-10
-8
-6
-4
-2
B*
-7
-6
-5
-4
-3
-2
-1
A*
-14
-12
-10
-8
-6
-4
-2
B*
-7
-6
-5
-4
-3
-2
-1
Factor 1 (77%)
-4 -3 -2 -1 0 1 2 3 4
Fa
cto
r 2
(19
%)
-2.5
-2.0
-1.5
-1.0
-0.5
0.0
0.5
1.0
1.5
2.0
PCA plot of Australian ocher dataPCA plot of Australian ocher data
B, Sc, Se, Rb, Pd, B, Sc, Se, Rb, Pd, Hf, Th, and U in Hf, Th, and U in
ochers from 3 areas ochers from 3 areas of Australiaof Australia
From: R.L. Green & From: R.L. Green & R.J. Watling, JFS 7/07R.J. Watling, JFS 7/07
PCA plot of Australian ocher dataPCA plot of Australian ocher data
ochers from 3 ochers from 3 populations within a populations within a
single region of single region of AustraliaAustralia
From: R.L. Green & From: R.L. Green & R.J. Watling, JFS 7/07R.J. Watling, JFS 7/07
Factor 1 (43%)
-4 -3 -2 -1 0 1 2 3 4
Fac
tor
2 (3
4%)
-3
-2
-1
0
1
2
3
4
Distances
0.0 0.5 1.0 1.5 2.0
To
ne
r N
um
ber
2200C
2200B
2200A
4000C
4000B
4000A
2300C
2300B
2300A
Classification of laser jet tonersClassification of laser jet toners
LA-ICP-MS data, heirarchichal clustering, Euclidean distance, LA-ICP-MS data, heirarchichal clustering, Euclidean distance, 9 elements, normalized 9 elements, normalized
Copper Wire Samples
Co
nce
ntr
atio
n (
µg
/g)
0
20
40
60
80
100
120
140
160
180
Ni As Ag Sb Pb Bi
Seven-Stranded Copper Wire by ICP-MSSeven-Stranded Copper Wire by ICP-MS
Hefty Easy Flaps
0
0.5
1Ti
Fe
Zn
Sr
Ba
Ca
Glad Quick Tie
0
0.5
1Ti
Fe
Zn
Sr
Ba
Ca
Ruffies
0
0.5
1Ti
Fe
Zn
Sr
Ba
Ca
Star PlotsStar PlotsPolyethylene Trash BagsPolyethylene Trash Bags
Plant Location
0 2 4 6 8 10 12 14 16 18 20
Pla
tin
um
Co
nc
en
tra
tio
n (
µg
/g)
1e+1
1e+2
1e+3
1e+4
1e+5
1e+6