CCI Firearms and Toolmark Examiner Academy
Workshop on Current Firearms and Toolmark Research
Pushing Out the Frontiers of Forensic Science
Outline
• Morning-ish
• Introduction and the Daubert Standard
• Confocal Microscopy
• Focus Variation Microscopy
• Interferometric Microscopy
• Surface Data/Filtering
Outline
• Afternoon-ish
• Similarity scores and Cross-correlation functions
• Known Match/Known Non-Match Similarity Score histograms. False Positives/False Negatives/Error Rates
• Multivariate Discrimination of Toolmarks
• Measures of “Match Quality”
• Confidence
• Posterior Error Rate/Random Match Probability
• Lessons learned in conducting a successful research project
Introduction
• DNA profiling is the most successful application of statistics in forensic science.
  • Responsible for current interest in “raising standards” of other branches in forensics…??
• No protocols for the application of statistics to comparison of tool marks.
  • Our goal: application of objective, numerical computational pattern comparison to tool marks
Caution: Statistics is not a panacea!!!!
The Daubert Standard
• Daubert (1993): Judges are the “gatekeepers” of scientific evidence.
• Must determine if the science is reliable
  • Has empirical testing been done?
    • Falsifiability
  • Has the science been subject to peer review?
  • Are there known error rates?
  • Is there general acceptance?
• Federal Government and 26(-ish) States are “Daubert States”
Tool Mark Comparison Microscope
[Figure: Known Match comparisons, 5/8” consecutively manufactured chisels. Images: G. Petillo. Scale bar: 4 mm]
[Figure: Known NON Match comparisons, 5/8” consecutively manufactured chisels. Images: G. Petillo. Scale bars: 4 mm, 600 um]
Confocal Microscope
[Figure: Marvin Minsky and the first confocal microscope]
Confocal Microscopes
[Figure: confocal light path. Labels: source, illumination aperture, objective lens, focal plane for objective, in focus light, out of focus light, tool mark surface (profile of a striation pattern), sample stage, confocal pinhole, detector]
• Rastering pattern of laser confocal
• Nipkow disk sweeps many pinholes
• Programmable array illumination/detection: get any illumination/detection pattern
[Animation: the sample stage is scanned in the “z”-direction through the objective’s focal plane while the detector records each frame]
For each detector pixel:
• Record the “axial response” as the stage is moved along the z-direction
• The point on the surface corresponding to the pixel is in maximum focus where the axial response peaks
[Figure axis: increasing surface height]
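The per-pixel step above can be sketched in a few lines of Python. This is a minimal illustration, not any vendor's algorithm: the function names and the nested-list layout `z_stack[k][row][col]` are assumptions made for the example, and real instruments typically interpolate around the peak rather than taking the raw maximum.

```python
def height_from_axial_response(z_positions, intensities):
    """Return the z position at which one pixel's axial response peaks."""
    if len(z_positions) != len(intensities) or not z_positions:
        raise ValueError("need one intensity per z position")
    peak_index = max(range(len(intensities)), key=lambda i: intensities[i])
    return z_positions[peak_index]


def height_map(z_positions, z_stack):
    """Build a height map from a z-stack.

    z_stack[k][row][col] is the intensity seen by pixel (row, col)
    when the stage is at z_positions[k] (hypothetical layout).
    """
    rows, cols = len(z_stack[0]), len(z_stack[0][0])
    return [
        [
            height_from_axial_response(
                z_positions, [frame[r][c] for frame in z_stack]
            )
            for c in range(cols)
        ]
        for r in range(rows)
    ]
```

Each pixel is treated independently, which is why the method parallelizes well and why the all-in-focus 2D image can be assembled from the same stack.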
All-in-Focus 2D Image
Overlay confocal “z-stack”
• 3D confocal image of portion of chisel striation pattern
Confocal Microscope Trivia
• Use high NA objectives for best results
• Small working distances
• Flanks up to ~70°
• Cost ~150K – 250K (FTI IBIS ~1M)
• Get a vibration isolation table for your instrument ~7K
• Set up in a (dry) basement if possible
• Accuracy down to +/- 10 nm
• Optical slice thickness = [equation not preserved]
• Some manufacturers:
  • Olympus
    • LEXT (Laser)
  • Zeiss
    • CSM (White Light)
    • LSM (Laser)
  • Nanofocus
    • msurf series (White Light)
  • Sensofar/Leica
    • Plu series/DCM (White Light)
Focus Variation Microscope
[Figure: Scherer and Prantl]
[Figure: “low res” common focus variation microscope, ~ +/- 1mm]
[Figure: focus variation light path. Labels: source, objective lens, focal plane for objective, in focus light, out of focus light, tool mark surface (profile of a striation pattern), sample stage, detector]
[Figure: cutaway of instrument (Alicona GmbH)]
[Animation: the sample stage is scanned in the “z”-direction through the objective’s focal plane]
For each detector pixel:
• Record the “axial response” as the stage is moved along the z-direction
• The point on the surface corresponding to the pixel is in maximum focus where the response peaks
Focus determination:
• For the detector pixel of interest, compute the standard deviation (sd) of the pixel grey values in its neighborhood
• A pixel in focus sits in a neighborhood with a large sd
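As a rough sketch of this focus measure, the Python below computes the neighborhood standard deviation for one pixel and picks the stage position where it is largest. The function names, the square neighborhood, and the `radius` parameter are assumptions for illustration; commercial instruments use their own (often proprietary) focus measures.

```python
from statistics import pstdev


def focus_measure(image, row, col, radius=1):
    """Standard deviation of grey values in a square neighborhood
    around (row, col); the neighborhood is clipped at the image edges."""
    rows, cols = len(image), len(image[0])
    neighborhood = [
        image[r][c]
        for r in range(max(0, row - radius), min(rows, row + radius + 1))
        for c in range(max(0, col - radius), min(cols, col + radius + 1))
    ]
    return pstdev(neighborhood)


def best_focus_z(z_positions, image_stack, row, col, radius=1):
    """Return the z position at which the pixel's neighborhood sd is largest."""
    scores = [focus_measure(img, row, col, radius) for img in image_stack]
    best = max(range(len(scores)), key=scores.__getitem__)
    return z_positions[best]
```

A featureless (blurred) neighborhood has an sd near zero, while sharp texture produces a large sd, which is why focus variation needs some surface texture to work.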
Focus Variation Microscope Trivia
• Use high NA objectives for best results
• Can use external light
• Large working distances
• Flanks up to ~75°
• Cost ~200K – 250K.
  • 80K models WON’T have the vertical resolution needed for forensic work
• Get a vibration isolation table for your instrument ~7K
• Set up in a (dry) basement if possible
• Accuracy down to +/- 10 nm
• Some manufacturers:
  • Alicona
    • IFM
    • Can get optional rotational stage
  • Sensofar/Leica
    • S neox/DCM
Interferometer
[Figure: an incoming wave is split between a fixed mirror and a movable mirror, then recombined. Path lengths equal: recombine in-phase]
[Figure: same layout. Path lengths NOT equal: recombine out-of-phase]
Interferometric Height Measurement
• The basic idea:
  • Each surface point is a “fixed mirror”
  • Move a reference mirror in the objective
  • Split beams recombine in and out of phase
  • Constructive interference occurs when a surface point is in the focal plane
  • Infer the surface heights from where constructive interference occurs
Interferometric Microscope
[Figure: James Wyant; early interferometric microscope for surface metrology (Wyant)]
[Figure: modern interferometric microscope for surface metrology]
[Figure: microscope configuration. Labels: source, camera (detector), beam-splitter, objective lens, reference mirror, piezo, focal plane for objective, tool mark surface (profile of a striation pattern), sample stage. The objective is scanned for interference in the “z”-direction]
• Path lengths equal: point in focus
• Path lengths unequal: point out of focus
Interference Objectives
• Mirau objective: ~10X – 100X
• Michelson objective: ~2X – 10X
• Linnik objective: 100X+
For each detector pixel:
• Record each pixel’s interference pattern as the objective is scanned
• The point on the surface corresponding to the pixel is in maximum focus where the interference fringes are strongest
[Figure: interference patterns recorded as the objective is scanned for interference in the “z”-direction above the sample stage]
[Figure: fringes. Fringe pattern and the corresponding surface (Bruker NSD)]
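Turn Fringes Into A Surface
• Intensity for each detector pixel: I(z) [equation not preserved]
• Fourier transform I(z) to get q(k)
• Compute surface heights (deGroot) [equation not preserved]
[Figure: arg[q(k)] plotted against k, with k0, θ, and A marked]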
Interferometry Trivia
• Use high NA objectives for best results
• Small working distances
• Flanks up to ~25°
• Cost ~200K – 250K.
• Get a vibration isolation table for your instrument ~7K
• Set up in a (dry) basement if possible
• Comes in two modes
  • VSI: Accuracy +/- 10 nm
  • PSI: Accuracy below 1 nm
• Some manufacturers:
  • Bruker (Acquired WYKO/Veeco)
  • Taylor Hobson
  • Sensofar/Leica
    • S neox/DCM
Surface Data
Land Engraved Area, surface heights (mm):
37.88 37.89 37.89 37.90 37.92 37.91 37.93 37.93 37.94 37.99
37.88 37.89 37.87 37.87 37.87 37.85 37.89 37.92 37.97 38.02
37.86 37.85 37.84 37.84 37.84 37.85 37.85 37.92 37.98 38.03
37.84 37.82 37.81 37.81 37.83 37.85 37.88 37.92 37.97 38.04
37.81 37.80 37.80 37.82 37.84 37.86 37.89 37.94 37.98 38.05
37.81 37.78 37.79 37.82 37.85 37.89 37.94 37.96 38.00 38.04
37.82 37.80 37.80 37.83 37.87 37.91 37.98 37.99 38.02 38.05
37.84 37.81 37.80 37.81 37.84 37.89 37.95 37.99 38.01 38.06
37.84 37.80 37.76 37.77 37.78 37.86 37.92 37.96 38.00 38.03
37.80 37.77 37.76 37.74 37.79 37.84 37.90 37.93 37.98 38.00
Points are “double precision”: 64 bits/point. BIG FILES!

Land Engraved Area, detector levels (16-bit values):
16617 16622 16622 16625 16632 16629 16638 16639 16645 16665
16618 16620 16613 16613 16610 16605 16622 16632 16656 16676
16606 16602 16600 16597 16597 16603 16604 16632 16662 16684
16600 16589 16587 16587 16594 16603 16616 16632 16658 16686
16585 16583 16583 16588 16599 16608 16619 16643 16662 16689
16587 16572 16579 16590 16604 16622 16641 16652 16669 16688
16591 16581 16583 16594 16610 16630 16661 16663 16679 16692
16597 16586 16583 16585 16597 16623 16646 16666 16674 16695
16599 16581 16566 16569 16574 16607 16634 16651 16669 16683
16581 16567 16562 16556 16575 16597 16625 16640 16660 16671
Points are detector grey levels: 16 bits/point. Smaller files. Convert to mm in RAM
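The in-memory conversion from grey levels to heights can be sketched as a simple linear map. The `scale` and `offset` parameters below are hypothetical stand-ins for whatever conversion factor a given file format stores; the storage helper only illustrates the 64-bit versus 16-bit size difference.

```python
def levels_to_heights(levels, scale, offset=0.0):
    """Convert a grid of integer detector grey levels to heights.

    scale (height units per grey level) and offset are assumed to come
    from the instrument's file header; they are placeholders here.
    """
    return [[offset + scale * level for level in row] for row in levels]


def storage_bytes(n_points, bits_per_point):
    """Raw storage needed for n_points values, ignoring headers."""
    return n_points * bits_per_point // 8
```

For a 1000 x 1000 point surface, `storage_bytes(1_000_000, 64)` gives 8,000,000 bytes against 2,000,000 bytes at 16 bits/point, a factor of four before any compression.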
Surface Data Trivia
• Different systems use different storage formats
  • Be aware if writing custom apps. ASK COMPANY FOR FILE FORMAT!
  • Alicona: Saves surface data as doubles. HUGE FILES!
  • Zeiss: Saves surface data as 16-bit grey levels with conversion factor
  • Other?? 24, 32-bit detectors now??
• Need to standardize file format!
  • X3D (Zhang, Brubaker)
  • Digital-Surf .sur (Petraco)
Surface Filtering
• Think of a toolmark surface as being made up of a series of waves
• Examine different scales by “blocking out” (filtering) some of the sinusoids
  • “Low Pass” filter blocks high frequencies and passes low frequencies (long wavelengths)
  • “High Pass” filter blocks low frequencies and passes high frequencies (short wavelengths)
Surface Filtering Trivia
• Wavelength “cutoffs”
[Figure: transmission curves of a “High Pass” filter and a “Low Pass” filter, each with cutoff wavelength λcut]
• Wavelength ranges
  • Short wavelengths passed: roughness
  • Medium wavelengths passed: waviness
  • Long wavelengths passed: form
• Band-pass filter: Select narrow wavelength bands to keep.
  • High-pass/Low-pass combinations (Filter banks)
  • Wavelets are great at doing this
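One common way to realize the low-pass/high-pass split on a profile is a Gaussian weighted average. The sketch below is an assumption about implementation, not necessarily what was used in the workshop: it uses the Gaussian weighting function familiar from surface metrology (50% transmission at the cutoff wavelength), truncates the kernel at one cutoff on each side, and renormalizes the weights near the profile ends.

```python
import math

# Constant that gives 50% transmission at the cutoff wavelength.
ALPHA = math.sqrt(math.log(2) / math.pi)


def low_pass(profile, cutoff, spacing):
    """Gaussian low-pass: keeps wavelengths longer than `cutoff`.

    profile: list of heights sampled every `spacing` (same units as cutoff).
    """
    half = int(math.ceil(cutoff / spacing))
    weights = [
        math.exp(-math.pi * ((j * spacing) / (ALPHA * cutoff)) ** 2)
        for j in range(-half, half + 1)
    ]
    smoothed = []
    for i in range(len(profile)):
        num = den = 0.0
        for j, w in enumerate(weights):
            k = i + j - half
            if 0 <= k < len(profile):  # clip the kernel at the profile ends
                num += w * profile[k]
                den += w
        smoothed.append(num / den)
    return smoothed


def high_pass(profile, cutoff, spacing):
    """High-pass is what is left after removing the low-pass part."""
    low = low_pass(profile, cutoff, spacing)
    return [p - l for p, l in zip(profile, low)]
```

Applying `low_pass` with a long cutoff and `high_pass` with a short one, in sequence, gives a band-pass result such as a waviness profile.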
Statistics
Weapon Mark Association
– What measurement techniques can be used to obtain data for toolmarks?
– What statistical methods should be used?
  • How do we measure a degree of confidence for an association, i.e. a “match”?
  • What are the identification error rates for different methods of identification?
Why R?
• R is not a black box!
  • Codes available for review; totally transparent!
• R is maintained by a professional group of statisticians and computational scientists
  • From very simple to state-of-the-art procedures available
  • Very good graphics for exhibits and papers
• R is extensible (it is a full scripting language)
  • Coding/syntax similar to MATLAB
  • Easy to link to C/C++ routines
• Where to get information on R:
  • R: http://www.r-project.org/
    • Just need the base
  • RStudio: http://rstudio.org/
    • A great IDE for R
    • Works on all platforms
    • Sometimes slows down performance…
  • CRAN: http://cran.r-project.org/
    • Library repository for R
    • Click on Search on the left of the website to search for package/info on packages
Finding our way around R/RStudio
Common Computational Practice
• Gauge similarity between tool marks with one number
  • Similarity “metric” is a function which measures “sameness”
  • Only requirement: s(A,B) = s(B,A)
  • There are an INFINITE number of ways to measure similarity!!
• Often max CCF is used.
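A max-CCF score can be sketched as the largest Pearson correlation found while sliding one profile past the other. The function names and the `min_overlap` guard are assumptions for this illustration; production code would normally use an FFT-based correlation for speed.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is flat)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def max_ccf(p, q, max_lag, min_overlap=3):
    """Return (max correlation, lag) over lags in [-max_lag, max_lag].

    At lag L >= 0, p[L + i] is compared with q[i];
    at lag L < 0, p[i] is compared with q[-L + i].
    """
    best = (float("-inf"), 0)
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            a = p[lag:]
            b = q[:len(a)]
            a = a[:len(b)]
        else:
            b = q[-lag:]
            a = p[:len(b)]
            b = b[:len(a)]
        if len(a) < min_overlap:
            continue
        r = pearson(a, b)
        if r > best[0]:
            best = (r, lag)
    return best
```

Swapping the two profiles negates the best lag but leaves the score unchanged, so the score satisfies the symmetry requirement s(A,B) = s(B,A).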
Cross-correlation
[Figure: cross-correlation]
• KNM can sometimes have high max-ccf… (example: max-ccf = 0.751)
• Glock primer shear: Each profile ~2+ mm
  • Lag over 2000 units (~0.8 mm)
  • Max CCF distributions
[Figure: histograms of max CCF scores from “Known Non-Matches” and from “Known Matches”]
We thought: Ehhhhhh…….
• Random variables: all measurements have an associated “randomness” component
• Randomness: patternless, unstructured, typical, total ignorance (Chaitin, Claude)
Multivariate Feature Vectors
• For an experiment/observation, put many measurements together into a list
  • Collect random variables into a list called a random vector
  • Also called: observation vectors, feature vectors
• Potential feature vectors for surface metrology
  • Entire surfaces
  • *Surface profiles
  • Surface/profile parameters
  • Surface/profile Fourier transform or wavelet coefficients
  • Translation/rotation/scale invariant surface (image) moments
[Figure: mean total profile, mean waviness profile, and the waviness profile’s barcode representation]
[Figure: tool marks (screwdriver striation profiles) form database]
Biasotti-Murdock Dictionary
Consecutive Matching Striae (CMS)-Space
Some Important Terms
• Latent Variable: weighted combination of experimental variables into a new “synthetic” variable
  • Also called: scores, components or factors
  • The weights are called loadings
• Most latent variables we will study are linear combinations between experimental variables and loadings:
  • The dot product between an observation vector and a loading vector gives a score
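The score computation itself is just a dot product, sketched below. The function names and the optional `mean` argument (for methods such as PCA that center the data first) are assumptions for the example.

```python
def score(observation, loading, mean=None):
    """Dot product of an (optionally mean-centered) observation vector
    with one loading vector."""
    if mean is not None:
        observation = [x - m for x, m in zip(observation, mean)]
    return sum(x * w for x, w in zip(observation, loading))


def scores(observation, loadings, mean=None):
    """One score per loading vector: the observation's coordinates
    in the space of retained latent variables."""
    return [score(observation, loading, mean) for loading in loadings]
```

For instance, a 4000-point profile scored against 13 loading vectors is reduced to a 13-number feature vector.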
Principal Component Analysis
• PCA:
  • Is a rotation of reference frame
  • Gives new PC directions’ relative importance
    • PC variance
• Technically, PCA is an eigenvalue problem
  • Diagonalize some version of S or R to get the PCs
  • Typically: [equation not preserved; its terms are labeled covariance matrix, matrix of PC “loadings”, and matrix of PC variances]
• For a data frame of p variables, there are p possible PCs.
• s ≅ PC importance, dimension reduction
• Scores are data projected into space of PCs retained
  • Scores plots, either 2D or 3D
Setup for Multivariate Analysis
• Need a data matrix to do machine learning
• Represent each profile as a vector of values: {-4.62, -4.60, -4.58, ...}
  • Each profile or surface is a row in the data matrix
  • Typical length is ~4000 points/profile
  • 2D surfaces are far longer
• HIGHLY REDUNDANT representation of surface data
• PCA can:
  • Remove much of the redundancy
  • Make discrimination computations far more tractable
• How many PCs should we use to represent the data??
  • No unique answer
  • FIRST we need an algorithm to I.D. a toolmark to a tool
• 3D PCA of 1740 real and simulated mean profiles of striation patterns from 58 screwdrivers:
  • ~45% variance retained
Support Vector Machines
• Support Vector Machines (SVM) determine efficient association rules
  • In the absence of specific knowledge of probability densities
[Figure: SVM decision boundary]
• SVM computed as optimization of “Lagrange multipliers”
  • Quadratic optimization problem
  • Convex => SVMs unique, unlike NNs
• k(xi,xj): kernel function
  • “Warps” data space and helps to find separations
  • Many forms depending on application: linear, rbf usually
• C: penalty parameter
  • Controls the margin of error between groups that are not perfectly separable: 0.1 to 10 usually
• The SVM decision rule is given as: [equation not preserved]
  • Equation for a plane in “kernel space”
• Multi group classification handled by “voting”
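For reference, the standard textbook form of the two-class kernel SVM decision rule is sketched below. It is offered as a reminder rather than as the slide's own equation: the symbols α_i (Lagrange multipliers), y_i (training labels, ±1), b (offset), and n (number of training vectors) are the usual textbook names and are assumptions here; k and C are the kernel function and penalty parameter named above.

```latex
f(\mathbf{x}) = \operatorname{sign}\!\left( \sum_{i=1}^{n} \alpha_i \, y_i \, k(\mathbf{x}_i, \mathbf{x}) + b \right),
\qquad 0 \le \alpha_i \le C .
```

Only training vectors with α_i > 0 (the support vectors) contribute to the sum, and the argument of the sign function is linear in the kernel-induced feature space, hence a plane in “kernel space”.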
• How many Principal Components should we use?
PCA-SVM
With 7 PCs, expect ~3% error rate
With 13 PCs, expect ~1% error rate
Canonical Variate Analysis
• This supervised technique is called Linear Discriminant Analysis (LDA) in R
  • Also called Fisher linear discriminant analysis
• CVA is closely related to linear Bayes-Gaussian discriminant analysis
• Works on a principle similar to PCA: Look for “interesting directions in data space”
  • CVA: Find directions in space which best separate groups.
  • Technically: find directions which maximize ratio of between-group to within-group variation
[Figure: project on PC1: not necessarily good group separation! Project on CV1: good group separation!]
Note: There are (#groups - 1) or p CVs, whichever is smaller
• Use the between-group to within-group covariance matrix, W⁻¹B, to find directions of best group separation (CVA loadings, Acv): [equation not preserved]
• CVA can be used for dimension reduction.
  • Caution! These “dimensions” are not at right angles (i.e. not orthogonal)
  • CVA plots can thus be distorted from reality
  • Always check loading angles!
• Caution! CVA will not work well with very correlated data
• Distance metric used in CVA to assign group i.d. of an unknown data point: [equation not preserved]
• If data is Gaussian and group covariance structures are the same then CVA classification is the same as Bayes-Gaussian classification.
[Figure: 2D and 3D CVA scores plots of RB screwdrivers]
PCA vs. CVA
[Figure: 2D scores plots of RB screwdrivers: 2D PCA of striation pattern mean profiles vs. 2D CVA of striation pattern mean profiles]
Error Rate Estimation
• Discriminant functions are trained on a finite set of data
  • How much fitting should we do?
  • What should the model’s dimension be?
• Model must be used to identify a piece of evidence (data) it was not trained with.
  • Accurate estimates for error rates of decision model are critical in forensic science applications.
• The simplest is apparent error rate:
  • Error rate on training set
  • Lousy estimate, but better than nothing
• Cross-Validation: hold out chunks of the data set for testing
  • Known since 1940s
  • Most common: Hold-one-out
• Bootstrap: Random selection of observed data (with replacement)
  • Known since the 1970s
  • Can yield confidence intervals around error rate estimate
• The Best: Small training set, BIG test set
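Hold-one-out cross-validation can be sketched as follows. To keep the example self-contained, a nearest-centroid classifier stands in for the SVM/LDA decision models discussed in this workshop; that substitution, and all function names, are assumptions for illustration.

```python
def nearest_centroid_predict(train_x, train_y, x):
    """Stand-in classifier: assign x to the label with the closest mean vector."""
    sums, counts = {}, {}
    for xi, yi in zip(train_x, train_y):
        acc = sums.setdefault(yi, [0.0] * len(xi))
        for j, value in enumerate(xi):
            acc[j] += value
        counts[yi] = counts.get(yi, 0) + 1
    best_label, best_distance = None, float("inf")
    for label, acc in sums.items():
        centroid = [total / counts[label] for total in acc]
        distance = sum((a - b) ** 2 for a, b in zip(x, centroid))
        if distance < best_distance:
            best_label, best_distance = label, distance
    return best_label


def hold_one_out_error(xs, ys, predict=nearest_centroid_predict):
    """Train on all observations but one, test on the one held out,
    repeat for every observation, and return the fraction misclassified."""
    errors = 0
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        if predict(train_x, train_y, xs[i]) != ys[i]:
            errors += 1
    return errors / len(xs)
```

Because every prediction is made by a model that never saw the held-out observation, this estimate is less optimistic than the apparent error rate.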
[Figure: 18D PCA-SVM primer shear I.D. model, 2000 bootstrap resamples (sample size = 720 real and simulated profiles). Refined bootstrapped I.D. error rate for primer shear striation patterns = 0.35%, 95% C.I. = [0%, 0.83%]]
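A much simplified percentile bootstrap for an error rate is sketched below. It resamples per-observation outcomes (1 = misidentified, 0 = correct) rather than refitting the decision model on each resample, so it is not the “refined” bootstrap reported above; the function name, arguments, and fixed seed are assumptions for the example.

```python
import random


def bootstrap_error_interval(outcomes, n_resamples=2000, level=0.95, seed=0):
    """Percentile bootstrap interval for an error rate.

    outcomes: list of 0/1 flags, one per test identification.
    Returns (observed error rate, lower bound, upper bound).
    """
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    n = len(outcomes)
    rates = sorted(
        sum(rng.choice(outcomes) for _ in range(n)) / n
        for _ in range(n_resamples)
    )
    tail = (1 - level) / 2
    lower = rates[int(tail * n_resamples)]
    upper = rates[min(n_resamples - 1, int((1 - tail) * n_resamples))]
    return sum(outcomes) / n, lower, upper
```

With few observed errors the lower bound often sits at 0%, as in the interval quoted above.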
How good of a “match” is it? Conformal Prediction (Vovk)
• Data should be IID but that’s it
[Figure: cumulative # of errors vs. sequence of unknown observation vectors. 80% confidence: 20% error, slope = 0.2. 95% confidence: 5% error, slope = 0.05. 99% confidence: 1% error, slope = 0.01]
• Can give a judge or jury an easy to understand measure of reliability of classification result
• This is an orthodox “frequentist” approach
  • Roots in Algorithmic Information Theory
• Confidence on a scale of 0%-100%
• Testable claim: Long run I.D. error-rate should be the chosen significance level
How Conformal Prediction works for us
• Given a “bag” of obs with known identities and one obs of unknown identity (Vovk)
• Estimate how “wrong” labelings are for each observation with a non-conformity score (“wrong-iness”)
• Looking at the “wrong-iness” of known observations in the bag:
  • Does labeling-i for the unknown have an unusual amount of “wrong-iness”?? [equation not preserved]
  • For us, one-vs-one SVMs: [equation not preserved]
  • If not:
    • p(possible-ID_i) ≥ chosen level of significance α
    • Put ID_i in the (1 - α)*100% confidence interval
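The bag-and-p-value procedure can be sketched in Python as below. The non-conformity score used here (distance to the mean of the observation's own label) is a deliberately simple stand-in for the one-vs-one SVM scores named above, and all function names are assumptions made for the example.

```python
import math


def nonconformity(x, label, bag):
    """Stand-in "wrong-iness": distance from x to the mean of all
    observations in the bag that carry the same label."""
    same = [xi for xi, yi in bag if yi == label]
    centroid = [sum(column) / len(same) for column in zip(*same)]
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, centroid)))


def conformal_p_values(bag, unknown, labels):
    """For each candidate label, add the unknown to the bag with that label
    and ask how unusual its non-conformity score is among all scores."""
    p_values = {}
    for label in labels:
        augmented = bag + [(unknown, label)]
        scores = [nonconformity(x, y, augmented) for x, y in augmented]
        p_values[label] = sum(s >= scores[-1] for s in scores) / len(scores)
    return p_values


def prediction_set(p_values, significance):
    """Labels kept in the (1 - significance) * 100% confidence interval."""
    return [label for label, p in p_values.items() if p >= significance]
```

Lowering the significance level widens the interval, which is how multi-label I.D. intervals arise; raising it can empty the interval altogether.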
Conformal Prediction
[Figure: 14D PCA-SVM decision model for screwdriver striation patterns. Theoretical (long run) error rate: 5%. Empirical error rate: 5.3%]
• For 95%-CPT (PCA-SVM), confidence intervals will not contain the correct I.D. 5% of the time in the long run
• Straightforward validation/explanation picture for court
Conformal Prediction Drawbacks
• CPT is an interval method
  • Can (and does) produce multi-label I.D. intervals
  • A “correct” I.D. is an interval with all labels
    • Doesn’t happen often in practice…
• Empty intervals count as “errors”
  • Well…, what if the “correct” answer isn’t in the database
  • An “Open-set” problem which Champod, Gantz and Saunders have pointed out
• Must be run in “on-line” mode for LRG
  • After 500+ I.D. attempts run in “off-line” mode we noticed in practice
How good of a “match” is it? Efron Empirical Bayes’
• An I.D. is output for each questioned toolmark
  • This is a computer “match”
  • What’s the probability it is truly not a “match”?
• Similar problem in genomics for detecting disease from microarray data
  • They use data and Bayes’ theorem to get an estimate
  • No disease (genomics) = Not a true “match” (toolmarks)
Empirical Bayes’
• We use Efron’s machinery for the “empirical Bayes’ two-groups model” (Efron)
  • Surprisingly simple!
  • Use binned data to do a Poisson regression
• Some notation:
  • S-, truly no association, Null hypothesis
  • S+, truly an association, Non-null hypothesis
  • z, a score derived from a machine learning task to I.D. an unknown pattern with a group
    • z is a Gaussian random variate for the Null
• From Bayes’ Theorem we can get (Efron): [equation not preserved]
  • Estimated probability of not a true “match” given the algorithm’s output z-score associated with its “match”
  • Names: posterior error probability (PEP) (Kall), local false discovery rate (lfdr) (Efron)
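For orientation, the usual statement of the two-groups model in the S-/S+ notation above is sketched here. It is the standard form found in the empirical Bayes literature rather than a transcription of the slide; the density symbol f is an assumption of this sketch.

```latex
\Pr(S^- \mid z) = \frac{\Pr(S^-)\, f(z \mid S^-)}{f(z)},
\qquad
f(z) = \Pr(S^-)\, f(z \mid S^-) + \Pr(S^+)\, f(z \mid S^+) .
```

Here f(z | S-) is the null density (Gaussian for z), f(z) is the mixture density fitted to the histogram of all z-scores, and Pr(S-) is the prior probability of no association.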
• Suggested interpretation for casework: estimated “believability” of a machine-made association
  • We agree with Gelman and Shalizi (Gelman): “…posterior model probabilities …[are]… useful as tools for prediction and for understanding structure in data, as long as these probabilities are not taken too seriously.”
Empirical Bayes’
• Bootstrap procedure to get an estimate of the KNM distribution of “Platt-scores” (Platt, e1071)
  • Use a “Training” set
• Use this to get p-values/z-values on a “Validation” set
  • Inspired by Storey and Tibshirani’s Null estimation method (Storey)
[Figure: histogram of z-scores. From the fit histogram, Efron’s method gives the “mixture” density, the z-density given KNM (should be Gaussian), and an estimate of the prior for KNM. We can test the fits. What’s the point??]
• Use SVM to get KM and KNM “Platt-score” distributions
  • Use a “Validation” set
Posterior Association Probability: Believability Curve
[Figure: 12D PCA-SVM locfdr fit for Glock primer shear patterns, +/- 2 standard errors]
[Figure: believability curves on the test set from four fits: Poisson (Efron), Bayesian Poisson, Bayesian Poisson with intercept, Bayesian over-dispersed Poisson with intercept]
Bayes Factors/Likelihood Ratios
• In the “Forensic Bayesian Framework”, the Likelihood Ratio is the measure of the weight of evidence.
  • LRs are called Bayes Factors by most statisticians
  • LRs give the measure of support the “evidence” lends to the “prosecution hypothesis” vs. the “defense hypothesis”
• From Bayes’ Theorem: [equation not preserved]
• Once the “fits” for the Empirical Bayes method are obtained, it is easy to compute the corresponding likelihood ratios.
  • Using the identity: [equation not preserved]
  • the likelihood ratio can be computed as: [equation not preserved]
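One way to read this step is through the odds form of Bayes' theorem: posterior odds = likelihood ratio x prior odds. The sketch below assumes that reading, with the association hypothesis S+ in the numerator; the function name and arguments are hypothetical.

```python
def likelihood_ratio(posterior_null, prior_null):
    """LR in favor of association (S+) from the fitted quantities.

    posterior_null: Pr(S- | z), the posterior error probability / lfdr.
    prior_null:     Pr(S-), the estimated prior for no association.
    Both must lie strictly between 0 and 1.
    """
    if not (0.0 < posterior_null < 1.0 and 0.0 < prior_null < 1.0):
        raise ValueError("probabilities must be strictly between 0 and 1")
    posterior_odds = (1.0 - posterior_null) / posterior_null
    prior_odds = (1.0 - prior_null) / prior_null
    return posterior_odds / prior_odds
```

For example, a posterior of 0.01 with a prior of 0.9 gives posterior odds of 99 against prior odds of about 0.11, an LR of roughly 891.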
• Using the fit posteriors and priors we can obtain the likelihood ratios (Tippett, Ramos)
[Figure: distributions of known match LR values and known non-match LR values]
Empirical Bayes’: Some Things That Bother Me
• Need a lot of z-scores
  • Big data sets in forensic science largely don’t exist
• z-scores should be fairly independent
  • Especially necessary for interval estimates around lfdr (Efron)
• Requires “binning” in arbitrary number of intervals
• Also suffers from the “Open-set” problem
• Interpretation of the prior probability for this application
  • Should Pr(S-) be 1 or very close to it? How close?
How to Carry Out a “Successful” Research Project
The Synergy Between Practitioners and Academia
Collaboration
• Practitioners:
  • Think about what questions you want to be able to answer with data BEFORE experimentation
    • Write down proposed questions/design
  • Be aware that the questions you want answers to MAY NOT have answers
    • What can you answer??
  • Be aware that a typical research project takes 1-2 years to complete
  • Research projects are NOT just for interns!
    • Interns typically need tremendous supervision for scientific/applied statistical research
  • Take a college course on statistics/experimental design
    • Rate-my-professor is your friend!
  • Visit local university/company websites to look for the outside expertise you may need.
    • Visit the department, go to some seminars
• Academics/Research consultants:
  • Be aware practitioners cannot just publish whenever and whatever they want
    • Long internal review processes!
  • COMMUNICATION!!!!!
    • Listen carefully to the needs/questions of collaborating practitioners
    • Negotiate the project design
      • What kind of results can be achieved within a reasonable amount of time?
    • Hold regular face to face meetings if possible
  • Applied research is not just for undergraduates/high-school interns!
  • Visit the crime lab!!!!!
    • Watch the practitioners do their job.
    • Learn the tools they use day to day!
      • Microscopy!!!!!
    • Use their accumulated experience to help guide your design/desired outcomes
      • What do they focus on??
Fire Debris Analysis Casework
• Liquid gasoline samples recovered during investigation:
  • Unknown history
  • Subjected to various real world conditions.
• If an individual sample can be discriminated from the larger group, this can be of forensic interest.
• Gas chromatography commonly used to ID gas.
  • Peak comparisons of chromatograms difficult and time consuming.
  • Does “eye-balling” satisfy Daubert, or even Frye .....????
[Figure: 2D PCA scores plot (PC 1 vs. PC 2) of samples labeled 1-20. 97.3% variance retained. Avg. LDA HOO correct classification rate: 83%]
[Figure: 2D CVA scores plot (CV 1 vs. CV 2) of samples labeled 1-20. Avg. LDA HOO correct classification rate: 92%]
Accidental Patterns on Footwear
• Shoe prints contain marks and patterns due to various circumstances that can be used to distinguish one shoe print from another.
• How reliable are the accidental patterns for identifying particular shoes?
[Figure: 3D PCA scores plot (x, y, and z axes) of groups labeled 1-9, each with a point marked X. 59.7% of variance]
Facial Recognition Approach to Accidental Pattern Identification
Tool marks
• Like shoes, tools can leave marks which can be used in identification
  • Class characteristics
  • Subclass characteristics
  • Individual characteristics
Standard Striation Patterns Made with ¼’’ Slotted Screwdriver
• Measure lines and grooves with ImageJ
• Translate ImageJ data to a feature vector that can be processed
[Figure: striation patterns “A, 2, #2” and “C, 8, #4” (Bromberg, Lucky)]
LEA Striations
Questioned Documents: Photocopier Identification
• Mordente, Gestring, Tytell
[Figure: photocopy of a blank sheet of paper]
Dust: Where does it come from?
Any matter or substance:
• both natural and synthetic
• reduces into minute bits, pieces, smears, and residues
• encountered as trace aggregates
Our Environments! Evidence! (N. Petraco)
Where can you find it? Everywhere
• House
• Work
• Outdoors
• Vehicle
(N. Petraco)
Analyze Results
• 3D PCA-Clustering can show potential for discrimination
Bayes Net for Dust in Authentication Case
References
• Bolton-King, Evans, Smith, Painter, Allsop, Cranton. AFTE J 42(1), 23, 2010.
• Artigas. In: Optical Measurement of Surface Topography. Leach ed. Springer, 2011.
• Helmli. In: Optical Measurement of Surface Topography. Leach ed. Springer, 2011.
• deGroot. In: Optical Measurement of Surface Topography. Leach ed. Springer, 2011.
• Efron B. (2010). Large-Scale Inference: Empirical Bayes Methods for Estimation, Testing, and Prediction. New York: Cambridge University Press.
• Gambino C., McLaughlin P., Kuo L., Kammerman F., Shenkin S., Diaczuk P., Petraco N., Hamby J. and Petraco N. D. K. “Forensic Surface Metrology: Tool Mark Evidence”, Scanning 27(1-3), 1-7 (2011).
• JAGS. “A program for analysis of Bayesian hierarchical models using Markov Chain Monte Carlo simulation”, Version 3.3.0. http://mcmc-jags.sourceforge.net/
• Kall L., Storey J. D., MacCoss M. J. and Noble W. S. (2008). Posterior error probabilities and false discovery rates: two sides of the same coin. J Proteome Research, 7(1), 40-44.
• locfdr R package. 2011. locfdr: “Computation of local false discovery rates”, Version 1.1-7. http://cran.r-project.org/web/packages/locfdr/index.html
• Moran B. “A Report on the AFTE Theory of Identification and Range of Conclusions for Tool Mark Identification and Resulting Approaches To Casework”, AFTE Journal, Vol. 34, No. 2, 2002, pp. 227-35.
• Petraco N. D. K., Chan H., De Forest P. R., Diaczuk P., Gambino C., Hamby J., Kammerman F., Kammrath B. W., Kubic T. A., Kuo L., Mc Laughlin P., Petillo G., Petraco N., Phelps E., Pizzola P. A., Purcell D. K. and Shenkin P. “Final Report: Application of Machine Learning to Toolmarks: Statistically Based Methods for Impression Pattern Comparisons”. National Institute of Justice, Grant Report: 2009-DN-BX-K041; 2012.
• Petraco N. D. K., Kuo L., Chan H., Phelps E., Gambino C., McLaughlin P., Kammerman F., Diaczuk P., Shenkin P., Petraco N. and Hamby J. “Estimates of Striation Pattern Identification Error Rates by Algorithmic Methods”, AFTE J., In Press, 2013.
• Petraco N. D. K., Zoon P., Baiker M., Kammerman F., Gambino C. “Stochastic and Deterministic Striation Pattern Simulation”. In preparation 2013.
• Platt J. C. “Probabilities for SV Machines”. In: Advances in Large Margin Classifiers. Eds: Smola A. J., Bartlett P., Scholkopf B., and Schuurmans D. MIT Press, 2000.
• Plummer M. “JAGS: A Program for Analysis of Bayesian Graphical Models Using Gibbs Sampling”, Proceedings of the 3rd International Workshop on Distributed Statistical Computing (DSC 2003), March 20–22, Vienna, Austria.
• Stan Development Team. 2013. “Stan: A C++ Library for Probability and Sampling”, Version 1.3. http://mc-stan.org/
• Storey J. D. and Tibshirani R. “Statistical significance for genome wide studies”. PNAS 2003;100(16):9440-9445.
• Vovk V., Gammerman A., and Shafer G. (2005). Algorithmic learning in a random world. 1st ed. Springer, New York.
• Tippett C. F., Emerson V. J., Fereday M. J., Lawton F., Richardson A., Jones L. T., Lampert S. M. “The Evidential Value of the Comparison of Paint Flakes from Sources other than Vehicles”, J Forensic Sci Soc 1968;8(2-3):61-65.
• Ramos D., Gonzalez-Rodriguez J., Zadora G., Aitken C. “Information-Theoretical Assessment of the Performance of Likelihood Ratio Computation Methods”, J Forensic Sci 2013;58(6):1503-1518.
Acknowledgements
• Professor Chris Saunders (SDSU)
• Professor Christoph Champod (Lausanne)
• Alan Zheng (NIST)
• Research Team:
• Dr. Martin Baiker
• Ms. Helen Chan
• Ms. Julie Cohen
• Mr. Peter Diaczuk
• Dr. Peter De Forest
• Mr. Antonio Del Valle
• Ms. Carol Gambino
• Dr. James Hamby
• Ms. Alison Hartwell, Esq.
• Dr. Thomas Kubic, Esq.
• Ms. Loretta Kuo
• Ms. Frani Kammerman
• Dr. Brooke Kammrath
• Mr. Chris Lucky
• Off. Patrick McLaughlin
• Dr. Linton Mohammed
• Mr. John Murdock
• Mr. Nicholas Petraco
• Dr. Dale Purcel
• Ms. Stephanie Pollut
• Dr. Peter Pizzola
• Dr. Graham Rankin
• Dr. Jacqueline Speir
• Dr. Peter Shenkin
• Mr. Chris Singh
• Mr. Peter Tytell
• Mr. Todd Weller
• Ms. Elizabeth Willie
• Dr. Peter Zoon
Website: Data, codes, reprints and preprints:
toolmarkstatistics.no-ip.org/