Turning Data into Information
Limitations and Solutions
Richard Burrows
First, a little history
• Lead in Albacore: Guide to Lead Pollution in Americans
~ Science, Vol 207, March 1980 p1167
~ Typical results for fresh albacore muscle were around 400 ng/g Pb
~ Typical results for albacore muscle from lead soldered cans were around 700-1000 ng/g
• Therefore, the canning process approximately doubles the concentration of lead in tuna?
• Actually, when analyzed using clean preparation techniques and isotope dilution ICPMS, the concentration of lead in fresh albacore muscle was found to be approx 0.3 ng/g
~ Highly regarded government and commercial laboratories at the time were overestimating the concentration of lead in fresh tuna by over 1000 X.
                                                      Lab 1    Lab 2
Pb in albacore muscle (ng/g)                          400      0.3
Pb in albacore muscle from lead soldered can (ng/g)   700      1400
Factor                                                1.75     4700
Issues with Detection Limits
• MDL
~ Short term
~ Small data set
~ No consideration of blank bias
~ Assumes constant variance
• If there is no blank bias, then no problem, but
Spike Level (ug/L)   STD       N   Mean    Recovery %
0.007                0.03261   8   0.900   12860
0.01                 0.02191   8   0.901   9013
0.015                0.02406   8   0.903   6021
0.02                 0.04697   8   0.840   4201
0.03                 0.25555   8   0.734   2448
0.04                 0.23866   8   0.943   2357
0.07                 0.03795   8   0.950   1357
0.1                  0.07147   8   1.01    1007
0.15                 0.16215   8   1.04    695
0.2                  0.24956   8   1.16    582
0.3                  0.05539   8   1.26    418
0.4                  0.02806   8   1.08    270
0.7                  0.02448   8   1.40    200
1                    0.19004   8   1.79    179
1.5                  0.03847   8   2.24    149
EPA MDL 0.073
EPA ML 0.2
Episode 6000 data, Chromium by 200.8
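The recovery column in the chromium table can be checked with simple arithmetic. The sketch below recomputes percent recovery as mean / spike level x 100 for every row. The names are illustrative, and reading the nearly constant mean at the lowest spike levels as a background of roughly 0.9 ug/L is an interpretation of the table, not a figure stated on the slide.

```python
# (spike level ug/L, mean measured ug/L, tabulated recovery %) from the chromium table.
ROWS = [
    (0.007, 0.900, 12860), (0.01, 0.901, 9013), (0.015, 0.903, 6021),
    (0.02, 0.840, 4201), (0.03, 0.734, 2448), (0.04, 0.943, 2357),
    (0.07, 0.950, 1357), (0.1, 1.01, 1007), (0.15, 1.04, 695),
    (0.2, 1.16, 582), (0.3, 1.26, 418), (0.4, 1.08, 270),
    (0.7, 1.40, 200), (1.0, 1.79, 179), (1.5, 2.24, 149),
]


def recovery_percent(spike, mean):
    """Percent recovery: measured mean relative to the spiked amount."""
    return 100.0 * mean / spike


for spike, mean, tabulated in ROWS:
    print(f"{spike:>6} ug/L  recovery {recovery_percent(spike, mean):8.0f}%  (table: {tabulated}%)")

# Assumption: at the lowest spike the measured mean is almost entirely background,
# so mean minus spike gives a rough estimate of the blank bias.
apparent_background = ROWS[0][1] - ROWS[0][0]
print(f"apparent background near {apparent_background:.2f} ug/L")
```

Under that reading, the background is more than ten times the EPA MDL of 0.073 ug/L, which is why a spike-only MDL understates what the method can actually detect.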
• The more sensitive the method, the bigger the problem with ignoring blank bias in the detection limit determination
~ ICPMS
~ Method 1668 PCBs
~ Method 1631 Mercury
~ SIM analysis
Method blank detect rates, multi lab study
• 8270C 0.3%
• 8270 SIM 6.4%
• 8260B 2%
• 8021B 16%
• ICPMS 8%
Solutions to Detection Limits
• Consider long term variability
• Keep non-constant variance in mind
• Consider qualitative identification criteria
Take Blank Bias into Account!!
A Better MDL
• Use method blanks: DL = <X> + ts
• Estimate QL: at least 2X DL
• Check QL with spikes: all results > DL, RSD OK, recovery OK
• Ongoing: quarterly QL verification, periodic reassessment of <X> + ts
Detection/Quantitation Federal Advisory Committee Procedure
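A minimal sketch of the DL = <X> + ts calculation follows, taking <X> as the mean of the method blanks, s as their standard deviation, and t as a one-sided Student's t value. The 99% confidence level, the blank results, and the function name are assumptions for illustration; the t values are standard table entries.

```python
import statistics

# One-sided 99% Student's t values keyed by degrees of freedom (standard table values).
T_99 = {6: 3.143, 7: 2.998, 8: 2.896, 9: 2.821, 10: 2.764, 11: 2.718}


def detection_limit(blank_results):
    """Return (mean, t*s, mean + t*s) for a set of method blank results."""
    df = len(blank_results) - 1
    if df not in T_99:
        raise ValueError("add a t value for this number of blanks")
    mean = statistics.mean(blank_results)
    spread = T_99[df] * statistics.stdev(blank_results)
    return mean, spread, mean + spread


# Hypothetical method blank results in ug/L (not taken from the presentation).
blanks = [0.85, 0.92, 0.88, 0.95, 0.90, 0.87, 0.93, 0.90]
blank_mean, t_times_s, dl = detection_limit(blanks)
print(f"mean = {blank_mean:.3f}, t*s = {t_times_s:.3f}, DL = {dl:.3f} ug/L")
```

With these made-up blanks the t*s term alone is about 0.1 ug/L, while the blank-corrected DL is about 1.0 ug/L, so leaving out <X> would understate the limit by roughly a factor of ten.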
Issues with Quantitation Limits
• Until recently, no requirement for a prepped standard at the quantitation limit
• Based on MDL
• Precision based on statistical prediction
~ 3 times MDL, therefore 10% RSD
Solutions to Quantitation Limits
• Recent method and regulatory updates
~ SW-846 Update V
◦ LLOQ standard, prepped, per quarter at least, must be within 50% of true value, or generate in house limits
~ Drinking water methods
◦ 7 replicates initially, at MRL, prediction interval within 50-150%
~ Texas PQL
◦ 8 spikes at the PQL
◦ 10% RSD, metals; 20% RSD, volatiles; 30% RSD, semivolatiles
Solutions to Quantitation Limits
• DQFAC procedure
~ 7 replicates at QL, then quarterly verifications
~ Limits not defined in the procedure
DQFAC
• What we need a procedure to do:
~ Provide an explicit, verifiable estimate of bias at the quantitation limit
~ Provide an explicit, verifiable estimate of precision at the quantitation limit
~ Provide that qualitative identification criteria defined in the method are met at the quantitation limit
~ Assess multi and inter laboratory variability when data from more than one laboratory is used
A Better Quantitation Limit
FACDQ Procedure
• Estimate QL: at least 2X DL
• Analyze a minimum of 7 replicates divided into at least 3 batches
Texas PQL Procedure
• Use TCEQ PQL level
• Analyze a minimum of 8 replicates
A Better Quantitation Limit
Assess Results
FACDQ Procedure
• Precision and accuracy better than any regulatory requirements
Texas PQL Procedure
• Precision and accuracy better than TCEQ requirements
Lowest expected result > detection limit
Ongoing verification
FACDQ Procedure
• At least 4 spikes at QL per year
• Evaluate at least every 2 years
Texas PQL Procedure
• At least 4 spikes at PQL per year
• Evaluate at least once per year
FACDQ Procedure
• Precision and accuracy better than any regulatory requirements
Texas PQL Procedure
• Precision and accuracy better than TCEQ requirements
Ongoing verification assessment (continues
Lowest expected result evaluation
Qualitative identification evaluation
Determination of Precision and Accuracy Criteria
• Step 1: GUESS
            Metals    Volatiles  Semivolatiles
Precision   10% RSD   20% RSD    30% RSD
Accuracy    70-130%   70-130%    50-150%
There will be poor performers…
Determination of Precision and Accuracy Criteria
• Step 2: VERIFY
• Spike at multiple levels around the anticipated quantitation limit
Analyte         Spike levels (ug/L)
Benzene         0.5    1    2    4     8
Acrylonitrile   12.5   25   50   100   200
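One way to run the verify step is sketched below: compute RSD and mean recovery at each spike level and report the lowest level that meets the guessed criteria. The replicate values are hypothetical, the function names are illustrative, and the 20% RSD and 70-130% recovery limits are the volatiles guesses from Step 1.

```python
import statistics


def evaluate_level(true_conc, results, max_rsd, rec_low, rec_high):
    """Return (RSD %, mean recovery %, passes) for replicate spikes at one level."""
    mean = statistics.mean(results)
    rsd = 100.0 * statistics.stdev(results) / mean
    recovery = 100.0 * mean / true_conc
    passes = rsd <= max_rsd and rec_low <= recovery <= rec_high
    return rsd, recovery, passes


def lowest_passing_level(spike_data, max_rsd, rec_low, rec_high):
    """Lowest spike level whose replicates meet both criteria, or None."""
    for level in sorted(spike_data):
        if evaluate_level(level, spike_data[level], max_rsd, rec_low, rec_high)[2]:
            return level
    return None


# Hypothetical benzene replicates (ug/L) at three of the spike levels listed above.
benzene = {
    0.5: [0.30, 0.62, 0.45, 0.80, 0.38, 0.55, 0.70],
    1: [0.90, 1.10, 0.85, 1.20, 1.05, 0.95, 1.00],
    2: [1.90, 2.10, 2.00, 2.05, 1.95, 2.10, 1.90],
}
candidate_ql = lowest_passing_level(benzene, max_rsd=20.0, rec_low=70.0, rec_high=130.0)
print(candidate_ql)
```

In this made-up data set the 0.5 ug/L level recovers well on average but fails on precision, so 1 ug/L is the lowest level that could be defended as a quantitation limit.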
Evaluation levels
• Metals
Method   Analyte    Evaluation levels
6010     Thallium   blank   1      2     4    8    16
6010     Vanadium   blank   5      10    20   40   80
6020     Thallium   blank   0.5    1     2    4    8
6020     Vanadium   blank   1.25   2.5   5    10   20
[Figure: Arsenic. RSD vs. True Concentration (T). X-axis: True Concentration (T), 0 to 6. Y-axis: RSD, log scale, 1% to 1000%. Series: Data, Constant-SD Model, SL-SD Model, Expo-SD Model, Hybrid-SD Model, IQE10%, IQE20%, IQE30%.]
[Figure: o-Xylene. RSD vs. True Concentration (T). X-axis: True Concentration (T), 0 to 25. Y-axis: RSD, log scale, 1% to 100%. Same series as above.]
[Figure: Vinyl acetate. RSD vs. True Concentration (T). X-axis: True Concentration (T), 0 to 60. Y-axis: RSD, log scale, 1% to 100%. Same series as above.]
Objectives
• NOT the lowest quantitation limits that can be achieved
~ Reasonable limits that are relevant to groundwater monitoring criteria and can be achieved by most labs
~ PQLs that can be verified by data analyzed at the PQL
Next Steps
• Gather additional data bracketing expected quantitation limits
• 30 plus labs involved
• Attempt to mimic real world conditions
• Large data set will be available in about 6 months
Summary
• Determine limits using multi lab data
• Consider regulatory need
• Identify poor performers
• Individual labs demonstrate ability to meet limits
• Update limits if necessary
• Don’t need low ppb levels for minerals
• Some analytes will not meet desired MQOs
• Spiking at the PQL
• TCEQ collects ongoing verification data
• IQE was used; other procedures could be used
Problems with solutions to Quantitation limits
• Key points
~ Spike at or very close to the quantitation limit
Problems with solutions to Quantitation limits
• Precision is highly dependent on how the data is generated
• Method 8260
• 70 analytes, spiked at 0.2 ug/L, one batch
~ Average RSD = 8.2%
• Multiple batches, multiple instruments, spikes aged after preparation to simulate holding time
Spike level   1 ug/L   2 ug/L   5 ug/L   10 ug/L   20 ug/L
RSD           24.6%    15.7%    13.0%    13.4%     12.8%
Issues with Calibration
• Analyze at least 5 points
• RSD, linear regression, quadratic regression
• r, r2 > 0.990 (0.995)
The curve that cannot fail
Conc   Resp
1      0.00
2      0.00
3      0.00
4      0.00
5      0.00
10     0.00
100    117
slope 0.81564
corr 0.99679
int 4.16667
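The correlation coefficient on this slide can be reproduced directly. The sketch below reads the table as six standards with zero response plus one standard at 100 with a response of 117 (a reading of the flattened slide table), and computes Pearson's r by hand so that no external library is needed.

```python
import math

conc = [1, 2, 3, 4, 5, 10, 100]
resp = [0, 0, 0, 0, 0, 0, 117]  # six standards show no response at all


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


r = pearson_r(resp, conc)
print(f"r = {r:.5f}, r2 = {r * r:.3f}")
```

The result matches the corr value on the slide and passes an r > 0.995 criterion even though six of the seven standards were not detected: the single high point dominates the statistic.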
Calibration issues
r = 0.997, r2 = 0.994, RSE = 179%
Dalapon: RSE = 63%
Solutions to Calibration
• Calculate “readback” for each level
~ Recent drinking water methods
~ Recent SW-846 methods
• Pros
~ Provides an indication of the error introduced at each level
~ Conceptually straightforward
• Cons
~ Lots of numbers!
~ Difficult to compare different curve types
~ Need to be careful with criteria
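As an illustration of readback, the sketch below applies the slope and intercept quoted on "The curve that cannot fail" slide to that slide's responses and expresses each back-calculated concentration as a percentage of the true value. Treating the fit as concentration = intercept + slope x response is an assumption, chosen because it is consistent with the quoted intercept; the variable names are illustrative.

```python
# Fit parameters quoted on "The curve that cannot fail" slide.
SLOPE = 0.81564
INTERCEPT = 4.16667

# (true concentration, response), reading the slide's table as six zero responses.
STANDARDS = [(1, 0.0), (2, 0.0), (3, 0.0), (4, 0.0), (5, 0.0), (10, 0.0), (100, 117.0)]


def readback_percent(true_conc, response):
    """Back-calculated concentration as a percentage of the true concentration."""
    calculated = INTERCEPT + SLOPE * response
    return 100.0 * calculated / true_conc


readbacks = [readback_percent(c, r) for c, r in STANDARDS]
for (c, _), rb in zip(STANDARDS, readbacks):
    print(f"level {c:>3}: readback {rb:6.1f}%")
```

The top standard reads back at close to 100%, while the lowest reads back at more than 400%, so a per-level check exposes a failure that the correlation coefficient hides.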
Solutions to calibration
• RSE
~ Extends applicability of RSD (used for average curve) to all other curve types
• Pros
~ Allows easy comparison of curve types
~ Will indicate failing calibration if any point (high or low concentration) has a high deviation from the curve
~ Can use same criteria as RSD
• Cons
~ Not currently available in most chromatographic data systems
RSE examples
            Ex. 1   Ex. 2   Ex. 3   Ex. 4   Ex. 5   Ex. 6
Error 1     20%     50%     100%    34%     50%     30%
Error 2     20%     20%     20%     28%     0%      10%
Error 3     20%     20%     20%     5%      0%      10%
Error 4     20%     20%     20%     3%      0%      10%
Error 5     20%     20%     20%     1%      0%      10%
Error 6     20%     20%     20%     6%      0%      10%
Error 7     20%     20%     20%     8%      0%      10%
RSE (RSD)   24%     31%     50%     20%     22%     17%
Guidelines Establishing Test Procedures for the Analysis of Pollutants Under the Clean Water Act; Analysis and Sampling Procedures
When a regression curve is calculated as an alternative to using the average response factor, the quality of the calibration may be evaluated using the Relative Standard Error (RSE). The acceptance criterion for the RSE is the same as the acceptance criterion for Relative Standard Deviation (RSD), in the method. RSE is calculated as:
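$$\mathrm{RSE} = 100 \times \sqrt{\frac{\sum_{i=1}^{n} \left[ \frac{CP_i - C_i}{C_i} \right]^2}{n - p}}$$
where $C_i$ is the true concentration of calibration level $i$, $CP_i$ is the concentration calculated from the curve at that level, $n$ is the number of calibration levels, and $p$ is the number of terms in the fitting equation.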
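The RSE examples table can be reproduced from this formula. The sketch below takes the seven relative errors of each example and assumes p = 2 (a linear fit), which is the value that gives the bottom row of the table; the function and variable names are illustrative.

```python
import math


def relative_standard_error(relative_errors, p=2):
    """RSE in percent from per-level relative errors, given as fractions."""
    n = len(relative_errors)
    if n <= p:
        raise ValueError("need more calibration levels than fitted terms")
    return 100.0 * math.sqrt(sum(e * e for e in relative_errors) / (n - p))


# Columns of the "RSE examples" table, as fractions.
EXAMPLES = [
    [0.20] * 7,
    [0.50] + [0.20] * 6,
    [1.00] + [0.20] * 6,
    [0.34, 0.28, 0.05, 0.03, 0.01, 0.06, 0.08],
    [0.50] + [0.00] * 6,
    [0.30] + [0.10] * 6,
]
rse_values = [round(relative_standard_error(errors)) for errors in EXAMPLES]
print(rse_values)  # [24, 31, 50, 20, 22, 17]
```

A single 50% error with six perfect points still gives an RSE of 22%, so one badly fitted level is enough to exceed a 20% criterion.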
8081A
15 pesticides identified. Which are real?
8330B
Solutions to Sample Matrix
• ICPMS – instrumentation advances
• Complex chromatograms – possible techniques exist, but are not used because of cost – GC/GC
• Cleanups
2D GC
Blank Acid Matrices and IPA in ICPMS No Gas Mode
[Figure: No Gas Mode mass spectrum, mass 45 to 80, scale 2E5 cps. Unspiked 5% HNO3 + 5% HCl + 1% H2SO4 + 1% IPA matrix. Color of spectrum indicates which matrix gave each interfering peak. Peak labels include ClO, ArC, ArN, ArO, CaO, NaCl, S2, SO2, ArS, Cl2, Ar2, ArCl, ArOH, CaOH, NaS, Ca2, ArCa, S2O, SO3, Br, Ar2H, ArN2H, SO2H, ClN2, ArNa, NaClH, SO, SOH, CO2, SN, CO2H, Cl2H, ArCO, ArCN.]
• Unspiked matrix: ALL peaks are due to polyatomic interferences
• Multiple polyatomic interferences affect almost every mass; interferences are matrix-dependent
Blank Acid Matrices and IPA in He Mode
[Figure: He Mode mass spectrum, mass 45 to 80, scale 2E5 cps. Unspiked 5% HNO3 + 5% HCl + 1% H2SO4 + 1% IPA matrix. Color of spectrum indicates which matrix gave each interfering peak.]
• ALL polyatomic interferences are removed in He Mode (same cell conditions)
• Is sensitivity still OK?
Matrix Mix with Spike (10ppb) in He Mode
[Figure: He Mode mass spectrum, mass 45 to 80, scale 2E5 cps. 10ppb spike in 5% HNO3 + 5% HCl + 1% H2SO4 + 1% IPA matrix.]
• Consistent high sensitivity for all isotopes of all elements in He Mode
• Good signal for all spike elements at 10ppb spike
• Perfect template fit for all elements: no residual interferences and no loss of analyte signal by reaction
False Positive Probability In Real Data
Dataset
• 19 labs, one month of blank measurements
• 301,520 individual blank measurements
• 1,306 distinct analytes
• 9,991 above MDL (3.3%)
• 1,097 above RL (0.4%)
• One or more hits above the MDL in 302 analytes
The good news
Analyte Name    Hits above MDL    Number of blanks
Methyl tert-butyl ether 0 2521
1,1,1-Trichloroethane 0 2016
Chloroethane 0 2011
Trichlorofluoromethane 0 1958
Dichlorodifluoromethane 0 1922
2-Hexanone 0 1871
1,1,1,2-Tetrachloroethane 0 1787
Vinyl acetate 0 1700
Chlorodibromomethane 0 1678
2,2-Dichloropropane 0 1657
Acrylonitrile 0 1648
Bromobenzene 0 1642
sec-Butylbenzene 0 1642
1,1,2-Trichloro-1,2,2-trifluoroethane 0 1501
Chlorobromomethane 0 1421
Continued
Analyte Name    Hits above MDL    Number of blanks
2-Chloroethyl vinyl ether 0 1373
Isopropyl ether 0 1185
Acetonitrile 0 1060
Cyclohexane 0 1017
Methyl methacrylate 0 995
Propionitrile 0 951
Ethyl methacrylate 0 897
Methacrylonitrile 0 854
Isobutyl alcohol 0 844
Pentachloroethane 0 806
Acenaphthylene 0 768
3-Chloro-1-propene 0 726
4-Nitrophenol 0 646
Hexachloroethane 0 642
And so on…
Analyte Name Hits above MDL Number of blanks
2,4,6-Trichlorophenol 0 602
1-Chlorohexane 0 599
2,6-Dinitrotoluene 0 588
Hexachlorocyclopentadiene 0 547
Dimethyl phthalate 0 533
Isophorone 0 530
2,4,5-Trichlorophenol 0 523
2,4-Dichlorophenol 0 516
2-Chloronaphthalene 0 516
2,4-Dinitrophenol 0 515
2,4-Dimethylphenol 0 512
2-Chlorophenol 0 511
2-Nitrophenol 0 508
3,3'-Dichlorobenzidine 0 507
N-Nitrosodiphenylamine 0 502
And on…
Analyte Name    Hits above MDL    Number of blanks
4-Bromophenyl phenyl ether 0 500
N-Nitrosodi-n-propylamine 0 500
bis(2-Chloroethoxy)methane 0 500
4-Chlorophenyl phenyl ether 0 497
4-Chloro-3-methylphenol 0 494
2-Methylphenol 0 481
Tert-butyl ethyl ether 0 480
4,6-Dinitro-2-methylphenol 0 473
Carbazole 0 460
4-Chloroaniline 0 459
2-Nitroaniline 0 455
4-Nitroaniline 0 443
3-Nitroaniline 0 440
Bromodichloromethane 0 426
For another 60 pages if we wanted to go that long
Analyte Name Hits above MDL Number of blanks
gamma-BHC (Lindane) 0 423
Heptachlor epoxide 0 415
Methoxychlor 0 407
Pyridine 0 389
n-Heptane 0 388
4,4'-DDE 0 386
Dichlorofluoromethane 0 385
4,4'-DDD 0 383
Aldrin 0 383
Dieldrin 0 381
Endrin aldehyde 0 380
Endosulfan sulfate 0 380
Endosulfan I 0 378
Benzyl chloride 0 372
Ethyl acetate 0 370
The bad news
Analyte Name    Hits above MDL    Number of blanks    % above MDL    % above RL
Naphthalene 410 3021 13.6% 0.5%
Methylene Chloride 364 2006 18.1% 0.7%
Ca 323 1609 20.1% 3.0%
Si 315 753 41.8% 2.8%
SiO2 272 587 46.3% 2.4%
Zn 268 1912 14.0% 0.8%
Acetone 250 1921 13.0% 1.2%
2-Butanone (MEK) 228 1746 13.1% 3.8%
Cu 223 1873 11.9% 1.1%
Al 195 1516 12.9% 0.4%
Toluene 179 2794 6.4% 0.1%
Continued
Analyte Name    Hits above MDL    Number of blanks    % above MDL    % above RL
Hexachlorobutadiene 171 2182 7.8% 0.2%
B 165 1200 13.8% 5.8%
1,2-Dichlorobenzene 163 2823 5.8% 0.0%
Mo 160 1446 11.1% 3.9%
Cr 157 1800 8.7% 0.3%
Mn 145 1646 8.8% 0.3%
Ba 141 1654 8.5% 0.2%
K 139 1484 9.4% 2.8%
Na 137 1808 7.6% 0.9%
Benzene 136 2721 5.0% 0.0%
A few more
Analyte Name    Hits above MDL    Number of blanks    % above MDL    % above RL
1,2,4-Trichlorobenzene 135 2319 5.8% 0.0%
Pb 134 1974 6.8% 0.1%
Fe 132 1708 7.7% 0.3%
Tl 124 1627 7.6% 0.3%
Sb 123 1634 7.5% 1.0%
Bromomethane 122 1986 6.1% 0.0%
1,2,4-Trimethylbenzene 120 1959 6.1% 0.0%
Mg 118 1566 7.5% 0.0%
1,2,3-Trichlorobenzene 110 1679 6.6% 0.0%
Ethylbenzene 108 2683 4.0% 0.3%
1,3-Dichlorobenzene 104 2808 3.7% 0.0%
Ti 102 1094 9.3% 0.0%
Methods
• ICP 10%, ICPMS 8%
• 8021B 16%, 8260B 2%
• 8270C 0.3%, 8270 SIM 6.4%
• Various semivolatile hydrocarbon methods, 24% to 33%
• What do these hits in blanks tell us about the probability of a false positive in a sample?
[Figure: Compound “A”. Method blank (X) and sample (Y) results plotted by date, 09/09/2008 to 08/25/2009, on a scale from -0.2 to 0.4, with RL and MDL lines. 3% of method blanks and 5% of samples fall between LOD and LOQ.]
False Positive Rate Samples
Assumption: the false positive rate in samples = detect rate in blanks
• Detect rate in blanks = 3%, so false positive rate in samples = 3%
• Sample detects between MDL and RL = 5%
• Chance that a hit in a sample is a F+ = 3% / 5% x 100% = 60%
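The arithmetic behind this estimate is a single ratio, sketched below under the slide's own assumption that samples produce false positives at the same rate as blanks. The function name is illustrative; the second call uses the four-lab counts (3,511 expected from blanks, 5,043 reported between MDL and RL) that appear later in the deck.

```python
def false_positive_share(blank_detects, sample_detects):
    """Fraction of low-level sample detects that could be false positives.

    Accepts either rates (0.03, 0.05) or counts (3511, 5043), as long as both
    arguments are on the same basis. Capped at 1.0 because blanks can, by
    chance, show more hits than samples do.
    """
    if sample_detects <= 0:
        raise ValueError("sample detect rate or count must be positive")
    return min(1.0, blank_detects / sample_detects)


compound_a = false_positive_share(0.03, 0.05)
four_labs = false_positive_share(3511, 5043)
print(f"Compound A: {compound_a:.0%}")
print(f"Four labs:  {four_labs:.0%}")
```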
A Closer look at 4 labs
• Actual reported results from samples based on requirement to report to MDL
• 138,212 reported results
How many do we expect from the blanks?
• 5,043 reported results between MDL and RL, 3.6%
• Expected number based on blanks is 3,511, 2.3%
• If the frequency of false positives in samples is the same as that in blanks, 3,511 of these results would be false positives
• Of those results between MDL and RL, 70% might be false positives
YIKES!!!!!!!
What was that again?
• If the frequency of false positives in samples is the same as that in blanks
• Of those results between MDL and RL, 70% are likely to be false positives
Questions?