-
8/13/2019 Best Practices for Efficient Soil Sampling Designs
1/103
Best Practices for Efficient Soil
Sampling Designs
Module 1
-
10June2008 Triad Investigations: New Approaches and Innovative Strategies 2
The Point of this Workshop
is not to teach you to be statisticians.
Rather, it is to teach you basic concepts so that you are able to ask for what you need & detect issues with statistically based soil sampling program designs.
is not to say statistics are wrong, never appropriate, or cannot be used.
Rather, it is to show that statistics cannot be used blindly or as a black box for data collection design/interpretation,
and to identify pitfalls common to sampling programs, the problems they cause, and how to avoid them.
-
Workshop Agenda
Welcome
Where does decision uncertainty come from?
You can't find the answer if you don't know the question!
Beware of statistics bearing assumptions
The cure for sampling dilemmas: use emerging best practices
-
Instructors
Deana Crumbling, [email protected]
Office of Superfund Remediation & Technology Innovation
U.S. Environmental Protection Agency
Washington, D.C.
(703) 603-0643
Robert Johnson, [email protected]
Environmental Science Division
Argonne National Laboratory
Argonne, Illinois
(630) 252-7004
-
Software Resources and Disclaimer
Several software packages are referenced. References do not constitute an
endorsement. For more information:
Visual Sampling Plan (http://dqo.pnl.gov/)
ProUCL (http://www.epa.gov/esd/tsc/software.htm)
BAASS (http://web.ead.anl.gov/baass/register2/)
-
Take Away Points
Statistics often are used to address cleanup decision-making uncertainty
Care must be taken when applying traditional statistical approaches to soil sampling programs
Be extremely cautious with black-box software
Systematic planning, dynamic work plans, and real-time measurement techniques (the Triad) can greatly improve sampling program designs
-
Where Does Uncertainty Come from When Making
Restoration Decisions?
Module 2
-
As we know, there are known knowns. There
are things we know we know. We also know
there are known unknowns. That is to say we know there are some things we do not know.
But there are also unknown unknowns, the
ones we don't know we don't know.
Donald Rumsfeld, Feb. 12, 2002, Department of Defense news briefing
-
Decision Uncertainty Comes from a Variety of Sources
Political, economic, organizational & social uncertainty (outside scope of discussion) [know we know]
Model uncertainty (also outside discussion scope, although approaches to be discussed may provide mechanisms for addressing this) [know we don't know]
Data uncertainty: the uncertainty introduced into decision-making by uncertainty associated with the data sets used to support decisions [don't know we don't know]
Data uncertainty is the primary focus of the training; sampling programs play an important role.
-
Decision Quality Only as Good as the
Weakest Link in the Data Quality Chain
Chain segments: Sampling, Analysis, Interpretive
Links: Sample Support, Sampling Design, Sample Preservation, Sub-Sampling, Sample Prep Method, Determinative Method, Result Reporting, Extract Cleanup Method, Relationship between Measurement Parameter & Decision Parameter
Each link represents a variable contributing toward the quality of the analytical result. All links in the data quality chain must be intact for data to be of decision-making quality!
-
Taking a Sample for Analysis
[Figure labels: Population; Soil Core Sample; Field Subsample; Lab Subsamples (Duplicates); Analytical Sample Prep; Analytical Sample Unit; GC result reported as 23.4567 ppm]
-
Historically the Focus Has Been Analytical Quality
Emphasis on fixed laboratory analyses following well-defined protocols
Analytical costs driven to a large degree by QA/QC requirements
Result:
analytical error typically on the order of +/-30% for replicate analyses
traditional laboratory data treated as "definitive," but definitive (definite) about what?
-
The Biggest Cause of Misleading Data
-
Within-Sample Variability: Interaction between Contaminant & Matrix Materials
Firing Range Soil Grain Size (Std Sieve Mesh Size)      Pb Concentration in fraction by AA (mg/kg)
Greater than 3/8 in. (0.375 in.)                        10
Between 3/8 in. and 4-mesh                              50
Between 4- and 10-mesh                                  108
Between 10- and 50-mesh                                 165
Between 50- and 200-mesh                                836
Less than 200-mesh                                      1,970
Bulk Total                                              927 (wt-averaged)
Adapted from ITRC (2003)
The decision determines representativeness
-
The Nugget Effect
Regulatory & field practices have long assumed that sample size/volume has no effect on analytical results
Now we know the assumption is inaccurate because of micro-scale (within-sample) heterogeneity.
Sample volume affects the analytical result!
Concentrated Particles within Less Concentrated Matrix = Nugget Effect
[Figure labels: Soil Subsample; Sample Prep; 2 g; 5 g]
-
Micro-scale Heterogeneity Causes Highly Variable Data Results
True sample mean known to be 192 ppm
Columns: (1) replicate subsample weight taken from a large, ground & sieved soil sample; (2) range of results for 20 replicate subsamples (ppm); (3) how many subsamples to average so result is close (±25%) to true sample mean [144 - 240 ppm]
1 g        101 - 800      39
10 g       136 - 343      5
100 g      170 - 230      1
Adapted from DOE (1978) americium-241 study
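The 144 - 240 ppm window quoted on this slide follows from the known mean of 192 ppm and the 25% tolerance. A minimal sketch of that arithmetic is given here; the variable and function names are illustrative and do not come from the DOE study.

```python
# Reproduce the tolerance window around the known sample mean (192 ppm).
TRUE_MEAN_PPM = 192.0
TOLERANCE = 0.25  # the slide defines "close" as within 25% of the true mean

low = TRUE_MEAN_PPM * (1 - TOLERANCE)
high = TRUE_MEAN_PPM * (1 + TOLERANCE)

# Ranges of the 20 replicate results reported for each subsample weight (ppm).
replicate_ranges = {"1 g": (101, 800), "10 g": (136, 343), "100 g": (170, 230)}


def range_within_window(result_range, low, high):
    """True when every replicate result already falls inside the window."""
    return low <= result_range[0] and result_range[1] <= high


for weight, result_range in replicate_ranges.items():
    print(weight, result_range, range_within_window(result_range, low, high))
```

Only the 100 g range sits entirely inside the window, which is consistent with the slide's statement that a single 100 g subsample is enough while smaller subsamples must be averaged.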
-
Short-Scale Variability Can Also Be Significant
[Figure: seven numbered sampling locations (1 to 7) with a 2 ft scale marker; results of 136; 286; 416; 1,220; 27,700; 41,400; and 42,800 ppm]
Figure adapted from Jenkins (CRREL), 1996
-
Heterogeneity Overwhelms Variability from Different Analytical Techniques
[Figure: the same seven numbered locations (1 to 7, 2 ft scale marker), each with an on-site and a lab result]
On-site (ppm)      Lab (ppm)
164                136
331                286
39,800             41,400
1,280              1,220
27,800             42,800
24,400             27,700
500                416
95% of data variability due to sample location over a 4 ft diameter
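One rough way to see why location dominates is to split the total sum of squares of the seven on-site/lab pairs into a between-location part and a within-location (method disagreement) part. This is a simple sketch under that assumption, not the analysis used in the original study, so it need not reproduce the 95% figure exactly.

```python
from statistics import mean

# (on-site, lab) result pairs read from the slide, one pair per location (ppm).
pairs = [(164, 136), (331, 286), (39800, 41400), (1280, 1220),
         (27800, 42800), (24400, 27700), (500, 416)]

grand_mean = mean(v for pair in pairs for v in pair)

# Between-location sum of squares: distance of each location mean from the grand mean.
ss_location = sum(len(pair) * (mean(pair) - grand_mean) ** 2 for pair in pairs)

# Within-location sum of squares: disagreement between the two results at one location.
ss_method = sum((v - mean(pair)) ** 2 for pair in pairs for v in pair)

location_share = ss_location / (ss_location + ss_method)
print(f"share of variability tied to location: {location_share:.0%}")
```

The within-location term here also absorbs any subsampling differences between the two measurements, so it overstates the pure method effect if anything.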
-
Uncertainty Math Magnifies Weakest Link's Effects in Data Quality Chain
Uncertainties add according to a^2 + b^2 = c^2, that is, AU^2 + SU^2 = TU^2
(AU = Analytical Uncertainty, SU = Sampling Uncertainty, TU = Total Uncertainty)
Example:
AU = 10 ppm, SU = 80 ppm: TU = 81 ppm
AU = 5 ppm, SU = 80 ppm: TU = 80 ppm
AU = 10 ppm, SU = 40 ppm: TU = 41 ppm
AU = 20 ppm, SU = 40 ppm: TU = 45 ppm
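The four example lines can be checked with a few lines of arithmetic. This sketch assumes the two uncertainties are independent, which is what the a^2 + b^2 = c^2 rule implies; the function name is illustrative.

```python
import math


def total_uncertainty(analytical, sampling):
    """Combine two independent uncertainties in quadrature: TU = sqrt(AU^2 + SU^2)."""
    return math.hypot(analytical, sampling)


# (AU, SU) pairs from the slide, in ppm.
cases = [(10, 80), (5, 80), (10, 40), (20, 40)]
for au, su in cases:
    print(f"AU = {au} ppm, SU = {su} ppm: TU = {total_uncertainty(au, su):.0f} ppm")
```

Halving the analytical uncertainty (10 to 5 ppm) barely moves the total, while halving the sampling uncertainty (80 to 40 ppm) roughly halves it: the larger term controls the result.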
-
How Do We Reduce Data Uncertainty?
For analytical errors:
Modify current or switch to another analytical technique
Improve QC on existing techniques
For sample prep and handling errors:
Improve sample preparation
For sampling errors:
Collect samples from more locations
-
We can't control the effects of
uncertainty on our decisions if we don't
know where it is coming from.
Historically sampling programs have focused resources on the wrong sources
of data uncertainty.
-
You can't find the answer if you don't know the
question!
Module 3
-
To be [averaged]
or not to be [averaged], that is the question
From William Shakespeare's Hamlet, Prince of Denmark, Act III, scene I, as translated by
course instructors
-
Sometimes the Simplest Questions are the Most Complex
Does this site pose an unacceptable risk?
Do groundwater concentrations exceed
drinking water standards?
Do soil concentrations exceed cleanup
requirements?
-
Sample Support, Representativeness and Decision Unit Support are Intertwined
The decision driving sample collection: Can it be shown that atmospheric deposition caused contamination?
What sample support is most representative of the decision?
[Figure labels: candidate sample supports #1, #2, #3; layer impacted by deposition; surface layer of interest]
-
Advances in Sampling & Measurement Technologies Highlight Representativeness Issues
MIP = membrane-interface probe (w/ ECD detector)
GW data results HIGHLY dependent on sample support
Graphic adapted from Columbia Technologies
-
The Same Holds True for Soils
[Figure: three frequency histograms of concentration (0 to 600 ppm; frequency 0.000 to 0.080), each compared against the action level. Panel labels: Multi-Increment Samples; Homogenized Discrete Samples; XRF Readings]
-
The Decision Unit is Often Not Well-Defined
Lead should not exceed 400 ppm in soils
or
TCE should not exceed 5 ppb in groundwater
Decisions are often ambiguous because cleanup
criteria do not provide enough information to define
the decision units.
-
Complete Cleanup Criteria Definitions
Cannot achieve data representativeness w/o a complete definition of cleanup criteria
Incomplete criteria lead to confusion. Example:
an in situ XRF Pb reading from a yard is 560 ppm,
while a homogenized sample from the same yard is 200 ppm,
while the average for the yard is 50 ppm.
Different sample supports give different concentration estimates that are all "correct" but lead to different conclusions
Must DEFINE population of interest to interpret data!!
-
For Soils, Three Cleanup Requirement Definitions are Most Common:
Never-to-Exceed Criteria: Lead concentrations cannot be > 400 ppm
Hot-Spot Criteria: Lead concentrations cannot be > 400 ppm averaged over 100 m^2
Averaged Criteria: The average concentration of lead over an exposure unit cannot be > 400 ppm
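The same data set can pass one of these definitions and fail another. The sketch below applies all three to a made-up grid of lead results; the grid values, the equal-area cells, and the averaging-window sizes are hypothetical and chosen only for illustration.

```python
from statistics import mean

ACTION_LEVEL_PPM = 400.0  # the lead limit used in the slide's examples


def never_to_exceed(grid, limit=ACTION_LEVEL_PPM):
    """Pass only if no single result exceeds the limit."""
    return max(v for row in grid for v in row) <= limit


def hot_spot(grid, window, limit=ACTION_LEVEL_PPM):
    """Pass only if every window x window block of cells averages at or below the limit."""
    rows, cols = len(grid), len(grid[0])
    for r in range(rows - window + 1):
        for c in range(cols - window + 1):
            block = [grid[i][j] for i in range(r, r + window)
                     for j in range(c, c + window)]
            if mean(block) > limit:
                return False
    return True


def averaged(grid, limit=ACTION_LEVEL_PPM):
    """Pass if the mean over the whole exposure unit is at or below the limit."""
    return mean(v for row in grid for v in row) <= limit


# Hypothetical exposure unit: 16 equal-area cells at 50 ppm with one 2,000 ppm cell.
grid = [[50.0] * 4 for _ in range(4)]
grid[1][2] = 2000.0

print("never-to-exceed:", never_to_exceed(grid))
print("hot spot, 2 x 2 window:", hot_spot(grid, window=2))
print("hot spot, 3 x 3 window:", hot_spot(grid, window=3))
print("exposure-unit average:", averaged(grid))
```

The single elevated cell fails the never-to-exceed and small-window hot-spot tests but passes once the averaging area grows, which is why the averaging area has to be part of the criterion.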
-
Technically Defensible Sampling Programs Require Complete Criteria
Technically defensible: making decisions with known level of confidence
Impossible to design a defensible program for never-to-exceed criteria
Hot spot criteria typically require the most intensive sampling to be technically defensible
Exposure unit averages are the easiest
-
All Solid Samples are Composites!!
A data result = average for mass of soil digested/extracted during analytical prep (the analytical sample support)
Questions:
Is analytical sample support representative of original sample support as rec'd by lab?
Is sample rec'd by lab representative of decision support as defined by project team?
Is team's SAP representative of regulatory criterion?
Sample support is critical, yet currently determined by convenience or whim of samplers & analysts.
If so, data quality is being left to chance!!
-
Defensible Statistical Sampling Program Design and Data Analysis Requires:
Clearly defined decision units and decision-making (e.g., action level) criteria
Sample supports that are representative of the decision unit of interest
Analytical method implementation consistent with required sample support
-
Beware of Sampling Programs Built on Erroneous Statistical
Assumptions
Module 4
-
Statistics: The only science that enables
different experts using the same figures to draw different conclusions.
Evan Esar: Esar's Comic Dictionary
American Humorist (1899 - 1995)
-
We'd prefer to ignore statistics when they
tell us something we don't want to hear
-
Statistical Packages Can Give an Aura of Defensibility
But, if underlying assumptions are wrong, the conclusions are
wrong!
-
Representativeness Assumed
SAP says: "Representative samples will be collected." But it provides no explanation of how or what the samples are supposed to represent.
Non-representative data lead to decision errors:
Sample support mismatched to cleanup criteria (single grab vs. area average)
Samples with different supports mixed together in databases & statistical analysis
Use spatially clustered locations or biased samples when calculating average concentrations
Mix different populations together
-
Remember these Examples?
[Figure: three frequency histograms of concentration (0 to 600 ppm; frequency 0.000 to 0.080) showing sample support vs. action level (AL)]
Lognormal seen with small sample supports & when data from different supports are mixed together
-
Bias Your Answer by Biasing Your Sampling Approach
Exposure Unit #1: Biased sampling & spatial clustering when goal is to calc the area average
Exposure Unit #2: Mix 2 populations (cleaner area & dump) into the same sampling design & data set
[Figure labels: EU#1; EU#2; Dump]
-
Pretend All Data Sets Are Normal
Normal distributions make statistics easy
Can ignore complexities of spatial & non-random relationships
Many common statistical tests (typical UCL calculation, Student t test, etc.) assume normality
-
But Contaminated Sites Are
Rarely Normal
Distributions are usually
heavily skewed to the right
Often can be bimodal (i.e., two-humped)
Usually reflect overlaying or mixed populations, e.g.:
-- Contaminated over background, or
-- Air deposition over discrete surface releases
-
Assuming Normality Can Under-estimate the 95% UCL on the Mean
400 ppm Pb requirement for exposure unit
4 lab lead results: 20, 24, 86, and 189 ppm
Average of the 4 results: 80 ppm
Too few samples to know whether normal or not.
Options for how ProUCL can calculate 95% UCL:
If assume data distribution is      the 95% UCL is
Normal distribution                 172 ppm
Gamma distribution                  434 ppm
Lognormal distribution              246 - 33,835 ppm
Non-parametric distribution         144 - 472 ppm
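The normal-distribution entry can be approximated by hand with the usual Student's t upper confidence limit, mean + t * s / sqrt(n). The sketch below assumes that formula and a one-sided 95% critical value of 2.353 for 3 degrees of freedom; it is not ProUCL's code, and the gamma, lognormal, and non-parametric options are not reproduced.

```python
import math
from statistics import mean, stdev

results_ppm = [20, 24, 86, 189]  # the four lab lead results on the slide
T_95_DF3 = 2.353                 # one-sided 95% Student's t critical value, 3 degrees of freedom

n = len(results_ppm)
sample_mean = mean(results_ppm)
standard_error = stdev(results_ppm) / math.sqrt(n)  # stdev uses the n - 1 denominator

ucl95_normal = sample_mean + T_95_DF3 * standard_error
print(f"mean = {sample_mean:.2f} ppm, 95% UCL (normal assumption) = {ucl95_normal:.1f} ppm")
```

The result lands close to the 172 ppm shown for the normal assumption, well below the 400 ppm requirement, whereas the gamma-based value on the slide exceeds it.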
-
Assuming Normality Can Under-Estimate Sample Number Requirements
VSP example: How many samples recommended to demonstrate that mean concentration is less than action level?
Assuming normal distribution: 10
Assuming non-parametric distribution: 23
-
Consequences of Under-estimating Sample Numbers
Only know statistics were wrong when data's back from lab!
95% UCL is basis for many EPA decisions about risk & compliance.
When you calculate the 95% UCL from the data, you find that decisions cannot be made at desired (95%) confidence
95% UCL > action level, even if mean < action level
Need more samples if want to make confident decision about risk or compliance: redo project
[Figure: number line marked 325 (95% LCL), 455 (mean), 500 (AL), 525 (95% UCL)]
-
Assume We Know Although We Don't
What info is required to use statistics properly?
VSP is a software package that calculates the number of samples based on user inputs. For soil sampling projects, the user must supply:
The concentration variability that is present (recall that data variability is a function of sample volume)
Underlying contaminant distribution (ditto)
How statistically confident do you want to be?
-
Variability & population distribution depends on sample support
[Figure: three frequency histograms of concentration (0 to 600 ppm; frequency 0.000 to 0.080), each marked with the action level (AL)]
-
More that We Need to Know
What more is required to use statistics properly?
User must also supply:
Width of gray region (requires prediction of the true mean for the area under investigation)
Recognize that these input values will be different for different contaminants on the same site
Different field concs, different ALs
VSP may predict 10 samples for Zn, 500 for PAHs, 1000 for Hg
[Figure: number line marked 500 (95% LCL), 655 (mean), 700 (AL), 725 (95% UCL)]
-
How to choose the input values?
Characterize or verify cleanup?
Statistical confidence desired
How close to each other are the true mean & AL?
How much variability is present in the soil concentrations?
-
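Choose wisely: It Makes a Big Difference in Sample Numbers!
Number of samples by standard deviation (rows, increasing variability) and gray region (GR) width (columns, actual mean closer to AL as width shrinks):
StDev \ GR width      100 ppm      200 ppm      350 ppm
300 ppm               79           21           8
200 ppm               36           10           5
50 ppm                4            3            2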
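How strongly sample numbers react to the standard deviation and the gray-region width can be sketched with the sample-size formula commonly given in EPA guidance for a one-sample t-test, n = (z_alpha + z_beta)^2 * s^2 / width^2 + 0.5 * z_alpha^2. Whether VSP used exactly this formula for the slide, and the error rates alpha = 0.05 and beta = 0.10 used here, are assumptions; exact agreement with the slide's numbers is not claimed.

```python
import math
from statistics import NormalDist


def samples_needed(std_dev, gray_region_width, alpha=0.05, beta=0.10):
    """Approximate sample count for a one-sample t-test (alpha and beta are assumed here)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha)
    z_beta = NormalDist().inv_cdf(1 - beta)
    n = ((z_alpha + z_beta) ** 2) * std_dev ** 2 / gray_region_width ** 2
    n += 0.5 * z_alpha ** 2
    return math.ceil(n)


# Standard deviations and gray-region widths taken from the slide (ppm).
for std_dev in (300, 200, 50):
    row = [samples_needed(std_dev, width) for width in (100, 200, 350)]
    print(f"StDev {std_dev} ppm:", row)
```

Because the count scales with (StDev / width)^2, tripling the variability or narrowing the gray region to a third each raises the sample number roughly ninefold.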
-
Fact is, if we knew everything we
needed to know in order to design a statistical sampling program
correctly, we wouldn't need to do
the sampling!!
-
Dilemma Resolutions
How can good approximation inputs be chosen?
Data & experience from similar sites
Historical data from your site
Pilot study (efficient if part of dynamic field work)
How does a non-statistician ensure the quality of VSP applications?
When VSP is used to justify sample numbers, require that submissions include explanations for how each input value was chosen
Verify that the explanations are reasonable
-
There are three kinds of lies:
lies, damned lies, and statistics
-- Attributed to Benjamin Disraeli
(as popularized by Mark Twain)
Verifying Assumptions is a Cure for
Statistical Misrepresentation
-
The Cure for the Sampling Blues
uncertainty
mgt
Module 5
-
All truths are easy to understand once they are discovered; the point is to discover them.
Galileo
-
The Triad Framework
Systematic Project Planning
Dynamic Work Strategies
Real-time Measurement Technologies
[Figure: the three elements surround a central "Uncertainty mgt" label]
-
Triad Data Collection Designs & Data Analysis Built On:
Planning systematically (CSM is central)
Improving representativeness (& stats)
Addressing the unknown (with dynamic
work strategies)
-
Systematic Planning Addresses:
Defining sample representativeness
Accurately delineating contaminant populations
Identifying populations requires using a CSM
Helps design a sampling plan to cultivate data that conforms to a well-behaved statistical distribution
Discovering what we don't know
-
Systematic Planning & Data Collection Design
Planning must define decisions, decision units & sample support requirements
Planning must identify sources of decision uncertainty & strategies for uncertainty management
Planning must clearly define cleanup standards
Conceptual Site Models (CSMs) play a foundational role
All the above help define sampling populations
-
2 Fundamental Concepts for Sampling Design & Statistics: (1) Decision Unit
Decision Unit: Area, volume, or set of objects (e.g., -acre area, bin of soil, set of drums)
All items treated as a single unit for decision-making
Statistical goal: discover true mean for that single unit
Amount of variability w/in the unit creates uncertainty in estimating the true mean
Therefore, statistics used to express amount of uncertainty around the estimate of the mean
Examples: exposure unit, survey unit, remediation unit
-
Valley of the Drums: These need to be characterized, transported, and disposed of properly.
What is the decision unit? How do you sample it?
-
40 drums were cleaned in batches of 20. You need to
ensure the cleaning process worked.
What is the decision unit and how would you sample it?
Batch #1 Batch #2 Batch #3 Batch #4
-
2nd Fundamental Concept: Population
Population: Set of objects or material volumes sharing a common characteristic; can be synonymous w/ decision unit.
Examples where they are not synonymous:
2 populations (clean As one & dirty Pb one) overlap w/in a single decision unit (such as residential yard)
A population is so large that more than 1 decision unit is needed to cover it.
Example: Suspected clean population of 50 acres; but decision unit (exposure unit) is 1 acre.
-
Don't Get Confused by Similar Terms!
Statistical Population Distribution: Number of times (frequency) that particular values occur in a data set drawn from the population (also called frequency distribution)
This data set is called a "sample" by statisticians & it contains >1 physical sample
Example: How many times does the value 10 ppm occur?
Spatial Population Distribution: A spatial pattern created by the locations of values (such as low, medium & high concentrations) across an area or within a volume
Conversion of a spatial distribution to a statistical distribution results in loss of spatial & physical relationship information
-
Relationship between spatial population distribution & statistical population distribution
[Figure: histogram of frequency (0 to 10) by bin (0, 5, 10, 15, 20, 25, More)]
Different spatial distributions but same statistical distribution
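A small sketch of the point that different spatial distributions can share one statistical distribution. Both maps below are hypothetical: they hold exactly the same sixteen values, once clustered in a corner and once scattered.

```python
from collections import Counter

# Two hypothetical 4 x 4 concentration maps (ppm) holding exactly the same values.
clustered = [
    [20, 20, 5, 5],
    [20, 20, 5, 5],
    [5, 5, 5, 5],
    [5, 5, 5, 5],
]
scattered = [
    [20, 5, 5, 5],
    [5, 5, 20, 5],
    [5, 20, 5, 5],
    [5, 5, 5, 20],
]


def frequency_distribution(grid):
    """Count how often each value occurs, discarding where it occurs."""
    return Counter(value for row in grid for value in row)


same_histogram = frequency_distribution(clustered) == frequency_distribution(scattered)
same_map = clustered == scattered
print("same statistical distribution:", same_histogram)
print("same spatial distribution:", same_map)
```

Once the maps are reduced to counts there is no way to tell a single hot corner from isolated elevated cells, which is the spatial information the histogram loses.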
-
CSMs Should Articulate Decision Uncertainty
CSM captures current understanding about site conditions
Identifies additional information needed for confident decision-making
Data collection needs & design flow from the CSM
A well-articulated CSM serves as the point of stakeholder consensus.
CSMs are living: as new data become available, incorporate them into the CSM.
CSM is mature when desired decision confidence is achieved
-
Statistics & the CSM
Spatial considerations are irrelevant to classical statistics
Inputs for calculating how many samples do not include area to be sampled
If VSP inputs are the same, a 100-sq mi area will get same # samples as 1-sq ft area
YOU must use the CSM to create decision units & select proper statistics for each decision unit
Statistics cannot be used properly w/o a CSM that defines the statistical populations!!
-
Improving Representativeness & Statistical Performance
Sample support
  Volume/dimensions of sample; for XRF in situ analysis, the sample support is the field of view
  Match it to decision needs & populations
Control within-sample heterogeneity
  Appropriate sample preparation important (see EPA/600/R-03/027 for additional detail)
  Uncertainty effects quantified by appropriate sub-sample replicate analyses
Control short-scale between-sample heterogeneity
  Multi-increment sampling (physical averaging)
  Multiple readings (mathematical averaging)
-
Guidances on Multi-Increment Sampling/Compositing May Differ Somewhat
Verification of PCB Spill Cleanup by Sampling and Analysis (EPA-560/5-85-026, August 1985): up to 10 adjacent samples allowed
Cleanup Standards for Groundwater and Soil, Interim Final Guidance (State of Maryland, 2001): no more than 3 adjacent samples allowed
SW-846 Method 8330b (EPA Rev 2, October 2006): 30 adjacent samples recommended
Draft Guidance on Multi-Increment Soil Sampling (State of Alaska, 2007): 30 - 50 samples recommended for compositing
-
All Samples Are Composites at Some Scale
A soil subsample contains many particles that are extracted or digested & analyzed together
A bulk heap of soil taken from a jar is a composite of many individual particles
Jar contents are a composite of many individual particles
The source of the jar contents is a composite of many, many particles
Compositing (from different locations at the between-sample level) is the same principle as stirring a jar to bring spatially separated particles into the same analytical sample at the within-sample scale
-
Multi Increment Sampling vs. Compositing
Multi-increment sampling: a strategy to cost-effectively control the effects of heterogeneity ("multi-increment averaging")
Compositing: a strategy to reduce overall analytical costs when conditions are favorable while looking for contamination ("composite searching")
Same soil aggregation process, but done for very different reasons
Multi-Increment Sampling vs. Compositing
Assumption: the cleanup criterion is averaged over the decision unit
[Figure: Multi-Increment Sampling (Decision Unit 1): form one multi-increment (MI) sample for analysis]
[Figure: Compositing (Decision Units 1-6): form one composite sample for analysis]
Multi-Increment Sampling
Effective when cost of analysis is significantly greater than cost of sample acquisition/handling
How many increments?
Practical upper limit imposed by homogenization capacity, background concentration & magnitude of non-background concentration
Enough to bring sampling error under control relative to other sources of error
How Many Increments per MI Sample?
For statistically valid conclusions, the number of increments is matched to level of heterogeneity present within decision unit
At least 10 preliminary measurements across decision unit give good estimate of variability (SD)
Use SD & mean of preliminary data in VSP to determine # increments needed for desired statistical confidence in the decision unit's mean estimate
Dynamic strategies make this process highly efficient
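The idea of sizing the design from a preliminary SD and mean can be sketched as follows. This is only a textbook normal-approximation formula, n = (z * SD / E)^2, and is not VSP's own calculation; the function name, the 25% relative error target and the preliminary readings are all made-up assumptions for illustration.

```python
import math
from statistics import NormalDist, mean, stdev

def increments_needed(prelim, rel_error=0.25, confidence=0.95):
    """Normal-approximation count n = (z * SD / E)^2, rounded up.

    prelim     : preliminary measurements from the decision unit
    rel_error  : acceptable half-width of the confidence interval,
                 as a fraction of the preliminary mean (assumed target)
    confidence : two-sided confidence level
    """
    sd = stdev(prelim)                      # sample standard deviation
    target = rel_error * mean(prelim)       # E, in concentration units
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil((z * sd / target) ** 2)

# Hypothetical preliminary readings (ppm), at least 10 as the slide suggests
prelim_ppm = [120, 95, 310, 80, 150, 60, 220, 105, 90, 170]
n = increments_needed(prelim_ppm)
print(n)
```

A tighter error target or a more heterogeneous decision unit (larger SD) drives the count up quickly, since both enter the formula squared.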
How Many MI Samples Per DU?
Only 1 MI sample per decision unit (DU) causes loss of spatial variability information
no means to QA (evaluate sufficiency of) increment #s
difficult to implement statistical tests
if DU average exceeds action level, can't identify where in the DU the problem is
Multiple MI samples per decision unit (e.g., 5)
provide the benefits of MI sampling (cost reduction, improved performance) while providing variability information to allow statistical tests (e.g., Student's t test, sign test, 95% UCL calculation, etc.)
can help identify where contamination is in decision unit
Dynamic work strategies highly efficient
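As one example of what replicate MI samples make possible, here is a minimal sketch of a one-sided 95% UCL on the mean using Student's t. The five results and the 500 ppm comparison value are made up for illustration, and the small lookup table of t critical values stands in for a statistics package (ProUCL or similar would normally do this).

```python
import math
from statistics import mean, stdev

# One-sided 95% Student t critical values, keyed by degrees of freedom
T95 = {1: 6.314, 2: 2.920, 3: 2.353, 4: 2.132, 5: 2.015,
       6: 1.943, 7: 1.895, 8: 1.860, 9: 1.833}

def ucl95(results):
    """95% upper confidence limit on the mean: mean + t * SD / sqrt(n)."""
    n = len(results)
    if n - 1 not in T95:
        raise ValueError("lookup table covers 2 to 10 replicate results")
    return mean(results) + T95[n - 1] * stdev(results) / math.sqrt(n)

# Hypothetical results (ppm) from 5 replicate MI samples in one DU
mi_results_ppm = [310, 280, 350, 295, 330]
ucl = ucl95(mi_results_ppm)
below_al = ucl < 500          # hypothetical action level of 500 ppm
print(round(ucl, 1), below_al)
```

With a single MI sample there is no SD, so none of this can be computed, which is the point the slide makes.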
Statistical & Decision Benefits of MI Sampling
[Figure: two frequency histograms, "MI sample" and "Large grab sample"; x-axis: Concentration (ppm), 0 to 600; y-axis: Frequency, 0.000 to 0.080]
Physical equivalent of averaging many individual sample results mathematically
MI sampling creates larger sample supports & tends to normalize statistical data distributions
Benefits & Limitations of MI Sampling
Significantly reduce analytical costs
Significantly improve decision-making performance:
Reduce decision-making errors
Much more likely that hot spots will be found and accounted for
Not as useful for subsurface & other sampling where sampling costs are higher than analytical costs
Requires special design & handling for volatile contaminants (Hg, VOCs, etc.)
In situ & other cost-effective high-density analyses (like XRF) can potentially substitute for or augment MIS
Addressing the Unknown through Dynamic Work Strategies
Optimize data collection design
Real-time testing of CSM & obtaining statistical design parameters (SD, preliminary mean, sample support)
Adaptive analytics
Strategies to produce collaborative data sets with sufficient analytical & sampling QC checks
Adaptive sampling
Strategies for confident estimates of DU's mean
Strategies for delineating contaminant populations
Adaptive compositing
Efficient strategies for searching for contamination
Adaptive Composite Searching
Goal: looking for contamination or demonstrating that large areas are compliant
Assumptions:
Most of the area is clean/compliant
Contamination is believed to be spotty
Action level is significantly greater than background levels
Sample acquisition/handling costs are less than analytical costs
Appropriate methods exist for sample acquisition & aggregation
Composite Searching Designs
Must:
determine appropriate number of samples to aggregate into composite
develop decision criteria to indicate when analyses of contributing samples are necessary
Performance (cost/benefit) best when
contamination is spotty
big difference between background & action level
big difference between average concentration & AL
Best case: no composite requires re-analysis
Worst case: every composite requires re-analysis
Recipe for Adaptive Compositing
Determine appropriate number of samples to composite & decision criterion w/ this equation:
Decision criterion = (action level - background)/(# of samples in composite) + background
Sample and split samples. Use one set of splits to composite and save other set. If:
composite result < decision criterion, done.
composite result > decision criterion, analyze splits contributing to the pooled composite.
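The recipe above can be sketched in a few lines. Function names and return strings are made up for illustration; the slide does not say what to do when the composite result exactly equals the criterion, so this sketch assumes the conservative choice of analyzing the splits.

```python
def decision_criterion(action_level, background, n_in_composite):
    """(action level - background) / (# of samples in composite) + background."""
    return (action_level - background) / n_in_composite + background

def next_step(composite_result, action_level, background, n_in_composite):
    """Decide whether the saved splits need individual analysis."""
    limit = decision_criterion(action_level, background, n_in_composite)
    if composite_result < limit:
        return "done"
    # Assumption: a result equal to the criterion is treated like an exceedance
    return "analyze saved splits"

# 4-sample composite, background 10 ppm, action level 100 ppm
print(decision_criterion(100, 10, 4))
print(next_step(30, 100, 10, 4))
print(next_step(60, 100, 10, 4))
```

The criterion reflects the worst case in which one contributing sample sits at the action level while all the others are at background.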
How Many Samples to Composite?
[Figure: Normalized Expected Cost vs Composite Size. x-axis: Number Contributing to Composite (0 to 20); y-axis: Normalized Expected Cost (0.0 to 1.1); curves for Hit Prob = 0.001, 0.01, 0.05, 0.1, 0.2]
How probable is it that contamination is present?
The less likely it is that contamination is present, the larger the number of samples that can be composited
The graph illustrates optimal sample numbers for different probabilities
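A cost curve of this kind can be sketched with the standard group-testing model. This is an assumption, not necessarily the model behind the slide's graph: each sample is a "hit" independently with the same probability, a composite fails if it contains at least one hit, and every split of a failing composite is then analyzed. Cost is normalized to analyzing every sample individually.

```python
def normalized_expected_cost(hit_prob, n):
    """Expected analyses per original sample for composites of size n."""
    if n == 1:
        return 1.0                      # no compositing: one analysis each
    p_fail = 1 - (1 - hit_prob) ** n    # composite contains at least one hit
    return 1 / n + p_fail               # composite run + expected re-analyses

def best_composite_size(hit_prob, max_n=20):
    """Composite size (1..max_n) with the lowest expected cost."""
    return min(range(1, max_n + 1),
               key=lambda n: normalized_expected_cost(hit_prob, n))

for p in (0.001, 0.01, 0.05, 0.1, 0.2):
    n = best_composite_size(p)
    print(p, n, round(normalized_expected_cost(p, n), 3))
```

Under these assumptions the best composite size shrinks as the hit probability rises, which is the trend the slide describes.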
Example Decision Criteria
Background: 10 ppm; Action Level: 100 ppm
Determine decision criteria for a 2-sample, 3-sample, 4-sample, 5-sample & 6-sample composite:
2-sample composite: 55 ppm
3-sample composite: 40 ppm
4-sample composite: 33 ppm
5-sample composite: 28 ppm
6-sample composite: 25 ppm
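These values follow from the decision criterion equation; the listed figures appear to be rounded to whole ppm. As a worked check for the 4-sample case (the symbols C_n, AL and B are shorthand introduced here for the criterion, action level and background):

```latex
\[
C_n = \frac{AL - B}{n} + B,
\qquad
C_4 = \frac{100 - 10}{4} + 10 = 32.5 \approx 33\ \text{ppm}
\]
```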
Larger composites: decreasing analytical costs, increasing chance of failing
Performance (Cost/Benefit) Calculation
Compositing has a positive cost/benefit ratio as long as:
Ff < 1 - 1/Nc
where: Nc = number contributing to composite
Ff = fraction of composite samples failing (results above decision criteria)
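One way to see where this condition comes from is to count analyses. The sketch assumes equal cost per analysis and that every split contributing to a failing composite is re-analyzed; M, the number of composites, is a symbol introduced here for illustration.

```latex
\[
\underbrace{M}_{\text{composites}}
+ \underbrace{F_f \, M \, N_c}_{\text{re-analyzed splits}}
\;<\;
\underbrace{M \, N_c}_{\text{no compositing}}
\quad\Longleftrightarrow\quad
\frac{1}{N_c} + F_f < 1
\quad\Longleftrightarrow\quad
F_f < 1 - \frac{1}{N_c}
\]
```

For example, with 4-sample composites the approach saves analyses as long as fewer than 75% of composites fail.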
Other Assorted Statistical Strategies
Useful classical statistics strategies
Stratified sampling designs
Barnard's sequential t-test sampling design
Binomial Sequential Probability Ratio Test (SPRT) (sequential non-parametric sampling design)
Adaptive cluster sampling
Ranked set sampling
Geostatistics (free software, Google: SADA, geostatistics)
Probability maps [cleanup if Pr(non-compliance) > X%]
GeoBayesian (free BAASS software, see Bob Johnson)
USACE: Managing Uncertainty with an XRF & Geostatistics
Color coding for probabilities that 1-ft deep volumes > 250 ppm Pb (actual Pb conc not shown)
Decision plan: Any soil w/ Pr(Pb > 250 ppm) > 40% will be landfilled. Soil with Pr(Pb > 250 ppm) < 40% will be reused in new firing berm.
Module 6
Project Case Study:
Adaptive X-ray Fluorescence (XRF) Sampling & Analysis Design to Achieve Decision Confidence for Residential Soil Lead Concentrations
This slimmed-down case study illustrates how to determine & control data error in real-time to generate definitive data
This project used a handheld X-ray fluorescence (XRF) instrument to measure Pb in minutes at the site of sample collection
[Figure: plastic bag of soil]
This Project's Decisions
Is the Pb conc for each residential property
below the 500 ppm risk-based AL?
What is the greater source of data variability
when reporting Pb conc results & how can it be
reduced as needed?
Data Collection Design
An entire yard is an exposure unit
Grassy yard initially stratified into 3 sections
Section F: Front of house
Section S: Side of house
Section B: Behind house
Divide each section into 5 equal-area subsections
The subsections will be sampled by taking 1 grab soil sample (~300 g) per subsection & placing it into a plastic bag for XRF analysis
Illustrative Sampling Design & Results
Action Level = 500 ppm
Property: 702 Main Street
Front yard average (at 95% statistical confidence) = 700 +/- 150 (550-850 ppm Pb)
Side yard average (at 95% statistical confidence) = 500 +/- 100 (400-600 ppm Pb)
Back yard average (at 95% statistical confidence) = 300 +/- 50 (250-350 ppm Pb)
[Figure: plan of the property showing the house footprint and the front, side and back yards (5 samples each, 1 bagged grab sample per subsection); area fractions labeled 0.6, 0.25 and 0.15]
Total yard average determined statistically (& area-weighted) as 410 +/- 25 (385-435 ppm Pb)
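The area-weighted mean can be checked with a short sketch. The pairing of area fractions with yard sections is an assumption: the figure labels do not survive in the text, but assigning 0.15 to the front, 0.25 to the side and 0.6 to the back reproduces the stated 410 ppm. The +/- 25 interval is taken from the slide as given and is not recomputed here.

```python
# (section mean in ppm Pb, assumed area fraction of the whole yard)
sections = {
    "front": (700, 0.15),
    "side":  (500, 0.25),
    "back":  (300, 0.60),
}

def area_weighted_mean(sections):
    """Sum of section means weighted by each section's share of the yard."""
    total_fraction = sum(fraction for _, fraction in sections.values())
    if abs(total_fraction - 1.0) > 1e-9:
        raise ValueError("area fractions must sum to 1")
    return sum(avg * fraction for avg, fraction in sections.values())

yard_mean = area_weighted_mean(sections)
print(round(yard_mean, 1))
```

Note that a simple unweighted average of the three section means would give 500 ppm, right at the action level, so the weighting matters for the decision.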
Decision Tree #1
Evaluate statistical results for the yard & compare to the 500 ppm AL
Is there statistical confidence that mean is above AL?
yes: Decide Pb conc for the yard is above AL. Confident that action is required. Example: 700 +/- 150 (550-850)
Is there statistical confidence that mean is below AL?
yes: Decide Pb conc for the yard is below AL. Confident that no action needed. Example: 200 +/- 50 (150-250)
If neither condition is true: Decision too uncertain: more information needed. Go to Decision Tree #2. Example: 300 +/- 100 (150-520)
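Decision Tree #1 amounts to comparing a confidence interval with the action level. A minimal sketch, assuming the interval is given as mean +/- half-width; the function name and return strings are made up for illustration.

```python
def decision_tree_1(mean_ppm, half_width_ppm, action_level=500):
    """Compare the confidence interval on the yard mean with the AL."""
    lower = mean_ppm - half_width_ppm
    upper = mean_ppm + half_width_ppm
    if lower > action_level:
        return "above AL: action required"
    if upper < action_level:
        return "below AL: no action needed"
    # Interval straddles (or touches) the AL: decision too uncertain
    return "too uncertain: go to Decision Tree #2"

print(decision_tree_1(700, 150))   # interval 550-850
print(decision_tree_1(200, 50))    # interval 150-250
print(decision_tree_1(500, 100))   # interval 400-600, e.g. the side yard
```

Only the third outcome triggers more data collection, which is what keeps the dynamic strategy efficient.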
Information from spreadsheet to feed into Decision Tree #2
to identify the most important source of data variability (aka, statistical error)
Average within-bag error (std dev, SD) for each of the 5 bags from a yard section
Between-bag error SD for all bags from a yard section
Compare the average within-bag SD to the between-bag SD
See example data set
Decision Tree #2
Determine the greater source of data variability (decision uncertainty)
Is within-bag variability GREATER than between-bag variability?
yes: Go to Decision Tree #3
no: Is within-bag variability LESS than between-bag variability?
yes: Go to Decision Tree #4
no, they are ~equal: Go to Decision Tree #5
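The comparison in Decision Tree #2 can be sketched as below. Several choices here are assumptions rather than the project's spreadsheet method: between-bag SD is computed from the bag means, "greater" and "less" are judged with a made-up ratio threshold of 1.5, and the XRF readings are invented.

```python
from statistics import mean, stdev

def choose_tree(bags, ratio=1.5):
    """Return the next decision tree (3, 4 or 5) for one yard section.

    bags  : list of bags, each a list of repeated XRF readings (ppm)
    ratio : hypothetical threshold for calling one SD clearly larger
    """
    within_sd = mean([stdev(shots) for shots in bags])   # average within-bag SD
    between_sd = stdev([mean(shots) for shots in bags])  # SD of the bag means
    if within_sd > ratio * between_sd:
        return 3    # within-bag heterogeneity dominates
    if between_sd > ratio * within_sd:
        return 4    # variation across the yard section dominates
    return 5        # about equal

# Hypothetical data: 5 bags, 4 shots each
spread_across_yard = [[100, 102, 98, 100], [300, 298, 302, 300],
                      [500, 501, 499, 500], [200, 199, 201, 200],
                      [400, 402, 398, 400]]
spread_within_bags = [[100, 300, 200, 400], [110, 290, 210, 390],
                      [90, 310, 190, 410], [105, 295, 205, 395],
                      [95, 305, 195, 405]]
print(choose_tree(spread_across_yard), choose_tree(spread_within_bags))
```

The remedy differs by branch: more shots per bag when within-bag SD dominates, more bags when between-bag SD dominates.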
Decision Tree #3
Major source of data error: heterogeneity within sample bag (subsampling error)
To control this source of variability:
Re-shoot each bag another 4 times (total of 8 shots/bag). Add results to spreadsheet & recalculate stats for whole yard. Examine results.
Is within-bag variability sufficiently reduced?
yes: Make decision at 500 ppm AL w/ desired statistical confidence
no: Take add'l corrective action
Decision Tree #4
Major source of data error is from concentration variations across the yard section area.
To control this source of variability:
Collect another 5 bag samples from section area. Analyze 4 times/bag. Add results to spreadsheet & recalculate statistics for whole yard.
Is between-bag variability sufficiently reduced?
yes: Make decision at 500 ppm AL w/ desired statistical confidence
no: Take add'l corrective action
Decision Tree #2
Determine the greater source of data variability (decision uncertainty)
Is within-bag variability significantly GREATER than between-bag variability?
yes: Go to Decision Tree #3
no: Is within-bag variability significantly LESS than between-bag variability?
yes: Go to Decision Tree #4
no, they are ~equal: Go to Decision Tree #5
Decision Tree #5
Concentration variability across yard section & within samples about the same.
To control both sources simultaneously:
Analyze original bags an add'l 4 times each. Also collect another 5 bag samples from the section & analyze 8 times each. Add all results to spreadsheet & recalculate statistics for whole yard.
Is statistical decision uncertainty now sufficiently resolved?
yes: Make decision at 500 ppm AL w/ desired statistical confidence
no: Take add'l corrective action
Benefits of the Dynamic XRF Strategy
Data gathered in real-time
Data evaluated against decision goals in real-time
Data uncertainty identified & measured in real-time (definition of definitive data)
Decision tree guides actions to resolve data uncertainty in real-time
Final decisions can be made in real-time
Property owners informed of decisions in real-time