
Statistical Analysis Procedures

Version 9.3

Copyright

Information in this document is subject to change without notice and does not represent a

commitment on the part of Sanitas Technologies. The software described in this

document is furnished under a license agreement and may be used only in accordance

with the terms of the agreement. No part of this manual may be reproduced or

transmitted in any form or by any means, electronic or mechanical, including

photocopying, recording, or information storage or retrieval systems, for any purpose

other than the purchaser’s personal use without the permission of Sanitas Technologies.

© 1992-2012 SANITAS TECHNOLOGIES. All rights reserved.

User Guide Version 9.3 designed by Sanitas Technologies.

SANITAS TECHNOLOGIES

22052 W 66th Street

Suite 133

Shawnee, KS 66226

www.sanitastech.com

TABLE OF CONTENTS

INTRODUCTION

DESCRIPTIVE STATISTICS
   Time Series Plot
   Box and Whiskers Plot
   Histogram
   Probability Plot
   Seasonality Plot
   Statistical Outlier Tests
   Rank Von Neumann
   Normality Report
   Stiff Diagram
   Piper Diagram

DETECTION MONITORING STATISTICS
   Shewhart-CUSUM Control Chart
   Intrawell Rank Sum
   Mann-Whitney / Wilcoxon Rank Sum
   Welch's t-test
   One-Way Analysis of Variance (ANOVA)
   Parametric ANOVA
   Nonparametric ANOVA
   Tolerance Limits
   Prediction Limits (or Intervals): EPA Standards
   Prediction Limits (or Intervals): EPA Draft Unified Guidance (UG) Standards
   California Non-statistical Analysis of VOCs
   Verification Retest Procedure – California
   Intrawell ASTM Approach (ASTM Standards Only)
   Interwell ASTM Approach (ASTM Standards Only)

EVALUATION MONITORING STATISTICS
   Trend Analysis
   Sen’s Slope Estimator
   Seasonal Kendall Test

COMPLIANCE OR CORRECTIVE ACTION MONITORING STATISTICS
   Confidence Intervals
   Tolerance Intervals

APPENDIX I: GLOSSARY OF SELECTED STATISTICAL TERMS

BIBLIOGRAPHY

INDEX

Introduction

This section describes the statistical methods incorporated into the Sanitas for Ground

Water and Environmental Media software developed and used by SANITAS


TECHNOLOGIES to evaluate environmental data. These methods are proposed for use

in the monitoring and response programs of Subtitle C & D facilities and incorporate the

ground water statistical analysis requirements of:

• 40 CFR Part 264;

• 40 CFR Part 257 and 258;

• the EPA “Statistical Analysis of Ground Water Monitoring Data at RCRA Facilities -
Interim Final Guidance”, 1989;

• the EPA “Addendum to the Interim Final Guidance”, 1992;

• articles 5 and 10, Chapter 15, Title 23 of the California Code of Regulations;

• the ASTM “Standard Guide for Developing Appropriate Statistical Approaches for
Ground-Water Detection Monitoring Programs”, D 6312-98; and

• the EPA Statistical Analysis of Groundwater Monitoring Data at RCRA Facilities,
Unified Guidance, March 2009.

Specifically, the descriptive statistics described in this document include:

• Time Series;

• Box and Whiskers Plot (including annual and seasonal);

• Histogram;

• Skewness;

• Kurtosis;

• Probability Plot;

• Seasonality Plot;

• Statistical Outlier Tests;

• Normality Report;

• Rank Von Neumann;

• Stiff Diagram; and

• Piper Diagram.

The distributional statistics described include:

• Shapiro-Wilk Test;

• Coefficient-of-Variation Test;

• Shapiro-Francia Test;

• Chi-Squared Test; and

• Levene’s Test.

The censored data substitution functions described include:


• Detection Limit Substitution;

• Cohen’s Adjustment;

• Aitchison’s Adjustment; and

• Kaplan-Meier.

The detection monitoring statistical tests described include:

• Combined Shewhart-CUSUM Control Charts;

• Intrawell Rank Sum:
   − Exact Test;
   − Large Sample Approximation Test;

• Mann-Whitney;

• Welch's t-test;

• Parametric Analysis of Variance;

• Bonferroni t-statistics (Multiple comparisons procedure);

• Nonparametric Analysis of Variance:
   − Kruskal-Wallis;

• Tolerance Limits:
   − Parametric;
   − Nonparametric;

• Prediction Limits:
   − Parametric;
   − Nonparametric;
   − With Retesting (Unified Guidance method);

• California Non-Statistical Analysis of VOCs;

• Poisson Prediction Limits;

• Intrawell ASTM Method; and

• Interwell ASTM Method.

The evaluation/assessment monitoring statistical tests described include:

• Mann-Kendall:
   − Exact Test;
   − Normal Approximation Test;

• Sen’s Slope Estimator and Plot; and

• Seasonal Kendall Slope Estimator and Plot.

The compliance and corrective action statistical tests described include:

• Confidence Intervals:
   − Parametric;
   − Nonparametric; and

• Tolerance Intervals:
   − Parametric;
   − Nonparametric.

Moreover, this document describes the analysis decision logic and which pre- and post-

analysis tests are required to ensure that the data do not violate any size, distribution, or

seasonality assumptions of the relevant statistical tests. In general, the behavior

described herein is based on the default settings, and many of the details are subject to

alteration by the user.

Descriptive Statistics

Time Series Plot

Description:

Time Series plots provide a graphical method to view changes in data at a particular well

(monitoring point) or wells over time. Time Series plots display the variability in

concentration levels over time and can be used to indicate possible outliers. More than

one well can be compared on the same plot to look for differences between wells. They

can also be used to examine the data for trends.

Procedures:

Order the well measurements by sampling date. Number the sampling dates starting with

"O" for the initial date of collection. All subsequent dates will be numbered as the days

elapsed relative to this initial date. Plot the analyte measurement on the y-axis by

sampling date on the x-axis. The x-axis is labeled with intermittent month/year on the

Sanitas time series plots.

Box and Whiskers Plot

Description:

A quick way to visualize the distribution of data in a given data set is to construct a Box

and Whiskers plot. The basic box plot graphically locates the median, 25th

and 75th

percentiles of the data set; the "whiskers" extend to the minimum and maximum values of

the data set. The range between the ends of a box plot represents the Interquartile Range,

which can be used as a quick estimate of spread or variability. The mean is denoted by

a"+".


When comparing multiple wells or well groups, box plots for each well can be lined up

on the same axes to roughly compare the variability in each well. This may be used as a

quick exploratory screening for the test of homogeneity of variance across multiple wells.

If two or more boxes are very different in length, the variances in those well groups may

be significantly different.

Note that depending on the length of the well names and similar considerations, only

about 10 or 12 wells can fit on a Sanitas Box & Whiskers report without overcrowding.

For standard box plots, Sanitas will prompt the user for a maximum per page, but for

Grouped/Seasonal etc. box plots the user may have to divide the wells manually. To

keep the scale consistent among multiple subsets of a given View, leave all wells selected

in the well list on the left-hand side, and deselect the observations for specific wells on

the right-hand side. The deselected values will still be used in calculating the scale.

Procedures:

The data are first ordered from lowest to highest. The 25th

(lower quartile), 50th

(median),

and 75th

(upper quartile) percentile values from the data set are then computed. To

compute the pth

percentile, find the data point with rank position equal to:

$$\text{rank position} = \frac{p\,(n+1)}{100}$$

Where:

n = number of samples;

p = the percentile of interest.

In the case of sparse data, the following logic is applied:

When n = 1, minimum value = 25th percentile value = median = 75th percentile
value = maximum value;

When n = 2, minimum value = 25th percentile value, maximum value = 75th
percentile value, and median = ½ (minimum + maximum values);

When n = 3, minimum value = 25th percentile value, maximum value = 75th
percentile value, and median = middle value.
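As an illustration, here is a minimal Python sketch of the rank-position rule above (NumPy is assumed available; the function name and the use of linear interpolation between adjacent order statistics are illustrative assumptions, since the text does not specify how fractional rank positions are resolved):

import numpy as np

def percentile_by_rank_position(data, p):
    # Rank position = p(n + 1)/100; fractional positions are resolved
    # here by linear interpolation between adjacent order statistics.
    x = np.sort(np.asarray(data, dtype=float))
    n = len(x)
    r = p * (n + 1) / 100.0
    if r <= 1:
        return x[0]
    if r >= n:
        return x[-1]
    lo = int(r) - 1      # 0-based index of the lower order statistic
    return x[lo] + (r - int(r)) * (x[lo + 1] - x[lo])

# Quartiles of a small data set:
data = [12.1, 13.7, 15.6, 16.2, 35.1]
print([percentile_by_rank_position(data, p) for p in (25, 50, 75)])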

Histogram

Description:

A frequency distribution may be visually displayed in the form of a histogram.

Procedure:

The analyte measurements are plotted on the x-axis and the frequencies of these

measurements are plotted on the y-axis. Values are collapsed within class intervals, each


represented by a rectangular bar on the plot. The height of each bar corresponds with the

respective frequencies. Coefficients of skewness and kurtosis are computed from the data

to give an indication of normality.

Skewness:

Skewness is a measure of the symmetry of the frequency distribution. The coefficient of
skewness, γ, is computed as follows:

$$\gamma = \frac{\sqrt{n}\,\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^3}{(n-1)^{3/2}\,S^3}$$

Where:

Xi = the value for the i th observation;

X = the mean of the n observations;

S = the standard deviation; and

n = the number of observations.

The mean, X , and the standard deviation, s, are computed as follows:

$$\bar{X} = \frac{\sum_{i=1}^{k} f_i m_i}{n} \qquad S = \sqrt{\frac{\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2}{n-1}}$$

Where:

fi = the frequency of the ith observation;

mi = the value of the ith observation; and

k = the number of distinct values.

A right skewed distribution has a positive skewness value, and a left skewed distribution

has a negative skewness value. A large absolute skewness value can be an indication of

the presence of outliers. A normally distributed frequency distribution would have a

skewness absolute value of less than 1.


Kurtosis:

Kurtosis is a measure of flatness or peakedness of the frequency distribution. The

coefficient of kurtosis, K, is computed as follows:

$$\mathrm{K} = \frac{n(n+1)}{(n-1)(n-2)(n-3)}\sum_{i=1}^{n}\left(\frac{X_i - \bar{X}}{S}\right)^4 - \frac{3(n-1)^2}{(n-2)(n-3)}$$

Where:

Xi = the value for the i th observation;

X = the mean of the n observations;

S = the standard deviation; and

n = the number of observations.

A normal distribution has a kurtosis absolute value of less than 1. A negative kurtosis

value indicates a flatter curve than the normal distribution. A positive kurtosis value

indicates a curve that is more peaked than the normal distribution.

EXAMPLE 1:

Date         Xi (concentration)   (Xi - X̄)^3   [(Xi - X̄)/S]^4
1/5/1992     15                   -25.08       0.30
4/8/1992     17.5                 -0.08        0.00
7/1/1992     13.2                 -105.64      2.05
10/15/1992   14.9                 -27.74       0.34
1/20/1993    27                   746.82       27.82
4/14/1993    22.6                 102.03       1.96
7/12/1993    18.7                 0.46         0.00
10/22/1993   17.4                 -0.15        0.00
1/15/1994    19                   1.23         0.01
4/2/1994     15                   -25.08       0.30
7/3/1994     16.9                 -1.08        0.00

Example Data for Skewness and Kurtosis

X̄ = 17.93   S = 3.95   n = 11

Skewness:

$$\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^3 = 665.68$$

$$\gamma = \frac{\sqrt{11}\,(665.68)}{(11-1)^{3/2}\,(3.95)^3} = 1.132$$

Kurtosis:

$$\sum_{i=1}^{n}\left(\frac{X_i - \bar{X}}{S}\right)^4 = 32.79$$

$$\mathrm{K} = \frac{11(11+1)}{(11-1)(11-2)(11-3)}\,(32.79) - \frac{3(11-1)^2}{(11-2)(11-3)} = 1.844$$
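For reference, the following Python sketch (NumPy assumed) evaluates the skewness and kurtosis formulas above; run on the Example 1 data it reproduces γ ≈ 1.13 and K ≈ 1.84. The function name is illustrative:

import numpy as np

def skewness_kurtosis(data):
    x = np.asarray(data, dtype=float)
    n = len(x)
    xbar, s = x.mean(), x.std(ddof=1)   # mean and (n - 1) standard deviation
    gamma = np.sqrt(n) * np.sum((x - xbar) ** 3) / ((n - 1) ** 1.5 * s ** 3)
    kurt = (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3))
            * np.sum(((x - xbar) / s) ** 4)
            - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))
    return gamma, kurt

x = [15, 17.5, 13.2, 14.9, 27, 22.6, 18.7, 17.4, 19, 15, 16.9]   # Example 1
print(skewness_kurtosis(x))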

Probability Plot

Description:

Probability plots are a graphical test for normality. These plots may be used to

investigate whether a set of data or the residuals of the data follow a normal or

transformed-normal distribution.

Procedure:

The data are first ordered from lowest to highest. The analyte measurements are plotted

in increasing order on the x-axis and the z-scores from a standard normal distribution

corresponding to the proportion of observations less than or equal to that measurement

are plotted on the y-axis. The corresponding z-score from a standard normal distribution is
computed by the following formula:

$$y_i = \Phi^{-1}\!\left(\frac{i}{n+1}\right)$$

Where:

Φ⁻¹ = the inverse of the cumulative standard Normal distribution;

n = the sample size; and

i = the rank position of the ith ordered concentration.

If the data are normal, the points when plotted will lie in a straight line. Visual curves or

bends indicate that the data do not follow a normal distribution.

An option exists to draw a straight line connecting the lower and upper quartiles of the

data as a visual aid.


EXAMPLE 2

Concentration (x-axis)   Order (i)   i/(n+1)   z-score (y-axis)
39                       1           0.077     -1.425
56                       2           0.154     -1.020
58.8                     3           0.231     -0.735
64.4                     4           0.308     -0.504
81.5                     5           0.385     -0.294
85.6                     6           0.462     -0.095
151                      7           0.538      0.095
262                      8           0.615      0.294
331                      9           0.692      0.504
578                      10          0.769      0.735
637                      11          0.846      1.020
942                      12          0.923      1.425

Example Data for Probability Plot
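These coordinates are straightforward to generate in Python (SciPy assumed); applied to the Example 2 concentrations, the z-scores match the table above. The function name is illustrative:

import numpy as np
from scipy.stats import norm

def probability_plot_coords(data):
    x = np.sort(np.asarray(data, dtype=float))
    n = len(x)
    z = norm.ppf(np.arange(1, n + 1) / (n + 1))   # y_i = Phi^-1(i/(n+1))
    return x, z

conc = [39, 56, 58.8, 64.4, 81.5, 85.6, 151, 262, 331, 578, 637, 942]
x, z = probability_plot_coords(conc)
print(np.round(z, 3))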

Seasonality Plot

Description:

Seasonality plots are constructed as Time Series plots for both observed values and

values deseasonalized according to the method described by the EPA (U. S. EPA, April

1989 and March 2009). In addition to the Time Series plots, box plots are presented for

the original and deseasonalized data. The presence of seasonality is tested with the

Kruskal-Wallis H statistic with correction for ties (see Control Charts for method

description).

Statistical Outlier Tests

Description:

A statistical outlier is a value that is extremely different from the other values in the data

set. Outlier tests identify data points that do not appear to fit the distribution of the rest of

the data set and determine if they differ significantly from the rest of the data.

A value is considered to be suspect if it is an order of magnitude larger or smaller than

the rest of the data. Once a value is identified as a statistical outlier, it should be checked

thoroughly for possible lab instrument failure, field collection problems, or data entry

errors. Outliers may exist naturally in the data if there is an extremely wide inherent or

temporal variability in the data, or if there is an on-site problem such as leakage or a

new impact source. An outlier should not be removed from the data set unless the value

has been documented to be erroneous. Outliers that cannot be explained by error may

call for further investigation (EPA, April 1989, 2009).


"EPA 1989" OUTLIER TEST

Assumptions:

The "EPA 1989" outlier test assumes that all data values, except for the suspect

observation, are normally or log normally distributed. A minimum of three observations

is required; however, a minimum of eight observations is recommended.

Procedure:

First, the data are log-transformed, then ordered from lowest to highest. The mean and

standard deviation are then calculated. Next, calculate the outlier test statistic, Tn, as:

$$T_n = \frac{X_n - \bar{X}}{S}$$

Where:

Xn = the suspect observation;

X = the sample mean; and

S = the sample standard deviation.

Then compare the absolute value of the outlier test statistic (Tn) with the critical value,

(Tn (0.05)), for the given sample size, n, at a five percent significance level (Table 8,

Appendix B, EPA, April 1989). If abs(Tn) exceeds the tabulated value, there is statistical

evidence that Xn is a statistical outlier. If so, this value is removed and the remaining

dataset is retested using the same method, until all such outliers have been accounted for.


EXAMPLE 3:

Total Organic Carbon (mg/l)   Log Transformed Data

1700 7.4

1900 7.5

1500 7.3

1300 7.2

11000 9.3

1250 7.1

1000 6.9

1300 7.2

1200 7.1

1450 7.3

1000 6.9

1300 7.2

1000 6.9

2200 7.7

4900 8.5

3700 8.2

1600 7.4

2500 7.8

1900 7.5

Example Data for Outlier Test

The mean and standard deviation for all log transformed data including the outlier.

X̄ = 7.5   s = 0.61

$$T_{19} = \frac{9.3 - 7.5}{0.61} = 2.95$$

Table 8, Appendix B, US EPA Guidance, T19(.05) is 2.532. Since T19 exceeds the

tabulated value, there is statistical evidence that this observation is an outlier.
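A minimal sketch of the computation (NumPy assumed; the critical values of Table 8 are not reproduced, so the comparison step is left to the reader). The function name is illustrative:

import numpy as np

def epa1989_outlier_stat(data, log_transform=True):
    # T_n = (X_n - mean)/S on the (optionally log-transformed) data,
    # where X_n is the largest (suspect) observation.
    x = np.log(data) if log_transform else np.asarray(data, dtype=float)
    x = np.sort(x)
    return (x[-1] - x.mean()) / x.std(ddof=1)

toc = [1700, 1900, 1500, 1300, 11000, 1250, 1000, 1300, 1200, 1450,
       1000, 1300, 1000, 2200, 4900, 3700, 1600, 2500, 1900]
print(round(epa1989_outlier_stat(toc), 2))   # about 2.95, cf. Example 3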

DIXON'S OUTLIER TEST

Requirements and Assumptions:

Dixon’s test is only recommended for sample sizes n ≤ 25. It assumes that the data set

(not including the suspected outlier) is normally-distributed.


Procedure:

Step 1. Sort the data set and label the ordered values, x(i).

Step 2. To test for a low outlier, compute the test statistic C using the appropriate

equation below, based on the sample size:

C =

(x (2) − x (1))/(x (n) − x (1)) for 3 <= n <= 7

(x (2) − x (1))/(x (n−1) − x (1)) for 8 <= n <= 10

(x (3) − x (1))/(x (n−1) − x (1)) for 11 <= n <= 13

(x (3) − x (1))/(x (n−2) − x (1)) for 14 <= n <= 20

Or, to test for a high outlier, compute the test statistic C using the appropriate equation

below, based on the sample size:

C =

(x (n) − x (n−1))/(x (n) − x (1)) for 3 <= n <= 7

(x (n) − x (n−1))/(x (n) − x (2)) for 8 <= n <= 10

(x (n) − x (n−2))/(x (n) − x (2)) for 11 <= n <= 13

(x (n) − x (n−2))/(x (n) − x (3)) for 14 <= n <= 20

Step 3. Find the critical point for the specified alpha level in table 12-1, US EPA Unified

Guidance 2009. If C exceeds the tabulated value, the suspected outlier should be

declared a statistical outlier and investigated further.

Dixon's test can be modified to test for more than one outlier as follows. If the least

extreme suspected outlier is tested, having removed any more extreme values, and proves

to be a statistical outlier, then it may be concluded that the more extreme suspected

values are also statistical outliers. If not, then the least extreme of the removed values

can be tested in a similar manner. Importantly, though, this method can only test multiple

suspected outliers if they are both on the same tail, i.e. both high outliers or both low

outliers. So if both a high and a low outlier are suspected in a single data set, this test is

not recommended. If the sample size is at least 20, Rosner's should be substituted;

otherwise contact a professional statistician.


EXAMPLE:

Step 1: Order data lowest to highest, and calculate the natural log of each data point.

Order Concentration (ppb) Logged Concentration

1 1.7 0.531

2 3.2 1.163

3 6.5 1.872

4 7.3 1.988

5 12.1 2.493

6 13.7 2.617

7 15.6 2.747

8 16.2 2.785

9 35.1 3.558

10 41.6 3.728

11 57.9 4.059

12 59.7 4.089

13 68.4 4.225

14 70.1 4.250

15 75.4 4.323

16 199.0 5.293

17 275.0 5.617

18 302.0 5.710

19 350.0 5.878

20 7066.0 8.863

Step 2: Use the last portion of the equations to test for high outliers in computing C (i.e.

because n = 20).

C = (x(n) − x(n−2))/(x(n) − x(3)) for 14 <= n <= 20

C = (8.863 − 5.710)/(8.863 − 1.872) = 0.451

Step 3: Using Table 12-1 in the Unified Guidance, compare the calculated C to the

critical point for n = 20 and α = .05. The calculated value of 0.451 exceeds the critical

value of 0.450, therefore, the extreme value may be considered a statistical outlier.
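A sketch of the statistic for a single suspected outlier (NumPy assumed; low outliers are handled by negating the data, which maps the low-outlier equations onto the high-outlier ones). The function name is illustrative:

import numpy as np

def dixon_C(data, high=True):
    x = np.sort(np.asarray(data, dtype=float))
    n = len(x)
    if not 3 <= n <= 20:
        raise ValueError("this sketch covers 3 <= n <= 20 only")
    if not high:
        x = np.sort(-x)          # a low outlier in x is a high outlier in -x
    # 0-based indices of the order statistics in the high-outlier equations
    if n <= 7:    j, k = n - 2, 0
    elif n <= 10: j, k = n - 2, 1
    elif n <= 13: j, k = n - 3, 1
    else:         j, k = n - 3, 2
    return (x[-1] - x[j]) / (x[-1] - x[k])

logged = [0.531, 1.163, 1.872, 1.988, 2.493, 2.617, 2.747, 2.785, 3.558, 3.728,
          4.059, 4.089, 4.225, 4.250, 4.323, 5.293, 5.617, 5.710, 5.878, 8.863]
print(round(dixon_C(logged), 3))   # 0.451, as computed above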

ROSNER'S OUTLIER TEST

Requirements and Assumptions:

Rosner’s test is recommended when the sample size is 20 or larger. The critical points

can be used to identify from 2 to 5 outliers. Rosner’s method again assumes the

underlying data set (less any outliers) is normally distributed, or can be transformed to

normal.

Procedure:

Step 1. Sort the data set and label the ordered values x(i). Then identify the maximum

number of suspected outliers, r0.

Step 2. Compute the mean and standard deviation of all the data; call these values x(0) and

s(0). Then determine the measurement farthest from x(0) and label it y(0).

Step 3. Remove y(0) from the data set and compute the mean and standard deviation of the

remaining observations. Call these new values x(1) and s(1). Again find the value in this

data subset furthest from x(1) and label it y(1).

Step 4. Remove y(1), again calculate the mean and standard deviation, and continue this

process until r0 potential outliers have been removed.

Step 5. We now have the values necessary to test for r outliers (r ≤ r0) by computing the

test statistic:

$$R_{r-1} = \frac{\left|y^{(r-1)} - \bar{x}^{(r-1)}\right|}{s^{(r-1)}}$$

First test for r0 outliers. If the test statistic exceeds the first critical point from Table 12-2,

US EPA Unified Guidance 2009, based on the sample size and the alpha level, this may

be taken as evidence that there are r0 outliers. If not, test for r0–1 outliers in the same

manner using the next critical point, continuing until a certain number of outliers have

been identified or until no outliers are found.


Note that Sanitas will accept one as the number of suspected outliers. In this case, it uses

the second tabled value from k=2 (as if two outliers were suspected but not found) to test

for one outlier.

Example:

Step 1: Order the data lowest to highest and compute the mean and standard deviation of

the complete data set.

Step 2: Specify K (number of suspected outliers). For our example K = 2.

Naphthalene Concentrations (ppb)

Quarter   W-1    W-2    W-3     W-4    W-5
1         3.34   5.59   1.91    6.12   8.64
2         5.39   5.96   1.74    6.05   5.34
3         5.74   1.47   23.23   5.18   5.53
4         6.88   2.57   1.82    4.43   4.42
5         5.85   5.39   2.02    1.00   35.45

Successive Naphthalene Subsets (SSi)

SS0: 1.00, 1.47, 1.74, 1.82, 1.91, 2.02, 2.57, 3.34, 4.42, 4.43, 5.18, 5.34,
5.39, 5.39, 5.53, 5.59, 5.74, 5.85, 5.96, 6.05, 6.12, 6.88, 8.64, 23.23, 35.45

SS1: SS0 with 35.45 removed

SS2: SS1 with 23.23 removed

X̄0 = 6.44   S0 = 7.379   y0 = 35.45

X̄1 = 5.23   S1 = 4.326   y1 = 23.23

X̄2 = 4.45   S2 = 2.050   y2 = 8.64

Step 3: Identify the observation farthest from the mean, remove it from the data set and

recompute mean and standard deviation. Repeat process until the specified number of K

values are removed.

Step 4: Test for 2 joint outliers using the following equation:

Using SS(k−1) = SS1:

$$R_1 = \frac{23.23 - 5.23}{4.326} = 4.16$$

Step 5: Based on α = .05, a sample size of n = 25, and k = 2, the first critical point in Table

12-2 equals 2.83 for n = 20 and 3.05 for n = 30. Both suspected values may be declared

statistical outliers since the calculated R1 is larger than either of these critical points.
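A sketch of Steps 2 through 5 (NumPy assumed; the Table 12-2 critical points are not reproduced). The r-th returned value is the statistic computed from the mean and standard deviation with the r−1 most extreme values removed, matching the R notation above. The function name is illustrative:

import numpy as np

def rosner_statistics(data, r0):
    x = list(map(float, data))
    out = []
    for _ in range(r0):
        arr = np.asarray(x)
        m, s = arr.mean(), arr.std(ddof=1)
        y = arr[np.argmax(np.abs(arr - m))]   # value farthest from the mean
        out.append(abs(y - m) / s)
        x.remove(y)                           # drop it and repeat
    return out

naph = [3.34, 5.39, 5.74, 6.88, 5.85, 5.59, 5.96, 1.47, 2.57, 5.39,
        1.91, 1.74, 23.23, 1.82, 2.02, 6.12, 6.05, 5.18, 4.43, 1.00,
        8.64, 5.34, 5.53, 4.42, 35.45]
print(np.round(rosner_statistics(naph, 2), 2))   # second value is 4.16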

Rank Von Neumann

Description:

This statistical procedure is a test for serial correlation at a given well (monitoring point).

The test will also reflect the presence of trends or cycles, such as seasonality. Therefore,

to test for serial correlation only, one must first remove any seasonality or trends that are

present.

Rank Von Neumann Procedure:

The null hypothesis to be tested is:

H0: There is no serial correlation present in the data.

The alternative hypothesis is:

HA: There is serial correlation present in the data.

The data are first ordered from lowest to highest, assigning the rank of 1 to the smallest

observation, the rank of 2 to the next smallest,…, and the rank of n to the largest. Let R1

be the rank of x1, R2 be the rank of x2, and Rn the rank of xn.

$$R_v = \frac{12}{n\left(n^2 - 1\right)}\sum_{i=1}^{n-1}\left(R_i - R_{i+1}\right)^2$$

Where:

Ri = the rank of the ith observation in the sequence; and

Ri+1 = the rank of the (i+1)st observation in the sequence (the following

observation).

If the sample size n is greater than or equal to ten and less than or equal to 100, the
calculated value Rv is compared to the tabulated critical value Rv,α (Table A5, Gilbert). The null
hypothesis is rejected if the computed value Rv is less than the tabulated critical value.

If the sample size, n, is greater than 100, compute:

$$Z_R = \frac{\sqrt{n}\left(R_v - 2\right)}{2}$$

Reject the null hypothesis if ZR is negative and the absolute value of ZR is greater than
the tabulated Z1-α value (Table A1, Gilbert).

EXAMPLE 4:

Date Concentration Rank [Ri-Ri+1]2

3/3/1995 2.2 10 9

6/3/1995 2.74 13 81

9/3/1995 0.42 4 4

12/3/1995 0.63 6 1

3/3/1996 0.82 7 1

6/3/1996 0.86 8 36

9/3/1996 0.31 2 100

12/3/1996 2.33 12 49

3/3/1997 0.5 5 36

6/3/1997 2.22 11 4

9/3/1997 1.1 9 36

12/3/1997 0.32 3 4

2/3/1998 0.01 1

Rank Von Neumann Example Data

$$\sum_{i=1}^{n-1}\left(R_i - R_{i+1}\right)^2 = 361$$

$$R_v = \frac{12\,(361)}{13\left(13^2 - 1\right)} = 1.984$$

The tabled critical value at an alpha of .05 is 1.14. Since Rv is greater than the tabled

critical value, we cannot reject H0.
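A sketch of the statistic (NumPy and SciPy assumed; scipy.stats.rankdata assigns midranks to tied values). The function name is illustrative:

import numpy as np
from scipy.stats import rankdata

def rank_von_neumann(data):
    r = rankdata(data)               # ranks of the time-ordered observations
    n = len(r)
    return 12.0 * np.sum(np.diff(r) ** 2) / (n * (n ** 2 - 1))

conc = [2.2, 2.74, 0.42, 0.63, 0.82, 0.86, 0.31, 2.33,
        0.5, 2.22, 1.1, 0.32, 0.01]     # Example 4, in time order
print(round(rank_von_neumann(conc), 3))  # 1.984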

Normality Report

Description:

The Normality Test report is a textual report of normality test results for each well

(monitoring point) selected in the current data set. Either the Shapiro-Wilk/Shapiro

Francia method, the Chi-Squared method or the Coefficient of Variation (see descriptions

elsewhere in this statistical write-up) may be used, and optionally the normality results

after each transformation in the Ladder of Powers (see the User Guide) may be detailed.

Stiff Diagram

Description:

Stiff Diagrams are a graphical method devised to portray water compositions and

facilitate in the interpretation and presentation of chemical analysis. They may be used to

visually compare the chemical composition of water quality across wells, and aid in

determining whether the aquifer is heterogeneous or homogenous. Stiff Diagrams are

calculated in terms of equivalents per million, more commonly referred to as

milliequivalents; and they take into account the ionic charge and the formula weight for

selected constituents, specifically (sodium+potassium), magnesium, calcium, chloride,

sulfate, and bicarbonate.

Procedure:

Milliequivalents per liter for each of the above constituents are calculated as

Weight/Charge.

The resulting values determine the relative distances from the center line for the

respective vertices of the diagram.

Example :

              Atomic Weight   Ionic Charge   Concentration   meq        Fraction of total   %
Na            23              1              570             24.78261   0.251108            25
K             39              1              250             6.410256   0.064952            6
Ca            40              2              1275            63.75      0.645943            65
Mg            24              2              45              3.75       0.037997            4
Total                                                        98.69287                       100

Cl            35.5            1              5000            140.8451   0.803372            80
SO4           96              2              155             3.229167   0.018419            2
Bicarbonate   61              1              635             10.40984   0.059377            6
Carbonate     60              2              625             20.83333   0.118832            12
Total                                                        175.3174                       100

Discussion, focusing on Sodium:

A sample of water contains 570 mg/l of sodium (Na). The concentration of Na, in terms

of milliequivalents, in the water sample may be computed as follows:

Atomic Weight of Na: 23

Ionic Charge (valence): 1

Combining Weight (milliequivalent): 23/1 = 23

meq/l = 570 × 1/23 = 24.78 (the relative distance to the left from the center line on the
diagram)

%: 24.78/98.69 = .25, or 25%.

To run a Stiff Diagram report, choose a sampling date from the drop down list, and

optionally extend this to a range if the sampling event occupied multiple days. Select the

wells to analyze, and click Run.

The following options are available:

Label Axes: Adds a scale (in milliequivalents) to the x-axis of each Stiff Diagram drawn.

Label Constituents: Adds abbreviated constituent names on the vertices.

Compare Dates: Replaces the Date ComboBox (single date selection) with a scrolling list

of dates (multiple date selection). Allows the comparison of data not only by well, but

also by date.

Piper Diagram

Description:

Piper diagrams are a form of tri-linear diagram, which provide a visual representation of

the ion concentration of groundwater. A piper diagram has two triangular plots on the

right and left side of a 4-sided center field. The three major cations are plotted in the left

triangle and anions in the right. Each of the three cation/anion variables, in

milliequivalents, is divided by the sum of the three values, to produce a percent of total

cation/anions. These percentages determine the location of the associated symbol. The


data points in the center field are located by extending the points in the lower triangles to

the point of intersection.

In order for a Piper diagram to be produced, the selected data file must contain the

following constituents: Sodium (or Na), Potassium (or K), Calcium (or Ca), Magnesium

(or Mg), Chloride (or Cl), Bicarbonate (or HCO3), Carbonate (or CO3) and Sulfate (or

SO4). The units should be mg/l, ppm, ug/l or ppb, and must be consistent.

If the Note Cation-Anion Balance option is selected (in Configure Sanitas) the report will

also show the Cation-Anion Balance, which is the absolute value of the difference

between the total cations and the total anions, both expressed in milliequivalents, divided

by their sum.

Detection Monitoring Statistics

Shewhart-CUSUM Control Chart

Description:

The combined Shewhart-Cumulative Sum (CUSUM) Control Charts are useful graphical

tools for evaluating detection-monitoring data because they monitor the inherent

statistical variation of data collected within a single well (monitoring point) and/or

between background and compliance wells, and flag anomalous results.

Control Charts are a form of a time-series graph, on which a parametric statistical

representation of concentrations of a given constituent are plotted at intervals over time.

The statistics are computed and plotted together with upper and/or lower control limits on

a chart where the x-axis represents time. If a result falls outside the predetermined control

limits, then the process is considered “out of control” and may indicate potentially

impacted ground water. Otherwise, the process is considered “in control.”

Assumptions:

The standard assumptions in the use of Control Charts are that the data are independent

and normally distributed with a constant mean, X , and constant variance, s2, and that the

background data haven’t been previously impacted by the facility. In addition, it is

assumed that seasonality in the data is sufficiently accounted for to minimize the chance

of mistaking seasonal effects for evidence of water quality degradation due to release

from a nearby waste management unit (WMU). Another assumption is that a sufficient

number of background data points exists to provide reliable estimates of the mean and

standard deviation of the constituent’s concentration values for a given well.


Independence:

Prior to construction of the Control Charts, the assumption of data independence should

be considered. The monitoring data should be collected to ensure physical independence

of the samples, and a specified rigorous field sampling protocol should be followed.

Distribution:

The distribution of the data is evaluated by applying the Shapiro-Wilk or Shapiro-Francia

test for normality to the raw data or, when applicable, to the transformed data. The null

hypothesis, H0, to be tested is:

H0: The population has a normal (or transformed-normal) distribution.

The alternative hypothesis, HA, is:

HA: The population does not have a normal (or transformed-normal)

distribution.

Shapiro-Wilk Test Procedure:

Calculation of the Shapiro-Wilk W-statistic to test the null hypothesis is presented in

detail on page 158 of Statistical Methods for Environmental Pollution Monitoring

(Gilbert, 1987). This test will be used when there are 49 or fewer observations to test.

Beyond 49 observations, the Shapiro-Francia test will be used.

The denominator, d, of the W test statistic, using n data is computed as follows:

$$d = \sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2 = \sum_{i=1}^{n} X_i^2 - \frac{1}{n}\left(\sum_{i=1}^{n} X_i\right)^2$$

Where:

Xi = the value for the i th observation;

X = the mean of the n observations; and

n = the number of observations.

Order the n data from smallest to largest (e.g. X[1] < X[2] < ... < X[n]). Then compute k

where:

$$k = \frac{n}{2}\ \text{if } n \text{ is even};\qquad k = \frac{n-1}{2}\ \text{if } n \text{ is odd}$$

The coefficients a1, a2, ..., ak for the observed n data can be found in Table A6 (Gilbert,

1987).

The W test statistic is then computed as follows:

$$W = \frac{1}{d}\left[\sum_{i=1}^{k} a_i\left(X_{[n-i+1]} - X_{[i]}\right)\right]^2$$

The data are tested at the α = 0.05 significance level. The significance level represents

the probability of rejecting the null hypothesis when it is true (i.e. the percent of false

positives). It is customary to set α at 0.05 (corresponding to a 95 percent confidence

level) or at .01 (corresponding to a 99 percent confidence level).

The significance level α is also known as the "Type I error" rate. Reject H0 at the α significance level if W is
less than the quantile given in Table A7 (Gilbert, 1987).

EXAMPLE 5:

Rank (smallest to largest)   Xi     yi = ln Xi   (yi - ȳ)^2
1                            0.13   -2.0402      3.49126
2                            0.45   -0.7985      0.39285
3                            0.60   -0.5108      0.11499
4                            0.76   -0.2744      0.01055
5                            1.05    0.0488      0.04863
6                            1.12    0.1133      0.08126
7                            1.20    0.1823      0.12535
8                            1.37    0.3148      0.23672
9                            1.69    0.5247      0.48505
10                           2.06    0.7227      0.80002

Example Data for Shapiro-Wilk Test

$$n = 10 \qquad d = \sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2 = 5.7865 \qquad k = \frac{n}{2} = 5$$

$$W = \frac{1}{5.7865}\Big[0.5739\,(0.7227 - (-2.0402)) + 0.3291\,(0.5247 - (-0.7985)) + 0.2141\,(0.3148 - (-0.5108))$$
$$\qquad + 0.1224\,(0.1823 - (-0.2744)) + 0.0399\,(0.1133 - 0.0488)\Big]^2 = 0.87$$

The calculated W is greater than the critical W of 0.842 found in Table A7 (Gilbert, 1987) for
α = .05. Therefore, it is concluded that the data are lognormally distributed.

The Shapiro-Wilk test of normality can be used for sample sizes up to 49. When the
sample size is larger than 49, the Shapiro-Francia test can be used instead. A less
accurate normality test for smaller sample sizes is the coefficient of variation test.
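In practice the same test is available as scipy.stats.shapiro; note that SciPy uses its own coefficient approximations, so W can differ slightly from a hand calculation with the Table A6 coefficients:

import numpy as np
from scipy.stats import shapiro

# Example 5 data, tested on the natural-log scale:
x = np.log([0.13, 0.45, 0.60, 0.76, 1.05, 1.12, 1.20, 1.37, 1.69, 2.06])
w, p = shapiro(x)
print(round(w, 3), round(p, 3))   # W near the hand-computed 0.87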


Coefficient-of-Variation Test Procedure:

Calculate the sample mean, X , of the n observations Xi, where i = 1, ..., n. Then

calculate the sample standard deviation, s. The coefficient-of-variation, CV, is calculated

as:

$$CV = \frac{S}{\bar{X}}$$

If CV exceeds 1.00 then reject H0 that the data are normally distributed.

EXAMPLE 6:

Date Concentration

1/5/1993 0.04

10/3/1993 0.18

2/1/1994 0.18

4/7/1994 0.25

7/2/1994 0.29

10/9/1994 0.38

1/15/1995 0.5

4/17/1995 0.5

7/1/1995 0.6

11/2/1995 0.93

1/15/1996 0.97

4/17/1996 1.1

7/1/1996 1.16

11/2/1996 1.29

1/15/1997 1.37

2/28/1997 1.38

5/1/1997 1.45

8/2/1997 1.46

11/4/1997 2.58

1/7/1998 2.69

3/6/1998 2.8

8/29/1998 3.33

11/2/1998 4.5

1/6/1999 6.6

Example Data for Coefficient of Variation

X̄ = 1.52   s = 1.56

$$CV = \frac{1.56}{1.52} = 1.03$$

Since CV is greater than 1.00, the data were not found to be normally distributed.


Shapiro-Francia Test Procedure:

Calculation of the Shapiro-Francia W′ -statistic to test the null hypothesis is presented in

detail by EPA (U.S. EPA, 1992). The test statistic, W′ , is computed as follows:

$$W' = \frac{\left[\sum_{i} m_i x_{(i)}\right]^2}{(n-1)\,S^2 \sum_{i} m_i^2}$$

Where:

xi = the ith ordered value of the sample;

mi = the approximate expected value of the ith ordered normal quantile;

n = the number of observations; and

S = the standard deviation of the sample.

The values for mi can be approximately computed as:

$$m_i = \Phi^{-1}\!\left(\frac{i}{n+1}\right)$$

Where:

Φ⁻¹ = the inverse of the standard normal distribution with zero mean and unit

variance.

Reject H0 at the α = 0.05 significance level if W′ is less than the critical value provided

in Table A-3 (Appendix A; U.S. EPA, 1992). When the sample size is larger than 100,

the Chi-Squared Goodness-of-Fit test can be used instead.
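A sketch of W′ (NumPy/SciPy assumed; the Table A-3 critical values are not reproduced). The function name is illustrative:

import numpy as np
from scipy.stats import norm

def shapiro_francia(data):
    x = np.sort(np.asarray(data, dtype=float))
    n = len(x)
    m = norm.ppf(np.arange(1, n + 1) / (n + 1))   # m_i = Phi^-1(i/(n+1))
    return np.dot(m, x) ** 2 / ((n - 1) * np.var(x, ddof=1) * np.sum(m ** 2))

# For normal data, W' should be close to 1:
print(round(shapiro_francia(np.random.default_rng(1).normal(size=60)), 3))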

Chi-Squared Goodness-of-Fit Normality Test Procedure:

First divide the N observations by four to compute K, where K will be the number of

subgroups or ‘cells’ for the data set (maximum 10). Second, standardize each

observation, Xi, by subtracting the group mean and dividing by the group standard

deviation as follows:

$$Z_i = \frac{X_i - \bar{X}}{s}$$

Where:

Zi = the standardized value;

X = the group mean; and


s = the group standard deviation.

Once the standardized values and K have been calculated, the third step is to subgroup

the Zi according to the cell boundaries designated for K cells in Table 4-3 (EPA, April

1989). The Chi-Squared statistic, Χ², may be calculated as follows:

$$\mathrm{X}^2 = \sum_{i=1}^{K}\frac{\left(N_i - E_i\right)^2}{E_i}$$

Where:

Ni = the number of observations in the ith cell; and

Ei = N/K, the expected number of observations in the ith cell.

Last, compare the calculated Χ² to a table of the chi-squared distribution (Table 1,
Appendix B; U.S. EPA, 1989) with α = 0.05 and K−3 degrees of freedom. If the

calculated value exceeds the tabulated value, then reject H0 that the data are normally

distributed.

The following example data represent the residuals from an analysis of variance on

dioxin concentrations. The standardization process has been applied to the residuals,

resulting in the data in the third column, the standardized residuals or Zi.

EXAMPLE 7:

Observation Residuals Standardized Residuals

1 -0.45 -1.9

2 -0.35 -1.48

3 -0.35 -1.48

4 -0.22 -0.93

5 -0.16 -0.67

6 -0.13 -0.55

7 -0.11 -0.46

8 -0.1 -0.42

9 -0.1 -0.42

10 -0.06 -0.25

11 -0.05 -0.21

12 0.04 0.17

13 0.11 0.47

14 0.13 0.55

15 0.16 0.68

16 0.17 0.72

17 0.2 0.85

18 0.21 0.89

19 0.3 1.27

20 0.34 1.44

21 0.41 1.73

Example Data for Chi-Squared Normality Test

N = 21

K = 21/4 ≈ 5

The standardized residuals are then grouped according to the cell boundaries designated

for 5 cells in Table 4-3 (EPA, April 1989). The cell boundaries for K=5 are -0.84, -0.25,

0.25 and 0.84. Applying these boundaries to the above Zi, there are 4 observations in the

first cell, 6 in the second cell, 2 in the third, 4 in the fourth, and 5 in the fifth. These

counts represent the Ni in the above equation that is used to calculate the Χ² statistic. The
expected number in each cell, Ei, is N/K or 4.2. The Χ² statistic for these data is

calculated as:

$$\mathrm{X}^2 = \frac{(4-4.2)^2}{4.2} + \frac{(6-4.2)^2}{4.2} + \frac{(2-4.2)^2}{4.2} + \frac{(4-4.2)^2}{4.2} + \frac{(5-4.2)^2}{4.2} = 2.10$$

The critical value at α = 0.05 for a chi-squared test with 2 (K - 3 = 5-3 = 2) degrees of

freedom is 5.99 (Table 1, Appendix B; U.S. EPA, 1989). Since the calculated chi-

squared value is less than the tabulated value, we fail to reject H0 that the data are

normally distributed.
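A sketch of the full procedure (NumPy/SciPy assumed, with N >= 16 so that K >= 4). Here the equal-probability cell boundaries are computed directly from normal quantiles rather than read from Table 4-3, so borderline counts can differ slightly from the rounded tabled boundaries used in Example 7. The function name is illustrative:

import numpy as np
from scipy.stats import norm, chi2

def chi_squared_normality(data):
    x = np.asarray(data, dtype=float)
    N = len(x)
    K = min(N // 4, 10)                         # number of cells (maximum 10)
    z = (x - x.mean()) / x.std(ddof=1)          # standardized values
    bounds = norm.ppf(np.arange(1, K) / K)      # equal-probability boundaries
    edges = np.concatenate(([-np.inf], bounds, [np.inf]))
    Ni = np.histogram(z, bins=edges)[0]         # observed counts per cell
    Ei = N / K                                  # expected count per cell
    stat = np.sum((Ni - Ei) ** 2 / Ei)
    return stat, chi2.ppf(0.95, K - 3)          # statistic and critical value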

Seasonality:

Prior to constructing the Control Charts, the significance of data seasonality is evaluated

using the nonparametric Kruskal- Wallis test (U.S. EPA, April 1989) at the α = 0.05

significance level. The null hypothesis to be tested is:

H0: The populations from which the quarterly data sets have been drawn have

the same median.

The alternative hypothesis is:

HA: At least one population has a median larger or smaller than at least one

other population’s median.

Where there are no ties, the Kruskal-Wallis statistic, H, is calculated:

$$H = \frac{12}{N(N+1)}\sum_{i=1}^{k}\frac{R_i^2}{N_i} - 3(N+1)$$

Where:

Ri = the sum of the ranks of the ith group;

Ni = the number of observations in the ith group (station);

N = the total number of observations; and

k = the number of groups (seasons).


If there are tied values (more than one data point having the same value) present in the
data, the Kruskal-Wallis H′ statistic is calculated:

$$H' = \frac{H}{1 - \dfrac{\sum_{i=1}^{g} T_i}{N^3 - N}}$$

Where:

g = the number of groups of distinct tied observations; and

N = the total number of observations

Ti is computed as:

$$T_i = t_i^3 - t_i$$

Where:

ti = the number of observations in tie group i.

The calculated value H (or Η′ if ties are present) is compared to the tabulated chi-

squared value with (K-1) degrees of freedom, (Table A-1, Appendix B; U.S. EPA, April

1989) where K is the number of seasons. The null hypothesis is rejected if the computed

value exceeds the tabulated critical value.

EXAMPLE:

Well 1        Well 2       Well 3
1.45 (7)      1.52 (8.5)   1.74 (13)
1.27 (6)      2.46 (22)    2.00 (17.5)
1.17 (4)      1.23 (5)     1.79 (14)
1.01 (3)      2.20 (20)    1.81 (15)
2.30 (21)     2.68 (23)    1.91 (16)
1.54 (10)     1.52 (8.5)   2.11 (19)
1.71 (11.5)   ND (1.5)     2.00 (17.5)
1.71 (11.5)
ND (1.5)

Example Data for Seasonality

R1 = 75.5, N1 = 9;   R2 = 88.5, N2 = 7;   R3 = 112, N3 = 7

g = 4, with t1 = t2 = t3 = t4 = 2

$$T_1 = T_2 = T_3 = T_4 = 2^3 - 2 = 6 \qquad \sum_i T_i = 6 + 6 + 6 + 6 = 24$$

$$H = \frac{12}{23(24)}\left(\frac{75.5^2}{9} + \frac{88.5^2}{7} + \frac{112^2}{7}\right) - 3(24) = 5.05$$

$$H' = \frac{5.05}{1 - \dfrac{24}{23^3 - 23}} = 5.06$$

From Table A19, Gilbert 1987, Χ²(.95, 2) = 5.99. Since H′ < 5.99, we cannot reject H0 at the
α = .05 level.
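The same tie-corrected H′ is produced by scipy.stats.kruskal. Below, the two non-detects are represented by a common placeholder value (0.0) so that they tie at the lowest ranks, as in the example:

from scipy.stats import kruskal

well1 = [1.45, 1.27, 1.17, 1.01, 2.30, 1.54, 1.71, 1.71, 0.0]
well2 = [1.52, 2.46, 1.23, 2.20, 2.68, 1.52, 0.0]
well3 = [1.74, 2.00, 1.79, 1.81, 1.91, 2.11, 2.00]

h, p = kruskal(well1, well2, well3)
print(round(h, 2))   # about 5.06, below the 5.99 critical value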

Application of the Kruskal-Wallis test for seasonality requires a minimum sample size of

four data points in each season. A minimum of four years of quarterly data is thus

required in order to appropriately evaluate data for seasonality. Sanitas currently tests

seasonality for up to twelve seasons. The default seasonal start dates are February 1, May

1, August 1, and November 1. Please see the “Options” section for instructions on how

to change the default seasonal cutpoints.

Correcting for Seasonality:

When seasonality is known to exist in a Time Series of concentrations, then the data

should be deseasonalized prior to constructing Control Charts in order to take into

account seasonal variation rather than mistaking seasonal effects for evidence of

contamination. This correction is performed following transformation of the data (if a

data transformation is required) and prior to an adjustment for non-detects, described

below.

Using the method described by the EPA (U.S. EPA, April 1989), the average

concentration for season i over the sampling period, Xi , is calculated as follows:

$$\bar{X}_i = \frac{X_{i1} + \cdots + X_{iN}}{N}$$

Where:

Xij = the unadjusted observation for the ith season during the jth year; and

N = the number of years of sampling.

The grand mean, X , of all the observations is then calculated as:

$$\bar{X} = \frac{\sum_{i=1}^{n}\sum_{j=1}^{N} X_{ij}}{nN} = \frac{\sum_{i=1}^{n}\bar{X}_i}{n}$$

Where:

n = the number of seasons per year.

The adjusted concentrations, Zij, are then computed as:

$$Z_{ij} = X_{ij} - \bar{X}_i + \bar{X}$$

EXAMPLE:

Month       1983   1984   1985
January     1.99   2.01   2.15
February    2.10   2.10   2.17
March       2.12   2.17   2.27
April       2.12   2.13   2.23
May         2.11   2.13   2.24
June        2.15   2.18   2.26
July        2.19   2.25   2.31
August      2.18   2.24   2.32
September   2.16   2.22   2.28
October     2.08   2.13   2.22
November    2.05   2.08   2.19
December    2.08   2.16   2.22

Example Data for Deseasonalizing

EXAMPLE:

Month       3-year monthly average   1983 adjusted   1984 adjusted   1985 adjusted
January     2.05                     2.11            2.13            2.27
February    2.12                     2.15            2.15            2.21
March       2.19                     2.10            2.15            2.25
April       2.16                     2.13            2.14            2.24
May         2.16                     2.12            2.13            2.25
June        2.20                     2.12            2.15            2.23
July        2.25                     2.11            2.16            2.23
August      2.25                     2.10            2.16            2.24
September   2.22                     2.11            2.17            2.22
October     2.14                     2.10            2.16            2.24
November    2.11                     2.11            2.14            2.25
December    2.16                     2.09            2.17            2.23

Deseasonalized Data

X̄ = 2.17

January 1983 Adjusted Concentration:

1.99 – 2.05 + 2.17 = 2.11
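A sketch of the adjustment (NumPy assumed), with the example data arranged as a years-by-seasons array; the function name is illustrative:

import numpy as np

def deseasonalize(x):
    x = np.asarray(x, dtype=float)        # shape (N years, n seasons)
    season_means = x.mean(axis=0)         # Xbar_i for each season
    return x - season_means + x.mean()    # Z_ij = X_ij - Xbar_i + Xbar

data = np.array([   # 1983, 1984, 1985 rows; January..December columns
    [1.99, 2.10, 2.12, 2.12, 2.11, 2.15, 2.19, 2.18, 2.16, 2.08, 2.05, 2.08],
    [2.01, 2.10, 2.17, 2.13, 2.13, 2.18, 2.25, 2.24, 2.22, 2.13, 2.08, 2.16],
    [2.15, 2.17, 2.27, 2.23, 2.24, 2.26, 2.31, 2.32, 2.28, 2.22, 2.19, 2.22],
])
print(np.round(deseasonalize(data), 2))   # first entry 2.11, as computed above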


Censored Data:

Censored data include data that are less than the detection limit. If a small proportion

(typically less than 15 percent) of the observations are nondetects, these will be replaced

with one-half of the method detection limit prior to running the analysis (Gilbert, 1987,

and U.S. EPA, April 1989).

If more than, for example, 15 percent but less than 50 percent of the data are less than the

detection limit, the data’s sample mean and sample standard deviation may be adjusted

according to the method of Cohen (1959) or Aitchison as described by EPA (U.S. EPA,

April 1989). Assumptions for use of this technique are that the data are normally

distributed and that the detection limit is always the same. If multiple detection limits

exist, then they are all replaced with the highest detection limit.

Cohen’s Adjustment Procedure:

Using Cohen’s method, the sample mean, xd , is calculated for data above the detection

limit:

$$\bar{x}_d = \frac{1}{m}\sum_{i=1}^{m} x_i$$

Where:

m = the number of data points above the detection limit; and

xi = the value of the ith constituent value above the detection limit.

The sample variance, Sd2 , is then calculated for data above the detection limit:

$$S_d^2 = \frac{\sum_{i=1}^{m} x_i^2 - \frac{1}{m}\left(\sum_{i=1}^{m} x_i\right)^2}{m-1} = \frac{\sum_{i=1}^{m}\left(x_i - \bar{x}_d\right)^2}{m-1}$$

The two parameters, h and γ , are then calculated as follows:

$$h = \frac{n-m}{n} \qquad \gamma = \frac{S_d^2}{\left(\bar{x}_d - DL\right)^2}$$

Where:


n = the total number of observations (i.e., above and below the detection

limit); and

DL = the detection limit.

These values are then used to determine the tabulated value of the parameter λ (Table A-

5, Appendix A; U.S. EPA, 1992).

The corrected sample mean, xc , which accounts for the data below detection limit, is

calculated as follows:

$$\bar{x}_c = \bar{x}_d - \lambda\left(\bar{x}_d - DL\right)$$

The corrected sample standard deviation, Sc, which accounts for the data below detection

limit, is calculated as follows:

$$S_c = \left[S_d^2 + \lambda\left(\bar{x}_d - DL\right)^2\right]^{1/2}$$

The adjusted sample mean, xc , and sample standard deviation, Sc, are then used for

construction of the Shewhart-CUSUM Control Chart.

EXAMPLE:

1984 1985 1986 1987

1850 1780 <1450 1760

1760 1790 1800 1800

<1450 1780 1840 1900

1710 <1450 1820 1770

1575 1790 1860 1790

<1450 1800 1780 1780

Example Data for Cohen’s Adjustment

< Indicates that the value was not detected

x̄d = 1786.75

S_d² = 4174.4

$$h = \frac{24 - 20}{24} = 0.16667$$

$$\gamma = \frac{4174.4}{\left(1786.75 - 1450\right)^2} = 0.0368$$

From Table 7, Appendix B, US EPA Guidance, 1989:

γ      h = .15    h = .20
.00    .17342     .24268
.05    .17925     .25033

Values of λ (Table 7, Appendix B, US EPA Guidance)

The value for λ is found through double linear interpolation:

.24268 − .17342 = .06926;   .06926 × .3334 = .02309;   .17342 + .02309 = .19651
.25033 − .17925 = .07108;   .07108 × .3334 = .02370;   .17925 + .02370 = .20295
.20295 − .19651 = .00644;   .00644 × .736 = .00474;    .19651 + .00474 = .20125

λ = .20125

$$\bar{x}_c = 1786.75 - .20125\left(1786.75 - 1450\right) = 1718.98$$

$$S_c = \left[4174.4 + .20125\left(1786.75 - 1450\right)^2\right]^{1/2} = 164.31$$
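A sketch of the arithmetic once λ has been determined (NumPy assumed; the λ table lookup and double interpolation are left out, and λ is passed in directly). The function name is illustrative:

import numpy as np

def cohens_adjustment(detects, n_total, dl, lam):
    x = np.asarray(detects, dtype=float)
    m = len(x)
    xd = x.mean()
    sd2 = np.sum((x - xd) ** 2) / (m - 1)        # S_d^2
    h = (n_total - m) / n_total                  # for the lambda table lookup
    gamma = sd2 / (xd - dl) ** 2
    xc = xd - lam * (xd - dl)                    # corrected mean
    sc = np.sqrt(sd2 + lam * (xd - dl) ** 2)     # corrected standard deviation
    return xc, sc, h, gamma

detects = [1850, 1760, 1710, 1575, 1780, 1790, 1780, 1790, 1800, 1800,
           1840, 1820, 1860, 1780, 1760, 1800, 1900, 1770, 1790, 1780]
print(cohens_adjustment(detects, n_total=24, dl=1450, lam=0.20125))
# roughly (1718.98, 164.31, 0.167, 0.037)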

Aitchison’s Adjustment Procedure:

Using Aitchison’s method the corrected sample mean, xa , is calculated:

$$\bar{x}_a = \left(1 - \frac{n_0}{n}\right)\bar{x}'$$

Where:

x′ = the average of the n1 detected values;

0n = the number of samples in which the compound is not detected; and

n = the sample size.

The corrected standard deviation, sa, is calculated:

$$s_a = \left[\frac{n_1 - 1}{n - 1}\,s'^2 + \frac{n_0 n_1}{n(n-1)}\,\bar{x}'^2\right]^{1/2}$$

Where:


s′ = the standard deviation of the n1 detected measurements.

EXAMPLE:

Date          Concentration

2/15/1997 <10

5/5/1997 <10

7/8/1997 <10

10/12/1997 15

2/5/1998 17

4/20/1998 13

6/2/1998 <10

10/4/1998 15

12/9/1998 12

2/10/1999 17

Example Data for Aitchison’s Adjustment

x̄′ = 14.83   s′ = 2.04

n0 = 4   n = 10

x̄a = 8.9   sa = 7.8
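A sketch of the adjustment (NumPy assumed; the function name is illustrative):

import numpy as np

def aitchison_adjustment(detects, n_total):
    x = np.asarray(detects, dtype=float)
    n1 = len(x)
    n0 = n_total - n1                    # number of non-detects
    xbar, s = x.mean(), x.std(ddof=1)
    xa = (1 - n0 / n_total) * xbar
    sa = np.sqrt((n1 - 1) / (n_total - 1) * s ** 2
                 + n0 * n1 / (n_total * (n_total - 1)) * xbar ** 2)
    return xa, sa

print(aitchison_adjustment([15, 17, 13, 15, 12, 17], n_total=10))
# roughly (8.9, 7.8), as in the example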

Kaplan-Meier Procedure:

For the purposes of automation, Sanitas runs normality tests on the raw data, as described

elsewhere in this document, and selects an appropriate transformation, if any, in place of

the Unified Guidance’s steps 4 and 5 (which involve creating interactive probability plots

to subjectively determine normality). Otherwise the procedures are the same:

Given a sample of size n containing left-censored measurements, identify and sort the m

< n distinct values, including distinct RLs. Label these as x(1), x(2), …, x(m).

For each i = 1 to m, calculate the risk set (ni) as the total number of detects and non-

detects no greater than x(i). Also compute di as the number of detected values exactly

equal to x(i).

Using the following equation, compute the Kaplan-Meier CDF estimate FKM(x(i)) for i =
1, …, m−1, and let FKM(x(m)) = 1:

$$F_{KM}\!\left(x_{(i)}\right) = \prod_{j=i+1}^{m}\frac{n_j - d_j}{n_j}$$

Then compute the adjusted mean and standard deviation after applying any necessary
normalizing transformation f(), using the following equations (with FKM(x(0)) = 0):

$$\hat{\mu} = \sum_{i=1}^{m} f\!\left(x_{(i)}\right)\left[F_{KM}\!\left(x_{(i)}\right) - F_{KM}\!\left(x_{(i-1)}\right)\right]$$

$$\hat{\sigma}^2 = \sum_{i=1}^{m}\left[f\!\left(x_{(i)}\right)\right]^2\left[F_{KM}\!\left(x_{(i)}\right) - F_{KM}\!\left(x_{(i-1)}\right)\right] - \hat{\mu}^2$$
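A sketch of the Kaplan-Meier computation on the measurement scale (NumPy assumed; any normalizing transformation f() would be applied to the values first). The function name is illustrative:

import numpy as np

def kaplan_meier_mean_sd(values, censored):
    # values: measurements for detects, reporting limits for non-detects;
    # censored: True where the value is a non-detect.
    v = np.asarray(values, dtype=float)
    c = np.asarray(censored, dtype=bool)
    xs = np.unique(v)                                     # m distinct values
    n_i = np.array([np.sum(v <= x) for x in xs])          # risk sets
    d_i = np.array([np.sum((v == x) & ~c) for x in xs])   # detects at x_(i)
    m = len(xs)
    F = np.ones(m)                                        # F(x_(m)) = 1
    for i in range(m - 1):
        F[i] = np.prod((n_i[i + 1:] - d_i[i + 1:]) / n_i[i + 1:])
    dF = np.diff(np.concatenate(([0.0], F)))              # mass at each x_(i)
    mean = np.sum(xs * dF)
    sd = np.sqrt(np.sum(xs ** 2 * dF) - mean ** 2)
    return mean, sd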


Control Chart Procedure:

This procedure for construction of the Shewhart-CUSUM Control Chart follows the EPA

recommendations (U.S. EPA, April 1989). A version customized for California is also

available in Sanitas, and some minor adjustments have been made for other protocol

standards. The Shewhart-CUSUM Control Chart recommends a minimum of six to eight

background data points in order to reliably determine the mean and standard deviation for

each constituent’s concentration in a given well.

Three parameters are selected prior to plotting:

h = the control limit to which the cumulative sum values (CUSUM) are

compared. The EPA recommended value is h = 5 units of standard deviation.

California does not require this limit to be met for detection monitoring. The

ASTM recommended value is h = 4.5 units of standard deviation for a

background n < 12 and h = 4.0 units of standard deviation for a background n

>= 12.

K = a reference value that establishes the upper limit for the acceptable

displacement of the standardized mean. The EPA and California

recommended value is K = 1. The ASTM recommended value is K=1 for

background n < 12 and K = .75 for background n >= 12 (and the EPA Unified

Guidance mentions using K = .75 “after 12 consecutive in-control

measurements”).

SCL = the upper Shewhart control limit to which the standardized mean will be

compared. For California sites, a value of SCL = 2.327 units of standard

deviation is used per Article 5. USEPA 1992 recommended SCL = 4.5, but the

Unified Guidance suggests SCL = 5.0 for most cases (see the discussion in the

UG: it may be appropriate to use SCL = 4.0 “after 12 consecutive in-control

measurements”). The ASTM recommended value is SCL = 4.5 for a

background n < 12 and SCL = 4.0 for a background n >= 12.

Assume that at time period Ti, ni concentration measurements X1,…,Xni, are available.

Their average, X , is computed.

The Shewhart Control Chart showing the standardized mean is the equivalent to an X

chart for n=1 (within a single sampling period). The standardized mean, Zi, is then

computed:

$$Z_i = \frac{\sqrt{n_i}\left(\bar{X}_i - \bar{X}\right)}{S}$$

Where:

X = the mean obtained from prior monitoring data from the same

station (at least four data points); and

S = the standard deviation obtained from prior monitoring data from

the same station (at least four data points).

When applicable, for each time period, Ti, the cumulative sum, Si (CUSUM), is

calculated:

$$S_i = \max\left\{0,\ \left(Z_i - K\right) + S_{i-1}\right\}$$

Where max{A, B} is the maximum of A and B, starting with S0 = 0.

The values of Si versus Ti are then plotted. An “out of control” situation occurs under

EPA standards at the time period Ti if, Si > h or Zi > SCL, and under California standards

only if Zi > SCL.

Under Unified Guidance and ASTM Standards a refinement has been added. If a single

value exceeds and is followed immediately by a value that is itself within the control

limits, then the second value serves as a non-validating retest of the first. That is, an out-

of-control situation requires either the most recent point to exceed the control limits, or

two such points in a row.

The results may be plotted in standardized units or may be converted back to their

original metric units.

EXAMPLE 13:

Date         Data (mg/l)   Zi (s.d.)   Si (s.d.)   Si (mg/l)
1/5/1991     *3.235
4/6/1991     *4.234
8/9/1991     *5.473
2/15/1992    *9.945
6/1/1992     *11.902
10/4/1992    *4.341
1/3/1993     *3.235
4/2/1993     *4.234
9/5/1993     5.473        -0.108      0           5.825
2/6/1994     9.945         1.261      0.261       6.678
5/12/1994    11.9          1.860      1.121       9.486
8/4/1994     4.341        -0.454     -0.333       4.735
12/22/1994   3.235        -0.793      0           5.825
3/4/1995     4.234        -0.487      0           5.825
7/8/1995     5.473        -0.108      0           5.825
11/5/1995    9.945         1.261      0.261       6.678

Example Data for Shewhart-CUSUM Control Charts

* = Background data

X̄ = 5.825   SD = 3.267   K = 1

SCL = 4.5 s.d. = 20.526 mg/l

h = 5 s.d. = 22.159 mg/l
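A sketch of the chart computations for single-sample periods (n_i = 1), using the EPA default parameters (NumPy assumed; the function name is illustrative). Run on the Example 13 data it reproduces the Zi column; note that this sketch floors the CUSUM at zero per the formula, whereas the example table reports the raw (Zi − K) + Si−1 value of −0.333 for 8/4/1994:

import numpy as np

def shewhart_cusum(background, compliance, K=1.0, h=5.0, SCL=4.5):
    bg = np.asarray(background, dtype=float)
    xbar, s = bg.mean(), bg.std(ddof=1)    # background mean and SD
    z = (np.asarray(compliance, dtype=float) - xbar) / s
    cusum, si = [], 0.0
    for zi in z:
        si = max(0.0, (zi - K) + si)       # S_i = max{0, (Z_i - K) + S_i-1}
        cusum.append(si)
    flags = [zi > SCL or ci > h for zi, ci in zip(z, cusum)]
    return z, np.array(cusum), flags

bg = [3.235, 4.234, 5.473, 9.945, 11.902, 4.341, 3.235, 4.234]
comp = [5.473, 9.945, 11.9, 4.341, 3.235, 4.234, 5.473, 9.945]
z, s, flags = shewhart_cusum(bg, comp)
print(np.round(z, 3))    # -0.108, 1.261, 1.86, ...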

Alpha Computation:

To compute an alpha level for a given Control Chart report, Sanitas creates thousands of

control charts using the current parameters and normally-distributed random background

and compliance data. Since each of these “reports” is created with background and

compliance data taken from the same normal distribution, there is no “contamination”,

and any exceedances are therefore by definition false positives. The percentage of

exceedances is thus the false positive rate, or alpha level.

Intrawell Rank Sum

Note: v.9.3 includes a (statistically similar) interwell version of the rank sum also.

Description:

When the historical data are neither normal nor transformed-normal, there is an option to

perform a nonparametric comparison between the historical data and subsequent data

points in lieu of constructing a Control Chart. The Kruskal-Wallis Rank Sum test is a


nonparametric procedure where the sums of ranked data sets are compared. Subsequent

sample data are compared with sampling data from the initial monitoring period of the

same well. It is assumed that during the initial monitoring period the well has shown no

evidence of contamination nor an increasing trend. This test does not require a normal

distribution of the data.

The null hypothesis to be tested is:

H0: The historical (background) data and the compliance data have the same

median constituent concentration.

The alternative hypothesis is:

HA: The compliance data have a greater median constituent concentration than

the historical data.

Procedure:

The Kruskal-Wallis test procedure is used to evaluate whether the historical (background

data) and the compliance data have the same median constituent concentration (see

Control-Chart Seasonality test for method description and example).

Mann-Whitney / Wilcoxon Rank Sum

Description:

The Mann-Whitney test, also known as Wilcoxon Rank Sum, may be used to test whether

the measurements from one population are significantly higher or lower than another

population. This test is available for both interwell and intrawell analyses.

The null hypothesis that is being tested is:

HO: The two data sets are equivalent.

The alternative hypothesis is:

HA: There is a statistically significant difference between the two data sets.

Procedure:

Sanitas uses the normal approximation of the Mann-Whitney test as follows.

First divide the data into two groups where

n1 = the number of observations in sample one,

n2 = the number of observations in sample two,

and N = n_1 + n_2.

Order the measurements for group 1 and group 2 from the lowest value to the highest

value.

Calculate the Mann-Whitney statistic as:

U = n_1 n_2 + \frac{n_1(n_1 + 1)}{2} - R_1

Where:

R_1 = the sum of the ranks of the observations in sample one.

The normal approximation is then:

Z = \frac{U - n_1 n_2 / 2}{\sqrt{n_1 n_2 (N + 1) / 12}}

Or, if ties are present:

Z = \frac{U - n_1 n_2 / 2}{\sqrt{\dfrac{n_1 n_2}{N(N - 1)}\left(\dfrac{N^3 - N}{12} - \dfrac{\sum t}{12}\right)}}

Where:

\sum t = \sum_i \left(t_i^3 - t_i\right), with t_i = the number of observations tied at the ith tied value.

A statistically significant finding is declared if the absolute value of Z is greater than the tabled value Z_{1-\alpha/2}. Significance is tested at the following alpha levels: .10, .05, .025, and .01.
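A short sketch of this computation (illustrative; a library routine such as scipy.stats.mannwhitneyu performs the equivalent test):

    import math
    from collections import Counter

    def rank_sum_z(sample1, sample2):
        """Normal approximation of the Mann-Whitney test with tie correction."""
        n1, n2 = len(sample1), len(sample2)
        N = n1 + n2
        pooled = sorted(sample1 + sample2)
        ranks = {}
        i = 0
        while i < N:                      # assign average ranks to tied values
            j = i
            while j < N and pooled[j] == pooled[i]:
                j += 1
            ranks[pooled[i]] = (i + 1 + j) / 2.0
            i = j
        r1 = sum(ranks[x] for x in sample1)
        u = n1 * n2 + n1 * (n1 + 1) / 2.0 - r1
        ties = sum(t**3 - t for t in Counter(pooled).values() if t > 1)
        if ties:
            var = n1 * n2 / (N * (N - 1)) * ((N**3 - N) / 12.0 - ties / 12.0)
        else:
            var = n1 * n2 * (N + 1) / 12.0
        return (u - n1 * n2 / 2.0) / math.sqrt(var)

    print(rank_sum_z([3.2, 4.1, 5.0, 4.4], [5.9, 6.3, 5.5, 7.0]))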

Welch's t-test

Assumptions:

All t-tests assume independence of the individual sample values. It is left to the user to

ensure that the time span between subsequent samples allows for independence of the

data. This assumption can be further tested by means of the Rank Von Neumann test,

described elsewhere in this document, if desired.

The hypothesis tests with Welch's t-test assume that errors (residuals) are normally

distributed. The normal distribution can be checked using the multiple group Shapiro-Wilk test, described below. Two groups (1 background and 1 compliance well in the

case of Interwell; time ranges in the case of Intrawell) are to be compared, and the

minimum sample size requirement is 4 samples per group. If the data normality

assumption is not met after attempted transformation(s) (depending on user settings), then

the Wilcoxon Rank Sum, described elsewhere in this document, is substituted.

In addition, the Wilcoxon Rank Sum will be substituted in cases in which > 20% of the

data are censored values.

Multiple Group Shapiro-Wilk test:

1) Given K groups to be tested, denote the sample size of the ith group as ni.

2) Compute the Shapiro-Wilk statistic (SWi) for each of the K groups, as discussed

elsewhere in this document.

3) Transform each Shapiro-Wilk statistic to the intermediate quantity (G_i). For sample size >= 7, G_i = \gamma + \delta \ln\left(\frac{SW_i - \varepsilon}{1 - SW_i}\right), where γ, δ, and ε are from tables in Technometrics Vol. 10 number 4, and other sources. For sample size < 7, find a tabled G_i based on u_i = \ln\left(\frac{SW_i - \varepsilon}{1 - SW_i}\right).

4) Sum the Gi's, and multiply by the reciprocal of the square root of K to get the

Shapiro-Wilk multiple group statistic G.

5) Given the desired significance level (α), determine an α-level tabulated critical

point as the upper αth normal quantile (zα). If the absolute value of G > zα take

this as significant evidence of non-normality at the α level.
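The combination in steps 3 through 5 can be sketched as follows (the per-group SW statistics and the tabled γ, δ, ε constants are assumed to be available already; this covers the sample-size >= 7 case only):

    import math
    from statistics import NormalDist

    def multiple_group_g(sw, gamma, delta, eps, alpha=0.05):
        """Combine per-group Shapiro-Wilk statistics into the group statistic G."""
        g_vals = [g + d * math.log((w - e) / (1.0 - w))
                  for w, g, d, e in zip(sw, gamma, delta, eps)]
        G = sum(g_vals) / math.sqrt(len(g_vals))
        z_alpha = NormalDist().inv_cdf(1.0 - alpha)   # upper alpha-th quantile
        return G, abs(G) > z_alpha                    # True => non-normal at alpha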

Procedure:

Using group means and standard deviations, Welch's t-statistic is computed as

t = \frac{\bar{X}_C - \bar{X}_B}{\sqrt{\dfrac{s_C^2}{n_C} + \dfrac{s_B^2}{n_B}}}

where B indicates background and C indicates compliance groups.

The approximate degrees of freedom are computed as

df = \frac{\left(\dfrac{s_C^2}{n_C} + \dfrac{s_B^2}{n_B}\right)^2}{\dfrac{(s_C^2/n_C)^2}{n_C - 1} + \dfrac{(s_B^2/n_B)^2}{n_B - 1}}

This quantity is rounded to the nearest integer to become df.

t is compared to the (1-α)*100th percentage point of the Student’s t-distribution with df

degrees of freedom. If t > the critical value, it can be concluded that the compliance

mean is significantly greater than the background mean at the α significance level.
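A minimal sketch of these two formulas (scipy.stats.ttest_ind(..., equal_var=False) computes the same statistic; the data here are made up):

    import math

    def welch_t(background, compliance):
        """Welch's t-statistic and rounded Satterthwaite degrees of freedom."""
        nb, nc = len(background), len(compliance)
        mb, mc = sum(background) / nb, sum(compliance) / nc
        vb = sum((x - mb) ** 2 for x in background) / (nb - 1)
        vc = sum((x - mc) ** 2 for x in compliance) / (nc - 1)
        se2 = vc / nc + vb / nb
        t = (mc - mb) / math.sqrt(se2)
        df = se2 ** 2 / ((vc / nc) ** 2 / (nc - 1) + (vb / nb) ** 2 / (nb - 1))
        return t, round(df)

    print(welch_t([10.2, 9.8, 10.5, 10.1], [11.0, 11.4, 10.9, 11.2]))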


One-Way Analysis of Variance (ANOVA)

Description:

Analysis of variance (ANOVA) is the name given to a variety of similar statistical

procedures. These similar procedures all compare the means or median values of

different groups of observations to determine if a statistical difference exists among

groups. The procedure is an interwell procedure that can be used to compare compliance

well data to background well data. Two types of analysis of variance are presented:

parametric and nonparametric one-way analysis of variance. Both methods are

appropriate when the only factor of concern is the spatial variability of constituent

measurements in a given sampling period. For statistically meaningful results, at least

three observations should be present in each well. Prior to statistical analysis, the

assumption of data independence should be considered. A specified rigorous field

sampling protocol should be followed.

Parametric ANOVA

Assumptions:

The hypothesis tests with parametric ANOVA assume that errors (residuals) are normally

distributed with equal variances across all wells and a single detection limit is used for

the analyte of interest. The normal distribution can be checked by testing the distribution

of the residuals (the difference between the observations and the values predicted by the

ANOVA model). At least p > 2 groups (wells) are to be compared, and the total sample

size, N, should be large enough so that N - p > 5. Under CA standards, the minimum

sample size requirement is 4 samples per well. If the data normality assumption is not

met, then nonparametric ANOVA is performed.

Normality of Residuals:

The residuals are the differences between each observation and its predicted value. In the

case of one-way analysis of variance, the predicted value for each observation is the

group (well) mean. Thus the residuals, Rij, are given by:

R_{ij} = X_{ij} - \bar{X}_i

Where:

X_{ij} = the jth observation in the ith well; and
\bar{X}_i = the mean of the observations in the ith well.


Once the residuals have been computed, the Shapiro-Wilk test for normality (previously

described) is performed on the absolute values of the residuals. If the residuals are not

found to be normally distributed, the data are transformed and the normality test of the

residuals is repeated. If the residuals are not found to be transformed-normal,

nonparametric ANOVA is performed (subsequently described).

Equality of Variance Test:

Levene’s test for homogeneity of variance is performed as follows:

Compute the absolute values of the residuals from the ANOVA, treating each compliance

point well and the combined set of background wells as separate groups.

Compute the F-statistic for the ANOVA on the absolute residuals:

F = \frac{MS_{Between\;Groups}}{MS_{Within\;Groups}}

Where:

MS = Mean Squares

MS_{Between\;Groups} = \frac{SS_{Groups}}{p - 1}

and

MS_{Within\;Groups} = \frac{SS_{Error}}{N - p}

Where:

p = the number of groups;
N = the total sample size; and
SS = the Sum of Squares.

Sums of Squares are computed as follows:

SS_{total} = \sum_{i=1}^{p}\sum_{j=1}^{n_i} X_{ij}^2 - \frac{X_{..}^2}{N}

SS_{Stations} = \sum_{i=1}^{p} n_i\left(\bar{X}_{i.} - \bar{X}_{..}\right)^2 = \sum_{i=1}^{p} \frac{X_{i.}^2}{n_i} - \frac{X_{..}^2}{N}

and

SS_{Error} = SS_{total} - SS_{Stations}

Where:

X_{..} = the sum of all observations;
\bar{X}_{..} = the mean of all observations;
X_{i.} = the sum of the n_i observations in group i;
\bar{X}_{i.} = the mean of the observations in group i; and
n_i = the number of observations in group i.

If the calculated F-statistic exceeds the tabulated F-statistic (α = 0.05) for (p - 1) and (N - p) degrees of freedom found in Table 2 (Appendix B; U.S. EPA, April 1989), conclude that the variances among the groups are not equal. In this case, Sanitas will (by default) transform the original data and perform the equality of variance test again. If the F-statistic for the transformed data still exceeds the tabulated F-statistic, conclude that the variances among the groups are not equal and perform a nonparametric analysis of variance; if it is less than the tabulated F-statistic, conclude that the variances are equal and perform ANOVA on the transformed data. If the original calculated F-statistic does not exceed the tabulated F-statistic, conclude that the variances are equal and perform ANOVA on the original observations.

EXAMPLE:

Date         Well 1   Well 2   Well 3
1/3/1995     22.9     2.0      2.0
2/5/1995     3.09     1.25     109.4
4/5/1995     35.7     7.8      4.5
6/10/1995    4.18     52       2.5
Group mean   16.47    15.76    29.6

Example Data for Levene's Equality of Variance Test

Date           Well 1        Well 2        Well 3
               (residuals)   (residuals)   (residuals)
1/3/1995       6.43          13.76         27.6
2/5/1995       13.38         14.51         79.8
4/5/1995       19.23         7.96          25.1
6/10/1995      12.29         36.23         27.1
Group mean     12.83         18.12         39.9
Overall Mean   23.62

Table 8.1: Residuals of Data

SS_{wells} = \left[4(12.83)^2 + 4(18.12)^2 + 4(39.9)^2\right] - 12(23.62)^2 = 1646.7

SS_{total} = \left[(6.43)^2 + (13.38)^2 + \cdots + (27.1)^2\right] - 12(23.62)^2 = 4318.8

SS_{error} = 4318.8 - 1646.7 = 2672.1

F\text{-statistic} = \frac{1646.7 / 2}{2672.1 / 9} = \frac{823.3}{296.9} = 2.77

The critical value at the .05 α level is F(.95, 2, 9) = 4.26. Since the F-statistic of 2.77 is less than the critical point, the assumption of equal variance can be accepted.
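The same arithmetic, sketched in Python (illustrative, not the Sanitas implementation):

    def levene_f(groups):
        """F-statistic of a one-way ANOVA on the absolute residuals."""
        p = len(groups)
        N = sum(len(g) for g in groups)
        resid = [[abs(x - sum(g) / len(g)) for x in g] for g in groups]
        grand = sum(sum(g) for g in resid) / N
        ss_groups = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in resid)
        ss_total = sum((x - grand) ** 2 for g in resid for x in g)
        ss_error = ss_total - ss_groups
        return (ss_groups / (p - 1)) / (ss_error / (N - p))

    well1 = [22.9, 3.09, 35.7, 4.18]
    well2 = [2.0, 1.25, 7.8, 52.0]
    well3 = [2.0, 109.4, 4.5, 2.5]
    print(levene_f([well1, well2, well3]))   # ~2.77, vs F(.95, 2, 9) = 4.26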

Censored Data:

Censored data include data that are less than the detection limit. If a small proportion

(less than 15 percent) of the observations are less than the detection limit, these will be

replaced with one half of the method detection limit prior to running the analysis (Gilbert,

1987 and U.S. EPA, April 1989). If more than 15 percent of the data are less than the

detection limit, a nonparametric ANOVA is performed.

Parametric ANOVA Procedure:

When there is more than one compliance well but fewer than eleven, and all the

previously mentioned assumptions are met, parametric ANOVA will be performed as

follows (in the case of more than 10 compliance wells, interval analysis is recommended

in lieu of ANOVA):

An F-statistic is computed (as previously described in Levene’s test for homogeneity of

variance) on the well observations (instead of the absolute residuals). When the F-statistic

is found to be significant at the α = 0.05 level, a contrast test will be performed to

determine if any compliance well constituent concentration is significantly higher than

the background well constituent concentration. The ANOVA table is presented as

follows:

EXAMPLE 16:

Source of Variation     Sum of Squares   Degrees of Freedom   Mean Squares                      F
Between Groups          SS_Groups        p - 1                MS_Groups = SS_Groups / (p - 1)   F = MS_Groups / MS_error
Error (within Groups)   SS_error         N - p                MS_error = SS_error / (N - p)
Total                   SS_total         N - 1

ANOVA Table

Bonferroni t-statistic (used with 5 or fewer comparisons):

When the F-statistic is found to be statistically significant, a contrast test is recommended

to determine if the significant F-statistic is due to differences between background and

compliance wells. The Bonferroni t-statistic contrast test is recommended when five or

fewer comparisons are to be made (U.S. EPA, April 1989).

The mean(s), \bar{X}_b, from the background well(s) is (are) computed as follows:

\bar{X}_b = \frac{1}{n_b} \sum_{i=1}^{u} n_i \bar{X}_i

Where:

n_b = the total sample size from all u background groups;
\bar{X}_i = the mean of the concentrations from the ith background group;
n_i = the number of observations in the ith background group; and
u = the total number of background groups.

Compute the m differences between the average concentration from each compliance group, \bar{X}_{i.}, and the average of the background, \bar{X}_b:

\bar{X}_{i.} - \bar{X}_b, \qquad i = 1, \ldots, m

Where:

m = the number of compliance groups.

Compute the standard error, SE_i, of each difference as:

SE_i = \left[ MS_{error}\left(\frac{1}{n_b} + \frac{1}{n_i}\right) \right]^{1/2}

Where:

MS_{error} = determined from the ANOVA table (see above); and
n_i = the number of observations at group i.

The t-statistic is obtained from the Bonferroni t-table (Table 3, Appendix B; U.S. EPA, April 1989)

Where:

α = 0.05;
(N - p) = the degrees of freedom;
N = the total number of observations;
p = the total number of groups; and
m = the number of comparisons to be made.

Compute the critical values, D_i, for each compliance group i:

D_i = t \cdot SE_i

If the difference \bar{X}_{i.} - \bar{X}_b exceeds the critical value, D_i, then conclude that the ith compliance group has significantly higher constituent concentrations than the average background group(s). Otherwise, conclude that there is no statistically significant finding. This computation should be performed for each of the m compliance groups individually. The test is designed so that the overall experimentwise error is 5%.

When more than five group comparisons are to be made, the t-statistic used is:

t_{(N-p),\,0.99}

obtained from the Bonferroni t-table (Table 3, Appendix B; U.S. EPA, April 1989).

The above is based on one-sided comparisons. When a two-tailed comparison is indicated, Sanitas will use the t-statistic:

t = t_{(N-p),\,1-\frac{\alpha}{2m}}

A significant difference is indicated between background and compliance groups when the absolute value of the difference \bar{X}_{i.} - \bar{X}_b exceeds the critical value, D_i.

When California Standards are selected, the t-statistic used will be t_{(n-1),\,0.99}. If a modified alpha, α*, is computed, the t-statistic used will be t_{(n-1),\,1-\alpha^*}.

EXAMPLE:

Date                Well 1 (up)   Well 2 (down)   Well 3 (down)
1/3/1995            22.9          70              2.0
2/5/1995            3.09          82              20
4/5/1995            35.7          65              4.5
6/10/1995           4.18          52              2.5
Group mean          16.47         67.25           7.25
Group Sample Size   4             4               4

Example Data for Parametric ANOVA

Source of Variation     Sum of Squares   Degrees of Freedom   Mean Squares   F-Statistic
Between Wells           8351.8           2                    4175.9         26.39
Error (within wells)    1424.2           9                    158.2
Total                   9776.0           11

Table 8.2: ANOVA Table

\bar{X}_b = 16.47

\bar{X}_1 - \bar{X}_b = 67.25 - 16.47 = 50.78

\bar{X}_2 - \bar{X}_b = 7.25 - 16.47 = -9.22

SE_1 = SE_2 = \left[158.2\left(\frac{1}{4} + \frac{1}{4}\right)\right]^{1/2} = 8.89

t = t_{9,\,.975} = 2.262

D_1 = D_2 = 8.89 \times 2.262 = 20.12

For compliance Well 2, the difference 50.78 exceeds the critical value 20.12. Therefore, we can conclude that Well 2 has significantly higher constituent concentrations than background. For compliance Well 3, the difference -9.22 does not exceed the critical value of 20.12. Therefore, we can conclude that Well 3 does not have significantly higher constituent concentrations than background.
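A sketch of the contrast computation using the values from this example (MS_error from the ANOVA table and t from the Bonferroni t-table are passed in, not computed):

    import math

    def bonferroni_contrasts(bg_mean, comp_means, ms_error, n_b, n_i, t):
        """Compare each compliance mean to background (one-sided contrasts)."""
        out = []
        for mean in comp_means:
            se = math.sqrt(ms_error * (1.0 / n_b + 1.0 / n_i))
            d = t * se                       # critical value D_i
            diff = mean - bg_mean
            out.append((diff, d, diff > d))
        return out

    for diff, d, sig in bonferroni_contrasts(16.47, [67.25, 7.25],
                                             158.2, 4, 4, 2.262):
        print(f"difference {diff:7.2f}  critical {d:6.2f}  significant: {sig}")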

Nonparametric ANOVA

Description:

This statistical procedure is an interwell test that compares the median values of

background wells to the median values of compliance wells and determines if a

significant difference exists among the groups.

Assumptions:

The standard assumption in one-way nonparametric ANOVA is that the data from each

well come from the same continuous distribution, and therefore have the same median

concentrations of chemical constituents. For statistically valuable results, at least four

observations for each well should be used and the total sample size minus the number of

groups (wells) should be greater than four. Under California options, minimums of nine

observations per well are required. In addition, this ANOVA test does not require a

distribution that is normal.

Independence:

Prior to statistical analysis, the assumption of data independence should be considered. A

specified rigorous field sampling protocol should be followed.


Procedure:

The Kruskal-Wallis test procedure (see Control Chart-Seasonality test for method

description) is used to evaluate the data sets at the α = 0.05 significance level when there

are two or more wells being compared. This test is performed on the ranked values, and

the null hypothesis to be tested is:

H0: The populations from which the quarterly data sets have been drawn have

the same median concentrations.

The alternative hypothesis to be tested is:

HA: At least one population has a median larger or smaller than the

background population.

The calculated value, H (or H′ , if ties are present) is compared to the tabulated chi-

squared value with (k-1) degrees of freedom (U.S. EPA, April 1989) where k is the

number of groups. The null hypothesis is rejected if the calculated value exceeds the

tabulated critical value. Application of the Kruskal-Wallis test requires a minimum

sample size of four data points for each well.

Censored Data:

Censored data include data that are less than the detection limit. These data will be

replaced with one half of the method detection limit prior to running the analysis (U.S.

EPA, 1992).

Tolerance Limits

Description:

An alternative approach to analysis of variance (to determine whether there is statistically

significant evidence of an impact) is to use Tolerance Limits. A tolerance limit is

constructed from the data on unimpacted background wells (or in the case of intrawell

Tolerance Limits, from non-trending historical data – this discussion will focus on the

interwell case). The concentrations from compliance wells are then compared to the

upper limit of the tolerance interval. With the exception of pH, if the compliance

concentrations fall above the upper limit of the tolerance interval (Tolerance Limit), this

provides statistically significant evidence of a difference. For pH and other constituents

in which low values as well as high values may be indicative of a facility impact, the

lower limit of the tolerance interval is also used. Compliance concentrations that fall

outside the bounds of the tolerance interval provide evidence of a statistical difference.

Assumptions:

Tolerance Limits are most appropriate for use at facilities that do not exhibit high degrees

of spatial variation between background wells and compliance wells. In addition, for a

Parametric Tolerance Limit, the background data must be normally or transformed-normally distributed, with at least three observations, but preferably eight or more

observations.

Distribution:

The distribution of data is evaluated using the Shapiro-Wilk test for normality (see

Control Chart-Distribution for method description) for samples with 50 or fewer

observations. The Shapiro-Francia test is used for sample sizes greater than 50 (see

Control Chart-Distribution for method description). Parametric intervals with background

sample sizes over 50 are only applicable for interwell tests.

Parametric Tolerance Limit Procedure:

To construct the upper tolerance limit, the mean, \bar{X}, and the standard deviation, S, are calculated from the background data. The one-sided upper tolerance limit, TL, is constructed as follows:

TL = \bar{X} + KS

Where:

\bar{X} = the mean of the background observations;
K = the one-sided normal tolerance factor found in Table 5 (Appendix B; U.S. EPA, April 1989); and
S = the standard deviation of the background observations.

Each observation from the compliance wells is compared to the upper tolerance limit. If

any observation exceeds the tolerance limit, that is statistically significant evidence of an

impact. In the case of transformed-normal background data, the tolerance interval is

constructed on the transformed background data, and the transformed compliance well

observations are compared to the upper tolerance limit.

In the case of a two-tailed test, both an upper and a lower tolerance limit are constructed. The upper tolerance limit, UTL, is constructed as follows:

UTL = \bar{X} + KS

Where:

K = the two-tailed normal tolerance factor (Eisenhart, C., Hastay, M.W., and Wallis, W.A., 1947) for 95% (default for interwell) or 99% (default for intrawell) confidence and 95% coverage.

The lower tolerance limit, LTL, is constructed as follows:

LTL = \bar{X} - KS

Where:

K = the two-tailed normal tolerance factor (Eisenhart, C., Hastay, M.W., and Wallis, W.A., 1947) for the confidence level in use and 95% coverage.

EXAMPLE:

Well 1 (up)   Well 2 (up)   Well 3 (down)
4.2           7             7.6
3.5           3.4           9
5.6           6.7           6
5.6           4.6           7.2
6             5             4.3
4.3           5             5.4
2.5           4.2           6.3
5             6.3           5.2

Example Data for Parametric Tolerance Limit

X̄ = 4.931     s = 1.244     K = 2.52

TL = \bar{X} + Ks = 4.931 + (1.244 \times 2.52) = 8.07
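A minimal sketch of the one-sided construction (K must still come from the published tables; K = 2.52 here is the example's value for the sixteen background observations):

    import math

    def upper_tolerance_limit(background, k):
        """One-sided upper tolerance limit TL = mean + K * sd."""
        n = len(background)
        mean = sum(background) / n
        sd = math.sqrt(sum((x - mean) ** 2 for x in background) / (n - 1))
        return mean + k * sd

    bg = [4.2, 3.5, 5.6, 5.6, 6, 4.3, 2.5, 5,      # Well 1 (up)
          7, 3.4, 6.7, 4.6, 5, 5, 4.2, 6.3]        # Well 2 (up)
    print(upper_tolerance_limit(bg, 2.52))          # ~8.07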

Censored data:

If less than 15 percent of the background well observations are nondetects, these will be

replaced with one half of the method detection limit prior to running the analysis (U.S.

EPA, April 1989).

If more than 15 percent but less than 50 percent of the background data are less than the

detection limit, the data’s sample mean and sample standard deviation are adjusted

according to the method of Cohen or Aitchison (see Control Chart-Censored Data for

method description).

If more than 50 percent (by default) of the background data are below the detection limit,

or when the background data are not transformed-normal, a Nonparametric Tolerance

Limit will be constructed.

Nonparametric Tolerance Limit:

When the background data are not normal or transformed-normal, or greater than 50

percent of the background data are less than the detection limit, a nonparametric tolerance

limit is recommended. The highest value from the background data is used as the upper

tolerance limit. The achieved confidence and/or coverage rates depend entirely on the

background number, and coverage rates for various confidence levels will be

provided for each nonparametric tolerance limit. Fewer background samples will result

in less coverage at a specific false positive rate, or less confidence at a specific coverage.

For instance, given a background number of 18, the level of coverage achieved for the

99% confidence level is approximately 85%. The recommended coverage/confidence for


interwell tests is generally 95% coverage/95% confidence, and for intrawell is 95%

coverage/99% confidence.

Procedure:

When there is at least one detectable observation, the highest value for the background

data is used to set the upper limit of the tolerance interval. When all the data are

censored (i.e., nondetects or trace values) the behavior will depend on the user choice in

the Configure Sanitas window: the tolerance limit is either the most recent or highest

detection limit, or a “substitution” such as ½ of that value (again, depending on settings).

Coverage values for a given alpha have been shown to be at least the nth root of alpha. For example, if the desired confidence is 99% (alpha = 0.01) and n is 18, the coverage is the 18th root of 0.01, or 0.774.
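As a one-line check of this relationship:

    n, alpha = 18, 0.01
    print(alpha ** (1 / n))   # 0.774: the coverage achieved at 99% confidence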


Prediction Limits (or Intervals): EPA Standards

Description:

A prediction limit is used to determine whether a single observation is statistically

representative of a group of observations. It is a statistical interval calculated to include

one or more observations from the same population with a specified confidence. In

ground water monitoring, a prediction limit approach may be used to make comparisons

between background and compliance data. The interval is constructed from a

background set of observations such that it will contain K future compliance observations

with stated confidence. If any observation exceeds the bounds of the prediction limit, this

is statistically significant evidence that that observation is not representative of the

background group.

Assumptions:

The parametric prediction limit is constructed if the background data all follow a normal

or transformed-normal distribution. A minimum of four background values should be

used in constructing the interval. The estimate of the standard deviation (S) that is used

should be an unbiased estimator. The usual estimate assumes that there is only one source

of variation. If there are other sources of variation, such as time effects, or spatial

variation in the data used for the background, then the parametric Prediction Limit is

inappropriate. In these situations, a multivariate statistical procedure is suggested.

Distribution:

In order to determine whether a parametric or nonparametric prediction limit should be

used, the distribution of the data is evaluated by applying the Shapiro-Wilk or Shapiro-

Francia tests for normality to the raw data or, when applicable, to the ladder of powers

(Helsel & Hirsch, 1992) transformed data. The null hypothesis, Ho, to be tested is:

H0: The population has a normal (or transformed-normal) distribution.

The alternative hypothesis, HA, is:

HA: The population does not have a normal (or transformed-normal)

distribution.

Parametric Prediction Limits Procedure:

The mean, \bar{X}, and the standard deviation, S, are calculated for the raw or transformed

background data. The number of comparison observations, K, is specified to be included

in the interval. If K will be different from the default in Sanitas™, which assumes K=1 for each well, the number of observations, K, to be compared to the interval must be specified in advance (see Prediction Limit Setup…).

Then the interval is given by:

\left[0,\; \bar{X} + S\sqrt{\frac{1}{m} + \frac{1}{n}}\; t_{(n-1,\,K,\,1-\alpha)}\right]

Where:

m = 1 for K single observations;
n = the number of observations in the background data; and
t_{(n-1,\,K,\,1-\alpha)} is found in Table 3 (Appendix B; U.S. EPA, April 1989) with n-1 degrees of freedom, K comparison observations, and 1-α significance level.

K for intrawell tests is 1. The prediction limit is constructed to have a (1-(α/K))·100 percent

probability of containing each of the next K sampling observations if no change has

occurred from background conditions (or equivalently a probability of 1-α of containing

all K future observations when no change has occurred). If any of the K comparison

observations fall outside the bounds of the Prediction Limit, this is statistically significant

evidence that the comparison data are not representative of the background group of

observations.

In the case of interwell tests when K is less than 5, the t-value used in the above equation

differs under EPA and CA standards for interwell analyses but not for intrawell analyses.

For interwell tests under CA standards and intrawell tests under both EPA and CA

standards, the t-value used is consistent with a 1 percent α-level per individual

comparison observation. For interwell tests under EPA options, the α-level used to derive

the t-value is 5 percent divided by the number of comparison observations. This results in

different limits under EPA versus CA standards for interwell analyses when K is less

than 5.

EXAMPLE 20:

Well 1 (up)   Well 2 (up)   Well 3 (down)
104           94            112
124           102           95
109           86            87
116           105           114

Example Data for Parametric Prediction Limit

X̄ = 105     s = 11.89     t = 1.895

PL = \bar{X} + t\,s\sqrt{\frac{1}{m} + \frac{1}{n}} = 105 + 1.895 \times 11.89 \times \sqrt{\frac{1}{1} + \frac{1}{8}} = 128.9
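A sketch of the same computation (the t multiplier is passed in; here t(.95, 7) = 1.895 for the eight pooled background values):

    import math

    def prediction_limit(background, t_mult, m=1):
        """Upper parametric prediction limit: mean + t * sd * sqrt(1/m + 1/n)."""
        n = len(background)
        mean = sum(background) / n
        sd = math.sqrt(sum((x - mean) ** 2 for x in background) / (n - 1))
        return mean + t_mult * sd * math.sqrt(1.0 / m + 1.0 / n)

    bg = [104, 124, 109, 116, 94, 102, 86, 105]    # Wells 1 and 2 (up)
    print(prediction_limit(bg, 1.895))             # ~128.9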

For a two-tailed test, t_{(n-1,\,K,\,1-\alpha/2)} is substituted for t_{(n-1,\,K,\,1-\alpha)} in the above formula. Statistically significant evidence of an impact is noted when compliance observations fall outside the bounds of the upper and lower prediction limits.

When a modified alpha, α*, is computed, t_{(n-1,\,K,\,1-\alpha^*)} will be substituted for t_{(n-1,\,K,\,1-\alpha)} in the above formula.

Censored data:

If less than 15 percent of the background observations are nondetects, these will be

replaced with one half of the method detection limit prior to running the analysis (U.S.

EPA, April 1989).

If more than 15 percent but less than 50 percent of the background data are less than the

detection limit, the data’s sample mean and sample standard deviation are adjusted

according to the method of Cohen, Aitchison or Kaplan-Meier (see Control Charts for

method description).

If more than 50 percent of the background data are less than the detection limit, a

nonparametric prediction limit will be computed. Poisson-based prediction limits are

available as an alternative method when greater than 90 percent of the background data

are less than the detection limit.

Nonparametric Prediction Limits:

Distribution:

When the background data are not transformed-normal, or greater than 50 percent of the

background data are less than the detection limit, there is an option to construct a

nonparametric prediction limit. The highest value from the background data is used as

the upper limit of the prediction limit. Minimums of 19 background samples are required

for a 5% false positive rate when comparing a single compliance observation (k=1) to the

prediction limit. Fewer than the required minimum background sample size will result in

an inflated false positive rate that can be computed as (1-(n/(n+k))). Since the highest

background value is always used as the upper prediction limit, the actual significance

level decreases with increasing background sample size. In the case of a two-tailed test,

the lowest value from the background data is used to set the lower limit of the prediction

limit.

The false positive rate is based upon the formula:

\alpha = 1 - \frac{n}{n + k}

Where:

n = the background sample size; and

k = the number of future values being compared to the limit.
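For instance, checking the minimum-sample-size statement above:

    n, k = 19, 1
    print(1 - n / (n + k))   # 0.05, i.e. a 5% false positive rate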

Poisson-Based Prediction Limit Procedure:

When the background data contain greater than 90 percent observations below the

detection level, Sanitas offers the option to construct a Prediction Limit based upon the

Poisson distribution.

Distribution:

The Poisson distribution is a probability distribution used to model rare events. Under this model, a detectable observation is rare unless there is an impact.

The sum of the Poisson counts across background samples, Tn, is computed by adding

the number of parts per billion (ppb) across all observations for the background well(s).

Prior to any calculations, nondetects are set to one-half of the method detection limit

(MDL) and all trace values are evaluated as the average of the MDL and the practical

quantitation limit (PQL).

The 99% upper Poisson prediction limit is calculated as:

T_k = cT_n + \frac{cz^2}{2} + cz\sqrt{T_n\left(1 + \frac{1}{c}\right) + \frac{z^2}{4}}

Where:

c = k/n;
k = the number of future observations being compared to the limit;
n = the background sample size;
T_n = the sum of the Poisson counts of the background samples; and
z = the upper 99% point of the normal distribution.

The value k need not represent multiple samples from a single well. It could also denote a

collection of single samples from k distinct wells, all of which are assumed to follow the

same Poisson distribution in the absence of contamination.

To test the upper prediction limit, the Poisson count of the sum of the next k observations

from the downgradient well or the sum of the single observations from k distinct wells is

compared to the upper prediction limit. If this sum exceeds the prediction limit, there is

significant evidence of a downgradient impact. Should the exceedance occur for a sum of

observations from multiple wells, further investigation will be necessary to determine the

impacted well or wells.


EXAMPLE:

MW-1 (up)   MW-2 (up)   MW-3 (down)
<4          12          <4
<4          <4          6
<4          <4          <4
<4          <4          <4

Example Data for Poisson Prediction Limits

k = 1     n = 8     c = 1/8 = 0.125

T_n = 2 + 2 + 2 + 2 + 12 + 2 + 2 + 2 = 26

z_{.99} = 2.327

T_k = (0.125)(26) + \frac{(0.125)(2.327)^2}{2} + (0.125)(2.327)\sqrt{26\left(1 + \frac{1}{0.125}\right) + \frac{(2.327)^2}{4}} = 8.05
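A sketch of the limit computation, reproducing the example above (z = 2.327 for 99% confidence):

    import math

    def poisson_prediction_limit(t_n, n, k=1, z=2.327):
        """Upper Poisson prediction limit T_k."""
        c = k / n
        return (c * t_n + c * z**2 / 2
                + c * z * math.sqrt(t_n * (1 + 1 / c) + z**2 / 4))

    print(poisson_prediction_limit(26, 8))   # ~8.05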

Note: This test cannot be used for decimal values. When a Poisson analysis is attempted

on decimal data, Sanitas will advise you to change the units and to convert the

observations from parts-per-million to parts-per-billion (ppb) or ppb to parts-per-trillion.

Please note that units for all observations need to be consistent within a constituent.

Prediction Limits (or Intervals) with Retesting: EPA Unified Guidance

(UG) Standards

Description:

Prediction limits with retesting (referred to in this document as UG Prediction Limits) are

statistical intervals which include retesting strategies in order to achieve a low facility-

wide false positive rate while maintaining adequate statistical power to detect

contamination. The intervals are designed to contain K future sample(s) or sample

statistics (mean or median), with a specified probability, from a statistical population. If

any observation exceeds the prediction limit, this is statistically significant evidence that

the observation is not representative of the background group. While an overview of

these plans is provided in this section, the Unified Guidance provides detailed

explanations and recommendations for prediction limits with retesting.

Requirements:

Prior to constructing UG prediction limits, the user must select “Unified Guidance

Standards” under the Options menu. To specify the site configuration and resampling

plan, select Prediction Limit Set Up on the Analysis tab of the Configure Sanitas window.


Enter the number of statistical evaluation periods per year (nE), number of constituents

(c), and number of monitoring wells (w). The annual target facility-wide false positive

rate should be no greater than 10% (cumulative throughout the year). If a facility

samples semi-annually, for instance, the overall target rate is distributed evenly among

each sampling event for a 5% target rate (α = .10/2 = .05 = 5%). The individual test

alpha (α*) then equals the targeted per-event false positive rate divided by the total

number of statistical tests (r) in a given sampling event.

For example, a site which samples semi-annually for 15 constituents at 7 wells would

have the following per-test alpha levels:

Semi-annual target rate: α = .10/2 = .05 = 5%

Total # of tests: r = c × w = 15 × 7 = 105

Per-test alpha level: α* = α/r = .05/105 = .0004

Resample Plans:

Complete the site configuration by specifying whether prediction limits will be

constructed based on future observations, means of order 2, or means of order 3. If

prediction limits will be constructed for future observations, a resample program must be

selected (1 of 2, 1 of 3, 1 of 4, or 2 of 4 Modified CA Plan). The first number in each of

the plans indicates how many resamples must pass the prediction limit in order to declare

an initial exceedance a false finding. The second number indicates the “total” number of

samples required (i.e. the initial sample plus all resamples). When the resample is within

its predicted limit, it should replace the exceeded value in any future statistical analyses.

For instance, the 1 of 3 plan means that when an initial exceedance is noted, two

resamples are collected and one of them must pass the limit in order to declare the initial

exceedance a false finding. The exceedance would then be retained in the data file, but

assigned a user-specified flag so that it may be easily deselected in future statistical

analyses.

The “means of order 2 and 3” resample programs require 4 or 6 independent

measurements from each well. For instance, the “means of order 2” requires collection of

two samples so that the mean may be calculated and compared to a background limit. If

the mean exceeds the prediction limit, two additional samples are averaged and compared

to the limit.

Assumptions:

The parametric prediction limit is constructed if the background data follow a normal or

transformed-normal distribution. A minimum of four background values are required to

construct the interval, however, generally eight or more background samples are

recommended. The estimate of the standard deviation (S) that is used should be an

unbiased estimator. The usual estimate assumes that there is only one source of variation.

If there are other sources of variation, such as time effects, or spatial variation in the data

used for the background, then the parametric prediction limit is inappropriate. In these


situations, a multivariate statistical procedure is suggested. For more information see the

Unified Guidance and/or consult with a professional statistician.

Distribution:

In order to determine whether a parametric or nonparametric prediction limit should be

used, the distribution of the data is evaluated by applying the Shapiro-Wilk or Shapiro-

Francia tests for normality to the raw data or, when applicable, to the ladder of powers

(Helsel & Hirsch, 1992) transformed data. The null hypothesis, Ho, to be tested is:

H0: The population has a normal (or transformed-normal) distribution.

The alternative hypothesis, HA, is:

HA: The population does not have a normal (or transformed-normal)

distribution.

UG Parametric Prediction Limits Procedure:

The mean, \bar{X}, and the standard deviation, S, are calculated for the raw or transformed

background data. The per-evaluation facility-wide false positive rate is determined as

described above based on an annual target rate of .10 (αE = α/nE). The number of

statistical comparisons (r) for each evaluation period (r = the number of wells (w) times

the number of constituents (c) to be sampled at each well) is computed based on user

input. By default, the number of future samples to be compared against the prediction

limit equals one for each well.

Compute the upper prediction limit using kappa multiplier values (depending on the type of prediction limit, resample program, and per-evaluation alpha level). The interval is given by:

PL = \bar{X} + \kappa S

Where:

\bar{X} = the average of the background;
κ = the multiplier from Tables 19-1 through 19-18 (EPA Unified Guidance, September 2009); and
S = the standard deviation of the background.

EXAMPLE:

Background Values: 240, 220, 240, 220, 210, 200, 220, 220, 240, 230, 240, 230

Compliance Value: 230

Example Data for Intrawell Parametric Prediction Limit

X̄ = 225.8     s = 13.1     κ = 2.49*

PL = \bar{X} + \kappa s = 225.8 + (2.49 \times 13.1) = 258.42

*The kappa multiplier value was based on the Intrawell Parametric Prediction Limit and the 1 of 2 Plan. The site configuration included 10 constituents (c) and 5 wells (w) analyzed semi-annually.
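A sketch combining the per-test alpha bookkeeping with the limit itself (the kappa lookup in the UG tables is assumed to have been done already; here the example's kappa is simply passed in):

    import math

    def per_test_alpha(n_eval, n_const, n_wells, annual_target=0.10):
        """Annual target rate split across evaluation periods and tests."""
        return (annual_target / n_eval) / (n_const * n_wells)

    def ug_prediction_limit(background, kappa):
        """UG parametric prediction limit: mean + kappa * sd."""
        n = len(background)
        mean = sum(background) / n
        sd = math.sqrt(sum((x - mean) ** 2 for x in background) / (n - 1))
        return mean + kappa * sd

    print(per_test_alpha(2, 10, 5))          # site configuration from the example
    bg = [240, 220, 240, 220, 210, 200, 220, 220, 240, 230, 240, 230]
    print(ug_prediction_limit(bg, 2.49))     # ~258.4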

Censored data:

If less than 15 percent of the background observations are nondetects, these will be

replaced with one half of the method detection limit prior to running the analysis.

If more than 15 percent but less than 50 percent of the background data are less than the

detection limit, the data’s sample mean and sample standard deviation are adjusted

according to the method of Cohen or Aitchison (see Control Charts for method

description).

If more than 50 percent of the background data are less than the detection limit, a

nonparametric prediction limit will be computed.

Nonparametric Prediction Limits:

Distribution:

When the background data are not transformed-normal, or greater than 50 percent of the

background data are less than the detection limit, there is an option to construct a

nonparametric prediction limit. The highest or second highest value from the background

data may be specified in the prediction limit set-up window and used as the upper limit of

the prediction limit. The alpha level for each test is based on the background number (n)

and the number of wells (w), and may be obtained from Tables 19-19 through 19-24 of

the Unified Guidance.

Using ANOVA to Improve Parametric Intrawell Prediction Limits


When using intrawell tests, parametric tests are generally preferred over nonparametric

tests because the individual test false positive rate is “fixed” at 1% or 5%, for example,

prior to construction of the statistical limits. The false positive rate associated with

nonparametric tests, however, is dependent upon the number of background samples

available (for instance, 19 background samples are required to achieve a 5% false

positive rate). When limited background data are available, individual tests result in poor

statistical power to detect when contamination is present, as well as an unacceptably high

false positive rate.

The Unified Guidance provides an alternative method for nonparametric limits that was

first suggested by Davis (1998). This method increases the degrees of freedom of an

individual test by using results from the one-way ANOVA from a number of wells to

provide an alternate estimate of the average intrawell variance. For a parametric

intrawell prediction limit, the well-specific mean (\bar{X}) is computed based on the intrawell

background sample size of n. The root mean squared error (RMSE) component of the

ANOVA test is used to replace the intrawell standard deviation (s). This raises the

degrees of freedom from (n-1) to (N-p), where N is the total sample size across the group

of wells and p is the total number of wells.

Assumptions:

The ANOVA method requires within-well variability to be approximately the same for

all wells, and that any transformations applied to data in order to fit data to a normal

distribution be appropriate for and applied to all wells. The F-test provided in the

ANOVA test may be used to determine whether variability is similar among wells.

When the calculated F statistic of the ANOVA exceeds the tabulated F statistic, evidence

suggests variability is not similar among wells, and therefore, this method is not

recommended.

Procedure:

Select EPA standards* (Options/EPA Standards), choose a constituent, and perform the

one-way ANOVA with all upgradient and downgradient wells selected. A resulting

parametric ANOVA is required in order to continue with the alternate method. When a

nonparametric ANOVA results, this may be an indication that variability for that

constituent is not similar among wells. Using results obtained from the parametric

ANOVA, note any data transformation that was applied to all wells in order to pass the

test of normality and/or equal variances test. The ANOVA will provide the degrees of

freedom and the RMSE (which may be found on the ANOVA table under the Mean

Squares Error Within Wells). Under Configure Sanitas/Prediction Limit tab, enter the

RMSE into the box titled “Override Standard Deviation” and the degrees of freedom in

the box titled “Override Degrees of Freedom”. These numbers are then substituted into

the prediction limit equation.


*In a future version, an override for the kappa for use with retesting (and so UG Standards) may be available. For now,

consistent with the example in the 2008 Draft Unified Guidance, this method is recommended for use with “1 of 1”

plans, which use a t value instead of kappa. In order to use the t instead of kappa multipliers, you must be in EPA

Standards.

Example:

Using results obtained from the ANOVA, assume that the log transformation was applied to the iron data below. This example demonstrates how the statistical limits differ from the unadjusted parametric prediction limit when the method is applied. Compute the unadjusted parametric intrawell limit (as described above in the Prediction Limit section) utilizing a 99% confidence level, background of n = 4 (i.e. the background sample size of individual wells), and t_{1-\alpha,\,n-1} = t_{.99,\,3} = 4.541 in the following equation:

PL = \exp\left(\bar{y} + t_{1-\alpha,\,n-1}\; s_y \sqrt{1 + \frac{1}{n}}\right)

Unadjusted 99% Prediction Limits for Iron (ppm)

             Well 1   Well 2   Well 3   Well 4   Well 5   Well 6
Log-mean     3.820    3.965    4.348    4.188    4.802    5.000
Log-SD       0.296    0.395    0.658    0.453    0.704    0.396
n            4        4        4        4        4        4
t(.99, 3)    4.541    4.541    4.541    4.541    4.541    4.541
99% PL       204.9    391.6    2183.0   657.0    4341.5   1108.1

The ANOVA test provides the adjusted degrees of freedom as p(n-1) = 6(3) = 18 and the RMSE as .5079, and the following substitutions are made:

Adjusted 99% Prediction Limits for Iron (ppm)

             Well 1   Well 2   Well 3   Well 4   Well 5   Well 6
Log-mean     3.820    3.965    4.348    4.188    4.802    5.000
RMSE         0.508    0.508    0.508    0.508    0.508    0.508
df           18       18       18       18       18       18
t(.99, 18)   2.552    2.552    2.552    2.552    2.552    2.552
99% PL       194.3    224.6    329.4    280.7    518.7    632.3


When comparing the unadjusted versus the adjusted intrawell limits, note that the

adjusted statistical limits are considerably lower. By estimating the standard deviation

from all wells using the ANOVA, the adjusted limits result in lower and more powerful

statistical limits.
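A sketch of the substitution (well-specific log-mean, pooled RMSE, adjusted degrees of freedom; t(.99, 18) = 2.552):

    import math

    def adjusted_intrawell_pl(log_mean, rmse, n, t_mult):
        """Intrawell PL with the pooled RMSE replacing the well-specific SD."""
        return math.exp(log_mean + t_mult * rmse * math.sqrt(1 + 1 / n))

    # Well 1 from the tables above: log-mean 3.820, RMSE 0.5079, n = 4
    print(adjusted_intrawell_pl(3.820, 0.5079, 4, 2.552))   # ~194.3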

Interwell VOC Screening

Description:

Note 1: this functionality may also be used to run a simple "Intrawell" screening, in which

detected values are reported for selected constituents and wells on the selected dates. The

remainder of this section will deal with the Interwell method, also known as the “California

Non-Statistical Analysis”.

Note 2: constituents can be automatically selected/deselected in this window based on the file

<sanitas>\util\not_VOC.txt. This file is editable, and contains instructions for its use. The

operation is initiated through the Selections>> button in the lower-right-hand corner of the

View window (the button is visible when MULTIPLE CONSTITUENTS>> is the selected

constituent).

The California Non-Statistical Analysis method is an interwell or intrawell test that may

be used to analyze constituents that have less than ten percent detectable observations. A

separate variant of this test is used for qualifying constituents of concern (COCs).

Regardless of the test variant used, the method involves evaluating whether downgradient

constituent values meet either of the test’s two possible triggering conditions.



Assumption:

The background samples have less than ten percent detectable values for the given

parameters. This assumption is automatically enforced in the case of interwell analysis.

The intrawell case is more flexible, but requires the user to specify which constituent/well

pairs will be analyzed. For CA intrawell use, it is recommended that the View be

restricted to those Constituent/Well pairs containing <10% detects (for example,

Selections->Uncheck All, and then Selections->Check Where->Constituent/Well Pair->Is

Detect->Less than 10%) and then can be further restricted by removing cases that will be

analyzed statistically or via the interwell non-statistical approach. This View (which can

be saved specifically for this purpose) is then used to control the data included in

subsequent intrawell VOC analyses.

Procedure:

In the interwell case, the background well observations are checked to determine which

VOCs have less than ten percent detectable values, i.e. are eligible for the Non-Statistical

test. VOCs that have greater than or equal to ten percent detectable values must be

analyzed with a statistical analysis and are referred to as “orphans”.

Of the VOCs that are eligible for a non-statistical analysis (or for all selected constituents

and wells in the intrawell case) the compliance data are checked for the presence of either

three* VOCs exceeding their method detection limit or one VOC exceeding its practical

quantitation limit.

When either of the two possible triggering conditions has been met, VOC contamination

is suspected and a verification retest is indicated (see Verification Retest Procedure

section).

*This value can be user-adjusted in the .ini file.

Verification Retest Procedure – California

The following verification procedure is intended to meet the special performance

standards under Subsection 2550.7(e)(8)(E) in addition to the statistical performance

standards under Subsection 2550.7(e)(9) for detection monitoring.

The proposed verification procedure consists of discrete retests, in which rejection of the

null hypothesis for any one of the retests will be considered confirmation of significant

evidence of an impact. The discrete retest consists of collecting two new suites of

samples for the constituent(s) exceeding the concentration limit from the indicating

monitoring points.

The statistical test method used to evaluate the retest results will be the same as the

method used in the initial statistical comparison. For the original indication to be ignored,

both new analyses must contradict the original indication.

In the case of a Non-Statistical VOC analysis retest, two discrete samples are taken from

the suspected well(s) and a VOC suite chemical analysis is performed to identify

detectable constituents. The same triggering conditions hold for the retest as for the


original test; however, the parameters triggering a significant finding may be different

than those triggering the original indication.

Intrawell ASTM Approach (ASTM Standards Only)

This intrawell approach to detection monitoring is described in the Standard Guide for

Developing Appropriate Statistical Approaches for Ground-Water Detection Monitoring

Programs D 6312-98.

Censored Data:

If less than 75 percent of the observations are nondetects, an Intrawell Shewhart-CUSUM

Control Chart will be used. All nondetects will be replaced with the quantification limit

prior to running the analysis. If there are multiple detection limits, the median

quantification limit will be used.

If more than 75 percent but less than 100 percent of the data are less than the detection

limit, an Intrawell Poisson Prediction limit will be computed unless a sufficient number

of data points are available to compute an Intrawell Nonparametric Prediction limit that

will provide 99% confidence.

If 100 percent of the data are less than the detection limit, a Nonparametric Prediction

Limit or a Poisson Prediction Limit will be computed, depending on user selection.

Distribution:

If less than 75 percent of the observations are nondetects, the distribution of the data is

evaluated by applying the Shapiro-Wilk or Shapiro-Francia test for normality to the raw

data or, when applicable, to the transformed data. For a description of both the Shapiro-

Wilk and Shapiro-Francia tests please see the Distribution subsection of the Control

Chart Section.

If the distribution of the data is not found to be Normal, you can continue to run a

Shewhart-CUSUM Control Chart in ASTM Standards.

Seasonality:

Prior to constructing the Control Charts, the significance of data seasonality is evaluated

using the nonparametric Kruskal-Wallis test (U.S. EPA, April 1989). For a description,

please see earlier subsection on Seasonality under the Control Chart section.

When seasonality is known to exist, the data are deseasonalized prior to constructing

Control Charts in order to take into account seasonal variation rather than mistaking

seasonal effects for evidence of contamination. The data are deseasonalized using the

method described by EPA (U.S. EPA, April 1989). For a description, please see earlier

subsection on “Correcting for Seasonality” under the Control Chart Section.


Outliers:

To remove the possibility of either a high or low outlier in the historical data set, the

historical data are screened for the existence of outliers. See subsection “Outlier

Procedure” under the Descriptive Statistics Section for a method description. Note that if

the user has manually flagged values with an "O" (or "o") then the outlier test will not be

run, and the manually flagged outliers will instead be treated as confirmed outliers.

Existing Trends:

Prior to constructing a control chart, the background data are tested for the existence of

trends. If any trend exists (positive or negative) Sanitas will not run a control chart. The

ASTM Provisional Standards restrict trend testing to increasing trends. Sanitas tests for

both increasing and decreasing trends to prevent the possibility of a significant trend

confusing the statistical results. Both increasing and decreasing trends may lead to

inflated control limits. The provisional ASTM standards state that when significant

trends in background are present and these trends are not due to an impact, an

alternative indicator constituent may be required for that well or all wells at the facility.

The Mann-Kendall test is used to test for significant trends in the background data. For a

method description please see the “Trend Analysis” subsection of the Evaluation

Monitoring Section.

Control Chart Procedure:

This procedure for construction of the Shewhart-CUSUM Control Chart follows the

ASTM recommendations (1996). The Shewhart-CUSUM Control Chart requires a

minimum of eight historical data points in order to reliably determine the mean and

standard deviation for each constituent’s concentration in a given well.

Three parameters are selected by the system prior to plotting:

h = the control limit to which the cumulative sum values (CUSUM) are

compared. ASTM (1996) recommends the value h = 4.5 units of

standard deviation for a background n < 12. When the background n >

12 the h is adjusted to = 4.0.

SCL = the upper Shewhart Control Limit to which the standardized mean

will be compared. ASTM (1996) recommends a value of SCL = 4.5

when background n < 12. When the background n > 12 ASTM

recommends SCL = 4.0.

c = a parameter related to the displacement that should be quickly

detected. ASTM (1996) recommends c = 1 for background n < 12. For

background n > 12, ASTM recommends c = 0.75.

The Shewhart CUSUM Control Chart is constructed as the method description describes

in the “Control Chart Procedure” section.

The results are plotted in their original metric units rather than standard deviation units.

For background sample sizes less than 12:

h = SCL = \bar{X} + 4.5s

For background sample sizes greater than or equal to 12:

h = SCL = \bar{X} + 4.0s

and the S_i are converted to the metric concentration by the transformation:

S_i \cdot s + \bar{X}
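A sketch of the parameter selection and the conversion back to concentration units (values as quoted from ASTM (1996) above):

    def astm_chart_params(n):
        """Return (h, SCL, c): 4.5/4.5/1 for n < 12, else 4.0/4.0/0.75."""
        return (4.5, 4.5, 1.0) if n < 12 else (4.0, 4.0, 0.75)

    def cusum_to_metric(si_values, mean, sd):
        """Convert standardized CUSUM values S_i back to metric units."""
        return [s * sd + mean for s in si_values]

    print(astm_chart_params(8))                              # (4.5, 4.5, 1.0)
    print(cusum_to_metric([0, 0.261, 1.121], 5.825, 3.267))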

Censored Data:

If less than 75 percent of the background data are less than the quantification limit, the

data’s sample mean and standard deviation are adjusted according to the method of

Cohen or Aitchison. Please see previous section for a description of Cohen’s and

Aitchison’s adjustment.

If more than 75 percent of the background data are less than the quantification limit, a

nonparametric prediction limit will be computed. As an option to the nonparametric

prediction limit, a Poisson-based prediction limit may be computed.

Figure 8.1: Intrawell ASTM Standards

Figure 8.2: Intrawell ASTM Standards (Cont'd)

Figure 8.3: Intrawell ASTM Standards (Cont'd)

Interwell ASTM Approach (ASTM Standards Only)

This Interwell approach to detection monitoring is described in the Standard Guide for

Developing Appropriate Statistical Approaches for Ground-Water Detection Monitoring

Programs D 6312-98.

Distribution:

The distribution of the data is evaluated by applying the multiple group version of the

Shapiro-Wilk test for normality to the raw data or, when applicable, to the log

transformed data.

The null hypothesis, H0, to be tested is:

H0: The population has a normal (or transformed-normal) distribution.

The alternative hypothesis, HA, is:

HA: The population does not have a normal (or transformed-normal)

distribution.

Multiple Group Version Shapiro-Wilk Procedure:

The multiple group version of the Shapiro-Wilk test takes into consideration that

upgradient measurements are nested within different upgradient monitoring wells.

First, calculate the Shapiro-Wilk W-statistic (see prior section for method description) for

each compliance well and denote as Wi. Calculation of the multiple group version of the

Shapiro-Wilk G-statistic to test the null hypothesis is presented in detail in Technometrics, 10 (Wilk, Shapiro, 1968).

For sample sizes N_i greater than or equal to seven, calculate G_i for each well. G_i is the percentage point of the standard normal distribution corresponding to α_i. Under the null hypothesis, the quantities G_1, ..., G_K may be considered to be a random sample from a standard normal distribution:

G_i = \gamma + \delta \ln\left(\frac{W_i - \varepsilon}{1 - W_i}\right)

Where the values γ, δ, ε are given in the Shapiro-Wilk (1968) table.

For sample sizes between three and six, use the value for G_i obtained from Table 2 of Shapiro-Wilk (1968) by linear interpolation on the tabulated quantities:

u_i = \ln\left(\frac{W_i - \varepsilon}{1 - W_i}\right)

Then, compute G, the normalized value of the G_i:

G = \frac{1}{\sqrt{K}}\left(G_1 + G_2 + \cdots + G_K\right)

Where:

K = the number of wells.

Refer the normalized mean, G, to a standard table of the normal integral. If the probability of G is greater than .01, accept the null hypothesis that the population has a normal (or transformed-normal) distribution.

Outliers:

To remove the possibility of either a high or low outlier in the historical data set, the

historical data are screened for the existence of outliers. See subsection “Outlier

Procedure” under the Descriptive Statistics Section for a method description. Note that if

the user has manually flagged values with an "O" (or "o") then the outlier test will not be

run, and the manually flagged outliers will instead be treated as confirmed outliers.

Censored data:

If less than 50 percent of the background data are less than the detection limit, the data’s

sample mean and sample standard deviation are adjusted according to the method of

Aitchison or Cohen. The use of Cohen’s or Aitchison’s adjustment is a user-selected

option. The user has the choice to select between these two approaches for adjusting non-

detects. The U.S. EPA (1992) provides a useful approach to help select which method to

use.

If more than 50 percent of the background data are less than the detection limit, a

nonparametric prediction limit will be computed. As an option to the nonparametric

prediction limit, a poisson-based prediction limit may be computed.

Parametric Prediction Limit Procedure:

The mean, \bar{X}, and the standard deviation, S, are calculated for the raw or transformed background data. Then the interval is given by:

\bar{X} + t_{(n-1,\,\alpha)}\, S\, \sqrt{1 + \frac{1}{n}}

if the data are normal, and by:

\exp\left(\bar{y} + t_{(n-1,\,\alpha)}\, s_y \sqrt{1 + \frac{1}{n}}\right)

if the data are found to be lognormal.

Where:

α = the false positive rate for each individual test;
n = the number of observations in the background data; and
t_{(n-1,\,\alpha)} = the one-sided (1-α) upper percentage point of Student's t-distribution on n-1 degrees of freedom.

Select α as the minimum of 0.01 or one of the following:

1) Pass the first or one of one verification resamples:

\alpha = \left(1 - 0.95^{1/k}\right)^{1/2}

2) Pass the first or one of two verification resamples:

\alpha = \left(1 - 0.95^{1/k}\right)^{1/3}

3) Pass the first or one of three verification resamples:

\alpha = \left(1 - 0.95^{1/k}\right)^{1/4}

Where:

k = the number of comparisons (monitoring wells times constituents).
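These expressions reflect that a confirmed exceedance requires the initial sample and every resample to fail, so the per-test rate is the corresponding root of the per-comparison rate. A quick illustrative check (k = 105 comparisons is an assumed figure):

    for resamples in (1, 2, 3):
        raw = (1 - 0.95 ** (1.0 / 105)) ** (1.0 / (resamples + 1))
        print(resamples, round(raw, 4), min(0.01, raw))   # capped at 0.01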

For a two-tailed test, t(n-1, α/2) is substituted for t(n-1, α) in the above formula. Statistically significant evidence of an impact is noted when compliance observations fall outside the bounds of the upper or the lower prediction limits.

When a modified alpha, α*, is computed, t(n-1, K, 1-α*) will be substituted for t(n-1, K, 1-α) in the above formula.
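As a sketch of how these pieces fit together, the fragment below computes a per-test α for the first two resampling plans (reading the "minimum of 0.01" rule literally as a cap) and the corresponding one-sided upper prediction limit; the modified-α* plan and its multivariate-t limit are not implemented here.

```python
import math
from scipy import stats

def retest_alpha(k, resamples=1):
    """Per-test alpha when a well passes the first sample or one of
    `resamples` verification resamples (plans 1 and 2 above).
    k = number of comparisons (wells times constituents)."""
    alpha = (1.0 - 0.95 ** (1.0 / k)) ** (1.0 / (resamples + 1))
    return min(alpha, 0.01)              # 'minimum of 0.01' read as a cap

def parametric_prediction_limit(background, alpha, lognormal=False):
    """Upper limit X-bar + t(n-1, alpha) * S * sqrt(1 + 1/n), optionally
    on log-transformed data with the result back-transformed."""
    data = [math.log(v) for v in background] if lognormal else list(background)
    n = len(data)
    mean = sum(data) / n
    s = math.sqrt(sum((v - mean) ** 2 for v in data) / (n - 1))
    t = stats.t.ppf(1.0 - alpha, n - 1)  # one-sided upper t percentage point
    limit = mean + t * s * math.sqrt(1.0 + 1.0 / n)
    return math.exp(limit) if lognormal else limit
```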

Nonparametric Prediction Limit Procedure:

When the background data are neither normal nor transformed-normal, or contain greater than 50 percent

of the observations below the detection limit, Sanitas will automatically construct a

nonparametric prediction limit. The highest value from the background data is used to set

the upper limit of the prediction limit. In the case of a two-tailed test, the lowest value

from the background data is used to set the lower limit of the prediction limit. If the

background data contain 100 percent non-detects, the prediction limit is equal to the

median quantification limit. The false positive rate is based upon the background sample

size and the number of compliance points being compared to the limit. The site-wide

false positive rate, γ, is given in Table 2 (Gibbons, R.D., 1991). The minimum sample

size for a false positive rate equal to 1 percent for a single well and one resample is 13.

Poisson-Based Prediction Limit Procedure:

When the background data contain greater than 50 percent observations below the

detection level, you may choose to construct a prediction limit based upon the Poisson

distribution. Poisson prediction limits will be utilized for those cases in which there are

too few background measurements to achieve an adequate site-wide false positive rate

using the nonparametric approach.

Distribution:

The Poisson distribution is a probability model for rare events. Under this model, a detectable observation is a rare event unless there is an impact.

Procedure:

The sum of the Poisson counts across background samples, y, is computed by adding the

number of parts per billion (ppb) across all observations for the background well(s). Prior

to any calculations, nondetects are set to the median method detection limit (MDL) and

all trace values are evaluated as the median practical quantitation limit (PQL).

The 99% upper Poisson prediction limit is calculated as:

$$\frac{y}{n} + \frac{z^2}{2n} + \frac{z}{n}\sqrt{y\left(1 + \frac{1}{n}\right) + \frac{z^2}{4}}$$

Where:

y = the sum of the detected measurements, or of the quantification limits for those samples in which the constituent was not detected;
n = the background sample size; and
z = the (1-α)·100th upper percentage point of the normal distribution (where α is computed as in the section on parametric prediction limits).
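As a minimal sketch, the limit follows directly from the formula; the only assumption is that the background values have already been prepared as whole-number ppb counts, with nondetects at the median MDL and trace values at the median PQL.

```python
import math
from scipy import stats

def poisson_prediction_limit(counts, alpha=0.01):
    """Upper Poisson prediction limit (99% by default) per the formula above."""
    y = sum(counts)                     # sum of Poisson counts across background
    n = len(counts)                     # background sample size
    z = stats.norm.ppf(1.0 - alpha)     # (1 - alpha)*100 upper normal point
    return (y / n + z ** 2 / (2 * n)
            + (z / n) * math.sqrt(y * (1 + 1.0 / n) + z ** 2 / 4))
```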

Note: This test cannot be used for decimal values. When a Poisson analysis is attempted

on decimal data, Sanitas will advise you to change the units and to convert the

observations from parts-per-million to parts-per-billion (ppb) or ppb to parts-per-trillion

by multiplying them by 1000. For example, 0.001 ppm should be converted to 1 ppb in

the data spreadsheet, or by using Alternate Values in the View. If you are editing the

data file, please note that units for all observations need to be consistent within a

constituent.

Transform Data

While viewing a single constituent in the View window, you can temporarily correct for

inappropriate units by right-clicking in the Examine Observations Panel and choosing

Alternate Values/Transform Alternate Values/Multiply… from the context menu.

For example, 0.001 ppm should be converted to 1 ppb. In this case, you would multiply

by 1000. The transformed data will be displayed in the Alternate Value column, and

may be used in the analysis by selecting the radio button next to that column header. This

provides transformed data in the View, but does not directly affect the original data file

(nor does it affect the units description, which is a disadvantage of this method).

Figure 8.4: Interwell ASTM Standards

Figure 8.5: Interwell ASTM Standards (Cont'd)

Evaluation Monitoring Statistics

Trend Analysis

Description and Procedure:

A trend is the general increase or decrease in observed values of some random variable

over time. A trend analysis can be used to determine the significance of an apparent trend

and to estimate the magnitude of that trend. The Mann-Kendall test for temporal trend

(Hollander & Wolfe, 1973) and Sen’s slope estimate (Gilbert, 1987) were chosen for the

site evaluation (or assessment) monitoring program to evaluate the correlation of selected

constituent concentrations with time.

The Mann-Kendall test is nonparametric, meaning that it does not depend on an

assumption of a particular underlying distribution. The test uses only the relative

magnitude of data rather than actual values. Therefore, missing values are allowed, and

values that are recorded as non-detects by the laboratory can still be used in the statistical

analysis by assigning values equal to half their detection limits (Gilbert, 1987).

The null hypothesis, H0, to be tested is:

H0: No significant trend of a constituent exists over time.

The alternative hypothesis, HA, is:

HA: A significant upward (or downward) trend of a constituent concentration

exists over time.

For groups having fewer than 41 data points, an exact test is performed. If 41 or more

data points are available, the normal approximation test is used (Gilbert, 1987).

- Exact Test (n <= 40):

The Mann-Kendall method assigns a positive or negative score based on the differences

between the data points. The first step is to list the data in the order in which they were

collected over time, and then determine the sign of all possible differences xj - xk, where

j > k:

$$\mathrm{sgn}(x_j - x_k) = \begin{cases} 1 & \text{if } x_j - x_k > 0 \\ 0 & \text{if } x_j - x_k = 0 \\ -1 & \text{if } x_j - x_k < 0 \end{cases}$$

Where:

xj = the value of the jth observation; and
xk = the value of the kth observation.

The Mann-Kendall statistic, S, is then computed, which is the number of positive

differences minus the number of negative differences.

$$S = \sum_{k=1}^{n-1} \sum_{j=k+1}^{n} \mathrm{sgn}(x_j - x_k)$$

Where:

n = the total number of observations.

If S (noted on the plot as the Mann-Kendall Statistic) is a large positive number,

measurements taken later in time tend to be larger than those taken earlier, i.e., an upward

trend. Similarly, if S is a large negative number, measurements taken later in time tend to

be smaller, i.e., a downward trend.

For a two-tailed test to detect either an upward or downward trend, the tabulated

probability level corresponding to the absolute value of S (Gilbert, 1987) is doubled and

H0 is rejected if that doubled value is less than or equal to the a priori α significance level

of the test. In other words, the Mann-Kendall Statistic (S) is compared to the Critical

value (or threshold for accepting H0) on the plot, and a trend is statistically significant if

the absolute value of S is greater than the tabulated Critical Value.

A minimum of 4 samples is required to perform the test. However, with a sample size of

only 4, the only meaningful information that can be obtained from this test is the value of

the Sen's slope, which gives the average rate of change in concentration over time. The

Mann-Kendall test for significance of trend will always indicate no trend at the 95%

confidence level since the largest possible value of the test statistic, S, with four data

points is 6, and that value is not significant at any available alpha level.

- Normal Approximation Test (n > 40):

The Mann-Kendall test statistic, S, is calculated using the same method as in the exact test. When there are no tied values, the variance of S, VAR(S), is computed:

$$VAR(S) = \frac{n(n-1)(2n+5)}{18}$$

S and VAR(S) are then used to compute the test statistic, Z, as follows:

$$Z = \begin{cases} \dfrac{S - 1}{\sqrt{VAR(S)}} & \text{if } S > 0 \\[4pt] 0 & \text{if } S = 0 \\[4pt] \dfrac{S + 1}{\sqrt{VAR(S)}} & \text{if } S < 0 \end{cases}$$

When tied values (data points having equal values) are present, the variance of S is

computed:

$$VAR(S) = \frac{1}{18}\left[n(n-1)(2n+5) - \sum_{p=1}^{g} t_p (t_p - 1)(2t_p + 5)\right]$$

Where:

g = the number of tied groups; and

tp = the number of observations in the pth group.

To test for an upward or a downward trend (a two-tailed test), a level of significance, α ,

must first be chosen. The level of significance is the probability of rejecting the null

hypothesis, (Ho) no trend, when no trend actually exists (Type I error). In general, α is

chosen to be 0.05. The split Type I error probability, or α / 2, for a two-tailed test is

then 0.025.

The Z-value associated with the 0.025 significance level is 1.96 (Table A-1; Hollander and Wolfe, 1973). At an α-level of 0.05, 95 percent (1-α) of the area under the normal curve lies between -1.96 and +1.96.

A positive or negative value of Z can indicate an upward or downward trend,

respectively. With an α -value of 0.05, any Z-value above 1.96 indicates a statistically

significant upward trend, and any value below -1.96 indicates a statistically significant

downward trend. In such cases, the H0 of no trend would be rejected. For Z-values that fall between -1.96 and 1.96, the null hypothesis cannot be rejected.

To reject H0, the probability corresponding to the Z-value must be less than the specified

α -value. The smaller the probability value, the greater the likelihood that a trend is

occurring and the greater the likelihood the constituent concentration (the dependent

variable) is an increasing or decreasing function of time.
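The normal approximation test condenses to a few lines of code. The sketch below is an illustration of the formulas above, applying the tied-group variance correction and the two-tailed decision rule.

```python
import math
from collections import Counter

def mann_kendall(x):
    """Return (S, VAR(S), Z) for a series x listed in collection order."""
    n = len(x)
    s = 0
    for k in range(n - 1):
        for j in range(k + 1, n):
            diff = x[j] - x[k]
            s += (diff > 0) - (diff < 0)       # sgn(x_j - x_k)
    ties = Counter(x).values()                 # sizes of the tied groups
    var_s = (n * (n - 1) * (2 * n + 5)
             - sum(t * (t - 1) * (2 * t + 5) for t in ties if t > 1)) / 18.0
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    return s, var_s, z

# Two-tailed test at alpha = 0.05: reject H0 (no trend) when abs(Z) > 1.96.
```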


Sen’s Slope Estimator

Description:

This simple nonparametric procedure was developed by Sen (1968) and presented in

Gilbert (1987) to estimate the true slope. The advantage of this method over linear

regression is that it is not greatly affected by gross data errors or outliers, and can be

computed when data are missing.

The N′ individual slope estimates, Q, are computed for each pair of sampling times:

$$Q = \frac{X_{i'} - X_i}{i' - i}$$

Where:

X_{i′} and X_i = the data values at times i′ and i (in days), respectively, with i′ > i; and
N′ = the number of data pairs for which i′ > i.

Sen’s Slope estimator is the median slope, obtained by ranking the N′ values of Q from

smallest to largest, and choosing the middle-ranked slope as follows.

$$\text{Sen's slope estimator} = \begin{cases} Q_{[(N'+1)/2]} & \text{if } N' \text{ is odd} \\[4pt] \tfrac{1}{2}\left(Q_{[N'/2]} + Q_{[(N'+2)/2]}\right) & \text{if } N' \text{ is even} \end{cases}$$

Where:

N′ = the total number of slope estimates, Q.

This value is multiplied by 365 to give the yearly slope value.


EXAMPLE:

Time Period:   1    1    1    2    3    3    4    5
Data:         10   22   21   30   22   30   40   40

Pairwise slope estimates, Q (NC = not computable because the time periods are equal):

From (1, 10):   NC   NC   +20   +6    +10   +10   +7.5
From (1, 22):        NC   +8    0     +4    +6    +4.5
From (1, 21):             +9    +.5   +4.5  +6.33 +4.75
From (2, 30):                   -8    0     +5    +3.33
From (3, 22):                         NC    +18   +9
From (3, 30):                               +10   +5
From (4, 40):                                     0

Example Data for Sen's Slope

N′ = 24

Q (slope) values ranked from smallest to largest:

-8, 0, 0, 0, 0.5, 3.33, 4, 4.5, 4.5, 4.75, 5, 5, 6, 6, 6.33, 7.5, 8, 9, 9, 10, 10, 10, 18, 20

The median of these Q values is the average of the 12th and 13th largest values, 5 and 6.

The Sen estimate of the true slope is 5.5.
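The example is easy to verify in code; this short sketch recomputes the pairwise slopes (skipping the NC pairs, where the time periods are equal) and takes their median.

```python
from statistics import median

def sens_slope(times, values):
    """Median of the pairwise slopes Q = (x_i' - x_i)/(i' - i), i' > i."""
    slopes = [(values[j] - values[i]) / (times[j] - times[i])
              for i in range(len(times) - 1)
              for j in range(i + 1, len(times))
              if times[j] != times[i]]         # skip NC pairs (equal times)
    return median(slopes)

times = [1, 1, 1, 2, 3, 3, 4, 5]
data = [10, 22, 21, 30, 22, 30, 40, 40]
print(sens_slope(times, data))                 # prints 5.5, matching the example
```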

Confidence Interval around Trend Line

The option exists to construct a non-parametric confidence interval or “band” around the

trend line, as described in the Unified Guidance.

Procedure:

Step 1. Given the original sample of n measurements, form a sample of n pairs (ti, xi),

where each pair consists of a sample date (ti) and the concentration measurement from

that date (xi).

Step 2. Form B bootstrap samples by repeatedly sampling n pairs at random with

replacement from the original sample of pairs in Step 1. Set B = 500.

Step 3. For each bootstrap sample, construct a Sen’s Slope estimate. Denote each of

these B trend lines as a bootstrap replicate.

Step 4. Determine a series of equally spaced time points (tj) along the range of sampling

dates represented in the original sample, j = 1 to m. At each time point, use the Sen’s

trend line associated with each bootstrap replicate to compute an estimated concentration.

There will be B such estimates at each of the m equally-spaced time points when this step

is complete.

Step 5. Given a confidence level (1-α) to construct a two-sided confidence band, determine the lower (α/2)th and the upper (1-α/2)th percentiles from the distribution of estimated concentrations at each time point (tj). The collection of these lower and upper percentiles along the range of sampling dates (tj, j = 1 to m) forms the bootstrapped confidence band.
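Steps 1 through 5 translate directly into a short bootstrap routine. The sketch below is illustrative only: the text does not specify how the trend line's intercept is estimated, so the median of (x_i - Q·t_i), a common companion to Sen's slope, is assumed here.

```python
import random
from statistics import median

def sen_fit(ts, xs):
    """Sen's slope and an assumed median-based intercept; None if degenerate."""
    q = [(xs[j] - xs[i]) / (ts[j] - ts[i])
         for i in range(len(ts) - 1) for j in range(i + 1, len(ts))
         if ts[j] != ts[i]]
    if not q:
        return None
    slope = median(q)
    return slope, median(x - slope * t for t, x in zip(ts, xs))

def bootstrap_band(times, values, m=50, b=500, conf=0.95):
    """Two-sided bootstrap confidence band around the Sen trend line."""
    pairs = list(zip(times, values))
    t0, t1 = min(times), max(times)
    grid = [t0 + (t1 - t0) * j / (m - 1) for j in range(m)]   # Step 4 time points
    est = [[] for _ in grid]
    for _ in range(b):                                        # Step 2: B resamples
        sample = [random.choice(pairs) for _ in pairs]        # with replacement
        fit = sen_fit([t for t, _ in sample], [x for _, x in sample])
        if fit is None:
            continue                                          # degenerate resample
        slope, intercept = fit                                # Step 3: replicate line
        for j, tj in enumerate(grid):
            est[j].append(intercept + slope * tj)
    lower, upper = [], []
    for e in est:                                             # Step 5: percentiles
        e.sort()
        nb = len(e)
        lower.append(e[int((1 - conf) / 2 * nb)])
        upper.append(e[min(int((1 + conf) / 2 * nb), nb - 1)])
    return grid, lower, upper
```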

Seasonal Kendall Test

Description:

The Seasonal Kendall Test is an extension of the Mann-Kendall test that removes

seasonal cycles and tests for trend.

Seasonal Kendall Procedure:

Compute the Mann-Kendall statistic, S, for each season. Let Si denote this statistic for

the ith season, that is:

$$S_i = \sum_{k=1}^{n_i - 1} \sum_{l=k+1}^{n_i} \mathrm{sgn}(x_{il} - x_{ik})$$

Where l > k, ni is the number of data for season i, and:

$$\mathrm{sgn}(x_{il} - x_{ik}) = \begin{cases} 1 & \text{if } x_{il} - x_{ik} > 0 \\ 0 & \text{if } x_{il} - x_{ik} = 0 \\ -1 & \text{if } x_{il} - x_{ik} < 0 \end{cases}$$

VAR(Si) is computed as follows:

$$VAR(S_i) = \frac{1}{18}\left[n_i(n_i-1)(2n_i+5) - \sum_{p=1}^{g_i} t_{ip}(t_{ip}-1)(2t_{ip}+5) - \sum_{q=1}^{h_i} u_{iq}(u_{iq}-1)(2u_{iq}+5)\right]$$

$$+\ \frac{\displaystyle\sum_{p=1}^{g_i} t_{ip}(t_{ip}-1)(t_{ip}-2)\ \sum_{q=1}^{h_i} u_{iq}(u_{iq}-1)(u_{iq}-2)}{9\,n_i(n_i-1)(n_i-2)}\ +\ \frac{\displaystyle\sum_{p=1}^{g_i} t_{ip}(t_{ip}-1)\ \sum_{q=1}^{h_i} u_{iq}(u_{iq}-1)}{2\,n_i(n_i-1)}$$

Where:

gi = the number of groups of tied data in season i;
tip = the number of tied data in the pth group for season i;
hi = the number of sampling times (or time periods) in season i that contain multiple data; and
uiq = the number of multiple data in the qth time period in season i.

After Si and VAR(Si) are computed, we pool across the K seasons:

$$S' = \sum_{i=1}^{K} S_i \qquad \text{and} \qquad VAR(S') = \sum_{i=1}^{K} VAR(S_i)$$

Next compute:

$$Z = \begin{cases} \dfrac{S' - 1}{[VAR(S')]^{1/2}} & \text{if } S' > 0 \\[4pt] 0 & \text{if } S' = 0 \\[4pt] \dfrac{S' + 1}{[VAR(S')]^{1/2}} & \text{if } S' < 0 \end{cases}$$

For a two-tailed test, we reject H0 of no trend if the absolute value of Z is greater than Z(1-α/2). Sanitas tests at the 80%, 90% and 95% confidence levels.

Seasonal Kendall Slope Estimator Procedure:

First compute the N′i individual slope estimates, Qi, for the ith season:

$$Q_i = \frac{x_{il} - x_{ik}}{l - k}$$

Where:

xil = The datum for the ith season of the lth year; and

xik = The datum for the ith season of the kth year, where l > k.

Do this for each of the K seasons. Then rank the N’1 + N’2 + …+ N’K = N’ individual

slope estimates and find their median. This median is the seasonal Kendall slope

estimator.
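The pooling across seasons can be sketched as follows. For brevity, this illustration applies only the tied-group (t_ip) correction to each season's variance and omits the multiple-observation (u_iq) terms of the full formula above.

```python
import math
from collections import Counter

def seasonal_kendall_z(seasons):
    """Pool per-season S_i and VAR(S_i) and form Z.

    seasons: one list of measurements per season, each in time order.
    """
    s_prime, var_prime = 0, 0.0
    for x in seasons:
        n = len(x)
        for k in range(n - 1):
            for l in range(k + 1, n):
                diff = x[l] - x[k]
                s_prime += (diff > 0) - (diff < 0)    # sgn(x_il - x_ik)
        ties = Counter(x).values()                    # tied-group correction only
        var_prime += (n * (n - 1) * (2 * n + 5)
                      - sum(t * (t - 1) * (2 * t + 5) for t in ties if t > 1)) / 18.0
    if s_prime > 0:
        return (s_prime - 1) / math.sqrt(var_prime)
    if s_prime < 0:
        return (s_prime + 1) / math.sqrt(var_prime)
    return 0.0
```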

Compliance or Corrective Action Monitoring Statistics

Confidence Intervals

Description:

A Confidence Interval is constructed from sample data and is designed to contain the

mean concentration of a well analyte in ground water monitoring, with a designated level

of confidence. A Confidence Interval generally should be used when specified by permit

or when downgradient samples are being compared to the maximum concentration limit

(MCL) or alternate concentration limit (ACL). In this situation, the MCL or ACL is either a specified concentration limit or is determined by the background concentrations.

Assumptions:

The sample data used to construct the intervals must be normally or transformed-

normally distributed. In the case of a transformed-normal distribution, the Confidence

Interval must be constructed on the transformed sample concentration values. In addition

to the interval construction, the comparison must be made to the transformed MCL or

ACL value. When none of the transformed models can be justified, a nonparametric

version of each interval may be utilized. If the entire Confidence Interval exceeds the

compliance limit, there is statistically significant evidence that the mean concentration

exceeds the compliance limit.

Distribution:

The distribution of the data is evaluated by applying the Shapiro-Wilk or Shapiro-Francia

test for normality to the raw data or, when applicable, to the Ladder of Powers (Helsel & Hirsch, 1992) transformed data.

The null hypothesis, H0, to be tested is:

H0: The population has a normal (or transformed-normal) distribution.

The alternative hypothesis, HA, is:

HA: The population does not have a normal (or transformed-normal)

distribution.

Censored Data:

If less than 15 percent of the observations are nondetects, these will be replaced with one

half the method detection limit prior to running the normality test and constructing the

Confidence Interval.

If more than 15 percent but less than 50 percent of the data are less than the detection

limit, the data’s sample mean and standard deviation are adjusted according to the

method of Cohen or Aitchison (U.S. EPA, April 1989). This adjustment is made prior to

construction of the Confidence Interval.

If more than 50 percent of the data are less than the detection limit, these values are

replaced with one half the method detection limit and a nonparametric Confidence

Interval is constructed.

Parametric Confidence Interval Procedures:

A minimum of four sample values is required for the construction of the parametric

Confidence Interval. The mean, X , and standard deviation, S, of the sample

concentration values are calculated separately for each compliance well (monitoring

point). For each well, the Confidence Interval is calculated as:

$$\bar{X} \pm t_{(1-\alpha,\ n-1)} \frac{S}{\sqrt{n}}$$

Where:

S = the compliance point's standard deviation;
n = the number of observations for the compliance point; and
t(1-α, n-1) = obtained from the Student's t-distribution found in Table 6 (Appendix B; U.S. EPA, April 1989) with (n-1) degrees of freedom.

Depending on the desired level of confidence, for instance 99% (1-α), and the sample

size n, the t-value is obtained from the Student’s t-table (e.g. t(0.99, n-1)). If the lower

end of the interval is above the compliance limit, then the mean concentration must be

significantly greater than the compliance limit, indicating noncompliance.

For a two-tailed test, t(1-α/2, n-1) will be substituted for t(1-α, n-1) in determining the

confidence interval. When the lower limit exceeds the upper compliance limit or the

upper limit falls below the lower compliance limit, there is statistically significant

evidence of noncompliance.

Note: when the Corrective Action option is selected, the interval is unchanged, but the

triggering condition changes: the situation in which the interval overlaps the upper

compliance limit now also is considered an exceedance.

EXAMPLE:

Date        Well #3
1/1/1988    10
4/1/1988    2.5
10/1/1988   16
4/1/1989    15
7/1/1989    8
10/1/1989   15
1/1/1990    21

Example Data for Parametric Confidence Interval

X̄ = 12.5,  s = 6.103,  n = 7,  t(.99, 6) = 3.143

Upper Limit = 12.5 + 3.143 × 6.103/√7 = 19.75

Lower Limit = 12.5 - 3.143 × 6.103/√7 = 5.25
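The Well #3 example can be reproduced with a few lines of Python (scipy supplies the t-quantile):

```python
import math
from statistics import mean, stdev
from scipy import stats

def confidence_interval(values, conf=0.99):
    """Lower and upper CI bounds around the mean, per the formula above."""
    n = len(values)
    half = stats.t.ppf(conf, n - 1) * stdev(values) / math.sqrt(n)
    return mean(values) - half, mean(values) + half

# Prints approximately (5.25, 19.75), matching the example.
print(confidence_interval([10, 2.5, 16, 15, 8, 15, 21]))
```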

Nonparametric Confidence Interval Procedure:

The Nonparametric Confidence Interval in Sanitas is an interval around the median, at the

98% confidence level (i.e. 1% for each tail). The procedure requires at least seven

observations in order to obtain a one-sided significance level of 1 percent. The

observations are ordered from smallest to largest and ranks are assigned separately within

each well (monitoring point). In prior versions, average ranks were assigned to tied

values. Current statistical guidance indicates that each value should be assigned a

separate unique rank, meaning that tied values are ranked as if they differed slightly. The

critical values of the order statistics are determined as follows.

If the minimum seven observations are used, the critical values are the first and seventh

values.

Otherwise, the smallest integer, M, is found such that the cumulative binomial

distribution with parameters n (sample size) and probability of success, p = 0.5 is at least

0.99.

The exact confidence coefficients for sample sizes from 4 to 11 are given by the EPA

(Table 6-3; U.S. EPA, April 1989). For larger samples, take as an approximation the

nearest integer value to:

4n)-(1

Z12

nM

α++=

Where:

Z(1-α) = the (1-α)th percentile from the normal distribution found in Table 4

(Appendix B; U.S. EPA, April 1989); and

n = the number of observations in the sample.

Once M has been determined, (n+1-M) is computed and the confidence limits are taken

as the order statistics, X(M) and X(n+1-M). These confidence limits are compared to the

compliance limit. If the lower limit, X(M), exceeds the compliance limit, there is

statistically significant evidence of non-compliance. Otherwise, the well remains in

compliance.

EXAMPLE:

Date         Well #1
12/1/1987    .5325
4/13/1988    .825
5/11/1988    .26
6/2/1988     .32
10/1/1988    .39
1/01/1989    .515
5/01/1989    .08
9/01/1989    .025
3/01/1990    .022

Example Data for Nonparametric Confidence Interval

n = 9,  Z.99 = 2.327

M = 9/2 + 1 + 2.327 × √(9/4) = 8.99, so M = 9

Upper Limit = X(9) = .825

Lower Limit = X(9+1-9) = X(1) = .022

For a two-tailed test, Z0.995 will be substituted for Z0.99 in deriving M. If the upper limit,

X(n+1-M), falls below the lower compliance limit, or the lower limit exceeds the upper

compliance limit, there is statistically significant evidence of non-compliance.
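A short sketch of the large-sample procedure, which reproduces the Well #1 example:

```python
import math
from scipy import stats

def nonparametric_ci(values, alpha=0.01):
    """Confidence limits X(n+1-M) and X(M) around the median, using the
    large-sample approximation for M given above (one-tailed alpha)."""
    x = sorted(values)
    n = len(x)
    z = stats.norm.ppf(1.0 - alpha)                     # Z(1-alpha)
    m = min(round(n / 2.0 + 1.0 + z * math.sqrt(n / 4.0)), n)
    return x[n - m], x[m - 1]                           # X(n+1-M), X(M)

# Prints (0.022, 0.825), matching the example.
print(nonparametric_ci([.5325, .825, .26, .32, .39, .515, .08, .025, .022]))
```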

Tolerance Intervals

Description:

In compliance monitoring, the Tolerance Interval is calculated on the compliance point

data, so that the upper one-sided tolerance limit may be compared to the appropriate

ground water protection standard (i.e., MCL or ACL). If the upper tolerance limit

exceeds the fixed standard, and especially if the tolerance limit has been constructed to

have an average coverage of 95 percent, there is significant evidence that 5 percent or more of all the compliance well measurements will exceed the limit.

Assumptions:

The sample data used to construct the intervals are assumed to be normally or

transformed-normally distributed. In the case of a transformed-normal distribution, the

Tolerance Interval must be constructed on the transformed sample concentration values.

In addition to the interval construction, the comparison must be made to the transformed

MCL or ACL value. When neither the normal nor transformed models can be justified, a

nonparametric version of each interval may be utilized.

Censored Data:

If less than 15 percent of the observations are nondetects, these will be replaced with one-

half of the method detection limit prior to running the normality test and constructing the

Tolerance Interval.


If more than 15 percent but less than 50 percent of the data are less than the detection

limit, the data’s sample mean and standard deviation are adjusted according to the

method of Cohen or Aitchison (U.S. EPA, April 1989). This adjustment is made prior to

construction of the Tolerance Interval.

If more than 50 percent of the data are less than the detection limit, these values will be

replaced with one half the method detection limit and a nonparametric Tolerance Interval

may be constructed.

Parametric Tolerance Intervals Procedure:

A minimum of four sample values is recommended for the construction of Tolerance

Intervals. The Shapiro-Wilk or Shapiro-Francia test for normality (see Control Chart for

method description) is used to determine if the sample values are normally or

transformed-normally distributed. The mean, X , and the standard deviation, S , are

computed separately for each compliance well’s data. The factor, K, is determined for the

sample size, n, from Table 5 (Appendix B; U.S. EPA, April 1989). The Tolerance

Interval is computed as:

$$\left[\,0,\ \bar{X} + KS\,\right]$$

Where:

X̄ = the mean of the compliance observations;
K = the factor obtained for sample size, n, from Table 5 (Appendix B; U.S. EPA, April 1989); and
S = the standard deviation of the compliance observations.

This yields a Tolerance Interval with 95% coverage and a 95% confidence factor for each well.

The upper limit of the Tolerance Interval is compared to the compliance limit. If the

upper limit of the Tolerance Interval exceeds that limit, there is statistically significant

evidence of an impact.

EXAMPLE:

Date        Well #3
1/1/1988    10
4/1/1988    2.5
10/1/1988   16
4/1/1989    15
7/1/1989    8
10/1/1989   15
1/1/1990    21

Example Data for Parametric Tolerance Interval

X̄ = 12.5,  S = 6.103,  K = 3.399

Upper Tolerance Limit = 12.5 + (6.103 × 3.399) = 33.25
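A minimal sketch of the calculation follows; the K factor is not derived in code but must be supplied from Table 5 (Appendix B; U.S. EPA, April 1989), as in the example.

```python
from statistics import mean, stdev

def upper_tolerance_limit(values, k_factor):
    """Upper limit of the parametric Tolerance Interval [0, X-bar + K*S]."""
    return mean(values) + k_factor * stdev(values)

# Prints approximately 33.25, matching the example (K = 3.399 for n = 7).
print(upper_tolerance_limit([10, 2.5, 16, 15, 8, 15, 21], 3.399))
```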

Nonparametric Tolerance Interval Procedure:

A minimum of 19 sample values is recommended for the construction of a 95% Confidence/95% Coverage Tolerance Interval. The highest compliance observation is used to set the upper limit of the Tolerance Interval. This upper limit is compared to the compliance limit. If the upper limit of the Tolerance Interval exceeds that limit, there is statistically significant evidence of an impact.

APPENDIX I: GLOSSARY OF SELECTED STATISTICAL

TERMS

2-tailed Mode - The option used when there is a concern that compliance values can be

both too low as well as too high relative to background values. Sanitas automatically

evaluates pH for high and low exceedances, requiring this option to be checked only if

this feature is needed for other parameters.

95% Confidence Interval - Each time a test is performed, there is a 5% chance that it

will result in a false positive conclusion.

95% Coverage - 95% of the population is intended to be contained within the tolerance

interval.

99% Confidence Level - Each time a test is performed, there is a 1% chance that it will

result in a false positive conclusion.

Alpha Level - The false positive rate, or the fraction of results that will show an exceedance when in fact none exists. The confidence level associated with each test is 1-α.

Analysis of Variance (ANOVA) - An interwell analysis that compares either well

means or average ranks among wells. This test is typically recommended in detection

monitoring for determining whether intrawell or interwell tests are most appropriate.

The ANOVA, when run on upgradient wells only, will indicate whether variation exists

among the background wells. When spatial variation is present among upgradient wells,

intrawell tests are recommended in the absence of a historical release at the facility.

Box and Whiskers Plots - A concentration plot depicting the mean, median, minimum, maximum, and 25th and 75th percentiles of a data set. This test is useful in detection monitoring as a method for visualizing the variation within and among wells.


California Non-statistical Analysis of VOCs - An interwell analysis for a suite of

VOCs when nondetects comprise 90% or more of the background data.

Central Tendency - A statistical indicator of the average or middle value of a data set.

Confidence Interval (CI) - A concentration range that is designed to contain the mean

concentration level with a designated level of confidence (e.g., 99%). Confidence

Intervals are useful at sites in corrective action where remediation efforts need to be

evaluated.

Intrawell Limit-Based Tests – Within well comparisons. In the case of limit-based

tests, historical data from within a given well for a given constituent is used to construct a

limit. Compliance points are compared to the limit to determine whether a change is

occurring on a per-well/per-constituent basis. Intrawell limit-based tests are

recommended when there is evidence of spatial variation in the ground water, particularly

among upgradient wells, as it is inappropriate to pool those data across wells for the

purpose of creating interwell limits for comparison with compliance well data. Intrawell

tests may be used at both new and existing facilities. When performing intrawell limit-

based tests at existing facilities, ground water must not be impacted by the facility, and

proposed “background” data must be carefully screened for trends, outliers, and

seasonality prior to constructing limits.

Interwell Tests – Between well comparisons. Interwell limit-based tests use pooled

upgradient well data to construct limits. Individual downgradient well data are compared

to the limits to determine if the facility is impacting ground water. Interwell tests may be

used when there is a continuous aquifer at both new and existing facilities.

Lower Confidence Limit (LCL) - Lower limit to a confidence interval.

Log Transformation – In Sanitas, as is typical in the Guidance documents referenced

below, the term log transformation is synonymous with natural log transformation.

Mann-Kendall Statistical Evaluation - A nonparametric statistical analysis typically

used in detection and assessment monitoring. The Mann-Kendall portion of the Sen’s

Slope/Mann-Kendall trend test determines whether suspected increasing or decreasing

trends are statistically significant. In detection monitoring, this test is useful for

screening proposed background data while in assessment monitoring it may be used to

evaluate the ongoing condition of the trends.

Nondetect Data – Also referred to as “censored” measurements, these data fall between

0 and the quantitation limit (QL) as determined by the laboratory. There is much

uncertainty associated with values falling below the QL due to the difficulty in

distinguishing the signal characteristic of the analyte from background noise associated

with laboratory equipment. Because of this, these concentrations are reported as

nondetects to represent “undetected” data.

Non-normal Data - The distribution of the population of data from which the sample has

been drawn is unknown; therefore no assumptions about or estimations of the population

parameters (e.g., mean) can be made.


Normally Distributed Data - Data (constituent concentration values) follow a normal

(Gaussian) or bell-shaped curve; the majority of values (about 95%) fall within two standard deviations of the mean of the concentration values.

Outlier - An observation that is at least an order of magnitude different from the rest of

the group of observations.

Power - The power of a statistical test is the probability that the test will reject a false

null hypothesis, or in other words that it will not make a Type II error. The higher the

power, the greater the chance of obtaining a statistically significant result when the null

hypothesis is false.

Precision - The extent to which a given set of sample measurements of the same

population of values agree with a measure of their central tendency.

Prediction Limit Analysis - An interwell or intrawell analysis that compares one or

more future observations to a limit established by background data. These limit-based

tests are recommended for sites in detection monitoring to determine whether changes are

occurring at compliance wells. In the case of both intra- and inter-well prediction limits,

it is recommended that only one future compliance point (referred to as the “K” value)

from each well be compared to a background limit. By default, Sanitas sets K=1. When

combined with retesting, these tests prove to be the most powerful among the EPA-

recommended methodologies, while minimizing the chance of false exceedances at a site.

Poisson Distributed Data - Data (constituent concentration values) follow a model of

rare events, where the probability of detection is low but stays constant from sampling

period to sampling period (U.S. EPA, 1992).

Sen’s Slope Trend Analysis - A nonparametric statistical analysis of the increase or

decrease in concentration levels over time; calculation of the slope of the linear

relationship of concentration level and time. In Sanitas, this test is combined with the

Mann-Kendall test, which determines whether the calculated slope is statistically

significant. In detection monitoring, this test is useful for screening proposed background

data while in assessment monitoring it may be used to evaluate the ongoing condition of

the trends.

Shewhart-CUSUM Control Charts – Measure both rapid releases as well as long-term,

gradual trends within a given well for a given constituent. Control Charts are

recommended for sites in detection monitoring and may be used as an alternative to

intrawell prediction limits. These tests use screened background data from within a well to establish a baseline against which future observations are compared.

Site-Wide False Positive Rate (SWFPR) - The probability that at least one parameter

for at least one well will result in a statistically significant finding for each sampling

event at a facility. EPA recommends an annual SWFPR of 10% or less (which equals 5%

for each semi-annual sampling event or 2.5% for each quarterly sampling event).

Skewness - A measure of the degree of asymmetry of a data distribution.

Testwise Alpha – The overall alpha (or false positive) level for a given test.


Time Series Plot - A graphic plot of time (i.e., days, months, years) versus

concentration levels.

Tolerance Interval (TI) - A concentration range that is constructed to contain a specified

proportion (e.g., 95%) of the population of observations with a specified confidence (i.e.,

confidence level).

Tolerance Limit - An interwell or intrawell analysis that compares compliance

observations to a limit established by background data that is constructed to contain a

specified proportion (e.g., coverage of 95%) of the population of observations. This test

has historically been one of the tests recommended for sites in detection monitoring for detecting releases. However, more recent EPA recommendations discuss the uncertainty

of the false positive rate associated with this test due to the coverage and confidence

levels.

Transformed-normally Distributed Data - The raw data are not normally distributed;

however the natural logarithms (or some other transformation in the Ladder of Powers

[Helsel & Hirsch]) of the data are normally distributed and parametric procedures may be

used.

Upper Confidence Limit (UCL) - Upper limit to a confidence interval.

Variability - A measure of divergence from the mean of a data set.

BIBLIOGRAPHY

ASTM, December 1998. Standard Guide for Developing Appropriate Statistical Approaches for Ground-Water Detection Monitoring Programs. American Society for Testing and Materials, West Conshohocken, PA.

Cohen, A.C., Jr., 1959. Simplified Estimators for the Normal Distribution When Samples Are Singly Censored or Truncated, Technometrics, 1: 217-237.

Davis, C.B. and McNichols, R.J., 1994. Ground Water Monitoring Statistics Update: Part II: Nonparametric Prediction Limits, Ground Water Monitoring Review, Fall: 159.

Eisenhart, C., Hastay, M.W., and Wallis, W.A., 1947. Techniques of Statistical Analysis. McGraw-Hill Book Company, Inc.

Gibbons, R.D., 1991. Some Additional Prediction Limits for Groundwater Detection Monitoring at Waste Disposal Facilities, Ground Water, 29(5).

Gilbert, R.O., 1987. Statistical Methods for Environmental Pollution Monitoring. Van Nostrand Reinhold.

Helsel, D.R. and Hirsch, R.M., 1992. Statistical Methods in Water Resources. Elsevier.

Hollander, M. and Wolfe, D.A., 1973. Nonparametric Statistical Methods. John Wiley & Sons.

Sen, P.K., 1968. Estimates of the Regression Coefficient Based on Kendall's Tau, Journal of the American Statistical Association, 63: 1379-1389.

U.S. EPA, April 1989. Statistical Analysis of Ground-Water Monitoring Data at RCRA Facilities, Interim Final Guidance. Office of Solid Waste Management Division, U.S. Environmental Protection Agency, Washington, DC.

U.S. EPA, July 1992. Statistical Analysis of Ground-Water Monitoring Data at RCRA Facilities, Addendum to Interim Final Guidance. Office of Solid Waste Management Division, U.S. Environmental Protection Agency, Washington, DC.

U.S. EPA, March 2009. Statistical Analysis of Groundwater Monitoring Data at RCRA Facilities, Unified Guidance. Office of Resource Conservation and Recovery, Program Implementation and Information Division, U.S. Environmental Protection Agency, Washington, DC.

Wilk, M.B. and Shapiro, S.S., 1968. The Joint Assessment of Normality of Several Independent Samples, Technometrics, 10(4): 825-839.

Willits, N., 1994. Personal communication between Henry R. Horsey and Neil Willits, statistical consultant to the California State Water Resources Control Board, regarding the use of nonparametric prediction limits including retests.

Zar, J.H., 1996. Biostatistical Analysis, 3rd edition (p. 112). Prentice Hall.
