
Clinical Biochemistry 40 (2007) 147–152

Review

A note on the use of outlier criteria in Ontario laboratory quality control schemes

Kevin Hayes a,⁎, Anthony Kinsella b, Norma Coffey a

a Department of Mathematics and Statistics, University of Limerick, Limerick, Republic of Ireland
b Department of Clinical Pharmacology, Royal College of Surgeons in Ireland, Dublin 2, Republic of Ireland

Received 21 March 2006; received in revised form 8 August 2006; accepted 16 August 2006. Available online 19 October 2006.

Abstract

Objectives: This paper examines the pitfalls that arise when an outlier is assessed using a criterion based on a fixed multiple of the standard deviation rather than an established statistical test. Although the former approach is statistically invalid, it is the favored method for identifying outliers in Ontario laboratory quality control protocols.

Design and methods: Computer simulations are used to calculate the probability of a false positive result (classifying a valid observation as an outlier) when outlier criteria based on fixed multiples of the standard deviation are applied to samples containing no outliers.

Results: The estimated probability of a false positive result is tabulated over various sample sizes. Outlier criteria based on fixed multiples of the standard deviation are shown to be highly inefficient.

Conclusions: This work presents arguments for discontinuing the widespread practice of using outlier criteria based on fixed multiples of the standard deviation to identify outliers in univariate samples.

© 2006 The Canadian Society of Clinical Chemists. All rights reserved.

Keywords: Boxplot; Dixon test; Grubbs test; Robust methods

Contents

Introduction
Methods
Results
Discussion
Conclusions
References

Introduction

The review paper by Krishnan et al. [1] reports on the in-house quality control (QC) practices of 115 licensed laboratories in Ontario, using cholesterol testing as the quality control paradigm. The participating laboratories calculate monthly means and standard deviations using sample sizes ranging from 4 to 100+ observations. The definition of an outlier used by 33 laboratories (29%) was an observation more than 2 standard deviations above or below the mean, while 55 laboratories (48%) used a cutoff of 3 standard deviations above or below the mean to define an outlying observation. Five laboratories reported using Dixon's “Q” statistic [2] and only 13 laboratories reported that it was policy to retain outliers in subsequent calculations. The report concludes that “QC practices in Ontario are not entirely satisfactory,” and although “most laboratories use some form of statistical QC rules for error detection, there is a need for more efficient use of these rules”.

⁎ Corresponding author. Fax: +353 61 334927. E-mail addresses: [email protected] (K. Hayes), [email protected] (A. Kinsella), [email protected] (N. Coffey).

0009-9120/$ - see front matter © 2006 The Canadian Society of Clinical Chemists. All rights reserved. doi:10.1016/j.clinbiochem.2006.08.019


Table 1
Estimated probability of a false positive result for outlier criteria based on G applied to Gaussian data

The “staircase” within the table represents the value of the upper bound on G from equation (3).


This article examines the pitfalls associated with using outlier criteria set at fixed multiples of the standard deviation. We distinguish between the cases when (i) the standard deviation is estimated from the available data; (ii) an external estimate, or established value based on past records, of the standard deviation is used. The former case is likely to be associated with routine data analysis and/or measurements arising from analytical procedures. The latter case usually arises in the context of QC schemes. For completeness both situations are examined. The results presented below show that outlier criteria based on a fixed multiple of the standard deviation are inefficient. Some alternative approaches are discussed.

Methods

Denote the probability of a false positive result by p, where a false positive result occurs when a “valid” observation is classified as an outlier. An efficient outlier test will be characterized by a constant false positive probability rate, conventionally set at some nominally low value (e.g., 5% or 1%). The aim of this study is to estimate p when outlier rules based on fixed boundary cutoffs are applied to (good) samples that contain no outliers.

A sample yielding a false positive result can be identified by checking the most extreme observation in the sample against the fixed boundary cutoff. Statistics G (for samples using an estimated standard deviation) and K (for samples using a known or historical standard deviation) are used for this purpose.

The most outlying observation in the sample x1, …, xn can be defined as the observation that deviates the most from the sample mean x̄ = Σᵢ₌₁ⁿ xᵢ/n. When scaled in terms of the (sample) standard deviation

s = √[ Σᵢ₌₁ⁿ (xᵢ − x̄)² / (n − 1) ],

the most outlying observation in the sample is the observation xᵢ satisfying

G = maxᵢ |xᵢ − x̄| / s.    (1)

The statistical software package R [3] was used to simulate M = 3×10⁵ samples of size n from a standard normal distribution. The value of G was calculated for each of the M samples and p̂ (the proportion of times G > 2) was recorded. The estimated probabilities are shown in Table 1 and are reported to three significant figures [4]. Separate simulations of the rules G > 2.25, …, G > 3.75 and G > 4 are also shown in Table 1.
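The simulation protocol above can be sketched as follows. This is a minimal re-implementation in Python with NumPy rather than the paper's R; the function name and seed are illustrative, not the authors' code:

```python
import numpy as np

rng = np.random.default_rng(0)

def false_positive_rate_G(n, cutoff, M=300_000):
    """Estimate P(G > cutoff) over M samples of size n from N(0, 1),
    where G = max_i |x_i - xbar| / s and s uses the n - 1 divisor."""
    x = rng.standard_normal((M, n))
    xbar = x.mean(axis=1, keepdims=True)
    s = x.std(axis=1, ddof=1)
    G = np.abs(x - xbar).max(axis=1) / s
    return (G > cutoff).mean()
```

Note that for n ≤ 5 the G > 2 rule can never fire, because of the upper bound (3) on G discussed in the Results section, so the estimated rate there is exactly zero.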

This protocol was repeated using samples drawn from a Student's t distribution on 4 degrees of freedom. The results are shown in Table 2 and demonstrate the effect of departures from the underlying assumption of normally distributed data, here in the context of a distribution with longer tails.

In a laboratory setting the population (or process) standard deviation σ is sometimes assumed to be known. A known value of σ might be obtained from historical data or supplied as the claimed precision of a laboratory instrument. The important consideration is that the value of σ is ‘external’ to the sample. In these situations the statistic (1) is inappropriate and should be replaced by

K = maxᵢ |xᵢ − x̄| / σ.    (2)

The statistical software package R was used to simulate M = 3×10⁵ samples of size n from a standard normal distribution. The value of K was calculated for each of the M samples, setting σ = 1. The proportion of times K > 2 was recorded. The estimated probabilities are shown in Table 3. Separate simulations of the rules K > 2.25, …, K > 3.75 and K > 4 are also shown in Table 3.
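A sketch of the corresponding simulation for K, again in Python/NumPy rather than R, with illustrative names:

```python
import numpy as np

rng = np.random.default_rng(1)

def false_positive_rate_K(n, cutoff, sigma=1.0, M=300_000):
    """Estimate P(K > cutoff), where K = max_i |x_i - xbar| / sigma
    and sigma is a known ('external') standard deviation."""
    x = rng.normal(0.0, sigma, size=(M, n))
    xbar = x.mean(axis=1, keepdims=True)
    K = np.abs(x - xbar).max(axis=1) / sigma
    return (K > cutoff).mean()
```

Unlike G, the statistic K is not bounded above by (n − 1)/√n, so the K > 2 rule can fire even for samples as small as n = 3.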

Results

For the G > 2 rule the estimated probability of a false positive result in Table 1 grows rapidly from a value that is less than 0.01 for n = 6 to a value which exceeds 0.12 for n ≥ 8.


Table 2
Estimated probability of a false positive result for outlier criteria based on G applied to samples from a Student's t distribution on 4 degrees of freedom

The “staircase” within the table represents the value of the upper bound on G from equation (3).

Table 3
Estimated probability of a false positive result for outlier criteria based on K applied to Gaussian data (columns K>2.00 to K>2.75; the remaining columns, K>3.00 to K>4.00, are continued below)

n      K>2.00  K>2.25  K>2.50  K>2.75
3      0.037   0.016   0.006   0.002
4      0.075   0.035   0.015   0.006
5      0.114   0.055   0.026   0.010
6      0.152   0.077   0.036   0.015
7      0.188   0.098   0.047   0.020
8      0.225   0.118   0.057   0.026
9      0.258   0.140   0.069   0.031
10     0.291   0.160   0.080   0.036
11     0.324   0.181   0.090   0.042
12     0.355   0.200   0.102   0.048
13     0.383   0.220   0.112   0.053
14     0.413   0.237   0.123   0.059
15     0.438   0.256   0.134   0.064
16     0.463   0.275   0.145   0.069
17     0.486   0.291   0.155   0.075
18     0.511   0.310   0.166   0.081
19     0.534   0.325   0.176   0.086
20     0.553   0.343   0.185   0.091
30     0.719   0.485   0.281   0.143
40     0.824   0.600   0.365   0.192
50     0.889   0.687   0.440   0.240
60     0.930   0.755   0.507   0.282
70     0.957   0.808   0.563   0.324
80     0.972   0.851   0.614   0.365
90     0.983   0.884   0.660   0.401
100    0.989   0.909   0.701   0.436


For n ≥ 8 the G > 2 rule will incorrectly identify valid observations as outliers too frequently to be of any practical use. The estimated probability of a false positive result when the G > 3 rule is invoked is less than 0.001 when n ≤ 14. Although it is desirable that the probability of a false positive result be low, Hayes and Kinsella [5] show that this advantage is gained at the expense of failing to detect a truly outlying observation. Low values of p can reduce the power of the outlier rule to detect true outliers so much that the rule will be ineffective. For the G > 3 rule the estimated probability of a false positive result continues to increase steadily with increases in sample size and equals 0.205 for n = 100. For the G > 4 rule the estimated probability of a false positive result is less than 0.003 for n ≤ 100.

It is impossible for any observation in a sample of size n to deviate more than s(n − 1)/√n from the sample mean x̄ (when x̄ and s are calculated from the same set of data). This limitation arises due to the upper bound

(n − 1)/√n    (3)

on G, due to Thompson [6]. When n = 5, this constraint implies that all observations in the sample must be within (5 − 1)/√5 = 1.79 standard deviations of the sample mean. Hence an observation will never be judged as an outlier using the G > 2 criterion for sample sizes of n ≤ 5. The ‘power of the test’ equals 0; that is, the G > 2 criterion is unable to classify any observation as discordant for n ≤ 5, no matter how deviant it may be.
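The bound (3) can be checked numerically. A small Python sketch (the function name is illustrative) that stresses the bound with heavy-tailed samples:

```python
import numpy as np

rng = np.random.default_rng(2)

def thompson_bound_holds(n, trials=2000):
    """Check empirically that G never exceeds (n - 1)/sqrt(n)."""
    bound = (n - 1) / np.sqrt(n)
    for _ in range(trials):
        x = rng.standard_cauchy(n)  # heavy tails make large G values likely
        G = np.abs(x - x.mean()).max() / x.std(ddof=1)
        if G > bound + 1e-9:
            return False
    return True
```

No sample, however wild, produces a violation: the bound is an algebraic consequence of the deviations summing to zero, not a distributional property.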

Table 3 (continued)
Estimated probability of a false positive result for outlier criteria based on K applied to Gaussian data (columns K>3.00 to K>4.00)

n      K>3.00  K>3.25  K>3.50  K>3.75  K>4.00
3      0.001   0.000   0.000   0.000   0.000
4      0.002   0.001   0.000   0.000   0.000
5      0.004   0.001   0.000   0.000   0.000
6      0.006   0.002   0.001   0.000   0.000
7      0.009   0.003   0.001   0.000   0.000
8      0.011   0.004   0.001   0.000   0.000
9      0.013   0.005   0.002   0.001   0.000
10     0.015   0.006   0.002   0.001   0.000
11     0.018   0.007   0.003   0.001   0.000
12     0.020   0.008   0.003   0.001   0.000
13     0.023   0.009   0.004   0.001   0.000
14     0.025   0.011   0.004   0.001   0.001
15     0.028   0.012   0.004   0.002   0.001
16     0.031   0.013   0.005   0.002   0.001
17     0.033   0.014   0.005   0.002   0.001
18     0.036   0.015   0.006   0.002   0.001
19     0.038   0.016   0.006   0.002   0.001
20     0.042   0.017   0.007   0.002   0.001
30     0.066   0.028   0.011   0.004   0.001
40     0.091   0.039   0.016   0.006   0.002
50     0.116   0.050   0.020   0.008   0.002
60     0.138   0.060   0.025   0.009   0.003
70     0.161   0.072   0.029   0.011   0.004
80     0.183   0.083   0.034   0.013   0.004
90     0.205   0.094   0.039   0.014   0.005
100    0.225   0.104   0.043   0.016   0.006


Table 4
Critical values for G

n      α=0.10   α=0.05   α=0.025  α=0.01
3      1.153    1.154    1.155    1.155
4      1.463    1.481    1.491    1.496
5      1.671    1.715    1.742    1.764
6      1.822    1.887    1.933    1.973
7      1.938    2.020    2.081    2.139
8      2.032    2.127    2.201    2.274
9      2.110    2.215    2.300    2.387
10     2.176    2.290    2.383    2.482
11     2.234    2.355    2.455    2.564
12     2.285    2.412    2.519    2.636
13     2.331    2.462    2.574    2.699
14     2.372    2.507    2.624    2.755
15     2.409    2.548    2.669    2.806
16     2.443    2.586    2.710    2.852
17     2.475    2.620    2.748    2.894
18     2.504    2.652    2.782    2.932
19     2.531    2.681    2.814    2.968
20     2.557    2.708    2.843    3.001
30     2.745    2.908    3.058    3.236
40     2.868    3.036    3.192    3.381
50     2.957    3.128    3.288    3.482
60     3.027    3.200    3.361    3.560
70     3.084    3.258    3.421    3.622
80     3.132    3.306    3.470    3.673
90     3.173    3.348    3.512    3.716
100    3.210    3.384    3.549    3.754

Table 5
Critical values for K

n      α=0.10   α=0.05   α=0.025  α=0.01
3      1.738    1.955    2.154    2.397
4      1.941    2.163    2.368    2.618
5      2.081    2.304    2.511    2.764
6      2.185    2.408    2.616    2.870
7      2.268    2.491    2.698    2.952
8      2.336    2.558    2.764    3.019
9      2.394    2.614    2.820    3.074
10     2.444    2.663    2.868    3.122
11     2.487    2.706    2.910    3.163
12     2.526    2.743    2.947    3.199
13     2.561    2.777    2.980    3.232
14     2.592    2.808    3.010    3.261
15     2.621    2.836    3.037    3.288
16     2.648    2.861    3.062    3.312
17     2.672    2.885    3.085    3.334
18     2.695    2.907    3.107    3.355
19     2.716    2.928    3.127    3.375
20     2.736    2.947    3.146    3.393
30     2.886    3.091    3.285    3.528
40     2.985    3.187    3.377    3.616
50     3.059    3.257    3.446    3.682
60     3.118    3.314    3.500    3.733
70     3.166    3.360    3.544    3.776
80     3.207    3.399    3.582    3.812
90     3.243    3.433    3.615    3.843
100    3.274    3.463    3.644    3.871


Similarly, G cannot exceed 2.25 for sample sizes of n ≤ 6 (the bound (6 − 1)/√6 = 2.04 falls below 2.25, while (7 − 1)/√7 = 2.27 does not), exceed 3 for sample sizes of n ≤ 10, or exceed 4 for sample sizes of n ≤ 17.

A staircase line running from n = 5 for G > 2 to n = 17 for G > 4 has been drawn in Table 1 and splits the table into top and bottom parts. The entries 0.000 in the table occurring above this line arise as a consequence of the upper bound (3) on G. Although the probability of a false positive result is zero, the power of the test is also zero, indicating that an outlier will never be classified as such using these outlier rules.

The probabilities in Table 2 are all larger than the corresponding probabilities in Table 1, indicating that the estimated probability of a false positive result increases for heavy-tailed distributions.

For the K > 2 rule the probability of a false positive result in Table 3 grows rapidly from a value less than 0.04 for n = 3 to a value which exceeds 0.15 for n ≥ 6. For n ≥ 6 the K > 2 rule will incorrectly identify valid observations as being outliers too frequently to be of any practical use. The probability of a false positive result when the K > 3 rule is invoked is no more than 0.02 when n ≤ 12. The K > 3 rule suffers from the same drawback as the G > 3 rule when the sample size is small. For the K > 4 rule the probability of a false positive result is no more than 0.001 for n ≤ 30 and continues to be conservative for larger sample sizes.

Note that the upper bound (3) on G does not apply to any rule involving K. However, the fact that G is bounded above by (n − 1)/√n while K has no such bound does not imply that (2) can be used universally: the availability of an ‘external’ estimate of σ governs the choice of K over G.

Discussion

In an early review paper Anscombe [7] wrote that the earliest formulation of an outlier test criterion was based on a rule of thumb, due to Wright [8], and involved “reject(ing) any observation whose residual exceeds in magnitude five times the probable error (i.e., 3.37 times the standard deviation). The reason given for this is that if the Gaussian law of error is truly satisfied, only about one observation in a thousand will be rejected, ‘and therefore little damage will be done in any case’.” This procedure was extended by Goodwin [9] and was similar to Wright's method in that the critical values were independent of the sample size. As Barnett and Lewis [10] point out, these early works fail to “distinguish between (the) population variance and (the) sample variance and, more importantly, they were erroneously based on the distributional behaviour of a random sample rather than on that of an appropriate sample extreme”. Irwin [11] was the first to point out the implications of the unreliability of the sample standard deviation for outlier detection and used extreme values in the development of a test statistic. Grubbs [12] wrote that a Studentized extreme deviation (denoted Tn) was suggested by Pearson and Chandra Sekar [13] “for testing the significance of the largest observation” and, by drawing on results due to Thompson [6], “Pearson and Chandra Sekar were able to obtain certain percentage points of Tn without deriving the exact distribution of Tn.” The sampling distribution of an extreme value in a sample from a Gaussian distribution was derived by Grubbs [12]. The resulting Grubbs test is


discussed in detail by Hayes and Kinsella [5], and is important in the context of clinical chemistry laboratory quality control schemes as it has become the ISO recommended test for outliers: see Miller and Miller [14], and Mullins [15].

When the data x1, …, xn are an independent random sample from a normal distribution the sampling distributions of G and K are available from Grubbs [12]. The statistic G forms the basis for a two-sided test of an extreme outlier in a normal sample with both the mean and standard deviation unknown. The statistic K forms the basis of a two-sided test of an extreme outlier in a normal sample with the standard deviation known and the mean unknown. Denote a significance level of α as the probability of a false positive result, that is, the probability of concluding that a “valid” observation does not belong to the same distribution as the remaining observations in the sample. Critical values for G and K are tabulated in Table 4 and Table 5, respectively, for sample sizes n = 3 to n = 100 and various values of α. If either G or K exceeds its tabulated critical value, the null hypothesis that the most extreme observation in a sample of size n comes from the same normal distribution as the other observations is rejected at the α significance level. Outlier criteria based on fixed multiples of the standard deviation are invalid simplifications of the Grubbs test.
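As a sketch of how the tabulated critical values are used in practice (Python; the dictionary holds the α = 0.05 entries for a few sample sizes from Table 4, and the function name is illustrative):

```python
import numpy as np

# Two-sided alpha = 0.05 critical values for G, for a few sample sizes (Table 4)
G_CRIT_05 = {5: 1.715, 10: 2.290, 20: 2.708}

def grubbs_test(data, crit):
    """Return (G, reject): reject is True when the most extreme
    observation is declared discordant at the chosen level."""
    x = np.asarray(data, dtype=float)
    G = np.abs(x - x.mean()).max() / x.std(ddof=1)
    return G, G > crit
```

For example, for the sample (1, 2, 3, 4, 100) with n = 5, G ≈ 1.79 exceeds the α = 0.05 critical value 1.715, so the extreme value 100 is flagged, even though it lies well under 2 sample standard deviations from the mean: the bound (3) makes a 2-SD rule useless at n = 5.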

Although the Grubbs test is a valid procedure for evaluating a single outlier in a sample of Gaussian data, it has important limitations and its use cannot be recommended universally. If there is more than one outlier present in the sample the Grubbs test suffers from the effect of outlier masking, a term attributable to Murphy [16], who discussed the effect whereby the presence of an outlier can be cloaked or masked by a neighbouring value. Also, the Grubbs test can only be applied to Gaussian data. Healy [17] considered the restrictions of this assumption for clinical chemistry quality control data and concluded “Analysis of National Quality Control Scheme data showed reasonable normality for almost all analytes and methods…but there were sporadic exceptions”.

Dixon's “Q” statistic [2] is a popular parametric alternative to the Grubbs test, also based on the assumption that the data are Gaussian. Dixon's “Q” statistic is also vulnerable to the effects of outlier masking. Iglewicz and Hoaglin [18] “recommend the Dixon tests only for very small samples,” favoring Grubbs-type procedures for Gaussian data instead.
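For reference, a minimal sketch of the simplest Dixon ratio (the r10 or “gap over range” form, appropriate for samples of roughly n = 3 to 7; the function name is illustrative and this does not reproduce Dixon's other ratios for larger n):

```python
def dixon_q(data):
    """Dixon's r10 ratio for the more extreme end of a small sample:
    Q = (gap between the extreme value and its neighbour) / range."""
    x = sorted(data)
    gap_low = x[1] - x[0]      # gap below, if the smallest value is suspect
    gap_high = x[-1] - x[-2]   # gap above, if the largest value is suspect
    return max(gap_low, gap_high) / (x[-1] - x[0])
```

For the sample (1, 2, 3, 10), Q = 7/9 ≈ 0.78, which would then be compared against a tabulated critical value for n = 4 at the chosen significance level.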

Boxplots are available in most statistical software packages and are suitable for the analysis of non-Gaussian data. A thorough discussion of various boxplots can be found in Frigge et al. [19]. The original boxplot version found in Tukey [20] displays a scale for the data, the median, the lower quartile (Q1), the upper quartile (Q3), fences drawn a length k×(Q3−Q1) beyond the quartiles (usually k = 1.5), and outside values defined as observations falling beyond the fences. The boxplot outlier labeling rule is to classify the outside values as outliers. Hoaglin et al. [21] show that for k = 1.5, applying boxplots to non-contaminated Gaussian data results in approximately 0.7% of observations being labelled outliers. However, they also show that for small sample sizes the outside rate may be as high as 10%.
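The labeling rule can be sketched as follows (Python/NumPy; note that np.percentile's default interpolation stands in for Tukey's fourths, which differ slightly in small samples):

```python
import numpy as np

def boxplot_outliers(data, k=1.5):
    """Label observations beyond the fences Q1 - k*IQR and Q3 + k*IQR."""
    x = np.asarray(data, dtype=float)
    q1, q3 = np.percentile(x, [25, 75])
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return x[(x < lower) | (x > upper)]
```

Because the quartiles are resistant to a single wild value, the fences are not inflated by the very observation under suspicion, in contrast to rules based on the sample standard deviation.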

Healy [17] argues that criteria based on fixed thresholds set as a multiple of the sample standard deviation are inadequate for dealing with outliers in clinical chemistry quality control schemes, as an outlier can so inflate the estimated standard deviation that its presence is not detected. To demonstrate the weakness of this approach he constructs an artificial sample of 21 observations, 20 of which are from a normal distribution with population mean 50 and standard deviation 10, and the final value is set equal to 90 (four times the known standard deviation above the population mean). Healy reports a sample mean x̄ = 51.9 and sample standard deviation s = 13.08 and notes that the extreme value at 90 is G = |90 − 51.9|/13.08 = 2.91 “SDs away from the estimated mean and will fail to be detected with a 3-SD cutoff.” Healy completes his analysis by calculating a robust estimate σ* = 11.0 of the standard deviation based on data trimming due to Downton [22], and remarks “the extreme value 90 is now 3.5 standard deviations from the mean and will be detected by a 3.0 SD cutoff”.
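Healy's arithmetic can be checked directly; the summary statistics below are the ones reported in the passage above, not a re-simulation of his sample:

```python
# Reported summary statistics from Healy's artificial sample of 21 observations
extreme, mean = 90.0, 51.9
s_classical = 13.08   # sample SD, inflated by the outlier itself
s_robust = 11.0       # Downton-style robust estimate

z_classical = abs(extreme - mean) / s_classical  # ~2.91: escapes a 3-SD cutoff
z_robust = abs(extreme - mean) / s_robust        # ~3.46: caught by a 3-SD cutoff
```

The same deviation of 38.1 units is judged unremarkable or discordant depending only on which scale estimate divides it, which is the masking effect in miniature.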

One advantage of robust estimates is that they are less influenced by the presence of outliers. This is demonstrated in Healy's analysis, as σ* = 11.0 is closer to the true standard deviation σ = 10.0 than the classical estimate s = 13.08. The justification for using a cutoff fixed at 3 standard deviations is crucial, and this is not addressed in Healy's discussion.

Conclusions

This work presents arguments for discontinuing the widespread practice of using outlier criteria based on fixed multiples of the standard deviation to identify outliers in univariate samples. This issue is particularly relevant in the context of clinical laboratory QC schemes.

References

[1] Krishnan S, Webb S, Henderson AR, Cheung CM, Nazir DJ, Richardson H. An overview of quality control practices in Ontario with particular reference to cholesterol analysis. Clin Biochem 1999;32:93–9.

[2] Dixon WJ. Analysis of extreme values. Ann Math Stat 1950;21:488–506.

[3] R Development Core Team. R: a language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing; 2003. www.r-project.org.

[4] Krutchkoff RG. How to estimate the standard error of an estimate of a simulated probability. J Stat Comput Simul 1980;10:147–53.

[5] Hayes K, Kinsella A. Spurious and non-spurious power in performance criteria for tests of discordancy. J R Stat Soc, Ser D (The Statistician) 2003;52:69–82.

[6] Thompson WA. On a criterion for the rejection of observations and the distribution of the ratio of deviation to sample standard deviation. Ann Math Stat 1935;6:214–9.

[7] Anscombe FJ. Rejection of outliers. Technometrics 1960;2:123–47.
[8] Wright TW. A treatise on the adjustment of observations by the method of least squares. New York: Van Nostrand; 1884.
[9] Goodwin HM. Elements of the precision of measurements and graphical methods. New York: McGraw-Hill; 1913.
[10] Barnett V, Lewis T. Outliers in statistical data. Chichester: Wiley; 1994.
[11] Irwin JO. On a criterion for the rejection of outlying observations. Biometrika 1925;17:238–50.


[12] Grubbs FE. Sample criteria for testing outlying observations. Ann Math Stat 1950;21:27–58.

[13] Pearson ES, Chandra Sekar C. The efficiency of statistical tools and a criterion for the rejection of outlying observations. Biometrika 1936;28:308–20.

[14] Miller JN, Miller JC. Statistics and chemometrics for analytical chemistry. Pearson Prentice Hall; 2005.

[15] Mullins E. Statistics for the quality control chemistry laboratory. Royal Society of Chemistry; 2003.

[16] Murphy RB. On tests for outlying observations. PhD thesis, Princeton University.

[17] Healy MJR. Outliers in clinical chemistry quality control schemes. Clin Chem 1979;25:675–7.

[18] Iglewicz B, Hoaglin DC. How to detect and handle outliers. American Society for Quality Control; 1993.

[19] Frigge M, Hoaglin DC, Iglewicz B. Some implementations of the boxplot. Am Stat 1989;43:50–4.

[20] Tukey JW. Exploratory data analysis. Addison-Wesley; 1977.
[21] Hoaglin DC, Iglewicz B, Tukey JW. Performance of some resistant rules for outlier labeling. J Am Stat Assoc 1986;81:991–9.
[22] Downton F. Linear estimates with polynomial coefficients. Biometrika 1966;53:129–41.