W. Coucke et al. / Clinica Chimica Acta 413 (2012) 582–586

a contaminated Normal distribution. Some authors [8,9], however, claim that the distribution, even in the absence of outliers, may be leptokurtic, i.e. exhibit heavier tails than the Normal distribution. Comparing the two types of estimating approaches for the specific classes of symmetric and unimodal distributions, represented by the Normal and Student's t-distributions, may therefore be of importance.

This study was designed to address three simple questions: (1) What is the false positive rate of Z-score estimation methods in non-contaminated samples from the Normal and Student's t-distributions? (2) What is the true positive rate of Z-score estimation methods in contaminated samples from the same distributions? (3) What are the accuracy and precision of the different variability estimators for the Normal distribution?

    2. Materials and methods

A total of 1000 random samples was generated from a Normal distribution with mean and standard deviation arbitrarily set at μ = 10 and σ = 0.5. Data were generated for sample sizes ranging from n = 3 to 20. Subsequently, to obtain data from a leptokurtic distribution, similar simulations were performed using a Student's t-distribution with 5 degrees of freedom. Only samples for which all values were within the interval [μ − 3σ, μ + 3σ] were retained. Next, the samples were contaminated by adding an outlier at μ + 3σ, at μ + 5σ and at μ + 7σ separately, resulting in a sample of size n + 1. Samples of size n = 3 without added outliers were not taken into account.
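For illustration, a minimal Python sketch of this sampling scheme (not the authors' code) is given below; the location-scale parameterisation of the Student's t-distribution and the rejection step for values outside [μ − 3σ, μ + 3σ] are assumptions based on the description above.

```python
import numpy as np

MU, SIGMA = 10.0, 0.5

def draw_sample(n, dist="normal", rng=None):
    """Draw one sample of size n whose values all lie within [MU - 3*SIGMA, MU + 3*SIGMA]."""
    rng = np.random.default_rng() if rng is None else rng
    while True:
        if dist == "normal":
            x = rng.normal(MU, SIGMA, size=n)
        else:
            # Assumed parameterisation: location-scale Student's t with 5 degrees of freedom.
            x = MU + SIGMA * rng.standard_t(5, size=n)
        if np.all(np.abs(x - MU) <= 3 * SIGMA):
            return x

def contaminate(x, k):
    """Append a single outlier at MU + k*SIGMA, giving a sample of size n + 1."""
    return np.append(x, MU + k * SIGMA)

rng = np.random.default_rng(1)
clean = draw_sample(6, "normal", rng)
samples = {k: contaminate(clean, k) for k in (3, 5, 7)}
```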

Z-scores were calculated on each sample following five different approaches. The first approach used the Grubbs test [10,11] to remove outliers in a first step. This test calculates the distance between the most extreme point and the centre of the distribution. If this distance is too large with respect to the standard deviation, the point is flagged as an outlier. The test uses a predefined false alarm rate, which was kept small (α = 0.05). If an outlier was found, the test was repeated on the rest of the sample, using the same predefined level α, until no outlier remained. In a second step, Z-scores were calculated from the classical average and standard deviation of the data that were not marked as outliers in the first step.
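A minimal sketch of this two-step Grubbs procedure is shown below; it is not the authors' implementation, and the two-sided critical value is taken from the usual Student's t-based formula for the Grubbs test.

```python
import numpy as np
from scipy import stats

def grubbs_critical(n, alpha=0.05):
    # Two-sided critical value for the Grubbs test, derived from a Student's t quantile.
    t = stats.t.ppf(1 - alpha / (2 * n), n - 2)
    return (n - 1) / np.sqrt(n) * np.sqrt(t**2 / (n - 2 + t**2))

def grubbs_zscores(x, alpha=0.05):
    """Iteratively remove Grubbs outliers, then compute classical Z-scores from the rest."""
    x = np.asarray(x, dtype=float)
    kept = x.copy()
    while kept.size >= 3:
        g = np.abs(kept - kept.mean()) / kept.std(ddof=1)
        i = int(np.argmax(g))
        if g[i] > grubbs_critical(kept.size, alpha):
            kept = np.delete(kept, i)   # drop the most extreme value and repeat
        else:
            break
    return (x - kept.mean()) / kept.std(ddof=1)
```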

The second approach used the Dixon test [12] to remove outliers in a first step. The method is based on the calculation of the range between the lowest and highest sample values, and of sub-ranges between the most extreme sample values on either side. Like the Grubbs test, it relies on a hypothesis test; here too, outliers were removed until the null hypothesis of absence of outliers was accepted (α = 0.05), and Z-scores were subsequently obtained using the classical average and standard deviation of the data that were not removed.
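As an illustration only, the sketch below computes the Dixon ratio for the smallest sample sizes (the r10, or Q, statistic: gap divided by range); the corresponding critical value is not reproduced in the paper and must be supplied from published tables, and larger samples would require the other Dixon ratios.

```python
import numpy as np

def dixon_r10(x):
    """Dixon r10 (Q) ratios for the lowest and the highest value: gap / range."""
    s = np.sort(np.asarray(x, dtype=float))
    rng = s[-1] - s[0]
    return (s[1] - s[0]) / rng, (s[-1] - s[-2]) / rng

def dixon_flags_extreme(x, critical_value):
    """Flag the most extreme value when its Dixon ratio exceeds the tabulated critical value."""
    low, high = dixon_r10(x)
    return max(low, high) > critical_value
```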

The third approach, often called the Tukey approach [13,14], calculates a robust estimator of scale by dividing the interquartile distance of the sample by the interquartile distance of a standard Normal distribution (D = 1.34898) and uses the median as an estimator of the centre of the distribution.
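A minimal sketch of this Z-score calculation (the choice of quantile interpolation is an assumption, since the paper does not specify it):

```python
import numpy as np

def tukey_zscores(x, d=1.34898):
    """Z-scores from the median and the interquartile range rescaled by the Normal IQR (D)."""
    x = np.asarray(x, dtype=float)
    q1, q3 = np.percentile(x, [25, 75])
    scale = (q3 - q1) / d          # robust estimate of the standard deviation
    return (x - np.median(x)) / scale
```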

The fourth approach uses Qn [15,16] as a robust estimator of scale and the median as an estimator of the centre. The Qn estimator is approximately the median value of all pairwise differences between the values, rescaled with a fixed value D to reflect the standard deviation of a Normal distribution.
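A sketch of the Qn estimator following the definition of Rousseeuw and Croux [15] is given below; the consistency constant 2.2219 and the omission of finite-sample correction factors are simplifications, not the authors' exact implementation.

```python
import numpy as np
from itertools import combinations

def qn_scale(x, d=2.2219):
    """Qn scale estimator: the k-th smallest pairwise absolute difference, rescaled by d."""
    x = np.asarray(x, dtype=float)
    n = x.size
    diffs = sorted(abs(a - b) for a, b in combinations(x, 2))
    h = n // 2 + 1
    k = h * (h - 1) // 2
    return d * diffs[k - 1]

def qn_zscores(x):
    x = np.asarray(x, dtype=float)
    return (x - np.median(x)) / qn_scale(x)
```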

Finally, the robust estimators of scale and centre according to ISO 13528 were calculated. Algorithm A of ISO 13528 [2] is based on calculating the classical average and standard deviation of a Winsorized sample. Winsorizing, i.e. replacing values beyond a certain limit by the limit itself, was applied to values deviating by more than 1.5 (δ) standard deviations from the centre.
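The sketch below follows the commonly described form of Algorithm A (median and scaled MAD as starting values, then an iterated Winsorized mean and 1.134-scaled standard deviation); ISO 13528 [2] remains the normative reference.

```python
import numpy as np

def iso_algorithm_a(x, delta=1.5, tol=1e-6, max_iter=100):
    """Robust centre and scale via iterated Winsorization (sketch of ISO 13528 Algorithm A)."""
    x = np.asarray(x, dtype=float)
    centre = np.median(x)
    scale = 1.483 * np.median(np.abs(x - centre))
    for _ in range(max_iter):
        lo, hi = centre - delta * scale, centre + delta * scale
        w = np.clip(x, lo, hi)   # Winsorize: replace values beyond the limits by the limit itself
        new_centre, new_scale = w.mean(), 1.134 * w.std(ddof=1)
        converged = abs(new_centre - centre) < tol and abs(new_scale - scale) < tol
        centre, scale = new_centre, new_scale
        if converged:
            break
    return centre, scale

def iso_zscores(x, delta=1.5):
    centre, scale = iso_algorithm_a(x, delta)
    return (np.asarray(x, dtype=float) - centre) / scale
```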

The ability of the various approaches to flag outliers when they exist and not to flag them when they do not exist can be assessed in a way similar to the evaluation of diagnostic tests. For this purpose, the Negative Predictive Value (NPV) and Positive Predictive Value (PPV) were calculated for each approach by varying a specific parameter that can be changed to over- or under-estimate the standard deviation and, as a consequence, respectively decrease or increase the number of Z-scores above 3. The NPV was calculated as the ratio between the True Negatives (i.e. the samples to which no outlier was added and that showed no Z-score beyond 3) and the number of samples for which no Z-score beyond 3 was found (= True + False Negatives). Likewise, the PPV was calculated as the ratio between the True Positives (i.e. the samples to which an outlier was added and that showed a Z-score beyond 3) and the number of samples for which a Z-score beyond 3 was found (= True + False Positives).
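As a small illustration of this bookkeeping (function and variable names are ours, not the authors'), NPV and PPV can be tallied over the simulated clean and contaminated samples as follows, treating any Z-score beyond 3 as a flag.

```python
import numpy as np

def npv_ppv(zscore_fn, clean_samples, contaminated_samples, threshold=3.0):
    """NPV and PPV of a Z-score method over lists of clean and contaminated samples."""
    flagged_clean = sum(np.any(np.abs(zscore_fn(s)) > threshold) for s in clean_samples)
    flagged_cont = sum(np.any(np.abs(zscore_fn(s)) > threshold) for s in contaminated_samples)

    true_neg = len(clean_samples) - flagged_clean          # clean samples with no flag
    false_neg = len(contaminated_samples) - flagged_cont   # contaminated samples that were missed
    true_pos = flagged_cont                                # contaminated samples that were flagged
    false_pos = flagged_clean                              # clean samples that were wrongly flagged

    npv = true_neg / (true_neg + false_neg)
    ppv = true_pos / (true_pos + false_pos)
    return npv, ppv
```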

For the Grubbs- and Dixon-based approaches, the P-value for which outliers are excluded (α) was varied. For the Tukey and Qn approaches, D was varied: lower values of D result in lower standard deviations, higher Z-scores and hence a higher Z-citation rate. For the ISO 13528 approach, δ was varied. NPV and PPV for each value of the varying parameter were recorded and displayed graphically. Finally, for each series of samples simulated from the Normal distribution, the variability estimate obtained by every approach was recorded and its mean and standard error were calculated.

    3. Results

     3.1. False positives

A representative part of the False Positive (FP) rates obtained is depicted in the upper part of Table 1 (no outlier). Among all approaches, the Tukey method showed the most distinctive behaviour: while the FP rates of the other approaches were below 15% for the Normal distribution and below 30% for the Student's t-distribution, Tukey's approach had a rate above 20% for almost all sample sizes. The Dixon and ISO approaches showed the lowest FP rates. In addition, for samples of size 6 or larger, the FP rate of each approach (except ISO) stabilised for the Normal distribution. By contrast, all FP rates increased with increasing sample size for the Student's t-distribution.

     3.2. True positives

The True Positive (TP) rates when adding an outlier at a distance of μ + 3σ or μ + 5σ are shown in Table 1. For all outlier distances, the differences between the approaches were similar for the Normal and Student's t-distributions. For outliers at μ + 3σ, none of the approaches was able to flag the outliers in more than half of the cases for any sample size. Tukey's approach had the highest performance, reaching a flagging rate of nearly 50% as soon as the sample size was 6 or larger. The other approaches performed much more weakly: the ISO approach had flagging rates below 10% for very small samples, and all other approaches exhibited outlier-finding rates of roughly 10–30%. In addition, these results point to a clear improvement of flagging rates for all approaches with increasing sample size and outlier distance, with a probability of detection close to 100% for outliers at μ + 7σ. The ISO and Dixon approaches, however, still performed weakly for very small sample sizes.

     3.3. Negative and positive predictive value

The results of the NPV and PPV for sample size n = 6 are shown in Fig. 1. As in Receiver Operating Characteristic (ROC) analysis, the perfect approach, which would flag no Z-scores larger than 3 when no outlier exists (negative prediction) and would flag them all when outliers do exist (positive prediction), would correspond to a curve made up of a vertical line coinciding with the Y-axis and a horizontal line intersecting the Y-axis at the value 1. The further the curve departs from this perfect curve, the worse the performance of the approach. For outliers at μ + 3σ, the curves were located far from the ideal line, so that NPV and PPV did not reach high levels for any of the approaches: only a combination of positive and negative predictive values of about 60% was feasible, and although the Grubbs approach tended to perform better, there was not much difference between the approaches. For increasing outlier distance, however, the curves tend towards the perfect curve. The Qn approach consistently performed the worst.


The outlier-search algorithms showed a slightly better performance, mainly for outliers at a moderate distance from the centre (μ + 5σ). There was almost no difference between the results for data generated from the Normal and from the Student's t-distribution.

A similar trend was seen for sample size n = 8 (Fig. 2). All algorithms exhibited a weak performance for outliers at μ + 3σ. For outliers at μ + 5σ, however, the Grubbs approach performed better than the other approaches. This difference became less clear for more distant outliers, where all approaches showed almost perfect positive and negative predictive values. Focussing on the Grubbs approach, the search for the optimal P-value for excluding outliers (α) was made for different combinations of sample size and outlier distance. The optimal α decreased as outliers became more distant and as the sample size increased. For outliers at a small distance from the distribution, the optimal α was 0.2 for all sample sizes. This value decreased when the outlier was further away from the distribution (0.02–0.1 for an outlier at μ + 5σ, 0.007–0.06 for an outlier at μ + 7σ).

     3.4. Variability and bias of standard deviation

Results concerning the variability and bias of the estimated standard deviations are depicted in Table 2. In the absence of outliers, the Tukey and outlier-search-based approaches showed a larger distance between the estimated standard deviation and the actual population value of 0.5, consistently underestimating it. The reverse occurred when outliers were present: the Tukey, Dixon and Grubbs approaches had the best accuracy, with the latter performing better when outliers became more distant. The Qn and ISO approaches tended to overestimate the standard deviation consistently in the presence of outliers and for all sample sizes. Precision was similar for all approaches and increased with increasing sample size.

Table 1
False and true outlier rates, expressed as percentages, for the five different approaches and a representative selection of the investigated sample sizes. False outlier rates are shown in the 'no outlier' rows, true outlier rates in the other rows.

Sample   Outlier      Normal distribution                     Student's t distribution
size     distance     Grubbs  Dixon  Tukey  ISO    QN         Grubbs  Dixon  Tukey  ISO    QN
5        no outlier   10.9    5.7    29.6   4.2    15         12.8    6.7    33.5   5.3    18.7
6        no outlier   8.1     4.0    22.0   4.6    8.1        15.1    9.1    29.1   9.7    14.8
7        no outlier   10.2    4.0    20.7   5.0    10.0       15.0    9.0    28.0   9.2    15.9
8        no outlier   9.3     4.2    20.8   4.7    8.5        18.2    10.0   32.2   11.8   14.8
20       no outlier   9.8     4.5    21.2   8.4    9.1        31.6    24.3   46.6   32.2   31.5
5        μ+3σ         21.2    11.9   52.1   8.0    25.4       19.9    10.2   44.7   7.9    23.1
6        μ+3σ         25.7    13.9   44.7   13.1   20.3       22.7    13.6   42.5   12.1   17.6
7        μ+3σ         30.0    15.5   47.0   15.3   24.7       22.6    12.9   41.4   13.8   23.3
8        μ+3σ         30.9    13.7   46.5   16.5   20.9       22.4    12.0   40.6   13.9   18.4
20       μ+3σ         35.9    17.3   50.7   28.6   28.5       20.5    11.0   43.2   19.7   19.9
5        μ+5σ         60.0    37.1   85.8   19.5   52.3       51.5    33.4   78.9   17.4   46.5
6        μ+5σ         73.4    52.9   86.6   43.8   56.1       67.7    47.6   83.4   39.0   49.7
7        μ+5σ         83.2    62.2   91.3   59.2   64.2       73.2    53.2   83.8   50.9   57.7
8        μ+5σ         89.7    60.0   90.1   70.3   69.2       79.4    48.8   85.8   59.2   60.9
20       μ+5σ         100     95.2   99.0   99.3   98.5       99.2    81.1   96.7   93.8   92.9
5        μ+7σ         87.7    66.6   97.7   35.2   76.7       80.1    58.7   95.3   29.7   70.0
6        μ+7σ         96.7    85.1   99.1   76.9   83.5       93.8    78.5   98.2   71.6   82.0
7        μ+7σ         99.5    93.1   99.1   91.9   91.2       97.4    86.3   98.3   84.1   83.2
20       μ+7σ         100     100    100    100    100        100     100    100    100    100

[Fig. 1 shows four panels (Normal and Student's t distributions; outlier at 3σ and at 5σ), each plotting the Positive predictive value (y-axis, 0–1) against the Negative predictive value (x-axis) for the Grubbs, Dixon, Tukey, ISO and Qn approaches.]

Fig. 1. Negative and positive predictive values for the five different approaches, based on samples of size n = 6, for Normal and Student's t-distributions.



    4. Discussion

The findings of the present study illustrate that, as far as symmetric unimodal distributions are concerned, the behaviour of the different approaches for estimating Z-scores does not really depend on the kurtosis (peakedness) of the distribution: similar performances were found for the data generated from the Normal and from the Student's t-distribution. Although the Normal and t-distributions cover a wide range of distributions that describe data reported in EQA surveys, distributions may be multimodal or exhibit skewness in some cases, and the Z-scores may then become unreliable. Unimodality is a prerequisite for obtaining reliable Z-scores and, in the light of the presence of matrix effects [17], the performance of a laboratory should be assessed with respect to its peers by so-called peer-group comparisons; this can only be assured by grouping data according to equal or similar methodology. As a result, peer groups may be small. For example, half of the peer groups in Belgian EQA programmes for chemistry and immunoassays contain 10 laboratories or fewer.

Apart from avoiding multimodality by the EQA set-up, post hoc controls for unimodality and symmetry may be applied as well. Formal tests have been described to test whether the data are unimodal [18]. They are based on kernel density estimation, and standard errors and significance of multimodality can be obtained by bootstrapping. In addition, asymmetry of the data distribution can be assessed by measuring skewness after removing spurious results.
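As a rough illustration of such post hoc checks (this is not the formal bootstrap test of [18]; the grid size and the use of the default bandwidth are arbitrary choices), one could count the modes of a kernel density estimate and report the sample skewness:

```python
import numpy as np
from scipy import stats

def count_kde_modes(x, grid_points=512):
    """Count local maxima of a Gaussian kernel density estimate (rough unimodality check)."""
    x = np.asarray(x, dtype=float)
    kde = stats.gaussian_kde(x)
    grid = np.linspace(x.min(), x.max(), grid_points)
    dens = kde(grid)
    # A mode is a grid point that is higher than both of its neighbours.
    return int(np.sum((dens[1:-1] > dens[:-2]) & (dens[1:-1] > dens[2:])))

def sample_skewness(x):
    """Sample skewness; values far from 0 suggest an asymmetric distribution."""
    return float(stats.skew(np.asarray(x, dtype=float)))
```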

Regarding PPV, we observed that, for small sample sizes (n < 10) and for outliers close to the centre, the Tukey and, to a lesser extent, the Grubbs approaches performed better than the other ones. Note that the ISO approach has, in comparison with the other approaches, a low outlier-finding capacity for sample sizes below 10. There is, however, not much difference between the various approaches when the sample size increases and/or when outliers are located further away from the centre, so that the question of which approach to select for Z-scores only needs to be addressed for small sample sizes.

[Fig. 2 shows the same four panels as Fig. 1 (Normal and Student's t distributions; outlier at 3σ and at 5σ), plotting the Positive predictive value against the Negative predictive value for the Grubbs, Dixon, Tukey, ISO and Qn approaches.]

Fig. 2. Negative and positive predictive values for the five different approaches, based on samples of size n = 8, for Normal and Student's t-distributions.

Table 2
Standard error and mean of the variability estimate obtained by the different approaches, for a representative selection of the investigated sample sizes. Better estimates have a lower standard error (left part of the table) and a mean as close as possible to the original value of σ, which was set to 0.5 in our simulations (right part of the table). Results are obtained from the Normal distribution.

Sample   Outlier        Standard error of estimated standard deviation   Mean of estimated standard deviation
size     distance (σ)   Grubbs  Dixon  Tukey  ISO    QN                  Grubbs  Dixon  Tukey  ISO    QN
5        0              0.424   0.423  0.454  0.475  0.579               0.478   0.490  0.430  0.553  0.571
6        0              0.394   0.394  0.412  0.442  0.486               0.483   0.492  0.438  0.541  0.544
7        0              0.393   0.390  0.415  0.437  0.508               0.484   0.496  0.455  0.538  0.553
8        0              0.380   0.377  0.397  0.418  0.445               0.484   0.492  0.451  0.530  0.531
20       0              0.290   0.287  0.346  0.315  0.319               0.488   0.493  0.484  0.511  0.510
5        3              0.561   0.529  0.559  0.585  0.798               0.732   0.763  0.549  0.857  0.846
6        3              0.526   0.494  0.513  0.562  0.639               0.686   0.721  0.559  0.771  0.771
7        3              0.504   0.471  0.477  0.528  0.627               0.658   0.696  0.546  0.726  0.737
8        3              0.477   0.441  0.482  0.501  0.542               0.634   0.673  0.548  0.682  0.700
20       3              0.339   0.316  0.350  0.331  0.339               0.547   0.566  0.508  0.553  0.566
5        5              0.845   0.847  0.559  0.859  0.958               0.822   0.982  0.549  1.168  0.921
6        5              0.741   0.783  0.513  0.724  0.761               0.685   0.825  0.559  0.871  0.837
7        5              0.645   0.720  0.477  0.599  0.678               0.605   0.740  0.546  0.760  0.760
8        5              0.565   0.681  0.482  0.538  0.596               0.547   0.726  0.548  0.697  0.727
20       5              0.281   0.346  0.350  0.331  0.344               0.486   0.505  0.508  0.553  0.568
5        7              0.940   1.118  0.559  1.128  0.969               0.703   1.001  0.549  1.367  0.923
6        7              0.663   0.896  0.513  0.756  0.770               0.535   0.718  0.559  0.881  0.839
7        7              0.461   0.717  0.477  0.600  0.678               0.490   0.594  0.546  0.760  0.760
8        7              0.393   0.726  0.482  0.538  0.596               0.481   0.618  0.548  0.697  0.727
20       7              0.281   0.278  0.35   0.331  0.344               0.486   0.489  0.508  0.553  0.568


For the NPV, Tukey's approach demonstrated the worst performance, so that, in line with its underestimation of variability in the absence of outliers, this approach has a much higher flagging rate than the other approaches, regardless of the contamination of the sample. Further, for leptokurtic data, more values will be wrongly flagged as the sample size increases; this is easily explained by the higher frequency of data in the tails of the distribution. This finding runs counter to Thienpont's suggestion [19] to make the threshold for flagging Z-scores dependent on the sample size. The explanation lies in the fact that all tests assume a Normal distribution, and increasing the threshold value with decreasing sample size would only work for normally distributed data [20]. Nevertheless, changing the threshold has an inverse effect on the NPV and PPV, and it is therefore important to consider NPV and PPV together. The analysis of NPV and PPV shows that the difference between the algorithms disappears with increasing sample size and for outliers further away from the centre. For outliers relatively close to the centre and for smaller sample sizes, however, the outlier-search-based algorithms tend to perform better than the robust algorithms.

When the estimated standard deviation is not only used for Z-scores but also for a follow-up of the performance of the different peer groups, this standard deviation appears to be overestimated by every approach when outliers are present at a small distance from the centre. This can be explained by the low performance of all the algorithms with respect to outliers relatively close to the centre. We see, however, that here too the outlier-search-based algorithms perform better than the robust algorithms when the outliers are more distant from the centre of the distribution. The low efficacy of robust estimators for sample sizes up to 6 has already been pointed out by Rousseeuw [21]. For the particular objective of this study, it can be added that robust estimators also underperform for samples of larger size, and that the stability of the estimated standard deviation is quite similar across the different approaches.

When considering NPV, PPV and the bias of the variability estimators together, we would recommend the outlier-search-based algorithms over the robust approaches, certainly when sample sizes are small. If, however, a robust approach is preferred, we would recommend the Tukey approach for its simplicity, its unbiased estimator of variability and its high flagging rate when outliers are present, although its relatively low negative predictive value may make it useless for punitive EQA programmes.

To answer the question of the minimal sample size of a peer group before its members can be evaluated, two antagonistic arguments must be weighed. Firstly, NPV, PPV and the accuracy of the estimated standard deviation increase with increasing sample size, and hence larger sample sizes are preferred. Secondly, when only large peer groups are evaluated, many laboratories will escape evaluation; from this perspective, smaller sample sizes are preferred. In our opinion, the Grubbs and Tukey approaches find the best compromise between these antagonistic arguments with a minimal sample size of 6, which is in line with previously published results [5]. The Grubbs approach should be applied with a high α (0.2) when sample sizes are small and a lower α (0.01–0.02) when sample sizes are larger (n ≥ 10). If the EQA organiser favours the ISO approach, we would definitely not recommend it for sample sizes below 10.

In conclusion, this study focussed on small sample sizes with one outlier added. When sample sizes increase and the probability of encountering multiple outliers becomes high, the outlier-searching algorithms applied here may suffer from masking effects when the data contain more outliers, i.e. the presence of an outlier may escape notice if a larger outlier is present. In this case, masking-free modifications of the Grubbs and Dixon tests may be applied [22,23].

    References

[1] Plebani M. External quality assessment programs: past, present and future. Jugoslav Med Biohem 2005;24:201–6.
[2] International Organization for Standardization. ISO 13528:2005. Statistical methods for use in proficiency testing by interlaboratory comparisons; 2005.
[3] Thompson M, Ellison SLR, Wood R. The International Harmonised Protocol for the proficiency testing of analytical chemistry laboratories. Pure Appl Chem 2006;78:145–96.
[4] Shiffler RE. Maximum Z scores and outliers. Am Stat 1988;42:79–80.
[5] Hund E, Massart DL, Smeyers-Verbeke J. Inter-laboratory studies in analytical chemistry. Anal Chim Acta 2000;423:145–65.
[6] Healy M. Outliers in clinical chemistry quality-control schemes. Clin Chem 1979;25:675–7.
[7] Rocke D. Robust statistical analysis of interlaboratory studies. Biometrika 1983;70:421–31.
[8] Heydorn K. The distribution of interlaboratory comparison data. Accredit Qual Assur 2008;13:723–4.
[9] Duewer DL. The distribution of interlaboratory comparison data: response to the contribution by K. Heydorn. Accredit Qual Assur 2008;13:725–6.
[10] Grubbs FE. Procedures for detecting outlying observations in samples. Technometrics 1969;11:1–21.
[11] Rosario P, Martínez JL, Silván JM. Comparison of different statistical methods for evaluation of proficiency test data. Accredit Qual Assur 2008;13:493–9.
[12] Dixon WJ. Analysis of extreme values. Ann Math Stat 1950:488–506.
[13] Tukey JW. Exploratory data analysis. Reading, MA: Addison-Wesley; 1977.
[14] Sciacovelli L, Secchiero S, Zardo L, Plebani M. External Quality Assessment Schemes: need for recognised requirements. Clin Chim Acta 2001;309:183–99.
[15] Rousseeuw PJ, Croux C. Alternatives to the median absolute deviation. J Am Stat Assoc 1993;88(424):1273–83.
[16] Wilrich P-T. Robust estimates of the theoretical standard deviation to be used in interlaboratory precision experiments. Accredit Qual Assur 2007;12:231–40.
[17] Miller WG. Specimen materials, target values and commutability for external quality assessment (proficiency testing) schemes. Clin Chim Acta 2003;327:25–37.
[18] Lowthian PJ, Thompson M. Bump-hunting for the proficiency tester – searching for multimodality. Analyst 2002;127:1359–64.
[19] Thienpont LMR, Steyaert HLC, De Leenheer AP. A modified statistical approach for the detection of outlying values in external quality control: comparison with other techniques. Clin Chim Acta 1987;168:337–46.
[20] Zhou Q, Xu J, Xie W, Li S, Li X. Use of robust ZB and ZW to evaluate proficiency testing data. Clin Chim Acta 2011;412:936–9.
[21] Rousseeuw PJ, Verboven S. Robust estimation in very small samples. Comput Stat Data Anal 2002;40:741–58.
[22] Rosner B. On the detection of many outliers. Technometrics 1975;17:221–7.
[23] Jain RB. A recursive version of Grubbs' test for detecting multiple outliers in environmental and chemical data. Clin Biochem 2010;43:1030–3.
