
Method validation

Dietmar Stöckl

With Confidence

• Performance specifications

• Experimental protocols

• Statistical interpretation

• EXCEL® Files


STT Consulting

Dietmar Stöckl, PhD

Abraham Hansstraat 11, B-9667 Horebeke, Belgium

e-mail: [email protected] + FAX: +32/5549 8671

Copyright: STT Consulting 2007


Content

Introduction

Materials

Validation protocols
• Imprecision
• Limit of detection (LoD)
• Working range
• Linearity model 1
• Linearity model 2, accuracy protocol (= accuracy of calibration curve)
• Recovery model 1 (paired sample protocol: spike and control)
• Recovery model 2 (accuracy protocol: sample with target value)
• Interference
• Method comparison

Annex
• Summary of protocols, statistics & graphics
• System stability, ruggedness and multifactor protocols
• Glossary of terms


Introduction

WHAT is validation?

Validation is the confirmation, through the provision of objective evidence, that requirements for a specific intended use or application have been fulfilled (ISO 9000).

We see, from this definition, that we have to
• specify the intended use of a method,
• define performance requirements,
• provide data from validation experiments (objective evidence), and
• interpret the validation data (confirmation that requirements have been fulfilled).

WHICH type of performance requirements (specifications) exist?

Performance requirements can be statistical, analytical, or

application-driven/regulatory.

Statistical and analytical specifications are most useful for method evaluation.

Application-driven/regulatory specifications are used for validation. Some

examples are given in the table below.

WHICH performance characteristics exist?

We have seen that we have to specify performance requirements for a validation.

These requirements refer to the following performance characteristics of an analytical method:
• Imprecision
• Limit of detection
• Working range
• Linearity
• Recovery
• Interference/Specificity
• Total error (method comparison)
• [Robustness/Ruggedness]: will not be addressed in this book.

Performance requirements (specifications)

Statistical
• t-test: P ≥ 0.05
• F-test: P ≥ 0.05

Analytical
• Bias ≤ calibration tolerance
• CV ≤ stable CV

Application-driven#
• Bias ≤ 3%
• CV ≤ 3%

#Cholesterol (National Cholesterol Education Program)

WHICH experiments do we have to perform?

The experiments we have to perform depend on the performance characteristic

we want to validate. For the estimation of method imprecision, for example, we

need to perform repeated measurements with a stable sample. However, there is

no agreement over the various application fields of analytical methods about the

design of such experiments. In this book, we will mainly refer to the experimental

protocols from the Clinical and Laboratory Standards Institute (CLSI). The table

below gives an overview of typical experiments to be performed during a

method validation study.

These experiments will be described in detail in the following chapters of the

book.

Performance characteristic | Samples | Measurements
Imprecision | IQC samples; no target | n = 20 (repetition over several days)
LoD/LoQ | Blank; low sample | n = 20 (repetition over several days)
Linearity | 5 related samples/calibrators (mix); no target | n = 4 (repetition within day)
Working range | See: Imprecision/Linearity |
Interference | Samples: interferent spike & control (no target) | n = 4 (repetition within day)
Recovery (Accuracy/Trueness) | Samples: known analyte spike & control, or certified reference materials (CRM) | n = 4 - 5 (repetition over several days)
Total error (method comparison) | 40 samples (target by reference method) | n = 1 or 2 (measurement in one or several days)

IQC: Internal Quality Control; LoD: limit of detection; LoQ: limit of quantitation

HOW do we make decisions?

When we have created data, we have to decide whether they fulfill the requirements that have been selected for the application of the method "for a specific intended use". Currently, it is common practice to make decisions without considering confidence intervals or statistical significance testing. Modern interpretation of analytical data, however, requires the use of confidence intervals/statistical significance testing. These two approaches are compared in the table below for the case of a recovery experiment.

In the “old” approach, we compare one “naked” number with the specification. This approach ignores the number of measurements that have been performed and the imprecision of the method. If we repeated the validation, we could easily obtain a recovery estimate of 80%, for example. Therefore, decision-making should be statistics-based, either by applying a formal statistical test or by interpreting the confidence interval of an experimental estimate.

Statistics-based decision – Importance of the “test-value” (= requirement, specification)

When we make statistics-based decisions, the selection of the test value will depend on the type of requirement we apply (statistical, analytical, validation).

Statistical
- Statistical test versus null hypothesis (F-test, t-test, 95% confidence intervals, …): Bias = 0; Slope = 1; Intercept = 0; etc.

Analytical
- Statistical test versus estimate of stable performance (F-test, t-test, 95% confidence intervals, etc.): Bias ≤ calibration tolerance; etc.

Validation case (application-driven; “specific intended use”)
- Statistical test versus validation limit (F-test, t-test, 95% confidence intervals, etc.): CVexp ≤ CVmax; Biasexp ≤ Biasmax; etc.

Nevertheless, in all three situations, we apply the same type of statistical tests.

Decision making approaches

“Old”
• Experimental recovery: 90%
• Limit: 85 – 115%
• Decision: passed

“Modern”
• Experimental recovery: 90%
• Confidence interval: ± 11% (with n = 4 and CV = 7%)
• Limit: 85 – 115%
• Decision: fail (90 – 11 = 79%, which is below the lower limit of 85%)
• Action: increase n or reduce CV
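The "modern" decision in the recovery example can be reproduced with a few lines of Python. This is a minimal sketch, not the calculation of the EXCEL file: it assumes a two-sided 95% interval based on the t-distribution with n - 1 = 3 degrees of freedom (t = 3.182, from a standard t-table), and it applies the CV directly in recovery percentage points, as the example does. The function name and its parameters are illustrative.

```python
import math

# Two-sided 95% Student t critical value for 3 degrees of freedom
# (standard t-table; using a two-sided interval is an assumption).
T_TWO_SIDED_95_DF3 = 3.182


def recovery_decision(recovery_pct, cv_pct, n, t_crit, low, high):
    """Return (half_width, lower, upper, passed) for a recovery estimate."""
    half_width = t_crit * cv_pct / math.sqrt(n)
    lower = recovery_pct - half_width
    upper = recovery_pct + half_width
    # Pass only if the whole confidence interval lies inside the limits.
    passed = low <= lower and upper <= high
    return half_width, lower, upper, passed


half_width, lower, upper, passed = recovery_decision(
    90.0, 7.0, 4, T_TWO_SIDED_95_DF3, 85.0, 115.0
)
print(f"CI: +/-{half_width:.1f}%  lower: {lower:.1f}%  passed: {passed}")
```

With n = 4 and CV = 7% the half-width is about 11%, so the lower confidence limit (about 79%) falls below 85% and the experiment fails, although the point estimate of 90% lies within the limits.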


Interpretation of 95%-confidence limits

Confidence limits and quality specifications

The figure below shows a graphical interpretation of 95%-confidence limits versus

a predefined quality specification: "10".

Note

When comparing an estimate with a specification, usually, the confidence limits

are constructed 1-sided.

1. Interpretation of the cases A – D when the specification is a limit

A: "In", the specification is satisfied with 95% probability.

B: Not "In" with 95% probability; more data may help.

C: Not "In" with 95% probability, but also not out with 95% probability.

D: "Out"

2. Interpretation when the number characterizes a stable process

If the "number" is the typical performance of a stable process, situation C can still

be accepted.

C: Look at lower limit: Not "Out" with 95% probability.

This situation is applied in the EP 5 protocol to investigate whether the user CV is

different from the typical manufacturer CV.

[Figure: 95%-confidence limits of cases A – D relative to the specification "10", read either as (1) a limit or (2) a typical performance]

SUMMARY

For a successful validation, we need performance specifications, experimental

protocols, and statistical interpretation of the data. The whole exercise, however,

should be carefully planned, including the samples needed, the foreseen internal

quality control, and the documentation of the results. A validation plan should

consider (at least) the following elements.

Validation plan
• Define the application, purpose and scope of the method
• Define performance characteristics and acceptance criteria
• Develop a validation protocol or operating procedure for the validation
• Qualify materials, e.g. standards, reagents, and samples
• Perform validation experiments
• Document validation experiments and results in the validation report
• Interpret the validation data and make statistics-based decisions

In the book, the following validation example will be used.

Measurand

Amount-of-substance concentration of glucose in serum

S-glucose: mmol/L (adult reference interval: 3.9 – 5.8 mmol/L).

Specific intended use

For in vitro diagnostic purposes.

Performance specifications

Performance characteristic | Specification
Imprecision | Within-run: 1.5%#; Total: 3%#
LoD | 0.1 mmol/L
Working range | 0.1-42 mmol/L
Linearity | 0.1-42 mmol/L; Limit: 5%
Recovery | Limit: 5%
Interference | Limit: 10%
Total error – Method comparison | Limit, Bias: 3%; Total error: 10%

#Note: typical values for stable process; not meant as limit!

Data simulation

Most data are simulated with an assumed method CV of 1-2% (within-run) and 3% (total).

Materials


Instrument XYZ

Standard, Lot#

Reagent, Lot#

Imprecision (CLSI EP5) and IQC during experiments

Low IQC material : 3.9 mmol/L

High-normal IQC material : 5.9 mmol/L

High IQC material : 8.5 mmol/L

LoD, dilutions, "adaptation of control" (CLSI EP17)

Isotonic saline solution (= Blank) : 0 mmol/L

Linearity, experiment 1 (CLSI EP6)

Low sample 1 : 3.0 mmol/L

High sample 1 : 7.0 mmol/L

Linearity, experiment 2 ("manufacturer protocol": accuracy)

Spiked “Blank” : 45.0 mmol/L

Recovery and Interference (CLSI EP7)


Normal sample : 4.8 mmol/L


Glucose solution in isotonic NaCl : 30.0 mmol/L

Bilirubin solution in isotonic NaCl : 600 mg/dL

Low sample 2 spiked with bilirubin : 60 mg/dL

Recovery (Accuracy)

Standard 1 : 4.5 mmol/L



Method comparison (CLSI EP9)

40 native samples : various


Imprecision

Graphics
• Dot plot
• Histogram

Statistics
• Descriptive statistics: Dispersion
• Gaussian ("Normal") distribution
• Outliers
• Sampling statistics & confidence intervals of SDs
• Significance tests for SD & variance (Chi2, F-test)
• ANOVA model II

The CLSI protocol (EP-5)
• 2 different samples (e.g., low and high)
• 1 or 2 runs per day
• Duplicates
• 20 days

IQC! with 1 or 2 samples

Specific calculations for a single run

Within-run standard deviation (swr):

swr = SQRT[Σ(Δdupl)²/(2 • 20)]

Δdupl = Difference of within-run duplicates

Standard deviation of the daily means (smeans = "B" in EP-5):

smeans = SQRT[Σ(Δmeans)²/(20 – 1)]

Δmeans = Difference [daily mean – overall mean of 20 days]

Between-day standard deviation (sdd):

sdd = SQRT[s²means – s²wr/2]

CAVE: set sdd = 0 when s²means < s²wr/2 (negative SQRT!)

Total standard deviation (sT):

sT = SQRT[s²means + s²wr/2]

CAVE: set sT = swr when s²means < s²wr/2

Calculation of degrees of freedom (EP5):
– s²wr: number of duplicates measured (20)
– s²T: complex; precalculated in EXCEL-template

Comparing an SD-estimate with a claim
– Test overlap of the 1-sided confidence limit (CL) of the SD with the claim, or
– 1-sample F-test ("Chi2-test"), 1-sided (EXCEL-template)

Statistics for imprecision can also be treated with Model II ANOVA!

Importance of imprecision
• Limit of detection
• Working range
• Number of analytical replicates
• Troubleshooting


Imprecision – EXCEL file

Graphics

The distribution of the mean values does not indicate an outlier.

The distribution of the differences indicates that day 6 may be an outlier (-0.24).

According to the CLSI protocol it is not (4 SD outlier criterion). According to the

Grubbs test, it is.

Calculations

The Worksheet uses the CLSI EP5 calculations and EXCEL ANOVA (Tools>Data

Analysis). In case ANOVA is used, Swr, Sdd, and ST must be

calculated from the ANOVA output with EXCEL formulae (see example in the Worksheet).

Note: Due to the nature of the calculation of Sdd (SQRT of a difference), Sdd is set to

zero when MS-Between groups is <= MS-Within groups.

We calculate:

Swr = 0.063 mmol/L; CVwr = 1.1%

Sdd = 0.170 mmol/L

ST = 0.181 mmol/L; CVT = 3.1%

Day | Replicate 1 | Replicate 2
1 | 5.95 | 5.82
2 | 5.64 | 5.81
3 | 5.92 | 5.98
4 | 5.85 | 5.85
5 | 5.98 | 5.92
6 | 5.77 | 5.53
7 | 5.91 | 5.92
8 | 5.94 | 5.91
9 | 6.16 | 6.14
10 | 5.83 | 5.79
11 | 5.79 | 5.80
12 | 6.04 | 6.06
13 | 6.18 | 6.21
14 | 6.03 | 6.17
15 | 6.02 | 6.03
16 | 6.14 | 6.16
17 | 5.95 | 5.90
18 | 6.07 | 6.17
19 | 5.78 | 5.84
20 | 6.31 | 6.40
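The single-run EP5 calculations can be checked against the 20 days of duplicates with plain Python. This is a sketch under the formulas given earlier for swr, smeans, sdd and sT; the function and variable names are illustrative and do not come from the EXCEL worksheet.

```python
import math

# Duplicate results (mmol/L) for days 1 to 20, copied from the table.
DUPLICATES = [
    (5.95, 5.82), (5.64, 5.81), (5.92, 5.98), (5.85, 5.85), (5.98, 5.92),
    (5.77, 5.53), (5.91, 5.92), (5.94, 5.91), (6.16, 6.14), (5.83, 5.79),
    (5.79, 5.80), (6.04, 6.06), (6.18, 6.21), (6.03, 6.17), (6.02, 6.03),
    (6.14, 6.16), (5.95, 5.90), (6.07, 6.17), (5.78, 5.84), (6.31, 6.40),
]


def ep5_single_run(duplicates):
    """Return (s_wr, s_dd, s_total, grand_mean) for one run of duplicates per day."""
    n_days = len(duplicates)
    # Within-run variance from the differences of the duplicates.
    var_wr = sum((a - b) ** 2 for a, b in duplicates) / (2 * n_days)
    # Variance of the daily means around the overall mean.
    day_means = [(a + b) / 2 for a, b in duplicates]
    grand_mean = sum(day_means) / n_days
    var_means = sum((m - grand_mean) ** 2 for m in day_means) / (n_days - 1)
    # Between-day variance; set to zero if the difference is negative.
    var_dd = max(var_means - var_wr / 2, 0.0)
    # Total variance; fall back to the within-run variance in that same case.
    var_total = var_means + var_wr / 2 if var_means >= var_wr / 2 else var_wr
    return math.sqrt(var_wr), math.sqrt(var_dd), math.sqrt(var_total), grand_mean


s_wr, s_dd, s_total, grand_mean = ep5_single_run(DUPLICATES)
print(f"Swr = {s_wr:.3f}  Sdd = {s_dd:.3f}  ST = {s_total:.3f} mmol/L")
```

The sketch returns the same standard deviations as reported for the worksheet (0.063, 0.170 and 0.181 mmol/L).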


Imprecision – EXCEL file

Interpretation

The calculated values for imprecision are:

CVwr (exp) = 1.1%

CVT (exp) = 3.1%

The specifications are:

CVwr (stable) = 1.5%

CVT (stable) = 3.0%

We compare them by use of the Chi2-statistics.

We test whether the lower, 1-sided 95% confidence limits of the experimental

estimates are equal to or smaller than the preset specifications.

Both values pass this statistical test, even though the experimental total CVT

(3.1%) is higher than the specification (3%). The reason is that the lower confidence limit

(2.51%) is < 3%.

Calculations

Chi2exp = (SD²exp • df)/SD²claim (df = degrees of freedom, here = 20)

Lower CL of SD = SD • SQRT[df/Chi2(0.05,df)]
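As an illustration, the two formulas can be applied to the within-run CV (1.1% experimental versus 1.5% stable) with df = 20. This is a sketch only: the critical value 31.41 is the upper 5% point of the Chi2 distribution for 20 degrees of freedom taken from a standard table, and the function name is hypothetical.

```python
import math

# Upper 5% critical value of the chi-square distribution, df = 20
# (standard table value).
CHI2_CRIT_DF20 = 31.41


def chi2_check(cv_exp, cv_claim, df, chi2_crit):
    """Return (chi2_exp, lower_cl, passed) for an SD or CV versus a claim."""
    chi2_exp = cv_exp ** 2 * df / cv_claim ** 2
    lower_cl = cv_exp * math.sqrt(df / chi2_crit)
    # Passing the Chi2 test is equivalent to lower_cl <= cv_claim.
    passed = chi2_exp <= chi2_crit
    return chi2_exp, lower_cl, passed


chi2_exp, lower_cl, passed = chi2_check(1.1, 1.5, 20, CHI2_CRIT_DF20)
print(f"Chi2exp = {chi2_exp:.2f}  lower CL = {lower_cl:.2f}%  passed: {passed}")
```

Both views give the same decision: Chi2exp is below the critical value exactly when the lower confidence limit is below the claim.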

Conclusion

The validation data demonstrate that the method passes the pre-set

specifications for within and total imprecision.

DETAILED STATISTICAL BACKGROUND

Statistics
• Descriptive statistics: Dispersion
• Gaussian ("Normal") distribution
• Outliers
• Sampling statistics & confidence intervals of SDs
• Significance tests for SD & variance (Chi2, F-test)
• ANOVA model II


Limit of detection (LoD)

Concepts

LoD can be calculated from the
• standard deviation of a blank
• signal-to-noise ratio of a chromatogram of a low sample
• calibration line by means of regression

Graphics
• Dot plot
• Scatter plot

Statistics

From blank
• Outlier
• Mean
• Confidence interval of centiles
• SDtotal (experiments on different days)
• Consideration of α-errors and β-errors: Power concept

LoD considering α-errors and β-errors

Model 1: LoD = Mean + 1.65 s0 (s0 = s at zero)
• 5% false positives when the analyte is not present (α-error)
• 50% false negatives (β-error) when the analyte "is present at 1.65 s0"

Model 2: LoD = Mean + 2 • 1.65 s = Mean + 3.3 s
• Mean and s are from the zero-standard
• 3.3 s often simplified to 3 s

Result: 5% false positives (α-error) and 5% false negatives (β-error)

Model applied in this book and in the EXCEL file

Simplified Model 2: LoD = Mean + 3 s

[Figure: LoD models, with distances of 1.65 s and 3.3 s marked]


Limit of detection (LoD) – Other concepts

Chromatographic (S/N = 3)
• Outlier
• Mean
• SDtotal (experiments on different days)

Chromatographic LoD (S/N = 3) compared with LoD from “blank” (mean noise + 3.3 SD)

From calibration

Calculation of LoD from calibration data with regression
• Yb = "Signal of blank" via regression = intercept a
• Sb = "Standard deviation of blank" = Sy/x
• b = slope

Transform "Signal LoD" to concentration
• "Signal" LoD = a + 3 Sy/x
• Calculate CLoD via the regression equation y = a + b x
• CLoD = (a + 3 Sy/x – a)/b = [3 Sy/x]/b

When the calibration curve passes through zero, the mean-term is omitted (e.g., in case of an automatic blank).

[Figure: two chromatograms (response versus time) with noise (2 SD) and signal (6 SD) marked; panel labels: LoD = mean noise + 3.3 SD; LoD = S/N = 3]


Limit of detection (LoD)

Samples

Usually, the LoD is derived from test variation at zero analyte. This requires

suitable "blank" samples. For exogenous compounds, such as drugs, this is easy

to realize. For endogenous compounds, suitable blank samples are more difficult

to realize. Note that "stripped" samples or blank solutions often give an

overoptimistic LoD because of their "clean" matrix.

Ideally, the LoD of a method should be assessed with several native samples

containing concentrations near the detection limit, as determined by a reference

method.

Alternatively, the LoD is derived from measurements of calibrators.

Protocols

Blank ("Common"): Applied in this book and the EXCEL file
• 20 measurements of the zero-standard/blank
- 20 days, for example combined with EP5

Chromatographic
• 20 measurements of a sample that gives a signal/noise ratio of 3
- 20 days, for example combined with EP5

Calibration
• From calibration curves on several different days (for example 5)

CLSI Protocol

EP 17 Protocols for Determination of Limits of Detection and Limits of Quantitation.


Limit of detection (LoD) – EXCEL file

Graphic

The graphic gives no indication of an outlier.

Calculations (3 s model)

Mean: 0.0020 mmol/L

SD: 0.0219 mmol/L

Confidence interval 3SD-centile (1-sided, 95%): 0.02 mmol/L

Calculation: t(0.1,19) • SQRT[SD²/20 + (3² • SD²)/(2 • 20)]

LoD: 0.068 mmol/L; #UCL: 0.088 mmol/L

LoD (blanked): 0.066 mmol/L; #UCL: 0.086 mmol/L

#UCL: upper confidence limit

Interpretation

We compare the UCL of the LoD (0.088 or 0.086 mmol/L) with the specification of

0.1 mmol/L.

Conclusion

The validation data demonstrate that the method passes the pre-set specification

for the LoD.

Day | Blank (mmol/L)
1 | 0.01
2 | -0.01
3 | 0.02
4 | 0.04
5 | 0.02
6 | -0.03
7 | -0.01
8 | 0.00
9 | -0.01
10 | 0.01
11 | 0.02
12 | -0.03
13 | 0.03
14 | -0.03
15 | 0.02
16 | 0.01
17 | 0.01
18 | -0.04
19 | 0.01
20 | 0.00

[Figure: dot plot of the blank results, axis from -0.05 to 0.05 mmol/L]
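The 3 s model and its upper confidence limit can be recomputed from the 20 blank results. The sketch below follows the calculation given for the confidence interval of the 3SD-centile; the t-value 1.729 (1-sided 95%, 19 degrees of freedom) is taken from a standard t-table, and the function name is illustrative.

```python
import math
import statistics

# Blank results (mmol/L) for days 1 to 20, copied from the table.
BLANKS = [0.01, -0.01, 0.02, 0.04, 0.02, -0.03, -0.01, 0.00, -0.01, 0.01,
          0.02, -0.03, 0.03, -0.03, 0.02, 0.01, 0.01, -0.04, 0.01, 0.00]

# 1-sided 95% Student t critical value for 19 degrees of freedom.
T_ONE_SIDED_95_DF19 = 1.729


def lod_3s(blanks, t_crit):
    """Return (mean, sd, lod, ucl) for the simplified model LoD = mean + 3 s."""
    n = len(blanks)
    mean = statistics.mean(blanks)
    sd = statistics.stdev(blanks)
    lod = mean + 3 * sd
    # Confidence interval of the 3SD-centile: variance of the mean
    # plus variance of 3 times the SD estimate.
    ci = t_crit * math.sqrt(sd ** 2 / n + 9 * sd ** 2 / (2 * n))
    return mean, sd, lod, lod + ci


mean, sd, lod, ucl = lod_3s(BLANKS, T_ONE_SIDED_95_DF19)
print(f"Mean = {mean:.4f}  SD = {sd:.4f}  LoD = {lod:.3f}  UCL = {ucl:.3f} mmol/L")
```

For the "blanked" variant the mean term is dropped, which lowers LoD and UCL by 0.002 mmol/L.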


Working range – 2 Models

• Fixed value of the precision profile (Figure), or

• Linear part of the calibration function

In this book and in the EXCEL file, the working range is defined by the linearity of

the calibration curve.

Protocol

The protocol is presented in the chapter linearity/manufacturer protocol. In fact,

this is a protocol that assesses accuracy with a number of related (mixed) samples.

Statistics & Graphics

The statistics and graphics are presented in the chapters linearity and

accuracy/recovery.

[Figure: precision profile, CV (%) versus analyte concentration (arbitrary units), with the limit of detection and the working range marked]


Linearity

Graphics
• Scatter plot
• Residual plot (preferred)
• For "accuracy model": Difference plot (preferred)

Statistics

Model 1
• Based on linear regression and ANOVA: F-test for variance around line/within sample sets (lack-of-fit: old EP 6 model)
• Comparison of linear model with 2nd or 3rd order models (new EP 6 model)

Interpretation: Use CBstat (Statistics>Method evaluation>Linearity)

Model 2 ("Common", Accuracy)

Often used by manufacturers for defining the Working Range ("Accuracy-based" = true x-values: e.g., weighed-in)

Investigate the deviation from the line of equality with
• confidence limits, or
• t-test

Interpretation
• Use EXCEL® template

Note

In some fields, the correlation coefficient is used to assess linearity.


Linearity model 1

CLSI EP-6 protocol: 5 interrelated samples

Mixing protocol

1 low

2 low (3) + high (1)

3 low (2) + high (2)

4 low (1) + high (3)

5 high

Alternative mixing

1 low

2 low medium: mix medium and low (1:1)

3 medium: low and high (1:1)

4 high medium: mix medium and high (1:1)

5 high

Measurement design

Measure all samples 4 times (random), within-run or "closely related runs": SDwr.



Linearity model 1 – EXCEL file (worksheet Linearity)

Samples

Low sample: 3 mmol/L

High sample: 7 mmol/L

EP 6 mix protocol

Concentrations (C) of samples 2 - 4 (V = volume)

C = (C1*V1 + C5*V5)/(V1 + V5)

Sample# Concentration (mmol/L)

1 3

2 4

3 5

4 6

5 7
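The concentrations of the five levels follow directly from the mixing formula. A minimal sketch, assuming the mixing ratios of the EP 6 mixing protocol (4:0, 3:1, 2:2, 1:3, 0:4 parts low:high); the names are illustrative.

```python
def mix_concentration(c_low, v_low, c_high, v_high):
    """Concentration of a mixture of v_low parts low and v_high parts high sample."""
    return (c_low * v_low + c_high * v_high) / (v_low + v_high)


# (parts of low sample, parts of high sample) for samples 1 to 5.
MIX_PARTS = [(4, 0), (3, 1), (2, 2), (1, 3), (0, 4)]

levels = [mix_concentration(3.0, v_low, 7.0, v_high) for v_low, v_high in MIX_PARTS]
print(levels)
```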

Graphic

The graphic may indicate outliers in the levels 4 and 6 mmol/L. The Grubbs test,

however, does not confirm the presence of an outlier.

The residuals plot indicates non-linearity.

Sample | y1 | y2 | y3 | y4
3.0 | 2.99 | 2.94 | 3.01 | 3.06
4.0 | 3.93 | 4.02 | 4.01 | 4.03
5.0 | 4.97 | 5.02 | 4.95 | 4.92
6.0 | 5.74 | 5.90 | 5.97 | 5.93
7.0 | 6.78 | 6.69 | 6.82 | 6.65


Linearity model 1 – EXCEL file (worksheet Linearity)

Calculations

The data are investigated for linearity with specialized software (here: CBstat).

The models used are the "lack-of-fit" method and the evaluation by a second

order polynomial fit (new CLSI EP 6 model).

"Lack-of-fit"

F-test for linearity: F = 2.5125; P: 0.0980

No significant deviation from linearity.

Second order polynomial fit

t-test of last coefficient against zero:

SE of last coef.: 0.0085; t value: -2.8816; P: 0.0104

x-level | %-difference
3 | -1.6
4 | 0.6
5 | 1.0
6 | 0.4
7 | -0.7

Significant deviation from linearity, but none of the levels deviates by more than 5%

(chosen limit).

Interpretation

The statistical results show that the second order polynomial fit method is more

sensitive than the lack-of-fit method. Only the former detects that the data-set is

non-linear. However, the 5% limit is not exceeded.

Conclusion

The validation data demonstrate that the method passes the pre-set specification

for linearity.



Linearity model 2 – EXCEL file (worksheet Lin-Manuf)

Accuracy protocol ("Working Range protocol")

This model is called "Working Range protocol" because it is often applied by

manufacturers to establish the working range.

Samples

11 (for example) interrelated samples, prepared by mixing of a blank sample and

a blank sample spiked with a known amount of analyte.

1: Blank (blank)

2: 9 blank + 1 high (spiked) sample









11: High (spiked, known concentration) sample

Measurement design

Measure all samples 4 times (random), within-run: SDwr.

Sample | y1 | y2 | y3 | y4
0.0 | 0.03 | 0.00 | 0.00 | -0.03
4.5 | 4.47 | 4.47 | 4.59 | 4.59
9.0 | 9.06 | 8.97 | 8.85 | 8.91
13.5 | 13.77 | 14.22 | 13.41 | 13.71
18.0 | 18.45 | 17.94 | 18.09 | 17.85
22.5 | 22.62 | 22.59 | 22.35 | 22.47
27.0 | 26.70 | 27.24 | 27.30 | 26.76
31.5 | 30.75 | 32.25 | 31.59 | 31.59
36.0 | 35.67 | 34.47 | 35.07 | 34.02
40.5 | 39.42 | 38.13 | 38.34 | 38.31
45.0 | 42.42 | 41.79 | 41.10 | 42.09


Linearity model 2 – EXCEL file (worksheet Lin-Manuf)

Graphic

The graphic shows an (expected) increase of the scatter of the data around their

mean values (constant measurement CV). Otherwise, there seems to be no

irregularity.

Calculations

The 1-sided 95% confidence interval of the mean is calculated as follows:

CI = ± t (0.1,3) x SD/SQRT(4).

Interpretation

The interpretation of the data is done by use of the difference plot. The plot

indicates that the CLs overlap with the 5% specification from a concentration

>31.5 mmol/L. More replicates could demonstrate that the concentration of 36

mmol/L is within the specified linearity limit of 5%.

Conclusions

The validation data do not support a working range up to 45 mmol/L. The range

should be reduced to 31.5 mmol/L.
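The difference-plot decision can be approximated numerically. The sketch below is an assumption-laden illustration, not the EXCEL template: it expresses the mean difference and the confidence interval CI = t(0.1,3) x SD/SQRT(4) as a percentage of the target value, uses t = 2.353 (1-sided 95%, 3 degrees of freedom, standard t-table), and omits the blank level because a percentage of zero is undefined.

```python
import math
import statistics

T_ONE_SIDED_95_DF3 = 2.353  # standard t-table value
LIMIT_PCT = 5.0             # linearity specification

# Target concentration (mmol/L) -> four replicate results, from the table.
RESULTS = {
    4.5: [4.47, 4.47, 4.59, 4.59],
    9.0: [9.06, 8.97, 8.85, 8.91],
    13.5: [13.77, 14.22, 13.41, 13.71],
    18.0: [18.45, 17.94, 18.09, 17.85],
    22.5: [22.62, 22.59, 22.35, 22.47],
    27.0: [26.70, 27.24, 27.30, 26.76],
    31.5: [30.75, 32.25, 31.59, 31.59],
    36.0: [35.67, 34.47, 35.07, 34.02],
    40.5: [39.42, 38.13, 38.34, 38.31],
    45.0: [42.42, 41.79, 41.10, 42.09],
}


def percent_difference(target, results, t_crit=T_ONE_SIDED_95_DF3):
    """Return (difference %, CI half-width %) relative to the target value."""
    mean = statistics.mean(results)
    half_width = t_crit * statistics.stdev(results) / math.sqrt(len(results))
    return 100.0 * (mean - target) / target, 100.0 * half_width / target


within_limit = {}
for target, results in RESULTS.items():
    diff_pct, half_pct = percent_difference(target, results)
    # A level passes only if its confidence limits stay inside +/- 5%.
    within_limit[target] = abs(diff_pct) + half_pct < LIMIT_PCT
    print(f"{target:5.1f}  {diff_pct:+5.1f}% +/- {half_pct:.1f}%  {within_limit[target]}")
```

Under these assumptions, 31.5 mmol/L is the highest level whose confidence limits stay within 5%, in line with the conclusion above.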



Recovery

Graphics
• Ratio plot (%)
• Difference plot (%)

Statistics
• Descriptive statistics: Location (mean, median & mode)
• t-distribution
• Central limit theorem
• Confidence intervals
• t-tests
• ANOVA model I
• Power and sample size


Recovery experiments

Protocols

Model 1 ("Paired-sample"; see also CLSI EP 7)

Samples

"Paired-sample" experiment: 2 portions of native samples; spike one with known

analyte amount (= Test) and the other with the same volume saline solution (=

Control).

3 – 5 samples at relevant concentrations
• Test: Add x mL analyte standard (preferably in blank solution) to y mL sample; the volume added should be less than 5-10% (requires concentrated analyte standard)
- Added concentration: e.g., ½ to 1 times the concentration of a "normal" sample
• Control: Add the same volume of blank solution to the same volume of sample

Measurement design

Measure Control & Test alternating (n = 2 – 4)
- Note: may need repetition with other lots of calibrators/reagents

Calculations

Concentration added = Concentration of standard • x/(x + y)

Concentration recovered = Test - Control

Recovery (%)

= 100 • (Recovered conc./Added conc.) ± 95%CL

Model 2 (Accuracy: "trueness" based; "Common" protocol)

Samples

Experimental design: "Recovery of target values"
• Reference materials with target values
- Certified reference materials
- IQC materials
- Standards

Measurement design
• Measure samples 5 times on different days
- Note: may need repetition with other lots of calibrators/reagents

Calculations

Recovery (%) = 100 • (Measured value/Target value) ± 95% CL


Recovery – Model 1 (paired sample), EXCEL file

Samples/Materials

Low sample : 3.5 mmol/L

Normal sample : 4.8 mmol/L


Glucose solution in isotonic NaCl : 30 mmol/L (add ≤10% volume)

Isotonic NaCl-solution

Test: Add 0.1 mL (= x) analyte standard to 0.9 mL (= y) sample.

Control: Add the same volume of NaCl solution to the same volume of sample.

Calculations (see EXCEL worksheet)

Tests

C = (Csample • Vsample + Cstandard • Vstandard)/(Vsample + Vstandard)

Controls

C = (Csample • Vsample + Csaline • Vsaline)/(Vsample + Vsaline)

Added concentration

= Concentration of standard • x mL standard/(x mL standard + y mL sample)

Recovered concentration

= Test – Control

Recovery (%)

= 100 • (Recovered conc./Added conc.) ± CL

Results

Control | y1 | y2 | y3 | y4
3.15 | 3.11 | 3.14 | 3.13 | 3.16
4.32 | 4.35 | 4.39 | 4.26 | 4.22
5.85 | 5.82 | 5.79 | 5.90 | 5.77

Test | y1 | y2 | y3 | y4
6.15 | 6.14 | 6.20 | 6.25 | 6.12
7.32 | 7.27 | 7.30 | 7.18 | 7.42
8.85 | 8.82 | 8.72 | 8.88 | 8.98
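The point estimates of the recovery follow from the formulas for the added and recovered concentration. A minimal sketch with the results above; it computes the mean recoveries only, without confidence limits, and the variable names are illustrative.

```python
import statistics

STANDARD_CONC = 30.0  # glucose standard, mmol/L
V_STANDARD = 0.1      # mL standard added (x)
V_SAMPLE = 0.9        # mL sample (y)

# Replicate results (mmol/L) of the three Control/Test pairs.
CONTROLS = [[3.11, 3.14, 3.13, 3.16], [4.35, 4.39, 4.26, 4.22], [5.82, 5.79, 5.90, 5.77]]
TESTS = [[6.14, 6.20, 6.25, 6.12], [7.27, 7.30, 7.18, 7.42], [8.82, 8.72, 8.88, 8.98]]

# Added concentration after dilution of the standard in the sample.
added = STANDARD_CONC * V_STANDARD / (V_STANDARD + V_SAMPLE)

recoveries = []
for control, test in zip(CONTROLS, TESTS):
    recovered = statistics.mean(test) - statistics.mean(control)
    recoveries.append(100.0 * recovered / added)

print(f"Added: {added:.1f} mmol/L")
print([round(r, 1) for r in recoveries])
```

All three mean recoveries lie within 95 – 105%; whether the experiment passes still depends on the confidence limits.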


Recovery – Model 1 (paired sample), EXCEL file

Graphics

The graphic shows the distribution of the results around their mean values and the individual recoveries. It shows no irregularities.

Calculations

The 1-sided 95% confidence interval of the mean difference between Test and Control is calculated with the z-value as follows:

CI = ± z x SDpr/SQRT(4), with z = 1.65 (1-sided 95%).

The interpretation of the results is done with the confidence limits calculated with the z-value and the predicted SD (SDpr) from the EP 5 imprecision data (CLSI EP 7 approach). Note that the imprecision of the %-recoveries depends on the Test and Control level and on the magnitude of the spike (see EXCEL-file).

CAVE: if one uses t, the propagated SD from the actual data has to be calculated (SD from Test and Control: different, because of different levels!). The degrees of freedom must be calculated with the Satterthwaite formula (different concentrations!). The respective test is a t-test.

CAVE: the SD of the %-recovery will be high when little is spiked!

Interpretation

The interpretation of the data is done by use of the % ratio plot. The plot shows that none of the CLs overlap with the 5% specification.

Conclusions

The validation demonstrates that the method passes the preset 5% limit for recovery.


Recovery – Model 2 (accuracy/trueness), EXCEL file

Samples

Low IQC material : 3.9 mmol/L

High-normal IQC material : 5.9 mmol/L

High IQC material : 8.5 mmol/L




Measurement

Measure samples 5 times at different days.

Note: may need repetition with other lots of calibrators/reagents.

Calculations (see EXCEL worksheet)

Recovery (%)

= 100 (Measured value/Target value) ± CL

Results

Graphics

The graphic shows the distribution of the results around their mean values. It

shows no irregularities.

Sample | y1 | y2 | y3 | y4 | y5
3.9 | 3.93 | 3.90 | 3.88 | 3.92 | 3.91
5.9 | 5.83 | 5.70 | 5.79 | 5.63 | 5.84
8.5 | 7.92 | 8.64 | 8.31 | 8.79 | 8.66
4.5 | 4.63 | 4.48 | 4.40 | 4.50 | 4.60
5.0 | 4.92 | 4.97 | 5.29 | 4.95 | 5.14
5.5 | 5.59 | 5.60 | 5.68 | 5.93 | 5.28


Recovery – Model 2 (accuracy/trueness), EXCEL file

Calculations

The 1-sided 95% confidence interval of the mean is calculated as follows:

CI = ± t (0.1,4) x SD/SQRT(5).

Interpretation

The interpretation of the data is done by use of the % ratio plot. The plot shows

that only the CL of Standard 3 overlaps with the 5% specification. This standard

should be repeated.

Conclusions

The validation demonstrates that the method passes the preset 5% limit for

recovery (given that the repetition of Standard 3 is within the specification).
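The recovery of the target values and the confidence limits can be recomputed from the results table. This is a sketch under stated assumptions: t = 2.132 (1-sided 95%, 4 degrees of freedom, standard t-table), recovery and confidence interval are expressed as a percentage of the target value, and a material fails when a confidence limit falls outside 95 – 105%. The names are illustrative.

```python
import math
import statistics

T_ONE_SIDED_95_DF4 = 2.132  # standard t-table value
LIMIT_PCT = 5.0             # recovery specification (100 +/- 5%)

# Target value (mmol/L) -> five results measured on different days.
RESULTS = {
    3.9: [3.93, 3.90, 3.88, 3.92, 3.91],
    5.9: [5.83, 5.70, 5.79, 5.63, 5.84],
    8.5: [7.92, 8.64, 8.31, 8.79, 8.66],
    4.5: [4.63, 4.48, 4.40, 4.50, 4.60],
    5.0: [4.92, 4.97, 5.29, 4.95, 5.14],
    5.5: [5.59, 5.60, 5.68, 5.93, 5.28],
}


def recovery_with_cl(target, results, t_crit=T_ONE_SIDED_95_DF4):
    """Return (recovery %, CI half-width %) relative to the target value."""
    mean = statistics.mean(results)
    half_width = t_crit * statistics.stdev(results) / math.sqrt(len(results))
    return 100.0 * mean / target, 100.0 * half_width / target


failing = []
for target, results in RESULTS.items():
    recovery_pct, half_pct = recovery_with_cl(target, results)
    if abs(recovery_pct - 100.0) + half_pct >= LIMIT_PCT:
        failing.append(target)
    print(f"{target:.1f}  {recovery_pct:.1f}% +/- {half_pct:.1f}%")

print("Confidence limits outside 95-105%:", failing)
```

Only the 5.5 mmol/L material has confidence limits that cross the 5% specification, which agrees with the interpretation of the ratio plot.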



Interference testing (CLSI EP7)

Graphics
• See Recovery: Paired sample

Statistics
• See Recovery: Paired sample

Protocols (CLSI EP 7, 2 approaches)

Approach 1: "Paired difference method"

Applies similar experimental design and calculations as the paired-sample

recovery experiment (3 – 5 samples). Instead of analyte standard, an interferent

standard has to be prepared.
• Test: Add x mL interferent solution (preferably in blank solution) to y mL sample; the volume added should be less than 5-10%
• Control: Add the same volume of blank solution to the same volume of sample

Measure: Control & Test alternating (n = 2 – 4)

Interference (%)

= 100 • (Test - Control)/Control ± 95% CL

Approach 2: "Dose-response method" (used in EXCEL file)

3 – 5 samples, for each
• Low pool (low or no interferent added; if none, add blank!)
• High pool (interferent at maximum concentration)
- Note: always add the same volumes of blank/interferent solutions
• Create 5 levels by "alternative mix-protocol linearity"!

Measure: All levels "up", then down, or random (n = 2 – 4)

Interference (%)

= 100 • (Test - Control)/Control ± CL

Note

CLSI EP7 applies regression analysis for this protocol!



Interference – EXCEL file

Samples/Materials


Interferent solution in NaCl : 600 mg/dL

Isotonic saline solution

- Make "Low pool" (add 0.1 mL saline to 0.9 mL sample)

- Make "High pool" (add 0.1 mL interferent solution to 0.9 mL sample)

(Note: always add the same volumes of saline/interferent solutions)

- Create 5 levels by "alternative mixing protocol"

Measurement

Measure, within-run: All levels "up", then down, or random (n = 4)

Interference (%)

= 100 • (Test - Control)/Control ± CL

Results

Graphics

The graphic shows the distribution of the results around their mean values. It

shows no irregularities.

BILI (mg/dL) | y1 | y2 | y3 | y4
0 | 3.17 | 3.12 | 3.13 | 3.15
15 | 3.24 | 3.18 | 3.03 | 3.22
30 | 3.15 | 3.12 | 3.20 | 3.13
45 | 3.33 | 3.36 | 3.34 | 3.40
60 | 3.55 | 3.68 | 3.40 | 3.53


Interference – EXCEL file

Calculations

The 1-sided 95% confidence interval of the mean difference between Test and

Control (0 BILI) is calculated with the z-value as follows:

CI = ± z x SDpr/SQRT(4), with z = 1.65 (1-sided 95%).

The interpretation of the results is done with the confidence limits calculated with

the z-value and the within-run imprecision as calculated from the EP 5 protocol

(CLSI EP 7 approach).

Note that the imprecision of the interference results (SDpr) is SQRT(2) times the

measurement imprecision because the interference results are the difference

between 2 measurements (Test and Control).

Interpretation

The interpretation of the data is done by use of the % difference plot. The plot

shows that only the CL of the sample with 60 mg/dL bilirubin overlaps with the

10% specification. The test is valid up to a bilirubin concentration of 45 mg/dL.

Conclusions

The validation data show that the test is valid up to a bilirubin concentration of 45

mg/dL.
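The decision can be illustrated numerically with the results table. This sketch rests on assumptions that are not confirmed by the EXCEL file: the within-run SD of 0.063 mmol/L from the imprecision experiment is used as measurement imprecision at this concentration level, SDpr = SQRT(2) x SDwr, and a bilirubin level is accepted when interference plus confidence interval stays below the 10% limit.

```python
import math
import statistics

Z_ONE_SIDED_95 = 1.65
SD_WR = 0.063     # assumed within-run SD (mmol/L), from the EP 5 experiment
LIMIT_PCT = 10.0  # interference specification

# Bilirubin (mg/dL) -> four glucose results (mmol/L), from the table.
RESULTS = {
    0: [3.17, 3.12, 3.13, 3.15],
    15: [3.24, 3.18, 3.03, 3.22],
    30: [3.15, 3.12, 3.20, 3.13],
    45: [3.33, 3.36, 3.34, 3.40],
    60: [3.55, 3.68, 3.40, 3.53],
}

control_mean = statistics.mean(RESULTS[0])
# SD of a Test - Control difference, then CI of the mean of 4 differences.
sd_pr = math.sqrt(2) * SD_WR
half_pct = 100.0 * Z_ONE_SIDED_95 * sd_pr / math.sqrt(4) / control_mean

within_limit = {}
for bili, values in RESULTS.items():
    if bili == 0:
        continue  # the 0 mg/dL level is the control itself
    interference_pct = 100.0 * (statistics.mean(values) - control_mean) / control_mean
    within_limit[bili] = abs(interference_pct) + half_pct < LIMIT_PCT
    print(f"{bili:2d} mg/dL  {interference_pct:+5.1f}% +/- {half_pct:.1f}%  {within_limit[bili]}")
```

Under these assumptions the levels up to 45 mg/dL stay within the 10% limit and 60 mg/dL does not.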



Method comparison

Graphics
• Scatter plot
• Difference plot
• Residual plot
• Krouwer plot
• Bland and Altman plot

Statistics
• Correlation
• Regression
• Bland and Altman approach
• General (F-test, t-test, confidence intervals)

General remarks

Method comparison supposes:
• Appropriate performance of test and comparison method
- Internal Quality Control (verify actual versus expected imprecision by use of the F-test; verify calibration with targeted control samples by t-test or confidence intervals)
• Appropriate presentation of the paired observations (xi,yi)
• Appropriate interpretation

Interpretation of method comparison makes integrated use of:
• Graphical and statistical techniques
• Analytical quality specifications

Method comparison – Sample size

Usually, general recommendations are given for sample size (e.g., EP 9: n ≥ 40). However, to assure given type I and II errors, i.e. sufficient power in a method comparison study, a minimum sample size is needed depending on:
• Slope or intercept deviation to be detected
• Measurement range
• Constant or proportional analytical error assumption
• Magnitude of SD or CV for the methods

Tables are available: See Linnet K. Clin Chem 1999; 45: 882-894.


Method comparison protocols

The CLSI EP-9 protocol

Experimental design:• At least 40 samples• Spread analysis over 5 days, randomize concentrations• Measure duplicates in 1 run, 1st series "upwards", second series "downwards"

Apply adequate internal quality control!

Data presentation and calculations:

• Outlier tests: Diffdupl > 4 × mean Diffdupl (if yes, perform the same with % data)
• Scatter plots, singlicates and mean of duplicates
• Bias plots, singlicates and mean of duplicates
• Inspect for linearity, dispersion, and range (r ≥ 0.975)
• Apply linear regression (ordinary or Deming)

Interpretation:
• Dependent on the criteria of the laboratory
• Dependent on whether a reference method or a "comparative" method was used

Note: Make a distinction between pure statistical, analytical, and clinical

interpretation!
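The duplicate outlier test of the EP-9 protocol can be sketched as follows. This is an illustration under stated assumptions, not the CLSI text: the function name and the duplicate results are hypothetical, and the test is applied to absolute differences only (the protocol repeats it with % data when a sample is flagged).

```python
def duplicate_outliers(duplicates, factor=4.0):
    """Flag samples whose absolute duplicate difference exceeds
    factor x the mean absolute duplicate difference of all samples."""
    diffs = [abs(first - second) for first, second in duplicates]
    limit = factor * sum(diffs) / len(diffs)
    flagged = [i for i, d in enumerate(diffs) if d > limit]
    return flagged, limit

# Hypothetical duplicate results (1st, 2nd measurement) for 6 samples
dups = [(4.0, 4.1), (5.2, 5.1), (6.0, 6.1), (7.3, 7.2), (5.5, 5.6), (6.8, 8.0)]
flagged, limit = duplicate_outliers(dups)
print(flagged, limit)  # index 5 (the last sample) is flagged
```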

The Valtech protocol

Experiments
• At least 50 samples (better: 80 - 100).
• Carry out the analyses in singlicate, spread over 10 measurement series, and select the samples at random.
• Adequate internal quality control!

Vassault A, Grafmeyer D, Naudin Cl, Dumont G, Bailly M, Henny J, Gerhardt MF,

Georges P. Société Française de Biologie Clinique. Protocole de validation de

techniques. Ann Biol Clin 1986;44:686-719 (english version: 720-45).

See also: Vassault A, Grafmeyer D, de Graeve J, Cohen R, Beaudonnet A,

Bienvenu J. Société Française de Biologie Clinique. Analyses de biologie

médicale: spécifications et normes d’acceptabilité à l’usage de la validation de

techniques. Ann Biol Clin 1999;57:685-95.



Method comparison protocols

The “UG” protocol

“If possible, use a true reference method for comparison”

Experiments
• Start from a reliable calibration basis and verify it with IQC samples from the manufacturer = Stable basis.
• Adapt the number and the sort of samples to the problem (e.g. 50).
• Duplicates in 1 series, random sampling (note: for the reference method, adapt the number of measurements to the problem).
• “Intensive” IQC

Dewitte K, Stöckl D, Van de Velde M, Thienpont LM. Evaluation of intrinsic and

routine quality of serum total magnesium measurement. Clin Chim Acta

2000;292:55-68.

The stable basis

Was the method performed adequately? Inspect the internal quality control (IQC) data.
• Evaluation of precision and traceability to manufacturer

The stable basis – Statistics
• F-test
• t-test
• Confidence intervals
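A check of the stable basis could look like the sketch below. It is a minimal illustration, assuming IQC results for one control sample with a manufacturer target and an expected SD; the function name, the data, and the critical values are assumptions (the critical values would normally be taken from statistical tables or EXCEL, here for n = 10: two-sided t at 95% with 9 df, and a one-sided 95% limit for the variance ratio against a known variance).

```python
import math
import statistics

def iqc_check(results, target, expected_sd, t_crit, f_crit):
    """Compare observed IQC imprecision and mean with expectations."""
    n = len(results)
    mean = statistics.mean(results)
    sd = statistics.stdev(results)
    f_ratio = (sd / expected_sd) ** 2            # observed / expected variance
    t_stat = (mean - target) / (sd / math.sqrt(n))
    half_ci = t_crit * sd / math.sqrt(n)         # half-width of the CI of the mean
    return {
        "f_ratio": f_ratio,
        "imprecision_ok": f_ratio <= f_crit,
        "t_stat": t_stat,
        "calibration_ok": abs(t_stat) <= t_crit,
        "ci_mean": (mean - half_ci, mean + half_ci),
    }

# Hypothetical IQC data; critical values are table values assumed for n = 10
iqc = [5.0, 5.1, 4.9, 5.2, 5.0, 4.8, 5.1, 5.0, 4.9, 5.0]
check = iqc_check(iqc, target=5.0, expected_sd=0.12, t_crit=2.262, f_crit=1.880)
print(check["imprecision_ok"], check["calibration_ok"])
```

The confidence interval of the mean is the alternative to the t-test: the calibration is accepted when the target lies inside the interval.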



Method comparison – EXCEL file

Results


Ref. Yours Ref. Yours Ref. Yours Ref. Yours

3.79 3.80 4.89 4.64 5.65 5.55 6.66 6.87

3.84 3.88 4.91 4.62 5.73 5.58 6.71 6.80

3.86 3.65 4.91 4.90 5.79 6.08 6.78 6.90

3.88 3.86 4.95 4.88 5.83 5.65 6.87 7.11

3.92 3.93 5.01 4.86 5.84 6.05 6.94 7.17

3.99 4.09 5.02 4.89 5.86 5.76 7.10 7.07

4.08 4.16 5.03 5.17 5.92 5.76 7.12 7.00

4.11 4.11 5.16 4.90 5.93 5.57 7.13 7.02

4.13 4.05 5.17 5.12 5.94 6.10 7.14 6.90

4.13 4.07 5.17 5.01 5.97 5.80 7.15 7.23

4.23 4.38 5.18 5.26 5.97 5.88 7.15 7.38

4.27 4.21 5.25 5.28 6.06 6.11 7.36 7.19

4.38 4.28 5.39 5.37 6.11 6.08 7.43 7.11

4.39 4.28 5.44 5.49 6.12 5.90 7.47 7.11

4.42 4.31 5.49 5.43 6.30 6.03 7.51 7.15

4.58 4.63 5.53 5.34 6.49 6.48 7.56 7.39

4.70 4.65 5.55 4.99 6.50 6.77 7.90 7.81

4.70 4.48 5.58 5.45 6.59 6.58 8.02 7.83

4.85 5.01 5.58 5.53 6.61 6.22 8.07 7.82

4.85 4.62 5.65 5.27 6.66 6.28 8.19 7.72



Calculations – Bland & Altman approach

The calculations comprise the mean difference and 1.96 × CV of the individual differences, together with their respective CLs.

CI (mean) = ± t(0.1, 79) × SDdiff/SQRT(80)

CI (1.96 s centile) = ± t(0.1, 79) × SQRT[SDdiff^2/80 + 1.96^2 × SDdiff^2/(2 × 80)] = 1.71 × CI (mean)

See also Worksheet Meth-Comp3 for calculations.
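The Bland & Altman calculations can be sketched in a few lines. This is an illustration, not the worksheet itself: the function name is hypothetical, % differences are assumed to be taken relative to the reference value, and the t-value must be supplied from a table or EXCEL (about 1.664 for t(0.1, 79); the demo below uses only the first five pairs of the results table, hence t(0.1, 4) = 2.132).

```python
import math
import statistics

def bland_altman(ref, test, t_crit):
    """Mean % difference, 1.96 x SD limits, and the CI half-widths of both."""
    diffs = [100.0 * (y - x) / x for x, y in zip(ref, test)]  # % differences
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)
    ci_mean = t_crit * sd_diff / math.sqrt(n)
    ci_limit = t_crit * math.sqrt(sd_diff ** 2 / n
                                  + 1.96 ** 2 * sd_diff ** 2 / (2 * n))
    return {
        "mean_diff": mean_diff,
        "limits": (mean_diff - 1.96 * sd_diff, mean_diff + 1.96 * sd_diff),
        "ci_mean": ci_mean,
        "ci_limit": ci_limit,
    }

# First five (Ref., Yours) pairs of the results table, for illustration only
ref = [3.79, 3.84, 3.86, 3.88, 3.92]
yours = [3.80, 3.88, 3.65, 3.86, 3.93]
ba = bland_altman(ref, yours, t_crit=2.132)
print(ba["ci_limit"] / ba["ci_mean"])  # about 1.71, independent of n and SD
```

The ratio of the two CI half-widths is SQRT(1 + 1.96^2/2), which is the factor 1.71 quoted above.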

Graphics and interpretation

The graphic (% differences) reveals no outliers. The CLs of the mean and the

1.96 centile of the differences ("limits of agreement") do not overlap with the

respective specifications of 3% (SE or Bias limit) and 10% (TE limit).

Conclusions

The validation data show that the test passes the preset limits for systematic (3%)

and total error (10%).




Calculations – Regression

See Worksheet Meth-Comp4 for the detailed calculation of the ordinary linear

regression and correlation estimates.

Calculations

CI (line) = ± t × Sy/x × SQRT[1/n + (Xc - Xmean)^2/Σ(Xi - Xmean)^2] (df t = n - 2)

Xc: concentration for which the bias shall be investigated.

CI (points) = ± t × Sy/x × SQRT[1 + 1/n + (Xc - Xmean)^2/Σ(Xi - Xmean)^2] (df t = n - 2)

Xc: concentration for which the total error shall be investigated.
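The two confidence intervals can be sketched as below. This is a minimal illustration of ordinary linear regression, not the Meth-Comp4 worksheet: the function name and the five data pairs are hypothetical, and the t-value (df = n - 2) is supplied by the user from a table or EXCEL.

```python
import math

def ols_with_ci(x, y, xc, t_crit):
    """Ordinary least squares plus CI half-widths of line and points at xc."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    s_yx = math.sqrt(ss_res / (n - 2))           # residual SD, Sy/x
    term = 1 / n + (xc - x_mean) ** 2 / sxx
    ci_line = t_crit * s_yx * math.sqrt(term)        # relates to bias
    ci_points = t_crit * s_yx * math.sqrt(1 + term)  # relates to total error
    return slope, intercept, s_yx, ci_line, ci_points

# Hypothetical data; t(0.1, 3) = 2.353 for n = 5
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.1, 1.9, 3.2, 3.9, 5.0]
slope, intercept, s_yx, ci_line, ci_points = ols_with_ci(x, y, xc=5.0, t_crit=2.353)
print(slope, intercept, ci_line, ci_points)
```

Both intervals are narrowest at Xmean and widen towards the ends of the range, which is why the interpretation below looks at the minimum and maximum values of x.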

Graphics

The results are presented in a scatter plot and a residuals plot.

Interpretation

The confidence limits of bias and total error at the minimum and maximum values

of x (respectively y) are compared with the specifications. They are smaller than

the specifications at both concentrations (see Worksheet Meth-Comp4).

Conclusions

The validation data show that the test passes the preset limits for systematic (3%)

and total error (10%).



Annex

Content
• Summary of protocols, statistics & graphics
• System stability, Ruggedness and Multifactor protocols
• Glossary of terms


Protocols & statistics

Experimental protocols

Protocols
• Imprecision: EP 5
• Limit of detection: EP 17 or "Common"
• Working range: see linearity or define by imprecision
• Linearity: EP 6
• Linearity by recovery: "Common" (Accuracy)
• Recovery, reference material: "Common" (Accuracy)
• Recovery, added analyte: see interference/specificity
• Interference/Specificity: EP 7
• Total error: EP 9, UG* (Method comparison)

* EP = CLSI Evaluation Protocols; UG = University Ghent

Others
• EP 10 Preliminary evaluation
• EP 12 Qualitative tests
• EP 14 Matrix effects
• EP 15 User demonstration of precision & accuracy
• EP 21 Total error

Statistics (>Statistics Book)

Analytical problem                 Associated statistics
General                            Basic statistics; outlier tests (e.g., Grubbs)
Imprecision                        F-test; CHI2-test (#), ANOVA
Limit of detection                 Probability & Power
Working range                      see linearity or define by imprecision
Linearity                          Regression, ANOVA
Recovery                           t-test (#)
Interference/Specificity           t-test (#)
Total error (method comparison)    Regression & correlation; Bland & Altman plot
Trouble-shooting                   Power (sample size calculations)

# Alternative: confidence intervals


Graphics

Univariate data     Bivariate data
Dot plot            Scatter plot
Histogram           Difference plot
Box plot            Ratio plot (%) (Recovery)
                    Residuals plot

[Figures: two univariate plots of Sample A (y-axis: Value); histogram (x-axis: Value-Bin, y-axis: Frequency); residual plot (x-axis: Glucose A (mmol/l), y-axis: Residual (mmol/l)); scatter plot (x-axis: Glucose A (mmol/l), y-axis: Glucose B (mmol/l))]


Overview of experiments, statistics, and graphics

Imprecision
• Samples: IQC samples; no target
• Measurements#: n = 20
• Relevant SD$: within & total
• Graphics: dot plot
• Statistical test vs specification: ANOVA & 1-sample F-test or CL of SD
• CLSI Doc.: EP 5, EP 15

LoD/LoQ
• Samples: blank; low sample
• Measurements#: n = 20
• Relevant SD$: total
• Graphics: dot plot
• Statistical test vs specification: 1-sample F-test or CL of SD
• CLSI Doc.: EP 17

Linearity
• Samples: 5 related samples/calibrators (mix); no target
• Measurements#: n = 4
• Relevant SD$: within
• Graphics: scatter/residual plot
• Statistical test vs specification: lack-of-fit or polynomial regression
• CLSI Doc.: EP 6

Working range
• See: Imprecision/Linearity

Interference
• Samples: interferent spike & control (no target)
• Measurements#: n = 4
• Relevant SD$: within
• Graphics: difference/ratio plot
• Statistical test vs specification: CL of mean difference (or t-test)
• CLSI Doc.: EP 7

Trueness (Accuracy)
• Samples: known analyte spike & control, or CRM
• Measurements#: n = 5
• Relevant SD$: total
• Graphics: difference/ratio plot
• Statistical test vs specification: CL of mean difference or CL of mean (or t-tests)
• CLSI Doc.: EP 7, EP 15

Total error
• Samples: 40 samples (RMP target)
• Measurements#: n = 1 or 2
• Relevant SD$: total or within (UG protocol)
• Graphics: scatter/bias plot
• Statistical test vs specification: correlation, regression/Bland & Altman
• CLSI Doc.: EP 9, EP 21, UG

# Numbers do not always correspond to the respective CLSI document.

$ Abbreviations: SD: standard deviation; IQC: Internal Quality Control; CLSI: Clinical and Laboratory Standards Institute; CRM: Certified Reference Material; RMP: Reference Measurement Procedure; UG: University Ghent; CL: Confidence limit.


Sensitivity of statistical parameters to different types of errors
(From: Westgard JO, Hunt MR. Clin Chem 1973;19:49-57)

                              Type of error
Statistic                     Random    Constant    Proportional
Slope (least-squares)         No        No          Yes
Intercept (least-squares)     No        Yes         No
Sy/x (least-squares)          Yes       No          No
Bias (paired t-test)          No        Yes         Yes
SDdiff (paired t-test)        Yes       No          Yes
Correlation r                 Yes       No          No
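The constant and proportional columns of the table can be checked with a small sketch. The data are hypothetical and error-free apart from the deliberately added error; the function name is an assumption for illustration.

```python
import statistics

def comparison_stats(x, y):
    """Least-squares slope/intercept and paired-difference bias/SDdiff."""
    x_mean, y_mean = statistics.mean(x), statistics.mean(y)
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    slope = sxy / sxx
    diffs = [yi - xi for xi, yi in zip(x, y)]
    return {
        "slope": slope,
        "intercept": y_mean - slope * x_mean,
        "bias": statistics.mean(diffs),
        "sd_diff": statistics.stdev(diffs),
    }

x = [2.0, 4.0, 6.0, 8.0, 10.0]
constant = comparison_stats(x, [xi + 0.5 for xi in x])      # constant error only
proportional = comparison_stats(x, [1.1 * xi for xi in x])  # proportional error only
print(constant)
print(proportional)
```

With a constant error only the intercept and the bias respond; with a proportional error the slope, the bias, and SDdiff respond, while the intercept stays at zero.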


System stability

Trueness is also related to system [in]stability
• Drift
• Shift

System instability is tackled by internal quality control.

Carryover

Carryover is related to the quality of the instrument and the test procedure (e.g., washing).

See CLSI protocol EP-10.

Ruggedness

Ruggedness = ability to reproduce the method in different laboratories or in different circumstances
• Related to the method principle and the test conditions
• Related to the instrument a method is performed with

Assessment of ruggedness
• Between-laboratory performance data obtained through EQA
• Ease of operation within the laboratory
  – Efforts needed for internal quality control
  – Productivity of a method (down time, calibration and service intervals, etc.)


Multifactor protocols

Classically, single effects are investigated in one experimental design (e.g., imprecision, linearity, carryover). Multifactor evaluation designs investigate several effects with one experimental design. Advantage: less time consuming!

Example: EP 10 allows evaluation of
• Imprecision
• Linearity
• Bias
• Carryover
• Drift

EP 10 applies multiple linear regression analysis
• Needs special software for interpretation

The EP-10 protocol

The design
• 3 interrelated samples: low, mid, high
• Prescribed measurement sequence: Mid, mid, high, low, mid, mid, low, low, high, high, mid.
• 5 days, always 1 run


Glossary

Metrology [1]

field of knowledge concerned with measurement

Measurand [1]

quantity intended to be measured

Quantity [1]

property of a phenomenon, body, or substance, to which a number can be assigned with

respect to a reference

Measurement [1]

process of experimentally obtaining one or more quantity values that can reasonably be

attributed to a quantity

Notes:
• Quantities are length, mass, amount-of-substance, time, temperature, etc.
• The value of a quantity is expressed by both a number and a unit
• The full specification of the quantities measured in the medical laboratory comprises three elements:

System (e.g., blood plasma)

Component (also called analyte) (e.g., glucose)

Kind-of-quantity (e.g., amount-of-substance concentration)

The full report of a glucose measurement would read: “the amount-of-substance

concentration of glucose in blood plasma was 5.2 mmol/L”

Measurement unit [1]

scalar quantity, defined and adopted by convention, with which any other quantity of the

same kind can be compared to express the ratio of the two quantities as a number

Value of a quantity [1]

number and reference together expressing magnitude of a quantity

EXAMPLE: Length of a given rod: 5.34 m

Measurement standard [1]

realization of the definition of a given quantity, with stated quantity value and

measurement uncertainty, used as a reference

EXAMPLE: 1 kg mass standard.

Error [1]
difference of measured quantity value and reference quantity value

Systematic error [1]
component of measurement error that in replicate measurements remains constant or varies in a predictable manner

Bias [1]
systematic measurement error or its estimate, with respect to a reference quantity value

Random error [1]
component of measurement error that in replicate measurements varies in an unpredictable manner

Trueness [1]
closeness of agreement between the average of an infinite number of replicate measured quantity values and a reference quantity value

Accuracy [1]
closeness of agreement between a measured quantity value and a true quantity value of the measurand

Precision [1]
closeness of agreement between indications obtained by replicate measurements on the same or similar objects under specified conditions

Repeatability condition [1]
condition of measurement in a set of conditions that includes the same measurement procedure, same operators, same measuring system, same operating conditions and same location, and replicate measurements on the same or similar objects over a short period of time

Reproducibility condition [1]
condition of measurement in a set of conditions that includes different locations, operators, measuring systems, and replicate measurements on the same or similar objects

Uncertainty [1]
parameter characterizing the dispersion of the quantity values being attributed to a measurand, based on the information used

[Metrological] Traceability [1]
property of a measurement result whereby the result can be related to a stated reference through a documented unbroken chain of calibrations, each contributing to the measurement uncertainty

Commutability [of a reference material] [1]

property of a reference material, demonstrated by the closeness of agreement between the relation among the measurement results for a stated quantity in this material, obtained according to two given measurement procedures, and the relation obtained among the measurement results for other specified materials

Matrix effect [2]

Influence of a property of the sample, other than the measurand, on the measurement of

the measurand according to a specified measurement procedure and thereby on its

measured value [2]

Influence quantity [1]

quantity that, in a direct measurement, does not affect the quantity that is actually

measured, but affects the relation between the indication and the measurement result

Note: Specificity & Interference are not yet unequivocally defined by ISO.

Selectivity [1]

capability of a measuring system, using a specified measurement procedure, to provide

measurement results, for one or more measurands, that do not depend on each other nor

on any other quantity in the system undergoing measurement (= specificity in chemistry)

Interference [in analysis]

A systematic error in the measure of a signal caused by the presence of concomitants in

a sample (http://goldbook.iupac.org)

specific [in analysis]

A term which expresses qualitatively the extent to which other substances interfere with

the determination of a substance according to a given procedure. Specific is considered

to be the ultimate of selective, meaning that no interferences are supposed to occur

(http://goldbook.iupac.org).

Calibration [1]

operation that, under specified conditions, in a first step establishes a relation between

the

quantity values with measurement uncertainties provided by measurement standards and

corresponding indications with associated measurement uncertainties and, in a second

step, uses this information to establish a relation for obtaining a measurement result from

an indication

Sensitivity [1]

quotient of the change in the indication and the corresponding change in the value of the

quantity being measured

Linear range
Concentration range over which the intensity of the signal obtained is directly proportional to the concentration of the species producing the signal (http://goldbook.iupac.org).

Linearity (generic)
Ability of an analytical procedure to produce test results which are proportional to the concentration (amount) of an analyte, either directly or by means of a well-defined mathematical transformation.

Working interval [1]
set of values of the quantities of the same kind that can be measured by a given measuring instrument or measuring system with specified instrumental uncertainty, under defined conditions

Limit of detection (in analysis)
The limit of detection, expressed as the concentration, cL, or the quantity, qL, is derived from the smallest measure, xL, that can be detected with reasonable certainty for a given analytical procedure. The value of xL is given by the equation xL = xbi + k · sbi, where xbi is the mean of the blank measures, sbi is the standard deviation of the blank measures, and k is a numerical factor chosen according to the confidence level desired (http://goldbook.iupac.org).
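The equation xL = xbi + k · sbi can be sketched as below. The blank measures are hypothetical, and k = 3 is only an assumed default, since the definition leaves k to the confidence level desired. The result is the smallest detectable measure xL; converting it into a concentration cL requires the calibration function and is not shown.

```python
import statistics

def limit_of_detection(blank_measures, k=3.0):
    """Smallest detectable measure xL = mean(blank) + k * SD(blank)."""
    x_bi = statistics.mean(blank_measures)   # mean of the blank measures
    s_bi = statistics.stdev(blank_measures)  # SD of the blank measures
    return x_bi + k * s_bi

# Hypothetical blank measures (signal units); k = 3 is an assumed choice
blanks = [0.10, 0.12, 0.08, 0.11, 0.09]
x_l = limit_of_detection(blanks)
print(x_l)
```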

Limit of detection [1]
measured quantity value, obtained by a given measurement procedure, for which the probability of falsely claiming the absence of a component in a material is β, given a probability α of falsely claiming its presence

Ruggedness (generic)
Ability to reproduce the method in different laboratories or in different circumstances.

Ruggedness (USP)
Degree of reproducibility of the results obtained under a variety of conditions, expressed as %RSD. These conditions include different laboratories, analysts, instruments, reagents, days, etc.

Robustness (ICH Q2A 1995)
The robustness of an analytical procedure is a measure of its capacity to remain unaffected by small, but deliberate variations in method parameters and provides an indication of its reliability during normal usage.

[1] BIPM, IEC, IFCC, ISO, IUPAC, IUPAP, OIML. Vocabulaire International des Termes Fondamentaux et Généraux de Métrologie. 3rd ed. Geneva: ISO, 2007.

[2] EN/ISO 17511:2003. In vitro diagnostic medical devices – Measurement of quantities in biological samples – Metrological traceability of values assigned to calibrators and control materials.

[3] See also: www.clsi.org>Harmonized Terminology Database