method validation
Method validation
Dietmar Stö[email protected]
With Confidence
• Performance specifications
• Experimental protocols
• Statistical interpretation
• EXCEL® Files
STT Consulting
Dietmar Stöckl, PhD
Abraham Hansstraat 11
B-9667 Horebeke, Belgium
e-mail: [email protected] + FAX: +32/5549 8671
Copyright: STT Consulting 2007
Content
Introduction
Materials
Validation protocols
• Imprecision
• Limit of detection (LoD)
• Working range
• Linearity model 1
• Linearity model 2, accuracy protocol (= accuracy of calibration curve)
• Recovery model 1 (paired sample protocol: spike and control)
• Recovery model 2 (accuracy protocol: sample with target value)
• Interference
• Method comparison
Annex
• Summary of protocols, statistics & graphics
• System stability, Ruggedness and multifactor protocols
• Glossary of terms
Introduction
WHAT is validation?
Validation is the confirmation, through the provision of objective evidence, that
requirements for a specific intended use or application have been fulfilled (ISO
9000).
We see, from this definition, that we have to
• specify the intended use of a method,
• define performance requirements,
• provide data from validation experiments (objective evidence), and
• interpret the validation data (confirmation that requirements have been fulfilled).
WHICH type of performance requirements (specifications) exist?
Performance requirements can be statistical, analytical, or
application-driven/regulatory.
Statistical and analytical specifications are most useful for method evaluation.
Application-driven/regulatory specifications are used for validation. Some
examples are given in the table below.
WHICH performance characteristics exist?
We have seen that we have to specify performance requirements for a validation.
These requirements refer to the following performance characteristics of an
analytical method:
• Imprecision
• Limit of detection
• Working range
• Linearity
• Recovery
• Interference/Specificity
• Total error (method comparison)
• [Robustness/Ruggedness]: will not be addressed in this book.
Performance requirements (specifications)
Statistical: t-test: P ≥ 0.05; F-test: P ≥ 0.05
Analytical: Bias ≤ calibration tolerance; CV ≤ stable CV
Application-driven#: Bias ≤ 3%; CV ≤ 3%
#Cholesterol (National Cholesterol Education Program)
WHICH experiments do we have to perform?
The experiments we have to perform depend on the performance characteristic
we want to validate. For the estimation of method imprecision, for example, we
need to perform repeated measurements with a stable sample. However, the various
application fields of analytical methods do not agree on the design of such
experiments. In this book, we will mainly refer to the experimental protocols from
the Clinical and Laboratory Standards Institute (CLSI). The table below gives an
overview of typical experiments to be performed during a method validation study.
These experiments will be described in detail in the following chapters of the
book.
Performance characteristic | Samples | Measurements
Imprecision | IQC samples; no target | n = 20 (repetition over several days)
LoD/LoQ | Blank; low sample | n = 20 (repetition over several days)
Linearity | 5 related samples/calibrators (mix); no target | n = 4 (repetition within day)
Working range | See: Imprecision/Linearity |
Interference | Samples: interferent spike & control (no target) | n = 4 (repetition within day)
Recovery (Accuracy/Trueness) | Samples: known analyte spike & control, or certified reference materials (CRM) | n = 4 - 5 (repetition over several days)
Total error (method comparison) | 40 samples (target by reference method) | n = 1 or 2 (measurement in one or several days)
IQC: Internal Quality Control; LoD: limit of detection; LoQ: limit of quantitation
HOW do we make decisions?
When we have created data, we have to decide whether they fulfill the requirements that have been selected for the application of the method "for a specific intended use". Currently, it is common practice to make decisions without considering confidence intervals or statistical significance testing. Modern interpretation of analytical data, however, requires the use of confidence intervals or statistical significance testing. These two approaches are compared in the table below for the case of a recovery experiment.
In the “old” approach, we compare one “naked” number with the specification. This approach ignores the number of measurements that have been performed and the imprecision of the method. If we repeated the validation, we could easily obtain a recovery estimate of 80%, for example. Therefore, decision-making should be statistics-based, either by applying a formal statistical test or by interpreting the confidence interval of an experimental estimate.
Statistics-based decision – Importance of the “test-value” (= requirement, specification)
When we make statistics-based decisions, the selection of the test value will depend on the type of requirement we apply (statistical, analytical, validation).
Statistical
- Statistical test versus Null-hypothesis (F-test, t-test, 95% confidence-intervals, etc.): Bias = 0; Slope = 1; Intercept = 0; etc.
Analytical
- Statistical test versus estimate of stable performance (F-test, t-test, 95% confidence-intervals, etc.): Bias ≤ calibration tolerance; etc.
Validation case (application-driven; “specific intended use”)
- Statistical test versus validation limit (F-test, t-test, 95% confidence-intervals, etc.): CVexp ≤ CVmax; Biasexp ≤ Biasmax; etc.
Nevertheless, in all three situations, we apply the same type of statistical tests.
Decision making approaches
“Old”
Experimental recovery: 90%
Limit: 85 – 115%
Decision: passed
“Modern”
Experimental recovery: 90%
Confidence interval: 11%
(with n = 4 and CV = 7%)
Limit: 85 – 115%
Decision: fail
(90 – 11 = 79%, which is below the lower limit of 85%)
Action: increase n or reduce CV
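The "modern" decision in the table above can be sketched in Python. This is only an illustrative sketch, not the book's EXCEL calculation: it assumes that the 11% half-width comes from a two-sided 95% Student t interval (t = 3.182 for n − 1 = 3 degrees of freedom, a table value) applied to the CV. The function name and the constant are hypothetical.

```python
import math

# Two-sided 95% Student t quantile for 3 degrees of freedom (table value).
# Assumption: the source does not state which quantile produced its 11%.
T_TWO_SIDED_95_DF3 = 3.182


def recovery_decision(recovery_pct, cv_pct, n, t_value, low_pct, high_pct):
    """Return (half_width, lower, upper, passed) for a recovery estimate."""
    half_width = t_value * cv_pct / math.sqrt(n)
    lower = recovery_pct - half_width
    upper = recovery_pct + half_width
    # The estimate passes only if the whole interval lies inside the limits.
    passed = lower >= low_pct and upper <= high_pct
    return half_width, lower, upper, passed


half_width, lower, upper, passed = recovery_decision(
    90.0, 7.0, 4, T_TWO_SIDED_95_DF3, 85.0, 115.0
)
print(f"half-width: {half_width:.1f}%  lower limit: {lower:.1f}%  passed: {passed}")
```

Increasing n or reducing the CV shrinks the half-width, which is the "Action" named in the table.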
Interpretation of 95%-confidence limits
Confidence limits and quality specifications
The figure below shows a graphical interpretation of 95%-confidence limits versus
a predefined quality specification: "10".
Note
When comparing an estimate with a specification, usually, the confidence limits
are constructed 1-sided.
1. Interpretation of the cases A – D when the specification is a limit
A: "In", the specification is satisfied with 95% probability.
B: Not "In" with 95% probability. More data may help.
C: Not "In" with 95% probability, but also not out with 95% probability.
D: "Out"
2. Interpretation when the number characterizes a stable process
If the "number" is the typical performance of a stable process, situation C can still
be accepted.
C: Look at lower limit: Not "Out" with 95% probability.
This situation is applied in the EP 5 protocol to investigate whether the user CV is
different from the typical manufacturer CV.
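The interpretation of the cases A – D can be sketched as a small classifier. Because the figure is not reproduced here, the mapping below is an assumption: the specification is taken as an upper limit, B is read as "estimate below the specification, upper confidence limit above it", and C as "estimate above the specification, lower confidence limit below it". Function names and example numbers are hypothetical.

```python
def classify_case(lower_cl, estimate, upper_cl, specification):
    """Classify an estimate with its confidence limits against an upper specification."""
    if upper_cl <= specification:
        return "A"  # "In": the whole interval is below the specification
    if lower_cl > specification:
        return "D"  # "Out": the whole interval is above the specification
    if estimate <= specification:
        return "B"  # not "In" with 95% probability; more data may help
    return "C"      # not "In", but also not "Out" with 95% probability


def acceptable(case, spec_is_typical_performance):
    """Apply interpretation 1 (limit) or 2 (typical performance of a stable process)."""
    if spec_is_typical_performance:
        return case != "D"  # everything that is not "Out" is accepted
    return case == "A"      # only "In" satisfies a limit


# Hypothetical examples against the specification "10"
cases = [
    classify_case(7.0, 8.0, 9.0, 10.0),
    classify_case(8.0, 9.5, 11.0, 10.0),
    classify_case(9.0, 10.5, 12.0, 10.0),
    classify_case(10.5, 12.0, 13.5, 10.0),
]
print(cases)
```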
[Figure: 95% confidence limits of the cases A – D compared with the specification "10", interpreted as (1) a limit or (2) typical performance]
SUMMARY
For a successful validation, we need performance specifications, experimental
protocols, and statistical interpretation of the data. The whole exercise, however,
should be carefully planned, including the samples needed, the foreseen internal
quality control, and the documentation of the results. A validation plan should
consider at least the following elements.
Validation plan
• Define the application, purpose and scope of the method
• Define performance characteristics and acceptance criteria
• Develop a validation protocol or operating procedure for the validation
• Qualify materials, e.g. standards, reagents, and samples
• Perform validation experiments
• Document validation experiments and results in the validation report
• Interpret the validation data and make statistics-based decisions
In the book, the following validation example will be used.
Measurand
Amount-of-substance concentration of glucose in serum
S-glucose: mmol/L (adult reference interval: 3.9 – 5.8 mmol/L).
Specific intended use
For in vitro diagnostic purposes.
Performance specifications
Data simulation
Most data are simulated with an assumed method CV of 1-2% (within-run) and
3% (total).
Performance characteristic | Specification
Imprecision | Within-run: 1.5%#; Total: 3%#
LoD | 0.1 mmol/L
Working range | 0.1-42 mmol/L
Linearity | 0.1-42 mmol/L; Limit: 5%
Recovery | Limit: 5%
Interference | Limit: 10%
Total error – Method comparison | Limit, Bias: 3%; Total error: 10%
#Note: typical values for stable process; not meant as limit!
Materials
Instrument XYZ
Standard, Lot#
Reagent, Lot#
Imprecision (CLSI EP5) and IQC during experiments
Low IQC material : 3.9 mmol/L
High-normal IQC material : 5.9 mmol/L
High IQC material : 8.5 mmol/L
LoD, dilutions, "adaptation of control" (CLSI EP17)
Isotonic saline solution (= Blank) : 0 mmol/L
Linearity, experiment 1 (CLSI EP6)
Low sample 1 : 3.0 mmol/L
High sample 1 : 7.0 mmol/L
Linearity, experiment 2 ("manufacturer protocol": accuracy)
Spiked “Blank” : 45.0 mmol/L
Recovery and Interference (CLSI EP7)
Low sample 2 : 3.5 mmol/L
Normal sample : 4.8 mmol/L
High sample 2 : 6.5 mmol/L
Glucose solution in isotonic NaCl : 30.0 mmol/L
Bilirubin solution in isotonic NaCl : 600 mg/dL
Low sample 2 spiked with bilirubin : 60 mg/dL
Recovery (Accuracy)
Standard 1 : 4.5 mmol/L
Standard 2 : 5.0 mmol/L
Standard 3 : 5.5 mmol/L
Method comparison (CLSI EP9)
40 native samples : various
Imprecision
Graphics
• Dot plot
• Histogram
Statistics
• Descriptive Statistics: Dispersion
• Gaussian ("Normal") distribution
• Outliers
• Sampling statistics & Confidence intervals of SDs
• Significance tests for SD & variance (Chi2, F-test)
• ANOVA model II
The CLSI protocol (EP-5)
• 2 different samples (e.g., low and high)
• 1 or 2 runs per day
• Duplicates
• 20 days
IQC! with 1 or 2 samples
Specific calculations for a single run
Within-run standard deviation ($s_{wr}$):
$s_{wr} = \sqrt{\sum \Delta_{dupl}^2 / (2 \cdot 20)}$
$\Delta_{dupl}$ = difference of within-run duplicates
Standard deviation of the daily means ($s_{means}$ = "B" in EP-5):
$s_{means} = \sqrt{\sum \Delta_{means}^2 / (20 - 1)}$
$\Delta_{means}$ = difference [daily mean − overall mean of 20 days]
Between-day standard deviation ($s_{dd}$):
$s_{dd} = \sqrt{s_{means}^2 - s_{wr}^2/2}$
CAVE: set $s_{dd} = 0$ when $s_{means}^2 < s_{wr}^2/2$ (SQRT of a negative number!)
Total standard deviation ($s_T$):
$s_T = \sqrt{s_{means}^2 + s_{wr}^2/2}$
CAVE: set $s_T = s_{wr}$ when $s_{means}^2 < s_{wr}^2/2$
Calculation of degrees of freedom (EP5)
– $s_{wr}^2$: number of duplicates measured: 20
– $s_T^2$: complex; precalculated in EXCEL template
Comparing an SD estimate with a claim
– Test overlap of the 1-sided confidence limit (CL) of the SD with the claim, or
– 1-sample F-test ("Chi2-test"), 1-sided (EXCEL template)
Statistics for imprecision can also be treated with Model II ANOVA!
Importance of imprecision
• Limit of detection
• Working range
• Number of analytical replicates
• Troubleshooting
Imprecision – EXCEL file
Graphics
The distribution of the mean values does not indicate an outlier.
The distribution of the differences indicates that day 6 may be an outlier (-0.24).
According to the CLSI protocol, it is not (4 SD outlier criterion). According to the
Grubbs test, it is.
Calculations
The Worksheet uses the CLSI EP5 calculations and EXCEL ANOVA (Tools>Data
Analysis). If ANOVA is used, Swr, Sdd, and ST must be calculated from the ANOVA
output with EXCEL formulae (see example in the Worksheet).
Note: Because Sdd is calculated as the SQRT of a difference, Sdd is set to
zero when MS-Between groups is <= MS-Within groups.
We calculate:
Swr = 0.063 mmol/L; CVwr = 1.1%
Sdd = 0.170 mmol/L
ST = 0.181 mmol/L; CVT = 3.1%
Day Replicate 1 Replicate 2
1 5.95 5.82
2 5.64 5.81
3 5.92 5.98
4 5.85 5.85
5 5.98 5.92
6 5.77 5.53
7 5.91 5.92
8 5.94 5.91
9 6.16 6.14
10 5.83 5.79
11 5.79 5.80
12 6.04 6.06
13 6.18 6.21
14 6.03 6.17
15 6.02 6.03
16 6.14 6.16
17 5.95 5.90
18 6.07 6.17
19 5.78 5.84
20 6.31 6.40
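The EP5 calculations for one run per day with duplicates can be reproduced from the table above. The following is a minimal sketch in Python rather than the EXCEL worksheet itself; the function name is hypothetical, and the formulas are the ones for the within-run, between-day and total standard deviation (with the between-day term set to zero when the difference under the square root is negative).

```python
import math

# Duplicate results (mmol/L) for days 1 to 20, copied from the table above.
duplicates = [
    (5.95, 5.82), (5.64, 5.81), (5.92, 5.98), (5.85, 5.85), (5.98, 5.92),
    (5.77, 5.53), (5.91, 5.92), (5.94, 5.91), (6.16, 6.14), (5.83, 5.79),
    (5.79, 5.80), (6.04, 6.06), (6.18, 6.21), (6.03, 6.17), (6.02, 6.03),
    (6.14, 6.16), (5.95, 5.90), (6.07, 6.17), (5.78, 5.84), (6.31, 6.40),
]


def ep5_single_run(pairs):
    """Return (s_wr, s_dd, s_total) for one run per day measured in duplicate."""
    n_days = len(pairs)
    # Within-run SD from the differences of the duplicates
    s_wr = math.sqrt(sum((a - b) ** 2 for a, b in pairs) / (2 * n_days))
    # Variance of the daily means around the overall mean
    daily_means = [(a + b) / 2 for a, b in pairs]
    overall_mean = sum(daily_means) / n_days
    var_means = sum((m - overall_mean) ** 2 for m in daily_means) / (n_days - 1)
    var_dd = var_means - s_wr ** 2 / 2
    if var_dd < 0:
        # Negative argument of the square root: s_dd = 0 and s_total = s_wr
        return s_wr, 0.0, s_wr
    return s_wr, math.sqrt(var_dd), math.sqrt(var_means + s_wr ** 2 / 2)


s_wr, s_dd, s_total = ep5_single_run(duplicates)
print(f"s_wr = {s_wr:.3f}, s_dd = {s_dd:.3f}, s_T = {s_total:.3f} mmol/L")
```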
Imprecision – EXCEL file
Interpretation
The calculated values for imprecision are:
CVwr (exp) = 1.1%
CVT (exp) = 3.1%
The specifications are:
CVwr (stable) = 1.5%
CVT (stable) = 3.0%
We compare them by use of the Chi2 statistic.
We test whether the lower, 1-sided 95% confidence limits of the experimental
estimates are equal to or smaller than the preset specifications.
Both values pass this statistical test, even though the experimental total CVT
(3.1%) is higher than the specification (3.0%). The reason is that the lower
confidence limit (2.51%) is <3%.
Calculations
$\chi^2_{exp} = SD_{exp}^2 \cdot df / SD_{claim}^2$ (df = degrees of freedom, here = 20)
Lower CL of SD = $SD \cdot \sqrt{df / \chi^2_{0.05,df}}$
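The Chi2 comparison can be sketched as follows. This is an illustration under stated assumptions, not the EXCEL template: df = 20 is used for both estimates (the template uses a precalculated df for the total SD), CVs are used in place of SDs (the mean cancels in the ratio), and the critical value 31.410 is the table value for an upper-tail probability of 0.05 at 20 degrees of freedom (EXCEL: CHIINV(0.05, 20)). Function names are hypothetical.

```python
# Upper 5% critical value of the chi-square distribution, 20 degrees of freedom
CHI2_CRIT_05_DF20 = 31.410


def chi2_statistic(sd_exp, sd_claim, df):
    """Chi2_exp = SD_exp^2 * df / SD_claim^2."""
    return sd_exp ** 2 * df / sd_claim ** 2


def passes_claim(sd_exp, sd_claim, df, chi2_crit):
    """1-sided test: the claim is not rejected while Chi2_exp stays at or below the critical value."""
    return chi2_statistic(sd_exp, sd_claim, df) <= chi2_crit


chi2_wr = chi2_statistic(1.1, 1.5, 20)     # within-run CV versus stable CV
chi2_total = chi2_statistic(3.1, 3.0, 20)  # total CV versus stable CV
print(round(chi2_wr, 2), passes_claim(1.1, 1.5, 20, CHI2_CRIT_05_DF20))
print(round(chi2_total, 2), passes_claim(3.1, 3.0, 20, CHI2_CRIT_05_DF20))
```

Testing Chi2_exp against the critical value is equivalent to checking whether the lower 1-sided confidence limit of the SD is at or below the claim.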
Conclusion
The validation data demonstrate that the method passes the pre-set
specifications for within and total imprecision.
DETAILED STATISTICAL BACKGROUND
Statistics
• Descriptive Statistics: Dispersion
• Gaussian ("Normal") distribution
• Outliers
• Sampling statistics & Confidence intervals of SDs
• Significance tests for SD & variance (Chi2, F-test)
• ANOVA model II
Limit of detection (LoD)
Concepts
LoD can be calculated from the
• standard deviation of a blank
• signal-to-noise ratio of a chromatogram of a low sample
• calibration line by means of regression
Graphics
• Dot plot
• Scatter plot
Statistics
From blank
• Outlier
• Mean
• Confidence interval of centiles
• SDtotal (experiments on different days)
• Consideration of α-errors and β-errors: Power concept
LoD considering α-errors and β-errors
Model 1: LoD = Mean + 1.65 s0 (s0 = s at zero)
• 5% false positives when the analyte is not present (α-error)
• 50% false negatives (β-error) when the analyte "is present at 1.65 s0"
Model 2: LoD = Mean + 2 • 1.65 s = Mean + 3.3 s
• Mean and s are from the zero-standard
• 3.3 s is often simplified to 3 s
Result: 5% false positives (α-error) and 5% false negatives (β-error)
Model applied in this book and in the EXCEL file
Simplified Model 2: LoD = Mean + 3 s
[Figure: illustration of the 1.65 s and 3.3 s distances used in Models 1 and 2]
Limit of detection (LoD) – Other concepts
Chromatographic (S/N = 3)
• Outlier
• Mean
• SDtotal (experiments on different days)
Chromatographic LoD (S/N = 3) compared with LoD from “blank” (mean noise + 3.3 SD)
From calibration
Calculation of LoD from calibration data with regression
$Y_b$ = "Signal of blank" via regression = intercept $a$
$S_b$ = "Standard deviation of blank" = $S_{y/x}$
$b$ = slope
Transform "Signal LoD" to concentration
"Signal" LoD = $a + 3 S_{y/x}$
Calculate $C_{LoD}$ via regression equation $y = a + b x$
$C_{LoD} = (a + 3 S_{y/x} - a)/b = 3 S_{y/x}/b$
When the calibration curve passes through zero, the mean-term is omitted (e.g., in case of an automatic blank).
[Figure: chromatograms (response versus time) comparing LoD = S/N = 3 (noise 2 SD, signal 6 SD) with LoD = mean noise + 3.3 SD]
Limit of detection (LoD)
Samples
Usually, the LoD is derived from test variation at zero analyte. This requires
suitable "blank" samples. For exogenous compounds, such as drugs, this is easy
to realize. For endogenous compounds, suitable blank samples are more difficult
to realize. Note that "stripped" samples or blank solutions often give an
overoptimistic LoD because of their "clean" matrix.
Ideally, the LoD of a method should be assessed with several native samples
containing concentrations near the detection limit, as determined by a reference
method.
Alternatively, the LoD is derived from measurements of calibrators.
Protocols
Blank ("Common"): Applied in this book and the EXCEL file
20 measurements of the zero-standard/blank
- 20 days, for example combined with EP5
Chromatographic
20 measurements of a sample that gives a Signal/Noise ratio of 3
- 20 days, for example combined with EP5
Calibration
From calibration curves at several different days (for example 5).
CLSI Protocol
EP 17 Determination of Limits of Quantitation.
Limit of detection (LoD) – EXCEL file
Graphic
The graphic gives no indication of an outlier.
Calculations (3 s model)
Mean: 0.0020 mmol/L
SD: 0.0219 mmol/L
Confidence interval 3SD-centile (1-sided, 95%): 0.02 mmol/L
Calculation: $t_{(0.1,19)} \cdot \sqrt{SD^2/20 + 3^2 \cdot SD^2/(2 \cdot 20)}$
LOD: 0.068 mmol/L; #UCL: 0.088 mmol/L
LOD (blanked): 0.066 mmol/L; #UCL: 0.086 mmol/L
#UCL: upper confidence limit
Interpretation
We compare the UCL of the LoD (0.088 or 0.086 mmol/L) with the specification of
0.1 mmol/L.
Conclusion
The validation data demonstrate that the method passes the pre-set specification
for the LoD.
Day mmol/L
1 0.01
2 -0.01
3 0.02
4 0.04
5 0.02
6 -0.03
7 -0.01
8 0.00
9 -0.01
10 0.01
11 0.02
12 -0.03
13 0.03
14 -0.03
15 0.02
16 0.01
17 0.01
18 -0.04
19 0.01
20 0.00
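The LoD figures of the 3 s model can be recomputed from the blank measurements above. The sketch below is in Python rather than EXCEL and makes one assumption explicit: the t-value 1.729 is the 1-sided 95% table value for 19 degrees of freedom, which corresponds to t(0.1,19) in EXCEL's two-tailed TINV convention. Variable names are hypothetical.

```python
import math
import statistics

# Blank measurements (mmol/L) for days 1 to 20, copied from the table above.
blank = [0.01, -0.01, 0.02, 0.04, 0.02, -0.03, -0.01, 0.00, -0.01, 0.01,
         0.02, -0.03, 0.03, -0.03, 0.02, 0.01, 0.01, -0.04, 0.01, 0.00]

T_ONE_SIDED_95_DF19 = 1.729  # table value, EXCEL: TINV(0.1, 19)

n = len(blank)
mean = statistics.mean(blank)
sd = statistics.stdev(blank)

lod = mean + 3 * sd      # simplified Model 2
lod_blanked = 3 * sd     # mean term omitted (automatic blank)

# 1-sided 95% confidence interval of the 3 SD centile
ci = T_ONE_SIDED_95_DF19 * math.sqrt(sd ** 2 / n + 3 ** 2 * sd ** 2 / (2 * n))
ucl = lod + ci
ucl_blanked = lod_blanked + ci

print(f"LoD = {lod:.3f}, UCL = {ucl:.3f} mmol/L")
print(f"LoD (blanked) = {lod_blanked:.3f}, UCL = {ucl_blanked:.3f} mmol/L")
```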
[Figure: dot plot of the blank measurements, axis from -0.05 to 0.05 mmol/L]
Working range – 2 Models
• Fixed value of the precision profile (Figure), or
• Linear part of the calibration function
In this book and in the EXCEL file, the working range is defined by the linearity of
the calibration curve.
Protocol
The protocol is presented in the chapter linearity/manufacturer protocol. In fact,
this is a protocol that assesses accuracy with a number of related (mixed) samples.
Statistics & Graphics
The statistics and graphics are presented in the chapters linearity and
accuracy/recovery.
[Figure: precision profile, CV (%) versus analyte concentration (arbitrary units), with the limit of detection and the working range marked]
Linearity
Graphics
• Scatter plot
• Residual plot (preferred)
• For "accuracy model": Difference plot (preferred)
Statistics
Model 1
• Based on linear regression and ANOVA: F-test for variance around line/within sample sets (lack-of-fit: old EP 6 model)
• Comparison of linear model with 2nd or 3rd order models (new EP 6 model)
Interpretation: Use CBstat
Statistics>Method evaluation>Linearity
Model 2 ("Common", Accuracy)
Often used by manufacturers for defining the Working Range
("Accuracy-based" = true x-values: e.g., weighed-in)
Investigate the deviation from the line of equality with
• confidence limits, or
• t-test
Interpretation
• Use EXCEL® template
Note
In some fields, the correlation coefficient is used to assess linearity.
Linearity model 1
CLSI EP-6 protocol: 5 interrelated samples
Mixing protocol
1 low
2 low (3) + high (1)
3 low (2) + high (2)
4 low (1) + high (3)
5 high
Alternative mixing
1 low
2 low medium: mix medium and low (1:1)
3 medium: low and high (1:1)
4 high medium: mix medium and high (1:1)
5 high
Measurement design
Measure all samples 4 times (random), within-run or "closely related runs": SDwr.
Linearity model 1 – EXCEL file (worksheet Linearity)
Samples
Low sample: 3 mmol/L
High sample: 7 mmol/L
EP 6 mix protocol
Concentrations (C) of samples 2 - 4 (V = volume)
C = (C1*V1 + C5*V5)/(V1 + V5)
Sample# Concentration (mmol/L)
1 3
2 4
3 5
4 6
5 7
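The concentrations in the table follow directly from the mixing formula. A minimal sketch in Python, assuming the volume ratios of the EP 6 mixing protocol (low:high = 3:1, 2:2, 1:3); the function name is hypothetical.

```python
def mix_concentration(c_low, v_low, c_high, v_high):
    """C = (C1*V1 + C5*V5)/(V1 + V5) for a mixture of the low and the high sample."""
    return (c_low * v_low + c_high * v_high) / (v_low + v_high)


c_low, c_high = 3.0, 7.0  # mmol/L
# (parts low, parts high) for samples 1 to 5
parts = [(4, 0), (3, 1), (2, 2), (1, 3), (0, 4)]
levels = [mix_concentration(c_low, v_low, c_high, v_high) for v_low, v_high in parts]
print(levels)
```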
Graphic
The graphic may indicate outliers in the levels 4 and 6 mmol/L. The Grubbs test,
however, does not confirm the presence of an outlier.
The residuals plot indicates non-linearity.
Sample y1 y2 y3 y4
3.0 2.99 2.94 3.01 3.06
4.0 3.93 4.02 4.01 4.03
5.0 4.97 5.02 4.95 4.92
6.0 5.74 5.90 5.97 5.93
7.0 6.78 6.69 6.82 6.65
Linearity model 1 – EXCEL file (worksheet Linearity)
Calculations
The data are investigated for linearity with specialized software (here: CBstat).
The models used are the "lack-of-fit" method and the evaluation by a second
order polynomial fit (new CLSI EP 6 model).
"Lack-of-fit"
F-test for linearity: F = 2.5125 P: 0.0980
No significant deviation from linearity.
Second order polynomial fit
t-test of last coefficient against zero:
SE of last coef.: 0.0085 t value: -2.8816 P:0.0104
x-level %-difference
3 -1.6
4 0.6
5 1.0
6 0.4
7 -0.7
Significant deviation from linearity, but none of the levels deviates by more than 5%
(chosen limit).
Interpretation
The statistical results show that the second order polynomial fit method is more
sensitive than the lack-of-fit method. The former shows that the data set is
non-linear. However, the 5% limit is not exceeded.
Conclusion
The validation data demonstrate that the method passes the pre-set specification
for linearity.
Linearity model 2 – EXCEL file (worksheet Lin-Manuf)
Accuracy protocol ("Working Range protocol")
This model is called "Working Range protocol" because it is often applied by
manufacturers to establish the working range.
Samples
11 (for example) interrelated samples, prepared by mixing of a blank sample and
a blank sample spiked with a known amount of analyte.
1: Blank (blank)
2: 9 blank + 1 high (spiked) sample
3: 8 blank + 2 high (spiked) sample
4: 7 blank + 3 high (spiked) sample
5: 6 blank + 4 high (spiked) sample
6: 5 blank + 5 high (spiked) sample
7: 4 blank + 6 high (spiked) sample
8: 3 blank + 7 high (spiked) sample
9: 2 blank + 8 high (spiked) sample
10: 1 blank + 9 high (spiked) sample
11: High (spiked, known concentration) sample
Measurement design
Measure all samples 4 times (random), within-run: SDwr.
Sample y1 y2 y3 y4
0.0 0.03 0.00 0.00 -0.03
4.5 4.47 4.47 4.59 4.59
9.0 9.06 8.97 8.85 8.91
13.5 13.77 14.22 13.41 13.71
18.0 18.45 17.94 18.09 17.85
22.5 22.62 22.59 22.35 22.47
27.0 26.70 27.24 27.30 26.76
31.5 30.75 32.25 31.59 31.59
36.0 35.67 34.47 35.07 34.02
40.5 39.42 38.13 38.34 38.31
45 42.42 41.79 41.10 42.09
Linearity model 2 – EXCEL file (worksheet Lin-Manuf)
Graphic
The graphic shows an (expected) increase of the scatter of the data around their
mean values (constant measurement CV). Otherwise, there seems to be no
irregularity.
Calculations
The 1-sided 95% confidence interval of the mean is calculated as follows:
CI = ± t (0.1,3) x SD/SQRT(4).
Interpretation
The interpretation of the data is done by use of the difference plot. The plot
indicates that the CLs overlap with the 5% specification from a concentration
>31.5 mmol/L. More replicates could demonstrate that the concentration of 36
mmol/L is within the specified linearity limit of 5%.
Conclusions
The validation data do not support a working range up to 45 mmol/L. The range
should be reduced to 31.5 mmol/L.
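The difference-plot interpretation can be checked numerically. The sketch below is written in Python instead of the EXCEL template and rests on two assumptions: t = 2.353 is the 1-sided 95% table value for 3 degrees of freedom (EXCEL: TINV(0.1, 3)), and the blank level (0 mmol/L) is left out because a %-difference is undefined there. The data are the Lin-Manuf results of this example; names are hypothetical.

```python
import math
import statistics

T_ONE_SIDED_95_DF3 = 2.353  # table value, EXCEL: TINV(0.1, 3)
LIMIT_PCT = 5.0             # linearity specification

# Target concentration (mmol/L) -> four within-run results
data = {
    4.5: [4.47, 4.47, 4.59, 4.59],
    9.0: [9.06, 8.97, 8.85, 8.91],
    13.5: [13.77, 14.22, 13.41, 13.71],
    18.0: [18.45, 17.94, 18.09, 17.85],
    22.5: [22.62, 22.59, 22.35, 22.47],
    27.0: [26.70, 27.24, 27.30, 26.76],
    31.5: [30.75, 32.25, 31.59, 31.59],
    36.0: [35.67, 34.47, 35.07, 34.02],
    40.5: [39.42, 38.13, 38.34, 38.31],
    45.0: [42.42, 41.79, 41.10, 42.09],
}


def pct_difference_with_cl(target, results, t_value):
    """Return (lower CL, %-difference, upper CL) of the mean versus the target."""
    mean = statistics.mean(results)
    half = t_value * statistics.stdev(results) / math.sqrt(len(results))
    diff_pct = 100 * (mean - target) / target
    half_pct = 100 * half / target
    return diff_pct - half_pct, diff_pct, diff_pct + half_pct


# Levels whose confidence limits reach beyond the +/- 5% specification
outside = []
for target, results in data.items():
    lower, diff, upper = pct_difference_with_cl(target, results, T_ONE_SIDED_95_DF3)
    if lower < -LIMIT_PCT or upper > LIMIT_PCT:
        outside.append(target)

print(outside)
```

With these data, only the levels above 31.5 mmol/L are flagged, in line with the conclusion above.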
Recovery
Graphics
• Ratio plot (%)
• Difference plot (%)
Statistics
• Descriptive statistics: Location (mean, median & mode)
• t-distribution
• Central limit theorem
• Confidence intervals
• t-tests
• ANOVA-model I
• Power and sample size
Recovery experiments
Protocols
Model 1 ("Paired-sample"; see also CLSI EP 7)
Samples
"Paired-sample" experiment: 2 portions of native samples; spike one with known
analyte amount (= Test) and the other with the same volume saline solution (=
Control).
3 – 5 samples at relevant concentrations
• Test: Add x mL analyte standard (preferably in blank solution) to y mL sample; the volume added should be less than 5-10% (requires concentrated analyte standard)
- Added concentration: e.g., ½-1 of a "normal" sample
• Control: Add same volume blank solution to same volume sample
Measurement design
Measure Control & Test alternating (n = 2 – 4)
- Note: may need repetition with other lots of calibrators/reagents
Calculations
Concentration added = Concentration of standard • x/(x + y)
Concentration recovered = Test - Control
Recovery (%)
= 100 • (Recovered conc./Added conc.) ± 95%CL
Model 2 (Accuracy: "trueness" based; "Common" protocol)
Samples
Experimental design: "Recovery of target values"
• Reference materials with target values
- Certified reference materials
- IQC materials
- Standards
Measurement design
• Measure samples 5 times at different days
- Note: may need repetition with other lots of calibrators/reagents
Calculations
Recovery (%) = 100 • (Measured value/Target value) ± 95% CL
Recovery – Model 1 (paired sample), EXCEL file
Samples/Materials
Low sample : 3.5 mmol/L
Normal sample : 4.8 mmol/L
High sample 2 : 6.5 mmol/L
Glucose solution in isotonic NaCl : 30 mmol/L (add ≤10% volume)
Isotonic NaCl-solution
Test: Add 0.1 mL (= x) analyte standard to 0.9 mL (= y) sample.
Control: Add same volume NaCl solution to same volume sample.
Calculations (see EXCEL worksheet)
Tests
C = (Csample • Vsample + Cstandard • Vstandard)/(Vsample + Vstandard)
Controls
C = (Csample • Vsample + Csaline • Vsaline)/(Vsample + Vsaline)
Added concentration
= Concentration of standard • x mL standard/(x mL standard + y mL sample)
Recovered concentration
= Test – Control
Recovery (%)
= 100 • (Recovered conc./Added conc.) ± CL
Results
Control y1 y2 y3 y4
3.15 3.11 3.14 3.13 3.16
4.32 4.35 4.39 4.26 4.22
5.85 5.82 5.79 5.90 5.77
Test y1 y2 y3 y4
6.15 6.14 6.20 6.25 6.12
7.32 7.27 7.30 7.18 7.42
8.85 8.82 8.72 8.88 8.98
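The mean recoveries follow from the Control and Test results above. A minimal sketch in Python (the confidence limits, which need the predicted SD from the EXCEL file, are not reproduced here); variable names are hypothetical.

```python
import statistics

C_STANDARD = 30.0  # glucose standard, mmol/L
V_STANDARD = 0.1   # x, mL standard (Test) or saline (Control) added
V_SAMPLE = 0.9     # y, mL sample

# Four results per sample (low, normal, high), copied from the tables above
control = [[3.11, 3.14, 3.13, 3.16], [4.35, 4.39, 4.26, 4.22], [5.82, 5.79, 5.90, 5.77]]
test = [[6.14, 6.20, 6.25, 6.12], [7.27, 7.30, 7.18, 7.42], [8.82, 8.72, 8.88, 8.98]]

# Added concentration = concentration of standard * x / (x + y)
added = C_STANDARD * V_STANDARD / (V_STANDARD + V_SAMPLE)

recoveries = []
for control_results, test_results in zip(control, test):
    recovered = statistics.mean(test_results) - statistics.mean(control_results)
    recoveries.append(100 * recovered / added)

print([round(r, 1) for r in recoveries])
```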
Recovery – Model 1 (paired sample), EXCEL file
Graphics
The graphic shows the distribution of the results around their mean values and the individual recoveries. It shows no irregularities.
Calculations
The 1-sided 95% confidence interval of the mean difference between Test and Control is calculated with the z-value as follows:
CI = ± z x SDpr/SQRT(4), with z = 1.65 (1-sided 95%).
The interpretation of the results is done with the confidence limits calculated with the z-value and the predicted SD (SDpr) from the EP 5 imprecision data (CLSI EP 7 approach). Note that the imprecision of the %-recoveries depends on the Test and Control level and on the magnitude of the spike (see EXCEL-file).
CAVE: if one uses t, the propagated SD from the actual data has to be calculated (SD from Test and Control: different, because of different levels!). The degrees of freedom must be calculated with the Satterthwaite formula (different concentrations!). The respective test is a t-test.
CAVE: the SD of %-recovery will be high when little is spiked!
Interpretation
The interpretation of the data is done by use of the % ratio plot. The plot shows that none of the CLs overlap with the 5% specification.
Conclusions
The validation demonstrates that the method passes the preset 5% limit for recovery.
Recovery – Model 2 (accuracy/trueness), EXCEL file
Samples
Low IQC material : 3.9 mmol/L
High-normal IQC material : 5.9 mmol/L
High IQC material : 8.5 mmol/L
Standard 1 : 4.5 mmol/L
Standard 2 : 5.0 mmol/L
Standard 3 : 5.5 mmol/L
Measurement
Measure samples 5 times at different days.
Note: may need repetition with other lots of calibrators/reagents.
Calculations (see EXCEL worksheet)
Recovery (%)
= 100 (Measured value/Target value) ± CL
Results
Graphics
The graphic shows the distribution of the results around their mean values. It
shows no irregularities.
Sample y1 y2 y3 y4 y5
3.9 3.93 3.90 3.88 3.92 3.91
5.9 5.83 5.70 5.79 5.63 5.84
8.5 7.92 8.64 8.31 8.79 8.66
4.5 4.63 4.48 4.40 4.50 4.60
5.0 4.92 4.97 5.29 4.95 5.14
5.5 5.59 5.60 5.68 5.93 5.28
Recovery – Model 2 (accuracy/trueness), EXCEL file
Calculations
The 1-sided 95% confidence interval of the mean is calculated as follows:
CI = ± t (0.1,4) x SD/SQRT(5).
Interpretation
The interpretation of the data is done by use of the % ratio plot. The plot shows
that only the CL of Standard 3 overlaps with the 5% specification. This standard
should be repeated.
Conclusions
The validation demonstrates that the method passes the preset 5% limit for
recovery (given that the repetition of Standard 3 is within the specification).
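The % ratio evaluation of Model 2 can be verified numerically. This sketch is in Python rather than EXCEL and assumes t = 2.132 as the 1-sided 95% table value for 4 degrees of freedom (EXCEL: TINV(0.1, 4)); the recovery limit of 5% is read as 95 – 105%. The data are the Model 2 results of this example; names are hypothetical.

```python
import math
import statistics

T_ONE_SIDED_95_DF4 = 2.132  # table value, EXCEL: TINV(0.1, 4)
LOW_PCT, HIGH_PCT = 95.0, 105.0

# Target value (mmol/L) -> five results measured on different days
data = {
    3.9: [3.93, 3.90, 3.88, 3.92, 3.91],
    5.9: [5.83, 5.70, 5.79, 5.63, 5.84],
    8.5: [7.92, 8.64, 8.31, 8.79, 8.66],
    4.5: [4.63, 4.48, 4.40, 4.50, 4.60],
    5.0: [4.92, 4.97, 5.29, 4.95, 5.14],
    5.5: [5.59, 5.60, 5.68, 5.93, 5.28],
}


def recovery_with_cl(target, results, t_value):
    """Return (lower CL, recovery, upper CL) in % of the target value."""
    mean = statistics.mean(results)
    half = t_value * statistics.stdev(results) / math.sqrt(len(results))
    return 100 * (mean - half) / target, 100 * mean / target, 100 * (mean + half) / target


# Materials whose confidence limits overlap with the 5% specification
overlapping = []
for target, results in data.items():
    lower, recovery, upper = recovery_with_cl(target, results, T_ONE_SIDED_95_DF4)
    if lower < LOW_PCT or upper > HIGH_PCT:
        overlapping.append(target)

print(overlapping)
```

Only Standard 3 (5.5 mmol/L) is flagged, which matches the interpretation above.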
Interference testing (CLSI EP7)
Graphics
• See "Recovery: Paired sample"
Statistics
• See "Recovery: Paired sample"
Protocols (CLSI EP 7, 2 approaches)
Approach 1: "Paired difference method"
Applies similar experimental design and calculations as the paired-sample recovery experiment (3 – 5 samples). Instead of an analyte standard, an interferent standard has to be prepared.
• Test: Add x mL interferent solution (preferably in blank solution) to y mL sample; the volume added should be less than 5-10%
• Control: Add the same volume blank solution to the same volume sample
Measure: Control & Test alternating (n = 2 – 4)
Interference (%)
= 100 • (Test - Control)/Control ± 95% CL
Approach 2: "Dose-response method" (used in EXCEL file)
3 – 5 samples, for each
• Low pool (low or no interferent added; if none, add blank!)
• High pool (interferent at maximum concentration)
- Note: always add the same volumes of blank/interferent solutions
• Create 5 levels by "alternative mix-protocol linearity"!
Measure: All levels "up", then "down", or random (n = 2 – 4)
Interference (%)
= 100 • (Test - Control)/Control ± CL
Note
CLSI EP7 applies regression analysis for this protocol!
Interference – EXCEL file
Samples/Materials
Low sample 2 : 3.5 mmol/L
Interferent solution in NaCl : 600 mg/dL
Isotonic saline solution
- Make "Low pool" (add 0.1 mL saline to 0.9 mL sample)
- Make "High pool" (add 0.1 mL interferent solution to 0.9 mL sample)
(Note: always add the same volumes of saline/interferent solutions)
- Create 5 levels by "alternative mixing protocol"
Measurement
Measure, within-run: All levels "up", then down, or random (n = 4)
Interference (%)
= 100 • (Test - Control)/Control ± CL
Results
Graphics
The graphic shows the distribution of the results around their mean values. It
shows no irregularities.
BILI y1 y2 y3 y4
0 3.17 3.12 3.13 3.15
15 3.24 3.18 3.03 3.22
30 3.15 3.12 3.20 3.13
45 3.33 3.36 3.34 3.40
60 3.55 3.68 3.40 3.53
Interference – EXCEL file
Calculations
The 1-sided 95% confidence interval of the mean difference between Test and
Control (0 BILI) is calculated with the z-value as follows:
CI = ± z x SDpr/SQRT(4), with z = 1.65 (1-sided 95%).
The interpretation of the results is done with the confidence limits calculated with
the z-value and the within-run imprecision as calculated from the EP 5 protocol
(CLSI EP 7 approach).
Note that the imprecision of the interference results (SDpr) is SQRT(2) times the
measurement imprecision because the interference results are the difference
between 2 measurements (Test and Control).
Interpretation
The interpretation of the data is done by use of the % difference plot. The plot
shows that only the CL of the sample with 60 mg/dL bilirubin overlaps with the
10% specification. The test is valid up to a bilirubin concentration of 45 mg/dL.
Conclusions
The validation data show that the test is valid up to a bilirubin concentration of 45
mg/dL.
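The interference evaluation can be sketched as follows. This is an illustration in Python, not the EXCEL file. It assumes a within-run CV of 1.1% (the EP 5 result of this example) applied at the control level to obtain the measurement SD; the worksheet may use a different value. SDpr = SQRT(2) times the measurement SD and z = 1.65 follow the calculation described above. Names are hypothetical.

```python
import math
import statistics

Z_ONE_SIDED_95 = 1.65
LIMIT_PCT = 10.0  # interference specification
CV_WR = 0.011     # assumption: within-run CV of 1.1% from the EP 5 experiment

# Bilirubin (mg/dL) -> four glucose results (mmol/L); 0 mg/dL is the Control
results = {
    0: [3.17, 3.12, 3.13, 3.15],
    15: [3.24, 3.18, 3.03, 3.22],
    30: [3.15, 3.12, 3.20, 3.13],
    45: [3.33, 3.36, 3.34, 3.40],
    60: [3.55, 3.68, 3.40, 3.53],
}

control_mean = statistics.mean(results[0])
sd_pr = math.sqrt(2) * CV_WR * control_mean  # SD of a Test - Control difference

interference = {}
exceeding = []
for bili, values in results.items():
    if bili == 0:
        continue
    diff_pct = 100 * (statistics.mean(values) - control_mean) / control_mean
    half_pct = 100 * Z_ONE_SIDED_95 * sd_pr / math.sqrt(len(values)) / control_mean
    interference[bili] = diff_pct
    if abs(diff_pct) + half_pct > LIMIT_PCT:
        exceeding.append(bili)

print({level: round(value, 1) for level, value in interference.items()})
print(exceeding)
```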
Method comparison
Graphics
• Scatter plot
• Difference plot
• Residual plot
• Krouwer plot
• Bland and Altman plot
Statistics
• Correlation
• Regression
• Bland and Altman approach
• General (F-test, t-test, confidence-intervals)
General remarks
Method comparison supposes:
Appropriate performance of test and comparison method
- Internal Quality Control (verify actual imprecision against the expected imprecision by use of the F-test; verify calibration with targeted control samples by t-test or confidence intervals)
Appropriate presentation of the paired observations (xi,yi)
Appropriate interpretation
Interpretation of method comparison makes integrated use of:
Graphical and statistical techniques
Analytical quality specifications
Method comparison – Sample size
Usually, general recommendations are given for sample size (e.g., EP 9: n ≥ 40).
However, to assure given type I and II errors, i.e. sufficient power in a method comparison study, a minimum sample size is needed depending on:
• Slope or intercept deviation to be detected
• Measurement range
• Constant or proportional analytical error assumption
• Magnitude of SD or CV for the methods
Tables are available: See Linnet K. Clin Chem 1999; 45: 882-894.
Method comparison protocols
The CLSI EP-9 protocol
Experimental design:
• At least 40 samples
• Spread analysis over 5 days, randomize concentrations
• Measure duplicates in 1 run, 1st series "upwards", 2nd series "downwards"
Apply adequate internal quality control!
Data presentation and calculations:
• Outlier tests: Diffdupl > 4 × mean Diffdupl (if yes, perform the same with % data)
• Scatter plots, singlicates and mean of duplicates
• Bias plots, singlicates and mean of duplicates
• Inspect for linearity, dispersion, and range (r ≥ 0.975)
• Apply linear regression (ordinary or Deming)
Interpretation:
• Dependent on the criteria of the laboratory
• Dependent on whether a reference method was used or a "comparative" method
Note: Make a distinction between pure statistical, analytical, and clinical
interpretation!
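The duplicate outlier check in the EP-9 data presentation step can be sketched as follows. This is a minimal Python illustration, assuming the check compares each absolute duplicate difference with 4 times the mean absolute duplicate difference; the function name, the `relative` option (for the "% data" variant) and the example values are hypothetical:

```python
from statistics import mean

def flag_duplicate_outliers(duplicates, factor=4.0, relative=False):
    """Return the indices of samples whose duplicate difference exceeds
    factor x the mean duplicate difference (absolute or in %)."""
    diffs = []
    for first, second in duplicates:
        d = abs(first - second)
        if relative:
            # Express the difference in % of the duplicate mean
            d = 100.0 * d / mean((first, second))
        diffs.append(d)
    limit = factor * mean(diffs)
    return [i for i, d in enumerate(diffs) if d > limit]

# Hypothetical duplicate results for 8 samples
duplicates = [(4.0, 4.1), (5.0, 5.1), (6.0, 5.9), (7.0, 7.1),
              (5.5, 5.4), (6.5, 6.6), (4.5, 4.4), (8.0, 9.0)]
flagged = flag_duplicate_outliers(duplicates)
print(flagged)
```

Note that the flagged value itself contributes to the mean difference, so with very few samples no duplicate can exceed the limit.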
The Valtech protocol
Experiments
• At least 50 samples (better: 80 - 100).
• Carry out the analyses in singlicate, spread over 10 measurement series, and select the samples at random.
• Adequate internal quality control!
Vassault A, Grafmeyer D, Naudin Cl, Dumont G, Bailly M, Henny J, Gerhardt MF,
Georges P. Société Française de Biologie Clinique. Protocole de validation de
techniques. Ann Biol Clin 1986;44:686-719 (english version: 720-45).
See also: Vassault A, Grafmeyer D, de Graeve J, Cohen R, Beaudonnet A,
Bienvenu J. Société Française de Biologie Clinique. Analyses de biologie
médicale: spécifications et normes d’acceptabilité à l’usage de la validation de
techniques. Ann Biol Clin 1999;57:685-95.
Method comparison protocols
The “UG” protocol
“If possible, use a true reference method for comparison”
Experiments
• Start from a reliable calibration basis and verify it with IQC samples from the manufacturer = Stable basis.
• Adapt the number and the sort of samples to the problem (e.g. 50).
• Duplicates in 1 series, random sampling (note: for the reference method, adapt the number of measurements to the problem).
• “Intensive” IQC
Dewitte K, Stöckl D, Van de Velde M, Thienpont LM. Evaluation of intrinsic and
routine quality of serum total magnesium measurement. Clin Chim Acta
2000;292:55-68.
The stable basis
Was the method performed adequately? Inspection of the internal quality control (IQC) data.
• Evaluation of precision and traceability to manufacturer
The stable basis – Statistics
• F-test
• t-test
• Confidence intervals
Method comparison – EXCEL file
Results
Ref. Yours Ref. Yours Ref. Yours Ref. Yours
3.79 3.80 4.89 4.64 5.65 5.55 6.66 6.87
3.84 3.88 4.91 4.62 5.73 5.58 6.71 6.80
3.86 3.65 4.91 4.90 5.79 6.08 6.78 6.90
3.88 3.86 4.95 4.88 5.83 5.65 6.87 7.11
3.92 3.93 5.01 4.86 5.84 6.05 6.94 7.17
3.99 4.09 5.02 4.89 5.86 5.76 7.10 7.07
4.08 4.16 5.03 5.17 5.92 5.76 7.12 7.00
4.11 4.11 5.16 4.90 5.93 5.57 7.13 7.02
4.13 4.05 5.17 5.12 5.94 6.10 7.14 6.90
4.13 4.07 5.17 5.01 5.97 5.80 7.15 7.23
4.23 4.38 5.18 5.26 5.97 5.88 7.15 7.38
4.27 4.21 5.25 5.28 6.06 6.11 7.36 7.19
4.38 4.28 5.39 5.37 6.11 6.08 7.43 7.11
4.39 4.28 5.44 5.49 6.12 5.90 7.47 7.11
4.42 4.31 5.49 5.43 6.30 6.03 7.51 7.15
4.58 4.63 5.53 5.34 6.49 6.48 7.56 7.39
4.70 4.65 5.55 4.99 6.50 6.77 7.90 7.81
4.70 4.48 5.58 5.45 6.59 6.58 8.02 7.83
4.85 5.01 5.58 5.53 6.61 6.22 8.07 7.82
4.85 4.62 5.65 5.27 6.66 6.28 8.19 7.72
Method comparison – EXCEL file
Calculations – Bland & Altman approach
The calculations comprise the mean difference, the 1.96 × SD limits of the individual
differences (1.96 × CV when % differences are used), and their respective CLs.
CI (mean) = ± t(0.1,79) × SDdiff/SQRT(80)
CI (1.96s centile) = ± t(0.1,79) × SQRT[SDdiff^2/80 + 1.96^2 × SDdiff^2/(2 × 80)]
= 1.71 × CI (mean)
See also Worksheet Meth-Comp3 for calculations.
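The Bland & Altman calculations can be sketched in Python as below. This is an illustration under stated assumptions, not a copy of Worksheet Meth-Comp3: the % differences are taken relative to the reference value, the function name is hypothetical, only the first five data pairs of the results table are used, and the t-value (2.132 for a two-sided 10% level with 4 degrees of freedom) is taken from a t-table and passed in by hand.

```python
import math
from statistics import mean, stdev

def bland_altman_percent(ref, test, t_crit):
    """Mean % difference, 1.96 x SD limits, and the CI half-widths of both."""
    # Assumption: % difference is expressed relative to the reference value
    diffs = [100.0 * (y - x) / x for x, y in zip(ref, test)]
    n = len(diffs)
    mean_diff = mean(diffs)
    sd_diff = stdev(diffs)
    ci_mean = t_crit * sd_diff / math.sqrt(n)
    ci_limit = t_crit * math.sqrt(sd_diff**2 / n
                                  + 1.96**2 * sd_diff**2 / (2 * n))
    return {
        "mean_diff": mean_diff,
        "sd_diff": sd_diff,
        "lower_limit": mean_diff - 1.96 * sd_diff,
        "upper_limit": mean_diff + 1.96 * sd_diff,
        "ci_mean": ci_mean,
        "ci_limit": ci_limit,
    }

ref = [3.79, 3.84, 3.86, 3.88, 3.92]    # "Ref." column, first 5 rows
yours = [3.80, 3.88, 3.65, 3.86, 3.93]  # "Yours" column, first 5 rows
result = bland_altman_percent(ref, yours, t_crit=2.132)
print(result)
```

For the full data set (n = 80), the t-value for 79 degrees of freedom would be used instead. The ratio `ci_limit / ci_mean` is independent of n and equals SQRT(1 + 1.96^2/2), i.e. about 1.71.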
Graphics and interpretation
The graphic (% differences) reveals no outliers. The CLs of the mean and the
1.96 centile of the differences ("limits of agreement") do not overlap with the
respective specifications of 3% (SE or Bias limit) and 10% (TE limit).
Conclusions
The validation data show that the test passes the preset limits for systematic (3%)
and total error (10%).
Method comparison – EXCEL file
Calculations – Regression
See Worksheet Meth-Comp4 for the detailed calculation of the ordinary linear
regression and correlation estimates.
Calculations
CI (line) = ± t × Sy/x × SQRT[1/n + (Xc – Xmean)^2/SUM(Xi – Xmean)^2] (df of t = n – 2)
Xc: concentration for which the bias shall be investigated.
CI (points) = ± t × Sy/x × SQRT[1 + 1/n + (Xc – Xmean)^2/SUM(Xi – Xmean)^2] (df of t = n – 2)
Xc: concentration for which the total error shall be investigated.
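A minimal Python sketch of these two intervals for ordinary linear regression is given below. It is an illustration only, not the content of Worksheet Meth-Comp4: the function name and the five data pairs are hypothetical, and the t-value (2.353 for a two-sided 10% level with n – 2 = 3 degrees of freedom) is taken from a t-table and passed in by hand.

```python
import math
from statistics import mean

def ols_with_intervals(x, y, xc, t_crit):
    """Ordinary least-squares fit plus CI half-widths at concentration xc."""
    n = len(x)
    x_mean, y_mean = mean(x), mean(y)
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    # Sy/x: standard deviation of the residuals, n - 2 degrees of freedom
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    s_yx = math.sqrt(ss_res / (n - 2))
    core = 1.0 / n + (xc - x_mean) ** 2 / sxx
    ci_line = t_crit * s_yx * math.sqrt(core)          # relates to bias
    ci_points = t_crit * s_yx * math.sqrt(1.0 + core)  # relates to total error
    return slope, intercept, s_yx, ci_line, ci_points

x = [1.0, 2.0, 3.0, 4.0, 5.0]  # hypothetical comparison method results
y = [1.1, 1.9, 3.2, 3.9, 5.1]  # hypothetical test method results
slope, intercept, s_yx, ci_line, ci_points = ols_with_intervals(
    x, y, xc=5.0, t_crit=2.353)
print(slope, intercept, s_yx, ci_line, ci_points)
```

Both intervals are narrowest at Xc = Xmean and widen towards the ends of the range, which is why the interpretation looks at the minimum and maximum values of x.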
Graphics
The results are presented in a scatter plot and a residuals plot.
Interpretation
The confidence limits of bias and total error at the minimum and maximum values
of x (respectively y) are compared with the specifications. They are smaller than
the specifications at both concentrations (see Worksheet Meth-Comp4).
Conclusions
The validation data show that the test passes the preset limits for systematic (3%)
and total error (10%).
Content
Summary of protocols, statistics & graphics
System stability, Ruggedness and Multifactor protocols
Glossary of terms
Annex
Protocols & statistics
Experimental protocols
Protocols
• Imprecision : EP 5
• Limit of detection : EP 17 or "Common"
• Working range : see linearity or define by imprecision
• Linearity : EP 6
• Linearity by recovery : "Common" (Accuracy)
• Recovery, reference material : "Common" (Accuracy)
• Recovery, added analyte : see interference/specificity
• Interference/Specificity : EP 7
• Total error : EP 9, UG* (Method comparison)
EP* = CLSI Evaluation protocols; UG = University Ghent
Others
• EP 10 Preliminary evaluation
• EP 12 Qualitative tests
• EP 14 Matrix effects
• EP 15 User demonstration of precision & accuracy
• EP 21 Total error
Statistics (> Statistics Book)
Analytical problem : Associated statistics
• General : Basic statistics; outlier tests (e.g., Grubbs)
• Imprecision : F-test; CHI2-test (#); ANOVA
• Limit of detection : Probability & Power
• Working range : see linearity or define by imprecision
• Linearity : Regression, ANOVA
• Recovery : t-test (#)
• Interference/Specificity : t-test (#)
• Total error (method comparison) : Regression & correlation; Bland & Altman plot
• Trouble-shooting : Power (sample size calculations)
# Alternative: confidence intervals
Graphics
Univariate data
• Dot plot
• Histogram
• Box plot
Bivariate data
• Scatter plot
• Difference plot
• Ratio plot (%) (Recovery)
• Residuals plot
[Figure panels, images not reproduced: two univariate panels for Sample A (y-axis: Value); histogram (x-axis: Value-Bin, y-axis: Frequency); residuals plot (x-axis: Glucose A (mmol/l), y-axis: Residual (mmol/l)); scatter plot (x-axis: Glucose A (mmol/l), y-axis: Glucose B (mmol/l)).]
Overview of experiments, statistics, and graphics
For each performance characteristic: samples, measurements (#), relevant SD ($), graphics, statistical test vs specification, and CLSI document.
Imprecision
• Samples: IQC samples; no target
• Measurements: n = 20
• Relevant SD: within & total
• Graphics: dot plot
• Statistical test vs specification: ANOVA & 1-sample F-test, or CL of SD
• CLSI Doc.: EP 5, EP 15
LoD/LoQ
• Samples: blank; low sample
• Measurements: n = 20
• Relevant SD: total
• Graphics: dot plot
• Statistical test vs specification: 1-sample F-test or CL of SD
• CLSI Doc.: EP 17
Linearity
• Samples: 5 related samples/calibrators (mix); no target
• Measurements: n = 4
• Relevant SD: within
• Graphics: scatter/residual plot
• Statistical test vs specification: lack-of-fit or polynomial regression
• CLSI Doc.: EP 6
Working range
• See: Imprecision/Linearity
Interference
• Samples: interferent spike & control (no target)
• Measurements: n = 4
• Relevant SD: within
• Graphics: difference/ratio plot
• Statistical test vs specification: CL of mean difference (or t-test)
• CLSI Doc.: EP 7
Trueness (Accuracy)
• Samples: known analyte spike & control, or CRM
• Measurements: n = 5
• Relevant SD: total
• Graphics: difference/ratio plot
• Statistical test vs specification: CL of mean difference or CL of mean (or t-tests)
• CLSI Doc.: EP 7, EP 15
Total error
• Samples: 40 samples (RMP target)
• Measurements: n = 1 or 2
• Relevant SD: total or within (UG protocol)
• Graphics: scatter/bias plot
• Statistical test vs specification: correlation, regression/Bland & Altman
• CLSI Doc.: EP 9, EP 21, UG
# Numbers do not always correspond to the respective CLSI document.
$ Abbreviations: SD: standard deviation; IQC: Internal Quality Control; CLSI: Clinical and Laboratory Standards Institute; CRM: Certified Reference Material; RMP: Reference Measurement Procedure; UG: University Ghent; CL: Confidence limit.
Overview of experiments, statistics, and graphics
Sensitivity of statistical parameters to different types of errors
(From: Westgard JO, Hunt MR. Clin Chem 1973;19:49-57)
Statistic (technique) : sensitive to Random / Constant / Proportional error
• Slope (least-squares) : No / No / Yes
• Intercept (least-squares) : No / Yes / No
• Sy/x (least-squares) : Yes / No / No
• Bias (paired t-test) : No / Yes / Yes
• SDdiff (paired t-test) : Yes / No / Yes
• Correlation r : Yes / No / No
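The constant and proportional columns of this overview can be reproduced with a small deterministic example. The Python sketch below is an illustration with hypothetical values (a constant error of +0.5 and a proportional error of +10%) and a hypothetical function name; random error is left out so that the results stay exactly reproducible.

```python
from statistics import mean, stdev

def comparison_statistics(x, y):
    """Least-squares slope/intercept and paired-difference bias/SDdiff."""
    x_mean, y_mean = mean(x), mean(y)
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    diffs = [yi - xi for xi, yi in zip(x, y)]
    return {"slope": slope, "intercept": intercept,
            "bias": mean(diffs), "sd_diff": stdev(diffs)}

x = [4.0, 5.0, 6.0, 7.0, 8.0]                                    # hypothetical reference results
constant = comparison_statistics(x, [xi + 0.5 for xi in x])      # constant error only
proportional = comparison_statistics(x, [1.1 * xi for xi in x])  # proportional error only
print(constant)
print(proportional)
```

With the constant error, only the intercept and the bias move away from their ideal values; with the proportional error, the slope, the bias and SDdiff all respond, in line with the entries above.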
System stability
Trueness is also related to system [in]stability
• Drift
• Shift
System instability is tackled by internal quality control.
Carryover
Carryover is related to the quality of the instrument and the test procedure (e.g., washing).
See CLSI protocol EP-10.
Ruggedness
Ruggedness = ability to reproduce the method in different laboratories or in different circumstances
• Related to the method principle and the test conditions
• Related to the instrument a method is performed with
Assessment of ruggedness
• Between-laboratory performance data obtained through EQA
• Ease of operation within the laboratory
  – Efforts needed for internal quality control
  – Productivity of a method (down time, calibration and service intervals, etc.)
Multifactor protocols
Classically, single effects are investigated in one experimental design (e.g.,
imprecision, linearity, carryover). Multi-factor evaluation designs investigate
several effects with one experimental design – Advantage: less time consuming!
Example: EP 10
Allows evaluation of
• Imprecision
• Linearity
• Bias
• Carryover
• Drift
Applies multiple linear regression analysis
• Needs special software for interpretation
The EP-10 protocol
The design
• 3 interrelated samples: low, mid, high
• Prescribed measurement sequence: mid, mid, high, low, mid, mid, low, low, high, high, mid.
• 5 days, always 1 run
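One way to see why a single prescribed sequence can carry information on carryover is to count which sample level precedes which. The Python sketch below only tabulates these transitions for the sequence given above; it is an illustration with hypothetical names, not the multiple linear regression analysis that EP-10 applies.

```python
from collections import Counter

# Prescribed EP-10 measurement sequence, as listed in the design
EP10_SEQUENCE = ["mid", "mid", "high", "low", "mid", "mid",
                 "low", "low", "high", "high", "mid"]

def transition_counts(sequence):
    """Count (previous level, current level) pairs in a run."""
    return Counter(zip(sequence, sequence[1:]))

counts = transition_counts(EP10_SEQUENCE)
for (previous, current), n in sorted(counts.items()):
    print(f"{previous} -> {current}: {n}")
```

Every one of the 9 possible (previous, current) combinations of low, mid and high occurs at least once in the run, so each level is measured after each other level.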
Glossary
Metrology [1]
field of knowledge concerned with measurement
Measurand [1]
quantity intended to be measured
Quantity [1]
property of a phenomenon, body, or substance, to which a number can be assigned with
respect to a reference
Measurement [1]
process of experimentally obtaining one or more quantity values that can reasonably be
attributed to a quantity
Notes:
• Quantities are length, mass, amount-of-substance, time, temperature, etc.
• The value of a quantity is expressed by both a number and a unit
• The full specification of the quantities measured in the medical laboratory comprises three elements:
  – System (e.g., blood plasma)
  – Component (also called analyte) (e.g., glucose)
  – Kind-of-quantity (e.g., amount-of-substance concentration)
The full report of a glucose measurement would read: “the amount-of-substance
concentration of glucose in blood plasma was 5.2 mmol/L”
Measurement unit [1]
scalar quantity, defined and adopted by convention, with which any other quantity of the
same kind can be compared to express the ratio of the two quantities as a number
Value of a quantity [1]
number and reference together expressing magnitude of a quantity
EXAMPLE: Length of a given rod: 5.34 m
Measurement standard [1]
realization of the definition of a given quantity, with stated quantity value and
measurement uncertainty, used as a reference
EXAMPLE: 1 kg mass standard.
Error [1]
difference of measured quantity value and reference quantity value
Systematic error [1]
component of measurement error that in replicate measurements remains constant or varies in a predictable manner
Bias [1]
systematic measurement error or its estimate, with respect to a reference quantity value
Random error [1]
component of measurement error that in replicate measurements varies in an unpredictable manner
Trueness [1]
closeness of agreement between the average of an infinite number of replicate measured quantity values and a reference quantity value
Accuracy [1]
closeness of agreement between a measured quantity value and a true quantity value of the measurand
Precision [1]
closeness of agreement between indications obtained by replicate measurements on the same or similar objects under specified conditions
Repeatability condition [1]
condition of measurement in a set of conditions that includes the same measurement procedure, same operators, same measuring system, same operating conditions and same location, and replicate measurements on the same or similar objects over a short period of time
Reproducibility condition [1]
condition of measurement in a set of conditions that includes different locations, operators, measuring systems, and replicate measurements on the same or similar objects
Uncertainty [1]
parameter characterizing the dispersion of the quantity values being attributed to a measurand, based on the information used
[Metrological] Traceability [1]
property of a measurement result whereby the result can be related to a stated reference through a documented unbroken chain of calibrations, each contributing to the measurement uncertainty
Commutability [of a reference material] [1]
property of a reference material, demonstrated by the closeness of agreement between
the relation among the measurement results for a stated quantity in this material,
obtained according to two given measurement procedures, and the relation obtained
among the measurement results for other specified materials
Matrix effect [2]
influence of a property of the sample, other than the measurand, on the measurement of
the measurand according to a specified measurement procedure and thereby on its
measured value
Influence quantity [1]
quantity that, in a direct measurement, does not affect the quantity that is actually
measured, but affects the relation between the indication and the measurement result
Note: Specificity & Interference are not yet unequivocally defined by ISO.
Selectivity [1]
capability of a measuring system, using a specified measurement procedure, to provide
measurement results, for one or more measurands, that do not depend on each other nor
on any other quantity in the system undergoing measurement (= specificity in chemistry)
Interference [in analysis]
A systematic error in the measure of a signal caused by the presence of concomitants in
a sample (http://goldbook.iupac.org)
specific [in analysis]
A term which expresses qualitatively the extent to which other substances interfere with
the determination of a substance according to a given procedure. Specific is considered
to be the ultimate of selective, meaning that no interferences are supposed to occur
(http://goldbook.iupac.org).
Calibration [1]
operation that, under specified conditions, in a first step establishes a relation between
the quantity values with measurement uncertainties provided by measurement standards and
corresponding indications with associated measurement uncertainties and, in a second
step, uses this information to establish a relation for obtaining a measurement result from
an indication
Sensitivity [1]
quotient of the change in the indication and the corresponding change in the value of the
quantity being measured
Linear range
Concentration range over which the intensity of the signal obtained is directly proportional to the concentration of the species producing the signal (http://goldbook.iupac.org).
Linearity (generic)
Ability of an analytical procedure to produce test results which are proportional to the concentration (amount) of an analyte, either directly or by means of a well-defined mathematical transformation.
Working interval [1]
set of values of the quantities of the same kind that can be measured by a given measuring instrument or measuring system with specified instrumental uncertainty, under defined conditions
Limit of detection (in analysis)
The limit of detection, expressed as the concentration, cL, or the quantity, qL, is derived from the smallest measure, xL, that can be detected with reasonable certainty for a given analytical procedure. The value of xL is given by the equation xL = xbi + k × sbi, where xbi is the mean of the blank measures, sbi is the standard deviation of the blank measures, and k is a numerical factor chosen according to the confidence level desired (http://goldbook.iupac.org).
Limit of detection [1]
measured quantity value, obtained by a given measurement procedure, for which the probability of falsely claiming the absence of a component in a material is β, given a probability α of falsely claiming its presence
Ruggedness (generic)
Ability to reproduce the method in different laboratories or in different circumstances.
Ruggedness (USP)
Degree of reproducibility of the results obtained under a variety of conditions, expressed as %RSD. These conditions include different laboratories, analysts, instruments, reagents, days, etc.
Robustness (ICH Q2A 1995)
The robustness of an analytical procedure is a measure of its capacity to remain unaffected by small, but deliberate variations in method parameters and provides an indication of its reliability during normal usage.
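The IUPAC expression for the limit of detection quoted in this glossary, xL = xbi + k × sbi, can be illustrated with a few lines of Python. This is a sketch only: the function name and the blank values are hypothetical, and k = 3 is merely an example choice for the numerical factor.

```python
from statistics import mean, stdev

def limit_of_detection(blank_measures, k=3.0):
    """Smallest detectable measure xL = mean(blank) + k * SD(blank)."""
    return mean(blank_measures) + k * stdev(blank_measures)

blanks = [0.1, 0.2, 0.3]          # hypothetical blank measures (signal units)
x_l = limit_of_detection(blanks)  # k = 3 is an assumed example value
print(round(x_l, 3))
```

The result is a measure (signal); converting it into a concentration cL requires the calibration function of the procedure.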
[1] BIPM, IEC, IFCC, ISO, IUPAC, IUPAP, OIML. Vocabulaire International des Termes Fondamentaux et Généraux de Métrologie. 3rd ed. Geneva: ISO, 2007.
[2] EN/ISO 17511:2003. In vitro diagnostic medical devices – Measurement of quantities in biological samples – Metrological traceability of values assigned to calibrators and control materials.
[3] See also: www.clsi.org > Harmonized Terminology Database