exploratory analysis of clinical safety data to …hbiostat.org/talks/gsksafety.pdfexploratory...
TRANSCRIPT
Exploratory Analysis of
Clinical Safety Data to Detect
Safety Signals
Frank E Harrell JrDepartment of Biostatistics
Vanderbilt University School of Medicine
Nashville TN USA
biostat.mc.vanderbilt.edu
Slides and R Code at Jump: FHHandouts
FDA VISITING PROFESSOR LECTURE SERIES
WHITE OAK MD 11 OCT 2005
Outline
1. The problem
2. Statistical efficiency issues
3. Graphs, not tables
4. ECDFs and extended box plots
5. Clinical trial data
6. Empirical CDFs for lab variables
7. Variable clustering
8. Time trends and clustering of AEs
9. Clustering of lab variables
10. Recursive partitioning
11. Who is having selected AEs?
12. Which AEs and lab abnormalities are
independently related to treatment?
13. Examples from other trials
1
The Problem
• No analytic plan
• Often collect safety data to learn what safety
parameters are of concern
• Large number of lab parameters and types of
adverse events encountered
• More emphasis on type II error vs. type I error in
contrast to efficacy assessment
• Adjustment for multiple P -values and ad hoc
comparisons ill-defined; done informally but with an
eye on consistent patterns
• Exploratory multivariate problem
2
Statistical Efficiency Issues
• For clinical lab data, common to compute
proportions of subjects above 2× or 3× upper
limit of normal
• “Normal” is arbitrary, not tied to clinical outcomes
• Most efficient cutoff of a continuous variable is the
median
• Still results in efficiency of only 2π ≈ 0.64 if
distribution is normal (median test)
• Wilcoxon 2-sample test has efficiency of 3π ≈ 0.95
• Nonparametric tests generally have greater
efficiency than parametric tests because of
skewness, high-influence points, etc.
3
Efficiency Issues, continued
• Keep continuous variables continuous as much as
possible
– Simple tests: Wilcoxon, Kruskal-Wallis,
Spearman
– Graphs: empirical cumulative distributions,
scatterplots, multiple quantiles
4
Graphs, Not Tables
• Have pity on statistical and medical reviewers
• Difficult to see patterns in tables
• Substituting graphs for tables increases efficiency
of review
5
ECDFs and Extended Box Plots
• Empirical cumulative distribution functions:
– full resolution of data
– unique; no arbitrary binning of data
– ideal for comparing 2 treatments
• Box plots can be extended to show not only 3
quartiles but other quantiles [2]
• 0.25, 0.5, 0.75, 0.9 intervals + median and mean
6
[19.0,31.0)
[31.0,41.0)
[41.0,50.2)
[50.2,62.0)
[62.0,92.0]
malesmall
4 6 8 10 12 14 16
femalesmall
[19.0,31.0)
[31.0,41.0)
[41.0,50.2)
[50.2,62.0)
[62.0,92.0]
malemedium
femalemedium
[19.0,31.0)
[31.0,41.0)
[41.0,50.2)
[50.2,62.0)
[62.0,92.0]
malelarge
femalelarge
4 6 8 10 12 14 16
glyhb
Figure 1: Extended box plots for glycosolated hemoglobin, strati-
fied by two categorical variables (forming panels) and one continu-
ous variable (categorized into quintiles).
7
Clinical Trial Data
• A pharmaceutical company generously supplied
excellent demographic, AE, vital signs, clinical
chemistry, and ECG data
• Three protocols combined
• Phase III randomized double-masked
placebo-controlled parallel-group studies
• Drug:placebo 2:1 randomization (n = 1374 and
684)
• Analyzed asessments at weeks 0, 2, 4, 8, 12, 16,
20 (plus week 1 for AEs)
8
Comparing Lab Variables Between Groups
• Means and SDs are not very helpful for highly
skewed data
• Examining summary stats individually can
exaggerate treatment differences
• Empirical CDFs display all information objectively
9
lymphocytes absolute
Pro
port
ion
<=
x
2 4 6 8
0.0
0.2
0.4
0.6
0.8
1.0
n:1720 m:338 albumin
Pro
port
ion
<=
x
30 35 40 45 50
0.0
0.2
0.4
0.6
0.8
1.0
n:1720 m:338 alkaline phosphatase
Pro
port
ion
<=
x
100 200 300
0.0
0.2
0.4
0.6
0.8
1.0
n:1718 m:340 hematocrit
Pro
port
ion
<=
x
20 30 40 50 60
0.0
0.2
0.4
0.6
0.8
1.0
n:1725 m:333
total bilirubin
Pro
port
ion
<=
x
0 10 20 30 40 50
0.0
0.2
0.4
0.6
0.8
1.0
n:1719 m:339 creatinine
Pro
port
ion
<=
x
50 100 150 200 250 300
0.0
0.2
0.4
0.6
0.8
1.0
n:1719 m:339 total bilirubin
Pro
port
ion
<=
x0 10 20 30 40 50
0.0
0.2
0.4
0.6
0.8
1.0
n:1719 m:339 white blood cell count
Pro
port
ion
<=
x
5 10 15
0.0
0.2
0.4
0.6
0.8
1.0
n:1725 m:333
red blood cell count
Pro
port
ion
<=
x
2 3 4 5 6 7
0.0
0.2
0.4
0.6
0.8
1.0
n:1725 m:333 platelets
Pro
port
ion
<=
x
0 200 400 600
0.0
0.2
0.4
0.6
0.8
1.0
n:1724 m:334 corrected qt
Pro
port
ion
<=
x
350 400 450 500
0.0
0.2
0.4
0.6
0.8
1.0
n:1746 m:312 qrs
Pro
port
ion
<=
x50 100 150
0.0
0.2
0.4
0.6
0.8
1.0
n:1746 m:312
Figure 2: Empirical CDFs of 12 lab and ECG parameters stratified
by treatment group for week 8. CDFs are virtually superimposed.
10
Usefulness of Variable Clustering
• Learn how multivariate responses occur/move
together
• Learn about redundancy—variables that are
sufficiently captured by other variables
• Cluster separately by treatment to see if treatment
more likely than placebo to cause multiple
abnormalities in the same subject
• Standard hierarchical clustering algorithms may be
run on a variety of similarity matrices based on
pairwise similarity measures
– proportion of subjects missing on two variables
– proportion of subjects having a pair or AEs
– Spearman ρ2: strength of monotonic
relationship
11
Clustering of AEs
• Analyzed 7 AEs having at least 100 episodes
• Which AEs occur together?
• Similarity measure: proportion of patients having
both AEs (diagonal = 1)
12
diarrhea
ab.pain
nausea
headache
dyspepsia
upper.resp.infect
coad
0.010
0.008
0.006
0.004
0.002
0.000
Proportion
Week 1 D
rug
coad
upper.resp.infect
dyspepsia
headache
ab.pain
nausea
diarrhea
0.005
0.004
0.003
0.002
0.001
0.000
Proportion
Week 1 P
lacebo
Figure 3: Variable clustering of AEs at week 1 using proportion of
patients having two AEs as similarity measure.
13
Time Trends of AE Clustering
• Difficult to track changes in dendograms over
multiple times
• Can separately plot all pairwise similarities over
time, stratified by treatment
• Estimate incidence of pairs of AEs above
coincidence levels
• Pij − PiPj
14
headache ab.pain nausea dyspepsia diarrhea upper.resp.infect coad
headache
ab.pain
nausea
dyspepsia
diarrhea
upper.resp.infect
coad
0.00
0.08
−0.04
0.04
Figure 4: Time trends in incidence of AEs (diagonal) and chance-
corrected joint incidence (off-diagonal). Solid lines represent drug
and dotted lines placebo. Horizontal reference lines are at zero
(chance level of joint incidence). Week is on x-axes.
15
Clustering Lab Variables
Similarity measure = Spearman ρ2
16
monocytesneutrophils
wbcbasophils
eosinophilsglucose
potassiumchloridesodium
buncreatinine
uric.acidalk.phosplateletsbilirubin
lymphocytesrbc
hematocrithemoglobin
albuminprotein
ggtalat
asat
1.0
0.8
0.6
0.4
0.2
0.0
Spearman ρ2W
eek 8 Dru
g
bilirubinrbc
hematocrithemoglobin
eosinophilsbasophils
monocyteslymphocytes
plateletsneutrophils
wbcggt
alatasat
potassiumalbuminprotein
alk.phoschloridesodium
glucoseuric.acid
buncreatinine
1.0
0.8
0.6
0.4
0.2
0.0
Spearman ρ2
Week 8 P
lacebo
Figure 5: Variable clustering of clinical chemistry variables at week
8.17
axis
pr
qrs
corr.qt
rr0.12
0.10
0.08
0.06
0.04
0.02
0.00
Spearman ρ2
Week 8 D
rug
corr.qt
rr
axis
pr
qrs
0.10
0.08
0.06
0.04
0.02
0.00
Spearman ρ2
Week 8 P
lacebo
Figure 6: Variable clustering of ECG parameters at week 8.
17-1
buncreatinineuric.acid axisprchloridesodiumpotassiumglucosehrrr sbpdbpalbuminprotein ggtalatasat corr.qtqrsmonocytesneutrophilswbc basophilseosinophilscoadplateletsrbchematocrithemoglobin alk.phosbilirubinlymphocytes1.0
0.8
0.6
0.4
0.2
0.0
Spearman ρ2
Week 8 D
rug
bilirubinlymphocytesplateletsalbuminprotein ggtalatasat chloridesodiumalk.phosneutrophilswbc uric.acidbuncreatinine glucoserbchematocrithemoglobin hrrr potassiumbasophilseosinophilsmonocytescoadcorr.qtqrssbpdbp axispr
1.0
0.8
0.6
0.4
0.2
0.0
Spearman ρ2
Week 8 P
lacebo
Figure 7: Variable clustering of combined vital signs, AE, and clinical
chemistry variables at week 8.
17-2
neutrophils
alat
albumin
alk.phos
asat
basophils
bilirubin
bun
chloride
creatinine
eosinophils
ggt
glucose
hematocrit
hemoglobin
potassium
lymphocytes
monocytes
sodium
platelets
protein
rbc
uric.acid
wbc
01
Figure 8: Time trends in correlation between selected lab variables,
stratified by treatment (dotted line = placebo). Y -axes are Spearman
ρ2. Week is on all x-axes.
17-3
Summarizing Time Trends in Correlations
• For each treatment compute slope of squared rank
correlations over time
• Compute differences (drug - placebo)
18
neutrophils
alat
albumin
alk.phos
asat
basophils
bilirubin
bun
chloride
creatinine
eosinophils
ggt
glucose
hematocrit
hemoglobin
potassium
lymphocytes
monocytes
sodium
platelets
protein
rbc
uric.acid
wbc
neutrophils
alat
albumin
alk.phos
asat
basophils
bilirubin
bun
chloride
creatinine
eosinophils
ggt
glucose
hematocrit
hemoglobin
potassium
lymphocytes
monocytes
sodium
platelets
protein
rbc
uric.acid
wbc
Figure 9: Differences in slopes of squared rank correlations over
time. Vertical line segments above gray horizontal reference lines
correspond to positive slope differences and hence indicate that
correlations became stronger over time for drug than for placebo;
those below the gray lines correspond to negative slope differences
and hence indicate that the rank correlation between the two indi-
cated variables became smaller over time for drug as compared to
placebo. Maximum difference was 0.033 and minimum was -0.035.
19
Recursive Partitioning
• Almost model-free
• Useful for handling large numbers of potential
predictors
• Can provide interesting leads if not internally
validated
• Conservative if tree pruned to internally validate
20
Who is Having an AE?
• Chronic obstructive airways disease is the most
common AE
• Use recursive partitioning to develop a regression
tree predicting Prob(COAD)
• Descriptive analysis; requires validation
• Candidate predictors: treatment, time, 6
demographics, 2 smoking, systolic and diastolic bp,
24 clinical chemistry parameters, 5 ECG
parameters
21
Chronic Obstructive Airways DiseaseWeek 8
wbc< 8.75
rr>=648.6
0.083n=2011
0.066n=1685
0.166n=326
0.132n=287
0.41n=39
Figure 10: Regression tree predicting Prob(COAD) at week 8.
22
Chronic Obstructive Airways DiseaseAll Weeks
week< 0.5
wbc< 8.75
0.067n=16103
0.006n=2058
0.076n=14045
0.069n=12669
0.136n=1376
Figure 11: Regression tree predicting Prob(COAD) at any week.
23
Logistic Model Predicting COAD
• Predict COAD at week 8 for subjects having no
COAD before week 8
• Predictors: treatment, age, sex, smoking, restricted
cubic splines in lymphocytes, WBC, heart rate, BMI
• Only 54 events and 458 non-events due to missing
data (especially lymphocytes)
• Recursive partitioning using treatment, baseline
variables, vital signs, EKG variables, and clinical
chemistry variables could not find a split that
validated
• Logistic model: P = 0.11 for treatment, 0.04 for
lymphocytes (very nonlinear), 0.06 for WBC
(nonlinear)
• C = 0.739 (apparent) 0.66 (bootstrap
overfitting-corrected)
24
white blood cell count, 109 L
Pro
b[N
ew C
OA
D]
2 4 6 8 10 12 14 16
0.02
0.04
0.06
0.08
0.10
female
male
Adjusted to: age=65 smoking=0 lymphocytes=1.796 hr=76 trx=Drug bmi=26.01
Figure 12: Predicted probability of new COAD as a function of
WBC and sex from binary logistic model.
25
4 5 6 7 8 9 10
1.0
1.5
2.0
2.5
3.0
white blood cell count, 109 L
lym
phoc
ytes
, 109
L
Adjusted to: age=65 sex=male smoking=0 hr=76 trx=Drug bmi=26.01
Figure 13: Predicted probability of new COAD as a function of
WBC and lymphocytes. Regions beyond which fewer than 10 sub-
jects exist are not shown.
26
Prediction of GI Problems
• Recursive partitioning to predict union of
abdominal pain, nausea, dyspepsia, diarrhea at 8
weeks using predictors measured at 4 weeks
(except AEs) plus some baseline variables.
• Used treatment, baseline variables, vitals, EKG,
clinical chemistry
• No splits that cross-validated
• Try predictors treatment, age, sex, BMI, smoking,
SBP, DBP, HR, WBC
• Again no splits
• Logistic regression using same variables (with
splines) — using 161 GI events, 1577 non-events
• Apparent C=0.64, bootstrap validated 0.58
• Males less likely to have GI AE
(OR = 0.58, P = 0.004)
27
• Treatment: P = 0.15
• Low SBP associated with ↑ GI events
χ2 − df
−2 0 2 4 6
sex
sbp
hr
trx
bmi
dbp
smoking
age
wbc
Partial Associations with GI AEs
Figure 14: Tests of partial association of subject characteristics
at week 4 or baseline, with GI events at week 8. χ2 values are
adjusted for d.f.
28
Multivariate Analysis of Treatment Differences
• Multiple responses: AEs, vital signs, labs
• True multivariate methods are cumbersome and
make many assumptions
• O’Brien [3] turned the 2-sample t-test backwards
• Predict treatment from Y using binary logistic
model (propensity score [1])
• To allow differences in means and variances use
Y, Y 2
• Extend to multiple Y s: more flexible than Hotelling
T 2
• Start with recursive partitioning
29
Regression Tree for Prob[drug]Week 8−20
platelets< 179.5
coad>=0.5
alat>=18.5
sodium< 139.5
0.668n=8232
0.571n=553
0.675n=7679
0.585n=537
0.681n=7142
0.661n=2835
0.544n=287
0.674n=2548
0.695n=4307
Figure 15: Regression tree predicting Prob(drug) using safety re-
sponses for weeks 8-20. When an inequality holds for a subject,
branch to the left, otherwise to the right. coad>=0.5 means that
chronic obstructive airways disease is present. alat stands for ala-
nine aminotransferase.30
Estimating Prob(conditions C|treatment)
• Bayes’ rule:
P (C|drug) = P (drug|C)P (C)/P (drug) =
P (drug|C)P (C)/23
• P (C|placebo) = [1− P (drug|C)]P (C)/13
• RR = P (C|drug)/P (C|placebo) =12P (drug|C)/[1− P (drug|C)]
• Example: If P (drug|C) = 23 , drug:placebo RR of
C = 1
• drug:placebo RR that platelets are below 180 =120.571/.429 = 0.67
• drug:placebo RR that platelets are above 179 and
COAD is present = 12 .585/.415 = 0.71.
31
Binary Logistic Model for Prob(drug)
• Assume additivity
• Do not assume linearity
• Restricted cubic splines for continuous variables
• Wald χ2 for each variable gauges the partial
association between that variable and treatment
after adjusting for associations between all other
variables and treatment
32
χ2 − df
−2 0 2 4 6 8 10
pr bilirubin
sbp hematocrit
rbc rr
dbp wbc
protein coad
headache corr.qt
platelets nausea
axis dyspepsia
lymphocytes qrs
alk.phos upper.resp.infect
diarrhea potassium
week albumin ab.pain
bun creatinine
glucose
Independent Predictors of DrugPost 2 Weeks
Figure 16: Degree of partial associations with treatment.
33
Examples from Other Studies
• Similar studies A (23 subjects) and C (49 subjects)
• Multiple doses and days (one dose/subject)
Day
Dos
e
10 20 30
0
2
4
6
8
10
12
A
10 20 30
C
116
Figure 17: Study designs. Size of bubble is proportional to num-
ber of subjects (see key for 1 and 16 subjects). Left panel is study
A, right is C.
34
Data
• 5 normalized liver function parameters
TB Total Bilirubin
ALT Alanine Aminotransferase
AST Aspartate Aminotransferase
GGT γ Glutamyl Transferase
AP Alkaline Phosphatase
35
Day
Nor
mal
ized
Alk
alin
e P
hosp
hata
se
10 20 30
20
40
60
80
0 1
10 20 30
3
6
10 20 30
20
40
60
80
12
Figure 18: Individual normalized alkaline phosphatase measure-
ments for study C, for 5 doses. Each line represents one subject.
36
APn
StudyC
Day
TBn
GGTn
ALTn
ASTn
Age
Dose
0.20
0.15
0.10
0.05
0.00
30 * Hoeffding D
Figure 19: Clustering of individual normalized post-baseline clini-
cal chemistry parameters using Hoeffding’s D index as a similarity
measure.
37
Data Reduction
• Data reduced to summary scores: within-subject
slope of response vs. time, and AUC
• Slopes emphasized
38
Dose
sTBn
sGGTn
sAPn
StudyC
sALTn
sASTn
0.5
0.4
0.3
0.2
0.1
0.0
Spearman ρ2
Figure 20: Clustering of slopes, using Spearman ρ2 as a similarity
measure.
39
Multivariate Analysis
• Proportional odds ordinal logistic model to predict
dose from study and 5 slopes
• Initially allowed all interactions with study (global
test of interaction: P = 0.07)
• Only slope of normalized AP strongly interacted
with study
• Spearman ρ for AP slope vs. dose in study C:
P = 0.0004
• In model with single interaction, only TB and AP
were independently affected by increasing dose
40
χ2 − df
−1 0 1 2 3 4 5 6
sTBn
sAPn
Study * sAPn
Study
sALTn
sGGTn
sASTn
Figure 21: Partial χ2 statistics penalized for d.f.
41
Another Study
• Study B (13 subjects)
• No dose effect on slope of bilirubin
• AP slope had strongest correlation with dose
42
sGGTn
Dose
sAPn
sTBn
sALTn
sASTn
0.6
0.5
0.4
0.3
0.2
0.1
0.0
Spearman ρ2
Figure 22: Variable clustering of slopes of normalized clinical
chemistry parameters in study B.
43
Summary
• Graphical exploration of multiple safety response
variables has many advantages over generating
reams of tables
• Empirical CDFs and extended box plots contain
more information than proportion > k× ULN,
mean± SD, quartiles
• There are many exploratory analyses to be tapped
for safety data
• Some help transform complex multivariate
analyses into univariate ones
• Recursive partitioning is a useful exploratory tool
• Exploratory analyses, while not confirming
problems or providing causal inference, may
provide hypotheses for subject matter experts
• Note: this work was done using only free
open-source software: R, LATEX, Linux
44
Abstract
It is difficult to design a clinical study to provide sound inferences about
safety effects of drugs in addition to providing trustworthy evidence for
efficacy. Patient entry criteria and experimental design are targeted at
efficacy, and there are too many possible safety endpoints to be able to
control type I error while preserving power. Safety analysis tends to be
somewhat ad hoc and exploratory. But with the large quantity of safety data
acquired during clinical drug testing, safety data are rarely harvested to
their fullest potential. Also, decisions are sometimes made that result in
analyses that are somewhat arbitrary or that lose statistical efficiency. For
example, safety assessments can be too quick to rely on the proportion of
patients in each treatment group at each clinic visit who have a lab
measurement above two or three times the upper limit of normal.
Safety reports frequently fail to fully explore areas such as
• which types of patients are having AEs?
• what distortions in the tails of the distribution of lab values are taking
place?
• which AEs tend to occur in the same patient?
• how to clinical AEs correlate to continuous lab measurements at a
given time
• which AEs and lab abnormalities are uniquely related to treatment
assigned?
• do preclinically significant measurements at an earlier visit predict
AEs at a later visit?
45
• how can time trends in many variables be digested into an
understandable picture?
This talk will demonstrate some of the exploratory statistical and graphical
methods that can help answer questions such as the above, using
examples based on data from real pharmaceutical trials.
46
References
[1] E. F. Cook and L. Goldman. Asymmetric
stratification: An outline for an efficient method for
controlling confounding in cohort studies. American
Journal of Epidemiology, 127:626–639, 1988.
[2] J. L. Hintze and R. D. Nelson. Violin plots: A box
plot-density trace synergism. American Statistician,
52:181–184, 1998.
[3] P. C. O’Brien. Comparing two samples: Extensions
of the t, rank-sum, and log-rank test. Journal of the
American Statistical Association, 83:52–61, 1988.
47