
Upload: salford-systems

Post on 08-Jul-2015


Page 1: Using CART to Unravel Clusters for the Testing of Interactions in Asthma Databases

Using CART to Unravel Clusters for the Testing of Interactions in Asthma Databases

Ben Trzaskoma

Sr Statistical Scientist

Genentech, Inc

Page 2

Introduction

• Ultimate Purpose: Determine subgroups with greatest “Drug advantage” (versus Placebo)

• Variable clustering

− Used to identify groups of variables

− Created composite scores summarizing groups of variables

− Determined “Drug benefit” for each composite score

• Patient (case) clustering

− Used to identify subsets of patients

− Determined “Drug benefit” for each subset of patients
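One way to read the per-subset “Drug benefit” is as a drug-minus-placebo difference in a mean outcome, computed separately within each patient cluster. The sketch below assumes that reading. The field names (`cluster`, `arm`, `outcome`) and the toy records are hypothetical and are not data from the study.

```python
from collections import defaultdict
from statistics import mean


def drug_advantage(records):
    """Return {cluster: mean outcome on drug - mean outcome on placebo}."""
    by_cell = defaultdict(list)
    for r in records:
        by_cell[(r["cluster"], r["arm"])].append(r["outcome"])
    clusters = sorted({c for c, _ in by_cell})
    # Only report clusters that have patients in both arms.
    return {
        c: mean(by_cell[(c, "drug")]) - mean(by_cell[(c, "placebo")])
        for c in clusters
        if (c, "drug") in by_cell and (c, "placebo") in by_cell
    }


# Made-up records for illustration only (hypothetical outcome values).
toy = [
    {"cluster": 1, "arm": "drug", "outcome": 4.0},
    {"cluster": 1, "arm": "drug", "outcome": 6.0},
    {"cluster": 1, "arm": "placebo", "outcome": 3.0},
    {"cluster": 1, "arm": "placebo", "outcome": 5.0},
    {"cluster": 2, "arm": "drug", "outcome": 9.0},
    {"cluster": 2, "arm": "placebo", "outcome": 2.0},
]
advantage = drug_advantage(toy)
```

A large spread of these differences across clusters is what a treatment-by-cluster interaction test would then examine formally.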

Page 3

Clinical Trial Dataset

• Phase IIIb multicenter, randomized, double-blind, placebo-controlled study

• Original purpose of the study:

− Evaluate efficacy and safety of Asthma Drug

− Subjects have moderate to severe asthma

− Asthma is inadequately controlled with ICS and LABA

• 850 patients aged 12 to 75 from about 150 sites were enrolled and followed for 48 weeks

− Half given Drug + ICS + LABA

− Half given placebo + ICS + LABA

Page 4

Methods

• Two approaches:

• Variable clustering (PROC VARCLUS in SAS)

− Related to factor analysis

− Looks for relationships among variables

• Patient (case) clustering (PROC CLUSTER in SAS)

− Looks for groups of “similar” cases

Page 5

Methods: Patient Clustering

• Patient (case) clustering with the SAS CLUSTER procedure

− Identify groups of similar patients

− Variables with the most variability are the most important

− The resulting clusters allow for the analysis of patient groups with different characteristics

Page 6

Methods: Patient Clustering

• Patient clustering details

− The SAS CLUSTER procedure is an agglomerative clustering technique. It starts with one cluster per patient and iteratively merges the two nearest clusters until a single cluster contains all patients.

− Clustering was based on a set of key variables

− Ward’s method was selected. At each step this minimum-variance method merges the two clusters whose fusion adds the least to the within-cluster sum of squares, which tends to maximize the between-cluster variation in the corresponding ANOVA.

− A stopping point with a moderate number of clusters was selected (the selection is somewhat arbitrary; see dendrogram)

− The squared multiple correlation, R-squared, is the proportion of variance accounted for by the clusters and is used to assess the goodness of a particular cluster solution.
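The agglomeration and the R-squared criterion described on this slide can be sketched in a few lines. This is a naive illustration under stated assumptions, not the SAS CLUSTER procedure: it assumes unweighted numeric variables, uses the increase in within-cluster sum of squares as the merge cost, and runs on made-up points. The function names are hypothetical.

```python
def centroid(points):
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(len(points[0]))]


def sse(points):
    """Sum of squared deviations of each point from its cluster mean."""
    c = centroid(points)
    return sum((x - m) ** 2 for p in points for x, m in zip(p, c))


def ward_merge_cost(a, b):
    """Increase in total within-cluster sum of squares if a and b are fused."""
    ca, cb = centroid(a), centroid(b)
    d2 = sum((x - y) ** 2 for x, y in zip(ca, cb))
    return len(a) * len(b) / (len(a) + len(b)) * d2


def ward_history(points):
    """Merge from singletons down to one cluster.

    Returns a list of (number_of_clusters, r_squared, clusters), where
    r_squared = 1 - within-cluster SS / total SS.
    """
    clusters = [[p] for p in points]
    total = sse(points)
    history = []
    while True:
        within = sum(sse(c) for c in clusters)
        history.append((len(clusters), 1 - within / total, [list(c) for c in clusters]))
        if len(clusters) == 1:
            return history
        # Fuse the pair of clusters with the smallest merge cost.
        i, j = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda ij: ward_merge_cost(clusters[ij[0]], clusters[ij[1]]),
        )
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]


# Made-up 2-D points: two tight pairs that are far apart.
toy = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
history = ward_history(toy)
r2_by_k = {k: r2 for k, r2, _ in history}
```

Reading `r2_by_k` from many clusters to few shows how much explained variance each merge gives up, which is one way to choose a stopping point alongside the dendrogram.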

Page 7

Methods: Patient Clustering

Ward’s Method

Ward proposes that at any stage of an analysis the loss of information which results from the grouping of individuals into clusters can be measured by the total sum of squared deviations of every point from the mean of the cluster to which it belongs.

The distance between a group k and a group (ij) formed by the fusion of i and j:

$$d_{k(ij)} = \alpha_i d_{ki} + \alpha_j d_{kj} + \beta d_{ij}$$

$$\alpha_i = \frac{n_k + n_i}{n_k + n_i + n_j}, \qquad \alpha_j = \frac{n_k + n_j}{n_k + n_i + n_j}, \qquad \beta = \frac{-n_k}{n_k + n_i + n_j}$$

where $d_{ij}$ is the distance between groups i and j, and $n_i$ is the number of cases in group i.

Everitt B, Cluster Analysis, 1977
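The update formula can be checked numerically. The sketch below assumes that the “distance” between two groups is Ward’s merge cost, that is, the squared Euclidean distance between the group centroids scaled by n_a n_b / (n_a + n_b). Under that assumption the recurrence should reproduce the cost computed directly from the fused group. The groups are made-up points and the function names are hypothetical.

```python
def centroid(points):
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(len(points[0]))]


def ward_distance(a, b):
    """Assumed Ward dissimilarity: scaled squared distance between centroids."""
    ca, cb = centroid(a), centroid(b)
    d2 = sum((x - y) ** 2 for x, y in zip(ca, cb))
    return len(a) * len(b) / (len(a) + len(b)) * d2


def ward_update(d_ki, d_kj, d_ij, n_k, n_i, n_j):
    """Distance from group k to the fusion (ij), using the coefficients on the slide."""
    total = n_k + n_i + n_j
    alpha_i = (n_k + n_i) / total
    alpha_j = (n_k + n_j) / total
    beta = -n_k / total
    return alpha_i * d_ki + alpha_j * d_kj + beta * d_ij


# Made-up groups of 2-D points.
group_k = [(0.0, 0.0), (1.0, 0.0)]
group_i = [(4.0, 3.0)]
group_j = [(6.0, 1.0), (5.0, 2.0), (7.0, 3.0)]

updated = ward_update(
    ward_distance(group_k, group_i),
    ward_distance(group_k, group_j),
    ward_distance(group_i, group_j),
    len(group_k), len(group_i), len(group_j),
)
direct = ward_distance(group_k, group_i + group_j)
```

The practical point of the recurrence is that distances to a newly fused cluster can be updated from stored distances and group sizes alone, without revisiting the patient-level data.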

Page 8

Patient Clusters from Clinical Trial Dataset

Data Variables used to Cluster

• Demographics Data File: Age, Sex, Race, BMI

• Spirometry Data File: FEV1, FVC variables

• Allergic History Data File: Duration, Onset, Skin tests

• AQLQ Data File: Symptoms, Activity, Smoking

Page 9

Patient Clusters

• Used CART to better understand patient clusters

− Used to uncover hidden structure in complex data to predict our 7 clusters

− 10-fold cross validation used to build the CART model

– We set the variable indicating the 7 clusters as the target

− Allowed CART to help us describe and ultimately name the 7 clusters via the nodes in the final tree

Page 10

Patient Clusters

• CART Method Details

– The target is the Cluster assignment variable; 30 predictors were included

– Cluster was considered categorical, so a classification tree was used

– The single-variable splitting criterion used was Gini

– Each predictor was given equal weight

– No priors, constraints, or penalties were defined

– Default of 10-fold cross validation was used
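To make the Gini criterion concrete, the sketch below scores every candidate threshold on one numeric predictor and keeps the split with the lowest weighted Gini impurity of the two child nodes. It is a simplified illustration, not the CART software: it handles a single predictor, skips cross validation and pruning, and uses made-up ages and cluster labels. The function names are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(values, labels):
    """Return (threshold, weighted_gini) for the best 'value <= threshold' split.

    Returns (None, parent_gini) when no split improves on the parent node.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = (None, gini(labels))
    for pos in range(1, n):
        lo, hi = pairs[pos - 1][0], pairs[pos][0]
        if lo == hi:
            continue  # no threshold can separate identical values
        left = [lab for _, lab in pairs[:pos]]
        right = [lab for _, lab in pairs[pos:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = ((lo + hi) / 2, score)  # midpoint between adjacent values
    return best


# Made-up ages and cluster labels for illustration only.
ages = [30, 35, 40, 50, 55, 60]
clusters = [2, 2, 2, 1, 1, 3]
threshold, impurity = best_split(ages, clusters)
```

Placing thresholds at midpoints between adjacent observed values is why tree cut points often fall between whole numbers.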

Page 11

Patient Clusters

CART Tree Page 1

Page 12

Patient Clusters

CART Tree Page 2

Page 13

Patient Clusters

CART Tree Node Descriptions

1. ICS Use = Low/Medium

2. ICS Use High and ICU/Intubated

3. ICS Use High and Not ICU/Intubated and On Women’s Hormone Therapy

4. ICS Use High and Not ICU/Intubated and Not On Women’s Hormone Therapy and Black

5. ICS Use High and Not ICU/Intubated and Not On Women’s Hormone Therapy and Not Black and Age < 44.5 and Post-Bronchodilator % Predicted FVC <= .72

6. ICS Use High and Not ICU/Intubated and Not On Women’s Hormone Therapy and Not Black and Age < 44.5 and Post-Bronchodilator % Predicted FVC > .72 and Activity Score <= 3.59

7. ICS Use High and Not ICU/Intubated and Not On Women’s Hormone Therapy and Not Black and Age < 44.5 and Post-Bronchodilator % Predicted FVC > .72 and Activity Score > 3.59

8. ICS Use High and Not ICU/Intubated and Not On Women’s Hormone Therapy and Not Black and Age > 44.5 and Post-Bronchodilator % Predicted FEV1 <= 61.88

9. ICS Use High and Not ICU/Intubated and Not On Women’s Hormone Therapy and Not Black and Age > 44.5 and Post-Bronchodilator % Predicted FEV1 > 61.88
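The nine node descriptions above amount to a nested set of rules, which can be written as a small function. This is a sketch only: the dictionary keys are hypothetical names, not the study’s variable names, and the thresholds are copied as printed on the slide (FVC as a fraction, FEV1 as a percentage).

```python
def terminal_node(p):
    """Map a patient record (dict) to the terminal node number 1-9 listed above."""
    if p["ics_use"] != "high":           # Low/Medium ICS use
        return 1
    if p["icu_or_intubated"]:
        return 2
    if p["womens_hormone_therapy"]:
        return 3
    if p["race_black"]:
        return 4
    if p["age"] < 44.5:
        if p["post_bd_pct_pred_fvc"] <= 0.72:
            return 5
        return 6 if p["activity_score"] <= 3.59 else 7
    return 8 if p["post_bd_pct_pred_fev1"] <= 61.88 else 9


# Hypothetical patient, for illustration only.
example = {
    "ics_use": "high",
    "icu_or_intubated": False,
    "womens_hormone_therapy": False,
    "race_black": False,
    "age": 38,
    "post_bd_pct_pred_fvc": 0.85,
    "activity_score": 4.2,
    "post_bd_pct_pred_fev1": 70.0,
}
node = terminal_node(example)
```

Writing the tree this way makes the ordering of the splits explicit: a later condition is only evaluated for patients who did not satisfy any earlier one.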

Page 14

Patient Clusters

• CART output and node descriptions helped to name the 7 clusters

– (1) Older/Poor Lung Function

– (2) Younger/Good Lung Function/Good Activity

– (3) Older/Moderate Lung Function

– (4) High Women’s Hormone Therapy

– (5) Race - Black

– (6) High ICS Use

– (7) High ICU/Intubation

Page 15

Results

FEV1* and Exacerbation Advantage for the Seven Cluster Solution

Page 16

Limitations

• Potential Issues with Case Clustering

– Additional variables could have been included

– CART is one way to describe the cluster splits, but not the only way

Page 17

Conclusions

• Conclusions

– CART helped us describe the clusters in a clinically meaningful way

– Some groups of patients respond better to Active Drug over placebo than other groups

– Patients in cluster 4 (High Women’s Hormone Therapy) and cluster 2 (Younger/Good Lung Function/Good Activity) responded better than other patient clusters