Michael Biehl
Kerstin Bunte
Petra Schneider
DREAM 6 / FlowCAP 2 Challenge: Molecular Classification of Acute Myeloid Leukaemia
Johann Bernoulli Institute for Mathematics and Computer Science, University of Groningen, The Netherlands
Centre for Diabetes, Endocrinology & Metabolism, School of Clinical & Experimental Medicine, University of Birmingham, UK
Team Admire-LVQ: Adaptive Distance Measures In Relevance Learning Vector Quantization
DREAM6/FlowCAP2 challenge 2011
The DREAM project [www.the-dream-project.org]
Dialogue for Reverse Engineering Assessments and Methods
Organizers: Gustavo Stolovitzky, Robert Prill, Raquel Norel, Pablo Meyer (IBM Computational Biology Center); Julio Saez-Rodriguez (European Bioinformatics Institute, EMBL-EBI)
FlowCAP initiative [http://flowcap.flowsite.org]
Flow Cytometry: Critical Assessment of Population Identification Methods
Organizers: Ryan Brinkman (British Columbia Cancer Agency); Raphael Gottardo (Fred Hutchinson Cancer Research Center); Tim Mosmann (University of Rochester); Richard H. Scheuermann (University of Texas Southwestern Medical Center)
flow cytometry
peripheral blood / bone marrow aspirate
fluorophore-conjugated antibodies for specific proteins
preprocessing
cell size, granularity, + 26 protein markers
(tens of) thousands of events per marker
training set: 23 AML patients, 156 healthy donors
test set: 180 unlabeled patients
Wade Rogers, U. of Pennsylvania
[figure © www.the-dream-project.org]
list of markers
1 FS lin (~ cell size)
2 SS log (~ granularity)
3 CD45 (protein marker)
(1 to 3: measured in all cells)
four diff. features
[figure © www.the-dream-project.org]
possible workflow:
- selection of cells, based on e.g. FS Lin, SS Log, CD45
- inspection of all markers only for selected cells
  e.g. differential diagnosis (subtypes)
[figure: list of markers]
here: classification based on entire cell population and all markers
target diagnosis: AML patient / healthy donor
unspecific with respect to types of AML
consideration of frequencies / histograms only
information about single cells disregarded
class-conditional mean histograms
[figure: mean histograms, healthy donors vs. AML patients]
suggested set of features:
(1) mean (2) standard deviation (3) skewness
(4) kurtosis (5) median (6) interquartile range
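The six suggested features can be sketched in a few lines of NumPy. This is a minimal illustration, not the team's actual preprocessing: the function names, the use of population (not sample) moments, non-excess kurtosis, and the count of 31 marker channels (inferred from the 186-dimensional feature vectors, since 186 = 31 × 6) are all assumptions, and the events are synthetic.

```python
import numpy as np

def marker_features(events):
    """Six summary statistics of one marker's event intensities (1-D array)."""
    x = np.asarray(events, dtype=float)
    mean = x.mean()
    std = x.std()                    # population standard deviation (assumption)
    z = (x - mean) / std
    skewness = np.mean(z ** 3)
    kurtosis = np.mean(z ** 4)       # non-excess kurtosis (assumption)
    q25, median, q75 = np.percentile(x, [25, 50, 75])
    return np.array([mean, std, skewness, kurtosis, median, q75 - q25])

def patient_feature_vector(marker_events):
    """Concatenate the six statistics over all markers of one patient."""
    return np.concatenate([marker_features(ev) for ev in marker_events])

rng = np.random.default_rng(0)
# Hypothetical patient: 31 marker channels with 1000 synthetic events each.
patient = [rng.normal(size=1000) for _ in range(31)]
features = patient_feature_vector(patient)
print(features.shape)
```

Each patient is thereby reduced to one fixed-length vector, regardless of how many events were recorded per marker.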
feature vectors (186-dim.)
[figure: mean feature vectors, healthy donors (mean) vs. AML patients (mean)]
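matrix relevance LVQ
nearest prototype classifier according to adaptive distance measure
simplest setting: 1 prototype per class (healthy donors / AML patients), vectors w in 186-dim. feature space
adaptive distance, with a (186 × 186) matrix Ω:
$$d(\mathbf{w}, \mathbf{x}) = \left\| \Omega\, (\mathbf{x} - \mathbf{w}) \right\|^2 = (\mathbf{x} - \mathbf{w})^\top \Lambda\, (\mathbf{x} - \mathbf{w}) \quad \text{with} \quad \Lambda = \Omega^\top \Omega$$
Training:
∙ cost function based Generalized Matrix LVQ (GMLVQ)
$$E = \sum_m \frac{d(\mathbf{w}_J, \mathbf{x}_m) - d(\mathbf{w}_K, \mathbf{x}_m)}{d(\mathbf{w}_J, \mathbf{x}_m) + d(\mathbf{w}_K, \mathbf{x}_m)}, \qquad \mathbf{w}_J:\ \text{correct prototype}, \quad \mathbf{w}_K:\ \text{wrong prototype}$$
∙ gradient based optimization of E (prototypes and matrix Ω)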
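The adaptive distance, the cost function E and the nearest prototype rule can be written down compactly. The sketch below only evaluates them; the gradient-based training step is omitted. All names are hypothetical, the data are synthetic, and initializing the prototypes in the class means with Ω equal to the identity is an assumption made for the example.

```python
import numpy as np

def gmlvq_distance(w, x, omega):
    """d(w, x) = ||Omega (x - w)||^2 = (x - w)^T Omega^T Omega (x - w)."""
    diff = omega @ (x - w)
    return float(diff @ diff)

def gmlvq_cost(X, y, prototypes, proto_labels, omega):
    """Sum over samples of (d_J - d_K) / (d_J + d_K)."""
    total = 0.0
    for x, label in zip(X, y):
        d = np.array([gmlvq_distance(w, x, omega) for w in prototypes])
        same = proto_labels == label
        d_J = d[same].min()     # closest prototype with the correct label
        d_K = d[~same].min()    # closest prototype with a wrong label
        total += (d_J - d_K) / (d_J + d_K)
    return total

def predict(X, prototypes, proto_labels, omega):
    """Nearest prototype classification under the adaptive distance."""
    d = np.array([[gmlvq_distance(w, x, omega) for w in prototypes] for x in X])
    return proto_labels[d.argmin(axis=1)]

# Synthetic two-class toy data in 4 dimensions (not the challenge data).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (20, 4)), rng.normal(3.0, 1.0, (20, 4))])
y = np.array([0] * 20 + [1] * 20)
proto_labels = np.array([0, 1])
prototypes = np.array([X[y == c].mean(axis=0) for c in proto_labels])
omega = np.eye(4)               # identity: plain squared Euclidean distance

cost = gmlvq_cost(X, y, prototypes, proto_labels, omega)
pred = predict(X, prototypes, proto_labels, omega)
print(cost, (pred == y).mean())
```

Each term of E lies in [-1, 1] and is negative for a correctly classified sample, so training drives E down by moving the prototypes and adapting Ω.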
validation
- 5/6 of data for training, 1/6 for validation
- ROC, threshold-averaged over 50 random splits
[figure: ROC curves, true positive rate vs. false positive rate; panels: FS Lin, SS Log, CD45 / all markers]
validation (continued)
- note: patient 116 consistently misclassified
[figure: ROC curve, true positive rate vs. false positive rate]
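The validation protocol (repeated random 5/6 vs. 1/6 splits with a threshold-averaged ROC) might look roughly as follows. This is a sketch under several assumptions: the split is stratified per class so that every validation set contains AML cases (the slides do not say how the splits were drawn), a simple nearest-class-mean scorer stands in for the trained GMLVQ system, the data are synthetic, and all function names are hypothetical.

```python
import numpy as np

def stratified_split(labels, rng, train_fraction=5 / 6):
    """Random split into training/validation indices, drawn per class."""
    train, val = [], []
    for c in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == c))
        n_train = int(round(train_fraction * len(idx)))
        train.extend(idx[:n_train])
        val.extend(idx[n_train:])
    return np.array(train), np.array(val)

def roc_points(scores, labels, thresholds):
    """(FPR, TPR) when class 1 is predicted for score >= threshold."""
    pos, neg = scores[labels == 1], scores[labels == 0]
    tpr = np.array([(pos >= t).mean() for t in thresholds])
    fpr = np.array([(neg >= t).mean() for t in thresholds])
    return fpr, tpr

def nearest_mean_score(X_train, y_train, X_val):
    """Stand-in scorer (NOT GMLVQ): relative distance to the two class means."""
    m0 = X_train[y_train == 0].mean(axis=0)
    m1 = X_train[y_train == 1].mean(axis=0)
    d0 = ((X_val - m0) ** 2).sum(axis=1)
    d1 = ((X_val - m1) ** 2).sum(axis=1)
    return 0.5 * (1.0 + (d0 - d1) / (d0 + d1))

# Synthetic data with the class sizes of the training set (156 healthy, 23 AML).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (156, 5)), rng.normal(1.5, 1.0, (23, 5))])
y = np.array([0] * 156 + [1] * 23)

thresholds = np.linspace(0.0, 1.0, 101)
fprs, tprs = [], []
for _ in range(50):
    tr, va = stratified_split(y, rng)
    s = nearest_mean_score(X[tr], y[tr], X[va])
    fpr, tpr = roc_points(s, y[va], thresholds)
    fprs.append(fpr)
    tprs.append(tpr)

# Threshold averaging: average FPR and TPR over the splits at each threshold.
mean_fpr = np.mean(fprs, axis=0)
mean_tpr = np.mean(tprs, axis=0)

# Area under the averaged curve (trapezoidal rule, FPR in increasing order).
f, t = mean_fpr[::-1], mean_tpr[::-1]
auc = float(np.sum(np.diff(f) * (t[1:] + t[:-1]) / 2.0))
print(auc)
```

Counting how often each sample is misclassified while it sits in the validation part is one simple way to spot a consistently misclassified case.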
validation
[figure: training set errors and validation set errors; marked: patient “116” (AML)]
visualization
[figure: projection on first eigenvector of Λ; marked: prototypes and patient 116]
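A projection onto the leading eigenvectors of Λ = ΩᵀΩ can be computed with a symmetric eigendecomposition. The sketch below is illustrative only: the function name is hypothetical, the choice of two components is an assumption for a 2-D scatter plot, and the matrix Ω is a made-up diagonal example, not a trained one.

```python
import numpy as np

def project_on_leading_eigenvectors(X, omega, n_components=2):
    """Project the rows of X onto the leading eigenvectors of Lambda."""
    lam = omega.T @ omega                     # relevance matrix Lambda
    eigvals, eigvecs = np.linalg.eigh(lam)    # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]
    return X @ eigvecs[:, order], eigvals[order]

rng = np.random.default_rng(0)
X = rng.normal(size=(10, 3))                  # synthetic data
omega = np.diag([3.0, 1.0, 0.1])              # hypothetical, untrained Omega
proj, eigvals = project_on_leading_eigenvectors(X, omega)
print(eigvals)
```

Prototypes can be projected with the same function, so that data and prototypes appear in one plot.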
prediction: 180 test set patients
[figure: projection on first eigenvector of Λ; shown: test set and prototypes]
prediction: 180 test set patients
“AML – score”:
$$0 \le s = \frac{1}{2}\left[\, 1 + \frac{d(\mathbf{w}_1, \mathbf{x}) - d(\mathbf{w}_2, \mathbf{x})}{d(\mathbf{w}_1, \mathbf{x}) + d(\mathbf{w}_2, \mathbf{x})} \,\right] \le 1$$
20 AML cases!
perfect test set prediction, e.g. AUROC = 1 (achieved by 8 teams!)
Note: GMLVQ scores are not directly interpretable as “certainties” or probabilistic assignments
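The score can be computed directly from the two prototype distances. In the sketch below, w1 is taken to be the healthy prototype and w2 the AML prototype, so that s approaches 1 near the AML prototype; this assignment is an assumption, as the slide does not label the prototypes. The function name and the toy vectors are hypothetical.

```python
import numpy as np

def aml_score(x, w1, w2, omega):
    """s = 1/2 * [1 + (d(w1, x) - d(w2, x)) / (d(w1, x) + d(w2, x))]."""
    d1 = np.sum((omega @ (x - w1)) ** 2)   # distance to w1 (assumed: healthy)
    d2 = np.sum((omega @ (x - w2)) ** 2)   # distance to w2 (assumed: AML)
    return 0.5 * (1.0 + (d1 - d2) / (d1 + d2))

w_healthy = np.zeros(3)
w_aml = np.ones(3)
omega = np.eye(3)

print(aml_score(w_aml, w_healthy, w_aml, omega))                        # at the AML prototype
print(aml_score(0.5 * (w_healthy + w_aml), w_healthy, w_aml, omega))    # halfway between
```

The score only encodes relative distances to the prototypes, which is why it ranks patients but does not behave like a calibrated probability.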
prototypes
difference vector “AML - healthy” prototype
here: components corresponding to mean values
relevances
relevance of markers (← diagonal elements of Λ)
in detail, per feature type: mean, std. dev., skewness, kurtosis, median, iqr
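Reading relevances off the diagonal of Λ can be sketched as below. Several points are assumptions and not taken from the slides: the feature ordering (six statistics of the first marker, then six of the second, and so on), the count of 31 markers (from 186 = 31 × 6), the normalization of Λ to unit trace (a common convention in the GMLVQ literature), and the random Ω used in place of a trained matrix.

```python
import numpy as np

STATS = ["mean", "std. dev.", "skewness", "kurtosis", "median", "iqr"]

def relevance_table(omega, n_markers, n_stats=len(STATS)):
    """Diagonal of Lambda as a (marker x statistic) table plus per-marker sums."""
    lam = omega.T @ omega
    lam = lam / np.trace(lam)           # unit trace (assumed normalization)
    per_feature = np.diag(lam).reshape(n_markers, n_stats)
    return per_feature, per_feature.sum(axis=1)

rng = np.random.default_rng(0)
omega = rng.normal(size=(186, 186))     # random stand-in for a trained Omega
per_feature, per_marker = relevance_table(omega, n_markers=31)
print(per_marker.argmax())              # index of the most relevant marker
```

Off-diagonal elements of Λ weight pairs of features and are ignored in this table.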
relevances (continued)
[figure: relevance of markers, in detail per feature type (mean, std. dev., skewness, kurtosis, median, iqr); highlighted: SS log]
scores, certainties, ranking ?
“AML – score”:
$$0 \le s = \frac{1}{2}\left[\, 1 + \frac{d(\mathbf{w}_1, \mathbf{x}) - d(\mathbf{w}_2, \mathbf{x})}{d(\mathbf{w}_1, \mathbf{x}) + d(\mathbf{w}_2, \mathbf{x})} \,\right] \le 1$$
20 AML cases!
perfect test set prediction, e.g. AUC = 1 (ROC)
comparison, scores vs. ground truth (?):
Pearson-correlation: 0.9703
sum of |differences|: 3.8455
scores, certainties, ranking ?
“transformed AML – score”:
$$0 \le \frac{\tanh(3\,s)}{\tanh(3)} \le 1$$
20 AML cases!
perfect test set prediction, e.g. AUC = 1 (ROC)
comparison, scores vs. ground truth:
transformed score: Pearson-correlation: 0.9820, sum of |differences|: 4.4347
original score s: Pearson-correlation: 0.9703, sum of |differences|: 3.8455
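The transformation and the two comparison measures are easy to reproduce in principle. The sketch below uses made-up scores for 20 AML cases and 160 healthy donors and does not attempt to reproduce the reported numbers; the function names and the synthetic score distribution are assumptions.

```python
import numpy as np

def transformed_score(s, a=3.0):
    """Monotone rescaling tanh(a * s) / tanh(a), mapping [0, 1] onto [0, 1]."""
    return np.tanh(a * s) / np.tanh(a)

def compare(scores, truth):
    """Pearson correlation and sum of absolute differences to the ground truth."""
    pearson = np.corrcoef(scores, truth)[0, 1]
    abs_diff = np.abs(scores - truth).sum()
    return pearson, abs_diff

# Synthetic scores: 20 AML cases (truth 1) and 160 healthy donors (truth 0).
rng = np.random.default_rng(0)
truth = np.array([1.0] * 20 + [0.0] * 160)
s = np.clip(np.where(truth == 1, 0.7, 0.05) + rng.normal(0.0, 0.03, 180), 0.0, 1.0)

print(compare(s, truth))
print(compare(transformed_score(s), truth))
```

Because the transformation is strictly increasing, it leaves the ranking of patients, and hence the AUC, unchanged; only correlation-type and difference-type measures are affected.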
summary
feature vectors:
moment based characteristics of flow cytometry data
[mean, standard deviation, skewness, kurtosis, median, iqr ]
Matrix Relevance Learning Vector Quantization
- perfect classification with respect to training and test set
(e.g. AUC(roc)=1)
- weighting of features (pairs of features) according to
their relevance in the classification
- visualization of the data set
- identification of outliers (“116” ?)
outlook
selection of reduced feature set:
relevance matrix results suggest a selection of
protein markers and/or specific features
identification / diagnosis of AML subtypes
- AML subtypes to be identified by specific marker profiles
- machine learning approach requires larger data sets, e.g.
GMLVQ with several prototypes representing AML
- back to gating – selection of cells for differential diagnosis?
direct classification of histograms
non-Euclidean, histogram-specific distance measures
e.g. Divergence-based LVQ [Mwebaze et al., 2010]
references (www.cs.rug.nl/~biehl)
The method (GMLVQ):
P. Schneider, M. Biehl, B. Hammer, Adaptive relevance matrices in learning vector quantization, Neural Computation 21: 3532-3561 (2009)
A recent application in tumor classification:
W. Arlt, M. Biehl, A.E. Taylor et al., Urine Steroid Metabolomics as a Biomarker Tool for Detecting Malignancy in Patients with Adrenal Tumors, J Clinical Endocrinology & Metabolism, in press (2011)
Thanks