Transfer Learning for Auto-gating of Flow Cytometry Data
Gyemin Lee, Lloyd Stoolman, Clayton Scott
University of Michigan
ICML 2011 Workshop on Unsupervised and Transfer Learning, July 2, 2011
Lee, Stoolman, Scott (University of Michigan) TL for Auto-gating of Flow Cytometry Data ICML 2011 workshop (July 2, 2011) 1 / 13
Flow Cytometry
A technique for rapidly quantifying physical and chemical properties of large numbers of cells, e.g. size, shape, and fluorescent antigen attributes
Applications: diagnosis of blood-related diseases such as acute leukemia, chronic lymphoproliferative disorders, and malignant lymphomas
FS   SS   CD45  CD4  CD8  CD3
790  626  592   177  252  303
496  477  675   485  306  383
684  553  548   180  325  322
681  588  563   221  258  272
632  565  531   0    134  41
...  ...  ...   ...  ...  ...
Each column corresponds to a measured feature
Each row corresponds to a cell
10,000 to 100,000 cells (rows) per experiment
Gating
Typical flow cytometry data analysis involves visualizing multiple 2-dimensional scatter plots and manually selecting subsets of cells from the scatter plots.
⇓ gating
⇒ assigning binary labels $y_i \in \{-1, 1\}$ to every cell $x_i$
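As a rough illustration of gating as labeling, the sketch below applies one rectangular gate on two features and returns a label in {−1, 1} per cell. The rows are the five example cells listed on the Flow Cytometry slide; the gate bounds and the name `rectangle_gate` are hypothetical and chosen only for illustration, not taken from any expert gate.

```python
# Columns as listed on the Flow Cytometry slide (one row per cell).
COLUMNS = ["FS", "SS", "CD45", "CD4", "CD8", "CD3"]
cells = [
    [790, 626, 592, 177, 252, 303],
    [496, 477, 675, 485, 306, 383],
    [684, 553, 548, 180, 325, 322],
    [681, 588, 563, 221, 258, 272],
    [632, 565, 531, 0, 134, 41],
]


def rectangle_gate(cell, bounds):
    """Return +1 if every gated feature lies in its [low, high] range, else -1."""
    for name, (low, high) in bounds.items():
        value = cell[COLUMNS.index(name)]
        if not (low <= value <= high):
            return -1
    return 1


# Hypothetical gate drawn on the (SS, CD45) scatter plot.
gate = {"SS": (450, 600), "CD45": (540, 700)}
labels = [rectangle_gate(cell, gate) for cell in cells]
print(labels)
```

In practice an expert draws several such gates across multiple scatter plots; the single rectangle here only shows how a gate maps cells to binary labels.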
Gating
The distribution of cell populations differs from patient to patient.
Automated Gating
Problems of manual gating
labor-intensive and time-consuming
highly subjective and not standardized
modern clinical laboratories see dozens of cases per day
⇒ highly desirable to automate “gating”
Automated gating
In flow cytometry data analysis, more than 70% of studies focused on automated gating techniques.¹
In automatic gating, the majority of approaches rely on unsupervised clustering/mixture modeling.
¹ Bashashati & Brinkman, 2009
Auto-gating as a Transfer Learning Problem
Given
M labeled source datasets $D_m = \{(x_{m,i}, y_{m,i})\}_{i=1}^{N_m} \sim P_m$ for $m = 1, \dots, M$
an unlabeled target dataset $T = \{x_{t,i}\}_{i=1}^{N_t} \sim P_t$
Goal: assign labels $\{y_{t,i}\}_{i=1}^{N_t}$ to $T$ with low misclassification
$D_1, D_2, \dots, D_M, T \;\Rightarrow\; \{y_{t,i}\}_{i=1}^{N_t}$
Our Approach (1/2)
Consider linear decision functions
ftest(x) = ⟨w , x⟩ + b ≷ 0
1. Summarize expert knowledge $f_m$ from each of the M source datasets $D_m$ to build a baseline classifier $f_0$.
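$D_1 \Rightarrow f_1,\; D_2 \Rightarrow f_2,\; \dots,\; D_M \Rightarrow f_M \;\Rightarrow\; f_0 = \langle w_0, x \rangle + b_0 \gtrless 0$ (baseline)
where $f_m : (w_m, b_m) \leftarrow \mathrm{SVM}(D_m)$, $m = 1, \dots, M$
$f_0 : (w_0, b_0) \leftarrow \text{robust mean}(\{(w_m, b_m)\}_m)$
[Figure: $f_0$]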
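The aggregation step can be sketched as follows. The slide does not say which robust mean is used, so this sketch substitutes a coordinate-wise median as a stand-in; it also assumes the per-dataset hyperplanes (w_m, b_m) have already been fitted by an SVM. The function name `robust_mean` and the numbers are hypothetical.

```python
from statistics import median


def robust_mean(params):
    """Coordinate-wise median of (w, b) pairs.

    Assumption: the median is only a stand-in for the robust mean named on
    the slide, which is not specified there.
    """
    ws = [w for w, _ in params]
    bs = [b for _, b in params]
    w0 = [median(column) for column in zip(*ws)]
    b0 = median(bs)
    return w0, b0


# Hypothetical source hyperplanes (w_m, b_m); the last one is an outlier.
sources = [
    ([1.0, 0.9], -0.5),
    ([1.1, 1.0], -0.4),
    ([0.9, 1.1], -0.6),
    ([1.0, 1.0], -0.5),
    ([-5.0, 8.0], 3.0),
]
w0, b0 = robust_mean(sources)
print(w0, b0)
```

A plain average would be pulled toward the outlying hyperplane, which is the motivation for a robust summary of the source classifiers.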
Our Approach (2/2)
2. Transfer the knowledge by adapting $f_0$ to the target task $T$ based on the low-density separation principle.
$f_0, T \;\Rightarrow\; f_t = \langle w_t, x \rangle + b_t \gtrless 0$
Adjust the hyperplane parameters $(w, b)$ so that the decision boundary passes through a region where the marginal density of $T$ is low.
Find $(w_t, b_t)$ near $(w_0, b_0)$ that minimizes the number of data points inside the margin:
$\sum_{i=1}^{N_t} \mathbb{I}\left\{ \frac{|\langle w_t, x_{t,i} \rangle + b_t|}{\|w_t\|} < \Delta \right\}$
[Figure: $f_t$]
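The margin-count objective can be written directly as a short function. This is a minimal sketch of the indicator sum only; the name `margin_count` and the toy points are hypothetical, and no optimization is performed here.

```python
import math


def margin_count(w, b, points, delta):
    """Count points whose distance to the hyperplane <w, x> + b = 0 is below delta."""
    norm_w = math.sqrt(sum(wi * wi for wi in w))
    count = 0
    for x in points:
        distance = abs(sum(wi * xi for wi, xi in zip(w, x)) + b) / norm_w
        if distance < delta:
            count += 1
    return count


# Hypothetical unlabeled target points in two dimensions.
target = [(0.0, 0.0), (0.1, 0.0), (2.0, 2.0), (-2.0, -1.0), (3.0, 1.0)]

# Boundary x1 = 0 cuts through two nearby points; boundary x1 = 1 avoids them.
through_cluster = margin_count([1.0, 0.0], 0.0, target, delta=0.5)
through_gap = margin_count([1.0, 0.0], -1.0, target, delta=0.5)
print(through_cluster, through_gap)
```

A lower count indicates that the boundary passes through a sparser region of the target data, which is what the low-density separation principle asks for.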
Auto-gating Example
Comparison of the gating from the baseline ($f_0$) and the proposed transfer learning ($f_t$) to the gating by the expert (true).
[Figure: three panels labeled true, $f_0$, $f_t$]
Experiments - setup
[Figure: number of cells per case; x-axis: Case (0 to 35), y-axis: Number of Cells (0 to 10, ×10^4); series: total cells, (+) labeled cells]
35 peripheral blood datasets are provided by the Department of Pathology, University of Michigan
Leave-One-Out Setting
choose a dataset as a target task $T$
hide the labels of $T$
treat the other datasets as source tasks $D_m$, $m = 1, \dots, 34$
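The leave-one-out protocol above can be sketched as a generator over splits. It assumes each dataset is a list of (x, y) pairs; the name `leave_one_out` and the toy data are hypothetical. The held-out labels are returned separately so they can be used for scoring only, never for fitting.

```python
def leave_one_out(datasets):
    """Yield (sources, target_x, hidden_y) with each dataset as target in turn."""
    for t, target in enumerate(datasets):
        sources = datasets[:t] + datasets[t + 1:]
        target_x = [x for x, _ in target]   # features visible to the method
        hidden_y = [y for _, y in target]   # labels kept aside for evaluation
        yield sources, target_x, hidden_y


# Three hypothetical labeled datasets standing in for the 35 cases.
toy = [
    [((0.0, 1.0), 1), ((1.0, 0.0), -1)],
    [((0.2, 0.9), 1)],
    [((0.8, 0.1), -1), ((0.1, 0.7), 1)],
]
splits = list(leave_one_out(toy))
print(len(splits))
```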
Experiments - results
Our Transfer Learning Approach
$f_0$: baseline classifier with no adaptation
$f_t$: classifier adapted to $T$ by varying both the direction and the bias
Reference Approaches
Pooling: merge all the source data and learn a classifier on the merged dataset
Oracle: standard SVM with the true labels of the target task data
Experiments - results
          Pool   f0     ft     Oracle
avg       9.81   3.70   2.49   2.12
std err   1.68   0.54   0.30   0.27
⇒ Our strategy can successfully replicate what experts do in the field without a labeled training set for the target task.
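For reading the avg and std err rows, the sketch below shows one common way such summaries are computed, assuming each method yields one score per leave-one-out run and that the standard error is the sample standard deviation divided by the square root of the number of runs. The slide does not state how its numbers were computed, and the values below are hypothetical, not the reported results.

```python
import math
from statistics import mean, stdev


def mean_and_std_err(values):
    """Return (average, standard error) of a list of per-run scores."""
    return mean(values), stdev(values) / math.sqrt(len(values))


# Hypothetical per-run scores for one method.
scores = [2.0, 4.0, 4.0, 6.0]
avg, std_err = mean_and_std_err(scores)
print(avg, std_err)
```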
Conclusion and Forthcoming work
Conclusion
We cast flow cytometry auto-gating as a transfer learning problem.
By combining transfer learning with the low-density separation criterion for class separation, our strategy can leverage expert-gated datasets for the automatic gating of a new unlabeled dataset.
Forthcoming work
General kernel-based framework
Generalization error analysis
Joint with Gilles Blanchard
Our Approach - detail
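2-1. Varying bias
For a grid of biases $\{s_j\}$, count points inside the margin
$c_j \leftarrow \sum_{i=1}^{N_t} \mathbb{I}\left\{ \frac{|\langle w, x_{t,i} \rangle + b - s_j|}{\|w\|} < \Delta \right\}, \ \forall j$ : count
$p(z) \leftarrow \sum_j c_j\, \delta(z - s_j) * \frac{1}{\sqrt{2\pi}\,h} \exp\left(-\frac{z^2}{2h^2}\right)$ : smooth
$z^* \leftarrow \text{gradient descent}(p(z), 0)$ : find minimizing bias
$b_{\text{new}} \leftarrow b - z^*$ : update bias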
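The bias step can be sketched as count, smooth, descend, update. This is an illustrative sketch, not the authors' implementation: the gradient is approximated by central differences, and the step size, iteration count, grid, bandwidth h, and toy data are all assumptions. The names `smooth`, `adapt_bias`, and `margin_count` are hypothetical.

```python
import math


def smooth(z, grid, counts, h):
    """Gaussian-kernel smoothing of the counts: sum_j c_j * N(z - s_j; h)."""
    norm = 1.0 / (math.sqrt(2.0 * math.pi) * h)
    return sum(c * norm * math.exp(-((z - s) ** 2) / (2.0 * h * h))
               for s, c in zip(grid, counts))


def adapt_bias(w, b, points, delta, grid, h, step=0.01, iters=500):
    """Shift the bias toward a low-count region, starting the descent at z = 0."""
    norm_w = math.sqrt(sum(wi * wi for wi in w))
    scores = [sum(wi * xi for wi, xi in zip(w, x)) + b for x in points]
    # c_j: number of target points inside the margin for bias offset s_j.
    counts = [sum(1 for f in scores if abs(f - s) / norm_w < delta) for s in grid]
    z, eps = 0.0, 1e-4
    for _ in range(iters):
        # Central-difference gradient of the smoothed count (an assumption).
        grad = (smooth(z + eps, grid, counts, h)
                - smooth(z - eps, grid, counts, h)) / (2.0 * eps)
        z -= step * grad
    return b - z


def margin_count(w, b, points, delta):
    norm_w = math.sqrt(sum(wi * wi for wi in w))
    return sum(1 for x in points
               if abs(sum(wi * xi for wi, xi in zip(w, x)) + b) / norm_w < delta)


# Hypothetical 1-D target data: two clusters with a gap between them.
points = [(-2.0,), (-1.9,), (-1.8,), (0.2,), (0.3,), (0.4,)]
grid = [-2.0 + 0.1 * j for j in range(41)]
b_new = adapt_bias([1.0], 0.0, points, delta=0.5, grid=grid, h=0.2)
before = margin_count([1.0], 0.0, points, 0.5)
after = margin_count([1.0], b_new, points, 0.5)
print(before, after)
```

In this toy case the baseline boundary sits next to the right-hand cluster, and the descent moves it into the gap between the clusters.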
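2-2. Varying normal vector
Let $w_t = w_0 + a_t v_t$ where $v_t = \mathrm{eig}(\mathrm{cov}([w_1, \dots, w_M]))$.
For a grid of the amount of changes $\{a_k\}$, count points inside the margin
$c_k \leftarrow \sum_{i=1}^{N_t} \mathbb{I}\left\{ \frac{|\langle w_0 + a_k v_t, x_{t,i} \rangle + b|}{\|w_0 + a_k v_t\|} < 1 \right\}$ : count
$g(a) \leftarrow \sum_k c_k\, \delta(a - a_k) * \frac{1}{\sqrt{2\pi}\,h} \exp\left(-\frac{a^2}{2h^2}\right)$ : smooth
$a_t \leftarrow \text{gradient descent}(g(a), 0)$ : find minimizing $a_t$
$w_{\text{new}} \leftarrow w_0 + a_t v_t$ : update direction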
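The direction step follows the same count, smooth, descend pattern along a single direction. In this sketch the direction v is passed in as given; the slide defines it through eig(cov(...)) of the source normal vectors, and computing that eigenvector is left out here. The central-difference gradient, step size, grid, bandwidth, and toy data are assumptions, and the function names are hypothetical.

```python
import math


def count_in_margin(w, b, points, delta=1.0):
    """Count points with |<w, x> + b| / ||w|| below delta (the slide uses 1)."""
    norm_w = math.sqrt(sum(wi * wi for wi in w))
    return sum(1 for x in points
               if abs(sum(wi * xi for wi, xi in zip(w, x)) + b) / norm_w < delta)


def smooth(a, grid, counts, h):
    """Gaussian-kernel smoothing of the counts over the grid of step sizes."""
    norm = 1.0 / (math.sqrt(2.0 * math.pi) * h)
    return sum(c * norm * math.exp(-((a - ak) ** 2) / (2.0 * h * h))
               for ak, c in zip(grid, counts))


def adapt_direction(w0, b, v, points, grid, h, step=0.001, iters=500):
    """Move w0 along v toward a low-count region, starting the descent at a = 0."""
    counts = [count_in_margin([wi + a * vi for wi, vi in zip(w0, v)], b, points)
              for a in grid]
    a, eps = 0.0, 1e-4
    for _ in range(iters):
        grad = (smooth(a + eps, grid, counts, h)
                - smooth(a - eps, grid, counts, h)) / (2.0 * eps)
        a -= step * grad
    return a, [wi + a * vi for wi, vi in zip(w0, v)]


# Hypothetical 2-D target points; four of them lie close to the baseline boundary.
points = [(0.5, 4.0), (0.6, 5.0), (-0.5, -4.0), (-0.6, -5.0),
          (4.0, 0.0), (-4.0, 0.0), (5.0, 1.0), (-5.0, -1.0)]
w0, b, v = [1.0, 0.0], 0.0, [0.0, 1.0]
grid = [-1.0 + 0.05 * k for k in range(41)]
a_t, w_new = adapt_direction(w0, b, v, points, grid, h=0.1)
before = count_in_margin(w0, b, points)
after = count_in_margin(w_new, b, points)
print(before, after)
```

Restricting the search to one direction keeps the problem one-dimensional, so the same grid-and-smooth procedure used for the bias applies unchanged.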