Sample-efficient learning with auxiliary class-label information
Quang Nguyen Hamed Valizadegan
Amy Seybert Milos Hauskrecht
Computer Science Department University of Pittsburgh
Building a Classification Model
[Diagram: Training data (patient records, diagnoses) => Learning => Classifier; New patient => Classifier => Disease or no disease?]
Typically: More training data => Better classifier
Data Labeling and Its Cost
• Labeling requires human experts
=> Time-consuming and costly
=> Small training data
• How to reduce the number of examples to label ?
• Active learning: select only the most critical examples to label
• Can we obtain more information from selected examples ?
[Diagram: Patient records (labs, medications, notes, ...) => Labeling => Diagnoses (class labels: disease / no disease) => Training data]
Our Solution
• Idea: ask a human expert to provide, in addition to class labels, his/her certainty in the label decision and incorporate it into the learning process
• Certainty can be represented in terms of
• Probability: p = 0.85 or
• Qualitative ordinal category: strong, medium or weak belief in disease
Training with Class Label Information
[Diagram: Patient records x1, ..., xN (labs, medications, etc.) with binary class labels y1, ..., yN, each 1/0 (disease / no disease) => Learning => Classifier (discriminant projection, decision boundary)]
Training with Class and Auxiliary Information
[Diagram: Patient records x1, ..., xN (labs, medications, etc.), each with a binary class label yi = 1/0 (disease / no disease) and a certainty score pi (certainty in disease) => Learning => Classifier]
Learning with Auxiliary Information: Regression
[Diagram: Patient records x1, ..., xN (labs, medications, etc.) with certainty scores p1, ..., pN (certainty in disease) => Learning => Regression ( f: X → P )]
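As an illustration of the regression idea, here is a minimal sketch that fits certainty scores directly by least squares. It assumes a single feature per patient for brevity; the function name and the toy data are hypothetical and not taken from the presentation.

```python
def fit_linear_regression(xs, ps):
    """Return slope a and intercept b minimizing sum((a*x + b - p)^2)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_p = sum(ps) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxp = sum((x - mean_x) * (p - mean_p) for x, p in zip(xs, ps))
    a = sxp / sxx
    b = mean_p - a * mean_x
    return a, b

# Hypothetical toy data: one feature per patient and an expert certainty score.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ps = [0.1, 0.3, 0.5, 0.7, 0.9]
a, b = fit_linear_regression(xs, ps)
print(f"f(x) = {a:.2f} * x + {b:.2f}")
```

Because the fit targets the exact values of p, any change in an individual score shifts the fitted function.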
Learning with Auxiliary Information: Noise
• Human certainty estimates are often noisy
– certainty score p may be inconsistent
• Regression relies on exact values of p
=> Sensitive to noise
[Diagram: Patient record => p = ?]
Solution ?
Modeling pairwise orders
• Idea: Certainty scores let us order examples
• Our approach: build a discriminant projection f(x) that respects this order
• Minimize the number of violated pairwise order constraints
• Modeling pairwise orders instead of relying on exact values of p
=> learning is less sensitive to noise
[Diagram: examples ordered along the projection f(x)]
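A minimal sketch of how certainty scores induce pairwise order constraints, and how violations of those constraints by a projection f(x) can be counted. The helper names, scores, and projection values are hypothetical; treating ties as producing no constraint is an assumption.

```python
def ordered_pairs(ps):
    """All index pairs (i, j) with p_i > p_j; ties produce no constraint."""
    n = len(ps)
    return [(i, j) for i in range(n) for j in range(n) if ps[i] > ps[j]]

def count_violations(scores, ps):
    """Number of pairs (i, j) with p_i > p_j where f does not rank i above j."""
    return sum(1 for i, j in ordered_pairs(ps) if scores[i] <= scores[j])

ps = [0.9, 0.6, 0.2]        # hypothetical expert certainty scores
noisy_ps = [0.8, 0.7, 0.1]  # same cases re-scored with noise: values change, order does not
scores = [2.0, 0.5, 1.0]    # hypothetical projection values f(x)

pairs = ordered_pairs(ps)
violations = count_violations(scores, ps)
print(pairs, violations)
```

In this toy case the noisy scores yield exactly the same set of pairs, which is the sense in which pairwise orders can be less sensitive to noise than exact values.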
Learning with Pairwise Order Constraints
• Approach 1: adapt SVM Rank Algorithm (Herbrich 2000, Joachims 2002)
– Optimize:
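$$\min_{w} \; \frac{1}{2} w^T w + C \sum_{i,j:\, p_i > p_j} \xi_{i,j}$$
– Pairwise order constraints:
$$\forall i,j : p_i > p_j : \quad w^T (x_i - x_j) \ge 1 - \xi_{i,j}$$
$$\forall i \, \forall j : \quad \xi_{i,j} \ge 0$$
Second term of the objective: penalty for violating pairwise orders
Penalty is the same for all pairs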
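The objective can be sketched in its equivalent hinge-loss form, 1/2 wᵀw + C Σ max(0, 1 − wᵀ(xi − xj)), minimized here by plain subgradient descent. This is an illustrative sketch only: the optimizer, step size, epoch count, and toy data are assumptions, and it is not the solver used in the cited SVM Rank work.

```python
def train_svm_rank(X, ps, C=1.0, epochs=200, lr=0.01):
    """Subgradient descent on 1/2 w.w + C * sum of pairwise hinge losses."""
    dim = len(X[0])
    # Difference vectors x_i - x_j for every pair with p_i > p_j.
    diffs = [[a - b for a, b in zip(X[i], X[j])]
             for i in range(len(X)) for j in range(len(X)) if ps[i] > ps[j]]
    w = [0.0] * dim
    for _ in range(epochs):
        grad = list(w)  # gradient of the 1/2 w.w term
        for d in diffs:
            margin = sum(wk * dk for wk, dk in zip(w, d))
            if margin < 1.0:  # slack is positive, so the hinge term is active
                for k in range(dim):
                    grad[k] -= C * d[k]
        w = [wk - lr * gk for wk, gk in zip(w, grad)]
    return w

# Hypothetical toy data: certainty grows with the first feature only.
X = [[3.0, 1.0], [2.0, 0.0], [1.0, 1.0], [0.0, 0.0]]
ps = [0.9, 0.7, 0.3, 0.1]
w = train_svm_rank(X, ps)
scores = [sum(wk * xk for wk, xk in zip(w, x)) for x in X]
print(w, scores)
```

The learned projection should rank the four toy examples in the same order as their certainty scores.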
SVM Rank with Weighted Examples
• Approach 2: SVM Rank Weighted
– Optimize:
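$$\min_{w} \; \frac{1}{2} w^T w + C \sum_{i,j:\, p_i > p_j} (p_i - p_j)\, \xi_{i,j}$$
– Pairwise order constraints:
$$\forall i,j : p_i > p_j : \quad w^T (x_i - x_j) \ge 1 - \xi_{i,j}$$
$$\forall i \, \forall j : \quad \xi_{i,j} \ge 0$$
Larger p difference => larger penalty for violating pairwise orders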
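A minimal sketch of the weighted variant in hinge-loss form, where each pair's penalty is scaled by pi − pj. The subgradient-descent optimizer, hyperparameters, and toy data are assumptions made for illustration, not the authors' implementation.

```python
def train_svm_rank_weighted(X, ps, C=1.0, epochs=200, lr=0.01):
    """Subgradient descent on 1/2 w.w + C * sum (p_i - p_j) * pairwise hinge loss."""
    dim = len(X[0])
    # For each pair with p_i > p_j keep the difference vector and its weight p_i - p_j.
    pairs = [([a - b for a, b in zip(X[i], X[j])], ps[i] - ps[j])
             for i in range(len(X)) for j in range(len(X)) if ps[i] > ps[j]]
    w = [0.0] * dim
    for _ in range(epochs):
        grad = list(w)  # gradient of the 1/2 w.w term
        for d, weight in pairs:
            if sum(wk * dk for wk, dk in zip(w, d)) < 1.0:  # hinge term is active
                for k in range(dim):
                    grad[k] -= C * weight * d[k]
        w = [wk - lr * gk for wk, gk in zip(w, grad)]
    return w

# Hypothetical toy data: certainty grows with the first feature only.
X = [[3.0, 1.0], [2.0, 0.0], [1.0, 1.0], [0.0, 0.0]]
ps = [0.9, 0.7, 0.3, 0.1]
w = train_svm_rank_weighted(X, ps)
scores = [sum(wk * xk for wk, xk in zip(w, x)) for x in X]
print(w, scores)
```

Pairs whose certainty scores are close contribute little to the objective, so a swapped order between two nearly tied cases costs less than a swap between a confident positive and a confident negative.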
Experiments: HIT Data
Heparin-induced thrombocytopenia (HIT):
• A life-threatening condition that may develop when patients are treated with heparin
Data:
• 182 patient instances labeled by an expert
• 50 features derived from time series of
– Labs: Hemoglobin levels, platelet and white blood cell counts
– Medications: Heparin and its administration record
– Procedure: a major heart procedure
HIT Data
Labeling:
• For each patient case we asked the expert 2 questions
– Do you agree with raising an alert on HIT or not ? Yes/no
– How strongly do you agree/disagree ? Scale 0 to 4: strongly disagree to strongly agree
Case review:
• We used a special graphical interface
– Average time to review a patient: 247 seconds
– Average time to enter assessments: under 10 seconds
Experiments: Models
• Models
– Trained on binary labels
• SVM: standard linear SVM
– Trained on certainty labels
• LinReg: Linear Regression
• SVM-Rank: SVM model applied to pairwise data points
• SVM-RankW: Weighted version of SVM-Rank
• Evaluation
– Data are randomly split into train/test sets
– The training/testing process is repeated 30 times
– Average AUC (area under the ROC curve) is recorded
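The evaluation protocol can be sketched as repeated random splits with AUC averaged over the repeats. The function names, the fixed seed, the skipping of splits whose test set lacks one class, and the placeholder trainer are all assumptions; substitute any of the models listed on this slide for `train_fn`.

```python
import random

def auc(scores, labels):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if sp > sn else 0.5 if sp == sn else 0.0
               for sp in pos for sn in neg)
    return wins / (len(pos) * len(neg))

def average_auc(X, y, train_fn, n_train, repeats=30, seed=0):
    """Average test AUC over repeated random train/test splits."""
    rng = random.Random(seed)
    aucs = []
    for _ in range(repeats):
        idx = list(range(len(X)))
        rng.shuffle(idx)
        train, test = idx[:n_train], idx[n_train:]
        test_labels = [y[i] for i in test]
        if len(set(test_labels)) < 2:
            continue  # AUC is undefined when the test set has only one class
        score = train_fn([X[i] for i in train], [y[i] for i in train])
        aucs.append(auc([score(X[i]) for i in test], test_labels))
    return sum(aucs) / len(aucs)

# Placeholder trainer (hypothetical): ignores the training data and scores by the first feature.
def train_fn(X_train, y_train):
    return lambda x: x[0]

X = [[float(i)] for i in range(10)]  # hypothetical, perfectly separable toy data
y = [0] * 5 + [1] * 5
mean_auc = average_auc(X, y, train_fn, n_train=5)
print(mean_auc)
```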
Experiments on HIT Data: Results
• Our proposed methods consistently outperform both regression and standard SVM
• To reach AUC = 0.8
– SVM (training on binary labels) needs 100 examples
– SVM-RankW (training on certainty labels) needs 60 examples
Experiments on unbalanced HIT data
• Data in medicine are often unbalanced (rare positive examples)
=> We need more examples to learn a classifier
• Auxiliary information can also help us learn in this setting
Standard SVM cannot learn without positive examples (AUC ~ 0.5, not shown)
Conclusions
• Auxiliary certainty information
– Helps to learn better classification models with smaller numbers of examples
– Can be obtained with little additional cost
• Human subjective certainty assessments are noisy
• Proposed methods are robust to noise
– Pairwise orders are more consistent than exact estimates
Thank you for your attention !
• Acknowledgment: this research was supported by grants from the National Institutes of Health
– 1R01LM010019 (M. Hauskrecht)
– 1R01GM088224 (M. Hauskrecht)