sample-efficient learning with auxiliary class-label ...quang/papers/amia11_presentation.pdf ·...

Sample-efficient learning with auxiliary class-label information

Quang Nguyen Hamed Valizadegan

Amy Seybert Milos Hauskrecht

Computer Science Department University of Pittsburgh

Building a Classification Model

Training data (patient records, diagnoses) → Learning → Classifier

New patient → Classifier → Disease or no disease?

Typically: More training data => Better classifier

Data Labeling and Its Cost


• Labeling requires human experts

=> Time-consuming and costly

=> Small training data

• How to reduce the number of examples to label ?

• Active learning: select only the most critical examples to label

• Can we obtain more information from selected examples ?

Patient records (labs, medications, notes, ...) → Labeling → Diagnoses (class labels: disease / no disease) → Training data

Our Solution

• Idea: ask a human expert to provide, in addition to class labels, his/her certainty in the label decision and incorporate it into the learning process

• Certainty can be represented in terms of

• Probability: p = 0.85 or

• Qualitative ordinal category: strong, medium or weak belief in disease


Training with Class Label Information


Learning

Patient record (labs, medications etc)

Binary class label (disease/no disease)

……..

x1

xN

y1=1/0

yN=1/0

……..

Classifier

Discriminant projection

Decision boundary

Training with Class and Auxiliary Information



Binary class label (disease/no disease)

Certainty score (certainty in disease)

……..

x1

xN

y1=1/0, p1

yN=1/0, pN

……..

Classifier

Learning

Learning with Auxiliary Information: Regression



Certainty info (certainty in disease)

……..

x1

xN

p1

pN

……..

Regression ( f: X → P )

Learning
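A minimal sketch of the regression baseline f: X → P, assuming a linear model fit by ordinary least squares. The function name `fit_linear_regression` and the toy data are illustrative assumptions, not taken from the slides.

```python
import numpy as np

def fit_linear_regression(X, p):
    """Least-squares fit of f(x) = w.x + b to the certainty scores p."""
    X = np.asarray(X, float)
    X1 = np.hstack([X, np.ones((X.shape[0], 1))])   # append a bias column
    coef, *_ = np.linalg.lstsq(X1, np.asarray(p, float), rcond=None)
    return coef[:-1], coef[-1]

# Hypothetical toy data: noise-free scores generated by a known linear rule.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 3))
true_w = np.array([1.0, -2.0, 0.5])
p = X @ true_w + 0.3

w, b = fit_linear_regression(X, p)   # recovers true_w and the offset 0.3
```

Because the fit targets the exact values of p, any inconsistency in the expert's scores shifts w directly.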

Learning with Auxiliary Information: Noise

• Human certainty estimates are often noisy

– certainty score p may be inconsistent

• Regression relies on exact values of p

=> Sensitive to noise

Solution ?

Patient record

p = ?

Modeling pairwise orders


• Idea: Certainty scores let us order examples

• Our approach: build a discriminant projection f(x) that respects this order

• Minimize the number of violated pairwise order constraints

• Modeling pairwise orders instead of relying on exact values of p

=> learning less sensitive to noise

[Figure: examples ordered (<) along the projection f(x)]
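The idea can be sketched in a few lines: derive the ordered pairs from the certainty scores, then count how many of them a candidate projection violates. The helper names and the toy values below are assumptions made for illustration.

```python
import numpy as np

def ordered_pairs(p):
    """Index pairs (i, j) with p[i] > p[j]; tied scores produce no pair."""
    p = np.asarray(p)
    i, j = np.nonzero(p[:, None] > p[None, :])
    return np.stack([i, j], axis=1)

def count_violations(scores, pairs):
    """Number of pairs (i, j) where the projection fails to rank i above j."""
    scores = np.asarray(scores)
    return int(np.sum(scores[pairs[:, 0]] <= scores[pairs[:, 1]]))

p = [4, 2, 2, 0]              # hypothetical certainty scores on a 0-4 scale
f = [0.9, 0.1, 0.4, 0.2]      # hypothetical projection values f(x)

pairs = ordered_pairs(p)              # 5 pairs; the tie between examples 1 and 2 is skipped
n_bad = count_violations(f, pairs)    # 1: example 1 should score above example 3
```

Only the relative order of p enters the computation, so rescaling or mildly perturbing the scores leaves the pair set unchanged as long as no order flips.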

Learning with Pairwise Order Constraints

• Approach 1: adapt SVM Rank Algorithm (Herbrich 2000, Joachims 2002)

– Optimize:

$$\min_{w} \; \frac{1}{2} w^{T} w + C \sum_{i,j:\, p_i > p_j} \xi_{i,j}$$

– Pairwise order constraints:

$$\forall i,j:\; p_i > p_j:\quad w^{T}(x_i - x_j) \ge 1 - \xi_{i,j}$$

$$\forall i,j:\quad \xi_{i,j} \ge 0$$

Penalty for violating pairwise orders

Penalty is the same for all pairs
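One way to sketch Approach 1: for a fixed w, the smallest feasible slack for a pair is max(0, 1 - w^T (x_i - x_j)), so the problem becomes an unconstrained hinge loss over difference vectors. The code below minimizes it with plain subgradient descent. The solver, step size, function names, and toy data are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def pair_differences(X, p):
    """Rows x_i - x_j for every pair with p_i > p_j (ties are skipped)."""
    X, p = np.asarray(X, float), np.asarray(p, float)
    i, j = np.nonzero(p[:, None] > p[None, :])
    return X[i] - X[j]

def fit_svm_rank(X, p, C=1.0, lr=0.01, epochs=2000):
    """Subgradient descent on 0.5*w.w + C * sum of pairwise hinge losses."""
    D = pair_differences(X, p)
    w = np.zeros(D.shape[1])
    for _ in range(epochs):
        active = D @ w < 1.0                    # pairs whose slack is nonzero
        grad = w - C * D[active].sum(axis=0)    # every active pair gets the same penalty
        w -= lr * grad
    return w

# Hypothetical toy data: the first feature carries the order, the second does not.
X = [[0.0, 1.0], [1.0, 0.0], [2.0, 1.0], [3.0, 0.0]]
p = [0, 1, 2, 4]

w = fit_svm_rank(X, p)
margins = pair_differences(X, p) @ w   # a positive entry means the pair is ordered correctly
```

A dedicated quadratic-programming or linear SVM solver applied to the difference vectors would be the more usual choice; the loop is kept only to make the objective explicit.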

SVM Rank with Weighted Examples

• Approach 2: SVM Rank Weighted

– Optimize:

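$$\min_{w} \; \frac{1}{2} w^{T} w + C \sum_{i,j:\, p_i > p_j} (p_i - p_j)\, \xi_{i,j}$$

– Pairwise order constraints:

$$\forall i,j:\; p_i > p_j:\quad w^{T}(x_i - x_j) \ge 1 - \xi_{i,j}$$

$$\forall i,j:\quad \xi_{i,j} \ge 0$$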

Larger p difference => larger penalty for violating pairwise orders
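The effect of the weighting can be checked by evaluating both objectives for the same w. The sketch below assumes the slack of each pair takes its smallest feasible value, max(0, 1 - w^T (x_i - x_j)); the function name `rank_objective` and the three-example data set are hypothetical.

```python
import numpy as np

def rank_objective(w, X, p, C=1.0, weighted=False):
    """0.5*w.w + C * sum over pairs with p_i > p_j of cost_ij * slack_ij."""
    w, X, p = np.asarray(w, float), np.asarray(X, float), np.asarray(p, float)
    i, j = np.nonzero(p[:, None] > p[None, :])
    slack = np.maximum(0.0, 1.0 - (X[i] - X[j]) @ w)
    cost = (p[i] - p[j]) if weighted else np.ones(len(i))   # SVM-RankW vs. SVM-Rank
    return float(0.5 * w @ w + C * np.sum(cost * slack))

# Examples 1 and 2 have nearly equal certainty, but the feature orders them the other way.
X = [[0.0], [1.0], [3.0]]
p = [0.1, 0.9, 0.8]
w = [1.0]

plain = rank_objective(w, X, p)                    # 0.5 + 3.0: the near-tie costs a full slack of 3
weighted = rank_objective(w, X, p, weighted=True)  # 0.5 + 0.1 * 3.0: the near-tie is discounted
```

Under the weighted objective a disagreement between two examples with close certainty scores contributes little, while a disagreement between a strong positive and a strong negative keeps a large penalty.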

Experiments: HIT Data

Heparin-induced thrombocytopenia (HIT):

• A life-threatening condition that may develop when patients are treated with heparin

Data:

• 182 patient instances labeled by an expert

• 50 features derived from time series of

– Labs: Hemoglobin levels, platelet and white blood cell counts

– Medications: Heparin and its administration record

– Procedure: a major heart procedure


HIT Data

Labeling:

• For each patient case we asked the expert 2 questions

– Do you agree with raising an alert on HIT or not ? Yes/no

– How strongly do you agree/disagree ? Scale 0 – 4 : strongly-disagree to strongly-agree

Case review:

• We used a special graphical interface

– Average time to review a patient: 247 seconds

– Average time to enter assessments: under 10 seconds


Experiments: Models

• Models

– Trained on binary labels

• SVM: standard linear SVM

– Trained on certainty labels

• LinReg: Linear Regression

• SVM-Rank: SVM model applied to pairwise data points

• SVM-RankW: Weighted version of SVM-Rank

• Evaluation

– Data are randomly split into train/test set

– Repeat training/testing process 30 times

– Average AUC (Area under ROC curve) is recorded
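The evaluation protocol can be sketched as follows, assuming AUC is computed as the fraction of (positive, negative) test pairs ranked correctly, with ties counted as half. The `fit` callback interface, the least-squares stand-in model, and the synthetic data are assumptions for illustration only.

```python
import numpy as np

def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    scores = np.asarray(scores, float)
    labels = np.asarray(labels).astype(bool)
    diff = scores[labels][:, None] - scores[~labels][None, :]
    return float(np.mean((diff > 0) + 0.5 * (diff == 0)))

def repeated_split_auc(fit, X, y, n_train, repeats=30, seed=0):
    """Average test AUC over random train/test splits."""
    rng = np.random.default_rng(seed)
    aucs = []
    for _ in range(repeats):
        perm = rng.permutation(len(y))
        train, test = perm[:n_train], perm[n_train:]
        if len(np.unique(y[test])) < 2:
            continue                      # AUC is undefined without both classes
        score = fit(X[train], y[train])   # fit returns a scoring function
        aucs.append(auc(score(X[test]), y[test]))
    return float(np.mean(aucs))

def fit_least_squares(X, y):
    """Stand-in model: linear least squares on the labels."""
    X1 = np.hstack([X, np.ones((len(X), 1))])
    coef, *_ = np.linalg.lstsq(X1, np.asarray(y, float), rcond=None)
    return lambda Z: Z @ coef[:-1] + coef[-1]

# Synthetic, perfectly separable data (not the HIT data).
rng = np.random.default_rng(1)
X = rng.normal(size=(60, 1))
y = (X[:, 0] > 0).astype(int)
mean_auc = repeated_split_auc(fit_least_squares, X, y, n_train=30)
```

Any of the compared models can be plugged in through `fit`, provided it returns a function that maps test features to real-valued scores.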

Experiments on HIT Data: Results

• Our proposed methods consistently outperform both regression and standard SVM

• To reach AUC = 0.8

– SVM (trained on binary labels) needs 100 examples

– SVM-RankW (trained on certainty labels) needs 60 examples

Experiments on unbalanced HIT data


• Data in medicine are often unbalanced (rare positive examples)

We need more examples to learn a classifier

• Auxiliary information can also help us learn in this setting

Standard SVM cannot learn without positive examples (AUC ~ 0.5, not shown)

Conclusions

• Auxiliary certainty information

– Helps to learn better classification models with smaller numbers of examples

– Can be obtained with little additional cost

• Human subjective certainty assessments are noisy

• Proposed methods are robust to noise

– Pairwise orders are more consistent than exact estimates


Thank you for your attention !

• Acknowledgment: this research was supported by grants from the National Institutes of Health

– 1R01LM010019 (M. Hauskrecht)

– 1R01GM088224 (M. Hauskrecht)

