
Multiple Instance Learning for Sparse Positive Bags

Razvan C. Bunescu
Machine Learning Group, Department of Computer Sciences
University of Texas at Austin
[email protected]

Raymond J. Mooney
Machine Learning Group, Department of Computer Sciences
University of Texas at Austin
[email protected]


Page 1:

Multiple Instance Learning for Sparse Positive Bags

Razvan C. Bunescu
Machine Learning Group
Department of Computer Sciences
University of Texas at Austin
[email protected]

Raymond J. Mooney
Machine Learning Group
Department of Computer Sciences
University of Texas at Austin
[email protected]

Page 2:

Two Types of Supervision

• Single Instance Learning (SIL):
  – the traditional type of supervision in machine learning.
  – a dataset of positive and negative training instances.
• Multiple Instance Learning (MIL):
  – a dataset of positive and negative training bags of instances.
  – a bag is positive if at least one instance in the bag is positive.
  – a bag is negative if all instances in the bag are negative.
  – the labels of the instances inside the bags are hidden.
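The bag-labeling rule above can be pinned down in a few lines of Python (a toy illustration; the helper name and labels are not from the slides):

```python
# MIL bag labeling: a bag is positive iff at least one of its (hidden)
# instance labels is positive; it is negative iff all are negative.
# `bag_label` is an illustrative helper, not part of the slides.
def bag_label(instance_labels):
    return +1 if any(y == +1 for y in instance_labels) else -1

# A positive bag may owe its label to a single positive instance
# (a "sparse" positive bag); a negative bag has no positives at all.
print(bag_label([-1, -1, +1, -1]))  # 1
print(bag_label([-1, -1, -1]))      # -1
```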

Page 3:

MIL Background: Domains

• Originally introduced to solve a Drug Activity prediction problem in biochemistry [Dietterich et al., 1997].
• Content-Based Image Retrieval [Zhang et al., 2002].
• Text categorization [Andrews et al., 2003], [Ray et al., 2005].

Page 4:

MIL Background: Algorithms

• Axis-Parallel Rectangles [Dietterich et al., 1997].
• Diverse Density [Maron, 1998].
• Multiple Instance Logistic Regression [Ray & Craven, 2005].
• Multi-instance SVM kernels of [Gartner et al., 2002]:
  – the Normalized Set Kernel.
  – the Statistic Kernel.

Page 5:

Outline

✓ Introduction
• MIL as SIL with one-side noise
• The Normalized Set Kernel (NSK)
• Three SVM approaches to MIL:
  – An SVM approach to sparse MIL (sMIL)
  – A transductive SVM approach to sparse MIL (stMIL)
  – A balanced SVM approach to MIL (sbMIL)
• Experimental Results
• Future Work & Conclusion

Page 6:

SIL Approach to MIL

• Apply the bag label to all instances in the bag.
• Formulate as an SVM problem:

minimize:
  J(w, b, ξ) = ½‖w‖² + (C/L) · Σ_{x ∈ X_n ∪ X_p} ξ_x

subject to:
  −(w·φ(x) + b) ≥ 1 − ξ_x, ∀x ∈ X_n
  w·φ(x) + b ≥ 1 − ξ_x, ∀x ∈ X_p
  ξ ≥ 0

where X_n and X_p are the sets of all instances from negative and positive bags, and L = |X_n| + |X_p|.

Pages 7–11 repeat this formulation, highlighting its parts in turn: the negative bags (X_n), the positive bags (X_p), the regularization term ½‖w‖², the error on negative bags, and the error on positive bags.
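The SIL reduction on these slides amounts to flattening the bags into one labeled instance set before training an ordinary SVM. A minimal sketch, with illustrative data (the helper name is not from the slides):

```python
# SIL reduction: every instance inherits its bag's label, producing a
# standard supervised dataset that any SVM solver can consume.
# `sil_dataset` and the toy bags are illustrative, not from the slides.
def sil_dataset(positive_bags, negative_bags):
    X, y = [], []
    for bag in positive_bags:      # label +1 applied to ALL instances,
        for x in bag:              # even hidden negatives -> one-side noise
            X.append(x)
            y.append(+1)
    for bag in negative_bags:      # negative bags are noise-free:
        for x in bag:              # all their instances really are negative
            X.append(x)
            y.append(-1)
    return X, y

X, y = sil_dataset(positive_bags=[[(0.9,), (0.1,)]], negative_bags=[[(0.2,), (0.0,)]])
print(y)   # [1, 1, -1, -1]
```

Only the positive side of the resulting dataset is noisy, which is why the slides describe this reduction as SIL with one-side noise.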

Page 12:

Outline

✓ Introduction
✓ MIL as SIL with one-side noise
• The Normalized Set Kernel (NSK)
• Three SVM approaches to MIL:
  – An SVM approach to sparse MIL (sMIL)
  – A transductive SVM approach to sparse MIL (stMIL)
  – A balanced SVM approach to MIL (sbMIL)
• Experimental Results
• Future Work & Conclusion

Page 13:

From SIL to the Normalized Set Kernel

• Apply the bag label to all bag instances and formulate as an SVM problem (the SIL formulation from Page 6).
• Averaging the instance-level constraints over each bag X, with φ(X) = Σ_{x ∈ X} φ(x), gives bag-level constraints:

  −(w·φ(X)/|X| + b) ≥ 1 − (1/|X|) Σ_{x ∈ X} ξ_x, ∀X ∈ 𝒳_n
  w·φ(X)/|X| + b ≥ 1 − (1/|X|) Σ_{x ∈ X} ξ_x, ∀X ∈ 𝒳_p

  where 𝒳_n and 𝒳_p are the sets of negative and positive bags.

• Replacing each averaged slack with a single slack ξ_X per bag yields the Normalized Set Kernel constraints:

  −(w·φ(X)/|X| + b) ≥ 1 − ξ_X, ∀X ∈ 𝒳_n
  w·φ(X)/|X| + b ≥ 1 − ξ_X, ∀X ∈ 𝒳_p

Pages 14–18 step through this derivation one build at a time.

Page 19:

The Normalized Set Kernel

• A bag is represented as the normalized sum of its instances: φ(X)/|X|, with φ(X) = Σ_{x ∈ X} φ(x).
• Use bags as examples in an SVM formulation [Gartner et al., 2002]:

minimize:
  J(w, b, ξ) = ½‖w‖² + C/(|𝒳_n| + |𝒳_p|) · Σ_{X ∈ 𝒳_n ∪ 𝒳_p} ξ_X

subject to:
  −(w·φ(X)/|X| + b) ≥ 1 − ξ_X, ∀X ∈ 𝒳_n
  w·φ(X)/|X| + b ≥ 1 − ξ_X, ∀X ∈ 𝒳_p
  ξ ≥ 0

Page 20 repeats this slide as a build.

Page 21:

The Normalized Set Kernel (NSK)

• A positive bag is represented as the normalized sum of its instances.
• Use positive bags and negative instances as examples:

minimize:
  J(w, b, ξ) = ½‖w‖² + (C/L) · ( Σ_{x ∈ X_n} ξ_x + Σ_{X ∈ 𝒳_p} ξ_X )

subject to:
  −(w·φ(x) + b) ≥ 1 − ξ_x, ∀x ∈ X_n
  w·φ(X)/|X| + b ≥ 1 − ξ_X, ∀X ∈ 𝒳_p
  ξ ≥ 0

where X_n is the set of instances from negative bags, 𝒳_p the set of positive bags, and L = |X_n| + |𝒳_p|.
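The bag representation itself is easy to sketch, assuming the 1/|X| normalization shown on these slides and a linear kernel between instances (helper names and toy vectors are illustrative):

```python
# NSK bag representation: phi(X) / |X|, i.e. the average of the
# instance feature vectors; the kernel between two bags is then the
# dot product of their averaged representations.
def bag_feature(bag):
    dim = len(bag[0])
    return [sum(x[i] for x in bag) / len(bag) for i in range(dim)]

def nsk(bag_a, bag_b):
    return sum(a * b for a, b in zip(bag_feature(bag_a), bag_feature(bag_b)))

print(bag_feature([(1.0, 0.0), (0.0, 2.0)]))          # [0.5, 1.0]
print(nsk([(1.0, 0.0), (0.0, 2.0)], [(2.0, 2.0)]))    # 3.0
```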

Page 22:

Outline

✓ Introduction
✓ MIL as SIL with one-side noise
✓ The Normalized Set Kernel (NSK)
• Three SVM approaches to MIL:
  – An SVM approach to sparse MIL (sMIL)
  – A transductive SVM approach to sparse MIL (stMIL)
  – A balanced SVM approach to MIL (sbMIL)
• Experimental Results
• Future Work & Conclusion

Page 23:

The Normalized Set Kernel (NSK)

• Same formulation as Page 21: positive bags and negative instances as examples.
• The positive-bag constraint

  w·φ(X)/|X| + b ≥ 1 − ξ_X, ∀X ∈ 𝒳_p

  is too strong, especially when positive bags are sparse in positive instances.

Page 24:

Inequality Constraints for Positive Bags

NSK constraint:
  w·φ(X)/|X| + b ≥ 1

Balancing constraint:
  w·φ(X)/|X| + b = (1/|X|) Σ_{x ∈ X} (w·φ(x) + b) ≥ (1/|X|) Σ_{x ∈ X} y(x), with y(x) = 1, ∀x ∈ X

⇒ the NSK constraint implicitly assumes that all instances inside the bag X are positive.
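The equality step above, w·φ(X)/|X| + b = (1/|X|) Σ_{x ∈ X} (w·φ(x) + b), follows directly from φ(X) = Σ_{x ∈ X} φ(x). A quick numeric check with arbitrary toy values:

```python
# Numeric check of the identity behind the balancing constraint:
# with phi(X) = sum over x in X of phi(x),
#   w.phi(X)/|X| + b  ==  (1/|X|) * sum over x of (w.phi(x) + b).
# w, b and the bag below are arbitrary toy values.
def dot(u, v):
    return sum(a * c for a, c in zip(u, v))

w, b = (2.0, -1.0), 0.5
bag = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]

phi_X = tuple(sum(x[i] for x in bag) for i in range(2))
lhs = dot(w, phi_X) / len(bag) + b
rhs = sum(dot(w, x) + b for x in bag) / len(bag)
print(abs(lhs - rhs) < 1e-12)   # True
```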

Page 25:

Inequality Constraints for Positive Bags

• We want the balancing constraint to express that at least one instance in the bag X is positive: y(x̂) = 1 for some x̂ ∈ X, and y(x) = −1 for all x ∈ X − {x̂}.
• Then:

  w·φ(X)/|X| + b ≥ (1/|X|) Σ_{x ∈ X} y(x) = (1 − (|X| − 1))/|X| = (2 − |X|)/|X|

sparse MIL constraint:
  w·φ(X)/|X| + b ≥ (2 − |X|)/|X|

Page 26:

The Sparse MIL (sMIL)

minimize:
  J(w, b, ξ) = ½‖w‖² + (C/L) · ( Σ_{x ∈ X_n} ξ_x + Σ_{X ∈ 𝒳_p} ξ_X )

subject to:
  −(w·φ(x) + b) ≥ 1 − ξ_x, ∀x ∈ X_n
  w·φ(X)/|X| + b ≥ (2 − |X|)/|X| − ξ_X, ∀X ∈ 𝒳_p
  ξ ≥ 0

The target (2 − |X|)/|X| is larger for smaller bags: small positive bags are more informative than large positive bags.
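The bag-size dependence of that target is easy to tabulate (a toy sketch; the helper name is illustrative):

```python
# sMIL target for a positive bag X: (2 - |X|)/|X|. It equals +1 when
# the bag has a single instance (which must then be positive) and
# decreases toward -1 as the bag grows, weakening the constraint.
def smil_target(bag_size):
    return (2 - bag_size) / bag_size

for n in (1, 2, 4, 10):
    print(n, smil_target(n))   # 1 1.0 / 2 0.0 / 4 -0.5 / 10 -0.8
```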

Page 27:

Outline

✓ Introduction
✓ MIL as SIL with one-side noise
✓ The Normalized Set Kernel (NSK)
• Three SVM approaches to MIL:
  ✓ An SVM approach to sparse MIL (sMIL)
  – A transductive SVM approach to sparse MIL (stMIL)
  – A balanced SVM approach to MIL (sbMIL)
• Experimental Results
• Future Work & Conclusion

Page 28:

Inequality Constraints for Positive Bags

• sMIL comes closer than the NSK to expressing the constraint that at least one instance from a positive bag is positive.
• However, sMIL does not guarantee that at least one instance is positive:
  – Problem: the sparse MIL constraint

    w·φ(X)/|X| + b ≥ (2 − |X|)/|X| − ξ_X

    may be satisfied when all instances have negative scores that are very close to zero.
  – Solution: force all negative instances to have scores ≤ −1 + ξ using the transductive constraint:

    |w·φ(x) + b| ≥ 1 − ξ, ∀x ∈ X

Page 29:

Inequality Constraints for Positive Bags

sparse MIL constraint:
  w·φ(X)/|X| + b ≥ (2 − |X|)/|X| − ξ_X

transductive constraint (shared slacks):
  |w·φ(x) + b| ≥ 1 − ξ_X, ∀x ∈ X

⇒ there exists x̂ ∈ X such that w·φ(x̂) + b ≥ 1 − ξ_X: at least one instance is positive.
• Shared slacks ⇒ a mixed integer programming problem.

Page 30:

Inequality Constraints for Positive Bags

sparse MIL constraint:
  w·φ(X)/|X| + b ≥ (2 − |X|)/|X| − ξ_X

transductive constraint (independent slacks):
  |w·φ(x) + b| ≥ 1 − ξ_x, ∀x ∈ X

⇒ there exists x̂ ∈ X such that w·φ(x̂) + b ≥ 1 − ξ_x̂: at least one instance is positive.
• Independent slacks ⇒ an easier problem, solved with CCCP [Yuille et al., 2002].

Page 31:

The Sparse Transductive MIL (stMIL)

minimize:
  J(w, b, ξ, ξ*) = ½‖w‖² + (C/L) · ( Σ_{x ∈ X_n} ξ_x + Σ_{X ∈ 𝒳_p} ξ_X ) + (C/L_p) · Σ_{x ∈ X_p} ξ*_x

subject to:
  −(w·φ(x) + b) ≥ 1 − ξ_x, ∀x ∈ X_n
  |w·φ(x) + b| ≥ 1 − ξ*_x, ∀x ∈ X_p
  w·φ(X)/|X| + b ≥ (2 − |X|)/|X| − ξ_X, ∀X ∈ 𝒳_p
  ξ, ξ* ≥ 0

where X_p is the set of instances from positive bags and L_p = |X_p|.

• Solve with CCCP, as in [Collobert et al., 2006].

Page 32:

Outline

✓ Introduction
✓ MIL as SIL with one-side noise
✓ The Normalized Set Kernel (NSK)
• Three SVM approaches to MIL:
  ✓ An SVM approach to sparse MIL (sMIL)
  ✓ A transductive SVM approach to sparse MIL (stMIL)
  – A balanced SVM approach to MIL (sbMIL)
• Experimental Results
• Future Work & Conclusion

Page 33:

A Balanced SVM Approach to MIL

• SIL is ideal when bags are dense in positive instances.
• sMIL is ideal when bags are sparse in positive instances.
• If the expected density η of positive instances in positive bags is known, design a method that:
  – converges to SIL when η → 1.
  – converges to sMIL when η → 0.
• If η is unknown, it can be set using cross-validation.

Page 34:

The Balanced MIL (sbMIL)

• Input:
  – Training negative bags 𝒳_n; define X_n = {x | x ∈ X, X ∈ 𝒳_n}.
  – Training positive bags 𝒳_p; define X_p = {x | x ∈ X, X ∈ 𝒳_p}.
  – Features φ(x), or kernel K(x, y).
  – Capacity parameter C ≥ 0 and balance parameter η ∈ [0, 1].
• Output:
  – Decision function f(x) = w·φ(x) + b.
• Algorithm:
  1. (w, b) ← solve_sMIL(𝒳_n, 𝒳_p, φ, C).
  2. Order all instances x ∈ X_p using f(x).
  3. Label instances x ∈ X_p: the top η·|X_p| as positive, the rest (1 − η)·|X_p| as negative.
  4. (w, b) ← solve_SIL(X_n, X_p, φ, C).
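Step 3, the η-balanced relabeling between the two solver calls, can be sketched as follows; solve_sMIL and solve_SIL are the solvers named on the slide and are not implemented here, and `f` stands in for the decision function learned in step 1:

```python
# sbMIL phase 2 relabeling: rank the instances from positive bags by
# the sMIL decision function f, keep the top eta-fraction as positives,
# relabel the rest as negatives, then retrain with SIL (not shown).
def relabel(instances, f, eta):
    ranked = sorted(instances, key=f, reverse=True)
    k = round(eta * len(ranked))
    return ranked[:k], ranked[k:]          # (positives, negatives)

# Toy scores: with eta = 0.25, only the single highest-scoring
# instance out of four is relabeled positive.
pos, neg = relabel([0.9, -0.3, 0.2, -0.8], f=lambda x: x, eta=0.25)
print(pos, neg)   # [0.9] [0.2, -0.3, -0.8]
```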

Page 35:

Outline

✓ Introduction
✓ MIL as SIL with one-side noise
✓ The Normalized Set Kernel (NSK)
✓ Three SVM approaches to MIL:
  ✓ An SVM approach to sparse MIL (sMIL)
  ✓ A transductive SVM approach to sparse MIL (stMIL)
  ✓ A balanced SVM approach to MIL (sbMIL)
• Experimental Results
• Future Work & Conclusion

Page 36:

Experimental Results: Datasets

• [AIMed] An artificial, maximally sparse dataset, created from AIMed [Bunescu et al., 2005]:
  – a dataset of documents annotated for protein interactions;
  – a sentence example contains a pair of proteins; the sentence is positive iff it asserts an interaction between the two proteins.
  – Create positive bags of sentences:
    • choose the bag size randomly between Smin and Smax;
    • start with exactly one positive instance;
    • randomly add negative instances.
  – Create negative bags of sentences:
    • choose the bag size randomly between Smin and Smax;
    • randomly add negative instances.
• Use the subsequence kernel from [Bunescu & Mooney, 2005].
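The bag-construction recipe above can be sketched as follows (the instance pool, bag sizes, and helper name are illustrative, not from the slides):

```python
import random

# Building an artificial sparse-bag dataset in the style described
# above: a positive bag starts with exactly one positive instance and
# is padded with random negatives; a negative bag is all negatives.
def make_bag(positive, negatives, smin, smax, rng):
    size = rng.randint(smin, smax)                 # bag size in [smin, smax]
    if positive is not None:
        bag = [positive] + rng.sample(negatives, size - 1)
    else:
        bag = rng.sample(negatives, size)
    rng.shuffle(bag)                               # hide the positive's position
    return bag

rng = random.Random(0)
negatives = list(range(100))                       # stand-ins for negative sentences
bag = make_bag("POS", negatives, smin=3, smax=7, rng=rng)
print(3 <= len(bag) <= 7, bag.count("POS"))        # True 1
```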

Page 37:

Experimental Results: Datasets

• [CBIR] Content Based Image Retrieval:
  – categorize images as to whether they contain an object of interest;
  – an image is a bag of image regions;
  – the number of regions varies widely between images;
  – for every image, we expect that relatively few regions contain the object of interest ⇒ naturally sparse positive bags.
  – Evaluate on the [Tiger], [Elephant], and [Fox] datasets from [Andrews et al., 2003].
• Use a quadratic kernel with the original feature vectors.

Page 38:

Experimental Results: Datasets

• [TST] Text categorization datasets:
  – Medline articles are bags of overlapping text passages;
  – articles are annotated with MeSH terms, which are used as classes;
  – use [TST1] and [TST2] from [Andrews et al., 2003].
• [MUSK] Drug Activity prediction:
  – bags of 3D low-energy conformations for every molecule;
  – a bag is positive if the molecule smells "musky", i.e. if at least one conformation binds to the target;
  – [MUSK1] and [MUSK2] datasets from [Dietterich et al., 1997].
• Use a quadratic kernel with the original feature vectors.

Page 39:

Experimental Results: Systems

• [SIL] MIL as SIL with one-side noise.
• [NSK] The Normalized Set Kernel.
• [STK] The Statistic Kernel.
• [sMIL] The SVM approach to sparse MIL.
• [stMIL] The transductive SVM approach to sparse MIL.
• [sbMIL] The balanced SVM approach to MIL.

Page 40:

Experimental Results

Dataset    SIL     NSK     STK     sMIL    sbMIL   stMIL
AIMed      57.44   87.11   N/A     87.19   87.99   92.11
AIMed½     45.86   54.06   N/A     54.08   67.66   72.94
Tiger      76.65   79.07   80.80   81.12   82.95   74.48
Elephant   85.08   82.94   85.22   87.98   88.58   81.64
Fox        52.72   64.01   62.14   66.13   69.78   60.67
MUSK1      87.82   85.61   69.44   86.91   91.78   79.46
MUSK2      87.33   90.78   61.01   81.19   87.74   68.41
TST1       96.25   97.16   96.19   97.29   97.41   96.81
TST2       85.37   90.60   86.87   87.97   90.57   88.55

Pages 41–42 repeat the results table from Page 40.

Page 43:

Future Work

• Capture distribution imbalance in the MIL model:
  – instances belonging to the same bag are, in general, more similar than instances belonging to different bags.
• Incorporate estimates of bag-level density in the MIL model:
  – in some applications, estimates of the density of positive instances are available for every bag.

Page 44:

Conclusion

• Proposed an SVM approach to MIL that is particularly effective when bags are sparse in positive instances.
• Modeling a global density of positive instances in positive bags further improves accuracy.
• Treating instances from positive bags as unlabeled data in a transductive setting is useful when negative instances in positive and negative bags come from the same distribution.

Page 45:

Questions?