Multiple Instance Learning with Manifold Bags

Boris Babenko¹  Nakul Verma¹  Piotr Dollár²  Serge Belongie¹
¹ Computer Science and Engineering, University of California, San Diego
² Computer Science, California Institute of Technology
{bbabenko,naverma,sjb}@cs.ucsd.edu  {pdollar}@caltech.edu

ABSTRACT
• Multiple Instance Learning (MIL) is a relaxed form of supervised learning
  » Learner receives labeled bags rather than labeled instances
  » Reduces the burden of collecting labeled data
• Existing analysis assumes bags have a finite size
• For many applications, bags are modeled better as manifolds in feature space; thus the existing analysis is not appropriate
• In this setting we show:
  » the geometric structure of manifold bags affects PAC learnability
  » a MIL algorithm that learns from finite-sized bags can be trained with manifold bags
  » a simple heuristic algorithm for reducing memory requirements

TAKE-HOME MESSAGE
• Increasing the number of training bags reduces the complexity term
• Increasing the number of queried instances per bag reduces the failure probability
• Seems to contradict previous results (smaller bag size is better)
• Important difference between the bag size and the number of queried instances!
• If the number of queried instances is small, we may only get negative instances from a positive bag
• Increasing the number of bags requires extra labels; increasing the number of queried instances does not

MANIFOLD BAGS: FORMULATION
• A manifold bag is drawn from a bag distribution
• Instance hypotheses have corresponding bag hypotheses
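In symbols (the notation here is assumed, not fixed by the poster text): an instance hypothesis labels points of the instance space, and its corresponding bag hypothesis labels a bag by the best instance it contains:

```latex
h : \mathcal{X} \to \{-1, +1\},
\qquad
\bar{h}(b) \;=\; \max_{x \in b} h(x)
```

That is, a bag $b \subset \mathcal{X}$ is labeled positive by $\bar{h}$ iff some instance $x \in b$ is classified positive; for manifold (infinite) bags the max becomes a supremum over the manifold.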
TRAINING WITH MANY INSTANCES
• Problem: we want many instances per bag, but have computational limits
• Solution: Iterative Querying Heuristic (IQH)
  » Grab a small number of instances per bag and run a standard MIL algorithm
  » Query more instances from each bag, keeping only the ones that get a high score from the current classifier
  » At each iteration, train with a small number of instances
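A minimal sketch of the Iterative Querying Heuristic described above. The `train_mil`, `query`, and `score` callables are hypothetical stand-ins (the poster does not fix a particular MIL algorithm or querying mechanism):

```python
def iqh(bags, labels, train_mil, query, n_keep=10, n_query=50, n_iters=3):
    """Iterative Querying Heuristic: train a MIL classifier while only
    ever holding a small number of instances per bag in memory."""
    # Start with a small sample of instances from each bag.
    current = [query(bag, n_keep) for bag in bags]
    clf = None
    for _ in range(n_iters):
        # Train a standard MIL algorithm on the small per-bag samples.
        clf = train_mil(current, labels)
        # Query more instances from each bag, but keep only the
        # highest-scoring ones under the current classifier.
        for i, bag in enumerate(bags):
            candidates = current[i] + query(bag, n_query)
            candidates.sort(key=clf.score, reverse=True)
            current[i] = candidates[:n_keep]
    return clf
```

The memory footprint stays at `n_keep` instances per bag throughout, which is the point of the heuristic.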
MULTIPLE INSTANCE LEARNING (MIL): DEFINITION
• MIL: a relaxed form of supervised learning
  » (set of examples, label) pairs are provided
  » MIL lingo: a set of examples = a bag of instances
  » A bag is labeled positive if at least one instance in the bag is positive
    (e.g., Bag 1: positive; Bag 2: positive; Bag 3: negative)

EXISTING ANALYSIS
• Data model (bottom-up):
  » Draw instances and their labels from a fixed distribution
  » Create a bag from the instances and determine its label (the max of the instance labels)
  » Return the bag and its label to the learner
• Blum & Kalai (1998)
  » If: access to a noise-tolerant instance learner, with instances drawn independently
  » Then: bag sample complexity is linear in the bag size
• Sabato & Tishby (2009)
  » If: empirical error on bags can be minimized
  » Then: bag sample complexity is logarithmic in the bag size

GENERALIZATION BOUNDS: VC DIMENSION
• The VC dimension is a way of relating empirical error to generalization error (standard bound)
• We want to relate the VC dimension of the bag hypothesis class to that of the instance hypothesis class
• For finite-sized bags, such a bound is known (Sabato & Tishby 2009)
• It turns out the VC dimension of the bag hypothesis class is unbounded, even for arbitrarily smooth bags
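The bottom-up data model above can be sketched as follows. The instance distribution here is a hypothetical stand-in (points in [0, 1], positive above a threshold), chosen only to make the max rule concrete:

```python
import random

def draw_instance():
    # Hypothetical fixed distribution over (instance, label) pairs:
    # instances are points in [0, 1], positive iff above a threshold.
    x = random.random()
    return x, int(x > 0.9)

def make_bag(r):
    """Create a finite bag of r instances; the bag label is the max
    (logical OR) of the instance labels."""
    instances, inst_labels = zip(*(draw_instance() for _ in range(r)))
    return list(instances), max(inst_labels)

random.seed(0)
bag, label = make_bag(r=8)  # the learner sees only (bag, label)
```

The learner never observes the per-instance labels, only the aggregated bag label.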
TAMING THE RICHNESS
• The bag hypothesis class is too powerful: for a positive bag, it need only classify one instance as positive
• Infinitely many instances -> too much flexibility for the bag hypothesis
• Would like to ensure that a non-negligible portion of positive bags is labeled positive
• Solution:
  » Switch to a real-valued hypothesis class
  » Hypotheses must be Lipschitz smooth
  » Hypotheses must label bags with a margin

FAT-SHATTERING DIMENSION
• The fat-shattering dimension relates empirical error at a margin to generalization error (standard bound)
• Unlike the VC dimension, we can relate the fat-shattering dimension of the bag hypothesis class to that of the instance class
• Key quantities: empirical error at the margin, number of training bags, manifold bag dimension, manifold bag volume, and smoothness

APPLICATIONS
• Object Detection (images): instance = image patch (label: is this a face?); bag = whole image (label: does it contain a face?)
• Phoneme Detection (audio): instance = audio clip (label: is this 'sh'?); bag = whole audio (label: does it contain 'sh'?)
• Event Detection (video): instance = video clip (label: is this 'jump'?); bag = whole video (label: does it contain 'jump'?)

EXPERIMENTS
• Synthetic data: error scales with curvature and volume
  [figures: smooth vs. curvy manifolds; small vs. large positive margin regions]
• Real data: INRIA Heads (Dalal et al. '05) and TIMIT Phonemes (Garofolo et al. '93)
  [figures: results at pad=16 and pad=32]
• Error scales with the number of training bags, the volume, the number of queried instances, and the number of IQH iterations
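The "standard bound" mentioned in the FAT-SHATTERING DIMENSION section above is usually stated in the following generic form (this is the textbook fat-shattering margin bound, not the poster's exact statement; here $m$ is the number of training bags, $\gamma$ the margin, and $\delta$ the failure probability):

```latex
\mathrm{er}(\bar{h})
\;\le\;
\widehat{\mathrm{er}}_\gamma(\bar{h})
\;+\;
O\!\left(
  \sqrt{\frac{\mathrm{fat}_{\bar{\mathcal{H}}}(\gamma/16)\,\log^2 m + \log(1/\delta)}{m}}
\right)
```

The poster's contribution is to control $\mathrm{fat}_{\bar{\mathcal{H}}}$ for manifold bags in terms of the manifold's dimension, volume, and smoothness, which is what makes the bound non-vacuous even though the bags contain infinitely many instances.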
MANIFOLD BAGS: OBSERVATIONS
• Top-down process: draw an entire bag from a bag distribution, then get instances
• The instances of a bag lie on a manifold
• Potentially infinite number of instances per bag -- the existing analysis is inappropriate
• Expect sample complexity to scale with the manifold parameters (curvature, dimension, volume, etc.)

QUERYING INSTANCES
• In practice, the learner can only access a small number of instances per bag
• The same bound holds, with an increased failure probability
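A minimal sketch of the top-down process and instance querying above, with a hypothetical one-dimensional manifold bag (a smooth curve in feature space parametrized over [0, 1]; the bag family and parameters are illustrative assumptions, not the poster's):

```python
import math
import random

def draw_manifold_bag():
    """Hypothetical bag distribution: each bag is the smooth curve
    t -> (t, sin(a*t + b)), i.e., a 1-D manifold of instances."""
    a = random.uniform(1.0, 3.0)
    b = random.uniform(0.0, math.pi)
    return lambda t: (t, math.sin(a * t + b))

def query_instances(bag, rho):
    """The learner cannot access the whole (infinite) bag; it only
    queries a finite number rho of instances from the manifold."""
    return [bag(random.random()) for _ in range(rho)]

random.seed(0)
bag = draw_manifold_bag()            # the entire bag is drawn first (top-down)
instances = query_instances(bag, rho=5)
```

Note the contrast with the bottom-up model: here the bag exists before any instances are observed, and the number of queried instances is a budget chosen by the learner, not a property of the bag.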

