
Page 1:

Guiding Semi-Supervision with Constraint-Driven Learning

Ming-Wei Chang, Lev Ratinov, Dan Roth

Department of Computer Science, University of Illinois at Urbana-Champaign

Paper presentation by: Drew Stone

Page 2:

Outline

1. Background
   - Semi-supervised learning
   - Constraint-driven learning

2. Constraint-driven learning with semi-supervision
   - Introduction
   - Model
   - Scoring
   - Constraint pipeline
   - Constraint-Driven Learning (CODL) algorithm

3. Experiments

Page 4:

Semi-supervised learning
Introduction

Question: When labelled data is scarce, how can we take advantage of unlabelled data for training?

Intuition: If there is structure in the underlying distribution of samples, points that are close together or clustered may share labels

Page 5:

Semi-supervised learning
Framework

Given a model M trained on labelled samples from a distribution D, and an unlabelled set of examples U drawn from D (U ∼ D^m)

Learn labels for each example in U, Learn(U, M), and re-use the now-labelled examples to tune M −→ M∗

Benefit: Access to more training data

Drawback: The learned model might drift from the correct classifier if the assumptions on the distribution do not hold
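The loop above can be sketched on a toy problem. Everything here is illustrative (a nearest-centroid classifier on 1-D points, not any model from the paper): train M on the labelled seed set, label U with it, then re-fit on the union.

```python
# Minimal self-training sketch (hypothetical toy model): a nearest-centroid
# classifier on 1-D points is trained on labelled data, labels the
# unlabelled pool U, then is re-fit on the union.

def fit_centroids(points, labels):
    """Return the per-class mean of the training points."""
    sums, counts = {}, {}
    for x, y in zip(points, labels):
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}

def predict(centroids, x):
    """Assign x to the class with the nearest centroid."""
    return min(centroids, key=lambda y: abs(x - centroids[y]))

# Labelled seed set and unlabelled pool U.
X_l, y_l = [0.0, 1.0, 9.0, 10.0], ["a", "a", "b", "b"]
U = [0.4, 8.7, 9.9]

model = fit_centroids(X_l, y_l)                     # M
pseudo = [predict(model, x) for x in U]             # Learn(U, M)
model_star = fit_centroids(X_l + U, y_l + pseudo)   # tune M -> M*
```

If the cluster assumption fails (say, class "a" points scattered among class "b" points), the pseudo-labels are wrong and model_star drifts, which is exactly the drawback noted above.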

Page 9:

Constraint-driven learning
Introduction

Motivation: Keep the learned model simple, and use constraints to compensate for that over-simplicity

Benefit: Simple models (fewer features) are more computationally efficient

Intuition: Fix a set of task-specific constraints so that a simple machine learning model suffices; the constraints make learning both easier and more accurate

Page 12:

Constraint-driven learning
Framework

Given an objective function

argmax_y λ · F(x, y)

define a set of linear (or non-linear) constraints {C_i}_{i≤K},

C_i : X × Y −→ {0, 1}

and solve the optimization problem subject to these constraints.

Any Boolean rule can be encoded as a set of linear inequalities.
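The last point can be made concrete. These are standard ILP encodings of Boolean connectives over 0/1 variables (generic tricks, not code from the paper); the checks confirm each linear form matches the Boolean truth table.

```python
# Generic encodings of Boolean rules over 0/1 variables as linear
# inequalities (standard ILP reductions, not specific to the paper):
#   y1 OR y2   <=>  y1 + y2 >= 1
#   y1 -> y2   <=>  y1 - y2 <= 0

def or_holds(y1, y2):
    return y1 + y2 >= 1          # linear form of (y1 OR y2)

def implies_holds(y1, y2):
    return y1 - y2 <= 0          # linear form of (y1 -> y2)

# Verify the linear forms against the Boolean truth tables.
for y1 in (0, 1):
    for y2 in (0, 1):
        assert or_holds(y1, y2) == bool(y1 or y2)
        assert implies_holds(y1, y2) == bool((not y1) or y2)
```

An ILP solver can then maximize the linear score subject to such inequalities, which is how constrained inference is typically posed.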

Page 14:

Constraint-driven learning
Example task

Sequence tagging with HMM/CRF + global constraints

Page 15:

Constraint-driven learning
Sequence tagging constraints

Unique label for each word

Edges must form a path

A verb must exist
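The three constraints above can be written as simple predicates over a tag sequence. This is a hedged sketch: the tag names and allowed transitions below are illustrative, not the paper's label set.

```python
# The slide's three sequence-tagging constraints as predicates over a tag
# sequence (tag set and transition edges here are illustrative).

def unique_label_per_word(words, tags):
    return len(tags) == len(words)          # exactly one tag per token

def edges_form_path(tags, allowed):
    # every consecutive pair of tags must follow an allowed transition edge
    return all((a, b) in allowed for a, b in zip(tags, tags[1:]))

def a_verb_exists(tags):
    return any(t == "VB" for t in tags)

words = ["dogs", "bark"]
tags = ["NN", "VB"]
allowed = {("NN", "VB"), ("VB", "NN")}
ok = (unique_label_per_word(words, tags)
      and edges_form_path(tags, allowed)
      and a_verb_exists(tags))
```

In the full framework, each predicate would be one C_i(x, y); here they only depend on y.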

Page 16:

Constraint-driven learning
Example task

Semantic role labelling with independent classifiers + global constraints

Page 17:

Constraint-driven learning
Example task

Each verb predicate carries with it a new inference problem

Page 18:

Constraint-driven learning
Example task

Page 19:

Constraint-driven learning
SRL constraints

No duplicate argument classes

Unique labels

Order constraints

Page 21:

Constraint-driven learning w/ semi-supervision
Introduction

Intuition: Use constraints to obtain better training samples from our unlabelled set

Benefit: When the model deviates from the simple one, it does so toward a more expressive answer, since the constraints guide it

Page 23:

Constraint-driven learning w/ semi-supervision
Examples

Consider the constraint over {0, 1}^n that the output match 1∗0∗. For a sequence of the form 1∗010∗, there are two good fixes, 1∗110∗ and 1∗000∗. Which is the best to learn from?

Consider the constraint "state transitions must occur on punctuation marks". There could be many good sequences abiding by this constraint.

Page 25:

Constraint-driven learning w/ semi-supervision
Model

Consider a data domain X, an output domain Y, and a distribution D defined over X × Y

Input/output pairs (x, y) ∈ X × Y are given as sequence pairs:

x = (x_1, …, x_N), y = (y_1, …, y_M)

We wish to find a structured output classifier h : X −→ Y that uses a structured scoring function f : X × Y −→ R to choose the most likely output sequence, where the correct output sequence receives the highest score

We are given (or define) a set of constraints {C_i}_{i≤K}

C_i : X × Y −→ {0, 1}

Page 27:

Constraint-driven learning w/ semi-supervision
Scoring

Scoring rule:

f(x, y) = Σ_{i=1}^{M} λ_i f_i(x, y) = λ · F(x, y) ≡ w^T · φ(x, y)

Compatible with any linear discriminative or generative model, such as HMMs and CRFs

Local feature functions {f_i}_{i≤M} allow for tractable inference by capturing contextual structure

The space of structured pairs can be huge; locally, there are smaller distances to account for

Page 29:

Constraint-driven learning w/ semi-supervision
Sample constraints

Page 30:

Constraint-driven learning w/ semi-supervision
Constraint pipeline (hard)

Define 1_C(x) ⊆ Y as the set of output sequences y for which every constraint assigns the value 1 to (x, y)

Our objective function then becomes

argmax_{y ∈ 1_C(x)} λ · F(x, y)

Intuition: Find the best output sequence y that maximizes the score while satisfying the hard constraints
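On a tiny output space this objective can be evaluated by brute force. The feature function, weights, and constraint below are toy stand-ins (not the paper's), using the 1∗0∗ pattern from the earlier example as the hard constraint.

```python
# Brute-force sketch of argmax over y in 1_C(x) of lambda . F(x, y).
# F, lam, and the constraint are toy stand-ins, not the paper's model.
from itertools import product

lam = [2.0, -1.0]

def F(x, y):
    # toy feature vector: (#positions where y agrees with x,
    #                      #label changes within y)
    return [sum(a == b for a, b in zip(x, y)),
            sum(a != b for a, b in zip(y, y[1:]))]

def score(x, y):
    return sum(l * f for l, f in zip(lam, F(x, y)))

def satisfies(y):
    # hard constraint: y matches 1*0* (no 0 followed by a 1)
    return "01" not in "".join(map(str, y))

x = (1, 0, 1)
feasible = [y for y in product((0, 1), repeat=3) if satisfies(y)]
y_best = max(feasible, key=lambda y: score(x, y))
```

Real systems replace the enumeration with ILP or constrained Viterbi-style search, but the objective being optimized is the same.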

Page 32:

Constraint-driven learning w/ semi-supervision
Constraint pipeline (soft)

Assume a suitable distance metric between an output sequence and the space of outputs respecting a single constraint.

Given a set of soft constraints {C_i}_{i≤K} with penalties {ρ_i}_{i≤K}, we get a new objective function:

argmax_y λ · F(x, y) − Σ_{i=1}^{K} ρ_i · d(y, 1_{C_i}(x))

Goal: Find the best output sequence under our model that violates the fewest constraints

Intuition: Given a learned model, λ · F(x, y), we can bias its decisions by the amount by which the output sequence violates each soft constraint

Option: Use the minimal Hamming distance to a constraint-satisfying sequence

d(y, 1_{C_i}(x)) = min_{y′ ∈ 1_{C_i}(x)} H(y, y′)
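The soft objective can be sketched end-to-end on the 1∗0∗ example. The base score, penalty ρ, and tiny output space below are toy values; d is the minimal Hamming distance to the constraint-satisfying set, as in the formula above.

```python
# Sketch of the soft objective
#   score(y) = lambda.F(x,y) - sum_i rho_i * d(y, 1_{C_i}(x))
# with one soft constraint (match 1*0*) and toy numbers throughout.
from itertools import product

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

def d(y, allowed):
    # minimal Hamming distance from y to the constraint-satisfying set
    return min(hamming(y, yp) for yp in allowed)

space = list(product((0, 1), repeat=3))
allowed = [y for y in space if "01" not in "".join(map(str, y))]  # 1*0*

def base_score(y):
    return sum(y)              # toy stand-in for lambda . F(x, y)

rho = 2.0

def soft_score(y):
    return base_score(y) - rho * d(y, allowed)

y_best = max(space, key=soft_score)
```

Unlike the hard pipeline, every y remains feasible here; violating sequences are merely penalized in proportion to how far they are from satisfying the constraint.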

Page 35:

Constraint-driven learning w/ semi-supervision
Distance example

Taken from CCM-NAACL-12-Tutorial

d(y, 1_{C_i}(x)) = Σ_{j=1}^{M} φ_{C_i}(y_j) −→ count violations
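This violation-counting form avoids the min-Hamming search: sum a per-position indicator φ_{C_i} over the sequence. The constraint used below (no tag may repeat its predecessor) is purely illustrative.

```python
# Violation-counting distance: sum a per-position indicator phi_Ci over
# the sequence instead of searching for the nearest satisfying sequence.
# The constraint here (no immediate tag repeat) is illustrative.

def phi_no_repeat(j, y):
    """1 if position j violates 'no immediate repeat', else 0."""
    return 1 if j > 0 and y[j] == y[j - 1] else 0

def d_approx(y):
    return sum(phi_no_repeat(j, y) for j in range(len(y)))
```

For constraints that decompose position-by-position like this, the count coincides with the minimal Hamming distance; in general it is an upper bound that is much cheaper to compute.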

Page 36:

Constraint-driven learning
Recall: sequence tagging

Page 37:

Constraint-driven learning
Recall: sequence tagging

Page 38:

Constraint-driven learning
Recall: sequence tagging

(a) correct citation parsing

(b) HMM-predicted citation parsing

Adding the punctuation state-transition constraint recovers the correct parsing under the same HMM

Page 39:

Constraint-driven learning
Recall: SRL

Page 40:

Constraint-driven learning
Recall: SRL

Page 41:

Constraint-driven learning
Recall: SRL

Page 43:

Constraint-driven learning w/ semi-supervision
CODL algorithm

Constraints are used in inference, not in learning

Maximum likelihood estimation of λ (EM algorithm)

Semi-supervision occurs on lines 7 and 9 (i.e., for each unlabelled sample, we generate K training examples)

All unlabelled samples are re-labelled each training cycle, unlike in the "self-training" perspective
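The cycle described above can be sketched at a high level. The supervised learner, the constrained top-K inference step, and the toy "class prior" model below are all placeholders, not the paper's HMM/CRF estimator; the mixing weight gamma keeps the estimate near the supervised model, a detail of the CODL update.

```python
# High-level sketch of the CODL training cycle: re-label ALL of U each
# cycle with constrained top-K inference, re-estimate on labelled +
# pseudo-labelled data, and mix with the previous model. The learner and
# inference routine are toy placeholders.

def codl(train_supervised, constrained_top_k, X_l, Y_l, U, K, cycles, gamma=0.9):
    model = train_supervised(X_l, Y_l)
    for _ in range(cycles):
        X_p, Y_p = [], []
        for x in U:                              # every unlabelled sample
            for y in constrained_top_k(model, x, K):
                X_p.append(x)                    # K training examples each
                Y_p.append(y)
        new = train_supervised(X_l + X_p, Y_l + Y_p)
        # convex mix keeps the estimate close to the supervised model
        model = {c: gamma * model[c] + (1 - gamma) * new.get(c, 0.0)
                 for c in model}
    return model

# Toy instantiation: the "model" is just a dict of class priors.
def train_supervised(X, Y):
    return {c: Y.count(c) / len(Y) for c in sorted(set(Y))}

def constrained_top_k(model, x, K):
    best = max(model, key=model.get)   # stand-in for constrained inference
    return [best] * K

model = codl(train_supervised, constrained_top_k,
             X_l=[1, 2, 3], Y_l=["a", "a", "b"], U=[4], K=1, cycles=2)
```

Note how this differs from plain self-training: the pseudo-labels are regenerated from scratch for the whole pool each cycle, rather than being committed to once.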

Page 44:

Constraint-driven learning w/ semi-supervision
Expectation Maximization: top-K hard EM

EM procedures typically assign a distribution over all input/output pairs for a given unlabelled x ∈ X

Instead, we choose the top K outputs y ∈ Y maximizing our soft objective function and assign uniform probability to each

Constraints reshape the distribution at each step
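The top-K hard E-step amounts to a truncated, uniform posterior. A minimal sketch (the candidate outputs and scores are toy values):

```python
# Top-K hard E-step: rather than a full posterior over Y, keep the K
# best-scoring outputs and give each probability 1/K.

def top_k_hard_e_step(candidates, score, K):
    best = sorted(candidates, key=score, reverse=True)[:K]
    return {y: 1.0 / K for y in best}

scores = {"y1": 3.0, "y2": 1.0, "y3": 2.5, "y4": 0.2}
dist = top_k_hard_e_step(list(scores), scores.get, K=2)
```

With K = 1 this reduces to ordinary hard EM; larger K keeps several plausible constraint-respecting corrections alive, which is the motivation discussed on the next slide.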

Page 45:

Constraint-driven learning w/ semi-supervision
Motivation for K > 1

There may be multiple ways to correct constraint-violating samples

We gain access to more training data

Page 46:

Experiments
Citations

Page 47:

Experiments
HMM

Page 48:

Pictures taken from:

CCM-NAACL-12-Tutorial

Chang, M.-W., Ratinov, L., & Roth, D. (2007). Guiding semi-supervision with constraint-driven learning. In Proceedings of ACL (pp. 280–287).
