Practical Structured Learning for Natural Language Processing
Hal Daumé III ([email protected]), School of Computing, University of Utah
CS Colloquium, Brigham Young University, 14 Sep 2006


Page 1: Practical Structured Learning for Natural Language Processing (users.umiacs.umd.edu/~hal/tmp/byu-talk.pdf)

Slide 1

Practical Structured Learning for Natural Language Processing

Hal Daumé III
School of Computing, University of Utah
[email protected]

CS Colloquium, Brigham Young University, 14 Sep 2006

Slide 2

What Is Natural Language Processing?

Processing of language to solve a problem.

Machine Translation:    Arabic sentence → English sentence
Summarization:          Long documents → Short documents
Information Extraction: Document → Database
Question Answering:     Corpus + query → Answer
Parsing:                Sentence → Syntactic tree
Understanding:          Sentence → Logical sentence

All have innate structure to varying degrees.

Corpus-based NLP: collect example input/output pairs and learn.

Slide 3

What is Machine Learning?

Automatically uncovering structure in data.

Classification:           Vector → Bit (or bits)
Regression:               Vector → Real number
Dimensionality reduction: Vector → Shorter vector
Clustering:               Vectors → Partition
Sequence labeling:        Vectors → Labels
Reinforcement learning:   State space + observations → Policy

Slide 4

NLP versus Machine Learning

NLP tasks (Machine Translation, Summarization, Information Extraction, Question Answering, Parsing, Understanding) map rich inputs to rich outputs: Arabic sentences → English sentences, long documents → short documents, documents → databases, corpus + query → answer, sentence → syntactic tree, sentence → logical sentence.

Machine learning tasks (Classification, Regression, Dimensionality reduction, Clustering, Sequence labeling, Reinforcement-learning-inspired methods) map simple inputs to simple outputs: vectors → bits, real numbers, shorter vectors, partitions, labels; state spaces → policies.

Bridging the two requires an encoding (often ad hoc), search, provably good methods, and a natural decomposition.

Slide 5

Entity Detection and Tracking

Shakespeare scholarship, lively at the best of times, saw the fur flying yesterday after a German academic claimed to have authenticated not just one but four contemporary images of the playwright, and suggested, to boot, that he had died of cancer.

As the National Portrait Gallery planned to reveal that only one of half a dozen claimed portraits of William Shakespeare can now be considered genuine, Prof Hildegard Hammerschmidt-Hummel said she could prove that there were at least four surviving portraits of the playwright.

Why? Summarization, Machine Translation, etc.

§1.3 (Example Problem: EDT)

Slide 6

Why is EDT Hard?

Long-range dependencies

Syntax, Semantics and Discourse

World Knowledge

Highly constrained outputs

Shakespeare scholarship, lively at the best of times, saw the fur flying yesterday after a German academic claimed to have authenticated not just one but four contemporary images of the playwright, and suggested, to boot, that he had died of cancer.

§1.3 (Example Problem: EDT)

Slide 7

How is EDT Typically Attacked?

Shakespeare scholarship, lively at the best of times, saw the fur flying yesterday after a German academic claimed to have authenticated not just one but four contemporary images of the playwright, and suggested, to boot, that he had died of cancer.

Sequence Labeling:

  Shakespeare  scholarship  ,  ...  a  German  academic  claimed  ...
  PER          *            *       *  GPE     PER       *

Binary Classification + Clustering:

  {Shakespeare, playwright}   {German academic, he}

§5.2 (EDT: Prior Work)
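The sequence-labeling encoding above can be made concrete. Below is a minimal sketch of recovering entity mentions from per-token tags, assuming (as on the slide) that "*" marks non-entity tokens and that a maximal run of one entity tag is a single mention; real systems use richer BIO-style encodings, so this is illustrative only.

```python
# Sketch: recover entity mentions from per-token tags as on the slide.
# Assumes '*' marks non-entity tokens and that a maximal run of the
# same entity tag is one mention (real systems use BIO-style tags).

def extract_mentions(tokens, tags):
    mentions, i = [], 0
    while i < len(tokens):
        if tags[i] == "*":
            i += 1
            continue
        j = i
        while j < len(tokens) and tags[j] == tags[i]:
            j += 1                      # extend the run of this tag
        mentions.append((tags[i], " ".join(tokens[i:j])))
        i = j
    return mentions

tokens = ["Shakespeare", "scholarship", ",", "a", "German", "academic", "claimed"]
tags   = ["PER",         "*",           "*", "*", "GPE",    "PER",      "*"]
# -> [("PER", "Shakespeare"), ("GPE", "German"), ("PER", "academic")]
```

The clustering step of the classification+clustering approach would then group these mentions into coreference chains.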

Slide 8

Learning in NLP Applications

Model

Features

Learning

Search

Slide 9

Result: Searn (= “Search + Learn”)

Computationally efficient

Strong theoretical guarantees

State-of-the-art performance

Broadly applicable

Easy to implement

§3 (Search-based Structured Prediction)

Slide 10

Talk Outline

➢ Searn: Search-based Structured Prediction
➢ Experimental Results:
    ➢ Sequence Labeling
    ➢ Entity Detection and Tracking
    ➢ Automatic Document Summarization
➢ Discussion and Future Work

Slide 11

Structured Prediction

Formulated as a maximization problem:

  ŷ = argmax_{y ∈ Y(x)} f(x, y; θ)

The argmax is the search, Y(x) is the model's output space, f provides the features, and θ is learned.

Search (and learning) are tractable only for very simple Y(x) and f.

§2.2 (Machine Learning: Structured Prediction)
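The maximization above can be made concrete on a tiny tag-sequence space by brute-force enumeration; this is a sketch, with made-up weights θ and a tag set chosen for illustration, and it shows exactly why tractability hinges on the size of Y(x): the enumeration is exponential in sequence length.

```python
from itertools import product

# Brute-force the structured argmax over all tag sequences for a toy
# example. Illustrative only: the weights theta are made up, and real
# output spaces are exponentially large, which is why tractable search
# needs a simple Y(x) and a decomposable f.

TAGS = ["N", "V", "R"]

def f(x, y, theta):
    # score decomposes over (word, tag) emissions and tag bigrams
    score, prev = 0.0, "<s>"
    for word, tag in zip(x, y):
        score += theta.get((word, tag), 0.0) + theta.get((prev, tag), 0.0)
        prev = tag
    return score

theta = {("I", "N"): 2.0, ("sing", "V"): 2.0, ("merrily", "R"): 2.0,
         ("N", "V"): 1.0, ("V", "R"): 1.0}

x = ["I", "sing", "merrily"]
best = max(product(TAGS, repeat=len(x)), key=lambda y: f(x, y, theta))
# best == ("N", "V", "R")
```

With 3 tags and 3 words the space already has 27 candidates; at sentence length 20 it would have over 3 billion.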

Slide 12

Reduction for Structured Prediction

➢ Idea: view structured prediction in light of search

[Trellis: choosing a tag (N, P, D, V, R) for each word of "I sing merrily"]

Each step here looks like it could be represented as a weighted multiclass problem.

Can we formalize this idea?

§3.3 (Search-based Structured Prediction)
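The "each step is a multiclass problem" idea can be sketched directly: one example per search step, with features drawn from the input and the decisions made so far. The feature set here is a hypothetical stand-in, not the one used in the talk.

```python
# Each tagging step becomes one multiclass example: features describe
# the input and the decisions made so far; the class is the next tag.
# The feature choice below is illustrative, not the talk's actual set.

def step_examples(words, gold_tags):
    examples, prev = [], "<s>"
    for i, word in enumerate(words):
        feats = {"word": word, "prev_tag": prev, "pos": i}
        examples.append((feats, gold_tags[i]))
        prev = gold_tags[i]     # condition on the gold history
    return examples

ex = step_examples(["I", "sing", "merrily"], ["N", "V", "R"])
# three examples; the second pairs features
# {"word": "sing", "prev_tag": "N", "pos": 1} with class "V"
```

Any off-the-shelf multiclass learner can then be trained on these examples, which is precisely the reduction the following slides formalize.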

Slide 13

Error-Limiting Reductions

➢ A formal mapping between learning problems

  Learning Problem "A" (what we want to solve)  ←F / F⁻¹→  Learning Problem "B" (what we know how to solve)

➢ F maps examples from A to examples from B
➢ F⁻¹ maps a classifier h for B to a classifier that solves A
➢ Error limiting: good performance on B implies good performance on A

[Beygelzimer et al., ICML 2005]

§2.3 (Machine Learning: Reductions)

Slide 14

Our Reduction Goal

Structured Prediction
  → "Searn" → Cost-sensitive Classification
  → "Weighted All Pairs" [Beygelzimer et al., ICML 2005] → Importance-Weighted Classification
  → "Costing" [Zadrozny, Langford & Abe, ICDM 2003] → Binary Classification

➢ Error bounds get composed
➢ New techniques for binary classification lead to new techniques for structured prediction
➢ Any generalization theory for binary classification immediately applies to structured prediction

§2.3 (Machine Learning: Reductions)

Slide 15

Two Easy Reductions

Costing [Zadrozny, Langford & Abe, ICDM 2003]: importance-weighted examples in X × {±1} × ℝ≥0 reduce to binary examples in X × {±1}, with guarantee L_IW ≤ L_BC · E[w].

Weighted-All-Pairs [Beygelzimer, Dani, Hayes, Langford and Zadrozny, ICML 2005]: cost-sensitive examples in X × ℝ≥0 × ⋯ × ℝ≥0 reduce to importance-weighted examples in X × {±1} × ℝ≥0 via pairwise classifiers (1v2, 1v3, ..., 1vK, 2v3, ..., 2vK, ...), with guarantee L_CS ≤ 2 L_IW.

§2.3 (Machine Learning: Reductions)
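The Costing reduction is simple enough to sketch: rejection-sample each importance-weighted example with probability proportional to its weight, then hand the kept examples to any unweighted binary learner. This is a minimal illustration of the sampling step only, with toy data.

```python
import random

# Sketch of the Costing reduction [Zadrozny, Langford & Abe, ICDM 2003]:
# keep each importance-weighted example (x, y, w) with probability
# w / w_max. An ordinary (unweighted) binary learner trained on the
# kept sample then minimizes importance-weighted loss in expectation,
# giving the L_IW <= L_BC * E[w] guarantee quoted on the slide.

def costing_sample(weighted_examples, rng):
    w_max = max(w for _, _, w in weighted_examples)
    kept = []
    for x, y, w in weighted_examples:
        if rng.random() < w / w_max:
            kept.append((x, y))     # weight is gone: plain binary data
    return kept

rng = random.Random(0)
data = [("a", +1, 4.0), ("b", -1, 1.0), ("c", +1, 2.0)]
sample = costing_sample(data, rng)
# ("a", +1) is always kept (its weight equals w_max);
# the others survive a fraction w / w_max of the time
```

In practice the sampling is repeated to build several classifiers whose predictions are averaged, but the single pass above is the core of the reduction.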

Slide 16

A First Attempt

[Trellis: choosing a tag for each word of "I sing merrily"]

At each correct state: train a classifier to choose the next state.

Thm (Kääriäinen): There is a 1st-order binary Markov problem such that a per-step classification error ε implies a Hamming loss of:

  T/2 − (1 − (1 − 2ε)^(T+1)) / (4ε) + 1/2  ≈  ε T² / 2

§3.3 (Search-based Structured Prediction)
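The bound can be sanity-checked numerically: for small ε the exact expression behaves like εT²/2, i.e. errors compound quadratically in the sequence length when we only train at gold-standard states.

```python
# Numeric sanity check of the Kaariainen-style lower bound: a per-step
# classification error eps compounds to roughly eps * T^2 / 2 Hamming
# loss over a length-T chain.

def hamming_loss(eps, T):
    return T / 2 - (1 - (1 - 2 * eps) ** (T + 1)) / (4 * eps) + 0.5

eps, T = 0.001, 100
exact = hamming_loss(eps, T)
approx = eps * T * T / 2
# exact lands within ~10% of eps * T^2 / 2, and roughly
# quadruples when T doubles, confirming the quadratic growth
```

This quadratic blow-up is exactly the failure mode Searn is designed to avoid.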

Slide 17

Reducing Structured Prediction

[Trellis: choosing a tag for each word of "I sing merrily"]

Key Assumption: an Optimal Policy for the training data. Given the input, the true output, and the current state, it returns the best successor state. This is a weak assumption!

§3.4.2 (Searn: Optimal Policy)

Slide 18

Idea

Optimal Policy → Learned Policy (Features)

§3.4 (Searn: Training)

Slide 19

Iterative Algorithm: Searn

Set current policy = optimal policy.

Repeat:
  ➢ Decode using current policy:
        ... after  a  German  academic  claimed ...
              *      *  GPE     PER       *
  ➢ Generate classification problems: at each step, try each action (PER, ORG, *, ...) and score its completion (e.g. losses L = 0.3, L = 0, L = 0.2, L = 0.25 for competing labelings)
  ➢ Learn a new multiclass classifier
  ➢ Interpolate: cur = β · cur + (1 − β) · new

§3.4.3 (Searn: Algorithm)
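The loop above can be sketched on a toy labeling task. Everything task-specific here is a stand-in: the vowel/consonant problem, the feature function, and the table-based cost-sensitive learner are all illustrative, and the per-step costs simplify Searn's actual rollout-based cost estimates.

```python
import random

# Toy sketch of the Searn loop: label each character of a word as
# vowel ("V") or consonant ("C"). Minimal and illustrative: real Searn
# rolls each candidate action out to a complete output to get its
# cost, whereas this sketch uses the per-step loss directly.

random.seed(0)
LABELS = ["V", "C"]

def true_label(ch):
    return "V" if ch in "aeiou" else "C"

def features(word, i, prev):
    # state features: current character and the previous action taken
    return (word[i], prev)

class CostSensitive:
    """Trivial cost-sensitive multiclass learner: per feature tuple,
    pick the action with the lowest accumulated cost."""
    def __init__(self):
        self.costs = {}

    def add(self, feats, costvec):
        acc = self.costs.setdefault(feats, [0.0] * len(LABELS))
        for k, c in enumerate(costvec):
            acc[k] += c

    def predict(self, feats):
        acc = self.costs.get(feats)
        if acc is None:
            return random.choice(LABELS)    # unseen state: guess
        return LABELS[min(range(len(LABELS)), key=acc.__getitem__)]

def searn(train_words, iterations=3, beta=0.5):
    h, mix = None, 0.0       # learned policy and its mixing weight
    for _ in range(iterations):
        new = CostSensitive()
        for word in train_words:
            prev = "<s>"
            for i, ch in enumerate(word):
                feats = features(word, i, prev)
                # cost of each action (simplified per-step loss)
                new.add(feats, [0.0 if lab == true_label(ch) else 1.0
                                for lab in LABELS])
                # follow the current mixed policy to the next state
                if h is not None and random.random() < mix:
                    prev = h.predict(feats)
                else:
                    prev = true_label(ch)   # optimal policy's choice
        h = new
        mix = beta + (1 - beta) * mix       # interpolate toward learned
    return h

def decode(h, word):
    out, prev = [], "<s>"
    for i in range(len(word)):
        prev = h.predict(features(word, i, prev))
        out.append(prev)
    return "".join(out)

h = searn(["banana", "apple"])
# decode(h, "banana") -> "CVCVCV"
```

The key Searn move is visible in the training loop: later iterations generate examples at states reached by the *mixed* policy, not only at gold-standard states, which is what avoids the quadratic error compounding of the first attempt.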

Slide 20

Theoretical Analysis

Theorem: For conservative β, after 2T³ ln T iterations, the loss of the learned policy is bounded as follows:

  L(h) ≤ L(h₀) + 2 T ln T · ℓ_avg + (1 + ln T) · c_max / T

where L(h₀) is the loss of the optimal policy, ℓ_avg is the average multiclass classification loss, and c_max is the worst-case per-step loss.

§3.5 (Searn: Theoretical Analysis)
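Plugging illustrative numbers into the bound shows how each term contributes; the values below are made up purely to exercise the formula.

```python
from math import log

# Evaluate the Searn guarantee
#   L(h) <= L(h0) + 2*T*ln(T)*l_avg + (1 + ln(T))*c_max / T
# with made-up numbers, to see which term dominates.

def searn_bound(loss_opt, T, l_avg, c_max):
    return loss_opt + 2 * T * log(T) * l_avg + (1 + log(T)) * c_max / T

bound = searn_bound(loss_opt=0.05, T=20, l_avg=0.01, c_max=1.0)
# for moderate T the classifier term 2*T*ln(T)*l_avg dominates,
# so the guarantee degrades linearly (up to ln T) in sequence length
```

Compare with the first attempt's εT²/2: the dependence on T drops from quadratic to T ln T, which is the point of the theorem.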

Slide 21

Why Does Searn Work?

A classifier tells us how to search

Impossible to train on full space

Want to train only where we'll wind up

Searn tells us where we'll wind up

§3.7 (Searn: Discussion and Conclusions)

Slide 22

Talk Outline

➢ Searn: Search-based Structured Prediction
➢ Experimental Results:
    ➢ Sequence Labeling
    ➢ Entity Detection and Tracking
    ➢ Automatic Document Summarization
➢ Discussion and Future Work

Slide 23

Proof of Concept: Sequence Labeling

[Bar charts of performance: Spanish Named Entity Recognition (SVM, CRF, Searn, Searn+data; scale 94-98), Handwriting Recognition (LR, SVM, M3N, Searn, Searn+data; scale 65-95), Chunking+Tagging (Incremental Perceptron, LaSO, CRF, Searn; scale 93-97)]

§4.4 (Sequence Labeling: Comparison)

Slide 24

Talk Outline

➢ Searn: Search-based Structured Prediction
➢ Experimental Results:
    ➢ Sequence Labeling
    ➢ Entity Detection and Tracking
    ➢ Automatic Document Summarization
➢ Discussion and Future Work

Slide 25

Entity Detection and Tracking

Shakespeare scholarship, lively at the best of times, saw the fur flying yesterday after a German academic claimed to have authenticated not just one but four contemporary images of the playwright, and suggested, to boot, that he had died of cancer.

... he had died of cancer

[the phrase repeats across animation steps as the search considers candidate antecedents for "he"]

§5.6.1 (EDT: Joint Detection and Coreference: Search)
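The coreference step depicted above can be sketched as an action enumeration: when the search reaches a mention like "he", the available actions are to link it to any entity built so far or to start a new one. The list-of-clusters representation below is a hypothetical simplification for illustration.

```python
# Sketch: at each mention, the joint detection-and-coreference search
# chooses among "link to an existing entity" or "start a new entity".
# Representing entities as lists of mention strings is a simplification.

def coref_actions(entities):
    # entities: mention clusters built so far, in document order
    return [("link", i) for i in range(len(entities))] + [("new", None)]

entities = [["Shakespeare", "the playwright"], ["a German academic"]]
actions = coref_actions(entities)
# -> [("link", 0), ("link", 1), ("new", None)]: three choices for "he"
```

Each action is one class in the multiclass problem Searn generates at that step, with the cost of an action reflecting the coreference loss of the completed output.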

Slide 26

EDT Features

➢ Lexical
➢ Syntactic
➢ Semantic
➢ Gazetteer
➢ Knowledge-based

Shakespeare scholarship, lively at the best of times, saw the fur flying yesterday after a German academic claimed to have authenticated not just one but four contemporary images of the playwright, and suggested, to boot, that he had died of cancer.

§5.[4,5].3 (EDT: Feature Functions)

Slide 27

EDT Results (ACE 2004 data)

[Bar chart, ACE score 72-80: Sys1, Sys2, Sys3, Sys4, Searn]

Previous state of the art used extra proprietary data.

§5.6.3 (EDT: Experimental Results)

Slide 28

Feature Contributions

[Bar chart, score 68-82: removing each feature group in turn (-Lex, -Syn, -Sem, -Gaz, -KB)]

§5.6.3 (EDT: Experimental Results)

Slide 29

Talk Outline

➢ Searn: Search-based Structured Prediction
➢ Experimental Results:
    ➢ Sequence Labeling
    ➢ Entity Detection and Tracking
    ➢ Automatic Document Summarization
➢ Discussion and Future Work

Slide 30

New Task: Summarization

"Give me information about the Falkland Islands war."

Argentina was still obsessed with the Falkland Islands even in 1994, 12 years after its defeat in the 74-day war with Britain. The country's overriding foreign policy aim continued to be winning sovereignty over the islands.

"That's too much information to read!"

"The Falkland Islands war, in 1982, was fought between Britain and Argentina."

"That's perfect!"

The standard approach is sentence extraction, but that is often deemed too "coarse" to produce good, very short summaries. We wish to also drop words and phrases => document compression.

Slide 31

Structure of Search

[Diagram: sentences S1, S2, S3, S4, S5, ..., Sn laid out with their dependency trees; the legend distinguishes frontier nodes from summary nodes]

➢ Lay sentences out sequentially
➢ Generate a dependency parse of each sentence
➢ Mark each root as a frontier node
➢ Repeat:
    ➢ Choose a frontier node to add to the summary
    ➢ Add all its children to the frontier
➢ Finish when we have enough words

To make search more tractable, we run an initial round of sentence extraction at 5x the target length.

[Example dependency parse of "The man ate a big sandwich": ate → {man → The, sandwich → {a, big}}]

Example document: Argentina was still obsessed with the Falkland Islands even in 1994, 12 years after its defeat in the 74-day war with Britain. The country's overriding foreign policy aim continued to be winning sovereignty over the islands.

The Optimal Policy is not analytically available; it is approximated with beam search.

§6.1 (Summarization: Vine-Growth Model)
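The frontier-growing loop above can be sketched in a few lines; the dependency tree and the stand-in selection policy below are toy illustrations, not the talk's learned policy.

```python
# Minimal sketch of the vine-growth search space: dependency trees
# (a toy hand-built tree here), roots start on the frontier, and
# adding a node to the summary exposes its children.

# toy dependency tree for "The man ate a big sandwich": node -> children
tree = {"ate": ["man", "sandwich"], "man": ["The"],
        "sandwich": ["a", "big"], "The": [], "a": [], "big": []}
roots = ["ate"]

def grow(tree, roots, budget, choose):
    summary, frontier = [], list(roots)
    while frontier and len(summary) < budget:
        node = choose(frontier, summary)    # the policy's decision point
        frontier.remove(node)
        summary.append(node)
        frontier.extend(tree[node])         # children become available
    return summary

# stand-in policy for illustration: prefer longer words
picked = grow(tree, roots, 3, lambda frontier, summary: max(frontier, key=len))
# -> ["ate", "sandwich", "man"]
```

In the actual system, `choose` is the Searn-learned policy, and the word budget enforces the summary length limit.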

Slide 32

Example Output (40 word limit)

Sentence Extraction + Compression:
Argentina and Britain announced an agreement, nearly eight years after they fought a 74-day war a populated archipelago off Argentina's coast. Argentina gets out the red carpet, official royal visitor since the end of the Falklands war in 1982.

Vine Growth:
Argentina and Britain announced to restore full ties, eight years after they fought a 74-day war over the Falkland islands. Britain invited Argentina's minister Cavallo to London in 1992 in the first official visit since the Falklands war in 1982.

Weighted content units:
  6  Diplomatic ties restored
  5  Major cabinet member visits
  5  Exchanges were in 1992
  3  Falkland war was in 1982
  3  Cavallo visited UK
  3  War between Britain and Argentina
  2  War was 74 days long

Totals covered: Vine Growth +24 vs. Extraction + Compression +13.

§6.7 (Summarization: Error Analysis)

Slide 33

Results

[Bar chart of summary quality, scale 2-4: Approx. Optimal Vine Growth, Vine Growth (Searn), Optimal Extract @100, Extract @100, Baseline]

[Daumé III and Marcu, DUC 2005]

§6.6 (Summarization: Experimental Results)

Slide 34

Talk Outline

➢ Searn: Search-based Structured Prediction
➢ Experimental Results:
    ➢ Sequence Labeling
    ➢ Entity Detection and Tracking
    ➢ Automatic Document Summarization
➢ Discussion and Future Work

Slide 35

What Can Searn Do?

Solve structured prediction problems...

...efficiently.

...with theoretical guarantees.

...with weak assumptions on structure.

...and outperform the competition.

...with little extra code required.

§7 (Conclusions and Future Directions)

Slide 36

What Can Searn Not Do? (Yet)

Solve structured prediction problems...

...with only weak feedback.

...with hidden variables.

...with enormous branching factors.

...without pre-defined search spaces.

...with “combinatorial” outputs.

...with only an optimal path.

§7 (Conclusions and Future Directions)

Slide 37

Thanks! Questions?

Slide 38

BACKUP SLIDES

Slide 39

Comparison of Structured Prediction Methods

Conditional Random Fields [Lafferty, McCallum, Pereira, ICML 2001]: requires normalized search
Max-margin Markov Networks [Taskar, Guestrin, Koller, NIPS 2003]: requires loss-augmented search
Incremental Perceptron [Collins & Roark, EMNLP 2004]: requires an optimal path
LaSO [Daumé III, Marcu, ICML 2005]: requires optimally completable paths
Searn [Daumé III, Langford, Marcu, submitted to NIPS 2006]: requires an optimal (or approximately optimal) policy

These assumptions form a nested hierarchy (⊇/⊃), with weak feedback as the frontier beyond them; the weaker the assumption, the more of "Most NLP Problems" it covers.