Joint Ambiguity Modeling in NLP

Woodley Packard

Universitet i Oslo

March 21, 2011


Introduction

Ambiguity is a central phenomenon in natural language, affecting accuracy and efficiency in most if not all types of NLP.


Types of Ambiguity

◮ Word boundaries

  ◮ Difficult in some languages – e.g. Chinese

  ◮ Norwegian examples lifted from recent mailing list activity:

    ◮ Lege-ring, sei-del, sel-skap, bru-sau-tomat, sports-av-iser...

◮ Sentence Boundaries

  ◮ Again, not marked in some languages

  ◮ Even in English, the markers are overloaded:

    ◮ The CEO had the P.R. Department leaders make risky moves.

    ◮ The citizens voted in the U.S. Presidential Election polls.

◮ Word Senses

  ◮ English word bank:

    ◮ (n) Financial institution

    ◮ (n) Side of a river / stream

    ◮ (n) Repository of resources (tree bank, bank of switches)

    ◮ (v) To bet everything (on ...)

    ◮ (v) Tilting to make a turn in an airplane

    ◮ (v) To do business at a bank (financial institution)

    ◮ .... others.

◮ Syntactic Ambiguity

  ◮ I saw the man with the telescope.

◮ Anaphora

  ◮ Jessie and Alex grew up together, but eventually he moved to the West coast and she moved to the East coast.

◮ .... others.


Traditional Approaches to Dealing with Ambiguity

◮ How well can we resolve some particular type of ambiguity?

◮ Rich literature about how to handle each type of ambiguity.

◮ None are completely solved problems.

◮ Some have been solved fairly well (word boundaries, sentence boundaries).

◮ But most have room for improvement.


An Emerging Trend in Research

◮ Results about how diverse types of information can be helpful in ambiguity resolution.

◮ Large body of research about using syntax as a guide for resolving anaphora.

Some others:

◮ “Using Syntactic Dependency as Local Context to Resolve Word Sense Ambiguity” D. Lin, ACL 1997.

◮ “Improving Parsing and PP attachment Performance with Sense Information” E. Agirre, T. Baldwin, D. Martinez, ACL 2008.


Joint Modeling of Ambiguity

Idea: Instead of modeling individual marginal distributions for each type of ambiguity, or conditional models involving two types of ambiguity, what if we model a joint distribution for all the types of ambiguity we are interested in at once?

◮ Gain: modeling flexibility

◮ Cost: greater complexity

◮ Possible framework: graphical models

◮ Inference and parameter estimation: sometimes tractable, sometimes not.


A Graphical Model for Ambiguity

[Figure: graphical model with three nodes, Anaphora, Syntax, and WSD. The link between Anaphora and Syntax is labeled Hobbs 1978; the link between Syntax and WSD is labeled Lin 1997 and Agirre et al. 2008.]


Other Dependencies Seem Likely

◮ Information about anaphora may be helpful in disambiguating syntax.

◮ Word Priming: information about the senses of words in a given sentence may be helpful for determining word senses in nearby subsequent sentences.

◮ Syntactic Priming: information about the constructions used in a given sentence may be helpful for determining the syntax of nearby subsequent sentences.

◮ Maybe others too!


A Revised Graphical Model for Ambiguity

[Figure: revised graphical model with an Anaphora node, per-sentence syntax nodes Syntax1, Syntax2, Syntax3, and per-sentence word sense nodes WSD1, WSD2, WSD3.]


A Roadmap for my Project

◮ Build basic disambiguation systems for a few types of ambiguity

◮ Set baselines for how well we can disambiguate without a joint model

◮ Learn how to combine information from disparate systems to form a joint model

◮ Evaluate the joint model’s performance on each type of ambiguity vs. the baselines


Where am I on that roadmap?

◮ Not very far along, really. I’ve built a syntax disambiguation system and set a strong baseline for syntax disambiguation in isolation.

◮ Started some experiments into using more global information, but so far nothing worth reporting.


Next, shifting gears...

I’ll describe the work I’ve done exploring the space of syntax disambiguation for HPSG grammars.


Syntax Disambiguation

◮ Given an utterance, find the best analysis licensed by your grammar.

◮ With broad-coverage grammars, there can be a lot of candidate analyses!

◮ ERG licenses more than 10,000 distinct analyses for:

  I would still have an appointment slot free on Tuesday, the sixth of April, but only in the afternoon.

The different analyses are usually not all semantically equivalent. How do we know which meaning was intended by the speaker?

Common solution: annotate the intended meaning on a sufficiently large corpus of example utterances, and then apply machine learning techniques to build a model that will allow us to guess the intended meaning on unseen data.


Layout

Modeling Ambiguity

Syntax Disambiguation

Maximum Entropy Models

MaxEnt for the ERG

Evaluation Metrics


Conditional log-linear models

$$p(y \mid x) = \frac{e^{w \cdot f(x, y)}}{\sum_{y'} e^{w \cdot f(x, y')}}$$

where $w$ is a vector of feature weights, typically learned from the training data.

◮ Conditional probability model for classification/ranking problems.

◮ Describe relationship of class $y$ to input $x$ by $n$ real-valued feature functions $f_j(x, y)$.

◮ Equivalently, a vector-valued feature function $f(x, y)$.

Since the denominator is a function only of $x$ and does not depend on $y$, determining $\arg\max_y p(y \mid x)$ amounts to maximizing $w \cdot f(x, y)$.
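A minimal sketch of the formula above, assuming sparse feature vectors stored as Python dicts. The weights, feature names, and candidates are made up for illustration. It computes p(y|x) over a small candidate set and checks that the most probable candidate is also the one with the largest w · f(x, y).

```python
import math

def dot(w, f):
    """Sparse dot product: w and f map feature names to real values."""
    return sum(w.get(name, 0.0) * value for name, value in f.items())

def conditional_probs(w, candidate_features):
    """p(y|x) for each candidate y, given one feature dict per candidate."""
    scores = [dot(w, f) for f in candidate_features]
    shift = max(scores)  # subtracting the max keeps exp() numerically stable
    exps = [math.exp(s - shift) for s in scores]
    z = sum(exps)  # the denominator: depends on x only, not on y
    return [e / z for e in exps]

# Hypothetical weights and three candidate analyses for one input.
w = {"rule:A": 1.5, "rule:B": -0.5, "rule:C": 0.25}
candidates = [
    {"rule:A": 1, "rule:B": 2},  # w . f = 0.5
    {"rule:A": 2},               # w . f = 3.0
    {"rule:C": 1, "rule:B": 1},  # w . f = -0.25
]

probs = conditional_probs(w, candidates)
best_by_prob = max(range(len(candidates)), key=lambda i: probs[i])
best_by_score = max(range(len(candidates)), key=lambda i: dot(w, candidates[i]))
```

Because ranking only needs the raw scores, a disambiguator can skip the normalization entirely at prediction time.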


Maximum Entropy Models

◮ One common way of selecting $w$.

◮ Also known as Multinomial Logistic Regression.

◮ Corresponds to maximum likelihood estimation – maximize the conditional likelihood of the training data.

◮ Alternate description: model with maximum entropy subject to $E_p(f(x, y)) = \tilde{f}(x, y)$, the empirical expectation of $f(x, y)$ on the training data.

◮ Popular applications: POS tagging, NER, parse disambiguation, ...

◮ Other techniques for picking $w$: SVMs, Perceptrons, ...
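The link between maximum likelihood and the expectation constraint can be seen in a toy setting. The sketch below is an illustration only: it uses plain gradient ascent with no regularization on four hand-made training items, not the optimizer or data used in this work. The gradient of the conditional log-likelihood is the empirical feature total minus the model's expected feature total, so the two agree at the optimum.

```python
import math

def probs(w, candidates):
    """p(y|x) under weights w for one input's candidate feature vectors."""
    scores = [sum(wj * fj for wj, fj in zip(w, f)) for f in candidates]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def feature_totals(w, data):
    """Return (empirical, expected) feature totals over the training data."""
    n = len(w)
    empirical, expected = [0.0] * n, [0.0] * n
    for candidates, gold in data:
        p = probs(w, candidates)
        for j in range(n):
            empirical[j] += candidates[gold][j]
            expected[j] += sum(pi * f[j] for pi, f in zip(p, candidates))
    return empirical, expected

def train(data, n_features, steps=2000, rate=0.1):
    """Gradient ascent on the unregularized conditional log-likelihood."""
    w = [0.0] * n_features
    for _ in range(steps):
        empirical, expected = feature_totals(w, data)
        # Gradient of the log-likelihood = empirical minus expected features.
        w = [wj + rate * (e - m) for wj, e, m in zip(w, empirical, expected)]
    return w

# Toy data (hypothetical): the same three candidates, with varying gold choices.
candidates = [[1, 0], [0, 1], [1, 1]]
data = [(candidates, 0), (candidates, 1), (candidates, 0), (candidates, 2)]

w = train(data, n_features=2)
empirical, expected = feature_totals(w, data)
```

After training, `empirical` and `expected` coincide up to numerical tolerance, which is the maximum entropy constraint stated above.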





The ERG and WeScience

◮ English Resource Grammar (ERG): broad-coverage precision computational English grammar based on HPSG

◮ WeScience: a treebank of gold ERG analyses for around 9000 sentences from Wikipedia in the domain of Computational Linguistics.


How well can we disambiguate when parsing WeScience inputs just by looking at ERG analyses in isolation?

◮ Use Maximum Entropy modeling; $x$ = an input sentence and $y$ = a candidate analysis.

◮ $f(x, y)$ = a vector of features describing a candidate analysis, possibly making reference to the input item.

◮ Train a MaxEnt model on WeScience.

◮ To disambiguate an unseen input, select the candidate analysis with the largest $p(y \mid x)$.

◮ Scoring: on a held-out test set, for what proportion of the inputs can our model identify the correct gold analysis? (exact match)

◮ Use 10-fold cross-validation to reduce measurement noise.
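The scoring procedure can be sketched as follows. The function names, the interleaved fold assignment, and the toy `train_fn` are all assumptions made for illustration; the actual fold layout used with WeScience is not specified here.

```python
def k_fold_splits(items, k=10):
    """Yield (train, test) pairs; fold i tests on every k-th item from i."""
    for i in range(k):
        test = items[i::k]
        train = [item for j, item in enumerate(items) if j % k != i]
        yield train, test

def exact_match_accuracy(model, test_items):
    """Fraction of items whose top-scoring candidate is the gold analysis."""
    correct = 0
    for candidates, gold_index in test_items:
        predicted = max(range(len(candidates)),
                        key=lambda i: model(candidates[i]))
        correct += (predicted == gold_index)
    return correct / len(test_items)

def cross_validate(items, train_fn, k=10):
    """Average exact-match accuracy over k train/test splits."""
    scores = [exact_match_accuracy(train_fn(train), test)
              for train, test in k_fold_splits(items, k)]
    return sum(scores) / len(scores)

def train_fn(train):
    """Toy stand-in for MaxEnt training: prefer larger or smaller numbers?"""
    votes = sum(1 if c[g] == max(c) else -1 for c, g in train)
    sign = 1 if votes >= 0 else -1
    return lambda candidate: sign * candidate

# Hypothetical items: (candidate "analyses", index of the gold one).
items = [([1, 5, 3], 1)] * 15 + [([2, 9, 4], 0)] * 5
accuracy = cross_validate(items, train_fn, k=10)
```

A real experiment would replace `train_fn` with MaxEnt training and the integer candidates with feature vectors of ERG analyses.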


An ERG analysis consists of a derivation tree and an MRS meaning representation.

Simplified candidate analysis for "The very large cat meowed.":

SB-HD_MC_C
  SP-HD_N_C
    D_-_THE_LE  The
    AJ-HDN_NORM_C
      SP-HD_HC_C
        AV_-_DG-V_LE  very
        AJ_-_I_LE  large
      N_SG_ILR
        N_-_C_LE  cat
  W_PERIOD_PLR
    V_PST_OLR
      V_-_LE  meowed.

{ the_q(x1), cat_n(x1), large_a(e2, x1), meow_v(e3, x1), very_x_deg(e4, e2) }


What’s there left to do?

Define $f(x, y)$.

◮ Note that since the analysis $y$ includes all the information needed to reconstruct $x$, we may as well just talk about $f(y)$ instead of $f(x, y)$.

◮ We’ll make $f(y)$ a sparse and very high dimensional vector, with a few 1’s and 2’s here and there.

◮ Each dimension is called a feature.


Cookie Cutter Features

◮ In previous work on HPSG parse disambiguation, each feature records the number of times some particular pattern of tree nodes or MRS predications/variables is found in the candidate analysis $y$.

◮ Defining $f(y)$ amounts to making a list of subgraphs to look for in the tree and MRS for $y$.

◮ A straightforward way of defining a bunch of “interesting” subgraphs to look for is to decide on a “cookie-cutter” shape, and use that to cut out sections of all the trees in the treebank.

◮ This allows us to quickly enumerate tens or hundreds of thousands of subgraphs that can occur in analyses, and build feature vectors out of them.


Baseline Features

One of the simplest useful cookie cutters looks like this:

Cookie Cutter          Example Subgraph

      ?                       SP-HD_HC_C
    ?   ?             AV_-_DG-V_LE   AJ_-_I_LE

This cookie-cutter matches about 57,000 distinct subgraphs from WeScience.
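As a rough sketch of how such a cookie cutter turns a derivation tree into a sparse count vector, the code below cuts one parent-plus-daughters subgraph at every internal node of the example tree. The tuple encoding of the tree, the underscore spelling of the node labels, and the decision to treat unary nodes the same way as binary ones are all assumptions for illustration, not a description of the actual implementation.

```python
from collections import Counter

# Derivation tree for "The very large cat meowed." as (label, children) pairs.
# Lexical types are leaves here; the surface words are left out for brevity.
TREE = ("SB-HD_MC_C", [
    ("SP-HD_N_C", [
        ("D_-_THE_LE", []),
        ("AJ-HDN_NORM_C", [
            ("SP-HD_HC_C", [("AV_-_DG-V_LE", []), ("AJ_-_I_LE", [])]),
            ("N_SG_ILR", [("N_-_C_LE", [])]),
        ]),
    ]),
    ("W_PERIOD_PLR", [("V_PST_OLR", [("V_-_LE", [])])]),
])

def local_subgraphs(node):
    """Yield one (parent, daughter, ...) label tuple per internal node."""
    label, children = node
    if not children:
        return
    yield (label,) + tuple(child[0] for child in children)
    for child in children:
        yield from local_subgraphs(child)

def feature_vector(tree):
    """Sparse f(y): how often each subgraph occurs in the analysis."""
    return Counter(local_subgraphs(tree))

features = feature_vector(TREE)
for subgraph, count in sorted(features.items()):
    print(count, subgraph)
```

Running the same extraction over every tree in a treebank and collecting the distinct tuples would give the feature inventory; each distinct tuple becomes one dimension.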


Baseline Features (cont’d)

◮ So, we get a 57,000 dimensional feature space.

◮ MaxEnt model accuracy: 40.4%

◮ For comparison, random choice accuracy: 8.1%

◮ It turns out 40.4% is a fairly strong baseline.

◮ We’ll take this as our baseline against which to evaluate other ideas.
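For intuition about the random-choice figure: if each input has exactly one gold analysis and a candidate is picked uniformly at random, the expected exact-match accuracy is the average of 1/n over the inputs, where n is that input's number of candidates. This is one plausible way to compute such a figure, stated here as an assumption; the candidate counts below are invented.

```python
from fractions import Fraction

def random_choice_accuracy(candidate_counts):
    """Expected exact match when picking uniformly among n candidates each."""
    total = sum(Fraction(1, n) for n in candidate_counts)
    return total / len(candidate_counts)

# Hypothetical numbers of candidate analyses for four inputs.
counts = [1, 2, 4, 500]
expected_accuracy = random_choice_accuracy(counts)
print(float(expected_accuracy))
```

Unambiguous or nearly unambiguous inputs dominate this average, which is why a random baseline can be well above zero even when some sentences have thousands of analyses.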


Can we do better?

◮ Of course!

◮ I tried about 60 different combinations of feature sets.

◮ I’ll show you several of the most interesting ones.

◮ Baseline features included in addition to those I’ll describe.


Grandparenting

The single most helpful feature type just adds some parents to the baseline cookie cutter:

Cookie cutter   Shape                                    Accuracy
GP[1]           baseline shape plus 1 ancestor above     42.97%
GP[2]           baseline shape plus 2 ancestors above    44.46%
GP[3]           baseline shape plus 3 ancestors above    44.9%
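A sketch of what a GP[n] cookie cutter might compute, under the same illustrative tree encoding as before: each local subgraph is extended upward with the labels of up to n ancestors. How nodes with fewer than n ancestors are handled (here, the context is simply shorter) is an assumption; other choices, such as padding with a root symbol, are possible.

```python
# Derivation tree for "The very large cat meowed." as (label, children) pairs.
TREE = ("SB-HD_MC_C", [
    ("SP-HD_N_C", [
        ("D_-_THE_LE", []),
        ("AJ-HDN_NORM_C", [
            ("SP-HD_HC_C", [("AV_-_DG-V_LE", []), ("AJ_-_I_LE", [])]),
            ("N_SG_ILR", [("N_-_C_LE", [])]),
        ]),
    ]),
    ("W_PERIOD_PLR", [("V_PST_OLR", [("V_-_LE", [])])]),
])

def gp_subgraphs(node, n, ancestors=()):
    """Yield local subgraphs prefixed with the labels of up to n ancestors."""
    label, children = node
    if not children:
        return
    # ancestors[-0:] would return the whole tuple, so n == 0 is special-cased.
    context = ancestors[-n:] if n > 0 else ()
    yield context + (label,) + tuple(child[0] for child in children)
    for child in children:
        yield from gp_subgraphs(child, n, ancestors + (label,))

gp0 = list(gp_subgraphs(TREE, 0))  # same as the baseline cookie cutter
gp1 = list(gp_subgraphs(TREE, 1))
gp2 = list(gp_subgraphs(TREE, 2))
```

Larger n makes each feature more specific, so the feature space grows and individual features become sparser, which is consistent with the diminishing gains from GP[2] to GP[3] above.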


Uncles

[Figure: the Uncles cookie cutter, drawn as a tree of unlabeled ? nodes.]

Accuracy: 42.97%


Syntactic Dependencies

◮ Converts the tree into a list of syntactic dependencies

◮ e.g.: SB-HD_MC_C(meowed. : V_-_LE, cat : N_-_C_LE)

◮ Each such dependency is considered a feature.

◮ Accuracy: 42.97%


Lexicalizations

◮ Decorate each node in the tree with information about the lexical head of that subtree.

◮ Some decoration choices: part of speech, lexical type, HEAD value, lexeme name, surface form, stem

◮ Best performing decoration: lexeme name

◮ Accuracy with baseline cookie cutter: 44.16%

◮ We can do this with other cookie cutters as well.

◮ Accuracy with GP[2] cookie cutter: 45.89%


MRS Features

So far, we’ve only described features extracted from the derivation tree portion of the analysis.

◮ What about the MRS?

◮ Tried two schemes for encoding MRS into features: variable-centric and predication-centric


Variable-centric MRS Features

For each MRS variable, features for all (unordered) pairs and triples from the set of relations that are known to apply to that variable.

Example triple for x1: (large_a.ARG1, meow_v.ARG1, cat_n.ARG0)

Example pair for e2: (large_a.ARG0, very_x_deg.ARG1)

Accuracy: 41.40%
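The variable-centric scheme can be sketched on the simplified MRS of the running example. The list encoding of the MRS and the convention that the i-th argument position is role ARGi are assumptions for illustration.

```python
from itertools import combinations

# Simplified MRS for "The very large cat meowed.": (predicate, [ARG0, ARG1, ...]).
MRS = [
    ("the_q", ["x1"]),
    ("cat_n", ["x1"]),
    ("large_a", ["e2", "x1"]),
    ("meow_v", ["e3", "x1"]),
    ("very_x_deg", ["e4", "e2"]),
]

def relations_by_variable(mrs):
    """Map each variable to the sorted 'pred.ARGi' roles that mention it."""
    table = {}
    for pred, args in mrs:
        for i, var in enumerate(args):
            table.setdefault(var, []).append(f"{pred}.ARG{i}")
    return {var: sorted(roles) for var, roles in table.items()}

def variable_centric_features(mrs):
    """All unordered pairs and triples of roles attached to one variable."""
    feats = []
    for roles in relations_by_variable(mrs).values():
        for size in (2, 3):
            feats.extend(combinations(roles, size))
    return feats

features = variable_centric_features(MRS)
```

Sorting the roles before combining them gives each unordered pair or triple a single canonical spelling, so the same pattern always maps to the same feature.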


Predication-centric MRS Features

One feature for each elementary predication in the MRS, describing the relations pointed to by the non-ARG0 roles.

Example feature: very_x_deg(ARG1=large_a)

Accuracy: 42.61%

Both MRS feature sets combined: 42.63% ...
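A sketch of the predication-centric scheme on the same simplified MRS. Two choices here are assumptions rather than documented behavior: a role "points to" the predications whose ARG0 is the role's variable, and quantifiers (predicates ending in `_q`) are skipped as targets.

```python
# Simplified MRS for "The very large cat meowed.": (predicate, [ARG0, ARG1, ...]).
MRS = [
    ("the_q", ["x1"]),
    ("cat_n", ["x1"]),
    ("large_a", ["e2", "x1"]),
    ("meow_v", ["e3", "x1"]),
    ("very_x_deg", ["e4", "e2"]),
]

def predication_centric_features(mrs):
    """One string feature per elementary predication."""
    introduced_by = {}
    for pred, args in mrs:
        if not pred.endswith("_q"):  # assumption: quantifiers are not targets
            introduced_by.setdefault(args[0], []).append(pred)
    feats = []
    for pred, args in mrs:
        roles = []
        for i, var in enumerate(args[1:], start=1):
            # "?" marks a variable that no predication introduces.
            targets = "+".join(sorted(introduced_by.get(var, ["?"])))
            roles.append(f"ARG{i}={targets}")
        feats.append(f"{pred}({','.join(roles)})")
    return feats

features = predication_centric_features(MRS)
```

Predications with no non-ARG0 roles, such as `cat_n`, still yield a feature with an empty argument list.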


Combining Several Feature Sets

◮ Information contained in different feature templates is not orthogonal

◮ Subadditivity of performance improvements

◮ However, modest improvements are possible.

◮ Best combination I’ve found: all the MRS features, lexeme name lexicalization, GP[2], and a few other features I didn’t describe that don’t perform well in isolation.

◮ Accuracy: 47.52%


MaxEnt Disambiguation for WeScience: Summary

Picking the right analysis is hard.

Random choice     8.1%
Strong baseline  40.4%
Best model       47.5%





Doesn’t 47% seem kind of sad?

Well, yes, in a way.

◮ But there are other ways to evaluate that give us much cheerier numbers!

◮ 47% of the time we get an exactly correct answer.

◮ But we don’t assign ourselves any partial credit for getting a partially correct answer.

◮ Many other metrics do.


Some Other Metrics

Metric               Random   Baseline   Best Model
Exact Tree Match      8.1%     40.4%      47.5%
Exact MRS Match       8.8%     41.3%      48.5%
Unlabeled PARSEVAL   80.7%     93.3%      94.7%
Labeled PARSEVAL     70.3%     88.0%      90.5%
Unlabeled Syn-Deps   79.2%     92.1%      93.8%
Labeled Syn-Deps     71.0%     89.1%      91.3%
Elementary Deps      82.1%     94.2%      95.4%
Leaf Ancestor        79.0%     92.4%      93.7%


Do metrics agree with each other?

What if models that perform well under Exact Tree Match don’t necessarily perform well under, say, PARSEVAL?

◮ For an arbitrary pair of evaluation metrics, that could happen.

◮ Different metrics can evaluate different aspects of a model’s performance.

◮ But how about for the metrics that people commonly employ as overall figures of merit?


Fortunately, this doesn’t turn out to be much of an issue.

◮ Conducted two experiments

  ◮ First: optimizing MaxEnt meta-parameter

  ◮ Second: picking feature combinations


Optimizing MaxEnt meta-parameter

◮ MaxEnt has a regularization parameter

◮ Controls the trade-off between generalization and overfitting

◮ Given an evaluation metric, we can determine the “optimal” value for the regularization parameter through cross-validation.

◮ How does this optimal value vary as a function of which metric is used?


Optimizing MaxEnt meta-parameter (cont’d)

[Figure: Regularized Performance of pcfg baseline. Exact Match Accuracy (%) plotted against the Regularization Variance Parameter on a log scale from 0.001 to 1000, for the pcfg baseline feature set.]

[Figure: Z-Score Comparison of Metrics. Z-scores of the metrics plotted against Regularization on a log scale from 0.001 to 1000.]

Evidently, in practice the optimum meta-parameter is almost exactly the same for all the different metrics.

◮ Maximum error rate increase from optimizing with a different metric from the set listed a few slides ago, on baseline feature configuration: 0.41%.

◮ Averaging of that figure over all the feature set configurations: 0.81%.

◮ Conclusion: it doesn’t really matter what metric you use to optimize the meta-parameter.
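The comparison can be sketched as follows. The curves are invented numbers, not results from these experiments, and the `regret` helper is a simplified stand-in (absolute points lost on the target metric) for the error-rate-increase figure quoted above. Z-scoring puts metrics with very different ranges on one scale so that their optima can be compared.

```python
from statistics import mean, pstdev

def z_scores(values):
    """Standardize one metric's curve across regularization settings."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]

# Hypothetical scores (%), one per regularization setting; NOT from the talk.
settings = [0.001, 0.01, 0.1, 1, 10, 100, 1000]
curves = {
    "exact_match": [30.0, 35.0, 39.0, 40.0, 38.0, 33.0, 28.0],
    "labeled_parseval": [84.0, 86.0, 87.5, 88.0, 87.0, 85.0, 83.0],
}

normalized = {name: z_scores(vals) for name, vals in curves.items()}
best = {name: settings[vals.index(max(vals))] for name, vals in curves.items()}

def regret(target, selector):
    """Points lost on `target` when the setting is chosen by `selector`."""
    chosen = curves[selector].index(max(curves[selector]))
    return max(curves[target]) - curves[target][chosen]
```

When two metrics peak at the same setting, as in this toy data, `regret` is zero in both directions.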


Selecting a feature set combination

◮ Above, I described a handful of feature configurations; I actually tested about 60 different combinations.

◮ In principle, the metric that I used to decide which one was best matters.

◮ However, in fact the metrics all ranked the same configuration as the best.


Metrics: Summary

There are many different syntax disambiguation evaluation metrics available.

However, from the point of view of optimizing a model, there is little difference between the 6 or so most commonly used metrics.


Syntax Disambiguation: Conclusion

My best combination model appears to represent a decent approximation of the best performance available from current techniques when viewed through any of the commonly used metrics.

Hence it is a suitable baseline for judging the success of future forays into joint disambiguation.
