Joint Ambiguity Modeling in NLP
Woodley Packard
Modeling Ambiguity
Syntax Disambiguation
Maximum Entropy Models
MaxEnt for the ERG
Evaluation Metrics
Joint Ambiguity Modeling in NLP
Woodley Packard
Universitet i Oslo
March 21, 2011
Introduction
Ambiguity is a central phenomenon in natural language, affecting accuracy and efficiency in most if not all types of NLP.
Types of Ambiguity
◮ Word boundaries
  ◮ Difficult in some languages – e.g. Chinese
  ◮ Norwegian examples lifted from recent mailing list activity:
    ◮ Lege-ring, sei-del, sel-skap, bru-sau-tomat, sports-av-iser...
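The compound examples above can be made concrete with a small sketch. Assuming a toy lexicon (the word list below is invented for illustration and is not taken from the talk), the following enumerates every way of splitting a string into lexicon words, which is exactly where the word-boundary ambiguity comes from:

```python
from typing import List, Set

# Toy lexicon: an assumption for illustration only.
LEXICON: Set[str] = {"brus", "automat", "bru", "sau", "tomat"}


def segmentations(word: str, lexicon: Set[str] = LEXICON) -> List[List[str]]:
    """Return every split of `word` into a sequence of lexicon entries."""
    if not word:
        return [[]]  # exactly one way to segment the empty string
    results = []
    for i in range(1, len(word) + 1):
        prefix = word[:i]
        if prefix in lexicon:
            # Extend each segmentation of the remainder with this prefix.
            for rest in segmentations(word[i:], lexicon):
                results.append([prefix] + rest)
    return results


if __name__ == "__main__":
    for split in segmentations("brusautomat"):
        print("-".join(split))
```

With this lexicon the string "brusautomat" has two segmentations, bru-sau-tomat and brus-automat; a disambiguation model has to choose between them.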
Types of Ambiguity
◮ Sentence Boundaries
  ◮ Again, not marked in some languages
  ◮ Even in English, the markers are overloaded:
    ◮ The CEO had the P.R. Department leaders make risky moves.
    ◮ The citizens voted in the U.S. Presidential Election polls.
Types of Ambiguity
◮ Word Senses
  ◮ English word bank:
    ◮ (n) Financial institution
    ◮ (n) Side of a river / stream
    ◮ (n) Repository of resources (tree bank, bank of switches)
    ◮ (v) To bet everything (on ...)
    ◮ (v) To tilt in order to turn (in an airplane)
    ◮ (v) To do business at a bank (financial institution)
    ◮ ... and others.
Types of Ambiguity
◮ Syntactic Ambiguity
  ◮ I saw the man with the telescope.
◮ Anaphora
  ◮ Jessie and Alex grew up together, but eventually he moved to the West Coast and she moved to the East Coast.
◮ ... and others.
Traditional Approaches to Dealing with Ambiguity
◮ How well can we resolve some particular type of ambiguity?
◮ Rich literature about how to handle each type of ambiguity.
◮ None are completely solved problems.
◮ Some have been solved fairly well (word boundaries, sentence boundaries).
◮ But most have room for improvement.
An Emerging Trend in Research
◮ Results about how diverse types of information can be helpful in ambiguity resolution.
◮ Large body of research about using syntax as a guide for resolving anaphora.
Some others:
◮ “Using Syntactic Dependency as Local Context to Resolve Word Sense Ambiguity”, D. Lin, ACL 1997.
◮ “Improving Parsing and PP Attachment Performance with Sense Information”, E. Agirre, T. Baldwin, D. Martinez, ACL 2008.
Joint Modeling of Ambiguity
Idea: Instead of modeling individual marginal distributions for each type of ambiguity, or conditional models involving two types of ambiguity, what if we model a joint distribution for all the types of ambiguity we are interested in at once?
◮ Gain: modeling flexibility
◮ Cost: greater complexity
◮ Possible framework: graphical models
◮ Inference and parameter estimation: sometimes tractable, sometimes not.
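A toy sketch may help show what a joint model buys. All scores below are hypothetical numbers invented for illustration; the point is only that choosing each variable by its own score can give a different answer from choosing the best joint assignment once a pairwise compatibility term is included:

```python
from itertools import product

# Hypothetical log-scores, invented purely for illustration.
SYNTAX = {"verb-attach": 1.0, "noun-attach": 0.8}
SENSE = {"sense-1": 0.5, "sense-2": 0.4}
# Pairwise compatibility between a syntactic reading and a word sense.
PAIR = {("noun-attach", "sense-2"): 1.0}


def independent_choice():
    """Pick each variable by its own score, ignoring the other."""
    return max(SYNTAX, key=SYNTAX.get), max(SENSE, key=SENSE.get)


def joint_score(syn, sense):
    """Unary scores plus the pairwise term (0 when no entry exists)."""
    return SYNTAX[syn] + SENSE[sense] + PAIR.get((syn, sense), 0.0)


def joint_choice():
    """Pick the assignment maximizing the joint score by exhaustive search."""
    return max(product(SYNTAX, SENSE), key=lambda pair: joint_score(*pair))


if __name__ == "__main__":
    print("independent:", independent_choice())
    print("joint:      ", joint_choice())
```

The exhaustive search in `joint_choice` enumerates every combination, so its cost grows multiplicatively with each added variable; this is one face of the tractability concern raised in the last bullet.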
A Graphical Model for Ambiguity
[Figure: graphical model over the nodes Anaphora, Syntax, and WSD, with edges annotated by prior work: Hobbs 1978; Lin 1997; Agirre et al. 2008]
Other Dependencies Seem Likely
◮ Information about anaphora may be helpful in disambiguating syntax.
◮ Word Priming: information about the senses of words in a given sentence may be helpful for determining word senses in nearby subsequent sentences.
◮ Syntactic Priming: information about the constructions used in a given sentence may be helpful for determining the syntax of nearby subsequent sentences.
◮ Maybe others too!
A Revised Graphical Model for Ambiguity
[Figure: revised graphical model over the nodes Anaphora, Syntax1, Syntax2, Syntax3, WSD1, WSD2, and WSD3]
A Roadmap for my Project
◮ Build basic disambiguation systems for a few types of ambiguity
◮ Set baselines for how well we can disambiguate without a joint model
◮ Learn how to combine information from disparate systems to form a joint model
◮ Evaluate the joint model’s performance on each type of ambiguity vs. the baselines
Where am I on that roadmap?
◮ Not very far along, really. I’ve built a syntax disambiguation system and set a strong baseline for syntax disambiguation in isolation.
◮ Started some experiments into using more global information, but so far nothing worth reporting.
Next, shifting gears...
I’ll describe the work I’ve done exploring the space of syntax disambiguation for HPSG grammars.
Syntax Disambiguation
◮ Given an utterance, find the best analysis licensed by your grammar.
◮ With broad-coverage grammars, there can be a lot of candidate analyses!
◮ The ERG licenses more than 10,000 distinct analyses for:
  I would still have an appointment slot free on Tuesday, the sixth of April, but only in the afternoon.
Syntax Disambiguation
The different analyses are usually not all semantically equivalent. How do we know which meaning was intended by the speaker?
Common solution: annotate the intended meaning on a sufficiently large corpus of example utterances, and then apply machine learning techniques to build a model that will allow us to guess the intended meaning on unseen data.
Layout
Modeling Ambiguity
Syntax Disambiguation
  Maximum Entropy Models
  MaxEnt for the ERG
  Evaluation Metrics
Conditional log-linear models
$$p(y \mid x) = \frac{e^{w \cdot f(x, y)}}{\sum_{y'} e^{w \cdot f(x, y')}}$$
where $w$ is a vector of feature weights, typically learned from the training data.
◮ Conditional probability model for classification/ranking problems.
◮ Describe relationship of class $y$ to input $x$ by $n$ real-valued feature functions $f_j(x, y)$.
◮ Equivalently, a vector-valued feature function $f(x, y)$.
Since the denominator is a function only of $x$ and does not depend on $y$, determining $\arg\max_y p(y \mid x)$ amounts to maximizing $w \cdot f(x, y)$.
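As a minimal sketch of the model above, the following computes p(y | x) over an explicit list of candidates and picks the best one. The sparse dictionary representation and the function names are assumptions made for illustration, not the talk's implementation:

```python
import math
from typing import Dict, List

# Sparse vector: feature name -> value. A hypothetical representation.
SparseVec = Dict[str, float]


def dot(w: SparseVec, f: SparseVec) -> float:
    """Inner product w . f, iterating only over the nonzero entries of f."""
    return sum(w.get(name, 0.0) * value for name, value in f.items())


def conditional_probs(w: SparseVec, candidates: List[SparseVec]) -> List[float]:
    """p(y | x) for each candidate y, given its feature vector f(x, y)."""
    scores = [dot(w, f) for f in candidates]
    m = max(scores)  # subtracting the max avoids overflow in exp
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)  # the denominator: depends on x only, not on y
    return [e / z for e in exps]


def best_candidate(w: SparseVec, candidates: List[SparseVec]) -> int:
    """Index of the arg max; only the scores w . f are needed, not z."""
    scores = [dot(w, f) for f in candidates]
    return max(range(len(scores)), key=scores.__getitem__)
```

Note that `best_candidate` never computes the normalizer, mirroring the observation that the arg max depends only on w · f(x, y).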
Maximum Entropy Models
◮ One common way of selecting $w$.
◮ Also known as Multinomial Logistic Regression.
◮ Corresponds to maximum likelihood estimation – maximize the conditional likelihood of the training data.
◮ Alternate description: the model with maximum entropy subject to $E_p[f(x, y)] = \tilde{f}(x, y)$, where $\tilde{f}(x, y)$ is the empirical expectation of $f(x, y)$ on the training data.
◮ Popular applications: POS tagging, NER, parse disambiguation, ...
◮ Other techniques for picking $w$: SVMs, Perceptrons, ...
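The maximum likelihood and maximum entropy descriptions meet in the gradient of the log-likelihood. Writing the training data as pairs $(x_i, y_i)$ for $i = 1, \dots, N$ (notation assumed here, not taken from the slides), a short derivation from the log-linear form gives:

```latex
\begin{align*}
L(w) &= \sum_{i=1}^{N} \log p(y_i \mid x_i)
      = \sum_{i=1}^{N} \Big[ w \cdot f(x_i, y_i)
        - \log \sum_{y'} e^{w \cdot f(x_i, y')} \Big] \\
\nabla_w L(w) &= \sum_{i=1}^{N} \Big[ f(x_i, y_i)
        - \sum_{y'} p(y' \mid x_i)\, f(x_i, y') \Big]
\end{align*}
```

At a maximum the gradient vanishes, so the observed feature totals equal the totals the model expects; dividing by $N$ gives the expectation constraint of the maximum entropy description.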
The ERG and WeScience
◮ English Resource Grammar (ERG): a broad-coverage, precision computational grammar of English based on HPSG
◮ WeScience: a treebank of gold ERG analyses for around 9000 sentences from Wikipedia in the domain of Computational Linguistics.
How well can we disambiguate when parsing WeScience inputs just by looking at ERG analyses in isolation?
◮ Use Maximum Entropy modeling; x = an input sentence and y = a candidate analysis.
◮ f(x, y) = a vector of features describing a candidate analysis, possibly making reference to the input item.
◮ Train a MaxEnt model on WeScience.
◮ To disambiguate an unseen input, select the candidate analysis with the largest p(y | x).
◮ Scoring: on a held-out test set, for what proportion of the inputs can our model identify the correct gold analysis? (exact match)
◮ Use 10-fold cross-validation to reduce measurement noise.
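The scoring procedure can be sketched as follows. The item encoding, the interleaved fold assignment, and the `train_fn` interface are all assumptions made for illustration; the talk does not say how its folds were drawn:

```python
from typing import Callable, Iterator, List, Sequence, Tuple

# An item pairs a list of candidate analyses with the index of the gold one.
# Both the encoding and the names are hypothetical.
Item = Tuple[Sequence[object], int]
Chooser = Callable[[Sequence[object]], int]


def exact_match(items: List[Item], choose: Chooser) -> float:
    """Proportion of items for which the chosen candidate is the gold one."""
    correct = sum(1 for cands, gold in items if choose(cands) == gold)
    return correct / len(items)


def k_fold(items: List[Item], k: int = 10) -> Iterator[Tuple[List[Item], List[Item]]]:
    """Yield (train, test) splits; fold j tests on every k-th item from j.

    Assumes len(items) >= k so that no test fold is empty.
    """
    for j in range(k):
        test = items[j::k]
        train = [item for i, item in enumerate(items) if i % k != j]
        yield train, test


def cross_validate(items: List[Item],
                   train_fn: Callable[[List[Item]], Chooser],
                   k: int = 10) -> float:
    """Average exact-match accuracy over k folds."""
    scores = [exact_match(test, train_fn(train)) for train, test in k_fold(items, k)]
    return sum(scores) / len(scores)
```

Here `train_fn` stands in for MaxEnt training: it receives the training fold and returns a function that picks a candidate index for an unseen item.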
An ERG analysis consists of a derivation tree and an MRS meaning representation.
Simplified candidate analysis for "The very large cat meowed.":

SB-HD_MC_C
  SP-HD_N_C
    D_-_THE_LE  The
    AJ-HDN_NORM_C
      SP-HD_HC_C
        AV_-_DG-V_LE  very
        AJ_-_I_LE  large
      N_SG_ILR
        N_-_C_LE  cat
  W_PERIOD_PLR
    V_PST_OLR
      V_-_LE  meowed.

{ the_q(x1), cat_n(x1), large_a(e2, x1), meow_v(e3, x1), very_x_deg(e4, e2) }
What’s left to do?
Define f(x, y).
◮ Note that since the analysis y includes all the information needed to reconstruct x, we may as well just talk about f(y) instead of f(x, y).
◮ We’ll make f(y) a sparse and very high-dimensional vector, with a few 1’s and 2’s here and there.
◮ Each dimension is called a feature.
Cookie Cutter Features
◮ In previous work on HPSG parse disambiguation, each feature records the number of times some particular pattern of tree nodes or MRS predications/variables is found in the candidate analysis y.
◮ Defining f(y) amounts to making a list of subgraphs to look for in the tree and MRS for y.
◮ A straightforward way of defining a bunch of “interesting” subgraphs to look for is to decide on a “cookie-cutter” shape, and use that to cut out sections of all the trees in the treebank.
◮ This allows us to quickly enumerate tens or hundreds of thousands of subgraphs that can occur in analyses, and build feature vectors out of them.
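The cookie-cutter idea can be sketched in a few lines. The version below assumes the simplest shape, a node together with the labels of its daughters, and a toy tree encoding of `(label, [children])`; the labels are placeholders rather than real ERG rule names.

```python
from collections import Counter

def local_subtrees(tree):
    """Cut out (parent_label, daughter_labels) at every internal node of the tree."""
    label, children = tree
    if children:
        yield (label, tuple(child[0] for child in children))
        for child in children:
            yield from local_subtrees(child)

def feature_vector(tree):
    """Sparse f(y): how many times each cut-out subgraph occurs in one analysis."""
    return Counter(local_subtrees(tree))

def build_inventory(treebank):
    """Every distinct subgraph the cutter produces over a collection of trees."""
    inventory = set()
    for tree in treebank:
        inventory.update(local_subtrees(tree))
    return inventory

# Toy tree with placeholder labels.
toy_tree = ("S", [("NP", [("D", []), ("N", [])]), ("VP", [("V", [])])])
feats = feature_vector(toy_tree)
inventory = build_inventory([toy_tree])
```

The size of the inventory is the dimensionality of the feature space; each distinct subgraph gets one dimension.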
Baseline Features
One of the simplest useful cookie cutters looks like this:
[Figure: the cookie cutter is a parent node “?” with two daughters “? ?”. Example subgraph: SP-HD_HC_C over the daughters AV_-_DG-V_LE and AJ_-_I_LE.]
This cookie cutter matches about 57,000 distinct subgraphs from WeScience.
Baseline Features (cont’d)
◮ So, we get a 57,000-dimensional feature space.
◮ MaxEnt model accuracy: 40.4%
◮ For comparison, random choice accuracy: 8.1%
◮ It turns out 40.4% is a fairly strong baseline.
◮ We’ll take this as our baseline against which to evaluate other ideas.
Can we do better?
◮ Of course!
◮ I tried about 60 different combinations of feature sets.
◮ I’ll show you several of the most interesting ones.
◮ The baseline features are included in addition to those I’ll describe.
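Grandparenting
The single most helpful feature type just adds some parents to the baseline cookie cutter:
[Figure: cookie cutters GP[1], GP[2], and GP[3], which stack one, two, and three ancestor nodes on top of the baseline parent-with-two-daughters shape.]
Accuracy: GP[1] 42.97%, GP[2] 44.46%, GP[3] 44.9%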
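A minimal sketch of grandparenting, under the same toy tree encoding of `(label, [children])`: each local subtree is emitted together with the labels of its k nearest ancestors. How the real feature extractor pads the context near the root is not stated on the slides; here the context is simply shorter.

```python
from collections import Counter

def gp_subtrees(tree, k, ancestors=()):
    """Yield (ancestor_context, label, daughter_labels) with up to k ancestors."""
    label, children = tree
    if children:
        context = ancestors[-k:] if k > 0 else ()
        yield (context, label, tuple(child[0] for child in children))
        for child in children:
            yield from gp_subtrees(child, k, ancestors + (label,))

# Toy tree with placeholder labels; NP occurs as both subject and object.
toy_tree = (
    "S",
    [
        ("NP", [("D", []), ("N", [])]),
        ("VP", [("V", []), ("NP", [("D", []), ("N", [])])]),
    ],
)

gp0 = Counter(gp_subtrees(toy_tree, 0))  # no ancestors: same as the baseline cutter
gp1 = Counter(gp_subtrees(toy_tree, 1))  # one ancestor, as in GP[1]
gp2 = Counter(gp_subtrees(toy_tree, 2))  # two ancestors, as in GP[2]
```

With k = 0 the two NPs collapse into one feature with count 2; with k = 1 the subject NP (under S) and the object NP (under VP) become different features, which is the extra context grandparenting buys.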
Uncles
[Figure: the Uncles cookie cutter, a four-level tree shape with rows of 1, 2, 3, and 2 nodes.]
Accuracy: 42.97%
Syntactic Dependencies
◮ Converts the tree into a list of syntactic dependencies
◮ e.g.: SB-HD_MC_C (meowed. : V_-_LE, cat : N_-_C_LE)
◮ Each such dependency is considered a feature.
◮ Accuracy: 42.97%
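One plausible way to read a dependency list off a derivation tree is sketched below. It assumes each internal node records which daughter is its head, encoded as `(rule, head_index, children)`, with leaves as `(surface_form, lexical_type)`; that encoding is an assumption for the example, and rule and lexical type names are written with underscores.

```python
def lexical_head(node):
    """Follow head daughters down to a leaf (surface form, lexical type)."""
    if len(node) == 2:  # leaf
        return node
    _rule, head_index, children = node
    return lexical_head(children[head_index])

def dependencies(node):
    """Yield (rule, head_leaf, dependent_leaf) for each non-head daughter."""
    if len(node) == 2:  # leaves contribute no dependencies
        return
    rule, head_index, children = node
    head = lexical_head(children[head_index])
    for i, child in enumerate(children):
        if i != head_index:
            yield (rule, head, lexical_head(child))
        yield from dependencies(child)

# "cat meowed." as a subject-head construction whose head is the verb.
tree = ("SB-HD_MC_C", 1, [("cat", "N_-_C_LE"), ("meowed.", "V_-_LE")])
deps = list(dependencies(tree))
```

For this tree the single dependency pairs the rule name with the head leaf and the dependent leaf, mirroring the example on the slide. Unary rules produce no dependency because they have no non-head daughter.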
Lexicalizations
◮ Decorate each node in the tree with information about the lexical head of that subtree.
◮ Some decoration choices: part of speech, lexical type, HEAD value, lexeme name, surface form, stem
◮ Best performing decoration: lexeme name
◮ Accuracy with baseline cookie cutter: 44.16%
◮ We can do this with other cookie cutters as well.
◮ Accuracy with GP[2] cookie cutter: 45.89%
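Lexicalization can be pictured as a preprocessing pass that rewrites every node label before any cookie cutter is applied. The sketch below assumes head-marked nodes `(rule, head_index, children)` and leaves `(lexical_type, lexeme_name)`, and uses a `^` separator; the encoding, the separator, and the lexeme names are all hypothetical.

```python
def decorate(node):
    """Return (decorated_tree, head_lexeme); decorated trees are (label, [children])."""
    if len(node) == 2:  # leaf: (lexical type, lexeme name)
        lextype, lexeme = node
        return (f"{lextype}^{lexeme}", []), lexeme
    rule, head_index, children = node
    decorated = [decorate(child) for child in children]
    head_lexeme = decorated[head_index][1]  # lexeme percolates up from the head daughter
    return (f"{rule}^{head_lexeme}", [subtree for subtree, _ in decorated]), head_lexeme

# Placeholder labels and made-up lexeme names.
tree = ("SB-HD", 1, [("N", "cat_n1"), ("V", "meow_v1")])
decorated_tree, head = decorate(tree)
```

Because the output is an ordinary labelled tree, the baseline cutter or a grandparented cutter can be run on it unchanged, which is why the same decoration combines with GP[2].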
MRS Features
{ the_q(x1), cat_n(x1), large_a(e2, x1), meow_v(e3, x1), very_x_deg(e4, e2) }
So far, we’ve only described features extracted from the derivation tree portion of the analysis.
◮ What about the MRS?
◮ Tried two schemes for encoding MRS into features: variable-centric and predication-centric
Variable-centric MRS Features
{ the_q(x1), cat_n(x1), large_a(e2, x1), meow_v(e3, x1), very_x_deg(e4, e2) }
For each MRS variable, features for all (unordered) pairs and triples from the set of relations that are known to apply to that variable.
Example triple for x1: (large_a.ARG1, meow_v.ARG1, cat_n.ARG0)
Example pair for e2: (large_a.ARG0, very_x_deg.ARG1)
Accuracy: 41.40%
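The variable-centric scheme can be sketched directly from the example MRS. The encoding below reads argument positions as ARG0, ARG1 (an assumption, including for the quantifier, which the slide lists with a single argument) and writes predicate names with underscores; sorting makes each pair or triple a canonical representative of an unordered set.

```python
from itertools import combinations

# (predicate, {role: variable}) for the example sentence.
mrs = [
    ("the_q", {"ARG0": "x1"}),
    ("cat_n", {"ARG0": "x1"}),
    ("large_a", {"ARG0": "e2", "ARG1": "x1"}),
    ("meow_v", {"ARG0": "e3", "ARG1": "x1"}),
    ("very_x_deg", {"ARG0": "e4", "ARG1": "e2"}),
]

def variable_centric_features(mrs):
    """All unordered pairs and triples of pred.ROLE relations sharing a variable."""
    by_variable = {}
    for pred, roles in mrs:
        for role, var in roles.items():
            by_variable.setdefault(var, set()).add(f"{pred}.{role}")
    features = []
    for relations in by_variable.values():
        ordered = sorted(relations)  # canonical order for unordered tuples
        for size in (2, 3):
            features.extend(combinations(ordered, size))
    return features

vc_features = variable_centric_features(mrs)
```

Here x1 carries four relations, giving six pairs and four triples, and e2 carries two relations, giving one pair; e3 and e4 each carry a single relation and contribute nothing.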
Predication-centric MRS Features
{ the_q(x1), cat_n(x1), large_a(e2, x1), meow_v(e3, x1), very_x_deg(e4, e2) }
One feature for each elementary predication in the MRS, describing the relations pointed to by the non-ARG0 roles.
Example feature: very_x_deg(ARG1=large_a)
Accuracy: 42.61%
Both MRS feature sets combined: 42.63% ...
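A rough sketch of the predication-centric scheme follows. It assumes that a role "points to" the predication whose ARG0 is the role's variable, and that quantifiers (predicates ending in `_q`) are skipped when resolving a variable; both choices are guesses made for illustration, as are the positional ARG0/ARG1 readings and the underscore spelling of predicate names.

```python
# (predicate, {role: variable}) for the example sentence.
mrs = [
    ("the_q", {"ARG0": "x1"}),
    ("cat_n", {"ARG0": "x1"}),
    ("large_a", {"ARG0": "e2", "ARG1": "x1"}),
    ("meow_v", {"ARG0": "e3", "ARG1": "x1"}),
    ("very_x_deg", {"ARG0": "e4", "ARG1": "e2"}),
]

def predication_centric_features(mrs):
    """One feature string per elementary predication, naming its non-ARG0 fillers."""
    introduced_by = {}  # variable -> non-quantifier predicates with it as ARG0
    for pred, roles in mrs:
        if not pred.endswith("_q"):
            introduced_by.setdefault(roles["ARG0"], []).append(pred)
    features = []
    for pred, roles in mrs:
        args = []
        for role in sorted(roles):
            if role == "ARG0":
                continue
            fillers = "+".join(sorted(introduced_by.get(roles[role], ["?"])))
            args.append(f"{role}={fillers}")
        features.append(f"{pred}({','.join(args)})")
    return features

pc_features = predication_centric_features(mrs)
```

Predications with no non-ARG0 roles still yield a feature, just with an empty argument list.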
Combining Several Feature Sets
◮ Information contained in different feature templates is not orthogonal
◮ Subadditivity of performance improvements
◮ However, modest improvements are possible.
◮ Best combination I’ve found: all the MRS features, lexeme name lexicalization, GP[2], and a few other features I didn’t describe that don’t perform well in isolation.
◮ Accuracy: 47.52%
MaxEnt Disambiguation for WeScience: Summary
Picking the right analysis is hard.
Random choice: 8.1%
Strong baseline: 40.4%
Best model: 47.5%
Layout
Modeling Ambiguity
Syntax Disambiguation
Maximum Entropy Models
MaxEnt for the ERG
Evaluation Metrics
Doesn’t 47% seem kind of sad?
Well, yes, in a way.
◮ But there are other ways to evaluate that give us much cheerier numbers!
◮ 47% of the time we get an exactly correct answer.
◮ But we don’t assign ourselves any partial credit for getting a partially correct answer.
◮ Many other metrics do.
Some Other Metrics
Metric               Random   Baseline   Best Model
Exact Tree Match       8.1%      40.4%        47.5%
Exact MRS Match        8.8%      41.3%        48.5%
Unlabeled PARSEVAL    80.7%      93.3%        94.7%
Labeled PARSEVAL      70.3%      88.0%        90.5%
Unlabeled Syn-Deps    79.2%      92.1%        93.8%
Labeled Syn-Deps      71.0%      89.1%        91.3%
Elementary Deps       82.1%      94.2%        95.4%
Leaf Ancestor         79.0%      92.4%        93.7%
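To see why partial-credit metrics give much higher numbers than exact match, here is a small sketch comparing exact match with a PARSEVAL-style bracket F1. Trees are reduced to sets of `(label, start, end)` spans, which is a simplification (sets ignore repeated spans from unary chains), and the toy spans are invented for the example.

```python
def exact_match(gold, predicted):
    """1.0 only if every bracket agrees; no partial credit."""
    return 1.0 if gold == predicted else 0.0

def parseval_f1(gold, predicted, labeled=True):
    """Harmonic mean of bracket precision and recall."""
    if not labeled:
        gold = {(start, end) for _, start, end in gold}
        predicted = {(start, end) for _, start, end in predicted}
    matched = len(gold & predicted)
    if matched == 0:
        return 0.0
    precision = matched / len(predicted)
    recall = matched / len(gold)
    return 2 * precision * recall / (precision + recall)

# Toy analyses of a three-token sentence; one node label differs.
gold = {("S", 0, 3), ("NP", 0, 2), ("VP", 2, 3)}
predicted = {("S", 0, 3), ("NP", 0, 2), ("V", 2, 3)}

exact = exact_match(gold, predicted)
labeled_f1 = parseval_f1(gold, predicted, labeled=True)
unlabeled_f1 = parseval_f1(gold, predicted, labeled=False)
```

A single wrong label zeroes the exact-match score, costs one bracket out of three under labeled scoring, and costs nothing under unlabeled scoring, which is the same ordering the table shows.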
Do metrics agree with each other?
What if models that perform well under Exact Tree Match don’t necessarily perform well under, say, PARSEVAL?
◮ For an arbitrary pair of evaluation metrics, that could happen.
◮ Different metrics can evaluate different aspects of a model’s performance.
◮ But how about for the metrics that people commonly employ as overall figures of merit?
Fortunately, this doesn’t turn out to be much of an issue.
◮ Conducted two experiments
◮ First: optimizing MaxEnt meta-parameter
◮ Second: picking feature combinations
Optimizing MaxEnt meta-parameter
◮ MaxEnt has a regularization parameter
◮ Controls the trade-off between generalization and overfitting
◮ Given an evaluation metric, we can determine the “optimal” value for the regularization parameter through cross-validation.
◮ How does this optimal value vary as a function of which metric is used?
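The selection procedure can be sketched generically. The `train` and `evaluate` callbacks below are hypothetical stand-ins for a real MaxEnt trainer and for whichever evaluation metric is being optimized; the toy callbacks at the bottom exist only to make the sketch runnable.

```python
def k_fold_splits(items, k):
    """Yield (training, held_out) pairs; fold i holds out every k-th item from i."""
    for i in range(k):
        held_out = items[i::k]
        training = [x for j, x in enumerate(items) if j % k != i]
        yield training, held_out

def select_variance(items, variances, train, evaluate, k=5):
    """Return the regularization value with the best mean held-out score."""
    best = None
    for variance in variances:
        scores = [
            evaluate(train(training, variance), held_out)
            for training, held_out in k_fold_splits(items, k)
        ]
        mean_score = sum(scores) / len(scores)
        if best is None or mean_score > best[0]:
            best = (mean_score, variance)
    return best[1]

# Toy stand-ins: the "model" is just the variance, and the score peaks at 1.0.
toy_train = lambda training, variance: variance
toy_evaluate = lambda model, held_out: -abs(model - 1.0)

chosen = select_variance(list(range(10)), [0.01, 1.0, 100.0], toy_train, toy_evaluate)
```

Swapping a different metric into `evaluate` and rerunning is all it takes to ask whether the chosen value moves.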
Optimizing MaxEnt meta-parameter (cont’d)
[Figure: “Regularized Performance of pcfg baseline”. x-axis: Regularization Variance Parameter, log scale from 0.001 to 1000; y-axis: Exact Match Accuracy (%), from 24 to 42; series: pcfg baseline.]
Optimizing MaxEnt meta-parameter (cont’d)
[Figure: “Z-Score Comparison of Metrics”. x-axis: Regularization, log scale from 0.001 to 1000; y-axis: Z-Scores, from -1.5 to 1.]
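Metrics that live on very different scales can be put on one plot by converting each metric's curve to z-scores across the regularization settings. The sketch below shows the transformation on deliberately artificial numbers (not results from the experiments).

```python
import statistics

def z_scores(values):
    """Standardize one metric's scores: subtract the mean, divide by the std dev."""
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values)
    if stdev == 0:
        return [0.0 for _ in values]
    return [(v - mean) / stdev for v in values]

def best_setting(settings, values):
    """The setting at which a metric peaks."""
    return max(zip(values, settings))[1]

# Artificial curves for two metrics over the same three settings.
settings = [0.01, 1.0, 100.0]
metric_a = [1.0, 3.0, 2.0]
metric_b = [10.0, 30.0, 20.0]

z_a = z_scores(metric_a)
z_b = z_scores(metric_b)
```

After standardization the two artificial curves coincide, so any remaining difference between real metrics on such a plot reflects shape rather than scale.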
Optimizing MaxEnt meta-parameter (cont’d)
Evidently, in practice the optimum meta-parameter is almost exactly the same for all the different metrics.
◮ Maximum error rate increase from optimizing with a different metric from the set listed a few slides ago, on baseline feature configuration: 0.41%.
◮ Average of that figure over all the feature set configurations: 0.81%.
◮ Conclusion: it doesn’t really matter what metric you use to optimize the meta-parameter.
Selecting a feature set combination
◮ Above, I described a handful of feature configurations; I actually tested about 60 different combinations.
◮ In principle, the metric that I used to decide which one was best matters.
◮ However, in fact the metrics all ranked the same configuration as the best.
Metrics: Summary
There are many different syntax disambiguation evaluation metrics available.
However, from the point of view of optimizing a model, there is little difference between the 6 or so most commonly used metrics.
Syntax Disambiguation: Conclusion
My best combination model appears to represent a decent approximation of the best performance available from current techniques when viewed through any of the commonly used metrics.
Hence it is a suitable baseline for judging the success of future forays into joint disambiguation.