natural language processing lecture 22—11/14/2013 jim martin
TRANSCRIPT
Semantics
What we’ve been discussing has mainly concerned sentence meanings. These refer to propositions, things that can be said to be true or false with respect to some model (world).
Today we’ll look at a related notion called lexical semantics. Roughly what we mean when we talk about the meaning of a word.
04/19/23 Speech and Language Processing - Jurafsky and Martin 2
04/19/23 Speech and Language Processing - Jurafsky and Martin 3
Today
Lexical Semantics Word sense disambiguation Semantic role labeling
04/19/23 Speech and Language Processing - Jurafsky and Martin 4
Back to Words: Meanings
In our discussion of compositional semantics, we only scratched surface of word meaning
The only place we did anything was with the complex attachments to verbs and determiners (quantifiers)
Let’s start by looking at yet another resource
04/19/23 Speech and Language Processing - Jurafsky and Martin 5
WordNet
WordNet is a database of facts about words Here we’ll only use the English WordNet; there
are others Meanings and the relations among them
www.cogsci.princeton.edu/~wn Currently about 100,000 nouns, 11,000 verbs,
20,000 adjectives, and 4,000 adverbs Arranged in separate files (DBs)
04/19/23 Speech and Language Processing - Jurafsky and Martin 8
Inside Words
WordNet relations are called paradigmatic relations They connect lexemes together in
particular ways but don’t say anything about what the meaning representation of a particular lexeme should consist of
That’s what I mean by inside word meanings.
04/19/23 Speech and Language Processing - Jurafsky and Martin 9
Inside Words
Various approaches have been followed to describe the internal structure of individual word meanings. We’ll look at only a few… Thematic roles in predicate-bearing
lexemes Selection restrictions on thematic roles
04/19/23 Speech and Language Processing - Jurafsky and Martin 10
Inside Words
In our semantics examples, we used various FOL predicates to capture various aspects of events, including the notion of roles Havers, takers, givers, servers, etc. All specific to each verb/predicate.
Thematic roles Thematic roles are semantic generalizations over the
specific roles that occur with specific verbs. I.e. Takers, givers, eaters, makers, doers, killers, all have
something in common -er They’re all the agents of the actions
We can generalize across other roles as well to come up with a small finite set of such roles
04/19/23 Speech and Language Processing - Jurafsky and Martin 12
Thematic Roles
Takes some of the work away from the verbs. It’s not the case that every verb is
unique and has to completely specify how all of its arguments behave
Provides a locus for organizing semantic processing
It permits us to distinguish near surface-level semantics from deeper semantics
04/19/23 Speech and Language Processing - Jurafsky and Martin 13
Linking
Thematic roles, syntactic categories and their positions in larger syntactic structures are all intertwined in complicated ways. For example… AGENTS often appear as grammatical
subjects In a VP->V NP rule, the NP is often a THEME
So how might we go about studying these ideas? Get a corpus Do some annotation
Resources
For parsing we had TreeBanks For lexical semantics we have
WordNets So for thematic roles....
04/19/23 Speech and Language Processing - Jurafsky and Martin 14
04/19/23 Speech and Language Processing - Jurafsky and Martin 15
Resources
There are 2 major English resources out there with thematic-role-like data PropBank
Layered on the Penn TreeBank Small number (25ish) labels For each semantic predicate, identify the
constituents in the tree that are arguments to that predicate and
Label each with its appropriate role
FrameNet Based on a theory of semantics known as
frame semantics. Large number of frame-specific labels
Propbank Example
Cover (as in smear) Arg0 (agent: the causer of the smearing) Arg1 (theme: “thing covered”) Arg2 (covering: “stuff being smeared”)
[McAdams and crew] covered [the floors] with [checked linoleum].
04/19/23 Speech and Language Processing - Jurafsky and Martin 16
Propbank
Arg 0 and Arg 1 roughly correspond to the notions of agent and theme Causer and thing most directly effected
The remaining args are intended to be verb specific So there really aren’t a small finite set of
roles Arg3 for “cover” isn’t the same as the
Arg3 for “give”...
04/19/23 Speech and Language Processing - Jurafsky and Martin 17
04/19/23 Speech and Language Processing - Jurafsky and Martin 18
Problems
What exactly is a role? What’s the right set of roles? Are such roles universals? Are these roles atomic?
I.e. Agents Animate, Volitional, Direct causers, etc
Can we automatically label syntactic constituents with thematic roles? Semantic Role Labeling (coming up)
04/19/23 Speech and Language Processing - Jurafsky and Martin 19
Selection Restrictions
Consider.... I want to eat someplace near campus
04/19/23 Speech and Language Processing - Jurafsky and Martin 20
Selection Restrictions
Consider.... I want to eat someplace near campus Using thematic roles we can now say
that eat is a predicate that has an AGENT and a THEME What else?
And that the AGENT must be capable of eating and the THEME must be something typically capable of being eaten
04/19/23 Speech and Language Processing - Jurafsky and Martin 21
As Logical Statements
For eat… Eating(e) ^Agent(e,x)^
Theme(e,y)^Food(y)(adding in all the right quantifiers and
lambdas)
04/19/23 Speech and Language Processing - Jurafsky and Martin 22
Back to WordNet
Use WordNet hyponyms (type) to encode the selection restrictions
04/19/23 Speech and Language Processing - Jurafsky and Martin 23
Specificity of Restrictions
Consider the verbs imagine, lift and diagonalize in the following examples To diagonalize a matrix is to find its
eigenvalues Atlantis lifted Galileo from the pad Imagine a tennis game
What can you say about THEME in each with respect to the verb?
Some will be high up in the WordNet hierarchy, others not so high…
04/19/23 Speech and Language Processing - Jurafsky and Martin 24
Problems
Unfortunately, verbs are polysemous and language is creative… WSJ examples… … ate glass on an empty stomach
accompanied only by water and tea you can’t eat gold for lunch if you’re
hungry … get it to try to eat Afghanistan
04/19/23 Speech and Language Processing - Jurafsky and Martin 25
Solutions
Eat glass Not really a problem. It is actually about an eating
event
Eat gold Also about eating, and the can’t creates a scope that
permits the THEME to not be edible
Eat Afghanistan This is harder, its not really about eating at all
04/19/23 Speech and Language Processing - Jurafsky and Martin 26
Discovering the Restrictions
Instead of hand-coding the restrictions for each verb, can we discover a verb’s restrictions by using a corpus and WordNet?1. Parse sentences and find heads2. Label the thematic roles3. Collect statistics on the co-occurrence of
particular headwords with particular thematic roles
4. Use the WordNet hypernym structure to find the most meaningful level to use as a restriction
04/19/23 Speech and Language Processing - Jurafsky and Martin 27
WSD and Selection Restrictions
Word sense disambiguation refers to the process of selecting the right sense for a word from among the senses that the word is known to have
Semantic selection restrictions can be used to disambiguate Ambiguous arguments to unambiguous predicates Ambiguous predicates with unambiguous
arguments Ambiguity all around
04/19/23 Speech and Language Processing - Jurafsky and Martin 28
WSD and Selection Restrictions
Ambiguous arguments Prepare a dish Wash a dish
Ambiguous predicates Serve Denver Serve breakfast
Both Serves vegetarian dishes
04/19/23 Speech and Language Processing - Jurafsky and Martin 29
WSD and Selection Restrictions
This approach is complementary to the compositional analysis approach. You need a parse tree and some form of
predicate-argument analysis derived from The tree and its attachments All the word senses coming up from the
lexemes at the leaves of the tree Ill-formed analyses are eliminated by noting
any selection restriction violations
04/19/23 Speech and Language Processing - Jurafsky and Martin 30
Problems
As we saw earlier, selection restrictions are violated all the time.
This doesn’t mean that the sentences are ill-formed or preferred less than others.
This approach needs some way of categorizing and dealing with the various ways that restrictions can be violated
04/19/23 Speech and Language Processing - Jurafsky and Martin 31
Supervised ML Approaches
That’s too hard… try something more data-oriented
In supervised machine learning approaches, a training corpus of words tagged in context with their sense is used to train a classifier that can tag words in new text (that reflects the training text)
04/19/23 Speech and Language Processing - Jurafsky and Martin 32
WSD Tags
What’s a tag? A dictionary sense?
For example, for WordNet an instance of “bass” in a text has 8 possible tags or labels (bass1 through bass8).
04/19/23 Speech and Language Processing - Jurafsky and Martin 33
WordNet Bass
The noun ``bass'' has 8 senses in WordNet
1. bass - (the lowest part of the musical range)2. bass, bass part - (the lowest part in polyphonic music)3. bass, basso - (an adult male singer with the lowest voice)4. sea bass, bass - (flesh of lean-fleshed saltwater fish of the family
Serranidae)5. freshwater bass, bass - (any of various North American lean-
fleshed freshwater fishes especially of the genus Micropterus)6. bass, bass voice, basso - (the lowest adult male singing voice)7. bass - (the member with the lowest range of a family of musical
instruments)8. bass -(nontechnical name for any of numerous edible marine and freshwater spiny-finned fishes)
04/19/23 Speech and Language Processing - Jurafsky and Martin 34
Representations
Most supervised ML approaches require a very simple representation for the input training data. Vectors of sets of feature/value pairs
I.e. files of comma-separated values
So our first task is to extract training data from a corpus with respect to a particular instance of a target word This typically consists of a characterization of
the window of text surrounding the target
04/19/23 Speech and Language Processing - Jurafsky and Martin 35
Representations
This is where ML and NLP intersect If you stick to trivial surface features
that are easy to extract from a text, then most of the work is in the ML system
If you decide to use features that require more analysis (say parse trees) then the ML part may be doing less work (relatively) if these features are truly informative
04/19/23 Speech and Language Processing - Jurafsky and Martin 36
Surface Representations
Collocational and co-occurrence information Collocational
Encode features about the words that appear in specific positions to the right and left of the target word Often limited to the words themselves as well as they’re
part of speech
Co-occurrence Features characterizing the words that occur anywhere
in the window regardless of position Typically limited to frequency counts
04/19/23 Speech and Language Processing - Jurafsky and Martin 37
Examples
Example text (WSJ) An electric guitar and bass player stand
off to one side not really part of the scene, just as a sort of nod to gringo expectations perhaps
Assume a window of +/- 2 from the target
04/19/23 Speech and Language Processing - Jurafsky and Martin 38
Examples
Example text An electric guitar and bass player stand
off to one side not really part of the scene, just as a sort of nod to gringo expectations perhaps
Assume a window of +/- 2 from the target
04/19/23 Speech and Language Processing - Jurafsky and Martin 39
Collocational
Position-specific information about the words in the window
guitar and bass player stand [guitar, NN, and, CJC, player, NN, stand,
VVB] In other words, a vector consisting of [position n word, position n part-of-
speech…]
04/19/23 Speech and Language Processing - Jurafsky and Martin 40
Co-occurrence
Information about the words that occur within the window.
First derive a set of terms to place in the vector.
Then note how often each of those terms occurs in a given window.
04/19/23 Speech and Language Processing - Jurafsky and Martin 41
Co-Occurrence Example
Assume we’ve settled on a possible vocabulary of 12 words that includes guitar and player but not and and stand
guitar and bass player stand [0,0,0,1,0,0,0,0,0,1,0,0]
04/19/23 Speech and Language Processing - Jurafsky and Martin 42
Classifiers
Once we cast the WSD problem as a classification problem, then all sorts of techniques are possible Naïve Bayes (the right thing to try first) Decision lists Decision trees MaxEnt Support vector machines Nearest neighbor methods…
04/19/23 Speech and Language Processing - Jurafsky and Martin 43
Classifiers
The choice of technique, in part, depends on the set of features that have been used Some techniques work better/worse with
features with numerical values Some techniques work better/worse with
features that have large numbers of possible values For example, the feature the word to the left
has a fairly large number of possible values
04/19/23 Speech and Language Processing - Jurafsky and Martin 44
Naïve Bayes
Argmax P(sense|feature vector) Rewriting with Bayes and assuming
independence of the features
04/19/23 Speech and Language Processing - Jurafsky and Martin 45
Naïve Bayes
P(s) … just the prior of that sense. Just as with part of speech tagging, not
all senses will occur with equal frequency
P(vj|s)… conditional probability of some particular feature/value combination given a particular sense
You can get both of these from a tagged corpus with the features encoded
04/19/23 Speech and Language Processing - Jurafsky and Martin 46
Naïve Bayes Test
On a corpus of examples of uses of the word line, naïve Bayes achieved about 73% correct
Good?
04/19/23 Speech and Language Processing - Jurafsky and Martin 47
Bootstrapping
What if you don’t have enough data to train a system…
Bootstrap Pick a word that you as an analyst think
will co-occur with your target word in particular sense
Grep through your corpus for your target word and the hypothesized word
Assume that the target tag is the right one
04/19/23 Speech and Language Processing - Jurafsky and Martin 48
Bootstrapping
For bass Assume the word play occurs with the
music sense and the word fish occurs with the fish sense
Grep
04/19/23 Speech and Language Processing - Jurafsky and Martin 49
Bass Results
Assume these results are correct and use them as training data.
Assume these results are correct and use them as training data.
04/19/23 Speech and Language Processing - Jurafsky and Martin 50
Problems
Given these general ML approaches, how many classifiers do I need to perform WSD robustly? One for each ambiguous word in the
language How do you decide what set of
tags/labels/senses to use for a given word? Depends on the application
04/19/23 Speech and Language Processing - Jurafsky and Martin 51
WordNet Bass
Tagging with this set of senses is an impossibly hard task that’s probably overkill for any realistic application
1. bass - (the lowest part of the musical range)2. bass, bass part - (the lowest part in polyphonic music)3. bass, basso - (an adult male singer with the lowest voice)4. sea bass, bass - (flesh of lean-fleshed saltwater fish of the family
Serranidae)5. freshwater bass, bass - (any of various North American lean-fleshed
freshwater fishes especially of the genus Micropterus)6. bass, bass voice, basso - (the lowest adult male singing voice)7. bass - (the member with the lowest range of a family of musical
instruments)8. bass -(nontechnical name for any of numerous edible marine and freshwater spiny-finned fishes)
Possible Applications
Tagging to improve Translation
Tag sets are specific to the language pairs
Q/A and Information retrieval (search) Tag sets are sensitive to frequent query
terms In these cases, you need to be able to
disambiguate on both sides effectively Not always easy with queries
“bat care”
04/19/23 Speech and Language Processing - Jurafsky and Martin 52
Quiz Roadmap
For parsing/grammars Grammars, trees, derivations, treebanks Algorithms...
CKY Probabilistic CKY
What’s the model? How do you get the parameters? How do you decode (argmax parse)
Why is the basic PCFG model weak? How can you improve it
04/19/23 Speech and Language Processing - Jurafsky and Martin 53
Quiz Roadmap
For FOL/Compositional semantics stuff there are 2 issues Representing meaning Composing meanings
So I can ask questions like... Here’s a sentence give me a valid reasonable FOL
representation for it Here’s a grammar with semantic attachments; give me the
FOL expression for the following sentence Here’s a grammar and a sentence and a desired FOL
output; give me semantic attachments that would deliver that FOL for that sentence given that grammar
04/19/23 Speech and Language Processing - Jurafsky and Martin 54
Quiz 2
Chapter 12: Skip 12.7.2, 12.8, and 12.9
Chapter 13: Skip 13.4.2, 13.4.3 Chapter 14: Skip 14.6.2, 14.8, 14.9,
14.10 Chapter 17: Skip 17.4.2, 17.5, 17.6 Chapter 18: Skip 18.3.2, 18.4, 18.5
and 18.6
04/19/23 Speech and Language Processing - Jurafsky and Martin 55