natural language processing lecture 22—11/14/2013 jim martin

Natural Language Processing

Lecture 22—11/14/2013Jim Martin

Semantics

What we’ve been discussing has mainly concerned sentence meanings. These refer to propositions, things that can be said to be true or false with respect to some model (world).

Today we’ll look at a related notion called lexical semantics. Roughly what we mean when we talk about the meaning of a word.

04/19/23 Speech and Language Processing - Jurafsky and Martin 2


Today

Lexical Semantics Word sense disambiguation Semantic role labeling


Back to Words: Meanings

In our discussion of compositional semantics, we only scratched surface of word meaning

The only place we did anything was with the complex attachments to verbs and determiners (quantifiers)

Let’s start by looking at yet another resource


WordNet

WordNet is a database of facts about words Here we’ll only use the English WordNet; there

are others Meanings and the relations among them

www.cogsci.princeton.edu/~wn Currently about 100,000 nouns, 11,000 verbs,

20,000 adjectives, and 4,000 adverbs Arranged in separate files (DBs)


WordNet Relations


WordNet Hierarchies


Inside Words

WordNet relations are called paradigmatic relations They connect lexemes together in

particular ways but don’t say anything about what the meaning representation of a particular lexeme should consist of

That’s what I mean by inside word meanings.


Inside Words

Various approaches have been followed to describe the internal structure of individual word meanings. We’ll look at only a few… Thematic roles in predicate-bearing

lexemes Selection restrictions on thematic roles


Inside Words

In our semantics examples, we used various FOL predicates to capture various aspects of events, including the notion of roles Havers, takers, givers, servers, etc. All specific to each verb/predicate.

Thematic roles Thematic roles are semantic generalizations over the

specific roles that occur with specific verbs. I.e. Takers, givers, eaters, makers, doers, killers, all have

something in common -er They’re all the agents of the actions

We can generalize across other roles as well to come up with a small finite set of such roles


Thematic Roles


Thematic Roles

Takes some of the work away from the verbs. It’s not the case that every verb is

unique and has to completely specify how all of its arguments behave

Provides a locus for organizing semantic processing

It permits us to distinguish near surface-level semantics from deeper semantics


Linking

Thematic roles, syntactic categories and their positions in larger syntactic structures are all intertwined in complicated ways. For example… AGENTS often appear as grammatical

subjects In a VP->V NP rule, the NP is often a THEME

So how might we go about studying these ideas? Get a corpus Do some annotation

Resources

For parsing we had TreeBanks For lexical semantics we have

WordNets So for thematic roles....



Resources

There are 2 major English resources out there with thematic-role-like data PropBank

Layered on the Penn TreeBank Small number (25ish) labels For each semantic predicate, identify the

constituents in the tree that are arguments to that predicate and

Label each with its appropriate role

FrameNet Based on a theory of semantics known as

frame semantics. Large number of frame-specific labels

Propbank Example

Cover (as in smear) Arg0 (agent: the causer of the smearing) Arg1 (theme: “thing covered”) Arg2 (covering: “stuff being smeared”)

[McAdams and crew] covered [the floors] with [checked linoleum].


Propbank

Arg 0 and Arg 1 roughly correspond to the notions of agent and theme Causer and thing most directly effected

The remaining args are intended to be verb specific So there really aren’t a small finite set of

roles Arg3 for “cover” isn’t the same as the

Arg3 for “give”...



Problems

What exactly is a role? What’s the right set of roles? Are such roles universals? Are these roles atomic?

I.e. Agents Animate, Volitional, Direct causers, etc

Can we automatically label syntactic constituents with thematic roles? Semantic Role Labeling (coming up)


Selection Restrictions

Consider.... I want to eat someplace near campus


Selection Restrictions

Consider.... I want to eat someplace near campus Using thematic roles we can now say

that eat is a predicate that has an AGENT and a THEME What else?

And that the AGENT must be capable of eating and the THEME must be something typically capable of being eaten


As Logical Statements

For eat… Eating(e) ^Agent(e,x)^

Theme(e,y)^Food(y)(adding in all the right quantifiers and

lambdas)


Back to WordNet

Use WordNet hyponyms (type) to encode the selection restrictions


Specificity of Restrictions

Consider the verbs imagine, lift and diagonalize in the following examples To diagonalize a matrix is to find its

eigenvalues Atlantis lifted Galileo from the pad Imagine a tennis game

What can you say about THEME in each with respect to the verb?

Some will be high up in the WordNet hierarchy, others not so high…


Problems

Unfortunately, verbs are polysemous and language is creative… WSJ examples… … ate glass on an empty stomach

accompanied only by water and tea you can’t eat gold for lunch if you’re

hungry … get it to try to eat Afghanistan


Solutions

Eat glass Not really a problem. It is actually about an eating

event

Eat gold Also about eating, and the can’t creates a scope that

permits the THEME to not be edible

Eat Afghanistan This is harder, its not really about eating at all


Discovering the Restrictions

Instead of hand-coding the restrictions for each verb, can we discover a verb’s restrictions by using a corpus and WordNet?1. Parse sentences and find heads2. Label the thematic roles3. Collect statistics on the co-occurrence of

particular headwords with particular thematic roles

4. Use the WordNet hypernym structure to find the most meaningful level to use as a restriction


WSD and Selection Restrictions

Word sense disambiguation refers to the process of selecting the right sense for a word from among the senses that the word is known to have

Semantic selection restrictions can be used to disambiguate Ambiguous arguments to unambiguous predicates Ambiguous predicates with unambiguous

arguments Ambiguity all around



Ambiguous arguments Prepare a dish Wash a dish

Ambiguous predicates Serve Denver Serve breakfast

Both Serves vegetarian dishes



This approach is complementary to the compositional analysis approach. You need a parse tree and some form of

predicate-argument analysis derived from The tree and its attachments All the word senses coming up from the

lexemes at the leaves of the tree Ill-formed analyses are eliminated by noting

any selection restriction violations


Problems

As we saw earlier, selection restrictions are violated all the time.

This doesn’t mean that the sentences are ill-formed or preferred less than others.

This approach needs some way of categorizing and dealing with the various ways that restrictions can be violated


Supervised ML Approaches

That’s too hard… try something more data-oriented

In supervised machine learning approaches, a training corpus of words tagged in context with their sense is used to train a classifier that can tag words in new text (that reflects the training text)


WSD Tags

What’s a tag? A dictionary sense?

For example, for WordNet an instance of “bass” in a text has 8 possible tags or labels (bass1 through bass8).


WordNet Bass

The noun ``bass'' has 8 senses in WordNet

1. bass - (the lowest part of the musical range)2. bass, bass part - (the lowest part in polyphonic music)3. bass, basso - (an adult male singer with the lowest voice)4. sea bass, bass - (flesh of lean-fleshed saltwater fish of the family

Serranidae)5. freshwater bass, bass - (any of various North American lean-

fleshed freshwater fishes especially of the genus Micropterus)6. bass, bass voice, basso - (the lowest adult male singing voice)7. bass - (the member with the lowest range of a family of musical

instruments)8. bass -(nontechnical name for any of numerous edible marine and freshwater spiny-finned fishes)


Representations

Most supervised ML approaches require a very simple representation for the input training data. Vectors of sets of feature/value pairs

I.e. files of comma-separated values

So our first task is to extract training data from a corpus with respect to a particular instance of a target word This typically consists of a characterization of

the window of text surrounding the target


Representations

This is where ML and NLP intersect If you stick to trivial surface features

that are easy to extract from a text, then most of the work is in the ML system

If you decide to use features that require more analysis (say parse trees) then the ML part may be doing less work (relatively) if these features are truly informative


Surface Representations

Collocational and co-occurrence information Collocational

Encode features about the words that appear in specific positions to the right and left of the target word Often limited to the words themselves as well as they’re

part of speech

Co-occurrence Features characterizing the words that occur anywhere

in the window regardless of position Typically limited to frequency counts


Examples

Example text (WSJ) An electric guitar and bass player stand

off to one side not really part of the scene, just as a sort of nod to gringo expectations perhaps

Assume a window of +/- 2 from the target


Examples

Example text An electric guitar and bass player stand

off to one side not really part of the scene, just as a sort of nod to gringo expectations perhaps

Assume a window of +/- 2 from the target


Collocational

Position-specific information about the words in the window

guitar and bass player stand [guitar, NN, and, CJC, player, NN, stand,

VVB] In other words, a vector consisting of [position n word, position n part-of-

speech…]


Co-occurrence

Information about the words that occur within the window.

First derive a set of terms to place in the vector.

Then note how often each of those terms occurs in a given window.


Co-Occurrence Example

Assume we’ve settled on a possible vocabulary of 12 words that includes guitar and player but not and and stand

guitar and bass player stand [0,0,0,1,0,0,0,0,0,1,0,0]


Classifiers

Once we cast the WSD problem as a classification problem, then all sorts of techniques are possible Naïve Bayes (the right thing to try first) Decision lists Decision trees MaxEnt Support vector machines Nearest neighbor methods…


Classifiers

The choice of technique, in part, depends on the set of features that have been used Some techniques work better/worse with

features with numerical values Some techniques work better/worse with

features that have large numbers of possible values For example, the feature the word to the left

has a fairly large number of possible values


Naïve Bayes

Argmax P(sense|feature vector) Rewriting with Bayes and assuming

independence of the features


Naïve Bayes

P(s) … just the prior of that sense. Just as with part of speech tagging, not

all senses will occur with equal frequency

P(vj|s)… conditional probability of some particular feature/value combination given a particular sense

You can get both of these from a tagged corpus with the features encoded


Naïve Bayes Test

On a corpus of examples of uses of the word line, naïve Bayes achieved about 73% correct

Good?


Bootstrapping

What if you don’t have enough data to train a system…

Bootstrap Pick a word that you as an analyst think

will co-occur with your target word in particular sense

Grep through your corpus for your target word and the hypothesized word

Assume that the target tag is the right one


Bootstrapping

For bass Assume the word play occurs with the

music sense and the word fish occurs with the fish sense

Grep


Bass Results

Assume these results are correct and use them as training data.

Assume these results are correct and use them as training data.


Problems

Given these general ML approaches, how many classifiers do I need to perform WSD robustly? One for each ambiguous word in the

language How do you decide what set of

tags/labels/senses to use for a given word? Depends on the application


WordNet Bass

Tagging with this set of senses is an impossibly hard task that’s probably overkill for any realistic application

1. bass - (the lowest part of the musical range)2. bass, bass part - (the lowest part in polyphonic music)3. bass, basso - (an adult male singer with the lowest voice)4. sea bass, bass - (flesh of lean-fleshed saltwater fish of the family

Serranidae)5. freshwater bass, bass - (any of various North American lean-fleshed

freshwater fishes especially of the genus Micropterus)6. bass, bass voice, basso - (the lowest adult male singing voice)7. bass - (the member with the lowest range of a family of musical

instruments)8. bass -(nontechnical name for any of numerous edible marine and freshwater spiny-finned fishes)

Possible Applications

Tagging to improve Translation

Tag sets are specific to the language pairs

Q/A and Information retrieval (search) Tag sets are sensitive to frequent query

terms In these cases, you need to be able to

disambiguate on both sides effectively Not always easy with queries

“bat care”


Quiz Roadmap

For parsing/grammars Grammars, trees, derivations, treebanks Algorithms...

CKY Probabilistic CKY

What’s the model? How do you get the parameters? How do you decode (argmax parse)

Why is the basic PCFG model weak? How can you improve it


Quiz Roadmap

For FOL/Compositional semantics stuff there are 2 issues Representing meaning Composing meanings

So I can ask questions like... Here’s a sentence give me a valid reasonable FOL

representation for it Here’s a grammar with semantic attachments; give me the

FOL expression for the following sentence Here’s a grammar and a sentence and a desired FOL

output; give me semantic attachments that would deliver that FOL for that sentence given that grammar


Quiz 2

Chapter 12: Skip 12.7.2, 12.8, and 12.9

Chapter 13: Skip 13.4.2, 13.4.3 Chapter 14: Skip 14.6.2, 14.8, 14.9,

14.10 Chapter 17: Skip 17.4.2, 17.5, 17.6 Chapter 18: Skip 18.3.2, 18.4, 18.5

and 18.6


Schedule

Tuesday 11/19 Quiz 2 Thursday 11/21 Python help session


natural language processing lecture 22—11/14/2013 jim martin

Documents

language processing

semantic processing

thematic roles thematic

wordnet relations slide

resource slide

jim martin

specific roles

wordnet wordnet