CS 224U / LINGUIST 288/188: Natural Language Understanding (Jurafsky and Manning)
Lecture 2: WordNet, word similarity, and sense relations
Dan Jurafsky, Sep 27, 2007




Outline: Mainly useful background for today’s papers

1) Lexical semantics and word-word relations
2) WordNet
3) Word similarity: thesaurus-based measures
4) Word similarity: distributional measures
5) Background: dependency parsing


Three Perspectives on Meaning

1. Lexical semantics: the meanings of individual words

2. Formal semantics (or compositional semantics or sentential semantics): how those meanings combine to make meanings for individual sentences or utterances

3. Discourse or pragmatics: how those meanings combine with each other and with other facts about various kinds of context to make meanings for a text or discourse

– Dialog or conversation is often lumped together with discourse


Relationships between word meanings

Homonymy
Polysemy
Synonymy
Antonymy
Hypernymy
Hyponymy
Meronymy


Homonymy

Homonymy: lexemes that share a form

– Phonological, orthographic, or both

but have unrelated, distinct meanings. Clear examples:

– bat (wooden stick-like thing) vs. bat (flying scary mammal thing)
– or bank (financial institution) vs. bank (riverside)

Can be homophones, homographs, or both:
– Homophones: write and right, piece and peace


Homonymy causes problems for NLP applications

Text-to-speech: same orthographic form but different phonological form

– bass vs. bass

Information retrieval: different meanings, same orthographic form

– QUERY: bat care

Machine translation
Speech recognition

Why?


Polysemy

The bank is constructed from red brick.
I withdrew the money from the bank.
Are those the same sense? Or consider the following WSJ example:

While some banks furnish sperm only to married women, others are less restrictive.

Which sense of bank is this?
– Is it distinct from (homonymous with) the river bank sense?
– How about the savings bank sense?


Polysemy

A single lexeme with multiple related meanings (bank the building, bank the financial institution). Most non-rare words have multiple meanings.

The number of meanings a word has is related to its frequency.
Verbs tend more toward polysemy than other parts of speech.
Distinguishing polysemy from homonymy isn't always easy (or necessary).


Metaphor and Metonymy

Specific types of polysemy

Metaphor:
– Germany will pull Slovenia out of its economic slump.
– I spent 2 hours on that homework.

Metonymy:
– The White House announced yesterday.
– This chapter talks about part-of-speech tagging.
– bank (building) and bank (financial institution)


Synonyms

Words that have the same meaning in some or all contexts:

filbert / hazelnut
couch / sofa
big / large
automobile / car
vomit / throw up
water / H2O

Two lexemes are synonyms if they can be successfully substituted for each other in all situations.

If so, they have the same propositional meaning.


Synonyms

But there are few (or no) examples of perfect synonymy.

Why should that be? Even if many aspects of meaning are identical, a substitution still may not preserve acceptability, for reasons of politeness, slang, register, genre, etc.

Example: water and H2O


Some terminology

Lemmas and wordforms:
A lexeme is an abstract pairing of meaning and form.
A lemma or citation form is the grammatical form that is used to represent a lexeme.

– Carpet is the lemma for carpets.
– Dormir is the lemma for duermes.

Specific surface forms like carpets, sung, and duermes are called wordforms.

The lemma bank has two senses:
– Instead, a bank can hold the investments in a custodial account in the client's name.
– But as agriculture burgeons on the east bank, the river will shrink even more.

A sense is a discrete representation of one aspect of the meaning of a word.


Synonymy is a relation between senses rather than words

Consider the words big and large. Are they synonyms?

– How big is that plane?
– Would I be flying on a large or small plane?

How about here:
– Miss Nelson, for instance, became a kind of big sister to Benjamin.
– ?Miss Nelson, for instance, became a kind of large sister to Benjamin.

Why? big has a sense that means being older, or grown up; large lacks this sense.


Antonyms

Senses that are opposites with respect to one feature of their meaning. Otherwise, they are very similar!

dark / light
short / long
hot / cold
up / down
in / out

More formally, antonyms can:
– define a binary opposition or sit at opposite ends of a scale (long/short, fast/slow)
– be reversives: rise/fall, up/down


Hyponymy

One sense is a hyponym of another if the first sense is more specific, denoting a subclass of the other:

– car is a hyponym of vehicle
– dog is a hyponym of animal
– mango is a hyponym of fruit

Conversely:
– vehicle is a hypernym/superordinate of car
– animal is a hypernym of dog
– fruit is a hypernym of mango

superordinate: vehicle | fruit | furniture | mammal
hyponym:       car     | mango | chair     | dog


Hypernymy more formally

Extensional: the class denoted by the superordinate extensionally includes the class denoted by the hyponym.

Entailment: a sense A is a hyponym of sense B if being an A entails being a B.

Hyponymy is usually transitive: (A hypo B and B hypo C) entails (A hypo C).


II. WordNet

A hierarchically organized lexical database: an on-line thesaurus plus aspects of a dictionary

– Versions for other languages are under development

Category    Unique forms
Noun        117,097
Verb        11,488
Adjective   22,141
Adverb      4,601


WordNet

Where it is: http://www.cogsci.princeton.edu/cgi-bin/webwn


Format of WordNet Entries


WordNet Noun Relations


WordNet Verb Relations


WordNet Hierarchies


How is “sense” defined in WordNet?

The set of near-synonyms for a WordNet sense is called a synset (synonym set); it's WordNet's version of a sense or a concept.

Example: chump as a noun meaning

'a person who is gullible and easy to take advantage of'

Each of the lemmas in this synset (chump, fool, gull, mark, patsy, fall guy, sucker, soft touch, mug) shares this same gloss. Thus for WordNet, the meaning of this sense of chump is this list.
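A quick way to poke at a synset like this is NLTK's WordNet interface; a minimal sketch, assuming NLTK and its WordNet data are installed (NLTK is not part of these slides):

```python
# List the noun synsets for "chump"; each synset is one sense:
# a gloss plus the set of lemmas that share it.
# Setup assumed: pip install nltk; nltk.download('wordnet')
from nltk.corpus import wordnet as wn

for synset in wn.synsets('chump', pos=wn.NOUN):
    print(synset.name(), [lemma.name() for lemma in synset.lemmas()])
    print('  gloss:', synset.definition())
```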


Word Similarity

Synonymy is a binary relation: two words are either synonymous or not.

We want a looser metric: word similarity or word distance.

Two words are more similar if they share more features of meaning.

Actually these are really relations between senses: instead of saying "bank is like fund", we say

– bank1 is similar to fund3
– bank2 is similar to slope5

We'll compute similarity over both words and senses.


Why word similarity

Spell checking
Information retrieval
Question answering
Machine translation
Natural language generation
Language modeling
Automatic essay grading


Two classes of algorithms

Thesaurus-based algorithms: based on whether words are "nearby" in WordNet

Distributional algorithms: comparing words based on their distributional contexts in corpora


Thesaurus-based word similarity

We could use anything in the thesaurus:
– meronymy, hyponymy, troponymy
– glosses and example sentences
– derivational relations and sentence frames

In practice, by "thesaurus-based" we usually mean these two cues:
– the is-a/subsumption/hypernym hierarchy
– sometimes the glosses too

Word similarity versus word relatedness: similar words are near-synonyms; related words could be related in any way.

– car, gasoline: related, not similar
– car, bicycle: similar


Path based similarity

Two words are similar if they are nearby in the thesaurus hierarchy (i.e., there is a short path between them).


Refinements to path-based similarity

pathlen(c1, c2) = number of edges in the shortest path in the thesaurus graph between the sense nodes c1 and c2

sim_path(c1, c2) = -log pathlen(c1, c2)

wordsim(w1, w2) = max over c1 ∈ senses(w1), c2 ∈ senses(w2) of sim(c1, c2)
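A minimal sketch of these three definitions over NLTK's WordNet graph (NLTK is an assumption; the slides don't prescribe a toolkit, and note that NLTK's own path_similarity uses 1/(pathlen+1), so the slide's -log pathlen is computed by hand here):

```python
import math
from nltk.corpus import wordnet as wn

def sim_path(c1, c2):
    # pathlen(c1, c2) = number of edges on the shortest path between senses
    pathlen = c1.shortest_path_distance(c2)
    if not pathlen:            # identical senses, or no connecting path
        return None
    return -math.log(pathlen)

def wordsim(w1, w2):
    # max over c1 in senses(w1), c2 in senses(w2) of sim(c1, c2)
    sims = (sim_path(c1, c2)
            for c1 in wn.synsets(w1, pos=wn.NOUN)
            for c2 in wn.synsets(w2, pos=wn.NOUN))
    return max((s for s in sims if s is not None), default=None)

print(wordsim('nickel', 'money'))
```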


Problem with basic path-based similarity

Assumes each link represents a uniform distance: nickel to money seems closer than nickel to standard.

Instead, we want a metric that lets us represent the cost of each edge independently.


Information content similarity metrics

Let's define P(c) as the probability that a randomly selected word in a corpus is an instance of concept c.

Formally: there is a distinct random variable, ranging over words, associated with each concept in the hierarchy.

P(root) = 1, and the lower a node is in the hierarchy, the lower its probability.


Information content similarity

Train by counting in a corpus: one instance of "dime" counts toward the frequency of coin, currency, standard, etc.

More formally:

P(c) = ( Σ_{w ∈ words(c)} count(w) ) / N
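A toy sketch of that training step (hand-rolled, and simplified: it gives a full count to every sense of a token, glossing over how real implementations split or select counts):

```python
from collections import Counter
from nltk.corpus import wordnet as wn

def concept_probs(tokens):
    counts, n = Counter(), 0
    for w in tokens:
        senses = wn.synsets(w, pos=wn.NOUN)
        if not senses:
            continue
        n += 1
        for s in senses:
            counts[s] += 1
            # one instance of "dime" also counts toward coin, currency, ...
            for ancestor in s.closure(lambda c: c.hypernyms()):
                counts[ancestor] += 1
    # P(c) = sum of count(w) for w in words(c), divided by N
    return {c: cnt / n for c, cnt in counts.items()}

probs = concept_probs("the dime and the nickel are coins".split())
```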


Information content similarity

WordNet hierarchy augmented with probabilities P(c)


Information content: definitions

Information content: IC(c) = -log P(c)

Lowest common subsumer: LCS(c1, c2) = the lowest node in the hierarchy that subsumes (is a hypernym of) both c1 and c2

We are now ready to see how to use information content IC as a similarity metric.


Resnik method

The similarity between two words is related to their common information: the more two words have in common, the more similar they are.

Resnik: measure the common information as the information content of the lowest common subsumer of the two nodes:

sim_resnik(c1, c2) = -log P(LCS(c1, c2))
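NLTK ships this measure together with precomputed information-content tables; a sketch, assuming the Brown-corpus IC file from nltk_data (sense numbers may differ across WordNet versions):

```python
from nltk.corpus import wordnet as wn, wordnet_ic

brown_ic = wordnet_ic.ic('ic-brown.dat')   # nltk.download('wordnet_ic')
coin = wn.synset('nickel.n.02')            # the coin sense of "nickel"
money = wn.synset('money.n.01')
# sim_resnik(c1, c2) = -log P(LCS(c1, c2))
print(coin.res_similarity(money, brown_ic))
```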


Dekang Lin method

Similarity between A and B needs to do more than measure common information. The more differences between A and B, the less similar they are:

– Commonality: the more information A and B have in common, the more similar they are.
– Difference: the more differences between the information in A and B, the less similar they are.

Commonality: IC(common(A, B))
Difference: IC(description(A, B)) - IC(common(A, B))


Dekang Lin method

Similarity theorem: the similarity between A and B is measured by the ratio between the amount of information needed to state the commonality of A and B and the information needed to fully describe what A and B are:

sim_Lin(A, B) = log P(common(A, B)) / log P(description(A, B))

Lin furthermore shows (modifying Resnik) that the information in common is twice the information content of the LCS.


Lin similarity function

sim_Lin(c1, c2) = 2 log P(LCS(c1, c2)) / (log P(c1) + log P(c2))

sim_Lin(hill, coast) = 2 log P(geological-formation) / (log P(hill) + log P(coast))

= .59
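The same formula is available as lin_similarity in NLTK; a sketch (the exact number depends on which corpus the IC table was trained on, so it need not match the .59 above):

```python
from nltk.corpus import wordnet as wn, wordnet_ic

brown_ic = wordnet_ic.ic('ic-brown.dat')
hill, coast = wn.synset('hill.n.01'), wn.synset('coast.n.01')
# 2 log P(LCS(hill, coast)) / (log P(hill) + log P(coast))
print(hill.lin_similarity(coast, brown_ic))
```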


Extended Lesk (gloss overlap)

Two concepts are similar if their glosses contain similar words:

– drawing paper: paper that is specially prepared for use in drafting
– decal: the art of transferring designs from specially prepared paper to a wood or glass or metal surface

For each n-word phrase that occurs in both glosses, add a score of n²:

– "paper" and "specially prepared" give 1 + 4 = 5 …
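A hand-rolled sketch of just the n² scoring on the two glosses above (real extended gloss overlap also compares glosses of related synsets and handles stopwords and punctuation):

```python
def overlap_score(gloss1, gloss2):
    """Each maximal n-word phrase shared by the glosses adds n**2."""
    w1, w2 = gloss1.lower().split(), gloss2.lower().split()
    used1, used2 = [False] * len(w1), [False] * len(w2)
    score = 0
    for n in range(min(len(w1), len(w2)), 0, -1):   # longest phrases first
        for i in range(len(w1) - n + 1):
            if any(used1[i:i + n]):
                continue
            for j in range(len(w2) - n + 1):
                if w1[i:i + n] == w2[j:j + n] and not any(used2[j:j + n]):
                    score += n * n
                    used1[i:i + n] = [True] * n     # don't count words twice
                    used2[j:j + n] = [True] * n
                    break
    return score

drawing_paper = "paper that is specially prepared for use in drafting"
decal = ("the art of transferring designs from specially prepared paper "
         "to a wood or glass or metal surface")
print(overlap_score(drawing_paper, decal))   # "specially prepared" + "paper" = 4 + 1 = 5
```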


Summary: thesaurus-based similarity


Evaluating thesaurus-based similarity

Intrinsic evaluation: correlation coefficient between

– algorithm scores
– word similarity ratings from humans

Extrinsic (task-based, end-to-end) evaluation: embed in some end application

– Malapropism (spelling error) detection
– WSD
– Essay grading
– Language modeling in some application
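A sketch of the intrinsic setup with made-up numbers (scipy is an assumption; rank correlation such as Spearman's rho is one common choice):

```python
from scipy.stats import spearmanr

# hypothetical human ratings and algorithm scores for five word pairs
human  = [9.2, 8.5, 7.0, 3.3, 0.8]
system = [0.71, 0.65, 0.60, 0.30, 0.12]
rho, _ = spearmanr(human, system)
print(f"Spearman rho = {rho:.2f}")   # 1.00 here: the rankings agree exactly
```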


Problems with thesaurus-based methods

We don't have a thesaurus for every language. Even when we do, many words are missing. And these methods rely on hyponym information, which is strong for nouns but lacking for adjectives and even verbs.

Alternative: distributional methods for word similarity


Distributional methods for word similarity

Firth (1957): "You shall know a word by the company it keeps!"

A Nida example noted by Lin:

– A bottle of tezgüino is on the table.
– Everybody likes tezgüino.
– Tezgüino makes you drunk.
– We make tezgüino out of corn.

Intuition: just from these contexts, a human could guess the meaning of tezgüino. So we should look at the surrounding contexts and see what other words occur in similar contexts.


Context vector

Consider a target word w. Suppose we had one binary feature f_i for each of the N words v_i in the lexicon, meaning "word v_i occurs in the neighborhood of w":

w = (f1, f2, f3, …, fN)

If w = tezgüino, v1 = bottle, v2 = drunk, v3 = matrix:

w = (1, 1, 0, …)
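A toy version of that binary vector, built from the four tezgüino sentences above (a sketch; real systems use windowed counts over large corpora):

```python
corpus = [
    "a bottle of tezgüino is on the table",
    "everybody likes tezgüino",
    "tezgüino makes you drunk",
    "we make tezgüino out of corn",
]
vocab = sorted({w for sent in corpus for w in sent.split()})

def context_vector(target):
    # f_i = 1 iff vocab word v_i ever occurs in a sentence with the target
    neighbors = {w for sent in corpus if target in sent.split()
                 for w in sent.split() if w != target}
    return [1 if v in neighbors else 0 for v in vocab]

print(context_vector("tezgüino"))
```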


Intuition

Define two words by these sparse feature vectors, apply a vector distance metric, and say that two words are similar if their vectors are similar.


Distributional similarity

So we just need to specify three things:

1. How the co-occurrence terms are defined
2. How terms are weighted (frequency? logs? mutual information?)
3. What vector distance metric we should use (cosine? Euclidean distance?)


Defining co-occurrence vectors

We could use windows of neighboring words (bag-of-words), generally removing stopwords.

But the vectors are still very sparse. So instead of using ALL the words in the neighborhood, let's use just the words occurring in particular relations.


Defining co-occurrence vectors

Zellig Harris (1968): "The meaning of entities, and the meaning of grammatical relations among them, is related to the restriction of combinations of these entities relative to other entities."

Idea: parse the sentence and extract syntactic dependencies:


Quick background: Dependency Parsing

Among the earliest kinds of parsers in NLP. Drew linguistic insights from the work of L. Tesnière (1959). David Hays, one of the founders of computational linguistics, built an early (first?) dependency parser (Hays 1962). The idea of "parsing" into subject and predicate dates back to the ancient Greek and Indian grammarians.

A sentence is parsed by relating each word to other words in the sentence which depend on it.


A sample dependency parse


Dependency parsers

MINIPAR is Lin's parser: http://www.cs.ualberta.ca/~lindek/minipar.htm

Another one is the Link Grammar parser: http://www.link.cs.cmu.edu/link/

Standard "CFG" parsers like the Stanford parser (http://www-nlp.stanford.edu/software/lex-parser.shtml) can also produce dependency representations, as follows.


The relationship between a CFG parse and a dependency parse (1)


The relationship between a CFG parse and a dependency parse (2)


Conversion from CFG to dependency parse

CFGs include "head rules":
– The head of a noun phrase is a noun.
– The head of a verb phrase is a verb.
– Etc.

The head rules can be used to extract a dependency parse from a CFG parse (follow the heads).


Popping back: Co-occurrence vectors based on dependencies

For the word "cell": a vector of N x R features, where R is the number of dependency relations
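A sketch of what those features look like; the triples below are hand-written stand-ins for real parser output (e.g. from MINIPAR), and each feature is a (relation, word) pair:

```python
from collections import Counter

# (target word, relation, context word) triples, as a parser might emit them
triples = [
    ("cell", "subj-of", "absorb"),
    ("cell", "obj-of", "attack"),
    ("cell", "nmod", "architecture"),
    ("cell", "obj-of", "attack"),
]

# feature vector for "cell": counts over (relation, word) pairs
features = Counter((rel, w2) for w1, rel, w2 in triples if w1 == "cell")
print(features)   # Counter({('obj-of', 'attack'): 2, ...})
```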


2. Weighting the counts (“Measures of association with context”)

We have been using the frequency of a feature as its weight or value, but we could use any function of this frequency.

Consider one feature, f = (r, w') = (obj-of, attack):

P(f|w) = count(f, w) / count(w)

assoc_prob(w, f) = P(f|w)


Weighting: Mutual Information

Mutual information between two random variables X and Y:

I(X; Y) = Σ_x Σ_y P(x, y) log ( P(x, y) / (P(x) P(y)) )

Pointwise mutual information: a measure of how often two events x and y occur together, compared with what we would expect if they were independent:

PMI(x, y) = log ( P(x, y) / (P(x) P(y)) )


Weighting: Mutual Information

Pointwise mutual information: a measure of how often two events x and y occur together, compared with what we would expect if they were independent.

PMI between a target word w and a feature f:

assoc_PMI(w, f) = log ( P(w, f) / (P(w) P(f)) )
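A minimal sketch of PMI weighting from raw (word, feature) counts; the counts here are invented for illustration:

```python
import math
from collections import Counter

# count(w, f): how often word w occurs with feature f
pair_counts = Counter({("cell", ("obj-of", "attack")): 2,
                       ("cell", ("nmod", "architecture")): 1,
                       ("engineer", ("obj-of", "attack")): 1})
N = sum(pair_counts.values())
w_counts, f_counts = Counter(), Counter()
for (w, f), c in pair_counts.items():
    w_counts[w] += c
    f_counts[f] += c

def assoc_pmi(w, f):
    # log P(w, f) / (P(w) P(f))
    p_wf = pair_counts[(w, f)] / N
    return math.log2(p_wf / ((w_counts[w] / N) * (f_counts[f] / N)))

print(assoc_pmi("cell", ("obj-of", "attack")))
```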


Mutual information intuition

Objects of the verb drink


Lin is a variant on PMI

Pointwise mutual information compares how often w and f occur together with what we would expect if they were independent:

assoc_PMI(w, f) = log ( P(w, f) / (P(w) P(f)) )

The Lin measure breaks down the expected value for P(f) differently.


Summary: weightings

See Manning and Schütze (1999) for more.


3. Defining similarity between vectors
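One common choice is the cosine; a minimal sketch over toy count vectors (dense lists here, though real context vectors are sparse):

```python
import math

def cosine(v, w):
    # dot product divided by the product of the vector lengths
    dot = sum(a * b for a, b in zip(v, w))
    norm = math.sqrt(sum(a * a for a in v)) * math.sqrt(sum(b * b for b in w))
    return dot / norm if norm else 0.0

print(cosine([1, 1, 0, 2], [1, 0, 0, 1]))   # ~0.87
```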


Summary of similarity measures


Evaluating similarity

Intrinsic evaluation: correlation coefficient between

– algorithm scores
– word similarity ratings from humans

Extrinsic (task-based, end-to-end) evaluation:
– Malapropism (spelling error) detection
– WSD
– Essay grading
– Taking TOEFL multiple-choice vocabulary tests
– Language modeling in some application


An example of detected plagiarism


What about other relations?

Similarity can be used for adding new links to a thesaurus, and Lin used thesaurus induction as his motivation. But thesauruses have more structure than just similarity, in particular hyponym/hypernym structure.


Detecting hyponymy and other relations

Could we discover new hyponyms and add them to a taxonomy under the appropriate hypernym? Why is this important? Some examples from Rion Snow:

– "insulin" and "progesterone" are in WordNet 2.1, but "leptin" and "pregnenolone" are not.
– "combustibility" and "navigability" are, but not "affordability", "reusability", or "extensibility".
– "HTML" and "SGML", but not "XML" or "XHTML".
– "Google" and "Yahoo", but not "Microsoft" or "IBM".

This unknown-word problem occurs throughout NLP.


Hearst Approach

Agar is a substance prepared from a mixture of red algae, such as Gelidium, for laboratory or industrial use.

What does Gelidium mean? How do you know?


Hearst’s hand-built patterns
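A crude sketch of one such pattern, "NP_0 such as NP_1", as a regex; real Hearst patterns also cover lists ("X, Y, and other Z", "Z including X", …) and match noun phrases rather than single words:

```python
import re

# hypernym word, optional comma, "such as", first hyponym word
pattern = re.compile(r"(\w+),? such as (\w+)")

sentence = ("Agar is a substance prepared from a mixture of red algae, "
            "such as Gelidium, for laboratory or industrial use.")
for hypernym, hyponym in pattern.findall(sentence):
    print(f"{hyponym} is a kind of {hypernym}")   # Gelidium is a kind of algae
```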


What to do for the data assignments

Some things people did last year on the WordNet assignment:

Notice interesting inconsistencies or incompleteness in WordNet:
– There is no link in WordNet between the synsets for "kitten" or "kitty" and "cat".
– But the entry for "puppy" lists "dog" as a direct hypernym, though it does not list "young mammal" as one.
– The "sister term" relation is nontransitive and nonsymmetric.
– The "entailment" relation is incomplete: "snore" entails "sleep", but "die" doesn't entail "live".
– Antonymy is not a reflexive relation in WordNet.

Notice potential problems in WordNet:
– Lots of rare senses
– Lots of senses that are very, very similar and hard to distinguish
– Lack of rich detail about each entry (the focus is only on relational information)


Notice interesting things:
– It appears that WordNet verbs do not follow as strict a hierarchy as the nouns.
– What percentage of words have one sense?