
Page 1:

Lecture 24: Distributional Word Similarity II

Topics: Distributional word similarity example; PMI; context = syntactic dependencies

Readings: NLTK book Chapter 2 (WordNet); Text Chapter 20

April 15, 2013

CSCE 771 Natural Language Processing

Page 2:

Overview

Last Time
Finish up thesaurus-based similarity …
Distributional word similarity

Today
Last lecture's slides 21–
Distributional word similarity II: syntax-based contexts

Readings: Text Chapters 19, 20; NLTK Book Chapter 10

Next Time: Computational Lexical Semantics II

Page 3:

Pointwise Mutual Information (PMI)

Mutual information (Church and Hanks 1989) (eq. 20.36)

Pointwise mutual information (PMI) (Fano 1961) (eq. 20.37)

assoc-PMI (eq. 20.38)
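The equations themselves did not survive the transcript. As a reconstruction, these equation numbers refer to the standard forms from Chapter 20 (not the slide's own rendering):

I(X;Y) = \sum_x \sum_y P(x,y) \log_2 \frac{P(x,y)}{P(x)P(y)}    (20.36)

\mathrm{PMI}(x,y) = \log_2 \frac{P(x,y)}{P(x)P(y)}    (20.37)

\mathrm{assoc\text{-}PMI}(w,f) = \log_2 \frac{P(w,f)}{P(w)P(f)}    (20.38)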

Page 4:

Computing PPMI

Matrix F with W rows (words) and C columns (contexts); f_ij is the frequency of word w_i in context c_j.
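The slide's formulas are garbled in the transcript; the standard PPMI definitions they correspond to are:

p_{ij} = \frac{f_{ij}}{\sum_{i=1}^{W}\sum_{j=1}^{C} f_{ij}}, \qquad p_{i*} = \sum_{j=1}^{C} p_{ij}, \qquad p_{*j} = \sum_{i=1}^{W} p_{ij}

\mathrm{PPMI}_{ij} = \max\!\left(0,\; \log_2 \frac{p_{ij}}{p_{i*}\, p_{*j}}\right)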

Page 5:

Example computing PPMI

              computer  data  pinch  result  salt
apricot           0       0     1      0      1
pineapple         0       0     1      0      1
digital           2       1     0      1      0
information       1       6     0      4      0

(Example from Jurafsky & Manning, Word Similarity: Distributional Similarity I.)

With 19 counts in total:
p(w=information, c=data) = 6/19 ≈ .32
p(w=information) = 11/19 ≈ .58
p(c=data) = 7/19 ≈ .37
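As a worked check of the numbers above, a minimal NumPy sketch (helper code written for this transcript, not from the lecture):

import numpy as np

# Co-occurrence counts from the table above (rows: words, cols: contexts).
words = ["apricot", "pineapple", "digital", "information"]
contexts = ["computer", "data", "pinch", "result", "salt"]
F = np.array([[0, 0, 1, 0, 1],
              [0, 0, 1, 0, 1],
              [2, 1, 0, 1, 0],
              [1, 6, 0, 4, 0]], dtype=float)

N = F.sum()                          # 19 total counts
P = F / N                            # joint p(w, c)
pw = P.sum(axis=1, keepdims=True)    # marginal p(w)
pc = P.sum(axis=0, keepdims=True)    # marginal p(c)

with np.errstate(divide="ignore"):   # log2(0) -> -inf, zeroed by the max
    ppmi = np.maximum(np.log2(P / (pw * pc)), 0)

# PPMI(information, data) = log2(.32 / (.58 * .37)) ~= 0.57
print(ppmi[words.index("information"), contexts.index("data")])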


Page 7:

Associations

Page 8:

PMI: More data trumps smarter algorithms

"More data trumps smarter algorithms: Comparing pointwise mutual information with latent semantic analysis," Indiana University, 2009. http://www.indiana.edu/~clcl/Papers/BSC901.pdf

"we demonstrate that this metric
• benefits from training on extremely large amounts of data and
• correlates more closely with human semantic similarity ratings than do publicly available implementations of several more complex models."

Page 9:

Figure 20.10 Co-occurrence vectors based on syntactic dependencies

A dependency parser is a special case of shallow parsing. Identify the dependencies in "I discovered dried tangerines." (20.32):

discover(subject I)
I(subject-of discover)
tangerine(obj-of discover)
tangerine(adj-mod dried)
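As a modern aside (not part of the 2013 lecture), dependency contexts like these can be pulled from a parser such as spaCy. A minimal sketch, assuming the en_core_web_sm model is installed:

import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("I discovered dried tangerines.")

# Emit one feature for the dependent (relation-of head) and one for
# the head (relation dependent), as in the slide's examples.
for tok in doc:
    if tok.dep_ != "ROOT":
        print("%s(%s-of %s)" % (tok.text, tok.dep_, tok.head.text))
        print("%s(%s %s)" % (tok.head.text, tok.dep_, tok.text))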

Page 10:

Defining context using syntactic info
• dependency parsing
• chunking

discover(subject I)        -- S → NP VP
I(subject-of discover)
tangerine(obj-of discover) -- VP → verb NP
tangerine(adj-mod dried)   -- NP → det? ADJ N
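A small NLTK chunking sketch of the kind this slide alludes to (the grammar is illustrative, not taken from the lecture):

import nltk

# POS-tagged sentence from the running example.
sentence = [("I", "PRP"), ("discovered", "VBD"),
            ("dried", "JJ"), ("tangerines", "NNS")]

# NP chunk grammar: optional determiner, any adjectives, then a noun.
grammar = "NP: {<DT>?<JJ>*<NN.*>}"
cp = nltk.RegexpParser(grammar)
print(cp.parse(sentence))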

Page 11:

Figure 20.11 Objects of the verb drink (Hindle 1990, ACL)

• Frequencies: "it", "much", and "anything" are more frequent objects than "wine"
• PMI-Assoc: "wine" is more drinkable

Object          Count  PMI-Assoc
tea               4      11.75
Pepsi             2      11.75
champagne         4      11.75
liquid            2      10.53
beer              5      10.20
wine              2       9.34
water             7       7.65
anything          3       5.15
much              3       2.54
it                3       1.25
<some amount>     2       1.22

http://acl.ldc.upenn.edu/P/P90/P90-1034.pdf

Page 12:

Vectors review

dot product

length

sim-cosine
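The formulas did not survive the transcript; the standard definitions (cf. Chapter 20) are:

\vec{v} \cdot \vec{w} = \sum_{i=1}^{N} v_i w_i

|\vec{v}| = \sqrt{\sum_{i=1}^{N} v_i^2}

\mathrm{sim}_{cosine}(\vec{v},\vec{w}) = \frac{\vec{v} \cdot \vec{w}}{|\vec{v}|\,|\vec{w}|} = \frac{\sum_{i=1}^{N} v_i w_i}{\sqrt{\sum_{i=1}^{N} v_i^2}\,\sqrt{\sum_{i=1}^{N} w_i^2}}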

Page 13:

Figure 20.12 Similarity of Vectors

Page 14:

Fig 20.13 Vector Similarity Summary

Page 15:

Figure 20.14 Hand-built patterns for hypernyms (Hearst 1992)

Finding hypernyms (IS-A links):
(20.58) One example of red algae is Gelidium.
Pattern: one example of *** is a ***
500,000 hits on Google

Semantic drift in bootstrapping
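A toy sketch of matching such a pattern with a regular expression (illustrative only; Hearst's patterns actually operate over parsed noun phrases, not raw strings):

import re

# "one example of <hypernym> is <hyponym>", per the slide's pattern.
pattern = re.compile(r"one example of ([\w ]+?) is (\w+)", re.IGNORECASE)

m = pattern.search("One example of red algae is Gelidium.")
if m:
    hypernym, hyponym = m.group(1), m.group(2)
    print("%s IS-A %s" % (hyponym, hypernym))   # Gelidium IS-A red algae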

Page 16:

Hyponym Learning Algorithm (Snow 2005)

Rely on WordNet to learn large numbers of weak hyponym patterns.

Snow's algorithm (a classifier sketch follows this list):

1. Collect all pairs of WordNet noun concepts <c_i IS-A c_j>.
2. For each pair, collect all sentences containing the pair.
3. Parse the sentences and automatically extract every possible Hearst-style syntactic pattern from the parse tree.
4. Use the large set of patterns as features in a logistic regression classifier.
5. Given each pair, extract features and use the classifier to determine whether the pair is a hypernym/hyponym.

New patterns learned:
NPH like NP          NP is a NPH
NPH called NP        NP, a NPH (appositive)
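A minimal sketch of steps 4–5, with a hypothetical pattern-count feature matrix (scikit-learn's LogisticRegression stands in for whatever implementation Snow et al. used):

import numpy as np
from sklearn.linear_model import LogisticRegression

# Rows: candidate (hyponym, hypernym) pairs.
# Columns: how often each Hearst-style pattern fired for the pair.
X_train = np.array([[3, 0, 1],
                    [0, 0, 0],
                    [1, 2, 0],
                    [0, 1, 0]])
y_train = np.array([1, 0, 1, 0])   # 1 = true IS-A pair (from WordNet)

clf = LogisticRegression().fit(X_train, y_train)

# Step 5: classify a new candidate pair from its pattern counts.
x_new = np.array([[2, 1, 0]])
print(clf.predict(x_new), clf.predict_proba(x_new))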

Page 17:

Vector similarities from Lin 1998

hope (N): optimism 0.141, chance 0.137, expectation 0.137, prospect 0.126, dream 0.119, desire 0.118, fear 0.116, effort 0.111, confidence 0.109, promise 0.108

hope (V): would like 0.158, wish 0.140, …

brief (N): legal brief 0.256, affidavit 0.191, …

brief (A): lengthy 0.256, hour-long 0.191, short 0.174, extended 0.163, …

Full lists on page 667.

Page 18:

Supersenses

26 broad-category "lexicographer class" WordNet labels.
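In NLTK these supersenses are exposed as a synset's lexname (an attribute in the NLTK 2.x used throughout these slides; a method, lexname(), in NLTK 3). A small example:

from nltk.corpus import wordnet as wn

# The supersense is the WordNet lexicographer file name.
print wn.synset('dog.n.01').lexname    # noun.animal
print wn.synset('good.a.01').lexname   # adj.all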

Page 19:

Figure 20.15 Semantic Role Labelling

Page 20:

Figure 20.16

Page 21:

Google "WordNet NLTK" for documentation and examples.

Page 22:

wn01.py

# WordNet examples from nltk.googlecode.com
import nltk
from nltk.corpus import wordnet as wn

motorcar = wn.synset('car.n.01')
types_of_motorcar = motorcar.hyponyms()
types_of_motorcar[26]              # Synset('ambulance.n.01')
print wn.synset('ambulance.n.01')
print sorted([lemma.name for synset in types_of_motorcar
              for lemma in synset.lemmas])

http://nltk.googlecode.com/svn/trunk/doc/howto/wordnet.html

Page 23:

wn01.py continued

print "wn.synsets('dog', pos=wn.VERB)= ", wn.synsets('dog', pos=wn.VERB)

print wn.synset('dog.n.01')
### Synset('dog.n.01')

print wn.synset('dog.n.01').definition
### 'a member of the genus Canis (probably descended from the common wolf) that has been domesticated by man since prehistoric times; occurs in many breeds'

print wn.synset('dog.n.01').examples
### ['the dog barked all night']

Page 24:

wn01.py continued

print wn.synset('dog.n.01').lemmas
### [Lemma('dog.n.01.dog'), Lemma('dog.n.01.domestic_dog'), Lemma('dog.n.01.Canis_familiaris')]

print [lemma.name for lemma in wn.synset('dog.n.01').lemmas]
### ['dog', 'domestic_dog', 'Canis_familiaris']

print wn.lemma('dog.n.01.dog').synset

Page 25:

Section 2: synsets, hypernyms, hyponyms

# Section 2 Synsets, hypernyms, hyponyms
import nltk
from nltk.corpus import wordnet as wn

dog = wn.synset('dog.n.01')

print "dog hypernyms=", dog.hypernyms()
### dog hypernyms= [Synset('domestic_animal.n.01'), Synset('canine.n.02')]

print "dog hyponyms=", dog.hyponyms()
print "dog holonyms=", dog.member_holonyms()
print "dog root_hypernyms=", dog.root_hypernyms()

good = wn.synset('good.a.01')
### print "good.antonyms()=", good.antonyms()   # synsets have no antonyms(); lemmas do
print "good.lemmas[0].antonyms()=", good.lemmas[0].antonyms()

Page 26:

wn03-Lemmas.py

### Section 3 Lemmas
eat = wn.lemma('eat.v.03.eat')
print eat
print eat.key
print eat.count()
print wn.lemma_from_key(eat.key)
print wn.lemma_from_key(eat.key).synset
print wn.lemma_from_key('feebleminded%5:00:00:retarded:00')

for lemma in wn.synset('eat.v.03').lemmas:
    print lemma, lemma.count()

for lemma in wn.lemmas('eat', 'v'):
    print lemma, lemma.count()

vocal = wn.lemma('vocal.a.01.vocal')

print vocal.derivationally_related_forms()
# [Lemma('vocalize.v.02.vocalize')]

print vocal.pertainyms()
# [Lemma('voice.n.02.voice')]

print vocal.antonyms()

Page 27:

wn04-VerbFrames.py

# Section 4 Verb Frames
print wn.synset('think.v.01').frame_ids
for lemma in wn.synset('think.v.01').lemmas:
    print lemma, lemma.frame_ids
    print lemma.frame_strings

print wn.synset('stretch.v.02').frame_ids
for lemma in wn.synset('stretch.v.02').lemmas:
    print lemma, lemma.frame_ids
    print lemma.frame_strings

Page 28:

wn05-Similarity.py

### Section 5 Similarity
import nltk
from nltk.corpus import wordnet as wn

dog = wn.synset('dog.n.01')
cat = wn.synset('cat.n.01')
print dog.path_similarity(cat)   # 0.2 (value per the NLTK wordnet howto)
print dog.lch_similarity(cat)    # ~2.03
print dog.wup_similarity(cat)    # ~0.86

from nltk.corpus import wordnet_ic
brown_ic = wordnet_ic.ic('ic-brown.dat')
semcor_ic = wordnet_ic.ic('ic-semcor.dat')

Page 29:

wn05-Similarity.py continued

from nltk.corpus import genesis
genesis_ic = wn.ic(genesis, False, 0.0)

print dog.res_similarity(cat, brown_ic)
print dog.res_similarity(cat, genesis_ic)
print dog.jcn_similarity(cat, brown_ic)
print dog.jcn_similarity(cat, genesis_ic)
print dog.lin_similarity(cat, semcor_ic)

Page 30:

wn06-AccessToAllSynsets.py

### Section 6 access to all synsets
import nltk
from nltk.corpus import wordnet as wn

for synset in list(wn.all_synsets('n'))[:10]:
    print synset

wn.synsets('dog')
wn.synsets('dog', pos='v')

from itertools import islice
for synset in islice(wn.all_synsets('n'), 5):
    print synset, synset.hypernyms()

Page 31:

wn07-Morphy.py

# WordNet in NLTK
# http://nltk.googlecode.com/svn/trunk/doc/howto/wordnet.html
import nltk
from nltk.corpus import wordnet as wn

### Section 7 Morphy
print wn.morphy('denied', wn.NOUN)
print wn.synsets('denied', wn.NOUN)
print wn.synsets('denied', wn.VERB)

Page 32:

8. Regression Tests

Bug 85: morphy returns the base form of a word if its input is given as a base form for a POS for which that word is not defined:

>>> wn.synsets('book', wn.NOUN)
[Synset('book.n.01'), Synset('book.n.02'), Synset('record.n.05'), Synset('script.n.01'), Synset('ledger.n.01'), Synset('book.n.06'), Synset('book.n.07'), Synset('koran.n.01'), Synset('bible.n.01'), Synset('book.n.10'), Synset('book.n.11')]

>>> wn.synsets('book', wn.ADJ)
[]

>>> wn.morphy('book', wn.NOUN)
'book'

>>> wn.morphy('book', wn.ADJ)

Page 33:

nltk.corpus.reader.wordnet.ic(self, corpus, weight_senses_equally=False, smoothing=1.0)

Creates an information content lookup dictionary from a corpus.

http://nltk.googlecode.com/svn/trunk/doc/api/nltk.corpus.reader.wordnet-pysrc.html#WordNetCorpusReader.ic

def demo():
    import nltk
    print('loading wordnet')
    wn = WordNetCorpusReader(nltk.data.find('corpora/wordnet'))
    print('done loading')
    S = wn.synset
    L = wn.lemma

Page 34:

root_hypernyms

def root_hypernyms(self):
    """Get the topmost hypernyms of this synset in WordNet."""
    result = []
    seen = set()
    todo = [self]
    while todo:
        next_synset = todo.pop()
        if next_synset not in seen:
            seen.add(next_synset)
            next_hypernyms = next_synset.hypernyms() + …
    return result