CPSC 422, Lecture 24 Slide 1

Intelligent Systems (AI-2)

Computer Science CPSC 422, Lecture 24

Nov 1, 2019

Slide credit: Satanjeev Banerjee & Ted Pedersen 2003, Jurafsky & Martin 2008-2016



Lecture Overview

• Semantic Similarity/Distance

• Concepts: Thesaurus/Ontology Methods

• Words: Distributional Methods

Why is word/concept similarity important?

“fast” is similar to “rapid”

“tall” is similar to “height”

Question answering:

Q: “How tall is Mt. Everest?”
Candidate A: “The official height of Mount Everest is 29029 feet”

• Extends to sentence/paragraph similarity

• Summarization: identify and eliminate redundancy, aggregate similar phrase/sentences

• ………


WordNet: entry for “table”

The noun “table” has 6 senses in WordNet.
The verb “table” has 1 sense in WordNet.

Verb:

1. postpone, prorogue, hold over, put over, table, shelve, set back, defer, remit, put off -- (hold back to a later time; “let's postpone the exam”)

Noun:

1. table, tabular array -- (a set of data arranged in rows and columns) “see table 1”

2. table -- (a piece of furniture having a smooth flat top that is usually supported by one or more vertical legs) “it was a sturdy table”

3. table -- (a piece of furniture with tableware for a meal laid out on it) “I reserved a table at my favorite restaurant”

4. mesa, table -- (flat tableland with steep edges) “the tribe was relatively safe on the mesa but they had to descend into the valley for water”

5. table -- (a company of people assembled at a table for a meal or game) “he entertained the whole table with his witty remarks”

6. board, table -- (food or meals in general) “she sets a fine table”; “room and board”


WordNet Relations (between synsets!)


Visualizing WordNet Relations

C. Collins, “WordNet Explorer: Applying visualization principles to lexical semantics,” University of Toronto, Technical Report KMDI 2007-2, 2007.


Semantic Similarity/Distance: example

(n) table -- (a piece of furniture having a smooth flat top that is usually supported by one or more vertical legs)

(n) mesa, table -- (flat tableland with steep edges)

(n) lamp (a piece of furniture holding one or more electric light bulbs)

(n) hill (a local and well-defined elevation of the land)


Semantic Similarity/Distance

Between two concepts in an ontology, e.g., between two senses in WordNet.

What would you use to compute it?

A. The distance between the two concepts in the underlying hierarchies / graphs

B. The glosses of the concepts

C. None of the above

D. Both of the above


Gloss Overlaps ≈ Relatedness

►Lesk’s (1986) idea: Related word senses are (often) defined using the same words. E.g.:

▪ bank(1): “a financial institution”

▪ bank(2): “sloping land beside a body of water”

▪ lake: “a body of water surrounded by land”


►Gloss overlaps = # content words common to two glosses ≈ relatedness

▪ Thus, relatedness (bank(2), lake) = 3

▪ And, relatedness (bank(1), lake) = 0
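The overlap count above is easy to sketch in Python (a toy illustration of Lesk's idea; the small stop-word list is an assumption for this example, not Lesk's actual list):

```python
# Toy sketch of Lesk-style gloss overlap: count content words shared by two
# glosses. The stop-word list and whitespace tokenization are simplifications.
STOP = {"a", "the", "of", "by", "beside"}

def content_words(gloss):
    return {w for w in gloss.lower().split() if w not in STOP}

def gloss_overlap(gloss1, gloss2):
    return len(content_words(gloss1) & content_words(gloss2))

bank1 = "a financial institution"
bank2 = "sloping land beside a body of water"
lake = "a body of water surrounded by land"

print(gloss_overlap(bank2, lake))  # 3: body, water, land
print(gloss_overlap(bank1, lake))  # 0
```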


Limitations of (Lesk’s) Gloss Overlaps

►Most glosses are very short.
▪ So there are not enough words to find overlaps with.

►Solution? Extended gloss overlaps
▪ Add glosses of synsets connected to the input synsets.


Extending a Gloss

sentence: “the penalty meted out to one adjudged guilty”
bench: “persons who hear cases in a court of law”

# overlapped words = 0


Extending a Gloss (with hypernym)

sentence: “the penalty meted out to one adjudged guilty”
final judgment (hypernym of sentence): “a judgment disposing of the case before the court of law”
bench: “persons who hear cases in a court of law”

# overlapped words = 2


Creating the Extended Gloss Overlap Measure

►How to measure overlaps?

►Which relations to use for gloss extension?


How to Score Overlaps?

► Lesk simply summed up overlapped words.

► But matches involving phrases – phrasal matches – are rarer and more informative

▪ E.g. “court of law”, “body of water”

► Aim: Score of n words in a phrase > sum of scores of n words in shorter phrases

► Solution: Give a phrase of n words a score of n²

▪ “court of law” gets a score of 9.

▪ bank(2): “sloping land beside a body of water”

▪ lake: “a body of water surrounded by land”
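The n² scoring can be sketched as follows (an illustrative reconstruction of the idea, not Banerjee and Pedersen's exact algorithm; the greedy longest-phrase-first matching strategy is an assumption):

```python
# Sketch of n^2 phrasal scoring: each maximal common run of n consecutive
# words scores n**2, so longer phrasal matches dominate the total.
def phrase_score(gloss1, gloss2):
    w1, w2 = gloss1.lower().split(), gloss2.lower().split()
    score = 0
    # Greedily consume the longest common phrases first, then shorter ones.
    for n in range(min(len(w1), len(w2)), 0, -1):
        for i in range(len(w1) - n + 1):
            phrase = w1[i:i + n]
            for j in range(len(w2) - n + 1):
                if w2[j:j + n] == phrase:
                    score += n ** 2
                    w1[i:i + n] = ["<used1>"] * n   # block double counting
                    w2[j:j + n] = ["<used2>"] * n
                    break
    return score

# "court of law" is a 3-word phrasal match -> 3**2 = 9
print(phrase_score("a court of law", "the court of law"))  # 9
```

On the bank(2)/lake glosses above, the 4-word match "a body of water" scores 16 and the leftover single word "land" scores 1, so phrasal scoring separates them much more sharply than the flat count of 3.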


Which Relations to Use?

Typically include…

►Hypernyms [ “car” → “vehicle” ]

►Hyponyms [ “car” → “convertible” ]

►Meronyms [ “car” → “accelerator” ]

►…


Extended Gloss Overlap Measure

► Input two synsets A and B

► Find phrasal gloss overlaps between A and B

► For each relation, compute phrasal gloss overlaps between every synset connected to A and every synset connected to B

► Add phrasal scores to get relatedness of A and B

A and B can be from different parts of speech!
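The whole procedure can be sketched on a toy synset graph (the mini-ontology, the glosses from the sentence/bench example, and the plain word-overlap scorer are all illustrative; a real implementation would use WordNet, the phrasal n² scoring, and stop-word filtering, so the raw counts here differ from the slide's):

```python
# Toy sketch of the extended gloss overlap measure: compare not just the two
# input glosses but also the glosses of related (here: hypernym) synsets.
GLOSS = {
    "sentence": "the penalty meted out to one adjudged guilty",
    "final_judgment": "a judgment disposing of the case before the court of law",
    "bench": "persons who hear cases in a court of law",
}
HYPERNYM = {"sentence": "final_judgment"}  # sentence IS-A final judgment

def overlap(a, b):
    # Plain shared-word count; no stop-word filtering in this sketch,
    # so function words like "a" and "of" are counted too.
    return len(set(GLOSS[a].split()) & set(GLOSS[b].split()))

def extended_overlap(a, b):
    related_a = [a] + ([HYPERNYM[a]] if a in HYPERNYM else [])
    related_b = [b] + ([HYPERNYM[b]] if b in HYPERNYM else [])
    return sum(overlap(x, y) for x in related_a for y in related_b)

print(overlap("sentence", "bench"))           # 0: direct glosses share nothing
print(extended_overlap("sentence", "bench"))  # 4: hypernym gloss adds overlap
```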


Distance: Path-length

Path-length similarity based on is-a/hypernym hierarchies:

sim_path(c1, c2) = 1 / pathlen(c1, c2)


But this assumes that all the links are the same, i.e., that every edge encodes the same semantic distance…
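A minimal sketch of path-length similarity over a toy is-a hierarchy (the hierarchy itself is an illustrative assumption):

```python
from collections import deque

# Toy is-a hierarchy, child -> parent (illustrative).
PARENT = {"dog": "mammal", "cat": "mammal", "mammal": "animal",
          "snake": "reptile", "reptile": "animal"}

def pathlen(c1, c2):
    # Shortest path between two concepts, treating is-a edges as undirected (BFS).
    graph = {}
    for child, parent in PARENT.items():
        graph.setdefault(child, set()).add(parent)
        graph.setdefault(parent, set()).add(child)
    seen, frontier = {c1}, deque([(c1, 0)])
    while frontier:
        node, dist = frontier.popleft()
        if node == c2:
            return dist
        for nxt in graph[node]:
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, dist + 1))
    return float("inf")

def sim_path(c1, c2):
    # sim = 1 / pathlen, as on the slide (undefined for c1 == c2 in this sketch).
    return 1 / pathlen(c1, c2)

print(sim_path("dog", "cat"))    # 1/2 = 0.5 (dog - mammal - cat)
print(sim_path("dog", "snake"))  # 1/4 = 0.25
```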


Probability of a concept/sense and its info content

P(c) = count(c) / N        (probability)

IC(c) = −log P(c)        (information content)


Concept Distance: info content

• Similarity should be proportional to the information that the two concepts share… what is that? The information content of their Lowest Common Subsumer, LCS(c1, c2).

P(c) = count(c) / N        (probability)

IC(c) = −log P(c)        (information content)

sim_resnik(c1, c2) = −log P(LCS(c1, c2))
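With toy counts, Resnik similarity takes only a few lines (the hierarchy and corpus counts are illustrative assumptions):

```python
import math

# Toy hierarchy and corpus counts (illustrative). count(c) includes all
# occurrences of c's descendants, so P grows as you move up the tree.
PARENT = {"dog": "animal", "cat": "animal", "animal": "entity"}
COUNT = {"dog": 10, "cat": 10, "animal": 40, "entity": 100}
N = 100

def ancestors(c):
    out = [c]
    while c in PARENT:
        c = PARENT[c]
        out.append(c)
    return out

def lcs(c1, c2):
    # Lowest common subsumer: first ancestor of c1 that also subsumes c2.
    anc2 = set(ancestors(c2))
    for a in ancestors(c1):
        if a in anc2:
            return a

def sim_resnik(c1, c2):
    # Information content of the LCS, with base-2 logs.
    return -math.log2(COUNT[lcs(c1, c2)] / N)

print(lcs("dog", "cat"))          # animal
print(sim_resnik("dog", "cat"))   # -log2(0.4) ~ 1.32
```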


sim_resnik(Dog, Snake) vs. sim_resnik(Mammal, Reptile)

where sim_resnik(c1, c2) = −log P(LCS(c1, c2))

Are these two the same?

A. Yes B. No C. Cannot tell

Given this measure of similarity, is this reasonable?

A. Yes B. No C. Cannot tell


Concept Distance: info content

• One of the best performers – Jiang-Conrath distance
• How much information the two concepts DO NOT share


dist_JC(c1, c2) = 2 log P(LCS(c1, c2)) − (log P(c1) + log P(c2))

• This is a measure of distance; take the reciprocal for similarity.

• Problem for measures working on hierarchies/graphs: they can only compare concepts associated with words of one part of speech (typically nouns).

Equivalently, in terms of information content:

dist_JC(c1, c2) = (−log P(c1)) + (−log P(c2)) − 2 (−log P(LCS(c1, c2)))
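Continuing the same toy setup, Jiang-Conrath distance reuses the information-content machinery (hierarchy and counts are again illustrative assumptions):

```python
import math

# Toy hierarchy and counts as before (illustrative); IC(c) = -log2 P(c).
PARENT = {"dog": "animal", "cat": "animal", "animal": "entity"}
COUNT = {"dog": 10, "cat": 10, "animal": 40, "entity": 100}
N = 100

def ic(c):
    return -math.log2(COUNT[c] / N)

def lcs(c1, c2):
    anc1 = [c1]
    while c1 in PARENT:
        c1 = PARENT[c1]
        anc1.append(c1)
    while c2 not in anc1:
        c2 = PARENT[c2]
    return c2

def dist_jc(c1, c2):
    # dist_JC = IC(c1) + IC(c2) - 2 * IC(LCS(c1, c2))
    return ic(c1) + ic(c2) - 2 * ic(lcs(c1, c2))

def sim_jc(c1, c2):
    # Reciprocal of the distance, as suggested on the slide.
    return 1 / dist_jc(c1, c2)

print(dist_jc("dog", "cat"))  # 2*log2(10) - 2*log2(2.5) = 4.0
```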


Lecture Overview

• Semantic Similarity/Distance

• Concepts: Thesaurus/Ontology Methods

• Words: Distributional Methods – Word Similarity (WS)


Word Similarity: Distributional Methods

• You may not have any thesauri/ontologies for the target language (e.g., Russian)

• Even if you have a thesaurus/ontology:
– Missing domain-specific terms (e.g., technical words)
– Poor hyponym knowledge for verbs, and nothing for adjectives and adverbs
– Difficult to compare senses from different hierarchies (although extended Lesk can do this)

• Solution: extract similarity from corpora

• Basic idea: two words are similar if they appear in similar contexts

Intuition of distributional word similarity

• Example: Suppose I asked you, what is tesgüino?

A bottle of tesgüino is on the table

Everybody likes tesgüino

Tesgüino makes you drunk

We make tesgüino out of corn.

• From the context words humans can guess that tesgüino means an alcoholic beverage like beer

• Intuition for the algorithm: two words are similar if they have similar word contexts.


Word-Word matrix: Sample contexts ± 7 words

              aardvark  computer  data  pinch  result  sugar  …
apricot           0         0       0     1      0       1
pineapple         0         0       0     1      0       1
digital           0         2       1     0      1       0
information       0         1       6     0      4       0
…

• Portion of the matrix from the Brown corpus

Simple example of Vector Models, aka “embeddings”:

• Model the meaning of a word by “embedding” it in a vector space.

• The meaning of a word is a vector of numbers.

WS Distributional Methods (1)


WS Distributional Methods (2)

• More informative values (referred to as weights or measures of association in the literature)

• Pointwise Mutual Information:

assoc_PMI(w, w_i) = log2 [ P(w, w_i) / (P(w) P(w_i)) ]

• t-test:

assoc_t-test(w, w_i) = [ P(w, w_i) − P(w) P(w_i) ] / √(P(w) P(w_i))

Positive Pointwise Mutual Information

– PMI ranges from −∞ to +∞

– But the negative values are problematic
• Things are co-occurring less than we expect by chance
• Unreliable without enormous corpora
– Imagine w1 and w2 whose probability is each 10⁻⁶
– Hard to be sure p(w1, w2) is significantly different than 10⁻¹²
• Plus it’s not clear people are good at “unrelatedness”

– So we just replace negative PMI values by 0

– Positive PMI (PPMI) between word1 and word2:

PPMI(word1, word2) = max( log2 [ P(word1, word2) / (P(word1) P(word2)) ], 0 )
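PPMI can be computed directly from the small co-occurrence matrix shown earlier (plain-Python sketch; real systems use far larger counts and often add smoothing):

```python
import math

# Word-context co-occurrence counts (the non-zero columns of the matrix above).
COUNTS = {
    "apricot":     {"computer": 0, "data": 0, "pinch": 1, "result": 0, "sugar": 1},
    "pineapple":   {"computer": 0, "data": 0, "pinch": 1, "result": 0, "sugar": 1},
    "digital":     {"computer": 2, "data": 1, "pinch": 0, "result": 1, "sugar": 0},
    "information": {"computer": 1, "data": 6, "pinch": 0, "result": 4, "sugar": 0},
}
TOTAL = sum(sum(row.values()) for row in COUNTS.values())

def ppmi(word, context):
    # Joint and marginal probabilities estimated from the count table.
    p_wc = COUNTS[word][context] / TOTAL
    p_w = sum(COUNTS[word].values()) / TOTAL
    p_c = sum(COUNTS[w][context] for w in COUNTS) / TOTAL
    if p_wc == 0:
        return 0.0          # never co-occur: clamp to 0 rather than -inf
    return max(math.log2(p_wc / (p_w * p_c)), 0.0)

print(round(ppmi("information", "data"), 2))  # positive: strongly associated
print(ppmi("apricot", "computer"))            # 0.0: never co-occur
```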


PMI example

assoc_PMI(w, w_i) = log2 [ P(w, w_i) / (P(w) P(w_i)) ]


Other popular vector representations

Dense vector representations (fewer dimensions):

1. Singular value decomposition applied to word-word PointWise-MI matrix

2. Neural-Network-inspired models (skip-grams, CBOW)


WS Distributional Methods (3)

• Similarity between vectors

• Cosine (not sensitive to extreme values):

sim_cosine(v, w) = (v · w) / (|v| |w|) = cos(θ)

• Jaccard (normalized, weighted number of overlapping features):

sim_Jaccard(v, w) = Σ_{i=1..N} min(v_i, w_i) / Σ_{i=1..N} max(v_i, w_i)
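Both measures are a few lines of plain Python (the example vectors are the digital and information rows of the earlier matrix, restricted to five context words):

```python
import math

def sim_cosine(v, w):
    # Dot product divided by the product of vector lengths.
    dot = sum(vi * wi for vi, wi in zip(v, w))
    norm_v = math.sqrt(sum(vi * vi for vi in v))
    norm_w = math.sqrt(sum(wi * wi for wi in w))
    return dot / (norm_v * norm_w)

def sim_jaccard(v, w):
    # Normalized (weighted) number of overlapping features.
    return sum(min(vi, wi) for vi, wi in zip(v, w)) / \
           sum(max(vi, wi) for vi, wi in zip(v, w))

# Context counts for: computer, data, pinch, result, sugar
digital     = [2, 1, 0, 1, 0]
information = [1, 6, 0, 4, 0]

print(round(sim_cosine(digital, information), 2))   # 0.67
print(round(sim_jaccard(digital, information), 2))  # 0.25
```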


Learning Goals for today’s class

You can:

• Describe and justify metrics to compute the similarity/distance of two concepts in an ontology

• Describe and justify distributional metrics to compute the similarity/distance of two words (or phrases) in a natural language

Next class Mon: Paper Discussion


• Material that will be covered: Natural language Processing (Context free grammars and parsing)

Assignment-3 out – due Nov 18 (8-18 hours; working in pairs is strongly advised)

Next week I will be away attending EMNLP:

Jordon Johnson will sub for me Mon and Wed (Fri class is cancelled)