LING 2000 - 2006 NLP 1 Introduction to Computational Linguistics Martha Palmer April 19, 2006


Introduction to Computational Linguistics

Martha Palmer

April 19, 2006


Natural Language Processing

• Machine Translation

• Predicate argument structures

• Syntactic parses

• Producing semantic representations

• Ambiguities in sentence interpretation


Machine Translation

• One of the first applications for computers
  – bilingual dictionary > word-word translation

• Good translation requires understanding!
  – War and Peace, The Sound and The Fury?

• What can we do? Sublanguages.
  – technical domains, static vocabulary
  – Meteo in Canada, Caterpillar Tractor Manuals, Botanical descriptions, Military Messages


Example translation


Translation Issues: Korean to English

- Word order
- Dropped arguments
- Lexical ambiguities
- Structure vs morphology


Common Thread

• Predicate-argument structure
  – Basic constituents of the sentence and how they are related to each other

• Constituents
  – John, Mary, the dog, pleasure, the store.

• Relations
  – Loves, feeds, go, to, bring


Abstracting away from surface structure


Transfer lexicons


Machine Translation Lexical Choice - Word Sense Disambiguation

Iraq lost the battle.
Ilakuka centwey ciessta.
[Iraq] [battle] [lost].

John lost his computer.
John-i computer-lul ilepelyessta.
[John] [computer] [misplaced].
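
The lexical choice above can be sketched as a lookup keyed on the semantic class of the object. This is only an illustration: the class labels, the `NOUN_CLASS` table and the `choose_translation` helper are assumptions, not part of the slides; only the two Korean verb forms come from the examples.

```python
# Hypothetical transfer-lexicon entry: English verb -> (object class, Korean verb) pairs.
# The Korean forms are copied from the slide; the class labels are illustrative.
TRANSFER_LEXICON = {
    "lose": [
        ("competition", "ciessta"),            # lose a battle
        ("physical-object", "ilepelyessta"),   # misplace a thing
    ],
}

# Hypothetical semantic classes for the two object nouns in the examples.
NOUN_CLASS = {
    "battle": "competition",
    "computer": "physical-object",
}


def choose_translation(verb, obj):
    """Return the Korean verb whose object class matches, or None if no entry fits."""
    obj_class = NOUN_CLASS.get(obj)
    for required_class, korean in TRANSFER_LEXICON.get(verb, []):
        if required_class == obj_class:
            return korean
    return None


print(choose_translation("lose", "battle"))    # ciessta
print(choose_translation("lose", "computer"))  # ilepelyessta
```

A real system would need far richer context than the object noun alone; the point is only that the sense of "lost" has to be resolved before a target verb can be picked.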


Natural Language Processing

• Syntax
  – Grammars, parsers, parse trees, dependency structures

• Semantics
  – Subcategorization frames, semantic classes, ontologies, formal semantics

• Pragmatics
  – Pronouns, reference resolution, discourse models


Syntactic Categories

• Nouns, pronouns, Proper nouns

• Verbs, intransitive verbs, transitive verbs, ditransitive verbs (subcategorization frames)

• Modifiers, Adjectives, Adverbs

• Prepositions

• Conjunctions


Syntactic Parsing

• The cat sat on the mat.
  Det Noun Verb Prep Det Noun

• Time flies like an arrow.
  Noun Verb Prep Det Noun

• Fruit flies like a banana.
  Noun Noun Verb Det Noun


Context Free Grammar

• S -> NP VP

• NP -> det (adj) N

• NP -> Proper N

• NP -> N

• VP -> V, VP -> V PP

• VP -> V NP

• VP -> V NP PP, PP -> Prep NP

• VP -> V NP NP
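
A minimal sketch of how these rules license parses, assuming a toy lexicon that is not on the slides. The optional `(adj)` is expanded into two NP rules, `Proper N` is written as the single category `ProperN`, and `count_parses` is a hypothetical helper that counts the distinct derivations of a sentence.

```python
from functools import lru_cache

# Rules from the slide; "det (adj) N" becomes two alternatives.
GRAMMAR = {
    "S":  [("NP", "VP")],
    "NP": [("Det", "N"), ("Det", "Adj", "N"), ("ProperN",), ("N",)],
    "VP": [("V",), ("V", "PP"), ("V", "NP"), ("V", "NP", "PP"), ("V", "NP", "NP")],
    "PP": [("Prep", "NP")],
}

# Assumed toy lexicon: each word lists the categories it may take.
LEXICON = {
    "the": {"Det"}, "an": {"Det"}, "a": {"Det"},
    "cat": {"N"}, "mat": {"N"}, "arrow": {"N"}, "time": {"N"},
    "sat": {"V"}, "on": {"Prep"},
    "flies": {"N", "V"}, "like": {"Prep", "V"},
}


def count_parses(sentence, grammar=GRAMMAR, start="S"):
    """Count the derivations of `sentence` from `start` under `grammar`."""
    words = tuple(sentence.lower().rstrip(".").split())

    @lru_cache(maxsize=None)
    def count(symbol, i, j):
        # Number of ways `symbol` can derive words[i:j].
        if symbol not in grammar:  # preterminal: must match exactly one word
            return int(j == i + 1 and symbol in LEXICON.get(words[i], ()))
        return sum(count_seq(rhs, i, j) for rhs in grammar[symbol])

    @lru_cache(maxsize=None)
    def count_seq(rhs, i, j):
        # Number of ways the symbol sequence `rhs` can derive words[i:j].
        if not rhs:
            return int(i == j)
        first, rest = rhs[0], rhs[1:]
        # Each symbol covers at least one word, so leave room for the rest.
        return sum(count(first, i, k) * count_seq(rest, k, j)
                   for k in range(i + 1, j - len(rest) + 1))

    return count(start, 0, len(words))


print(count_parses("The cat sat on the mat."))    # 1
print(count_parses("Time flies like an arrow."))  # 1
```

With exactly these rules, "Time flies like an arrow" gets only the reading in which "flies" is the verb. The reading in which "time flies" is a noun compound needs an extra rule such as NP -> N N, which is an addition to the slide's grammar: `count_parses("Time flies like an arrow.", dict(GRAMMAR, NP=GRAMMAR["NP"] + [("N", "N")]))` then counts two parses.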


Parses

The cat sat on the mat

[S [NP [Det the] [N cat]]
   [VP [V sat]
       [PP [Prep on]
           [NP [Det the] [N mat]]]]]


Parses

Time flies like an arrow.

[S [NP [N time]]
   [VP [V flies]
       [PP [Prep like]
           [NP [Det an] [N arrow]]]]]


Parses

Time flies like an arrow.

[S [NP [N time] [N flies]]
   [VP [V like]
       [NP [Det an] [N arrow]]]]


Features

• C for Case, Subjective/Objective
  – She visited her.

• P for Person agreement, (1st, 2nd, 3rd)
  – I like him, You like him, He likes him,

• N for Number agreement, Subject/Verb
  – He likes him, They like him.

• G for Gender agreement, Subject/Verb
  – English, reflexive pronouns: He washed himself.
  – Romance languages, det/noun

• T for Tense
  – auxiliaries, sentential complements, etc.
  – * will finished is bad
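
A small sketch of how case, person and number features might be checked, using the slide's pronoun examples. The feature tables and the `agrees` helper are assumptions made for illustration; a grammar formalism would normally attach such features to rules and unify them.

```python
# All (person, number) combinations a subject can have.
ALL_PN = {(p, n) for p in (1, 2, 3) for n in ("sg", "pl")}

# Assumed toy lexicon: pronoun -> (possible (person, number) readings, possible cases).
PRONOUNS = {
    "i":    ({(1, "sg")}, {"subj"}),
    "you":  ({(2, "sg"), (2, "pl")}, {"subj", "obj"}),
    "he":   ({(3, "sg")}, {"subj"}),
    "she":  ({(3, "sg")}, {"subj"}),
    "they": ({(3, "pl")}, {"subj"}),
    "him":  ({(3, "sg")}, {"obj"}),
    "her":  ({(3, "sg")}, {"obj"}),
}

# Verb form -> the (person, number) subjects it agrees with.
VERBS = {
    "likes": {(3, "sg")},
    "like": ALL_PN - {(3, "sg")},
    "visited": ALL_PN,  # past tense shows no person/number agreement
}


def agrees(subject, verb, obj):
    """True if case is right on both pronouns and the subject agrees with the verb."""
    subj_pn, subj_case = PRONOUNS[subject.lower()]
    _, obj_case = PRONOUNS[obj.lower()]
    return ("subj" in subj_case
            and "obj" in obj_case
            and bool(subj_pn & VERBS[verb.lower()]))


print(agrees("He", "likes", "him"))    # True
print(agrees("He", "like", "him"))     # False: person/number clash
print(agrees("Her", "visited", "she")) # False: case clash
```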


Probabilistic Context Free Grammars

• Adding probabilities

• Lexicalizing the probabilities


Simple Context Free Grammar in BNF

S → NP VP
NP → Pronoun
   | Noun
   | Det Adj Noun
   | NP PP
PP → Prep NP
V → Verb
  | Aux Verb
VP → V
   | V NP
   | V NP NP
   | V NP PP
   | VP PP


Simple Probabilistic CFG

S → NP VP
NP → Pronoun [0.10]
   | Noun [0.20]
   | Det Adj Noun [0.50]
   | NP PP [0.20]
PP → Prep NP [1.00]
V → Verb [0.33]
  | Aux Verb [0.67]
VP → V [0.10]
   | V NP [0.40]
   | V NP NP [0.10]
   | V NP PP [0.20]
   | VP PP [0.20]
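
With these numbers, the probability of a parse is the product of the probabilities of the rules it uses. The sketch below assumes two things the slide does not state: S → NP VP is given probability 1.00 (it is the only S rule), and the steps from a preterminal to a word are counted as 1.0. The tree encoding and the example sentence are likewise illustrative.

```python
from math import prod

RULE_PROB = {
    ("S", ("NP", "VP")): 1.00,  # assumption: the slide lists no probability for S
    ("NP", ("Pronoun",)): 0.10,
    ("NP", ("Noun",)): 0.20,
    ("NP", ("Det", "Adj", "Noun")): 0.50,
    ("NP", ("NP", "PP")): 0.20,
    ("PP", ("Prep", "NP")): 1.00,
    ("V", ("Verb",)): 0.33,
    ("V", ("Aux", "Verb")): 0.67,
    ("VP", ("V",)): 0.10,
    ("VP", ("V", "NP")): 0.40,
    ("VP", ("V", "NP", "NP")): 0.10,
    ("VP", ("V", "NP", "PP")): 0.20,
    ("VP", ("VP", "PP")): 0.20,
}


def tree_prob(tree):
    """Multiply the probabilities of every grammar rule used in `tree`.

    A tree is (label, child, ...); a preterminal is (label, "word").
    """
    label, *children = tree
    if all(isinstance(child, str) for child in children):
        return 1.0  # assumption: lexical steps are not scored
    rhs = tuple(child[0] for child in children)
    return RULE_PROB[(label, rhs)] * prod(tree_prob(child) for child in children)


# Illustrative parse of "she will eat fish".
tree = ("S",
        ("NP", ("Pronoun", "she")),
        ("VP",
         ("V", ("Aux", "will"), ("Verb", "eat")),
         ("NP", ("Noun", "fish"))))

print(round(tree_prob(tree), 5))  # 0.00536 = 1.00 * 0.10 * 0.40 * 0.67 * 0.20
```

When a sentence has several parses, a probabilistic parser can prefer the one with the highest product.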


Simple Probabilistic Lexicalized CFG

S → NP VP
NP → Pronoun [0.10]
   | Noun [0.20]
   | Det Adj Noun [0.50]
   | NP PP [0.20]
PP → Prep NP [1.00]
V → Verb [0.33]
  | Aux Verb [0.67]
VP → V [0.87] {sleep, cry, laugh}
   | V NP [0.03]
   | V NP NP [0.00]
   | V NP PP [0.00]
   | VP PP [0.10]


Simple Probabilistic Lexicalized CFG

VP → V [0.30]
   | V NP [0.60] {break, split, crack..}
   | V NP NP [0.00]
   | V NP PP [0.00]
   | VP PP [0.10]

What about leave? leave1, leave2?

VP → V [0.10]
   | V NP [0.40]
   | V NP NP [0.10]
   | V NP PP [0.20]
   | VP PP [0.20]
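
One way to read the lexicalized tables is as a separate VP distribution per verb class. The sketch below stores the three sets of numbers from the slides; the class names, the `vp_prob` helper, and the choice to fall back to the unlexicalized numbers for an unlisted verb such as "leave" are assumptions.

```python
VP_FRAMES = ("V", "V NP", "V NP NP", "V NP PP", "VP PP")

# Probabilities copied from the slides; the class names are hypothetical.
CLASS_PROBS = {
    "sleep-class": dict(zip(VP_FRAMES, (0.87, 0.03, 0.00, 0.00, 0.10))),
    "break-class": dict(zip(VP_FRAMES, (0.30, 0.60, 0.00, 0.00, 0.10))),
}

# The unlexicalized VP distribution, used here as a fallback (assumption).
UNLEXICALIZED = dict(zip(VP_FRAMES, (0.10, 0.40, 0.10, 0.20, 0.20)))

VERB_CLASS = {
    "sleep": "sleep-class", "cry": "sleep-class", "laugh": "sleep-class",
    "break": "break-class", "split": "break-class", "crack": "break-class",
}


def vp_prob(verb, frame):
    """Probability of expanding VP as `frame` given the head verb."""
    table = CLASS_PROBS.get(VERB_CLASS.get(verb), UNLEXICALIZED)
    return table[frame]


print(vp_prob("sleep", "V"))        # 0.87
print(vp_prob("break", "V NP"))     # 0.6
print(vp_prob("leave", "V NP NP"))  # 0.1
```

The question about leave1 and leave2 suggests that a verb with several senses may need one distribution per sense rather than per word form; the sketch does not attempt that.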


Language to Logic

• John went to the book store.
  John store1, go(John, store1)

• John bought a book.
  buy(John,book1)

• John gave the book to Mary.
  give(John,book1,Mary)

• Mary put the book on the table.
  put(Mary,book1,table1)
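
Once sentences are reduced to predicates like these, they can be stored and queried as data. A minimal sketch, assuming a tuple encoding and a hypothetical `events_involving` helper:

```python
# The four predicates from the slide, encoded as (predicate, arg1, arg2, ...).
FACTS = [
    ("go", "John", "store1"),
    ("buy", "John", "book1"),
    ("give", "John", "book1", "Mary"),
    ("put", "Mary", "book1", "table1"),
]


def events_involving(entity, facts=FACTS):
    """Return the predicates, in order, that have `entity` as an argument."""
    return [predicate for predicate, *args in facts if entity in args]


print(events_involving("book1"))  # ['buy', 'give', 'put']
print(events_involving("Mary"))   # ['give', 'put']
```

Because every mention of the book maps to the same constant `book1`, the query can follow it across sentences.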


Semantics

Same event - different sentences

  John broke the window with a hammer.

  John broke the window with the crack.

  The hammer broke the window.

  The window broke.


Same event - different syntactic frames

  John broke the window with a hammer.  SUBJ VERB OBJ MODIFIER

  John broke the window with the crack.  SUBJ VERB OBJ MODIFIER

  The hammer broke the window.  SUBJ VERB OBJ

  The window broke.  SUBJ VERB


Semantics - predicate arguments

  break(AGENT, INSTRUMENT, PATIENT)

  AGENT       PATIENT      INSTRUMENT
  John broke  the window   with a hammer.

  INSTRUMENT        PATIENT
  The hammer broke  the window.

  PATIENT
  The window broke.

  Fillmore 68 - The case for case


  AGENT       PATIENT      INSTRUMENT
  John broke  the window   with a hammer.
  SUBJ        OBJ          MODIFIER

  INSTRUMENT        PATIENT
  The hammer broke  the window.
  SUBJ              OBJ

  PATIENT
  The window broke.
  SUBJ


Canonical Representation

  break (Agent: animate,
         Instrument: tool,
         Patient: physical-object)

  Agent <=> subj
  Instrument <=> subj, with-pp
  Patient <=> obj, subj
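
The canonical entry for break can be read as a small procedure: try each role in turn, and fill it from the first allowed grammatical function whose noun has the required semantic type. The sketch below is one possible reading, not a definitive algorithm; the noun types in `NOUN_TYPE` and the `assign_roles` helper are assumptions.

```python
# Assumed semantic types for the nouns in the examples.
NOUN_TYPE = {"John": "animate", "hammer": "tool", "window": "physical-object"}

# The slide's entry: (role, required type, grammatical functions it may map to).
BREAK_ENTRY = [
    ("Agent", "animate", ("subj",)),
    ("Instrument", "tool", ("subj", "with-pp")),
    ("Patient", "physical-object", ("obj", "subj")),
]


def assign_roles(frame, entry=BREAK_ENTRY):
    """Map grammatical functions to thematic roles.

    `frame` maps functions to head nouns, e.g. {"subj": "John", "obj": "window"}.
    Each function fills at most one role; constituents that fit no role are left out.
    """
    roles, used = {}, set()
    for role, required_type, functions in entry:
        for function in functions:
            noun = frame.get(function)
            if noun and function not in used and NOUN_TYPE.get(noun) == required_type:
                roles[role] = noun
                used.add(function)
                break
    return roles


print(assign_roles({"subj": "John", "obj": "window", "with-pp": "hammer"}))
# {'Agent': 'John', 'Instrument': 'hammer', 'Patient': 'window'}
print(assign_roles({"subj": "hammer", "obj": "window"}))
# {'Instrument': 'hammer', 'Patient': 'window'}
print(assign_roles({"subj": "window"}))
# {'Patient': 'window'}
```

For "John broke the window with the crack", the with-pp noun is not a tool, so it receives no role from this entry and remains a plain modifier.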

 


Syntax/semantics interaction

• Parsers will produce syntactically valid parses for semantically anomalous sentences

• Lexical semantics can be used to rule them out


Headlines

• Police Begin Campaign To Run Down Jaywalkers

• Iraqi Head Seeks Arms

• Teacher Strikes Idle Kids

• Miners Refuse To Work After Death

• Juvenile Court To Try Shooting Defendant