INF5830 – 2011 FALL NATURAL LANGUAGE PROCESSING Jan Tore Lønning & Lilja Øvrelid


Page 1:

INF5830 – 2011 FALL NATURAL LANGUAGE PROCESSING

Jan Tore Lønning & Lilja Øvrelid

Page 2:

Today

Course overview: What? How?

First lecture, ”Looking at text”: texts and corpora, counting words, tokenization and splitting

Page 3:

NLP applications - examples

1. General: translation (Google Translate), dialogue, information processing (including search)

2. Speech: speech ↔ text, voice control

3. Language support
4. Mobile devices

Page 4:

Name game

Computational Linguistics: the traditional name, stresses interdisciplinarity

Natural Language Processing: computer science/AI/NLP; ”natural language” is a CS term

Language Technology: a newer term, stresses applicability; LT today is not SciFi (AI), but part of everyday app(lication)s

The terms are more or less interchangeable

Page 5:

Communicating with the computer

The model of the computer as a communicator: analysis → process → generation/synthesis

[Diagram: syntactic and semantic analysis → processing → generation]

Page 6:

Oral communication

The model of the computer as a communicator: analysis (speech, grammar, semantics, pragmatics) → process → generation/synthesis (content, grammar, speech)

[Diagram: speech recognition → syntactic and semantic analysis → processing → generation → speech synthesis]

Page 7:

The communicating computer

This model fits many applications: translation, dialogue, Watson (with or without speech)

The processing step varies: translate, find an answer, carry out an order

Page 8:

From NLTK

Page 9:

From theory to application

One module corresponds (more or less) to: a step in human processing, a subfield of linguistics, one computational process. Example: a syntax parser.

Not always distinct linguistic modules, e.g. statistical MT

Similar methods across different steps, e.g. classification

Many different methods work together, or in parallel, e.g. Watson

Traditional view: modules. Today: less tight connections

Page 10:

Example: Word sense disambiguation

(from WordNet:) Noun
S: (n) bass (the lowest part of the musical range)
S: (n) bass, bass part (the lowest part in polyphonic music)
S: (n) bass, basso (an adult male singer with the lowest voice)
S: (n) sea bass, bass (the lean flesh of a saltwater fish of the family Serranidae)
S: (n) freshwater bass, bass (any of various North American freshwater fish with lean flesh (especially of the genus Micropterus))
S: (n) bass, bass voice, basso (the lowest adult male singing voice)
S: (n) bass (the member with the lowest range of a family of musical instruments)
S: (n) bass (nontechnical name for any of numerous edible marine and freshwater spiny-finned fishes)

Page 11:

A class of methods (supervised, text-based):

Propose features that may be relevant, e.g. words in context (music, sing, perform, soprano, … vs. fish, river, boat, eat, …), properties of these words, distance to the target word, etc.

Training corpus with marked senses: count the features in the examples, construct the classifier from these counts

Test the classifier on new material: use a test corpus! (A minimal sketch follows below.)
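For concreteness, here is a minimal sketch of this recipe using NLTK's naive Bayes classifier. The tiny hand-made training sentences and the bag-of-words feature function are illustrative assumptions, not the course's actual data or feature set.

```python
# A minimal sketch of supervised WSD with a naive Bayes classifier.
import nltk

def features(sentence):
    """Bag-of-words features: which context words occur around 'bass'."""
    return {"contains({})".format(w.lower()): True
            for w in sentence.split() if w.lower() != "bass"}

# Hypothetical sense-tagged examples (stand-ins for a real training corpus).
train = [
    ("he sings bass in the choir",    "music"),
    ("the bass line drives the song", "music"),
    ("we caught a bass in the river", "fish"),
    ("grilled bass is good to eat",   "fish"),
]
classifier = nltk.NaiveBayesClassifier.train(
    [(features(s), sense) for s, sense in train])

# Test the classifier on new material (here a single made-up sentence;
# with this toy data it will most likely predict "fish").
print(classifier.classify(features("a big bass swam past the boat")))
```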

Page 12:

Examples of classification tasks

WSD: bass – fish, voice, instrument, …
E-mail: spam or not spam
Language identification: given a text, which language?
Genre
Author attribution
Search: is this document relevant for the given search phrase?
Textual entailment: does sentence A entail sentence B?
Anaphora/co-reference: who is ”she”?

Page 13:

The benefits of statistics:

1. Part of the (learned) model: What is the most probable meaning of this occurrence of bass? What is the most probable parse of this sentence?

2. In constructing models from examples (”learning”): What is the best model given these examples?

3. In evaluation: Model 1 performs slightly better than model 2 (78.4 vs 73.2); can we conclude that model 1 is better? How large a test corpus do we need? (See the sketch below.)
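As a rough illustration of the evaluation question, the sketch below applies a simple two-proportion z-test to the 78.4 vs 73.2 accuracies for a few hypothetical test-set sizes. The test-set sizes are made up, and comparisons on a shared test set would typically use a paired test such as McNemar's; this is only meant to show why the size of the test corpus matters.

```python
# Rough sketch: is 78.4% vs 73.2% accuracy a significant difference?
# Assumes (hypothetically) two independent test sets of n items each.
from math import sqrt

def z_score(p1, p2, n):
    """Two-proportion z-test with equal sample sizes n."""
    p = (p1 + p2) / 2.0                  # pooled accuracy
    se = sqrt(2 * p * (1 - p) / n)       # standard error of the difference
    return (p1 - p2) / se

for n in (200, 500, 1000):
    z = z_score(0.784, 0.732, n)
    verdict = "significant at 5%" if abs(z) > 1.96 else "not significant"
    print(n, round(z, 2), verdict)
```

With these numbers the difference only becomes significant once the test set reaches roughly a thousand items, which is the point of the slide's question.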

Page 14:

Two approaches to NLP

Theoretical, formal: build a declarative model (a grammar) using linguistics and logic; algorithms (a parser); how does it fit the data?

Empirical: start with naturally occurring text; what information can we get?

Page 15:

Grammars (formal approach)

Context-Free Phrase-Structure Grammar (CF P-SG):
S → NP VP
NP → DET N
VP → IV
VP → TV NP
NP → NP som VP
NP → NP PP
PP → P NP
NP → kari | ola
N → barn | by | mann

BNF (Backus-Naur Form):
S ::= NP VP
NP ::= DET N | NP som VP | NP PP | kari | ola
VP ::= IV | TV NP
PP ::= P NP
N ::= barn | by | mann

Better for NLP: unification-based grammars, cf. INF2820 (a small parsing sketch of the toy grammar follows below)
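The toy grammar can be tried out directly with NLTK's chart parser. The lexical rules for DET, IV, TV and P (en, sover, ser, i) are my own additions so that a full sentence can actually be parsed; they are not part of the original slide.

```python
# A minimal sketch: the slide's toy grammar written for NLTK (version 3).
import nltk

grammar = nltk.CFG.fromstring("""
S   -> NP VP
NP  -> DET N | NP 'som' VP | NP PP | 'kari' | 'ola'
VP  -> IV | TV NP
PP  -> P NP
N   -> 'barn' | 'by' | 'mann'
DET -> 'en'
IV  -> 'sover'
TV  -> 'ser'
P   -> 'i'
""")

parser = nltk.ChartParser(grammar)
# "kari ser en mann som sover" = "kari sees a man who sleeps"
for tree in parser.parse("kari ser en mann som sover".split()):
    tree.pretty_print()
```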

Page 16:

Formal approach: challenges

Coverage: ca 80% (even for the English Resource Grammar); the grammar isn't complete; the text isn't grammatical

Ambiguities: sentences are ambiguous; long sentences may get many parses (in the thousands); larger coverage → more rules → more ambiguities

Efficiency

Page 17:

Empirical methods

Examples: tagging, speech recognition, statistical MT

Learn from examples: generalize. Stochastic methods: probabilities

Page 18:

Two approaches

[Chart comparing the two approaches along two axes, ”depth” of analysis vs. coverage (up to 100%): formal vs. empirical]

Page 19:

A decisive difference

Formal methods: a clear-cut division between grammatical and ungrammatical, possible and impossible analyses

Choosing the most probable among the grammatical ones

Empirical, stochastic approach: choose the ”best” (most probable); no division between possible and impossible

Page 20:

Course content

General introduction: FSNLP ch. 4-8

Tools useful across tasks and methods: probability theory, statistical inference, experimental set-up, evaluation

Problems and methods: collocations, n-gram models, classification (naive Bayes, max-ent), clustering, word similarities

In depth, some particular problems and methods: dependency parsing, semantic role labelling

The tools will mostly be taught in parallel with the methods

Page 21:

INF5830

http://www.uio.no/studier/emner/matnat/ifi/INF580/index.xml

Recommended prior knowledge: INF4820 (may be studied the same term)

An advantage, but not assumed: INF2820, some statistics

Alternates with INF5820 (Language technological applications)

Page 22:

Mixed audience

Challenge: participants have different backgrounds (e.g. INF4820, INF5820); some wait for INF4820; the content of some courses has changed over time, e.g. probabilistic CFGs in INF2820/INF4820

Goal: INF2820 or INF4820 is sufficient background; main emphasis on methods not covered by INF2820 or INF4820; avoid repetition: less overlap with INF4820 than two years ago

Page 23:

Schedule

Class: Monday 14.15-16; Wednesday 10.15-12 (not every week)

Exam: written, 16 Dec

Page 24:

Assignments

3 obligatory assignment sets, to familiarize ourselves with techniques and tools:
1. Collocations, 23.9
2. N-gram models and smoothing, 7.10
3. Project:
   1. Dependency parsing, 28.10
   2. Semantic role labelling, 11.11

Several non-obligatory exercises (tools)

Page 25:

PhD-students

Use the course code INF9830. PhD students are supposed to do more than the master's students: a class presentation

Page 26:

Are the following familiar?

If not, I offer an extra session

Page 27:

First lecture: ”Looking at text”

Page 28:

Today

Course overview: What? How?

First lecture, ”Looking at text”: texts and corpora, counting words, tokenization and splitting

Page 29:

Looking at a text

FSNLP, ”Tom Sawyer”: 71,370 word tokens; 8,018 word types; on average 8.9 tokens per type. Not evenly distributed, cf. Table 1.1, p. 21. (A small counting sketch follows below.)
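A small sketch of the token/type bookkeeping in NLTK. Tom Sawyer is not in the standard nltk_data distribution, so an included Gutenberg text is used here as a stand-in; the point is the counting, not the particular numbers.

```python
# Token/type counts for a corpus text (requires nltk.download('gutenberg')).
import nltk
from nltk.corpus import gutenberg

tokens = gutenberg.words('austen-emma.txt')      # word tokens
types = set(w.lower() for w in tokens)           # word types (downcased)
print(len(tokens), "tokens")
print(len(types), "types")
print("tokens per type: %.1f" % (len(tokens) / float(len(types))))
```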

Page 30:
Page 31:

Frequencies of frequencies

½ of the words occur once: ”hapax legomena”

1/6 of the words occur twice

See Table 1.2 (and the sketch below)
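The same frequencies-of-frequencies can be read off an NLTK FreqDist; again an included Gutenberg text serves as a stand-in for Tom Sawyer.

```python
# Frequencies of frequencies: what proportion of the types occur once or twice?
import nltk
from nltk.corpus import gutenberg

fd = nltk.FreqDist(w.lower() for w in gutenberg.words('austen-emma.txt'))
n_types = fd.B()                              # number of word types
hapaxes = len(fd.hapaxes())                   # types occurring exactly once
twice = sum(1 for w in fd if fd[w] == 2)      # types occurring exactly twice
print("hapax legomena: %.2f of the types" % (hapaxes / float(n_types)))
print("occur twice:    %.2f of the types" % (twice / float(n_types)))
```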

Page 32:

Example of random variable

W is the set of all the different word forms in the text. Freq(w), which tells how many times w occurs, is a random variable X: W → ℝ. How probable is it that Freq(w) = X(w) = n, i.e. what is P(X = n)?

Page 33:

Zipf’s law

Zipf's law: f ∝ 1/r, i.e. f · r ≈ constant, where r is a word's rank and f its frequency. Inversely: how many words occur exactly n times? It must be the rank of the last word that occurs at least n times minus the rank of the last word that occurs at least n+1 times. (A quick empirical check follows below.)
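If Zipf's law held exactly, the product rank × frequency would be roughly constant across the vocabulary. A quick check on the stand-in corpus used above:

```python
# Empirical check of Zipf's law: print r, f and r*f at a few ranks.
import nltk
from nltk.corpus import gutenberg

fd = nltk.FreqDist(w.lower() for w in gutenberg.words('austen-emma.txt'))
for rank, (word, freq) in enumerate(fd.most_common(), start=1):
    if rank in (1, 10, 100, 1000, 5000):
        print("%6d %12s %6d  r*f = %d" % (rank, word, freq, rank * freq))
```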

Page 34:

Zipf’s law

”Nearly correct” Proposed refinements

Page 35:

Other counts

Sentences: number of sentences, average sentence length, frequencies of sentence lengths

Collocations (bigrams), n-grams, …

Page 36:

It is not dangerous

Once upon a time there was a little goat kid who had learned to count to ten. When he came to a puddle, he stood for a long time looking at his reflection in the water, and now you shall hear how it went:

- One, said the goat kid. A calf walking nearby, eating grass, heard this. - What are you doing? said the calf. - I am counting myself, said the goat kid. - Shall I count you too?

- If it doesn't hurt, said the calf. - It surely doesn't; stand still, and I'll count you too.

- No, I don't dare, maybe my mother won't even let me, said the calf, and drew back. But the goat kid followed him, and then he said:

- I am one, and you are two, 1-2. - Mo-ther! bawled the calf and began to cry …

Page 37:

Corpora

Collections of texts from different sources, spoken and written: newspapers, novels, dialogue, …

May (or may not) aspire to be balanced/representative (of a domain)

Plain or marked-up

Page 38:

Corpora

Famous corpora:
Balanced: Brown corpus (Am. English), LOB (Brit. English, Lancaster-Oslo-Bergen), British National Corpus (BNC)
Wall Street Journal (WSJ), much used e.g. in statistical parsing
Europarl, used in statistical MT

To find more: http://nlp.stanford.edu/links/statnlp.html

(A short NLTK corpus-access sketch follows below.)
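One of these corpora, the Brown corpus, ships with nltk_data and can be inspected directly:

```python
# Accessing the Brown corpus through NLTK (requires nltk.download('brown')).
import nltk
from nltk.corpus import brown

print(brown.categories()[:5])               # genres the corpus is balanced over
print(brown.words(categories='news')[:10])  # first word tokens of the news part
print(len(brown.sents()), "sentences in total")
```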

Page 39:

/norstore/project/ltr

Check web for LDC content + nltk-data (address later)

jtl@rubin / $ ls -l /norstore/project/ltr
total 83904
drwxr-xr-x+  6 oe   ifi-ltr    65536 2010-03-08 11:19 ARC
drwxrwxr-x+  3 oe   ifi-ltr    65536 2009-09-30 20:56 BNC
drwxrwxr-x+ 22 oe   ifi-ltr    65536 2010-09-23 12:09 LDC
drwxrwxr-x+  4 oe   ifi-ltr    65536 2009-06-17 15:02 NIST
drwxr-xr-x+  3 oe   oe         65536 2010-02-26 12:12 NORA
drwxr-xr-x+  3 oe   oe         65536 2010-02-25 10:09 OANC
-rw-r-xr--+  1 root ifi-ltr    21531 2011-08-20 21:21 SYNCED
-rw-r-xr--+  1 root ifi-ltr 84424449 2011-08-20 21:21 SYNCED.log
drwxr-xr-x  32 oe   oe         65536 2011-03-04 13:25 WikiWoods
drwxr-xr-x   3 oe   oe         65536 2011-03-04 13:41 lingo
drwxrwxr-x+  3 oe   ifi-ltr    65536 2009-09-30 21:01 oe
-rw-r--r--   1 root root          44 2011-03-07 18:17 testfile
drwxr-xr-x+  3 oe   oe         65536 2011-08-19 16:03 www
jtl@rubin / $

Page 40:

Tokenization

A text file is a sequence of characters (ASCII or UTF-8 or …); we may need to convert between character sets

Tokenize: turn it into a sequence of (word) tokens. What is a token/word?
Should ? , ; . ! be tokens?
”king” → ” king ” (3 tokens), king (1 token), or ”king” (1 token)?
It’s → It ’ s (3 tokens), It ’s (2 tokens), It is (2 tokens), or It’s (1 token)?
(See what NLTK's tokenizer does in the sketch below.)
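One possible set of answers is the one NLTK's default word tokenizer gives; the example sentence is made up, and the exact output can vary between NLTK versions.

```python
# Tokenizing with NLTK (requires the 'punkt' tokenizer models).
import nltk

text = '"It\'s the king", she said.'
print(nltk.word_tokenize(text))
# e.g. ['``', 'It', "'s", 'the', 'king', "''", ',', 'she', 'said', '.']
```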

Page 41:

Beyond tokenization

Downcase or not? (depends on the purpose)

Lemmatize? king → king, kings → king, man → man, men → man
Relevant e.g. for counting type frequencies: word forms or lexemes

Sentence splitting: sentence boundaries are not always clear; use heuristics or machine learning
(A small lemmatization and sentence-splitting sketch follows below.)
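A minimal sketch of both steps with NLTK; the example sentence is made up, and the outputs shown in the comments are what current NLTK versions typically produce.

```python
# Lemmatization and sentence splitting (requires the 'wordnet' and 'punkt' data).
import nltk
from nltk.stem import WordNetLemmatizer

wnl = WordNetLemmatizer()
print([wnl.lemmatize(w, pos='n') for w in ["king", "kings", "man", "men"]])
# -> ['king', 'king', 'man', 'man']

text = "Dr. Smith went to Washington. He arrived on Monday."
print(nltk.sent_tokenize(text))
# -> ['Dr. Smith went to Washington.', 'He arrived on Monday.']
```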

Page 42:

Covered, read, work

Foundations of Statistical Natural Language Processing (FSNLP): ch. 1, ch. 4.1-4.2

Two papers by Joakim Nivre

Work through Natural Language Processing with Python (NLTK), sec. 1.1-1.3, 1.5, 2.1-2.2

Nivre's web course, lecture 1