

Part II. Statistical NLP

Advanced Artificial Intelligence: Introduction and Grammar Models

Wolfram Burgard, Luc De Raedt, Bernhard Nebel, Kristian Kersting

Some slides taken from Helmut Schmid, Rada Mihalcea, Bonnie Dorr, Leila Kosseim, Peter Flach and others

Topic

Statistical Natural Language Processing applies:

• Machine Learning / Statistics. Learning: the ability to improve one's behaviour at a specific task over time; involves the analysis of data (statistics)

• Natural Language Processing

Following parts of the book:

• Statistical NLP (Manning and Schuetze), MIT Press, 1999

Contents

Motivation: Zipf's law
Some natural language processing tasks
Non-probabilistic NLP models

• Regular grammars and finite state automata
• Context-Free Grammars
• Definite Clause Grammars

Motivation for statistical NLP
Overview of the rest of this part

Rationalism versus Empiricism

Rationalist

• Noam Chomsky: innate language structures
• AI: hand-coding of NLP
• Dominant view 1960-1985
• Cf. e.g. Steven Pinker's The Language Instinct (popular science book)

Empiricist

• The ability to learn is innate
• AI: language is learned from corpora
• Dominant 1920-1960, and becoming increasingly important

Rationalism versus Empiricism

Noam Chomsky
• "But it must be recognized that the notion of 'probability of a sentence' is an entirely useless one, under any known interpretation of this term."

Fred Jelinek (IBM, 1988)
• "Every time a linguist leaves the room, the recognition rate goes up."
• (Alternative: "Every time I fire a linguist, the recognizer improves.")

This course

Empiricist approach
• Focus will be on probabilistic models for learning of natural language

No time to treat natural language in depth
• (though this would be quite useful and interesting)
• deserves a full course by itself

Covered in more depth in Logic, Language and Learning (SS 05, prob. SS 06)

Ambiguity

Statistical Disambiguation

• Define a probability model for the data

• Compute the probability of each alternative

• Choose the most likely alternative

NLP and Statistics

Statistical methods deal with uncertainty: they predict the future behaviour of a system based on the behaviour observed in the past.

Statistical Methods require training data

The data in Statistical NLP are the Corpora

NLP and Statistics

Corpus: a text collection for linguistic purposes

Tokens: How many words are contained in Tom Sawyer? 71,370

Types: How many different words are contained in Tom Sawyer? 8,018

Hapax legomena: words appearing only once
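These counts are easy to reproduce. A minimal Python sketch, using a toy sentence and crude whitespace tokenization (run on the full Tom Sawyer text, the same logic would yield the figures above):

```python
from collections import Counter

def corpus_stats(text):
    """Count tokens, types, and hapax legomena in a whitespace-tokenized text."""
    tokens = text.lower().split()          # crude tokenization, for illustration
    counts = Counter(tokens)
    hapax = [w for w, c in counts.items() if c == 1]
    return len(tokens), len(counts), hapax

# Toy example:
tokens, types, hapax = corpus_stats("the cat saw the dog and the dog saw a bird")
```

A real study would need a proper tokenizer (punctuation, case, clitics), which is why published token/type counts for the same text can differ slightly.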

Corpora

The most frequent words are function words

word  freq    word  freq
the   3332    in    906
and   2972    that  877
a     1775    he    877
to    1725    I     783
of    1440    his   772
was   1161    you   686
it    1027    Tom   679

Word Counts

f        n_f
1        3993
2        1292
3        664
4        410
5        243
6        199
7        172
8        131
9        82
10       91
11-50    540
51-100   99
>100     102

How many words appear f times?

Word Counts

About half of the words occur just once.
About half of the text consists of the 100 most common words …

Word Counts (Brown corpus)

word   f     r    f·r      word        f    r     f·r
the    3332  1    3332     turned      51   200   10200
and    2972  2    5944     you'll      30   300   9000
a      1775  3    5325     name        21   400   8400
he     877   10   8770     comes       16   500   8000
but    410   20   8200     group       13   600   7800
be     294   30   8820     lead        11   700   7700
there  222   40   8880     friends     10   800   8000
one    172   50   8600     begin       9    900   8100
about  158   60   9480     family      8    1000  8000
more   138   70   9660     brushed     4    2000  8000
never  124   80   9920     sins        2    3000  6000
Oh     116   90   10440    Could       2    4000  8000
two    104   100  10400    Applausive  1    8000  8000

Zipf's Law: f ~ 1/r (f·r = const)

Zipf's Law: minimize effort
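The law is easy to check empirically. A small Python sketch that ranks words by frequency and reports f·r per rank (toy data here; on a real corpus the products cluster in a narrow band, as in the table above):

```python
from collections import Counter

def zipf_check(tokens):
    """Rank words by frequency and compute f*r for each rank.

    Under Zipf's law the product f*r is roughly constant across ranks.
    """
    counts = Counter(tokens).most_common()
    return [(word, f, rank, f * rank) for rank, (word, f) in enumerate(counts, start=1)]

table = zipf_check("a a a a b b c".split())
```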

Language and sequences

Natural language processing
• is concerned with the analysis of sequences of words / sentences
• construction of language models

Two types of models
• non-probabilistic
• probabilistic

Human Language is highly ambiguous at all levels

• acoustic level: recognize speech vs. wreck a nice beach

• morphological level: saw: to see (past), saw (noun), to saw (present, inf)

• syntactic level: I saw the man on the hill with a telescope

• semantic level: One book has to be read by every student

Key NLP Problem: Ambiguity

Language Model

A formal model about language. Two types:

• Non-probabilistic: allows one to compute whether a certain sequence (sentence or part thereof) is possible; often grammar-based

• Probabilistic: allows one to compute the probability of a certain sequence; often extends grammars with probabilities

Example of a bad language model

A good language model

Non-probabilistic
• "I swear to tell the truth" is possible
• "I swerve to smell de soup" is impossible

Probabilistic
• P(I swear to tell the truth) ≈ 0.0001
• P(I swerve to smell de soup) ≈ 0

Why language models?

Consider a Shannon game
• predicting the next word in the sequence:

  Statistical natural language …
  The cat is thrown out of the …
  The large green …
  Sue swallowed the large green …
  …

Model at the sentence level
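The Shannon game can be played with even the simplest probabilistic model. A minimal, illustrative Python sketch using bigram counts (toy training text, no smoothing):

```python
from collections import Counter, defaultdict

def train_bigrams(tokens):
    """Count which word follows which in the training sequence."""
    nxt = defaultdict(Counter)
    for w1, w2 in zip(tokens, tokens[1:]):
        nxt[w1][w2] += 1
    return nxt

def predict_next(nxt, word):
    """Shannon game: guess the continuation seen most often in training."""
    return nxt[word].most_common(1)[0][0] if nxt[word] else None

model = train_bigrams("the cat is out of the house and the cat is happy".split())
```

A serious model would smooth the counts and condition on more context; this sketch only shows the prediction principle.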

Applications

Spelling correction
Mobile phone texting
Speech recognition
Handwriting recognition
Disabled users
…

Spelling errors

They are leaving in about fifteen minuets to go to her house.

The study was conducted mainly be John Black.
Hopefully, all with continue smoothly in my absence.
Can they lave him my messages?
I need to notified the bank of …
He is trying to fine out.

Handwriting recognition

Assume a note is given to a bank teller, which the teller reads as "I have a gub." (cf. Woody Allen)

NLP to the rescue …
• gub is not a word
• gun, gum, Gus, and gull are words, but gun has a higher probability in the context of a bank

For Spell Checkers

Collect a list of commonly substituted words
• piece/peace, whether/weather, their/there

Example:
"On Tuesday, the whether …"
"On Tuesday, the weather …"
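A context-sensitive checker along these lines can be sketched as follows. The confusion sets and the tiny corpus are invented for illustration; a real system would use large bigram counts:

```python
from collections import Counter

# Hypothetical confusion sets; extend with piece/peace, their/there, etc.
CONFUSION = {"whether": {"whether", "weather"}, "weather": {"whether", "weather"}}

def choose(left_context, word, bigram_counts):
    """Pick the confusion-set member seen most often after the preceding word."""
    candidates = CONFUSION.get(word, {word})
    return max(candidates, key=lambda c: bigram_counts[(left_context, c)])

corpus = "on tuesday the weather was fine and the weather turned cold".split()
counts = Counter(zip(corpus, corpus[1:]))
```

Given the context "the", the checker prefers "weather" over "whether" because that bigram occurred in the corpus.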

Another dimension in language models

Do we mainly want to infer (probabilities of) legal sentences / sequences?
• so far

Or do we want to infer properties of these sentences?
• e.g. parse trees, part-of-speech tagging
• needed for understanding NL

Let's look at some tasks

Sequence Tagging

Part-of-speech tagging
• He drives with his bike
• PN V PR PN N: pronoun, verb, preposition, pronoun, noun

Text extraction
• The job is that of a programmer
• X X X X X X JobType
• The seminar is taking place from 15:00 to 16:00
• X X X X X X Start End

Sequence Tagging

Predicting the secondary structure of proteins, mRNA, …
• X = AFARLMMA…
• Y = hehestststhesthe…

Parsing

Given a sentence, find its parse tree. This is an important step in understanding NL.

Parsing

In bioinformatics, parsing allows one to predict (elements of) structure from sequence.

Language models based on Grammars

Grammar types
• Regular grammars and Finite State Automata
• Context-Free Grammars
• Definite Clause Grammars: a particular type of Unification-Based Grammar (Prolog)

Distinguish the lexicon from the grammar
• The lexicon (dictionary) contains information about words, e.g. word → possible tags (and possibly additional information): flies → V(erb), N(oun)
• The grammar encodes the rules

Grammars and parsing

The syntactic level is the best understood and formalized.
Derivation of grammatical structure: parsing (more than just recognition).
The result of parsing is mostly a parse tree, showing the constituents of a sentence, e.g. verb or noun phrases.

Syntax is usually specified in terms of a grammar consisting of grammar rules.

Regular Grammars and Finite State Automata

Lexical information: which words are
• Det(erminer)
• N(oun)
• Vi (intransitive verb): takes no argument
• Pn (pronoun)
• Vt (transitive verb): takes an argument
• Adj (adjective)

Now accept:
• The cat slept
• Det N Vi

As a regular grammar:
• S  -> [Det], S1     ([ ]: terminal)
• S1 -> [N], S2
• S2 -> [Vi]

Lexicon:
• the → Det
• cat → N
• slept → Vi
• …

Finite State Automaton

Sentences

• John smiles: Pn Vi
• The cat disappeared: Det N Vi
• These new shoes hurt: Det Adj N Vi
• John liked the old cat: Pn Vt Det Adj N
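The automaton for this grammar can be sketched directly. States, transitions, and the small lexicon below follow the S/S1/S2 rules above, extended with the pronoun start used in "John smiles" (an illustrative sketch, not a full tagger):

```python
# Lexicon: word -> tag (illustrative).
LEXICON = {"the": "Det", "cat": "N", "slept": "Vi", "john": "Pn", "smiles": "Vi"}

# transitions[state][tag] = next state; "s3" is the single accepting state.
TRANSITIONS = {"s": {"Det": "s1", "Pn": "s2"}, "s1": {"N": "s2"}, "s2": {"Vi": "s3"}}

def accepts(sentence):
    """Run the FSA over the tag sequence of the sentence."""
    state = "s"
    for word in sentence.lower().split():
        tag = LEXICON.get(word)
        state = TRANSITIONS.get(state, {}).get(tag)
        if state is None:                 # no transition: reject
            return False
    return state == "s3"
```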

Phrase structure

The parse tree for "the dog chased a cat into the garden" (bracketed):

[S [NP [D the] [N dog]]
   [VP [V chased]
       [NP [D a] [N cat]]
       [PP [P into] [NP [D the] [N garden]]]]]

Notation

S: sentence; D or Det: determiner (e.g. articles); N: noun; V: verb; P: preposition; NP: noun phrase; VP: verb phrase; PP: prepositional phrase

Context-Free Grammar:
S  -> NP VP
NP -> D N
VP -> V NP
VP -> V NP PP
PP -> P NP
D  -> [the]
D  -> [a]
N  -> [dog]
N  -> [cat]
N  -> [garden]
V  -> [chased]
V  -> [saw]
P  -> [into]

Terminals ~ lexicon

Phrase structure

Formalism of context-free grammars
• Nonterminal symbols: S, NP, VP, …
• Terminal symbols: dog, cat, saw, the, …

Recursion
• "The girl thought the dog chased the cat"

VP -> V S
N  -> [girl]
V  -> [thought]

Top-down parsing

S -> NP VP
  -> Det N VP
  -> The N VP
  -> The dog VP
  -> The dog V NP
  -> The dog chased NP
  -> The dog chased Det N
  -> The dog chased the N
  -> The dog chased the cat
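Top-down parsing as in this derivation can be sketched as a recursive-descent recognizer. The grammar and lexicon are the toy ones from above; note that this naive version would loop forever on left-recursive rules:

```python
# Toy CFG (nonterminal -> list of alternative right-hand sides) and lexicon.
RULES = {
    "S": [["NP", "VP"]],
    "NP": [["Det", "N"]],
    "VP": [["V", "NP"], ["V"]],
}
LEXICON = {"Det": {"the", "a"}, "N": {"dog", "cat"}, "V": {"chased", "slept"}}

def parse(symbols, words):
    """Try to expand the list of symbols to exactly the list of words."""
    if not symbols:
        return not words                       # success iff all words consumed
    head, rest = symbols[0], symbols[1:]
    if head in RULES:                          # nonterminal: try each production
        return any(parse(body + rest, words) for body in RULES[head])
    # preterminal: consume one word if the lexicon allows it
    return bool(words) and words[0] in LEXICON.get(head, set()) and parse(rest, words[1:])
```

For example, `parse(["S"], "the dog chased the cat".split())` succeeds via exactly the expansion steps shown above.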

Context-free grammar

S   --> NP, VP
NP  --> PN            % proper noun
NP  --> Art, Adj, N
NP  --> Art, N
VP  --> VI            % intransitive verb
VP  --> VT, NP        % transitive verb
Art --> [the]
Adj --> [lazy]
Adj --> [rapid]
PN  --> [achilles]
N   --> [turtle]
VI  --> [sleeps]
VT  --> [beats]

Parse tree

The parse tree for "the rapid turtle beats achilles" (bracketed):

[S [NP [Art the] [Adj rapid] [N turtle]]
   [VP [VT beats] [NP [PN achilles]]]]

Definite Clause Grammars: non-terminals may have arguments

S     --> NP(N), VP(N)
NP(N) --> Art(N), N(N)
VP(N) --> VI(N)
Art(singular) --> [a]
Art(singular) --> [the]
Art(plural)   --> [the]
N(singular)   --> [turtle]
N(plural)     --> [turtles]
VI(singular)  --> [sleeps]
VI(plural)    --> [sleep]

Number agreement

DCGs

Non-terminals may have arguments:
• Variables (start with a capital), e.g. Number, Any
• Constants (start with lower case), e.g. singular, plural
• Structured terms (start with lower case and take arguments themselves), e.g. vp(V, NP)

Parsing needs to be adapted
• using unification

Unification in a nutshell (cf. AI course)

Substitutions, e.g. {Num / singular}, {T / vp(V, NP)}

Applying a substitution:
• simultaneously replace variables by the corresponding terms
• S(Num) {Num / singular} = S(singular)

Unification

Take two non-terminals with arguments and compute the (most general) substitution that makes them identical, e.g.

• Art(singular) and Art(Num): gives {Num / singular}

• Art(singular) and Art(plural): fails

• Art(Num1) and Art(Num2): {Num1 / Num2}

• PN(Num, accusative) and PN(singular, Case): {Num / singular, Case / accusative}
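The unification cases above can be reproduced with a small sketch. Terms are encoded as tuples, variables as capitalized strings; there is no occurs check, so this is illustrative only:

```python
def is_var(t):
    """A term is a variable if it is a string starting with a capital letter."""
    return isinstance(t, str) and t[:1].isupper()

def walk(t, subst):
    """Follow variable bindings until a non-variable or unbound variable."""
    while is_var(t) and t in subst:
        t = subst[t]
    return t

def unify(a, b, subst=None):
    """Return the most general substitution making a and b identical, or None."""
    subst = dict(subst or {})
    a, b = walk(a, subst), walk(b, subst)
    if a == b:
        return subst
    if is_var(a):
        subst[a] = b
        return subst
    if is_var(b):
        subst[b] = a
        return subst
    if isinstance(a, tuple) and isinstance(b, tuple) and len(a) == len(b):
        for x, y in zip(a, b):
            subst = unify(x, y, subst)
            if subst is None:
                return None
        return subst
    return None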

Parsing with DCGs

Now require successful unification at each step:

S -> NP(N), VP(N)
  -> Art(N), N(N), VP(N)        {N / singular}
  -> a, N(singular), VP(singular)
  -> a turtle VP(singular)
  -> a turtle sleeps

S -> a turtle sleep: fails

Case Marking

PN(singular, nominative) --> [he] ; [she]
PN(singular, accusative) --> [him] ; [her]
PN(plural, nominative)   --> [they]
PN(plural, accusative)   --> [them]

S --> NP(Number, nominative), VP(Number)
VP(Number) --> V(Number), NP(Any, accusative)
NP(Number, Case) --> PN(Number, Case)
NP(Number, Any)  --> Det, N(Number)

He sees her. She sees him. They see her.
But not: Them see he.

DCGs

DCGs are strictly more expressive than CFGs. They can represent, for instance, the language a^n b^n c^n:

• S(N)    --> A(N), B(N), C(N)
• A(0)    --> []
• B(0)    --> []
• C(0)    --> []
• A(s(N)) --> A(N), [a]
• B(s(N)) --> B(N), [b]
• C(s(N)) --> C(N), [c]
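A direct membership test for this language, which no context-free grammar can generate, is trivial to state:

```python
def is_anbncn(s):
    """Check membership in a^n b^n c^n (n >= 0), the language of the DCG above."""
    n = len(s) // 3
    return len(s) == 3 * n and s == "a" * n + "b" * n + "c" * n
```

The DCG achieves the same effect declaratively: the shared argument N forces the three counts to agree, which a CFG's independent nonterminals cannot do.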

Probabilistic Models

Traditional grammar models are very rigid:
• essentially a yes/no decision

Probabilistic grammars
• define a probability model for the data
• compute the probability of each alternative
• choose the most likely alternative

Illustrated on:
• the Shannon game
• spelling correction
• parsing

Illustration

Wall Street Journal Corpus: 3,000,000 words, with the correct parse tree for each sentence known

• constructed by hand
• can be used to derive stochastic context-free grammars
• SCFGs assign probabilities to parse trees

Compute the most probable parse tree
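The idea can be sketched as follows: an SCFG assigns each tree the product of the probabilities of the rules used in it. The rule probabilities below are invented for illustration; in practice they are estimated from treebank rule counts (such as the hand-parsed WSJ corpus):

```python
# Hypothetical rule probabilities; probabilities per left-hand side sum to 1.
PROBS = {
    ("S", ("NP", "VP")): 1.0,
    ("NP", ("Det", "N")): 0.6,
    ("NP", ("PN",)): 0.4,
    ("VP", ("V", "NP")): 1.0,
}

def tree_prob(tree):
    """tree = (label, [children]) for rule nodes, (label, word) for preterminals.

    Lexical probabilities are ignored here to keep the sketch short.
    """
    label, children = tree
    if isinstance(children, str):          # preterminal: (tag, word)
        return 1.0
    p = PROBS[(label, tuple(c[0] for c in children))]
    for child in children:
        p *= tree_prob(child)
    return p

t = ("S", [("NP", [("PN", "john")]),
           ("VP", [("V", "saw"),
                   ("NP", [("Det", "the"), ("N", "dog")])])])
```

Finding the most probable tree for a sentence then amounts to maximizing this product over all trees the grammar allows, which the CKY-style algorithms of later lectures do efficiently.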

Sequences are omni-present

Therefore, the techniques we will see also apply to:

• Bioinformatics: DNA, proteins, mRNA, … can all be represented as strings

• Robotics: sequences of actions, states, …

• …

Rest of the Course

Limitations of traditional grammar models motivate probabilistic extensions:

• Regular grammars and Finite State Automata:
  Markov Models using n-grams
  (Hidden) Markov Models
  Conditional Random Fields
  • as an example of using undirected graphical models
• Probabilistic Context-Free Grammars
• Probabilistic Definite Clause Grammars

All use principles of Part I on Graphical Models.

Page 3: Part II. Statistical NLP Advanced Artificial Intelligence Introduction and Grammar Models Wolfram Burgard, Luc De Raedt, Bernhard Nebel, Kristian Kersting

Contents

Motivation Zipfrsquos law Some natural language processing tasks Non-probabilistic NLP models

bull Regular grammars and finite state automatabull Context-Free Grammarsbull Definite Clause Grammars

Motivation for statistical NLP Overview of the rest of this part

Rationalism versus Empiricism

Rationalist bull Noam Chomsky - innate language structuresbull AI hand coding NLPbull Dominant view 1960-1985bull Cf eg Steven Pinkerrsquos The language instinct (popular

science book) Empiricist

bull Ability to learn is innatebull AI language is learned from corporabull Dominant 1920-1960 and becoming increasingly important

Rationalism versus Empiricism

Noam Chomskybull But it must be recognized that the notion of ldquoprobability of

a sentencerdquo is an entirely useless one under any known interpretation of this term

Fred Jelinek (IBM 1988)bull Every time a linguist leaves the room the recognition rate

goes upbull (Alternative Every time I fire a linguist the recognizer

improves)

This course

Empiricist approach bull Focus will be on probabilistic models for learning

of natural language No time to treat natural language in depth

bull (though this would be quite useful and interesting)

bull Deserves a full course by itself Covered in more depth in Logic Language and

Learning (SS 05 prob SS 06)

Ambiguity

Statistical Disambiguation

bull Define a probability model for the data

bull Compute the probability of each alternative

bull Choose the most likely alternative

NLP and Statistics

Statistical Methods deal with uncertaintyThey predict the future behaviour of a systembased on the behaviour observed in the past

Statistical Methods require training data

The data in Statistical NLP are the Corpora

NLP and Statistics

Corpus text collection for linguistic purposes

TokensHow many words are contained in Tom Sawyer 71370

TypesHow many different words are contained in TS 8018

Hapax Legomenawords appearing only once

Corpora

The most frequent words are function words

word freq word freq

the 3332 in 906

and 2972 that 877

a 1775 he 877

to 1725 I 783

of 1440 his 772

was 1161 you 686

it 1027 Tom 679

Word Counts

f nf

1 39932 12923 6644 4105 2436 1997 1728 1319 8210 9111-50 54051-100 99gt 100 102

How many words appear f times

Word Counts

About half of the words occur just once.
About half of the text consists of the 100 most common words …

Word Counts (Brown corpus)

word    f     r    f·r       word        f   r     f·r
the     3332  1    3332      turned      51  200   10200
and     2972  2    5944      you'll      30  300    9000
a       1775  3    5235      name        21  400    8400
he      877   10   8770      comes       16  500    8000
but     410   20   8200      group       13  600    7800
be      294   30   8820      lead        11  700    7700
there   222   40   8880      friends     10  800    8000
one     172   50   8600      begin        9  900    8100
about   158   60   9480      family       8  1000   8000
more    138   70   9660      brushed      4  2000   8000
never   124   80   9920      sins         2  3000   6000
Oh      116   90   10440     Could        2  4000   8000
two     104   100  10400     Applausive   1  8000   8000

Zipf's Law: f ~ 1/r   (f · r = const)
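Zipf's law can be checked mechanically from a rank-frequency table. A small sketch (the helper name `zipf_products` is ours), using a few (rank, frequency) rows from the table above:

```python
def zipf_products(rank_freq_pairs):
    """Zipf's law predicts f ~ 1/r, i.e. the product f * r is roughly constant."""
    return [r * f for r, f in rank_freq_pairs]

# A few (rank, frequency) rows from the table above.
rows = [(1, 3332), (2, 2972), (10, 877), (100, 104), (1000, 8)]
products = zipf_products(rows)
# All products fall in the same order of magnitude (roughly 3300-10400).
```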

Zipf's Law

Minimize effort

Language and sequences

Natural language processing:
• is concerned with the analysis of sequences of words / sentences
• construction of language models

Two types of models:
• non-probabilistic
• probabilistic

Human language is highly ambiguous at all levels:

• acoustic level: "recognize speech" vs. "wreck a nice beach"

• morphological level: "saw" - to see (past), saw (noun), to saw (present, inf.)

• syntactic level: "I saw the man on the hill with a telescope"

• semantic level: "One book has to be read by every student"

Key NLP Problem: Ambiguity

Language Model

A formal model about language. Two types:

• Non-probabilistic: allows one to compute whether a certain sequence (sentence or part thereof) is possible; often grammar-based

• Probabilistic: allows one to compute the probability of a certain sequence; often extends grammars with probabilities

Example of bad language model

A bad language model

A bad language model

A good language model

Non-probabilistic:
• "I swear to tell the truth" is possible
• "I swerve to smell de soup" is impossible

Probabilistic:
• P(I swear to tell the truth) ≈ 0.001
• P(I swerve to smell de soup) ≈ 0
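A probabilistic model of this kind can be as simple as a maximum-likelihood bigram model. A minimal sketch (our own toy implementation, with no smoothing, so every unseen bigram gets probability 0):

```python
from collections import Counter

def train_bigram(sentences):
    """Maximum-likelihood bigram model: P(w2 | w1) = c(w1 w2) / c(w1)."""
    uni, bi = Counter(), Counter()
    for s in sentences:
        words = ["<s>"] + s.split()
        uni.update(words)
        bi.update(zip(words, words[1:]))

    def prob(sentence):
        words = ["<s>"] + sentence.split()
        p = 1.0
        for w1, w2 in zip(words, words[1:]):
            if bi[(w1, w2)] == 0:
                return 0.0            # unseen bigram -> probability 0 (no smoothing)
            p *= bi[(w1, w2)] / uni[w1]
        return p

    return prob

lm = train_bigram(["I swear to tell the truth"])
lm("I swear to tell the truth")   # 1.0 on this one-sentence corpus
lm("I swerve to smell de soup")   # 0.0
```

On a real corpus the probabilities are far smaller, and smoothing is needed so that unseen-but-possible sentences do not get probability exactly 0.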

Why language models?

Consider a Shannon game:
• predicting the next word in a sequence

Statistical natural language …
The cat is thrown out of the …
The large green …
Sue swallowed the large green …
…

Model at the sentence level

Applications

• Spelling correction
• Mobile phone texting
• Speech recognition
• Handwriting recognition
• Disabled users
• …

Spelling errors

"They are leaving in about fifteen minuets to go to her house."

"The study was conducted mainly be John Black."
"Hopefully, all with continue smoothly in my absence."
"Can they lave him my messages?"
"I need to notified the bank of …"
"He is trying to fine out."

Handwriting recognition

Assume a note is given to a bank teller, which the teller reads as "I have a gub." (cf. Woody Allen)

NLP to the rescue:
• "gub" is not a word
• gun, gum, Gus, and gull are words, but "gun" has a higher probability in the context of a bank

For Spell Checkers

Collect a list of commonly substituted words:
• piece/peace, whether/weather, their/there, …

Example:
"On Tuesday, the whether …"
"On Tuesday, the weather …"
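Given such confusion sets, a context-sensitive spell checker can pick whichever member is more probable after the preceding word. A sketch with hypothetical bigram counts (a real checker would estimate them from a corpus):

```python
# Hypothetical bigram counts; a real checker would estimate these from a corpus.
BIGRAM_COUNTS = {("the", "weather"): 120, ("the", "whether"): 1}

# Confusion sets of commonly substituted words.
CONFUSION = {
    "whether": {"weather", "whether"},
    "weather": {"weather", "whether"},
}

def correct(prev_word, word):
    """Pick the confusion-set member most likely to follow prev_word."""
    candidates = CONFUSION.get(word, {word})   # words outside any set stay as-is
    return max(candidates, key=lambda w: BIGRAM_COUNTS.get((prev_word, w), 0))

correct("the", "whether")   # -> "weather"
```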

Another dimension in language models

Do we mainly want to infer (probabilities of) legal sentences / sequences?
• So far …

Or do we want to infer properties of these sentences?
• e.g. parse tree, part-of-speech tagging
• needed for understanding NL

Let's look at some tasks

Sequence Tagging

Part-of-speech tagging:
• He drives with his bike
• N   V     PR   PN  N    (noun, verb, preposition, pronoun, noun)

Text extraction:
• The job is that of a programmer.
•  X   X  X   X   X  X  JobType
• The seminar is taking place from 15:00 to 16:00.
•  X    X      X   X     X    X   Start    End

Sequence Tagging

Predicting the secondary structure of proteins, mRNA, …
• X = A F A R L M M A …
• Y = he he st st st he st he …

Parsing

Given a sentence, find its parse tree. An important step in understanding NL.

Parsing

In bioinformatics, parsing allows one to predict (elements of) structure from sequence.

Language models based on Grammars

Grammar types:
• Regular grammars and Finite State Automata
• Context-Free Grammars
• Definite Clause Grammars: a particular type of Unification-Based Grammar (Prolog)

Distinguish lexicon from grammar:
• Lexicon (dictionary) contains information about words, e.g. word -> possible tags (and possibly additional information): flies -> V(erb) or N(oun)
• Grammar encodes the rules

Grammars and parsing:
• Syntactic level best understood and formalized
• Derivation of grammatical structure: parsing (more than just recognition)
• Result of parsing: mostly a parse tree, showing the constituents of a sentence, e.g. verb or noun phrases
• Syntax usually specified in terms of a grammar consisting of grammar rules

Regular Grammars and Finite State Automata

Lexical information - which words are:
• Det(erminer)
• N(oun)
• Vi (intransitive verb) - takes no argument
• Pn (pronoun)
• Vt (transitive verb) - takes an argument
• Adj (adjective)

Now accept:
• The cat slept
• Det N   Vi

As regular grammar:
• S  -> [Det] S1    ([ ] = terminal)
• S1 -> [N] S2
• S2 -> [Vi]

Lexicon:
• the - Det
• cat - N
• slept - Vi
• …
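The regular grammar above is equivalent to a finite state automaton over tag sequences. A minimal sketch (state and table names are ours):

```python
# Transitions mirroring S -> [Det] S1, S1 -> [N] S2, S2 -> [Vi].
TRANSITIONS = {("S", "Det"): "S1", ("S1", "N"): "S2", ("S2", "Vi"): "End"}
LEXICON = {"the": "Det", "cat": "N", "slept": "Vi"}

def accepts(sentence):
    """Run the automaton over the tag sequence of the sentence."""
    state = "S"
    for word in sentence.lower().split():
        tag = LEXICON.get(word)                 # unknown word -> no tag
        state = TRANSITIONS.get((state, tag))   # no transition -> reject
        if state is None:
            return False
    return state == "End"

accepts("The cat slept")   # True
accepts("The slept cat")   # False
```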

Finite State Automaton

Sentences

• John smiles - Pn Vi
• The cat disappeared - Det N Vi
• These new shoes hurt - Det Adj N Vi
• John liked the old cat - PN Vt Det Adj N

Phrase structure

[S [NP [D the] [N dog]]
   [VP [V chased]
       [NP [D a] [N cat]]
       [PP [P into] [NP [D the] [N garden]]]]]

"the dog chased a cat into the garden"

Notation

S: sentence; D or Det: determiner (e.g. articles); N: noun; V: verb; P: preposition; NP: noun phrase; VP: verb phrase; PP: prepositional phrase

Context Free Grammar:

S  -> NP VP
NP -> D N
VP -> V NP
VP -> V NP PP
PP -> P NP
D  -> [the]
D  -> [a]
N  -> [dog]
N  -> [cat]
N  -> [garden]
V  -> [chased]
V  -> [saw]
P  -> [into]

Terminals ~ Lexicon

Phrase structure

Formalism of context-free grammars:
• Nonterminal symbols: S, NP, VP, …
• Terminal symbols: dog, cat, saw, the, …

Recursion:
• "The girl thought the dog chased the cat"

VP -> V S
N  -> [girl]
V  -> [thought]

Top-down parsing

S -> NP VP
  -> Det N VP
  -> The N VP
  -> The dog VP
  -> The dog V NP
  -> The dog chased NP
  -> The dog chased Det N
  -> The dog chased the N
  -> The dog chased the cat
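This derivation is what a naive top-down (recursive-descent) recognizer computes. A sketch for the context-free grammar above (the grammar has no left recursion, which this strategy cannot handle):

```python
# The CFG from the slides, nonterminals mapped to lists of right-hand sides.
GRAMMAR = {
    "S":  [["NP", "VP"]],
    "NP": [["D", "N"]],
    "VP": [["V", "NP", "PP"], ["V", "NP"]],
    "PP": [["P", "NP"]],
    "D":  [["the"], ["a"]],
    "N":  [["dog"], ["cat"], ["garden"]],
    "V":  [["chased"], ["saw"]],
    "P":  [["into"]],
}

def derive(symbol, words, i):
    """Return every position j such that words[i:j] can be derived from symbol."""
    if symbol not in GRAMMAR:   # terminal: must match the next word
        return [i + 1] if i < len(words) and words[i] == symbol else []
    ends = []
    for rhs in GRAMMAR[symbol]:
        positions = [i]
        for sym in rhs:         # thread end positions through the right-hand side
            positions = [j for p in positions for j in derive(sym, words, p)]
        ends.extend(positions)
    return ends

def recognize(sentence):
    words = sentence.lower().split()
    return len(words) in derive("S", words, 0)

recognize("the dog chased a cat into the garden")   # True
```

This naive strategy can re-derive the same constituent many times; chart parsers (e.g. CKY) avoid that by memoizing.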

Context-free grammar:

S   --> NP, VP
NP  --> PN              (proper noun)
NP  --> Art, Adj, N
NP  --> Art, N
VP  --> VI              (intransitive verb)
VP  --> VT, NP          (transitive verb)
Art --> [the]
Adj --> [lazy]
Adj --> [rapid]
PN  --> [achilles]
N   --> [turtle]
VI  --> [sleeps]
VT  --> [beats]

Parse tree

Parse tree for "the rapid turtle beats achilles":

[S [NP [Art the] [Adj rapid] [N turtle]]
   [VP [Vt beats] [NP [PN achilles]]]]

Definite Clause Grammars: non-terminals may have arguments

S             --> NP(N), VP(N)
NP(N)         --> Art(N), N(N)
VP(N)         --> VI(N)
Art(singular) --> [a]
Art(singular) --> [the]
Art(plural)   --> [the]
N(singular)   --> [turtle]
N(plural)     --> [turtles]
VI(singular)  --> [sleeps]
VI(plural)    --> [sleep]

Number Agreement

DCGs

Non-terminals may have arguments:

• Variables (start with a capital), e.g. Number, Any

• Constants (start with lower case), e.g. singular, plural

• Structured terms (start with lower case and take arguments themselves), e.g. vp(V, NP)

Parsing needs to be adapted:
• using unification

Unification in a nutshell (cf AI course)

Substitutions, e.g. {Num/singular}, {T/vp(V, NP)}

Applying a substitution:
• simultaneously replace the variables by the corresponding terms
• S(Num) {Num/singular} = S(singular)

Unification

Take two non-terminals with arguments and compute the (most general) substitution that makes them identical, e.g.:

• Art(singular) and Art(Num): gives {Num/singular}

• Art(singular) and Art(plural): fails

• Art(Num1) and Art(Num2): gives {Num1/Num2}

• PN(Num, accusative) and PN(singular, Case): gives {Num/singular, Case/accusative}
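These examples can be reproduced with a small most-general-unifier over flat argument tuples. A sketch (our simplification: variables are capitalized strings, and nested terms and the occurs-check are omitted):

```python
def is_var(term):
    """Variables are capitalized strings, as in the DCG notation above."""
    return isinstance(term, str) and term[:1].isupper()

def unify(args1, args2):
    """Most general unifier of two equal-length argument tuples,
    returned as a substitution dict, or None on failure."""
    subst = {}
    for a, b in zip(args1, args2):
        a, b = subst.get(a, a), subst.get(b, b)   # apply bindings so far
        if a == b:
            continue
        if is_var(a):
            subst[a] = b
        elif is_var(b):
            subst[b] = a
        else:
            return None    # two distinct constants clash
    return subst

unify(("singular",), ("Num",))     # {'Num': 'singular'}
unify(("singular",), ("plural",))  # None
```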

Parsing with DCGs

Now require successful unification at each step

S -> NP(N) VP(N)
  -> Art(N) N(N) VP(N)      with {N/singular}
  -> a N(singular) VP(singular)
  -> a turtle VP(singular)
  -> a turtle sleeps

"S -> a turtle sleep" fails

Case Marking

PN(singular, nominative) --> [he] ; [she]
PN(singular, accusative) --> [him] ; [her]
PN(plural, nominative)   --> [they]
PN(plural, accusative)   --> [them]

S                --> NP(Number, nominative), VP(Number)
VP(Number)       --> V(Number), NP(Any, accusative)
NP(Number, Case) --> PN(Number, Case)
NP(Number, Any)  --> Det, N(Number)

Accepts: "He sees her", "She sees him", "They see her"

But not: "Them see he"

DCGs

Are strictly more expressive than CFGs. Can represent, for instance, the language a^n b^n c^n:

• S(N)    --> A(N), B(N), C(N)
• A(0)    --> []
• B(0)    --> []
• C(0)    --> []
• A(s(N)) --> A(N), [a]
• B(s(N)) --> B(N), [b]
• C(s(N)) --> C(N), [c]
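This grammar generates strings with equal numbers of a's, b's and c's, a language no context-free grammar can generate. A direct recognizer (ours) makes the point concrete:

```python
def is_anbncn(s):
    """Recognize a^n b^n c^n (n >= 0), which is beyond the power of any CFG."""
    n = len(s) // 3
    return len(s) == 3 * n and s == "a" * n + "b" * n + "c" * n

is_anbncn("aabbcc")   # True
is_anbncn("aabbc")    # False
```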

Probabilistic Models

Traditional grammar models are very rigid:
• essentially a yes/no decision

Probabilistic grammars:
• define a probability model for the data
• compute the probability of each alternative
• choose the most likely alternative

Illustrated on:
• the Shannon game
• spelling correction
• parsing

Illustration

Wall Street Journal Corpus: 3,000,000 words; the correct parse tree for each sentence is known

• constructed by hand
• can be used to derive stochastic context-free grammars
• SCFGs assign probabilities to parse trees

Compute the most probable parse tree

Sequences are omni-present

Therefore, the techniques we will see also apply to:

• Bioinformatics: DNA, proteins, mRNA, … can all be represented as strings
• Robotics: sequences of actions, states, …
• …

Rest of the Course

Limitations of traditional grammar models motivate probabilistic extensions:

• Regular grammars and Finite State Automata
  - Markov Models using n-grams
  - (Hidden) Markov Models
  - Conditional Random Fields (as an example of using undirected graphical models)
• Probabilistic Context-Free Grammars
• Probabilistic Definite Clause Grammars

All use the principles of Part I on Graphical Models.

  • Advanced Artificial Intelligence
  • Topic
  • Contents
  • Rationalism versus Empiricism
  • Slide 5
  • This course
  • Ambiguity
  • NLP and Statistics
  • Slide 9
  • Corpora
  • Word Counts
  • Slide 12
  • Word Counts (Brown corpus)
  • Slide 14
  • Zipf's Law
  • Language and sequences
  • Key NLP Problem Ambiguity
  • Language Model
  • Example of bad language model
  • A bad language model
  • Slide 22
  • A good language model
  • Why language models
  • Applications
  • Spelling errors
  • Handwriting recognition
  • For Spell Checkers
  • Another dimension in language models
  • Sequence Tagging
  • Slide 31
  • Parsing
  • Slide 33
  • Language models based on Grammars
  • Grammars and parsing
  • Regular Grammars and Finite State Automata
  • Finite State Automaton
  • Phrase structure
  • Notation
  • Context Free Grammar
  • Slide 41
  • Top-down parsing
  • Context-free grammar
  • Parse tree
  • Definite Clause Grammars Non-terminals may have arguments
  • DCGs
  • Unification in a nutshell (cf AI course)
  • Unification
  • Parsing with DCGs
  • Case Marking
  • Slide 51
  • Probabilistic Models
  • Illustration
  • PowerPoint Presentation
  • Sequences are omni-present
  • Rest of the Course
Page 4: Part II. Statistical NLP Advanced Artificial Intelligence Introduction and Grammar Models Wolfram Burgard, Luc De Raedt, Bernhard Nebel, Kristian Kersting

Rationalism versus Empiricism

Rationalist bull Noam Chomsky - innate language structuresbull AI hand coding NLPbull Dominant view 1960-1985bull Cf eg Steven Pinkerrsquos The language instinct (popular

science book) Empiricist

bull Ability to learn is innatebull AI language is learned from corporabull Dominant 1920-1960 and becoming increasingly important

Rationalism versus Empiricism

Noam Chomskybull But it must be recognized that the notion of ldquoprobability of

a sentencerdquo is an entirely useless one under any known interpretation of this term

Fred Jelinek (IBM 1988)bull Every time a linguist leaves the room the recognition rate

goes upbull (Alternative Every time I fire a linguist the recognizer

improves)

This course

Empiricist approach bull Focus will be on probabilistic models for learning

of natural language No time to treat natural language in depth

bull (though this would be quite useful and interesting)

bull Deserves a full course by itself Covered in more depth in Logic Language and

Learning (SS 05 prob SS 06)

Ambiguity

Statistical Disambiguation

bull Define a probability model for the data

bull Compute the probability of each alternative

bull Choose the most likely alternative

NLP and Statistics

Statistical Methods deal with uncertaintyThey predict the future behaviour of a systembased on the behaviour observed in the past

Statistical Methods require training data

The data in Statistical NLP are the Corpora

NLP and Statistics

Corpus text collection for linguistic purposes

TokensHow many words are contained in Tom Sawyer 71370

TypesHow many different words are contained in TS 8018

Hapax Legomenawords appearing only once

Corpora

The most frequent words are function words

word freq word freq

the 3332 in 906

and 2972 that 877

a 1775 he 877

to 1725 I 783

of 1440 his 772

was 1161 you 686

it 1027 Tom 679

Word Counts

f nf

1 39932 12923 6644 4105 2436 1997 1728 1319 8210 9111-50 54051-100 99gt 100 102

How many words appear f times

Word Counts

About half of the words occurs just onceAbout half of the text consists of the

100 most common wordshellip

Word Counts (Brown corpus)

Word Counts (Brown corpus)

word f r fr word f r frthe 3332 1 3332 turned 51 200 10200and 2972 2 5944 youlsquoll 30 300 9000a 1775 3 5235 name 21 400 8400he 877 10 8770 comes 16 500 8000but 410 20 8400 group 13 600 7800be 294 30 8820 lead 11 700 7700there 222 40 8880 friends 10 800 8000one 172 50 8600 begin 9 900 8100about 158 60 9480 family 8 1000 8000more 138 70 9660 brushed 4 2000 8000never 124 80 9920 sins 2 3000 6000Oh 116 90 10440 Could 2 4000 8000two 104 100 10400 Applausive 1 8000 8000

Zipflsquos Law f~1r (fr = const)

Zipflsquos Law

Minimize effort

Language and sequences

Natural language processingbull Is concerned with the analysis of

sequences of words sentencesbullConstruction of language models

Two types of modelsbullNon-probabilisticbull Probabilistic

Human Language is highly ambiguous at all levels

bull acoustic levelrecognize speech vs wreck a nice beach

bull morphological levelsaw to see (past) saw (noun) to saw (present inf)

bull syntactic levelI saw the man on the hill with a telescope

bull semantic levelOne book has to be read by every student

Key NLP Problem Ambiguity

Language Model

A formal model about language Two types

bull Non-probabilistic Allows one to compute whether a certain sequence

(sentence or part thereof) is possible Often grammar based

bull Probabilistic Allows one to compute the probability of a certain

sequence Often extends grammars with probabilities

Example of bad language model

A bad language model

A bad language model

A good language model

Non-Probabilisticbull ldquoI swear to tell the truthrdquo is possiblebull ldquoI swerve to smell de souprdquo is impossible

Probabilisticbull P(I swear to tell the truth) ~ 0001bull P(I swerve to smell de soup) ~ 0

Why language models

Consider a Shannon Gamebull Predicting the next word in the sequence

Statistical natural language hellip The cat is thrown out of the hellip The large green hellip Sue swallowed the large green hellip hellip

Model at the sentence level

Applications

Spelling correction Mobile phone texting Speech recognition Handwriting recognition Disabled users hellip

Spelling errors

They are leaving in about fifteen minuets to go to her house

The study was conducted mainly be John Black Hopefully all with continue smoothly in my absence Can they lave him my messages I need to notified the bank ofhellip He is trying to fine out

Handwriting recognition

Assume a note is given to a bank teller which the teller reads as I have a gub (cf Woody Allen)

NLP to the rescue hellipbull gub is not a wordbull gun gum Gus and gull are words but gun has a

higher probability in the context of a bank

For Spell Checkers

Collect list of commonly substituted wordsbull piecepeace whetherweather theirthere

ExampleldquoOn Tuesday the whether helliprsquorsquoldquoOn Tuesday the weather helliprdquo

Another dimension in language models

Do we mainly want to infer (probabilities) of legal sentences sequences bull So far

Or do we want to infer properties of these sentences bull Eg parse tree part-of-speech-taggingbullNeeded for understanding NL

Letrsquos look at some tasks

Sequence Tagging

Part-of-speech taggingbull He drives with his bikebull N V PR PN N noun verb preposition pronoun noun

Text extractionbull The job is that of a programmerbull X X X X X X JobTypebull The seminar is taking place from 1500 to 1600bull X X X X X X Start End

Sequence Tagging

Predicting the secondary structure of proteins mRNA hellipbull X = AFARLMMAhellipbull Y = hehestststhesthe hellip

Parsing

Given a sentence find its parse tree Important step in understanding NL

Parsing

In bioinformatics allows to predict (elements of) structure from sequence

Language models based on Grammars

Grammar Typesbull Regular grammars and Finite State Automatabull Context-Free Grammarsbull Definite Clause Grammars

A particular type of Unification Based Grammar (Prolog)

Distinguish lexicon from grammarbull Lexicon (dictionary) contains information about

words eg word - possible tags (and possibly additional information) flies - V(erb) - N(oun)

bull Grammar encode rules

Grammars and parsing Syntactic level best understood and formalized Derivation of grammatical structure parsing

(more than just recognition) Result of parsing mostly parse tree

showing the constituents of a sentence eg verb or noun phrases

Syntax usually specified in terms of a grammar consisting of grammar rules

Regular Grammars and Finite State Automata

Lexical information - which words are bull Det(erminer)bull N(oun)bull Vi (intransitive verb) - no

argumentbull Pn (pronoun) bull Vt (transitive verb) - takes an

argumentbull Adj (adjective)

Now acceptbull The cat sleptbull Det N Vi

As regular grammarbull S -gt [Det] S1 [ ] terminal bull S1 -gt [N] S2bull S2 -gt [Vi]

Lexicon bull The - Detbull Cat - Nbull Slept - Vi

bull hellip

Finite State Automaton

Sentences

bull John smiles - Pn Vibull The cat disappeared - Det N Vibull These new shoes hurt - Det Adj N Vibull John liked the old cat PN Vt Det Adj N

Phrase structure

S

NP

D N

VP

NPV

D N

PP

P NP

D N

the dog chased a cat into the garden

Notation

S sentence D or Det Determiner (eg articles) N noun V verb P preposition NP noun phrase VP verb phrase PP prepositional phrase

Context Free GrammarS -gt NP VPNP -gt D NVP -gt V NPVP -gt V NP PPPP -gt P NPD -gt [the]D -gt [a]N -gt [dog]N -gt [cat]N -gt [garden]V -gt [chased]V -gt [saw]P -gt [into]

Terminals ~ Lexicon

Phrase structure

Formalism of context-free grammarsbull Nonterminal symbols S NP VP bull Terminal symbols dog cat saw the

Recursionbull bdquoThe girl thought the dog chased the catldquo

VP -gt V SN -gt [girl]V -gt [thought]

Top-down parsing

S -gt NP VP S -gt Det N VP S -gt The N VP S -gt The dog VP S -gt The dog V NP S -gt The dog chased NP S -gt The dog chased Det N S-gt The dog chased the N S-gt The dog chased the cat

Context-free grammarSS --gt --gt NPNPVPVP

NPNP --gt PN --gt PN Proper nounProper noun

NPNP --gt Art Adj N--gt Art Adj N

NPNP --gt ArtN--gt ArtN

VPVP --gt VI --gt VI intransitive verbintransitive verb

VPVP --gt VT --gt VT NPNP transitive verbtransitive verb

ArtArt --gt [the]--gt [the]

AdjAdj --gt [lazy]--gt [lazy]

AdjAdj --gt [rapid]--gt [rapid]

PNPN --gt [achilles]--gt [achilles]

NN --gt [turtle]--gt [turtle]

VIVI --gt [sleeps]--gt [sleeps]

VTVT --gt [beats]--gt [beats]

Parse tree

SS

NPNP VPVP

ArtArt AdjAdj NN VtVt NPNP

PNPN

achillesachillesbeatsbeatsturtleturtlerapidrapidthethe

Definite Clause GrammarsNon-terminals may have arguments

SS --gt --gt NPNP((NN))VPVP((NN))

NP(NP(NN)) --gt Art(--gt Art(NN)N()N(NN))

VP(VP(NN)) --gt VI(--gt VI(NN))

Art(Art(singularsingular)) --gt [a]--gt [a]

Art(Art(singularsingular)) --gt [the]--gt [the]

Art(Art(pluralplural)) --gt [the]--gt [the]

N(N(singularsingular)) --gt [turtle]--gt [turtle]

N(N(pluralplural)) --gt [turtles]--gt [turtles]

VI(VI(singularsingular)) --gt [sleeps]--gt [sleeps]

VI(VI(pluralplural)) --gt [sleep]--gt [sleep]

Number Agreement

DCGs

Non-terminals may have argumentsbull Variables (start with capital)

Eg Number Any

bull Constants (start with lower case) Eg singular plural

bull Structured terms (start with lower case and take arguments themselves) Eg vp(VNP)

Parsing needs to be adapted bull Using unification

Unification in a nutshell (cf AI course)

Substitutions

Eg Num singular T vp(VNP)

Applying substitution bull Simultaneously replace variables by

corresponding termsbull S(Num) Num singular = S(singular)

Unification

Take two non-terminals with arguments and compute (most general) substitution that makes them identical egbull Art(singular) and Art(Num)

Gives Num singular

bull Art(singular) and Art(plural) Fails

bull Art(Num1) and Art(Num2) Num1 Num2

bull PN(Num accusative) and PN(singular Case) Numsingular Caseaccusative

Parsing with DCGs

Now require successful unification at each step

S -gt NP(N) VP(N) S -gt Art(N) N(N) VP(N) Nsingular S -gt a N(singular) VP(singular) S -gt a turtle VP(singular) S -gt a turtle sleeps

S-gt a turtle sleep fails

Case Marking

PN(singularnominative)PN(singularnominative) --gt --gt [he][she][he][she]

PN(singularaccusative)PN(singularaccusative) --gt --gt [him][her][him][her]

PN(pluralnominative)PN(pluralnominative) --gt --gt [they][they]

PN(pluralaccusative)PN(pluralaccusative) --gt --gt [them][them]

S --gt NP(Numbernominative) NP(Number)S --gt NP(Numbernominative) NP(Number)

VP(Number) --gt V(Number) VP(Anyaccusative)VP(Number) --gt V(Number) VP(Anyaccusative)

VP(NumberCase) --gt PN(NumberCase)VP(NumberCase) --gt PN(NumberCase)

VP(NumberAny) --gt Det N(Number)VP(NumberAny) --gt Det N(Number)

He sees her She sees him They see her

But not Them see he

DCGs

Are strictly more expressive than CFGs Can represent for instance

bull S(N) -gt A(N) B(N) C(N)bull A(0) -gt [] bull B(0) -gt []bull C(0) -gt []bull A(s(N)) -gt A(N) [A]bull B(s(N)) -gt B(N) [B]bull C(s(N)) -gt C(N) [C]

Probabilistic Models

Traditional grammar models are very rigid bull essentially a yes no decision

Probabilistic grammarsbull Define a probability models for the databull Compute the probability of each alternativebull Choose the most likely alternative

Ilustrate on bull Shannon Gamebull Spelling correctionbull Parsing

Illustration

Wall Street Journal Corpus 3 000 000 words Correct parse tree for sentences known

bull Constructed by handbull Can be used to derive stochastic context free

grammarsbull SCFG assign probability to parse trees

Compute the most probable parse tree

Sequences are omni-present

Therefore the techniques we will see also apply tobull Bioinformatics

DNA proteins mRNA hellip can all be represented as strings

bullRobotics Sequences of actions states hellip

bullhellip

Rest of the Course

Limitations traditional grammar models motivate probabilistic extensions bull Regular grammars and Finite State Automata

All use principles of Part I on Graphical Models Markov Models using n-gramms (Hidden) Markov Models Conditional Random Fields

bull As an example of using undirected graphical models

bull Probabilistic Context Free Grammarsbull Probabilistic Definite Clause Grammars

  • Advanced Artificial Intelligence
  • Topic
  • Contents
  • Rationalism versus Empiricism
  • Slide 5
  • This course
  • Ambiguity
  • NLP and Statistics
  • Slide 9
  • Corpora
  • Word Counts
  • Slide 12
  • Word Counts (Brown corpus)
  • Slide 14
  • Zipflsquos Law
  • Language and sequences
  • Key NLP Problem Ambiguity
  • Language Model
  • Example of bad language model
  • A bad language model
  • Slide 22
  • A good language model
  • Why language models
  • Applications
  • Spelling errors
  • Handwriting recognition
  • For Spell Checkers
  • Another dimension in language models
  • Sequence Tagging
  • Slide 31
  • Parsing
  • Slide 33
  • Language models based on Grammars
  • Grammars and parsing
  • Regular Grammars and Finite State Automata
  • Finite State Automaton
  • Phrase structure
  • Notation
  • Context Free Grammar
  • Slide 41
  • Top-down parsing
  • Context-free grammar
  • Parse tree
  • Definite Clause Grammars Non-terminals may have arguments
  • DCGs
  • Unification in a nutshell (cf AI course)
  • Unification
  • Parsing with DCGs
  • Case Marking
  • Slide 51
  • Probabilistic Models
  • Illustration
  • PowerPoint Presentation
  • Sequences are omni-present
  • Rest of the Course
Page 5: Part II. Statistical NLP Advanced Artificial Intelligence Introduction and Grammar Models Wolfram Burgard, Luc De Raedt, Bernhard Nebel, Kristian Kersting

Rationalism versus Empiricism

Noam Chomskybull But it must be recognized that the notion of ldquoprobability of

a sentencerdquo is an entirely useless one under any known interpretation of this term

Fred Jelinek (IBM 1988)bull Every time a linguist leaves the room the recognition rate

goes upbull (Alternative Every time I fire a linguist the recognizer

improves)

This course

Empiricist approach bull Focus will be on probabilistic models for learning

of natural language No time to treat natural language in depth

bull (though this would be quite useful and interesting)

bull Deserves a full course by itself Covered in more depth in Logic Language and

Learning (SS 05 prob SS 06)

Ambiguity

Statistical Disambiguation

bull Define a probability model for the data

bull Compute the probability of each alternative

bull Choose the most likely alternative

NLP and Statistics

Statistical Methods deal with uncertaintyThey predict the future behaviour of a systembased on the behaviour observed in the past

Statistical Methods require training data

The data in Statistical NLP are the Corpora

NLP and Statistics

Corpus text collection for linguistic purposes

TokensHow many words are contained in Tom Sawyer 71370

TypesHow many different words are contained in TS 8018

Hapax Legomenawords appearing only once

Corpora

The most frequent words are function words

word freq word freq

the 3332 in 906

and 2972 that 877

a 1775 he 877

to 1725 I 783

of 1440 his 772

was 1161 you 686

it 1027 Tom 679

Word Counts

f nf

1 39932 12923 6644 4105 2436 1997 1728 1319 8210 9111-50 54051-100 99gt 100 102

How many words appear f times

Word Counts

About half of the words occurs just onceAbout half of the text consists of the

100 most common wordshellip

Word Counts (Brown corpus)

Word Counts (Brown corpus)

word f r fr word f r frthe 3332 1 3332 turned 51 200 10200and 2972 2 5944 youlsquoll 30 300 9000a 1775 3 5235 name 21 400 8400he 877 10 8770 comes 16 500 8000but 410 20 8400 group 13 600 7800be 294 30 8820 lead 11 700 7700there 222 40 8880 friends 10 800 8000one 172 50 8600 begin 9 900 8100about 158 60 9480 family 8 1000 8000more 138 70 9660 brushed 4 2000 8000never 124 80 9920 sins 2 3000 6000Oh 116 90 10440 Could 2 4000 8000two 104 100 10400 Applausive 1 8000 8000

Zipflsquos Law f~1r (fr = const)

Zipflsquos Law

Minimize effort

Language and sequences

Natural language processingbull Is concerned with the analysis of

sequences of words sentencesbullConstruction of language models

Two types of modelsbullNon-probabilisticbull Probabilistic

Human Language is highly ambiguous at all levels

bull acoustic levelrecognize speech vs wreck a nice beach

bull morphological levelsaw to see (past) saw (noun) to saw (present inf)

bull syntactic levelI saw the man on the hill with a telescope

bull semantic levelOne book has to be read by every student

Key NLP Problem Ambiguity

Language Model

A formal model about language Two types

bull Non-probabilistic Allows one to compute whether a certain sequence

(sentence or part thereof) is possible Often grammar based

bull Probabilistic Allows one to compute the probability of a certain

sequence Often extends grammars with probabilities

Example of bad language model

A bad language model

A bad language model

A good language model

Non-Probabilisticbull ldquoI swear to tell the truthrdquo is possiblebull ldquoI swerve to smell de souprdquo is impossible

Probabilisticbull P(I swear to tell the truth) ~ 0001bull P(I swerve to smell de soup) ~ 0

Why language models

Consider a Shannon Gamebull Predicting the next word in the sequence

Statistical natural language hellip The cat is thrown out of the hellip The large green hellip Sue swallowed the large green hellip hellip

Model at the sentence level

Applications

Spelling correction Mobile phone texting Speech recognition Handwriting recognition Disabled users hellip

Spelling errors

They are leaving in about fifteen minuets to go to her house

The study was conducted mainly be John Black Hopefully all with continue smoothly in my absence Can they lave him my messages I need to notified the bank ofhellip He is trying to fine out

Handwriting recognition

Assume a note is given to a bank teller which the teller reads as I have a gub (cf Woody Allen)

NLP to the rescue hellipbull gub is not a wordbull gun gum Gus and gull are words but gun has a

higher probability in the context of a bank

For Spell Checkers

Collect list of commonly substituted wordsbull piecepeace whetherweather theirthere

ExampleldquoOn Tuesday the whether helliprsquorsquoldquoOn Tuesday the weather helliprdquo

Another dimension in language models

Do we mainly want to infer (probabilities) of legal sentences sequences bull So far

Or do we want to infer properties of these sentences bull Eg parse tree part-of-speech-taggingbullNeeded for understanding NL

Letrsquos look at some tasks

Sequence Tagging

Part-of-speech taggingbull He drives with his bikebull N V PR PN N noun verb preposition pronoun noun

Text extractionbull The job is that of a programmerbull X X X X X X JobTypebull The seminar is taking place from 1500 to 1600bull X X X X X X Start End

Sequence Tagging

Predicting the secondary structure of proteins mRNA hellipbull X = AFARLMMAhellipbull Y = hehestststhesthe hellip

Parsing

Given a sentence find its parse tree Important step in understanding NL

Parsing

In bioinformatics allows to predict (elements of) structure from sequence

Language models based on Grammars

Grammar Typesbull Regular grammars and Finite State Automatabull Context-Free Grammarsbull Definite Clause Grammars

A particular type of Unification Based Grammar (Prolog)

Distinguish lexicon from grammarbull Lexicon (dictionary) contains information about

words eg word - possible tags (and possibly additional information) flies - V(erb) - N(oun)

bull Grammar encode rules

Grammars and parsing Syntactic level best understood and formalized Derivation of grammatical structure parsing

(more than just recognition) Result of parsing mostly parse tree

showing the constituents of a sentence eg verb or noun phrases

Syntax usually specified in terms of a grammar consisting of grammar rules

Regular Grammars and Finite State Automata

Lexical information - which words are:
• Det(erminer)
• N(oun)
• Pn (pronoun)
• Adj (adjective)
• Vi (intransitive verb): takes no argument
• Vt (transitive verb): takes an argument

Now accept:
• "The cat slept": Det N Vi

As a regular grammar:
• S  -> [Det], S1     ([ ] marks a terminal)
• S1 -> [N], S2
• S2 -> [Vi]

Lexicon:
• the: Det
• cat: N
• slept: Vi
• …

Finite State Automaton

Sentences:
• "John smiles": Pn Vi
• "The cat disappeared": Det N Vi
• "These new shoes hurt": Det Adj N Vi
• "John liked the old cat": Pn Vt Det Adj N
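Checking such tag sequences is exactly what a finite state automaton does. A minimal sketch, where the state names follow the S/S1/S2 regular grammar above and the Adj self-loop and Pn transition are my own extensions to cover the example sentences:

```python
# Finite state automaton over part-of-speech tag sequences.
# States S, S1, S2 mirror the regular grammar above; the Adj self-loop
# and the Pn transition are added to cover the example sentences.
TRANSITIONS = {
    ("S", "Det"): "S1",   # S  -> [Det] S1
    ("S", "Pn"): "S2",    # pronoun subject
    ("S1", "Adj"): "S1",  # any number of adjectives
    ("S1", "N"): "S2",    # S1 -> [N] S2
    ("S2", "Vi"): "F",    # S2 -> [Vi], F is the accepting state
}

def accepts(tags):
    state = "S"
    for tag in tags:
        state = TRANSITIONS.get((state, tag))
        if state is None:  # no transition: reject
            return False
    return state == "F"

print(accepts(["Det", "N", "Vi"]))         # "The cat slept"
print(accepts(["Det", "Adj", "N", "Vi"]))  # "These new shoes hurt"
```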

Phrase structure

S
├─ NP
│  ├─ D  the
│  └─ N  dog
└─ VP
   ├─ V  chased
   ├─ NP
   │  ├─ D  a
   │  └─ N  cat
   └─ PP
      ├─ P  into
      └─ NP
         ├─ D  the
         └─ N  garden

("the dog chased a cat into the garden")

Notation

• S: sentence
• D or Det: determiner (e.g. articles)
• N: noun
• V: verb
• P: preposition
• NP: noun phrase
• VP: verb phrase
• PP: prepositional phrase

Context Free Grammar

S  -> NP VP
NP -> D N
VP -> V NP
VP -> V NP PP
PP -> P NP
D  -> [the]
D  -> [a]
N  -> [dog]
N  -> [cat]
N  -> [garden]
V  -> [chased]
V  -> [saw]
P  -> [into]

Terminals ~ Lexicon

Phrase structure

Formalism of context-free grammars:
• Nonterminal symbols: S, NP, VP, …
• Terminal symbols: dog, cat, saw, the, …

Recursion:
• "The girl thought the dog chased the cat"

VP -> V S
N  -> [girl]
V  -> [thought]

Top-down parsing

S -> NP VP
  -> Det N VP
  -> The N VP
  -> The dog VP
  -> The dog V NP
  -> The dog chased NP
  -> The dog chased Det N
  -> The dog chased the N
  -> The dog chased the cat
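The derivation above can be automated with a naive top-down recognizer over the toy CFG. A minimal sketch (exponential in the worst case, but fine for this small grammar):

```python
# Naive top-down recognizer for the toy context-free grammar above.
GRAMMAR = {
    "S":  [["NP", "VP"]],
    "NP": [["D", "N"]],
    "VP": [["V", "NP", "PP"], ["V", "NP"]],
    "PP": [["P", "NP"]],
    "D":  [["the"], ["a"]],
    "N":  [["dog"], ["cat"], ["garden"]],
    "V":  [["chased"], ["saw"]],
    "P":  [["into"]],
}

def expand(symbol, words, start):
    """Return all positions reachable after parsing `symbol` from `start`."""
    if symbol not in GRAMMAR:  # terminal: must match the next word
        return {start + 1} if start < len(words) and words[start] == symbol else set()
    ends = set()
    for rhs in GRAMMAR[symbol]:  # try every rule for this nonterminal
        positions = {start}
        for sym in rhs:
            positions = {e for p in positions for e in expand(sym, words, p)}
        ends |= positions
    return ends

def recognize(sentence):
    words = sentence.split()
    return len(words) in expand("S", words, 0)

print(recognize("the dog chased a cat into the garden"))  # True
```

Extending `expand` to return trees instead of end positions turns the recognizer into a parser.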

Context-free grammar

S   --> NP, VP
NP  --> PN            (proper noun)
NP  --> Art, Adj, N
NP  --> Art, N
VP  --> VI            (intransitive verb)
VP  --> VT, NP        (transitive verb)
Art --> [the]
Adj --> [lazy]
Adj --> [rapid]
PN  --> [achilles]
N   --> [turtle]
VI  --> [sleeps]
VT  --> [beats]

Parse tree

S
├─ NP
│  ├─ Art  the
│  ├─ Adj  rapid
│  └─ N    turtle
└─ VP
   ├─ Vt  beats
   └─ NP
      └─ PN  achilles

("the rapid turtle beats achilles")

Definite Clause Grammars

Non-terminals may have arguments:

S             --> NP(N), VP(N)
NP(N)         --> Art(N), N(N)
VP(N)         --> VI(N)
Art(singular) --> [a]
Art(singular) --> [the]
Art(plural)   --> [the]
N(singular)   --> [turtle]
N(plural)     --> [turtles]
VI(singular)  --> [sleeps]
VI(plural)    --> [sleep]

Number agreement.

DCGs

Non-terminals may have arguments:
• Variables (start with a capital), e.g. Number, Any
• Constants (start with lower case), e.g. singular, plural
• Structured terms (start with lower case and take arguments themselves), e.g. vp(V, NP)

Parsing needs to be adapted:
• using unification

Unification in a nutshell (cf AI course)

Substitutions, e.g. {Num / singular}, {T / vp(V, NP)}

Applying a substitution:
• Simultaneously replace variables by the corresponding terms
• S(Num) {Num / singular} = S(singular)

Unification

Take two non-terminals with arguments and compute the (most general) substitution that makes them identical, e.g.:
• Art(singular) and Art(Num): gives {Num / singular}
• Art(singular) and Art(plural): fails
• Art(Num1) and Art(Num2): gives {Num1 / Num2}
• PN(Num, accusative) and PN(singular, Case): gives {Num / singular, Case / accusative}
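The examples above can be reproduced with a small unification routine. A sketch under my own term representation (an assumption, not from the slides): variables are capitalized strings, constants are lowercase strings, and a non-terminal like Art(singular) is the tuple ("Art", ["singular"]):

```python
# Minimal unification for flat terms.
# Representation (my own choice): variables are capitalized strings,
# constants are lowercase strings, Art(singular) is ("Art", ["singular"]).
def is_var(t):
    return isinstance(t, str) and t[:1].isupper()

def walk(t, subst):
    """Follow variable bindings until a non-variable or unbound variable."""
    while is_var(t) and t in subst:
        t = subst[t]
    return t

def unify(x, y, subst=None):
    """Return the most general substitution making x and y identical, or None."""
    if subst is None:
        subst = {}
    x, y = walk(x, subst), walk(y, subst)
    if x == y:
        return subst
    if is_var(x):
        return {**subst, x: y}
    if is_var(y):
        return {**subst, y: x}
    if isinstance(x, tuple) and isinstance(y, tuple) and x[0] == y[0] and len(x[1]) == len(y[1]):
        for a, b in zip(x[1], y[1]):  # unify arguments pairwise
            subst = unify(a, b, subst)
            if subst is None:
                return None
        return subst
    return None  # clash, e.g. singular vs. plural

print(unify(("Art", ["singular"]), ("Art", ["Num"])))  # {'Num': 'singular'}
```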

Parsing with DCGs

Now require successful unification at each step:

S -> NP(N), VP(N)
  -> Art(N), N(N), VP(N)          with {N / singular}
  -> a, N(singular), VP(singular)
  -> a turtle VP(singular)
  -> a turtle sleeps

"S -> a turtle sleep" fails.

Case Marking

PN(singular, nominative) --> [he]
PN(singular, nominative) --> [she]
PN(singular, accusative) --> [him]
PN(singular, accusative) --> [her]
PN(plural, nominative)   --> [they]
PN(plural, accusative)   --> [them]

S                --> NP(Number, nominative), VP(Number)
VP(Number)       --> V(Number), NP(Any, accusative)
NP(Number, Case) --> PN(Number, Case)
NP(Number, Any)  --> Det, N(Number)

Accepts: "He sees her", "She sees him", "They see her"
But not: "Them see he"

DCGs

DCGs are strictly more expressive than CFGs. They can, for instance, represent the language AⁿBⁿCⁿ (equal numbers of A's, B's, and C's), which no CFG can describe:

S(N)    --> A(N), B(N), C(N)
A(0)    --> []
B(0)    --> []
C(0)    --> []
A(s(N)) --> A(N), [A]
B(s(N)) --> B(N), [B]
C(s(N)) --> C(N), [C]
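The counting argument s(N) in the grammar above forces the three blocks to have the same length. A direct recognizer for that language makes the idea concrete; a minimal sketch:

```python
# Recognizer for the non-context-free language A^n B^n C^n,
# the language the DCG above describes via its counting argument s(N).
def recognizes(s):
    n = len(s) // 3
    return s == "A" * n + "B" * n + "C" * n

print(recognizes("AABBCC"))  # True
print(recognizes("AABBC"))   # False
```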

Probabilistic Models

Traditional grammar models are very rigid:
• essentially a yes/no decision

Probabilistic grammars:
• Define a probability model for the data
• Compute the probability of each alternative
• Choose the most likely alternative

Illustrated on:
• the Shannon game
• spelling correction
• parsing

Illustration

Wall Street Journal corpus: 3,000,000 words, with the correct parse tree for each sentence known:
• constructed by hand
• can be used to derive stochastic context-free grammars
• an SCFG assigns a probability to each parse tree

Compute the most probable parse tree.
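An SCFG scores a parse tree by multiplying the probabilities of all rules used in it; the most probable tree wins. A minimal sketch with invented rule probabilities (assumptions, not derived from the WSJ corpus):

```python
# Probability of a parse tree under a stochastic CFG:
# the product of the probabilities of the rules used in the tree.
# Rule probabilities below are invented for illustration.
PROBS = {
    ("S", ("NP", "VP")): 1.0,
    ("NP", ("D", "N")): 1.0,
    ("VP", ("V", "NP")): 0.6,
    ("D", "the"): 0.5, ("D", "a"): 0.5,
    ("N", "dog"): 0.4, ("N", "cat"): 0.4,
    ("V", "chased"): 0.7,
}

def tree_prob(node):
    label, kids = node
    if isinstance(kids, str):            # lexical rule, e.g. D -> the
        return PROBS[(label, kids)]
    p = PROBS[(label, tuple(k[0] for k in kids))]  # grammar rule used here
    for kid in kids:
        p *= tree_prob(kid)
    return p

tree = ("S", [("NP", [("D", "the"), ("N", "dog")]),
              ("VP", [("V", "chased"),
                      ("NP", [("D", "a"), ("N", "cat")])])])
print(tree_prob(tree))  # ≈ 0.0168
```

Disambiguation then amounts to computing this product for each candidate tree and keeping the maximum.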

Sequences are omnipresent

Therefore the techniques we will see also apply to:
• Bioinformatics: DNA, proteins, mRNA, … can all be represented as strings
• Robotics: sequences of actions, states, …
• …

Rest of the Course

Limitations of traditional grammar models motivate probabilistic extensions:
• Regular grammars and finite state automata:
  Markov models using n-grams, (hidden) Markov models, and conditional random fields (as an example of using undirected graphical models)
• Probabilistic context-free grammars
• Probabilistic definite clause grammars

All use the principles of Part I on graphical models.

  • Advanced Artificial Intelligence
  • Topic
  • Contents
  • Rationalism versus Empiricism
  • Slide 5
  • This course
  • Ambiguity
  • NLP and Statistics
  • Slide 9
  • Corpora
  • Word Counts
  • Slide 12
  • Word Counts (Brown corpus)
  • Slide 14
  • Zipf's Law
  • Language and sequences
  • Key NLP Problem Ambiguity
  • Language Model
  • Example of bad language model
  • A bad language model
  • Slide 22
  • A good language model
  • Why language models
  • Applications
  • Spelling errors
  • Handwriting recognition
  • For Spell Checkers
  • Another dimension in language models
  • Sequence Tagging
  • Slide 31
  • Parsing
  • Slide 33
  • Language models based on Grammars
  • Grammars and parsing
  • Regular Grammars and Finite State Automata
  • Finite State Automaton
  • Phrase structure
  • Notation
  • Context Free Grammar
  • Slide 41
  • Top-down parsing
  • Context-free grammar
  • Parse tree
  • Definite Clause Grammars Non-terminals may have arguments
  • DCGs
  • Unification in a nutshell (cf AI course)
  • Unification
  • Parsing with DCGs
  • Case Marking
  • Slide 51
  • Probabilistic Models
  • Illustration
  • PowerPoint Presentation
  • Sequences are omni-present
  • Rest of the Course

This course

Empiricist approach:
• The focus will be on probabilistic models for learning of natural language.

No time to treat natural language in depth:
• (though this would be quite useful and interesting)
• it deserves a full course by itself
• covered in more depth in Logic, Language and Learning (SS 05, prob. SS 06)

Ambiguity

Statistical disambiguation:
• Define a probability model for the data
• Compute the probability of each alternative
• Choose the most likely alternative

NLP and Statistics

Statistical methods deal with uncertainty: they predict the future behaviour of a system based on the behaviour observed in the past.

Statistical methods require training data. The data in statistical NLP are the corpora.

NLP and Statistics

Corpus: a text collection for linguistic purposes

Tokens: how many words are contained in Tom Sawyer? 71,370
Types: how many different words are contained in Tom Sawyer? 8,018
Hapax legomena: words appearing only once
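The token/type/hapax distinction is straightforward to compute. A sketch on a toy text (not the Tom Sawyer corpus):

```python
from collections import Counter

# Token/type/hapax counts on a toy text (not the Tom Sawyer corpus).
text = "the cat saw the dog and the dog saw the cat run"
words = text.split()
counts = Counter(words)

tokens = len(words)                                  # every occurrence counts
types = len(counts)                                  # distinct words
hapaxes = [w for w, c in counts.items() if c == 1]   # words occurring once

print(tokens, types, sorted(hapaxes))  # 12 6 ['and', 'run']
```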

Corpora

The most frequent words are function words:

word  freq     word  freq
the   3332     in    906
and   2972     that  877
a     1775     he    877
to    1725     I     783
of    1440     his   772
was   1161     you   686
it    1027     Tom   679

Word Counts

f        n_f
1        3993
2        1292
3        664
4        410
5        243
6        199
7        172
8        131
9        82
10       91
11-50    540
51-100   99
> 100    102

How many words appear f times?

Word Counts

About half of the words occur just once. About half of the text consists of the 100 most common words.

Word Counts (Brown corpus)

word   f     r    f·r      word        f    r     f·r
the    3332  1    3332     turned      51   200   10200
and    2972  2    5944     you'll      30   300   9000
a      1775  3    5325     name        21   400   8400
he     877   10   8770     comes       16   500   8000
but    410   20   8200     group       13   600   7800
be     294   30   8820     lead        11   700   7700
there  222   40   8880     friends     10   800   8000
one    172   50   8600     begin       9    900   8100
about  158   60   9480     family      8    1000  8000
more   138   70   9660     brushed     4    2000  8000
never  124   80   9920     sins        2    3000  6000
Oh     116   90   10440    Could       2    4000  8000
two    104   100  10400    Applausive  1    8000  8000

Zipf's Law: f ~ 1/r (i.e. f·r ≈ const)
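Zipf's claim that f·r stays roughly constant can be checked directly on a few rows of the table above; a sketch:

```python
# Check Zipf's law f ~ 1/r on a few (word, frequency, rank) rows
# from the Brown-corpus table above: f * r stays in a narrow band.
rows = [
    ("the", 3332, 1), ("and", 2972, 2), ("he", 877, 10),
    ("there", 222, 40), ("name", 21, 400), ("family", 8, 1000),
]
products = [f * r for _, f, r in rows]
print(products)  # [3332, 5944, 8770, 8880, 8400, 8000]
```

Across three orders of magnitude in rank, the product varies by less than a factor of three.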

Zipf's Law

Minimize effort

Language and sequences

Natural language processing:
• is concerned with the analysis of sequences of words / sentences
• constructs language models

Two types of models:
• non-probabilistic
• probabilistic

Key NLP Problem: Ambiguity

Human language is highly ambiguous at all levels:
• acoustic level: "recognize speech" vs. "wreck a nice beach"
• morphological level: "saw" can be to see (past), saw (noun), or to saw (present, infinitive)
• syntactic level: "I saw the man on the hill with a telescope"
• semantic level: "One book has to be read by every student"

Language Model

A formal model about language. Two types:
• Non-probabilistic: allows one to compute whether a certain sequence (a sentence or part thereof) is possible; often grammar-based
• Probabilistic: allows one to compute the probability of a certain sequence; often extends grammars with probabilities

Example of a bad language model

A good language model

Non-probabilistic:
• "I swear to tell the truth" is possible
• "I swerve to smell de soup" is impossible

Probabilistic:
• P(I swear to tell the truth) ≈ 0.0001
• P(I swerve to smell de soup) ≈ 0

Why language models?

Consider a Shannon game:
• predicting the next word in a sequence
  "Statistical natural language …"
  "The cat is thrown out of the …"
  "The large green …"
  "Sue swallowed the large green …"
  …

A model at the sentence level.
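The Shannon game can be played mechanically with bigram counts from a corpus. A toy sketch (the corpus below is invented for illustration):

```python
from collections import Counter, defaultdict

# Next-word prediction from bigram counts on a toy corpus.
corpus = ("the cat is thrown out of the house "
          "the cat sleeps the dog chased the cat").split()

bigrams = defaultdict(Counter)
for w1, w2 in zip(corpus, corpus[1:]):  # count adjacent word pairs
    bigrams[w1][w2] += 1

def predict(word):
    """Most frequent successor of `word` in the corpus."""
    return bigrams[word].most_common(1)[0][0]

print(predict("the"))  # cat
```

Real language models use the same idea with smoothed counts over far larger corpora.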

Applications

• Spelling correction
• Mobile phone texting
• Speech recognition
• Handwriting recognition
• Disabled users
• …

Spelling errors

"They are leaving in about fifteen minuets to go to her house."
"The study was conducted mainly be John Black."
"Hopefully, all with continue smoothly in my absence."
"Can they lave him my messages?"
"I need to notified the bank of …"
"He is trying to fine out."

Handwriting recognition

Assume a note is given to a bank teller, which the teller reads as "I have a gub." (cf. Woody Allen)

NLP to the rescue:
• "gub" is not a word
• gun, gum, Gus, and gull are words, but gun has a higher probability in the context of a bank

For Spell Checkers

Collect a list of commonly substituted words:
• piece/peace, whether/weather, their/there, …

ExampleldquoOn Tuesday the whether helliprsquorsquoldquoOn Tuesday the weather helliprdquo

Another dimension in language models

Do we mainly want to infer (probabilities) of legal sentences sequences bull So far

Or do we want to infer properties of these sentences bull Eg parse tree part-of-speech-taggingbullNeeded for understanding NL

Letrsquos look at some tasks

Sequence Tagging

Part-of-speech taggingbull He drives with his bikebull N V PR PN N noun verb preposition pronoun noun

Text extractionbull The job is that of a programmerbull X X X X X X JobTypebull The seminar is taking place from 1500 to 1600bull X X X X X X Start End

Sequence Tagging

Predicting the secondary structure of proteins mRNA hellipbull X = AFARLMMAhellipbull Y = hehestststhesthe hellip

Parsing

Given a sentence find its parse tree Important step in understanding NL

Parsing

In bioinformatics allows to predict (elements of) structure from sequence

Language models based on Grammars

Grammar Typesbull Regular grammars and Finite State Automatabull Context-Free Grammarsbull Definite Clause Grammars

A particular type of Unification Based Grammar (Prolog)

Distinguish lexicon from grammarbull Lexicon (dictionary) contains information about

words eg word - possible tags (and possibly additional information) flies - V(erb) - N(oun)

bull Grammar encode rules

Grammars and parsing Syntactic level best understood and formalized Derivation of grammatical structure parsing

(more than just recognition) Result of parsing mostly parse tree

showing the constituents of a sentence eg verb or noun phrases

Syntax usually specified in terms of a grammar consisting of grammar rules

Regular Grammars and Finite State Automata

Lexical information - which words are bull Det(erminer)bull N(oun)bull Vi (intransitive verb) - no

argumentbull Pn (pronoun) bull Vt (transitive verb) - takes an

argumentbull Adj (adjective)

Now acceptbull The cat sleptbull Det N Vi

As regular grammarbull S -gt [Det] S1 [ ] terminal bull S1 -gt [N] S2bull S2 -gt [Vi]

Lexicon bull The - Detbull Cat - Nbull Slept - Vi

bull hellip

Finite State Automaton

Sentences

bull John smiles - Pn Vibull The cat disappeared - Det N Vibull These new shoes hurt - Det Adj N Vibull John liked the old cat PN Vt Det Adj N

Phrase structure

S

NP

D N

VP

NPV

D N

PP

P NP

D N

the dog chased a cat into the garden

Notation

S sentence D or Det Determiner (eg articles) N noun V verb P preposition NP noun phrase VP verb phrase PP prepositional phrase

Context Free GrammarS -gt NP VPNP -gt D NVP -gt V NPVP -gt V NP PPPP -gt P NPD -gt [the]D -gt [a]N -gt [dog]N -gt [cat]N -gt [garden]V -gt [chased]V -gt [saw]P -gt [into]

Terminals ~ Lexicon

Phrase structure

Formalism of context-free grammarsbull Nonterminal symbols S NP VP bull Terminal symbols dog cat saw the

Recursionbull bdquoThe girl thought the dog chased the catldquo

VP -gt V SN -gt [girl]V -gt [thought]

Top-down parsing

S -gt NP VP S -gt Det N VP S -gt The N VP S -gt The dog VP S -gt The dog V NP S -gt The dog chased NP S -gt The dog chased Det N S-gt The dog chased the N S-gt The dog chased the cat

Context-free grammarSS --gt --gt NPNPVPVP

NPNP --gt PN --gt PN Proper nounProper noun

NPNP --gt Art Adj N--gt Art Adj N

NPNP --gt ArtN--gt ArtN

VPVP --gt VI --gt VI intransitive verbintransitive verb

VPVP --gt VT --gt VT NPNP transitive verbtransitive verb

ArtArt --gt [the]--gt [the]

AdjAdj --gt [lazy]--gt [lazy]

AdjAdj --gt [rapid]--gt [rapid]

PNPN --gt [achilles]--gt [achilles]

NN --gt [turtle]--gt [turtle]

VIVI --gt [sleeps]--gt [sleeps]

VTVT --gt [beats]--gt [beats]

Parse tree

SS

NPNP VPVP

ArtArt AdjAdj NN VtVt NPNP

PNPN

achillesachillesbeatsbeatsturtleturtlerapidrapidthethe

Definite Clause GrammarsNon-terminals may have arguments

SS --gt --gt NPNP((NN))VPVP((NN))

NP(NP(NN)) --gt Art(--gt Art(NN)N()N(NN))

VP(VP(NN)) --gt VI(--gt VI(NN))

Art(Art(singularsingular)) --gt [a]--gt [a]

Art(Art(singularsingular)) --gt [the]--gt [the]

Art(Art(pluralplural)) --gt [the]--gt [the]

N(N(singularsingular)) --gt [turtle]--gt [turtle]

N(N(pluralplural)) --gt [turtles]--gt [turtles]

VI(VI(singularsingular)) --gt [sleeps]--gt [sleeps]

VI(VI(pluralplural)) --gt [sleep]--gt [sleep]

Number Agreement

DCGs

Non-terminals may have argumentsbull Variables (start with capital)

Eg Number Any

bull Constants (start with lower case) Eg singular plural

bull Structured terms (start with lower case and take arguments themselves) Eg vp(VNP)

Parsing needs to be adapted bull Using unification

Unification in a nutshell (cf AI course)

Substitutions

Eg Num singular T vp(VNP)

Applying substitution bull Simultaneously replace variables by

corresponding termsbull S(Num) Num singular = S(singular)

Unification

Take two non-terminals with arguments and compute (most general) substitution that makes them identical egbull Art(singular) and Art(Num)

Gives Num singular

bull Art(singular) and Art(plural) Fails

bull Art(Num1) and Art(Num2) Num1 Num2

bull PN(Num accusative) and PN(singular Case) Numsingular Caseaccusative

Parsing with DCGs

Now require successful unification at each step

S -gt NP(N) VP(N) S -gt Art(N) N(N) VP(N) Nsingular S -gt a N(singular) VP(singular) S -gt a turtle VP(singular) S -gt a turtle sleeps

S-gt a turtle sleep fails

Case Marking

PN(singularnominative)PN(singularnominative) --gt --gt [he][she][he][she]

PN(singularaccusative)PN(singularaccusative) --gt --gt [him][her][him][her]

PN(pluralnominative)PN(pluralnominative) --gt --gt [they][they]

PN(pluralaccusative)PN(pluralaccusative) --gt --gt [them][them]

S --gt NP(Numbernominative) NP(Number)S --gt NP(Numbernominative) NP(Number)

VP(Number) --gt V(Number) VP(Anyaccusative)VP(Number) --gt V(Number) VP(Anyaccusative)

VP(NumberCase) --gt PN(NumberCase)VP(NumberCase) --gt PN(NumberCase)

VP(NumberAny) --gt Det N(Number)VP(NumberAny) --gt Det N(Number)

He sees her She sees him They see her

But not Them see he

DCGs

Are strictly more expressive than CFGs Can represent for instance

bull S(N) -gt A(N) B(N) C(N)bull A(0) -gt [] bull B(0) -gt []bull C(0) -gt []bull A(s(N)) -gt A(N) [A]bull B(s(N)) -gt B(N) [B]bull C(s(N)) -gt C(N) [C]

Probabilistic Models

Traditional grammar models are very rigid bull essentially a yes no decision

Probabilistic grammarsbull Define a probability models for the databull Compute the probability of each alternativebull Choose the most likely alternative

Ilustrate on bull Shannon Gamebull Spelling correctionbull Parsing

Illustration

Wall Street Journal Corpus 3 000 000 words Correct parse tree for sentences known

bull Constructed by handbull Can be used to derive stochastic context free

grammarsbull SCFG assign probability to parse trees

Compute the most probable parse tree

Sequences are omni-present

Therefore the techniques we will see also apply tobull Bioinformatics

DNA proteins mRNA hellip can all be represented as strings

bullRobotics Sequences of actions states hellip

bullhellip

Rest of the Course

Limitations traditional grammar models motivate probabilistic extensions bull Regular grammars and Finite State Automata

All use principles of Part I on Graphical Models Markov Models using n-gramms (Hidden) Markov Models Conditional Random Fields

bull As an example of using undirected graphical models

bull Probabilistic Context Free Grammarsbull Probabilistic Definite Clause Grammars

  • Advanced Artificial Intelligence
  • Topic
  • Contents
  • Rationalism versus Empiricism
  • Slide 5
  • This course
  • Ambiguity
  • NLP and Statistics
  • Slide 9
  • Corpora
  • Word Counts
  • Slide 12
  • Word Counts (Brown corpus)
  • Slide 14
  • Zipflsquos Law
  • Language and sequences
  • Key NLP Problem Ambiguity
  • Language Model
  • Example of bad language model
  • A bad language model
  • Slide 22
  • A good language model
  • Why language models
  • Applications
  • Spelling errors
  • Handwriting recognition
  • For Spell Checkers
  • Another dimension in language models
  • Sequence Tagging
  • Slide 31
  • Parsing
  • Slide 33
  • Language models based on Grammars
  • Grammars and parsing
  • Regular Grammars and Finite State Automata
  • Finite State Automaton
  • Phrase structure
  • Notation
  • Context Free Grammar
  • Slide 41
  • Top-down parsing
  • Context-free grammar
  • Parse tree
  • Definite Clause Grammars Non-terminals may have arguments
  • DCGs
  • Unification in a nutshell (cf AI course)
  • Unification
  • Parsing with DCGs
  • Case Marking
  • Slide 51
  • Probabilistic Models
  • Illustration
  • PowerPoint Presentation
  • Sequences are omni-present
  • Rest of the Course
Page 7: Part II. Statistical NLP Advanced Artificial Intelligence Introduction and Grammar Models Wolfram Burgard, Luc De Raedt, Bernhard Nebel, Kristian Kersting

Ambiguity

Statistical Disambiguation

bull Define a probability model for the data

bull Compute the probability of each alternative

bull Choose the most likely alternative

NLP and Statistics

Statistical Methods deal with uncertaintyThey predict the future behaviour of a systembased on the behaviour observed in the past

Statistical Methods require training data

The data in Statistical NLP are the Corpora

NLP and Statistics

Corpus text collection for linguistic purposes

TokensHow many words are contained in Tom Sawyer 71370

TypesHow many different words are contained in TS 8018

Hapax Legomenawords appearing only once

Corpora

The most frequent words are function words

word freq word freq

the 3332 in 906

and 2972 that 877

a 1775 he 877

to 1725 I 783

of 1440 his 772

was 1161 you 686

it 1027 Tom 679

Word Counts

f nf

1 39932 12923 6644 4105 2436 1997 1728 1319 8210 9111-50 54051-100 99gt 100 102

How many words appear f times

Word Counts

About half of the words occurs just onceAbout half of the text consists of the

100 most common wordshellip

Word Counts (Brown corpus)

Word Counts (Brown corpus)

word f r fr word f r frthe 3332 1 3332 turned 51 200 10200and 2972 2 5944 youlsquoll 30 300 9000a 1775 3 5235 name 21 400 8400he 877 10 8770 comes 16 500 8000but 410 20 8400 group 13 600 7800be 294 30 8820 lead 11 700 7700there 222 40 8880 friends 10 800 8000one 172 50 8600 begin 9 900 8100about 158 60 9480 family 8 1000 8000more 138 70 9660 brushed 4 2000 8000never 124 80 9920 sins 2 3000 6000Oh 116 90 10440 Could 2 4000 8000two 104 100 10400 Applausive 1 8000 8000

Zipflsquos Law f~1r (fr = const)

Zipflsquos Law

Minimize effort

Language and sequences

Natural language processingbull Is concerned with the analysis of

sequences of words sentencesbullConstruction of language models

Two types of modelsbullNon-probabilisticbull Probabilistic

Human Language is highly ambiguous at all levels

bull acoustic levelrecognize speech vs wreck a nice beach

bull morphological levelsaw to see (past) saw (noun) to saw (present inf)

bull syntactic levelI saw the man on the hill with a telescope

bull semantic levelOne book has to be read by every student

Key NLP Problem Ambiguity

Language Model

A formal model about language Two types

bull Non-probabilistic Allows one to compute whether a certain sequence

(sentence or part thereof) is possible Often grammar based

bull Probabilistic Allows one to compute the probability of a certain

sequence Often extends grammars with probabilities

Example of bad language model

A bad language model

A bad language model

A good language model

Non-Probabilisticbull ldquoI swear to tell the truthrdquo is possiblebull ldquoI swerve to smell de souprdquo is impossible

Probabilisticbull P(I swear to tell the truth) ~ 0001bull P(I swerve to smell de soup) ~ 0

Why language models

Consider a Shannon Gamebull Predicting the next word in the sequence

Statistical natural language hellip The cat is thrown out of the hellip The large green hellip Sue swallowed the large green hellip hellip

Model at the sentence level

Applications

Spelling correction Mobile phone texting Speech recognition Handwriting recognition Disabled users hellip

Spelling errors

They are leaving in about fifteen minuets to go to her house

The study was conducted mainly be John Black Hopefully all with continue smoothly in my absence Can they lave him my messages I need to notified the bank ofhellip He is trying to fine out

Handwriting recognition

Assume a note is given to a bank teller which the teller reads as I have a gub (cf Woody Allen)

NLP to the rescue hellipbull gub is not a wordbull gun gum Gus and gull are words but gun has a

higher probability in the context of a bank

For Spell Checkers

Collect list of commonly substituted wordsbull piecepeace whetherweather theirthere

ExampleldquoOn Tuesday the whether helliprsquorsquoldquoOn Tuesday the weather helliprdquo

Another dimension in language models

Do we mainly want to infer (probabilities) of legal sentences sequences bull So far

Or do we want to infer properties of these sentences bull Eg parse tree part-of-speech-taggingbullNeeded for understanding NL

Letrsquos look at some tasks

Sequence Tagging

Part-of-speech taggingbull He drives with his bikebull N V PR PN N noun verb preposition pronoun noun

Text extractionbull The job is that of a programmerbull X X X X X X JobTypebull The seminar is taking place from 1500 to 1600bull X X X X X X Start End

Sequence Tagging

Predicting the secondary structure of proteins mRNA hellipbull X = AFARLMMAhellipbull Y = hehestststhesthe hellip

Parsing

Given a sentence find its parse tree Important step in understanding NL

Parsing

In bioinformatics allows to predict (elements of) structure from sequence

Language models based on Grammars

Grammar Typesbull Regular grammars and Finite State Automatabull Context-Free Grammarsbull Definite Clause Grammars

A particular type of Unification Based Grammar (Prolog)

Distinguish lexicon from grammarbull Lexicon (dictionary) contains information about

words eg word - possible tags (and possibly additional information) flies - V(erb) - N(oun)

bull Grammar encode rules

Grammars and parsing Syntactic level best understood and formalized Derivation of grammatical structure parsing

(more than just recognition) Result of parsing mostly parse tree

showing the constituents of a sentence eg verb or noun phrases

Syntax usually specified in terms of a grammar consisting of grammar rules

Regular Grammars and Finite State Automata

Lexical information - which words are bull Det(erminer)bull N(oun)bull Vi (intransitive verb) - no

argumentbull Pn (pronoun) bull Vt (transitive verb) - takes an

argumentbull Adj (adjective)

Now acceptbull The cat sleptbull Det N Vi

As regular grammarbull S -gt [Det] S1 [ ] terminal bull S1 -gt [N] S2bull S2 -gt [Vi]

Lexicon bull The - Detbull Cat - Nbull Slept - Vi

bull hellip

Finite State Automaton

Sentences

bull John smiles - Pn Vibull The cat disappeared - Det N Vibull These new shoes hurt - Det Adj N Vibull John liked the old cat PN Vt Det Adj N

Phrase structure

S

NP

D N

VP

NPV

D N

PP

P NP

D N

the dog chased a cat into the garden

Notation

S sentence D or Det Determiner (eg articles) N noun V verb P preposition NP noun phrase VP verb phrase PP prepositional phrase

Context Free GrammarS -gt NP VPNP -gt D NVP -gt V NPVP -gt V NP PPPP -gt P NPD -gt [the]D -gt [a]N -gt [dog]N -gt [cat]N -gt [garden]V -gt [chased]V -gt [saw]P -gt [into]

Terminals ~ Lexicon

Phrase structure

Formalism of context-free grammarsbull Nonterminal symbols S NP VP bull Terminal symbols dog cat saw the

Recursionbull bdquoThe girl thought the dog chased the catldquo

VP -gt V SN -gt [girl]V -gt [thought]

Top-down parsing

S -gt NP VP S -gt Det N VP S -gt The N VP S -gt The dog VP S -gt The dog V NP S -gt The dog chased NP S -gt The dog chased Det N S-gt The dog chased the N S-gt The dog chased the cat

Context-free grammarSS --gt --gt NPNPVPVP

NPNP --gt PN --gt PN Proper nounProper noun

NPNP --gt Art Adj N--gt Art Adj N

NPNP --gt ArtN--gt ArtN

VPVP --gt VI --gt VI intransitive verbintransitive verb

VPVP --gt VT --gt VT NPNP transitive verbtransitive verb

ArtArt --gt [the]--gt [the]

AdjAdj --gt [lazy]--gt [lazy]

AdjAdj --gt [rapid]--gt [rapid]

PNPN --gt [achilles]--gt [achilles]

NN --gt [turtle]--gt [turtle]

VIVI --gt [sleeps]--gt [sleeps]

VTVT --gt [beats]--gt [beats]

Parse tree

SS

NPNP VPVP

ArtArt AdjAdj NN VtVt NPNP

PNPN

achillesachillesbeatsbeatsturtleturtlerapidrapidthethe

Definite Clause GrammarsNon-terminals may have arguments

SS --gt --gt NPNP((NN))VPVP((NN))

NP(NP(NN)) --gt Art(--gt Art(NN)N()N(NN))

VP(VP(NN)) --gt VI(--gt VI(NN))

Art(Art(singularsingular)) --gt [a]--gt [a]

Art(Art(singularsingular)) --gt [the]--gt [the]

Art(Art(pluralplural)) --gt [the]--gt [the]

N(N(singularsingular)) --gt [turtle]--gt [turtle]

N(N(pluralplural)) --gt [turtles]--gt [turtles]

VI(VI(singularsingular)) --gt [sleeps]--gt [sleeps]

VI(VI(pluralplural)) --gt [sleep]--gt [sleep]

Number Agreement

DCGs

Non-terminals may have argumentsbull Variables (start with capital)

Eg Number Any

bull Constants (start with lower case) Eg singular plural

bull Structured terms (start with lower case and take arguments themselves) Eg vp(VNP)

Parsing needs to be adapted bull Using unification

Unification in a nutshell (cf AI course)

Substitutions

Eg Num singular T vp(VNP)

Applying substitution bull Simultaneously replace variables by

corresponding termsbull S(Num) Num singular = S(singular)

Unification

Take two non-terminals with arguments and compute (most general) substitution that makes them identical egbull Art(singular) and Art(Num)

Gives Num singular

bull Art(singular) and Art(plural) Fails

bull Art(Num1) and Art(Num2) Num1 Num2

bull PN(Num accusative) and PN(singular Case) Numsingular Caseaccusative

Parsing with DCGs

Now require successful unification at each step

S -gt NP(N) VP(N) S -gt Art(N) N(N) VP(N) Nsingular S -gt a N(singular) VP(singular) S -gt a turtle VP(singular) S -gt a turtle sleeps

S-gt a turtle sleep fails

Case Marking

PN(singularnominative)PN(singularnominative) --gt --gt [he][she][he][she]

PN(singularaccusative)PN(singularaccusative) --gt --gt [him][her][him][her]

PN(pluralnominative)PN(pluralnominative) --gt --gt [they][they]

PN(pluralaccusative)PN(pluralaccusative) --gt --gt [them][them]

S --gt NP(Numbernominative) NP(Number)S --gt NP(Numbernominative) NP(Number)

VP(Number) --gt V(Number) VP(Anyaccusative)VP(Number) --gt V(Number) VP(Anyaccusative)

VP(NumberCase) --gt PN(NumberCase)VP(NumberCase) --gt PN(NumberCase)

VP(NumberAny) --gt Det N(Number)VP(NumberAny) --gt Det N(Number)

He sees her She sees him They see her

But not Them see he

DCGs

Are strictly more expressive than CFGs Can represent for instance

bull S(N) -gt A(N) B(N) C(N)bull A(0) -gt [] bull B(0) -gt []bull C(0) -gt []bull A(s(N)) -gt A(N) [A]bull B(s(N)) -gt B(N) [B]bull C(s(N)) -gt C(N) [C]

Probabilistic Models

Traditional grammar models are very rigid bull essentially a yes no decision

Probabilistic grammarsbull Define a probability models for the databull Compute the probability of each alternativebull Choose the most likely alternative

Ilustrate on bull Shannon Gamebull Spelling correctionbull Parsing

Illustration

Wall Street Journal Corpus 3 000 000 words Correct parse tree for sentences known

bull Constructed by handbull Can be used to derive stochastic context free

grammarsbull SCFG assign probability to parse trees

Compute the most probable parse tree

Sequences are omni-present

Therefore the techniques we will see also apply tobull Bioinformatics

DNA proteins mRNA hellip can all be represented as strings

bullRobotics Sequences of actions states hellip

bullhellip

Rest of the Course

Limitations traditional grammar models motivate probabilistic extensions bull Regular grammars and Finite State Automata

All use principles of Part I on Graphical Models Markov Models using n-gramms (Hidden) Markov Models Conditional Random Fields

bull As an example of using undirected graphical models

bull Probabilistic Context Free Grammarsbull Probabilistic Definite Clause Grammars

  • Advanced Artificial Intelligence
  • Topic
  • Contents
  • Rationalism versus Empiricism
  • Slide 5
  • This course
  • Ambiguity
  • NLP and Statistics
  • Slide 9
  • Corpora
  • Word Counts
  • Slide 12
  • Word Counts (Brown corpus)
  • Slide 14
  • Zipflsquos Law
  • Language and sequences
  • Key NLP Problem Ambiguity
  • Language Model
  • Example of bad language model
  • A bad language model
  • Slide 22
  • A good language model
  • Why language models
  • Applications
  • Spelling errors
  • Handwriting recognition
  • For Spell Checkers
  • Another dimension in language models
  • Sequence Tagging
  • Slide 31
  • Parsing
  • Slide 33
  • Language models based on Grammars
  • Grammars and parsing
  • Regular Grammars and Finite State Automata
  • Finite State Automaton
  • Phrase structure
  • Notation
  • Context Free Grammar
  • Slide 41
  • Top-down parsing
  • Context-free grammar
  • Parse tree
  • Definite Clause Grammars Non-terminals may have arguments
  • DCGs
  • Unification in a nutshell (cf AI course)
  • Unification
  • Parsing with DCGs
  • Case Marking
  • Slide 51
  • Probabilistic Models
  • Illustration
  • PowerPoint Presentation
  • Sequences are omni-present
  • Rest of the Course
Statistical Disambiguation

• Define a probability model for the data
• Compute the probability of each alternative
• Choose the most likely alternative

NLP and Statistics

Statistical methods deal with uncertainty. They predict the future behaviour of a system based on the behaviour observed in the past.

Statistical methods require training data.

The data in statistical NLP are the corpora.

NLP and Statistics

Corpus: a text collection for linguistic purposes.

Tokens: how many words are contained in Tom Sawyer? 71,370
Types: how many different words are contained in Tom Sawyer? 8,018
Hapax legomena: words appearing only once

Corpora

The most frequent words are function words

word freq word freq

the 3332 in 906

and 2972 that 877

a 1775 he 877

to 1725 I 783

of 1440 his 772

was 1161 you 686

it 1027 Tom 679

Word Counts

How many words appear f times?

f       n_f
1       3993
2       1292
3       664
4       410
5       243
6       199
7       172
8       131
9       82
10      91
11-50   540
51-100  99
>100    102

Word Counts

About half of the words occur just once.
About half of the text consists of the 100 most common words…

Word Counts (Brown corpus)

Word Counts (Brown corpus)

word    f     r    f·r      word        f   r     f·r
the     3332  1    3332     turned      51  200   10200
and     2972  2    5944     you'll      30  300   9000
a       1775  3    5325     name        21  400   8400
he      877   10   8770     comes       16  500   8000
but     410   20   8200     group       13  600   7800
be      294   30   8820     lead        11  700   7700
there   222   40   8880     friends     10  800   8000
one     172   50   8600     begin       9   900   8100
about   158   60   9480     family      8   1000  8000
more    138   70   9660     brushed     4   2000  8000
never   124   80   9920     sins        2   3000  6000
Oh      116   90   10440    Could       2   4000  8000
two     104   100  10400    Applausive  1   8000  8000

Zipf's Law: f ~ 1/r  (f·r ≈ const)

Zipf's Law

Minimize effort
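The f·r ≈ const claim can be checked mechanically against the rank/frequency pairs in the table (a small sketch over a subset of the rows):

```python
# Check f*r ~ const on rank/frequency pairs from the table above.
data = [("the", 3332, 1), ("and", 2972, 2), ("he", 877, 10),
        ("but", 410, 20), ("one", 172, 50), ("two", 104, 100),
        ("turned", 51, 200), ("name", 21, 400), ("family", 8, 1000)]

products = [f * r for _, f, r in data]
# Frequency spans three orders of magnitude, yet every product stays
# in the same narrow band -- the Zipfian signature f ~ 1/r.
assert all(3000 <= p <= 11000 for p in products)
```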

Language and sequences

Natural language processing:
• is concerned with the analysis of sequences of words / sentences
• construction of language models

Two types of models:
• Non-probabilistic
• Probabilistic

Human Language is highly ambiguous at all levels

• acoustic level: recognize speech vs. wreck a nice beach
• morphological level: saw - to see (past), saw (noun), to saw (present, inf.)
• syntactic level: I saw the man on the hill with a telescope
• semantic level: One book has to be read by every student

Key NLP Problem Ambiguity

Language Model

A formal model of language. Two types:

• Non-probabilistic: allows one to compute whether a certain sequence (sentence or part thereof) is possible. Often grammar-based.
• Probabilistic: allows one to compute the probability of a certain sequence. Often extends grammars with probabilities.

Example of bad language model

A bad language model

A bad language model

A good language model

Non-probabilistic:
• "I swear to tell the truth" is possible
• "I swerve to smell de soup" is impossible

Probabilistic:
• P(I swear to tell the truth) ≈ 0.001
• P(I swerve to smell de soup) ≈ 0

Why language models

Consider a Shannon Game:
• Predicting the next word in a sequence

Statistical natural language …
The cat is thrown out of the …
The large green …
Sue swallowed the large green …
…

Model at the sentence level
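A sketch of how such a next-word predictor can be built from bigram counts; the corpus here is a toy string, and a real model would need millions of words plus smoothing for unseen bigrams:

```python
from collections import Counter, defaultdict

# Shannon-game sketch: predict the next word from bigram counts.
corpus = ("the cat is thrown out of the house "
          "the cat is in the house the cat is out").split()

bigrams = defaultdict(Counter)
for w1, w2 in zip(corpus, corpus[1:]):
    bigrams[w1][w2] += 1

def predict(word):
    """Most frequent continuation of `word` in the corpus."""
    return bigrams[word].most_common(1)[0][0]

# predict("the") -> 'cat'  (follows 'the' 3 times, vs. 'house' 2 times)
```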

Applications

Spelling correction, mobile phone texting, speech recognition, handwriting recognition, disabled users, …

Spelling errors

They are leaving in about fifteen minuets to go to her house

The study was conducted mainly be John Black Hopefully all with continue smoothly in my absence Can they lave him my messages I need to notified the bank ofhellip He is trying to fine out

Handwriting recognition

Assume a note is given to a bank teller, which the teller reads as "I have a gub" (cf. Woody Allen).

NLP to the rescue …
• "gub" is not a word
• gun, gum, Gus, and gull are words, but gun has a higher probability in the context of a bank

For Spell Checkers

Collect a list of commonly substituted words:
• piece/peace, whether/weather, their/there

Example:
"On Tuesday, the whether …"
"On Tuesday, the weather …"
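One way such a checker can work, sketched with invented bigram counts: for each member of a confusion set, compare how often it follows the preceding word, and keep the likelier variant:

```python
from collections import Counter

# Confusion-set spell checking sketch. All counts below are invented
# for illustration; a real checker would estimate them from a corpus.
confusion = {"whether": "weather", "weather": "whether"}

bigram_counts = Counter({("the", "weather"): 120, ("the", "whether"): 1,
                         ("know", "whether"): 80, ("know", "weather"): 2})

def correct(prev, word):
    alt = confusion.get(word)
    if alt is None:                    # word has no confusion partner
        return word
    if bigram_counts[(prev, alt)] > bigram_counts[(prev, word)]:
        return alt
    return word

# correct("the", "whether") -> 'weather'
# correct("know", "whether") -> 'whether'
```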

Another dimension in language models

Do we mainly want to infer (probabilities of) legal sentences / sequences?
• So far, this is what we have done

Or do we want to infer properties of these sentences?
• E.g. parse tree, part-of-speech tagging
• Needed for understanding NL

Let's look at some tasks.

Sequence Tagging

Part-of-speech tagging:
• He drives with his bike
• PN V PR PN N (pronoun, verb, preposition, pronoun, noun)

Text extraction:
• The job is that of a programmer
  X X X X X X JobType
• The seminar is taking place from 15:00 to 16:00
  X X X X X X Start End
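A minimal baseline for such tagging tasks assigns each word its most frequent tag in a tagged corpus; the tiny corpus below is hand-made for illustration (real taggers, e.g. HMM-based ones, also model tag context):

```python
from collections import Counter, defaultdict

# Baseline sequence tagger: tag every word with its most frequent tag.
# The tagged corpus is a toy, invented for this sketch.
tagged_corpus = [("he", "PN"), ("drives", "V"), ("with", "PR"),
                 ("his", "PN"), ("bike", "N"),
                 ("he", "PN"), ("drives", "V"), ("a", "Det"), ("car", "N")]

tag_counts = defaultdict(Counter)
for word, tag in tagged_corpus:
    tag_counts[word][tag] += 1

def tag(sentence):
    # Unknown words default to N, a common crude heuristic.
    return [tag_counts[w].most_common(1)[0][0] if w in tag_counts else "N"
            for w in sentence]

# tag("he drives with his bike".split()) -> ['PN', 'V', 'PR', 'PN', 'N']
```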

Sequence Tagging

Predicting the secondary structure of proteins, mRNA, …
• X = AFARLMMA…
• Y = hehestststhesthe…

Parsing

Given a sentence, find its parse tree. An important step in understanding NL.

Parsing

In bioinformatics, parsing allows one to predict (elements of) structure from sequence.

Language models based on Grammars

Grammar types:
• Regular grammars and Finite State Automata
• Context-Free Grammars
• Definite Clause Grammars: a particular type of Unification-Based Grammar (Prolog)

Distinguish the lexicon from the grammar:
• The lexicon (dictionary) contains information about words, e.g. word - possible tags (and possibly additional information): flies - V(erb) or N(oun)
• The grammar encodes the rules

Grammars and parsing

The syntactic level is the best understood and formalized. Derivation of grammatical structure is called parsing (more than just recognition). The result of parsing is mostly a parse tree, showing the constituents of a sentence, e.g. verb or noun phrases. Syntax is usually specified in terms of a grammar consisting of grammar rules.

Regular Grammars and Finite State Automata

Lexical information - which words are:
• Det(erminer)
• N(oun)
• Pn (pronoun)
• Vi (intransitive verb) - takes no argument
• Vt (transitive verb) - takes an argument
• Adj (adjective)

Now accept:
• The cat slept
• Det N Vi

As a regular grammar:
• S  -> [Det] S1    ([...] = terminal)
• S1 -> [N] S2
• S2 -> [Vi]

Lexicon:
• the - Det
• cat - N
• slept - Vi
• …

Finite State Automaton

Sentences

• John smiles - Pn Vi
• The cat disappeared - Det N Vi
• These new shoes hurt - Det Adj N Vi
• John liked the old cat - Pn Vt Det Adj N
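The regular grammar above corresponds to a finite state automaton over tag sequences; a sketch (the Pn and Adj transitions are extrapolated from the example sentences, and the transitive-verb pattern is left out):

```python
# FSA for the regular grammar above: S -[Det]-> S1 -[N]-> S2 -[Vi]-> accept.
# Pn and Adj transitions are assumptions based on the example sentences;
# transitive verbs (Vt NP) are not handled in this sketch.
TRANSITIONS = {
    ("S",  "Det"): "S1",
    ("S",  "Pn"):  "S2",
    ("S1", "Adj"): "S1",   # allow any number of adjectives
    ("S1", "N"):   "S2",
    ("S2", "Vi"):  "END",
}
ACCEPTING = {"END"}

def accepts(tags):
    state = "S"
    for t in tags:
        state = TRANSITIONS.get((state, t))
        if state is None:          # no transition: reject
            return False
    return state in ACCEPTING

# accepts(["Det", "N", "Vi"]) -> True          ("the cat slept")
# accepts(["Det", "Adj", "N", "Vi"]) -> True   ("these new shoes hurt")
# accepts(["N", "Vi"]) -> False
```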

Phrase structure

[S [NP [D the] [N dog]]
   [VP [V chased]
       [NP [D a] [N cat]]
       [PP [P into] [NP [D the] [N garden]]]]]

the dog chased a cat into the garden

Notation

S - sentence
D or Det - determiner (e.g. articles)
N - noun
V - verb
P - preposition
NP - noun phrase
VP - verb phrase
PP - prepositional phrase

Context Free Grammar

S  -> NP VP
NP -> D N
VP -> V NP
VP -> V NP PP
PP -> P NP
D  -> [the]
D  -> [a]
N  -> [dog]
N  -> [cat]
N  -> [garden]
V  -> [chased]
V  -> [saw]
P  -> [into]

Terminals ~ Lexicon

Phrase structure

Formalism of context-free grammars:
• Nonterminal symbols: S, NP, VP, …
• Terminal symbols: dog, cat, saw, the, …

Recursion:
• "The girl thought the dog chased the cat"

VP -> V S
N  -> [girl]
V  -> [thought]

Top-down parsing

S -> NP VP
S -> Det N VP
S -> The N VP
S -> The dog VP
S -> The dog V NP
S -> The dog chased NP
S -> The dog chased Det N
S -> The dog chased the N
S -> The dog chased the cat
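This derivation strategy can be sketched as a recursive-descent recognizer: expand the leftmost symbol and backtrack over alternative rules (safe here because the grammar has no left recursion):

```python
# Recursive-descent recognizer for the small CFG above: expand the
# leftmost symbol, trying each rule in turn (backtracking via `any`).
GRAMMAR = {
    "S":  [["NP", "VP"]],
    "NP": [["D", "N"]],
    "VP": [["V", "NP"], ["V", "NP", "PP"]],
    "PP": [["P", "NP"]],
    "D":  [["the"], ["a"]],
    "N":  [["dog"], ["cat"], ["garden"]],
    "V":  [["chased"], ["saw"]],
    "P":  [["into"]],
}

def parse(symbols, words):
    """True iff `symbols` can derive exactly `words`."""
    if not symbols:
        return not words                        # success iff input consumed
    first, rest = symbols[0], symbols[1:]
    if first not in GRAMMAR:                    # terminal: must match next word
        return bool(words) and words[0] == first and parse(rest, words[1:])
    return any(parse(rhs + rest, words) for rhs in GRAMMAR[first])

# parse(["S"], "the dog chased the cat".split()) -> True
# parse(["S"], "the dog chased a cat into the garden".split()) -> True
```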

Context-free grammar

S   --> NP VP
NP  --> PN            proper noun
NP  --> Art Adj N
NP  --> Art N
VP  --> VI            intransitive verb
VP  --> VT NP         transitive verb
Art --> [the]
Adj --> [lazy]
Adj --> [rapid]
PN  --> [achilles]
N   --> [turtle]
VI  --> [sleeps]
VT  --> [beats]

Parse tree

[S [NP [Art the] [Adj rapid] [N turtle]]
   [VP [Vt beats] [NP [PN achilles]]]]

the rapid turtle beats achilles

Definite Clause Grammars: non-terminals may have arguments

S     --> NP(N) VP(N)
NP(N) --> Art(N) N(N)
VP(N) --> VI(N)
Art(singular) --> [a]
Art(singular) --> [the]
Art(plural)   --> [the]
N(singular)   --> [turtle]
N(plural)     --> [turtles]
VI(singular)  --> [sleeps]
VI(plural)    --> [sleep]

Number agreement

DCGs

Non-terminals may have arguments:
• Variables (start with a capital), e.g. Number, Any
• Constants (start with lower case), e.g. singular, plural
• Structured terms (start with lower case and take arguments themselves), e.g. vp(V, NP)

Parsing needs to be adapted:
• Using unification

Unification in a nutshell (cf AI course)

Substitutions

E.g. {Num/singular}, {T/vp(V, NP)}

Applying a substitution:
• Simultaneously replace variables by the corresponding terms
• S(Num){Num/singular} = S(singular)

Unification

Take two non-terminals with arguments and compute the (most general) substitution that makes them identical, e.g.:
• Art(singular) and Art(Num): gives {Num/singular}
• Art(singular) and Art(plural): fails
• Art(Num1) and Art(Num2): {Num1/Num2}
• PN(Num, accusative) and PN(singular, Case): {Num/singular, Case/accusative}
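For flat argument lists like these, unification is short to implement; a sketch handling variables and constants only (structured terms and the occurs check are omitted):

```python
# Minimal unification over flat argument lists, as in the examples above.
# Variables are capitalised strings; everything else is a constant.
def is_var(t):
    return isinstance(t, str) and t[:1].isupper()

def walk(t, subst):
    while is_var(t) and t in subst:   # follow earlier bindings
        t = subst[t]
    return t

def unify(args1, args2):
    """Most general substitution making the two lists identical, or None."""
    if len(args1) != len(args2):
        return None
    subst = {}
    for a, b in zip(args1, args2):
        a, b = walk(a, subst), walk(b, subst)
        if a == b:
            continue
        if is_var(a):
            subst[a] = b
        elif is_var(b):
            subst[b] = a
        else:
            return None               # two distinct constants: fail
    return subst

# unify(["singular"], ["Num"]) -> {'Num': 'singular'}
# unify(["singular"], ["plural"]) -> None
# unify(["Num", "accusative"], ["singular", "Case"])
#   -> {'Num': 'singular', 'Case': 'accusative'}
```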

Parsing with DCGs

Now require successful unification at each step

S -> NP(N) VP(N)
S -> Art(N) N(N) VP(N)       {N/singular}
S -> a N(singular) VP(singular)
S -> a turtle VP(singular)
S -> a turtle sleeps

S -> a turtle sleep     fails

Case Marking

PN(singular, nominative) --> [he] ; [she]
PN(singular, accusative) --> [him] ; [her]
PN(plural, nominative)   --> [they]
PN(plural, accusative)   --> [them]

S --> NP(Number, nominative) VP(Number)
VP(Number) --> V(Number) NP(Any, accusative)
NP(Number, Case) --> PN(Number, Case)
NP(Number, Any)  --> Det N(Number)

He sees her. She sees him. They see her.
But not: Them see he.

DCGs

DCGs are strictly more expressive than CFGs. They can represent, for instance, the language A^n B^n C^n:

• S(N)    -> A(N) B(N) C(N)
• A(0)    -> []
• B(0)    -> []
• C(0)    -> []
• A(s(N)) -> A(N) [A]
• B(s(N)) -> B(N) [B]
• C(s(N)) -> C(N) [C]
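The language this DCG derives, A^n B^n C^n, is a classic example of a non-context-free language; membership is nevertheless trivial to check procedurally:

```python
# Membership test for A^n B^n C^n, the language the DCG above derives.
# No CFG can generate this language, but a direct check is simple.
def is_anbncn(s):
    n = len(s) // 3
    return len(s) == 3 * n and s == "A" * n + "B" * n + "C" * n

# is_anbncn("AAABBBCCC") -> True
# is_anbncn("AABBBCC") -> False
```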

Page 10: Part II. Statistical NLP Advanced Artificial Intelligence Introduction and Grammar Models Wolfram Burgard, Luc De Raedt, Bernhard Nebel, Kristian Kersting

Corpus text collection for linguistic purposes

TokensHow many words are contained in Tom Sawyer 71370

TypesHow many different words are contained in TS 8018

Hapax Legomenawords appearing only once

Corpora

The most frequent words are function words

word freq word freq

the 3332 in 906

and 2972 that 877

a 1775 he 877

to 1725 I 783

of 1440 his 772

was 1161 you 686

it 1027 Tom 679

Word Counts

f nf

1 39932 12923 6644 4105 2436 1997 1728 1319 8210 9111-50 54051-100 99gt 100 102

How many words appear f times

Word Counts

About half of the words occurs just onceAbout half of the text consists of the

100 most common wordshellip

Word Counts (Brown corpus)

Word Counts (Brown corpus)

word f r fr word f r frthe 3332 1 3332 turned 51 200 10200and 2972 2 5944 youlsquoll 30 300 9000a 1775 3 5235 name 21 400 8400he 877 10 8770 comes 16 500 8000but 410 20 8400 group 13 600 7800be 294 30 8820 lead 11 700 7700there 222 40 8880 friends 10 800 8000one 172 50 8600 begin 9 900 8100about 158 60 9480 family 8 1000 8000more 138 70 9660 brushed 4 2000 8000never 124 80 9920 sins 2 3000 6000Oh 116 90 10440 Could 2 4000 8000two 104 100 10400 Applausive 1 8000 8000

Zipflsquos Law f~1r (fr = const)

Zipflsquos Law

Minimize effort

Language and sequences

Natural language processingbull Is concerned with the analysis of

sequences of words sentencesbullConstruction of language models

Two types of modelsbullNon-probabilisticbull Probabilistic

Human Language is highly ambiguous at all levels

bull acoustic levelrecognize speech vs wreck a nice beach

bull morphological levelsaw to see (past) saw (noun) to saw (present inf)

bull syntactic levelI saw the man on the hill with a telescope

bull semantic levelOne book has to be read by every student

Key NLP Problem Ambiguity

Language Model

A formal model about language Two types

bull Non-probabilistic Allows one to compute whether a certain sequence

(sentence or part thereof) is possible Often grammar based

bull Probabilistic Allows one to compute the probability of a certain

sequence Often extends grammars with probabilities

Example of bad language model

A bad language model

A bad language model

A good language model

Non-Probabilisticbull ldquoI swear to tell the truthrdquo is possiblebull ldquoI swerve to smell de souprdquo is impossible

Probabilisticbull P(I swear to tell the truth) ~ 0001bull P(I swerve to smell de soup) ~ 0

Why language models

Consider a Shannon Gamebull Predicting the next word in the sequence

Statistical natural language hellip The cat is thrown out of the hellip The large green hellip Sue swallowed the large green hellip hellip

Model at the sentence level

Applications

Spelling correction Mobile phone texting Speech recognition Handwriting recognition Disabled users hellip

Spelling errors

They are leaving in about fifteen minuets to go to her house

The study was conducted mainly be John Black Hopefully all with continue smoothly in my absence Can they lave him my messages I need to notified the bank ofhellip He is trying to fine out

Handwriting recognition

Assume a note is given to a bank teller which the teller reads as I have a gub (cf Woody Allen)

NLP to the rescue hellipbull gub is not a wordbull gun gum Gus and gull are words but gun has a

higher probability in the context of a bank

For Spell Checkers

Collect list of commonly substituted wordsbull piecepeace whetherweather theirthere

ExampleldquoOn Tuesday the whether helliprsquorsquoldquoOn Tuesday the weather helliprdquo

Another dimension in language models

Do we mainly want to infer (probabilities) of legal sentences sequences bull So far

Or do we want to infer properties of these sentences bull Eg parse tree part-of-speech-taggingbullNeeded for understanding NL

Letrsquos look at some tasks

Sequence Tagging

Part-of-speech taggingbull He drives with his bikebull N V PR PN N noun verb preposition pronoun noun

Text extractionbull The job is that of a programmerbull X X X X X X JobTypebull The seminar is taking place from 1500 to 1600bull X X X X X X Start End

Sequence Tagging

Predicting the secondary structure of proteins mRNA hellipbull X = AFARLMMAhellipbull Y = hehestststhesthe hellip

Parsing

Given a sentence find its parse tree Important step in understanding NL

Parsing

In bioinformatics allows to predict (elements of) structure from sequence

Language models based on Grammars

Grammar Typesbull Regular grammars and Finite State Automatabull Context-Free Grammarsbull Definite Clause Grammars

A particular type of Unification Based Grammar (Prolog)

Distinguish lexicon from grammarbull Lexicon (dictionary) contains information about

words eg word - possible tags (and possibly additional information) flies - V(erb) - N(oun)

bull Grammar encode rules

Grammars and parsing Syntactic level best understood and formalized Derivation of grammatical structure parsing

(more than just recognition) Result of parsing mostly parse tree

showing the constituents of a sentence eg verb or noun phrases

Syntax usually specified in terms of a grammar consisting of grammar rules

Regular Grammars and Finite State Automata

Lexical information - which words are bull Det(erminer)bull N(oun)bull Vi (intransitive verb) - no

argumentbull Pn (pronoun) bull Vt (transitive verb) - takes an

argumentbull Adj (adjective)

Now acceptbull The cat sleptbull Det N Vi

As regular grammarbull S -gt [Det] S1 [ ] terminal bull S1 -gt [N] S2bull S2 -gt [Vi]

Lexicon bull The - Detbull Cat - Nbull Slept - Vi

bull hellip

Finite State Automaton

Sentences

bull John smiles - Pn Vibull The cat disappeared - Det N Vibull These new shoes hurt - Det Adj N Vibull John liked the old cat PN Vt Det Adj N

Phrase structure

S

NP

D N

VP

NPV

D N

PP

P NP

D N

the dog chased a cat into the garden

Notation

S sentence D or Det Determiner (eg articles) N noun V verb P preposition NP noun phrase VP verb phrase PP prepositional phrase

Context Free Grammar

S -> NP VP
NP -> D N
VP -> V NP
VP -> V NP PP
PP -> P NP
D -> [the]
D -> [a]
N -> [dog]
N -> [cat]
N -> [garden]
V -> [chased]
V -> [saw]
P -> [into]

Terminals ~ Lexicon

Phrase structure

Formalism of context-free grammars:
• Nonterminal symbols: S, NP, VP, …
• Terminal symbols: dog, cat, saw, the, …

Recursion:
• "The girl thought the dog chased the cat"
• VP -> V S
• N -> [girl]
• V -> [thought]

Top-down parsing

S -> NP VP
  -> Det N VP
  -> The N VP
  -> The dog VP
  -> The dog V NP
  -> The dog chased NP
  -> The dog chased Det N
  -> The dog chased the N
  -> The dog chased the cat
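The derivation above can be sketched as a naive recursive-descent (top-down) recognizer. This is an illustrative sketch, not the course's parser: it tries rules in order without full backtracking over earlier choices, which suffices here because the VP alternatives are listed longest-first.

```python
# Sketch: top-down recognition with the toy CFG from the slides.
GRAMMAR = {
    "S":  [["NP", "VP"]],
    "NP": [["D", "N"]],
    "VP": [["V", "NP", "PP"], ["V", "NP"]],   # longest rule first
    "PP": [["P", "NP"]],
    "D":  [["the"], ["a"]],
    "N":  [["dog"], ["cat"], ["garden"]],
    "V":  [["chased"], ["saw"]],
    "P":  [["into"]],
}

def parse(symbol, words):
    """Derive a prefix of `words` from `symbol`; return leftover words or None."""
    if symbol not in GRAMMAR:                 # terminal symbol
        return words[1:] if words and words[0] == symbol else None
    for rule in GRAMMAR[symbol]:
        rest = words
        for part in rule:
            rest = parse(part, rest)
            if rest is None:
                break
        else:                                 # every part of the rule matched
            return rest
    return None

def recognize(sentence):
    return parse("S", sentence.lower().split()) == []

print(recognize("the dog chased the cat"))                # True
print(recognize("the dog chased a cat into the garden"))  # True
print(recognize("the dog chased"))                        # False
```

Note that top-down parsing of this style loops on left-recursive rules; the toy grammar has none.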

Context-free grammar

S --> NP, VP
NP --> PN              (proper noun)
NP --> Art, Adj, N
NP --> Art, N
VP --> VI              (intransitive verb)
VP --> VT, NP          (transitive verb)
Art --> [the]
Adj --> [lazy]
Adj --> [rapid]
PN --> [achilles]
N --> [turtle]
VI --> [sleeps]
VT --> [beats]

Parse tree

For "the rapid turtle beats achilles":

[S [NP [Art the] [Adj rapid] [N turtle]]
   [VP [Vt beats] [NP [PN achilles]]]]

Definite Clause Grammars

Non-terminals may have arguments:

S --> NP(N), VP(N)
NP(N) --> Art(N), N(N)
VP(N) --> VI(N)
Art(singular) --> [a]
Art(singular) --> [the]
Art(plural) --> [the]
N(singular) --> [turtle]
N(plural) --> [turtles]
VI(singular) --> [sleeps]
VI(plural) --> [sleep]

This enforces number agreement.

DCGs

Non-terminals may have arguments:
• Variables (start with a capital), e.g. Number, Any
• Constants (start with lower case), e.g. singular, plural
• Structured terms (start with lower case and take arguments themselves), e.g. vp(V,NP)

Parsing needs to be adapted:
• using unification

Unification in a nutshell (cf. AI course)

Substitutions, e.g. {Num / singular}, {T / vp(V,NP)}

Applying a substitution:
• simultaneously replace variables by the corresponding terms
• S(Num) {Num / singular} = S(singular)

Unification

Take two non-terminals with arguments and compute the (most general) substitution that makes them identical, e.g.:
• Art(singular) and Art(Num): gives {Num / singular}
• Art(singular) and Art(plural): fails
• Art(Num1) and Art(Num2): {Num1 / Num2}
• PN(Num, accusative) and PN(singular, Case): {Num / singular, Case / accusative}
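A toy sketch of this unification step over the flat argument lists used above, assuming capitalized strings are variables and lower-case strings are constants. The function names are made up; a full unifier would also handle structured terms and occurs-checks.

```python
# Sketch: unification of flat non-terminal argument lists.
def is_var(t):
    return isinstance(t, str) and t[:1].isupper()

def unify(args1, args2, subst=None):
    """Return a most general substitution making the two argument
    lists equal, or None if unification fails."""
    if subst is None:
        subst = {}
    if len(args1) != len(args2):
        return None
    for a, b in zip(args1, args2):
        a = subst.get(a, a)          # dereference already-bound variables
        b = subst.get(b, b)
        if a == b:
            continue
        if is_var(a):
            subst[a] = b
        elif is_var(b):
            subst[b] = a
        else:
            return None              # two different constants: fail
    return subst

print(unify(["singular"], ["Num"]))                        # {'Num': 'singular'}
print(unify(["singular"], ["plural"]))                     # None
print(unify(["Num", "accusative"], ["singular", "Case"]))  # {'Num': 'singular', 'Case': 'accusative'}
```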

Parsing with DCGs

Now require successful unification at each step:

S -> NP(N), VP(N)
  -> Art(N), N(N), VP(N)        {N / singular}
  -> a N(singular) VP(singular)
  -> a turtle VP(singular)
  -> a turtle sleeps

S -> a turtle sleep: fails

Case Marking

PN(singular, nominative) --> [he]; [she]
PN(singular, accusative) --> [him]; [her]
PN(plural, nominative) --> [they]
PN(plural, accusative) --> [them]

S --> NP(Number, nominative), VP(Number)
VP(Number) --> V(Number), NP(Any, accusative)
NP(Number, Case) --> PN(Number, Case)
NP(Number, Any) --> Det, N(Number)

Accepts: He sees her. She sees him. They see her.
But not: Them see he.

DCGs

Are strictly more expressive than CFGs. They can represent, for instance, the language a^n b^n c^n:
• S(N) -> A(N), B(N), C(N)
• A(0) -> []
• B(0) -> []
• C(0) -> []
• A(s(N)) -> A(N), [a]
• B(s(N)) -> B(N), [b]
• C(s(N)) -> C(N), [c]
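The shared argument N above acts as a counter forcing equal numbers of a's, b's, and c's, yielding the classic non-context-free language a^n b^n c^n. A direct Python recognizer of the same language makes the counting explicit:

```python
# Sketch: recognizer for the language a^n b^n c^n defined by the DCG above.
def accepts_anbncn(s):
    n = len(s) // 3
    # the string must split into three equal runs of a's, b's, c's
    return len(s) % 3 == 0 and s == "a" * n + "b" * n + "c" * n

print(accepts_anbncn(""))        # True  (n = 0)
print(accepts_anbncn("aabbcc"))  # True  (n = 2)
print(accepts_anbncn("aabbc"))   # False
print(accepts_anbncn("abcabc"))  # False
```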

Probabilistic Models

Traditional grammar models are very rigid:
• essentially a yes/no decision

Probabilistic grammars:
• define a probability model for the data
• compute the probability of each alternative
• choose the most likely alternative

Illustrated on:
• the Shannon game
• spelling correction
• parsing

Illustration

Wall Street Journal Corpus: 3,000,000 words, with the correct parse tree for each sentence known:
• constructed by hand
• can be used to derive stochastic context-free grammars
• SCFGs assign a probability to parse trees

Compute the most probable parse tree.
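How an SCFG scores a tree can be sketched as follows: the probability of a parse tree is the product of the probabilities of all rules used in it (with the probabilities of rules sharing a left-hand side summing to one). The grammar and the rule probabilities below are invented for illustration only.

```python
# Sketch: probability of a parse tree under a toy SCFG.
RULE_PROB = {
    ("S", ("NP", "VP")): 1.0,
    ("NP", ("D", "N")): 1.0,
    ("VP", ("V", "NP")): 0.7,
    ("VP", ("V", "NP", "PP")): 0.3,
    ("D", ("the",)): 0.6,
    ("N", ("dog",)): 0.5,
    ("N", ("cat",)): 0.5,
    ("V", ("chased",)): 1.0,
}

def tree_prob(tree):
    """tree = (label, child, child, ...); a preterminal is (label, word)."""
    label, children = tree[0], tree[1:]
    if isinstance(children[0], str):                 # preterminal -> word
        return RULE_PROB[(label, children)]
    p = RULE_PROB[(label, tuple(c[0] for c in children))]
    for child in children:
        p *= tree_prob(child)
    return p

t = ("S",
     ("NP", ("D", "the"), ("N", "dog")),
     ("VP", ("V", "chased"), ("NP", ("D", "the"), ("N", "cat"))))
print(tree_prob(t))  # ≈ 0.063
```

The most probable parse of an ambiguous sentence is then simply the tree maximizing this product.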

Sequences are omnipresent

Therefore the techniques we will see also apply to:
• Bioinformatics: DNA, proteins, mRNA, … can all be represented as strings
• Robotics: sequences of actions, states, …
• …

Rest of the Course

The limitations of traditional grammar models motivate probabilistic extensions:
• Regular grammars and Finite State Automata:
  Markov Models using n-grams, (Hidden) Markov Models, Conditional Random Fields (as an example of using undirected graphical models)
• Probabilistic Context-Free Grammars
• Probabilistic Definite Clause Grammars

All use principles of Part I on Graphical Models.

  • Advanced Artificial Intelligence
  • Topic
  • Contents
  • Rationalism versus Empiricism
  • Slide 5
  • This course
  • Ambiguity
  • NLP and Statistics
  • Slide 9
  • Corpora
  • Word Counts
  • Slide 12
  • Word Counts (Brown corpus)
  • Slide 14
  • Zipf's Law
  • Language and sequences
  • Key NLP Problem Ambiguity
  • Language Model
  • Example of bad language model
  • A bad language model
  • Slide 22
  • A good language model
  • Why language models
  • Applications
  • Spelling errors
  • Handwriting recognition
  • For Spell Checkers
  • Another dimension in language models
  • Sequence Tagging
  • Slide 31
  • Parsing
  • Slide 33
  • Language models based on Grammars
  • Grammars and parsing
  • Regular Grammars and Finite State Automata
  • Finite State Automaton
  • Phrase structure
  • Notation
  • Context Free Grammar
  • Slide 41
  • Top-down parsing
  • Context-free grammar
  • Parse tree
  • Definite Clause Grammars Non-terminals may have arguments
  • DCGs
  • Unification in a nutshell (cf AI course)
  • Unification
  • Parsing with DCGs
  • Case Marking
  • Slide 51
  • Probabilistic Models
  • Illustration
  • PowerPoint Presentation
  • Sequences are omni-present
  • Rest of the Course
Page 11: Part II. Statistical NLP Advanced Artificial Intelligence Introduction and Grammar Models Wolfram Burgard, Luc De Raedt, Bernhard Nebel, Kristian Kersting

The most frequent words are function words

word  freq      word  freq
the   3332      in    906
and   2972      that  877
a     1775      he    877
to    1725      I     783
of    1440      his   772
was   1161      you   686
it    1027      Tom   679

Word Counts

How many words appear f times?

f       n_f
1       3993
2       1292
3       664
4       410
5       243
6       199
7       172
8       131
9       82
10      91
11-50   540
51-100  99
>100    102

Word Counts

About half of the words occur just once. About half of the text consists of the 100 most common words…

Word Counts (Brown corpus)

word   f     r    f·r     word        f   r     f·r
the    3332  1    3332    turned      51  200   10200
and    2972  2    5944    you'll      30  300   9000
a      1775  3    5325    name        21  400   8400
he     877   10   8770    comes       16  500   8000
but    410   20   8200    group       13  600   7800
be     294   30   8820    lead        11  700   7700
there  222   40   8880    friends     10  800   8000
one    172   50   8600    begin       9   900   8100
about  158   60   9480    family      8   1000  8000
more   138   70   9660    brushed     4   2000  8000
never  124   80   9920    sins        2   3000  6000
Oh     116   90   10440   Could       2   4000  8000
two    104   100  10400   Applausive  1   8000  8000

Zipf's Law: f ~ 1/r (f·r ≈ const)

Zipf's Law

Minimize effort
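Using a few (word, frequency, rank) rows from the table above, a quick check that the product f·r stays within a small constant factor even though f itself varies by more than a factor of 30:

```python
# Sanity check of Zipf's law: f * r is roughly constant across ranks.
rows = [("the", 3332, 1), ("he", 877, 10), ("there", 222, 40),
        ("about", 158, 60), ("never", 124, 80), ("two", 104, 100)]

for word, f, r in rows:
    print(f"{word:8s} f*r = {f * r}")
# products range only from 3332 to 10400 while f drops from 3332 to 104
```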

Language and sequences

Natural language processing:
• is concerned with the analysis of sequences of words / sentences
• construction of language models

Two types of models:
• non-probabilistic
• probabilistic

Key NLP Problem: Ambiguity

Human language is highly ambiguous at all levels:
• acoustic level: recognize speech vs. wreck a nice beach
• morphological level: saw: to see (past), saw (noun), to saw (present, inf)
• syntactic level: I saw the man on the hill with a telescope
• semantic level: One book has to be read by every student

Language Model

A formal model about language. Two types:
• Non-probabilistic: allows one to compute whether a certain sequence (a sentence or part thereof) is possible; often grammar-based
• Probabilistic: allows one to compute the probability of a certain sequence; often extends grammars with probabilities

Example of a bad language model

A bad language model

A good language model

Non-probabilistic:
• "I swear to tell the truth" is possible
• "I swerve to smell de soup" is impossible

Probabilistic:
• P(I swear to tell the truth) ≈ 0.0001
• P(I swerve to smell de soup) ≈ 0
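A minimal sketch of a probabilistic language model in the above sense: a bigram model with maximum-likelihood estimates from a tiny invented corpus. Real models need smoothing for unseen events; none is done here, so unseen bigrams simply get probability zero.

```python
# Sketch: maximum-likelihood bigram language model on a toy corpus.
from collections import Counter

corpus = [["i", "swear", "to", "tell", "the", "truth"],
          ["i", "swear", "to", "do", "the", "work"]]

bigrams, unigrams = Counter(), Counter()
for sent in corpus:
    padded = ["<s>"] + sent                  # sentence-start marker
    unigrams.update(padded)
    bigrams.update(zip(padded, padded[1:]))

def prob(sentence):
    """P(sentence) = product of P(w_i | w_{i-1})."""
    p = 1.0
    for prev, w in zip(["<s>"] + sentence, sentence):
        if unigrams[prev] == 0:              # unseen history
            return 0.0
        p *= bigrams[(prev, w)] / unigrams[prev]
    return p

print(prob(["i", "swear", "to", "tell", "the", "truth"]))  # 0.25
print(prob(["i", "swerve", "to", "smell", "de", "soup"]))  # 0.0
```

The possible sentence gets a positive probability; the impossible one gets (effectively) zero, mirroring the slide's example.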

Why language models?

Consider a Shannon game: predicting the next word in a sequence:
• Statistical natural language …
• The cat is thrown out of the …
• The large green …
• Sue swallowed the large green …
• …

A model at the sentence level.
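The Shannon game amounts to next-word prediction. A minimal sketch using bigram counts over an invented corpus (the corpus and function names are illustrative only):

```python
# Sketch: predict the most likely next word from bigram counts.
from collections import Counter

corpus = "the cat sat on the mat the cat ate the fish the dog slept".split()
following = Counter(zip(corpus, corpus[1:]))

def predict(prev):
    """Most frequent word observed after `prev`, or None if unseen."""
    candidates = [(count, w2) for (w1, w2), count in following.items()
                  if w1 == prev]
    return max(candidates)[1] if candidates else None

print(predict("the"))  # 'cat' (follows "the" twice; mat/fish/dog only once)
print(predict("on"))   # 'the'
```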

Applications

• Spelling correction
• Mobile phone texting
• Speech recognition
• Handwriting recognition
• Disabled users
• …

Spelling errors

They are leaving in about fifteen minuets to go to her house.
The study was conducted mainly be John Black.
Hopefully all with continue smoothly in my absence.
Can they lave him my messages?
I need to notified the bank of…
He is trying to fine out.

Handwriting recognition

Assume a note is given to a bank teller, which the teller reads as "I have a gub" (cf. Woody Allen).

NLP to the rescue:
• gub is not a word
• gun, gum, Gus, and gull are words, but gun has a higher probability in the context of a bank

For Spell Checkers

Collect a list of commonly substituted words:
• piece/peace, whether/weather, their/there, …

Example:
"On Tuesday, the whether …"
"On Tuesday, the weather …"
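The confusion-set idea can be sketched as follows: for words that are commonly substituted for each other, pick the variant that makes the more probable word sequence. The candidate sets and bigram probabilities below are invented for illustration.

```python
# Sketch: context-sensitive spell checking with confusion sets.
CONFUSION = {"whether": ["whether", "weather"],
             "weather": ["whether", "weather"]}

# made-up bigram probabilities P(word | previous word)
BIGRAM = {("the", "weather"): 0.01, ("the", "whether"): 0.0001,
          ("know", "whether"): 0.02, ("know", "weather"): 0.0001}

def correct(prev, word):
    """Pick the confusion-set member most probable after `prev`."""
    candidates = CONFUSION.get(word, [word])
    return max(candidates, key=lambda w: BIGRAM.get((prev, w), 0.0))

print(correct("the", "whether"))   # 'weather'
print(correct("know", "weather"))  # 'whether'
```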

Another dimension in language models

Do we mainly want to infer (probabilities of) legal sentences / sequences?
• so far …
Or do we want to infer properties of these sentences?
• e.g. parse tree, part-of-speech tagging
• needed for understanding NL

Let's look at some tasks.

Sequence Tagging

Part-of-speech tagging:
• He drives with his bike
• N V PR PN N (noun, verb, preposition, pronoun, noun)

Text extraction:
• The job is that of a programmer.
  X X X X X X JobType
• The seminar is taking place from 15.00 to 16.00.
  X X X X X X Start End

Sequence Tagging

Predicting the secondary structure of proteins, mRNA, …:
• X = A,F,A,R,L,M,M,A,…
• Y = he,he,st,st,st,he,st,he,…
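A minimal sequence-tagging sketch, not the course's method: the most-frequent-tag baseline, which tags each word with the tag it carried most often in training data. The tiny tagged "training" data below is invented; it reuses the lexicon slide's ambiguous word flies (noun or verb).

```python
# Sketch: most-frequent-tag baseline for part-of-speech tagging.
from collections import Counter

# invented tagged training data; "flies" occurs as N twice, as V once
tagged = [("the", "Det"), ("cat", "N"), ("slept", "Vi"),
          ("flies", "N"), ("flies", "N"), ("flies", "V")]

counts = Counter(tagged)

def most_frequent_tag(word):
    tags = [(count, tag) for (w, tag), count in counts.items() if w == word]
    return max(tags)[1] if tags else "N"   # back off to noun for unknowns

print(most_frequent_tag("flies"))  # 'N' (2 noun vs 1 verb occurrence)
print(most_frequent_tag("slept"))  # 'Vi'
```

The probabilistic sequence models announced below (HMMs, CRFs) improve on this baseline precisely by also using the surrounding tags as context.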

Page 13: Part II. Statistical NLP Advanced Artificial Intelligence Introduction and Grammar Models Wolfram Burgard, Luc De Raedt, Bernhard Nebel, Kristian Kersting

Word Counts (Brown corpus)

Word Counts (Brown corpus)

word f r fr word f r frthe 3332 1 3332 turned 51 200 10200and 2972 2 5944 youlsquoll 30 300 9000a 1775 3 5235 name 21 400 8400he 877 10 8770 comes 16 500 8000but 410 20 8400 group 13 600 7800be 294 30 8820 lead 11 700 7700there 222 40 8880 friends 10 800 8000one 172 50 8600 begin 9 900 8100about 158 60 9480 family 8 1000 8000more 138 70 9660 brushed 4 2000 8000never 124 80 9920 sins 2 3000 6000Oh 116 90 10440 Could 2 4000 8000two 104 100 10400 Applausive 1 8000 8000

Zipflsquos Law f~1r (fr = const)

Zipflsquos Law

Minimize effort

Language and sequences

Natural language processingbull Is concerned with the analysis of

sequences of words sentencesbullConstruction of language models

Two types of modelsbullNon-probabilisticbull Probabilistic

Human Language is highly ambiguous at all levels

bull acoustic levelrecognize speech vs wreck a nice beach

bull morphological levelsaw to see (past) saw (noun) to saw (present inf)

bull syntactic levelI saw the man on the hill with a telescope

bull semantic levelOne book has to be read by every student

Key NLP Problem Ambiguity

Language Model

A formal model about language Two types

bull Non-probabilistic Allows one to compute whether a certain sequence

(sentence or part thereof) is possible Often grammar based

bull Probabilistic Allows one to compute the probability of a certain

sequence Often extends grammars with probabilities

Example of bad language model

A bad language model

A bad language model

A good language model

Non-Probabilisticbull ldquoI swear to tell the truthrdquo is possiblebull ldquoI swerve to smell de souprdquo is impossible

Probabilisticbull P(I swear to tell the truth) ~ 0001bull P(I swerve to smell de soup) ~ 0

Why language models

Consider a Shannon Gamebull Predicting the next word in the sequence

Statistical natural language hellip The cat is thrown out of the hellip The large green hellip Sue swallowed the large green hellip hellip

Model at the sentence level

Applications

Spelling correction Mobile phone texting Speech recognition Handwriting recognition Disabled users hellip

Spelling errors

They are leaving in about fifteen minuets to go to her house.

The study was conducted mainly be John Black.
Hopefully all with continue smoothly in my absence.
Can they lave him my messages?
I need to notified the bank of…
He is trying to fine out.

Handwriting recognition

Assume a note is given to a bank teller, which the teller reads as "I have a gub" (cf. Woody Allen).

NLP to the rescue:
• "gub" is not a word
• gun, gum, Gus, and gull are words, but gun has a higher probability in the context of a bank

For Spell Checkers

Collect a list of commonly substituted words:
• piece/peace, whether/weather, their/there, …

Example:
"On Tuesday, the whether …"
"On Tuesday, the weather …"
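Such confusion pairs can be disambiguated by asking which candidate is more likely in context. A hedged sketch using bigram counts from a toy corpus (all data below is illustrative, not from a real spell checker):

```python
# Disambiguate commonly confused words by comparing how often each candidate
# follows the preceding word in a (toy, illustrative) corpus.
from collections import Counter

corpus = ("on tuesday the weather was bad "
          "whether it rains or not "
          "the weather improved on wednesday").split()

bigram_counts = Counter(zip(corpus, corpus[1:]))

def choose(prev_word, candidates):
    """Pick the candidate that most often follows prev_word in the corpus."""
    return max(candidates, key=lambda w: bigram_counts[(prev_word, w)])

# "On Tuesday, the {whether/weather} ..." -> the corpus prefers 'weather'
print(choose("the", ["whether", "weather"]))
```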

Another dimension in language models

Do we mainly want to infer (probabilities of) legal sentences / sequences?
• so far

Or do we want to infer properties of these sentences?
• e.g. parse tree, part-of-speech tagging
• needed for understanding NL

Let's look at some tasks.

Sequence Tagging

Part-of-speech tagging:
• He drives with his bike
• N V PR PN N (noun, verb, preposition, pronoun, noun)

Text extraction:
• The job is that of a programmer.
  X   X  X  X  X  X  JobType
• The seminar is taking place from 15:00 to 16:00.
  X   X  X  X  X  X  Start  End

Sequence Tagging

Predicting the secondary structure of proteins, mRNA, …
• X = AFARLMMA…
• Y = he he st st st he st he …

Parsing

Given a sentence, find its parse tree. An important step in understanding NL.

Parsing

In bioinformatics, parsing allows one to predict (elements of) structure from sequence.

Language models based on Grammars

Grammar types:
• Regular grammars and Finite State Automata
• Context-Free Grammars
• Definite Clause Grammars: a particular type of Unification Based Grammar (Prolog)

Distinguish lexicon from grammar:
• Lexicon (dictionary): contains information about words, e.g. word -> possible tags (and possibly additional information): flies -> V(erb), N(oun)
• Grammar: encodes rules

Grammars and parsing:
• The syntactic level is best understood and formalized.
• Derivation of grammatical structure: parsing (more than just recognition).
• The result of parsing is mostly a parse tree, showing the constituents of a sentence, e.g. verb or noun phrases.
• Syntax is usually specified in terms of a grammar consisting of grammar rules.

Regular Grammars and Finite State Automata

Lexical information: which words are
• Det(erminer)
• N(oun)
• Vi (intransitive verb): no argument
• Pn (pronoun)
• Vt (transitive verb): takes an argument
• Adj (adjective)

Now accept:
• The cat slept
• Det N Vi

As regular grammar:
• S  -> [Det] S1     ([ ] = terminal)
• S1 -> [N] S2
• S2 -> [Vi]

Lexicon:
• the: Det
• cat: N
• slept: Vi
• …
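A regular grammar like the one above corresponds directly to a finite state automaton over tags. A minimal sketch (the extra lexicon entry "dog" is assumed for illustration):

```python
# The regular grammar S -> [Det] S1, S1 -> [N] S2, S2 -> [Vi] as an FSA.
lexicon = {"the": "Det", "cat": "N", "dog": "N", "slept": "Vi"}

# state -> {tag: next_state}; 'accept' is the only final state.
transitions = {
    "S":  {"Det": "S1"},
    "S1": {"N": "S2"},
    "S2": {"Vi": "accept"},
}

def accepts(sentence):
    state = "S"
    for word in sentence.split():
        tag = lexicon.get(word)                    # look the word up
        state = transitions.get(state, {}).get(tag)  # follow the transition
        if state is None:
            return False
    return state == "accept"

print(accepts("the cat slept"))   # True
print(accepts("cat the slept"))   # False
```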

Finite State Automaton

Sentences

• John smiles: Pn Vi
• The cat disappeared: Det N Vi
• These new shoes hurt: Det Adj N Vi
• John liked the old cat: Pn Vt Det Adj N

Phrase structure

[S [NP [D the] [N dog]]
   [VP [V chased]
       [NP [D a] [N cat]]
       [PP [P into] [NP [D the] [N garden]]]]]

"the dog chased a cat into the garden"

Notation

S: sentence; D or Det: determiner (e.g. articles); N: noun; V: verb; P: preposition; NP: noun phrase; VP: verb phrase; PP: prepositional phrase

Context Free Grammar

S  -> NP VP
NP -> D N
VP -> V NP
VP -> V NP PP
PP -> P NP
D  -> [the]
D  -> [a]
N  -> [dog]
N  -> [cat]
N  -> [garden]
V  -> [chased]
V  -> [saw]
P  -> [into]

Terminals ~ Lexicon

Phrase structure

Formalism of context-free grammars:
• Nonterminal symbols: S, NP, VP, …
• Terminal symbols: dog, cat, saw, the, …

Recursion:
• "The girl thought the dog chased the cat"

VP -> V S
N  -> [girl]
V  -> [thought]

Top-down parsing

S -> NP VP
  -> Det N VP
  -> The N VP
  -> The dog VP
  -> The dog V NP
  -> The dog chased NP
  -> The dog chased Det N
  -> The dog chased the N
  -> The dog chased the cat
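The derivation above can be sketched as a small recursive-descent recognizer; the grammar fragment below is trimmed to the rules the derivation uses (a sketch, not the lecture's parser):

```python
# Top-down (recursive-descent) recognizer for a fragment of the CFG above.
grammar = {
    "S":   [["NP", "VP"]],
    "NP":  [["Det", "N"]],
    "VP":  [["V", "NP"]],
    "Det": [["the"]],
    "N":   [["dog"], ["cat"]],
    "V":   [["chased"]],
}

def parse(symbols, words):
    """True if the symbol list can derive exactly the word list."""
    if not symbols:
        return not words                       # success only if input consumed
    first, rest = symbols[0], symbols[1:]
    if first in grammar:                       # non-terminal: try each rule
        return any(parse(body + rest, words) for body in grammar[first])
    # terminal: must match the next input word
    return bool(words) and words[0] == first and parse(rest, words[1:])

print(parse(["S"], "the dog chased the cat".split()))   # True
```

This mirrors top-down derivation exactly: expand the leftmost non-terminal, backtrack when a terminal fails to match.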

Context-free grammar

S   --> NP, VP
NP  --> PN            % proper noun
NP  --> Art, Adj, N
NP  --> Art, N
VP  --> VI            % intransitive verb
VP  --> VT, NP        % transitive verb
Art --> [the]
Adj --> [lazy]
Adj --> [rapid]
PN  --> [achilles]
N   --> [turtle]
VI  --> [sleeps]
VT  --> [beats]

Parse tree

[S [NP [Art the] [Adj rapid] [N turtle]]
   [VP [Vt beats] [NP [PN achilles]]]]

"the rapid turtle beats achilles"

Definite Clause Grammars: non-terminals may have arguments

S             --> NP(N), VP(N)
NP(N)         --> Art(N), N(N)
VP(N)         --> VI(N)
Art(singular) --> [a]
Art(singular) --> [the]
Art(plural)   --> [the]
N(singular)   --> [turtle]
N(plural)     --> [turtles]
VI(singular)  --> [sleeps]
VI(plural)    --> [sleep]

Number agreement
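The number-agreement idea can be sketched in plain Python: each word carries a (category, number) pair, and the rule S -> Art N VI succeeds only if the number features agree. Using None as a wildcard for "the" (which is both singular and plural) is an implementation choice of this sketch:

```python
# Sketch of the number-agreement DCG above: features must unify across
# constituents; None acts as a wildcard ('the' fits both numbers).
lexicon = {
    "a": ("Art", "singular"), "the": ("Art", None),
    "turtle": ("N", "singular"), "turtles": ("N", "plural"),
    "sleeps": ("VI", "singular"), "sleep": ("VI", "plural"),
}

def agree(x, y):
    return x is None or y is None or x == y

def accepts(sentence):
    words = sentence.split()
    if len(words) != 3 or any(w not in lexicon for w in words):
        return False                               # only S -> Art N VI here
    (c1, n1), (c2, n2), (c3, n3) = (lexicon[w] for w in words)
    return (c1, c2, c3) == ("Art", "N", "VI") and agree(n1, n2) and agree(n2, n3)

print(accepts("a turtle sleeps"))   # True
print(accepts("a turtle sleep"))    # False
```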

DCGs

Non-terminals may have arguments:
• Variables (start with a capital), e.g. Number, Any
• Constants (start with lower case), e.g. singular, plural
• Structured terms (start with lower case and take arguments themselves), e.g. vp(V, NP)

Parsing needs to be adapted:
• using unification

Unification in a nutshell (cf AI course)

Substitutions

E.g. {Num / singular}, {T / vp(V, NP)}

Applying a substitution:
• simultaneously replace variables by the corresponding terms
• S(Num) {Num / singular} = S(singular)

Unification

Take two non-terminals with arguments and compute the (most general) substitution that makes them identical, e.g.:

• Art(singular) and Art(Num): gives {Num / singular}
• Art(singular) and Art(plural): fails
• Art(Num1) and Art(Num2): {Num1 / Num2}
• PN(Num, accusative) and PN(singular, Case): {Num / singular, Case / accusative}
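The examples above can be reproduced with a small unification routine for flat argument lists; following the slides' convention, capitalized strings are variables. This handles only flat terms, not nested structures:

```python
# Unification sketch for flat feature terms such as Art(singular) or
# PN(Num, accusative); variables are capitalized strings, as in the slides.
def is_var(t):
    return isinstance(t, str) and t[:1].isupper()

def unify(args1, args2):
    """Return a substitution dict making the argument lists equal, or None."""
    if len(args1) != len(args2):
        return None
    subst = {}
    for a, b in zip(args1, args2):
        a, b = subst.get(a, a), subst.get(b, b)    # apply bindings so far
        if a == b:
            continue
        if is_var(a):
            subst[a] = b
        elif is_var(b):
            subst[b] = a
        else:
            return None                             # constant clash: fail
    return subst

print(unify(["singular"], ["Num"]))     # {'Num': 'singular'}
print(unify(["singular"], ["plural"]))  # None
```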

Parsing with DCGs

Now require successful unification at each step

S -> NP(N) VP(N)
  -> Art(N) N(N) VP(N)          {N / singular}
  -> a N(singular) VP(singular)
  -> a turtle VP(singular)
  -> a turtle sleeps

S -> a turtle sleep: fails

Case Marking

PN(singular, nominative) --> [he]; [she]
PN(singular, accusative) --> [him]; [her]
PN(plural, nominative)   --> [they]
PN(plural, accusative)   --> [them]

S --> NP(Number, nominative), VP(Number)
VP(Number) --> V(Number), NP(Any, accusative)
NP(Number, Case) --> PN(Number, Case)
NP(Number, Any)  --> Det, N(Number)

He sees her. She sees him. They see her.
But not: Them see he.

DCGs

Are strictly more expressive than CFGs. They can represent, for instance, the language a^n b^n c^n:

• S(N)    --> A(N), B(N), C(N)
• A(0)    --> []
• B(0)    --> []
• C(0)    --> []
• A(s(N)) --> A(N), [a]
• B(s(N)) --> B(N), [b]
• C(s(N)) --> C(N), [c]
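The rules above generate a^n b^n c^n, a language no context-free grammar can generate. A direct recognizer for the same language is a few lines:

```python
# Recognizer for the non-context-free language a^n b^n c^n (n >= 0),
# the language generated by the DCG above.
def accepts(s):
    n = len(s) // 3
    return len(s) == 3 * n and s == "a" * n + "b" * n + "c" * n

print(accepts("aabbcc"))   # True
print(accepts("aabbc"))    # False
```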

Probabilistic Models

Traditional grammar models are very rigid:
• essentially a yes/no decision

Probabilistic grammars:
• define a probability model for the data
• compute the probability of each alternative
• choose the most likely alternative

Illustrated on:
• the Shannon game
• spelling correction
• parsing

Illustration

Wall Street Journal Corpus: 3,000,000 words, with the correct parse tree for each sentence known:
• constructed by hand
• can be used to derive stochastic context-free grammars
• SCFGs assign a probability to each parse tree

Compute the most probable parse tree.
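In an SCFG, each rule carries a probability and a parse tree's probability is the product of the probabilities of the rules it uses. A hedged sketch with made-up rule probabilities (lexical rule probabilities are ignored here for simplicity):

```python
# SCFG sketch: a parse tree's probability is the product of its rule
# probabilities. All numbers below are illustrative, not estimated from data.
rule_prob = {
    ("S",  ("NP", "VP")): 1.0,
    ("NP", ("Det", "N")): 0.6,
    ("NP", ("PN",)):      0.4,
    ("VP", ("Vt", "NP")): 1.0,
}

def tree_prob(tree):
    """tree = (label, [children]) for rules, or (label, word) for leaves."""
    label, children = tree
    if isinstance(children, str):      # lexical leaf: word probs ignored here
        return 1.0
    body = tuple(child[0] for child in children)
    p = rule_prob[(label, body)]
    for child in children:
        p *= tree_prob(child)
    return p

t = ("S", [("NP", [("PN", "achilles")]),
           ("VP", [("Vt", "beats"), ("NP", [("Det", "the"), ("N", "turtle")])])])
print(tree_prob(t))   # 1.0 * 0.4 * 1.0 * 0.6 = approx 0.24
```

Parsing with an SCFG then means searching for the tree that maximizes this product.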

Sequences are omni-present

Therefore the techniques we will see also apply to:
• Bioinformatics: DNA, proteins, mRNA, … can all be represented as strings
• Robotics: sequences of actions, states, …
• …

Rest of the Course

Limitations of traditional grammar models motivate probabilistic extensions:
• Regular grammars and Finite State Automata:
  - Markov Models using n-grams
  - (Hidden) Markov Models
  - Conditional Random Fields (as an example of using undirected graphical models)
• Probabilistic Context-Free Grammars
• Probabilistic Definite Clause Grammars

All use the principles of Part I on Graphical Models.

  • Advanced Artificial Intelligence
  • Topic
  • Contents
  • Rationalism versus Empiricism
  • Slide 5
  • This course
  • Ambiguity
  • NLP and Statistics
  • Slide 9
  • Corpora
  • Word Counts
  • Slide 12
  • Word Counts (Brown corpus)
  • Slide 14
  • Zipf's Law
  • Language and sequences
  • Key NLP Problem: Ambiguity
  • Language Model
  • Example of bad language model
  • A bad language model
  • Slide 22
  • A good language model
  • Why language models
  • Applications
  • Spelling errors
  • Handwriting recognition
  • For Spell Checkers
  • Another dimension in language models
  • Sequence Tagging
  • Slide 31
  • Parsing
  • Slide 33
  • Language models based on Grammars
  • Grammars and parsing
  • Regular Grammars and Finite State Automata
  • Finite State Automaton
  • Phrase structure
  • Notation
  • Context Free Grammar
  • Slide 41
  • Top-down parsing
  • Context-free grammar
  • Parse tree
  • Definite Clause Grammars Non-terminals may have arguments
  • DCGs
  • Unification in a nutshell (cf AI course)
  • Unification
  • Parsing with DCGs
  • Case Marking
  • Slide 51
  • Probabilistic Models
  • Illustration
  • PowerPoint Presentation
  • Sequences are omni-present
  • Rest of the Course
Page 14: Part II. Statistical NLP Advanced Artificial Intelligence Introduction and Grammar Models Wolfram Burgard, Luc De Raedt, Bernhard Nebel, Kristian Kersting

Word Counts (Brown corpus)

word f r fr word f r frthe 3332 1 3332 turned 51 200 10200and 2972 2 5944 youlsquoll 30 300 9000a 1775 3 5235 name 21 400 8400he 877 10 8770 comes 16 500 8000but 410 20 8400 group 13 600 7800be 294 30 8820 lead 11 700 7700there 222 40 8880 friends 10 800 8000one 172 50 8600 begin 9 900 8100about 158 60 9480 family 8 1000 8000more 138 70 9660 brushed 4 2000 8000never 124 80 9920 sins 2 3000 6000Oh 116 90 10440 Could 2 4000 8000two 104 100 10400 Applausive 1 8000 8000

Zipflsquos Law f~1r (fr = const)

Zipflsquos Law

Minimize effort

Language and sequences

Natural language processingbull Is concerned with the analysis of

sequences of words sentencesbullConstruction of language models

Two types of modelsbullNon-probabilisticbull Probabilistic

Human Language is highly ambiguous at all levels

bull acoustic levelrecognize speech vs wreck a nice beach

bull morphological levelsaw to see (past) saw (noun) to saw (present inf)

bull syntactic levelI saw the man on the hill with a telescope

bull semantic levelOne book has to be read by every student

Key NLP Problem Ambiguity

Language Model

A formal model about language Two types

bull Non-probabilistic Allows one to compute whether a certain sequence

(sentence or part thereof) is possible Often grammar based

bull Probabilistic Allows one to compute the probability of a certain

sequence Often extends grammars with probabilities

Example of bad language model

A bad language model

A bad language model

A good language model

Non-Probabilisticbull ldquoI swear to tell the truthrdquo is possiblebull ldquoI swerve to smell de souprdquo is impossible

Probabilisticbull P(I swear to tell the truth) ~ 0001bull P(I swerve to smell de soup) ~ 0

Why language models

Consider a Shannon Gamebull Predicting the next word in the sequence

Statistical natural language hellip The cat is thrown out of the hellip The large green hellip Sue swallowed the large green hellip hellip

Model at the sentence level

Applications

Spelling correction Mobile phone texting Speech recognition Handwriting recognition Disabled users hellip

Spelling errors

They are leaving in about fifteen minuets to go to her house

The study was conducted mainly be John Black Hopefully all with continue smoothly in my absence Can they lave him my messages I need to notified the bank ofhellip He is trying to fine out

Handwriting recognition

Assume a note is given to a bank teller which the teller reads as I have a gub (cf Woody Allen)

NLP to the rescue hellipbull gub is not a wordbull gun gum Gus and gull are words but gun has a

higher probability in the context of a bank

For Spell Checkers

Collect list of commonly substituted wordsbull piecepeace whetherweather theirthere

ExampleldquoOn Tuesday the whether helliprsquorsquoldquoOn Tuesday the weather helliprdquo

Another dimension in language models

Do we mainly want to infer (probabilities) of legal sentences sequences bull So far

Or do we want to infer properties of these sentences bull Eg parse tree part-of-speech-taggingbullNeeded for understanding NL

Letrsquos look at some tasks

Sequence Tagging

Part-of-speech taggingbull He drives with his bikebull N V PR PN N noun verb preposition pronoun noun

Text extractionbull The job is that of a programmerbull X X X X X X JobTypebull The seminar is taking place from 1500 to 1600bull X X X X X X Start End

Sequence Tagging

Predicting the secondary structure of proteins mRNA hellipbull X = AFARLMMAhellipbull Y = hehestststhesthe hellip

Parsing

Given a sentence find its parse tree Important step in understanding NL

Parsing

In bioinformatics allows to predict (elements of) structure from sequence

Language models based on Grammars

Grammar Typesbull Regular grammars and Finite State Automatabull Context-Free Grammarsbull Definite Clause Grammars

A particular type of Unification Based Grammar (Prolog)

Distinguish lexicon from grammarbull Lexicon (dictionary) contains information about

words eg word - possible tags (and possibly additional information) flies - V(erb) - N(oun)

bull Grammar encode rules

Grammars and parsing Syntactic level best understood and formalized Derivation of grammatical structure parsing

(more than just recognition) Result of parsing mostly parse tree

showing the constituents of a sentence eg verb or noun phrases

Syntax usually specified in terms of a grammar consisting of grammar rules

Regular Grammars and Finite State Automata

Lexical information - which words are bull Det(erminer)bull N(oun)bull Vi (intransitive verb) - no

argumentbull Pn (pronoun) bull Vt (transitive verb) - takes an

argumentbull Adj (adjective)

Now acceptbull The cat sleptbull Det N Vi

As regular grammarbull S -gt [Det] S1 [ ] terminal bull S1 -gt [N] S2bull S2 -gt [Vi]

Lexicon bull The - Detbull Cat - Nbull Slept - Vi

bull hellip

Finite State Automaton

Sentences

bull John smiles - Pn Vibull The cat disappeared - Det N Vibull These new shoes hurt - Det Adj N Vibull John liked the old cat PN Vt Det Adj N

Phrase structure

S

NP

D N

VP

NPV

D N

PP

P NP

D N

the dog chased a cat into the garden

Notation

S sentence D or Det Determiner (eg articles) N noun V verb P preposition NP noun phrase VP verb phrase PP prepositional phrase

Context Free GrammarS -gt NP VPNP -gt D NVP -gt V NPVP -gt V NP PPPP -gt P NPD -gt [the]D -gt [a]N -gt [dog]N -gt [cat]N -gt [garden]V -gt [chased]V -gt [saw]P -gt [into]

Terminals ~ Lexicon

Phrase structure

Formalism of context-free grammarsbull Nonterminal symbols S NP VP bull Terminal symbols dog cat saw the

Recursionbull bdquoThe girl thought the dog chased the catldquo

VP -gt V SN -gt [girl]V -gt [thought]

Top-down parsing

S -gt NP VP S -gt Det N VP S -gt The N VP S -gt The dog VP S -gt The dog V NP S -gt The dog chased NP S -gt The dog chased Det N S-gt The dog chased the N S-gt The dog chased the cat

Context-free grammarSS --gt --gt NPNPVPVP

NPNP --gt PN --gt PN Proper nounProper noun

NPNP --gt Art Adj N--gt Art Adj N

NPNP --gt ArtN--gt ArtN

VPVP --gt VI --gt VI intransitive verbintransitive verb

VPVP --gt VT --gt VT NPNP transitive verbtransitive verb

ArtArt --gt [the]--gt [the]

AdjAdj --gt [lazy]--gt [lazy]

AdjAdj --gt [rapid]--gt [rapid]

PNPN --gt [achilles]--gt [achilles]

NN --gt [turtle]--gt [turtle]

VIVI --gt [sleeps]--gt [sleeps]

VTVT --gt [beats]--gt [beats]

Parse tree

SS

NPNP VPVP

ArtArt AdjAdj NN VtVt NPNP

PNPN

achillesachillesbeatsbeatsturtleturtlerapidrapidthethe

Definite Clause GrammarsNon-terminals may have arguments

SS --gt --gt NPNP((NN))VPVP((NN))

NP(NP(NN)) --gt Art(--gt Art(NN)N()N(NN))

VP(VP(NN)) --gt VI(--gt VI(NN))

Art(Art(singularsingular)) --gt [a]--gt [a]

Art(Art(singularsingular)) --gt [the]--gt [the]

Art(Art(pluralplural)) --gt [the]--gt [the]

N(N(singularsingular)) --gt [turtle]--gt [turtle]

N(N(pluralplural)) --gt [turtles]--gt [turtles]

VI(VI(singularsingular)) --gt [sleeps]--gt [sleeps]

VI(VI(pluralplural)) --gt [sleep]--gt [sleep]

Number Agreement

DCGs

Non-terminals may have argumentsbull Variables (start with capital)

Eg Number Any

bull Constants (start with lower case) Eg singular plural

bull Structured terms (start with lower case and take arguments themselves) Eg vp(VNP)

Parsing needs to be adapted bull Using unification

Unification in a nutshell (cf AI course)

Substitutions

Eg Num singular T vp(VNP)

Applying substitution bull Simultaneously replace variables by

corresponding termsbull S(Num) Num singular = S(singular)

Unification

Take two non-terminals with arguments and compute (most general) substitution that makes them identical egbull Art(singular) and Art(Num)

Gives Num singular

bull Art(singular) and Art(plural) Fails

bull Art(Num1) and Art(Num2) Num1 Num2

bull PN(Num accusative) and PN(singular Case) Numsingular Caseaccusative

Parsing with DCGs

Now require successful unification at each step

S -gt NP(N) VP(N) S -gt Art(N) N(N) VP(N) Nsingular S -gt a N(singular) VP(singular) S -gt a turtle VP(singular) S -gt a turtle sleeps

S-gt a turtle sleep fails

Case Marking

PN(singularnominative)PN(singularnominative) --gt --gt [he][she][he][she]

PN(singularaccusative)PN(singularaccusative) --gt --gt [him][her][him][her]

PN(pluralnominative)PN(pluralnominative) --gt --gt [they][they]

PN(pluralaccusative)PN(pluralaccusative) --gt --gt [them][them]

S --gt NP(Numbernominative) NP(Number)S --gt NP(Numbernominative) NP(Number)

VP(Number) --gt V(Number) VP(Anyaccusative)VP(Number) --gt V(Number) VP(Anyaccusative)

VP(NumberCase) --gt PN(NumberCase)VP(NumberCase) --gt PN(NumberCase)

VP(NumberAny) --gt Det N(Number)VP(NumberAny) --gt Det N(Number)

He sees her She sees him They see her

But not Them see he

DCGs

Are strictly more expressive than CFGs Can represent for instance

bull S(N) -gt A(N) B(N) C(N)bull A(0) -gt [] bull B(0) -gt []bull C(0) -gt []bull A(s(N)) -gt A(N) [A]bull B(s(N)) -gt B(N) [B]bull C(s(N)) -gt C(N) [C]

Probabilistic Models

Traditional grammar models are very rigid bull essentially a yes no decision

Probabilistic grammarsbull Define a probability models for the databull Compute the probability of each alternativebull Choose the most likely alternative

Ilustrate on bull Shannon Gamebull Spelling correctionbull Parsing

Illustration

Wall Street Journal Corpus 3 000 000 words Correct parse tree for sentences known

bull Constructed by handbull Can be used to derive stochastic context free

grammarsbull SCFG assign probability to parse trees

Compute the most probable parse tree

Sequences are omni-present

Therefore the techniques we will see also apply tobull Bioinformatics

DNA proteins mRNA hellip can all be represented as strings

bullRobotics Sequences of actions states hellip

bullhellip

Rest of the Course

Limitations traditional grammar models motivate probabilistic extensions bull Regular grammars and Finite State Automata

All use principles of Part I on Graphical Models Markov Models using n-gramms (Hidden) Markov Models Conditional Random Fields

bull As an example of using undirected graphical models

bull Probabilistic Context Free Grammarsbull Probabilistic Definite Clause Grammars

  • Advanced Artificial Intelligence
  • Topic
  • Contents
  • Rationalism versus Empiricism
  • Slide 5
  • This course
  • Ambiguity
  • NLP and Statistics
  • Slide 9
  • Corpora
  • Word Counts
  • Slide 12
  • Word Counts (Brown corpus)
  • Slide 14
  • Zipflsquos Law
  • Language and sequences
  • Key NLP Problem Ambiguity
  • Language Model
  • Example of bad language model
  • A bad language model
  • Slide 22
  • A good language model
  • Why language models
  • Applications
  • Spelling errors
  • Handwriting recognition
  • For Spell Checkers
  • Another dimension in language models
  • Sequence Tagging
  • Slide 31
  • Parsing
  • Slide 33
  • Language models based on Grammars
  • Grammars and parsing
  • Regular Grammars and Finite State Automata
  • Finite State Automaton
  • Phrase structure
  • Notation
  • Context Free Grammar
  • Slide 41
  • Top-down parsing
  • Context-free grammar
  • Parse tree
  • Definite Clause Grammars Non-terminals may have arguments
  • DCGs
  • Unification in a nutshell (cf AI course)
  • Unification
  • Parsing with DCGs
  • Case Marking
  • Slide 51
  • Probabilistic Models
  • Illustration
  • PowerPoint Presentation
  • Sequences are omni-present
  • Rest of the Course
Page 15: Part II. Statistical NLP Advanced Artificial Intelligence Introduction and Grammar Models Wolfram Burgard, Luc De Raedt, Bernhard Nebel, Kristian Kersting

word f r fr word f r frthe 3332 1 3332 turned 51 200 10200and 2972 2 5944 youlsquoll 30 300 9000a 1775 3 5235 name 21 400 8400he 877 10 8770 comes 16 500 8000but 410 20 8400 group 13 600 7800be 294 30 8820 lead 11 700 7700there 222 40 8880 friends 10 800 8000one 172 50 8600 begin 9 900 8100about 158 60 9480 family 8 1000 8000more 138 70 9660 brushed 4 2000 8000never 124 80 9920 sins 2 3000 6000Oh 116 90 10440 Could 2 4000 8000two 104 100 10400 Applausive 1 8000 8000

Zipflsquos Law f~1r (fr = const)

Zipflsquos Law

Minimize effort

Language and sequences

Natural language processingbull Is concerned with the analysis of

sequences of words sentencesbullConstruction of language models

Two types of modelsbullNon-probabilisticbull Probabilistic

Human Language is highly ambiguous at all levels

bull acoustic levelrecognize speech vs wreck a nice beach

bull morphological levelsaw to see (past) saw (noun) to saw (present inf)

bull syntactic levelI saw the man on the hill with a telescope

bull semantic levelOne book has to be read by every student

Key NLP Problem Ambiguity

Language Model

A formal model about language Two types

bull Non-probabilistic Allows one to compute whether a certain sequence

(sentence or part thereof) is possible Often grammar based

bull Probabilistic Allows one to compute the probability of a certain

sequence Often extends grammars with probabilities

Example of bad language model

A bad language model

A bad language model

A good language model

Non-Probabilisticbull ldquoI swear to tell the truthrdquo is possiblebull ldquoI swerve to smell de souprdquo is impossible

Probabilisticbull P(I swear to tell the truth) ~ 0001bull P(I swerve to smell de soup) ~ 0

Why language models

Consider a Shannon Gamebull Predicting the next word in the sequence

Statistical natural language hellip The cat is thrown out of the hellip The large green hellip Sue swallowed the large green hellip hellip

Model at the sentence level

Applications

Spelling correction Mobile phone texting Speech recognition Handwriting recognition Disabled users hellip

Spelling errors

They are leaving in about fifteen minuets to go to her house

The study was conducted mainly be John Black Hopefully all with continue smoothly in my absence Can they lave him my messages I need to notified the bank ofhellip He is trying to fine out

Handwriting recognition

Assume a note is given to a bank teller which the teller reads as I have a gub (cf Woody Allen)

NLP to the rescue hellipbull gub is not a wordbull gun gum Gus and gull are words but gun has a

higher probability in the context of a bank

For Spell Checkers

Collect list of commonly substituted wordsbull piecepeace whetherweather theirthere

ExampleldquoOn Tuesday the whether helliprsquorsquoldquoOn Tuesday the weather helliprdquo

Another dimension in language models

Do we mainly want to infer (probabilities) of legal sentences sequences bull So far

Or do we want to infer properties of these sentences bull Eg parse tree part-of-speech-taggingbullNeeded for understanding NL

Letrsquos look at some tasks

Sequence Tagging

Part-of-speech taggingbull He drives with his bikebull N V PR PN N noun verb preposition pronoun noun

Text extractionbull The job is that of a programmerbull X X X X X X JobTypebull The seminar is taking place from 1500 to 1600bull X X X X X X Start End

Sequence Tagging

Predicting the secondary structure of proteins mRNA hellipbull X = AFARLMMAhellipbull Y = hehestststhesthe hellip

Parsing

Given a sentence find its parse tree Important step in understanding NL

Parsing

In bioinformatics allows to predict (elements of) structure from sequence

Language models based on Grammars

Grammar Typesbull Regular grammars and Finite State Automatabull Context-Free Grammarsbull Definite Clause Grammars

A particular type of Unification Based Grammar (Prolog)

Distinguish lexicon from grammarbull Lexicon (dictionary) contains information about

words eg word - possible tags (and possibly additional information) flies - V(erb) - N(oun)

bull Grammar encode rules

Grammars and parsing Syntactic level best understood and formalized Derivation of grammatical structure parsing

(more than just recognition) Result of parsing mostly parse tree

showing the constituents of a sentence eg verb or noun phrases

Syntax usually specified in terms of a grammar consisting of grammar rules

Regular Grammars and Finite State Automata

Lexical information - which words are bull Det(erminer)bull N(oun)bull Vi (intransitive verb) - no

argumentbull Pn (pronoun) bull Vt (transitive verb) - takes an

argumentbull Adj (adjective)

Now acceptbull The cat sleptbull Det N Vi

As regular grammarbull S -gt [Det] S1 [ ] terminal bull S1 -gt [N] S2bull S2 -gt [Vi]

Lexicon bull The - Detbull Cat - Nbull Slept - Vi

bull hellip

Finite State Automaton

Sentences

bull John smiles - Pn Vibull The cat disappeared - Det N Vibull These new shoes hurt - Det Adj N Vibull John liked the old cat PN Vt Det Adj N

Phrase structure

S

NP

D N

VP

NPV

D N

PP

P NP

D N

the dog chased a cat into the garden

Notation

S sentence D or Det Determiner (eg articles) N noun V verb P preposition NP noun phrase VP verb phrase PP prepositional phrase

Context Free GrammarS -gt NP VPNP -gt D NVP -gt V NPVP -gt V NP PPPP -gt P NPD -gt [the]D -gt [a]N -gt [dog]N -gt [cat]N -gt [garden]V -gt [chased]V -gt [saw]P -gt [into]

Terminals ~ Lexicon

Phrase structure

Formalism of context-free grammarsbull Nonterminal symbols S NP VP bull Terminal symbols dog cat saw the

Recursionbull bdquoThe girl thought the dog chased the catldquo

VP -gt V SN -gt [girl]V -gt [thought]

Top-down parsing

S -gt NP VP S -gt Det N VP S -gt The N VP S -gt The dog VP S -gt The dog V NP S -gt The dog chased NP S -gt The dog chased Det N S-gt The dog chased the N S-gt The dog chased the cat

Context-free grammarSS --gt --gt NPNPVPVP

NPNP --gt PN --gt PN Proper nounProper noun

NPNP --gt Art Adj N--gt Art Adj N

NPNP --gt ArtN--gt ArtN

VPVP --gt VI --gt VI intransitive verbintransitive verb

VPVP --gt VT --gt VT NPNP transitive verbtransitive verb

ArtArt --gt [the]--gt [the]

AdjAdj --gt [lazy]--gt [lazy]

AdjAdj --gt [rapid]--gt [rapid]

PNPN --gt [achilles]--gt [achilles]

NN --gt [turtle]--gt [turtle]

VIVI --gt [sleeps]--gt [sleeps]

VTVT --gt [beats]--gt [beats]

Parse tree

SS

NPNP VPVP

ArtArt AdjAdj NN VtVt NPNP

PNPN

achillesachillesbeatsbeatsturtleturtlerapidrapidthethe

Definite Clause GrammarsNon-terminals may have arguments

SS --gt --gt NPNP((NN))VPVP((NN))

NP(NP(NN)) --gt Art(--gt Art(NN)N()N(NN))

VP(VP(NN)) --gt VI(--gt VI(NN))

Art(Art(singularsingular)) --gt [a]--gt [a]

Art(Art(singularsingular)) --gt [the]--gt [the]

Art(Art(pluralplural)) --gt [the]--gt [the]

N(N(singularsingular)) --gt [turtle]--gt [turtle]

N(N(pluralplural)) --gt [turtles]--gt [turtles]

VI(VI(singularsingular)) --gt [sleeps]--gt [sleeps]

VI(VI(pluralplural)) --gt [sleep]--gt [sleep]

Number Agreement

DCGs

Non-terminals may have argumentsbull Variables (start with capital)

Eg Number Any

bull Constants (start with lower case) Eg singular plural

bull Structured terms (start with lower case and take arguments themselves) Eg vp(VNP)

Parsing needs to be adapted bull Using unification

Unification in a nutshell (cf AI course)

Substitutions

Eg Num singular T vp(VNP)

Applying substitution bull Simultaneously replace variables by

corresponding termsbull S(Num) Num singular = S(singular)

Unification

Take two non-terminals with arguments and compute (most general) substitution that makes them identical egbull Art(singular) and Art(Num)

Gives Num singular

bull Art(singular) and Art(plural) Fails

bull Art(Num1) and Art(Num2) Num1 Num2

bull PN(Num accusative) and PN(singular Case) Numsingular Caseaccusative

Parsing with DCGs

Now require successful unification at each step

S -gt NP(N) VP(N) S -gt Art(N) N(N) VP(N) Nsingular S -gt a N(singular) VP(singular) S -gt a turtle VP(singular) S -gt a turtle sleeps

S-gt a turtle sleep fails

Case Marking

PN(singularnominative)PN(singularnominative) --gt --gt [he][she][he][she]

PN(singularaccusative)PN(singularaccusative) --gt --gt [him][her][him][her]

PN(pluralnominative)PN(pluralnominative) --gt --gt [they][they]

PN(pluralaccusative)PN(pluralaccusative) --gt --gt [them][them]

S --gt NP(Numbernominative) NP(Number)S --gt NP(Numbernominative) NP(Number)

VP(Number) --gt V(Number) VP(Anyaccusative)VP(Number) --gt V(Number) VP(Anyaccusative)

VP(NumberCase) --gt PN(NumberCase)VP(NumberCase) --gt PN(NumberCase)

VP(NumberAny) --gt Det N(Number)VP(NumberAny) --gt Det N(Number)

He sees her She sees him They see her

But not Them see he

DCGs

Are strictly more expressive than CFGs Can represent for instance

Language and sequences

Natural language processing:
• is concerned with the analysis of sequences of words / sentences
• constructs language models

Two types of models:
• non-probabilistic
• probabilistic

Key NLP Problem: Ambiguity

Human language is highly ambiguous at all levels:
• acoustic level: "recognize speech" vs. "wreck a nice beach"
• morphological level: "saw": to see (past), saw (noun), to saw (present, inf)
• syntactic level: "I saw the man on the hill with a telescope"
• semantic level: "One book has to be read by every student"

Language Model

A formal model of language. Two types:
• Non-probabilistic: allows one to compute whether a certain sequence (sentence or part thereof) is possible; often grammar based
• Probabilistic: allows one to compute the probability of a certain sequence; often extends grammars with probabilities

Example of a bad language model (image-only slides omitted)

A good language model

Non-probabilistic:
• "I swear to tell the truth" is possible
• "I swerve to smell de soup" is impossible

Probabilistic:
• P(I swear to tell the truth) ≈ 0.0001
• P(I swerve to smell de soup) ≈ 0

Why language models?

Consider a Shannon Game:
• predicting the next word in a sequence:
  Statistical natural language …
  The cat is thrown out of the …
  The large green …
  Sue swallowed the large green …
  …

Model at the sentence level.
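The Shannon Game can be sketched with bigram counts: predict the next word as the most frequent successor seen in a corpus. A minimal sketch, assuming a toy corpus (a real predictor would be trained on millions of words):

```python
from collections import Counter, defaultdict

# Toy corpus standing in for a large training corpus (illustrative only).
corpus = ("the cat is thrown out of the house "
          "the cat is on the mat the large green tree").split()

# For each word, count how often each successor follows it.
successors = defaultdict(Counter)
for w1, w2 in zip(corpus, corpus[1:]):
    successors[w1][w2] += 1

def predict_next(word):
    """Most likely next word after `word`, or None if the word is unseen."""
    counts = successors.get(word)
    return counts.most_common(1)[0][0] if counts else None

print(predict_next("the"))    # 'cat' (follows "the" twice in the toy corpus)
print(predict_next("large"))  # 'green'
```

This is a Markov model over word pairs — exactly the kind of model the "Rest of the Course" section develops further.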

Applications

• spelling correction
• mobile phone texting
• speech recognition
• handwriting recognition
• disabled users
• …

Spelling errors

They are leaving in about fifteen minuets to go to her house.
The study was conducted mainly be John Black.
Hopefully, all with continue smoothly in my absence.
Can they lave him my messages?
I need to notified the bank of …
He is trying to fine out.

(Each sentence contains a real-word spelling error that a dictionary lookup alone cannot catch.)

Handwriting recognition

Assume a note is given to a bank teller, which the teller reads as "I have a gub." (cf. Woody Allen)

NLP to the rescue:
• "gub" is not a word
• "gun", "gum", "Gus", and "gull" are words, but "gun" has a higher probability in the context of a bank

For Spell Checkers

Collect a list of commonly substituted words:
• piece/peace, whether/weather, their/there, …

Example:
"On Tuesday, the whether …"
"On Tuesday, the weather …"
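The whether/weather choice can be made with the same kind of counts: pick the confusion-set member that most often follows the preceding word. A sketch over an invented toy corpus (in practice the counts come from millions of words):

```python
from collections import Counter

# Tiny stand-in corpus; illustrative only.
corpus = ("on tuesday the weather was bad "
          "i wonder whether he will come "
          "the weather forecast says rain").split()

bigram_counts = Counter(zip(corpus, corpus[1:]))

def choose(prev_word, confusion_set):
    """Pick the candidate that most often follows prev_word in the corpus."""
    return max(confusion_set, key=lambda w: bigram_counts[(prev_word, w)])

print(choose("the", ["whether", "weather"]))  # 'weather'
```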

Another dimension in language models

Do we mainly want to infer (probabilities of) legal sentences / sequences?
• so far
Or do we want to infer properties of these sentences?
• e.g. parse trees, part-of-speech tagging
• needed for understanding NL

Let's look at some tasks.

Sequence Tagging

Part-of-speech tagging:
• He drives with his bike
• N V PR PN N (noun, verb, preposition, pronoun, noun)

Text extraction:
• The job is that of a programmer → JobType
• The seminar is taking place from 15.00 to 16.00 → Start, End
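The Start/End slot-filling example can be sketched with a simple pattern match. This hand-written regex is a stand-in for a learned tagger, and the HH.MM time format is an assumption:

```python
import re

# Extract the tokens that fill the Start and End slots (assumes times like 15.00).
text = "The seminar is taking place from 15.00 to 16.00"
match = re.search(r"from (\d{1,2}\.\d{2}) to (\d{1,2}\.\d{2})", text)
start, end = match.groups()
print(start, end)  # 15.00 16.00
```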

Sequence Tagging

Predicting the secondary structure of proteins, mRNA, …
• X = AFARLMMA…
• Y = hehestststhesthe…

Parsing

Given a sentence, find its parse tree — an important step in understanding NL.

Parsing

In bioinformatics, parsing allows one to predict (elements of) structure from sequence.

Language models based on Grammars

Grammar types:
• regular grammars and finite state automata
• context-free grammars
• definite clause grammars — a particular type of unification-based grammar (Prolog)

Distinguish the lexicon from the grammar:
• the lexicon (dictionary) contains information about words, e.g. word → possible tags (and possibly additional information): flies → V(erb), N(oun)
• the grammar encodes the rules

Grammars and parsing

The syntactic level is the best understood and formalized. Parsing derives the grammatical structure of a sentence (more than just recognition); its result is mostly a parse tree showing the constituents of a sentence, e.g. verb or noun phrases. Syntax is usually specified in terms of a grammar consisting of grammar rules.

Regular Grammars and Finite State Automata

Lexical information — word classes:
• Det(erminer)
• N(oun)
• Pn (pronoun)
• Adj (adjective)
• Vi (intransitive verb) — takes no argument
• Vt (transitive verb) — takes an argument

Now accept:
• The cat slept
• Det N Vi

As a regular grammar ([ ] marks terminals):
• S -> [Det], S1
• S1 -> [N], S2
• S2 -> [Vi]

Lexicon:
• the → Det
• cat → N
• slept → Vi
• …

Finite State Automaton

Sentences:
• John smiles — Pn Vi
• The cat disappeared — Det N Vi
• These new shoes hurt — Det Adj N Vi
• John liked the old cat — Pn Vt Det Adj N
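The regular grammar above corresponds to a finite state automaton. A minimal sketch, assuming the Pn and Adj transitions needed for the example sentences (only the intransitive fragment is handled):

```python
# Word -> tag lexicon (toy; extended with the example words).
lexicon = {"the": "Det", "these": "Det", "cat": "N", "shoes": "N",
           "new": "Adj", "john": "Pn", "slept": "Vi", "hurt": "Vi",
           "smiles": "Vi", "disappeared": "Vi"}

# state -> {tag: next_state}; ending in ACCEPT means the sentence is accepted.
transitions = {
    "S":  {"Det": "S1", "Pn": "S2"},
    "S1": {"Adj": "S1", "N": "S2"},
    "S2": {"Vi": "ACCEPT"},
}

def accepts(sentence):
    state = "S"
    for word in sentence.lower().split():
        state = transitions.get(state, {}).get(lexicon.get(word))
        if state is None:  # no transition for this word's tag
            return False
    return state == "ACCEPT"

print(accepts("The cat slept"))         # True
print(accepts("These new shoes hurt"))  # True
print(accepts("John smiles"))           # True
print(accepts("Slept the cat"))         # False
```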

Phrase structure

A parse tree for "the dog chased a cat into the garden":

[S [NP [D the] [N dog]]
   [VP [V chased]
       [NP [D a] [N cat]]
       [PP [P into] [NP [D the] [N garden]]]]]

Notation

S: sentence; D or Det: determiner (e.g. articles); N: noun; V: verb; P: preposition; NP: noun phrase; VP: verb phrase; PP: prepositional phrase

Context Free Grammar

S -> NP VP
NP -> D N
VP -> V NP
VP -> V NP PP
PP -> P NP
D -> [the]
D -> [a]
N -> [dog]
N -> [cat]
N -> [garden]
V -> [chased]
V -> [saw]
P -> [into]

Terminals ~ lexicon

Phrase structure

Formalism of context-free grammars:
• nonterminal symbols: S, NP, VP, …
• terminal symbols: dog, cat, saw, the, …

Recursion: "The girl thought the dog chased the cat"
VP -> V S
N -> [girl]
V -> [thought]

Top-down parsing

S -> NP VP
  -> Det N VP
  -> The N VP
  -> The dog VP
  -> The dog V NP
  -> The dog chased NP
  -> The dog chased Det N
  -> The dog chased the N
  -> The dog chased the cat
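The top-down derivation above can be sketched as a recursive-descent recognizer over the toy CFG: always expand the leftmost symbol, backtracking over rule alternatives. A minimal sketch (grammar as on the Context Free Grammar slide):

```python
# Toy CFG: nonterminal -> list of right-hand sides.
grammar = {
    "S":  [["NP", "VP"]],
    "NP": [["D", "N"]],
    "VP": [["V", "NP", "PP"], ["V", "NP"]],
    "PP": [["P", "NP"]],
    "D":  [["the"], ["a"]],
    "N":  [["dog"], ["cat"], ["garden"]],
    "V":  [["chased"], ["saw"]],
    "P":  [["into"]],
}

def parse(symbols, words):
    """Try to expand `symbols` to exactly `words`; True if some derivation works."""
    if not symbols:
        return not words
    first, rest = symbols[0], symbols[1:]
    if first not in grammar:  # terminal: must match the next input word
        return bool(words) and words[0] == first and parse(rest, words[1:])
    # Nonterminal: try each alternative (backtracking).
    return any(parse(body + rest, words) for body in grammar[first])

print(parse(["S"], "the dog chased the cat".split()))                # True
print(parse(["S"], "the dog chased a cat into the garden".split()))  # True
print(parse(["S"], "the cat slept".split()))                         # False
```

Note the grammar has no left recursion; with left-recursive rules this naive strategy would loop forever, which is one classical motivation for other parsing strategies.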

Context-free grammar

S --> NP, VP
NP --> PN            % proper noun
NP --> Art, Adj, N
NP --> Art, N
VP --> VI            % intransitive verb
VP --> VT, NP        % transitive verb
Art --> [the]
Adj --> [lazy]
Adj --> [rapid]
PN --> [achilles]
N --> [turtle]
VI --> [sleeps]
VT --> [beats]

Parse tree

A parse tree for "the rapid turtle beats achilles":

[S [NP [Art the] [Adj rapid] [N turtle]]
   [VP [Vt beats] [NP [PN achilles]]]]

Definite Clause Grammars: non-terminals may have arguments

S --> NP(N), VP(N)
NP(N) --> Art(N), N(N)
VP(N) --> VI(N)
Art(singular) --> [a]
Art(singular) --> [the]
Art(plural) --> [the]
N(singular) --> [turtle]
N(plural) --> [turtles]
VI(singular) --> [sleeps]
VI(plural) --> [sleep]

This enforces number agreement.

DCGs

Non-terminals may have arguments:
• variables (start with a capital), e.g. Number, Any
• constants (start with lower case), e.g. singular, plural
• structured terms (start with lower case and take arguments themselves), e.g. vp(V, NP)

Parsing needs to be adapted:
• using unification

Unification in a nutshell (cf AI course)

Substitutions, e.g. {Num / singular}, {T / vp(V, NP)}

Applying a substitution:
• simultaneously replace variables by the corresponding terms
• S(Num){Num / singular} = S(singular)

Unification

Take two non-terminals with arguments and compute the (most general) substitution that makes them identical, e.g.:
• Art(singular) and Art(Num): gives {Num / singular}
• Art(singular) and Art(plural): fails
• Art(Num1) and Art(Num2): {Num1 / Num2}
• PN(Num, accusative) and PN(singular, Case): {Num / singular, Case / accusative}
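The unification step can be sketched in a few lines. In this sketch, variables are strings starting with an uppercase letter, constants are lowercase strings, and structured terms are Python tuples; the occurs check is omitted for brevity:

```python
def is_var(t):
    """Variables start with an uppercase letter, e.g. 'Num', 'Case'."""
    return isinstance(t, str) and t[:1].isupper()

def walk(t, subst):
    """Follow variable bindings in the substitution."""
    while is_var(t) and t in subst:
        t = subst[t]
    return t

def unify(a, b, subst=None):
    """Most general substitution making a and b identical, or None on failure."""
    subst = dict(subst or {})
    a, b = walk(a, subst), walk(b, subst)
    if a == b:
        return subst
    if is_var(a):
        subst[a] = b
        return subst
    if is_var(b):
        subst[b] = a
        return subst
    if isinstance(a, tuple) and isinstance(b, tuple) and len(a) == len(b):
        for x, y in zip(a, b):  # unify argument lists pairwise
            subst = unify(x, y, subst)
            if subst is None:
                return None
        return subst
    return None  # distinct constants: fail

print(unify("singular", "Num"))    # {'Num': 'singular'}
print(unify("singular", "plural")) # None
print(unify(("Num", "accusative"), ("singular", "Case")))
```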

Parsing with DCGs

Now require successful unification at each step:

S -> NP(N), VP(N)
  -> Art(N), N(N), VP(N)        {N / singular}
  -> a, N(singular), VP(singular)
  -> a, turtle, VP(singular)
  -> a, turtle, sleeps

S -> a turtle sleep: fails

Case Marking

PN(singular, nominative) --> [he]; [she]
PN(singular, accusative) --> [him]; [her]
PN(plural, nominative) --> [they]
PN(plural, accusative) --> [them]

S --> NP(Number, nominative), VP(Number)
VP(Number) --> V(Number), NP(Any, accusative)
NP(Number, Case) --> PN(Number, Case)
NP(Number, Any) --> Det, N(Number)

He sees her. She sees him. They see her.
But not: Them see he.

DCGs

Are strictly more expressive than CFGs. They can represent, for instance, the non-context-free language aⁿbⁿcⁿ:

S(N) --> A(N), B(N), C(N)
A(0) --> []
B(0) --> []
C(0) --> []
A(s(N)) --> A(N), [a]
B(s(N)) --> B(N), [b]
C(s(N)) --> C(N), [c]
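A direct recognizer for aⁿbⁿcⁿ shows what the shared argument N buys: all three blocks must have the same length. A sketch counting with plain integers instead of s(…) terms:

```python
def parse_block(words, i, letter):
    """Consume a run of `letter` starting at position i; return (count, new_i)."""
    n = 0
    while i < len(words) and words[i] == letter:
        n, i = n + 1, i + 1
    return n, i

def s(words):
    na, i = parse_block(words, 0, "a")
    nb, i = parse_block(words, i, "b")
    nc, i = parse_block(words, i, "c")
    # Like S(N) --> A(N), B(N), C(N): all three counts must agree.
    return i == len(words) and na == nb == nc

print(s(list("aabbcc")))  # True
print(s(list("aabbc")))   # False
```

No CFG can enforce the three-way length constraint; the argument N plays the role of a counter that a context-free derivation cannot maintain.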

Probabilistic Models

Traditional grammar models are very rigid:
• essentially a yes/no decision

Probabilistic grammars:
• define a probability model for the data
• compute the probability of each alternative
• choose the most likely alternative

Illustrated on:
• the Shannon Game
• spelling correction
• parsing

Illustration

Wall Street Journal Corpus: 3,000,000 words, with the correct parse tree for each sentence known:
• constructed by hand
• can be used to derive stochastic context-free grammars (SCFGs)
• an SCFG assigns a probability to each parse tree
• compute the most probable parse tree
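An SCFG scores a parse tree as the product of the probabilities of the rules used in it; the most probable parse is the tree maximizing that product. A sketch with made-up rule probabilities (not estimated from the WSJ corpus):

```python
# Invented rule probabilities for illustration; in an SCFG the probabilities
# of all rules with the same left-hand side sum to 1.
rule_prob = {
    ("S", ("NP", "VP")): 1.0,
    ("NP", ("D", "N")): 0.6,
    ("VP", ("V", "NP")): 0.7,
    ("D", ("the",)): 0.5,
    ("N", ("dog",)): 0.3,
    ("N", ("cat",)): 0.3,
    ("V", ("chased",)): 0.4,
}

def tree_prob(tree):
    """tree = (label, [children]); leaves are plain strings.
    Multiply the probability of the rule at this node by those of the subtrees."""
    label, children = tree
    heads = tuple(c if isinstance(c, str) else c[0] for c in children)
    p = rule_prob[(label, heads)]
    for c in children:
        if not isinstance(c, str):
            p *= tree_prob(c)
    return p

t = ("S", [("NP", [("D", ["the"]), ("N", ["dog"])]),
           ("VP", [("V", ["chased"]),
                   ("NP", [("D", ["the"]), ("N", ["cat"])])])])
print(tree_prob(t))
```

When a sentence has several parse trees, computing this score for each and keeping the maximum is exactly the "choose the most likely alternative" step above.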

Sequences are omnipresent

Therefore, the techniques we will see also apply to:
• bioinformatics: DNA, proteins, mRNA, … can all be represented as strings
• robotics: sequences of actions, states, …
• …

Rest of the Course

Limitations of traditional grammar models motivate probabilistic extensions:
• regular grammars and finite state automata
  - Markov models using n-grams
  - (hidden) Markov models
  - conditional random fields (as an example of using undirected graphical models)
• probabilistic context-free grammars
• probabilistic definite clause grammars

All use principles of Part I on graphical models.

  • Advanced Artificial Intelligence
  • Topic
  • Contents
  • Rationalism versus Empiricism
  • Slide 5
  • This course
  • Ambiguity
  • NLP and Statistics
  • Slide 9
  • Corpora
  • Word Counts
  • Slide 12
  • Word Counts (Brown corpus)
  • Slide 14
  • Zipflsquos Law
  • Language and sequences
  • Key NLP Problem Ambiguity
  • Language Model
  • Example of bad language model
  • A bad language model
  • Slide 22
  • A good language model
  • Why language models
  • Applications
  • Spelling errors
  • Handwriting recognition
  • For Spell Checkers
  • Another dimension in language models
  • Sequence Tagging
  • Slide 31
  • Parsing
  • Slide 33
  • Language models based on Grammars
  • Grammars and parsing
  • Regular Grammars and Finite State Automata
  • Finite State Automaton
  • Phrase structure
  • Notation
  • Context Free Grammar
  • Slide 41
  • Top-down parsing
  • Context-free grammar
  • Parse tree
  • Definite Clause Grammars Non-terminals may have arguments
  • DCGs
  • Unification in a nutshell (cf AI course)
  • Unification
  • Parsing with DCGs
  • Case Marking
  • Slide 51
  • Probabilistic Models
  • Illustration
  • PowerPoint Presentation
  • Sequences are omni-present
  • Rest of the Course
Page 17: Part II. Statistical NLP Advanced Artificial Intelligence Introduction and Grammar Models Wolfram Burgard, Luc De Raedt, Bernhard Nebel, Kristian Kersting

Human Language is highly ambiguous at all levels

bull acoustic levelrecognize speech vs wreck a nice beach

bull morphological levelsaw to see (past) saw (noun) to saw (present inf)

bull syntactic levelI saw the man on the hill with a telescope

bull semantic levelOne book has to be read by every student

Key NLP Problem Ambiguity

Language Model

A formal model about language Two types

bull Non-probabilistic Allows one to compute whether a certain sequence

(sentence or part thereof) is possible Often grammar based

bull Probabilistic Allows one to compute the probability of a certain

sequence Often extends grammars with probabilities

Example of bad language model

A bad language model

A bad language model

A good language model

Non-Probabilisticbull ldquoI swear to tell the truthrdquo is possiblebull ldquoI swerve to smell de souprdquo is impossible

Probabilisticbull P(I swear to tell the truth) ~ 0001bull P(I swerve to smell de soup) ~ 0

Why language models

Consider a Shannon Gamebull Predicting the next word in the sequence

Statistical natural language hellip The cat is thrown out of the hellip The large green hellip Sue swallowed the large green hellip hellip

Model at the sentence level

Applications

Spelling correction Mobile phone texting Speech recognition Handwriting recognition Disabled users hellip

Spelling errors

They are leaving in about fifteen minuets to go to her house

The study was conducted mainly be John Black Hopefully all with continue smoothly in my absence Can they lave him my messages I need to notified the bank ofhellip He is trying to fine out

Handwriting recognition

Assume a note is given to a bank teller which the teller reads as I have a gub (cf Woody Allen)

NLP to the rescue hellipbull gub is not a wordbull gun gum Gus and gull are words but gun has a

higher probability in the context of a bank

For Spell Checkers

Collect list of commonly substituted wordsbull piecepeace whetherweather theirthere

ExampleldquoOn Tuesday the whether helliprsquorsquoldquoOn Tuesday the weather helliprdquo

Another dimension in language models

Do we mainly want to infer (probabilities) of legal sentences sequences bull So far

Or do we want to infer properties of these sentences bull Eg parse tree part-of-speech-taggingbullNeeded for understanding NL

Letrsquos look at some tasks

Sequence Tagging

Part-of-speech taggingbull He drives with his bikebull N V PR PN N noun verb preposition pronoun noun

Text extractionbull The job is that of a programmerbull X X X X X X JobTypebull The seminar is taking place from 1500 to 1600bull X X X X X X Start End

Sequence Tagging

Predicting the secondary structure of proteins mRNA hellipbull X = AFARLMMAhellipbull Y = hehestststhesthe hellip

Parsing

Given a sentence find its parse tree Important step in understanding NL

Parsing

In bioinformatics allows to predict (elements of) structure from sequence

Language models based on Grammars

Grammar Typesbull Regular grammars and Finite State Automatabull Context-Free Grammarsbull Definite Clause Grammars

A particular type of Unification Based Grammar (Prolog)

Distinguish lexicon from grammarbull Lexicon (dictionary) contains information about

words eg word - possible tags (and possibly additional information) flies - V(erb) - N(oun)

bull Grammar encode rules

Grammars and parsing Syntactic level best understood and formalized Derivation of grammatical structure parsing

(more than just recognition) Result of parsing mostly parse tree

showing the constituents of a sentence eg verb or noun phrases

Syntax usually specified in terms of a grammar consisting of grammar rules

Regular Grammars and Finite State Automata

Lexical information - which words are bull Det(erminer)bull N(oun)bull Vi (intransitive verb) - no

argumentbull Pn (pronoun) bull Vt (transitive verb) - takes an

argumentbull Adj (adjective)

Now acceptbull The cat sleptbull Det N Vi

As regular grammarbull S -gt [Det] S1 [ ] terminal bull S1 -gt [N] S2bull S2 -gt [Vi]

Lexicon bull The - Detbull Cat - Nbull Slept - Vi

bull hellip

Finite State Automaton

Sentences

bull John smiles - Pn Vibull The cat disappeared - Det N Vibull These new shoes hurt - Det Adj N Vibull John liked the old cat PN Vt Det Adj N

Phrase structure

S

NP

D N

VP

NPV

D N

PP

P NP

D N

the dog chased a cat into the garden

Notation

S sentence D or Det Determiner (eg articles) N noun V verb P preposition NP noun phrase VP verb phrase PP prepositional phrase

Context Free GrammarS -gt NP VPNP -gt D NVP -gt V NPVP -gt V NP PPPP -gt P NPD -gt [the]D -gt [a]N -gt [dog]N -gt [cat]N -gt [garden]V -gt [chased]V -gt [saw]P -gt [into]

Terminals ~ Lexicon

Phrase structure

Formalism of context-free grammarsbull Nonterminal symbols S NP VP bull Terminal symbols dog cat saw the

Recursionbull bdquoThe girl thought the dog chased the catldquo

VP -gt V SN -gt [girl]V -gt [thought]

Top-down parsing

S -gt NP VP S -gt Det N VP S -gt The N VP S -gt The dog VP S -gt The dog V NP S -gt The dog chased NP S -gt The dog chased Det N S-gt The dog chased the N S-gt The dog chased the cat

Context-free grammarSS --gt --gt NPNPVPVP

NPNP --gt PN --gt PN Proper nounProper noun

NPNP --gt Art Adj N--gt Art Adj N

NPNP --gt ArtN--gt ArtN

VPVP --gt VI --gt VI intransitive verbintransitive verb

VPVP --gt VT --gt VT NPNP transitive verbtransitive verb

ArtArt --gt [the]--gt [the]

AdjAdj --gt [lazy]--gt [lazy]

AdjAdj --gt [rapid]--gt [rapid]

PNPN --gt [achilles]--gt [achilles]

NN --gt [turtle]--gt [turtle]

VIVI --gt [sleeps]--gt [sleeps]

VTVT --gt [beats]--gt [beats]

Parse tree

SS

NPNP VPVP

ArtArt AdjAdj NN VtVt NPNP

PNPN

achillesachillesbeatsbeatsturtleturtlerapidrapidthethe

Definite Clause GrammarsNon-terminals may have arguments

SS --gt --gt NPNP((NN))VPVP((NN))

NP(NP(NN)) --gt Art(--gt Art(NN)N()N(NN))

VP(VP(NN)) --gt VI(--gt VI(NN))

Art(Art(singularsingular)) --gt [a]--gt [a]

Art(Art(singularsingular)) --gt [the]--gt [the]

Art(Art(pluralplural)) --gt [the]--gt [the]

N(N(singularsingular)) --gt [turtle]--gt [turtle]

N(N(pluralplural)) --gt [turtles]--gt [turtles]

VI(VI(singularsingular)) --gt [sleeps]--gt [sleeps]

VI(VI(pluralplural)) --gt [sleep]--gt [sleep]

Number Agreement

DCGs

Non-terminals may have argumentsbull Variables (start with capital)

Eg Number Any

bull Constants (start with lower case) Eg singular plural

bull Structured terms (start with lower case and take arguments themselves) Eg vp(VNP)

Parsing needs to be adapted bull Using unification

Unification in a nutshell (cf AI course)

Substitutions

Eg Num singular T vp(VNP)

Applying substitution bull Simultaneously replace variables by

corresponding termsbull S(Num) Num singular = S(singular)

Unification

Take two non-terminals with arguments and compute (most general) substitution that makes them identical egbull Art(singular) and Art(Num)

Gives Num singular

bull Art(singular) and Art(plural) Fails

bull Art(Num1) and Art(Num2) Num1 Num2

bull PN(Num accusative) and PN(singular Case) Numsingular Caseaccusative

Parsing with DCGs

Now require successful unification at each step

S -gt NP(N) VP(N) S -gt Art(N) N(N) VP(N) Nsingular S -gt a N(singular) VP(singular) S -gt a turtle VP(singular) S -gt a turtle sleeps

S-gt a turtle sleep fails

Case Marking

PN(singularnominative)PN(singularnominative) --gt --gt [he][she][he][she]

PN(singularaccusative)PN(singularaccusative) --gt --gt [him][her][him][her]

PN(pluralnominative)PN(pluralnominative) --gt --gt [they][they]

PN(pluralaccusative)PN(pluralaccusative) --gt --gt [them][them]

S --gt NP(Numbernominative) NP(Number)S --gt NP(Numbernominative) NP(Number)

VP(Number) --gt V(Number) VP(Anyaccusative)VP(Number) --gt V(Number) VP(Anyaccusative)

VP(NumberCase) --gt PN(NumberCase)VP(NumberCase) --gt PN(NumberCase)

VP(NumberAny) --gt Det N(Number)VP(NumberAny) --gt Det N(Number)

He sees her She sees him They see her

But not Them see he

DCGs

Are strictly more expressive than CFGs Can represent for instance

bull S(N) -gt A(N) B(N) C(N)bull A(0) -gt [] bull B(0) -gt []bull C(0) -gt []bull A(s(N)) -gt A(N) [A]bull B(s(N)) -gt B(N) [B]bull C(s(N)) -gt C(N) [C]

Probabilistic Models

Traditional grammar models are very rigid bull essentially a yes no decision

Probabilistic grammarsbull Define a probability models for the databull Compute the probability of each alternativebull Choose the most likely alternative

Ilustrate on bull Shannon Gamebull Spelling correctionbull Parsing

Illustration

Wall Street Journal Corpus 3 000 000 words Correct parse tree for sentences known

bull Constructed by handbull Can be used to derive stochastic context free

grammarsbull SCFG assign probability to parse trees

Compute the most probable parse tree

Sequences are omni-present

Therefore the techniques we will see also apply tobull Bioinformatics

DNA proteins mRNA hellip can all be represented as strings

bullRobotics Sequences of actions states hellip

bullhellip

Rest of the Course

Limitations traditional grammar models motivate probabilistic extensions bull Regular grammars and Finite State Automata

All use principles of Part I on Graphical Models Markov Models using n-gramms (Hidden) Markov Models Conditional Random Fields

bull As an example of using undirected graphical models

bull Probabilistic Context Free Grammarsbull Probabilistic Definite Clause Grammars

  • Advanced Artificial Intelligence
  • Topic
  • Contents
  • Rationalism versus Empiricism
  • Slide 5
  • This course
  • Ambiguity
  • NLP and Statistics
  • Slide 9
  • Corpora
  • Word Counts
  • Slide 12
  • Word Counts (Brown corpus)
  • Slide 14
  • Zipflsquos Law
  • Language and sequences
  • Key NLP Problem Ambiguity
  • Language Model
  • Example of bad language model
  • A bad language model
  • Slide 22
  • A good language model
  • Why language models
  • Applications
  • Spelling errors
  • Handwriting recognition
  • For Spell Checkers
  • Another dimension in language models
  • Sequence Tagging
  • Slide 31
  • Parsing
  • Slide 33
  • Language models based on Grammars
  • Grammars and parsing
  • Regular Grammars and Finite State Automata
  • Finite State Automaton
  • Phrase structure
  • Notation
  • Context Free Grammar
  • Slide 41
  • Top-down parsing
  • Context-free grammar
  • Parse tree
  • Definite Clause Grammars Non-terminals may have arguments
  • DCGs
  • Unification in a nutshell (cf AI course)
  • Unification
  • Parsing with DCGs
  • Case Marking
  • Slide 51
  • Probabilistic Models
  • Illustration
  • PowerPoint Presentation
  • Sequences are omni-present
  • Rest of the Course
Page 18: Part II. Statistical NLP Advanced Artificial Intelligence Introduction and Grammar Models Wolfram Burgard, Luc De Raedt, Bernhard Nebel, Kristian Kersting

Language Model

A formal model about language Two types

bull Non-probabilistic Allows one to compute whether a certain sequence

(sentence or part thereof) is possible Often grammar based

bull Probabilistic Allows one to compute the probability of a certain

sequence Often extends grammars with probabilities

Example of bad language model

A bad language model

A bad language model

A good language model

Non-Probabilisticbull ldquoI swear to tell the truthrdquo is possiblebull ldquoI swerve to smell de souprdquo is impossible

Probabilisticbull P(I swear to tell the truth) ~ 0001bull P(I swerve to smell de soup) ~ 0

Why language models

Consider a Shannon Gamebull Predicting the next word in the sequence

Statistical natural language hellip The cat is thrown out of the hellip The large green hellip Sue swallowed the large green hellip hellip

Model at the sentence level

Applications

Spelling correction Mobile phone texting Speech recognition Handwriting recognition Disabled users hellip

Spelling errors

They are leaving in about fifteen minuets to go to her house

The study was conducted mainly be John Black Hopefully all with continue smoothly in my absence Can they lave him my messages I need to notified the bank ofhellip He is trying to fine out

Handwriting recognition

Assume a note is given to a bank teller which the teller reads as I have a gub (cf Woody Allen)

NLP to the rescue hellipbull gub is not a wordbull gun gum Gus and gull are words but gun has a

higher probability in the context of a bank

For Spell Checkers

Collect list of commonly substituted wordsbull piecepeace whetherweather theirthere

ExampleldquoOn Tuesday the whether helliprsquorsquoldquoOn Tuesday the weather helliprdquo

Another dimension in language models

Do we mainly want to infer (probabilities) of legal sentences sequences bull So far

Or do we want to infer properties of these sentences bull Eg parse tree part-of-speech-taggingbullNeeded for understanding NL

Letrsquos look at some tasks

Sequence Tagging

Part-of-speech taggingbull He drives with his bikebull N V PR PN N noun verb preposition pronoun noun

Text extractionbull The job is that of a programmerbull X X X X X X JobTypebull The seminar is taking place from 1500 to 1600bull X X X X X X Start End

Sequence Tagging

Predicting the secondary structure of proteins mRNA hellipbull X = AFARLMMAhellipbull Y = hehestststhesthe hellip

Parsing

Given a sentence find its parse tree Important step in understanding NL

Parsing

In bioinformatics allows to predict (elements of) structure from sequence

Language models based on Grammars

Grammar Typesbull Regular grammars and Finite State Automatabull Context-Free Grammarsbull Definite Clause Grammars

A particular type of Unification Based Grammar (Prolog)

Distinguish lexicon from grammarbull Lexicon (dictionary) contains information about

words eg word - possible tags (and possibly additional information) flies - V(erb) - N(oun)

bull Grammar encode rules

Grammars and parsing Syntactic level best understood and formalized Derivation of grammatical structure parsing

(more than just recognition) Result of parsing mostly parse tree

showing the constituents of a sentence eg verb or noun phrases

Syntax usually specified in terms of a grammar consisting of grammar rules

Regular Grammars and Finite State Automata

Lexical information - which words are bull Det(erminer)bull N(oun)bull Vi (intransitive verb) - no

argumentbull Pn (pronoun) bull Vt (transitive verb) - takes an

argumentbull Adj (adjective)

Now acceptbull The cat sleptbull Det N Vi

As regular grammarbull S -gt [Det] S1 [ ] terminal bull S1 -gt [N] S2bull S2 -gt [Vi]

Lexicon bull The - Detbull Cat - Nbull Slept - Vi

bull hellip

Finite State Automaton

Sentences

bull John smiles - Pn Vibull The cat disappeared - Det N Vibull These new shoes hurt - Det Adj N Vibull John liked the old cat PN Vt Det Adj N

Phrase structure

S

NP

D N

VP

NPV

D N

PP

P NP

D N

the dog chased a cat into the garden

Notation

S sentence D or Det Determiner (eg articles) N noun V verb P preposition NP noun phrase VP verb phrase PP prepositional phrase

Context Free GrammarS -gt NP VPNP -gt D NVP -gt V NPVP -gt V NP PPPP -gt P NPD -gt [the]D -gt [a]N -gt [dog]N -gt [cat]N -gt [garden]V -gt [chased]V -gt [saw]P -gt [into]

Terminals ~ Lexicon

Phrase structure

Formalism of context-free grammarsbull Nonterminal symbols S NP VP bull Terminal symbols dog cat saw the

Recursionbull bdquoThe girl thought the dog chased the catldquo

VP -gt V SN -gt [girl]V -gt [thought]

Top-down parsing

S -gt NP VP S -gt Det N VP S -gt The N VP S -gt The dog VP S -gt The dog V NP S -gt The dog chased NP S -gt The dog chased Det N S-gt The dog chased the N S-gt The dog chased the cat

Context-free grammarSS --gt --gt NPNPVPVP

NPNP --gt PN --gt PN Proper nounProper noun

NPNP --gt Art Adj N--gt Art Adj N

NPNP --gt ArtN--gt ArtN

VPVP --gt VI --gt VI intransitive verbintransitive verb

VPVP --gt VT --gt VT NPNP transitive verbtransitive verb

ArtArt --gt [the]--gt [the]

AdjAdj --gt [lazy]--gt [lazy]

AdjAdj --gt [rapid]--gt [rapid]

PNPN --gt [achilles]--gt [achilles]

NN --gt [turtle]--gt [turtle]

VIVI --gt [sleeps]--gt [sleeps]

VTVT --gt [beats]--gt [beats]

Parse tree

SS

NPNP VPVP

ArtArt AdjAdj NN VtVt NPNP

PNPN

achillesachillesbeatsbeatsturtleturtlerapidrapidthethe

Definite Clause GrammarsNon-terminals may have arguments

SS --gt --gt NPNP((NN))VPVP((NN))

NP(NP(NN)) --gt Art(--gt Art(NN)N()N(NN))

VP(VP(NN)) --gt VI(--gt VI(NN))

Art(Art(singularsingular)) --gt [a]--gt [a]

Art(Art(singularsingular)) --gt [the]--gt [the]

Art(Art(pluralplural)) --gt [the]--gt [the]

N(N(singularsingular)) --gt [turtle]--gt [turtle]

N(N(pluralplural)) --gt [turtles]--gt [turtles]

VI(VI(singularsingular)) --gt [sleeps]--gt [sleeps]

VI(VI(pluralplural)) --gt [sleep]--gt [sleep]

Number Agreement

DCGs

Non-terminals may have argumentsbull Variables (start with capital)

Eg Number Any

bull Constants (start with lower case) Eg singular plural

bull Structured terms (start with lower case and take arguments themselves) Eg vp(VNP)

Parsing needs to be adapted bull Using unification

Unification in a nutshell (cf AI course)

Substitutions

Eg Num singular T vp(VNP)

Applying substitution bull Simultaneously replace variables by

corresponding termsbull S(Num) Num singular = S(singular)

Unification

Take two non-terminals with arguments and compute (most general) substitution that makes them identical egbull Art(singular) and Art(Num)

Gives Num singular

bull Art(singular) and Art(plural) Fails

bull Art(Num1) and Art(Num2) Num1 Num2

bull PN(Num accusative) and PN(singular Case) Numsingular Caseaccusative

Parsing with DCGs

Now require successful unification at each step

S -gt NP(N) VP(N) S -gt Art(N) N(N) VP(N) Nsingular S -gt a N(singular) VP(singular) S -gt a turtle VP(singular) S -gt a turtle sleeps

S-gt a turtle sleep fails

Case Marking

PN(singularnominative)PN(singularnominative) --gt --gt [he][she][he][she]

PN(singularaccusative)PN(singularaccusative) --gt --gt [him][her][him][her]

PN(pluralnominative)PN(pluralnominative) --gt --gt [they][they]

PN(pluralaccusative)PN(pluralaccusative) --gt --gt [them][them]

S --gt NP(Numbernominative) NP(Number)S --gt NP(Numbernominative) NP(Number)

VP(Number) --gt V(Number) VP(Anyaccusative)VP(Number) --gt V(Number) VP(Anyaccusative)

Page 19: Part II. Statistical NLP Advanced Artificial Intelligence Introduction and Grammar Models Wolfram Burgard, Luc De Raedt, Bernhard Nebel, Kristian Kersting

Example of bad language model

A bad language model

A bad language model

A good language model

Non-probabilistic:
• “I swear to tell the truth” is possible
• “I swerve to smell de soup” is impossible

Probabilistic:
• P(I swear to tell the truth) ~ 0.001
• P(I swerve to smell de soup) ~ 0
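The contrast above can be made concrete with a tiny bigram language model. A minimal sketch; the two-sentence training corpus and the use of Laplace smoothing are illustrative assumptions, not from the slides:

```python
from collections import Counter

def train_bigram(corpus):
    """Collect bigram and context counts over sentences padded with <s>/</s>."""
    bigrams, contexts = Counter(), Counter()
    for sent in corpus:
        toks = ["<s>"] + sent.split() + ["</s>"]
        for a, b in zip(toks, toks[1:]):
            bigrams[(a, b)] += 1
            contexts[a] += 1
    return bigrams, contexts

def sentence_prob(sentence, bigrams, contexts, vocab_size):
    """Laplace-smoothed bigram probability of a sentence."""
    toks = ["<s>"] + sentence.split() + ["</s>"]
    p = 1.0
    for a, b in zip(toks, toks[1:]):
        p *= (bigrams[(a, b)] + 1) / (contexts[a] + vocab_size)
    return p

# illustrative two-sentence training corpus (an assumption, not from the slides)
corpus = ["i swear to tell the truth", "i swear to tell the whole truth"]
bigrams, contexts = train_bigram(corpus)
vocab = {w for s in corpus for w in s.split()} | {"<s>", "</s>"}
p_good = sentence_prob("i swear to tell the truth", bigrams, contexts, len(vocab))
p_bad = sentence_prob("i swerve to smell de soup", bigrams, contexts, len(vocab))
assert p_good > p_bad  # the attested sentence is far more probable
```

Smoothing matters here: without it, any sentence containing an unseen bigram would get probability exactly 0 rather than merely a very small value.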

Why language models

Consider a Shannon Game:
• Predicting the next word in the sequence

Statistical natural language …
The cat is thrown out of the …
The large green …
Sue swallowed the large green …
…

Model at the sentence level
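The Shannon Game amounts to ranking candidate continuations by how often they follow the given context. A sketch using bigram counts over a made-up toy corpus (the corpus is an assumption for illustration):

```python
from collections import Counter, defaultdict

def continuations(corpus, word):
    """Rank the words observed after `word`, most frequent first."""
    follow = defaultdict(Counter)
    for sent in corpus:
        toks = sent.split()
        for a, b in zip(toks, toks[1:]):
            follow[a][b] += 1
    return follow[word].most_common()

# made-up toy corpus
corpus = [
    "the cat is thrown out of the house",
    "the cat sat on the mat",
    "the dog chased the cat",
]
ranking = continuations(corpus, "the")
assert ranking[0] == ("cat", 3)  # "cat" is the best guess after "the"
```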

Applications

• Spelling correction
• Mobile phone texting
• Speech recognition
• Handwriting recognition
• Disabled users
• …

Spelling errors

They are leaving in about fifteen minuets to go to her house.

The study was conducted mainly be John Black.
Hopefully all with continue smoothly in my absence.
Can they lave him my messages?
I need to notified the bank of …
He is trying to fine out.

Handwriting recognition

Assume a note is given to a bank teller, which the teller reads as “I have a gub.” (cf. Woody Allen)

NLP to the rescue:
• gub is not a word
• gun, gum, Gus, and gull are words, but gun has a higher probability in the context of a bank

For Spell Checkers

Collect a list of commonly substituted words:
• piece/peace, whether/weather, their/there, …

Example:
“On Tuesday, the whether …”
“On Tuesday, the weather …”
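A confusion-set spell checker can pick between such alternatives by comparing how likely each is in context. A sketch; the bigram counts are hypothetical stand-ins for corpus statistics:

```python
from collections import Counter

def pick(prev_word, candidates, bigram_counts):
    """Choose the confusion-set member most often seen after prev_word."""
    return max(candidates, key=lambda w: bigram_counts[(prev_word, w)])

# hypothetical counts, standing in for statistics from a large corpus
bigram_counts = Counter({("the", "weather"): 120, ("the", "whether"): 1})
assert pick("the", ["whether", "weather"], bigram_counts) == "weather"
```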

Another dimension in language models

Do we mainly want to infer (probabilities of) legal sentences / sequences?
• So far

Or do we want to infer properties of these sentences?
• E.g. parse tree, part-of-speech tagging
• Needed for understanding NL

Let’s look at some tasks.

Sequence Tagging

Part-of-speech tagging:
• He drives with his bike
• PN V PR PN N (pronoun, verb, preposition, pronoun, noun)

Text extraction:
• The job is that of a programmer.
  X X X X X X JobType
• The seminar is taking place from 15:00 to 16:00.
  X X X X X X Start End
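The simplest baseline for such tagging assigns each word its most frequent tag from a tagged corpus, ignoring context entirely. A sketch over a made-up two-sentence corpus (the corpus and the `N` fallback for unknown words are assumptions):

```python
from collections import Counter, defaultdict

# made-up tagged corpus
tagged_corpus = [
    [("he", "PN"), ("drives", "V"), ("with", "PR"), ("his", "PN"), ("bike", "N")],
    [("she", "PN"), ("drives", "V"), ("a", "Det"), ("car", "N")],
]

# count how often each word carries each tag
freq = defaultdict(Counter)
for sent in tagged_corpus:
    for word, pos in sent:
        freq[word][pos] += 1

def tag(words):
    """Most-frequent-tag tagger; unknown words default to N (an assumption)."""
    return [freq[w].most_common(1)[0][0] if w in freq else "N" for w in words]

assert tag(["he", "drives", "a", "bike"]) == ["PN", "V", "Det", "N"]
```

Statistical taggers such as HMMs improve on this baseline precisely by also modelling the context (the neighbouring tags).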

Sequence Tagging

Predicting the secondary structure of proteins, mRNA, …
• X = AFARLMMA…
• Y = hehestststhesthe…

Parsing

Given a sentence, find its parse tree. An important step in understanding NL.

Parsing

In bioinformatics, parsing allows us to predict (elements of) structure from sequence.

Language models based on Grammars

Grammar Types:
• Regular grammars and Finite State Automata
• Context-Free Grammars
• Definite Clause Grammars

A particular type of Unification Based Grammar (Prolog)

Distinguish lexicon from grammar:
• Lexicon (dictionary) contains information about words, e.g. word -> possible tags (and possibly additional information): flies -> V(erb), N(oun)
• Grammar encodes rules

Grammars and parsing

Syntactic level best understood and formalized.
Derivation of grammatical structure: parsing (more than just recognition).
Result of parsing: mostly a parse tree, showing the constituents of a sentence, e.g. verb or noun phrases.
Syntax is usually specified in terms of a grammar consisting of grammar rules.

Regular Grammars and Finite State Automata

Lexical information - which words are:
• Det(erminer)
• N(oun)
• Vi (intransitive verb) - no argument
• Pn (pronoun)
• Vt (transitive verb) - takes an argument
• Adj (adjective)

Now accept:
• The cat slept
• Det N Vi

As regular grammar:
• S -> [Det] S1      ([ ] = terminal)
• S1 -> [N] S2
• S2 -> [Vi]

Lexicon:
• the - Det
• cat - N
• slept - Vi
• …
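The regular grammar above corresponds to a finite state automaton that consumes one tag per word. A minimal sketch of the acceptor; state names S, S1, S2 follow the rules above:

```python
# Lexicon and transitions mirror the regular grammar:
# S -[Det]-> S1 -[N]-> S2 -[Vi]-> accept
LEXICON = {"the": "Det", "cat": "N", "slept": "Vi"}
TRANSITIONS = {("S", "Det"): "S1", ("S1", "N"): "S2"}
FINAL = {("S2", "Vi")}  # consuming Vi in S2 ends the sentence

def accepts(sentence):
    """Run the FSA over the sentence's tags; accept only complete derivations."""
    state = "S"
    toks = sentence.lower().split()
    for i, tok in enumerate(toks):
        pos = LEXICON.get(tok)
        if pos is None:
            return False  # unknown word
        if i == len(toks) - 1:
            return (state, pos) in FINAL
        state = TRANSITIONS.get((state, pos))
        if state is None:
            return False  # no transition for this tag
    return False  # empty sentence

assert accepts("The cat slept")
assert not accepts("The slept cat")
```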

Finite State Automaton

Sentences:
• John smiles - Pn Vi
• The cat disappeared - Det N Vi
• These new shoes hurt - Det Adj N Vi
• John liked the old cat - PN Vt Det Adj N

Phrase structure

[S [NP [D the] [N dog]]
   [VP [V chased]
       [NP [D a] [N cat]]
       [PP [P into] [NP [D the] [N garden]]]]]

the dog chased a cat into the garden

Notation

S: sentence
D or Det: determiner (e.g. articles)
N: noun
V: verb
P: preposition
NP: noun phrase
VP: verb phrase
PP: prepositional phrase

Context Free Grammar

S -> NP VP
NP -> D N
VP -> V NP
VP -> V NP PP
PP -> P NP
D -> [the]
D -> [a]
N -> [dog]
N -> [cat]
N -> [garden]
V -> [chased]
V -> [saw]
P -> [into]

Terminals ~ Lexicon

Phrase structure

Formalism of context-free grammars:
• Nonterminal symbols: S, NP, VP, …
• Terminal symbols: dog, cat, saw, the, …

Recursion:
• “The girl thought the dog chased the cat”

VP -> V S
N -> [girl]
V -> [thought]

Top-down parsing

S -> NP VP
  -> Det N VP
  -> The N VP
  -> The dog VP
  -> The dog V NP
  -> The dog chased NP
  -> The dog chased Det N
  -> The dog chased the N
  -> The dog chased the cat
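Top-down parsing with backtracking can be sketched as a recursive-descent recognizer over the CFG above:

```python
# The CFG above; each non-terminal maps to its alternative right-hand sides.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["D", "N"]],
    "VP": [["V", "NP", "PP"], ["V", "NP"]],
    "PP": [["P", "NP"]],
    "D": [["the"], ["a"]],
    "N": [["dog"], ["cat"], ["garden"]],
    "V": [["chased"], ["saw"]],
    "P": [["into"]],
}

def parse(symbols, tokens):
    """True if `tokens` can be fully derived from `symbols` (top-down, backtracking)."""
    if not symbols:
        return not tokens  # success only if all input is consumed
    head, rest = symbols[0], symbols[1:]
    if head in GRAMMAR:  # non-terminal: try each production
        return any(parse(body + rest, tokens) for body in GRAMMAR[head])
    # terminal: must match the next input token
    return bool(tokens) and tokens[0] == head and parse(rest, tokens[1:])

assert parse(["S"], "the dog chased a cat into the garden".split())
assert not parse(["S"], "the dog chased".split())
```

Naive top-down parsing like this can loop forever on left-recursive rules (e.g. NP -> NP PP); chart parsers such as Earley's algorithm avoid that problem.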

Context-free grammar

S --> NP VP
NP --> PN          (proper noun)
NP --> Art Adj N
NP --> Art N
VP --> VI          (intransitive verb)
VP --> VT NP       (transitive verb)
Art --> [the]
Adj --> [lazy]
Adj --> [rapid]
PN --> [achilles]
N --> [turtle]
VI --> [sleeps]
VT --> [beats]

Parse tree

[S [NP [Art the] [Adj rapid] [N turtle]]
   [VP [VT beats] [NP [PN achilles]]]]

the rapid turtle beats achilles

Definite Clause Grammars
Non-terminals may have arguments:

S --> NP(N) VP(N)
NP(N) --> Art(N) N(N)
VP(N) --> VI(N)
Art(singular) --> [a]
Art(singular) --> [the]
Art(plural) --> [the]
N(singular) --> [turtle]
N(plural) --> [turtles]
VI(singular) --> [sleeps]
VI(plural) --> [sleep]

Number Agreement
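Number agreement can be sketched by attaching a number feature to each lexical entry and checking that all features match. This is a simplification of full unification, assuming a single feature per category; `None` stands for an unconstrained feature (like "the", which is either number):

```python
# Lexicon with number features; None means "either number" (an assumption
# standing in for an unbound DCG variable).
LEXICON = {
    "a": ("Art", "singular"), "the": ("Art", None),
    "turtle": ("N", "singular"), "turtles": ("N", "plural"),
    "sleeps": ("VI", "singular"), "sleep": ("VI", "plural"),
}

def agree(x, y):
    """Two features agree if either is unconstrained or they are equal."""
    return x is None or y is None or x == y

def accepts(sentence):
    """S --> Art(N) N(N) VI(N), with the number feature N shared."""
    toks = sentence.split()
    if len(toks) != 3:
        return False
    entries = [LEXICON.get(t) for t in toks]
    if None in entries or [cat for cat, _ in entries] != ["Art", "N", "VI"]:
        return False
    a, n, v = (num for _, num in entries)
    return agree(a, n) and agree(n, v) and agree(a, v)

assert accepts("a turtle sleeps")
assert accepts("the turtles sleep")
assert not accepts("a turtle sleep")
```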

DCGs

Non-terminals may have arguments:
• Variables (start with a capital), e.g. Number, Any
• Constants (start with lower case), e.g. singular, plural
• Structured terms (start with lower case and take arguments themselves), e.g. vp(V, NP)

Parsing needs to be adapted:
• Using unification

Unification in a nutshell (cf AI course)

Substitutions, e.g. {Num / singular}, {T / vp(V, NP)}

Applying a substitution:
• Simultaneously replace variables by the corresponding terms
• S(Num) {Num / singular} = S(singular)

Unification

Take two non-terminals with arguments and compute the (most general) substitution that makes them identical, e.g.:
• Art(singular) and Art(Num): gives {Num / singular}
• Art(singular) and Art(plural): fails
• Art(Num1) and Art(Num2): {Num1 / Num2}
• PN(Num, accusative) and PN(singular, Case): {Num / singular, Case / accusative}
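The unification operation itself can be sketched in a few lines. In this sketch (an illustration, not a full Prolog engine), variables are capitalised strings, compound terms are tuples `(functor, arg1, ...)`, and the occurs check is omitted:

```python
def is_var(t):
    """Variables are capitalised strings, e.g. 'Num'."""
    return isinstance(t, str) and t[:1].isupper()

def walk(t, subst):
    """Follow variable bindings in the substitution."""
    while is_var(t) and t in subst:
        t = subst[t]
    return t

def unify(t1, t2, subst=None):
    """Return the most general substitution making t1 and t2 equal, or None."""
    if subst is None:
        subst = {}
    t1, t2 = walk(t1, subst), walk(t2, subst)
    if t1 == t2:
        return subst
    if is_var(t1):
        return {**subst, t1: t2}
    if is_var(t2):
        return {**subst, t2: t1}
    if (isinstance(t1, tuple) and isinstance(t2, tuple)
            and len(t1) == len(t2) and t1[0] == t2[0]):
        for a, b in zip(t1[1:], t2[1:]):  # unify arguments pairwise
            subst = unify(a, b, subst)
            if subst is None:
                return None
        return subst
    return None  # clash: different constants or functors

# Art(singular) vs Art(Num) gives {Num / singular}
assert unify(("art", "singular"), ("art", "Num")) == {"Num": "singular"}
# Art(singular) vs Art(plural) fails
assert unify(("art", "singular"), ("art", "plural")) is None
# PN(Num, accusative) vs PN(singular, Case)
assert unify(("pn", "Num", "accusative"), ("pn", "singular", "Case")) \
    == {"Num": "singular", "Case": "accusative"}
```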

Parsing with DCGs

Now require successful unification at each step

S -> NP(N) VP(N)
  -> Art(N) N(N) VP(N)      {N / singular}
  -> a N(singular) VP(singular)
  -> a turtle VP(singular)
  -> a turtle sleeps

“a turtle sleep” fails

Case Marking

PN(singular, nominative) --> [he]; [she]
PN(singular, accusative) --> [him]; [her]
PN(plural, nominative) --> [they]
PN(plural, accusative) --> [them]

S --> NP(Number, nominative) VP(Number)
VP(Number) --> V(Number) NP(Any, accusative)
NP(Number, Case) --> PN(Number, Case)
NP(Number, Any) --> Det N(Number)

He sees her. She sees him. They see her.

But not: Them see he.
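The case-marking grammar can be sketched as a feature check: the subject must be nominative, the object accusative, and subject and verb must agree in number. A simplified sketch covering only pronoun subjects and objects:

```python
# Pronouns carry (number, case); verbs carry a number feature.
PRONOUNS = {
    "he": ("singular", "nominative"), "she": ("singular", "nominative"),
    "him": ("singular", "accusative"), "her": ("singular", "accusative"),
    "they": ("plural", "nominative"), "them": ("plural", "accusative"),
}
VERBS = {"sees": "singular", "see": "plural"}

def accepts(sentence):
    """S --> NP(Number, nominative) V(Number) NP(Any, accusative)."""
    toks = sentence.lower().split()
    if (len(toks) != 3 or toks[0] not in PRONOUNS
            or toks[1] not in VERBS or toks[2] not in PRONOUNS):
        return False
    (num_subj, case_subj) = PRONOUNS[toks[0]]
    num_verb = VERBS[toks[1]]
    (_, case_obj) = PRONOUNS[toks[2]]
    return (case_subj == "nominative" and case_obj == "accusative"
            and num_subj == num_verb)

assert accepts("He sees her")
assert accepts("They see her")
assert not accepts("Them see he")
```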

DCGs

Are strictly more expressive than CFGs. They can represent, for instance:

• S(N) -> A(N) B(N) C(N)
• A(0) -> []
• B(0) -> []
• C(0) -> []
• A(s(N)) -> A(N) [a]
• B(s(N)) -> B(N) [b]
• C(s(N)) -> C(N) [c]

This grammar generates the language a^n b^n c^n, which no context-free grammar can.
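A direct check that the rules above generate exactly the strings a^n b^n c^n (lower-case tokens used for convenience; the DCG threads the shared count N through all three sub-phrases):

```python
def s(tokens):
    """Mirror of S(N) --> A(N) B(N) C(N): equal runs of a's, b's, and c's."""
    i = 0
    while i < len(tokens) and tokens[i] == "a":
        i += 1
    n = i  # length of the leading run of a's plays the role of N
    # the remainder must be exactly n b's followed by n c's
    return tokens[i:] == ["b"] * n + ["c"] * n

assert s(list("aaabbbccc"))
assert not s(list("aabbbcc"))
assert s([])  # the empty string, N = 0
```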

Probabilistic Models

Traditional grammar models are very rigid:
• essentially a yes/no decision

Probabilistic grammars:
• Define a probability model for the data
• Compute the probability of each alternative
• Choose the most likely alternative

Illustrated on:
• Shannon Game
• Spelling correction
• Parsing

Illustration

Wall Street Journal Corpus: 3,000,000 words. Correct parse tree for sentences known:
• Constructed by hand
• Can be used to derive stochastic context-free grammars
• SCFGs assign a probability to each parse tree

Compute the most probable parse tree.
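An SCFG scores a parse tree by multiplying the probabilities of the rules used in its derivation; the most probable parse is the tree maximising this product. A sketch with hypothetical rule probabilities (normalised per left-hand side; the numbers are illustrative, not WSJ estimates):

```python
# Hypothetical rule probabilities: P(rule | left-hand side) sums to 1 per LHS.
RULE_PROB = {
    ("S", ("NP", "VP")): 1.0,
    ("NP", ("D", "N")): 1.0,
    ("VP", ("V", "NP")): 0.7,
    ("VP", ("V", "NP", "PP")): 0.3,
    ("D", ("the",)): 0.6, ("D", ("a",)): 0.4,
    ("N", ("dog",)): 0.5, ("N", ("cat",)): 0.5,
    ("V", ("chased",)): 1.0,
}

def tree_prob(tree):
    """Probability of a parse tree: product of the probabilities of its rules.
    A tree is (label, child, ...); a leaf child is a plain string (a word)."""
    label, *children = tree
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    p = RULE_PROB[(label, rhs)]
    for c in children:
        if not isinstance(c, str):
            p *= tree_prob(c)  # multiply in the subtrees' rule probabilities
    return p

t = ("S",
     ("NP", ("D", "the"), ("N", "dog")),
     ("VP", ("V", "chased"), ("NP", ("D", "a"), ("N", "cat"))))
assert abs(tree_prob(t) - 0.6 * 0.5 * 0.7 * 0.4 * 0.5) < 1e-12
```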

Sequences are omnipresent

Therefore the techniques we will see also apply to:
• Bioinformatics: DNA, proteins, mRNA, … can all be represented as strings
• Robotics: sequences of actions, states, …
• …

Rest of the Course

Limitations of traditional grammar models motivate probabilistic extensions:
• Regular grammars and Finite State Automata:
  Markov Models using n-grams
  (Hidden) Markov Models
  Conditional Random Fields (as an example of using undirected graphical models)
• Probabilistic Context-Free Grammars
• Probabilistic Definite Clause Grammars

All use principles of Part I on Graphical Models.

  • Advanced Artificial Intelligence
  • Topic
  • Contents
  • Rationalism versus Empiricism
  • Slide 5
  • This course
  • Ambiguity
  • NLP and Statistics
  • Slide 9
  • Corpora
  • Word Counts
  • Slide 12
  • Word Counts (Brown corpus)
  • Slide 14
  • Zipf's Law
  • Language and sequences
  • Key NLP Problem Ambiguity
  • Language Model
  • Example of bad language model
  • A bad language model
  • Slide 22
  • A good language model
  • Why language models
  • Applications
  • Spelling errors
  • Handwriting recognition
  • For Spell Checkers
  • Another dimension in language models
  • Sequence Tagging
  • Slide 31
  • Parsing
  • Slide 33
  • Language models based on Grammars
  • Grammars and parsing
  • Regular Grammars and Finite State Automata
  • Finite State Automaton
  • Phrase structure
  • Notation
  • Context Free Grammar
  • Slide 41
  • Top-down parsing
  • Context-free grammar
  • Parse tree
  • Definite Clause Grammars Non-terminals may have arguments
  • DCGs
  • Unification in a nutshell (cf AI course)
  • Unification
  • Parsing with DCGs
  • Case Marking
  • Slide 51
  • Probabilistic Models
  • Illustration
  • PowerPoint Presentation
  • Sequences are omni-present
  • Rest of the Course
Page 20: Part II. Statistical NLP Advanced Artificial Intelligence Introduction and Grammar Models Wolfram Burgard, Luc De Raedt, Bernhard Nebel, Kristian Kersting

A bad language model

A bad language model

A good language model

Non-Probabilisticbull ldquoI swear to tell the truthrdquo is possiblebull ldquoI swerve to smell de souprdquo is impossible

Probabilisticbull P(I swear to tell the truth) ~ 0001bull P(I swerve to smell de soup) ~ 0

Why language models

Consider a Shannon Gamebull Predicting the next word in the sequence

Statistical natural language hellip The cat is thrown out of the hellip The large green hellip Sue swallowed the large green hellip hellip

Model at the sentence level

Applications

Spelling correction Mobile phone texting Speech recognition Handwriting recognition Disabled users hellip

Spelling errors

They are leaving in about fifteen minuets to go to her house

The study was conducted mainly be John Black Hopefully all with continue smoothly in my absence Can they lave him my messages I need to notified the bank ofhellip He is trying to fine out

Handwriting recognition

Assume a note is given to a bank teller which the teller reads as I have a gub (cf Woody Allen)

NLP to the rescue hellipbull gub is not a wordbull gun gum Gus and gull are words but gun has a

higher probability in the context of a bank

For Spell Checkers

Collect list of commonly substituted wordsbull piecepeace whetherweather theirthere

ExampleldquoOn Tuesday the whether helliprsquorsquoldquoOn Tuesday the weather helliprdquo

Another dimension in language models

Do we mainly want to infer (probabilities) of legal sentences sequences bull So far

Or do we want to infer properties of these sentences bull Eg parse tree part-of-speech-taggingbullNeeded for understanding NL

Letrsquos look at some tasks

Sequence Tagging

Part-of-speech taggingbull He drives with his bikebull N V PR PN N noun verb preposition pronoun noun

Text extractionbull The job is that of a programmerbull X X X X X X JobTypebull The seminar is taking place from 1500 to 1600bull X X X X X X Start End

Sequence Tagging

Predicting the secondary structure of proteins mRNA hellipbull X = AFARLMMAhellipbull Y = hehestststhesthe hellip

Parsing

Given a sentence find its parse tree Important step in understanding NL

Parsing

In bioinformatics allows to predict (elements of) structure from sequence

Language models based on Grammars

Grammar Typesbull Regular grammars and Finite State Automatabull Context-Free Grammarsbull Definite Clause Grammars

A particular type of Unification Based Grammar (Prolog)

Distinguish lexicon from grammarbull Lexicon (dictionary) contains information about

words eg word - possible tags (and possibly additional information) flies - V(erb) - N(oun)

bull Grammar encode rules

Grammars and parsing Syntactic level best understood and formalized Derivation of grammatical structure parsing

(more than just recognition) Result of parsing mostly parse tree

showing the constituents of a sentence eg verb or noun phrases

Syntax usually specified in terms of a grammar consisting of grammar rules

Regular Grammars and Finite State Automata

Lexical information - which words are bull Det(erminer)bull N(oun)bull Vi (intransitive verb) - no

argumentbull Pn (pronoun) bull Vt (transitive verb) - takes an

argumentbull Adj (adjective)

Now acceptbull The cat sleptbull Det N Vi

As regular grammarbull S -gt [Det] S1 [ ] terminal bull S1 -gt [N] S2bull S2 -gt [Vi]

Lexicon bull The - Detbull Cat - Nbull Slept - Vi

bull hellip

Finite State Automaton

Sentences

bull John smiles - Pn Vibull The cat disappeared - Det N Vibull These new shoes hurt - Det Adj N Vibull John liked the old cat PN Vt Det Adj N

Phrase structure

S

NP

D N

VP

NPV

D N

PP

P NP

D N

the dog chased a cat into the garden

Notation

S sentence D or Det Determiner (eg articles) N noun V verb P preposition NP noun phrase VP verb phrase PP prepositional phrase

Context Free GrammarS -gt NP VPNP -gt D NVP -gt V NPVP -gt V NP PPPP -gt P NPD -gt [the]D -gt [a]N -gt [dog]N -gt [cat]N -gt [garden]V -gt [chased]V -gt [saw]P -gt [into]

Terminals ~ Lexicon

Phrase structure

Formalism of context-free grammarsbull Nonterminal symbols S NP VP bull Terminal symbols dog cat saw the

Recursionbull bdquoThe girl thought the dog chased the catldquo

VP -gt V SN -gt [girl]V -gt [thought]

Top-down parsing

S -gt NP VP S -gt Det N VP S -gt The N VP S -gt The dog VP S -gt The dog V NP S -gt The dog chased NP S -gt The dog chased Det N S-gt The dog chased the N S-gt The dog chased the cat

Context-free grammarSS --gt --gt NPNPVPVP

NPNP --gt PN --gt PN Proper nounProper noun

NPNP --gt Art Adj N--gt Art Adj N

NPNP --gt ArtN--gt ArtN

VPVP --gt VI --gt VI intransitive verbintransitive verb

VPVP --gt VT --gt VT NPNP transitive verbtransitive verb

ArtArt --gt [the]--gt [the]

AdjAdj --gt [lazy]--gt [lazy]

AdjAdj --gt [rapid]--gt [rapid]

PNPN --gt [achilles]--gt [achilles]

NN --gt [turtle]--gt [turtle]

VIVI --gt [sleeps]--gt [sleeps]

VTVT --gt [beats]--gt [beats]

Parse tree

SS

NPNP VPVP

ArtArt AdjAdj NN VtVt NPNP

PNPN

achillesachillesbeatsbeatsturtleturtlerapidrapidthethe

Definite Clause GrammarsNon-terminals may have arguments

SS --gt --gt NPNP((NN))VPVP((NN))

NP(NP(NN)) --gt Art(--gt Art(NN)N()N(NN))

VP(VP(NN)) --gt VI(--gt VI(NN))

Art(Art(singularsingular)) --gt [a]--gt [a]

Art(Art(singularsingular)) --gt [the]--gt [the]

Art(Art(pluralplural)) --gt [the]--gt [the]

N(N(singularsingular)) --gt [turtle]--gt [turtle]

N(N(pluralplural)) --gt [turtles]--gt [turtles]

VI(VI(singularsingular)) --gt [sleeps]--gt [sleeps]

VI(VI(pluralplural)) --gt [sleep]--gt [sleep]

Number Agreement

DCGs

Non-terminals may have argumentsbull Variables (start with capital)

Eg Number Any

bull Constants (start with lower case) Eg singular plural

bull Structured terms (start with lower case and take arguments themselves) Eg vp(VNP)

Parsing needs to be adapted bull Using unification

Unification in a nutshell (cf AI course)

Substitutions

Eg Num singular T vp(VNP)

Applying substitution bull Simultaneously replace variables by

corresponding termsbull S(Num) Num singular = S(singular)

Unification

Take two non-terminals with arguments and compute (most general) substitution that makes them identical egbull Art(singular) and Art(Num)

Gives Num singular

bull Art(singular) and Art(plural) Fails

bull Art(Num1) and Art(Num2) Num1 Num2

bull PN(Num accusative) and PN(singular Case) Numsingular Caseaccusative

Parsing with DCGs

Now require successful unification at each step

S -gt NP(N) VP(N) S -gt Art(N) N(N) VP(N) Nsingular S -gt a N(singular) VP(singular) S -gt a turtle VP(singular) S -gt a turtle sleeps

S-gt a turtle sleep fails

Case Marking

PN(singularnominative)PN(singularnominative) --gt --gt [he][she][he][she]

PN(singularaccusative)PN(singularaccusative) --gt --gt [him][her][him][her]

PN(pluralnominative)PN(pluralnominative) --gt --gt [they][they]

PN(pluralaccusative)PN(pluralaccusative) --gt --gt [them][them]

S --gt NP(Numbernominative) NP(Number)S --gt NP(Numbernominative) NP(Number)

VP(Number) --gt V(Number) VP(Anyaccusative)VP(Number) --gt V(Number) VP(Anyaccusative)

VP(NumberCase) --gt PN(NumberCase)VP(NumberCase) --gt PN(NumberCase)

VP(NumberAny) --gt Det N(Number)VP(NumberAny) --gt Det N(Number)

He sees her She sees him They see her

But not Them see he

DCGs

Are strictly more expressive than CFGs Can represent for instance

bull S(N) -gt A(N) B(N) C(N)bull A(0) -gt [] bull B(0) -gt []bull C(0) -gt []bull A(s(N)) -gt A(N) [A]bull B(s(N)) -gt B(N) [B]bull C(s(N)) -gt C(N) [C]

Probabilistic Models

Traditional grammar models are very rigid bull essentially a yes no decision

Probabilistic grammarsbull Define a probability models for the databull Compute the probability of each alternativebull Choose the most likely alternative

Ilustrate on bull Shannon Gamebull Spelling correctionbull Parsing

Illustration

Wall Street Journal Corpus 3 000 000 words Correct parse tree for sentences known

bull Constructed by handbull Can be used to derive stochastic context free

grammarsbull SCFG assign probability to parse trees

Compute the most probable parse tree

Sequences are omni-present

Therefore the techniques we will see also apply tobull Bioinformatics

DNA proteins mRNA hellip can all be represented as strings

bullRobotics Sequences of actions states hellip

bullhellip

Rest of the Course

Limitations traditional grammar models motivate probabilistic extensions bull Regular grammars and Finite State Automata

All use principles of Part I on Graphical Models Markov Models using n-gramms (Hidden) Markov Models Conditional Random Fields

bull As an example of using undirected graphical models

bull Probabilistic Context Free Grammarsbull Probabilistic Definite Clause Grammars

  • Advanced Artificial Intelligence
  • Topic
  • Contents
  • Rationalism versus Empiricism
  • Slide 5
  • This course
  • Ambiguity
  • NLP and Statistics
  • Slide 9
  • Corpora
  • Word Counts
  • Slide 12
  • Word Counts (Brown corpus)
  • Slide 14
  • Zipflsquos Law
  • Language and sequences
  • Key NLP Problem Ambiguity
  • Language Model
  • Example of bad language model
  • A bad language model
  • Slide 22
  • A good language model
  • Why language models
  • Applications
  • Spelling errors
  • Handwriting recognition
  • For Spell Checkers
  • Another dimension in language models
  • Sequence Tagging
  • Slide 31
  • Parsing
  • Slide 33
  • Language models based on Grammars
  • Grammars and parsing
  • Regular Grammars and Finite State Automata
  • Finite State Automaton
  • Phrase structure
  • Notation
  • Context Free Grammar
  • Slide 41
  • Top-down parsing
  • Context-free grammar
  • Parse tree
  • Definite Clause Grammars Non-terminals may have arguments
  • DCGs
  • Unification in a nutshell (cf AI course)
  • Unification
  • Parsing with DCGs
  • Case Marking
  • Slide 51
  • Probabilistic Models
  • Illustration
  • PowerPoint Presentation
  • Sequences are omni-present
  • Rest of the Course
Page 21: Part II. Statistical NLP Advanced Artificial Intelligence Introduction and Grammar Models Wolfram Burgard, Luc De Raedt, Bernhard Nebel, Kristian Kersting

A bad language model

A good language model

Non-Probabilisticbull ldquoI swear to tell the truthrdquo is possiblebull ldquoI swerve to smell de souprdquo is impossible

Probabilisticbull P(I swear to tell the truth) ~ 0001bull P(I swerve to smell de soup) ~ 0

Why language models

Consider a Shannon Gamebull Predicting the next word in the sequence

Statistical natural language hellip The cat is thrown out of the hellip The large green hellip Sue swallowed the large green hellip hellip

Model at the sentence level

Applications

Spelling correction Mobile phone texting Speech recognition Handwriting recognition Disabled users hellip

Spelling errors

They are leaving in about fifteen minuets to go to her house

The study was conducted mainly be John Black Hopefully all with continue smoothly in my absence Can they lave him my messages I need to notified the bank ofhellip He is trying to fine out

Handwriting recognition

Assume a note is given to a bank teller which the teller reads as I have a gub (cf Woody Allen)

NLP to the rescue hellipbull gub is not a wordbull gun gum Gus and gull are words but gun has a

higher probability in the context of a bank

For Spell Checkers

Collect list of commonly substituted wordsbull piecepeace whetherweather theirthere

ExampleldquoOn Tuesday the whether helliprsquorsquoldquoOn Tuesday the weather helliprdquo

Another dimension in language models

Do we mainly want to infer (probabilities) of legal sentences sequences bull So far

Or do we want to infer properties of these sentences bull Eg parse tree part-of-speech-taggingbullNeeded for understanding NL

Letrsquos look at some tasks

Sequence Tagging

Part-of-speech taggingbull He drives with his bikebull N V PR PN N noun verb preposition pronoun noun

Text extractionbull The job is that of a programmerbull X X X X X X JobTypebull The seminar is taking place from 1500 to 1600bull X X X X X X Start End

Sequence Tagging

Predicting the secondary structure of proteins mRNA hellipbull X = AFARLMMAhellipbull Y = hehestststhesthe hellip

Parsing

Given a sentence find its parse tree Important step in understanding NL

Parsing

In bioinformatics allows to predict (elements of) structure from sequence

Language models based on Grammars

Grammar Typesbull Regular grammars and Finite State Automatabull Context-Free Grammarsbull Definite Clause Grammars

A particular type of Unification Based Grammar (Prolog)

Distinguish lexicon from grammarbull Lexicon (dictionary) contains information about

words eg word - possible tags (and possibly additional information) flies - V(erb) - N(oun)

bull Grammar encode rules

Grammars and parsing Syntactic level best understood and formalized Derivation of grammatical structure parsing

(more than just recognition) Result of parsing mostly parse tree

showing the constituents of a sentence eg verb or noun phrases

Syntax usually specified in terms of a grammar consisting of grammar rules

Regular Grammars and Finite State Automata

Lexical information - which words are bull Det(erminer)bull N(oun)bull Vi (intransitive verb) - no

argumentbull Pn (pronoun) bull Vt (transitive verb) - takes an

argumentbull Adj (adjective)

Now acceptbull The cat sleptbull Det N Vi

As regular grammarbull S -gt [Det] S1 [ ] terminal bull S1 -gt [N] S2bull S2 -gt [Vi]

Lexicon bull The - Detbull Cat - Nbull Slept - Vi

bull hellip

Finite State Automaton

Sentences

bull John smiles - Pn Vibull The cat disappeared - Det N Vibull These new shoes hurt - Det Adj N Vibull John liked the old cat PN Vt Det Adj N

Phrase structure

S

NP

D N

VP

NPV

D N

PP

P NP

D N

the dog chased a cat into the garden

Notation

S sentence D or Det Determiner (eg articles) N noun V verb P preposition NP noun phrase VP verb phrase PP prepositional phrase

Context Free GrammarS -gt NP VPNP -gt D NVP -gt V NPVP -gt V NP PPPP -gt P NPD -gt [the]D -gt [a]N -gt [dog]N -gt [cat]N -gt [garden]V -gt [chased]V -gt [saw]P -gt [into]

Terminals ~ Lexicon

Phrase structure

Formalism of context-free grammarsbull Nonterminal symbols S NP VP bull Terminal symbols dog cat saw the

Recursionbull bdquoThe girl thought the dog chased the catldquo

VP -gt V SN -gt [girl]V -gt [thought]

Top-down parsing

S -gt NP VP S -gt Det N VP S -gt The N VP S -gt The dog VP S -gt The dog V NP S -gt The dog chased NP S -gt The dog chased Det N S-gt The dog chased the N S-gt The dog chased the cat

Context-free grammarSS --gt --gt NPNPVPVP

NPNP --gt PN --gt PN Proper nounProper noun

NPNP --gt Art Adj N--gt Art Adj N

NPNP --gt ArtN--gt ArtN

VPVP --gt VI --gt VI intransitive verbintransitive verb

VPVP --gt VT --gt VT NPNP transitive verbtransitive verb

ArtArt --gt [the]--gt [the]

AdjAdj --gt [lazy]--gt [lazy]

AdjAdj --gt [rapid]--gt [rapid]

PNPN --gt [achilles]--gt [achilles]

NN --gt [turtle]--gt [turtle]

VIVI --gt [sleeps]--gt [sleeps]

VTVT --gt [beats]--gt [beats]

Parse tree

SS

NPNP VPVP

ArtArt AdjAdj NN VtVt NPNP

PNPN

achillesachillesbeatsbeatsturtleturtlerapidrapidthethe

Definite Clause GrammarsNon-terminals may have arguments

SS --gt --gt NPNP((NN))VPVP((NN))

NP(NP(NN)) --gt Art(--gt Art(NN)N()N(NN))

VP(VP(NN)) --gt VI(--gt VI(NN))

Art(Art(singularsingular)) --gt [a]--gt [a]

Art(Art(singularsingular)) --gt [the]--gt [the]

Art(Art(pluralplural)) --gt [the]--gt [the]

N(N(singularsingular)) --gt [turtle]--gt [turtle]

N(N(pluralplural)) --gt [turtles]--gt [turtles]

VI(VI(singularsingular)) --gt [sleeps]--gt [sleeps]

VI(VI(pluralplural)) --gt [sleep]--gt [sleep]

Number Agreement

DCGs

Non-terminals may have argumentsbull Variables (start with capital)

Eg Number Any

bull Constants (start with lower case) Eg singular plural

bull Structured terms (start with lower case and take arguments themselves) Eg vp(VNP)

Parsing needs to be adapted bull Using unification

Unification in a nutshell (cf AI course)

Substitutions

Eg Num singular T vp(VNP)

Applying substitution bull Simultaneously replace variables by

corresponding termsbull S(Num) Num singular = S(singular)

Unification

Take two non-terminals with arguments and compute (most general) substitution that makes them identical egbull Art(singular) and Art(Num)

Gives Num singular

bull Art(singular) and Art(plural) Fails

bull Art(Num1) and Art(Num2) Num1 Num2

bull PN(Num accusative) and PN(singular Case) Numsingular Caseaccusative

Parsing with DCGs

Now require successful unification at each step

S -gt NP(N) VP(N) S -gt Art(N) N(N) VP(N) Nsingular S -gt a N(singular) VP(singular) S -gt a turtle VP(singular) S -gt a turtle sleeps

S-gt a turtle sleep fails

Case Marking

PN(singularnominative)PN(singularnominative) --gt --gt [he][she][he][she]

PN(singularaccusative)PN(singularaccusative) --gt --gt [him][her][him][her]

PN(pluralnominative)PN(pluralnominative) --gt --gt [they][they]

PN(pluralaccusative)PN(pluralaccusative) --gt --gt [them][them]

S --gt NP(Numbernominative) NP(Number)S --gt NP(Numbernominative) NP(Number)

VP(Number) --gt V(Number) VP(Anyaccusative)VP(Number) --gt V(Number) VP(Anyaccusative)

VP(NumberCase) --gt PN(NumberCase)VP(NumberCase) --gt PN(NumberCase)

VP(NumberAny) --gt Det N(Number)VP(NumberAny) --gt Det N(Number)

He sees her She sees him They see her

But not Them see he

DCGs

Are strictly more expressive than CFGs Can represent for instance

bull S(N) -gt A(N) B(N) C(N)bull A(0) -gt [] bull B(0) -gt []bull C(0) -gt []bull A(s(N)) -gt A(N) [A]bull B(s(N)) -gt B(N) [B]bull C(s(N)) -gt C(N) [C]

Probabilistic Models

Traditional grammar models are very rigid bull essentially a yes no decision

Probabilistic grammarsbull Define a probability models for the databull Compute the probability of each alternativebull Choose the most likely alternative

Ilustrate on bull Shannon Gamebull Spelling correctionbull Parsing

Illustration

Wall Street Journal Corpus 3 000 000 words Correct parse tree for sentences known

bull Constructed by handbull Can be used to derive stochastic context free

grammarsbull SCFG assign probability to parse trees

Compute the most probable parse tree

Sequences are omni-present

Therefore the techniques we will see also apply tobull Bioinformatics

DNA proteins mRNA hellip can all be represented as strings

bullRobotics Sequences of actions states hellip

bullhellip

Rest of the Course

Limitations traditional grammar models motivate probabilistic extensions bull Regular grammars and Finite State Automata

All use principles of Part I on Graphical Models Markov Models using n-gramms (Hidden) Markov Models Conditional Random Fields

bull As an example of using undirected graphical models

bull Probabilistic Context Free Grammarsbull Probabilistic Definite Clause Grammars

  • Advanced Artificial Intelligence
  • Topic
  • Contents
  • Rationalism versus Empiricism
  • Slide 5
  • This course
  • Ambiguity
  • NLP and Statistics
  • Slide 9
  • Corpora
  • Word Counts
  • Slide 12
  • Word Counts (Brown corpus)
  • Slide 14
  • Zipflsquos Law
  • Language and sequences
  • Key NLP Problem Ambiguity
  • Language Model
  • Example of bad language model
  • A bad language model
  • Slide 22
  • A good language model
  • Why language models
  • Applications
  • Spelling errors
  • Handwriting recognition
  • For Spell Checkers
  • Another dimension in language models
  • Sequence Tagging
  • Slide 31
  • Parsing
  • Slide 33
  • Language models based on Grammars
  • Grammars and parsing
  • Regular Grammars and Finite State Automata
  • Finite State Automaton
  • Phrase structure
  • Notation
  • Context Free Grammar
  • Slide 41
  • Top-down parsing
  • Context-free grammar
  • Parse tree
  • Definite Clause Grammars Non-terminals may have arguments
  • DCGs
  • Unification in a nutshell (cf AI course)
  • Unification
  • Parsing with DCGs
  • Case Marking
  • Slide 51
  • Probabilistic Models
  • Illustration
  • PowerPoint Presentation
  • Sequences are omni-present
  • Rest of the Course
Page 22: Part II. Statistical NLP Advanced Artificial Intelligence Introduction and Grammar Models Wolfram Burgard, Luc De Raedt, Bernhard Nebel, Kristian Kersting

A good language model

Non-Probabilisticbull ldquoI swear to tell the truthrdquo is possiblebull ldquoI swerve to smell de souprdquo is impossible

Probabilisticbull P(I swear to tell the truth) ~ 0001bull P(I swerve to smell de soup) ~ 0

Why language models

Consider a Shannon Gamebull Predicting the next word in the sequence

Statistical natural language hellip The cat is thrown out of the hellip The large green hellip Sue swallowed the large green hellip hellip

Model at the sentence level

Applications

Spelling correction Mobile phone texting Speech recognition Handwriting recognition Disabled users hellip

Spelling errors

They are leaving in about fifteen minuets to go to her house

The study was conducted mainly be John Black. Hopefully all with continue smoothly in my absence. Can they lave him my messages? I need to notified the bank of… He is trying to fine out.

Handwriting recognition

Assume a note is given to a bank teller, which the teller reads as "I have a gub" (cf. Woody Allen).

NLP to the rescue …
• "gub" is not a word
• gun, gum, Gus, and gull are words, but "gun" has a higher probability in the context of a bank

For Spell Checkers

Collect a list of commonly substituted words:
• piece/peace, whether/weather, their/there

Example:
"On Tuesday, the whether …"
"On Tuesday, the weather …"
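A spell checker built on such confusion sets can let a language model pick whichever variant is more likely in context. A minimal sketch, with invented bigram counts standing in for the language model:

```python
# Confusion sets of commonly substituted words.
confusion = {"whether": "weather", "weather": "whether",
             "piece": "peace", "peace": "piece"}

# Invented bigram counts standing in for a trained language model.
bigram_counts = {("the", "weather"): 120, ("the", "whether"): 2,
                 ("of", "peace"): 80, ("of", "piece"): 15}

def correct(prev_word, word):
    """Return whichever of {word, its confusion partner} is more likely after prev_word."""
    if word not in confusion:
        return word                      # not in any confusion set: leave as-is
    rival = confusion[word]
    if bigram_counts.get((prev_word, rival), 0) > bigram_counts.get((prev_word, word), 0):
        return rival
    return word

print(correct("the", "whether"))  # the model prefers "weather" after "the"
print(correct("of", "peace"))     # "peace" already wins, so it is kept
```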

Another dimension in language models

Do we mainly want to infer (probabilities of) legal sentences/sequences?
• So far

Or do we want to infer properties of these sentences?
• E.g. parse tree, part-of-speech tagging
• Needed for understanding NL

Let's look at some tasks.

Sequence Tagging

Part-of-speech tagging:
• He drives with his bike
• N V PR PN N (noun, verb, preposition, pronoun, noun)

Text extraction:
• The job is that of a programmer
• X X X X X X JobType
• The seminar is taking place from 15.00 to 16.00
• X X X X X X Start End

Sequence Tagging

Predicting the secondary structure of proteins, mRNA, …
• X = AFARLMMA…
• Y = he he st st st he st he …

Parsing

Given a sentence, find its parse tree. An important step in understanding NL.

Parsing

In bioinformatics, parsing allows one to predict (elements of) structure from a sequence.

Language models based on Grammars

Grammar types:
• Regular grammars and finite state automata
• Context-free grammars
• Definite clause grammars: a particular type of unification-based grammar (Prolog)

Distinguish the lexicon from the grammar:
• Lexicon (dictionary): contains information about words, e.g. word -> possible tags (and possibly additional information); flies -> V(erb) or N(oun)
• Grammar: encodes the rules

Grammars and parsing

The syntactic level is the best understood and formalized. Deriving the grammatical structure of a sentence is parsing (more than just recognition). The result of parsing is mostly a parse tree showing the constituents of a sentence, e.g. verb or noun phrases. Syntax is usually specified in terms of a grammar consisting of grammar rules.

Regular Grammars and Finite State Automata

Lexical information - which words are:
• Det(erminer)
• N(oun)
• Vi (intransitive verb) - takes no argument
• Pn (pronoun)
• Vt (transitive verb) - takes an argument
• Adj (adjective)

Now accept:
• The cat slept
• Det N Vi

As a regular grammar:
• S  -> [Det] S1    ([ ] marks a terminal)
• S1 -> [N] S2
• S2 -> [Vi]

Lexicon:
• the - Det
• cat - N
• slept - Vi
• …

Finite State Automaton

Sentences

• John smiles - Pn Vi
• The cat disappeared - Det N Vi
• These new shoes hurt - Det Adj N Vi
• John liked the old cat - Pn Vt Det Adj N
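A regular grammar like the S/S1/S2 one above is equivalent to a finite state automaton over tag sequences. A minimal sketch; the state names and transition table are my own reconstruction, extended with the Pn and Adj paths suggested by the example sentences:

```python
# FSA for the regular grammar S -> [Det] S1, S1 -> [N] S2, S2 -> [Vi],
# plus Pn subjects and optional adjectives before the noun (my extension).
TRANSITIONS = {
    ("S", "Det"): "S1",
    ("S", "Pn"): "S2",
    ("S1", "Adj"): "S1",   # loop: any number of adjectives
    ("S1", "N"): "S2",
    ("S2", "Vi"): "END",
}

def accepts(tags):
    """Run the automaton over a tag sequence; accept iff it ends in END."""
    state = "S"
    for tag in tags:
        state = TRANSITIONS.get((state, tag))
        if state is None:          # no transition: reject
            return False
    return state == "END"

print(accepts(["Det", "N", "Vi"]))         # The cat slept
print(accepts(["Det", "Adj", "N", "Vi"]))  # These new shoes hurt
print(accepts(["Pn", "Vi"]))               # John smiles
print(accepts(["Det", "Vi"]))              # rejected: noun missing
```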

Phrase structure

Parse tree for "the dog chased a cat into the garden":
[S [NP [D the] [N dog]] [VP [V chased] [NP [D a] [N cat]] [PP [P into] [NP [D the] [N garden]]]]]

Notation

S: sentence; D or Det: determiner (e.g. articles); N: noun; V: verb; P: preposition; NP: noun phrase; VP: verb phrase; PP: prepositional phrase

Context Free Grammar

S  -> NP VP
NP -> D N
VP -> V NP
VP -> V NP PP
PP -> P NP
D -> [the]
D -> [a]
N -> [dog]
N -> [cat]
N -> [garden]
V -> [chased]
V -> [saw]
P -> [into]

Terminals ~ Lexicon

Phrase structure

Formalism of context-free grammars:
• Nonterminal symbols: S, NP, VP, …
• Terminal symbols: dog, cat, saw, the, …

Recursion:
• "The girl thought the dog chased the cat"

VP -> V S
N -> [girl]
V -> [thought]

Top-down parsing

S -> NP VP
S -> Det N VP
S -> The N VP
S -> The dog VP
S -> The dog V NP
S -> The dog chased NP
S -> The dog chased Det N
S -> The dog chased the N
S -> The dog chased the cat
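The trace above can be reproduced with a tiny recursive-descent recognizer over the CFG from the previous slide. A sketch; the dictionary encoding of the grammar is my own:

```python
# CFG from the slides; terminals written as plain strings.
GRAMMAR = {
    "S":  [["NP", "VP"]],
    "NP": [["D", "N"]],
    "VP": [["V", "NP", "PP"], ["V", "NP"]],
    "PP": [["P", "NP"]],
    "D":  [["the"], ["a"]],
    "N":  [["dog"], ["cat"], ["garden"]],
    "V":  [["chased"], ["saw"]],
    "P":  [["into"]],
}

def parse(symbol, words, pos):
    """Top-down: yield every position reachable after deriving words[pos:] from symbol."""
    if symbol not in GRAMMAR:                      # terminal symbol
        if pos < len(words) and words[pos] == symbol:
            yield pos + 1
        return
    for rule in GRAMMAR[symbol]:                   # try each expansion in turn
        positions = [pos]
        for rhs_symbol in rule:
            positions = [q for p in positions for q in parse(rhs_symbol, words, p)]
        yield from positions

def accepts(sentence):
    words = sentence.split()
    return len(words) in parse("S", words, 0)      # S must consume the whole input

print(accepts("the dog chased the cat"))                # True
print(accepts("the dog chased a cat into the garden"))  # True
print(accepts("dog the chased cat"))                    # False
```

This is recognition only; a parser would additionally record which rule was used at each step to build the parse tree.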

Context-free grammar

S  --> NP VP
NP --> PN            (proper noun)
NP --> Art Adj N
NP --> Art N
VP --> VI            (intransitive verb)
VP --> VT NP         (transitive verb)
Art --> [the]
Adj --> [lazy]
Adj --> [rapid]
PN --> [achilles]
N  --> [turtle]
VI --> [sleeps]
VT --> [beats]

Parse tree

Parse tree for "the rapid turtle beats achilles":
[S [NP [Art the] [Adj rapid] [N turtle]] [VP [Vt beats] [NP [PN achilles]]]]

Definite Clause Grammars: non-terminals may have arguments

S     --> NP(N) VP(N)
NP(N) --> Art(N) N(N)
VP(N) --> VI(N)
Art(singular) --> [a]
Art(singular) --> [the]
Art(plural)   --> [the]
N(singular)   --> [turtle]
N(plural)     --> [turtles]
VI(singular)  --> [sleeps]
VI(plural)    --> [sleep]

Number agreement

DCGs

Non-terminals may have arguments:
• Variables (start with a capital), e.g. Number, Any
• Constants (start with lower case), e.g. singular, plural
• Structured terms (start with lower case and take arguments themselves), e.g. vp(V,NP)

Parsing needs to be adapted:
• Using unification

Unification in a nutshell (cf AI course)

Substitutions:
• E.g. {Num/singular}, {T/vp(V,NP)}

Applying a substitution:
• Simultaneously replace variables by the corresponding terms
• S(Num){Num/singular} = S(singular)

Unification

Take two non-terminals with arguments and compute the (most general) substitution that makes them identical, e.g.:
• Art(singular) and Art(Num): gives {Num/singular}
• Art(singular) and Art(plural): fails
• Art(Num1) and Art(Num2): {Num1/Num2}
• PN(Num, accusative) and PN(singular, Case): {Num/singular, Case/accusative}
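These four cases can be reproduced with a small unification routine. A sketch only: modeling variables as capitalized strings and compound terms as tuples is my representation choice, not the course's:

```python
def is_var(t):
    # Convention here: variables start with a capital letter, constants with lower case.
    return isinstance(t, str) and t[:1].isupper()

def unify(t1, t2, subst=None):
    """Return the most general substitution making t1 and t2 identical, or None on failure."""
    if subst is None:
        subst = {}
    t1, t2 = subst.get(t1, t1), subst.get(t2, t2)   # dereference bound variables
    if t1 == t2:
        return subst
    if is_var(t1):
        return {**subst, t1: t2}
    if is_var(t2):
        return {**subst, t2: t1}
    if isinstance(t1, tuple) and isinstance(t2, tuple) and len(t1) == len(t2):
        for a, b in zip(t1, t2):                     # unify argument by argument
            subst = unify(a, b, subst)
            if subst is None:
                return None
        return subst
    return None                                      # constant clash: fails

print(unify(("art", "singular"), ("art", "Num")))    # {'Num': 'singular'}
print(unify(("art", "singular"), ("art", "plural"))) # None (fails)
print(unify(("art", "Num1"), ("art", "Num2")))       # {'Num1': 'Num2'}
print(unify(("pn", "Num", "accusative"), ("pn", "singular", "Case")))
# {'Num': 'singular', 'Case': 'accusative'}
```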

Parsing with DCGs

Now require successful unification at each step

S -> NP(N) VP(N)
S -> Art(N) N(N) VP(N)     {N/singular}
S -> a N(singular) VP(singular)
S -> a turtle VP(singular)
S -> a turtle sleeps

S -> a turtle sleep: fails

Case Marking

PN(singular,nominative) --> [he];[she]
PN(singular,accusative) --> [him];[her]
PN(plural,nominative)   --> [they]
PN(plural,accusative)   --> [them]

S --> NP(Number,nominative) VP(Number)
VP(Number) --> V(Number) NP(Any,accusative)
NP(Number,Case) --> PN(Number,Case)
NP(Number,Any)  --> Det N(Number)

He sees her. She sees him. They see her.
But not: Them see he.

DCGs

DCGs are strictly more expressive than CFGs. They can represent, for instance, the non-context-free language A^n B^n C^n:

S(N)    -> A(N) B(N) C(N)
A(0)    -> []
B(0)    -> []
C(0)    -> []
A(s(N)) -> A(N) [A]
B(s(N)) -> B(N) [B]
C(s(N)) -> C(N) [C]
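A direct way to see what this DCG accepts is to transcribe its rules into a recognizer, with the Peano counter s(s(…0)) replaced by an integer n. A sketch mirroring the rules, not a general DCG interpreter:

```python
# Each non-terminal consumes a prefix of the input and returns the remainder
# (or None on failure); the integer n plays the role of the term s(s(...0)).
def nt(symbol):
    """Build the parser for A/B/C: X(0) -> [], X(s(N)) -> X(N) [symbol]."""
    def parse(s, n):
        if n == 0:
            return s                     # X(0) -> []
        rest = parse(s, n - 1)           # X(s(N)) -> X(N) [symbol]
        if rest is not None and rest[:1] == symbol:
            return rest[1:]
        return None
    return parse

A, B, C = nt("A"), nt("B"), nt("C")

def S(s):
    """S(N) -> A(N) B(N) C(N): the shared N forces equal counts."""
    for n in range(len(s) + 1):
        rest = A(s, n)
        rest = B(rest, n) if rest is not None else None
        rest = C(rest, n) if rest is not None else None
        if rest == "":                   # whole string consumed
            return True
    return False

print(S("AABBCC"))  # True
print(S("AABBC"))   # False (counts differ)
```

The shared argument N across A, B, and C is exactly what a CFG cannot express, which is why A^n B^n C^n is beyond context-free power.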

Probabilistic Models

Traditional grammar models are very rigid:
• essentially a yes/no decision

Probabilistic grammars:
• Define a probability model for the data
• Compute the probability of each alternative
• Choose the most likely alternative

Illustrated on:
• Shannon game
• Spelling correction
• Parsing

Illustration

Wall Street Journal corpus: 3,000,000 words; the correct parse tree for each sentence is known
• Constructed by hand
• Can be used to derive stochastic context-free grammars
• SCFGs assign a probability to parse trees

Compute the most probable parse tree.

Sequences are omni-present

Therefore the techniques we will see also apply to:
• Bioinformatics: DNA, proteins, mRNA, … can all be represented as strings
• Robotics: sequences of actions, states, …
• …

Rest of the Course

Limitations of traditional grammar models motivate probabilistic extensions:
• Regular grammars and finite state automata: Markov models using n-grams, (hidden) Markov models, conditional random fields (as an example of using undirected graphical models)
• Probabilistic context-free grammars
• Probabilistic definite clause grammars

All use the principles of Part I on graphical models.

  • Advanced Artificial Intelligence
  • Topic
  • Contents
  • Rationalism versus Empiricism
  • Slide 5
  • This course
  • Ambiguity
  • NLP and Statistics
  • Slide 9
  • Corpora
  • Word Counts
  • Slide 12
  • Word Counts (Brown corpus)
  • Slide 14
  • Zipflsquos Law
  • Language and sequences
  • Key NLP Problem Ambiguity
  • Language Model
  • Example of bad language model
  • A bad language model
  • Slide 22
  • A good language model
  • Why language models
  • Applications
  • Spelling errors
  • Handwriting recognition
  • For Spell Checkers
  • Another dimension in language models
  • Sequence Tagging
  • Slide 31
  • Parsing
  • Slide 33
  • Language models based on Grammars
  • Grammars and parsing
  • Regular Grammars and Finite State Automata
  • Finite State Automaton
  • Phrase structure
  • Notation
  • Context Free Grammar
  • Slide 41
  • Top-down parsing
  • Context-free grammar
  • Parse tree
  • Definite Clause Grammars Non-terminals may have arguments
  • DCGs
  • Unification in a nutshell (cf AI course)
  • Unification
  • Parsing with DCGs
  • Case Marking
  • Slide 51
  • Probabilistic Models
  • Illustration
  • PowerPoint Presentation
  • Sequences are omni-present
  • Rest of the Course
Page 23: Part II. Statistical NLP Advanced Artificial Intelligence Introduction and Grammar Models Wolfram Burgard, Luc De Raedt, Bernhard Nebel, Kristian Kersting

Why language models

Consider a Shannon Gamebull Predicting the next word in the sequence

Statistical natural language hellip The cat is thrown out of the hellip The large green hellip Sue swallowed the large green hellip hellip

Model at the sentence level

Applications

Spelling correction Mobile phone texting Speech recognition Handwriting recognition Disabled users hellip

Spelling errors

They are leaving in about fifteen minuets to go to her house

The study was conducted mainly be John Black Hopefully all with continue smoothly in my absence Can they lave him my messages I need to notified the bank ofhellip He is trying to fine out

Handwriting recognition

Assume a note is given to a bank teller which the teller reads as I have a gub (cf Woody Allen)

NLP to the rescue hellipbull gub is not a wordbull gun gum Gus and gull are words but gun has a

higher probability in the context of a bank

For Spell Checkers

Collect list of commonly substituted wordsbull piecepeace whetherweather theirthere

ExampleldquoOn Tuesday the whether helliprsquorsquoldquoOn Tuesday the weather helliprdquo

Another dimension in language models

Do we mainly want to infer (probabilities) of legal sentences sequences bull So far

Or do we want to infer properties of these sentences bull Eg parse tree part-of-speech-taggingbullNeeded for understanding NL

Letrsquos look at some tasks

Sequence Tagging

Part-of-speech taggingbull He drives with his bikebull N V PR PN N noun verb preposition pronoun noun

Text extractionbull The job is that of a programmerbull X X X X X X JobTypebull The seminar is taking place from 1500 to 1600bull X X X X X X Start End

Sequence Tagging

Predicting the secondary structure of proteins mRNA hellipbull X = AFARLMMAhellipbull Y = hehestststhesthe hellip

Parsing

Given a sentence find its parse tree Important step in understanding NL

Parsing

In bioinformatics allows to predict (elements of) structure from sequence

Language models based on Grammars

Grammar Typesbull Regular grammars and Finite State Automatabull Context-Free Grammarsbull Definite Clause Grammars

A particular type of Unification Based Grammar (Prolog)

Distinguish lexicon from grammarbull Lexicon (dictionary) contains information about

words eg word - possible tags (and possibly additional information) flies - V(erb) - N(oun)

bull Grammar encode rules

Grammars and parsing Syntactic level best understood and formalized Derivation of grammatical structure parsing

(more than just recognition) Result of parsing mostly parse tree

showing the constituents of a sentence eg verb or noun phrases

Syntax usually specified in terms of a grammar consisting of grammar rules

Regular Grammars and Finite State Automata

Lexical information - which words are bull Det(erminer)bull N(oun)bull Vi (intransitive verb) - no

argumentbull Pn (pronoun) bull Vt (transitive verb) - takes an

argumentbull Adj (adjective)

Now acceptbull The cat sleptbull Det N Vi

As regular grammarbull S -gt [Det] S1 [ ] terminal bull S1 -gt [N] S2bull S2 -gt [Vi]

Lexicon bull The - Detbull Cat - Nbull Slept - Vi

bull hellip

Finite State Automaton

Sentences

bull John smiles - Pn Vibull The cat disappeared - Det N Vibull These new shoes hurt - Det Adj N Vibull John liked the old cat PN Vt Det Adj N

Phrase structure

S

NP

D N

VP

NPV

D N

PP

P NP

D N

the dog chased a cat into the garden

Notation

S sentence D or Det Determiner (eg articles) N noun V verb P preposition NP noun phrase VP verb phrase PP prepositional phrase

Context Free GrammarS -gt NP VPNP -gt D NVP -gt V NPVP -gt V NP PPPP -gt P NPD -gt [the]D -gt [a]N -gt [dog]N -gt [cat]N -gt [garden]V -gt [chased]V -gt [saw]P -gt [into]

Terminals ~ Lexicon

Phrase structure

Formalism of context-free grammarsbull Nonterminal symbols S NP VP bull Terminal symbols dog cat saw the

Recursionbull bdquoThe girl thought the dog chased the catldquo

VP -gt V SN -gt [girl]V -gt [thought]

Top-down parsing

S -gt NP VP S -gt Det N VP S -gt The N VP S -gt The dog VP S -gt The dog V NP S -gt The dog chased NP S -gt The dog chased Det N S-gt The dog chased the N S-gt The dog chased the cat

Context-free grammarSS --gt --gt NPNPVPVP

NPNP --gt PN --gt PN Proper nounProper noun

NPNP --gt Art Adj N--gt Art Adj N

NPNP --gt ArtN--gt ArtN

VPVP --gt VI --gt VI intransitive verbintransitive verb

VPVP --gt VT --gt VT NPNP transitive verbtransitive verb

ArtArt --gt [the]--gt [the]

AdjAdj --gt [lazy]--gt [lazy]

AdjAdj --gt [rapid]--gt [rapid]

PNPN --gt [achilles]--gt [achilles]

NN --gt [turtle]--gt [turtle]

VIVI --gt [sleeps]--gt [sleeps]

VTVT --gt [beats]--gt [beats]

Parse tree

SS

NPNP VPVP

ArtArt AdjAdj NN VtVt NPNP

PNPN

achillesachillesbeatsbeatsturtleturtlerapidrapidthethe

Definite Clause GrammarsNon-terminals may have arguments

SS --gt --gt NPNP((NN))VPVP((NN))

NP(NP(NN)) --gt Art(--gt Art(NN)N()N(NN))

VP(VP(NN)) --gt VI(--gt VI(NN))

Art(Art(singularsingular)) --gt [a]--gt [a]

Art(Art(singularsingular)) --gt [the]--gt [the]

Art(Art(pluralplural)) --gt [the]--gt [the]

N(N(singularsingular)) --gt [turtle]--gt [turtle]

N(N(pluralplural)) --gt [turtles]--gt [turtles]

VI(VI(singularsingular)) --gt [sleeps]--gt [sleeps]

VI(VI(pluralplural)) --gt [sleep]--gt [sleep]

Number Agreement

DCGs

Non-terminals may have argumentsbull Variables (start with capital)

Eg Number Any

bull Constants (start with lower case) Eg singular plural

bull Structured terms (start with lower case and take arguments themselves) Eg vp(VNP)

Parsing needs to be adapted bull Using unification

Unification in a nutshell (cf AI course)

Substitutions

Eg Num singular T vp(VNP)

Applying substitution bull Simultaneously replace variables by

corresponding termsbull S(Num) Num singular = S(singular)

Unification

Take two non-terminals with arguments and compute (most general) substitution that makes them identical egbull Art(singular) and Art(Num)

Gives Num singular

bull Art(singular) and Art(plural) Fails

bull Art(Num1) and Art(Num2) Num1 Num2

bull PN(Num accusative) and PN(singular Case) Numsingular Caseaccusative

Parsing with DCGs

Now require successful unification at each step

S -gt NP(N) VP(N) S -gt Art(N) N(N) VP(N) Nsingular S -gt a N(singular) VP(singular) S -gt a turtle VP(singular) S -gt a turtle sleeps

S-gt a turtle sleep fails

Case Marking

PN(singularnominative)PN(singularnominative) --gt --gt [he][she][he][she]

PN(singularaccusative)PN(singularaccusative) --gt --gt [him][her][him][her]

PN(pluralnominative)PN(pluralnominative) --gt --gt [they][they]

PN(pluralaccusative)PN(pluralaccusative) --gt --gt [them][them]

S --gt NP(Numbernominative) NP(Number)S --gt NP(Numbernominative) NP(Number)

VP(Number) --gt V(Number) VP(Anyaccusative)VP(Number) --gt V(Number) VP(Anyaccusative)

VP(NumberCase) --gt PN(NumberCase)VP(NumberCase) --gt PN(NumberCase)

VP(NumberAny) --gt Det N(Number)VP(NumberAny) --gt Det N(Number)

He sees her She sees him They see her

But not Them see he

DCGs

Are strictly more expressive than CFGs Can represent for instance

bull S(N) -gt A(N) B(N) C(N)bull A(0) -gt [] bull B(0) -gt []bull C(0) -gt []bull A(s(N)) -gt A(N) [A]bull B(s(N)) -gt B(N) [B]bull C(s(N)) -gt C(N) [C]

Probabilistic Models

Traditional grammar models are very rigid bull essentially a yes no decision

Probabilistic grammarsbull Define a probability models for the databull Compute the probability of each alternativebull Choose the most likely alternative

Ilustrate on bull Shannon Gamebull Spelling correctionbull Parsing

Illustration

Wall Street Journal Corpus 3 000 000 words Correct parse tree for sentences known

bull Constructed by handbull Can be used to derive stochastic context free

grammarsbull SCFG assign probability to parse trees

Compute the most probable parse tree

Sequences are omni-present

Therefore the techniques we will see also apply tobull Bioinformatics

DNA proteins mRNA hellip can all be represented as strings

bullRobotics Sequences of actions states hellip

bullhellip

Rest of the Course

Limitations traditional grammar models motivate probabilistic extensions bull Regular grammars and Finite State Automata

All use principles of Part I on Graphical Models Markov Models using n-gramms (Hidden) Markov Models Conditional Random Fields

bull As an example of using undirected graphical models

bull Probabilistic Context Free Grammarsbull Probabilistic Definite Clause Grammars

  • Advanced Artificial Intelligence
  • Topic
  • Contents
  • Rationalism versus Empiricism
  • Slide 5
  • This course
  • Ambiguity
  • NLP and Statistics
  • Slide 9
  • Corpora
  • Word Counts
  • Slide 12
  • Word Counts (Brown corpus)
  • Slide 14
  • Zipflsquos Law
  • Language and sequences
  • Key NLP Problem Ambiguity
  • Language Model
  • Example of bad language model
  • A bad language model
  • Slide 22
  • A good language model
  • Why language models
  • Applications
  • Spelling errors
  • Handwriting recognition
  • For Spell Checkers
  • Another dimension in language models
  • Sequence Tagging
  • Slide 31
  • Parsing
  • Slide 33
  • Language models based on Grammars
  • Grammars and parsing
  • Regular Grammars and Finite State Automata
  • Finite State Automaton
  • Phrase structure
  • Notation
  • Context Free Grammar
  • Slide 41
  • Top-down parsing
  • Context-free grammar
  • Parse tree
  • Definite Clause Grammars Non-terminals may have arguments
  • DCGs
  • Unification in a nutshell (cf AI course)
  • Unification
  • Parsing with DCGs
  • Case Marking
  • Slide 51
  • Probabilistic Models
  • Illustration
  • PowerPoint Presentation
  • Sequences are omni-present
  • Rest of the Course
Page 24: Part II. Statistical NLP Advanced Artificial Intelligence Introduction and Grammar Models Wolfram Burgard, Luc De Raedt, Bernhard Nebel, Kristian Kersting

Applications

Spelling correction Mobile phone texting Speech recognition Handwriting recognition Disabled users hellip

Spelling errors

They are leaving in about fifteen minuets to go to her house

The study was conducted mainly be John Black Hopefully all with continue smoothly in my absence Can they lave him my messages I need to notified the bank ofhellip He is trying to fine out

Handwriting recognition

Assume a note is given to a bank teller which the teller reads as I have a gub (cf Woody Allen)

NLP to the rescue hellipbull gub is not a wordbull gun gum Gus and gull are words but gun has a

higher probability in the context of a bank

For Spell Checkers

Collect list of commonly substituted wordsbull piecepeace whetherweather theirthere

ExampleldquoOn Tuesday the whether helliprsquorsquoldquoOn Tuesday the weather helliprdquo

Another dimension in language models

Do we mainly want to infer (probabilities) of legal sentences sequences bull So far

Or do we want to infer properties of these sentences bull Eg parse tree part-of-speech-taggingbullNeeded for understanding NL

Letrsquos look at some tasks

Sequence Tagging

Part-of-speech taggingbull He drives with his bikebull N V PR PN N noun verb preposition pronoun noun

Text extractionbull The job is that of a programmerbull X X X X X X JobTypebull The seminar is taking place from 1500 to 1600bull X X X X X X Start End

Sequence Tagging

Predicting the secondary structure of proteins mRNA hellipbull X = AFARLMMAhellipbull Y = hehestststhesthe hellip

Parsing

Given a sentence find its parse tree Important step in understanding NL

Parsing

In bioinformatics allows to predict (elements of) structure from sequence

Language models based on Grammars

Grammar Typesbull Regular grammars and Finite State Automatabull Context-Free Grammarsbull Definite Clause Grammars

A particular type of Unification Based Grammar (Prolog)

Distinguish lexicon from grammarbull Lexicon (dictionary) contains information about

words eg word - possible tags (and possibly additional information) flies - V(erb) - N(oun)

bull Grammar encode rules

Grammars and parsing Syntactic level best understood and formalized Derivation of grammatical structure parsing

(more than just recognition) Result of parsing mostly parse tree

showing the constituents of a sentence eg verb or noun phrases

Syntax usually specified in terms of a grammar consisting of grammar rules

Regular Grammars and Finite State Automata

Lexical information - which words are bull Det(erminer)bull N(oun)bull Vi (intransitive verb) - no

argumentbull Pn (pronoun) bull Vt (transitive verb) - takes an

argumentbull Adj (adjective)

Now acceptbull The cat sleptbull Det N Vi

As regular grammarbull S -gt [Det] S1 [ ] terminal bull S1 -gt [N] S2bull S2 -gt [Vi]

Lexicon bull The - Detbull Cat - Nbull Slept - Vi

bull hellip

Finite State Automaton

Sentences

bull John smiles - Pn Vibull The cat disappeared - Det N Vibull These new shoes hurt - Det Adj N Vibull John liked the old cat PN Vt Det Adj N

Phrase structure

S

NP

D N

VP

NPV

D N

PP

P NP

D N

the dog chased a cat into the garden

Notation

S sentence D or Det Determiner (eg articles) N noun V verb P preposition NP noun phrase VP verb phrase PP prepositional phrase

Context Free GrammarS -gt NP VPNP -gt D NVP -gt V NPVP -gt V NP PPPP -gt P NPD -gt [the]D -gt [a]N -gt [dog]N -gt [cat]N -gt [garden]V -gt [chased]V -gt [saw]P -gt [into]

Terminals ~ Lexicon

Phrase structure

Formalism of context-free grammarsbull Nonterminal symbols S NP VP bull Terminal symbols dog cat saw the

Recursionbull bdquoThe girl thought the dog chased the catldquo

VP -gt V SN -gt [girl]V -gt [thought]

Top-down parsing

S -gt NP VP S -gt Det N VP S -gt The N VP S -gt The dog VP S -gt The dog V NP S -gt The dog chased NP S -gt The dog chased Det N S-gt The dog chased the N S-gt The dog chased the cat

Context-free grammarSS --gt --gt NPNPVPVP

NPNP --gt PN --gt PN Proper nounProper noun

NPNP --gt Art Adj N--gt Art Adj N

NPNP --gt ArtN--gt ArtN

VPVP --gt VI --gt VI intransitive verbintransitive verb

VPVP --gt VT --gt VT NPNP transitive verbtransitive verb

ArtArt --gt [the]--gt [the]

AdjAdj --gt [lazy]--gt [lazy]

AdjAdj --gt [rapid]--gt [rapid]

PNPN --gt [achilles]--gt [achilles]

NN --gt [turtle]--gt [turtle]

VIVI --gt [sleeps]--gt [sleeps]

VTVT --gt [beats]--gt [beats]

Parse tree

SS

NPNP VPVP

ArtArt AdjAdj NN VtVt NPNP

PNPN

achillesachillesbeatsbeatsturtleturtlerapidrapidthethe

Definite Clause GrammarsNon-terminals may have arguments

SS --gt --gt NPNP((NN))VPVP((NN))

NP(NP(NN)) --gt Art(--gt Art(NN)N()N(NN))

VP(VP(NN)) --gt VI(--gt VI(NN))

Art(Art(singularsingular)) --gt [a]--gt [a]

Art(Art(singularsingular)) --gt [the]--gt [the]

Art(Art(pluralplural)) --gt [the]--gt [the]

N(N(singularsingular)) --gt [turtle]--gt [turtle]

N(N(pluralplural)) --gt [turtles]--gt [turtles]

VI(VI(singularsingular)) --gt [sleeps]--gt [sleeps]

VI(VI(pluralplural)) --gt [sleep]--gt [sleep]

Number Agreement

DCGs

Non-terminals may have argumentsbull Variables (start with capital)

Eg Number Any

bull Constants (start with lower case) Eg singular plural

bull Structured terms (start with lower case and take arguments themselves) Eg vp(VNP)

Parsing needs to be adapted bull Using unification

Unification in a nutshell (cf AI course)

Substitutions

Eg Num singular T vp(VNP)

Applying substitution bull Simultaneously replace variables by

corresponding termsbull S(Num) Num singular = S(singular)

Unification

Take two non-terminals with arguments and compute (most general) substitution that makes them identical egbull Art(singular) and Art(Num)

Gives Num singular

bull Art(singular) and Art(plural) Fails

bull Art(Num1) and Art(Num2) Num1 Num2

bull PN(Num accusative) and PN(singular Case) Numsingular Caseaccusative

Parsing with DCGs

Now require successful unification at each step

S -gt NP(N) VP(N) S -gt Art(N) N(N) VP(N) Nsingular S -gt a N(singular) VP(singular) S -gt a turtle VP(singular) S -gt a turtle sleeps

S-gt a turtle sleep fails

Case Marking

PN(singularnominative)PN(singularnominative) --gt --gt [he][she][he][she]

PN(singularaccusative)PN(singularaccusative) --gt --gt [him][her][him][her]

PN(pluralnominative)PN(pluralnominative) --gt --gt [they][they]

PN(pluralaccusative)PN(pluralaccusative) --gt --gt [them][them]

S --gt NP(Numbernominative) NP(Number)S --gt NP(Numbernominative) NP(Number)

VP(Number) --gt V(Number) VP(Anyaccusative)VP(Number) --gt V(Number) VP(Anyaccusative)

VP(NumberCase) --gt PN(NumberCase)VP(NumberCase) --gt PN(NumberCase)

VP(NumberAny) --gt Det N(Number)VP(NumberAny) --gt Det N(Number)

He sees her She sees him They see her

But not Them see he

DCGs

Are strictly more expressive than CFGs Can represent for instance

bull S(N) -gt A(N) B(N) C(N)bull A(0) -gt [] bull B(0) -gt []bull C(0) -gt []bull A(s(N)) -gt A(N) [A]bull B(s(N)) -gt B(N) [B]bull C(s(N)) -gt C(N) [C]

Probabilistic Models

Traditional grammar models are very rigid bull essentially a yes no decision

Probabilistic grammarsbull Define a probability models for the databull Compute the probability of each alternativebull Choose the most likely alternative

Ilustrate on bull Shannon Gamebull Spelling correctionbull Parsing

Illustration

Wall Street Journal Corpus 3 000 000 words Correct parse tree for sentences known

bull Constructed by handbull Can be used to derive stochastic context free

grammarsbull SCFG assign probability to parse trees

Compute the most probable parse tree

Sequences are omni-present

Therefore the techniques we will see also apply tobull Bioinformatics

DNA proteins mRNA hellip can all be represented as strings

bullRobotics Sequences of actions states hellip

bullhellip

Rest of the Course

Limitations traditional grammar models motivate probabilistic extensions bull Regular grammars and Finite State Automata

All use principles of Part I on Graphical Models Markov Models using n-gramms (Hidden) Markov Models Conditional Random Fields

bull As an example of using undirected graphical models

bull Probabilistic Context Free Grammarsbull Probabilistic Definite Clause Grammars

  • Advanced Artificial Intelligence
  • Topic
  • Contents
  • Rationalism versus Empiricism
  • Slide 5
  • This course
  • Ambiguity
  • NLP and Statistics
  • Slide 9
  • Corpora
  • Word Counts
  • Slide 12
  • Word Counts (Brown corpus)
  • Slide 14
  • Zipflsquos Law
  • Language and sequences
  • Key NLP Problem Ambiguity
  • Language Model
  • Example of bad language model
  • A bad language model
  • Slide 22
  • A good language model
  • Why language models
  • Applications
  • Spelling errors
  • Handwriting recognition
  • For Spell Checkers
  • Another dimension in language models
  • Sequence Tagging
  • Slide 31
  • Parsing
  • Slide 33
  • Language models based on Grammars
  • Grammars and parsing
  • Regular Grammars and Finite State Automata
  • Finite State Automaton
  • Phrase structure
  • Notation
  • Context Free Grammar
  • Slide 41
  • Top-down parsing
  • Context-free grammar
  • Parse tree
  • Definite Clause Grammars Non-terminals may have arguments
  • DCGs
  • Unification in a nutshell (cf AI course)
  • Unification
  • Parsing with DCGs
  • Case Marking
  • Slide 51
  • Probabilistic Models
  • Illustration
  • PowerPoint Presentation
  • Sequences are omni-present
  • Rest of the Course
Page 25: Part II. Statistical NLP Advanced Artificial Intelligence Introduction and Grammar Models Wolfram Burgard, Luc De Raedt, Bernhard Nebel, Kristian Kersting

Spelling errors

They are leaving in about fifteen minuets to go to her house

The study was conducted mainly be John Black Hopefully all with continue smoothly in my absence Can they lave him my messages I need to notified the bank ofhellip He is trying to fine out

Handwriting recognition

Assume a note is given to a bank teller which the teller reads as I have a gub (cf Woody Allen)

NLP to the rescue hellipbull gub is not a wordbull gun gum Gus and gull are words but gun has a

higher probability in the context of a bank

For Spell Checkers

Collect list of commonly substituted wordsbull piecepeace whetherweather theirthere

ExampleldquoOn Tuesday the whether helliprsquorsquoldquoOn Tuesday the weather helliprdquo

Another dimension in language models

Do we mainly want to infer (probabilities) of legal sentences sequences bull So far

Or do we want to infer properties of these sentences bull Eg parse tree part-of-speech-taggingbullNeeded for understanding NL

Letrsquos look at some tasks

Sequence Tagging

Part-of-speech taggingbull He drives with his bikebull N V PR PN N noun verb preposition pronoun noun

Text extractionbull The job is that of a programmerbull X X X X X X JobTypebull The seminar is taking place from 1500 to 1600bull X X X X X X Start End

Sequence Tagging

Predicting the secondary structure of proteins mRNA hellipbull X = AFARLMMAhellipbull Y = hehestststhesthe hellip

Parsing

Given a sentence find its parse tree Important step in understanding NL

Parsing

In bioinformatics allows to predict (elements of) structure from sequence

Language models based on Grammars

Grammar Typesbull Regular grammars and Finite State Automatabull Context-Free Grammarsbull Definite Clause Grammars

A particular type of Unification Based Grammar (Prolog)

Distinguish lexicon from grammarbull Lexicon (dictionary) contains information about

words eg word - possible tags (and possibly additional information) flies - V(erb) - N(oun)

bull Grammar encode rules

Grammars and parsing Syntactic level best understood and formalized Derivation of grammatical structure parsing

(more than just recognition) Result of parsing mostly parse tree

showing the constituents of a sentence eg verb or noun phrases

Syntax usually specified in terms of a grammar consisting of grammar rules

Regular Grammars and Finite State Automata

Lexical information - which words are bull Det(erminer)bull N(oun)bull Vi (intransitive verb) - no

argumentbull Pn (pronoun) bull Vt (transitive verb) - takes an

argumentbull Adj (adjective)

Now acceptbull The cat sleptbull Det N Vi

As regular grammarbull S -gt [Det] S1 [ ] terminal bull S1 -gt [N] S2bull S2 -gt [Vi]

Lexicon bull The - Detbull Cat - Nbull Slept - Vi

bull hellip

Finite State Automaton

Sentences

bull John smiles - Pn Vibull The cat disappeared - Det N Vibull These new shoes hurt - Det Adj N Vibull John liked the old cat PN Vt Det Adj N

Phrase structure

S

NP

D N

VP

NPV

D N

PP

P NP

D N

the dog chased a cat into the garden

Notation

S sentence D or Det Determiner (eg articles) N noun V verb P preposition NP noun phrase VP verb phrase PP prepositional phrase

Context Free Grammar
S  -> NP VP
NP -> D N
VP -> V NP
VP -> V NP PP
PP -> P NP
D  -> [the]
D  -> [a]
N  -> [dog]
N  -> [cat]
N  -> [garden]
V  -> [chased]
V  -> [saw]
P  -> [into]

Terminals ~ Lexicon

Phrase structure

Formalism of context-free grammars:
• Nonterminal symbols: S, NP, VP, …
• Terminal symbols: dog, cat, saw, the, …

Recursion:
• "The girl thought the dog chased the cat"

VP -> V S
N  -> [girl]
V  -> [thought]

Top-down parsing

S -> NP VP
  -> Det N VP
  -> The N VP
  -> The dog VP
  -> The dog V NP
  -> The dog chased NP
  -> The dog chased Det N
  -> The dog chased the N
  -> The dog chased the cat
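This derivation can be mechanized as a small top-down (recursive-descent) recognizer. The sketch below uses the CFG and lexicon of the previous slides; it is illustrative only (it returns accept/reject rather than a parse tree):

```python
# Toy top-down recognizer for the CFG above.
GRAMMAR = {
    "S":  [["NP", "VP"]],
    "NP": [["D", "N"]],
    "VP": [["V", "NP", "PP"], ["V", "NP"]],
    "PP": [["P", "NP"]],
}
LEXICON = {
    "D": {"the", "a"},
    "N": {"dog", "cat", "garden"},
    "V": {"chased", "saw"},
    "P": {"into"},
}

def parse(symbol, words):
    """Yield every remainder of `words` after deriving `symbol` from its front."""
    if symbol in LEXICON:
        if words and words[0] in LEXICON[symbol]:
            yield words[1:]
        return
    for rhs in GRAMMAR[symbol]:
        rests = [words]
        for sym in rhs:  # expand the rule body left to right
            rests = [rest2 for rest in rests for rest2 in parse(sym, rest)]
        yield from rests

def recognize(sentence):
    return any(rest == [] for rest in parse("S", sentence.split()))
```

`recognize("the dog chased the cat")` succeeds exactly when some derivation consumes the whole word list, mirroring the step-by-step expansion shown above.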

Context-free grammar
S   --> NP VP
NP  --> PN          (proper noun)
NP  --> Art Adj N
NP  --> Art N
VP  --> VI          (intransitive verb)
VP  --> VT NP       (transitive verb)
Art --> [the]
Adj --> [lazy]
Adj --> [rapid]
PN  --> [achilles]
N   --> [turtle]
VI  --> [sleeps]
VT  --> [beats]

Parse tree

[S [NP [Art the] [Adj rapid] [N turtle]]
   [VP [Vt beats] [NP [PN achilles]]]]

(parse tree of "the rapid turtle beats achilles")

Definite Clause Grammars: non-terminals may have arguments

S     --> NP(N) VP(N)
NP(N) --> Art(N) N(N)
VP(N) --> VI(N)
Art(singular) --> [a]
Art(singular) --> [the]
Art(plural)   --> [the]
N(singular)   --> [turtle]
N(plural)     --> [turtles]
VI(singular)  --> [sleeps]
VI(plural)    --> [sleep]

Number Agreement

DCGs

Non-terminals may have arguments:
• Variables (start with a capital), e.g. Number, Any
• Constants (start with lower case), e.g. singular, plural
• Structured terms (start with lower case and take arguments themselves), e.g. vp(V,NP)

Parsing needs to be adapted:
• using unification

Unification in a nutshell (cf AI course)

Substitutions, e.g. {Num/singular}, {T/vp(V,NP)}

Applying a substitution:
• Simultaneously replace variables by the corresponding terms
• S(Num) {Num/singular} = S(singular)

Unification

Take two non-terminals with arguments and compute the (most general) substitution that makes them identical, e.g.:
• Art(singular) and Art(Num): gives {Num/singular}
• Art(singular) and Art(plural): fails
• Art(Num1) and Art(Num2): {Num1/Num2}
• PN(Num, accusative) and PN(singular, Case): {Num/singular, Case/accusative}
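The unification operation itself fits in a few lines. A sketch under two assumptions: terms are tuples such as ("Art", "singular"), and variables are strings marked with a leading "?" (the slides capitalize variables, but non-terminal names are capitalized too, so a distinct marker avoids the clash):

```python
def is_var(t):
    # Assumption: "?"-prefixed strings play the role of Prolog variables.
    return isinstance(t, str) and t.startswith("?")

def walk(t, subst):
    # Follow variable bindings to their current value.
    while is_var(t) and t in subst:
        t = subst[t]
    return t

def unify(t1, t2, subst=None):
    """Return the most general substitution making t1 and t2 identical, or None."""
    subst = dict(subst or {})
    t1, t2 = walk(t1, subst), walk(t2, subst)
    if t1 == t2:
        return subst
    if is_var(t1):
        subst[t1] = t2
        return subst
    if is_var(t2):
        subst[t2] = t1
        return subst
    if isinstance(t1, tuple) and isinstance(t2, tuple) and len(t1) == len(t2):
        for a, b in zip(t1, t2):
            subst = unify(a, b, subst)
            if subst is None:
                return None
        return subst
    return None  # clash, e.g. singular vs plural: fail
```

This reproduces the slide's examples: `unify(("Art","singular"), ("Art","?Num"))` gives `{"?Num": "singular"}`, while `unify(("Art","singular"), ("Art","plural"))` fails with `None`.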

Parsing with DCGs

Now require successful unification at each step

S -> NP(N) VP(N)
  -> Art(N) N(N) VP(N)      {N/singular}
  -> a N(singular) VP(singular)
  -> a turtle VP(singular)
  -> a turtle sleeps

S -> a turtle sleep: fails
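The effect of the shared number feature can be checked with a toy recognizer for this Art-N-VI fragment. A sketch: the lexicon marks "the" as unspecified for number, an assumption standing in for the two Art(...) --> [the] rules above.

```python
# Toy lexicon for the number-agreement fragment; None = either number.
LEX = {
    "a": ("Art", "singular"),
    "the": ("Art", None),
    "turtle": ("N", "singular"),
    "turtles": ("N", "plural"),
    "sleeps": ("VI", "singular"),
    "sleep": ("VI", "plural"),
}

def agrees(sentence):
    """Accept Art N VI sentences whose number features unify."""
    entries = [LEX[w] for w in sentence.split()]
    if [cat for cat, _ in entries] != ["Art", "N", "VI"]:
        return False
    # All specified number features must unify to a single value.
    numbers = {num for _, num in entries if num is not None}
    return len(numbers) <= 1
```

As in the derivation above, "a turtle sleeps" is accepted and "a turtle sleep" is rejected, while the underspecified "the" combines with either number.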

Case Marking

PN(singular, nominative) --> [he] ; [she]
PN(singular, accusative) --> [him] ; [her]
PN(plural, nominative)   --> [they]
PN(plural, accusative)   --> [them]

S --> NP(Number, nominative) VP(Number)
VP(Number) --> V(Number) NP(Any, accusative)
NP(Number, Case) --> PN(Number, Case)
NP(Number, Any)  --> Det N(Number)

He sees her. She sees him. They see her.
But not: Them see he.

DCGs

Are strictly more expressive than CFGs. They can represent, for instance, the non-context-free language A^n B^n C^n:

S(N)    -> A(N) B(N) C(N)
A(0)    -> []
B(0)    -> []
C(0)    -> []
A(s(N)) -> A(N) [A]
B(s(N)) -> B(N) [B]
C(s(N)) -> C(N) [C]
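A quick way to see what this DCG generates: the argument N is a counter s(s(...0)), so S(N) yields N copies of A, then N of B, then N of C. A sketch with the successor terms replaced by a plain integer (that replacement is the only assumption):

```python
def block(symbol, n):
    # A(s(N)) -> A(N) [A] emits one terminal per counter step,
    # so A(N) derives exactly n copies of the terminal.
    return [symbol] * n

def generate(n):
    # S(N) -> A(N) B(N) C(N): equal counts in all three blocks.
    return block("A", n) + block("B", n) + block("C", n)
```

`generate(2)` gives `["A", "A", "B", "B", "C", "C"]`; the shared counter is what a CFG cannot enforce across three blocks.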

Probabilistic Models

Traditional grammar models are very rigid:
• essentially a yes/no decision

Probabilistic grammars:
• Define a probability model for the data
• Compute the probability of each alternative
• Choose the most likely alternative

Illustrated on:
• the Shannon game
• spelling correction
• parsing

Illustration

Wall Street Journal Corpus: 3,000,000 words, with the correct parse tree for each sentence known
• Constructed by hand
• Can be used to derive stochastic context-free grammars
• SCFGs assign a probability to parse trees

Compute the most probable parse tree
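The SCFG idea can be made concrete: each rule carries a probability, and a parse tree's probability is the product of the probabilities of the rules it uses. A sketch with invented rule probabilities (the numbers, and treating lexical leaves as probability 1, are assumptions for illustration):

```python
# Invented rule probabilities; rules with the same left-hand side sum to 1.
RULE_PROB = {
    ("S", ("NP", "VP")): 1.0,
    ("NP", ("D", "N")): 0.7,
    ("NP", ("PN",)): 0.3,
    ("VP", ("V", "NP")): 0.6,
    ("VP", ("V", "NP", "PP")): 0.4,
}

def tree_prob(tree):
    """tree = (label, children) with children a list of subtrees,
    or (label, word) for a lexical leaf (leaf probability taken as 1)."""
    label, children = tree
    if isinstance(children, str):
        return 1.0
    rhs = tuple(child[0] for child in children)
    p = RULE_PROB[(label, rhs)]  # probability of the rule used at this node
    for child in children:
        p *= tree_prob(child)
    return p
```

For the parse of "the dog chased a cat" this gives 1.0 * 0.7 * 0.6 * 0.7 = 0.294; computing the most probable parse tree means maximizing this product over all trees the grammar allows for the sentence.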

Sequences are omni-present

Therefore the techniques we will see also apply to:
• Bioinformatics: DNA, proteins, mRNA, … can all be represented as strings
• Robotics: sequences of actions, states, …
• …

Rest of the Course

Limitations of traditional grammar models motivate probabilistic extensions:
• Regular grammars and Finite State Automata: Markov Models using n-grams, (Hidden) Markov Models, Conditional Random Fields (as an example of using undirected graphical models)
• Probabilistic Context-Free Grammars
• Probabilistic Definite Clause Grammars

All use the principles of Part I on Graphical Models.

  • Advanced Artificial Intelligence
  • Topic
  • Contents
  • Rationalism versus Empiricism
  • Slide 5
  • This course
  • Ambiguity
  • NLP and Statistics
  • Slide 9
  • Corpora
  • Word Counts
  • Slide 12
  • Word Counts (Brown corpus)
  • Slide 14
  • Zipf's Law
  • Language and sequences
  • Key NLP Problem Ambiguity
  • Language Model
  • Example of bad language model
  • A bad language model
  • Slide 22
  • A good language model
  • Why language models
  • Applications
  • Spelling errors
  • Handwriting recognition
  • For Spell Checkers
  • Another dimension in language models
  • Sequence Tagging
  • Slide 31
  • Parsing
  • Slide 33
  • Language models based on Grammars
  • Grammars and parsing
  • Regular Grammars and Finite State Automata
  • Finite State Automaton
  • Phrase structure
  • Notation
  • Context Free Grammar
  • Slide 41
  • Top-down parsing
  • Context-free grammar
  • Parse tree
  • Definite Clause Grammars Non-terminals may have arguments
  • DCGs
  • Unification in a nutshell (cf AI course)
  • Unification
  • Parsing with DCGs
  • Case Marking
  • Slide 51
  • Probabilistic Models
  • Illustration
  • PowerPoint Presentation
  • Sequences are omni-present
  • Rest of the Course
Page 29: Part II. Statistical NLP Advanced Artificial Intelligence Introduction and Grammar Models Wolfram Burgard, Luc De Raedt, Bernhard Nebel, Kristian Kersting

Sequence Tagging

Part-of-speech taggingbull He drives with his bikebull N V PR PN N noun verb preposition pronoun noun

Text extractionbull The job is that of a programmerbull X X X X X X JobTypebull The seminar is taking place from 1500 to 1600bull X X X X X X Start End

Sequence Tagging

Predicting the secondary structure of proteins mRNA hellipbull X = AFARLMMAhellipbull Y = hehestststhesthe hellip

Parsing

Given a sentence find its parse tree Important step in understanding NL

Parsing

In bioinformatics allows to predict (elements of) structure from sequence

Language models based on Grammars

Grammar Typesbull Regular grammars and Finite State Automatabull Context-Free Grammarsbull Definite Clause Grammars

A particular type of Unification Based Grammar (Prolog)

Distinguish lexicon from grammarbull Lexicon (dictionary) contains information about

words eg word - possible tags (and possibly additional information) flies - V(erb) - N(oun)

bull Grammar encode rules

Grammars and parsing Syntactic level best understood and formalized Derivation of grammatical structure parsing

(more than just recognition) Result of parsing mostly parse tree

showing the constituents of a sentence eg verb or noun phrases

Syntax usually specified in terms of a grammar consisting of grammar rules

Regular Grammars and Finite State Automata

Lexical information - which words are bull Det(erminer)bull N(oun)bull Vi (intransitive verb) - no

argumentbull Pn (pronoun) bull Vt (transitive verb) - takes an

argumentbull Adj (adjective)

Now acceptbull The cat sleptbull Det N Vi

As regular grammarbull S -gt [Det] S1 [ ] terminal bull S1 -gt [N] S2bull S2 -gt [Vi]

Lexicon bull The - Detbull Cat - Nbull Slept - Vi

bull hellip

Finite State Automaton

Sentences

bull John smiles - Pn Vibull The cat disappeared - Det N Vibull These new shoes hurt - Det Adj N Vibull John liked the old cat PN Vt Det Adj N

Phrase structure

S

NP

D N

VP

NPV

D N

PP

P NP

D N

the dog chased a cat into the garden

Notation

S sentence D or Det Determiner (eg articles) N noun V verb P preposition NP noun phrase VP verb phrase PP prepositional phrase

Context Free GrammarS -gt NP VPNP -gt D NVP -gt V NPVP -gt V NP PPPP -gt P NPD -gt [the]D -gt [a]N -gt [dog]N -gt [cat]N -gt [garden]V -gt [chased]V -gt [saw]P -gt [into]

Terminals ~ Lexicon

Phrase structure

Formalism of context-free grammarsbull Nonterminal symbols S NP VP bull Terminal symbols dog cat saw the

Recursionbull bdquoThe girl thought the dog chased the catldquo

VP -gt V SN -gt [girl]V -gt [thought]

Top-down parsing

S -gt NP VP S -gt Det N VP S -gt The N VP S -gt The dog VP S -gt The dog V NP S -gt The dog chased NP S -gt The dog chased Det N S-gt The dog chased the N S-gt The dog chased the cat

Context-free grammarSS --gt --gt NPNPVPVP

NPNP --gt PN --gt PN Proper nounProper noun

NPNP --gt Art Adj N--gt Art Adj N

NPNP --gt ArtN--gt ArtN

VPVP --gt VI --gt VI intransitive verbintransitive verb

VPVP --gt VT --gt VT NPNP transitive verbtransitive verb

ArtArt --gt [the]--gt [the]

AdjAdj --gt [lazy]--gt [lazy]

AdjAdj --gt [rapid]--gt [rapid]

PNPN --gt [achilles]--gt [achilles]

NN --gt [turtle]--gt [turtle]

VIVI --gt [sleeps]--gt [sleeps]

VTVT --gt [beats]--gt [beats]

Parse tree

SS

NPNP VPVP

ArtArt AdjAdj NN VtVt NPNP

PNPN

achillesachillesbeatsbeatsturtleturtlerapidrapidthethe

Definite Clause GrammarsNon-terminals may have arguments

SS --gt --gt NPNP((NN))VPVP((NN))

NP(NP(NN)) --gt Art(--gt Art(NN)N()N(NN))

VP(VP(NN)) --gt VI(--gt VI(NN))

Art(Art(singularsingular)) --gt [a]--gt [a]

Art(Art(singularsingular)) --gt [the]--gt [the]

Art(Art(pluralplural)) --gt [the]--gt [the]

N(N(singularsingular)) --gt [turtle]--gt [turtle]

N(N(pluralplural)) --gt [turtles]--gt [turtles]

VI(VI(singularsingular)) --gt [sleeps]--gt [sleeps]

VI(VI(pluralplural)) --gt [sleep]--gt [sleep]

Number Agreement

DCGs

Non-terminals may have argumentsbull Variables (start with capital)

Eg Number Any

bull Constants (start with lower case) Eg singular plural

bull Structured terms (start with lower case and take arguments themselves) Eg vp(VNP)

Parsing needs to be adapted bull Using unification

Unification in a nutshell (cf AI course)

Substitutions

Eg Num singular T vp(VNP)

Applying substitution bull Simultaneously replace variables by

corresponding termsbull S(Num) Num singular = S(singular)

Unification

Take two non-terminals with arguments and compute (most general) substitution that makes them identical egbull Art(singular) and Art(Num)

Gives Num singular

bull Art(singular) and Art(plural) Fails

bull Art(Num1) and Art(Num2) Num1 Num2

bull PN(Num accusative) and PN(singular Case) Numsingular Caseaccusative

Parsing with DCGs

Now require successful unification at each step

S -gt NP(N) VP(N) S -gt Art(N) N(N) VP(N) Nsingular S -gt a N(singular) VP(singular) S -gt a turtle VP(singular) S -gt a turtle sleeps

S-gt a turtle sleep fails

Case Marking

PN(singularnominative)PN(singularnominative) --gt --gt [he][she][he][she]

PN(singularaccusative)PN(singularaccusative) --gt --gt [him][her][him][her]

PN(pluralnominative)PN(pluralnominative) --gt --gt [they][they]

PN(pluralaccusative)PN(pluralaccusative) --gt --gt [them][them]

S --gt NP(Numbernominative) NP(Number)S --gt NP(Numbernominative) NP(Number)

VP(Number) --gt V(Number) VP(Anyaccusative)VP(Number) --gt V(Number) VP(Anyaccusative)

VP(NumberCase) --gt PN(NumberCase)VP(NumberCase) --gt PN(NumberCase)

VP(NumberAny) --gt Det N(Number)VP(NumberAny) --gt Det N(Number)

He sees her She sees him They see her

But not Them see he

DCGs

Are strictly more expressive than CFGs Can represent for instance

bull S(N) -gt A(N) B(N) C(N)bull A(0) -gt [] bull B(0) -gt []bull C(0) -gt []bull A(s(N)) -gt A(N) [A]bull B(s(N)) -gt B(N) [B]bull C(s(N)) -gt C(N) [C]

Probabilistic Models

Traditional grammar models are very rigid bull essentially a yes no decision

Probabilistic grammarsbull Define a probability models for the databull Compute the probability of each alternativebull Choose the most likely alternative

Ilustrate on bull Shannon Gamebull Spelling correctionbull Parsing

Illustration

Wall Street Journal Corpus 3 000 000 words Correct parse tree for sentences known

bull Constructed by handbull Can be used to derive stochastic context free

grammarsbull SCFG assign probability to parse trees

Compute the most probable parse tree

Sequences are omni-present

Therefore the techniques we will see also apply tobull Bioinformatics

DNA proteins mRNA hellip can all be represented as strings

bullRobotics Sequences of actions states hellip

bullhellip

Rest of the Course

Limitations traditional grammar models motivate probabilistic extensions bull Regular grammars and Finite State Automata

All use principles of Part I on Graphical Models Markov Models using n-gramms (Hidden) Markov Models Conditional Random Fields

bull As an example of using undirected graphical models

bull Probabilistic Context Free Grammarsbull Probabilistic Definite Clause Grammars

  • Advanced Artificial Intelligence
  • Topic
  • Contents
  • Rationalism versus Empiricism
  • Slide 5
  • This course
  • Ambiguity
  • NLP and Statistics
  • Slide 9
  • Corpora
  • Word Counts
  • Slide 12
  • Word Counts (Brown corpus)
  • Slide 14
  • Zipflsquos Law
  • Language and sequences
  • Key NLP Problem Ambiguity
  • Language Model
  • Example of bad language model
  • A bad language model
  • Slide 22
  • A good language model
  • Why language models
  • Applications
  • Spelling errors
  • Handwriting recognition
  • For Spell Checkers
  • Another dimension in language models
  • Sequence Tagging
  • Slide 31
  • Parsing
  • Slide 33
  • Language models based on Grammars
  • Grammars and parsing
  • Regular Grammars and Finite State Automata
  • Finite State Automaton
  • Phrase structure
  • Notation
  • Context Free Grammar
  • Slide 41
  • Top-down parsing
  • Context-free grammar
  • Parse tree
  • Definite Clause Grammars Non-terminals may have arguments
  • DCGs
  • Unification in a nutshell (cf AI course)
  • Unification
  • Parsing with DCGs
  • Case Marking
  • Slide 51
  • Probabilistic Models
  • Illustration
  • PowerPoint Presentation
  • Sequences are omni-present
  • Rest of the Course
Page 30: Part II. Statistical NLP Advanced Artificial Intelligence Introduction and Grammar Models Wolfram Burgard, Luc De Raedt, Bernhard Nebel, Kristian Kersting

Sequence Tagging

Predicting the secondary structure of proteins mRNA hellipbull X = AFARLMMAhellipbull Y = hehestststhesthe hellip

Parsing

Given a sentence find its parse tree Important step in understanding NL

Parsing

In bioinformatics allows to predict (elements of) structure from sequence

Language models based on Grammars

Grammar Typesbull Regular grammars and Finite State Automatabull Context-Free Grammarsbull Definite Clause Grammars

A particular type of Unification Based Grammar (Prolog)

Distinguish lexicon from grammarbull Lexicon (dictionary) contains information about

words eg word - possible tags (and possibly additional information) flies - V(erb) - N(oun)

bull Grammar encode rules

Grammars and parsing Syntactic level best understood and formalized Derivation of grammatical structure parsing

(more than just recognition) Result of parsing mostly parse tree

showing the constituents of a sentence eg verb or noun phrases

Syntax usually specified in terms of a grammar consisting of grammar rules

Regular Grammars and Finite State Automata

Lexical information - which words are bull Det(erminer)bull N(oun)bull Vi (intransitive verb) - no

argumentbull Pn (pronoun) bull Vt (transitive verb) - takes an

argumentbull Adj (adjective)

Now acceptbull The cat sleptbull Det N Vi

As regular grammarbull S -gt [Det] S1 [ ] terminal bull S1 -gt [N] S2bull S2 -gt [Vi]

Lexicon bull The - Detbull Cat - Nbull Slept - Vi

bull hellip

Finite State Automaton

Sentences

bull John smiles - Pn Vibull The cat disappeared - Det N Vibull These new shoes hurt - Det Adj N Vibull John liked the old cat PN Vt Det Adj N

Phrase structure

S

NP

D N

VP

NPV

D N

PP

P NP

D N

the dog chased a cat into the garden

Notation

S sentence D or Det Determiner (eg articles) N noun V verb P preposition NP noun phrase VP verb phrase PP prepositional phrase

Context Free GrammarS -gt NP VPNP -gt D NVP -gt V NPVP -gt V NP PPPP -gt P NPD -gt [the]D -gt [a]N -gt [dog]N -gt [cat]N -gt [garden]V -gt [chased]V -gt [saw]P -gt [into]

Terminals ~ Lexicon

Phrase structure

Formalism of context-free grammarsbull Nonterminal symbols S NP VP bull Terminal symbols dog cat saw the

Recursionbull bdquoThe girl thought the dog chased the catldquo

VP -gt V SN -gt [girl]V -gt [thought]

Top-down parsing

S -gt NP VP S -gt Det N VP S -gt The N VP S -gt The dog VP S -gt The dog V NP S -gt The dog chased NP S -gt The dog chased Det N S-gt The dog chased the N S-gt The dog chased the cat

Context-free grammarSS --gt --gt NPNPVPVP

NPNP --gt PN --gt PN Proper nounProper noun

NPNP --gt Art Adj N--gt Art Adj N

NPNP --gt ArtN--gt ArtN

VPVP --gt VI --gt VI intransitive verbintransitive verb

VPVP --gt VT --gt VT NPNP transitive verbtransitive verb

ArtArt --gt [the]--gt [the]

AdjAdj --gt [lazy]--gt [lazy]

AdjAdj --gt [rapid]--gt [rapid]

PNPN --gt [achilles]--gt [achilles]

NN --gt [turtle]--gt [turtle]

VIVI --gt [sleeps]--gt [sleeps]

VTVT --gt [beats]--gt [beats]

Parse tree

SS

NPNP VPVP

ArtArt AdjAdj NN VtVt NPNP

PNPN

achillesachillesbeatsbeatsturtleturtlerapidrapidthethe

Definite Clause GrammarsNon-terminals may have arguments

SS --gt --gt NPNP((NN))VPVP((NN))

NP(NP(NN)) --gt Art(--gt Art(NN)N()N(NN))

VP(VP(NN)) --gt VI(--gt VI(NN))

Art(Art(singularsingular)) --gt [a]--gt [a]

Art(Art(singularsingular)) --gt [the]--gt [the]

Art(Art(pluralplural)) --gt [the]--gt [the]

N(N(singularsingular)) --gt [turtle]--gt [turtle]

N(N(pluralplural)) --gt [turtles]--gt [turtles]

VI(VI(singularsingular)) --gt [sleeps]--gt [sleeps]

VI(VI(pluralplural)) --gt [sleep]--gt [sleep]

Number Agreement

DCGs

Non-terminals may have argumentsbull Variables (start with capital)

Eg Number Any

bull Constants (start with lower case) Eg singular plural

bull Structured terms (start with lower case and take arguments themselves) Eg vp(VNP)

Parsing needs to be adapted bull Using unification

Unification in a nutshell (cf AI course)

Substitutions

Eg Num singular T vp(VNP)

Applying substitution bull Simultaneously replace variables by

corresponding termsbull S(Num) Num singular = S(singular)

Unification

Take two non-terminals with arguments and compute (most general) substitution that makes them identical egbull Art(singular) and Art(Num)

Gives Num singular

bull Art(singular) and Art(plural) Fails

bull Art(Num1) and Art(Num2) Num1 Num2

bull PN(Num accusative) and PN(singular Case) Numsingular Caseaccusative

Parsing with DCGs

Now require successful unification at each step

S -gt NP(N) VP(N) S -gt Art(N) N(N) VP(N) Nsingular S -gt a N(singular) VP(singular) S -gt a turtle VP(singular) S -gt a turtle sleeps

S-gt a turtle sleep fails

Case Marking

PN(singularnominative)PN(singularnominative) --gt --gt [he][she][he][she]

PN(singularaccusative)PN(singularaccusative) --gt --gt [him][her][him][her]

PN(pluralnominative)PN(pluralnominative) --gt --gt [they][they]

PN(pluralaccusative)PN(pluralaccusative) --gt --gt [them][them]

S --gt NP(Numbernominative) NP(Number)S --gt NP(Numbernominative) NP(Number)

VP(Number) --gt V(Number) VP(Anyaccusative)VP(Number) --gt V(Number) VP(Anyaccusative)

VP(NumberCase) --gt PN(NumberCase)VP(NumberCase) --gt PN(NumberCase)

VP(NumberAny) --gt Det N(Number)VP(NumberAny) --gt Det N(Number)

He sees her She sees him They see her

But not Them see he

DCGs

Are strictly more expressive than CFGs Can represent for instance

bull S(N) -gt A(N) B(N) C(N)bull A(0) -gt [] bull B(0) -gt []bull C(0) -gt []bull A(s(N)) -gt A(N) [A]bull B(s(N)) -gt B(N) [B]bull C(s(N)) -gt C(N) [C]

Probabilistic Models

Traditional grammar models are very rigid bull essentially a yes no decision

Probabilistic grammarsbull Define a probability models for the databull Compute the probability of each alternativebull Choose the most likely alternative

Ilustrate on bull Shannon Gamebull Spelling correctionbull Parsing

Illustration

Wall Street Journal Corpus 3 000 000 words Correct parse tree for sentences known

bull Constructed by handbull Can be used to derive stochastic context free

grammarsbull SCFG assign probability to parse trees

Compute the most probable parse tree

Sequences are omni-present

Therefore the techniques we will see also apply tobull Bioinformatics

DNA proteins mRNA hellip can all be represented as strings

bullRobotics Sequences of actions states hellip

bullhellip

Rest of the Course

Limitations traditional grammar models motivate probabilistic extensions bull Regular grammars and Finite State Automata

All use principles of Part I on Graphical Models Markov Models using n-gramms (Hidden) Markov Models Conditional Random Fields

bull As an example of using undirected graphical models

bull Probabilistic Context Free Grammarsbull Probabilistic Definite Clause Grammars

  • Advanced Artificial Intelligence
  • Topic
  • Contents
  • Rationalism versus Empiricism
  • Slide 5
  • This course
  • Ambiguity
  • NLP and Statistics
  • Slide 9
  • Corpora
  • Word Counts
  • Slide 12
  • Word Counts (Brown corpus)
  • Slide 14
  • Zipflsquos Law
  • Language and sequences
  • Key NLP Problem Ambiguity
  • Language Model
  • Example of bad language model
  • A bad language model
  • Slide 22
  • A good language model
  • Why language models
  • Applications
  • Spelling errors
  • Handwriting recognition
  • For Spell Checkers
  • Another dimension in language models
  • Sequence Tagging
  • Slide 31
  • Parsing
  • Slide 33
  • Language models based on Grammars
  • Grammars and parsing
  • Regular Grammars and Finite State Automata
  • Finite State Automaton
  • Phrase structure
  • Notation
  • Context Free Grammar
  • Slide 41
  • Top-down parsing
  • Context-free grammar
  • Parse tree
  • Definite Clause Grammars Non-terminals may have arguments
  • DCGs
  • Unification in a nutshell (cf AI course)
  • Unification
  • Parsing with DCGs
  • Case Marking
  • Slide 51
  • Probabilistic Models
  • Illustration
  • PowerPoint Presentation
  • Sequences are omni-present
  • Rest of the Course
Page 31: Part II. Statistical NLP Advanced Artificial Intelligence Introduction and Grammar Models Wolfram Burgard, Luc De Raedt, Bernhard Nebel, Kristian Kersting

Parsing

Given a sentence find its parse tree Important step in understanding NL

Parsing

In bioinformatics allows to predict (elements of) structure from sequence

Language models based on Grammars

Grammar Typesbull Regular grammars and Finite State Automatabull Context-Free Grammarsbull Definite Clause Grammars

A particular type of Unification Based Grammar (Prolog)

Distinguish lexicon from grammarbull Lexicon (dictionary) contains information about

words eg word - possible tags (and possibly additional information) flies - V(erb) - N(oun)

bull Grammar encode rules

Grammars and parsing Syntactic level best understood and formalized Derivation of grammatical structure parsing

(more than just recognition) Result of parsing mostly parse tree

showing the constituents of a sentence eg verb or noun phrases

Syntax usually specified in terms of a grammar consisting of grammar rules

Regular Grammars and Finite State Automata

Lexical information - which words are bull Det(erminer)bull N(oun)bull Vi (intransitive verb) - no

argumentbull Pn (pronoun) bull Vt (transitive verb) - takes an

argumentbull Adj (adjective)

Now acceptbull The cat sleptbull Det N Vi

As regular grammarbull S -gt [Det] S1 [ ] terminal bull S1 -gt [N] S2bull S2 -gt [Vi]

Lexicon bull The - Detbull Cat - Nbull Slept - Vi

bull hellip

Finite State Automaton

Sentences

bull John smiles - Pn Vibull The cat disappeared - Det N Vibull These new shoes hurt - Det Adj N Vibull John liked the old cat PN Vt Det Adj N

Phrase structure

S

NP

D N

VP

NPV

D N

PP

P NP

D N

the dog chased a cat into the garden

Notation

S sentence D or Det Determiner (eg articles) N noun V verb P preposition NP noun phrase VP verb phrase PP prepositional phrase

Context Free GrammarS -gt NP VPNP -gt D NVP -gt V NPVP -gt V NP PPPP -gt P NPD -gt [the]D -gt [a]N -gt [dog]N -gt [cat]N -gt [garden]V -gt [chased]V -gt [saw]P -gt [into]

Terminals ~ Lexicon

Phrase structure

Formalism of context-free grammarsbull Nonterminal symbols S NP VP bull Terminal symbols dog cat saw the

Recursionbull bdquoThe girl thought the dog chased the catldquo

VP -gt V SN -gt [girl]V -gt [thought]

Top-down parsing

S -gt NP VP S -gt Det N VP S -gt The N VP S -gt The dog VP S -gt The dog V NP S -gt The dog chased NP S -gt The dog chased Det N S-gt The dog chased the N S-gt The dog chased the cat

Context-free grammarSS --gt --gt NPNPVPVP

NPNP --gt PN --gt PN Proper nounProper noun

NPNP --gt Art Adj N--gt Art Adj N

NPNP --gt ArtN--gt ArtN

VPVP --gt VI --gt VI intransitive verbintransitive verb

VPVP --gt VT --gt VT NPNP transitive verbtransitive verb

ArtArt --gt [the]--gt [the]

AdjAdj --gt [lazy]--gt [lazy]

AdjAdj --gt [rapid]--gt [rapid]

PNPN --gt [achilles]--gt [achilles]

NN --gt [turtle]--gt [turtle]

VIVI --gt [sleeps]--gt [sleeps]

VTVT --gt [beats]--gt [beats]

Parse tree

SS

NPNP VPVP

ArtArt AdjAdj NN VtVt NPNP

PNPN

achillesachillesbeatsbeatsturtleturtlerapidrapidthethe

Definite Clause GrammarsNon-terminals may have arguments

SS --gt --gt NPNP((NN))VPVP((NN))

NP(NP(NN)) --gt Art(--gt Art(NN)N()N(NN))

VP(VP(NN)) --gt VI(--gt VI(NN))

Art(Art(singularsingular)) --gt [a]--gt [a]

Art(Art(singularsingular)) --gt [the]--gt [the]

Art(Art(pluralplural)) --gt [the]--gt [the]

N(N(singularsingular)) --gt [turtle]--gt [turtle]

N(N(pluralplural)) --gt [turtles]--gt [turtles]

VI(VI(singularsingular)) --gt [sleeps]--gt [sleeps]

VI(VI(pluralplural)) --gt [sleep]--gt [sleep]

Number Agreement

DCGs

Non-terminals may have argumentsbull Variables (start with capital)

Eg Number Any

bull Constants (start with lower case) Eg singular plural

bull Structured terms (start with lower case and take arguments themselves) Eg vp(VNP)

Parsing needs to be adapted bull Using unification

Unification in a nutshell (cf AI course)

Substitutions

Eg Num singular T vp(VNP)

Applying substitution bull Simultaneously replace variables by

corresponding termsbull S(Num) Num singular = S(singular)

Unification

Take two non-terminals with arguments and compute (most general) substitution that makes them identical egbull Art(singular) and Art(Num)

Gives Num singular

bull Art(singular) and Art(plural) Fails

bull Art(Num1) and Art(Num2) Num1 Num2

bull PN(Num accusative) and PN(singular Case) Numsingular Caseaccusative

Parsing with DCGs

Now require successful unification at each step

S -gt NP(N) VP(N) S -gt Art(N) N(N) VP(N) Nsingular S -gt a N(singular) VP(singular) S -gt a turtle VP(singular) S -gt a turtle sleeps

S-gt a turtle sleep fails

Case Marking

PN(singularnominative)PN(singularnominative) --gt --gt [he][she][he][she]

PN(singularaccusative)PN(singularaccusative) --gt --gt [him][her][him][her]

PN(pluralnominative)PN(pluralnominative) --gt --gt [they][they]

PN(pluralaccusative)PN(pluralaccusative) --gt --gt [them][them]

S --gt NP(Numbernominative) NP(Number)S --gt NP(Numbernominative) NP(Number)

VP(Number) --gt V(Number) VP(Anyaccusative)VP(Number) --gt V(Number) VP(Anyaccusative)

VP(NumberCase) --gt PN(NumberCase)VP(NumberCase) --gt PN(NumberCase)

VP(NumberAny) --gt Det N(Number)VP(NumberAny) --gt Det N(Number)

He sees her She sees him They see her

But not Them see he

DCGs

Are strictly more expressive than CFGs Can represent for instance

bull S(N) -gt A(N) B(N) C(N)bull A(0) -gt [] bull B(0) -gt []bull C(0) -gt []bull A(s(N)) -gt A(N) [A]bull B(s(N)) -gt B(N) [B]bull C(s(N)) -gt C(N) [C]

Probabilistic Models

Traditional grammar models are very rigid bull essentially a yes no decision

Probabilistic grammarsbull Define a probability models for the databull Compute the probability of each alternativebull Choose the most likely alternative

Ilustrate on bull Shannon Gamebull Spelling correctionbull Parsing

Illustration

Wall Street Journal Corpus 3 000 000 words Correct parse tree for sentences known

bull Constructed by handbull Can be used to derive stochastic context free

grammarsbull SCFG assign probability to parse trees

Compute the most probable parse tree

Sequences are omni-present

Therefore the techniques we will see also apply tobull Bioinformatics

DNA proteins mRNA hellip can all be represented as strings

bullRobotics Sequences of actions states hellip

bullhellip

Rest of the Course

Limitations traditional grammar models motivate probabilistic extensions bull Regular grammars and Finite State Automata

All use principles of Part I on Graphical Models Markov Models using n-gramms (Hidden) Markov Models Conditional Random Fields

bull As an example of using undirected graphical models

bull Probabilistic Context Free Grammarsbull Probabilistic Definite Clause Grammars

  • Advanced Artificial Intelligence
  • Topic
  • Contents
  • Rationalism versus Empiricism
  • Slide 5
  • This course
  • Ambiguity
  • NLP and Statistics
  • Slide 9
  • Corpora
  • Word Counts
  • Slide 12
  • Word Counts (Brown corpus)
  • Slide 14
  • Zipflsquos Law
  • Language and sequences
  • Key NLP Problem Ambiguity
  • Language Model
  • Example of bad language model
  • A bad language model
  • Slide 22
  • A good language model
  • Why language models
  • Applications
  • Spelling errors
  • Handwriting recognition
  • For Spell Checkers
  • Another dimension in language models
  • Sequence Tagging
  • Slide 31
  • Parsing
  • Slide 33
  • Language models based on Grammars
  • Grammars and parsing
  • Regular Grammars and Finite State Automata
  • Finite State Automaton
  • Phrase structure
  • Notation
  • Context Free Grammar
  • Slide 41
  • Top-down parsing
  • Context-free grammar
  • Parse tree
  • Definite Clause Grammars Non-terminals may have arguments
  • DCGs
  • Unification in a nutshell (cf AI course)
  • Unification
  • Parsing with DCGs
  • Case Marking
  • Slide 51
  • Probabilistic Models
  • Illustration
  • PowerPoint Presentation
  • Sequences are omni-present
  • Rest of the Course
Page 32: Part II. Statistical NLP Advanced Artificial Intelligence Introduction and Grammar Models Wolfram Burgard, Luc De Raedt, Bernhard Nebel, Kristian Kersting

Parsing

In bioinformatics allows to predict (elements of) structure from sequence

Language models based on Grammars

Grammar Typesbull Regular grammars and Finite State Automatabull Context-Free Grammarsbull Definite Clause Grammars

A particular type of Unification Based Grammar (Prolog)

Distinguish lexicon from grammarbull Lexicon (dictionary) contains information about

words eg word - possible tags (and possibly additional information) flies - V(erb) - N(oun)

bull Grammar encode rules

Grammars and parsing Syntactic level best understood and formalized Derivation of grammatical structure parsing

(more than just recognition) Result of parsing mostly parse tree

showing the constituents of a sentence eg verb or noun phrases

Syntax usually specified in terms of a grammar consisting of grammar rules

Regular Grammars and Finite State Automata

Lexical information - which words are bull Det(erminer)bull N(oun)bull Vi (intransitive verb) - no

argumentbull Pn (pronoun) bull Vt (transitive verb) - takes an

argumentbull Adj (adjective)

Now acceptbull The cat sleptbull Det N Vi

As regular grammarbull S -gt [Det] S1 [ ] terminal bull S1 -gt [N] S2bull S2 -gt [Vi]

Lexicon bull The - Detbull Cat - Nbull Slept - Vi

bull hellip

Finite State Automaton

Sentences

bull John smiles - Pn Vibull The cat disappeared - Det N Vibull These new shoes hurt - Det Adj N Vibull John liked the old cat PN Vt Det Adj N

Phrase structure

S

NP

D N

VP

NPV

D N

PP

P NP

D N

the dog chased a cat into the garden

Notation

S sentence D or Det Determiner (eg articles) N noun V verb P preposition NP noun phrase VP verb phrase PP prepositional phrase

Context Free GrammarS -gt NP VPNP -gt D NVP -gt V NPVP -gt V NP PPPP -gt P NPD -gt [the]D -gt [a]N -gt [dog]N -gt [cat]N -gt [garden]V -gt [chased]V -gt [saw]P -gt [into]

Terminals ~ Lexicon

Phrase structure

Formalism of context-free grammarsbull Nonterminal symbols S NP VP bull Terminal symbols dog cat saw the

Recursionbull bdquoThe girl thought the dog chased the catldquo

VP -gt V SN -gt [girl]V -gt [thought]

Top-down parsing

S -gt NP VP S -gt Det N VP S -gt The N VP S -gt The dog VP S -gt The dog V NP S -gt The dog chased NP S -gt The dog chased Det N S-gt The dog chased the N S-gt The dog chased the cat

Context-free grammarSS --gt --gt NPNPVPVP

NPNP --gt PN --gt PN Proper nounProper noun

NPNP --gt Art Adj N--gt Art Adj N

NPNP --gt ArtN--gt ArtN

VPVP --gt VI --gt VI intransitive verbintransitive verb

VPVP --gt VT --gt VT NPNP transitive verbtransitive verb

ArtArt --gt [the]--gt [the]

AdjAdj --gt [lazy]--gt [lazy]

AdjAdj --gt [rapid]--gt [rapid]

PNPN --gt [achilles]--gt [achilles]

NN --gt [turtle]--gt [turtle]

VIVI --gt [sleeps]--gt [sleeps]

VTVT --gt [beats]--gt [beats]

Parse tree

SS

NPNP VPVP

ArtArt AdjAdj NN VtVt NPNP

PNPN

achillesachillesbeatsbeatsturtleturtlerapidrapidthethe

Definite Clause GrammarsNon-terminals may have arguments

SS --gt --gt NPNP((NN))VPVP((NN))

NP(NP(NN)) --gt Art(--gt Art(NN)N()N(NN))

VP(VP(NN)) --gt VI(--gt VI(NN))

Art(Art(singularsingular)) --gt [a]--gt [a]

Art(Art(singularsingular)) --gt [the]--gt [the]

Art(Art(pluralplural)) --gt [the]--gt [the]

N(N(singularsingular)) --gt [turtle]--gt [turtle]

N(N(pluralplural)) --gt [turtles]--gt [turtles]

VI(VI(singularsingular)) --gt [sleeps]--gt [sleeps]

VI(VI(pluralplural)) --gt [sleep]--gt [sleep]

Number Agreement

DCGs

Non-terminals may have argumentsbull Variables (start with capital)

Eg Number Any

bull Constants (start with lower case) Eg singular plural

bull Structured terms (start with lower case and take arguments themselves) Eg vp(VNP)

Parsing needs to be adapted bull Using unification

Unification in a nutshell (cf AI course)

Substitutions

Eg Num singular T vp(VNP)

Applying substitution bull Simultaneously replace variables by

corresponding termsbull S(Num) Num singular = S(singular)

Unification

Take two non-terminals with arguments and compute (most general) substitution that makes them identical egbull Art(singular) and Art(Num)

Gives Num singular

Page 33: Part II. Statistical NLP Advanced Artificial Intelligence Introduction and Grammar Models Wolfram Burgard, Luc De Raedt, Bernhard Nebel, Kristian Kersting

Language models based on Grammars

Grammar types:
• Regular grammars and Finite State Automata
• Context-Free Grammars
• Definite Clause Grammars: a particular type of Unification-Based Grammar (Prolog)

Distinguish the lexicon from the grammar:
• The lexicon (dictionary) contains information about words, e.g. word → possible tags (and possibly additional information): "flies" → V(erb) or N(oun)
• The grammar encodes the rules

Grammars and parsing

The syntactic level is the best understood and formalized. Deriving the grammatical structure of a sentence is called parsing (more than just recognition). The result of parsing is usually a parse tree showing the constituents of a sentence, e.g. verb or noun phrases. Syntax is usually specified in terms of a grammar consisting of grammar rules.

Regular Grammars and Finite State Automata

Lexical information — which words are:
• Det(erminer)
• N(oun)
• Pn (pronoun)
• Adj (adjective)
• Vi (intransitive verb) — takes no argument
• Vt (transitive verb) — takes an argument

Now accept "The cat slept": Det N Vi

As a regular grammar:
• S  -> [Det] S1      ([ ] denotes a terminal)
• S1 -> [N] S2
• S2 -> [Vi]

Lexicon:
• the — Det
• cat — N
• slept — Vi
• …

Finite State Automaton

Sentences

• John smiles — Pn Vi
• The cat disappeared — Det N Vi
• These new shoes hurt — Det Adj N Vi
• John liked the old cat — Pn Vt Det Adj N
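A regular grammar like the one above is equivalent to a finite-state automaton. The following is a minimal sketch, not the course's implementation; the state names, transition table, and lexicon entries are illustrative choices covering the intransitive sentences above.

```python
# Hypothetical lexicon: word -> tag (only a few entries for illustration).
LEXICON = {
    "the": "Det", "these": "Det",
    "cat": "N", "shoes": "N",
    "new": "Adj",
    "slept": "Vi", "disappeared": "Vi", "hurt": "Vi", "smiles": "Vi",
    "john": "Pn",
}

# Transition table of the automaton: state -> {tag: next state}.
# Accepts Det (Adj) N Vi and Pn Vi (transitive verbs are omitted).
TRANS = {
    "S0": {"Det": "S1", "Pn": "S3"},
    "S1": {"Adj": "S2", "N": "S3"},
    "S2": {"N": "S3"},
    "S3": {"Vi": "ACCEPT"},
}

def fsa_accepts(sentence):
    """Run the automaton over the tag sequence of the sentence."""
    state = "S0"
    for word in sentence.lower().split():
        tag = LEXICON.get(word)
        if tag is None or tag not in TRANS.get(state, {}):
            return False
        state = TRANS[state][tag]
    return state == "ACCEPT"
```

For example, `fsa_accepts("These new shoes hurt")` follows S0 → S1 → S2 → S3 → ACCEPT.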

Phrase structure

(Parse tree for "the dog chased a cat into the garden":)

[S [NP [D the] [N dog]]
   [VP [V chased]
       [NP [D a] [N cat]]
       [PP [P into] [NP [D the] [N garden]]]]]

Notation

S: sentence
D or Det: determiner (e.g. articles)
N: noun
V: verb
P: preposition
NP: noun phrase
VP: verb phrase
PP: prepositional phrase

Context Free Grammar

S  -> NP VP
NP -> D N
VP -> V NP
VP -> V NP PP
PP -> P NP
D  -> [the]
D  -> [a]
N  -> [dog]
N  -> [cat]
N  -> [garden]
V  -> [chased]
V  -> [saw]
P  -> [into]

Terminals ~ Lexicon

Phrase structure

Formalism of context-free grammars:
• Nonterminal symbols: S, NP, VP, …
• Terminal symbols: dog, cat, saw, the, …

Recursion:
• "The girl thought the dog chased the cat"

VP -> V S
N  -> [girl]
V  -> [thought]

Top-down parsing

S -> NP VP
  -> Det N VP
  -> The N VP
  -> The dog VP
  -> The dog V NP
  -> The dog chased NP
  -> The dog chased Det N
  -> The dog chased the N
  -> The dog chased the cat
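The top-down expansion above can be sketched as a small recursive-descent recognizer. This is a sketch, not the course's implementation; the grammar and lexicon follow the CFG slide, and the set-of-positions representation is an assumption made to handle the two VP alternatives.

```python
# Grammar rules: nonterminal -> list of right-hand sides.
GRAMMAR = {
    "S":  [["NP", "VP"]],
    "NP": [["D", "N"]],
    "VP": [["V", "NP"], ["V", "NP", "PP"]],
    "PP": [["P", "NP"]],
}
# Preterminals and the words they cover (the terminals of the CFG).
LEXICON = {
    "D": {"the", "a"}, "N": {"dog", "cat", "garden"},
    "V": {"chased", "saw"}, "P": {"into"},
}

def parse(symbol, words, i):
    """Return the set of positions reachable after deriving `symbol`
    starting at words[i:] (top-down, trying every alternative)."""
    if symbol in LEXICON:
        return {i + 1} if i < len(words) and words[i] in LEXICON[symbol] else set()
    ends = set()
    for rhs in GRAMMAR[symbol]:
        positions = {i}
        for sym in rhs:
            positions = {j for p in positions for j in parse(sym, words, p)}
        ends |= positions
    return ends

def recognize(sentence):
    words = sentence.lower().split()
    return len(words) in parse("S", words, 0)
```

For example, `recognize("the dog chased a cat into the garden")` succeeds via the VP -> V NP PP alternative.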

Context-free grammar (in Prolog DCG notation):

S  --> NP, VP.
NP --> PN.          % proper noun
NP --> Art, Adj, N.
NP --> Art, N.
VP --> VI.          % intransitive verb
VP --> VT, NP.      % transitive verb
Art --> [the].
Adj --> [lazy].
Adj --> [rapid].
PN --> [achilles].
N  --> [turtle].
VI --> [sleeps].
VT --> [beats].

Parse tree

(Parse tree for "the rapid turtle beats achilles":)

[S [NP [Art the] [Adj rapid] [N turtle]]
   [VP [Vt beats] [NP [PN achilles]]]]

Definite Clause Grammars
Non-terminals may have arguments:

S --> NP(N), VP(N).
NP(N) --> Art(N), N(N).
VP(N) --> VI(N).
Art(singular) --> [a].
Art(singular) --> [the].
Art(plural) --> [the].
N(singular) --> [turtle].
N(plural) --> [turtles].
VI(singular) --> [sleeps].
VI(plural) --> [sleep].

Number Agreement

DCGs

Non-terminals may have arguments:
• Variables (start with a capital), e.g. Number, Any
• Constants (start with lower case), e.g. singular, plural
• Structured terms (start with lower case and take arguments themselves), e.g. vp(V, NP)

Parsing needs to be adapted:
• using unification

Unification in a nutshell (cf AI course)

Substitutions

E.g. {Num → singular}, {T → vp(V, NP)}

Applying a substitution:
• simultaneously replace variables by the corresponding terms
• S(Num) {Num → singular} = S(singular)

Unification

Take two non-terminals with arguments and compute the (most general) substitution that makes them identical, e.g.:
• Art(singular) and Art(Num): gives {Num → singular}
• Art(singular) and Art(plural): fails
• Art(Num1) and Art(Num2): {Num1 → Num2}
• PN(Num, accusative) and PN(singular, Case): {Num → singular, Case → accusative}
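The unification step can be sketched in a few lines. This is an illustrative sketch, not the course's code: non-terminals are represented as `(functor, [args])` tuples, variables as capitalized strings, constants as lower-case strings, and the occurs check is omitted for brevity.

```python
def is_var(t):
    """Variables are capitalized strings, as in Prolog."""
    return isinstance(t, str) and t[:1].isupper()

def walk(t, subst):
    """Follow variable bindings to their current value."""
    while is_var(t) and t in subst:
        t = subst[t]
    return t

def unify(a, b, subst=None):
    """Return a most general substitution unifying a and b, or None on failure."""
    subst = dict(subst or {})
    stack = [(a, b)]
    while stack:
        x, y = stack.pop()
        x, y = walk(x, subst), walk(y, subst)
        if x == y:
            continue
        if is_var(x):
            subst[x] = y
        elif is_var(y):
            subst[y] = x
        elif (isinstance(x, tuple) and isinstance(y, tuple)
              and x[0] == y[0] and len(x[1]) == len(y[1])):
            stack.extend(zip(x[1], y[1]))  # unify arguments pairwise
        else:
            return None  # clash of two different constants/functors
    return subst
```

Running the slide's examples: `unify(("Art", ["singular"]), ("Art", ["Num"]))` gives `{"Num": "singular"}`, while `unify(("Art", ["singular"]), ("Art", ["plural"]))` fails with `None`.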

Parsing with DCGs

Now require successful unification at each step

S -> NP(N) VP(N)
  -> Art(N) N(N) VP(N)
  -> a N(singular) VP(singular)      {N → singular}
  -> a turtle VP(singular)
  -> a turtle sleeps

"a turtle sleep" fails: VI(plural) cannot unify with VP(singular).

Case Marking

PN(singular, nominative) --> [he]; [she].
PN(singular, accusative) --> [him]; [her].
PN(plural, nominative) --> [they].
PN(plural, accusative) --> [them].

S --> NP(Number, nominative), VP(Number).
VP(Number) --> V(Number), NP(Any, accusative).
NP(Number, Case) --> PN(Number, Case).
NP(Number, Any) --> Det, N(Number).

Accepts: "He sees her", "She sees him", "They see her"

But not: "Them see he"
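The agreement and case checks above can be sketched directly for three-word pronoun sentences. A minimal sketch, assuming a hand-made feature lexicon (word → category, number, case) that is not from the slides; it only covers the rule S --> NP(Number, nominative), V(Number), NP(Any, accusative).

```python
# Hypothetical feature lexicon: pronouns carry (cat, number, case),
# verbs carry (cat, number).
LEX = {
    "he":  ("PN", "singular", "nominative"), "she": ("PN", "singular", "nominative"),
    "him": ("PN", "singular", "accusative"), "her": ("PN", "singular", "accusative"),
    "they": ("PN", "plural", "nominative"),  "them": ("PN", "plural", "accusative"),
    "sees": ("V", "singular"), "see": ("V", "plural"),
}

def grammatical(sentence):
    """Check subject case, object case, and subject-verb number agreement."""
    words = sentence.lower().split()
    if len(words) != 3:
        return False
    subj, verb, obj = (LEX.get(w) for w in words)
    return (subj is not None and verb is not None and obj is not None
            and subj[0] == "PN" and verb[0] == "V" and obj[0] == "PN"
            and subj[2] == "nominative"     # subject must be nominative
            and obj[2] == "accusative"      # object must be accusative
            and subj[1] == verb[1])         # number agreement
```

So "He sees her" and "They see her" pass, while "Them see he" fails on case and "He see her" fails on number.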

DCGs

DCGs are strictly more expressive than CFGs. They can represent, for instance, the language aⁿbⁿcⁿ:

S(N) --> A(N), B(N), C(N).
A(0) --> [].
B(0) --> [].
C(0) --> [].
A(s(N)) --> A(N), [a].
B(s(N)) --> B(N), [b].
C(s(N)) --> C(N), [c].
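The DCG above generates aⁿbⁿcⁿ, which no context-free grammar can. As a sanity check, the same language as a plain predicate (an illustrative sketch, not part of the course material):

```python
def anbncn(s):
    """True iff s is a^n b^n c^n for some n >= 0."""
    n = len(s) // 3
    return len(s) % 3 == 0 and s == "a" * n + "b" * n + "c" * n
```

E.g. "aabbcc" is accepted, "aabbc" is not.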

Probabilistic Models

Traditional grammar models are very rigid:
• essentially a yes/no decision

Probabilistic grammars:
• define a probability model for the data
• compute the probability of each alternative
• choose the most likely alternative

Illustrated on:
• the Shannon game
• spelling correction
• parsing

Illustration

Wall Street Journal Corpus: 3,000,000 words; the correct parse tree for each sentence is known
• constructed by hand
• can be used to derive stochastic context-free grammars
• an SCFG assigns a probability to each parse tree

Compute the most probable parse tree
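Under an SCFG, the probability of a parse tree is the product of the probabilities of the rules used in it. A minimal sketch; the rule probabilities below are invented for illustration and lexical rules are given probability 1 for simplicity.

```python
# Hypothetical rule probabilities: (lhs, rhs) -> P(rule | lhs).
RULE_PROB = {
    ("S",  ("NP", "VP")): 1.0,
    ("NP", ("D", "N")): 0.7,
    ("NP", ("PN",)): 0.3,
    ("VP", ("V", "NP")): 0.6,
    ("VP", ("V", "NP", "PP")): 0.4,
}

def tree_prob(tree):
    """tree = (label, children); a leaf is (label, word_string).
    Multiply the probabilities of all rules used in the tree."""
    label, children = tree
    if isinstance(children, str):   # lexical leaf: P = 1 for simplicity
        return 1.0
    rhs = tuple(child[0] for child in children)
    p = RULE_PROB[(label, rhs)]
    for child in children:
        p *= tree_prob(child)
    return p

# Parse tree for "the dog chased a cat":
t = ("S", [("NP", [("D", "the"), ("N", "dog")]),
           ("VP", [("V", "chased"), ("NP", [("D", "a"), ("N", "cat")])])])
```

Here `tree_prob(t)` is 1.0 × 0.7 × 0.6 × 0.7 = 0.294; the most probable parse is the tree maximizing this product.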

Sequences are omni-present

Therefore the techniques we will see also apply to:
• Bioinformatics: DNA, proteins, mRNA, … can all be represented as strings
• Robotics: sequences of actions, states, …
• …

Rest of the Course

Limitations of traditional grammar models motivate probabilistic extensions:
• Regular grammars and Finite State Automata
  - Markov Models using n-grams
  - (Hidden) Markov Models
  - Conditional Random Fields (as an example of using undirected graphical models)
• Probabilistic Context-Free Grammars
• Probabilistic Definite Clause Grammars

All use the principles of Part I on Graphical Models.

  • Advanced Artificial Intelligence
  • Topic
  • Contents
  • Rationalism versus Empiricism
  • Slide 5
  • This course
  • Ambiguity
  • NLP and Statistics
  • Slide 9
  • Corpora
  • Word Counts
  • Slide 12
  • Word Counts (Brown corpus)
  • Slide 14
  • Zipf's Law
  • Language and sequences
  • Key NLP Problem Ambiguity
  • Language Model
  • Example of bad language model
  • A bad language model
  • Slide 22
  • A good language model
  • Why language models
  • Applications
  • Spelling errors
  • Handwriting recognition
  • For Spell Checkers
  • Another dimension in language models
  • Sequence Tagging
  • Slide 31
  • Parsing
  • Slide 33
  • Language models based on Grammars
  • Grammars and parsing
  • Regular Grammars and Finite State Automata
  • Finite State Automaton
  • Phrase structure
  • Notation
  • Context Free Grammar
  • Slide 41
  • Top-down parsing
  • Context-free grammar
  • Parse tree
  • Definite Clause Grammars Non-terminals may have arguments
  • DCGs
  • Unification in a nutshell (cf AI course)
  • Unification
  • Parsing with DCGs
  • Case Marking
  • Slide 51
  • Probabilistic Models
  • Illustration
  • PowerPoint Presentation
  • Sequences are omni-present
  • Rest of the Course
Page 34: Part II. Statistical NLP Advanced Artificial Intelligence Introduction and Grammar Models Wolfram Burgard, Luc De Raedt, Bernhard Nebel, Kristian Kersting

Grammars and parsing Syntactic level best understood and formalized Derivation of grammatical structure parsing

(more than just recognition) Result of parsing mostly parse tree

showing the constituents of a sentence eg verb or noun phrases

Syntax usually specified in terms of a grammar consisting of grammar rules

Regular Grammars and Finite State Automata

Lexical information - which words are bull Det(erminer)bull N(oun)bull Vi (intransitive verb) - no

argumentbull Pn (pronoun) bull Vt (transitive verb) - takes an

argumentbull Adj (adjective)

Now acceptbull The cat sleptbull Det N Vi

As regular grammarbull S -gt [Det] S1 [ ] terminal bull S1 -gt [N] S2bull S2 -gt [Vi]

Lexicon bull The - Detbull Cat - Nbull Slept - Vi

bull hellip

Finite State Automaton

Sentences

bull John smiles - Pn Vibull The cat disappeared - Det N Vibull These new shoes hurt - Det Adj N Vibull John liked the old cat PN Vt Det Adj N

Phrase structure

S

NP

D N

VP

NPV

D N

PP

P NP

D N

the dog chased a cat into the garden

Notation

S sentence D or Det Determiner (eg articles) N noun V verb P preposition NP noun phrase VP verb phrase PP prepositional phrase

Context Free GrammarS -gt NP VPNP -gt D NVP -gt V NPVP -gt V NP PPPP -gt P NPD -gt [the]D -gt [a]N -gt [dog]N -gt [cat]N -gt [garden]V -gt [chased]V -gt [saw]P -gt [into]

Terminals ~ Lexicon

Phrase structure

Formalism of context-free grammarsbull Nonterminal symbols S NP VP bull Terminal symbols dog cat saw the

Recursionbull bdquoThe girl thought the dog chased the catldquo

VP -gt V SN -gt [girl]V -gt [thought]

Top-down parsing

S -gt NP VP S -gt Det N VP S -gt The N VP S -gt The dog VP S -gt The dog V NP S -gt The dog chased NP S -gt The dog chased Det N S-gt The dog chased the N S-gt The dog chased the cat

Context-free grammarSS --gt --gt NPNPVPVP

NPNP --gt PN --gt PN Proper nounProper noun

NPNP --gt Art Adj N--gt Art Adj N

NPNP --gt ArtN--gt ArtN

VPVP --gt VI --gt VI intransitive verbintransitive verb

VPVP --gt VT --gt VT NPNP transitive verbtransitive verb

ArtArt --gt [the]--gt [the]

AdjAdj --gt [lazy]--gt [lazy]

AdjAdj --gt [rapid]--gt [rapid]

PNPN --gt [achilles]--gt [achilles]

NN --gt [turtle]--gt [turtle]

VIVI --gt [sleeps]--gt [sleeps]

VTVT --gt [beats]--gt [beats]

Parse tree

SS

NPNP VPVP

ArtArt AdjAdj NN VtVt NPNP

PNPN

achillesachillesbeatsbeatsturtleturtlerapidrapidthethe

Definite Clause GrammarsNon-terminals may have arguments

SS --gt --gt NPNP((NN))VPVP((NN))

NP(NP(NN)) --gt Art(--gt Art(NN)N()N(NN))

VP(VP(NN)) --gt VI(--gt VI(NN))

Art(Art(singularsingular)) --gt [a]--gt [a]

Art(Art(singularsingular)) --gt [the]--gt [the]

Art(Art(pluralplural)) --gt [the]--gt [the]

N(N(singularsingular)) --gt [turtle]--gt [turtle]

N(N(pluralplural)) --gt [turtles]--gt [turtles]

VI(VI(singularsingular)) --gt [sleeps]--gt [sleeps]

VI(VI(pluralplural)) --gt [sleep]--gt [sleep]

Number Agreement

DCGs

Non-terminals may have argumentsbull Variables (start with capital)

Eg Number Any

bull Constants (start with lower case) Eg singular plural

bull Structured terms (start with lower case and take arguments themselves) Eg vp(VNP)

Parsing needs to be adapted bull Using unification

Unification in a nutshell (cf AI course)

Substitutions

Eg Num singular T vp(VNP)

Applying substitution bull Simultaneously replace variables by

corresponding termsbull S(Num) Num singular = S(singular)

Unification

Take two non-terminals with arguments and compute (most general) substitution that makes them identical egbull Art(singular) and Art(Num)

Gives Num singular

bull Art(singular) and Art(plural) Fails

bull Art(Num1) and Art(Num2) Num1 Num2

bull PN(Num accusative) and PN(singular Case) Numsingular Caseaccusative

Parsing with DCGs

Now require successful unification at each step

S -gt NP(N) VP(N) S -gt Art(N) N(N) VP(N) Nsingular S -gt a N(singular) VP(singular) S -gt a turtle VP(singular) S -gt a turtle sleeps

S-gt a turtle sleep fails

Case Marking

PN(singularnominative)PN(singularnominative) --gt --gt [he][she][he][she]

PN(singularaccusative)PN(singularaccusative) --gt --gt [him][her][him][her]

PN(pluralnominative)PN(pluralnominative) --gt --gt [they][they]

PN(pluralaccusative)PN(pluralaccusative) --gt --gt [them][them]

S --gt NP(Numbernominative) NP(Number)S --gt NP(Numbernominative) NP(Number)

VP(Number) --gt V(Number) VP(Anyaccusative)VP(Number) --gt V(Number) VP(Anyaccusative)

VP(NumberCase) --gt PN(NumberCase)VP(NumberCase) --gt PN(NumberCase)

VP(NumberAny) --gt Det N(Number)VP(NumberAny) --gt Det N(Number)

He sees her She sees him They see her

But not Them see he

DCGs

Are strictly more expressive than CFGs Can represent for instance

bull S(N) -gt A(N) B(N) C(N)bull A(0) -gt [] bull B(0) -gt []bull C(0) -gt []bull A(s(N)) -gt A(N) [A]bull B(s(N)) -gt B(N) [B]bull C(s(N)) -gt C(N) [C]

Probabilistic Models

Traditional grammar models are very rigid bull essentially a yes no decision

Probabilistic grammarsbull Define a probability models for the databull Compute the probability of each alternativebull Choose the most likely alternative

Ilustrate on bull Shannon Gamebull Spelling correctionbull Parsing

Illustration

Wall Street Journal Corpus 3 000 000 words Correct parse tree for sentences known

bull Constructed by handbull Can be used to derive stochastic context free

grammarsbull SCFG assign probability to parse trees

Compute the most probable parse tree

Sequences are omni-present

Therefore the techniques we will see also apply tobull Bioinformatics

DNA proteins mRNA hellip can all be represented as strings

bullRobotics Sequences of actions states hellip

bullhellip

Rest of the Course

Limitations traditional grammar models motivate probabilistic extensions bull Regular grammars and Finite State Automata

All use principles of Part I on Graphical Models Markov Models using n-gramms (Hidden) Markov Models Conditional Random Fields

bull As an example of using undirected graphical models

bull Probabilistic Context Free Grammarsbull Probabilistic Definite Clause Grammars

  • Advanced Artificial Intelligence
  • Topic
  • Contents
  • Rationalism versus Empiricism
  • Slide 5
  • This course
  • Ambiguity
  • NLP and Statistics
  • Slide 9
  • Corpora
  • Word Counts
  • Slide 12
  • Word Counts (Brown corpus)
  • Slide 14
  • Zipflsquos Law
  • Language and sequences
  • Key NLP Problem Ambiguity
  • Language Model
  • Example of bad language model
  • A bad language model
  • Slide 22
  • A good language model
  • Why language models
  • Applications
  • Spelling errors
  • Handwriting recognition
  • For Spell Checkers
  • Another dimension in language models
  • Sequence Tagging
  • Slide 31
  • Parsing
  • Slide 33
  • Language models based on Grammars
  • Grammars and parsing
  • Regular Grammars and Finite State Automata
  • Finite State Automaton
  • Phrase structure
  • Notation
  • Context Free Grammar
  • Slide 41
  • Top-down parsing
  • Context-free grammar
  • Parse tree
  • Definite Clause Grammars Non-terminals may have arguments
  • DCGs
  • Unification in a nutshell (cf AI course)
  • Unification
  • Parsing with DCGs
  • Case Marking
  • Slide 51
  • Probabilistic Models
  • Illustration
  • PowerPoint Presentation
  • Sequences are omni-present
  • Rest of the Course
Page 35: Part II. Statistical NLP Advanced Artificial Intelligence Introduction and Grammar Models Wolfram Burgard, Luc De Raedt, Bernhard Nebel, Kristian Kersting

Regular Grammars and Finite State Automata

Lexical information - which words are bull Det(erminer)bull N(oun)bull Vi (intransitive verb) - no

argumentbull Pn (pronoun) bull Vt (transitive verb) - takes an

argumentbull Adj (adjective)

Now acceptbull The cat sleptbull Det N Vi

As regular grammarbull S -gt [Det] S1 [ ] terminal bull S1 -gt [N] S2bull S2 -gt [Vi]

Lexicon bull The - Detbull Cat - Nbull Slept - Vi

bull hellip

Finite State Automaton

Sentences

bull John smiles - Pn Vibull The cat disappeared - Det N Vibull These new shoes hurt - Det Adj N Vibull John liked the old cat PN Vt Det Adj N

Phrase structure

S

NP

D N

VP

NPV

D N

PP

P NP

D N

the dog chased a cat into the garden

Notation

S sentence D or Det Determiner (eg articles) N noun V verb P preposition NP noun phrase VP verb phrase PP prepositional phrase

Context Free GrammarS -gt NP VPNP -gt D NVP -gt V NPVP -gt V NP PPPP -gt P NPD -gt [the]D -gt [a]N -gt [dog]N -gt [cat]N -gt [garden]V -gt [chased]V -gt [saw]P -gt [into]

Terminals ~ Lexicon

Phrase structure

Formalism of context-free grammarsbull Nonterminal symbols S NP VP bull Terminal symbols dog cat saw the

Recursionbull bdquoThe girl thought the dog chased the catldquo

VP -gt V SN -gt [girl]V -gt [thought]

Top-down parsing

S -gt NP VP S -gt Det N VP S -gt The N VP S -gt The dog VP S -gt The dog V NP S -gt The dog chased NP S -gt The dog chased Det N S-gt The dog chased the N S-gt The dog chased the cat

Context-free grammarSS --gt --gt NPNPVPVP

NPNP --gt PN --gt PN Proper nounProper noun

NPNP --gt Art Adj N--gt Art Adj N

NPNP --gt ArtN--gt ArtN

VPVP --gt VI --gt VI intransitive verbintransitive verb

VPVP --gt VT --gt VT NPNP transitive verbtransitive verb

ArtArt --gt [the]--gt [the]

AdjAdj --gt [lazy]--gt [lazy]

AdjAdj --gt [rapid]--gt [rapid]

PNPN --gt [achilles]--gt [achilles]

NN --gt [turtle]--gt [turtle]

VIVI --gt [sleeps]--gt [sleeps]

VTVT --gt [beats]--gt [beats]

Parse tree

SS

NPNP VPVP

ArtArt AdjAdj NN VtVt NPNP

PNPN

achillesachillesbeatsbeatsturtleturtlerapidrapidthethe

Definite Clause GrammarsNon-terminals may have arguments

SS --gt --gt NPNP((NN))VPVP((NN))

NP(NP(NN)) --gt Art(--gt Art(NN)N()N(NN))

VP(VP(NN)) --gt VI(--gt VI(NN))

Art(Art(singularsingular)) --gt [a]--gt [a]

Art(Art(singularsingular)) --gt [the]--gt [the]

Art(Art(pluralplural)) --gt [the]--gt [the]

N(N(singularsingular)) --gt [turtle]--gt [turtle]

N(N(pluralplural)) --gt [turtles]--gt [turtles]

VI(VI(singularsingular)) --gt [sleeps]--gt [sleeps]

VI(VI(pluralplural)) --gt [sleep]--gt [sleep]

Number Agreement

DCGs

Non-terminals may have argumentsbull Variables (start with capital)

Eg Number Any

bull Constants (start with lower case) Eg singular plural

bull Structured terms (start with lower case and take arguments themselves) Eg vp(VNP)

Parsing needs to be adapted bull Using unification

Unification in a nutshell (cf AI course)

Substitutions

Eg Num singular T vp(VNP)

Applying substitution bull Simultaneously replace variables by

corresponding termsbull S(Num) Num singular = S(singular)

Unification

Take two non-terminals with arguments and compute (most general) substitution that makes them identical egbull Art(singular) and Art(Num)

Gives Num singular

bull Art(singular) and Art(plural) Fails

bull Art(Num1) and Art(Num2) Num1 Num2

bull PN(Num accusative) and PN(singular Case) Numsingular Caseaccusative

Parsing with DCGs

Now require successful unification at each step

S -gt NP(N) VP(N) S -gt Art(N) N(N) VP(N) Nsingular S -gt a N(singular) VP(singular) S -gt a turtle VP(singular) S -gt a turtle sleeps

S-gt a turtle sleep fails

Case Marking

PN(singularnominative)PN(singularnominative) --gt --gt [he][she][he][she]

PN(singularaccusative)PN(singularaccusative) --gt --gt [him][her][him][her]

PN(pluralnominative)PN(pluralnominative) --gt --gt [they][they]

PN(pluralaccusative)PN(pluralaccusative) --gt --gt [them][them]

S --gt NP(Numbernominative) NP(Number)S --gt NP(Numbernominative) NP(Number)

VP(Number) --gt V(Number) VP(Anyaccusative)VP(Number) --gt V(Number) VP(Anyaccusative)

VP(NumberCase) --gt PN(NumberCase)VP(NumberCase) --gt PN(NumberCase)

VP(NumberAny) --gt Det N(Number)VP(NumberAny) --gt Det N(Number)

He sees her She sees him They see her

But not Them see he

DCGs

Are strictly more expressive than CFGs Can represent for instance

bull S(N) -gt A(N) B(N) C(N)bull A(0) -gt [] bull B(0) -gt []bull C(0) -gt []bull A(s(N)) -gt A(N) [A]bull B(s(N)) -gt B(N) [B]bull C(s(N)) -gt C(N) [C]

Probabilistic Models

Traditional grammar models are very rigid bull essentially a yes no decision

Probabilistic grammarsbull Define a probability models for the databull Compute the probability of each alternativebull Choose the most likely alternative

Ilustrate on bull Shannon Gamebull Spelling correctionbull Parsing

Illustration

Wall Street Journal Corpus 3 000 000 words Correct parse tree for sentences known

bull Constructed by handbull Can be used to derive stochastic context free

grammarsbull SCFG assign probability to parse trees

Compute the most probable parse tree

Sequences are omni-present

Therefore the techniques we will see also apply tobull Bioinformatics

DNA proteins mRNA hellip can all be represented as strings

bullRobotics Sequences of actions states hellip

bullhellip

Rest of the Course

Limitations traditional grammar models motivate probabilistic extensions bull Regular grammars and Finite State Automata

All use principles of Part I on Graphical Models Markov Models using n-gramms (Hidden) Markov Models Conditional Random Fields

bull As an example of using undirected graphical models

bull Probabilistic Context Free Grammarsbull Probabilistic Definite Clause Grammars

  • Advanced Artificial Intelligence
  • Topic
  • Contents
  • Rationalism versus Empiricism
  • Slide 5
  • This course
  • Ambiguity
  • NLP and Statistics
  • Slide 9
  • Corpora
  • Word Counts
  • Slide 12
  • Word Counts (Brown corpus)
  • Slide 14
  • Zipflsquos Law
  • Language and sequences
  • Key NLP Problem Ambiguity
  • Language Model
  • Example of bad language model
  • A bad language model
  • Slide 22
  • A good language model
  • Why language models
  • Applications
  • Spelling errors
  • Handwriting recognition
  • For Spell Checkers
  • Another dimension in language models
  • Sequence Tagging
  • Slide 31
  • Parsing
  • Slide 33
  • Language models based on Grammars
  • Grammars and parsing
  • Regular Grammars and Finite State Automata
  • Finite State Automaton
  • Phrase structure
  • Notation
  • Context Free Grammar
  • Slide 41
  • Top-down parsing
  • Context-free grammar
  • Parse tree
  • Definite Clause Grammars Non-terminals may have arguments
  • DCGs
  • Unification in a nutshell (cf AI course)
  • Unification
  • Parsing with DCGs
  • Case Marking
  • Slide 51
  • Probabilistic Models
  • Illustration
  • PowerPoint Presentation
  • Sequences are omni-present
  • Rest of the Course
Page 36: Part II. Statistical NLP Advanced Artificial Intelligence Introduction and Grammar Models Wolfram Burgard, Luc De Raedt, Bernhard Nebel, Kristian Kersting

Finite State Automaton

Sentences

bull John smiles - Pn Vibull The cat disappeared - Det N Vibull These new shoes hurt - Det Adj N Vibull John liked the old cat PN Vt Det Adj N

Phrase structure

S

NP

D N

VP

NPV

D N

PP

P NP

D N

the dog chased a cat into the garden

Notation

S sentence D or Det Determiner (eg articles) N noun V verb P preposition NP noun phrase VP verb phrase PP prepositional phrase

Context Free GrammarS -gt NP VPNP -gt D NVP -gt V NPVP -gt V NP PPPP -gt P NPD -gt [the]D -gt [a]N -gt [dog]N -gt [cat]N -gt [garden]V -gt [chased]V -gt [saw]P -gt [into]

Terminals ~ Lexicon

Phrase structure

Formalism of context-free grammarsbull Nonterminal symbols S NP VP bull Terminal symbols dog cat saw the

Recursionbull bdquoThe girl thought the dog chased the catldquo

VP -gt V SN -gt [girl]V -gt [thought]

Top-down parsing

S -gt NP VP S -gt Det N VP S -gt The N VP S -gt The dog VP S -gt The dog V NP S -gt The dog chased NP S -gt The dog chased Det N S-gt The dog chased the N S-gt The dog chased the cat

Context-free grammarSS --gt --gt NPNPVPVP

NPNP --gt PN --gt PN Proper nounProper noun

NPNP --gt Art Adj N--gt Art Adj N

NPNP --gt ArtN--gt ArtN

VPVP --gt VI --gt VI intransitive verbintransitive verb

VPVP --gt VT --gt VT NPNP transitive verbtransitive verb

ArtArt --gt [the]--gt [the]

AdjAdj --gt [lazy]--gt [lazy]

AdjAdj --gt [rapid]--gt [rapid]

PNPN --gt [achilles]--gt [achilles]

NN --gt [turtle]--gt [turtle]

VIVI --gt [sleeps]--gt [sleeps]

VTVT --gt [beats]--gt [beats]

Parse tree

SS

NPNP VPVP

ArtArt AdjAdj NN VtVt NPNP

PNPN

achillesachillesbeatsbeatsturtleturtlerapidrapidthethe

Definite Clause GrammarsNon-terminals may have arguments

SS --gt --gt NPNP((NN))VPVP((NN))

NP(NP(NN)) --gt Art(--gt Art(NN)N()N(NN))

VP(VP(NN)) --gt VI(--gt VI(NN))

Art(Art(singularsingular)) --gt [a]--gt [a]

Art(Art(singularsingular)) --gt [the]--gt [the]

Art(Art(pluralplural)) --gt [the]--gt [the]

N(N(singularsingular)) --gt [turtle]--gt [turtle]

N(N(pluralplural)) --gt [turtles]--gt [turtles]

VI(VI(singularsingular)) --gt [sleeps]--gt [sleeps]

VI(VI(pluralplural)) --gt [sleep]--gt [sleep]

Number Agreement

DCGs

Non-terminals may have argumentsbull Variables (start with capital)

Eg Number Any

bull Constants (start with lower case) Eg singular plural

bull Structured terms (start with lower case and take arguments themselves) Eg vp(VNP)

Parsing needs to be adapted bull Using unification

Unification in a nutshell (cf AI course)

Substitutions

Eg Num singular T vp(VNP)

Applying substitution bull Simultaneously replace variables by

corresponding termsbull S(Num) Num singular = S(singular)

Unification

Take two non-terminals with arguments and compute (most general) substitution that makes them identical egbull Art(singular) and Art(Num)

Gives Num singular

bull Art(singular) and Art(plural) Fails

bull Art(Num1) and Art(Num2) Num1 Num2

bull PN(Num accusative) and PN(singular Case) Numsingular Caseaccusative

Parsing with DCGs

Now require successful unification at each step

S -gt NP(N) VP(N) S -gt Art(N) N(N) VP(N) Nsingular S -gt a N(singular) VP(singular) S -gt a turtle VP(singular) S -gt a turtle sleeps

S-gt a turtle sleep fails

Case Marking

PN(singularnominative)PN(singularnominative) --gt --gt [he][she][he][she]

PN(singularaccusative)PN(singularaccusative) --gt --gt [him][her][him][her]

PN(pluralnominative)PN(pluralnominative) --gt --gt [they][they]

PN(pluralaccusative)PN(pluralaccusative) --gt --gt [them][them]

S --gt NP(Numbernominative) NP(Number)S --gt NP(Numbernominative) NP(Number)

VP(Number) --gt V(Number) VP(Anyaccusative)VP(Number) --gt V(Number) VP(Anyaccusative)

VP(NumberCase) --gt PN(NumberCase)VP(NumberCase) --gt PN(NumberCase)

VP(NumberAny) --gt Det N(Number)VP(NumberAny) --gt Det N(Number)

He sees her She sees him They see her

But not Them see he

DCGs

Are strictly more expressive than CFGs Can represent for instance

bull S(N) -gt A(N) B(N) C(N)bull A(0) -gt [] bull B(0) -gt []bull C(0) -gt []bull A(s(N)) -gt A(N) [A]bull B(s(N)) -gt B(N) [B]bull C(s(N)) -gt C(N) [C]

Probabilistic Models

Traditional grammar models are very rigid bull essentially a yes no decision

Probabilistic grammarsbull Define a probability models for the databull Compute the probability of each alternativebull Choose the most likely alternative

Ilustrate on bull Shannon Gamebull Spelling correctionbull Parsing

Illustration

Wall Street Journal Corpus 3 000 000 words Correct parse tree for sentences known

bull Constructed by handbull Can be used to derive stochastic context free

grammarsbull SCFG assign probability to parse trees

Compute the most probable parse tree

Sequences are omni-present

Therefore the techniques we will see also apply tobull Bioinformatics

DNA proteins mRNA hellip can all be represented as strings

bullRobotics Sequences of actions states hellip

bullhellip

Rest of the Course

Limitations traditional grammar models motivate probabilistic extensions bull Regular grammars and Finite State Automata

All use principles of Part I on Graphical Models Markov Models using n-gramms (Hidden) Markov Models Conditional Random Fields

bull As an example of using undirected graphical models

bull Probabilistic Context Free Grammarsbull Probabilistic Definite Clause Grammars

  • Advanced Artificial Intelligence
  • Topic
  • Contents
  • Rationalism versus Empiricism
  • Slide 5
  • This course
  • Ambiguity
  • NLP and Statistics
  • Slide 9
  • Corpora
  • Word Counts
  • Slide 12
  • Word Counts (Brown corpus)
  • Slide 14
  • Zipflsquos Law
  • Language and sequences
  • Key NLP Problem Ambiguity
  • Language Model
  • Example of bad language model
  • A bad language model
  • Slide 22
  • A good language model
  • Why language models
  • Applications
  • Spelling errors
  • Handwriting recognition
  • For Spell Checkers
  • Another dimension in language models
  • Sequence Tagging
  • Slide 31
  • Parsing
  • Slide 33
  • Language models based on Grammars
  • Grammars and parsing
  • Regular Grammars and Finite State Automata
  • Finite State Automaton
  • Phrase structure
  • Notation
  • Context Free Grammar
  • Slide 41
  • Top-down parsing
  • Context-free grammar
  • Parse tree
  • Definite Clause Grammars Non-terminals may have arguments
  • DCGs
  • Unification in a nutshell (cf AI course)
  • Unification
  • Parsing with DCGs
  • Case Marking
  • Slide 51
  • Probabilistic Models
  • Illustration
  • PowerPoint Presentation
  • Sequences are omni-present
  • Rest of the Course
Page 37: Part II. Statistical NLP Advanced Artificial Intelligence Introduction and Grammar Models Wolfram Burgard, Luc De Raedt, Bernhard Nebel, Kristian Kersting

Phrase structure

S

NP

D N

VP

NPV

D N

PP

P NP

D N

the dog chased a cat into the garden

Notation

S sentence D or Det Determiner (eg articles) N noun V verb P preposition NP noun phrase VP verb phrase PP prepositional phrase

Context Free GrammarS -gt NP VPNP -gt D NVP -gt V NPVP -gt V NP PPPP -gt P NPD -gt [the]D -gt [a]N -gt [dog]N -gt [cat]N -gt [garden]V -gt [chased]V -gt [saw]P -gt [into]

Terminals ~ Lexicon

Phrase structure

Formalism of context-free grammarsbull Nonterminal symbols S NP VP bull Terminal symbols dog cat saw the

Recursionbull bdquoThe girl thought the dog chased the catldquo

VP -gt V SN -gt [girl]V -gt [thought]

Top-down parsing

S -gt NP VP S -gt Det N VP S -gt The N VP S -gt The dog VP S -gt The dog V NP S -gt The dog chased NP S -gt The dog chased Det N S-gt The dog chased the N S-gt The dog chased the cat

Context-free grammar

S   -> NP VP
NP  -> PN          (proper noun)
NP  -> Art Adj N
NP  -> Art N
VP  -> VI          (intransitive verb)
VP  -> VT NP       (transitive verb)
Art -> [the]
Adj -> [lazy]
Adj -> [rapid]
PN  -> [achilles]
N   -> [turtle]
VI  -> [sleeps]
VT  -> [beats]

Parse tree

[S [NP [Art the] [Adj rapid] [N turtle]]
   [VP [VT beats] [NP [PN achilles]]]]

("the rapid turtle beats achilles")

Definite Clause Grammars: non-terminals may have arguments

S     -> NP(N) VP(N)
NP(N) -> Art(N) N(N)
VP(N) -> VI(N)
Art(singular) -> [a]
Art(singular) -> [the]
Art(plural)   -> [the]
N(singular)   -> [turtle]
N(plural)     -> [turtles]
VI(singular)  -> [sleeps]
VI(plural)    -> [sleep]

Number agreement

DCGs

Non-terminals may have arguments:
• Variables (start with a capital letter), e.g. Number, Any
• Constants (start with lower case), e.g. singular, plural
• Structured terms (start with lower case and take arguments themselves), e.g. vp(V, NP)

Parsing needs to be adapted:
• Using unification

Unification in a nutshell (cf. AI course)

Substitutions, e.g. {Num/singular, T/vp(V, NP)}

Applying a substitution:
• Simultaneously replace the variables by the corresponding terms
• S(Num){Num/singular} = S(singular)

Unification

Take two non-terminals with arguments and compute the (most general) substitution that makes them identical, e.g.:
• Art(singular) and Art(Num): gives {Num/singular}
• Art(singular) and Art(plural): fails
• Art(Num1) and Art(Num2): {Num1/Num2}
• PN(Num, accusative) and PN(singular, Case): {Num/singular, Case/accusative}
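The examples above can be reproduced with a minimal unification procedure. This is a sketch under assumptions of my own choosing: variables are capitalized strings, constants are lowercase strings, and a non-terminal like Art(Num) is the tuple ("art", "Num"). (The occurs check is omitted for brevity.)

```python
# Minimal unification for non-terminal arguments.
# Variables: capitalized strings ("Num"); constants: lowercase strings;
# compound terms: tuples like ("art", "Num").

def is_var(t):
    return isinstance(t, str) and t[:1].isupper()

def walk(t, subst):
    # Follow variable bindings until a non-variable or an unbound variable.
    while is_var(t) and t in subst:
        t = subst[t]
    return t

def unify(a, b, subst=None):
    """Return the most general substitution making a and b identical, or None."""
    subst = dict(subst or {})
    a, b = walk(a, subst), walk(b, subst)
    if a == b:
        return subst
    if is_var(a):
        subst[a] = b
        return subst
    if is_var(b):
        subst[b] = a
        return subst
    if isinstance(a, tuple) and isinstance(b, tuple) and len(a) == len(b):
        for x, y in zip(a, b):
            subst = unify(x, y, subst)
            if subst is None:
                return None
        return subst
    return None

print(unify(("art", "singular"), ("art", "Num")))     # {'Num': 'singular'}
print(unify(("art", "singular"), ("art", "plural")))  # None
print(unify(("pn", "Num", "accusative"), ("pn", "singular", "Case")))
# {'Num': 'singular', 'Case': 'accusative'}
```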

Parsing with DCGs

Now require successful unification at each step:

S -> NP(N) VP(N)
  -> Art(N) N(N) VP(N)            {N/singular}
  -> a N(singular) VP(singular)
  -> a turtle VP(singular)
  -> a turtle sleeps

"a turtle sleep" fails.

Case Marking

PN(singular, nominative) -> [he] ; [she]
PN(singular, accusative) -> [him] ; [her]
PN(plural, nominative)   -> [they]
PN(plural, accusative)   -> [them]

S -> NP(Number, nominative) VP(Number)
VP(Number) -> V(Number) NP(Any, accusative)
NP(Number, Case) -> PN(Number, Case)
NP(Number, Any)  -> Det N(Number)

"He sees her", "She sees him", "They see her" — but not "Them see he".

DCGs

Are strictly more expressive than CFGs. They can represent, for instance, the language a^n b^n c^n:

S(N)    -> A(N) B(N) C(N)
A(0)    -> []
B(0)    -> []
C(0)    -> []
A(s(N)) -> A(N) [a]
B(s(N)) -> B(N) [b]
C(s(N)) -> C(N) [c]
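To make concrete what those rules generate, here is a direct Python recognizer for a^n b^n c^n — a sketch standing in for the DCG machinery, which would derive the same language via the shared count argument N:

```python
# a^n b^n c^n is not context-free; the DCG handles it because the
# count N is shared by the A, B and C parts. A direct check:

def recognize(s):
    """True iff s is a^n b^n c^n for some n >= 0."""
    if len(s) % 3 != 0:
        return False
    n = len(s) // 3
    return s == "a" * n + "b" * n + "c" * n

print(recognize(""))        # True  (n = 0)
print(recognize("aabbcc"))  # True  (n = 2)
print(recognize("aabbc"))   # False
print(recognize("abcabc"))  # False
```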

Probabilistic Models

Traditional grammar models are very rigid:
• essentially a yes/no decision

Probabilistic grammars:
• Define a probability model for the data
• Compute the probability of each alternative
• Choose the most likely alternative

Illustrated on:
• the Shannon game
• Spelling correction
• Parsing

Illustration

Wall Street Journal Corpus: 3,000,000 words, with the correct parse tree for each sentence known:
• Constructed by hand
• Can be used to derive stochastic context-free grammars (SCFGs)
• SCFGs assign a probability to each parse tree

Compute the most probable parse tree.
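An SCFG scores a parse tree as the product of the probabilities of the rules used in it; the most probable tree then wins. A minimal sketch — the rule probabilities and tree representation here are made up for illustration, not derived from the WSJ corpus:

```python
# Hypothetical rule probabilities for a tiny SCFG (illustrative numbers only).
# For each left-hand side, the probabilities of its rules sum to 1.
RULE_PROB = {
    ("S", ("NP", "VP")): 1.0,
    ("NP", ("D", "N")): 0.7,
    ("NP", ("PN",)): 0.3,
    ("VP", ("V", "NP")): 0.6,
    ("VP", ("V", "NP", "PP")): 0.4,
}

def tree_prob(tree):
    """P(tree) = product of the probabilities of the rules used in it.
    A tree is (label, [children]); a leaf is a plain word string."""
    if isinstance(tree, str):
        return 1.0
    label, children = tree
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    p = RULE_PROB.get((label, rhs), 1.0)  # lexical rules get probability 1 here
    for c in children:
        p *= tree_prob(c)
    return p

t = ("S", [("NP", [("D", ["the"]), ("N", ["dog"])]),
           ("VP", [("V", ["chased"]), ("NP", [("D", ["a"]), ("N", ["cat"])])])])
print(tree_prob(t))   # ≈ 0.294  (1.0 * 0.7 * 0.6 * 0.7)
```

Given several candidate trees for one sentence, choosing the parse is then just `max(trees, key=tree_prob)`.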

Sequences are omnipresent

Therefore the techniques we will see also apply to:
• Bioinformatics: DNA, proteins, mRNA, ... can all be represented as strings
• Robotics: sequences of actions, states, ...
• ...

Rest of the Course

Limitations of traditional grammar models motivate probabilistic extensions:
• Regular grammars and finite state automata
  - Markov models using n-grams
  - (Hidden) Markov models
  - Conditional random fields (as an example of using undirected graphical models)
• Probabilistic context-free grammars
• Probabilistic definite clause grammars

All use principles of Part I on graphical models.

  • Advanced Artificial Intelligence
  • Topic
  • Contents
  • Rationalism versus Empiricism
  • Slide 5
  • This course
  • Ambiguity
  • NLP and Statistics
  • Slide 9
  • Corpora
  • Word Counts
  • Slide 12
  • Word Counts (Brown corpus)
  • Slide 14
  • Zipflsquos Law
  • Language and sequences
  • Key NLP Problem Ambiguity
  • Language Model
  • Example of bad language model
  • A bad language model
  • Slide 22
  • A good language model
  • Why language models
  • Applications
  • Spelling errors
  • Handwriting recognition
  • For Spell Checkers
  • Another dimension in language models
  • Sequence Tagging
  • Slide 31
  • Parsing
  • Slide 33
  • Language models based on Grammars
  • Grammars and parsing
  • Regular Grammars and Finite State Automata
  • Finite State Automaton
  • Phrase structure
  • Notation
  • Context Free Grammar
  • Slide 41
  • Top-down parsing
  • Context-free grammar
  • Parse tree
  • Definite Clause Grammars Non-terminals may have arguments
  • DCGs
  • Unification in a nutshell (cf AI course)
  • Unification
  • Parsing with DCGs
  • Case Marking
  • Slide 51
  • Probabilistic Models
  • Illustration
  • PowerPoint Presentation
  • Sequences are omni-present
  • Rest of the Course
Page 38: Part II. Statistical NLP Advanced Artificial Intelligence Introduction and Grammar Models Wolfram Burgard, Luc De Raedt, Bernhard Nebel, Kristian Kersting

Notation

S sentence D or Det Determiner (eg articles) N noun V verb P preposition NP noun phrase VP verb phrase PP prepositional phrase

Context Free GrammarS -gt NP VPNP -gt D NVP -gt V NPVP -gt V NP PPPP -gt P NPD -gt [the]D -gt [a]N -gt [dog]N -gt [cat]N -gt [garden]V -gt [chased]V -gt [saw]P -gt [into]

Terminals ~ Lexicon

Phrase structure

Formalism of context-free grammarsbull Nonterminal symbols S NP VP bull Terminal symbols dog cat saw the

Recursionbull bdquoThe girl thought the dog chased the catldquo

VP -gt V SN -gt [girl]V -gt [thought]

Top-down parsing

S -gt NP VP S -gt Det N VP S -gt The N VP S -gt The dog VP S -gt The dog V NP S -gt The dog chased NP S -gt The dog chased Det N S-gt The dog chased the N S-gt The dog chased the cat

Context-free grammarSS --gt --gt NPNPVPVP

NPNP --gt PN --gt PN Proper nounProper noun

NPNP --gt Art Adj N--gt Art Adj N

NPNP --gt ArtN--gt ArtN

VPVP --gt VI --gt VI intransitive verbintransitive verb

VPVP --gt VT --gt VT NPNP transitive verbtransitive verb

ArtArt --gt [the]--gt [the]

AdjAdj --gt [lazy]--gt [lazy]

AdjAdj --gt [rapid]--gt [rapid]

PNPN --gt [achilles]--gt [achilles]

NN --gt [turtle]--gt [turtle]

VIVI --gt [sleeps]--gt [sleeps]

VTVT --gt [beats]--gt [beats]

Parse tree

SS

NPNP VPVP

ArtArt AdjAdj NN VtVt NPNP

PNPN

achillesachillesbeatsbeatsturtleturtlerapidrapidthethe

Definite Clause GrammarsNon-terminals may have arguments

SS --gt --gt NPNP((NN))VPVP((NN))

NP(NP(NN)) --gt Art(--gt Art(NN)N()N(NN))

VP(VP(NN)) --gt VI(--gt VI(NN))

Art(Art(singularsingular)) --gt [a]--gt [a]

Art(Art(singularsingular)) --gt [the]--gt [the]

Art(Art(pluralplural)) --gt [the]--gt [the]

N(N(singularsingular)) --gt [turtle]--gt [turtle]

N(N(pluralplural)) --gt [turtles]--gt [turtles]

VI(VI(singularsingular)) --gt [sleeps]--gt [sleeps]

VI(VI(pluralplural)) --gt [sleep]--gt [sleep]

Number Agreement

DCGs

Non-terminals may have argumentsbull Variables (start with capital)

Eg Number Any

bull Constants (start with lower case) Eg singular plural

bull Structured terms (start with lower case and take arguments themselves) Eg vp(VNP)

Parsing needs to be adapted bull Using unification

Unification in a nutshell (cf AI course)

Substitutions

Eg Num singular T vp(VNP)

Applying substitution bull Simultaneously replace variables by

corresponding termsbull S(Num) Num singular = S(singular)

Unification

Take two non-terminals with arguments and compute (most general) substitution that makes them identical egbull Art(singular) and Art(Num)

Gives Num singular

bull Art(singular) and Art(plural) Fails

bull Art(Num1) and Art(Num2) Num1 Num2

bull PN(Num accusative) and PN(singular Case) Numsingular Caseaccusative

Parsing with DCGs

Now require successful unification at each step

S -gt NP(N) VP(N) S -gt Art(N) N(N) VP(N) Nsingular S -gt a N(singular) VP(singular) S -gt a turtle VP(singular) S -gt a turtle sleeps

S-gt a turtle sleep fails

Case Marking

PN(singularnominative)PN(singularnominative) --gt --gt [he][she][he][she]

PN(singularaccusative)PN(singularaccusative) --gt --gt [him][her][him][her]

PN(pluralnominative)PN(pluralnominative) --gt --gt [they][they]

PN(pluralaccusative)PN(pluralaccusative) --gt --gt [them][them]

S --gt NP(Numbernominative) NP(Number)S --gt NP(Numbernominative) NP(Number)

VP(Number) --gt V(Number) VP(Anyaccusative)VP(Number) --gt V(Number) VP(Anyaccusative)

VP(NumberCase) --gt PN(NumberCase)VP(NumberCase) --gt PN(NumberCase)

VP(NumberAny) --gt Det N(Number)VP(NumberAny) --gt Det N(Number)

He sees her She sees him They see her

But not Them see he

DCGs

Are strictly more expressive than CFGs Can represent for instance

bull S(N) -gt A(N) B(N) C(N)bull A(0) -gt [] bull B(0) -gt []bull C(0) -gt []bull A(s(N)) -gt A(N) [A]bull B(s(N)) -gt B(N) [B]bull C(s(N)) -gt C(N) [C]

Probabilistic Models

Traditional grammar models are very rigid bull essentially a yes no decision

Probabilistic grammarsbull Define a probability models for the databull Compute the probability of each alternativebull Choose the most likely alternative

Ilustrate on bull Shannon Gamebull Spelling correctionbull Parsing

Illustration

Wall Street Journal Corpus 3 000 000 words Correct parse tree for sentences known

bull Constructed by handbull Can be used to derive stochastic context free

grammarsbull SCFG assign probability to parse trees

Compute the most probable parse tree

Sequences are omni-present

Therefore the techniques we will see also apply tobull Bioinformatics

DNA proteins mRNA hellip can all be represented as strings

bullRobotics Sequences of actions states hellip

bullhellip

Rest of the Course

Limitations traditional grammar models motivate probabilistic extensions bull Regular grammars and Finite State Automata

All use principles of Part I on Graphical Models Markov Models using n-gramms (Hidden) Markov Models Conditional Random Fields

bull As an example of using undirected graphical models

bull Probabilistic Context Free Grammarsbull Probabilistic Definite Clause Grammars

  • Advanced Artificial Intelligence
  • Topic
  • Contents
  • Rationalism versus Empiricism
  • Slide 5
  • This course
  • Ambiguity
  • NLP and Statistics
  • Slide 9
  • Corpora
  • Word Counts
  • Slide 12
  • Word Counts (Brown corpus)
  • Slide 14
  • Zipflsquos Law
  • Language and sequences
  • Key NLP Problem Ambiguity
  • Language Model
  • Example of bad language model
  • A bad language model
  • Slide 22
  • A good language model
  • Why language models
  • Applications
  • Spelling errors
  • Handwriting recognition
  • For Spell Checkers
  • Another dimension in language models
  • Sequence Tagging
  • Slide 31
  • Parsing
  • Slide 33
  • Language models based on Grammars
  • Grammars and parsing
  • Regular Grammars and Finite State Automata
  • Finite State Automaton
  • Phrase structure
  • Notation
  • Context Free Grammar
  • Slide 41
  • Top-down parsing
  • Context-free grammar
  • Parse tree
  • Definite Clause Grammars Non-terminals may have arguments
  • DCGs
  • Unification in a nutshell (cf AI course)
  • Unification
  • Parsing with DCGs
  • Case Marking
  • Slide 51
  • Probabilistic Models
  • Illustration
  • PowerPoint Presentation
  • Sequences are omni-present
  • Rest of the Course
Page 39: Part II. Statistical NLP Advanced Artificial Intelligence Introduction and Grammar Models Wolfram Burgard, Luc De Raedt, Bernhard Nebel, Kristian Kersting

Context Free GrammarS -gt NP VPNP -gt D NVP -gt V NPVP -gt V NP PPPP -gt P NPD -gt [the]D -gt [a]N -gt [dog]N -gt [cat]N -gt [garden]V -gt [chased]V -gt [saw]P -gt [into]

Terminals ~ Lexicon

Phrase structure

Formalism of context-free grammarsbull Nonterminal symbols S NP VP bull Terminal symbols dog cat saw the

Recursionbull bdquoThe girl thought the dog chased the catldquo

VP -gt V SN -gt [girl]V -gt [thought]

Top-down parsing

S -gt NP VP S -gt Det N VP S -gt The N VP S -gt The dog VP S -gt The dog V NP S -gt The dog chased NP S -gt The dog chased Det N S-gt The dog chased the N S-gt The dog chased the cat

Context-free grammarSS --gt --gt NPNPVPVP

NPNP --gt PN --gt PN Proper nounProper noun

NPNP --gt Art Adj N--gt Art Adj N

NPNP --gt ArtN--gt ArtN

VPVP --gt VI --gt VI intransitive verbintransitive verb

VPVP --gt VT --gt VT NPNP transitive verbtransitive verb

ArtArt --gt [the]--gt [the]

AdjAdj --gt [lazy]--gt [lazy]

AdjAdj --gt [rapid]--gt [rapid]

PNPN --gt [achilles]--gt [achilles]

NN --gt [turtle]--gt [turtle]

VIVI --gt [sleeps]--gt [sleeps]

VTVT --gt [beats]--gt [beats]

Parse tree

SS

NPNP VPVP

ArtArt AdjAdj NN VtVt NPNP

PNPN

achillesachillesbeatsbeatsturtleturtlerapidrapidthethe

Definite Clause GrammarsNon-terminals may have arguments

SS --gt --gt NPNP((NN))VPVP((NN))

NP(NP(NN)) --gt Art(--gt Art(NN)N()N(NN))

VP(VP(NN)) --gt VI(--gt VI(NN))

Art(Art(singularsingular)) --gt [a]--gt [a]

Art(Art(singularsingular)) --gt [the]--gt [the]

Art(Art(pluralplural)) --gt [the]--gt [the]

N(N(singularsingular)) --gt [turtle]--gt [turtle]

N(N(pluralplural)) --gt [turtles]--gt [turtles]

VI(VI(singularsingular)) --gt [sleeps]--gt [sleeps]

VI(VI(pluralplural)) --gt [sleep]--gt [sleep]

Number Agreement

DCGs

Non-terminals may have argumentsbull Variables (start with capital)

Eg Number Any

bull Constants (start with lower case) Eg singular plural

bull Structured terms (start with lower case and take arguments themselves) Eg vp(VNP)

Parsing needs to be adapted bull Using unification

Unification in a nutshell (cf AI course)

Substitutions

Eg Num singular T vp(VNP)

Applying substitution bull Simultaneously replace variables by

corresponding termsbull S(Num) Num singular = S(singular)

Unification

Take two non-terminals with arguments and compute (most general) substitution that makes them identical egbull Art(singular) and Art(Num)

Gives Num singular

bull Art(singular) and Art(plural) Fails

bull Art(Num1) and Art(Num2) Num1 Num2

bull PN(Num accusative) and PN(singular Case) Numsingular Caseaccusative

Parsing with DCGs

Now require successful unification at each step

S -gt NP(N) VP(N) S -gt Art(N) N(N) VP(N) Nsingular S -gt a N(singular) VP(singular) S -gt a turtle VP(singular) S -gt a turtle sleeps

S-gt a turtle sleep fails

Case Marking

PN(singularnominative)PN(singularnominative) --gt --gt [he][she][he][she]

PN(singularaccusative)PN(singularaccusative) --gt --gt [him][her][him][her]

PN(pluralnominative)PN(pluralnominative) --gt --gt [they][they]

PN(pluralaccusative)PN(pluralaccusative) --gt --gt [them][them]

S --gt NP(Numbernominative) NP(Number)S --gt NP(Numbernominative) NP(Number)

VP(Number) --gt V(Number) VP(Anyaccusative)VP(Number) --gt V(Number) VP(Anyaccusative)

VP(NumberCase) --gt PN(NumberCase)VP(NumberCase) --gt PN(NumberCase)

VP(NumberAny) --gt Det N(Number)VP(NumberAny) --gt Det N(Number)

He sees her She sees him They see her

But not Them see he

DCGs

Are strictly more expressive than CFGs Can represent for instance

bull S(N) -gt A(N) B(N) C(N)bull A(0) -gt [] bull B(0) -gt []bull C(0) -gt []bull A(s(N)) -gt A(N) [A]bull B(s(N)) -gt B(N) [B]bull C(s(N)) -gt C(N) [C]

Probabilistic Models

Traditional grammar models are very rigid bull essentially a yes no decision

Probabilistic grammarsbull Define a probability models for the databull Compute the probability of each alternativebull Choose the most likely alternative

Ilustrate on bull Shannon Gamebull Spelling correctionbull Parsing

Illustration

Wall Street Journal Corpus 3 000 000 words Correct parse tree for sentences known

bull Constructed by handbull Can be used to derive stochastic context free

grammarsbull SCFG assign probability to parse trees

Compute the most probable parse tree

Sequences are omni-present

Therefore the techniques we will see also apply tobull Bioinformatics

DNA proteins mRNA hellip can all be represented as strings

bullRobotics Sequences of actions states hellip

bullhellip

Rest of the Course

Limitations traditional grammar models motivate probabilistic extensions bull Regular grammars and Finite State Automata

All use principles of Part I on Graphical Models Markov Models using n-gramms (Hidden) Markov Models Conditional Random Fields

bull As an example of using undirected graphical models

bull Probabilistic Context Free Grammarsbull Probabilistic Definite Clause Grammars

  • Advanced Artificial Intelligence
  • Topic
  • Contents
  • Rationalism versus Empiricism
  • Slide 5
  • This course
  • Ambiguity
  • NLP and Statistics
  • Slide 9
  • Corpora
  • Word Counts
  • Slide 12
  • Word Counts (Brown corpus)
  • Slide 14
  • Zipflsquos Law
  • Language and sequences
  • Key NLP Problem Ambiguity
  • Language Model
  • Example of bad language model
  • A bad language model
  • Slide 22
  • A good language model
  • Why language models
  • Applications
  • Spelling errors
  • Handwriting recognition
  • For Spell Checkers
  • Another dimension in language models
  • Sequence Tagging
  • Slide 31
  • Parsing
  • Slide 33
  • Language models based on Grammars
  • Grammars and parsing
  • Regular Grammars and Finite State Automata
  • Finite State Automaton
  • Phrase structure
  • Notation
  • Context Free Grammar
  • Slide 41
  • Top-down parsing
  • Context-free grammar
  • Parse tree
  • Definite Clause Grammars Non-terminals may have arguments
  • DCGs
  • Unification in a nutshell (cf AI course)
  • Unification
  • Parsing with DCGs
  • Case Marking
  • Slide 51
  • Probabilistic Models
  • Illustration
  • PowerPoint Presentation
  • Sequences are omni-present
  • Rest of the Course
Page 40: Part II. Statistical NLP Advanced Artificial Intelligence Introduction and Grammar Models Wolfram Burgard, Luc De Raedt, Bernhard Nebel, Kristian Kersting

Phrase structure

Formalism of context-free grammarsbull Nonterminal symbols S NP VP bull Terminal symbols dog cat saw the

Recursionbull bdquoThe girl thought the dog chased the catldquo

VP -gt V SN -gt [girl]V -gt [thought]

Top-down parsing

S -gt NP VP S -gt Det N VP S -gt The N VP S -gt The dog VP S -gt The dog V NP S -gt The dog chased NP S -gt The dog chased Det N S-gt The dog chased the N S-gt The dog chased the cat

Context-free grammarSS --gt --gt NPNPVPVP

NPNP --gt PN --gt PN Proper nounProper noun

NPNP --gt Art Adj N--gt Art Adj N

NPNP --gt ArtN--gt ArtN

VPVP --gt VI --gt VI intransitive verbintransitive verb

VPVP --gt VT --gt VT NPNP transitive verbtransitive verb

ArtArt --gt [the]--gt [the]

AdjAdj --gt [lazy]--gt [lazy]

AdjAdj --gt [rapid]--gt [rapid]

PNPN --gt [achilles]--gt [achilles]

NN --gt [turtle]--gt [turtle]

VIVI --gt [sleeps]--gt [sleeps]

VTVT --gt [beats]--gt [beats]

Parse tree

SS

NPNP VPVP

ArtArt AdjAdj NN VtVt NPNP

PNPN

achillesachillesbeatsbeatsturtleturtlerapidrapidthethe

Definite Clause GrammarsNon-terminals may have arguments

SS --gt --gt NPNP((NN))VPVP((NN))

NP(NP(NN)) --gt Art(--gt Art(NN)N()N(NN))

VP(VP(NN)) --gt VI(--gt VI(NN))

Art(Art(singularsingular)) --gt [a]--gt [a]

Art(Art(singularsingular)) --gt [the]--gt [the]

Art(Art(pluralplural)) --gt [the]--gt [the]

N(N(singularsingular)) --gt [turtle]--gt [turtle]

N(N(pluralplural)) --gt [turtles]--gt [turtles]

VI(VI(singularsingular)) --gt [sleeps]--gt [sleeps]

VI(VI(pluralplural)) --gt [sleep]--gt [sleep]

Number Agreement

DCGs

Non-terminals may have argumentsbull Variables (start with capital)

Eg Number Any

bull Constants (start with lower case) Eg singular plural

bull Structured terms (start with lower case and take arguments themselves) Eg vp(VNP)

Parsing needs to be adapted bull Using unification

Unification in a nutshell (cf AI course)

Substitutions

Eg Num singular T vp(VNP)

Applying substitution bull Simultaneously replace variables by

corresponding termsbull S(Num) Num singular = S(singular)

Unification

Take two non-terminals with arguments and compute (most general) substitution that makes them identical egbull Art(singular) and Art(Num)

Gives Num singular

bull Art(singular) and Art(plural) Fails

bull Art(Num1) and Art(Num2) Num1 Num2

bull PN(Num accusative) and PN(singular Case) Numsingular Caseaccusative

Parsing with DCGs

Now require successful unification at each step

S -gt NP(N) VP(N) S -gt Art(N) N(N) VP(N) Nsingular S -gt a N(singular) VP(singular) S -gt a turtle VP(singular) S -gt a turtle sleeps

S-gt a turtle sleep fails

Case Marking

PN(singularnominative)PN(singularnominative) --gt --gt [he][she][he][she]

PN(singularaccusative)PN(singularaccusative) --gt --gt [him][her][him][her]

PN(pluralnominative)PN(pluralnominative) --gt --gt [they][they]

PN(pluralaccusative)PN(pluralaccusative) --gt --gt [them][them]

S --gt NP(Numbernominative) NP(Number)S --gt NP(Numbernominative) NP(Number)

VP(Number) --gt V(Number) VP(Anyaccusative)VP(Number) --gt V(Number) VP(Anyaccusative)

VP(NumberCase) --gt PN(NumberCase)VP(NumberCase) --gt PN(NumberCase)

VP(NumberAny) --gt Det N(Number)VP(NumberAny) --gt Det N(Number)

He sees her She sees him They see her

But not Them see he

DCGs

Are strictly more expressive than CFGs Can represent for instance

bull S(N) -gt A(N) B(N) C(N)bull A(0) -gt [] bull B(0) -gt []bull C(0) -gt []bull A(s(N)) -gt A(N) [A]bull B(s(N)) -gt B(N) [B]bull C(s(N)) -gt C(N) [C]

Probabilistic Models

Traditional grammar models are very rigid bull essentially a yes no decision

Probabilistic grammarsbull Define a probability models for the databull Compute the probability of each alternativebull Choose the most likely alternative

Ilustrate on bull Shannon Gamebull Spelling correctionbull Parsing

Illustration

Wall Street Journal Corpus 3 000 000 words Correct parse tree for sentences known

bull Constructed by handbull Can be used to derive stochastic context free

grammarsbull SCFG assign probability to parse trees

Compute the most probable parse tree

Sequences are omni-present

Therefore the techniques we will see also apply tobull Bioinformatics

DNA proteins mRNA hellip can all be represented as strings

bullRobotics Sequences of actions states hellip

bullhellip

Rest of the Course

Limitations traditional grammar models motivate probabilistic extensions bull Regular grammars and Finite State Automata

All use principles of Part I on Graphical Models Markov Models using n-gramms (Hidden) Markov Models Conditional Random Fields

bull As an example of using undirected graphical models

bull Probabilistic Context Free Grammarsbull Probabilistic Definite Clause Grammars

  • Advanced Artificial Intelligence
  • Topic
  • Contents
  • Rationalism versus Empiricism
  • Slide 5
  • This course
  • Ambiguity
  • NLP and Statistics
  • Slide 9
  • Corpora
  • Word Counts
  • Slide 12
  • Word Counts (Brown corpus)
  • Slide 14
  • Zipflsquos Law
  • Language and sequences
  • Key NLP Problem Ambiguity
  • Language Model
  • Example of bad language model
  • A bad language model
  • Slide 22
  • A good language model
  • Why language models
  • Applications
  • Spelling errors
  • Handwriting recognition
  • For Spell Checkers
  • Another dimension in language models
  • Sequence Tagging
  • Slide 31
  • Parsing
  • Slide 33
  • Language models based on Grammars
  • Grammars and parsing
  • Regular Grammars and Finite State Automata
  • Finite State Automaton
  • Phrase structure
  • Notation
  • Context Free Grammar
  • Slide 41
  • Top-down parsing
  • Context-free grammar
  • Parse tree
  • Definite Clause Grammars Non-terminals may have arguments
  • DCGs
  • Unification in a nutshell (cf AI course)
  • Unification
  • Parsing with DCGs
  • Case Marking
  • Slide 51
  • Probabilistic Models
  • Illustration
  • PowerPoint Presentation
  • Sequences are omni-present
  • Rest of the Course
Page 41: Part II. Statistical NLP Advanced Artificial Intelligence Introduction and Grammar Models Wolfram Burgard, Luc De Raedt, Bernhard Nebel, Kristian Kersting

Top-down parsing

S -gt NP VP S -gt Det N VP S -gt The N VP S -gt The dog VP S -gt The dog V NP S -gt The dog chased NP S -gt The dog chased Det N S-gt The dog chased the N S-gt The dog chased the cat

Context-free grammarSS --gt --gt NPNPVPVP

NPNP --gt PN --gt PN Proper nounProper noun

NPNP --gt Art Adj N--gt Art Adj N

NPNP --gt ArtN--gt ArtN

VPVP --gt VI --gt VI intransitive verbintransitive verb

VPVP --gt VT --gt VT NPNP transitive verbtransitive verb

ArtArt --gt [the]--gt [the]

AdjAdj --gt [lazy]--gt [lazy]

AdjAdj --gt [rapid]--gt [rapid]

PNPN --gt [achilles]--gt [achilles]

NN --gt [turtle]--gt [turtle]

VIVI --gt [sleeps]--gt [sleeps]

VTVT --gt [beats]--gt [beats]

Parse tree

SS

NPNP VPVP

ArtArt AdjAdj NN VtVt NPNP

PNPN

achillesachillesbeatsbeatsturtleturtlerapidrapidthethe

Definite Clause GrammarsNon-terminals may have arguments

SS --gt --gt NPNP((NN))VPVP((NN))

NP(NP(NN)) --gt Art(--gt Art(NN)N()N(NN))

VP(VP(NN)) --gt VI(--gt VI(NN))

Art(Art(singularsingular)) --gt [a]--gt [a]

Art(Art(singularsingular)) --gt [the]--gt [the]

Art(Art(pluralplural)) --gt [the]--gt [the]

N(N(singularsingular)) --gt [turtle]--gt [turtle]

N(N(pluralplural)) --gt [turtles]--gt [turtles]

VI(VI(singularsingular)) --gt [sleeps]--gt [sleeps]

VI(VI(pluralplural)) --gt [sleep]--gt [sleep]

Number Agreement

DCGs

Non-terminals may have argumentsbull Variables (start with capital)

Eg Number Any

bull Constants (start with lower case) Eg singular plural

bull Structured terms (start with lower case and take arguments themselves) Eg vp(VNP)

Parsing needs to be adapted bull Using unification

Unification in a nutshell (cf AI course)

Substitutions

Eg Num singular T vp(VNP)

Applying substitution bull Simultaneously replace variables by

corresponding termsbull S(Num) Num singular = S(singular)

Unification

Take two non-terminals with arguments and compute (most general) substitution that makes them identical egbull Art(singular) and Art(Num)

Gives Num singular

bull Art(singular) and Art(plural) Fails

bull Art(Num1) and Art(Num2) Num1 Num2

bull PN(Num accusative) and PN(singular Case) Numsingular Caseaccusative

Parsing with DCGs

Now require successful unification at each step

S -gt NP(N) VP(N) S -gt Art(N) N(N) VP(N) Nsingular S -gt a N(singular) VP(singular) S -gt a turtle VP(singular) S -gt a turtle sleeps

S-gt a turtle sleep fails

Case Marking

PN(singularnominative)PN(singularnominative) --gt --gt [he][she][he][she]

PN(singularaccusative)PN(singularaccusative) --gt --gt [him][her][him][her]

PN(pluralnominative)PN(pluralnominative) --gt --gt [they][they]

PN(pluralaccusative)PN(pluralaccusative) --gt --gt [them][them]

S --gt NP(Numbernominative) NP(Number)S --gt NP(Numbernominative) NP(Number)

VP(Number) --gt V(Number) VP(Anyaccusative)VP(Number) --gt V(Number) VP(Anyaccusative)

VP(NumberCase) --gt PN(NumberCase)VP(NumberCase) --gt PN(NumberCase)

VP(NumberAny) --gt Det N(Number)VP(NumberAny) --gt Det N(Number)

He sees her She sees him They see her

But not Them see he

DCGs

Are strictly more expressive than CFGs Can represent for instance

bull S(N) -gt A(N) B(N) C(N)bull A(0) -gt [] bull B(0) -gt []bull C(0) -gt []bull A(s(N)) -gt A(N) [A]bull B(s(N)) -gt B(N) [B]bull C(s(N)) -gt C(N) [C]

Probabilistic Models

Traditional grammar models are very rigid bull essentially a yes no decision

Probabilistic grammarsbull Define a probability models for the databull Compute the probability of each alternativebull Choose the most likely alternative

Ilustrate on bull Shannon Gamebull Spelling correctionbull Parsing

Illustration

Wall Street Journal Corpus 3 000 000 words Correct parse tree for sentences known

bull Constructed by handbull Can be used to derive stochastic context free

grammarsbull SCFG assign probability to parse trees

Compute the most probable parse tree

Sequences are omni-present

Therefore the techniques we will see also apply tobull Bioinformatics

DNA proteins mRNA hellip can all be represented as strings

bullRobotics Sequences of actions states hellip

bullhellip

Rest of the Course

Limitations traditional grammar models motivate probabilistic extensions bull Regular grammars and Finite State Automata

All use principles of Part I on Graphical Models Markov Models using n-gramms (Hidden) Markov Models Conditional Random Fields

bull As an example of using undirected graphical models

bull Probabilistic Context Free Grammarsbull Probabilistic Definite Clause Grammars

  • Advanced Artificial Intelligence
  • Topic
  • Contents
  • Rationalism versus Empiricism
  • Slide 5
  • This course
  • Ambiguity
  • NLP and Statistics
  • Slide 9
  • Corpora
  • Word Counts
  • Slide 12
  • Word Counts (Brown corpus)
  • Slide 14
  • Zipflsquos Law
  • Language and sequences
  • Key NLP Problem Ambiguity
  • Language Model
  • Example of bad language model
  • A bad language model
  • Slide 22
  • A good language model
  • Why language models
  • Applications
  • Spelling errors
  • Handwriting recognition
  • For Spell Checkers
  • Another dimension in language models
  • Sequence Tagging
  • Slide 31
  • Parsing
  • Slide 33
  • Language models based on Grammars
  • Grammars and parsing
  • Regular Grammars and Finite State Automata
  • Finite State Automaton
  • Phrase structure
  • Notation
  • Context Free Grammar
  • Slide 41
  • Top-down parsing
  • Context-free grammar
  • Parse tree
  • Definite Clause Grammars Non-terminals may have arguments
  • DCGs
  • Unification in a nutshell (cf AI course)
  • Unification
  • Parsing with DCGs
  • Case Marking
  • Slide 51
  • Probabilistic Models
  • Illustration
  • PowerPoint Presentation
  • Sequences are omni-present
  • Rest of the Course
Page 42: Part II. Statistical NLP Advanced Artificial Intelligence Introduction and Grammar Models Wolfram Burgard, Luc De Raedt, Bernhard Nebel, Kristian Kersting

Context-free grammarSS --gt --gt NPNPVPVP

NPNP --gt PN --gt PN Proper nounProper noun

NPNP --gt Art Adj N--gt Art Adj N

NPNP --gt ArtN--gt ArtN

VPVP --gt VI --gt VI intransitive verbintransitive verb

VPVP --gt VT --gt VT NPNP transitive verbtransitive verb

ArtArt --gt [the]--gt [the]

AdjAdj --gt [lazy]--gt [lazy]

AdjAdj --gt [rapid]--gt [rapid]

PNPN --gt [achilles]--gt [achilles]

NN --gt [turtle]--gt [turtle]

VIVI --gt [sleeps]--gt [sleeps]

VTVT --gt [beats]--gt [beats]

Parse tree

SS

NPNP VPVP

ArtArt AdjAdj NN VtVt NPNP

PNPN

achillesachillesbeatsbeatsturtleturtlerapidrapidthethe

Definite Clause GrammarsNon-terminals may have arguments

SS --gt --gt NPNP((NN))VPVP((NN))

NP(NP(NN)) --gt Art(--gt Art(NN)N()N(NN))

VP(VP(NN)) --gt VI(--gt VI(NN))

Art(Art(singularsingular)) --gt [a]--gt [a]

Art(Art(singularsingular)) --gt [the]--gt [the]

Art(Art(pluralplural)) --gt [the]--gt [the]

N(N(singularsingular)) --gt [turtle]--gt [turtle]

N(N(pluralplural)) --gt [turtles]--gt [turtles]

VI(VI(singularsingular)) --gt [sleeps]--gt [sleeps]

VI(VI(pluralplural)) --gt [sleep]--gt [sleep]

Number Agreement

DCGs

Non-terminals may have argumentsbull Variables (start with capital)

Eg Number Any

bull Constants (start with lower case) Eg singular plural

bull Structured terms (start with lower case and take arguments themselves) Eg vp(VNP)

Parsing needs to be adapted bull Using unification

Unification in a nutshell (cf. AI course)

Substitutions, e.g. {Num / singular}, {T / vp(V, NP)}

Applying a substitution:
• Simultaneously replace the variables by the corresponding terms
• S(Num) {Num / singular} = S(singular)

Unification

Take two non-terminals with arguments and compute the (most general) substitution that makes them identical, e.g.:
• Art(singular) and Art(Num): gives {Num / singular}
• Art(singular) and Art(plural): fails
• Art(Num1) and Art(Num2): {Num1 / Num2}
• PN(Num, accusative) and PN(singular, Case): {Num / singular, Case / accusative}
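The four examples above can be reproduced with a small unifier. This is a sketch for flat terms written as `(functor, [args])` tuples, with variables as capitalised strings and constants as lower-case strings; the names are mine and the occurs-check is omitted for brevity:

```python
def is_var(t):
    return isinstance(t, str) and t[:1].isupper()

def walk(t, subst):
    """Follow variable bindings to the term a variable currently stands for."""
    while is_var(t) and t in subst:
        t = subst[t]
    return t

def unify(t1, t2, subst=None):
    """Most general substitution making t1 and t2 identical, or None on failure."""
    subst = dict(subst or {})
    t1, t2 = walk(t1, subst), walk(t2, subst)
    if t1 == t2:
        return subst
    if is_var(t1):
        subst[t1] = t2
        return subst
    if is_var(t2):
        subst[t2] = t1
        return subst
    if not (isinstance(t1, tuple) and isinstance(t2, tuple)):
        return None                      # two distinct constants: fail
    (f1, args1), (f2, args2) = t1, t2
    if f1 != f2 or len(args1) != len(args2):
        return None                      # different functor or arity: fail
    for a1, a2 in zip(args1, args2):     # unify argument lists pairwise
        subst = unify(a1, a2, subst)
        if subst is None:
            return None
    return subst
```

For instance, `unify(("Art", ["singular"]), ("Art", ["Num"]))` returns `{"Num": "singular"}`, matching the first bullet above.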

Parsing with DCGs

Now require successful unification at each step:

S -> NP(N), VP(N)
S -> Art(N), N(N), VP(N)          {N / singular}
S -> a, N(singular), VP(singular)
S -> a turtle, VP(singular)
S -> a turtle sleeps

S -> a turtle sleep    fails
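The effect of threading the shared argument through the rules can be sketched directly: the number feature N from the DCG above becomes a value passed through a recognizer, so "a turtle sleep" fails exactly because no single binding of N satisfies every rule. This is an illustrative sketch, not the course's parser:

```python
# Lexicon indexed by (category, number), transcribed from the DCG above.
LEX = {
    ("Art", "singular"): {"a", "the"}, ("Art", "plural"): {"the"},
    ("N", "singular"): {"turtle"},     ("N", "plural"): {"turtles"},
    ("VI", "singular"): {"sleeps"},    ("VI", "plural"): {"sleep"},
}

def s(words):
    # S --> NP(N), VP(N): try each binding of the shared number feature
    for num in ("singular", "plural"):
        for pos in np(words, 0, num):
            if any(end == len(words) for end in vp(words, pos, num)):
                return True
    return False

def np(words, pos, num):
    # NP(N) --> Art(N), N(N)
    if (pos + 1 < len(words) and words[pos] in LEX[("Art", num)]
            and words[pos + 1] in LEX[("N", num)]):
        yield pos + 2

def vp(words, pos, num):
    # VP(N) --> VI(N)
    if pos < len(words) and words[pos] in LEX[("VI", num)]:
        yield pos + 1
```

Here `s("a turtle sleeps".split())` succeeds with N bound to singular, while `s("a turtle sleep".split())` fails for both bindings.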

Case Marking

PN(singular, nominative) --> [he] ; [she]
PN(singular, accusative) --> [him] ; [her]
PN(plural, nominative)   --> [they]
PN(plural, accusative)   --> [them]

S --> NP(Number, nominative), VP(Number)
VP(Number) --> V(Number), NP(Any, accusative)
NP(Number, Case) --> PN(Number, Case)
NP(Number, Any)  --> Det, N(Number)

"He sees her.", "She sees him.", "They see her."
But not: "Them see he."

DCGs

Are strictly more expressive than CFGs. They can represent, for instance, the language a^n b^n c^n:

S(N)    --> A(N), B(N), C(N)
A(0)    --> []
B(0)    --> []
C(0)    --> []
A(s(N)) --> A(N), [a]
B(s(N)) --> B(N), [b]
C(s(N)) --> C(N), [c]
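The counting trick in this grammar can be mirrored procedurally: each of A, B, C consumes a run of its letter, and the shared argument N forces the three run lengths to be equal, so exactly a^n b^n c^n is accepted. A sketch under that reading (names and structure are my own):

```python
def block(words, pos, letter):
    """Yield (count, new_pos) for every run of `letter` starting at pos,
    including the empty run — mirroring A(0) --> [] and A(s(N)) --> A(N), [a]."""
    n = 0
    yield n, pos
    while pos < len(words) and words[pos] == letter:
        pos += 1
        n += 1
        yield n, pos

def s(words):
    # S(N) --> A(N), B(N), C(N): the shared argument forces equal counts
    for n, p in block(words, 0, "a"):
        for m, q in block(words, p, "b"):
            if m != n:
                continue
            for k, r in block(words, q, "c"):
                if k == n and r == len(words):
                    return True
    return False
```

So `s(list("aabbcc"))` succeeds with n = 2, while `s(list("aabbc"))` fails: no single count fits all three blocks.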

Probabilistic Models

Traditional grammar models are very rigid:
• essentially a yes/no decision

Probabilistic grammars:
• Define a probability model for the data
• Compute the probability of each alternative
• Choose the most likely alternative

Illustrated on:
• the Shannon game
• spelling correction
• parsing

Illustration

Wall Street Journal Corpus: 3,000,000 words, with the correct parse tree for each sentence known:
• Constructed by hand
• Can be used to derive stochastic context-free grammars
• SCFGs assign a probability to each parse tree

Compute the most probable parse tree.
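Deriving an SCFG from hand-parsed trees amounts to counting: the maximum-likelihood estimate is P(A -> rhs) = count(A -> rhs) / count(A). A sketch of that estimation step, with trees as nested tuples (label, child, ...) and a tiny invented two-tree "treebank" standing in for the WSJ corpus:

```python
from collections import Counter

def rules(tree):
    """Yield (lhs, rhs) for every rule instance used in the tree."""
    if isinstance(tree, str):          # a word: no rule
        return
    label, *children = tree
    yield label, tuple(c if isinstance(c, str) else c[0] for c in children)
    for c in children:
        yield from rules(c)

def mle_scfg(treebank):
    """Relative-frequency (maximum-likelihood) rule probabilities."""
    rule_counts = Counter(r for t in treebank for r in rules(t))
    lhs_counts = Counter()
    for (lhs, _), c in rule_counts.items():
        lhs_counts[lhs] += c
    return {r: c / lhs_counts[r[0]] for r, c in rule_counts.items()}

# Invented mini-treebank: "achilles sleeps", "the turtle sleeps".
t1 = ("S", ("NP", ("PN", "achilles")), ("VP", ("VI", "sleeps")))
t2 = ("S", ("NP", ("Art", "the"), ("N", "turtle")), ("VP", ("VI", "sleeps")))
```

With these two trees, NP rewrites as PN in one of its two occurrences, so `mle_scfg([t1, t2])[("NP", ("PN",))]` is 0.5; a parser would then score each candidate tree by the product of its rule probabilities.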

Sequences are omnipresent

Therefore the techniques we will see also apply to:
• Bioinformatics: DNA, proteins, mRNA, … can all be represented as strings
• Robotics: sequences of actions, states, …
• …

Rest of the Course

The limitations of traditional grammar models motivate probabilistic extensions:
• Regular grammars and finite state automata
  - Markov models using n-grams
  - (Hidden) Markov models
  - Conditional random fields (as an example of using undirected graphical models)
• Probabilistic context-free grammars
• Probabilistic definite clause grammars

All use the principles of Part I on graphical models.

  • Advanced Artificial Intelligence
  • Topic
  • Contents
  • Rationalism versus Empiricism
  • Slide 5
  • This course
  • Ambiguity
  • NLP and Statistics
  • Slide 9
  • Corpora
  • Word Counts
  • Slide 12
  • Word Counts (Brown corpus)
  • Slide 14
  • Zipf's Law
  • Language and sequences
  • Key NLP Problem Ambiguity
  • Language Model
  • Example of bad language model
  • A bad language model
  • Slide 22
  • A good language model
  • Why language models
  • Applications
  • Spelling errors
  • Handwriting recognition
  • For Spell Checkers
  • Another dimension in language models
  • Sequence Tagging
  • Slide 31
  • Parsing
  • Slide 33
  • Language models based on Grammars
  • Grammars and parsing
  • Regular Grammars and Finite State Automata
  • Finite State Automaton
  • Phrase structure
  • Notation
  • Context Free Grammar
  • Slide 41
  • Top-down parsing
  • Context-free grammar
  • Parse tree
  • Definite Clause Grammars Non-terminals may have arguments
  • DCGs
  • Unification in a nutshell (cf AI course)
  • Unification
  • Parsing with DCGs
  • Case Marking
  • Slide 51
  • Probabilistic Models
  • Illustration
  • PowerPoint Presentation
  • Sequences are omni-present
  • Rest of the Course
Page 43: Part II. Statistical NLP Advanced Artificial Intelligence Introduction and Grammar Models Wolfram Burgard, Luc De Raedt, Bernhard Nebel, Kristian Kersting

Parse tree

SS

NPNP VPVP

ArtArt AdjAdj NN VtVt NPNP

PNPN

achillesachillesbeatsbeatsturtleturtlerapidrapidthethe

Definite Clause GrammarsNon-terminals may have arguments

SS --gt --gt NPNP((NN))VPVP((NN))

NP(NP(NN)) --gt Art(--gt Art(NN)N()N(NN))

VP(VP(NN)) --gt VI(--gt VI(NN))

Art(Art(singularsingular)) --gt [a]--gt [a]

Art(Art(singularsingular)) --gt [the]--gt [the]

Art(Art(pluralplural)) --gt [the]--gt [the]

N(N(singularsingular)) --gt [turtle]--gt [turtle]

N(N(pluralplural)) --gt [turtles]--gt [turtles]

VI(VI(singularsingular)) --gt [sleeps]--gt [sleeps]

VI(VI(pluralplural)) --gt [sleep]--gt [sleep]

Number Agreement

DCGs

Non-terminals may have argumentsbull Variables (start with capital)

Eg Number Any

bull Constants (start with lower case) Eg singular plural

bull Structured terms (start with lower case and take arguments themselves) Eg vp(VNP)

Parsing needs to be adapted bull Using unification

Unification in a nutshell (cf AI course)

Substitutions

Eg Num singular T vp(VNP)

Applying substitution bull Simultaneously replace variables by

corresponding termsbull S(Num) Num singular = S(singular)

Unification

Take two non-terminals with arguments and compute (most general) substitution that makes them identical egbull Art(singular) and Art(Num)

Gives Num singular

bull Art(singular) and Art(plural) Fails

bull Art(Num1) and Art(Num2) Num1 Num2

bull PN(Num accusative) and PN(singular Case) Numsingular Caseaccusative

Parsing with DCGs

Now require successful unification at each step

S -gt NP(N) VP(N) S -gt Art(N) N(N) VP(N) Nsingular S -gt a N(singular) VP(singular) S -gt a turtle VP(singular) S -gt a turtle sleeps

S-gt a turtle sleep fails

Case Marking

PN(singularnominative)PN(singularnominative) --gt --gt [he][she][he][she]

PN(singularaccusative)PN(singularaccusative) --gt --gt [him][her][him][her]

PN(pluralnominative)PN(pluralnominative) --gt --gt [they][they]

PN(pluralaccusative)PN(pluralaccusative) --gt --gt [them][them]

S --gt NP(Numbernominative) NP(Number)S --gt NP(Numbernominative) NP(Number)

VP(Number) --gt V(Number) VP(Anyaccusative)VP(Number) --gt V(Number) VP(Anyaccusative)

VP(NumberCase) --gt PN(NumberCase)VP(NumberCase) --gt PN(NumberCase)

VP(NumberAny) --gt Det N(Number)VP(NumberAny) --gt Det N(Number)

He sees her She sees him They see her

But not Them see he

DCGs

Are strictly more expressive than CFGs Can represent for instance

bull S(N) -gt A(N) B(N) C(N)bull A(0) -gt [] bull B(0) -gt []bull C(0) -gt []bull A(s(N)) -gt A(N) [A]bull B(s(N)) -gt B(N) [B]bull C(s(N)) -gt C(N) [C]

Probabilistic Models

Traditional grammar models are very rigid bull essentially a yes no decision

Probabilistic grammarsbull Define a probability models for the databull Compute the probability of each alternativebull Choose the most likely alternative

Ilustrate on bull Shannon Gamebull Spelling correctionbull Parsing

Illustration

Wall Street Journal Corpus 3 000 000 words Correct parse tree for sentences known

bull Constructed by handbull Can be used to derive stochastic context free

grammarsbull SCFG assign probability to parse trees

Compute the most probable parse tree

Sequences are omni-present

Therefore the techniques we will see also apply tobull Bioinformatics

DNA proteins mRNA hellip can all be represented as strings

bullRobotics Sequences of actions states hellip

bullhellip

Rest of the Course

Limitations traditional grammar models motivate probabilistic extensions bull Regular grammars and Finite State Automata

All use principles of Part I on Graphical Models Markov Models using n-gramms (Hidden) Markov Models Conditional Random Fields

bull As an example of using undirected graphical models

bull Probabilistic Context Free Grammarsbull Probabilistic Definite Clause Grammars

  • Advanced Artificial Intelligence
  • Topic
  • Contents
  • Rationalism versus Empiricism
  • Slide 5
  • This course
  • Ambiguity
  • NLP and Statistics
  • Slide 9
  • Corpora
  • Word Counts
  • Slide 12
  • Word Counts (Brown corpus)
  • Slide 14
  • Zipflsquos Law
  • Language and sequences
  • Key NLP Problem Ambiguity
  • Language Model
  • Example of bad language model
  • A bad language model
  • Slide 22
  • A good language model
  • Why language models
  • Applications
  • Spelling errors
  • Handwriting recognition
  • For Spell Checkers
  • Another dimension in language models
  • Sequence Tagging
  • Slide 31
  • Parsing
  • Slide 33
  • Language models based on Grammars
  • Grammars and parsing
  • Regular Grammars and Finite State Automata
  • Finite State Automaton
  • Phrase structure
  • Notation
  • Context Free Grammar
  • Slide 41
  • Top-down parsing
  • Context-free grammar
  • Parse tree
  • Definite Clause Grammars Non-terminals may have arguments
  • DCGs
  • Unification in a nutshell (cf AI course)
  • Unification
  • Parsing with DCGs
  • Case Marking
  • Slide 51
  • Probabilistic Models
  • Illustration
  • PowerPoint Presentation
  • Sequences are omni-present
  • Rest of the Course
Page 44: Part II. Statistical NLP Advanced Artificial Intelligence Introduction and Grammar Models Wolfram Burgard, Luc De Raedt, Bernhard Nebel, Kristian Kersting

Definite Clause GrammarsNon-terminals may have arguments

SS --gt --gt NPNP((NN))VPVP((NN))

NP(NP(NN)) --gt Art(--gt Art(NN)N()N(NN))

VP(VP(NN)) --gt VI(--gt VI(NN))

Art(Art(singularsingular)) --gt [a]--gt [a]

Art(Art(singularsingular)) --gt [the]--gt [the]

Art(Art(pluralplural)) --gt [the]--gt [the]

N(N(singularsingular)) --gt [turtle]--gt [turtle]

N(N(pluralplural)) --gt [turtles]--gt [turtles]

VI(VI(singularsingular)) --gt [sleeps]--gt [sleeps]

VI(VI(pluralplural)) --gt [sleep]--gt [sleep]

Number Agreement

DCGs

Non-terminals may have argumentsbull Variables (start with capital)

Eg Number Any

bull Constants (start with lower case) Eg singular plural

bull Structured terms (start with lower case and take arguments themselves) Eg vp(VNP)

Parsing needs to be adapted bull Using unification

Unification in a nutshell (cf AI course)

Substitutions

Eg Num singular T vp(VNP)

Applying substitution bull Simultaneously replace variables by

corresponding termsbull S(Num) Num singular = S(singular)

Unification

Take two non-terminals with arguments and compute (most general) substitution that makes them identical egbull Art(singular) and Art(Num)

Gives Num singular

bull Art(singular) and Art(plural) Fails

bull Art(Num1) and Art(Num2) Num1 Num2

bull PN(Num accusative) and PN(singular Case) Numsingular Caseaccusative

Parsing with DCGs

Now require successful unification at each step

S -gt NP(N) VP(N) S -gt Art(N) N(N) VP(N) Nsingular S -gt a N(singular) VP(singular) S -gt a turtle VP(singular) S -gt a turtle sleeps

S-gt a turtle sleep fails

Case Marking

PN(singularnominative)PN(singularnominative) --gt --gt [he][she][he][she]

PN(singularaccusative)PN(singularaccusative) --gt --gt [him][her][him][her]

PN(pluralnominative)PN(pluralnominative) --gt --gt [they][they]

PN(pluralaccusative)PN(pluralaccusative) --gt --gt [them][them]

S --gt NP(Numbernominative) NP(Number)S --gt NP(Numbernominative) NP(Number)

VP(Number) --gt V(Number) VP(Anyaccusative)VP(Number) --gt V(Number) VP(Anyaccusative)

VP(NumberCase) --gt PN(NumberCase)VP(NumberCase) --gt PN(NumberCase)

VP(NumberAny) --gt Det N(Number)VP(NumberAny) --gt Det N(Number)

He sees her She sees him They see her

But not Them see he

DCGs

Are strictly more expressive than CFGs Can represent for instance

bull S(N) -gt A(N) B(N) C(N)bull A(0) -gt [] bull B(0) -gt []bull C(0) -gt []bull A(s(N)) -gt A(N) [A]bull B(s(N)) -gt B(N) [B]bull C(s(N)) -gt C(N) [C]

Probabilistic Models

Traditional grammar models are very rigid bull essentially a yes no decision

Probabilistic grammarsbull Define a probability models for the databull Compute the probability of each alternativebull Choose the most likely alternative

Ilustrate on bull Shannon Gamebull Spelling correctionbull Parsing

Illustration

Wall Street Journal Corpus 3 000 000 words Correct parse tree for sentences known

bull Constructed by handbull Can be used to derive stochastic context free

grammarsbull SCFG assign probability to parse trees

Compute the most probable parse tree

Sequences are omni-present

Therefore the techniques we will see also apply tobull Bioinformatics

DNA proteins mRNA hellip can all be represented as strings

bullRobotics Sequences of actions states hellip

bullhellip

Rest of the Course

Limitations traditional grammar models motivate probabilistic extensions bull Regular grammars and Finite State Automata

All use principles of Part I on Graphical Models Markov Models using n-gramms (Hidden) Markov Models Conditional Random Fields

bull As an example of using undirected graphical models

bull Probabilistic Context Free Grammarsbull Probabilistic Definite Clause Grammars

  • Advanced Artificial Intelligence
  • Topic
  • Contents
  • Rationalism versus Empiricism
  • Slide 5
  • This course
  • Ambiguity
  • NLP and Statistics
  • Slide 9
  • Corpora
  • Word Counts
  • Slide 12
  • Word Counts (Brown corpus)
  • Slide 14
  • Zipflsquos Law
  • Language and sequences
  • Key NLP Problem Ambiguity
  • Language Model
  • Example of bad language model
  • A bad language model
  • Slide 22
  • A good language model
  • Why language models
  • Applications
  • Spelling errors
  • Handwriting recognition
  • For Spell Checkers
  • Another dimension in language models
  • Sequence Tagging
  • Slide 31
  • Parsing
  • Slide 33
  • Language models based on Grammars
  • Grammars and parsing
  • Regular Grammars and Finite State Automata
  • Finite State Automaton
  • Phrase structure
  • Notation
  • Context Free Grammar
  • Slide 41
  • Top-down parsing
  • Context-free grammar
  • Parse tree
  • Definite Clause Grammars Non-terminals may have arguments
  • DCGs
  • Unification in a nutshell (cf AI course)
  • Unification
  • Parsing with DCGs
  • Case Marking
  • Slide 51
  • Probabilistic Models
  • Illustration
  • PowerPoint Presentation
  • Sequences are omni-present
  • Rest of the Course
Page 45: Part II. Statistical NLP Advanced Artificial Intelligence Introduction and Grammar Models Wolfram Burgard, Luc De Raedt, Bernhard Nebel, Kristian Kersting

DCGs

Non-terminals may have argumentsbull Variables (start with capital)

Eg Number Any

bull Constants (start with lower case) Eg singular plural

bull Structured terms (start with lower case and take arguments themselves) Eg vp(VNP)

Parsing needs to be adapted bull Using unification

Unification in a nutshell (cf AI course)

Substitutions

Eg Num singular T vp(VNP)

Applying substitution bull Simultaneously replace variables by

corresponding termsbull S(Num) Num singular = S(singular)

Unification

Take two non-terminals with arguments and compute (most general) substitution that makes them identical egbull Art(singular) and Art(Num)

Gives Num singular

bull Art(singular) and Art(plural) Fails

bull Art(Num1) and Art(Num2) Num1 Num2

bull PN(Num accusative) and PN(singular Case) Numsingular Caseaccusative

Parsing with DCGs

Now require successful unification at each step

S -gt NP(N) VP(N) S -gt Art(N) N(N) VP(N) Nsingular S -gt a N(singular) VP(singular) S -gt a turtle VP(singular) S -gt a turtle sleeps

S-gt a turtle sleep fails

Case Marking

PN(singularnominative)PN(singularnominative) --gt --gt [he][she][he][she]

PN(singularaccusative)PN(singularaccusative) --gt --gt [him][her][him][her]

PN(pluralnominative)PN(pluralnominative) --gt --gt [they][they]

PN(pluralaccusative)PN(pluralaccusative) --gt --gt [them][them]

S --gt NP(Numbernominative) NP(Number)S --gt NP(Numbernominative) NP(Number)

VP(Number) --gt V(Number) VP(Anyaccusative)VP(Number) --gt V(Number) VP(Anyaccusative)

VP(NumberCase) --gt PN(NumberCase)VP(NumberCase) --gt PN(NumberCase)

VP(NumberAny) --gt Det N(Number)VP(NumberAny) --gt Det N(Number)

He sees her She sees him They see her

But not Them see he

DCGs

Are strictly more expressive than CFGs Can represent for instance

bull S(N) -gt A(N) B(N) C(N)bull A(0) -gt [] bull B(0) -gt []bull C(0) -gt []bull A(s(N)) -gt A(N) [A]bull B(s(N)) -gt B(N) [B]bull C(s(N)) -gt C(N) [C]

Probabilistic Models

Traditional grammar models are very rigid bull essentially a yes no decision

Probabilistic grammarsbull Define a probability models for the databull Compute the probability of each alternativebull Choose the most likely alternative

Ilustrate on bull Shannon Gamebull Spelling correctionbull Parsing

Illustration

Wall Street Journal Corpus 3 000 000 words Correct parse tree for sentences known

bull Constructed by handbull Can be used to derive stochastic context free

grammarsbull SCFG assign probability to parse trees

Compute the most probable parse tree

Sequences are omni-present

Therefore the techniques we will see also apply tobull Bioinformatics

DNA proteins mRNA hellip can all be represented as strings

bullRobotics Sequences of actions states hellip

bullhellip

Rest of the Course

Limitations traditional grammar models motivate probabilistic extensions bull Regular grammars and Finite State Automata

All use principles of Part I on Graphical Models Markov Models using n-gramms (Hidden) Markov Models Conditional Random Fields

bull As an example of using undirected graphical models

bull Probabilistic Context Free Grammarsbull Probabilistic Definite Clause Grammars

  • Advanced Artificial Intelligence
  • Topic
  • Contents
  • Rationalism versus Empiricism
  • Slide 5
  • This course
  • Ambiguity
  • NLP and Statistics
  • Slide 9
  • Corpora
  • Word Counts
  • Slide 12
  • Word Counts (Brown corpus)
  • Slide 14
  • Zipflsquos Law
  • Language and sequences
  • Key NLP Problem Ambiguity
  • Language Model
  • Example of bad language model
  • A bad language model
  • Slide 22
  • A good language model
  • Why language models
  • Applications
  • Spelling errors
  • Handwriting recognition
  • For Spell Checkers
  • Another dimension in language models
  • Sequence Tagging
  • Slide 31
  • Parsing
  • Slide 33
  • Language models based on Grammars
  • Grammars and parsing
  • Regular Grammars and Finite State Automata
  • Finite State Automaton
  • Phrase structure
  • Notation
  • Context Free Grammar
  • Slide 41
  • Top-down parsing
  • Context-free grammar
  • Parse tree
  • Definite Clause Grammars Non-terminals may have arguments
  • DCGs
  • Unification in a nutshell (cf AI course)
  • Unification
  • Parsing with DCGs
  • Case Marking
  • Slide 51
  • Probabilistic Models
  • Illustration
  • PowerPoint Presentation
  • Sequences are omni-present
  • Rest of the Course
Page 46: Part II. Statistical NLP Advanced Artificial Intelligence Introduction and Grammar Models Wolfram Burgard, Luc De Raedt, Bernhard Nebel, Kristian Kersting

Unification in a nutshell (cf AI course)

Substitutions

Eg Num singular T vp(VNP)

Applying substitution bull Simultaneously replace variables by

corresponding termsbull S(Num) Num singular = S(singular)

Unification

Take two non-terminals with arguments and compute (most general) substitution that makes them identical egbull Art(singular) and Art(Num)

Gives Num singular

bull Art(singular) and Art(plural) Fails

bull Art(Num1) and Art(Num2) Num1 Num2

bull PN(Num accusative) and PN(singular Case) Numsingular Caseaccusative

Parsing with DCGs

Now require successful unification at each step

S -gt NP(N) VP(N) S -gt Art(N) N(N) VP(N) Nsingular S -gt a N(singular) VP(singular) S -gt a turtle VP(singular) S -gt a turtle sleeps

S-gt a turtle sleep fails

Case Marking

PN(singularnominative)PN(singularnominative) --gt --gt [he][she][he][she]

PN(singularaccusative)PN(singularaccusative) --gt --gt [him][her][him][her]

PN(pluralnominative)PN(pluralnominative) --gt --gt [they][they]

PN(pluralaccusative)PN(pluralaccusative) --gt --gt [them][them]

S --gt NP(Numbernominative) NP(Number)S --gt NP(Numbernominative) NP(Number)

VP(Number) --gt V(Number) VP(Anyaccusative)VP(Number) --gt V(Number) VP(Anyaccusative)

VP(NumberCase) --gt PN(NumberCase)VP(NumberCase) --gt PN(NumberCase)

VP(NumberAny) --gt Det N(Number)VP(NumberAny) --gt Det N(Number)

He sees her She sees him They see her

But not Them see he

DCGs

Are strictly more expressive than CFGs Can represent for instance

bull S(N) -gt A(N) B(N) C(N)bull A(0) -gt [] bull B(0) -gt []bull C(0) -gt []bull A(s(N)) -gt A(N) [A]bull B(s(N)) -gt B(N) [B]bull C(s(N)) -gt C(N) [C]

Probabilistic Models

Traditional grammar models are very rigid bull essentially a yes no decision

Probabilistic grammarsbull Define a probability models for the databull Compute the probability of each alternativebull Choose the most likely alternative

Ilustrate on bull Shannon Gamebull Spelling correctionbull Parsing

Illustration

Wall Street Journal Corpus 3 000 000 words Correct parse tree for sentences known

bull Constructed by handbull Can be used to derive stochastic context free

grammarsbull SCFG assign probability to parse trees

Compute the most probable parse tree

Sequences are omni-present

Therefore the techniques we will see also apply tobull Bioinformatics

DNA proteins mRNA hellip can all be represented as strings

bullRobotics Sequences of actions states hellip

bullhellip

Rest of the Course

Limitations traditional grammar models motivate probabilistic extensions bull Regular grammars and Finite State Automata

All use principles of Part I on Graphical Models Markov Models using n-gramms (Hidden) Markov Models Conditional Random Fields

bull As an example of using undirected graphical models

bull Probabilistic Context Free Grammarsbull Probabilistic Definite Clause Grammars

  • Advanced Artificial Intelligence
  • Topic
  • Contents
  • Rationalism versus Empiricism
  • Slide 5
  • This course
  • Ambiguity
  • NLP and Statistics
  • Slide 9
  • Corpora
  • Word Counts
  • Slide 12
  • Word Counts (Brown corpus)
  • Slide 14
  • Zipflsquos Law
  • Language and sequences
  • Key NLP Problem Ambiguity
  • Language Model
  • Example of bad language model
  • A bad language model
  • Slide 22
  • A good language model
  • Why language models
  • Applications
  • Spelling errors
  • Handwriting recognition
  • For Spell Checkers
  • Another dimension in language models
  • Sequence Tagging
  • Slide 31
  • Parsing
  • Slide 33
  • Language models based on Grammars
  • Grammars and parsing
  • Regular Grammars and Finite State Automata
  • Finite State Automaton
  • Phrase structure
  • Notation
  • Context Free Grammar
  • Slide 41
  • Top-down parsing
  • Context-free grammar
  • Parse tree
  • Definite Clause Grammars Non-terminals may have arguments
  • DCGs
  • Unification in a nutshell (cf AI course)
  • Unification
  • Parsing with DCGs
  • Case Marking
  • Slide 51
  • Probabilistic Models
  • Illustration
  • PowerPoint Presentation
  • Sequences are omni-present
  • Rest of the Course
Page 47: Part II. Statistical NLP Advanced Artificial Intelligence Introduction and Grammar Models Wolfram Burgard, Luc De Raedt, Bernhard Nebel, Kristian Kersting

Unification

Take two non-terminals with arguments and compute (most general) substitution that makes them identical egbull Art(singular) and Art(Num)

Gives Num singular

bull Art(singular) and Art(plural) Fails

bull Art(Num1) and Art(Num2) Num1 Num2

bull PN(Num accusative) and PN(singular Case) Numsingular Caseaccusative

Parsing with DCGs

Now require successful unification at each step

S -gt NP(N) VP(N) S -gt Art(N) N(N) VP(N) Nsingular S -gt a N(singular) VP(singular) S -gt a turtle VP(singular) S -gt a turtle sleeps

S-gt a turtle sleep fails

Case Marking

PN(singularnominative)PN(singularnominative) --gt --gt [he][she][he][she]

PN(singularaccusative)PN(singularaccusative) --gt --gt [him][her][him][her]

PN(pluralnominative)PN(pluralnominative) --gt --gt [they][they]

PN(pluralaccusative)PN(pluralaccusative) --gt --gt [them][them]

S --gt NP(Numbernominative) NP(Number)S --gt NP(Numbernominative) NP(Number)

VP(Number) --gt V(Number) VP(Anyaccusative)VP(Number) --gt V(Number) VP(Anyaccusative)

VP(NumberCase) --gt PN(NumberCase)VP(NumberCase) --gt PN(NumberCase)

VP(NumberAny) --gt Det N(Number)VP(NumberAny) --gt Det N(Number)

He sees her She sees him They see her

But not Them see he

DCGs

Are strictly more expressive than CFGs Can represent for instance

bull S(N) -gt A(N) B(N) C(N)bull A(0) -gt [] bull B(0) -gt []bull C(0) -gt []bull A(s(N)) -gt A(N) [A]bull B(s(N)) -gt B(N) [B]bull C(s(N)) -gt C(N) [C]

Probabilistic Models

Traditional grammar models are very rigid bull essentially a yes no decision

Probabilistic grammarsbull Define a probability models for the databull Compute the probability of each alternativebull Choose the most likely alternative

Ilustrate on bull Shannon Gamebull Spelling correctionbull Parsing

Illustration

Wall Street Journal Corpus 3 000 000 words Correct parse tree for sentences known

bull Constructed by handbull Can be used to derive stochastic context free

grammarsbull SCFG assign probability to parse trees

Compute the most probable parse tree

Sequences are omni-present

Therefore the techniques we will see also apply tobull Bioinformatics

DNA proteins mRNA hellip can all be represented as strings

bullRobotics Sequences of actions states hellip

bullhellip

Rest of the Course

Limitations traditional grammar models motivate probabilistic extensions bull Regular grammars and Finite State Automata

All use principles of Part I on Graphical Models Markov Models using n-gramms (Hidden) Markov Models Conditional Random Fields

bull As an example of using undirected graphical models

bull Probabilistic Context Free Grammarsbull Probabilistic Definite Clause Grammars

  • Advanced Artificial Intelligence
  • Topic
  • Contents
  • Rationalism versus Empiricism
  • Slide 5
  • This course
  • Ambiguity
  • NLP and Statistics
  • Slide 9
  • Corpora
  • Word Counts
  • Slide 12
  • Word Counts (Brown corpus)
  • Slide 14
  • Zipflsquos Law
  • Language and sequences
  • Key NLP Problem Ambiguity
  • Language Model
  • Example of bad language model
  • A bad language model
  • Slide 22
  • A good language model
  • Why language models
  • Applications
  • Spelling errors
  • Handwriting recognition
  • For Spell Checkers
  • Another dimension in language models
  • Sequence Tagging
  • Slide 31
  • Parsing
  • Slide 33
  • Language models based on Grammars
  • Grammars and parsing
  • Regular Grammars and Finite State Automata
  • Finite State Automaton
  • Phrase structure
  • Notation
  • Context Free Grammar
  • Slide 41
  • Top-down parsing
  • Context-free grammar
  • Parse tree
  • Definite Clause Grammars Non-terminals may have arguments
  • DCGs
  • Unification in a nutshell (cf AI course)
  • Unification
  • Parsing with DCGs
  • Case Marking
  • Slide 51
  • Probabilistic Models
  • Illustration
  • PowerPoint Presentation
  • Sequences are omni-present
  • Rest of the Course
Page 48: Part II. Statistical NLP Advanced Artificial Intelligence Introduction and Grammar Models Wolfram Burgard, Luc De Raedt, Bernhard Nebel, Kristian Kersting

Parsing with DCGs

Now require successful unification at each step

S -gt NP(N) VP(N) S -gt Art(N) N(N) VP(N) Nsingular S -gt a N(singular) VP(singular) S -gt a turtle VP(singular) S -gt a turtle sleeps

S-gt a turtle sleep fails

Case Marking

PN(singularnominative)PN(singularnominative) --gt --gt [he][she][he][she]

PN(singularaccusative)PN(singularaccusative) --gt --gt [him][her][him][her]

PN(pluralnominative)PN(pluralnominative) --gt --gt [they][they]

PN(pluralaccusative)PN(pluralaccusative) --gt --gt [them][them]

S --gt NP(Numbernominative) NP(Number)S --gt NP(Numbernominative) NP(Number)

VP(Number) --gt V(Number) VP(Anyaccusative)VP(Number) --gt V(Number) VP(Anyaccusative)

VP(NumberCase) --gt PN(NumberCase)VP(NumberCase) --gt PN(NumberCase)

VP(NumberAny) --gt Det N(Number)VP(NumberAny) --gt Det N(Number)

He sees her She sees him They see her

But not Them see he

DCGs

Are strictly more expressive than CFGs Can represent for instance

bull S(N) -gt A(N) B(N) C(N)bull A(0) -gt [] bull B(0) -gt []bull C(0) -gt []bull A(s(N)) -gt A(N) [A]bull B(s(N)) -gt B(N) [B]bull C(s(N)) -gt C(N) [C]

Probabilistic Models

Traditional grammar models are very rigid bull essentially a yes no decision

Probabilistic grammarsbull Define a probability models for the databull Compute the probability of each alternativebull Choose the most likely alternative

Ilustrate on bull Shannon Gamebull Spelling correctionbull Parsing

Illustration

Wall Street Journal Corpus 3 000 000 words Correct parse tree for sentences known

bull Constructed by handbull Can be used to derive stochastic context free

grammarsbull SCFG assign probability to parse trees

Compute the most probable parse tree

Sequences are omni-present

Therefore the techniques we will see also apply tobull Bioinformatics

DNA proteins mRNA hellip can all be represented as strings

bullRobotics Sequences of actions states hellip

bullhellip

Rest of the Course

Limitations traditional grammar models motivate probabilistic extensions bull Regular grammars and Finite State Automata

All use principles of Part I on Graphical Models Markov Models using n-gramms (Hidden) Markov Models Conditional Random Fields

bull As an example of using undirected graphical models

bull Probabilistic Context Free Grammarsbull Probabilistic Definite Clause Grammars

  • Advanced Artificial Intelligence
  • Topic
  • Contents
  • Rationalism versus Empiricism
  • Slide 5
  • This course
  • Ambiguity
  • NLP and Statistics
  • Slide 9
  • Corpora
  • Word Counts
  • Slide 12
  • Word Counts (Brown corpus)
  • Slide 14
  • Zipflsquos Law
  • Language and sequences
  • Key NLP Problem Ambiguity
  • Language Model
  • Example of bad language model
  • A bad language model
  • Slide 22
  • A good language model
  • Why language models
  • Applications
  • Spelling errors
  • Handwriting recognition
  • For Spell Checkers
  • Another dimension in language models
  • Sequence Tagging
  • Slide 31
  • Parsing
  • Slide 33
  • Language models based on Grammars
  • Grammars and parsing
  • Regular Grammars and Finite State Automata
  • Finite State Automaton
  • Phrase structure
  • Notation
  • Context Free Grammar
  • Slide 41
  • Top-down parsing
  • Context-free grammar
  • Parse tree
  • Definite Clause Grammars Non-terminals may have arguments
  • DCGs
  • Unification in a nutshell (cf AI course)
  • Unification
  • Parsing with DCGs
  • Case Marking
  • Slide 51
  • Probabilistic Models
  • Illustration
  • PowerPoint Presentation
  • Sequences are omni-present
  • Rest of the Course
Page 49: Part II. Statistical NLP Advanced Artificial Intelligence Introduction and Grammar Models Wolfram Burgard, Luc De Raedt, Bernhard Nebel, Kristian Kersting

Case Marking

PN(singularnominative)PN(singularnominative) --gt --gt [he][she][he][she]

PN(singularaccusative)PN(singularaccusative) --gt --gt [him][her][him][her]

PN(pluralnominative)PN(pluralnominative) --gt --gt [they][they]

PN(pluralaccusative)PN(pluralaccusative) --gt --gt [them][them]

S --gt NP(Numbernominative) NP(Number)S --gt NP(Numbernominative) NP(Number)

VP(Number) --gt V(Number) VP(Anyaccusative)VP(Number) --gt V(Number) VP(Anyaccusative)

VP(NumberCase) --gt PN(NumberCase)VP(NumberCase) --gt PN(NumberCase)

VP(NumberAny) --gt Det N(Number)VP(NumberAny) --gt Det N(Number)

He sees her She sees him They see her

But not Them see he

DCGs

Are strictly more expressive than CFGs Can represent for instance

bull S(N) -gt A(N) B(N) C(N)bull A(0) -gt [] bull B(0) -gt []bull C(0) -gt []bull A(s(N)) -gt A(N) [A]bull B(s(N)) -gt B(N) [B]bull C(s(N)) -gt C(N) [C]

Probabilistic Models

Traditional grammar models are very rigid:
• essentially a yes/no decision

Probabilistic grammars:
• define a probability model for the data
• compute the probability of each alternative
• choose the most likely alternative

Illustrated on:
• the Shannon Game
• spelling correction
• parsing

Illustration

Wall Street Journal Corpus: 3,000,000 words, with the correct parse tree for each sentence known:

• constructed by hand
• can be used to derive stochastic context-free grammars (SCFGs)
• SCFGs assign a probability to each parse tree

Task: compute the most probable parse tree.
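As a sketch of what "compute the most probable parse tree" means, here is a small CKY/Viterbi recognizer over a toy PCFG in Chomsky normal form. The grammar and probabilities are invented for illustration, not estimated from the WSJ corpus; the point is only that PP-attachment ambiguity is resolved by comparing parse probabilities.

```python
from collections import defaultdict

# Toy PCFG; rule probabilities are made up for illustration.
LEXICON = {  # word -> [(tag, P(word | tag-rule))]
    "astronomers": [("NP", 0.1)],
    "saw":         [("NP", 0.04), ("V", 1.0)],
    "stars":       [("NP", 0.18)],
    "ears":        [("NP", 0.18)],
    "with":        [("P", 1.0)],
}
RULES = [  # (lhs, rhs1, rhs2, prob)
    ("S",  "NP", "VP", 1.0),
    ("PP", "P",  "NP", 1.0),
    ("VP", "V",  "NP", 0.7),
    ("VP", "VP", "PP", 0.3),
    ("NP", "NP", "PP", 0.4),
]

def best_parse(words):
    """CKY Viterbi: return (prob, bracketed tree) of the most probable parse."""
    n = len(words)
    best = defaultdict(dict)  # best[(i, j)][label] = (prob, tree string)
    for i, w in enumerate(words):
        for label, p in LEXICON[w]:
            best[(i, i + 1)][label] = (p, f"({label} {w})")
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):           # split point
                for lhs, r1, r2, p in RULES:
                    if r1 in best[(i, k)] and r2 in best[(k, j)]:
                        p1, t1 = best[(i, k)][r1]
                        p2, t2 = best[(k, j)][r2]
                        cand = (p * p1 * p2, f"({lhs} {t1} {t2})")
                        if cand[0] > best[(i, j)].get(lhs, (0.0, ""))[0]:
                            best[(i, j)][lhs] = cand
    return best[(0, n)].get("S", (0.0, None))
```

With these numbers, `best_parse("astronomers saw stars with ears".split())` prefers attaching the PP "with ears" to the NP "stars" rather than to the VP, because that parse has the higher probability product.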

Sequences are omni-present

Therefore the techniques we will see also apply to:

• Bioinformatics: DNA, proteins, mRNA, … can all be represented as strings
• Robotics: sequences of actions, states, …
• …

Rest of the Course

The limitations of traditional grammar models motivate probabilistic extensions:

• Regular grammars and Finite State Automata: Markov Models using n-grams, (Hidden) Markov Models, and Conditional Random Fields (as an example of using undirected graphical models)
• Probabilistic Context-Free Grammars
• Probabilistic Definite Clause Grammars

All use the principles of Part I on Graphical Models.
