center for personkommunikation p.1 background for nlp questions brought up by n. chomsky in the...

14
P. 1 Center for PersonKommunikation Background for NLP Questions brought up by N. Chomsky in the 1950’ies: Can a natural language like English be described (“parsed”, “compiled”) with the same methods as used for formal/artificial (programming) languages in computer science? Can we use simple finite state grammars or context- free grammars for the description of English? Or does linguistics need to invent an own and more powerful grammar type for the description of natural languages? Offshoots: “The Chomsky Hierarchy of Grammars”, “Natural Language Processing”, “Generative Transformational Grammar”,

Upload: daniela-young

Post on 05-Jan-2016

212 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Center for PersonKommunikation P.1 Background for NLP Questions brought up by N. Chomsky in the 1950’ies: –Can a natural language like English be described

P.1

Center for PersonKommunikation

Background for NLP

• Questions brought up by N. Chomsky in the 1950’ies:– Can a natural language like English be described (“parsed”, “compiled”) with

the same methods as used for formal/artificial (programming) languages in computer science?

– Can we use simple finite state grammars or context-free grammars for the description of English?

– Or does linguistics need to invent an own and more powerful grammar type for the description of natural languages?

• Offshoots: “The Chomsky Hierarchy of Grammars”, “Natural Language Processing”, “Generative Transformational Grammar”,

Page 2: Center for PersonKommunikation P.1 Background for NLP Questions brought up by N. Chomsky in the 1950’ies: –Can a natural language like English be described

P.2

Center for PersonKommunikation

Chomsky: Grammar Theory 0

• Some key extracts/quotations from ”Syntactic Structures”

– A language is a (infinite) set of sentences, each finite in length and constructed out of a finite set of elements.

– A grammar is a device that separate the grammatical sequences from the ungrammatical sentences and generates the structures of the grammatical ones.

– A grammar is a reconstruction of the native speaker’s competence, his ability to generate (produce and understand) an infinite number of sentences

– A grammar is a theory of a language. It must comply with the empiristic axioms: The theory must be adequate and simple.

Page 3: Center for PersonKommunikation P.1 Background for NLP Questions brought up by N. Chomsky in the 1950’ies: –Can a natural language like English be described

P.3

Center for PersonKommunikation

Chomsky: Grammar Theory 1

English Native SpeakerEnglish Grammar/Language Theory

Have you a book on modern music?The book seems interesting.…...

Sentence parsable!Sentence parsable!…..

The grammar must generate (“parse”) ALL sentences acceptable to the native speaker and ….

!

Page 4: Center for PersonKommunikation P.1 Background for NLP Questions brought up by N. Chomsky in the 1950’ies: –Can a natural language like English be described

P.4

Center for PersonKommunikation

Chomsky: Grammar Theory 2

English Native Speaker

According to my intuition this sentence is

OK!OK!

...

English Grammar/Language Theory

1) Colorles green ideas sleep furiously.2) Have you a book on modern music?…

… the grammar must generate NOTHING BUT sentences acceptable to the native speaker and ...

Random sentence generation:

!

Page 5: Center for PersonKommunikation P.1 Background for NLP Questions brought up by N. Chomsky in the 1950’ies: –Can a natural language like English be described

P.5

Center for PersonKommunikation

Chomsky: Grammar Theory 3

Grammar A

Grammar B

Set of Sentences

generated byA and B

Preferable grammar

(equivalent grammars)

… the grammar must be as SIMPLE (e.g. “small”) as possible

Language l

!

Page 6: Center for PersonKommunikation P.1 Background for NLP Questions brought up by N. Chomsky in the 1950’ies: –Can a natural language like English be described

P.6

Center for PersonKommunikation

Chomsky: Grammar Theory 4

What’s in the “Black Box”? What type of grammar can “generate” a natural language like English?

– A Finite State Grammar without/with loops? • (No! “Syntactic Structures” pp. 18 ff.)

– A Phrase Structure Grammar?• (No! “Syntactic Structures” pp. 26 ff.)

– A Transformational Grammar?• (Yes/Maybe! According to “Syntactic Structures” pp 34 ff. BUT

“Generative Transformational Grammar” has turned out to be a “blind alley” in computational linguistics)

?

!

!

!

Page 7: Center for PersonKommunikation P.1 Background for NLP Questions brought up by N. Chomsky in the 1950’ies: –Can a natural language like English be described

P.7

Center for PersonKommunikation

Chomsky: Hierarchy of Grammars

• Type 3: Regular Grammars

– Equivalent to finite state automata, finite state transition networks, Markov models (probabilistic type).

• Type 2: Context free Grammars

– E.g. recursive transition networks (RTNs), phrase structure grammars (PSGs). Unification grammars where attributes take values drawn from a finite table.

• Type 1: Context sensitive Grammars

– Augmented transition networks (ATNs), transformational grammars, some unification grammars

• Type 0: Unrestricted Grammars

!

!

Page 8: Center for PersonKommunikation P.1 Background for NLP Questions brought up by N. Chomsky in the 1950’ies: –Can a natural language like English be described

P.8

Center for PersonKommunikation

Finite State Grammar

• Structure:– Directed Graph/Transition Network structure

– All transitions are terminals

– The terminal symbols are either words or POS (word class) names like Noun, Verb, Pronoun.

– The network structure may involve loops (iterations) and “empty” transitions (jumps, skips)

4

nodes

transition

loopj

jump

!

Page 9: Center for PersonKommunikation P.1 Background for NLP Questions brought up by N. Chomsky in the 1950’ies: –Can a natural language like English be described

P.9

Center for PersonKommunikation

Recursive Transition Network Grammar

• Structure:– A SET of named Directed Graph/Transition Network structureS

– Transitions are terminals or NON-TERMINALS

– Terminal symbols/loops/jumps -> see FSN -slide

– A non-terminal symbol is the name of a network in the set included in the RTN

X

Jump

Xa bEquivalent BNF/PSGX -> a bX -> a X b

AnBn-problem, “Syntactic Structures”, p. 30

!

Page 10: Center for PersonKommunikation P.1 Background for NLP Questions brought up by N. Chomsky in the 1950’ies: –Can a natural language like English be described

P.10

Center for PersonKommunikation

What’s wrong with FSNs & RTNs according to Chomsky?

• FSNs without loops can only generate a finite set of sentences. English is an infinite set

• FSNs with loops generate infinite sets of sentences but cannot describe AnBn sequences found in constructions with “respectively”.

• RTNs (PSGs/BNFs) generate infinite sets of sentences, can describe AnBn sequences, but applied to English a huge number of symbols is required (conflict with simplicity)

!

Page 11: Center for PersonKommunikation P.1 Background for NLP Questions brought up by N. Chomsky in the 1950’ies: –Can a natural language like English be described

P.11

Center for PersonKommunikation

Generative Transformational Grammar

Page 12: Center for PersonKommunikation P.1 Background for NLP Questions brought up by N. Chomsky in the 1950’ies: –Can a natural language like English be described

P.12

Center for PersonKommunikation

Young et al.: Grammar types in speech reocgnition

• Level Building [obsolete] (Young p. 8 f.):– finite state grammar without loops

• Viterbi/Token passing (Young p. 8 f, p. 11 ff.)– finite state grammar with loops

– context-free grammar provided that every non-terminal refers to a unique instance of a sub-network (No recursions!)

Conclusion: Decoding algorithms used in modern speech recognition technology can only be applied on the weakest grammar type within the Chomsky Hierarchy

Page 13: Center for PersonKommunikation P.1 Background for NLP Questions brought up by N. Chomsky in the 1950’ies: –Can a natural language like English be described

P.13

Center for PersonKommunikation

Exercise 1

• The following extremely simple grammar generates all (typed) English sentences. What is wrong with it according to the Chomsky theory?

• Describe the concepts “native speaker, “intuition”, “generate” as used in the Chomsky theory.

char

Lexicon: char=a,b,c,…z,’;’,’.’,’?’, …etc

Page 14: Center for PersonKommunikation P.1 Background for NLP Questions brought up by N. Chomsky in the 1950’ies: –Can a natural language like English be described

P.14

Center for PersonKommunikation

Exercise 2

Consider the "correct" (or "grammatical") ansi-C printf sequences in I and compare them with the "false" (or "ungrammatical") ones in II :

I printf("%d %s",integer,string)

printf("%s %d %d",string,integer,integer)

printf("%d",integer)

etc.

II printf("%d %s",integer,string,string)

printf("%d %s",integer)

printf("%d %s",string,integer)

etc.

Is it possible to design a regular (finite state) grammar that generates the correct sequences in I without generating II? If not, can a context-free grammar be designed that meet these conditions?