center for personkommunikation p.1 background for nlp questions brought up by n. chomsky in the...
TRANSCRIPT
P.1
Center for PersonKommunikation
Background for NLP
• Questions brought up by N. Chomsky in the 1950’ies:– Can a natural language like English be described (“parsed”, “compiled”) with
the same methods as used for formal/artificial (programming) languages in computer science?
– Can we use simple finite state grammars or context-free grammars for the description of English?
– Or does linguistics need to invent an own and more powerful grammar type for the description of natural languages?
• Offshoots: “The Chomsky Hierarchy of Grammars”, “Natural Language Processing”, “Generative Transformational Grammar”,
P.2
Center for PersonKommunikation
Chomsky: Grammar Theory 0
• Some key extracts/quotations from ”Syntactic Structures”
– A language is a (infinite) set of sentences, each finite in length and constructed out of a finite set of elements.
– A grammar is a device that separate the grammatical sequences from the ungrammatical sentences and generates the structures of the grammatical ones.
– A grammar is a reconstruction of the native speaker’s competence, his ability to generate (produce and understand) an infinite number of sentences
– A grammar is a theory of a language. It must comply with the empiristic axioms: The theory must be adequate and simple.
P.3
Center for PersonKommunikation
Chomsky: Grammar Theory 1
English Native SpeakerEnglish Grammar/Language Theory
Have you a book on modern music?The book seems interesting.…...
Sentence parsable!Sentence parsable!…..
The grammar must generate (“parse”) ALL sentences acceptable to the native speaker and ….
!
P.4
Center for PersonKommunikation
Chomsky: Grammar Theory 2
English Native Speaker
According to my intuition this sentence is
OK!OK!
...
English Grammar/Language Theory
1) Colorles green ideas sleep furiously.2) Have you a book on modern music?…
… the grammar must generate NOTHING BUT sentences acceptable to the native speaker and ...
Random sentence generation:
!
P.5
Center for PersonKommunikation
Chomsky: Grammar Theory 3
Grammar A
Grammar B
Set of Sentences
generated byA and B
Preferable grammar
(equivalent grammars)
… the grammar must be as SIMPLE (e.g. “small”) as possible
Language l
!
P.6
Center for PersonKommunikation
Chomsky: Grammar Theory 4
What’s in the “Black Box”? What type of grammar can “generate” a natural language like English?
– A Finite State Grammar without/with loops? • (No! “Syntactic Structures” pp. 18 ff.)
– A Phrase Structure Grammar?• (No! “Syntactic Structures” pp. 26 ff.)
– A Transformational Grammar?• (Yes/Maybe! According to “Syntactic Structures” pp 34 ff. BUT
“Generative Transformational Grammar” has turned out to be a “blind alley” in computational linguistics)
?
!
!
!
P.7
Center for PersonKommunikation
Chomsky: Hierarchy of Grammars
• Type 3: Regular Grammars
– Equivalent to finite state automata, finite state transition networks, Markov models (probabilistic type).
• Type 2: Context free Grammars
– E.g. recursive transition networks (RTNs), phrase structure grammars (PSGs). Unification grammars where attributes take values drawn from a finite table.
• Type 1: Context sensitive Grammars
– Augmented transition networks (ATNs), transformational grammars, some unification grammars
• Type 0: Unrestricted Grammars
!
!
P.8
Center for PersonKommunikation
Finite State Grammar
• Structure:– Directed Graph/Transition Network structure
– All transitions are terminals
– The terminal symbols are either words or POS (word class) names like Noun, Verb, Pronoun.
– The network structure may involve loops (iterations) and “empty” transitions (jumps, skips)
4
nodes
transition
loopj
jump
!
P.9
Center for PersonKommunikation
Recursive Transition Network Grammar
• Structure:– A SET of named Directed Graph/Transition Network structureS
– Transitions are terminals or NON-TERMINALS
– Terminal symbols/loops/jumps -> see FSN -slide
– A non-terminal symbol is the name of a network in the set included in the RTN
X
Jump
Xa bEquivalent BNF/PSGX -> a bX -> a X b
AnBn-problem, “Syntactic Structures”, p. 30
!
P.10
Center for PersonKommunikation
What’s wrong with FSNs & RTNs according to Chomsky?
• FSNs without loops can only generate a finite set of sentences. English is an infinite set
• FSNs with loops generate infinite sets of sentences but cannot describe AnBn sequences found in constructions with “respectively”.
• RTNs (PSGs/BNFs) generate infinite sets of sentences, can describe AnBn sequences, but applied to English a huge number of symbols is required (conflict with simplicity)
!
P.11
Center for PersonKommunikation
Generative Transformational Grammar
P.12
Center for PersonKommunikation
Young et al.: Grammar types in speech reocgnition
• Level Building [obsolete] (Young p. 8 f.):– finite state grammar without loops
• Viterbi/Token passing (Young p. 8 f, p. 11 ff.)– finite state grammar with loops
– context-free grammar provided that every non-terminal refers to a unique instance of a sub-network (No recursions!)
Conclusion: Decoding algorithms used in modern speech recognition technology can only be applied on the weakest grammar type within the Chomsky Hierarchy
P.13
Center for PersonKommunikation
Exercise 1
• The following extremely simple grammar generates all (typed) English sentences. What is wrong with it according to the Chomsky theory?
• Describe the concepts “native speaker, “intuition”, “generate” as used in the Chomsky theory.
char
Lexicon: char=a,b,c,…z,’;’,’.’,’?’, …etc
P.14
Center for PersonKommunikation
Exercise 2
Consider the "correct" (or "grammatical") ansi-C printf sequences in I and compare them with the "false" (or "ungrammatical") ones in II :
I printf("%d %s",integer,string)
printf("%s %d %d",string,integer,integer)
printf("%d",integer)
etc.
II printf("%d %s",integer,string,string)
printf("%d %s",integer)
printf("%d %s",string,integer)
etc.
Is it possible to design a regular (finite state) grammar that generates the correct sequences in I without generating II? If not, can a context-free grammar be designed that meet these conditions?