CS 3240 – Chapter 5

Upload: conrad-richards

Post on 18-Jan-2016


Page 1: CS 3240 – Chapter 5

CS 3240 – Chapter 5

Page 2:

Language                 Machine              Grammar
Regular                  Finite Automaton     Regular Expression, Regular Grammar
Context-Free             Pushdown Automaton   Context-Free Grammar
Recursively Enumerable   Turing Machine       Unrestricted Phrase-Structure Grammar

CS 3240 - Introduction

Page 3:

5.1: Context-Free Grammars
▪ Derivations
▪ Derivation Trees
5.2: Parsing and Ambiguity
5.3: CFGs and Programming Languages
▪ Precedence
▪ Associativity
▪ Expression Trees

CS 3240 - Context-Free Languages

Page 4:

S → aaSa | λ
It is not right-linear or left-linear
▪ so it is not a “regular grammar”
But it is linear
▪ each right-hand side has at most one variable
What is its language?

Page 5:

S → aSb | λ

Deriving aaabbb:

S ⇒ aSb ⇒ aaSbb ⇒ aaaSbbb ⇒ aaabbb
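The derivation above can be reproduced mechanically. The following is a minimal sketch, assuming the grammar S → aSb | λ; the function name `derive_anbn` is hypothetical and not part of the course materials.

```python
def derive_anbn(n):
    """Return the sentential forms of a derivation of a^n b^n from S -> aSb | λ."""
    current = "S"
    forms = [current]
    for _ in range(n):
        current = current.replace("S", "aSb", 1)   # apply S -> aSb
        forms.append(current)
    forms.append(current.replace("S", "", 1))      # apply S -> λ to finish
    return forms

print(" => ".join(derive_anbn(3)))
# prints: S => aSb => aaSbb => aaaSbbb => aaabbb
```

Each loop iteration applies S → aSb once, so n applications followed by S → λ yield n a's and n b's.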

Page 6:

The parts of a context-free grammar (CFG):
Variables
▪ aka “non-terminals”
Letters from some alphabet, Σ
▪ aka “terminals”
Rules (“substitution rules”) of the form V → s
▪ where s is any string of letters and variables, or λ
▪ Rules are often called productions

Page 7:

a^n c b^n
a^n b^(2n)
a^n b^m, where 0 ≤ n ≤ m ≤ 2n
a^n b^m, n ≠ m
Palindrome (start with a recursive definition)
Non-Palindrome
Equal
a^n b^n a^m

Page 8:

S → aSbSbS | bSaSbS | bSbSaS | λ

Trace ababbb

When building CFGs, remember that the start variable (S) represents a string in the language. So, for example, if S has twice as many b’s as a’s, then so does aSbSbS, etc.
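One possible trace of ababbb is sketched below, under the assumption that we always expand the leftmost S. The helper `leftmost_trace` and the rule indexes are illustrative names, not from the slides. The last line checks the invariant mentioned above: every sentential form has twice as many b's as a's.

```python
RULES = ["aSbSbS", "bSaSbS", "bSbSaS", ""]  # right-hand sides for S; "" stands for λ

def leftmost_trace(choices):
    """Apply RULES[i] to the leftmost S for each index i in choices."""
    form = "S"
    trace = [form]
    for i in choices:
        if "S" not in form:
            raise ValueError("no variable left to expand")
        form = form.replace("S", RULES[i], 1)
        trace.append(form)
    return trace

trace = leftmost_trace([0, 3, 0, 3, 3, 3, 3])
print(" => ".join(trace))
# prints: S => aSbSbS => abSbS => abaSbSbSbS => ababSbSbS => ababbSbS => ababbbS => ababbb

# Each rule adds one a and two b's (or nothing), so the 2:1 ratio never breaks.
print(all(form.count("b") == 2 * form.count("a") for form in trace))
# prints: True
```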

Page 9:

A derivation is a sequence of applications of grammatical rules, eventually yielding a string in the language
A CFG can have multiple variables on the right-hand side of a rule
▪ Giving a choice of which variable to expand first
▪ By convention, we usually use a leftmost derivation

Page 10:

<S> → <NP> <VP>
<NP> → the <N>
<VP> → <V> <NP>
<V> → sings | eats
<N> → cat | song | canary

<S> ⇒ <NP> <VP>
⇒ the <N> <VP>
⇒ the canary <VP>
⇒ the canary <V> <NP>
⇒ the canary sings <NP>
⇒ the canary sings the <N>
⇒ the canary sings the song

The strings in a derivation are called “sentential forms”
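A leftmost derivation like this one can be sketched in code. The dictionary layout and the `leftmost_derivation` helper below are assumptions made for illustration. The `choices` list supplies an alternative index only when a variable has more than one rule.

```python
GRAMMAR = {
    "<S>": [["<NP>", "<VP>"]],
    "<NP>": [["the", "<N>"]],
    "<VP>": [["<V>", "<NP>"]],
    "<V>": [["sings"], ["eats"]],
    "<N>": [["cat"], ["song"], ["canary"]],
}

def leftmost_derivation(start, choices):
    """Return the list of sentential forms, always expanding the leftmost variable."""
    form = [start]
    steps = [" ".join(form)]
    choices = iter(choices)
    while any(symbol in GRAMMAR for symbol in form):
        # index of the leftmost variable in the current sentential form
        i = next(k for k, symbol in enumerate(form) if symbol in GRAMMAR)
        alternatives = GRAMMAR[form[i]]
        pick = 0 if len(alternatives) == 1 else next(choices)
        form[i:i + 1] = alternatives[pick]
        steps.append(" ".join(form))
    return steps

# canary (index 2 of <N>), sings (index 0 of <V>), song (index 1 of <N>)
for step in leftmost_derivation("<S>", [2, 0, 1]):
    print(step)
```

The last line printed is "the canary sings the song"; every earlier line is a sentential form that still contains at least one variable.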

Page 11:

A derivation tree is a graphical representation of a derivation
The start symbol is the root
Each symbol in the right-hand side of the rule is a child node at the same level
Continue until the leaves are all terminals
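The construction can be sketched as follows, assuming the sentence grammar <S> → <NP> <VP>, etc., used in the derivation example. The tuple representation `(symbol, children)` and the helper names are illustrative choices, not the course's code. The grammar has no λ rules, so a node with no children is always a terminal.

```python
GRAMMAR = {
    "<S>": [["<NP>", "<VP>"]],
    "<NP>": [["the", "<N>"]],
    "<VP>": [["<V>", "<NP>"]],
    "<V>": [["sings"], ["eats"]],
    "<N>": [["cat"], ["song"], ["canary"]],
}

def build_tree(symbol, choices):
    """Expand symbol depth-first, left to right; choices is an iterator of rule indexes."""
    if symbol not in GRAMMAR:                     # terminal: a leaf
        return (symbol, [])
    alternatives = GRAMMAR[symbol]
    pick = 0 if len(alternatives) == 1 else next(choices)
    return (symbol, [build_tree(child, choices) for child in alternatives[pick]])

def leaves(tree):
    """Read the leaves left to right; together they spell the derived string."""
    symbol, children = tree
    if not children:
        return [symbol]
    return [leaf for child in children for leaf in leaves(child)]

def show(tree, depth=0):
    """Print the tree with one level of indentation per tree level."""
    symbol, children = tree
    print("  " * depth + symbol)
    for child in children:
        show(child, depth + 1)

tree = build_tree("<S>", iter([2, 0, 1]))
show(tree)
print(" ".join(leaves(tree)))   # prints: the canary sings the song
```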


Page 13:

Note how there was only one parse tree for the string “the canary sings the song”
▪ And only one leftmost derivation
This is not true of all grammars!
Some grammars allow choices of distinct rules to generate the same string
▪ Or equivalently, where there is more than one parse tree for the same string
▪ Such a grammar is ambiguous
▪ Not easy to process programmatically

Page 14:

<exp> → <exp> + <exp> | <exp> * <exp> | (<exp>) | a | b | c

<exp> ⇒ <exp> + <exp> ⇒ a + <exp> ⇒ a + <exp> * <exp> ⇒ a + b * <exp> ⇒ a + b * c

<exp> ⇒ <exp> * <exp> ⇒ <exp> + <exp> * <exp> ⇒ a + <exp> * <exp> ⇒ a + b * <exp> ⇒ a + b * c

Page 15:

Which one is “correct”?
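One way to see the ambiguity of the grammar <exp> → <exp> + <exp> | <exp> * <exp> | (<exp>) | a | b | c is to count the parse trees for a given string. The sketch below does so by trying every operator as the root of the tree. The function name `count_parse_trees` and the whitespace-separated token format are assumptions made for this example.

```python
from functools import lru_cache

def count_parse_trees(tokens):
    """Count the parse trees that derive the token sequence from <exp>."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(i, j):
        # number of parse trees deriving tokens[i:j] from <exp>
        if j - i == 1:
            return 1 if tokens[i] in ("a", "b", "c") else 0
        total = 0
        # <exp> -> ( <exp> )
        if j - i >= 3 and tokens[i] == "(" and tokens[j - 1] == ")":
            total += count(i + 1, j - 1)
        # <exp> -> <exp> + <exp> | <exp> * <exp>: try each operator as the root
        for k in range(i + 1, j - 1):
            if tokens[k] in ("+", "*"):
                total += count(i, k) * count(k + 1, j)
        return total

    return count(0, len(tokens))

print(count_parse_trees("a + b * c".split()))      # prints: 2
print(count_parse_trees("( a + b ) * c".split()))  # prints: 1
```

A count greater than 1 for any string means the grammar is ambiguous. The grammar itself cannot say which of the two trees for a + b * c is intended.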

Page 16:

Parsing is the process of determining if a string is generated by a grammar
▪ And often we want the parse tree
▪ So that we know the order of operations
Top-down Parsing
▪ Easiest conceptually
Bottom-up Parsing
▪ Most efficient (used by commercial compilers)
▪ We will use a simple one in Chapter 6

Page 17:

Try to match a string, w, to a grammar
If there is a rule S → w, we’re done!
▪ Fat chance :-)
Try to find rules that match the first character
▪ A “look-ahead” strategy
▪ This is what we do “in our heads” anyway
Repeat on the rest of the string…
Very “brute force”


Page 19:

S → SS | aSb | bSa | λ

Parse “aabb”:

Candidate rules: 1) S → SS, 2) S → aSb:

1) SS ⇒ SSS, SS ⇒ aSbS
2) aSb ⇒ aSSb, aSb ⇒ aaSbb

Answer: S ⇒ aSb ⇒ aaSbb ⇒ aabb (2)

Not a well-defined algorithm (yet)!
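The brute-force search can be made concrete with a breadth-first exploration of leftmost derivations. This is only a sketch. The cap `max_len` on sentential-form length is an assumption added here so the search terminates for this particular grammar, and the name `brute_force_parse` is hypothetical.

```python
from collections import deque

RULES = ["SS", "aSb", "bSa", ""]   # right-hand sides for S; "" stands for λ

def brute_force_parse(w, max_len=None):
    """Search leftmost derivations breadth-first; return one derivation of w or None."""
    if max_len is None:
        max_len = 2 * len(w) + 1          # assumed cap so the search terminates
    queue = deque([("S", ["S"])])
    seen = {"S"}
    while queue:
        form, path = queue.popleft()
        if form == w:
            return path
        i = form.find("S")
        if i == -1:
            continue                      # all terminals, but not w
        for rhs in RULES:
            new = form[:i] + rhs + form[i + 1:]
            j = new.find("S")
            prefix = new if j == -1 else new[:j]   # terminals before the first variable
            terminals = len(new) - new.count("S")
            if (new not in seen and len(new) <= max_len
                    and terminals <= len(w) and w.startswith(prefix)):
                seen.add(new)
                queue.append((new, path + [new]))
    return None

print(" => ".join(brute_force_parse("aabb")))   # prints: S => aSb => aaSbb => aabb
print(brute_force_parse("aab"))                 # prints: None
```

The first-character check from the slides appears here as the `w.startswith(prefix)` test, which discards any sentential form whose leading terminals disagree with w.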

Page 20:

A top-down parsing technique (commonly called recursive descent)
Grammar Requirements:
▪ no ambiguity
▪ no lambdas
▪ no left-recursion (e.g., A → Ab)
▪ … and some other stuff
Create a function for each variable
Check first character to choose a rule
Start by calling S( )

Page 21:

Grammar: S → aSb | ab

Function S:
▪ if length == 2, check to see if it is “ab”
▪ otherwise, consume the outer ‘a’ and ‘b’, then call S on what’s left

See parseanbn.py, parseanbn2.py
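The function described above might look like the following. This is a minimal sketch written from the slide's description, not the contents of parseanbn.py, which is not reproduced here.

```python
def S(s):
    """Return True if s can be derived from S -> aSb | ab."""
    if len(s) < 2:
        return False
    if len(s) == 2:
        return s == "ab"                       # base rule S -> ab
    # rule S -> aSb: consume the outer 'a' and 'b', then recurse on the middle
    return s[0] == "a" and s[-1] == "b" and S(s[1:-1])

print(S("aaabbb"))   # prints: True
print(S("aab"))      # prints: False
```

Each recursive call removes two characters, so the recursion depth is at most half the input length.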

Page 22:

Grammar:
A → BA | a
B → bB | b

See parsebstara.cpp
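A recursive-descent sketch for this grammar follows, with one function per variable. It is written in Python for illustration and is not the referenced parsebstara.cpp. Both B rules start with b, so this sketch assumes one extra character of look-ahead to choose between them.

```python
def parse(s):
    """Return True if s can be derived from A -> BA | a, B -> bB | b."""
    pos = 0

    def peek():
        return s[pos] if pos < len(s) else None

    def A():
        nonlocal pos
        if peek() == "b":          # only A -> BA can start with 'b'
            return B() and A()
        if peek() == "a":          # A -> a
            pos += 1
            return True
        return False

    def B():
        nonlocal pos
        if peek() != "b":
            return False
        pos += 1                   # both B rules begin with 'b'
        if peek() == "b":          # another 'b' follows: use B -> bB
            return B()
        return True                # otherwise use B -> b

    return A() and pos == len(s)

print(parse("bba"))   # prints: True
print(parse("bbb"))   # prints: False
```

Letting B consume a whole run of b's is a design choice that works here because every run of b's must eventually be followed by the single a from A → a.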

Page 23:

Lambda rules can cause sentential forms to shrink
▪ Then they can grow, and shrink again
▪ And grow, and shrink, and grow, and shrink…
How then can we know if the string isn’t in the language?
▪ That is, how do we know when we’re done so we can stop and reject the string?

Page 24:

A rule of the form A → B (a “unit rule”) doesn’t increase the size of the sentential form
Once again, we could spend a long time cycling through unit rules before the sentential form reaches length |w|
We prefer a method that always strictly grows to |w|, so we can stop and answer “yes” or “no” efficiently
So, we will remove lambda and unit rules
▪ In Chapter 6

Page 25:

Precedence
Associativity

Page 26:

The expression grammar was ambiguous because it treated all operators equally
▪ But multiplication should have higher precedence than addition
So we introduce a new variable for multiplicative expressions
▪ And place it further down in the rules
▪ Because we want it to appear further down in the parse tree

Page 27:

<exp> → <exp> + <mulexp> | <mulexp>
<mulexp> → <mulexp> * <rootexp> | <rootexp>
<rootexp> → (<exp>) | a | b | c

Now only one leftmost derivation for a + b * c:

<exp> ⇒ <exp> + <mulexp> ⇒ <mulexp> + <mulexp> ⇒ <rootexp> + <mulexp> ⇒ a + <mulexp> ⇒ a + <mulexp> * <rootexp> ⇒ a + <rootexp> * <rootexp> ⇒ a + b * <rootexp> ⇒ a + b * c

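The effect of the layered grammar (<exp>, <mulexp>, <rootexp>) can be sketched with one function per variable. A direct recursive function cannot handle left-recursive rules, so this sketch swaps in a `while` loop for each of them; that substitution is an assumption of the example, not something taken from the slides. The result is an expression tree written as nested tuples.

```python
def parse_expression(tokens):
    """Parse a token list and return a nested-tuple expression tree."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        pos += 1

    def exp():                       # <exp> -> <exp> + <mulexp> | <mulexp>
        tree = mulexp()
        while peek() == "+":         # loop stands in for the left-recursive rule
            advance()
            tree = ("+", tree, mulexp())
        return tree

    def mulexp():                    # <mulexp> -> <mulexp> * <rootexp> | <rootexp>
        tree = rootexp()
        while peek() == "*":
            advance()
            tree = ("*", tree, rootexp())
        return tree

    def rootexp():                   # <rootexp> -> ( <exp> ) | a | b | c
        token = peek()
        if token == "(":
            advance()
            tree = exp()
            if peek() != ")":
                raise SyntaxError("expected ')'")
            advance()
            return tree
        if token in ("a", "b", "c"):
            advance()
            return token
        raise SyntaxError(f"unexpected token: {token!r}")

    tree = exp()
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token: {tokens[pos]!r}")
    return tree

print(parse_expression("a + b * c".split()))
# prints: ('+', 'a', ('*', 'b', 'c'))
```

The multiplication ends up deeper in the tree than the addition, which is what placing <mulexp> further down in the rules is meant to achieve.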

Page 29:

Derive the parse tree for a + b + c …
▪ Note how you get (a + b) + c, in effect
Left-recursion gives left associativity
▪ Analogously for right associativity
Exercise:
▪ Add a right-associative power (exponentiation) operator (^, with variable <powerexp>) to the grammar with the proper precedence
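The link between the direction of recursion and the grouping of operands can be illustrated without a grammar. The sketch below only shows the two tree shapes; it is not a solution to the exercise, and the helper names `group_left` and `group_right` are made up for this example. Both expect a non-empty list of operands.

```python
from functools import reduce

def group_left(operands, op):
    """Group as a left-recursive rule would: ((x1 op x2) op x3) ..."""
    return reduce(lambda left, right: (op, left, right), operands)

def group_right(operands, op):
    """Group as a right-recursive rule would: x1 op (x2 op (x3 ...))."""
    return reduce(lambda right, left: (op, left, right), reversed(operands))

print(group_left(["a", "b", "c"], "+"))    # prints: ('+', ('+', 'a', 'b'), 'c')
print(group_right(["a", "b", "c"], "^"))   # prints: ('^', 'a', ('^', 'b', 'c'))
```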