CDT314
FABER
Formal Languages, Automata and Models of Computation
Lecture 4
School of Innovation, Design and Engineering Mälardalen University
2011
Content
Regular Expressions and Regular Languages
NFA→DFA Subset Construction (Delmängdskonstruktion)
FA → RE State Elimination (Tillståndseliminering)
Minimizing DFA by Set Partitioning (Särskiljandealgoritmen)
Grammar: Linear Grammar. Regular Grammar. Regular Languages.
I: Kleene Theorem, NFA ↔ DFA
Let M be an NFA accepting L. Then there exists a DFA M′ that accepts L as well.
II: Finite Language Theorem
Any finite language is FSA-acceptable.
Example: L = {abba, abb, abab}
FSA = Finite State Automaton = DFA/NFA
Theorem - Part 1
Languages generated by regular expressions ⊆ Regular languages
1. For any regular expression r, the language L(r) is regular.
Theorem - Part 2
Languages generated by regular expressions ⊇ Regular languages
2. For any regular language L there is a regular expression r with L(r) = L.
Proof - Part 1
For any regular expression r, the language L(r) is regular.
Proof by induction on the size of r.
Induction Basis
Primitive regular expressions: ∅, λ, α (where α ∈ Σ)
NFAs M1, M2, M3 accept the corresponding languages:
L(M1) = ∅ = L(∅)
L(M2) = {λ} = L(λ)
L(M3) = {a} = L(a)
These are regular languages.
Inductive Hypothesis
Assume for regular expressions r1 and r2 that L(r1) and L(r2) are regular languages.
Inductive Step
We will prove that the following are regular languages:
L(r1 + r2)
L(r1 · r2)
L(r1*)
L((r1))
By definition of regular expressions:
L(r1 + r2) = L(r1) ∪ L(r2)
L(r1 · r2) = L(r1) L(r2)
L(r1*) = (L(r1))*
L((r1)) = L(r1)
By the inductive hypothesis we know: L(r1) and L(r2) are regular languages.
It is known, and we will prove in this lecture: regular languages are closed under
union: L(r1) ∪ L(r2)
concatenation: L(r1) L(r2)
star: (L(r1))*
Therefore the following are regular languages:
L(r1 + r2) = L(r1) ∪ L(r2)
L(r1 · r2) = L(r1) L(r2)
L(r1*) = (L(r1))*
And trivially L((r1)) is a regular language.
Proof – Part 2
For any regular language L there is a regular expression r with L(r) = L.
Proof by construction of the regular expression.
Since L is regular, take the NFA M that accepts it: L(M) = L.
M has a single final state.
From M construct the equivalent Generalized Transition Graph, in which transition labels are regular expressions.
Example
[Figure: NFA M with transitions labeled a, c and "a, b"; in the generalized transition graph the label "a, b" becomes a + b.]
Theorem
The reverse L^R of a regular language L is a regular language.
Proof idea
Construct an NFA that accepts L^R: invert the transitions of the NFA that accepts L.
Proof
Since L is regular, there is an NFA that accepts L.
Example: L = ab* + ba
[Figure: NFA accepting L = ab* + ba.]
Properties
For regular languages L1 and L2 we will prove that:
Union: L1 ∪ L2
Concatenation: L1 L2
Star: L1*
are regular languages.
Regular language L1: NFA M1 with L(M1) = L1, single final state.
Regular language L2: NFA M2 with L(M2) = L2, single final state.
Summary: Operations on Regular Expressions
RE            Regular language description
a+b           {a, b}
(a+b)(a+b)    {aa, ab, ba, bb}
a*            {λ, a, aa, aaa, …}
a*b           {b, ab, aab, aaab, …}
(a+b)*        {λ, a, b, aa, ab, ba, aaa, bbb, …}
Algebraic Properties
Axiom                                   Description
r + s = s + r                           + is commutative
(r + s) + t = r + (s + t)               + is associative
(rs)t = r(st)                           concatenation is associative
r(s + t) = rs + rt, (s + t)r = sr + tr  concatenation distributes over +
λr = r, rλ = r                          λ is the identity element for concatenation
r* = (r + λ)*                           relation between * and λ
r** = r*                                * is idempotent
Operator Precedence
1. Kleene star
2. Concatenation
3. Union
This ordering allows dropping unnecessary parentheses.
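The summary table and the precedence rules can be checked mechanically. The sketch below is only an illustration: it assumes Python's `re` module as a stand-in for the slide notation, writing the union + as `|`, and the helper name `language_up_to` is made up for this example.

```python
import re
from itertools import product

def language_up_to(pattern, alphabet="ab", max_len=3):
    """List every string over `alphabet` of length <= max_len that the
    whole pattern matches (the empty string plays the role of λ)."""
    compiled = re.compile(pattern)
    words = []
    for n in range(max_len + 1):
        for letters in product(alphabet, repeat=n):
            word = "".join(letters)
            if compiled.fullmatch(word):
                words.append(word)
    return words

print(language_up_to("a|b"))            # a+b
print(language_up_to("(a|b)(a|b)"))     # (a+b)(a+b)
print(language_up_to("a*b"))            # a*b
# Star binds tighter than union, so a|b* means a + (b*), not (a+b)*.
print(language_up_to("a|b*", max_len=2))
```

Because the star binds tighter than union, the last call lists only λ, a, b and bb rather than every string of length at most 2.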
Example: From a Regular Expression to an NFA
Example: (a+b)*abb, step-by-step construction
[Figure: NFA fragment for (a+b): states 1 to 6, with an a-branch through states 2 and 3, a b-branch through states 4 and 5, and λ-transitions joining the branches.]
[Figure: NFA for (a+b)*: the fragment above extended with states 0 and 7 and λ-transitions for the star.]
[Figure: NFA for (a+b)*abb: states 0 to 10, with the transitions for a, b, b appended after state 7.]
Converting FA to a Regular Expression
State Elimination
http://www.math.uu.se/~salling/Movies/FA_to_RegExpression.mov
Example
[Figure: FA with states 1, 2, 3. Transitions: 1 to 2 on a, b; loop on 2 with a; 2 to 3 on b; 3 to 1 on a; 3 to 2 on b.]
Add a new start (s) and a new final (f) state:
• From s to the ex-starting state (1)
• From each ex-final state (1, 3) to f
[Figure: the original FA extended with λ-transitions s to 1, 1 to f and 3 to f.]
Let's remove state 1!
Each combination of input/output to 1 will generate a new path once state 1 is gone:
s to 2: λ(a ∪ b);  s to f: λ;  3 to 2: a(a ∪ b);  3 to f: a
When state 1 is gone we must be able to make all those transitions!
A common mistake: having several arrows between the same pair of states. Join the two arrows (by union of their regular expressions) before going on to the next step.
3 to 2: a(a ∪ b) ∪ b;  3 to f: λ ∪ a
Now we repeat the same procedure for state 2...
Following the path s-2-3 we concatenate all strings on our way… don't forget a* in 2!
s to 3: λ(a ∪ b)a*b
When 2 is removed, the path 3-2-3 also has to be preserved, so we concatenate the 3-2 string, take the a* loop and go back 2-3 with string b!
loop on 3: (a(a ∪ b) ∪ b)a*b
This is how the FA looks without state 2:
[Figure: states s, 3, f with s to 3: λ(a ∪ b)a*b; loop on 3: (a(a ∪ b) ∪ b)a*b; 3 to f: λ ∪ a; s to f: λ.]
...so we concatenate strings s-3, loop in 3 and 3-f:
(a ∪ b)a*b((a(a ∪ b) ∪ b)a*b)*(λ ∪ a)
Now we can omit state 3!
From s we have two choices, the empty string or the long expression:
λ ∪ (a ∪ b)a*b((a(a ∪ b) ∪ b)a*b)*(λ ∪ a)
OR is represented by union ∪, as usual.
Converting FA to a Regular Expression
An Algorithm for State Elimination
• We expand our notion of NFA-λ to allow transitions on arbitrary regular expressions, not simply single symbols or λ.
• Successively eliminate states, replacing transitions that enter and leave a state with a more complicated regular expression, until eventually there are only two states left: a start state and an accepting state, with a single transition connecting them, labeled with a regular expression.
• The resulting regular expression is then our answer.
• To begin with, the automaton should have a start state that has no transitions into it (including self-loops), and which is not accepting.
• If your automaton does not obey this property, add a new, non-accepting start state s, and add a λ-transition from s to the original start state.
• The automaton should also have a single accepting final state with no transitions leaving it, and no self-loops.
• If your automaton does not have it, add a new final state q, change all other accepting states to non-accepting, and add λ-transitions from them to q.
This change clearly doesn't change the language accepted by the automaton.
Repeat the following steps, which eliminate a state:
1. Pick a non-start, non-accepting state q to eliminate. The state q will have i transitions in and j transitions out. Each will be labeled with a regular expression. For each of the ij combinations of transitions into and out of q, replace the path from p through q to r (p to q labeled A, self-loop on q labeled B, q to r labeled C) with a single transition from p to r labeled AB*C. And delete state q.
2. If several transitions go between a pair of states, replace them with a single transition that is the union of the individual transitions. E.g. replace two transitions from p to r labeled A and B with a single transition from p to r labeled A ∪ B.
N.B. common mistake!
A self-loop labeled "a, b" means (a ∪ b)*, i.e. NOT (a* ∪ b*).
See the example on pages 46 and 47 in Salling's book.
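The two steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the lecture's own implementation: labels are written in Python `re` syntax (union as `|`, λ as the empty string), the names `eliminate_states`, `edges` and `dfa_accepts` are hypothetical, the state names "s" and "f" are assumed not to clash with existing states, and the three-state automaton is chosen for illustration.

```python
import re
from itertools import product

def eliminate_states(states, start, finals, edges):
    """State elimination on a generalized transition graph.

    edges maps (p, r) to a label in Python `re` syntax; the empty string
    stands for λ. Returns a regex for the arrow s -> f, or None.
    """
    graph = {pair: f"(?:{label})" for pair, label in edges.items()}

    def add(p, r, label):
        # Step 2: parallel arrows between the same pair are joined by union.
        if (p, r) in graph:
            graph[(p, r)] = f"(?:{graph[(p, r)]}|{label})"
        else:
            graph[(p, r)] = label

    add("s", start, "")                 # new start state s
    for q in finals:
        add(q, "f", "")                 # new final state f

    for q in states:                    # step 1: eliminate q
        loop = graph.pop((q, q), None)
        star = f"(?:{loop})*" if loop is not None else ""
        ins = [(p, a) for (p, t), a in graph.items() if t == q]
        outs = [(r, c) for (t, r), c in graph.items() if t == q]
        for p, _ in ins:
            del graph[(p, q)]
        for r, _ in outs:
            del graph[(q, r)]
        for (p, a), (r, c) in product(ins, outs):
            add(p, r, f"(?:{a}){star}(?:{c})")   # new label A B* C
    return graph.get(("s", "f"))

# Illustrative automaton: start state 1, accepting states 1 and 3.
edges = {(1, 2): "a|b", (2, 2): "a", (2, 3): "b", (3, 1): "a", (3, 2): "b"}
regex = eliminate_states([1, 2, 3], start=1, finals=[1, 3], edges=edges)

def dfa_accepts(word):
    """Direct simulation of the same automaton, used as a cross-check."""
    step = {(1, "a"): 2, (1, "b"): 2, (2, "a"): 2,
            (2, "b"): 3, (3, "a"): 1, (3, "b"): 2}
    state = 1
    for ch in word:
        state = step[(state, ch)]
    return state in (1, 3)

print(regex)
```

The printed expression is heavily parenthesized because the sketch never simplifies labels; comparing `re.fullmatch(regex, w)` with `dfa_accepts(w)` on short strings is a quick sanity check.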
Minimizing DFA by Set Partitioning
http://www.math.uu.se/~salling/Movies/SubsetConstruction.mov
Minimization of DFA
A deterministic finite automaton is not always the smallest possible one accepting its language.
There may be states with the same "acceptance behavior". This applies to states p and q if, for every input word, the automaton reaches a final state either from both p and q or from neither.
State Reduction by Set Partitioning (Särskiljandealgoritmen)
The set partitioning technique is similar to one used for partitioning people into groups based on their responses to a questionnaire.
The following slides show the detailed steps for computing equivalent state sets of the starting DFA and constructing the corresponding reduced DFA.
State Reduction by Set Partitioning
Step 0: Partition the states into two groups, accepting and non-accepting.
P1 = {3, 4, 5}    P2 = {0, 1, 2}
[Figure: starting DFA with states 0 to 5 over {a, b}.]
Step 1: Get the response of each state for each input symbol. Notice that states 3 and 0 show different responses from the other states in the same set.
        P1 = {3, 4, 5}    P2 = {0, 1, 2}
a →     p1  p1  p1        p2  p1  p1
b →     p2  p1  p1        p2  p1  p1
Step 2: Partition the sets according to the responses, and go to Step 1 until no partition occurs.
        P11 = {4, 5}    P12 = {3}    P21 = {1, 2}    P22 = {0}
a →     p11 p11                      p12 p12
b →     p11 p11                      p11 p11
No further partition is possible for the sets P11 and P21. So the final partition is:
{4, 5} {3} {1, 2} {0}
The minimized DFA consists of the four states of the final partition, and the transitions are the ones corresponding to the starting DFA.
[Figure: starting DFA (states 0 to 5) next to the minimized DFA (states 0; 1,2; 3; 4,5).]
DFA Minimization Algorithm
The algorithm:
P ← { F, Q − F }
while (P is still changing)
    T ← { }
    for each set s ∈ P
        for each α ∈ Σ
            partition s by α into s1, s2, …, sk
        T ← T ∪ { s1, s2, …, sk }
    if T ≠ P then
        P ← T
This is a fixed-point algorithm!
Why does this work?
The partition P is a set of subsets of Q, P ⊆ 2^Q (power set)
Start off with 2 subsets of Q: F and Q − F
The while loop takes Pi → Pi+1 by splitting one or more sets
Pi+1 is at least one step closer to the partition with |Q| sets
Maximum of |Q| splits
Partitions are never combined
The initial partition ensures that final states are intact
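The fixed-point loop can be sketched as follows. This is one possible reading of the pseudocode, under stated assumptions: the DFA is complete, each block is split by all input symbols at once (rather than one symbol at a time), and the function name `minimize_partition` and the five-state example DFA are chosen for this illustration only.

```python
def minimize_partition(states, alphabet, delta, finals):
    """Refine {F, Q - F} until no block splits; return the final partition
    as a set of frozensets. `delta` maps (state, symbol) to a state."""
    finals = frozenset(finals)
    partition = {finals, frozenset(states) - finals} - {frozenset()}
    while True:
        def block_of(q):
            # The block of the current partition that contains state q.
            return next(block for block in partition if q in block)

        refined = set()
        for block in partition:
            groups = {}
            for q in block:
                # The "response" of q: which block each symbol leads to.
                response = tuple(block_of(delta[q, a]) for a in alphabet)
                groups.setdefault(response, set()).add(q)
            refined.update(frozenset(g) for g in groups.values())
        if refined == partition:        # fixed point reached
            return partition
        partition = refined

# Example DFA for (a+b)*abb with states A to E; E is the only final state.
delta = {("A", "a"): "B", ("A", "b"): "C",
         ("B", "a"): "B", ("B", "b"): "D",
         ("C", "a"): "B", ("C", "b"): "C",
         ("D", "a"): "B", ("D", "b"): "E",
         ("E", "a"): "B", ("E", "b"): "C"}
result = minimize_partition("ABCDE", "ab", delta, {"E"})
print(sorted(sorted(block) for block in result))
```

For this DFA the states A and C end up in the same block, so the minimized automaton has four states.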
Minimization and partitioning (Salling)
Example 2.38 A minimal automaton, Σ = {a, b}
[Figure: DFA with six states over Σ = {a, b}.]
Theorem 2.4 A DFA with N states is minimal if its language distinguishes N strings.
The automaton above is minimal, as it has 6 states and distinguishes 6 strings: {ε, a, b, aa, ab, aba}
Two strings x, y are distinguished by a language (DFA) if there is a distinguishing string z such that only one of the strings xz, yz ends in an accepting state (belongs to the language).
A set of strings is distinguished by a language if each pair of strings is distinguished.
[Figure: the automaton of Example 2.38, repeated over several slides while the pairs of strings from {ε, a, b, aa, ab, aba} are checked one by one.]
Why aaa? We can try shorter strings. Will a do? No, as both aba and aa are accepted! Check!
Set Partition Algorithm (Salling book)
Example 2.39 We apply the set partition algorithm step by step to the following DFA, Σ = {a, b}:
[Figure: DFA with states 1 to 6; state 3 is accepting.]
Two strings x, y are distinguished by a language (DFA) if there is a (distinguishing) string z such that only one of the strings xz, yz ends in an accepting state (belongs to the language).
We search for strings that are distinguished by the DFA!
{1, 2, 3, 4, 5, 6}
Non-accepting: {1, 2, 4, 5, 6}    Accepting: {3}
Distinguishing string: a or b
What happens when we read a?
From 1 and 6 by a we reach 3, which is accepting. They form a special group.
{2, 4, 5} {1, 6} {3}
What happens when we read b?
From 5 we end in 6 by b, leaving its group.
{2, 4} {5} {1, 6} {3}
The minimal automaton has four states:
{2, 4} {5} {1, 6} {3}
[Figure: minimal automaton with states {2, 4}, {5}, {1, 6}, {3}.]
[Figure: nested language classes. The regular languages lie inside the context-free languages (examples: {a^n b^n}, {ww^R}); the examples {a^(n!) : n ≥ 0} and {a^n b^l c^(n+l) : n, l ≥ 0} are labeled as non-regular languages.]
Example of subset construction for NFA → DFA
[Figure: NFA N for (a+b)*abb with states 0 to 10, start state 0, ε-transitions for the (a+b)* part, and the transitions a, b, b through states 7, 8, 9 to state 10.]
Translation table for the DFA:
STATE    INPUT SYMBOL
         a    b
A        B    C
B        B    D
C        B    C
D        B    E
E        B    C
[Figure: result of applying the subset construction to the NFA for (a+b)*abb: a DFA with start state A and states B, C, D, E.]
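The translation table can be reproduced with a short subset-construction sketch. The NFA encoding below is an assumption made for illustration: it follows the usual textbook numbering 0 to 10 for (a+b)*abb, uses the empty string as the label of ε-moves, and the names `eps_closure` and `subset_construction` are not from the slides.

```python
from collections import deque

EPS = ""  # label used for ε-moves in this sketch

def eps_closure(nfa, states):
    """All states reachable from `states` using ε-moves only."""
    stack, closure = list(states), set(states)
    while stack:
        q = stack.pop()
        for r in nfa.get((q, EPS), ()):
            if r not in closure:
                closure.add(r)
                stack.append(r)
    return frozenset(closure)

def subset_construction(nfa, start, alphabet):
    """Return (start_set, table), where table[S][a] is the DFA successor."""
    start_set = eps_closure(nfa, {start})
    table, queue = {}, deque([start_set])
    while queue:
        current = queue.popleft()
        if current in table:
            continue
        table[current] = {}
        for a in alphabet:
            moved = {r for q in current for r in nfa.get((q, a), ())}
            target = eps_closure(nfa, moved)
            table[current][a] = target
            if target not in table:
                queue.append(target)
    return start_set, table

# Assumed NFA for (a+b)*abb: start state 0, accepting state 10.
nfa = {(0, EPS): {1, 7}, (1, EPS): {2, 4}, (2, "a"): {3}, (4, "b"): {5},
       (3, EPS): {6}, (5, EPS): {6}, (6, EPS): {1, 7},
       (7, "a"): {8}, (8, "b"): {9}, (9, "b"): {10}}

start_set, table = subset_construction(nfa, 0, "ab")
# Name the DFA states A, B, C, ... in order of discovery.
names = {S: chr(ord("A") + i) for i, S in enumerate(table)}
named = {names[S]: (names[row["a"]], names[row["b"]]) for S, row in table.items()}
for state, (on_a, on_b) in named.items():
    print(state, on_a, on_b)
```

With breadth-first discovery the state names come out in the same order as in the table, and the only DFA state containing NFA state 10 is E.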
Another Example of State Elimination
[Figure: FA with states q0, q1, q2; the label "a, b" is rewritten as a + b.]
[Figure: after eliminating q1, the states q0 and q2 remain, with the labels bb*a, bb*(a + b) and b.]
Resulting regular expression:
r = (bb*a)* bb*(a + b) b*
L(r) = L(M) = L
Obtaining the final regular expression
[Figure: two states q0 and qf with transitions labeled r1, r2, r3, r4.]
r = r1* r2 (r4 + r3 r1* r2)*
L(r) = L(M) = L
Example: From an NFA to a DFA
states   a   b
A        B   C
B        B   D
C        B   C
D        B   E
E        B   C
[Figure: DFA with states A, B, C, D, E.]
Example: From an NFA to a DFA
states   a   b
A        B   A
B        B   D
D        B   E
E        B   A
[Figure: DFA with states A, B, D, E.]
Example: DFA Minimization
      Current Partition            Split on a    Split on b
P0    {s4} {s0, s1, s2, s3}        none          {s0, s1, s2} {s3}
P1    {s4} {s3} {s0, s1, s2}       none          {s0, s2} {s1}
P2    {s4} {s3} {s1} {s0, s2}      none          none
[Figure: starting DFA with states s0 to s4 (s4 is the final state) and the minimized DFA with states {s0, s2}, s1, s3, s4.]
Example: DFA Minimization
What about a(b + c)*?
[Figure: λ-NFA for a(b + c)* with states q0 to q9.]
First, the subset construction:
DFA state   NFA states                  ε-closure(move(s, *))
                                        a: / b: / c:
s0          q0                          a: q1, q2, q3, q4, q6, q9 / b: none / c: none
s1          q1, q2, q3, q4, q6, q9      a: none / b: q5, q8, q9, q3, q4, q6 / c: q7, q8, q9, q3, q4, q6
s2          q5, q8, q9, q3, q4, q6      a: none / b: s2 / c: s3
s3          q7, q8, q9, q3, q4, q6      a: none / b: s2 / c: s3
Final states: s1, s2, s3
[Figure: DFA with states s0, s1, s2, s3.]
Then, apply the minimization algorithm:
      Current Partition        Split on a    Split on b    Split on c
P0    {s1, s2, s3} {s0}        none          none          none
To produce the minimal DFA:
[Figure: minimal DFA with states s0 and s1; s0 goes to s1 on a, and s1 loops on b, c.]
Grammars
Grammars express languages
Example: the English language
⟨sentence⟩ → ⟨noun_phrase⟩ ⟨predicate⟩
⟨noun_phrase⟩ → ⟨article⟩ ⟨noun⟩
⟨predicate⟩ → ⟨verb⟩
⟨article⟩ → a
⟨article⟩ → the
⟨noun⟩ → bird
⟨noun⟩ → dog
⟨verb⟩ → sings
⟨verb⟩ → barks
A derivation of “the bird sings”:
⟨sentence⟩ ⇒ ⟨noun_phrase⟩ ⟨predicate⟩
⇒ ⟨article⟩ ⟨noun⟩ ⟨predicate⟩
⇒ ⟨article⟩ ⟨noun⟩ ⟨verb⟩
⇒ the ⟨noun⟩ ⟨verb⟩
⇒ the bird ⟨verb⟩
⇒ the bird sings
A derivation of “a dog barks”:
⟨sentence⟩ ⇒ ⟨noun_phrase⟩ ⟨predicate⟩
⇒ ⟨noun_phrase⟩ ⟨verb⟩
⇒ ⟨article⟩ ⟨noun⟩ ⟨verb⟩
⇒ a ⟨noun⟩ ⟨verb⟩
⇒ a dog ⟨verb⟩
⇒ a dog barks
The language of the grammar:
L = { "a bird sings", "a bird barks", "the bird sings", "the bird barks",
      "a dog sings", "a dog barks", "the dog sings", "the dog barks" }
Grammar:
S → aSb
S → λ
Derivation of the sentence aabb:
S ⇒ aSb ⇒ aaSbb ⇒ aabb
(rules used: S → aSb, S → aSb, S → λ)
Other derivations
S ⇒ aSb ⇒ aaSbb ⇒ aaaSbbb ⇒ aaabbb
S ⇒ aSb ⇒ aaSbb ⇒ aaaSbbb ⇒ aaaaSbbbb ⇒ aaaabbbb
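Derivations in this grammar follow one fixed pattern, which a few lines of Python can replay. The function name `derive` is made up for this sketch, and λ is represented by the empty string.

```python
def derive(n):
    """Derivation steps of a^n b^n in the grammar S -> aSb | λ."""
    form = "S"
    steps = [form]
    for _ in range(n):
        form = form.replace("S", "aSb")   # apply S -> aSb
        steps.append(form)
    form = form.replace("S", "")          # apply S -> λ
    steps.append(form)
    return steps

print(" => ".join(derive(2)))
print(" => ".join(derive(3)))
```

Every sentential form contains exactly one S, so replacing it is unambiguous, and the last step always yields a string of the form a^n b^n.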
Formal Definition
Grammar G = (V, T, S, P)
V: Set of variables
T: Set of terminal symbols
S: Start variable
P: Set of production rules
Example
Grammar G:
S → aSb
S → λ
G = (V, T, S, P)
V = {S}    T = {a, b}
P = {S → aSb, S → λ}
Sentential Form
A sentence that contains variables and terminals
Example
S ⇒ aSb ⇒ aaSbb ⇒ aaaSbbb ⇒ aaabbb
The strings S, aSb, aaSbb, aaaSbbb are sentential forms; aaabbb is a sentence (sats).
In general we write
w1 ⇒* wn
if
w1 ⇒ w2 ⇒ w3 ⇒ … ⇒ wn
By default, w ⇒* w
Another Grammar Example
Grammar G:
S → Ab
A → aAb
A → λ
Derivations
S ⇒ Ab ⇒ b
S ⇒ Ab ⇒ aAbb ⇒ abb
S ⇒ Ab ⇒ aAbb ⇒ aaAbbb ⇒ aabbb
More Derivations
S ⇒ Ab ⇒ aAbb ⇒ aaAbbb ⇒ aaaAbbbb ⇒ aaaaAbbbbb ⇒ aaaabbbbb
S ⇒* aaaabbbbb
S ⇒* aaaaaabbbbbbb
S ⇒* a^n b^n b
The Language of a Grammar
For a grammar G with start variable S:
L(G) = { w : S ⇒* w }, where w is a string of terminals
Linear Grammars
Grammars with at most one variable (non-terminal) at the right side of a production
Examples:
S → aSb        S → Ab
S → λ          A → aAb
               A → λ
Right-Linear Grammars
All productions have the form A → xB or A → x
Example
S → abS
S → a
Left-Linear Grammars
All productions have the form A → Bx or A → x
Example
S → Aab
A → Aab | B
B → a
Regular Grammars
A grammar is a regular grammar if and only if it is right-linear or left-linear.
Theorem - Part 1
Languages generated by regular grammars ⊆ Regular languages
Any regular grammar generates a regular language.
Theorem - Part 2
Languages generated by regular grammars ⊇ Regular languages
Any regular language is generated by a regular grammar.
Proof – Part 1
Languages generated by regular grammars ⊆ Regular languages
The language L(G) generated by any regular grammar G is regular.
The case of Right-Linear Grammars
Let G be a right-linear grammar.
We will prove: L(G) is regular.
Proof idea: We will construct an NFA M with L(M) = L(G).
Construct NFA M such that every state is a grammar variable, plus a special final state VF:
Grammar G:
S → aA | B
A → aaB
B → bB | a
[Figure: NFA M with states S, A, B and the special final state VF; S goes to A on a and to B on λ, A goes to B on aa through an intermediate state, B loops on b and goes to VF on a.]
L(M) = L(G) = aaab*a + b*a
In General
A right-linear grammar G has variables V0, V1, V2, … and productions
Vi → a1 a2 … am Vj
or
Vi → a1 a2 … am
We construct the NFA M such that each variable Vi corresponds to a node:
V0, V1, V2, …, plus a special final state VF
For each production Vi → a1 a2 … am Vj we add transitions and intermediate nodes: a path from Vi to Vj whose edges are labeled a1, a2, …, am.
For each production Vi → a1 a2 … am the path is built in the same way but ends in the special final state VF.
Example: Convert Right-Linear Grammar to FA by JFLAP
http://www.cs.duke.edu/csed/jflap/tutorial/grammar/toFA/index.html
Example
[Figure: a right-linear grammar over the variables V0 to V4 with terminals a1, …, a9, and the corresponding NFA M with final state VF.]
L(G) = L(M)
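The construction can be sketched in Python as follows. Everything here is an illustrative assumption rather than the slides' own code: productions are encoded as triples (variable, terminal string, next variable or None), λ-moves use the empty string as label, intermediate nodes get generated names, and the example grammar is S → aA | B, A → aaB, B → bB | a.

```python
def grammar_to_nfa(productions, start):
    """Build an NFA from a right-linear grammar.

    productions: list of (head, terminals, tail), where tail is a variable
    or None. Returns (delta, start, final); delta maps (state, symbol) to
    a set of states, and the symbol "" is a λ-move.
    """
    final = "VF"                          # the special final state
    delta, fresh = {}, 0

    def add(p, symbol, r):
        delta.setdefault((p, symbol), set()).add(r)

    for head, terminals, tail in productions:
        target = tail if tail is not None else final
        if not terminals:                 # productions such as S -> B or A -> λ
            add(head, "", target)
            continue
        current = head
        for i, symbol in enumerate(terminals):
            if i == len(terminals) - 1:
                nxt = target
            else:                         # intermediate node on the path
                nxt = f"_{fresh}"
                fresh += 1
            add(current, symbol, nxt)
            current = nxt
    return delta, start, final

def accepts(nfa, word):
    """Simulate the NFA, following λ-moves after every step."""
    delta, start, final = nfa

    def closure(states):
        stack, seen = list(states), set(states)
        while stack:
            q = stack.pop()
            for r in delta.get((q, ""), ()):
                if r not in seen:
                    seen.add(r)
                    stack.append(r)
        return seen

    current = closure({start})
    for ch in word:
        current = closure({r for q in current for r in delta.get((q, ch), ())})
    return final in current

grammar = [("S", "a", "A"), ("S", "", "B"), ("A", "aa", "B"),
           ("B", "b", "B"), ("B", "a", None)]
nfa = grammar_to_nfa(grammar, "S")
print([w for w in ["a", "bba", "aaaa", "aaabba", "aa", "ab"] if accepts(nfa, w)])
```

The accepted strings are those of the form aaab*a or b*a, which is the language of the example grammar.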
The case of Left-Linear Grammars
Let G be a left-linear grammar.
We will prove: L(G) is regular.
Proof idea: We will construct a right-linear grammar G′ with L(G) = L(G′)^R.
Construct the right-linear grammar G′:
In G: A → B a1 a2 … ak, that is, A → Bv
In G′: A → ak … a2 a1 B, that is, A → v^R B
It is easy to see that L(G) = L(G′)^R.
Since G′ is right-linear, we have:
L(G′) is a regular language, so L(G′)^R is a regular language, so L(G) is a regular language.
Proof - Part 2
Languages generated by regular grammars ⊇ Regular languages
Any regular language L is generated by some regular grammar G.
Proof idea
Since L is regular, there is an NFA M such that L = L(M).
Construct from M a regular grammar G such that L(M) = L(G).
Convert M to a right-linear grammar G:
[Figure: NFA M with states q0, q1, q2, q3.]
q0 → a q1
q1 → b q1
q1 → a q2
q2 → b q3
q3 → q1
q3 → λ
L(G) = L(M) = L
Regular Grammars
A regular grammar is any right-linear or left-linear grammar.
Examples
G1:            G2:
S → abS        S → Aab
S → a          A → Aab | B
               B → a
Observation
Regular grammars generate regular languages.
Examples
L(G1) = (ab)*a        L(G2) = aab(ab)*
What is a compiler?
A compiler is a program that translates a source language into an equivalent target language.
C program:
    while (i > 3) {
        a[i] = b[i];
        i ++
    }
compiler does this
assembly program:
    mov eax, ebx
    add eax, 1
    cmp eax, 3
    jcc eax, edx
What is a compiler?
Java program:
    class foo {
        int bar;
        ...
    }
compiler does this
C program:
    struct foo {
        int bar;
        ...
    }
What is a compiler?
Java program:
    class foo {
        int bar;
        ...
    }
compiler does this
Java virtual machine program:
    ........
Phases of a compiler
Source Program
Lexical Analyzer (Scanner)
Tokens
Syntax Analyzer (Parser)
Parse Tree
Semantic Analyzer
Abstract Syntax Tree with attributes
Compilation in a Nutshell
Source code (character stream): if (b == 0) a = b;
Lexical analysis
Token stream: if ( b == 0 ) a = b ;
Parsing
Abstract syntax tree (AST): [Figure: tree with root "if", a condition "==" over b and 0, and an assignment "=" over a and b.]
Semantic Analysis
Decorated AST: [Figure: the same tree annotated with types: boolean for "==", int for b and 0, int lvalue for a.]
Stages of Analysis
Lexical Analysis – breaking the input up into individual words/tokens
Syntax Analysis – parsing the phrase structure of the program
Semantic Analysis – calculating the program's meaning
Lexical Analyzer
Input is a stream of characters
Produces a stream of names, keywords & punctuation marks
Discards white space and comments
Lexical Analyzer
Recognize "tokens" in a program source code.
The tokens can be variable names, reserved words, operators, numbers, … etc.
Each kind of token can be specified as an RE, e.g., a variable name is of the form [A-Za-z][A-Za-z0-9]*.
We can then construct a λ-NFA to recognize it automatically.
Lexical Analyzer
By putting all these λ-NFAs together, we obtain one which can recognize all different kinds of tokens in the input string.
We can then convert this λ-NFA to an NFA and then to a DFA, and implement this DFA as a deterministic program - the lexical analyzer.
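A small tokenizer shows the idea of specifying each token kind as an RE and combining them. This is only a sketch under stated assumptions: it relies on Python's `re` module, which is a backtracking matcher and does not build the DFA described above, and the token names, the operator set and the keyword list are made up for this illustration.

```python
import re

# One regular expression per kind of token (illustrative choices).
TOKEN_SPEC = [
    ("NUMBER", r"[0-9]+"),
    ("NAME",   r"[A-Za-z][A-Za-z0-9]*"),
    ("OP",     r"==|[=;(){}<>+\-*/\[\]]"),
    ("SKIP",   r"\s+"),                   # white space is discarded
]
# Putting all the expressions together into one pattern with named groups.
MASTER = re.compile("|".join(f"(?P<{name}>{rx})" for name, rx in TOKEN_SPEC))
KEYWORDS = {"if", "while"}

def tokenize(text):
    """Turn a character stream into a list of (kind, lexeme) pairs."""
    tokens, pos = [], 0
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character {text[pos]!r} at position {pos}")
        kind, lexeme = match.lastgroup, match.group()
        pos = match.end()
        if kind == "SKIP":
            continue
        if kind == "NAME" and lexeme in KEYWORDS:
            kind = "KEYWORD"              # reserved words look like names
        tokens.append((kind, lexeme))
    return tokens

print(tokenize("if (b == 0) a = b;"))
```

Applied to the source line from the "Compilation in a Nutshell" slide, this yields the token stream if ( b == 0 ) a = b ; with a kind attached to each lexeme.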