Models of Computation I: Finite State Automata
COMP1600 / COMP6260
Dirk Pattinson, Australian National University
Semester 2, 2018
The Story So Far . . .
Logic.
language and proofs to speak about systems precisely
useful to express properties and do proofs
Functional Programs
establish properties of functional programs
main tool: (structural) induction
Imperative Programs.
again: focus on properties of programs
main tool: Hoare Logic
Q. Is there a general notion of computation that encompasses both?
First Shot: Your Laptop
Abstract Characteristics.
can do computation
has memory – a finite amount
has (lots of) internal states
From Laptops to Formal Models
Concrete (your laptop)
realistic (it exists!)
complex
hard to analyse
Abstract (mathematical model)
exists only as a model
simple
easy to analyse
Q. What is a “good” simple model of computation?
should match what really exists (possibly by a long shot)
should be conceptually simple
First Answer: Finite State Automata
Basic Components.
internal states – finitely many
state transitions – triggered by reading input
simplifying assumption: just one output: yes/no
Data.
basic input: strings (what you type in, text/xml file)
characters: drawn from finite set (alphabet)
Example: Java Identifiers
From Oracle’s Java Language Specification.
An identifier is a sequence of one or more characters. The first character must be a valid first character (letter, $, _) in an identifier of the Java programming language, hereafter in this chapter called simply Java. Each subsequent character in the sequence must be a valid nonfirst character (letter, digit, $, _) in a Java identifier.
Graphical Specification
[Diagram: Identifier = one first character (Letter, $ or _) followed by any number of nonfirst characters (Letter, Digit, _ or $)]
Q. Can you “see” a machine that recognises Java identifiers?
Java Identifiers
Example: Main Components
[Diagram: the graphical specification of Identifier from the previous slide]
Data.
drawn from a finite alphabet (Unicode or ASCII)
Control.
“yes” if I can get from the left to the right, “no” otherwise
have states after taking a transition (implicit in diagram)
Computational Problem with yes/no answer:
Is a given sequence of characters a valid Java identifier?
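The yes/no problem above can be sketched as a two-state machine in Python. This is only an illustration under stated assumptions: the alphabet is restricted to ASCII (real Java admits many more Unicode letters), reserved words are ignored, and the names `FIRST`, `REST` and `is_identifier` are made up for this sketch.

```python
import string

# Assumption: ASCII-only alphabet; the Java specification allows more letters.
FIRST = set(string.ascii_letters) | {"$", "_"}   # valid first characters
REST = FIRST | set(string.digits)                # valid nonfirst characters


def is_identifier(text: str) -> bool:
    """Run a two-state machine: 'start' (nothing read yet) and 'inside'."""
    state = "start"
    for ch in text:
        allowed = FIRST if state == "start" else REST
        if ch not in allowed:
            return False          # no transition for this character: answer "no"
        state = "inside"
    return state == "inside"      # 'inside' is the only accepting state


print(is_identifier("_tmp$1"), is_identifier("1abc"))
```

The empty string is rejected because the machine is still in its start state, which matches "one or more characters" in the specification.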
Preview.
This week. Finite Automata
start with simplest model: finite automata
relate to regular languages, non-determinism
conclusion: finite automata “too simple”
Next Week. Pushdown automata
like finite automata, but some more memory
useful for e.g. specifying syntax of programming languages
still “too simple” for general computation
Then. Turing machines
The most widely accepted model of computation
infinite memory
idea: buy another hard disk whenever your computation runs out of memory
limits of what can be computed
Finite State Automata: First Example
The simplest useful abstraction of a “computing machine” consists of:
A fixed, finite set of states
A transition relation over the states
Example: a traffic light FSA has 3 states:
[Diagram: traffic light FSA with the three states G, Y and R]
G names the state in which the light is green.
Y names the state in which the light is yellow.
R names the state in which the light is red.
System designs are often in terms of state machines.
Second Example: Vending Machine
Operation
accept 10c and 20c coins
delivers if it has received at least 40c and selection is made
[Diagram: vending machine FSA with states 0c, 10c, 20c, 30c, 40c and 50c; edges are labelled 10, 20 and select; 40c and 50c are drawn as final states]
Note.
transitions are labelled
new ingredient: final states (doubly circled)
Computation. Sequences of actions (labels) from initial to final state.
Language Examples
Main Idea.
input: a string over a fixed character set
operation: transitions labelled with characters
output: yes if in final state after reading the input
More Generally.
Setup: Fix a finite set of characters (an alphabet)
Problem: A set of strings (called language) that are “valid” or “good”
Task: decide computationally which strings are “good”
Example Languages.
1. A finite set: {a, aa, ab, aaa, aab, aba, abb}
2. Palindromes consisting of bits (0,1):
{0, 1, 00, 11, 010, 101, 000, 111, 0110, ...}
Languages in this sense are called formal languages.
Terminology
Alphabet. A finite set (of symbols). Usually denoted by Σ.
Strings over an alphabet Σ. Finite sequences of characters (elements of Σ); a string can be the empty sequence, written ε. E.g. for Σ = {a, b, c}, ababc is a string over Σ.
Languages over alphabet Σ are just sets of strings over Σ.
Sentences of the language: just another name for the elements (strings) of the language.
Notation:
Σ∗ is the set of all strings over Σ.
Therefore, every language with alphabet Σ is some subset of Σ∗.
Automata
First Model of Computation. Deterministic Finite Automata
solve computational problem: given string s, is s accepted?
Basic Ingredients. (see e.g. traffic light and vending machine example)
The alphabet of a DFA is a finite set of input tokens that an automaton acts on.
a DFA consists of a finite set of states (a primitive notion)
One of the states is the initial state — where the automaton starts
At least one of the states is a final state
A transition function (next state function):
State × Token → State
Recurring Theme
Diagrammatic Notation.
useful for humans
e.g. the transition diagram of the vending machine
Mathematical Notation.
useful for formal manipulation (e.g. proving theorems)
useful for computer implementation
Glue between Diagrams and Maths
both notions convey precisely the same information
crucial: being able to switch back and forth!
Formal Definition of DFA
A Deterministic Finite State Automaton (DFA) consists of five parts:
A = (Σ, S, s0, F, N)
an input alphabet Σ, the set of tokens
a set of states S
an “initial” state s0 ∈ S (we start here)
a set of “final” states F ⊆ S (we hope to finish in one of these)
a transition function N : S × Σ → S
Aside. Having a transition function is what makes the automaton deterministic.
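The five parts of the definition map directly onto a small record type. The following Python sketch is an assumption-laden illustration, not part of the lecture: the class `DFA`, the method `is_well_formed` and the two-state `parity` automaton are hypothetical names chosen here. The check makes the "Aside" concrete: N must give exactly one next state for every (state, token) pair.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class DFA:
    alphabet: frozenset    # Σ
    states: frozenset      # S
    initial: str           # s0
    final: frozenset       # F
    transition: dict       # N, keyed by (state, token)

    def is_well_formed(self) -> bool:
        """Check s0 ∈ S, F ⊆ S, and that N is a total function S × Σ → S."""
        pairs = set(product(self.states, self.alphabet))
        total = all(self.transition.get(p) in self.states for p in pairs)
        no_extra_keys = set(self.transition) <= pairs
        return (self.initial in self.states
                and self.final <= self.states
                and total and no_extra_keys)


# Hypothetical example (not from the slides): tracks the parity of 1s read so far.
parity = DFA(
    alphabet=frozenset("01"),
    states=frozenset({"even", "odd"}),
    initial="even",
    final=frozenset({"even"}),
    transition={("even", "0"): "even", ("even", "1"): "odd",
                ("odd", "0"): "odd", ("odd", "1"): "even"},
)
print(parity.is_well_formed())
```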
Example 1
As a diagram.
[Diagram: DFA with states S0 (initial), S1 and S2 (final); edges labelled 0 and 1 as in the transition table below]
In Mathematical Notation.
Alphabet - {0, 1}
States - {S0, S1, S2}
Initial state - S0
Final states - {S2}
Transition function (as a table) -
      0    1
S0    S1   S0
S1    S1   S2
S2    S1   S0
Aside. The actual names of the states are irrelevant.
Example 1, ctd
Recall. N : S × Σ → S is the transition function.
      0    1
S0    S1   S0
S1    S1   S2
S2    S1   S0
Single Steps of the automaton
N(S0, 0) is the state that the automaton transitions to from state S0 reading letter 0.
Here: N(S0, 0) = S1.
Multiple Steps of the automaton
N(N(S0, 0), 1) is the state of the automaton when starting in S0 and reading first 0, then 1.
Here: N(N(S0, 0), 1) = S2.
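For readers who prefer code to tables, the two lookups above can be reproduced with a dictionary. This is a minimal sketch: the dictionary encoding and the variable names `single` and `double` are choices made here, while the entries are copied from the transition table of Example 1.

```python
# Transition function N of Example 1, keyed by (state, symbol).
N = {
    ("S0", "0"): "S1", ("S0", "1"): "S0",
    ("S1", "0"): "S1", ("S1", "1"): "S2",
    ("S2", "0"): "S1", ("S2", "1"): "S0",
}

single = N[("S0", "0")]              # one step: N(S0, 0)
double = N[(N[("S0", "0")], "1")]    # two steps: N(N(S0, 0), 1)
print(single, double)
```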
Example 2
[Diagram: DFA with states U (initial), V (final), Y and Z (final); edges labelled a, b and c as in the table below]
       a    b    c
→ U    Z    V    Y
* V    V    V    V
  Y    Z    V    Y
* Z    Z    Z    Z
(→ marks the initial state, * marks the final states)
(the table carries the same information as the diagram)
Q. What is the language of this automaton?
Eventual State Function
Revisit example 1:
[Diagram: the DFA of Example 1, with states S0 (initial), S1 and S2 (final)]
Input 0101 takes the DFA from S0 to S2,
Input 1011 takes the DFA from S1 to S0, etc.
A complete list of such possibilities is a function from a given state and a string to an ‘eventual state.’
This is the idea of the Eventual State Function.
Eventual State Function — Definition
Definition. Let A be a DFA with states S, alphabet Σ, and transition function N.
The eventual state function for A is of type
N∗ : S × Σ∗ → S
and is defined inductively by:
N∗(s, ε) = s (N1)
N∗(s, xα) = N∗(N(s, x), α) (N2)
Informally. N∗(s, w) is the state A reaches, starting in state s and reading string w.
For Haskell aficionados:
N∗ = uncurry(foldl(curry N))
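The same definition can be written in Python, once by following (N1) and (N2) literally and once as a left fold in the spirit of the Haskell one-liner. This is a sketch using the transition table of Example 1; the function names `eventual_state` and `eventual_state_fold` are assumptions of this illustration.

```python
from functools import reduce

# Transition function N of Example 1.
N = {
    ("S0", "0"): "S1", ("S0", "1"): "S0",
    ("S1", "0"): "S1", ("S1", "1"): "S2",
    ("S2", "0"): "S1", ("S2", "1"): "S0",
}


def eventual_state(state: str, word: str) -> str:
    """N*(s, w), following the inductive definition."""
    if word == "":                                # (N1): N*(s, ε) = s
        return state
    x, alpha = word[0], word[1:]
    return eventual_state(N[(state, x)], alpha)   # (N2): N*(s, xα) = N*(N(s, x), α)


def eventual_state_fold(state: str, word: str) -> str:
    """The same function as a left fold over the input string."""
    return reduce(lambda s, x: N[(s, x)], word, state)


print(eventual_state("S0", "0101"), eventual_state_fold("S1", "1011"))
```

The two calls reproduce the runs mentioned earlier: input 0101 from S0 and input 1011 from S1.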
An Important (but Unsurprising) Theorem about N∗
Theorem. For all states s ∈ S and for all strings α, β ∈ Σ∗
N∗(s, αβ) = N∗(N∗(s, α), β)
Proof by induction on the length of α.
Base case: α = ε
LHS = N∗(s, εβ) = N∗(s, β)
RHS = N∗(N∗(s, ε), β)
= N∗(s, β) = LHS (by (N1))
Proof ctd: Step case:
Step Case. Show that N∗(s, (xα)β) = N∗(N∗(s, xα), β)
LHS = N∗(s, (xα)β)
= N∗(s, x(αβ))
= N∗(N(s, x), αβ) (by (N2))
= N∗(N∗(N(s, x), α), β) (by IH)
RHS = N∗(N∗(s, xα), β)
= N∗(N∗(N(s, x), α), β) (by (N2))
Corollary — when β is a single token
N∗(s, αy) = N(N∗(s, α), y)
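A proof is the real justification, but the theorem can also be sanity-checked by brute force on a concrete automaton. The sketch below does this for the DFA of Example 1 and all pairs of bitstrings up to a small length; the helper names `words` and `append_theorem_holds` and the bound 4 are arbitrary choices for this illustration.

```python
from itertools import product

# Transition function N of Example 1.
N = {
    ("S0", "0"): "S1", ("S0", "1"): "S0",
    ("S1", "0"): "S1", ("S1", "1"): "S2",
    ("S2", "0"): "S1", ("S2", "1"): "S0",
}
STATES = ["S0", "S1", "S2"]


def eventual_state(state, word):
    """N*(state, word), computed by reading the word left to right."""
    for x in word:
        state = N[(state, x)]
    return state


def words(max_len):
    """All strings over {0, 1} of length at most max_len."""
    for n in range(max_len + 1):
        for letters in product("01", repeat=n):
            yield "".join(letters)


def append_theorem_holds(max_len=4):
    """Check N*(s, αβ) = N*(N*(s, α), β) for all short α and β."""
    return all(
        eventual_state(s, a + b) == eventual_state(eventual_state(s, a), b)
        for s in STATES for a in words(max_len) for b in words(max_len)
    )


print(append_theorem_holds())
```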
Example
[Diagram: the DFA of Example 1, with states S0 (initial), S1 and S2 (final)]
N∗(S1, 1011) = N∗(N(S1, 1), 011)
= N∗(S2, 011)
= N∗(S1, 11)
= N∗(S2, 1)
= N∗(S0, ε)
= S0
Language of an Automaton
Acceptance, Informally. A DFA accepts a string if, starting from the start state, it terminates in one of the final states.
Acceptance, Formally. Let A = (Σ, S, s0, F, N) be a DFA and w be a string in Σ∗.
We say w is accepted by A if
N∗(s0,w) ∈ F
The language accepted by A is the set of all strings accepted by A:
L(A) = {w ∈ Σ∗|N∗(s0,w) ∈ F}
(That is, w ∈ L(A) iff N∗(s0,w) ∈ F .)
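The acceptance test N∗(s0, w) ∈ F translates into a few lines of Python. The sketch below uses the DFA of Example 1; `accepts` and `language_up_to` are names invented for this illustration, and since L(A) is in general infinite only a finite slice of it is listed.

```python
from itertools import product

# The DFA of Example 1: transition function, initial state, final states.
N = {
    ("S0", "0"): "S1", ("S0", "1"): "S0",
    ("S1", "0"): "S1", ("S1", "1"): "S2",
    ("S2", "0"): "S1", ("S2", "1"): "S0",
}
INITIAL, FINAL = "S0", {"S2"}


def accepts(word: str) -> bool:
    """w is accepted iff N*(s0, w) ∈ F."""
    state = INITIAL
    for x in word:
        state = N[(state, x)]
    return state in FINAL


def language_up_to(max_len: int) -> list:
    """The accepted words of length at most max_len, shortest first."""
    return [
        "".join(w)
        for n in range(max_len + 1)
        for w in product("01", repeat=n)
        if accepts("".join(w))
    ]


print(language_up_to(3))
```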
Example 1 again
[Diagram: automaton A1, the DFA of Example 1, with states S0 (initial), S1 and S2 (final)]
Q. Which strings are accepted?
e.g. 0011101 takes the machine from state S0 through states S1, S1, S2, S0, S0, S1 to S2 (a final state).
N∗(S0, 0011101) = N∗(S1, 011101) = N∗(S1, 11101) = . . . = N∗(S1, 1) = S2
others: 01, 001, 101, 0001, 0101, 00101101 . . .
Example 1 (ctd.)
[Diagram: automaton A1, the DFA of Example 1, with states S0 (initial), S1 and S2 (final)]
Accepted Strings.
01, 001, 101, 0001, 0101, 00101101 . . .
Strings that are not accepted.
ε, 0, 1, 00, 10, 11, 100 . . .
Q. What do the accepted strings have in common? How do we justify this?
Proving an Acceptance Predicate — in General
Our Claim. The automaton A accepts precisely the strings that are elements of the language L = {w ∈ Σ∗ | P(w)}.
(P is sometimes called an acceptance predicate.)
Proof Obligations.
1. Show that any string satisfying P is accepted by A.
2. Show any string accepted by A satisfies P.
Proving an Acceptance Predicate for A1
Proof obligation 1:
If a string ends in 01, then it is accepted by A1. That is:
For all α ∈ Σ∗, N∗(S0, α01) ∈ F
Proof obligation 2:
If a string is accepted by A1, then it ends in 01. That is:
For all w ∈ Σ∗, if N∗(S0, w) ∈ F then ∃α ∈ Σ∗. w = α01
Part 1: ∀α ∈ Σ∗, N∗(S0, α01) ∈ F
Lemma: ∀s ∈ S. N∗(s, 01) = S2
Proof by cases:
N∗(S0, 01) = N∗(S1, 1) = S2
N∗(S1, 01) = N∗(S1, 1) = S2
N∗(S2, 01) = N∗(S1, 1) = S2
So, by the “append” theorem above,
N∗(S0, α01) = N∗(N∗(S0, α), 01) = S2 ∈ F □
Part 2: N∗(S0,w) = S2 =⇒ ∃α. w = α01
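Proof. Suppose N∗(S0, w) = S2. Since N∗(S0, ε) = S0, N∗(S0, 0) = S1 and N∗(S0, 1) = S0, w has length at least 2, so w = αxy for some string α and tokens x, y.
By the corollary to the append theorem (case of a single token):
N(N∗(S0, αx), y) = S2
By the definition of N, y must be 1 and N∗(S0, αx) must be S1.
Similarly,
N(N∗(S0, α), x) = S1
and x is 0, again by the definition of N. Hence w = α01.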
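Both proof obligations can be cross-checked mechanically on short inputs. The following sketch is a finite test rather than a proof: it confirms the lemma N∗(s, 01) = S2 for every state and compares acceptance with the predicate "ends in 01" on all bitstrings up to a chosen length. The names `predicate`, `lemma_holds` and `agree_up_to` are assumptions of this illustration.

```python
from itertools import product

# Transition function N of A1 (Example 1); S0 is initial, S2 is final.
N = {
    ("S0", "0"): "S1", ("S0", "1"): "S0",
    ("S1", "0"): "S1", ("S1", "1"): "S2",
    ("S2", "0"): "S1", ("S2", "1"): "S0",
}


def eventual_state(state, word):
    for x in word:
        state = N[(state, x)]
    return state


def accepts(word):
    return eventual_state("S0", word) == "S2"


def predicate(word):
    """The acceptance predicate P: the string ends in 01."""
    return word.endswith("01")


def lemma_holds():
    """For every state s, N*(s, 01) = S2."""
    return all(eventual_state(s, "01") == "S2" for s in ("S0", "S1", "S2"))


def agree_up_to(max_len):
    """Acceptance and P coincide on every bitstring of length <= max_len."""
    return all(
        accepts("".join(w)) == predicate("".join(w))
        for n in range(max_len + 1)
        for w in product("01", repeat=n)
    )


print(lemma_holds(), agree_up_to(8))
```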
Another Example
What language does this DFA accept?
[Diagram: DFA SOB with states S0 (initial), S1 (final) and S2; every state has a 0-loop, 1-edges lead from S0 to S1 and from S1 to S2, and S2 also has a 1-loop]
Answer for SOB
SOB accepts the language of bitstrings containing exactly one 1-bit.
Proof obligations:
Show that if a bitstring contains exactly one 1-bit then it is accepted by SOB.
Show that if a string is accepted by SOB it contains exactly one 1-bit.
[Diagram: the DFA SOB from the previous slide]
Mapping to Mathematics
Expressed mathematically, the main conclusion is
L(SOB) = {w ∈ Σ∗ | w = 0^n 1 0^m for some n, m ≥ 0}
The two subgoals are
1. If w = 0^n 1 0^m then N∗(S0, w) = S1
2. If N∗(S0, w) = S1 then w = 0^n 1 0^m.
For this DFA the phrase “w is accepted by SOB” is captured by the expression N∗(S0, w) = S1.
Proving these subgoals
The first subgoal follows immediately from the following two lemmas,which are easily proved by induction:
∀n ≥ 0. N∗(S0, 0^n) = S0
∀n ≥ 0. N∗(S1, 0^n) = S1
Therefore
N∗(S0, 0^n 1 0^m) = N∗(N∗(S0, 0^n), 1 0^m) = N∗(S0, 1 0^m)
= N∗(N(S0, 1), 0^m) = N∗(S1, 0^m) = S1
The second subgoal, stated more formally as
∀w. N∗(S0, w) = S1 =⇒ ∃n, m ≥ 0. w = 0^n 1 0^m
can be proved in a similar fashion to Example 1 on earlier slides.
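The claim about SOB can also be tested exhaustively on short bitstrings. In the sketch below the transition table is an assumed reading of the SOB diagram (0-loops on every state, 1-edges S0 to S1 to S2, and S2 as a trap state), chosen to agree with N(S0, 1) = S1 and the two lemmas above; the names `accepts_sob` and `agree_up_to` are made up for this illustration.

```python
import re
from itertools import product

# Assumed transition table for SOB, read off the diagram; S1 is the only final state.
SOB = {
    ("S0", "0"): "S0", ("S0", "1"): "S1",
    ("S1", "0"): "S1", ("S1", "1"): "S2",
    ("S2", "0"): "S2", ("S2", "1"): "S2",
}


def accepts_sob(word: str) -> bool:
    """w is accepted iff N*(S0, w) = S1."""
    state = "S0"
    for x in word:
        state = SOB[(state, x)]
    return state == "S1"


def agree_up_to(max_len: int) -> bool:
    """Acceptance, 'exactly one 1-bit' and the shape 0^n 1 0^m all coincide."""
    for n in range(max_len + 1):
        for letters in product("01", repeat=n):
            w = "".join(letters)
            one_bit = w.count("1") == 1
            shape = re.fullmatch(r"0*10*", w) is not None
            if not (accepts_sob(w) == one_bit == shape):
                return False
    return True


print(agree_up_to(8))
```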
Limitations of FSAs
Q. Is an FSA a “good” model of computation?
Suppose we have a program P that always terminates
and outputs “yes” or “no” for every input string
Is there an FSA that accepts precisely the strings for which P says “yes”?
Technical Analysis. Properties of languages accepted by a DFA.
A very important example: L = {a^n b^n | n ∈ N}
L = {ε, ab, aabb, aaabbb, a^4 b^4, a^5 b^5, ...}
Claim. There is no FSA that recognises this language.
(because an FSA’s memory is limited.)
Q. Given the claim above, are FSAs realistic models of computation?
Proof of Claim
Proof by contradiction.
Suppose A is an FSA that accepts L. That is, L = L(A).
Then each of the following is a state of A:
N∗(S0, a), N∗(S0, a^2), N∗(S0, a^3), . . .
But A only has finitely many states, so some state must repeat:
there are distinct i and j such that N∗(S0, a^i) = N∗(S0, a^j).
That is, the automaton cannot tell a^i and a^j apart.
Proof by contradiction (ctd)
Since a^i b^i is accepted, we know
N∗(S0, a^i b^i) ∈ F
By the append theorem
N∗(N∗(S0, a^i), b^i) = N∗(S0, a^i b^i) ∈ F
Now, since N∗(S0, a^i) = N∗(S0, a^j),
N∗(N∗(S0, a^j), b^i) = N∗(S0, a^j b^i) ∈ F
So a^j b^i is accepted by A. But i ≠ j, so a^j b^i is not in L, contradicting the initial assumption.
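The argument is constructive enough to run: for any concrete DFA over {a, b} one can search for the repeating state and exhibit the two inputs it cannot tell apart. The sketch below does this for a hypothetical three-state automaton (`candidate`, not taken from the slides); `find_collision` and `run` are names chosen for this illustration.

```python
def find_collision(transition, initial):
    """Return (i, j) with i < j and N*(s0, a^i) == N*(s0, a^j)."""
    seen = {}                      # state -> number of a's read when first reached
    state, count = initial, 0
    while state not in seen:       # must stop: there are only finitely many states
        seen[state] = count
        state = transition[(state, "a")]
        count += 1
    return seen[state], count


def run(transition, initial, word):
    state = initial
    for x in word:
        state = transition[(state, x)]
    return state


# Hypothetical DFA over {a, b}; any other total transition table would do.
candidate = {
    ("q0", "a"): "q1", ("q0", "b"): "q2",
    ("q1", "a"): "q1", ("q1", "b"): "q0",
    ("q2", "a"): "q2", ("q2", "b"): "q2",
}

i, j = find_collision(candidate, "q0")
same_end = run(candidate, "q0", "a" * i + "b" * i) == run(candidate, "q0", "a" * j + "b" * i)
print(i, j, same_end)
```

Whatever the set of final states is, the automaton gives the same answer on a^i b^i and a^j b^i, so it cannot accept exactly the strings a^n b^n.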
Pigeon-Hole Principle
The proof used the pigeon-hole principle:
No function from one set to a smaller finite set can be one-to-one.
[Diagram: a set of four points mapped into a set of three points]
(Finiteness is not really necessary: no function from one set to another with smaller cardinality can be one-to-one.)
“You cannot fit n + 1 pigeons into n holes”
Equivalence of Automata
Two automata are said to be equivalent if they accept the same language.
Example.
[Diagram: automaton A4 with states S0 (initial, final), S1, S2 (final) and S3; automaton A5 with states S0 (initial, final) and S1; all edges are labelled 0 or 1]
Q. Can FSAs be simplified? Is there an equivalent FSA with fewer states?
Equivalence of States
Two states Sj and Sk of an FSA are equivalent if, for all input strings w
N∗(Sj ,w) ∈ F if and only if N∗(Sk ,w) ∈ F
Example. In A4, S2 is equivalent to S0 and S1 is equivalent to S3.
[Diagram: automaton A4 with states S0 (initial, final), S1, S2 (final) and S3]
Elimination of Equivalent States
Assumptions.
A = (Σ, S, S0, F, N) is an FSA
Sk and Sj are equivalent states of A
Sk ≠ S0 (don’t eliminate the initial state!)
Elimination of Sk from A: new automaton A′ = (Σ, S′, S0, F′, N′)
S′ is S without Sk
F′ is F without Sk
N′(s, x) = (if N(s, x) = Sk then Sj else N(s, x))
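The elimination step is a small transformation on the transition table. In the sketch below, the function `eliminate` is a name chosen here, and the table `A4` is an assumed reading of the A4 diagram (a cycle S0, S1, S2, S3 on reading 1, with 0-loops everywhere), picked because eliminating S2 in favour of S0 then yields the transition table given for A6.

```python
def eliminate(states, final, transition, keep, drop):
    """Remove state `drop`, redirecting every edge into it to the equivalent state `keep`."""
    new_states = states - {drop}
    new_final = final - {drop}
    new_transition = {
        (s, x): (keep if target == drop else target)
        for (s, x), target in transition.items()
        if s != drop                       # edges leaving the dropped state disappear
    }
    return new_states, new_final, new_transition


# Assumed transition table for A4 (the diagram did not survive extraction).
A4 = {
    ("S0", "0"): "S0", ("S0", "1"): "S1",
    ("S1", "0"): "S1", ("S1", "1"): "S2",
    ("S2", "0"): "S2", ("S2", "1"): "S3",
    ("S3", "0"): "S3", ("S3", "1"): "S0",
}

states6, final6, A6 = eliminate({"S0", "S1", "S2", "S3"}, {"S0", "S2"}, A4,
                                keep="S0", drop="S2")
print(sorted(states6), sorted(final6), A6[("S1", "1")])
```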
Example
Since S2 ≡ S0 in A4, let’s eliminate S2.
New set of states is {S0, S1, S3}
New set of final states is {S0}
New transition function is:
      0    1
S0    S0   S1
S1    S1   S0
S3    S3   S0
[Diagram: automaton A6 with states S0 (initial, final), S1 and S3, with the transitions of the table above]
FSA Minimisation
Elimination of equivalent states.
if two states are equivalent, one can be eliminated
Elimination of Unreachable States
if a state cannot be reached from the initial state then it can also be eliminated.
Example. In A6, S3 is not reachable from S0.
[Diagram: automaton A6 with states S0 (initial, final), S1 and S3; no edge leads into S3 from S0 or S1]
The Standard Minimisation Algorithm
Main Idea.
aggregate states into groups (of possibly equivalent states)
initially, all states are possibly equivalent
split a group of possibly equivalent states if we have evidence that they are not equivalent:
    a non-final state is never equivalent to a final state
    two states are non-equivalent if the transition function takes them into different groups (with the same letter)
repeat until no more groups can be split.
Realisation.
The working data structure for the algorithm is a list of lists(“groups”) of states
On each iteration, we test one of the groups with a symbol from the alphabet.
If we notice differing behaviour, we split the group.
The Algorithm Details
Input: A list containing two “groups” (a group is represented as a list of states). One group consists of the final states and the other consists of the non-final states.
Data: The working data structure, WDS : [[State]], is a list of groups of states. When two states are in different groups, we know they are not equivalent.
Loop: Pick a group {s1, ..., sj} and a symbol x.
    If the states {N(si, x) | i = 1, . . . , j} are all in the same group, then the group {s1, ..., sj} is not split.
    If the states {N(si, x) | i = 1, . . . , j} belong to different groups of WDS, then the group {s1, ..., sj} should be split accordingly.
Continue until we cannot, by any choice of letter, split any group.
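The loop above can be sketched in Python with a list of lists as the working data structure. This is one straightforward reading of the algorithm rather than an optimised implementation; the function name `minimise_groups` is an assumption, the first test input is the DFA of Example 1, and the second is a four-state table assumed to match A4 (1s counted modulo 4, final states S0 and S2).

```python
def minimise_groups(states, alphabet, final, transition):
    """Refine [[final states], [non-final states]] until no group can be split."""
    groups = [g for g in (sorted(final), sorted(set(states) - set(final))) if g]
    changed = True
    while changed:
        changed = False
        for group in groups:
            for x in alphabet:
                # Bucket the group's states by the index of the group N(s, x) lands in.
                buckets = {}
                for s in group:
                    target = transition[(s, x)]
                    index = next(i for i, g in enumerate(groups) if target in g)
                    buckets.setdefault(index, []).append(s)
                if len(buckets) > 1:          # differing behaviour: split the group
                    groups.remove(group)
                    groups.extend(buckets.values())
                    changed = True
                    break
            if changed:
                break                         # restart with the updated list of groups
    return groups


EXAMPLE1 = {
    ("S0", "0"): "S1", ("S0", "1"): "S0",
    ("S1", "0"): "S1", ("S1", "1"): "S2",
    ("S2", "0"): "S1", ("S2", "1"): "S0",
}
# Assumed table for A4: 0-loops everywhere, 1 steps around S0 -> S1 -> S2 -> S3 -> S0.
A4 = {
    ("S0", "0"): "S0", ("S0", "1"): "S1",
    ("S1", "0"): "S1", ("S1", "1"): "S2",
    ("S2", "0"): "S2", ("S2", "1"): "S3",
    ("S3", "0"): "S3", ("S3", "1"): "S0",
}

print(minimise_groups(["S0", "S1", "S2"], "01", {"S2"}, EXAMPLE1))
print(minimise_groups(["S0", "S1", "S2", "S3"], "01", {"S0", "S2"}, A4))
```

Each surviving group becomes one state of the minimised automaton.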
Our Previous Example
Our running example is trivial: the initial split is already the final one.
[Diagram: automaton A with states S0 (initial, final), S1, S2 (final) and S3]
Testing either group with 0 or with 1 never splits it, so the working data structure stays [[s0, s2], [s1, s3]] throughout.
[Diagram: minimised automaton A′ with states Sa (initial, final) and Sb]
Non-Deterministic Finite State Automata — NFAs
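Consider this FSA:
[Diagram: FSA with states S0 (initial), S1, S2 and S3 (final); S0 has an a-loop and an a-edge to S1, S1 has a b-loop and a b-edge to S2, S2 has a c-loop and a c-edge to S3]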
Q. Is it intuitively clear what it does?
Q. Is it a DFA in the sense of our definition?
Is it legal, i.e. a “proper” DFA?
[Diagram: the same FSA as on the previous slide]
A. It makes sense, but it is nondeterministic: a nondeterministic finite automaton (NFA). So it is not a “legal” DFA, but a specimen of a different breed.
Differences to deterministic automata
Multiple edges with the same label come out of states
For some states, there is not an edge for every token
Formally. NFAs have a transition relation rather than a transition function.
transition relation R(s1, x, s2) holds if there’s an x-labelled edge from s1 to s2
there can be no x-labelled edge between s1 and any state
there can be many states s2, s3, . . . that are connected to s1 via an x-labelled edge.
Is it clear what it does?
[Diagram: the same FSA as on the previous two slides]
Observations.
Some states don’t have an outgoing edge with a certain letter, so the NFA can “get stuck”.
In some states, there’s more than one possible successor state with a certain letter.
Acceptance condition for NFAs given string α:
can get from initial to final state, making the “right” choice of successor state
without getting stuck
Example. α = aaabcc
need to “look ahead” to make the right choice
(alternatively, try to backtrack if wrong choice has been made)
DFAs vs NFAs
Key Differences.
For each state in a DFA and for each input symbol, there is a unique successor state.
DFAs have a transition function.
NFAs allow zero, one or more transitions from a state for the same input symbol.
NFAs have a transition relation.
An input sequence a1, a2, . . . , an is accepted by an NFA if there exists some sequence of transitions that leads from the initial state to a final state.
Why NFAs?
Example. NFAs are simpler.
An NFA recognizing strings of letters ending in “man” (Σ is the Latin alphabet):
[Diagram: NFA with states S0 (initial), S1, S2 and S3 (final); S0 has a loop, and the edges S0 → S1, S1 → S2 and S2 → S3 are labelled m, a and n]
An Equivalent DFA
Example. DFAs are (often) more complex.
A DFA that recognises strings of letters that end in “man”.
[Diagram: DFA with states S0 (initial), S1, S2 and S3 (final); forward edges labelled m, a and n, plus further edges labelled m, Σ-{m}, Σ-{a,m} and Σ-{m,n} that lead back to earlier states]
NFAs: Formal Definition
A Nondeterministic Finite State Automaton (NFA) consists of five parts:
A = (Σ, S, s0, F, R)
an input alphabet Σ, the set of tokens
a set of states S
an “initial” state s0 ∈ S (we start here)
a set of “final” states F ⊆ S (we hope to finish in one of these)
a transition relation R ⊆ S × Σ× S .
Aside. The transition relation is what makes the automaton nondeterministic.
Eventual State Relation for NFAs
Basic Idea. The eventual state relation R∗(s, w, s′) is true if s′ is a state the NFA can reach, starting in state s and reading string w.
Formal Definition. The eventual state relation has type
R∗ ⊆ S × Σ∗ × S
or R∗ : S × Σ∗ × S → Bool
and is defined inductively as follows:
R∗(s, ε, s′) holds if and only if s′ = s
R∗(s, xα, s′) holds if and only if ∃s′′. R(s, x, s′′) ∧ R∗(s′′, α, s′)
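The inductive definition can be turned into a recursive Python function almost word for word. The sketch below is an illustration only: the relation `R` is an assumed reading of the a/b/c automaton shown earlier (loops and forward edges labelled a, b, c; S3 final), and the names `eventual` and `accepts` are chosen here. The existential quantifier becomes `any`, which amounts to trying every possible choice of successor.

```python
# Transition relation of the a/b/c NFA as a set of triples (source, symbol, target).
R = {
    ("S0", "a", "S0"), ("S0", "a", "S1"),
    ("S1", "b", "S1"), ("S1", "b", "S2"),
    ("S2", "c", "S2"), ("S2", "c", "S3"),
}
INITIAL, FINAL = "S0", {"S3"}


def eventual(s: str, word: str, target: str) -> bool:
    """R*(s, w, s'): can the NFA get from s to s' by reading w?"""
    if word == "":
        return s == target                         # R*(s, ε, s') iff s' = s
    x, alpha = word[0], word[1:]
    return any(                                    # ∃s''. R(s, x, s'') ∧ R*(s'', α, s')
        eventual(mid, alpha, target)
        for (src, sym, mid) in R
        if src == s and sym == x
    )


def accepts(word: str) -> bool:
    """w is accepted iff R*(s0, w, s) for some final state s."""
    return any(eventual(INITIAL, word, f) for f in FINAL)


print(accepts("aaabcc"), accepts("ab"))
```

When a state has no edge for the next symbol, the generator is empty and `any` returns False, which is the "getting stuck" case.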
An Important (but Unsurprising) Theorem about R∗
For all states s, s ′ and for all strings α, β ∈ Σ∗
R∗(s, αβ, s ′) if and only if ∃s ′′. R∗(s, α, s ′′) ∧ R∗(s ′′, β, s ′)
The proof is similar to the corresponding result for N∗ in DFAs.
Language of a NFA
Let A = (Σ,S , s0,F ,R) be a NFA.
Definition. A string w is accepted by A if
∃s ∈ F . R∗(s0,w , s)
The language accepted by A is the set of all strings accepted by A
L(A) = {w ∈ Σ∗ | ∃s ∈ F . R∗(s0,w , s)}
Informally. That is, w ∈ L(A) iff there exists a path through the diagram for A, from s0 to a final state s (s ∈ F), such that the symbols on the path match the symbols in w.
Power of Nondeterminism?
Q. Is there a language that is accepted by an NFA for which we cannot find a DFA that (also) accepts it?
it seems easier to construct NFAs
but in examples, DFAs did also exist
A. A simple “no”.
Theorem. If language L is accepted by an NFA, then there is some DFA which accepts the same language.
Moreover, this DFA can be computed using an algorithm,
just like the minimal automaton can be computed using state equivalence
Drawback. The resulting DFA may have exponentially many states
It has to record the set of states that the NFA could be in.
Constructing the Equivalent DFA from an NFA
Assumption. We have an NFA with state set {q0, . . . , qn}.
Basic Idea.
consider all possible runs of the NFA in parallel
as a consequence, the NFA can be in a set of states
Construction.
A state of the DFA is a set of states of the NFA
e.g. {q3, q7} or ∅
signifies the states that the NFA can be in after reading some input
transition function: records possible next states
e.g. from {q3, q7} with letter x, take the union of the transitions (with x) from q3 and q7
final states are state sets that contain a final state.
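The construction can be sketched in Python with frozensets as DFA states. Two assumptions are worth flagging: the sketch builds only the subsets reachable from {s0} (a common refinement; the construction as described allows all subsets), and the relation `R` is an assumed reading of the a/b/c NFA shown earlier. The names `subset_construction` and `dfa_accepts` are chosen for this illustration.

```python
def subset_construction(alphabet, relation, initial, final):
    """Build the reachable part of the DFA whose states are sets of NFA states."""
    start = frozenset({initial})
    transition, todo, seen = {}, [start], {start}
    while todo:
        current = todo.pop()
        for x in alphabet:
            # Union of the x-transitions from every NFA state in the current set.
            nxt = frozenset(t for (s, sym, t) in relation if s in current and sym == x)
            transition[(current, x)] = nxt
            if nxt not in seen:
                seen.add(nxt)
                todo.append(nxt)
    dfa_final = {q for q in seen if q & final}    # sets containing a final NFA state
    return seen, start, dfa_final, transition


R = {
    ("S0", "a", "S0"), ("S0", "a", "S1"),
    ("S1", "b", "S1"), ("S1", "b", "S2"),
    ("S2", "c", "S2"), ("S2", "c", "S3"),
}
dfa_states, dfa_start, dfa_final, dfa_transition = subset_construction("abc", R, "S0", {"S3"})


def dfa_accepts(word: str) -> bool:
    state = dfa_start
    for x in word:
        state = dfa_transition[(state, x)]
    return state in dfa_final


print(len(dfa_states), dfa_accepts("aaabcc"), dfa_accepts("abca"))
```

The empty set shows up as an ordinary DFA state: it is the trap state reached once the NFA would have been stuck on every run.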
Regular Expressions
Challenge. Understand the computational power of DFAs / NFAs.
Approach. Characterise the languages that can be accepted by an NFA in a different form.
One Characterisation. Regular expressions (cf. Perl, Ruby, grep)
Basic Operators used to construct new expressions from old:
vertical bar (pipe): choose either the left or right expression
Kleene star: repeat strings from an expression
ε, the empty string, and every letter of the alphabet
concatenation, for sequencing expressions
parentheses, for grouping
Example.
a∗ indicates 0 or more as.
yes | no is the language with just the 2 given strings.
(0 | 1)∗ indicates the set of binary numerals.
Regular Expressions — More Examples
0|(1(0|1)∗) is the set of binary numerals with no leading zeros.
(a | b)∗c(a | b)∗ is the set of strings over {a, b, c} with just one c.
(0∗10∗10∗)∗ is the language of bit-strings that have an even numberof ones. (Alternatively 0∗(10∗10∗)∗)
(z∗(x∗ | y∗)z)∗ is the set of strings over {x, y, z} with no x and y adjacent.
1 | (0 (ε | (.(0 | 1)∗1))) is the set of binary fractional numerals between 0 and 1 with no trailing zeroes (e.g. 0.1, 0.110011 but not .1 or 0.10).
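Several of these expressions can be tried out with Python's `re` module, whose syntax is a superset of the operators used here. This is a sketch under a few assumptions: `re.fullmatch` is used so that the whole string must match, the optional part `ε | ...` is written with `?`, the literal dot is escaped, and the constant names are invented for this illustration.

```python
import re

NO_LEADING_ZEROS = r"0|(1(0|1)*)"        # binary numerals with no leading zeros
ONE_C = r"(a|b)*c(a|b)*"                 # strings over {a, b, c} with just one c
EVEN_ONES = r"0*(10*10*)*"               # bit-strings with an even number of ones
FRACTION = r"1|0(\.(0|1)*1)?"            # 1 | (0 (ε | (.(0 | 1)*1)))


def matches(pattern: str, text: str) -> bool:
    """True iff the entire text belongs to the language of the pattern."""
    return re.fullmatch(pattern, text) is not None


print(matches(NO_LEADING_ZEROS, "101"), matches(NO_LEADING_ZEROS, "01"))
print(matches(FRACTION, "0.110011"), matches(FRACTION, "0.10"))
```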
The Definition of Regular Expressions
Key Concept.
regular expressions are purely syntactical – just like formulae
but: every expression denotes a set of strings – this is the meaning.
Definition. The regular expressions over alphabet Σ and the sets that they denote are:
∅ is a regular expression and denotes the empty set ∅
ε is a regular expression and denotes the set {ε}
for each a ∈ Σ, a is a regular expression and denotes the set {a}
If α and β are regular expressions denoting languages R and S respectively, then:
α | β denotes R ∪ S
αβ denotes RS, which is {xy | x ∈ R ∧ y ∈ S}
α∗ denotes R∗, i.e. the set of all concatenations of finitely many strings r_i ∈ R
R∗ is (inductively) defined as {ε} ∪ RR∗
Regular Expressions and FSAs
Key Insight.
Regular expressions and NFAs / DFAs are equivalent.
for every DFA A, have regular expression r with L(A) = L(r)
for every regular expression r , have DFA A with L(r) = L(A)
so the “power” of NFAs / DFAs is completely described by regular expressions.
Q. Can we “compute” more than what can be described by regular expressions?
From Regular Expressions to NFAs
Extra Ingredient: Spontaneous transitions
NFAs that may change state without consuming a symbol.
NFAs of this kind are called NFAs with ε-transitions
can convert NFAs with ε-transitions to (standard) NFAs (so no more expressive power; we don’t cover this translation).
Formal Definition. An NFA with ε-transitions is an NFA, but the transition relation has the form
R ⊆ S × (Σ ∪ {ε}) × S
cf. NFAs with transition relation R ⊆ S × Σ × S
R(s, ε, s′) signifies a spontaneous transition (without reading an input symbol)
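One way to run an NFA with ε-transitions is to close the current set of states under spontaneous moves after every step. The sketch below illustrates this idea under stated assumptions: ε is represented by Python's `None`, the three-state automaton `R` (intended to accept any number of a's followed by one b) is a hypothetical example, and `epsilon_closure` and `accepts` are names chosen here.

```python
def epsilon_closure(states, relation):
    """All states reachable from `states` using only ε-transitions (label None)."""
    closure, todo = set(states), list(states)
    while todo:
        s = todo.pop()
        for (src, sym, dst) in relation:
            if src == s and sym is None and dst not in closure:
                closure.add(dst)
                todo.append(dst)
    return closure


def accepts(word, relation, initial, final):
    current = epsilon_closure({initial}, relation)
    for x in word:
        step = {dst for (src, sym, dst) in relation if src in current and sym == x}
        current = epsilon_closure(step, relation)   # spontaneous moves after reading x
    return bool(current & final)


# Hypothetical NFA with one ε-transition: q0 loops on a, moves silently to q1, then reads b.
R = {("q0", "a", "q0"), ("q0", None, "q1"), ("q1", "b", "q2")}

print(accepts("aab", R, "q0", {"q2"}), accepts("ba", R, "q0", {"q2"}))
```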
Regular Expressions to NFAs
Key Insight.
regular expressions are an inductively defined structure
e.g. representable by an inductive data type in Haskell
as a consequence, we can give an inductive definition of the corresponding automaton
Construction. (start state on left, final state on right)
When the regular expression is a symbol a of the alphabet (language is {a}) the automaton is
[Diagram: start state with a single a-labelled edge to the final state]
When the regular expression is ε (language is {ε}) the automaton is
[Diagram: start state with a single ε-labelled edge to the final state]
When the regular expression is ∅ (language is ∅) the automaton has no edges
Regular Expressions to NFAs, ctd
Suppose the NFA corresponding to some R is:
[Diagram: NFA labelled R]
Then NFAs corresponding to composite regular expressions are defined as follows:
[Diagrams: NFAs for R1 | R2, R1 R2 and R*, each built from the NFAs for R1, R2 or R joined by ε-transitions]
Example
Given the regular expression for binary numerals without leading zeros, (0 | 1(0|1)∗), the above algorithm gives this NFA.
[Diagram: the resulting NFA, with edges labelled 0, 1 and ε]
Summary.
Starting Point. Finite Automata
motivated by computers having finite memory (only)
solving simple problems: is string s accepted?
Limitations of Finite Automata
e.g. cannot recognise L = {a^n b^n | n ≥ 0}
Characterisation of expressive power
can go back and forth between automata and regular expressions
Q. Are finite automata a “good” model of computation?
if yes, why?
if not, why not? What is missing?
Literature.
Introduction to Automata Theory, Languages, and Computation by Hopcroft, Motwani, and Ullman.
A classic text that has been re-worked from a standard textbook.
Introduction To The Theory Of Computation by Michael Sipser
The part on Automata and Languages covers (more than) what wehave discussed here.