lexical analysis2 modefied

Upload: justaperson3157

Post on 15-Apr-2018


  • 8/6/2019 Lexical Analysis2 Modefied

    1/33

    Lexical Analysis

    Course 338 - Compilers

    Techniques and Tools

    Chapter 3

    Lexical Analysis

    Lexical analysis recognizes the vocabulary of the programming language and transforms a string of characters into a string of words or tokens

    Lexical analysis discards white spaces and comments between the tokens

    A lexical analyzer (or scanner) is the program that performs lexical analysis

    Contents

    Scanners

    Tokens

    Regular expressions

    Finite automata

    Scanners

    [Figure: the scanner reads characters and returns a token to the parser each time the parser asks for the next token; the scanner and the parser both use the symbol table]

    Tokens

    A token is a sequence of characters that can be treated as a unit in the grammar of a programming language

    A programming language classifies tokens into a finite set of token types

    Type     Examples
    ID       foo  i  n
    NUM      73  13
    IF       if
    COMMA    ,

    Semantic Values of Tokens

    Semantic values are used to distinguish different tokens in a token type

    < ID, foo >, < ID, i >, < ID, n >
    < NUM, 73 >, < NUM, 13 >
    < IF, >
    < COMMA, >

    Token types affect syntax analysis and semantic values affect semantic analysis
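The pairs above can be modelled as a small record type. The following is only a sketch: the class name `Token` and its field names are assumptions made for illustration, not something the slides define.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Token:
    """A token as a <type, semantic value> pair (hypothetical names)."""
    type: str                       # token type, e.g. "ID" or "NUM"
    value: Optional[object] = None  # semantic value; None for IF, COMMA


# The example tokens from the slide.
tokens = [
    Token("ID", "foo"), Token("ID", "i"), Token("ID", "n"),
    Token("NUM", 73), Token("NUM", 13),
    Token("IF"), Token("COMMA"),
]

# Same token type, different semantic values: a parser would treat these
# alike, while semantic analysis can tell them apart.
same_type = tokens[0].type == tokens[1].type
different_tokens = tokens[0] != tokens[1]
```

A parser built on this would typically inspect only `type`, leaving `value` for later phases.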

    Scanner Generators

    [Figure: a scanner generator takes a scanner definition written in a metalanguage and produces a scanner; the scanner takes a program in a programming language and produces token types and semantic values]

    Languages

    A language is a set of strings

    A string is a finite sequence of symbols taken from a finite alphabet

    The C language is the (infinite) set of all strings that constitute legal C programs

    The language of C reserved words is the (finite) set of all alphabetic strings that cannot be used as identifiers in C programs

    Each token type is a language

    Regular Expressions (RE)

    Language allows us to use finite descriptions to specify (possibly infinite) sets

    RE is the metalanguage used to define the token types of a programming language

    Regular Expressions

    ε is a RE denoting L = {ε}

    If a ∈ alphabet, then a is a RE denoting L = {a}

    Suppose r and s are REs denoting L(r) and L(s)

    alternation: (r) | (s) is a RE denoting L(r) ∪ L(s)

    concatenation: (r)(s) is a RE denoting L(r)L(s)

    repetition: (r)* is a RE denoting (L(r))*

    (r) is a RE denoting L(r)
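The three operations can be tried out directly on sets of strings. This is a minimal sketch, assuming languages are represented as Python sets and that strings longer than `max_len` are dropped so that repetition stays finite; the function names are illustrative.

```python
def union(l1, l2):
    """Alternation: L(r) ∪ L(s)."""
    return l1 | l2


def concat(l1, l2, max_len):
    """Concatenation: L(r)L(s), keeping strings of at most max_len symbols."""
    return {x + y for x in l1 for y in l2 if len(x + y) <= max_len}


def star(lang, max_len):
    """Repetition: (L(r))*, truncated to strings of at most max_len symbols."""
    result = {""}      # the empty string ε is always in the closure
    frontier = {""}
    while frontier:
        # Extend the newest strings by one more factor and keep the new ones.
        frontier = concat(frontier, lang, max_len) - result
        result |= frontier
    return result


a, b = {"a"}, {"b"}
a_or_b = union(a, b)                  # a | b
pairs = concat(a_or_b, a_or_b, 2)     # (a | b)(a | b)
a_star = star(a, 3)                   # a*, up to length 3
```

The bound `max_len` is only there because a program cannot enumerate an infinite set; the definitions themselves have no such limit.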

    Examples

    a | b              {a, b}
    (a | b)(a | b)     {aa, ab, ba, bb}
    a*                 {ε, a, aa, aaa, ...}
    (a | b)*           the set of all strings of a's and b's
    a | a*b            the set containing the string a and all strings consisting of zero or more a's followed by a b
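These examples can be checked against an existing regular-expression engine. The sketch below uses Python's `re` module, whose syntax happens to coincide with the notation here for these patterns; the helper name `in_language` is an assumption.

```python
import re


def in_language(pattern, text):
    """True if the whole of `text` belongs to the language of `pattern`."""
    return re.fullmatch(pattern, text) is not None


checks = {
    ("a|b", "a"): in_language("a|b", "a"),
    ("(a|b)(a|b)", "ba"): in_language("(a|b)(a|b)", "ba"),
    ("a*", ""): in_language("a*", ""),              # ε is in L(a*)
    ("(a|b)*", "abba"): in_language("(a|b)*", "abba"),
    ("a|a*b", "aaab"): in_language("a|a*b", "aaab"),
    ("a|a*b", "aa"): in_language("a|a*b", "aa"),    # not in the language
}
```

`re.fullmatch` is used rather than `re.match` because membership in a language is a statement about the entire string, not a prefix of it.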

    Regular Definitions

    Names for regular expressions

    d1 → r1
    d2 → r2
    ...
    dn → rn

    where each ri is a RE over alphabet ∪ {d1, d2, ..., di-1}

    Examples:

    letter → A | B | ... | Z | a | b | ... | z
    digit → 0 | 1 | ... | 9
    identifier → letter ( letter | digit )*
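A regular definition amounts to naming a sub-expression and reusing it in later ones. A rough equivalent in Python, assuming the `re` module as the engine and plain string variables as the names:

```python
import re

# Each name may only use the alphabet and the names defined before it.
letter = "[A-Za-z]"
digit = "[0-9]"
identifier = f"{letter}({letter}|{digit})*"

IDENTIFIER_RE = re.compile(identifier)


def is_identifier(text):
    """True if `text` is a letter followed by letters or digits."""
    return IDENTIFIER_RE.fullmatch(text) is not None
```

For instance, `is_identifier("x1")` holds while `is_identifier("1x")` does not, since an identifier must start with a letter.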

    Notational Abbreviations

    One or more instances

    (r)+ denoting (L(r))+

    r* = r+ | ε        r+ = r r*

    Zero or one instance

    r? = r | ε

    Character classes

    [abc] = a | b | c        [a-z] = a | b | ... | z

    [^abc] = any character except a | b | c

    Any character except newline

    .
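The identities above can be spot-checked by enumerating every short string over a small alphabet and comparing the sets matched. This is a sketch only: it relies on Python's `re` module, the alphabet {a, b, c} and the length bound are arbitrary choices, and an empty alternative such as `a+|` stands in for `r+ | ε`.

```python
import re
from itertools import product


def language(pattern, alphabet="abc", max_len=3):
    """All strings over `alphabet` of length <= max_len matched by `pattern`."""
    words = (
        "".join(symbols)
        for n in range(max_len + 1)
        for symbols in product(alphabet, repeat=n)
    )
    return {w for w in words if re.fullmatch(pattern, w)}


plus_identity = language("a+") == language("aa*")        # r+ = r r*
star_identity = language("a*") == language("a+|")        # r* = r+ | ε
optional_identity = language("a?") == language("a|")     # r? = r | ε
class_identity = language("[abc]") == language("a|b|c")  # [abc] = a | b | c
```

Agreement on bounded strings is evidence, not proof, but it is a quick way to catch a mistaken abbreviation.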

    Examples

    if                                     {return IF;}
    [a-z][a-z0-9]*                         {return ID;}
    [0-9]+                                 {return NUM;}
    ([0-9]+.[0-9]*)|([0-9]*.[0-9]+)        {return REAL;}
    (--[a-z]*\n)|( | \n | \t)+             {/*do nothing for white spaces and*

    Finite Automata

    A finite automaton is a finite-state transition diagram that can be used to model the recognition of a token type specified by a regular expression

    A finite automaton can be a nondeterministic finite automaton or a deterministic finite automaton

    Nondeterministic Finite Automata (NFA)

    An NFA consists of

    A finite set of states

    A finite set of input symbols

    A transition function that maps (state, symbol) pairs to sets of states

    A state distinguished as start state

    A set of states distinguished as final states

    An Example

    RE: (a | b)*abb

    States: {1, 2, 3, 4}

    Input symbols: {a, b}

    Transition function:

    (1,a) = {1,2}, (1,b) = {1}
    (2,b) = {3}, (3,b) = {4}

    Start state: 1

    Final states: {4}

    [Figure: transition diagram with start state 1, a loop on a and b at state 1, and edges from 1 to 2 on a, from 2 to 3 on b, and from 3 to 4 on b]

    Acceptance of NFA

    An NFA accepts an input string s iff there is some path in the finite-state transition diagram from the start state to some final state such that the edge labels along this path spell out s

    The language recognized by an NFA is the set of strings it accepts
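One common way to test "is there some path" without enumerating paths is to track the set of all states the NFA could be in after each symbol. The sketch below applies that idea to the NFA for (a | b)*abb with states 1 to 4 given earlier; the dictionary layout and the name `nfa_accepts` are assumptions.

```python
# Transition function of the NFA for (a | b)*abb (states 1..4).
# Missing (state, symbol) pairs map to the empty set.
DELTA = {
    (1, "a"): {1, 2},
    (1, "b"): {1},
    (2, "b"): {3},
    (3, "b"): {4},
}
START = 1
FINAL = {4}


def nfa_accepts(text):
    """True if some path from START to a final state spells out `text`."""
    current = {START}                 # every state reachable so far
    for ch in text:
        reachable = set()
        for state in current:
            reachable |= DELTA.get((state, ch), set())
        current = reachable
    return bool(current & FINAL)
```

With this table, `nfa_accepts("aabb")` is true and `nfa_accepts("aaba")` is false.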

    An Example

    [Figure: NFA for (a | b)*abb with states 0 to 3, a loop on a and b at start state 0, and edges from 0 to 1 on a, from 1 to 2 on b, and from 2 to 3 on b, shown with the input aabb]

    An Example

    [Figure: the same NFA for (a | b)*abb with states 0 to 3, shown with the input aaba]

    Another Example

    RE: aa* | bb*

    States: {1, 2, 3, 4, 5}

    Input symbols: {a, b}

    Transition function:

    (1, ε) = {2, 4}, (2, a) = {3}, (3, a) = {3},
    (4, b) = {5}, (5, b) = {5}

    Start state: 1

    Final states: {3, 5}
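When an NFA has ε-transitions, a simulation also has to follow them without consuming input. A minimal sketch for this automaton, assuming the empty string is used as the label for ε and that the helper names are free to choose:

```python
EPS = ""  # label used in this sketch for ε-transitions

# Transition function of the NFA for aa* | bb* (states 1..5).
DELTA = {
    (1, EPS): {2, 4},
    (2, "a"): {3},
    (3, "a"): {3},
    (4, "b"): {5},
    (5, "b"): {5},
}
START = 1
FINAL = {3, 5}


def eps_closure(states):
    """All states reachable from `states` using ε-transitions only."""
    closure = set(states)
    stack = list(states)
    while stack:
        state = stack.pop()
        for target in DELTA.get((state, EPS), set()):
            if target not in closure:
                closure.add(target)
                stack.append(target)
    return closure


def accepts(text):
    """True if the ε-NFA accepts `text`."""
    current = eps_closure({START})
    for ch in text:
        moved = set()
        for state in current:
            moved |= DELTA.get((state, ch), set())
        current = eps_closure(moved)
    return bool(current & FINAL)
```

Here `eps_closure({1})` is {1, 2, 4}, so the first input symbol decides which branch the automaton follows.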

    Finite-State Transition Diagram

    [Figure: transition diagram for aa* | bb* with start state 1, ε-edges from 1 to 2 and from 1 to 4, an edge from 2 to 3 on a with a loop on a at 3, and an edge from 4 to 5 on b with a loop on b at 5]

    Deterministic Finite Automata (DFA)

    A DFA is a special case of an NFA in which

    no state has an ε-transition

    for each state s and input symbol a, there is at most one edge labeled a leaving s

    An Example

    RE: (a | b)*abb

    States: {1, 2, 3, 4}

    Input symbols: {a, b}

    Transition function:

    (1,a) = {2}, (2,a) = {2}, (3,a) = {2}, (4,a) = {2}
    (1,b) = {1}, (2,b) = {3}, (3,b) = {4}, (4,b) = {1}

    Start state: 1

    Final states: {4}

    Finite-State Transition Diagram

    [Figure: transition diagram of the DFA for (a | b)*abb with states 1 to 4 and start state 1]

    Acceptance of DFA

    A DFA accepts an input string s iff there is one path in the finite-state transition diagram from the start state to some final state such that the edge labels along this path spell out s

    The language recognized by a DFA is the set of strings it accepts
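Because a DFA has at most one edge per state and symbol, checking acceptance is a single walk through the transition table. A sketch for the DFA for (a | b)*abb given earlier, with the table stored as a Python dictionary (an assumed layout):

```python
# Transition function of the DFA for (a | b)*abb (states 1..4).
DELTA = {
    (1, "a"): 2, (2, "a"): 2, (3, "a"): 2, (4, "a"): 2,
    (1, "b"): 1, (2, "b"): 3, (3, "b"): 4, (4, "b"): 1,
}
START = 1
FINAL = {4}


def dfa_accepts(text):
    """True if the single path spelled by `text` ends in a final state."""
    state = START
    for ch in text:
        if (state, ch) not in DELTA:
            return False              # no edge: the input is rejected
        state = DELTA[(state, ch)]
    return state in FINAL
```

On the input aabb the walk visits states 1, 2, 2, 3, 4 and accepts; on aaba it visits 1, 2, 2, 3, 2 and rejects.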

    An Example

    [Figure: the DFA for (a | b)*abb with states 1 to 4, shown with the input aabb]

    An Example

    [Figure: the DFA for (a | b)*abb with states 1 to 4, shown with the input aaba]

    Combined Finite Automata

    [Figure: three separate finite automata, one per token type: if (IF), [a-z][a-z0-9]* (ID), and ([0-9]+.[0-9]*) | ([0-9]*.[0-9]+) (REAL)]

    Combined Finite Automata

    [Figure: the automata for IF, ID, and REAL combined into a single NFA with states 1 to 11, where the new start state 1 has ε-transitions into each automaton]

    Combined Finite Automata

    [Figure: the equivalent DFA with states 1 to 8, start state 1, and final states labeled IF, ID, and REAL]

    Recognizing the Longest Match

    The automaton must keep track of the longest match seen so far and the position of that match until a dead state is reached

    Use two variables, Last-Final (the state number of the most recent final state encountered) and Input-Position-at-Last-Final, to remember the last time the automaton was in a final state
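The bookkeeping can be sketched as follows. The DFA used here is a small hypothetical one for IF, ID, and NUM, written by hand for illustration; it is not the combined DFA from the slides, and the state numbers, the dead state 0, and the function names are all assumptions.

```python
import string

LETTERS = set(string.ascii_lowercase)
DIGITS = set(string.digits)

# Hypothetical DFA: 1 is the start state, 0 is the dead state.
FINAL = {2: "ID", 3: "IF", 4: "ID", 5: "NUM"}


def step(state, ch):
    """Next state of the hypothetical DFA, or 0 (dead) if no edge matches."""
    if state == 1:
        if ch == "i":
            return 2
        if ch in LETTERS:
            return 4
        if ch in DIGITS:
            return 5
    elif state == 2:
        if ch == "f":
            return 3
        if ch in LETTERS or ch in DIGITS:
            return 4
    elif state in (3, 4):
        if ch in LETTERS or ch in DIGITS:
            return 4
    elif state == 5:
        if ch in DIGITS:
            return 5
    return 0


def next_token(text, start):
    """Return (token_type, lexeme, end) for the longest match at `start`."""
    state = 1
    last_final = 0               # Last-Final
    pos_at_last_final = start    # Input-Position-at-Last-Final
    pos = start
    while pos < len(text):
        state = step(state, text[pos])
        if state == 0:           # dead state: stop and fall back
            break
        pos += 1
        if state in FINAL:
            last_final = state
            pos_at_last_final = pos
    if last_final == 0:
        raise ValueError(f"no token at position {start}")
    return FINAL[last_final], text[start:pos_at_last_final], pos_at_last_final
```

On the input `iffail+` this sketch passes through the IF state after `if` but keeps going, and reports the ID `iffail` once `+` leads to the dead state.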

    An Example

    [Figure: the combined DFA with states 1 to 8, start state 1, and final states labeled IF, ID, and REAL]

    Input: iffail+

    C  S  L  P
       1  0  0
    i  2  0  0
    f  3  3  2
    f  4  4  3
    a  4  4  4
    i  4  4  5
    l  4  4  6
    +  ?