
CS406-Compiler Construction
Lecture 3: Lexical Analysis

Waheed Noor

Computer Science and Information Technology, University of Balochistan,

Quetta, Pakistan

Waheed Noor (CS&IT, UoB, Quetta) CS406-Compiler Construction March 2014 1 / 44


Outline

1 Phases and Passes

2 Readings & Quiz 1

3 Input Buffering

4 Regular Expression

5 Quiz 2

6 Regular Definition

7 Recognition of Tokens


Phases and Passes I

Definition (Phases)
The logical organization of a compiler is divided into phases, such as lexical analysis, syntax analysis and semantic analysis.

Definition (Passes)
In an implementation, several phases are sometimes grouped together into a pass, which reads its input and generates its output.


Phases and Passes II

Example
The phases lexical analysis, syntax analysis, semantic analysis and intermediate code generation may be grouped together into one pass, called the front-end, while code optimization and code generation for a particular machine are grouped into another pass, called the back-end.

Passes make it easier to re-use different phases of the compiler.

For example, a front-end for a language with a well-designed intermediate representation can interface with the back-end for a particular machine.


Phases and Passes III

Now you can develop different front-ends for different languages, all targeting the back-end of the same machine.

Question: What other uses of these collections can you think of?

So a compiler can be single-pass or consist of multiple passes.


Readings

Plenty of specialized compiler-construction tools are available. You should read about and understand the functions and characteristics of these different tools.


Role of a Lexical Analyzer: Revisit I

First phase of the compiler

Reads the input characters from the source program

Groups them into lexemes

Produces as output a sequence of tokens, one for each lexeme

Interacts with the symbol table, making entries for identifiers


Role of a Lexical Analyzer: Revisit II

Removing white space (blanks, tabs, newlines) and comments.

Correlating the error messages reported by the compiler with the source program, for example by tracking line numbers.


Tokens, Lexemes and Patterns: By Example

C Language Statement
printf("Total = %d \n",score);

printf and score are lexemes matching the pattern for the token id, and "Total = %d \n" is a lexeme matching the pattern for the token literal.

Figure: Some common tokens
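The split of the statement above into lexemes and tokens can be sketched in Python. This is only an illustration: the token names other than id and literal (punct, ws) and all of the patterns are assumptions made for this example, not definitions from the lecture.

```python
import re

# Hypothetical token patterns, tried in order; names are illustrative only.
TOKEN_SPEC = [
    ("literal", r'"[^"\n]*"'),               # anything between double quotes
    ("id",      r"[A-Za-z_][A-Za-z0-9_]*"),  # letter or underscore, then letters/digits
    ("punct",   r"[(),;]"),                  # single punctuation characters
    ("ws",      r"\s+"),                     # white space, discarded below
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def tokenize(source):
    """Return (token, lexeme) pairs for the given source text."""
    pairs = []
    pos = 0
    while pos < len(source):
        m = MASTER.match(source, pos)
        if m is None:
            raise ValueError(f"no pattern matches at position {pos}")
        if m.lastgroup != "ws":
            pairs.append((m.lastgroup, m.group()))
        pos = m.end()
    return pairs

statement = r'printf("Total = %d \n",score);'
for token, lexeme in tokenize(statement):
    print(token, lexeme)
```

Both printf and score come out as id, and the quoted text comes out as a single literal lexeme.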


Lexical Errors I

It is difficult for the lexical analyzer alone to identify source-code errors without the help of other components

For example, consider the C statement fi (a==f(x)) ...

The lexical analyzer alone cannot tell whether fi is a misspelling of the keyword if or an undeclared function identifier, since fi is a valid lexeme

Therefore, the lexical analyzer identifies it as a lexeme for the token id and hands it over to the parser

The parser may then handle this error


Lexical Errors II

Another situation may occur during lexical analysis: the lexical analyzer cannot match any pattern against a prefix of the remaining input

The simplest recovery strategy in this case is called panic mode recovery

In panic mode, we delete successive characters from the remaining input until the lexical analyzer finds a matching pattern for a token
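Panic mode recovery as described above can be sketched as follows. The toy patterns (id, number, op, ws) and the function name scan_with_panic_mode are assumptions for this example; a real scanner would also report the skipped characters to the user.

```python
import re

# Hypothetical patterns for a toy language: identifiers, integers, a few operators.
PATTERN = re.compile(
    r"(?P<id>[A-Za-z_]\w*)|(?P<number>\d+)|(?P<op>[=+\-*/();])|(?P<ws>\s+)"
)

def scan_with_panic_mode(source):
    """Tokenize source; on a mismatch, delete characters until a pattern matches."""
    tokens, errors = [], []
    pos = 0
    while pos < len(source):
        m = PATTERN.match(source, pos)
        if m is None:
            # Panic mode: drop one offending character and try again.
            errors.append((pos, source[pos]))
            pos += 1
            continue
        if m.lastgroup != "ws":
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens, errors

tokens, errors = scan_with_panic_mode("x = 4 @# y;")
print(tokens)
print(errors)
```

Here the characters @ and # match no pattern, so they are deleted one at a time and scanning resumes at y.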


Quiz 1

Figure: Question 1

Figure: Question 2

Quiz
Identify the appropriate lexemes in the C code above


Input Buffering I

Input buffering is a way of speeding up the reading of the source program

Reading is difficult because we often need to look one or more characters beyond the next lexeme before we can be sure that we have the right lexeme

Often we need to look at least one character ahead

For example, we cannot be sure we have seen the end of an identifier until we see a character that is not a letter or digit


Input Buffering II

Letter and digit here are specific to the programming language; for example, a letter can come from the set of lowercase and uppercase letters [a-zA-Z] and a digit from the set [0-9]

Example (Another Situation)
In C, there are single-character operators such as <, >, = and also two-character operators such as <=, >=, ==, so after reading < we must look at the next character to decide which operator we have

Therefore, we use a two-buffer scheme, known as a buffer pair


Input Buffering III

Both buffers are of the same size N, where N is normally the size of a disk block

Each buffer is loaded from the input program with a single system read call, so we do not need one read call per character

If fewer than N characters remain in the input, a special character, eof (end of file), is placed after the last character read


Input Buffering IV

Two pointers are maintained

1 lexemeBegin: marks the beginning of the current lexeme

2 forward: scans ahead until a pattern match is found
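The buffer pair and its two pointers can be sketched in Python as below. This is a simplified illustration under several assumptions: the class name BufferPair, the "\0" stand-in for eof, the tiny buffer size, and the helper next_identifier are all invented for the example, and a lexeme is assumed to be shorter than N so that it is never overwritten by a reload.

```python
import io

EOF = "\0"   # assumed stand-in for the special eof character

class BufferPair:
    """Two buffers of N characters each, reloaded alternately."""

    def __init__(self, stream, n=8):
        self.stream = stream
        self.n = n
        self.buf = [EOF] * (2 * n)
        self.lexeme_begin = 0   # start of the current lexeme
        self.forward = 0        # scans ahead of lexeme_begin
        self._load(0)

    def _load(self, half):
        """Fill one half with a single read; pad a short read with EOF."""
        chunk = self.stream.read(self.n)
        start = half * self.n
        for i in range(self.n):
            self.buf[start + i] = chunk[i] if i < len(chunk) else EOF

    def peek(self):
        return self.buf[self.forward]

    def advance(self):
        """Move forward by one character, reloading the other half at a boundary."""
        self.forward = (self.forward + 1) % (2 * self.n)
        if self.forward % self.n == 0:
            self._load(self.forward // self.n)

    def take_lexeme(self):
        """Return the characters from lexeme_begin up to (not including) forward."""
        chars = []
        i = self.lexeme_begin
        while i != self.forward:
            chars.append(self.buf[i])
            i = (i + 1) % (2 * self.n)
        self.lexeme_begin = self.forward
        return "".join(chars)

def next_identifier(bp):
    """Skip non-letters, then scan one identifier (a letter, then letters or digits)."""
    while bp.peek() != EOF and not bp.peek().isalpha():
        bp.advance()
    bp.lexeme_begin = bp.forward
    while bp.peek() != EOF and bp.peek().isalnum():
        bp.advance()
    return bp.take_lexeme()

bp = BufferPair(io.StringIO("count1 = total"), n=8)
print(next_identifier(bp))
print(next_identifier(bp))
```

The second identifier lies in the second buffer, which is loaded only when forward crosses the boundary between the two halves.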


Regular Expression

Definition
A regular expression is a formal notation for specifying the lexeme patterns of tokens. A lexical analyzer is then built by converting these regular expressions into automata that recognize the specific tokens.

Before we move on to regular expressions, we need to build up some other concepts: alphabets, strings, languages and operations on languages.


Alphabets, Strings & Languages

Definition (Alphabet)
An alphabet is a finite set of symbols, for example letters, digits and punctuation.

Example
The set {0,1} is the binary alphabet.

ASCII is an example of an alphabet.

Unicode consists of approximately 100,000 characters from alphabets around the world.


Alphabets, Strings & Languages

Definition (Strings)
A string over an alphabet is a finite sequence of symbols drawn from that alphabet. A string is also sometimes called a "word" or "sentence". The length of a string s is usually written |s|, and the empty string, whose length is zero, is denoted by ε.

Example
university is a string of length ten.

Concatenation
Concatenation is a string operation that appends one string to another. For example, if s1 = Balochistan and s2 = University are two strings, then their concatenation is s1s2 = BalochistanUniversity.
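The string notions above map directly onto Python strings. The variable names here are chosen only for illustration.

```python
EMPTY = ""                               # the empty string ε
s = "university"
s1, s2 = "Balochistan", "University"

print(len(s))        # |s|, the length of s
print(len(EMPTY))    # the empty string has length zero
print(s1 + s2)       # concatenation s1s2
print(EMPTY + s == s and s + EMPTY == s)   # ε is the identity for concatenation
```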


Terms for Parts of Strings


Alphabets, Strings & Languages

Definition (Language)
Broadly, a language is any countable set of strings over some fixed alphabet. The definition says nothing about the meanings associated with the strings.

Example
The set of all syntactically well-formed C programs is a language.

The set of all grammatically correct English sentences is also a language.


Operations on Languages

Capital letters such as L, M and D are usually used to denote languages.

Class Activity
Let L be the set of letters {A, B, ..., Z, a, b, ..., z} and let D be the set of digits {0, 1, ..., 9}. L and D can be viewed as alphabets, and also as languages all of whose strings have length one.
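Treating L and D as languages, the usual language operations can be tried out on them with Python sets. The sketch assumes the standard set-theoretic definitions of union, concatenation and closure; because the Kleene closure is infinite, the helper closure_up_to (a name made up for this example) only builds a finite approximation.

```python
import string
from itertools import product

L = set(string.ascii_letters)   # {A, ..., Z, a, ..., z}
D = set(string.digits)          # {0, ..., 9}

def union(a, b):
    """Strings that are in a or in b."""
    return a | b

def concat(a, b):
    """Every string of a followed by every string of b."""
    return {x + y for x, y in product(a, b)}

def power(a, k):
    """a^k: k-fold concatenation of a with itself; a^0 is {""}."""
    result = {""}
    for _ in range(k):
        result = concat(result, a)
    return result

def closure_up_to(a, max_k):
    """Finite approximation of the Kleene closure: a^0 ∪ a^1 ∪ ... ∪ a^max_k."""
    result = set()
    for k in range(max_k + 1):
        result |= power(a, k)
    return result

print(len(union(L, D)))          # letters and digits
print(len(concat(L, D)))         # a letter followed by a digit
print(len(closure_up_to(D, 2)))  # digit strings of length 0, 1 or 2
```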


Regular Expression

Definition
A regular expression is a formal notation that describes a language over some alphabet by applying language operations such as union, concatenation and closure. Regular expressions are very useful for string pattern matching, which is why they are used to recognize lexemes.

Regular expressions are built recursively from smaller regular expressions.

Each regular expression denotes a language, which is also defined recursively from the languages denoted by the subexpressions of that regular expression.

For example, let r be a regular expression denoting a language L(r).

We will denote the alphabet by Σ, e.g., Σ = {0,1}.


Basic Rules of Regular Expression

Rule 1
ε is a regular expression denoting the language L(ε) = {ε}, i.e., the language whose only member is the empty string.

Rule 2
If a is a symbol in Σ, then a is a regular expression and L(a) = {a}, the language with one string, of length one.


Induction

Definition
Larger regular expressions can be built from smaller regular expressions by induction, i.e., by combining existing regular expressions with operators.

For example, let r and s be regular expressions denoting the languages L(r) and L(s), respectively. Then

(r)|(s) or r|s is a regular expression denoting the language L(r) ∪ L(s).

(r)(s) or rs is a regular expression denoting the language L(r)L(s).

(r)* or r* is a regular expression denoting the language (L(r))*.

(r) is a regular expression denoting the language L(r).
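The two basic rules and the inductive rules can be read as a recursive function that computes L(r). The sketch below represents a regular expression as a nested tuple, which is a representation assumed for this example only, and cuts the language off at a maximum string length so that the closure stays finite.

```python
def lang(r, max_len):
    """Strings of L(r) with length <= max_len, for r given as a nested tuple."""
    kind = r[0]
    if kind == "eps":                       # Rule 1: L(ε) = {ε}
        return {""}
    if kind == "sym":                       # Rule 2: L(a) = {a}
        return {r[1]} if max_len >= 1 else set()
    if kind == "alt":                       # L(r|s) = L(r) ∪ L(s)
        return lang(r[1], max_len) | lang(r[2], max_len)
    if kind == "cat":                       # L(rs) = L(r)L(s)
        left, right = lang(r[1], max_len), lang(r[2], max_len)
        return {x + y for x in left for y in right if len(x + y) <= max_len}
    if kind == "star":                      # L(r*) = (L(r))*
        base = lang(r[1], max_len)
        result, frontier = {""}, {""}
        while frontier:
            # Extend the newest strings by one more string of the base language.
            frontier = {x + y for x in frontier for y in base
                        if len(x + y) <= max_len} - result
            result |= frontier
        return result
    raise ValueError(f"unknown node {kind!r}")

a, b = ("sym", "a"), ("sym", "b")
a_or_b = ("alt", a, b)
print(sorted(lang(a_or_b, 2)))
print(sorted(lang(("cat", a_or_b, a_or_b), 2)))
print(sorted(lang(("star", a_or_b), 2)))
```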


Some Properties of Operators

The Kleene star (closure) operator * is unary, has the highest precedence and is left associative.

Concatenation has the second highest precedence and is left associative as well.

| has the lowest precedence and is left associative.

Class Activity
What language does a|b*c denote?
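The effect of these precedence rules can be checked with Python's re module, whose *, concatenation and | follow the same relative precedence. The sample strings are arbitrary choices for this sketch; comparing the three result lists shows how the expression is grouped.

```python
import re

samples = ["a", "c", "bc", "bbbc", "ac", "abc"]

def matching(expr):
    """Sample strings that the expression matches in full."""
    return [s for s in samples if re.fullmatch(expr, s)]

# The first two expressions should agree; the third groups differently.
for expr in (r"a|b*c", r"a|((b*)c)", r"(a|b)*c"):
    print(expr, matching(expr))
```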


An Example
Let Σ = {a,b} be an alphabet. Some regular expressions over Σ and the languages they denote are given below.

a|b denotes {a,b}.

(a|b)(a|b) denotes the language of all strings of length two, {aa,ab,ba,bb}.

(a|b)* denotes the language {ε,a,b,aaab,baab, ...}, i.e., all strings of a's and b's.

a|a*b denotes the language ???

Class Activity
Given the alphabet Σ above, write regular expressions for the following languages

All strings that start and end with the symbol a and in which the symbol b, if it appears, appears an even number of times.

All strings with an even number of a's and an odd number of b's.


More Examples

Describe the languages denoted by the following regular expressions

a(a|b)*b

((ε|a)b*)*

(a|b)*a(a|b)(a|b)

a*ba*ba*ba*
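One way to explore expressions like these is to list the short strings over {a,b} that each one matches. The sketch below uses Python's re module; the helper names strings_over and members are made up for this example, and a? is used in place of (ε|a) since the two denote the same language.

```python
import re
from itertools import product

def strings_over(alphabet, max_len):
    """All strings over the alphabet with length 0..max_len, shortest first."""
    for n in range(max_len + 1):
        for chars in product(alphabet, repeat=n):
            yield "".join(chars)

EXPRESSIONS = [
    r"a(a|b)*b",
    r"(a?b*)*",            # a? stands for (ε|a)
    r"(a|b)*a(a|b)(a|b)",
    r"a*ba*ba*ba*",
]

def members(expr, max_len=4):
    """Strings of length <= max_len over {a,b} that the expression matches in full."""
    return [s for s in strings_over("ab", max_len) if re.fullmatch(expr, s)]

for expr in EXPRESSIONS:
    print(expr, members(expr, 3))
```

Looking at which strings appear, and which do not, is a useful first step before describing each language in words.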


Regular Set

Definition
A language that can be defined by a regular expression is called a regular set. If two regular expressions r and s denote the same regular set, then they are equivalent, written r = s.

Some algebraic laws that hold for regular expressions are given below; each law asserts that two regular expressions of different forms are equivalent.


Quiz 2

Q1: Write a regular expression denoting the language of all strings that start with either a or b, must end with b, and in which any b that appears may only be followed by another b.

Q2: Describe the language denoted by the regular expression ((aa)*a|bb)*


Regular Definition

Definition
For notational convenience, we may give names to certain regular expressions and use those names as symbols in subsequent regular expressions. Such a sequence of named definitions is called a regular definition.

If Σ is our alphabet, then such a sequence of definitions takes the form


Regular Definition: Examples

C language identifiers

Unsigned numbers such as 3402, 0.065, 2.333E4 and 2.333E-4

In regular expressions, we can use the unary operator ? for zero or one instance.

Similarly, we can use character classes such as [A-Za-z] for letter and [0-9] for digit.
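Regular definitions for these two examples could be written in Python by naming sub-patterns and reusing them. The definitions themselves are not shown in this transcript, so the forms below (including the names letter_, digits and number, and allowing an underscore in identifiers) are assumptions based on the common textbook versions.

```python
import re

# Named sub-patterns play the role of regular definitions (assumed forms).
letter_ = r"[A-Za-z_]"                                  # letter or underscore
digit = r"[0-9]"
ident = rf"{letter_}({letter_}|{digit})*"               # id
digits = rf"{digit}+"
number = rf"{digits}(\.{digits})?(E[+-]?{digits})?"     # optional fraction and exponent

def is_identifier(s):
    return re.fullmatch(ident, s) is not None

def is_unsigned_number(s):
    return re.fullmatch(number, s) is not None

for text in ["3402", "0.065", "2.333E4", "2.333E-4", "E4", "score", "2x"]:
    print(text, is_identifier(text), is_unsigned_number(text))
```

The ? operator makes the fraction and the exponent optional, and the character classes stand in for letter and digit.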


Regular Definition: Examples

Class Activity
All strings of lowercase letters that contain the vowels in order.

All strings surrounded by \* and *\.

All strings of a's and b's with an even number of a's and an odd number of b's.


Token Recognition

We have already learned how to express patterns using regular expressions. Now we need to learn

How to take the patterns for all the required tokens and build a piece of code that examines the input string and finds a prefix that is a lexeme matching one of the patterns.

For study purposes, we will use the following running example

Figure: An example of a grammar for branching statements


Token Recognition: Regular Definition

Here, the lexical analyzer is concerned with recognizing the tokens if, then, else, relop, id and number. Therefore, we have built the following regular definitions.

What are the possible tokens, lexemes and attribute values?
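A recognizer for these six tokens might look like the sketch below. The regular definitions themselves are not reproduced in this transcript, so the patterns, the particular set of relational operators, and the choice to treat if, then and else as reserved identifiers are all assumptions made for illustration.

```python
import re

# Assumed regular definitions; the relop set is a guess at a typical one.
TOKEN_SPEC = [
    ("ws",     r"[ \t\n]+"),
    ("number", r"[0-9]+(?:\.[0-9]+)?(?:E[+-]?[0-9]+)?"),
    ("id",     r"[A-Za-z][A-Za-z0-9]*"),
    ("relop",  r"<=|<>|>=|<|>|="),      # longer operators are listed first
]
KEYWORDS = {"if", "then", "else"}
MASTER = re.compile("|".join(f"(?P<{n}>{p})" for n, p in TOKEN_SPEC))

def tokens(source):
    """Yield (token, attribute) pairs; keywords are reserved identifiers."""
    pos = 0
    while pos < len(source):
        m = MASTER.match(source, pos)
        if m is None:
            raise ValueError(f"unexpected character {source[pos]!r} at {pos}")
        kind, lexeme = m.lastgroup, m.group()
        pos = m.end()
        if kind == "ws":
            continue                     # white space produces no token
        if kind == "id" and lexeme in KEYWORDS:
            yield (lexeme, None)         # keyword token carries no attribute
        else:
            yield (kind, lexeme)         # attribute here is simply the lexeme

for pair in tokens("if x1 <= 10 then y else z"):
    print(pair)
```

In this sketch the attribute value is just the lexeme; a fuller scanner would typically return a symbol-table entry for id and a numeric value for number.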


Token Recognition: Class Activity

Example
Assume the following grammar for assignment statements. What will be the regular definitions for all possible tokens?

stmt → id := expr

expr → expr arithop term

expr → term

term → id | number
