a brief [f]lex tutorial saumya debray the university of arizona tucson, az 85721

28
A brief [f]lex tutorial Saumya Debray The University of Arizona Tucson, AZ 85721

Upload: denis-robertson

Post on 24-Dec-2015

219 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: A brief [f]lex tutorial Saumya Debray The University of Arizona Tucson, AZ 85721

A brief [f]lex tutorial

Saumya DebrayThe University of Arizona

Tucson, AZ 85721

Page 2: A brief [f]lex tutorial Saumya Debray The University of Arizona Tucson, AZ 85721

A quick tutorial on Lex 2

flex (and lex): Overview

Scanner generators: Helps write programs whose control flow is

directed by instances of regular expressions in the input stream.

Input: a set of regular expressions + actions

Output: C code implementing a scanner:

function: yylex()

file: lex.yy.c

flex (or lex)

Page 3: A brief [f]lex tutorial Saumya Debray The University of Arizona Tucson, AZ 85721

A quick tutorial on Lex 3

Using flex

lex input spec (regexps + actions)

file: lex.yy.c

yylex(){…}

driver code

compiler

flex

user supplies

main() {…} orparser() {…}

Page 4: A brief [f]lex tutorial Saumya Debray The University of Arizona Tucson, AZ 85721

A quick tutorial on Lex 4

flex: input format

An input file has the following structure:

definitions %% rules %% user code

optionalrequired

Shortest possible legal flex input:

%%

Page 5: A brief [f]lex tutorial Saumya Debray The University of Arizona Tucson, AZ 85721

A quick tutorial on Lex 5

Definitions

A series of: name definitions, each of the form

name definition

e.g.:DIGIT [0-9]CommentStart "/*"ID [a-zA-Z][a-zA-Z0-9]*

start conditions stuff to be copied verbatim into the flex output

(e.g., declarations, #includes):– enclosed in %{ … }%, or – indented

Page 6: A brief [f]lex tutorial Saumya Debray The University of Arizona Tucson, AZ 85721

A quick tutorial on Lex 6

Rules

The rules portion of the input contains a sequence of rules.

Each rule has the formpattern action

where: pattern describes a pattern to be matched on the

input pattern must be un-indented action must begin on the same line.

Page 7: A brief [f]lex tutorial Saumya Debray The University of Arizona Tucson, AZ 85721

A quick tutorial on Lex 7

Patterns

Essentially, extended regular expressions. Syntax: similar to grep (see man page) <<EOF>> to match “end of file” Character classes:

– [:alpha:], [:digit:], [:alnum:], [:space:], etc. (see man page)

{name} where name was defined earlier. “start conditions” can be used to specify that a

pattern match only in specific situations.

Page 8: A brief [f]lex tutorial Saumya Debray The University of Arizona Tucson, AZ 85721

A quick tutorial on Lex 8

Example

%{

#include <stdio.h>

#include <stdlib.h>

%}

dgt [0-9]

%%

{dgt}+ return atoi(yytext);

%%

void main()

{

int val, total = 0, n = 0;

while ( (val = yylex()) > 0 ) {

total += val;

n++;

}

if (n > 0) printf(“ave = %d\n”, total/n);

}

A flex program to read a file of (positive) integers and compute the average:

Page 9: A brief [f]lex tutorial Saumya Debray The University of Arizona Tucson, AZ 85721

A quick tutorial on Lex 9

Example

%{

#include <stdio.h>

#include <stdlib.h>

%}

dgt [0-9]

%%

{dgt}+ return atoi(yytext);

%%

void main()

{

int val, total = 0, n = 0;

while ( (val = yylex()) > 0 ) {

total += val;

n++;

}

if (n > 0) printf(“ave = %d\n”, total/n);

}

A flex program to read a file of (positive) integers and compute the average:

Definition for a digit (could have used builtin definition [:digit:] instead)

Rule to match a number and return its value to the calling routine

Driver code(could instead have been in a separate file)

defin

ition

sru

les

user

cod

e

Page 10: A brief [f]lex tutorial Saumya Debray The University of Arizona Tucson, AZ 85721

A quick tutorial on Lex 10

Example

%{

#include <stdio.h>

#include <stdlib.h>

%}

dgt [0-9]

%%

{dgt}+ return atoi(yytext);

%%

void main()

{

int val, total = 0, n = 0;

while ( (val = yylex()) > 0 ) {

total += val;

n++;

}

if (n > 0) printf(“ave = %d\n”, total/n);

}

A flex program to read a file of (positive) integers and compute the average:

defin

ition

sru

les

user

cod

e

defining and using a name

Page 11: A brief [f]lex tutorial Saumya Debray The University of Arizona Tucson, AZ 85721

A quick tutorial on Lex 11

Example

%{

#include <stdio.h>

#include <stdlib.h>

%}

dgt [0-9]

%%

{dgt}+ return atoi(yytext);

%%

void main()

{

int val, total = 0, n = 0;

while ( (val = yylex()) > 0 ) {

total += val;

n++;

}

if (n > 0) printf(“ave = %d\n”, total/n);

}

A flex program to read a file of (positive) integers and compute the average:

defin

ition

sru

les

user

cod

e

defining and using a name

char * yytext;a buffer that holds the input characters that actually match the pattern

Page 12: A brief [f]lex tutorial Saumya Debray The University of Arizona Tucson, AZ 85721

A quick tutorial on Lex 12

Example

%{

#include <stdio.h>

#include <stdlib.h>

%}

dgt [0-9]

%%

{dgt}+ return atoi(yytext);

%%

void main()

{

int val, total = 0, n = 0;

while ( (val = yylex()) > 0 ) {

total += val;

n++;

}

if (n > 0) printf(“ave = %d\n”, total/n);

}

A flex program to read a file of (positive) integers and compute the average:

defin

ition

sru

les

user

cod

e

defining and using a name

char * yytext;a buffer that holds the input characters that actually match the pattern

Invoking the scanner: yylex()

Each time yylex() is called, the scanner continues processing the input from where it last left off.Returns 0 on end-of-file.

Page 13: A brief [f]lex tutorial Saumya Debray The University of Arizona Tucson, AZ 85721

Avoiding compiler warnings

If compiled using “gcc –Wall” the previous flex file will generate compiler warnings:

lex.yy.c: … : warning: `yyunput’ defined but not used

lex.yy.c: … : warning: `input’ defined but not used

These can be removed using ‘%option’ declarations in the first part of the flex input file:

%option nounput

%option noinput

A quick tutorial on Lex 13

Page 14: A brief [f]lex tutorial Saumya Debray The University of Arizona Tucson, AZ 85721

A quick tutorial on Lex 14

Matching the Input

When more than one pattern can match the input, the scanner behaves as follows: the longest match is chosen; if multiple rules match, the rule listed first in the

flex input file is chosen; if no rule matches, the default is to copy the next

character to stdout. The text that matched (the “token”) is copied to a

buffer yytext.

Page 15: A brief [f]lex tutorial Saumya Debray The University of Arizona Tucson, AZ 85721

A quick tutorial on Lex 15

Matching the Input (cont’d)

Pattern to match C-style comments: /* … */

"/*"(.|\n)*"*/"Input:

#include <stdio.h> /* definitions */

int main(int argc, char * argv[ ]) {

if (argc <= 1) {

printf(“Error!\n”); /* no arguments */

}

printf(“%d args given\n”, argc);

return 0;

}

Page 16: A brief [f]lex tutorial Saumya Debray The University of Arizona Tucson, AZ 85721

A quick tutorial on Lex 16

Matching the Input (cont’d)

Pattern to match C-style comments: /* … */

"/*"(.|\n)*"*/"Input:

#include <stdio.h> /* definitions */

int main(int argc, char * argv[ ]) {

if (argc <= 1) {

printf(“Error!\n”); /* no arguments */

}

printf(“%d args given\n”, argc);

return 0;

}

longest match:

Page 17: A brief [f]lex tutorial Saumya Debray The University of Arizona Tucson, AZ 85721

A quick tutorial on Lex 17

Matching the Input (cont’d)

Pattern to match C-style comments: /* … */

"/*"(.|\n)*"*/"Input:

#include <stdio.h> /* definitions */

int main(int argc, char * argv[ ]) {

if (argc <= 1) {

printf(“Error!\n”); /* no arguments */

}

printf(“%d args given\n”, argc);

return 0;

}

longest match:Matched text shown in blue

Page 18: A brief [f]lex tutorial Saumya Debray The University of Arizona Tucson, AZ 85721

A quick tutorial on Lex 18

Start Conditions

Used to activate rules conditionally. Any rule prefixed with <S> will be activated only

when the scanner is in start condition S. Declaring a start condition S:

in the definition section: %x S– “%x” specifies “exclusive start conditions”– flex also supports “inclusive start conditions” (“%s”),

see man pages.

Putting the scanner into start condition S: action: BEGIN(S)

Page 19: A brief [f]lex tutorial Saumya Debray The University of Arizona Tucson, AZ 85721

A quick tutorial on Lex 19

Start Conditions (cont’d)

Example: <STRING>[^"]* { …match string body… }

– [^"] matches any character other than "– The rule is activated only if the scanner is in the start

condition STRING.

INITIAL refers to the original state where no start conditions are active.

<*> matches all start conditions.

Page 20: A brief [f]lex tutorial Saumya Debray The University of Arizona Tucson, AZ 85721

A quick tutorial on Lex 20

Using Start Conditions

Start conditions let us explicitly simulate finite state machines.

This lets us get around the “longest match” problem for C-style comments.

/ *

non-*

*

*/

non-{ /,* }

flex input:

%x S1, S2, S3%%"/" BEGIN(S1);<S1>"*" BEGIN(S2);<S2>[^*] ; /* stay in S2 */<S2>"*" BEGIN(S3);<S3>"*" ; /* stay in S3 */<S3>[^*/] BEGIN(S2);<S3>"/" BEGIN(INITIAL);

S1 S2 S3

FSA for C comments:

Page 21: A brief [f]lex tutorial Saumya Debray The University of Arizona Tucson, AZ 85721

A quick tutorial on Lex 21

Using Start Conditions

Start conditions let us explicitly simulate finite state machines.

This lets us get around the “longest match” problem for C-style comments.

/ *

non-*

*

*/

non-{ /,* }

flex input:

%x S1, S2, S3%%"/" BEGIN(S1);<S1>"*" BEGIN(S2);<S2>[^*] ; /* stay in S2 */<S2>"*" BEGIN(S3);<S3>"*“ ; /* stay in S3 */<S3>[^*/] BEGIN(S2);<S3>"/" BEGIN(INITIAL);

S1 S2 S3

FSA for C comments:

Page 22: A brief [f]lex tutorial Saumya Debray The University of Arizona Tucson, AZ 85721

A quick tutorial on Lex 22

Using Start Conditions

Start conditions let us explicitly simulate finite state machines.

This lets us get around the “longest match” problem for C-style comments.

/ *

non-*

*

*/

non-{ /,* }

flex input:

%x S1, S2, S3%%"/" BEGIN(S1);<S1>"*" BEGIN(S2);<S2>[^*] ; /* stay in S2 */<S2>"*" BEGIN(S3);<S3>"*“ ; /* stay in S3 */<S3>[^*/] BEGIN(S2);<S3>"/" BEGIN(INITIAL);

S1 S2 S3

FSA for C comments:

Page 23: A brief [f]lex tutorial Saumya Debray The University of Arizona Tucson, AZ 85721

A quick tutorial on Lex 23

Using Start Conditions

Start conditions let us explicitly simulate finite state machines.

This lets us get around the “longest match” problem for C-style comments.

/ *

non-*

*

*/

non-{ /,* }

flex input:

%x S1, S2, S3%%"/" BEGIN(S1);<S1>"*" BEGIN(S2);<S2>[^*] ; /* stay in S2 */<S2>"*" BEGIN(S3);<S3>"*“ ; /* stay in S3 */<S3>[^*/] BEGIN(S2);<S3>"/" BEGIN(INITIAL);

S1 S2 S3

FSA for C comments:

Page 24: A brief [f]lex tutorial Saumya Debray The University of Arizona Tucson, AZ 85721

A quick tutorial on Lex 24

Using Start Conditions

Start conditions let us explicitly simulate finite state machines.

This lets us get around the “longest match” problem for C-style comments.

/ *

non-*

*

*/

non-{ /,* }

flex input:

%x S1, S2, S3%%"/" BEGIN(S1);<S1>"*" BEGIN(S2);<S2>[^*] ; /* stay in S2 */<S2>"*" BEGIN(S3);<S3>"*“ ; /* stay in S3 */<S3>[^*/] BEGIN(S2);<S3>"/" BEGIN(INITIAL);

S1 S2 S3

FSA for C comments:

Page 25: A brief [f]lex tutorial Saumya Debray The University of Arizona Tucson, AZ 85721

A quick tutorial on Lex 25

Using Start Conditions

Start conditions let us explicitly simulate finite state machines.

This lets us get around the “longest match” problem for C-style comments.

/ *

non-*

*

*/

non-{ /,* }

flex input:

%x S1, S2, S3%%"/" BEGIN(S1);<S1>"*" BEGIN(S2);<S2>[^*] ; /* stay in S2 */<S2>"*" BEGIN(S3);<S3>"*“ ; /* stay in S3 */<S3>[^*/] BEGIN(S2);<S3>"/" BEGIN(INITIAL);

S1 S2 S3

FSA for C comments:

Page 26: A brief [f]lex tutorial Saumya Debray The University of Arizona Tucson, AZ 85721

A quick tutorial on Lex 26

Using Start Conditions

Start conditions let us explicitly simulate finite state machines.

This lets us get around the “longest match” problem for C-style comments.

/ *

non-*

*

*/

non-{ /,* }

flex input:

%x S1, S2, S3%%"/" BEGIN(S1);<S1>"*" BEGIN(S2);<S2>[^*] ; /* stay in S2 */<S2>"*" BEGIN(S3);<S3>"*“ ; /* stay in S3 */<S3>[^*/] BEGIN(S2);<S3>"/" BEGIN(INITIAL);

S1 S2 S3

FSA for C comments:

Page 27: A brief [f]lex tutorial Saumya Debray The University of Arizona Tucson, AZ 85721

A quick tutorial on Lex 27

Using Start Conditions

Start conditions let us explicitly simulate finite state machines.

This lets us get around the “longest match” problem for C-style comments.

/ *

non-*

*

*/

non-{ /,* }

flex input:

%x S1, S2, S3%%"/" BEGIN(S1);<S1>"*" BEGIN(S2);<S2>[^*] ; /* stay in S2 */<S2>"*" BEGIN(S3);<S3>"*“ ; /* stay in S3 */<S3>[^*/] BEGIN(S2);<S3>"/" BEGIN(INITIAL);

S1 S2 S3

FSA for C comments:

Page 28: A brief [f]lex tutorial Saumya Debray The University of Arizona Tucson, AZ 85721

A quick tutorial on Lex 28

Putting it all together

Scanner implemented as a functionint yylex();

return value indicates type of token found (encoded as a +ve integer);

the actual string matched is available in yytext. Scanner and parser need to agree on token type

encodings let yacc generate the token type encodings

– yacc places these in a file y.tab.h use “#include y.tab.h” in the definitions section of the flex

input file. When compiling, link in the flex library using “-lfl”