MetaLexer: A Modular Lexical Specification Language
Andrew Casey, Laurie Hendren, McGill University. www.sable.mcgill.ca/metalexer
![Page 1: MetaLexer : A Modular Lexical Specification Language](https://reader035.vdocument.in/reader035/viewer/2022062310/568161bd550346895dd19a0f/html5/thumbnails/1.jpg)
MetaLexer: A Modular Lexical Specification Language
Andrew Casey, Laurie Hendren
McGill University
www.sable.mcgill.ca/metalexer
![Page 2: MetaLexer : A Modular Lexical Specification Language](https://reader035.vdocument.in/reader035/viewer/2022062310/568161bd550346895dd19a0f/html5/thumbnails/2.jpg)
Why MetaLexer?
Why is it relevant to AOSD?
What are the challenges?
![Page 3: MetaLexer : A Modular Lexical Specification Language](https://reader035.vdocument.in/reader035/viewer/2022062310/568161bd550346895dd19a0f/html5/thumbnails/3.jpg)
Structure of a Compiler Front-End
• Scanner (lexical analysis): regular expressions + state (flex, JFlex, ...)
• Parser and semantic checks: context-free grammars + actions/attributes (yacc, bison, Polyglot, JastAdd, ...)
![Page 4: MetaLexer : A Modular Lexical Specification Language](https://reader035.vdocument.in/reader035/viewer/2022062310/568161bd550346895dd19a0f/html5/thumbnails/4.jpg)
Given a front-end specification for a language (e.g. Java), what is the current method for implementing a front-end for an extension of that language (e.g. AspectJ)?
Original language:
• Lexical specification for original language
• Grammar and actions for original language
Extended language:
• Modified lexical specification for extended language
• Grammar and actions for original language + grammar rules for extension
![Page 5: MetaLexer : A Modular Lexical Specification Language](https://reader035.vdocument.in/reader035/viewer/2022062310/568161bd550346895dd19a0f/html5/thumbnails/5.jpg)
Desired Modular MetaLexer Approach
Original language:
• Lexical specification for original language
• Grammar and actions for original language
Extended language:
• Lexical specification for original language + lexical rules for extension
• Grammar and actions for original language + grammar rules for extension
![Page 6: MetaLexer : A Modular Lexical Specification Language](https://reader035.vdocument.in/reader035/viewer/2022062310/568161bd550346895dd19a0f/html5/thumbnails/6.jpg)
We also want to be able to combine lexical specifications for diverse languages.
• Java + HTML
• Java + Aspects (AspectJ)
• Java + SQL
• MATLAB + Aspects (AspectMatlab)
![Page 7: MetaLexer : A Modular Lexical Specification Language](https://reader035.vdocument.in/reader035/viewer/2022062310/568161bd550346895dd19a0f/html5/thumbnails/7.jpg)
We would like to be able to reuse and extend lexical specification modules:
• Nested C-style comments
• Javadoc comments
• Floating-point constants
• URLs
• Regular expressions
• …
![Page 8: MetaLexer : A Modular Lexical Specification Language](https://reader035.vdocument.in/reader035/viewer/2022062310/568161bd550346895dd19a0f/html5/thumbnails/8.jpg)
Scanning AspectJ
![Page 9: MetaLexer : A Modular Lexical Specification Language](https://reader035.vdocument.in/reader035/viewer/2022062310/568161bd550346895dd19a0f/html5/thumbnails/9.jpg)
Scanning Java/Javadoc
![Page 10: MetaLexer : A Modular Lexical Specification Language](https://reader035.vdocument.in/reader035/viewer/2022062310/568161bd550346895dd19a0f/html5/thumbnails/10.jpg)
First, let's understand the traditional lexer tools (lex, flex, JFlex).
• The programmer specifies regular expressions + actions.
• The tools generate a finite-automaton-based implementation.
• States are used to handle different language contexts.
![Page 11: MetaLexer : A Modular Lexical Specification Language](https://reader035.vdocument.in/reader035/viewer/2022062310/568161bd550346895dd19a0f/html5/thumbnails/11.jpg)
    %%
    %class Lexer
    Identifier = [:jletter:] [:jletterdigit:]*
    ...
    %state STRING
    %%
    <YYINITIAL> {
        "abstract"      { return symbol(sym.ABSTRACT); }
        {Identifier}    { return symbol(sym.IDENTIFIER); }
        \"              { string.setLength(0); yybegin(STRING); }
        ...
    }

    <STRING> {
        \"              { yybegin(YYINITIAL); return ...; }
        [^\n\r\"\\]+    { string.append( yytext() ); }
        \\t             { string.append('\t'); }
        ...
    }
![Page 12: MetaLexer : A Modular Lexical Specification Language](https://reader035.vdocument.in/reader035/viewer/2022062310/568161bd550346895dd19a0f/html5/thumbnails/12.jpg)
Current (ugly) method for extending JFlex specifications: copy and modify
• Copy the JFlex specification.
• Insert new scanner rules into the copy. Order of rules matters!
• Introduce new states and action logic for switching between states.
What we want instead:
• A principled way of weaving new rules into existing rules.
• A modular and abstract notion of state and of changing between states.
![Page 13: MetaLexer : A Modular Lexical Specification Language](https://reader035.vdocument.in/reader035/viewer/2022062310/568161bd550346895dd19a0f/html5/thumbnails/13.jpg)
JFlex Lexing Structure
• Lexing rules are associated with a state.
• State changes are associated with action code.
• The specification lives in one file.
![Page 14: MetaLexer : A Modular Lexical Specification Language](https://reader035.vdocument.in/reader035/viewer/2022062310/568161bd550346895dd19a0f/html5/thumbnails/14.jpg)
MetaLexer Structure
• Components define lexing rules associated with a state and produce meta-tokens.
• The layout defines transitions between components; state changes are made by the meta-lexer.
• Each component is specified in its own file.
• The layout is specified in its own file.
![Page 15: MetaLexer : A Modular Lexical Specification Language](https://reader035.vdocument.in/reader035/viewer/2022062310/568161bd550346895dd19a0f/html5/thumbnails/15.jpg)
Structure of a MetaLexer Specification for Matlab
![Page 16: MetaLexer : A Modular Lexical Specification Language](https://reader035.vdocument.in/reader035/viewer/2022062310/568161bd550346895dd19a0f/html5/thumbnails/16.jpg)
Extending a MetaLexer Specification for Matlab
![Page 17: MetaLexer : A Modular Lexical Specification Language](https://reader035.vdocument.in/reader035/viewer/2022062310/568161bd550346895dd19a0f/html5/thumbnails/17.jpg)
Sharing component specifications with MetaLexer
![Page 18: MetaLexer : A Modular Lexical Specification Language](https://reader035.vdocument.in/reader035/viewer/2022062310/568161bd550346895dd19a0f/html5/thumbnails/18.jpg)
Scanning a properties file

    #some properties
    name=properties
    date=2009/09/21

    #some more properties
    owner=root

Layout: Properties. Components: Key, Value. Helper component: Util_Patterns.
![Page 19: MetaLexer : A Modular Lexical Specification Language](https://reader035.vdocument.in/reader035/viewer/2022062310/568161bd550346895dd19a0f/html5/thumbnails/19.jpg)
util_patterns.mlc helper component

    %component util_patterns
    %helper

    lineTerminator = [\r\n] | "\r\n"
    otherWhitespace = [ \t\f\b]
    identifier = [a-zA-Z][a-zA-Z0-9_]*
    comment = #[^\r\n]*
![Page 20: MetaLexer : A Modular Lexical Specification Language](https://reader035.vdocument.in/reader035/viewer/2022062310/568161bd550346895dd19a0f/html5/thumbnails/20.jpg)
key.mlc component

    %component key
    %extern "Token symbol(int)"
    %extern "Token symbol(int, String)"
    %extern "void error(String) throws LexerException"

    %%

    %%inherit util_patterns
    {lineTerminator}    {: /*ignore*/ :}
    {otherWhitespace}   {: /*ignore*/ :}
    "="                 {: return symbol(ASSIGN); :} ASSIGN
    %:
    {identifier}        {: return symbol(KEY, yytext()); :}
    {comment}           {: /*ignore*/ :}
    %:
    <<ANY>>             {: error("Unexpected char '"+yytext()+"'"); :}
    <<EOF>>             {: return symbol(EOF); :}
![Page 21: MetaLexer : A Modular Lexical Specification Language](https://reader035.vdocument.in/reader035/viewer/2022062310/568161bd550346895dd19a0f/html5/thumbnails/21.jpg)
value.mlc component

    %component value
    %extern "Token symbol(int, String, int, int, int, int)"
    %append{
        return symbol(VALUE, text, startLine, startCol,
                      endLine, endCol);
    %append}

    %%

    %%inherit util_patterns
    {lineTerminator}    {: :} LINE_TERMINATOR
    %:
    %:
    <<ANY>>             {: append(yytext()); :}
    <<EOF>>             {: :} LINE_TERMINATOR
![Page 22: MetaLexer : A Modular Lexical Specification Language](https://reader035.vdocument.in/reader035/viewer/2022062310/568161bd550346895dd19a0f/html5/thumbnails/22.jpg)
properties.mll layout

    package properties;
    %%
    import static properties.TokenTypes.*;
    %%
    %layout properties
    %option public "%public"
    ...
    %lexthrow "LexerException"
    %component key
    %component value
    %start key
    %%
    %%embed
    %name key_value
    %host key
    %guest value
    %start ASSIGN
    %end LINE_TERMINATOR
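To make the interplay of the two components and the key_value embedding concrete, here is a minimal hand-written Java sketch of the behaviour the specification above describes. It is not the code MetaLexer generates: the class name `PropertiesScanSketch`, the string form of the tokens, and the assumption that the text appended by the value component is returned as one VALUE token when that component is left are all illustrative choices.

```java
import java.util.ArrayList;
import java.util.List;

/** Hand-written stand-in for the generated properties lexer (illustrative only). */
class PropertiesScanSketch {
    enum Component { KEY, VALUE }

    static boolean isLineTerminator(char c) { return c == '\n' || c == '\r'; }
    static boolean isOtherWhitespace(char c) { return c == ' ' || c == '\t' || c == '\f' || c == '\b'; }
    static boolean isIdentStart(char c) { return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z'); }
    static boolean isIdentPart(char c) { return isIdentStart(c) || (c >= '0' && c <= '9') || c == '_'; }

    static List<String> scan(String in) {
        List<String> tokens = new ArrayList<>();
        Component current = Component.KEY;          // %start key
        StringBuilder text = new StringBuilder();   // text appended by the value component
        int i = 0;
        while (i < in.length()) {
            char c = in.charAt(i);
            if (current == Component.VALUE) {
                if (isLineTerminator(c)) {
                    // Treat "\r\n" as a single line terminator.
                    if (c == '\r' && i + 1 < in.length() && in.charAt(i + 1) == '\n') i++;
                    i++;
                    tokens.add("VALUE(" + text + ")");
                    text.setLength(0);
                    current = Component.KEY;        // meta-token LINE_TERMINATOR ends the embedding
                } else {
                    text.append(c);                 // <<ANY>> rule: append(yytext())
                    i++;
                }
            } else if (c == '=') {
                tokens.add("ASSIGN");
                current = Component.VALUE;          // meta-token ASSIGN starts the embedding
                i++;
            } else if (c == '#') {
                // Comment: skip up to (not including) the line terminator.
                while (i < in.length() && !isLineTerminator(in.charAt(i))) i++;
            } else if (isIdentStart(c)) {
                int start = i;
                while (i < in.length() && isIdentPart(in.charAt(i))) i++;
                tokens.add("KEY(" + in.substring(start, i) + ")");
            } else if (isLineTerminator(c) || isOtherWhitespace(c)) {
                i++;                                // ignored in the key component
            } else {
                throw new IllegalArgumentException("Unexpected char '" + c + "'");
            }
        }
        // <<EOF>> in the value component also produces LINE_TERMINATOR.
        if (current == Component.VALUE) tokens.add("VALUE(" + text + ")");
        tokens.add("EOF");
        return tokens;
    }

    public static void main(String[] args) {
        String file = "#some properties\nname=properties\ndate=2009/09/21\n\n"
                    + "#some more properties\nowner=root\n";
        System.out.println(scan(file));
    }
}
```

Note that the only place the current component changes is where a meta-token (ASSIGN or LINE_TERMINATOR) is produced; in MetaLexer that decision is taken out of the action code and stated once in the layout.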
![Page 23: MetaLexer : A Modular Lexical Specification Language](https://reader035.vdocument.in/reader035/viewer/2022062310/568161bd550346895dd19a0f/html5/thumbnails/23.jpg)
MetaLexer is implemented and available: www.sable.mcgill.ca/metalexer
properties.mll, key.mlc, value.mlc, util_patterns.mlc → MetaLexer → properties.jflex
![Page 24: MetaLexer : A Modular Lexical Specification Language](https://reader035.vdocument.in/reader035/viewer/2022062310/568161bd550346895dd19a0f/html5/thumbnails/24.jpg)
Key problems to solve:
• How to implement the meta-token lexer?
• How to allow for insertion of new components, replacement of components, and addition of new embeddings (meta-lexer transitions)?
• How to insert new patterns into components at specific points?
![Page 25: MetaLexer : A Modular Lexical Specification Language](https://reader035.vdocument.in/reader035/viewer/2022062310/568161bd550346895dd19a0f/html5/thumbnails/25.jpg)
Implementing the meta-token lexer
• Recognize a meta-pattern, i.e. when to go to a new component and when to return.
• Recognize the matching suffix.
![Page 26: MetaLexer : A Modular Lexical Specification Language](https://reader035.vdocument.in/reader035/viewer/2022062310/568161bd550346895dd19a0f/html5/thumbnails/26.jpg)
Implementing inheritance (structured weaving).
![Page 27: MetaLexer : A Modular Lexical Specification Language](https://reader035.vdocument.in/reader035/viewer/2022062310/568161bd550346895dd19a0f/html5/thumbnails/27.jpg)
Implementing MetaLexer layout inheritance
• Layouts can inherit other layouts.
• An %inherit directive is put at the location at which the inherited transition rules (embeddings) should be placed.
• Each %inherit directive can be followed by:
  • %unoption
  • %replace
  • %unembed
  • new embeddings
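As a rough illustration of what these directives do to an inherited list of embeddings, the following Java sketch applies one %unembed, one %replace and a set of new embeddings. The exact semantics are assumptions made for the sketch: %unembed is taken to drop an inherited embedding by name, %replace to substitute one component for another wherever it appears as host or guest, and new embeddings to follow the inherited ones. %unoption is not modelled, and every name other than key, value and key_value is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch of layout inheritance over a simplified embedding list (assumed semantics). */
class LayoutInheritSketch {
    static final class Embedding {
        final String name, host, guest;
        Embedding(String name, String host, String guest) {
            this.name = name; this.host = host; this.guest = guest;
        }
        @Override public String toString() { return name + ": " + host + " -> " + guest; }
    }

    /** Returns newName if component is the one being replaced, otherwise component itself. */
    static String swap(String component, String oldName, String newName) {
        return component.equals(oldName) ? newName : component;
    }

    static List<Embedding> inherit(List<Embedding> inherited, String unembed,
                                   String replaceOld, String replaceNew,
                                   List<Embedding> added) {
        List<Embedding> result = new ArrayList<>();
        for (Embedding e : inherited) {
            if (e.name.equals(unembed)) continue;             // assumed %unembed: drop by name
            result.add(new Embedding(e.name,
                    swap(e.host, replaceOld, replaceNew),     // assumed %replace: substitute component
                    swap(e.guest, replaceOld, replaceNew)));
        }
        result.addAll(added);                                 // new embeddings follow the %inherit
        return result;
    }

    public static void main(String[] args) {
        List<Embedding> base = List.of(
                new Embedding("key_value", "key", "value"),
                new Embedding("key_note", "key", "note"));    // hypothetical embedding
        List<Embedding> derived = inherit(base, "key_note", "value", "rich_value",
                List.of(new Embedding("key_section", "key", "section")));
        System.out.println(derived);
    }
}
```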
![Page 28: MetaLexer : A Modular Lexical Specification Language](https://reader035.vdocument.in/reader035/viewer/2022062310/568161bd550346895dd19a0f/html5/thumbnails/28.jpg)
Implementing MetaLexer component inheritance
![Page 29: MetaLexer : A Modular Lexical Specification Language](https://reader035.vdocument.in/reader035/viewer/2022062310/568161bd550346895dd19a0f/html5/thumbnails/29.jpg)
Weaving in inherited component
Original Component
New Component adds some rules and inherits original component.
Woven output
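The weaving step can be sketched in Java as a section-by-section merge. This is an assumed model, not MetaLexer's implementation: a component is represented as a list of rule sections (the groups separated by `%:`), and within each section the new component's rules are placed ahead of the inherited ones, so they win ties under first-rule-wins matching. The rule strings in `main` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch of weaving an inherited component into a new one (assumed ordering). */
class WeaveSketch {
    /** Merges rule sections pairwise: new rules first, inherited rules after. */
    static List<List<String>> weave(List<List<String>> added, List<List<String>> inherited) {
        int sections = Math.max(added.size(), inherited.size());
        List<List<String>> woven = new ArrayList<>();
        for (int s = 0; s < sections; s++) {
            List<String> section = new ArrayList<>();
            if (s < added.size()) section.addAll(added.get(s));
            if (s < inherited.size()) section.addAll(inherited.get(s));
            woven.add(section);
        }
        return woven;
    }

    public static void main(String[] args) {
        // Original component: three sections of rules.
        List<List<String>> original = List.of(
                List.of("\"=\" ASSIGN"),
                List.of("{identifier} KEY"),
                List.of("<<ANY>> error"));
        // New component: adds a rule to the first section only (hypothetical rule).
        List<List<String>> extension = List.of(
                List.of("\":\" ASSIGN"),
                List.<String>of(),
                List.<String>of());
        System.out.println(weave(extension, original));
    }
}
```

Keeping the sections separate is what lets a catch-all rule such as `<<ANY>>` stay last even after new rules are added.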
![Page 30: MetaLexer : A Modular Lexical Specification Language](https://reader035.vdocument.in/reader035/viewer/2022062310/568161bd550346895dd19a0f/html5/thumbnails/30.jpg)
Results: applied to three projects with complex scanners:
• AspectJ (abc and extensions)
• Matlab (Annotations and AspectMatlab extensions)
• MetaLexer
![Page 31: MetaLexer : A Modular Lexical Specification Language](https://reader035.vdocument.in/reader035/viewer/2022062310/568161bd550346895dd19a0f/html5/thumbnails/31.jpg)
AspectJ and Extensions

    %%embed
    %name perclause
    %host aspect_decl
    %guest pointcut
    %start [PERCFLOW PERCFLOWBELOW PERTARGET PERTHIS] LPAREN
    %end RPAREN
    %pair LPAREN, RPAREN

    %%embed
    %name pointcut
    %host java, aspect
    %guest pointcut
    %start POINTCUT
    %end SEMICOLON
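The two embeddings above can be used to sketch how a meta-lexer might drive component transitions from a stream of meta-tokens. The Java below is a simplified model under stated assumptions: a start meta-pattern is matched against the suffix of the meta-tokens seen since the last transition, active embeddings are kept on a stack, and %pair is assumed to mean that the end meta-token is only recognized once every opening meta-token has been closed. The class and method names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Set;

/** Simplified model of meta-lexing: transitions driven by meta-tokens (assumed semantics). */
class MetaLexerSketch {
    /** One %%embed block; the start meta-pattern is a sequence of meta-token classes. */
    static final class Embedding {
        final String name; final Set<String> hosts; final String guest;
        final List<Set<String>> start; final String end;
        final String pairOpen, pairClose;              // null when there is no %pair
        Embedding(String name, Set<String> hosts, String guest, List<Set<String>> start,
                  String end, String pairOpen, String pairClose) {
            this.name = name; this.hosts = hosts; this.guest = guest; this.start = start;
            this.end = end; this.pairOpen = pairOpen; this.pairClose = pairClose;
        }
    }

    /** An active embedding: the host to return to and the number of unclosed pairs. */
    static final class Frame {
        final Embedding embedding; final String host; int depth = 0;
        Frame(Embedding embedding, String host) { this.embedding = embedding; this.host = host; }
    }

    /** True if the last pattern.size() meta-tokens of history match the pattern. */
    static boolean suffixMatches(List<String> history, List<Set<String>> pattern) {
        int offset = history.size() - pattern.size();
        if (offset < 0) return false;
        for (int i = 0; i < pattern.size(); i++) {
            if (!pattern.get(i).contains(history.get(offset + i))) return false;
        }
        return true;
    }

    static List<String> run(String startComponent, List<Embedding> layout, List<String> metaTokens) {
        List<String> trace = new ArrayList<>();
        Deque<Frame> stack = new ArrayDeque<>();
        List<String> history = new ArrayList<>();
        String current = startComponent;
        for (String tok : metaTokens) {
            Frame top = stack.peek();
            if (top != null) {
                Embedding e = top.embedding;
                if (tok.equals(e.pairOpen)) { top.depth++; continue; }
                if (tok.equals(e.pairClose) && top.depth > 0) { top.depth--; continue; }
                if (tok.equals(e.end)) {               // balanced, so the embedding ends here
                    stack.pop();
                    trace.add(current + " -> " + top.host + " [end " + e.name + "]");
                    current = top.host;
                    history.clear();
                    continue;
                }
            }
            history.add(tok);
            for (Embedding e : layout) {
                if (e.hosts.contains(current) && suffixMatches(history, e.start)) {
                    stack.push(new Frame(e, current));
                    trace.add(current + " -> " + e.guest + " [start " + e.name + "]");
                    current = e.guest;
                    history.clear();
                    break;
                }
            }
        }
        return trace;
    }

    /** The two embeddings from the slide, encoded as data. */
    static List<Embedding> aspectJLayout() {
        Embedding perclause = new Embedding("perclause", Set.of("aspect_decl"), "pointcut",
                List.of(Set.of("PERCFLOW", "PERCFLOWBELOW", "PERTARGET", "PERTHIS"), Set.of("LPAREN")),
                "RPAREN", "LPAREN", "RPAREN");
        Embedding pointcut = new Embedding("pointcut", Set.of("java", "aspect"), "pointcut",
                List.of(Set.of("POINTCUT")), "SEMICOLON", null, null);
        return List.of(perclause, pointcut);
    }

    public static void main(String[] args) {
        System.out.println(run("aspect_decl", aspectJLayout(),
                List.of("PERTARGET", "LPAREN", "LPAREN", "RPAREN", "RPAREN")));
    }
}
```

In the `main` example the inner LPAREN/RPAREN pair is counted rather than treated as the end of the per-clause, so only the final RPAREN returns control to aspect_decl.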
![Page 32: MetaLexer : A Modular Lexical Specification Language](https://reader035.vdocument.in/reader035/viewer/2022062310/568161bd550346895dd19a0f/html5/thumbnails/32.jpg)
Using MetaLexer for an extensible front end for McLab
PLDI 2011 Tutorial on McLab!
![Page 33: MetaLexer : A Modular Lexical Specification Language](https://reader035.vdocument.in/reader035/viewer/2022062310/568161bd550346895dd19a0f/html5/thumbnails/33.jpg)
MetaLexer scanner implemented in MetaLexer
• 1st version of the MetaLexer scanners written in JFlex: one lexer for components and one for layouts.
• 2nd version implemented in MetaLexer, with many components shared between the component lexer and the layout lexer.
![Page 34: MetaLexer : A Modular Lexical Specification Language](https://reader035.vdocument.in/reader035/viewer/2022062310/568161bd550346895dd19a0f/html5/thumbnails/34.jpg)
Related Work
• Ad-hoc systems with separate scanner / LALR parser: Polyglot, JastAdd, abc
• Recursive-descent scanner/parser: ANTLR and systems using ANTLR
• Scannerless systems: Rats! (PEGs)
• Integrated systems: Copper (modified LALR parser which communicates with a DFA-based scanner)
![Page 35: MetaLexer : A Modular Lexical Specification Language](https://reader035.vdocument.in/reader035/viewer/2022062310/568161bd550346895dd19a0f/html5/thumbnails/35.jpg)
Conclusions
• MetaLexer allows one to specify modular and extensible scanners suitable for any system that works with JFlex.
• Two main ideas: meta-lexing and component/layout inheritance.
• Used in large projects such as abc, McLab and MetaLexer itself.
• Available at: www.sable.mcgill.ca/metalexer