polyglot an extensible compiler framework for java
DESCRIPTION
Polyglot An Extensible Compiler Framework for Java. Nathaniel Nystrom Michael R. Clarkson Andrew C. Myers Cornell University. Language extension. Language designers often create extensions to existing languages e.g., C++, PolyJ, GJ, Pizza, AspectJ, Jif, ArchJava, ESCJava, Polyphonic C#, ... - PowerPoint PPT PresentationTRANSCRIPT
PolyglotAn Extensible Compiler Framework for
Java
Nathaniel NystromMichael R. Clarkson
Andrew C. Myers
Cornell University
2
Language extension• Language designers often create
extensions to existing languages• e.g., C++, PolyJ, GJ, Pizza, AspectJ, Jif,
ArchJava, ESCJava, Polyphonic C#, ...• Want to reuse existing compiler
infrastructure as much as possible
• Polyglot is a framework for writing compiler extensions for Java
3
Requirements• Language extension
• Modify both syntax and semantics of the base language
• Not necessarily backward compatible• Goals:
• Easy to build and maintain extensions• Extensibility should be scalable
• No code duplication• Compilers for language extensions should
be open to further extension
4
Rejected approaches• In-place modification
• Macro languages• Limited to syntax extensions• Semantic checks after macro expansion
basecompiler
1.0
bug fixes &upgrades
basecompiler
2.0
copy &
modify
extensioncompiler
1.0
copy &
modify(again)
extensioncompiler
2.0
bug fixes &upgrades (again)
5
Polyglot• Base compiler is a complete Java front
end• 25K lines of Java
• Name resolution, inner class support, type checking, exception checking, uninitialized variable analysis, unreachable code analysis, ...
• Can reuse and extend through inheritance
6
Scalable extensibility
• Most compiler passes are sparse:
AST Nodes
Passes
Changes to the compiler should be proportional to changes in the
language.
+ if x e.f =
name resolution
type checking
exception checking
constant folding
+ if x e.f =
name resolution
type checking
exception checking
constant folding
+ if x e.f =
name resolution
type checking
exception checking
constant folding
+ if x e.f =
name resolution
type checking
exception checking
constant folding
+ if x e.f =
name resolution
type checking
exception checking
constant folding
7
Non-scalable approaches
Visitors
pass as ASTnode method(“naive OO”)
Polyglot
Easy to add or modifyPasses AST nodes
Using
8
Javasource
Javatarget
Javaparser
Codegenerator
AST rewritingpasses
Base Polyglot compiler
Polyglot architecture
Extsource
Javatarget
Extparser
Codegenerator
AST rewritingpasses
Ext2
sourceJava
target
Ext2
parserCode
generatorAST rewriting
passes
9
Architecture details• Parser written using PPG
• Adds grammar inheritance to Java CUP• AST nodes constructed using a node
factory• Decouples node types from implementation
• AST rewriting passes:• Each pass lazily creates a new AST• From naive OO: traverse AST invoking a
method at each node• From visitors: AST traversal factored out
10
Example: PAO• Primitive types as subclasses of Object• Changes type system, relaxes Java
syntax• Implementation: insert boxing and
unboxing code where needed
HashMap m;m.put(“two”, 2);int v = (int) m.get(“two”);
HashMap m;m.put(“two”, new Integer(2));int v = ((Integer) m.get(“two”)).intValue();
11
PAO implementation• Modify parser and type-checking pass
to permit e instanceof int• Parser changes with PPG:
include “java.cup”drop { rel_expr ::= rel_expr INSTANCEOF
ref_type }extend rel_expr ::= rel_expr:a INSTANCEOF
type:b {: RESULT = node_factory.Instanceof(a, b); :}
• Add one new pass to insert boxing and unboxing code
12
Implementing a new pass• Want to extend Node interface with rewrite() method
• Default implementation: identity translation• Specialized implementations: boxing and unboxing
• Mixin extensibility: extensions to a base class should be inherited by subclasses
typeCheck()
codeGen()
condthenelse
typeCheck()
codeGen()
lhsrhs
typeCheck()
codeGen()
Node
If Add
typeCheck()
codeGen()rewrite()
condthenelse
typeCheck()
codeGen()rewrite()
lhsrhs
typeCheck()
codeGen()rewrite()
13
Inheritance is inadequate
typeCheck()
codeGen()
condthenelse
typeCheck()
codeGen()
lhsrhs
typeCheck()
codeGen()
Node
If Add
typeCheck()
codeGen()rewrite()
condthenelse
typeCheck()
codeGen()rewrite()
lhsrhs
typeCheck()
codeGen()rewrite()
PaoNode
PaoIf PaoAdd
14
Inheritance is inadequatetypeCheck()codeGen()
Node
typeCheck()codeGen()
typeCheck()codeGen()
typeCheck()codeGen()
typeCheck()codeGen()
typeCheck()codeGen()typeCheck()
codeGen()typeCheck()codeGen()typeCheck()
codeGen()typeCheck()codeGen()
typeCheck()codeGen()typeCheck()
codeGen()typeCheck()codeGen()typeCheck()
codeGen()
typeCheck()codeGen()typeCheck()
codeGen()typeCheck()codeGen()typeCheck()
codeGen()typeCheck()codeGen()
typeCheck()codeGen()typeCheck()
codeGen()typeCheck()codeGen()typeCheck()
codeGen()typeCheck()codeGen()
typeCheck()codeGen()typeCheck()
codeGen()typeCheck()codeGen()typeCheck()
codeGen()typeCheck()codeGen()
typeCheck()codeGen()rewrite()
typeCheck()codeGen()rewrite()
typeCheck()codeGen()rewrite()
typeCheck()codeGen()rewrite()
typeCheck()codeGen()rewrite()
PaoNode
typeCheck()codeGen()rewrite()
typeCheck()codeGen()rewrite()
typeCheck()codeGen()rewrite()
typeCheck()codeGen()rewrite()
typeCheck()codeGen()rewrite()
typeCheck()codeGen()rewrite()
typeCheck()codeGen()rewrite()
typeCheck()codeGen()rewrite()
typeCheck()codeGen()rewrite()
typeCheck()codeGen()rewrite()
typeCheck()codeGen()rewrite()
typeCheck()codeGen()rewrite()
typeCheck()codeGen()rewrite()
typeCheck()codeGen()rewrite()
typeCheck()codeGen()rewrite()
typeCheck()codeGen()rewrite()
typeCheck()codeGen()rewrite()
typeCheck()codeGen()rewrite()
typeCheck()codeGen()rewrite()
typeCheck()codeGen()rewrite()
typeCheck()codeGen()rewrite()
typeCheck()codeGen()rewrite()
typeCheck()codeGen()rewrite()
typeCheck()codeGen()rewrite()
15
Extension objects
Use composition to mixin methods and fields into AST node classes
exttypeCheck(
)codeGen()
extcondthenelse
typeCheck()
codeGen()
extlhsrhs
typeCheck()
codeGen()
Node
IfAdd
extrewrite()
PaoExt
PAO extension objects; installed into
all nodes by node factory
null
16
Extension objects
Extension objects have their own ext field to leave extension open
exttypeCheck(
)codeGen()
extcondthenelse
typeCheck()
codeGen()
extlhsrhs
typeCheck()
codeGen()
Node
IfAdd
extrewrite()
PaoExt
exttypeCheck()ext_type_inf
onull
17
Method invocation• A method may be implemented in the node
or in any one of several extension objects.
• Extension should call node.ext.ext.typeCheck()
• Base compiler should call: node.typeCheck()
• Cannot hardcode the calls
exttypeCheck(
)codeGen()
Nodeext
rewrite()
PaoExtext
typeCheck()ext_type_inf
onull
18
Delegate objects• Each node & extension object has a del field• Delegate object implements same interface as
node or ext• Directs call to appropriate method implementation
• Ex: node.del.typeCheck()• Ex: node.ext.del.rewrite()
• Run-time overhead < 2%
delext
typeCheck()
codeGen()
Nodedelext
rewrite()
PaoExtdelext
typeCheck()ext_type_inf
o
typeCheck()
codeGen()
{ node.ext.ext.typeCheck() }{ node.codeGen() }
JavaDel
null
19
Scalable extensibility• To add a new pass:
• Use an extension object to mixin default implementation of the pass for the Node base class
• Use extension objects to mixin specialized implementations as needed
• To change the implementation of an existing pass
• Use delegate object to redirect to method providing new implementation
• To create an AST node type:• Create a new subclass of Node• Or, mixin new fields to existing node using an
extension object
20
Polyglot family tree
Polyglot base (Java)
parameterizedtypes
Coffer PolyJ Jif
PAO
Jif/split
JMatch covariantreturn
21
Results• Can build small extensions in hours or
days• 10% of base code is interfaces and
factoriesExtension #
Tokens
% of Base
Polyglot base (Java) 166K 100 Jif 129K 78 JMatch 108K 65 Jif/split 99K 60 PolyJ 79K 48 Coffer 24K 14 PAO 6.1K 3.6 parameterized types
3.2K 2
covariant return 1.6K 1javac 1.1 132K 80
22
Related work• Other extensible compilers
• e.g., CoSy, SUIF• e.g., JastAdd, JaCo
• Macros• e.g., EPP, Java Syntax Extender, Jakarta• e.g., Maya
• Visitors• e.g., staggered visitors, extensible visitors
23
Conclusions• Several Java extensions have been
implemented with Polyglot• Programmer effort scales well with size
of difference with Java• Extension objects and delegate objects
provide scalable extensibility• Download from:
http://www.cs.cornell.edu/projects/polyglot
24
AcknowledgmentsBrandon Bray JMatch
Michael Brukman PPG
Steve Chong Jif, Jif/split, covariant return
Matt Harren JMatch
Aleksey Kliger JLtools, PolyJ
Jed Liu JMatch
Naveen Sastry JLtools
Dan Spoonhower JLtools
Steve Zdancewic Jif, Jif/split
Lantian Zheng Jif, Jif/split
http://www.cs.cornell.edu/projects/polyglot
26
Mixin extensibility
typeCheck()
codeGen()
condthenelse
typeCheck()
codeGen()
lhsrhs
typeCheck()
codeGen()
Node
If Add
typeCheck()
codeGen()rewrite()
condthenelse
typeCheck()
codeGen()rewrite()
lhsrhs
typeCheck()
codeGen()rewrite()
Inheritance does not provide mixin extensibility:when a base class is extended, subclasses should
inherit the changes
27
Other Polyglot features• Quasi-quoting library
• Useful for translation from extension language AST to base language or intermediate language AST
qqStmt(“if (%e.relabelsTo(%e)) %s;
else %s;”,
new Object[] { L, Li, then_body, else_body });
• Automatic separate compilation• Serialize type information and store in the AST• Encoded into the class file via javac• Extracted from class file using reflection
• Data-flow analysis framework
28
PAO rewriting
• rewrite(ts) called for each AST node:
class PaoExt extends Ext { Node rewrite(PaoTypeSystem ts) { return node(); } }
class PaoInstanceofExt extends PaoExt { Node rewrite(PaoTypeSystem ts) { Instanceof e = (Instanceof) node(); Type rtype = e.compareType(); // e.g., “e instanceof int” “e instanceof Integer” if (rtype.isPrimitive()) return e.compareType(ts.boxedType(rtype)); else return n; } }
29
Node factories• Each extension has a node factory (nf)• To create a node of type T, call method
nf.T()• T() may return an extension-specific
subclass of T• T() attaches the extension and
delegate objects to T via a call to extT()• Mixin extensibility: if T is a subclass of
S, then extT() will call extS()
30
Results• Can build small extensions in hours,
days• 10% of base compiler code is interfaces
and factories Extension Lexer
Parser
Total
% of Base
javac 1.1 (excl. bytecode asm)
119K
72
Polyglot base 2.7K
11K 166K
100
Jif 2.7K
5.4K 129K
78
JMatch 2.7K
14K 108K
65
PolyJ 2.7K
3.5K 79K 48
Coffer 2.7K
2.7K 24K 14
PAO 2.7K
169 6.1K
3.6
Param 0 0 3.2K
2
covariant return 0 0 1.6K
1
31
Why not output bytecode?• Wanted to be able to read the output• The symmetry is satisfying• Limitations of Java as a target language
• Scoping rules sometimes make it difficult to output Java code, especially with inner classes
• Lack of goto can make generated control flow inefficient