This work was performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344. Lawrence Livermore National Security, LLC
Survey of Program Transformation Technologies
LLNL-PRES-607473
Chunhua (Leo) Liao, Daniel J. Quinlan, and Adrian Prantl
Software Institute for Abstractions and Methodologies for HPC Simulations Codes on Future Architectures workshop
Dec. 10th, 2012 Chicago, IL, USA
Lawrence Livermore National Laboratory 2
Outline
Big picture
Program transformation techniques
• T1: String-based (scripting)
• T2: Compiler-based (direct IR modification)
• T3: Rule-based term rewrite
• T4: Semantic patches
Our relevant efforts
Summary
What is a program transformation?
Definition: modifications to an input program to generate an output program
— A program: a sequence of statements/instructions written to perform a specified task with a computer
Approaches:
— Manual: prohibitively expensive
– E.g. porting to a new platform at 160 lines per programmer day*: 17 years for 1 million SLOC
— Automated (semi-automated)
*P. Newcomb, R. Doblar, Automated Transformation of Legacy Systems, CrossTalk, December 2001.
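The 17-year estimate above can be checked with a few lines of arithmetic. This is only a sketch: the 365-day and 250-day year lengths are assumptions made here, not figures from the cited article. The quoted 17 years appears to correspond to one programmer working every calendar day.

```python
# Worked arithmetic for the manual-porting estimate.
SLOC = 1_000_000        # program size quoted on the slide
LINES_PER_DAY = 160     # porting rate quoted on the slide

programmer_days = SLOC / LINES_PER_DAY

# Assumption: a 365-day year reproduces the slide's figure.
years_calendar = programmer_days / 365

# Assumption: about 250 working days per year gives a longer estimate.
years_working = programmer_days / 250

print(f"{programmer_days:.0f} programmer days")
print(f"{years_calendar:.1f} years at 365 days/year")
print(f"{years_working:.1f} years at 250 days/year")
```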
[Figure: Program X → Modification → Program Y]
Program transformations in software life-cycle/programming models
Code generation
• Programming model implementation: e.g. OpenMP, UPC, X10
• DSLs to general purpose code
• Compilation: source to binary
Program optimizations
• Loop unrolling, tiling, interchanging
• Inlining, parallelization, vectorization
• Autotuning (empirical tuning): code variant generation
Code migration/porting
• Fortran to C++, C++ to X10, etc.
• Linux to Windows, Desktop to Embedded Systems, CPUs to GPUs, …
Code refactoring
• Variable renaming, code obfuscation
• Push member method up, extract code to new method, …
Aspect-oriented programming
• Cross-cutting issues: inject support for logging, resilience, persistence, etc.
Program analysis
• Normalization, instrumentation (coverage analysis)
…
Challenges to practical program transformations
Sheer size
• Applications with millions of lines
Multiple programming languages
• Real apps mix C/C++/Fortran, OpenMP/CUDA, Python/Perl ...
Multiple configurations
• #if .. #elif .. #endif due to algorithm, library, platform variants
Representation of programs
• Parsing and constructing internal representation
Analysis
• Where (location & scope) and when (eligibility & profitability)
Modification
• Correctness: individual transformations and their combinations
T1: String-based transformation (scripting)
Peephole transformations: variable renaming, etc.
• Representation: original string format
• Analysis: pattern match using regular expressions
• Modification: string replacement
Example:
sed -e 's/old_pattern/new_stuff/g' inputFileName > outputFileName
✔ No need to parse input program
✔ Easy to learn and quick to use
✔ Widely/immediately available
✘ Insufficient information: no symbol resolution, no control flow graph (CFG)
✘ Only for localized simple transformations
✘ No access to advanced analysis
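The "insufficient information" drawback can be illustrated with a small sketch. The C snippet and the `rename` helper below are hypothetical, made up for this illustration. A word-boundary regular expression renames the variable `i`, but because the script has no notion of string literals or comments, it rewrites those as well.

```python
import re

# Hypothetical C snippet used only for illustration.
source = r'''int i = 0;
int idx = i + 1;
printf("i = %d\n", i);  /* i is the loop counter */
'''

def rename(text, old, new):
    """Rename identifier `old` to `new` using only a word-boundary regex."""
    return re.sub(r'\b' + re.escape(old) + r'\b', new, text)

result = rename(source, "i", "count")
print(result)
```

The word boundary keeps `idx` and `int` intact, yet the text inside the string literal and the comment also changes, because a regular expression cannot tell code from non-code.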
T2: Direct IR (Intermediate Representation) modification
Based on mature compiler technologies
• Representation:
— Parsing to IR: high levels to low levels, with symbol tables
• Analysis:
— Control flow, data flow, dependence, etc.
• Modification:
— Procedural code directly manipulating IR
• Abundant choices: GCC, Open64, ROSE, Cetus, LLVM, …
— Classic vs. source-to-source
Example: Open64 using multiple levels of IRs
[Figure: Open64 compilation flow. Front ends: GCC (C/C++) and Cray (Fortran). IR levels, from highest to lowest: Very high WHIRL, High WHIRL, Mid WHIRL, Low WHIRL. Components: Inliner (-INLINE), Inter Procedural Analyzer (-O3 -IPA), Loop Nest Optimizer, Global Scalar Optimizer, Code Generation. Lowering steps: Lower to High IR, Lower I/O, Lower Mid W, Lower all. WHIRL2 C/Fortran (-CLIST/-FLIST) emits .w2c.c and .w2c.f. Other labels: -O0, -O2/O3, "Take either path (only for f90)". Targets: IA-64, x86, MIPS, …]
http://www.open64.net/
Example: ROSE using a single high level AST
[Figure: ROSE-based source-to-source tools. Input source code (C/C++/Fortran, OpenMP/UPC) → EDG front-end/Open Fortran Parser → Abstract Syntax Tree (AST) → generic and custom analyses/transformations/optimizations → Unparser → analyzed/transformed/optimized source code → vendor compiler → machine executable]
Developed at LLNL
http://www.roseCompiler.org
ROSE AST for a simple program
1. int main()
2. {
3. int i=0;
4. i++;
5. return i;
6. }
[Figure: ROSE AST of the program above, with nodes labeled S2, S3, S4 and S5]
Procedural code to create a for loop
// Grab a scope in which the code will be built
SgBasicBlock *func_body = func_def->get_definition ()->get_body ();
…
// for (i=0; ..; ..)
SgStatement* init_stmt = buildAssignStatement(buildVarRefExp("i"), buildIntVal(0));
// for (..; i<100; ..) The condition is an expression, so it is wrapped in an expression statement
SgExprStatement* cond_stmt =
buildExprStatement(buildLessThanOp(buildVarRefExp("i"), buildIntVal(100)));
// for (..; ..; i++) Postfix i++, not prefix ++i
SgExpression* incr_exp =
buildPlusPlusOp(buildVarRefExp("i"), SgUnaryOp::postfix);
// j++; as loop body statement
SgStatement* loop_body =
buildExprStatement(buildPlusPlusOp(buildVarRefExp("j"), SgUnaryOp::postfix));
// Build for (i=0; i<100; i++) j++; and append it to the function body
SgForStatement* for_stmt = buildForStatement (init_stmt, cond_stmt, incr_exp, loop_body);
appendStatement (for_stmt, func_body);
Generated code (bottom-up construction):
void foo()
{
int i;
int j;
for (i=0;i<100;i++)
j++;
}
The builder functions transparently handle edges and symbols.
✔ Detailed representation of input program
✔ Familiar APIs: C/C++ interfaces
✔ Access to advanced compiler analysis
✔ Arbitrary transformation: trivial to radical
✘ Parsing/building IR is hard, especially for C++
✘ Learning curve to IR (AST)
✘ Tedious/error-prone coding for traversal, pattern match, IR manipulation, etc.
T3: Rule-based term rewriting
Rewriting systems:
— Rule (lhs → rhs): a logic formalism to express transformation between objects
— If a rewrite system formed by the rules is both confluent and terminating, the order of rule application is not significant and the system converges to a normal form.
— Otherwise, a rewrite strategy is needed to control which rule is applied first.
Term rewriting: rules for terms, which are nested expressions of Atoms, Functors, Lists and Variables.
— Example term: [add_expr(X, int_val(2))]
[Figure: the example term annotated with the labels Functor, Atom, Variable and List; tree diagrams with nodes s1 to s4 before rewriting and s1 to s3 after]
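A minimal sketch of how a rule lhs → rhs is applied to a term follows. The encoding is an assumption made for illustration: terms are Python tuples whose first element is the functor name, and strings starting with an uppercase letter act as variables (a Prolog-style convention). The rule `add_expr(X, int_val(0)) → X` is hypothetical and does not come from the slides.

```python
def is_var(t):
    # Assumed convention: variables are strings starting with an uppercase letter.
    return isinstance(t, str) and t[:1].isupper()

def match(pattern, term, env=None):
    """Return variable bindings if `pattern` matches `term`, else None."""
    env = dict(env or {})
    if is_var(pattern):
        if pattern in env:
            return env if env[pattern] == term else None
        env[pattern] = term
        return env
    if isinstance(pattern, tuple) and isinstance(term, tuple):
        if len(pattern) != len(term):
            return None
        for p, t in zip(pattern, term):
            env = match(p, t, env)
            if env is None:
                return None
        return env
    return env if pattern == term else None

def substitute(template, env):
    """Instantiate the right-hand side with the bindings found by match()."""
    if is_var(template):
        return env[template]
    if isinstance(template, tuple):
        return tuple(substitute(t, env) for t in template)
    return template

def rewrite_once(rule, term):
    """Apply rule (lhs, rhs) at the root of `term`; None if it does not match."""
    lhs, rhs = rule
    env = match(lhs, term)
    return substitute(rhs, env) if env is not None else None

# The slide's term add_expr(X, int_val(2)) used as a pattern.
pattern = ("add_expr", "X", ("int_val", 2))
bindings = match(pattern, ("add_expr", ("var_ref", "i"), ("int_val", 2)))

# Hypothetical simplification rule: add_expr(X, int_val(0)) -> X
rule = (("add_expr", "X", ("int_val", 0)), "X")
simplified = rewrite_once(rule, ("add_expr", ("var_ref", "i"), ("int_val", 0)))
```

Here `bindings` maps `X` to the subterm `var_ref(i)`, and `simplified` is that subterm alone.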
Term rewriting applied to programs
Programs -> trees/terms -> rewriting -> trees/terms -> programs
• Representation: abstract syntax tree (AST) == nested terms
• Analysis: pattern match
• Modification: substitution (term replacement) specified via rules/strategies
[Figure: example structure of a rewrite rule, lhs -> rhs, showing the pattern to match and the pattern to substitute; example application of rewriting, showing the input structure that contains the pattern and the output structure after rewriting. Colors indicate node types, a box indicates arbitrary substructure.]
Example: Stratego/XT (aka Spoofax)
Stratego/XT is an implementation of a term rewrite system
• Stratego: a language for specifying transformations
— Rewrite rules: transformations
— Custom strategies to apply rewrite rules: traversals like innermost, topdown, bottomup, repeat, etc.
• XT: collection of tools
— Parser generator: parser to generate nested terms
— Pretty printer generator: unparser to generate source code
— Grammar tools
http://strategoxt.org/Spoofax/
Term rewrite: example
Rule R1: lhs_term -> rhs_term
While(e, stm) -> If(e, DoWhile(stm, e))
Input: while(e) { stmts; }
Output: if(e) do { stmts; } while(e);
Strategies:
simplify = bottomup(repeat(R1 <+ ... <+ Rn))
simplify = topdown(R1 <+ ... <+ Rn)
✔ More user-friendly than others
✔ High level transformation rules and their application strategies
✘ Limited access to compiler analysis
✘ Learning curve for the transformation language
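The rule R1 and the bottomup/topdown strategies can be imitated in a few lines of Python. This is only a sketch under stated assumptions, not Stratego itself: terms are tuples whose first element is the constructor name, and the helper names `r1`, `try_` and `all_children` are invented here. Unlike Stratego strategies, these helpers never fail; a rule that does not apply simply leaves the term unchanged.

```python
def r1(term):
    """R1: While(e, stm) -> If(e, DoWhile(stm, e)); None when it does not apply."""
    if isinstance(term, tuple) and len(term) == 3 and term[0] == "While":
        _, e, stm = term
        return ("If", e, ("DoWhile", stm, e))
    return None

def try_(rule):
    """Apply `rule` once, or return the term unchanged if it does not apply."""
    def strategy(term):
        out = rule(term)
        return term if out is None else out
    return strategy

def repeat(rule):
    """Apply `rule` at the root until it no longer applies."""
    def strategy(term):
        while True:
            out = rule(term)
            if out is None:
                return term
            term = out
    return strategy

def all_children(strategy, term):
    """Apply `strategy` to each direct subterm, keeping the constructor name."""
    if isinstance(term, tuple):
        return (term[0],) + tuple(strategy(c) for c in term[1:])
    return term

def bottomup(strategy):
    def s(term):
        return strategy(all_children(s, term))  # children first, then the node
    return s

def topdown(strategy):
    def s(term):
        return all_children(s, strategy(term))  # node first, then its children
    return s

# Nested loops: while(c1) { while(c2) { body } }
program = ("Block", ("While", "c1", ("While", "c2", "body")))

simplify_bottomup = bottomup(repeat(r1))
simplify_topdown = topdown(try_(r1))

expected = ("Block",
            ("If", "c1",
             ("DoWhile", ("If", "c2", ("DoWhile", "body", "c2")), "c1")))
```

For this single rule both traversal orders reach the same result; with several interacting rules the choice of strategy can change the outcome.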
T4: Semantic patches - Coccinelle
Coccinelle: program matching and transformation tool for unpreprocessed C code
• Systematic bug finding and fixing
• Collateral evolutions: changes to APIs -> changes to client code
Semantic Patch Language (SmPL)
• Declarative, patch-like syntax
• Supports type declarations
• Language-aware pattern matching (NOT literal matching!)
• Abstracts away differences in spacing, indentation, comments, coding style variants, irrelevant code, etc.
http://coccinelle.lip6.fr/
Transformation Engine*
[Figure: Coccinelle transformation engine. C files: Parse C files → Translate to IR/CFG. Semantic patch: Parse semantic patch → Expand isomorphisms → Translate to CTL. Both feed: Matching using model checking → Modify IR/CFG → Unparse]
CTL: Computation Tree Logic, with extra features
*Source http://coccinelle.lip6.fr/Intro_gen.pdf
Semantic Patch Language (SmPL) Examples
Example: Replace boolean expression
@@ // type declaration
expression E; constant C;
@@
- !E & C // pattern
+ !(E & C) // Replacement
Example: Change from kmalloc() with explicit init to kzalloc() with built-in init.
@@
expression x;
expression E1,E2;
@@
- x = kmalloc(E1,E2);
+ x = kzalloc(E1,E2);
... // ignore irrelevant code
- memset(x, 0, E1);
✔ Intuitive user input using patch-like language
✔ Used in Linux kernel development
✘ Only supports the C language
✘ Limited access to compiler analysis
Comparison
| | T1: Scripting (sed) | T2: Direct IR modification | T3: Term rewriting | T4: Semantic patching |
|---|---|---|---|---|
| Program representation | String | IR/AST | AST terms | IR/Control flow graph |
| Transformation specification | s/in/out/g | Low-level procedural code | High-level language for rules/strategies | Declarative patch-like language |
| Input languages | All | All (limited by frontends) | All (limited by frontends) | C only |
| Output | String | IR/Source or binary | Source (AST terms) | Source |
| Analysis | Regular expression | Data flow, control flow, dependence, etc. | Pattern match | Pattern match/Model checking |
| Ease of use | Easy | Hard | Medium | Easy |
| Power | Weak | Strong | Medium | Medium |
| Robustness | Weak | Strong | Medium | Medium |
D-TEC – “DSL Technology for Exascale Computing”
[Figure: D-TEC architecture. Labels: DSL 1..N Specification; DSL 1..N Programs; DOE Apps; Migration Process; Rosebud DSL Compiler Generator (Parser Generator, Rewrite system, Grammar system); ROSE-based DSL Compiler (Front-end, Semantic Analysis, Compiler analysis & Transformations, Sketch-based Transformations, Recording & Mapping); Levels of ROSE AST; Manual Refinement; Refinement/Transformations; Refinement/Lowering; Machine Learning & Formal Methods; Parameterized Abstract Machine Model; Vendor Compiler; Exec; Performance Tools; Resilience; X10/SEEC Runtime; Scalable Data Structures; ROSE]
http://www.dtec-xstack.org
Minitermite: term rewriting leveraging ROSE
• Minitermite connects ROSE with Stratego/XT and other term-based tools
• Rewrite C++ and Fortran
• Retains column/line/preprocessing info
• Released with ROSE already!
• Work in progress to bring semantic-patch-like functionality
Example (http://compose-hpc.sourceforge.net): ROSE+Stratego were used to transform parts of NWChem (2.9M LOC, Fortran + Global Arrays) to add instrumentation and increase parallelism. The transformation improves performance by up to 4×. [PNNL]
[Figure: Source code (C, C++, Fortran) → ROSE frontend → ROSE Abstract Syntax Tree (AST) → Minitermite (src2term --stratego) → term representation → Stratego/XT with rewrite rules → transformed term representation → Minitermite (term2src --stratego) → transformed ROSE AST → ROSE unparser → transformed source code (C, C++, Fortran)]
Summary
Program transformation
• Indispensable for each stage of software life-cycle
• Code generation, analysis, optimization, migration, reverse-engineering, etc.
Difficulties
• Theory: accurate eligibility and profitability analysis, correctness of transformation
• Practice: parsing, sheer size, mixed & complex languages, multiple configurations, diverse requirements, etc.
Solution
• Re-usable, common transformation infrastructure combining multiple techniques
— Parsing, analysis, direct AST modification, rewrite system, etc.
• Customized tools built in collaboration between compiler experts and end users
Thank You!
Questions?
Many other tools
• http://www.txl.ca/ : The TXL Programming Language. Rule-based source-to-source transformation. Traversals are part of rewrite rules.
• http://www.semdesigns.com : DMS Software Reengineering Toolkit. Commercial product.
• http://www.meta-environment.org/ : ASF+SDF Meta-Environment for interactive program analysis and transformation. It combines SDF (Syntax Definition Formalism), ASF (Algebraic Specification Formalism) and other technologies.
• http://www.rascal-mpl.org/ : Rascal, a meta programming language that combines source code analysis and manipulation
• http://www.eclipse.org/aspectj/ : AspectJ, an aspect-oriented programming extension to Java
Taxonomy of program transformation*
1. Translation (across languages)
1.1 Synthesis
1.2 Reverse engineering
1.3 Migration
1.4 Analysis
2. Rephrasing (within one language)
[Figure: the taxonomy drawn over High-Level Language X, High-Level Language Y, Low-Level Language Z and an Aspect Language, with translation arrows between languages and rephrasing loops within a language]
*Source: http://www.program-transformation.org/
Generic transformation steps
[Figure: Input program → 1. Parse → Intermediate Representation (IR)/Abstract Syntax Tree (AST) → 2. Analysis* and 3. Transformation* → 4. Unparse (pretty printer) → Output program]
*Interleaved and/or repeated
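The four steps can be walked through end to end with Python's standard `ast` module, used here only as a stand-in for a compiler infrastructure such as ROSE. The `FoldAdd` transformer and the one-line input program are hypothetical examples; `ast.unparse` needs Python 3.9 or later.

```python
import ast

class FoldAdd(ast.NodeTransformer):
    """Step 3 helper: replace `<int> + <int>` by its value (constant folding)."""
    def visit_BinOp(self, node):
        self.generic_visit(node)  # fold the operands first (bottom-up)
        if (isinstance(node.op, ast.Add)
                and isinstance(node.left, ast.Constant)
                and isinstance(node.right, ast.Constant)
                and isinstance(node.left.value, int)
                and isinstance(node.right.value, int)):
            folded = ast.Constant(value=node.left.value + node.right.value)
            return ast.copy_location(folded, node)
        return node

source = "x = 1 + 2 + y\n"

# 1. Parse: source text to AST
tree = ast.parse(source)

# 2. Analysis: count the binary operations in the program
n_binops = sum(isinstance(n, ast.BinOp) for n in ast.walk(tree))

# 3. Transformation: modify the AST in place of the original nodes
tree = ast.fix_missing_locations(FoldAdd().visit(tree))

# 4. Unparse: AST back to source text
output = ast.unparse(tree)
print(output)
```

Only the subexpression `1 + 2` is folded, because `y` is not a constant; the unparser then regenerates source text from the modified tree.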
Variants of Direct IR Modification (IR = intermediate representation)
Input languages
• Depend on the compiler frontends (parsers)
Levels of intermediate representation (IR)
• High level (close to source code, complex, many idioms)
• Low level (close to binary instructions, normalized, minimal)
Driver: data flow analysis in the compiler
• Programs: data (values) + control (control flow graph, call graph)
• Harder to implement source-to-source (complex IR)
• Examples: reaching definitions, live variables, constant propagation, partial redundancy elimination, loop optimizations, …
APIs to manipulate IR (and symbol tables)
• Traversal, creation, copy, removal, insertion, ...
Output: depends on the level of IR
• Source-to-source compilers can produce human-readable, compilable output with comments/preprocessing info, ...
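As an illustration of the data flow analyses listed above, here is a minimal sketch of live-variable analysis on a hand-written control flow graph. The block names and the use/def sets are assumptions made for this example; they model a loop of the form `for (i=0; i<100; i++) j++;` and are not extracted by any real compiler.

```python
def live_variables(blocks, succ):
    """Iterative backward data flow analysis.

    blocks: block name -> (use, defs), where `use` holds variables read
            before being written in the block and `defs` those written.
    succ:   block name -> list of successor block names.
    """
    live_in = {b: set() for b in blocks}
    live_out = {b: set() for b in blocks}
    changed = True
    while changed:  # iterate until a fixed point is reached
        changed = False
        for b, (use, defs) in blocks.items():
            out = set().union(*(live_in[s] for s in succ.get(b, [])))
            inn = use | (out - defs)
            if out != live_out[b] or inn != live_in[b]:
                live_out[b], live_in[b] = out, inn
                changed = True
    return live_in, live_out

# Hand-built CFG: entry (i=0) -> test (i<100) -> body (j++; i++) -> test; test -> exit
blocks = {
    "entry": (set(), {"i"}),
    "test": ({"i"}, set()),
    "body": ({"i", "j"}, {"i", "j"}),
    "exit": (set(), set()),
}
succ = {"entry": ["test"], "test": ["body", "exit"], "body": ["test"], "exit": []}

live_in, live_out = live_variables(blocks, succ)
```

In this model `j` is live on entry, meaning it is read before any assignment. This is the kind of fact a compiler-based tool can check before applying a transformation and that a string-based script cannot see.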