
SMACC: BEHIND THE REFACTORINGS

Jason Lecerf & Thierry Goubier

May 17, 2017


TABLE OF CONTENTS

1 Introduction

2 Overview of SmaCC
    General workflow
    From a user point of view

3 The rewrite engine
    The engine
    Anatomy of the rewrites
    And RB

4 Final words

SmaCC & rewriting Jason Lecerf & Thierry Goubier May 17, 2017 2/36


SMACC

The Smalltalk Compiler Compiler is a parser generator for Smalltalk

Originally developed by John Brant & Don Roberts

Used in Moose, Synectique, CEA, RefactoryWorkers


CONTEXT

Architecture to generate the front-end of compilers

    for DSL parsing
    for program analysis
    for program migration
    for refactoring and transformation of source code
    for compiler front-end implementation

SmaCC is written in Smalltalk and generates parsers in Smalltalk

It has the infrastructure for generating parsers in other programming languages


EXAMPLE OF USE

What you want to do:

Find patterns in code written in a (niche) language

Refactor these patterns into something new

What you will need:

The grammar for your language (if it does not already exist in SmaCC)

Your patterns and related transformations

Your program


CAPABILITIES

Automated generation of LR parsers

Input: the specification of the grammar

Output: parser for the grammar

    can execute arbitrary code at parse time
    can create an AST (Abstract Syntax Tree)

SMACC GENERATION PIPELINE

Figure: SmaCC overall pipeline


SMACC INPUT: THE GRAMMAR

In a slightly modified BNF form (Backus-Naur Form: a meta-grammar)

    CondExp
        : ArithExp
        | BitwiseExp
        | RelationalExp
        | BoolExp
        | TernaryExp
        | <lpar> CondExp <rpar>
        | "defined (" Id <rpar>
        | Id
        | Number
        ;

Sample input:

    #if NB_BITS < 16
    #elif NB_BITS < 32
    #else
    #endif

SMACC PARSER GENERATION

The generation:

Produces a DFA lexer (the scanner)

Produces either LR(1) or LALR(1) parsers

Can be augmented to support GLR parsing

Generates methods for parse table transitions

    not exactly the parse tables themselves (optimizations were done)
    no tables, just methods for the lexer

LR: standard parser for context-free grammars

LALR: merges states, resulting in a smaller memory footprint

GLR: tries all the possible transitions for a state


SMACC PARSER GENERATION OUTPUT

The generated parser is a Smalltalk package containing:

a Scanner class

a Parser class

the AST node classes

a generic AST visitor

Running the parser on an input program:

produces AST node instances

executes the arbitrary code embedded in the grammar


GRAMMAR WITH ARBITRARY CODE EXECUTION

    Expression
        : Expression 'left' "+" Expression 'right'
            {left + right}
        | Expression 'left' "-" Expression 'right'
            {left - right}
        | Expression 'left' "*" Expression 'right'
            {left * right}
        | Expression 'left' "/" Expression 'right'
            {left / right}
        | Expression 'left' "^" Expression 'right'
            {left raisedTo: right}
        | "(" Expression 'expression' ")" {expression}
        | Number 'number' {number}
        ;

    Number
        : <number> 'numberToken'
            {numberToken value asNumber}
        ;
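To make the role of the `{...}` actions concrete, here is a minimal Python sketch of the same calculator. It is not SmaCC output and not an LR parser: it swaps in precedence climbing, and the precedence levels and associativity in `PRECEDENCE` and `RIGHT_ASSOC` are assumptions, since the slide does not show them. All function names are hypothetical. Each entry of `ACTIONS` plays the role of one action block and runs as soon as its production has been recognised.

```python
import operator
import re

# One action per production, mirroring the {...} blocks of the grammar.
ACTIONS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
    "^": operator.pow,
}

# Assumed precedence and associativity (not shown on the slide).
PRECEDENCE = {"+": 1, "-": 1, "*": 2, "/": 2, "^": 3}
RIGHT_ASSOC = {"^"}


def tokenize(text):
    """Split the input into numbers, operators and parentheses."""
    return re.findall(r"\d+(?:\.\d+)?|[-+*/^()]", text)


def parse_primary(tokens):
    """Number or parenthesised expression."""
    token = tokens.pop(0)
    if token == "(":
        value = parse_expression(tokens)
        tokens.pop(0)  # consume the closing parenthesis
        return value
    return float(token)  # counterpart of {numberToken value asNumber}


def parse_expression(tokens, min_prec=1):
    """Precedence climbing; the action runs once both operands are known."""
    left = parse_primary(tokens)
    while tokens and tokens[0] in ACTIONS and PRECEDENCE[tokens[0]] >= min_prec:
        op = tokens.pop(0)
        next_min = PRECEDENCE[op] if op in RIGHT_ASSOC else PRECEDENCE[op] + 1
        right = parse_expression(tokens, next_min)
        left = ACTIONS[op](left, right)
    return left


def evaluate(text):
    return parse_expression(tokenize(text))
```

With these assumptions, `evaluate("(3 + 4) * 2")` computes the value directly during parsing, without ever building a tree.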


GRAMMAR WITH AST PRODUCTION

    Expression
        : Expression 'left' "+" 'op' Expression 'right'
            {{Expression}}
        | Expression 'left' "-" 'op' Expression 'right'
            {{Expression}}
        | Expression 'left' "*" 'op' Expression 'right'
            {{Expression}}
        | Expression 'left' "/" 'op' Expression 'right'
            {{Expression}}
        | Expression 'left' "^" 'op' Expression 'right'
            {{Expression}}
        | "(" Expression 'expression' ")" {{}}
        | Number
        ;

    Number
        : <number> {{Number}}
        ;
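As a rough picture of what the node annotations lead to, the following Python sketch mimics node classes whose fields follow the quoted names (`left`, `op`, `right`), together with a generic visitor. SmaCC generates Smalltalk classes; the Python class names, the `value` field and the `Evaluator` subclass are hypothetical stand-ins chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Number:
    value: str  # source text of the <number> token (assumed field name)


@dataclass
class Expression:
    left: object   # 'left' in the grammar
    op: str        # 'op' in the grammar
    right: object  # 'right' in the grammar


class Visitor:
    """Generic visitor: dispatches on the class name of the node."""

    def visit(self, node):
        handler = getattr(self, "visit_" + type(node).__name__)
        return handler(node)


class Evaluator(Visitor):
    """Example visitor that computes the value of an expression tree."""

    OPERATIONS = {
        "+": lambda l, r: l + r,
        "-": lambda l, r: l - r,
        "*": lambda l, r: l * r,
        "/": lambda l, r: l / r,
        "^": lambda l, r: l ** r,
    }

    def visit_Number(self, node):
        return float(node.value)

    def visit_Expression(self, node):
        left = self.visit(node.left)
        right = self.visit(node.right)
        return self.OPERATIONS[node.op](left, right)


# Tree for (3 + 4) * 2, built by hand here instead of by a parser.
tree = Expression(Expression(Number("3"), "+", Number("4")), "*", Number("2"))
result = Evaluator().visit(tree)
```

Compared with the previous grammar, the computation moves out of the parser and into a visitor, so the same tree can serve analysis, matching and rewriting.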


EXTENDED SMACC PIPELINE

Figure: Extended SmaCC pipeline


PRECONDITIONS

Parser for the language

Enable GLR parsing in the grammar

Declare a pattern token in the grammar

    usually delimited by backquotes, since hardly any language uses them


INPUT TO THE REWRITE ENGINE

    Parser: MyExpressionParser
    >>>`a` + `b`<<<
    ->
    >>>`a` `b` +<<<

Pattern: the code between the first >>> and <<<

Metavariables: `a` and `b`

Transformation: the code between >>> and <<< after the arrow


REWRITTEN OUTPUT EXAMPLE

Input program:

(3 + 4) + (4 + 3)

Rewritten program using the rewrite engine:

3 4 + 4 3 + +
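The effect of this rule on the sample input can be reproduced with a small Python sketch. It is only an illustration: the tiny parser handles nothing but integers, `+` and parentheses, the tree is made of nested tuples, and the rule is hard-coded in `rewrite` instead of being read from a rewrite file. All names are hypothetical.

```python
import re


def parse(text):
    """Parse integers, '+' and parentheses into nested ('+', a, b) tuples."""
    tokens = re.findall(r"\d+|[+()]", text)

    def primary():
        token = tokens.pop(0)
        if token == "(":
            node = expression()
            tokens.pop(0)  # consume the closing parenthesis
            return node
        return token

    def expression():
        node = primary()
        while tokens and tokens[0] == "+":
            tokens.pop(0)
            node = ("+", node, primary())
        return node

    return expression()


def rewrite(node):
    """Apply the rule  a + b  ->  a b +  to every addition, bottom-up."""
    if isinstance(node, tuple):
        _, a, b = node
        return f"{rewrite(a)} {rewrite(b)} +"
    return node  # a number is emitted unchanged


rewritten = rewrite(parse("(3 + 4) + (4 + 3)"))
# rewritten == "3 4 + 4 3 + +"
```

The parentheses disappear because the sketch only emits the text bound to the two operands and the operator, which is consistent with the output shown on the slide.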


ANATOMY OF THE REWRITE ENGINE

Figure: SmaCC rewriting process


PARSING OF THE PATTERN STRING

Metavariables can match any node (unless specified otherwise)

    can be modified to match lists of nodes or specific types of nodes

Use the GLR parser to parse the pattern

Try all the possible starting symbols (entry points) of the grammar

    e.g. methods, expressions, method calls
    not only the top entry point (often "Program")

Get all the possible ASTs for the pattern


PRELIMINARY OPERATIONS

1 Parse the program using the GLR parser

    Produces the program AST

2 Parse the pattern using the GLR parser

    If there are conflicts: we get a forest of trees
    If there are pattern nodes (metavariables): we get a forest of trees, if valid for the grammar
    Otherwise: we get a single tree


SIMPLIFIED SPLITFORPATTERNTOKEN

    if currentToken = patternToken then
        for all symbol in {tokens OR non-terminal nodes} do
            actionsToProcess ← all possible LR actions for symbol
            for all LR action in actionsToProcess do
                Check if action was not already performed
                if symbol = Token OR (symbol = Node AND action = reduction) then
                    Add a token interpretation to the current token
                    Try to perform current LR action
                else if symbol = Node AND action = shift then
                    stateStack add new ambiguous state
                end if
            end for
        end for
        Remove current pattern token state
    end if

MATCHING OF THE PATTERN

Based on ambiguity handling by GLR

Reuses the parser for your grammar

Based on the parse tables of said parser

When parsing a pattern token:

    Try all the valid action-token combinations (transitions) for the current state
    When conflicts arise (i.e. more than one transition is possible), fork the parser
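The forking behaviour can be illustrated with a deliberately simplified Python sketch. It is not an LR or GLR parser: it uses a shift-only automaton with an invented transition table, and ignores reductions entirely. The point is only to show that an ordinary token gives each live parser at most one successor, while a pattern token forks it into one successor per symbol accepted in the current state. `TRANSITIONS`, `PATTERN` and the function names are hypothetical.

```python
PATTERN = object()  # stands for a metavariable token such as `a`

# Invented transition table: state -> {symbol: next state}.
TRANSITIONS = {
    0: {"Number": 1, "Id": 1, "(": 2},
    1: {"+": 3},
    2: {"Number": 1, "Id": 1},
    3: {"Number": 4, "Id": 4},
    4: {},
}


def step(parsers, token):
    """Advance every live parser by one token, forking on pattern tokens."""
    successors = []
    for stack, reading in parsers:
        options = TRANSITIONS[stack[-1]]
        # A pattern token may stand for any symbol valid in this state.
        symbols = list(options) if token is PATTERN else [token]
        for symbol in symbols:
            if symbol in options:
                successors.append((stack + [options[symbol]], reading + [symbol]))
    return successors  # parsers with no valid transition simply disappear


def parse(tokens):
    """Return every reading of the token sequence that survives to the end."""
    parsers = [([0], [])]
    for token in tokens:
        parsers = step(parsers, token)
    return [reading for _, reading in parsers]


readings = parse([PATTERN, "+", PATTERN])
```

In this toy table the first pattern token forks three ways, the `(` fork dies on `+`, and the second pattern token forks each survivor again, leaving four readings. A forest of pattern trees arises in the same spirit, one tree per surviving parser.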


UNIFICATION ALGORITHM

    Require: patternForest, programTree
    for all programNode in programTree do    (depth-first traversal)
        for all patternTree in patternForest do
            for all patternNode in patternTree do
                if patternNode class = programNode class then
                    Try to match patternNode subnodes with programNode subnodes
                else
                    continue
                end if
            end for
        end for
    end for
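The loop structure above can be sketched in Python as follows. This is a simplified reading, not SmaCC's implementation: only the root of each pattern tree is tried against each program node, tokens are plain strings, and the rule that a metavariable used twice must bind to equal subtrees is an assumption. `Node`, `Meta`, `match`, `walk` and `find_matches` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    kind: str                                   # plays the role of the node class
    children: list = field(default_factory=list)


@dataclass
class Meta:
    name: str                                   # a metavariable such as `a`


def match(pattern, node, context):
    """Match one pattern node against one program node, filling context."""
    if isinstance(pattern, Meta):
        bound = context.setdefault(pattern.name, node)
        return bound == node                    # assumed: repeated names must agree
    if isinstance(pattern, str) or isinstance(node, str):
        return pattern == node                  # tokens match on their string
    if pattern.kind != node.kind or len(pattern.children) != len(node.children):
        return False
    return all(match(p, n, context)
               for p, n in zip(pattern.children, node.children))


def walk(node):
    """Depth-first traversal of the program tree."""
    yield node
    if isinstance(node, Node):
        for child in node.children:
            yield from walk(child)


def find_matches(pattern_forest, program_tree):
    """Return (matched node, bindings) pairs in traversal order."""
    results = []
    for program_node in walk(program_tree):
        for pattern_tree in pattern_forest:
            context = {}
            if match(pattern_tree, program_node, context):
                results.append((program_node, context))
                break                           # first matching interpretation wins
    return results


program = Node("Add", [Node("Add", ["3", "+", "4"]), "+",
                       Node("Add", ["4", "+", "3"])])
pattern = Node("Add", [Meta("a"), "+", Meta("b")])
matches = find_matches([pattern], program)
```

Here the pattern matches the outer addition and both inner ones, and each match carries its own bindings for `a` and `b`.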


UNIFICATION ALGORITHM

Figure: SmaCC rewriting process


EQUALITY

    ArgumentListPar
        : "(" (Expression 'argument'
          ("," Expression 'argument')* )? ")"

Node equality: the classes are identical and all subnodes and subtokens match

Token equality: both values are identical (same string)

    Here: the left and right parenthesis tokens

Node collection equality: every individual node matches

    Here: the list of n Expression arguments

Token collection equality: every individual token matches

    Here: the list of n − 1 commas

CONTEXT & METAVARIABLE BINDING

Figure: SmaCC binding & rewriting process


REWRITING

When nodes match, they are stored in the context

Even if it seems intuitive, SmaCC does not perform AST rewriting

    Is rewriting a part of a tree really that simple?
    And what if I rewrite into another programming language?

Rewriting only replaces the source text of the matched nodes with the source text of the transformation

    i.e. the source text of the transformation is not parsed
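A source-level rewrite of this kind can be sketched in a few lines of Python. The sketch assumes that each match records the character span of the matched node and the source text bound to each metavariable; the function names and the `(start, stop, replacement)` tuple layout are hypothetical, not SmaCC's API.

```python
import re


def expand(template, bindings):
    """Substitute each `name` placeholder with the bound source text."""
    return re.sub(r"`(\w+)`", lambda m: bindings[m.group(1)], template)


def apply_rewrites(source, rewrites):
    """Splice replacements into the source.

    rewrites is a list of (start, stop, replacement) on non-overlapping
    spans; applying them from the end keeps earlier offsets valid.
    """
    for start, stop, replacement in sorted(rewrites, reverse=True):
        source = source[:start] + replacement + source[stop:]
    return source


source = "x = 3 + 4;"
# Suppose the matcher reported the span of "3 + 4" with a -> "3", b -> "4".
replacement = expand("`a` `b` +", {"a": "3", "b": "4"})
output = apply_rewrites(source, [(4, 9, replacement)])
# output == "x = 3 4 +;"
```

Because the replacement is spliced in as plain text, it does not have to be valid in the source grammar, which is what makes rewriting into another language possible.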


WHAT ABOUT RB ?

RB (the Refactoring Browser) and SmaCC share the same creators

But Smalltalk is a very simple language (to parse)

    It is simple to specify a pattern tree directly
    RB uses the parser only a little, to complete the pattern tree
    Rule: a pattern is valid Smalltalk code
    But it may match a slightly different tree (message)

The subtree matching algorithm is exactly the same as in SmaCC

For example, see RBPatternMethodNode>>#match:inContext:


SMACC APPROACH

Put some (most) of the complexity in the parser

Use reflection on the grammar

    The grammar specifies all correct phrases (all valid sequences of tokens)
    The AST directives specify all possible nodes and trees of nodes (complete type specification)
    The parser contains all that information in the state tables
    Query it!

Match and rewrite (on a large scale... over a million lines of code)


CONCLUSION

SmaCC is a parser generator extended with pattern matching and rewriting capabilities

Uses the parser as a way to reflect on the grammar and build pattern trees

Tree traversal for matching is depth first (ASTs are not very deep)

Generalization of the Refactoring Browser to the case of an "arbitrary" grammar

Used extensively by John Brant, Thierry Goubier and others...


WIP: SMACC BOOKLET

Tutorial and documentation book for SmaCC


Thank you!

Commissariat à l'énergie atomique et aux énergies alternatives

Campus Saclay Nano-Innov F-91191 Gif-sur-Yvette Cedex
www-list.cea.fr

Établissement public à caractère industriel et commercial RCS Paris B 775 685 019