![Page 2: Bernd Fischer bfischer@cs.sun.ac.za RW713: Compiler and Software Language Engineering](https://reader036.vdocument.in/reader036/viewer/2022062805/5697c01c1a28abf838ccfd59/html5/thumbnails/2.jpg)
Parsing and Unparsing... in a Broad Sense
Vadim Zaytsev, Anya Helene Bagge: Parsing in a Broad Sense. MoDELS’14, LNCS 8767, pp. 50-67, 2014, Springer.
Program Representations
SLE tools use different program representations at different abstraction levels:
• textual: strings, tokens, ...
• structural: parse trees, ASTs, ...
• graphical: vector drawings, graphs, UML models, ...
Different representations are typically connected by pairs of transformations that run in opposite directions, for example:
text → AST (parsing)
AST → text (unparsing)
Textual Representations
• unstructured string of individual characters
  e.g. f arg = arg +1;
• flat sequence of strings (lexemes)
  – incl. spaces, line breaks, comments etc. (layout)
  e.g. f ' ' arg ' ' = ' ' arg ' ' + 1 ;
• flat sequence of typed tokens
  – with attributes and lexemes but without layout
  e.g. id(f) id(arg) = id(arg) + num(1) ;
• structured sequence of typed token groups
  e.g. id(f) id(arg) = id(arg) + num(1) ;
Structural Representations
• set of alternative parse trees (parse forest), joined by an “ambiguity node”
Structural Representations
• parse tree (incl. layout)
Structural Representations
• concrete syntax tree (without layout)
Structural Representations
• abstract syntax tree
Graphical Representations
• rasterized picture
• vector graph (drawing)
• generic graph
• abstract graph model
All representations can be merged into one “Mega-Model”.
Tokenization
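Definition: A tokenizer $\mathit{tok}_L : \mathit{Str} \to \mathit{Tok}$ for a lexical grammar $L$ maps a character sequence $c_1, \ldots, c_n$ to a token sequence $w_1, \ldots, w_k$ so that their concatenations are equal (i.e., $c_1 + \cdots + c_n = w_1 + \cdots + w_k$). Its reverse operation is $\mathit{concat}$, which is language-independent.
$\mathit{tok}_L$ and $\mathit{concat}$ satisfy the following equations:
$$\forall x \in \mathit{Str}:\ \mathit{concat}(\mathit{tok}_L(x)) = x$$
$$\forall y \in \mathit{Tok}:\ \mathit{tok}_L(\mathit{concat}(y)) = y$$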
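The round trip between a tokenizer and `concat` can be tried out on a small example. The following is a minimal sketch, assuming a made-up lexical grammar (identifiers, numbers, whitespace, single-character symbols); the regular expression and the function names `tok` and `concat` are illustrative choices, not part of the original slides.

```python
import re

# Hypothetical lexical grammar L: identifiers, numbers, layout (whitespace),
# and any other single character as a symbol. Every character is matched by
# some alternative, so no input is lost.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\s+|.", re.DOTALL)

def tok(text: str) -> list[str]:
    """Split a string into a flat sequence of lexemes, layout included."""
    return TOKEN_RE.findall(text)

def concat(lexemes: list[str]) -> str:
    """Language-independent reverse operation: glue the lexemes back together."""
    return "".join(lexemes)

source = "f arg = arg +1;"
lexemes = tok(source)
print(lexemes)

# The two equations from the definition, checked on this one input.
assert concat(lexemes) == source
assert tok(concat(lexemes)) == lexemes
```

Note that `concat` needs no knowledge of the grammar, while `tok` is parameterized by it (here through `TOKEN_RE`).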
Adding/Removing Layout
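Definition: A strip operation $\mathit{strip} : \mathit{Tok} \to \mathit{TTk}$ removes layout information, while a format operation $\mathit{format}_L : \mathit{TTk} \to \mathit{Tok}$ introduces it. $\mathit{strip}$ is language-independent.
$\mathit{strip}$ and $\mathit{format}_L$ satisfy the following equation:
$$\forall x \in \mathit{TTk}:\ \mathit{strip}(\mathit{format}_L(x)) = x$$
What about
$$\forall y \in \mathit{Tok}:\ \mathit{format}_L(\mathit{strip}(y)) = y \;?$$
This does not hold in general: $\mathit{strip}$ is not injective, because token sequences that differ only in their layout are mapped to the same stripped sequence.
Strip and format operations exist for trees and graphs as well.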
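A small experiment makes the asymmetry between the two equations concrete. This is a sketch under simple assumptions: layout is taken to be whitespace lexemes only, and the hypothetical `format_l` just puts one space between any two tokens.

```python
def strip(lexemes: list[str]) -> list[str]:
    """Tok -> TTk: drop layout lexemes (in this sketch: whitespace only)."""
    return [lx for lx in lexemes if not lx.isspace()]

def format_l(tokens: list[str]) -> list[str]:
    """TTk -> Tok: reintroduce layout, here a single space between tokens."""
    out: list[str] = []
    for i, token in enumerate(tokens):
        if i > 0:
            out.append(" ")
        out.append(token)
    return out

original = ["f", " ", "arg", " ", "=", " ", "arg", " ", "+", "1", ";"]
stripped = strip(original)

# strip(format_L(x)) = x holds: formatting only adds layout.
assert strip(format_l(stripped)) == stripped

# format_L(strip(y)) = y fails: the original layout cannot be recovered.
assert format_l(stripped) != original

# strip is not injective: a differently laid out sequence has the same image.
other = ["f", "\n", "arg", "=", "arg", "+", "1", ";"]
assert strip(other) == stripped
```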
Parsing/Unparsing
keeps layout
ignores layout
Imploding/Exploding ASTs
Pretty-Printing
What is Pretty-Printing?
A pretty printer takes an AST and a text width and
• converts the AST into text (unparsing)
• breaks the text to fit into the given text width (layout)
... so that the output is “pretty”
What is Pretty?
[Figure: the program "var x:integer; y:char; begin x := 1; y := 'a'; end" laid out for text widths w = 128, w = 40 and w = 30, followed by two alternative layouts of the same program; the line breaks of the individual layouts are not preserved.]
What is Pretty-Printing?
A pretty printer takes an AST and a text width and
• converts the AST into text (unparsing)
• breaks the text to fit into the given text width (layout)
... so that
• line breaks and indentation represent logical structure
• line breaks and indentation are used consistently
• line breaks are minimized
Pretty-Printing Architecture
Bad Ideas:
• print text during AST traversal
• post-processing on raw text
Instead:
• generate (new) AST containing text and mark-up
  – for layout hints
• interpret mark-up AST and generate raw text
(source) AST → unparsing (syntax-directed) → (mark-up) AST → layout (language-independent) → raw text
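The two-phase architecture can be sketched in a few lines. The example below assumes a hypothetical two-node AST (`Assign`, `Block`) and a deliberately simple mark-up consisting of text strings and a `BREAK` marker; only `unparse` knows the language, while `layout` works on any mark-up sequence.

```python
from dataclasses import dataclass

# Hypothetical source AST for a tiny statement language.
@dataclass
class Assign:
    name: str
    value: str

@dataclass
class Block:
    stmts: list

# Mark-up: strings are text, BREAK marks a position where a line may be broken.
BREAK = object()

def unparse(node) -> list:
    """Syntax-directed phase: one case per node type, yields mark-up, not text."""
    if isinstance(node, Assign):
        return [f"{node.name} := {node.value};"]
    if isinstance(node, Block):
        markup = ["begin"]
        for stmt in node.stmts:
            markup.append(BREAK)
            markup.extend(unparse(stmt))
        markup += [BREAK, "end"]
        return markup
    raise TypeError(f"unknown node: {node!r}")

def layout(markup: list, width: int) -> str:
    """Language-independent phase: greedy fill, breaking only at BREAK marks."""
    chunks, words = [], []
    for item in markup:
        if item is BREAK:
            chunks.append(" ".join(words))
            words = []
        else:
            words.append(item)
    chunks.append(" ".join(words))

    lines, current = [], ""
    for chunk in chunks:
        if not current:
            current = chunk
        elif len(current) + 1 + len(chunk) <= width:
            current += " " + chunk
        else:
            lines.append(current)
            current = chunk
    lines.append(current)
    return "\n".join(lines)

program = Block([Assign("x", "1"), Assign("y", "'a'")])
print(layout(unparse(program), 80))
print(layout(unparse(program), 12))
```

Keeping the mark-up as an intermediate data structure is what allows the layout phase to be reused for other languages; the greedy fill used here ignores indentation and is only one possible layout strategy.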
Oppen-style Mark-up
Oppen’s (core) algorithm uses two mark-up elements:
• blanks: positions where a line can be broken
  – can denote number of indentation spaces
• groups: sequences of elements that are printed on one line, if possible; otherwise each element is printed on its own line
  – represented as pair of opening and closing brackets
  – any two elements must be separated by a blank
  – blanks can be “inconsistent”, i.e., printer tries to fit as many elements as possible on one line before breaking
Oppen-style Mark-up
Examples:
[[var blank(2) [x:integer; blank(0) y:char;]]
blank(0) [begin blank(2) [x := 1; blank(0) y := 'a';] blank(0) end]]
vs.
[[var x:integer; blank(4) y:char;]]
blank(0) [begin blank(0) [x := 1; blank(2) y := 'a';] blank(0) end]]
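The effect of groups and blanks can be explored with a small interpreter for this kind of mark-up. The sketch below is not Oppen's actual streaming algorithm: it is a naive recursive version that supports consistent blanks only, and it assumes that a blank's number is an indentation relative to the start column of the enclosing group. The class names `Blank` and `Group` are illustrative.

```python
from dataclasses import dataclass

@dataclass
class Blank:
    indent: int = 0   # indentation (relative to the group) if the line breaks here

@dataclass
class Group:
    items: list       # strings, Blanks and nested Groups

def flat(item) -> str:
    """Render an item on a single line; every blank becomes one space."""
    if isinstance(item, str):
        return item
    if isinstance(item, Blank):
        return " "
    return "".join(flat(i) for i in item.items)

def render(item, width: int, column: int = 0) -> str:
    """Print a group on one line if it fits, otherwise break at all its blanks."""
    if isinstance(item, str):
        return item
    text = flat(item)
    if column + len(text) <= width:
        return text
    base, col, out = column, column, []
    for part in item.items:
        if isinstance(part, Blank):
            col = base + part.indent
            out.append("\n" + " " * col)
        else:
            piece = render(part, width, col)
            out.append(piece)
            last_line = piece.rsplit("\n", 1)[-1]
            col = len(last_line) if "\n" in piece else col + len(piece)
    return "".join(out)

# The first example from the slide, written with the classes above.
doc = Group([
    Group(["var", Blank(2), Group(["x:integer;", Blank(0), "y:char;"])]),
    Blank(0),
    Group(["begin", Blank(2),
           Group(["x := 1;", Blank(0), "y := 'a';"]),
           Blank(0), "end"]),
])

for w in (80, 30, 12):
    print(f"w = {w}:")
    print(render(doc, w))
```

Because breaking decisions are made per group, the same mark-up yields a one-line, a two-line, or a fully indented layout depending only on the width.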
Box-style Mark-up
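Uses boxes (similar to Oppen’s groups)
• basic boxes
  – plain strings: “foo”
  – keywords: KW [ “foo” ]
  – subtrees: _1
• horizontal boxes: H hs=x [ B B B ] places the boxes B side by side, separated by horizontal space x
• vertical boxes: V vs=y is=i [ B B B ] stacks the boxes B, with vertical space y and indentation i
• more: HV, HOV, I, ALT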
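Horizontal and vertical boxes are easy to prototype. The following sketch renders boxes to lists of lines; the reading of the options is an assumption of this example (`hs` as the number of spaces between sub-boxes, `vs` as the number of blank lines between sub-boxes, `is` as the indentation of every sub-box except the first) and may differ from what an actual Box implementation does.

```python
from dataclasses import dataclass

@dataclass
class H:
    boxes: list
    hs: int = 1       # assumed: spaces between neighbouring sub-boxes

@dataclass
class V:
    boxes: list
    vs: int = 0       # assumed: blank lines between neighbouring sub-boxes
    is_: int = 0      # assumed: indentation of all sub-boxes but the first
                      # ("is" itself is a Python keyword)

def render(box) -> list[str]:
    """Return the lines of a box; a plain string is a basic box."""
    if isinstance(box, str):
        return [box]
    if isinstance(box, H):
        lines = [""]
        for i, sub in enumerate(box.boxes):
            sub_lines = render(sub)
            sep = " " * box.hs if i > 0 else ""
            start = len(lines[-1]) + len(sep)
            lines[-1] += sep + sub_lines[0]
            # Further lines of a multi-line sub-box stay aligned with its start.
            lines += [" " * start + line for line in sub_lines[1:]]
        return lines
    if isinstance(box, V):
        lines = []
        for i, sub in enumerate(box.boxes):
            if i > 0:
                lines += [""] * box.vs
            pad = " " * box.is_ if i > 0 else ""
            lines += [pad + line for line in render(sub)]
        return lines
    raise TypeError(f"unknown box: {box!r}")

let_box = V([V(["let", "x = 1"], is_=2), V(["in", "x"], is_=2), "end"])
print("\n".join(render(let_box)))
print("\n".join(render(H(["if", "c", "then", "e"]))))
```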
Pretty-print tables can be generated.
Grammar:
"if" Exp "then" Exp -> Exp {"IfThen"}
"let" Dec* "in" {Exp ";"}* "end" -> Exp {"Let"}
Generated pretty-print table:
Exp.IfThen -- KW["if"] _1 KW["then"] _2,
Exp.Let -- KW["let"] _1 KW["in"] _2 KW["end"],
Exp.Let.1:iter-star -- _1,
Exp.Let.2:iter-star-sep -- _1 KW[";"]
Pretty-print tables can be modified.
Grammar:
"if" Exp "then" Exp -> Exp {"IfThen"}
"let" Dec* "in" {Exp ";"}* "end" -> Exp {"Let"}
Modified entry:
Exp.Let -- V vs=1 is=0 [
  V vs=1 is=2 [KW["let"] _1]
  V vs=1 is=2 [KW["in"] _2]
  KW["end"]
]
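A pretty-print table can be read as a mapping from constructors to templates with numbered placeholders. The sketch below mimics that idea with a hypothetical in-memory table; the dictionary layout, the tuple encoding of terms, and the `SEPARATORS` map (standing in for the `iter-star-sep` entry) are assumptions of this example, and the output is a flat string rather than a box term.

```python
# Hypothetical in-memory form of a pretty-print table: constructor -> template.
# Strings play the role of KW[...] entries, integers n the role of _n.
PP_TABLE = {
    "IfThen": ["if", 1, "then", 2],
    "Let": ["let", 1, "in", 2, "end"],
}

# Separators for iterated children, keyed by (constructor, child position).
SEPARATORS = {("Let", 2): ";"}

def unparse(term, sep: str = "") -> str:
    """Unparse a term (constructor, child1, child2, ...) by table lookup."""
    if isinstance(term, str):       # leaf, e.g. an identifier
        return term
    if isinstance(term, list):      # iterated children such as Dec* or {Exp ";"}*
        return (sep + " ").join(unparse(t) for t in term)
    constructor, *children = term
    parts = []
    for entry in PP_TABLE[constructor]:
        if isinstance(entry, int):
            child_sep = SEPARATORS.get((constructor, entry), "")
            parts.append(unparse(children[entry - 1], child_sep))
        else:
            parts.append(entry)
    return " ".join(parts)

term = ("Let", ["d1", "d2"], [("IfThen", "c", "e"), "x"])
print(unparse(term))
```

Changing the layout of a construct then only requires editing its table entry, not the traversal code.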
Further Reading
• Derek C. Oppen: Prettyprinting. ACM Trans. Program. Lang. Syst. 2(4):465-483 (1980).
• Philip Wadler: A prettier printer. In: The Fun of Programming. A symposium in honour of Professor Richard Bird's 60th birthday, Examination Schools, Oxford, 24-25 March 2003. http://homepages.inf.ed.ac.uk/wadler/papers/prettier/prettier.pdf
Further Reading (II)
• Mark van den Brand, Eelco Visser: Generation of Formatters for Context-Free Languages. ACM Trans. Softw. Eng. Methodol. 5(1):1-41 (1996).
• Tobi Vollebregt, Lennart C. L. Kats, Eelco Visser: Declarative specification of template-based textual editors. LDTA 2012: 8