Typed trees and tree walking in C
with struct, union, enum, and switch¹
Hayo Thielecke
University of Birmingham
http://www.cs.bham.ac.uk/~hxt
February 16, 2017
[Figure: abstract syntax tree with root +, whose children are 1 and ∗; the ∗ node has children x and 2]
¹ and pointers, of course
Introduction to this section of the module
Different kinds of trees in C
union
Struct, union and enum
union, enum and switch
Adding recursion ⇒ trees
Extended example: abstract syntax trees as C data structures
C data structures and functional programming
Example: a recursive-descent parser in C
Object orientation and the expression problem
Progression: position of this module in the curriculum
First year: Software Workshop, functional programming, Language and Logic
Second year: C/C++
Final year: Operating systems, compilers, parallel programming
Outline of the module (provisional)
I am aiming for these blocks of material:
1. pointers + struct + malloc + free ⇒ dynamic data structures in C as used in OS X
2. pointers + struct + union + tree ⇒ trees in C, such as parse trees and abstract syntax trees
3. object-oriented trees in C++: composite and visitor patterns
4. templates in C++: parametric polymorphism
An assessed exercise for each.
Trees from struct and pointers
- We have seen n-ary trees built from structures and pointers only
- recursion ends by NULL pointers
- hence the if(p) and while(p) idioms
- only one kind of node
- sufficient for some situations, e.g. much OS code
- But there are more complex trees in computer science
- different kinds of nodes with different numbers and kinds of child nodes
- needs a type system of different nodes
- canonical example: abstract syntax trees
- fundamental ideas in compiling
Struct, union, and enum idioms
- How do we represent typed trees, such as abstract syntax trees or parse trees?
- Composite pattern in OO
- In functional languages: pattern matching
- Based on and inspired by: patterns, expression problem, type theory, compilers
- Pitfall: “pattern” means different things here: OO design patterns vs pattern matching in OCaml and Haskell; usually clear from context
union syntax
The syntax of union is like that of struct:
union u {
T1 m1;
T2 m2;
...
Tk mk;
};
Structure vs union layout in memory
struct s {
T1 m1;
T2 m2;
};
[Diagram: m1 and m2 occupy separate, consecutive storage]
union u {
T1 m1;
T2 m2;
};
[Diagram: m1 or m2 occupy the same storage]
The C11 draft standard says in section 6.7.2.1 that
a structure is a type consisting of a sequence of members, whose storage is allocated in an ordered sequence
and
a union is a type consisting of a sequence of members whose storage overlap.
unions are not tagged
union u {
T1 m1;
T2 m2;
};
[Diagram: m1 or m2 occupy the same storage]
The memory does not know whether it contains data of type T1 or T2.
In C, memory contains bits without type information.
If we want a tagged union, we need to build it from struct and enum.
Quiz
union u {
char s[10];
int n;
};
int main()
{
union u x;
strncpy(x.s, "gollum", 7);
printf("%d\n", x.n);
}
What does it print?
1. gollum
2. Nothing, type error
3. 1819045735
4. 2987297274
5. Unspecified, could be any number
Does valgrind report errors?
union u {
char s[10];
int n;
};
int main()
{
union u x;
strncpy(x.s, "gollum", 7);
printf("%d\n", x.n);
}
No, valgrind is fine with the above.
We are not using any bits we shouldn’t.
The type information is not visible to valgrind.
Valgrind works on compiled code, not C source.
There are no unions there, only memory accesses.
Nesting in C type definitions
struct s1 {
T1 m;
int j;
};
Recursion in the grammar of C types:
T1⇒ struct s2 { int k; ... }
A struct may contain a type that may itself be a struct
Nesting: struct inside struct
struct s1 {
struct s2 { int k; ... } m;
int j;
};
Recursion in the grammar of C types:
T1⇒ struct s2 { int k; ... }
A struct may contain a type that may itself be a struct
Nesting struct inside struct lifted out
struct s2 { int k; ... };
struct s1 {
struct s2 m;
int j;
};
Recursion in the grammar of C types:
T1⇒ struct s2
A struct may contain a type that may itself be a struct
struct and member names
struct s1 {
struct s2 { int k; ... } m;
int j;
};
struct s1 a;
a.j = 1;
a.m.k = 2;
s2 is the name of the type, and could be omitted here.
m is the name of the nested struct as a member of the outer one.
enum = enumeration type, much as in Java
enum dwarf { thorin, oin, gloin, fili, kili };
...
enum dwarf d;
...
switch(d) {
...
case thorin: hack(orcs);
...
Implementation: small integers, e.g. thorin = 0, and so on
Tagged unions idiom
We use an enum for the tags.
Then we package the union in a struct together with the enum.
enum ABtag { isA, isB };
struct taggedAorB {
enum ABtag tag;
union {
A a;
B b;
} AorB;
};
It could be an A or a B,
and we know which by looking at the tag.
switch statement and tagged unions
struct taggedAorB {
enum ABtag tag;
union {
A a;
B b;
} AorB;
};
Access the tagged unions with switch:
struct taggedAorB x;
...
switch(x.tag) {
case isA:
// use x.AorB.a
break; // without break, control falls through into the next case
case isB:
// use x.AorB.b
break;
}
Disjoint union in set theory
union is a bit like union ∪ for sets.
One can define a disjoint union with injection tags:
A + B = {(1, a) | a ∈ A} ∪ {(2, b) | b ∈ B}
We can tell if something comes from A or B by looking at the tag, 1 or 2.
Somewhat like a switch.
(This won’t be in the exam.)
Example for union and switch: geometric shapes
- Consider geometric shapes and a function to compute their area
- A shape could be a rectangle, OR a circle, OR some other shape
- A circle has a radius
- A rectangle has a height AND a width
- OR ⇒ tagged union idiom
- AND ⇒ struct
Example: geometric shapes 2
enum shape { circle, rectangle };
struct geomobj {
enum shape shape;
union {
struct {
float height, width;
} rectangle;
struct {
float radius;
} circle;
} shapes;
};
Example: geometric shapes — constructor-like function
This function is analogous to a constructor in object-oriented languages.
It encapsulates the low-level call to malloc and performs initialisation.
struct geomobj *mkrectangle(float w, float h)
{
struct geomobj *p = malloc(sizeof(struct geomobj));
if(!p) {
fprintf(stderr, "malloc failed\n");
exit(1); // give up :(
}
p->shape = rectangle;
p->shapes.rectangle.width = w;
p->shapes.rectangle.height = h;
return p;
}
Note that both -> and . are used.
Example: geometric shapes — switch
float area(struct geomobj x)
{
switch(x.shape) {
case rectangle:
return x.shapes.rectangle.height
* x.shapes.rectangle.width;
// and so on
}
}
Xcode warns about a missing case, analogous to non-exhaustive patterns in OCaml.
Example: geometric shapes — enum and switch
Type definition:
struct geomobj {
enum shape shape;
union {
struct {
float height, width;
} rectangle;
// more shapes
} shapes;
};
Code that operates on the type:
switch(x.shape) {
case rectangle:
return x.shapes.rectangle.height
* x.shapes.rectangle.width;
// more cases and formulas for areas
Xcode warns about a missing case, analogous to non-exhaustive patterns in OCaml.
Inconsistent use of tagged union idiom
Warning: you can make mistakes like this
switch(x.shape) {
case circle:
return x.shapes.rectangle.height
* x.shapes.rectangle.width;
// ...
}
Does valgrind detect this kind of bug?
Sometimes, but not always.
Valgrind cares about bits, not about types.
unions compared to Java inheritance
In C we have unions:
union AorB {
A a;
B b;
};
In Java, we could build a hierarchy of classes:
class AorB { ... };
class A extends AorB { ... };
class B extends AorB { ... };
Now AorB can be used somewhat like a union of A and B.
This is closer to a tagged union than a union by itself.
One could use instanceof, but that is bad OO.
Naming struct and union members
- Having to invent names for struct, union and enum members is tedious
- You may wish to make a systematic naming scheme for a given situation and stick to it
- matter of taste
- Since C11, anonymous structs and unions require fewer names
- flatten the tree-like name space of nested struct and union
- less verbose when nesting structs or unions
- also needs fewer dot operators to access
- clang supports them
anonymous structure examples
Inner struct is not anonymous:
struct s1 {
struct t1 { int n; ... } b;
int n;
};
Inner struct is anonymous, lacking a member name, but the compiler still knows what q is. (In standard C11 an anonymous struct must also have no tag; a tagged form such as struct t2 { ... }; needs a compiler extension.)
struct s2 {
struct { int q; ... };
int n;
};
Not allowed: confusion about what member n is
struct s3 {
struct { int n; ... };
int n;
};
Structures containing structures or pointers to them
Anonymous structures are about using fewer names (and dots).
This is not the same as accessing via pointers or not.
struct scont {
A a;
B b;
};
[Diagram: scont contains a and b directly]
struct spoint {
A *ap;
B *bp;
};
[Diagram: spoint contains two pointers, ap and bp, to objects stored elsewhere]
On to trees
- We have seen struct + enum + union to get the tagged union idiom
- can be processed with switch
- all well and good, but ...
- this needs more pointers
- and recursion
- when we add pointers and recursion, we get typed trees
- from trees and pointers we also get graphs
Trees with values only at the leaves
enum treetag { isLeaf, isInternal };
struct intbt
{
enum treetag tag;
union {
// if tag == isLeaf use this:
int Leaf; // no recursion
// if tag == isInternal use this:
struct {
struct intbt *left; // recursion
struct intbt *right; // recursion
} Internal;
} LeafOrInternal;
};
Not the same as the trees with struct+pointer
struct twoptrs {
struct twoptrs *ptrone, *ptrtwo;
// recursion, unless NULL
int data;
// at all nodes, not just leaf nodes
};
Needs NULL pointers to terminate.
NULL pointers are simple and efficient, but a bit of a hack.
C has NULL pointers, but OCaml and C++ references can never be null.
Trees and syntax
- parsing happens every time you run clang
- parsing happens every time you look at a web page
- fundamental idea: syntax is about trees
- trees can be represented in various ways in modern languages
- good example for C/C++ module
- there are two fundamentally different ways of representing trees in C++
- once you have the tree, meaning (of programs) is tree walking
- We will use abstract syntax trees as a worked example
- similar to parse trees
- struct + union + enum
Abstract syntax tree (AST)
In principle, a parser could build the whole parse tree.
In practice, parsers build more compact abstract syntax trees:
just enough structure for the semantics (= meaning) of the language. For example:
[Figure: abstract syntax tree with root +, whose children are 1 and ∗; the ∗ node has children x and 2]
In C, we use the following for ASTs:
struct + union + enum + pointers + recursion + switch
AST for expressions in infix
E → n
E → E − E
E → E ∗ E
enum Etag {
constant, minus, times
};
struct E {
enum Etag tag;
union {
int constant;
struct {
struct E *e1;
struct E *e2;
} minus;
struct {
struct E *e1;
struct E *e2;
} times;
} Eunion;
};
http://www.cs.bham.ac.uk/~hxt/2016/c-plus-plus/ParserTree.c
Evaluation function as abstract syntax tree walk
[Figure: abstract syntax tree with root +, whose children are 1 and ∗; the ∗ node has children 7 and 2]
- each of the nodes is a struct with pointers to the child nodes (if any)
- recursive calls on subtrees
- combine the results of the recursive calls depending on node type, such as +
Null pointers and assertions
- Our syntax trees are not supposed to contain null pointers at all
- Part of the invariant of the data structure
- the recursion ends by one of the cases of the union
- we can use the assert macro from assert.h
- assert(p) means we are not expecting a null pointer here; it gives an error if the assertion fails
- assertions can be used for debugging and switched off for production code (by defining NDEBUG)
- the idea of a precondition of code is generally useful, not just in C
eval as abstract syntax tree walk
int eval(struct E *p)
{
assert(p);
switch(p->tag) {
case constant:
return p->Eunion.constant;
case plus:
return eval(p->Eunion.plus.e1)
+ eval(p->Eunion.plus.e2);
case minus:
return eval(p->Eunion.minus.e1)
- eval(p->Eunion.minus.e2);
case times:
return eval(p->Eunion.times.e1)
* eval(p->Eunion.times.e2);
default:
fprintf(stderr, "Invalid tag for struct E.\n\n");
exit(1);
}
}
Exercise
Write a pretty-printing function that outputs an expression tree in (some form of) properly indented XML. For example, 1+2 should be printed as
<plus>
<constant>
1
</constant>
<constant>
2
</constant>
</plus>
Hint: the pretty-printing function should take an integer parameter representing the current level of indentation.
Grammar for Lisp-style expressions
E → n (constant)
E → x (variable)
E → (+ L) (operator application for addition)
E → (* L) (operator application for multiplication)
E → (= x E E) (let binding)
L → E L (expression list)
L →
Lisp syntax is easy to parse.
AST for Lisp-style expressions
E → n
E → x
E → (+ L)
E → (* L)
E → (= x E E)
enum op { isplus, ismult };
enum exptag { islet, isconstant,
isvar, isopapp };
struct exp {
enum exptag tag;
union {
int constant;
char var[8];
struct {
enum op op;
struct explist *exps;
} ; // anonymous struct
struct {
char bvar[8];
struct exp *bexp;
struct exp *body;
}; // anonymous
}; // anonymous
};
AST for Lisp-style expressions
L → E L
L →
struct explist {
struct exp *head;
struct explist *tail;
};
Optimization: instead of a union, we use the standard list idiom
Environments are passed down into the tree
[Figure: let node with children x, 2, and x]
- The second x is evaluated with the environment in which x is bound to 2.
- The environment is a parameter to the evaluation function
Constructing abstract syntax trees
- C structs are like classes that contain only data members
- structs do not have constructors
- but we can write functions that do the same job
- allocate memory and initialize it to the given arguments
struct exp *mkopapp(enum op op, struct explist *el)
{
struct exp *ep = malloc(sizeof(struct exp));
if(!ep) {
fprintf(stderr, "malloc failed\n");
exit(1); // give up :(
}
ep->tag = isopapp;
ep->op = op;
ep->exps = el;
return ep;
}
C tree vs functional programming
type intbt = Leaf of int
| Internal of intbt * intbt;;
The analogue in C using unions and structs is the following:
struct intbt
{
enum { isLeaf, isInternal } tag;
union {
// if tag == isLeaf use this:
int Leaf;
// if tag == isInternal use this:
struct {
struct intbt *left;
struct intbt *right;
} Internal;
} LeafOrInternal;
};
C tree vs functional programming
type intbt = Leaf of int
| Internal of intbt * intbt;;
C version more compact with anonymous struct and union:
struct intbt
{
    enum { isLeaf, isInternal } tag;
    union {
        int Leaf;
        struct {
            struct intbt *left, *right;
        };
    };
};
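To make the comparison with pattern matching concrete, here is a minimal sketch of constructor functions and one tree walk for the compact struct intbt. The names newnode, mkleaf, mkinternal and sumtree are hypothetical and do not come from the lecture code; anonymous structs and unions need a C11 compiler.

```c
#include <stdio.h>
#include <stdlib.h>

struct intbt {
    enum { isLeaf, isInternal } tag;
    union {
        int Leaf;
        struct {
            struct intbt *left, *right;
        };
    };
};

/* Hypothetical helper: allocate one node or give up. */
struct intbt *newnode(void)
{
    struct intbt *p = malloc(sizeof *p);
    if (!p) {
        fprintf(stderr, "malloc failed\n");
        exit(1);
    }
    return p;
}

/* Plays the role of the OCaml constructor Leaf. */
struct intbt *mkleaf(int n)
{
    struct intbt *p = newnode();
    p->tag = isLeaf;
    p->Leaf = n;
    return p;
}

/* Plays the role of the OCaml constructor Internal. */
struct intbt *mkinternal(struct intbt *l, struct intbt *r)
{
    struct intbt *p = newnode();
    p->tag = isInternal;
    p->left = l;
    p->right = r;
    return p;
}

/* The switch on the tag corresponds to a pattern match with two cases. */
int sumtree(const struct intbt *t)
{
    switch (t->tag) {
    case isLeaf:
        return t->Leaf;
    case isInternal:
        return sumtree(t->left) + sumtree(t->right);
    }
    return 0; /* not reached for well-formed trees */
}
```

For example, sumtree(mkinternal(mkleaf(1), mkinternal(mkleaf(2), mkleaf(3)))) adds up the three leaves. Unlike OCaml, nothing stops code from reading Leaf when the tag says isInternal, and the nodes are never freed in this sketch.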
![Page 49: Typed trees and tree walking in C - University of Birminghamhxt/2016/c-plus-plus/trees-in... · 2017-02-16 · Outline of the module (provisional) I am aiming for these blocks of](https://reader034.vdocument.in/reader034/viewer/2022050314/5f76b56299c56e132f6b6c01/html5/thumbnails/49.jpg)
Learning C and functional languages
- The way data structures are built and used in C is closer to functional languages than to object-oriented ones
- OCaml or Haskell program = data structures (with * and |) and functions on them with pattern matching
- C program = data structures (with struct and union) and functions on them with switch, ->, .
- object-oriented program in Java or C++ = data and functions glommed together in classes
- But: no garbage collector in C, no type safety
![Page 50: Typed trees and tree walking in C - University of Birminghamhxt/2016/c-plus-plus/trees-in... · 2017-02-16 · Outline of the module (provisional) I am aiming for these blocks of](https://reader034.vdocument.in/reader034/viewer/2022050314/5f76b56299c56e132f6b6c01/html5/thumbnails/50.jpg)
Comparison of struct + union + pointer to OO
Manager “is-a” employee; manager “has-a” underlings
sounds like LOLcat: I can haz cheezburger; I iz in your computer
struct employee {
    enum job { coder, manager } job;
    union {
        struct {
            char *proglanguage;
            int linesofcode;
        } coder;
        struct {
            struct employee **underlings; // recursion
            // array of pointers to managed employees
            int numunderlings; // size of array
            float bonus;
        } manager;
    };
};
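As an illustration of how the “is-a” and “has-a” parts are used together, the following sketch walks the employee structure recursively. The function totallines is a hypothetical example and not part of the lecture code: it adds up the lines of code written by everyone below a given employee.

```c
#include <stddef.h>

struct employee {
    enum job { coder, manager } job;
    union {
        struct {
            char *proglanguage;
            int linesofcode;
        } coder;
        struct {
            struct employee **underlings; /* array of pointers to managed employees */
            int numunderlings;            /* size of array */
            float bonus;
        } manager;
    };
};

/* Hypothetical tree walk: switch on the tag, recurse over the underlings. */
int totallines(const struct employee *e)
{
    int total = 0;
    switch (e->job) {
    case coder:
        total = e->coder.linesofcode;
        break;
    case manager:
        for (int i = 0; i < e->manager.numunderlings; i++)
            total += totallines(e->manager.underlings[i]);
        break;
    }
    return total;
}
```

The enum constants coder and manager and the union members of the same names live in different name spaces in C, so case coder: and e->coder do not clash. In an object-oriented version, totallines would probably be a virtual member function overridden in each subclass.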
![Page 51: Typed trees and tree walking in C - University of Birminghamhxt/2016/c-plus-plus/trees-in... · 2017-02-16 · Outline of the module (provisional) I am aiming for these blocks of](https://reader034.vdocument.in/reader034/viewer/2022050314/5f76b56299c56e132f6b6c01/html5/thumbnails/51.jpg)
Capstone example: recursive-descent parser in C
- this example illustrates all the main C constructs we have covered: http://www.cs.bham.ac.uk/~hxt/2016/c-plus-plus/ParserTree.c
- not trivial, but only 300 lines of code
- classic example; Kernighan & Ritchie and Stroustrup’s book also use parsers as examples
- comes from compiling literature
- recursive descent: one C function for each non-terminal symbol of the grammar
- use lookahead into input and switch on it
- only complication is the need to eliminate left recursion
- clang uses a recursive-descent parser: http://clang.llvm.org/features.html#unifiedparser
![Page 52: Typed trees and tree walking in C - University of Birminghamhxt/2016/c-plus-plus/trees-in... · 2017-02-16 · Outline of the module (provisional) I am aiming for these blocks of](https://reader034.vdocument.in/reader034/viewer/2022050314/5f76b56299c56e132f6b6c01/html5/thumbnails/52.jpg)
Scanning input using pointer arithmetic
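#include <ctype.h> // for isspace

char input[100];
char *pos = input;

char lookahead()
{
    // skip whitespace; the cast avoids undefined behaviour for negative char values
    while(isspace((unsigned char)*pos))
        pos++;
    return *pos;
}

void match(char c)
{
    if(lookahead() == c)
        pos++;
    else
        syntaxerror();
}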
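To see the scanner at work in isolation, here is a small self-contained sketch. Two things are assumptions made for illustration only: syntaxerror counts errors in a variable errors instead of stopping the program, and the helper setinput copies a string into the buffer and rewinds pos.

```c
#include <ctype.h>
#include <string.h>

char input[100];
char *pos = input;
int errors = 0; /* hypothetical: count errors rather than exit */

void syntaxerror(void)
{
    errors++;
}

/* Skip whitespace and return the next character without consuming it. */
char lookahead(void)
{
    while (isspace((unsigned char)*pos))
        pos++;
    return *pos;
}

/* Consume the next character if it is the expected one. */
void match(char c)
{
    if (lookahead() == c)
        pos++;
    else
        syntaxerror();
}

/* Hypothetical helper: load a string into the buffer and rewind the scanner. */
void setinput(const char *s)
{
    strncpy(input, s, sizeof input - 1);
    input[sizeof input - 1] = '\0';
    pos = input;
    errors = 0;
}
```

After setinput("  ( 1 )"), lookahead() skips the two blanks and returns '(' without moving past it; calling match('(') then advances pos by one character. At the end of the input, lookahead() returns the terminating '\0', which a parser can use to detect that everything has been consumed.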
![Page 53: Typed trees and tree walking in C - University of Birminghamhxt/2016/c-plus-plus/trees-in... · 2017-02-16 · Outline of the module (provisional) I am aiming for these blocks of](https://reader034.vdocument.in/reader034/viewer/2022050314/5f76b56299c56e132f6b6c01/html5/thumbnails/53.jpg)
Allocating memory using pointer arithmetic
struct E myheap[1000];
struct E *freeptr = myheap;
// heapsize is assumed to be defined elsewhere as the number of elements of myheap

struct E *myalloc()
{
    if(freeptr + 1 >= myheap + heapsize) {
        fprintf(stderr, "Heap overflow.\n");
        exit(1);
    }
    return freeptr++;
}

void reset()
{
    freeptr = myheap;
    pos = input;
}
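The allocator above can be tried out on its own with the following sketch. The node type struct E, the constant heapsize and the helpers resetheap and heapused are assumptions made for this example. The overflow test is written as freeptr >= myheap + heapsize, which differs slightly from the slide so that the last array element can also be handed out.

```c
#include <stdio.h>
#include <stdlib.h>

enum { heapsize = 1000 }; /* assumed capacity of the array */

/* Hypothetical node type; the real struct E belongs to the parser. */
struct E {
    int value;
    struct E *left, *right;
};

struct E myheap[heapsize];
struct E *freeptr = myheap;

/* Hand out the next unused array element; there is no free(). */
struct E *myalloc(void)
{
    if (freeptr >= myheap + heapsize) {
        fprintf(stderr, "Heap overflow.\n");
        exit(1);
    }
    return freeptr++;
}

/* Release all nodes at once by rewinding the pointer. */
void resetheap(void)
{
    freeptr = myheap;
}

/* Number of nodes handed out since the last reset. */
size_t heapused(void)
{
    return (size_t)(freeptr - myheap);
}
```

Two successive calls to myalloc return adjacent array elements, and after resetheap the first element is handed out again. This pattern fits a parser well: all nodes of one syntax tree are allocated together and can be discarded together.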
![Page 54: Typed trees and tree walking in C - University of Birminghamhxt/2016/c-plus-plus/trees-in... · 2017-02-16 · Outline of the module (provisional) I am aiming for these blocks of](https://reader034.vdocument.in/reader034/viewer/2022050314/5f76b56299c56e132f6b6c01/html5/thumbnails/54.jpg)
Optimizing memory management in C
- we can write our own memory management in C/C++
- as an alternative to, or on top of, the library
- the more we know about our allocation pattern, the more efficient we can make it
- standard malloc is general-purpose and not the most efficient in all situations
- may avoid malloc entirely and use only global vars (what the C standard calls static storage)
- we can write C code for situations where there is no OS and no malloc available
- Can you do that in Java? Build a tree by putting all the nodes into an array and not use any heap?
![Page 55: Typed trees and tree walking in C - University of Birminghamhxt/2016/c-plus-plus/trees-in... · 2017-02-16 · Outline of the module (provisional) I am aiming for these blocks of](https://reader034.vdocument.in/reader034/viewer/2022050314/5f76b56299c56e132f6b6c01/html5/thumbnails/55.jpg)
Grammar for parsing
This grammar has been made suitable for parsing by left-recursion elimination.
P → 1 | . . . | 9
P → ( E )
E → F E′
E′ → + E E′
E′ → ε
F → P F′
F′ → * P F′
F′ → ε
![Page 56: Typed trees and tree walking in C - University of Birminghamhxt/2016/c-plus-plus/trees-in... · 2017-02-16 · Outline of the module (provisional) I am aiming for these blocks of](https://reader034.vdocument.in/reader034/viewer/2022050314/5f76b56299c56e132f6b6c01/html5/thumbnails/56.jpg)
Parsing function for nonterminal P
struct E *parseP() // primary expression
{
    char c;
    struct E *result = 0;
    switch(c = lookahead()) {
    case '0' ... '9': // case ranges are a GNU C extension (gcc, clang)
        match(c);
        result = makeconstant(c - '0');
        break;
    case '(':
        match('(');
        result = parseE();
        match(')');
        break;
    default:
        syntaxerror();
    }
    return result;
}
![Page 57: Typed trees and tree walking in C - University of Birminghamhxt/2016/c-plus-plus/trees-in... · 2017-02-16 · Outline of the module (provisional) I am aiming for these blocks of](https://reader034.vdocument.in/reader034/viewer/2022050314/5f76b56299c56e132f6b6c01/html5/thumbnails/57.jpg)
Parsing function for nonterminal E
struct E *parseE()
{
    struct E *resultOfF;
    struct E *result = NULL;
    switch(lookahead()) {
    case '0' ... '9':
    case '(':
        resultOfF = parseF();
        result = parseEprime(resultOfF);
        break;
    default:
        syntaxerror();
    }
    return result;
}
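The slides show only parseP and parseE. The following self-contained sketch fills in the remaining functions so that the whole parser can be compiled and tried. It is not the ParserTree.c from the module page, and several things are assumptions: the layout of struct E, the names makebinary, eval and parseandeval, the use of isdigit in place of the GNU case range, and the production E′ → + F E′, which is the usual result of left-recursion elimination.

```c
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

enum { heapsize = 100 };                /* assumed capacity */
enum Etag { isconst, isplus, istimes };
struct E {
    enum Etag tag;
    union {
        int constant;                   /* if tag == isconst */
        struct { struct E *left, *right; };
    };
};

char input[100];
char *pos = input;
struct E myheap[heapsize];
struct E *freeptr = myheap;

void syntaxerror(void) { fprintf(stderr, "Syntax error.\n"); exit(1); }

char lookahead(void)
{
    while (isspace((unsigned char)*pos))
        pos++;
    return *pos;
}

void match(char c)
{
    if (lookahead() == c) pos++; else syntaxerror();
}

struct E *myalloc(void)
{
    if (freeptr >= myheap + heapsize) {
        fprintf(stderr, "Heap overflow.\n");
        exit(1);
    }
    return freeptr++;
}

struct E *makeconstant(int n)
{
    struct E *e = myalloc();
    e->tag = isconst;
    e->constant = n;
    return e;
}

struct E *makebinary(enum Etag tag, struct E *l, struct E *r)
{
    struct E *e = myalloc();
    e->tag = tag;
    e->left = l;
    e->right = r;
    return e;
}

struct E *parseE(void);

struct E *parseP(void)                  /* P -> digit | ( E ) */
{
    char c = lookahead();
    struct E *result = NULL;
    if (isdigit((unsigned char)c)) {
        match(c);
        result = makeconstant(c - '0');
    } else if (c == '(') {
        match('(');
        result = parseE();
        match(')');
    } else {
        syntaxerror();
    }
    return result;
}

struct E *parseFprime(struct E *left)   /* F' -> * P F' | empty */
{
    if (lookahead() != '*')
        return left;
    match('*');
    return parseFprime(makebinary(istimes, left, parseP()));
}

struct E *parseF(void) { return parseFprime(parseP()); }

struct E *parseEprime(struct E *left)   /* E' -> + F E' | empty (assumed) */
{
    if (lookahead() != '+')
        return left;
    match('+');
    return parseEprime(makebinary(isplus, left, parseF()));
}

struct E *parseE(void) { return parseEprime(parseF()); }

int eval(const struct E *e)             /* tree walk over the result */
{
    switch (e->tag) {
    case isconst: return e->constant;
    case isplus:  return eval(e->left) + eval(e->right);
    case istimes: return eval(e->left) * eval(e->right);
    }
    return 0;
}

int parseandeval(const char *s)         /* hypothetical driver */
{
    strncpy(input, s, sizeof input - 1);
    input[sizeof input - 1] = '\0';
    pos = input;
    freeptr = myheap;
    struct E *e = parseE();
    if (lookahead() != '\0')
        syntaxerror();
    return eval(e);
}
```

Passing the tree parsed so far as a parameter to the primed functions is what makes + and * associate to the left even though the grammar itself is no longer left-recursive. With these definitions parseandeval("1+2*3") builds a tree with + at the root and * below it.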
![Page 58: Typed trees and tree walking in C - University of Birminghamhxt/2016/c-plus-plus/trees-in... · 2017-02-16 · Outline of the module (provisional) I am aiming for these blocks of](https://reader034.vdocument.in/reader034/viewer/2022050314/5f76b56299c56e132f6b6c01/html5/thumbnails/58.jpg)
Stretch exercise: recursive descent parser in C
- Write a parser for the Lisp-style expression syntax
- Write one parsing function for each non-terminal
- Use lookahead and match functions to guide the recursive parsing functions
- Construct the abstract syntax tree
- Rather than using malloc, construct the nodes in an array
![Page 59: Typed trees and tree walking in C - University of Birminghamhxt/2016/c-plus-plus/trees-in... · 2017-02-16 · Outline of the module (provisional) I am aiming for these blocks of](https://reader034.vdocument.in/reader034/viewer/2022050314/5f76b56299c56e132f6b6c01/html5/thumbnails/59.jpg)
Exercise: grammar → structs
Take some grammars and translate them to the struct/union/enum/pointer idiom.
For example, suppose we have two binary operators ⊗ and ⊕ and a unary operator �, as follows:
A → B ⊕ B
A → � A
B → A ⊗ A
B → n where n is an integer
![Page 60: Typed trees and tree walking in C - University of Birminghamhxt/2016/c-plus-plus/trees-in... · 2017-02-16 · Outline of the module (provisional) I am aiming for these blocks of](https://reader034.vdocument.in/reader034/viewer/2022050314/5f76b56299c56e132f6b6c01/html5/thumbnails/60.jpg)
Object orientation and the expression problem
1. We may wish to add more cases to the grammar, say for a division operator: easy to do in a class hierarchy, hard to do with struct + union
2. We may wish to add more operations to the expression trees, say pretty printing or compilation to machine code: easy to do with struct and union, hard to do with class hierarchy
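A small sketch may make the second point concrete. The types and function names below are assumptions for illustration, not the lecture code. Given a tree type and an evaluator, a pretty printer is just one more function with one more switch, and the existing code is left alone.

```c
#include <stdio.h>

struct E {
    enum { isconst, isplus, istimes } tag;
    union {
        int constant;
        struct { struct E *left, *right; };
    };
};

/* Existing operation: evaluate the tree. */
int eval(const struct E *e)
{
    switch (e->tag) {
    case isconst: return e->constant;
    case isplus:  return eval(e->left) + eval(e->right);
    case istimes: return eval(e->left) * eval(e->right);
    }
    return 0;
}

/* New operation added later: fully parenthesised pretty printing. */
void printexp(const struct E *e, FILE *out)
{
    switch (e->tag) {
    case isconst:
        fprintf(out, "%d", e->constant);
        break;
    case isplus:
    case istimes:
        fputc('(', out);
        printexp(e->left, out);
        fputs(e->tag == isplus ? " + " : " * ", out);
        printexp(e->right, out);
        fputc(')', out);
        break;
    }
}
```

Adding a division case goes the other way: the enum gains a new tag, and every switch written so far (here eval and printexp) has to be found and extended. In a class hierarchy the trade-off is reversed, since a new subclass is local but a new operation touches every class.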
![Page 61: Typed trees and tree walking in C - University of Birminghamhxt/2016/c-plus-plus/trees-in... · 2017-02-16 · Outline of the module (provisional) I am aiming for these blocks of](https://reader034.vdocument.in/reader034/viewer/2022050314/5f76b56299c56e132f6b6c01/html5/thumbnails/61.jpg)
Outline of the module (provisional)
I am aiming for these blocks of material:
1. pointers + struct + malloc + free ⇒ dynamic data structures in C as used in OS ✓
2. pointers + struct + union ⇒ typed trees in C such as abstract syntax trees ✓
3. object-oriented trees in C++: composite and visitor patterns
4. templates in C++: parametric polymorphism
An assessed exercise for each.
![Page 62: Typed trees and tree walking in C - University of Birminghamhxt/2016/c-plus-plus/trees-in... · 2017-02-16 · Outline of the module (provisional) I am aiming for these blocks of](https://reader034.vdocument.in/reader034/viewer/2022050314/5f76b56299c56e132f6b6c01/html5/thumbnails/62.jpg)
C
++
![Page 63: Typed trees and tree walking in C - University of Birminghamhxt/2016/c-plus-plus/trees-in... · 2017-02-16 · Outline of the module (provisional) I am aiming for these blocks of](https://reader034.vdocument.in/reader034/viewer/2022050314/5f76b56299c56e132f6b6c01/html5/thumbnails/63.jpg)
C++