Efficient IR for the OpenModelica Compiler
Effektiv IR för OpenModelica-kompilatorn

Patrik Andersson
Simon Eriksson

Supervisor: Martin Sjölund
Examiner: Peter Fritzson

Master thesis, 30 ECTS | Datateknik
2018 | LIU-IDA/LITH-EX-A--18/014--SE

Linköping University | Department of Computer Science
Linköpings universitet, SE–581 83 Linköping, +46 13 28 10 00, www.liu.se


Full text: liu.diva-portal.org/smash/get/diva2:1280874/FULLTEXT01.pdf


Upphovsrätt (Copyright)

This document is kept available on the Internet, or its future replacement, for 25 years from the date of publication, provided that no extraordinary circumstances arise. Access to the document implies permission for anyone to read, download, and print single copies for personal use, and to use it unchanged for non-commercial research and for teaching. A later transfer of the copyright cannot revoke this permission. All other use of the document requires the consent of the copyright holder. Technical and administrative solutions exist to guarantee authenticity, security and accessibility. The author's moral rights include the right to be named as the author to the extent required by good practice when the document is used as described above, as well as protection against the document being altered or presented in a form or context that is offensive to the author's literary or artistic reputation or distinctive character. For further information about Linköping University Electronic Press, see the publisher's home page: http://www.ep.liu.se/.

Copyright

The publishers will keep this document online on the Internet, or its possible replacement, for a period of 25 years starting from the date of publication, barring exceptional circumstances. The online availability of the document implies permanent permission for anyone to read, to download, or to print out single copies for his/her own use and to use it unchanged for non-commercial research and educational purposes. Subsequent transfers of copyright cannot revoke this permission. All other uses of the document are conditional upon the consent of the copyright owner. The publisher has taken technical and administrative measures to assure authenticity, security and accessibility. According to intellectual property law, the author has the right to be mentioned when his/her work is accessed as described above and to be protected against infringement. For additional information about the Linköping University Electronic Press and its procedures for publication and for assurance of document integrity, please refer to its home page: http://www.ep.liu.se/.

© Patrik Andersson

Simon Eriksson

Abstract

The OpenModelica compiler currently generates code directly from a syntax tree representation, which leads to inefficient code in several cases. This thesis work introduces a lower-level intermediate representation for the compiler which aims to simplify the compiler back end and enable more optimizations. The resulting design of the representation features flat primitive operations and control flow using basic blocks and terminators. Variables are mutable, unlike in SSA-based representations. Introducing the IR did not significantly change the runtime performance of the test programs. The number of lines of code was reduced to a quarter of that of the old back end. This reduction and the simpler representation will help future work on optimization passes and on implementing an LLVM-based back end.

Contents

Abstract
Contents
List of Figures
List of Tables

1 Introduction
    1.1 Background
    1.2 Motivation
    1.3 Aim
    1.4 Research questions
    1.5 Delimitations
2 Theory
    2.1 Compilers
    2.2 The Modelica language
    2.3 The OpenModelica environment
3 Method
    3.1 Design
    3.2 Implementation
    3.3 Performance evaluation
    3.4 Code complexity measurements
4 Results
    4.1 Overview of the MidCode design
    4.2 Performance measurements
5 Discussion
    5.1 Performance results
    5.2 Design of MidCode
    5.3 Code Complexity of MidCode
    5.4 Related Work
    5.5 The work in a wider context
6 Conclusion
    6.1 Future work
A Performance test functions
    A.1 Fibonacci
    A.2 Mandelbrot
    A.3 Quicksort
    A.4 Takeuchi function
Bibliography

List of Figures

2.1 Compilation of Haskell in GHC
2.2 Overview of translation phases in the OpenModelica compiler
2.3 Overview of the OpenModelica components
4.1 Overview of MidCode phases

List of Tables

2.1 Available LVALUE types in MIR
2.2 Available RVALUE types in MIR
2.3 Available terminators in MIR
4.1 Lines of code for corresponding parts of old back end
4.2 Lines of code for new back end
4.3 Performance measurements for old and new code generator

Chapter 1

Introduction

1.1 Background

OpenModelica is an open-source modeling and simulation environment that implements the open standard Modelica modeling language. It is developed mainly by the non-profit Open Source Modelica Consortium (OSMC). Modelica is a declarative equation-based language designed for describing complex and dynamic systems, and can be used for simulating, for example, mechanical, electrical, hydraulic and process-oriented systems. OpenModelica is mainly targeted towards industrial and academic use.

1.2 Motivation

Currently, the C code generated by the OpenModelica compiler is inefficient in many cases. This causes significant performance issues, especially since the data inputs are often large. One major reason is that the code is generated directly from a high-level syntax tree-based representation of the Modelica code [2].

A better solution would be to first convert this representation to a lower-level intermediate representation (IR) that is more suitable for optimization and code generation, and only then generate the code. Later on, it would be feasible to implement a stage that converts the new lower-level IR to a more common representation such as LLVM.


1.3 Aim

The aim of this thesis is to design and implement a new efficient and maintainable IR solution for OpenModelica. The new IR stage should be able to compile various test programs with roughly the same run-time performance as the old code generation, while simplifying the back-end code generation and enabling various useful lower-level transformations in the future. The new code generation should be easy to extend with more lower-level optimizations and new back ends. A future LLVM-based back end would be especially interesting in the long term.

1.4 Research questions

1. How can a new IR help the implementation of optimizations and other code transformations?

2. Which IR design choices (of some common alternatives) are most suitable for OpenModelica?

3. How can this new IR be implemented in the OpenModelica compiler?

4. How much of the back end can be moved to a shared portable format, leaving target-specific implementations simpler?

5. How will the new IR affect the run-time performance of OpenModelica?

1.5 Delimitations

This project will focus on evaluating some common IR approaches and on implementing the new IR and its corresponding C code generator. The IR approaches to evaluate should be low-level but platform-independent and well suited for transformation to LLVM. Alternative special-purpose C code generator variants (for example for parallelization or embedded devices) will not be considered.

While Modelica has many language features specific to simulation, such as equation-based models and model connections, the project will focus on implementing Modelica functions, the algorithm subset of the language, and the MetaModelica extension. MetaModelica provides various features that are common in functional programming but not included in the Modelica standard, such as pattern matching, tagged unions and linked lists. This part of Modelica is closer to general-purpose programming and is therefore less complex and easier to compare to other languages and their corresponding IR solutions. The Modelica support for multi-dimensional arrays was deemed too complex and time-consuming, and was therefore skipped.

While performance improvements at this stage are of course desirable, the main focus is on creating a solution that can later be extended with new optimizations and back ends.

Chapter 2

Theory

2.1 Compilers

Modern compilers are commonly structured as a pipeline of several phases, each taking a structure, transforming it, and passing it to the next phase. This pipeline can be divided into a front end, which parses and analyzes the source code, and a back end, which takes the structure produced by the front end and converts it into an executable program [1].

Some common operations of the front end are lexical analysis (converting the source text to more easily parsable tokens), syntax analysis (checking that the syntax is correct and converting the token stream into an abstract syntax tree), and semantic analysis (e.g. checking that the types are correct), while some common tasks of the back end are optimization and code generation [1].

2.1.1 Optimization

In order to improve the performance, size and/or power consumption of a generated program, compilers may attempt to optimize the generated code rather than producing the most obvious conversion of the source code. Optimizations must give the same results as the unoptimized version and provide sufficiently high performance improvements, while being fast enough to still give acceptable compilation times [1].

2.1.2 Intermediate representations

Intermediate representations are internal forms of a computer program created and used by a compiler in order to aid the compilation process. An intermediate representation lies somewhere between the original source and the compiled target and can be at different levels depending on its area of use; it can be high-level (close to the original source code), low-level (close to the target language) or something in between. The data structures used can also vary; they can for example be a graph, a tree or a linear list. Compilers often have multiple intermediate representations in their pipeline, with each IR serving a different purpose and each phase converting the program to a lower-level IR form [19].

One major advantage of having intermediate representations is that the compiler can be more easily retargeted to new source languages or new target platforms, reusing the independent components. Instead of having to write one compiler for every source/target combination, developers can add a single front end in order to support one source language, and a single back end in order to support one target platform [19][1].

Common IR designs

One common linear representation of operations is three-address code, where each operation is binary and represented by two source variables, one destination variable and an operation type. These operations can be stored in memory as records called quadruples, storing the three variables and the operation type. A variable can be a named variable, a constant or a compiler-generated temporary variable. Unary operations may be defined as having just a single source variable. Expression trees are flattened by storing intermediate results in temporary variables, which are then used as source variables in later operations. Temporary variables are usually given unique names and are not shared between different intermediate results. Special call, jump and conditional operations can be implemented for representing control flow [1].
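The flattening of an expression tree into quadruples can be sketched as follows. This is a minimal illustration, not code from the OpenModelica compiler: it borrows Python's own `ast` module as the expression tree, and the function name `to_three_address`, the `(op, arg1, arg2, dest)` tuple layout and the `t1`, `t2`, ... temporary names are assumptions made for the example.

```python
import ast
from itertools import count


def to_three_address(source):
    """Flatten an arithmetic expression into (op, arg1, arg2, dest) quadruples."""
    quads = []
    temps = count(1)
    ops = {ast.Add: "+", ast.Sub: "-", ast.Mult: "*", ast.Div: "/"}

    def visit(node):
        # Named variables and constants are used directly as operands.
        if isinstance(node, ast.Name):
            return node.id
        if isinstance(node, ast.Constant):
            return repr(node.value)
        if isinstance(node, ast.BinOp):
            left = visit(node.left)
            right = visit(node.right)
            dest = f"t{next(temps)}"  # fresh temporary, never reused
            quads.append((ops[type(node.op)], left, right, dest))
            return dest
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            operand = visit(node.operand)
            dest = f"t{next(temps)}"
            quads.append(("neg", operand, None, dest))  # single source variable
            return dest
        raise ValueError(f"unsupported expression: {ast.dump(node)}")

    result = visit(ast.parse(source, mode="eval").body)
    return quads, result


def show(quad):
    """Render one quadruple as readable three-address code."""
    op, x, y, dest = quad
    return f"{dest} = {op} {x}" if y is None else f"{dest} = {x} {op} {y}"


quads, result = to_three_address("a + b * (c - 2)")
for quad in quads:
    print(show(quad))
```

Each nested subexpression ends up in its own temporary, so the innermost operation `c - 2` is emitted first and the outermost addition last.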

One control flow representation that is frequently used is the control flow graph (CFG). The instructions are partitioned into basic blocks, which are instruction sequences where control flow can only enter through the first instruction and exit through the last, which may be one of various jump constructions. The basic blocks are then represented as nodes in the graph and execution paths as directed edges between the blocks. Each basic block should preferably contain as many instructions as possible without violating these rules. The instructions used in basic blocks are primitive and in three-address form. This representation simplifies many analyses and is therefore useful for performing many optimizations [1].
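One common way to do this partitioning is to first mark the "leader" instruction of every block and then cut the instruction list at the leaders. The sketch below assumes a made-up tuple encoding of instructions (`label`, `op`, `jump`, `branch`, `return`); the encoding and the function name `split_into_basic_blocks` are illustrative assumptions, not a format used by any particular compiler.

```python
def split_into_basic_blocks(instructions):
    """Partition a linear instruction list into basic blocks and CFG edges.

    Assumed instruction encoding (hypothetical, for this sketch only):
      ("label", name), ("op", text), ("jump", target),
      ("branch", condition, then_target, else_target), ("return",)
    """
    if not instructions:
        return [], []
    terminators = {"jump", "branch", "return"}

    # 1. Leaders: the first instruction, every label (a possible jump
    #    target), and every instruction that follows a terminator.
    leaders = {0}
    for i, instr in enumerate(instructions):
        if instr[0] == "label":
            leaders.add(i)
        if instr[0] in terminators and i + 1 < len(instructions):
            leaders.add(i + 1)

    # 2. Each block runs from one leader up to (not including) the next.
    starts = sorted(leaders)
    ends = starts[1:] + [len(instructions)]
    blocks = [instructions[s:e] for s, e in zip(starts, ends)]

    # 3. Edges follow the last instruction of each block.
    label_to_block = {b[0][1]: n for n, b in enumerate(blocks) if b[0][0] == "label"}
    edges = set()
    for n, block in enumerate(blocks):
        last = block[-1]
        if last[0] == "jump":
            edges.add((n, label_to_block[last[1]]))
        elif last[0] == "branch":
            edges.add((n, label_to_block[last[2]]))
            edges.add((n, label_to_block[last[3]]))
        elif last[0] != "return" and n + 1 < len(blocks):
            edges.add((n, n + 1))  # fall through into the next block
    return blocks, sorted(edges)


# A loop that sums 0..n-1, written as linear three-address-style code.
program = [
    ("op", "i = 0"),
    ("op", "s = 0"),
    ("label", "loop"),
    ("op", "t1 = i < n"),
    ("branch", "t1", "body", "done"),
    ("label", "body"),
    ("op", "s = s + i"),
    ("op", "i = i + 1"),
    ("jump", "loop"),
    ("label", "done"),
    ("return",),
]
blocks, edges = split_into_basic_blocks(program)
for n, block in enumerate(blocks):
    print(n, block)
print(edges)
```

The back edge from the loop body to the loop header appears as an ordinary directed edge, which is what makes loops easy to find in this representation.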

Another complementary representation for tracking data is static single assignment (SSA), which differs from plain three-address code in that each variable cannot be assigned more than once in a function, which simplifies data flow analysis significantly. This gives SSA the important property of referential transparency, meaning that a reference can be replaced with its definition and therefore that the variable values are independent of the order in which their statements are listed. Referential transparency also allows a computation to be replaced by its result, which enables well-known transformations like common subexpression elimination. As such, compilers that perform data flow analysis can do a conversion pass over the non-SSA representation and produce an SSA-based representation that is easier to reason about. SSA is used together with a basic block structure and uses a special φ (phi) function where execution paths merge, which takes a list of source variables and assigns one of them to a new variable depending on the previously executed block. The transformations can in general be made with other methods, but SSA has the advantage of being both intuitive and efficient, making more optimizations easier to implement while also enabling fast compilation times, often fast enough that it can be used in just-in-time compilers [17].
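The effect of single assignment on common subexpression elimination can be illustrated on straight-line code. The sketch below is deliberately restricted to a single basic block, so no φ functions are needed; the `(dest, op, arg1, arg2)` statement layout, the `name_version` naming scheme and both function names are assumptions made for the example.

```python
def to_ssa(statements):
    """Rename straight-line (dest, op, arg1, arg2) statements into SSA form."""
    version = {}

    def use(name):
        # Inputs and constants that are never assigned keep their names.
        return f"{name}_{version[name]}" if name in version else name

    out = []
    for dest, op, a, b in statements:
        a, b = use(a), use(b)  # operands read the versions current so far
        version[dest] = version.get(dest, 0) + 1
        out.append((f"{dest}_{version[dest]}", op, a, b))
    return out


def eliminate_common_subexpressions(ssa):
    """Reuse an earlier SSA name when the same (op, args) is computed again."""
    available = {}  # (op, arg1, arg2) -> SSA name already holding that value
    replaced = {}   # redundant SSA name -> the name that replaces it
    out = []
    for dest, op, a, b in ssa:
        a, b = replaced.get(a, a), replaced.get(b, b)
        key = (op, a, b)
        if key in available:
            replaced[dest] = available[key]
        else:
            available[key] = dest
            out.append((dest, op, a, b))
    return out, replaced


code = [
    ("x", "+", "a", "b"),
    ("y", "+", "a", "b"),  # same value as x
    ("a", "*", "x", "y"),  # reassigns a
    ("z", "+", "a", "b"),  # looks like x, but uses the new a
]
ssa = to_ssa(code)
optimized, replaced = eliminate_common_subexpressions(ssa)
for stmt in optimized:
    print(stmt)
```

Because the reassigned `a` becomes the distinct name `a_1`, the last statement is not mistaken for a repeat of `a + b`; on the original mutable code a purely textual comparison would have merged it incorrectly.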

Example: GCC

One example of IR usage is GCC, which has two intermediate representations called GENERIC and GIMPLE. GENERIC represents a function and its statements as a tree structure, while GIMPLE is a subset of GENERIC produced by a process called gimplification and used in the optimization stage. Both representations are independent of the source programming language [12].

Example: LLVM

LLVM is an IR that is, among other things, used in the Clang compiler for C. LLVM is also used as a back end by other compilers, for example those for Rust, Swift and Haskell (GHC). These compilers use LLVM for optimizations and code generation. LLVM also aims to be a portable format by supporting several targets for code generation. LLVM uses a basic block and terminator model for control flow. The instructions are three-address code with static single assignment variables.

Example: Rust and MIR

The official compiler for the Rust programming language has added a new IR named MIR (Mid-level Intermediate Representation) between its high-level AST and its low-level LLVM code generation, whereas previously the LLVM code was generated directly from the AST. The design of MIR is based on primitive three-operand statements and basic blocks with terminators. This design makes translation to LLVM, which also uses primitive operations and a similar control flow representation, relatively simple [10].

Some of the main goals of MIR in Rust are improving compilation time by having more efficient data structures, enabling more Rust-specific optimizations, reducing redundancy in the code base, and making optimizations and other transformations easier to work with and reason about in general [10].

One notable difference from LLVM is that MIR is not SSA-based, i.e. it allows multiple assignments to the same variable, and named variables are kept as-is. However, generated temporaries are still typically single-assignment. As more advanced optimizations relying on SSA representations are typically done by LLVM, an lvalue-based representation rather than SSA is considered sufficient for this purpose [10][11].

The complete Rust language, including its various syntactic sugar constructions, is reduced to a small subset that is easier to work with. Redundant variations of a single low-level feature, which previously all had to be handled separately, are now represented by fewer variants, meaning the analyses have fewer cases to handle. Where control flow analyses were previously done on separate control-flow graphs that had to be generated from the AST, they can now be done directly on MIR. The Rust safety analyses are also more accurate, since the lower-level nature of MIR makes the difference between the analyzed structure and the final code smaller. Rust-specific optimizations can be done as a separate stage, whereas they were previously often done during conversion to LLVM, adding unwanted complexity to that conversion phase. Apart from simplifying LLVM generation, MIR also opens up for adding other low-level back ends in the future [11].

The MIR data structure describes the workings of a single function and contains a control-flow graph stored as a list of basic blocks, a list of compiler-generated temporary variables, and a list of user-declared variables. A single basic block contains a list of statements and a terminator, which describes the control-flow action that occurs at the end of the basic block execution. A statement can be either a variable assignment or a drop (deallocation) of a variable, which is described explicitly, unlike in the source language. An assignment statement contains an rvalue for the right-hand side and an lvalue for the left-hand side.

An lvalue can be a variable of one of several kinds (named, temporary, argument or return variable), a field in a struct or tuple, a pointer dereference, an array index, or an enum downcast [11]; see table 2.1.

LVALUE form            Meaning
B                      User-declared variable binding
TEMP                   Compiler-generated temporary
ARG                    Function argument
RETURN                 Return value
LVALUE.f               Struct or tuple field
*LVALUE                Pointer dereference
LVALUE[LVALUE]         Array index
(LVALUE as VARIANT)    Enum downcast

Table 2.1: Available LVALUE types in MIR

An rvalue symbolizes an expression and can be the use of an lvalue, a mutable or immutable reference, a cast, a constant, a literal of a struct or built-in container type, the length of an object, or a common simple binary or unary operation [11]; see table 2.2. As the table shows, most rvalue operations only take lvalues as arguments, meaning that constants and data structure literals can only be used through temporary variables. The special BOX value represents the memory allocation function taking the struct constructor method as its sole argument, and is used in the MIR call representation just like other functions [11].

Terminators in MIR can jump to another basic block with or without stack unwinding, jump to one of two specified basic blocks depending on the truth value of a variable, jump to one basic block from a list depending on the value of a variable, call a function and afterwards jump to one of two basic blocks depending on whether the function succeeded or failed, or simply return from the function with or without stack unwinding [11]; see table 2.3.
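To make the block-and-terminator structure concrete, the sketch below models a heavily simplified function body and a small interpreter for it. It is not the data structure used by the Rust compiler: the class names, the statement tuple layout and the `run` function are assumptions for illustration, and only the GOTO, IF and RETURN terminators are modelled.

```python
import operator
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Goto:
    target: int  # index of the basic block to jump to


@dataclass
class If:
    condition: str  # variable holding the truth value
    then_target: int
    else_target: int


@dataclass
class Return:
    pass


@dataclass
class BasicBlock:
    # Each statement assigns dest = fn(values of the operand variables).
    statements: List[Tuple[str, Callable, Tuple[str, ...]]]
    terminator: object


def run(blocks, env):
    """Execute blocks starting at block 0 until a Return terminator."""
    current = 0
    while True:
        block = blocks[current]
        for dest, fn, operands in block.statements:
            env[dest] = fn(*(env[name] for name in operands))
        term = block.terminator
        if isinstance(term, Goto):
            current = term.target
        elif isinstance(term, If):
            current = term.then_target if env[term.condition] else term.else_target
        elif isinstance(term, Return):
            return env["RETURN"]
        else:
            raise TypeError(f"unknown terminator: {term!r}")


# Sum of 0..n-1. The constant 1 is first stored in a variable ("one"),
# since operations in this sketch only take variables as operands.
blocks = [
    BasicBlock([("RETURN", lambda: 0, ()), ("i", lambda: 0, ()), ("one", lambda: 1, ())], Goto(1)),
    BasicBlock([("tmp0", operator.lt, ("i", "n"))], If("tmp0", 2, 3)),
    BasicBlock([("RETURN", operator.add, ("RETURN", "i")), ("i", operator.add, ("i", "one"))], Goto(1)),
    BasicBlock([], Return()),
]
print(run(blocks, {"n": 5}))
```

Note that `RETURN` and `i` are assigned several times during execution, which is allowed because the representation is lvalue-based rather than SSA-based.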


RVALUE form                   Meaning
Use(LVALUE)                   Value of LVALUE
[LVALUE; LVALUE]              Array literal of specified size with the same defined value for all cells
&'REGION LVALUE               Reference to LVALUE
&'REGION mut LVALUE           Mutable reference to LVALUE
LVALUE as TYPE                Cast
LVALUE <BINOP> LVALUE         Binary operation
<UNOP> LVALUE                 Unary operation
Struct { f: LVALUE0, ... }    Struct literal
(LVALUE...LVALUE)             Tuple literal
[LVALUE...LVALUE]             Array literal
CONSTANT                      Constant
LEN(LVALUE)                   Length of LVALUE
BOX                           Memory allocation function for box operator

Table 2.2: Available RVALUE types in MIR

Example: Swift and SIL

Similar to the Rust compiler, the official Swift compiler has also added a new mid-level IR between the AST and the generated LLVM code, named SIL (Swift Intermediate Language). Unlike MIR, SIL is SSA-based, but it replaces the phi node concept with basic block arguments that are set by the terminators jumping to that block. Within the block, the argument variables work like typical source variables. As in MIR, literals have to be saved in temporaries before they can be used in operations. Calls are implemented differently: while MIR implements calls as terminators, SIL implements them as regular statements. Operators are also implemented as calls to built-in functions rather than as special rvalue constructions as in MIR. More low-level memory operations are represented as explicit constructions than in MIR, including heap and stack allocations, memory accesses and reference counting [20].
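The idea of block arguments standing in for phi nodes can be sketched with a small interpreter. This is not SIL syntax or the Swift compiler's data model: the dictionary layout, the terminator tuples (`br`, `cond_br`, `return`) and the function `run_blocks` are assumptions chosen for the illustration.

```python
import operator


def run_blocks(blocks, entry, args):
    """Interpret blocks whose parameters are bound by the jumping terminator.

    Assumed layout (hypothetical): blocks maps a block name to
    (params, statements, terminator), where a statement is
    (dest, fn, operand_names) and a terminator is one of
      ("br", target, arg_names)
      ("cond_br", cond_name, (target, arg_names), (target, arg_names))
      ("return", name)
    """
    name, values = entry, list(args)
    env = {}
    while True:
        params, statements, terminator = blocks[name]
        # Binding the block arguments does the job a phi node would do:
        # the value depends on which predecessor jumped here.
        env.update(zip(params, values))
        for dest, fn, operands in statements:
            env[dest] = fn(*(env[o] for o in operands))
        kind = terminator[0]
        if kind == "return":
            return env[terminator[1]]
        if kind == "br":
            _, name, arg_names = terminator
        else:  # "cond_br"
            _, cond, on_true, on_false = terminator
            name, arg_names = on_true if env[cond] else on_false
        values = [env[a] for a in arg_names]


# Sum of 0..n-1. No name is assigned in more than one place in the program
# text; the loop-carried values i and acc are block arguments of "loop".
blocks = {
    "entry": (("n",),
              [("zero", lambda: 0, ()), ("one", lambda: 1, ())],
              ("br", "loop", ("zero", "zero"))),
    "loop": (("i", "acc"),
             [("more", operator.lt, ("i", "n"))],
             ("cond_br", "more", ("body", ()), ("done", ()))),
    "body": ((),
             [("acc2", operator.add, ("acc", "i")), ("i2", operator.add, ("i", "one"))],
             ("br", "loop", ("i2", "acc2"))),
    "done": ((), [], ("return", "acc")),
}
print(run_blocks(blocks, "entry", [5]))
```

The `loop` block receives `(zero, zero)` from `entry` and `(i2, acc2)` from `body`, which is the information a phi node at the top of the loop would otherwise encode.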


Terminator                                       Meaning
GOTO(BB)                                         Jump unconditionally to basic block BB
PANIC(BB)                                        Start stack unwinding and jump to basic block BB for cleanup
IF(LVALUE, BB0, BB1)                             Jump to BB0 if LVALUE is true, otherwise jump to BB1
SWITCH(LVALUE, BB...)                            Jump to one of the listed basic blocks depending on value of enum LVALUE
CALL(LVALUE0 = LVALUE1(LVALUE2...), BB0, BB1)    Call function referenced in LVALUE1 with arguments in LVALUE2 onwards, store return value in LVALUE0, jump to BB0 if call succeeded or BB1 if it panicked
DIVERGE                                          Return and unwind stack
RETURN                                           Return

Table 2.3: Available terminators in MIR

Example: Glasgow Haskell Compiler

One major compiler for the functional language Haskell is the Glasgow Haskell Compiler (GHC), which is an open source project. One of GHC's IRs is Core. While being an IR, Core corresponds well to a simple source-level language that eliminates superfluous ways to express the same language construct [9]. For example, a list comprehension needs to be changed from a native Haskell construct into an expression based on variable bindings and functions in Core.

In Core, case-expressions are also restricted, as they cannot match nested constructors of a value [7]. They are used to see which member of a union a value contains, as well as to access the attributes of the record. Core also flattens expressions by restricting their usage. An argument to a function must be a literal or a variable (called an atom in the paper), resulting in the dependencies between function calls being explicitly ordered by variable bindings.

GHC has another, lower-level IR: the Spineless Tagless Graph Reduction Machine, or STG [8]. The difference between STG and Core is that Core is meant to simplify expressions in a functional setting, while STG is meant to help simplifications targeted at modern processors. As such it specifies operational semantics, unlike Core. In addition, all type information is lost in transforming Core to STG.

[Figure 2.1 placeholder. Stages: Parse Tree, Core, STG, Cmm, LLVM, C, Assembly. Transitions: desugar, STGify, CodeGen, LLVM compiler, NCG, C compiler.]

Figure 2.1: Compilation of Haskell in GHC [9]

The operational semantics include a stack for arguments, returns, and the implementation of the lazy calling convention. Arguments are pushed when a function application is evaluated and popped when entering closures with arguments. The return entries in the stack are actually not for function returns: since the only evaluation comes from pattern matching, each entry is for the result of a pattern match. The lazy calling convention is implemented with a stack entry that causes a suspended computation in memory to be overwritten with the computed value.

STG also has a heap, which contains all allocated values until they are deallocated by garbage collection. An important feature for long-running computations in a lazy language is black holes. When a computation is entered, it is replaced by a black hole, which does not keep any of the computation's references alive, although the references still in use by the ongoing evaluation are kept alive. This means that if garbage collection is performed while the black-holed value is being evaluated, more things can be collected. For example, in code for finding the last value of a long linked list, earlier elements can be collected even if garbage collection happens in the middle of evaluation. Additionally, if evaluation tries to evaluate a black hole that it has itself created, an infinite loop has been detected, so an exception can be thrown.
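The loop-detection side of black holes can be imitated with a small lazy-value class. This is only an analogy and not how GHC's runtime is implemented: the `Thunk` class, the `_BLACK_HOLE` sentinel and `BlackHoleError` are names invented for the sketch, and Python's own memory management does not reproduce the garbage collection benefit described above.

```python
class BlackHoleError(RuntimeError):
    """Raised when a thunk is forced while it is already being evaluated."""


_BLACK_HOLE = object()  # sentinel stored in place of the suspended computation


class Thunk:
    """A suspended computation that is evaluated at most once."""

    def __init__(self, compute):
        self._compute = compute
        self._value = None
        self._evaluated = False

    def force(self):
        if self._evaluated:
            return self._value
        if self._compute is _BLACK_HOLE:
            # We re-entered a computation that is still in progress.
            raise BlackHoleError("thunk depends on its own value")
        compute = self._compute
        # Black-hole the thunk: it no longer refers to the computation.
        # (The local variable above still does, so this sketch shows the
        # loop detection only, not the memory savings.)
        self._compute = _BLACK_HOLE
        self._value = compute()
        self._evaluated = True
        return self._value


three = Thunk(lambda: 1 + 2)
print(three.force())

loop = Thunk(lambda: loop.force() + 1)  # refers to its own result
try:
    loop.force()
except BlackHoleError as err:
    print("detected:", err)
```

Without the sentinel, forcing `loop` would recurse until the interpreter's recursion limit is hit; with it, the self-dependency is reported on the first re-entry.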

Further down in the compilation pipeline we find Cmm, which is a processor-portable intermediate language reminiscent of LLVM. Cmm consists of simple control flow between blocks, basic types that reflect machine representation, and an unlimited number of stack-backed variables [21]. Cmm contains no type information except for machine-level representations like 32-bit signed integers. It also explicitly represents the heap, the stack, and writes to byte addresses. As can be seen from figure 2.1 [9], there are several back ends that start from Cmm and then generate assembly.

2.2 The Modelica language

Modelica is a declarative and object-oriented language developed for equation-based modeling of complex and dynamic physical systems. It can be used for simulating, for example, mechanical, electrical, hydraulic and process-oriented systems [5]. The Modelica standard exists in multiple implementations and is governed by the international non-profit Modelica Association [16]. Systems can be separated into smaller components, which can then be connected to other components and be distributed in model libraries. This enables equation systems to be reused and combined to make larger systems. Many common standard components are distributed by the Modelica Association in their Modelica Standard Library [16].

2.2.1 Primitive types and arrays

The primitive types supported are integers, reals (floating-point), booleans, strings, enumerations and a special clock type used for synchronous systems. Multi-dimensional arrays are also supported, and can have dimension sizes that are unspecified at compile time. In addition, a data type for complex numbers is implemented in the standard Modelica library[4].


Some of the primitive operations supported in expressions are scalar arithmetic

operations (such as addition, subtraction, division, multiplication and exponenti-

ation), elementwise arithmetic operations on arrays, comparisons, logical opera-

tions, and if-expressions[4].
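As a small illustration of the types and operations listed above, the following sketch declares one value of most primitive types and uses array arithmetic and if-expressions. The model and all names in it are hypothetical and chosen only for this example.

```modelica
model PrimitiveTypesDemo "Hypothetical model, for illustration only"
  type Color = enumeration(red, green, blue);
  parameter Integer n = 3;
  parameter Real a[n] = {1.0, 2.0, 3.0};
  parameter Real b[n] = {0.5, 0.5, 0.5};
  parameter Boolean useSum = true;
  parameter String label = "demo";
  parameter Color c = Color.green;
  Real s[n] "array-valued result";
  Real p "scalar result";
equation
  // if-expression selecting between elementwise addition and multiplication
  s = if useSum then a + b else a .* b;
  // scalar exponentiation, comparison and logical operators
  p = a[1]^2 + (if c == Color.green and n > 2 then 1.0 else 0.0);
end PrimitiveTypesDemo;
```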

2.2.2 Models and equations

Modelica model classes describe the system to be modelled as a system of variables with optional initial values and differential, algebraic and discrete equations, which can then be compiled and solved by the Modelica implementation for a given time slice. The class defined at the top of the program is automatically instantiated, and other classes can be instantiated by declaring them as variables in the top class[4].

Each equation consists of two expressions, one on each side of an equality (=) operator. The equations are not affected by the order in which they are listed and are acausal, meaning they do not have a fixed data flow direction. In order to support variation over time, variables can be surrounded by the der() time derivative operator, and the time variable can also be accessed directly as time. For-loops can also be used to declare repetitive equation series in a shorter way[16][4].

Variables can optionally have defined initial values, and models also support additional variable types such as named constants and parameters, which unlike normal named constants can be set before simulation without recompiling[4].
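The difference between constants, parameters and ordinary variables, as well as the use of for-loops in equation sections, can be sketched as follows. The model is a hypothetical discretized heat conduction rod and is not taken from the cited sources.

```modelica
model HeatRod "Hypothetical model, for illustration only"
  constant Integer n = 5 "number of segments, fixed at compile time";
  parameter Real k = 0.1 "conduction coefficient, can be set before simulation";
  parameter Real Tleft = 100, Tright = 0 "boundary temperatures";
  Real T[n](each start = 20) "segment temperatures with start values";
equation
  der(T[1]) = k * (Tleft - 2*T[1] + T[2]);
  // the for-loop expands into n-2 equations of the same form
  for i in 2:n-1 loop
    der(T[i]) = k * (T[i-1] - 2*T[i] + T[i+1]);
  end for;
  der(T[n]) = k * (T[n-1] - 2*T[n] + Tright);
end HeatRod;
```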

For example, a pendulum can be modelled as in the following example taken from page 21 in Principles of Object-Oriented Modeling and Simulation with Modelica 3[4]. This model contains both differential equations and algebraic equations, and is therefore an example of a differential algebraic equation system (DAE). This system can be simulated by calling the simulate function, for example by writing simulate(Pendulum,stopTime=6)[4], and then plotted by calling the plot function with the variable to be plotted as its argument[4].

model Pendulum
  parameter Real m=1, g=9.81, L=0.5; //mass, gravity, length of pendulum
  Real F; //force
  output Real x(start=0.5), y(start=0); //x and y position with set start values
  output Real vx, vy; //x and y velocity
equation
  m * der(vx) = -(x / L) * F;
  m * der(vy) = -(y / L) * F - m * g;
  der(x) = vx;
  der(y) = vy;
  x^2 + y^2 = L^2;
end Pendulum;

2.2.3 Model inheritance

Models can extend other models, and therefore provide more specialization while reusing code, similar to hierarchical class inheritance in typical object-oriented languages. By inheriting equations, data variables and class members from a base class, a subclass can inherit part of its behaviour while modifying and extending it with additional equations and variables[4].

Model classes can be partial, meaning that their equation systems are under-specified and can only be made solvable by extending them with subclasses providing additional equations. This can be seen as an analog to abstract classes in object-oriented languages. Variables of an instance are accessed through dot syntax, though they can be protected from outside access by putting them in the protected section, which will block direct access from outside but still make them available in submodels[16][4].

Classes can also contain variables with type declarations that are replaceable by subclasses, similar to generics in other languages. A field with a replaceable type is simply prefixed by the replaceable keyword. For making a new class based on a class with replaceable types, a new type definition specifying the types is made, which can then be instantiated like a regular class[4].
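A minimal sketch of a replaceable component is given below, assuming the standard Modelica replaceable, constrainedby and redeclare keywords. The classes Source, ConstSource, SineSource, System and SineSystem are hypothetical names introduced for this example only.

```modelica
partial model Source "Hypothetical common interface"
  Real y;
end Source;

model ConstSource
  extends Source;
equation
  y = 1.0;
end ConstSource;

model SineSource
  extends Source;
equation
  y = sin(time);
end SineSource;

model System
  // the type of src may be replaced by any type compatible with Source
  replaceable ConstSource src constrainedby Source;
  Real out;
equation
  out = 2 * src.y;
end System;

// new class definition in which the replaceable type has been specified
model SineSystem = System(redeclare SineSource src);
```

SineSystem can then be instantiated like any other model, and behaves like System with the constant source swapped for the sine source.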

2.2.4 Connections

Model instances can be connected to each other through special connect-equations in order to create larger systems. The interfaces for these connections are specified by connector classes, which contain a list of the variables that are carried by the signals. Variables in a connector can optionally be configured as flow variables, indicating that the values of all connected signals will sum to zero instead of being equal[4].


Connections are generally acausal, meaning that they, like equations, lack a specified data direction, but they can also be specified as input or output connections, meaning that they can only receive from or send to a component, respectively[4].

To connect one variable in a component to many subcomponents without having to write a large number of explicit connect-equations, the connection can be made implicitly by prefixing the shared variable in the top component with the inner keyword and declaring a reference variable with the same name in the subcomponents, prefixed by the outer keyword[4].
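The implicit connection mechanism can be sketched as follows. The models Room and Building and the variable T0 are hypothetical and only serve to show where the inner and outer prefixes go.

```modelica
model Room "Hypothetical subcomponent"
  outer Real T0 "reference to the shared ambient temperature";
  Real T(start = 30) "room temperature";
equation
  der(T) = -0.1 * (T - T0);
end Room;

model Building "Hypothetical top component"
  inner Real T0 "shared variable, visible in all subcomponents";
  Room kitchen;
  Room office(T(start = 15));
equation
  T0 = 20;
end Building;
```

Both Room instances refer to the same T0 without any connect-equation being written.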

Discrete events

Discrete instantaneous events can be modelled by using the when-statement, which only activates its subequations at the exact moment when one or more of its condition expressions transition to true. Discrete and continuous components can be freely combined to create hybrid systems. A when-statement can contain a special reinit equation that resets a variable to a new value on the event. In a reinit equation, the previous value of the variable can be accessed through the pre operator. Apart from when-statements, simple if-expressions and if-statements in normal equations may also be used to model discrete changes[4].
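A common way to illustrate when, reinit and pre is a bouncing ball. The sketch below is a hypothetical minimal version written for this text; the parameter values are arbitrary.

```modelica
model BouncingBall "Hypothetical hybrid model, for illustration only"
  parameter Real e = 0.8 "coefficient of restitution";
  parameter Real g = 9.81 "gravitational acceleration";
  Real h(start = 1.0) "height above the floor";
  Real v(start = 0.0) "vertical velocity";
equation
  // continuous part
  der(h) = v;
  der(v) = -g;
  // discrete part: active only at the instant the condition becomes true
  when h <= 0 then
    reinit(v, -e * pre(v));
  end when;
end BouncingBall;
```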

Basic electronics example

In listing 1, we take examples from Principles of Object-Oriented Modeling and Simulation with Modelica 3 to give a taste of Modelica. The listing defines electrical components in Modelica by defining variables, equations, connectors and using inheritance so that shared equations can be defined in a single partial superclass[4].

Packages

In order to avoid name conflicts and simplify sharing code, libraries can be distributed as packages, which give all content in the library its own hierarchical namespace. Other packages can then be imported in another package with the import keyword, which optionally allows importing namespaces directly at the top-level within the package. Within a package, an imported namespace can be given a custom name so that typing can be reduced without risking name conflicts as with top-level imports.
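The import variants can be sketched as follows. The packages Units and App and their contents are hypothetical names used only for this example.

```modelica
package Units "Hypothetical library package"
  type Length = Real(unit = "m");
  type Duration = Real(unit = "s");

  function speed
    input Length d;
    input Duration t;
    output Real v;
  algorithm
    v := d / t;
  end speed;
end Units;

package App "Hypothetical package using the library"
  import Units.Length;  // imports a single definition under its own name
  import U = Units;     // imports the namespace under the custom name U
  // "import Units.*;" would instead import all names directly

  model Trip
    parameter Length d = 100;
    parameter U.Duration t = 8;
    Real v = U.speed(d, t);
  end Trip;
end App;
```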


type Voltage = Real(unit="V");
type Current = Real(unit="A");
type Resistance = Real(unit="Ohm");
type Capacitance = Real(unit="F");

connector Pin "Electrical pin"
  Voltage v;
  flow Current i;
  // the flow keyword indicates that any connected
  // variables should sum to zero
end Pin;

partial model TwoPin "Electrical component with two pins"
  // partial since it does not have enough equations
  // to be fully defined
  Pin p,n;
  Voltage v;
  Current i;
equation
  v = p.v - n.v;
  0 = p.i + n.i;
  i = p.i;
end TwoPin;

model Resistor
  extends TwoPin;
  // include all variables and equations from TwoPin
  parameter Resistance R;
equation
  R*i = v;
end Resistor;

model Capacitor
  extends TwoPin;
  parameter Capacitance C;
equation
  C*der(v) = i;
end Capacitor;

model Ground
  Pin p;
equation
  0 = p.v;
end Ground;

model LowPass
  // named inPin and outPin since "in" is a reserved word in Modelica
  Pin inPin,outPin;
  parameter Resistance R;
  parameter Capacitance C;
  Resistor resistor(R=R);
  Capacitor capacitor(C=C);
  Ground ground;
equation
  connect(inPin, resistor.p);
  connect(resistor.n, outPin);
  connect(outPin, capacitor.p);
  connect(capacitor.n, ground.p);
end LowPass;

Listing 1: Models for basic electronics simulation


2.2.5 Functions and algorithms

More traditional imperative code can be written in Modelica inside algorithm sections. Unlike in normal equation sections, variables are assigned values directly with the := assignment operator, and they can be assigned multiple times within a single section. Both recursion and common imperative control flow statements such as if-then-else, for and while are supported. Algorithm sections in Modelica are pure, i.e. without side-effects and global state, in order to support safe usage inside equation systems[4].

The special function class type can be used for implementing named mathematical functions using algorithm sections. Functions can have multiple input variables and, unlike in many other languages, multiple output variables as well. Functions can also declare local variables inside protected sections for use in the algorithm section[4].

Two examples of implementations for the factorial function are provided below:

function factorial_recursive
  input Integer i;
  output Integer o;
algorithm
  if i > 1 then
    o := i * factorial_recursive(i-1);
  else
    o := 1;
  end if;
end factorial_recursive;

function factorial_imperative
  input Integer i;
  output Integer o;
protected
  Integer acc;
algorithm
  acc := 1;
  for x in 2:i loop
    acc := x*acc;
  end for;
  o := acc;
end factorial_imperative;
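Functions with several output variables, as mentioned above, are called with a parenthesized list on the left-hand side of the assignment. The following sketch uses two hypothetical functions, divMod and isDivisible, together with the built-in div and mod operators.

```modelica
function divMod "Hypothetical helper, for illustration only"
  input Integer a;
  input Integer b;
  output Integer q "quotient";
  output Integer r "remainder";
algorithm
  q := div(a, b);
  r := mod(a, b);
end divMod;

function isDivisible "Hypothetical caller of divMod"
  input Integer a;
  input Integer b;
  output Boolean divisible;
protected
  Integer q, r; // local variables
algorithm
  (q, r) := divMod(a, b); // receives both output values at once
  divisible := r == 0;
end isDivisible;
```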


2.2.6 MetaModelica

MetaModelica is an extended version of Modelica designed for modeling program-

ming languages. It complements the algorithm support in Modelica with various

features common to functional programming, such as tagged unions with support

for recursion, linked lists, tuples, and pattern matching. It also adds support for

exception handling and generics [6].

Parameterized types

Parameterized types enable types to be specialized by another type as a parameter, and are similar to generics in other programming languages. Most of the new built-in types in MetaModelica support type parameters[6].

Lists

Lists contain an arbitrary number of objects of a single type. Lists are implemented as immutable linked lists like in many functional languages, which enables parts of lists to be shared between different lists. New lists can be created in constant time by inserting new values before existing lists with the :: (cons) operator[6]. However, some operations like appending, getting a value from a specific index, and calculating the list length will have linear time complexity. Lists can be created either with the cons operator or by brace-surrounded list literals listing all values in the list. The literal syntax is also used to represent the empty list {}[13].

In addition, pattern matching can be used for extracting values from or comparing

lists[6]. MetaModelica also has several built-in methods for performing various

operations on linked lists[13]:

listAppend — Returns a copy of a list concatenated with another list

listDelete — Returns a copy of a list with the object at a specified index skipped

listEmpty — Returns a boolean indicating if a list is empty (has length 0)

listHead — Returns the first object in a list

listGet — Returns an object in a list by index (1-indexed)

listMember — Returns a boolean indicating if a list contains a specific value

listLength — Returns the length of a list

listRest — Returns the tail of the linked list (every object except the first)

listReverse — Returns a reversed copy of a list

List<Integer> l, l2, l3; //variable declaration

l := {3, 4, 5}; //list literal
l2 := 2 :: l; //creating a new list {2, 3, 4, 5} with the cons operator

i := listGet(l, 2); //accessing the second value in the list (4)
len := listLength(l); //getting the list length (3)
l3 := listReverse(l); //getting a reversed list ({5, 4, 3})

Tuples

Tuples contain an arbitrary number of objects of mixed types, and can be seen

as a way to create simple records without having to write record declarations.

Values in the tuple can be accessed either through pattern matching or by dot

notation, denoted by following the tuple with a dot and the index of the object

(1-indexed)[6].

Tuple<Integer, String, List<Real>> t; //variable declaration

t := (12, "hello", {1.0, 2.0, 3.0}); //tuple literal
i := t.2; //accessing the second value through dot notation

Union types

Union type objects store record data with a type-safe constructor describing its variant, and are similar to algebraic data types in functional programming. One or more record types can be defined for a single union type. Union type instances are also immutable, i.e. their fields cannot be modified after they have been created. Union types are recursive, meaning that they can have fields of their own type, and are therefore useful for describing tree structures, such as abstract syntax trees. Pattern matching can be used for checking and extracting field values[6].

uniontype Number
  record INT
    Integer int;
  end INT;
  record RATIONAL
    Integer int1;
    Integer int2;
  end RATIONAL;
  record REAL
    Real re;
  end REAL;
  record COMPLEX
    Real re;
    Real im;
  end COMPLEX;
end Number;

Number a; //variable declaration

a := RATIONAL(8, 13); //literal with RATIONAL constructor
a := REAL(1.618033); //literal with REAL constructor

Option types

Option type values either carry a single field of a specific type or none at all, and are generally used for cases where objects are optionally defined. They are implemented as a built-in parameterized union type with the constructors NONE() or SOME(x), where x is an object of the parameter type. The constructor can be checked with the isSome and isNone functions, and option type values can also be unpacked with pattern matching like other union types[6].

Option<String> o; //variable declaration

o := NONE(); //none literal
o := SOME("hej"); //some literal

if isNone(o) then
  ...
end if;
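Unpacking an option value with pattern matching can be sketched as below. The function optionOrFallback is a hypothetical helper written for this text, not a built-in function.

```modelica
function optionOrFallback "Hypothetical helper, for illustration only"
  input Option<String> o;
  input String fallback;
  output String s;
algorithm
  s := match o
    local
      String x;
    case SOME(x) then x;        // binds x to the carried value
    case NONE() then fallback;  // no value present
  end match;
end optionOrFallback;
```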


Pattern matching

One of the most important features in MetaModelica is its pattern matching support, which is similar to pattern matching in many functional languages. This can be used for more advanced control flow and enables simple and powerful handling of structural data[6].

The cases are tested in the order they are listed. Each case contains a pattern, the body to be executed, and a case return expression that is calculated and returned by the match expression after the body has finished. The unit value () can be returned if an actual return value is not desired. The return value can also be a tuple, allowing multiple values to be returned. The return values of all cases in a single match statement are required to be of the same type. The body of each case can either be an algorithm section or an equation section. Equation sections are however not allowed to contain differential equations. A match statement can have its own set of local variables, which can also be used for pattern binding[6].

Patterns that can be matched in a case include scalar constants such as integers and strings, record constructors with named or positional arguments, tuples, lists made with literal syntax, lists made with the cons (::) operator, and the _ wildcard, which allows and ignores all values. These patterns can also be nested. Variables placed in a pattern will be bound, i.e. assigned the actual value, if the case match succeeds. In addition, the whole pattern itself can be bound to a variable with the special as binding operator. The __ pattern as the single argument to a record constructor can be used to bind all fields without having to explicitly name them. Apart from the pattern expression itself, a pattern can also include a guard expression which must be true for the matching to succeed. The guard expression can include variables from the pattern expression[6].

Pattern matching expressions come in two variants with different behaviour when an exception is raised in the case body: match, which makes the whole match statement fail as expected, and matchcontinue, which instead rewinds the state and tries the following patterns, failing the whole match expression only when all patterns have been exhausted[6].
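The two variants can be sketched as follows. The union type Shape and the functions area and headOrZero are hypothetical names introduced only for this illustration; listHead is the built-in function described earlier, which raises an exception for an empty list.

```modelica
uniontype Shape "Hypothetical type, for illustration only"
  record CIRCLE
    Real radius;
  end CIRCLE;
  record RECT
    Real width;
    Real height;
  end RECT;
end Shape;

function area
  input Shape s;
  output Real a;
algorithm
  a := match s
    local
      Real r, w, h;
    // record pattern with a named argument
    case CIRCLE(radius = r) then 3.14159 * r * r;
    // record pattern with positional arguments and a guard
    case RECT(w, h) guard w > 0 and h > 0 then w * h;
    // remaining rectangles are treated as degenerate
    case RECT(__) then 0.0;
  end match;
end area;

function headOrZero
  input list<Integer> l;
  output Integer h;
algorithm
  h := matchcontinue l
    case _ then listHead(l); // raises an exception when l is empty
    case _ then 0;           // tried only if the case above failed
  end matchcontinue;
end headOrZero;
```

With match instead of matchcontinue, headOrZero would fail for an empty list, since the exception in the first case would make the whole expression fail.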

Comprehensions

List and array comprehensions allow the user to write concise mapping and filtering on collections using some syntactic sugar. They take a map expression and one or more collections with a named iterator variable for each collection, and can optionally take guards filtering the values. There are also “threaded” comprehensions which work like a zip between any number of lists[6].

list<Integer> l0 := list(1+x for x guard 0<=x in otherList);
list<Integer> l1 := list(a+b threaded for a in 1:2, b in 3:4); // {1+3,2+4}
list<Integer> l2 := list(a+b for a in 1:2, b in 3:4); // {4,5,5,6}

Exception handling and asserts

Exceptions such as out-of-bounds accesses and divisions by zero can be tested by

putting the expression or statement inside a failure call, which will succeed

if the test statement causes an exception and throw an exception if the test state-

ment succeeds. If an unhandled exception occurs inside a matchcontinue case,

the program will then rewind the state and try the following cases rather than

making the entire match statement fail. Exceptions can also be generated explic-

itly with the fail function, or by assertions using the assert function, which

takes an assertion condition, a message string and optionally an assertion severity

level[6].

2.3 The OpenModelica environment

OpenModelica is an open-source Modelica-based simulation and modeling environment. Some of its main purposes are to provide efficient, easy-to-use and well-visualized Modelica-based simulations while also serving as a teaching and research tool and as a reference implementation that is itself written largely in Modelica[5]. Most of the development of OpenModelica is done by Linköping University in Sweden.

2.3.1 Compiler structure

The OpenModelica compiler takes Modelica code and translates it to C code which

can be compiled by a standard compiler. The subsystem also provides an inter-

preter so that code can be tested interactively[3].

Most parts of the OpenModelica compiler are written in MetaModelica. The

OpenModelica compiler can compile MetaModelica code, including bootstrapping

itself[18].


[Figure 2.2: Overview of translation phases in the OpenModelica compiler. Phases: Translator, Analyzer, Optimizer, Code Generator, C Compiler, Simulation. Intermediate results: Modelica source, DAE with flattened models, DAE with sorted equations, DAE with optimized sorted equations, C source code, executable program.]

The OpenModelica Compiler is organized, like most other compilers, as a pipeline of these phases[4][3], as seen in figure 2.2:

Translator — parses the source code into the initial Absyn-format AST, converts it into the simplified SCode-format intermediate AST, and reduces the object-oriented structures to a single flat equation system in the DAE-format AST. Type checking and other static analyses are also performed here.

Analyzer — performs transformations on the equation system so that it can be efficiently solved, including dependency sorting the equations and converting to imperative assignments.

Optimizer — performs optimizations on the DAE.

Code Generator — generates compilable C code from the DAE. This code is then passed to a C compiler.

[Figure 2.3: Overview of the OpenModelica components. Modules: Parse, SCode/explode, Inst, Lookup, Static, Ceval, BackendDAECreate, Symbolic operations (BackEnd), SimCode, Code generator. Representations: Modelica code, Absyn, SCode, DAE, Backend DAE, sorted and optimized DAE, SimCode, C code.]

A more detailed overview of some of the most relevant modules used in the code generation is shown in figure 2.3.

2.3.2 Susan as a Code Generator

Susan is a template language used by the OpenModelica Compiler. Its purpose is

to allow easy to use text generation from MetaModelica structures.

A Susan file consists of several templates that accept some MetaModelica data type and return text. Templates can also use so-called buffers to fill in holes left in the returned text. Templates may be used solely for their effects on buffers and not for the text they return.

See listing 2 for an example of a Susan template. The listing contains a buffer auxFunction and a match on var. The cases of the match return the final result of the entire template. The VARIABLE case has a nested template contextCref to which it passes the auxFunction buffer.

template funArgBoxedDefinition(Variable var)
 "A definition for a boxed variable is always of type
  modelica_metatype, unless it's a function pointer"
::=
  let &auxFunction = buffer ""
  match var
  case VARIABLE(__)
    then 'modelica_metatype <%contextCref(name,contextFunction,&auxFunction)%>'
  case FUNCTION_PTR(__)
    then 'modelica_fnptr _<%name%>'
end funArgBoxedDefinition;

Listing 2: A snippet in the Susan template language

2.3.3 The DAE representation

The DAE representation is an AST representation that, unlike the previous representation stages, has the object-oriented structures such as class instances and connections simplified and flattened into a single equation system. This flattening is done from the SCode representation by the Inst module. However, MetaModelica data structures are still preserved and constructed at run-time. Like the other representations in OpenModelica, it is implemented using MetaModelica data structures such as union types, optionals and lists[3].

A function in DAE can contain various different elements, such as algorithms, equations of different kinds, variables, reinit statements, calls and asserts[15]. This overview will focus on the part implementing the algorithm subset, which is the subset most relevant to the IR implemented in this thesis.

Element and Algorithm union types

A function contains elements of various types, such as algorithm sections, equations of different forms, and variables. These are represented by the Element union type[15]. Described below are the element types most important to this thesis. Although all element types contain a source field of the ElementSource union type, containing metadata such as source code line numbers and the classes and instances the element belongs to, this field is skipped in these descriptions for brevity.

VAR - This element type represents variables and contains many fields related to names, types, equation flow and connections. The most important ones for this thesis are the component reference and the type field.

ALGORITHM - This element type represents algorithm sections and contains a field of the Algorithm union type, which simply contains a list of statements.

ComponentRef union type

Component references represent hierarchical path names and are typically used

for describing variables[15].

CREF_IDENT — This record type represents a non-hierarchical or bottom-level identifier, and contains the name as a string, its type and a list of optional subscripts.

CREF_ITER — This record type is used for iterators, and contains an index used for code generation in addition to the data in CREF_IDENT.

CREF_QUAL — This record type represents a higher level in a hierarchical path, and contains a component reference to the level below in addition to the data in CREF_IDENT.
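The nesting of component references can be sketched with a reduced union type. SimpleCref and crefToString below are hypothetical simplifications written for this text: the type and subscript fields described above are left out, and the field names are assumptions rather than the actual DAE definitions.

```modelica
uniontype SimpleCref "Hypothetical, simplified stand-in for ComponentRef"
  record CREF_IDENT
    String ident "name of the bottom-level identifier";
  end CREF_IDENT;
  record CREF_QUAL
    String ident "name of this level";
    SimpleCref componentRef "reference to the level below";
  end CREF_QUAL;
end SimpleCref;

function crefToString "Joins all levels of a reference with dots"
  input SimpleCref cr;
  output String s;
algorithm
  s := match cr
    local
      String id;
      SimpleCref rest;
    case CREF_IDENT(ident = id) then id;
    case CREF_QUAL(ident = id, componentRef = rest)
      then id + "." + crefToString(rest);
  end match;
end crefToString;
```

In this sketch, the variable resistor.p.v would be represented as CREF_QUAL("resistor", CREF_QUAL("p", CREF_IDENT("v"))).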

Absyn.Path union type

While Absyn.Path is strictly part of the Absyn representation definitions, it is frequently used in DAE for externally accessible objects such as functions or union types, and so it is mentioned here.

IDENT — This record type represents a non-hierarchical or bottom-level identifier, and contains the name as a string.

QUALIFIED — This path type represents a higher level in a hierarchical path, and contains the path to the level below in addition to the name string of its level.

Statement union type

The statement record types available are assignments of various types and control flow statements such as calls, if statements, loop statements like for and while, when statements, and simple skipping statements like break, continue and return[15]. Described below are the statement types most important to this thesis. Although all statement types, like the element types, contain an ElementSource source field containing metadata, this field is skipped in these descriptions for brevity as well.

STMT_ASSIGN — This statement type describes an assignment and contains the

type of the assignment and the expressions of the left and right hand side.

STMT_IF and the Else union type — This statement type describes an if statement and contains the conditional expression, a list of statements to be executed when the condition is true, and a value of the Else union type to describe the behaviour when the condition is false. The value in the Else union type field can either be NOELSE, signifying that nothing is done, ELSEIF, performing another conditional step and having the same fields as a STMT_IF, or ELSE, which simply contains a list of statements to be executed on a false condition.


STMT_FOR — This statement type describes a for(each) statement and contains

the type of the iterator, the name of the iterator variable, the range expres-

sion to be iterated over and a list of statements executed in the loop body.

It also contains a few additional code generation-aiding variables which did

not have to be considered in the development of this thesis.

STMT_WHILE — This statement type describes a while statement and contains

a conditional expression and a list of statements executed in the loop body.

STMT_NORETCALL — This statement type describes a call not having or storing any return values, and the only field it contains is an expression of the call type described further down.

STMT_BREAK, STMT_CONTINUE and STMT_RETURN — These statement types simply describe break, continue and return statements and do not contain any additional data. Note that value returns in Modelica are done by assignments to designated output variables rather than by return statements. Therefore STMT_RETURN does not contain any return values, but simply exits the function.
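The chained structure of the Else union type described under STMT_IF can be sketched as follows. SimpleElse and countBranches are hypothetical simplifications written for this text: conditions and statements are kept as plain strings instead of expression and statement values, and the field names are assumptions.

```modelica
uniontype SimpleElse "Hypothetical, simplified stand-in for the Else union type"
  record NOELSE
  end NOELSE;
  record ELSEIF
    String cond "condition, kept as plain text in this sketch";
    list<String> body "statements, kept as plain text in this sketch";
    SimpleElse else_ "what to do when cond is false";
  end ELSEIF;
  record ELSE
    list<String> body;
  end ELSE;
end SimpleElse;

function countBranches "Counts the elseif and else branches in a chain"
  input SimpleElse e;
  output Integer n;
algorithm
  n := match e
    local
      SimpleElse rest;
    case NOELSE() then 0;
    case ELSEIF(else_ = rest) then 1 + countBranches(rest);
    case ELSE(__) then 1;
  end match;
end countBranches;
```

For a statement of the form "if a then s1; elseif b then s2; else s3; end if;", the else part would in this sketch be ELSEIF("b", {"s2"}, ELSE({"s3"})), for which countBranches returns 2.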

Type union type

This union type represents the data types used in DAE[15].

T_INTEGER, T_REAL, T_STRING and T_BOOL — These types simply repre-

sent the basic data types in Modelica, i.e. integers, reals, strings and

booleans.

T_NORETCALL — This type represents the return value of a call without output

variables.

T_TUPLE — This type represents tuples as returned from functions with multiple output values, and contains a list of types indicating the type of each tuple element and an optional list of tuple field names as strings.

T_METALIST — This type represents MetaModelica lists and contains a type field indicating the type of its elements.

T_METATUPLE — This type represents MetaModelica tuples and contains a list

of types indicating the type of each tuple element.


T_METAOPTION — This type represents MetaModelica optionals and indicates the type of its element when it contains a value.

T_METAUNIONTYPE — This type represents MetaModelica union types.

T_METARECORD — This type represents MetaModelica records, and contains an Absyn.Path to the union type, an Absyn.Path to the record, a list containing the type of each field, the constructor ID for the record, a list of the Var components of each field, and a boolean indicating if the record type is a singleton.

T_METAARRAY — This type represents MetaModelica arrays and contains a type field indicating the type of its elements.

T_METABOXED — This type represents MetaModelica boxed values.

Exp union type

This union type represents the expression types that can be used in DAE such as

literals, operators, variable references and calls[15].
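How such expressions nest into trees can be sketched with a reduced union type. SimpleExp, Op and evalExp below are hypothetical simplifications written for this text: only integer constants and two binary operators are included, and the field names are assumptions rather than the actual DAE definitions.

```modelica
uniontype Op "Hypothetical, simplified operator type"
  record ADD
  end ADD;
  record MUL
  end MUL;
end Op;

uniontype SimpleExp "Hypothetical, simplified stand-in for Exp"
  record ICONST
    Integer integer "the constant value";
  end ICONST;
  record BINARY
    SimpleExp exp1 "left operand";
    Op op "operation to be performed";
    SimpleExp exp2 "right operand";
  end BINARY;
end SimpleExp;

function evalExp "Evaluates a constant expression tree"
  input SimpleExp e;
  output Integer v;
algorithm
  v := match e
    local
      Integer i;
      SimpleExp a, b;
    case ICONST(integer = i) then i;
    case BINARY(exp1 = a, op = ADD(), exp2 = b) then evalExp(a) + evalExp(b);
    case BINARY(exp1 = a, op = MUL(), exp2 = b) then evalExp(a) * evalExp(b);
  end match;
end evalExp;
```

The expression 1 + 2 * 3 would in this sketch be built as BINARY(ICONST(1), ADD(), BINARY(ICONST(2), MUL(), ICONST(3))), which evalExp evaluates to 7.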

ICONST, RCONST, SCONST and BCONST — These expression types simply represent constants of the basic Modelica data types, i.e. integers, reals, strings and booleans. Their sole field is the constant value.

CREF — This expression type represents a variable reference and contains a com-

ponent reference �eld and the type of the variable.

BINARY and UNARY — These expression types represent binary or unary arithmetic operations and contain two or one subexpressions, respectively, and an Operator value denoting the operation to be performed.

LBINARY and LUNARY — These expression types represent binary or unary logical operations such as and, not, and or. Similar to the arithmetic operations, they contain two or one subexpressions, respectively, and an Operator value denoting the operation to be performed.

RELATION — This expression type represents comparisons. Apart from having two subexpressions and an Operator value like other binary operations, it has some additional fields for model simulation handling which are not considered here.


IFEXP — This expression type represents an if expression and contains three subexpressions: one for the condition, and one each for the true and false case.

CALL — This expression type represents a call and contains the name of the function, a list of subexpressions denoting the arguments and a special CallAttributes field storing various additional data about the call. Some of the data stored in CallAttributes are the type of the return value, whether the function call returns multiple values as a tuple, whether the call is to a built-in function, and whether the call is inline or a tail call.

RANGE — This expression type represents numeric ranges, typically used in for statements, and contains the type of the numeric values, the start value, the end value and optionally the step between each value, which is 1 if not specified.

CAST — This expression type represents a type cast and contains the type the value is cast to and a subexpression representing the value being cast.

TSUB — This expression type represents tuple subscripts and contains the subex-

pression to be subscripted, the integer index, and the type of the returned

value.

ASUB — This expression type represents array subscripts and contains the subexpression to be subscripted and a list of integer indexes with each value representing a different array dimension.

RSUB — This expression type represents record value accesses and contains the subexpression of the record, the integer offset of the field, the name of the field, and the type of the returned value.

LIST — This expression type represents a MetaModelica list literal or a nil node

and contains a list of subexpressions denoting each element stored in the

list.

CONS — This expression type represents a MetaModelica list node and contains

two subexpressions denoting the head and tail of the list node.

META_TUPLE — This expression type represents a MetaModelica tuple node and

contains a list of subexpressions denoting each element stored in the tuple.

META_OPTION — This expression type represents a MetaModelica optional and

contains an optional subexpression.


METARECORDCALL — This expression type represents a MetaModelica record constructor and contains the path to the record, the arguments as a list of subexpressions, a list of field names, the record variant number, and a list of types for each field.

MATCHEXPRESSION — This expression type represents match expressions and contains a field of the MatchType union type that can be MATCHCONTINUE or MATCH, a list of subexpressions for the expressions to be matched, a list of local declarations as Element values, a list of cases as MatchCase values, and the type of the match expression. The MatchCase union type is described in more detail below.

BOX — This expression type represents a MetaModelica boxed value and contains

a subexpression for the value to be boxed.

UNBOX — This expression type represents the unboxing of a MetaModelica boxed value and contains a subexpression for the value to be unboxed and a type field indicating the type of the unboxed value.

PATTERN — This expression type represents various patterns as used in match statements. Its sole value is of the Pattern union type described in more detail below.

MatchCase union type

This union type represents a single case in a match expression and contains a single variant record type, CASE. It contains a list of patterns of the Pattern union type, an optional guard subexpression, a list of local declarations as elements, a case body as a list of statements, an optional case return subexpression, and some source-code related metadata[15].

Pattern union type

This union type represents patterns used in match expressions, and can also be

recursive like expressions[15].

PAT_WILD — This pattern type represents a wildcard that accepts all values

without binding anything. It does not contain any data.


PAT_CONSTANT — This pattern type matches various literals like numerals,

strings, empty list, and NONE. The record contains the expression and op-

tionally a type used for unboxing the value.

PAT_AS — This pattern type allows binding the entire value to a name while continuing to match on its contents, such as listVar as _::tailVar, and contains an identifier, an optional type for unboxing, some attributes of the identifier, and the pattern that will be matched.

PAT_META_TUPLE — This pattern type matches the content of a tuple and con-

tains a list of patterns, one for each element.

PAT_CONS — This pattern type represents a linked list node and contains two

subpatterns representing the head and tail of the list.

PAT_CALL — This pattern type matches a union type constructor and contains

a name, the index of the matched record within its union type, the patterns

for each record attribute, a list of variables for each attribute, a list of types,

and a boolean indicating if the union type is known to be a singleton.

PAT_SOME — This pattern type represents an optional with a SOME value and

contains a subpattern for the actual value.
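The recursive structure of the Pattern union type can be illustrated with a small matcher. The following is a minimal Python sketch, not the compiler's MetaModelica code: the class names, the `Some` wrapper and the use of Python lists and tuples as stand-ins for MetaModelica values are all assumptions made for illustration, and PAT_CALL as well as the unboxing types are left out.

```python
from dataclasses import dataclass
from typing import Any

# Hypothetical Python stand-ins for some of the Pattern variants.
@dataclass
class PatWild:            # PAT_WILD: accepts anything, binds nothing
    pass

@dataclass
class PatConstant:        # PAT_CONSTANT: matches a literal value
    value: Any

@dataclass
class PatAs:              # PAT_AS: binds the whole value, then keeps matching
    name: str
    pattern: Any

@dataclass
class PatMetaTuple:       # PAT_META_TUPLE: one subpattern per tuple element
    patterns: list

@dataclass
class PatCons:            # PAT_CONS: head and tail of a non-empty list
    head: Any
    tail: Any

@dataclass
class PatSome:            # PAT_SOME: an optional that holds a value
    pattern: Any

@dataclass
class Some:               # assumed runtime value for SOME(x); NONE is Python's None
    value: Any

def match(pat, value, env=None):
    """Return the bindings if value matches pat, otherwise None."""
    env = dict(env or {})
    if isinstance(pat, PatWild):
        return env
    if isinstance(pat, PatConstant):
        return env if value == pat.value else None
    if isinstance(pat, PatAs):
        env[pat.name] = value
        return match(pat.pattern, value, env)
    if isinstance(pat, PatMetaTuple):
        if not isinstance(value, tuple) or len(value) != len(pat.patterns):
            return None
        for sub, item in zip(pat.patterns, value):
            env = match(sub, item, env)
            if env is None:
                return None
        return env
    if isinstance(pat, PatCons):
        if not isinstance(value, list) or not value:
            return None
        env = match(pat.head, value[0], env)
        return None if env is None else match(pat.tail, value[1:], env)
    if isinstance(pat, PatSome):
        if not isinstance(value, Some):
            return None
        return match(pat.pattern, value.value, env)
    raise TypeError(f"unsupported pattern: {pat!r}")
```

In this model the pattern `listVar as _::tailVar` would be written as `PatAs("listVar", PatCons(PatWild(), PatAs("tailVar", PatWild())))`, with a plain variable binding expressed as a PAT_AS around a wildcard.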


Chapter 3

Method

3.1 Design

During the design phase, different IR designs and existing IR solutions of notable compilers were evaluated and compared in order to create an initial IR design. The evaluation focused on extendability, ability to implement optimizations and ease of implementation with regard to conversions from the AST and to the back-end code, with special focus on easy conversion to SSA-based back ends such as LLVM.

The code base of the OpenModelica compiler and its corresponding documentation were also investigated in order to make good design decisions.

3.2 Implementation

The implementation roughly consists of three parts: one phase converting the

DAE representation to the new IR, one optimization phase where the generated

IR is improved in some respect, and another one converting the new IR to com-

pilable C-code. MetaModelica was used as the programming language for the

implementation, since this language is used by the rest of the compiler.

3.3 Performance evaluation

During the evaluation phase, the code quality and performance of the new code generator were compared to the results for the old code generator. These results were then analyzed to see how large the differences between the two generators are and whether the new representation and its optimizations give an improvement.

The time was measured with the execStat timing module that is built into the OpenModelica compiler. As the execution time of compiled code was not previously measured, this had to be implemented separately with a core change outside the MidCode code base. The test cases were executed multiple times in order to guard against anomalous results, and a result representing the median case was then picked. The input data and the exact number of executions were chosen so that the total time would be large enough to be measured accurately without taking too long to run. The computer used to run the measurements was a laptop with an Intel i7 2630QM (Sandy Bridge) processor.
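The repeat-and-take-the-median procedure can be sketched as follows. This is an illustrative Python sketch only; the thesis measurements were made with execStat inside the compiler, and the helper names `collect_samples` and `pick_median` are assumptions.

```python
import statistics
import time

def collect_samples(fn, repetitions):
    """Run fn() several times and return the wall-clock duration of each run."""
    samples = []
    for _ in range(repetitions):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return samples

def pick_median(samples):
    """Pick an observed sample representing the median case.

    median_low returns an actual measurement rather than an average of two,
    so a single outlier run cannot distort the reported value.
    """
    return statistics.median_low(samples)
```

A benchmark would then be reported as `pick_median(collect_samples(run_benchmark, 5))`, where `run_benchmark` is whatever callable executes the test case.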

The following benchmark functions were made, which can be seen in appendix A:

fibonacci – Recursive Fibonacci F30 without memoization, executed 100 times

mandelbrot – ASCII Mandelbrot with 1000 iterations returning a linked list of

characters, executed 200 times

tak – Takeuchi function tak(18, 12, 6), executed 10000 times

qsort – Quick-sort of a random array of 20000 elements, executed 100 times
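For readers without access to appendix A, the two purely recursive benchmarks have the following textbook shape. This is a Python sketch of the standard definitions, given only to show what kind of call-heavy code is being measured; the actual MetaModelica benchmark sources in the appendix may differ in detail.

```python
def fib(n):
    """Naive recursive Fibonacci without memoization."""
    return n if n < 2 else fib(n - 1) + fib(n - 2)

def tak(x, y, z):
    """Takeuchi function in its usual benchmark form."""
    if y < x:
        return tak(tak(x - 1, y, z), tak(y - 1, z, x), tak(z - 1, x, y))
    return z
```

Both functions spend nearly all of their time in function calls and simple integer arithmetic, which makes them sensitive to call overhead in the generated code.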

The C compiler used for compiling the generated code was GCC 7.2. The opti-

mization setting for the C compilation was changed to -O2 rather than the usual

-O0 since it was noticed that the low-level style of the MidCode-generated C code

was poorly suited for unoptimized compilation. It was also noted that the parti-

tion function in the Quicksort test was tail-call optimized by the original genera-

tor, something that has not been implemented in the current MidCode generator.

3.4 Code complexity measurements

The complexity of the different code generators was also measured using the number of lines of code (LOC). This is measured because being able to have simpler code generators means that there is less work to port the language to another code generator.


According to Nguyen et al., LOC is used widely within industry and literature while being an essential component of several more advanced software complexity measurements[14]. Specifically, we use the number of lines in the file including empty lines, comments, etc. A discussion of how appropriate and relevant this metric is can be found in section 5.3.

Both of the target-specific implementations are in the Susan template language. We compare to CodegenCFunctions.tpl, which is closest in functionality. Unfortunately, this is not a precise comparison since the file chosen for comparison implements more features than our implementation. The old back end also has more template files like CodegenC.tpl, see table 4.1, but these mostly implement features that are outside the scope of this thesis.


Chapter 4

Results

4.1 Overview of the MidCode design

MidCode, the resulting IR, represents the control flow of a procedure by the common approach of basic blocks. Each basic block has a terminator which declares what control flow action happens at the end of the block; this may include operations returning values, such as calls. The data flow of the procedure is represented by named variables, compiler-created temporaries and simple unary or binary operations. Unlike in SSA, named variables can be rewritten.

The MidCode related code paths are divided into three phases: “From Modelica

to MidCode”, “MidCode Transformations”, and “From MidCode to C”.

4.1.1 IR design details

This part describes the uniontypes and records defined for MidCode and the fields contained within these.

Program

A program is represented by the Program type. This type contains a name and

a list of functions.


[Figure: DAE/SimCode representation → DAEToMid → MidCode → MidCode transformations → MidCode → MidToC → C code]

Figure 4.1: Overview of MidCode phases

Function

Functions are represented by the Function type. Each function contains a name

as an Absyn.Path, several lists of local, input and output variables, a body rep-

resented as a list of basic blocks, and ID references to the special entry and exit

basic blocks.

Block

Basic blocks are represented by the Block type. They contain a block ID number,

a list of statements and a terminator.

Stmt

Statements are represented by the Stmt type and can either be a NOP or an ASSIGN, which simply assigns the value of an RValue to a Var. A statement has linear control flow but otherwise has various effects.

Var

Variables are represented by the Var type, and are used to represent both vari-

ables used by the Modelica code and variables introduced during the translation

process. Vars have a name and a data type.


OutVar

Since output variables can be thrown away by the caller, lists of output variables in call statements contain the OutVar type rather than the plain Var type. Instances of this type can either be an OUT_VAR containing an actual Var instance, or OUT_WILD indicating that the caller will not save the value.

RValue

An RValue is a value that can be placed on the right side of an assignment.

The RValue type in MidCode contains a few expressions like addition of two

Vars and negating a Var. They appear in MidCode as part of assign statements.

RValues do not have other RValues as operands, instead temporary variables

are created during the translation process which are then sent as operands.

UNARYOP — An UNARYOP is a constructor of the RValue union representing operations with a single operand, i.e. a single Var. UNARYOP has variants for copying the unchanged value, negating, logically inverting, boxing and unboxing a variable. The operation to choose is determined by an enumeration value.

BINARYOP — A BINARYOP is a constructor of the RValue union representing

operations with two operands, i.e. two Vars. BINARYOP has several vari-

ants representing common operations like addition, subtraction, division,

multiplication, logical or/and, and comparisons. The operation to choose is

determined by an enumeration value.

Literal value constructors — A group of constructors of the RValue union representing literal values. The LITERALINTEGER constructor represents integer literals, the LITERALREAL constructor represents real (floating-point) literals, the LITERALBOOLEAN constructor represents boolean values, and the LITERALSTRING constructor represents string literals. The more complex meta object literals used for records, linked lists, optionals and tuples are represented by the LITERALMETATYPE constructor.

Meta object data accessors — A group of constructors that are used for accessing data about meta objects. The METAFIELD constructor returns a value from a meta object slot and is used for accessing record and tuple fields. There are also three constructors specifically made for pattern matching: UNIONTYPEVARIANT, returning the value of the record variant for uniontypes; ISCONS, for checking if a linked list node is cons or nil; and ISSOME, for checking if an optional has a value.
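The key property of RValues, that operands are always variables and never nested RValues, can be made concrete with a small model. The sketch below is hypothetical Python rather than the MetaModelica definitions: the class and enumeration names are assumptions, only a handful of operations are included, and boxing, the remaining literal kinds and the meta object accessors are omitted.

```python
import operator
from dataclasses import dataclass
from enum import Enum

class UnaryOp(Enum):
    MOVE = "move"   # copy the unchanged value
    NEG = "neg"
    NOT = "not"

class BinaryOp(Enum):
    ADD = "add"
    SUB = "sub"
    MUL = "mul"
    LESS = "less"
    AND = "and"

@dataclass(frozen=True)
class Var:
    name: str

@dataclass(frozen=True)
class LiteralInteger:
    value: int

@dataclass(frozen=True)
class Unary:
    op: UnaryOp     # the enumeration value selects the operation
    src: Var        # the operand is a Var, never another RValue

@dataclass(frozen=True)
class Binary:
    op: BinaryOp
    lhs: Var
    rhs: Var

_UNARY = {
    UnaryOp.MOVE: lambda x: x,
    UnaryOp.NEG: operator.neg,
    UnaryOp.NOT: operator.not_,
}
_BINARY = {
    BinaryOp.ADD: operator.add,
    BinaryOp.SUB: operator.sub,
    BinaryOp.MUL: operator.mul,
    BinaryOp.LESS: operator.lt,
    BinaryOp.AND: lambda a, b: a and b,
}

def eval_rvalue(rvalue, env):
    """Evaluate one RValue against a mapping from variable names to values."""
    if isinstance(rvalue, LiteralInteger):
        return rvalue.value
    if isinstance(rvalue, Unary):
        return _UNARY[rvalue.op](env[rvalue.src.name])
    if isinstance(rvalue, Binary):
        return _BINARY[rvalue.op](env[rvalue.lhs.name], env[rvalue.rhs.name])
    raise TypeError(f"unsupported RValue: {rvalue!r}")
```

Because every operand is a plain variable lookup, evaluating an RValue never recurses, which is what keeps later analyses over assign statements simple.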


Terminator

Each basic block has a terminator controlling the control flow following the block, which is represented by the Terminator type. Terminators have effects and can cause branching and/or exceptional control flow.

GOTO — The GOTO terminator simply jumps to a given block.

RETURN — The RETURN terminator simply exits the procedure.

BRANCH — The BRANCH terminator jumps to one of two given blocks depending

on if the given condition variable is true or false, and is used by several

terminator types.

SWITCH — The SWITCH terminator jumps to one of multiple given blocks in a dictionary depending on the value of the given condition variable; this is used when generating code for match statements.

CALL — The CALL terminator is a function call to another Modelica function. Since it can cause control flow via exceptions (for example through the fail function), it is defined as a terminator rather than a statement.

LONGJMP, PUSHJMP and POPJMP — The LONGJMP terminator causes a control flow transfer to the active PUSHJMP call site, even across function boundaries. The PUSHJMP terminator is used to add a new active location for LONGJMP, while the POPJMP terminator is used to deactivate a corresponding active PUSHJMP and cause the previously called one to become active.

ASSERT and TERMINATE — The ASSERT terminator aborts the program with an error message if a Var containing a condition result has a false value. The TERMINATE terminator simply aborts unconditionally with an error message. The error message for both terminators is given by a Var.
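How blocks, statements and terminators fit together can be shown with a tiny interpreter. This is a hypothetical Python model, not MidCode itself: statements are simplified to `(destination, function, operand names)` triples, only GOTO, BRANCH, SWITCH and RETURN are modelled, and CALL, the jump terminators, ASSERT and TERMINATE are left out.

```python
from dataclasses import dataclass

@dataclass
class Goto:
    target: int

@dataclass
class Branch:
    cond: str        # name of the condition variable
    on_true: int
    on_false: int

@dataclass
class Switch:
    cond: str
    cases: dict      # value of the condition variable -> block id

@dataclass
class Return:
    pass

@dataclass
class Block:
    id: int
    stmts: list      # each statement: (dest, function, operand names)
    terminator: object

def run(blocks, entry, env):
    """Execute blocks from entry until a Return terminator; returns env."""
    current = entry
    while True:
        block = blocks[current]
        for dest, fn, operands in block.stmts:
            env[dest] = fn(*(env[name] for name in operands))
        term = block.terminator
        if isinstance(term, Return):
            return env
        if isinstance(term, Goto):
            current = term.target
        elif isinstance(term, Branch):
            current = term.on_true if env[term.cond] else term.on_false
        elif isinstance(term, Switch):
            current = term.cases[env[term.cond]]
        else:
            raise NotImplementedError(type(term).__name__)

def example_sum_blocks():
    """Blocks computing acc = 1 + 2 + ... + n with a loop."""
    return {
        0: Block(0, [("i", lambda: 1, ()),
                     ("acc", lambda: 0, ()),
                     ("one", lambda: 1, ())], Goto(1)),
        1: Block(1, [("c", lambda i, n: i <= n, ("i", "n"))], Branch("c", 2, 3)),
        2: Block(2, [("acc", lambda acc, i: acc + i, ("acc", "i")),
                     ("i", lambda i, one: i + one, ("i", "one"))], Goto(1)),
        3: Block(3, [], Return()),
    }
```

Note that `acc` and `i` are assigned in more than one block, which is allowed because named variables in this kind of IR are not in SSA form.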

4.1.2 From Modelica to MidCode

MidCode is designed to represent interesting low-level properties uniformly, which means that we need to lower several high-level Modelica representations into a composition of MidCode constructs. The DAEToMid phase takes Modelica functions as given from the SimCode module and converts them to MidCode. The most important fields in a SimCode function object are its name as an Absyn.Path, its variable definitions, and its list of DAE statements.


Expressions are flattened and converted to MidCode by recursively translating subexpressions into statements. The subexpression statements store their values into temporary variables which are then used as operands in the statement of the containing expression.
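The flattening step can be sketched in a few lines. The following Python sketch assumes a toy expression encoding (integers, variable names, and `(op, lhs, rhs)` tuples) and a `_tmpN` naming scheme for temporaries; neither is taken from the DAEToMid implementation.

```python
import itertools

def flatten(expr, stmts, fresh):
    """Flatten a nested expression into statements with variable operands.

    expr is an int literal, a variable name (str), or a tuple (op, lhs, rhs).
    Emitted statements are appended to stmts; the return value is the name
    of the variable that holds the result.
    """
    if isinstance(expr, str):
        return expr                      # variables can be used directly
    if isinstance(expr, int):
        tmp = f"_tmp{next(fresh)}"
        stmts.append((tmp, "literal", expr))
        return tmp
    op, lhs, rhs = expr
    left = flatten(lhs, stmts, fresh)    # subexpressions are translated first
    right = flatten(rhs, stmts, fresh)
    tmp = f"_tmp{next(fresh)}"
    stmts.append((tmp, op, left, right)) # operands are now plain variables
    return tmp
```

For `a + b * 2`, encoded as `("+", "a", ("*", "b", 2))` and called with `fresh = itertools.count()`, this emits a literal assignment to `_tmp0`, then `_tmp1 = b * _tmp0`, then `_tmp2 = a + _tmp1`, and returns `_tmp2`.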

An if statement is translated into a set of blocks evaluating the condition, terminated by a branch. One branch target is the set of blocks representing the body of the if; the other leads either to the else body or to the end of the if statement. At the end of the if body, the last MidCode block jumps to the end of the statement. For a while statement, the terminator at the end of the body instead jumps back to the evaluation of the condition. A for statement is translated somewhat similarly to a while statement but generates its own code for iterator handling and comparisons. When translating loops, the generated body and end labels are pushed onto stacks so that break and continue statements can be implemented correctly.
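The lowering of structured control flow, including the loop stack used for break and continue, can be sketched as below. This is a simplified Python model under several assumptions: statements are tuples such as `("assign", text)`, `("if", cond, then, else)`, `("while", cond, body)`, `("break",)` and `("continue",)`, conditions are opaque strings stored into a variable named `_cond`, and an else block is always created even when it is empty.

```python
class Lowerer:
    def __init__(self):
        self.blocks = {}
        self.loop_stack = []   # (continue target, break target) per open loop
        self.current = self.new_block()

    def new_block(self):
        block_id = len(self.blocks)
        self.blocks[block_id] = {"stmts": [], "term": None}
        return block_id

    def terminate(self, term, next_block):
        """Close the current block with term and continue in next_block."""
        self.blocks[self.current]["term"] = term
        self.current = next_block

    def emit(self, text):
        self.blocks[self.current]["stmts"].append(text)

    def lower(self, stmts):
        for stmt in stmts:
            kind = stmt[0]
            if kind == "assign":
                self.emit(stmt[1])
            elif kind == "if":
                _, cond, then_body, else_body = stmt
                then_b, else_b, end_b = (self.new_block(), self.new_block(),
                                         self.new_block())
                self.emit(f"_cond := {cond}")
                self.terminate(("branch", "_cond", then_b, else_b), then_b)
                self.lower(then_body)
                self.terminate(("goto", end_b), else_b)
                self.lower(else_body)
                self.terminate(("goto", end_b), end_b)
            elif kind == "while":
                _, cond, body = stmt
                head, body_b, end_b = (self.new_block(), self.new_block(),
                                       self.new_block())
                self.terminate(("goto", head), head)
                self.emit(f"_cond := {cond}")
                self.terminate(("branch", "_cond", body_b, end_b), body_b)
                self.loop_stack.append((head, end_b))
                self.lower(body)
                self.loop_stack.pop()
                self.terminate(("goto", head), end_b)   # back edge
            elif kind == "break":
                target = self.loop_stack[-1][1]
                self.terminate(("goto", target), self.new_block())
            elif kind == "continue":
                target = self.loop_stack[-1][0]
                self.terminate(("goto", target), self.new_block())
            else:
                raise ValueError(f"unknown statement: {kind}")

def lower_program(stmts):
    lowerer = Lowerer()
    lowerer.lower(stmts)
    lowerer.terminate(("return",), None)
    return lowerer.blocks
```

A break or continue leaves a fresh block behind it that nothing jumps to. Such blocks are harmless because they can be dropped later by an unreachable-code pass.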

Match expressions are translated into a state machine that keeps track of which

case is next. Patterns are translated into a series of checks, consisting of MidCode

blocks and branches, and assignments. If a check fails, it advances the case state

machine and branches to the next case, but if all checks succeed, the match per-

forms the assignments and continues with the body of the case. If no cases match,

then an exception is raised.

Exceptional control flow is translated into MidCode constructs that are close to the current C back-end implementation. Basically, “landing pads” are constructed and removed using PUSHJMP and POPJMP. There are also restrictions put in place to simplify efficient transformations of exceptional structures; see section 4.1.5 for more details. These restrictions are not limiting for this phase since all exceptional control flow from MetaModelica already fits into the restricted structure, but the phase does need to take care not to introduce more complicated control flow that includes exceptional components.

One interesting symbiosis presents itself between the two phases “From Modelica to MidCode” and “MidCode Transformations”. The optimization phases will contain many transformations that make MidCode constructs more efficient, allowing the initial transformation to be simpler as long as it produces optimizable MidCode. For example, any dead code produced can be removed by an optimization pass. It is also possible to add additional “cleanup” optimizations tuned for removing inefficiencies often introduced in our translation.
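One simple cleanup of this kind is removing blocks that can never be reached from the entry block. The sketch below is an illustrative Python version that assumes terminators are encoded as tuples (`("goto", t)`, `("branch", cond, t, f)`, `("switch", cond, cases)`, `("return",)`) and that ignores statements; it is not the pass implemented in MidToMid.

```python
def successors(term):
    """Block ids a terminator can transfer control to."""
    kind = term[0]
    if kind == "goto":
        return [term[1]]
    if kind == "branch":
        return [term[2], term[3]]
    if kind == "switch":
        return list(term[2].values())
    return []   # "return" and other exits have no successors in this model

def remove_unreachable(blocks, entry):
    """Keep only the blocks reachable from entry.

    blocks maps a block id to its terminator tuple.
    """
    reachable = set()
    worklist = [entry]
    while worklist:
        block_id = worklist.pop()
        if block_id in reachable:
            continue
        reachable.add(block_id)
        worklist.extend(successors(blocks[block_id]))
    return {bid: term for bid, term in blocks.items() if bid in reachable}
```

With such a pass available, the translation from DAE does not need to avoid emitting dead blocks itself, for instance the empty block that follows a break.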


4.1.3 MidCode Transformations

MidCode is designed to allow transformations from MidCode to MidCode that optimize or otherwise improve the code.

One such transformation is special: the one that removes function-local longjmps. MidCode allows longjmp within functions, but that is not correct in C, thus local longjmps must be removed at some point. Doing it in the “from Modelica” phase means the optimization phase is not allowed to introduce longjmps, for example due to inlining. Doing it in the later “to C” phase means that other back ends with the same restrictions need to do the same work that could be handled once and for all in MidCode transformations. Doing it in the transformation phase allows the analyzer to look at the new control flow generated from the transformation and try to glean further beneficial transformations.

4.1.4 From MidCode to C

The MidToC phase takes MidCode functions produced by the previous DAEToMid phase and transforms them to C code using Susan templates. There are templates for generating basic MidCode constructs like functions and statements, as well as helper templates where appropriate to help generate the constructs. The code generation phase does appear to be simpler than in the old code generator: it requires no use of Susan buffers, which means the generation can simply be implemented as string appending.

4.1.5 Exceptional Control Flow

MetaModelica has language constructs for exceptional control flow. This includes constructs like matchcontinue and fail. In C, the interesting parts of this are implemented using longjmp. The MidCode implementation is straightforwardly based on the same model.

MidCode contains three terminators for handling this. The PUSHJMP terminator uses the C call setjmp and a special jump buffer to set a landing pad that the next LONGJMP terminator will go to. The POPJMP terminator uses an old buffer saved by a corresponding PUSHJMP to remove the landing pad made by that PUSHJMP.

As previously hinted at, there should be some structure with respect to PUSHJMP and POPJMP. Each PUSHJMP should be accompanied by a corresponding POPJMP with the correct variables. The path between them can branch or loop, but the two terminators should form the only entry and exit of the subgraph between them. See listing 3 for a pseudo-code example of exceptional effects “hidden” by branching.

if (c) pushjmp(...);
body(...);
if (c) popjmp(...);

Listing 3: Incorrect exceptional control flow in MidCode.

These restrictions might be lifted but currently exist in order to simplify the nec-

essary analysis for converting local longjmps to goto.
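One necessary condition implied by this structure is that every block is entered with the same number of active landing pads regardless of the path taken. The Python sketch below checks only that condition; it is an assumption-laden illustration rather than the analysis used in the thesis. Blocks are modelled as `(kind, successors)` pairs, only normal successors are followed (the landing-pad edge of a PUSHJMP is ignored), and the stronger single-entry, single-exit requirement is not verified.

```python
def check_jmp_structure(blocks, entry):
    """Return True if PUSHJMP/POPJMP nesting depth is consistent on all paths.

    blocks maps a block id to (kind, successor ids), where kind is
    "pushjmp", "popjmp", "return" or "other" and describes the terminator.
    """
    depth = {entry: 0}          # active landing pads on entry to each block
    worklist = [entry]
    while worklist:
        block_id = worklist.pop()
        kind, succs = blocks[block_id]
        d = depth[block_id]
        if kind == "pushjmp":
            d += 1
        elif kind == "popjmp":
            d -= 1
            if d < 0:
                return False    # pop without a matching push
        elif kind == "return" and d != 0:
            return False        # leaving the function with a pad still active
        for succ in succs:
            if succ not in depth:
                depth[succ] = d
                worklist.append(succ)
            elif depth[succ] != d:
                return False    # two paths disagree, as in listing 3
    return True
```

Encoding listing 3 as a conditional jump around a PUSHJMP block, a shared body, and a conditional jump around a POPJMP block makes the check fail, since the body can be reached both with and without an active landing pad.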

4.1.6 Data in MidCode

Many boxed data types from MetaModelica are translated into a special uniform representation in C. This includes union types, linked lists, arrays and optionals. The representation is a heap-allocated sequence of pointer-sized words with the first one being a header. The header contains the length of the sequence as well as a constructor tag for union types. Lists and options are coded like union types, as can be expected.
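The idea of a header word carrying both a length and a constructor tag can be sketched as follows. The bit positions used here (tag in bits 2 to 9, slot count from bit 10 upward) are an illustrative assumption, not a layout stated in this thesis, and a Python list stands in for the heap-allocated word sequence.

```python
SLOT_SHIFT = 10     # assumed: number of slots stored from bit 10 upward
CTOR_SHIFT = 2      # assumed: constructor tag stored in bits 2..9
CTOR_MASK = 0xFF

def make_header(slots, ctor):
    """Pack the slot count and the constructor tag into one header word."""
    return (slots << SLOT_SHIFT) | ((ctor & CTOR_MASK) << CTOR_SHIFT)

def header_slots(header):
    return header >> SLOT_SHIFT

def header_ctor(header):
    return (header >> CTOR_SHIFT) & CTOR_MASK

def box_record(ctor, fields):
    """Model a boxed value: the header word first, then one word per field."""
    return [make_header(len(fields), ctor)] + list(fields)
```

Under this model a check such as ISCONS or UNIONTYPEVARIANT only needs to read the first word and compare the extracted tag, which is why lists and options can share the representation used for union types.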

The data representation in MidCode reuses many parts of the MetaModelica rep-

resentation and is very similar.

There are some simple types: integer, real, enumeration, bool. Several of the “meta”-prefixed types have the same representation in C, modelica_metatype, so in MidCode they are unified under METATYPE. The DAE types included are METATUPLE, METARECORD, METALIST, METAARRAY, and METAOPTION.

4.1.7 Complexity of MidCode

For the complexity analysis, table 4.1 contains LOC measurements for the current code generator, while table 4.2 contains the LOC measurements for the code introduced in this thesis. The differences are large, but not all of the files included in the results here are relevant for comparisons, and the comparisons made are further qualified and elaborated upon in later chapters.


6094 CodegenC.tpl

6995 CodegenCFunctions.tpl

12089 total

Table 4.1: Lines of code for corresponding parts of old back end

727 MidToC.tpl

1513 DAEToMid.mo

99 HashTableMidVar.mo

230 MidCode.mo

184 MidToMid.mo

2969 total

Table 4.2: Lines of code for new back end

Program      C        MidC     %
fibonacci    0.942s   1.498s   63%
mandelbrot   3.037s   3.309s   92%
tak          2.762s   2.459s   112%
qsort        2.368s   2.895s   82%

Table 4.3: Performance measurements for old and new code generator

4.2 Performance measurements

Table 4.3 contains the runtime performance measurements for the old code generator and the new MidCode-based code generator, and the performance of the new generator as a percentage of the performance of the old generator.
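The percentage column is consistent with dividing the old runtime by the new runtime, so that values above 100% mean the MidCode-generated code ran faster. The short Python check below recomputes the column from the timings in table 4.3; the function name is an assumption introduced for this illustration.

```python
# (old C generator seconds, new MidCode generator seconds), from table 4.3
timings = {
    "fibonacci": (0.942, 1.498),
    "mandelbrot": (3.037, 3.309),
    "tak": (2.762, 2.459),
    "qsort": (2.368, 2.895),
}

def relative_performance(old_seconds, new_seconds):
    """Performance of the new code as a percentage of the old code.

    Performance is inversely proportional to runtime, so the ratio is
    old runtime divided by new runtime.
    """
    return round(100 * old_seconds / new_seconds)

percentages = {name: relative_performance(old, new)
               for name, (old, new) in timings.items()}
```

Only tak exceeds 100%, matching the observation that the new generator is faster on one benchmark and slower on the other three.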


Chapter 5

Discussion

5.1 Performance results

The performance measurements made showed that performance was mostly similar, which is not particularly surprising. While the MidCode-generated code was noticeably better in some cases, it could also be significantly worse in others.

Many optimizations are made in DAE, which means MidCode gets the same advantages as the previous code generator. As we have not yet implemented impactful enough optimizations, the simplifications did not by themselves improve the performance of the code generator. However, we hope that the new IR will prove easy to implement optimizations in. A likely improvement is to use LLVM instead of C as the output; this will likely be helped by having the simpler basic block-based MidCode as a starting point.

5.2 Design of MidCode

During the design of MidCode there were many choices that had to be made. In

this section we will discuss some of them.

While SSA is a powerful tool in IRs, MidCode does not use SSA. In order to get value from SSA we would need to perform an analysis like LLVM’s mem2reg, which we initially did not want to plan for. One of our inspirations, Rust’s MIR, also does not feature SSA because LLVM performs many of the analyses that use SSA. Since MidCode is intended to be used with LLVM in the future, OpenModelica will also be able to leverage these optimizations.


We decided to transform the many control flow constructs in DAE to fewer MidCode constructs. We used basic blocks with terminators that point to the next block, like MIR. This was done because control flow is crucial for being able to reason about a program, and any logic used for transforming the code needs to be able to handle every construct. Otherwise it may produce incorrect results, for example if the analysis fails to notice that a return or break means some code is not reached and its effects should not happen.

We did not change the types of variables because we did not find a suitable model and since the DAE type system was considered sufficient at this stage. Maybe with more time, we could have made a lower-level type system more suitable for a low-level IR.

5.3 Code Complexity of MidCode

In the LOC comparison, we compare Susan templates to each other, hence we compare the same style and use the same method. Thus, according to Nguyen et al., the size measurement itself is useful for comparison[14].

Unfortunately, while the size is comparable, the functionality is not. CodegenCFunctions does more things than our implementation, for example supporting parallel computations. We do, however, expect that our template file would not increase significantly in size if additional features were added to our new back end. Thus we hope that the numbers would hold if future work makes the comparison fairer in terms of features. But currently we do not think that any particular importance can be placed on this result.

5.4 Related Work

MidCode is fairly similar to other intermediate representations mentioned in the Theory chapter. Most notably, it takes inspiration from the MIR representation used in the Rust compiler. Like MIR, it is based on flat primitive operations and control-flow graphs with basic blocks and terminators, and does not use SSA to describe the data flow. It also has a similar purpose, as the Rust compiler introduced MIR as a new stage between the AST and the LLVM generation, whereas previously the LLVM code was generated directly from the AST. The SIL representation used in the Swift compiler also has a similar purpose and a similar design, but has several differences compared to MidCode and MIR, like an SSA-based data flow, basic block arguments, and calls implemented as regular statements rather than terminators.

The GHC Haskell compiler also has several representations that we could compare to. MidCode is intended as a low-level abstraction over C and thus fits GHC’s Cmm very well. Compared to Cmm, MidCode has fewer features, and does not allow control over register allocation or tail calls. That leaves us to wonder how OpenModelica’s version of STG or Core would look. Of these, Core seems the most interesting since it is the “Core” of Haskell semantics. There is a similar higher-level intermediate representation used in OpenModelica, DAE. DAE is used as a lower-level representation than the AST, but it still retains control structures expressed in redundant forms, for example for, while, and if. Core means to get rid of redundant ways to express the same thing because, from a program analysis point of view, fewer constructs means fewer cases to consider. MidCode does get rid of redundant forms, e.g. control flow forms, and may be closer to Core with regard to the purpose of the representations.

5.5 The work in a wider context

The work of this thesis will hopefully enable future performance improvements

to the OpenModelica compiler, and therefore increase productivity in the indus-

try and academia where OpenModelica-based simulations are done. It may also

increase competitiveness for the OpenModelica-based solution compared to pro-

prietary and commercial Modelica environments.


Chapter 6

Conclusion

While the project did not give immediate performance benefits, it provides a good starting point for better optimizations and simpler code generation, and it has potential to be further extended and fine-tuned to match the current and future needs of the OpenModelica compiler project. It has been able to demonstrate a practical way of lowering the representation of a subset of MetaModelica dealing with imperative algorithms. However, even then there are several major Modelica features left that have to be added to the representation in order to make it truly useful in production.

6.1 Future work

The work has plenty of room for further extensions. One of the main motivators
for the work was to enable more optimizations. As an example, to implement
common subexpression elimination, we would like several general features. These
dataflow-related features include some way to track mutation of variables, such
as static single assignment (SSA), and a way to track whether an operation has
side effects.
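A minimal Python sketch can show how these two features would be used by common subexpression elimination within a single block. It is an illustration under stated assumptions, not part of the MidCode implementation: the instruction tuples, the `has_side_effects` flag and the function name are hypothetical, and every destination is assumed to be assigned exactly once, as SSA would guarantee.

```python
# Hypothetical three-address instructions: (dest, op, args, has_side_effects).
# Every dest is assumed to be assigned exactly once (SSA form), so the key
# (op, args) always denotes the same value within the block.
def eliminate_common_subexpressions(block):
    available = {}   # (op, args) -> variable that already holds the value
    replaced = {}    # eliminated variable -> surviving variable
    result = []
    for dest, op, args, has_side_effects in block:
        # Rewrite operands that refer to eliminated variables.
        args = tuple(replaced.get(a, a) for a in args)
        key = (op, args)
        if not has_side_effects and key in available:
            replaced[dest] = available[key]   # reuse the earlier result
            continue
        if not has_side_effects:
            available[key] = dest
        result.append((dest, op, args, has_side_effects))
    return result

block = [
    ("t1", "mul", ("zr", "zr"), False),
    ("t2", "mul", ("zr", "zr"), False),    # same value as t1
    ("t3", "add", ("t2", "cr"), False),    # operand t2 is rewritten to t1
    ("t4", "call_print", ("t3",), True),   # side effect: never merged
    ("t5", "call_print", ("t3",), True),
]
optimized = eliminate_common_subexpressions(block)
for instr in optimized:
    print(instr)
```

Without the single-assignment assumption the table of available expressions would have to be invalidated on every write to `zr`, and without the side-effect flag the two `call_print` operations would wrongly be merged into one.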

There are also interesting possibilities for control flow analysis. After branching
on a condition we know, within the taken branch, that the condition is true, and
within the other branch that it is false. This knowledge can be used to simplify
code in that branch, for example by removing another branch on a condition that
the gained knowledge implies to be true. By figuring out whether a function can
fail, a region of code could be shown not to fail, and a PUSHJMP, POPJMP pair
may then be removed. This in turn can allow for further optimizations, since
side-effecting terminators were removed.
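The first of these ideas, using the outcome of an earlier test to remove a later one, can be sketched in Python as follows. The tree-shaped statement format and the function `simplify` are hypothetical. Conditions are treated as opaque strings, and the sketch assumes that they are free of side effects and that nothing between the outer and the inner test changes the variables involved.

```python
# Hypothetical tree-shaped code: a statement is either a string or
# ("if", cond, then_stmts, else_stmts). Conditions are opaque strings,
# assumed side-effect free and not invalidated between the two tests.
def simplify(stmts, known=None):
    known = dict(known or {})   # condition -> truth value known on this path
    out = []
    for s in stmts:
        if isinstance(s, str):
            out.append(s)
            continue
        _, cond, then_stmts, else_stmts = s
        if cond in known:
            # The test is already decided on this path: keep only the live arm.
            live = then_stmts if known[cond] else else_stmts
            out.extend(simplify(live, known))
        else:
            out.append(("if", cond,
                        simplify(then_stmts, {**known, cond: True}),
                        simplify(else_stmts, {**known, cond: False})))
    return out

program = [
    ("if", "x > 0",
        ["a", ("if", "x > 0", ["b"], ["c"])],   # inner test is known true
        [("if", "x > 0", ["d"], ["e"])]),       # inner test is known false
]
simplified = simplify(program)
print(simplified)
# prints [('if', 'x > 0', ['a', 'b'], ['e'])]
```

A real pass would have to drop a known condition as soon as one of its variables is assigned, which is one more place where tracking of mutation becomes necessary.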


Inlining in MidCode is another promising future extension. It is very important
for enabling optimizations because it opens the callee to analysis and
specialization.
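As a rough illustration of how inlining opens the callee to specialization, the following Python sketch inlines a call on a toy expression tree and then folds constants. Everything here is hypothetical: the expression format, the `FUNCTIONS` table and the function names are not taken from MidCode, and callees are assumed to be non-recursive and free of side effects.

```python
# Hypothetical expression trees: an int constant, a variable name (str),
# ("add" | "mul", lhs, rhs), or ("call", function_name, [args]).
# Callees are assumed to be non-recursive and side-effect free.
FUNCTIONS = {"square_plus": (["a", "b"], ("add", ("mul", "a", "a"), "b"))}

def substitute(expr, env):
    """Replace parameter names in expr by the argument expressions in env."""
    if isinstance(expr, str):
        return env.get(expr, expr)
    if isinstance(expr, tuple) and expr[0] == "call":
        return ("call", expr[1], [substitute(a, env) for a in expr[2]])
    if isinstance(expr, tuple):
        return (expr[0], substitute(expr[1], env), substitute(expr[2], env))
    return expr

def inline(expr):
    """Replace every call by the callee body with arguments substituted."""
    if isinstance(expr, tuple) and expr[0] == "call":
        params, body = FUNCTIONS[expr[1]]
        args = [inline(a) for a in expr[2]]
        return inline(substitute(body, dict(zip(params, args))))
    if isinstance(expr, tuple):
        return (expr[0], inline(expr[1]), inline(expr[2]))
    return expr

def fold(expr):
    """Evaluate add/mul nodes whose operands are both integer constants."""
    if not isinstance(expr, tuple) or expr[0] == "call":
        return expr
    op, lhs, rhs = expr[0], fold(expr[1]), fold(expr[2])
    if isinstance(lhs, int) and isinstance(rhs, int):
        return lhs + rhs if op == "add" else lhs * rhs
    return (op, lhs, rhs)

result = fold(inline(("add", ("call", "square_plus", [3, "y"]), 1)))
print(result)
# prints ('add', ('add', 9, 'y'), 1)
```

Before inlining, the call is opaque and the constant argument 3 cannot be used. After inlining, the multiplication 3 * 3 becomes visible to the constant folder.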

MidCode does not currently support all of Modelica and MetaModelica. Support
for more language constructs can be added as future work. Similarly, some
support is lacking or incorrectly implemented; for example, the current
implementation has some issues with the handling of lexical scopes.


Appendix A

Performance test functions

A.1 Fibonacci

function fib
  input Integer i;
  output Integer o;
algorithm
  o := match i
    case 0 then 0;
    case 1 then 1;
    else then fib(i - 1) + fib(i - 2);
  end match;
end fib;

A.2 Mandelbrot

function mandelbrot_display
  output list<String> out;
algorithm
  out := {};
  for y in -39:39 loop
    for x in -39:39 loop
      if mandelbrot(x/40.0, y/40.0) == 0 then
        out := "*" :: out;
      else
        out := " " :: out;
      end if;
    end for;
    out := "\n" :: out;
  end for;
end mandelbrot_display;

function mandelbrot
  input Real x;
  input Real y;
  output Integer result;
protected
  Real cr, ci, zr, zi, zr2, zi2;
  Real tmp;
  Integer iter;
algorithm
  cr := x - 0.5;
  ci := y;

  zr := 0;
  zi := 0;

  iter := 0;

  while true loop
    iter := iter + 1;
    tmp := zr * zi;
    zr2 := zr * zr;
    zi2 := zi * zi;
    zr := zr2 - zi2 + cr;
    zi := 2 * tmp + ci;
    if zi2 + zr2 > 16 then
      result := iter;
      return;
    end if;
    if iter > 1000 then
      result := 0;
      return;
    end if;
  end while;
end mandelbrot;

A.3 Quicksort

function qsort
  input list<Integer> l;
  output list<Integer> o;
protected
  list<Integer> smaller;
  list<Integer> larger;
  Integer pivot;
  list<Integer> rest;
algorithm
  o := match l
    case {} then {};
    case pivot::rest
      algorithm
        (smaller, larger) := partition(pivot, rest, {}, {});
      then listAppend(qsort(smaller),
                      listAppend({pivot}, qsort(larger)));
  end match;
end qsort;

function partition
  input Integer pivot;
  input list<Integer> l;
  input output list<Integer> smaller;
  input output list<Integer> larger;
protected
  Integer head;
  list<Integer> tail;
algorithm
  (smaller, larger) := match l
    case {} then (smaller, larger);
    case head::tail guard head <= pivot
      then partition(pivot, tail, head::smaller, larger);
    case head::tail guard head > pivot
      then partition(pivot, tail, smaller, head::larger);
    else then ({},{});
  end match;
end partition;

A.4 Takeuchi function

function tak
  input Integer x;
  input Integer y;
  input Integer z;
  output Integer o;
algorithm
  o := if y < x then
    tak(tak(x-1, y, z), tak(y-1, z, x), tak(z-1, x, y))
  else
    z;
end tak;


Bibliography

[1] A.V. Aho, M.S. Lam, J.D. Ullman, and R. Sethi.

Compilers: Principles, Techniques, and Tools. Pearson Education, 2011.

isbn: 9780133002140.

[2] Efficient IR for the OpenModelica Compiler (thesis proposal). 2016.

url: https://openmodelica.org/images/docs/OpenMasterThesis/2016_Efficient_IR_for_OpenModelica_compiler_v1.pdf.

[3] Peter Fritzson et al. OpenModelica System Documentation, Version 2014-02-01 for Modelica 1.9.1 Beta1. Feb. 2014.

url: https://github.com/OpenModelica/OpenModelica-doc/blob/d5928d96c0157e3c8762b2b85b67a7a963be9763/OpenModelicaSystem.pdf.

[4] Peter Fritzson. Principles of Object-Oriented Modeling and Simulation with Modelica 3.3: A Cyber-Physical Approach. 2015. isbn: 9781118859124.

[5] Peter Fritzson, Peter Aronsson, Håkan Lundvall, Kaj Nyström, Adrian Pop,

Levon Saldamli, and David Broman. “The OpenModelica Modeling,

Simulation, and Development Environment”. In: 2005.

[6] Peter Fritzson, Adrian Pop, and Martin Sjölund. Towards Modelica 4 Meta-Programming and Language Modeling with MetaModelica 2.0.

Tech. rep. 2011:10. Linköping University, PELAB - Programming

Environment Laboratory, May 2011. 297 pp.

url: http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-68361 (visited on 04/01/2013).

[7] Simon L Peyton Jones. “Compiling Haskell by program transformation: A

report from the trenches”. In: European Symposium on Programming.

Springer. 1996, pp. 18–44.


[8] Simon L Peyton Jones. “Implementing lazy functional languages on stock

hardware: the Spineless Tagless G-machine”.

In: Journal of functional programming 2.02 (1992), pp. 127–202.

[9] Simon Marlow, S Peyton Jones, et al. The Glasgow Haskell Compiler. 2004.

[10] Niko Matsakis. Introducing MIR. 2016.

url: https://blog.rust-lang.org/2016/04/19/MIR.html.

[11] Niko Matsakis. Rust RFC #1211. 2015.

url: https://github.com/nox/rust-rfcs/blob/master/text/1211-mir.md.

[12] Jason Merrill.

“GENERIC and GIMPLE: A new tree representation for entire functions”.

In: Proceedings of the 2003 GCC Developers’ Summit. 2003, pp. 171–179.

[13] MetaModelica documentation at openmodelica.org.

url: https://build.openmodelica.org/Documentation/MetaModelica.html.

[14] Vu Nguyen, Sophia Deeds-Rubin, Thomas Tan, and Barry Boehm.

“A SLOC Counting Standard”. In: COCOMO II Forum 2007.

[15] OMCompiler/DAE.mo. Mar. 2017.

url: https://github.com/OpenModelica/OMCompiler/blob/b8fe1840ca6a758e39255b674a549a2c7c4a4bbe/Compiler/FrontEnd/DAE.mo.

[16] Martin Otter and Hilding Elmqvist.

“Modelica-Language, Libraries, Tools, Workshop and EU-Project”.

In: (2001).

[17] Fabrice Rastello. SSA-based Compiler Design. 1st.

Springer Publishing Company, Incorporated, 2016.

[18] Martin Sjölund, Peter Fritzson, and Adrian Pop. “Bootstrapping a

Compiler for an Equation-Based Object-Oriented Language”.

In: Modeling, Identification and Control 35.1 (2014), pp. 1–19.

[19] J Stanier and D Watson.

“Intermediate Representations in Imperative Compilers: A Survey”.

In: ACM Computing Surveys 45.3 (2013). issn: 03600300.

[20] Swift Intermediate Language (SIL). 2017.

url: https://github.com/apple/swift/blob/57ecaa7fae78d30ae4f90cb4606c98504723717e/docs/SIL.rst.


[21] David A Terei and Manuel MT Chakravarty.

“An LLVM backend for GHC”. In: ACM Sigplan Notices. Vol. 45. 11.

ACM. 2010, pp. 109–120.
