Research Collection
Doctoral Thesis
Modular language specification and composition
Author(s): Denzler, Christoph
Publication Date: 2001
Permanent Link: https://doi.org/10.3929/ethz-a-004183485
Rights / License: In Copyright - Non-Commercial Use Permitted
This page was generated automatically upon download from the ETH Zurich Research Collection. For more information please consult the Terms of use.
ETH Library
Diss. ETH No. 14215
Modular Language Specification and Composition
A dissertation submitted to the
SWISS FEDERAL INSTITUTE OF TECHNOLOGY (ETH)
ZÜRICH
for the degree of
Doctor of Technical Sciences
presented by
CHRISTOPH DENZLER
Dipl. Informatik-Ing. ETH
born July 20, 1968
citizen of Muttenz, BL
and Schwerzenbach, ZH
Prof. Dr. Albert Kündig, examiner
Dr. Matthias Anlauff, co-examiner
2001
TIK-Schriftenreihe Nr. 43
Dissertation ETH No. 14215
Examination date: June 1, 2001
Acknowledgements
I would like to thank my adviser, Prof. Dr. Albert Kündig, for giving me the opportunity to develop my own ideas and for his confidence in my work. He introduced me to the field of embedded systems, from which I learned many lessons on efficiency, simplicity and reusability. I am grateful to Dr. Matthias Anlauff for agreeing to be co-examiner of this thesis. MCS is modelled on his Gem/Mex system. Especially his advice to represent undef as null saved me many redesigns.

I want to thank my colleagues from TIK, Daniel Schweizer, Philipp Kutter, Samarjit Chakraborty, and Jörn Janneck for insightful discussions. Daniel introduced me to the field of language and program specification, so I was well prepared to join Philipp's Montages approach. Discussing Montages with him always left me with a head full of new ideas. Jörn and in particular Samarjit then helped me to get some structure into these creative boosts. A special thank-you goes to Hugo Fierz, whose CIP system inspired my own implementation. Discussing modelling techniques with him gave me valuable insights into good design practice. I owe my thanks also to Hans Otto Trutmann for his FrameMaker templates and his support with this word processor.
Discussions with Stephan Missura and Niklaus Mannhart could be really mind-challenging, as their clear train of thought forced my arguments to be equally precise. Many times, a glance at Stephan's thesis gave me the needed inspiration to continue with mine. Having lunch with Niklaus Mannhart was always a welcome interruption to my work. He also deserves my thanks for proof-reading this thesis.
Last but not least, I thank Regula Hunziker, my fiancée, for her love and her
support.
Abstract
This dissertation deals with the modularisation of specifications of programming languages. A specification is not partitioned into compiler phases, as is usual, but into modules, called Montages, each of which describes one language construct completely. This partitioning allows specifications of language constructs to be plugged into the specification of a language and, since Montages contain executable code, thereby builds an interpreter for the specified language. The problems that follow from this are discussed on different levels of abstraction.

The different character of language specifications on a construct-by-construct basis also demands a different concept for the whole system. Knowledge about processes such as parsing has to be distributed over many Montages, but this is made up for by the increased flexibility of Montage deployment. A language construct that has once been successfully implemented for one language can be reused with only minor adaptations in many different languages. Well-defined access via interfaces separates Montages clearly, so that changes in one construct cannot have unintentional side-effects on other constructs.
This dissertation describes the concept and implementation of a system based on Java as a specification language. Reuse of specifications is not restricted to reuse of source code; it is also possible to reuse precompiled components. This makes it possible to distribute and sell language specifications without giving away valuable know-how about their internals. Some approaches to the development, distribution and support of language components will be discussed.

A detailed description of the Montage Component System will go into the particulars of decentralised parsing (each Montage can parse only its specified construct), explain how static semantics can be processed efficiently, and show how a program can be executed, i.e. how its dynamic semantics is processed.
Zusammenfassung
This thesis deals with the modularisation of specifications for programming languages. A specification is not divided into compilation phases, as is usual, but into modules, so-called Montages, each of which describes one language construct completely. This partitioning makes it possible to plug specifications of individual language constructs together into the specification of a whole language. Since the Montages contain executable code, an interpreter for the specified language can be assembled in this way. Problems arise on several levels of abstraction, whose solutions are discussed in this thesis.

The special way of specifying a language (construct by construct) also calls for a different conception of the whole system. Process knowledge that in a conventional model resides in a single phase (e.g. about parsing) has to be distributed over many individual Montages. The gain, however, is enormous flexibility in the deployment of Montages. A language construct that has been used successfully in one language can be employed in a new language specification with only minimal adaptations. The Montage interfaces cleanly separate the individual partial specifications from one another, so that a change in one construct cannot cause unintended side-effects in other constructs.

This thesis describes the concept and implementation of a system based on Java as a specification language. Partial specifications can be reused not only as source code but also as compiled components. This also opens up the possibility of marketing language specifications without giving away valuable know-how. Therefore, some approaches to the development, distribution and maintenance of language components are described and discussed.
The description of the Montage Component System addresses the problems of decentralised parsing (each Montage can parse only the construct it describes), explains how the static semantics can be processed efficiently, and shows how a program is brought to execution.
Contents
Abstract iii
Zusammenfassung v
Contents vii
1 Introduction 1
1.1 Motivation and Goals 1
1.2 Contributions 4
1.3 Overview 6
2 Electronic Commerce with Software Components 7
2.1 E-Commerce for Software Components 8
2.1.1 What is a Software Component? 8
2.1.2 End-User Composition 9
2.1.3 What Market will Language Components have? 11
2.2 Electronic Marketing, Sale and Support 11
2.2.1 Virtual Software House (VSH) 12
2.2.2 On-line Consulting 13
2.2.3 Application Web 16
2.3 Formal Methods and Electronic Commerce 18
2.3.1 Properties of Formal Methods 18
3 Composing Languages 23
3.1 Partitioning of Language Specifications 24
3.1.1 Horizontal Partitioning 24
3.1.2 Vertical Partitioning 26
3.1.3 Static and Dynamic Semantics of Specifications 28
3.2 Language Composition 29
3.2.1 The Basic Idea 29
3.2.2 On Benefits and Costs of Language Composition 30
3.3 The Montages Approach 33
3.3.1 What is a Montage? 34
3.3.2 Composition of Montages 35
4 From Composition to Interpretation 39
4.1 What is a Montage in MCS? 39
4.1.1 Language and Tokens 40
4.1.2 Montages 40
4.2 Overview 42
4.3 Registration / Adaptation 44
4.4 Integration 45
4.4.1 Parser Generation 45
4.4.2 Scanner Generation 45
4.4.3 Internal Consistency 46
4.4.4 External consistency 50
4.5 Parsing 52
4.5.1 Predefined Parser 52
4.5.2 Bottom-Up Parsing 54
4.5.3 Top-Down Parsing 56
4.5.4 Parsing in MCS 57
4.6 Static Semantics Analysis 60
4.6.1 Topological Sort of Property Dependencies 60
4.6.2 Predefined Properties 62
4.6.3 Symbol Table 63
4.7 Control Flow Composition 68
4.7.1 Connecting Nodes 68
4.7.2 Execution 70
5 Implementation 71
5.1 Language 71
5.2 Syntax 73
5.2.1 Token Manager and Scanner 75
5.2.2 Tokens 78
5.2.3 Modular Parsing 79
5.3 Data Structures for Dynamic Semantics of Specification 80
5.3.1 Main Class Hierarchy 81
5.3.2 Action 82
5.3.3 I and T Nodes 82
5.3.4 Terminal 85
5.3.5 Repetition 85
5.3.6 Nonterminal 86
5.3.7 Synonym 86
5.3.8 Montage 88
5.3.9 Properties and their Initialisation 90
5.3.10 Symbol Table Implementation 94
6 Related Work 97
6.1 Gem-Mex and XASM 97
6.2 Vanilla 99
6.3 Intentional Programming 100
6.4 Compiler-Construction Tools 102
6.4.1 Lex & Yacc 102
6.4.2 Java CC 103
6.4.3 Cocktail 104
6.4.4 Eli 104
6.4.5 Depot 4 105
6.4.6 Sprint 106
6.5 Component Systems 108
6.5.1 CORBA 108
6.5.2 COM 110
6.5.3 JavaBeans 111
6.6 On the history of the Montage Component System 112
7 Concluding Remarks 115
7.1 What was achieved 115
7.2 Rough Edges 116
7.2.1 Neglected Parsing 116
7.2.2 Correspondence between Concrete and Abstract Syntax 117
7.2.3 BNF or EBNF? 119
7.3 Conclusions and Outlook 121
7.3.1 Separation of Concrete and Abstract Syntax 122
7.3.2 Optimization and Monitoring 123
7.3.3 A Possible Application 124
Bibliography 125
Curriculum Vitae 133
Chapter 1
Introduction
This thesis aims to bring specifications of programming languages closer to their implementations. Understanding and mastering the semantics of languages will be important to a growing number of programmers and, to a certain extent, also to software users.
A typical compiler processes the source program in distinct phases which normally run in sequential order (Fig. 1). In compiler construction suites such as GCC, Eli or Cocktail [GCC, GHL+92, GE90], each of these phases corresponds to one module/tool.
This architecture supports highly optimized compilers and is well suited for
complex general-purpose languages. Its limitations become apparent when
reuse and extensibility of existing implementations/specifications is of interest.
Consider extending an existing programming language with a new construct, for instance a not yet available switch statement in a subset of C. In a traditional compiler architecture, such an extension would imply changes in all modules: the scanner must recognize the new token (switch), the parser has to be able to parse the new statement correctly, and obviously semantic analysis and code generation have to be adapted.
1.1 Motivation and Goals
The specification and implementation of programming languages are often
seen as entirely different disciplines. This lamentable circumstance leads to a
separation of programming languages into two groups:
1. Languages designed by theorists. Most of them have very precisely formulated semantics; many were first specified before an implementation was available, and often they are based on a sound concept or model of computation.
[Figure: flowchart of compiler phases, from source code through lexical analysis, syntax analysis, semantic analysis, intermediate code generation, optimization and code generation to execution]
Figure 1: Module structure (phases) of a traditional compiler system
Examples are: ML [HMT90], Lisp [Ste90], Haskell [Tho99] (functional
programming), Prolog [DEC96] (logic programming), ASM [Gur94,
Gur97] (state machines).
Although their semantics is often given in unambiguous mathematical notations, they lack a large programmer community, either because the mathematical background needed to understand the specification is considerable, or because their usually high level of abstraction hinders an efficient implementation.

2. Languages designed by programmers. Their development was driven by operating system and hardware architecture needs, by marketing considerations, by the competition with other languages, by practical problems; for many such languages, several of these reasons apply. Examples are: Fortran, C, C++, Basic, Java.
Many of these languages feature surprisingly poor semantics descriptions. Normally, their specification is given in plain English, leaving room for many ambiguities. The most precise specification in these cases is the source code of the compiler, if available, and even this source code has to be adapted to the hosting platform's specific needs. For these reasons, formal specifications are hard to provide. Checking a program against a given specification is a very tedious and (in general) intractable task [SD97].
This thesis will focus on specifications of imperative programming languages, which are found mainly in the second group. Ideally, language specifications should be easy to understand and easy to use. Denotational descriptions assume a thorough mathematical background of the reader. They allow many problems to be formulated in a very elegant fashion (e.g. an order of elements), but they fail to give hints for an efficient implementation (e.g. a quick-sort algorithm).
Specifications should be understandable by a large community of different readers: language designers, compiler implementors, and programmers using the language to implement software solutions. Denotational specifications will be laid aside by the compiler constructor as soon as efficiency is in demand, and they are not suited to stir the interest of the average C programmer. Whether this is an indication of the insufficient education of programmers or of the unnecessary complexity of mathematical notations will not be the subject of this thesis. Both arguments find their supporters, and in both there is a certain truth and a certain ignorance towards the other.
Programming languages are becoming more and more widespread in many disciplines apart from computer science. The more powerful applications get, the more sophisticated their scripting languages become. It is, for example, a common task to enter formulas in spreadsheet cells. Some of these formulas should only be evaluated under certain circumstances, and some might have to be evaluated repeatedly. Another example of the increasing importance of basic programming skills is the ability to enter queries in search engines on the Internet.

Such a programming language designed for a specific purpose is called a domain-specific language (DSL) [CM98] or a "little language" [Sal98]. Many of those languages are used by only a few programmers; some may even be designed for one single task in a crash project (e.g. a data migration project). Although most DSLs will never have the attention of thousands of programmers, they should still feature a correct implementation and fulfil their purpose reliably. Many DSLs will be implemented by programmers who are inexperienced in language design and compiler construction. A tool for the construction of DSLs should support these programmers. This basically means that it should be possible to encapsulate the experience of professional language designers, as well as their reliable and well understood implementations, in «language libraries». This enables a DSL designer to merely compose his new language instead of programming it from scratch. The language specification suite we describe in this thesis was designed to allow for such a systematic engineering approach.
Languages are decomposed into their basic constructs, e.g. while and for loops, if-then-else branches, expressions and so on. Each such construct is represented by a software component. Such a component provides the modularity and
composability needed for our programming language construction kit. It enables the designer of a new DSL to simply compose a new language by picking the desired constructs from a library and plugging them together. Existing components may be copied and adapted, or new constructs created from scratch and added to the library.
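The construction-kit idea can be sketched in a few lines of Java. All names below (`Construct`, `ComposedLanguage`, `plug`) are invented for this illustration and are not MCS's actual API:

```java
// Hypothetical sketch of the construction kit: a language is assembled
// by plugging construct components from a library into it.
import java.util.HashMap;
import java.util.Map;

interface Construct {
    String keyword(); // concrete-syntax anchor of the construct, e.g. "while"
    // ...a real component would also carry grammar, semantics and execution hooks
}

final class ComposedLanguage {
    private final Map<String, Construct> constructs = new HashMap<>();

    // Add one construct component to the language.
    void plug(Construct c) {
        constructs.put(c.keyword(), c);
    }

    // Is the construct part of this language?
    boolean supports(String keyword) {
        return constructs.containsKey(keyword);
    }
}
```

A DSL designer would then build a language by calling `plug` once per desired construct, reusing library components instead of writing a compiler from scratch.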
1.2 Contributions
We describe a system based on the Montages approach [KP97b] which we call the Montage Component System (MCS). The basic idea is to compose a programming language on a construct-by-construct basis. A possible module system for an IF statement is shown in Fig. 2. Extending a language basically consists of adding new modules (Montages) to a language specification. Each such module is implemented as a software component, which simplifies the composition of languages. The original Montages approach has been extended to support:
• Abstract syntax
• Mapping of concrete to abstract syntax
• Coexistence of precompiled and newly specified components
• Configurability (to some extent) of precompiled components
• Four configurable phases: parsing, static semantics, generation of an execution network, and execution
We provide a survey of compiler construction technology and an in-depth discussion of how language specifications can be composed into executable interpreters. An MCS specification decomposes language constructs into four different levels, which roughly correspond to the phases of traditional approaches: parsing, static semantics, code generation and execution (dynamic semantics). In contrast to conventional compiler architectures, these specifications are not given per level for the whole language, but only for a specific language construct. Combining Montage specifications on each of these four levels is the main topic of this thesis. The system's architecture supports multi-language and multi-paradigm specifications.

The Montage Component System originated in the context of a long-standing tradition of research in software development techniques at our laboratory [FMN93, Mar94, Schw97, Mur97, Sche98]. Although technically oriented, our work has always considered economic aspects as well. This thesis continues this tradition and starts with some reflections on marketing, sale and support of language components. It is important to emphasise that these considerations greatly influenced the design decisions and requirements MCS had to fulfil. The latter were:
[Figure: a Montage Component Web for the statement IF a <> b THEN a := c END, with nodes such as Identifier a, ConditionalOp <>, and Identifier b, and three control flows: parse (solid), static analysis with simultaneous firing (dotted), and execution (dashed)]
Figure 2: An MCS Web for an IF statement and its various control flows.
• Specifications should be easy to understand and to use.
• Specifications should be reusable not only in source form but also in compiled form (component reuse).
• Specifications should be formal.
• Modularity/composability of specifications from a programmer's point of view: programmers think in entities of language constructs, not of compiler phases.
• Employment of a standard component system and a standard programming language for specification.
1.3 Overview
This thesis is organized as follows. Commercialization of software components imposes some prerequisites on the architecture of MCS. These prerequisites will be introduced and explained in Chapter 2. Chapter 3 then gives an overview of how programming languages can be composed. Design and concepts of MCS are explained in detail in Chapter 4, which is the core of this thesis. Chapter 5 discusses some interesting implementation details. A survey of related work will be given in Chapter 6. References to this chapter can be found throughout this thesis, in order to put a particular problem into a wider context. Finally, Chapter 7 concludes this dissertation and gives some prospects for the future.
Chapter 2
Electronic Commerce with
Software Components
Electronic commerce emerged in the late 90s and has grown into a multi-billion dollar business. E-Commerce businesses can be characterised by two dimensions: the degree of virtuality of the traded goods and the degree of virtuality of the document flow. Fig. 3 shows these two dimensions and illustrates their meaning by giving some examples. Whether 'real' goods have to be moved depends on the business sector involved. On the document flow axis, however, E-Commerce tries to reduce the amount of physically moved documents. An important distinction has to be made between business-to-business (B2B) commerce and business-to-consumer (B2C) commerce. Note that a company may well distinguish between logistic (B2B-like) and customer (B2C-like) commerce, as Wal-Mart supermarkets do. B2C E-Commerce will emerge much more slowly, as consumers cannot be expected to be on-line 24 hours a day. Security considerations second this argument: a signature on a piece of paper is much easier to understand than trust in abstract cryptographic systems.
[Figure: two-dimensional diagram with 'real' vs. 'virtual' goods on one axis and paper-based vs. electronic document flow on the other; examples placed in the plane include supermarkets, the automobile industry, travel agencies and banking systems]
Figure 3: Characterisation of E-Commerce businesses. The horizontal axis represents the 'virtuality' of the traded goods. The vertical axis indicates how 'paperless' the office works.
E-Commerce as we refer to it in this chapter is concerned with the left part of the diagram in Fig. 3. Components are virtual goods which might be sold to end-users or deployed by other business organisations. Section 2.2 will give some examples of both B2B and B2C scenarios involving language components.
Szyperski explains that there has to be a market for component technology in order to keep the technology from vanishing [Szy97]. We further believe that E-Commerce will play a major role in this market, as we will elaborate in section 2.2. This chapter will focus on the relation between software components and their marketing. The Montage Component System presented later in this thesis (chapter 4, p. 39) relies highly on the success of component technology. Although the basic idea of composing language specifications can be applied to a single-user environment, it makes only limited sense there. Its full potential is revealed only if language components can be distributed over the Internet [Sche98]. In the following we will present our vision of a (language) component market.
After defining the term 'Software Component', we point out some premises E-Commerce imposes on software components. We then explain our vision of electronic marketing, consulting and support. These ideas were developed and implemented under the umbrella of the Swiss National Science Foundation's project "Virtual Software House"¹. The chapter closes with some considerations on the acceptance of formal methods in software markets.
2.1 E-Commerce for Software Components
2.1.1 What is a Software Component?
The term 'Software Component' is used in many different ways. For marketing and sales persons it is simply a 'software box'. Programmers have different ideas of components, too: to the C++ programmer, it might be a dynamic link library (DLL); a Java programmer refers to a JavaBean; and a CORBA specialist has in mind any program offering its services through an ORB. Often the terms 'Object' and 'Component' are not separated clearly enough. Throughout this monograph we keep to the following definition, taken from [SP97]:
¹ Project no. 5003-52210.
"A component is a unit ofcomposition with contractually specifiedinterfaces and explicit context dependencies only. Components can
be deployed independently and are subject to composition by third
parties"
This definition is product-independent and does not only cover technical aspects. It covers five important aspects of a component:

1. Extent: 'unit of composition' means a piece of software that is sufficiently self-contained to be composed by third parties (people who do not have complete insight into the component's software).

2. Appearance: 'contractually specified interfaces' implies that the interface of a component adheres to a standard of interfaces that is also followed by other components. For example, JavaBeans employs a special naming scheme which allows others to set and get attribute values, to fire events and to gain insight into the component's structure.

3. Requirements: 'explicit context dependencies' specifies that the component does not only reveal its interfaces to its clients (as classes and modules would); it furthermore tells what the deployment environment has to provide in order for it to be operative.

4. Occurrence: 'Components can be deployed independently', i.e. a component is well separated from its environment and from other components. No implicit knowledge of the underlying operating system, hardware or other software (components) may be used at compile-time or at run-time.

5. Usage: Components are 'subject to composition by third parties' and thus will be deployed in systems unknown to the programmer of the component. This aspect justifies the four former items of the definition. A component should encapsulate its implementation and interact with its environment only through well-defined interfaces.
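The JavaBeans naming scheme mentioned under aspect 2 can be shown concretely. The `PriceTag` class below is an invented example, but the conventions it follows (a no-argument constructor plus `get`/`set` accessor pairs per property) are the standard JavaBeans design patterns that let builder tools discover and configure a component by reflection alone:

```java
// A minimal JavaBean: the property "price" is exposed through the
// getPrice/setPrice accessor pair, and the no-arg constructor lets
// tools instantiate the bean without knowing anything else about it.
import java.io.Serializable;

class PriceTag implements Serializable {
    private double price;              // the bean property

    public PriceTag() {}               // beans need a public no-arg constructor

    public double getPrice() {         // getter: get<PropertyName>
        return price;
    }

    public void setPrice(double p) {   // setter: set<PropertyName>
        price = p;
    }
}
```

Because the names are derivable from the property name, a composition tool never needs the bean's source code, which is exactly the 'contractually specified interface' the definition demands.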
A detailed discussion of the above definition, along with the difference between 'Objects' and 'Components', is given in Szyperski's book on component software [Szy97].
2.1.2 End-User Composition
Besides the pure definition of what a software component is, there is also a list of requirements that a component has to fulfil to be a 'useful' component.

Components should provide solutions to a certain problem on a high level of abstraction. As we have seen above, a component is subject to third-party composition and will eventually be employed in combination with components of other component worlds (JavaBeans interoperating with ActiveX components, for example). This implies that its interfaces and its interfacing
process have to be kept as simple and as abstract as possible. There is no room for proprietary solutions and 'hacks'; adherence to standards is vital for the success of a component. Of course, there is a price to be paid for seamless interoperability: performance. In many environments based on modern high-performance hardware, however, one is readily willing to pay this price for the flexibility gained. As we will point out in chapter 4, there are many reasons to pay this price in the case of language composition as well. Nevertheless, we must stress the fact that (hand-written) compilers, as a rule, significantly outperform MCS-generated systems.
Aspect 4 of our component definition forbids making any assumptions about the environment of a component. Aspect 5 implies another unknown: the capabilities and skills of third parties. The success of a component is tightly coupled with its ease of use and the time it takes to understand its functionality. Especially if components are subject to end-user composition, this argument becomes important. The gain in flexibility of deployment (down to the end-user) outweighs in many cases the above-mentioned drawback of slower execution.
We defined the extent of a component as the unit of composition, and considering the entire definition, we can deduce that a component is also a unit of deployment. This technical view does not necessarily apply to a component's business aspect. Although a component may be useful for deployment, it may not be a unit of sale. For example, there is no argument against a component that specifies only the multiplication construct of a language, but in sales it will normally be bundled with (at least) the division construct. MCS allows bundling complete sublanguages and using them as components of their own. Such a cluster of useful components can unfold its full potential only if it is still possible to (re-)configure its constituents. This is important because, as components grow (components may be composed of other clustered components), the resulting component would otherwise become monolithic and inert. On the other hand, bundling components for sale can be accomplished according to any criteria, thus decoupling technical issues from marketing aspects. For example, technically seen, the term and factor language constructs (= components in MCS) only make sense if there are also numerical constructs such as addition, subtraction, multiplication and division. From a sales point of view, it may be perfectly right to sell term and factor without these constituents, because the buyer wants to specify them on his own.
2.1.3 What Market will Language Components have?
When speaking about end-users, we mean 'programming literates', that is, people skilled in using an imperative programming language. This is regarded as a prerequisite for being able to specify new languages. Thus the answer to this section's question is: the hard middle. According to Jeff Bezos, founder and CEO of Amazon.com, this is [Eco97]:

"In today's world, if you want to reach 12 people, that's easy: you use the phone. If you want to reach 12 million people, it's easy: you take out an ad during the Superbowl. But if you want to pitch something to 10,000 people — the hard middle — that's really hard."
The next section addresses exactly this problem: how to do marketing, sale and support for the hard middle.
2.2 Electronic Marketing, Sale and Support
The most important advantage of component technology is, as its name suggests, its composition concept. Components will not make sense at all if they do not lend themselves easily to composition. Composable software will create more flexible and complex applications than conventionally developed software. This is not because a single software house would not be able to write complex software, but because different styles of software development and different cultures of approaching problems will also be put together when composing software. This adds to its complexity, but may also lead to more efficient, more effective and more elegant solutions.
Component software will not be successful if there is no market for software components. Such markets, however, have to be established first. Conventional software distribution schemes will not match the requirements of the component industry. They deal with stand-alone applications, which can be sold in shrink-wrapped boxes containing media on which the software is delivered (DVD, CD, diskettes, tapes) and several kilograms of manuals as hard copies. Obviously this is not applicable to software components. Depending on their functionality, components may be only a few kilobytes in size and extremely simple to use. No one would like to go to a shopping mall to acquire such a piece of software, not to speak of the overhead of packaging, storage and sales room space.
The typical marketplace for components is the Internet. Advanced techniques for distribution, support and invoicing have to be applied to protect
both the customers' and the vendors' interests. In the following subsections, we describe how such a marketplace on the Internet would ideally look.
2.2.1 Virtual Software House (VSH)
In the Virtual Software House project [SS99], funded by the Swiss National Science Foundation (project no. 5003-52210), a virtual marketplace for software of any kind was studied and prototype solutions were developed. The VSH is a virtual analogue of a real shopping mall. An organisation running a VSH is responsible for the maintenance of the virtual mall. It will offer its services to suppliers (shop owners in the mall) that want to sell their products and services over the Internet. It runs the server with the VSH software, it may provide disk space and network bandwidth to its contractors, and it will give them certain guarantees about quality of service (e.g. availability, security, bandwidth). Invoicing and electronic contracting could also be part of its services.
The quality of the mediating services of a VSH will be important for all participants:
1. Operators of a VSH can establish their name as a brand and thus attract new
customers as well as new contractors.
2. Suppliers can rely on the provided services and do not need to set up the infrastructure for logistics, invoicing and marketing themselves. A VSH is an out-sourcing centre for a supplier. Small companies in particular will profit from a professional web appearance and facilitated sales procedures. They can concentrate on their main business without neglecting the sales aspect.
3. For the customer, an easy-to-use and easy-to-understand web interface is
very important. This includes simple and yet powerful search and query
facilities. Clearly formulated and understandable payment conditions and
strict security guarantees will help to win customers' confidence.
Of central importance is the product catalogue, which can be queried by customers. Although several different companies sell their products under the umbrella of a VSH, customers usually prefer to have one single central search engine. In contrast to shopping in real shopping malls, internet customers usually do not want to spend an afternoon surfing web pages just to find a product. They will expect search and query facilities that are more elaborate than plain text searching (as done by search engines like AltaVista or Google).
Instead, Mediating Electronic Product Catalogues (MEPC) [HSSS97] will be
employed. These catalogues summarize the (possibly proprietary) product catalogues of the different companies participating in the VSH. Mediating means
that they offer a standardized view of the available products. MEPCs enable a customer to query for specific products and/or combinations of products. Optionally, he can do this in his own language, using his own units of measurement and currencies. It is the mediating aspect of such EPCs to convert and translate figures and languages.
A VSH has a very flexible, federated organisation. It will not only allow its
contractors to sell software but will also support them in offering other (virtual) services such as consulting (see section 2.2.2), software installation and maintenance or federated software development (see section 2.2.3). A VSH is not
only designed for business to consumer (B2C) commerce, but it can also serve
as a platform for business to business (B2B) commerce.
One example would be federated software development, where two or more
contractors use the services of a VSH for communication, as a clearinghouse during the development stage and as a merchandising platform afterwards.
Another example for B2B commerce would be a financial institute joining the
VSH to offer secure payment services (by electronic money transfers or credit
card) to the other contractors.
Such Virtual Software Houses are already available on the web. A spin-off of the VSH project started its business in autumn 1999 and can be reached at www.informationobjects.com. Another similar example (although not featuring such sophisticated product catalogues) is www.palmgear.com. The latter has specialised in software for palmtop or hand-held computers. Usually such programs are only a few kilobytes in size, and often they are distributed as shareware by some ambitious student or hobby programmer (which does not necessarily reduce their quality). Obviously, such programmers do not have enough time and money to professionally advertise and sell their software. As some of their programs are just simple add-ons to the standard operating system, this market already comes close to the component market proposed above.
2.2.2 On-line Consulting
Consulting on evaluation, buying, installation and use of software plays a
prominent role in today's software business. There are many attempts to replace human consultants by sophisticated help utilities, electronic wizards and elaborate web sites. But these all have two major drawbacks: first, they cannot (yet) answer specific questions. They can only offer a database of knowledge, which has to be queried by the user. Second, they are not human. For support, many users prefer a human counterpart to a machine. But human manpower is
expensive; individual (on-site) consulting even more so, as time-consuming travelling adds to the costs.
The aim of the Network and On-line Consulting (NOLC) project, funded by the Swiss National Science Foundation (project no. 5003-045329) as a sub-project of the VSH, was to reduce costs in consulting. This aim was achieved by providing a platform that supports on-line connections between one (or several) clients and one (or several) consultants. Clients and consultants may communicate via different services like chat, audio, video, file transfer, whiteboards and application sharing (Fig. 4).
It is important that these services are available at very low cost. This specifically means that no costly installations should be necessary. Fortunately, for the platform used in the NOLC project (Wintel), there is a free
software package called NetMeeting [NM] which meets these requirements:
• It is available for free, as it is shipped as an optional package of Windows.
Today's hardware suffices to run NetMeeting. The only additional cost may
arise from the purchase of an optional camera.
• It supports audio and video communication via standard drivers, so any sound card and video camera can be used, even over modem lines.
• A decentralised organisation of the communication channels.
The last item was very important: the separation of control over the communication from the communication itself. It allows communication channels between customers and consultants to be controlled from a server without having to deal with the data stream they produce. I.e. the establishment and conclusion of a connection (and the quality-of-service parameters) can be controlled by a server running the NOLC-system. The actual data stream of the consulting session (video, audio, etc.), however, is sent peer-to-peer and thus does not influence the throughput of the NOLC-server.
With consulting, three parties are involved:
1. A client who has a problem with e.g. a piece of software.
2. A consultant who offers help in the field of the client's problem.
3. An intermediary providing a platform that brings the above two parties together.
The NOLC-project investigated the characteristics of the third party, and a
prototype of such an intermediate platform was implemented. It consists of a
server that provides the connecting services between the first two parties. The
server controls the establishment and conclusion of a consulting session. To
fulfil this task, it has access to a database that stores information about consultants and clients (for an unrestricted consulting session, both parties need to be registered). This data comprises the communication infrastructure available on the parties' computers (audio, video) as well as the consultants' skills, availability and fees.
Figure 4: NOLC architecture and participants in a consulting session
When a potential client looks for consulting services, he will eventually visit
the web site of a consulting provider. On this web site, he may register himself and apply for a consulting session. Before such a session may start, he is presented with a few choices and questions about the requested session: what kind of media shall be used (video, whiteboard, application sharing, etc.) and what the topic of the session will be. It is possible to let the system search for an available consultant, or the client may himself enter the name of a consultant (if known). Just before a new session is started, the client gets an overview of the costs that this session is going to generate. Usually, the cost depends on the communication infrastructure, the chosen consultant and the duration of the session. Once the client agrees, the chosen consultant (who was previously marked as available) receives a message indicating that a client wants to
enter a new session.
During a session both parties have the possibility to suspend the session and
to resume it later on. This feature is necessary e.g. if there are questions which
cannot be answered immediately; after resuming, the same configuration as
before suspension is re-established. Of course it is also possible to renegotiate
this configuration at any point in a session.
After completing the session, the client will be presented with the total cost and asked to fill in an evaluation form, which serves as feedback to the consultant.
The intermediary service controls renegotiation of the communication configuration, suspension and resumption of sessions and, finally, the calculation
of the cost. Once a session terminates, the feedback report is stored for future
evaluation and the generated costs are automatically billed to the client.
Recently, some security features were added. The customer will have to digitally sign each session request and the server will store these signed requests.
Thus, it will be possible to prove that a customer has requested and accepted to
pay for a session.
Billing and money transfer is not part of the NOLC-platform, but is delegated to a Virtual Software House [SS99]. So NOLC is a business service provided either by the VSH itself or by an additional party offering their services
through the VSH.
In addition, n-to-m group communication and load balancing in a fault-tolerant environment were also investigated [Fel98, FGS98]. The Object Group
Service (OGS) employed is a CORBA service which can be used to replicate a
server in order to make it fault-tolerant.
2.2.3 Application Web
The last aspect of E-Commerce that has been investigated in the context of the
VSH project was the maintenance of component systems. The central question
was: how can a component system be remotely maintained and controlled, given that the system is heterogeneously composed, i.e. components from different software developers with arbitrarily complex versioning should be manageable [MV99, VM99].
Information management tools and techniques do not scale well in the face of great organisational complexity. An informal approach to information sharing, based largely on manual copying of information, cannot meet the demands of the task as size and complexity increase. Formal approaches to sharing information are based on groupware tools, but cooperating organisations do not always enjoy the trust or the availability of the sophisticated infrastructure, methods, and skills that this approach requires. Bridging the gap requires
a simple, loosely coupled, highly flexible strategy for information sharing.
Extensive information relevant to different parts of the software life cycle should be interconnected in a simple, easily described way; such connections should permit selective information sharing by a variety of tools and in a variety of collaboration modes that vary in the amount of organisational coupling they require.
During the development of a component, the programmers have a lot of information about the software, e.g. knowledge about versioning, compatibility with other components, operating systems and hardware, known bugs, omitted and planned features, unofficial (undocumented) features, etc. All this information is lost when the software is released in a conventional manner. The customer of such a component can only rely on official documentation. The core idea of the application web is to maintain links back to the developers' data. So it would be possible at any time in the life cycle of the software to track a problem back to its roots [Mur97, Sche98]. Of course, these links will in general not be accessible to everybody. As an illustration, some scenarios will now be described:
Remote on-line consulting. A software developer out-sources the support and maintenance services to an intermediary (a consulting company). Such a consultant would have to acquire a license to provide support and the right to access the developer's internal database. In turn, customers facing problems using a certain product will automatically be rerouted to the consultant when following the help link available in their software.
Customer-specific versions. A customer unhappy with a certain version of a component may follow the maintenance link back to the developer's site, where he may request additional functions. The developer receives all the important data about the component, such as version number and configuration. He may then build a new variant of the component according to the client's needs. On completion, the client will be notified and may download the new version from the web immediately.
Federated software development. Several software companies developing a component system may use the services of the application web to provide data for their partners. As these partners may be competitors in other fields, only relevant links will be accessible to them. The application web allows fine-grained access control, supporting several service levels.
A prototype of the application web was developed using Java technology. What are the services of the application web?
• Naming and versioning: It is important to maintain a simple, globally scalable naming scheme for versioned configurations distributed across organisations [VM99]. The naming scheme employed is based on the reversed internet addresses of the accessible components (similar to the naming scheme of Java's classes, e.g. org.omg.CORBA.portable).
• Persistence, immutability and caching: It is important that links are not moved or deleted. Participating organisations have to ensure the stability and accessibility of the linked information. Repositories (another B2B service in a VSH) could provide reliable, persistent bindings between versioned package names and their contents. For example, they may support WWW browsing and provide querying facilities.
• Reliable building: Versioned components contain links to other components and packages (binaries) they import during the building process. As these links will be available through the application web's services, building (even with older versions of libraries) will always be possible.
• Application inspection: Java's introspection techniques make it possible to (remotely) gain insight into a component's configuration at run-time. This feature is very important in the consulting and customer-specific-version scenarios above. It allows e.g. a consultant to collect information about the actual component's environment. This information may be used to query the knowledge base for possible known bugs or incompatibilities.
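The application-inspection service described above can be sketched with Java's standard reflection API. The following is a minimal, illustrative sketch: the `SpellChecker` component and its getters are hypothetical stand-ins, not part of the thesis prototype.

```java
import java.lang.reflect.Method;
import java.util.Map;
import java.util.TreeMap;

public class Inspect {
    // Hypothetical component standing in for a deployed software component;
    // the class name and its getters are illustrative assumptions only.
    public static class SpellChecker {
        public String getVersion() { return "1.2.3"; }
        public String getVendor() { return "example.org"; }
    }

    // Collect the results of all public no-argument get-methods -- the kind of
    // configuration data a remote consultant could gather at run-time.
    public static Map<String, Object> inspect(Object component) throws Exception {
        Map<String, Object> info = new TreeMap<>();
        for (Method m : component.getClass().getMethods()) {
            if (m.getName().startsWith("get")
                    && m.getParameterCount() == 0
                    && m.getDeclaringClass() != Object.class) {
                info.put(m.getName(), m.invoke(component));
            }
        }
        return info;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(inspect(new SpellChecker()));
    }
}
```

In the consulting scenario, such a map of configuration values could be sent to the consultant and matched against the developer's bug database.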
2.3 Formal Methods and Electronic Commerce
Consulting will become an emerging market with the advent of electronic commerce, the main reason being the decoupling of sales and consulting. As there will be little or no personal contact between customer and vendor in e-commerce, customers will not be willing to pay a product price that so far was justified by the personal advice of the sales representative in conventional business. The job of a sales representative will shift to a consultant's job. Consulting and sale
will be two different profit centres. It should be noted that this is to the advantage of the customer, too. He will get fair and transparent prices for the product as well as for additional advisory services. The consultant will be more neutral, as he does not have to sell the most profitable product to earn his salary, but the best one (from the client's point of view) if he wants to keep his clients. Could rationalisation also cut consulting jobs? Not in the short or medium term. There are several reasons for this answer, which will be discussed in the following sections. All these reasons are related to the (limited) applicability of formal methods in computer science.
2.3.1 Properties of Formal Methods
Formal notations are applied in specifications of software. They allow, on the basis of a sound mathematical model, a precise description of the semantics of a
piece of software, e.g. the observable behaviour of a component. Many formal notations have rather simple semantics, thereby lending themselves to mathematical reasoning and automated proof techniques. But capturing the semantics of a problem has remained a hard task, although there has been extensive research on this topic during the last decades. The following discussion will focus on some major aspects of formal methods and their effect on commercially distributed software components.
Scalability. Unfortunately formal methods do not scale, i.e. they cannot keep up with growing systems (growing in terms of complexity, often even in terms of lines of code). E.g. proving that an implementation matches its specification is intractable for programs longer than a few hundred lines of code [SD97]. The main reason is the simple semantics that formal notations feature. Typically they lack type systems, namespacing, information hiding and modularity. Introducing such concepts complicates these formalisms to the point where the complexity of their semantics catches up with that of conventional programming languages. On the other hand, programming language designers learned many lessons from the formal semantics community. This led to simpler programming languages with clearer semantics. Examples are Standard ML [Sml97] (functional programming with one of the most advanced type and module systems), Oberon [Wir88] and Java [GJS96] (imperative programming, simplicity by omitting unnecessary language features). These languages have proven their scalability in many projects.
Comprehensibility. To successfully employ formal methods, sound mathematical knowledge is presumed. This can be a major hindrance to the introduction of formal methods, as many programmers in the IT community do not have a university degree or a similar mathematical background. Of course, improving education for programmers is important. But this does not completely solve the problem. For describing syntax, the Backus-Naur Form (BNF) [Nau60] or the Extended Backus-Naur Form (EBNF) [Wir77b] in combination with regular expressions [Les75] (micro syntax) has become a standard.
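As an illustration, the syntax of a while statement might be given by an EBNF production, with the micro syntax of its identifiers given as a regular expression. This is a generic, textbook-style example, not taken from any particular language report:

```ebnf
WhileStatement = "while" Expression "do" StatementSequence "end" .
Identifier     = letter { letter | digit } .   (* micro syntax: a regular expression *)
```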
For the specification of semantics there is no such dominant formalism available. Semantics has many facets, which are addressed by different specification formalisms. In widespread use and easy to understand are semi-formal specification languages. The most prominent among them is the Unified Modeling Language (UML). Nota bene, UML is a good example of one single formalism that does not suffice to capture all facets of semantics: Unified does not denote one single formalism, but rather expresses the fact that the four most well-known architects of modeling languages agreed on a common basis. In
fact, UML features over half a dozen different diagram classes. UML is semi-formal, because the different models represented by the different diagrams have
no formal correspondence. This correspondence is either given by name equivalence (in simple cases) or by textual statements in English (complex cases).
Specifications can be separated into two classes: declarative and operational semantics. Declarative semantics describes what has to be done, but not how. Operational semantics, on the other hand, describes in more detail how something is done. To many programmers, the latter is easier to understand, as it is closer to programming languages than declarative descriptions.
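The contrast can be made concrete with sorting. In the following minimal Java sketch (illustrative, not taken from the thesis), the declarative view only states what a correct result is (the output is ordered and a permutation of the input), while the operational view prescribes how to compute it:

```java
import java.util.Arrays;

public class SortSpec {
    // Declarative view: states WHAT a correct result is, not HOW to obtain it.
    public static boolean satisfiesSpec(int[] input, int[] output) {
        int[] a = input.clone(), b = output.clone();
        Arrays.sort(a);
        Arrays.sort(b);
        boolean permutation = Arrays.equals(a, b);   // same multiset of elements
        boolean ordered = true;                      // non-decreasing order
        for (int i = 1; i < output.length; i++)
            if (output[i - 1] > output[i]) ordered = false;
        return permutation && ordered;
    }

    // Operational view: describes HOW (here, insertion sort).
    public static int[] insertionSort(int[] input) {
        int[] a = input.clone();
        for (int i = 1; i < a.length; i++) {
            int key = a[i], j = i - 1;
            while (j >= 0 && a[j] > key) { a[j + 1] = a[j]; j--; }
            a[j + 1] = key;
        }
        return a;
    }

    public static void main(String[] args) {
        int[] in = {3, 1, 2};
        int[] out = insertionSort(in);
        System.out.println(Arrays.toString(out) + " " + satisfiesSpec(in, out));
    }
}
```

Note that the declarative check gives no hint about which sorting algorithm to use; any algorithm whose output passes `satisfiesSpec` is acceptable.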
Efficiency. Why distinguish between a specification and an implementation? If there is a specification available, why go through the error-prone process of implementing it? Basically, a specification and its corresponding implementation can be viewed as two different, but equivalent, descriptions of a problem and its solution. However, in practice, it is very hard to run formal specifications efficiently (compared to C/C++ code) on a computer.
• In declarative specifications, the 'how' is missing. This means that a code generator would have to find out on its own which algorithm should be applied for a certain problem. In general, declarative languages were not designed for execution, but for reasoning about them. It is for example possible to decide on the equivalence of two different declarative specifications. In the PESCA project [Schw97, SD97] this property was used to show the equivalence of an implementation with respect to its formal specification. Using algebraic specifications, the proof of equivalence was performed employing semi-automated theorem-proving tools. In order to compare the implementation with its specification, the implementation had to be transformed into algebraic form as well. This could be done automatically in O(l), where l is the length of the implementation. Because of its operational origin, this transformed specification had to be executed symbolically in order to be compared. During this execution, terms tend to grow quickly, and therefore term rewriting gets slow and memory-consuming. Apart from very simple examples, comparing specifications does not (yet) seem to be a tractable task. This is very unfortunate, as in the context of a VSH a specification search facility would be an interesting feature: given a specification of a certain problem, is there a set of components solving it?
• Operational semantics does not do much better. Although the semantics is already given in terms of algorithms, most operational specification languages feature existential and universal quantifiers. In general, these quantifiers cannot be implemented efficiently (they require a linear search over the underlying universe, the size of which may be unknown).
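The point about quantifiers can be illustrated with a small Java sketch (a toy illustration, not any particular specification language): both quantifiers reduce to a linear search over an explicitly given finite universe, which is exactly what makes them expensive when the universe is large, and impossible to implement this way when its size is unknown:

```java
import java.util.function.IntPredicate;

public class Quantifiers {
    // "forall x in universe: p(x)" -- a linear search for a counterexample.
    public static boolean forAll(int[] universe, IntPredicate p) {
        for (int x : universe)
            if (!p.test(x)) return false;
        return true;
    }

    // "exists x in universe: p(x)" -- a linear search for a witness.
    public static boolean exists(int[] universe, IntPredicate p) {
        for (int x : universe)
            if (p.test(x)) return true;
        return false;
    }

    public static void main(String[] args) {
        int[] u = {1, 2, 3, 4};
        System.out.println(forAll(u, x -> x > 0));      // every element is positive
        System.out.println(exists(u, x -> x % 5 == 0)); // no multiple of 5 present
    }
}
```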
Completeness. Bridging the gap between specification and implementation is
one of the trickiest parts when employing formal methods. It is important to
guarantee that the implementation obeys its specification. As discussed above, this is only possible with a considerable overhead of time and manpower (the problem has to be specified and implemented, and the correspondence has to be proven). But it does not suffice to do this for new components only; it also has to be done for the compiler, the operating system and the underlying machine! On all these levels of specification one will face the above-mentioned problems. Why can no gaps be tolerated?
Specifying (existing) libraries of components is an expensive task, as it binds considerable resources of well-educated (and well-paid) specialists and a not-to-be-underestimated amount of computing power. This would only pay off if consulting on components could be eliminated and replaced by e.g. specification matching. Bridging the gaps between specifications and implementations completely is necessary in order to prevent this automated process from interrupting and asking for user assistance. In their work, Moormann Zaremski and Wing investigated signature matching [MZW95a] and specification matching [MZW95b]. Matching only signatures of functions yielded surprising results: queries normally returned only a few results, which had to be examined by hand. An experienced user, on the other hand, could decide with very little effort which function to use. Full specification matching cannot be guaranteed to be accomplished without user interaction (the underlying theorem prover might ask for directions or proof strategies). Considering that these user interactions are at least as complex as the decision between a handful of pre-selected functions, the question arises whether it makes sense to use formal specifications at all in this scenario.
Openness. Should a component reveal its specification or implementation at all? In many business scenarios, giving away a formal specification or the source code of a component is out of the question, as the company's know-how is a primary asset. Publishing this know-how would have considerable consequences for the business model. The global market for operating systems may serve as an example: the free Linux versus the black-box Windows. Free software products cannot be sold, whereas black-box software cannot be trusted. Of course, there are many different shades of gray: from "Free Software" (as propagated by Richard Stallman, founder of the GNU project; see www.gnu.org/philosophy) and "Open Source" (as propagated by Eric S. Raymond and Bruce Perens, founders of the Open Source Initiative; see www.opensource.org) through "Freeware", "Public Domain", "Shareware" and "Licensed Software" up to buying all rights to a specific piece of software. Both sides, software developer and user, have to decide on a specific distribution model.
Conclusions and implementation decisions. The original Montages (see section 3.3) used ASMs as their specification formalism. The intended closeness to Turing Machines resulted in the ASMs being simple and easy to understand.
However, there are some drawbacks. ASMs lack modularity and have no type system; on the other hand, they have a semantics of parallel execution with fixed-point termination. All these features distinguish them enough from conventional programming languages so as to scare off C++ or Java programmers. As a typical representative of an operational specification language, ASMs focus on algorithmic specifications and support restricted reasoning only.
When deciding on an implementation platform for our Montage Component System, ease of use, clarity of specifications and compatibility were major criteria. The Montage model itself proved to be very useful for language specification; therefore, its core model was chosen as a base for our implementation.
However, ASMs were replaced by Java to reflect the considerations in this section. This also allows many programmers to understand the MCS without first learning a new formalism and thus to see MCS as a tool rather than an abstract formal specification mechanism.
abstract formal specification mechanism. The designers of Java learned many
lessons from the past, avoiding pitfalls of C/C++ but still attracting thousands
of programmers. Other considerable advantages of Java over ASMs are the availability of (standard) libraries and an advanced component model (JavaBeans).
Chapter 3
Composing Languages
Programming languages are not fixed sets of rules and instructions; they evolve over the years (or decades) as they are adapted to changing needs and markets. This evolution might lead to a new, similar language (a new branch in the language family) or just to a new version of the same language.
An example is the Pascal language family, which evolved over the past three decades into many new languages, among them Pascal [JW74], Modula [Wir77a], Modula-2 [Wir82], Modula-3 [Har92], Oberon [Wir88], Oberon-2
[MW91], Component Pascal [OMS97] and Delphi [LisOO]. Another important dynasty of programming languages are the C-like languages: starting with C [KR88] (K&R C, after Kernighan and Ritchie, and ANSI C are distinguished) and evolving to Objective-C [LW93] and C++ [Str97]. The latter underwent several enhancements during its two decades of existence, e.g. the introduction of templates.
Java [GJS96], a relatively new but nevertheless successful language, has
already undergone an interesting evolution: it started as a Pascal-like language (then called Oak) and got its C++-like face in order to make it popular. Compared to C++ or Delphi, it lost many features, for example the hybrid character (all methods have to be class-bound), address pointers, operator overloading, multiple inheritance, and templates. Java's success soon led to new enhancements of the language (inner classes) and to new dialects (Java 1.0 [GJS96], Java 1.1 [GJSB00], JavaCard [CheOO]).
Some basic concepts are the same in all the above-mentioned programming
languages, e.g. while loops, if-then-else statements or variable declarations.
These constructs may have different syntax but their semantics remains the
same. All imperative programming languages basically differ in the number of
language constructs they offer to the programmer. For example, C++ can be
described as C plus additional language constructs. In general, programming
languages are simply composed of different language constructs.
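As an illustration, the same while loop reads differently in a Pascal-style and a C-style syntax, yet both denote the same iteration semantics (a generic side-by-side sketch, not quoted from any language report):

```
(* Pascal-style *)           /* C-style */
WHILE i < 10 DO              while (i < 10) {
  i := i + 1;                  i = i + 1;
                             }
```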
3.1 Partitioning of Language Specifications
A programming language specification is usually given in the form of text and consists of one or several files containing specifications given in a specialised notation or language (in fact, such specification languages are good examples of "little languages" or DSLs). Well-known examples are (E)BNF for syntax specification [Wir77b] and regular expressions as they are used in Lex [Les75]; even the make utility [SMOO] controlling compilation (of the compiler or interpreter) can be mentioned here. Usually a language specification is structured into different parts, each corresponding to one particular part of the compiler, and often each of these parts is given in a separate notation/language. From a software engineering point of view this makes sense, as such partitionings of specifications allow for separate and parallel development, and for reuse, of different parts of the compiler.
In principle, a language specification can be partitioned/modularised in two different ways:
1. It is split into transformation phases like scanning, parsing, code generation, etc. This corresponds to the well-known compiler architecture; we will call it the horizontal partitioning scheme.
2. It is described language construct by language construct. Each of these construct descriptions (we call them Montages) contains a complete specification from syntax to static and dynamic semantics. We call this approach the vertical partitioning scheme.
We will consider the pros and cons of both approaches in the following sections.
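The vertical scheme can be sketched as a Java interface in the spirit of our implementation; the names below (LanguageConstruct, NumberLiteral) and the method signatures are illustrative assumptions, not the actual MCS API:

```java
import java.util.HashMap;
import java.util.Map;

public class Vertical {
    // One module per language construct: syntax, static semantics and
    // dynamic semantics are bundled together (the vertical scheme).
    public interface LanguageConstruct {
        String syntax();                              // EBNF production
        boolean checkStatic(Map<String, String> env); // static semantics check
        int evaluate(Map<String, Integer> state);     // dynamic semantics
    }

    // A Montage-like module for a toy construct: a number literal.
    public static class NumberLiteral implements LanguageConstruct {
        private final int value;
        public NumberLiteral(int value) { this.value = value; }
        public String syntax() { return "Number = digit {digit}."; }
        public boolean checkStatic(Map<String, String> env) { return true; }
        public int evaluate(Map<String, Integer> state) { return value; }
    }

    public static void main(String[] args) {
        LanguageConstruct c = new NumberLiteral(42);
        System.out.println(c.syntax() + " -> " + c.evaluate(new HashMap<>()));
    }
}
```

Adding a new construct then means adding one such module, rather than touching every compiler phase; this is exactly the trade-off discussed in the following sections.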
3.1.1 Horizontal Partitioning
The conventional approach of partitioning a compiler into its compilation
phases is very successful. It has been established in the 1960s along with the
development of regular parsers. The idea was to split the compilation process
into two independent parts: the front-end and the back-end (Fig. 5). The
front-end is concerned with scanning, parsing and type checking while the
back-end is responsible for code generation.
Advantages: This partitioning is well suited for large languages with highly optimized compilers, and it meets exactly the needs of compiler constructors like Borland and Microsoft, which have numerous languages on offer. The separation of front- and back-end allows them to build several front-ends for different languages, all producing the same intermediate representation.
Figure 5: The typical phases of a compiler partition its design horizontally
From this intermediate representation, different back-ends can generate code for different
operating systems and microprocessors. The intermediate representation serves as a pivot in this design and reduces the complexity of managing all the languages and target architectures from O(l·t) to O(l+t), where l is the number of languages and t the number of target platforms. For example, five languages on four target platforms require nine front- and back-ends instead of twenty full compilers.
This kind of modularity allows for fast development of compilers for new languages or for existing languages on new platforms. Only specific parts of the compiler have to be implemented anew; the intermediate representation is the pivot in this design. Even within the front-end and back-end phases, modules might be exchanged. An example is the code optimizer, which could be replaced by an improved version without altering the rest of the compiler.
The availability of tools supporting the horizontal partitioning approach
simplifies the task of building a compiler (see Related Work in section 6.4 for an overview). Most of these tools support only front-end construction, but some also offer support for code generation. Code optimization in particular is hard to automate due to the variety of hardware architectures. Horizontal partitioning allows optimization to be performed on the entire program. Many optimization techniques - like register colouring, peep-hole optimization or loop unrolling - cannot be applied locally or per construct, as their mechanisms are
² Here, the complexity of managing all the languages and target architectures is meant.
based on a more global scale. For example, register colouring will examine a block or a whole procedure at once. As everything can be done within the same phase, no accessibility problems (due to encapsulation and information hiding) have to be solved.
Disadvantages: Each module of a traditional compiler contains specifications for all language constructs corresponding to its phase. That is, a module specifies only a single aspect of a construct (e.g. static semantics), but it does this for all constructs of the language. The complete specification of a single language construct is thus spread over all phases. Therefore, horizontal partitioning is not well suited to experimenting with a language, i.e. to generating various versions to gain experience with its look and feel. In general, applying even minor changes to a language can be very costly, especially if the changes affect all phases of the compiler. This is usually the case if new language features and/or new language constructs are added. There are roughly three levels of impact a change can have on a horizontally partitioned compiler:
1. Only a single phase is affected: Examples would be minor changes in the syntax, like replacing lower-case keywords with upper-case keywords, or an improved code optimizer.
2. Some, but not all phases are affected: This is the case if language constructs are introduced that can be mapped to the existing intermediate format. As an example, consider the introduction of a REPEAT-UNTIL loop into a language that already knows a WHILE loop. In this case, the change will have an impact on the front-end phases, while the back-end remains unchanged.
3. All phases have to be changed: This is the case if the changes in the language constructs cannot be mapped to the existing intermediate representation. An example is the introduction of virtual functions into C in order to enhance the language with dynamic binding features. The code generator now has to emit code that determines the function entry point at run-time instead of generating code for procedure calls statically.
All these changes have in common that they potentially have an impact on the whole language. Even if only a single phase is affected, the change has to be carefully tested for undesired side-effects, which cannot be precluded due to the lack of encapsulation of language construct specifications within a phase.
3.1.2 Vertical Partitioning
Instead of modularizing a compiler along compilation phases, each language construct is considered a module (Fig. 6). Such a module - we call it a Montage - contains a complete specification of this construct, including syntax as well as static and dynamic semantics.
Figure 6: Vertical partitioning along language constructs plugged into the Montage Component System
Vertical partitioning of a compiler is very similar to the way beginners approach their first translations³. They try to identify the main phrases of a sentence. Then each phrase is parsed into more elementary ones until, finally, they end up with single words. Now each word can be translated and the process is reversed, combining single words into phrases and sentences. Our approach supports this idea in a similar way. A program is subdivided into a set of language constructs, each of them specified in a single module. Once the program is broken up into these units, translation or execution is simple, as only a small part of the whole language has to be considered at once. Then these modules are re-combined using predefined interfaces. Section 3.3 will elaborate in more detail how this is done.
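The idea of one module per construct, covering all compilation aspects, can be sketched as an interface. This is a hypothetical sketch, not MCS's actual API: `MontageModule`, `AstNode` and `WhileMontage` are illustrative names only.

```java
import java.util.List;

public class VerticalPartitioning {
    // Minimal AST node; in MCS, nodes are instances of Montage classes.
    record AstNode(String construct, List<AstNode> children) {}

    // One module per language construct, bundling syntax as well as
    // static and dynamic semantics (illustrative interface).
    interface MontageModule {
        String ebnfRule();                       // syntax
        void checkStaticSemantics(AstNode n);    // static semantics
        void execute(AstNode n);                 // dynamic semantics
    }

    // Example: a WHILE loop packaged as a single module.
    static class WhileMontage implements MontageModule {
        public String ebnfRule() {
            return "While ::= \"WHILE\" Expr \"DO\" StmtSeq \"END\"";
        }
        public void checkStaticSemantics(AstNode n) { /* condition must be boolean */ }
        public void execute(AstNode n) { /* loop while the condition holds */ }
    }

    public static void main(String[] args) {
        System.out.println(new WhileMontage().ebnfRule());
    }
}
```

Replacing or extending the WHILE construct touches only this one module, which is the locality argument made below.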
Advantages: A vertical partitioning scheme is very flexible with respect to changes in a language. Modifications will be local, as they usually affect only a single module. As each module contains specifications for all aspects of the compilation process for a certain construct, unintended side-effects with other modules are very unlikely. This is in contrast to the horizontal partitioning scheme, where side-effects within a phase potentially affect the whole language. As an example, we already mentioned the introduction of virtual functions in
³ This applies both to the translation of natural languages and to the act of understanding a program text when reading.
C, affecting all phases of the compiler. As the phases do not feature any form of further modularization, unintended side-effects can easily be introduced.
Component frameworks like JavaBeans or COM make it possible to build new systems by composing pre-compiled components. Vertical partitioning supports this approach, as Montages are compiled components which can be deployed in binary form. This in turn opens many possibilities for marketing language components. As they do not have to be available in source form - which is the case with conventional, horizontally partitioned compiler specifications - the developer does not have to give away any know-how about the internals of the language components.
Not only the developer profits from pre-compiled language components; the user does so too. Combining them and testing the newly designed languages is much less complex than testing all phases of a conventional compiler. New groups of users now have access to language design, as the learning curve is flattened considerably. In the best case, a new language can be composed completely from existing language components. Normally, however, language construction is a combination of reusing components and implementing new ones. Even in this case, the amount of effort is reduced, as with the availability of a language component market, the need for implementing new components decreases over time.
Disadvantages: Construction of an efficient and highly optimizing compiler will be very difficult with our approach. Optimization relies on the capability to overview a certain amount of code, which is exactly what vertical partitioning is not about. Ease of use and flexibility of deployment are of primary interest in our system.
For mainstream programming languages that have a huge community, efficiency can be achieved by using conventional approaches, preferably in combination with our system: language composition supports fast prototyping and is used to evaluate different dialects of a new language. When this process converges to a stable version, an optimizing compiler can be implemented using the phase model.
3.1.3 Static and Dynamic Semantics of Specifications
A language specification contains static and dynamic semantics in a similar
manner as a program does (in fact, language specifications often are programs:
either compilers or interpreters). The partitioning schemes as described above
are part of the static semantics of language specifications. They define how
specifications can be composed, not how the specified components interact in
order to process a program.
It is important to distinguish between the structure of the specification and the structure of the transformation process that turns a program text into an executable piece of code. In a horizontal partitioning scheme, modularization is done along the transformation phases, i.e. each phase completes before the next can start. In other words, the partitioning scheme and the control flow during compilation are the same.
In a vertical partitioning scheme, a program text is transformed in a similar manner as in conventional compilers: phase after phase. It is simply a causal necessity to parse a program text before static semantics can be checked, which in turn has to be done before code generation. But control flow switches between the specification modules during all phases of compilation.
3.2 Language Composition
A note to those familiar with compiler-compilers: usually these tools rest on a specification of the translations. A Montage also specifies the behaviour of a language construct, but in a purely operational manner. Therefore, translations are programmed rather than specified.
3.2.1 The Basic Idea
The modularity available in the specification of a programming language is destroyed by most compiler construction tools due to their horizontal partitioning scheme (for a detailed discussion see section 3.1.1). The modularity and compositionality of a programming language is therefore only available to compiler implementors, but not to the users of a compiler, i.e. to the programmers. Of course, the programmer is free to use only a subset of a given language, but he will never be able to extend the language with his own constructs or to recombine different languages to form a more powerful and problem-specific language.
The Montage Component System provides these features to the programmer. It allows language constructs to be specified separately. From such a specification, a language component can be generated that is pluggable into other language components, and therefore suited to building new languages on a plug-and-play basis.
In contrast to existing compiler-compilers, reuse of specifications can be applied at the binary level rather than at the source text level. MCS generates compiled software components from the specifications. These can be distributed independently of any language. Each component is provided with a set of requirements and services that can be queried and matched to other components. Nevertheless, pre-compiled language components need a certain flexibility, e.g. the syntax has to be adaptable, or identifiers need to be renamed in order to avoid naming collisions.
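The querying and matching of requirements and services can be sketched as a simple set computation. This is an assumption-laden sketch - `Component`, `provides`, `requires` and `unmetRequirements` are invented names, not MCS's interface - but it shows the check the text describes: a composition is complete when every requirement is met by some provided service.

```java
import java.util.HashSet;
import java.util.Set;

public class ComponentMatching {
    // Hypothetical model: each pre-compiled language component declares
    // the symbols it provides and those it requires from other components.
    record Component(String name, Set<String> provides, Set<String> requires) {}

    // Collect everything provided, then report requirements left unmet.
    static Set<String> unmetRequirements(Set<Component> composition) {
        var provided = new HashSet<String>();
        for (Component c : composition) provided.addAll(c.provides());
        var missing = new HashSet<String>();
        for (Component c : composition)
            for (String r : c.requires())
                if (!provided.contains(r)) missing.add(r);
        return missing;
    }

    public static void main(String[] args) {
        var whileC = new Component("While", Set.of("While"), Set.of("Expr", "StmtSeq"));
        var exprC  = new Component("Expr",  Set.of("Expr"),  Set.of());
        System.out.println(unmetRequirements(Set.of(whileC, exprC))); // [StmtSeq]
    }
}
```

A tool built on such a check can then prompt the user for the missing adaptations, as described in the next paragraph.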
Creation of a new programming language is a tool-assisted process. It is possible to build a language from any combination of newly specified and existing components. The latter could, for example, be downloaded from various sources on the Internet. The system will check that all requirements of the components are met and prompt the user for missing adaptations. If there are no more reported errors and warnings, the new language is ready to use, i.e. an interpreter is available.
How does such a system work in detail? An in-depth description of the
Montage Component System is given in chapter 4 and details about its design
and implementation in chapter 5. To understand our approach in principle,
section 3.3 below should be read first, as it provides an introduction to the
Montage approach.
3.2.2 On Benefits and Costs of Language Composition
The established methods have successfully been deployed for over two decades.
They produce stable, efficient and well understood compilers and interpreters.
Unfortunately, they are rather rigid when considering changes. Especially during the design phase of a new language, greater flexibility is desirable. For
example, it should be possible to produce and test dozens of variants of a new
language before it is released.
The most important aspect of language composition is that pre-compiled
components can be reused. The advantages can be summarised as follows:
1. Language composition is now accessible to non-experts in the field of programming language construction. Pre-compiled components will be
designed, developed and distributed by experts. Languages composed of
such components profit from the built-in expert knowledge.
2. The development cycle for a new language can be drastically reduced. On
the one hand because the pre-compiled components need no further testing
and on the other hand due to the abbreviated specification-generation-compilation cycle: the pre-compiled components need no further compilation.
3. Reuse is done on a binary level, reducing the possibility of text copying
errors. In combination with the limited impact that a component has on the
whole language, this again results in a more reliable and flexible language
design method.
This list may give rise to some questions and objections, which need to be discussed:
Who composes languages? Programming language design and implementation will be simplified such that it becomes applicable for non-experts in the field of programming language construction. Is this desirable? Should this domain not be reserved for experts?
Similar debates were held about operator overloading (e.g. in C++). Should the programmer be given the opportunity to alter the meaning of parts of the programming language? Although language construction is much more powerful than operator overloading (which does not add any fundamental power to the language - it is just a notational convenience), the basic question remains the same: should the programmer have the same rights and power over the language as the language implementor?
The effort of creating a new language is still great enough to make one think it through thoroughly before indulging in language composition. Of course, there will
thoroughly, before one indulges in language composition. Of course there will
always be designs of lower quality. But these will eventually vanish, as they will
not be convincing enough to be reused in further languages. Only the best designs will survive in the long term, because their components will be reused in
many different languages. Simonyi compares this process with the survival of
the fittest in biological evolution [Sim96].
We foresee two main areas where vertically partitioned systems - like MCS - are particularly suitable: education on the one hand, and the design and implementation of domain-specific languages (DSLs) on the other:
1. In education, an introduction to programming can be taught using a very simple language in the beginning, which is then refined and extended stepwise. This solves a typical dilemma in programming courses: teaching either a mainstream programming language from the beginning or starting with a didactically more suitable language. The first approach faces the problem that one has to cope with the whole complexity of e.g. C++ from the very beginning. The second approach wastes a lot of time introducing a nice language which will not be used later on. A further effort then has to follow to teach the subtleties of a mainstream language.
Using MCS, a teacher can start teaching C++ with a subset of the language that is simple and safe. Then, step by step, he can refine and extend the language, using new Montages or refined versions of existing ones. During the whole course, basically the same language can be taught; the transitions from one complexity level to the next are smooth.
The system can be used to its full flexibility: for introductory courses, ready-made Montages are available; they only need to be plugged together. A teacher does not need to have any knowledge of compiler construction, and the students will use only the end product, namely the compiler or interpreter; they do not need to understand our system. In more advanced courses,
teachers may explain details of a language construct by showing the corresponding Montage. And in compiler construction courses, the system can be used by the students to build new languages themselves.
2. Domain-specific languages are typically small languages that have a limited application area and a small community of programmers. Well-known examples are Unix tools such as sed, awk, make, etc. [Sal98]. In some cases, they are employed only a single time (e.g. to control data migration from a legacy system to a newer one). In such cases it is not worth constructing optimizing compilers or inventing a highly sophisticated syntax. Often, such languages have to be implemented within very tight time bounds. In all these scenarios, language composition offers an interesting solution. Creating a new language might be done within a few hours. Reuse of existing language components simplifies development and debugging and reduces the fault rate in this process [Kut01].
Is the flexibility of phase-model approaches sufficient for language development? It is possible to produce dozens of compilers with lex and yacc, too. Of course, it is possible to generate compilers for numerous variations of a language. But normally this is a fairly sequential and time-consuming process, because careful testing has to guarantee that there are no unwanted side-effects. Especially in education and DSL design, the flexibility available in traditional phase-model approaches might not be sufficient.
Student exercises in language design or compiler construction are a good example to elaborate on this statement. During such a course, students normally have to implement a running compiler for a simple language in the exercises. The lack of sufficient modularization within a compiler phase forces the student into an "all or nothing" approach. He has to specify and generate the phase in its full complexity (i.e. all language constructs at once) in order to have a running system that can be tested. There are a myriad of possibilities for making mistakes. This is not only discouraging for students but also for their tutors, who have to assist them. Montages could be used in such courses to improve
modularity in the student projects. Once a Montage has been compiled successfully, it can be reused without alteration in future stages of the student compiler. Changes in one Montage have limited effects on others, and thus debugging becomes easier for both student and tutor. Success in learning is one of the most motivating factors in education [F+95]. The time gained can be used to deepen the student's insight into the subject, to broaden his knowledge by being able to cover more subjects, or simply to reduce stress in education.
Experimenting with a language will normally improve its design and its expressiveness, but experimenting takes time. The reduced development cycle
time is another reason to prefer vertically partitioned systems over phase-model approaches. As the time it takes to generate a single version of a language is reduced, it is possible either to develop languages faster or to generate more variants of a language before deciding on one. Faster development is interesting to industrial DSL designers, whereas experimentation is of advantage to both students and professionals. Students will get a better understanding if they are able to easily alter (parts of) a language, and professionals will profit from the experience they gain during experimentation.
How about pre-compiled compiler phases? Wouldn't this improve the performance of established approaches? Pre-compiled compiler phases would only make sense in a few areas, and in general they would even complicate language specifications, as the following examples illustrate. Changing the syntax of a while statement from lower- to upper-case keywords would be easy, as only the scanner phase is involved. But suppose a simple goto language that shall be extended with procedures. Changes in the scanner and parser phase are obvious, but the code generator also needs to be redesigned, as the simple goto semantics probably would not suffice to model parameter passing and subroutines efficiently. Pre-compiled compiler phases would be a hindrance in this case. The language designer would need to have access to the sources of the pre-compiled parts in order to re-generate them. With MCS, the same problem would be solved by introducing some Montages that specify the syntax and behaviour of procedures. Changes to existing Montages can be necessary as well, but they can be implemented elegantly by type-extending existing Montages.
No testing of pre-compiled components? Of course, the interaction of components in a newly composed language has to be tested. But these tests happen on a higher level of abstraction, closer to the language design problem. Testing of component internals does not have to be considered any more. Components interact only through well-defined interfaces, which restrict the occurrence of errors, simplify debugging, and accelerate testing in general.
3.3 The Montages Approach
Our work is based on Montages [KP97b, AKP97], an approach that combines graphical and textual specification elements. Its underlying model is that of abstract state machines [Gur94, Gur97]. The following overview introduces the basic concepts and ideas of Montages in order to provide a better understanding of the following chapters. Readers familiar with Montages may skip this section
and continue with chapter 4. Detailed information on Montages can be found in [KP97b, AKP97] and in Kutter's thesis [Kut01].
3.3.1 What is a Montage?
A complete language specification is structured into specification modules, called Montages. Each Montage describes a construct of a programming language by «extending the grammar to semantics». A Montage consists of up to five components partitioned into four parts (Fig. 7 shows Java's conditional operator):
1. Syntax: Extended Backus-Naur Form (EBNF) is used to provide a context-free grammar of the specified language L. A parser for L can be generated from the set of EBNF rules of all Montages. Furthermore, the rules define in a canonical way the signature of abstract syntax trees (ASTs) and how parsed programs are mapped onto an AST. The syntax component is mandatory; the following components are all optional.
2. Control Flow and Data Flow Graph: The Montage Visual Language (MVL) representation has been explicitly devised to extend EBNF rules to finite state machines (FSMs). Such a graph associated with an EBNF rule basically defines a local finite state machine. Each node of the AST is decorated with a copy of the FSM fragment given by its Montage. The references to descendants in the AST define an inductive construction of a globally structured FSM. Control flow is represented by dashed arrows. Data may be stored in attributes of Montage instances (in our example, the attributes staticType and value are defined for every Montage).
Control flow always enters a Montage at the initial edge (I) and exits at the terminal edge (T). Control flows may be attributed with predicates. For example, one control flow leaving the branching node in Fig. 7 shows the predicate cond.value = true. Branching of control flow may only occur in terminal nodes. This is due to the condition that there is only one control flow leaving each Montage (T). The default control flow is indicated by the absence of predicates.
3. Static semantics: A transition rule that does static analysis can be provided. Such rules may fire after successful construction of an AST for a given program.
In Fig. 7, the static type of the conditional operator is determined during static analysis, which is - in this case - not a trivial task. To enhance readability, macros can be used (e.g. CondExprType).
ConditionalExpression ::= ConditionalOrOption "?" Expression ":" ConditionalOption

[control and data flow graph: I → S-ConditionalOrOption → result → T, branching to S-Expression and S-ConditionalOption]

staticType := CondExprType(S-Expression, S-ConditionalOption)

condition S-ConditionalOrOption.staticType = "boolean"

@result:
if (S-ConditionalOrOption.value) then
    value := S-Expression.value
else
    value := S-ConditionalOption.value
endif

Figure 7: Montage for the Java conditional expression
4. Conditions: The third part contains post-conditions that must be established after the execution of static analysis. In our example, static type checking occurs.
5. Dynamic Semantics: Any node in the FSM may be associated with an Abstract State Machine (ASM) [Gur94, Gur97] rule. This rule is fired when the node becomes the current state of the FSM. ASM rules define the dynamic semantics of the programming language.
In the fourth section of Fig. 7, the ASM rule specifies what happens at run-time when a conditional operator is encountered. Rules in this section are always bound to a certain node in the FSM. The header of each rule (here: @result) defines this association. Note that there may also be predicates defined in the graphical section which are evaluated at runtime.
3.3.2 Composition of Montages
The syntax of a specified language is given by the collection of all EBNF rules. Without loss of generality, we assume that the rules are given in one of the two following forms:

A ::= B C D    (1)
E ::= F | G | H    (2)
The first form defines that A has the components B, C and D, whereas the second form defines that E is one of the alternatives F, G or H. Rules of the first form are called characteristic productions and rules of the second form are called synonym productions. Analogously, non-terminals appearing on the left-hand side of characteristic rules are called characteristic symbols and those appearing in synonym rules are called synonym symbols. One characteristic symbol is marked as the start symbol. It must be guaranteed (by tool support) that each non-terminal symbol appears as the left-hand side of exactly one rule.
The two forms of EBNF rules also determine how a language specification
can be constructed from a set of Montages by putting them together.
1. A Montage is considered to be a class⁴ whose instances are nodes in the abstract syntax tree. Terminal symbols on the right-hand side of the EBNF, e.g. identifiers or numbers, are leaf nodes of the AST (represented by ovals in MVL); they do not correspond to Montages. Non-terminals, on the other hand, are (references to) instances of other Montage classes. Such attributes are called selectors and are represented by a rectangle and a prefixed «S-». Each non-terminal in a Montage may have at most one incoming and one outgoing control flow arrow. This rule allows Montages to be composed in a simple way: the referenced Montage's I and T arrows are connected with the incoming and outgoing control flow arrow respectively.
2. When using sub-typing and inheritance, synonym symbols can be considered as abstract classes. They cannot be instantiated but provide a common base for their right-hand-side alternatives.
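The two mapping rules above can be sketched directly in Java class declarations. The grammar fragment used here (Stmt as a synonym of Assign and While) is an invented example, not one of the thesis's Montages, but the class structure follows the stated scheme: synonym symbols become abstract classes, characteristic symbols become concrete classes whose selector fields reference other Montage instances.

```java
public class GrammarAsClasses {
    // Synonym production  Stmt ::= Assign | While  maps to an abstract class:
    abstract static class Stmt {}        // synonym symbol: no instances

    // Characteristic productions map to concrete Montage classes whose
    // selector fields ("S-...") reference the right-hand-side non-terminals.
    static class Assign extends Stmt {
        String sIdent;                   // terminal-derived leaf
        Expr   sExpr;                    // selector: another Montage instance
    }
    static class While extends Stmt {
        Expr sCond;
        Stmt sBody;                      // any Stmt alternative fits here
    }
    static class Expr {}                 // characteristic symbol

    public static void main(String[] args) {
        Stmt s = new Assign();           // alternatives share the abstract base
        System.out.println(s instanceof Assign);
    }
}
```

Sub-typing gives synonym symbols their intended role for free: wherever a Stmt is expected, any of its alternatives may appear.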
After an AST has been built from a program input, static semantics rules may be fired. The idea is to «charge» all rules simultaneously and trigger their firing by the availability of all occurring attributes (i.e. ≠ undef). In our example the static semantics rule can only be fired when all referenced attributes become available, i.e. the attributes staticType, isConst⁵ and value of S-Expression and S-ConditionalOption are defined. As soon as all attributes for some Montage become available, the firing begins. In this process further attributes may be computed, and so the execution order is determined automatically and corresponds to a topological ordering of rules according to their causal relations. This approach was adopted from [Hed99]. Eventually all static semantics rules are fired. If not, an error occurs, either because execution of some rules was faulty, or because one or more attributes never got defined during the firing process. Usually the latter indicates design flaws in the language specification.
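The charging-and-firing scheme can be sketched as a small scheduler. This is a simplified, hypothetical sketch (rules reduced to read/write attribute sets; `Rule` and `fireAll` are invented names): each rule fires as soon as all attributes it reads are defined, possibly defining further attributes, so the execution order emerges as a topological order of the causal relations, and leftover rules signal undefined attributes.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class RuleFiring {
    // A static semantics rule, abstracted to the attributes it reads and writes.
    record Rule(String name, Set<String> reads, Set<String> writes) {}

    // Fire every rule whose inputs are available; repeat until no progress.
    static List<String> fireAll(List<Rule> rules, Set<String> defined) {
        var order = new ArrayList<String>();
        var pending = new ArrayList<>(rules);
        boolean progress = true;
        while (progress) {
            progress = false;
            for (var it = pending.iterator(); it.hasNext(); ) {
                Rule r = it.next();
                if (defined.containsAll(r.reads())) {   // all inputs != undef
                    defined.addAll(r.writes());         // may enable other rules
                    order.add(r.name());
                    it.remove();
                    progress = true;
                }
            }
        }
        if (!pending.isEmpty())   // some attribute never became defined
            throw new IllegalStateException("design flaw: " + pending);
        return order;
    }

    public static void main(String[] args) {
        var rules = List.of(
            new Rule("condType", Set.of("Expr.staticType"), Set.of("Cond.staticType")),
            new Rule("exprType", Set.of(), Set.of("Expr.staticType")));
        System.out.println(fireAll(rules, new HashSet<>())); // [exprType, condType]
    }
}
```

Note how the order is not programmed anywhere: exprType fires first simply because condType's input is not yet defined.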
4 in the sense of e.g. a Java class
5 used in the macro CondExprType that is not shown here.
Another approach would be to predetermine the execution order, e.g. as a pre-order traversal of the AST. Experience has shown that one predetermined traversal often does not suffice. This problem can be solved, but it leads to clumsy static semantics rules, because they have to keep track of «passes».
Once static analysis has successfully terminated, the program is ready for execution. Control flow begins with the start symbol's I arrow. When encountering a state, its dynamic semantics rule is executed. Control is passed to the next state along the control flow arrow whose predicate evaluates to true. Such predicates are evaluated after executing the rule associated with the source node. The absence of a predicate means either true (only one control flow arrow) or the negated conjunction of all other predicates leaving the same source node.
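The transition selection just described can be sketched as follows. The names `Arrow` and `next` are hypothetical; an unlabelled (default) arrow is modelled as a null predicate and is taken only when no labelled predicate holds, matching the "negated conjunction" convention.

```java
import java.util.List;
import java.util.function.BooleanSupplier;

public class ControlFlow {
    // A transition: target state plus an optional predicate
    // (null = default arrow).
    record Arrow(String target, BooleanSupplier predicate) {}

    // Predicates are evaluated after the source node's rule has executed;
    // the default arrow stands in for the negated conjunction of the rest.
    static String next(List<Arrow> outgoing) {
        for (Arrow a : outgoing)
            if (a.predicate() != null && a.predicate().getAsBoolean())
                return a.target();
        for (Arrow a : outgoing)
            if (a.predicate() == null)
                return a.target();
        throw new IllegalStateException("no transition enabled");
    }

    public static void main(String[] args) {
        boolean condValue = false;                        // cond.value after the rule fired
        var arrows = List.of(
            new Arrow("S-Expression", () -> condValue),   // predicate: cond.value = true
            new Arrow("S-ConditionalOption", null));      // default arrow
        System.out.println(next(arrows)); // S-ConditionalOption
    }
}
```

With cond.value = true the labelled arrow to S-Expression would be taken instead, mirroring the branching in Fig. 7.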
This scheme of local control flow has its limits, e.g. when describing (virtual) method calls (Fig. 8). Neither is the call target local to the calling Montage (with respect to the parse tree), nor can it be determined statically. In such cases, non-local jumps can be used. They are distinguished from normal control flows by the presence of a term which is evaluated at runtime to compute the jump target. Moreover, the box representing the jump target is not a selector and is therefore not marked as such. In Fig. 8 the MethodDeclaration box represents the class MethodDeclaration and the term Dispatch computes the appropriate instance at run-time.
[Screenshot residue omitted. The figure shows the EBNF rule for MethodInvocation with its argument list, a non-local jump whose term Dispatch(S-NameIdent.Name, dynamicType) computes the target MethodDeclaration instance (late binding), the static semantics rule staticType := LookUp(S-NameIdent.Name).staticType, and a condition requiring that LookUp(S-NameIdent.Name) ≠ undef and that each actual parameter's static type matches the corresponding formal parameter's static type.]
Figure 8: Montage for method invocations (screenshot of the Gem/Mex tool)
The work most closely related to Montages is the MAX system [PH97a]. Like Montages, MAX builds upon the ASM case studies for the dynamic semantics of imperative programming languages. In order to describe the static aspects of a language, the MAX system uses occurrence algebras, a functional system closely related to ROAG [Hed99].
A very elegant specification of Java using ASMs can be found in [BS98].
This specification abstracts from syntax and static semantics but focuses on
dynamic semantics. Its rules are presented in less than ten pages.
Chapter 4
From Composition to Interpretation
This chapter describes the concepts behind the Montage Component System. Algorithms and data structures are discussed in an abstract form which is neutral with respect to a concrete implementation in any specific language or component framework. Implementation details are discussed in the next chapter. Readers not familiar with the Montage approach should first read the preceding section 3.3 in order to get an introduction. More in-depth information can be found in [AKP97, Kut01]. These publications give a well-founded description of Montages. Its mathematical background is based on abstract state machines (ASMs).
For reasons discussed in section 2.3, we focus on an implementation using a mainstream programming language. Simplicity, composability and ease of use are our main goals, and, in combination with our different formalism (Java instead of ASMs), this explains why our notion of a Montage differs in some details from the original definition. Therefore, we first present some definitions that render the notion of a Montage in MCS. These definitions are implementation independent, although they are given with an object-oriented implementation and a component framework (such as those discussed in section 6.5) in mind.
After an overview of the process of transforming Montage specifications into an interpreter, its single phases are described in detail throughout the rest of this chapter. Deviations of our approach from the original Montage approach are indicated at the appropriate places.
4.1 What is a Montage in MCS?
The following definitions are provided with regard to an implementation and reflect the necessary data structures that are used to implement the system. We will refer to these definitions and give corresponding class declarations in Java when discussing the implementation in the next chapter. Montages, although entities of composition, can never be executed on their own. Only as members of a language can they be deployed conveniently. Therefore we start by defining our notion of a language.
4.1.1 Language and Tokens
A language L = (M, T) consists of a set of Montages M and a set of tokens T. A token tok = (regexp, type) is defined as a pair of a regular expression regexp defining the micro syntax and a type indicating into which type the scanned string is to be converted. Tokens are either relevant, t ∈ Trlv, i.e. they will be passed to the parser, or they are skipped¹, t ∈ Tskip. Trlv and Tskip are disjoint sets: Trlv ∩ Tskip = ∅. T denotes the set of all tokens of a language: T = Trlv ∪ Tskip.
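These definitions translate almost one-to-one into data structures. The following sketch shows one possible Java rendering; class and field names are our own illustration, not necessarily those used in MCS:

```java
import java.util.LinkedHashSet;
import java.util.Set;

// A token: a regular expression for the micro syntax plus a target type.
class Token {
    final String regexp;
    final Class<?> type;      // type the scanned string is converted to
    final boolean relevant;   // relevant tokens go to the parser, others are skipped

    Token(String regexp, Class<?> type, boolean relevant) {
        this.regexp = regexp;
        this.type = type;
        this.relevant = relevant;
    }
}

// A language: a set of Montages M and a set of tokens T = Trlv ∪ Tskip.
class Language {
    final Set<Object> montages = new LinkedHashSet<>();  // Montage class defined later
    final Set<Token> tokens = new LinkedHashSet<>();

    Set<Token> relevantTokens() {
        Set<Token> r = new LinkedHashSet<>();
        for (Token t : tokens) if (t.relevant) r.add(t);
        return r;
    }
}
```

Note that the disjointness Trlv ∩ Tskip = ∅ holds by construction here, since each token carries a single relevance flag.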
4.1.2 Montages
A Montage m = (sr, P, cfg) can be defined as a triple consisting of a syntax rule sr, a set of properties P and a control flow graph cfg. Fig. 9 shows a graphical representation of a Montage.
A syntax rule sr = (name, ebnf) consists of a name (the left hand side or the production target) and a production rule ebnf (the right hand side). The elements of a production rule which are of interest in our context are terminal symbols, nonterminal symbols, repetition delimiters (braces, brackets and parentheses) and synonym separators (vertical lines). The complete definition of the Extended Backus-Naur Form can be found in [Wir77b].
A property p = (name, value, Ref) is basically a named variable containing some value. Associated with each property is a rule specifying how its initial value can be computed. In MCS, Java block statements (see [GJS96] for a definition) are used to express such rules. They may contain several (or no) references r ∈ Ref to other properties, possibly to properties of other Montages. A reference represents a read access; writing to a property within an initialisation rule is prohibited. (Section 4.6 describes the use of properties in detail.)
A control flow graph is a united data structure as it contains both an abstract syntax tree fragment and a control flow graph. Thus it can be described as a triple cfg = (N, E_ast, E_cf) containing a set of nodes N, a set of tree edges E_ast and a set of control flow edges E_cf. A node can either be a nonterminal, a repetition or

¹ E.g. whitespaces and comments will not be of interest to the parser and are skipped.
4.1 What is a Montage in MCS? 41
Syntax Rule:
    Example ::= A {B} "text".

Control Flow Graph:
    [nested boxes showing the AST fragment with control flow edges]

Properties:
    name  type     rule
    X     int      OtherName.X + DifferentName.Z
    Y     boolean  true

Actions:
    @n:
    {
        int i = 0;   // local variable declaration
        X = i;       // access property
        // additional Java statements
    }

Figure 9: Schematic representation of a Montage.
    Syntax Rule: an EBNF production
    Control Flow Graph: united representation of an AST fragment and control flow information
    Properties: variables initialized during static semantics evaluation
    Actions: dynamic semantics
an action. In the tree structure, repetitions may only occur as inner nodes, actions only as leaf nodes; nonterminals may be both. If a nonterminal node is an inner node, then all its subtrees are part of the Montage that the nonterminal represents. The graphical representation of control flow graphs uses nested boxes to display the tree structure. This allows laying out the control flow dependencies as a plain graph.
Action nodes in the control flow graph are equivalent to properties. While properties and their associated initialisation rules define static semantics, the rules associated with action nodes define the dynamic semantics of a language. An action thus is defined similarly to a property: act = (an, Ref). It is associated with an action node an and it may also contain a block of Java statements. The same
rules as for initialisation rules apply here, i.e. from within this block, access to
properties (read and write) is possible.
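The triple m = (sr, P, cfg) and the property and action definitions above suggest class declarations along the following lines. This is a hedged sketch; the actual MCS classes given in the next chapter may differ in naming and detail:

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// A property: a named variable plus the references its initialisation rule reads.
class Property {
    final String name;
    Object value;                                    // computed during static semantics
    final Set<String> refs = new LinkedHashSet<>();  // read-only references, e.g. "Expr.Type"
    Property(String name) { this.name = name; }
}

// An action: Java statements attached to an action node of the control flow graph;
// unlike initialisation rules, it may both read and write properties.
class Action {
    final String nodeName;                           // the action node it belongs to
    final Set<String> refs = new LinkedHashSet<>();
    Action(String nodeName) { this.nodeName = nodeName; }
}

// A Montage m = (sr, P, cfg): syntax rule, properties, and the nodes of its
// control flow graph (represented here only by the attached actions).
class Montage {
    final String syntaxRule;  // e.g. "While ::= \"WHILE\" Condition \"DO\" Statement \"END\"."
    final List<Property> properties = new ArrayList<>();
    final List<Action> actions = new ArrayList<>();
    Montage(String syntaxRule) { this.syntaxRule = syntaxRule; }
}
```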
4.2 Overview
Montages have to be aware of each other in order to communicate and interact during interpreter generation and program text processing. Five transformation phases are necessary in the process from language specifications to an interpreted program (Fig. 10).
1. In a first step, the Registration, all Montages and tokens that are part of the new language specification have to be introduced.
2. During the Integration phase, a parser is generated that is capable of reading programs of the specified language. Simultaneously, consistency checks are applied to the Montages, i.e. the completeness of the language specification and the accessibility of all involved subcomponents is asserted.
3. The parser is then used to read a program (Parsing) and to transform it into an abstract syntax tree (AST).
4. In the next stage (Static Semantics), dependencies between the nodes in this AST are resolved by assigning initial values to all properties of all Montages.
5. Finally, the control flow graphs are connected to each other (Control Flow Composition), thus building a network of nodes that can be executed.
The first two phases specify the static semantics of the language specification. This means that all necessary preparations that can be done statically are completed after integration. Further processing is done by executing the specification, namely phases three to five, the dynamic semantics of the language specification.
The five steps of the transformation process also imply a shift in focus from Montages towards their subcomponents. This is also reflected in Fig. 10 by the three major (intermediate) data structures that are generated during specification transformation (displayed in ellipses). As the focus shifts from Montages to their subcomponents (properties or control flow graphs), the interaction between the components gets more and more fine grained; the data structures thus are increasingly complex.
The following sections will provide a detailed description of the five transformation phases and the resulting data structures.
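The split of the five phases into a static part (1 and 2) and a dynamic part (3 to 5) can be captured in a small sketch. The phase names follow Fig. 10; the helper class is illustrative only:

```java
// The five transformation phases; the first two form the static semantics of
// the specification, the last three its dynamic semantics.
enum Phase {
    REGISTRATION(true), INTEGRATION(true),
    PARSING(false), STATIC_SEMANTICS(false), CONTROL_FLOW_COMPOSITION(false);

    final boolean staticPhase;
    Phase(boolean staticPhase) { this.staticPhase = staticPhase; }
}

class Pipeline {
    // Checks that every static phase precedes every dynamic phase, i.e. that
    // all preparations done statically are completed after integration.
    static boolean wellOrdered() {
        boolean dynamicSeen = false;
        for (Phase p : Phase.values()) {
            if (!p.staticPhase) dynamicSeen = true;
            else if (dynamicSeen) return false;
        }
        return true;
    }
}
```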
[Transformation pipeline:
    1. Registration/Adaptation   (Montages & tokens)
    2. Integration
    3. Parsing                   (source code of the program to execute)
    4. Static Semantics
    5. Control Flow Composition
    -> program interpretation
Phases 1-2 constitute the static semantics, phases 3-5 the dynamic semantics of the specification.]

Figure 10: The transformation process from specifications to an executable interpreter.
4.3 Registration / Adaptation
Registering simply marks a Montage (or a token) as being part of a language (Fig. 11). Most of the work performed in this phase is done manually by the user. He has to adapt imported Montages to the new environment, i.e. to the new language. This covers the renaming of nonterminals, properties and actions where necessary. Token definitions have to be given in this phase as well for all variable tokens, e.g. identifiers, numbers, strings, whitespaces, etc. Tokens for keywords can be generated by the system automatically (see the integration phase below).
Figure 11: Language L consisting of a set of Montages M and a set of tokens T. Elements are either added by the user or generated during integration.
If a language is to be composed of existing Montages, then in almost every case minor adaptations have to be performed, e.g. adjusting the syntax rule to the general guidelines (such as capitalized keywords). Too stringent consistency checking of Montages in this phase would hinder flexible Montage composition, as only compatible Montages would be allowed to join the language. We consider editing a Montage in a language context (rather than out of context) as less error prone and thus more productive.
Apart from enforcing set semantics (i.e. no duplicate Montages or tokens in a language) there are no consistency checks necessary. This loose grouping allows for comfortable editing of Montages. A language manages a set of Montages and tokens. It returns, upon request, references to Montages or tokens that are members of the language. It plays a central role in the integration phase as it is the only place in a language specification where all member Montages and tokens are known.
One of the Montages of a language has to be designated as the starting Montage. It is equivalent to the start symbol (a designated nonterminal symbol) in a set of EBNF rules specifying a language. The starting Montage will begin the parsing process in phase 3 (Fig. 10). Registration has to ensure that exactly one starting Montage is selected before transformation progresses to the integration phase.
4.4 Integration
During the integration phase, tokens and Montages are integrated into a language specification. This requires parser and scanner generation as well as internal and external consistency checks.
4.4.1 Parser Generation
For each Montage m ∈ M, a concrete syntax tree cst is generated by parsing its syntax rule sr (Fig. 13 shows an example). A syntax tree reflects the structure of the EBNF rule. Repetitions are represented by inner nodes, nonterminal and terminal symbols by leaf nodes. Note that the original syntax rule can always be reconstructed from cst by performing an inorder traversal².
The syntax trees of all Montages can be merged by replacing the nonterminals with references to the roots of the syntax trees of their designated Montages. This will result in a parse graph, as parse trees may refer to each other mutually (Fig. 12). The parser is now ready for use (see the next section on the parsing phase).
4.4.2 Scanner Generation
Syntax rule parsing also generates tokens for terminal symbols. Such terminal symbols, or keywords, are easily detectable as strings enclosed in quotation marks. Each keyword encountered is added to the token set of the language. Keywords are fixed tokens, i.e. they have to appear in the program text exactly as they are given within the quotation marks in the EBNF rule. In contrast, the syntax of identifiers or numbers varies and can only be specified by a rule (a regular expression) but not with a fixed string.
After all parse trees have been generated, the complete set of tokens is known. It is now possible to generate a scanner that is capable of reading some input stream and returning tokens to a parser. We use scanner generation algorithms as they are used in the Lex [Les75] or JLex [Ber97] tools.
² In the tree representation we use in our figures this corresponds to a traversal from top to bottom.
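For illustration, the longest-match behaviour of a generated scanner can be emulated with java.util.regex. This toy version is not the Lex/JLex-style automaton construction used in practice, merely a sketch of the observable behaviour; names and the token encoding are our own:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Toy scanner: at each position it tries every token's regexp and emits the
// longest match; skipped tokens (whitespace etc.) are dropped, relevant
// tokens are handed on to the parser.
class ToyScanner {
    // tokens[i] = { name, regexp, "rlv" or "skip" }
    static List<String> scan(String input, String[][] tokens) {
        List<String> out = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            String bestName = null, bestKind = null;
            int bestEnd = pos;
            for (String[] t : tokens) {
                Matcher m = Pattern.compile(t[1]).matcher(input);
                m.region(pos, input.length());
                if (m.lookingAt() && m.end() > bestEnd) {
                    bestEnd = m.end(); bestName = t[0]; bestKind = t[2];
                }
            }
            if (bestName == null) throw new IllegalArgumentException("no token at " + pos);
            if (bestKind.equals("rlv")) out.add(bestName + ":" + input.substring(pos, bestEnd));
            pos = bestEnd;
        }
        return out;
    }
}
```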
[Syntax trees st for While, Condition and Statement; nonterminal leaves point to the trees of their designated Montages, terminal leaves ("WHILE", "DO", "END", "IF", ...) point into the token table T.]

While ::= "WHILE" Condition "DO" Statement "END".
Condition = Odd | Relation.
Statement = Block | If | While | Repeat | ...

Figure 12: Merging of syntax trees by resolving references from nonterminal symbols to Montages and from terminal symbols to the token table.
4.4.3 Internal Consistency
Internal consistency is concerned with the equivalence between the concrete and the abstract syntax tree of a Montage. The syntax tree generated from the EBNF production reflects the concrete syntax cst, whereas the tree structure of the control flow graph defines the abstract syntax ast for the same language component (Fig. 13). If the structure of cst is not equivalent to the structure of ast, then the parser will not be able to map the parsed tokens onto the given ast unambiguously and thus will stop the transformation process.
Every nonterminal symbol and repetition in cst must have an equivalent node in ast. This equivalence can either be defined manually or semi-automatically. It is not possible to identify equivalent nodes in both trees completely automatically; Fig. 14 shows why: nonterminal symbols may have the same name. Therefore it is impossible to automatically find equivalent nodes in the control flow graph. E.g. in Fig. 14: is the first occurrence of "Term" in the EBNF rule equivalent to the left or to the right "Term" node in the control flow graph? This example may seem obvious, but we will show that the answer to this question is part of the language specification itself and cannot be generated automatically.
Case ::= {"CASE" Expression "DO" [ StmtBlock ]} [ "DEFAULT" StmtBlock ] "ESAC".

[cst and ast shown as trees: the list LIST-1 contains Expression and the option OPT-2 with StmtBlock~1; the option OPT-3 contains StmtBlock~2. Equivalent nodes of cst and ast are connected.]

Figure 13: EBNF production and control flow graph with their respective tree representations shown below. The structure of these trees and the positions of nonterminals have to match.
Manual definition of equivalent nodes (e.g. by selecting both nodes and marking them as equivalent) is the most flexible solution to the problem of multiple occurrences of the same name. It allows defining arbitrary nodes as equivalent. Although it would not be wise to assign e.g. an EBNF nonterminal symbol "Term" to a control flow node "Factor", manual assignment would not prevent it. In addition to the production rule and the control flow graph, a table showing the relation between the two trees would be necessary.
Add ::= Term AddOp Term.
AddOp = "+" | "-".

Figure 14: Multiple occurrences of the same name for a nonterminal symbol.
As users normally will identify equivalent nodes by name, it is natural to define equivalence as equality of names. This equivalence could be found automatically, but as we indicated above, this is not possible for multiple occurrences of the same name.
The nonterminal symbols "Term" in the syntax rule can be distinguished unambiguously by their occurrence (first and second appearance in the text) because an EBNF rule is given in a sequential manner. The same does not apply to a control flow graph, although one could argue that the given control flow would sequentialise the nodes. Although this is true, such a definition may be too stringent. For our example in Fig. 14 this means that the evaluation order of the two terms is restricted to left-to-right. A right-to-left evaluation could not be specified!
In some cases, control flow graphs represent a partial order and thus no unambiguous order of nonterminal nodes can be given. Fig. 15 shows such a case. Inferring from the annotation of the left edge that the left "Statement" corresponds to the THEN-clause is dangerous, as it presumes some knowledge about the dynamic semantics that is not available in the syntax rule.
We propose a semi-automated approach to solve the problem of unambiguously identifying equivalent nodes of the concrete and abstract syntax trees. As
If ::= "IF" Expression "THEN" Statement
       "ELSE" Statement "END".

[Control flow graph: Expression branches, with edges annotated by Expression.result, to two "Statement" nodes.]

Figure 15: Unspecified evaluation order
mentioned above, the occurrences of nonterminals in the EBNF rule are sequentialised by their appearance in the rule. For each nonterminal node in the control flow graph we need to provide a number that indicates the appearance in the syntax rule. This number is 1 by default, which simplifies the obvious cases, e.g. in Fig. 13. The first and only appearance of "Condition" in the syntax rule is equivalent to the only "Condition" node in the control flow graph. If there is more than one nonterminal node in cfg with the same name, then these nodes have to be enumerated in an unambiguous way, e.g. by appending ~i where i indicates the i-th appearance of this nonterminal in the syntax rule. Fig. 16 illustrates an enumeration of the "Term" nonterminal nodes, such that the resulting evaluation order is right-to-left.
Add ::= Term AddOp Term.
AddOp = "+" | "-".

[Control flow graph: the control flow visits Term~2 before Term~1, i.e. the node evaluated first is the one enumerated as the second appearance in the rule.]

Figure 16: Specification of a right-to-left evaluation order using node enumeration
Repetitions are enumerated regardless of their kind (option, list or group). In the EBNF rule, only opening brackets are counted, in the order of their occurrence. Fig. 13 provides an overview of all these naming conventions.
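The ~i naming convention is easy to process mechanically. A small helper might split an enumerated node name into its base name and occurrence index; this is illustrative code, not part of the original Montage definition:

```java
// Splits an enumerated node name such as "Term~2" into its base name and
// occurrence index; a name without "~i" defaults to occurrence 1.
class NodeName {
    final String base;
    final int occurrence;

    NodeName(String raw) {
        int tilde = raw.lastIndexOf('~');
        if (tilde >= 0) {
            base = raw.substring(0, tilde);
            occurrence = Integer.parseInt(raw.substring(tilde + 1));
        } else {
            base = raw;
            occurrence = 1;   // the default simplifies the obvious cases
        }
    }
}
```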
We are now ready to specify what internal consistency is: a notion of equivalence between concrete and abstract syntax trees which can be summarized as follows:
Let Sc = (c1, c2, ..., cm) be a sequence of nodes of cst and Sa = (a1, a2, ..., an) a sequence of nodes of ast with m, n > 0, i.e. the sequences are not empty. Sc was generated by an inorder traversal of cst where all terminal symbols were ignored (i.e. skipped). Similarly, Sa was generated by an inorder traversal of ast where all action nodes were ignored³. Furthermore we have a function eqv: Sc → Sa that returns the equivalent control flow node for a given syntax tree node.
Thus, a concrete syntax tree cst is equivalent to an abstract syntax tree ast if:
1. |Sc| = |Sa|, i.e. the number of nodes produced by the traversals is the same.
2. ∀ i, j: i, j > 0 ∧ eqv(c_i) = a_j ⟹ i = j, i.e. equivalent nodes appear in the same order in both sequences.

³ Additionally, all subtrees of nonterminal nodes were skipped as well. Such subtrees reflect the tree structure of the Montage designated by the nonterminal node and thus are of external nature.
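The two conditions can be checked directly on the traversal sequences. A sketch, assuming the sequences and the eqv function have already been computed:

```java
import java.util.List;
import java.util.function.Function;

// Internal consistency as defined above: the traversal sequences must have
// the same length (condition 1), and eqv must map the i-th cst node to the
// i-th ast node (condition 2, order preservation).
class Consistency {
    static <C, A> boolean internallyConsistent(List<C> sc, List<A> sa, Function<C, A> eqv) {
        if (sc.size() != sa.size()) return false;           // condition 1
        for (int i = 0; i < sc.size(); i++) {
            if (!eqv.apply(sc.get(i)).equals(sa.get(i))) {  // condition 2
                return false;
            }
        }
        return true;
    }
}
```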
4.4.4 External Consistency
External consistency is concerned with the accessibility of (parts of) other Montages. We have seen in Fig. 12 that Montages will be connected to each other when building parse graphs. Furthermore, properties of Montages may contain references to properties (possibly of other Montages), and the same references may also occur in the rules associated with action nodes. In order to function properly, access to all referenced Montages or parts thereof (e.g. the root of the parse tree, properties) has to be guaranteed. In other words, the external consistency check has to assert that all referenced entities are available, i.e. accessible for read or write (or both) operations. There exist two different kinds of references to external entities in a Montage:
A. Textual references: If a nonterminal symbol is parsed, then its name has to designate a Montage registered with L. If no such Montage can be found, the specification is not complete and thus it will be impossible to continue the transformation process towards an interpreter for L.
Similar rules apply to Montage properties. References to other properties may appear in their initialisation rules. A dependency relation exists between Montages M1 and M2 if an initialisation rule in M1 contains a reference to a property of M2. Montages and their dependency relations span a graph that is illustrated in Fig. 20a on p. 61. It is also possible to check whether the referred properties are within the scope of the referring initialisation rule. The scope of an initialisation rule is the Montage it is declared in, and all Montages that are accessible via nonterminals from there. Let us illustrate this with a simple grammar for an assignment (each line corresponds to a Montage):
Asg ::= Ident "=" Expr.
Expr ::= Term { AddOp Term }.
Term ::= Factor { MultOp Factor }.
Factor = Ident | Number | "(" Expr ")".
Properties shall be defined as shown in Fig. 17. The initialisation rules implement type checking. An error will be issued on checking the third property, Expr.Error, as it tries to access the property Asg.TypeOK, which is out of scope. There is no nonterminal Asg in the Montage Expr, nor is it possible to construct a path from Expr to Asg by transitively accessing nonterminals. E.g. the following would be legal:
Expr.Error := return Term~1.Factor~1.Type;
Note that Factor is a synonym rule and therefore does not have any properties. Factor~1.Type actually accesses the Type property of the underlying Montage (after the parse tree was built, see next section). Hence, a further test
Asg.TypeOK := return Expr.Type == Ident.Type;

Expr.Type  := if (exists(Term~2)) {
                  if (Term~1.Type == Term~2.Type) {
                      return Term~1.Type;
                  } else {
                      return undef;
                  }
              } else {
                  return Term~1.Type;
              }

Expr.Error := return Asg.TypeOK;

Term.Type  := if (exists(Factor~2)) {
                  if (Factor~1.Type == Factor~2.Type) {
                      return Factor~1.Type;
                  } else {
                      return undef;
                  }
              } else {
                  return Factor~1.Type;
              }

Figure 17: Property declarations with initialisation rules
should check whether a property accessed in a synonym Montage is available in all alternatives of the production. Further processing of properties has to be done during static semantics analysis and is described in section 4.6.
B. Graphical references: Nonterminal nodes may contain further (nested) repetitions and nonterminals. These refer to repetitions and nonterminals in the Montage designated by the topmost nonterminal. These nested nodes serve only as source and target nodes for control flow edges. It is not allowed to add actions to (nested) nonterminal nodes. Each Montage encapsulates its internal structures such as properties and the control flow graph. Access is granted via well defined interfaces. If actions could be added from outside, this would violate encapsulation and destroy the modularity between Montages.
The external consistency check is completed successfully if the nesting structure of the subtree of a nonterminal node matches the designated Montage's ast. Equivalence between nested nodes and the ast of the designated Montage is defined analogously to the internal consistency described above.
4.5 Parsing
We are now ready for the dynamic semantics part of the language specification, i.e. to execute the specification in order to read and interpret a program. The next step in the transformation process is parsing (step 3 in Fig. 10).
The parsing phase is responsible for reading a program and converting it to a parse tree according to the given syntax rules from the Montages. Fig. 18 illustrates this process with an example of a simple language.
Before going into details about the conversion of a program into a parse tree, we have to select a suitable parsing strategy.
Grammar of L:
    Asg ::= Ident "=" Expr.
    Expr ::= Term { AddOp Term }.
    Term ::= Factor { MultOp Factor }.
    Factor = Ident | Number | "(" Expr ")".

Program P in L:
    d = c * (a + b)

[Parse tree of P: Asg with Ident d and an Expr; its Terms and Factors cover the identifiers c, a and b, with the parenthesised Expr nested below a Factor.]

Figure 18: Parsing transforms a program P of a language L into a parse tree
4.5.1 Predefined Parser
Parsing is a well understood process and easy to automate. This might be an explanation why the Montage approach lacks the possibility to specify parser actions or to get control during parsing in general. In the publications defining the Montage approach [e.g. AKP97, KP97b, Kut01], parse tree building is explained only as a mapping of concrete syntax (the program P) onto a parse tree. No concrete definitions of the parsing method can be found. Furthermore, no mechanism for intervention during parsing is foreseen in these
publications. From a user's point of view this omission can be seen as both a flaw in and a quality of Montages.
On the one hand, the experienced language designer will of course miss the tricks and techniques that allowed him to specify "irregular" language constructs elegantly and compactly. Normally, these are context sensitive parts of a grammar where additional context information (such as type information) is necessary to parse them unambiguously. As there is no way to specify in Montages how to resolve ambiguities, the language designer is forced either to rewrite the syntax rules or to rely on the standard resolving algorithms offered by the underlying parser (if they are known at all). We will give some examples below.
On the other hand, the lack of being able to specify irregularities can be seen as a construction aid for the language designer. E.g. Appel advises that conflicts "should not be resolved by fiddling with the parser" [App97]. The occurrence of an ambiguity in a grammar is normally a symptom of an ill-specified grammar. Having to resolve it by rewriting the grammar rules is definitely an advantage for the inexperienced language designer, as it forces him to stick to a properly defined context-free grammar.
The question is: should there be a possibility to control the parser in Montages? We decided against it for two reasons:
1. MCS shall stay as close as possible to the original Montages. Even without sophisticated parsing techniques, full-fledged languages such as Oberon [KP97a] or Java [Wal98] could be specified using Montages.
2. With regard to modularity and reuse of specifications, the Montage approach is in a dilemma: both the possibility to specify parse actions and the rewriting of syntax rules have their disadvantages.
If parse actions were allowed, they would only apply to a specific parser model (see below). One would have to stick to a certain parser (e.g. a LALR parser) to enable reuse. In particular this would be the case if the parser were specified completely by the Montages (as is done with the static and dynamic semantics).
If no parse actions are allowed, the language designer is forced to rewrite the syntax rules in order to express them in a context-free manner. In extreme cases this might result in not being able to reuse a Montage as it is, because it leads to an ambiguity in the grammar.
In general, we think that the advantages of a predefined parser will outweigh the complexity one would have to deal with if self-defined parse actions were allowed.
The following discussion will analyse the two basic parsing strategies with regard to Montages: bottom-up or shift-reduce parsing, such as LALR parsers, and top-down or predictive parsing, such as recursive descent parsers. For in-depth introductions into these parsing techniques, we refer to [ASU86, App97]. Both parsing approaches are applicable to Montages, as shown by Anlauff's GEM/MEX (LALR parsing using yacc [Joh75] as a parser generator) and our MCS (predictive parsing).
The choice of the parsing technique determines what classes of grammars can be processed. Both parsing techniques have their pros and cons with regard to ease of use, efficiency and parser generation.
4.5.2 Bottom-Up Parsing
The bottom-up approach reads tokens from the scanner (the so-called shift operation) until it finds a production whose right-hand side (rhs) matches the tokens read. Then these tokens will be replaced by the left-hand side (lhs) of the production (which is called a reduce operation). To be precise, the matching tokens get a common parent node in the parse tree. The tree therefore grows from its leaves towards its root, which corresponds to a growth from bottom to top when considering the usual layout of trees in computer science (root at top). During the construction of a parse tree, two conflicts may occur:
Reduce-reduce conflict: The parser cannot decide which production to choose in a reduce operation. This will be the case if several Montages have the same rhs in their syntax rule. One reason for this is that during registration the equality of the two rhs was not noticed, a common mistake if complete sublanguages are registered. An example of such a sublanguage was shown in Fig. 18. If Asg is imported as a self-contained sublanguage, then the Montages Expr, Term and Factor will be imported as well. If there is already a Montage Expression registered that contains the same rhs as Expr, there will be reduce-reduce conflicts during parsing. In this case, we are grateful for this conflict, as the related warning will draw our attention to this overspecification of the language.
Reduce-reduce conflicts do not only indicate overspecifications, but also pinpoint context sensitive parts of the grammar. A typical example is the following portion of a FORTRAN-like grammar. Note that each line corresponds to a Montage:
Stmt      = ProcCall | Asgn.
ProcCall ::= Ident "(" ParamList ")".
Expr     ::= Ident [ "(" ExprList ")" ].
ParamList ::= Ident {"," Ident}.
ExprList ::= Ident {"," Ident}.
Unfortunately, the grammar is ambiguous, as the line
A(I, J)
can be interpreted as a call to A with the parameters I and J or as an access to array A at location (I, J). This grammar is of course not context free, i.e. only by regarding the type declaration of A can it be decided which production to apply. In this case, the reduce-reduce conflict indicates a clumsy language design.
The deployment of a standard parser generator such as yacc [Joh75] or CUP [Hud96] might be dangerous, as they implement a (too) simple resolving strategy for reduce-reduce conflicts: the first rule in the syntax specification is chosen. Montages cannot be enumerated and thus no order of input can be guaranteed that will be obeyed during parser generation. Furthermore, the second rule (Montage) will fall into oblivion as it will never be chosen. This is an unsolved problem in GEM/MEX, which delegates parsing to a yacc-generated parser.
Shift-reduce conflict: The second kind of conflict in shift-reduce parsers occurs when it is undecidable whether to perform a shift operation (read more tokens from the scanner) or a reduce operation (build a new node in the parse tree). The well known dangling else, as in the programming languages Pascal or C, is a good example to demonstrate a shift-reduce conflict:
If ::= "if" Expression "then" Stmt [ "else" Stmt ].
The following program fragment is ambiguous:
if a then if b then s1 else s2
It can be interpreted in two different ways:
(1) if a then {if b then s1 else s2}
(2) if a then {if b then s1} else s2
Shift-reduce parsers will detect the conflict. Suppose the program is read up to s1. Now, without further information, the parser cannot decide whether to reduce (interpretation 2) or to continue reading until s2 (interpretation 1). In Pascal and C, an else has to match the most recent possible then, so interpretation (1) is correct. By default, yacc or CUP resolve shift-reduce conflicts by shifting, which produces the desired result in the dangling-else problem of C or Pascal.
4.5.3 Top-Down Parsing
The second method to parse a program text and to build a parse tree has its pros and cons with respect to Montages, too. Top-down parsers try to build the parse tree from the root towards the leaves. The parser is structured into several procedures, each of which is capable of recognizing exactly one production rule. Each of these procedures reads tokens from the scanner and decides upon their type how to continue parsing. A terminal symbol is simply compared to the expected input; lists and options will be recognized in the bodies of while loops or conditional statements. But the most interesting case is the recognition of nonterminal symbols: it will be delegated by calling the corresponding procedure in the parser. As the recognizing procedures can be called recursively (compare with the parse graph constructed in the integration phase, Fig. 12, p. 46) and because the syntax rules will be called from top to bottom⁴, such a parser is called a recursive-descent parser. As with the bottom-up parsers, we have to mention two problems that top-down parsers impose on the Montage approach:
Left-Recursiveness: A grammar which is to be recognized by a top-down parser must not be left-recursive. We will illustrate this with the following grammar:
ProcCall ::= Ident "(" ParamList ")".
ParamList ::= { Ident "," } Ident.
If the parser encounters a procedure call such as
p(i) or r(i,j,k)
then it will not be able to recognize its parameter list. The parser calls a recognizing procedure ParamList that will try to read all Idents and the succeeding "," within a while loop. The problem is that the parser cannot predict whether it should enter this loop at all, and if so, when it has to exit the loop, because the first token in the repetition is the same as the one following it.
Lists and options have to be used carefully if they occur at the beginning of a
production rule. Fortunately, every left-recursive grammar can be rewritten to
be right-recursive [ASU86]. For our above example this would look like:
ProcCall ::= Ident "(" ParamList ")".
ParamList ::= Ident { "," Ident }.
As demonstrated here, rewriting (or left factoring, as this method is called) can
often be done within the rule itself; no Montage other than ParamList is
affected. The ban on left-recursive productions can be a nuisance if Montages
4 The starting production is considered to be the topmost production. Then all nonterminals
appearing within this production are listed with their respective syntax rules below, and so on.
Hereby an order is generated that sorts productions from the most general one (starting production) down to the most specialised ones (tokens).
are imported that were developed in a system with a bottom-up parser, where
this restriction does not apply.
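The recognition of the rewritten, right-recursive ParamList rule can be sketched as a recursive-descent procedure. This is an illustrative sketch, not MCS code; the token handling (a plain list of strings, a regular expression for identifiers) is an assumption made for the example:

```java
import java.util.List;

// Recursive-descent recognizer for ParamList ::= Ident { "," Ident }.
// Tokens are simplified to strings; an identifier is any token that
// matches a letter followed by letters or digits.
class ParamListParser {
    private final List<String> tokens;
    private int pos = 0;

    ParamListParser(List<String> tokens) { this.tokens = tokens; }

    private String peek() { return pos < tokens.size() ? tokens.get(pos) : null; }

    private boolean isIdent(String t) {
        return t != null && t.matches("[A-Za-z][A-Za-z0-9]*");
    }

    // The loop condition mirrors the EBNF repetition: after the mandatory
    // first Ident, continue exactly as long as a "," follows.
    boolean parseParamList() {
        if (!isIdent(peek())) return false;
        pos++;
        while (",".equals(peek())) {
            pos++;
            if (!isIdent(peek())) return false;
            pos++;
        }
        return true;
    }
}
```

Because the repetition now trails the mandatory Ident, the procedure always knows whether to enter and when to leave the loop by peeking one token ahead.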
Lookahead: A second problem with top-down parsers is that they cannot
always decide which production to choose next in order to parse the input. The
following fragment from the Modula-2 syntax [Wir82] shall serve as an example:
statement ::= [ assignment | ProcCall | ... ].
ProcCall ::= designator [ ActualParams ].
assignment ::= designator ":=" expression.
ActualParams ::= "(" [ExpList] ")".
Consider this program fragment as input:
a := a + 1
When the parser starts reading this line, it is expecting a statement. The next
token is a designator a, which could be the beginning of the productions ProcCall and assignment. Which production should the parser choose now?
There are two ways to answer this question: either it tries to call all possible
productions in turn5, or it pre-reads the following token and gets ":=", which
allows it to identify assignment as the next production. A parser that tries all
possibilities is called a backtracking parser; pre-reading tokens is called lookahead, and it avoids time-consuming backtracking.
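The one-token lookahead described above can be sketched as follows; StatementChooser and its token-array input are hypothetical names used only for this illustration:

```java
// Deciding between the Modula-2 productions assignment and ProcCall with
// one token of lookahead. After the designator has been recognized, peeking
// at the next token suffices: ":=" announces an assignment, anything else
// (an opening parenthesis, a statement separator, ...) a procedure call.
class StatementChooser {
    static String choose(String[] tokens) {
        // tokens[0] is assumed to be the designator already recognized
        String lookahead = tokens.length > 1 ? tokens[1] : null;
        return ":=".equals(lookahead) ? "assignment" : "ProcCall";
    }
}
```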
4.5.4 Parsing in MCS
Considering our self-imposed preconditions - like comprehensibility of the system and its processes, composability and compactness of a language - and the
open specification of Montages with regard to parsing, a top-down parser
seems more suitable than a bottom-up parser.
Top-Down Parsing: The algorithms for top-down parsing are easier to understand than those for bottom-up parsing. Shift-reduce parsers are monolithic
finite state machines, usually implemented with a big parse table that steers
the recognition of token patterns. As the construction of such parse tables is
too much work to do by hand, the user has to rely on algorithms that are difficult to comprehend. Error detection and error recovery are also more complex to
implement in bottom-up parsers.
5 E.g. trying the first candidate production, ProcCall: the next token must be an opening parenthesis "(" which would fit the ActualParams production. As there is no "(", the parser has to step back and try the next candidate production, assignment, where it is successful.

Top-down parsers, however, are subdivided into procedures, each of which
can recognize exactly one syntax rule. Note that these procedures form a vertical partitioning of the parser. Hence, the structure of top-down parsers is very
similar to MCS. Each Montage can implement a service that is able to exactly
recognize its own syntax rule. If efficiency is important, then lookaheads have
to be determined. This can be done automatically by analysing so-called FIRST
sets [ASU86, Wir86].
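A minimal sketch of such a FIRST-set computation, under the simplifying assumption that the grammar contains no empty (nullable) productions; the grammar representation (a map from nonterminals to lists of alternatives) is an assumption of this example, not the MCS data structure:

```java
import java.util.*;

// FIRST-set computation for a grammar without nullable productions.
// Nonterminals map to alternatives, each a sequence of symbols; a symbol
// is treated as a terminal iff it is not a key of the grammar map.
class FirstSets {
    static Map<String, Set<String>> compute(Map<String, List<List<String>>> g) {
        Map<String, Set<String>> first = new HashMap<>();
        for (String nt : g.keySet()) first.put(nt, new HashSet<>());
        boolean changed = true;
        while (changed) {                        // iterate to a fixpoint
            changed = false;
            for (Map.Entry<String, List<List<String>>> e : g.entrySet()) {
                for (List<String> alt : e.getValue()) {
                    String head = alt.get(0);    // no empty rules: first symbol decides
                    Set<String> add = g.containsKey(head)
                            ? first.get(head)    // nonterminal: copy its FIRST set
                            : Set.of(head);      // terminal: itself
                    if (first.get(e.getKey()).addAll(add)) changed = true;
                }
            }
        }
        return first;
    }
}
```

The fixpoint iteration repeats until no FIRST set grows any more, which is what makes mutually referring rules unproblematic here.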
From the point of view of an MCS user, a top-down parser has its pros and
cons, as indicated in the discussion above. The most important rule - no left-recursive grammars - is not as limiting as it may seem at first glance. Each left-recursive grammar can be transformed into a right-recursive one, and in many
cases this is possible by just rewriting a single syntax rule. The parsing algorithm is simple and corresponds to the way a human reads a grammar.
If efficiency of parsing is not a major goal, then a backtracking parser even
allows for parsing of ambiguous grammars. The parser could be implemented
to ask for user assistance in the case of several legal interpretations of the input
program. User-assisted parsing could be very useful in education, e.g. to demonstrate ambiguous grammars and their consequences for parsers and programmers (all variants of different parse trees can be tested).
Parse Graph: In order to parse a program, MCS uses the parse graph constructed in the integration phase (see Fig. 12). Each node (read: Montage) in
this graph has a method that can be called to parse its own syntax rule. These
methods either return a parse tree (the subtree corresponding to the parsed
input) or an error (in the case of a syntax error in the program). The scanner
provides a token stream from which Montages can read tokens one by one.
Parsing begins at the parse method of the starting Montage. Note that in the
parse graph, each Montage occurs exactly once. As in every recursive-descent
parser, control is transferred to the next parse method as soon as a nonterminal
token is read. Then the Montage corresponding to this nonterminal will take
over. When the construct has been parsed successfully, the recognized subtree is
returned to the caller. Parse graph and external consistency guarantee that all
necessary Montages will be found during parsing. The parse tree returned to
the caller is basically an unrolled copy of (parts of) the parse graph. Its nodes
are instances of Montages that represent their textual counterparts in the tree.
We refer to these Montages as Instance Montages.
Instance Montages: An Instance Montage (IM) is a representative of a language
construct in a program. Template Montages (Montages as we described them
until now) serve as templates for IMs. They define the attributes of an instance
at runtime (i.e. the dynamic semantics of a language specification), and can be
implemented in two ways:
Figure 19: Parse graph to control parsing and resulting parse tree.
1. As copies of the template Montages. They will be created by cloning the
template. In this case, they feature all characteristics of the template Montages, only that some of them will never be used, e.g. generating a parser,
checking internal and external consistency, or the ability to parse a program.
2. As instances of new classes. The characteristics of such new classes are
defined by the template Montages. They have the advantage that only the
dynamic semantics of the specifications has to be present.
Fig. 19 illustrates the relations between Template Montages and Instance Montages.
Static semantics and dynamic semantics will be processed on IMs only. Additional characteristics of IMs concerning their implementation and deployment are explained in section 5.3, and section 5.2.3 provides a more detailed
insight into the implementation of the parser in MCS.
4.6 Static Semantics Analysis
4.6.1 Topological Sort of Property Dependencies
In order to initialize all properties, we could simply fire all their initialisation
rules simultaneously6. This will result in some rules being blocked until the
properties they depend on become available. Other properties can be computed immediately. Fig. 20 illustrates initialization by means of three Montages
and their properties. Some rules depend on the results of others (e.g. M1.A)
whereas some rules can fire immediately (in our case M2.C).
Before initialization starts, all properties are undefined, marked by the distinct value undef. Static semantics analysis is completed when all properties are
defined, i.e. ∀p ∈ P : p ≠ undef. A simultaneous firing of rules could end in a
deadlock situation if initialisation rules mutually refer to each other. To avoid
this situation, i.e. a system looping infinitely, it is advisable to check for circular
dependencies before executing initialisation rules. This can be done by interpreting the properties and their references as a directed graph (digraph) G = (P,
R) that is defined by a set of vertices P (all properties of all Montages of a language L) and a set of directed edges R (all references contained in these properties).
Let P be the set of all properties of a language L and let R be the set of all
references between the properties of P. We define a reference r = (s, t) as an
ordered pair of properties s ∈ P_source and t ∈ P_target, with P_source and P_target being
the set of reference sources and targets respectively.
We have to assert that G is a directed acyclic graph (dag)7. Fig. 20b shows
such a graph, where we inverted the direction of all references in order to get a
data flow graph. In our example, M2.C is the only rule that can fire initially. Its
result triggers the computation of M1.A and M3.A etc.
Fortunately, there exists an algorithm that suits our needs very well, i.e. topological sorting:
6 In fact, we are describing our system based on a sequential execution model; "firing all rules in a
random order" would be more precise here.
7 Formally: Let path(a, b) be a sequence of properties p1, p2, ..., pn, such that (p1, p2), (p2, p3), ...,
(pn−1, pn) ∈ R. The length of a path is the number of references on the path. A path is simple if all
vertices on the path, except possibly the first and last, are distinct. A simple cycle is a simple path of
length at least one that begins and ends at the same vertex. In a directed acyclic graph, the following holds true: 1. ¬∃r. ((r, r) ∈ R)  2. ∀r, s, t. ((r, s) ∈ R ∧ (s, t) ∈ R ⇒ (r, t) ∈ R)
4.6 Static Semantics Analysis 61
[figure: three Montages with their initialisation rules (M1: A = B + 2*M2.C, B = M3.A; M2: C = 42; M3: A = M2.C + 3) and the resulting data flow among their properties]
Figure 20: Relations between Montages and Properties.
a. dependencies between Montages imposed by initialisation rules
b. data flow during initialisation of Properties
1. It checks for cycles in a graph, and
2. if no cycles are detected, it returns an order in which initialisation rules
can be fired without a single rule being blocked because of missing results.
If cycles are found, then static semantics cannot be executed. The initialisation
rules of the properties participating in the cycle have to be rewritten; it would
therefore be helpful if a failed topological sort returned the offending reference.
Successful execution of all initialisation rules does not imply successful completion of static semantics analysis: the initialisation rules may explicitly set a
property to undef (see Fig. 17 and Fig. 21). The original Montage definition
features a condition part (see example in Fig. 7, p. 35) which contains a
boolean expression that has to evaluate to true. If this condition cannot be
established, then program transformation is stopped.
MCS does not contain such a condition part because the same result can be
obtained with a property. The condition shown in Fig. 7 can be expressed with
an initialisation rule in MCS as given in Fig. 21.
It is possible to assign to a property the distinct value undef. According to
our definition for completion of static semantics, ∀p ∈ P : p ≠ undef, one single
property being undefined will suffice to stop the transformation process.
Hence, after the topological sorting and execution of all initialisation rules, it is
if (ConditionalOrOption.staticType
    instanceof java.lang.Boolean) {
  return new Boolean(true);
} else {
  return undef;
}
Figure 21: Initialising a property to undef
important to test whether all properties were set. In section 5.3.9 we will
present an algorithm that can perform static semantics analysis in O(|P| + |R|),
where again |P| denotes the number of all properties of all Montages and |R|
the number of all references between them. In other words, if cleverly programmed, static semantics analysis can be embedded in a topological sort.
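The combination of cycle check and firing order can be sketched with Kahn's algorithm for topological sorting, which runs within the stated O(|P| + |R|) bound; PropertyOrder and its dependency map are illustrative names, not the MCS implementation:

```java
import java.util.*;

// Kahn's algorithm over the property dependency graph. Properties are node
// ids; deps maps each property to the properties its initialisation rule
// reads. Runs in O(|P| + |R|).
class PropertyOrder {
    // Returns a firing order, or null if a circular dependency exists.
    static List<String> order(Map<String, List<String>> deps) {
        Map<String, Integer> indeg = new HashMap<>();
        Map<String, List<String>> users = new HashMap<>();
        for (String p : deps.keySet()) { indeg.put(p, 0); users.put(p, new ArrayList<>()); }
        for (Map.Entry<String, List<String>> e : deps.entrySet())
            for (String d : e.getValue()) {
                indeg.merge(e.getKey(), 1, Integer::sum);
                users.get(d).add(e.getKey());   // d's value flows to e's rule
            }
        Deque<String> ready = new ArrayDeque<>();
        for (Map.Entry<String, Integer> e : indeg.entrySet())
            if (e.getValue() == 0) ready.add(e.getKey());
        List<String> result = new ArrayList<>();
        while (!ready.isEmpty()) {
            String p = ready.remove();
            result.add(p);                      // p can fire now
            for (String u : users.get(p))
                if (indeg.merge(u, -1, Integer::sum) == 0) ready.add(u);
        }
        return result.size() == deps.size() ? result : null;  // null: cycle
    }
}
```

On the dependencies of Fig. 20 this yields an order in which M2.C fires first and M1.A last; a mutual reference between two properties makes the result fall short of |P| and is reported as a cycle.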
Related Work: Topological sorting of attribute grammar actions has been
described by Marti and Hedin [Mar94, Hed99].
In the GIPSY project presented by Marti, a DSL, GIPSY/L, is used to
describe relations between different documents, processes, and resources in
software development processes. GIPSY/L can be extended by users in order to
adapt the system to expanding needs. An extensible attribute grammar [MM92]
allows to specify actions that control these processes. The order in which such
actions are executed is determined by a topological sort along their dependencies.
Hedin describes reference attribute grammars that do not have to transport
information along the abstract syntax tree, but use references between the
attributes in order to access remote data more efficiently. A topological sort has
the same function as in our system: checking for cycles and determining an
order for execution.
4.6.2 Predefined Properties
Some properties are predefined, i.e. they are available in every Montage.
Their number is small in order to keep the system simple. Conceptually,
predefined properties are initialized already in the parser phase.
Terminal Synonym Properties: Terminal synonym productions, as e.g. AddOp in
Fig. 14, p. 48, generate a property of the same name (AddOp in this case). This
property is of type String and contains the actual string that was recognized
by the parser during the parsing phase. In the example of the Add Montage in
Fig. 14, this would be either "+" or "-". Terminal synonym properties are read-only properties initialised by the parser during AST building.
Parent Property: Each Montage implicitly contains a property parent which
is a reference to the parent Montage in the AST. This reference will also be set
by the parser during AST building and is read-only too. The parent property
allows navigation towards the root of the AST, whereas nonterminals allow
navigation towards its leaves.
Symbol Table Property: The last of the predefined properties is a reference to a
symbol table, SymTab (see below). Again, this property is read-only but its initialisation can be user-defined. There is a default behaviour which copies the
reference to the symbol table from the parent Montage in the AST. Nevertheless, for specific cases (e.g. when a new variable is defined) declarations can be
cached in the symbol table in the initialisation rule of the property.
4.6.3 Symbol Table
The symbol table plays an important role during the static semantics phase.
Basically, it is a cache memory for retrieving declarations. Although it would be
possible to use property initialisation to remember declarations in a subtree, it
would be a tremendous overhead (and an error-prone approach) to hand these
references up and down the tree during static semantics evaluation [Hed99].
The advantages and the use of symbol tables are best explained with an example:
Variable Declaration and Use: Let us have a closer look at variable declarations
and variable access in a program. We will give a (partial) specification of a simple language that allows to declare variables in nested scopes. To simplify the
example, variables have an implicit type (Integer) and there is only one statement that allows to print the contents of a variable.
Given the following specifications:
Prog ::= Block.
Block ::= "{" {Decl} {Stmt} "}".
Decl ::= Ident ["=" Expr].
Stmt = Print | Block.
Print ::= "print" Var.
Var ::= Ident.
The Montages which are of interest here - Decl, Block, Var and Print - are
given in Figures 23 through 26, respectively. Consider the following program:
{
    int i = 2;
    {
        int i = 5;
        print i;
    }
}
[figure: one single symbol table (key/value entries) beside the AST of the program; each node (Prog, Block, Decl, Print, Var) holds a SymTab reference, and views a, b, c show the table as seen by different nodes]
Figure 22: Symbol table and abstract syntax tree
In this example we have two variable declarations which occur in nested scopes.
Both variables have the same name, i, but they have different values. When the
print statement prints the value of i to the console, it will only see the inner
variable declaration, as scoping rules shadow the outer declaration. Thus, the
output of this program will be: 5
Fig. 22 shows the AST of the program above. First we want to focus on
node 7, a use of variable i. In order to provide access to the memory where the
value of i is stored, the Montage Var has to get the reference from the declaration (node 5). This non-local dependency between node 7 and node 5 can conveniently be bridged by an entry in the symbol table. Whenever a variable is
declared, it is added to the symbol table with its name as the key for retrieval.
Later in static semantics processing, this variable will be used and its declaration (containing the reference to its place in memory) can be retrieved by querying the symbol table.
The symbol table is a component that exists independently of Montages and
the AST. Its life-cycle is restricted to the static semantics analysis as it will not
be used any more after all references are resolved. As mentioned above, every
Montage has a predefined property SymTab that refers to the symbol table. But
initialisation of this property cannot be done statically by the parser (as e.g. for
the parent property). The reason for this is the ambiguous meaning of undef as
a result of a query to the symbol table.
Suppose Montage Var (node 7) is querying for the name i in the symbol
table. As a result it gets undef. This could mean two things:
1. There was no declaration of a variable i
2. The initialisation rules of node 5 did not yet fire. They might fire in the
future, but then it is too late for node 7.
At least, this scenario would stop and report an error. But suppose the outer
declaration (node 3) fired before node 7. Then querying the symbol table
would result in retrieving the outer declaration instead of the inner one. The
program transformation would continue and generate faulty code.
Therefore we have to impose an order on the initialisation. We can do this
by generating dependencies among the nodes. As the symbol table has to be
initialised as well, we can use the initialisation of the SymTab reference to gen¬
erate a correct initialisation order.
The symbol table will not change its contents at every node in the AST. So it
makes sense to define as a default behaviour to copy the reference from the
parent node:
SymTab : return parent.SymTab;
But this behaviour can be overridden by providing a different initialisation
rule. For example:
SymTab : SymbolTable st = parent.SymTab;
         st.add(Ident.Name, this);
         return st;
A new entry will be added to the symbol table. It is a reference to the current
Montage8 and it can be retrieved with the given key (Ident.Name).
Note that the symbol table has to be implemented such that it can cope with
multiple entries of the same name in different scopes. In our example, this
means that the symbol table has to distinguish between the different entries for
i and furthermore it has to offer a different view for different nodes. In Fig. 22
8 Denoted by this, the Java reference to the current object.
Decl ::= Ident ["=" Expr].
[control flow graph: init action node, Ident, optional OptInit part ("=" Expr)]
Prop   Type     Initialisation
name   String   return Ident.value;
value  Integer  return new Integer(); // dummy value
Action
@init:
if (OptInit.exists) value = Expr.value;
Figure 23: Decl Montage, variable declaration
there is only one single symbol table. To node 1 it is empty (a), nodes 2 and 3
see the declaration of the outer i (b), and the rest of the nodes will see the symbol table as it is displayed at the bottom (c). There are different implementations possible that will meet all the requirements (see section 5.3.10).
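One possible implementation meeting these requirements (a sketch, not necessarily the one chosen in MCS, cf. section 5.3.10) chains one table per scope: each Block creates a child table whose lookups fall back to the parent, so that a node's SymTab reference is exactly its view of the single logical table:

```java
import java.util.HashMap;
import java.util.Map;

// Chained-scope symbol table: an inner declaration of i shadows the outer
// one, and every AST node simply keeps a reference to the table valid for
// its scope. Here, null plays the role of undef.
class SymbolTable {
    private final SymbolTable parent;
    private final Map<String, Object> entries = new HashMap<>();

    SymbolTable(SymbolTable parent) { this.parent = parent; }

    void add(String name, Object decl) { entries.put(name, decl); }

    // Innermost declaration wins; fall back to the enclosing scope.
    Object lookup(String name) {
        Object d = entries.get(name);
        if (d != null) return d;
        return parent != null ? parent.lookup(name) : null;
    }
}
```

With this layout, the two declarations of i from the example live in different tables, and node 7 naturally retrieves the inner one.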
Initialisation of the Instance Montages in the AST of Fig. 22 will happen
according to the initialisation rules of the following four Montages.
The Decl Montage specifies the actual declaration of a variable. For convenience purposes, the property name is introduced. It is initialised by retrieving
the value of the token representing the identifier. The property value is the
most important property in a declaration, as it holds the value of the variable
during runtime. References to this property have to be established wherever the
variable is accessed. Initially, this property is set to some dummy value, as there
is no static evaluation of the initializing expression in this example. At runtime,
the variable's content has to be initialised to the value of the expression
(if present). Nothing has to be done in the absence of the initializer, because the
dummy value was already set during static semantics evaluation.
A Block is the syntactic entity of a scope. Variables declared within a scope
must not have the same names; this condition will be asserted9 by the unique
property. The symbol table valid for this scope is built during initialisation of
the predefined SymTab property. First, the reference to the symbol table is
9 A Java set data structure is filled with all the names of the declared variables. The add operation
returns true if a name is new to the set.
Block ::= "{" {Decl} {Stmt} "}".
[control flow graph: DeclList (Decl) followed by StmtList (Stmt)]
Prop    Type     Initialisation
unique  Boolean  Set set = new HashSet();
                 foreach decl in DeclList
                   if (!set.add(decl.name)) return undef;
                 return new Boolean(true);
SymTab  Object   SymbolTable st = parent.SymTab;
                 foreach decl in DeclList
                   st.add(decl.name, decl);
                 return st;
Figure 24: Block Montage, container of a scope
retrieved from the parent node, then all declarations are added with their
names as keys.
The Var Montage shows the use of a variable. Note that this Montage only
specifies static semantics, as there are no actions to perform at runtime.
Read and write accesses to the value property of Var will be specified by the
appropriate Montages, e.g. the Print Montage below. It is important that all
variables are declared prior to their use, which is checked with the isDeclared
property. SymTab denotes the predefined reference to the symbol table. As its
initialisation is not overridden, it will be the same as in its parent Montage.
Var ::= Ident.
Prop        Type     Initialisation
isDeclared  Boolean  return SymTab(Ident.value);
value       Integer  if (isDeclared) return SymTab(Ident.value).value;
                     else return undef;
Figure 25: Var Montage, use of a variable
The Print Montage finally shows how to access a variable's value at runtime.
Print is somehow an opposite to the Var Montage, as it does not specify any
Print ::= Var.
Action
@print:
System.out.println(Var.value);
Figure 26: Print Montage, prints the contents of a variable to the standard output stream.
static semantics but only runtime behaviour. The action rule accesses the value
of the variable directly via its reference (the value property).
4.7 Control Flow Composition
4.7.1 Connecting Nodes
In the last phase of the transformation process, the control flow of a program
will be assembled from the control flow graphs of the Instance Montages of the
AST. We will explain control flow composition by means of an example. Given
the CASE statement of Fig. 13, p. 47 and the following code fragment:
CASE a < 10 DO Stmt1
CASE a >= 10 && a <= 20 DO // nothing
CASE a > 20 DO Stmt2
ESAC
Then, the parser will build the AST given in Fig. 27. The nodes in the lower
levels of the AST in Fig. 27 display the program text they represent for convenience. The parser can already do a considerable amount of work concerning the
"wiring": it simply copies the control flow graph in a Montage with all its control flow edges whenever an appropriate construct is encountered in the program text.
Fig. 27a shows all the connections between the nodes of the subtree of the
Instance Montage Case after parsing but before control flow composition.
Nonterminal nodes are placeholders for the entire control flow graph of
their designated Montage. At control flow composition, the nonterminal node
will be replaced by the entire control flow graph of the designated Montage. All
incoming edges of the nonterminal will be deviated to the initial node of the
[figure: a) structure generated by parser — the Case node with its LIST-1 and OPT nodes, the three CASE-parts (a < 10, 10<a<20, a > 20) and the statement blocks StmtBlk1 and StmtBlk2, still connected via nonterminal placeholders; b) after control flow composition — the same nodes wired directly by control flow edges]
Figure 27: AST built by parser
replaced graph and all outgoing edges will leave from the terminal node of the
Montage. Kutter illustrates this replacement excellently in [Kut01, chapter 3].
Repetition nodes indicate that their contents (their subtrees) may occur several times in the program. The number of occurrences must be in a certain
range which is part of a repetition node definition. E.g. the definition of an
option allows a minimum of zero and a maximum of one occurrence of its contents. The parser will check whether the number of actual instances is within
the given range and in addition it will build a subtree for each of these
instances. This is illustrated in Fig. 27 where all three occurrences of CASE-part
are attached to the LIST-1 node.
The contents of a repetition node specifies what these subtrees look like.
These subtrees or subgraphs (as they also are a partial control flow) are put
together by connecting the terminal node of the nth instance with the initial
node of the (n+1)th instance. All incoming edges of the repetition node will be
deviated to the initial node of the first instance and analogously all outgoing
edges of the list node will leave from the terminal node of the last instance
(illustrated in Fig. 27b).
Note that the repetition nodes are present regardless of whether there is an
actual occurrence in the program or not. The second CASE-part and the
optional DEFAULT-part are missing in our sample code. The parser will create
the nodes while copying the control flow graph. They serve the parser as a stub
where it can plug in any actual instances appearing in the program. The above-mentioned stubs remain empty if there is no corresponding code available.
Empty stubs cannot be removed, as they still serve a purpose: they can be used
to query whether their contents was present in the program. We did this e.g. in
the Decl Montage (Fig. 23, p. 66) in the action of node init.
After all nonterminals were replaced by their control flow graphs, we get a
network of action nodes. We use the term network here instead of graph
because the nodes and edges resemble an active communication network with
action nodes as routers (routing the control flow) with computing abilities and
edges as communication lines.
4.7.2 Execution
After the transformation process is completed, execution of the program is
almost trivial. The network of action nodes can be executed by starting at the
initial edge of the starting Montage. It refers to some node which will get the
execution focus, i.e. its rules are executed. Then the conditions of all outgoing
edges will be evaluated. If there is none that evaluates to true, then the system
stops; if there is more than one ready to fire, then the system stops too,
because an ambiguous control flow was detected10. In the "normal" case of
only one edge being ready to fire, control will be transferred to its target node.
The system runs in this manner as long as there are control flow edges ready to
fire.
10 A non-deterministic behaviour of the action network could also be implemented, though parallel execution semantics was not the focus of our research.
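The execution scheme just described can be sketched as follows; Node, Edge and run are illustrative names, not the MCS classes, and the check for exactly one ready edge mirrors the rule stated above:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BooleanSupplier;

// Execution loop over the action network: fire the focused node's action,
// evaluate all outgoing edge conditions, and transfer control only if
// exactly one edge is ready to fire.
class ActionNetwork {
    static class Edge {
        final BooleanSupplier condition; final Node target;
        Edge(BooleanSupplier c, Node t) { condition = c; target = t; }
    }
    static class Node {
        final Runnable action; final List<Edge> out = new ArrayList<>();
        Node(Runnable a) { action = a; }
    }

    // Stops on no ready edge (normal termination) or on more than one
    // ready edge (ambiguous control flow).
    static void run(Node start) {
        Node focus = start;
        while (focus != null) {
            focus.action.run();
            Node next = null;
            int ready = 0;
            for (Edge e : focus.out)
                if (e.condition.getAsBoolean()) { ready++; next = e.target; }
            focus = (ready == 1) ? next : null;
        }
    }
}
```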
Chapter 5
Implementation
In this chapter, the implementation of the Montage Component System
(MCS) is discussed. The system allows to compose several Montage specifications to a language specification that can be executed, i.e. an interpreter for the
specified language is generated.
We begin this chapter with a discussion of what a language is in MCS
(section 5.1). Section 5.2 will explain syntax processing and parsing, and
section 5.3 covers the static semantics analysis and control flow composition.
Notice that the given code samples are not always exact copies from the
actual implementation. The actual code has to deal with visibility rules and type
checking, and thus is usually strewn with casts and additional method calls to
access private data. As we try to focus on the basic aspects, we do not want to
confuse the reader with too many details. The code is trimmed for legibility
and simplified, e.g. getter and setter methods became attributes, or class casts
were omitted.
The architecture of MCS follows the Model/View/Controller (MVC) paradigm [KP88]. The discussion presented in this chapter concentrates on the
aspects concerning the Model in this MVC triad. User interfaces such as Montage editors are only mentioned occasionally.
5.1 Language
A language specification consists of Montages and tokens; they will be discussed in detail in the following sections. In order to use a Montage or a token
as a partial specification of a language, it has to be registered at the language
first. In MCS this is done by either creating a new Montage in the context of a
language or by importing existing Montages into a language.
A Montage can be stored separately in order to be imported into a language,
but it cannot be deployed out of the context of a language. Does this conform
to the definition of a software component given in section 2.1? This definition
identified five aspects of a component (here: a Montage). On a language level,
three of them are of interest: extent, occurrence and usage. Appearance and
requirements of Montages will be discussed in section 5.3.
Definitely, a Montage is a unit of composition and is subject to composition by
third parties. Montages can be stored and distributed separately, which qualifies
them for the extent and usage aspects of our definition. The occurrence aspect
- components can be deployed independently - has to be discussed in more detail.
Independent deployment does not necessarily mean that a component is a
stand-alone application. Consider a button component for instance; it can be
deployed independently of any other graphical components (such as sliders,
text fields or menus), but it cannot be deployed out of the context of a surrounding panel or window. Similar rules apply for Montages. They may be
viewed, edited and stored independently of other Montages (as buttons can be
manipulated separately in a programming environment), but their runtime
context has to be a language. Within such a context, Montages can either be
imported or exported separately or in groups (see section 5.3 for further
details).
The main graphical interface of MCS reflects the leading role that the language plays. The main interface contains a list of all registered Montages (see
Fig. 28). Here an overview of the language is given by a list of EBNF rules
from all Montages. Tokens are listed on the second panel of this user interface
(see Fig. 29; this interface is explained in more detail in section 5.2.1).
Plugging a Montage into a language basically means to add it to this list. The
Montage is then marked as being a part of this language definition; no consistency checks are performed at this time. This is necessary to allow for convenient editing of the Montages. If these are imported, they have to be adjusted to
the new language environment, i.e. the syntax has to be adapted to match the
general syntax rules (e.g. capitalized keywords) or properties of the static
semantics have to be renamed in order to be used by other Montages (see
section 5.3 for details).
Only after all these adaptations have been performed may the interpreter for a
language be generated. This happens in several steps, which will be listed
next and described in detail in the following sections.
5.2 Syntax 73
[Figure 28 shows the CoMon (Composer for Montages) main window: a table
listing all registered Montages of the language by name together with the EBNF
rule of each (Assignment, Block, Call, Comparison, Condition, Expression,
Factor, If, Input, Odd, Output, Program, Statement, StmtSeq, Term, While),
and the buttons Edit, New Montage, New Synonym, Import and Remove.]
Figure 28: MCS main user interface for manipulating a language
5.2 Syntax
Syntax definitions are given in terms of EBNF productions [Wir77b]. They do
not only specify the syntax of the programming language, they also declare how
Montages can be combined to form a language. We distinguish between
characteristic productions and synonym productions (see also section 3.3.2).
Characteristic Productions: In MCS, characteristic productions are associated
with a Montage, i.e. each Montage has exactly one characteristic production.
This production defines the concrete syntax of the Montage and therefore
reflects the control flow graph given in the graphical part of the Montage. The
control flow graph basically defines the abstract syntax of the Montage. How
this correspondence between concrete and abstract syntax is defined was
explained in section 4.4.3. This strict correspondence between the control flow
graph and the concrete syntax does not allow alternatives (separated by "|") in
a characteristic production. Examples of characteristic productions:
74 5 Implementation
While ::= "WHILE" Condition "DO" StmtSeq "END".
Block ::= [ConstDecl][VarDecl]{ProcDecl}"BEGIN" StmtSeq "END" ".".
Synonym Productions: Synonym production rules assign one of the alternatives
on their right side to the symbol (the placeholder) on their left side. In MCS
there are two different categories of synonym productions: nonterminal
synonym productions and terminal synonym productions. As their names
imply, the right side of a nonterminal synonym production may contain only
nonterminal symbols as alternatives, whereas in terminal synonym productions
only terminal symbols are allowed.
Nonterminal symbols and nonterminal synonym productions are the pivot
of language construction. They operate as placeholders and thus introduce
flexibility into syntax rules. One possibility to enhance or extend a language is to
provide further alternatives to a synonym production. Nonterminal synonym
productions contain nonterminal symbols on their right side. Only one
nonterminal symbol is allowed per alternative, but there may be several terminal
symbols. Terminal symbols in alternatives will be discarded by the parser, as
they cannot carry any semantic purpose. Examples of nonterminal synonym
productions are:
Statement = Assign | Call | StmtSeq | If | While.
Factor = Ident | Number | "(" Expression ")".
Terminal Synonym Productions: A Montage may feature terminal synonym
productions, provided that the placeholder appears in the characteristic
production of the Montage. An example:
Comparison ::= Expression CompOp Expression.
CompOp = "=" | "#" | "<" | "<=" | ">" | ">=".
Comparison is the characteristic production that describes the concrete syntax
of the Montage. CompOp is a terminal synonym production that conveniently
enumerates all possible comparison operators applicable in this Montage.
Normally, terminal symbols will be discarded when parsed. However, terminal
symbols declared in a terminal synonym production will be stored in a
predefined property of the same name. To be precise: the property will contain
the string that was found in the program text. In the CompOp example, the
parser would generate a CompOp property of type java.lang.String and its value
would be the actual comparison operator found. Storing these strings is
necessary because after parsing a program text, only this property will contain
information about the actual comparison.
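As a sketch of this mechanism (the class and method names below are illustrative, not the actual MCS API), a node for a terminal synonym rule could match one of its alternatives and record the matched string under the rule's name:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: a node that parses one of several terminal
// alternatives and stores the matched string as a named property.
class TerminalSynonymNode {
    private final String name;           // e.g. "CompOp"
    private final String[] alternatives; // e.g. "=", "#", "<", ...

    TerminalSynonymNode(String name, String... alternatives) {
        this.name = name;
        this.alternatives = alternatives;
    }

    // On success, the matched text is stored under the rule's name.
    void parse(String token, Map<String, String> properties) {
        for (String alt : alternatives) {
            if (alt.equals(token)) {
                properties.put(name, token); // e.g. CompOp = "<="
                return;
            }
        }
        throw new RuntimeException("unexpected token: " + token);
    }
}

public class CompOpDemo {
    public static void main(String[] args) {
        Map<String, String> props = new HashMap<>();
        TerminalSynonymNode compOp = new TerminalSynonymNode(
                "CompOp", "=", "#", "<", "<=", ">", ">=");
        compOp.parse("<=", props);
        assert "<=".equals(props.get("CompOp"));
    }
}
```

After parsing, an action rule of the Montage can read the operator simply as the string stored under the property name CompOp.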
5.2.1 Token Manager and Scanner
The processing of a program text begins with lexical analysis. There, the input
stream of characters is grouped into tokens. Each token represents a value, and
in most cases this value corresponds directly to the character string scanned
from the program text. Such tokens are typically keywords (e.g. "if", "while",
etc.), separators (e.g. ";", "(", ")", etc.) or operators (e.g. "+", "*", etc.) of the
programming language and serve readability purposes or separate syntactic
entities.
In certain cases, however, the original string of the program has to be converted
into a more suitable form. When scanning the textual representation of
a number, for example, the actual character string is of minor interest as long as
it can be converted into its corresponding integer value. These tokens are called
literals; integers, floating point numbers and booleans are typical literals.
Beyond that, strings and characters need to be converted as well, i.e. it might be
necessary to replace escaped character sequences by their corresponding
counterparts (e.g. the Unicode escape sequence '\u2021' will be replaced by a
double dagger '‡').
In MCS, tokens are the smallest unit of processing. They cannot be
expressed in terms of Montages, as they contain no semantics at all; they only
represent a value. Therefore they have to be managed and specified separately.
In order to completely specify a language, Montages do not suffice; some
token specifications will be needed as well. Fortunately, there are only a few
such specifications, which normally are highly reusable. Specifically, these token
specifications are literals and white spaces. Keywords, separators and operators
can be extracted from the EBNF rules of the Montages. Literals and white
spaces cannot be generated automatically, as they have a variable micro syntax.
In order to efficiently handle scanning of program texts, MCS has a Token
Manager that keeps track of all tokens related to a language. Each token that
must be recognized has to be registered with the token manager. The majority
of the tokens will automatically be registered by the Montages as soon as they
generate their parsers. This is very convenient, not only because their number
can be high but also because they differ from language to language.
Literals and white spaces, however, have to be specified independently of
any Montages (although Montages will refer to them in their EBNFs). Such a
specification consists of:
[Figure 29 shows the Token Manager panel of the CoMon window for PL/0.
Each row lists a token's name, its rule, its conversion type and its skip flag:
the keywords "CALL", "DO", "END", "IF", "ODD", "THEN" and "WHILE";
DecimalLiteral with the rule [1-9][0-9]*; Ident; and Whitespace. Below the
table are the buttons New, Import and Remove.]
Figure 29: Screen shot of the Token Manager. Each token is specified by a name, a regular
expression, a conversion method (represented by a type name) and a skip flag indicating
whether this token will be passed to the parser.
• a name that can be used in EBNF to refer to a token,
• a regular expression that describes the token's micro syntax,
• a method that returns an object containing the converted value of this token,
• a flag signalling whether this token specifies a white space, and thus will be
skipped, i.e. it will not be passed to the parser.
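The four items above can be sketched as a small Java class (a hypothetical illustration; the names TokenSpec and its fields are not MCS's actual API), with the conversion method modelled as a function from the scanned text to a value object:

```java
import java.util.function.Function;
import java.util.regex.Pattern;

// Hypothetical sketch of a token specification as described above:
// a name, a regular expression, a conversion method and a skip flag.
class TokenSpec {
    final String name;
    final Pattern pattern;
    final Function<String, Object> convert;
    final boolean skip; // true for white space: not passed to the parser

    TokenSpec(String name, String regex,
              Function<String, Object> convert, boolean skip) {
        this.name = name;
        this.pattern = Pattern.compile(regex);
        this.convert = convert;
        this.skip = skip;
    }
}

public class TokenSpecDemo {
    public static void main(String[] args) {
        // DecimalLiteral as in Fig. 29, converted to an Integer value.
        TokenSpec dec = new TokenSpec("DecimalLiteral",
                "[1-9][0-9]*", Integer::valueOf, false);
        TokenSpec ws = new TokenSpec("Whitespace",
                "[ \\t\\n\\r]+", s -> s, true);
        assert dec.pattern.matcher("42").matches();
        assert dec.convert.apply("42").equals(42);
        assert ws.skip;
    }
}
```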
The token manager will generate a scanner capable of scanning the program
text and returning tokens as specified. The method we chose to generate a
scanner is the same as in Lex [Les75] and is explained in [ASU86]: applying
Thompson's construction to generate a nondeterministic finite automaton (NFA)
from each regular expression, and subsequently using a method called subset
construction to transform these NFAs into one big deterministic finite automaton
(DFA). The application of these algorithms is not problematic at all as long as
all regular expressions specifying keywords are processed before those specifying
literals. Normally the character sequence representing a keyword could also be
interpreted as an identifier, as the same lexical rules apply (specified by two
different regular expressions). If subset construction is fed with the regular expressions
of literals first, then it will return a literal token instead of a keyword token
(refer to [ASU86] chapter 3.8 for further details about this property of lexical
recognizers). MCS solves this problem by numbering token specifications.
Literals and white spaces are automatically assigned higher numbers¹, thus
guaranteeing correct recognition of keyword tokens.
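The effect of this numbering can be sketched as follows (a simplified illustration, not the actual MCS scanner: it matches whole lexemes instead of running a DFA, and the ids are chosen for the example). When a lexeme matches both a keyword rule and a literal rule, the specification with the lower number wins:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch of the numbering scheme described above:
// keywords get ids below 10000, literals and white spaces start at
// 10000, and on an ambiguous match the lower id wins.
class NumberedSpec {
    final int id;
    final Pattern pattern;
    NumberedSpec(int id, String regex) {
        this.id = id;
        this.pattern = Pattern.compile(regex);
    }
}

public class KeywordPriorityDemo {
    static int recognize(String lexeme, List<NumberedSpec> specs) {
        int best = -1;
        for (NumberedSpec s : specs)
            if (s.pattern.matcher(lexeme).matches()
                    && (best == -1 || s.id < best))
                best = s.id; // prefer the lowest id, i.e. keywords
        return best;
    }

    public static void main(String[] args) {
        List<NumberedSpec> specs = new ArrayList<>();
        specs.add(new NumberedSpec(1, "WHILE"));         // keyword
        specs.add(new NumberedSpec(10000, "[a-zA-Z]+")); // Ident literal
        // "WHILE" matches both rules; the keyword id wins.
        assert recognize("WHILE", specs) == 1;
        assert recognize("counter", specs) == 10000;
    }
}
```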
The Token Manager will generate a lexical scanner on demand. The scanner
is then available to the parsers of the Montages. In order to process a program
text, they have to access the scanner's interface to get a tokenized form of the
input stream. The Token Manager and the scanner are central to all Montages.
This contradicts in a way the decentralised architecture of MCS. Why does not
every Montage define its own scanner?
Although decentralised scanners could be implemented, they do not make
sense in practice. The main reason is inconsistent white space and identifier
specification. In a decentralised setup, it would be possible that each Montage
defines different rules for white spaces. Such programs would be unreadable
and very difficult to code, as syntax rules might change with every token. It does
not make sense that white spaces in expressions should be defined differently
from white spaces in statements. An example of such a program can be found
in Fig. 30. This example shows the core of Euclid's algorithm with the following
white space rules: IF, REPEAT and assignments have the same rules as in
Modula-2, Comparisons use underscores '_', Expressions use tabs, Statement-
Sequences have again the same rules as in Modula-2 except that newline or
carriage return is not allowed.
REPEAT
  IF u_>_v THEN
    t := u ; u := v ; v := t ;
  END; u := u - v;
UNTIL u_=_0;
Figure 30: An example of a program with different white space rules.
In a language specification, white spaces and literals would have to be specified
redundantly in many different Montages, thus making them error prone.
Unintended differences could easily be imported by reusing Montages from
different sources. By using a single scanner for all Montages, the user of MCS
¹ Literals and white spaces are numbered starting from 10000, assuming that there will never be
more than 10000 keywords, separators and operators in a language. This simple method prevents
renumbering of tokens. Remember that keyword tokens will not be registered before parser
generation, i.e. user-specified literals and white spaces will be entered first, preventing a
first-come-first-served strategy.
gains simplicity, consistency and speed, at the loss of some (usually
unwanted) expressive power.
5.2.2 Tokens
When tokens are registered with the Token Manager, they get a unique id that
helps to identify a token later in the AST. The id is an int value and can be
used for quick comparisons, e.g. it is faster to compare two integers than to
compare two strings each containing "implementation" (the longest Modula-2
keyword).
Furthermore, each token features a value and a text. The value is an object of
the expected data type, e.g. if the token represents a numerical value, then value
would refer to a java.lang.Integer or a java.lang.Float object.
The token's text, however, will contain the character string as it was found in the
program code and is always of type java.lang.String. The data type of
value is known either through its id or it can be queried, e.g. using Java's
instanceof type comparison. Fig. 31 shows the token classes that are available.
The classes Token and VToken are abstract classes. VTokens will return
values that have to be transformed from the text in the program. The types of
these values are standard wrapper classes from the package java.lang:
Boolean, Character, Integer, Float, and String respectively.
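The value/text distinction can be sketched as follows (a hypothetical illustration using its own class names, SketchToken and SketchIntegerToken, rather than the MCS classes of Fig. 31): the text keeps the scanned string, while value() performs the conversion to a java.lang wrapper object.

```java
// Hypothetical sketch of the value/text distinction described above:
// the text is the string as found in the program, the value is the
// converted java.lang wrapper object.
abstract class SketchToken {
    final int id;      // unique id assigned by the Token Manager
    final String text; // character string as found in the program
    SketchToken(int id, String text) {
        this.id = id;
        this.text = text;
    }
    abstract Object value();
}

class SketchIntegerToken extends SketchToken {
    SketchIntegerToken(int id, String text) { super(id, text); }
    Object value() { return Integer.valueOf(text); }
}

public class TokenDemo {
    public static void main(String[] args) {
        SketchToken t = new SketchIntegerToken(10001, "42");
        assert t.text.equals("42");
        assert t.value().equals(42);
        assert t.value() instanceof Integer; // queryable via instanceof
    }
}
```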
Token
 +- VToken
 |   +- BooleanToken
 |   +- CharacterToken
 |   +- IntegerToken
 |   +- RealToken
 |   +- StringToken
 +- KeywordToken

Figure 31: Token class hierarchy
5.2.3 Modular Parsing
Each Montage is capable of parsing the construct it specifies. In order to do so,
a parser has to be generated first. This is done by parsing the EBNF rule and
building a syntax tree (the EBNF tree, a concrete syntax tree) for each Montage.
The nodes of the EBNF tree are of the same types (classes) as the nodes of
the control flow graph. These classes will be described in section 5.3. The
concrete and abstract syntax trees of a Montage are very similar, which has been
commented on in section 4.4.3 on "Internal Consistency". Generating the parser is
done in the integration phase of our transformation process; thus the
EBNF tree is part of a Template Montage (see p. 58) and will not be used in
Instance Montages.
Parsing a program text is now as simple as traversing these EBNF trees and
invoking the parse method on each visited node, beginning with the EBNF
tree of the start Montage of the language. Each node in an EBNF tree belongs
to one of the following categories and has parsing capabilities as described:
Terminal symbol: In an EBNF rule, terminal symbols are enclosed in quotation
marks, e.g. "if". When the parsing method of a terminal symbol node is invoked, it
gets the next token from the scanner and compares it to its own string. An
exception is thrown if the comparison fails. Upon success, a token representing
the terminal symbol is returned.
Terminal symbols are normally not kept in the abstract syntax tree (see the
discussion in the previous section). A kind of exception are terminals stemming
from terminal synonym productions, whose text is stored as a predefined
property. Note that, upon parsing, the terminal symbol cannot decide on its own
whether it should be discarded or not. Therefore a token is returned upon
every successful scan.
Nonterminal symbol: In EBNF, these are identifiers designating other Montages.
When a nonterminal is expected, i.e. the parser encounters a nonterminal
node in an EBNF tree, parsing will simply be delegated to the
Montage that the nonterminal node represents. This Montage in turn
traverses its own EBNF tree in order to continue the parse process.
Of course, the nonterminal nodes in the EBNF have to be aware of the
Montages they represent. This awareness will be achieved during the integration
phase, as described in section 4.4.
Repetition rule: Repetitions are marked by "{...}" or "[...]". The contents of a
repetition are contained in the children of a repetition rule node. During
parsing, the parsers of these children are called in turn until an error occurs.
If this error occurs at the first child, the repetition has been parsed completely;
otherwise an error has to be reported to the calling node. Note that it
is possible to get an error on the very first attempt to parse a child. This
means that the optional part specified by the repetition was not present in
the code. The parser is also responsible for checking whether the actual number
of occurrences of the repetition contents is within the specified range, i.e.
min ≤ actual occurrences ≤ max, where min and max denote the minimal and
maximal allowed occurrences respectively.
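A minimal sketch of this parse loop (an illustration, not the MCS implementation: the "child parser" is reduced to matching a single expected token, and the names are invented for the example) could look like this:

```java
import java.util.Iterator;
import java.util.List;

class ParseError extends RuntimeException {
    ParseError(String msg) { super(msg); }
}

public class RepetitionDemo {
    // Calls the (here trivial) child parser until it fails, then checks
    // the occurrence count against the allowed range min..max.
    static int parseRepetition(List<String> tokens, String expected,
                               int min, int max) {
        int count = 0;
        Iterator<String> it = tokens.iterator();
        while (it.hasNext() && it.next().equals(expected)) count++;
        if (count < min || count > max)
            throw new ParseError("occurrences out of range: " + count);
        return count;
    }

    public static void main(String[] args) {
        // An option [X]: min = 0, max = 1.
        assert parseRepetition(List.of("X", "Y"), "X", 0, 1) == 1;
        // A list {X}: min = 0, max unbounded.
        assert parseRepetition(List.of("X", "X", "Y"), "X",
                0, Integer.MAX_VALUE) == 2;
    }
}
```

An error at the first iteration simply yields a count of zero, which is legal for lists and options but violates the range of a group (min = 1).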
Alternative rule: An alternative rule separates different nonterminals by a vertical
line "|". The parser tries to parse each alternative (nonterminal) in
turn. If all alternatives report errors, then an error has to be reported to the
calling node. There are different strategies that can be implemented for a
successful attempt to parse an alternative. Either the first alternative that
reports no errors is chosen and its parse tree is hooked into the AST, or the
remaining alternatives are tested as well. If additional alternatives report
valid parse trees, then there are two choices again: either to stop parsing
because of an ambiguous grammar, or to allow user assistance. We chose the
latter of both choices: testing all alternatives and allowing user intervention.
The user will be presented with a set of valid alternatives and may
choose the one that will be inserted into the AST. This approach substitutes
parser control in a certain way, as it allows the specification of ambiguous,
non-context-free grammars (see also the discussion in sections 4.5 and 7.2.1).
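The chosen strategy can be sketched as follows (a hypothetical illustration with invented names; the alternatives are modelled as simple predicates instead of full parsers): all alternatives are tried, and the result set determines whether parsing failed, succeeded unambiguously, or requires user assistance.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical sketch: try every alternative and collect all that
// parse successfully, instead of stopping at the first hit.
public class AlternativeDemo {
    static List<String> tryAll(String input,
                               List<String> names,
                               List<Predicate<String>> parsers) {
        List<String> valid = new ArrayList<>();
        for (int i = 0; i < parsers.size(); i++)
            if (parsers.get(i).test(input)) valid.add(names.get(i));
        // empty: error; one entry: unambiguous; more: ask the user
        return valid;
    }

    public static void main(String[] args) {
        List<String> names = List.of("Number", "Ident");
        List<Predicate<String>> parsers = List.of(
                s -> s.matches("[0-9]+"),
                s -> s.matches("[a-zA-Z][a-zA-Z0-9]*"));
        assert tryAll("42", names, parsers).equals(List.of("Number"));
        assert tryAll("x1", names, parsers).equals(List.of("Ident"));
        assert tryAll("?", names, parsers).isEmpty();
    }
}
```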
Terminal synonym rules are treated differently. The string that was read
by the scanner will be stored in a predefined property which has the same
name as the terminal synonym rule. As the result of a terminal synonym rule
can only be one single string, this is the most efficient way to handle these
tokens, and no additional indirection (tree node access) is necessary to access
its value.
5.3 Data Structures for Dynamic Semantics of Specification
The description of the tasks of static semantics analysis and control flow
composition in sections 4.6 and 4.7 was given in chronological order. This
provided an overview of the various relations between Montage properties,
control flow graphs, abstract syntax trees, etc. We would now like to concentrate
on the data structures used for the implementation. The following sections
do not reflect any chronological order of the transformation process
but rather follow the hierarchy of our main classes: the nodes in the AST.
5.3.1 Main Class Hierarchy
The main data structure we have to deal with is the Montage. We decided to
model a Montage as a Java class. And although it is one of the most important
classes in our framework, it plays a rather marginal role in the class hierarchy
we defined. When modelling the classes and their relations, we proceeded from
the assumption that the most important data structure is not the Montage but
rather the abstract syntax tree (AST) built by the parser. This tree will manage
all the important information about the program and its static and dynamic
semantics. The whole dynamic semantics of the specification (see Fig. 10,
p. 43) is centred around the AST. Taking this point of view, the first question
is: what objects will be inserted in the AST?
We have already seen the different kinds of nodes that populate the AST:
nonterminals, actions, repetitions, initial and terminal nodes, and Montages.
When analysing the relations between these nodes, certain common properties
may be sorted out (candidates for abstract classes). After some experimenting,
we found the class hierarchy given in Fig. 32 the best fitting type model for our
system.
CFlowNode
 +- Action
 +- Terminal
 +- CFlowContainer
     +- Nonterminal
     |   +- Synonym
     |   +- Montage
     +- Repetition

Figure 32: The MCS class hierarchy
At the root of this hierarchy is an abstract base class CFlowNode that represents
all common properties of a control flow node. E.g. it implements a tree
node interface (javax.swing.tree.MutableTreeNode) that enables it to
be managed by (visual) tree data structures. It also manages incoming and
outgoing edges in order to be used as a node in a graph. In other words, a
CFlowNode object is capable of being a node in a tree and a node in a graph
simultaneously. Furthermore, an abstract method for parsing is defined.
CFlowContainer is an abstract class too, with additional capabilities to
manage a subtree of CFlowNodes. The concrete classes then implement all the
specific features that the different nodes will need to perform their duties. We
will present them in more detail in the following sections.
5.3.2 Action
An Action object will represent an action node at runtime. Each Action object is
a JavaBean listening to an Execute event and featuring a NextAction property.
Fig. 34 sketches the declaration of the Action class. The action provided by the
user will be encapsulated in an object implementing the Rule interface.
interface Rule {
    // fire the dynamic semantics rule
    public void fire();
}

Figure 33: Declaration of interface Rule
MCS will encapsulate the code that the user gave for a specific action in a
class that is suited to handle all the references and that has all the access rights
needed. When executing a program, action nodes are wrapper objects hiding
the different implementations of their rules and thus simplifying the execution
model. Action nodes do not need to implement any parsing action², as they
have no representation in the syntax rule.
5.3.3 I and T Nodes
In [AKP97, Kut01], I and T in the control flow graphs denote the initial and
terminal edge respectively. From an implementation point of view it turned out
to be easier to implement an initial and a terminal node³. Simply think of the I
² In order to be a concrete class, Action implements the parsing method, but it is empty.
³ 'Terminal node' here denotes a T node, in contrast to a node representing a terminal symbol,
which is of class Terminal. Class names are printed in italics.
class Action extends CFlowNode
        implements ExecuteListener {

    // an object containing the rule to execute
    private Rule rule;

    // Constructor
    public Action(Rule rule) {
        this.rule = rule;
    }

    // handle the Execute event
    public void executeAction(ExecuteEvent evt)
            throws DynamicSemanticsException {
        rule.fire();
    }

    // NextAction property
    public Action getNextAction() {
        Action next = null;
        for (Iterator it = outgoing.iterator(); it.hasNext(); ) {
            Transition t = (Transition) it.next();
            if (t.evaluate()) {
                next = t.target;
                break;
            }
        }
        return next;
    }
}

Figure 34: Declaration of class Action
and T letters as nodes instead of annotations of the initial and terminal edge.
This model has several advantages:
• Because there is only one edge class, there is no need to distinguish between
ordinary edges and edges with 'loose ends', which the initial and terminal
edges in fact are, as they are attached to only one node. Having two more
node classes does not hurt, as we have to distinguish between the different
kinds of nodes anyway.
• Connecting Montages is easier, as we can merge nodes instead of tying loose
ends of edges. A T node can be merged with an I node (of the following
Montage) by diverting all incoming and outgoing edges of the I node to the
T node. When all I and T nodes are merged this way, only T nodes remain.
[Figure 35 shows the successive wiring steps a) to d) described in the text as
control flow graphs of a While Montage and its neighbouring Montages.]
Figure 35: 'Wiring' a Montage
• The only difference between T nodes and Action nodes is that the former do
not fire any actions. Evaluating conditions on the outgoing edges is done
exactly the same way as in action nodes.
• I nodes neither fire actions nor evaluate outgoing edges.
Fig. 35 illustrates the 'wiring' of the control flow of a While Montage
surrounded by other Montages. Its Expr nonterminal node is expanded in
Fig. 35b and collapsed again in Fig. 35c. Finally, in Fig. 35d all I nodes were
removed and their attached edges diverted to T nodes. In this process, only one
I node will remain untouched: the initial node of the start symbol's control
flow. This is where execution of the program will begin.
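The merge step can be sketched in a few lines (a hypothetical illustration with its own minimal Node class, not the CFlowNode of MCS): all edges attached to the I node are diverted to the T node, after which the I node is no longer wired into the graph.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of merging an I node into the preceding T node:
// every edge attached to the I node is diverted to the T node.
class Node {
    final String name;
    final List<Node> incoming = new ArrayList<>();
    final List<Node> outgoing = new ArrayList<>();
    Node(String name) { this.name = name; }

    // Divert all edges of `other` to this node.
    void absorb(Node other) {
        for (Node src : other.incoming) {
            src.outgoing.remove(other);
            src.outgoing.add(this);
            incoming.add(src);
        }
        for (Node dst : other.outgoing) {
            dst.incoming.remove(other);
            dst.incoming.add(this);
            outgoing.add(dst);
        }
        other.incoming.clear();
        other.outgoing.clear();
    }
}

public class MergeDemo {
    public static void main(String[] args) {
        Node t = new Node("T");      // terminal node of a Montage
        Node i = new Node("I");      // initial node of the next Montage
        Node action = new Node("Action");
        i.outgoing.add(action);      // I --> Action
        action.incoming.add(i);
        t.absorb(i);                 // only the T node remains wired
        assert t.outgoing.contains(action);
        assert action.incoming.contains(t);
        assert i.outgoing.isEmpty();
    }
}
```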
5.3.4 Terminal
A node of class Terminal is capable of parsing a terminal symbol. Terminal
nodes will not be inserted in the AST, but they are members of the EBNF trees
which reflect the EBNF rule. During the integration phase, when the parse tree
is generated, a Terminal node registers its terminal symbol with the Token
Manager and receives a unique id in return. In the parsing phase, when a
token is parsed, the Terminal node then compares the id of the encountered
token to the previously stored one. Parsing may continue upon correspondence,
and the token is returned to the calling node in the parse tree. Usually this
node is content with a positive result and does not store the terminal symbol in
the AST. However, in future versions of MCS it would be possible to insert
Terminal nodes as well, for improving debugging services for example.
5.3.5 Repetition
Repetition objects represent lists, options or groups. They are CFlowContainers
because they maintain their contents as a subtree (see e.g. Fig. 13). There is no
need for specialised subclasses for these three kinds of repetitions, because they
would not differ sufficiently. We decided to introduce two attributes,
minOccurrence and maxOccurrence, which can be accessed as Java properties.
They determine the minimum and maximum number of occurrences of their
contents in a program text. The default values are given in the following table:

Repetition   min   max
List         0     java.lang.Integer.MAX_VALUE (a)
Option       0     1
Group        1     1

a. the maximum integer value, which comes as close to ∞ as possible.

Of course, minimal and maximal occurrences are not bound to these numbers
and can be set freely by the user, provided that min ≤ max. When internal
consistency is checked, the values of min and max can be unambiguously
assigned to the concrete syntax tree instance of a repetition. This is necessary, as
it is not possible to specify any other values for min and max in EBNF than the
ones shown in the table above.
Repetition nodes in the AST serve as the container for nodes of the actual
occurrences. Edges leaving from or going to a repetition will be managed by
these nodes. The actual instances of the repetition body are stored below a
Repetition node in an array. Fig. 36b shows what the AST of a list of two
occurrences of a statement looks like, in contrast to Fig. 36a, which illustrates a
concatenated occurrence of statements in the specification.

[Figure 36 contrasts a) a concatenated occurrence, with Stmt nodes chained one
after the other, and b) a list occurrence, with the Stmt instances stored in
numbered array buckets below a LIST node.]
Figure 36: Difference between repeated occurrence and list occurrence of statements
Notice that in the graphical representation the initial and terminal edges can
be left out if there is only one node in the repetition. In this case, it is obvious
where these edges have to be attached. A Repetition object has to guarantee
that there exists one I node and one T node, either being explicitly set by the
user or implicitly assumed as described above. The numbered nodes visualize
the array buckets in which the actual instances are stored; they do not
exist as nodes in the AST.
5.3.6 Nonterminal
Nonterminal is a concrete subclass of CFlowContainer, as it is possible to nest
nonterminals (see Fig. 9, p. 41). Nonterminals serve as placeholders for their
designated Montages; therefore the most important extension in the Nonterminal
class is a reference to the designated Montage. This reference
will be set during the integration phase (section 4.4). Parsing will be delegated
to the Montage the Nonterminal object is referring to.
5.3.7 Synonym
A Synonym object has two responsibilities: trying to parse the specified
alternatives and representing the actually parsed instance in the AST.
[Figure 37 shows the program P, "d = 5*c", its AST (an Asg node with the
children Ident: d and Expr, where the leaves Num: 5 and Ident: c are each
wrapped in a Factor node), a compact AST of P with the single-child Factor
nodes merged away, and the grammar of L:
    Asg ::= Ident "=" Expr.
    Expr ::= Term { AddOp Term }.
    Term ::= Factor { MultOp Factor }.
    Factor = Ident | Number | "(" Expr ")".]
Figure 37: AST and compact AST of a program
Parsing has already been described in section 5.2.3. It is important not to
lose information about the origin of nodes in the AST representing the parsed
program. It is common that a Montage refers to a different one not by its
proper name but by the name of a synonym production. Fig. 37 shows such a
situation. A Factor node will always have exactly one child. Therefore the Factor
node and its child could be merged. But it is important not to 'forget' that
there was a synonym node in between. If this information was lost, it would
not be possible to refer to Factor in Montage Term.
E.g. in a Term Montage there would probably be some action rule similar to:
value = Factor~1.value * Factor~2.value
which computes the result of the multiplication. Referring to Factor is the
only possible way, as we do not know at specification time what kind of Factor
we will encounter in the program text.
As we may not remove the synonym node information, we do not remove
the entire node from the AST. This is in contrast to the recommendations of
the original Montage specification [AKP97]. From an implementation point of
view, compacting the trees does not simplify matters: the gain in memory
is not worth the extra coding, and the additional indirection on accesses to
Num, Val or Expr is certainly less time consuming than querying the node for
information about its ancestors.
5.3.8 Montage
The most complex data structure in our framework is the class Montage. The
complexity can be explained by the versatile use of a Montage object. For
simplicity of implementation we do not distinguish between Template Montages
and Instance Montages but rather use the same data structure for both. The
overhead we get in terms of memory is not as bad as it may seem at first glance:
there are only a few class attributes that are exclusively used by the registration
and integration phases. Most items will be needed in static semantics analysis
and control flow composition as well. The methods needed to implement
Instance and Template Montages are basically the same, which would result in
practically the same implementation, i.e. the Instance Montage implementation
would be a subset of the Template Montage implementation. Not distinguishing
between the two therefore has the advantage that changes in the code have to
be done only in one class.
Fig. 38 shows the interface of the class Montage. Note that Montage is a
CFlowNode, which enables it to be a node in an AST. Furthermore, it inherits
the behaviour of a CFlowContainer; thus it is capable of managing subtrees of
CFlowNodes. All methods and attributes related to these abilities were already
implemented in Montage's superclasses.
The methods in Fig. 38 are grouped according to the underlying data
structures and their appearance in the graphical representation.
The first three methods are concerned with terminal synonym rule handling.
Synonym rules (objects of class Synonym) can be added to, retrieved from and
deleted from a Montage object; they are referenced by name. Adding and
deleting is usually done manually by the user when he edits a Montage during the
registration/adaptation phase.
The same is valid for the next group of methods, which is concerned with the
construction of the control flow. These methods allow the creation of Actions,
Nonterminals, Repetitions and Transitions (control flow edges). They are
convenience methods, as it would also be possible to call their respective
constructors directly and insert the objects 'manually' by calling the
corresponding tree handling methods. But using these convenience methods has
the advantage that nodes inserted this way are guaranteed to be correctly set up.
E.g. parents are first checked whether they can hold a new node, and it is
guaranteed that a transition always connects two valid nodes. Removal does not
need such consistency checks and therefore can be done by calling the tree
handling methods inherited from class
5.3 Data Structures for Dynamic Semantics of Specification 89
public class Montage extends NonTerminal {
    // data structure managing transitions
    protected I initial;
    protected T terminal;

    // Terminal Synonym Rules
    public void addSynonymRule(Synonym sr);
    public Synonym getSynonymRule(String name);
    public void removeSynonymRule(String name);

    // Editing the Control Flow Graph
    public Action newAction(String name, CFlowNode p);
    public NonTerminal newNonTerminal(String name, CFlowNode p, int cor);
    public Repetition newList(String name, CFlowNode p);
    public Repetition newOption(String name, CFlowNode p);
    public Repetition newRepetition(String name, CFlowNode p, int min, int max);
    public Transition newTransition(String label, CFlowNode from, CFlowNode to);
    public I setInitialTransition(CFlowNode node);
    public T setTerminalTransition(CFlowNode node);

    // Properties
    public void addProperty(Property p);
    public Property getProperty(String name);
    public void removeProperty(String name);

    // Actions
    public void addActionRule(Action node, Rule r);
    public void removeActionRule(Action node);

    // Registration phase
    public void setLanguage(Language newLanguage);
    public Language getLanguage();

    // Integration phase
    public void generateParser() throws StaticSemanticsException;

    // Parsing phase
    public void parse() throws ParseException;
}

Figure 38: Interface of class Montage
CFlowNode. Setting an initial and a terminal transition is done by marking a
node in the subtree as the target of the initial transition or the source of the
terminal transition of the Montage, respectively. Internally, the Montage will
allocate an I or T object that represents the corresponding edge, as described in
section 5.3.3.
Properties are modelled in a class of their own which is described in the next
section. They can be added to and removed from a Montage during editing
and (important for static semantics) they can be retrieved by their name.
The two methods concerned with actions allow an action to be added to or
removed from an Action node, respectively. Note that the interface Rule has
already been introduced in Fig. 33.
In the registration phase, when a Montage is associated with a language, the
setLanguage() method is used to set a reference from the Montage back to
the language. This is necessary so that e.g. the parser generator may have access
to the Token Manager that is stored with the language. getLanguage() is also
used during the static semantics phase to find other Montages of the same
language in order to access their properties.
Consistency checks and generation of a parser are done in the method
generateParser(). If it throws an exception, then an error occurred. StaticSemanticsException is the base class of various, more detailed exception classes that can
be thrown upon the many possible errors. Any errors are also reported to
System.err, which is the standard error stream in Java.
The method parse() can, of course, only be invoked after successful parser
generation. It can also throw various exceptions (among them NoParserAvailableException), which are subclasses of ParseException.
Static semantics analysis and control flow composition can be done without
any access methods in class Montage. Both phases will operate directly on
properties and AST/control flow graphs, respectively.
5.3.9 Properties and their Initialisation
Property Declaration. An MCS property is represented by an object that implements the Property interface given in Fig. 39.
Declaring Property as an interface has advantages over a declaration as a class.
We are not forced to implement it as a subclass and thus virtually any object
can be made a property by simply implementing its interface.
Being able to alter the name of a property is crucial, as only a change of
name allows us to adapt an imported Montage to the needs of its new language
environment. When importing a Montage into a new language, there will probably be naming conflicts. E.g. an initialisation rule may refer to a property named value. The imported Montage would feature a matching property, but its name may be val. Renaming val to value resolves this property reference.
interface Property {
    // user view of a property
    public void setName(String name);
    public String getName();
    public Class getType();
    public void setValue(Object value);
    public Object getValue();

    // building the dependency graph
    public void checkReferences(Language language) throws ReferenceException;
    public void resolveReferences(Montage montage) throws ReferenceException;

    // methods for topological sorting
    public boolean isReadyToFire();
    public void markReady(Property p);
    public Iterator dependent();

    // initialisation of the property
    public void initialize() throws StaticSemanticsException;
}

Figure 39: Declaration of interface Property
The type of a property cannot be set; this guarantees that, despite
renaming, properties will remain compatible. The type of a property is defined
by the implementation of the getType() method. Setting a new value has to
be done in accordance with the type of the property, i.e. it is the responsibility of the setValue() method that the stored value is of the same type as
returned by getType(). When reading a value, an object of the Java base class
java.lang.Object is returned. The receiver of this value may assume that it
is of the expected type.
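To make this contract concrete, the following is a minimal sketch of a property implementation. The interface is cut down to the user-view methods of Fig. 39; everything beyond those method names (the names SimpleProperty and IntProperty, the use of null in the role of undef) is an illustrative assumption, not MCS code:

```java
// Illustrative sketch: a property whose type is fixed by the implementation
// and whose setValue() enforces the getType() contract. Names other than
// the Fig. 39 methods are assumptions for this example.
interface SimpleProperty {
    void setName(String name);
    String getName();
    Class<?> getType();
    void setValue(Object value);
    Object getValue();
}

class IntProperty implements SimpleProperty {
    private String name;   // renamable, to resolve conflicts on import
    private Object value;  // null plays the role of "undef"

    IntProperty(String name) { this.name = name; }

    public void setName(String name) { this.name = name; }
    public String getName() { return name; }

    // The type is defined by the implementation; it is not settable.
    public Class<?> getType() { return Integer.class; }

    // setValue() is responsible for keeping the stored value consistent
    // with the type returned by getType().
    public void setValue(Object v) {
        if (v != null && !getType().isInstance(v))
            throw new IllegalArgumentException("expected " + getType());
        this.value = v;
    }
    public Object getValue() { return value; }
}
```

Renaming such a property (e.g. val to value, as in the import scenario above) changes only its name; the type and the stored value are untouched, which is precisely why renaming keeps properties compatible.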
During the integration phase, the method checkReferences() will be
called for all properties of the Template Montages. It is responsible for finding all
properties that are referred to by the initialisation rule. The argument language
provides access to the other Montages of the specified language. If a
property cannot be found, an exception will be thrown.
The counterpart of checkReferences() in the static semantics phase is
the method resolveReferences(). It is called for the properties of Instance
Montages. It will resolve the property references in the initialisation rules by
finding the target properties in the AST. In order to do so, it needs access to the
current Instance Montage, which is given by the argument montage.

Figure 40: Dependencies among properties. A solid arrow A → B means A refers to B; a dashed arrow indicates the dataflow.

The property is registered with the target property. By doing so, a list of dependent
properties is built up in each target property. These reversed references can
also be seen as dataflow arrows, as the computed values will flow along these
references. Fig. 40 illustrates these two dependencies among properties. The
solid arrows indicate references between properties, whereas the dashed arrows
indicate the dataflow, i.e. initialisation of properties is done along the dashed
arrows.
For determining the firing order of the initialisation rules, some helper methods
are needed. isReadyToFire() will indicate whether all referred properties are
available, i.e. whether their values have been computed (are not undef). When, during the firing of the initialisation rules, a referred property becomes available, this will be
notified through the markReady() method. Its argument tells which property
has become available. In order to traverse the dependency graph of the properties, it is important to have access to the dependent properties. This is
granted by the java.util.Iterator object that is returned by the
dependent() method. Note that internally, each property will probably have a list of
references in order to process the resolveReferences() method efficiently.
Finally, initialize() invokes the initialisation rule. It is completely up to
the implementation how this rule is processed. The only requirement is that
the value will be initialized with an object of the expected type (i.e. Java class).
Firing the Initialisation Rules. The concept of firing the initialisation rules of
all Montages has been explained in section 4.6.1, and now we want to describe
the announced algorithm. Topological sorting can be done in O(|P| + |R|)
[Wei99], with P being the set of all properties and R being the set of all references among them. Resolving references and initializing properties can be done
with the same algorithm. Note that O(|P| + |R|) is the runtime complexity of
void topsort(Montage montage)
        throws StaticSemanticsException {
    Stack s = new Stack();
    Set toProcess = new TreeSet();
    Property p, q;
    for each property p {
        toProcess.add(p);
        p.resolveReferences(montage);
        if (p.isReadyToFire()) {
            s.push(p);
        }
    }
    while (!s.empty()) {
        p = s.pop();
        toProcess.remove(p);
        p.initialize();
        for each q adjacent to p {
            q.markReady(p);
            if (q.isReadyToFire()) {
                s.push(q);
            }
        }
    }
    if (toProcess.size() > 0) {
        throw new CycleFound(toProcess);
    }
}

Figure 41: Pseudocode to perform initialisation of properties
the algorithm given in Fig. 41. It does not consider any additional runtime
effort of the initialisation rules. As we cannot influence the user-defined initialisation rules, an efficient determination of the firing order is all the more important.
The algorithm defines two temporary stores, a stack and a set. The stack will
contain all properties which are ready to fire, and in the set we store all properties that still have to be processed. First, the algorithm iterates over all properties and adds them to the set of unprocessed properties. Each of them will have
to resolve its references. The properties that are ready to fire in the first place
(e.g. because they contain constants) will be stored on the stack.
The following while loop pops a property from the stack (and removes it
from the set of unprocessed properties as well). Then its initialisation rule is
called. As the value of the property is available thereafter, we can notify all
dependent properties of this fact. If during this notification such a property
reports to have all data available now, it is pushed on the stack.
In the end, we check if all properties were processed. If not, an exception will
be thrown, containing all the unprocessed properties. This information will
help to locate the circular references.
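The algorithm of Fig. 41 can be sketched as self-contained, runnable Java. The Property machinery is replaced here by plain maps over property names (an illustrative simplification, not MCS code): refs maps each property to the properties its initialisation rule refers to, and the reversed references play the role of the dependent lists:

```java
import java.util.*;

// Illustrative, self-contained version of the firing algorithm of Fig. 41.
// Properties are reduced to names; "initializing" a property simply appends
// it to the resulting firing order.
class FiringOrder {
    static List<String> topsort(Map<String, List<String>> refs) {
        Map<String, Integer> missing = new HashMap<>();      // unresolved inputs
        Map<String, List<String>> dependents = new HashMap<>();
        Deque<String> ready = new ArrayDeque<>();            // "stack" of Fig. 41
        Set<String> toProcess = new TreeSet<>(refs.keySet());

        for (String p : refs.keySet()) {
            missing.put(p, refs.get(p).size());
            for (String target : refs.get(p))                // reversed reference
                dependents.computeIfAbsent(target, k -> new ArrayList<>()).add(p);
            if (refs.get(p).isEmpty()) ready.push(p);        // constants fire first
        }
        List<String> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            String p = ready.pop();
            toProcess.remove(p);
            order.add(p);                                    // "initialize()"
            for (String q : dependents.getOrDefault(p, List.of())) {
                missing.merge(q, -1, Integer::sum);          // markReady(p)
                if (missing.get(q) == 0) ready.push(q);      // isReadyToFire()
            }
        }
        if (!toProcess.isEmpty())                            // circular references
            throw new IllegalStateException("cycle: " + toProcess);
        return order;
    }
}
```

For refs = {c → [a, b], a → [], b → [a]} the method yields an order in which a fires before b and b before c; a cyclic map raises the exception, mirroring CycleFound, and its message carries the unprocessed properties that locate the cycle.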
5.3.10 Symbol Table Implementation
Insertions to the symbol table must not destroy the contained information. As
an example we have seen two identifiers with the same name but in different
scopes (Fig. 22, p. 64). The insertion of the inner symbol i may only shadow
the outer declaration for the subtree rooted at node 4 but not for the rest of the
nodes in the AST.
This can be achieved by several implementations. The most unimaginative
one would be to copy the data such that we have a symbol table for each node.
This would be a considerable - if not unrealizable - memory overhead.
A complex but more memory efficient approach is the organisation of the
symbol table as a binary tree structure. Consider, for example, the search tree in
Fig. 42a, which represents the symbol table entries (e.g. type name and associated declaration node):

Truck → 1, Bike → 2, Car → 3
We can add the entry Bus → 5, creating a new symbol table rooted at node
Bike in Fig. 42b without destroying the old one. If we add a new node at depth
d of the tree, we must create d new nodes - better than copying the whole tree.
Using the Java class java.util.TreeMap, which is based on a Red-Black tree
implementation, we can guarantee that d < log(n), where n is the number of
entries in the last symbol table. This implementation is described in more
detail in [App97].
Figure 42: Binary search trees implementing a symbol table. (a) The tree for Truck → 1, Bike → 2, Car → 3; (b) the new tree after adding Bus → 5, sharing the unchanged nodes with (a).
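Path copying in such a persistent search tree can be sketched as follows. This is an illustrative, unbalanced BST, not the thesis implementation; to obtain the logarithmic depth bound one would balance the tree, e.g. with the Red-Black scheme of [App97]:

```java
// Illustrative sketch of a persistent (path-copying) binary search tree:
// insertion creates d new nodes along the search path and shares the rest,
// so the old symbol table remains valid for the outer scope.
final class PTree {
    final String key; final int decl;      // e.g. type name -> declaration node
    final PTree left, right;
    PTree(String k, int d, PTree l, PTree r) { key = k; decl = d; left = l; right = r; }

    static PTree insert(PTree t, String k, int d) {
        if (t == null) return new PTree(k, d, null, null);
        int c = k.compareTo(t.key);
        if (c < 0) return new PTree(t.key, t.decl, insert(t.left, k, d), t.right);
        if (c > 0) return new PTree(t.key, t.decl, t.left, insert(t.right, k, d));
        return new PTree(k, d, t.left, t.right);   // shadow the old binding
    }
    static Integer lookup(PTree t, String k) {
        if (t == null) return null;                // not declared in this scope
        int c = k.compareTo(t.key);
        return c == 0 ? t.decl : lookup(c < 0 ? t.left : t.right, k);
    }
}
```

Inserting Bus → 5 into the table of Fig. 42a creates only the nodes on the search path; lookups through the old root still see the original table, which is exactly the shadowing behaviour required for nested scopes.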
Chapter 6
Related Work
Programming language processing is one of the oldest disciplines in computer
science, and therefore a wide variety of different approaches and systems has
been proposed or implemented. In this chapter we present a selection of them
and compare them to our system. The first three sections cover closely related
projects, which can be seen as competitors to our Montage Component System. Section 6.4 comprises the traditional compiler construction approaches towards language specification. Section 6.5 briefly compares three different
component models that are in widespread use and explains their pros and cons
for our project. Finally, section 6.6 concludes this chapter with some remarks
on two projects in our institute that influenced design decisions and the way
MCS evolved.
6.1 Gem-Mex and XASM
The system related closest to MCS is Gem-Mex, the Montage tool companion.
Montages are specified graphically using the Gem editor [Anl] (a pleonasm, as
Gem stands for Graphical Editor for Montages). Mex [Anl] (Montage Executable Generator) then transforms the Montages specifications into XASM
[Anl00] code. XASM again is transformed into C code, which in turn can be
compiled to an executable format. All specifications in the Montages are given in terms of XASM rules. As already mentioned above, ASMs are the underlying model of computation in Montages.
Gem is a simple editor with hardly any knowledge about the edited entities.
All textual input remains unchecked, whereas very limited integrity checks are
performed when editing in the graphical section occurs. E.g. there is only one
static semantics frame allowed. A Montage specification is stored in an intermediate textual format for further processing. Mex simply transforms the different entities of the intermediate format into an ASM representation.
Separating the editor from processing has the advantage that the tools are
largely independent of each other. Changes in one tool will affect the others
only if the intermediate representation has to be adapted to reflect the change.
On the other hand, the compiler has to reconstruct much of the information
that was originally available in the editor, e.g. the nesting and connecting of
non-terminals, lists and actions.
The centrepiece of this system is the XASM (Extensible ASM) environment.
XASM offers an improved module concept over the ASM standard [Gur94,
Gur97]. The macro concept known in pure ASMs turns out not to be very useful when building larger systems: this simple text replacement mechanism
does not provide encapsulation, information hiding or namespaces,
properties essential to reuse and federated program development.
The component concept of XASM addresses this issue. Component-basedness is achieved through «calling conventions» between ASMs (either as a sub-ASM or as a function, refer to [Anl00] for further details) and through
extended declarations in the header of each component-ASM. These declarations do not only announce the import relations but also contain a list of
the functions that are expected to be accessible from an importing component.
XASM components may be stored in a library in compiled form.
How does MCS distinguish itself from Gem-Mex/XASM?
• MCS is based on Java as a specification language instead of ASMs. Expressiveness and connectivity are therefore bound to Java's and JavaBeans' possibilities. The advantage of this approach is that the full power of the existing
Java libraries is available from the beginning. In an ASM environment, there
are only very few third-party libraries available. To circumvent this, one has
to interface with C libraries. By doing so, the advantages of ASMs over Java
(simplicity of the model of computation, implicit parallelism, formal background) are given up.
• Components are partially reconfigurable at run time. Reconfiguration of
XASM components can only be achieved through recompilation.
• The Gem-Mex system uses Lex & Yacc to generate a parser for a specified
language. It implements the horizontal partitioning model (see
subsection 3.1.1 on p. 24). XASM components can therefore not be
deployed on the level of Montages. They can, however, be called in the static
or dynamic semantics in terms of library modules.
• Gem-Mex has a fixed built-in traversal mechanism of the abstract syntax
tree. This may force the user to implement artificial passes. MCS on the
other hand uses a topological sort on the dependency graph of the attributes
in a language specification. The user does not have to care about the order of
execution.
6.2 Vanilla
The Vanilla language framework [DNW+00] supports programming language construction on the basis of components, as our MCS does. Vanilla's aim is
very similar to ours, namely to support re-use of existing components in language design. The Vanilla team identified the same shortcomings in traditional
compiler design and language frameworks as we did in chapters 1 and 3. Not
surprisingly, their motivation is almost identical to ours. Their interests also
focus on domain specific languages, but no detailed papers on this topic could
be found. The Vanilla framework is implemented in Java and thus uses Java's
component model.
In Vanilla, the entity of development is called a pod. A pod corresponds roughly to a Montage. It specifies e.g. type checking features, run-time behaviour and I/O. Component interaction in the Vanilla framework occurs on the
level of pods. Pods in turn are built of a group of objects interacting on the method
call level.
The Vanilla team implemented some simple languages (Pascal, 0-2) and
gained some interesting experience from these test cases. They describe that the
degree of orthogonality between individual programming constructs was surprisingly high. They expected considerable overlap between components but
discovered that in reality there are remarkably few constraints on how language features may be combined. This finding was very encouraging to us, as it
marked a first (independent) confirmation of our own impression of language composition. If there are so many similarities, the following question arises:
How does MCS distinguish itself from Vanilla?
Vanilla is basically a collection of Java libraries that facilitate the generation of
interpreter components. There is no graphical user interface nor any model
behind the language specification. Specifying a new pod is merely done by programming the desired behaviour in some Java classes. These classes may be
inherited from some Vanilla base classes, or they must follow a certain syntax in
order to be processed by some Vanilla tools (e.g. the parser generator). The
type-checking library contains some sophisticated classes that support free variables, substitution, sub-typing etc. We therefore think that Vanilla is only
suited for compiler construction professionals. Although re-use is encouraged
to a high extent, one still must have a wide knowledge of type-checking techniques (as an example) to successfully make use of the library pods. The fact
that there is no preferred model behind the Vanilla pods adds to the amount of
knowledge and experience a user should have. This freedom will probably ask
too much of a programmer untrained in the field of language design and implementation. In Montages, users have to follow a certain scheme (e.g. use control
flow charts), but the intention is to simplify the model of Montages, and thus
to simplify its use.
6.3 Intentional Programming
The pivot in modern compiler architecture is the intermediate representation.
It is generated by the front-end, it is language independent and serves as input
for the target code-generating back-end. If well designed, the intermediate representation is one of the slowest evolving parts in compiler suites. So why not
use the intermediate representation for programming instead of struggling
with the pitfalls of concrete syntax? This is the core idea of Intentional Programming (IP¹) [Sim96, Sim99], a project at Microsoft Research.
IP tries to unify programming languages on a level that is common to all of
them: the intention behind language constructs. A loop can be written in many
different ways using many different programming languages, e.g. in terms of a
for or while statement in C++, a LOOP statement in Oberon or a combination of labels and GOTOs in Basic. They all share the same intention, namely to
repeatedly execute a certain part of a program. In IP, a loop could still be represented in several different ways, but with no concrete syntax attached. Such an
abstraction can be manipulated by the programmer directly by changing the
abstract syntax tree.
Charles Simonyi, project leader of IP, summarizes his vision of programming
in the future as an "Ecology of Abstractions" [Sim96]. Abstractions are the information carriers of the evolving ideas, comparable with the genes in biology.
While the individuals of the biological ecology serve as "survival machines" for
genes, programs and programming components are the carriers of abstractions.
Programming will be the force behind this ecology. Market success, reusability,
efficiency, simplicity, etc. will be some of the criteria for the selection process of
the "survival of the fittest".
¹ Unfortunately, IP is becoming more and more an overloaded term. We are aware that in computer
science IP will usually be associated with "Internet Protocol" or sometimes with "Intellectual
Property". However, we are using IP here the same way as it is also used in the referenced papers.
Concrete syntax in this environment can be compared to the personal settings of the desktop environment of a workstation. Each programmer working with IP will define his own preferences for the transformation of abstractions
into a human readable form. IP claims to solve many problems with legacy code as well. There exist parsers for the most important (legacy) programming
languages, which transform the source code into IP abstractions (basically an
AST). Once this transformation is applied, the program can be manipulated, extended, debugged etc. completely within the IP system. There will be no
need to manipulate such a program with text editors any more. If required, a
textual representation can be generated for any programming language.
How does MCS distinguish itself from Intentional Programming?
Code generation in IP is done by a conventional compiler back-end. This
means that abstractions have to be transformed into a suitable representation
called "R-code". This reduction (as the transformation is referred to) is given in
terms of IP itself. The MCS system uses Java in the place of R-code; the reduction (static semantics rules in MCS) is given in terms of Java. Boot-strapping
the MCS system using IP would be possible, whereas the R-code centred architecture of IP would prevent a boot-strapping of IP using MCS.
The intermediate language plays a less important role in MCS. The system
would also be operational if some Montages were specified using Java and some
using COBOL. This is because interoperability is specified on a higher level of
abstraction, namely on the component level instead of the machine code level.
The architecture of MCS does not suppress a multi-language, multi-paradigm approach. The only requirement is that Montage components can communicate with each other using some component system (see also section 6.5).
Programmers of legacy languages could still use their knowledge to specify
Montages. MCS is a tool allowing a smooth shift from language based coding
to IP. IP is an option in MCS (although an important one) and not a handicap.
In contrast, the IP approach means that, when existing legacy programs have
to be embedded into the IP framework, the legacy language has to be specified
first, and then a legacy parser has to be adapted to produce an appropriate IP
AST instead of its conventional AST. In other words, there has to be a converter for each legacy language of which we want to reuse code. This approach
invalidates much of the legacy programmer's knowledge. After the conversion
to IP, his former language skills are no longer needed. Even worse, he may not
be well educated enough to cope with the paradigm shift from concrete language based programming to the more abstract tree based programming. The
acceptance of the IP approach will thus be limited to organisations that can
afford the radical paradigm shift.
6.4 Compiler-Construction Tools
Although compiler construction is well understood and widely practised, there
are surprisingly few tools and systems that are in common use. Two of the most
popular ones are Lex [Les75] and Yacc [Joh75]. They were constantly
improved, and their names stand as a synonym for front-end generators.
Many other tools never attracted such a large community. This is mainly due
to the steep learning curve associated with the large number of different syntaxes, conventions and options to control them. The following subsections
therefore do not present isolated tools, but compiler construction systems,
which (more or less) smoothly integrate the tools to obtain better interoperability.
6.4.1 Lex & Yacc
Hardly any text on programming language processing can ignore Lex and Yacc.
Their success is based on the coupling of the two tools — Lex was designed to
produce lexical analysers that could be used with Yacc. Although they were not
the first tools of their kind², bundling them with Unix System V certainly
helped to make them known to many programmers. The derivative JLex is
based upon the Lex analyser generator model. It takes a specification similar to
that accepted by Lex and creates a Java source file for the corresponding lexical
analyser.
MCS' lexical analyser is in fact a subset of JLex [Ber97]. Instead of generating Java code, it builds a finite automaton in memory and uses it for scanning
the input stream. The regular expressions accepted for specifying tokens are
identical to those of Lex.
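The idea of driving the scanner from an in-memory automaton rather than from generated code can be sketched with a tiny hand-built DFA. The token set (IDENT, NUMBER) and all names are illustrative assumptions, not the MCS scanner, which constructs its automaton from Lex-style regular expressions instead:

```java
import java.util.*;

// Illustrative in-memory scanner: a hand-built DFA over letters and digits,
// recognising IDENT and NUMBER tokens with maximal-munch scanning.
class TinyScanner {
    // states: 0 = start, 1 = in identifier, 2 = in number, -1 = dead
    static int step(int state, char c) {
        boolean letter = Character.isLetter(c), digit = Character.isDigit(c);
        switch (state) {
            case 0:  return letter ? 1 : digit ? 2 : -1;
            case 1:  return (letter || digit) ? 1 : -1;
            case 2:  return digit ? 2 : -1;
            default: return -1;
        }
    }
    // token type of an accepting state, or null
    static String accept(int state) {
        return state == 1 ? "IDENT" : state == 2 ? "NUMBER" : null;
    }

    static List<String> scan(String input) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            if (Character.isWhitespace(input.charAt(pos))) { pos++; continue; }
            int state = 0, lastAccept = -1, lastEnd = pos;
            // run the automaton as far as possible, remembering the last accept
            for (int i = pos; i < input.length() && state != -1; i++) {
                state = step(state, input.charAt(i));
                if (accept(state) != null) { lastAccept = state; lastEnd = i + 1; }
            }
            if (lastAccept == -1)
                throw new IllegalArgumentException("bad character at " + pos);
            tokens.add(accept(lastAccept) + "(" + input.substring(pos, lastEnd) + ")");
            pos = lastEnd;
        }
        return tokens;
    }
}
```

Swapping the token set then means building a different transition table, not regenerating and recompiling scanner source code, which is what makes the in-memory approach attractive for a system that composes languages at run time.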
Yacc is a LALR [ASU86] parser generator. It is fed with a parser specification
file and creates a C program (y.tab.c). This file represents an LALR parser plus
some additional C functions that the user may have added to the specification.
These functions may support e.g. tree building or execute simple actions when
a certain production has been recognized. The input file format of both Lex
and Yacc has basically the following form:
declarations
%%
translation rules
%%
auxiliary C functions
² As the name of Yacc (Yet another compiler compiler) implies.
Declarations are either plain C declarations (e.g. temporary variables) or specifications of tokens (Yacc) and regular definitions (Lex), respectively. The
declared items may be used in the subsequent parts of the specification.
The translation rules basically associate a regular expression or production
rule with some action to be executed as soon as the expression is scanned or the
production has been parsed, respectively. Actions are also given in terms of C
code.
The third part contains additional C functions to be called from the actions.
In a Yacc specification, the third part will also contain an #include directive
which links Lex's tokens to Yacc's parser.
Lex and Yacc generate monolithic parsers, i.e. they support horizontal modularization. Their support for compiler construction is limited to the first two
phases (lexical and syntax analysis, see Fig. 1, p. 2). Semantic analysis and code
generation have to be coded by hand. Providing semantics in the actions of the
rules, however, is only feasible for simple languages without complex type systems. The generation of source code binds the compiler implementor to C, and
even more to Unix, where these tools are available on almost every machine. On
the other hand, the problem of a steep learning curve is somewhat weakened by this
approach. There are only two new notations to be learned: regular expressions and BNF-style productions. Both are so fundamental to compiler construction and computer science education in general that they can hardly be
accounted for steepening the learning curve.
6.4.2 Java CC
The Java community soon started to develop its own compiler construction
tools. JLex [Ber97], JFlex [Kle99] and CUP [Hud96] are Java counterparts to
Lex, Flex and Yacc, respectively. At Sun Labs a group of developers followed a
different approach called Java CC³.
The Java CC utility follows a contrary approach: instead of decorating the
specification with C or Java code, Java is extended with some constructs to
declare tokens. The syntax is not specified in terms of BNF or EBNF rules, but
Java CC uses the new method modifier production to mark a Java method
as a production. Java CC is merely a preprocessor that transforms this extended
Java syntax into pure Java. Information about the lexical analyser and the syntax is extracted from the Java CC input file and distilled into an additional
method that starts and controls the parsing process.
³ This group of developers founded Metamata, a company specializing in debugging and productivity tools around Java. Their freeware tool Metamata Parse is the successor of Java CC.
Java CC will generate top-down LL(k) parsers; LALR grammars have to be
transformed first. The top-down approach simplifies the parsing algorithm in
such a manner that the non-terminals on the right hand side of EBNF rules
represent calls to corresponding methods. Deciding on which rule to follow in
the case of a synonym rule does not have to be implemented by the user since
Java CC will add the necessary code.
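The correspondence between right-hand-side non-terminals and method calls can be illustrated with a small hand-written recursive-descent parser (illustrative code, not Java CC output) for the rule Sum ::= Number ("+" Number)*:

```java
// Illustrative recursive-descent parser for Sum ::= Number ("+" Number)*.
// Each non-terminal becomes a method; a non-terminal on a right hand side
// is simply a call to the corresponding method.
class SumParser {
    private final String input;
    private int pos = 0;
    SumParser(String input) { this.input = input; }

    // Sum ::= Number ("+" Number)*
    int sum() {
        int value = number();            // call for the non-terminal Number
        while (pos < input.length() && input.charAt(pos) == '+') {
            pos++;                       // consume "+"
            value += number();
        }
        return value;
    }

    // Number ::= digit+
    int number() {
        int start = pos;
        while (pos < input.length() && Character.isDigit(input.charAt(pos))) pos++;
        if (start == pos) throw new IllegalArgumentException("digit expected at " + pos);
        return Integer.parseInt(input.substring(start, pos));
    }
}
```

In a generated parser the structure is the same; the generator additionally inserts the lookahead code that chooses between alternative rules, which is the part the user is spared from writing.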
Java CC's support of compiler construction is also limited to the scanning
and parsing phase. The philosophy behind this tool differs from Lex/Yacc in
the way specifications are given. Java CC directly exploits the skills and experience of Java programmers. The learning curve is minimal, as the user is confronted with even fewer new constructs than in Lex/Yacc. The notation for
regular expressions keeps very much to Java's syntax, and the top down
approach in parsing is easier to understand for a compiler novice than the table
driven LALR systems. The obviously strong ties to Java restrict the deployment
of Java CC to Java environments. As virtual machines exist for any combination of major hardware platforms and operating systems, this is only a restriction in terms of usability of the generated code.
6.4.3 Cocktail
In [GE90] Grosch and Emmelmann present «A Tool Box for Compiler Construction», which is also known as «Cocktail» [Gro94]. It contains tools for
most of the important compiler phases, including support for attribute grammars and back end generation. Cocktail is a collection of largely independent
tools, resulting in a large degree of freedom in compiler design.
Each of these tools features a specification language which may contain additional target language code (Modula-2 or C). The implementors are aware of
the fact that such target code will make it impossible for the tools to perform
certain consistency checks (e.g. Ag - the attribute evaluator generator - cannot
guarantee that attribute evaluations are side-effect free). Nevertheless, they
argue that the advantages outweigh this disadvantage: practical usability, e.g.
interfacing with other tools, and a flatter learning curve, as e.g. conditions and
actions can be provided in a standard language.
6.4.4 Eli
The Eli system [GHL+92] - as Cocktail - is basically a suite of compiler construction tools. There is an important difference, however: Eli provides a
smooth integration of these tools. Integration is achieved by an expert system,
called Odin [CO90], helping the user to cope with all these tools. One does
not have to care about matching the output of one tool to the input of another.
Thus, tools developed by different people with different conventions can be
combined into an integrated system. This integration works also if the tools are
only available in an executable format.
To add a new tool to the Eli system, only the knowledge base of the expert
system has to be changed. The knowledge base manages tool dependencies, data transformations between tools and complex user requests. Dependencies are represented by a derivation graph. A node in this graph represents a manufacturing step: the process of applying a particular tool to a particular set of
inputs and creating a set of outputs. A normal user of Eli does not have to deal
with the knowledge base; it will be updated by the programmers of the tools
when they add or remove them.
The major goal of the Eli project is to reduce the cost of producing compilers. The preferred way to construct a compiler is to decompose this task into a series of subproblems. Then, to each of these subproblems a specialized tool is applied. Each of these tools operates on its own specialized language.
Eli uses declarative specifications instead of algorithmic code. The user has to describe the nature of a problem instead of giving a solution method. The application of solutions to the problem is performed by the tool. The aim is to relieve the user as much as possible from the burden of dealing with the tools; he should be able to concentrate on the specification of the compiler.
Eli's answer to the steep learning curve is somewhat ambivalent. On the one hand, the expert system relieves the user from fiddling around with formats, options and conventions. But on the other hand, mastering Eli also means mastering many specialized descriptive languages. And what seems very convincing at first glance may be a major hurdle to the use of this tool suite: many programmers are educated in operational languages only and therefore have difficulties in mastering the new paradigms associated with declarative specifications.
6.4.5 Depot 4
Depot4 [Lam98] is a system for language translation, i.e. an input language is translated into an output language. There are no restrictions on these languages other than that they must be representable as object streams. This idea was influenced by the Oberon system, the original implementation platform for Depot4 (it was later ported to Java due to the better availability of Java Virtual Machine platforms). Texts in Oberon can easily be extended to object streams, thus allowing Depot4 to act as a fairly general stream translation/conversion system. However, Depot4 is designed as an application generator rather than a traditional compiler. This means that programming in the large (i.e. assembling different modules and specifying operations between them by providing a DSL) is the preferred use. Although not impossible, machine instruction generation would be a hard task, as supporting mechanisms and tools are missing.
EBNF rules can be annotated with operations given in a metalanguage (M14). Nonterminals on the right-hand side of each EBNF rule are treated as procedure calls to their corresponding rules. This approach implies a predictive parser algorithm; of course, grammars have to be free of left recursion. M14 provides the programmer with some predefined variables which keep track of the repetition count, whether an option exists, or which alternative was chosen in a synonym rule.
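The idea of treating nonterminals as procedure calls can be illustrated by a minimal sketch, assuming a toy grammar (this is our own illustration, not Depot4 or M14 code; all names are invented). Each EBNF rule becomes a method, each nonterminal on a right-hand side becomes a call to the corresponding method, and an EBNF repetition maps to a loop:

```java
// Sketch of a predictive, recursive-descent parser for the toy grammar
//   Add  ::= Term { AddOp Term }.
//   Term ::= digit.
// Each rule is one method; nonterminals become procedure calls.
public class PredictiveParser {
    private final String input;
    private int pos = 0;

    PredictiveParser(String input) { this.input = input; }

    // Add ::= Term { AddOp Term }.  The EBNF repetition {...} maps to a loop.
    int parseAdd() {
        int value = parseTerm();
        while (pos < input.length() && (peek() == '+' || peek() == '-')) {
            char op = input.charAt(pos++);   // AddOp
            int rhs = parseTerm();           // nonterminal -> procedure call
            value = (op == '+') ? value + rhs : value - rhs;
        }
        return value;
    }

    // Term ::= digit.  A real parser would handle numbers, parentheses, ...
    int parseTerm() {
        char c = input.charAt(pos++);
        if (c < '0' || c > '9') throw new IllegalArgumentException("digit expected");
        return c - '0';
    }

    private char peek() { return input.charAt(pos); }

    public static void main(String[] args) {
        System.out.println(new PredictiveParser("7+2-3").parseAdd()); // prints 6
    }
}
```

The loop in parseAdd also shows why such grammars must be free of left recursion: a rule like Add ::= Add AddOp Term would make parseAdd call itself before consuming any input.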
Like MCS, Depot4 addresses the (occasional) implementor of DSLs who does not have the extensive experience of a compiler constructor. It aims to support fast and easy creation of language translators, without trying to compete with full-blown compiler construction suites.
Depot4's similarities to MCS:
• EBNF is used for syntax specification.
• The system's parser is vertically partitioned; the concept of modularization of specifications is the same.
• Language specifications are precompiled and loaded dynamically on demand.
Depot4 does not support:
• symbol table management
• semantic analysis
• intermediate code or machine instruction generation
6.4.6 Sprint
Sprint is a methodology [CM98, TC97] for designing and implementing DSLs, developed in the Compose project at IRISA, Rennes. Sprint's formal framework is based on denotational semantics. In contrast to the approaches presented above, it does not so much feature specialized tool support, but rather sketches how to approach the development of a DSL. Following the Sprint methodology, DSL development undergoes several phases:
Language analysis: Given a problem family, the first step includes an analysis of the commonalities and variations of the intended language. This analysis should identify and describe the objects and operations needed to express solutions to the problem family.
Interface definitions: In the next phase, the design elements of the language are refined, the syntax is defined and an informal semantics is developed. This semantics relates syntax elements to the objects and operations identified in the previous step. Another aspect of this phase is to form the signatures of the semantic algebras by formalising the domain of objects and the types of operations.
Staged semantics: During this phase, static semantics has to be separated from dynamic semantics.
Formal definition: The semantics of the syntactic constructs is defined in terms of valuation functions. They describe how the operations of the semantic algebras are combined.
Abstract machine: The dynamic semantic algebras are then grouped to form an abstract machine which models the behaviour of the DSL. Denotational semantics provides an interpretation of the DSL in terms of this abstract machine.
Implementation: The abstract machine is then implemented, typically by using libraries. The valuation function can either be implemented as an interpreter running on the abstract machine or as a compiler generating abstract machine instructions.
Partial evaluation: To automatically transform a DSL program into a compiled version (given an interpreter), partial evaluators can be applied.
The Sprint framework does not need specific tool support, as it relies on proven techniques. As the above phases would also apply to a general purpose language, tools supporting these can be employed. Techniques to derive implementations from definitions in denotational semantics can be adapted to the Sprint approach. For the implementation, standard software libraries and available partial evaluators are used.
This form of reuse helps to speed up the development of a new DSL considerably. In contrast to our approach, reuse is undertaken from a global view, i.e. reuse is employed only after all the parts of the DSL are specified (in the implementation step). At this late point, reuse involves an implicit analysis phase, as the implementor first has to find appropriate libraries or implementations of similar abstract machines. For in-house development, this approach will meet its expectations, whereas it would be difficult to share implementations between different organisations. Thus, it would hardly be suitable for an open development environment as discussed in chapter 2. This is due to the fact that the formal definition does not describe the behaviour of what is reused later on (the software library). In our approach, the entity of reuse is also the entity of specification. This increases the confidence of a client of a Montage in its semantics, whereas there is no such direct link between the semantics of a valuation function and a library function, for example.
6.5 Component Systems
After the success of object-oriented software development in the 1980s, the advent of component systems began in the 1990s. Among the many component systems proposed, only a few succeeded in software markets. As we have pointed out in chapter 2, components vitally rely on a market to be successful. Thus we focus on the three most widespread component systems. We will give a very short overview of each in order to be able to discuss it with respect to MCS. Detailed comparisons of the three systems can be found in [GJ97, Szy97]. All three systems provide a kind of 'wiring standard' in that they standardize how components interact with each other. Each has its own background and market.
6.5.1 CORBA
Overview. CORBA (Common Object Request Broker Architecture) was developed by the Object Management Group (OMG), a large consortium with more than 700 member companies. It mainly focused on enterprise computing. This background is also reflected in the architecture of the system: it is the most inefficient and complex approach in our comparison. But being independent from a large software house (in contrast to Sun's JavaBeans or Microsoft's COM) also has advantages. To name two: (i) a wide variety of hardware and software platforms is supported and (ii) interface definitions are more stable, as many partners have to agree on a change of standards.
CORBA was developed to provide interaction between software written in different languages and running on different systems. Its architecture is shown in figure 43. The basic idea was to provide a common interface description language (IDL) which allows one to specify the interface that a piece of software provides. Compilers for this IDL generate object adapters (called stubs) that convert data (in this case the identifiers, parameters and return values of procedure calls) to be understood by an object request broker (ORB). It is this broker's responsibility to redirect invocation requests to corresponding objects (which provide methods that can process the requests). So basically an ORB can be seen as a centralised method invocation service. As such it can provide additional services like an object trader service, an object transaction service, an event notification service, licensing services and many more (standardised by the Object Management Architecture (OMA)).
Figure 43: CORBA architecture and services model (CORBA facilities not shown)
CORBA as a platform for MCS. The central role of an ORB gives the user the most explicit view of the communication network of a system. Specific Montage services could be implemented as OMA services, thus simplifying the architecture of MCS. Examples are: the object trader service, which offers search facilities for services, could be extended to find suitable Montages. Additional services could be implemented, like parsing and scanning services; they could provide specialised parsing techniques (LL, LALR). A table management service could implement centralized table management for a language.
CORBA was not chosen for a prototype implementation because of its complexity. Setting up and running ORBs is non-trivial. Additional services often undergo tedious standardisation processes. We think that the MCS architecture is too fine-grained to be efficiently used in a CORBA environment. On the one hand, MCS builds a dense web of many simple components with only limited functionality (e.g. the parse tree nodes or the Montage properties and their firing rules). On the other hand, CORBA was designed to provide distributed services over networks, and thus network delays may slow down the system considerably. However, the independence of implementation and platform is a major advantage of CORBA. If MCS is commercially deployed, it would be worth watching the progress of CORBA. The standard Java libraries provide CORBA supporting classes. With improved hard- and software implementations of CORBA, it might be interesting to provide interfaces to this component system.
6.5.2 COM
Overview. COM (Component Object Model) is Microsoft's standard for interfacing software. It is a binary standard, which means it does not support portability (although a Macintosh port exists, which emulates the Intel conventions on Motorola's PowerPC). With the success of the Windows OS and the wide variety of software available, the need for inter-application communication increased. The main goal was to provide a system that allows applications written in different languages to communicate efficiently with each other. A binary standard for interfaces was established. COM does not specify what an object or a component is. It merely defines an interface that a piece of software might support. COM allows a component to have several interfaces, and an interface may be shared by different components. Interfaces are represented by COM objects. Fig. 44 shows a COM object Pets, featuring the interfaces IUnknown, IDog and ICat (this is just a simple example). The IUnknown interface is mandatory and serves to identify the object. Every interface has a first method called QueryInterface. Its purpose is to enable introspection. The next two methods shown, AddRef and Release, support reference counting. The last method (Bark or Meow) symbolizes additional methods that a component might export. COM was designed to run with different programming languages, most of them not supporting introspection and garbage collection. The omissions in the language have to be made up for by forcing the programmer into rigid coding schemes (e.g. exact rules for when reference counting methods have to be called), which is also reflected in every interface specification of a COM component (see Fig. 44).
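The reference counting discipline can be sketched as follows. This is our own illustration in Java, not real COM code (real COM is a binary standard used from C or C++); only the method names AddRef and Release are taken from COM:

```java
// Illustrative sketch of COM-style reference counting modelled in Java.
// In real COM, every client must follow this discipline manually:
// a forgotten Release() leaks the object, an extra one frees it too early.
public class RefCounted {
    private int refCount = 1;      // the creator holds the first reference
    private boolean freed = false;

    public int addRef() { return ++refCount; }

    public int release() {
        if (--refCount == 0) {
            freed = true;          // here a real object would free its resources
        }
        return refCount;
    }

    public boolean isFreed() { return freed; }

    public static void main(String[] args) {
        RefCounted obj = new RefCounted();
        obj.addRef();              // a second client takes a reference
        obj.release();             // first client done
        obj.release();             // second client done -> object freed
        System.out.println(obj.isFreed()); // prints true
    }
}
```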
COM as a platform for MCS. COM's binary interface philosophy makes it very unattractive for use in heterogeneous environments like the internet. As Microsoft does not have to consider a wide variety of interests, hardware and software platforms, changes and updates occur more often than in the other systems. ActiveX controls (very simplified: a collection of predefined COM interfaces) have undergone many updates in recent years. This might lead to (forced) updates in our prototype implementation, which we wanted to avoid. However, Microsoft's COM offers the fastest component wiring by far. Should a commercialised MCS face client demand for it, this should be considered. The loss of compatibility could be made up for by bridging ActiveX to JavaBeans.
Figure 44: A COM object and its internal structure. The implementation of the interfaces is very similar to the virtual method tables of C++.
This would allow Java versions to still run with an ActiveX implementation, although at a lower speed due to conversion and Java's slower execution. As this would penalize non-Windows clients (and probably scare them off), it is questionable whether a double implementation is worth the effort.
6.5.3 JavaBeans
Overview. JavaBeans is in many ways the most modern component system compared to CORBA and COM. Its main market is internet applications. Based on Sun's Java language, JavaBeans provides component interaction by following coding standards. Any Java object is potentially a component (a JavaBean) if it follows some naming conventions for its methods. These conventions and the packing of such components into a compressed file format (the so-called .jar format) are basically all there is behind JavaBeans. JavaBeans profits from the youthfulness of its base language Java. Many features of Java support the safety and ease of use of JavaBeans, as illustrated by the following examples: (i) automatic garbage collection prevents memory leaks in JavaBeans, whereas in COM an error-prone reference counting scheme has to be followed; (ii) interfaces are part of the Java language, while in CORBA and COM they have to be implemented following coding guidelines; (iii) the virtual machine representation of Java objects and classes allows introspection of their exported features at runtime, without additional implementation overhead.
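A minimal sketch of these naming conventions (the class and property names are invented for illustration): a class becomes a JavaBean simply by following the getX/setX pattern, and the standard java.beans.Introspector can then discover the property at runtime without any extra implementation work.

```java
import java.beans.BeanInfo;
import java.beans.Introspector;
import java.beans.PropertyDescriptor;

// A JavaBean by convention: get + Name / set + Name define a "name" property.
public class PersonBean {
    private String name;                        // backing field of the property

    public String getName() { return name; }    // getter: get + Name
    public void setName(String n) { name = n; } // setter: set + Name

    public static void main(String[] args) throws Exception {
        // Introspection, example (iii) above: discover properties at runtime.
        BeanInfo info = Introspector.getBeanInfo(PersonBean.class, Object.class);
        for (PropertyDescriptor pd : info.getPropertyDescriptors()) {
            System.out.println(pd.getName()); // prints: name
        }
    }
}
```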
What makes JavaBeans attractive might also be its major drawback: the tight coupling of the component system with the programming language. JavaBeans is not suited to integrating legacy code into a component framework (CORBA and COM both do better here). Portability is only supported through the Java Virtual Machine (JVM), which means that the implementation language has to be Java. CORBA and COM do not restrict the programming language; CORBA is also independent of the executing hardware platform. And last but not least, when it comes to execution speed, JavaBeans systems are much slower than COM implementations.
JavaBeans as a platform for MCS. We chose JavaBeans as our implementation platform. In our case, the advantages outweighed the disadvantages. The safety of Java's garbage-collected memory model, type system and language syntax was far more valuable to the development of MCS than speed, which was not a major concern. We believe that also for a commercial version of MCS, Java would be a good implementation platform, for the following reasons:
• Java neatly integrates with the internet (running as an applet).
• Java offers the best language support (safety of implementation).
• Legacy code is usually not a concern when developing new programming languages.
• Portability is cheap: i) it is "built-in" (on the basis of the Java virtual machine), in contrast to COM, which relies on Intel hardware, and ii) it does not require generating stubs for various platforms (as in CORBA).
• Speed will be a decreasingly urgent problem, due to better JIT compilers.
However, one drawback remains: Java is the only implementation language available for the JavaBeans component framework. Should this become a hindrance, other component systems should be investigated carefully (see the corresponding sections above).
6.6 On the history of the Montage Component System
The Montage Component System continues a long tradition of research in software engineering at our institute, the Computer Engineering and Networks Laboratory at ETHZ. Although in its contents MCS differs from the other projects, namely GIPSY, PESCA and CIP, its concept, design, and implementation were greatly influenced by them.
GIPSY. The GIPSY approach to software development [Mar94, Mur97, Sche98] widens the narrow focus on supporting tools. So-called Integrated Programming Support Environments (IPSE) go beyond supporting the programming task (editing, compiling, linking). Software development is embedded in the other business processes such as knowledge management, customer support or even human resources management [MS98].
Marti [Mar94] describes in his thesis GIPSY as a system that manages document flow in software development systems. Integrated into the system is GIPSY/L, a domain-specific language for describing document flows. Formal language definitions written in GIPSY/L are used to specify the documents' properties. Such definitions combine the documents' syntactical structure and semantic properties. From these specifications GIPSY generates attributed syntax trees. Using extensible attribute grammars [MM92], specifications for languages are extensible in an object-oriented way, i.e. specifications may inherit properties and behaviour from their base specifications.
The algorithm used to evaluate attributed syntax trees is the same as the one we use in our system: the partial order of the dependency graph spanned by the attributes in the tree is topologically sorted before evaluation.
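The evaluation scheme just described can be sketched as follows. This is our own simplified illustration, not GIPSY or MCS code; the attribute names are invented. Attribute dependencies form a partial order, which is topologically sorted so that every attribute is evaluated only after the attributes it depends on:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Topological sort (Kahn's algorithm) over an attribute dependency graph.
public class AttributeScheduler {
    // deps maps an attribute to the attributes that depend on it.
    static List<String> topoSort(Map<String, List<String>> deps) {
        Map<String, Integer> inDegree = new LinkedHashMap<>();
        for (String n : deps.keySet()) inDegree.putIfAbsent(n, 0);
        for (List<String> out : deps.values())
            for (String m : out) inDegree.merge(m, 1, Integer::sum);

        Deque<String> ready = new ArrayDeque<>();
        for (Map.Entry<String, Integer> e : inDegree.entrySet())
            if (e.getValue() == 0) ready.add(e.getKey());

        List<String> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            String n = ready.remove();
            order.add(n);                           // safe to evaluate n now
            for (String m : deps.getOrDefault(n, List.<String>of()))
                if (inDegree.merge(m, -1, Integer::sum) == 0) ready.add(m);
        }
        if (order.size() != inDegree.size())
            throw new IllegalStateException("cyclic attribute dependency");
        return order;
    }

    public static void main(String[] args) {
        Map<String, List<String>> deps = new LinkedHashMap<>();
        deps.put("Term.value", List.of("Add.value")); // Add.value needs Term.value
        deps.put("AddOp.op",   List.of("Add.value")); // Add.value needs AddOp.op
        deps.put("Add.value",  List.of());
        System.out.println(topoSort(deps)); // prints [Term.value, AddOp.op, Add.value]
    }
}
```

The cycle check matters in practice: a cyclic attribute dependency has no evaluation order, and an evaluator must reject it rather than loop forever.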
The influence of GIPSY on the Montage Component System was the
understanding that our system cannot be seen as an independent piece of code.
Using MCS will have consequences not only for programmers, but will also
have an impact on how language components are distributed, deployed and
maintained. For detailed reflections on this topic see chapter 2.
PESCA. The Project on Programming Environments for Safety Critical Applications [Schw97, SD97] investigated automated proofs of the correspondence between code and specification. Safety-critical applications were first specified formally and then programmed using a very restricted (primitive recursive) subset of the programming language Oberon [Schw96]. An automated transformation of this program into a formal specification could then be validated against the original specification. However, experience showed that this approach will be difficult to scale up to handle real-world applications. Programs were restricted to primitive recursiveness, and the proof of correspondence between the two specifications was only tractable for small programs.
MCS has a more prototypical character and does not focus on languages for safety-critical applications, but there were still some lessons learned from PESCA. The algebraic specifications used in PESCA feature a steep learning curve; they appear very abstract to a programmer used to imperative programming. Using Abstract State Machines or even Java seemed to be much closer to the programmer's understanding. As ease of use and simplicity were our goals, a pragmatic approach was chosen for MCS: using Java to provide the operational semantics of language constructs.
CIP. Using the CIP method [FMN93, Fie94, Fie99], the functional behaviour
of a system is specified by an operational model of cooperating extended state
machines. Formally, a CIP system is defined as the product of all state machines comprised, corresponding to a single state machine in a multi-dimensional state space.
CIP had a great influence on the implementation design of MCS. It taught us to concentrate on the basic functionality. CIP features a robust model of computation which forces its users into a rigid but well-understood development scheme. The rationale behind this is that it is rewarding to trade some of the developer's freedom for more productivity, reliability and clarity. MCS is based on this conviction too.
Chapter 7
Concluding Remarks
The work on the Montage Component System revealed interesting insights into language processing from an unusual point of view: we tried to emphasize the compositionality of a language, and thus approached language specification from a different angle than usual. We gained experience in the way such systems can be specified and built, and based on this experience, we will give some ideas and proposals for improving Montages and MCS. We hope that our reflections on Montages and their market context, as well as our ideas for improvements, will be helpful in the planned project of commercializing Montages.
7.1 What was achieved
We described a system for composing language specifications on a construct-by-construct basis. The overall structure of the system differs considerably from conventional approaches to language specification or compiler construction, as it is modularized along the language constructs and not along the usual compiler phases. We explained how such a partitioning can be realized and used in language design.
Deployment of (partial) language specifications on a component market has been investigated. The system was put into the wider context of development, support, distribution, and marketing of language components.
The main fields of deployment we foresee for a system like ours are domain-specific language development and education, since both can profit from the modularity of the system. Additions to a language can be specified locally in one Montage, and encapsulation prevents unwanted side effects in unaltered parts of the specification.
In contrast to the original Montages [AKP97, Kut01], we use Java instead of ASM as our specification language. The modularisation of our system is more rigid than in the ASM approach: global variables or functions are not allowed in our system (in contrast to the ASM approach). Therefore, precompiled Montage specifications are easier to reuse because fewer dependencies have to be considered. Java as a specification language makes it possible to fall back on a vast variety of libraries. Using Java also implies that our specifications are typed, which is not the case with ASM specifications. Whether this is an advantage or not is probably a question of personal preference. On the one hand, strong typing can prevent errors; on the other hand, many fast-prototyping tools forgo typing because this gives the developer more freedom in how to use the system.
7.2 Rough Edges
We found what we consider to be some weak points in the Montages approach and try to sketch some ideas for improvement. It is important to understand that the presented ideas are just proposals. We do not think that these problems can be solved easily, since they mostly concern the Montage notation. Changes to Montages should be undertaken carefully and based on feedback from as many users as possible.
In each of the following sections we describe an open problem in Montages and try to sketch ideas for its solution, or at least give some arguments to start a discussion.
7.2.1 Neglected Parsing
Montages provide no means of control over the parsing of a program. We think
that this omission is an obstacle in the deployment of the Montages approach.
Most grammars that need additional control of the parsing process are not context-free grammars; in other words, the parsing of their programs relies on context information. In many textbooks on compiler construction, context-sensitive grammars are disparaged and rewriting techniques are described in order to make them context-free [e.g. in App97]. However, this point of view reflects only the language designer's arguments and ignores the needs of the users of the language.
This is a lesson learned from programming language design: take the introduction of inner classes in Java as an example of a hasty enhancement of a language, undertaken by a language designer and much criticised by programmers and experts in academia.
Often, users of a DSL are specialists in their own field which has nothing to
do with programming language theory. They use notations that are - in some
cases — hundreds of years old and well accepted. Examples are: mathematics,
physics, chemistry or music.
Mathematicians have a rich set of notations of their own at their disposal which can (more or less) easily be typed on a standard keyboard. Consider the input language of a symbolic computation system as a typical DSL used by mathematicians. To enter a multiplication, it would be much more comfortable to use the common juxtaposition of variable names, ab, instead of the unusual asterisk, a*b, which is never used in printed or hand-written formulas. Such a language can only be implemented if the parser has access to the additional knowledge, i.e. access to context information [Mis97].
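A minimal sketch of why context is needed here (our own illustration, with invented names): given the set of declared variable names as context, a scanner can resolve the juxtaposition ab into the product a*b, which a purely context-free tokenizer could not do.

```java
import java.util.Set;

// Context-dependent scanning: single-letter variables written next to each
// other are expanded into an explicit multiplication, using the declared
// variable names as context information.
public class JuxtapositionScanner {
    static String expand(String input, Set<String> declared) {
        StringBuilder out = new StringBuilder();
        boolean prevWasIdent = false;
        for (int i = 0; i < input.length(); i++) {
            String c = String.valueOf(input.charAt(i));
            if (declared.contains(c)) {
                if (prevWasIdent) out.append('*'); // implicit multiplication
                out.append(c);
                prevWasIdent = true;
            } else {
                out.append(c);                     // operators etc. pass through
                prevWasIdent = false;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(expand("ab", Set.of("a", "b")));  // prints a*b
        System.out.println(expand("a+b", Set.of("a", "b"))); // prints a+b
    }
}
```

Without the set of declared names, the scanner could not decide whether ab is one identifier or the product of two.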
If Montages is to be successful in the long term, some means to control the parser have to be offered.
7.2.2 Correspondence between Concrete and Abstract Syntax
One of the trickiest parts of MCS is the correspondence between the given EBNF rule and the control flow graph or, generally speaking, the mapping of the concrete syntax onto the abstract syntax. The implemented solution is straightforward: each nonterminal in the control flow graph has to be assigned to a nonterminal in the EBNF. This approach is not satisfactory concerning list processing; more precisely, it fails to model special conditions on the occurrence of a nonterminal in a list. Fig. 45 shows an example of an expression list as it is defined for argument passing in many languages. Note that this kind of syntax specification is not available yet.
ExprList ::= Expression {"," Expression }.
Figure 45: List of expressions showing a clash in structures between control flow and EBNF.
In a concrete syntax, entities often have to be textually separated. In the given example, at least one expression has to occur. Subsequent expressions (if present) have to be separated by a comma. Abstract syntax does not need such separators, i.e. they are purely syntactical. In addition, the LIST object contains a property for both the minimum and maximum number of expressions. Thus, the abstract syntax definition of the ExprList Montage can be much more compact and concise than the EBNF rule. The comparison of the two structures at the bottom of Fig. 45 illustrates this.
The structure of the abstract syntax does not reflect the structure of the con¬
crete syntax. In order to simplify reuse of Montages, such clashes should be
allowed. If the abstract syntax depends on a concrete syntax, then a Montage is
not very attractive for reuse. In most reuse cases, one wants to keep the specifi¬
cation of the behaviour, but change the concrete appearance of a language con¬
struct. For the above example, solutions may seem straightforward, but in more
complex situations, it is difficult to give a general rule for mapping EBNF
occurrences of nonterminals to control flow objects. As a more complex exam¬
ple consider an If-Statement (Fig. 46) as it appears in almost every language. It
shall exemplify some of the open questions:
Figure 46: Abstract syntax of an if construct with conditional and default blocks.
Can this control flow graph be the mapping of the following EBNF rule?
If ::= "IF" Expression "THEN" { Statement {";" Statement}}
{ "ELSIF" Expression "THEN" { Statement {";" Statement}}}
[ "ELSE" { Statement {";" Statement}} ]
"END".
This rule could be formulated more elegantly by introducing a new Montage
for the statement sequences:
If ::= "IF" Expression "THEN" StatSeq
       { "ELSIF" Expression "THEN" StatSeq }
       [ "ELSE" StatSeq ]
       "END".
StatSeq ::= Statement { ";" Statement }.
Unfortunately, this adds to the complexity of the language, as there is now an additional Montage which does not contain any semantics but is purely syntactical. Ongoing work should investigate whether the introduction of macros would provide a solution that is sufficiently flexible. Lists as in statement sequences or in an expression list are easy to detect and to replace, but does this also apply to the if statement itself? It contains a list with more than one nonterminal (an expression and a corresponding list of statements). Why should the default case ("ELSE" clause) be modelled with an extra list of statements? By making lists more powerful, this statement sequence could be incorporated as a special case (no expression, at the end) into the other list. General mapping rules from concrete to abstract syntax should be investigated. Applications of such rules can be found in the compiler construction literature, but not the reversed problem: which rule to apply for translation between a given concrete and a given abstract syntax. Sophisticated pattern matching, possibly combined with a rule database, should be investigated here.
7.2.3 BNF or EBNF?
One way to circumvent the mapping problems described above would be to ban list processing from Montages. In order to do this properly, we propose to use plain Backus-Naur Form (BNF) instead of its extended version (EBNF) for specifying syntax. As BNF grammars are better suited for bottom-up parsers, we also suggest introducing some means of controlling the parser from Montages (refer to section 7.2.1 for motives).
BNF was extended to EBNF by introducing repetitions, i.e. groups, options and lists, represented by parentheses "( )", brackets "[ ]", and braces "{ }" respectively. In addition, alternatives can be expressed by enumerating them separated by a vertical bar.
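To illustrate the difference, the expression list of section 7.2.2 can serve as an example: the single EBNF rule

ExprList ::= Expression { "," Expression }.

corresponds to two plain BNF rules, where the repetition is expressed by left recursion (as in Fig. 48):

ExprList ::= Expression.
ExprList ::= ExprList "," Expression.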
EBNF allows a much denser representation of a grammar than BNF, as repetitions turn out to be a powerful notational aid. This means that the number of Montages specifying a language can be reduced and thus the language specification becomes more compact.
Yet using EBNF instead of BNF introduces some problems into Montages. While synonym rules display alternative productions extremely well, repetitions can be a nuisance for the language designer. Apart from the
120 7 Concluding Remarks
Add ::= Term {AddOp Term}.
AddOp = "+" | "-".

[Control flow graph: I → Term~1 → (setValue) → Term~2 → (add) → T, with a
back edge repeating Term~2 and (add) for each further list element]

Prop    Type      Initialisation
value   Integer   return new Integer();
op      Integer   return new Integer(AddOp == "+" ? 1 : -1);

Action
@setValue:
    value = Term~1.value;
@add:
    value = value + op*Term~2.value

Figure 47: Add Montage with EBNF
mapping problems discussed in the previous section, we have to distinguish
between presence and absence of repetitions. Consider the simple specification
of a variable declaration in Fig. 23, p. 66. The optional initialisation of the
variable requires a tedious distinction between the presence and absence of an
initialisation. In some cases, the presence of repetition complicates a Montage.
Often, it is not obvious to a novice user how initialisation of properties or
execution of actions works. Some background knowledge of how repetitions
are managed is necessary. Let us exemplify this by the Montage in Fig. 47.
Although the syntax rule is very compact, the control flow graph is far from
that. By only looking at the graph it would be hard to deduce the meaning of
this Montage. Moreover, in contrast to the declaration Montage in Fig. 23, it is
not even necessary to distinguish between presence and absence of the list here!
An alternative representation using BNF rules can be found in Fig. 48. Its
advantages are that the control flow graphs are much easier to understand and
repetitions are banned.
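The intended semantics of the repetition can be paraphrased in a few lines of plain Java (our sketch, not MCS-generated code): the list of terms is folded from left to right, which is exactly what the @setValue and @add actions express.

```java
// Sketch (not MCS output): the semantics of "Term {AddOp Term}",
// folding the terms from left to right as the @setValue/@add actions do.
public class Add {
    static int fold(int[] terms, char[] ops) {
        int value = terms[0];                   // @setValue
        for (int i = 0; i < ops.length; i++) {
            int op = (ops[i] == '+') ? 1 : -1;  // property "op"
            value = value + op * terms[i + 1];  // @add
        }
        return value;
    }

    public static void main(String[] args) {
        // 1 - 2 + 3, evaluated left to right
        System.out.println(fold(new int[]{1, 2, 3}, new char[]{'-', '+'}));
    }
}
```

The left-recursive BNF variant performs the same left-to-right fold, but makes the recursion visible in the grammar instead of hiding it in list handling.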
BNF is (in contrast to EBNF) known to a wider community of programmers
and language designers. Of course, it is very easy to learn EBNF; its definition
was given by Wirth in [Wir77b] on just half a page. Yet BNF has the
advantage (which should not be underestimated) that many publications about
languages are given using BNF. This means that the specifications given in
many books and articles will be easier to enter into Montages, as the original
language rules do not have to be rewritten first.

Add:
    Term
    Add AddOp Term
AddOp: one of
    "+" "-"

[Control flow graphs: a) Term → (setValue) → T; b) Add → Term → (add) → T]

Prop    Type      Initialisation
value   Integer   return new Integer();
op      Integer   return new Integer(AddOp == "+" ? 1 : -1);

Action
@setValue:
    value = Term.value;
@add:
    value = Add.value + op*Term.value

Figure 48: Add Montage with BNF using a left-recursive grammar
7.3 Conclusions and Outlook
The problems described in the previous section are inherent in Montages and
have their origin in the overspecification of the syntax tree. We consider the
improvement of the parsing process the most important issue in the continuing
development of the Montage tools. The rest of this section will sketch a
possible solution to the parsing problem and hint at possible future directions
of Montages and their applications.
7.3.1 Separation of Concrete and Abstract Syntax
We have already discussed parsing techniques in section 4.5, p. 52ff, and an
immediate solution would be to use more powerful parsers, e.g. an LR(k)
parser such as the one described by Earley [Ear70]. Unfortunately, this does
not solve all problems we have concerning parsing: we still cannot handle
context-sensitive grammars as they were motivated in section 7.2.1.
On the one hand, a simple LL parser seems to be too rigid for many given
grammars; on the other hand, why bother the developer of a simple language
with all the expressive power of context-sensitive parsers?
We therefore propose to separate the problem of parsing completely from
the rest of the language specification. The MCS would then read abstract
syntax trees instead of programs given as character streams.
XML as an Intermediate Representation. A very simple and pragmatic
approach would be to use XML (Extensible Markup Language [W3C98]) as an
intermediate representation, generated by a parser and read by the Montage
Component System. As the syntax of XML is very easy to parse, an existing
XML parser (virtually any SAX2 or DOM parser for XML) can be applied to
replace the existing backtracking parser. For the user of the system this has
several advantages:
• He can use any existing parser generator, e.g. CUP [Hud96] for LALR
  grammars or Metamata Parse [Met97] for LL grammars.
• The parser can be chosen to fit any existing grammar rules; e.g. in many
  books BNF is used to explain a language (e.g. the Java Language
  Specification [GJS96]).
• Developers can use a parser of their choice, i.e. one they are familiar with.
• Using XML also allows the use of parsers on non-Java platforms. The
  generated XML document can then be sent to an MCS running on a Java
  virtual machine.
In fact, this intermediate representation already exists, and we call it the
Montage Exchange Format (MXF): when saving a Montage, an XML file is
generated containing all information necessary to reconstruct it again.
Presently the defining DTD (Document Type Definition) specifies only the
format for one single Montage (so for each Montage there is a separate file).
But it should be easy to extend this DTD to allow one file containing a whole
2. SAX, the Simple API for XML, and DOM, the Document Object Model, both define
interfaces to access an XML parser and the parsed document, respectively.
3. The coordinates of the elements of the control flow graph are stored in optional XML tags.
abstract syntax tree. The MXF is also intended to be a tool-independent
format, which will simplify the exchange of Montages between the different
tools.
7.3.2 Optimization and Monitoring
After successfully specifying and implementing a language, it would be
desirable to compile programs as fast as possible. Fast execution would allow
MCS to be deployed as a production tool and not only as a fast prototyping
tool. Unfortunately, optimizing executable programs for speed or memory
requirements is not very well supported in MCS, as it relies on the Java
compiler that compiles the generated code. Optimizations often run on a
global scale of a program, but the partitioning scheme of MCS builds on the
locality of the given specifications. To extend the system, we would propose
plug-ins:
Plug-Ins. Optimization could be offered through plug-ins to the Montage
Component System. Such plug-ins are system components that can operate on
the internal data structure (basically the annotated abstract syntax tree). They
would be operational between two phases of the transformation process, or
after control flow composition (see Fig. 10, p. 43), but before the Java
compiler compiles the generated code.
As such plug-ins can only operate between two phases of the transformation
process, they would be limited in their optimization capabilities. But it is still
possible to write plug-ins offering monitoring of the AST data structure. They
could visualize or even animate the transformation process and/or allow the
AST to be edited interactively.
Restructuring MCS. In order to be able to replace the implementation of the
different transformation phases, it would be necessary to implement them as
plug-ins as well. In the present implementation, the different phases cannot be
replaced by the user at runtime of MCS. The transformation phases are
accessed through Java interfaces. Therefore, it is necessary to replace some
implementing class files and restart the system in order to replace the
behaviour/implementation of the transformation process (or parts of it).
To support plug-ins that can be exchanged by the user, it is necessary to
extend the interfaces of the transformation phases to offer plug-in capabilities
(e.g. install, remove). This can probably be done by introducing a new
interface Plugin and by extending the existing interfaces from it.
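Such a restructuring might look as follows in Java; the interface and method names are our invention for illustration, not the current MCS API.

```java
// Sketch of the proposed plug-in capability (names are illustrative).
interface Plugin {
    void install();   // register the plug-in with the system
    void remove();    // deregister it again
}

// A transformation phase, restructured as an exchangeable plug-in:
interface TransformationPhase extends Plugin {
    Object transform(Object ast);  // operate on the annotated AST
}

// Trivial phase that passes the AST through unchanged:
class IdentityPhase implements TransformationPhase {
    public void install() { /* would register with MCS */ }
    public void remove()  { /* would deregister */ }
    public Object transform(Object ast) { return ast; }
}
```

With such a hierarchy, user-supplied phases could be installed and removed without touching class files or restarting the system.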
7.3.3 A Possible Application
A system like MCS does not have to be a standalone application; it could also
be part of a web browser. There it would serve as a kind of generalized Java
virtual machine. A web page would not only contain tags referring to Java
code, but it would contain tags referring to a language specification and tags
referring to the program code. An MCS web browser could then first
download the language specification, generate an interpreter for it, and then
download and interpret the program(s).
Of course, downloading a whole language before using it might be a waste of
bandwidth for small programs. But there are certainly scenarios where the
overhead of downloading the language specification will be smaller than the
cost of downloading equivalent Java applets. Using long-term caching for
downloaded Montage specifications will further improve the performance of
such a web browser.
The DSL used in such web pages could be chosen according to its contents
and thus simplify the creation and support of a web site. But in order to justify
the overhead of downloading a language specification first, an application
should be used over a longer period of time and it should be highly interactive.
We might think of the following scenarios:
Tutorials: The DSL could be the subject of the tutorial, or it could be used to
program the many different user interactions that take place throughout the
tutorial.
Forms: In an application that uses many forms, a DSL for form description
could be used to configure customer-tailored forms. Forms that are capable
of displaying different aspects according to the user's input are code and data
intensive in Java. A form DSL could be much more concise (e.g. consider
Tcl/Tk [Ous97] versus the Java API [CLK99]) and thus form-intensive
applications would consume less bandwidth.
Symbolic computation systems: The downloaded language is a specification of
the Maple, Mathematica, or Matlab language. Instead of buying and
installing these applications, they could be rented on a "per use" basis.
Such applications should be developed in accordance with the marketing and
support strategies presented in chapter 2 (Electronic Commerce with Software
Components).
Bibliography

[ADK+98] W. Aitken, B. Dickens, P. Kwiatkowski, O. de Moor, D. Richter,
C. Simonyi. Transformation in Intentional Programming. In Proceedings
of the 5th International Conference on Software Reuse (ICSR'98). IEEE
Computer Society Press, 1998.
[AKP97] M. Anlauff, P. W. Kutter and A. Pierantonio. Formal Aspects of and
Development Environment for Montages. In M. P. A. Sellink, editor,
Workshop on the Theory and Practice of Algebraic Specifications, volume
ASF+SDF-97 of Electronic Workshops in Computing. British Computer
Society, 1997.
[Anl] M. Anlauff. Montages Tool Companion: Gem-Mex. Download at
ftp://ftp.first.gmd.de/pub/gemmex/.
[Anl00] M. Anlauff. XASM - An Extensible, Component-Based Abstract
State Machines Language. In Y. Gurevich, P. W. Kutter, M. Odersky and
L. Thiele, editors, Proceedings of the Abstract State Machine Workshop
ASM2000, volume 1912 of Lecture Notes in Computer Science, pages
69-90. Springer, 2000.
[App97] A. W. Appel. Modern Compiler Implementation in Java. Cambridge
University Press, 1997.
[ASU86] A.V. Aho, R. Sethi and J. D. Ullman. Compilers - Principles,
Techniques and Tools. Addison-Wesley, 1986.
[Ber97] E. Berk. JLex: A lexical analyzer generator for Java.
http://www.cs.princeton.edu/~appel/modern/java/JLex, 1997.
[BS98] E. Börger and W. Schulte. A Programmer Friendly Modular
Definition of the Semantics of Java. In J. Alves-Foss, editor, Formal
Syntax and Semantics of Java. Springer LNCS, 1998.
[CheOO] Z. Chen. JavaCard Technology for Smart Cards. The Java Series.
Addison-Wesley, 2000.
[CLK99] P. Chan, R. Lee and D. Kramer. The Java Class Libraries, Volume
1 & 2. The Java Series, Addison-Wesley, 1999.
[CM98] C. Consel and R. Marlet. Architecturing software using a
methodology for language development. In Proceedings of the 10th
International Symposium on Programming Languages, Implementations,
Logics and Programs (PLILP/ALP '98), pages 170-194, Pisa, Italy,
September 1998.
[CO90] G. M. Clemm and L. J. Osterweil. A mechanism for environment
integration. ACM Transactions on Programming Languages and Systems,
12(1):1-25, January 1990.
[Col99] M. Colan. InfoBus 1.2 Specification. Sun Microsystems, February
1999.
[DAB99] Ch. Denzler, Ph. Altherr and R. Boichat. NOLC - Network and
on-line Consulting. Informatik/Informatique, 5:38-40, October
1999.
[DEC96] P. Deransart, A. Ed-Dbali and L. Cervoni. Prolog: The Standard,
Reference Manual. Springer-Verlag, 1996.
[DNW+00] S. Dobson, P. Nixon, V. Wade, S. Terzis and J. Fuller. Vanilla: an
open language framework. In K. Czarnecki and U. W. Eisenecker,
editors, Generative and Component-Based Software Engineering, LNCS
1799, pages 91-104. Springer-Verlag, 2000.
[Ear70] J. Earley. An Efficient Context-Free Parsing Algorithm.
Communications of the ACM, 13(2):94-102, February 1970.
[Eco97] Survey of Electronic Commerce. The Economist, May 10, p. 17,
1997.
[Fel98] P. Felber. The CORBA Object Group Service: A Service Approach to
Object Groups in CORBA. PhD thesis 1867, École Polytechnique
Fédérale de Lausanne, 1998.
[FGS98] P. Felber, R. Guerraoui and A. Schiper. The Implementation of a
CORBA Object Group Service. Theory and Practice of Object Systems,
4(2):93-105, 1998.
[Fie94] H. Fierz. SCSM - Synchronous Composition of Sequential
Machines. TIK Report 14, Computer Engineering and Networks
Laboratory, ETH Zürich, 1998.
Bibliography 127
[Fie99] H. Fierz. The CIP Method: Component and Model-based
Construction of Embedded Systems. European Software Engineering
Conference ESEC'99, Toulouse, 1999.
[FMN93] H. Fierz, H. Müller and S. Netos. CIP - Communicating
Interacting Processes. A Formal Method for the Development of Reactive
Systems. In J. Gorski, editor, Proceedings SAFECOMP'93.
Springer-Verlag, 1993.
[F+95] Frey et al. Allgemeine Didaktik. Karl Frey,
Verhaltenswissenschaften, ETH Zürich, 8th edition, 1995.
[GCC] GCC Team. GNU Compiler Collection, http://gcc.gnu.org.
[GE90] J. Grosch and H. Emmelmann. A Tool Box for Compiler
Construction. Report 20, GMD, Forschungsstelle an der Universität
Karlsruhe, Vincenz-Prießnitz-Str. 1, D-7500 Karlsruhe, January 1990.
[GH93] Y. Gurevich and J. K. Huggins. The Semantics of the C
Programming Language. In Selected papers from CSL'92 (Computer
Science Logic), LNCS 702, pages 274-308. Springer-Verlag, 1993.
[GHL+92] R. W. Gray, V. P. Heuring, S. P. Levi, A. M. Sloane and W. M.
Waite. Eli: A Complete, Flexible Compiler Construction System.
Communications of the ACM, 35(2):121-131, February 1992.
[GJS96] J. Gosling, B. Joy and G. Steele. The Java Language Specification.
The Java Series. Addison-Wesley, 1996.
[GJSB00] J. Gosling, B. Joy, G. Steele and G. Bracha. The Java Language
Specification, Second Edition. The Java Series, Addison-Wesley, 2000.
[Gro94] J. Grosch. CoCoLab. http://www.cocolab.de, 1994. Ingenieurbüro
für Datenverarbeitung, Turenneweg 11, D-77880 Sasbach.
[Gur94] Y. Gurevich. Evolving Algebras 1993: Lipari Guide. In E. Börger,
editor, Specification and Validation Methods. Oxford University Press,
1994.
[Gur97] Y. Gurevich. May 1997 Draft of the ASM Guide. Technical Report
CSE-TR-336-97, University of Michigan EECS Department, 1997.
[Har92] S. P. Harbison. Modula-3. Prentice Hall, 1992.
[Hed99] G. Hedin. Reference Attributed Grammars. In D. Parigot and M.
Mernik, editors, Second Workshop on Attribute Grammars and their
Applications, WAGA'99, pages 153-172, Amsterdam, The Netherlands,
March 1999.
[HMT90] R. Harper, R. Milner and M. Tofte. The Definition of Standard
ML. The MIT Press, 1990.
[HSSS97] S. Handschuh, K. Stanoevska-Slabeva and B. Schmid. The Concept
of Mediating Electronic Product Catalogues. EM - Electronic Markets,
7(3), September 1997.
[Hud96] S. E. Hudson. CUP Parser Generator for Java.
http://www.cs.princeton.edu/~appel/modern/java/CUP, 1996.
[JGJ97] I. Jacobson, M. Griss and P. Jonsson. Software Reuse. ACM Press,
New York, 1997.
[Joh75] S. C. Johnson. Yacc - Yet another compiler-compiler. Computing
Science Tech. Rep. 32, Bell Laboratories, Murray Hill, N.J., 1975.
[JW74] K. Jensen and N. Wirth. PASCAL - User Manual and Report.
Springer-Verlag, 1974.
[Kle99] G. Klein. JFlex: The Fast Scanner Generator for Java.
http://jflex.de, 1999.
[Knu68] D. E. Knuth. Semantics of context-free languages. Mathematical
Systems Theory, 2(2):127-145, June 1968.
[KP88] G. E. Krasner and S. T. Pope. A cookbook for using the
model-view-controller user interface paradigm in Smalltalk-80. Journal of
Object-Oriented Programming, 1(3):26-49, August 1988.
[KP97a] P. W. Kutter and A. Pierantonio. The Formal Specification of
Oberon. Journal of Universal Computer Science, 3(5):443-501,
May 1997.
[KP97b] P. W. Kutter and A. Pierantonio. Montages: Specifications of
Realistic Programming Languages. Journal of Universal Computer
Science, 3(5):416-442, May 1997.
[KR88] B. Kernighan and D. Ritchie. The C Programming Language.
Prentice-Hall, 2nd edition, May 1988.
[Kut01] P. W. Kutter. Montages - Engineering of Computer Languages.
PhD thesis, Institut TIK, ETH Zürich, 2001.
[Lam98] J. Lampe. Depot4 - A generator for dynamically extensible
translators. Software - Concepts & Tools, 19:97-108, 1998.
[Les75] M. E. Lesk. Lex - A Lexical analyzer generator. Computing
Science Tech. Rep. 39, Bell Telephone Laboratories, Murray Hill, N.J.,
1975.
[Lis00] R. Lischner. Delphi in a Nutshell. O'Reilly & Associates, 2000.
[LW93] D. Larkin and G. Wilson. Object-Oriented Programming and the
Objective-C Language. Available at: www.gnustep.org, NeXT
Computer Inc, 1993.
[LY97] T. Lindholm and F. Yellin. The Java Virtual Machine Specification.
The Java Series. Addison-Wesley, 1997.
[Mar94] R. Marti. GIPSY: Ein Ansatz zum Entwurf integrierter
Softwareentwicklungssysteme. PhD thesis 10463, Institut TIK, ETH
Zürich, 1994.
[MB91] T. Mason and D. Brown. Lex & Yacc. Nutshell Handbooks.
O'Reilly & Associates, 1991.
[Met97] Metamata. JavaCC and Metamata Parse.
http://www.metamata.com, 1997.
[Mis97] S. A. Missura. Higher-Order Mixfix Syntax for Representing
Mathematical Notation and its Parsing. PhD thesis 12108, ETH Zürich,
1997.
[MM92] R. Marti and T. Murer. Extensible Attribute Grammars. TIK
Report Nr. 6, Computer Engineering and Networks Laboratory, ETH
Zürich, December 1992.
[MS98] T. Murer and D. Scherer. Organizational Integrity: Facing the
Challenge of the Global Software Process. TIK-Report 51, Computer
Engineering and Networks Laboratory, ETH Zürich, 1998.
[Mur97] T. Murer. Project GIPSY: Facing the Challenge of Future
Integrated Software Engineering Environments. PhD thesis 12350,
Institut TIK, ETH Zürich, 1997.
[MV99] T. Murer and M. L. van de Vanter. Replacing Copies With
Connections: Managing Software across the Virtual Organization. In
IEEE 8th International Workshop on Enabling Technologies:
Infrastructure for Collaborative Enterprises, Stanford University,
California, USA, 16-18 June 1999.
[MW91] H. Mössenböck and N. Wirth. The Programming Language
Oberon-2. Structured Programming, 12:179-195, 1991.
[MZW95a] A. Moorman Zaremski and J. M. Wing. Signature Matching: a
Tool for Using Software Libraries. ACM Transactions on Software
Engineering and Methodology, 4(2):146-170, April 1995.
[MZW95b] A. Moorman Zaremski and J. M. Wing. Specification Matching
of Software Components. In Proceedings of the Third ACM SIGSOFT
Symposium on the Foundations of Software Engineering, pages 6-17,
October 1995.
[Nau60] P. Naur. Revised Report on the Algorithmic Language ALGOL 60.
Communications of the ACM, 3(5):299-314, May 1960.
[Ous97] J. K. Ousterhout. Tcl and the Tk Toolkit. Addison-Wesley
Professional Computing, 1994.
[PH97a] A. Poetsch-Heffter. Specification and Verification of
Object-Oriented Programs. Habilitationsschrift, 1997.
[OMS97] Oberon Microsystems Inc. Component Pascal Language Report.
Available at www.oberon.ch, 1997.
[PH97b] A. Poetsch-Heffter. Prototyping Realistic Programming Languages
Based on Formal Specifications. Acta Informatica, 1997.
[Sal98] P. H. Salus, editor. Little Languages and Tools, volume 3 of
Handbook of Programming Languages. Macmillan Technical Publishing,
1st edition, 1998.
[Sche98] D. Scherer. Internet-wide Software Component Development
Process and Deployment Integration. PhD thesis 12943, Institut TIK,
ETH Zürich, 1998.
[Schm97] D. A. Schmidt. On the Need for a Popular Formal Semantics. In
ACM Conference on Strategic Directions in Computing Research, volume
32 of ACM SIGPLAN Notices, pages 115-116, June 1997.
[Schw96] D. Schweizer. Oberon - eine Programmiersprache für
sicherheitskritische Systeme. TIK-Report 21, Computer Engineering and
Networks Laboratory, ETH Zürich, 1996.
[Schw97] D. Schweizer. Ein neuer Ansatz zur Verifikation von Programmen
für sicherheitskritische Systeme. PhD thesis 12056, Institut TIK, ETH
Zürich, 1997.
[SD97] D. Schweizer and Ch. Denzler. Verifying the Specification-to-Code
Correspondence for Abstract Data Types. In M. Dal Cin, C. Meadows,
and W. H. Sanders, editors, Dependable Computing for Critical
Applications 6, volume 11 of Dependable Computing and Fault-Tolerant
Systems, pages 177-202. IEEE Computer Society, 1997.
[Sim96] C. Simonyi, Intentional Programming - Innovation in the Legacy
Age. Presented at IFIP WG 2.1 meeting, June 4, 1996,
http://www.research.microsoft.com/research/ip/ifipwg/ifipwg.htm
[Sim99] C. Simonyi. The Future is Intentional. In IEEE Computer, pages
56-57. IEEE Computer Society, May 1999.
[SK95] K. Slonneger and B. L. Kurtz. Formal Syntax and Semantics of
Programming Languages. Addison-Wesley, Reading, 1995.
[SM00] R. M. Stallman and R. McGrath. GNU Make. Manual, Free
Software Foundation, April 2000.
[Sml97] Standard ML of New Jersey. Bell Laboratories,
ftp://ftp.research.bell-labs.com/dist/smlnj, 1997.
[SP97] C. Szyperski and C. Pfister. Workshop on Component-Oriented
Programming, Summary. In M. Mühlhäuser, editor, Special Issues in
Object-Oriented Programming - ECOOP'96 Workshop Reader. dpunkt
Verlag, Heidelberg, 1997.
[SS99] K. Stanoevska-Slabeva. The Virtual Software House.
Informatik/Informatique, 5:37-38, October 1999.
[Ste90] G. L. Steele. Common Lisp: The Language. Digital Press, 2nd
edition, May 1990.
[Ste99] G. L. Steele. Growing a Language. Journal of Higher-Order and
Symbolic Computation, 12(3):221-236, October 1999.
[Str97] B. Stroustrup. The C++ Programming Language. Addison-Wesley,
3rd edition, July 1997.
[Szy97] C. Szyperski. Component Software. ACM Press, Addison-Wesley,
1997.
[TC97] S. Thibault and C. Consel. A Framework of Application Generator
Design. In M. Harandi, editor, Proceedings of the ACM SIGSOFT
Symposium on Software Reusability (SSR '97), Software Engineering
Notes, 22(3):131-135, Boston, USA, May 1997.
[Tho99] S. Thompson. Haskell: The Craft of Functional Programming.
Addison-Wesley, 2nd edition, 1999.
[Van94] M. T. Vandevoorde. Exploiting specifications to improve program
performance. PhD thesis, Department of Electrical Engineering and
Computer Science, MIT, February 1994.
[VM99] M. L. van de Vanter and T. Murer. Global Names: Support for
Managing Software in a World of Virtual Organizations. In Ninth
International Symposium on System Configuration Management (SCM-9),
Toulouse, France, 5-7 September 1999.
[W3C98] W3C. Extensible Markup Language (XML) 1.0. REC-xml-19980210
edition, February 1998. W3C Recommendation.
[Wal95] C. R. Wallace. The Semantics of the C++ Programming Language.
In E. Börger, editor, Specification and Validation Methods, pages
131-164. Oxford University Press, 1995.
[Wal98] C. R. Wallace. The Semantics of the Java Programming Language:
Preliminary Version. Technical Report CSE-TR-335-97, EECS
Department, University of Michigan, 1997.
[WC99] J. C. Westland and T. H. K. Clark. Global Electronic Commerce.
MIT Press, 1999.
[Wei99] M. A. Weiss. Data Structures and Algorithm Analysis in Java.
Addison Wesley Longman, 1999.
[Wir77a] N. Wirth. Modula - A Language for Modular Multiprogramming.
Software Practice and Experience, 7(1):3-35, January 1977.
[Wir77b] N. Wirth. What Can We Do about the Unnecessary Diversity of
Notations for Syntactic Definitions? Communications of the ACM,
20(11):882-883, 1977.
[Wir82] N. Wirth. Programming in Modula-2. Springer-Verlag, 1982.
[Wir86] N. Wirth. Compilerbau, volume 36 of Leitfäden der angewandten
Mathematik und Mechanik (LAMM). B. G. Teubner, 4th edition, 1986.
[Wir88] N. Wirth. The Programming Language Oberon. Software - Practice
and Experience, 18:671-690, 1988.
Curriculum Vitae
I was born on July 20, 1968 in Liestal (BL). From 1975 to 1984 I attended
primary school and Progymnasium in Muttenz. In 1984 I entered High School
(Gymnasium) in Muttenz, from which I graduated in 1987 with Matura
Typus C.
In 1988 I began studying computer science at ETH Zürich. During this time I
did two internships at Integra (now Siemens) and Ubilab (UBS). I received the
degree Dipl. Informatik-Ing. ETH in 1993. My master thesis, entitled A
Message Mechanism for Oberon, was supervised by Prof. Niklaus Wirth.
Afterwards I started working as a research and teaching assistant at the
Computer Engineering and Networks Lab (TIK) of ETH in the System
Engineering group led by Prof. Albert Kündig.