Research Collection
Doctoral Thesis
Modular language specification and composition
Author(s): Denzler, Christoph
Publication Date: 2001
Permanent Link: https://doi.org/10.3929/ethz-a-004183485
Rights / License: In Copyright - Non-Commercial Use Permitted
This page was generated automatically upon download from the ETH Zurich Research Collection. For more information please consult the Terms of use.
ETH Library
Diss. ETH No. 14215
Modular Language Specification and Composition
A dissertation submitted to the
SWISS FEDERAL INSTITUTE OF TECHNOLOGY (ETH)
ZÜRICH
for the degree of
Doctor of Technical Sciences
presented by
CHRISTOPH DENZLER
Dipl. Informatik-Ing. ETH
born July 20, 1968
citizen of Muttenz, BL
and Schwerzenbach, ZH
Prof. Dr. Albert Kündig, examiner
Dr. Matthias Anlauff, co-examiner
2001
TIK-Schriftenreihe Nr. 43
Dissertation ETH No. 14215
Examination date: June 1, 2001
Acknowledgements
I would like to thank my adviser, Prof. Dr. Albert Kündig, for giving me the opportunity to develop my own ideas and for his confidence in my work. He introduced me to the field of embedded systems, from which I learned many lessons on efficiency, simplicity and reusability. I am grateful to Dr. Matthias Anlauff for agreeing to be co-examiner of this thesis. MCS is modelled on his Gem/Mex system. Especially his advice to represent undef as null saved me many redesigns.

I want to thank my colleagues from TIK, Daniel Schweizer, Philipp Kutter, Samarjit Chakraborty, and Jörn Janneck for insightful discussions. Daniel introduced me to the field of language and program specification, so I was well prepared to join Philipp's Montages approach. Discussing Montages with him always left me with a head full of new ideas. Jörn and in particular Samarjit then helped me to get some structure into these creative boosts. A special thank-you goes to Hugo Fierz, whose CIP system inspired my own implementation. Discussing modelling techniques with him gave me valuable insights into good design practice. I owe my thanks also to Hans Otto Trutmann for his FrameMaker templates and his support with this word processor.
Discussions with Stephan Missura and Niklaus Mannhart could be really mind-challenging, as their clear train of thought forced my arguments to be equally precise. Many times, a glance at Stephan's thesis gave me the needed inspiration to continue with mine. Having lunch with Niklaus Mannhart was always a welcome interruption to my work. He also deserves my thanks for proof-reading this thesis.
Last but not least, I thank Regula Hunziker, my fiancée, for her love and her
support.
Abstract
This dissertation deals with the modularisation of specifications of programming languages. A specification is not partitioned into compiler phases, as is usual, but into modules, called Montages, each of which describes one language construct completely. This partitioning allows specifications of language constructs to be plugged into the specification of a language and, since Montages contain executable code, thereby builds an interpreter for the specified language. The problems that follow from this are discussed on different levels of abstraction.

The different character of language specifications on a construct-by-construct basis also demands a different concept for the whole system. Knowledge about processes such as parsing has to be distributed over many Montages, but this is made up for by the increased flexibility of Montage deployment. A language construct that has once been successfully implemented for one language can be reused with only minor adaptations in many different languages. Well-defined access via interfaces separates Montages clearly, so that changes in one construct cannot have unintentional side-effects on other constructs.
This dissertation describes the concept and implementation of a system based on Java as a specification language. Reuse of specifications is not restricted to reuse of source code; it is also possible to reuse precompiled components. This makes it possible to distribute and sell language specifications without giving away valuable know-how about their internals. Some approaches to the development, distribution and support of language components will be discussed.

A detailed description of the Montage Component System will go into the particulars of decentralised parsing (each Montage can parse only its specified construct), explain how static semantics can be processed efficiently, and show how a program can be executed, i.e. how its dynamic semantics is processed.
Zusammenfassung
This thesis deals with the modularisation of specifications for programming languages. A specification is not divided into compilation phases, as is usual, but into modules, so-called Montages, each of which describes one language construct completely. This partitioning makes it possible to plug specifications of individual language constructs together into the specification of a whole language. Since the Montages contain executable code, an interpreter for the specified language can be assembled in this way. Problems arise on several levels of abstraction, whose solutions are discussed in this thesis.

The special way of specifying a language (construct by construct) also calls for a different conception of the whole system. Process knowledge that in a conventional model resides in a single phase (e.g. about parsing) has to be distributed over many individual Montages. The gain, however, is enormous flexibility in the deployment of Montages. A language construct that has been used successfully in one language can be employed in a new language specification with only minimal adaptations. The Montage interfaces cleanly separate the individual partial specifications from one another, so that a change in one construct cannot cause unintended side-effects in other constructs.

This thesis describes the concept and implementation of a system based on Java as a specification language. Partial specifications can be reused not only as source code but also as compiled components. This also opens up the possibility of marketing language specifications without giving away valuable know-how. Therefore, some approaches to the development, distribution and maintenance of language components are described and discussed.
The description of the Montage Component System addresses the problems of decentralised parsing (each Montage can parse only the construct it describes), explains how the static semantics can be processed efficiently, and shows how a program is brought to execution.
Contents
Abstract iii
Zusammenfassung v
Contents vii
1 Introduction 1
1.1 Motivation and Goals 1
1.2 Contributions 4
1.3 Overview 6
2 Electronic Commerce with Software Components 7
2.1 E-Commerce for Software Components 8
2.1.1 What is a Software Component? 8
2.1.2 End-User Composition 9
2.1.3 What Market will Language Components have? 11
2.2 Electronic Marketing, Sale and Support 11
2.2.1 Virtual Software House (VSH) 12
2.2.2 On-line Consulting 13
2.2.3 Application Web 16
2.3 Formal Methods and Electronic Commerce 18
2.3.1 Properties of Formal Methods 18
3 Composing Languages 23
3.1 Partitioning of Language Specifications 24
3.1.1 Horizontal Partitioning 24
3.1.2 Vertical Partitioning 26
3.1.3 Static and Dynamic Semantics of Specifications 28
3.2 Language Composition 29
3.2.1 The Basic Idea 29
3.2.2 On Benefits and Costs of Language Composition 30
3.3 The Montages Approach 33
3.3.1 What is a Montage? 34
3.3.2 Composition of Montages 35
4 From Composition to Interpretation 39
4.1 What is a Montage in MCS? 39
4.1.1 Language and Tokens 40
4.1.2 Montages 40
4.2 Overview 42
4.3 Registration / Adaptation 44
4.4 Integration 45
4.4.1 Parser Generation 45
4.4.2 Scanner Generation 45
4.4.3 Internal Consistency 46
4.4.4 External consistency 50
4.5 Parsing 52
4.5.1 Predefined Parser 52
4.5.2 Bottom-Up Parsing 54
4.5.3 Top-Down Parsing 56
4.5.4 Parsing in MCS 57
4.6 Static Semantics Analysis 60
4.6.1 Topological Sort of Property Dependencies 60
4.6.2 Predefined Properties 62
4.6.3 Symbol Table 63
4.7 Control Flow Composition 68
4.7.1 Connecting Nodes 68
4.7.2 Execution 70
5 Implementation 71
5.1 Language 71
5.2 Syntax 73
5.2.1 Token Manager and Scanner 75
5.2.2 Tokens 78
5.2.3 Modular Parsing 79
5.3 Data Structures for Dynamic Semantics of Specification 80
5.3.1 Main Class Hierarchy 81
5.3.2 Action 82
5.3.3 I and T Nodes 82
5.3.4 Terminal 85
5.3.5 Repetition 85
5.3.6 Nonterminal 86
5.3.7 Synonym 86
5.3.8 Montage 88
5.3.9 Properties and their Initialisation 90
5.3.10 Symbol Table Implementation 94
6 Related Work 97
6.1 Gem-Mex and XASM 97
6.2 Vanilla 99
6.3 Intentional Programming 100
6.4 Compiler-Construction Tools 102
6.4.1 Lex & Yacc 102
6.4.2 Java CC 103
6.4.3 Cocktail 104
6.4.4 Eli 104
6.4.5 Depot 4 105
6.4.6 Sprint 106
6.5 Component Systems 108
6.5.1 CORBA 108
6.5.2 COM 110
6.5.3 JavaBeans 111
6.6 On the history of the Montage Component System 112
7 Concluding Remarks 115
7.1 What was achieved 115
7.2 Rough Edges 116
7.2.1 Neglected Parsing 116
7.2.2 Correspondence between Concrete and Abstract Syntax 117
7.2.3 BNF or EBNF? 119
7.3 Conclusions and Outlook 121
7.3.1 Separation of Concrete and Abstract Syntax 122
7.3.2 Optimization and Monitoring 123
7.3.3 A Possible Application 124
Bibliography 125
Curriculum Vitae 133
Chapter 1
Introduction
This thesis aims to bring specifications of programming languages closer to their implementations. Understanding and mastering the semantics of languages will be important to a growing number of programmers and, to a certain extent, also to software users.
A typical compiler processes the source program in distinct phases which normally run in sequential order (Fig. 1). In compiler construction suites such as GCC, Eli or Cocktail [GCC, GHL+92, GE90], each of these phases corresponds to one module/tool.
This architecture supports highly optimized compilers and is well suited for
complex general-purpose languages. Its limitations become apparent when
reuse and extensibility of existing implementations/specifications is of interest.
Consider extending an existing programming language with a new construct, for instance a not yet available switch statement in a subset of C. In a traditional compiler architecture, such an extension would imply changes in all modules: the scanner must recognize the new token (switch), the parser has to be able to parse the new statement correctly, and obviously semantic analysis and code generation have to be adapted.
1.1 Motivation and Goals
The specification and implementation of programming languages are often
seen as entirely different disciplines. This lamentable circumstance leads to a
separation of programming languages into two groups:
1. Languages designed by theorists. Most of them have very precisely formulated semantics; many were first specified before an implementation was available, and often they are based on a sound concept or model of computation.
[Figure: flowchart of compiler phases, from source code through lexical analysis, syntax analysis, semantic analysis, intermediate code generation, optimization and code generation to execution]
Figure 1: Module structure (phases) of a traditional compiler system
Examples are: ML [HMT90], Lisp [Ste90], Haskell [Tho99] (functional
programming), Prolog [DEC96] (logic programming), ASM [Gur94,
Gur97] (state machines).
Although their semantics is often given in unambiguous mathematical notations, they lack a large programmer community, either because the mathematical background needed to understand the specification is considerable, or because their usually high level of abstraction hinders an efficient implementation.

2. Languages designed by programmers. Their development was driven by operating system and hardware architecture needs, by marketing considerations, by the competition with other languages, by practical problems; for many such languages, several of these reasons apply. Examples are: Fortran, C, C++, Basic, Java.
Many of these languages feature surprisingly poor semantics descriptions. Normally, their specification is given in plain English, leaving room for many ambiguities. The most precise specification in these cases is the source code of the compiler, if available, and even this source code has to be adapted to the hosting platform's specific needs. For these reasons, formal specifications are hard to provide. Checking a program against a given specification is a very tedious and (in general) intractable task [SD97].
This thesis will focus on specifications of imperative programming languages, which are found mainly in the second group. Ideally, language specifications should be easy to understand and easy to use. Denotational descriptions assume a thorough mathematical background of the reader. They allow many problems to be formulated in a very elegant fashion (e.g. an order of elements), but they fail to give hints for an efficient implementation (e.g. a quick-sort algorithm).
Specifications should be understandable by a large community of different readers: language designers, compiler implementors, and programmers using the language to implement software solutions. Denotational specifications will be laid aside by the compiler constructor as soon as efficiency is in demand, and they are not suited to stir the interest of the average C programmer. Whether this is an indication of the insufficient education of programmers or of the unnecessary complexity of mathematical notations will not be the subject of this thesis. Both arguments find their supporters, and in both there is a certain truth and a certain ignorance towards the other.
Programming languages are becoming more and more widespread in many disciplines apart from computer science. The more powerful applications get, the more sophisticated their scripting languages become. It is, for example, a common task to enter formulas in spreadsheet cells. Some of these formulas should only be evaluated under certain circumstances, and some might have to be evaluated repeatedly. Another example of the increasing importance of basic programming skills is the ability to enter queries in search engines on the Internet.

Such a programming language designed for a specific purpose is called a domain-specific language (DSL) [CM98] or a "little language" [Sal98]. Many of those languages are used by only a few programmers; some may even be designed for one single task in a crash project (e.g. a data migration project). Although most DSLs will never have the attention of thousands of programmers, they should still feature a correct implementation and fulfil their purpose reliably. Many DSLs will be implemented by programmers who are inexperienced in language design and compiler construction. A tool for the construction of DSLs should support these programmers. This basically means that it should be possible to encapsulate the experience of professional language designers, as well as their reliable and well understood implementations, in «language libraries». This enables a DSL designer to merely compose his new language instead of programming it from scratch. The language specification suite we describe in this thesis was designed to allow for such a systematic engineering approach.
Languages are decomposed into their basic constructs, e.g. while and for loops, if-then-else branches, expressions and so on. Each such construct is represented by a software component. Such a component provides the modularity and
composability needed for our programming language construction kit. It enables the designer of a new DSL to simply compose a new language by picking the desired constructs from a library and plugging them together. Existing components may be copied and adapted, or new constructs created from scratch and added to the library.
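The construction-kit idea can be sketched in a few lines of Java. All names below (`Construct`, `ComposedLanguage`, `plug`) are invented for this illustration and are not MCS's actual API:

```java
// Hypothetical sketch of the construction kit: a language is assembled
// by plugging construct components from a library into it.
import java.util.HashMap;
import java.util.Map;

interface Construct {
    String keyword(); // concrete-syntax anchor of the construct, e.g. "while"
    // ...a real component would also carry grammar, semantics and execution hooks
}

final class ComposedLanguage {
    private final Map<String, Construct> constructs = new HashMap<>();

    // Add one construct component to the language.
    void plug(Construct c) {
        constructs.put(c.keyword(), c);
    }

    // Is the construct part of this language?
    boolean supports(String keyword) {
        return constructs.containsKey(keyword);
    }
}
```

A DSL designer would then build a language by calling `plug` once per desired construct, reusing library components instead of writing a compiler from scratch.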
1.2 Contributions
We describe a system based on the Montages approach [KP97b] which we call the Montage Component System (MCS). The basic idea is to compose a programming language on a construct-by-construct basis. A possible module system for an IF statement is shown in Fig. 2. Extending a language basically consists of adding new modules (Montages) to a language specification. Each such module is implemented as a software component, which simplifies the composition of languages. The original Montages approach has been extended to support:
• Abstract syntax
• Mapping of concrete to abstract syntax
• Coexistence of precompiled and newly specified components
• Configurability (to some extent) of precompiled components
• Four configurable phases: parsing, static semantics, generation of an execution network, and execution
We provide a survey of compiler construction technology and an in-depth discussion of how language specifications can be composed into executable interpreters. An MCS specification decomposes language constructs into four different levels, which roughly correspond to the phases of traditional approaches: parsing, static semantics, code generation and execution (dynamic semantics). In contrast to conventional compiler architectures, these specifications are not given per level for the whole language, but only for a specific language construct. Combining Montage specifications on each of these four levels is the main topic of this thesis. The system's architecture supports multi-language and multi-paradigm specifications.

The Montage Component System originated in the context of a long-standing tradition of research in software development techniques at our laboratory [FMN93, Mar94, Schw97, Mur97, Sche98]. Although technically oriented, our work has always considered economic aspects as well. This thesis continues this tradition and starts with some reflections on marketing, sale and support of language components. It is important to emphasise that these considerations greatly influenced the design decisions and requirements MCS had to fulfil. The latter were:
[Figure: a Montage Component Web for the statement IF a <> b THEN a := c END, with nodes such as Identifier a, ConditionalOp <>, and Identifier b, and three control flows: parse (solid), static analysis with simultaneous firing (dotted), and execution (dashed)]
Figure 2: An MCS Web for an IF statement and its various control flows.
• Specifications should be easy to understand and to use.
• Specifications should be reusable not only in source form but also in compiled form (component reuse).
• Specifications should be formal.
• Modularity/composability of specifications from a programmer's point of view: programmers think in entities of language constructs, not of compiler phases.
• Employment of a standard component system and a standard programming language for specification.
1.3 Overview
This thesis is organized as follows. Commercialization of software components imposes some prerequisites on the architecture of MCS. These prerequisites will be introduced and explained in Chapter 2. Chapter 3 then gives an overview of how programming languages can be composed. Design and concepts of MCS are explained in detail in Chapter 4, which is the core of this thesis. Chapter 5 discusses some interesting implementation details. A survey of related work will be given in Chapter 6. References to this chapter can be found throughout this thesis, in order to put a particular problem into a wider context. Finally, Chapter 7 concludes this dissertation and gives some prospects for the future.
Chapter 2
Electronic Commerce with
Software Components
Electronic commerce emerged in the late 90s and has grown into a multi-billion dollar business. E-Commerce businesses can be characterised by two dimensions: the degree of virtuality of the traded goods and the degree of virtuality of the document flow. Fig. 3 shows these two dimensions and illustrates their meaning by giving some examples. Whether 'real' goods have to be moved depends on the business sector involved. On the document flow axis, however, E-Commerce tries to reduce the amount of physically moved documents. An important distinction has to be made between business-to-business (B2B) commerce and business-to-consumer (B2C) commerce. Note that a company may well distinguish between logistic (B2B-like) and customer (B2C-like) commerce, as Wal-Mart supermarkets do. B2C E-Commerce will emerge much more slowly, as consumers cannot be expected to be on-line 24 hours a day. Security considerations second this argument: a signature on a piece of paper is much easier to understand than trust in abstract cryptographic systems.
[Figure: two-dimensional diagram with 'real' vs. 'virtual' goods on one axis and paper-based vs. electronic document flow on the other; examples placed in the plane include supermarkets, the automobile industry, travel agencies and banking systems]
Figure 3: Characterisation of E-Commerce businesses. The horizontal axis represents the 'virtuality' of the traded goods. The vertical axis indicates how 'paperless' the office works.
E-Commerce as we refer to it in this chapter is concerned with the left part of the diagram in Fig. 3. Components are virtual goods which might be sold to end-users or deployed by other business organisations. Section 2.2 will give some examples of both B2B and B2C scenarios involving language components.
Szyperski explains that there has to be a market for component technology in order to keep the technology from vanishing [Szy97]. We further believe that E-Commerce will play a major role in this market, as we will elaborate in section 2.2. This chapter will focus on the relation between software components and their marketing. The Montage Component System presented later in this thesis (chapter 4, p. 39) relies highly on the success of component technology. Although the basic idea of composing language specifications can be applied to a single-user environment, it makes only limited sense there. Its full potential is revealed only if language components can be distributed over the Internet [Sche98]. In the following we will present our vision of a (language) component market.
After defining the term 'Software Component', we point out some premises E-Commerce imposes on software components. We then explain our vision of electronic marketing, consulting and support. These ideas were developed and implemented under the umbrella of the Swiss National Science Foundation's project "Virtual Software House"¹. The chapter closes with some considerations on the acceptance of formal methods in software markets.
2.1 E-Commerce for Software Components
2.1.1 What is a Software Component?
The term 'Software Component' is used in many different ways. For marketing and sales persons it is simply a 'software box'. Programmers have different ideas of components, too: to the C++ programmer, it might be a dynamic link library (DLL); a Java programmer refers to a JavaBean; and a CORBA specialist has in mind any program offering its services through an ORB. Often the terms 'Object' and 'Component' are not separated clearly enough. Throughout this monograph we keep to the following definition, taken from [SP97]:
¹ Project no. 5003-52210.
"A component is a unit ofcomposition with contractually specifiedinterfaces and explicit context dependencies only. Components can
be deployed independently and are subject to composition by third
parties"
This definition is product-independent and does not only cover technical aspects. It covers five important aspects of a component:

1. Extent: 'unit of composition' means a piece of software that is sufficiently self-contained to be composed by third parties (people who do not have complete insight into the component's software).

2. Appearance: 'contractually specified interfaces' implies that the interface of a component adheres to a standard of interfaces that is also followed by other components. For example, JavaBeans employs a special naming scheme which allows others to set and get attribute values, to fire events and to gain insight into the component's structure.

3. Requirements: 'explicit context dependencies' specifies that the component does not only reveal its interfaces to its clients (as classes and modules would); it furthermore tells what the deployment environment has to provide in order for it to be operative.

4. Occurrence: 'Components can be deployed independently', i.e. a component is well separated from its environment and from other components. No implicit knowledge of the underlying operating system, hardware or other software (components) may be used at compile-time or at run-time.

5. Usage: Components are 'subject to composition by third parties' and thus will be deployed in systems unknown to the programmer of the component. This aspect justifies the four former items of the definition. A component should encapsulate its implementation and interact with its environment only through well-defined interfaces.
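The JavaBeans naming scheme mentioned under aspect 2 can be shown concretely. The `PriceTag` class below is an invented example, but the conventions it follows (a no-argument constructor plus `get`/`set` accessor pairs per property) are the standard JavaBeans design patterns that let builder tools discover and configure a component by reflection alone:

```java
// A minimal JavaBean: the property "price" is exposed through the
// getPrice/setPrice accessor pair, and the no-arg constructor lets
// tools instantiate the bean without knowing anything else about it.
import java.io.Serializable;

class PriceTag implements Serializable {
    private double price;              // the bean property

    public PriceTag() {}               // beans need a public no-arg constructor

    public double getPrice() {         // getter: get<PropertyName>
        return price;
    }

    public void setPrice(double p) {   // setter: set<PropertyName>
        price = p;
    }
}
```

Because the names are derivable from the property name, a composition tool never needs the bean's source code, which is exactly the 'contractually specified interface' the definition demands.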
A detailed discussion of the above definition, along with the difference between 'Objects' and 'Components', is given in Szyperski's book on component software [Szy97].
2.1.2 End-User Composition
Besides the pure definition of what a software component is, there is also a list of requirements that a component has to fulfil to be a 'useful' component.

Components should provide solutions to a certain problem on a high level of abstraction. As we have seen above, a component is subject to third-party composition and will eventually be employed in combination with components of other component worlds (JavaBeans interoperating with ActiveX components, for example). This implies that its interfaces and its interfacing
process have to be kept as simple and as abstract as possible. There is no room for proprietary solutions and 'hacks'; adherence to standards is vital for the success of a component. Of course, there is a price to be paid for seamless interoperability: performance. In many environments based on modern high-performance hardware, however, one is readily willing to pay this price for the flexibility gained. As we will point out in chapter 4, there are many reasons to pay this price in the case of language composition as well. Nevertheless, we must stress the fact that (hand-written) compilers, as a rule, significantly outperform MCS-generated systems.
Aspect 4 of our component definition forbids making any assumptions about the environment of a component. Aspect 5 implies another unknown: the capabilities and skills of third parties. The success of a component is tightly coupled with its ease of use and the time it takes to understand its functionality. Especially if components are subject to end-user composition, this argument becomes important. The gain in flexibility of deployment (down to the end-user) outweighs in many cases the above-mentioned drawback of slower execution.
We defined the extent of a component as the unit of composition, and considering the entire definition, we can deduce that a component is also a unit of deployment. This technical view does not necessarily apply to a component's business aspect. Although a component may be useful for deployment, it may not be a unit of sale. For example, there is no argument against a component that specifies only the multiplication construct of a language, but in sales it will normally be bundled with (at least) the division construct. MCS allows bundling complete sublanguages and using them as components of their own. Such a cluster of useful components can unfold its full potential only if it is still possible to (re-)configure its constituents. This is important because, as components grow (components may be composed of other clustered components), the resulting component would otherwise become monolithic and inert. On the other hand, bundling components for sale can be accomplished according to any criteria, thus decoupling technical issues from marketing aspects. For example, technically seen, the term and factor language constructs (= components in MCS) only make sense if there are also numerical constructs such as addition, subtraction, multiplication and division. From a sales point of view, it may be perfectly right to sell term and factor without these constituents, because the buyer wants to specify them on his own.
2.1.3 What Market will Language Components have?
When speaking about end-users, we mean 'programming literates', that is, people skilled in using an imperative programming language. This is regarded as a prerequisite for being able to specify new languages. Thus the answer to this section's question is: the hard middle. According to Jeff Bezos, founder and CEO of Amazon.com, this is [Eco97]:

"In today's world, if you want to reach 12 people, that's easy: you use the phone. If you want to reach 12 million people, it's easy: you take out an ad during the Superbowl. But if you want to pitch something to 10,000 people — the hard middle — that's really hard."
The next section addresses exactly this problem: how to do marketing, sale and support for the hard middle.
2.2 Electronic Marketing, Sale and Support
The most important advantage of component technology is, as its name suggests, its composition concept. Components will not make sense at all if they do not lend themselves easily to composition. Composable software will create more flexible and complex applications than conventionally developed software. This is not because a single software house would not be able to write complex software, but because different styles of software development and different cultures of approaching problems will also be put together when composing software. This adds to its complexity, but may also lead to more efficient, more effective and more elegant solutions.
Component software will not be successful if there is no market for software components. Such markets, however, have to be established first. Conventional software distribution schemes will not match the requirements of the component industry. They deal with stand-alone applications, which can be sold in shrink-wrapped boxes containing media on which the software is delivered (DVD, CD, diskettes, tapes) and several kilograms of manuals as hard copies. Obviously this is not applicable to software components. Depending on their functionality, components may be only a few kilobytes in size and extremely simple to use. No one would like to go to a shopping mall to acquire such a piece of software, not to speak of the overhead of packaging, storage and sales room space.
The typical marketplace for components is the Internet. Advanced techniques for distribution, support and invoicing have to be applied to protect
both the customers' and the vendors' interests. In the following subsections, we describe how such a marketplace on the Internet would ideally look.
2.2.1 Virtual Software House (VSH)
In the Virtual Software House project [SS99], funded by the Swiss National Science Foundation (project no. 5003-52210), a virtual marketplace for software of any kind was studied and prototype solutions were developed. The VSH is a virtual analogue of a real shopping mall. An organisation running a VSH is responsible for the maintenance of the virtual mall. It will offer its services to suppliers (shop owners in the mall) that want to sell their products and services over the Internet. It runs the server with the VSH software, it may provide disk space and network bandwidth to its contractors, and it will give them certain guarantees about quality of service (e.g. availability, security, bandwidth). Invoicing and electronic contracting could also be part of its services.
The quality of the mediating services of a VSH will be important for all participants:
1. Operators of a VSH can establish their name as a brand and thus attract new
customers as well as new contractors.
2. Suppliers can rely on the provided services and do not need to set up the infrastructure for logistics, invoicing and marketing themselves. A VSH is an out-sourcing centre for a supplier. Small companies in particular will profit from a professional web appearance and facilitated sales procedures. They can concentrate on their main business without neglecting the sales aspect.
3. For the customer, an easy-to-use and easy-to-understand web interface is
very important. This includes simple and yet powerful search and query
facilities. Clearly formulated and understandable payment conditions and
strict security guarantees will help to win customers' confidence.
Of central importance is the product catalogue, which can be queried by customers. Although several different companies sell their products under the umbrella of a VSH, customers usually prefer to have one single central search engine. In contrast to shopping in real shopping malls, internet customers usually do not want to spend an afternoon surfing web pages just to find a product. They will expect search and query facilities that are more elaborate than plain text searching (as done by search engines like AltaVista or Google).
Instead, Mediating Electronic Product Catalogues (MEPC) [HSSS97] will be
employed. These catalogues summarize the (possibly proprietary) product catalogues of the different companies participating in the VSH. Mediating means
that they offer a standardized view of the available products. MEPCs enable a customer to query for specific products and/or combinations of products. Optionally, he can do this in his own language, using his own units of measurement and currencies. It is the mediating aspect of such EPCs to convert and translate figures and languages.
A VSH has a very flexible, federated organisation. It will not only allow its
contractors to sell software but will also support them in offering other (virtual) services such as consulting (see section 2.2.2), software installation and maintenance or federated software development (see section 2.2.3). A VSH is not
only designed for business to consumer (B2C) commerce, but it can also serve
as a platform for business to business (B2B) commerce.
One example would be federated software development, where two or more
contractors use the services of a VSH for communication, as a clearinghouse during the development stage and as a merchandising platform afterwards.
Another example for B2B commerce would be a financial institute joining the
VSH to offer secure payment services (by electronic money transfers or credit
card) to the other contractors.
Such Virtual Software Houses are already available on the web. A spin-off of the VSH project started its business in autumn 1999 and can be reached at www.informationobjects.com. Another similar example (although not featuring such sophisticated product catalogues) is www.palmgear.com. The latter has specialised in software for palmtop or hand-held computers. Usually such programs are only a few kilobytes in size, and often they are distributed as shareware by some ambitious student or hobby programmer (which does not necessarily reduce their quality). Obviously, such programmers do not have enough time and money to professionally advertise and sell their software. As some of their programs are just simple add-ons to the standard operating system, this market already comes close to the component market proposed above.
2.2.2 On-line Consulting
Consulting on evaluation, buying, installation and use of software plays a
prominent role in today's software business. There are many attempts to replace human consultants by sophisticated help utilities, electronic wizards and elaborate web sites. But these all have two major drawbacks: first, they cannot (yet) answer specific questions. They can only offer a database of knowledge, which has to be queried by the user. Second, they are not human. For support, many users prefer a human counterpart to a machine. But human manpower is
expensive; individual (on-site) consulting even more so, as time-consuming travelling adds to the costs.
The aim of the Network and On-line Consulting (NOLC) project, funded by the Swiss National Science Foundation (project no. 5003-045329) as a sub-project of the VSH, was to reduce costs in consulting. This aim was achieved by providing a platform that supports on-line connections between one (or several) clients and one (or several) consultants. Clients and consultants may communicate via different services like chat, audio, video, file transfer, whiteboards and application sharing (Fig. 4).
It is important that these services are available at very low cost. This specifically means that no costly installations should be necessary. Fortunately, for the platform used in the NOLC project (Wintel), there is a free
software package called NetMeeting [NM] which meets these requirements:
• It is available for free, as it is shipped as an optional package of Windows.
Today's hardware suffices to run NetMeeting. The only additional cost may
arise from the purchase of an optional camera.
• It supports audio and video communication via standard drivers, so any sound card and video camera can be used, even over modem lines.
• A decentralised organisation of the communication channels.
The last item was very important: the separation of control over the communication from the communication itself. It allows communication channels between customers and consultants to be controlled from a server without having to deal with the data stream they produce. I.e. the establishment and conclusion of a connection (and the quality-of-service parameters) can be controlled by a server running the NOLC-system. The actual data stream of the consulting session (video, audio, etc.), however, is sent peer-to-peer and thus does not influence the throughput of the NOLC-server.
With consulting, three parties are involved:
1. A client who has a problem with e.g. a piece of software.
2. A consultant who offers help in the field of the client's problem.
3. An intermediary providing a platform that brings the above two parties together.
The NOLC-project investigated the characteristics of the third party, and a
prototype of such an intermediate platform was implemented. It consists of a
server that provides the connecting services between the first two parties. The
server controls the establishment and conclusion of a consulting session. To
fulfil this task, it has access to a database that stores information about consultants and clients (for an unrestricted consulting session, both parties need to be registered). This data comprises the communication infrastructure available on the parties' computers (audio, video) as well as the consultants' skills, availability and fees.
Figure 4: NOLC architecture and participants in a consulting session
When a potential client looks for consulting services, he will eventually visit
the web site of a consulting provider. On this web site, he may register himself and apply for a consulting session. Before such a session may start, he is presented with a few choices and questions about the requested session: what kind of media shall be used (video, whiteboard, application sharing, etc.) and what the topic of the session will be. It is possible to let the system search for an available consultant, or the client may himself enter the name of a consultant (if known). Just before a new session is started, the client gets an overview of the costs that this session is going to generate. Usually, the cost depends on the communication infrastructure, the chosen consultant and the duration of the session. Once the client agrees, the chosen consultant (who was previously marked as available) receives a message indicating that a client wants to
enter a new session.
During a session both parties have the possibility to suspend the session and
to resume it later on. This feature is necessary e.g. if there are questions which
cannot be answered immediately; after resuming, the same configuration as
before suspension is re-established. Of course it is also possible to renegotiate
this configuration at any point in a session.
After completing the session, the client will be presented with the total cost and asked to fill in an evaluation form, which serves as feedback to the consultant.
The intermediary service controls renegotiation of the communication configuration, suspension and resumption of sessions and, finally, the calculation
of the cost. Once a session terminates, the feedback report is stored for future
evaluation and the generated costs are automatically billed to the client.
Recently, some security features were added. The customer will have to digitally sign each session request and the server will store these signed requests.
Thus, it will be possible to prove that a customer has requested and accepted to
pay for a session.
Billing and money transfer is not part of the NOLC-platform, but is delegated to a Virtual Software House [SS99]. So NOLC is a business service provided either by the VSH itself or by an additional party offering their services
through the VSH.
In addition, n-to-m group communication and load balancing in a fault-tolerant environment were also investigated [Fel98, FGS98]. The Object Group
Service (OGS) employed is a CORBA service which can be used to replicate a
server in order to make it fault-tolerant.
2.2.3 Application Web
The last aspect of E-Commerce that has been investigated in the context of the
VSH project was the maintenance of component systems. The central question
was: how can a component system be remotely maintained and controlled, given that the system is heterogeneously composed, i.e. components from different software developers with arbitrarily complex versioning should be manageable [MV99, VM99].
Information management tools and techniques do not scale well in the face of great organisational complexity. An informal approach to information sharing, based largely on manual copying of information, cannot meet the demands of the task as size and complexity increase. Formal approaches to sharing information are based on groupware tools, but cooperating organisations do not always enjoy the trust or the availability of the sophisticated infrastructure, methods, and skills that this approach requires. Bridging the gap requires
a simple, loosely coupled, highly flexible strategy for information sharing.
Extensive information relevant to different parts of the software life cycle should be interconnected in a simple, easily described way; such connections should permit selective information sharing by a variety of tools and in a variety of collaboration modes that vary in the amount of organisational coupling they require.
During the development of a component, the programmers have a lot of information about the software, e.g. knowledge about versioning, compatibility with other components, operating systems and hardware, known bugs, omitted and planned features, unofficial (undocumented) features, etc. All this information is lost when the software is released in a conventional manner. The customer of such a component can only rely on official documentation. The core idea of the application web is to maintain links back to the developers' data. So it would be possible at any time in the life cycle of the software to track a problem back to its roots [Mur97, Sche98]. Of course, these links will in general not be accessible to everybody. As an illustration, some scenarios will now be described:
Remote on-line consulting. A software developer out-sources the support and maintenance services to an intermediary (a consulting company). Such a consultant would have to acquire a license to provide support and the right to access the developer's internal database. In turn, customers facing problems using a certain product will automatically be rerouted to the consultant when following the help link available in their software.
Customer-specific versions. A customer unhappy with a certain version of a component may follow the maintenance link back to the developer's site, where he may request additional functions. The developer receives all the important data about the component, such as version number and configuration. He may then build a new variant of the component according to the client's needs. On completion, the client will be notified and may download the new version from the web immediately.
Federated software development. Several software companies developing a component system may use the services of the application web to provide data for their partners. As these partners may be competitors in other fields, only relevant links will be accessible to them. The application web allows fine-grained access control, supporting several service levels.
A prototype of the application web was developed using Java technology. What are the services of the application web?
• Naming and versioning: It is important to maintain a simple, globally scalable naming scheme for versioned configurations distributed across organisations [VM99]. The naming scheme employed is based on the reversed internet addresses of the accessible components (similar to the naming scheme of Java's classes, e.g. org.omg.CORBA.portable).
• Persistence, immutability and caching: It is important that links are not moved or deleted. Participating organisations have to ensure the stability and accessibility of the linked information. Repositories (another B2B service in a VSH) could provide reliable, persistent bindings between versioned package names and their contents. For example, they may support WWW browsing and provide querying facilities.
• Reliable building: Versioned components contain links to other components and packages (binaries) they import during the building process. As these links will be available through the application web's services, building (even with older versions of libraries) will always be possible.
• Application inspection: Java's introspection techniques make it possible to (remotely) gain insight into a component's configuration at run-time. This feature is very important in the consulting and customer-specific-version scenarios above. It allows e.g. a consultant to collect information about the actual component's environment. This information may be used to query the knowledge base for possible known bugs or incompatibilities.
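The application-inspection service described above can be sketched with Java's standard reflection API. The following is a minimal, illustrative sketch: the `SpellChecker` component and its getters are hypothetical stand-ins, not part of the thesis prototype.

```java
import java.lang.reflect.Method;
import java.util.Map;
import java.util.TreeMap;

public class Inspect {
    // Hypothetical component standing in for a deployed software component;
    // the class name and its getters are illustrative assumptions only.
    public static class SpellChecker {
        public String getVersion() { return "1.2.3"; }
        public String getVendor() { return "example.org"; }
    }

    // Collect the results of all public no-argument get-methods -- the kind of
    // configuration data a remote consultant could gather at run-time.
    public static Map<String, Object> inspect(Object component) throws Exception {
        Map<String, Object> info = new TreeMap<>();
        for (Method m : component.getClass().getMethods()) {
            if (m.getName().startsWith("get")
                    && m.getParameterCount() == 0
                    && m.getDeclaringClass() != Object.class) {
                info.put(m.getName(), m.invoke(component));
            }
        }
        return info;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(inspect(new SpellChecker()));
    }
}
```

In the consulting scenario, such a map of configuration values could be sent to the consultant and matched against the developer's bug database.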
2.3 Formal Methods and Electronic Commerce
Consulting will become an emerging market with the advent of electronic commerce, the main reason being the decoupling of sales and consulting. As there will be little or no personal contact between customer and vendor in e-commerce, customers will not be willing to pay a product price that so far was justified by the personal advice of the sales representative in conventional business. The job of a sales representative will shift to a consultant's job. Consulting and sale
will be two different profit centres. It should be noted that this is to the advantage of the customer, too. He will get fair and transparent prices for the product as well as for additional advisory services. The consultant will be more neutral, as he does not have to sell the most profitable product to earn his salary, but the best one (from the client's point of view) if he wants to keep his clients. Could rationalisation also cut consulting jobs? Not in the short or medium term. There are several reasons for this answer, which will be discussed in the following sections. All these reasons are related to the (limited) applicability of formal methods in computer science.
2.3.1 Properties of Formal Methods
Formal notations are applied in specifications of software. They allow, on the basis of a sound mathematical model, a precise description of the semantics of a
piece of software, e.g. the observable behaviour of a component. Many formal notations have rather simple semantics, thereby lending themselves to mathematical reasoning and automated proof techniques. But capturing the semantics of a problem has remained a hard task, although there has been extensive research on this topic during the last decades. The following discussion will focus on some major aspects of formal methods and their effect on commercially distributed software components.
Scalability. Unfortunately formal methods do not scale, i.e. they cannot keep up with growing systems (growing in terms of complexity, often even in terms of lines of code). E.g. proving that an implementation matches its specification is intractable for programs longer than a few hundred lines of code [SD97]. The main reason is the simple semantics that formal notations feature. Typically they lack type systems, namespacing, information hiding and modularity. Introducing such concepts complicates these formalisms to the point where the complexity of their semantics catches up with that of conventional programming languages. On the other hand, programming language designers learned many lessons from the formal semantics community. This led to simpler programming languages with clearer semantics. Examples are Standard ML [Sml97] (functional programming with one of the most advanced type and module systems), Oberon [Wir88] and Java [GJS96] (imperative programming, simplicity by omitting unnecessary language features). These languages have proven their scalability in many projects.
Comprehensibility. To successfully employ formal methods, sound mathematical knowledge is presumed. This can be a major hindrance to the introduction of formal methods, as many programmers in the IT community do not have a university degree or a similar mathematical background. Of course, improving education for programmers is important. But this does not completely solve the problem. For describing syntax, the Backus-Naur Form (BNF) [Nau60] or the Extended Backus-Naur Form (EBNF) [Wir77b] in combination with regular expressions [Les75] (micro syntax) has become a standard.
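As an illustration, the syntax of a while statement might be given by an EBNF production, with the micro syntax of its identifiers given as a regular expression. This is a generic, textbook-style example, not taken from any particular language report:

```ebnf
WhileStatement = "while" Expression "do" StatementSequence "end" .
Identifier     = letter { letter | digit } .   (* micro syntax: a regular expression *)
```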
For the specification of semantics there is no such dominant formalism available. Semantics has many facets, which are addressed by different specification formalisms. In widespread use and easy to understand are semi-formal specification languages. The most prominent among them is the Unified Modeling Language (UML). Nota bene, UML is a good example of one single formalism that does not suffice to capture all facets of semantics: Unified does not denote one single formalism, but rather expresses the fact that the four most well-known architects of modeling languages agreed on a common basis. In
fact, UML features over half a dozen different diagram classes. UML is semi-formal, because the different models represented by the different diagrams have
no formal correspondence. This correspondence is either given by name equivalence (in simple cases) or by textual statements in English (complex cases).
Specifications can be separated into two classes: declarative and operational semantics. Declarative semantics describes what has to be done, but not how. Operational semantics, on the other hand, describes in more detail how something is done. To many programmers, the latter is easier to understand, as it is closer to programming languages than declarative descriptions.
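The contrast can be made concrete with sorting. In the following minimal Java sketch (illustrative, not taken from the thesis), the declarative view only states what a correct result is (the output is ordered and a permutation of the input), while the operational view prescribes how to compute it:

```java
import java.util.Arrays;

public class SortSpec {
    // Declarative view: states WHAT a correct result is, not HOW to obtain it.
    public static boolean satisfiesSpec(int[] input, int[] output) {
        int[] a = input.clone(), b = output.clone();
        Arrays.sort(a);
        Arrays.sort(b);
        boolean permutation = Arrays.equals(a, b);   // same multiset of elements
        boolean ordered = true;                      // non-decreasing order
        for (int i = 1; i < output.length; i++)
            if (output[i - 1] > output[i]) ordered = false;
        return permutation && ordered;
    }

    // Operational view: describes HOW (here, insertion sort).
    public static int[] insertionSort(int[] input) {
        int[] a = input.clone();
        for (int i = 1; i < a.length; i++) {
            int key = a[i], j = i - 1;
            while (j >= 0 && a[j] > key) { a[j + 1] = a[j]; j--; }
            a[j + 1] = key;
        }
        return a;
    }

    public static void main(String[] args) {
        int[] in = {3, 1, 2};
        int[] out = insertionSort(in);
        System.out.println(Arrays.toString(out) + " " + satisfiesSpec(in, out));
    }
}
```

Note that the declarative check gives no hint about which sorting algorithm to use; any algorithm whose output passes `satisfiesSpec` is acceptable.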
Efficiency. Why distinguish between a specification and an implementation? If there is a specification available, why go through the error-prone process of implementing it? Basically, a specification and its corresponding implementation can be viewed as two different, but equivalent, descriptions of a problem and its solution. However, in practice, it is very hard to run formal specifications efficiently (compared to C/C++ code) on a computer.
• In declarative specifications, the 'how' is missing. This means that a code generator would have to find out on its own which algorithm should be applied for a certain problem. In general, declarative languages were not designed for execution, but for reasoning about them. It is for example possible to decide on the equivalence of two different declarative specifications. In the PESCA project [Schw97, SD97] this property was used to show the equivalence of an implementation with respect to its formal specification. Using algebraic specifications, the proof of equivalence was performed employing semi-automated theorem-proving tools. In order to compare the implementation with its specification, the implementation had to be transformed into algebraic form as well. This could be done automatically in O(l), where l is the length of the implementation. Because of its operational origin, this transformed specification had to be executed symbolically in order to be compared. During this execution, terms tend to grow quickly, and therefore term rewriting gets slow and memory-consuming. Apart from very simple examples, comparing specifications does not (yet) seem to be a tractable task. This is very unfortunate, as in the context of a VSH a specification search facility would be an interesting feature: given a specification of a certain problem, is there a set of components solving it?
• Operational semantics does not do much better. Although the semantics is already given in terms of algorithms, most operational specification languages feature existential and universal quantifiers. In general, these quantifiers cannot be implemented efficiently (they require a linear search over the underlying universe, the size of which may be unknown).
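The point about quantifiers can be illustrated with a small Java sketch (a toy illustration, not any particular specification language): both quantifiers reduce to a linear search over an explicitly given finite universe, which is exactly what makes them expensive when the universe is large, and impossible to implement this way when its size is unknown:

```java
import java.util.function.IntPredicate;

public class Quantifiers {
    // "forall x in universe: p(x)" -- a linear search for a counterexample.
    public static boolean forAll(int[] universe, IntPredicate p) {
        for (int x : universe)
            if (!p.test(x)) return false;
        return true;
    }

    // "exists x in universe: p(x)" -- a linear search for a witness.
    public static boolean exists(int[] universe, IntPredicate p) {
        for (int x : universe)
            if (p.test(x)) return true;
        return false;
    }

    public static void main(String[] args) {
        int[] u = {1, 2, 3, 4};
        System.out.println(forAll(u, x -> x > 0));      // every element is positive
        System.out.println(exists(u, x -> x % 5 == 0)); // no multiple of 5 present
    }
}
```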
Completeness. Bridging the gap between specification and implementation is
one of the trickiest parts when employing formal methods. It is important to
guarantee that the implementation obeys its specification. As discussed above, this is only possible with a considerable overhead of time and manpower (the problem has to be specified and implemented, and the correspondence has to be proven). But it does not suffice to do this for new components only; it also has to be done for the compiler, the operating system and the underlying machine! On all these levels of specification one will face the above-mentioned problems. Why can no gaps be tolerated?
Specifying (existing) libraries of components is an expensive task, as it binds considerable resources of well-educated (and well-paid) specialists and a not-to-be-underestimated amount of computing power. This would only pay off if consulting on components could be eliminated and replaced by e.g. specification matching. Bridging the gaps between specifications and implementations completely is necessary in order to prevent this automated process from interrupting and asking for user assistance. In their work, Moormann Zaremski and Wing investigated signature matching [MZW95a] and specification matching [MZW95b]. Matching only signatures of functions yielded surprising results: queries normally returned only a few results, which had to be examined by hand. An experienced user, on the other hand, could decide with very little effort which function to use. Full specification matching cannot be guaranteed to be accomplished without user interaction (the underlying theorem prover might ask for directions or proof strategies). Considering that these user interactions are at least as complex as the decision between a handful of pre-selected functions, the question arises whether it makes sense to use formal specifications at all in this scenario.
Openness. Should a component reveal its specification or implementation at all? In many business scenarios, giving away a formal specification or the source code of a component is out of the question, as the company's know-how is a primary asset. Publishing this know-how would have considerable consequences for the business model. The global market for operating systems may serve as an example: the free Linux versus the black-box Windows. Free software products cannot be sold, whereas black-box software cannot be trusted. Of course, there are many different shades of gray: from "Free Software" (as propagated by Richard Stallman, founder of the GNU project; see www.gnu.org/philosophy) and "Open Source" (as propagated by Eric S. Raymond and Bruce Perens, founders of the Open Source Initiative; see www.opensource.org) through "Freeware", "Public Domain", "Shareware" and "Licensed Software" up to buying all rights to a specific piece of software. Both sides, software developer and user, have to decide on a specific distribution model.
Conclusions and implementation decisions. The original Montages (see section 3.3) used ASMs as their specification formalism. The intended closeness to Turing Machines resulted in the ASMs being simple and easy to understand.
However, there are some drawbacks. ASMs lack modularity and have no type system; on the other hand, they have a semantics of parallel execution with fixed-point termination. All these features distinguish them enough from conventional programming languages so as to scare off C++ or Java programmers. As a typical representative of an operational specification language, ASMs focus on algorithmic specifications and support restricted reasoning only.
When deciding on an implementation platform for our Montage Component System, ease of use, clarity of specifications and compatibility were major criteria. The Montage model itself proved to be very useful for language specification; therefore, its core model was chosen as a base for our implementation.
However, ASMs were replaced by Java to reflect the considerations in this section. This also allows many programmers to understand the MCS without first learning a new formalism and thus to see MCS as a tool rather than an abstract formal specification mechanism.
abstract formal specification mechanism. The designers of Java learned many
lessons from the past, avoiding pitfalls of C/C++ but still attracting thousands
of programmers. Other considerable advantages of Java over ASMs are the availability of (standard) libraries and an advanced component model (JavaBeans).
Chapter 3
Composing Languages
Programming languages are not fixed sets of rules and instructions; they evolve over the years (or decades) as they are adapted to changing needs and markets. This evolution might lead to a new, similar language (a new branch in the language family) or just to a new version of the same language.
An example is the Pascal language family, which evolved over the past three decades into many new languages, among them Pascal [JW74], Modula [Wir77a], Modula-2 [Wir82], Modula-3 [Har92], Oberon [Wir88], Oberon-2
[MW91], Component Pascal [OMS97] and Delphi [LisOO]. Another important dynasty of programming languages are the C-like languages: starting with C [KR88] (K&R C, after Kernighan and Ritchie, and ANSI C are distinguished) and evolving to Objective-C [LW93] and C++ [Str97]. The latter underwent several enhancements during its two decades of existence, e.g. the introduction of templates.
Java [GJS96], a relatively new but nevertheless successful language, has
already undergone an interesting evolution: it started as a Pascal-like language (then called Oak) and got its C++-like face in order to make it popular. Compared to C++ or Delphi, it lost many features, for example the hybrid character (all methods have to be class-bound), address pointers, operator overloading, multiple inheritance, and templates. Java's success soon led to new enhancements of the language (inner classes) and to new dialects (Java 1.0 [GJS96], Java 1.1 [GJSB00], JavaCard [CheOO]).
Some basic concepts are the same in all the above-mentioned programming
languages, e.g. while loops, if-then-else statements or variable declarations.
These constructs may have different syntax but their semantics remains the
same. All imperative programming languages basically differ in the number of
language constructs they offer to the programmer. For example, C++ can be
described as C plus additional language constructs. In general, programming
languages are simply composed of different language constructs.
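As an illustration, the same while loop reads differently in a Pascal-style and a C-style syntax, yet both denote the same iteration semantics (a generic side-by-side sketch, not quoted from any language report):

```
(* Pascal-style *)           /* C-style */
WHILE i < 10 DO              while (i < 10) {
  i := i + 1;                  i = i + 1;
                             }
```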
3.1 Partitioning of Language Specifications
A programming language specification is usually given in the form of text and consists of one or several files containing specifications given in a specialised notation or language (in fact, such specification languages are good examples of "little languages" or DSLs). Well-known examples are (E)BNF for syntax specification [Wir77b] and regular expressions as they are used in Lex [Les75]; even the make utility [SMOO] controlling compilation (of the compiler or interpreter) can be mentioned here. Usually a language specification is structured into different parts, each corresponding to one particular part of the compiler, and often each of these parts is given in a separate notation/language. From a software engineering point of view this makes sense, as such partitionings of specifications allow for separate and parallel development, and for reuse, of different parts of the compiler.
In principle, a language specification can be partitioned/modularised in two different ways:
1. It is split into transformation phases like scanning, parsing, code generation, etc. This corresponds to the well-known compiler architecture; we will call it the horizontal partitioning scheme.
2. It is described language construct by language construct. Each of these construct descriptions (we call them Montages) contains a complete specification from syntax to static and dynamic semantics. We call this approach the vertical partitioning scheme.
We will consider the pros and cons of both approaches in the following sections.
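The vertical scheme can be sketched as a Java interface in the spirit of our implementation; the names below (LanguageConstruct, NumberLiteral) and the method signatures are illustrative assumptions, not the actual MCS API:

```java
import java.util.HashMap;
import java.util.Map;

public class Vertical {
    // One module per language construct: syntax, static semantics and
    // dynamic semantics are bundled together (the vertical scheme).
    public interface LanguageConstruct {
        String syntax();                              // EBNF production
        boolean checkStatic(Map<String, String> env); // static semantics check
        int evaluate(Map<String, Integer> state);     // dynamic semantics
    }

    // A Montage-like module for a toy construct: a number literal.
    public static class NumberLiteral implements LanguageConstruct {
        private final int value;
        public NumberLiteral(int value) { this.value = value; }
        public String syntax() { return "Number = digit {digit}."; }
        public boolean checkStatic(Map<String, String> env) { return true; }
        public int evaluate(Map<String, Integer> state) { return value; }
    }

    public static void main(String[] args) {
        LanguageConstruct c = new NumberLiteral(42);
        System.out.println(c.syntax() + " -> " + c.evaluate(new HashMap<>()));
    }
}
```

Adding a new construct then means adding one such module, rather than touching every compiler phase; this is exactly the trade-off discussed in the following sections.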
3.1.1 Horizontal Partitioning
The conventional approach of partitioning a compiler into its compilation
phases is very successful. It has been established in the 1960s along with the
development of regular parsers. The idea was to split the compilation process
into two independent parts: the front-end and the back-end (Fig. 5). The
front-end is concerned with scanning, parsing and type checking while the
back-end is responsible for code generation.
Advantages: This partitioning is well suited for large languages with highly optimized compilers, and it meets exactly the needs of compiler constructors like Borland and Microsoft, which have numerous languages on offer. The separation of front- and back-end allows them to build several front-ends for different languages, all producing the same intermediate representation.
Figure 5: The typical phases of a compiler partition its design horizontally
From this intermediate representation, different back-ends can generate code for different
operating systems and microprocessors. The intermediate representation serves as a pivot in this design and reduces the complexity of managing all the languages and target architectures from O(l·t) to O(l+t), where l is the number of languages and t the number of target platforms. For example, five languages on four target platforms require nine front- and back-ends instead of twenty full compilers.
This kind of modularity allows for fast development of compilers for new languages or for existing languages on new platforms. Only specific parts of the compiler have to be implemented anew; the intermediate representation is the pivot in this design. Even within the front-end and back-end phases, modules might be exchanged. An example is the code optimizer, which could be replaced by an improved version without altering the rest of the compiler.
The availability of tools supporting the horizontal partitioning approach
simplifies the task of building a compiler (see Related Work in section 6.4 for an overview). Most of these tools support only front-end construction, but some also offer support for code generation. Code optimization in particular is hard to automate due to the variety of hardware architectures. Horizontal partitioning allows optimization to be performed on the entire program. Many optimization techniques - like register colouring, peep-hole optimization or loop unrolling - cannot be applied locally or per construct, as their mechanisms are
² Here, the complexity of managing all the languages and target architectures is meant.
based on a more global scale. For example, register colouring will examine a block or a whole procedure at once. As everything can be done within the same phase, no accessibility problems (due to encapsulation and information hiding) have to be solved.
Disadvantages: Each module of a traditional compiler contains specifications for all language constructs corresponding to its phase. That is, a module specifies only a single aspect of a construct (e.g. static semantics), but it does this for all constructs of the language. The complete specification of a single language construct is thus spread over all phases. Therefore, horizontal partitioning is not well suited to experimenting with a language, i.e. to generating various versions to gain experience with its look and feel. In general, applying even minor changes to a language can be very costly, especially if the changes affect all phases of the compiler. This is usually the case if new language features and/or new language constructs are added. There are roughly three levels of impact a change can have on a horizontally partitioned compiler:
1. Only a single phase is affected: Examples would be minor changes in the syntax, like replacing lower-case keywords with upper-case keywords, or an improved code optimizer.
2. Some, but not all phases are affected: This is the case if language constructs are introduced that can be mapped to the existing intermediate format. As an example, consider the introduction of a REPEAT-UNTIL loop into a language that already knows a WHILE loop. In this case, the change will have an impact on the front-end phases, while the back-end remains unchanged.
3. All phases have to be changed: This is the case if the changes in the language constructs cannot be mapped to the existing intermediate representation. An example is the introduction of virtual functions into C in order to enhance the language with dynamic binding features. The code generator now has to emit code that determines the function entry point at run-time instead of generating code for procedure calls statically.
All these changes have in common that they potentially have an impact on the whole language. Even if only a single phase is affected, the change has to be carefully tested for undesired side-effects, which cannot be precluded due to the lack of encapsulation of language construct specifications within a phase.
3.1.2 Vertical Partitioning
Instead of modularizing a compiler along compilation phases, each language construct is considered a module (Fig. 6). Such a module - we call it a Montage - contains a complete specification of this construct, including syntax as well as static and dynamic semantics.
Figure 6: Vertical partitioning along language constructs plugged into the Montage Component System
Vertical partitioning of a compiler is very similar to the way beginners approach their first translations³. They try to identify the main phrases of a sentence. Then each phrase is parsed into more elementary ones until, finally, they end up with single words. Now each word can be translated and the process is reversed, combining single words into phrases and sentences. Our approach supports this idea in a similar way. A program is subdivided into a set of language constructs, each of them specified in a single module. Once the program is broken up into these units, translation or execution is simple, as only a small part of the whole language has to be considered at once. Then these modules are re-combined using predefined interfaces. Section 3.3 will elaborate in more detail how this is done.
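The idea of one module per construct, covering all compilation aspects, can be sketched as an interface. This is a hypothetical sketch, not MCS's actual API: `MontageModule`, `AstNode` and `WhileMontage` are illustrative names only.

```java
import java.util.List;

public class VerticalPartitioning {
    // Minimal AST node; in MCS, nodes are instances of Montage classes.
    record AstNode(String construct, List<AstNode> children) {}

    // One module per language construct, bundling syntax as well as
    // static and dynamic semantics (illustrative interface).
    interface MontageModule {
        String ebnfRule();                       // syntax
        void checkStaticSemantics(AstNode n);    // static semantics
        void execute(AstNode n);                 // dynamic semantics
    }

    // Example: a WHILE loop packaged as a single module.
    static class WhileMontage implements MontageModule {
        public String ebnfRule() {
            return "While ::= \"WHILE\" Expr \"DO\" StmtSeq \"END\"";
        }
        public void checkStaticSemantics(AstNode n) { /* condition must be boolean */ }
        public void execute(AstNode n) { /* loop while the condition holds */ }
    }

    public static void main(String[] args) {
        System.out.println(new WhileMontage().ebnfRule());
    }
}
```

Replacing or extending the WHILE construct touches only this one module, which is the locality argument made below.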
Advantages: A vertical partitioning scheme is very flexible with respect to changes in a language. Modifications will be local, as they usually affect only a single module. As each module contains specifications for all aspects of the compilation process for a certain construct, unintended side-effects with other modules are very unlikely. This is in contrast to the horizontal partitioning scheme, where side-effects within a phase potentially affect the whole language. As an example, we already mentioned the introduction of virtual functions in
³ This applies both to the translation of natural languages and to the act of understanding a program text when reading.
C, affecting all phases of the compiler. As the phases do not feature any form of further modularization, unintended side-effects can easily be introduced.
Component frameworks like JavaBeans or COM make it possible to build new systems by composing pre-compiled components. Vertical partitioning supports this approach, as Montages are compiled components which can be deployed in binary form. This in turn opens many possibilities for marketing language components. As they do not have to be available in source form - which is the case with conventional, horizontally partitioned compiler specifications - the developer does not have to give away any know-how about the internals of the language components.
Not only the developer profits from pre-compiled language components; the user does so too. Combining them and testing the newly designed languages is much less complex than testing all phases of a conventional compiler. New groups of users now have access to language design, as the learning curve is flattened considerably. In the best case, a new language can be composed completely from existing language components. Normally, however, language construction is a combination of reusing components and implementing new ones. Even in this case, the amount of effort is reduced, as with the availability of a language component market, the need for implementing new components decreases over time.
Disadvantages: Construction of an efficient and highly optimizing compiler will be very difficult with our approach. Optimization relies on the capability to overview a certain amount of code, which is exactly what vertical partitioning is not about. Ease of use and flexibility of deployment are of primary interest in our system.
For mainstream programming languages that have a huge community, efficiency can be achieved by using conventional approaches, preferably in combination with our system: language composition supports fast prototyping and is used to evaluate different dialects of a new language. When this process converges to a stable version, an optimizing compiler can be implemented using the phase model.
3.1.3 Static and Dynamic Semantics of Specifications
A language specification contains static and dynamic semantics in a similar
manner as a program does (in fact, language specifications often are programs:
either compilers or interpreters). The partitioning schemes as described above
are part of the static semantics of language specifications. They define how
specifications can be composed, not how the specified components interact in
order to process a program.
It is important to distinguish between the structure of the specification and the structure of the transformation process that turns a program text into an executable piece of code. In a horizontal partitioning scheme, modularization is done along the transformation phases, i.e. each phase completes before the next can start. In other words, the partitioning scheme and the control flow during compilation are the same.
In a vertical partitioning scheme, a program text is transformed in a similar manner as in conventional compilers: phase after phase. It is simply a causal necessity to parse a program text before static semantics can be checked, which in turn has to be done before code generation. But control flow switches between the specification modules during all phases of compilation.
3.2 Language Composition
A note to those familiar with compiler-compilers: usually these tools rest on a specification of the translations. A Montage also specifies the behaviour of a language construct, but in a purely operational manner. Therefore, translations are programmed rather than specified.
3.2.1 The Basic Idea
The modularity available in the specification of a programming language is destroyed by most compiler construction tools due to their horizontal partitioning scheme (for a detailed discussion see section 3.1.1). The modularity and compositionality of a programming language is therefore only available to compiler implementors, but not to the users of a compiler, i.e. to the programmers. Of course, the programmer is free to use only a subset of a given language, but he will never be able to extend the language with his own constructs or to recombine different languages to form a more powerful and problem-specific language.
The Montage Component System provides these features to the programmer. It allows language constructs to be specified separately. From such a specification, a language component can be generated that is pluggable into other language components, and therefore suited to building new languages on a plug-and-play basis.
In contrast to existing compiler-compilers, reuse of specifications can be applied at the binary level rather than at the source text level. MCS generates compiled software components from the specifications. These can be distributed independently of any language. Each component is provided with a set of requirements and services that can be queried and matched to other components. Nevertheless, pre-compiled language components need a certain flexibility, e.g. the syntax has to be adaptable, or identifiers need to be renamed in order to avoid naming collisions.
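The querying and matching of requirements and services can be sketched as a simple set computation. This is an assumption-laden sketch - `Component`, `provides`, `requires` and `unmetRequirements` are invented names, not MCS's interface - but it shows the check the text describes: a composition is complete when every requirement is met by some provided service.

```java
import java.util.HashSet;
import java.util.Set;

public class ComponentMatching {
    // Hypothetical model: each pre-compiled language component declares
    // the symbols it provides and those it requires from other components.
    record Component(String name, Set<String> provides, Set<String> requires) {}

    // Collect everything provided, then report requirements left unmet.
    static Set<String> unmetRequirements(Set<Component> composition) {
        var provided = new HashSet<String>();
        for (Component c : composition) provided.addAll(c.provides());
        var missing = new HashSet<String>();
        for (Component c : composition)
            for (String r : c.requires())
                if (!provided.contains(r)) missing.add(r);
        return missing;
    }

    public static void main(String[] args) {
        var whileC = new Component("While", Set.of("While"), Set.of("Expr", "StmtSeq"));
        var exprC  = new Component("Expr",  Set.of("Expr"),  Set.of());
        System.out.println(unmetRequirements(Set.of(whileC, exprC))); // [StmtSeq]
    }
}
```

A tool built on such a check can then prompt the user for the missing adaptations, as described in the next paragraph.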
Creation of a new programming language is a tool-assisted process. It is possible to build a language from any combination of newly specified and existing components. The latter could, for example, be downloaded from various sources on the Internet. The system will check that all requirements of the components are met and prompt the user for missing adaptations. If there are no more reported errors and warnings, the new language is ready to use, i.e. an interpreter is available.
How does such a system work in detail? An in-depth description of the
Montage Component System is given in chapter 4 and details about its design
and implementation in chapter 5. To understand our approach in principle,
section 3.3 below should be read first, as it provides an introduction to the
Montage approach.
3.2.2 On Benefits and Costs of Language Composition
The established methods have successfully been deployed for over two decades.
They produce stable, efficient and well understood compilers and interpreters.
Unfortunately, they are rather rigid when considering changes. Especially during the design phase of a new language, greater flexibility is desirable. For
example, it should be possible to produce and test dozens of variants of a new
language before it is released.
The most important aspect of language composition is that pre-compiled
components can be reused. The advantages can be summarised as follows:
1. Language composition is now accessible to non-experts in the field of programming language construction. Pre-compiled components will be
designed, developed and distributed by experts. Languages composed of
such components profit from the built-in expert knowledge.
2. The development cycle for a new language can be drastically reduced. On
the one hand because the pre-compiled components need no further testing
and on the other hand due to the abbreviated specification-generation-compilation cycle: the pre-compiled components need no further compilation.
3. Reuse is done on a binary level, reducing the possibility of text copying
errors. In combination with the limited impact that a component has on the
whole language, this again results in a more reliable and flexible language
design method.
This list may give rise to some questions and objections, which need to be discussed:
Who composes languages? Programming language design and implementation will be simplified such that it becomes applicable for non-experts in the field of programming language construction. Is this desirable? Should this domain not be reserved for experts?
Similar debates were held about operator overloading (e.g. in C++). Should the programmer be given the opportunity to alter the meaning of parts of the programming language? Although language construction is much more powerful than operator overloading (which does not add any fundamental power to the language - it is just a notational convenience), the basic question remains the same: should the programmer have the same rights and power over the language as the language implementor?
The effort of creating a new language is still great enough to make one think it through thoroughly before indulging in language composition. Of course, there will
thoroughly, before one indulges in language composition. Of course there will
always be designs of lower quality. But these will eventually vanish, as they will
not be convincing enough to be reused in further languages. Only the best designs will survive in the long term, because their components will be reused in
many different languages. Simonyi compares this process with the survival of
the fittest in biological evolution [Sim96].
We foresee two main areas where vertically partitioned systems - like MCS - are particularly suitable: education on the one hand, and the design and implementation of domain-specific languages (DSLs) on the other:
1. In education, an introduction to programming can be taught using a very simple language in the beginning, which is then refined and extended stepwise. This solves a typical dilemma in programming courses: teaching either a mainstream programming language from the beginning or starting with a didactically more suitable language. The first approach faces the problem that one has to cope with the whole complexity of e.g. C++ from the very beginning. The second approach wastes a lot of time introducing a nice language which will not be used later on. A further effort then has to follow to teach the subtleties of a mainstream language.
Using MCS, a teacher can start teaching C++ with a subset of the language that is simple and safe. Then, step by step, he can refine and extend the language, using new Montages or refined versions of existing ones. During the whole course, basically the same language can be taught; the transitions from one complexity level to the next are smooth.
The system can be used to its full flexibility: for introductory courses, ready-made Montages are available; they only need to be plugged together. A teacher does not need to have any knowledge of compiler construction, and the students will use only the end product, namely the compiler or interpreter; they do not need to understand our system. In more advanced courses,
teachers may explain details of a language construct by showing the corresponding Montage. And in compiler construction courses, the system can be used by the students to build new languages themselves.
2. Domain-specific languages are typically small languages that have a limited application area and a small community of programmers. Well-known examples are Unix tools such as sed, awk, make, etc. [Sal98]. In some cases, they are employed only a single time (e.g. to control data migration from a legacy system to a newer one). In such cases it is not worth constructing optimizing compilers or inventing a highly sophisticated syntax. Often, such languages have to be implemented within very tight time bounds. In all these scenarios, language composition offers an interesting solution. Creating a new language might be done within a few hours. Reuse of existing language components simplifies development and debugging and reduces the fault rate in this process [Kut01].
Is the flexibility of phase-model approaches sufficient for language development? It is possible to produce dozens of compilers with lex and yacc, too. Of course, it is possible to generate compilers for numerous variations of a language. But normally this is a fairly sequential and time-consuming process, because careful testing has to guarantee that there are no unwanted side-effects. Especially in education and DSL design, the flexibility available in traditional phase-model approaches might not be sufficient.
Student exercises in language design or compiler construction are a good example to elaborate on this statement. During such a course, students normally have to implement a running compiler for a simple language in the exercises. The lack of sufficient modularization within a compiler phase forces the student into an "all or nothing" approach. He has to specify and generate the phase in its full complexity (i.e. all language constructs at once) in order to have a running system that can be tested. There are a myriad of possibilities for making mistakes. This is not only discouraging for students but also for their tutors, who have to assist them. Montages could be used in such courses to improve
modularity in the student projects. Once a Montage has been compiled successfully, it can be reused without alteration in future stages of the student compiler. Changes in one Montage have limited effects on others, and thus debugging becomes easier for both student and tutor. Success in learning is one of the most motivating factors in education [F+95]. The time gained can be used to deepen the student's insight into the subject, to broaden his knowledge by being able to cover more subjects, or simply to reduce stress in education.
Experimenting with a language will normally improve its design and its expressiveness, but experimenting takes time. The reduced development cycle
time is another reason to prefer vertically partitioned systems over phase-model approaches. As the time it takes to generate a single version of a language is reduced, it is possible either to develop languages faster or to generate more variants of a language before deciding on one. Faster development is interesting to industrial DSL designers, whereas experimentation is of advantage to both students and professionals. Students will get a better understanding if they are able to easily alter (parts of) a language, and professionals will profit from the experience they gain during experimentation.
How about pre-compiled compiler phases? Wouldn't this improve the performance of established approaches? Pre-compiled compiler phases would only make sense in a few areas, and in general they would even complicate language specifications, as the following examples illustrate. Changing the syntax of a while statement from lower- to upper-case keywords would be easy, as only the scanner phase is involved. But suppose a simple goto language that shall be extended with procedures. Changes in the scanner and parser phase are obvious, but the code generator also needs to be redesigned, as the simple goto semantics probably would not suffice to model parameter passing and subroutines efficiently. Pre-compiled compiler phases would be a hindrance in this case. The language designer would need to have access to the sources of the pre-compiled parts in order to re-generate them. With MCS, the same problem would be solved by introducing some Montages that specify the syntax and behaviour of procedures. Changes to existing Montages can be necessary as well, but they can be implemented elegantly by type-extending existing Montages.
No testing of pre-compiled components? Of course, the interaction of components in a newly composed language has to be tested. But these tests happen on a higher level of abstraction, closer to the language design problem. Testing of component internals does not have to be considered any more. Components interact only through well-defined interfaces, which restrict the occurrence of errors, simplify debugging, and accelerate testing in general.
3.3 The Montages Approach
Our work is based on Montages [KP97b, AKP97], an approach that combines graphical and textual specification elements. Its underlying model is that of abstract state machines [Gur94, Gur97]. The following overview introduces the basic concepts and ideas of Montages in order to provide a better understanding of the following chapters. Readers familiar with Montages may skip this section
and continue with chapter 4. Detailed information on Montages can be found in [KP97b, AKP97] and in Kutter's thesis [Kut01].
3.3.1 What is a Montage?
A complete language specification is structured into specification modules, called Montages. Each Montage describes a construct of a programming language by «extending the grammar to semantics». A Montage consists of up to five components partitioned into four parts (Fig. 7 shows Java's conditional operator):
1. Syntax: Extended Backus-Naur Form (EBNF) is used to provide a context-free grammar of the specified language L. A parser for L can be generated from the set of EBNF rules of all Montages. Furthermore, the rules define in a canonical way the signature of abstract syntax trees (ASTs) and how parsed programs are mapped onto an AST. The syntax component is mandatory; the following components are all optional.
2. Control Flow and Data Flow Graph: The Montage Visual Language (MVL) representation has been explicitly devised to extend EBNF rules to finite state machines (FSMs). Such a graph associated with an EBNF rule basically defines a local finite state machine. Each node of the AST is decorated with a copy of the FSM fragment given by its Montage. The references to descendants in the AST define an inductive construction of a globally structured FSM. Control flow is represented by dashed arrows. Data may be stored in attributes of Montage instances (in our example, the attributes staticType and value are defined for every Montage).
Control flow always enters a Montage at the initial edge (I) and exits at the terminal edge (T). Control flows may be attributed with predicates. For example, one control flow leaving the branching node in Fig. 7 shows the predicate cond.value = true. Branching of control flow may only occur in terminal nodes. This is due to the condition that there is only one control flow leaving each Montage (T). The default control flow is indicated by the absence of predicates.
3. Static semantics: A transition rule that does static analysis can be provided. Such rules may fire after successful construction of an AST for a given program.
In Fig. 7, the static type of the conditional operator is determined during static analysis, which is - in this case - not a trivial task. To enhance readability, macros can be used (e.g. CondExprType).
ConditionalExpression ::= ConditionalOrOption "?" Expression ":" ConditionalOption

[control and data flow graph: I → S-ConditionalOrOption → result → T, branching to S-Expression and S-ConditionalOption]

staticType := CondExprType(S-Expression, S-ConditionalOption)

condition S-ConditionalOrOption.staticType = "boolean"

@result:
if (S-ConditionalOrOption.value) then
    value := S-Expression.value
else
    value := S-ConditionalOption.value
endif

Figure 7: Montage for the Java conditional expression
4. Conditions: The third part contains post-conditions that must be established after the execution of static analysis. In our example, static type checking occurs.
5. Dynamic Semantics: Any node in the FSM may be associated with an Abstract State Machine (ASM) [Gur94, Gur97] rule. This rule is fired when the node becomes the current state of the FSM. ASM rules define the dynamic semantics of the programming language.
In the fourth section of Fig. 7, the ASM rule specifies what happens at run-time when a conditional operator is encountered. Rules in this section are always bound to a certain node in the FSM. The header of each rule (here: @result) defines this association. Note that there may also be predicates defined in the graphical section which are evaluated at runtime.
3.3.2 Composition of Montages
The syntax of a specified language is given by the collection of all EBNF rules. Without loss of generality, we assume that the rules are given in one of the two following forms:

A ::= B C D    (1)
E ::= F | G | H    (2)
The first form defines that A has the components B, C and D, whereas the second form defines that E is one of the alternatives F, G or H. Rules of the first form are called characteristic productions and rules of the second form are called synonym productions. Analogously, non-terminals appearing on the left-hand side of characteristic rules are called characteristic symbols and those appearing in synonym rules are called synonym symbols. One characteristic symbol is marked as the start symbol. It must be guaranteed (by tool support) that each non-terminal symbol appears as the left-hand side of exactly one rule.
The two forms of EBNF rules also determine how a language specification
can be constructed from a set of Montages by putting them together.
1. A Montage is considered to be a class⁴ whose instances are nodes in the abstract syntax tree. Terminal symbols on the right-hand side of the EBNF, e.g. identifiers or numbers, are leaf nodes of the AST (represented by ovals in MVL); they do not correspond to Montages. Non-terminals, on the other hand, are (references to) instances of other Montage classes. Such attributes are called selectors and are represented by a rectangle and a prefixed «S-». Each non-terminal in a Montage may have at most one incoming and one outgoing control flow arrow. This rule allows Montages to be composed in a simple way: the referenced Montage's I and T arrows are connected with the incoming and outgoing control flow arrow respectively.
2. When using sub-typing and inheritance, synonym symbols can be considered as abstract classes. They cannot be instantiated but provide a common base for their right-hand-side alternatives.
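The two mapping rules above can be sketched directly in Java class declarations. The grammar fragment used here (Stmt as a synonym of Assign and While) is an invented example, not one of the thesis's Montages, but the class structure follows the stated scheme: synonym symbols become abstract classes, characteristic symbols become concrete classes whose selector fields reference other Montage instances.

```java
public class GrammarAsClasses {
    // Synonym production  Stmt ::= Assign | While  maps to an abstract class:
    abstract static class Stmt {}        // synonym symbol: no instances

    // Characteristic productions map to concrete Montage classes whose
    // selector fields ("S-...") reference the right-hand-side non-terminals.
    static class Assign extends Stmt {
        String sIdent;                   // terminal-derived leaf
        Expr   sExpr;                    // selector: another Montage instance
    }
    static class While extends Stmt {
        Expr sCond;
        Stmt sBody;                      // any Stmt alternative fits here
    }
    static class Expr {}                 // characteristic symbol

    public static void main(String[] args) {
        Stmt s = new Assign();           // alternatives share the abstract base
        System.out.println(s instanceof Assign);
    }
}
```

Sub-typing gives synonym symbols their intended role for free: wherever a Stmt is expected, any of its alternatives may appear.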
After an AST has been built from a program input, static semantics rules may be fired. The idea is to «charge» all rules simultaneously and trigger their firing by the availability of all occurring attributes (i.e. ≠ undef). In our example the static semantics rule can only be fired when all referenced attributes become available, i.e. the attributes staticType, isConst⁵ and value of S-Expression and S-ConditionalOption are defined. As soon as all attributes for some Montage become available, the firing begins. In this process further attributes may be computed, and so the execution order is determined automatically and corresponds to a topological ordering of rules according to their causal relations. This approach was adopted from [Hed99]. Eventually all static semantics rules are fired. If not, an error occurs, either because execution of some rules was faulty, or because one or more attributes never got defined during the firing process. Usually the latter indicates design flaws in the language specification.
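The charging-and-firing scheme can be sketched as a small scheduler. This is a simplified, hypothetical sketch (rules reduced to read/write attribute sets; `Rule` and `fireAll` are invented names): each rule fires as soon as all attributes it reads are defined, possibly defining further attributes, so the execution order emerges as a topological order of the causal relations, and leftover rules signal undefined attributes.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class RuleFiring {
    // A static semantics rule, abstracted to the attributes it reads and writes.
    record Rule(String name, Set<String> reads, Set<String> writes) {}

    // Fire every rule whose inputs are available; repeat until no progress.
    static List<String> fireAll(List<Rule> rules, Set<String> defined) {
        var order = new ArrayList<String>();
        var pending = new ArrayList<>(rules);
        boolean progress = true;
        while (progress) {
            progress = false;
            for (var it = pending.iterator(); it.hasNext(); ) {
                Rule r = it.next();
                if (defined.containsAll(r.reads())) {   // all inputs != undef
                    defined.addAll(r.writes());         // may enable other rules
                    order.add(r.name());
                    it.remove();
                    progress = true;
                }
            }
        }
        if (!pending.isEmpty())   // some attribute never became defined
            throw new IllegalStateException("design flaw: " + pending);
        return order;
    }

    public static void main(String[] args) {
        var rules = List.of(
            new Rule("condType", Set.of("Expr.staticType"), Set.of("Cond.staticType")),
            new Rule("exprType", Set.of(), Set.of("Expr.staticType")));
        System.out.println(fireAll(rules, new HashSet<>())); // [exprType, condType]
    }
}
```

Note how the order is not programmed anywhere: exprType fires first simply because condType's input is not yet defined.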
4 in the sense of e.g. a Java class
5 used in the macro CondExprType that is not shown here.
Another approach would be to predetermine the execution order, e.g. as a pre-order traversal of the AST. Experience has shown that one predetermined traversal often does not suffice. This problem can be solved, but it leads to clumsy static semantics rules, because they have to keep track of «passes».
Once static analysis has successfully terminated, the program is ready for execution. Control flow begins with the start symbol's I arrow. When encountering a state, its dynamic semantics rule is executed. Control is passed to the next state along the control flow arrow whose predicate evaluates to true. Such predicates are evaluated after executing the rule associated with the source node. The absence of a predicate means either true (only one control flow arrow) or the negated conjunction of all other predicates leaving the same source node.
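The transition selection just described can be sketched as follows. The names `Arrow` and `next` are hypothetical; an unlabelled (default) arrow is modelled as a null predicate and is taken only when no labelled predicate holds, matching the "negated conjunction" convention.

```java
import java.util.List;
import java.util.function.BooleanSupplier;

public class ControlFlow {
    // A transition: target state plus an optional predicate
    // (null = default arrow).
    record Arrow(String target, BooleanSupplier predicate) {}

    // Predicates are evaluated after the source node's rule has executed;
    // the default arrow stands in for the negated conjunction of the rest.
    static String next(List<Arrow> outgoing) {
        for (Arrow a : outgoing)
            if (a.predicate() != null && a.predicate().getAsBoolean())
                return a.target();
        for (Arrow a : outgoing)
            if (a.predicate() == null)
                return a.target();
        throw new IllegalStateException("no transition enabled");
    }

    public static void main(String[] args) {
        boolean condValue = false;                        // cond.value after the rule fired
        var arrows = List.of(
            new Arrow("S-Expression", () -> condValue),   // predicate: cond.value = true
            new Arrow("S-ConditionalOption", null));      // default arrow
        System.out.println(next(arrows)); // S-ConditionalOption
    }
}
```

With cond.value = true the labelled arrow to S-Expression would be taken instead, mirroring the branching in Fig. 7.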
This scheme of local control flow has its limits, e.g. when describing (virtual) method calls (Fig. 8). Neither is the call target local to the calling Montage (with respect to the parse tree), nor can it be determined statically. In such cases, non-local jumps can be used. They are distinguished from normal control flows by the presence of a term which is evaluated at runtime to compute the jump target. Moreover, the box representing the jump target is not a selector and is therefore not marked as such. In Fig. 8 the MethodDeclaration box represents the class MethodDeclaration and the term Dispatch computes the appropriate instance at run-time.
[Screenshot residue omitted. The figure shows the EBNF rule for MethodInvocation with its argument list, a non-local jump whose term Dispatch(S-NameIdent.Name, dynamicType) computes the target MethodDeclaration instance (late binding), the static semantics rule staticType := LookUp(S-NameIdent.Name).staticType, and a condition requiring that LookUp(S-NameIdent.Name) ≠ undef and that each actual parameter's static type matches the corresponding formal parameter's static type.]
Figure 8: Montage for method invocations (screenshot of the Gem/Mex tool)
The work most closely related to Montages is the MAX system [PH97a]. Like Montages, MAX builds upon the ASM case studies for the dynamic semantics of imperative programming languages. In order to describe the static aspects of a language, the MAX system uses occurrence algebras, a functional system closely related to ROAG [Hed99].
A very elegant specification of Java using ASMs can be found in [BS98].
This specification abstracts from syntax and static semantics but focuses on
dynamic semantics. Its rules are presented in less than ten pages.
Chapter 4
From Composition to Interpretation
This chapter describes the concepts behind the Montage Component System. Algorithms and data structures are discussed in an abstract form which is neutral with respect to a concrete implementation in any specific language or component framework. Implementation details are discussed in the next chapter. Readers not familiar with the Montage approach should first read the preceding section 3.3 in order to get an introduction. More in-depth information can be found in [AKP97, Kut01]. These publications give a well-founded description of Montages. Its mathematical background is based on abstract state machines (ASMs).
For reasons discussed in section 2.3, we focus on an implementation using a mainstream programming language. Simplicity, composability and ease of use are our main goals, and, in combination with our different formalism (Java instead of ASMs), this explains why our notion of a Montage differs in some details from the original definition. Therefore, we first present some definitions that render the notion of a Montage in MCS. These definitions are implementation independent, although they are given with an object-oriented implementation and a component framework (such as those discussed in section 6.5) in mind.
After an overview of the process of transforming Montage specifications into an interpreter, its single phases are described in detail throughout the rest of this chapter. Deviations of our approach from the original Montage approach are indicated at the appropriate places.
4.1 What is a Montage in MCS?
The following definitions are provided with regard to an implementation and reflect the necessary data structures that are used to implement the system. We will refer to these definitions and give corresponding class declarations in Java when discussing the implementation in the next chapter. Montages, although entities of composition, can never be executed on their own. Only as members of a language can they be deployed conveniently. Therefore we start by defining our notion of a language.
4.1.1 Language and Tokens
A language L = (M, T) consists of a set of Montages M and a set of tokens T. A token tok = (regexp, type) is defined as a pair of a regular expression regexp defining the micro syntax and a type indicating into which type the scanned string is to be converted. Tokens are either relevant, t ∈ Trlv, i.e. they will be passed to the parser, or they are skipped¹, t ∈ Tskip. Trlv and Tskip are disjoint sets: Trlv ∩ Tskip = ∅. T denotes the set of all tokens of a language: T = Trlv ∪ Tskip.
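These definitions translate almost one-to-one into data structures. The following sketch shows one possible Java rendering; class and field names are our own illustration, not necessarily those used in MCS:

```java
import java.util.LinkedHashSet;
import java.util.Set;

// A token: a regular expression for the micro syntax plus a target type.
class Token {
    final String regexp;
    final Class<?> type;      // type the scanned string is converted to
    final boolean relevant;   // relevant tokens go to the parser, others are skipped

    Token(String regexp, Class<?> type, boolean relevant) {
        this.regexp = regexp;
        this.type = type;
        this.relevant = relevant;
    }
}

// A language: a set of Montages M and a set of tokens T = Trlv ∪ Tskip.
class Language {
    final Set<Object> montages = new LinkedHashSet<>();  // Montage class defined later
    final Set<Token> tokens = new LinkedHashSet<>();

    Set<Token> relevantTokens() {
        Set<Token> r = new LinkedHashSet<>();
        for (Token t : tokens) if (t.relevant) r.add(t);
        return r;
    }
}
```

Note that the disjointness Trlv ∩ Tskip = ∅ holds by construction here, since each token carries a single relevance flag.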
4.1.2 Montages
A Montage m = (sr, P, cfg) can be defined as a triple consisting of a syntax rule sr, a set of properties P and a control flow graph cfg. Fig. 9 shows a graphical representation of a Montage.
A syntax rule sr = (name, ebnf) consists of a name (the left hand side or the production target) and a production rule ebnf (the right hand side). The elements of a production rule which are of interest in our context are terminal symbols, nonterminal symbols, repetition delimiters (braces, brackets and parentheses) and synonym separators (vertical lines). The complete definition of the Extended Backus-Naur Form can be found in [Wir77b].
A property p = (name, value, Ref) is basically a named variable containing some value. Associated with each property is a rule specifying how its initial value can be computed. In MCS, Java block statements (see [GJS96] for a definition) are used to express such rules. They may contain several (or no) references r ∈ Ref to other properties, possibly to properties of other Montages. A reference represents a read access; writing to a property within an initialisation rule is prohibited. (Section 4.6 describes the use of properties in detail.)
A control flow graph is a united data structure as it contains both an abstract syntax tree fragment and a control flow graph. Thus it can be described as a triple cfg = (N, E_ast, E_cf) containing a set of nodes N, a set of tree edges E_ast and a set of control flow edges E_cf. A node can either be a nonterminal, a repetition or

¹ E.g. whitespaces and comments will not be of interest to the parser and are skipped.
4.1 What is a Montage in MCS? 41
Syntax Rule:
    Example ::= A {B} "text".

Control Flow Graph:
    [nested boxes showing the AST fragment with control flow edges]

Properties:
    name  type     rule
    X     int      OtherName.X + DifferentName.Z
    Y     boolean  true

Actions:
    @n:
    {
        int i = 0;   // local variable declaration
        X = i;       // access property
        // additional Java statements
    }

Figure 9: Schematic representation of a Montage.
    Syntax Rule: an EBNF production
    Control Flow Graph: united representation of an AST fragment and control flow information
    Properties: variables initialized during static semantics evaluation
    Actions: dynamic semantics
an action. In the tree structure, repetitions may only occur as inner nodes, actions only as leaf nodes; nonterminals may be both. If a nonterminal node is an inner node, then all its subtrees are part of the Montage that the nonterminal represents. The graphical representation of control flow graphs uses nested boxes to display the tree structure. This allows laying out the control flow dependencies as a plain graph.
Action nodes in the control flow graph are equivalent to properties. While properties and their associated initialisation rules define static semantics, the rules associated with action nodes define the dynamic semantics of a language. An action thus is defined similarly to a property: act = (an, Ref). It is associated with an action node an and it may also contain a block of Java statements. The same
rules as for initialisation rules apply here, i.e. from within this block, access to
properties (read and write) is possible.
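The triple m = (sr, P, cfg) and the property and action definitions above suggest class declarations along the following lines. This is a hedged sketch; the actual MCS classes given in the next chapter may differ in naming and detail:

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// A property: a named variable plus the references its initialisation rule reads.
class Property {
    final String name;
    Object value;                                    // computed during static semantics
    final Set<String> refs = new LinkedHashSet<>();  // read-only references, e.g. "Expr.Type"
    Property(String name) { this.name = name; }
}

// An action: Java statements attached to an action node of the control flow graph;
// unlike initialisation rules, it may both read and write properties.
class Action {
    final String nodeName;                           // the action node it belongs to
    final Set<String> refs = new LinkedHashSet<>();
    Action(String nodeName) { this.nodeName = nodeName; }
}

// A Montage m = (sr, P, cfg): syntax rule, properties, and the nodes of its
// control flow graph (represented here only by the attached actions).
class Montage {
    final String syntaxRule;  // e.g. "While ::= \"WHILE\" Condition \"DO\" Statement \"END\"."
    final List<Property> properties = new ArrayList<>();
    final List<Action> actions = new ArrayList<>();
    Montage(String syntaxRule) { this.syntaxRule = syntaxRule; }
}
```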
4.2 Overview
Montages have to be aware of each other in order to communicate and interact during interpreter generation and program text processing. Five transformation phases are necessary in the process from language specifications to an interpreted program (Fig. 10).
1. In a first step, the Registration, all Montages and tokens that are part of the new language specification have to be introduced.
2. During the Integration phase, a parser is generated that is capable of reading programs of the specified language. Simultaneously, consistency checks are applied to the Montages, i.e. the completeness of the language specification and the accessibility of all involved subcomponents is asserted.
3. The parser is then used to read a program (Parsing) and to transform it into an abstract syntax tree (AST).
4. In the next stage (Static Semantics), dependencies between the nodes in this AST are resolved by assigning initial values to all properties of all Montages.
5. Finally, the control flow graphs are connected to each other (Control Flow Composition), thus building a network of nodes that can be executed.
The first two phases specify the static semantics of the language specification. This means that all necessary preparations that can be done statically are completed after integration. Further processing is done by executing the specification, namely phases three to five, the dynamic semantics of the language specification.
The five steps of the transformation process also imply a shift in focus from Montages towards their subcomponents. This is also reflected in Fig. 10 by the three major (intermediate) data structures that are generated during specification transformation (displayed in ellipses). As the focus shifts from Montages to their subcomponents (properties or control flow graphs), the interaction between the components gets more and more fine grained; the data structures thus are increasingly complex.
The following sections will provide a detailed description of the five transformation phases and the resulting data structures.
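The split of the five phases into a static part (1 and 2) and a dynamic part (3 to 5) can be captured in a small sketch. The phase names follow Fig. 10; the helper class is illustrative only:

```java
// The five transformation phases; the first two form the static semantics of
// the specification, the last three its dynamic semantics.
enum Phase {
    REGISTRATION(true), INTEGRATION(true),
    PARSING(false), STATIC_SEMANTICS(false), CONTROL_FLOW_COMPOSITION(false);

    final boolean staticPhase;
    Phase(boolean staticPhase) { this.staticPhase = staticPhase; }
}

class Pipeline {
    // Checks that every static phase precedes every dynamic phase, i.e. that
    // all preparations done statically are completed after integration.
    static boolean wellOrdered() {
        boolean dynamicSeen = false;
        for (Phase p : Phase.values()) {
            if (!p.staticPhase) dynamicSeen = true;
            else if (dynamicSeen) return false;
        }
        return true;
    }
}
```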
[Transformation pipeline:
    1. Registration/Adaptation   (Montages & tokens)
    2. Integration
    3. Parsing                   (source code of the program to execute)
    4. Static Semantics
    5. Control Flow Composition
    -> program interpretation
Phases 1-2 constitute the static semantics, phases 3-5 the dynamic semantics of the specification.]

Figure 10: The transformation process from specifications to an executable interpreter.
4.3 Registration / Adaptation
Registering simply marks a Montage (or a token) as being part of a language (Fig. 11). Most of the work performed in this phase is done manually by the user. He has to adapt imported Montages to the new environment, i.e. to the new language. This covers the renaming of nonterminals, properties and actions where necessary. Token definitions have to be given in this phase as well for all variable tokens, e.g. identifiers, numbers, strings, whitespaces, etc. Tokens for keywords can be generated by the system automatically (see the integration phase below).
Figure 11: Language L consisting of a set of Montages M and a set of tokens T. Elements are either added by the user or generated during integration.
If a language is to be composed of existing Montages, then in almost every case minor adaptations have to be performed, e.g. adjusting the syntax rule to the general guidelines (such as capitalized keywords). Too stringent consistency checking of Montages in this phase would hinder flexible Montage composition, as only compatible Montages would be allowed to join the language. We consider editing a Montage in a language context (rather than out of context) as less error prone and thus more productive.
Apart from enforcing set semantics (i.e. no duplicate Montages or tokens in a language) there are no consistency checks necessary. This loose grouping allows for comfortable editing of Montages. A language manages a set of Montages and tokens. It returns, upon request, references to Montages or tokens that are members of the language. It plays a central role in the integration phase as it is the only place in a language specification where all member Montages and tokens are known.
One of the Montages of a language has to be designated as the starting Montage. It is equivalent to the start symbol (a designated nonterminal symbol) in a set of EBNF rules specifying a language. The starting Montage will begin the parsing process in phase 3 (Fig. 10). Registration has to ensure that exactly one starting Montage is selected before transformation progresses to the integration phase.
4.4 Integration
During the integration phase, tokens and Montages are integrated into a language specification. This requires parser and scanner generation as well as internal and external consistency checks.
4.4.1 Parser Generation
For each Montage m ∈ M, a concrete syntax tree cst is generated by parsing its syntax rule sr (Fig. 13 shows an example). A syntax tree reflects the structure of the EBNF rule. Repetitions are represented by inner nodes, nonterminal and terminal symbols by leaf nodes. Note that the original syntax rule can always be reconstructed from cst by performing an inorder traversal².
The syntax trees of all Montages can be merged by replacing the nonterminals with references to the roots of the syntax trees of their designated Montages. This will result in a parse graph, as parse trees may refer to each other mutually (Fig. 12). The parser is now ready for use (see the next section on the parsing phase).
4.4.2 Scanner Generation
Syntax rule parsing also generates tokens for terminal symbols. Such terminal symbols, or keywords, are easily detectable as strings enclosed in quotation marks. Each keyword encountered is added to the token set of the language. Keywords are fixed tokens, i.e. they have to appear in the program text exactly as they are given within the quotation marks in the EBNF rule. In contrast, the syntax of identifiers or numbers varies and can only be specified by a rule (a regular expression) but not with a fixed string.
After all parse trees have been generated, the complete set of tokens is known. It is now possible to generate a scanner that is capable of reading some input stream and returning tokens to a parser. We use scanner generation algorithms as they are used in the Lex [Les75] or JLex [Ber97] tools.
² In the tree representation we use in our figures this corresponds to a traversal from top to bottom.
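For illustration, the longest-match behaviour of a generated scanner can be emulated with java.util.regex. This toy version is not the Lex/JLex-style automaton construction used in practice, merely a sketch of the observable behaviour; names and the token encoding are our own:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Toy scanner: at each position it tries every token's regexp and emits the
// longest match; skipped tokens (whitespace etc.) are dropped, relevant
// tokens are handed on to the parser.
class ToyScanner {
    // tokens[i] = { name, regexp, "rlv" or "skip" }
    static List<String> scan(String input, String[][] tokens) {
        List<String> out = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            String bestName = null, bestKind = null;
            int bestEnd = pos;
            for (String[] t : tokens) {
                Matcher m = Pattern.compile(t[1]).matcher(input);
                m.region(pos, input.length());
                if (m.lookingAt() && m.end() > bestEnd) {
                    bestEnd = m.end(); bestName = t[0]; bestKind = t[2];
                }
            }
            if (bestName == null) throw new IllegalArgumentException("no token at " + pos);
            if (bestKind.equals("rlv")) out.add(bestName + ":" + input.substring(pos, bestEnd));
            pos = bestEnd;
        }
        return out;
    }
}
```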
[Syntax trees st for While, Condition and Statement; nonterminal leaves point to the trees of their designated Montages, terminal leaves ("WHILE", "DO", "END", "IF", ...) point into the token table T.]

While ::= "WHILE" Condition "DO" Statement "END".
Condition = Odd | Relation.
Statement = Block | If | While | Repeat | ...

Figure 12: Merging of syntax trees by resolving references from nonterminal symbols to Montages and from terminal symbols to the token table.
4.4.3 Internal Consistency
Internal consistency is concerned with the equivalence between the concrete and the abstract syntax tree of a Montage. The syntax tree generated from the EBNF production reflects the concrete syntax cst, whereas the tree structure of the control flow graph defines the abstract syntax ast for the same language component (Fig. 13). If the structure of cst is not equivalent to the structure of ast, then the parser will not be able to map the parsed tokens onto the given ast unambiguously and thus will stop the transformation process.
Every nonterminal symbol and repetition in cst must have an equivalent node in ast. This equivalence can either be defined manually or semi-automatically. It is not possible to identify equivalent nodes in both trees completely automatically; Fig. 14 shows why: nonterminal symbols may have the same name. Therefore it is impossible to automatically find equivalent nodes in the control flow graph. E.g. in Fig. 14: is the first occurrence of "Term" in the EBNF rule equivalent to the left or to the right "Term" node in the control flow graph? This example may seem obvious, but we will show that the answer to this question is part of the language specification itself and cannot be generated automatically.
Case ::= {"CASE" Expression "DO" [ StmtBlock ]} [ "DEFAULT" StmtBlock ] "ESAC".

[cst and ast shown as trees: the list LIST-1 contains Expression and the option OPT-2 with StmtBlock~1; the option OPT-3 contains StmtBlock~2. Equivalent nodes of cst and ast are connected.]

Figure 13: EBNF production and control flow graph with their respective tree representations shown below. The structure of these trees and the positions of nonterminals have to match.
Manual definition of equivalent nodes (e.g. by selecting both nodes and marking them as equivalent) is the most flexible solution to the problem of multiple occurrences of the same name. It allows defining arbitrary nodes as equivalent. Although it would not be wise to assign e.g. an EBNF nonterminal symbol "Term" to a control flow node "Factor", manual assignment would not prevent it. In addition to the production rule and the control flow graph, a table showing the relation between the two trees would be necessary.
Add ::= Term AddOp Term.
AddOp = "+" | "-".

Figure 14: Multiple occurrences of the same name for a nonterminal symbol.
As users normally will identify equivalent nodes by name, it is natural to define equivalence as equality of names. This equivalence could be found automatically, but as we indicated above, this is not possible for multiple occurrences of the same name.
The nonterminal symbols "Term" in the syntax rule can be distinguished unambiguously by their occurrence (first and second appearance in the text) because an EBNF rule is given in a sequential manner. The same does not apply to a control flow graph, although one could argue that the given control flow would sequentialise the nodes. Although this is true, such a definition may be too stringent. For our example in Fig. 14 this means that the evaluation order of the two terms is restricted to left-to-right. A right-to-left evaluation could not be specified!
In some cases, control flow graphs represent a partial order and thus no unambiguous order of nonterminal nodes can be given. Fig. 15 shows such a case. Inferring from the annotation of the left edge that the left "Statement" corresponds to the THEN-clause is dangerous, as it presumes some knowledge about the dynamic semantics that is not available in the syntax rule.
We propose a semi-automated approach to solve the problem of unambiguously identifying equivalent nodes of the concrete and abstract syntax trees. As
If ::= "IF" Expression "THEN" Statement
       "ELSE" Statement "END".

[Control flow graph: Expression branches, with edges annotated by Expression.result, to two "Statement" nodes.]

Figure 15: Unspecified evaluation order
mentioned above, the occurrences of nonterminals in the EBNF rule are sequentialised by their appearance in the rule. For each nonterminal node in the control flow graph we need to provide a number that indicates the appearance in the syntax rule. This number is 1 by default, which simplifies the obvious cases, e.g. in Fig. 13. The first and only appearance of "Condition" in the syntax rule is equivalent to the only "Condition" node in the control flow graph. If there is more than one nonterminal node in cfg with the same name, then these nodes have to be enumerated in an unambiguous way, e.g. by appending ~i where i indicates the i-th appearance of this nonterminal in the syntax rule. Fig. 16 illustrates an enumeration of the "Term" nonterminal nodes, such that the resulting evaluation order is right-to-left.
Add ::= Term AddOp Term.
AddOp = "+" | "-".

[Control flow graph: the control flow visits Term~2 before Term~1, i.e. the node evaluated first is the one enumerated as the second appearance in the rule.]

Figure 16: Specification of a right-to-left evaluation order using node enumeration
Repetitions are enumerated regardless of their kind (option, list or group). In the EBNF rule, only opening brackets are counted, in the order of their occurrence. Fig. 13 provides an overview of all these naming conventions.
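The ~i naming convention is easy to process mechanically. A small helper might split an enumerated node name into its base name and occurrence index; this is illustrative code, not part of the original Montage definition:

```java
// Splits an enumerated node name such as "Term~2" into its base name and
// occurrence index; a name without "~i" defaults to occurrence 1.
class NodeName {
    final String base;
    final int occurrence;

    NodeName(String raw) {
        int tilde = raw.lastIndexOf('~');
        if (tilde >= 0) {
            base = raw.substring(0, tilde);
            occurrence = Integer.parseInt(raw.substring(tilde + 1));
        } else {
            base = raw;
            occurrence = 1;   // the default simplifies the obvious cases
        }
    }
}
```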
We are now ready to specify what internal consistency is: a notion of equivalence between concrete and abstract syntax trees which can be summarized as follows:
Let Sc = (c1, c2, ..., cm) be a sequence of nodes of cst and Sa = (a1, a2, ..., an) a sequence of nodes of ast with m, n > 0, i.e. the sequences are not empty. Sc was generated by an inorder traversal of cst where all terminal symbols were ignored (i.e. skipped). Similarly, Sa was generated by an inorder traversal of ast where all action nodes were ignored³. Furthermore we have a function eqv: Sc → Sa that returns the equivalent control flow node for a given syntax tree node.
Thus, a concrete syntax tree cst is equivalent to an abstract syntax tree ast if:
1. |Sc| = |Sa|, i.e. the number of nodes produced by the traversals is the same.
2. ∀ i, j: i, j > 0 ∧ eqv(c_i) = a_j ⟹ i = j, i.e. equivalent nodes appear in the same order in both sequences.

³ Additionally, all subtrees of nonterminal nodes were skipped as well. Such subtrees reflect the tree structure of the Montage designated by the nonterminal node and thus are of external nature.
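The two conditions can be checked directly on the traversal sequences. A sketch, assuming the sequences and the eqv function have already been computed:

```java
import java.util.List;
import java.util.function.Function;

// Internal consistency as defined above: the traversal sequences must have
// the same length (condition 1), and eqv must map the i-th cst node to the
// i-th ast node (condition 2, order preservation).
class Consistency {
    static <C, A> boolean internallyConsistent(List<C> sc, List<A> sa, Function<C, A> eqv) {
        if (sc.size() != sa.size()) return false;           // condition 1
        for (int i = 0; i < sc.size(); i++) {
            if (!eqv.apply(sc.get(i)).equals(sa.get(i))) {  // condition 2
                return false;
            }
        }
        return true;
    }
}
```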
4.4.4 External Consistency
External consistency is concerned with the accessibility of (parts of) other Montages. We have seen in Fig. 12 that Montages will be connected to each other when building parse graphs. Furthermore, properties of Montages may contain references to properties (possibly of other Montages), and the same references may also occur in the rules associated with action nodes. In order to function properly, access to all referenced Montages or parts thereof (e.g. the root of the parse tree, properties) has to be guaranteed. In other words, the external consistency check has to assert that all referenced entities are available, i.e. accessible for read or write (or both) operations. There exist two different kinds of references to external entities in a Montage:
A. Textual references: If a nonterminal symbol is parsed, then its name has to designate a Montage registered with L. If no such Montage can be found, the specification is not complete and thus it will be impossible to continue the transformation process towards an interpreter for L.
Similar rules apply to Montage properties. References to other properties may appear in their initialisation rules. A dependency relation exists between Montages M1 and M2 if an initialisation rule in M1 contains a reference to a property of M2. Montages and their dependency relations span a graph that is illustrated in Fig. 20a on p. 61. It is also possible to check whether the referred properties are within the scope of the referring initialisation rule. The scope of an initialisation rule is the Montage it is declared in, and all Montages that are accessible via nonterminals from there. Let us illustrate this with a simple grammar for an assignment (each line corresponds to a Montage):
Asg ::= Ident "=" Expr.
Expr ::= Term { AddOp Term }.
Term ::= Factor { MultOp Factor }.
Factor = Ident | Number | "(" Expr ")".
Properties shall be defined as shown in Fig. 17. The initialisation rules implement type checking. An error will be issued on checking the third property, Expr.Error, as it tries to access the property Asg.TypeOK, which is out of scope. There is no nonterminal Asg in the Montage Expr, nor is it possible to construct a path from Expr to Asg by transitively accessing nonterminals. E.g. the following would be legal:
Expr.Error := return Term~1.Factor~1.Type;
Note that Factor is a synonym rule and therefore does not have any properties. Factor~1.Type actually accesses the Type property of the underlying Montage (after the parse tree was built, see next section). Hence, a further test
Asg.TypeOK := return Expr.Type == Ident.Type;

Expr.Type  := if (exists(Term~2)) {
                  if (Term~1.Type == Term~2.Type) {
                      return Term~1.Type;
                  } else {
                      return undef;
                  }
              } else {
                  return Term~1.Type;
              }

Expr.Error := return Asg.TypeOK;

Term.Type  := if (exists(Factor~2)) {
                  if (Factor~1.Type == Factor~2.Type) {
                      return Factor~1.Type;
                  } else {
                      return undef;
                  }
              } else {
                  return Factor~1.Type;
              }

Figure 17: Property declarations with initialisation rules
should check whether a property accessed in a synonym Montage is available in all alternatives of the production. Further processing of properties has to be done during static semantics analysis and is described in section 4.6.
B. Graphical references: Nonterminal nodes may contain further (nested) repetitions and nonterminals. These refer to repetitions and nonterminals in the Montage designated by the topmost nonterminal. These nested nodes serve only as source and target nodes for control flow edges. It is not allowed to add actions to (nested) nonterminal nodes. Each Montage encapsulates its internal structures such as properties and the control flow graph. Access is granted via well defined interfaces. If actions could be added from outside, this would violate encapsulation and destroy the modularity between Montages.
The external consistency check is completed successfully if the nesting structure of the subtree of a nonterminal node matches the designated Montage's ast. Equivalence between nested nodes and the ast of the designated Montage is defined analogously to the internal consistency described above.
4.5 Parsing
We are now ready for the dynamic semantics part of the language specification, i.e. to execute the specification in order to read and interpret a program. The next step in the transformation process is parsing (step 3 in Fig. 10).
The parsing phase is responsible for reading a program and converting it to a parse tree according to the given syntax rules from the Montages. Fig. 18 illustrates this process with an example of a simple language.
Before going into details about the conversion of a program into a parse tree, we have to select a suitable parsing strategy.
Grammar of L:
    Asg ::= Ident "=" Expr.
    Expr ::= Term { AddOp Term }.
    Term ::= Factor { MultOp Factor }.
    Factor = Ident | Number | "(" Expr ")".

Program P in L:
    d = c * (a + b)

[Parse tree of P: Asg with Ident d and an Expr; its Terms and Factors cover the identifiers c, a and b, with the parenthesised Expr nested below a Factor.]

Figure 18: Parsing transforms a program P of a language L into a parse tree
4.5.1 Predefined Parser
Parsing is a well understood process and easy to automate. This might be an explanation why the Montage approach lacks the possibility to specify parser actions or to get control during parsing in general. In the publications defining the Montage approach [e.g. AKP97, KP97b, Kut01], parse tree building is explained only as a mapping of concrete syntax (the program P) onto a parse tree. No concrete definitions of the parsing method can be found. Furthermore, no mechanism for intervention during parsing is foreseen in these
publications. From a user's point of view this omission can be seen as both a flaw in and a quality of Montages.
On the one hand, the experienced language designer will of course miss the tricks and techniques that allowed him to specify "irregular" language constructs elegantly and compactly. Normally, these are context sensitive parts of a grammar where additional context information (such as type information) is necessary to parse them unambiguously. As there is no way to specify in Montages how to resolve ambiguities, the language designer is forced either to rewrite the syntax rules or to rely on the standard resolving algorithms offered by the underlying parser (if they are known at all). We will give some examples below.
On the other hand, the lack of being able to specify irregularities can be seen as a construction aid for the language designer. E.g. Appel advises that conflicts "should not be resolved by fiddling with the parser" [App97]. The occurrence of an ambiguity in a grammar is normally a symptom of an ill-specified grammar. Having to resolve it by rewriting the grammar rules is definitely an advantage for the inexperienced language designer, as it forces him to stick to a properly defined context-free grammar.
The question is: should there be a possibility to control the parser in Montages? We decided against it for two reasons:
1. MCS shall stay as close as possible to the original Montages. Even without sophisticated parsing techniques, full-fledged languages such as Oberon [KP97a] or Java [Wal98] could be specified using Montages.
2. With regard to modularity and reuse of specifications, the Montage approach is in a dilemma: both the possibility to specify parse actions and the rewriting of syntax rules have their disadvantages.
If parse actions were allowed, they would only apply to a specific parser model (see below). One would have to stick to a certain parser (e.g. a LALR parser) to enable reuse. In particular this would be the case if the parser were specified completely by the Montages (as is done with the static and dynamic semantics).
If no parse actions are allowed, the language designer is forced to rewrite the syntax rules in order to express them in a context-free manner. In extreme cases this might result in not being able to reuse a Montage as it is, because it leads to an ambiguity in the grammar.
In general, we think that the advantages of a predefined parser will outweigh the complexity one would have to deal with if self-defined parse actions were allowed.
The following discussion will analyse the two basic parsing strategies with regard to Montages: bottom-up or shift-reduce parsing, such as LALR parsers, and top-down or predictive parsing, such as recursive descent parsers. For in-depth introductions into these parsing techniques, we refer to [ASU86, App97]. Both parsing approaches are applicable to Montages, as shown by Anlauff's GEM/MEX (LALR parsing using yacc [Joh75] as a parser generator) and our MCS (predictive parsing).
The choice of the parsing technique determines what classes of grammars can be processed. Both parsing techniques have their pros and cons with regard to ease of use, efficiency and parser generation.
4.5.2 Bottom-Up Parsing
The bottom-up approach reads tokens from the scanner (the so-called shift operation) until it finds a production whose right-hand side (rhs) matches the tokens read. Then these tokens will be replaced by the left-hand side (lhs) of the production (which is called a reduce operation). To be precise, the matching tokens get a common parent node in the parse tree. The tree therefore grows from its leaves towards its root, which corresponds to a growth from bottom to top when considering the usual layout of trees in computer science (root at top). During the construction of a parse tree, two conflicts may occur:
Reduce-reduce conflict: The parser cannot decide which production to choose in a reduce operation. This will be the case if several Montages have the same rhs in their syntax rule. One reason for this is that during registration the equality of the two rhs was not noticed, a common mistake if complete sublanguages are registered. An example of such a sublanguage was shown in Fig. 18. If Asg is imported as a self-contained sublanguage, then the Montages Expr, Term and Factor will be imported as well. If there is already a Montage Expression registered that contains the same rhs as Expr, there will be reduce-reduce conflicts during parsing. In this case, we are grateful for this conflict, as the related warning will draw our attention to this overspecification of the language.
Reduce-reduce conflicts do not only indicate overspecifications, but also pinpoint context sensitive parts of the grammar. A typical example is the following portion of a FORTRAN-like grammar. Note that each line corresponds to a Montage:
Stmt      = ProcCall | Asgn.
ProcCall ::= Ident "(" ParamList ")".
Expr     ::= Ident [ "(" ExprList ")" ].
ParamList ::= Ident {"," Ident}.
ExprList ::= Ident {"," Ident}.
Unfortunately, the grammar is ambiguous, as the line
A(I, J)
can be interpreted as a call to A with the parameters I and J or as an access to array A at location (I, J). This grammar is of course not context free, i.e. only by regarding the type declaration of A can it be decided which production to apply. In this case, the reduce-reduce conflict indicates a clumsy language design.
The deployment of a standard parser generator such as yacc [Joh75] or CUP [Hud96] might be dangerous, as they implement a (too) simple resolving strategy for reduce-reduce conflicts: the first rule in the syntax specification is chosen. Montages cannot be enumerated and thus no order of input can be guaranteed that will be obeyed during parser generation. Furthermore, the second rule (Montage) will fall into oblivion as it will never be chosen. This is an unsolved problem in GEM/MEX, which delegates parsing to a yacc-generated parser.
Shift-reduce conflict: The second kind of conflict in shift-reduce parsers occurs when it is undecidable whether to perform a shift operation (read more tokens from the scanner) or a reduce operation (build a new node in the parse tree). The well known dangling else, as in the programming languages Pascal or C, is a good example to demonstrate a shift-reduce conflict:
If ::= "if" Expression "then" Stmt [ "else" Stmt ].
The following program fragment is ambiguous:
if a then if b then s1 else s2
It can be interpreted in two different ways:
(1) if a then {if b then s1 else s2}
(2) if a then {if b then s1} else s2
Shift-reduce parsers will detect the conflict. Suppose the program is read up to s1. Now, without further information, the parser cannot decide whether to reduce (interpretation 2) or to continue reading until s2 (interpretation 1). In Pascal and C, an else has to match the most recent possible then, so interpretation (1) is correct. By default, yacc or CUP resolve shift-reduce conflicts by shifting, which produces the desired result in the dangling-else problem of C or Pascal.
4.5.3 Top-Down Parsing
The second method to parse a program text and to build a parse tree has its pros and cons with respect to Montages, too. Top-down parsers try to build the parse tree from the root towards the leaves. The parser is structured into several procedures, each of which is capable of recognizing exactly one production rule. Each of these procedures reads tokens from the scanner and decides upon their type how to continue parsing. A terminal symbol is simply compared to the expected input; lists and options will be recognized in the bodies of while loops or conditional statements. But the most interesting case is the recognition of nonterminal symbols: it will be delegated by calling the corresponding procedure in the parser. As the recognizing procedures can be called recursively (compare with the parse graph constructed in the integration phase, Fig. 12, p. 46) and because the syntax rules will be called from top to bottom⁴, such a parser is called a recursive-descent parser. As with the bottom-up parsers, we have to mention two problems that top-down parsers impose on the Montage approach:
Left-Recursiveness: A grammar which is to be recognized by a top-down parser must not be left-recursive. We will illustrate this with the following grammar:
ProcCall ::= Ident "(" ParamList ")".
ParamList ::= { Ident "," } Ident.
If the parser encounters a procedure call such as
p(i) or r(i,j,k)
then it will not be able to recognize its parameter list. The parser calls a recognizing procedure ParamList that will try to read all Idents and the succeeding "," within a while loop. The problem is that the parser cannot predict whether it should enter this loop at all, and if so, when it has to exit the loop, because the first token in the repetition is the same as the one following it.
Lists and options have to be used carefully if they occur at the beginning of a
production rule. Fortunately, every left-recursive grammar can be rewritten to
be right-recursive [ASU86]. For our above example this would look like:
ProcCall ::= Ident "(" ParamList ")".
ParamList ::= Ident { "," Ident }.
As demonstrated here, rewriting (or left factoring, as this method is called) can
often be done within the rule itself; no Montage other than ParamList is
affected. The ban on left-recursive productions can be a nuisance if Montages
4 The starting production is considered to be the topmost production. Then all nonterminals
appearing within this production are listed with their respective syntax rules below, and so on.
Hereby an order is generated that sorts productions from the most general one (starting production) down to the most specialised ones (tokens).
are imported that were developed in a system with a bottom-up parser, where
this restriction does not apply.
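The recognition of the rewritten, right-recursive ParamList rule can be sketched as a recursive-descent procedure. This is an illustrative sketch, not MCS code; the token handling (a plain list of strings, a regular expression for identifiers) is an assumption made for the example:

```java
import java.util.List;

// Recursive-descent recognizer for ParamList ::= Ident { "," Ident }.
// Tokens are simplified to strings; an identifier is any token that
// matches a letter followed by letters or digits.
class ParamListParser {
    private final List<String> tokens;
    private int pos = 0;

    ParamListParser(List<String> tokens) { this.tokens = tokens; }

    private String peek() { return pos < tokens.size() ? tokens.get(pos) : null; }

    private boolean isIdent(String t) {
        return t != null && t.matches("[A-Za-z][A-Za-z0-9]*");
    }

    // The loop condition mirrors the EBNF repetition: after the mandatory
    // first Ident, continue exactly as long as a "," follows.
    boolean parseParamList() {
        if (!isIdent(peek())) return false;
        pos++;
        while (",".equals(peek())) {
            pos++;
            if (!isIdent(peek())) return false;
            pos++;
        }
        return true;
    }
}
```

Because the repetition now trails the mandatory Ident, the procedure always knows whether to enter and when to leave the loop by peeking one token ahead.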
Lookahead: A second problem with top-down parsers is that they cannot
always decide which production to choose next in order to parse the input. The
following fragment from the Modula-2 syntax [Wir82] shall serve as an example:
statement ::= [ assignment | ProcCall | ... ].
ProcCall ::= designator [ ActualParams ].
assignment ::= designator ":=" expression.
ActualParams ::= "(" [ExpList] ")".
Consider this program fragment as input:
a := a + 1
When the parser starts reading this line, it is expecting a statement. The next
token is a designator a, which could be the beginning of the productions ProcCall and assignment. Which production should the parser choose now?
There are two ways to answer this question: either it tries to call all possible
productions in turn5, or it pre-reads the following token and gets ":=", which
allows it to identify assignment as the next production. A parser that tries all
possibilities is called a backtracking parser; pre-reading tokens is called lookahead, and it avoids time-consuming backtracking.
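The one-token lookahead described above can be sketched as follows; StatementChooser and its token-array input are hypothetical names used only for this illustration:

```java
// Deciding between the Modula-2 productions assignment and ProcCall with
// one token of lookahead. After the designator has been recognized, peeking
// at the next token suffices: ":=" announces an assignment, anything else
// (an opening parenthesis, a statement separator, ...) a procedure call.
class StatementChooser {
    static String choose(String[] tokens) {
        // tokens[0] is assumed to be the designator already recognized
        String lookahead = tokens.length > 1 ? tokens[1] : null;
        return ":=".equals(lookahead) ? "assignment" : "ProcCall";
    }
}
```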
4.5.4 Parsing in MCS
Considering our self-imposed preconditions - like comprehensibility of the system and its processes, composability and compactness of a language - and the
open specification of Montages with regard to parsing, a top-down parser
seems more suitable than a bottom-up parser.
Top-Down Parsing: The algorithms for top-down parsing are easier to understand than those for bottom-up parsing. Shift-reduce parsers are monolithic
finite state machines, usually implemented with a big parse table that steers
the recognition of token patterns. As the construction of such parse tables is
too much work to do by hand, the user has to rely on algorithms that are difficult to comprehend. Error detection and error recovery are also more complex to
implement in bottom-up parsers.
5 E.g. trying the first candidate production, ProcCall: the next token must be an opening parenthesis "(" which would fit the ActualParams production. As there is no "(", the parser has to step back and try the next candidate production, assignment, where it is successful.

Top-down parsers, however, are subdivided into procedures, each of which
can recognize exactly one syntax rule. Note that these procedures form a vertical partitioning of the parser. Hence, the structure of top-down parsers is very
similar to MCS. Each Montage can implement a service that is able to exactly
recognize its own syntax rule. If efficiency is important, then lookaheads have
to be determined. This can be done automatically by analysing so-called FIRST
sets [ASU86, Wir86].
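A minimal sketch of such a FIRST-set computation, under the simplifying assumption that the grammar contains no empty (nullable) productions; the grammar representation (a map from nonterminals to lists of alternatives) is an assumption of this example, not the MCS data structure:

```java
import java.util.*;

// FIRST-set computation for a grammar without nullable productions.
// Nonterminals map to alternatives, each a sequence of symbols; a symbol
// is treated as a terminal iff it is not a key of the grammar map.
class FirstSets {
    static Map<String, Set<String>> compute(Map<String, List<List<String>>> g) {
        Map<String, Set<String>> first = new HashMap<>();
        for (String nt : g.keySet()) first.put(nt, new HashSet<>());
        boolean changed = true;
        while (changed) {                        // iterate to a fixpoint
            changed = false;
            for (Map.Entry<String, List<List<String>>> e : g.entrySet()) {
                for (List<String> alt : e.getValue()) {
                    String head = alt.get(0);    // no empty rules: first symbol decides
                    Set<String> add = g.containsKey(head)
                            ? first.get(head)    // nonterminal: copy its FIRST set
                            : Set.of(head);      // terminal: itself
                    if (first.get(e.getKey()).addAll(add)) changed = true;
                }
            }
        }
        return first;
    }
}
```

The fixpoint iteration repeats until no FIRST set grows any more, which is what makes mutually referring rules unproblematic here.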
From the point of view of an MCS user, a top-down parser has its pros and
cons, as indicated in the discussion above. The most important rule - no left-recursive grammars - is not as limiting as it may seem at first glance. Each left-recursive grammar can be transformed into a right-recursive one, and in many
cases this is possible by just rewriting a single syntax rule. The parsing algorithm is simple and corresponds to the way a human reads a grammar.
If efficiency of parsing is not a major goal, then a backtracking parser even
allows for parsing of ambiguous grammars. The parser could be implemented
to ask for user assistance in the case of several legal interpretations of the input
program. User-assisted parsing could be very useful in education, e.g. to demonstrate ambiguous grammars and their consequences for parsers and programmers (all variants of different parse trees can be tested).
Parse Graph: In order to parse a program, MCS uses the parse graph constructed in the integration phase (see Fig. 12). Each node (read: Montage) in
this graph has a method that can be called to parse its own syntax rule. These
methods either return a parse tree (the subtree corresponding to the parsed
input) or an error (in the case of a syntax error in the program). The scanner
provides a token stream from which Montages can read tokens one by one.
Parsing begins at the parse method of the starting Montage. Note that in the
parse graph, each Montage occurs exactly once. As in every recursive-descent
parser, control is transferred to the next parse method as soon as a nonterminal
token is read. Then the Montage corresponding to this nonterminal will take
over. When the construct has been parsed successfully, the recognized subtree is
returned to the caller. Parse graph and external consistency guarantee that all
necessary Montages will be found during parsing. The parse tree returned to
the caller is basically an unrolled copy of (parts of) the parse graph. Its nodes
are instances of Montages that represent their textual counterparts in the tree.
We refer to these Montages as Instance Montages.
Instance Montages: An Instance Montage (IM) is a representative of a language
construct in a program. Template Montages (Montages as we described them
until now) serve as templates for IMs. They define the attributes of an instance
at runtime (i.e. the dynamic semantics of a language specification), and can be
implemented in two ways:
Figure 19: Parse graph to control parsing and resulting parse tree.
1. As copies of the template Montages. They will be created by cloning the
template. In this case, they feature all characteristics of the template Montages, only that some of them will never be used, e.g. generating a parser,
checking internal and external consistency, or the ability to parse a program.
2. As instances of new classes. The characteristics of such new classes are
defined by the template Montages. They have the advantage that only the
dynamic semantics of the specifications has to be present.
Fig. 19 illustrates the relations between Template Montages and Instance Montages.
Static semantics and dynamic semantics will be processed on IMs only. Additional characteristics of IMs concerning their implementation and deployment are explained in section 5.3, and section 5.2.3 provides a more detailed
insight into the implementation of the parser in MCS.
4.6 Static Semantics Analysis
4.6.1 Topological Sort of Property Dependencies
In order to initialize all properties, we could simply fire all their initialisation
rules simultaneously6. This will result in some rules being blocked until the
properties they depend on become available. Other properties can be computed immediately. Fig. 20 illustrates initialization by means of three Montages
and their properties. Some rules depend on the results of others (e.g. M1.A)
whereas some rules can fire immediately (in our case M2.C).
Before initialization starts, all properties are undefined, marked by the distinct value undef. Static semantics analysis is completed when all properties are
defined, i.e. ∀p ∈ P : p ≠ undef. A simultaneous firing of rules could end in a
deadlock situation if initialisation rules mutually refer to each other. To avoid
this situation, i.e. a system looping infinitely, it is advisable to check for circular
dependencies before executing initialisation rules. This can be done by interpreting the properties and their references as a directed graph (digraph) G = (P,
R) that is defined by a set of vertices P (all properties of all Montages of a language L) and a set of directed edges R (all references contained in these properties).
Let P be the set of all properties of a language L and let R be the set of all
references between the properties of P. We define a reference r = (s, t) as an
ordered pair of properties s ∈ P_source and t ∈ P_target, with P_source and P_target being
the set of reference sources and targets respectively.
We have to assert that G is a directed acyclic graph (dag)7. Fig. 20b shows
such a graph, where we inverted the direction of all references in order to get a
data flow graph. In our example, M2.C is the only rule that can fire initially. Its
result triggers the computation of M1.A and M3.A etc.
Fortunately, there exists an algorithm that suits our needs very well, i.e. topological sorting:
6 In fact, we are describing our system based on a sequential execution model; "firing all rules in a
random order" would be more precise here.
7 Formally: Let path(a, b) be a sequence of properties p1, p2, ..., pn, such that (p1, p2), (p2, p3), ...,
(pn−1, pn) ∈ R. The length of a path is the number of references on the path. A path is simple if all
vertices on the path, except possibly the first and last, are distinct. A simple cycle is a simple path of
length at least one that begins and ends at the same vertex. In a directed acyclic graph, the following holds true: 1. ¬∃r. ((r, r) ∈ R)  2. ∀r, s, t. ((r, s) ∈ R ∧ (s, t) ∈ R ⇒ (r, t) ∈ R)
4.6 Static Semantics Analysis 61
[figure: three Montages with their initialisation rules (M1: A = B + 2*M2.C, B = M3.A; M2: C = 42; M3: A = M2.C + 3) and the resulting data flow among their properties]
Figure 20: Relations between Montages and Properties.
a. dependencies between Montages imposed by initialisation rules
b. data flow during initialisation of Properties
1. It checks for cycles in a graph, and
2. if no cycles are detected, it returns an order in which initialisation rules
can be fired without a single rule being blocked because of missing results.
If cycles are found, then static semantics cannot be executed. The initialisation
rules of the properties participating in the cycle have to be rewritten; it would
therefore be helpful if a failed topological sort returned the offending reference.
Successful execution of all initialisation rules does not imply successful completion of static semantics analysis: the initialisation rules may explicitly set a
property to undef (see Fig. 17 and Fig. 21). The original Montage definition
features a condition part (see example in Fig. 7, p. 35) which contains a
boolean expression that has to evaluate to true. If this condition cannot be
established, then program transformation is stopped.
MCS does not contain such a condition part because the same result can be
obtained with a property. The condition shown in Fig. 7 can be expressed with
an initialisation rule in MCS as given in Fig. 21.
It is possible to assign to a property the distinct value undef. According to
our definition for completion of static semantics, ∀p ∈ P : p ≠ undef, one single
property being undefined will suffice to stop the transformation process.
Hence, after the topological sorting and execution of all initialisation rules, it is
if (ConditionalOrOption.staticType
    instanceof java.lang.Boolean) {
  return new Boolean(true);
} else {
  return undef;
}
Figure 21: Initialising a property to undef
important to test whether all properties were set. In section 5.3.9 we will
present an algorithm that can perform static semantics analysis in O(|P| + |R|),
where again |P| denotes the number of all properties of all Montages and |R|
the number of all references between them. In other words, if cleverly programmed, static semantics analysis can be embedded in a topological sort.
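The combination of cycle check and firing order can be sketched with Kahn's algorithm for topological sorting, which runs within the stated O(|P| + |R|) bound; PropertyOrder and its dependency map are illustrative names, not the MCS implementation:

```java
import java.util.*;

// Kahn's algorithm over the property dependency graph. Properties are node
// ids; deps maps each property to the properties its initialisation rule
// reads. Runs in O(|P| + |R|).
class PropertyOrder {
    // Returns a firing order, or null if a circular dependency exists.
    static List<String> order(Map<String, List<String>> deps) {
        Map<String, Integer> indeg = new HashMap<>();
        Map<String, List<String>> users = new HashMap<>();
        for (String p : deps.keySet()) { indeg.put(p, 0); users.put(p, new ArrayList<>()); }
        for (Map.Entry<String, List<String>> e : deps.entrySet())
            for (String d : e.getValue()) {
                indeg.merge(e.getKey(), 1, Integer::sum);
                users.get(d).add(e.getKey());   // d's value flows to e's rule
            }
        Deque<String> ready = new ArrayDeque<>();
        for (Map.Entry<String, Integer> e : indeg.entrySet())
            if (e.getValue() == 0) ready.add(e.getKey());
        List<String> result = new ArrayList<>();
        while (!ready.isEmpty()) {
            String p = ready.remove();
            result.add(p);                      // p can fire now
            for (String u : users.get(p))
                if (indeg.merge(u, -1, Integer::sum) == 0) ready.add(u);
        }
        return result.size() == deps.size() ? result : null;  // null: cycle
    }
}
```

On the dependencies of Fig. 20 this yields an order in which M2.C fires first and M1.A last; a mutual reference between two properties makes the result fall short of |P| and is reported as a cycle.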
Related Work: Topological sorting of attribute grammar actions has been
described by Marti and Hedin [Mar94, Hed99].
In the GIPSY project presented by Marti, a DSL, GIPSY/L, is used to
describe relations between different documents, processes, and resources in
software development processes. GIPSY/L can be extended by users in order to
adapt the system to expanding needs. An extensible attribute grammar [MM92]
allows to specify actions that control these processes. The order in which such
actions are executed is determined by a topological sort along their dependencies.
Hedin describes reference attribute grammars that do not have to transport
information along the abstract syntax tree, but use references between the
attributes in order to access remote data more efficiently. A topological sort has
the same function as in our system: checking for cycles and determining an
order for execution.
4.6.2 Predefined Properties
Some properties are predefined, i.e. they are available in every Montage.
Their number is small in order to keep the system simple. Conceptually,
predefined properties are initialized already in the parser phase.
Terminal Synonym Properties: Terminal synonym productions, as e.g. AddOp in
Fig. 14, p. 48, generate a property of the same name (AddOp in this case). This
property is of type String and contains the actual string that was recognized
by the parser during the parsing phase. In the example of the Add Montage in
Fig. 14, this would be either "+" or "-". Terminal synonym properties are read-only properties initialised by the parser during AST building.
Parent Property: Each Montage implicitly contains a property parent which
is a reference to the parent Montage in the AST. This reference will also be set
by the parser during AST building and is read-only too. The parent property
allows navigation towards the root of the AST, whereas nonterminals allow
navigation towards its leaves.
Symbol Table Property: The last of the predefined properties is a reference to a
symbol table, SymTab (see below). Again, this property is read-only but its initialisation can be user-defined. There is a default behaviour which copies the
reference to the symbol table from the parent Montage in the AST. Nevertheless, for specific cases (e.g. when a new variable is defined) declarations can be
cached in the symbol table in the initialisation rule of the property.
4.6.3 Symbol Table
The symbol table plays an important role during the static semantics phase.
Basically, it is a cache memory for retrieving declarations. Although it would be
possible to use property initialisation to remember declarations in a subtree, it
would be a tremendous overhead (and an error-prone approach) to hand these
references up and down the tree during static semantics evaluation [Hed99].
The advantages and the use of symbol tables are best explained with an example:
Variable Declaration and Use: Let us have a closer look at variable declarations
and variable access in a program. We will give a (partial) specification of a simple language that allows to declare variables in nested scopes. To simplify the
example, variables have an implicit type (Integer) and there is only one statement that allows to print the contents of a variable.
Given the following specifications:
Prog ::= Block.
Block ::= "{" {Decl} {Stmt} "}".
Decl ::= Ident ["=" Expr].
Stmt = Print | Block.
Print ::= "print" Var.
Var ::= Ident.
The Montages which are of interest here - Decl, Block, Var and Print - are
given in Figures 23 through 26, respectively. Consider the following program:
{
    int i = 2;
    {
        int i = 5;
        print i;
    }
}
[figure: one single symbol table (key/value entries) beside the AST of the program; each node (Prog, Block, Decl, Print, Var) holds a SymTab reference, and views a, b, c show the table as seen by different nodes]
Figure 22: Symbol table and abstract syntax tree
In this example we have two variable declarations which occur in nested scopes.
Both variables have the same name, i, but they have different values. When the
print statement prints the value of i to the console, it will only see the inner
variable declaration, as scoping rules shadow the outer declaration. Thus, the
output of this program will be: 5
Fig. 22 shows the AST of the program above. First we want to focus on
node 7, a use of variable i. In order to provide access to the memory where the
value of i is stored, the Montage Var has to get the reference from the declaration (node 5). This non-local dependency between node 7 and node 5 can conveniently be bridged by an entry in the symbol table. Whenever a variable is
declared, it is added to the symbol table with its name as the key for retrieval.
Later in static semantics processing, this variable will be used and its declaration (containing the reference to its place in memory) can be retrieved by querying the symbol table.
The symbol table is a component that exists independently of Montages and
the AST. Its life-cycle is restricted to the static semantics analysis as it will not
be used any more after all references are resolved. As mentioned above, every
Montage has a predefined property SymTab that refers to the symbol table. But
initialisation of this property cannot be done statically by the parser (as e.g. for
the parent property). The reason for this is the ambiguous meaning of undef as
a result of a query to the symbol table.
Suppose Montage Var (node 7) is querying for the name i in the symbol
table. As a result it gets undef. This could mean two things:
1. There was no declaration of a variable i
2. The initialisation rules of node 5 did not yet fire. They might fire in the
future, but then it is too late for node 7.
At least, this scenario would stop and report an error. But suppose the outer
declaration (node 3) fired before node 7. Then querying the symbol table
would result in retrieving the outer declaration instead of the inner one. The
program transformation would continue and generate faulty code.
Therefore we have to impose an order on the initialisation. We can do this
by generating dependencies among the nodes. As the symbol table has to be
initialised as well, we can use the initialisation of the SymTab reference to gen¬
erate a correct initialisation order.
The symbol table will not change its contents at every node in the AST. So it
makes sense to define as a default behaviour to copy the reference from the
parent node:
SymTab : return parent.SymTab;
But this behaviour can be overridden by providing a different initialisation
rule. For example:
SymTab : SymbolTable st = parent.SymTab;
         st.add(Ident.Name, this);
         return st;
A new entry will be added to the symbol table. It is a reference to the current
Montage8 and it can be retrieved with the given key (Ident.Name).
Note that the symbol table has to be implemented such that it can cope with
multiple entries of the same name in different scopes. In our example, this
means that the symbol table has to distinguish between the different entries for
i and furthermore it has to offer a different view for different nodes. In Fig. 22
8 Denoted by this, the Java reference to the current object.
Decl ::= Ident ["=" Expr].
[control flow graph: init action node, Ident, optional OptInit part ("=" Expr)]
Prop   Type     Initialisation
name   String   return Ident.value;
value  Integer  return new Integer(); // dummy value
Action
@init:
if (OptInit.exists) value = Expr.value;
Figure 23: Decl Montage, variable declaration
there is only one single symbol table. To node 1 it is empty (a), nodes 2 and 3
see the declaration of the outer i (b), and the rest of the nodes will see the symbol table as it is displayed at the bottom (c). There are different implementations possible that will meet all the requirements (see section 5.3.10).
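One possible implementation meeting these requirements (a sketch, not necessarily the one chosen in MCS, cf. section 5.3.10) chains one table per scope: each Block creates a child table whose lookups fall back to the parent, so that a node's SymTab reference is exactly its view of the single logical table:

```java
import java.util.HashMap;
import java.util.Map;

// Chained-scope symbol table: an inner declaration of i shadows the outer
// one, and every AST node simply keeps a reference to the table valid for
// its scope. Here, null plays the role of undef.
class SymbolTable {
    private final SymbolTable parent;
    private final Map<String, Object> entries = new HashMap<>();

    SymbolTable(SymbolTable parent) { this.parent = parent; }

    void add(String name, Object decl) { entries.put(name, decl); }

    // Innermost declaration wins; fall back to the enclosing scope.
    Object lookup(String name) {
        Object d = entries.get(name);
        if (d != null) return d;
        return parent != null ? parent.lookup(name) : null;
    }
}
```

With this layout, the two declarations of i from the example live in different tables, and node 7 naturally retrieves the inner one.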
Initialisation of the Instance Montages in the AST of Fig. 22 will happen
according to the initialisation rules of the following four Montages.
The Decl Montage specifies the actual declaration of a variable. For convenience purposes, the property name is introduced. It is initialised by retrieving
the value of the token representing the identifier. The property value is the
most important property in a declaration, as it holds the value of the variable
during runtime. References to this property have to be established wherever the
variable is accessed. Initially, this property is set to some dummy value, as there
is no static evaluation of the initializing expression in this example. At runtime,
the variable's content has to be initialised to the value of the expression
(if present). Nothing has to be done in the absence of the initializer, because the
dummy value was already set during static semantics evaluation.
A Block is the syntactic entity of a scope. Variables declared within a scope
must not have the same names; this condition will be asserted9 by the unique
property. The symbol table valid for this scope is built during initialisation of
the predefined SymTab property. First, the reference to the symbol table is
9 A Java set data structure is filled with all the names of the declared variables. The add operation
returns true if a name is new to the set.
Block ::= "{" {Decl} {Stmt} "}".
[control flow graph: DeclList (Decl) followed by StmtList (Stmt)]
Prop    Type     Initialisation
unique  Boolean  Set set = new HashSet();
                 foreach decl in DeclList
                   if (!set.add(decl.name)) return undef;
                 return new Boolean(true);
SymTab  Object   SymbolTable st = parent.SymTab;
                 foreach decl in DeclList
                   st.add(decl.name, decl);
                 return st;
Figure 24: Block Montage, container of a scope
retrieved from the parent node, then all declarations are added with their
names as keys.
The Var Montage shows the use of a variable. Note that this Montage only
specifies static semantics, as there are no actions to perform at runtime.
Read and write accesses to the value property of Var will be specified by the
appropriate Montages, e.g. the Print Montage below. It is important that all
variables are declared prior to their use, which is checked with the isDeclared
property. SymTab denotes the predefined reference to the symbol table. As its
initialisation is not overridden, it will be the same as in its parent Montage.
Var ::= Ident.
Prop        Type     Initialisation
isDeclared  Boolean  return SymTab(Ident.value);
value       Integer  if (isDeclared) return SymTab(Ident.value).value;
                     else return undef;
Figure 25: Var Montage, use of a variable
The Print Montage finally shows how to access a variable's value at runtime.
Print is somehow an opposite to the Var Montage, as it does not specify any
Print ::= Var.
Action
@print:
System.out.println(Var.value);
Figure 26: Print Montage, prints the contents of a variable to the standard output stream.
static semantics but only runtime behaviour. The action rule accesses the value
of the variable directly via its reference (the value property).
4.7 Control Flow Composition
4.7.1 Connecting Nodes
In the last phase of the transformation process, the control flow of a program
will be assembled from the control flow graphs of the Instance Montages of the
AST. We will explain control flow composition by means of an example. Given
the CASE statement of Fig. 13, p. 47 and the following code fragment:
CASE a < 10 DO Stmt1
CASE a >= 10 && a <= 20 DO // nothing
CASE a > 20 DO Stmt2
ESAC
Then, the parser will build the AST given in Fig. 27. The nodes in the lower
levels of the AST in Fig. 27 display the program text they represent for convenience. The parser can already do a considerable amount of work concerning the
"wiring": it simply copies the control flow graph in a Montage with all its control flow edges whenever an appropriate construct is encountered in the program text.
Fig. 27a shows all the connections between the nodes of the subtree of the
Instance Montage Case after parsing but before control flow composition.
Nonterminal nodes are placeholders for the entire control flow graph of
their designated Montage. At control flow composition, the nonterminal node
will be replaced by the entire control flow graph of the designated Montage. All
incoming edges of the nonterminal will be deviated to the initial node of the
[figure: a) structure generated by parser — the Case node with its LIST-1 and OPT nodes, the three CASE-parts (a < 10, 10<a<20, a > 20) and the statement blocks StmtBlk1 and StmtBlk2, still connected via nonterminal placeholders; b) after control flow composition — the same nodes wired directly by control flow edges]
Figure 27: AST built by parser
replaced graph and all outgoing edges will leave from the terminal node of the
Montage. Kutter illustrates this replacement excellently in [Kut01, chapter 3].
Repetition nodes indicate that their contents (their subtrees) may occur several times in the program. The number of occurrences must be in a certain
range which is part of a repetition node definition. E.g. the definition of an
option allows a minimum of zero and a maximum of one occurrence of its contents. The parser will check whether the number of actual instances is within
the given range and in addition it will build a subtree for each of these
instances. This is illustrated in Fig. 27 where all three occurrences of CASE-part
are attached to the LIST-1 node.
The contents of a repetition node specifies what these subtrees look like.
These subtrees or subgraphs (as they also are a partial control flow) are put
together by connecting the terminal node of the nth instance with the initial
node of the (n+1)th instance. All incoming edges of the repetition node will be
deviated to the initial node of the first instance and analogously all outgoing
edges of the list node will leave from the terminal node of the last instance
(illustrated in Fig. 27b).
Note that the repetition nodes are present regardless of whether there is an
actual occurrence in the program or not. The second CASE-part and the
optional DEFAULT-part are missing in our sample code. The parser will create
the nodes while copying the control flow graph. They serve the parser as a stub
where it can plug in any actual instances appearing in the program. The above-mentioned stubs remain empty if there is no corresponding code available.
Empty stubs cannot be removed, as they still serve a purpose: they can be used
to query whether their contents was present in the program. We did this e.g. in
the Decl Montage (Fig. 23, p. 66) in the action of node init.
After all nonterminals were replaced by their control flow graphs, we get a
network of action nodes. We use the term network here instead of graph
because the nodes and edges resemble an active communication network with
action nodes as routers (routing the control flow) with computing abilities and
edges as communication lines.
4.7.2 Execution
After the transformation process is completed, execution of the program is
almost trivial. The network of action nodes can be executed by starting at the
initial edge of the starting Montage. It refers to some node which will get the
execution focus, i.e. its rules are executed. Then the conditions of all outgoing
edges will be evaluated. If there is none that evaluates to true, then the system
stops; if there is more than one ready to fire, then the system stops too,
because an ambiguous control flow was detected10. In the "normal" case of
only one edge being ready to fire, control will be transferred to its target node.
The system runs in this manner as long as there are control flow edges ready to
fire.
10 A non-deterministic behaviour of the action network could also be implemented, though parallel execution semantics was not the focus of our research.
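The execution scheme just described can be sketched as follows; Node, Edge and run are illustrative names, not the MCS classes, and the check for exactly one ready edge mirrors the rule stated above:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BooleanSupplier;

// Execution loop over the action network: fire the focused node's action,
// evaluate all outgoing edge conditions, and transfer control only if
// exactly one edge is ready to fire.
class ActionNetwork {
    static class Edge {
        final BooleanSupplier condition; final Node target;
        Edge(BooleanSupplier c, Node t) { condition = c; target = t; }
    }
    static class Node {
        final Runnable action; final List<Edge> out = new ArrayList<>();
        Node(Runnable a) { action = a; }
    }

    // Stops on no ready edge (normal termination) or on more than one
    // ready edge (ambiguous control flow).
    static void run(Node start) {
        Node focus = start;
        while (focus != null) {
            focus.action.run();
            Node next = null;
            int ready = 0;
            for (Edge e : focus.out)
                if (e.condition.getAsBoolean()) { ready++; next = e.target; }
            focus = (ready == 1) ? next : null;
        }
    }
}
```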
Chapter 5
Implementation
In this chapter, the implementation of the Montage Component System
(MCS) is discussed. The system allows to compose several Montage specifications to a language specification that can be executed, i.e. an interpreter for the
specified language is generated.
We begin this chapter with a discussion of what a language is in MCS
(section 5.1). Section 5.2 will explain syntax processing and parsing, and
section 5.3 covers the static semantics analysis and control flow composition.
Notice that the given code samples are not always exact copies from the
actual implementation. The actual code has to deal with visibility rules and type
checking, and thus is usually strewn with casts and additional method calls to
access private data. As we try to focus on the basic aspects, we do not want to
confuse the reader with too many details. The code is trimmed for legibility
and simplified, e.g. getter and setter methods became attributes, or class casts
were omitted.
The architecture of MCS follows the Model/View/Controller (MVC) paradigm [KP88]. The discussion presented in this chapter concentrates on the
aspects concerning the Model in this MVC triad. User interfaces such as Montage editors are only mentioned occasionally.
5.1 Language
A language specification consists of Montages and tokens; they will be discussed in detail in the following sections. In order to use a Montage or a token
as a partial specification of a language, it has to be registered at the language
first. In MCS this is done by either creating a new Montage in the context of a
language or by importing existing Montages into a language.
A Montage can be stored separately in order to be imported into a language,
but it cannot be deployed out of the context of a language. Does this conform
to the definition of a software component given in section 2.1? This definition
identified five aspects of a component (here: a Montage). On a language level,
three of them are of interest: extent, occurrence and usage. Appearance and
requirements of Montages will be discussed in section 5.3.
Definitely, a Montage is a unit of composition and is subject to composition by
third parties. Montages can be stored and distributed separately, which qualifies
them for the extent and usage aspects of our definition. The occurrence aspect
- components can be deployed independently - has to be discussed in more detail.
Independent deployment does not necessarily mean that a component is a
stand-alone application. Consider a button component for instance; it can be
deployed independently of any other graphical components (such as sliders,
text fields or menus), but it cannot be deployed out of the context of a surrounding panel or window. Similar rules apply for Montages. They may be
viewed, edited and stored independently of other Montages (as buttons can be
manipulated separately in a programming environment), but their runtime
context has to be a language. Within such a context, Montages can either be
imported or exported separately or in groups (see section 5.3 for further
details).
The main graphical interface of MCS reflects the leading role that the language plays. The main interface contains a list of all registered Montages (see
Fig. 28). Here an overview of the language is given by a list of EBNF rules
from all Montages. Tokens are listed on the second panel of this user interface
(see Fig. 29; this interface is explained in more detail in section 5.2.1).
Plugging a Montage into a language basically means to add it to this list. The
Montage is then marked as being a part of this language definition; no consistency checks are performed at this time. This is necessary to allow for convenient editing of the Montages. If these are imported, they have to be adjusted to
the new language environment, i.e. the syntax has to be adapted to match the
general syntax rules (e.g. capitalized keywords) or properties of the static
semantics have to be renamed in order to be used by other Montages (see
section 5.3 for details).
Only after all these adaptations have been performed may the interpreter for a
language be generated. This happens in several steps, which will be listed
next and described in detail in the following sections.
5.2 Syntax 73
[Figure 28 shows the CoMon (Composer for Montages) main window: a table
listing all registered Montages of the language by name together with the EBNF
rule of each (Assignment, Block, Call, Comparison, Condition, Expression,
Factor, If, Input, Odd, Output, Program, Statement, StmtSeq, Term, While),
and the buttons Edit, New Montage, New Synonym, Import and Remove.]
Figure 28: MCS main user interface for manipulating a language
5.2 Syntax
Syntax definitions are given in terms of EBNF productions [Wir77b]. They do
not only specify the syntax of the programming language, they also declare how
Montages can be combined to form a language. We distinguish between
characteristic productions and synonym productions (see also section 3.3.2).
Characteristic Productions: In MCS, characteristic productions are associated
with a Montage, i.e. each Montage has exactly one characteristic production.
This production defines the concrete syntax of the Montage and therefore
reflects the control flow graph given in the graphical part of the Montage. The
control flow graph basically defines the abstract syntax of the Montage. How
this correspondence between concrete and abstract syntax is defined was
explained in section 4.4.3. This strict correspondence between the control flow
graph and the concrete syntax does not allow alternatives (separated by "|") in
a characteristic production. Examples of characteristic productions:
74 5 Implementation
While ::= "WHILE" Condition "DO" StmtSeq "END".
Block ::= [ConstDecl][VarDecl]{ProcDecl}"BEGIN" StmtSeq "END" ".".
Synonym Productions: Synonym production rules assign one of the alternatives
on their right side to the symbol (the placeholder) on their left side. In MCS
there are two different categories of synonym productions: nonterminal
synonym productions and terminal synonym productions. As their names
imply, the right side of a nonterminal synonym production may contain only
nonterminal symbols as alternatives, whereas in terminal synonym productions
only terminal symbols are allowed.
Nonterminal symbols and nonterminal synonym productions are the pivot
of language construction. They operate as placeholders and thus introduce
flexibility into syntax rules. One possibility to enhance or extend a language is to
provide further alternatives to a synonym production. Nonterminal synonym
productions contain nonterminal symbols on their right side. Only one
nonterminal symbol is allowed per alternative, but there may be several terminal
symbols. Terminal symbols in alternatives will be discarded by the parser, as
they cannot carry any semantic purpose. Examples of nonterminal synonym
productions are:
Statement = Assign | Call | StmtSeq | If | While.
Factor = Ident | Number | "(" Expression ")".
Terminal Synonym Productions: A Montage may feature terminal synonym
productions, provided that the placeholder appears in the characteristic
production of the Montage. An example:
Comparison ::= Expression CompOp Expression.
CompOp = "=" | "#" | "<" | "<=" | ">" | ">=".
Comparison is the characteristic production that describes the concrete syntax
of the Montage. CompOp is a terminal synonym production that conveniently
enumerates all possible comparison operators applicable in this Montage.
Normally, terminal symbols will be discarded when parsed. However, terminal
symbols declared in a terminal synonym production will be stored in a
predefined property of the same name. To be precise: the property will contain
the string that was found in the program text. In the CompOp example, the
parser would generate a CompOp property of type java.lang.String and its value
would be the actual comparison operator found. Storing these strings is
necessary because after parsing a program text, only this property will contain
information about the actual comparison.
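As a sketch of this mechanism (the class and method names below are illustrative, not the actual MCS API), a node for a terminal synonym rule could match one of its alternatives and record the matched string under the rule's name:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: a node that parses one of several terminal
// alternatives and stores the matched string as a named property.
class TerminalSynonymNode {
    private final String name;           // e.g. "CompOp"
    private final String[] alternatives; // e.g. "=", "#", "<", ...

    TerminalSynonymNode(String name, String... alternatives) {
        this.name = name;
        this.alternatives = alternatives;
    }

    // On success, the matched text is stored under the rule's name.
    void parse(String token, Map<String, String> properties) {
        for (String alt : alternatives) {
            if (alt.equals(token)) {
                properties.put(name, token); // e.g. CompOp = "<="
                return;
            }
        }
        throw new RuntimeException("unexpected token: " + token);
    }
}

public class CompOpDemo {
    public static void main(String[] args) {
        Map<String, String> props = new HashMap<>();
        TerminalSynonymNode compOp = new TerminalSynonymNode(
                "CompOp", "=", "#", "<", "<=", ">", ">=");
        compOp.parse("<=", props);
        assert "<=".equals(props.get("CompOp"));
    }
}
```

After parsing, an action rule of the Montage can read the operator simply as the string stored under the property name CompOp.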
5.2.1 Token Manager and Scanner
The processing of a program text begins with lexical analysis. There, the input
stream of characters is grouped into tokens. Each token represents a value, and
in most cases this value corresponds directly to the character string scanned
from the program text. Such tokens are typically keywords (e.g. "if", "while",
etc.), separators (e.g. ";", "(", ")", etc.) or operators (e.g. "+", "*", etc.) of the
programming language and serve readability purposes or separate syntactic
entities.
In certain cases, however, the original string of the program has to be converted
into a more suitable form. When scanning the textual representation of
a number, for example, the actual character string is of minor interest as long as
it can be converted into its corresponding integer value. These tokens are called
literals; integers, floating point numbers and booleans are typical literals.
Beyond that, strings and characters need to be converted as well, i.e. it might be
necessary to replace escaped character sequences by their corresponding
counterparts (e.g. the Unicode escape sequence '\u2021' will be replaced by a
double dagger '‡').
In MCS, tokens are the smallest unit of processing. They cannot be
expressed in terms of Montages, as they contain no semantics at all; they only
represent a value. Therefore they have to be managed and specified separately.
In order to completely specify a language, Montages do not suffice; some
token specifications will be needed as well. Fortunately, there are only a few
such specifications, which normally are highly reusable. Specifically, these token
specifications are literals and white spaces. Keywords, separators and operators
can be extracted from the EBNF rules of the Montages. Literals and white
spaces cannot be generated automatically, as they have a variable micro syntax.
In order to efficiently handle scanning of program texts, MCS has a Token
Manager that keeps track of all tokens related to a language. Each token that
must be recognized has to be registered with the token manager. The majority
of the tokens will automatically be registered by the Montages as soon as they
generate their parsers. This is very convenient, not only because their number
can be high but also because they differ from language to language.
Literals and white spaces, however, have to be specified independently of
any Montages (although Montages will refer to them in their EBNFs). Such a
specification consists of:
[Figure 29 shows the Token Manager panel of the CoMon window for PL/0.
Each row lists a token's name, its rule, its conversion type and its skip flag:
the keywords "CALL", "DO", "END", "IF", "ODD", "THEN" and "WHILE";
DecimalLiteral with the rule [1-9][0-9]*; Ident; and Whitespace. Below the
table are the buttons New, Import and Remove.]
Figure 29: Screen shot of the Token Manager. Each token is specified by a name, a regular
expression, a conversion method (represented by a type name) and a skip flag indicating
whether this token will be passed to the parser.
• a name that can be used in EBNF to refer to a token,
• a regular expression that describes the token's micro syntax,
• a method that returns an object containing the converted value of this token,
• a flag signalling whether this token specifies a white space, and thus will be
skipped, i.e. it will not be passed to the parser.
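The four items above can be sketched as a small Java class (a hypothetical illustration; the names TokenSpec and its fields are not MCS's actual API), with the conversion method modelled as a function from the scanned text to a value object:

```java
import java.util.function.Function;
import java.util.regex.Pattern;

// Hypothetical sketch of a token specification as described above:
// a name, a regular expression, a conversion method and a skip flag.
class TokenSpec {
    final String name;
    final Pattern pattern;
    final Function<String, Object> convert;
    final boolean skip; // true for white space: not passed to the parser

    TokenSpec(String name, String regex,
              Function<String, Object> convert, boolean skip) {
        this.name = name;
        this.pattern = Pattern.compile(regex);
        this.convert = convert;
        this.skip = skip;
    }
}

public class TokenSpecDemo {
    public static void main(String[] args) {
        // DecimalLiteral as in Fig. 29, converted to an Integer value.
        TokenSpec dec = new TokenSpec("DecimalLiteral",
                "[1-9][0-9]*", Integer::valueOf, false);
        TokenSpec ws = new TokenSpec("Whitespace",
                "[ \\t\\n\\r]+", s -> s, true);
        assert dec.pattern.matcher("42").matches();
        assert dec.convert.apply("42").equals(42);
        assert ws.skip;
    }
}
```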
The token manager will generate a scanner capable of scanning the program
text and returning tokens as specified. The method we chose to generate a
scanner is the same as in Lex [Les75] and is explained in [ASU86]: applying
Thompson's construction to generate a nondeterministic finite automaton (NFA)
from each regular expression, and subsequently using a method called subset
construction to transform these NFAs into one big deterministic finite automaton
(DFA). The application of these algorithms is not problematic at all as long as
all regular expressions specifying keywords are processed before those specifying
literals. Normally the character sequence representing a keyword could also be
interpreted as an identifier, as the same lexical rules apply (specified by two
different regular expressions). If subset construction is fed with the regular expressions
of literals first, then it will return a literal token instead of a keyword token
(refer to [ASU86] chapter 3.8 for further details about this property of lexical
recognizers). MCS solves this problem by numbering token specifications.
Literals and white spaces are automatically assigned higher numbers¹, thus
guaranteeing correct recognition of keyword tokens.
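The effect of this numbering can be sketched as follows (a simplified illustration, not the actual MCS scanner: it matches whole lexemes instead of running a DFA, and the ids are chosen for the example). When a lexeme matches both a keyword rule and a literal rule, the specification with the lower number wins:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch of the numbering scheme described above:
// keywords get ids below 10000, literals and white spaces start at
// 10000, and on an ambiguous match the lower id wins.
class NumberedSpec {
    final int id;
    final Pattern pattern;
    NumberedSpec(int id, String regex) {
        this.id = id;
        this.pattern = Pattern.compile(regex);
    }
}

public class KeywordPriorityDemo {
    static int recognize(String lexeme, List<NumberedSpec> specs) {
        int best = -1;
        for (NumberedSpec s : specs)
            if (s.pattern.matcher(lexeme).matches()
                    && (best == -1 || s.id < best))
                best = s.id; // prefer the lowest id, i.e. keywords
        return best;
    }

    public static void main(String[] args) {
        List<NumberedSpec> specs = new ArrayList<>();
        specs.add(new NumberedSpec(1, "WHILE"));         // keyword
        specs.add(new NumberedSpec(10000, "[a-zA-Z]+")); // Ident literal
        // "WHILE" matches both rules; the keyword id wins.
        assert recognize("WHILE", specs) == 1;
        assert recognize("counter", specs) == 10000;
    }
}
```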
The Token Manager will generate a lexical scanner on demand. The scanner
is then available to the parsers of the Montages. In order to process a program
text, they have to access the scanner's interface to get a tokenized form of the
input stream. The Token Manager and the scanner are central to all Montages.
This contradicts in a way the decentralised architecture of MCS. Why does not
every Montage define its own scanner?
Although decentralised scanners could be implemented, they do not make
sense in practice. The main reason is inconsistent white space and identifier
specification. In a decentralised setup, it would be possible that each Montage
defines different rules for white spaces. Such programs would be unreadable
and very difficult to code, as syntax rules might change with every token. It does
not make sense that white spaces in expressions should be defined differently
from white spaces in statements. An example of such a program can be found
in Fig. 30. This example shows the core of Euclid's algorithm with the following
white space rules: IF, REPEAT and assignments have the same rules as in
Modula-2, Comparisons use underscores '_', Expressions use tabs, Statement-
Sequences have again the same rules as in Modula-2 except that newline or
carriage return is not allowed.
REPEAT
  IF u_>_v THEN
    t := u ; u := v ; v := t ;
  END; u := u - v;
UNTIL u_=_0;
Figure 30: An example of a program with different white space rules.
In a language specification, white spaces and literals would have to be specified
redundantly in many different Montages, thus making them error prone.
Unintended differences could easily be imported by reusing Montages from
different sources. By using a single scanner for all Montages, the user of MCS
¹ Literals and white spaces are numbered starting from 10000, assuming that there will never be
more than 10000 keywords, separators and operators in a language. This simple method prevents
renumbering of tokens. Remember that keyword tokens will not be registered before parser
generation, i.e. user-specified literals and white spaces will be entered first, preventing a
first-come-first-served strategy.
gains simplicity, consistency and speed, at the loss of some (usually
unwanted) expressive power.
5.2.2 Tokens
When tokens are registered with the Token Manager, they get a unique id that
helps to identify a token later in the AST. The id is an int value and can be
used for quick comparisons, e.g. it is faster to compare two integers than to
compare two strings each containing "implementation" (the longest Modula-2
keyword).
Furthermore, each token features a value and a text. The value is an object of
the expected data type, e.g. if the token represents a numerical value, then value
would refer to a java.lang.Integer or a java.lang.Float object.
The token's text, however, will contain the character string as it was found in the
program code and is always of type java.lang.String. The data type of
value is known either through its id or it can be queried, e.g. using Java's
instanceof type comparison. Fig. 31 shows the token classes that are available.
The classes Token and VToken are abstract classes. VTokens will return
values that have to be transformed from the text in the program. The types of
these values are standard wrapper classes from the package java.lang:
Boolean, Character, Integer, Float, and String respectively.
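The value/text distinction can be sketched as follows (a hypothetical illustration using its own class names, SketchToken and SketchIntegerToken, rather than the MCS classes of Fig. 31): the text keeps the scanned string, while value() performs the conversion to a java.lang wrapper object.

```java
// Hypothetical sketch of the value/text distinction described above:
// the text is the string as found in the program, the value is the
// converted java.lang wrapper object.
abstract class SketchToken {
    final int id;      // unique id assigned by the Token Manager
    final String text; // character string as found in the program
    SketchToken(int id, String text) {
        this.id = id;
        this.text = text;
    }
    abstract Object value();
}

class SketchIntegerToken extends SketchToken {
    SketchIntegerToken(int id, String text) { super(id, text); }
    Object value() { return Integer.valueOf(text); }
}

public class TokenDemo {
    public static void main(String[] args) {
        SketchToken t = new SketchIntegerToken(10001, "42");
        assert t.text.equals("42");
        assert t.value().equals(42);
        assert t.value() instanceof Integer; // queryable via instanceof
    }
}
```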
Token
 +- VToken
 |   +- BooleanToken
 |   +- CharacterToken
 |   +- IntegerToken
 |   +- RealToken
 |   +- StringToken
 +- KeywordToken

Figure 31: Token class hierarchy
5.2.3 Modular Parsing
Each Montage is capable of parsing the construct it specifies. In order to do so,
a parser has to be generated first. This is done by parsing the EBNF rule and
building a syntax tree (the EBNF tree, a concrete syntax tree) for each Montage.
The nodes of the EBNF tree are of the same types (classes) as the nodes of
the control flow graph. These classes will be described in section 5.3. The
concrete and abstract syntax trees of a Montage are very similar, which has been
commented on in section 4.4.3 on "Internal Consistency". Generating the parser is
done in the integration phase of our transformation process; thus the
EBNF tree is part of a Template Montage (see p. 58) and will not be used in
Instance Montages.
Parsing a program text is now as simple as traversing these EBNF trees and
invoking the parse method on each visited node, beginning with the EBNF
tree of the start Montage of the language. Each node in an EBNF tree belongs
to one of the following categories and has parsing capabilities as described:
Terminal symbol: In an EBNF rule, terminal symbols are enclosed in quotation
marks, e.g. "if". When the parsing method of a terminal symbol node is invoked, it
gets the next token from the scanner and compares it to its own string. An
exception is thrown if the comparison fails. Upon success, a token representing
the terminal symbol is returned.
Terminal symbols are normally not kept in the abstract syntax tree (see the
discussion in the previous section). A kind of exception are terminals stemming
from terminal synonym productions, whose text is stored as a predefined
property. Note that, upon parsing, the terminal symbol cannot decide on its own
whether it should be discarded or not. Therefore a token is returned upon
every successful scan.
Nonterminal symbol: In EBNF, these are identifiers designating other Montages.
When a nonterminal is expected, i.e. the parser encounters a nonterminal
node in an EBNF tree, parsing will simply be delegated to the
Montage that the nonterminal node represents. This Montage in turn
traverses its own EBNF tree in order to continue the parse process.
Of course, the nonterminal nodes in the EBNF have to be aware of the
Montages they represent. This awareness will be achieved during the integration
phase, as described in section 4.4.
Repetition rule: Repetitions are marked by "{...}" or "[...]". The contents of a
repetition are contained in the children of a repetition rule node. During
parsing, the parsers of these children are called in turn until an error occurs.
If this error occurs at the first child, the repetition has been parsed completely;
otherwise an error has to be reported to the calling node. Note that it
is possible to get an error on the very first attempt to parse a child. This
means that the optional part specified by the repetition was not present in
the code. The parser is also responsible for checking whether the actual number
of occurrences of the repetition contents is within the specified range, i.e.
min ≤ actual occurrences ≤ max, where min and max denote the minimal and
maximal allowed occurrences respectively.
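A minimal sketch of this parse loop (an illustration, not the MCS implementation: the "child parser" is reduced to matching a single expected token, and the names are invented for the example) could look like this:

```java
import java.util.Iterator;
import java.util.List;

class ParseError extends RuntimeException {
    ParseError(String msg) { super(msg); }
}

public class RepetitionDemo {
    // Calls the (here trivial) child parser until it fails, then checks
    // the occurrence count against the allowed range min..max.
    static int parseRepetition(List<String> tokens, String expected,
                               int min, int max) {
        int count = 0;
        Iterator<String> it = tokens.iterator();
        while (it.hasNext() && it.next().equals(expected)) count++;
        if (count < min || count > max)
            throw new ParseError("occurrences out of range: " + count);
        return count;
    }

    public static void main(String[] args) {
        // An option [X]: min = 0, max = 1.
        assert parseRepetition(List.of("X", "Y"), "X", 0, 1) == 1;
        // A list {X}: min = 0, max unbounded.
        assert parseRepetition(List.of("X", "X", "Y"), "X",
                0, Integer.MAX_VALUE) == 2;
    }
}
```

An error at the first iteration simply yields a count of zero, which is legal for lists and options but violates the range of a group (min = 1).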
Alternative rule: An alternative rule separates different nonterminals by a vertical
line "|". The parser tries to parse each alternative (nonterminal) in
turn. If all alternatives report errors, then an error has to be reported to the
calling node. There are different strategies that can be implemented for a
successful attempt to parse an alternative. Either the first alternative that
reports no errors is chosen and its parse tree is hooked into the AST, or the
remaining alternatives are tested as well. If additional alternatives report
valid parse trees, then there are two choices again: either to stop parsing
because of an ambiguous grammar, or to allow user assistance. We chose the
latter of both choices: testing all alternatives and allowing user intervention.
The user will be presented with a set of valid alternatives and may
choose the one that will be inserted into the AST. This approach substitutes
parser control in a certain way, as it allows the specification of ambiguous,
non-context-free grammars (see also the discussion in sections 4.5 and 7.2.1).
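The chosen strategy can be sketched as follows (a hypothetical illustration with invented names; the alternatives are modelled as simple predicates instead of full parsers): all alternatives are tried, and the result set determines whether parsing failed, succeeded unambiguously, or requires user assistance.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical sketch: try every alternative and collect all that
// parse successfully, instead of stopping at the first hit.
public class AlternativeDemo {
    static List<String> tryAll(String input,
                               List<String> names,
                               List<Predicate<String>> parsers) {
        List<String> valid = new ArrayList<>();
        for (int i = 0; i < parsers.size(); i++)
            if (parsers.get(i).test(input)) valid.add(names.get(i));
        // empty: error; one entry: unambiguous; more: ask the user
        return valid;
    }

    public static void main(String[] args) {
        List<String> names = List.of("Number", "Ident");
        List<Predicate<String>> parsers = List.of(
                s -> s.matches("[0-9]+"),
                s -> s.matches("[a-zA-Z][a-zA-Z0-9]*"));
        assert tryAll("42", names, parsers).equals(List.of("Number"));
        assert tryAll("x1", names, parsers).equals(List.of("Ident"));
        assert tryAll("?", names, parsers).isEmpty();
    }
}
```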
Terminal synonym rules are treated differently. The string that was read
by the scanner will be stored in a predefined property which has the same
name as the terminal synonym rule. As the result of a terminal synonym rule
can only be one single string, this is the most efficient way to handle these
tokens, and no additional indirection (tree node access) is necessary to access
its value.
5.3 Data Structures for Dynamic Semantics of Specification
The description of the tasks of static semantics analysis and control flow
composition in sections 4.6 and 4.7 was given in chronological order. This
provided an overview of the various relations between Montage properties,
control flow graphs, abstract syntax trees, etc. We would now like to concentrate
on the data structures used for the implementation. The following sections
do not reflect any chronological order of the transformation process
but rather follow the hierarchy of our main classes: the nodes in the AST.
5.3.1 Main Class Hierarchy
The main data structure we have to deal with is the Montage. We decided to
model a Montage as a Java class. And although it is one of the most important
classes in our framework, it plays a rather marginal role in the class hierarchy
we defined. When modelling the classes and their relations, we proceeded from
the assumption that the most important data structure is not the Montage but
rather the abstract syntax tree (AST) built by the parser. This tree will manage
all the important information about the program and its static and dynamic
semantics. The whole dynamic semantics of the specification (see Fig. 10,
p. 43) is centred around the AST. Taking this point of view, the first question
is: what objects will be inserted in the AST?
We have already seen the different kinds of nodes that populate the AST:
nonterminals, actions, repetitions, initial and terminal nodes, and Montages.
When analysing the relations between these nodes, certain common properties
may be sorted out (candidates for abstract classes). After some experimenting,
we found the class hierarchy given in Fig. 32 the best fitting type model for our
system.
CFlowNode
 +- Action
 +- Terminal
 +- CFlowContainer
     +- Nonterminal
     |   +- Synonym
     |   +- Montage
     +- Repetition

Figure 32: The MCS class hierarchy
At the root of this hierarchy is an abstract base class CFlowNode that represents
all common properties of a control flow node. E.g. it implements a tree
node interface (javax.swing.tree.MutableTreeNode) that enables it to
be managed by (visual) tree data structures. It also manages incoming and
outgoing edges in order to be used as a node in a graph. In other words, a
CFlowNode object is capable of being a node in a tree and a node in a graph
simultaneously. Furthermore, an abstract method for parsing is defined.
CFlowContainer is an abstract class too, with additional capabilities to
manage a subtree of CFlowNodes. The concrete classes then implement all the
specific features that the different nodes will need to perform their duties. We
will present them in more detail in the following sections.
5.3.2 Action
An Action object will represent an action node at runtime. Each Action object is
a JavaBean listening to an Execute event and featuring a NextAction property.
Fig. 34 sketches the declaration of the Action class. The action provided by the
user will be encapsulated in an object implementing the Rule interface.
interface Rule {
    // fire the dynamic semantics rule
    public void fire();
}

Figure 33: Declaration of interface Rule
MCS will encapsulate the code that the user gave for a specific action in a
class that is suited to handle all the references and that has all the access rights
needed. When executing a program, action nodes are wrapper objects hiding
the different implementations of their rules and thus simplifying the execution
model. Action nodes do not need to implement any parsing action², as they
have no representation in the syntax rule.
5.3.3 I and T Nodes
In [AKP97, Kut01], I and T in the control flow graphs denote the initial and
terminal edge respectively. From an implementation point of view it turned out
to be easier to implement an initial and a terminal node³. Simply think of the I
² In order to be a concrete class, Action implements the parsing method, but it is empty.
³ 'Terminal node' here denotes a T node, in contrast to a node representing a terminal symbol,
which is of class Terminal. Class names are printed in italics.
class Action extends CFlowNode
        implements ExecuteListener {

    // an object containing the rule to execute
    private Rule rule;

    // Constructor
    public Action(Rule rule) {
        this.rule = rule;
    }

    // handle the Execute event
    public void executeAction(ExecuteEvent evt)
            throws DynamicSemanticsException {
        rule.fire();
    }

    // NextAction property
    public Action getNextAction() {
        Action next = null;
        for (Iterator it = outgoing.iterator(); it.hasNext(); ) {
            Transition t = (Transition) it.next();
            if (t.evaluate()) {
                next = t.target;
                break;
            }
        }
        return next;
    }
}

Figure 34: Declaration of class Action
and T letters as nodes instead of annotations of the initial and terminal edge.
This model has several advantages:
• Because there is only one edge class, there is no need to distinguish between
ordinary edges and edges with 'loose ends', which the initial and terminal
edges in fact are, as they are attached to only one node. Having two more
node classes does not hurt, as we have to distinguish between the different
kinds of nodes anyway.
• Connecting Montages is easier, as we can merge nodes instead of tying loose
ends of edges. A T node can be merged with an I node (of the following
Montage) by diverting all incoming and outgoing edges of the I node to the
T node. When all I and T nodes are merged this way, only T nodes remain.
[Figure 35 shows the successive wiring steps a) to d) described in the text as
control flow graphs of a While Montage and its neighbouring Montages.]
Figure 35: 'Wiring' a Montage
• The only difference between T nodes and Action nodes is that the former do
not fire any actions. Evaluating conditions on the outgoing edges is done
exactly the same way as in action nodes.
• I nodes neither fire actions nor evaluate outgoing edges.
Fig. 35 illustrates the 'wiring' of the control flow of a While Montage
surrounded by other Montages. Its Expr nonterminal node is expanded in
Fig. 35b and collapsed again in Fig. 35c. Finally, in Fig. 35d all I nodes were
removed and their attached edges diverted to T nodes. In this process, only one
I node will remain untouched: the initial node of the start symbol's control
flow. This is where execution of the program will begin.
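The merge step can be sketched in a few lines (a hypothetical illustration with its own minimal Node class, not the CFlowNode of MCS): all edges attached to the I node are diverted to the T node, after which the I node is no longer wired into the graph.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of merging an I node into the preceding T node:
// every edge attached to the I node is diverted to the T node.
class Node {
    final String name;
    final List<Node> incoming = new ArrayList<>();
    final List<Node> outgoing = new ArrayList<>();
    Node(String name) { this.name = name; }

    // Divert all edges of `other` to this node.
    void absorb(Node other) {
        for (Node src : other.incoming) {
            src.outgoing.remove(other);
            src.outgoing.add(this);
            incoming.add(src);
        }
        for (Node dst : other.outgoing) {
            dst.incoming.remove(other);
            dst.incoming.add(this);
            outgoing.add(dst);
        }
        other.incoming.clear();
        other.outgoing.clear();
    }
}

public class MergeDemo {
    public static void main(String[] args) {
        Node t = new Node("T");      // terminal node of a Montage
        Node i = new Node("I");      // initial node of the next Montage
        Node action = new Node("Action");
        i.outgoing.add(action);      // I --> Action
        action.incoming.add(i);
        t.absorb(i);                 // only the T node remains wired
        assert t.outgoing.contains(action);
        assert action.incoming.contains(t);
        assert i.outgoing.isEmpty();
    }
}
```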
5.3.4 Terminal
A node of class Terminal is capable of parsing a terminal symbol. Terminal
nodes will not be inserted in the AST, but they are members of the EBNF trees
which reflect the EBNF rule. During the integration phase, when the parse tree
is generated, a Terminal node registers its terminal symbol with the Token
Manager and receives a unique id in return. In the parsing phase, when a
token is parsed, the Terminal node then compares the id of the encountered
token to the previously stored one. Parsing may continue upon correspondence,
and the token is returned to the calling node in the parse tree. Usually this
node is content with a positive result and does not store the terminal symbol in
the AST. However, in future versions of MCS it would be possible to insert
Terminal nodes as well, for improving debugging services for example.
5.3.5 Repetition
Repetition objects represent lists, options or groups. They are CFlowContainers
because they maintain their contents as a subtree (see e.g. Fig. 13). There is no
need for specialised subclasses for these three kinds of repetitions, because they
would not differ sufficiently. We decided to introduce two attributes,
minOccurrence and maxOccurrence, which can be accessed as Java properties.
They determine the minimum and maximum number of occurrences of their
contents in a program text. The default values are given in the following table:

Repetition   min   max
List         0     java.lang.Integer.MAX_VALUE (a)
Option       0     1
Group        1     1

a. the maximum integer value, which comes as close to ∞ as possible.

Of course, minimal and maximal occurrences are not bound to these numbers
and can be set freely by the user, provided that min ≤ max. When internal
consistency is checked, the values of min and max can be unambiguously
assigned to the concrete syntax tree instance of a repetition. This is necessary, as
it is not possible to specify any other values for min and max in EBNF than the
ones shown in the table above.
Repetition nodes in the AST serve as the container for nodes of the actual
occurrences. Edges leaving from or going to a repetition will be managed by
these nodes. The actual instances of the repetition body are stored below a
Repetition node in an array. Fig. 36b shows what the AST of a list of two
occurrences of a statement looks like, in contrast to Fig. 36a, which illustrates a
concatenated occurrence of statements in the specification.

[Figure 36 contrasts a) a concatenated occurrence, with Stmt nodes chained one
after the other, and b) a list occurrence, with the Stmt instances stored in
numbered array buckets below a LIST node.]
Figure 36: Difference between repeated occurrence and list occurrence of statements
Notice that in the graphical representation the initial and terminal edges can
be left out if there is only one node in the repetition. In this case, it is obvious
where these edges have to be attached. A Repetition object has to guarantee
that there exists one I node and one T node, either being explicitly set by the
user or implicitly assumed as described above. The numbered nodes visualize
the array buckets in which the actual instances are stored; they do not
exist as nodes in the AST.
5.3.6 Nonterminal
Nonterminal is a concrete subclass of CFlowContainer, as it is possible to nest
nonterminals (see Fig. 9, p. 41). Nonterminals serve as placeholders for their
designated Montages; therefore the most important extension in the Nonterminal
class is a reference to the designated Montage. This reference
will be set during the integration phase (section 4.4). Parsing will be delegated
to the Montage the Nonterminal object is referring to.
5.3.7 Synonym
A Synonym object has two responsibilities: trying to parse the specified
alternatives and representing the actually parsed instance in the AST.
[Figure 37 shows the program P, "d = 5*c", its AST (an Asg node with the
children Ident: d and Expr, where the leaves Num: 5 and Ident: c are each
wrapped in a Factor node), a compact AST of P with the single-child Factor
nodes merged away, and the grammar of L:
    Asg ::= Ident "=" Expr.
    Expr ::= Term { AddOp Term }.
    Term ::= Factor { MultOp Factor }.
    Factor = Ident | Number | "(" Expr ")".]
Figure 37: AST and compact AST of a program
Parsing has already been described in section 5.2.3. It is important not to
lose information about the origin of nodes in the AST representing the parsed
program. It is common that a Montage refers to a different one not by its
proper name but by the name of a synonym production. Fig. 37 shows such a
situation. A Factor node will always have exactly one child. Therefore the Factor
node and its child could be merged. But it is important not to 'forget' that
there was a synonym node in between. If this information was lost, it would
not be possible to refer to Factor in Montage Term.
E.g. in a Term Montage there would probably be some action rule similar to:
value = Factor~1.value * Factor~2.value
which computes the result of the multiplication. Referring to Factor is the
only possible way, as we do not know at specification time what kind of Factor
we will encounter in the program text.
As we may not remove the synonym node information, we do not remove
the entire node from the AST. This is in contrast to the recommendations of
the original Montage specification [AKP97]. From an implementation point of
view, compacting the trees does not simplify matters: the gain in memory
is not worth the extra coding, and the additional indirection on accesses to
Num, Val or Expr is certainly less time consuming than querying the node for
information about its ancestors.
5.3.8 Montage
The most complex data structure in our framework is the class Montage. The
complexity can be explained by the versatile use of a Montage object. For
simplicity of implementation we do not distinguish between Template Montages
and Instance Montages but rather use the same data structure for both. The
overhead we get in terms of memory is not as bad as it may seem at first glance:
there are only a few class attributes that are exclusively used by the registration
and integration phases. Most items will be needed in static semantics analysis
and control flow composition as well. The methods needed to implement
Instance and Template Montages are basically the same, which would result in
practically the same implementation, i.e. the Instance Montage implementation
would be a subset of the Template Montage implementation. Not distinguishing
between the two therefore has the advantage that changes in the code have to
be done only in one class.
Fig. 38 shows the interface of the class Montage. Note that Montage is a
CFlowNode, which enables it to be a node in an AST. Furthermore, it inherits
the behaviour of a CFlowContainer; thus it is capable of managing subtrees of
CFlowNodes. All methods and attributes related to these abilities were already
implemented in Montage's superclasses.
The methods in Fig. 38 are grouped according to the underlying data
structures and their appearance in the graphical representation.
The first three methods are concerned with terminal synonym rule handling.
Synonym rules (objects of class Synonym) can be added to, retrieved from and
deleted from a Montage object; they are referenced by name. Adding and
deleting is usually done manually by the user when he edits a Montage during the
registration/adaptation phase.
The same is valid for the next group of methods, which is concerned with the
construction of the control flow. These methods allow the creation of Actions,
Nonterminals, Repetitions and Transitions (control flow edges). They are
convenience methods, as it would also be possible to call their respective
constructors directly and insert the objects 'manually' by calling the
corresponding tree handling methods. But using these convenience methods has
the advantage that nodes inserted this way are guaranteed to be correctly set up.
E.g. parents are first checked whether they can hold a new node, and it is
guaranteed that a transition always connects two valid nodes. Removal does not
need such consistency checks and therefore can be done by calling the tree
handling methods inherited from class
5.3 Data Structures for Dynamic Semantics of Specification 89
public class Montage extends NonTerminal {
    // data structure managing transitions
    protected I initial;
    protected T terminal;

    // Terminal Synonym Rules
    public void addSynonymRule(Synonym sr);
    public Synonym getSynonymRule(String name);
    public void removeSynonymRule(String name);

    // Editing the Control Flow Graph
    public Action newAction(String name, CFlowNode p);
    public NonTerminal newNonTerminal(String name, CFlowNode p, int cor);
    public Repetition newList(String name, CFlowNode p);
    public Repetition newOption(String name, CFlowNode p);
    public Repetition newRepetition(String name, CFlowNode p, int min, int max);
    public Transition newTransition(String label, CFlowNode from, CFlowNode to);
    public I setInitialTransition(CFlowNode node);
    public T setTerminalTransition(CFlowNode node);

    // Properties
    public void addProperty(Property p);
    public Property getProperty(String name);
    public void removeProperty(String name);

    // Actions
    public void addActionRule(Action node, Rule r);
    public void removeActionRule(Action node);

    // Registration phase
    public void setLanguage(Language newLanguage);
    public Language getLanguage();

    // Integration phase
    public void generateParser() throws StaticSemanticsException;

    // Parsing phase
    public void parse() throws ParseException;
}

Figure 38: Interface of class Montage
CFlowNode. Setting an initial and a terminal transition is done by marking a
node in the subtree as the target of the initial transition or the source of the
terminal transition of the Montage, respectively. Internally, the Montage will
allocate an I or T object that represents the corresponding edge, as described in
section 5.3.3.
Properties are modelled in a class of their own which is described in the next
section. They can be added to and removed from a Montage during editing
and (important for static semantics) they can be retrieved by their name.
The two methods concerned with actions allow an action to be added to or
removed from an Action node, respectively. Note that the interface Rule has
already been introduced in Fig. 33.
In the registration phase, when a Montage is associated with a language, the
setLanguage() method is used to set a reference from the Montage back to
the language. This is necessary so that e.g. the parser generator may have access
to the Token Manager that is stored with the language. getLanguage() is also
used during the static semantics phase to find other Montages of the same
language in order to access their properties.
Consistency checks and generation of a parser are done in the method
generateParser(). If it throws an exception, then an error occurred. StaticSemanticsException is the base class of various, more detailed exception classes that can
be thrown upon the many possible errors. Any errors are also reported to
System.err, which is the standard error stream in Java.
The method parse() can, of course, only be invoked after successful parser
generation. It can also throw various exceptions (among them NoParserAvailableException), which are subclasses of ParseException.
Static semantics analysis and control flow composition can be done without
any access methods in class Montage. Both phases will operate directly on
properties and AST/control flow graphs, respectively.
5.3.9 Properties and their Initialisation
Property Declaration. An MCS property is represented by an object that implements the Property interface given in Fig. 39.
Declaring Property as an interface has advantages over a declaration as a class.
We are not forced to implement it as a subclass and thus virtually any object
can be made a property by simply implementing its interface.
Being able to alter the name of a property is crucial, as only a change of
name allows us to adapt an imported Montage to the needs of its new language
environment. When importing a Montage into a new language, there will probably be naming conflicts. E.g. an initialisation rule may refer to a property named value. The imported Montage would feature a matching property, but its name may be val. Renaming val to value resolves this property reference.
interface Property {
    // user view of a property
    public void setName(String name);
    public String getName();
    public Class getType();
    public void setValue(Object value);
    public Object getValue();

    // building the dependency graph
    public void checkReferences(Language language) throws ReferenceException;
    public void resolveReferences(Montage montage) throws ReferenceException;

    // methods for topological sorting
    public boolean isReadyToFire();
    public void markReady(Property p);
    public Iterator dependent();

    // initialisation of the property
    public void initialize() throws StaticSemanticsException;
}

Figure 39: Declaration of interface Property
The type of a property cannot be set; this guarantees that, despite
renaming, properties will remain compatible. The type of a property is defined
by the implementation of the getType() method. Setting a new value has to
be done in accordance with the type of the property, i.e. it is the responsibility of the setValue() method that the stored value is of the same type as
returned by getType(). When reading a value, an object of the Java base class
java.lang.Object is returned. The receiver of this value may assume that it
is of the expected type.
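To make this contract concrete, the following is a minimal sketch of a property implementation. The interface is cut down to the user-view methods of Fig. 39; everything beyond those method names (the names SimpleProperty and IntProperty, the use of null in the role of undef) is an illustrative assumption, not MCS code:

```java
// Illustrative sketch: a property whose type is fixed by the implementation
// and whose setValue() enforces the getType() contract. Names other than
// the Fig. 39 methods are assumptions for this example.
interface SimpleProperty {
    void setName(String name);
    String getName();
    Class<?> getType();
    void setValue(Object value);
    Object getValue();
}

class IntProperty implements SimpleProperty {
    private String name;   // renamable, to resolve conflicts on import
    private Object value;  // null plays the role of "undef"

    IntProperty(String name) { this.name = name; }

    public void setName(String name) { this.name = name; }
    public String getName() { return name; }

    // The type is defined by the implementation; it is not settable.
    public Class<?> getType() { return Integer.class; }

    // setValue() is responsible for keeping the stored value consistent
    // with the type returned by getType().
    public void setValue(Object v) {
        if (v != null && !getType().isInstance(v))
            throw new IllegalArgumentException("expected " + getType());
        this.value = v;
    }
    public Object getValue() { return value; }
}
```

Renaming such a property (e.g. val to value, as in the import scenario above) changes only its name; the type and the stored value are untouched, which is precisely why renaming keeps properties compatible.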
During the integration phase, the method checkReferences() will be
called for all properties of the Template Montages. It is responsible for finding all
properties that are referred to by the initialisation rule. The argument language
provides access to the other Montages of the specified language. If a
property cannot be found, an exception will be thrown.
The counterpart of checkReferences() in the static semantics phase is
the method resolveReferences(). It is called for the properties of Instance
Montages. It will resolve the property references in the initialisation rules by
finding the target properties in the AST. In order to do so, it needs access to the
current Instance Montage, which is given by the argument montage.

Figure 40: Dependencies among properties. A solid arrow A → B means A refers to B; a dashed arrow indicates the dataflow.

The property is registered with the target property. By doing so, a list of dependent
properties is built up in each target property. These reversed references can
also be seen as dataflow arrows, as the computed values will flow along these
references. Fig. 40 illustrates these two dependencies among properties. The
solid arrows indicate references between properties, whereas the dashed arrows
indicate the dataflow, i.e. initialisation of properties is done along the dashed
arrows.
For determining the firing order of the initialisation rules, some helper methods
are needed. isReadyToFire() will indicate whether all referred properties are
available, i.e. whether their values have been computed (are not undef). When, during the firing of the initialisation rules, a referred property becomes available, this will be
notified through the markReady() method. Its argument tells which property
has become available. In order to traverse the dependency graph of the properties, it is important to have access to the dependent properties. This is
granted by the java.util.Iterator object that is returned by the
dependent() method. Note that internally, each property will probably have a list of
references in order to process the resolveReferences() method efficiently.
Finally, initialize() invokes the initialisation rule. It is completely up to
the implementation how this rule is processed. The only requirement is that
the value will be initialized with an object of the expected type (i.e. Java class).
Firing the Initialisation Rules. The concept of firing the initialisation rules of
all Montages has been explained in section 4.6.1, and now we want to describe
the announced algorithm. Topological sorting can be done in O(|P| + |R|)
[Wei99], with P being the set of all properties and R being the set of all references among them. Resolving references and initializing properties can be done
with the same algorithm. Note that O(|P| + |R|) is the runtime complexity of
void topsort(Montage montage)
        throws StaticSemanticsException {
    Stack s = new Stack();
    Set toProcess = new TreeSet();
    Property p, q;
    for each property p {
        toProcess.add(p);
        p.resolveReferences(montage);
        if (p.isReadyToFire()) {
            s.push(p);
        }
    }
    while (!s.empty()) {
        p = s.pop();
        toProcess.remove(p);
        p.initialize();
        for each q adjacent to p {
            q.markReady(p);
            if (q.isReadyToFire()) {
                s.push(q);
            }
        }
    }
    if (toProcess.size() > 0) {
        throw new CycleFound(toProcess);
    }
}

Figure 41: Pseudocode to perform initialisation of properties
the algorithm given in Fig. 41. It does not consider any additional runtime
effort of the initialisation rules. As we cannot influence the user-defined initialisation rules, an efficient determination of the firing order is all the more important.
The algorithm defines two temporary stores, a stack and a set. The stack will
contain all properties which are ready to fire, and in the set we store all properties that still have to be processed. First, the algorithm iterates over all properties and adds them to the set of unprocessed properties. Each of them will have
to resolve its references. The properties that are ready to fire in the first place
(e.g. because they contain constants) will be stored on the stack.
The following while loop pops a property from the stack (and removes it
from the set of unprocessed properties as well). Then its initialisation rule is
called. As the value of the property is available thereafter, we can notify all
dependent properties of this fact. If during this notification such a property
reports to have all data available now, it is pushed on the stack.
In the end, we check if all properties were processed. If not, an exception will
be thrown, containing all the unprocessed properties. This information will
help to locate the circular references.
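The algorithm of Fig. 41 can be sketched as self-contained, runnable Java. The Property machinery is replaced here by plain maps over property names (an illustrative simplification, not MCS code): refs maps each property to the properties its initialisation rule refers to, and the reversed references play the role of the dependent lists:

```java
import java.util.*;

// Illustrative, self-contained version of the firing algorithm of Fig. 41.
// Properties are reduced to names; "initializing" a property simply appends
// it to the resulting firing order.
class FiringOrder {
    static List<String> topsort(Map<String, List<String>> refs) {
        Map<String, Integer> missing = new HashMap<>();      // unresolved inputs
        Map<String, List<String>> dependents = new HashMap<>();
        Deque<String> ready = new ArrayDeque<>();            // "stack" of Fig. 41
        Set<String> toProcess = new TreeSet<>(refs.keySet());

        for (String p : refs.keySet()) {
            missing.put(p, refs.get(p).size());
            for (String target : refs.get(p))                // reversed reference
                dependents.computeIfAbsent(target, k -> new ArrayList<>()).add(p);
            if (refs.get(p).isEmpty()) ready.push(p);        // constants fire first
        }
        List<String> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            String p = ready.pop();
            toProcess.remove(p);
            order.add(p);                                    // "initialize()"
            for (String q : dependents.getOrDefault(p, List.of())) {
                missing.merge(q, -1, Integer::sum);          // markReady(p)
                if (missing.get(q) == 0) ready.push(q);      // isReadyToFire()
            }
        }
        if (!toProcess.isEmpty())                            // circular references
            throw new IllegalStateException("cycle: " + toProcess);
        return order;
    }
}
```

For refs = {c → [a, b], a → [], b → [a]} the method yields an order in which a fires before b and b before c; a cyclic map raises the exception, mirroring CycleFound, and its message carries the unprocessed properties that locate the cycle.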
5.3.10 Symbol Table Implementation
Insertions to the symbol table must not destroy the contained information. As
an example we have seen two identifiers with the same name but in different
scopes (Fig. 22, p. 64). The insertion of the inner symbol i may only shadow
the outer declaration for the subtree rooted at node 4 but not for the rest of the
nodes in the AST.
This can be achieved by several implementations. The most unimaginative
one would be to copy the data such that we have a symbol table for each node.
This would be a considerable - if not unrealizable - memory overhead.
A complex but more memory efficient approach is the organisation of the
symbol table as a binary tree structure. Consider, for example, the search tree in
Fig. 42a, which represents the symbol table entries (e.g. type name and associated declaration node):

Truck → 1, Bike → 2, Car → 3
We can add the entry Bus → 5, creating a new symbol table rooted at node
Bike in Fig. 42b without destroying the old one. If we add a new node at depth
d of the tree, we must create d new nodes - better than copying the whole tree.
Using the Java class java.util.TreeMap, which is based on a Red-Black tree
implementation, we can guarantee that d < log(n), where n is the number of
entries in the last symbol table. This implementation is described in more
detail in [App97].
Figure 42: Binary search trees implementing a symbol table. (a) The tree for Truck → 1, Bike → 2, Car → 3; (b) the new tree after adding Bus → 5, sharing the unchanged nodes with (a).
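Path copying in such a persistent search tree can be sketched as follows. This is an illustrative, unbalanced BST, not the thesis implementation; to obtain the logarithmic depth bound one would balance the tree, e.g. with the Red-Black scheme of [App97]:

```java
// Illustrative sketch of a persistent (path-copying) binary search tree:
// insertion creates d new nodes along the search path and shares the rest,
// so the old symbol table remains valid for the outer scope.
final class PTree {
    final String key; final int decl;      // e.g. type name -> declaration node
    final PTree left, right;
    PTree(String k, int d, PTree l, PTree r) { key = k; decl = d; left = l; right = r; }

    static PTree insert(PTree t, String k, int d) {
        if (t == null) return new PTree(k, d, null, null);
        int c = k.compareTo(t.key);
        if (c < 0) return new PTree(t.key, t.decl, insert(t.left, k, d), t.right);
        if (c > 0) return new PTree(t.key, t.decl, t.left, insert(t.right, k, d));
        return new PTree(k, d, t.left, t.right);   // shadow the old binding
    }
    static Integer lookup(PTree t, String k) {
        if (t == null) return null;                // not declared in this scope
        int c = k.compareTo(t.key);
        return c == 0 ? t.decl : lookup(c < 0 ? t.left : t.right, k);
    }
}
```

Inserting Bus → 5 into the table of Fig. 42a creates only the nodes on the search path; lookups through the old root still see the original table, which is exactly the shadowing behaviour required for nested scopes.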
Chapter 6
Related Work
Programming language processing is one of the oldest disciplines in computer
science, and therefore a wide variety of different approaches and systems has
been proposed or implemented. In this chapter we present a selection of them
and compare them to our system. The first three sections cover closely related
projects, which can be seen as competitors to our Montage Component System. Section 6.4 comprises the traditional compiler construction approaches towards language specification. Section 6.5 briefly compares three different
component models that are in widespread use and explains their pros and cons
for our project. Finally, section 6.6 concludes this chapter with some remarks
on two projects in our institute that influenced design decisions and the way
MCS evolved.
6.1 Gem-Mex and XASM
The system related closest to MCS is Gem-Mex, the Montage tool companion.
Montages are specified graphically using the Gem editor [Anl] (a pleonasm, as
Gem stands for Graphical Editor for Montages). Mex [Anl] (Montage Executable Generator) then transforms the Montages specifications into XASM
[Anl00] code. XASM again is transformed into C code, which in turn can be
compiled to an executable format. All specifications in the Montages are given in terms of XASM rules. As already mentioned above, ASMs are the underlying model of computation in Montages.
Gem is a simple editor with hardly any knowledge about the edited entities.
All textual input remains unchecked, whereas very limited integrity checks are
performed when editing in the graphical section occurs. E.g. there is only one
static semantics frame allowed. A Montage specification is stored in an intermediate textual format for further processing. Mex simply transforms the different entities of the intermediate format into an ASM representation.
Separating the editor from processing has the advantage that the tools are
largely independent of each other. Changes in one tool will affect the others
only if the intermediate representation has to be adapted to reflect the change.
On the other hand, the compiler has to reconstruct much of the information
that was originally available in the editor, e.g. the nesting and connecting of
non-terminals, lists and actions.
The centrepiece of this system is the XASM (Extensible ASM) environment.
XASM offers an improved module concept over the ASM standard [Gur94,
Gur97]. The macro concept known in pure ASMs turns out not to be very useful when building larger systems: this simple text replacement mechanism
does not provide encapsulation, information hiding or namespaces,
properties essential to reuse and federated program development.
The component concept of XASM addresses this issue. Component-basedness is achieved through «calling conventions» between ASMs (either as a sub-ASM or as a function, refer to [Anl00] for further details) and through
extended declarations in the header of each component-ASM. These declarations do not only announce the import relations but also contain a list of
the functions that are expected to be accessible from an importing component.
XASM components may be stored in a library in compiled form.
How does MCS distinguish itself from Gem-Mex/XASM?
• MCS is based on Java as a specification language instead of ASMs. Expressiveness and connectivity are therefore bound to Java's and JavaBeans' possibilities. The advantage of this approach is that the full power of the existing
Java libraries is available from the beginning. In an ASM environment, there
are only very few third-party libraries available. To circumvent this, one has
to interface with C libraries. By doing so, the advantages of ASMs over Java
(simplicity of the model of computation, implicit parallelism, formal background) are given up.
• Components are partially reconfigurable at run time. Reconfiguration of
XASM components can only be achieved through recompilation.
• The Gem-Mex system uses Lex & Yacc to generate a parser for a specified
language. It implements the horizontal partitioning model (see
subsection 3.1.1 on p. 24). XASM components can therefore not be
deployed on the level of Montages. They can, however, be called in the static
or dynamic semantics in terms of library modules.
• Gem-Mex has a fixed built-in traversal mechanism of the abstract syntax
tree. This may force the user to implement artificial passes. MCS on the
other hand uses a topological sort on the dependency graph of the attributes
in a language specification. The user does not have to care about the order of
execution.
6.2 Vanilla
The Vanilla language framework [DNW+00] supports programming language construction on the basis of components, as our MCS does. Vanilla's aim is
very similar to ours, namely to support re-use of existing components in language design. The Vanilla team identified the same shortcomings in traditional
compiler design and language frameworks as we did in chapters 1 and 3. Not
surprisingly, their motivation is almost identical to ours. Their interests also
focus on domain specific languages, but no detailed papers on this topic could
be found. The Vanilla framework is implemented in Java and thus uses Java's
component model.
In Vanilla, the entity of development is called a pod. A pod corresponds roughly to a Montage. It specifies e.g. type checking features, run-time behaviour and I/O. Component interaction in the Vanilla framework occurs on the
level of pods. Pods in turn are built of a group of objects interacting on the method
call level.
The Vanilla team implemented some simple languages (Pascal, 0-2) and
gained some interesting experience from these test cases. They describe that the
degree of orthogonality between individual programming constructs was surprisingly high. They expected considerable overlap between components but
discovered that in reality there are remarkably few constraints on how language features may be combined. This finding was very encouraging to us, as it
marked a first (independent) confirmation of our own impression of language composition. If there are so many similarities, the following question arises:
How does MCS distinguish itself from Vanilla?
Vanilla is basically a collection of Java libraries that facilitate the generation of
interpreter components. There is no graphical user interface nor any model
behind the language specification. Specifying a new pod is merely done by programming the desired behaviour in some Java classes. These classes may be
inherited from some Vanilla base classes, or they must follow a certain syntax in
order to be processed by some Vanilla tools (e.g. the parser generator). The
type-checking library contains some sophisticated classes that support free variables, substitution, sub-typing etc. We therefore think that Vanilla is only
suited for compiler construction professionals. Although re-use is encouraged
to a high extent, one still must have a wide knowledge of type-checking techniques (as an example) to successfully make use of the library pods. The fact
that there is no preferred model behind the Vanilla pods adds to the amount of
knowledge and experience a user should have. This freedom will probably ask
too much of a programmer untrained in the field of language design and implementation. In Montages, users have to follow a certain scheme (e.g. use control
flow charts), but the intention is to simplify the model of Montages, and thus
to simplify its use.
6.3 Intentional Programming
The pivot in modern compiler architecture is the intermediate representation.
It is generated by the front-end, it is language independent and serves as input
for the target code-generating back-end. If well designed, the intermediate representation is one of the slowest evolving parts in compiler suites. So why not
use the intermediate representation for programming instead of struggling
with the pitfalls of concrete syntax? This is the core idea of Intentional Programming (IP¹) [Sim96, Sim99], a project at Microsoft Research.
IP tries to unify programming languages on a level that is common to all of
them: the intention behind language constructs. A loop can be written in many
different ways using many different programming languages, e.g. in terms of a
for or while statement in C++, a LOOP statement in Oberon or a combination of labels and GOTOs in Basic. They all share the same intention, namely to
repeatedly execute a certain part of a program. In IP, a loop could still be represented in several different ways, but with no concrete syntax attached. Such an
abstraction can be manipulated by the programmer directly by changing the
abstract syntax tree.
Charles Simonyi, project leader of IP, summarizes his vision of programming
in the future as an "Ecology of Abstractions" [Sim96]. Abstractions are the information carriers of the evolving ideas, comparable with the genes in biology.
While the individuals of the biological ecology serve as "survival machines" for
genes, programs and programming components are the carriers of abstractions.
Programming will be the force behind this ecology. Market success, reusability,
efficiency, simplicity, etc. will be some of the criteria for the selection process of
the "survival of the fittest".
¹ Unfortunately, IP is becoming more and more an overloaded term. We are aware that in computer
science IP will usually be associated with "Internet Protocol" or sometimes with "Intellectual
Property". However, we are using IP here the same way as it is also used in the referenced papers.
Concrete syntax in this environment can be compared to the personal settings of the desktop environment of a workstation. Each programmer working with IP will define his own preferences for the transformation of abstractions
into a human readable form. IP claims to solve many problems with legacy code as well. There exist parsers for the most important (legacy) programming
languages, which transform the source code into IP abstractions (basically an
AST). Once this transformation is applied, the program can be manipulated, extended, debugged etc. completely within the IP system. There will be no
need to manipulate such a program with text editors any more. If required, a
textual representation can be generated for any programming language.
How does MCS distinguish itself from Intentional Programming?
Code generation in IP is done by a conventional compiler back-end. This
means that abstractions have to be transformed into a suitable representation
called "R-code". This reduction (as the transformation is referred to) is given in
terms of IP itself. The MCS system uses Java in the place of R-code; the reduction (static semantics rules in MCS) is given in terms of Java. Boot-strapping
the MCS system using IP would be possible, whereas the R-code centred architecture of IP would prevent a boot-strapping of IP using MCS.
The intermediate language plays a less important role in MCS. The system
would also be operational if some Montages were specified using Java and some
using COBOL. This is because interoperability is specified on a higher level of
abstraction, namely on the component level instead of the machine code level.
The architecture of MCS does not suppress a multi-language, multi-paradigm approach. The only requirement is that Montage components can communicate with each other using some component system (see also section 6.5).
Programmers of legacy languages could still use their knowledge to specify
Montages. MCS is a tool allowing a smooth shift from language based coding
to IP. IP is an option in MCS (although an important one) and not a handicap.
In contrast, the IP approach means that, when existing legacy programs have
to be embedded into the IP framework, the legacy language has to be specified
first, and then a legacy parser has to be adapted to produce an appropriate IP
AST instead of its conventional AST. In other words, there has to be a converter for each legacy language of which we want to reuse code. This approach
invalidates much of the legacy programmer's knowledge. After the conversion
to IP, his former language skills are no longer needed. Even worse, he may not
be well educated enough to cope with the paradigm shift from concrete language based programming to the more abstract tree based programming. The
acceptance of the IP approach will thus be limited to organisations that can
afford the radical paradigm shift.
6.4 Compiler-Construction Tools
Although compiler construction is well understood and widely practised, there
are surprisingly few tools and systems that are in common use. Two of the most
popular ones are Lex [Les75] and Yacc [Joh75]. They were constantly
improved, and their names stand as a synonym for front-end generators.
Many other tools never attracted such a large community. This is mainly due
to the steep learning curve associated with the large number of different syntaxes, conventions and options to control them. The following subsections
therefore do not present isolated tools, but compiler construction systems,
which (more or less) smoothly integrate the tools to obtain better interoperability.
6.4.1 Lex & Yacc
Hardly any text on programming language processing can ignore Lex and Yacc.
Their success is based on the coupling of the two tools — Lex was designed to
produce lexical analysers that could be used with Yacc. Although they were not
the first tools of their kind², bundling them with Unix System V certainly
helped to make them known to many programmers. The derivative JLex is
based upon the Lex analyser generator model. It takes a specification similar to
that accepted by Lex and creates a Java source file for the corresponding lexical
analyser.
MCS' lexical analyser is in fact a subset of JLex [Ber97]. Instead of generating Java code, it builds a finite automaton in memory and uses it for scanning
the input stream. The regular expressions accepted for specifying tokens are
identical to those of Lex.
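The idea of driving the scanner from an in-memory automaton rather than from generated code can be sketched with a tiny hand-built DFA. The token set (IDENT, NUMBER) and all names are illustrative assumptions, not the MCS scanner, which constructs its automaton from Lex-style regular expressions instead:

```java
import java.util.*;

// Illustrative in-memory scanner: a hand-built DFA over letters and digits,
// recognising IDENT and NUMBER tokens with maximal-munch scanning.
class TinyScanner {
    // states: 0 = start, 1 = in identifier, 2 = in number, -1 = dead
    static int step(int state, char c) {
        boolean letter = Character.isLetter(c), digit = Character.isDigit(c);
        switch (state) {
            case 0:  return letter ? 1 : digit ? 2 : -1;
            case 1:  return (letter || digit) ? 1 : -1;
            case 2:  return digit ? 2 : -1;
            default: return -1;
        }
    }
    // token type of an accepting state, or null
    static String accept(int state) {
        return state == 1 ? "IDENT" : state == 2 ? "NUMBER" : null;
    }

    static List<String> scan(String input) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            if (Character.isWhitespace(input.charAt(pos))) { pos++; continue; }
            int state = 0, lastAccept = -1, lastEnd = pos;
            // run the automaton as far as possible, remembering the last accept
            for (int i = pos; i < input.length() && state != -1; i++) {
                state = step(state, input.charAt(i));
                if (accept(state) != null) { lastAccept = state; lastEnd = i + 1; }
            }
            if (lastAccept == -1)
                throw new IllegalArgumentException("bad character at " + pos);
            tokens.add(accept(lastAccept) + "(" + input.substring(pos, lastEnd) + ")");
            pos = lastEnd;
        }
        return tokens;
    }
}
```

Swapping the token set then means building a different transition table, not regenerating and recompiling scanner source code, which is what makes the in-memory approach attractive for a system that composes languages at run time.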
Yacc is a LALR [ASU86] parser generator. It is fed with a parser specification
file and creates a C program (y.tab.c). This file represents an LALR parser plus
some additional C functions that the user may have added to the specification.
These functions may support e.g. tree building or execute simple actions when
a certain production has been recognized. The input file format of both Lex
and Yacc has basically the following form:
declarations
%%
translation rules
%%
auxiliary C functions
² As the name of Yacc (Yet another compiler compiler) implies.
Declarations are either plain C declarations (e.g. temporary variables) or specifications of tokens (Yacc) and regular definitions (Lex), respectively. The
declared items may be used in the subsequent parts of the specification.
The translation rules basically associate a regular expression or production
rule with some action to be executed as soon as the expression is scanned or the
production has been parsed, respectively. Actions are also given in terms of C
code.
The third part contains additional C functions to be called from the actions.
In a Yacc specification, the third part will also contain an #include directive
which links Lex's tokens to Yacc's parser.
Lex and Yacc generate monolithic parsers, i.e. they support horizontal modularization. Their support for compiler construction is limited to the first two
phases (lexical and syntax analysis, see Fig. 1, p. 2). Semantic analysis and code
generation have to be coded by hand. Providing semantics in the actions of the
rules, however, is only feasible for simple languages without complex type systems. The generation of source code binds the compiler implementor to C, and
even more to Unix, where these tools are available on almost every machine. On
the other hand, the problem of a steep learning curve is somewhat weakened by this
approach. There are only two new notations to be learned: regular expressions and BNF-style productions. Both are so fundamental to compiler construction and computer science education in general that they can hardly be
accounted for steepening the learning curve.
6.4.2 Java CC
The Java community soon started to develop its own compiler construction
tools. JLex [Ber97], JFlex [Kle99] and CUP [Hud96] are Java counterparts to
Lex, Flex and Yacc, respectively. At Sun Labs a group of developers followed a
different approach called Java CC³.
The Java CC utility follows a contrary approach: instead of decorating the
specification with C or Java code, Java is extended with some constructs to
declare tokens. The syntax is not specified in terms of BNF or EBNF rules, but
Java CC uses the new method modifier production to mark a Java method
as a production. Java CC is merely a preprocessor that transforms this extended
Java syntax into pure Java. Information about the lexical analyser and the syntax is extracted from the Java CC input file and distilled into an additional
method that starts and controls the parsing process.
³ This group of developers founded Metamata, a company specializing in debugging and productivity tools around Java. Their freeware tool Metamata Parse is the successor of Java CC.
Java CC will generate top-down LL(k) parsers; LALR grammars have to be
transformed first. The top-down approach simplifies the parsing algorithm in
such a manner that the non-terminals on the right hand side of EBNF rules
represent calls to corresponding methods. Deciding on which rule to follow in
the case of a synonym rule does not have to be implemented by the user since
Java CC will add the necessary code.
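The correspondence between right-hand-side non-terminals and method calls can be illustrated with a small hand-written recursive-descent parser (illustrative code, not Java CC output) for the rule Sum ::= Number ("+" Number)*:

```java
// Illustrative recursive-descent parser for Sum ::= Number ("+" Number)*.
// Each non-terminal becomes a method; a non-terminal on a right hand side
// is simply a call to the corresponding method.
class SumParser {
    private final String input;
    private int pos = 0;
    SumParser(String input) { this.input = input; }

    // Sum ::= Number ("+" Number)*
    int sum() {
        int value = number();            // call for the non-terminal Number
        while (pos < input.length() && input.charAt(pos) == '+') {
            pos++;                       // consume "+"
            value += number();
        }
        return value;
    }

    // Number ::= digit+
    int number() {
        int start = pos;
        while (pos < input.length() && Character.isDigit(input.charAt(pos))) pos++;
        if (start == pos) throw new IllegalArgumentException("digit expected at " + pos);
        return Integer.parseInt(input.substring(start, pos));
    }
}
```

In a generated parser the structure is the same; the generator additionally inserts the lookahead code that chooses between alternative rules, which is the part the user is spared from writing.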
Java CC's support of compiler construction is also limited to the scanning
and parsing phase. The philosophy behind this tool differs from Lex/Yacc in
the way specifications are given. Java CC directly exploits the skills and experience of Java programmers. The learning curve is minimal, as the user is confronted with even fewer new constructs than in Lex/Yacc. The notation for
regular expressions keeps very much to Java's syntax, and the top down
approach in parsing is easier to understand for a compiler novice than the table
driven LALR systems. The obviously strong ties to Java restrict the deployment
of Java CC to Java environments. As virtual machines exist for any combination of major hardware platforms and operating systems, this is only a restriction in terms of usability of the generated code.
6.4.3 Cocktail
In [GE90] Grosch and Emmelmann present «A Tool Box for Compiler Construction», which is also known as «Cocktail» [Gro94]. It contains tools for
most of the important compiler phases, including support for attribute grammars and back end generation. Cocktail is a collection of largely independent
tools, resulting in a large degree of freedom in compiler design.
Each of these tools features a specification language which may contain additional target language code (Modula-2 or C). The implementors are aware of
the fact that such target code will make it impossible for the tools to perform
certain consistency checks (e.g. Ag - the attribute evaluator generator - cannot
guarantee that attribute evaluations are side-effect free). Nevertheless, they
argue that the advantages outweigh this disadvantage: practical usability, e.g.
interfacing with other tools, and a flatter learning curve, as e.g. conditions and
actions can be provided in a standard language.
6.4.4 Eli
The Eli system [GHL+92] - as Cocktail - is basically a suite of compiler construction tools. There is an important difference, however: Eli provides a
smooth integration of these tools. Integration is achieved by an expert system,
called Odin [CO90], helping the user to cope with all these tools. One does
not have to care about matching the output of one tool to the input of another.
Thus, tools developed by different people with different conventions can be
combined into an integrated system. This integration works also if the tools are
only available in an executable format.
To add a new tool to the Eli system, only the knowledge base of the expert
system has to be changed. The knowledge base manages tool dependencies, data transformations between tools and complex user requests. Dependencies are represented by a derivation graph. A node in this graph represents a manufacturing step: the process of applying a particular tool to a particular set of
inputs and creating a set of outputs. A normal user of Eli does not have to deal
with the knowledge base; it will be updated by the programmers of the tools
when they add or remove them.
The major goal of the Eli project is to reduce the cost of producing compilers. The preferred way to construct a compiler is to decompose this task into a series of subproblems. Then, to each of these subproblems a specialized tool is applied. Each of these tools operates on its own specialized language.
Eli uses declarative specifications instead of algorithmic code. The user has to describe the nature of a problem instead of giving a solution method. The application of solutions to the problem is performed by the tool. The aim is to relieve the user as much as possible from the burden of dealing with the tools; he should be able to concentrate on the specification of the compiler.
Eli's answer to the steep learning curve is somewhat ambivalent. On the one hand, the expert system relieves the user from fiddling around with formats, options and conventions. But on the other hand, mastering Eli also means mastering many specialized descriptive languages. And what seems very convincing at first glance may be a major hurdle to the use of this tool suite: many programmers are educated in operational languages only and therefore have difficulties in mastering the new paradigms associated with declarative specifications.
6.4.5 Depot 4
Depot4 [Lam98] is a system for language translation, i.e. an input language is translated into an output language. There are no restrictions on these languages other than that they must be representable as object streams. This idea was influenced by the Oberon system, the original implementation platform for Depot4 (it was later ported to Java due to the better availability of Java Virtual Machine platforms). Texts in Oberon can easily be extended to object streams, thus allowing Depot4 to act as a fairly general stream translation/conversion system. However, Depot4 is designed as an application generator rather than a traditional compiler. This means that programming in the large (i.e. assembling different modules and specifying operations between them by providing a DSL) is the preferred use. Although not impossible, machine instruction generation would be a hard task, as supporting mechanisms and tools are missing.
EBNF rules can be annotated with operations given in a metalanguage (M14). Nonterminals on the right-hand side of each EBNF rule are treated as procedure calls to their corresponding rules. This approach implies a predictive parser algorithm; of course, grammars have to be free of left recursion. M14 provides the programmer with some predefined variables which keep track of the repetition count, whether an option exists, or which alternative was chosen in a synonym rule.
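The idea of treating nonterminals as procedure calls can be illustrated by a minimal sketch, assuming a toy grammar (this is our own illustration, not Depot4 or M14 code; all names are invented). Each EBNF rule becomes a method, each nonterminal on a right-hand side becomes a call to the corresponding method, and an EBNF repetition maps to a loop:

```java
// Sketch of a predictive, recursive-descent parser for the toy grammar
//   Add  ::= Term { AddOp Term }.
//   Term ::= digit.
// Each rule is one method; nonterminals become procedure calls.
public class PredictiveParser {
    private final String input;
    private int pos = 0;

    PredictiveParser(String input) { this.input = input; }

    // Add ::= Term { AddOp Term }.  The EBNF repetition {...} maps to a loop.
    int parseAdd() {
        int value = parseTerm();
        while (pos < input.length() && (peek() == '+' || peek() == '-')) {
            char op = input.charAt(pos++);   // AddOp
            int rhs = parseTerm();           // nonterminal -> procedure call
            value = (op == '+') ? value + rhs : value - rhs;
        }
        return value;
    }

    // Term ::= digit.  A real parser would handle numbers, parentheses, ...
    int parseTerm() {
        char c = input.charAt(pos++);
        if (c < '0' || c > '9') throw new IllegalArgumentException("digit expected");
        return c - '0';
    }

    private char peek() { return input.charAt(pos); }

    public static void main(String[] args) {
        System.out.println(new PredictiveParser("7+2-3").parseAdd()); // prints 6
    }
}
```

The loop in parseAdd also shows why such grammars must be free of left recursion: a rule like Add ::= Add AddOp Term would make parseAdd call itself before consuming any input.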
Like MCS, Depot4 addresses the (occasional) implementor of DSLs who does not have the extensive experience of a compiler constructor. It aims to support fast and easy creation of language translators, without trying to compete with full-blown compiler construction suites.
Depot4's similarities to MCS:
• EBNF is used for syntax specification.
• The system's parser is vertically partitioned; the concept of modularization of specifications is the same.
• Language specifications are precompiled and loaded dynamically on demand.
Depot4 does not support:
• symbol table management
• semantic analysis
• intermediate code or machine instruction generation
6.4.6 Sprint
Sprint is a methodology [CM98, TC97] for designing and implementing DSLs, developed in the Compose project at IRISA, Rennes. Sprint's formal framework is based on denotational semantics. In contrast to the approaches presented above, it does not so much feature specialized tool support, but rather sketches how to approach the development of a DSL. Following the Sprint methodology, DSL development undergoes several phases:
Language analysis: Given a problem family, the first step includes an analysis of the commonalities and variations of the intended language. This analysis should identify and describe the objects and operations needed to express solutions to the problem family.
Interface definitions: In the next phase, the design elements of the language are refined, the syntax is defined and an informal semantics is developed. This semantics relates syntax elements to the objects and operations identified in the previous step. Another aspect of this phase is to form the signatures of the semantic algebras by formalising the domain of objects and the types of operations.
Staged semantics: During this phase, static semantics has to be separated from dynamic semantics.
Formal definition: The semantics of the syntactic constructs is defined in terms of valuation functions. They describe how the operations of the semantic algebras are combined.
Abstract machine: The dynamic semantic algebras are then grouped to form an abstract machine which models the behaviour of the DSL. Denotational semantics provides an interpretation of the DSL in terms of this abstract machine.
Implementation: The abstract machine is then implemented, typically by using libraries. The valuation function can either be implemented as an interpreter running on the abstract machine or as a compiler generating abstract machine instructions.
Partial evaluation: To automatically transform a DSL program into a compiled version (given an interpreter), partial evaluators can be applied.
The Sprint framework does not need specific tool support, as it relies on proven techniques. As the above phases would also apply to a general purpose language, tools supporting these can be employed. Techniques to derive implementations from definitions in denotational semantics can be adapted to the Sprint approach. For the implementation, standard software libraries and available partial evaluators are used.
This form of reuse helps to speed up the development of a new DSL considerably. In contrast to our approach, reuse is undertaken from a global view, i.e. reuse is employed only after all the parts of the DSL are specified (in the implementation step). At this late point, reuse involves an implicit analysis phase, as the implementor first has to find appropriate libraries or implementations of similar abstract machines. For in-house development, this approach will meet its expectations, whereas it would be difficult to share implementations between different organisations. Thus, it would hardly be suitable for an open development environment as discussed in chapter 2. This is due to the fact that the formal definition does not describe the behaviour of what is reused later on (the software library). In our approach, the entity of reuse is also the entity of specification. This increases the confidence of a client of a Montage in its semantics, whereas there is no such direct link between the semantics of a valuation function and a library function, for example.
6.5 Component Systems
After the success of object-oriented software development in the 1980s, the advent of component systems began in the 1990s. Among the many component systems proposed, only a few succeeded in software markets. As we have pointed out in chapter 2, components vitally rely on a market to be successful. Thus we focus on the three most widespread component systems. We will give a very short overview of each in order to be able to discuss it with respect to MCS. Detailed comparisons of the three systems can be found in [GJ97, Szy97]. All three systems provide a kind of 'wiring standard' in that they standardize how components interact with each other. Each has its own background and market.
6.5.1 CORBA
Overview. CORBA (Common Object Request Broker Architecture) was developed by the Object Management Group (OMG), a large consortium with more than 700 member companies. It mainly focused on enterprise computing. This background is also reflected in the architecture of the system: it is the most inefficient and complex approach in our comparison. But being independent from a large software house (in contrast to Sun's JavaBeans or Microsoft's COM) also has advantages. To name two: (i) a wide variety of hardware and software platforms is supported and (ii) interface definitions are more stable, as many partners have to agree on a change of standards.
CORBA was developed to provide interaction between software written in different languages and running on different systems. Its architecture is shown in figure 43. The basic idea was to provide a common interface description language (IDL) which allows one to specify the interface that a piece of software provides. Compilers for this IDL generate object adapters (called stubs) that convert data (in this case the identifiers, parameters and return values of procedure calls) to be understood by an object request broker (ORB). It is this broker's responsibility to redirect invocation requests to corresponding objects (which provide methods that can process the requests). So basically an ORB can be seen as a centralised method invocation service. As such it can provide additional services like an object trader service, an object transaction service, an event notification service, licensing services and many more (standardised by the Object Management Architecture (OMA)).
Figure 43: CORBA architecture and services model (CORBA facilities not shown)
CORBA as a platform for MCS. The central role of an ORB gives the user the most explicit view of the communication network of a system. Specific Montage services could be implemented as OMA services, thus simplifying the architecture of MCS. Examples are: the object trader service, which offers search facilities for services, could be extended to find suitable Montages. Additional services could be implemented, like parsing and scanning services; they could provide specialised parsing techniques (LL, LALR). A table management service could implement centralized table management for a language.
CORBA was not chosen for a prototype implementation because of its complexity. Setting up and running ORBs is non-trivial. Additional services often undergo tedious standardisation processes. We think that the MCS architecture is too fine-grained to be efficiently used in a CORBA environment. On the one hand, MCS builds a dense web of many simple components with only limited functionality (e.g. the parse tree nodes or the Montage properties and their firing rules). On the other hand, CORBA was designed to provide distributed services over networks, and thus network delays may slow down the system considerably. However, the independence of implementation and platform is a major advantage of CORBA. If MCS is commercially deployed, it would be worth watching the progress of CORBA. The standard Java libraries provide CORBA supporting classes. With improved hard- and software implementations of CORBA, it might be interesting to provide interfaces to this component system.
6.5.2 COM
Overview. COM (Component Object Model) is Microsoft's standard for interfacing software. It is a binary standard, which means it does not support portability (although a Macintosh port exists, which emulates the Intel conventions on Motorola's PowerPC). With the success of the Windows OS and the wide variety of software available, the need for inter-application communication increased. The main goal was to provide a system that allows applications written in different languages to communicate efficiently with each other. A binary standard for interfaces was established. COM does not specify what an object or a component is. It merely defines an interface that a piece of software might support. COM allows a component to have several interfaces, and an interface may be shared by different components. Interfaces are represented by COM objects. Fig. 44 shows a COM object Pets, featuring the interfaces IUnknown, IDog and ICat (this is just a simple example). The IUnknown interface is mandatory and serves to identify the object. Every interface has a first method called QueryInterface. Its purpose is to enable introspection. The next two methods shown, AddRef and Release, support reference counting. The last method (Bark or Meow) symbolizes additional methods that a component might export. COM was designed to run with different programming languages, most of them not supporting introspection and garbage collection. The omissions in the language have to be made up for by forcing the programmer into rigid coding schemes (e.g. exact rules for when reference counting methods have to be called), which is also reflected in every interface specification of a COM component (see Fig. 44).
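The reference counting discipline can be sketched as follows. This is our own illustration in Java, not real COM code (real COM is a binary standard used from C or C++); only the method names AddRef and Release are taken from COM:

```java
// Illustrative sketch of COM-style reference counting modelled in Java.
// In real COM, every client must follow this discipline manually:
// a forgotten Release() leaks the object, an extra one frees it too early.
public class RefCounted {
    private int refCount = 1;      // the creator holds the first reference
    private boolean freed = false;

    public int addRef() { return ++refCount; }

    public int release() {
        if (--refCount == 0) {
            freed = true;          // here a real object would free its resources
        }
        return refCount;
    }

    public boolean isFreed() { return freed; }

    public static void main(String[] args) {
        RefCounted obj = new RefCounted();
        obj.addRef();              // a second client takes a reference
        obj.release();             // first client done
        obj.release();             // second client done -> object freed
        System.out.println(obj.isFreed()); // prints true
    }
}
```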
COM as a platform for MCS. COM's binary interface philosophy makes it very unattractive for use in heterogeneous environments like the internet. As Microsoft does not have to consider a wide variety of interests, hardware and software platforms, changes and updates occur more often than in the other systems. ActiveX controls (very simplified: a collection of predefined COM interfaces) have undergone many updates in recent years. This might lead to (forced) updates in our prototype implementation, which we wanted to avoid. However, Microsoft's COM offers the fastest component wiring by far. Should a commercialised MCS face client demand for it, this should be considered. The loss of compatibility could be made up for by bridging ActiveX to JavaBeans.
Figure 44: A COM object and its internal structure. The implementation of the interfaces is very similar to the virtual method tables of C++.
This would allow Java versions to still run with an ActiveX implementation, although at a lower speed due to conversion and Java's slower execution. As this would penalize non-Windows clients (and probably scare them off), it is questionable whether a double implementation is worth the effort.
6.5.3 JavaBeans
Overview. JavaBeans is in many ways the most modern component system compared to CORBA and COM. Its main market is internet applications. Based on Sun's Java language, JavaBeans provides component interaction by following coding standards. Any Java object is potentially a component (a JavaBean) if it follows some naming conventions for its methods. These conventions and the packing of such components into a compressed file format (the so-called .jar format) are basically all there is behind JavaBeans. JavaBeans profits from the youthfulness of its base language Java. Many features of Java support the safety and ease of use of JavaBeans, as illustrated by the following examples: (i) automatic garbage collection prevents memory leaks in JavaBeans, whereas in COM an error-prone reference counting scheme has to be followed; (ii) interfaces are part of the Java language, while in CORBA and COM they have to be implemented following coding guidelines; (iii) the virtual machine representation of Java objects and classes allows introspection of their exported features at runtime, without additional implementation overhead.
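A minimal sketch of these naming conventions (the class and property names are invented for illustration): a class becomes a JavaBean simply by following the getX/setX pattern, and the standard java.beans.Introspector can then discover the property at runtime without any extra implementation work.

```java
import java.beans.BeanInfo;
import java.beans.Introspector;
import java.beans.PropertyDescriptor;

// A JavaBean by convention: get + Name / set + Name define a "name" property.
public class PersonBean {
    private String name;                        // backing field of the property

    public String getName() { return name; }    // getter: get + Name
    public void setName(String n) { name = n; } // setter: set + Name

    public static void main(String[] args) throws Exception {
        // Introspection, example (iii) above: discover properties at runtime.
        BeanInfo info = Introspector.getBeanInfo(PersonBean.class, Object.class);
        for (PropertyDescriptor pd : info.getPropertyDescriptors()) {
            System.out.println(pd.getName()); // prints: name
        }
    }
}
```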
What makes JavaBeans attractive might also be its major drawback: the tight coupling of the component system with the programming language. JavaBeans is not suited to integrating legacy code into a component framework (CORBA and COM both do better here). Portability is only supported through the Java Virtual Machine (JVM), which means that the implementation language has to be Java. CORBA and COM do not restrict the programming language; CORBA is also independent of the executing hardware platform. And last but not least, when it comes to execution speed, JavaBeans systems are much slower than COM implementations.
JavaBeans as a platform for MCS. We chose JavaBeans as our implementation platform. In our case, the advantages outweighed the disadvantages. The safety of Java's garbage-collected memory model, type system and language syntax was far more valuable to the development of MCS than speed, which was not a major concern. We believe that also for a commercial version of MCS, Java would be a good implementation platform, for the following reasons:
• Java neatly integrates with the internet (running as an applet).
• Java offers the best language support (safety of implementation).
• Legacy code is usually not a concern when developing new programming languages.
• Portability is cheap: i) it is "built-in" (on the basis of the Java virtual machine), in contrast to COM, which relies on Intel hardware, and ii) it does not require generating stubs for various platforms (as in CORBA).
• Speed will be a decreasingly urgent problem, due to better JIT compilers.
However, one drawback remains: Java is the only implementation language available for the JavaBeans component framework. Should this become a hindrance, other component systems should be investigated carefully (see the corresponding sections above).
6.6 On the history of the Montage Component System
The Montage Component System continues a long tradition of research in software engineering at our institute, the Computer Engineering and Networks Laboratory at ETHZ. Although in its contents MCS differs from the other projects, namely GIPSY, PESCA and CIP, its concept, design, and implementation were greatly influenced by them.
GIPSY. The GIPSY approach to software development [Mar94, Mur97, Sche98] widens the narrow focus on supporting tools. So-called Integrated Programming Support Environments (IPSE) go beyond supporting the programming task (editing, compiling, linking). Software development is embedded in the other business processes such as knowledge management, customer support or even human resources management [MS98].
Marti [Mar94] describes in his thesis GIPSY as a system that manages document flow in software development systems. Integrated into the system is GIPSY/L, a domain-specific language for describing document flows. Formal language definitions written in GIPSY/L are used to specify the documents' properties. Such definitions combine the documents' syntactical structure and semantic properties. From these specifications GIPSY generates attributed syntax trees. Using extensible attribute grammars [MM92], specifications for languages are extensible in an object-oriented way, i.e. specifications may inherit properties and behaviour from their base specifications.
The algorithm used to evaluate attributed syntax trees is the same as the one we use in our system: the partial order of the dependency graph spanned by the attributes in the tree is topologically sorted before evaluation.
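The evaluation scheme just described can be sketched as follows. This is our own simplified illustration, not GIPSY or MCS code; the attribute names are invented. Attribute dependencies form a partial order, which is topologically sorted so that every attribute is evaluated only after the attributes it depends on:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Topological sort (Kahn's algorithm) over an attribute dependency graph.
public class AttributeScheduler {
    // deps maps an attribute to the attributes that depend on it.
    static List<String> topoSort(Map<String, List<String>> deps) {
        Map<String, Integer> inDegree = new LinkedHashMap<>();
        for (String n : deps.keySet()) inDegree.putIfAbsent(n, 0);
        for (List<String> out : deps.values())
            for (String m : out) inDegree.merge(m, 1, Integer::sum);

        Deque<String> ready = new ArrayDeque<>();
        for (Map.Entry<String, Integer> e : inDegree.entrySet())
            if (e.getValue() == 0) ready.add(e.getKey());

        List<String> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            String n = ready.remove();
            order.add(n);                           // safe to evaluate n now
            for (String m : deps.getOrDefault(n, List.<String>of()))
                if (inDegree.merge(m, -1, Integer::sum) == 0) ready.add(m);
        }
        if (order.size() != inDegree.size())
            throw new IllegalStateException("cyclic attribute dependency");
        return order;
    }

    public static void main(String[] args) {
        Map<String, List<String>> deps = new LinkedHashMap<>();
        deps.put("Term.value", List.of("Add.value")); // Add.value needs Term.value
        deps.put("AddOp.op",   List.of("Add.value")); // Add.value needs AddOp.op
        deps.put("Add.value",  List.of());
        System.out.println(topoSort(deps)); // prints [Term.value, AddOp.op, Add.value]
    }
}
```

The cycle check matters in practice: a cyclic attribute dependency has no evaluation order, and an evaluator must reject it rather than loop forever.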
The influence of GIPSY on the Montage Component System was the
understanding that our system cannot be seen as an independent piece of code.
Using MCS will have consequences not only for programmers, but will also
have an impact on how language components are distributed, deployed and
maintained. For detailed reflections on this topic see chapter 2.
PESCA. The Project on Programming Environments for Safety Critical Applications [Schw97, SD97] investigated automated proofs of the correspondence between code and specification. Safety-critical applications were first specified formally and then programmed using a very restricted (primitive recursive) subset of the programming language Oberon [Schw96]. An automated transformation of this program into a formal specification could then be validated against the original specification. However, experience showed that this approach will be difficult to scale up to handle real-world applications. Programs were restricted to primitive recursiveness, and the proof of correspondence between the two specifications was only tractable for small programs.
MCS has a more prototypical character and does not focus on languages for safety-critical applications, but there were still some lessons learned from PESCA. The algebraic specifications used in PESCA feature a steep learning curve; they appear very abstract to a programmer used to imperative programming. Using Abstract State Machines or even Java seemed to be much closer to the programmer's understanding. As ease of use and simplicity were our goals, a pragmatic approach was chosen for MCS: using Java to provide the operational semantics of language constructs.
CIP. Using the CIP method [FMN93, Fie94, Fie99], the functional behaviour
of a system is specified by an operational model of cooperating extended state
machines. Formally, a CIP system is defined as the product of all state machines comprised, corresponding to a single state machine in a multi-dimensional state space.
CIP had a great influence on the implementation design of MCS. It taught us to concentrate on the basic functionality. CIP features a robust model of computation which forces its users into a rigid but well-understood development scheme. The rationale behind this is that it is rewarding to trade some of the developer's freedom for more productivity, reliability and clarity. MCS is based on this conviction too.
Chapter 7
Concluding Remarks
The work on the Montage Component System revealed interesting insights into language processing from an unusual point of view: we tried to emphasize the compositionality of a language, and thus approached language specification from a different angle than usual. We gained experience in the way such systems can be specified and built, and based on this experience, we will give some ideas and proposals for improving Montages and MCS. We hope that our reflections on Montages and their market context, as well as our ideas for improvements, will be helpful in the planned project of commercializing Montages.
7.1 What was achieved
We described a system for composing language specifications on a construct-by-construct basis. The overall structure of the system differs considerably from conventional approaches to language specification or compiler construction, as it is modularized along the language constructs and not along the usual compiler phases. We explained how such a partitioning can be realized and used in language design.
Deployment of (partial) language specifications on a component market has been investigated. The system was put into the wider context of development, support, distribution, and marketing of language components.
The main fields of deployment we foresee for a system like ours are domain-specific language development and education, since both can profit from the modularity of the system. Additions to a language can be specified locally in one Montage, and encapsulation prevents unwanted side effects in unaltered parts of the specification.
In contrast to the original Montages [AKP97, Kut01], we use Java instead of ASM as our specification language. The modularisation of our system is more rigid than in the ASM approach: global variables or functions are not allowed in our system (in contrast to the ASM approach). Therefore, precompiled Montage specifications are easier to reuse because fewer dependencies have to be considered. Java as a specification language makes it possible to fall back on a vast variety of libraries. Using Java also implies that our specifications are typed, which is not the case with ASM specifications. Whether this is an advantage or not is probably a question of personal preference. On the one hand, strong typing can prevent errors; on the other hand, many fast-prototyping tools forgo typing because this gives the developer more freedom in how to use the system.
7.2 Rough Edges
We found what we consider to be some weak points in the Montages approach and try to sketch some ideas for improvement. It is important to understand that the presented ideas are just proposals. We do not think that these problems can be solved easily, since they mostly concern the Montage notation. Changes to Montages should be undertaken carefully and based on feedback from as many users as possible.
In each of the following sections we describe an open problem in Montages and try to sketch ideas for its solution, or at least give some arguments to start a discussion.
7.2.1 Neglected Parsing
Montages provide no means of control over the parsing of a program. We think
that this omission is an obstacle in the deployment of the Montages approach.
Most grammars that need additional control of the parsing process are not context-free grammars; in other words, the parsing of their programs relies on context information. In many textbooks on compiler construction, context-sensitive grammars are disparaged and rewriting techniques are described in order to make them context-free [e.g. in App97]. However, this point of view reflects only the language designer's arguments and ignores the needs of the users of the language.
This is a lesson learned from programming language design: take the introduction of inner classes in Java as an example of a hasty enhancement of a language, undertaken by a language designer and much criticised by programmers and experts in academia.
Often, users of a DSL are specialists in their own field which has nothing to
do with programming language theory. They use notations that are - in some
cases — hundreds of years old and well accepted. Examples are: mathematics,
physics, chemistry or music.
Mathematicians have a rich set of notations of their own at their disposal which can (more or less) easily be typed on a standard keyboard. Consider the input language of a symbolic computation system as a typical DSL used by mathematicians. To enter a multiplication, it would be much more comfortable to use the common juxtaposition of variable names, ab, instead of the unusual asterisk, a*b, which is never used in printed or hand-written formulas. Such a language can only be implemented if the parser has access to the additional knowledge, i.e. access to context information [Mis97].
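A minimal sketch of why context is needed here (our own illustration, with invented names): given the set of declared variable names as context, a scanner can resolve the juxtaposition ab into the product a*b, which a purely context-free tokenizer could not do.

```java
import java.util.Set;

// Context-dependent scanning: single-letter variables written next to each
// other are expanded into an explicit multiplication, using the declared
// variable names as context information.
public class JuxtapositionScanner {
    static String expand(String input, Set<String> declared) {
        StringBuilder out = new StringBuilder();
        boolean prevWasIdent = false;
        for (int i = 0; i < input.length(); i++) {
            String c = String.valueOf(input.charAt(i));
            if (declared.contains(c)) {
                if (prevWasIdent) out.append('*'); // implicit multiplication
                out.append(c);
                prevWasIdent = true;
            } else {
                out.append(c);                     // operators etc. pass through
                prevWasIdent = false;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(expand("ab", Set.of("a", "b")));  // prints a*b
        System.out.println(expand("a+b", Set.of("a", "b"))); // prints a+b
    }
}
```

Without the set of declared names, the scanner could not decide whether ab is one identifier or the product of two.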
If Montages is to be successful in the long term, some means to control the parser have to be offered.
7.2.2 Correspondence between Concrete and Abstract Syntax
One of the trickiest parts of MCS is the correspondence between the given EBNF rule and the control flow graph or, generally speaking, the mapping of the concrete syntax onto the abstract syntax. The implemented solution is straightforward: each nonterminal in the control flow graph has to be assigned to a nonterminal in the EBNF. This approach is not satisfactory concerning list processing; more precisely, it fails to model special conditions on the occurrence of a nonterminal in a list. Fig. 45 shows an example of an expression list as it is defined for argument passing in many languages. Note that this kind of syntax specification is not available yet.
ExprList ::= Expression {"," Expression }.
Figure 45: List of expressions showing a clash in structures between control flow and EBNF.
In a concrete syntax, entities often have to be textually separated. In the given example, at least one expression has to occur. Subsequent expressions (if present) have to be separated by a comma. Abstract syntax does not need such separators, i.e. they are purely syntactical. In addition, the LIST object contains a property for both the minimum and maximum number of expressions. Thus, the abstract syntax definition of the ExprList Montage can be much more compact and concise than the EBNF rule. The comparison of the two structures at the bottom of Fig. 45 illustrates this.
The structure of the abstract syntax does not reflect the structure of the con¬
crete syntax. In order to simplify reuse of Montages, such clashes should be
allowed. If the abstract syntax depends on a concrete syntax, then a Montage is
not very attractive for reuse. In most reuse cases, one wants to keep the specifi¬
cation of the behaviour, but change the concrete appearance of a language con¬
struct. For the above example, solutions may seem straightforward, but in more
complex situations, it is difficult to give a general rule for mapping EBNF
occurrences of nonterminals to control flow objects. As a more complex exam¬
ple consider an If-Statement (Fig. 46) as it appears in almost every language. It
shall exemplify some of the open questions:
Figure 46: Abstract syntax of an if construct with conditional and default blocks.
Can this control flow graph be the mapping of the following EBNF rule?
If ::= "IF" Expression "THEN" { Statement {";" Statement}}
{ "ELSIF" Expression "THEN" { Statement {";" Statement}}}
[ "ELSE" { Statement {";" Statement}} ]
"END".
This rule could be formulated more elegantly by introducing a new Montage
for the statement sequences:
If ::= "IF" Expression "THEN" StatSeq
       { "ELSIF" Expression "THEN" StatSeq }
       [ "ELSE" StatSeq ]
       "END".
StatSeq ::= Statement { ";" Statement }.
Unfortunately, this adds to the complexity of the language, as there is now an additional Montage which does not contain any semantics but is purely syntactical. Ongoing work should investigate whether the introduction of macros would provide a solution that is sufficiently flexible. Lists as in statement sequences or in an expression list are easy to detect and to replace, but does this also apply to the if statement itself? It contains a list with more than one nonterminal (an expression and a corresponding list of statements). Why should the default case ("ELSE" clause) be modelled with an extra list of statements? By making lists more powerful, this statement sequence could be incorporated as a special case (no expression, at the end) into the other list. General mapping rules from concrete to abstract syntax should be investigated. Applications of such rules can be found in the compiler construction literature, but not the reversed problem: which rule to apply for translation between a given concrete and a given abstract syntax. Sophisticated pattern matching, possibly combined with a rule database, should be investigated here.
7.2.3 BNF or EBNF?
One way to circumvent the mapping problems described above would be to ban list processing from Montages. In order to do this properly, we propose to use plain Backus-Naur Form (BNF) instead of its extended version (EBNF) for specifying syntax. As BNF grammars are better suited for bottom-up parsers, we also suggest introducing some means of controlling the parser from Montages (refer to section 7.2.1 for motives).
BNF was extended to EBNF by introducing repetitions, i.e. groups, options and lists, represented by parentheses "( )", brackets "[ ]", and braces "{ }" respectively. In addition, alternatives can be expressed by enumerating them separated by a vertical bar.
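To illustrate the difference, the expression list of section 7.2.2 can serve as an example: the single EBNF rule

ExprList ::= Expression { "," Expression }.

corresponds to two plain BNF rules, where the repetition is expressed by left recursion (as in Fig. 48):

ExprList ::= Expression.
ExprList ::= ExprList "," Expression.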
EBNF allows a much denser representation of a grammar than BNF, as repetitions turn out to be a powerful notational aid. This means that the number of Montages specifying a language can be reduced and thus the language specification becomes more compact.
Yet using EBNF instead of BNF introduces some problems into Montages. While synonym rules display alternative productions extremely well, repetitions can be a nuisance for the language designer. Apart from the
120 7 Concluding Remarks
Add ::= Term {AddOp Term}.
AddOp = "+" | "-".

[Control flow graph: I → Term~1 → (setValue) → Term~2 → (add) → T, with a
back edge repeating Term~2 and (add) for each further list element]

Prop    Type      Initialisation
value   Integer   return new Integer();
op      Integer   return new Integer(AddOp == "+" ? 1 : -1);

Action
@setValue:
    value = Term~1.value;
@add:
    value = value + op*Term~2.value

Figure 47: Add Montage with EBNF
mapping problems discussed in the previous section, we have to distinguish
between presence and absence of repetitions. Consider the simple specification
of a variable declaration in Fig. 23, p. 66. The optional initialisation of the
variable requires a tedious distinction between the presence and absence of an
initialisation. In some cases, the presence of repetition complicates a Montage.
Often, it is not obvious to a novice user how initialisation of properties or
execution of actions works. Some background knowledge of how repetitions
are managed is necessary. Let us exemplify this by the Montage in Fig. 47.
Although the syntax rule is very compact, the control flow graph is far from
that. By only looking at the graph it would be hard to deduce the meaning of
this Montage. Moreover, in contrast to the declaration Montage in Fig. 23, it is
not even necessary to distinguish between presence and absence of the list here!
An alternative representation using BNF rules can be found in Fig. 48. Its
advantages are that the control flow graphs are much easier to understand and
repetitions are banned.
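The intended semantics of the repetition can be paraphrased in a few lines of plain Java (our sketch, not MCS-generated code): the list of terms is folded from left to right, which is exactly what the @setValue and @add actions express.

```java
// Sketch (not MCS output): the semantics of "Term {AddOp Term}",
// folding the terms from left to right as the @setValue/@add actions do.
public class Add {
    static int fold(int[] terms, char[] ops) {
        int value = terms[0];                   // @setValue
        for (int i = 0; i < ops.length; i++) {
            int op = (ops[i] == '+') ? 1 : -1;  // property "op"
            value = value + op * terms[i + 1];  // @add
        }
        return value;
    }

    public static void main(String[] args) {
        // 1 - 2 + 3, evaluated left to right
        System.out.println(fold(new int[]{1, 2, 3}, new char[]{'-', '+'}));
    }
}
```

The left-recursive BNF variant performs the same left-to-right fold, but makes the recursion visible in the grammar instead of hiding it in list handling.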
BNF is (in contrast to EBNF) known to a wider community of programmers
and language designers. Of course, it is very easy to learn EBNF; its definition
was given by Wirth in [Wir77b] on just half a page. Yet BNF has the
advantage (which should not be underestimated) that many publications about
languages are given using BNF. This means that the specifications given in
many books and articles will be easier to enter into Montages, as the original
language rules do not have to be rewritten first.

Add:
    Term
    Add AddOp Term
AddOp: one of
    "+" "-"

[Control flow graphs: a) Term → (setValue) → T; b) Add → Term → (add) → T]

Prop    Type      Initialisation
value   Integer   return new Integer();
op      Integer   return new Integer(AddOp == "+" ? 1 : -1);

Action
@setValue:
    value = Term.value;
@add:
    value = Add.value + op*Term.value

Figure 48: Add Montage with BNF using a left-recursive grammar
7.3 Conclusions and Outlook
The problems described in the previous section are inherent in Montages and
have their origin in the overspecification of the syntax tree. We consider the
improvement of the parsing process the most important issue in the continuing
development of the Montage tools. The rest of this section will sketch a
possible solution to the parsing problem and hint at possible future directions
of Montages and their applications.
7.3.1 Separation of Concrete and Abstract Syntax
We have already discussed parsing techniques in section 4.5, p. 52ff, and an
immediate solution would be to use more powerful parsers, e.g. an LR(k)
parser such as the one described by Earley [Ear70]. Unfortunately, this does
not solve all problems we have concerning parsing: we still cannot handle
context-sensitive grammars as they were motivated in section 7.2.1.
On the one hand, a simple LL parser seems to be too rigid for many given
grammars; on the other hand, why bother the developer of a simple language
with all the expressive power of context-sensitive parsers?
We therefore propose to separate the problem of parsing completely from
the rest of the language specification. The MCS would then read abstract
syntax trees instead of programs given as character streams.
XML as an Intermediate Representation. A very simple and pragmatic
approach would be to use XML (Extensible Markup Language [W3C98]) as an
intermediate representation, generated by a parser and read by the Montage
Component System. As the syntax of XML is very easy to parse, an existing
XML parser (virtually any SAX2 or DOM parser for XML) can be applied to
replace the existing backtracking parser. For the user of the system this has
several advantages:
• He can use any existing parser generator, e.g. CUP [Hud96] for LALR
  grammars or Metamata Parse [Met97] for LL grammars.
• The parser can be chosen to fit any existing grammar rules; e.g. in many
  books BNF is used to explain a language (e.g. the Java Language
  Specification [GJS96]).
• Developers can use a parser of their choice, i.e. one they are familiar with.
• Using XML also allows the use of parsers on non-Java platforms. The
  generated XML document can then be sent to an MCS running on a Java
  virtual machine.
In fact, this intermediate representation already exists, and we call it the
Montage Exchange Format (MXF): when saving a Montage, an XML file is
generated containing all information necessary to reconstruct it again.
Presently the defining DTD (Document Type Definition) specifies only the
format for one single Montage (so for each Montage there is a separate file).
But it should be easy to extend this DTD to allow one file containing a whole
2. SAX, the Simple API for XML, and DOM, the Document Object Model, both define
interfaces to access an XML parser and the parsed document, respectively.
3. The coordinates of the elements of the control flow graph are stored in optional XML tags.
abstract syntax tree. The MXF is also intended to be a tool-independent
format, which will simplify the exchange of Montages between the different
tools.
7.3.2 Optimization and Monitoring
After successfully specifying and implementing a language, it would be
desirable to compile programs as fast as possible. Fast execution would allow
MCS to be deployed as a production tool and not only as a fast prototyping
tool. Unfortunately, optimizing executable programs for speed or memory
requirements is not very well supported in MCS, as it relies on the Java
compiler that compiles the generated code. Optimizations often run on a
global scale of a program, but the partitioning scheme of MCS builds on the
locality of the given specifications. To extend the system, we would propose
plug-ins:
Plug-Ins. Optimization could be offered through plug-ins to the Montage
Component System. Such plug-ins are system components that can operate on
the internal data structure (basically the annotated abstract syntax tree). They
would be operational between two phases of the transformation process, or
after control flow composition (see Fig. 10, p. 43), but before the Java
compiler compiles the generated code.
As such plug-ins can only operate between two phases of the transformation
process, they would be limited in their optimization capabilities. But it is still
possible to write plug-ins offering monitoring of the AST data structure. They
could visualize or even animate the transformation process and/or allow the
AST to be edited interactively.
Restructuring MCS. In order to be able to replace the implementation of the
different transformation phases, it would be necessary to implement them as
plug-ins as well. In the present implementation, the different phases cannot be
replaced by the user at runtime of MCS. The transformation phases are
accessed through Java interfaces. Therefore, it is necessary to replace some
implementing class files and restart the system in order to replace the
behaviour/implementation of the transformation process (or parts of it).
To support plug-ins that can be exchanged by the user, it is necessary to
extend the interfaces of the transformation phases to offer plug-in capabilities
(e.g. install, remove). This can probably be done by introducing a new
interface Plugin and by extending the existing interfaces from it.
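Such a restructuring might look as follows in Java; the interface and method names are our invention for illustration, not the current MCS API.

```java
// Sketch of the proposed plug-in capability (names are illustrative).
interface Plugin {
    void install();   // register the plug-in with the system
    void remove();    // deregister it again
}

// A transformation phase, restructured as an exchangeable plug-in:
interface TransformationPhase extends Plugin {
    Object transform(Object ast);  // operate on the annotated AST
}

// Trivial phase that passes the AST through unchanged:
class IdentityPhase implements TransformationPhase {
    public void install() { /* would register with MCS */ }
    public void remove()  { /* would deregister */ }
    public Object transform(Object ast) { return ast; }
}
```

With such a hierarchy, user-supplied phases could be installed and removed without touching class files or restarting the system.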
7.3.3 A Possible Application
A system like MCS does not have to be a standalone application; it could also
be part of a web browser. There it would serve as a kind of generalized Java
virtual machine. A web page would not only contain tags referring to Java
code, but it would contain tags referring to a language specification and tags
referring to the program code. An MCS web browser could then first
download the language specification, generate an interpreter for it, and then
download and interpret the program(s).
Of course, downloading a whole language before using it might be a waste of
bandwidth for small programs. But there are certainly scenarios where the
overhead of downloading the language specification will be smaller than the
cost of downloading equivalent Java applets. Using long-term caching for
downloaded Montage specifications will further improve the performance of
such a web browser.
The DSL used in such web pages could be chosen according to its contents
and thus simplify the creation and support of a web site. But in order to justify
the overhead of downloading a language specification first, an application
should be used over a longer period of time and it should be highly interactive.
We might think of the following scenarios:
Tutorials: The DSL could be the subject of the tutorial, or it could be used to
program the many different user interactions that take place throughout the
tutorial.
Forms: In an application that uses many forms, a DSL for form description
could be used to configure customer-tailored forms. Forms that are capable
of displaying different aspects according to the user's input are code and data
intensive in Java. A form DSL could be much more concise (e.g. consider
Tcl/Tk [Ous97] versus the Java API [CLK99]) and thus form-intensive
applications would consume less bandwidth.
Symbolic computation systems: The downloaded language is a specification of
the Maple, Mathematica, or Matlab language. Instead of buying and
installing these applications, they could be rented on a "per use" basis.
Such applications should be developed in accordance with the marketing and
support strategies presented in chapter 2 (Electronic Commerce with Software
Components).
Bibliography

[ADK+98] W. Aitken, B. Dickens, P. Kwiatkowski, O. de Moor, D. Richter,
C. Simonyi. Transformation in Intentional Programming. In Proceedings
of the 5th International Conference on Software Reuse (ICSR'98). IEEE
Computer Society Press, 1998.
[AKP97] M. Anlauff, P. W. Kutter and A. Pierantonio. Formal Aspects of and
Development Environment for Montages. In M. P. A. Sellink, editor,
Workshop on the Theory and Practice of Algebraic Specifications, volume
ASF+SDF-97 of Electronic Workshops in Computing. British Computer
Society, 1997.
[Anl] M. Anlauff. Montages Tool Companion: Gem-Mex. Download at
ftp://ftp.first.gmd.de/pub/gemmex/.
[Anl00] M. Anlauff. XASM - An Extensible, Component-Based Abstract
State Machines Language. In Y. Gurevich, P. W. Kutter, M. Odersky and
L. Thiele, editors, Proceedings of the Abstract State Machine Workshop
ASM2000, volume 1912 of Lecture Notes in Computer Science, pages
69-90. Springer, 2000.
[App97] A. W. Appel. Modern Compiler Implementation in Java. Cambridge
University Press, 1997.
[ASU86] A.V. Aho, R. Sethi and J. D. Ullman. Compilers - Principles,
Techniques and Tools. Addison-Wesley, 1986.
[Ber97] E. Berk. JLex: A lexical analyzer generator for Java.
http://www.cs.princeton.edu/~appel/modern/java/JLex, 1997.
[BS98] E. Börger and W. Schulte. A Programmer Friendly Modular
Definition of the Semantics of Java. In J. Alves-Foss, editor, Formal
Syntax and Semantics of Java. Springer LNCS, 1998.
[CheOO] Z. Chen. JavaCard Technology for Smart Cards. The Java Series.
Addison-Wesley, 2000.
[CLK99] P. Chan, R. Lee and D. Kramer. The Java Class Libraries, Volume
1 & 2. The Java Series, Addison-Wesley, 1999.
[CM98] C. Consel and R. Marlet. Architecturing software using a
methodology for language development. In Proceedings of the 10th
International Symposium on Programming Languages, Implementations,
Logics and Programs (PLILP/ALP '98), pages 170-194, Pisa, Italy,
September 1998.
[CO90] G. M. Clemm and L. J. Osterweil. A mechanism for environment
integration. ACM Transactions on Programming Languages and Systems,
12(1):1-25, January 1990.
[Col99] M. Colan. InfoBus 1.2 Specification. Sun Microsystems, February
1999.
[DAB99] Ch. Denzler, Ph. Altherr and R. Boichat. NOLC - Network and
on-line Consulting. Informatik/Informatique, 5:38-40, October
1999.
[DEC96] P. Deransart, A. Ed-Dbali and L. Cervoni. Prolog: The Standard,
Reference Manual. Springer-Verlag, 1996.
[DNW+00] S. Dobson, P. Nixon, V. Wade, S. Terzis and J. Fuller. Vanilla: an
open language framework. In K. Czarnecki and U. W. Eisenecker,
editors, Generative and Component-Based Software Engineering, LNCS
1799, pages 91-104. Springer-Verlag, 2000.
[Ear70] J. Earley. An Efficient Context-Free Parsing Algorithm.
Communications of the ACM, 13(2):94-102, February 1970.
[Eco97] Survey of Electronic Commerce. The Economist, May 10, p. 17,
1997.
[Fel98] P. Felber. The CORBA Object Group Service: A Service Approach to
Object Groups in CORBA. PhD thesis 1867, École Polytechnique
Fédérale de Lausanne, 1998.
[FGS98] P. Felber, R. Guerraoui and A. Schiper. The Implementation of a
CORBA Object Group Service. Theory and Practice of Object Systems,
4(2):93-105, 1998.
[Fie94] H. Fierz. SCSM - Synchronous Composition of Sequential
Machines. TIK Report 14, Computer Engineering and Networks
Laboratory, ETH Zürich, 1998.
Bibliography 127
[Fie99] H. Fierz. The CIP Method: Component and Model-based
Construction of Embedded Systems. European Software Engineering
Conference ESEC'99, Toulouse, 1999.
[FMN93] H. Fierz, H. Müller and S. Netos. CIP - Communicating
Interacting Processes. A Formal Method for the Development of Reactive
Systems. In J. Gorski, editor, Proceedings SAFECOMP'93.
Springer-Verlag, 1993.
[F+95] Frey et al. Allgemeine Didaktik. Karl Frey,
Verhaltenswissenschaften, ETH Zürich, 8th edition, 1995.
[GCC] GCC Team. GNU Compiler Collection, http://gcc.gnu.org.
[GE90] J. Grosch and H. Emmelmann. A Tool Box for Compiler
Construction. Report 20, GMD, Forschungsstelle an der Universität
Karlsruhe, Vincenz-Prießnitz-Str. 1, D-7500 Karlsruhe, January 1990.
[GH93] Y. Gurevich and J. K. Huggins. The Semantics of the C
Programming Language. In Selected papers from CSL'92 (Computer
Science Logic), LNCS 702, pages 274-308. Springer-Verlag, 1993.
[GHL+92] R. W. Gray, V. P. Heuring, S. P. Levi, A. M. Sloane and W. M.
Waite. Eli: A Complete, Flexible Compiler Construction System.
Communications of the ACM, 35(2):121-131, February 1992.
[GJS96] J. Gosling, B. Joy and G. Steele. The Java Language Specification.
The Java Series. Addison-Wesley, 1996.
[GJSB00] J. Gosling, B. Joy, G. Steele and G. Bracha. The Java Language
Specification, Second Edition. The Java Series, Addison-Wesley, 2000.
[Gro94] J. Grosch. CoCoLab. http://www.cocolab.de, 1994. Ingenieurbüro
für Datenverarbeitung, Turenneweg 11, D-77880 Sasbach.
[Gur94] Y. Gurevich. Evolving Algebras 1993: Lipari Guide. In E. Börger,
editor, Specification and Validation Methods. Oxford University Press,
1994.
[Gur97] Y. Gurevich. May 1997 Draft of the ASM Guide. Technical Report
CSE-TR-336-97, University of Michigan EECS Department, 1997.
[Har92] S. P. Harbison. Modula-3. Prentice Hall, 1992.
[Hed99] G. Hedin. Reference Attributed Grammars. In D. Parigot and M.
Mernik, editors, Second Workshop on Attribute Grammars and their
Applications, WAGA'99, pages 153-172, Amsterdam, The Netherlands,
March 1999.
[HMT90] R. Harper, R. Milner and M. Tofte. The Definition of Standard
ML. The MIT Press, 1990.
[HSSS97] S. Handschuh, K. Stanoevska-Slabeva and B. Schmid. The Concept
of Mediating Electronic Product Catalogues. EM - Electronic Markets,
7(3), September 1997.
[Hud96] S. E. Hudson. CUP Parser Generator for Java.
http://www.cs.princeton.edu/~appel/modern/java/CUP, 1996.
[JGJ97] I. Jacobson, M. Griss and P. Jonsson. Software Reuse. ACM Press,
New York, 1997.
[Joh75] S. C. Johnson. Yacc - Yet another compiler-compiler. Computing
Science Tech. Rep. 32, Bell Laboratories, Murray Hill, N.J., 1975.
[JW74] K. Jensen and N. Wirth. PASCAL - User Manual and Report.
Springer-Verlag, 1974.
[Kle99] G. Klein. JFlex: The Fast Scanner Generator for Java.
http://jflex.de, 1999.
[Knu68] D. E. Knuth. Semantics of context-free languages. Mathematical
Systems Theory, 2(2):127-145, June 1968.
[KP88] G. E. Krasner and S. T. Pope. A cookbook for using the
model-view-controller user interface paradigm in Smalltalk-80. Journal of
Object-Oriented Programming, 1(3):26-49, August 1988.
[KP97a] P. W. Kutter and A. Pierantonio. The Formal Specification of
Oberon. Journal of Universal Computer Science, 3(5):443-501,
May 1997.
[KP97b] P. W. Kutter and A. Pierantonio. Montages: Specifications of
Realistic Programming Languages. Journal of Universal Computer
Science, 3(5):416-442, May 1997.
[KR88] B. Kernighan and D. Ritchie. The C Programming Language.
Prentice-Hall, 2nd edition, May 1988.
[Kut01] P. W. Kutter. Montages - Engineering of Computer Languages.
PhD thesis, Institut TIK, ETH Zürich, 2001.
[Lam98] J. Lampe. Depot4 - A generator for dynamically extensible
translators. Software - Concepts & Tools, 19:97-108, 1998.
[Les75] M. E. Lesk. Lex - A Lexical analyzer generator. Computing
Science Tech. Rep. 39, Bell Telephone Laboratories, Murray Hill, N.J.,
1975.
[Lis00] R. Lischner. Delphi in a Nutshell. O'Reilly & Associates, 2000.
[LW93] D. Larkin and G. Wilson. Object-Oriented Programming and the
Objective-C Language. Available at: www.gnustep.org, NeXT
Computer Inc, 1993.
[LY97] T. Lindholm and F. Yellin. The Java Virtual Machine Specification.
The Java Series. Addison-Wesley, 1997.
[Mar94] R. Marti. GIPSY: Ein Ansatz zum Entwurf integrierter
Softwareentwicklungssysteme. PhD thesis 10463, Institut TIK, ETH
Zürich, 1994.
[MB91] T. Mason and D. Brown. Lex & Yacc. Nutshell Handbooks.
O'Reilly & Associates, 1991.
[Met97] Metamata. JavaCC and Metamata Parse.
http://www.metamata.com, 1997.
[Mis97] S. A. Missura. Higher-Order Mixfix Syntax for Representing
Mathematical Notation and its Parsing. PhD thesis 12108, ETH Zürich,
1997.
[MM92] R. Marti and T. Murer. Extensible Attribute Grammars. TIK
Report Nr. 6, Computer Engineering and Networks Laboratory, ETH
Zürich, December 1992.
[MS98] T. Murer and D. Scherer. Organizational Integrity: Facing the
Challenge of the Global Software Process. TIK-Report 51, Computer
Engineering and Networks Laboratory, ETH Zürich, 1998.
[Mur97] T. Murer. Project GIPSY: Facing the Challenge of Future
Integrated Software Engineering Environments. PhD thesis 12350,
Institut TIK, ETH Zürich, 1997.
[MV99] T. Murer and M. L. van de Vanter. Replacing Copies With
Connections: Managing Software across the Virtual Organization. In
IEEE 8th International Workshop on Enabling Technologies:
Infrastructure for Collaborative Enterprises, Stanford University,
California, USA, 16-18 June 1999.
[MW91] H. Mössenböck and N. Wirth. The Programming Language
Oberon-2. Structured Programming, 12:179-195, 1991.
[MZW95a] A. Moorman Zaremski and J. M. Wing. Signature Matching: a
Tool for Using Software Libraries. ACM Transactions on Software
Engineering and Methodology, 4(2):146-170, April 1995.
[MZW95b] A. Moorman Zaremski and J. M. Wing. Specification Matching
of Software Components. In Proceedings of the Third ACM SIGSOFT
Symposium on the Foundations of Software Engineering, pages 6-17,
October 1995.
[Nau60] P. Naur. Revised Report on the Algorithmic Language ALGOL 60.
Communications of the ACM, 3(5):299-314, May 1960.
[Ous97] J. K. Ousterhout. Tcl and the Tk Toolkit. Addison-Wesley
Professional Computing, 1994.
[PH97a] A. Poetsch-Heffter. Specification and Verification of
Object-Oriented Programs. Habilitationsschrift, 1997.
[OMS97] Oberon Microsystems Inc. Component Pascal Language Report.
Available at www.oberon.ch, 1997.
[PH97b] A. Poetsch-Heffter. Prototyping Realistic Programming Languages
Based on Formal Specifications. Acta Informatica, 1997.
[Sal98] P. H. Salus, editor. Little Languages and Tools, volume 3 of
Handbook of Programming Languages. Macmillan Technical Publishing,
1st edition, 1998.
[Sche98] D. Scherer. Internet-wide Software Component Development
Process and Deployment Integration. PhD thesis 12943, Institut TIK,
ETH Zürich, 1998.
[Schm97] D. A. Schmidt. On the Need for a Popular Formal Semantics. In
ACM Conference on Strategic Directions in Computing Research, volume
32 of ACM SIGPLAN Notices, pages 115-116, June 1997.
[Schw96] D. Schweizer. Oberon - eine Programmiersprache für
sicherheitskritische Systeme. TIK-Report 21, Computer Engineering and
Networks Laboratory, ETH Zürich, 1996.
[Schw97] D. Schweizer. Ein neuer Ansatz zur Verifikation von Programmen
für sicherheitskritische Systeme. PhD thesis 12056, Institut TIK, ETH
Zürich, 1997.
[SD97] D. Schweizer and Ch. Denzler. Verifying the Specification-to-Code
Correspondence for Abstract Data Types. In M. Dal Cin, C. Meadows,
and W. H. Sanders, editors, Dependable Computing for Critical
Applications 6, volume 11 of Dependable Computing and Fault-Tolerant
Systems, pages 177-202. IEEE Computer Society, 1997.
[Sim96] C. Simonyi, Intentional Programming - Innovation in the Legacy
Age. Presented at IFIP WG 2.1 meeting, June 4, 1996,
http://www.research.microsoft.com/research/ip/ifipwg/ifipwg.htm
[Sim99] C. Simonyi. The Future is Intentional. In IEEE Computer, pages
56-57. IEEE Computer Society, May 1999.
[SK95] K. Slonneger and B. L. Kurtz. Formal Syntax and Semantics of
Programming Languages. Addison-Wesley, Reading, 1995.
[SM00] R. M. Stallman and R. McGrath. GNU Make. Manual, Free
Software Foundation, April 2000.
[Sml97] Standard ML of New Jersey. Bell Laboratories,
ftp://ftp.research.bell-labs.com/dist/smlnj, 1997.
[SP97] C. Szyperski and C. Pfister. Workshop on Component-Oriented
Programming, Summary. In M. Mühlhäuser, editor, Special Issues in
Object-Oriented Programming - ECOOP'96 Workshop Reader. dpunkt
Verlag, Heidelberg, 1997.
[SS99] K. Stanoevska-Slabeva. The Virtual Software House.
Informatik/Informatique, 5:37-38, October 1999.
[Ste90] G. L. Steele. Common Lisp: The Language. Digital Press, 2nd
edition, May 1990.
[Ste99] G. L. Steele. Growing a Language. Journal of Higher-Order and
Symbolic Computation, 12(3):221-236, October 1999.
[Str97] B. Stroustrup. The C++ Programming Language. Addison-Wesley,
3rd edition, July 1997.
[Szy97] C. Szyperski. Component Software. ACM Press, Addison-Wesley,
1997.
[TC97] S. Thibault and C. Consel. A Framework of Application Generator
Design. In M. Harandi, editor, Proceedings of the ACM SIGSOFT
Symposium on Software Reusability (SSR '97), Software Engineering
Notes, 22(3):131-135, Boston, USA, May 1997.
[Tho99] S. Thompson. Haskell: The Craft of Functional Programming.
Addison-Wesley, 2nd edition, 1999.
[Van94] M. T. Vandevoorde. Exploiting specifications to improve program
performance. PhD thesis, Department of Electrical Engineering and
Computer Science, MIT, February 1994.
[VM99] M. L. van de Vanter and T. Murer. Global Names: Support for
Managing Software in a World of Virtual Organizations. In Ninth
International Symposium on System Configuration Management (SCM-9),
Toulouse, France, 5-7 September 1999.
[W3C98] W3C. Extensible Markup Language (XML) 1.0. REC-xml-19980210
edition, February 1998. W3C Recommendation.
[Wal95] C. R. Wallace. The Semantics of the C++ Programming Language.
In E. Börger, editor, Specification and Validation Methods, pages
131-164. Oxford University Press, 1995.
[Wal98] C. R. Wallace. The Semantics of the Java Programming Language:
Preliminary Version. Technical Report CSE-TR-335-97, EECS
Department, University of Michigan, 1997.
[WC99] J. C. Westland and T. H. K. Clark. Global Electronic Commerce.
MIT Press, 1999.
[Wei99] M. A. Weiss. Data Structures and Algorithm Analysis in Java.
Addison Wesley Longman, 1999.
[Wir77a] N. Wirth. Modula - A Language for Modular Multiprogramming.
Software Practice and Experience, 7(1):3-35, January 1977.
[Wir77b] N. Wirth. What Can We Do about the Unnecessary Diversity of
Notations for Syntactic Definitions? Communications of the ACM,
20(11):882-883, 1977.
[Wir82] N. Wirth. Programming in Modula-2. Springer-Verlag, 1982.
[Wir86] N. Wirth. Compilerbau, volume 36 of Leitfäden der angewandten
Mathematik und Mechanik (LAMM). B. G. Teubner, 4th edition, 1986.
[Wir88] N. Wirth. The Programming Language Oberon. Software - Practice
and Experience, 18:671-690, 1988.
Curriculum Vitae
I was born on July 20, 1968 in Liestal (BL). From 1975 to 1984 I attended
primary school and Progymnasium in Muttenz. In 1984 I entered High School
(Gymnasium) in Muttenz, from which I graduated in 1987 with Matura
Typus C.
In 1988 I began studying computer science at ETH Zürich. During this time I
did two internships at Integra (now Siemens) and Ubilab (UBS). I received the
degree Dipl. Informatik-Ing. ETH in 1993. My master thesis, entitled A
Message Mechanism for Oberon, was supervised by Prof. Niklaus Wirth.
Afterwards I started working as a research and teaching assistant at the
Computer Engineering and Networks Lab (TIK) of ETH in the System
Engineering group led by Prof. Albert Kündig.