ii survey final

8/2/2019 II Survey Final

1/22

Creating IDEs for the Eclipse Platform

Bruno Dinis Ormonde MedeirosStudent no. 50966

Introducao a Investigacao survey,

Insituto Superior TecnicoLisbon,

Portugal,http://www.ist.utl.pt

Abstract. Modern IDEs support a set of impressive features, such ascode navigation, code assistance, and code refactoring, which greatlyenhance the productivity of IDE users. Of these, Eclipse JDT stands out

as one of the most advanced open-source IDEs available, as well as beingbased on the Eclipse Platform, an extensible framework for the creationof custom IDEs.This survey explores the issues and techniques concerning the creationof IDEs with rich code manipulation features, for the Eclipse Platform.The architecture, and the various components that form an IDE areexamined, with particular detail given to the data structures that are thebasis for IDE functionality. This survey is based on literature pertainingto the topics of Eclipse, IDE creation, and code refactoring tools, as wellas on analysis of JDTs source and overall design.


2/22

Table of Contents

Creating IDEs for the Eclipse Platform . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . I

1 Introduction. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12 Eclipse Platform . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2

2.1 Eclipse Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22.2 IDE Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2

3 Core Component . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43.1 Parser . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43.2 AST . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53.3 Project Description . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53.4 Project Builder . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6

3.5 Project Nature . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64 UI Integration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7

4.1 The editor . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74.2 IDocument . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84.3 Source Viewer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9

5 Advanced Concepts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115.1 Basic AST Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115.2 AST - Homogenous vs. Heterogenous tree . . . . . . . . . . . . . . . . . . . . 115.3 Parser . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125.4 Entity References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125.5 DOM AST . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 135.6 Model scalability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 145.7 Model updating . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15

5.8 Refactoring . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 166 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17


3/22

1 Introduction

Nowadays, the success of modern programming languages is influenced not onlyby the characteristics of the language itself, but also by the availability andquality of the whole language toolchain: compilers, build tools, debuggers, Inte-grated Development Environment(IDEs), profilers, etc. The IDE integrates withthe other development tools, and provides a rich editing environment, responsi-ble for manipulating a project and its source files. As such, the IDE is a mostcrucial part of the toolchain: it is where the developer workflow is centered, andwhere he spends the majority of his time[1].

IDE tools have seen a large amount of development since their dawn, hav-ing greatly increased in power. As an example, consider modern IDEs such asEclipse1, Microsoft Visual Studio2, or IntelliJ IDEA3. For the Java and C#languages, these IDEs have a rich set of features, including semantic code nav-igation (go to declaration, code outline, etc.), full tool integration (compiler,

builder, debugger), code assistance and completion, and, more recently, variousautomated refactorings. This level of IDE functionality greatly enhances devel-oper productivity[2], and for this reason, aspiring new languages that have IDEswith little more than basic text editing features may quickly find themselves ata serious disadvantage versus the languages where such full-featured IDEs areavailable.

Indeed, the existence of high-quality IDEs for a given language may playa role as important as the quality of the language itself. But as much as it isimportant, implementing an IDE from scratch is also a complex and laborioustask. For this purpose, Eclipse presents a very attractive option: the EclipsePlatform. The Eclipse Platform is an extensible IDE development platform thatoffers[3]:

A comprehensive framework for the development of custom IDEs, providingsupport for generic IDE functionalities.

Integration with complementary development tools, such as the build toolApache Ant, source control systems such as CVS or Subversion, or any othertools available as extensions.

A standard visual and behavioral environment across different languages,and perhaps even the possibility of inter-language integration.

The Eclipse Platform is host to two IDEs that are officially part of Eclipse:JDT4 and CDT5. Both of these have become very popular, particularly JDT,which is recognized as one of the most powerful and feature rich IDEs in exis-tence[4], rivaled only by IntelliJ IDEA, and, to a lesser extent, NetBeans.

1 Eclipse: http://www.eclipse.org2 Visual Studio: http://msdn.microsoft.com/vstudio3 IntelliJ IDEA: http://www.jetbrains.com/idea/4 Java Development Tools, http://www.eclipse.org/jdt5 C/C++ Development Tooling, http://www.eclipse.org/cdt

Creating IDEs for the Eclipse Platform 1
http://www.eclipse.org/http://msdn.microsoft.com/vstudiohttp://www.jetbrains.com/idea/http://www.eclipse.org/jdthttp://www.eclipse.org/cdthttp://www.eclipse.org/cdthttp://www.eclipse.org/jdthttp://www.jetbrains.com/idea/http://msdn.microsoft.com/vstudiohttp://www.eclipse.org/


4/22

The success and continuous growth of JDT and CDT, as well as of the under-lying Eclipse Platform, is notorious, and has attracted many to the developmentof Eclipse-based products. Recent projects have started with the goal of creatingEclipse IDEs for newer languages, such as PHP, Ruby, AspectJ, and others. Un-der this motivation, this survey examines the state of the art in creating IDEswith rich code manipulation features, for the Eclipse Platform (altough manyof the considerations taken here also apply outside of Eclipse). Two particularterms are introduced:

Target Language - The language for which the custom IDE is developed for.Host Language - The language in which the custom IDE is developed. In the

case of Eclipse that is Java.

This paper starts with a small overview of Eclipse and of the two basiccomponents of an IDE, and then explores fundamental considerations for theimplementation of advanced IDE features, particularly in the design of the IDEs

data structures. Basic knowledge of Eclipse (as an user) is assumed.

2 Eclipse Platform

The Eclipse Platform is a comprehensive open-source framework for the devel-opment of IDEs. It offers a rich foundation of building blocks where customIDEs (as well as other kinds of applications) can be built upon and integratedtogether[3].

2.1 Eclipse Architecture

Eclipse is designed and built as a plug-in architecture. A project developed forthe Eclipse Platform consists of a set of one or more plug-ins, where each plug-in represents a logical module of that project, and can depend on other plug-ins. Each plug-in contributes to Eclipse by providing implementations (calledextensions) of well defined extension points, and many of Eclipses componentsare plug-ins themselves (this is represented in Fig. 1). A group of inter-relatedplugins can be grouped in what is called a feature, which allows them to beinstalled and updated together.

2.2 IDE Architecture

Most Eclipse IDEs, regardless of the target language, follow a similar generalarchitecture. They are divided into separate components where each component

is packaged into a plug-in of its own, with a well defined API of interaction.There are usually at least two IDE components: the Core, and User Interface(UI). Additionally, there may be components for debug, the build system, ordocumentation, if their size merit a separate plug-in. Fig. 2 shows one suchbasic IDE architecture.



5/22

Fig. 1. Eclipse Platform Plug-in Architecture

Fig. 2. Basic IDE Architecture



6/22

The core: The core is the brain of the IDE and it is responsible for managingthe domain logic of the projects of the target language. The domain logic is com-posed primarily of two things: the project description, which specifies what arethe projects options, files, etc., and the language model, a structured, semanticrepresentation of the source code of the project. This structure is usually an Ab-stract Syntax Tree (AST), or a derivate of it. It is generated by a language parser(See 3.1), and is a very important data structure, used extensively throughoutthe IDE.

The core is also responsible for building and launching the project, whichusually consists in calling an external compiler with the various project config-uration options. The build system may also collect compiler messages from thisprocess (especially if the compiler is external) and report them back to the user.If this system is complex enough it may sometimes be placed into its own plugin.

UI: The UI is responsible for user interaction and controling the lifecycle and

operations of domain data, such as the language elements or filesystem resources.The UI consists of several components such as the editor, views, outline, actions,wizards, preference pages, etc. The UI is responsible for these components, aswell as the interactions between them and between the Eclipse workbench.

Debug: The debug component is responsible for launching and running thetarget languages programs in a debug environment, providing the functionalityrequired for interactive debugging. Debug may implement a debugger of its own,or it may interface with an external one6. The Debug component may have a UIcomponent of its own, to host debug-specific UI features.

The implementation of the Debug component is not examined in this survey.

3 Core Component

The core is, like the name says, the center of the IDE. In this chapter the basicsubcomponents of an IDE core are examined. In chapter 5 more advanced IDEfeatures are analysed.

3.1 Parser

One of the primary subcomponents of the core is the language parser. Theparsers job is to recognize the language from a source file, while creating ain-memory structured representation of the source code. Aho, Sethi, and Ull-man [5,17] distinguish between two kinds of representations: an abstract syntaxtree (AST) and a parse tree7. The AST differs from the parse tree in that thenodes that are redundant or irrelevant for the program structure are removed or

6 Such as, for example, GNU Debugger (GDB), which is what CDT uses.7 Also called Concrete Syntax Tree (CST).



7/22

simplified, whereas the parse tree is a structure directly or very closely obtainedfrom the parsers grammar productions.

Creating a parser is a task fairly outside the scope of the Eclipse Platform,and as such, Eclipse does not offer direct support for it: the parser is codedindependently of Eclipse. There are several options to do so: pre-existing externalcompiler tools can be used to parse the AST (if such tools exists for the targetlanguage); the parser can be developed from scratch; or the parser can be createdusing a parser generator.

Here is a list (by no means extensive) of some parser generators that generatecode in the Java language:

ANTLR8 - Parser generator developed by Terrence Parr with over 15 years ofdevelopment. Very well supported and documented. Generares LL(*) parsersas of version 3.

JikesPG9 - Parser generator that is part of the Jikes compiler. Fast but notwell documented and no longer actively developed as of 2006. Still, it is the

one used by JDT. JavaCC10 - Java-based LL(1) parser. SableCC11 - Java-based LALR(1) parser generator.

3.2 AST

One of the central data structures that the core must manage is the AST. TheAST, as mentioned before, is created by the parser from a source file, and is astructured, semantic representation of the source code, in tree form.

At the very minimum, the ASTs nodes contain information such as the nodetype, children nodes and their role, and the position in the source code where thenode originates. But since it is the AST structure (and related data types) that

allow for many of the advanced IDE features (like code assist, code refactoring,etc.), then the more of these features one wants the IDE to support, the greaterthe degree of complexity and amount of information that the AST must hold.In section 5 the design issues and functional requisites of the AST required forsuch advanced features are examined in greater detail.

3.3 Project Description

Another important data that the core manages is the project description. Thisdata consists of all the information necessary to describe a project, such as theprojects source folders, files and required libraries (generically called the buildpath12), compiler and build options, compile-time conditionals and variables,etc. This information is used not only to build a project, but some of it is also

8 ANTLR URL: http://www.antlr.org/9 JikesPG URL: http://jikes.sourceforge.net/10 JavaCC URL: http://javacc.dev.java.net11 SableCC URL: http://www.sablecc.org/12 Or classpath in Java-specific terms.



8/22

necessary for other core IDE features. For example, in target languages withstructured module systems13, the build path information is necessary to createa complete language model, because the core needs to know which source filesare part of the project.

3.4 Project Builder

Another core subcomponent is the project builder (or simply builder). Al-though not strictly required, this component is usually present in any non-trivialIDE, as it is better for the user experience if the builder is partially or fully in-tegrated with the IDE, not requiring external action to invoke it.

The builder is responsible for executing whatever steps are necessary forbuilding and compiling the target artefact, such as an executable or a library.Additionally, it may also be in charge of processing the output of the building

process and report any notable messages or events back to the UI. One suchexample are compilation errors and warnings, that are reported back to theEclipse workbench in the form of problem markers14.

Eclipse builders can work in two ways: full or incremental. Incremental build-ers are builders that take into consideration only the elements that have changedsince the last build, and produce the build output without doing a full build.This is capacitated in Eclipse by a mechanism that tracks and collects resourcedeltas and inputs them to the builder when requested.

3.5 Project Nature

Another task of the IDE core, is to define a project nature for the target language.An Eclipse project nature is a mechanism that identifies a workbench projectwith a particular characteristic (such as the projects language). Recall that theEclipse platform can host several IDEs, each targeting their own language, andas such a project can belong to any of these languages. A project nature identifiesto which language the project actually belongs. This allows Eclipse to know forwhich projects it should enable a particular IDE and its respective features (suchas UI elements). A project can have multiple natures, as is the case, for example,of a PDE project, which has a PDE nature as well as a Java nature.

A project nature is also an adequate and common way to attach and configure(and possibly de-attach as well) a builder for that project[6]. When a nature isadded to a project (which usually happens during project creation), then thatnature configures an appropriate builder for the project (and does the inverse in

the case of nature removal).

13 like Java, but not like C/C++.14 Eclipse markers are a mechanism to annotate resources with various informations,

such as compiler messages, to-do list, bookmarks, breakpoints, etc.



9/22

4 UI Integration

The next integration element is the UI component of the IDE. This component

consists of elements such as views, outline, actions and menus, preference pages,wizards, and of course the editor itself. Figure 3 illustrates the various UI el-ements of the Eclipse workbench. Most of these UI subcomponents are simpleand straightforward to implement, and one can easily learn how to create themby looking at implementation examples or the respective documentation. Butsome subcomponents are a bit more intricate, namely the source code editorand its underlying text framework, so an overview of this framework and itscustomization mechanism is given.

Fig. 3. Eclipse Workbench UI

4.1 The editor

An Eclipse editor is a contribution to the org.eclipse.ui.editors extensionpoint, together with a class implementation of the org.eclipse.ui



10/22

.IEditorPart interface, which specifies the required interface for any kind ofeditor, be it a text editor, a visual editor, a multi-page editor, etc. To create acustom editor, one could implement this interface from scratch, but that is notthe recommend approach for source code editors, since Eclipse provides us withpre-existing abstract implementations for this kind of editor. Namely, there is theclass org.eclipse.ui.texteditor.AbstractTextEditor for generic text edi-tors, and its subclass org.eclipse.ui.texteditor.AbstractDecoratedTextEditor for source code editors.

An AbstractTextEditor has two main components of interest, the sourceviewer (org.eclipse.jface.text.source.SourceViewer) and the document(org.eclipse.jface.text.IDocument). The SourceViewer is a JFace adapterthat wraps the raw SWT15 widget StyledText into a more high-level abstrac-tion. The IDocument instance represents the document which is the input to theeditor and source viewer. These components are structured in a Model-View-Controller (MVC) design. The IDocument instance is the domain Model (which

wraps both language model information as well as textual information). TheSourceViewer is the Controller (despite its name) and the View is the under-lying StyledText widget. The SourceViewer is then responsible for updatingthe views presentation according to changes in the input document and vice-versa. The editor is integrated tightly with these MVC components, and providesa layer of interaction between them and the Eclipse workbench context, mean-ing it controls the contributions and interactions with the Eclipse workbench.Eclipse editors follow an open-save-close lifecycle.

4.2 IDocument

An IDocument is a JFace component which holds the textual data of its under-lying input source (usually a file). It provides support for:

Text Manipulation - Modifying the underlying source text. Line Information - Providing the line number for a given offset. Positions - Maintaining information about several positions of interest in the

document, and updating them during document changes (these positions areactually ranges).

Partitions - Maintaining information about document partitions. Document change listeners and document partition change listeners - Keep-

ing a list of document and partition change listeners, and notifying themwhen their respective events occur.

An IDocument allows itself to be divided into several non-overlapping regionscalled partitions, where each partition has an associated partition type (alsocalled content type). These partitions represent logical divisions in the docu-ments text (such as comments or strings in a source code file), and are usedin the configuration of several features of the editor and SourceViewer. For ex-ample, features such as syntax highlighting and code completion are configured

15 Simple Widget Toolkit - the GUI widget set used by Eclipse.



11/22

differently for each partition type. An IDocument is set up with an instance ofa partitioner, a class that knows how to calculate the documents partitions, aswell as update them when there are document changes.

An IDocument instance is not directly provided to an editor. Instead the ed-itor uses an org.eclipse.jface.text.IDocumentProvider implementation tocreate an IDocument instance from the given editor input. For example, whenthe input is a file (as is the usual case), the IDocumentProvider acts as a per-sistance manager, and is responsible for loading and saving the IDocument fromthe the filesystem (the storage medium), keeping track of already created doc-uments, and notifying interested listeners of changes in the filesystem. Eclipseprovides a default implementation for this kind of document provider, which isorg.eclipse.ui.editors.text.FileDocumentProvider.

An editors IDocumentProvider is also responsible for creating the annota-tion model from the editor input, as well as setting up the partitioner for thecreated editor document. For this reason the IDocumentProvider is one of the

first entry points for editor customization, usually done by subclassing a classsuch as FileDocumentProvider and extending the appropriate methods.

4.3 Source Viewer

The SourceViewer is the editors main component, as it is responsible for theeditors document presentation and editing features. The SourceViewer ab-stracts away many of the base functionality common to source code editors,thus offering an extensive component for customization of common editor fea-tures (such as syntax highlighting, code completion, hovering, etc.). To imple-ment custom editor features, one subclasses not the SourceViewer itself, but theorg.eclipse.jface.text.source.SourceViewerConfiguration class, a com-ponent of a SourceViewer, which is the central point for editor customizations.

The editors SourceViewer is configured with the customSourceViewerConfiguration, when the editor is initialized.

SourceViewerConfiguration contains a series of getter methods, each re-sponsible for one particular editor feature. To implement such a feature, oneoverrides the respective getter method, making it return a custom class that willfurther control how the feature will operate. The available features to implementare:

Syntax Highlighting - Highlights regions of the text according to a damage-repair mechanism and a token scanner.Method: IPresentationReconciler getPresentationReconciler()

Text Hovers - Displays small tooltips in the editor area presenting information

about the current selection or the element the mouse is pointing to. Usuallythis information is the documentation comments of methods, classes, etc.Method: ITextHover getTextHover()

Auto Edits - Configures various auto edits, which are automatic textual changesthat the editor performs, like adding a closing brace or quotes when the open-



12/22

ing one is typed.Method: IAutoEditStrategy[] getAutoEditStrategies()

Code Completion - Displays a list of possible completions for a method, vari-able or class that the user is typing, according to the partial name alreadyentered. This same extension option is responsible for code templates, whichare pre-defined pieces of code that are inserted into the source, when a codetemplate is requested to be inserted during code completion (code templatesappear in the same popup list as code completion proposals).Method: IContentAssistant getContentAssistant()

Double-Click Selection - Enables language-aware selections whendouble-clicking in the editor. This feature can select identifiers (accordingto what the language considers identifiers), the text range between braces orparenthesis, or any other selection deemed useful.Method: ITextDoubleClickStrategy getDoubleClickStrategy()

Hyperlink detection - Detects if what the mouse is pointing to in the texteditor is a hyperlink or not, and if so, what to do if the user requests tofollow the link. The link can be language neutral links like URL addressesinside strings or comments, or language specific links such as a link froman entity reference to its definition (an alternative way to invoke the Go ToDefinition action).Method: IHyperlinkDetector[] getHyperlinkDetectors()

Code Formatting - Defines a formatter for the editor. The formatter reorga-nizes certain textual elements of the source file, such as indentation, spaces,newlines, etc., in order to make the code prettier and more presentable tothe user.Method: IContentFormatter getContentFormatter()

Quick Fixes - Quick fixes are a feature, similar to code completion, where theuser selects a problem in the project (such as a compilation error, namemismatch, type conflict, etc.), and is presented with a list of fix proposalsfor that given problem. There can be multiple fix proposals for each problem,being up to the user to choose one to be applied.Method: IQuickAssistAssistant getQuickAssistAssistant()

Model Reconciler - This configuration option allows one to specify a classthat will be in charge of reconciling the language model (such as the ASTor related structures) whenever textual changes occurs (such as the usertyping new characters). The reconciler runs in an asynchronous way, andcan be configured to operate in either incremental or non-incremental mode.

In incremental mode the reconcilier collects the textual changes since thelast update and creates a textual delta that is used to update the model.In non-incremental mode, the update is simply performed with the wholesource file. See also Sect. 5.7.Method: IReconciler getReconciler()



13/22

5 Advanced Concepts

The cores data structures are the major point of support for most IDE func-tionalities. This section now describes various design issues and functional re-quirements necessary for advanced IDE features. Most of these considerationsare Eclipse-independent.

5.1 Basic AST Design

The first point to note about the AST (especially if one is trying to reuse existingcompiler tools to parse the AST), is that the AST design must preserve allsource code information relevant to the user, whereas a compiler only needsto work with the information necessary to generate compiled code. This meansthat language constructs such as comments or preprocessor directives, which cansafely be ignored in a compiler AST, should be recorded in some way in the data

structures generated for IDE usage. That is, the AST and overall language modelof an IDE should be at the same abstraction level of that which is the user viewof the source code. Code formatting is an example of one such feature which isdifficult or impractical to implement without an adequate level of informationin the IDEs data structures.

Here is some of the basic information an AST node should have:

Parent node: A link to the parent node. This allows traversing the tree in anupward as well as downward direction.

Source range: The source range is the start position and end position in thesource code where the node appears, usually coded as a character offsetand length. This information allows the user to navigate back into the orig-inal source text, and is used for selections and other features. This range

should start with the first significant character of the node, and end withthe last significant character. For maximum usefulness, the range should alsobe present in an all nodes, and be recursively consistent, such that the rangesof a nodes children should be non-overlapping, and contained in their parentnodes range.

Compilation unit: To which compilation unit (if any) this AST node belongsto.

5.2 AST - Homogenous vs. Heterogenous tree

One design aspect of the AST is whether the tree should be a homogenous orheterogenous tree. A homogenous tree is one where there is only one class for allthe nodes. In a heterogenous tree there are different classes for each node type,

altough they still share a common parent class (usually named ASTNode, as JDTdoes).

Homogenous trees are simpler and faster to implement, and are easier totraverse, but heterogenous trees make it easier to work with the particularitiesof each type of tree node, and so are ultimately considered as the most adequate



14/22

choice (both JDT, CDT, and nearly all non-trivial custom IDEs use heterogenoustrees).

To make traversing heterogenous tree nodes simpler, a Visitor pattern isusually employed, allowing concrete AST visitors to dynamically dispatch todifferent methods according to the type of a node. JDT introduced some usefuladditions to the basic visitor pattern([7], chapter 33): First, the visitor actu-ally has two visit methods, visit() and endVisit(). visit() is called whendescending into a node, and endVisit() is called after the nodes children arevisited. visit() returns a boolean controlling whether the visitor should actu-ally descend into the nodes children or not. Second, there are also two genericvisit methods, preVisit() and postVisit(), which are used to traverse thenode in a non-type specific way, as opposed to the normal visitor pattern. Theoverall invocation order is then[8]:

1. preVisit(ASTNode node)2. visit( node)3. Now visit the nodes children if visit() return true.4. endVisit( node)5. postVisit(ASTNode node)

5.3 Parser

A basic language parser is not too difficult to implement, however, there arecertain IDE features that may require special parser capabilites not present in asimple parser. Two such capabilites are error-recovery and partial parsing.

Error-recovery is the ability for the parser to generate an incomplete, but stillmeaningful AST tree in the presence of syntax errors. This becomes importantin keeping the IDE functional in the moments when the user is editing a source

file and has an incomplete, syntatically invalid file. When the parser is capable ofdoing error recovery, the AST is usually annotated with information of whetherthe AST node come from sintatically valid source or not.

Partial parsing is the ability to parse only a segment of the source file, wherethis segment corresponds to a complete syntax element, such as a statement ordeclaration. This ability becomes useful when one wants to obtain informationabout a given language element (like a function declaration for example), butdoes not want to re-parse the whole compilation unit. Note that doing partialparsing requires knowing beforehand the source range of the element one wantsto parse.

5.4 Entity References

In an AST, many of the AST nodes represent references to other named languageelements. As such, a key aspect of the language model is the ability to, givena reference in the code, find the referenced entity. These references are calledbindings in JDT, and they are used by many other IDE features. A basic exampleis the Go To Definition operation, where given a selected type or variable



15/22

reference, the cursor and editor focus is set to the location where the referencedentity is defined. A more advanced example is the inverse operation: given anentity definition, find all references to that entity.

In most languages, an entity reference is usually restricted in the kind ofelements it can refer to, such as a type kind or a variable kind. For example, aname reference appearing in a mathematical expression should refer to a variableentity, and not a type entity. It is useful to specify these restrictions in the ref-erence structures of the language model. The references may also be categorizedin more fine-grained types, according to the roles they play in the nodes theyappear[8]. Here are some possible roles, taken from JDTs bindings:

1. Name reference - An explicit name reference in an AST to a type, function,variable, etc.

2. Type reference - the type of an AST expression.3. SuperType reference - the super type of a given type.4. Thrown exceptions - the expressions thrown by an expression or statement.5. Members reference - the members of a given type.

An AST node can have more than one such reference. For instance, an ex-pression node can have a reference to the type of the expression, as well as aname reference to a variable present in the expression. Usually the validity ofreferences ends when there are model changes, as is the case in JDT[9].

5.5 DOM AST

An advanced AST technique is to design the AST in the form of a DocumentObject Model (DOM). The term DOM comes the XML/HTML DOM, and isdefined as a platform and language-neutral interface, that defines a standardmodel of the logical structure of a document, as well as a standard interface

for accessing and manipulating that structure[5,18]. The term DOM has beengeneralized from XML to any structure that fits that definition, such as a DOMAST. The benefits of a rich AST manipulation mechanism such as a DOM ASTare manifest in AST manipulation operations, particularly refactoring operationswhich are usually complex in nature.

Designing an AST with DOM capabilities means the following in terms ofcode: Instead of just coding the children of an AST node as class fields of thatnode, such as this:

class WhileStatement extends ASTNode {

Expression condition;

Statement body;

...

one also defines in the node class certain structural information (called structuralproperties in JDT[8], just as in general XML DOM parlance), that for each ofthe nodes elements describe certain attributes such as the name of the element,whether the element is mandatory or not, the kind of element (single children,



16/22

list of children, or simple property), etc. Figure 4 shows an example of the struc-tural properties of a Java method declaration, according to JDT. This structuralinformation allows the IDE to conveniently and dynamically inspect, as well asmanipulate, the structure of an AST. This functionality is similar to languagereflection, and in fact the reflection capabilites of Java, the host language, wouldallow to do this in a certain degree, but the DOM mechanism is much moreadequate.

Fig. 4. Structural properties of a Java method declaration

5.6 Model scalabilityAn important issue with the language model, is scalability. Creating a completemodel for a project, such as creating AST nodes for the source files, implies creat-ing structures with expensive memory footprints. This can become problematicif a project is large and has many source files.



17/22

Most IDE operations, like the outline view, navigator view, code completion,etc., revolve around the manipulation of named language elements only (suchas classes, methods, variables, etc.) and do not descend into more precise ASTelements such as statements, expressions, and their various children. In fact, anAST is a too fine-grained structure for most IDE operations, and its use in suchsituations is a significant memory waste.

Realizing this, JDT has separate data structures for named language ele-ments, the set of which is called the Java Model, and which is used instead ofAST nodes for many IDE features. In fact, full ASTs are usually only used instructured code manipulation and editing (such as in refactoring). These datastructures are lightweight: they have minimal info, such as the name of the ele-ment, the elements parent, the children, and the source range (if applicable). InJDT, the Java Model also includes named language elements that exist outsideof source code text, such as packages, and the compilation units themselves.

Even being much lighter than the AST structure, computing this name model

for the whole project might still be too expensive. As such, an additional strategythat can be employed is the use of lazy-loading[10] (which again is the case inJDT). Instead of computing a model for the whole project, initially only thefiles in which the user is working on have their element information computed.If during the course of IDE interaction some operations (such as opening a newcompilation unit, or requesting completion assistance) require information ofanother language element, then only then will the full element information beretrieved. This is called opening the element.

This lazy loading mechanism is then complemented with caching of the openelements, so that when a certain limit of open elements is reached, a cachemanager tries to close some of the open ones, so as to keep the overall mem-ory footprint from growing during the course of time. In JDT this cache is anunbounded Least Recently Used (LRU) cache.

5.7 Model updating

Another important aspect of the IDE architecture is the model update behavior.Since an IDE is interactive, the user will often be modifying the source code,which will cause the underlying language model to become outdated. The modelwill then need to be re-calculated to reflect the latest changes. This process canbe done in several ways:

The model is updated on each project build or file save. The model is updated on the fly, as the user alters the text file. The model is updated on a fixed interval, like 500ms, or 200ms.

The first alternative is the most simple behavior, and the easiest to implement,but it is also the least satisfactory one, as the user must manually invoke a buildor save for the model to be updated.

In the second alternative the model is updated constantly, with each newkeystroke. This may seem the ideal behavior, but it has a notable problem:



18/22

parsing is a costly operation, and performing it with each new keystroke will taxthe Eclipse UI to the point where it will become sluggish or unresponsive. Forthis behavior to work properly, a very fast and scalabe parser is necessary, whichusually means the parser needs to be incremental[10] (i.e., able to create the newmodel without re-parsing the whole file). However, building such a parser is avery complicated task. Not even JDT updates the model in this way. Instead, thethird alternative, which is to update the model periodically on a small interval,is the most common and adequate method for most languages. The model isupdated on a background thread, so that even in the moments it is active theUI wont be slowed down.

Eclipse provides some support for any of these two behaviors, in the form ofwhat is called the source Reconcilier (see the SourceViewer Model Reconcileroption, Sect. 4.3).

5.8 Refactoring

Refactoring is the process of altering existing code in order to improve its read-ability or internal structure, but without the explicit intent of altering its exter-nal behavior[11]. Examples of refactoring operations are: renaming or moving amethod or variable; extracting a method; extracting a superclass; etc.

Refactoring is regarded as a highly valuable software development technique,but its benefits are hard to realize without the support of automated refactoringtools[12]. Performing a refactoring manually requires various tedious code alter-ations, as well as the creation of a suite of tests to ensure that the refactoringchange was performed correctly and did not introduce a bug [11]. One of the firstlanguages to support automated refactoring was Smalltalk, with its RefactoringBrowser tool. According to Opdyke (in [11], Chapter 14), this tool, which ini-tially was a stand-alone tool, was actually mostly ignored until it was integratedinto the Smalltalk development tool. This illustrates the importance of havingthis feature in an IDE.

Language Toolkit The vast support in JDT for various automated refac-torings of Java code, has made this technique popular amongst Java develop-ers, and has significantly raised the bar for the code manipulation support ofother IDEs, Eclipse-based or not. Recognizing the importance of refactoring,the JDT team has abstracted the generic functionalities for refactoring into alanguage neutral layer of the Eclipse Platform, which is called the LanguageToolkit (LTK), and can be found in the org.eclipse.ltk.core.refactoringand org.eclipse.ltk.ui.refactoring plug-ins. These plugins offers genericrefactoring support in three main aspects: refactoring lifecycle, refactoring par-

ticipants, and UI support[13]:

Refactoring lifecycle - LTK provides a framework which controls the lifecycleof a refactoring, where each phase of the lifecycle must be explicitly definedby each new refactoring operation. The phases are:



19/22

1. Checking the validity of initial conditions, such as if the selected ele-ment supports the requested refactoring operation, or if the source fileis writable.

2. Requesting additional user input, if necessary.3. Checking for validity of final conditions, so as to ensure the refactoring

can be applied correctly (i.e., without altering program behavior).4. Create the set of textual changes of the refactoring.5. Display a preview of the changes to the user so that he may review them

and confirm the refactoring application.

Refactoring participants - Sometimes a code entity can be referred in dis-parate language domains, such as a Java variable being used in a JSP file,or a Java JNI16 method which is linked to a C function. Usually an IDE isonly aware of one of these contexts, and consequently only knows how toperform refactoring on that context. Other contexts must be updated seper-ately. LTK provides support for what it calls the refactoring participants,

a mechanism that allows other plug-ins to selectively take part in refactor-ing operations via plug-in extensions, thus effectively enabling automatedrefactoring across language domains.

UI component for refactoring - Finally, LTK presents UI support for a refac-toring operation, which consists of an UI wizard dialog, that controls theadvance of each step of the refactoring, provides user input, and presents apreview dialogue that displays the textual changes that the refactoring willperform (see Fig. 5).

Of course, the actual logic of the refactoring operations must be implemented bythe IDE developer, for each desired operation. Fowler details in his book[11] thenecessary steps to perform the most common refactoring operations in ObjectOriented code.

6 Conclusions

The development tools that a language has available, particularly the IDE, areof vital importance to the productivity and success of that language. The EclipsePlatform is an extensible IDE development framework that by abstracting awaycommon IDE functionality, offers great potential to those interested in creatingan IDE for a new language. Learning each aspect of the Eclipse Platform maytake a while, but in the end, it will surely compensate in terms of the developmentwork saved, and in the infrastructure and functionality provided by Eclipse.

A modern IDE is expected to have several features, such as a code editor,syntax highlighting, a file/project explorer, outline, program builder, code com-pletion, debugger, code formatting, and in advanced IDEs, refactoring. All ofthese features have some degree of support in the Eclipse Platform framework.

16 Java Native Interface, a protocol which allows for Java methods to be implementedby C functions.



20/22

Fig. 5. Refactoring Preview Dialog



21/22

To enable that support, it is necessary to extend the framework with implemen-tations of language-specific parts of the IDE.

An Eclipse-based IDE has two main components, the Core and the UI. TheCore is the most extensive component, since the majority of it is composed oflanguage-specific functionality. The UI on the otherhand, has most of its basefunctionality enabled by the Eclipse Platform, as most UI elements are commonto any IDE.

The implementation of the Core is then a central task in IDE creation, and afundamental aspect of this task is the creation of a model that represents the tar-get languages projects and source code. This model must be increasingly moreadvanced and detailed as more advanced IDE features are to be supported. Forexample, navigation features such as Go To Definition or its reverse, requirea proper code reference management mechanism. Refactoring on the other handrequires a rich model manipulation mechanism. The IDE core must also take intoconsideration some important non-functional requirements, such as performance

and scalabity, which in turn also impose that certain specific functionality beadded to the model. Finally, as the result of all these requirements, the IDEsmodel may ultimately be divided in diferent structures, each specialized for aparticular use, such as an AST structure for code manipulation, and lightweightelement references for navigation/assistance of named language elements.



22/22

References

1. G. Booch, A. Brown: Collaborative Development Environments. Advances in Com-puters, Vol. 59, Academic Press, (2003). (http://www.booch.com/architecture/blog/artifacts/CDE.pdf)

2. J. des Rivieres, J. Wiegand: Eclipse: A platform for integrating development tools.IBM Systems Journal, Vol. 43, No. 2, pp. 371383, (2004). (http://researchweb.watson.ibm.com/journal/sj/432/desrivieres.html)

3. Eclipse Platform Technical Overview. Eclipse Corner Whitepaper(2006). (http://www.eclipse.org/articles/Whitepaper-Platform-3.1/eclipse-platform-whitepaper.html)

4. G. Goth: Beware the March of This IDE - Eclipse Is Overshadowing OtherTool Technologies: IEEE Software, Vol. 22, No. 4, pp. 108111, (2005). (http://ieeexplore.ieee.org/xpls/abs all.jsp?arnumber=1463218)

5. J. Arthorne, C. Laffra: Official Eclipse 3.0 FAQs. Addison-Wesley, (2004).6. E. Clayberg, D. Rubel: Eclipse: Building Commercial-Quality Plug-Ins. Addison-

Wesley, (2004) (http://www.qualityeclipse.com)7. E. Gamma, K. Beck: Contributing to Eclipse: Principles, Patterns, and Plug-Ins.

Addison-Wesley, (2003)8. Kuhn T., Thomann O. : Abstract Syntax Tree. Eclipse Corner Articles, (2006).

(http://www.eclipse.org/articles/Article-JavaCodeManipulation AST/index.html)

9. M. Aeschlimann, D. Baumer,J. Lanneluc: Java Tool Smithing - Extending theEclipse Java Development Tools. In EclipseCon 2005 presentation, (2005). (http://eclipsecon.org/2005/presentations/EclipseCON2005 Tutorial29.pdf)

10. P. Deva: Create a commercial-quality Eclipse IDE, Part 1, 2 and 3.IBM developerWorks, (2006). (http://www.ibm.com/developerworks/edu/os-dw-os-ecl-commplgin1.html)

11. M. Fowler, K. Beck, J. Brant, W. Opdyke, D. Roberts: Refactoring - Improvingthe design of existing code. Addison-Wesley, (2004).

12. W. Opdyke: Refactoring Object-Oriented Frameworks. Ph.D. thesis, Uni-versity of Illinois, (1992). (ftp://st.cs.uiuc.edu/pub/papers/refactoring/opdyke-thesis.ps.Z)

13. Frenzel L. : The Language Toolkit: An API for Automated Refactorings in Eclipse-based IDEs. Eclipse Corner Articles (originally in Eclipse Magazin, Vol. 5, January2006), (2006). (http://www.eclipse.org/articles/Article-LTK/ltk.html)

14. The Eclipse website at www.eclipse.org (2006).15. The Eclipse JDT source code at CVS://:pserver:[email protected]:

/cvsroot/eclipse (2006).16. Eclipse Help - Platform Plug-in Developers Guide. http://help.eclipse.org/

help31/index.jsp (2006).17. A. Aho,R. Sathi, J. Ullman : Compilers, Principles, Techniques, and Tools.

Addison-Wesley, (1986).18. W3C (World Wide Web Consortium): Document Object Model (DOM) Level

1 Specification. W3C Technical report, (1998) .(

http://www.w3.org/TR/1998/

REC-DOM-Level-1-19981001/ )

http://www.booch.com/architecture/blog/artifacts/CDE.pdfhttp://www.booch.com/architecture/blog/artifacts/CDE.pdfhttp://researchweb.watson.ibm.com/journal/sj/432/desrivieres.htmlhttp://researchweb.watson.ibm.com/journal/sj/432/desrivieres.htmlhttp://www.eclipse.org/articles/Whitepaper-Platform-3.1/eclipse-platform-whitepaper.htmlhttp://www.eclipse.org/articles/Whitepaper-Platform-3.1/eclipse-platform-whitepaper.htmlhttp://www.eclipse.org/articles/Whitepaper-Platform-3.1/eclipse-platform-whitepaper.htmlhttp://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=1463218http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=1463218http://www.qualityeclipse.com/http://www.eclipse.org/articles/Article-JavaCodeManipulation_AST/index.htmlhttp://www.eclipse.org/articles/Article-JavaCodeManipulation_AST/index.htmlhttp://www.eclipse.org/articles/Article-JavaCodeManipulation_AST/index.htmlhttp://www.eclipse.org/articles/Article-JavaCodeManipulation_AST/index.htmlhttp://eclipsecon.org/2005/presentations/EclipseCON2005_Tutorial29.pdfhttp://eclipsecon.org/2005/presentations/EclipseCON2005_Tutorial29.pdfhttp://www.ibm.com/developerworks/edu/os-dw-os-ecl-commplgin1.htmlhttp://www.ibm.com/developerworks/edu/os-dw-os-ecl-commplgin1.htmlhttp://www.ibm.com/developerworks/edu/os-dw-os-ecl-commplgin1.htmlftp://st.cs.uiuc.edu/pub/papers/refactoring/opdyke-thesis.ps.Zftp://st.cs.uiuc.edu/pub/papers/refactoring/opdyke-thesis.ps.Zhttp://www.eclipse.org/articles/Article-LTK/ltk.htmlhttp://www.eclipse.org/articles/Article-LTK/ltk.htmlhttp://www.eclipse.org/http://cvs//:pserver:[email protected]:/cvsroot/eclipsehttp://cvs//:pserver:[email protected]:/cvsroot/eclipsehttp://help.eclipse.org/help31/index.jsphttp://help.eclipse.org/help31/index.jsphttp://www.w3.org/TR/1998/REC-DOM-Level-1-19981001/http://www.w3.org/TR/1998/REC-DOM-Level-1-19981001/http://www.w3.org/TR/1998/REC-DOM-Level-1-19981001/http://www.w3.org/TR/1998/REC-DOM-Level-1-19981001/http://help.eclipse.org/help31/index.jsphttp://help.eclipse.org/help31/index.jsphttp://cvs//:pserver:[email protected]:/cvsroot/eclipsehttp://cvs//:pserver:[email protected]:/cvsroot/eclipsehttp://www.eclipse.org/http://www.eclipse.org/articles/Article-LTK/ltk.htmlftp://st.cs.uiuc.edu/pub/papers/refactoring/opdyke-thesis.ps.Zftp://st.cs.uiuc.edu/pub/papers/refactoring/opdyke-thesis.ps.Zhttp://www.ibm.com/developerworks/edu/os-dw-os-ecl-commplgin1.htmlhttp://www.ibm.com/developerworks/edu/os-dw-os-ecl-commplgin1.htmlhttp://eclipsecon.org/2005/presentations/EclipseCON2005_Tutorial29.pdfhttp://eclipsecon.org/2005/presentations/EclipseCON2005_Tutorial29.pdfhttp://www.eclipse.org/articles/Article-JavaCodeManipulation_AST/index.htmlhttp://www.eclipse.org/articles/Article-JavaCodeManipulation_AST/index.htmlhttp://www.qualityeclipse.com/http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=1463218http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=1463218http://www.eclipse.org/articles/Whitepaper-Platform-3.1/eclipse-platform-whitepaper.htmlhttp://www.eclipse.org/articles/Whitepaper-Platform-3.1/eclipse-platform-whitepaper.htmlhttp://researchweb.watson.ibm.com/journal/sj/432/desrivieres.htmlhttp://researchweb.watson.ibm.com/journal/sj/432/desrivieres.htmlhttp://www.booch.com/architecture/blog/artifacts/CDE.pdfhttp://www.booch.com/architecture/blog/artifacts/CDE.pdf

ii survey final

Documents