1
Applying Patterns for Reengineering to the Web
Author: Uwe Zdun
Affiliation:
New Media Lab, Department of Information Systems, Vienna University of Economics, Austria
Address:
Vienna University of Economics
Department of Information Systems
Uwe Zdun
Augasse 2-6
1090 Wien
Austria
Phone: ++43 1 313 36 4796
Fax: ++43 1 313 36 746
Email: [email protected]
2
Applying Patterns for Reengineering to the Web
ABSTRACT
Today reengineering existing (legacy) systems to the web is a typical software
maintenance task. In such projects developers integrate a web representation with
the legacy system’s API and its responses. Often the same information is provided to
other channels than HTTP and in other formats than HTML as well, and the old
(legacy) interfaces are still supported. Add-on services such as security or logging
are required. Performance and scalability of the web application might be crucial.
To resolve these issues, many different concepts and frameworks have to be well
understood, especially legacy system wrapping, connection handling, remoting,
service abstraction, adaptation techniques, dynamic content generation, and others.
In this chapter, we present patterns from different sources that resolve these issues.
We integrate them to a pattern language operating in the context of reengineering to
the web, and present pattern variants and examples in this context.
Keywords: IS Maintenance, Legacy Migration, Reengineering, Software Patterns,
Software Architecture, Object Oriented Design.
INTRODUCTION
Many existing software systems have to be migrated to the web. That means, the legacy system
gets an interactive web interface, typically in addition to its existing interfaces. This web
interface forwards web requests to the legacy system, decorates the responses of the system with
HTML markup, and sends the results to the web client. Inputs are handled via the web browser,
say, using HTML links and forms. For small applications building a web interface is quite
simple; however, for larger, existing systems there are some typical issues, including:
- high flexibility and adaptability of the web interface,
- reuse of the presentation logic,
- high performance,
- preserving states and sessions,
3
- user management, and
- serving multiple channels.
That means, implementing a web interface for a larger, existing legacy system is a non-trivial
task that is likely to be underestimated. This underestimation, in turn, results in unexpected
maintenance costs and development times. However, there are many successful projects that have
avoided the common pitfalls.
Software patterns are a way to convey knowledge from successful solutions. To tackle the
issues named above, we present patterns from various pattern languages and pattern catalogs that
can be applied in the context of reengineering to the web. In particular, we discuss connection
and invocation handling, legacy system wrapping and adaptation, dealing with asynchronous
invocations, concurrency handling, service and channel abstraction, session management, and
dynamic content generation and conversion. The patterns are originally presented in different
contexts; in this chapter we present them in variants specific for reengineering to the web. This
way we build up a pattern language for reengineering to the web, consisting of patterns already
documented elsewhere in related, but yet different, contexts.
The remainder of this chapter is structured as follows. In the next section we give a brief
problem outline, and then we briefly explain the concepts ’pattern’ and ’pattern language.’ Next,
we discuss connection and invocation handling in web applications that connect a legacy system
to the web. Then we introduce solutions for legacy system wrapping, adaptation, and decoration.
In the following section we discuss how to provide the services of a legacy system to more than
one channel. Also we illustrate how to provide sessions and state in web applications despite the
HTTP protocol being stateless. Next we discuss content generation, representation, and
conversion for the web. The following section deals with the integration of typical add-on
services, such as security or logging. Finally, we give an overview and discussion of the pattern
language that we have explained incrementally in the preceding sections, provide a case study,
and conclude.
PROBLEM OUTLINE
In the context of reengineering to the web, a legacy application gets an (additional) interactive
web interface that decorates the outputs of the system with HTML markup and translates the (e.g.
form-based) inputs via the web browser into the (legacy) system’s APIs. At first glance, this
4
seems to be a conceptually straightforward effort, though it might turn out to be a lot work for
larger systems.
In Figure 1 we can see a simplistic three-tier architecture for interactive, web-based
applications. A web user agent, such as a browser, communicates with a web server. The web
server “understands” that certain requests have to be handled interactively. Thus, it forwards the
request and all its information to another module, thread, or process that translates the HTTP
request to the legacy system’s APIs. An HTML decorator builds the appropriate HTML page out
of the system’s response and trigger the web server to send the page to the web client.
WebServer
incoming request
LegacySystem
RequestDecoder
call conforming tolegacy system's APIs
APIs
HTMLDecorator legacy system's
responseresponse in HTML
Figure 1: Simplistic Three-Tier Architecture for (Re-)engineering to the Web
Using this simple architecture for a given legacy system, we can use a simple process model for
migrating to the web. In particular the following steps have to be performed in general. Of course there are
feedback loops during these steps, and they are not in any particular order:
- Providing an Interface API to the Web: To invoke a system’s functions/methods and
decorate them with HTML markup, we first have to identify the relevant components
and provide them with distinct interfaces (which may, in the ideal case, exist already).
These interface have to satisfy the web engineering project’s requirements. To find
such interfaces, in general, for each component there are two possibilities:
o Wrapper: The developer can wrap the component with a shallow wrapper
object that just translates the URL into a call to the legacy system’s API. The
wrapper forwards the response to an HTML decorator or returns HTML
fragments.
o Reengineered or Redeveloped Solution: Sometimes it is suitable to reengineer
or redevelop a given component completely, so that it becomes a purely web-
5
enabled solution. For instance, this makes sense, when the legacy user interface
does not need to be supported anymore.
- Implementing a Request Decoder: A component has to map the contents of the URL
(and other HTTP request information) to the legacy system’s API or the wrapper’s
interface respectively.
- Implementing an HTML Decorator Component: A component has to decorate the
legacy system’s or wrapper’s response with HTML markup.
- Integrating with a Web Server: The components have to be integrated with a web
server solution. For instance, these components can be CGI-programs, run in different
threads or processes, may be part of a custom web server, etc.
In our experience, this architecture is in principal applying for most reengineering to the web
efforts and for many newly developed web-enabled systems (that have to serve other channels
than HTTP as well). However, this architecture and process model do not depict the major
strategic decisions and the design/implementation efforts involved in a large-scale web
development project well. There are many issues, critical for success, that are not tackled by this
simplistic architecture, including: technology choices, conceptual choices, representation
flexibility and reuse, performance, concurrency handling, dealing with asynchronous invocations,
preserving states and sessions, user management, and service abstraction.
In this chapter, we will survey and categorize critical technical aspects for reengineering to the
web using the patterns of successful solutions. Thus we will step by step enrich the simple
architecture presented above to gather a conceptual understanding which elements are required
for reengineering a larger system to the web. Moreover, this way we will build up a framework to
categorize web development environments and packages. An important goal is to assemble a
feature list and terminology for mapping requirements to concrete technological and conceptual
choices so that new projects can easier decide upon used frameworks.
PATTERNS AND PATTERN LANGUAGES
In this chapter, we present a pattern language for reengineering to the web. In short, a pattern is a
proved solution to a problem in a context, resolving a set of forces. Each pattern is a three-part
rule, which expresses a relation between a certain context, a problem, and a solution [Alexander,
6
1979]. In more detail, the context refers to a recurring set of situations in which the pattern
applies. The problem refers to a set of goals and constraints that typically occur in this context,
called the forces of the pattern. The forces are a set of factors that influence the particular
solution to the pattern’s problem. A pattern language is a collection of patterns that solve the
prevalent problems in a particular domain and context, and, as a language of patterns, it
specifically focuses on the pattern relationships in this domain and context. As an element of
language, a pattern is an instruction, which can be used, over and over again, to resolve the given
system of forces, wherever the context makes it relevant [Alexander, 1979].
Our pattern language uses patterns that have been documented in the literature before, either as
“isolated” patterns or in other pattern languages. In the pattern language presented, we
specifically use these patterns in the context of reengineering to the web. That means, we
describe the patterns in this particular context. Note that there is a difference to the general
purpose use of these patterns.
We present the patterns within the explanations of the reengineering solutions, where the
patterns occur the first time. Each pattern is presented with a problem section, followed by a
solution which starts with the word “therefore.” We always cite the original pattern descriptions;
refer to them for more details on the particular patterns.
CONNECTION AND INVOCATION HANDLING
There are many different kinds of frameworks that can be used to access objects running in a
remote server, such as distributed object frameworks (like CORBA, DCOM, RMI), web
application frameworks, or object-oriented messaging protocols. In this chapter, we commonly
refer to these kinds of frameworks as remoting frameworks.
Web application frameworks, such as Java Servlets and JSP, Apache and Apache Modules
[Thau, 1996], AOL Server [Davidson, 2000], TclHttpd [Welch, 2000], WebShell [Vckovski,
2001], Zope [Latteier, 1999], or ActiWeb [Neumann and Zdun, 2001], provide a pure server-side
remoting framework based on web servers. That means, invocations sent with web requests are
interactively handled by code running in the web server. Often for reengineering to the web an
existing web application framework can be reused. In some cases it makes sense to implement a
custom invocation framework on top of a web server, for instance, if the performance of existing
7
web application frameworks causes problems or if it is not possible to access or control required
parameters in the HTTP header fields.
On client side, in most cases, the web browser is used as the primary client. If other programs
are used as clients (or if a web browser should be implemented), an HTTP client suite has to be
chosen (or built) as well. A fundamental difference of web application frameworks in comparison
to other kinds of remoting frameworks, such as distributed object frameworks, is that the client
(e.g. the web browser) is a generic program handling all possible applications. Distributed object
frameworks use an application-specific client implementation that is realized with the client
components of the distributed object frameworks.
Many web applications wrapping a legacy system have to support other remoting frameworks
as well. For instance, distributed object frameworks that are also based on HTTP (such as web
services) can be supported, or frameworks that are based on other protocols (such as CORBA).
Combining different protocols and remoting frameworks can be handled by a SERVICE
ABSTRACTION LAYER [Vogel, 2001].
The named remoting frameworks use a similar basic remoting architecture, but yet there are
many differences. The developer, who wants to use one or more of these remoting frameworks,
has to understand the basic remoting architecture of the used frameworks well. Of course, in the
web reengineering context this especially means to understand the web application framework
used. This is important because it may be necessary to tune or configure the remoting
frameworks, for instance, to optimize Quality of Service (QoS), tune the performance, use
threading, use POOLING [Kircher and Jain, 2002], or extend the invocation process. Also for
integrating different frameworks it is important to understand their basic architecture well.
Layers of a Web Application Framework
In this section, we explain the remoting architecture of web application framework using differ-
ent LAYERS [Buschmann et al., 1996] (see Figure 2):
Pattern 1 – LAYERS:
Consider a system in which high-level functionalities are depending on low-level
functionalities. Such systems often require some horizontal structuring that is orthogonal to
8
their vertical subdivision. Local changeability, exchangeability, and reuse of system parts
should be supported.
Therefore, structure the system into LAYERS and place them on top of each other. Within
each LAYER all constituent components work on the same level of abstraction.
In a web application framework, the lowest layer is an adaptation layer to the operating system
and network communication APIs. The advantage of using an adaptation layer is that higher
layers can abstract from platform details and therefore use an platform-independent interface. The
operating system APIs are (often) written in the procedural C language; thus, if the
communication framework is written in an object-oriented language, WRAPPER FACADES [Schmidt
et al., 2000] are used for encapsulating the operating system’s APIs.
Pattern 2 – WRAPPER FACADE:
Consider an object-oriented system that should use non-object-oriented APIs (such as
operating system APIs, system libraries, or legacy systems). It should be avoided to
program non-object-oriented APIs directly.
Therefore, for each set of related functions and data in a non-object-oriented API, create
one or more WRAPPER FACADE classes that encapsulate these functions and data with a more
concise, robust, portable, and maintainable interface.
The higher layers only access the operating system’s APIs via the WRAPPER FACADES. Each
WRAPPER FACADE consists of one or more classes that contain forwarder operations. These
forwarder operations encapsulate the C-based functions and data within an object-oriented
interface. Typical WRAPPER FACADES within the operating system adaptation layer provide access
to threading, socket communication, I/O events, dynamic linking, etc.
On top of the adaptation layer the communication framework is defined. It handles the basics
of connection handling and asynchronous invocation. The layer containing the HTTP client (e.g.
the web browser) and the HTTP server components, including the web application framework,
are defined on top of the communication framework layer.
The server application, finally, contains the wrappers for the legacy system. If the legacy
system provides a procedural API, the pattern WRAPPER FACADE can be used to access it. Or the
legacy system might run in a different process or different machine (e.g. as a server process or a
9
batch process). Then some kind of connection is required, such as inter-process communication
or another communication framework.
Note that even though only the highest LAYER of the web application framework is concerned
with the reengineering to the web task, the wrappers in it may have to access details in lower
LAYERS. In general, low-level functionalities are hidden by higher level LAYERS. But sometimes it
is necessary to deal with the low-level functionalities directly, say, for configuration,
performance tuning, or Quality of Service tuning. Thus the developer of a reengineering to the
web solution has to understand the lower LAYERS (APIs), even though it is usually possible to
reuse existing lower LAYER implementations.
Web Connection vs. Connection to the Legacy System
As illustrated in Figure 2 there are possibly two connections between different systems. Besides
the web connection between web client and web server, there may be a second connection from
the web application framework to the legacy system. We might have to cope with characteristics
of the web connection when connecting to the legacy system as well. Typical problem fields are
asynchronous invocations, concurrency, the statelessness of the HTTP protocol, missing support
for sessions, and potential performance overheads. If a particular problem is not already resolved
by the web application framework, it has to be resolved within the connection to the legacy
system. Out of these reasons, even if an existing web application framework can be reused, a
reengineer has to understand the solution to these problems in the web application framework
well. The connection to the legacy system has to use the invocation model of the web application
framework and/or customize it to its requirements.
Pos
sibl
y M
achi
ne/P
roce
ss B
ound
ary
Adaptation Layer
Operating System APIs
Communication Framework
Remoting Layer: HTTP Client
Client: Web Browser
Adaptation Layer
Operating System APIs
Communication Framework
Remoting Layer:Web Application Framework
HTTP Server
Server Application
Mac
hine
/Pro
cess
Bou
ndar
y
LegacySystem
invokes
Figure 2: Web Application Layers
10
As a typical example, consider a legacy system that allows for synchronous processing (e.g.
batch processing) only. Web clients send asynchronous and concurrent requests. Thus somewhere
these requests have to be demultiplexed and synchronized. Potentially the web application
framework can demultiplex and synchronize the requests already. But often for performance
reasons such requests are handled by multiple threads in the web server. These threads again can
concurrently access the legacy system. Then the connection to the legacy system has to handle
demultiplexing and synchronization for the access to the legacy system.
That means, the patterns of connection handling and invocation handling discussed in the next
sections either can be implemented within the web application framework or as part of the
connection to the legacy system.
Communication Framework: Connection Handling and Concurrency
For remote communication, connection establishment is typically handled by the
ACCEPTOR/CONNECTOR pattern [Schmidt et al., 2000]:
Pattern 3 – ACCEPTOR/CONNECTOR:
Protocols like HTTP are based on connections (e.g. socket connections). Coupling of
connection management and configuration code with the processing code should be
avoided.
Therefore, separate the connection initialization from the use of established connections. A
connection is created, when the connector on client side connects to the acceptor on server
side. Once a connection is established, further communication is done using a connection
handle.
An HTTP server listens to a network port to accept connections sent by the client; in turn, the
client (i.e. the web browser) uses a connector to start an HTTP request. A connector and acceptor
pair is also used for response handling. The server response connects to the client that awaits the
server’s responses with a response acceptor.
An HTTP server has to support asynchronous invocations to enable multiple simultaneous
requests. Non-trivial clients (like most web browsers) should not block during request and
response, and thus, are also asynchronous. Therefore, the programming model of most web
applications is event-based both on client side and server side, meaning that an event loop and/or
11
multi-threading is supported and connection handling is done using callbacks for the network
events. A REACTOR [Schmidt et al., 2000] reacts on the network events and efficiently
demultiplexes the events. It dispatches all requests to handler objects:
Pattern 4 – REACTOR:
Event-based, distributed systems (may) have to handle multiple requests (and responses)
simultaneously.
Therefore, synchronously wait for (connection) events in the REACTOR. The system
registers event handlers for certain types of events with the REACTOR. When an event
occurs, the REACTOR dispatches the event handler associated with the event.
In web applications, a REACTOR is typically used on client side and server side.
Server side concurrency can simply be handled by the REACTOR and an event loop. That means
the REACTOR queues up all network events in the order of demultiplexing. This is a very simple
concurrency handling architecture that yields a good performance for relatively small, remote
operations. Computation-intensive or blocking operations may cause problems, as queued
requests have to wait for these operations.
If computation-intensive or blocking operations can be expected (what may be the case for a
legacy system), the web server should proceed with other work while waiting for the legacy
system’s response. Thus, in such cases, typically a multi-threaded server is used. Then only one
thread of control has to wait for a computation-intensive or blocking operation, while other
threads can resume with their work.
If we access a legacy system that requires synchronous access from multiple threads, we have
to synchronize and schedule the accesses. This can done by using objects in the web application
server for scheduling access to the legacy system; there are two alternative patterns for
synchronizing and scheduling concurrently invoked remote operations [Schmidt et al., 2000]:
Pattern 5 – ACTIVE OBJECT:
Consider that multiple clients access the same object concurrently. Processing-intensive
operations should not block the entire process. Synchronized access to shared concurrent
objects should be provided in a straightforward manner.
Therefore, decouple the executing from the invoking thread. A queue is used between those
12
threads to exchange requests and responses.
Pattern 6 – MONITOR OBJECT:
Consider that multiple client requests access the same thread of control concurrently. An
object should be protected from uncontrolled concurrent changes and deadlocks.
Therefore, let a MONITOR OBJECT ensure that only one operation runs within an object at a
time by queuing operation executions. It applies one lock per object to synchronize access
to all operations.
Often threading is used for connection handling and execution. For each particular connection,
a connection handle has to be instantiated. To optimize resource allocation for connections, the
connections can be shared in a pool using the pattern POOLING [Kircher and Jain, 2002]. This
reduces the overhead of instantiating the connection handle objects.
Pattern 7 – POOLING:
Consider a system that provides access to multiple resources of the same kind, such as
network connections, objects, threads, or memory. Fast and predictable access to these
resources and scalability are required.
Therefore, manage multiple instances of a resource type in a pool. Clients can acquire the
resources from the pool, and release them back into the pool, when they are no longer
needed. To increase efficiency, the pool eagerly acquires a static number of resources after
creation. If the demand exceeds the available resources in the pool, it lazily acquires more
resources.
When the legacy system can be accessed concurrently, we can also pool the instances that
connect to the legacy system to avoid the overhead of establishing a connection to the legacy
system.
In the context of reengineering to the web, it is especially important to select for a proper
concurrency model. Also, connection and/or thread POOLING have to be adjusted to the
requirements of the legacy application. Usually, these issues do not have to be implemented from
scratch, but only the models of used remoting frameworks have to be well understood and
integrated with the model of the legacy application. In contrast, the connection to the legacy
system is most often hand-built.
13
Remoting Layer
On top of the connection handling and concurrency layer (the communication framework) there
are a few basic remoting patterns (from [Voelter et al., 2002]) that are used for building the
remoting layer.
In a remoting framework clients communicate with objects on a remote server. On the server
side the invoked functionality is implemented as a REMOTE OBJECT [Voelter et al., 2002]:
Pattern 8 – REMOTE OBJECT:
Accessing an object over a network is different from accessing a local object because of
machine boundaries, network latency, network unreliability, and other properties of
network environments.
Therefore, let a distributed object framework transfer local invocations from the client side
to a REMOTE OBJECT within the server. Each REMOTE OBJECT provides a well-defined
interface to be accessed remotely.
In web application frameworks, the REMOTE OBJECT is responsible for building up the
delivered web pages dynamically. In the legacy reengineering context that means it has to invoke
the legacy system and decorate it with HTML. That means, if a connection to the legacy system
has to be established, the REMOTE OBJECT is responsible for building it up. As explained in the
previous section the concurrency patterns and POOLING might also be applied within the REMOTE
OBJECT.
The client invokes an operation of a local object and expects it to be executed by the REMOTE
OBJECT. To make this happen, the invocation crosses the machine boundary, the correct operation
of the correct REMOTE OBJECT is obtained and executed, and the result of this operation invocation
is passed back across the network. In a distributed object framework a CLIENT PROXY [Voelter et
al., 2002] is used by a client to access the REMOTE OBJECT:
Pattern 9 – CLIENT PROXY:
The programming model on client side should not expose the networking details to clients.
Therefore, provide a CLIENT PROXY as a local object within the client process that offers the
REMOTE OBJECT’S interface and hides networking details.
14
In web applications the CLIENT PROXY usually is a generic invocation handler that sends
network requests for web pages using an URL. In the URL or in the HTTP request, the details of
the invocation (object ID, operation name, parameters) are encoded. These details are used by the
server to find the correct object to be invoked. These tasks are performed by an INVOKER [Voelter
et al., 2002] on the server side:
Pattern 10 – INVOKER:
When a CLIENT PROXY sends invocations to the server side, somehow the targeted REMOTE
OBJECT has to be reached.
Therefore, provide an INVOKER that is remotely accessed by the CLIENT PROXY with the
invocation data. The INVOKER de-marshals the invocation data and uses it to dispatch the
correct operation of the target REMOTE OBJECT.
Note that in distributed object frameworks that are primarily designed for program-to-program
communication, such as CORBA or web services, often CLIENT PROXIES and INVOKERS are
specific for a REMOTE OBJECT type. They can be generated from an INTERFACE DESCRIPTION
[Voelter et al., 2002]. In contrast, in a web application the interfaces are solely defined by the
server and the correct access interfaces (and parameters) have to be delivered to the client using
web pages. For integrating web application with distributed object frameworks it is necessary to
integrate these differences of the remoting models. When the legacy system contains full
INTERFACE DESCRIPTIONS, such as C header files for instance, these can potentially be used to
generate the INVOKERS automatically.
Usually, there is more than one INVOKER in a web application framework; for instance, one for
delivering static web pages from the file system and one for handling remote invocations of
REMOTE OBJECTS. Thus some instance is required to pick out the responsible INVOKER for an
incoming request. On client and server side, a REQUEST HANDLER [Voelter et al., 2002] is used to
abstract the communication framework for CLIENT PROXY and INVOKER:
Pattern 11 – REQUEST HANDLER:
On client and server side the details of connection handling have to be handled. In
particular, communication setup, resource management, threading, and concurrency have to
be handled and configured in a central fashion.
15
Therefore, provide a SERVER REQUEST HANDLER, responsible for dealing with the network
communication on server side, together with a respective CLIENT REQUEST HANDLER,
responsible for client side communication. These REQUEST HANDLERS interact with CLIENT
PROXIES and INVOKERS respectively.
Usually, the same communication framework implementations can be used on client and
server side, but the REQUEST HANDLERS are different: the CLIENT REQUEST HANDLER connects to
the server side and uses the REACTOR to wait for the reply, whereas the SERVER REQUEST
HANDLER waits for incoming invocations and sends the reply back, after it is processed. CLIENT
PROXY, REQUEST HANDLER, and INVOKER build together a BROKER [Buschmann et al., 1996] (see
Figure 3).
:ClientProxy:Invoker
:ServerRequestHandler
:Invoker:Invoker:Invoker
:RemoteObject
ThreadPool
ConnectionPool
Communication Framework
InstancePool
Broker
Legacy System API
:ClientRequestHandler
Communication Framework
Mac
hine
/Pro
cess
Bou
ndar
y
Marshaller
Marshaller
Figure 3: Web Remoting Layer
Marshalling of request and response has to happen on client and server side according to the
HTTP protocol. This especially means adding of HTTP headers, encoding URLs and form data,
and converting binary data (e.g. using Base64 encoding). This is done by a MARSHALLER [Voelter
et al., 2002]:
Pattern 12 – MARSHALLER:
The data to describe operation invocations of REMOTE OBJECTS consist of the target object’s
object ID, the operation identifier, the arguments, and possibly other context information.
All this information has to be transported over the network connection.
Therefore, require each REMOTE OBJECT type to provide a way to serialize itself into a
transport format that can be transported over a network as a byte stream. The distributed
16
object framework provides a MARSHALLER (on client and server side) that uses this
mechanism whenever a remote invocation needs to be transported across the network.
A MARSHALLER for web applications also has to convert remote invocations into parts of web
pages. Usually, remote invocations are either modeled as HTML links or actions of HTML
forms. In contrast to distributed object frameworks, in a web application marshalling of
invocations is only performed on server side: the client side invocations are encoded into the
delivered web pages. For instance, the following invocation to the REMOTE OBJECT objectName
and the method methodName can be embedded in an HTML document as a link:
http://www.myhost.org/objectName+methodName?arg=xyz
On server side such invocations have to be de-marshalled, when a request arrives (for instance,
the URL string has to be split and/or the HTML form data has to be evaluated). Marshalling can
triggered by the INVOKER. If more than one INVOKER is used in the system, the REQUEST HANDLER
can de-marshal the invocation partially to find the correct INVOKER to handle the invocation.
Note that the connection to the legacy system might need more context information, such as
for instance a session identifier. Such context information also need to be marshalled and
embedded in HTML or HTTP (that is either in the URLs of HTML links, in the HTML text with
HIDDEN form fields, or as cookies).
Sensitive information, such as user authentications, additionally can be encrypted during
marshalling.
LEGACY SYSTEM WRAPPING, ADAPTATION, AND DECORATION
An important aspect of migrating a legacy system to the web is wrapping. Wrappers are
mechanisms for introducing new behavior to be executed before, after, in, and/or around an
existing method or component. Wrapping is especially used as a technique for encapsulating
legacy components. Common techniques of reusing legacy components by wrapping are
discussed in [Sneed, 2000, Brant et al., 1998].
On server-side of a web application framework, we have to map incoming invocations into the
legacy system. The invoker provides the invocation data that it has obtained by de-marshalling
the URL and the HTTP request header. The invocation data references a wrapper object. This
wrapper is responsible for forwarding the invocation into the legacy system and obtaining the
result.
17
The COMPONENT WRAPPER pattern [Zdun, 2002a] generalizes this form of (legacy) wrapping:
Pattern 13 – COMPONENT WRAPPER:
A component should be used with its external interface only, but still we often require some
way to customize and adapt a component to a degree that goes beyond pure
parameterization.
Therefore, let the access point to a component be a COMPONENT WRAPPER, a first-class
object of the programming language. Within this object the component access can be
customized without interfering with the client or the component implementation.
The COMPONENT WRAPPER is a white-box for the component’s client and enables us to bring in
changes and customizations, e.g. with DECORATORS [Gamma et al., 1994] and ADAPTERS [Gamma
et al., 1994]. Note that the legacy system can run in the same process as the COMPONENT
WRAPPER, or it can run in a different process. When reengineering to the web, a typical
COMPONENT WRAPPER is also a REMOTE OBJECT running in the web application framework. It
exports (some of its) operations as URLs to the web.
The COMPONENT WRAPPER methods can be simple forwarder operations. When a procedural
API is supported by the legacy system, they are usually implemented as WRAPPER FACADES . Or,
alternatively, the COMPONENT WRAPPER might build up a connection to the legacy system either
with inter-process communication (e.g. for batch processing applications) or with a
communication protocol (e.g. for server applications).
As an indirection mechanism, wrappers can also be used as a place for applying adaptations
and decorations [Gamma et al., 1994]:
Pattern 14 – ADAPTER:
A client requires a different interface than provided by a class, yet the class cannot be
changed.
Therefore, use an ADAPTER to convert the interface of a class into another interface clients
expect. It can be realized as a class adapter using multiple inheritance or as an object
adapter which forwards messages to the adapted object.
Pattern 15 – DECORATOR:
18
An object should expose some additional behavior, yet the class of the object should not be
changed.
Therefore, use a DECORATOR to dynamically attach additional responsibility to an object
using a DECORATOR object that forwards invocations to the decorated object, after the
decoration has taken place.
In the context of reengineering to the web, ADAPTERS are especially used for adaptations of
interfaces, if the legacy system does not support a required interface that should be exposed to the
web (or other remote clients). There are many tasks for DECORATORS in the context of
reengineering to the web, including logging, access control, encryption, etc. Besides
DECORATORS of the COMPONENT WRAPPER, DECORATORS can also be applied for the INVOKER, if
the whole invocation process should be decorated.
Both tasks, adaptation and decoration, can also be further supported by the pattern MESSAGE
INTERCEPTOR [Zdun, 2002a]:
Pattern 16 – MESSAGE INTERCEPTOR:
Consider decorations and adaptations take place for certain objects over and over again.
Then it would make sense to support these tasks as a first-class mechanism, instead of
programming them from scratch for each use.
Therefore, build a callback mechanism into the INVOKER working for all messages that pass
it. These callbacks invoke MESSAGE INTERCEPTORS for standardized events, such as
“before” a remote method invocation, “after” a remote method invocation, or “instead-of”
invoking a remote method.
A MESSAGE INTERCEPTOR intercepts messages sent between different entities, and can add
arbitrary behavior to the invocation and/or the response. Support for MESSAGE INTERCEPTORS can
be provided on different levels. A MESSAGE INTERCEPTOR can be supported by the programming
language used for building the COMPONENT WRAPPER (or a language extension). Also,
interceptors can be provided in distributed systems by a middleware, as implemented in TAO or
Orbix (see the pattern INTERCEPTOR [Schmidt et al., 2000] which describes this variant). The
distributed variant of MESSAGE INTERCEPTORS can be implemented in a web application
framework on top of the INVOKER. This is specifically suitable for tasks that have to be handled
per invocation, such as user rights management, marshalling, access control, logging, etc.
19
In Figure 4 an INVOKER that supports MESSAGE INTERCEPTORS is depicted. In this examples, a
request for a COMPONENT WRAPPER arrives. This COMPONENT WRAPPER has two “before”
interceptors. These are dispatched before the actual invocation takes place. When these
interceptors have returned, the COMPONENT WRAPPER is invoked and it forwards the invocation to
the legacy system. The COMPONENT WRAPPER also has one “after” interceptor. It is dispatched by
the INVOKER after the COMPONENT WRAPPER returns and before the result is sent back across the
network. Using such a chained MESSAGE INTERCEPTOR architecture developers can flexibly
register add-on services as MESSAGE INTERCEPTORS at runtime for specific COMPONENT
WRAPPERS, as required.
InvokerInvokerInvoker
Request Handler
2) invoke "Component Wrapper 3 aMethod"
Session Mgt Interceptor
Component Wrapper 1
Component Wrapper 2Legacy
System
Component Wrapper 3
Authentication Interceptor Logging Interceptor
3) before invocation: "ComponentWrapper3 aMethod"
4) before invocation: "ComponentWrapper3 aMethod"
5) invoke aMethod
6) invoke
7) after invocation: "ComponentWrapper3 aMethod"
1) HTTP request: http://.../ComponentWrapper3+aMethod+...
Figure 4: An invoker applying message interceptors for a component wrapper
REMOTE OBJECTS can be pooled using POOLING [Kircher and Jain, 2002]. In case of
COMPONENT WRAPPERS for legacy systems that means a COMPONENT WRAPPER pool is used to
access the legacy system (one pool for each used COMPONENT WRAPPER type). For each
invocation a COMPONENT WRAPPER is taken from the pool, handles the invocation, is cleaned up,
and put back into the pool. This way we can avoid the overhead of instantiating the COMPONENT
WRAPPER and establishing a connection to the legacy system for each incoming invocation. This,
of course, requires the legacy system to support multiple concurrent invocations (or
synchronization).
20
SERVICE AND CHANNEL ABSTRACTION
Often a legacy system provides multiple services. Each COMPONENT WRAPPER type can be seen as
one service of the legacy application, and all these services should be offered via the same web
application interface. Also, different requests coming from different clients, communicating over
different channels, have to be supported by many web application frameworks. It should not be
required to change the application logic every time a new channel has to be supported or a new
service is added to the application.
When reengineering large-scale systems to the web, it can be expected that the problems of
multiple channels and service abstraction are occurring together. These forces are often resolved
by a SERVICE ABSTRACTION LAYER architecture [Vogel, 2001]:
Pattern 17 – SERVICE ABSTRACTION LAYER:
Consider a system that serves multiple services to multiple channels. If possible, services
should be implemented independently from channel handling, and implementations should
be reused.
Therefore, provide a SERVICE ABSTRACTION LAYER as an extra layer to the application logic
tier containing the logic to receive and delegate requests. For each supported channel there
is a channel adapter. The INVOKER does not directly access the COMPONENT WRAPPER but it
sends invocations to the SERVICE ABSTRACTION LAYER first, which then selects the correct
COMPONENT WRAPPER.
The SERVICE ABSTRACTION LAYER also converts the invocation data to a format understood by
the COMPONENT WRAPPER, and converts the response back to the client format. This is done using
CONTENT CONVERTERS [Vogel and Zdun, 2002].
21
MHPClient
LegacySystem
Server Application
Ser
vice
Abs
tract
ion
Laye
rCORBAAdapter
HTTPAdapter
ReturnChannelAdapter
CORBAClient
Service 1
Service 2
Service 3
ComponentWrapper 1
LegacyClient
ComponentWrapper 2
ComponentWrapper 3
Broadcast ChannelAdapter
Figure 5: Service Abstraction Layer
SESSION MANAGEMENT AND STATE PRESERVATION
A client can be expected to make several subsequent requests over a time span, and often some
state has to be preserved between these requests. For instance, especially in multi-user systems,
users might have to log in and out. Most legacy applications are requiring states to be maintained
for continued interaction with a client. The HTTP protocol is stateless and cannot be used for
maintaining the state of an interaction.
The SESSIONS pattern [Sorensen, 2002] provides a solution to this problem:
Pattern 18 – SESSIONS:
A system requires a state for an interaction with a client, but the communication protocol is
stateless.
Therefore, state is tied to a session so that a request can access data accumulated earlier
during an interaction. Such data is referred to as session-specific data. There is a session
identifier to let REMOTE OBJECTS be able to refer to a session.
In web applications, there are two variants of the SESSIONS pattern: sessions can be kept in the
server or in the client. If the session is kept in the server, the session identifier is sent with the
each response to the client, and the client refers to it in the next invocation; if it is kept in the
client, the client has to send it to the server, which refers to it in its responses.
To map a stateless client request properly to the correct session, we have different options in
web applications:
22
- URL Encoding: We can encode information, such as user name and password, in the
URL by attaching them as standard URL parameters to the base URL. This works in
almost any setting, but we have to be aware that some browsers have limitations
regarding the length of the URL. Moreover, the information is readable on the user’s
screen (if URL contents are not encrypted form). For sensitive applications we can
encrypt the contents of the URL.
- HIDDEN Form Fields: We can embed information in an HTML form that is hidden
from display by using HIDDEN form fields. However, of course, they are readable as
plain text in the HTML page’s source. Again, we can use encryption for sensitive data.
- Cookies: Cookies are a way to store and retrieve information on the client side of a
connection by adding a simple, persistent, client-side state settable by the server.
However, the client can deactivate cookies in the user agent; thus, cookies do not
always work. Cookies can optionally be sent via a secure connection (e.g. SSL).
In any case, when reengineering to the web, it is necessary to integrate these SESSIONS with the
session and user model of the legacy application (if there is any). For instance, one solution is
that COMPONENT WRAPPERS log the user in and out for each invocation. Or, alternatively, the
COMPONENT WRAPPERS can hold the session open until the next invocation. In case of many client
sessions, it may be necessary to introduce LEASES [Jain and Kircher, 2002]:
Pattern 19 – LEASES :
A (web) client can abort a stateful interaction without the server realizing it. Then existing
SESSIONS are never stopped and opened connections are never closed.
Therefore, provide LEASES for each session (or connection) that expire after a certain
amount of time. Let the COMPONENT WRAPPER close a session (or connection) that is held
open, when its LEASE expires. When a client sends the next invocation in time, the LEASE is
renewed so that the session is held open again for the full lease duration.
DYNAMIC CONTENT GENERATION AND CONVERSION
In [Vogel and Zdun, 2002] we present a pattern language for content generation, representation,
and conversion for the web. Here, we use a sub-set of this pattern language in the context of
reengineering systems to the web.
23
Dynamic content generation and conversion are usually central tasks of a project dealing with
migration to the web. In particular the web requests received by the INVOKER have to be
translated into the API of the legacy system, and the response of the system has to be decorated
with HTML markup. What seems to be a relatively simple effort at first glance may lead to
severe problems when the resulting system has to be further evolved later on. Often we find
systems in which the HTML pages are simply generated by string concatenation, such as the
following code excerpt:
StringBuffer htmlText = new StringBuffer();
String name = legacyObject.getName();
...
htmlText.append("<BR> <B> Name: </B>");
htmlText.append(name);
Hard-coding of HTML markup in the program code may lead to severe problems regarding
extensibility and flexibility of content creation. Content, representation style, and application
behavior should be changeable ad hoc. In this section, we will discuss two conceptually different
approaches for decorating with HTML markup: template-based approaches and constructive
approaches.
Template-Based Approaches
A CONTENT FORMAT TEMPLATE [Vogel and Zdun, 2002] provides a typical template-based
approach for building up web content:
Pattern 20 – CONTENT FORMAT TEMPLATE:
A system requires content to be built up in a target content format. Content editors should
be able to add dynamic content parts and invocations to legacy systems in a simple way. A
high performance for building up web pages is required.
Therefore, provide a template language that is embedded into the target content format (e.g.
HTML). The templates contain special template language code which is substituted by a
template engine before the page is delivered to the client.
Known uses of template-based approaches are PHP [Bakken and Schmid, 2001], ASP, JSP, or
ColdFusion. These let developers write HTML text with special markup. The special markup is
substituted by the server, and a new page is generated. For instance, in PHP the PHP code is
embedded by escaping HTML with a special comment:
24
<body>
<h1> <?php echo("My PHP Heading\n"); ?> </h1>
</body>
In the same way invocations to COMPONENT WRAPPERS can be embedded in template
languages to connect to a legacy system.
On the first glance, the approach is simple and even well-suited for end-users (such as content
editors), say, by using special editors. The HTML design can be separated from the software
development process and can be fully integrated with content management systems. However,
some web-based applications require more complex interactions than simply expressible with
templates. Sometimes, the same actions of the user should lead to different results in different
situations. Most approaches do not offer high-level programmability in the template or
conceptual integration across the template fragments. Application parts and design are not clearly
separated. Thus template fragments cannot be cleanly reused. Complex templates may quickly
become hard to understand and maintain.
Sometimes integration of different scripts can be handled via a shared dataspace, such as a
persistent database connection. Sometimes, we require server-side components for integrating the
scripts on the server side. In such cases a constructive approach can be chosen as an alternative.
Constructive Approaches
Constructive approaches generate a web page on-the-fly using a distinct API for constructing web
pages. They are not necessarily well-suited for end-users as they require knowledge of a full
programming language. However, they allow for implementing a more complex web application
logic than easily achievable with most template-based approaches.
The most simple constructive approach is the CGI interface [Coar, 1999]. It is a standardized
interface that allows web servers to call external applications with a set of parameters. The
primary advantages of CGI programming are that it is simple, robust, and portable. However, one
process has to be spawned per request, therefore, on some operating systems (but, for instance,
not on many modern UNIX variants) it may be significantly slower than using threads. Usually
different small programs are combined to one web application. Thus conceptual integrity of the
architecture, rapid changeability, and understandability may be reduced significantly compared to
more integrated application development approaches. Since every request is handled by a new
25
process and HTTP is stateless, the application cannot handle session states in the program, but
has to use external resources, such as databases or central files/processes.
A variant of CGI is FastCGI [Open Market, Inc., 1996] which allows a single process to
handle multiple requests. The targeted advantage is mainly performance. However, the approach
is not standardized and implementations may potentially be less robust.
A similar approach integrated with the Java language are servlets. They are basically Java
classes running in a Java-based web server’s runtime environment. They are a rather low-level
approach for constructing web content. In general, HTML content is created by programming the
string-based page construction by hand. The approach offers a potentially high performance. As
different servlets can run in one server application (servlet container), servlets provide a more
integrated architecture than CGI, for instance.
Most web servers offer an extension architecture. Modules are running in the server’s runtime
environment. Thus a high performance can be reached and the server’s feature (e.g. for
scalability) can be fully supported. Examples are Apache Modules [Thau, 1996], Netscape
NSAPI, and Microsoft ISAPI. However, the approach is usually a low-level approach of coding
web pages in C, C++, or Java. Moreover, most APIs are quite complex, and applications tend to
be monolithic and hard to understand.
Custom web servers, such as AOL Server [Davidson, 2000], TclHttpd [Welch, 2000],
WebShell [Vckovski, 2001], Zope [Latteier, 1999], or ActiWeb [Neumann and Zdun, 2001],
provide more high-level environments on top of ordinary web-servers. Often they provide
integration with high-level languages, such as scripting languages, for rapid customizability. A
set of components is provided which implement the most common tasks in web-application
development, such as: HTTP support, session management, content generation, database
access/persistence services, legacy integration, security/authentication, debugging, and dynamic
component loading. Some approaches offer modules for popular web servers as well, as for
instance the Apache Module of WebShell.
In these constructive approaches HTML markup can either be built by hard-coding, perhaps
together with FRAGMENTS (see next section), or using a CONTENT CREATOR [Vogel and Zdun,
2002]:
Pattern 21 – CONTENT CREATOR:
26
Content in different content formats has to be built up dynamically with a constructive
approach. But HTML text should not be hard-coded into the program text.
Therefore, provide an abstract CONTENT CREATOR class that contains the common
denominator of the used interfaces. For each supported content format there are concrete
classes that implement the common denominator interface for the specific content format.
The concrete classes might also contain methods for required specialties.
For instance, WebShell [Vckovski, 2001] uses global procedures to define the elements of its
CONTENT CREATOR:
proc dl {code} {
web::put "<dl>"
uplevel $code
web::put "</dl>"
}
To code these procedures we have to interfere with HTML markup. However, we can avoid it
later on when we combine such procedures to HTML FRAGMENTS:
dl {
b {My first page}
em {in Web Shell}
}
Here we have created a <dl> list entry with a bold and an emphasized text in it. Potentially, the
procedure dl can be exchanged with another implementation for building a different content
format.
In Actiweb [Neumann and Zdun, 2001] CONTENT CREATORS are used to build up pages. Thus
we can use the same code to build up pages for different user interface types. A simple example
just builds up a web page:
HtmlBuilder htmlDoc
htmlDoc startDocument \
-title "My ActiWeb App" \
-bgcolor FFFFFF
htmlDoc addString "My ActiWeb App"
htmlDoc endDocument
We instantiate an object htmlDoc, then start a document, add a string, and end the document.
The developer does not see any HTML markup at all. The page is automatically created by the
CONTENT CREATOR class.
27
Many approaches combine the template-based and the constructive approach. However, often
the two used models are not well-integrated; that is, the developer has to manually care for the
balance between static and dynamic parts.
Caching
When the web client sends a request to the web server, the requested page may be available on
the file system of the server or it may be dynamically created. The template-based and
constructive approaches introduced so far require dynamic page creation. Compared to static
pages, dynamic creation has the problem of memory and performance overheads. A CONTENT
CACHE [Vogel and Zdun, 2002] provides a solution:
Pattern 22 – CONTENT CACHE:
The overhead of dynamic page creation causes performance or memory problems.
Therefore, increase the performance of web page delivery by caching already created
dynamic content.
In the context of reengineering to the web, caching has to work in concert with the legacy
application. Usually, we cannot change the legacy application to invalidate cached elements that
are not valid anymore. We can therefore only cache those operations of the legacy system for
which the COMPONENT WRAPPER can determine whether the cache is still valid or not.
Caching whole pages is only a good idea for simplistic pages. For larger web applications, the
dynamic web pages should be designed in such a way that parts of pages can be cached. The
pattern FRAGMENTS [Vogel and Zdun, 2002] solves this problem:
Pattern 23 – FRAGMENTS:
Web pages should be designed to allow the generation of web pages dynamically by
assuring the consistency of its content. Moreover, these dynamic web pages should be
provided in a highly efficient way.
Therefore, provide an information architecture which represents web pages from smaller
building blocks, called FRAGMENTS. Connect these FRAGMENTS so that updates and changes
can be propagated along a FRAGMENTS chain.
28
Thus FRAGMENTS can be cached instead of whole pages. Only parts of web pages that have (or
might have) changed since a previous creation of the FRAGMENT are dynamically created. Static
FRAGMENTS of web pages and dynamic FRAGMENTS that are still valid are obtained from the
CONTENT CACHE.
Web Content Management
The full pattern language in [Vogel and Zdun, 2002] contains some additional patterns not
described here (GENERIC CONTENT FORMAT, CONTENT CONVERTER, and PUBLISHER AND
GATHERER). These are mainly used for content gathering, storing, and publishing. These content
management tasks are usually not required for reengineering to the web, as the content itself is
managed by the wrapped legacy applications. Note that, in the content management context, the
pattern CONTENT CONVERTER is mostly used to convert to and from a GENERIC CONTENT FORMAT,
whereas in the web reengineering context it can be used for converting to/from the content
formats of the legacy system.
Note that there are also applications that contain content from multiple sources, and a wrapped
legacy application is only one of these sources. For integrating and managing this content, the
pattern language from [Vogel and Zdun, 2002] can be used in its original web content
management focus.
ADD-ON SERVICES
In a web application that wraps a legacy system many add-on services may be required per
invocation. Typical examples are authorization, encryption, and logging. When a request arrives
or when the response is sent, the add-on service has to be executed. That means, we can
potentially apply those add-on services either within:
- the REQUEST HANDLER for request-specific services,
- the INVOKER for invocation-specific services, or
- the COMPONENT WRAPPER for services that are specific per REMOTE OBJECT or REMOTE
OBJECT type.
As discussed before, the patterns DECORATOR and ADAPTER provide a simple, object-oriented
solution for providing additional services. MESSAGE INTERCEPTORS can be used for providing a
29
first-class solution for add-on services. Usually higher-level LAYERS can configure MESSAGE
INTERCEPTORS for lower levels.
Security issues are relevant for many web applications that wrap a legacy system. These are
often provided as add-on services. For instance, a login with user name and password may be
required. Moreover, secure communication or securing transferred data is required. These issues
have to be tightly integrated with session management. In general, we require user
authentications and encryption as typical means to secure an interactive web application, for
instance:
- HTTP Basic Authentication: The definition of HTTP/1.1 [Fielding et al., 1999]
contains some means for user authentication of web pages, called basic authentication
scheme. This simple challenge-response authentication mechanism lets the server
challenge a client request and clients can provide authentication information. The basic
authentication scheme is not considered to be a secure method of user authentication,
since the user name and password are passed over the network in an unencrypted form.
- HTTP Digest Authentication: Digest Access Authentication, defined in RFC 2617
[Franks et al., 1999], provides another challenge-response scheme, that does never
send the password in unencrypted form.
- Encrypted Connection (using SSL): Using a secure network connection, supported by
most servers, we can secure the transaction during a session.
- URL Encryption: To avoid readability of encoded URLs we can encrypt the attached
part of the URLs.
HTTP authentication is usually handled on the REQUEST HANDLER level. The authentication
information is obtained from the COMPONENT WRAPPER that connects to the legacy system. URL
encryption is usually handled per invocation within the MARSHALLER.
An important aspect of most web applications is the required high availability. Usually a web
site should run without any interruptions. This has several implications that have to be considered
when choosing frameworks, concepts, implementations, etc. At least the following functionalities
are usually provided as add-on services:
30
- Permanent and Selective Logging: All relevant actions have to be logged so that
problems can be traced. Some selection criteria should be supported, otherwise it may
be hard to find the required information out of the possibly large number of log entries.
Sometimes, for legal reasons, even more information has to be logged, such as user
transaction traces for e-commerce stores. Thus logging needs to be highly
configurable. For instance, WebShell [Vckovski, 2001] supports log filters that
distinguish different log levels and redirect log entries to different destinations, like
files, stdout, or SMS.
- Notification of Events: In cases when certain events happen, such as certain error
states, a person or application should be notified. For instance, when a error message is
recurring, an email or SMS may be sent to the system’s administrator.
- Testing: Load generators and extensive regression test suite are required for testing
under realistic conditions.
- Incremental Deployment: Dynamically loadable components enable incremental
deployment so that the application needs not to be stopped to deploy new functionality.
If multiple add-on services are required, as well as dynamic configuration of these, typically a
chained MESSAGE INTERCEPTOR architecture is provided. This architecture can also be used for
standard services, such as URL decoding, marshalling, etc.
PATTERN LANGUAGE OVERVIEW AND DISCUSSION
The pattern language presented in this chapter mainly can be used to explain different successful
technical solutions in the realm of reengineering to the web. There are the following main goals:
- Design Guideline: The patterns provide a design guideline for reengineering to the web
that can even be used in early project phases, when there is no decision for concrete
technologies yet.
- Project Estimation and Technology/Framework Selection: The pattern language can
help to estimate the effort for a reengineering to the web project in general, and make
import project-specific decisions, such as selecting for a web application framework.
The project team has to compare the features of particular frameworks with the project
requirements. These, in turn, can be obtained by selecting the required pattern variants.
31
Even though this method cannot ensure to avoid all possible technical problems, it
provides a good means to get a concrete estimation or basis for framework selection in
early project phases. The aspects discussed in this chapter can also be used as a
checklist for a project estimation.
- Means for Communication: Patterns provide a means for communication, for instance,
between technical and non-technical stakeholders of a system that allows for accurate
discussion, but without delving too deep into the technical details (as, for instance,
when a discussion is based on the concrete technical design or implementation).
In Figure 6 an integrated overview of the pattern language is presented. The arrows indicate
important dependencies of the patterns in the context of reengineering to the web. Only the
pattern LAYERS is not depicted here because it structures the other patterns (and thus has
relationships to all other patterns).
Acceptor/Connector
Active Object
Monitor Object
Pooling
Reactor
Remote Object
Marshaller
Client Proxy Invoker
Request Handler
Adapter
Decorator
Message Interceptor
Leases
Sessions
Content Format Template
Content Creator
Fragments
Content Cache
connection handler types
service handlerconcurrency
service handlerconcurrency
connection handling
Wrapper Facade
adaptation/decoration
decoration
adaptation
Component Wrapper
build content(e.g. HTML) build content
(e.g. HTML)
cache contentparts
cache elementsbuildingblocks
buildingblocks
remote objectis a componentwrapper to the legacy system
marshalinvocation
marshalsession data
maintainsession
use leases to expire session
uses wrapper facadeto access procedural
legacy system
instancepooling
connection andthread pooling
invokes
network communication
network communication
Service Abstraction Layer
channelsdispatchservices
Figure 6: Overview: Pattern Language for Reengineering to the Web
As can be seen, the LAYERS and functionalities, discussed in the previous sections, can cleanly
be separated from each other. That means individual parts of a solution can be developed in
separate teams, except for common (i.e. reused) code and integration points.
32
The main integration points are the two connections client/server and server/legacy system.
We have to deal with asynchronous invocations, concurrency, state preservation, and
performance issues at these integration points. When reengineering to the web, we usually cannot
change the legacy system or the CLIENT PROXY (as it is part of the web browser). Thus we have to
deal with these issues somewhere in the server’s processing chain of REQUEST HANDLER,
INVOKER, and COMPONENT WRAPPER. Therefore these three patterns can be seen as the central
patterns of the pattern language.
When more than one service is provided to more than one channel, we can use a SERVICE
ABSTRACTION LAYER. This situation is quite typical when reengineering to the web, as possibly
old interfaces still have to be supported and/or other channels than the web should be supported
(in the future). Thus, as far as possible, decoration and adaptation of the server’s processing chain
should happen at the end of the chain, so that they are reusable for different channels. Only those
parts that are specific for the web (i.e. for the HTTP protocol) should be handled at the REQUEST
HANDLER. Thus it often useful to implement a generic MESSAGE INTERCEPTOR chain, in which
add-on services (like SESSIONS, user control, LEASES, application-level marshalling, security,
testing, etc.) can be defined for each COMPONENT WRAPPER object.
From a managerial point of view many important issues are only indirectly addressed by the
software design and architecture patterns discussed in this paper. Examples are multi-user
concerns, detailed technology selection, security and security audits, or data access related to
converting a legacy system. Some of these issues are directly related to the pattern language, such
as multi-user concerns which have to be considered when deciding for a SESSIONS model. As
discussed above, technology choices only include those technology areas in focus of the patterns,
and many important business aspects of a technology (such as costs, maintenance models, or
credibility of business partners) are not in focus of the patterns. Other concerns go beyond the
scope of this chapter and should be handled using the respective patterns in these areas. A good
starting for security pattern is [Schumacher, 2003]. A good source for the broad area for data
access to legacy systems and (relational) database management systems is [Keller, 2003].
USING THE PATTERN LANGUAGE IN PRACTICE: A CASE STUDY
As pointed out in the previous section, a main use of the pattern language (from a managerial
point of view) is to provide a basis for communication both in early project phases and with non-
33
technical stakeholders. It can also be used to discuss about technology decisions in case the
patterns are already implemented by some technologies/frameworks potentially used within a
project.
We have used the patterns as a basis for communication between technical and non-technical
stakeholders in various projects (see for instance [Goedicke and Zdun, 2002, Zdun, 2002b] for
more details). In this section, we want to consider a few typical design decisions based on the
patterns to illustrate these benefits of patterns for the communication compared to technical
solutions. For space reasons, we cannot consider a whole project’s design decisions here, but will
only outline a few central ones. For each design decision we will describe the requirements
within the project, explain the alternatives based on the pattern language and considered
technologies/frameworks, and finally explain the solution.
Consider a project in which a large legacy system stores, delivers, and manages documents on
different kinds of storage devices. The system should get a additional web interface, however, the
old interfaces are still required. The code base is written entirely in C and has already distinct
APIs for access by custom GUI clients. There are a few crucial requirements for the web
interface, including load balancing, performance, and simple adaptation to the customer’s IT
infrastructure (i.e. simple adaptations of branding and layout of forms). For all other features the
web system should be as simple as possible and only add minimal additional costs to the overall
system. A management requirement is that a Java-based solution should be used, if possible.
The technology pre-selection for a Java-based solution limits the possible implementation
alternatives. We could use a non-Java web server and let a web server module access the Java-
based solution (e.g. via JNI). But this solution would require us to implement many parts of the
pattern language by hand that are already somehow available in Java. A more Java-like solution
is the use of a J2EE application server. Internally we can use different Java frameworks,
including servlets, EJBs, JSP, and others.
As the potential customers for the system are quite diverse, the web application system has
high demands for load balancing and failover for some customers, for others not. Here a simple
solution with the whole web application running in only one application server is not enough for
all customer sites, but it may be enough for some customers. That means a solution should be
34
scalable for customers with high hit rates, but yet impose no extra costs for customers with low
hit rates.
These requirements suggest a scalable LAYERS architecture with REACTORS for each LAYER.
INVOKERS are used for interconnecting the LAYERS. There is one primary ACCEPTOR/CONNECTOR
that accepts web requests running in a frontend web server. This frontend web server also
delivers static web pages and static page FRAGMENTS, and it invokes dynamic page creation in a
separate application server. The application server contains its own REACTOR waiting for requests
for dynamic content. Within the application server the INVOKER is situated. It is actually
responsible for invoking the REMOTE OBJECTS that connect to the legacy system. There are also
other REMOTE OBJECTS that perform other tasks than legacy integration. Note that there is also a
simple variant of the pattern INVOKER in the frontend web server that is responsible for
redirecting the invocations.
Each server has its own REACTOR, and INVOKERS are used to access the next LAYER. That
means, the system uses an event-based form of communication between these LAYERS. As a
positive consequence, there are no cyclic dependencies (back from lower LAYERS into higher
LAYERS), which would make it hard to debug and maintain the web application.
Another consequence of this architecture is that the web application is highly down- and up-
scalable. The extreme of down-scaling is that the whole web application runs within a single
server that serves as frontend web server and application server. The extreme of up-scaling is that
the primary ACCEPTOR/CONNECTOR is situated in a simple server that is just there for load
balancing, then there is LAYER of multiple, redundant web servers delivering static web pages,
then there is another load balancer, and finally there are multiple, redundant application servers.
Figure 7 shows this scalable server architecture.
WebServer
LegacySystemA
PIsWeb
Server
WebServer
LoadBalancer
ApplicationServer
ApplicationServer
ApplicationServer
Wrapper Facade Layer
LoadBalancer
Figure 7: Scalable Architecture for the Web Application. Mandatory elements have plain lines,
35
optional elements are presented with dotted lines. The optional elements are only present in
up-scaled installations.
The REMOTE OBJECTS in the application server provide access to the legacy system. Here, we
actually have two choices: In some cases it would be possible to let the web system access the
database directly (i.e. when data is only read from the database). In other cases we need WRAPPER
FACADES to access the routines of the C-based legacy system to ensure consistency of data,
especially when data is changed (i.e. the C-based interface access the internal database
exclusively).
Because a distinct C-based interface is already existing for separate clients within the legacy
system, writing WRAPPER FACADES is quite straightforward. A WRAPPER FACADE LAYER should
only be bypassed, if really necessary; here, the C-based wrappers are quite lightweight and
bypassing them for performance reasons is not necessary.
Note that in this architecture the legacy system is the bottleneck that cannot as easily be
replicated as web servers and application servers. It is necessary to synchronize access from
concurrent application servers to the legacy system. This can be done using MONITOR OBJECTS
that queue the access to the WRAPPER FACADES (here MONITOR OBJECT is chosen instead of
ACTIVE OBJECT because it is already supported by Java’s synchronized primitive). If this is too
much a performance penalty, direct access to the database and replication of the database might
be an option.
There are two kinds of REMOTE OBJECTS: COMPONENT WRAPPERS to the legacy system and
helper objects that contain application logic solely used for the web application. The REMOTE
OBJECTS are realized as Enterprise Java Beans (EJB). Here, we have two main choices: We can
use so-called entity beans that represent a long-living application logic object with container
managed persistence. Alternatively, we can use session beans. A session bean represents a single
client inside the application server with a SESSIONS abstraction. Session beans can either be
stateful or stateless.
The main task of the COMPONENT WRAPPERS in this architecture is to compose invocations to
the operations of the WRAPPER FACADES as needed by the web application. A central problem
here is that legacy functionalities are required to be called in some order, within the same
36
transaction, or with other invocation constraints. The same problem also arises for entity beans of
the web application (that do not access the legacy system).
As a solution to these problems, all REMOTE OBJECTS should be stateless and just used for
composition of invocations to the stateful entities (COMPONENT WRAPPERS and entity beans).
From the web, we do not access the stateful entities directly, but use the dedicated REMOTE
OBJECTS.
This solution is also chosen because entity beans provide many features (such as
synchronization, pooling, and lifecycle management) that impose a performance penalty. If
synchronization is already done in the WRAPPER FACADE LAYER for access to a legacy system
and/or if only concurrent read access to data is required, these features are not necessarily
required, and thus entity beans do not provide an optimal performance.
A more high-level and more complex alternative to the architecture described above is the
Java Connector Architecture (JCA). JCA is especially applicable for connecting to heterogeneous
information systems such ERP, mainframe transaction processing, database systems, and legacy
applications not written in the Java programming language. A COMPONENT WRAPPER (called
resource adapter in JCA) defines a standard set of contracts between an application server and the
information system: a connection management contract to let an application server pool
connections to an underlying legacy system, a transaction management contract between the
transaction manager and the legacy system, and a security contract to provide secure access to the
legacy system. As most of these features are not required or only in a simple fashion, JCA is not
chosen here. This design decision also avoids the performance overheads of connection,
transaction, and security management.
Invocations from frontend web server to application server are typically realized by embed-
ding URLs pointing to the pages or FRAGMENTS that are dynamically created in the application
server. Somehow the application server has to embed these links as well as session identifiers and
other context information in the web pages. This is typically done by multiple levels of
MARSHALLERS. First, there is the HTTP MARSHALLER of the web server. It marshalls and de-
marshalls all elements in the HTTP header fields. URL marshalling and de-marshalling is used to
access the correct REMOTE OBJECT. Other context information can either be transported in the
header fields, by means of cookies, or within the URL.
37
The overall system architecture already implements a simple form of CONTENT CACHE and
FRAGMENTS because static FRAGMENTS are “cached” in dedicated servers. If additional
performance is required, dynamic parts might be additionally cached by storing the FRAGMENTS
in a database. There is also a CONTENT CACHE for the documents retrieved from storage devices
within the legacy system. Thus a CONTENT CACHE in addition to these caching measures rather
seems to be an overhead.
There are some customizatio n requirements for the system, but the system functionality is
similar in most installations of the web application. To ensure rapid deployment at the customer,
a programmatic description of the web interface, as in CONTENT CREATOR pattern, is not the best
solution. Thus the CONTENT FORMAT TEMPLATE language Java Server Pages (JSP) is used. Java
servlets are used where programmatic, dynamic pages are required. JSP provides for simple
changeability at the customer, because only the templates need to be adapted to the customer’s
requirements. The downside is that the rapid changeability is limited to those changes envisioned
during design of the CONTENT FORMAT TEMPLATES .
Note that the architecture implements already a SERVICE ABSTRACTION LAYER, even if not used
as such. That means, future channels can be integrated easily. For instance, one possible addition
would be to offer a web service interface in the future.
Programmatic adaptations of COMPONENT WRAPPERS can be expected to be occurring in rare
cases because the legacy system has already a mature interface API that is in long use. Thus
changes can be simply introduced with ADAPTERS and DECORATORS. It is sensible to avoid the
implementation effort and performance penalty of a custom MESSAGE INTERCEPTOR architecture.
There are already different ways of POOLING supported in this architecture, including
connection pooling by the web server or component pooling by the EJB server. Thus it does not
seem necessary to add additional pools.
CONCLUSION
In this chapter, we have presented a pattern language for reengineering to the web. It is built from
patterns already published in other contexts. The main contribution of the pattern language is that
it provides clear alternatives and sequences for applying a project for reengineering to the web.
The patterns allow for important technical considerations in a mostly framework-neutral way
(like estimations and technology/framework selection), as well as a means for communication.
38
Also they provide design guidelines for system parts that have to be developed from scratch, such
as the wrappers and connections to the legacy system. Patterns, however, do not provide an out-
of-the-box solution, but a design effort is required for each project. As different systems in focus
of reengineering to the web projects can have quite different characteristics, this variability of the
pattern approach is a strength for finding a suitable solution in such a project.
Note that some parts of the pattern language are concerned with forward engineering of web
applications. That means they can also be applied without a connected legacy system; yet, in this
chapter we have focused on the more complex reengineering to the web case.
REFERENCES
[Alexander, 1979] Alexander, C. (1979). The Timeless Way of Building. Oxford Univ. Press.
[Bakken and Schmid, 2001] Bakken, S. S. and Schmid, E. (1997-2001). PHP manual.
http://www.php.net/manual/en/.
[Brant et al., 1998] Brant, J., Johnson, R. E., Roberts, D., and Foote, B. (1998). Evolution,
architecture, and metamorphosis. In Proc. of 12th European Conference on Object-Oriented
Programming (ECOOP’98), Brussels, Belgium.
[Buschmann et al., 1996] Buschmann, F., Meunier, R., Rohnert, H., Sommerlad, P., and Stal, M.
(1996). Pattern-orinented Software Architecture - A System of Patterns. J. Wiley and Sons
Ltd.
[Coar, 1999] Coar, K. A. L. (1999). The WWW common gateway interface – version 1.1.
http://cgi-spec.golux.com/draft-coar-cgi-v11-03-clean.html.
[Davidson, 2000] Davidson, J. (2000). Tcl in AOL digital city the architecture of a multi-
threaded high-performance web site. In Keynote at Tcl2k: The 7th USENIX Tcl/Tk Confer-
ence, Austin, Texas, USA.
[Fielding et al., 1999] Fielding, R., Gettys, J., Mogul, J., Frysyk, H., Masinter, L., Leach, P., and
Berners-Lee, T. (1999). Hypertext transfer protocol – HTTP/1.1. RFC 2616.
[Franks et al., 1999] Franks, J., Hallam-Baker, P., Hostetler, J., Lawrence, S., Leach, P., Luoto-
nen, A., and Stewart, L. (1999). Http authentication: Basic and digest access authentication.
RFC 2617.
[Gamma et al., 1994] Gamma, E., Helm, R., Johnson, R., and Vlissides, J. (1994). Design
Patterns: Elements of Reusable Object-Oriented Software. Addison-Wesley.
39
[Goedicke and Zdun, 2002] Goedicke, M. and Zdun, U. (2002). Piecemeal legacy migrating with
an architectural pattern language: A case study. Journal of Software Maintenance and
Evolution: Research and Practice, 14(1):1–30.
[Jain and Kircher, 2002] Jain, P. and Kircher, M. (2002). Leasing pattern. In Proceedings of the
Conference on Pattern Languages of Programs (PLoP), IIllinois, USA.
[Keller, 2003] Keller, W. (2003). Patterns for Object/Relational Access Layers.
http://www.objectarchitects.de/ObjectArchitects/orpatterns/.
[Kircher and Jain, 2002] Kircher, M. and Jain, P. (2002). In Proceedings of the 7th European
Conference on Pattern Languages of Programs (EuroPLoP), Irsee, Germany.
[Latteier, 1999] Latteier, A. (1999). The insider’s guide to Zope: An open source, object-based
web application platform. Web Review, 3(5).
[Neumann and Zdun, 2001] Neumann, G. and Zdun, U. (2001). Distributed web application
development with active web objects. In Proceedings of The 2nd International Conference on
Internet Computing (IC’2001), Las Vegas, Nevada, USA.
[Open Market, Inc., 1996] Open Market, Inc. (1996). FastCGI: A high-performance web server
interface. http://www.fastcgi.com/devkit/doc/fastcgi-whitepaper/fastcgi.htm.
[Schmidt et al., 2000] Schmidt, D. C., Stal, M., Rohnert, H., and Buschmann, F. (2000). Patterns
for Concurrent and Distributed Objects. Pattern-Oriented Software Architecture. J. Wiley and
Sons Ltd.
[Schumacher, 2003] Schuhmacher, M. (2003). Security Patterns. (2003).
http://www.securitypatterns.org.
[Sneed, 2000] Sneed, H. M. (2000). Encapsulation of legacy software: A technique for reusing
legacy software components. Annals of Software Engineering, 9.
[Sorensen, 2002] Sorensen, K. E. (2002). Sessions. In Proceedings of the 7th European
Conference on Pattern Languages of Programs (EuroPLoP), Irsee, Germany.
[Thau, 1996] Thau, R. (1996). Design considerations for the Apache server api. In Proceedings of
Fifth International World Wide Web Conference, Paris, France.
[Vckovski, 2001] Vckovski, A. (2001). Tcl Web. In Proceedings of 2nd European Tcl User
Meeting, Hamburg, Germany.
40
[Voelter et al., 2002] Voelter, M., Kircher, M., and Zdun, U. (2002). Object-oriented remoting: A
pattern language. In Proceedings of The First Nordic Conference on Pattern Languages of
Programs (VikingPLoP), Denmark.
[Vogel, 2001] Vogel, O. (2001). Service abstraction layer. In Proceedings of the 6th European
Conference on Pattern Languages of Programs (EuroPLoP), Irsee, Germany.
[Vogel and Zdun, 2002] Vogel, O. and Zdun, U. (2002). Content conversion and generation on
the web: A pattern language. In Proceedings of the 7th European Conference on Pattern
Languages of Programs (EuroPLoP), 2002, Irsee, Germany.
[Welch, 2000] Welch, B. (2000). The TclHttpd web server. In Proceedings of Tcl2k: The 7th
USENIX Tcl/Tk Conference, Austin, Texas, USA.
[Zdun, 2002a] Zdun, U. (2002a). Language Support for Dynamic and Evolving Software A
chitectures. PhD thesis, University of Essen, Germany.
[Zdun, 2002b] Zdun, U. (2002b). Xml-based dynamic content generation and conversion for the
multimedia home platform. In Proceedings of the Sixth International Conference on Integrated
Design and Process Technology (IDPT), Pasadena, USA.