Lecture 2: Basic Enterprise Architecture Modules and Patterns
Gustavo Alonso
Systems Group
Computer Science Department
Swiss Federal Institute of Technology (ETHZ)
[email protected]://www.iks.inf.ethz.ch/
©Gustavo Alonso, ETH Zürich. 2
Contents
Communication
• Synchronous (blocking) interaction
• Asynchronous (non-blocking) interaction
• Batch transfer
Additional modules
• Name and directory services
• Persistence
• Security
• Transactions
• Routing and filtering
Examples of use of patterns
• Hardware fault tolerance patterns
• Software fault tolerance patterns
• Performance patterns
Synchronous and Asynchronous Interaction
Synchronous interaction
• Blocking communication (the caller waits until the called responds)
• Communication modeled as request/response
• Good match for programming language modularity:
  - Request is a method call
  - Response is the return of a method call
  - Programming model does not change
  - Matches the semantics of the programming language (parameters, variables, methods)
[Figure: program control flow and inter-process communication (local, LAN, WAN)]
Properties of the synchronous pattern
Advantages
• Tightly coupled interaction: speed, simplicity
• Simple to understand for developing and debugging
• Close match to programming languages (RPC, RMI)
• Easy to define interfaces between the interacting parties
Disadvantages
• Tightly coupled interaction: reduced fault tolerance, introduces distributed dependencies, makes maintenance and upgrading more complex
• Too simple to allow realistic interactions (must be extended with other patterns)
Fault tolerance in tight coupling
Quick review of basics:
• Probability of event A per unit of time = P(A)
• Mean time to event A = 1/P(A) (memoryless, small P(A))
• For several independent events, the mean time to the occurrence of any of those events = 1/(P(A) + P(B) + P(C))
With tight coupling (client = caller; server = called):
• MTTF client = C
• MTTF server = S
• MTTF system = 1/(1/C + 1/S)
• If C = S, then MTTF system = C/2
With tight coupling, the MTTF is reduced by half (assuming equal failure probabilities for each component).
With N such components, MTTF = C/N
Simple example
• Probability that the client fails: 0.01 per day
• Probability that the server fails: 0.001 per day
• Mean Time To Failure: client = 100 days, server = 1000 days
• Mean Time To Failure of the client-server system = 1/(0.01 + 0.001) = 90.9 days
In a client-server system, the overall availability will be lower than the availability of either the client or the server
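As a rough check of the arithmetic above, the following sketch adds failure rates under the stated assumptions (independent, memoryless failures with small probabilities). The function name `mttf_series` is only illustrative and not part of the lecture.

```python
def mttf_series(*mttfs):
    """MTTF of a tightly coupled system that fails as soon as ANY component fails.

    Assumes independent, memoryless failures, so the failure rates (1/MTTF)
    of the components simply add up.
    """
    return 1.0 / sum(1.0 / m for m in mttfs)


# Client fails with probability 0.01 per day, server with 0.001 per day.
client_mttf = 1 / 0.01    # 100 days
server_mttf = 1 / 0.001   # 1000 days
system_mttf = mttf_series(client_mttf, server_mttf)

# N identical components with MTTF = C give roughly C/N.
four_identical = mttf_series(100.0, 100.0, 100.0, 100.0)

print(f"client-server MTTF: {system_mttf:.1f} days")
print(f"four identical components: {four_identical:.1f} days")
```

The first value should come out close to the 90.9 days quoted on the slide, and the second close to 100/4 = 25 days.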
Asynchronous interaction
• Non-blocking communication (neither caller nor called must wait)
• Communication modeled as messages and events
• Not a good match for existing programming languages:
  - Streams rather than calls
  - Asynchronous control flow
  - Impedance mismatch
[Figure: program control flow and inter-process communication (local, LAN, WAN)]
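To contrast the two interaction styles, here is a minimal single-process sketch. A plain function call stands in for the synchronous request/response, and a `queue.Queue` with a worker thread stands in for the messaging system. All names (`blocking_call`, `inbox`, `worker`) are illustrative assumptions, and a real system would use a network messaging layer instead of an in-memory queue.

```python
import queue
import threading


def blocking_call(x):
    # Synchronous style: the caller waits here until the result is returned.
    return x * 2


inbox = queue.Queue()   # stands in for the messaging system
results = []            # filled in by the receiver, not returned to the sender


def worker():
    # The receiver processes messages whenever they arrive.
    while True:
        msg = inbox.get()
        if msg is None:          # sentinel value: no more messages
            break
        results.append(msg * 2)


receiver = threading.Thread(target=worker)
receiver.start()

# Asynchronous style: put() on an unbounded queue returns immediately,
# so the sender does not wait for the message to be processed.
for i in range(3):
    inbox.put(i)
inbox.put(None)

receiver.join()
print(blocking_call(3), results)
```

Note how the asynchronous version makes communication explicit (send, receive) and needs an extra element, the queue, between the two parties.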
Properties of the asynchronous pattern
Advantages
• Loosely coupled: makes interacting parties independent; messages can be processed in flight
• Easier to implement reliable delivery
• Additional functionality can be implemented in the messaging system rather than at the communication ends
Disadvantages
• Loosely coupled: requires a messaging system, overhead, impedance mismatch
• Communication is explicit (send, receive, forward)
• Interaction is more complex and involves more elements: more difficult to trace, monitor, and debug
Architectural possibilities of the asynchronous pattern
• One to many
• Many to one
• Workflow
• Persistent messaging
• Message forwarding
Batch transfer
Batch transfer is a form of asynchronous communication used for:
• Large amounts of data
• File-based exchanges (ftp)
• Data collections
• Batch update jobs
• Data uploads
We will not mention it too often during the course, but keep in mind that for certain tasks batch transfer is the best solution and that it complements the other two:
• Synchronous = parameters
• Asynchronous = messages
• Batch = files, collections, …
Additional architectural modules
Name and directory service
• Most basic extension to the synchronous interaction pattern:
  - Avoid having to name the destination
  - Ask where the destination is
  - Then bind to the destination
• Advantages:
  - Development is independent of deployment properties (e.g., network address)
  - More flexibility: change of address
  - Can be combined with load balancing, monitoring, routing, advanced service search
[Figure: name and directory service. 1. register, 2. lookup, 3. address, 4. request, 5. response]
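The register / lookup / address / request / response sequence can be sketched in a few lines. Everything here is hypothetical: the `NameService` class, the `network` dictionary that plays the role of the real network, and the addresses are made up for illustration.

```python
class NameService:
    """Toy directory: maps service names to addresses."""

    def __init__(self):
        self._table = {}

    def register(self, name, address):      # step 1: register
        self._table[name] = address

    def lookup(self, name):                  # steps 2 and 3: lookup, address
        return self._table[name]


# Stand-in for the network: address -> handler function (an assumption).
network = {}


def deploy(address, handler):
    network[address] = handler


def call(address, request):                  # steps 4 and 5: request, response
    return network[address](request)


names = NameService()
deploy("10.0.0.5:9000", lambda req: f"quote for {req}")
names.register("quotes", "10.0.0.5:9000")

# The client only knows the name "quotes", never the address.
response = call(names.lookup("quotes"), "ETHZ")

# Moving the service only requires a new registration; client code is unchanged.
deploy("10.0.0.9:9000", lambda req: f"quote for {req}")
names.register("quotes", "10.0.0.9:9000")
moved_response = call(names.lookup("quotes"), "ETHZ")
```

Because the client binds through the directory, a change of address is invisible to it, which is the flexibility the slide refers to.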
Persistence
• Persistence is used in all patterns to ensure reliability and recoverability
• Persistence keeps a record on stable storage of the relevant state changes of a system
• Can be implemented on the file system or on databases
• Persistence does not change the interaction or the nature of the architecture, but it does confer properties that are important for fault tolerance
• Persistence is typically expensive but often unavoidable
[Figure: persistent messaging, persistent objects, logging]
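One simple way to keep a record of state changes on stable storage is an append-only log that is replayed after a restart. The sketch below assumes a file-system implementation; the `PersistentStore` class and the record format are invented for illustration and leave out everything a real system needs (checksums, log truncation, concurrency control).

```python
import json
import os
import tempfile


class PersistentStore:
    """Key-value state backed by an append-only log file."""

    def __init__(self, log_path):
        self.log_path = log_path
        self.state = {}
        self._recover()

    def _recover(self):
        # Rebuild the in-memory state by replaying every logged change.
        if os.path.exists(self.log_path):
            with open(self.log_path) as log:
                for line in log:
                    record = json.loads(line)
                    self.state[record["key"]] = record["value"]

    def put(self, key, value):
        # Log first, force to disk, then apply in memory.
        with open(self.log_path, "a") as log:
            log.write(json.dumps({"key": key, "value": value}) + "\n")
            log.flush()
            os.fsync(log.fileno())   # the expensive part of persistence
        self.state[key] = value


log_file = os.path.join(tempfile.mkdtemp(), "changes.log")

store = PersistentStore(log_file)
store.put("order-1", "paid")
store.put("order-2", "shipped")

# Simulate a crash: the in-memory object is lost, only the log survives.
del store
recovered = PersistentStore(log_file)
```

The forced write on every change illustrates why persistence is expensive, and the replay on start-up illustrates recoverability.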
Security
• Security has many aspects: authentication, authorization, confidentiality, …
• Sometimes it involves patterns: authorization (credentials, log in, certificates)
• Other times it is part of the infrastructure: cells, domains, controls in the message layer
• In the enterprise, security is very important but does not figure prominently in the architecture because it is assumed to be built into the interactions (this leads to several problems …)
Transactions
• Transactions establish guarantees on interactions:
  - Atomicity: all or nothing
  - Recoverability: ability to recall what happened and reconstruct a previous state of the system
• Implemented through an additional module that keeps track of transactions and runs transactional protocols
[Figure: transaction manager. 1. begin, 2. transactional context, 3-5. requests, 6. commit, 7. two-phase commit]
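The two-phase commit step run by the transaction manager can be sketched as follows. This is a deliberately stripped-down illustration under strong assumptions: no logging, no timeouts, no crash recovery, and the `Participant` class with its `can_commit` flag is hypothetical.

```python
class Participant:
    """A resource taking part in a transaction (hypothetical interface)."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "active"

    def prepare(self):
        # Phase 1: vote yes (and promise to commit) or vote no.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def abort(self):
        self.state = "aborted"


def two_phase_commit(participants):
    # Phase 1 (voting): ask every participant whether it can commit.
    votes = [p.prepare() for p in participants]

    # Phase 2 (decision): commit only if ALL voted yes, otherwise abort all.
    if all(votes):
        for p in participants:
            p.commit()
        return "committed"
    for p in participants:
        p.abort()
    return "aborted"


all_yes = [Participant("database"), Participant("queue")]
outcome_all_yes = two_phase_commit(all_yes)

one_no = [Participant("database"), Participant("queue", can_commit=False)]
outcome_one_no = two_phase_commit(one_no)
```

A single "no" vote aborts every participant, which is what gives the all-or-nothing (atomicity) guarantee.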
Routing and filtering
• Routing directs calls to the most appropriate service. It works for both synchronous and asynchronous patterns
• Routing can be based on:
  - Performance (load balancing)
  - Availability (what works)
  - Contents (e.g., price value)
  - …
• Filtering is similar to routing but may also involve:
  - Eliminating messages or calls (incorrect data)
  - Modifying messages or calls (to extend the data or adapt it to a new interface)
  - Sorting and prioritizing
[Figure: router]
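A content-based router and a filter can be sketched as below. The message fields (`price`, `currency`), the threshold, and the service names are assumptions chosen only to make the example concrete.

```python
def make_router(routes, default):
    """Build a content-based router.

    `routes` is a list of (predicate, service_name) pairs; the first
    predicate that matches the message decides the destination.
    """
    def route(message):
        for predicate, service in routes:
            if predicate(message):
                return service
        return default
    return route


def filter_messages(messages):
    """Eliminate, modify, and prioritize messages."""
    # Eliminate messages with incorrect data (missing or negative price).
    valid = [m for m in messages if m.get("price", -1) >= 0]
    # Modify: extend the data with a default currency where none is given.
    extended = [{**m, "currency": m.get("currency", "CHF")} for m in valid]
    # Sort and prioritize: highest price first.
    return sorted(extended, key=lambda m: m["price"], reverse=True)


route = make_router(
    [(lambda m: m["price"] > 1000, "premium-service")],
    default="standard-service",
)

incoming = [
    {"price": 10},
    {"price": -1},                        # incorrect data, will be dropped
    {"price": 2000, "currency": "USD"},
]
cleaned = filter_messages(incoming)
destinations = [route(m) for m in cleaned]
```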
Organizing the architectural modules
What is common to all of them?
• All these additional modules have one aspect in common: they involve introducing an additional module layer where the new functionality is available
• Why as a module or additional layer:
  - Optional use
  - Can be added to already existing systems without changing them much
• When all these modules are taken together, homogenized and included in a single platform, the result is an enterprise middleware tool.
Module proliferation
• Starting from the simplest pattern (synchronous interaction), adding any new functionality implies additional modules: name and directory service, transactions, security, …
• Historically these modules have been added in an increasingly structured manner:
  - Ad hoc, code-level compatibility (e.g., DCE RPC)
  - Model specific, specification-level compatibility (e.g., CORBA)
  - Model independent, specification-level compatibility (e.g., Web Services)
• The transition from 2-Tier architectures to 3-Tier architectures also happened as a result of attempts to organize the additional modules a 2-Tier architecture needed anyway.
A historical tour of architectural modules
• Middleware platforms were traditionally built around one or two key design decisions (transactions = TP Monitors, transactional OO design = Object Monitors, persistence).
• Different platforms and products were conceptually similar but incompatible at all levels
• Because they were conceptually so similar, some systems were used for their overlapping functionality rather than for their key aspects (e.g., CORBA reinvented the wheel in many areas)
[Figure: platform modules: RPC, name services, persistence, security, transactions, runtime engine]
EAI in the 80’s - 90’s
• The proliferation of products, functionality, systems, and services led (leads) to a wildly heterogeneous mix of platforms, models, interfaces, and technologies
• With the transition from 2-Tier to 3-Tier, the advent of faster networks, and eventually the Internet, the need to make it all work together increased significantly
• Hence the need for:
  - Standardization
  - Enterprise Architecture
[Figure: several platforms side by side, each with its own RPC, name services, persistence, security, transactions, and runtime engine]
Examples of the use of patterns: Hardware fault tolerance
Hardware fault tolerance
• Enterprise systems require a high degree of reliability and fault tolerance
• This can be achieved through:
  - Hardware (high-end machines, RAID systems)
  - Software (architectural patterns)
• We start with hardware patterns to illustrate the basic principles and to show why certain hardware is always needed to guarantee certain levels of fault tolerance
Key concepts
• Modularity
  - Separates functionality into black boxes
  - Modules can be made redundant
• Failfast
  - Clean failure semantics
  - Detects failures and stops (failfast), or
  - Forwards only results from working modules (failvote)
• Recovery
  - Repairing a faulty module after the failure
  - Mean Time To Repair
We assume memoryless systems, independent failures, and small probabilities.
Failfast patterns
• Pairing or duplexing
  - Two modules
  - Compare outputs
  - If they disagree, stop (failure detected)
• Can be generalized to N modules
  - Works as long as a majority of modules work
  - Output is the output of the majority of modules
  - No majority = failure (stop)
• Triple Module Redundancy (TMR): using 3 modules
• Recursive failfast patterns
  - Used to detect comparator failures
  - Reduced MTTF
[Figure: module configurations with comparators]
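The comparator logic behind pairing and majority voting can be illustrated in software. This is only a sketch of the comparison rule, not of real hardware; the names `compare_pair`, `majority_vote`, and `ModuleFailure` are made up for the example.

```python
from collections import Counter


class ModuleFailure(Exception):
    """Raised when the comparator detects a failure and the system must stop."""


def compare_pair(output_a, output_b):
    # Pairing / duplexing: if the two modules disagree, stop.
    if output_a != output_b:
        raise ModuleFailure("pair outputs disagree")
    return output_a


def majority_vote(outputs):
    # N modules: forward the value produced by a strict majority, else stop.
    value, count = Counter(outputs).most_common(1)[0]
    if 2 * count > len(outputs):
        return value
    raise ModuleFailure("no majority among module outputs")


# Triple Module Redundancy: one faulty module is outvoted by the other two.
tmr_result = majority_vote([42, 42, 7])
pair_result = compare_pair(42, 42)
```

In both cases the important property is the clean failure behaviour: the comparator either forwards an agreed value or stops, it never forwards a disputed one.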
Simple analysis (failvote) I
• Simple pair
  - MTTF module = 10 years
  - MTTF pair = 5 years
  - Stops as soon as there is no majority working (the important thing is that it stops)
  - For triplex = 8.3 years
• Redundant pair
  - With failvote, MTTF does not improve because the system fails as soon as one of the two pairs fails
  - MTTF module = 10 years
  - MTTF pair = 5 years
  - MTTF redundant pair = 2.5 years
  - The redundant pair tolerates failures in the connectors and comparators
[Figure: simple pair and redundant pair with comparators]
Simple analysis (failvote) II
• Why do this, then?
  - MTTF decreases
  - Significant redundancy needed
• The reasons are:
  - Failfast is important as it provides clean semantics
  - Differences between transient and permanent failures
• These patterns can mask transient failures by simply signaling an invalid output
  - Ratio 1:100 hard/soft errors
  - MTTF simple pair = 500 years
  - MTTF redundant pair = 250 years
  - MTTF pair and spare = 375 years
• Failfast (instead of failvote) will increase the MTTF as the number of modules increases
• Pair and spare
  - MTTF module = 10 years
  - MTTF system = 7.5 years (take the mean time until any of the 4 modules of the redundant pair fails, 2.5 years, then add the mean time until either of the two remaining modules fails, 5 years; total 2.5 + 5)
[Figure: pair and spare: two pairs with comparators combined by OR]
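The 10-year figures in the two analysis slides can be reproduced with a few lines, assuming memoryless, independent failures and no repair. The helper names are illustrative, and the general formula in `failvote_mttf` (sum the waiting times while a majority of the original modules is still working) is an assumption that matches the pair and triplex numbers given above.

```python
def time_to_first_failure(module_mttf, n):
    # With n independent working modules, the first failure arrives n times faster.
    return module_mttf / n


def failvote_mttf(module_mttf, n):
    # The system keeps running while a majority of the ORIGINAL n modules works.
    # Sum the expected waiting times with n, n-1, ..., majority modules alive.
    majority = n // 2 + 1
    return sum(module_mttf / k for k in range(majority, n + 1))


M = 10.0  # MTTF of a single module, in years

simple_pair = failvote_mttf(M, 2)                 # stops at the first failure
triplex = failvote_mttf(M, 3)                     # survives one failure
redundant_pair = time_to_first_failure(M, 4)      # any of 4 modules failing stops it
pair_and_spare = (time_to_first_failure(M, 4)     # first pair is lost ...
                  + time_to_first_failure(M, 2))  # ... then the spare pair

for name, value in [("simple pair", simple_pair), ("triplex", triplex),
                    ("redundant pair", redundant_pair),
                    ("pair and spare", pair_and_spare)]:
    print(f"{name}: {value:.1f} years")
```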
High availability through recovery
• High availability is achieved when the failfast patterns are combined with recovery of failed modules
• Mean Time To Repair (MTTR) = time between a failure and the module working again
• Probability that one module is unavailable = MTTR/(MTTR+MTTF)
• If MTTF >> MTTR, then MTTR/(MTTR+MTTF) -> MTTR/MTTF
• Probability of failure of a redundant system with n modules: (n/MTTF)*(MTTR/MTTF)^(n-1)
• MTTF for such a system is then (MTTF/n)*(MTTF/MTTR)^(n-1)
• Assume modules with MTTF = 1 year, MTTR = 4 hours:
  - MTTF simple pair = 1095 years
  - MTTF triplex = 1’600’000 years
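The two example figures follow directly from the formula (MTTF/n)*(MTTF/MTTR)^(n-1). The sketch below plugs in the numbers, assuming a year of 365 days; the function names are illustrative.

```python
HOURS_PER_YEAR = 24 * 365  # 8760, assuming a 365-day year


def unavailability(mttf, mttr):
    # Probability that a single module is down at a random point in time.
    return mttr / (mttr + mttf)


def redundant_mttf(module_mttf, mttr, n):
    # MTTF of a redundant system of n repairable modules, per the slide's formula.
    return (module_mttf / n) * (module_mttf / mttr) ** (n - 1)


module_mttf_hours = 1 * HOURS_PER_YEAR   # MTTF = 1 year
mttr_hours = 4                           # MTTR = 4 hours

pair_years = redundant_mttf(module_mttf_hours, mttr_hours, 2) / HOURS_PER_YEAR
triplex_years = redundant_mttf(module_mttf_hours, mttr_hours, 3) / HOURS_PER_YEAR
single_down = unavailability(module_mttf_hours, mttr_hours)

print(f"pair:    {pair_years:,.0f} years")
print(f"triplex: {triplex_years:,.0f} years")
print(f"single module unavailable: {single_down:.6f}")
```

The pair comes out at 1095 years and the triplex at roughly 1.6 million years, matching the values on the slide.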
Examples of the use of patterns: Software fault tolerance
Notation (IBM patterns)
We will mostly use a notation proposed by IBM to describe patterns. Types of patterns:
• Business patterns: describe the interaction at a high level
• Integration patterns: describe the way systems can be connected
• Composite patterns: combination of business and integration patterns
• Application patterns: logical components that make up a solution and the way they interact
• Runtime patterns: refinement of the application pattern, mapping logical components to physical run-time nodes
• Product mappings: map the runtime and application patterns into concrete products implementing the necessary functionality
From IBM Patterns for e-business
Making a simple system highly available
• Assume a simple interaction:
  - User outside the firewall (e.g., browser over the Internet)
  - Presentation and Application are “local”
  - Synchronous interaction
  - IBM = stand-alone single channel application pattern
• The question is how to make it highly available
• We will do this by progressively introducing patterns and layers, each one conferring a new property on the system
Figures from “Patterns for the edge of network”. Voegeli & Braswell - IBM Redbook, Nov. 2002
Rules for high availability
Rules are similar to the ones described for hardware fault tolerance:
• Redundancy
  - There must be a replacement for every module that can fail
  - This implies modularity (as in hardware)
• Monitoring for failure
  - Detecting that a failure has occurred
  - This implies some sort of comparator (as in hardware)
  - Failures are also software (exceptions, error codes)
• Suppressing failed entities
  - Once a module is determined to be faulty, it should be removed
  - This implies an awareness of all member modules and their status
  - Unlike in hardware, membership can be dynamic
Basic auxiliary modules
There are several options to group modules so that they provide redundancy:
• High availability pair
  - Primary-backup: one module does the work, the other is on stand-by in case of failures
  - Peer pairs: both modules work in parallel and monitor each other
• Cluster: several modules running on a set of parallel entities (processes, machines), typically with no cross monitoring and not aware of each other
• Pool: a special type of cluster where the modules are threads residing in a single machine
• Load balancer: module that is aware of all modules in a cluster or pool and is in charge of:
  - Monitoring
  - Distributing jobs
  - Suppressing failed modules
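The three duties of the load balancer (monitoring, distributing jobs, suppressing failed modules) can be sketched as follows. This is a toy in-process version: servers are plain functions, a raised exception stands in for a detected failure, and the class and server names are assumptions.

```python
class LoadBalancer:
    """Round-robin balancer that drops servers which fail a request."""

    def __init__(self, servers):
        self.servers = dict(servers)        # name -> callable handling a job
        self.alive = sorted(self.servers)   # membership list, can shrink
        self._turn = 0

    def dispatch(self, job):
        while self.alive:
            # Distributing jobs: pick the next live server in round-robin order.
            name = self.alive[self._turn % len(self.alive)]
            self._turn += 1
            try:
                return name, self.servers[name](job)
            except Exception:
                # Monitoring + suppressing: a failing server is removed
                # and the job is retried on another one.
                self.alive.remove(name)
        raise RuntimeError("no working servers left")


def healthy(job):
    return f"done:{job}"


def broken(job):
    raise ConnectionError("server down")


balancer = LoadBalancer({"app1": healthy, "app2": broken})
first = balancer.dispatch("a")
second = balancer.dispatch("b")   # tries app2, suppresses it, falls back to app1
```

Real load balancers usually detect failures with separate health checks rather than by losing a request, so treat the retry-on-exception logic as a simplification.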
Basic pattern (no high availability)
• Outside world = Internet
• DMZ = “demilitarized zone”, internal to the company but not trusted (no confidential material reachable from the outside)
Option 1: single load balancer
• The single load balancer distributes requests to two application servers
• The application servers implement the presentation and application layers for the application
• Provides redundancy for the application server
• Scalability is achieved by adding more servers to the cluster
• No redundancy for the load balancer
Option 2: hot standby load balancer
• To improve over option 1, one can introduce a hot standby backup for the load balancer
• This is a primary/backup pair where the second load balancer is not active but ready to take over in case of failure (failure detection by heartbeat exchanged between the load balancers)
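Heartbeat-based failure detection in the standby can be sketched with a simple timeout rule. The class name, the timeout value, and the explicit `now` parameter (used instead of a real clock so the behaviour is easy to follow) are all assumptions for illustration.

```python
class StandbyLoadBalancer:
    """Hot standby that takes over when the primary's heartbeats stop."""

    def __init__(self, timeout):
        self.timeout = timeout        # seconds of silence tolerated
        self.last_heartbeat = 0.0
        self.active = False           # the standby starts passive

    def on_heartbeat(self, now):
        # Called whenever a heartbeat from the primary arrives.
        self.last_heartbeat = now

    def check(self, now):
        # Called periodically; take over if the primary has been silent too long.
        if not self.active and now - self.last_heartbeat > self.timeout:
            self.active = True
        return self.active


standby = StandbyLoadBalancer(timeout=3.0)
standby.on_heartbeat(now=10.0)

still_passive = standby.check(now=12.0)   # 2 s of silence: primary assumed alive
took_over = standby.check(now=14.5)       # 4.5 s of silence: standby takes over
```

A timeout cannot distinguish a crashed primary from a slow network, so the chosen value trades takeover speed against the risk of both balancers being active at once.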
Option 3: mutual high availability
• Two load balancers monitoring each other (heartbeat)
• Each one with its own cluster
• Takeover (aliasing of IP address) if one load balancer fails
• The system is now highly available and scalable, but it is also more complex
Option 4: wide area load balancing
• Like Option 2 but with load balancers being able to forward load to remote load balancers
• It balances entire sites rather than modules
• Provides high availability for site failures
Examples of the use of patterns: Performance patterns
Performance
For our purposes here, the performance of a system can be improved by:
• Adding resources: add more modules so that more requests can be processed in parallel (redundancy)
• Lowering the workload: organize the architecture so that certain operations take less work to complete (caching, specialization of modules)
• Splitting the workload and parallelizing: divide the tasks into sub-tasks and organize the architecture so that some of these sub-tasks can be executed in parallel
The important aspect of these patterns is what they make possible in terms of scalability, since migrating from one to another is not easy (and sometimes not possible at all without redesigning everything). Once the architecture is fixed, so are many of the properties of the system
Starting pattern
• Simple web server application
• Two application servers, initially tying together web server and application server functionality
• What are the architectural variations that will increase performance?
Specialization
• Separate the application server from the web server:
  - Web server redirector: static HTML content, request forwarding
  - Application server: application-specific functionality
• Calls to static content take less time and create less work
• Security is increased by moving the application servers behind the domain firewall
• Makes it easier to add resources where the bottleneck is (static content, dynamic content, processing, etc.)
More specialization: Separation
• Separate the presentation layer from the application server layer (in essence, turn the application server into a 2-Tier system):
  - Presentation servers take care of tasks such as page formatting, composing frames, generating HTML, etc.
  - Application servers run the business logic
• The advantages at the application server level are the same as the advantages of the 2-Tier model
• Requires that the presentation layer can be separated from the application layer
Lower workload: caching proxy
• Add a pure caching proxy: a module that only caches data and does not do any processing or load balancing. It stores complete pages and matches them to the request; if there is a match, the page is returned without any further processing
• Reduced response time for content that can be cached (dynamic or static)
• Eliminates workload from the rest of the system
• Caches need to be maintained to avoid stale data
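A pure caching proxy can be sketched as a lookup table in front of the backend. The expiry time (`ttl`) is one possible way of limiting stale data and is an assumption of this sketch, as are the class name and the explicit `now` parameter used in place of a real clock.

```python
class CachingProxy:
    """Stores complete pages and returns them on a match, without processing."""

    def __init__(self, backend, ttl):
        self.backend = backend   # function: url -> page (the rest of the system)
        self.ttl = ttl           # seconds a cached page is considered fresh
        self._cache = {}         # url -> (expires_at, page)

    def get(self, url, now):
        entry = self._cache.get(url)
        if entry is not None and entry[0] > now:
            return entry[1]                   # match: no further processing
        page = self.backend(url)              # miss or stale: ask the backend
        self._cache[url] = (now + self.ttl, page)
        return page


backend_calls = []


def backend(url):
    backend_calls.append(url)
    return f"<html>{url}</html>"


proxy = CachingProxy(backend, ttl=60)
page_first = proxy.get("/home", now=0)     # miss: goes to the backend
page_cached = proxy.get("/home", now=30)   # hit: served from the cache
page_expired = proxy.get("/home", now=61)  # entry expired: backend again
```

Only two of the three requests reach the backend, which is the workload reduction the pattern aims for.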
Lessons learned
• Some architectural designs can be captured in the form of patterns
• Understanding these patterns is important to understand the properties of each architecture and to be able to make the right design decisions
• There are many patterns and many possible combinations between them
• The cost of transitioning from one pattern to another is important as well:
  - Adding proxy caches is relatively easy
  - Splitting an application is difficult