distributed systems 2006 virtual synchrony ii* *with material adapted from ken birman, ben wang,...

35
Distributed Systems 2006 Virtual Synchrony II* *With material adapted from Ken Birman, Ben Wang, Bill Burke, Bela Ban

Post on 22-Dec-2015

222 views

Category:

Documents


0 download

TRANSCRIPT

Distributed Systems 2006

Virtual Synchrony II*

*With material adapted from Ken Birman, Ben Wang, Bill Burke, Bela Ban

Distributed Systems 2006 2

Plan

  We skip Section 18.2

Tracking group membership: We’ll base it on 2PC and 3PC

Fault-tolerant multicast: We’ll use membership

Ordered multicast: We’ll base it on fault-tolerant multicast

Tools for solving practical replication and availability problems: we’ll base them on ordered multicast

Robust Web Services: We’ll build them with these tools

2PC and 3PC: Our first “tools” (lowest layer)

Distributed Systems 2006 3

Recap: Elements of Virtual Synchrony

  Support for process groups– Processes may join and leave dynamically – Excluded if (thought to) fail

  Various reliable, ordered multicast protocols– Strive to replace synchronous, totally-ordered, dynamically uniform

protocols  View-synchronous delivery

– Two processes that are member of the same view receive same set of multicasts during that view

  Identical process groups views and rankings– Identical sequences of group membership lists

  Gap-freedom guarantees– If, after a failure, m1 has been delivered to a destination, then any

message m0 that should be delivered prior to m1 is also delivered  State transfer for joining processes

– A new process may obtain group state from existing member– (Will develop this shortly)

Distributed Systems 2006 4

Distributed Algorithms

  Election  Consensus  Consistent snapshot  Replicated data and synchronization  State transfer  Load-balancing  Fault tolerance

– Primary-backup– Coordinator-cohort

Distributed Systems 2006 5

State transfer

  We need to transfer current state of a process group to joining members?

  ”State” is application-dependent– Creating state should be done by

application itself– E.g., org.jgroups.MessageListener

• byte[] getState()• void setState(byte[] state)

  Simple approach– Just make all members transfer

state to joining process

  Reasonable approach– Joining process pulls state from

one existing member– Take another if first one fails

Distributed Systems 2006 6

Load-Balancing

  Coordinate members of a group to share workload in order to obtain a speed-ud for parallelism?

  Use groups communication to implement load-balancing of request...

  Styles of algorithms– Group decides who should handle request– Client decides who should handle request

Distributed Systems 2006 7

Group Decides

  Multicast request from client to full membership– May require expensive transfer to all

  Need deterministic rule for deciding who should handle request– E.g., with abcast, requests may be numbered in a

total order– A process may handle the ith request if the process

rank is i mod n

  With cbcast?

Distributed Systems 2006 8

Client affinity schemes

  Group members provide clients with information used to select appropriate serve to send request to

  Best choice dependent on data size, fault-tolerance needed, queries/updates, ...

  E.g.,– Static assigment of client to specific server

• + caching, - for very active clients

– Pick a random server– Base choice on (approximate) load information

• Could also be used with previous approach

Distributed Systems 2006 9

Using approximate load information

  Assume that processes send out load reports using abcast– E.g., 0 for no load, 1 for currently handling 1 request etc.

  Represent load on group of n servers as a vector: [l0, ..., ln-1]– lmax = max(l0, ..., ln-1) + 1– [l0’, ..., ln-1’] = [lmax - l0, ..., lmax - ln-1]– L’ = l0 ’ + ... + ln-1 ’

  Map incoming requests to process– Given a request, choose a random number, r, between 0 and L’

• By applying pseudo-random generator, same seed at all processes

– Now choose process i if l0’ + ... + li’ <= r < l0’ + ... + li+1’ (i < n-1)

  (Think of l0’, ..., li’ as points on a line with length L’, then the algorithm selects the segment that r is within)

Distributed Systems 2006 10

Fault tolerance

  We want to offer clients “fault-tolerant request execution”?

  We can replace a traditional service with a group of members– Each request is assigned to a

primary (ideally, spread the work around) and a backup

• Primary sends a “cc” of the response to the request to the backup

– Backup keeps a copy of the request and steps in only if the primary crashes before replying

  Sometimes called “coordinator/cohort” just to distinguish from “primary/backup”

Distributed Systems 2006 11

Trade-offs

Distributed Systems 2006 12

Trade-offs

  Membership– Static

• Fixed membership, changing connectivity and availability

– Dynamic• Changing membership, fixed

connectivity and availability

  Consistency– Internal

• Defined with respect to members of group observing messages

– External• Defined with respect to

external observer (e.g., a database)

Distributed Systems 2006 13

Toolkits

  Isis  Horus  Ensemble  JGroups  Spread

Distributed Systems 2006 14

Features of major virtual synchrony platforms

  Isis: first and no longer widely used– But was perhaps the most successful; has major roles

in NYSE, Swiss Exchange, French Air Traffic Control system (two major subsystems of it), US AEGIS Naval warship

– Pioneered use of cbcast– Also was first to offer a publish-subscribe interface

that mapped topics to groups

Distributed Systems 2006 15

Features of major virtual synchrony platforms

  Horus, JGroups and Ensemble– Successors to Isis– These focus on flexible protocol stack linked directly into

application address space• A stack is a pile of micro-protocols• Can assemble an optimized solution fitted to specific needs of the

application by plugging together “properties this application requires”, lego-style

• The system is optimized to reduce overheads of this compositional style of protocol stack

  Use– JGroups is very popular

• Used in, e.g., JBoss, OpenSymphony OSCache, Jetty, Tomcat– Ensemble is somewhat popular and supported by a user

community– Horus works well but is not widely used.

Distributed Systems 2006 16

Horus/JGroups/Ensemble protocol stacks

Application belongs to process group

comm

nak

frag

mbrshp fc

comm

comm

nak frag

comm

nak

frag mbrshp

parcld

comm

nak

frag

mbrshp merge

total total

Distributed Systems 2006 17

Spread Toolkit

  Focused on a sort of “RISC” approach– Very simple architecture and system– Fairly fast, easy to use, rather popular

  Supports one large group within which user sees many small “lightweight” subgroups that seem to be free-standing

  Protocols implemented by Spread “agents” that relay messages to apps

Distributed Systems 2006 18

Case: J2EE/JBoss

  Java 2 Enterprise Edition (J2EE)– Multi-tiered, distributed application model / reference

architecture• Tiered = physically layered architecture

– Technologies to support this reference architecture

  Enterprise Java Beans (EJB)– Server-side component model– Component: “… a unit of composition with contractually specified

interfaces and explicit context dependencies only. A software component can be deployed independently and is subject to composition by third parties”

  JBoss– Open source J2EE Application Server– Arguably the most popular Java application server

Distributed Systems 2006 19

J2EE Business Context

  Powerful workstations (and servers) makes distributed computing viable– Programming abstractions for distributed computing needed

  The Web…– Internet-enabled business systems / enterprise information systems– Specific requirements for web applications

• Scalability– support variations in load– e.g., Amazon before Christmas

• Availability– Very small downtime periods– e.g., eBay (400 million transactions/day)

• Security– Authenticate and authorize users

• Usability– Different users should have different contents in different forms

• Performance– Reasonable response times needed– Requests often arrive in bursts

– Also, e.g., time-to-market...

Distributed Systems 2006 20

Architectural Solution

Distributed Systems 2006 21

Tiers

  Client tier– User interfaces

• Internet browser• Standalone Java clients• (COM applications)

  Middle tier– Web tier

• Web server for handling requests from browsers– Gets request from client tier– Forwards to business component tier– Renders result

– Business component tier• Core “business logic”

– E.g., customers, accounts, …, relationships and rules among these in Web shop• Realized by EJBs running in an EJB container• “Application server” = middle-tier component server compatible with J2EE

  Enterprise information systems tier– Databases– Backend systems

Distributed Systems 2006 22

Example: EHR in Ribe Amt

DB2:Database

RMI-IIOP

<<SessionBean>>:LoginSessionBean

Trifork: Application Server

<<SessionBean>>:CommandSessionBean

<<SessionBean>>:PrintHandler

PC:Client

PC:Server

MVS:Backend

DB2:Database

XML

PAS:Fødesystem

<<SessionBean>>:PersistencyDelegate

:Security

Medicin:UI

Lab:UI

Notat:UI

:Validering

MQ Series

MQ Series

Distributed Systems 2006 23

Specific J2EE-Related Technologies

  JavaServer Pages (JSP)– Creating dynamic web-based content

  Java Servlets– Extending functionality of web servers

  Java Messaging Service (JMS)– Asynchronous point-to-point and many-to-many messaging

  Java Naming and Directory Interface (JNDI)– Directory-based retrieval of user-defined objects

  J2EE Connector Architecture– Standard architecture for integrating external systems

  RMI over IIOP– RMI using OMG’s Internet Inter-Orb Protocol

  Java DataBase Connectivity (JDBC)– Uniform interface to relational databases

  Enterprise JavaBeans (EJB)

Distributed Systems 2006 24

EJB Deployment View

  EJB container– Manage execution of components– Expose platform services

Distributed Systems 2006 25

EJB Code View

  Need to enable remote clients to access bean– Remote interface– ~ Proxy object

  Manage lifecycle of bean– Home interface– Possibly functionality to

locate specific instances– ~ Factory object

  Implement functionality– Bean class

  Clients use generated stubs

Distributed Systems 2006 26

Detailed Example

Distributed Systems 2006 27

EJB Types

  Entity beans– Representing business data objects– Data members map to data items stored in associated data base– Container-managed persistence

• Container loads and stores data• No application code required

– Bean-managed persistence• Bean code responsible itself for persistence

– “handcrafted” JDBC

  Session beans– Business logic and services– “Stateful“ (SFSBs)

• Can keep state on behalf of client• Successive calls go to same component• Container handles life-cycle

– Passivation– Activation

• E.g., CommandBean– “Stateless” (SLSBs)

• Does not keep any state on behalf of client• Each successive call delegated to stateless

session bean as needed• Easy scalability and load balancing• E.g., PrintHandlerBean

  Message-driven beans– Stateless– Asynchronous listener style of invocation

Distributed Systems 2006 28

”Clustering” and J2EE

  Scalability– I want to handle x times the number of concurrent access than what I

have now  High availability

– Services are accessible with reasonable (and predictable) response times at any time

– E.g., 99.999 (5 Nines in Telco)  Load balancing

– A way to obtain high availability and better performance by dispatching incoming requests to different servers

– Session affinity (or stickiness)– Checking heart beat

  Failover– Process can continue when it is re-directed to a “backup” node because

the original one fails– What is the policy? Round-robin?

  Fault tolerance– A service that guarantees strictly correct behavior despite system failure

Distributed Systems 2006 29

JGroups in JBoss?

  Serverless JMS  Clustering

– Replication of entity beans, SLSBs and SFSBs– HA-JNDI– Session replication (integrated Tomcat, Jetty)

  Cache– Replicated transactional clustered cache

Distributed Systems 2006 30

The real world is complex...

Distributed Systems 2006 31

Default JBoss JGroup Configuration

<UDP mcast_addr="${jboss.partition.udpGroup:228.1.2.3}" mcast_port="45566"ip_ttl="8" ip_mcast="true" mcast_send_buf_size="800000” mcast_recv_buf_size="150000"

  ucast_send_buf_size="800000" ucast_recv_buf_size="150000" loopback="false"/>

<PING timeout="2000" num_initial_members="3"  up_thread="true" down_thread="true"/><MERGE2 min_interval="10000" max_interval="20000"/><FD shun="true" up_thread="true" down_thread="true"  timeout="2500" max_tries="5"/><VERIFY_SUSPECT timeout="3000" num_msgs="3"  up_thread="true" down_thread="true"/><pbcast.NAKACK gc_lag="50" retransmit_timeout="300,600,1200,2400,4800"  max_xmit_size="8192"  up_thread="true" down_thread="true"/><UNICAST timeout="300,600,1200,2400,4800" window_size="100" min_threshold="10"  down_thread="true"/><pbcast.STABLE desired_avg_gossip="20000"  up_thread="true" down_thread="true"/><FRAG frag_size="8192” down_thread="true" up_thread="true"/><pbcast.GMS join_timeout="5000" join_retry_timeout="2000"  shun="true" print_local_addr="true"/><pbcast.STATE_TRANSFER up_thread="true" down_thread="true"/>

Distributed Systems 2006 32

Serverless JMS

  Java Messaging Service based on JGroups– Peer-to-peer architecture rather than Client/Server

  Client publishing to a topic– Other clients subscribe to topics– The ”publish/subscribe” paradigm

  Instead of sending message to server, and server distributes to multiple clients: publisher multicasts message– JMS Server just another member

• Handles persistent messages (DB)

Distributed Systems 2006 33

Serverless JMS

Publisher

JMS Server

Subscriber

Subscriber

Subscriber

Client/Server Model

Publisher

JMS Server

Subscriber

Subscriber

Subscriber

Serverless Model

Multicast

(discard) (accept)

(accept)

(accept)

(accept)

(discard)

Cost: 4 unicasts Cost: 1 multicast

Distributed Systems 2006 34

Serverless JMS

  Clients are still able to publish even when server is down

  Caveat– works in scenario where client and server are in same

multicast-reachable network

Distributed Systems 2006 37

Summary

  We gave further examples of what can be built on top of Virtual Synchrony

  Brief examples of toolkits and uses– Isis, Ensemble Horus, Spread, JGroups

  J2EE introduction as bonus