Distributed Systems 2006
Virtual Synchrony II*
*With material adapted from Ken Birman, Ben Wang, Bill Burke, Bela Ban
Plan
We skip Section 18.2
Tracking group membership: We’ll base it on 2PC and 3PC
Fault-tolerant multicast: We’ll use membership
Ordered multicast: We’ll base it on fault-tolerant multicast
Tools for solving practical replication and availability problems: we’ll base them on ordered multicast
Robust Web Services: We’ll build them with these tools
2PC and 3PC: Our first “tools” (lowest layer)
Recap: Elements of Virtual Synchrony
Support for process groups
  – Processes may join and leave dynamically
  – Excluded if (thought to) fail
Various reliable, ordered multicast protocols
  – Strive to replace synchronous, totally-ordered, dynamically uniform protocols
View-synchronous delivery
  – Two processes that are members of the same view receive the same set of multicasts during that view
Identical process group views and rankings
  – Identical sequences of group membership lists
Gap-freedom guarantees
  – If, after a failure, m1 has been delivered to a destination, then any message m0 that should be delivered prior to m1 is also delivered
State transfer for joining processes
  – A new process may obtain group state from an existing member
  – (Will develop this shortly)
Distributed Algorithms
Election
Consensus
Consistent snapshot
Replicated data and synchronization
State transfer
Load-balancing
Fault tolerance
  – Primary-backup
  – Coordinator-cohort
State transfer
We need to transfer current state of a process group to joining members?
“State” is application-dependent
  – Creating state should be done by the application itself
  – E.g., org.jgroups.MessageListener
    • byte[] getState()
    • void setState(byte[] state)
Simple approach
  – Just make all members transfer state to joining process
Reasonable approach
  – Joining process pulls state from one existing member
  – Take another if first one fails
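The "pull from one member, take another if the first one fails" approach can be sketched as follows. This is a minimal illustration, not JGroups code: StateHolder is a hypothetical stand-in that only mirrors the two callback signatures named on the slide, Notes is a toy application, and a member failure is simulated by a RuntimeException.

```java
import java.nio.charset.StandardCharsets;
import java.util.List;

final class StateTransferSketch {
    /** Hypothetical stand-in mirroring the two callbacks named on the slide. */
    interface StateHolder {
        byte[] getState();
        void setState(byte[] state);
    }

    /** Toy application whose whole state is one string. */
    static final class Notes implements StateHolder {
        String text = "";
        public byte[] getState() { return text.getBytes(StandardCharsets.UTF_8); }
        public void setState(byte[] state) { text = new String(state, StandardCharsets.UTF_8); }
    }

    /** Pull state from the first member that answers; try the next one on failure. */
    static boolean join(StateHolder joiner, List<? extends StateHolder> members) {
        for (StateHolder member : members) {
            try {
                joiner.setState(member.getState());
                return true;
            } catch (RuntimeException crashed) {
                // This member failed during the transfer: fall through to the next one.
            }
        }
        return false; // no member could supply the state
    }
}
```

The application decides what goes into the byte array; the transfer logic only moves opaque bytes, which is why the state can stay application-dependent.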
Load-Balancing
Coordinate members of a group to share workload in order to obtain a speed-up from parallelism?
Use group communication to implement load-balancing of requests...
Styles of algorithms
  – Group decides who should handle request
  – Client decides who should handle request
Group Decides
Multicast request from client to full membership
  – May require expensive transfer to all
Need deterministic rule for deciding who should handle request
  – E.g., with abcast, requests may be numbered in a total order
  – A process may handle the ith request if the process rank is i mod n
With cbcast?
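The abcast-based rule can be written as a one-line predicate that every member evaluates locally. This is a minimal sketch, assuming ranks run from 0 to n-1 and requests are numbered from 0 in their total delivery order; the class and method names are made up for the example.

```java
final class RankBasedDispatch {
    /** True if the member with this rank should handle the i-th request (i >= 0). */
    static boolean handles(long i, int rank, int n) {
        return i % n == rank;
    }
}
```

Because all members see the same request numbering and the same view (hence the same n and the same ranks), exactly one member answers true for each request without any extra coordination.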
Client affinity schemes
Group members provide clients with information used to select an appropriate server to send requests to
Best choice depends on data size, fault-tolerance needed, queries/updates, ...
E.g.,
  – Static assignment of client to specific server
    • + caching, - for very active clients
  – Pick a random server
  – Base choice on (approximate) load information
    • Could also be used with previous approach
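The first two schemes can be sketched in a few lines. This is only an illustration under assumptions: clients are identified by a string id, servers are numbered 0 to servers-1, and hashing the id is one possible way (not necessarily the slides' way) to obtain a static assignment.

```java
import java.util.Random;

final class ClientAffinity {
    /** Static assignment: the same client id always maps to the same server. */
    static int staticServer(String clientId, int servers) {
        return Math.floorMod(clientId.hashCode(), servers);
    }

    /** Random choice: spreads requests evenly on average, but gives no affinity. */
    static int randomServer(Random rng, int servers) {
        return rng.nextInt(servers);
    }
}
```

Static assignment lets a server cache data for "its" clients, at the price that one very active client can overload a single server.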
Using approximate load information
Assume that processes send out load reports using abcast
  – E.g., 0 for no load, 1 for currently handling 1 request, etc.
Represent load on group of n servers as a vector: $[l_0, \ldots, l_{n-1}]$
  – $l_{max} = \max(l_0, \ldots, l_{n-1}) + 1$
  – $[l'_0, \ldots, l'_{n-1}] = [l_{max} - l_0, \ldots, l_{max} - l_{n-1}]$
  – $L' = l'_0 + \cdots + l'_{n-1}$
Map incoming requests to process
  – Given a request, choose a random number, $r$, with $0 \le r < L'$
    • By applying pseudo-random generator, same seed at all processes
  – Now choose process $i$ if $l'_0 + \cdots + l'_{i-1} \le r < l'_0 + \cdots + l'_i$ (for $0 \le i \le n-1$; the left-hand sum is 0 when $i = 0$)
(Think of the partial sums $l'_0$, $l'_0 + l'_1$, ... as points on a line of length $L'$; the algorithm selects the segment that $r$ falls within)
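The selection can be sketched as follows, reading the line-segment picture as "process i owns the i-th segment, whose length is its inverted load". This is a minimal illustration assuming non-negative integer loads; the class and method names are invented for the example, and java.util.Random merely stands in for whatever shared pseudo-random generator the group uses.

```java
import java.util.Random;

final class LoadAwareDispatcher {
    /** Inverted weights: lmax - load[i], where lmax = max(load) + 1. */
    static int[] invertedWeights(int[] load) {
        int lmax = 0;
        for (int l : load) lmax = Math.max(lmax, l);
        lmax += 1;
        int[] w = new int[load.length];
        for (int i = 0; i < load.length; i++) w[i] = lmax - load[i];
        return w;
    }

    /** Index of the segment containing r, for 0 <= r < sum(weights). */
    static int segmentOf(int[] weights, int r) {
        int upper = 0;
        for (int i = 0; i < weights.length; i++) {
            upper += weights[i]; // right end of segment i
            if (r < upper) return i;
        }
        throw new IllegalArgumentException("r out of range");
    }

    /** All members call this with identically seeded generators and the same load vector. */
    static int choose(int[] load, Random sharedRng) {
        int[] w = invertedWeights(load);
        int total = 0;
        for (int x : w) total += x;
        return segmentOf(w, sharedRng.nextInt(total));
    }
}
```

For loads [0, 2, 1] the inverted weights are [3, 1, 2], so the idle server owns half of the line and the busiest server one sixth. The "+ 1" in lmax keeps every weight positive, so even the most loaded server can still be picked.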
Fault tolerance
We want to offer clients “fault-tolerant request execution”?
We can replace a traditional service with a group of members
  – Each request is assigned to a primary (ideally, spread the work around) and a backup
    • Primary sends a “cc” of the response to the request to the backup
  – Backup keeps a copy of the request and steps in only if the primary crashes before replying
Sometimes called “coordinator/cohort” just to distinguish from “primary/backup”
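The backup's side of this scheme can be sketched as below. This is an illustration under assumptions, not any platform's API: requests carry an integer id, the service is a plain function from request to reply, and takeOver() is assumed to be called when the membership layer reports the primary as failed.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

final class CoordinatorCohortSketch {
    /** Backup side: remembers requests until the primary's "cc" arrives. */
    static final class Backup {
        private final Map<Integer, String> pending = new HashMap<>();
        private final Function<String, String> service;

        Backup(Function<String, String> service) { this.service = service; }

        /** The request was multicast to the group; keep a copy. */
        void onRequest(int id, String request) { pending.put(id, request); }

        /** The primary replied and sent us a copy; the request is done. */
        void onResponseCopy(int id) { pending.remove(id); }

        /** The primary failed: execute every request it had not answered yet. */
        Map<Integer, String> takeOver() {
            Map<Integer, String> replies = new HashMap<>();
            pending.forEach((id, req) -> replies.put(id, service.apply(req)));
            pending.clear();
            return replies;
        }
    }
}
```

In the normal case the backup does no request processing at all, which is what makes this cheaper than having every member execute every request.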
Trade-offs
Membership
  – Static
    • Fixed membership, changing connectivity and availability
  – Dynamic
    • Changing membership, fixed connectivity and availability
Consistency
  – Internal
    • Defined with respect to members of group observing messages
  – External
    • Defined with respect to external observer (e.g., a database)
Features of major virtual synchrony platforms
Isis: first and no longer widely used
  – But was perhaps the most successful; has major roles in NYSE, Swiss Exchange, French Air Traffic Control system (two major subsystems of it), US AEGIS Naval warship
  – Pioneered use of cbcast
  – Also was first to offer a publish-subscribe interface that mapped topics to groups
Features of major virtual synchrony platforms
Horus, JGroups and Ensemble
  – Successors to Isis
  – These focus on flexible protocol stack linked directly into application address space
    • A stack is a pile of micro-protocols
    • Can assemble an optimized solution fitted to specific needs of the application by plugging together “properties this application requires”, lego-style
    • The system is optimized to reduce overheads of this compositional style of protocol stack
Use
  – JGroups is very popular
    • Used in, e.g., JBoss, OpenSymphony OSCache, Jetty, Tomcat
  – Ensemble is somewhat popular and supported by a user community
  – Horus works well but is not widely used.
Horus/JGroups/Ensemble protocol stacks
Application belongs to process group
[Figure: several example protocol stacks, each assembled from micro-protocol layers such as comm, nak, frag, mbrshp, fc, parcld, merge and total]
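The idea of a stack as a pile of micro-protocols can be illustrated with a toy model in which each layer adds a header on the way down and strips it on the way up. This is only a sketch: the layer names are borrowed from the stack diagram, but the string headers, the Layer class and the method names are assumptions made for the example and do not reflect how Horus, JGroups or Ensemble implement their layers.

```java
import java.util.ArrayList;
import java.util.List;

final class ProtocolStackSketch {
    /** One micro-protocol: adds a header going down, strips it going up. */
    static final class Layer {
        final String name;
        Layer(String name) { this.name = name; }

        String down(String msg) { return name + "|" + msg; }

        String up(String msg) {
            String prefix = name + "|";
            if (!msg.startsWith(prefix)) {
                throw new IllegalStateException("missing " + name + " header");
            }
            return msg.substring(prefix.length());
        }
    }

    private final List<Layer> topToBottom = new ArrayList<>();

    /** Plug another layer underneath the ones added so far. */
    ProtocolStackSketch add(String layerName) {
        topToBottom.add(new Layer(layerName));
        return this;
    }

    /** Application to network: pass the message down through every layer. */
    String send(String payload) {
        String msg = payload;
        for (Layer l : topToBottom) msg = l.down(msg);
        return msg;
    }

    /** Network to application: pass the message up, bottom layer first. */
    String receive(String wire) {
        String msg = wire;
        for (int i = topToBottom.size() - 1; i >= 0; i--) msg = topToBottom.get(i).up(msg);
        return msg;
    }
}
```

An application that needs total order would build add("total").add("mbrshp").add("frag").add("nak").add("comm"), while one that does not would simply leave the "total" layer out.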
Spread Toolkit
Focused on a sort of “RISC” approach
  – Very simple architecture and system
  – Fairly fast, easy to use, rather popular
Supports one large group within which user sees many small “lightweight” subgroups that seem to be free-standing
Protocols implemented by Spread “agents” that relay messages to apps
Case: J2EE/JBoss
Java 2 Enterprise Edition (J2EE)
  – Multi-tiered, distributed application model / reference architecture
    • Tiered = physically layered architecture
  – Technologies to support this reference architecture
Enterprise JavaBeans (EJB)
  – Server-side component model
  – Component: “… a unit of composition with contractually specified interfaces and explicit context dependencies only. A software component can be deployed independently and is subject to composition by third parties”
JBoss
  – Open source J2EE Application Server
  – Arguably the most popular Java application server
J2EE Business Context
Powerful workstations (and servers) make distributed computing viable
  – Programming abstractions for distributed computing needed
The Web…
  – Internet-enabled business systems / enterprise information systems
  – Specific requirements for web applications
    • Scalability
      – Support variations in load
      – E.g., Amazon before Christmas
    • Availability
      – Very small downtime periods
      – E.g., eBay (400 million transactions/day)
    • Security
      – Authenticate and authorize users
    • Usability
      – Different users should have different contents in different forms
    • Performance
      – Reasonable response times needed
      – Requests often arrive in bursts
  – Also, e.g., time-to-market...
Tiers
Client tier
  – User interfaces
    • Internet browser
    • Standalone Java clients
    • (COM applications)
Middle tier
  – Web tier
    • Web server for handling requests from browsers
      – Gets request from client tier
      – Forwards to business component tier
      – Renders result
  – Business component tier
    • Core “business logic”
      – E.g., customers, accounts, …, relationships and rules among these in Web shop
    • Realized by EJBs running in an EJB container
    • “Application server” = middle-tier component server compatible with J2EE
Enterprise information systems tier
  – Databases
  – Backend systems
Example: EHR in Ribe Amt
[Figure: deployment diagram of the EHR system. Nodes: PC:Client, PC:Server, MVS:Backend, Trifork: Application Server, DB2:Database (shown twice), PAS:Fødesystem. Components: the session beans LoginSessionBean, CommandSessionBean, PrintHandler and PersistencyDelegate; :Security; :Validering; the user interfaces Medicin:UI, Lab:UI and Notat:UI. Connector labels: RMI-IIOP, XML, MQ Series (shown twice).]
Specific J2EE-Related Technologies
JavaServer Pages (JSP)
  – Creating dynamic web-based content
Java Servlets
  – Extending functionality of web servers
Java Message Service (JMS)
  – Asynchronous point-to-point and many-to-many messaging
Java Naming and Directory Interface (JNDI)
  – Directory-based retrieval of user-defined objects
J2EE Connector Architecture
  – Standard architecture for integrating external systems
RMI over IIOP
  – RMI using OMG’s Internet Inter-ORB Protocol
Java Database Connectivity (JDBC)
  – Uniform interface to relational databases
Enterprise JavaBeans (EJB)
EJB Deployment View
EJB container
  – Manage execution of components
  – Expose platform services
EJB Code View
Need to enable remote clients to access bean
  – Remote interface
  – ~ Proxy object
Manage lifecycle of bean
  – Home interface
  – Possibly functionality to locate specific instances
  – ~ Factory object
Implement functionality
  – Bean class
Clients use generated stubs
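The three roles can be mimicked in plain Java to show how they relate. This is deliberately not the javax.ejb API: Greeter, GreeterHome, GreeterBean and Container are hypothetical names, and java.lang.reflect.Proxy only stands in for the stubs that a real container would generate.

```java
import java.lang.reflect.Proxy;

final class EjbRolesSketch {
    /** Plays the role of the remote interface: what clients may call. */
    interface Greeter {
        String greet(String name);
    }

    /** Plays the role of the bean class: the actual implementation. */
    static final class GreeterBean implements Greeter {
        public String greet(String name) { return "Hello, " + name; }
    }

    /** Plays the role of the home interface: a factory for client-side handles. */
    interface GreeterHome {
        Greeter create();
    }

    /** Container stand-in: hands out proxies instead of the bean itself. */
    static final class Container implements GreeterHome {
        int calls = 0;

        public Greeter create() {
            GreeterBean bean = new GreeterBean();
            return (Greeter) Proxy.newProxyInstance(
                Greeter.class.getClassLoader(),
                new Class<?>[] { Greeter.class },
                (proxy, method, args) -> {
                    calls++; // interception point: a container could add security, transactions, ...
                    return method.invoke(bean, args);
                });
        }
    }
}
```

The client only ever holds a Greeter obtained from a GreeterHome, so the container is free to intercept every call, which is the point of separating the interfaces from the bean class.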
EJB Types
Entity beans
  – Representing business data objects
  – Data members map to data items stored in associated database
  – Container-managed persistence
    • Container loads and stores data
    • No application code required
  – Bean-managed persistence
    • Bean code responsible itself for persistence
      – “Handcrafted” JDBC
Session beans
  – Business logic and services
  – “Stateful” (SFSBs)
    • Can keep state on behalf of client
    • Successive calls go to same component
    • Container handles life-cycle
      – Passivation
      – Activation
    • E.g., CommandBean
  – “Stateless” (SLSBs)
    • Does not keep any state on behalf of client
    • Each successive call delegated to stateless session bean as needed
    • Easy scalability and load balancing
    • E.g., PrintHandlerBean
Message-driven beans
  – Stateless
  – Asynchronous listener style of invocation
“Clustering” and J2EE
Scalability
  – I want to handle x times the number of concurrent accesses that I have now
High availability
  – Services are accessible with reasonable (and predictable) response times at any time
  – E.g., 99.999% availability (“five nines” in telco)
Load balancing
  – A way to obtain high availability and better performance by dispatching incoming requests to different servers
  – Session affinity (or stickiness)
  – Checking heart beat
Failover
  – Process can continue when it is re-directed to a “backup” node because the original one fails
  – What is the policy? Round-robin?
Fault tolerance
  – A service that guarantees strictly correct behavior despite system failure
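One possible answer to the policy question is round-robin dispatch that skips nodes currently believed to have failed. The sketch below assumes the set of failed nodes comes from some heartbeat check that is not shown; the class and method names are invented for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

final class RoundRobinFailover {
    private final List<String> nodes;
    private int next = 0;

    RoundRobinFailover(List<String> nodes) {
        this.nodes = new ArrayList<>(nodes);
    }

    /** Next node in round-robin order that is not in the failed set. */
    String pick(Set<String> failed) {
        for (int tried = 0; tried < nodes.size(); tried++) {
            String candidate = nodes.get(next);
            next = (next + 1) % nodes.size();
            if (!failed.contains(candidate)) return candidate;
        }
        throw new IllegalStateException("no live node");
    }
}
```

This gives load balancing and failover with the same mechanism, but it ignores session affinity: a client with a stateful session would need to be pinned to one node, or the session state would have to be replicated.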
JGroups in JBoss?
Serverless JMS
Clustering
  – Replication of entity beans, SLSBs and SFSBs
  – HA-JNDI
  – Session replication (integrated Tomcat, Jetty)
Cache
  – Replicated transactional clustered cache
Default JBoss JGroups Configuration
<UDP mcast_addr="${jboss.partition.udpGroup:228.1.2.3}" mcast_port="45566"
     ip_ttl="8" ip_mcast="true"
     mcast_send_buf_size="800000" mcast_recv_buf_size="150000"
     ucast_send_buf_size="800000" ucast_recv_buf_size="150000"
     loopback="false"/>
<PING timeout="2000" num_initial_members="3" up_thread="true" down_thread="true"/>
<MERGE2 min_interval="10000" max_interval="20000"/>
<FD shun="true" up_thread="true" down_thread="true" timeout="2500" max_tries="5"/>
<VERIFY_SUSPECT timeout="3000" num_msgs="3" up_thread="true" down_thread="true"/>
<pbcast.NAKACK gc_lag="50" retransmit_timeout="300,600,1200,2400,4800" max_xmit_size="8192" up_thread="true" down_thread="true"/>
<UNICAST timeout="300,600,1200,2400,4800" window_size="100" min_threshold="10" down_thread="true"/>
<pbcast.STABLE desired_avg_gossip="20000" up_thread="true" down_thread="true"/>
<FRAG frag_size="8192" down_thread="true" up_thread="true"/>
<pbcast.GMS join_timeout="5000" join_retry_timeout="2000" shun="true" print_local_addr="true"/>
<pbcast.STATE_TRANSFER up_thread="true" down_thread="true"/>
Serverless JMS
Java Message Service based on JGroups
  – Peer-to-peer architecture rather than client/server
Client publishing to a topic
  – Other clients subscribe to topics
  – The “publish/subscribe” paradigm
Instead of the publisher sending a message to the server, which then distributes it to multiple clients, the publisher multicasts the message
  – JMS Server just another member
    • Handles persistent messages (DB)
Serverless JMS
[Figure: two panels, each with a Publisher, a JMS Server and three Subscribers. Client/Server Model: cost is 4 unicasts. Serverless Model: the publisher multicasts once and each receiver is marked (accept) or (discard); cost is 1 multicast.]
Serverless JMS
Clients are still able to publish even when server is down
Caveat
  – Works in scenario where client and server are in same multicast-reachable network
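The accept/discard behaviour of the serverless model can be sketched as local filtering at each group member. This is an in-memory illustration only: the Member class and the multicast helper are assumptions made for the example, and the loop stands in for what would be a single network multicast in a real deployment.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class ServerlessTopicSketch {
    static final class Member {
        final Set<String> subscriptions = new HashSet<>();
        final List<String> accepted = new ArrayList<>();
        int discarded = 0;

        /** Every member sees every multicast and filters locally by topic. */
        void onMulticast(String topic, String body) {
            if (subscriptions.contains(topic)) {
                accepted.add(body);
            } else {
                discarded++;
            }
        }
    }

    /** Stand-in for one group multicast; no server is involved in delivery. */
    static void multicast(List<Member> group, String topic, String body) {
        for (Member m : group) m.onMulticast(topic, body);
    }
}
```

Since delivery does not pass through the JMS server, publishing keeps working while the server is down; the price is that uninterested members still receive and discard the message.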