Filterfresh: Fault-tolerant Java Servers Through Active Replication, Arash Baratloo


Post on 20-Dec-2015


Page 1: Filterfresh Fault-tolerant Java Servers Through Active Replication Arash Baratloo

Filterfresh

Fault-tolerant Java Servers Through Active Replication

Arash Baratloo
www.cs.nyu.edu/phd_students/baratloo

Page 2: Filterfresh

• Investigate failure models in distributed Java applications
• Provide transparent fault masking (to users and to programmers)
• Support highly available services in the presence of failures
• Remove single points of failure

Page 3: Motivating Factors

Remote Method Invocation (RMI): 100% Java, hot, new, easy to use

and

Reliable Object Services (ROS)

Interest in providing:
– support for active-active replication
– support for Java objects

Page 4: Roadmap

– Motivation
– RMI Registry & crash failures
– RMI Server Architecture & crash failures
– A Unified Solution: process group approach
– Fault-tolerant Registry
– Fault-tolerant RMI
– Conclusion

Page 5: RMI in a Nutshell

• Servers register with the local registry
• Clients look up a server at a well-known registry
• Given a remote reference, the client performs a remote method invocation
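The three steps above can be sketched with the standard java.rmi API inside a single JVM. This is only an illustration, not code from the talk: the class name RmiNutshell, the String return type of foo() (the slides declare it void), the in-process registry on an ephemeral port, and the loopback hostname setting are all assumptions made so the sketch is self-contained.

```java
import java.rmi.NoSuchObjectException;
import java.rmi.Remote;
import java.rmi.RemoteException;
import java.rmi.registry.LocateRegistry;
import java.rmi.registry.Registry;
import java.rmi.server.UnicastRemoteObject;

class RmiNutshell {
    /** Remote interface; the names Server and foo() follow the slides. */
    public interface Server extends Remote {
        String foo() throws RemoteException;
    }

    /** Hypothetical implementation used only for this sketch. */
    static class ServerImpl implements Server {
        public String foo() { return "hello from abc"; }
    }

    /** Binds a server under "abc", looks it up, invokes it, then cleans up. */
    static String demo() {
        // Assumption: keep all RMI traffic on the loopback interface.
        System.setProperty("java.rmi.server.hostname", "127.0.0.1");
        ServerImpl impl = new ServerImpl();
        Registry registry = null;
        try {
            registry = LocateRegistry.createRegistry(0);   // in-process registry
            Server stub = (Server) UnicastRemoteObject.exportObject(impl, 0);
            registry.bind("abc", stub);                     // server side: bind to "abc"
            Server found = (Server) registry.lookup("abc"); // client side: look up "abc"
            return found.foo();                             // remote method invocation
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            try {
                // Unexport so that no RMI thread keeps the JVM alive.
                UnicastRemoteObject.unexportObject(impl, true);
                if (registry != null) UnicastRemoteObject.unexportObject(registry, true);
            } catch (NoSuchObjectException ignored) {
                // Nothing was exported; nothing to clean up.
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

In a real deployment the registry, server and client would run in separate processes, and the client would obtain the registry with LocateRegistry.getRegistry(host, port); that fixed host is the "well-known registry" the slides refer to.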

[Figure: the server binds to "abc" at the registry; the client looks up "abc" at the registry.]

Page 6: Limitations of RMI Registry

• The “well-known registry” requirement is too restrictive for failure recovery
• Single point of failure
• Cannot support replicated servers, and thus cannot support highly available servers

Page 7: FT Registry requires...

• Distribute and replicate registry servers
• Replication strategy to maintain a consistent state
• Failure detection and removal of failed registry servers
• Failed objects must be restarted automatically
• Dynamic addition of registry servers

Page 8: RMI Architecture

[Figure: RMI layers. Client side: client application, server stub, remote reference layer (RRL). Server side: server application, server skeleton, remote reference layer (RRL). Both sides sit on the transport layer.]

• RRL assumes a stream-oriented transport
• Transport layer implemented on TCP/IP

Page 9: Architecture (cont…)

[Figure: layer stack: client, stub, RRL, transport, RRL, skel, server.]

interface Server {
    public void foo();
}

class Client {
    ...
    Server s = lookup...
    s.foo();

Page 10: Architecture (cont…)

[Figure: layer stack: client, stub, RRL, transport, RRL, skel, server.]

interface Server {
    public void foo();
}

class ServerImpl extends ... {
    public void foo() { ... }
}

Page 11: Architecture (cont…)

[Figure: layer stack: client, stub, RRL, transport, RRL, skel, server.]

Page 12: Architecture (cont…)

[Figure: layer stack: client, stub, RRL, transport, RRL, skel, server.]

class ServerImpl extends UnicastRemoteObject {
    public void foo() { ... }
}

A transparent FT system must be implemented at the RRL or below

Page 13: FT Servers Require...

• Distribute and replicate servers
• Replication strategy to maintain a consistent state
• Failure detection and removal of failed registry servers
• Dynamic addition of registry servers
• Object reference must remain valid after the associated object has failed

Page 14: A Unified Solution...

Process group approach, where all non-faulty objects
– form a group
– maintain a consistent view of the group
– interact through reliable group primitives (all or nothing)
– observe a total order on group primitives
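One common way to obtain a total order on group primitives is to have a sequencer stamp each message and have every member deliver strictly in stamp order. The sketch below illustrates only that idea; the slides do not spell out the Filterfresh protocol, and the names TotalOrderSketch, Sequencer and Member are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

class TotalOrderSketch {
    /** Assigns one global sequence number per multicast (hypothetical helper). */
    static class Sequencer {
        private long next = 0;
        synchronized long stamp() { return next++; }
    }

    /** A group member that delivers messages strictly in sequence order. */
    static class Member {
        private long expected = 0;  // next sequence number to deliver
        private final TreeMap<Long, String> pending = new TreeMap<>();
        final List<String> delivered = new ArrayList<>();

        /** Accepts stamped messages in any arrival order; delivers without gaps. */
        void receive(long seq, String msg) {
            pending.put(seq, msg);
            while (pending.containsKey(expected)) {
                delivered.add(pending.remove(expected));
                expected++;
            }
        }
    }

    /** Two members see different arrival orders but deliver the same sequence. */
    static boolean demo() {
        Sequencer seq = new Sequencer();
        long a = seq.stamp(), b = seq.stamp(), c = seq.stamp();
        Member m1 = new Member(), m2 = new Member();
        m1.receive(a, "bind abc");
        m1.receive(b, "bind xyz");
        m1.receive(c, "unbind abc");
        m2.receive(c, "unbind abc");  // arrives early, is buffered
        m2.receive(a, "bind abc");
        m2.receive(b, "bind xyz");
        return m1.delivered.equals(m2.delivered) && m1.delivered.size() == 3;
    }

    public static void main(String[] args) {
        System.out.println("same delivery order: " + demo());
    }
}
```

The sketch leaves out retransmission and sequencer failure, which a real group communication layer has to handle.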

Page 15: Fortunately

Process Group Membership is
– a well-understood problem with well-understood protocols
– well tested (ISIS, Transis, Amoeba, etc.)
– the basis for virtual synchrony

Equivalent Problems* (implement one, get all)
– Group Membership
– Reliable Failure Detectors
– Reliable and ordered multicast

* Chandra and Toueg. Unreliable Failure Detectors for Reliable Distributed Systems. JACM, March 96.

Page 16: Unfortunately

Process Group Membership is
– as hard as distributed consensus
– impossible in purely asynchronous systems with crash failures*

Our solution
– the standard “timeout” assumption
– a variation of the protocol used in the Amoeba OS**

* Chandra, Toueg, Hadzilacos and Charron-Bost. Impossibility of Group Membership in Asynchronous Systems.

** Oey, Langendoen and Bal. Comparing Kernel-level and User-level Communication Protocols on Amoeba. ICDCS 95.
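The “timeout” assumption can be illustrated with a minimal failure detector: a member that has been silent for longer than a fixed bound is suspected to have crashed. This is a sketch under stated assumptions rather than the detector used in Filterfresh; the class name TimeoutDetectorSketch, the member names and the 100 ms bound are hypothetical, and time is passed in explicitly to keep the example deterministic.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class TimeoutDetectorSketch {
    private final long timeoutMillis;
    private final Map<String, Long> lastHeard = new HashMap<>();

    TimeoutDetectorSketch(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    /** Records that some message (e.g. a heartbeat) arrived from the member. */
    void heard(String member, long nowMillis) {
        lastHeard.put(member, nowMillis);
    }

    /** Members silent for longer than the timeout are suspected to have crashed. */
    Set<String> suspects(long nowMillis) {
        Set<String> out = new TreeSet<>();
        for (Map.Entry<String, Long> e : lastHeard.entrySet()) {
            if (nowMillis - e.getValue() > timeoutMillis) {
                out.add(e.getKey());
            }
        }
        return out;
    }

    /** r1 keeps sending, r2 goes silent; only r2 exceeds the 100 ms bound. */
    static Set<String> demo() {
        TimeoutDetectorSketch detector = new TimeoutDetectorSketch(100);
        detector.heard("r1", 0);
        detector.heard("r2", 0);
        detector.heard("r1", 90);
        return detector.suspects(150);
    }

    public static void main(String[] args) {
        System.out.println("suspected: " + demo());
    }
}
```

A slow but live member can be suspected wrongly by such a detector, which is why the timeout is an assumption about the system and not a guarantee.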

Page 17: What We Provide...

A Group Manager Class
– 100% Java
– built on top of UDP/IP

Implements
– group creation
– join operation (with state transfer)
– leave operation
– failure detection and recovery
– reliable multicast

All events are atomic and totally ordered
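The join operation with state transfer can be pictured as follows: because joins are ordered together with multicasts, a joiner copies the state of an existing member and then receives every later event. The sketch is an in-process stand-in, not the Group Manager class from the talk; the names GroupJoinSketch, Replica and Group, and the use of a lock in place of an ordered multicast protocol, are assumptions.

```java
import java.util.ArrayList;
import java.util.List;

class GroupJoinSketch {
    /** A replica whose state is simply the ordered log of applied events. */
    static class Replica {
        final List<String> state = new ArrayList<>();
        void apply(String event) { state.add(event); }
    }

    /** Hypothetical in-process stand-in for a group manager. */
    static class Group {
        private final List<Replica> members = new ArrayList<>();

        /** The joiner copies an existing member's state, then sees every later event. */
        synchronized void join(Replica r) {
            if (!members.isEmpty()) {
                r.state.addAll(members.get(0).state);  // state transfer
            }
            members.add(r);
        }

        synchronized void leave(Replica r) {
            members.remove(r);
        }

        /** Delivers the event to every current member, in one global order. */
        synchronized void multicast(String event) {
            for (Replica m : members) {
                m.apply(event);
            }
        }
    }

    /** A late joiner ends up with the same state as the founding member. */
    static boolean demo() {
        Group group = new Group();
        Replica first = new Replica();
        Replica late = new Replica();
        group.join(first);
        group.multicast("e1");
        group.multicast("e2");
        group.join(late);       // receives e1, e2 by state transfer
        group.multicast("e3");  // delivered to both members
        return first.state.equals(late.state) && late.state.size() == 3;
    }

    public static void main(String[] args) {
        System.out.println("states match: " + demo());
    }
}
```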

Page 18: Multicast Performance

• Pentium Pro 200, Linux RedHat 4.0, Fast Ethernet hub

[Figure: time (msec, axis 0 to 70) versus message size (1, 512 and 1024 bytes) for local RMI, remote RMI, multicast-1, multicast-2, multicast-4 and multicast-8.]

Page 19: FT Registry Architecture

• registry on each host/domain
• group managers ensure reliable ordered events
• support dynamic joins w/state transfer

[Figure: two stacks, each with an ft registry, an rmi registry and a group mgr; a server's bind at one ft registry is multicast to the other.]

Page 20: FT Registry Architecture (cont…)

• lookup becomes a local operation
• detect and remove failed objects
• consistent global state

[Figure: the same two ft registry / rmi registry / group mgr stacks; a client performs its lookup at one of them.]
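The registry design on these two pages can be sketched as a set of replicas that apply every bind and every removal everywhere, so that each lookup is answered from the local table. This is a simplified in-process model, not the Filterfresh implementation: the names FtRegistrySketch and RegistryReplica are hypothetical, server references are plain strings, and a loop over the replicas stands in for the group manager's ordered multicast.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class FtRegistrySketch {
    /** One registry replica; its table is a local copy of the global bindings. */
    static class RegistryReplica {
        private final Map<String, List<String>> table = new HashMap<>();
        private final List<RegistryReplica> group;

        RegistryReplica(List<RegistryReplica> group) {
            this.group = group;
            group.add(this);
        }

        /** bind is applied at every replica (stand-in for an ordered multicast). */
        void bind(String name, String serverRef) {
            for (RegistryReplica r : group) {
                r.table.computeIfAbsent(name, k -> new ArrayList<>()).add(serverRef);
            }
        }

        /** Removal of a failed server's reference is also applied everywhere. */
        void removeFailed(String serverRef) {
            for (RegistryReplica r : group) {
                for (List<String> refs : r.table.values()) {
                    refs.remove(serverRef);
                }
            }
        }

        /** lookup reads only the local table; no remote call is needed. */
        List<String> lookup(String name) {
            return new ArrayList<>(table.getOrDefault(name, new ArrayList<>()));
        }
    }

    /** Binds at different replicas, removes a failed server, reads locally. */
    static String demo() {
        List<RegistryReplica> group = new ArrayList<>();
        RegistryReplica hostA = new RegistryReplica(group);
        RegistryReplica hostB = new RegistryReplica(group);
        hostA.bind("abc", "server-1");
        hostB.bind("abc", "server-2");
        String before = hostB.lookup("abc").toString();
        hostA.removeFailed("server-1");
        String after = hostB.lookup("abc").toString();
        return before + " -> " + after;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Keeping a list of references per name is what lets several servers register under one name.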

Page 21: FT Registry Performance

• Pentium Pro 200, Linux RedHat 4.0, Fast Ethernet, Ethernet hub

[Figure: bar chart (axis 0 to 80) of bind and lookup for RMI Registry local, RMI Registry remote, FT Registry-1, FT Registry-2, FT Registry-4 and FT Registry-8.]

Page 22: RMI & FT Registry

• supports multiple servers registering under the same name
• can now support recovery from server failure

[Figure: a client stack (client, stub, RRL, transport) and the FT Registry, with three server stacks (server, skel, RRL) reached through the transport.]

Page 23: What if...

In the event of server failure...

[Figure: the same client, FT Registry and server stacks as on the previous page, annotated "Ouch!"]

Page 24: Failure Recovery

• The old connection is patched with a connection to a non-faulty server
• Illusion of a valid object reference
• Transparent!
• A “reverse” lookup returns a name given a wire connection

[Figure: the client stack (client, stub, RRL, transport), the FT Registry and two server stacks (server, skel, RRL), with an arrow labeled "reverse" lookup.]
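The recovery steps on this page can be sketched as a client-side reference that, when a call fails, asks the registry which name its broken connection belonged to and then reconnects to a live server bound to that name. Everything here is a hypothetical in-process model (FailoverSketch, Replica, Registry, PatchedRef); in the system described by the slides this logic would sit at the RRL or below, so the application never sees it.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class FailoverSketch {
    /** Stand-in for a server replica reached over some connection. */
    static class Replica {
        final String id;
        boolean alive = true;
        Replica(String id) { this.id = id; }
        String foo() {
            if (!alive) throw new IllegalStateException("connection to " + id + " broken");
            return "foo served by " + id;
        }
    }

    /** Stand-in registry: name to replicas, plus the reverse map replica to name. */
    static class Registry {
        private final Map<String, List<Replica>> byName = new HashMap<>();
        private final Map<Replica, String> nameOf = new HashMap<>();

        void bind(String name, Replica r) {
            byName.computeIfAbsent(name, k -> new ArrayList<>()).add(r);
            nameOf.put(r, name);
        }

        /** "Reverse" lookup: from a connection endpoint back to its bound name. */
        String reverseLookup(Replica r) { return nameOf.get(r); }

        Replica lookupLive(String name) {
            for (Replica r : byName.getOrDefault(name, new ArrayList<>())) {
                if (r.alive) return r;
            }
            throw new IllegalStateException("no live server bound to " + name);
        }
    }

    /** Client-side reference that patches its connection when a call fails. */
    static class PatchedRef {
        private final Registry registry;
        private Replica current;

        PatchedRef(Registry registry, Replica initial) {
            this.registry = registry;
            this.current = initial;
        }

        String foo() {
            try {
                return current.foo();
            } catch (IllegalStateException broken) {
                String name = registry.reverseLookup(current);
                current = registry.lookupLive(name);  // patch the connection
                return current.foo();                 // retry on the new server
            }
        }
    }

    /** The reference stays usable after the server it pointed to has failed. */
    static String demo() {
        Registry registry = new Registry();
        Replica s1 = new Replica("s1");
        Replica s2 = new Replica("s2");
        registry.bind("abc", s1);
        registry.bind("abc", s2);
        PatchedRef ref = new PatchedRef(registry, s1);
        ref.foo();         // served by s1
        s1.alive = false;  // s1 crashes
        return ref.foo();  // transparently served by s2
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

The sketch simply retries the failed call on the new server, which is only safe when the operation can be repeated without harm.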

Page 25: Failure Recovery Performance

? Working, but measurements have not been made

Page 26: FT Server Architecture

• Client has the illusion of a single server
• In reality, there are actively replicated servers
• Highly available?

[Figure: one client stack (client, stub, RRL, transport) connected through the transport to four server stacks, each consisting of server, skel, RRL and group mgr.]

Page 27: Highly Available Servers

• Group managers ensure reliable ordering of events across all servers
• Guarantees servers have a consistent state
• Failure detection and removal of failed servers
• Dynamic addition of servers w/state transfer
• Illusion of a valid server reference even after the associated object has failed

Page 28: Conclusions