Source: course.ece.cmu.edu/~ece749/teams-06/team7/final_presentation_v1.0.pdf
The Project Presentation
April 28, 2006
18-749: Fault-Tolerant Distributed Systems
Team 7 - Sixers
Kyu Hou, Minho Jeung, Wangbong Lee, Heejoon Jung, Wen Shu Tang
Members
Team page: http://www.ece.cmu.edu/~ece749/teams-06/team7/
• Kyu Hou (MSE)
• Min Ho Jeung (MSE)
• Wangbong Lee (MSE)
• Heejoon Jung (MSIT-SE)
• Wen Shu Tang (ECE)
Baseline Application: Express Bus Ticket Center

Application
• Online express bus ticketing application

Configuration
• Operating System: Linux servers
• Programming Language: Java
• Database: MySQL
• Middleware: CORBA

Baseline Application Features
• Users can retrieve bus schedules and tickets.
• Users can buy tickets.
• Users can cancel tickets.
Baseline Architecture (before)
[Figure: allocation view, deployment style. A pool of customers (clients) reaches the Bus Ticket System (server) over the Internet, and the server talks to the Database. Numbered arrows ① to ④ mark the messages listed in the legend: request message, response message, request query, response query.]
Baseline Architecture (after)
[Figure: allocation view, deployment style. Pool of Customers: Client with User Interface and CORBA Interface. Bus Ticket System: Server with CORBA Interface, Business Logic, and JDBC. Database: MySQL. Numbered arrows ① to ④ mark the messages listed in the legend: request message, response message, request query, response query.]
Fault-Tolerance Application

Client requests should be preserved when an exception occurs.

Replication
• Two copies of the server perform the same operations for fault tolerance, on the chess and risk machines.

Replication Type
• Active replication
• Advantage: performance
• Disadvantage: more memory and processing cost

Replication Manager
• No specific replication manager exists.
• As soon as the client application starts, it acquires the replica server names, which are stored in the Naming Server.

Elements of Fault-Tolerance Framework
• Global Manager: heartbeat
• Recovery Manager: re-instantiates a failed replica. The recovery result is written into a log file in the database.
• Fault injector: shell script
FT-Baseline Architecture

Scenario
1. The client requests the server names from the naming server.
2. The naming server sends the names of the servers.
3. The client sends its request to all servers.
   a. When the client receives an exception message, the fault is detected.
   b. The client is already communicating with another replica server.
4. All servers send their results to the client.
5. The client receives the results and checks for duplicates.
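The scenario can be sketched in plain Java. This is an illustration under assumptions, not the team's code: `Replica` is a hypothetical stand-in for a CORBA stub obtained from the naming server, faults surface as `RuntimeException`, and replicas are called one after another where a real client would likely call them concurrently.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-in for a CORBA stub of one server replica. */
interface Replica {
    String handle(String request);
}

/** Sends each request to every replica and delivers only the first reply. */
class ActiveReplicationClient {
    private final List<Replica> replicas;
    private int duplicatesDiscarded = 0;
    private int faultsDetected = 0;

    ActiveReplicationClient(List<Replica> replicas) {
        this.replicas = new ArrayList<>(replicas);
    }

    /** Returns the first successful reply, or null if every replica failed. */
    String send(String request) {
        String first = null;
        boolean delivered = false;
        for (Replica replica : replicas) {
            try {
                String reply = replica.handle(request);
                if (!delivered) {
                    first = reply;          // first result wins
                    delivered = true;
                } else {
                    duplicatesDiscarded++;  // step 5: duplicate check
                }
            } catch (RuntimeException e) {
                faultsDetected++;           // step 3a: exception means fault detected
                // step 3b: the loop simply continues with the other replicas
            }
        }
        return first;
    }

    int duplicatesDiscarded() { return duplicatesDiscarded; }
    int faultsDetected() { return faultsDetected; }

    public static void main(String[] args) {
        Replica healthy = req -> "ok:" + req;
        Replica dead = req -> { throw new RuntimeException("dead server"); };
        ActiveReplicationClient client =
                new ActiveReplicationClient(Arrays.asList(dead, healthy));
        System.out.println(client.send("buy"));
    }
}
```

Because every replica already holds the request, a single failure needs no retry: the surviving replica's answer is used as-is.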
Mechanisms for Fail-Over

Exception Cases
• Server_Timeout: checked by using a thread pool
• Database_Timeout: checked by using a connection pool
• Dead_Server: solved by using a heartbeat (servers are checked every 2 seconds)

Global Recovery Manager: Heartbeat
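A heartbeat check for the Dead_Server case might look like the sketch below. It is a hedged illustration: `Pingable`, `HeartbeatMonitor`, and the recovery hook are hypothetical names, and only the 2-second period comes from the slide. A real probe would probably invoke a no-op method on the CORBA object.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

/** Hypothetical liveness probe for one server. */
interface Pingable {
    boolean ping();
}

class HeartbeatMonitor {
    static final long PERIOD_SECONDS = 2; // heartbeat period stated on the slide

    private final Map<String, Pingable> servers = new LinkedHashMap<>();

    /** Register all servers before calling start(); the map is not thread-safe. */
    void register(String name, Pingable server) {
        servers.put(name, server);
    }

    /** One heartbeat round: returns the names of servers that did not answer. */
    List<String> checkOnce() {
        List<String> dead = new ArrayList<>();
        for (Map.Entry<String, Pingable> entry : servers.entrySet()) {
            boolean alive;
            try {
                alive = entry.getValue().ping();
            } catch (RuntimeException e) {
                alive = false; // an exception during ping counts as a dead server
            }
            if (!alive) {
                dead.add(entry.getKey());
            }
        }
        return dead;
    }

    /** Runs checkOnce every PERIOD_SECONDS and passes dead servers to the recovery hook. */
    ScheduledExecutorService start(Consumer<String> recover) {
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        timer.scheduleAtFixedRate(
                () -> checkOnce().forEach(recover), 0, PERIOD_SECONDS, TimeUnit.SECONDS);
        return timer; // the caller stops the heartbeat with timer.shutdown()
    }

    public static void main(String[] args) {
        HeartbeatMonitor monitor = new HeartbeatMonitor();
        monitor.register("chess", () -> true);
        monitor.register("risk", () -> false);
        System.out.println("dead servers: " + monitor.checkOnce());
    }
}
```

The recovery hook is where a recovery manager would re-instantiate the failed replica and log the result.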
Performance Measurement
• 48 configurations
• Buy and cancel ticket
[Figures: performance measurement plots spread over several slides; time unit: us]
• 1 client
• 1000 requests
• Cancel ticket request

Fault Injection Measurements
[Figure: fault injection measurement plot]

Performance Measurement Comparison
[Figure: performance comparison plot]
RT-FT Baseline Architecture
Active Replication
RT-FT Performance Strategy

Thread Pool
• We need to avoid the overhead of creating a thread for each request.
• Create a number of threads at initialization time.
• Dynamic configuration

Results
• Without thread pool: average RTT 40.5 msec
• With thread pool: average RTT 38.0 msec
• Performance improves by about 6% (average RTT drops from 40.5 msec to 38.0 msec).
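The thread pool strategy can be illustrated with the standard `java.util.concurrent` classes. This is a minimal sketch, not the team's implementation: `PooledRequestHandler` and the placeholder `"done:"` reply are assumptions. Threads are started up front with `prestartAllCoreThreads()` so that no request pays the thread creation cost.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class PooledRequestHandler {
    private final ThreadPoolExecutor pool;

    PooledRequestHandler(int poolSize) {
        pool = new ThreadPoolExecutor(
                poolSize, poolSize, 0L, TimeUnit.MILLISECONDS,
                new LinkedBlockingQueue<>(),
                r -> {
                    Thread t = new Thread(r);
                    t.setDaemon(true); // do not keep the JVM alive for idle workers
                    return t;
                });
        pool.prestartAllCoreThreads(); // create every worker at initialization time
    }

    /** Number of worker threads currently in the pool. */
    int liveThreads() {
        return pool.getPoolSize();
    }

    /** Runs every request on a pooled thread and returns replies in request order. */
    List<String> handleAll(List<String> requests) {
        List<Future<String>> pending = new ArrayList<>();
        for (String request : requests) {
            pending.add(pool.submit(() -> "done:" + request)); // placeholder business logic
        }
        List<String> replies = new ArrayList<>();
        for (Future<String> future : pending) {
            try {
                replies.add(future.get());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            } catch (ExecutionException e) {
                throw new IllegalStateException(e);
            }
        }
        return replies;
    }

    void shutdown() {
        pool.shutdown();
    }

    public static void main(String[] args) {
        PooledRequestHandler handler = new PooledRequestHandler(4);
        System.out.println(handler.handleAll(Arrays.asList("buy", "cancel")));
        handler.shutdown();
    }
}
```

A bounded pool also gives a natural place to detect Server_Timeout: `Future.get` has an overload that takes a timeout.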
RT-FT Performance Measurement
[Figure: thread pool versus no thread pool]
Other Features

List of other features
• Fault injector: shell script
• Log4j: logging information
• Apache DB Connection Pool (DBCP)

Lessons from the other features
• Useful utilities
• DBCP improves performance
• Shell scripts are powerful
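Why a connection pool helps can be shown with a small generic pool. The sketch below only illustrates the idea behind DBCP and does not use or mirror its API: `SimplePool`, `borrow`, and `giveBack` are hypothetical names. Objects are created once up front, and a borrow that times out is the kind of signal a Database_Timeout check could rely on.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

/** Minimal bounded pool; illustrates the idea behind DBCP, not its API. */
class SimplePool<T> {
    private final BlockingQueue<T> idle;

    SimplePool(int size, Supplier<T> factory) {
        idle = new ArrayBlockingQueue<>(size);
        for (int i = 0; i < size; i++) {
            idle.add(factory.get()); // pay the creation cost once, up front
        }
    }

    /** Returns a pooled object, or null if none frees up within the timeout. */
    T borrow(long timeoutMillis) {
        try {
            return idle.poll(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return null;
        }
    }

    /** Puts a borrowed object back so that other callers can reuse it. */
    void giveBack(T resource) {
        idle.offer(resource);
    }

    public static void main(String[] args) {
        // A StringBuilder stands in for a database connection here.
        SimplePool<StringBuilder> pool = new SimplePool<>(2, StringBuilder::new);
        StringBuilder connection = pool.borrow(100);
        System.out.println("borrowed: " + (connection != null));
        pool.giveBack(connection);
    }
}
```

In the real system the factory would open a JDBC connection to MySQL, which is the expensive step the pool avoids repeating.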
Insights from Measurement

FT Measurement
• File I/O time for logging grows as the log file size increases.

RT-FT Measurement
• No RTT difference between fault-free and fault-injected test cases.
• Duplicated values reach the client at almost the same time.

RT-FT Performance Measurement
• Thread creation time is not trivial when the number of replicas increases.
• More test cases are needed.
Open Issues

Issues: test environment
• How to set up the same test environment for each test case.
• How to decide whether the test environment is good enough to yield meaningful data.

Additional features
• Load balancing for active replication
• Organizing active replication groups
• Passive replication for each group
Conclusion

What did we learn?
• Thread handling
• Data gathering and analysis
• Useful open source programs from the Apache project: log4j, DBCP

What did we accomplish?
• We succeeded in building an active replication system.

If we could start our project again,
• we would focus only on FT features.