![Page 1: State Machine Replication](https://reader036.vdocument.in/reader036/viewer/2022062501/56815ed1550346895dcd5f95/html5/thumbnails/1.jpg)
State Machine Replication
Project Presentation
Ido Zachevsky, Marat Radan
Supervisor: Ittay Eyal
Winter Semester 2010
![Page 2: State Machine Replication](https://reader036.vdocument.in/reader036/viewer/2022062501/56815ed1550346895dcd5f95/html5/thumbnails/2.jpg)
Goals
• Learn and understand Paxos and Python.
• Design a program for a fault-tolerant distributed system using the Paxos algorithm.
• Test it on a real Internet-scale system, Planet-Lab.
![Page 3: State Machine Replication](https://reader036.vdocument.in/reader036/viewer/2022062501/56815ed1550346895dcd5f95/html5/thumbnails/3.jpg)
The Problem – Distributed Storage
• Using Distributed Algorithms on a network has many advantages
• It also has many problems
• This project focuses on the Synchronization Problem
![Page 4: State Machine Replication](https://reader036.vdocument.in/reader036/viewer/2022062501/56815ed1550346895dcd5f95/html5/thumbnails/4.jpg)
Synchronization
• The task: successfully run a state machine that involves all the computers of a network
• All the computers need to be in sync regarding the Current State and the Next States.
• All the computers need to know the transitions.
![Page 5: State Machine Replication](https://reader036.vdocument.in/reader036/viewer/2022062501/56815ed1550346895dcd5f95/html5/thumbnails/5.jpg)
Problems?
• Can any computer choose the next state?
• What if a computer disconnects ungracefully?
• What if a message is delayed due to congestion?
• Other problems…
• Solution: Use a dedicated algorithm
![Page 6: State Machine Replication](https://reader036.vdocument.in/reader036/viewer/2022062501/56815ed1550346895dcd5f95/html5/thumbnails/6.jpg)
A Solution – Paxos
• Keeping the Safety requirements ensures that the chosen value is agreed upon by all computers
• Keeping the Liveness requirements ensures that a value will eventually be chosen
![Page 7: State Machine Replication](https://reader036.vdocument.in/reader036/viewer/2022062501/56815ed1550346895dcd5f95/html5/thumbnails/7.jpg)
Paxos - Background
• Paxos Made Simple, Leslie Lamport, 01 Nov 2001
• Paxos Made Live
![Page 8: State Machine Replication](https://reader036.vdocument.in/reader036/viewer/2022062501/56815ed1550346895dcd5f95/html5/thumbnails/8.jpg)
Principles
• The system consists of three agent classes:
  – Proposers
  – Acceptors
  – Learners
• Some of the agents are distinguished
• Agents communicate via messages
![Page 9: State Machine Replication](https://reader036.vdocument.in/reader036/viewer/2022062501/56815ed1550346895dcd5f95/html5/thumbnails/9.jpg)
Principles – continued
• A single computer – a Leader – is in charge
• Decision cycle in two phases:
  1. A majority must promise to commit to a recent proposal.
  2. Once a majority has committed, all computers are informed of the Decision.
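The two phases above can be sketched in a few lines of Python. This is a minimal in-process illustration of single-decision Paxos, not the project's code: the names `Acceptor` and `propose` and the use of plain integers as proposal numbers are assumptions made for the example, and no networking or failure handling is included.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Acceptor:
    """Hypothetical acceptor holding only the state the two phases need."""
    promised: int = -1                          # highest proposal number promised
    accepted: Optional[Tuple[int, str]] = None  # (number, value) last accepted

    def prepare(self, n: int):
        """Phase 1: promise to ignore proposals numbered below n."""
        if n > self.promised:
            self.promised = n
            return True, self.accepted
        return False, None

    def accept(self, n: int, value: str) -> bool:
        """Phase 2: accept unless a higher-numbered promise was already made."""
        if n >= self.promised:
            self.promised = n
            self.accepted = (n, value)
            return True
        return False


def propose(acceptors: List[Acceptor], n: int, value: str) -> Optional[str]:
    """Run one decision cycle; return the chosen value, or None on failure."""
    majority = len(acceptors) // 2 + 1

    # Phase 1: a majority must promise.
    replies = [a.prepare(n) for a in acceptors]
    promises = [previous for ok, previous in replies if ok]
    if len(promises) < majority:
        return None

    # If some acceptor already accepted a value, the proposer must adopt
    # the one with the highest proposal number instead of its own value.
    already_accepted = [p for p in promises if p is not None]
    if already_accepted:
        value = max(already_accepted, key=lambda p: p[0])[1]

    # Phase 2: the value is chosen once a majority has accepted it.
    accepted_count = sum(a.accept(n, value) for a in acceptors)
    return value if accepted_count >= majority else None


acceptors = [Acceptor() for _ in range(5)]
first = propose(acceptors, n=1, value="A")
second = propose(acceptors, n=2, value="B")  # must re-propose the chosen value
stale = propose(acceptors, n=1, value="C")   # number too low, rejected
```

The second call shows the safety property at work: a later proposer with a different value ends up confirming the value that was already chosen.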
![Page 10: State Machine Replication](https://reader036.vdocument.in/reader036/viewer/2022062501/56815ed1550346895dcd5f95/html5/thumbnails/10.jpg)
Safety requirements
• Only a value that has been proposed may be chosen,
• Only a single value is chosen, and
• A process never learns that a value has been chosen unless it actually has been.
![Page 11: State Machine Replication](https://reader036.vdocument.in/reader036/viewer/2022062501/56815ed1550346895dcd5f95/html5/thumbnails/11.jpg)
Liveness requirements
• Some proposed value is eventually chosen.
• A process can eventually learn the value which has been chosen.
![Page 12: State Machine Replication](https://reader036.vdocument.in/reader036/viewer/2022062501/56815ed1550346895dcd5f95/html5/thumbnails/12.jpg)
Implementing a State Machine
• Collection of servers, each implementing a state machine.
• The i-th state machine command in the sequence is the value chosen by the i-th instance of the Paxos consensus algorithm.
• A pre-decided set of commands is necessary.
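As a rough illustration of the points above, the sketch below applies chosen commands strictly in instance order. The command set (`inc`, `dec`, `reset`), the integer state and the `Replica` class are hypothetical and chosen only for the example; the decisions are fed in by hand rather than produced by real Paxos instances.

```python
# Pre-decided command set: every replica must interpret commands identically.
COMMANDS = {
    "inc": lambda state: state + 1,
    "dec": lambda state: state - 1,
    "reset": lambda state: 0,
}


class Replica:
    """Hypothetical server that applies chosen commands in instance order."""

    def __init__(self):
        self.state = 0
        self.chosen = {}      # instance index -> command chosen by that instance
        self.next_index = 0   # first instance whose command is not yet applied

    def learn(self, index, command):
        """Record the value chosen by instance `index`, then apply what we can."""
        if command not in COMMANDS:
            raise ValueError("unknown command: %r" % (command,))
        self.chosen[index] = command
        # Apply commands in order and stop at the first undecided instance.
        while self.next_index in self.chosen:
            apply_command = COMMANDS[self.chosen[self.next_index]]
            self.state = apply_command(self.state)
            self.next_index += 1


# Two replicas learn the same decisions in different orders.
r1, r2 = Replica(), Replica()
for index, command in [(0, "inc"), (1, "inc"), (2, "dec")]:
    r1.learn(index, command)
for index, command in [(2, "dec"), (0, "inc"), (1, "inc")]:
    r2.learn(index, command)
```

Because each replica waits for instance i before applying instance i+1, both end in the same state even though the decisions reached them in a different order.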
![Page 13: State Machine Replication](https://reader036.vdocument.in/reader036/viewer/2022062501/56815ed1550346895dcd5f95/html5/thumbnails/13.jpg)
Planet-Lab
• Planet-Lab is a global research network that supports the development of new network services.
• Understanding the system is required
• Monitoring is necessary
  – Generally, implemented via NSSL-lab.
![Page 14: State Machine Replication](https://reader036.vdocument.in/reader036/viewer/2022062501/56815ed1550346895dcd5f95/html5/thumbnails/14.jpg)
Project Design
• Chosen language for implementation: Python
• Network framework: Twisted Matrix
• Implementation stages:
  – Single Decision on NSSL
  – Multiple Decisions on NSSL
  – Single Decision on Planet-Lab
  – Multiple Decisions on Planet-Lab
![Page 15: State Machine Replication](https://reader036.vdocument.in/reader036/viewer/2022062501/56815ed1550346895dcd5f95/html5/thumbnails/15.jpg)
[Architecture diagram: Clients 1 to N and Servers 1 to N connected through the Network. Labeled components: Listening Socket, Transport, Protocol, ProtocolFactory, Paxos Algorithm, Reactor Loop.]
![Page 16: State Machine Replication](https://reader036.vdocument.in/reader036/viewer/2022062501/56815ed1550346895dcd5f95/html5/thumbnails/16.jpg)
Implementation
• Use Cases
  – Acceptor disconnects?
  – Leader disconnects?
    • At which stage?
  – Acceptor message fails to deliver?
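For the acceptor-related cases, the key fact is that Paxos needs replies from only a majority of acceptors, so a minority of disconnected acceptors or lost acceptor messages does not block a decision. The helper names below are hypothetical and the snippet is only an arithmetic sketch of that rule; it says nothing about how the project detects disconnects.

```python
def majority(total_acceptors: int) -> int:
    """Smallest number of acceptors that forms a majority."""
    return total_acceptors // 2 + 1


def can_decide(total_acceptors: int, reachable_acceptors: int) -> bool:
    """A decision is possible while a majority of acceptors still responds."""
    return reachable_acceptors >= majority(total_acceptors)


def tolerated_failures(total_acceptors: int) -> int:
    """How many acceptors may be unreachable without blocking decisions."""
    return total_acceptors - majority(total_acceptors)


for n in (3, 4, 5):
    print(n, "acceptors: majority", majority(n),
          "tolerated failures", tolerated_failures(n))
```

With 5 acceptors, for example, any 3 suffice, so 2 may fail; adding a 4th acceptor to a group of 3 does not increase the number of tolerated failures.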
![Page 17: State Machine Replication](https://reader036.vdocument.in/reader036/viewer/2022062501/56815ed1550346895dcd5f95/html5/thumbnails/17.jpg)
Implementation
• Leader Election
  – In fact an inherent part of the algorithm
• Output and monitoring
  – Actual output not visible in general
  – Only via monitoring
![Page 18: State Machine Replication](https://reader036.vdocument.in/reader036/viewer/2022062501/56815ed1550346895dcd5f95/html5/thumbnails/18.jpg)
Flow
1. Register Nodes
2. Verify and install necessary files
3. Upload
4. Initiate Monitor
5. Run and wait for activity
6. Review results
![Page 19: State Machine Replication](https://reader036.vdocument.in/reader036/viewer/2022062501/56815ed1550346895dcd5f95/html5/thumbnails/19.jpg)
Implementation – File Structure
Project File Structure
• Initial Installation
  – Installation: my_install (csh)
  – Initial Communication: send_install (py)
  – Alive Machines Server: install_serv (py)
• Uploading and Running
  – Deployment: my_deploy (csh)
  – Multi-Run: my_multirun (csh)
  – Multi-Stop: my_multistop (csh)
• Core Paxos Program
  – Paxos Instance: paxos_inst (py)
  – Paxos Algorithm: paxos_alg (py)
  – Network Data: paxos_net_data (txt)
• Service Scripts and Files
  – Alive Nodes list: nodes (txt)
  – Paxos Monitor: paxos_mon_serv (py)
  – Additional files: combine_nodes (csh), conv_nodes (csh), remove_done (csh)
![Page 20: State Machine Replication](https://reader036.vdocument.in/reader036/viewer/2022062501/56815ed1550346895dcd5f95/html5/thumbnails/20.jpg)
Results
• Everything works at the NSSL
• In Real-Life, not necessarily
• Communication phenomena – messages arriving unordered, in large chunks, etc.
• Works well for up to 20-30 Nodes
• Use cases tested in Lab
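The "large chunks" phenomenon is typical of stream transports such as TCP, which deliver bytes rather than whole messages: one read may contain several messages, or only part of one. A common remedy, sketched below under the assumption of newline-delimited text messages, is to buffer incoming bytes and emit only complete lines. The `LineFramer` class and the message texts are hypothetical; Twisted's `LineReceiver` offers comparable behavior, though the slides do not say whether the project used it.

```python
class LineFramer:
    """Hypothetical helper that turns arbitrary byte chunks into whole lines."""

    def __init__(self):
        self._buffer = b""

    def feed(self, chunk: bytes):
        """Add a received chunk; return the complete messages it finished."""
        self._buffer += chunk
        # Everything before the last newline is complete; the remainder
        # (possibly empty) stays buffered until more bytes arrive.
        *lines, self._buffer = self._buffer.split(b"\n")
        return [line.decode("utf-8") for line in lines]


framer = LineFramer()
part1 = framer.feed(b"PREPARE 1\nPROM")           # one full message, one partial
part2 = framer.feed(b"ISE 1\nACCEPT 1 A\n")       # completes it, plus one more
```

Reordering is a separate issue; tagging each message with its decision index lets the receiver put messages back in order before acting on them.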
![Page 21: State Machine Replication](https://reader036.vdocument.in/reader036/viewer/2022062501/56815ed1550346895dcd5f95/html5/thumbnails/21.jpg)
Conclusions
• Preliminary work needed to understand Twisted Matrix and Planet-Lab
• Dealing with network problems
  – SSH Tunnel instead of “real” monitoring
• Requirements fulfilled
![Page 22: State Machine Replication](https://reader036.vdocument.in/reader036/viewer/2022062501/56815ed1550346895dcd5f95/html5/thumbnails/22.jpg)
Further work
• Optimize networking protocol
  – Improve client-server interface
  – Inefficient startup – N(N-1) for N machines
• Partition Decision processes
  – Only few nodes decide each resolution
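To see why the startup cost matters, the short calculation below evaluates N(N-1) for a few system sizes. It assumes the figure counts one connection (or startup message) from every machine to every other machine; the slides do not state the unit, and the function name is hypothetical.

```python
def full_mesh_cost(n: int) -> int:
    """N(N-1): every one of n machines contacts each of the other n-1."""
    return n * (n - 1)


for n in (5, 10, 20, 30):
    print(n, "machines:", full_mesh_cost(n))
```

The cost grows quadratically: 20 for 5 machines but 870 for 30. Under the same assumption, letting only a small group of nodes take part in each decision would keep the cost at the small-group level.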
![Page 23: State Machine Replication](https://reader036.vdocument.in/reader036/viewer/2022062501/56815ed1550346895dcd5f95/html5/thumbnails/23.jpg)
Thank you