TRANSCRIPT
Consensus as a Network Service

Huynh Tu Dang, Pietro Bressana, Han Wang, Ki Suh Lee, Hakim Weatherspoon, Marco Canini, Fernando Pedone, and Robert Soulé
Università della Svizzera italiana (USI), Cornell University, and KAUST
Consensus is a Fundamental Problem
Many distributed problems can be reduced to consensus
E.g., Atomic broadcast, atomic commit
Consensus protocols are the foundation for fault-tolerant systems
E.g., OpenReplica, Ceph, Chubby
Any improvement in performance would have HUGE impact
Key Idea: Move Consensus Into Network Hardware
This work focuses on Paxos
One of the most widely used consensus protocols
Has been proven correct
Enabling technology trends:
Hardware is becoming more flexible: e.g. PISA, FlexPipe, NFP-6xxx
Hardware is becoming more programmable: e.g., POF, PX, and P4
Outline of This Talk
Introduction
Consensus Background
Design, Implementation & Evaluation
Conclusions
Paxos Roles and Communication
Proposers propose values
A distinct proposer assumes the role of Coordinator
Acceptors accept a proposal, promise not to accept any other proposals
Learners require a quorum of messages from Acceptors, then “deliver” a value
Message flow: the Proposer sends a proposal to the Coordinator; the Coordinator sends Phase 2A messages to Acceptors 1–3; each Acceptor sends Phase 2B messages to the Learners (up to n).
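To make the learner’s side of this flow concrete, here is a minimal sketch (not from the paper; the struct, constants, and function name are hypothetical) of the quorum rule: with n acceptors, a learner delivers once a majority, i.e. floor(n/2) + 1, of Phase 2B messages for an instance has arrived.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical sketch: 3 acceptors, so a quorum is 2. */
#define N_ACCEPTORS 3
#define QUORUM (N_ACCEPTORS / 2 + 1)

struct learner {
    int votes[N_ACCEPTORS]; /* votes[i] = 1 if acceptor i has voted */
    int delivered;
};

/* Record a Phase 2B vote from one acceptor; returns 1 once a
 * majority has voted and the value can be delivered. Duplicate
 * votes from the same acceptor are counted only once. */
int on_phase2b(struct learner *l, int acceptor_id) {
    l->votes[acceptor_id] = 1;
    int count = 0;
    for (int i = 0; i < N_ACCEPTORS; i++)
        count += l->votes[i];
    if (count >= QUORUM)
        l->delivered = 1;
    return l->delivered;
}
```

With 3 acceptors, a single vote (even repeated) is not enough; a second distinct acceptor completes the quorum.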
Design
Design Goals 1: Be a Drop-In Replacement
István et al. [NSDI ’16] implement ZAB in FPGAs, but require that the application be written in a Hardware Description Language
High-level languages make hardware development easier
Implementing LevelDB in P4 might still be tricky….
Standard Paxos API
void submit(struct paxos_ctx *ctx,
            char *value,
            int size);

void (*deliver)(struct paxos_ctx *ctx,
                int instance,
                char *value,
                int size);

void recover(struct paxos_ctx *ctx,
             int instance,
             char *value,
             int size);
Figure 4: CAANS application level API.
paxos_ctx struct. When a learner learns a value, it calls the application-specific deliver function. The deliver function returns a buffer containing the learned value, the size of the buffer, and the instance number for the learned value.

The recover function is used by the application to discover a previously agreed upon value for a particular instance of consensus. The recover function results in the same sequence of Paxos messages as the submit function. The difference in the API, though, is that the application must pass the consensus instance number as a parameter, as well as an application-specific no-op value. The resulting deliver callback will either return the accepted value, or the no-op value if no value had been previously accepted for the particular instance number.

Hardware/Software divide. An important question for offering consensus as a network service is: exactly what logic should be implemented in network hardware, and what logic should be implemented in software?
In the CAANS architecture, network hardware executes the logic of coordinators and acceptors. This choice allows CAANS to address the bottlenecks identified in Section 2. Moreover, since the proposer and learner code are implemented in software, the design facilitates the simple application-level interface described above. The logic of each of the roles is neatly encapsulated by communication boundaries.

Figure 3 illustrates the CAANS architecture for a switch-based deployment. In the figure, switch hardware is shaded grey, and commodity servers are colored white. Note that a backup coordinator can execute on either a second switch, or a commodity server, as we’ll discuss below. We should also point out that CAANS could be deployed on other devices, such as the programmable NICs that we use in the evaluation.

Paxos header. Network hardware is optimized to process packet headers. Since CAANS targets network hardware, it is a natural choice to map Paxos messages into a Paxos-protocol header. The Paxos header follows the transport protocol header (e.g., UDP), allowing CAANS messages
struct paxos_t {
    uint8_t msgtype;
    uint8_t inst[INST_SIZE];
    uint8_t rnd;
    uint8_t vrnd;
    uint8_t swid[8];
    uint8_t value[VALUE_SIZE];
};
Figure 5: Paxos packet header.
to co-exist with standard network hardware.

In a traditional Paxos implementation, each participant receives messages of a particular type (e.g., Phase 1A, 2A), executes some processing logic, and then synthesizes a new message that it sends to the next participant in the protocol.

However, network hardware, in general, cannot craft new messages; it can only modify fields in the header of the packet that it is currently processing. Therefore, a network-based Paxos needs to map participant logic into forwarding and header rewriting decisions (e.g., the message from proposer to coordinator is transformed into a message from coordinator to each acceptor by rewriting certain fields). Because the message size cannot be changed at the switch, each packet must contain the union of all fields in all Paxos messages, which fortunately are still a small set.

Figure 5 shows the CAANS packet header for Paxos messages, written as a C struct. To keep the header small, the semantics of some of the fields change depending on which participant sends the message. The fields are as follows: (i) msgtype distinguishes the various Paxos messages (e.g., phase 1A, 2A); (ii) inst is the consensus instance number; (iii) rnd is either the round number computed by the proposer or the round number for which the acceptor has cast a vote; (iv) vrnd is the round number in which an acceptor has cast a vote; (v) swid identifies the sender of the message; and (vi) value contains the request from the proposer or the value for which an acceptor has cast a vote.

A CAANS proposer differs from a standard Paxos proposer because before forwarding messages to the coordinator, it must first encapsulate the message in a Paxos header. Through standard sockets, the Paxos header is then encapsulated inside a UDP datagram and we rely on the UDP checksum to ensure data integrity.

Memory limitations. CAANS aims to support practical systems that use Paxos as a building block to achieve fault tolerance. A prominent example of these are services that rely on a replicated log to persistently record the sequence of all consensus values. The Paxos algorithm does not specify how to handle the ever-growing, replicated log that is stored at acceptors. On any system, this can cause problems, as the log would require unbounded disk space.
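As an illustrative sketch of how the Figure 5 header might be populated for a Phase 2A message (not from the paper: the concrete values of INST_SIZE, VALUE_SIZE, and PAXOS_2A, and the helper name, are assumptions; the paper fixes only the field names and their roles):

```c
#include <stdint.h>
#include <string.h>

/* Assumed sizes and message-type code; only the field layout
 * follows Figure 5 of the paper. */
#define INST_SIZE 4
#define VALUE_SIZE 32
#define PAXOS_2A 3

struct paxos_t {
    uint8_t msgtype;
    uint8_t inst[INST_SIZE];
    uint8_t rnd;
    uint8_t vrnd;
    uint8_t swid[8];
    uint8_t value[VALUE_SIZE];
};

/* Fill a Phase 2A header: stamp the instance number and round,
 * and carry the proposer's value unchanged. */
void make_phase2a(struct paxos_t *h, uint32_t inst, uint8_t rnd,
                  const uint8_t *val, size_t len) {
    memset(h, 0, sizeof *h);
    h->msgtype = PAXOS_2A;
    for (int i = 0; i < INST_SIZE; i++)   /* big-endian instance number */
        h->inst[i] = (uint8_t)(inst >> (8 * (INST_SIZE - 1 - i)));
    h->rnd = rnd;
    memcpy(h->value, val, len < VALUE_SIZE ? len : VALUE_SIZE);
}
```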
submit: send a value
deliver: deliver a value
recover: discover a prior value
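A hypothetical client of this API might look like the following. Only submit and the deliver callback signature follow Figure 4; the contents of struct paxos_ctx and the in-process submit body are stand-ins (in CAANS, submit would send the value into the network, and deliver would fire once consensus is reached).

```c
#include <assert.h>
#include <string.h>

/* Stand-in context: the real struct paxos_ctx internals are opaque. */
struct paxos_ctx { int unused; };

static char last_value[64];
static int last_instance = -1;

/* Application-specific deliver callback, per the Figure 4 API:
 * receives the instance number and the learned value. */
static void my_deliver(struct paxos_ctx *ctx, int instance,
                       char *value, int size) {
    (void)ctx;
    last_instance = instance;
    memcpy(last_value, value, (size_t)size);
    last_value[size] = '\0';
}

/* Toy stand-in for the network service: agree on the value
 * immediately and invoke the deliver callback. */
static int next_instance = 0;
void submit(struct paxos_ctx *ctx, char *value, int size) {
    my_deliver(ctx, next_instance++, value, size);
}
```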
Design Goals 2: Alleviate Bottlenecks
[Figure: CPU utilization (0–100%) of the Proposer, Coordinator, Acceptor, and Learner as the number of learners grows from 4 to 20.]
Coordinator and acceptors are to blame!
Hardware/Software

[Diagram: Proposers and Learners run in software; the Coordinator, a Coordinator Backup, and three Acceptors run in network hardware.]

Challenge: map Paxos logic into stateful forwarding decisions
Software roles facilitate the application API; hardware roles alleviate the bottlenecks
NetPaxos: Header Definition & Parser
header_type paxos_t {
    fields {
        msgtype : 16;
        inst : 32;
        rnd : 16;
        vrnd : 16;
        acptid : 16;
        paxosval : 256;
    }
}

parser parse_ethernet {
    extract(ethernet);
    return parse_ipv4;
}

parser parse_ipv4 {
    extract(ipv4);
    return parse_udp;
}

parser parse_udp {
    extract(udp);
    return select(udp.dstPort) {
        PAXOS_PROTOCOL : parse_paxos;
        default : ingress;
    }
}

parser parse_paxos {
    extract(paxos);
    return ingress;
}
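A C analogue of this parser chain (a sketch, not from the paper): walk Ethernet, then IPv4, then UDP, and treat the payload as a Paxos header only when the UDP destination port matches. The offsets assume untagged Ethernet and a 20-byte IPv4 header, and the concrete value of PAXOS_PROTOCOL is an assumption.

```c
#include <stdint.h>
#include <stddef.h>

#define PAXOS_PROTOCOL 0x8888  /* assumed UDP port value */
enum { ETH_LEN = 14, IPV4_LEN = 20, UDP_LEN = 8 };

/* Returns a pointer to the Paxos header inside the packet,
 * or NULL for non-Paxos traffic (the "default: ingress" branch). */
const uint8_t *parse_paxos(const uint8_t *pkt, size_t len) {
    size_t off = ETH_LEN + IPV4_LEN;
    if (len < off + UDP_LEN)
        return NULL;
    /* UDP destination port: bytes 2-3 of the UDP header, big-endian */
    uint16_t dst = (uint16_t)(pkt[off + 2] << 8 | pkt[off + 3]);
    if (dst != PAXOS_PROTOCOL)
        return NULL;
    return pkt + off + UDP_LEN;
}
```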
Acceptor Control Flow

Ingress → isIPv4? (no: Drop) → forward_tbl → isPaxos? (no: Drop)
→ round_tbl (load the acceptor’s rnd stored in registers)
→ packet’s rnd >= acceptor’s rnd? (no: Drop)
→ acceptor_tbl (update the registers’ state, ‘msgtype’, ‘acptid’, and the UDP dst port)
→ Egress
control ingress {
    if (valid(ipv4)) {
        apply(forward_tbl);
    }
    if (valid(paxos)) {
        apply(round_tbl);
        if (paxos.rnd >= current.rnd) {
            apply(acceptor_tbl);
        }
    }
}
round_tbl

// uint16_t rounds_reg[64000];
register rounds_reg {
    width : 16;
    instance_count : 64000;
}

action read_round() {
    // uint16_t current.round = rounds_reg[paxos.inst]
    register_read(current.round, rounds_reg, paxos.inst);
}

table round_tbl {
    actions { read_round; }
    size : 1;
}
acceptor_tbl

action handle_2a(learner_port) {
    // rounds_reg[paxos.inst] = paxos.rnd
    register_write(rounds_reg, paxos.inst, paxos.rnd);
    // vrounds_reg[paxos.inst] = paxos.rnd
    register_write(vrounds_reg, paxos.inst, paxos.rnd);
    // values_reg[paxos.inst] = paxos.paxosval
    register_write(values_reg, paxos.inst, paxos.paxosval);

    register_read(paxos.acptid, acceptor_id, 0);
    modify_field(paxos.msgtype, PAXOS_2B);
    modify_field(udp.dstPort, learner_port);
}

table acceptor_tbl {
    reads { paxos.msgtype : exact; }
    actions { handle_1a; handle_2a; }
}
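The round_tbl and acceptor_tbl logic can be restated in C as a sketch (not from the paper): rounds_reg guards acceptance, and the 2A handler records the vote and rewrites the header in place into a Phase 2B message. The uint16_t register widths, the instance count, and the message-type codes are assumptions, and only the 2A path is shown (handle_1a is omitted).

```c
#include <stdint.h>

#define NUM_INSTANCES 64000  /* assumed, matching the register size above */
#define PAXOS_2A 3
#define PAXOS_2B 4

static uint16_t rounds_reg[NUM_INSTANCES];
static uint16_t vrounds_reg[NUM_INSTANCES];
static uint16_t values_reg[NUM_INSTANCES];

/* Simplified header: the real paxosval field is 256 bits wide. */
struct paxos_hdr { uint16_t msgtype, inst, rnd, vrnd, acptid, paxosval; };

/* Returns 1 if the packet was transformed into a 2B, 0 if dropped. */
int acceptor_pipeline(struct paxos_hdr *p, uint16_t my_swid) {
    /* round_tbl: load the highest round seen for this instance */
    uint16_t current_rnd = rounds_reg[p->inst];
    if (p->rnd < current_rnd)
        return 0; /* stale round: drop */
    /* acceptor_tbl / handle_2a: record the vote in the registers
     * and rewrite the header in place into a Phase 2B message */
    rounds_reg[p->inst]  = p->rnd;
    vrounds_reg[p->inst] = p->rnd;
    values_reg[p->inst]  = p->paxosval;
    p->acptid  = my_swid;
    p->msgtype = PAXOS_2B;
    return 1;
}
```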
Implementation

Source code
Proposer and learner written in C
Coordinator and acceptor written in P4
4 Compilers
P4C
P4FPGA
Xilinx SDNet
Netronome SDK
4 Hardware target platforms
NetFPGA SUME (4x10G)
Netronome Agilio-CX (1x40G)
Alpha Data ADM-PCIE-KU3 (2x40G)
Xilinx VCU109 (4x100G)
2 Software target platforms
Bmv2
DPDK (work in progress)
P4 Compilers
Compiler           Target            Remark
P4C                Software switch   Supports most of the P4 constructs
P4@ELTE            DPDK              Does not support register operations; limits field length to 32 bits
P4FPGA             FPGAs             Must write modules for unsupported P4 constructs
Xilinx SDNet       FPGAs             Does not support register operations; requires a wrapper for the packet stream
Netronome SDK      Netronome ISAs    Works only with Netronome devices; custom actions can be written in Micro-C
Barefoot Capilano  Barefoot Tofino   Tbps switch
Evaluation
Experiment: What is the Absolute Performance?
Run Coordinator / Acceptor in isolation
Testbed:
NetFPGA SUME board in a SuperMicro Server
A packet generator for offered load
Absolute Performance
[Figure: latency (µs) of plain forwarding vs. the Coordinator vs. the Acceptor; all below 0.8 µs.]
Measured on NetFPGA SUME using P4FPGA
Throughput is over 9 million consensus messages / second (close to line rate)
Little latency overhead compared to simply forwarding packets
Experiment: What is the End-to-End Performance?
Comparing NetPaxos to a software-based Paxos (Libpaxos)
Testbed:
4 NetFPGA SUME boards in SuperMicro Servers
An OpenFlow-enabled 10 Gbps switch (Pica8 P-3922)
End-to-End Performance
[Figure: latency (µs) vs. throughput (msgs/s) for CAANS and Libpaxos, with throughput up to 100,000 msgs/s.]
2.24x throughput improvement over software implementation
75% reduction in latency
Similar results when replicating LevelDB as application
Next Steps
We make consensus great again!
The ball is now in the application developer’s court
Suggests direction for future work
[Figure: CPU utilization (0–100%) of the remaining software roles, the Proposer and the Learner.]
Lessons Learned
Outlook
The performance of consensus protocols has a dramatic impact on the performance of data center applications
Moving consensus logic into network hardware results in significant performance improvements
“a HUGE wave of consensus messages is approaching”
Questions & Answers
Performance After Failure
[Figure: throughput (msgs/s, up to 150,000) over 10 seconds, for a coordinator failure with a software backup (left) and an acceptor failure (right).]
End-to-End Experiment: NetPaxos Setup
[Diagram: application clients send requests to application servers; the servers run the Paxos protocol through a programmable device.]