![Page 1: Shoal: A Network Architecture for Disaggregated Racks · 2019. 2. 27. · SerDes SerDes SerDes SerDes Packet Proce-ssing Buffers Crossbar Packet switch. Goal 2: High network performance](https://reader036.vdocument.in/reader036/viewer/2022062610/61073a2e92187d6d12402a86/html5/thumbnails/1.jpg)
Shoal: A Network Architecture for Disaggregated Racks
Vishal Shrivastav (Cornell University)Asaf Valadarsky (Hebrew University of Jerusalem)Hitesh Ballani, Paolo Costa (Microsoft Research)
Ki Suh Lee (Waltz Networks)Han Wang (Barefoot Networks)
Rachit Agarwal, Hakim Weatherspoon (Cornell University)
![Page 2: Shoal: A Network Architecture for Disaggregated Racks · 2019. 2. 27. · SerDes SerDes SerDes SerDes Packet Proce-ssing Buffers Crossbar Packet switch. Goal 2: High network performance](https://reader036.vdocument.in/reader036/viewer/2022062610/61073a2e92187d6d12402a86/html5/thumbnails/2.jpg)
Inter-rack DC Network
Traditional racks in datacenters
![Page 3: Shoal: A Network Architecture for Disaggregated Racks · 2019. 2. 27. · SerDes SerDes SerDes SerDes Packet Proce-ssing Buffers Crossbar Packet switch. Goal 2: High network performance](https://reader036.vdocument.in/reader036/viewer/2022062610/61073a2e92187d6d12402a86/html5/thumbnails/3.jpg)
(FPGA,GPU,TPU)
Inter-rack DC Network
Disaggregated racks in datacenters
NVMe
Storage SoCs
Acclerators
CPUMemory
I/O controllers
NIC
Prior works [OSDI’16] [HPCA’12] [Keeton’15]• High compute density• Fine-grained resource pooling and provisioning• Seamless scaling and independent evolution of resources
Intra-rack Network
![Page 4: Shoal: A Network Architecture for Disaggregated Racks · 2019. 2. 27. · SerDes SerDes SerDes SerDes Packet Proce-ssing Buffers Crossbar Packet switch. Goal 2: High network performance](https://reader036.vdocument.in/reader036/viewer/2022062610/61073a2e92187d6d12402a86/html5/thumbnails/4.jpg)
(FPGA,GPU,TPU)
Inter-rack DC Network
Disaggregated racks in datacenters
NVMe
Storage SoCs
Acclerators
CPUMemory
I/O controllers
NIC
Prior works [OSDI’16] [HPCA’12] [Keeton’15]• High compute density• Fine-grained resource pooling and provisioning• Seamless scaling and independent evolution of resources
Intra-rack Network
![Page 5: Shoal: A Network Architecture for Disaggregated Racks · 2019. 2. 27. · SerDes SerDes SerDes SerDes Packet Proce-ssing Buffers Crossbar Packet switch. Goal 2: High network performance](https://reader036.vdocument.in/reader036/viewer/2022062610/61073a2e92187d6d12402a86/html5/thumbnails/5.jpg)
Challenges for disaggregated rack network
• Connect as many as an order of magnitude more nodes than traditional racks
Network
Compute
~15KW power budget[NSDI’16]
Intra-rack Network
q Be high performant§ low latency / high throughput
q Be power efficient§ to enable high compute density
![Page 6: Shoal: A Network Architecture for Disaggregated Racks · 2019. 2. 27. · SerDes SerDes SerDes SerDes Packet Proce-ssing Buffers Crossbar Packet switch. Goal 2: High network performance](https://reader036.vdocument.in/reader036/viewer/2022062610/61073a2e92187d6d12402a86/html5/thumbnails/6.jpg)
Challenges for disaggregated rack network
• Connect as many as an order of magnitude more nodes than traditional racks
~15KW power budget[NSDI’16]
Network
Compute
Intra-rack Network
q Be high performant§ low latency / high throughput
q Be power efficient§ to enable high compute density
![Page 7: Shoal: A Network Architecture for Disaggregated Racks · 2019. 2. 27. · SerDes SerDes SerDes SerDes Packet Proce-ssing Buffers Crossbar Packet switch. Goal 2: High network performance](https://reader036.vdocument.in/reader036/viewer/2022062610/61073a2e92187d6d12402a86/html5/thumbnails/7.jpg)
Challenges for disaggregated rack network
• Connect as many as an order of magnitude more nodes than traditional racks
~15KW power budget[NSDI’16]
Network
Compute
Intra-rack Network
q Be high performant§ low latency / high throughput
q Be power efficient§ to enable high compute density
![Page 8: Shoal: A Network Architecture for Disaggregated Racks · 2019. 2. 27. · SerDes SerDes SerDes SerDes Packet Proce-ssing Buffers Crossbar Packet switch. Goal 2: High network performance](https://reader036.vdocument.in/reader036/viewer/2022062610/61073a2e92187d6d12402a86/html5/thumbnails/8.jpg)
Potential disaggregated rack network designsLow Power consumption High Performance
(low latency / high throughput)
Packet-switchedNetworks
ToR chasis switch
Network of switches
Direct-connectNetworks
![Page 9: Shoal: A Network Architecture for Disaggregated Racks · 2019. 2. 27. · SerDes SerDes SerDes SerDes Packet Proce-ssing Buffers Crossbar Packet switch. Goal 2: High network performance](https://reader036.vdocument.in/reader036/viewer/2022062610/61073a2e92187d6d12402a86/html5/thumbnails/9.jpg)
Shoal is a network stack and fabric for disaggregated racks that is both low power and
high performance (low latency, high throughput)
Key feature:Shoal network fabric comprises purely fast circuit switches that
can reconfigure within nanoseconds
![Page 10: Shoal: A Network Architecture for Disaggregated Racks · 2019. 2. 27. · SerDes SerDes SerDes SerDes Packet Proce-ssing Buffers Crossbar Packet switch. Goal 2: High network performance](https://reader036.vdocument.in/reader036/viewer/2022062610/61073a2e92187d6d12402a86/html5/thumbnails/10.jpg)
Shoal is a network stack and fabric for disaggregated racks that is both low power and
high performance (low latency, high throughput)
Key feature:Shoal network fabric comprises purely fast circuit switches that
can reconfigure within nanoseconds
![Page 11: Shoal: A Network Architecture for Disaggregated Racks · 2019. 2. 27. · SerDes SerDes SerDes SerDes Packet Proce-ssing Buffers Crossbar Packet switch. Goal 2: High network performance](https://reader036.vdocument.in/reader036/viewer/2022062610/61073a2e92187d6d12402a86/html5/thumbnails/11.jpg)
Goal 1: Low power consumption
q No bufferingq No packet processingq No serialization/de-serialization
Consumes significantly less power than packet switches
SerDes SerDes
SerDes SerDes
PacketProce-ssing
Buffers
Crossbar
Circuit switches
Circuit switch
SerDes SerDes
SerDes SerDes
PacketProce-ssing
Buffers
Crossbar
Packet switch
![Page 12: Shoal: A Network Architecture for Disaggregated Racks · 2019. 2. 27. · SerDes SerDes SerDes SerDes Packet Proce-ssing Buffers Crossbar Packet switch. Goal 2: High network performance](https://reader036.vdocument.in/reader036/viewer/2022062610/61073a2e92187d6d12402a86/html5/thumbnails/12.jpg)
Goal 2: High network performance
Key Challenge:Need to explicitly set up circuits (reconfigure) before sending packets
q Traditional circuit-switched networksq Uses switches with high reconfiguration delay, up to millisecondsq Uses a central controller to decide the circuits (reconfiguration algorithm)q Not suitable for low latency traffic
q Shoalq Leverages circuit switches with nanosecond reconfiguration delay
Key Design Idea:De-centralized, traffic agnostic reconfiguration algorithm• Inspired from LB monolithic packet switches [Comp Comm’02]
![Page 13: Shoal: A Network Architecture for Disaggregated Racks · 2019. 2. 27. · SerDes SerDes SerDes SerDes Packet Proce-ssing Buffers Crossbar Packet switch. Goal 2: High network performance](https://reader036.vdocument.in/reader036/viewer/2022062610/61073a2e92187d6d12402a86/html5/thumbnails/13.jpg)
* -> H
Shoal for a single circuit switch network
A B C D E F G H
* -> H
1 2 3 4 5 6 7
Time slot
A
B
C
D
E
F
G
H
B
C
D
E
F
G
H
A
C
D
E
F
G
H
A
B
D
E
F
G
H
A
B
C
E
F
G
H
A
B
C
D
F
G
H
A
B
C
D
E
G
H
A
B
C
D
E
F
H
A
B
C
D
E
F
G
(a cyclic permutation)
* -> H * -> H * -> H * -> H * -> H * -> H * -> H
A permutationof connections
N-1 time slots(an epoch)
Uniformly load-balanced
traffic
100% throughputArbitrary traffic pattern 50% throughput
in worst-case
A -> H A -> H A ->H A ->H A -> H A ->H* -> H* -> H* -> H* -> H * -> H
A -> HA -> HA -> HA -> HA -> H A -> H A -> H
Static pre-defined schedule
Each node hasN-1 queues
(one per dst)
![Page 14: Shoal: A Network Architecture for Disaggregated Racks · 2019. 2. 27. · SerDes SerDes SerDes SerDes Packet Proce-ssing Buffers Crossbar Packet switch. Goal 2: High network performance](https://reader036.vdocument.in/reader036/viewer/2022062610/61073a2e92187d6d12402a86/html5/thumbnails/14.jpg)
Extending Shoal to a network of circuit switches
A B C D E F G H
1 2 3 4 5 6 7Time slot
ABCDEFGH
BC
DEFGHA
CD
EFGHAB
DE
FGHABC
EF
GHABCD
FG
HABCDE
GH
ABCDEF
HA
BCDEFG
![Page 15: Shoal: A Network Architecture for Disaggregated Racks · 2019. 2. 27. · SerDes SerDes SerDes SerDes Packet Proce-ssing Buffers Crossbar Packet switch. Goal 2: High network performance](https://reader036.vdocument.in/reader036/viewer/2022062610/61073a2e92187d6d12402a86/html5/thumbnails/15.jpg)
Extending Shoal to a network of circuit switches
A B C D E F G H
Requires very tight network-wide synchronizationq DTP [Sigcomm’16] + WhiteRabbit can achieve sub-nanosecond
synchronization precision
1 2 3 4 5 6 7Time slot
ABCDEFGH
BC
DEFGHA
CD
EFGHAB
DE
FGHABC
EF
GHABCD
FG
HABCDE
GH
ABCDEF
HA
BCDEFG
A non-blocking topology of circuit switches
![Page 16: Shoal: A Network Architecture for Disaggregated Racks · 2019. 2. 27. · SerDes SerDes SerDes SerDes Packet Proce-ssing Buffers Crossbar Packet switch. Goal 2: High network performance](https://reader036.vdocument.in/reader036/viewer/2022062610/61073a2e92187d6d12402a86/html5/thumbnails/16.jpg)
Congestion in Shoal
A B C D E F G H
1 2 3 4 5 6 7Time slot
ABCDEFGH
BC
DEFGHA
CD
EFGHAB
DE
FGHABC
EF
GHABCD
FG
HABCDE
GH
ABCDEF
HA
BCDEFG
Flow toH
Flow toH
B -> HA -> HB -> HA -> HB -> HA -> HB -> HA -> HB -> HA -> HB -> HA -> HB -> HA -> HA -> HB -> HA -> HA -> H
B -> HA -> HB -> HA -> H
![Page 17: Shoal: A Network Architecture for Disaggregated Racks · 2019. 2. 27. · SerDes SerDes SerDes SerDes Packet Proce-ssing Buffers Crossbar Packet switch. Goal 2: High network performance](https://reader036.vdocument.in/reader036/viewer/2022062610/61073a2e92187d6d12402a86/html5/thumbnails/17.jpg)
Congestion control in Shoal
A -> H
B -> H
B -> H
A -> H
A -> H
A -> H
A -> H
2
Queue for destination H at CA C
Each per-destination queue !" corresponding to destination " is bounded!#$% !" ≤ ' + "%)*+,_.$/0$$(") packets
at most 1 packet per source
1 2 3 4 5 6 7Time slot
ABCDEFGH
BC
DEFGHA
CD
EFGHAB
DE
FGHABC
EF
GHABCD
FG
HABCDE
GH
ABCDEF
HA
BCDEFG
![Page 18: Shoal: A Network Architecture for Disaggregated Racks · 2019. 2. 27. · SerDes SerDes SerDes SerDes Packet Proce-ssing Buffers Crossbar Packet switch. Goal 2: High network performance](https://reader036.vdocument.in/reader036/viewer/2022062610/61073a2e92187d6d12402a86/html5/thumbnails/18.jpg)
q No central controller for reconfigurationq Fully de-centralized, traffic agnostic reconfiguration logicq Allows circuit switches to reconfigure at nanosecond timescales
q Each per-destination queue in the network is bounded
q Each packet traverses the network at most twiceq Worst-case 50% throughput compared to an ideal packet-switched networkq Can be compensated by allocating 2X bandwidth per nodeq Cost (Shoal) ≤ Cost (packet-switched network with ½ bandwidth of Shoal)
Key properties of Shoal
![Page 19: Shoal: A Network Architecture for Disaggregated Racks · 2019. 2. 27. · SerDes SerDes SerDes SerDes Packet Proce-ssing Buffers Crossbar Packet switch. Goal 2: High network performance](https://reader036.vdocument.in/reader036/viewer/2022062610/61073a2e92187d6d12402a86/html5/thumbnails/19.jpg)
Implementation
Stratix V FPGAq Bluespec System Verilog
Verified the queuing and throughput properties of Shoal on a 8-node testbed
Circuit switch implementation can
reconfigure in < 6.4ns
q Implemented custom NIC and circuit switch on FPGA
![Page 20: Shoal: A Network Architecture for Disaggregated Racks · 2019. 2. 27. · SerDes SerDes SerDes SerDes Packet Proce-ssing Buffers Crossbar Packet switch. Goal 2: High network performance](https://reader036.vdocument.in/reader036/viewer/2022062610/61073a2e92187d6d12402a86/html5/thumbnails/20.jpg)
Evaluation
q Power consumption
• Shoal consumes 3.5x less power than packet-switched network!
Packet-switched Network 8.72 KW (58% of rack budget)Shoal 2.55 KW (17% of rack budget)
For a 512-node rack
q Packet-switched network comprises 24 64x50 Gbps packet switchesq Shoal comprises 48 64x50 Gbps circuit switches
![Page 21: Shoal: A Network Architecture for Disaggregated Racks · 2019. 2. 27. · SerDes SerDes SerDes SerDes Packet Proce-ssing Buffers Crossbar Packet switch. Goal 2: High network performance](https://reader036.vdocument.in/reader036/viewer/2022062610/61073a2e92187d6d12402a86/html5/thumbnails/21.jpg)
Evaluation
q Network performance• Packet-level simulator in C• 512-node rack• 5 disaggregated workload
traces [OSDI’16]• Shoal has 2X bandwidth
(with comparable cost)
• Shoal performs comparableor better than several recentdesigns for packet-switched networks! Short flows (0,100KB] Long flows [1MB,∞)
![Page 22: Shoal: A Network Architecture for Disaggregated Racks · 2019. 2. 27. · SerDes SerDes SerDes SerDes Packet Proce-ssing Buffers Crossbar Packet switch. Goal 2: High network performance](https://reader036.vdocument.in/reader036/viewer/2022062610/61073a2e92187d6d12402a86/html5/thumbnails/22.jpg)
ConclusionLow Power consumption High Performance
(low latency / high throughput)
Packet-switchedNetworks
ToR chasis switch
Network of switches
Direct-connectNetworks
Shoal(circuit-switched)
![Page 23: Shoal: A Network Architecture for Disaggregated Racks · 2019. 2. 27. · SerDes SerDes SerDes SerDes Packet Proce-ssing Buffers Crossbar Packet switch. Goal 2: High network performance](https://reader036.vdocument.in/reader036/viewer/2022062610/61073a2e92187d6d12402a86/html5/thumbnails/23.jpg)
Thank you!
Shoal FPGA prototype and simulator code is available at:https://github.com/vishal1303/Shoal