LinkedIn DC Network Architecture - LACNIC
TRANSCRIPT
![Page 1: LinkedIn DC Network Architecture - LACNIC](https://reader033.vdocument.in/reader033/viewer/2022042404/5f1890444d718037c0683158/html5/thumbnails/1.jpg)
LinkedIn DC Network Architecture (or how to build a network for 100,000 servers)
Ernesto Ovcharenko
Staff Network Engineer
Infrastructure Engineering
![Page 2](https://reader033.vdocument.in/reader033/viewer/2022042404/5f1890444d718037c0683158/html5/thumbnails/2.jpg)
LinkedIn Infrastructure
• Bare Metal Servers: >200K
• PoPs: ~20
• Networks Peered: ~4000
• Inter-DC NG BB: ~1.5 Tbps
![Page 3](https://reader033.vdocument.in/reader033/viewer/2022042404/5f1890444d718037c0683158/html5/thumbnails/3.jpg)
34% infrastructure growth every year…
High bandwidth & compute demand due to organic growth.
For every single byte, thousands of bytes of east-west traffic:
• Application Call Graph
• Kafka (metrics and analytics)
• Hadoop & Offline Compute
• Machine Learning
• Data Replication
• Search and Indexing
Growth
![Page 4](https://reader033.vdocument.in/reader033/viewer/2022042404/5f1890444d718037c0683158/html5/thumbnails/4.jpg)
Plan for 10x Scale on Demand Active Active Datacenters (Multi-colo)
2013-2015: Capacity Uplift
Capacity Crisis
![Page 5](https://reader033.vdocument.in/reader033/viewer/2022042404/5f1890444d718037c0683158/html5/thumbnails/5.jpg)
Unlimited Bandwidth
Compute on Demand
Scale Cost Effectively
Programmable Datacenter
2016+: Innovate for hyperscale
Innovate for Hyperscale
![Page 6](https://reader033.vdocument.in/reader033/viewer/2022042404/5f1890444d718037c0683158/html5/thumbnails/6.jpg)
Freedom and Choice
Move Fast
Control - Independence
Quality - Maintenance
Risks - Security
Channel - Procurement
Build Strategy - Ownership
Growth! - Scale
Evolve - Code & Innovate
Flexibility - Customization
Modularity
Own the code
![Page 7](https://reader033.vdocument.in/reader033/viewer/2022042404/5f1890444d718037c0683158/html5/thumbnails/7.jpg)
Edge Network to Eyeballs (EdgeConnect)
Backbone Network (Falco)
Data Center Network (Open19 + SONiC + OpenFabric)
Bare Metal HW (Open19)
OS / Kernel (Linux)
Container (LPS)
Application
Own the code: enables us to solve puzzles & complexities in different ways
![Page 8](https://reader033.vdocument.in/reader033/viewer/2022042404/5f1890444d718037c0683158/html5/thumbnails/8.jpg)
• Load balancers: moved to application, x86 server running a BGP daemon.
• Firewalls: moved to application/server.
• NAS filers: failover complexity moved to servers running BGP daemons, allowed for L2 to L3 network migration.
On solving puzzles in a different way…
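One way to picture the load balancer bullet above: if several x86 servers announce the same service address over BGP, the fabric can spread flows across them with ECMP. The sketch below is only an illustration under that assumption. The server names, the addresses, and the SHA-256 hash are hypothetical stand-ins (switch ASICs use their own hardware hash functions), and nothing here is taken from LinkedIn's implementation.

```python
import hashlib
from collections import Counter


def ecmp_next_hop(flow, next_hops):
    """Pick one next hop for a flow by hashing its 5-tuple.

    Every packet of the same flow hashes to the same next hop,
    so a connection stays pinned to one server.
    """
    key = "|".join(str(field) for field in flow).encode()
    digest = hashlib.sha256(key).digest()
    index = int.from_bytes(digest[:8], "big") % len(next_hops)
    return next_hops[index]


# Hypothetical servers that all announce the same service address via BGP.
servers = ["server-a", "server-b", "server-c", "server-d"]

# Hypothetical flows: (src_ip, dst_ip, protocol, src_port, dst_port).
flows = [
    (f"10.0.0.{i % 250 + 1}", "192.0.2.10", "tcp", 1024 + i, 443)
    for i in range(10000)
]

# Count how many flows land on each server.
spread = Counter(ecmp_next_hop(flow, servers) for flow in flows)
for server in servers:
    print(server, spread[server])
```

With a reasonably uniform hash, each server should receive roughly a quarter of the flows. A server that withdraws its BGP announcement simply drops out of the next-hop list.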
![Page 9](https://reader033.vdocument.in/reader033/viewer/2022042404/5f1890444d718037c0683158/html5/thumbnails/9.jpg)
5-Stage BGP Clos
Single SKU Data Center
Single Chip Architecture
Project Altair
![Page 10](https://reader033.vdocument.in/reader033/viewer/2022042404/5f1890444d718037c0683158/html5/thumbnails/10.jpg)
Simple, Open, Programmable, Independent
Core Design Principles
![Page 11](https://reader033.vdocument.in/reader033/viewer/2022042404/5f1890444d718037c0683158/html5/thumbnails/11.jpg)
• Simplicity: “perfection has been reached not when there is nothing left to add, but when there is nothing left to take away.”
• Openness: Use community-based tools where possible.
• Independence: Refuse to develop a dependence on a single vendor or vendor-driven architecture (and hence avoid the inevitable forklift upgrades)
• Programmability: Being able to modify the behavior of the data center fabric in near real time in software…
Core Design Principles
![Page 12](https://reader033.vdocument.in/reader033/viewer/2022042404/5f1890444d718037c0683158/html5/thumbnails/12.jpg)
The Building Block: Hardware
• Merchant Silicon Custom Designed Switch (ODM)
• No Big Chassis Switches:
  - Designed around robustness (NSR, ISSU, etc.)
  - Feature-rich but mostly irrelevant to LinkedIn needs
• No (FCoE, VXLAN, EVPN, MCLAG, etc.)
Project Falco
![Page 13](https://reader033.vdocument.in/reader033/viewer/2022042404/5f1890444d718037c0683158/html5/thumbnails/13.jpg)
The Building Block: Software
• Unified Architecture: Single SKU (hardware and software) for all switches while procuring hardware from multiple ODM channels (multi-homing)
• Minimum Features: BGP, BFD, IPv4, IPv6, ECMP, LLDP
• No Overlay: For the infrastructure, the application is stateless
• No Middle-box: (Firewall, Load-balancer, etc.), moved to application
• Network is only a set of intermediate boxes running Linux
• https://github.com/Azure/SONiC
![Page 14](https://reader033.vdocument.in/reader033/viewer/2022042404/5f1890444d718037c0683158/html5/thumbnails/14.jpg)
[Diagram: Pod 1 to Pod 64, each with ToR1 to ToR32 and Leaf1 to Leaf4; four groups of Spine1 to Spine32; tiers labelled ToR, Leaf, Spine]
• True 5 Stage Clos Architecture (Maximum Path Length: 5 Chipsets to Minimize Latency)
• Moved complexity from big boxes to our advantage, where we can manage and control!
• Single SKU - Same Chipset - Uniform IO design (Bandwidth, Latency and Buffering)
• Dedicated control plane, OAM and CPU for each ASIC
DC Architecture: Altair Design
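The "maximum path length: 5 chipsets" property can be checked on a scaled-down model of this topology. The sketch below is an illustration, not the production design: it assumes that leaf number i in every pod connects to every spine in spine group i (an inference from the four spine groups in the diagram), and it uses 4 pods, 4 ToRs per pod and 4 spines per group to keep the graph small. All function and node names are hypothetical.

```python
from collections import deque


def build_fabric(pods, tors_per_pod, planes, spines_per_plane):
    """Return an adjacency map for a ToR / leaf / spine fabric."""
    links = {}

    def connect(a, b):
        links.setdefault(a, set()).add(b)
        links.setdefault(b, set()).add(a)

    for pod in range(pods):
        for plane in range(planes):
            leaf = ("leaf", pod, plane)
            # Every ToR in the pod has one uplink to each leaf of the pod.
            for tor in range(tors_per_pod):
                connect(("tor", pod, tor), leaf)
            # Assumption: leaf `plane` connects to every spine of that plane.
            for spine in range(spines_per_plane):
                connect(leaf, ("spine", plane, spine))
    return links


def switches_on_shortest_path(links, src, dst):
    """Breadth-first search; returns the number of switches traversed."""
    seen = {src: 1}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            return seen[node]
        for peer in links[node]:
            if peer not in seen:
                seen[peer] = seen[node] + 1
                queue.append(peer)
    raise ValueError("no path between the two switches")


fabric = build_fabric(pods=4, tors_per_pod=4, planes=4, spines_per_plane=4)
same_pod = switches_on_shortest_path(fabric, ("tor", 0, 0), ("tor", 0, 1))
cross_pod = switches_on_shortest_path(fabric, ("tor", 0, 0), ("tor", 3, 2))
print(same_pod, cross_pod)  # prints "3 5"
```

Traffic inside a pod crosses ToR, leaf, ToR (3 switches); traffic between pods crosses ToR, leaf, spine, leaf, ToR (5 switches), which matches the 5-stage claim.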
![Page 15](https://reader033.vdocument.in/reader033/viewer/2022042404/5f1890444d718037c0683158/html5/thumbnails/15.jpg)
[Diagram: Pod 1 to Pod 64, each with ToR1 to ToR32 and Leaf1 to Leaf4; four groups of Spine1 to Spine32; tiers labelled ToR, Leaf, Spine]
DC Architecture: Altair Design
![Page 16](https://reader033.vdocument.in/reader033/viewer/2022042404/5f1890444d718037c0683158/html5/thumbnails/16.jpg)
![Page 17](https://reader033.vdocument.in/reader033/viewer/2022042404/5f1890444d718037c0683158/html5/thumbnails/17.jpg)
● Modular and scalable growth
● Efficient server deployment
● Single protocol
● Operations friendly
● Predictable performance/failure
● Automation friendly
● Server-server latency 2.5 µs
![Page 18](https://reader033.vdocument.in/reader033/viewer/2022042404/5f1890444d718037c0683158/html5/thumbnails/18.jpg)
[Diagram: Fabric 1 to Fabric 4 above two ToRs, each ToR with a row of attached servers]
Non-blocking Parallel Fabric
![Page 19](https://reader033.vdocument.in/reader033/viewer/2022042404/5f1890444d718037c0683158/html5/thumbnails/19.jpg)
ToR - Top of the Rack
Broadcom Tomahawk 32x 100G
10/25/50/100G Attachment
Regular Server Attachment 10G
Each Cabinet: 96 Dense Compute units
Half Cabinet (Leaf-Zone): 48x 10G ports for servers + 4 uplinks of 50G
Full Cabinet: 2x Single ToR Zones: 48 + 48 = 96 Servers
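The ToR figures above can be turned into a quick bandwidth check. This is a back-of-the-envelope sketch using only the numbers on the slide; the variable names are hypothetical, and the ratio assumes every server port is driven at its full 10G at the same time.

```python
# Figures taken from the slide.
SERVER_PORTS = 48       # 10G server-facing ports per ToR zone (half cabinet)
SERVER_SPEED_G = 10
UPLINKS = 4             # 50G uplinks per ToR
UPLINK_SPEED_G = 50

downlink_g = SERVER_PORTS * SERVER_SPEED_G  # server-facing capacity in Gbps
uplink_g = UPLINKS * UPLINK_SPEED_G         # fabric-facing capacity in Gbps
ratio = downlink_g / uplink_g

# A full cabinet holds two single-ToR zones.
servers_per_cabinet = 2 * SERVER_PORTS

print(f"{downlink_g}G down / {uplink_g}G up = {ratio:.1f}:1")
print(f"{servers_per_cabinet} servers per cabinet")
```

This gives 480G of server-facing capacity against 200G of uplink capacity per ToR zone, and 96 servers per cabinet as stated on the slide.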
[Diagram: Project Falco, Tier 1 (ToR) in the Server / ToR / Leaf / Spine hierarchy]
![Page 20](https://reader033.vdocument.in/reader033/viewer/2022042404/5f1890444d718037c0683158/html5/thumbnails/20.jpg)
Leaf
Broadcom Tomahawk 32x 100G
Non-Blocking Topology:
32x downlinks of 50G to serve 32 ToR
32x uplinks of 50G to provide 1:1 Over-subscription
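The 1:1 claim for the leaf can be checked against the chip's port budget. The sketch below assumes each 100G port on the 32x 100G switch is split into two 50G links, which is how 64 links of 50G would fit on one device; that breakout is an inference from the numbers, not something the slide states, and the function name is hypothetical.

```python
def port_budget(downlinks, uplinks, link_speed_g, chip_capacity_g):
    """Return (oversubscription ratio, fraction of chip capacity used)."""
    down_g = downlinks * link_speed_g
    up_g = uplinks * link_speed_g
    return down_g / up_g, (down_g + up_g) / chip_capacity_g


CHIP_CAPACITY_G = 32 * 100  # 32x 100G switch from the slide

leaf_ratio, leaf_utilisation = port_budget(
    downlinks=32,            # one 50G downlink per ToR in the pod
    uplinks=32,              # 50G uplinks towards the spines
    link_speed_g=50,
    chip_capacity_g=CHIP_CAPACITY_G,
)
print(f"leaf oversubscription {leaf_ratio:.0f}:1, "
      f"chip capacity used {leaf_utilisation:.0%}")
```

Under these assumptions the leaf carries 1.6 Tbps down and 1.6 Tbps up, which is exactly the 3.2 Tbps the chip offers.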
[Diagram: Project Falco, Tier 2 (Leaf) in the Server / ToR / Leaf / Spine hierarchy]
![Page 21](https://reader033.vdocument.in/reader033/viewer/2022042404/5f1890444d718037c0683158/html5/thumbnails/21.jpg)
Spine
Broadcom Tomahawk 32x 100G
Non-Blocking Topology:
64 downlinks to provide 1:1 Over-subscription
To serve 64 pods (each pod 32 ToR)
100,000 Servers: Each pod (Approximately 1550 Compute)
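The headline server count follows from the per-tier numbers. The sketch below multiplies them out, assuming 48 servers per ToR (the half-cabinet figure from the ToR slide) and every ToR fully populated; the constant names are hypothetical.

```python
PODS = 64              # pods served by each spine (from the slide)
TORS_PER_POD = 32      # ToRs per pod (from the slide)
SERVERS_PER_TOR = 48   # assumption: the 48x 10G server ports per ToR zone

servers_per_pod = TORS_PER_POD * SERVERS_PER_TOR
total_servers = PODS * servers_per_pod

print(f"{servers_per_pod} servers per pod, {total_servers} servers in total")
```

That yields 1,536 servers per pod and 98,304 servers overall, in line with the slide's "approximately 1550" per pod and the 100,000-server target.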
[Diagram: Project Falco, Tier 3 (Spine) in the Server / ToR / Leaf / Spine hierarchy]
![Page 22](https://reader033.vdocument.in/reader033/viewer/2022042404/5f1890444d718037c0683158/html5/thumbnails/22.jpg)
• Fault isolation.
• Fault correlation and remediation.
• Build and operations automation.
• Physical design.
• Logical design.
Challenges
![Page 23](https://reader033.vdocument.in/reader033/viewer/2022042404/5f1890444d718037c0683158/html5/thumbnails/23.jpg)
Looking Ahead
• OPS optimization (Prediction & Remediation Engine)
• Open Fabric (New WebScale Protocol)
• 12.8Tbps chip (Ultra-Low Latency)