![Page 1: SONiC Reliability, Manageability and Extensibility… · • Broadcom is contributing heavily to the success of the SONiC project • Open-source • Cloud, DC, Enterprise use-cases](https://reader034.vdocument.in/reader034/viewer/2022050310/5f72842d0d1bf8658b042dda/html5/thumbnails/1.jpg)
SONiC – Reliability, Manageability and Extensibility
Guohan Lu | Xin Liu, Azure Networking, Microsoft
Ben Gale, Broadcom Inc
[Figure: SONiC container architecture. Labels: SNMP, BGP, DHCP, IPv6, LLDP, TeamD, SYNCD, RedisDB, More apps (two components marked "New")]
• 2016
  • Features: Linux, Basic L2/L3, Containerized, Redis DB
  • Scale: 40G, 5 platforms
  • ASIC: BRCM Trident 2; MLNX Spectrum; Cavium Xpliant; Centec Goldengate
• 2017
  • Features: RDMA/QoS, IPv6, Mgmt. via Swarm, Fast Reboot (<30s)
  • Scale: 100G, 16 platforms
  • ASIC: BRCM Tomahawk/Tomahawk2; Marvell Prestera; Barefoot Tofino
• 2018
  • Features: Streaming Telemetry, Config DB, Support Virtualization, Warm Reboot (<1s)
  • Scale: ARM based & Lower end, 31 platforms
  • ASIC: Nephos Taurus; BRCM TD2/TH3, Helix4; Cisco Lacrosse
• 2019
  • Features: Richer Features, Advanced Mgmt, Stringent Tests, Development Tools
  • Scale: Chassis Support, 92 platforms
  • ASIC: BRCM DNX; Innovium Teralynx; Marvell Falcon; MLNX Spectrum II
• Powering AI/gaming service, bare metal service, and data center ToR/Leaf
• Commercial support, more industry adoption
Growing Ecosystem
Newly Joined Members since 2019
Last Year
SONiC – Warm Boot for High Reliability
Warm Boot: A True Community Effort
Fast Boot
• Sequence: OS Reboot (kexec) → OS Boots up → Data Plane Reset → Data Plane Restored
• Timeline rows: Routing, Control plane, Data Plane
• Data plane disruption < 30 seconds
• Control plane disruption < 90 seconds
Warm Boot
• Sequence: OS Reboot → SONiC Starts → ASIC Warm Init → State Reconciliation, via SAI state-driven API → Warm Reboot Finishes
• Timeline rows: Routing, Control plane, Data Plane
• Control plane disruption < 90 seconds
• Data plane disruption < 1 second
Warm Boot Architecture
1. Warm boot script stores App/ASIC DB on disk
2. Redis restores App/ASIC DB after reboot
3. OA (Orchestration Agent) reads App DB and compiles a new ASIC DB
4. SyncD compares old/new ASIC DB, and applies the diff to the ASIC
5. Applications wake up in parallel
  • May stage changes to App DB
  • OA comes in as usual, updates ASIC DB
  • SyncD keeps syncing ASIC DB to hardware
[Figure: Warm boot architecture. Labels: Network Applications, APP DB, Orchestration Agent, ASIC DB, SyncD, SAI, ASIC, Object Library w/ Redis Backend]
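Step 4 above (compare the old and new ASIC DB, then apply only the difference) can be sketched as follows. This is a minimal illustration, not SyncD's actual comparison logic: the snapshot layout, the key format, and the `diff_asic_db` helper are all hypothetical, and attribute deletion is not modeled.

```python
from typing import Dict, List, Tuple

# Hypothetical snapshot layout: object key -> attribute dict.
Snapshot = Dict[str, Dict[str, str]]
Operation = Tuple[str, str, Dict[str, str]]


def diff_asic_db(old: Snapshot, new: Snapshot) -> List[Operation]:
    """Return the operations that turn the old view into the new view.

    Objects that are identical in both views produce no operation, which is
    why the data plane keeps forwarding while the control plane restarts.
    """
    ops: List[Operation] = []
    # Objects present before the reboot but absent afterwards are removed.
    for key in sorted(old.keys() - new.keys()):
        ops.append(("remove", key, {}))
    for key in sorted(new.keys()):
        if key not in old:
            ops.append(("create", key, new[key]))
        elif old[key] != new[key]:
            # Only attributes whose value changed are written to hardware.
            changed = {a: v for a, v in new[key].items() if old[key].get(a) != v}
            ops.append(("set", key, changed))
    return ops


old_view = {
    "route:10.0.0.0/24": {"nexthop": "nh1"},
    "route:10.0.1.0/24": {"nexthop": "nh2"},
    "route:10.0.2.0/24": {"nexthop": "nh3"},
}
new_view = {
    "route:10.0.0.0/24": {"nexthop": "nh1"},  # unchanged: no hardware write
    "route:10.0.1.0/24": {"nexthop": "nh4"},  # next hop changed
    "route:10.0.3.0/24": {"nexthop": "nh5"},  # new route
}

ops = diff_asic_db(old_view, new_view)
for op in ops:
    print(op)
```

In this example only three operations reach the hardware; the unchanged route is never touched.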
We Are Not Done Yet – Control Plane?
• Sequence: OS Reboot → SONiC Starts → ASIC Warm Init → State Reconciliation, via SAI state-driven API → Warm Reboot Finishes
• Timeline rows: Routing, Control plane, Data Plane
• What about ARP, DHCP, etc.?
Control Plane Assistant
• ToR ASIC encapsulates the ARP request and sends it to the Control Plane Assistant (CPA)
• CPA responds with an ARP reply
• ToR ASIC decapsulates the ARP reply and sends it to the server
[Figure: ARP Request and ARP Reply flow between Server, ToR, and Control Plane Assistant]
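The three-step exchange above can be modeled in a few lines. This is only a sketch under stated assumptions: the slides do not say which tunnel format is used or how the CPA learns its answers, so the `Tunneled` wrapper, the `answers` table, and all class and field names here are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Union


@dataclass(frozen=True)
class ArpRequest:
    sender_ip: str
    sender_mac: str
    target_ip: str


@dataclass(frozen=True)
class ArpReply:
    sender_ip: str   # the IP that was asked about
    sender_mac: str  # the MAC answering for that IP
    target_ip: str
    target_mac: str


@dataclass(frozen=True)
class Tunneled:
    """Hypothetical encapsulation: outer addresses plus the inner ARP packet."""
    outer_src: str
    outer_dst: str
    inner: Union[ArpRequest, ArpReply]


class ControlPlaneAssistant:
    def __init__(self, address: str, answers: Dict[str, str]) -> None:
        self.address = address
        self.answers = answers  # assumed table: IP -> MAC to answer with

    def handle(self, pkt: Tunneled) -> Optional[Tunneled]:
        req = pkt.inner
        if not isinstance(req, ArpRequest):
            return None
        mac = self.answers.get(req.target_ip)
        if mac is None:
            return None
        reply = ArpReply(req.target_ip, mac, req.sender_ip, req.sender_mac)
        # Send the reply back through the tunnel to the ToR that asked.
        return Tunneled(self.address, pkt.outer_src, reply)


class TorAsic:
    def __init__(self, address: str, cpa: ControlPlaneAssistant) -> None:
        self.address = address
        self.cpa = cpa

    def on_arp_request(self, req: ArpRequest) -> Optional[ArpReply]:
        tunneled = Tunneled(self.address, self.cpa.address, req)  # encapsulate
        answer = self.cpa.handle(tunneled)                        # CPA replies
        if answer is None or not isinstance(answer.inner, ArpReply):
            return None
        return answer.inner                                       # decapsulate


cpa = ControlPlaneAssistant("198.51.100.1", {"10.0.0.1": "00:aa:bb:cc:dd:01"})
tor = TorAsic("198.51.100.2", cpa)
reply = tor.on_arp_request(ArpRequest("10.0.0.5", "00:11:22:33:44:05", "10.0.0.1"))
print(reply)
```

The point of the model is that the ToR's own control plane never appears in the path, so ARP keeps being answered while that control plane is restarting.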
SONiC - Management Framework
Broadcom and Dell
Broadcom and SONiC
• Broadcom is contributing heavily to the success of the SONiC project
  • Open-source
  • Cloud, DC, Enterprise use-cases
  • Engaging closely with Community (upstream, reviews, testing, support etc)
• Features contributed include:
  • Now (201910)
    • ZTP, NAT, STP/PVST, BFD, L2/L3 Enhancements, MMU Thresholds, Platform Development Kit, VRF-Lite, Error Handling, Debug Framework, Core dump file handling, Build Improvements
    • Management Framework
  • Next
    • EVPN/VXLAN, M-LAG, IP Multicast (IGMP, PIM-SSM), VRRP, IGMP Snooping, MSTP, RADIUS, IPv6 Improvements, PTP, Instrumentation/Telemetry
    • Management Framework Improvements, FRR Management Integration, RBAC
• Items in red are a joint effort with Dell, and are our main topic today
Management Framework Goals
• Integrated management experience for SONiC
  • Industry standard CLI
  • Standards-based Programmatic Interfaces (e.g. OpenConfig)
  • OEM-style AAA/Security
  • Broad Feature Coverage
• Create a Framework to allow:
  • Rapid UI development from standard or custom data models
    • CLI, REST/RESTCONF, gNMI etc
  • Full configuration validation and error response
• Start the process of filling out UI content
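To make "standards-based programmatic interfaces" concrete, the sketch below builds a RESTCONF-style URL and JSON body for the OpenConfig interfaces model. It only constructs the request and sends nothing. The host name, the `/restconf/data` root, and the `restconf_interface_config` helper are assumptions for illustration; the slides do not specify the REST service's exact paths.

```python
import json
from typing import Tuple
from urllib.parse import quote


def restconf_interface_config(host: str, name: str, mtu: int, enabled: bool) -> Tuple[str, str]:
    """Build a URL and JSON body for an interface's OpenConfig 'config' container.

    The path layout follows the RFC 8040 convention of <module>:<container>
    followed by <list>=<key>; the API root is an assumption.
    """
    key = quote(name, safe="")  # list keys are percent-encoded in the URL
    path = "/restconf/data/openconfig-interfaces:interfaces/interface=" + key + "/config"
    body = {
        "openconfig-interfaces:config": {
            "name": name,
            "mtu": mtu,
            "enabled": enabled,
        }
    }
    return "https://" + host + path, json.dumps(body, sort_keys=True)


# "switch.example" and "Ethernet0" are placeholder values.
url, payload = restconf_interface_config("switch.example", "Ethernet0", 9100, True)
print(url)
print(payload)
```

Because the path and payload are derived from the YANG model rather than hand-written per feature, the same model can in principle back the CLI, REST, and gNMI front ends.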
The Big Picture
[Figure: Management framework overview. Labels: REST Client, gNMI Client, Klish CLI, REST Service, gNMI Service, Other, TransLib, App Handlers, Custom Validation, Standard YANG, SONiC YANG, XML, Scripts, SONiC DBs, SONiC Apps, Platform Data, SDK, Developer, Dev/Build; key: Framework, SONiC]
Implementation Work
• SONiC 201910
  • Complete Framework, along with Developer Training and Guidelines
    • See https://github.com/Azure/SONiC/pull/436 for more
  • Defined guidelines for writing "SONiC YANG"
  • Feature implementations
    • gNMI/OpenConfig - interfaces, LLDP, System, Platform, ACL
    • Supporting IS-CLI for the above
    • REST service
• Future releases
  • Framework Improvements and Optimization
    • Taking feedback from the User Community
  • Full Privilege-level based Authorization, RBAC
  • Much more feature content (including SONiC legacy features)
    • Incl. FRR
SONiC – Extensibility For New Scenarios
Layer-4 Load Balancing
• Layer-4 load balancing is a critical function
  • Handles both inbound and inter-service traffic
  • >40%* of cloud traffic needs load balancing (Ananta [SIGCOMM'13])
[Figure: L4 Load Balancer mapping one VIP (VIP1) to DIP1, DIP2, DIP3, DIP4, DIP5]
Frequent DIP pool updates
• DIP pool updates
  • Failures, service expansion, service upgrade, etc.
  • Up to 100 updates per minute in a big cluster
• Hash function changes under DIP pool updates
  • Packets of a connection get to different DIPs
  • Connection is broken
[Figure: L4 Load Balancer for VIP1 using ECMP with Hash(flow) = 8; DIP selection is Hash(flow) % 2 before a DIP pool update and Hash(flow) % 3 after it]
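The figure's numbers can be worked through directly. The sketch below assumes the simplest selection rule, index = Hash(flow) % (number of DIPs); the `pick_dip` helper and the DIP names are illustrative only.

```python
from typing import List


def pick_dip(flow_hash: int, dips: List[str]) -> str:
    """Stateless ECMP-style selection: index = hash modulo pool size."""
    return dips[flow_hash % len(dips)]


flow_hash = 8  # the value used in the slide

before = pick_dip(flow_hash, ["DIP1", "DIP2"])         # 8 % 2 == 0
after = pick_dip(flow_hash, ["DIP1", "DIP2", "DIP3"])  # 8 % 3 == 2

print(before, after)

# How many of 600 distinct hash values land on a different index once the
# pool grows from 2 to 3 DIPs?
moved = sum(1 for h in range(600) if h % 2 != h % 3)
print(moved)
```

The same flow is sent to DIP1 before the update and DIP3 after it, so its connection breaks. With this rule, 400 of the 600 hash values (two thirds) change index when a third DIP is added.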
Layer-4 Load Balancing
• Broken connections degrade the performance of cloud services
  • Tail latency, service level agreement, etc.
• PCC (per-connection consistency): all the packets of a connection go to the same DIP
• L4 load balancing needs connection states
• L4 load balancing needs connection states in HW?
• Programmable ASICs allow such states in HW
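A connection table is what turns stateless hashing into per-connection consistency. The sketch below is a software model only, not the hardware design from the talk: the class name, keying connections by their hash value, and the close-on-teardown method are all simplifying assumptions, and it does not handle a DIP that disappears while connections still point at it.

```python
from typing import Dict, List


class StatefulLoadBalancer:
    """Toy L4 load balancer that remembers the DIP chosen for each connection."""

    def __init__(self, dips: List[str]) -> None:
        self.dips = list(dips)
        self.conn_table: Dict[int, str] = {}  # connection -> pinned DIP

    def update_pool(self, dips: List[str]) -> None:
        # Only new connections see the new pool; existing entries stay pinned.
        self.dips = list(dips)

    def forward(self, flow_hash: int) -> str:
        dip = self.conn_table.get(flow_hash)
        if dip is None:
            # First packet of a connection: hash into the current pool.
            dip = self.dips[flow_hash % len(self.dips)]
            self.conn_table[flow_hash] = dip
        return dip

    def close(self, flow_hash: int) -> None:
        # Called on connection teardown (for example TCP FIN or RST).
        self.conn_table.pop(flow_hash, None)


lb = StatefulLoadBalancer(["DIP1", "DIP2"])
first = lb.forward(8)                     # pinned to DIP1 (8 % 2 == 0)
lb.update_pool(["DIP1", "DIP2", "DIP3"])  # DIP pool grows
second = lb.forward(8)                    # still DIP1: PCC holds
new_flow = lb.forward(11)                 # new connection uses the new pool
print(first, second, new_flow)
```

The existing connection keeps its DIP across the pool update, while a connection that starts afterwards is hashed over all three DIPs.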
Load balancer on SONiC
• SONiC provides basic functions for managing switches
  • L2/L3 forwarding
  • Management plane such as LLDP, SNMP, Telemetry
• Extending SONiC
  • Introduce new Config DB entries
  • Extend SwSS to provide VIP-to-DIP management
  • Extend SAI to manage programmable ASICs
SONiC – Load Balancer Config DB Schema
DIP table
• Key
  • DIP
• Data
  • Weight (optional)
  • state {active, disabled}
  • DMAC
  • underlay DIP
  • VNI
VIP table
• Key
  • VIP
• Data
  • {list of DIP}
  • Num of DIP (optional)
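As an illustration of the schema above, the following sketch shows what entries might look like and how a consumer could resolve a VIP to its active DIPs. The table names (`DIP_TABLE`, `VIP_TABLE`), the field spellings, and every address are hypothetical; only the set of fields comes from the slide.

```python
from typing import Dict, List

VALID_STATES = {"active", "disabled"}

# Hypothetical Config DB contents following the schema on the slide.
config_db: Dict[str, Dict[str, dict]] = {
    "DIP_TABLE": {
        "10.0.0.11": {
            "weight": "1",              # optional
            "state": "active",
            "dmac": "00:11:22:33:44:01",
            "underlay_dip": "192.0.2.11",
            "vni": "5000",
        },
        "10.0.0.12": {
            "state": "disabled",
            "dmac": "00:11:22:33:44:02",
            "underlay_dip": "192.0.2.12",
            "vni": "5000",
        },
    },
    "VIP_TABLE": {
        "203.0.113.10": {
            "dips": ["10.0.0.11", "10.0.0.12"],
            "num_dips": "2",            # optional
        },
    },
}


def active_dips(db: Dict[str, Dict[str, dict]], vip: str) -> List[str]:
    """Return the DIPs of a VIP whose state is 'active', validating as we go."""
    result = []
    for dip in db["VIP_TABLE"][vip]["dips"]:
        row = db["DIP_TABLE"].get(dip)
        if row is None:
            raise KeyError("VIP " + vip + " references unknown DIP " + dip)
        if row["state"] not in VALID_STATES:
            raise ValueError("invalid state for DIP " + dip + ": " + row["state"])
        if row["state"] == "active":
            result.append(dip)
    return result


print(active_dips(config_db, "203.0.113.10"))
```

Keeping per-DIP data in its own table means a DIP's state can be flipped without rewriting the VIP entry that references it.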
Extending SwSS on SONiC
[Figure: SONiC architecture extended with a Load balancer Container. Labels: Config DB, App DB, ASIC DB, SNMP, BGP, DHCP, IPv6, LLDP, TeamD, SYNCD*, SWSS, LB Mgr, LB orch, Connection Table, SLB stat, TCP syn/fin/reset]
FlexSAI Extension
Demo
• Client
  • Generates 100K connections per sec
  • Average connection lifetime 10 sec
• SLB box
  • Load balancer
  • Single VIP
  • 1K DIP
• Controller
  • Creates a DIP change every 10 sec on average
• Server
  • Receives and monitors connections
[Figure: Client, SLB, and Server in line, with the controller attached to the SLB]
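The demo parameters imply a rough size for the connection state the load balancer must hold. The estimate below is a back-of-envelope calculation, not a figure from the talk: it assumes steady state and applies Little's law (concurrent connections = arrival rate × average lifetime).

```python
def concurrent_connections(arrivals_per_sec: int, avg_lifetime_sec: int) -> int:
    """Little's law: average number in the system = arrival rate x time in system."""
    return arrivals_per_sec * avg_lifetime_sec


CONN_RATE = 100_000   # connections per second generated by the client
AVG_LIFETIME = 10     # average connection lifetime in seconds
NUM_DIPS = 1_000      # DIPs behind the single VIP

live = concurrent_connections(CONN_RATE, AVG_LIFETIME)
per_dip = live // NUM_DIPS  # assumes an even spread across DIPs

print(live, per_dip)
```

Under these assumptions about one million connections are live at any moment, roughly a thousand per DIP, and each DIP change arrives while about that many connections need to stay pinned.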
Bin distribution and hardware CT size
Open Invitation
Inviting contributions in all areas
• SONiC/SAI
• Hardware platform
• New features, applications, tests and tools
• Download, test, deploy!
Website: https://azure.github.io/SONiC/
Source code: https://github.com/Azure/SONiC/blob/gh-pages/sourcecode.md
We will be in Room G108 from 12:00 to 2:30 pm today for Q&A. Welcome!