innovating the cloud network.… · control plane disruption < 90 seconds data plane disruption...


Post on 20-May-2020



Innovating the Cloud Network

Lihua Yuan, Partner Dev Manager

Xin Liu, Principal Product Manager

Microsoft Azure Networking Team

[Figure: SONiC components: SNMP, BGP, DHCP, IPv6, LLDP, TeamD, SyncD, and RedisDB, plus more apps; some components marked "New"]


Welcome


SONiC Keeps Evolving

Data Plane

• L3 VXLAN Support

• Large Table/Deep Buffer Devices

Routing Stack

• Quagga → FRR

• cRPD from Juniper

Telemetry

• gRPC for streaming telemetry

• Data plane telemetry (DTel) extension

• Virtual Path for streaming telemetry

Reliability

• Warm Reboot

• Routing stack graceful restart

RDMA

• PFC Watermark

• Asymmetric PFC

Configuration

• Incremental config

• ConfigDB

Platform Management

• Sensor/Transceiver monitoring

• Dynamic Parameter Tuning

• Platform Enhancement (PMON)

New Platforms

• Juniper PTX

• Broadcom TH3, JR2

• Mellanox Spectrum II

• Facebook Minipack

• Marvell 12.8T Falcon and ARM-based switch

• Innovium Teralynx

• And more

System

• Kernel Upgrade

• Component Docker upgrade

• Security patches


Warm Boot: A True Community Effort


Fast Boot

[Figure: fast-boot timeline: OS reboot (kexec), OS boots up, data plane reset, data plane restored; tracks for routing, control plane, and data plane]

Data plane disruption < 30 seconds

Control plane disruption < 90 seconds


Warm Boot

Control plane disruption < 90 seconds

Data plane disruption < 1 second

[Figure: warm-boot timeline: OS reboot, SONiC starts, ASIC warm init, state reconciliation via the SAI state-driven API, warm reboot finishes; tracks for routing, control plane, and data plane]


Warm Boot Architecture

1. The warm-boot script stores the App DB and ASIC DB on disk.
2. Redis restores the App DB and ASIC DB after the reboot.
3. The Orchestration Agent (OA) reads the App DB and compiles a new ASIC DB.
4. SyncD compares the old and new ASIC DBs and applies the diff to the ASIC.
5. Applications wake up in parallel:
• They may stage changes to the App DB.
• OA comes in as usual and updates the ASIC DB.
• SyncD keeps syncing the ASIC DB to the hardware.

[Figure: Network Applications, APP DB, Orchestration Agent, ASIC DB, SyncD, SAI, and ASIC; the databases sit in an Object Library with a Redis backend]
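Steps 1 to 4 can be sketched in Python as below. This is a simplified illustration, not SonIC's SyncD code: the ASIC DB is modeled as a plain dictionary from object key to attributes, the keys (such as `ROUTE:10.0.1.0/24`) and the JSON snapshot file are hypothetical stand-ins for real ASIC DB entries and the Redis snapshot, and the dependency ordering a real implementation needs (for example, creating next hops before the routes that use them) is omitted.

```python
import json
import os
import tempfile

# Hypothetical model: the ASIC DB is a dict mapping an object key to a dict
# of SAI-style attributes. Real SONiC keys and values look different.


def save_db(db, path):
    """Step 1 (sketch): persist the DB to disk before rebooting."""
    with open(path, "w") as f:
        json.dump(db, f, sort_keys=True)


def restore_db(path):
    """Step 2 (sketch): reload the saved DB after the reboot."""
    with open(path) as f:
        return json.load(f)


def diff_asic_db(old, new):
    """Step 4 (sketch): list the operations that turn the old ASIC state
    into the new one. Each entry is (op, key, attrs) with op in
    {"remove", "create", "set"}. Unchanged objects yield no operation.
    Attributes that disappear from an object are not handled here."""
    ops = []
    for key in sorted(old.keys() - new.keys()):
        ops.append(("remove", key, old[key]))
    for key in sorted(new.keys() - old.keys()):
        ops.append(("create", key, new[key]))
    for key in sorted(old.keys() & new.keys()):
        changed = {a: v for a, v in new[key].items() if old[key].get(a) != v}
        if changed:
            ops.append(("set", key, changed))
    return ops


# Example: state saved before the reboot (steps 1 and 2).
old = {
    "ROUTE:10.0.0.0/24": {"next_hop": "NH1"},
    "ROUTE:10.0.1.0/24": {"next_hop": "NH1"},
    "ROUTE:10.0.2.0/24": {"next_hop": "NH2"},
}
path = os.path.join(tempfile.mkdtemp(), "asic_db.json")
save_db(old, path)
restored = restore_db(path)

# State recompiled by the Orchestration Agent after the reboot (step 3).
new = {
    "ROUTE:10.0.0.0/24": {"next_hop": "NH1"},  # unchanged: no operation
    "ROUTE:10.0.1.0/24": {"next_hop": "NH2"},  # next hop changed: set
    "ROUTE:10.0.3.0/24": {"next_hop": "NH2"},  # new prefix: create
}

for op in diff_asic_db(restored, new):
    print(op)
```

Only the withdrawn route, the new route, and the route whose next hop changed produce operations; the unchanged route is never reprogrammed. That property is what lets a warm reboot leave untouched forwarding entries alone instead of resetting the data plane.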


We are not done yet – Control Plane?

[Figure: warm-boot timeline: OS reboot, SONiC starts, ASIC warm init, state reconciliation via the SAI state-driven API, warm reboot finishes; tracks for routing, control plane, and data plane]

What about ARP, DHCP, etc.?


Control Plane Assistant (Upcoming)

[Figure: normally the ASIC sends control packets up to the switch CPU; with the assistant, the ASIC tunnels them to a separate Assistant CPU]

• ASIC → Assistant:

• ERSPAN mirror

• Assistant → ASIC:

• The Assistant encapsulates the payload meant for neighbors

• The ASIC decapsulates and forwards it
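A toy Python model of the two directions above may help. Everything in it is hypothetical: the packet classes, the tunnel endpoint address, the text payloads standing in for real ARP frames, and the choice of ARP as the protocol the assistant answers on the switch's behalf are illustrative assumptions, since the slide does not specify the tunnel format from the assistant back to the ASIC or exactly which protocols the assistant handles.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Toy model of the Control Plane Assistant flow. All names, addresses,
# and payload formats are hypothetical placeholders, not SONiC APIs.
SWITCH_MAC = b"02:00:00:00:00:01"
ASIC_TUNNEL_IP = "10.1.0.1"


@dataclass
class ControlPacket:
    port: str       # front-panel port the packet arrived on / leaves from
    payload: bytes  # toy stand-in for an ARP/DHCP message


@dataclass
class TunnelPacket:
    outer_dst: str        # tunnel endpoint on the switch ASIC
    inner: ControlPacket  # packet the ASIC should deliver to the neighbor


def asic_mirror(pkt: ControlPacket) -> ControlPacket:
    """ASIC -> Assistant: mirror (e.g., via ERSPAN) a control packet the
    ASIC would normally send up to its own, currently rebooting, CPU."""
    return ControlPacket(pkt.port, pkt.payload)


def assistant_handle(mirrored: ControlPacket) -> Optional[TunnelPacket]:
    """Assistant -> ASIC: answer on the switch's behalf and encapsulate
    the reply so the ASIC can deliver it to the neighbor."""
    if mirrored.payload.startswith(b"ARP who-has"):
        reply = ControlPacket(mirrored.port, b"ARP is-at " + SWITCH_MAC)
        return TunnelPacket(outer_dst=ASIC_TUNNEL_IP, inner=reply)
    return None  # this toy responder ignores everything else


def asic_decap_and_forward(tp: TunnelPacket) -> Tuple[str, bytes]:
    """ASIC strips the outer tunnel header and forwards the inner packet
    out of the port it names."""
    return tp.inner.port, tp.inner.payload


request = ControlPacket("Ethernet4", b"ARP who-has 10.0.0.1")
tunneled = assistant_handle(asic_mirror(request))
port, frame = asic_decap_and_forward(tunneled)
print(port, frame)
```

The point of the split is that the switch's own CPU never has to be up for neighbors to get timely answers: the ASIC only mirrors and decapsulates, which it can keep doing in hardware during the reboot.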


SONiC Support for Disaggregated Chassis


SONiC Is Powering Microsoft At Cloud Scale

[Figure: Microsoft's four-tier data center network. Tier 0 (Rack): T0-1 … T0-20 connecting servers; Tier 1 (Row Leaf): T1-1 … T1-8; Tier 2 (Data center): T2-1-1 … T2-4-4; Tier 3 (Regional): T3-1 … T3-4. The Tier 0 and Tier 1 switches are labeled SONiC]


Enabling SONiC Beyond Tier 1?

[Figure: Microsoft's four-tier data center network (Tier 0 Rack, Tier 1 Row Leaf, Tier 2 Data center, Tier 3 Regional), with the Tier 0 and Tier 1 switches labeled SONiC and an added "Chassis" label]


Chassis – the challenges

[Figure: chassis anatomy: linecards with frontend ASICs and Ethernet ports, backend ASICs, and a backplane, all enclosed in sheet metal]

- Power efficiency

- Port density

- Low table scale on backend ASICs

- No standard topology/connectivity

- Proprietary ports/packet format

- Proprietary switching/load balancing


SONiC Support for Disaggregated Chassis

[Figure: six SONiC instances, one per ASIC, forming a chassis with 1000+ ports; labels show BGP-EVPN inside the chassis, VTEPs, and EBGP]

CLOS topology with Ethernet ports

VXLAN-based switching

• Each frontend chip is a VXLAN Tunnel End Point (VTEP)

• Packets inside the chassis are encapsulated with VXLAN headers

BGP-EVPN as the internal control plane

• One SONiC/BGP instance per ASIC

• Frontend SONiC instances redistribute routes directly using EVPN
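To make the VXLAN-based switching concrete, here is a minimal Python sketch of what a frontend VTEP adds to a packet before sending it across the chassis, and what the egress VTEP strips off. The header layout (8-bit flags with the I bit set, 24-bit VNI) and UDP destination port 4789 come from the VXLAN specification, RFC 7348; the VNI value, hashing only the inner Ethernet header for the UDP source port, and leaving out the outer IP and Ethernet headers are simplifying assumptions, not details of the SONiC chassis design.

```python
import struct
import zlib
from typing import Tuple

VXLAN_UDP_PORT = 4789  # IANA-assigned VXLAN destination port (RFC 7348)
VXLAN_FLAG_I = 0x08    # "I" flag: the VNI field is valid


def vxlan_header(vni: int) -> bytes:
    """8-byte VXLAN header: flags(8) | reserved(24) | VNI(24) | reserved(8)."""
    if not 0 <= vni < 1 << 24:
        raise ValueError("VNI must fit in 24 bits")
    return struct.pack("!B3xI", VXLAN_FLAG_I, vni << 8)


def parse_vxlan_header(hdr: bytes) -> int:
    """Return the VNI from an 8-byte VXLAN header."""
    flags, word = struct.unpack("!B3xI", hdr[:8])
    if not flags & VXLAN_FLAG_I:
        raise ValueError("VNI flag not set")
    return word >> 8


def udp_vxlan_encap(inner_frame: bytes, vni: int) -> bytes:
    """Prepend UDP + VXLAN headers to an inner Ethernet frame.
    The UDP source port is a hash of the inner Ethernet header, mapped into
    the dynamic range 49152-65535 (a simplification; real VTEPs typically
    hash the inner 5-tuple). The checksum is 0, which RFC 7348 allows
    for IPv4 outer headers. Outer IP/Ethernet headers are omitted."""
    payload = vxlan_header(vni) + inner_frame
    src_port = 49152 + (zlib.crc32(inner_frame[:14]) % 16384)
    length = 8 + len(payload)
    return struct.pack("!HHHH", src_port, VXLAN_UDP_PORT, length, 0) + payload


def udp_vxlan_decap(datagram: bytes) -> Tuple[int, bytes]:
    """Strip UDP + VXLAN headers; return (VNI, inner frame)."""
    _src, dst, length, _csum = struct.unpack("!HHHH", datagram[:8])
    if dst != VXLAN_UDP_PORT:
        raise ValueError("not a VXLAN datagram")
    vni = parse_vxlan_header(datagram[8:16])
    return vni, datagram[16:length]


inner = bytes(14) + b"payload"          # placeholder: zeroed Ethernet header + data
pkt = udp_vxlan_encap(inner, vni=5000)  # VNI 5000 is an arbitrary example
vni, frame = udp_vxlan_decap(pkt)
print(vni, frame == inner)
```

Deriving the UDP source port from inner-packet fields, as RFC 7348 recommends, gives the ASICs in a Clos topology per-flow entropy for ECMP hashing over standard Ethernet links.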


SONiC Disaggregated Chassis Demo at Booth


• Commercial support

• More industry adoption

• Powering AI/gaming service

• Powering bare metal service

• Powering data center ToR/Leaf


Open Invitation

Inviting contributions in all areas

• SONiC/SAI

• Hardware platform

• New features, applications and tools

• Download, test, deploy!

Website: https://azure.github.io/SONiC/

Mailing list: [email protected]

Source code: https://github.com/Azure/SONiC/blob/gh-pages/sourcecode.md

Wiki: https://github.com/Azure/SONiC/wiki/
