linkedin's approach to programmable data center

34
LinkedIn’s Approach to Programmable Data Center Shawn Zandi Principal Architect Infrastructure Engineering

Upload: shawn-zandi

Post on 19-Jan-2017

288 views

Category:

Internet


0 download

TRANSCRIPT

Page 1: LinkedIn's Approach to Programmable Data Center

LinkedIn’s Approach to Programmable Data Center

Shawn ZandiPrincipal ArchitectInfrastructure Engineering

Page 2: LinkedIn's Approach to Programmable Data Center

LinkedIn Infrastructure• Infrastructure architecture based on application’s behavior &

requirements

• Pre-planned static topology

• Single operator

• Single tenant with many applications

• As oppose to multi-tenant with different (or unknown) needs

• 34% infrastructure growth on annual basis and close to half a billion users

Page 3: LinkedIn's Approach to Programmable Data Center

Edge Network to EyeballsBackbone Network

Bare Metal

Operating SystemContainer

Application

End to End Network Design

Data Center Network

End to end control enables us to tackle problems at different parts of the stack

From application code, os, network or client software to solve by architecture…

Bare Metal

Operating SystemContainer

Application

Page 4: LinkedIn's Approach to Programmable Data Center

Traffic Demands• High intra and inter-DC bandwidth demand due to organic growth

• Every single byte of member activity, creates thousands bytes of east-west traffic inside data center:• Application Call Graph• Metrics, Analytics and Tracking via Kafka• Hadoop and Offline Jobs• Machine Learning• Data Replications• Search and Indexing• Ads, recruiting solutions, etc.

Page 5: LinkedIn's Approach to Programmable Data Center

Scaling Out Data Centers Network - Hardware

• White-box Switches (ODM)

• Vendor Switches Based (OEM)

• Based on Merchant Silicon

• Big Chassis Switches• Designed around robustness (NSR, ISSU, etc.)• Feature-rich but mostly irrelevant to LinkedIn needs

Project Falco

Page 6: LinkedIn's Approach to Programmable Data Center

Data center were designed by redundant chassis at corecontrolling and forwarding east-west and north-south traffic

Page 7: LinkedIn's Approach to Programmable Data Center

Pod W

SpineSpine

Spine

LeafLeaf

LeafLeaf

Spine

Pod X

SpineSpine

Spine

LeafLeaf

LeafLeaf

Spine

Pod Y

SpineSpine

Spine

LeafLeaf

LeafLeaf

Spine

Pod Z

SpineSpine

Spine

LeafLeaf

LeafLeaf

Spine

Fabric 2Spine Spine Spine

Fabric 4

Spine Spine Spine

Fabric 1Spine Spine Spine

4,096 x100G portsNon-Blocking

Scale-out

Spine Spine Spine

Fabric 3

Chassis Free Data Center

Page 8: LinkedIn's Approach to Programmable Data Center

Why No Chassis?

• Robust-yet-Fragile• Complex due to NSR, ISSU, feature-sets, etc.• Larger fault domain, Fail-over/Fail-back• Indeterministic boot up process and long upgrade procedures

• Moved complexity from big boxes to our advantage, where we can manage and control!• Better control and visibility to internals by removing black-box abstraction!• Same Switch SKU on ToR, Leaf and Spine (Entire DC)• Single chipset uniform IO design (same bandwidth, latency and buffering)

• True 5-Staged Clos Topology! with deterministic latency• Dedicated control plane, OAM and CPU for each ASIC

Page 9: LinkedIn's Approach to Programmable Data Center

W X Y ZW X Y ZW X Y Z

Distributed Control Plane Complexity

Pod 12 32…1

Pod 11322 352…321

Pod 21642 672…641

Pod 31962 992…961

W X Y Z

2171217021692168213121302129212820912090208920882051205020492048

2339233823372336 2368 2369 2370 2371 2400 2401 2402 24032307230623052304

Page 10: LinkedIn's Approach to Programmable Data Center

“Fabric wide visibility and telemetry”

The wider the fabric, flow tracking and fault isolation becomes more difficult

Problem 1

Page 11: LinkedIn's Approach to Programmable Data Center

“Fabric wide traffic distribution and packet scheduling!”

Forwarding is different than routing, and out of scope for routing protocols.

Problem 2

We need a robust and scalable control protocol designed for a data center fabric

Page 12: LinkedIn's Approach to Programmable Data Center

Control Plane :: Routing

• Routing protocols provide destination-based reachability information

• Routing protocols are not traffic aware.• Best path selection is elementary.• Network graph is built based on series of ECMP groups,

“Routing protocols are more about the destination than the journey”

Page 13: LinkedIn's Approach to Programmable Data Center

ECMP forwarding simply does not cut it!

Problem #3

Page 14: LinkedIn's Approach to Programmable Data Center

ECMP is not really equal!• Elephants and mice issue• ECMP Hashing is not bandwidth aware. Devices use an

algorithm to distribute traffic amongst links regardless of load.

• Traffic is routed using shortest path, not all the available paths, hence not maximizing all the available capacity. Some links may suffer while the other may be underutilized.

• Flows stick to a certain path, as hashing is performed per flow. An established socket cannot be moved to a different path easily!

Page 15: LinkedIn's Approach to Programmable Data Center

“We need a robust and scalable fabric-wide forwarding policy”

Problem #4

Page 16: LinkedIn's Approach to Programmable Data Center

Lack of Centralized Policy and Control

• The more parallel links you add, forwarding decision becomes more random.

• Devices were configured and maintained individually• Routing/Forwarding policy management tasks are performed

individually and hop by hop.• Know when/where to centralize or distribute to scale out!

Page 17: LinkedIn's Approach to Programmable Data Center

“End to End Path Selection & Control”

No application, protocol or packet can dictate a pathCentralized flow based routing does not scale!

Problem #5

Page 18: LinkedIn's Approach to Programmable Data Center

“Using the same familiar, robust and well-known solutions brings along the same restrictions when they were originally designed”

Problem #6

Page 19: LinkedIn's Approach to Programmable Data Center

Hardware

Network

Transport

Application

BGP (1990s)

Clos Topology (1950s)

Ethernet & IP (1980s)

Page 20: LinkedIn's Approach to Programmable Data Center

IP Routing History• IP routing is defined hop-by-hop• BGP is “the” IDR designed to work between different

autonomous system, to provide policy and control between different routing domains to select a best path.

• True: BGP can scale and is extensible. BGP has many policy knobs.

• A datacenter fabric operated under a single administrative domain instead of series of individual routers with different policies and decision process.

Page 21: LinkedIn's Approach to Programmable Data Center

Forwarding traffic based on demands & patterns:• Application• Latency• Loss• Bandwidth (Throughput)

Programmable Data Center

A data center fabric that distributes traffic amongst all available links efficiently and effectively while maintaining lowest latency and providing the most

possible bandwidth to different applications based on different needs and priorities.

Page 22: LinkedIn's Approach to Programmable Data Center

Program forwarding tables individually on all switches from a centralized location

Approach #1

Page 23: LinkedIn's Approach to Programmable Data Center

Flow x > Port 1

Flow x > Port 3

Flow x > Port 2

Forwarding and Control Element Separation

Page 24: LinkedIn's Approach to Programmable Data Center

Encode path information into packet header

Approach #2

Page 25: LinkedIn's Approach to Programmable Data Center

Distributed control plane for topology discovery and reachability information +

Use a controller software for forwarding policy and optimizations

Approach #3

Page 26: LinkedIn's Approach to Programmable Data Center

Scale: No state or flow information required to be stored on every boxNetwork can choose and move flows dynamicallyApplication can choose and move flows dynamicallyWorks with existing data plane (merchant silicon support)Supports ECMP with fallback to IP routingAutomatic Local Repair / LFA

Page 27: LinkedIn's Approach to Programmable Data Center

Hardware

Routing

Policy

Applications

Link Selection and Scheduling

Topology Discovery and Network Graph

Control

Telemetry/Visibility, Machine Learning, Prediction Engine, Self Healing, etc.

Forwarding

Merchant Silicon

Rethinking The Network Stack

Page 28: LinkedIn's Approach to Programmable Data Center

Network Element

Management Plane

SNMP, Syslog, etc.

System & Environmental

DataPacket & Flow

Data

Network Operating System

Kafka Network Agent

ASIC SystemDrivers

Reducing Protocols

Page 29: LinkedIn's Approach to Programmable Data Center

Network ElementNetwork ElementNetwork ElementNetwork Element

Management Plane

SNMP, Syslog, etc.

System & Environmental

DataPacket & Flow

Data

Network Operating System

Kafka Agent

Monitoring and Management System

Kafka Broker

Machine Learning & Data Processing

Alert Processor

Log Retention Data Store

Event Correlation

Kafka Pub/Sub Pipeline

Record, Process and Replay Network State

Page 30: LinkedIn's Approach to Programmable Data Center

Open19

OpenFabric

ASIC ODM

RIB / Forwarding Abstraction LayerFALCO

Apps

Linux OS

Hardware

Physical Layer

Hardware Abstraction Layer

Metrics & AnalyticsMachine LearningSelf Healingetc. (API to Infrastructure)

Policy & Control

Operating System

Base Networking

LinkedIn Infrastructure Strategy

Page 31: LinkedIn's Approach to Programmable Data Center

• Unified Architecture

• We used a single SKU (hardware and software) for all switches while procuring hardware from multiple ODM channels (multi-homing)

• One Software: Base Networking on Merchant Silicon with minimum req. features

• No Overlay - For the infrastructure, the application is stateless

• No Middle-box (Firewall, Load-balancer, etc.) Moved to application

• Network is only a set of intermediate boxes running linux

Simplified Infrastructure to Own

Page 32: LinkedIn's Approach to Programmable Data Center

• To control and own your architecture:

• End to end stack (app, operating system, network and architecture.)

• Ultimate sophistication: Simplicity

• In house support as far as possible

• Move complexity to your comfort zone!

Stay in Control

Page 33: LinkedIn's Approach to Programmable Data Center

• SDN is nor a protocol or a tool or product off the shelf

• SDN is the whole network stack and architecture that enables applications to meet and interact with infrastructure to:

SDN for LinkedIn

• Discover and Learn• Provision• Manage• Control• Monitor