15-744: Computer Networking L-14 Data Center Networking II

TRANSCRIPT

Page 1: 15-744: Computer Networking L-14 Data Center Networking II

15-744: Computer Networking

L-14 Data Center Networking II

Page 2: 15-744: Computer Networking L-14 Data Center Networking II

Overview

• Data Center Topologies

• Data Center Scheduling

2

Page 3: 15-744: Computer Networking L-14 Data Center Networking II

Current solutions for increasing data center network bandwidth

3

FatTree, BCube

1. Hard to construct
2. Hard to expand

Page 4: 15-744: Computer Networking L-14 Data Center Networking II

Background: Fat-Tree
• Inter-connect racks (of servers) using a fat-tree topology
• Fat-Tree: a special type of Clos network (after C. Clos)

K-ary fat tree: three-layer topology (edge, aggregation, and core)
• Each pod consists of (k/2)^2 servers & 2 layers of k/2 k-port switches
• Each edge switch connects to k/2 servers & k/2 aggregation switches
• Each aggregation switch connects to k/2 edge & k/2 core switches
• (k/2)^2 core switches: each connects to k pods

4

Fat-tree with K=4

Page 5: 15-744: Computer Networking L-14 Data Center Networking II

Why Fat-Tree?
• Fat tree has identical bandwidth at any bisection
  • Each layer has the same aggregate bandwidth
• Can be built using cheap devices with uniform capacity
  • Each port supports the same speed as an end host
  • All devices can transmit at line speed if packets are distributed uniformly along available paths
• Great scalability: a k-port switch supports k^3/4 servers (see the sketch below)

5

Fat-tree network with K = 6 supporting 54 hosts
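Since the whole topology is determined by k, the element counts follow directly from the definition above. The sketch below is a minimal Python illustration (names are illustrative, not from any particular codebase) that reproduces the k^3/4 server count, including the 54-host k = 6 example.

```python
# Sketch: element counts for a k-ary fat-tree built from k-port switches,
# following the definition on the previous slide. Assumes k is even.

def fat_tree_sizes(k: int) -> dict:
    assert k % 2 == 0, "k must be even"
    half = k // 2
    return {
        "pods": k,
        "edge_switches": k * half,          # k pods x k/2 edge switches
        "aggregation_switches": k * half,   # k pods x k/2 aggregation switches
        "core_switches": half ** 2,         # (k/2)^2 core switches
        "servers": k * half ** 2,           # k pods x (k/2)^2 servers = k^3/4
    }

if __name__ == "__main__":
    print(fat_tree_sizes(4))   # 16 servers, 4 core switches (the k = 4 figure above)
    print(fat_tree_sizes(6))   # 54 servers (the 54-host example)
    print(fat_tree_sizes(48))  # 27,648 servers from commodity 48-port switches
```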

Page 6: 15-744: Computer Networking L-14 Data Center Networking II

An alternative: hybrid packet/circuit switched data center network

6

Goal of this work:
– Feasibility: software design that enables efficient use of optical circuits
– Applicability: application performance over a hybrid network

Page 7: 15-744: Computer Networking L-14 Data Center Networking II

Optical circuit switching vs. electrical packet switching

7

                       Electrical packet switching     Optical circuit switching
                                                       (e.g., MEMS optical switch)
Switching technology   Store and forward               Circuit switching
Switching capacity     16x40 Gbps at the high end      320x100 Gbps on the market
                       (e.g., Cisco CRS-1)             (e.g., Calient FiberConnect)
Switching time         Packet granularity              Less than 10 ms

Page 8: 15-744: Computer Networking L-14 Data Center Networking II

Optical Circuit Switch

2010-09-02 SIGCOMM Nathan Farrington 8

[Figure: light from a glass fiber bundle passes through lenses to a fixed mirror and mirrors on motors; rotating a mirror steers Input 1 between Output 1 and Output 2]

• Does not decode packets
• Needs time to reconfigure

Page 9: 15-744: Computer Networking L-14 Data Center Networking II

Optical circuit switching vs. electrical packet switching

9

                       Electrical packet switching     Optical circuit switching
Switching technology   Store and forward               Circuit switching
Switching capacity     16x40 Gbps at the high end      320x100 Gbps on the market
                       (e.g., Cisco CRS-1)             (e.g., Calient FiberConnect)
Switching time         Packet granularity              Less than 10 ms
Switching traffic      For bursty, uniform traffic     For stable, pair-wise traffic

Page 10: 15-744: Computer Networking L-14 Data Center Networking II

Optical circuit switching is promising despite slow switching time

• [IMC09][HotNets09]: “Only a few ToRs are hot and most of their traffic goes to a few other ToRs. …”

• [WREN09]: “…we find that traffic at the five edge switches exhibits an ON/OFF pattern… ”

10

Full bisection bandwidth at packet granularity may not be necessary

Page 11: 15-744: Computer Networking L-14 Data Center Networking II

Hybrid packet/circuit switched network architecture

Optical circuit-switched network for high capacity transfer

Electrical packet-switched network for low latency delivery

Optical paths are provisioned rack-to-rack:
– A simple and cost-effective choice
– Aggregate traffic on a per-rack basis to better utilize optical circuits

Page 12: 15-744: Computer Networking L-14 Data Center Networking II

Design requirements

12

Control plane:
– Traffic demand estimation
– Optical circuit configuration

Data plane:
– Dynamic traffic de-multiplexing
– Optimizing circuit utilization (optional)

Page 13: 15-744: Computer Networking L-14 Data Center Networking II

c-Through (a specific design)

13

• No modification to applications and switches
• Leverage end hosts for traffic management
• Centralized control for circuit configuration

Page 14: 15-744: Computer Networking L-14 Data Center Networking II

c-Through - traffic demand estimation and traffic batching

14

Applications write into per-flow socket buffers as usual; the occupancy of these buffers yields a per-rack traffic demand vector.

1. Transparent to applications.
2. Packets are buffered per-flow to avoid HOL blocking.

Accomplishes two requirements:
– Traffic demand estimation
– Pre-batching of data to improve optical circuit utilization
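As a rough illustration of the estimation step, the sketch below aggregates per-flow socket-buffer occupancy into a per-rack demand vector; the function and the IP-to-rack mapping are hypothetical, not c-Through's actual code.

```python
# Sketch: estimate per-rack traffic demand from socket-buffer occupancy,
# in the spirit of c-Through's end-host measurement. Names and the
# rack-lookup function are hypothetical, not from the real system.
from collections import defaultdict

def estimate_rack_demand(flows, rack_of):
    """flows: iterable of (dst_ip, buffered_bytes) pairs observed on this host.
    rack_of: maps a destination IP to its rack id.
    Returns {rack_id: total buffered bytes destined to that rack}."""
    demand = defaultdict(int)
    for dst_ip, buffered_bytes in flows:
        demand[rack_of(dst_ip)] += buffered_bytes
    return dict(demand)

# Example: two flows to rack "r7", one to "r2".
flows = [("10.0.7.11", 4_000_000), ("10.0.7.12", 1_000_000), ("10.0.2.5", 64_000)]
print(estimate_rack_demand(flows, rack_of=lambda ip: "r" + ip.split(".")[2]))
# {'r7': 5000000, 'r2': 64000}
```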

Page 15: 15-744: Computer Networking L-14 Data Center Networking II

c-Through - optical circuit configuration

15

Hosts report their traffic demand to a central controller; the controller computes a circuit configuration and pushes it back to the racks.

• Use Edmonds’ algorithm to compute the optimal configuration
• Many ways to reduce the control traffic overhead
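The configuration step amounts to a maximum-weight matching over rack pairs. A minimal sketch, assuming the networkx library rather than the authors' implementation:

```python
# Sketch: pick rack pairs to connect with optical circuits by running
# Edmonds' maximum-weight matching over the measured demand matrix.
# Uses networkx for illustration; c-Through's own controller code differs.
import networkx as nx

def configure_circuits(demand):
    """demand: {(rack_a, rack_b): bytes} of pairwise traffic demand.
    Returns the set of rack pairs to connect with circuits."""
    g = nx.Graph()
    for (a, b), volume in demand.items():
        g.add_edge(a, b, weight=volume)
    # Edmonds' blossom algorithm, maximizing total matched demand.
    return nx.max_weight_matching(g)

demand = {("r1", "r2"): 9e9, ("r1", "r3"): 1e9, ("r2", "r3"): 2e9, ("r3", "r4"): 8e9}
print(configure_circuits(demand))  # e.g. {('r1', 'r2'), ('r3', 'r4')}
```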

Page 16: 15-744: Computer Networking L-14 Data Center Networking II

c-Through - traffic de-multiplexing

16

[Figure: a traffic de-multiplexer on each host steers outgoing traffic onto VLAN #1 or VLAN #2 according to the current circuit configuration]

VLAN-based network isolation:
– No need to modify switches
– Avoids the instability caused by circuit reconfiguration

Traffic control on hosts:
– The controller informs hosts about the circuit configuration
– End hosts tag packets accordingly
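A toy version of the end-host side (VLAN numbers and names are illustrative assumptions, not the real system's) simply maps each destination rack to a VLAN based on the configuration pushed by the controller:

```python
# Sketch: end-host traffic de-multiplexing onto the packet- or circuit-switched
# VLAN. VLAN ids and class/function names are illustrative, not c-Through's code.
PACKET_VLAN = 1    # always-available electrical network
CIRCUIT_VLAN = 2   # optical network, usable only for the rack we have a circuit to

class Demultiplexer:
    def __init__(self, my_rack):
        self.my_rack = my_rack
        self.circuit_peer = None   # rack currently reachable over our circuit

    def update_configuration(self, matching):
        """Called when the controller pushes a new circuit configuration."""
        self.circuit_peer = None
        for a, b in matching:
            if self.my_rack in (a, b):
                self.circuit_peer = b if a == self.my_rack else a

    def vlan_for(self, dst_rack):
        return CIRCUIT_VLAN if dst_rack == self.circuit_peer else PACKET_VLAN

demux = Demultiplexer("r1")
demux.update_configuration({("r1", "r2"), ("r3", "r4")})
print(demux.vlan_for("r2"), demux.vlan_for("r3"))  # 2 1
```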

Page 17: 15-744: Computer Networking L-14 Data Center Networking II

17

Page 18: 15-744: Computer Networking L-14 Data Center Networking II
Page 19: 15-744: Computer Networking L-14 Data Center Networking II
Page 20: 15-744: Computer Networking L-14 Data Center Networking II
Page 21: 15-744: Computer Networking L-14 Data Center Networking II
Page 22: 15-744: Computer Networking L-14 Data Center Networking II

Overview

• Data Center Topologies

• Data Center Scheduling

22

Page 23: 15-744: Computer Networking L-14 Data Center Networking II

Datacenters and OLDIs

OLDI = OnLine Data Intensive applications

e.g., Web search, retail, advertisements

An important class of datacenter applications

Vital to many Internet companies

OLDIs are critical datacenter applications

Page 24: 15-744: Computer Networking L-14 Data Center Networking II

OLDIs

• Partition-aggregate
  • Tree-like structure
  • Root node sends query
  • Leaf nodes respond with data
• Deadline budget split among nodes and network
  • E.g., total = 300 ms, parent-leaf RPC = 50 ms
• Missed deadlines → incomplete responses → affect user experience & revenue
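To make the budget arithmetic concrete, here is a tiny illustrative sketch; the split policy and everything beyond the slide's 300 ms / 50 ms figures are assumptions, not from the lecture.

```python
# Sketch: splitting an end-to-end OLDI deadline across tree levels.
# The 300 ms / 50 ms figures mirror the slide; the split policy is illustrative.
def split_budget(total_ms, levels, per_rpc_ms):
    """Allocate per-level RPC deadlines, returning what's left for the root."""
    remaining = total_ms
    for level in range(levels):
        remaining -= per_rpc_ms   # each level's fan-out RPC budget
        print(f"level {level}: RPC deadline {per_rpc_ms} ms, {remaining} ms left")
    return remaining              # slack for aggregation and the final response

print("root slack:", split_budget(total_ms=300, levels=2, per_rpc_ms=50), "ms")
# level 0: RPC deadline 50 ms, 250 ms left
# level 1: RPC deadline 50 ms, 200 ms left
# root slack: 200 ms
```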

Page 25: 15-744: Computer Networking L-14 Data Center Networking II

Challenges Posed by OLDIs

Two important properties:

1) Deadline bound (e.g., 300 ms)
   • Missed deadlines affect revenue

2) Fan-in bursts
   • Large data, 1000s of servers
   • Tree-like structure (high fan-in) → fan-in bursts → long “tail latency”

Network shared with many apps (OLDI and non-OLDI)

Network must meet deadlines & handle fan-in bursts

Page 26: 15-744: Computer Networking L-14 Data Center Networking II

Current Approaches

• TCP: deadline agnostic, long tail latency
  • Congestion timeouts (slow), ECN (coarse)

• Datacenter TCP (DCTCP) [SIGCOMM '10]
  • First to comprehensively address tail latency
  • Finely varies sending rate based on the extent of congestion (see the sketch below)
  • Shortens tail latency, but is not deadline-aware
  • ~25% missed deadlines at high fan-in & tight deadlines

DCTCP handles fan-in bursts, but is not deadline-aware
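For reference, DCTCP's rate adjustment keeps a running estimate of the fraction of ECN-marked packets and backs off in proportion to it. A minimal sketch of that rule, simplified from the DCTCP paper and not a real TCP stack:

```python
# Sketch: DCTCP's congestion estimate and window update, simplified.
# g is the EWMA gain from the paper; this is not a real TCP implementation.
class DCTCPState:
    def __init__(self, cwnd=10.0, g=1 / 16):
        self.cwnd = cwnd
        self.alpha = 0.0   # running estimate of the fraction of ECN-marked packets
        self.g = g

    def on_window_acked(self, acked, marked):
        frac_marked = marked / acked if acked else 0.0
        self.alpha = (1 - self.g) * self.alpha + self.g * frac_marked
        if marked:
            # Back off in proportion to the extent of congestion ...
            self.cwnd *= (1 - self.alpha / 2)
        else:
            # ... otherwise grow roughly like TCP's additive increase.
            self.cwnd += 1
```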

Page 27: 15-744: Computer Networking L-14 Data Center Networking II

D2TCP

Deadline-aware and handles fan-in bursts

Key idea: Vary sending rate based on both deadline and extent of congestion

• Built on top of DCTCP
• Distributed: uses per-flow state at end hosts
• Reactive: senders react to congestion; no knowledge of other flows

Page 28: 15-744: Computer Networking L-14 Data Center Networking II

D2TCP’s Contributions

1) Deadline-aware and handles fan-in bursts
• Elegant gamma correction for congestion avoidance (sketched below)
  • Far-deadline → back off more; near-deadline → back off less
• Reactive, decentralized, state at end hosts

• Does not hinder long-lived (non-deadline) flows
• Coexists with TCP → incrementally deployable
• No change to switch hardware → deployable today

D2TCP achieves 75% and 50% fewer missed deadlines than DCTCP and D3, respectively
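The gamma correction itself is essentially a one-line change to the DCTCP rule: raise the congestion estimate α to a deadline-imminence exponent d before backing off. A minimal sketch following the D2TCP paper's description, not the authors' code:

```python
# Sketch: D2TCP's gamma-corrected window update (simplified). alpha is
# DCTCP's running estimate of the fraction of ECN-marked packets; d is the
# deadline imminence factor, with d > 1 for near-deadline flows (back off
# less) and d < 1 for far-deadline flows (back off more). d = 1 is plain DCTCP.
def d2tcp_window_update(cwnd: float, alpha: float, d: float) -> float:
    p = alpha ** d                  # gamma-corrected penalty
    if p > 0:
        return cwnd * (1 - p / 2)   # congestion: back off by the penalty
    return cwnd + 1                 # no congestion: additive increase

# Same congestion level (alpha = 0.5), different deadlines:
print(d2tcp_window_update(20, 0.5, d=0.5))  # far deadline  -> ~12.9 (backs off more)
print(d2tcp_window_update(20, 0.5, d=2.0))  # near deadline -> 17.5  (backs off less)
```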

Page 29: 15-744: Computer Networking L-14 Data Center Networking II

Coflow Definition

Page 30: 15-744: Computer Networking L-14 Data Center Networking II

30

Page 31: 15-744: Computer Networking L-14 Data Center Networking II

31

Page 32: 15-744: Computer Networking L-14 Data Center Networking II

32

Page 33: 15-744: Computer Networking L-14 Data Center Networking II

33

Page 34: 15-744: Computer Networking L-14 Data Center Networking II

34

Page 35: 15-744: Computer Networking L-14 Data Center Networking II

35

Page 36: 15-744: Computer Networking L-14 Data Center Networking II

36

Page 37: 15-744: Computer Networking L-14 Data Center Networking II

37

Page 38: 15-744: Computer Networking L-14 Data Center Networking II

Data Center Summary

• Topology
  • Easy deployment/costs
  • High bisection bandwidth makes placement less critical
  • Augment on-demand to deal with hot-spots

• Scheduling
  • Delays are critical in the data center
  • Can try to handle this in congestion control
  • Can try to prioritize traffic in switches
  • May need to consider dependencies across flows to improve scheduling

38

Page 39: 15-744: Computer Networking L-14 Data Center Networking II

Review

• Networking background
  • OSPF, RIP, TCP, etc.

• Design principles and architecture
  • E2E and Clark

• Routing/Topology
  • BGP, power laws, HOT topology

39

Page 40: 15-744: Computer Networking L-14 Data Center Networking II

Review
• Resource allocation
  • Congestion control and TCP performance
  • FQ/CSFQ/XCP

• Network evolution
  • Overlays and architectures
  • OpenFlow and Click
  • SDN concepts
  • NFV and middleboxes

• Data centers
  • Routing
  • Topology
  • TCP
  • Scheduling

40