Data Center Fabrics
![Page 1: Data Center Fabrics](https://reader035.vdocument.in/reader035/viewer/2022062815/56816934550346895de08c9a/html5/thumbnails/1.jpg)
Data Center Fabrics
![Page 2: Data Center Fabrics](https://reader035.vdocument.in/reader035/viewer/2022062815/56816934550346895de08c9a/html5/thumbnails/2.jpg)
Forwarding Today
• Layer 3 approach:
– Assign IP addresses to hosts hierarchically based on their directly connected switch
– Use standard intra-domain routing protocols, e.g., OSPF
– Large administration overhead
• Layer 2 approach:
– Forwarding on flat MAC addresses
– Less administrative overhead
– Poor scalability
– Low performance
• Middle ground between layer 2 and layer 3: VLANs
– Feasible for smaller-scale topologies
– Resource partitioning problem
![Page 3: Data Center Fabrics](https://reader035.vdocument.in/reader035/viewer/2022062815/56816934550346895de08c9a/html5/thumbnails/3.jpg)
Requirements due to Virtualization
• End-host virtualization:
– Needs to support a large address space and VM migration
– In a layer 3 fabric, migrating a VM to a different switch changes the VM's IP address
– In a layer 2 fabric, migrating a VM means scaling ARP and performing routing/forwarding on millions of flat MAC addresses
![Page 4: Data Center Fabrics](https://reader035.vdocument.in/reader035/viewer/2022062815/56816934550346895de08c9a/html5/thumbnails/4.jpg)
Motivation
• Eliminate over-subscription
– Solution: commodity switch hardware
• Virtual machine migration
– Solution: split the IP address from the location
• Failure avoidance
– Solution: fast, scalable routing
![Page 5: Data Center Fabrics](https://reader035.vdocument.in/reader035/viewer/2022062815/56816934550346895de08c9a/html5/thumbnails/5.jpg)
Architectural Similarities
• Both approaches use indirection
– The application address doesn't change when a VM moves; only the location address does
– Location address (LA): specifies a location in the network
– Application address (AA): specifies the address of a VM
• A network of commodity switches
– Reduces energy consumption
– Makes it affordable to deploy enough switches to eliminate over-subscription
• A central entity performs name resolution between location and application addresses
– Directory Service in VL2; Fabric Manager in PortLand
– Both are triggered by ARP requests
– Stores the AA-to-LA mapping
• Gateway devices
– Perform encapsulation/decapsulation of external traffic
![Page 6: Data Center Fabrics](https://reader035.vdocument.in/reader035/viewer/2022062815/56816934550346895de08c9a/html5/thumbnails/6.jpg)
Architecture Differences
• Routing
– VL2: source-routing based
• Each packet carries the addresses of all switches to traverse
– PortLand: topology-based routing
• Location addresses encode position within the tree
• Each switch knows how to decode location addresses; forwarding is based on this knowledge
• Indirection
– VL2: indirection at L3 (IP-in-IP encapsulation)
– PortLand: indirection at L2 (IP-to-PMAC)
• ARP functionality
– PortLand: ARP returns the IP-to-PMAC mapping
– VL2: ARP returns a list of intermediate switches to traverse
![Page 7: Data Center Fabrics](https://reader035.vdocument.in/reader035/viewer/2022062815/56816934550346895de08c9a/html5/thumbnails/7.jpg)
Portland
![Page 8: Data Center Fabrics](https://reader035.vdocument.in/reader035/viewer/2022062815/56816934550346895de08c9a/html5/thumbnails/8.jpg)
Fat-Tree
• Interconnect racks (of servers) using a fat-tree topology
• Fat-tree: a special type of Clos network (after C. Clos)
• K-ary fat tree: three-layer topology (edge, aggregation, core)
– each pod consists of (k/2)² servers and 2 layers of k/2 k-port switches
– each edge switch connects to k/2 servers and k/2 aggregation switches
– each aggregation switch connects to k/2 edge and k/2 core switches
– (k/2)² core switches: each connects to k pods
Fat-tree with K=2
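The component counts above follow directly from k. A minimal sketch (the function name is my own, not from the slides):

```python
# Sketch: component counts for a k-ary fat-tree (k even),
# following the formulas on this slide.
def fat_tree_sizes(k: int) -> dict:
    assert k % 2 == 0, "k must be even"
    half = k // 2
    return {
        "pods": k,
        "servers_per_pod": half * half,      # (k/2)^2
        "edge_switches_per_pod": half,       # k/2
        "aggr_switches_per_pod": half,       # k/2
        "core_switches": half * half,        # (k/2)^2
        "total_servers": k ** 3 // 4,        # k^3/4
    }

# e.g. k = 6 gives 6^3/4 = 54 servers
print(fat_tree_sizes(6))
```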
![Page 9: Data Center Fabrics](https://reader035.vdocument.in/reader035/viewer/2022062815/56816934550346895de08c9a/html5/thumbnails/9.jpg)
Why Fat-Tree?
• A fat tree has identical bandwidth at any bisection
– Each layer has the same aggregate bandwidth
• Can be built using cheap devices with uniform capacity
– Each port supports the same speed as an end host
– All devices can transmit at line speed if packets are distributed uniformly along available paths
• Great scalability: a k-port switch supports k³/4 servers
Fat tree network with K = 6 supporting 54 hosts
![Page 10: Data Center Fabrics](https://reader035.vdocument.in/reader035/viewer/2022062815/56816934550346895de08c9a/html5/thumbnails/10.jpg)
PortLand
Assumes a fat-tree network topology for the data center
• Introduces "pseudo MAC addresses" (PMACs) to balance the pros and cons of flat vs. topology-dependent addressing
• PMACs are topology-dependent, hierarchical addresses
– Used only as "host locators," not "host identities"
– IP addresses are used as "host identities" (for compatibility with applications)
• Pros: small switch state and seamless VM migration
• Pros: "eliminates" flooding in both the data and control planes
• But requires an IP-to-PMAC mapping and name resolution
– a location directory service
• And a location discovery protocol and fabric manager
– to support "plug-&-play"
![Page 11: Data Center Fabrics](https://reader035.vdocument.in/reader035/viewer/2022062815/56816934550346895de08c9a/html5/thumbnails/11.jpg)
PMAC Addressing Scheme
• PMAC (48 bits): pod.position.port.vmid
– pod: 16 bits; position: 8 bits; port: 8 bits; vmid: 16 bits
• Assigned only to servers (end hosts), by switches
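The 16+8+8+16-bit layout can be packed and unpacked with plain shifts; a minimal sketch (helper names are my own):

```python
# Sketch: packing/unpacking the 48-bit pod.position.port.vmid PMAC
# layout (16 + 8 + 8 + 16 bits) described above.
def pmac_encode(pod: int, position: int, port: int, vmid: int) -> int:
    assert pod < 2**16 and position < 2**8 and port < 2**8 and vmid < 2**16
    return (pod << 32) | (position << 24) | (port << 16) | vmid

def pmac_decode(pmac: int):
    return ((pmac >> 32) & 0xFFFF,   # pod
            (pmac >> 24) & 0xFF,     # position
            (pmac >> 16) & 0xFF,     # port
            pmac & 0xFFFF)           # vmid

pmac = pmac_encode(pod=2, position=1, port=3, vmid=7)
assert pmac_decode(pmac) == (2, 1, 3, 7)
```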
![Page 12: Data Center Fabrics](https://reader035.vdocument.in/reader035/viewer/2022062815/56816934550346895de08c9a/html5/thumbnails/12.jpg)
Location Discovery Protocol
• Location Discovery Messages (LDMs) are exchanged between neighboring switches
• Switches self-discover their location on boot-up

| Location characteristic | Technique |
| --- | --- |
| Tree level (edge, aggr., core) | auto-discovery via neighbor connectivity |
| Position # | aggregation switches help edge switches decide |
| Pod # | request (by position-0 switch only) to the fabric manager |
![Page 13: Data Center Fabrics](https://reader035.vdocument.in/reader035/viewer/2022062815/56816934550346895de08c9a/html5/thumbnails/13.jpg)
PortLand: Name Resolution
• An edge switch listens to end hosts and discovers new source MACs
• Installs <IP, PMAC> mappings and informs the fabric manager
![Page 14: Data Center Fabrics](https://reader035.vdocument.in/reader035/viewer/2022062815/56816934550346895de08c9a/html5/thumbnails/14.jpg)
PortLand: Name Resolution …
• An edge switch intercepts ARP messages from end hosts
• Sends a request to the fabric manager, which replies with the PMAC
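The proxy-ARP flow can be sketched as follows, under simplifying assumptions: the fabric manager is modeled as a plain dict from IP to PMAC, and the class/function names are illustrative, not from PortLand's implementation:

```python
# Sketch: an edge switch intercepts an ARP request and asks the
# fabric manager instead of flooding it.
class FabricManager:
    def __init__(self):
        self.ip_to_pmac = {}          # soft-state <IP, PMAC> mappings

    def register(self, ip: str, pmac: int):
        self.ip_to_pmac[ip] = pmac    # reported by edge switches

    def resolve(self, ip: str):
        return self.ip_to_pmac.get(ip)

def edge_switch_handle_arp(fm: FabricManager, target_ip: str):
    # On a hit, reply with the PMAC; only on a miss would PortLand
    # fall back to broadcasting the ARP request.
    pmac = fm.resolve(target_ip)
    return pmac if pmac is not None else "fallback: broadcast ARP"

fm = FabricManager()
fm.register("10.0.1.2", 0x000201030007)
assert edge_switch_handle_arp(fm, "10.0.1.2") == 0x000201030007
```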
![Page 15: Data Center Fabrics](https://reader035.vdocument.in/reader035/viewer/2022062815/56816934550346895de08c9a/html5/thumbnails/15.jpg)
PortLand: Fabric Manager
• Fabric manager: a logically centralized, multi-homed server
• Maintains topology and <IP, PMAC> mappings in "soft state"
![Page 16: Data Center Fabrics](https://reader035.vdocument.in/reader035/viewer/2022062815/56816934550346895de08c9a/html5/thumbnails/16.jpg)
VL2
![Page 17: Data Center Fabrics](https://reader035.vdocument.in/reader035/viewer/2022062815/56816934550346895de08c9a/html5/thumbnails/17.jpg)
Design: Clos Network
• Same capacity at each layer
– No over-subscription
• Many paths available
– Low sensitivity to failures
![Page 18: Data Center Fabrics](https://reader035.vdocument.in/reader035/viewer/2022062815/56816934550346895de08c9a/html5/thumbnails/18.jpg)
Design: Separate Names from Locations
• Packet forwarding
– The VL2 agent (at the host) traps packets and encapsulates them
• Address resolution
– ARP requests are converted to unicast queries to the directory system
– Results are cached for performance
• Access control (security policy) via the directory system
[Figure: a server machine with the application in user space and the VL2 agent in the kernel; the agent issues LookUp(AA) to the Directory System, which returns EncapInfo(AA)]
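The trap-and-encapsulate step can be sketched as below, with the directory system modeled as a dict from application address (AA) to the location address (LA) of the destination's top-of-rack switch; all names and addresses here are illustrative assumptions:

```python
# Sketch: VL2 agent looks up AA -> LA (caching the result) and wraps
# the packet in an outer header addressed to the destination ToR's LA.
directory = {"AA:10.1.0.5": "LA:20.0.3.1"}   # AA -> LA of destination ToR
cache = {}                                    # agent-side cache

def vl2_agent_send(packet: dict) -> dict:
    aa = packet["dst"]
    la = cache.get(aa) or directory[aa]       # unicast lookup, then cache
    cache[aa] = la
    # IP-in-IP style encapsulation: outer header targets the ToR's LA
    return {"outer_dst": la, "inner": packet}

out = vl2_agent_send({"src": "AA:10.1.0.9", "dst": "AA:10.1.0.5", "payload": b"hi"})
assert out["outer_dst"] == "LA:20.0.3.1"
```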
![Page 19: Data Center Fabrics](https://reader035.vdocument.in/reader035/viewer/2022062815/56816934550346895de08c9a/html5/thumbnails/19.jpg)
Design: Separate Names from Locations
![Page 20: Data Center Fabrics](https://reader035.vdocument.in/reader035/viewer/2022062815/56816934550346895de08c9a/html5/thumbnails/20.jpg)
Design: Valiant Load Balancing
• Each flow takes a different random path (via a randomly chosen intermediate switch)
• Hot-spot free for the tested traffic matrices
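One way to realize the per-flow random choice is to hash the flow's 5-tuple onto an intermediate switch, so every packet of a flow takes the same path (preserving ordering) while different flows spread randomly. A sketch with made-up switch names:

```python
# Sketch: per-flow Valiant load balancing via hashing the 5-tuple
# onto one of the intermediate switches.
import hashlib

INTERMEDIATE_SWITCHES = ["int-1", "int-2", "int-3", "int-4"]

def pick_intermediate(flow_5tuple: tuple) -> str:
    digest = hashlib.sha256(repr(flow_5tuple).encode()).digest()
    idx = int.from_bytes(digest[:4], "big") % len(INTERMEDIATE_SWITCHES)
    return INTERMEDIATE_SWITCHES[idx]

flow = ("10.1.0.9", "10.1.0.5", 6, 51312, 443)
# The same flow always hashes to the same intermediate switch
assert pick_intermediate(flow) == pick_intermediate(flow)
```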
![Page 21: Data Center Fabrics](https://reader035.vdocument.in/reader035/viewer/2022062815/56816934550346895de08c9a/html5/thumbnails/21.jpg)
Design: VL2 Directory System
• Built using servers from the data center
• Two-tiered directory system architecture
– Tier 1: read-optimized cache servers (directory servers)
– Tier 2: write-optimized mapping servers (a replicated state machine, RSM)
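The two tiers interact like a read-through cache over an authoritative store; a minimal sketch (the RSM tier is modeled as a plain dict, and the structure is illustrative only):

```python
# Sketch: a tier-1 directory server serves reads from its cache and
# falls back to the tier-2 RSM on a miss.
class DirectoryServer:
    def __init__(self, rsm: dict):
        self.rsm = rsm        # tier 2: authoritative, write-optimized
        self.cache = {}       # tier 1: read-optimized

    def lookup(self, aa: str):
        if aa not in self.cache:
            self.cache[aa] = self.rsm[aa]   # miss: fetch from RSM tier
        return self.cache[aa]

rsm = {"AA:10.1.0.5": "LA:20.0.3.1"}
ds = DirectoryServer(rsm)
assert ds.lookup("AA:10.1.0.5") == "LA:20.0.3.1"
```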
![Page 22: Data Center Fabrics](https://reader035.vdocument.in/reader035/viewer/2022062815/56816934550346895de08c9a/html5/thumbnails/22.jpg)
Benefits + Drawbacks
![Page 23: Data Center Fabrics](https://reader035.vdocument.in/reader035/viewer/2022062815/56816934550346895de08c9a/html5/thumbnails/23.jpg)
Benefits
• VM migration
– No need to worry about scaling L2 broadcast
– Address is decoupled from location
• Revisiting fault tolerance
– Relaxes placement requirements
![Page 24: Data Center Fabrics](https://reader035.vdocument.in/reader035/viewer/2022062815/56816934550346895de08c9a/html5/thumbnails/24.jpg)
Loop-free Forwarding and Fault-Tolerant Routing
• Switches build forwarding tables based on their position
– edge, aggregation, or core
• Strict "up-down semantics" ensure loop-free forwarding
– Load balancing: use any ECMP path, with flow hashing to preserve packet ordering
• Fault-tolerant routing:
– Mostly concerned with detecting failures
– The fabric manager maintains a logical fault matrix with per-link connectivity info and informs affected switches
– Affected switches recompute their forwarding tables
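The fault-matrix bookkeeping above can be sketched as follows; the class and switch names are illustrative, not PortLand's actual data structures:

```python
# Sketch: the fabric manager records per-link state and, on a failure,
# returns the switches that must recompute forwarding tables.
class FaultMatrix:
    def __init__(self):
        self.links = {}   # (switch_a, switch_b) -> True if link is up

    def add_link(self, a: str, b: str):
        self.links[(a, b)] = True

    def link_down(self, a: str, b: str):
        self.links[(a, b)] = False
        # inform the switches touching the failed link
        return {a, b}

fm = FaultMatrix()
fm.add_link("edge-0", "aggr-0")
affected = fm.link_down("edge-0", "aggr-0")
assert affected == {"edge-0", "aggr-0"}
```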
![Page 25: Data Center Fabrics](https://reader035.vdocument.in/reader035/viewer/2022062815/56816934550346895de08c9a/html5/thumbnails/25.jpg)
Drawbacks
• Higher failure rates
– Commodity switches fail more frequently
• No straightforward way to expand
– Expansion comes in large increments tied to the value of k
• Lookup servers
– Additional infrastructure servers
– Higher upfront startup latency
• Need special gateway servers