1
Rethinking Network Control & Management
The Case for a New 4D Architecture
David A. Maltz, Carnegie Mellon University
Joint work with Albert Greenberg, Gisli Hjalmtysson,
Andy Myers, Jennifer Rexford, Geoffrey Xie,
Hong Yan, Jibin Zhan, Hui Zhang
2
Is the Network Down Again?
You sit at your home computer, trying to access a computer at work…
…But no data is getting through
Minutes or hours later, data flows again…
…You never find out why
Network operators aren’t much better at predicting outages …
3
Outline
What do networks look like today?
New approach to predicting network behavior
A new architecture for controlling networks
4
Many Kinds of Networks
Each has different
• Size – generally 10-1000 routers each
• Owner – company, university, organization
• Topology – mesh, tree, ring
Examples:
• Enterprise/Campus networks
• Access networks: DSL, cable modems
• Metro networks: connect up businesses in cities
• Data center networks: disk arrays & servers
• Transit/Backbone networks
5
A Conventional View of a Network
Physical topology is a graph of nodes and links
Run Dijkstra to find route to each node
[Figure: example topology graph with nodes A through J]
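The conventional view can be sketched in a few lines of Python. This is a minimal illustration of Dijkstra's algorithm, not code from the talk; the four-node graph, its link weights, and the function names are assumptions made for the example.

```python
import heapq

# Hypothetical topology: node -> {neighbor: link weight}.
GRAPH = {
    "A": {"B": 1, "C": 4},
    "B": {"A": 1, "C": 2, "D": 5},
    "C": {"A": 4, "B": 2, "D": 1},
    "D": {"B": 5, "C": 1},
}

def shortest_paths(graph, source):
    """Return (dist, prev): the cost to every node and its predecessor on the best path."""
    dist = {source: 0}
    prev = {}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry, a shorter path was already found
        for v, weight in graph[u].items():
            candidate = d + weight
            if candidate < dist.get(v, float("inf")):
                dist[v] = candidate
                prev[v] = u
                heapq.heappush(heap, (candidate, v))
    return dist, prev

def path(prev, source, target):
    """Walk the predecessor map backwards to recover the route."""
    hops = [target]
    while hops[-1] != source:
        hops.append(prev[hops[-1]])
    return hops[::-1]
```

Calling `shortest_paths(GRAPH, "A")` yields the cost to each node, and `path(prev, "A", "D")` recovers the route. Note that this view uses only topology and link weights; policies and packet filters do not appear in it at all.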
6
A Conventional View of a Network
[Figure: example topology graph with nodes A through J]
Physical topology is a graph of nodes and links
Run Dijkstra to find route to each node
Knowing how the routers are connected says almost
nothing about whether or not two hosts can communicate
7
Network Equipment
Boxes: router, switch
Links: Ethernet, SONET, T1, …
Picture from Internet2 Abilene Network
8
The Data Plane of a Network
[Figure labels: hosts/servers, router/switch, interfaces]
9
Packets
For this talk, networks traffic in packets
• A sequence of bytes processed as a unit
[Figure: a packet, made up of meta-data (source address, destination address, port numbers, …) and user data]
10
The Data Plane of a Network
Forwarding Information Base (FIB)
• Basically a look-up table, each entry is a route
• Tests fields of packet and determines which interface to send packet out
Destination NextHop
A left
B right
C left
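A FIB lookup can be sketched as follows. This is an illustrative sketch, not a router implementation: the IP prefixes standing in for destinations A, B, and C, the extra more-specific entry, and the `lookup` helper are assumptions. It uses longest-prefix match, the usual rule for choosing among overlapping IP routes.

```python
import ipaddress

# Hypothetical FIB: (prefix, outgoing interface). The prefixes are made up
# to stand in for destinations A, B, and C on the slide.
FIB = [
    (ipaddress.ip_network("10.1.0.0/16"), "left"),   # destination A
    (ipaddress.ip_network("10.2.0.0/16"), "right"),  # destination B
    (ipaddress.ip_network("10.3.0.0/16"), "left"),   # destination C
    (ipaddress.ip_network("10.1.5.0/24"), "right"),  # more specific route inside A
]

def lookup(fib, dst):
    """Return the interface of the longest matching prefix, or None if no route matches."""
    addr = ipaddress.ip_address(dst)
    matches = [(net.prefixlen, iface) for net, iface in fib if addr in net]
    if not matches:
        return None  # no route: the packet cannot be forwarded
    return max(matches, key=lambda m: m[0])[1]
```

For example, `lookup(FIB, "10.1.2.3")` follows the /16 entry, while `lookup(FIB, "10.1.5.9")` follows the more specific /24 entry.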
11
The Data Plane of a Network
Packet Filter
• Specific to a single interface
• Tests fields of packet and determines whether to permit or drop packet
• Finer granularity than FIB – can test more fields, even target specific applications
Permit A->B
Drop C->B
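The two example rules can be evaluated with a first-match loop. This is a minimal sketch: the `Packet` class, the tuple rule format, and the `"*"` wildcard are assumptions for illustration, and the default of dropping unmatched packets mirrors the implicit deny found at the end of typical router access lists.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Packet:
    src: str
    dst: str

# Rules from the slide as (action, source, destination); "*" matches anything.
RULES = [
    ("permit", "A", "B"),
    ("drop", "C", "B"),
]

def evaluate(rules, pkt, default="drop"):
    """Return the action of the first rule that matches pkt, else the default."""
    for action, src, dst in rules:
        if src in ("*", pkt.src) and dst in ("*", pkt.dst):
            return action
    return default
```

A real filter can also test ports and protocol fields, which is what gives it finer granularity than the FIB.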
12
The Data Plane of a Network
Many other mechanisms…
• Queueing discipline
• Packet transformers (e.g., address translation)
13
The Control Plane of a Network
Where do FIB entries come from?
• A distributed system called the Control Plane
Control plane failures are responsible for many of the longest, hardest-to-debug outages!
Destination NextHop
A left
B right
C left
14
The Control Plane of a Network
Routers run routing processes
[Figure: a router containing a FIB and a routing process]
15
The Control Plane of a Network
Adjacent processes exchange routing information
• Information format defined by routing protocol
• Many routing protocols: BGP, OSPF, RIP, EIGRP
• Adjacent processes must use the same protocol
[Figure: three routers, each with a FIB and a routing process; advertisements labeled "A,B" and "C,D" pass between adjacent processes]
16
The Control Plane of a Network
Routing protocols define logic for computing routes
• Combine all available information
• Pick best route for each destination
[Figure: three routers, each with a FIB and a routing process; advertisements for destination D pass between adjacent processes]
Destination NextHop
D left
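The "combine, then pick best" step can be sketched as below. This is a simplified illustration, not any specific protocol's decision process: it assumes each advertisement carries a single numeric cost and that the lowest cost wins, and the function and variable names are made up for the example.

```python
def select_routes(advertisements):
    """Build a FIB from (destination, next_hop, cost) advertisements.

    For each destination, keep the next hop with the lowest advertised cost.
    """
    best = {}
    for dest, next_hop, cost in advertisements:
        if dest not in best or cost < best[dest][1]:
            best[dest] = (next_hop, cost)
    return {dest: next_hop for dest, (next_hop, _) in best.items()}

# Hypothetical advertisements heard by one router.
ADS = [("D", "left", 3), ("D", "right", 5), ("A", "right", 1)]
```

If the neighbor on the left stops advertising D, rerunning the selection on the remaining advertisements switches the entry to "D right", which is how the control plane provides resiliency.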
17
Control Plane Creates Resiliency
[Figure: three routers, each with a routing process and the FIB entry "D left"; traffic for D is forwarded through them]
18
Control Plane Creates Resiliency
[Figure: the same three routers; the FIB entries now read "D right", "D left", "D left"]
19
A Study of Operational Production Networks
How complicated/simple are real control planes?
• What is the structure of the distributed system?
Use reverse-engineering methodology
• There are few or no documents
• The ones that exist are out-of-date
Anonymized configuration files for 31 active networks (>8,000 configuration files)
• 6 Tier-1 and Tier-2 Internet backbone networks
• 25 enterprise networks
• Sizes between 10 and 1,200 routers
• 4 enterprise networks significantly larger than the backbone networks
20
Excerpts from a Router Configuration File
interface Ethernet0
 ip address 6.2.5.14 255.255.255.128
interface Serial1/0.5 point-to-point
 ip address 6.2.2.85 255.255.255.252
 ip access-group 143 in
 frame-relay interface-dlci 28
router ospf 64
 redistribute connected subnets
 redistribute bgp 64780 metric 1 subnets
 network 66.251.75.128 0.0.0.127 area 0
router bgp 64780
 redistribute ospf 64 match route-map 8aTzlvBrbaW
 neighbor 66.253.160.68 remote-as 12762
 neighbor 66.253.160.68 distribute-list 4 in
access-list 143 deny 1.1.0.0/16
access-list 143 permit any
route-map 8aTzlvBrbaW deny 10
 match ip address 4
route-map 8aTzlvBrbaW permit 20
 match ip address 7
ip route 10.2.2.1/16 10.2.1.7
21
Size of Configuration Files in One Network
[Plot: lines in config file (y-axis, 0 to 2000) versus router ID sorted by file size (x-axis, 0 to 881)]
22
Routing Processes Implement Policy
Extensive use of policy commands to filter routes
• Prevent some hosts from communicating: security policy
• Limit access to short-cut links: resource policy
[Figure: routers R1, R2, R3, each with a FIB and a routing process; route advertisements labeled "A,B" and "A"]
23
Packet Filters Implement Policy
Packet filters used extensively throughout networks
• Protect routers from attack
• Implement reachability matrix
– Define which hosts can communicate
– Localize traffic, particularly multicast
25
Multiple Interacting Routing Processes
[Figure: routers with FIBs running OSPF and BGP processes; labels include Policy1, Policy2, EBGP, Internet, Client, and Server]
26
The Routing Instance Graph of an 881-Router Network
27
Take Away Points
Networks deal with both creating connectivity and preventing it
Networks controlled by complex distributed systems
• Must understand system to understand behavior
Focusing on individual protocols is not enough
• Composition of protocols is important and complex
Developed abstractions to model routing design
• Routing Process Graph – accurately model design
• Routing Instance – abstracts away details
• Reverse-engineer routing design from configs
28
Outline
What do networks look like today?
New approach to predicting network behavior
• Frame the problem of reachability analysis
• Sketch algebra for predicting reachability
A new architecture for controlling networks
29
Reachability
Can A send a packet to B?
• Depends on routing protocols, advertised routes, policies, packet filters, ...
Predicting reachability is key to network survivability and security
i j
A B
30
Reachability
We focus on two types of policy:
– Survivability: Certain packets should always be
permitted, under all possible network states
– Security: Certain packets should never be
permitted, under all possible network states
i j
A B
31
Reachability Example
• Two locations, each with data center & front office
• All routers exchange routes over all links
[Figure: routers R1-R5 connecting the data centers and front offices in Chicago (chi) and New York (nyc)]
32
Reachability Example
[Figure: routers R1-R5 connecting the data centers and front offices in Chicago (chi) and New York (nyc); route tables list chi-DC, chi-FO, nyc-DC, and nyc-FO]
33
Reachability Example
[Figure: routers R1-R5 connecting the chi and nyc data centers and front offices; route tables list chi-DC, chi-FO, nyc-DC, and nyc-FO]
Packet filter: Drop nyc-FO -> *, Permit *
Packet filter: Drop chi-FO -> *, Permit *
34
Reachability Example
A new short-cut link added between data centers
• Intended for backup traffic between centers
[Figure: routers R1-R5 connecting the chi and nyc data centers and front offices, with the new short-cut link]
Packet filter: Drop nyc-FO -> *, Permit *
Packet filter: Drop chi-FO -> *, Permit *
35
Reachability Example
Oops – new link lets packets violate security policy!
• Routing changed, but
• Packet filters don’t update automatically
[Figure: routers R1-R5 connecting the chi and nyc data centers and front offices, with the short-cut link]
Packet filter: Drop nyc-FO -> *, Permit *
Packet filter: Drop chi-FO -> *, Permit *
36
Reachability Example
Typical response – add more packet filters to plug the holes in security policy
[Figure: routers R1-R5 connecting the chi and nyc data centers and front offices, with additional packet filters]
Packet filter: Drop nyc-FO -> *, Permit *
Packet filter: Drop chi-FO -> *, Permit *
37
Reachability Example
Packet filters have surprising consequences
• Consider a link failure
• chi-FO and nyc-FO still connected
[Figure: routers R1-R5 connecting the chi and nyc data centers and front offices, with filters "Drop nyc-FO -> *" and "Drop chi-FO -> *"]
38
Reachability Example
Network has less survivability than topology suggests
• chi-FO and nyc-FO still connected
• But packet filter means no data can flow!
• Probing the network won’t predict this problem
[Figure: routers R1-R5 connecting the chi and nyc data centers and front offices, with filters "Drop nyc-FO -> *" and "Drop chi-FO -> *"]
39
State of the Art in Reachability Analysis
Build the network, try sending packets
• ping, traceroute, monitoring tools
• Only checks paths currently selected by routing protocols
• Cannot be used for “what if” analysis
Our goal: Static Reachability Analysis
• Predict reachability over multiple scenarios through analysis of router configuration files
40
Predicting Reachability
How can we formalize the reachability provided by a network?
• The set of packets the network will carry from router i to router j
• A function of the forwarding state s
• s represents the contents of each FIB
• Ri,j(s) is the instantaneous reachability
[Figure: routers i and j, labeled Ri,j(s)]
41
Computing Reachability
[Figure: routers R1, R2, R3, R4 with directed links labeled F1,2(s), F2,1(s), F2,3(s), F3,2(s), F3,4(s), F4,3(s)]
Fi,j(s): Set of packets permitted along link from node i to node j in network state s
[Equation annotations: "Packets allowed along path"; "The set of all paths from i to j"]
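One reading of the annotations above is that Ri,j(s) is the union, over all paths from i to j, of the packets allowed along each path, where a path allows only the packets that every one of its links permits. The sketch below computes that for a toy network. It is an illustration under stated assumptions, not the authors' tool: packets are abstract labels, the link sets and the extra R1 to R3 link are made up, and path enumeration is exponential in general.

```python
# Hypothetical per-link permitted sets for one forwarding state s:
# (u, v) -> set of packet labels allowed on the link from u to v.
F = {
    ("R1", "R2"): {"p1", "p2"},
    ("R2", "R3"): {"p2", "p3"},
    ("R3", "R4"): {"p1", "p2", "p3"},
    ("R1", "R3"): {"p1"},  # extra link, assumed here to give a second path
}

def simple_paths(link_filters, i, j, visited=frozenset()):
    """Yield every loop-free path from i to j as a list of (u, v) links."""
    visited = visited | {i}
    if i == j:
        yield []
        return
    for (u, v) in link_filters:
        if u == i and v not in visited:
            for rest in simple_paths(link_filters, v, j, visited):
                yield [(u, v)] + rest

def reachability(link_filters, i, j, universe):
    """Union over all paths of the intersection of the link sets along each path."""
    reachable = set()
    for path in simple_paths(link_filters, i, j):
        allowed = set(universe)
        for link in path:
            allowed &= link_filters[link]  # a packet must pass every link
        reachable |= allowed               # any one path is enough
    return reachable
```

With these sets, the path R1, R2, R3, R4 carries only p2 and the path R1, R3, R4 carries only p1, so `reachability(F, "R1", "R4", {"p1", "p2", "p3"})` contains both.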
42
Jointly Modeling the Effects of Packet Filters and Routing
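Key Problem:
• Fi,j(s) affected by routing and packet filters
Key Insight:
• Treat routes as dynamic packet filters
[Figure: routers R1, R2, R3; the FIB of the router whose next hops are R3 and R1, and the packet filters equivalent to its routes]
Dest NextHop
A R3
B R1
C R3
Toward R3: Permit *->A, Permit *->C, Drop *->*
Toward R1: Permit *->B, Drop *->*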
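The conversion from routes to filters shown on this slide can be sketched as follows. The function name and the tuple rule format are assumptions for illustration; the FIB contents and resulting rules are the ones on the slide.

```python
def fib_to_link_filters(fib):
    """Turn a FIB (destination -> next hop) into one packet filter per outgoing link.

    Each route becomes a permit rule on the link toward its next hop, and every
    filter ends with a catch-all drop, since a packet only leaves on a link when
    some route sends it there.
    """
    filters = {}
    for dest, next_hop in fib.items():
        filters.setdefault(next_hop, []).append(("permit", "*", dest))
    for rules in filters.values():
        rules.append(("drop", "*", "*"))
    return filters

# The FIB from the slide.
FIB = {"A": "R3", "B": "R1", "C": "R3"}
```

Because the FIB changes as routing converges, the derived filters change with it, which is the sense in which routes act as dynamic packet filters.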
43
Bounding the Instantaneous Reachability
Knowing the exact forwarding state s is impractical
Knowing Ri,j(s) doesn’t help much, anyway• Want to predict behavior over a range of states
Luckily, predicting behavior over the set of all possible states is easier than predicting reachability for a single state
44
Reachability Bounds
Lower bound on Reachability
Packets in this set never prohibited by network
Upper bound on Reachability
Packets not in this set always prohibited by network
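One formalization consistent with the two captions above, assuming S denotes the set of all forwarding states the network can enter (S and the superscripts L and U are notation introduced here for illustration; Ri,j(s) is written with subscripts):

```latex
R^{L}_{i,j} \;=\; \bigcap_{s \in S} R_{i,j}(s)
\qquad\qquad
R^{U}_{i,j} \;=\; \bigcup_{s \in S} R_{i,j}(s)
```

A packet in the lower bound is carried in every state, so the network never prohibits it. A packet outside the upper bound is carried in no state, so the network always prohibits it. Survivability policies are checked against the lower bound and security policies against the upper bound.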
45
Example Upper Bound Analysis
Before short-cut link added:
After short-cut link added:
[Figure: routers R1-R5 connecting the chi and nyc sites]
Packet filter: Drop nyc-FO -> *, Permit *
Packet filter: Drop chi-FO -> *, Permit *
46
Example Lower Bound Analysis
Before extra packet filters added:
After extra packet filters added:
[Figure: routers R1-R5 connecting the chi and nyc sites]
Packet filter: Drop nyc-FO -> *, Permit *
Packet filter: Drop chi-FO -> *, Permit *
47
Take Away Points
We have defined an algebra for modeling reachability
• Packet filters, routing protocols, NAT
• Griffin & Bush validated RFC 2547 VPNs
Status
• Algebra works on test cases
• Currently experimenting with production networks
Algebra’s strength and weakness is static analysis
• Can validate that network meets static objectives
• Can have false positives
• Cannot design the network to meet objectives
• Cannot control network to obey dynamic objectives
48
Outline
What do networks look like today?
New approach to predicting network behavior
A new architecture for controlling networks
• New principles for network control
• New architecture embodying those principles
• Experimental validation
49
Does Network Control Actually Matter?
YES!
• Microsoft: All services fell off the network for 23 hours due to misconfiguration of routers in their network (2001)
• Major ISP: 50% of outages occur during planned maintenance (2005)
• IP networks have 2-3x as many outages as circuit-switched networks (2005)
50
Three Principles for Network Control & Management
Network-level Objectives:
• Express goals explicitly
• Security policies, QoS, egress point selection
• Do not bury goals in box-specific configuration
[Figure: management logic, fed by a reachability matrix and traffic engineering rules]
51
Three Principles for Network Control & Management
Network-wide Views:
• Design network to provide timely, accurate info
• Topology, traffic, resource limitations
• Give logic the inputs it needs
[Figure: management logic, fed by a reachability matrix and traffic engineering rules; it reads state info from the network]
52
Three Principles for Network Control & Management
Direct Control:
• Allow logic to directly set forwarding state
• FIB entries, packet filters, queuing parameters
• Logic computes desired network state, so let it implement that state
[Figure: management logic, fed by a reachability matrix and traffic engineering rules; it reads state info from the network and writes state to it]
53
Overview of the 4D Architecture
Decision Plane:
• All management logic implemented on centralized servers making all decisions
• Decision Elements use views to compute data plane state that meets objectives, then directly write this state to routers
Decision
Dissemination
Discovery
Data
Network-level objectives
Direct control
Network-wide views
54
Overview of the 4D Architecture
Dissemination Plane:
• Provides a robust communication channel to each router
• May run over same links as user data, but logically separate and independently controlled
Decision
Dissemination
Discovery
Data
Network-level objectives
Direct control
Network-wide views
55
Overview of the 4D Architecture
Discovery Plane:
• Each router discovers its own resources and its local environment
• E.g., the identity of its immediate neighbors
Decision
Dissemination
Discovery
Data
Network-level objectives
Direct control
Network-wide views
56
Overview of the 4D Architecture
Data Plane:
• Spatially distributed routers/switches
• No need to change today’s technology
Decision
Dissemination
Discovery
Data
Network-level objectives
Direct control
Network-wide views
57
Control & Management Today
Management Plane
• Figure out what is happening in network
• Decide how to change it
Control Plane
• Multiple routing processes on each router
• Each router with different configuration program
• Huge number of control knobs: metrics, ACLs, policy
Data Plane
• Distributed routers
• Forwarding, filtering, queueing
• Based on FIB or labels
[Figure labels: shell scripts, traffic engineering, databases, planning tools; OSPF, SNMP, netflow, config files; OSPF and BGP processes on each router; link metrics; routing policies; FIBs; packet filters]
58
Good Abstractions Reduce Complexity
All decision making logic lifted out of control plane
• Eliminates duplicate logic in management plane
• Dissemination plane provides robust communication to/from data plane routers
[Figure: today's stack (Management Plane, Control Plane, Data Plane, with configs and FIBs/ACLs passed down) beside the 4D stack (Decision Plane, Dissemination, Data Plane, with FIBs/ACLs passed down)]
59
Three Key Questions
• Could the 4D architecture ever be deployed?
• Is the 4D architecture feasible?
• Can the 4D architecture actually simplify network control and management?
60
Deployment of the 4D Architecture
Pre-existing industry trend towards separating router hardware from software
• IETF: ForCES, GSMP, GMPLS
• SoftRouter [Lakshman, HotNets’04]
Incremental deployment path exists
• Individual networks can upgrade to 4D and gain benefits
• Small enterprise networks have most to gain
61
The Feasibility of the 4D Architecture
We designed and built a prototype of the 4D
Decision plane
• Contains logic to simultaneously compute routes and enforce reachability matrix
• Multiple Decision Elements per network, using simple election protocol to pick master
Dissemination plane
• Uses source routes to direct control messages
• Extremely simple, but can route around failed data links
62
Performance of the 4D Prototype
Evaluated using Emulab (www.emulab.net)
• Linux PCs used as routers (650 – 800MHz)
• Tested on 9 enterprise network topologies (10-100 routers each)
Recovers from single link failure in < 300 ms
• < 1 s response considered “excellent”
Survives failure of master Decision Element
• New DE takes control within 1 s
• No disruption unless second fault occurs
Gracefully handles complete network partitions
• Less than 1.5 s of outage
63
4D Makes Network Management & Control Error-proof
[Figure: routers R1-R5 connecting the chi and nyc data centers and front offices; route tables list chi-DC, chi-FO, nyc-DC, and nyc-FO]
Packet filter: Drop nyc-FO -> *, Permit *
Packet filter: Drop chi-FO -> *, Permit *
64
Prohibiting Packets from chi-FO to nyc-DC
65
4D Makes Network Management & Control Error-proof
[Figure: routers R1-R5 connecting the chi and nyc data centers and front offices, with filters "Drop nyc-FO -> *" and "Drop chi-FO -> *"]
66
Allowing Packets from chi-FO to nyc-FO
67
Related Work
• Driving network operation from network-wide views
– Traffic Engineering
– Traffic Matrix computation
• Centralization of decision making logic
– Routing Control Point [Feamster]
– Path Computation Element [Farrel]
– Signaling System 7 [Ma Bell]
68
Take Aways
No need for complicated distributed system in control plane – do away with it!
4D Architecture a promising approach
Power of solution comes from:
• Colocating all decision making in one plane
• Providing that plane with network-wide views
• Directly express solution by writing forwarding state
Benefits
• Coordinated state updates → better reliability
• Separates network issues from distributed systems issues
69
Summary
Networks must meet many different types of objectives
• Security, traffic engineering, robustness
Today, objectives met using control plane mechanisms
• Results in complicated distributed system
• Ripe with opportunities to set time-bombs
• Predicting static properties is possible, but difficult
Refactoring into a 4D Architecture very promising
• Separates network issues from reliability issues
• Eliminates duplicate logic and simplifies network
• Enables new capabilities, like joint control
70
Questions?
71
Backup Slides
72
Computing Reachability Bounds
• Problem reduced to estimating all routes potentially in routing table (FIB) of each router
• Much easier than predicting exactly which routes will be in FIB
73
How to Organize the Decision Plane?
We have exposed the network control logic --- now what?
Need a way to structure that logic
• Mutual optimization of multiple objectives
– Potentially mutually exclusive
• Each objective has different time constants
• Multiple objectives may affect the same bit of data-plane state
74
Future Directions
4D in different network contexts
• Ethernet networks
• Mixed networks: circuit- and packet-switched
Include services in the 4D
• Domain Name Service
• HTTP Proxies and load balancers
75
Reverse-Engineering Overview
Configuration files
• Find links, then construct Layer 3 Topology
• Find adjacent routing processes, then construct Routing Process Graph
• Condense adjacent routing processes, then construct Routing Instance Graph
[Figure: example routing instances OSPF #1, OSPF #2, BGP AS1, and AS2]
76
Reconstruct the Layer 3 Topology
Router 1 Config
interface Serial1/0.5
 ip address 1.1.1.1 255.255.255.252
….
Router 2 Config
interface Serial2/1.5
 ip address 1.1.1.2 255.255.255.252
….
[Figure: Router 1 and Router 2 joined by the link inferred from the shared subnet; also labeled: Internet]
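The link-finding step can be sketched by grouping interfaces by subnet: two interfaces on different routers whose addresses fall in the same subnet are taken to be the two ends of a link. This is a minimal sketch, not the authors' tool; it assumes the interface records have already been parsed out of the config files, and the tuple layout and function name are made up for the example.

```python
import ipaddress
from collections import defaultdict

# Hypothetical parsed interface records: (router, interface, address, netmask).
INTERFACES = [
    ("Router1", "Serial1/0.5", "1.1.1.1", "255.255.255.252"),
    ("Router2", "Serial2/1.5", "1.1.1.2", "255.255.255.252"),
    ("Router1", "Ethernet0", "6.2.5.14", "255.255.255.128"),
]

def find_links(interfaces):
    """Return {subnet: [(router, interface), ...]} for subnets with two or more interfaces."""
    by_subnet = defaultdict(list)
    for router, ifname, address, netmask in interfaces:
        subnet = ipaddress.ip_interface(f"{address}/{netmask}").network
        by_subnet[subnet].append((router, ifname))
    return {net: ends for net, ends in by_subnet.items() if len(ends) > 1}
```

Here the two serial interfaces both fall in 1.1.1.0/30 and are reported as one link, while the Ethernet interface has no peer in this data set and is left out.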
77
Abstract to a Routing Instance Graph
• Pick an unassigned Routing Process
• Flood fill along process adjacencies, labeling processes
• Repeat until all processes assigned to an Instance
[Figure: a Routing Process Graph of routers with route tables (RT) running OSPF and BGP processes, with Policy1 and Policy2, condensed into a Routing Instance Graph of OSPF #1, OSPF #2, and BGP AS1, with EBGP to AS2]
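The three steps above amount to finding connected components. The sketch below is one way to write them, assuming the caller supplies only adjacencies that stay inside an instance (same protocol, and for BGP the same AS); the process names and function name are hypothetical.

```python
def routing_instances(processes, adjacencies):
    """Label each routing process with an instance id by flood fill.

    processes:   list of process names
    adjacencies: list of (process, process) pairs that exchange routes
    """
    neighbors = {p: set() for p in processes}
    for a, b in adjacencies:
        neighbors[a].add(b)
        neighbors[b].add(a)

    instance_of = {}
    next_id = 0
    for start in processes:          # pick an unassigned routing process
        if start in instance_of:
            continue
        instance_of[start] = next_id
        stack = [start]
        while stack:                 # flood fill along process adjacencies
            p = stack.pop()
            for q in neighbors[p]:
                if q not in instance_of:
                    instance_of[q] = next_id
                    stack.append(q)
        next_id += 1                 # repeat until every process is assigned
    return instance_of

PROCESSES = ["r1.ospf", "r2.ospf", "r3.ospf", "r3.bgp", "r4.bgp"]
ADJACENCIES = [("r1.ospf", "r2.ospf"), ("r3.bgp", "r4.bgp")]
```

Processes that share a label form one routing instance. Redistribution between processes on the same router and EBGP sessions would then become the edges between instances.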
78
Textbook Routing Design for Enterprise Networks
• Border routers speak eBGP to external peers
• BGP selects a few key external routes to redistribute into OSPF
• 7 of 25 enterprise networks follow this pattern
[Figure: an OSPF instance and BGP AS #1, connected by EBGP to AS2 and AS3]
79
Reality: A Diversity of Unusual Routing Designs
• Network broken up into compartments, each with only 1 to 4 routers
• Each compartment has its own AS number
• Hub and spoke logical topology
• Why? Lots of control over how spokes communicate
[Figure: BGP AS #1 through AS #5 connected by EBGP sessions, with EBGP links to the rest of the world]
80
Reality: A Diversity of Unusual Routing Designs
• Network broken up into many compartments, each running EIGRP, some with 400+ routers
• BGP used to filter routes passed between compartments
• Compartments themselves pass information between BGP speakers
• Why? Little need for IBGP; few routers speak BGP; lots of control over how packets move between compartments
[Figure: EIGRP compartments joined by BGP AS #1 through AS #4, with EBGP links to the rest of the world]
81
Link Down
82
Reconvergence Time Under Single Link Failure
83
Reconvergence Time When Master DE Crashes
84
Reconvergence Time When Network Partitions
85
Reconvergence Time When Network Partitions
86
Slides in Progress or Looking for a Place to Go
87
Separation of Issues
The 4D Architecture separates issues• Networking logic goes into decision plane
88
Dissemination Plane
Make clear that dissemination paths can use same physical links, but different routing
Discovery and dissemination packets can be independent of data-plane (e.g. IP)
IP is very configuration intensive (addresses, etc.) so we avoid it whenever possible
89
Questions
What if I want to take a bunch of hosts and stick them together into a small network? Haven’t you made this common case terrifically hard?
• Today, I’d use static routes – it’s neither common nor easy
• In the 4D model, what do I do?
– DE co-located on the host
– Doesn’t talk to any other DEs or routers
90
Problems with State of the Art
Today: Network behavior determined by multiple interacting distributed programs, written in assembly language
• No way to visualize or describe routing design
• Impossible to establish linkage between configurations and network objectives
• Only a few “textbook” routing designs are widely known