Post on 11-Jan-2016
Intra-Domain TE via IGP Metric Tuning
Who I Am
Andrew Lange
Exodus, a Cable & Wireless Service
Principal Network Architect
andrewl@exodus.net / andrewl@cw.net
Successfully navigated the straits of Chapter 11, between Scylla & Charybdis... and somehow ended up in Britain.
What this IS
This IS an introduction to the wonderful world of using flow information to tune your IGP metrics.
What this is NOT
This is NOT an end-all be-all guide to how to optimize your IGP metrics.
Problem We're Trying to Solve
How can we maximally* utilize our network infrastructure, without adding the complications associated with MPLS?
Can this even be done? Well, of course it can, or this presentation would be remarkably short.
Why? The more we can get out of our network, the more cost effective it becomes, and the happier the finance people get. Plus, it's cool.
*Maximally
Maximally is a thorny term. Long story short:
Optimum network flows can be represented as shortest paths with respect to a set of positive link weights (Wang, Wang & Zhang).
With current IGPs, determining the optimum weights is NP-hard, BUT very close approximations (within 3%) can be made (Fortz & Thorup).
Scholarship Abounds
The first concrete way of doing this that we ran across was in Fortz & Thorup's paper Internet Traffic Engineering by Optimizing OSPF Weights.
This literature tends to be quite recent (1999 and newer).
How easy it is to determine the optimal values very much depends on what your network and flows look like.
What is Required
Accurate flow data between each set of backbone nodes.
An optimization routine to apply to the flow data.
Getting the Flow Data
Getting the Data - Tool Selection
Using Netflow is the only way to gather a traffic matrix without using an overlay design.
Looked at a variety of options, including building our own, and settled on the Ixia (nee Caimis) product.
Getting the Data - Our Issues
Vendor HW/SW combinations are not always supported with the netflow feature set.
Full deployment of the tools is pending operational deployment of the right code base.
Sampling rate needs to be grossed up.
Getting the Data - Configuration
How have we configured our collectors?
Data is collected on the interfaces inbound into the backbone routers from the datacenter.
Flow data is sampled at 1:100.
Collectors peer with the backbone routers as route-reflector clients.
Collectors gather, among other things, BGP NextHop information.
Node Overview
[Diagram. Labels: BBR, Backbone, Peers, BBR, IBR, Transit Fabric, DCR, DCR, Customer Service Routers]
Munging the Data - Basics
How do we process the collected data?
Data is summarized daily.
To assemble the flow matrix, data is aggregated across the interfaces and the routers for a given site.
There are some problems with this.
Munging the Data - Problems
Data is an average; peak utilization is not available. This is probably OK for this application, since average and peak tend to follow the same proportions. But we're working on getting peak data so we can compare the results.
Assumes both routers at a site function as one (Nodal Aggregation). This is useful to simplify things as we first work out the models, but we will need to get more detailed as the models are refined.
Munging the Data - Aggregating the Flows
Aggregated daily summaries are post-processed with a script that correlates BGP NextHop with destination datacenter and combines the flows destined for that datacenter.
Currently does not gross up flow size to compensate for sampling.
Flow Matrix - Example (sntc08)
IDC       Bits/second
SNTC04    21,696
WHKN01    31,859
AUST01    2,295
SNTC01    257,722
ELSG01    31,298
SNTC03    21,492
SNTC05    164,455
Etc...
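The aggregation step described above can be sketched roughly as follows. This is not the script used in production, only a minimal illustration. The next-hop addresses, the NEXT_HOP_TO_IDC mapping, the record layout, and the sample values are all made up for the example; the optional sampling_rate argument shows where a 1:100 gross-up could be applied, which the current script does not do.

```python
from collections import defaultdict

# Hypothetical mapping from BGP next hop to destination datacenter (IDC).
# Addresses come from documentation ranges and are NOT real router addresses.
NEXT_HOP_TO_IDC = {
    "192.0.2.1": "SNTC04",
    "192.0.2.2": "SNTC04",
    "192.0.2.17": "WHKN01",
}


def build_matrix_row(flow_records, next_hop_to_idc, sampling_rate=1):
    """Sum bits/second per destination datacenter for one source site.

    flow_records: iterable of (bgp_next_hop, bits_per_second) tuples taken
    from the daily summaries (assumed layout).
    sampling_rate: 1 keeps the sampled values as-is; 100 would gross up
    data that was sampled at 1:100.
    """
    row = defaultdict(float)
    for next_hop, bps in flow_records:
        idc = next_hop_to_idc.get(next_hop)
        if idc is None:
            continue  # next hop does not belong to a known datacenter
        row[idc] += bps * sampling_rate
    return dict(row)


# Made-up records: two next hops in SNTC04, one in WHKN01, one unknown.
records = [
    ("192.0.2.1", 1000),
    ("192.0.2.2", 2000),
    ("192.0.2.17", 500),
    ("198.51.100.9", 750),
]

print(build_matrix_row(records, NEXT_HOP_TO_IDC))
print(build_matrix_row(records, NEXT_HOP_TO_IDC, sampling_rate=100))
```

Keeping the gross-up as a parameter makes it easy to compare the sampled and scaled matrices side by side before committing to one.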
Building a Model
Offline vs. Online
We have chosen to pursue offline metric optimization.
Online, or dynamic, metric optimization imposes a whole other set of requirements, such as a fast optimization model and a lot of trust.
At least at this point, our target for intra-domain TE is in the medium/long term timeframe. If we are running our network so hot that we have to reoptimize multiple times a day, then we need more bandwidth.
Modeling Assumptions
Model assumes that when flows grow the proportions remain the same.
Model does not currently take flow splitting (ECMP) into account, except for ECMP between two adjacent nodes, which is represented by increasing the size of the link between them in the model.
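Both assumptions are easy to express in code. The sketch below is an illustration only: the function names, the tuple layout, and the pair of parallel OC48s between sva and wdc are hypothetical, and the OC48 line rate of 2488.32 Mbps is an assumed constant.

```python
OC48_MBPS = 2488.32  # assumed OC-48 line rate in Mbps


def merge_parallel_arcs(arcs):
    """Collapse parallel arcs between the same two nodes into one bigger arc.

    arcs: iterable of (from_node, to_node, capacity_mbps).
    This mirrors the modeling shortcut for ECMP between adjacent nodes.
    """
    merged = {}
    for src, dst, capacity in arcs:
        merged[(src, dst)] = merged.get((src, dst), 0.0) + capacity
    return merged


def scale_demands(demands, factor):
    """Grow every demand by the same factor, keeping proportions fixed."""
    return {pair: mbps * factor for pair, mbps in demands.items()}


# Hypothetical: two parallel OC48s from sva to wdc, one from wdc to nyc.
arcs = [
    ("sva", "wdc", OC48_MBPS),
    ("sva", "wdc", OC48_MBPS),
    ("wdc", "nyc", OC48_MBPS),
]
print(merge_parallel_arcs(arcs))

# Doubling a single demand of 100 Mbps (made-up value).
print(scale_demands({("sva", "nyc"): 100.0}, 2.0))
```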
What follows is an Example
10 nodes, 15 links (30 arcs).
10 demand sets.
Real backbone network would be more complicated, but findings still hold.
Example Data
Because we were not able to poll the full matrix of data from the network, the data we're using for this example is extrapolated from the flow data we do have. It is only approximate.
Example Network Diagram
[Diagram. Nodes: SEA, LAX, SVA, CHI, DLS, ATL, BOS, WDC, NYC, MIA]
Example Network Info
All links are OC48.
There are no nodal constraints (i.e. routers are assumed to be able to push line rate).
Base Demands
Flow         Amount (Mbps)
sva-->sea    57.63
sva-->nyc    595.51
sva-->wdc    461.04
nyc-->sva    479.78
nyc-->wdc    500.64
wdc-->sva    285.2
wdc-->nyc    384.4
sea-->mia    2.54
sea-->sva    58.42
mia-->sea    2.19
Current Metrics
Current Metrics are agnostic to flow information (based on RTT between nodes).
Under current loads the example network is nicely overprovisioned.
We're going to focus on how much more load we can put on this network before any link exceeds 80% utilization (to allow for microbursts, etc.). For an OC48, that threshold is 1990 Mbps. We are going to do this by increasing the values of the Base Demands.
Shortest Paths - Current Metrics
Flow         Path
mia-->sea    mia-->atl-->dls-->chi-->sea
nyc-->sva    nyc-->wdc-->sva
nyc-->wdc    nyc-->wdc
sea-->mia    sea-->chi-->dls-->atl-->mia
sea-->sva    sea-->sva
sva-->nyc    sva-->wdc-->nyc
sva-->sea    sva-->sea
sva-->wdc    sva-->wdc
wdc-->nyc    wdc-->nyc
wdc-->sva    wdc-->sva
Link Loading - Current Metrics
Link         Current Load (Mbps)   Max Load (+88%) (Mbps)
atl-->dls    2.19                  4.18
atl-->mia    2.54                  4.78
chi-->dls    2.54                  4.78
chi-->sea    2.19                  4.18
dls-->atl    2.54                  4.78
dls-->chi    2.19                  4.18
mia-->atl    2.19                  4.18
nyc-->wdc    980.42                1843.19
sea-->chi    2.54                  4.78
sea-->sva    58.42                 109.83
sva-->sea    57.63                 108.34
sva-->wdc    1056.55               1986.32
wdc-->nyc    979.91                1842.23
wdc-->sva    764.98                1438.17
Abracadabra!
Sample run of one of the models:
ampl: model fixtwo-int.mod; data cap-3.3.dat; solve;
CPLEX 7.1.0: optimal integer solution; objective 12909.92
31 MIP simplex iterations
0 branch-and-bound nodes
A Bit Behind the Curtain
Using AMPL/CPLEX to define the models.
This consists of a model file specifying the model's:
Objective (e.g. Minimize overall network load).
Constraints (e.g. Do not exceed capacity on links.)
And a data file, which specifies:
What the network looks like.
What the demands are.
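For readers who want to see what such an objective and constraint set can look like on paper, here is a generic textbook multicommodity-flow form. It is not the contents of fixtwo-int.mod, which the slides do not show and which may differ (the sample run reports an integer solution, so that model evidently includes integer variables). All symbols are assumptions introduced for this sketch: N is the set of nodes, A the set of arcs, K the set of demands, x^k_ij the Mbps of demand k carried on arc (i,j), d_k the size of demand k from s_k to t_k, c_ij the arc capacity, and alpha the utilization ceiling (0.8 in the example that follows).

```latex
\begin{align*}
\min_{x \ge 0} \quad
  & \sum_{k \in K} \sum_{(i,j) \in A} x^{k}_{ij} \\
\text{s.t.} \quad
  & \sum_{j : (i,j) \in A} x^{k}_{ij} - \sum_{j : (j,i) \in A} x^{k}_{ji}
    = \begin{cases}
        d_k  & \text{if } i = s_k \\
        -d_k & \text{if } i = t_k \\
        0    & \text{otherwise}
      \end{cases}
    && \forall i \in N,\ k \in K \\
  & \sum_{k \in K} x^{k}_{ij} \le \alpha \, c_{ij}
    && \forall (i,j) \in A
\end{align*}
```

The first constraint conserves each demand at every node; the second keeps total load on each arc under the chosen fraction of capacity. In this relaxed form a demand may split across several paths, which plain IGP shortest-path routing cannot do arbitrarily, so it is best read as a bound on what metric tuning could achieve.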
Shortest Paths - New Metrics
Flow         Path
mia-->sea    mia-->nyc-->chi-->sea
nyc-->sva    nyc-->chi-->sva
nyc-->wdc    nyc-->wdc
sea-->mia    sea-->chi-->nyc-->mia
sea-->sva    sea-->sva
sva-->nyc    sva-->chi-->nyc
sva-->sea    sva-->sea
sva-->wdc    sva-->wdc
wdc-->nyc    wdc-->nyc
wdc-->sva    wdc-->sva
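To make the mechanism concrete: the IGP picks paths purely from link metrics, so raising one metric is enough to move a flow. The sketch below runs a plain Dijkstra over a four-node slice of the example (sva, chi, wdc, nyc). The metric values are hypothetical, chosen only to reproduce the sva-->nyc move from the wdc path to the chi path; the actual metrics are not given in the slides.

```python
import heapq


def shortest_path(metrics, src, dst):
    """Dijkstra over {(from_node, to_node): metric}; returns the node list."""
    adjacency = {}
    for (u, v), metric in metrics.items():
        adjacency.setdefault(u, []).append((v, metric))
    dist = {src: 0}
    prev = {}
    heap = [(0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:
            break
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, metric in adjacency.get(u, []):
            candidate = d + metric
            if candidate < dist.get(v, float("inf")):
                dist[v] = candidate
                prev[v] = u
                heapq.heappush(heap, (candidate, v))
    if dst not in dist:
        return None
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return path[::-1]


def both_directions(links):
    """Expand {(a, b): metric} into arcs a->b and b->a with the same metric."""
    arcs = {}
    for (a, b), metric in links.items():
        arcs[(a, b)] = metric
        arcs[(b, a)] = metric
    return arcs


# HYPOTHETICAL metrics, not the real ones from the network.
before = both_directions(
    {("sva", "wdc"): 10, ("wdc", "nyc"): 2, ("sva", "chi"): 8, ("chi", "nyc"): 6}
)
# Same topology with only the wdc-nyc metric raised from 2 to 8.
after = both_directions(
    {("sva", "wdc"): 10, ("wdc", "nyc"): 8, ("sva", "chi"): 8, ("chi", "nyc"): 6}
)

print(shortest_path(before, "sva", "nyc"))
print(shortest_path(after, "sva", "nyc"))
```

With these made-up numbers the sva-->nyc flow shifts from the wdc path to the chi path, while sva-->wdc and nyc-->wdc stay on their direct links.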
Link Loading - New Metrics
Link         Current Load (Mbps)   Max Load (3.3x base, i.e. +230%) (Mbps)
chi-->nyc    598.05                1973.56
chi-->sea    2.19                  7.23
chi-->sva    479.78                1583.27
mia-->nyc    2.19                  7.23
nyc-->chi    481.97                1590.5
nyc-->mia    2.54                  8.38
nyc-->wdc    500.64                1652.11
sea-->chi    2.54                  8.38
sea-->sva    58.42                 192.79
sva-->chi    595.51                1965.18
sva-->sea    57.63                 190.18
sva-->wdc    461.04                1521.43
wdc-->nyc    384.4                 1268.52
wdc-->sva    285.2                 941.16
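The link-loading figures can be re-derived from the base demands and the two sets of shortest paths. The sketch below does that and then computes how far all demands can be scaled uniformly before the busiest link reaches 80% of an OC48. Demands and paths are copied from the slides; the function names are made up for the example, and the OC48 line rate of 2488.32 Mbps is an assumption (80% of it is roughly the 1990 Mbps quoted earlier).

```python
OC48_MBPS = 2488.32  # assumed OC-48 line rate in Mbps
THRESHOLD = 0.80     # utilization ceiling used in the example

# Base demands in Mbps, from the Base Demands slide.
BASE_DEMANDS = {
    ("sva", "sea"): 57.63, ("sva", "nyc"): 595.51, ("sva", "wdc"): 461.04,
    ("nyc", "sva"): 479.78, ("nyc", "wdc"): 500.64, ("wdc", "sva"): 285.2,
    ("wdc", "nyc"): 384.4, ("sea", "mia"): 2.54, ("sea", "sva"): 58.42,
    ("mia", "sea"): 2.19,
}

# Only multi-hop paths are listed; every other flow uses its direct link.
CURRENT_PATHS = {
    ("mia", "sea"): ["mia", "atl", "dls", "chi", "sea"],
    ("sea", "mia"): ["sea", "chi", "dls", "atl", "mia"],
    ("nyc", "sva"): ["nyc", "wdc", "sva"],
    ("sva", "nyc"): ["sva", "wdc", "nyc"],
}
NEW_PATHS = {
    ("mia", "sea"): ["mia", "nyc", "chi", "sea"],
    ("sea", "mia"): ["sea", "chi", "nyc", "mia"],
    ("nyc", "sva"): ["nyc", "chi", "sva"],
    ("sva", "nyc"): ["sva", "chi", "nyc"],
}


def link_loads(demands, multi_hop_paths):
    """Add each demand to every arc along its path; returns {arc: Mbps}."""
    loads = {}
    for (src, dst), mbps in demands.items():
        path = multi_hop_paths.get((src, dst), [src, dst])
        for arc in zip(path, path[1:]):
            loads[arc] = loads.get(arc, 0.0) + mbps
    return loads


def busiest_link(loads):
    """Return (arc, Mbps) for the most heavily loaded arc."""
    return max(loads.items(), key=lambda item: item[1])


def growth_headroom(loads, capacity=OC48_MBPS, threshold=THRESHOLD):
    """Largest uniform demand multiplier before any arc passes the threshold."""
    return threshold * capacity / max(loads.values())


for name, paths in (("current", CURRENT_PATHS), ("new", NEW_PATHS)):
    loads = link_loads(BASE_DEMANDS, paths)
    arc, mbps = busiest_link(loads)
    print(name, arc, round(mbps, 2), round(growth_headroom(loads), 2))
```

Under these assumptions the busiest link is sva-->wdc with the current metrics and chi-->nyc with the new ones, and the headroom multipliers come out at about 1.88 and 3.33. Both sit just above the multipliers used in the tables, since those stop a little short of the 80% line.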
Resources and Thanks
Optimization Resources - Papers
Sample Papers:
Internet Traffic Engineering by Optimizing OSPF Weights, Fortz & Thorup.
Internet Traffic Engineering without Full Mesh Overlaying, Wang, Wang & Zhang.
Dynamic Optimization of OSPF Weights using Online Simulation, Ye, et al.
Optimization Resources - Tools
Mathematical Programming Tools
AMPL/CPLEX (www.ampl.com)
OPL/CPLEX (http://www.ilog.com/products/oplstudio/)
Special Thanks To
Dr. Irv Lustig, for invaluable help with the modeling languages.