iP/F: A Novel Architecture for Programmable...
TRANSCRIPT
8/8/2011
TECH 6 – Approaches in Network Monitoring
iP/F: A Novel Architecture for Programmable Network Visibility
Steve McCanne - CTO
Riverbed
Note
• This presentation is for information purposes only and is not a commitment, promise or legal obligation to deliver any new products, features or functionality. The development, release, and timing of any features or functionality described remains at Riverbed's sole discretion.
Some History
[Timeline, ’90 to ’10: BPF, tcpdump, libpcap, winpcap, ethereal, wireshark, CACE, CACE/Riverbed]
Wireshark
• When CACE was acquired last fall, Riverbed became the sponsor of the wireshark project
  – Huge following and project momentum
    • 5 million users
    • 500,000 downloads per month
    • Over 2 million lines of code
  – Key component in Riverbed’s strategy
    • Continued support of ecosystem and openness
• But wireshark alone doesn’t fully address the broader problem of network visibility...
Information Overload
A grab bag of point solutions...
[Figure: mrtg, snmp, events, reports, packets, netflow, logs]
Information Overload
Creates a hay stack of information...
[Figure: mrtg, snmp, events, reports, packets, netflow, logs]
The Challenge
• When lost in the hay stack, simple business-level questions cannot be answered...
• How well is the network delivering our critical applications?
  – How well did we do last week, month, quarter…?
  – How can I communicate this to business leaders?
• Is there a specific problem in delivering these applications?
  – If so, where?
  – What caused or is causing the problem?
  – How do we fix it?
Status Quo
• Monitoring
  – Automatically detect problems
  – Alert administrators
  – Spans whole network
  – Broad audience
• Troubleshooting
  – Investigate root cause of problem after detection
  – Deep analysis of protocols and applications
  – Isolated point in network
  – Experts only
Status Quo
• Monitoring
  – Typically done with “flow”
• Troubleshooting
  – Typically done with “packets”
Just What is Flow?
[Diagram: WAN; specific collection points in the network; data store]
• Packets traverse the network, typically at very high speed
• Flow records summarizing the packets come out at very low speed
Flow Info
• From a set of collection points, gather flow snapshots every minute or so...
  – each flow identified by tuple (proto, address, port)
  – network device queried for active flows
  – device returns list of stats to monitoring system
    • Flow 1: #bytes, #pkts, #drops, ...
    • Flow 2: #bytes, #pkts, #drops, ...
    • ...
    • Flow N: #bytes, #pkts, #drops, ...
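The flow snapshot described above can be sketched as a small data structure. This is only an illustration: the names `FlowKey`, `FlowStats`, and `merge_snapshot` are hypothetical, and it assumes each snapshot reports per-interval counts rather than running totals.

```python
from dataclasses import dataclass
from typing import Dict, NamedTuple


class FlowKey(NamedTuple):
    """Identifies a flow by the (proto, address, port) tuple."""
    proto: str
    address: str
    port: int


@dataclass
class FlowStats:
    """Counters a device reports for one flow in one snapshot."""
    n_bytes: int = 0
    n_pkts: int = 0
    n_drops: int = 0


def merge_snapshot(store: Dict[FlowKey, FlowStats],
                   snapshot: Dict[FlowKey, FlowStats]) -> None:
    """Fold one device snapshot into the monitoring system's store.

    Assumes the snapshot holds per-interval deltas, not cumulative totals.
    """
    for key, stats in snapshot.items():
        total = store.setdefault(key, FlowStats())
        total.n_bytes += stats.n_bytes
        total.n_pkts += stats.n_pkts
        total.n_drops += stats.n_drops
```

Keying the store on the tuple keeps the same flow from different snapshots in one place, which is what makes the per-flow time series on the next slide possible.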
Flow Example
• What can you do?
  – Take a metric like byte count
  – Divide by sampling interval. Plot as time series
[Plot: bandwidth vs. time]
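A minimal sketch of that calculation, assuming one byte count per snapshot and a 60-second sampling interval (the function name and sample values are made up for illustration):

```python
from typing import List, Sequence


def bandwidth_series(byte_counts: Sequence[int],
                     interval_sec: float = 60.0) -> List[float]:
    """Convert per-interval byte counts into bits per second."""
    if interval_sec <= 0:
        raise ValueError("interval_sec must be positive")
    return [count * 8 / interval_sec for count in byte_counts]


# Bytes seen for one flow in three consecutive 60 s snapshots (made up).
samples = [7_500_000, 15_000_000, 3_750_000]
for minute, bps in enumerate(bandwidth_series(samples)):
    print(f"t={minute} min  {bps / 1e6:.1f} Mb/s")
```

Each point is an average over the whole interval, so anything shorter than the sampling interval is invisible at this level.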
The Status Quo
[Diagram: Remote Site and Data Center connected by the WAN; Flow Analyzer]
The Status Quo
[Diagram: Remote Site and Data Center connected by the WAN; Flow Analyzer and Packet Analyzer]
The Problem
• Monitoring and troubleshooting tackled from separate silos
  – Even when portrayed as coupled, existing integration is often loose and hard to leverage
  – Based on decades of development in separate silos
  – It is REALLY HARD to go from knowing a problem exists, to running wireshark at the right place and right time to diagnose the root cause
• We need to fix this...
A Shift
• A shift needs to happen
• From point solutions
  – The haystack of point tools
• To a holistic system
  – A holistic and integrated system-wide view of the network and network services
  – Not claiming to have solved all the problems, but we believe our new architectural approach is a great foundation for moving forward
Enter Cascade and iP/F
FLOW-BASED VISIBILITY / PACKET-BASED VISIBILITY
Profiler (Centralized Analysis)
Gateway (Flow Collection)
Pilot (Packet Analysis)
Sensor (Packet Inspection)
Shark (Packet Capture)
REMOTE FLOW/PACKET VISIBILITY
Steelhead (Packet Capture & Flow Export)
Enter Cascade and iP/F
FLOW-BASED VISIBILITY / PACKET-BASED VISIBILITY
Profiler (Centralized Analysis)
Gateway (Flow Collection)
Pilot (Packet Analysis)
Sensor (Packet Inspection)
Shark (Packet Capture)
REMOTE FLOW/PACKET VISIBILITY
Steelhead (Packet Capture & Flow Export)
iP/F: integrated packets and flow
How hard can it be?
• On the surface, the network visibility problem just doesn’t seem that hard
  – Collect router/switch flow stats and dump to disk
  – Capture packets and dump to disk
  – Analyze the performance database
    • Look through the hay
    • Find the needle
Flow Overload
• Some flow math…
  500,000 flows / min x
  200 bytes / flow stat x
  1440 min / day x
  7 days / week =
  ~1 TB / week
• Some packet math…
  100 Mb/s x
  3600 sec / hr x
  24 hours / day x
  =~11 TB / day
Packets even worse...
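The arithmetic can be checked with a short sketch (function names are hypothetical). The flow factors multiply out to about 1 TB per week as stated. For packets, the three factors that survive in this transcript give about 1.08 TB per day for a single 100 Mb/s link; the slide's ~11 TB per day follows a dangling "x", so a further factor appears to be missing from the transcript, and the code below does not guess at it.

```python
def flow_bytes_per_week(flows_per_min: int = 500_000,
                        bytes_per_flow_stat: int = 200) -> int:
    """Flow math: flows/min x bytes/flow stat x 1440 min/day x 7 days/week."""
    return flows_per_min * bytes_per_flow_stat * 1440 * 7


def packet_bytes_per_day(link_bits_per_sec: float) -> float:
    """Bytes captured per day from one fully loaded link."""
    return link_bits_per_sec / 8 * 3600 * 24


print(flow_bytes_per_week() / 1e12)        # TB per week of flow records
print(packet_bytes_per_day(100e6) / 1e12)  # TB per day at 100 Mb/s
```

Either way the conclusion holds: packet storage outgrows flow storage by a wide margin, since one day of packets from one link already exceeds a week of flow records.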
A data navigation problem
In TB’s of stored data, you need the three magic packets that will reveal the problem...
A data navigation problem
And, these three packets could be in any of the numerous packet capture devices...
iP/F: Flow Plus Packets
[Diagram: Remote Site and Data Center connected by the WAN; Single Logical Record]
How Can We Make Network Visibility As Easy As This?
It’s hard to make things easy
A hierarchical storage model
[Diagram: Remote Site and Data Center connected by the WAN]
• Packets (nsec)
• Micro flow (1 sec)
• Flow (1 min)
• Macro flow (15+ min)
[Axis: DISTRIBUTED to CENTRALIZED]
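One way to picture the hierarchy is as successive roll-ups of the same data into coarser time buckets. The sketch below is an assumption about how such tiers could be derived, not a description of the product's storage engine; the type aliases and function names are hypothetical, and only byte counts are carried.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

FlowKey = Tuple[str, str, int]            # (proto, address, port)
Packet = Tuple[float, FlowKey, int]       # (timestamp in sec, flow key, size in bytes)
Buckets = Dict[Tuple[int, FlowKey], int]  # (bucket start in sec, flow key) -> bytes


def packets_to_micro_flows(packets: Iterable[Packet]) -> Buckets:
    """Summarize raw packets into 1-second micro flow byte counts."""
    out: Buckets = defaultdict(int)
    for ts, key, size in packets:
        out[(int(ts), key)] += size
    return dict(out)


def roll_up(buckets: Buckets, width_sec: int) -> Buckets:
    """Re-bucket finer records into coarser buckets of width_sec seconds."""
    out: Buckets = defaultdict(int)
    for (start, key), n_bytes in buckets.items():
        out[(start - start % width_sec, key)] += n_bytes
    return dict(out)
```

Usage under these assumptions: `micro = packets_to_micro_flows(pkts)`, then `flows = roll_up(micro, 60)` and `macro = roll_up(flows, 900)`. Each tier is far smaller than the one below it, which is why the coarse tiers can be centralized while the packets stay distributed.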
Distributed and centralized
[Diagram: Packets (nsec), Micro flow (1 sec), Flow (1 min), Macro flow (15+ min) placed across the WAN]
... and synchronized
[Diagram: Packets (nsec), Micro flow (1 sec), Flow (1 min), Macro flow (15+ min) across the WAN; DISTRIBUTED to CENTRALIZED]
Like a brain with many eyes...
[Diagram: Packets (nsec), Micro flow (1 sec), Flow (1 min), Macro flow (15+ min) across the WAN; DISTRIBUTED to CENTRALIZED]
The iP/F Architecture
storage layer
[Diagram: packet capture (packets, micro flow); flow collector (flow, macro flow); distributed vs. centralized]
The iP/F Architecture
computation layer
[Diagram: packet analytics, flow analytics; packet capture (packets, micro flow); flow collector (flow, macro flow); distributed vs. centralized]
The iP/F Architecture
protocol layer
[Diagram: REST API (XML/HTTP); packet analytics, flow analytics; packet capture (packets, micro flow); flow collector (flow, macro flow); distributed vs. centralized]
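To make the protocol layer concrete, here is a rough sketch of what a client of an XML-over-HTTP REST interface might look like. The talk does not give the actual resource paths or XML schema, so the base URL, the `/flows` path, the query parameters, and the sample response are all hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List
from urllib.parse import urlencode

# Hypothetical base URL and resource path; the talk does not give the real ones.
BASE_URL = "https://profiler.example.com/api"


def flows_url(start: int, end: int) -> str:
    """Build a query URL for flow records in a time window (hypothetical path)."""
    return f"{BASE_URL}/flows?{urlencode({'start': start, 'end': end})}"


def parse_flows(xml_text: str) -> List[Dict[str, str]]:
    """Turn an XML list of <flow .../> elements into plain dictionaries."""
    root = ET.fromstring(xml_text)
    return [dict(flow.attrib) for flow in root.iter("flow")]


# Made-up response body, shaped only for this illustration.
SAMPLE = """<flows>
  <flow proto="tcp" address="10.0.0.1" port="443" bytes="1500" pkts="15"/>
  <flow proto="udp" address="10.0.0.2" port="53" bytes="120" pkts="2"/>
</flows>"""

for record in parse_flows(SAMPLE):
    print(record["address"], record["bytes"])
```

The point of putting the same style of interface on both the distributed and centralized sides is that a client can move between flow data and packet data without switching protocols.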
The iP/F Architecture
presentation layer
[Diagram: flow console, packet console, wireshark; REST API (XML/HTTP); packet analytics, flow analytics; packet capture (packets, micro flow); flow collector (flow, macro flow); distributed vs. centralized]
The iP/F Architecture
[Diagram: flow console, packet console, wireshark; REST API (XML/HTTP); packet analytics, flow analytics; packet capture (packets, micro flow); flow collector (flow, macro flow). Product labels: Cascade Profiler, Cascade Pilot, Cascade Shark, Cascade Gateway]
The iP/F Architecture
[Diagram: flow console, packet console, wireshark integration; REST API (XML/HTTP); packet analytics, flow analytics; packet capture (packets, micro flow); flow collector (flow, macro flow). Product labels: Cascade Profiler, Cascade Pilot, Cascade Shark, Cascade Gateway]
A real example
[Map: London, Hong Kong]
The network is slow!
The problem was somewhere in here
Everything is fine here and there...
iP/F Drill Down
Packets Micro flow Flow Macro flow
Find the right Shark box...
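The drill-down step can be sketched as a lookup from a flow record to the capture device and packet filter needed to pull the matching packets. This assumes each flow record carries the name of the device that observed it, which the talk implies with its "single logical record" but does not spell out; the `capture_device` field and all names below are hypothetical. The filter string uses ordinary pcap filter syntax.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class FlowRecord:
    """One flow record; capture_device is an assumed field for this sketch."""
    proto: str
    address: str
    port: int
    start: int            # window start, seconds since epoch
    end: int              # window end, seconds since epoch
    capture_device: str   # hypothetical name of the box that saw the packets


def find_capture_device(records: List[FlowRecord], address: str,
                        when: int) -> Optional[FlowRecord]:
    """Return the first record for address whose window covers 'when'."""
    for rec in records:
        if rec.address == address and rec.start <= when < rec.end:
            return rec
    return None


def capture_filter(rec: FlowRecord) -> str:
    """Build a pcap-style filter expression selecting the flow's packets."""
    return f"{rec.proto} and host {rec.address} and port {rec.port}"
```

With the device, the time window, and the filter in hand, only the few relevant packets need to leave the capture box instead of terabytes of trace.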
Now we can find the needle...
[Map: London, Hong Kong, Bangalore office]
Here it is, took 15 min instead of days to track down
Minimize “mean-time to innocence”
So how does it work?
• What makes the blinky red light come on?
• Lots of important building blocks but a key foundational technology is behavioral analytics
  – proactive instead of reactive
  – know when something is wrong before getting a call from an angry user
  – alerts without manually setting tens of thresholds
  – adaptive... no manual updates to thresholds when the network changes
Behavioral Analytics
[Plot: order entry system bandwidth (Mb/s), Sun to Sat; Week 1]
Behavioral Analytics
[Plot: order entry system bandwidth (Mb/s), Sun to Sat; Week 1 and Week 2]
Behavioral Analytics
[Plot: order entry system bandwidth (Mb/s), Sun to Sat; average bandwidth]
Behavioral Analytics
[Plot: order entry system bandwidth (Mb/s), Sun to Sat; variance]
Behavioral Analytics
[Plot: order entry system bandwidth (Mb/s), Sun to Sat]
Behavioral Analytics
[Plot: order entry system bandwidth (Mb/s), Sun to Sat]
ANOMALY ALARM
Behavioral Analytics
[Plot: order entry system bandwidth (Mb/s), Sun to Sat]
Problem fixed!
Behavioral Analytics
• Instead of telling the system how to detect a problem, tell it what performance policies you care about
• The analytics engine learns automatically by observing your network
• When things change, the engine adapts to changes by learning the “new normal”
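The idea of learning an average and a variance and then alarming on deviations can be sketched as follows. This is not the product's algorithm, which the talk does not describe; it is a minimal stand-in that uses an exponentially weighted mean and variance per time slot (for example, hour of the week). The class name, `alpha`, and `threshold` are all assumptions.

```python
from dataclasses import dataclass, field
from math import sqrt
from typing import Dict


@dataclass
class Baseline:
    """Learns a mean and variance per time slot (e.g. hour of the week)."""
    alpha: float = 0.2        # learning rate: higher adapts faster
    threshold: float = 3.0    # alarm when this many std devs from the mean
    mean: Dict[int, float] = field(default_factory=dict)
    var: Dict[int, float] = field(default_factory=dict)

    def observe(self, slot: int, value: float) -> bool:
        """Score value against the slot's baseline, then learn from it.

        Returns True if the value is anomalous for that slot.
        """
        if slot not in self.mean:
            self.mean[slot], self.var[slot] = value, 0.0
            return False
        diff = value - self.mean[slot]
        std = sqrt(self.var[slot])
        anomalous = std > 0 and abs(diff) > self.threshold * std
        # Exponentially weighted updates, so the baseline drifts toward
        # a sustained change (the "new normal").
        self.mean[slot] += self.alpha * diff
        self.var[slot] = (1 - self.alpha) * (self.var[slot] + self.alpha * diff * diff)
        return anomalous
```

No threshold in Mb/s is ever configured: the alarm level follows from what the slot has looked like before. In this simple version a single large outlier also widens the variance, so it adapts much faster than a production system probably should.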
Tying it all together
• Lots of customer discussions over past 9 months
• A very clear trend emerged...
  – A core set of common needs
  – A long tail of unique needs
[Chart: bread & butter; long tail of custom requests]
Tying it all together
• The solution: infrastructure programmability
  – programmatic access to analytics
  – customizable packet processing logic
  – iP/F REST layer
• End users can then
  – integrate across other IT systems
  – customize monitoring for their environment
  – build browser-based mashups that integrate with existing portals and applications
The iP/F Architecture
[Diagram: flow console, packet console, wireshark; REST API (XML/HTTP); packet analytics, flow analytics; packet capture (packets, micro flow); flow collector (flow, macro flow); distributed vs. centralized]
iP/F as a Platform
[Diagram: CASCADE PILOT, CASCADE PROFILER; iP/F REST; iP/F Platform: UI gadgets, flow info, packets, topology, alarms, behavioral analytics, capture control, system config., packet dissectors, reports, etc...]
iP/F as a Platform
[Diagram: CASCADE PILOT, CASCADE PROFILER; iP/F scripting API; iP/F REST; iP/F Platform: UI gadgets, flow info, packets, topology, alarms, behavioral analytics, capture control, system config., packet dissectors, reports, etc...]
iP/F as a Platform
[Diagram: CASCADE PILOT, CASCADE PROFILER, ... custom apps; iP/F scripting API; iP/F REST; iP/F Platform: UI gadgets, flow info, packets, topology, alarms, behavioral analytics, capture control, system config., packet dissectors, reports, etc...]
Some Examples
• Failover monitor
  – Monitor active connections to a critical server, and make sure clients reconnect to backup server if primary server fails
• IDS integration
  – When certain alerts are received from snort, reach out to shark appliances and save packet trace related to suspected threat
• Google maps mashup
  – Show network health geographically on a google map, and correlate failures and performance degradation with natural disasters
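As an illustration of the first example, the core check of a failover monitor fits in a few lines once the connection lists are in hand. How those lists would be fetched through the scripting API is not shown in the talk, so the sketch below starts from plain (client, server) pairs; the function name and addresses are hypothetical.

```python
from typing import Iterable, Set, Tuple

Connection = Tuple[str, str]  # (client address, server address)


def stranded_clients(before: Iterable[Connection],
                     after: Iterable[Connection],
                     primary: str, backup: str) -> Set[str]:
    """Clients that used the primary before it failed but are not
    connected to the backup afterwards."""
    had_primary = {client for client, server in before if server == primary}
    on_backup = {client for client, server in after if server == backup}
    return had_primary - on_backup


before = [("10.1.0.5", "10.0.0.1"), ("10.1.0.6", "10.0.0.1")]
after = [("10.1.0.5", "10.0.0.2")]
print(stranded_clients(before, after, primary="10.0.0.1", backup="10.0.0.2"))
```

A custom app like this is the kind of "long tail" request the platform is meant to cover: the logic is specific to one environment, while the flow data it needs is generic.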
Summary
• It’s hard to make things easy
  – An emerging industry trend is top-down visibility
  – Monitor, detect problem, troubleshoot
  – There seems to be a gap between vision and reality
• We think our approach fills the gap
  – iP/F to integrate flow monitoring and packet capture
  – Programmable infrastructure to customize the infrastructure for the “long tail” of requirements
  – Seamless integration with the popular wireshark tool
• We’re working hard to make it all happen