skydive, real-time network analyzer, container integration
Sylvain AFCHAIN
Principal Software Engineer
17/05/2016
Skydive
A real-time network analyzer
WHY?
SDN IS COMPLEX
Troubleshooting/monitoring is even more complex
Implementations
Management
Control plane
● OpenFlow
● XMPP
● BGP
● AMQP
● Etc.
Data plane
● VLAN
● VXLAN
● GRE
● MPLS
● OVS, Linuxbridge, other
Troubleshooting
Where...
● packets are dropped?
● packets are fragmented?
● choke points occur?
What...
● packet layers path?
● kind of traffic for this virtual network?
● number of flows on this link?
● number of TCP sessions?
● bandwidth for this tenant?
Current toolbox
● iproute2
● ovs-vsctl, ovs-ofctl, ovs-dpctl...
● ethtool
● brctl
● tcpdump
● SDN CLI/API
● SSH
● ...
What we need
● Flow centric
● Easy to deploy
● SDN-agnostic solution
● Non-intrusive / lightweight
● Open, API
● Connectors to SDN
What we need
● Topology capture
  a. interfaces, bond, MTU, VLAN
  b. bridges
  c. network namespaces
  d. etc.
● Flow capture
  a. on-demand traffic capture
  b. on-demand counter capture
  c. filtering
  d. underlay/overlay information
● Topology/flow aggregation
  a. mapping topology/flow
  b. analysis
Topology capture
● Graph engine, event based
● Gremlin-like query language
● Populated from:
○ netlink
○ netns
○ ovsdb
○ ethtool
● External connectors:
○ Docker
○ Neutron
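To make the "graph engine, event based" and "Gremlin-like query" bullets concrete, here is a minimal Python sketch. It is not Skydive's implementation (Skydive is written in Go); the class names, the event names and the node IDs are all assumptions made for illustration. It only shows the idea: every change to the graph emits an event, and queries are chained steps in the style of V().Has().Out().

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    id: str
    metadata: dict


@dataclass
class Graph:
    """Tiny in-memory graph that notifies listeners on every change."""
    nodes: dict = field(default_factory=dict)      # node id -> Node
    edges: dict = field(default_factory=dict)      # parent id -> list of child ids
    listeners: list = field(default_factory=list)  # callbacks taking (event, payload)

    def _emit(self, event, payload):
        for callback in self.listeners:
            callback(event, payload)

    def add_node(self, node_id, **metadata):
        self.nodes[node_id] = Node(node_id, metadata)
        self._emit("NodeAdded", node_id)

    def link(self, parent_id, child_id):
        self.edges.setdefault(parent_id, []).append(child_id)
        self._emit("EdgeAdded", (parent_id, child_id))

    def V(self):
        return Traversal(self, list(self.nodes.values()))


class Traversal:
    """Chainable steps, loosely modelled on Gremlin's V().Has().Out()."""

    def __init__(self, graph, nodes):
        self.graph, self.nodes = graph, nodes

    def Has(self, key, value):
        kept = [n for n in self.nodes if n.metadata.get(key) == value]
        return Traversal(self.graph, kept)

    def Out(self):
        out = [self.graph.nodes[child]
               for n in self.nodes
               for child in self.graph.edges.get(n.id, [])]
        return Traversal(self.graph, out)

    def names(self):
        return [n.metadata.get("Name") for n in self.nodes]


events = []
G = Graph()
G.listeners.append(lambda event, payload: events.append(event))

# Hypothetical nodes: a bridge, its port, and the interface behind the port.
G.add_node("1", Name="br-int", Type="ovsbridge")
G.add_node("2", Name="vm1-eth0", Type="ovsport")
G.add_node("3", Name="vm1-eth0", Type="veth")
G.link("1", "2")
G.link("2", "3")

result = G.V().Has("Type", "ovsbridge").Out().Out()
print(result.names())
print(events)
```

In a real deployment the add_node and link calls would be driven by probes such as netlink or ovsdb rather than written by hand.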
Topology capture
$ ip netns add vm1
$ ip link add vm1-eth0 type veth peer \
    name eth0 netns vm1
$ ip link set vm1-eth0 up
$ ip netns exec vm1 ip link set eth0 up
$ ip netns exec vm1 ip address add \
    10.0.0.1/24 dev eth0
$ ovs-vsctl add-port br-int vm1-eth0
Topology capture
$ skydive client topology query -q 'G.V().Has("Name", "vm1")'
[
  {
    "Host": "localhost.localdomain",
    "ID": "07236227-b280-4947-5ceb-c1f98e8515f3",
    "Metadata": {
      "Name": "vm1",
      "Type": "netns"
    }
  }
]
Topology capture
$ skydive client topology query -q 'G.V().Has("Type", "ovsbridge").Out().Out().Has("Name", Without("br-int"))'
[
  {
    "Host": "localhost.localdomain",
    "ID": "a190409e-f76e-4c8f-55b9-985e662a37c0",
    "Metadata": {
      "Driver": "veth",
      "IfIndex": 168,
      "MAC": "3e:88:b9:65:04:7e",
      "MTU": 1500,
      "Name": "vm1-eth0",
      "State": "UP",
      "Type": "veth",
      "UUID": "b6e9bf79-9b58-4b65-800e-1ddf9909d9dc"
    }
  }
]
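Because the query result is plain JSON, it can be post-processed with any scripting language. The sketch below parses the output shown on this slide (pasted in as a string) and reduces each node to a few fields. The summarize helper is a hypothetical name, not part of Skydive.

```python
import json

# Output copied from the query shown on the slide.
raw = """
[{"Host": "localhost.localdomain",
  "ID": "a190409e-f76e-4c8f-55b9-985e662a37c0",
  "Metadata": {"Driver": "veth", "IfIndex": 168,
               "MAC": "3e:88:b9:65:04:7e", "MTU": 1500,
               "Name": "vm1-eth0", "State": "UP",
               "Type": "veth",
               "UUID": "b6e9bf79-9b58-4b65-800e-1ddf9909d9dc"}}]
"""


def summarize(nodes):
    """Return one (host, name, type, state, mtu) tuple per node."""
    rows = []
    for node in nodes:
        meta = node.get("Metadata", {})
        rows.append((node.get("Host"), meta.get("Name"), meta.get("Type"),
                     meta.get("State"), meta.get("MTU")))
    return rows


rows = summarize(json.loads(raw))
for row in rows:
    print(row)
```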
Topology capture
$ docker run --name=webserver \
    -p 80:80 -d eboraas/apache
$ sudo docker run --name database \
    postgres
Topology capture
$ skydive client topology query -q 'G.V().Has("Type", "netns")'
[
  {
    "Host": "localhost.localdomain",
    "ID": "5674d492-e2e1-4e6f-63f4-3b9f1073da03",
    "Metadata": {
      "Docker.ContainerID": "5841d117701051542496d….994e5c2f2284e86c0ce17f2662",
      "Docker.ContainerName": "/webserver",
      "Docker.ContainerPID": 17216,
      "Manager": "docker",
      "Name": "webserver",
      "Type": "netns"
    }
  }
]
Flow capture
● Flow table centric
● Local mapping flow/topology
● Layer metrics
● Packet data from:
○ sFlow
○ Pcap
Flow capture
$ skydive client capture create \
    --probepath "*/br-int[Type=ovsbridge]"
{
  "ProbePath": "*/br-int[Type=ovsbridge]"
}
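The probe path above appears to read as "any host / node named br-int whose Type is ovsbridge". That reading is an assumption drawn from this one example, and the matcher below is a hypothetical Python sketch of it, not Skydive's actual path grammar or code.

```python
import re
from fnmatch import fnmatchcase

# Assumed shape: <host pattern>/<node name>[<key>=<value>], filter optional.
PROBE_RE = re.compile(
    r"^(?P<host>[^/]+)/(?P<name>[^\[\]]+)"
    r"(?:\[(?P<key>\w+)=(?P<value>[^\]]+)\])?$"
)


def matches(probe_path, host, metadata):
    """Return True if the node (host + metadata) is selected by probe_path."""
    m = PROBE_RE.match(probe_path)
    if m is None:
        raise ValueError(f"unsupported probe path: {probe_path!r}")
    if not fnmatchcase(host, m["host"]):      # "*" matches any host
        return False
    if metadata.get("Name") != m["name"]:
        return False
    if m["key"] is not None and str(metadata.get(m["key"])) != m["value"]:
        return False
    return True


path = "*/br-int[Type=ovsbridge]"
bridge = {"Name": "br-int", "Type": "ovsbridge"}
port = {"Name": "vm1-eth0", "Type": "veth"}
print(matches(path, "localhost.localdomain", bridge))
print(matches(path, "localhost.localdomain", port))
```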
Flow capture
$ ip netns exec vm1 ping 10.0.0.2
Flow schema
● Metrics per layer
● Unique ID per flow
● Unique ID per flow/capture
● Origin/Destination
● Capture point
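One way to picture this schema is a small flow table in Python. This is a sketch under assumptions, not Skydive's actual data model: the field names, the SHA-1 hashing scheme, the destination MAC address and the byte counts are all invented for illustration. It shows a direction-independent ID per flow, a second ID per flow and capture point, and counters kept per layer.

```python
import hashlib
from dataclasses import dataclass, field


def flow_id(layers):
    """Hash the endpoints of every layer; sorting each pair makes A->B and B->A match."""
    h = hashlib.sha1()
    for name, a, b, _nbytes in layers:
        lo, hi = sorted((a, b))
        h.update(f"{name}|{lo}|{hi};".encode())
    return h.hexdigest()


@dataclass
class Flow:
    flow_uuid: str       # identical wherever the flow is seen
    capture_uuid: str    # unique per (flow, capture point)
    capture_point: str
    metrics: dict = field(default_factory=dict)  # layer -> {"packets", "bytes"}


class FlowTable:
    def __init__(self, capture_point):
        self.capture_point = capture_point
        self.flows = {}  # flow_uuid -> Flow

    def update(self, layers):
        """Account one packet, given as a list of (layer, src, dst, bytes)."""
        fid = flow_id(layers)
        if fid not in self.flows:
            cid = hashlib.sha1(f"{fid}@{self.capture_point}".encode()).hexdigest()
            self.flows[fid] = Flow(fid, cid, self.capture_point)
        flow = self.flows[fid]
        for name, _src, _dst, nbytes in layers:
            m = flow.metrics.setdefault(name, {"packets": 0, "bytes": 0})
            m["packets"] += 1
            m["bytes"] += nbytes
        return flow


# Hypothetical ping request and reply between 10.0.0.1 and 10.0.0.2.
request = [("Ethernet", "3e:88:b9:65:04:7e", "02:00:00:00:00:02", 98),
           ("IPv4", "10.0.0.1", "10.0.0.2", 84)]
reply = [("Ethernet", "02:00:00:00:00:02", "3e:88:b9:65:04:7e", 98),
         ("IPv4", "10.0.0.2", "10.0.0.1", 84)]

br_int = FlowTable("br-int")
vm1_eth0 = FlowTable("vm1-eth0")
on_bridge = br_int.update(request)
br_int.update(reply)
on_veth = vm1_eth0.update(request)
print(on_bridge.metrics)
```

The shared flow_uuid is what lets the same flow be followed across capture points, while capture_uuid keeps the per-point counters apart.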
Skydive architecture
Agents:
● Capture topology
● Capture flows, maintain the flow table
● Local topology/flow mapping
● Forward topology/flow to analyzers
Analyzers:
● Aggregate topology/flow
● Global topology/flow mapping
● Store topology/flow in a database
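The division of work between agents and analyzers can be sketched in a few lines of Python. This is only an illustration of the roles listed above. The class and method names, the host names and the flow ID are assumptions, and plain dictionaries stand in for the database and for the network transport.

```python
class Analyzer:
    """Aggregates what agents forward. Plain dicts stand in for the database."""

    def __init__(self):
        self.topology = {}  # (host, node name) -> metadata
        self.flows = {}     # flow id -> {(host, node name): bytes seen there}

    def receive(self, host, nodes, flow_table):
        for name, metadata in nodes.items():
            self.topology[(host, name)] = dict(metadata)
        for (flow_id, node), nbytes in flow_table.items():
            self.flows.setdefault(flow_id, {})[(host, node)] = nbytes

    def capture_points(self, flow_id):
        """Global topology/flow mapping: every place a flow was observed."""
        return sorted(self.flows.get(flow_id, {}))


class Agent:
    def __init__(self, host, analyzer):
        self.host = host
        self.analyzer = analyzer
        self.nodes = {}       # node name -> metadata (local topology)
        self.flow_table = {}  # (flow id, node name) -> byte counter

    def add_node(self, name, **metadata):
        self.nodes[name] = metadata

    def observe(self, flow_id, node, nbytes):
        # Local topology/flow mapping: a flow is always tied to a known node.
        if node not in self.nodes:
            raise KeyError(f"unknown node {node!r} on {self.host}")
        key = (flow_id, node)
        self.flow_table[key] = self.flow_table.get(key, 0) + nbytes

    def forward(self):
        self.analyzer.receive(self.host, self.nodes, self.flow_table)


analyzer = Analyzer()
agent_a = Agent("host-a", analyzer)
agent_b = Agent("host-b", analyzer)
agent_a.add_node("vm1-eth0", Type="veth")
agent_b.add_node("vm2-eth0", Type="veth")

# The same (hypothetical) flow is seen on both hosts.
agent_a.observe("flow-1", "vm1-eth0", 98)
agent_a.observe("flow-1", "vm1-eth0", 98)
agent_b.observe("flow-1", "vm2-eth0", 98)
agent_a.forward()
agent_b.forward()
print(analyzer.capture_points("flow-1"))
```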
Kubernetes integration
Demo
Skydive use cases
● Detection of common configuration errors
● Detection of live network issues
○ poor performance, helping to find the root cause
○ DDoS and any unexpected traffic
● Possibility to capture traffic at any point
○ History of all the captured metrics
○ Post-mortem analysis
● Detection of bad application performance, bad RTT, wrong security groups
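As an example of what "detection of common configuration errors" could look like once the topology is available as data, the Python sketch below flags linked interfaces whose MTUs differ. The slides do not say which errors are detected, so the MTU check, the function name and the sample nodes (including the 1450 value) are all assumptions for illustration.

```python
def mtu_mismatches(nodes, links):
    """Return (a, b, mtu_a, mtu_b) for each linked pair whose MTUs differ.

    nodes: node name -> metadata dict (as in the "Metadata" of a query result)
    links: iterable of (node name, node name) pairs
    """
    issues = []
    for a, b in links:
        mtu_a = nodes[a].get("MTU")
        mtu_b = nodes[b].get("MTU")
        # Nodes without an MTU (for example a namespace) are skipped.
        if mtu_a is not None and mtu_b is not None and mtu_a != mtu_b:
            issues.append((a, b, mtu_a, mtu_b))
    return issues


# Hypothetical topology: one side of a veth pair left at a smaller MTU.
nodes = {
    "vm1-eth0": {"Type": "veth", "MTU": 1500},
    "eth0": {"Type": "veth", "MTU": 1450},
    "br-int": {"Type": "ovsbridge", "MTU": 1500},
    "vm1": {"Type": "netns"},
}
links = [("vm1-eth0", "eth0"), ("br-int", "vm1-eth0"), ("vm1", "eth0")]

issues = mtu_mismatches(nodes, links)
for a, b, mtu_a, mtu_b in issues:
    print(f"MTU mismatch: {a} ({mtu_a}) <-> {b} ({mtu_b})")
```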
Skydive roadmap
● Topology capture
○ More probes: OpenFlow, L3 information
○ Versioning
● Live distributed capture
○ Filtering
● Analysis
○ More protocols
○ Alerting
● Security
○ RBAC
○ SSL
○ IP anonymization
Open source
Apache License
Written in Go
Contributions are welcome
Questions?
https://github.com/redhat-cip/skydive
IRC: #skydive-project
@[email protected]@redhat.com