Arista Networks – France IX – 2013-09-26
Network flow automation and Visibility
Corporate Overview
Are your workloads moving and scaling at an increased rate?
Are you still waiting for provisioning to happen in seconds, not weeks?
Do your operations run 24x7 with no planned downtime?
How to solve the network challenges
with solutions shipping today…
Agenda
- Introduction
- Evolution of Automation in Networking
- SDN Use Cases
- Telemetry
- Q/A
Introduction
Data Centre Transport
Agreement on the physical topology
Physical architecture: Clos leaf/spine
- Consistent any-to-any latency/throughput
- Consistent performance for all racks
- Fully non-blocking architecture if required
- Simple scaling of new racks
Spine layer: 10GbE/40GbE, Layer 2/3
Leaf layer: 10GbE/1GbE, Layer 2/3
For east-west traffic flows
- Consistent performance, subscription and latency between all racks
- Consistent performance and latency with scale
- Architecture built for any-to-any data center traffic flows
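The "subscription" and "fully non-blocking if required" points reduce to per-leaf arithmetic: host-facing bandwidth divided by spine-facing bandwidth. A minimal sketch, assuming hypothetical port counts (48 x 10GbE downlinks and 4 x 40GbE uplinks are illustrative, not taken from the slides):

```python
from fractions import Fraction


def oversubscription(downlinks, downlink_gbps, uplinks, uplink_gbps):
    """Ratio of host-facing to spine-facing bandwidth on one leaf switch."""
    return Fraction(downlinks * downlink_gbps, uplinks * uplink_gbps)


def is_non_blocking(ratio):
    """A leaf is non-blocking when uplink capacity covers all downlinks."""
    return ratio <= 1


# Hypothetical leaf: 48 x 10GbE to servers, 4 x 40GbE to the spine layer.
ratio = oversubscription(48, 10, 4, 40)
print(f"{ratio.numerator}:{ratio.denominator}")  # 3:1
print(is_non_blocking(ratio))                    # False
```

Because every leaf has the same uplink layout in a Clos design, the same ratio applies to every rack, which is where the "consistent" claim comes from.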
Data Centre Transport
Building the Layer 3 network
- Physical Clos leaf/spine architecture
- Equal Cost Multi-Pathing (ECMP) for active-active forwarding
- Standard protocols and standard hardware: no increase in management or operational cost, minimal risk
Strengths
- All links are active and forwarding traffic
- Distributed failure domain, with a multiple-spine topology
- MAC addresses and VLANs stay in the rack
Issue
- VMs cannot move between racks, …
Layer 3 Clos topology allows better control of MAC and VLAN usage
[Diagram: racks on Subnet-A through Subnet-F; Layer 2 within the rack on L2/L3 leaf switches; Layer 3 with ECMP between leaf and spine, using OSPF or BGP; spine built from 1U or chassis L3 switches as a physically distributed, resilient core]
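ECMP keeps all leaf-to-spine links active by hashing each flow onto one of the equal-cost paths. A minimal sketch of the idea, assuming a CRC32 over the 5-tuple as a stand-in for the hardware hash (real switches use their own hash functions and fields; the spine names are hypothetical):

```python
import zlib

SPINES = ["spine-1", "spine-2", "spine-3", "spine-4"]  # hypothetical names


def pick_spine(src_ip, dst_ip, proto, src_port, dst_port, spines=SPINES):
    """Hash the flow 5-tuple so every packet of a flow uses the same uplink."""
    key = f"{src_ip}|{dst_ip}|{proto}|{src_port}|{dst_port}".encode()
    return spines[zlib.crc32(key) % len(spines)]


# Packets of one flow always map to the same spine, so they stay in order;
# different flows spread across the available spines.
flow = ("10.10.10.1", "20.20.20.1", "tcp", 49152, 80)
print(pick_spine(*flow) == pick_spine(*flow))  # True
```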
VXLAN was created to solve this issue
Virtual eXtensible LAN (VXLAN)
- IETF framework proposal, co-authored by Arista, VMware, Cisco, Citrix, Red Hat and Broadcom
- Announced at VMworld 2011
- vMotion without a large L2 network
- VM mobility across Layer 3 boundaries
- Integrates seamlessly with existing infrastructure
- Supported in hardware or software (vShield)
- Similar standard proposed by Microsoft: NVGRE for Microsoft Hyper-V
[Diagram: VM mobility across Layer 3 subnets. Two ESX hosts in Subnet A and Subnet B joined by a Layer 3 tunnel; VM-1 10.10.10.1/24, VM-2 20.20.20.1/24, VM-3 10.10.10.2/24, VM-4 20.20.20.1/24]
VM mobility within a best-practice Layer 3 network architecture
Overlay Network
Overlay network provides transparency
- Scalable Layer 2 services across a Layer 3 transport
- Decouples the requirements of the virtualized environment from the constraints of the physical network
- Tenant network transparent to the transport for Layer 3 scale
- Multi-tenancy with 24-bit tenancy ID and overlapping VLANs
- Network becomes a flexible bandwidth platform
[Diagram: overlay network segments VNI 2000 and VNI 3000 providing transparent L2 services over the Layer 3 transport of the physical infrastructure]
Scalable, multi-tenant Layer 2 services transparent to the Layer 3 transport network
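The 24-bit tenancy ID is the VXLAN Network Identifier (VNI) carried in an 8-byte header in front of the tenant's Ethernet frame. A minimal sketch of that header, assuming the layout later published in RFC 7348 (one flags byte with the "I" bit set, 24 reserved bits, the 24-bit VNI, 8 reserved bits); the function names are illustrative:

```python
import struct

VXLAN_FLAG_VNI_VALID = 0x08  # "I" flag: the VNI field is valid


def build_vxlan_header(vni):
    """Pack the 8-byte VXLAN header for a given 24-bit VNI."""
    if not 0 <= vni < 2**24:
        raise ValueError("VNI must fit in 24 bits")
    return struct.pack("!II", VXLAN_FLAG_VNI_VALID << 24, vni << 8)


def parse_vni(header):
    """Extract the VNI from the first 8 bytes of a VXLAN payload."""
    flags_word, vni_word = struct.unpack("!II", header[:8])
    if not (flags_word >> 24) & VXLAN_FLAG_VNI_VALID:
        raise ValueError("VNI flag not set")
    return vni_word >> 8


header = build_vxlan_header(3000)
print(header.hex())       # 08000000000bb800
print(parse_vni(header))  # 3000
print(2**24)              # 16777216 possible segments
```

For comparison, a 12-bit VLAN ID allows roughly four thousand segments, which is why the overlay can offer per-tenant isolation with overlapping VLANs.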
Virtual eXtensible LAN
VXLAN carries Ethernet, so it can work for:
- Any VMs
- Physical servers
- Any physical appliances: load balancing, FW, storage, …
Network Operating Systems were historically bound to specific hardware platforms
Required 100% vendor development – no customization
Had no linkages to other companies’ applications and platforms
Network Operating System
Legacy Network Infrastructure
The past: Outmoded development models
time for a change…
The future: Arista architecture for the SDCN
Virtualization – think cloud…
Universally capable infrastructure – enables any application and workload combination
Modular distributed system designed to be customized for customer’s IT operations
Open Partnering: connecting the network to the best and most powerful infrastructure-centric applications available
Series of powerful applications to run on the network in a distributed system
Network Applications
Programmatic and Flexible Network Operating System
Universal Cloud Network
The Foundation is Programmability
- OpenFlow or other external/internal agent to control custom flow paths
- Integration with OpenStack to support fully automated provisioning
- Native VMware/Nicira integration into vSphere, vCloud, NSX - VXLAN
- Native API calls with key partners enable network automation
Evolution of Automation in Networking
OpenStack is a cloud operating system that controls large pools of compute, storage, and networking resources throughout a datacenter, all managed through a dashboard that gives administrators control while empowering their users to provision resources through a web interface.
OpenStack
Source - http://www.openstack.org/software/
Software Defined Cloud Networks
OpenStack Development and Contribution
- OpenStack Quantum only supported a single network device driver, commonly the Open vSwitch driver
- A dual-stack driver for Quantum enables concurrent physical and virtual network element provisioning
- This makes OpenStack Quantum deployable and integrated across real-world networks
[Diagram: Nova and Quantum/Neutron, with an OVS virtual driver and a physical driver]
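The dual-driver idea can be pictured as one entry point that provisions the virtual switch and the physical switch together. A minimal sketch under stated assumptions: the class and method names below are hypothetical and do not reproduce the actual Quantum/Neutron plugin API.

```python
class OvsDriver:
    """Stand-in for the virtual (Open vSwitch) driver; hypothetical interface."""

    def __init__(self):
        self.networks = {}

    def create_network(self, net_id, vlan):
        self.networks[net_id] = vlan

    def delete_network(self, net_id):
        self.networks.pop(net_id, None)


class PhysicalDriver(OvsDriver):
    """Stand-in for the physical switch driver; same hypothetical interface."""


class DualDriver:
    """Provision virtual and physical elements together, rolling back on error."""

    def __init__(self, virtual, physical):
        self.virtual = virtual
        self.physical = physical

    def create_network(self, net_id, vlan):
        self.virtual.create_network(net_id, vlan)
        try:
            self.physical.create_network(net_id, vlan)
        except Exception:
            # Keep both sides consistent if the physical step fails.
            self.virtual.delete_network(net_id)
            raise


virtual, physical = OvsDriver(), PhysicalDriver()
DualDriver(virtual, physical).create_network("tenant-a", 100)
print(virtual.networks == physical.networks)  # True
```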
VMTracer Provides Automated Integration and Visibility
7050# show vmtracer interface Ethernet48
Ethernet48: esx1.aristanetworks.com/ndsTest/dvuplink1
VM Name    Network Adapter      VLAN  Status  State
-------------------------------------------------------
Exchange   Network adapter 4    7     up/up   --
Apache     Network adapter 3    6     up/up   vMotion
MySQL      Network adapter 1    5     up/up   FT-A
VLAN Trunks Opened/Pruned based on allowed range
VLANs Created/Removed based on VM demand
VLAN Trunks Opened/Pruned based on VM demand
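"Based on VM demand" means the VLAN set on a port follows the VMs seen behind it. A minimal sketch that derives that set from output shaped like the sample above; the column layout and the regular expression are assumptions drawn from this one slide, and VM names containing spaces are not handled:

```python
import re

SAMPLE = """\
VM Name    Network Adapter      VLAN  Status  State
-------------------------------------------------------
Exchange   Network adapter 4    7     up/up   --
Apache     Network adapter 3    6     up/up   vMotion
MySQL      Network adapter 1    5     up/up   FT-A
"""

ROW = re.compile(r"^(\S+)\s+(Network adapter \d+)\s+(\d+)\s+(\S+)\s+(\S+)\s*$")


def parse_vmtracer(text):
    """Turn the table rows into dictionaries, skipping header and rule lines."""
    rows = []
    for line in text.splitlines():
        match = ROW.match(line)
        if match:
            vm, adapter, vlan, status, state = match.groups()
            rows.append({"vm": vm, "adapter": adapter, "vlan": int(vlan),
                         "status": status, "state": state})
    return rows


def vlans_in_demand(rows):
    """VLANs that at least one active VM on this port needs."""
    return sorted({row["vlan"] for row in rows if row["status"] == "up/up"})


print(vlans_in_demand(parse_vmtracer(SAMPLE)))  # [5, 6, 7]
```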
Arista EOS Extensibility: XMPP
XMPP Client/Server Multi-device Management
• Scalable, Simple, Open - Multi-device Message Bus
• Open standard, open source
• Federated Architecture
• Scales to 10,000+ Switches
• Encrypted, AAA
• Message Replay/Logging
Puppet & Chef Integration
[Diagram labels: Puppet Master, Puppet Agent, Puppet IPC; Chef clients]
- Chef Server is a centralized store of your infrastructure’s configuration
- Chef Workstation is a client with a local Chef repository and a properly configured Knife
- Knife is used to communicate with Nodes using SSH
- Systems managed by Chef are called Nodes
- Abstraction of EOS configuration into “Resources”
- Support for “template-based” automation tools
- Automate mundane tasks, reduce errors, save OPEX, respond to events in real time
- Natively run client/agent in EOS
- Tight integration with EOS tools (ZTP, AEM, CLI extensibility)
SDN Use Cases
Used features for deployment example
- ZTP (Zero Touch Provisioning): allows DHCP/BOOTP and script download
- Advanced Event Manager: Event Manager configuration
- Opscode Chef automation: allows Chef client integration for management
- CloudVision (XMPP): remote control of single and multiple devices simultaneously
[Diagram: ZTP exchange with a DHCP server and a TFTP/HTTP server. 1: DHCP offer; 2: config request; 3: config file or script]
Chef Integration Example
/usr/bin/chef-client
- Installs using the standard Omnibus package and runs natively in EOS
- Can run as a daemon or scheduled task
Cookbooks
- eos (used for today’s demonstration)
- ohai (used for custom Ohai plugins)
Recipes
- eos::default -- set up environment + custom Ohai plugins
- eos::cloudvision -- manage CloudVision extension
Roles
- eos_base -- ohai, eos::default
- eos_cloudvision -- eos::cloudvision
Data Bags
- Provide configuration parameters based on Arista OUI and system MAC address
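One way to read the data bag note is a two-level lookup: vendor-wide defaults keyed by OUI, overridden by per-switch entries keyed by the full system MAC. A minimal sketch of that reading; the dictionary layout, keys and values are hypothetical and the precedence rule is an assumption, not the demo's actual data bag schema:

```python
# Hypothetical data bag contents. Keys are lower-case MAC prefixes or full MACs.
DATA_BAG = {
    "00:1c:73": {"role": "eos_base"},                      # OUI-wide default
    "00:1c:73:aa:bb:cc": {"role": "eos_cloudvision",       # one specific switch
                          "hostname": "leaf-1"},
}


def lookup(system_mac, bag=DATA_BAG):
    """Merge OUI-level defaults with any entry for the exact system MAC."""
    mac = system_mac.lower()
    oui = mac[:8]  # first three octets, e.g. "00:1c:73"
    params = dict(bag.get(oui, {}))
    params.update(bag.get(mac, {}))
    return params


print(lookup("00:1C:73:AA:BB:CC"))  # {'role': 'eos_cloudvision', 'hostname': 'leaf-1'}
print(lookup("00:1c:73:00:00:01"))  # {'role': 'eos_base'}
```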
Potential Use Cases
- Network configuration and state inventory with Ohai
- Standardized routing policy
- Extending Chef command line tools into EOS CLI
- Configuration version control (using git or svn)
- Deploying EOS extensions
- Standardized interface templates
- Security policy
- Software image updates
Configuration methodology: Development, QA, Deployment
Flowchart
Deployment Example
Stage 1: Arista ZTP
- Factory default switch boots
- No startup-config triggers ZTP
- DHCP request sent out all ports
- DHCP response including option 67 (bootfile)
- Download specified bootfile
- Restart switch
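The Stage 1 steps can be sketched as a small decision function. This is a simulation only, not EOS code: the function names are hypothetical, and treating a bootfile that starts with "#!" as a script (anything else as a startup-config) is an assumption about how the switch distinguishes the two.

```python
BOOTFILE_OPTION = 67  # DHCP option carrying the bootfile name


def classify_bootfile(contents):
    """Assumption: a leading '#!' marks a script; otherwise it is a config."""
    return "script" if contents.startswith("#!") else "startup-config"


def ztp(startup_config_exists, dhcp_options, download):
    """Return the ordered actions a factory-default switch would take."""
    if startup_config_exists:
        return ["boot normally"]
    bootfile = dhcp_options.get(BOOTFILE_OPTION)
    if bootfile is None:
        return ["retry DHCP"]
    kind = classify_bootfile(download(bootfile))
    action = "run script" if kind == "script" else "save as startup-config"
    return [f"download {bootfile}", action, "restart switch"]


# Hypothetical server content, keyed by bootfile name.
files = {"bootstrap.py": "#!/usr/bin/env python\nprint('hello')\n"}
print(ztp(False, {67: "bootstrap.py"}, files.__getitem__))
# ['download bootstrap.py', 'run script', 'restart switch']
```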
Stage 2: Arista AEM
- AEM fires onBoot trigger
- Python script bootstraps Chef agent
Stage 3: Opscode Chef
- Chef agent executes
- Loads Ohai data from Sysdb
- Checks for CloudVision extension
- Downloads CloudVision and installs it
Stage 4: Arista CloudVision
- CloudVision configured and operational
- Chef agent scheduled via CLI Scheduler
- Periodic runs by Chef agent keep configuration standardized
Deployment Example
✓ Factory default switch boots
✓ No startup-config triggers ZTP
✓ DHCP request sent out all ports
✓ DHCP response including option 67 (bootfile)
✓ Download specified bootfile
✓ Restart switch
✓ AEM fires onBoot trigger
✓ Python script bootstraps Chef agent
✓ Chef agent executes
✓ Loads Ohai data from Sysdb
✓ Checks for CloudVision extension
✓ Downloads CloudVision and installs it
✓ CloudVision configured and operational
✓ Chef agent scheduled via CLI Scheduler
✓ Periodic runs by Chef agent keep configuration standardized
Arista EOS and A10
VTEP hardware virtualises:
- Network appliances
- Storage
- Servers
[Diagram: two VTEPs on VNI 5001; network labels 10.0.0.0/24 and 51.51.51.0; a "Lost Service" table with columns ID, Address and On/Off, listing IDs 7 down to 1 at 10.0.0.7 down to 10.0.0.1]
[Diagram: Host-A and Host-B (addresses 10.11.11.2 and 10.10.10.2) behind Leaf-A and Leaf-B, connected through Spine-A, Spine-B and Spine-C. Traffic classes shown: Backup, HTTP, SMTP/Mail, SIP/Voice. Flow annotations: "10.11.11.0/24 via Spine-A @1800-2400, Backup via Spine-B" and "10.11.11.0/24 via Spine-C @1800-2400, Backup via Spine-C". DirectFlow enabled path selection via Spine-B <-> Spine-C]
DirectFlow: Redirecting flows without any controller
Practical Examples of SDN
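Controller-less flow redirection can be pictured as a local rule table matched on destination, traffic class and time of day. A minimal sketch of one possible reading of the slide (backup traffic to 10.11.11.0/24 steered via Spine-B between 18:00 and 24:00); the rule format is hypothetical and is not DirectFlow configuration syntax:

```python
import ipaddress

# (destination prefix, traffic class, start hour, end hour, spine): illustrative
RULES = [
    ("10.11.11.0/24", "backup", 18, 24, "Spine-B"),
]
DEFAULT_SPINE = "Spine-A"  # assumed normal path


def select_spine(dst_ip, traffic_class, hour, rules=RULES):
    """Return the spine for a flow; first matching rule wins."""
    addr = ipaddress.ip_address(dst_ip)
    for prefix, cls, start, end, spine in rules:
        in_prefix = addr in ipaddress.ip_network(prefix)
        if in_prefix and cls == traffic_class and start <= hour < end:
            return spine
    return DEFAULT_SPINE


print(select_spine("10.11.11.2", "backup", 19))  # Spine-B
print(select_spine("10.11.11.2", "http", 19))    # Spine-A
print(select_spine("10.11.11.2", "backup", 10))  # Spine-A
```

The point of the slide is that this decision lives on the switch itself, so bulk traffic can be moved off the paths used by HTTP, mail and voice without an external controller.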
Smart System Upgrade: Initiating Maintenance Mode
- Maintenance Mode initiated
- Snapshot: stores #neighbors, peers, etc.
- Directly connected VMware hosts put into maintenance mode
- A10 graceful shutdown activated
- Open protocols used to drain traffic
- Exception-based flow handling redirects traffic
[Diagram layers: Virtualization, Infrastructure]
Smart System Upgrade: General Operation
- Workload is moved
- Overlay facilitates virtual re-cabling
- Maintenance is performed on device
- Device brought back into service
- API calls inform other devices
- Maintenance summary sent to operations team
- Health checks are performed
- Removed from maintenance mode
- Workloads are rebalanced
[Diagram layers: Virtualization, Infrastructure]
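The snapshot taken when maintenance mode starts gives the later health check something to compare against. A minimal sketch of that comparison, assuming the snapshot is a simple mapping of category to names (the snapshot format and the neighbor names are hypothetical):

```python
def health_check(before, after):
    """Return what was present before maintenance but is missing afterwards."""
    missing = {}
    for category, expected in before.items():
        lost = set(expected) - set(after.get(category, ()))
        if lost:
            missing[category] = sorted(lost)
    return missing


# Hypothetical snapshots taken before and after the upgrade.
before = {"lldp_neighbors": ["spine-1", "spine-2"], "bgp_peers": ["10.0.0.1"]}
after = {"lldp_neighbors": ["spine-1"], "bgp_peers": ["10.0.0.1"]}

print(health_check(before, after))  # {'lldp_neighbors': ['spine-2']}
```

An empty result would be the condition for leaving maintenance mode and rebalancing workloads.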
Telemetry
How much is lack of visibility costing you?
Cost of an outage
Minutes  Cost
15       $84,000
30       $168,000
45       $252,000
60       $336,000
75       $420,000
90       $504,000
105      $588,000
120      $672,000
59% of Fortune 500 companies experience a minimum of 1.6 hours of downtime per week
- Dun & Bradstreet
Downtime costs $5,600 per minute
- Ponemon Institute
Think differently…
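The chart values follow directly from the quoted $5,600 per minute. A short check of that arithmetic (the function name is illustrative):

```python
COST_PER_MINUTE = 5_600  # USD per minute, the Ponemon Institute figure quoted above


def outage_cost(minutes):
    """Linear cost model: minutes of downtime times cost per minute."""
    return minutes * COST_PER_MINUTE


for minutes in (15, 30, 45, 60, 90, 120, 150):
    print(f"{minutes:>3} min  ${outage_cost(minutes):,}")
# First line printed:  15 min  $84,000
# Last line printed:  150 min  $840,000
```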
Timeline of a problem: is your network costing you money?
- $0: Application intermittently slow
- 30 minutes, $168,000: Users begin reporting issues
- 45 minutes, $252,000: Set up bridge, assemble the team
- 60 minutes, $336,000: Find the virtual machine
- 90 minutes, $504,000: Look at counters, review system logs
- 120 minutes, $672,000: Get a sniffer
- 150 minutes, $840,000: Review the capture, isolate problem
Advanced Mirroring
Advanced mirroring is used for in-path traffic mirroring.
- Capturing data traffic and forwarding it directly to a mirror port for analysis
- Destination could be a tap aggregation/matrix switch network for forwarding to a pool of analysis tools
[Diagram: switch in the forwarding path, mirroring traffic for analysis; labels: Eth-1, Eth-2, Eth-24, Server-A, Server-B, Analyzer]
How do we get from this…
To this…
Monitoring Infrastructure in action
Application Path Analysis:
Application Monitor
Switch    Latency  Status
Switch 1  350ns    OK
Switch 3  15µs     Heavy Congestion
Switch 6  733ns    OK
Switch 8  505ns    OK
Top 3 congested flows
Source      Destination  Port
10.2.2.13   10.50.14.4   3379
10.51.35.7  10.50.14.4   http
10.51.35.7  10.33.140.5  http
[Diagram values: 100µs, 12µs, 10µs]
Network-wide Visibility with Tap Aggregation
LANZ+Stream: Live Congestion Metrics
Aggregated Mirror Traffic: Live Traffic Feed, timestamped, filtered
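With per-hop latency available, finding the congested hop on an application path becomes a simple comparison. A minimal sketch using the four latencies from the slide; the 5µs threshold is a hypothetical value chosen for illustration:

```python
# Per-hop latency in nanoseconds, taken from the Application Monitor slide.
HOPS_NS = {"Switch 1": 350, "Switch 3": 15_000, "Switch 6": 733, "Switch 8": 505}
THRESHOLD_NS = 5_000  # hypothetical alert threshold


def congested_hops(hops, threshold_ns=THRESHOLD_NS):
    """Names of hops whose latency exceeds the threshold."""
    return [name for name, ns in hops.items() if ns > threshold_ns]


def share_of_path(hops, name):
    """Fraction of the end-to-end switch latency contributed by one hop."""
    return hops[name] / sum(hops.values())


print(congested_hops(HOPS_NS))                      # ['Switch 3']
print(sum(HOPS_NS.values()))                        # 16588
print(f"{share_of_path(HOPS_NS, 'Switch 3'):.0%}")  # 90%
```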
Thank You