2014 OpenStack Summit - Neutron OVS to LinuxBridge migration

Post on 02-Jul-2015

DESCRIPTION

Presentation titled 'Migrating production workloads from OVS to LinuxBridge'. Presented at the Fall 2014 OpenStack summit in Paris, this slide deck introduced the possibility of migrating live workloads from Open vSwitch to LinuxBridge with minimal downtime.

TRANSCRIPT

Migrating production workloads from OVS to LinuxBridge

Kevin Stevens (kevin.stevens@rackspace.com)

James Denton (james.denton@rackspace.com)

Kevin Stevens

•RPC Engineer since 2012 (Essex)

• IRC: k_stev on irc.freenode.net

We’re operators!

James Denton

• RPC Network Engineer since 2012 (Essex)

• IRC: busterswt on irc.freenode.net

What are we doing here?

1. A history of networking in Rackspace Private Cloud

2. Our experiences with Open vSwitch

3. Swapping out OVS with LinuxBridge

4. What to expect with each

2011: Started building OpenStack-powered private clouds

2012: Began architecting, building and supporting private clouds in customer DCs

2013: Over 100 customers running Rackspace Private Clouds

2014: Released RPC v9 based on Icehouse. 99.99% API uptime SLA.

                    RPC v2.0/3.0    RPC v4.0/4.1    RPC v4.2        RPC v9.0
OpenStack Release   Folsom          Grizzly         Havana          Icehouse
Network Stack       nova-network    Quantum         Neutron         Neutron
L2 Connectivity     flatDHCP        Open vSwitch    Open vSwitch    LinuxBridge (ML2)
L3 Agent Support    N/A             No              Yes             Yes

Host OS: Ubuntu 12.04 LTS; Ubuntu 12.04 LTS; CentOS 6.5; RHEL 6.5; Ubuntu 12.04 LTS; Ubuntu 14.04 LTS

The Evolution of Networking in RPC

Why Neutron?

Why Neutron w/ Open vSwitch?

• Open vSwitch pushed by community

• Open vSwitch pushed by packagers

• Wanted overlay networks

• Kernel panics (1.10)

• ovs-vswitchd segfaults (1.11)

• Broadcast storms

• Data corruption (2.01)

The problems

Why Linux Bridge?

• Looking for reliability and stability

• Fewer moving parts

• Easier to troubleshoot

• Supported by the community

Why move to LinuxBridge?

• Flexibility provided by overlay networking (if not using VXLAN)

•Neutron Distributed Virtual Routers (Juno)

•Any customizability provided by OVS not implemented by Neutron itself

www.rackspace.com 12

What do we lose by moving?

Planning

•Snapshot and delete all instances

•Delete all networks

•Change from OVS -> LB

•Recreate all networks

•Boot instances

•…

•It works but…

Plan A: Scorch the earth!

But wait… these are production environments!

•Deploy LinuxBridge environment

•Snapshot all instances

• Import images into new environment

•Build new instances

•Cutover

•…

•It works, but… $$$

Plan B: Migration Environment

•Stop services

•Update the database

•Change the configuration from OVS -> LB

•Restart services

•…

•Profit!

Plan C: Switch it out!

• Neutron OVS DB schema != Neutron LinuxBridge DB schema

–Migration to OVS ML2 DB schema is required first

• Overlay networks may not be supported

– LinuxBridge uses VXLAN rather than GRE

–Requires kernel >= 3.9

• Means GRE networks must be converted to VLAN networks

–Didn’t want to introduce additional complexity

–VLANs easier to troubleshoot if something went wrong

Issues with migrating
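The kernel requirement above is easy to check before committing to VXLAN on LinuxBridge. A minimal sketch, assuming the release string has the usual `major.minor...` shape; the function names are hypothetical and the sample strings in the usage note are only illustrations:

```python
import re


def parse_kernel_version(release):
    """Return (major, minor) from a release string such as '3.13.0-24-generic'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError("unrecognised kernel release: %r" % release)
    return int(match.group(1)), int(match.group(2))


def supports_linuxbridge_vxlan(release, minimum=(3, 9)):
    """True when the kernel meets the '>= 3.9' requirement quoted on the slide."""
    # Tuple comparison handles 3.13 > 3.9 correctly, unlike string comparison.
    return parse_kernel_version(release) >= minimum
```

On a live host the release string would typically come from `platform.release()` or `uname -r`. A host on a 3.2-series kernel fails the check, which is one reason the deck converts GRE networks to VLAN instead of VXLAN.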

The Process

•Determine what’s needed:

–Dependencies

–Some method of converting database to ML2 schema

–Some method of converting data to LB from OVS

–Which configuration files need mangling

–Which services need disabling

–Which services need restarting

–Roll-back plan

Preparation

•Can instances gain a DHCP lease?

•Do instances have internal/external connectivity?

•Are security groups/other functions still operational?

•Were instances placed into the correct bridge?

•Will the changes survive a reboot?

Define a successful outcome
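The "correct bridge" question can be answered from sysfs without parsing command output. A rough sketch, assuming the LinuxBridge agent names bridges `brq` plus the first 11 characters of the network UUID and that the kernel exposes a `brport/bridge` symlink for each bridge port; both should be verified on the release in use, and the helper names are hypothetical:

```python
import os


def expected_bridge_name(network_id):
    """Bridge name assumed for a network: 'brq' + first 11 chars of its UUID."""
    return "brq" + network_id[:11]


def current_bridge(interface, sysfs_net="/sys/class/net"):
    """Return the bridge an interface is attached to, or None if it is not a bridge port."""
    link = os.path.join(sysfs_net, interface, "brport", "bridge")
    if not os.path.islink(link):
        return None
    # The symlink points at the bridge device directory; its basename is the bridge name.
    return os.path.basename(os.readlink(link))


def in_correct_bridge(interface, network_id, sysfs_net="/sys/class/net"):
    """True when the tap sits in the brq bridge derived from its network UUID."""
    return current_bridge(interface, sysfs_net) == expected_bridge_name(network_id)
```

A tap that still reports a `qbr...` bridge after the migration was not moved, which matches the failure scenario described later in the deck.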

Normal OVS Operation (Network Node)

Normal OVS Operation (Compute Node)

• Backup! Backup! Backup!

•Use migrate_to_ml2.py (modified) to change the DB schema

•Update segments, ports and vlan tables

–Change GRE to VLAN

–Change segmentation id to real VLAN ID

–Set a provider bridge

First steps: Database manipulation
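The segment rewrite described above can be illustrated against a throwaway database. This is a sketch only: it uses an in-memory SQLite database as a stand-in for the Neutron database, the table and column names follow the Icehouse-era ML2 schema as best recalled and should be checked against the real one, and the network IDs, VLAN IDs and `physnet1` label are hypothetical. It does not cover the ports table mentioned on the slide.

```python
import sqlite3

# Hypothetical inputs: the real VLAN ID each network should use, and the
# provider (physical_network) label that will be mapped to a NIC.
VLAN_FOR_NETWORK = {"net-a": 101, "net-b": 102}
PHYSICAL_NETWORK = "physnet1"


def convert_gre_segments(conn, vlan_for_network, physical_network):
    """Rewrite GRE segments as VLAN segments and mark their VLAN IDs allocated."""
    rows = conn.execute(
        "SELECT id, network_id FROM ml2_network_segments WHERE network_type = 'gre'"
    ).fetchall()
    for segment_id, network_id in rows:
        vlan_id = vlan_for_network[network_id]
        conn.execute(
            "UPDATE ml2_network_segments "
            "SET network_type = 'vlan', physical_network = ?, segmentation_id = ? "
            "WHERE id = ?",
            (physical_network, vlan_id, segment_id),
        )
        conn.execute(
            "UPDATE ml2_vlan_allocations SET allocated = 1 "
            "WHERE physical_network = ? AND vlan_id = ?",
            (physical_network, vlan_id),
        )
    conn.commit()
    return len(rows)


def demo():
    """Build a tiny stand-in schema, convert it, and return the resulting segments."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE ml2_network_segments (
            id TEXT PRIMARY KEY, network_id TEXT, network_type TEXT,
            physical_network TEXT, segmentation_id INTEGER);
        CREATE TABLE ml2_vlan_allocations (
            physical_network TEXT, vlan_id INTEGER, allocated INTEGER);
        INSERT INTO ml2_network_segments VALUES ('seg-1', 'net-a', 'gre', NULL, 1);
        INSERT INTO ml2_network_segments VALUES ('seg-2', 'net-b', 'gre', NULL, 2);
        INSERT INTO ml2_vlan_allocations VALUES ('physnet1', 101, 0);
        INSERT INTO ml2_vlan_allocations VALUES ('physnet1', 102, 0);
        """
    )
    convert_gre_segments(conn, VLAN_FOR_NETWORK, PHYSICAL_NETWORK)
    return conn.execute(
        "SELECT network_id, network_type, physical_network, segmentation_id "
        "FROM ml2_network_segments ORDER BY network_id"
    ).fetchall()
```

Running the conversion against a restored copy of the backup first gives a cheap rehearsal of the roll-back plan.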

• Install the LinuxBridge plugin

•Update SQL connection strings

• Configure ml2_conf.ini / linuxbridge_conf.ini

• Change driver from OVS to ML2 in Neutron and Nova conf files

Next steps: Install and Configure
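A VLAN-only ML2 configuration for the LinuxBridge mechanism driver is small enough to generate. The sketch below assumes the option names from the Icehouse-era ML2 and LinuxBridge agent configuration (`type_drivers`, `mechanism_drivers`, `network_vlan_ranges`, `physical_interface_mappings`); the provider label, interface name and VLAN range are hypothetical placeholders, and a real deployment needs further settings such as the security group firewall driver.

```python
import configparser
import io


def build_ml2_config(physnet, interface, vlan_min, vlan_max):
    """Build a minimal VLAN-only ML2 + LinuxBridge configuration."""
    config = configparser.ConfigParser()
    config["ml2"] = {
        "type_drivers": "vlan",
        "tenant_network_types": "vlan",
        "mechanism_drivers": "linuxbridge",
    }
    config["ml2_type_vlan"] = {
        # provider label : first VLAN ID : last VLAN ID
        "network_vlan_ranges": "%s:%d:%d" % (physnet, vlan_min, vlan_max),
    }
    config["linux_bridge"] = {
        # provider label : host NIC that carries the tagged traffic
        "physical_interface_mappings": "%s:%s" % (physnet, interface),
    }
    return config


def render(config):
    """Return the configuration as INI text."""
    buffer = io.StringIO()
    config.write(buffer)
    return buffer.getvalue()
```

The provider label used here has to match the `physical_network` value written into the database during the previous step, otherwise port binding fails.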

• Stop Neutron services on all nodes

• Remove host data-plane port from the OVS bridge(s)

• Pull instance taps out of the OVS-related linux bridges

• Remove router and dhcp interfaces from OVS integration bridge

• Stop Open vSwitch

Next steps: Pull ports from bridges
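The three port-removal bullets map onto `ovs-vsctl del-port` and `brctl delif`. A dry-run sketch that only builds the command lists so they can be reviewed before anything is executed; the bridge, NIC and port names in the usage note are hypothetical, and the function is not the authors' script:

```python
def build_unplug_commands(provider_bridges, qbr_taps, int_bridge_ports,
                          int_bridge="br-int"):
    """Return the commands that detach ports, in the order given on the slide.

    provider_bridges: {ovs_bridge: host_nic} for the host data-plane ports
    qbr_taps:         {qbr_bridge: tap_device} for instance taps
    int_bridge_ports: router and DHCP ports attached to the integration bridge
    """
    commands = []
    for bridge, nic in provider_bridges.items():
        commands.append(["ovs-vsctl", "del-port", bridge, nic])
    for qbr, tap in qbr_taps.items():
        commands.append(["brctl", "delif", qbr, tap])
    for port in int_bridge_ports:
        commands.append(["ovs-vsctl", "del-port", int_bridge, port])
    return commands
```

For example, `build_unplug_commands({"br-eth1": "eth1"}, {"qbr0a1b2c3d-4e": "tap0a1b2c3d-4e"}, ["qr-11111111-22"])` yields three command lists that could be passed one at a time to `subprocess.check_call` once Neutron services are stopped.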

Interfaces removed from bridges

Stop openvswitch services

• Start Neutron services

• Restart compute services

Finally: Restart services

Post Service Restart (Network Node)

Post Service Restart (Compute Node)

• Instances unresponsive?

– Check traffic from tap->bridge->physical interface

–Verify VLANs properly trunked through (and VLANs created on the switch)

Failure Scenarios

• IPs disappear or taps placed in QBR bridges

– Check Nova instance_info_caches table.

–Cache can be regenerated with a hard reboot of instance, or by adding an interface to the instance

Failure Scenarios (Cont’d)

•Unable to boot new instances?

– Usual troubleshooting techniques should be used

•DHCP Binding_failed error messages?

– Check /etc/default/neutron-server is referencing ML2 configuration file

•BRQ bridges not built?

– Verify new agents are checking in

– Verify the LinuxBridge agent is installed and running

Failure Scenarios (Cont’d Cont’d)
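The `/etc/default/neutron-server` check can be scripted. A small sketch, assuming the file sets a shell-style `NEUTRON_PLUGIN_CONFIG` variable as the Ubuntu packaging of that era did; the variable name and the sample paths in the usage note are assumptions to confirm on the host:

```python
import shlex


def plugin_config_path(defaults_text):
    """Return the last NEUTRON_PLUGIN_CONFIG value in shell-style defaults text, or None."""
    result = None
    for raw_line in defaults_text.splitlines():
        # shlex strips quotes and trailing comments the way a shell would.
        for token in shlex.split(raw_line, comments=True):
            name, separator, value = token.partition("=")
            if separator and name == "NEUTRON_PLUGIN_CONFIG":
                result = value  # a later assignment overrides an earlier one
    return result


def references_ml2(defaults_text):
    """True when the configured plugin file looks like the ML2 configuration."""
    path = plugin_config_path(defaults_text)
    return path is not None and path.endswith("ml2_conf.ini")
```

Feeding it the contents of the defaults file, for instance `references_ml2(open("/etc/default/neutron-server").read())`, returns False while the file still points at the old OVS plugin configuration.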

Benchmarks

* Host-to-host testing; no virtualization. Longer is better.

Compare all the things

[Chart: iPerf3 Benchmarks (TCP / 1500 MTU / 10G Data) – Intel X520* (ixgbe driver). Aggregate Throughput (Gbps), axis 0.00 to 10.00, by # of Threads (1, 2, 4, 8). Series: Open vSwitch (VXLAN), LinuxBridge (VXLAN), Open vSwitch (GRE), Open vSwitch (VLAN), LinuxBridge (VLAN).]

* Host-to-host testing; no virtualization. Longer is better.

Compare all the things

[Chart: SCP File Transfers (10G file)*. Transfer Speed (MBps), axis 0.00 to 200.00. Series as listed: LB VLAN, OVS VLAN, OVS GRE, LB VXLAN, OVS VXLAN. Data labels as listed: 59.75 Seconds, 61.50 Seconds, 110.50 Seconds, 104.00 Seconds, 115.00 Seconds.]
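The chart plots transfer speed while its data labels are in seconds; the two are related by simple division. A sketch of that arithmetic under two loudly stated assumptions: that "10G file" means 10 GiB (10240 MiB), and that the labels and times pair up in the order they appear in the transcript.

```python
FILE_SIZE_MB = 10 * 1024  # assumption: "10G file" = 10 GiB, expressed in MiB

# Assumption: series names and times are paired by their order in the transcript.
TRANSFER_SECONDS = [
    ("LB VLAN", 59.75),
    ("OVS VLAN", 61.50),
    ("OVS GRE", 110.50),
    ("LB VXLAN", 104.00),
    ("OVS VXLAN", 115.00),
]


def transfer_speed_mbps(seconds, size_mb=FILE_SIZE_MB):
    """Average transfer speed in MB/s for a file of size_mb moved in the given time."""
    return size_mb / seconds


def speeds(measurements=TRANSFER_SECONDS):
    """Map each series name to its average speed, rounded to one decimal place."""
    return {label: round(transfer_speed_mbps(secs), 1) for label, secs in measurements}
```

Under those assumptions every result lands below the 200 MBps top of the chart axis, and the two VLAN configurations come out well ahead of the three overlay configurations.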

• OVS provides a great deal of functionality

• Network stability more important for our customers than being on the cutting edge

• Linux bridge provides almost all of the features we might want to use

• How to migrate existing environments to LinuxBridge

• Improved stability and comparable performance with OVS achieved

In Summary

Questions?

Download @ https://github.com/busterswt/openstackparis2014
