Technical Deep Dive into MidoNet
MidoNet Deep Dive
Pino de Candia
Agenda
1. Virtual Network Topology
2. Physical-Virtual Boundary
3. Cluster nodes (aka Network State DB)
4. Compute nodes
5. Flow Switch concept
6. Gateway nodes
7. How tunneling works
8. Change propagation and flow invalidation
9. L4 Flow State
MidoNet transforms this...
[Diagram: physical view. Compute hosts running VMs and Bare Metal Servers, all attached to an IP Fabric.]
into this...
[Diagram: virtual view. VMs grouped behind firewalls (FW) and load balancers (LB), connected to the Internet/WAN.]
then moves packets...
“Port-Interface Bindings”
● Vport1 => Compute1, tap12345
● Vport2 => Compute2, tap67890
● Uplink1 => Gateway1, eth1
Virtual-Physical Boundary
Bindings (and the virtual network topology) are stored in MidoNet’s cluster and propagated to the MidoNet Agents.
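As a rough illustration of what a port-interface binding table holds, the three bindings from the slide can be written as a small lookup structure. This is only a sketch: the `Binding` class, the `BINDINGS` dict and the helper names are hypothetical and are not MidoNet's API or storage format.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Binding:
    host: str       # physical host running a MidoNet Agent
    interface: str  # interface name on that host


# Values taken from the slide; the dict layout is an assumption.
BINDINGS: Dict[str, Binding] = {
    "Vport1": Binding("Compute1", "tap12345"),
    "Vport2": Binding("Compute2", "tap67890"),
    "Uplink1": Binding("Gateway1", "eth1"),
}


def locate(vport: str) -> Optional[Binding]:
    """Return where a virtual port is attached, or None if it is unbound."""
    return BINDINGS.get(vport)


def local_ports(host: str) -> Dict[str, str]:
    """Return {interface: vport} for the bindings that live on `host`."""
    return {b.interface: v for v, b in BINDINGS.items() if b.host == host}
```

An agent on Compute1 would only care about `local_ports("Compute1")`, while `locate` answers the question "which host do I tunnel to for this virtual port?".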
Cluster stores and propagates topology
[Diagram: physical view. midonet cluster 1, 2 and 3 sit on the IP Fabric alongside the compute hosts and Bare Metal Servers.]
Port-Interface Bindings
[Diagram: virtual view with three bindings marked.]
1. Vport1 => Compute1, tap12345
2. Vport2 => Compute2, tap67890
3. Uplink1 => Gateway1, eth1
Back to the physical view...
[Diagram: physical view. Compute 1, Compute 2 and midonet cluster 1, 2 and 3 on the IP Fabric.]
Port-Interface Bindings in the Physical View
The compute hosts in a little more detail
[Diagram: Compute 1 (IP1) and Compute 2 (IP2) each run a Flow Switch (in-kernel OVS) with a VXLAN Tunnel Port, and reach the IP Fabric through eth0.]
1. Vport1 => Compute1, tap12345 (port5 on Compute 1)
2. Vport2 => Compute2, tap67890 (port6 on Compute 2)
What is a flow switch?
[Diagram: Compute 1 (IP1) runs a Flow Switch (in-kernel OVS) with ports port1, port2, port6 and port8, a VXLAN Tunnel Port and eth0. The MidoNet Agent (Java Daemon) runs in user space. Example packets: 10.0.0.4->10.0.0.5, 10.0.0.3->200.0.0.5, 10.0.0.3->10.10.0.2.]
Rule1: Match: in=6, srcIP=10.0.0.4 ➔ Actions: []
Rule2: Match: in=8, srcIP=10.0.0.3, dstIP=200.0.0.5, proto=TCP, srcPort=23109, dstPort=22 ➔ Actions: [srcIP=111.0.0.4, tunnel=[src=192.168.0.3, dst=192.168.0.4, key=100], out=1]
Rule3: Match: in=8, srcIP=10.0.0.3, dstIP=10.10.0.2, proto=ICMP ➔ Actions: [srcMAC=M1, dstMAC=M2, out=2]
Miss packets go to user-space via Netlink channel
MidoNet can:
1. ignore it
2. send it back with actions
3. install a new flow rule
4. do both #2 and #3
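The miss-handling loop described on this slide can be sketched as a toy model. Everything here is illustrative: `FlowSwitch`, `Verdict` and the `agent` policy are hypothetical names, rules are plain field/value matches, and the Netlink upcall is replaced by an ordinary function call.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

Packet = Dict[str, object]                 # header fields, e.g. {"in": 6, "srcIP": ...}
Actions = List[str]                        # opaque action strings; [] means drop
Match = Tuple[Tuple[str, object], ...]     # fields that must be equal in the packet


@dataclass
class Verdict:
    match: Optional[Match] = None      # rule to install, if any (option 3)
    actions: Optional[Actions] = None  # actions for the rule and/or this packet
    execute: bool = False              # send this packet back with actions (option 2)


class FlowSwitch:
    """Toy datapath: match installed rules, hand misses to user space."""

    def __init__(self, on_miss: Callable[[Packet], Verdict]) -> None:
        self.rules: List[Tuple[Match, Actions]] = []
        self.on_miss = on_miss
        self.misses = 0

    def receive(self, pkt: Packet) -> Optional[Actions]:
        for match, actions in self.rules:
            if all(pkt.get(k) == v for k, v in match):
                return actions                      # fast path, no upcall
        self.misses += 1
        verdict = self.on_miss(pkt)                 # upcall to the agent
        if verdict.match is not None and verdict.actions is not None:
            self.rules.append((verdict.match, verdict.actions))
        return verdict.actions if verdict.execute else None


def agent(pkt: Packet) -> Verdict:
    # Hypothetical policy: drop everything from 10.0.0.4 (option 4), ignore the rest (option 1).
    if pkt.get("srcIP") == "10.0.0.4":
        match = (("in", pkt["in"]), ("srcIP", "10.0.0.4"))
        return Verdict(match=match, actions=[], execute=True)
    return Verdict()
```

With this policy the first packet from 10.0.0.4 causes one miss and installs a drop rule like Rule1; later packets of that flow are handled without reaching the agent.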
Port-Interface Bindings
[Diagram: virtual view with one binding marked.]
3. Uplink1 => Gateway1, eth1
Detail of the Gateway Node
[Diagram: Gateway 1 (IP3) and Compute 1 (IP1) each run a Flow Switch (in-kernel OVS) with a VXLAN Tunnel Port, and reach the IP Fabric through eth0. Gateway 1 also runs Quagga, bgpd and has eth1 facing the Internet/WAN/DC.]
1. Vport1 => Compute1, tap12345 (port5 on Compute 1)
3. Uplink1 => Gateway1, eth1
Back to the physical view...
[Diagram: physical view. midonet cluster 1, 2 and 3 and midonet gateway 1, 2 and 3 on the IP Fabric; the gateways also connect to the Internet/WAN/DC.]
Detail of the Gateway Node - pre-installed flows
[Diagram: Gateway 1 (IP3) runs Quagga, bgpd, the MidoNet Agent (Java Daemon) and a Flow Switch (in-kernel OVS) with port1, port2 and port3 (veth0), plus veth1, a VXLAN Tunnel Port, eth0 on the IP Fabric and eth1 facing the Internet/WAN/DC. Compute 1 (IP1) is shown with port5, tap12345.]
1. Vport1 => Compute1, tap12345
3. Uplink1 => Gateway1, eth1
Rule1: Match: in=2, srcIP=<Uplink1 Peer’s IP>, dstIP=<Uplink1’s IP>, proto=TCP, dstPort=BGP ➔ Actions: [out=3]
Rule2: Match: in=2, srcIP=<Uplink1 Peer’s IP>, dstIP=<Uplink1’s IP>, proto=TCP, srcPort=BGP ➔ Actions: [out=3]
Rule3: Match: in=3 ➔ Actions: [out=2]
Rule4: Match: in=2, ethertype=ARP, op=BOTH, srcIP=<Uplink1 Peer’s IP> ➔ Actions: [out=3, to-user-space]
● Flow rules are computed at the ingress host
● by simulating a packet’s path through the virtual topology
● without fetching any information off-box (~99% of the time)
● if the egress port is on a different host, then the packet is tunneled
● and the tunnel key encodes the egress port
● so that no computation is needed at the egress
MidoNet uses VNIs to encode Vports - NOT network segments.
Flow rule computation and tunneling
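The bullets above can be sketched in a few lines. This is an illustration under stated assumptions, not MidoNet code: the tables, the VNI values and the `simulate` stand-in are hypothetical, and a real simulation walks bridges, routers and chains rather than a single dict.

```python
from typing import Dict, List, Tuple

# Hypothetical local copies of cluster state held by an agent.
BINDINGS: Dict[str, Tuple[str, int]] = {   # vport -> (host, local datapath port)
    "Vport1": ("Compute1", 5),
    "Vport2": ("Compute2", 6),
}
HOST_IP = {"Compute1": "IP1", "Compute2": "IP2"}
VNI = {"Vport1": 1001, "Vport2": 1002}     # one tunnel key per vport; values made up
TUNNEL_PORT = 1                            # datapath port of the tunnel port


def simulate(ingress_vport: str, dst_ip: str) -> str:
    """Stand-in for the virtual topology simulation: returns the egress vport."""
    routes = {"10.0.0.4": "Vport1", "10.0.0.5": "Vport2"}  # hypothetical
    return routes[dst_ip]


def compute_actions(local_host: str, ingress_vport: str, dst_ip: str) -> List[str]:
    """Compute flow actions at the ingress host only."""
    egress = simulate(ingress_vport, dst_ip)
    host, port = BINDINGS[egress]
    if host == local_host:
        return [f"out={port}"]                      # egress port is local
    tunnel = f"tunnel=[src={HOST_IP[local_host]}, dst={HOST_IP[host]}, key={VNI[egress]}]"
    return [tunnel, f"out={TUNNEL_PORT}"]           # key names the egress vport
```

Because the key identifies a single virtual port rather than a network segment, the receiving host needs no topology knowledge to deliver the packet.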
Pre-installed flows on the compute hosts
[Diagram: Compute 1 (IP1) and Compute 2 (IP2) each run a Flow Switch (in-kernel OVS) with port1, a VXLAN Tunnel Port and eth0 on the IP Fabric. A packet ExtIP->VM1 arrives encapsulated as IP3 -> IP1 with the VNI of VM1 and is delivered as ExtIP->VM1.]
Rule1: Match: in=1, tunKey=<VNI of VM1> ➔ Actions: [out=2]
Rule2: Match: in=1, tunKey=<VNI of VM2> ➔ Actions: [out=3]
Rule3: Match: in=1, tunKey=<VNI of VM3> ➔ Actions: [out=4]
… and so on...
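The pre-installed rules amount to a direct table from tunnel key to local output port. A minimal sketch, assuming the input is a hypothetical `{VNI: local datapath port}` map for the locally bound vports:

```python
from typing import Dict, Optional, Tuple

TUNNEL_PORT = 1  # the rules on the slide match in=1


def preinstalled_flows(local_vnis: Dict[int, int]) -> Dict[Tuple[int, int], int]:
    """Build {(in_port, tun_key): out_port} for every locally bound vport."""
    return {(TUNNEL_PORT, vni): port for vni, port in local_vnis.items()}


def egress_lookup(flows: Dict[Tuple[int, int], int], in_port: int, tun_key: int) -> Optional[int]:
    """Pure table lookup at the egress host; None means no matching flow."""
    return flows.get((in_port, tun_key))
```

The VNI numbers used with these helpers are placeholders; the point is that delivery at the egress is a lookup with no simulation.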
A flow between two VMs...
[Diagram: virtual view. Packet header labels shown along the path: VM1->FIP1, FIP2->FIP1, FIP2->VIP1, VIP1->VM2.]
...is tunneled C1 to C2 (no middle compute nodes)
[Diagram: Compute 1 (IP1) and Compute 2 (IP2) each run a Flow Switch (in-kernel OVS) with a VXLAN Tunnel Port on the IP Fabric. VM1->FIP1 enters on port5, tap12345. The rewritten packet VIP1->VM2 crosses the fabric encapsulated as IP1 -> IP2 with the VNI of VM2 and is delivered as VIP1->VM2. The host network stack performs encapsulation on the sending host and decapsulation on the receiving host.]
New Rule: Match: in=5, srcIP=VM1, dstIP=FIP1, proto=TCP ➔ Actions: [srcIP=VIP1, dstIP=VM2, tunnel=[src=IP1, dst=IP2, key=<VNI of VM2>], out=1]
A flow that exits an uplink...
[Diagram: virtual view. Packet header labels shown along the path: VM1->ExtIP1, FIP1->ExtIP1.]
...is tunneled C1 to L3GW node
[Diagram: Compute 1 (IP1) and Gateway 1 (IP3) each run a Flow Switch (in-kernel OVS) with a VXLAN Tunnel Port, and reach the IP Fabric through eth0. Gateway 1 also runs Quagga, bgpd and has eth1 facing the Internet/WAN/DC. VM1->ExtIP1 enters on port5, tap12345. The rewritten packet FIP1->ExtIP1 crosses the fabric encapsulated as IP1 -> IP3 with the Uplink1 VNI and leaves as FIP1->ExtIP1.]
New Rule: Match: in=5, srcIP=VM1, dstIP=ExtIP1, proto=TCP ➔ Actions: [srcIP=FIP1, dstIP=ExtIP1, tunnel=[src=IP1, dst=IP3, key=<VNI of Uplink1>], out=1]
If an uplink fails...
[Diagram: virtual view of the topology and its Internet/WAN uplinks.]
...notify whoever needs to know
[Diagram: physical view. midonet cluster 1, 2 and 3 and midonet gateway 1, 2 and 3 on the IP Fabric; the gateways also connect to the Internet/WAN/DC.]
The receiving Agent invalidates related rules
[Diagram: Compute 1 (IP1) runs a Flow Switch (in-kernel OVS) with port1, a VXLAN Tunnel Port, eth0 and port5, tap12345, plus the MidoNet Agent (Java Daemon). The flow VM1->ExtIP1 is shown.]
Uplink1 is Down
New Rule: Match: in=5, srcIP=VM1, dstIP=ExtIP1, proto=TCP ➔ Actions: [srcIP=FIP1, dstIP=ExtIP1, tunnel=[src=IP1, dst=IP3, key=<VNI of Uplink1>], out=1]
If the flow is still active, a miss packet will be sent to the MN Agent via Netlink, and a new flow rule can be computed that doesn’t use the failed uplink.
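One way to picture "invalidates related rules" is to index each installed flow by the virtual devices its simulation depended on, so that a change to one device removes only the affected flows. The sketch below assumes such a tag index; the `FlowTable` class and the tag names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, List, Set


class FlowTable:
    """Installed flows indexed by the devices their simulation touched."""

    def __init__(self) -> None:
        self.flows: Dict[Hashable, List[str]] = {}
        self.by_tag: Dict[str, Set[Hashable]] = defaultdict(set)

    def install(self, match: Hashable, actions: List[str], tags: Iterable[str]) -> None:
        self.flows[match] = actions
        for tag in tags:
            self.by_tag[tag].add(match)

    def invalidate(self, tag: str) -> int:
        """Remove every flow that depended on `tag`; return how many were removed."""
        removed = 0
        for match in self.by_tag.pop(tag, set()):
            # A flow may already be gone if another of its tags was invalidated first.
            if self.flows.pop(match, None) is not None:
                removed += 1
        return removed
```

After `invalidate("Uplink1")`, the next packet of an affected flow misses in the datapath and is re-simulated against the updated topology, as the slide describes.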
If a flow had L4 state (SNAT)...
[Diagram: virtual view. Packet header labels shown along the path: VM1->ExtIP1, FIP1->ExtIP1.]
The state is shared with return flow ingress(es)
[Diagram: physical view. midonet cluster 1, 2 and 3 and midonet gateway 1, 2 and 3 on the IP Fabric; the gateways also connect to the Internet/WAN/DC.]
[Diagram: Compute 1 (IP1) runs a Flow Switch (in-kernel OVS) with a VXLAN Tunnel Port, eth0 and port5, tap12345. Gateway 1, Gateway 2 and Gateway 3 each run Quagga, bgpd and a Flow Switch (in-kernel OVS) with a Tunnel Port on eth0 and an uplink on eth1 to the Internet/WAN/DC. Host address labels shown: IP1, IP3, IP5, IP6.]
The flow VM1->ExtIP1, rewritten to FIP1->ExtIP1, is tunneled C1 to L3GW node as IP1 -> IP3 with the Uplink1 VNI.
Flow State is sent in a separate tunneled packet (labeled IP1 -> IP2 on the slide) that carries a Special VNI.
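The idea of sharing SNAT state with the hosts where the return flow may arrive can be sketched as follows. This is a simplified model, not the agent's actual state protocol: the `Agent` class, the key layout and the direct method call standing in for the tunneled state message are all assumptions.

```python
from typing import Dict, Iterable, Optional, Tuple

Endpoint = Tuple[str, int]                 # (IP, L4 port)
FlowKey = Tuple[str, int, str, int]        # (srcIP, srcPort, dstIP, dstPort)


class Agent:
    def __init__(self, name: str) -> None:
        self.name = name
        self.reverse_nat: Dict[FlowKey, Endpoint] = {}

    def learn(self, return_key: FlowKey, original_src: Endpoint) -> None:
        """Store state created locally or received from a peer."""
        self.reverse_nat[return_key] = original_src

    def translate_return(self, key: FlowKey) -> Optional[Endpoint]:
        """Where a return packet should really go; None if the state is unknown."""
        return self.reverse_nat.get(key)


def snat_and_share(ingress: Agent, peers: Iterable[Agent],
                   vm: Endpoint, public: Endpoint, ext: Endpoint) -> FlowKey:
    """Create SNAT state at the ingress and push it to possible return-flow ingresses."""
    return_key = (ext[0], ext[1], public[0], public[1])
    ingress.learn(return_key, vm)
    for peer in peers:
        peer.learn(return_key, vm)   # a tunneled state message in the real system
    return return_key
```

With the state replicated in advance, any gateway that receives the return packet can reverse the translation locally, which keeps the "no off-box lookups" property on the return path too.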
Port’s packet pipeline in MN 5.0
from wire ➔ Port Mirroring ➔ Service Redirection Chain ➔ Filtering Chain ➔ into device
from device ➔ Filtering Chain ➔ Service Redirection Chain ➔ Port Mirroring ➔ onto wire to next port or end simulation
Bridge packet pipeline in MN 5.0
from port ➔ Pre-forwarding Chain ➔ Forwarding Table ➔ Post-forwarding Chain ➔ to one or more ports
Router packet pipeline in MN 5.0
from port ➔ Pre-forwarding Chain ➔ Routing Table ➔ Post-forwarding Chain ➔ to one or more ports
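The pipelines above are ordered lists of stages, any of which may drop the packet. A minimal sketch of that shape, with hypothetical helper names (`run_pipeline`, `trace`, `drop_if`) and with the port stage order taken as reconstructed from the slide:

```python
from typing import Callable, List, Optional

Packet = dict
Stage = Callable[[Packet], Optional[Packet]]   # return None to drop


def run_pipeline(stages: List[Stage], pkt: Packet) -> Optional[Packet]:
    """Apply stages in order; stop as soon as one drops the packet."""
    for stage in stages:
        pkt = stage(pkt)
        if pkt is None:
            return None
    return pkt


def trace(name: str) -> Stage:
    """Hypothetical stage that only records its name on the packet."""
    def stage(pkt: Packet) -> Packet:
        return {**pkt, "path": pkt.get("path", []) + [name]}
    return stage


def drop_if(pred: Callable[[Packet], bool]) -> Stage:
    """Hypothetical filtering stage: drop packets for which `pred` is true."""
    return lambda pkt: None if pred(pkt) else pkt


# Port pipeline, wire-to-device direction.
PORT_INBOUND = [trace("port-mirroring"), trace("service-redirection-chain"), trace("filtering-chain")]
# Bridge pipeline; a router would use a routing table as the middle stage.
BRIDGE = [trace("pre-forwarding-chain"), trace("forwarding-table"), trace("post-forwarding-chain")]
```

Running `run_pipeline(PORT_INBOUND + BRIDGE, pkt)` mirrors a packet entering a port and then traversing the bridge it is attached to.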
L4 LBaaS
Security Groups are translated to Chains and Rules
New in MN 5.0: L2 SFC API Objects
L2Insertion:
● inspected VM port UUID
● inspected VM MAC
● service port UUID
● VLAN tag
● fail-open (true/false)
● position (relative to other insertions for the same inspected VM port)
L2Service:
● service port UUID
1 protected VM, 1 SF
[Diagram: VM1 (protected), VM2, SF1]
1 protected VM, SF down, fail-close
1 protected VM, SF down, fail-open
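The L2Insertion fields and the fail-open / fail-close cases can be tied together in a short sketch. The Python field names are illustrative spellings of the listed attributes, and the semantics are the common reading of the terms (fail-open bypasses a service function that is down, fail-close drops the traffic), which is an assumption here rather than something the slides spell out.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class L2Insertion:
    inspected_port: str   # inspected VM port UUID
    inspected_mac: str    # inspected VM MAC
    service_port: str     # service port UUID
    vlan_tag: int
    fail_open: bool
    position: int         # order among insertions on the same inspected port


def service_path(insertions: List[L2Insertion], port: str,
                 sf_up: Dict[str, bool]) -> Optional[List[str]]:
    """Service ports that traffic for `port` visits, in position order.

    Returns None when a fail-close service function is down (traffic dropped).
    """
    mine = sorted((i for i in insertions if i.inspected_port == port),
                  key=lambda i: i.position)
    path: List[str] = []
    for ins in mine:
        if sf_up.get(ins.service_port, False):
            path.append(ins.service_port)
        elif not ins.fail_open:
            return None            # fail-close: drop
        # fail-open: skip the service function that is down
    return path
```

An empty list means the port has no insertions (or all of them were bypassed) and traffic flows directly to the protected VM.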