Differentiated Services ==
Differentiated Scheduling
Gary Kotton - VMware
Gilad Zlotkin - Radware
The role of the Nova scheduler in managing Quality of Service
1
Enterprise-Ready OpenStack
Migrating existing mission-critical and performance-critical enterprise applications requires:
→ High service levels
  ● Availability
  ● Performance
  ● Security
→ Compliance with existing architectures
  ● Multi-tier
  ● Fault tolerance models
2
Service Level for Applications
• Availability
• Performance
  o Transaction Latency (sec)
  o Transaction Load/Bandwidth (TPS)
• Security
  o Data Privacy
  o Data Integrity
  o Denial of Service

Availability model       Fault level                          Recovery Time
Fault Tolerance (FT)     Compute/Network/Storage element(s)   0
High Availability (HA)   Compute/Network/Storage element(s)   seconds/minutes
Disaster Recovery (DR)   The whole site/connectivity          hours/days

What does all this have to do with the Nova scheduler?
3
High Availability Models
• Availability Zone Redundancy → the “cloud” way
• Server Redundancy → the “classic” way
• Both Server and Zone Redundancies → the “enterprise” disaster recovery way
4
Availability Zone Redundancy
[Diagram: Global Load Balancing in front of two availability zones, AZ1 and AZ2; each zone contains a load balancer (LB1/LB2), two web servers (WS1/WS2 and WS3/WS4), and a database (DB1/DB2)]
5
Server Redundancy
[Diagram: a single site with redundant elements - load balancers LB1/LB2, web servers WS1-WS3, and databases DB1/DB2]
6
Server and Zone Redundancies
[Diagram: Global Load Balancing in front of AZ1 and AZ2; each zone itself contains redundant load balancers (LB1/LB2 and LB3/LB4), web servers (WS1-WS3 and WS4-WS6), and databases (DB1/DB2 and DB3/DB4)]
7
Network Availability
[Diagram: a controller cluster managing a logical network over a transport network, connecting LB1/LB2, WS1-WS3, and DB1/DB2]
VMware’s NSX for example
8
Load Balancer Availability
Radware’s Alteon Load Balancer for example
[Diagram: active/standby load balancer pair (LB1 active, LB2 standby) in front of WS1-WS3]
• Active/Standby
• Persistency state synchronization
• Configuration synchronization
• Auto failover
9
Group Scheduling
• Groups together VMs that provide a certain service
• Enables scheduling policies per group/sub-group
• Supports multi-VM applications designed for fault tolerance and high performance
10
Example
Bad placement: if a host goes down, the entire service is down!
Placement strategy - anti-affinity: achieving fault tolerance
11
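The anti-affinity strategy on this slide can be sketched as a simple host filter (an illustrative Python simplification, not the actual Nova scheduler code; all names are hypothetical):

```python
def anti_affinity_filter(candidate_hosts, group_member_hosts):
    """Keep only hosts that do not already run a member of the group,
    so no single host failure can take down the whole service."""
    return [h for h in candidate_hosts if h not in group_member_hosts]

# Group members WS1 and WS2 already run on host-a and host-b,
# so a new member may only land on host-c or host-d.
print(anti_affinity_filter(["host-a", "host-b", "host-c", "host-d"],
                           {"host-a", "host-b"}))  # ['host-c', 'host-d']
```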
Placement Strategies
• Availability - anti-affinity
  o VMs should be placed in different 'failure domains' (e.g., on different hosts) to ensure application fault tolerance
• Performance
  o Network proximity: group members should be placed as closely as possible to one another on the network (same 'connectivity domain') to ensure low latency and high performance
  o Host capability: IO-intensive, Network-intensive, CPU-intensive, ...
  o Storage proximity
• Security - resource isolation/exclusivity
  o Host, Network, ...
12
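Network proximity can similarly be sketched as a weigher that ranks candidate hosts by how many group members already sit in the same rack (an illustrative sketch under assumed names; `rack_of` and the scoring are not Nova's implementation):

```python
def proximity_weigher(candidate_hosts, group_member_hosts, rack_of):
    """Rank hosts by how many group members share their rack, so the
    scheduler prefers the same 'connectivity domain'."""
    def score(host):
        return sum(1 for m in group_member_hosts
                   if rack_of[m] == rack_of[host])
    return sorted(candidate_hosts, key=score, reverse=True)

rack_of = {"host-a": "rack1", "host-b": "rack1", "host-c": "rack2"}
# A group member already runs on host-a, so the rack1 host is preferred.
print(proximity_weigher(["host-c", "host-b"], ["host-a"], rack_of))
```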
Anti-Affinity
• Havana: anti-affinity per group
  o nova boot --hint group=WS[:anti-affinity] --image ws.img --flavor 2 --num 3 WSi
• “Instance Groups”
  o Properties:
    - Policies - for example, anti-affinity
    - Members - the instances that are assigned to the group
    - Metadata - key/value pairs
  o Sadly did not make the Havana release
  o Work continues in Icehouse with extended functionality
13
Network Proximity (Same Rack)
14
Host Capabilities
• IO-intensive
• CPU-intensive
• Network-intensive
→ “Smart resource placement” - Yathi Udupi and Debo Dutta (Cisco)
→ “Host Capabilities” - Don Dugger (Intel)
15
Storage Proximity
● Schedule instances to have affinity to Cinder volumes
→ “Scheduling Across Services” - Boris Pavlovic (Mirantis) and Alex Glikson (IBM)
→ “Smart resource placement” - Yathi Udupi and Debo Dutta (Cisco)
16
Resource Exclusivity
• Network Isolation: Neutron (for example, VMware’s NSX)
• Host Allocation: enables a user to have a pool of hosts for exclusive use
→ “Private Clouds - Whole Host Allocation” - Phil Day (HP), Andrew Laski (Rackspace)
17
Additional Scheduling Topics
→ “Scheduler Performance” - Boris Pavlovic (Mirantis)
→ “Methods to Improve DB Host Statistics” - Shane Wang and Lianhau Lu (Intel)
→ “Scheduler Metrics - Relationship with Ceilometer” - Paul Murray (HP)
→ “Multiple Scheduler Policies” - Alex Glikson (IBM)
18
Icehouse
• Expand on “Instance Group” support
• Topology of resources and relationships between them
  o Debo Dutta and Yathi Udupi (Cisco)
  o Mike Spreitzer (IBM)
  o Gary Kotton (VMware)
19
API - Aiming for I1
• Proposed API (Nova Extension)
  o id - a unique UUID
  o name - human-readable name
  o tenant_id - the ID of the tenant that owns the group
  o policies - a list of policies for the group (anti-affinity, network proximity and host capabilities)
  o metadata - a way to store arbitrary key/value pairs on a group
  o members - UUIDs of all of the instances that are members of the group
20
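As a sketch, the proposed group resource could be modeled like this (field names follow the slide; the class itself is hypothetical, not the actual Nova extension code):

```python
import uuid
from dataclasses import dataclass, field

@dataclass
class InstanceGroup:
    """Sketch of the proposed Instance Group resource."""
    name: str                 # human-readable name
    tenant_id: str            # ID of the owning tenant
    id: str = field(default_factory=lambda: str(uuid.uuid4()))
    policies: list = field(default_factory=list)   # e.g. ["anti-affinity"]
    metadata: dict = field(default_factory=dict)   # arbitrary key/value pairs
    members: list = field(default_factory=list)    # instance UUIDs

group = InstanceGroup(name="WS", tenant_id="tenant-1",
                      policies=["anti-affinity"])
print(group.policies)  # ['anti-affinity']
```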
Flow
• Group will be created with no members
  o Group will have a policy
• Group ID will be used for scheduling
  o Passed as a hint
  o Scheduler will update the members
• Pending support for groups of groups
• Group membership will be removed when the instance is deleted
21
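The flow above can be simulated with a toy registry (purely illustrative; class and method names are assumptions, not the Nova API):

```python
class GroupRegistry:
    """Toy simulation of the described flow: a group is created empty
    with a policy, the scheduler adds members when it places instances
    carrying the group hint, and deleting an instance removes its
    membership."""
    def __init__(self):
        self.groups = {}

    def create_group(self, group_id, policy):
        self.groups[group_id] = {"policy": policy, "members": []}

    def schedule(self, group_id, instance_id):
        # The group ID is passed as a scheduler hint; the scheduler
        # updates the member list after placement.
        self.groups[group_id]["members"].append(instance_id)

    def delete_instance(self, instance_id):
        for group in self.groups.values():
            if instance_id in group["members"]:
                group["members"].remove(instance_id)

reg = GroupRegistry()
reg.create_group("ws-group", "anti-affinity")
reg.schedule("ws-group", "vm-1")
reg.schedule("ws-group", "vm-2")
reg.delete_instance("vm-1")
print(reg.groups["ws-group"]["members"])  # ['vm-2']
```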
Summary
Migrating existing mission-critical and performance-critical enterprise applications requires:
High service levels → Group Scheduling Policies
● Availability → Anti-Affinity
● Performance → Proximity / Host Capability
● Security → Resource Exclusivity
22