Takahiro Hirofuchi, Hidemoto Nakada, Satoshi Itoh, and Satoshi Sekiguchi
National Institute of Advanced Industrial Science and Technology (AIST), Japan
VTDC2011, Jun. 8th, 2011



Outline
• What is dynamic consolidation?
  • Background and challenge
• Why is postcopy live migration promising?
  • Comparison between postcopy/precopy
  • Our postcopy live migration implementation
• Reactive consolidation system
  • Overall design
  • Packing algorithm
• Evaluation
• Conclusion


Dynamic Consolidation
Dynamically optimize VM locations in response to VM load changes in order to:
• Eliminate excessive power consumption
• Assure VM performance


Dynamic Consolidation: Eliminate excessive power
When datacenter load becomes small:
• Move running VMs onto fewer hosts
• Suspend idle physical hosts
• Reduce power consumption


Dynamic Consolidation: Assure VM performance
When datacenter load becomes high:
• Power up new physical hosts
• Distribute running VMs onto other hosts
• Remove overload
[Figure: the CPU usage of physical machines]


Challenge
Be transparent to IaaS customers.
• IaaS providers present performance criteria
  • EC2 Small Instance: 1.0-1.2 GHz 2007 Opteron
• Live migration incurs
  • Long duration until completion
  • CPU overheads
Hide what's going on in the background as much as possible!


Contribution
Reactive consolidation by postcopy live migration
• Develop postcopy live migration for Qemu/KVM
  • Presented at CCGrid2010
• Develop a reactive VM consolidation system
  • Optimize VM locations in response to load changes
  • Exploit postcopy live migration for quick load balancing
  • Presented in this talk
• Assure VM performance to a higher degree
  • Reduce performance loss on VM repacking
  • 50% better in a randomly-generated scenario


Precopy vs. Postcopy
• Precopy live migration
  • Copy VM memory before switching the execution host
  • Widely used in VMMs
• Postcopy live migration
  • Copy VM memory after switching the execution host
  • No publicly-available implementation
  • But, we developed it!


Precopy Live Migration (1): Copy VM memory before relocation
1. Copy all memory pages to the destination.
2. Copy again the memory pages updated during the previous copy.
3. Repeat step 2 until the remaining memory pages are small enough.
4. Stop the VM.
5. Copy CPU registers, device states, and the remaining memory pages.
6. Resume the VM at the destination.
Example: a VM with 1 GB RAM takes at least 10 seconds over GbE, and may take much longer.
[Diagram: VM with its RAM, Machine A and Machine B]


Precopy Live Migration (2): Copy VM memory before relocation
Migration time = RAM size / Network speed + alpha


Precopy Live Migration (3): Copy VM memory before relocation
Migration time = RAM size / Network speed + alpha
• It takes a long time to switch location.
• It is difficult to estimate how long it takes.
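The precopy loop and the "+ alpha" term can be illustrated with a toy model. This is only a sketch under a strong assumption not made in the slides: the VM dirties memory at a constant rate. The function name, the stop threshold, and the round limit are hypothetical.

```python
def precopy_migration_time(ram_bytes, bandwidth, dirty_rate,
                           stop_threshold=1 << 20, max_rounds=30):
    """Toy estimate of precopy migration time.

    bandwidth and dirty_rate are in bytes per second. Returns
    (total seconds, number of copy rounds). Assumes a constant
    dirty rate, which real workloads do not have.
    """
    remaining = float(ram_bytes)      # round 1 copies all memory pages
    total_time = 0.0
    rounds = 0
    while rounds < max_rounds:
        copy_time = remaining / bandwidth
        total_time += copy_time
        rounds += 1
        # Pages updated during this round must be copied again.
        remaining = min(float(ram_bytes), dirty_rate * copy_time)
        if remaining <= stop_threshold:
            break                     # small enough: stop the VM
    # Stop-and-copy: the VM is paused while the rest is transferred.
    total_time += remaining / bandwidth
    return total_time, rounds


GBE = 125e6                           # Gigabit Ethernet, bytes per second
RAM = 1 << 30                         # 1 GiB
idle_time, _ = precopy_migration_time(RAM, GBE, dirty_rate=0)
busy_time, busy_rounds = precopy_migration_time(RAM, GBE, dirty_rate=50e6)
```

With no memory updates the model gives RAM size / Network speed; every extra round adds to alpha. When the dirty rate reaches the network speed, the loop only ends at the round limit, which is one way to see why the completion time is hard to estimate.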


Postcopy Live Migration (1): Copy VM memory after relocation
1. Stop VM
2. Copy CPU and device states to destination
3. Resume VM at destination
4. Copy memory pages
[Diagram: VM with its RAM, Machine A and Machine B; the VM is stopped]


Postcopy Live Migration (2): Copy VM memory after relocation
Step 2, copy CPU and device states to destination:
• Only 256 KB without VGA
• => Less than 1 sec for relocation


Postcopy Live Migration (3): Copy VM memory after relocation
Step 3, resume VM at destination.
[Diagram: the VM resumes on Machine B; its RAM is still on Machine A]


Postcopy Live Migration (4): Copy VM memory after relocation
Step 4, copy memory pages:
• On-demand
• Background
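The two ways of copying memory pages after relocation can be sketched as a toy simulation. This is not the Qemu/KVM implementation from the talk. The function name, the access trace, and the `background_batch` parameter are hypothetical and only illustrate how on-demand fetches and the background copy share the work.

```python
import random


def simulate_postcopy(num_pages, accesses, background_batch=4):
    """Toy postcopy page transfer.

    accesses is the sequence of page numbers the resumed VM touches.
    Returns (pages fetched on demand, pages copied in the background).
    """
    present = [False] * num_pages     # pages already at the destination
    next_bg = 0                       # cursor of the background copy
    on_demand = 0
    background = 0
    for page in accesses:             # the VM already runs at the destination
        if not present[page]:
            present[page] = True      # page fault: fetch it from the source
            on_demand += 1
        # Meanwhile the background copy pushes a few missing pages.
        sent = 0
        while sent < background_batch and next_bg < num_pages:
            if not present[next_bg]:
                present[next_bg] = True
                background += 1
                sent += 1
            next_bg += 1
    # After the trace ends, the background copy transfers the rest.
    while next_bg < num_pages:
        if not present[next_bg]:
            present[next_bg] = True
            background += 1
        next_bg += 1
    return on_demand, background


rng = random.Random(0)
trace = [rng.randrange(1000) for _ in range(200)]
faults, pushed = simulate_postcopy(1000, trace)
```

Every page is transferred exactly once, so `faults + pushed` equals the number of pages. This matches the completion time on the slides being bounded by RAM size over bandwidth rather than by how often pages are updated.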


Demo (Precopy vs. Postcopy)
[Demo: postcopy live migration and precopy live migration, both shown migrating]


Problems in Prior Consolidation Studies
• All prior studies are based on precopy live migration.
• To cope with long migration times, prior studies use load prediction techniques.
  • Repacking timescale ~ hour/day (e.g., business hours).
• However, IaaS datacenters allow users to run any kind of workload at any time.
  • Precise prediction is difficult because we cannot use workload-specific algorithms.
• Nobody can predict sudden load changes.


Reactive Consolidation
• Exploit postcopy live migration
• Reactively optimize VM locations in response to load changes.
• Repacking timescale ~ 10 seconds
[Timeline along wall clock time: Overloaded -> Detect -> Switch the execution host (1 sec) -> Complete (T = RAM size / Bandwidth)]


Packing Algorithm (1)
• Realistic design
  • Lightweight, near-optimal
• Define two server types
  • Shared Server (with large RAM, expensive)
    • Consolidate many idle VMs
    • Always powered on
  • Dedicated Server
    • Host an actively-running VM
    • Suspend when unused


Packing Algorithm (2)
[Diagram: one Shared Server node (always powered on) hosting several VMs; Dedicated Server nodes that are either powered on (resumed) with one VM each, or powered off (suspended)]
• If overloaded, an active VM pops out from the Shared Server.
• Pop-out threshold = over 90% CPU usage of the Shared Server.
• Power off unused Dedicated Servers.


Packing Algorithm (3)
[Diagram: one Shared Server node (always powered on) hosting several VMs; Dedicated Server nodes that are either powered on (resumed) with one VM each, or powered off (suspended)]
• If a VM becomes idle and the Shared Server has space for it, the VM returns to the Shared Server.
• Idle threshold = under 50% usage of the Dedicated Server.
• Power off unused Dedicated Servers.
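The pop-out and return rules can be written down as a small decision function. This is a sketch, not the system from the talk. The two thresholds come from the slides, but the choice of which VM pops out (the busiest one) and the reading of "has space" as CPU headroom below the pop-out threshold are assumptions, and all names are hypothetical.

```python
POP_OUT_THRESHOLD = 90.0   # % CPU usage of the Shared Server (from the slides)
IDLE_THRESHOLD = 50.0      # % usage of a Dedicated Server (from the slides)
SHARED = "shared"


def plan_migrations(shared_vms, dedicated_vms, free_hosts):
    """Decide which VMs to move in one repacking step.

    shared_vms:    {vm: cpu %} for VMs on the Shared Server
    dedicated_vms: {vm: (host, cpu %)} for VMs on Dedicated Servers
    free_hosts:    suspended Dedicated Servers that can be resumed
    Returns a list of (vm, destination) pairs.
    """
    shared = dict(shared_vms)          # work on copies, inputs stay untouched
    free = list(free_hosts)
    moves = []
    # Pop-out: relieve the overloaded Shared Server.
    while shared and free and sum(shared.values()) > POP_OUT_THRESHOLD:
        vm = max(shared, key=shared.get)   # assumption: pick the busiest VM
        moves.append((vm, free.pop(0)))
        del shared[vm]
    # Return: an idle VM goes back if the Shared Server has room for it.
    for vm, (host, load) in dedicated_vms.items():
        fits = sum(shared.values()) + load <= POP_OUT_THRESHOLD
        if load < IDLE_THRESHOLD and fits:
            shared[vm] = load
            moves.append((vm, SHARED))     # its host can then be powered off
    return moves


moves = plan_migrations({"vm0": 85.0, "vm1": 10.0, "vm2": 5.0}, {}, ["d1", "d2"])
```

In the usage line, the Shared Server is at 100% usage, so `vm0` pops out to the first free Dedicated Server and the remaining load falls back under the threshold.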


Evaluation
• Experiments
  • Simple load change scenario
    • Pure CPU-intensive workload
    • Memory-intensive workload
  • Compound load change scenario
• A new benchmark program
  • Metric for performance assurance
  • Generate a target CPU load with a specified memory update intensity
  • Measure achieved operations per second
  • Failed ops = Target ops - Achieved ops
[Diagram: one operation = busy loop (C times) and memory touch (C * alpha times) while the CPU is busy, then sleep while the CPU is idle. Labels: target CPU load, memory update intensity]
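A benchmark of this shape could look like the following sketch. It is not the authors' program: the page size, buffer size, pacing scheme, and all names are assumptions. It only mirrors the structure on the slide, where one operation is a busy loop of C iterations, C * alpha memory touches, and then a sleep.

```python
import time

PAGE = 4096  # assumed page size in bytes


def one_operation(buf, c, alpha, cursor=0):
    """Busy loop C times, then touch C * alpha pages. Returns the new cursor."""
    x = 0
    for i in range(c):                     # busy loop: pure CPU work
        x += i
    for _ in range(int(c * alpha)):        # memory touch: dirty one byte per page
        buf[cursor] = (buf[cursor] + 1) % 256
        cursor = (cursor + PAGE) % len(buf)
    return cursor


def run_benchmark(target_ops_per_sec, seconds, c=1000, alpha=0.1, buf_pages=256):
    """Run operations at the target rate. Returns (achieved ops, failed ops)."""
    buf = bytearray(buf_pages * PAGE)
    period = 1.0 / target_ops_per_sec
    target = int(target_ops_per_sec * seconds)
    start = time.monotonic()
    deadline = start + seconds
    achieved = 0
    cursor = 0
    while achieved < target:
        cursor = one_operation(buf, c, alpha, cursor)
        if time.monotonic() > deadline:
            break                          # too slow: the rest count as failed
        achieved += 1
        slack = start + achieved * period - time.monotonic()
        if slack > 0:
            time.sleep(slack)              # CPU is idle until the next slot
    return achieved, target - achieved     # Failed ops = Target ops - Achieved ops


achieved, failed = run_benchmark(target_ops_per_sec=50, seconds=0.2, c=100)
```

When the VM gets enough CPU, every operation finishes within its slot and the failed count stays at zero. When it is starved of CPU, operations run late and the shortfall shows up as failed operations.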


Simple Load Change Scenario
• 6 VMs
• VM0 and VM1 become active.


How it basically works
1. Detect overloading and optimize locations.
   VM0 is moved to Dedicated Server 4.
2. Detect idle state and optimize location.
   VM0 is moved back to the Shared Server.
[Plots: Shared Server's CPU usage (%) and Dedicated Server's CPU usage (%), annotated with points 1 and 2]


[Plots: Shared Server's CPU usage (%), using postcopy and using precopy]
• Using postcopy: overload is removed in less than 10 seconds.
• Using precopy: overload is removed in more than 20 seconds.
  • VM0 cannot get enough CPU resources. This results in failed operations.


Failed Operations per Second (Red: VM0, Green: VM1)
[Plots: using postcopy and using precopy]
• Postcopy reduces the number of failed operations.
  • 20000 (ops) -> 5000 (ops)
  • Note that 4000 (ops) is detection overhead (avg. 5 sec).
• With postcopy, the performance loss until migration completes is small, i.e., 1000 (ops); this workload is purely CPU-intensive.


Failed Operations per Second vs. Memory Update Intensity
[Plot annotations: detection overhead; 1 GB/s at 100% CPU]
• Using precopy incurs a large performance penalty for memory-intensive workloads.
• Using postcopy greatly reduces packing overheads.


Compound Load Change Scenarios (1)
• One-hour, randomly-generated load changes
• Emulate race-to-halt workloads
  • Mostly idle, but sometimes active
• An active VM consumes a random CPU load between 80% and 100%.
• An idle VM consumes a random CPU load between 0% and 20%.
• A new state continues for a random duration between 60 and 300 seconds.
• Memory update intensity is 0.6 (600 MB/s at 100% CPU usage).
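A generator for such a scenario might look like the sketch below. The load ranges, state durations, one-hour length, and memory update intensity are taken from the slide. The probability of an active state (`p_active`) is not given in the talk and is an assumption chosen to make the VM "mostly idle", and the function name is hypothetical.

```python
import random

MEMORY_UPDATE_INTENSITY = 0.6   # 600 MB/s at 100% CPU usage (from the slide)


def generate_scenario(duration=3600, p_active=0.2, seed=None):
    """Return a list of (start second, state, CPU load %) for one VM."""
    rng = random.Random(seed)
    t = 0
    schedule = []
    while t < duration:
        active = rng.random() < p_active       # assumption: mostly idle
        if active:
            load = rng.uniform(80, 100)        # active: 80% to 100% CPU
        else:
            load = rng.uniform(0, 20)          # idle: 0% to 20% CPU
        schedule.append((t, "active" if active else "idle", load))
        t += rng.randint(60, 300)              # state lasts 60 to 300 seconds
    return schedule


scenario = generate_scenario(seed=1)
```

Passing a fixed `seed` makes a scenario repeatable, so the same load changes can be replayed once with precopy and once with postcopy.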


Compound Load Change Scenarios (2)
By using postcopy migration, failed operations are reduced to about half of those with precopy.


Compound Load Change Scenarios (3)
[Plot: the number of live migrations during each scenario]
The consolidation system using postcopy frequently optimizes VM locations.


Related Work
• Postcopy live migration
  • SnowFlock (live VM cloning), EuroSys2009
  • Postcopy live migration (Xen), VEE2009
  • Postcopy live migration (Qemu/KVM), CCGrid2010
    • By me
• VM consolidation system
  • Black-box vs. gray-box, NSDI2007
    • Needs workload-specific information for better prediction
  • Genetic algorithm, DCAI2009
    • Quickly finds near-optimal locations
    • By us


Conclusion
• Reactive VM consolidation system
  • Keeps the performance criteria of VMs as much as possible in IaaS datacenters
  • Does not rely on load prediction
  • Reactively optimizes VM locations in response to load changes
  • Exploits postcopy live migration for quick load balancing
• Evaluation
  • Better performance assurance than with precopy live migration, especially for memory-intensive workloads


Yabusame (流鏑馬): Postcopy Live Migration for Qemu/KVM
KVM Forum 2011
http://grivon.apgrid.org/

Photo: © Yuki Shimazu 2011, http://www.flickr.com/photos/shimazu/5631324478/

"Yabusame (流鏑馬) is a type of mounted archery in traditional Japanese archery. An archer on a running horse shoots three special turnip-headed arrows successively at three wooden targets." (from Wikipedia)