VIRTUALIZED DATABASES?

Upload: liz-van-dijk-ameel

Post on 13-Jul-2015

TRANSCRIPT

VIRTUALIZED DATABASES?

Approach: the mechanics of virtualization. "Certain big players" will not be mentioned. The talk is general, mostly about hardware issues, which are the same for any platform.

ME

• Liz van Dijk (@lizztheblizz)

• Working at Sizing Servers Research Lab

• First-timer at FOSDEM!

• Not really a developer, not really a sysadmin, not really a DBA

• I just like knowing how stuff works.

• It’s far too broad a term

• It’s a pretty old concept. (about half a century, actually)

• Its main purposes are abstraction and security

• Making use of the correct CPU execution mode

• Managing Virtual Memory

SO... VIRTUALIZATION, HUH.

History! A broad term with 100 different meanings. Full-system virtualization appeared on mainframes in the 60s: the IBM M44, using trap and emulate.

Recently:
* x86 did not support full virtualization; trap and emulate did not work
* multicore hardware, single-threaded software: inefficient datacenters

Full virtualization is not the only virtualization; real solutions combine different methods.

Who uses RAID? Who uses virtual memory?

Two big issues that all solutions try to work around. Focus on these, and the next steps should follow more or less logically.

Problem 1: a matter of privileges. Kernels assume full control over the hardware; how does the hardware deal with this?

Layer-based security system (an onion): a 2-bit privilege code; the CPU verifies the code and does or doesn't execute the instruction.

x86: 4 layers
code 00: supervisor mode
code 11: user mode
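The privilege check and trap-and-emulate mechanism described above can be sketched as a toy model (the opcode names and handler are invented for illustration; real checks happen in CPU hardware, not software):

```python
# Toy model of CPU privilege rings and trap-and-emulate.
# Opcode names are hypothetical stand-ins for privileged instructions.

PRIVILEGED = {"hlt", "lgdt", "out"}  # instructions only ring 0 may run

def execute(opcode, ring):
    """Run one instruction at the given ring (0 = supervisor, 3 = user)."""
    if opcode in PRIVILEGED and ring != 0:
        return trap(opcode)           # CPU refuses; control goes to ring 0
    return f"executed {opcode}"

def trap(opcode):
    # A hypervisor's trap handler emulates the instruction safely
    # instead of letting the guest touch the hardware directly.
    return f"trapped {opcode}, emulated by hypervisor"

print(execute("add", 3))   # unprivileged: runs directly
print(execute("hlt", 3))   # privileged at ring 3: trapped and emulated
print(execute("hlt", 0))   # privileged at ring 0: runs directly
```

Each trap is an extra round trip through the hypervisor, which is where the overhead discussed later comes from.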


X86 VIRTUALIZATION

• Binary Translation, aka “faking it”

• Applies ring deprivileging, and translates “bad calls” on the fly

• “Full” Hardware Virtualization

• Introduced Ring -1: Hypervisor mode

• Only intervenes when absolutely necessary

BT: old but awesome, employed by QEMU and Wine. Less relevant now for full virtualization. Ring deprivileging: look it up!

Intel/AMD caught up and implemented VT-x and AMD-V. Ring -1: hypervisor. Let OSes do whatever they want, but use trap and emulate: an extra round trip, extra overhead.

The CPU has more tasks to perform, and they also take longer; a newer CPU is better.
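The binary-translation idea of translating "bad calls" on the fly can be sketched roughly like this (instruction names are invented; real BT rewrites blocks of machine code, not strings):

```python
# Toy sketch of binary translation: scan a block of guest "instructions"
# and rewrite privileged ones into safe hypervisor calls before letting
# the translated block run at full speed.

PRIVILEGED = {"cli", "sti", "hlt"}  # hypothetical privileged opcodes

def bt_translate(block):
    """Return a translated block where every privileged instruction
    is replaced by a call into the hypervisor."""
    return [f"hypercall({op})" if op in PRIVILEGED else op for op in block]

guest_code = ["mov", "cli", "add", "hlt"]
print(bt_translate(guest_code))
# 'cli' and 'hlt' are rewritten; 'mov' and 'add' pass through untouched
```

The translated block contains no privileged instructions at all, so it can run deprivileged without ever trapping.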


VIRTUAL MEMORY

[Diagram: actual hardware on one side (physical memory 0xA–0xH next to the CPU); on the other, the OS gives software virtual pages 1–12; a page table records the virtual-to-physical mapping (1 | 0xD, 2 | 0xC, 3 | 0xF, 4 | 0xA, 5 | 0xH, 6 | 0xG, 7 | 0xB, 8 | 0xE, etc.); a TLB next to the CPU caches the hottest entries (1 | 0xD, 5 | 0xH, 2 | 0xC, etc.).]

Problem 2: virtual memory. Physical memory consists of 4 KB segments with physical addresses; software works with pages.

Very easy to manage in the OS: all software gets a contiguous block. The page table keeps track of the virtual-to-physical mapping.

The TLB cache keeps track of these mappings and is very fast, but it needs to be flushed on every context switch.
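The page-table-plus-TLB flow above can be sketched as a small simulation, using the mappings from the slide (a toy model, not how an MMU is actually implemented):

```python
# Virtual-to-physical translation with a TLB in front of the page table.
# Mappings are taken from the slide diagram; "0xG"/"0xH" are slide
# labels, not real hex values, so they are kept as plain strings.

page_table = {1: "0xD", 2: "0xC", 3: "0xF", 4: "0xA",
              5: "0xH", 6: "0xG", 7: "0xB", 8: "0xE"}
tlb = {}  # fast cache of recent translations

def translate(vpage):
    if vpage in tlb:              # TLB hit: very fast
        return tlb[vpage], "hit"
    phys = page_table[vpage]      # TLB miss: walk the page table
    tlb[vpage] = phys             # cache the mapping for next time
    return phys, "miss"

def context_switch():
    tlb.clear()                   # the flush that makes switches costly

print(translate(1))   # ('0xD', 'miss')
print(translate(1))   # ('0xD', 'hit')
context_switch()
print(translate(1))   # ('0xD', 'miss') again after the flush
```

The flush in `context_switch` is exactly why kernel activity hurts: every switch throws away the cached translations and the misses start over.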

SPT VS HAP

[Diagram: VM A and VM B each manage their own virtual pages through a "read-only" page table (VM A: 1 | 0xD, 5 | 0xH, 2 | 0xC, etc.; VM B: 12 | 0xB, 10 | 0xE, 9 | 0xA, etc.), managed by the VM OS; the hypervisor maintains a "shadow" page table with the real mappings (1 | 0xG, 5 | 0xD, 2 | 0xF, etc.), or, with HAP, a bigger VM-aware TLB tags entries per VM (A1 | 0xD, A5 | 0xH, B12 | 0xB, B10 | 0xE, etc.), in front of the actual hardware (0xA–0xH, CPU).]

Two methods:

SPT: a locked ("read-only") page table; access generates a trap and the VMM handles the memory access. Much slower memory access.

EPT/RVI/HAP: make the TLB much bigger and smarter, VM-aware. Much more complex to fill up, though, so initial memory access is slow; a filled TLB is very fast.
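A rough sketch of the difference between the two methods, with invented mappings: SPT pre-combines the guest and host tables so a lookup is one step (at the cost of trapping every guest page-table update), while HAP walks both tables on a TLB miss:

```python
# Two ways to translate a guest-virtual page to a host-physical address.
# All mappings here are invented for illustration.

guest_pt = {1: "gP5", 2: "gP9"}          # guest-virtual -> guest-"physical"
host_pt = {"gP5": "0xD", "gP9": "0xC"}   # guest-physical -> host-physical

# SPT: the hypervisor traps guest page-table writes and keeps this
# pre-combined table in sync, so each access is a single lookup.
shadow_pt = {v: host_pt[g] for v, g in guest_pt.items()}

def hap_translate(vpage):
    """HAP-style nested walk: two lookups per (uncached) translation."""
    return host_pt[guest_pt[vpage]]

# Both arrive at the same host-physical address; they differ in where
# the work happens (trap time for SPT, walk time for HAP).
assert shadow_pt[1] == hap_translate(1) == "0xD"
```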

WHAT DOES THIS TEACH US?

• All “kernel” activity is a lot more costly:

• Interrupts

• System Calls (I/O)

• Memory page management

So, three kinds of actions are slower under virtualization: interrupts (hardware asking for attention), system calls (software asking for kernel attention), and page management (memory access).

IN THE WILD...

• From best to worst case scenario...

• Bare-metal (Xen, KVM, ESX, Hyper-V)

• Host-based (VirtualBox, VMware Workstation, etc.)

• Cloud-based (Amazon, Terremark, etc.)

BARE-METAL OPTIONS

• Know your my.cnf inside out

• Use hardware-assisted paging + Large Pages! (InnoDB: large-pages)

• Make use of paravirtualized HW options

• Take care of all your caching levels

• Use DirectIO (innodb_flush_method=O_DIRECT)

Small mistakes in a native environment get bigger in a virtual one. Memory allocations are expensive, so optimize your my.cnf! tools.percona.com is a good starting point. Watch the connection-specific buffers (join-buffer, sort-buffer, etc.); finding the sweet spot = test!

SWAPPING = EVIL (tune swappiness)

Large Pages

DirectIO
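A minimal my.cnf sketch covering the settings named above (the values are illustrative placeholders, not recommendations; benchmark on your own workload):

```ini
# my.cnf fragment matching the points above (illustrative values only)
[mysqld]
large-pages                        # use large pages with HW-assisted paging
innodb_flush_method = O_DIRECT     # DirectIO: skip the OS page cache
innodb_buffer_pool_size = 4G       # size to your working set, then test
sort_buffer_size = 2M              # per-connection buffers: keep modest
join_buffer_size = 2M
```

On the OS side, swapping can be discouraged with the `vm.swappiness` sysctl (e.g. `vm.swappiness = 1` in /etc/sysctl.conf).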


HARDWARE CHOICES

• Choosing the right CPUs

• Intel 5500/7500 and later types (Nehalem) / all AMD quad-core Opterons (HW-assisted/MMU virtualization)

• Choosing the right NICs (VMDQ)

• Choosing the right storage system (iSCSI vs FC SAN)

The CPUs listed here support both HW-assist and HAP.

VMDQ = virtual machine device queueing.

HOST-BASED

• All of the above, if possible :)

• IO becomes the bigger issue on standard client hardware

• Focus on moving database IO away from the same disk you run the host- and guest-OS on.

• Consider installing an SSD :)

Keep in mind all of the previous things. IO is a bigger issue: two OSes plus a DB running on the same disk is always a problem. Use a separate disk, maybe an iSCSI LUN? Buy an SSD!

CLOUD-BASED

• No control whatsoever over the host system :(

• Sometimes unreliable IO

• Change strategy! Design for easy sharding and replication!

• Caching caching caching!

• Consider RDS to reduce operational overhead?

Can't escape the hurt: unreliable disk IO. CACHING. Use sharding/replication to spread the write/read load; very write-heavy workloads may be more trouble than they're worth. Asynchronous writes? Not very durable. Use RDS to cut back operational cost.
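The caching point can be sketched as a read-through cache in front of a slow cloud database (all names here are invented placeholders, not a real API):

```python
# Read-through cache sketch: serve hot reads from memory, hit the slow
# (and possibly unreliable) cloud database only on a miss.

cache = {}

def slow_db_read(key):
    # Stands in for a real query over unreliable cloud IO.
    return f"value-for-{key}"

def cached_read(key):
    if key not in cache:
        cache[key] = slow_db_read(key)  # miss: one expensive round trip
    return cache[key]                   # hit: no database IO at all

cached_read("user:42")   # first read goes to the database
cached_read("user:42")   # second read is served from the cache
```

In production the dict would be memcached or similar, shared across app servers, but the shape of the logic is the same.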

THANKS!