Grant Cohoe

IMPACT OF DISK ALIGNMENT IN VIRTUALIZED ENVIRONMENTS


Posted on 16-Dec-2015

WHY SHOULD YOU CARE?

• Performance

• Misalignment causes more I/Os than necessary

• Shared storage issues: the extra I/O from one misaligned VM also loads the storage every other VM depends on

UNDERSTAND YOUR STUFF

• Hard Disk Geometry

• Sector Size (Logical & Physical)

• Operating System

  • What does it want?

  • What does it do by default?

  • Sometimes silly things…

LAYERS

• Disk Geometry/Partitions

• LVM

• Host File System

• VMDK Geometry/Partitions

• VMFS (the VM's file system)

DISK GEOMETRY/PARTITIONS

TERMINOLOGY

• Sectors

  • Smallest addressable unit of disk storage (traditionally 512 bytes)

• Partition

  • Logical group of contiguous sectors

• Track

  • Ring of sectors on a single side of a platter

• Cylinder

  • "3D track": the set of tracks at the same position on every platter surface
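Sectors, tracks and cylinders combine into the classic CHS-to-LBA formula, LBA = (C × heads + H) × sectors_per_track + (S − 1). Below is a minimal Python sketch. It assumes the common legacy BIOS translation of 255 heads and 63 sectors per track; these numbers are illustrative and do not come from the slides.

```python
# Legacy BIOS translation geometry (illustrative assumption; real disks
# report their own values).
HEADS = 255
SECTORS_PER_TRACK = 63


def chs_to_lba(cylinder: int, head: int, sector: int,
               heads: int = HEADS, spt: int = SECTORS_PER_TRACK) -> int:
    """Convert a cylinder/head/sector address to a logical block address.

    Cylinders and heads are numbered from 0, sectors from 1.
    """
    if not 0 <= head < heads or not 1 <= sector <= spt:
        raise ValueError("head or sector out of range for this geometry")
    return (cylinder * heads + head) * spt + (sector - 1)


if __name__ == "__main__":
    print(chs_to_lba(0, 0, 1))  # 0: the MBR
    print(chs_to_lba(0, 1, 1))  # 63: first sector of the second track
    print(chs_to_lba(1, 0, 1))  # 16065: first sector of the second cylinder
```

With this geometry, one cylinder holds 255 × 63 = 16065 sectors.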

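MASTER BOOT RECORD (MBR)

• That thing that boots your OS

• First 512 bytes of the disk

  • 440 bytes of bootloader

  • 64 bytes of partition information (4 entries × 16 bytes)

• 4 primary partitions, max size 2TB (32-bit sector addresses × 512B sectors)

Layout (512 bytes): boot loader (440) | disk signature + reserved (6) | partition table (64) | boot signature 0x55AA (2)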
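To see where those bytes live, here is a minimal Python sketch. It parses the four 16-byte partition entries that start at byte 446 and checks the 0x55AA signature at byte 510. It does not read a real disk. Instead it builds a synthetic MBR with `build_test_mbr`, a hypothetical helper for illustration only. The start and length fields are 32-bit sector counts, which is where the 2TB limit (2^32 × 512B) comes from.

```python
import struct

SECTOR = 512
PART_TABLE_OFFSET = 446  # 440B boot code + 4B disk signature + 2B reserved
ENTRY_SIZE = 16          # 4 entries x 16B = 64B of partition information
SIGNATURE_OFFSET = 510   # last two bytes: 0x55, 0xAA


def parse_mbr(mbr: bytes) -> list[tuple[int, int]]:
    """Return (start_lba, num_sectors) for each non-empty primary partition."""
    if len(mbr) != SECTOR:
        raise ValueError("an MBR is exactly 512 bytes")
    if mbr[SIGNATURE_OFFSET:] != b"\x55\xaa":
        raise ValueError("missing 0x55AA boot signature")
    parts = []
    for i in range(4):
        base = PART_TABLE_OFFSET + i * ENTRY_SIZE
        ptype = mbr[base + 4]  # partition type byte; 0 means unused
        # Bytes 8-15 of an entry: little-endian 32-bit start LBA and length.
        start_lba, num_sectors = struct.unpack_from("<II", mbr, base + 8)
        if ptype != 0:
            parts.append((start_lba, num_sectors))
    return parts


def build_test_mbr(start_lba: int, num_sectors: int) -> bytes:
    """Hypothetical helper: a blank MBR with one Linux (type 0x83) partition."""
    mbr = bytearray(SECTOR)
    mbr[PART_TABLE_OFFSET + 4] = 0x83
    struct.pack_into("<II", mbr, PART_TABLE_OFFSET + 8, start_lba, num_sectors)
    mbr[SIGNATURE_OFFSET:] = b"\x55\xaa"
    return bytes(mbr)


if __name__ == "__main__":
    mbr = build_test_mbr(2048, 16777216 - 2048)
    for start, count in parse_mbr(mbr):
        print(start, count, "aligned" if start % 2048 == 0 else "misaligned")
    # 2048 16775168 aligned
```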

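MASTER BOOT RECORD (MBR)

• DOS Compatibility

  • Partitions must be aligned to track/cylinder boundaries (because DOS was silly)

  • Number of sectors per track = 63

  • The MBR uses sector 0, leaving 62 unused sectors (LBA 1–62); the first partition starts at LBA 63

• This layout is deprecated

Layout: MBR (LBA 0) | LBA 1–62 (unused) | LBA 63 (first partition)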
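Here is the arithmetic behind the DOS problem as a small Python sketch. It assumes 512B logical sectors and a 4K boundary, such as a filesystem block or an Advanced Format physical sector. A partition starting at LBA 63 begins 32,256 bytes into the disk, which is not a multiple of 4096.

```python
SECTOR = 512  # logical sector size in bytes


def is_aligned(start_lba: int, boundary: int, sector: int = SECTOR) -> bool:
    """True if a partition starting at start_lba begins on a `boundary`-byte edge."""
    return (start_lba * sector) % boundary == 0


if __name__ == "__main__":
    print(63 * SECTOR)           # 32256 bytes from the start of the disk
    print(63 * SECTOR % 4096)    # 3584: 512B short of the next 4K boundary
    print(is_aligned(63, 4096))  # False
    print(is_aligned(64, 4096))  # True
```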

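MASTER BOOT RECORD (MBR)

• 1MB Alignment

  • Align all partitions to 1MB

  • 1MB = 1,048,576B / 512B sectors = 2048, so the 1st partition starts at sector 2048

• Improves performance

• Ensures compatibility with 4K “Advanced Format” disks (1MB is a multiple of 4K)

• This is the new standard (default since Windows Vista)

Layout: MBR (LBA 0) | LBA 1–2047 (alignment space) | LBA 2048 (first partition)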
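Under this scheme, picking a start sector means rounding up to the next 1MB boundary. Here is a minimal sketch. `aligned_start_lba` is a hypothetical helper, not a function from any particular partitioning tool.

```python
SECTOR = 512
ALIGNMENT = 1024 * 1024  # 1MB = 1,048,576 bytes


def aligned_start_lba(min_lba: int, alignment: int = ALIGNMENT,
                      sector: int = SECTOR) -> int:
    """Round min_lba up to the next LBA that starts on an `alignment` boundary."""
    step = alignment // sector  # 2048 sectors per 1MB with 512B sectors
    return -(-min_lba // step) * step  # ceiling division, then back to LBAs


if __name__ == "__main__":
    print(aligned_start_lba(1))     # 2048: first partition after the MBR
    print(aligned_start_lba(63))    # 2048: the old DOS start rounded up
    print(aligned_start_lba(2049))  # 4096
```

On a native 4K-sector disk, the same 1MB boundary is LBA 256, because each sector is eight times larger (`aligned_start_lba(1, sector=4096)`).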

RESULTING DISK

• 512B MBR: sector 0

• Alignment Space: sectors 1–2047

• 1st Partition Starting Sector: 2048

• This is good!

[Figure: MBR, alignment space, then partition sectors 2048, 2049, 2050, …16777215]

LOGICAL VOLUME MANAGEMENT (LVM)

TERMINOLOGY

• Physical Volume (PV)

  • Partition (or whole disk) initialized to hold LVM data

• Logical Volume (LV)

  • Virtualized storage structure whose data is stored in PVs

• pe_start

  • Offset from the start of the PV to the first physical extent, where LV data begins

LVM PHYSICAL VOLUMES (LVM PV)

• pe_start specifies the start of LV data

• LVM picks it intelligently, so it is usually not a problem

• It needs to be aligned to your sectors!
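On a Linux host, one way to check this is to read pe_start with `pvs -o +pe_start --units b` and compare it against the physical sector size. The classification below is a minimal sketch; the function name and the 4K default are assumptions for illustration. It only looks at the offset inside the PV, so the partition holding the PV must be aligned as well.

```python
MiB = 1024 * 1024


def classify_pe_start(pe_start: int, physical_sector: int = 4096) -> str:
    """Classify a PV's pe_start, given in bytes."""
    if pe_start % MiB == 0:
        return "1MB aligned"  # also a multiple of 4K, 64K, 512K, ...
    if pe_start % physical_sector == 0:
        return "physical-sector aligned"  # fine for the disk itself
    return "misaligned"  # LV data straddles physical sectors


if __name__ == "__main__":
    for value in (1048576, 196608, 99840):
        print(value, classify_pe_start(value))
    # 1048576 1MB aligned
    # 196608 physical-sector aligned
    # 99840 misaligned
```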

LVM PHYSICAL VOLUMES (LVM PV)

• Bad

  • pe_start does not line up with a physical sector boundary

  • This is going to hurt performance later

[Figure: MBR, then sectors 2048–2059 holding the Physical Volume; pe_start does not line up with a sector boundary, so the PV Data Region is misaligned]

LVM PHYSICAL VOLUMES (LVM PV)

• Good

  • As long as pe_start is a multiple of your physical sector size (512B on older disks, 4K on Advanced Format disks), you’re good!

[Figure: MBR, then sectors 2048–2059 holding the Physical Volume; pe_start lines up with a sector boundary, so the PV Data Region is aligned]

LVM PHYSICAL VOLUMES (LVM PV)

• PE Size

  • Physical Extent: the LVM “block” size

  • The default (4MB) is usually fine

  • Multiple of the sector size (512B)
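Putting the pieces together, the absolute disk offset of an extent is partition start + pe_start + extent number × PE size. The extent is aligned only if that whole sum is. Here is a minimal sketch with illustrative values (1MB pe_start, 4MB extents).

```python
SECTOR = 512
MiB = 1024 * 1024


def extent_disk_offset(partition_start_lba: int, pe_start: int,
                       pe_size: int, extent: int) -> int:
    """Byte offset on the disk where physical extent `extent` begins.

    pe_start and pe_size are in bytes; partition_start_lba is in 512B sectors.
    """
    return partition_start_lba * SECTOR + pe_start + extent * pe_size


if __name__ == "__main__":
    # Aligned stack: partition at LBA 2048, pe_start 1MB, 4MB extents.
    off = extent_disk_offset(2048, 1 * MiB, 4 * MiB, 10)
    print(off, off % 4096 == 0)  # 44040192 True
    # Same LVM settings on an old DOS-style partition at LBA 63.
    off = extent_disk_offset(63, 1 * MiB, 4 * MiB, 10)
    print(off, off % 4096 == 0)  # 43023872 False
```

A perfectly aligned pe_start cannot rescue a partition that starts at LBA 63.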

RESULTING VOLUME

• LV starting point aligned (pe_start)

• PV aligned to sectors on disk

[Figure: MBR, then sectors 2048–2059 holding the Physical Volume (pe_start, PV Data Region), with the Logical Volume inside the aligned data region]

HOST FILE SYSTEM

HOST FILE SYSTEM

• Not much to do here

• RAID would be a different story…

• ext2/3/4 is good at picking sane defaults

• Block size

  • Smallest unit of data the filesystem allocates (commonly 4K)
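On a Linux or other POSIX host, you can check the block size a mounted filesystem reports with Python's `os.statvfs`. Its `f_frsize` field is the fundamental block size. Here is a minimal sketch; `os.statvfs` is not available on Windows.

```python
import os


def fs_block_size(path: str = "/") -> int:
    """Return the fundamental block size of the filesystem holding `path`."""
    return os.statvfs(path).f_frsize


if __name__ == "__main__":
    print(fs_block_size("/"))  # typically 4096 on ext4
```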

RESULTING FILESYSTEM

[Figure: MBR, then sectors 2048–2059; the Filesystem sits on the Logical Volume inside the Physical Volume (pe_start, PV Data Region), all aligned]

VMDK GEOMETRY & PARTITIONS

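VMDK GEOMETRY/PARTITIONS

• Same principles as host disks

  • DOS compatibility (first partition at LBA 63) sucks here too

  • 1MB alignment is good

• Performance impact is bigger: every misaligned guest I/O is multiplied through all the host layers beneath it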
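Here is a rough way to see why the guest matters. It uses a simplified model in which every layer maps linearly onto the one below, so the guest partition's offset simply adds to the host offsets. This is an assumption: it ignores how the host filesystem places the image file and assumes guest sector 0 sits at the start of a host filesystem block. If any term is off, the whole stack is off. The values below are illustrative.

```python
SECTOR = 512
BLOCK = 4096  # filesystem block size at both layers (assumed)


def stack_offset(*layer_offsets: int) -> int:
    """Total byte offset of guest data on the host disk.

    Simplified model (assumption): each layer maps linearly onto the one
    below it, so the per-layer offsets simply add up.
    """
    return sum(layer_offsets)


def stack_aligned(*layer_offsets: int, block: int = BLOCK) -> bool:
    """Guest blocks line up with host blocks only if the total offset does."""
    return stack_offset(*layer_offsets) % block == 0


if __name__ == "__main__":
    host_partition = 2048 * SECTOR  # host partition at 1MB
    pe_start = 1024 * 1024          # LV data 1MB into the PV
    # Guest partition at 1MB inside the VMDK vs. the old DOS start at LBA 63.
    print(stack_aligned(host_partition, pe_start, 2048 * SECTOR))  # True
    print(stack_aligned(host_partition, pe_start, 63 * SECTOR))    # False
```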

[Figure: VMDK with its own MBR, guest sectors starting at 2048]

VM FILE SYSTEM

VM FILE SYSTEM

• Don’t use RAID/LVM in VMs

  • Unless you really need it for some reason

  • Or if the VM came from a P2V (physical-to-virtual) conversion

[Figure: VMDK with MBR and sectors from 2048, holding the VM File System]

VM ALIGNMENT

PERFECTLY ALIGNED VM

[Figure: full stack. Guest: VMDK MBR, sectors 2048–2054, VM File System. Host: MBR, sectors 2048–2059, Physical Volume (pe_start, PV Data Region), Logical Volume, Filesystem. Every layer starts on an aligned boundary]

PERFECTLY ALIGNED VM

• VM FS Block: 4096

• VMDK Sectors: 512 × 8

• Host FS Block: 4096

• LVM PE*: 1024 × 4

• Host Disk Blocks: 512 × 8

* PE shown as 1K for example

MISALIGNED VM

• VM FS Block: 4096, shifted relative to the host blocks

• VMDK Sectors: 512 × 8

• Host FS Blocks: 4096 × 2

• LVM PE*: 1024 × 8

• Host Disk Blocks: 512 × 16

• Each 4K VM block sits across two Host FS blocks, so getting all of its data requires more reads of the host disks

• 4096B of VM data requires 8192B of host disk data to read
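The 4096B-versus-8192B figure falls out of counting which host blocks an I/O touches. Here is a minimal sketch, assuming 4K blocks at both layers as on the slide.

```python
def host_blocks_touched(offset: int, length: int, block: int = 4096) -> int:
    """Number of `block`-sized host blocks covered by `length` bytes at `offset`."""
    first = offset // block
    last = (offset + length - 1) // block
    return last - first + 1


if __name__ == "__main__":
    guest_block = 4096
    aligned = 8 * 4096          # guest block starts on a host block boundary
    misaligned = aligned + 512  # shifted by one 512B sector
    print(host_blocks_touched(aligned, guest_block) * 4096)     # 4096
    print(host_blocks_touched(misaligned, guest_block) * 4096)  # 8192
```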

END GOAL

[Figure: every layer aligned. Guest: VMDK MBR, sectors 2048–2053, VM File System. Host: MBR, sectors 2048–2059, Physical Volume (pe_start, PV Data Region), Logical Volume, Filesystem blocks]

Grant Cohoe

http://grantcohoe.com

QUESTIONS?

MODERN STUFF (BONUS MATERIAL)

ADVANCED FORMAT DISKS

• 4K Sectors

  • Old: 512B sectors

  • New: 4096B (4K) sectors

  • Much more efficient with today's data usage (less per-sector overhead)

• 512e Emulation Mode

  • Lets old stuff still work with new disks

  • Logical (OS): 512B sectors, e.g. LBA 64–71

  • Physical (Disk): 4K sectors; those 8 logical sectors are physical sector 8
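In 512e mode, the disk maps each logical LBA onto a physical sector with a simple division: eight 512B logical sectors per 4K physical sector. Here is a minimal sketch of that mapping.

```python
LOGICAL = 512                       # sector size the OS sees in 512e mode
PHYSICAL = 4096                     # real sector size on the disk
PER_PHYSICAL = PHYSICAL // LOGICAL  # 8 logical sectors per physical sector


def physical_sector(lba: int) -> tuple[int, int]:
    """Return (physical sector, 512B slot inside it) for a logical LBA."""
    return divmod(lba, PER_PHYSICAL)


if __name__ == "__main__":
    for lba in (64, 71, 72):
        print(lba, physical_sector(lba))
    # 64 (8, 0)
    # 71 (8, 7)
    # 72 (9, 0)
```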

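ADVANCED FORMAT DISKS & MBR

• Regular disks (512 byte sectors)

  • First partition at LBA 63 is fine: each logical sector is a physical sector

• Advanced Format (4K sectors) w/ 512e

  • First partition at LBA 63 starts inside physical sector 7 (LBA 56–63), not at its beginning

  • PROBLEM LATER ON: every 4K filesystem block straddles two physical sectors

[Figure: logical sectors MBR, 1–62, 63–75 on a regular disk and on a 512e disk; on the 512e disk they map onto 4K physical sectors 0 through 9]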
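Why is LBA 63 a problem later on? A 4K write that does not cover whole physical sectors forces the drive to read, modify and rewrite each partially covered sector. Here is a minimal sketch, assuming a 512e disk.

```python
LOGICAL = 512    # sector size the OS sees
PHYSICAL = 4096  # sector size on the platter


def physical_sectors_touched(lba: int, length: int) -> range:
    """Physical sectors covered by a write of `length` bytes at logical `lba`."""
    start = lba * LOGICAL
    return range(start // PHYSICAL, (start + length - 1) // PHYSICAL + 1)


def needs_read_modify_write(lba: int, length: int) -> bool:
    """True if the write starts or ends partway through a physical sector."""
    start = lba * LOGICAL
    return start % PHYSICAL != 0 or (start + length) % PHYSICAL != 0


if __name__ == "__main__":
    for lba in (63, 64):
        touched = list(physical_sectors_touched(lba, 4096))
        print(lba, touched, needs_read_modify_write(lba, 4096))
    # 63 [7, 8] True
    # 64 [8] False
```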

GUID PARTITION TABLE (GPT)

• That new thing that replaces the MBR partition table

• First 17K of the disk

  • Protective MBR (512B) + GPT header (512B) + partition entries (16K)

• On Disk:

GPT | Alignment Space | 2048 (first partition)
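This is where the 17K comes from on a 512B-sector disk with the typical 128 partition entries of 128 bytes each. Here is a minimal sketch of the arithmetic; the entry count and entry size are common defaults, assumed here.

```python
SECTOR = 512
PROTECTIVE_MBR = SECTOR        # LBA 0
GPT_HEADER = SECTOR            # LBA 1
PARTITION_ENTRIES = 128 * 128  # 128 entries x 128 bytes, starting at LBA 2

gpt_bytes = PROTECTIVE_MBR + GPT_HEADER + PARTITION_ENTRIES
first_usable_lba = gpt_bytes // SECTOR
first_aligned_lba = -(-first_usable_lba // 2048) * 2048  # round up to 1MB

if __name__ == "__main__":
    print(gpt_bytes // 1024)  # 17 (KB)
    print(first_usable_lba)   # 34
    print(first_aligned_lba)  # 2048
```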

RAID IMPLICATIONS

• If the RAID volume is misaligned, the entire array is affected

• RAID in VMs is BAD!

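RAID TERMINOLOGY

• Data Disk

  • A disk that holds real data (not parity)

• Chunk (some tools call it the “stripe size”)

  • RAID unit of IO (“block”) on a single disk

• Stride

  • Chunk size measured in filesystem blocks: how much data goes to one disk before moving to the next

• Stripe Width

  • Length of a full stripe across all data disks, in filesystem blocks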

RAID MATH

• Constants

  • DATA_DISKS = 3 (let's say this is RAID5 with 4 disks)

  • BLOCK_SIZE = 4K (from the filesystem)

  • CHUNK_SIZE = 512K

• Calculate Stride

  • STRIDE = CHUNK_SIZE / BLOCK_SIZE = 512K / 4K = 128 (filesystem blocks)

• Calculate Stripe Width

  • STRIPE_WIDTH = STRIDE * DATA_DISKS = 128 * 3 = 384 (filesystem blocks, i.e. 1536K)

• What this means:

  • One unit of RAID IO writes 128 blocks (512K) to the first disk, then moves on to the next one
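Here is the same math as a minimal Python sketch. The resulting numbers are the values you would pass to `mkfs.ext4` through its `-E stride=128,stripe_width=384` extended options. The helper name is hypothetical, and `/dev/md0` is a placeholder device name; the code only prints the command and does not run it.

```python
KiB = 1024


def ext_raid_params(chunk_size: int, block_size: int,
                    data_disks: int) -> tuple[int, int]:
    """Return (stride, stripe_width), both in filesystem blocks."""
    if chunk_size % block_size:
        raise ValueError("chunk size must be a multiple of the block size")
    stride = chunk_size // block_size   # FS blocks per chunk on one disk
    stripe_width = stride * data_disks  # FS blocks per full stripe
    return stride, stripe_width


if __name__ == "__main__":
    # RAID5 with 4 disks -> 3 data disks, 4K blocks, 512K chunks.
    stride, width = ext_raid_params(512 * KiB, 4 * KiB, 3)
    print(f"mkfs.ext4 -E stride={stride},stripe_width={width} /dev/md0")
    # mkfs.ext4 -E stride=128,stripe_width=384 /dev/md0
```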