ji-yong shin cornell university in collaboration with mahesh balakrishnan (msr svc), tudor marian...

27
Ji-Yong Shin Cornell University In collaboration with Mahesh Balakrishnan (MSR SVC), Tudor Marian (Google), and Hakim Weatherspoon (Cornell) Gecko: Contention- Oblivious Disk Arrays for Cloud Storage FAST 2013

Upload: albert-mills

Post on 23-Dec-2015

217 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Ji-Yong Shin Cornell University In collaboration with Mahesh Balakrishnan (MSR SVC), Tudor Marian (Google), and Hakim Weatherspoon (Cornell) Gecko: Contention-Oblivious

Ji-Yong Shin Cornell University

In collaboration with Mahesh Balakrishnan (MSR SVC), Tudor Marian (Google), and Hakim Weatherspoon (Cornell)

Gecko: Contention-Oblivious Disk Arrays for Cloud Storage

FAST 2013

Page 2: Ji-Yong Shin Cornell University In collaboration with Mahesh Balakrishnan (MSR SVC), Tudor Marian (Google), and Hakim Weatherspoon (Cornell) Gecko: Contention-Oblivious

• What happens to storage?

Cloud and Virtualization

VMM

GuestVM

GuestVM

GuestVM

GuestVM

SharedDisk

SharedDisk

Shared Disk

Shared Disk

SEQUENTIAL

RANDOMRANDOM

2

I/O CONTENTION causes throughput collapse

4-Disk RAID0

Page 3: Ji-Yong Shin Cornell University In collaboration with Mahesh Balakrishnan (MSR SVC), Tudor Marian (Google), and Hakim Weatherspoon (Cornell) Gecko: Contention-Oblivious

Existing Solutions for I/O Contention?• I/O scheduling: reordering I/Os– Entails increased latency for certain workloads– May still require seeking

• Workload placement: positioning workloads to minimize contention– Requires prior knowledge or dynamic prediction– Predictions may be inaccurate– Limits freedom of placing VMs in the cloud

3

Page 4: Ji-Yong Shin Cornell University In collaboration with Mahesh Balakrishnan (MSR SVC), Tudor Marian (Google), and Hakim Weatherspoon (Cornell) Gecko: Contention-Oblivious

– Log all writes to tail

Log-structured File System to the Rescue?[Rosenblum et al. 91]

Addr 0 1 2 … … N

Writ

e

Writ

e

Writ

eLo

g Ta

ilShared

Disk

4

Page 5: Ji-Yong Shin Cornell University In collaboration with Mahesh Balakrishnan (MSR SVC), Tudor Marian (Google), and Hakim Weatherspoon (Cornell) Gecko: Contention-Oblivious

• Garbage collection is the Achilles’ Heel of LFS [Seltzer et al. 93, 95; Matthews et al. 97]

SharedDisk

Challenges of Log-Structured File System

… N…

Log

Tail

SharedDisk Shared

Disk

GC

Garbage Collection (GC)from Log Head

5

Read

GC and read still cause disk seek

Log

Hea

d

Page 6: Ji-Yong Shin Cornell University In collaboration with Mahesh Balakrishnan (MSR SVC), Tudor Marian (Google), and Hakim Weatherspoon (Cornell) Gecko: Contention-Oblivious

Challenges of Log-Structured File System

• Garbage collection is the Achilles’ Heel of LFS– 2-disk RAID-0 setting of LFS – GC under write-only synthetic workload

RAID 0 + LFS

6

Max Sequential Throughput

Throughput falls by 10X during GC

Page 7: Ji-Yong Shin Cornell University In collaboration with Mahesh Balakrishnan (MSR SVC), Tudor Marian (Google), and Hakim Weatherspoon (Cornell) Gecko: Contention-Oblivious

Problem:Increased virtualization leads to

increased disk seeks and kills performance

RAID and LFS do not solve the problem

7

Page 8: Ji-Yong Shin Cornell University In collaboration with Mahesh Balakrishnan (MSR SVC), Tudor Marian (Google), and Hakim Weatherspoon (Cornell) Gecko: Contention-Oblivious

Rest of the Talk• Motivation• Gecko: contention-oblivious disk storage– Sources of I/O contention– New technique: Chained logging– Implementation

• Evaluation• Conclusion

8

Page 9: Ji-Yong Shin Cornell University In collaboration with Mahesh Balakrishnan (MSR SVC), Tudor Marian (Google), and Hakim Weatherspoon (Cornell) Gecko: Contention-Oblivious

What Causes Disk Seeks?• Write-write• Read-read• Write-read

9

VM 1 VM2

Writ

e

Writ

e

Read

Read

Writ

e Read

Page 10: Ji-Yong Shin Cornell University In collaboration with Mahesh Balakrishnan (MSR SVC), Tudor Marian (Google), and Hakim Weatherspoon (Cornell) Gecko: Contention-Oblivious

What Causes Disk Seeks?• Write-write• Read-read• Write-read

• Logging– Write-GC read– Read-GC read

10

VM 1 VM2

Writ

e Read

GC

Page 11: Ji-Yong Shin Cornell University In collaboration with Mahesh Balakrishnan (MSR SVC), Tudor Marian (Google), and Hakim Weatherspoon (Cornell) Gecko: Contention-Oblivious

Principle: A single sequentially accessed disk is better

than multiple randomly seeking disks

11

Page 12: Ji-Yong Shin Cornell University In collaboration with Mahesh Balakrishnan (MSR SVC), Tudor Marian (Google), and Hakim Weatherspoon (Cornell) Gecko: Contention-Oblivious

Gecko’s Chained Logging Design• Separating the log tail from the body

– GC reads do not interrupt the sequential write– 1 uncontended drive >>faster>> N contended drives

Disk 2Disk 1Disk 0

LogTail

Physical Addr Space

GC Garbage

collectionfrom Log Head to Tail

Disk 0 Disk 1 Disk 2

12

Eliminateswrite-write

and reduces write-read contention

Eliminates write-GC read

contention

Page 13: Ji-Yong Shin Cornell University In collaboration with Mahesh Balakrishnan (MSR SVC), Tudor Marian (Google), and Hakim Weatherspoon (Cornell) Gecko: Contention-Oblivious

Gecko’s Chained Logging Design• Smarter “Compact-In-Body” Garbage Collection

Disk 1Disk 0

Physical Addr Space

GC

13

Garbage collectionfrom Head to Body

LogTail

Disk 2

Garbage collection

from Log Head to Tail

Page 14: Ji-Yong Shin Cornell University In collaboration with Mahesh Balakrishnan (MSR SVC), Tudor Marian (Google), and Hakim Weatherspoon (Cornell) Gecko: Contention-Oblivious

Gecko Caching• What happens to reads going to tail drives?

14

LRU Cache

Reduces write-read contention

Disk 1Disk 0 Disk 2

Writ

e

Tail Cache

Read

Read Re

ad

Body Cache (Flash )

Reduces read-read contention

RAMHot data

RAMHot data

MLC SSDWarm data

MLC SSDWarm data

Page 15: Ji-Yong Shin Cornell University In collaboration with Mahesh Balakrishnan (MSR SVC), Tudor Marian (Google), and Hakim Weatherspoon (Cornell) Gecko: Contention-Oblivious

Gecko Properties SummaryNo write-write contention,

No GC-write contention, andReduced read-write contention

15

Disk 1Disk 0 Disk 2

Writ

e

Tail Cache

Read

Read

Body Cache (Flash )

Disk 1’Disk 0’ Disk 2’

Fault tolerance + Read

performance

Mirroring/Striping

Disk 1’Disk 0’Power saving w/o

consistency concerns

Reduced read-write

and read-readcontention

Page 16: Ji-Yong Shin Cornell University In collaboration with Mahesh Balakrishnan (MSR SVC), Tudor Marian (Google), and Hakim Weatherspoon (Cornell) Gecko: Contention-Oblivious

Gecko ImplementationPrimary map: less than 8 GB RAM

for a 8 TB storage

Inverse map: 8 GB flash for a 8 TB storage (written every 1024

writes)

4KB pages

empty filled

Data(in disk)

Physical-to-logical map(in flash)he

ad

head

4-byte entriesLogical-to-physical map(in memory)

16

R

R

W

W

W

W

tail

tail

Disk 0 Disk 1 Disk 2

WW

Page 17: Ji-Yong Shin Cornell University In collaboration with Mahesh Balakrishnan (MSR SVC), Tudor Marian (Google), and Hakim Weatherspoon (Cornell) Gecko: Contention-Oblivious

Evaluation1. How well does Gecko handle GC?2. Performance of Gecko under real workloads?3. Effect of varying Gecko chain length?4. Effectiveness of the tail cache?5. Durability of the flash based tail cache?

17

Page 18: Ji-Yong Shin Cornell University In collaboration with Mahesh Balakrishnan (MSR SVC), Tudor Marian (Google), and Hakim Weatherspoon (Cornell) Gecko: Contention-Oblivious

Evaluation Setup

• In-kernel version– Implemented as block device

for portability– Similar to software RAID

• Hardware– WD 600GB HDD

• Used 512GB of 600GB• 2.5” 10K RPM SATA-600

– Intel MLC (multi level cell) SSD • 240GB SATA-600

• User-level emulator– For fast prototyping– Runs block traces– Tail cache support

18

Page 19: Ji-Yong Shin Cornell University In collaboration with Mahesh Balakrishnan (MSR SVC), Tudor Marian (Google), and Hakim Weatherspoon (Cornell) Gecko: Contention-Oblivious

How Well Does Gecko Handle GC?

Gecko

19

Log + RAID0

Gecko’s aggregate throughput always remains high3X higher aggregate & 4X higher application throughput

2-disk setting; write-only synthetic workload

Max Sequential Throughput of 1 disk Max Sequential

Throughput of 2 disks

Aggregate throughput = Max throughput

Throughput collapses

Page 20: Ji-Yong Shin Cornell University In collaboration with Mahesh Balakrishnan (MSR SVC), Tudor Marian (Google), and Hakim Weatherspoon (Cornell) Gecko: Contention-Oblivious

How Well Does Gecko Handle GC?

20

App throughput can be preserved using smarter GC

Gecko

Page 21: Ji-Yong Shin Cornell University In collaboration with Mahesh Balakrishnan (MSR SVC), Tudor Marian (Google), and Hakim Weatherspoon (Cornell) Gecko: Contention-Oblivious

MS Enterprise and MSR Cambridge Traces

Trace NameEstimatedAddr Space

Total DataAccessed (GB)

Total DataRead (GB)

Total DataWritten (GB) TotalIOReq NumReadReq NumWriteReq

prxy 136 2,076 1,297 779 181,157,932 110,797,984 70,359,948

src1 820 3,107 2,224 884 85,069,608 65,172,645 19,896,963

proj 4,102 2,279 1,937 342 65,841,031 55,809,869 10,031,162

Exchange 4,822 760 300 460 61,179,165 26,324,163 34,855,002

usr 2,461 2,625 2,530 96 58,091,915 50,906,183 7,185,732

LiveMapsBE 6,737 2,344 1,786 558 44,766,484 35,310,420 9,456,064

MSNFS 1,424 303 201 102 29,345,085 19,729,611 9,615,474

DevDivRelease 4,620 428 252 176 18,195,701 12,326,432 5,869,269

prn 770 271 194 77 16,819,297 9,066,281 7,753,016

21

[Narayanan et al. 08, 09]

Page 22: Ji-Yong Shin Cornell University In collaboration with Mahesh Balakrishnan (MSR SVC), Tudor Marian (Google), and Hakim Weatherspoon (Cornell) Gecko: Contention-Oblivious

What is the Performance of Gecko under Real Workloads?

• Gecko– Mirrored chain of length 3– Tail cache (2GB RAM + 32GB SSD)– Body Cache (32GB SSD)

• Log + RAID10 – Mirrored, Log + 3 disk RAID-0 – LRU cache (2GB RAM + 64GB SSD)

22

Gecko showed less read-write contention and higher cache hit rateGecko’s throughput is 2X-3X higher

Gecko Log + RAID10

Mix of 8 workloads: prn, MSNFS, DevDivRelease, proj, Exchange, LiveMapsBE, prxy, and src16 Disk configuration with 200GB of data prefilled

Page 23: Ji-Yong Shin Cornell University In collaboration with Mahesh Balakrishnan (MSR SVC), Tudor Marian (Google), and Hakim Weatherspoon (Cornell) Gecko: Contention-Oblivious

What is the Effect of Varying Gecko Chain Length?

• Same 8 workloads with 200GB data prefilled

• Single uncontended disk • Separating reads and writes

23

better performance

RAID 0

errorbar = stdev

Page 24: Ji-Yong Shin Cornell University In collaboration with Mahesh Balakrishnan (MSR SVC), Tudor Marian (Google), and Hakim Weatherspoon (Cornell) Gecko: Contention-Oblivious

How Effective Is the Tail Cache?• Read hit rate of tail cache (2GB RAM+32GB SSD) on 512GB disk • 21 combinations of 4 to 8 MSR Cambridge and MS Enterprise traces

24

Tail cache can effectively resolve read-write contention

• At least 86% of read hit rate• RAM handles most of hot data

• Amount of data changes hit rate• Still average 80+ % hit rate

Page 25: Ji-Yong Shin Cornell University In collaboration with Mahesh Balakrishnan (MSR SVC), Tudor Marian (Google), and Hakim Weatherspoon (Cornell) Gecko: Contention-Oblivious

How Durable is Flash Based Tail Cache?

• Static analysis of lifetime based on cache hit rate• Use of 2GB RAM extends SSD lifetime

25

2X-8X Lifetime Extension

Lifetime at 40MB/s

Page 26: Ji-Yong Shin Cornell University In collaboration with Mahesh Balakrishnan (MSR SVC), Tudor Marian (Google), and Hakim Weatherspoon (Cornell) Gecko: Contention-Oblivious

Conclusion

26

Page 27: Ji-Yong Shin Cornell University In collaboration with Mahesh Balakrishnan (MSR SVC), Tudor Marian (Google), and Hakim Weatherspoon (Cornell) Gecko: Contention-Oblivious

Question?

27