hybridstore: a cost efficient, high performance storage...
TRANSCRIPT
HybridStore: A Cost Efficient, High Performance Storage System Combining SSDs and HDDs
Youngjae Kim, Aayush Gupta, Bhuvan Urgaonkar, Piotr Berman, and Anand Sivasubramaniam
Oak Ridge National Laboratory Pennsylvania State University
!
July!26th!2011!
!
!
!
Enterprise-scale Storage Systems
2
Enterprise-scale Hard Disk Drive
! Enterprise-scale Storage Systems • Information technology focusing on storage, protection, retrieval of data in
LARGE-SCALE environments
! Data-centric services • File, web & media servers, transaction processing servers
Google's!massive!server!farms!!
A Persistent Hurdle in Enterprise Computing
3
Huge Performance Discrepancy Between CPU and HDD ! Normalized CPU Performance and Media Access Time
Jan-96
Jan-97
Jan-98
Jan-99
Jan-00
Jan-01
Jan-02
Jan-03
Jan-04
Jan-05
Jan-06
Jan-07
Jan-08
Jan-09
0
20
40
60
80
100
120
140
160
180
200
Sca
ling
Fact
or
Multicore CPU
CPU
Source: Intel Measurements
Measured CPU Performance scaling
= 175X
Measured HDD Performance scaling = 1.3X since Jan’96
HDD
o HDD Performance (Random Read IOPs) has been stagnant for decades.
o I/O bottleneck has become increasingly worse over time.
Intel® X25-E Seagate Cheetah 15K.6
4KB$Random$Read:$$35,000$IOPs$ 4,000$IOPs$
Flash SSD Potential !!
Fusion-IO’s ioDrive Duo (MLC) (100KIOPS for Reads, 141KIOPS for Writes)
Scalable Memory Architecture (VXM) 84 VIMMs (Violin Intelligent memory Modules) (1M random IOPs, PCIe x4/x8 I/F, DRAM/Flash SSD)
Violin Memory Inc – Violin 1010
! Enterprise scale storage • Fusion-io’s ioDrive, Texas Memory System’s RamSan-500,
Symmetrix DMX-4 from EMC
! Embedded Storage • PDAs, mobile phones, digital cameras
! Desktop storage • MacBook Air, One Laptop Per Child (OLPC),
game consoles, Intel’s X25-E Extreme SATA Solid-State Drive
Emergence of NAND Flash
4
Embedded, Desktop, and Enterprise
Intel 320 MLC Series (38K IOPS for Reads, 1.4K IOPS for Writes)
Unknown $8,335 / 320GB
$219 / 120GB
Contents
! Introduction
! Background • NAND Flash based SSDs versus HDD • Motivation for HybridStore and Related Works
! Overview of HybridStore • Capacity Planner
− Workload Analyzer − Storage Optimization Solver
! Experimental Results
! Conclusion
5!
Emergence of NAND Flash based SSD
! NAND Flash vs. Hard Disk Drives • Pros:
− Semi-conductor technology, no mechanical parts − Offers lower and more predictable access latencies
" Microseconds (45us Reads / 200us Writes) vs. Milliseconds for Hard Disks
− Lower power consumption − Higher robustness to vibrations and temperature
• Cons:
− Limited lifetime " 10K - 1M erases per block
− High cost " About 8X more expensive than current hard disks
− Random writes can be sometimes slow
6
NAND Flash based SSD
7
SATA 2.5” 32GB SSD
Process Process Process
File System (FAT, Ext2, NTFS…)
Block Device Driver
fwrite (file,data)
Block write (LBA, size)
Control Signal
Page write (Bank, Block, Page)
Application
OS
Flash Banks
Block Interface (SATA, SCSI, etc)
CPU (FTL) Write Buffer Read Cache
Memory Garbage Collector
Address Translation
Wear- Leveler
Device
System Architecture
Existing Storage Server Platform
! Examples of Storage Server Platform • Various network interface
− Fibre Channel, SAS etc • Various types of hard disk drives
− 2.5” SAS drive, 3.5” SATA drive, etc
8!
Can SSDs replace HDDs?
! Challenges • Unique performance characteristics of SSD
− SSD may become worse than HDD due to GC. • Reliability Concerns
− Lifetime of SSDs is limited by the write rates. • Cost Concerns
− NAND Flash is still expensive over HDD.
! HybridStore • Hybrid storage systems that combine HDDs and SSDs.
Homogeneous!SSD!based!
Storage!Server!PlaAorm!
Homogeneous!HDD!based!
Storage!Server!PlaAorm!
HybridStore:!Heterogeneous!!
Storage!Server!PlaAorm!
Existing Proposals in Enterprise
! Hybrid Hard Disk • NAND Flash is on-bard cache in HDD.
! Intel Turbo Memory (ITM) [ACM TOS’08] • Support for the ReadyBoost and Ready Drive of Microsoft
! Two-tier Architecture from Microsoft [Eurosys’09] • Use SSDs as Long-Term read Cache and Short Write Buffer
! ZFS (designed by Sun) • ReadZilla & LogZilla (Implementation of read cache and write buffer)
10!
Hybrid!HDD,!FlashON™!
Intel!Turbo!Memory!
`
Disk Tier
Solid-state Tier
Write!Buffer! Read Cache
Cache miss
Read Write
Flush Update
Storage Tier
TwoOPer!storage!architecture!
Contents
! Introduction
! Background • NAND Flash based SSDs versus HDD • Motivation and Related Works
! HybridStore • Overview of HybridStore • Capacity Planner
− Workload Analyzer − Storage Optimization Solver
! Experimental Results • Synthetic and Realistic Workloads
! Conclusion
11!
Overview of HybridStore
12
Capacity Planner and Dynamic Controller
Flash!Flash!Flash!Flash!
SSD Array HDD Array
Flash!Flash!Flash!Flash!
Capacity(Planner(
Requirement!Lists!
O !ApplicaPons!O !Available!Budget!O !Performance!
Requirement,!etc!
Capacity!Planner!
!
!!
!
!
!
!
!
OpPmal!!
SSD,!HDD!
ConfiguraPon!
Admin!
Dynamic(Controller(
Device Driver
Dynamic Controller
Performance!
LifePme!
Estimator
Dynamic Planning
Time
Capacity Planner Capacity Planner
Long-Term
Perf. Predictor Write Regulator Frag. Busting
Short-Term
Performance!Predictor!
FragmentaPon!Buster!(Front)!
AdapPve!WearOleveler!(Front)!
Write!Regulator!
FragmentaPon!!
Buster!(Back)!
AdapPve!WearO!
leveler!(Back)!
Re-Capacity Planning
Monitor!
Dispatcher!
Client’s requests
Data!ParPPon!
Table!
Performance Budget
Violation Violation
Lifetime Budget
Capacity Planning
13
Problem Formulation: Goal and Constraints?
Cost of HybridStore = Cost SSDs + Cost HDDs + Cost Recur
Capacity Planner
1. Capacity of SSD 2. Workload
Partitioning
Inputs 1. Workload Characteristics 2. Hardware Properties (SSD and HDD)
Constraints 1. Performance requirement 2. Lifetime requirement
1) Perf. of HybridStore > Perf. Budget
2) Lifetime of HybridStore > Lifetime Budget Constraints
Performance Budget
Lifetime Budget
Minimize Cost of HybridStore Goal
Capacity Planning
14
HybridStore Hardware Model ! Provisioning SSDs • Find storage capacity of SSDs and HDDs • Find out the amount of data partition sent to SSDs for a given workload
! Storage Model & Data Partitioning
Finding Workload Attributes
! I/O workloads can be characterized by • Hot (highly accessed) and cold (rarely accessed) data, Read/write ratio,
Sequentiality, Request arrival rate, etc
! Data Classification • A methodology to partition a workload into smaller subsets.
! Finding workload attributes • The entire logical address space of the workload is divided into fixed-size
chunks (or records), then, mapped to different data classes. − 1MB record size is used because 1MB roughly corresponds to the granularity
of data prefetching doe by HDDs/SSDs. • Each data record is represented by the following workload attributes
− Temporality (frequency of accesses per unit time) − Read/write ratio − Request size (spatial locality) – sequential, partially sequential, partially
random, and random. − Request arrival rate
15!
Data Classification
16!
Hierarchical Data Classification
16
• Tuples (Hot or cold, Read ratio, Sequentiality (request size), Arrival rate)
Read (0) or Write (1) ?
0 1
Intensity of Arrival Rate? Eg. Lower (25th perc) Middle (50th perc) Upper (75th perc)
….
…. Sequentiality (Request Size)? Eg. <16KB, <32KB, <64KB, others
….
….
CLA
SS
1
CLA
SS
2
CLA
SS
3
CLA
SS
4 ….
`!Cold (0) or Hot (1)?
0 1 2 3 0 1 2 3
0 1 2 3 0 1 2 3
0 1
Partitioned logical address by records …. …. ….
Record (1MB)
Capacity Planning
! Declaration of Variables
17
! Decision Variables
€
xij = data of class j on yi devices of device type i
yi = number of devices of device type i
Capacity Planner: Problem Formulation
Ci =Capacity of device type i
Ui =Utilization of device type i
Bi =Maximum Bandwidth of device type i
• Properties of data class j!
€
S j = Size of data class j
Fi = Frequency of data class j
Wij =Weight factor for bandwidth of data cass j on yi devices of device type i
• Properties of device type i!
Integer variable
Capacity Planning
! Objective Function
18
€
CostHybridStore = CostInstallation + CostRecurring
€
= ( y(i)i=1
I
∑ ×D$(i) ×Ci + (K$ × y(i)i=1
I
∑ × P(t)dt)t∫ )
€
xij = S ji∑ , (∀j ∈ J)
xijj∑ ≤ (Ui ×Ci) × yi, (∀i ∈ I)
Fj ×xijS j
≤ Bij × yi, (∀i ∈ I, ∀j ∈ J)
Lifetime(i,x) ≤Useful Lifetime of HDD (i ∈ Flash based SSDs)
€
! Constraints
Mixed Integer Linear Programming
€
Expected lifetime =(Size of NAND flash × # of erase cycles)
bytes written per day
Space constraint
Performance constraint
Evaluating HybridPlan ! Solver development
• Developed a trace analyzer (lines of codes less than 500) • Developed the solver of HybridPlan using CPLEX
− CPLEX, a well-regarded Integer Linear Programming (ILP) solver
! Workloads • Synthetic workloads • Realistic workloads
− MSR Cambridge traces, and Microsoft Exchange server Traces
! Devices
19!
Device( Type( Capacity((
(GB)(
Per7GB(
($)(
U:liza
:on(
Read((
(MB/s)(
Write((
(MB/s)(
Latency(
(ms)(
Erase((
(#)(
Power((
(W)(
Seagate(
Cheetah(
15K((
HDD(
146( 1.80( 0.8( 171( 171( 3.6( 7( 12.92(
Seagate(
Barracuda(
7.2K((
HDD(
750( 0.17( 0.8( 125( 125( 4.2( 7( 9.4(
Intel((
X(257E(
SLC((
SSD(
32( 11.96( 0.5( 230( 200( 0.125( 100K( 2(
Intel((
X7257M(
MLC((
SSD(
80( 3.22( 0.5( 220( 80( 0.25( 10K( 2(
Synthetic Workloads
! Description of Synthetic Workloads
20!
Workloads( Index( Read(
(%)(
Size(
(KB)(
Inter7Arrival( I/O(Bandwidth(
Time!(ms)! MB/s! IOPs!
Sequen:al(
Read(
SR1( 80( 128( 100((L)( 1.25( 7(
SR2( 80( 128( 2((M)( 62.5( 7(
SR3( 80( 128( 0.2((H)( 1,250( 7(
Random(
Read(
RR1( 80( 4( 100((L)( 7( 10(
RR2( 80( 4( 2((M)( 7( 500(
RR3( 80( 4( 0.2((H)( 7( 10,000(
Sequen:al(
Write(
SW1( 20( 128( 100((L)( 1.25( 7(
SW2( 20( 128( 2((M)( 62.5( 7(
SW3( 20( 128( 0.2((H)( 1,250( 7(
Random(
Write(
RW1( 20( 4( 100((L)( 7( 10(
RW2( 20( 4( 2((M)( 7( 500(
RW3( 20( 4( 0.2((H)( 7( 10,000(
Impact of I/O Intensity
21!
! Results for Sequential Read Dominant Workloads • SR1, SR2, SR3
0
1
2
3
4
5
6
7
8
9
10
SR1 15K HDD Only
SR2 SR3
Num
ber
of D
evic
es (
#)
7.2K HDD
15K HDD
7.2K HDD
M-SSD
7.2K HDD
M-SSD
15K RPM HDD7.2K RPM HDD
SLC SSDMLC SSD
0
200
400
600
800
1000
1200
1400
1600
1800
2000
2200
2400
2600
SR1 15K HDD Only
SR2 SR3
Tota
l Cost
($)
15K RPM HDD (I)7.2K RPM HDD (I)
SLC SSD (I)MLC SSD (I)
15K RPM HDD (R)7.2K RPM HDD (R)
SLC SSD (R)MLC SSD (R)
• SR1!only!requires!2!slow!7.2K!RPM!HDDs!whereas!it!requires!9!fast!15K!RPM!HDDs.!
• Our!solver!determines!the!right!devices!to!meet!the!capacity!needs.!!
• As!the!arrival!rate!increases,!we!observe!the!need!for!MLC!SSDs!(considering!$/GB!for!
SLC!SSD,!it!is!not!efficient!to!use!compared!to!using!MLC!SSD).!!
• Recurring!cost!(Electricity!cost)!are!quite!small!compared!to!device!installaPon!cost.!!
Impact of I/O Intensity
22!
! Results for Sequential Write Dominant Workloads • SW1, SW2, SW3
• For!write!dominant!SW3,!unlike!observaPon!from!SR!workloads,!the!solver!suggests!
to!use!one!SLC!SSD!instead!of!the!MLC!ones!for!its!readOintensive!counterpart!(SR3).!
It’s!because!SLC!SSD!that!we!use!is!2.5!Pmes!faster!than!the!MLC!one.!!
• Also!it!needs!a!sharp!increase!in!the!number!of!slow!HDDs!because!of!the!vast!$/GB!
difference!between!SLC!SSDs!and!slow!HDDs.!
0
2
4
6
8
10
12
14
16
18
20
SW1 SW2 SW3
Num
ber o
f Dev
ices
(#)
7.2K HDD 7.2K HDDM-SSD
7.2K HDD
S-SSD
15K RPM HDD7.2K RPM HDD
SLC SSDMLC SSD
0 200 400 600 800
1000 1200 1400 1600 1800 2000 2200 2400 2600 2800 3000 3200 3400 3600
SW1 SW2 SW3
Tota
l Cos
t ($)
15K RPM HDD (I)7.2K RPM HDD (I)
SLC SSD (I)MLC SSD (I)
15K RPM HDD (R)7.2K RPM HDD (R)
SLC SSD (R)MLC SSD (R)
Impact of Sequentiality
! Results for Sequential and Random Workloads • SR, SW
23!
0
1
2
3
4
5
6
7
8
9
10
SR2 RR2 SW3RW3
De
vice
(#
)
7.2K HDD
M-SSD
7.2KHDD
M-SSD
7.2K HDD
M-SSD
7.2K HDD
S-SSD
M-SSD
15K RPM HDD7.2K RPM HDD
SLC SSDMLC SSD
• We!clearly!see!the!needs!of!the!larger!number!of!SSDs!as!the!workloads!are!random.!!
• For!RW3,!we!observe!the!needs!of!SLCOSSDs!to!meet!the!high!IOPS!requirement.!!
• As!a!storage!administrator,!it!is!highly!advisable!to!increase!the!sequenPality!of!
incoming!workloads!so!as!not!to!employ!expensive!SSDs.!!
0
500
1000
1500
2000
2500
SR2 RR2 SW3 RW3
Tota
l Cos
t ($)
15K RPM HDD (I)7.2K RPM HDD (I)
SLC SSD (I)MLC SSD (I)
15K RPM HDD (R)7.2K RPM HDD (R)
SLC SSD (R)MLC SSD (R)
Impact of Lifetime Constraint
! Results for without and with lifetime constraints • denoted as (A) and (B) respectively
24!
0 200 400 600 800
1000 1200 1400 1600 1800 2000 2200 2400 2600 2800 3000 3200 3400 3600 3800 4000
RW3(A)RW3(B)
SW3(A)SW3(B)
Tota
l Cos
t ($)
15K RPM HDD (I)7.2K RPM HDD (I)
SLC SSD (I)MLC SSD (I)
15K RPM HDD (R)7.2K RPM HDD (R)
SLC SSD (R)MLC SSD (R)
• LifePme!constraint!is!an!important!metric!in!capacity!provisioning.!!
• Without!lifePme!constraint,!we!see!a!greater!porPon!of!SSDs!being!used!than!with!the!
lifePme!constraint.!!
• For!SW3,!without!lifePme!constraint,!we!may!have!lower!number!of!devices!as!well!as!
the!overall!cost!compared!to!when!the!lifePme!constraint!is!forced,!however,!the!
storage!administrator!needs!to!reOprovision!prematurely,!eventually!increase!the!
overall!costs!over!the!iniPal!esPmated!period.!!
0 1 2 3 4 5 6 7 8 9
10 11 12 13 14 15 16 17 18 19 20
RW3(A)RW3(B)
SW3(A)SW3(B)
Num
ber o
f Dev
ices
(#)
7.2KHDD
M-SSD
7.2K HDD
S-SSDM-SSD
7.2K HDDS-SSD
M-SSD
7.2K HDD
S-SSD
15K RPM HDD7.2K RPM HDD
SLC SSDMLC SSD
Realistic Workloads
! Description of Realistic Workloads
25!
Workload( Size(
(TB)(
Read((
(%)(
Request(Size(
(KB)(
IOPS(
MSR!Trace! 5.7!TB! 68.1! 23.32! 823!
Exchange!
Server!
750GB! 38.3! 16.54! 3,692!
Can SSDs replace HDDs?
! Results for MSR Traces
26!
0
1
10
100
1000
15K HDD Only
7.2K HDD Only
SLC SSD Only
MLC SSD Only
HybridStor
Num
ber o
f Dev
ices
(#)
50
14
364146
10
1
15K RPM HDD7.2K RPM HDD
SLC SSDMLC SSD
1
10
100
1000
10000
100000
1e+06
15K HDD Only
7.2K HDD Only
SLC SSD Only
MLC SSD Only
HybridStore
Tota
l Cos
t ($)
11.3K
2.8K 1.8K576
139.4K
3.2K
37.7K
1.0K 1.3K258412
7
15K HDD (I)7.2K HDD (I)SLC SSD (I)
MLC SSD (I)15K HDD (R)7.2K HDD (R)
SLC SSD (R)MLC SSD (R)
• Employing!7.2K!RPM!HDDs!is!more!economically!efficient!than!employing!15K!RPM!
HDDs.!!
• In!case!of!SSD!systems,!it!requires!several!hundreds!of!SSDs!to!saPsfy!the!capacity!
requirement.!
• The!bounding!factor!for!decisionOmaking!of!HybridPlan!is!not!I/O!bandwidth!but!storage!
capacity!requirement.!!
Efficacy of HybridStore
! Results for MSR Traces
27!
0
1
10
100
1000
15K HDD Only
7.2K HDD Only
SLC SSD Only
MLC SSD Only
HybridStor
Num
ber o
f Dev
ices
(#)
50
14
364146
10
1
15K RPM HDD7.2K RPM HDD
SLC SSDMLC SSD
1
10
100
1000
10000
100000
1e+06
15K HDD Only
7.2K HDD Only
SLC SSD Only
MLC SSD Only
HybridStore
Tota
l Cos
t ($)
11.3K
2.8K 1.8K576
139.4K
3.2K
37.7K
1.0K 1.3K258412
7
15K HDD (I)7.2K HDD (I)SLC SSD (I)
MLC SSD (I)15K HDD (R)7.2K HDD (R)
SLC SSD (R)MLC SSD (R)
• HybridPlan!can!find!the!most!economic!storage!composiPon.!!
• HybridPlan!suggests!2!x!7.2K!RPM!HDDs!and!1!MLC!SSD!for!MSR!Trace.!!
• Total!cost!saving!of!HybridStore!is!about!85%!compared!to!highOend!HDD!only!system.!!
• 99%!data!are!classified!into!C32,!a!data!class!storing!data!rarely!accessed.!!
0.001 0.01
0.1 1
10 100
1000 10000
C0 C2 C4 C6 C8 C10C12
C14C16
C18C20
C22C24
C26C28
C30C32D
ata
Cla
ss D
istri
butio
n (G
B)
7.2K RPM HDD MLC SSD
Lessons Learned
28
I. We developed an capacity planner that finds the most economically efficient storage configurations while meeting the performance and lifetime requirements of devices
II. We provided a general form of comprehensive methodology using a well-known technique for optimization problems, Mixed Integer Linear Programming (LP)
III. Experiments showed that our capacity planner is able to identify close to minimum SSD capacity needed to meet a specified performance goal for realistic workloads while ensuring similar performance as compared to a comparatively more over-provisioned system