TRANSCRIPT
-
Princeton University
From Application Requests to Virtual IOPs: Provisioned Key-Value Storage with Libra
David Shue* and Michael J. Freedman
(*now at Google)
-
Shared Cloud
[Diagram: Tenants A, B, and C, each running multiple VMs in a shared cloud]
-
Shared Cloud Storage
[Diagram: Tenants A, B, and C, each with multiple VMs, sharing Key-Value Storage, Block Storage, and SQL Database services]
-
Unpredictable Shared Cloud Storage
[Diagram: Tenants A, B, and C share SSD-backed Key-Value Storage, Block Storage, and SQL Database services; disk IO-bound tenants interfere with one another]
-
Provisioned Shared Key-Value Storage
[Diagram: Tenants A, B, and C each hold a reservation (Reservation A/B/C) expressed in application requests (GET/s, PUT/s, normalized to 1 KB); the shared key-value storage translates these into low-level IO (IOPs, bandwidth) across multiple SSDs]
-
Libra Contributions
• Libra IO Scheduler
  - Provisions low-level IO allocations for app-request reservations w/ high utilization.
  - Supports arbitrary object distributions and workloads.
• 2 key mechanisms
  - Track per-tenant app-request resource profiles.
  - Model IO resources with Virtual IOPs.
6
-
Related Work
7
Work      | Storage Type | App-requests | Work-Conserving | Media
Maestro   | Block        | N            | N               | HDD
mClock    | Block        | N            | Y               | HDD
FlashFQ   | Block        | N            | Y               | SSD
DynamoDB  | Key-Value    | Y            | N               | SSD
-
Provisioned Distributed Key-Value Storage
[Diagram: Tenants A and B, each with multiple VMs, hold Reservation A and Reservation B against a shared key-value store spanning Storage Node 1 ... Storage Node N]
Global Reservation Problem [Pisces OSDI '12]
Local Reservation Problem
-
Provisioned Distributed Key-Value Storage
9
[Diagram: demand from Tenant A and Tenant B VMs flows to data partitions spread across Storage Node 1 ... Storage Node N, against Reservation A and Reservation B]
-
Provisioned Distributed Key-Value Storage
10
[Diagram: Reservation A and Reservation B enforced at each of Storage Node 1 ... Storage Node N]
-
Provisioned Distributed Key-Value Storage
11
[Diagram: global Reservation A and Reservation B split into local per-node shares ResA1, ResB1 ... ResAn, ResBn across Storage Node 1 ... Storage Node N]
-
12
Provisioned Local Key-Value Storage
[Diagram: a GET K app request flows through the Key-Value Protocol to the Persistence Engine (Retrieve K) and the Libra IO Scheduler, which issues low-level IO operations (block reads) to the Physical Disk]
-
Libra Design
13
[Diagram: GET/PUT requests enter the Persistence Engine above the Libra IO Scheduler (DRR) and the Physical Disk; the Libra Provisioning Policy asks "How much IO to consume?" and, together with the Reservation Distribution Policy, "How much IO to provision?"]
Provision IO allocations for tenant app-request reservations
-
14
Provisioning App-request Reservations is Hard
- IO Amplification: a 1 KB PUT costs ≥ a 1 KB write → track tenant app-request resource profiles
- IO Interference: variable IO throughput → underestimate provisionable IO
- Non-linear IO Performance: non-linear cost per KB → model IO with Virtual IOPs
-
Workload-dependent IO Amplification
15
[Diagram: LevelDB (LSM-Tree) PUT path: PUT K,V3 appends to the memtable; FLUSH and COMPACT rewrite index files across levels]
[Chart: IO throughput (kop/s, 0-30) vs GET/PUT request size (1 KB-128 KB), showing PUT write IO]
-
Workload-dependent IO Amplification
16
[Diagram: LevelDB (LSM-Tree) PUT path with FLUSH and COMPACT stages]
[Chart: IO throughput (kop/s, 0-30) vs GET/PUT request size (1 KB-128 KB), showing PUT write IO and FLUSH write IO]
-
Workload-dependent IO Amplification
17
[Diagram: LevelDB (LSM-Tree) PUT path with FLUSH and COMPACT stages]
[Chart: IO throughput (kop/s, 0-30) vs GET/PUT request size (1 KB-128 KB), showing PUT write IO, FLUSH write IO, COMPACT write IO, and COMPACT read IO]
-
Workload-dependent IO Amplification
18
[Diagram: a GET K searches index files across LSM-Tree levels]
[Chart: IO throughput (kop/s, 0-30) vs GET/PUT request size (1 KB-128 KB), showing GET read IO, PUT write IO, FLUSH write IO, COMPACT write IO, and COMPACT read IO]
-
Libra Tracks App-request IO Consumption to Determine IO Allocations
19
[Diagram: the Libra Provisioning Policy tracks IO consumption of GETs, PUTs, FLUSH, and COMPACT; computes per-GET and per-PUT app-request IO profiles (e.g., Tenant A: 5 IO units per PUT); and provisions IO allocations (e.g., Tenant A: 500 IO/s)]
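The profile computation on this slide can be sketched as follows. This is an illustrative simplification, not Libra's actual accounting: `per_request_io` and the even amortization of background IO are assumptions for the example.

```python
# Hypothetical sketch of a per-tenant app-request IO profile: fold the
# background IO (FLUSH, COMPACT) a tenant's workload triggered into the
# average cost of each app request.

def per_request_io(direct_io, background_io, num_requests):
    """IO units charged per app request, amortizing background work
    evenly across the requests that caused it."""
    return (direct_io + background_io) / num_requests

# 25 PUTs whose direct writes cost 25 IO units, plus 100 IO units of
# FLUSH/COMPACT work attributed to them -> 5 IO units per PUT, matching
# the slide's "Tenant A: 5 IO units" example.
profile = per_request_io(25, 100, 25)
assert profile == 5.0

# A 500 IO/s allocation would then back a reservation of 100 PUT/s.
assert 500 / profile == 100.0
```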
-
Unpredictable IO Interference
20
[Heatmap: 1:1 pure read/pure write (4 read tenants, 4 write tenants), pct of ideal throughput (50-100%) vs read IOP size × write IOP size (1-256 KB)]
SSDs offer die-level parallelism and low-latency IOPs, but suffer shared-controller and bus contention, erase-before-write overhead, and FTL read-modify-write garbage collection.
-
Unpredictable IO Interference
21
[Heatmaps: pct of ideal throughput (50-100%) vs read IOP size × write IOP size (1-256 KB) for 1:1 pure read/pure write, 75:25, 50:50, and 25:75 read/write ratios]
-
Libra Underestimates IO Capacity to Ensure Provisionable Throughput
22
Provisionable IO throughput = floor(workloads) (18 kop/s)
[Chart: CDF over read/write experiments of normalized IO throughput (15-45 kop/s) for 1:1 pure read/pure write, 75:25, 50:50, and 25:75 read/write mixes, with the provisionable IO limit marked at the floor]
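The floor computation above is simple enough to state directly. A minimal sketch, assuming hypothetical per-workload throughput measurements (the real values come from the interference experiments in the preceding slides):

```python
# Sketch: Libra sets the provisionable IO limit to the floor (minimum)
# of observed IO throughput across read/write workload mixes, so every
# admitted reservation remains feasible under worst-case interference.

def provisionable_limit(measured_kops):
    """Provisionable IO throughput: the worst observed throughput."""
    return min(measured_kops)

# Hypothetical per-experiment throughputs (kop/s); the slide's floor
# across all mixes is 18 kop/s.
samples = [45.0, 30.2, 22.4, 18.0, 25.7]
assert provisionable_limit(samples) == 18.0
```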
-
Libra Underestimates IO Capacity to Ensure Provisionable Throughput
23
Provisionable IO throughput = floor(workloads) (18 kop/s)
[Chart: same CDF of normalized IO throughput, highlighting the 75:25 read/write mix at 4 KB, 32 KB, and 256 KB request sizes]
-
Libra Underestimates IO Capacity to Ensure Provisionable Throughput
24
Provisionable IO throughput = floor(workloads) (18 kop/s)
[Chart: same CDF of normalized IO throughput, highlighting the 256 KB curves for the 75:25, 50:50, and 25:75 read/write mixes]
-
Non-linear IO Performance
25
[Charts: IO bandwidth (MB/s, 0-250) and IOP throughput (kop/s, 0-40) vs IOP size (1-256 KB), with Max BW and Max IOP/s marked]
IOP cost per KB decreases linearly until 6 KB, then stays constant; per-IOP cost is constant until 6 KB, then increases linearly with size.
-
Non-linear IO Performance
26
[Charts: IO bandwidth (MB/s, 0-250) and IOP throughput (kop/s, 0-40) vs IOP size (1-256 KB) for random and sequential reads and writes, with Max BW and Max IOP/s marked]
-
Libra Uses Virtual IOPs to Model IO Resources
27
...according to a non-linear cost model derived directly from the IOP throughput curves, as shown in Figure 6. Libra calculates the cost model in terms of VOPs-per-byte by dividing the max IOP throughput by the achieved (read or write) IOP throughput, normalized by IOP size:

VOPCPB(IOP-size) = Max-IOP / (Achieved-IOP(IOP-size) × IOP-size)

In this model, the maximum IO throughput in VOP/s is constant for the pure read/write throughput curves.
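The cost model can be sketched directly from the two equations. The device maximum and achieved throughputs below are hypothetical placeholders; real values come from the SSD benchmark curves.

```python
# Sketch of the VOP cost model. VOPs-per-byte divides the max IOP
# throughput by the achieved throughput at a given IOP size, normalized
# by that size; the per-IOP cost then multiplies back by the size.

MAX_IOP = 40_000  # op/s, hypothetical device maximum

def vop_cpb(achieved_iop, iop_size):
    """VOPCPB(IOP-size) = Max-IOP / (Achieved-IOP(IOP-size) * IOP-size)."""
    return MAX_IOP / (achieved_iop * iop_size)

def vop_cost(achieved_iop, iop_size):
    """VOPcost(IOP-size) = VOPCPB(IOP-size) * IOP-size."""
    return vop_cpb(achieved_iop, iop_size) * iop_size

# Key property: cost is inversely proportional to achieved throughput,
# so a workload running at its achieved rate always consumes a constant
# MAX_IOP VOP/s regardless of IOP size.
assert vop_cost(10_000, 1024) * 10_000 == MAX_IOP
assert vop_cost(5_000, 256 * 1024) * 5_000 == MAX_IOP
```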
Internally, Libra's scheduler threads perform distributed deficit round robin [21] (DDRR) to efficiently schedule parallel IO requests (up to 32, which corresponds to the SSD queue depth). For each IO operation, the scheduler computes the number of VOPs consumed:

VOPcost(IOP-size) = VOPCPB(IOP-size) × IOP-size

and deducts this amount from the associated tenant's VOP allocation to enforce resource limits and track resource consumption. DDRR incurs minimal inter-thread synchronization and schedules IO tasks in constant time to efficiently achieve fair sharing and isolation in a work-conserving fashion. In general, virtual-time [12] and round-robin [29] based generalized processor sharing approximations are susceptible to IO throughput fluctuations, since they only provide proportional (not absolute) resource shares. However, Libra's IO capacity threshold ensures that each tenant receives at least its allocated share. Any excess capacity consumed by a tenant can be charged as overage or used by best-effort tenants.
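The deficit-accounting idea behind DDRR can be illustrated with a minimal single-threaded deficit round robin loop. This is a toy sketch, not Libra's distributed, multi-threaded implementation; the function name and quantum values are hypothetical.

```python
# Toy deficit round robin: each backlogged tenant accrues a quantum of
# VOPs per round and dispatches queued IOPs while its deficit covers
# their VOP cost. Libra's DDRR distributes this across scheduler threads.
from collections import deque

def drr_schedule(queues, quantum, rounds):
    """queues: {tenant: deque of per-IOP VOP costs}. Returns the order
    in which tenants' IOPs are dispatched."""
    deficit = {t: 0 for t in queues}
    dispatched = []
    for _ in range(rounds):
        for tenant, q in queues.items():
            deficit[tenant] += quantum
            while q and q[0] <= deficit[tenant]:
                deficit[tenant] -= q.popleft()
                dispatched.append(tenant)
            if not q:
                deficit[tenant] = 0  # no backlog: don't bank credit
    return dispatched

# Tenant A queues two 4-VOP IOPs, tenant B one 8-VOP IOP; with a 4-VOP
# quantum, B must wait a round to accumulate enough deficit.
qs = {"A": deque([4, 4]), "B": deque([8])}
assert drr_schedule(qs, quantum=4, rounds=2) == ["A", "A", "B"]
```

Deducting `VOPcost` rather than a raw IOP count is what lets this loop share the device fairly across tenants with different IOP sizes.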
The VOP cost model allows Libra to charge an IO operation in proportion to its actual resource usage. For example, 10000 1KB reads, 3000 1KB writes and 160 256KB reads all represent about a quarter of the SSD IO throughput at their respective IOP sizes. Hence, Libra charges each workload the same 10000 VOP/s, or about one quarter of the max IOP capacity. Thus, barring interference effects, Libra can divide the full IO throughput arbitrarily among tenant workloads with disparate IOP sizes. Note that while the shape of the read and write VOP cost curves are similar, their magnitudes reflect the relative cost of their operations. Writes are always more expensive than reads, but the gap diminishes as IOP sizes increase, due to lower erase block compaction overhead.

Determining the VOP cost model and minimum IO capacity for a particular SSD configuration requires benchmarking the storage system using a set of experiments similar to the ones described for the throughput curves. While these experiments are by no means exhaustive, they probe a wide range of operating parameters and give a strong indication of SSD performance and the minimum VOP bound. Pathological cases where IO throughput drops below the minimum can be detected by Libra, but should be resolved by higher-level mechanisms. Assuming timely resolution, these minor throughput violations can be absorbed by the provider's SLA, e.g., EBS guarantees 90% of the provisioned throughput over 99.9% of the year [4].

5. Implementation
Libra exposes a POSIX-compliant IO interface, wrapping the underlying IO system calls to enforce resource constraints and interpose scheduling decisions. To utilize Libra, applications, i.e., persistence engines, simply replace their existing IO system calls (read, write, send, recv, etc.) with the corresponding wrappers. Libra also provides a task marking API for applications to tag a thread of execution (task) and its associated IO calls with the current app-request or internal operation context. The Libra IO scheduling framework is implemented in ~20,000 lines of C code as a user-space library.

Libra employs coroutines to handle blocking disk IO and inter-task coordination, i.e., mutexes and conditionals. Coroutines allow Libra to pause a tenant's task execution by swapping out processor state, i.e., registers and stack pointer, to a compact data structure and resume from a different coroutine. Libra uses this facility to reschedule an IO task on resource exhaustion or mutex lock. This allows Libra to delay IO operations that would otherwise exceed a tenant's resource allocation until a subsequent scheduling round when the tenant's resources have been renewed. Libra defaults to synchronous disk operations (O_SYNC) and disables all disk page caching (i.e., O_DIRECT on Linux). The page cache masks IO latency and queue back pressure, which undermines Libra's scheduling decisions and capacity model. We also run Libra in tandem with a noop kernel IO scheduler to force IOPs to disk with minimal delay and interruption.

Enabling Libra in our LevelDB-based storage prototype required less than 30 lines of code for replacing system calls and marking application requests. However, unlocking LevelDB's full performance for synchronous PUTs required extensive modifications. Our prototype enables parallel writes to take full advantage of SSD IO parallelism (LevelDB serializes all client write threads by default). It also issues sequential writes (e.g., FLUSH) in an asynchronous, IO-efficient manner (LevelDB defaults to memory-mapped IO, which is incompatible with O_DIRECT). Lastly, our prototype runs FLUSH and COMPACT operations in parallel (LevelDB schedules both in the same background task).

We implemented Libra as a user-space library for two main reasons. First, the multi-tenancy model we support at the storage node is a single process with multiple threads that handle requests from any tenant, frequently switching from one tenant to another. This model is commonly used by high-performance key-value storage servers [7, 11]. Existing kernel mechanisms for multi-tenant resource allocation (e.g., Linux cgroups [22]) work well for the process-per-tenant model where tasks (processes or threads) are bound to a single cgroup (tenant) over their lifetime. Frequent switching between cgroups, however, is slow due to lock contention in the kernel. Second, tracking app-request resource consumption across system call boundaries requires additional OS support for IO request tagging (as described in [24]), which
• Unifies IO cost into a single metric
• Captures non-linear IO performance
• Provides IO insulation
[Charts: Libra IO cost model, virtual IOP cost (op/KB, 0-3.5) vs IOP size (1-256 KB) for read and write IO; IOP throughput at 1/2 max VOPs (kop/s, 0-40) vs IOP size, with Max Read, Max Write, Half Read IO, and Half Write IO curves]
2 equal-allocation tenants; IO Insulation = 1/2 Max Read/Write
-
Libra Uses Virtual IOPs to Model IO Resources
28
2 equal-allocation tenants; IO Insulation = 1/2 Max Read/Write
[Charts: IOP throughput at 1/2 max VOPs, with the Libra read and write IO models tracking Max Read/Write; Libra IO cost model, virtual IOP cost (op/KB, 0-3.5) vs IOP size (1-256 KB) for read and write IO]
-
Libra Design
29
[Diagram: the Libra Provisioning Policy tracks app-request VOP consumption, provisions VOPs within the provisionable limit, and updates tenant VOP allocations; the Libra IO Scheduler charges tenant IOPs based on VOP cost between the Persistence Engine and the Physical Disk]
Provision IO allocations for tenant app-request reservations
-
Evaluation
• Does Libra's IO resource model achieve accurate resource allocations?
• Does Libra's IO threshold make an acceptable tradeoff of performance for predictability in a real storage stack?
• Can Libra ensure per-tenant app-request reservations while achieving high utilization?
30
-
31
Libra Achieves Accurate IO Allocations
Throughput Ratio = Actual / Expected (IO Insulation)
[Chart: read-write IOP throughput ratio (0-1.2) for a 1 KB read tenant paired with write tenants of 1-256 KB; both read and write tenants stay near the interference-free ideal, with even allocations]
-
32
Libra Achieves Accurate IO Allocations
Throughput Ratio = Actual / Expected (IO Insulation)
[Charts: read-write IOP throughput ratio (0-1.2) for read tenants of 1-256 KB paired with write tenants of 1-256 KB; both stay near the interference-free ideal]
-
33
Libra Achieves Accurate IO Allocations
[Charts: read and write IO cost models, VOP cost (op/KB) vs IOP size (1-256 KB), comparing the libra, constant, and fixed cost models]
-
34
Libra Achieves Accurate IO Allocations
[Charts: read and write IO cost models, VOP cost (op/KB) vs IOP size (1-256 KB), comparing the libra, constant, fixed, and linear cost models]
-
35
Libra Achieves Accurate IO Allocations
Min-Max Ratio = Min Throughput Ratio / Max Throughput Ratio
[Charts: IOP insulation accuracy and Virtual IOP allocation accuracy (MMR, 0-1) for write-write, read-read, and read-write tenant pairs under the libra, linear, constant, and fixed cost models]
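The accuracy metric defined on this slide is a one-liner. A sketch with hypothetical throughput ratios (the real values come from the tenant-pair experiments):

```python
# Min-Max Ratio (MMR): compare the worst and best per-tenant throughput
# ratios. An MMR of 1.0 means every tenant received exactly its expected
# share; lower values mean uneven allocations.

def min_max_ratio(throughput_ratios):
    """MMR = min throughput ratio / max throughput ratio."""
    return min(throughput_ratios) / max(throughput_ratios)

# Hypothetical per-tenant actual/expected ratios.
assert min_max_ratio([0.95, 1.0, 0.98]) == 0.95
assert min_max_ratio([1.0, 1.0]) == 1.0
```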
-
Libra Trades Off Nominal IO Throughput for Predictability
36
[Heatmaps: provisioned VOP throughput (kop/s, 16-30) vs GET request size × PUT request size (1-256 KB) for 75:25, 50:50, and 25:75 GET-PUT mixes with 4K variance]
-
Libra Trades Off Nominal IO Throughput for Predictability
37
Unprovisionable Throughput as a Percentage of Total Throughput

Workload | 10th  | 50th   | 80th   | All
99:1     | 1.6%  | 30.5%  | 40.5%  | 45.8%
25:75    | 1.4%  | 14.9%  | 25.0%  | 34.7%
1:99     | 0.7%  | 12.2%  | 19.5%  | 28.1%

< 10th percentile covered by SLA and higher-level policies
-
Libra Achieves App-request Reservations
38
[Charts: normalized 1 KB GET and PUT throughput (kreq/s, 0-7) over time (100-300 s) across read-heavy, mixed, and write-heavy phases, with Libra vs. no app-request profiles; Libra holds fully provisioned allocations (1.5x/0.5x) with work-conserving consumption of unprovisioned resources]
-
Libra Achieves App-request Reservations
39
[Charts: normalized 1 KB GET and PUT throughput (kreq/s, 0-7) over time (100-300 s) across read-heavy, mixed, and write-heavy phases, with Libra vs. no app-request profiles]
-
Conclusion
40
• Libra IO Scheduler
  - Provisions IO allocations for app-request reservations w/ high utilization.
  - Supports arbitrary object distributions and workloads.
• 2 key mechanisms
  - Track per-tenant app-request resource profiles.
  - Model IO resources with Virtual IOPs.
• Evaluation
  - Achieves accurate low-level IO allocations.
  - Provisions the majority of IO resources over a wide range of workloads.
  - Satisfies app-request reservations w/ high utilization.