Princeton University. From Application Requests to Virtual IOPs: Provisioned Key-Value Storage with Libra. David Shue* and Michael J. Freedman (*now at Google)



  • Shared Cloud

    [Diagram: Tenants A, B, and C, each running multiple VMs on shared cloud infrastructure]

  • Shared Cloud Storage

    [Diagram: Tenants A, B, and C share cloud storage services: key-value storage, block storage, and a SQL database]

  • Unpredictable Shared Cloud Storage

    [Diagram: the same tenants contend for SSD-backed key-value storage alongside disk-IO-bound tenants, making performance unpredictable]

  • Provisioned Shared Key-Value Storage

    Tenants A, B, and C (each with multiple VMs) issue application requests (GET/s and PUT/s, normalized to 1 KB) against per-tenant reservations (Reservation A, B, C). The shared key-value storage layer translates those requests into low-level IO (IOPs and bandwidth) across an array of SSDs.

  • Libra Contributions

    - Libra IO Scheduler: provisions low-level IO allocations for app-request reservations with high utilization; supports arbitrary object distributions and workloads.

    - Two key mechanisms: track per-tenant app-request resource profiles; model IO resources with Virtual IOPs.

  • Related Work

    System   | Storage Type | App-requests | Work-conserving | Media
    Maestro  | Block        | N            | N               | HDD
    mClock   | Block        | N            | Y               | HDD
    FlashFQ  | Block        | N            | Y               | SSD
    DynamoDB | Key-Value    | Y            | N               | SSD

  • Provisioned Distributed Key-Value Storage

    [Diagram: Tenants A and B, each with multiple VMs, hold reservations against a shared key-value store spread across Storage Node 1 .. Storage Node N. This splits into a global reservation problem, dividing each tenant's reservation across nodes (addressed by Pisces [OSDI '12]), and a local reservation problem, enforcing the per-node share]

  • Provisioned Distributed Key-Value Storage

    [Diagram: each storage node holds data partitions for Tenants A and B; tenant VM demand arrives against Reservation A and Reservation B]

  • Provisioned Distributed Key-Value Storage

    [Diagram: each tenant's reservation (Reservation A, Reservation B) must be enforced across Storage Node 1 .. Storage Node N]

  • Provisioned Distributed Key-Value Storage

    [Diagram: the global reservations are split into per-node shares Res A1..Res An and Res B1..Res Bn]

  • Provisioned Local Key-Value Storage

    [Diagram: the local storage stack. GET K enters the Key-Value Protocol layer; the Persistence Engine retrieves K; the Libra IO Scheduler issues the resulting read as an IO operation; the Physical Disk serves it]

  • Libra Design

    Goal: provision IO allocations for tenant app-request reservations.

    Libra Provisioning Policy: decides how much IO to provision for each tenant, using a reservation distribution policy.

    Libra IO Scheduler (DRR): decides how much IO each GET/PUT may consume, interposed between the Persistence Engine and the Physical Disk.

  • Provisioning App-request Reservations is Hard

    - IO amplification: a 1 KB PUT costs at least a 1 KB write. Libra tracks tenant app-request resource profiles.

    - IO interference: IO throughput varies across workloads. Libra underestimates the provisionable IO capacity.

    - Non-linear IO performance: cost per KB is non-linear in IOP size. Libra models IO resources with Virtual IOPs.

  • Workload-dependent IO Amplification

    [Diagram: LevelDB (LSM-tree). PUT K,V3 appends V3 to the memtable; FLUSH writes the memtable and its index to an on-disk sorted table; COMPACT merges overlapping tables across levels]

    [Chart: IO throughput (kop/s, 0-30) vs GET/PUT request size (1 KB-128 KB), showing PUT write IO]

  • Workload-dependent IO Amplification

    [Diagram: the same LevelDB LSM-tree, highlighting the FLUSH step]

    [Chart: IO throughput (kop/s, 0-30) vs GET/PUT request size, now showing PUT write IO plus FLUSH write IO]

  • Workload-dependent IO Amplification

    [Diagram: the same LevelDB LSM-tree, highlighting the COMPACT step]

    [Chart: IO throughput (kop/s, 0-30) vs GET/PUT request size, showing PUT write IO, FLUSH write IO, COMPACT write IO, and COMPACT read IO]

  • Workload-dependent IO Amplification

    [Diagram: GET K may probe the indexes of several on-disk tables across levels before locating K]

    [Chart: IO throughput (kop/s, 0-30) vs GET/PUT request size, showing GET read IO alongside PUT, FLUSH, and COMPACT IO]

  • Libra Tracks App-request IO Consumption to Determine IO Allocations

    [Diagram: over a tracking window, Libra observes app-request IO (e.g., 5 GETs, 25 PUTs) plus the internal FLUSH and COMPACT IO the PUTs trigger, and computes per-GET and per-PUT IO profiles (x IO units per GET/PUT). The Libra Provisioning Policy then converts each tenant's app-request reservation into an IO allocation (e.g., Tenant A: 500 IO/s)]
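The profiling step sketched above might look like the following (a minimal sketch; the numbers, function name, and the rule of charging FLUSH/COMPACT IO to PUTs are illustrative assumptions, not taken from the deck):

```python
def io_profiles(get_count, put_count, get_io, put_io, flush_io, compact_io):
    """Amortize internal write-triggered IO into the per-PUT profile and
    return (IO units per GET, IO units per PUT) for one tracking window."""
    per_get = get_io / get_count
    # FLUSH and COMPACT are triggered by writes, so charge them to PUTs.
    per_put = (put_io + flush_io + compact_io) / put_count
    return per_get, per_put

# Window: 5 GETs and 25 PUTs observed, with internal write amplification.
per_get, per_put = io_profiles(get_count=5, put_count=25,
                               get_io=10, put_io=25,
                               flush_io=50, compact_io=50)

# IO allocation needed to provision e.g. 100 GET/s + 80 PUT/s for a tenant:
allocation = 100 * per_get + 80 * per_put
```

Given a reservation expressed in app requests per second, multiplying by these profiles yields the low-level IO allocation the scheduler must provision.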

  • Unpredictable IO Interference

    [Chart: 3D surface of percent-of-ideal throughput (50-100%) over read IOP size x write IOP size (1-256 KB), for 4 read and 4 write tenants at a 1:1 pure-read/pure-write mix]

    SSDs provide die-level parallelism and low-latency IOPs, but throughput still fluctuates due to shared-controller and bus contention, erase-before-write overhead, and FTL read-modify-write garbage collection.

  • Unpredictable IO Interference

    [Charts: percent-of-ideal throughput (50-100%) surfaces over read IOP size x write IOP size (1-256 KB) for 1:1 pure read/pure write, 75:25, 50:50, and 25:75 read/write ratios]

  • Libra Underestimates IO Capacity to Ensure Provisionable Throughput

    Provisionable IO throughput = floor over workloads (18 Kop/s)

    [Chart: fraction of read/write experiments (0-1) vs normalized IO throughput (15-45 Kop/s) for the 1:1 pure read/pure write, 75:25, 50:50, and 25:75 mixes, with the provisionable IO limit marked]
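The "floor over workloads" rule can be sketched in a few lines (illustrative numbers; the deck's measured floor is about 18 Kop/s):

```python
def provisionable_limit(measured_kops):
    """The provisionable IO limit is the minimum throughput observed
    across all benchmarked read/write mixes and IOP-size combinations:
    any reservation at or below this floor is achievable under every mix."""
    return min(measured_kops)

# Hypothetical per-workload throughput measurements (Kop/s):
limit = provisionable_limit([18.2, 24.0, 31.5, 40.1])
```

Underestimating capacity this way trades nominal throughput for a limit that holds regardless of which workload mix the tenants actually run.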

  • Libra Underestimates IO Capacity to Ensure Provisionable Throughput

    Provisionable IO throughput = floor over workloads (18 Kop/s)

    [Chart: as on the previous slide, additionally highlighting 75:25 workloads with 4K, 32K, and 256K request-size variance]

  • Libra Underestimates IO Capacity to Ensure Provisionable Throughput

    Provisionable IO throughput = floor over workloads (18 Kop/s)

    [Chart: as above, highlighting the 256K-variance workloads for the 75:25, 50:50, and 25:75 mixes]

  • Non-linear IO Performance

    [Charts: IO bandwidth (MB/s, 0-250) and IOP throughput (kop/s, 0-40) vs IOP size (1-256 KB), with Max BW and Max IOP/s bounds]

    IOP cost is roughly constant until about 6 KB, then increases linearly (per operation); equivalently, it decreases linearly until about 6 KB, then stays constant (per KB).

  • Non-linear IO Performance

    [Charts: IO bandwidth (MB/s, 0-250; Read Rand, Read Seq, Write Rand, Write Seq) and IOP throughput (kop/s, 0-40; Max BW, Max IOP/s) vs IOP size (1-256 KB)]

  • Libra Uses Virtual IOPs to Model IO Resources


    (…) according to a non-linear cost model derived directly from the IOP throughput curves, as shown in Figure 6. Libra calculates the cost model in terms of VOPs-per-byte by dividing the max IOP throughput by the achieved (read or write) IOP throughput, normalized by IOP size:

    VOPCPB(IOP-size) = Max-IOP / (Achieved-IOP(IOP-size) × IOP-size)

    In this model, the maximum IO throughput in VOP/s is constant for the pure read/write throughput curves.
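A minimal sketch of this cost model (the max and achieved throughput numbers below are assumed for illustration, not Libra's measurements):

```python
MAX_IOP = 40_000  # peak op/s at the smallest IOP size (assumed)

# Achieved read throughput (op/s) by IOP size in KB; the declining shape
# mirrors the deck's throughput curves but the values are hypothetical.
ACHIEVED_READ = {1: 40_000, 4: 30_000, 16: 12_000, 64: 4_000, 256: 1_250}

def vop_cpb(size_kb, achieved=ACHIEVED_READ):
    """VOPs-per-KB: max IOP throughput divided by the achieved
    throughput at this size, normalized by IOP size."""
    return MAX_IOP / (achieved[size_kb] * size_kb)

def vop_cost(size_kb, achieved=ACHIEVED_READ):
    """VOPs charged for one IO operation of the given size."""
    return vop_cpb(size_kb, achieved) * size_kb

# With these numbers a 1 KB read costs 1 VOP and a 256 KB read costs
# 32 VOPs, so saturating either size consumes the same 40,000 VOP/s.
```

The property the excerpt describes falls out directly: achieved throughput at any size, multiplied by the per-op cost at that size, is the constant Max-IOP in VOP/s.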

    Internally, Libra’s scheduler threads perform distributeddeficit round robin [21] (DDRR) to e�ciently schedule par-allel IO requests (up to 32, which corresponds to the SSDqueue depth). For each IO operation, the scheduler computesthe number of VOPs consumed

    VOPcost(IOP-size) = VOPCPB(IOP-size) ⇥ IOP-sizeand deducts this amount from the associated tenant’s VOPallocation to enforce resource limits and track resource con-sumption. DDRR incurs minimal inter-thread synchroniza-tion and schedules IO tasks in constant time to e�cientlyachieve fair sharing and isolation in a work-conserving fash-ion. In general, virtual-time [12] and round-robin [29] basedgeneralized processor sharing approximations are susceptibleto IO throughput fluctuations, since they only provide propor-tional (not absolute) resource shares. However, Libra’s IOcapacity threshold ensures that each tenant receives at least itsallocated share. Any excess capacity consumed by a tenantcan be charged as overage or used by best-e↵ort tenants.
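The deduction step can be illustrated with a simplified, single-threaded deficit round robin (the real DDRR is distributed across scheduler threads; tenant names, costs, and the quantum here are illustrative):

```python
from collections import deque

def drr_schedule(tenants, quantum):
    """tenants: dict name -> deque of per-IOP VOP costs.
    Each round credits `quantum` VOPs to a tenant's deficit counter;
    an IOP is dispatched only once the deficit covers its VOP cost.
    Returns the dispatch order."""
    deficit = {t: 0.0 for t in tenants}
    order = []
    while any(tenants.values()):
        for t, queue in tenants.items():
            if not queue:
                continue  # skip idle tenants (work-conserving)
            deficit[t] += quantum
            while queue and queue[0] <= deficit[t]:
                deficit[t] -= queue.popleft()  # charge the IOP's VOP cost
                order.append(t)
    return order

# Tenant A issues two cheap IOPs, tenant B one expensive IOP:
order = drr_schedule({"A": deque([1.0, 1.0]), "B": deque([2.0])},
                     quantum=1.0)
```

Because dispatch is gated on accumulated VOP credit rather than operation count, a tenant issuing large IOPs is automatically throttled in proportion to its real resource usage.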

    The VOP cost model allows Libra to charge an IO operation in proportion to its actual resource usage. For example, 10000 1 KB reads, 3000 1 KB writes, and 160 256 KB reads all represent about a quarter of the SSD IO throughput at their respective IOP sizes. Hence, Libra charges each workload the same 10000 VOP/s, or about one quarter of the max IOP capacity. Thus, barring interference effects, Libra can divide the full IO throughput arbitrarily among tenant workloads with disparate IOP sizes. Note that while the shapes of the read and write VOP cost curves are similar, their magnitudes reflect the relative cost of their operations. Writes are always more expensive than reads, but the gap diminishes as IOP sizes increase, due to lower erase-block compaction overhead.

    Determining the VOP cost model and minimum IO capacity for a particular SSD configuration requires benchmarking the storage system using a set of experiments similar to the ones described for the throughput curves. While these experiments are by no means exhaustive, they probe a wide range of operating parameters and give a strong indication of SSD performance and the minimum VOP bound. Pathological cases where IO throughput drops below the minimum can be detected by Libra, but should be resolved by higher-level mechanisms. Assuming timely resolution, these minor throughput violations can be absorbed by the provider's SLA; e.g., EBS guarantees 90% of the provisioned throughput over 99.9% of the year [4].

    5. Implementation

    Libra exposes a POSIX-compliant IO interface, wrapping the underlying IO system calls to enforce resource constraints and interpose scheduling decisions. To utilize Libra, applications, i.e., persistence engines, simply replace their existing IO system calls (read, write, send, recv, etc.) with the corresponding wrappers. Libra also provides a task-marking API for applications to tag a thread of execution (task) and its associated IO calls with the current app-request or internal operation context. The Libra IO scheduling framework is implemented in ~20000 lines of C code as a user-space library.

    Libra employs coroutines to handle blocking disk IO and inter-task coordination, i.e., mutexes and conditionals. Coroutines allow Libra to pause a tenant's task execution by swapping out processor state, i.e., registers and stack pointer, to a compact data structure and resume from a different coroutine. Libra uses this facility to reschedule an IO task on resource exhaustion or mutex lock. This allows Libra to delay IO operations that would otherwise exceed a tenant's resource allocation until a subsequent scheduling round when the tenant's resources have been renewed. Libra defaults to synchronous disk operations (O_SYNC) and disables all disk page caching (i.e., O_DIRECT on Linux). The page cache masks IO latency and queue back-pressure, which undermines Libra's scheduling decisions and capacity model. We also run Libra in tandem with a noop kernel IO scheduler to force IOPs to disk with minimal delay and interruption.
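The pause-and-resume behavior can be sketched with a Python generator standing in for a coroutine (illustrative names and numbers; Libra itself is implemented in C):

```python
def libra_read(tenant, cost):
    """Wrapped IO call: yield 'blocked' (pausing the task) until the
    tenant's VOP allocation covers the cost, then charge the allocation
    and 'issue' the IOP."""
    while tenant["vops"] < cost:
        yield "blocked"  # rescheduled once the allocation is renewed
    tenant["vops"] -= cost
    yield "issued"

tenant = {"vops": 1.0}
task = libra_read(tenant, cost=3.0)
states = [next(task)]      # paused: only 1 VOP available
tenant["vops"] += 5.0      # a later scheduling round renews the allocation
states.append(next(task))  # resumed and issued
```

This mirrors the mechanism described above: the wrapper, not the kernel, decides when an IO operation may proceed, so over-budget tenants are delayed rather than served.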

    Enabling Libra in our LevelDB-based storage prototype required less than 30 lines of code for replacing system calls and marking application requests. However, unlocking LevelDB's full performance for synchronous PUTs required extensive modifications. Our prototype enables parallel writes to take full advantage of SSD IO parallelism (LevelDB serializes all client write threads by default). It also issues sequential writes (e.g., FLUSH) in an asynchronous, IO-efficient manner (LevelDB defaults to memory-mapped IO, which is incompatible with O_DIRECT). Lastly, our prototype runs FLUSH and COMPACT operations in parallel (LevelDB schedules both in the same background task).

    We implemented Libra as a user-space library for two main reasons. First, the multi-tenancy model we support at the storage node is a single process with multiple threads that handle requests from any tenant, frequently switching from one tenant to another. This model is commonly used by high-performance key-value storage servers [7, 11]. Existing kernel mechanisms for multi-tenant resource allocation (e.g., Linux cgroups [22]) work well for the process-per-tenant model, where tasks (processes or threads) are bound to a single cgroup (tenant) over their lifetime. Frequent switching between cgroups, however, is slow due to lock contention in the kernel. Second, tracking app-request resource consumption across system call boundaries requires additional OS support for IO request tagging (as described in [24]), which (…)

    Virtual IOPs unify IO cost into a single metric, capture non-linear IO performance, and provide IO insulation.

    [Charts: Libra IO cost model (Virtual IOP cost, op/KB, 0-3.5, vs IOP size 1-256 KB; read and write IO cost curves) and IOP throughput at 1/2 max VOPs (half read IO and half write IO vs max read and max write). With 2 equal-allocation tenants, IO insulation = 1/2 max read/write]

  • Libra Uses Virtual IOPs to Model IO Resources

    [Charts: as on the previous slide, with the Libra read and write IO models overlaid on the IOP throughput at 1/2 max VOPs. 2 equal-allocation tenants; IO insulation = 1/2 max read/write]

  • Libra Design

    Goal: provision IO allocations for tenant app-request reservations.

    Libra Provisioning Policy: tracks app-request VOP consumption, provisions VOPs within the provisionable limit, and updates tenant VOP allocations.

    Libra IO Scheduler: charges tenant IOPs based on VOP cost, interposed between the Persistence Engine and the Physical Disk.

  • Evaluation

    • Does Libra's IO resource model achieve accurate resource allocations?

    • Does Libra's IO threshold make an acceptable tradeoff of performance for predictability in a real storage stack?

    • Can Libra ensure per-tenant app-request reservations while achieving high utilization?


  • Libra Achieves Accurate IO Allocations

    Throughput Ratio = Actual / Expected (IO insulation)

    [Chart: read-write IOP throughput ratio (0-1.2) for a 1 KB read tenant against write tenants at 1-256 KB write sizes; read and write tenants both stay near the interference-free ideal, with even allocations]

  • Libra Achieves Accurate IO Allocations

    Throughput Ratio = Actual / Expected (IO insulation)

    [Charts: read-write IOP throughput ratios (0-1.2) for read tenants at 1-256 KB against write tenants at 1-256 KB; both remain near the interference-free ideal]

  • Libra Achieves Accurate IO Allocations

    [Charts: read IO cost models (VOP cost, op/KB, 0-1.2) and write IO cost models (op/KB, 0-3.5) vs IOP size (1-256 KB), comparing the libra, constant, and fixed models]

  • Libra Achieves Accurate IO Allocations

    [Charts: the same read and write IO cost models, now also including the linear model alongside libra, constant, and fixed]

  • Libra Achieves Accurate IO Allocations

    Min-Max Ratio (MMR) = min throughput ratio / max throughput ratio

    [Charts: IOP insulation accuracy and Virtual IOP allocation accuracy (MMR, 0-1) for write-write, read-read, and read-write tenant pairs, comparing the libra, linear, constant, and fixed cost models]

  • Libra Trades Off Nominal IO Throughput for Predictability

    [Charts: VOP throughput (16-30 kop/s) surfaces over GET request size x PUT request size (1-256 KB) for 75:25, 50:50, and 25:75 GET-PUT mixes, each with 4K variance]

  • Libra Trades Off Nominal IO Throughput for Predictability

    Throughput below the 10th percentile is covered by the SLA and higher-level policies.

    Unprovisionable throughput as a percentage of total throughput:

    Workload | 10th  | 50th  | 80th  | All
    99:1     | 1.6%  | 30.5% | 40.5% | 45.8%
    25:75    | 1.4%  | 14.9% | 25.0% | 34.7%
    1:99     | 0.7%  | 12.2% | 19.5% | 28.1%

  • Libra Achieves App-request Reservations

    [Charts: Libra-normalized GET and PUT throughput (1 KB, kreq/s, 0-7) over time (100-300 s) for read-heavy, mixed, and write-heavy tenants, with and without app-request profiles. With profiles, allocations are fully provisioned (1.5x and 0.5x reservations are both met) and unprovisioned resources are consumed in a work-conserving manner]

  • Libra Achieves App-request Reservations

    [Charts: the same normalized GET and PUT throughput over time for read-heavy, mixed, and write-heavy tenants]

  • Conclusion

    - Libra IO Scheduler: provisions IO allocations for app-request reservations with high utilization; supports arbitrary object distributions and workloads.

    - Two key mechanisms: track per-tenant app-request resource profiles; model IO resources with Virtual IOPs.

    - Evaluation: achieves accurate low-level IO allocations; provisions the majority of IO resources over a wide range of workloads; satisfies app-request reservations with high utilization.