
Evaluating Cloud Storage Strategies
James Bottomley, CTO, Server Virtualization


Introduction to Storage

• Attachments
  - Local (direct attached – cheap): SAS, SATA
  - Remote (SAN, NAS – expensive): FC, network
• Types
  - Block: spinning disk drive, SSD, RAID unit
  - File: NFS, CEPH
  - Object: RADOS, PCS


Storage Performance Comparison


Storage Cost Comparison


A Closer Look at the Terms

• Block device
  - A unit of storage
  - May be divided inflexibly (by partitioning)
  - Usually locally attached, but may be on a SAN
• File-based storage
  - Exports views of a filesystem via NFS, CIFS or other protocols
  - Is flexible: storage in views can be expanded and contracted on the fly
  - Suffers from metadata issues on the server
• Object storage
  - Really just means a flexible block device
  - May be expanded and contracted on the fly
  - Easily administrable (unlike LUN partitioning in SANs); the three access models are sketched below
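To make the distinction concrete, here is a minimal sketch of how a client addresses each of the three kinds of storage. It is purely illustrative and not tied to any product named in this deck: a local image file stands in for the block device, a local path for the exported filesystem, and an in-memory dictionary for the object keyspace; all file and key names are made up.

    import os

    # Block: addressed by byte/sector offset into a fixed-size device.
    # A local image file stands in for the device here (hypothetical name).
    fd = os.open("disk.img", os.O_RDWR | os.O_CREAT)
    os.ftruncate(fd, 1 << 20)            # a 1 MiB "device"
    os.pwrite(fd, b"bootblock", 0)       # write at an absolute offset; no names involved
    os.close(fd)

    # File: addressed by path through a filesystem (local, NFS, CIFS, ...).
    # Every create/rename/extend also updates filesystem metadata on the server.
    os.makedirs("export/home", exist_ok=True)
    with open("export/home/report.txt", "w") as f:
        f.write("hello")

    # Object: addressed by a flat key; the store grows and shrinks by whole objects.
    # A dictionary stands in for a RADOS/PCS-style keyspace.
    object_store = {}
    object_store["ve-root-0001"] = b"..."    # PUT
    data = object_store["ve-root-0001"]      # GET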


Storage Types Comparison

• Local (direct attached)
  - Inelastic
  - Hard to aggregate
  - Attached to individual systems
• SAN / NAS
  - Slightly elastic
  - Fixed size
  - Good B/W
  - Dedicated network
  - Based on SAN
  - Limited scaling
• Cloud utility
  - Simple web API
  - No easy way to update objects
  - Slow
• Hosting utility (CEPH, Gluster)
  - Tuned to disk-image-size objects
  - Designed for rapid update
  - Scalable B/W
  - Object size tuning problem


Object vs File and the Metadata Problem

• A large number of cloud storage systems are file based
  - CEPH, Gluster
• The specific problem is that updating any file requires a change in the metadata
  - This produces a hot spot in the journal
  - As well as locking hierarchy issues
  - And communication with the metadata server
  - All of which slow the operations down
• Object storage only uses metadata when objects are resized, created or destroyed
  - Using a fixed-size object incurs no metadata overhead whatsoever
• So objects providing virtual environment roots allow efficient embedded filesystems with zero metadata overhead (see the sketch below)
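A hedged sketch of the point above: once an object has a fixed size and its blocks are preallocated, rewriting it in place changes no size or allocation metadata, whereas an extending write to an ordinary file does, and on a clustered filesystem that difference becomes journal and metadata-server traffic. A local file stands in for the object, the filenames are hypothetical, and os.posix_fallocate makes this Linux/Unix-specific.

    import os

    SIZE = 64 * 1024 * 1024                      # fixed object size (illustrative)
    fd = os.open("ve-root.obj", os.O_RDWR | os.O_CREAT)
    os.posix_fallocate(fd, 0, SIZE)              # size and blocks are set exactly once

    before = os.fstat(fd).st_size
    os.pwrite(fd, b"guest filesystem block", 4096)   # in-place update at a fixed offset
    assert os.fstat(fd).st_size == before        # size unchanged: no size/allocation metadata to update
    os.close(fd)

    # By contrast, every append to a growing file changes its size (and usually its
    # block allocation), which a cluster filesystem must journal and coordinate.
    with open("growing.log", "ab") as f:
        f.write(b"another record\n")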


FUSE Issues

• FUSE is the Linux userspace filesystem framework (a minimal example is sketched below)
• Main problem is that it is incredibly SLOW
• However, it is very useful, so a large number of cloud filesystems use it
  - Gluster
• Parallels originally avoided using it
• However, now we've decided we'll fix it for everyone
• Parallels engineers are currently interacting with the Linux filesystems and FUSE lists
• Objective is to add write caching and mtime fixes to accelerate FUSE
• Tests show we can get ~95% of the performance of a natively written filesystem
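To show where the FUSE cost comes from, here is a deliberately tiny read-only filesystem: every getattr, readdir and read below is a kernel-to-userspace round trip through the FUSE layer. It assumes the third-party fusepy package and libfuse are installed, uses a hypothetical mountpoint, and is an illustration rather than Gluster's or Parallels' code.

    import errno
    import stat
    import sys
    from fuse import FUSE, FuseOSError, Operations

    HELLO = b"hello from userspace\n"

    class HelloFS(Operations):
        def getattr(self, path, fh=None):
            if path == "/":
                return dict(st_mode=(stat.S_IFDIR | 0o755), st_nlink=2)
            if path == "/hello":
                return dict(st_mode=(stat.S_IFREG | 0o444), st_nlink=1, st_size=len(HELLO))
            raise FuseOSError(errno.ENOENT)

        def readdir(self, path, fh):
            return [".", "..", "hello"]

        def read(self, path, size, offset, fh):
            return HELLO[offset:offset + size]

    if __name__ == "__main__":
        # usage: python hellofs.py /mnt/hello
        FUSE(HelloFS(), sys.argv[1], foreground=True, ro=True)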


Consistency

• Strong consistency is hard to achieve in clusters
  - Strong consistency means that all updates are seen immediately after they are committed
  - Strong consistency is most often violated across cluster reconfigurations
  - Ironically, this is precisely when you usually need it (HA)
  - Sheepdog, CEPH, PStorage
• Eventual consistency is the usual norm
  - Means that all updates are eventually seen, but may not be immediately visible after they are committed
  - SWIFT, Gluster (which does have a much slower strong consistency quorum enforcement mode; see the quorum sketch below)
• Weak consistency
  - Does not guarantee write ordering and visibility
  - Too weak to be useful for most cloud storage
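One way to see the strong/eventual split is the classic read/write quorum rule: with N replicas, writes acknowledged by W nodes and reads consulting R nodes are guaranteed to overlap (and therefore return the latest committed update) only when R + W > N. The sketch below is generic and is not the algorithm of any particular system named above.

    import random

    N, W, R = 3, 2, 2            # illustrative replica count and quorum sizes (R + W > N)
    replicas = [(0, None)] * N   # each replica stores (version, value)

    def write(value, version):
        acked = random.sample(range(N), W)        # only W replicas see the write immediately
        for i in acked:
            replicas[i] = (version, value)
        return acked

    def read():
        consulted = random.sample(range(N), R)    # any R replicas
        return max(replicas[i] for i in consulted)  # newest version wins

    write("v1 of the object", version=1)
    assert read()[1] == "v1 of the object"        # guaranteed only because R + W > N

With eventual consistency (R + W <= N), the read set can miss every replica that saw the write, so the latest value shows up only after background replication catches up.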


Performance and Scalability

• Cloud storage must be designed to scale not just per node, but also per virtual environment per node
• This requires that there be no bottlenecks connecting a virtual environment to storage
  - Sheepdog problem: it uses a single-threaded per-node gateway process, so its scalability per VE is poor
• Ideally, a direct connection should be made between the virtual environment using the object and the storage providing it, with no intermediate broker
  - Or using an intermediate broker tuned for scalability
• Chunking (large block size for objects) also improves performance (see the sketch below)
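A hedged sketch of what chunking buys: a large virtual-disk object is split into fixed-size chunks, a byte offset maps directly to a (chunk index, offset-in-chunk) pair, and the client can then talk straight to whichever server holds that chunk instead of funnelling every request through one gateway. The chunk size, server names and round-robin placement below are made-up illustrations, not any particular product's layout.

    CHUNK_SIZE = 64 * 1024 * 1024          # 64 MiB chunks (assumed, not a product constant)
    CHUNK_SERVERS = ["cs0", "cs1", "cs2"]  # hypothetical chunk-server names

    def locate(offset: int) -> tuple[str, int, int]:
        chunk_index = offset // CHUNK_SIZE
        chunk_offset = offset % CHUNK_SIZE
        server = CHUNK_SERVERS[chunk_index % len(CHUNK_SERVERS)]  # toy placement: round robin
        return server, chunk_index, chunk_offset

    # A guest write at offset 0x2800_0000 goes straight to the server owning chunk 10:
    print(locate(0x2800_0000))   # ('cs1', 10, 0)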


Requirements for Hosting Storage

• The cardinal hosting requirement is that existing local storage should be repurposed as generic object-based storage for:
  1. Supporting existing hosting environments and additional services
  2. Enabling the provision of cloud services
• Equating to the technical requirements:
  1. Performance must be wire-speed SATA (100 MB/s), tuned exactly for GB-sized objects containing small files (a back-of-the-envelope check follows this list)
  2. Storage must be object based to avoid metadata issues
  3. Objects should be capable of rapid random read/write updates
  4. Storage bandwidth should scale linearly with the cluster
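For context on the 100 MB/s figure: gigabit Ethernet carries at most 1 Gbit/s, or 125 MB/s of raw payload, and framing/protocol overhead leaves roughly 110 MB/s or so usable, so a 100 MB/s per-node target effectively saturates a single 1GigE link without requiring 10GigE. The overhead allowance below is a rough assumption, not a measurement.

    link_bits_per_s = 1_000_000_000          # gigabit Ethernet
    raw_bytes_per_s = link_bits_per_s / 8    # 125 MB/s before any protocol overhead
    overhead = 0.10                          # rough allowance for Ethernet/IP/TCP framing (assumption)
    usable = raw_bytes_per_s * (1 - overhead)
    print(f"~{usable / 1e6:.0f} MB/s usable")   # ~112 MB/s, so 100 MB/s fills the link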


Simple Requirements for Additional Benefits

• Hosting enhancements
  1. Free storage from individual nodes
     • Easy, fast migration of virtual environments
     • High availability
  2. Simple and efficient resizing, with assist for legacy roots (ext3)
     • Makes storage easier to sell in increments (an offline-resize sketch follows this list)
  3. Cloning and snapshotting
     • Value add for templating block-based roots
     • Permits easy backup
  4. Redundancy
     • Allows different storage SLAs for different prices
• Cloud enhancements (ideal storage solution)
  1. Dropbox-like services
  2. Storage as a Service (like S3)
  3. Storage on demand
  4. Tiered storage pricing
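As a rough illustration of the resizing idea for a legacy ext3/ext4 root held in an image file (offline here; the in-place, legacy-assisted resizing described above is the harder online case): grow the image, then let resize2fs expand the filesystem into the new space. The image path is hypothetical, the image is assumed to already contain an ext3/ext4 filesystem, and the standard e2fsprogs tools are assumed to be installed.

    import os
    import subprocess

    IMAGE = "ve-root.ext3"                 # hypothetical image containing an ext3/ext4 filesystem
    NEW_SIZE = 20 * 1024**3                # grow the container to 20 GiB (sparse, so cheap)

    os.truncate(IMAGE, NEW_SIZE)           # enlarge the backing object
    rc = subprocess.run(["e2fsck", "-f", "-p", IMAGE]).returncode
    if rc > 1:                             # 0 = clean, 1 = errors corrected; anything else is fatal
        raise SystemExit(f"e2fsck failed with status {rc}")
    subprocess.run(["resize2fs", IMAGE], check=True)   # grow the filesystem to fill the image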


Ideal Solution

• Technical specs
  - Metadata is the key to improving performance
  - Large static objects with rapid updates have fixed metadata
  - 100 MB/s performance over gigabit Ethernet (no 10GigE requirement)
• Avoid
  - Anything like a filesystem (CEPH, Gluster) because of
    • Locking problems
    • Speed issues with the per-file need to consult metadata
  - Anything using FUSE (Gluster)
    • At least anything using FUSE without the Parallels acceleration patches
  - Anything with a single-threaded connection multiplexor (Sheepdog)
    • Per cluster is worse (kills all scalability)
    • Per node is still bad (kills VE scalability)


Introducing Parallels Cloud Storage

• Why choose us?
  - We're the experts in the field (we studied the problem)
  - We fixed FUSE
  - We redid the Linux loop device to work efficiently for virtual environment roots
    • In collaboration with Oracle, who did the Direct I/O patches
  - The loop device was also modified to do snapshotting and legacy filesystem resizing
  - All the necessary infrastructure patches are upstream in Linux
    • Or are moving that way
• What we provide
  - Complete leverage of existing local node storage
  - Strong consistency and redundancy
  - Wire-speed transfers because of an optimised data architecture
    • Up to 100 MB/s/node over 1GigE
  - Hot object tiering and SSD caching


Parallels Cloud Storage Architecture


Future Features

• Chunk-server-based snapshotting
• De-duplication
• Thin provisioning
  - The apparent storage size can be much larger than the in-use backing store because of the sparsity of objects (see the sketch below)
  - Also provides the ability to do dynamic in-place upgrades of actual storage capacity
• Innovative redundancy algorithms
• Geographic object replication for advanced disaster recovery
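A minimal local illustration of the sparsity point: a sparse file can present a large apparent size while consuming almost no backing store until data is actually written. This is generic POSIX behaviour used to illustrate the idea, not Parallels Cloud Storage internals; the filename is hypothetical and st_blocks reporting assumes a Unix filesystem with sparse-file support.

    import os

    OBJ = "thin-object.img"                    # hypothetical object backing file
    with open(OBJ, "wb") as f:
        f.truncate(10 * 1024**3)               # present a 10 GiB object...
    st = os.stat(OBJ)
    print("apparent size:", st.st_size)            # 10737418240 bytes
    print("actually used:", st.st_blocks * 512)    # ~0 bytes until something is written

    with open(OBJ, "r+b") as f:
        f.seek(4096)
        f.write(b"real data")                  # only now does the filesystem allocate blocks
    print("after write:  ", os.stat(OBJ).st_blocks * 512)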


Conclusions

• Getting cloud storage right for current hosting needs is not a simple problem
  - The basic construction of many cloud storage offerings is unsuitable for hosting provider environments
• Parallels has devoted considerable study and effort to mapping the needs of hosters onto cloud storage
• Parallels has studied the strengths and weaknesses of current cloud storage offerings and incorporated the best into our cloud storage offerings
  - While attempting to eliminate all the negative issues
  - And improve performance
• Parallels will leverage (and enhance) open source to achieve the best cloud storage system for hosters