
GlusterFS Internals and Directions

Jeff Darcy, Principal Engineer, Red Hat
13 June, 2013

GlusterFS is not a filesystem

Wait . . . what?

● GlusterFS is a scalable general purpose storage platform

● We handle common storage tasks
  ● cluster management and configuration
  ● data distribution and replication
  ● common control and data structures

● That platform can be used many different ways

Interface Possibilities

[Diagram: interface possibilities. Access interfaces (qemu, NFS, SMB, Hadoop, FUSE, Cinder, Swift (UFO), libgfapi, whatever) expose files, blocks, and objects; transports are IP and RDMA; back ends are plain files, block devices (BD), or a DB]

OpenStack and GlusterFS – Current Integration

[Diagram: logical view (Nova nodes, Glance images, Swift objects; Cinder, Glance, and Swift data behind the Swift API) mapped to physical view (KVM compute nodes and GlusterFS storage servers)]

● Separate Compute and Storage Pools

● GlusterFS directly provides Swift object service

● Integration with Keystone
● GeoReplication for multi-site support
● Swift data also available via other protocols
● Supports non-OpenStack use in addition to OpenStack use

OpenStack and GlusterFS - Future Direction

[Diagram: Nova compute nodes, each host running a Gluster guest alongside Hadoop and other guests]

OpenStack and GlusterFS - Future Direction

● POC based on the proposed OpenStack FaaS (File as a Service) specification

● Cinder-like virtual NAS service
● Tenant-specific file shares
● Hypervisor mediated for security
  ● avoids exposing servers to the Quantum tenant network
● Optional multi-site or multi-zone GeoReplication
● FaaS data optionally available to non-OpenStack nodes

● Initial focus on Linux guests
● Windows (SMB) and NFS shares also under consideration

Making Hard Stuff Easier

● Distributed filesystems are notoriously hard to set up
  ● multiple experts for multiple weeks is “normal”

● How about four CLI commands? (example below)
  ● probe peer, create volume, start volume, mount

● We handle cluster membership, process management, port mapping, dynamic configuration changes, etc.
  ● add/remove nodes on the fly
  ● add/remove features on the fly
  ● rolling upgrades
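For concreteness, the four-command setup on a fresh two-node cluster looks roughly like this; hostnames, volume name, brick paths, and the "replica 2" layout are illustrative choices, not the only option:

    # from server1: add server2 to the trusted pool
    gluster peer probe server2

    # create a two-way replicated volume, one brick per server
    gluster volume create myvol replica 2 server1:/bricks/b1 server2:/bricks/b1

    # start serving the volume
    gluster volume start myvol

    # from any client: mount it over FUSE
    mount -t glusterfs server1:/myvol /mnt/myvol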

Q: How Do We Do It?

[Diagram: a GlusterFS volume is a stack of translators: one of the access layers (FUSE, libgfapi, ...), plus all of the client-side translators (distribution, replication, ..., RPC client), plus all of the server-side translators (RPC server, ..., local storage on a local FS)]

A: Modularity!

Deep Dive: Distribution

[Diagram: the translator stack again, with the distribution translator highlighted]

Elastic Hashing

[Diagram: files X and Y hash-mapped onto servers A, B, and C]

● Deterministic mapping: object hash → server
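A minimal sketch of that idea, assuming a made-up hash function and a hard-coded layout (the real DHT translator uses its own 32-bit hash and per-directory layout ranges, but the lookup logic is this simple): each brick owns a contiguous slice of the hash space, and a file's name hash picks the owning brick.

    #include <stdint.h>
    #include <stdio.h>

    /* Toy stand-in for the real name hash; FNV-1a here, for illustration only. */
    static uint32_t name_hash(const char *name)
    {
            uint32_t h = 2166136261u;
            for (; *name; name++)
                    h = (h ^ (uint8_t)*name) * 16777619u;
            return h;
    }

    /* Each brick owns one contiguous range of the 32-bit hash space. */
    struct range { uint32_t start, stop; const char *brick; };

    static const char *pick_brick(const struct range *layout, int n, const char *name)
    {
            uint32_t h = name_hash(name);
            for (int i = 0; i < n; i++)
                    if (h >= layout[i].start && h <= layout[i].stop)
                            return layout[i].brick;
            return NULL;    /* unreachable if the ranges cover the whole space */
    }

    int main(void)
    {
            struct range layout[] = {
                    { 0x00000000u, 0x55555555u, "server-a:/brick" },
                    { 0x55555556u, 0xAAAAAAAAu, "server-b:/brick" },
                    { 0xAAAAAAABu, 0xFFFFFFFFu, "server-c:/brick" },
            };
            printf("file-x -> %s\n", pick_brick(layout, 3, "file-x"));
            printf("file-y -> %s\n", pick_brick(layout, 3, "file-y"));
            return 0;
    }

Because the mapping is deterministic, any client computes the same answer without asking a metadata server. When a brick is added, only files whose hashes fall in the reassigned slices need to move, which is the point of the next two slides.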

Adding a Node

[Diagram: server D joins servers A, B, and C; files X and Y largely stay where they were]

● Minimize reassignment when server set changes

Rebalancing

● Goal: optimal layout with minimal data movement

● Greatly improved algorithms in 3.4

Future: Tiering and Topology Awareness

● General deterministic matching function: file attributes to storage attributes

● Currently both attributes are hashes, but...
  ● file attribute could be account ID, age, ...
  ● storage attribute could be disk type (SSD), replication level, ...
  ● either could be an arbitrary tag

● Rebalance etc. “just work” regardless

● Algorithms can be stacked on top of one another

Tiering Example

[Diagram: a volume selects by path between Development (random placement) and Production; Production selects by age between SSDs (random placement) and Replicated storage (random placement)]
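Reading that example as code, a hedged sketch of stacked deterministic matching functions; the attribute names, path prefix, and age threshold are invented for illustration and are not GlusterFS parameters:

    #include <stdio.h>
    #include <string.h>

    /* Hypothetical file attributes a matching function might inspect. */
    struct file_attrs {
            const char *path;
            int         age_days;
    };

    /* Second level: within production, select by age. */
    static const char *select_by_age(const struct file_attrs *f)
    {
            return (f->age_days < 30) ? "ssd-pool (random placement)"
                                      : "replicated-pool (random placement)";
    }

    /* Top level: select by path, then hand off to the next rule in the stack. */
    static const char *select_by_path(const struct file_attrs *f)
    {
            if (strncmp(f->path, "/development/", 13) == 0)
                    return "development-pool (random placement)";
            return select_by_age(f);
    }

    int main(void)
    {
            struct file_attrs hot  = { "/production/db/hot.log",   3 };
            struct file_attrs cold = { "/production/db/old.log", 400 };
            struct file_attrs dev  = { "/development/scratch.tmp", 1 };
            printf("%s -> %s\n", hot.path,  select_by_path(&hot));
            printf("%s -> %s\n", cold.path, select_by_path(&cold));
            printf("%s -> %s\n", dev.path,  select_by_path(&dev));
            return 0;
    }

Because each rule is deterministic, rebalancing can recompute placements exactly the way the original placement did.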

Deep Dive: Replication

[Diagram: the translator stack again, with the replication translator highlighted]

Replicated Writes

[Diagram: the client sends each write to servers A and B; every write is bracketed by lock, xattr+ (mark pending), write, xattr- (clear pending), unlock]

● Many optimizations avoid the lock/xattr ops
  ● especially for sequential writes

● Still synchronous
  ● don't try this on a high-latency network
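A rough sketch of that sequence, with stub functions standing in for the real RPC calls (in the actual replication translator the xattr+/xattr- steps are changelog extended attributes, and most of these round trips are optimized away):

    #include <stdio.h>

    struct replica { const char *host; };

    /* Stand-ins that just log what would happen on each server. */
    static void lock_range(struct replica *r)    { printf("%s: lock\n",   r->host); }
    static void mark_pending(struct replica *r)  { printf("%s: xattr+\n", r->host); }
    static int  do_write(struct replica *r)      { printf("%s: write\n",  r->host); return 0; }
    static void clear_pending(struct replica *r) { printf("%s: xattr-\n", r->host); }
    static void unlock_range(struct replica *r)  { printf("%s: unlock\n", r->host); }

    /* The synchronous replicated-write sequence from the slide. */
    static int replicated_write(struct replica *reps, int n)
    {
            int ok = 0;
            for (int i = 0; i < n; i++) lock_range(&reps[i]);
            for (int i = 0; i < n; i++) mark_pending(&reps[i]);
            for (int i = 0; i < n; i++) if (do_write(&reps[i]) == 0) ok++;
            for (int i = 0; i < n; i++) clear_pending(&reps[i]);
            for (int i = 0; i < n; i++) unlock_range(&reps[i]);
            return ok > 0 ? 0 : -1;   /* replicas that missed the write get repaired by self-heal */
    }

    int main(void)
    {
            struct replica reps[] = { { "server-a" }, { "server-b" } };
            return replicated_write(reps, 2);
    }

Every step waits for all replicas, which is why the slide warns against high-latency links.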

Self Heal

● Generation 1: on demand

● Generation 2: full manual scan

● Generation 3: parallel, automatic repair
  ● index based
  ● GlusterFS 3.3, RHS 2.0

● Future: journal based
  ● even more precise (i.e. faster)
  ● lower overhead

Split Brain

[Diagram: during a network partition, client 1 writes “foo” to server A while client 2 writes “bar” to server B]

Split Brain (continued)

● In 3.3: basic quorum enforcement
  ● client side, replica-set level
  ● poor approach for N=2

● In 3.4: advanced quorum enforcement
  ● server side, cluster level

● In 3.5: hyper-advanced (?) quorum enforcement
  ● volume level
  ● arbiters (best approach for N=2)
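The heart of any of these quorum checks is small. A hedged sketch of the usual rule (allow writes only when a majority of replicas is reachable, with some tie-breaker for even counts); the exact policy and where it runs (client, server, per volume) is what differs between the 3.3, 3.4, and 3.5 variants:

    #include <stdbool.h>
    #include <stdio.h>

    /* Decide whether writes are allowed given how many of the N replicas
     * are reachable. For an even split, one common policy is to side with
     * the half containing a designated member (first brick or arbiter);
     * that choice is modeled here as a simple flag. */
    static bool have_quorum(int reachable, int total, bool tie_breaker_reachable)
    {
            if (2 * reachable > total)
                    return true;                  /* strict majority */
            if (2 * reachable == total)
                    return tie_breaker_reachable; /* even split: use the tie-breaker */
            return false;                         /* minority: refuse writes, avoid split brain */
    }

    int main(void)
    {
            printf("2 of 3 up              -> %d\n", have_quorum(2, 3, false));
            printf("1 of 2 up, tie-breaker -> %d\n", have_quorum(1, 2, true));
            printf("1 of 3 up              -> %d\n", have_quorum(1, 3, false));
            return 0;
    }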

Access Methods (past)

[Diagram: FUSE, NFS, Samba, Swift, and Hadoop access methods layered on the client translator stack (distribution, replication, ..., RPC client)]

Access Methods (present)

[Diagram: the same access methods, plus qemu going through libgfapi]

Access Methods (future)

[Diagram: the same access methods, plus qemu through libgfapi and your own API]

What is libgfapi?

● User-space library for accessing data in GlusterFS

● Filesystem-like API

● Runs in application process
  ● no FUSE, no copies, no context switches
  ● ...but same volfiles, translators, etc.

● Could be used for Apache/nginx modules, MPI I/O (maybe), Ganesha, etc. ad infinitum

● BTW it's usable from Python too :)
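For a flavour of the API, a minimal C sketch using calls from glfs.h; the hostname, volume name, and header path are illustrative, and error handling is trimmed:

    #include <glusterfs/api/glfs.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
            /* Connect to volume "myvol" served by host "server1" (made-up names). */
            glfs_t *fs = glfs_new("myvol");
            glfs_set_volfile_server(fs, "tcp", "server1", 24007);
            if (glfs_init(fs) != 0) {
                    fprintf(stderr, "glfs_init failed\n");
                    return 1;
            }

            /* Same operations a mounted client would do, with no FUSE in the path. */
            glfs_fd_t *fd = glfs_creat(fs, "/hello.txt", O_RDWR, 0644);
            const char *msg = "hello from libgfapi\n";
            glfs_write(fd, msg, strlen(msg), 0);
            glfs_close(fd);

            glfs_fini(fs);
            return 0;
    }

Link against the library (e.g. -lgfapi); the Python bindings wrap these same calls.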

Translator API

● If libgfapi isn't enough, you can write your own translators (including glupy for Python)

● Most of what we already do is in translators

● It's a public (though not well documented) API
  ● “Translator 101” series, forge.gluster.org

● Translators are right in the I/O path

● Current examples: encryption, erasure coding

● Other possibilities: dedup/compression, format translation, indexing
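To give a sense of the shape, a heavily hedged sketch of a do-nothing translator that passes one fop through to its child. The fop signature and helpers shown match the 3.3/3.4-era API; they change between releases, so treat this as an outline and check xlator.h in your tree. Build boilerplate (config.h handling, the Makefile that produces the loadable shared object, an options table) is omitted.

    /* null.c: pass-through translator sketch, built against the GlusterFS tree. */
    #include "xlator.h"     /* core translator types: xlator_t, call_frame_t, ... */
    #include "defaults.h"   /* default_*_cbk helpers */

    int32_t
    null_open (call_frame_t *frame, xlator_t *this, loc_t *loc,
               int32_t flags, fd_t *fd, dict_t *xdata)
    {
            /* A real translator would inspect or rewrite the call here. */
            STACK_WIND (frame, default_open_cbk,
                        FIRST_CHILD (this), FIRST_CHILD (this)->fops->open,
                        loc, flags, fd, xdata);
            return 0;
    }

    int32_t
    init (xlator_t *this)
    {
            return 0;       /* nothing to set up */
    }

    void
    fini (xlator_t *this)
    {
    }

    struct xlator_fops fops = {
            .open = null_open,   /* fops left out of this table fall through to defaults */
    };

    struct xlator_cbks cbks = { };

Once loaded from a volfile, the translator sits directly in the I/O path, which is what makes things like encryption and erasure coding possible at this layer.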

http://www.gluster.org

● Modularity makes it all possible

● Expect:
  ● OpenStack, Hadoop, OpenStack, Hadoop, ...
    ● marketing made me say that
  ● more front-end protocols
  ● more back-end storage options
  ● more functionality within the I/O path
  ● more performance enhancements

● Make the storage system you want
