Distributed Object Storage System Ceph in Practice

Dominik Joe Pantůček
dominik.pantucek@trustica.cz

Trustica

8.10.2016

Dominik Joe Pantucek Trustica Practical Ceph 8.10.2016 1 / 32


Legal notice.


Why?

• Why daily operations?
  • We need convenient deployment of VMs: private IaaS cloud.
• Why cloud?
  • It’s convenient.
• Why private cloud?
  • Negotiations with cloud service providers are tough.
• Why Ceph?
  • Everything else failed miserably.


What is Ceph?

• Distributed,
• highly scalable,
• open source,
• object storage...
  • block storage,
  • file system
  • and other applications.


A parallel universe, where, thanks to masterfully written Ceph, peace and prosperity prevail.


Ceph Architecture

• Daemons:
  • OSD – Object Storage Daemon
  • Monitor
  • Metadata Server – MDS
• Nodes:
  • OSD, Monitor, MDS,
  • client,
  • admin.


Ceph OSD

• Object Storage Device
• Disk or at least part of it.
• The Ceph OSD daemon.
• Multiple OSDs in one physical node,
• managed in pools by monitors.


Ceph Monitor

• Cluster state – cluster map,
• consisting of maps:
  • monitor map,
  • OSD map,
  • placement group map,
  • CRUSH map,
  • MDS map.
• Only 1 monitor and 2 OSDs are needed for a bare-minimum cluster.


Ceph (OSD) Hierarchy

Cluster:
• Node
  • OSD
  • Monitor
  • MDS
  • Client


Object placement

• No files or directories.
• Object storage stores only objects.
• Behaves like a key:value store.
• The value can be really big.
• Find balance:
  • across all OSDs
  • and across nodes.


Placement groups

• Set of OSDs within a pool
• which can store an object.
• Size is based on the pool's number of replicas.
• Vast number of objects – compared to the number of PGs.
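
The last bullet implies a many-to-one mapping: a huge number of objects is folded onto a comparatively small number of placement groups. A minimal sketch of that idea follows. It assumes SHA-256 as a stand-in for Ceph's own hash function and a plain modulo instead of Ceph's actual PG selection; the function name `object_to_pg` and the object names are hypothetical.

```python
import hashlib
from collections import Counter


def object_to_pg(object_name, pg_num):
    """Map an object name to a placement group index.

    Illustrative only: SHA-256 and a plain modulo are assumptions,
    not the hash and reduction Ceph really uses.
    """
    digest = hashlib.sha256(object_name.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "little") % pg_num


# Many objects, few placement groups: count how many objects land in each PG.
pg_num = 64
counts = Counter(object_to_pg(f"object-{i}", pg_num) for i in range(100_000))
print("PGs used:", len(counts))
print("objects per PG (min, max):", min(counts.values()), max(counts.values()))
```

Because the mapping depends only on the object name and the PG count, any client can repeat the computation and reach the same placement group without asking anyone.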


CRUSH

• Controlled Replication Under Scalable Hashing.
• Each client computes which PGs to use.
• Describes cluster hierarchy as a weighted tree.
• Selects sets of disks based on deterministic criteria.
• Does not need any central authority.
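
To make "deterministic" and "weighted tree" concrete, here is a heavily simplified sketch in the spirit of CRUSH. It is not the real algorithm: it assumes a two-level tree (hosts, then OSDs), uses SHA-256 as the hash, and picks items by a weighted "longest straw" draw. The cluster layout, weights, and the names `CLUSTER`, `_draw`, and `place` are all hypothetical.

```python
import hashlib
import math

# Hypothetical cluster tree: host -> {OSD id: weight}.
CLUSTER = {
    "node-a": {0: 1.0, 1: 1.0},
    "node-b": {2: 2.0},
    "node-c": {3: 1.0, 4: 0.5},
}


def _draw(pg, item, weight):
    """Deterministic pseudo-random straw length for one item.

    The hash depends only on (pg, item), so every client computes the
    same value. Heavier items get values closer to zero, i.e. longer straws.
    """
    h = hashlib.sha256(f"{pg}:{item}".encode("utf-8")).digest()
    u = (int.from_bytes(h[:8], "big") + 1) / 2**64  # uniform in (0, 1]
    return math.log(u) / weight


def place(pg, replicas):
    """Pick `replicas` OSDs for a PG, each on a different host."""
    # Step 1: rank hosts by their draw; a host's weight is the sum of its OSDs.
    hosts = sorted(
        CLUSTER,
        key=lambda host: _draw(pg, host, sum(CLUSTER[host].values())),
        reverse=True,
    )[:replicas]
    # Step 2: inside each chosen host, pick the OSD with the longest straw.
    return [
        max(CLUSTER[host], key=lambda osd: _draw(pg, f"osd.{osd}", CLUSTER[host][osd]))
        for host in hosts
    ]


print(place(pg=7, replicas=2))
```

Nothing in `place` consults shared state beyond the cluster description itself, which is the point of the last bullet: given the same map, every client arrives at the same set of disks on its own.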


RADOS

• Set of protocols governing CRUSH,
• RBD and
• metadata.
• Reliable Autonomic Distributed Object Store.


Monitors

• Cluster management,
• voting,
• OSD handling,
• yes, everything.
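
The "voting" bullet comes down to majority arithmetic: monitors can only agree on cluster state while more than half of them can talk to each other. The short sketch below works out what that means for common monitor counts; the helper names `quorum_size` and `tolerated_failures` are made up for illustration.

```python
def quorum_size(monitors):
    """Smallest strict majority among `monitors` monitors."""
    return monitors // 2 + 1


def tolerated_failures(monitors):
    """How many monitors may fail while a majority still remains."""
    return monitors - quorum_size(monitors)


for n in (1, 2, 3, 4, 5):
    print(f"{n} monitors: quorum {quorum_size(n)}, tolerates {tolerated_failures(n)} failure(s)")
```

The numbers show why odd monitor counts are the usual choice: going from 3 to 4 monitors raises the quorum size without tolerating any additional failure.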


Ceph File System

• Built on top of the underlying objects,
• POSIX file system with advanced features,
• directory size is reported immediately,
• virtually no limits on file sizes,
• data/metadata separate redundancy settings:
  • different pools.


Metadata Server

• Separate pool for metadata,
• MDS serves object metadata,
• reasonable performance,
• required for the actual FS implementation.


In the cloud

• Amazon Simple Storage Service (S3)?
• OpenStack?
• Swift.
• Ceph backend.


RBD for disks

• Block device,
• from multiple objects,
• naming conventions – prefix,
• provides caching,
• true concurrent access.
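
"From multiple objects" plus "naming conventions – prefix" means a byte offset on the block device can be translated into an object name and an offset inside that object. The sketch below shows the arithmetic under several assumptions: 4 MiB objects (order 22), no custom striping, an object name formed as the prefix followed by a 16-digit hexadecimal object number, and a made-up prefix `rbd_data.abc123`. The function name is hypothetical.

```python
def rbd_object_for_offset(prefix, offset, order=22):
    """Translate a device byte offset into (object name, offset within object).

    Assumptions: objects are 2**order bytes (4 MiB for order 22), the image
    uses no custom striping, and object names are '<prefix>.<16 hex digits>'.
    """
    object_size = 1 << order
    object_no = offset // object_size
    return f"{prefix}.{object_no:016x}", offset % object_size


# Hypothetical image prefix; real prefixes are assigned by the cluster.
prefix = "rbd_data.abc123"
print(rbd_object_for_offset(prefix, 0))
print(rbd_object_for_offset(prefix, 4 * 2**20 + 5))
print(rbd_object_for_offset(prefix, 10 * 2**30))
```

Only objects that have actually been written need to exist, which is what makes the thin provisioning discussed later possible.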


Ceph FS for shared data

• ISO images,
• configurations,
• shared data.


Ceph FS release

• Stable in 10 series,
• available as FUSE.


Kernel support

• Both Ceph RBD and File Systems
• Support merged into 2.6.34 (released May 16, 2010)
• Convenient usage
• Ceph FS:

    mount -t ceph mon1,mon2,mon3:/ /mnt/ceph

• RBD:

    rbd map mypool/mydevice
    mkfs -t ext4 /dev/rbd1


But...


RBD thin provisioning

• Provision 20 PB,
• wait 8 days before de-provisioning finishes
• ...
• profit???
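
A back-of-the-envelope calculation suggests why removing a 20 PB thin image could take days. The slide only gives the 20 PB and the 8 days; everything else here is an assumption: binary petabytes, 4 MiB objects, and a removal that has to visit every object the image could possibly contain, whether or not it was ever written.

```python
# Assumptions (not from the slides): binary units, 4 MiB objects,
# and a removal that checks every potential object of the image.
image_size = 20 * 2**50          # 20 PB in bytes
object_size = 4 * 2**20          # 4 MiB per object
duration_s = 8 * 24 * 3600       # 8 days in seconds

potential_objects = image_size // object_size
rate = potential_objects / duration_s

print(f"potential objects: {potential_objects:,}")
print(f"implied rate: {rate:,.0f} object operations per second")
```

Under these assumptions the cluster would have to process several thousand object operations per second for the whole 8 days, even though the image holds almost no data.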


RBD locking

• $ rbd map
• On two nodes,
• on one mkfs, on the other mount,
• everything works,
• another time on a single node...
• processes stuck in D state indefinitely.
• $ reboot


Ceph-FS locking

• Processes in D state any time,
• no way to unmount,
• $ reboot


Log clogging

• libceph ...
• Does not prevent the locking bugs.
• Does not even log them ...
• $ reboot


Userspace RBD

• RBD as storage backend for VM
• in KVM – librbd in userspace (qemu-kvm)


Userspace Ceph filesystem

• In 10.x ceph-fuse implementation.
• Stable.
• Decent performance.


RBD cache

• Performance is an issue,
• commits are a problem,
• always use backup battery for the controllers.


References

• Ceph, http://ceph.com
• Bugemos (Jojin & HedgeHog): The Chronicles of KOS - Alternativní přítomnost, 2006, available online at http://www.bugemos.com/?q=node/357
• CRUSH: Controlled, Scalable, Decentralized Placement of Replicated Data, Sage A. Weil, Scott A. Brandt, Ethan L. Miller and Carlos Maltzahn, Storage Systems Research Center, University of California, Santa Cruz, SC2006, November 2006, Tampa, Florida, USA, 0-7695-2700-0/06, available online at http://ceph.com/papers/weil-crush-sc06.pdf


Questions

... and answers.


Thank you!
