A Gentle Introduction to Ceph (linux.conf.au)
TRANSCRIPT
Narrated by Tim [email protected]
Adapted from a longer work by Lars Marowsky-Brée [email protected]
Once upon a time there was a Free and Open Source distributed storage solution named Ceph.
Ceph...
● Has been around for a while (first stable release in July 2012)
● Has lots of goodies:
– Distributed object storage
– Redundancy
– Efficient scale-out
– Built on commodity hardware
● Most popular choice of distributed storage for OpenStack[1]
[1] http://www.openstack.org/assets/survey/Public-User-Survey-Report.pdf (October 2015)
Ceph Gives Us...
● A Storage Cluster
– Self healing
– Self managed
– No bottlenecks
● Three interfaces
– Object Access (like Amazon S3)
– Block Access
– Distributed File System
Ceph's Architecture
● radosgw (object storage): RESTful interface; S3 and Swift APIs
● rbd (block device): block devices up to 16 EiB; thin provisioning; snapshots
● CephFS (file system): POSIX compliant; separate data and metadata; for use e.g. with Hadoop
RADOS (Reliable Autonomic Distributed Object Store)
Once upon a time there was a Free and Open Source distributed storage solution named Ceph.
Sysadmins throughout the land needed to know the components that made up Ceph...
At the Lowest Level
● An OSD (Object Storage Daemon) sits on top of a file system (btrfs, xfs) on a physical disk
● OSDs serve stored objects to clients
● OSDs peer with each other to perform replication and recovery
Put Several OSDs in One Node
(Diagram: six OSD / file system / disk stacks within a single node)
Add a Few Monitor Nodes
● Monitors are the brain cells of the cluster
– Cluster membership
– Consensus for distributed decision making
● Monitors do not serve stored objects to clients
...And You Get a Small Ceph Cluster
(Diagram: a small cluster of OSD nodes plus three monitor nodes)
...Which You Can Write To
● The client writes to one OSD, which then propagates the write to the other replicas
...And Read From
● Reads are serviced by any replica, improving throughput
Three Conceptual Components
● Pools
● Placement Groups
● CRUSH (deterministic, decentralised placement algorithm)
Pools
● Logical container for storage objects
● Have a set of parameters:
– Name, ID
– Number of replicas or erasure coding settings
– Number of placement groups
– CRUSH rules
– Owner
● Support certain operations:
– Create/read/write objects
– Snapshot pool
Placement Groups (PGs)
● Help balance data across OSDs
● One PG typically spans several OSDs
● One OSD typically serves many PGs
● Tunable – read the docs! (50-100 per OSD)
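The 50-100 PGs per OSD guideline above corresponds to a commonly cited rule of thumb: total PGs per pool ≈ (OSD count × target PGs per OSD) / replica count, rounded up to a power of two. A minimal sketch (the function name is mine, not a Ceph API):

```python
def suggested_pg_count(num_osds: int, replicas: int, target_per_osd: int = 100) -> int:
    """Rule-of-thumb PG count for one pool: (OSDs * target) / replicas,
    rounded up to the next power of two."""
    raw = num_osds * target_per_osd / replicas
    power = 1
    while power < raw:
        power *= 2
    return power

# 12 OSDs, 3 replicas: raw = 400, next power of two = 512
print(suggested_pg_count(12, 3))  # 512
```

Do read the docs before fixing a value, though: the PG count is hard to shrink once set, and too many PGs per OSD raises memory and peering overhead.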
CRUSH
● Controlled Replication Under Scalable Hashing
● MONs maintain the CRUSH map
– Physical topology (row, rack, host)
– Rules for which OSDs to consider for a given pool or PG
● Clients understand CRUSH
– This is the magic that removes bottlenecks
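The "clients understand CRUSH" point is the key one: placement is computed, not looked up. The toy below is not the real CRUSH algorithm (which walks a weighted hierarchy of rows, racks and hosts), just a hash-ranking sketch of why every client can independently compute the same placement with no central directory:

```python
import hashlib

def object_to_pg(name: str, pg_num: int) -> int:
    # Stable hash of the object name, modulo the pool's PG count.
    digest = hashlib.md5(name.encode()).digest()
    return int.from_bytes(digest[:4], "little") % pg_num

def pg_to_osds(pg: int, osds: list, replicas: int) -> list:
    # Rank OSDs by a hash of (pg, osd): every client computes the same
    # ranking independently, so no lookup service sits in the data path.
    ranked = sorted(osds, key=lambda o: hashlib.md5(f"{pg}:{o}".encode()).digest())
    return ranked[:replicas]

pg = object_to_pg("my-object", 256)
print(pg, pg_to_osds(pg, list(range(6)), 3))
```

Because the mapping is deterministic, any client holding the current map reaches the right OSDs directly; that is what removes the bottleneck.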
Once upon a time there was a Free and Open Source distributed storage solution named Ceph.
Sysadmins throughout the land needed to know the components that made up Ceph…
...because they wanted to deploy Software Defined Storage, instead of legacy storage arrays...
Legacy Storage Arrays
● Limits:
– Tightly controlled environment
– Limited scalability
– Few options
● Certain approved drives
● Constrained disk slots
● Fewer memory variations
● Few networking choices
● Fixed controller & CPU
● Benefits:
– Reasonably easy to understand
– Long-term experience, “gut instincts”
– Somewhat deterministic in behaviour and pricing
Software Defined Storage (SDS)
● Limits:
– ?
● Benefits:
– Infinite scalability
– Infinite adaptability
– Infinite choices
– Infinite flexibility
“To infinity… and beyond!” – Buzz Lightyear
Software Defined Storage Properties
● Throughput
● Latency
● IOPS
● Capacity
● Density
● Availability
● Reliability
● Cost
Architecting SDS Systems
● Goals often conflict
– Availability vs density
– IOPS vs density
– Everything vs cost
● Many hardware options
● Software topology offers many choices
● There is no “one size fits all”
Once upon a time there was a Free and Open Source distributed storage solution named Ceph.
Sysadmins throughout the land needed to know the components that made up Ceph…
...because they wanted to deploy Software Defined Storage, instead of legacy storage arrays…
...and they found they had many questions regarding configuration choices.
Network
● Choose the fastest you can afford
● Separate public and cluster networks
● The cluster network should have roughly twice the bandwidth of the public network (replication and recovery traffic flows over it)
● Ethernet (1, 10, 40 GigE), or IPoIB
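The 2x guideline falls out of simple arithmetic: with the usual three replicas, every client write arriving on the public network triggers two replica copies on the cluster network (rebalancing traffic comes on top of that). A quick sketch of the back-of-envelope calculation:

```python
def cluster_bytes_per_write(write_bytes: int, replicas: int) -> int:
    # The primary OSD receives the write over the public network and
    # forwards one copy to each of the (replicas - 1) secondaries over
    # the cluster network.
    return write_bytes * (replicas - 1)

# A 4 MiB client write with 3 replicas puts 8 MiB on the cluster
# network: twice the public-side traffic, hence the 2x guideline.
print(cluster_bytes_per_write(4 * 1024**2, 3))
```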
Storage Nodes
● CPU (number & speed of cores)
● Memory
● Storage controller (bandwidth, performance, cache size)
● SSDs for OSD journal (SSD to HDD ratio)
● HDDs (count, capacity, performance)
SSD Journals
● Accelerate bursts & random write IO
● Sustained writes that overflow the journal degrade to HDD speed
● Help very little with read performance
● Are costly, and consume storage slots
● If not using SSDs, use a large battery-backed cache on the storage controller
Hard Disk Parameters
● Capacity matters (the highest density is often not the most cost effective)
● The reliability advantage of enterprise drives is typically marginal compared to their cost
● High RPM increases IOPS & throughput, but also power consumption and cost
Redundancy
● Replication:
– n exact full-size copies
– Increases read performance (striping)
– More copies lower throughput
– Increased cluster network utilisation for writes
– Rebuilds leverage multiple sources
– Significant capacity impact
● Erasure coding:
– Data split into k parts plus m redundancy codes
– Better space efficiency
– Higher CPU overhead
– Significant CPU & cluster network impact, especially during rebuild
– Cannot directly be used with block devices
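The capacity trade-off above is easy to quantify: with n-way replication only 1/n of raw capacity holds unique data, while erasure coding with k data parts and m redundancy parts keeps k/(k+m). A small sketch comparing two schemes that each survive the loss of two devices:

```python
def usable_fraction_replication(n: int) -> float:
    # n full-size copies: only 1/n of raw capacity holds unique data.
    return 1 / n

def usable_fraction_ec(k: int, m: int) -> float:
    # k data parts plus m redundancy codes: k/(k+m) of raw capacity is data.
    return k / (k + m)

# Both tolerate two device failures, at very different capacity cost:
print(f"3x replication: {usable_fraction_replication(3):.0%} of raw capacity usable")
print(f"EC k=4, m=2:    {usable_fraction_ec(4, 2):.0%} of raw capacity usable")
```

The space saving is what you pay for with the extra CPU and rebuild traffic listed above.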
Cache Tiering
● One pool acts as a transparent write-back overlay for another
● Can flush on relative or absolute dirty levels, or age
● Additional configuration complexity; requires workload-specific tuning
● Some downsides (no snapshots)
● Good way to combine the advantages of replication & erasure coding
Adding More Nodes
● Capacity increases
● Total throughput increases
● IOPS increase
● Redundancy increases
● Latency unchanged
● Eventual network topology limitations
● Temporary impact during rebalancing
Adding More Disks to a Node
● Capacity increases
● Redundancy increases
● Throughput might increase
● IOPS might increase
● Internal node bandwidth consumed
● Higher CPU & memory load
● Cache contention
● Latency unchanged
Once upon a time there was a Free and Open Source distributed storage solution named Ceph.
Sysadmins throughout the land needed to know the components that made up Ceph…
...because they wanted to deploy Software Defined Storage, instead of legacy storage arrays…
...and they found they had many questions regarding configuration choices.
But learning which questions to ask enabled them to build sensible proofs-of-concept, which they scaled up and out...
How to Size a Ceph Cluster?
● Understand your workload
● Make a best guess, based on desirable properties & factors
● Build a 10% pilot / proof of concept
● Refine until the desired performance is achieved
● Scale up (most characteristics retained or even improved)
● It doesn't have to be perfect; you can always evolve it later
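The "best guess" step usually starts with a usable-capacity estimate. A hedged sketch (the function and the 0.85 fill target are my assumptions, not official figures; clusters need free space for rebalancing and recovery):

```python
def usable_capacity_tib(nodes: int, disks_per_node: int, tib_per_disk: float,
                        replicas: int, fill_target: float = 0.85) -> float:
    # Raw capacity divided by the replica count, derated by a fill target
    # so there is headroom for rebalancing after a failure.
    raw = nodes * disks_per_node * tib_per_disk
    return raw / replicas * fill_target

# A small pilot: 4 nodes x 6 disks x 4 TiB, 3 replicas
# -> 96 TiB raw, ~27.2 TiB comfortably usable.
print(usable_capacity_tib(4, 6, 4, 3))
```

Running this for a 10% pilot, then scaling the node count, gives the first-pass numbers to refine against real workload measurements.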
Once upon a time there was a Free and Open Source distributed storage solution named Ceph.
Sysadmins throughout the land needed to know the components that made up Ceph…
...because they wanted to deploy Software Defined Storage, instead of legacy storage arrays…
...and they found they had many questions regarding configuration choices.
But learning which questions to ask enabled them to build sensible proofs-of-concept, which they scaled up and out…
...and they all lived happily ever after.