SUSE Enterprise Storage powered by Ceph - opentechday.nl
TRANSCRIPT
Ceph: About
• Scale-out
• Object store
• Multiple interfaces
• Open source
• Community based
• Common hardware
• Self-healing & managing
https://ceph.com
Ceph: Community
Code developers: 782 (core: 22, regular: 53, casual: 705)
Total downloads: 160,015,454
Unique downloads: 21,264,047
Ceph: Object Storage Daemon (OSD)
• OSDs serve storage objects to clients
• OSDs peer with one another to perform replication, recovery and scrubbing
• Journal often stored on faster media like SSD (often shared)
Layered view: Object Storage Daemon, on top of a file system (XFS) or BlueStore, on top of a physical disk or other persistent storage device
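To make the journaling point concrete, here is a toy write-ahead sketch in Python (illustrative names only, not Ceph source): the OSD acknowledges a write once it lands in the fast journal, and drains it to the slower data store afterwards.

# Toy write-ahead journal; ToyOSD is a hypothetical illustration.
class ToyOSD:
    def __init__(self):
        self.journal = []   # fast device, e.g. a shared SSD partition
        self.store = {}     # slow device, e.g. the HDD behind the OSD

    def write(self, name, data):
        self.journal.append((name, data))  # sequential write to fast media
        return 'ack'                       # client-visible latency ends here

    def flush(self):
        while self.journal:                # later: drain to the data store
            name, data = self.journal.pop(0)
            self.store[name] = data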
Ceph: Monitor node
• Monitors are the brain cells of the cluster
- Cluster membership (cluster map)
- Consensus for distributed decision making
• Not in the performance path
- Do not serve stored objects to clients
Ceph: Replication options
Erasure coding – one copy plus parity
- Cost-effective durability
- 1.5x (50% overhead)
- Expensive recovery
Replication – full copies of stored objects
- Very high durability
- 3x (200% overhead)
- Quicker recovery
(The sketch after this list works through the overhead arithmetic.)
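A minimal sketch of the overhead arithmetic behind these two options (plain Python, figures from the slide):

# Raw-capacity multiplier and percent overhead per durability scheme.
def replication(copies):            # full copies of each object
    return copies, (copies - 1) * 100

def erasure(k, m):                  # k data fragments + m parity fragments
    return (k + m) / k, m / k * 100

print(replication(3))   # (3, 200)    -> 3x raw, 200% overhead
print(erasure(2, 1))    # (1.5, 50.0) -> 1.5x raw, 50% overhead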
Ceph: CRUSH placement algorithm
Pseudo-random data placement algorithm
- Fast calculation, no lookup table
- Repeatable, deterministic
- Statistically uniform distribution
CRUSH uses a map of OSDs in the cluster
- Includes physical topology, like row, rack, host
- Includes rules describing which OSDs to consider
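The sketch below mimics those two properties: placement computed from a hash with no lookup table, so it is fast, repeatable and deterministic. It is only a toy stand-in, not the real CRUSH, which additionally walks the weighted topology map and applies the placement rules.

import hashlib

def toy_place(obj_name, pg_num, osds, replicas=3):
    # Same inputs always yield the same placement -- no lookup table.
    pg = int(hashlib.md5(obj_name.encode()).hexdigest(), 16) % pg_num
    chosen, i = [], 0
    while len(chosen) < replicas:
        h = int(hashlib.md5(f'{pg}:{i}'.encode()).hexdigest(), 16)
        osd = osds[h % len(osds)]
        if osd not in chosen:       # replicas land on distinct OSDs
            chosen.append(osd)
        i += 1
    return pg, chosen

print(toy_place('myobject', 128, [f'osd.{n}' for n in range(12)]))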
Ceph: Placement Group (PG)
• Balance data across OSDs in the cluster
• One PG typically exists on several OSDs for replication
• One OSD typically serves many PGs
Ceph: Placement Group (PG)
• Each placement group maps to a pseudo-random set of OSDs
• When an OSD fails, recovery involves all OSDs in the pool re-replicating the affected PGs among themselves
• Massively parallel recovery
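The PG-to-OSD fan-out can be sanity-checked with back-of-the-envelope numbers (illustrative values, not official sizing guidance):

# Each PG is stored on `size` OSDs, so an OSD serves on average
# pg_num * size / num_osds placement groups.
pg_num, size, num_osds = 4096, 3, 96
print(pg_num * size / num_osds)   # 128.0 PGs per OSD on average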
Ceph: Pools
• Logical container for storage objects
• Number of replicas OR erasure coding settings
• Number of placement groups
Pool operations
- Create object
- Remove object
- Read object
- Write entire object
- Snapshot of the entire pool
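The pool operations above map directly onto the python-rados bindings. A minimal sketch, assuming a reachable cluster, a readable /etc/ceph/ceph.conf with keyring, and an existing pool named 'mypool' (the pool name is an assumption):

import rados

cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()
ioctx = cluster.open_ioctx('mypool')           # hypothetical pool name

ioctx.write_full('greeting', b'hello ceph')    # write entire object
print(ioctx.read('greeting'))                  # read object
ioctx.remove_object('greeting')                # remove object

ioctx.close()
cluster.shutdown()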
Ceph: But why SUSE?
25+ Years of Open Source Engineering Experience
2/3+ of the Fortune Global 100 use SUSE Linux Enterprise
10 Awards in 2016 for SUSE Enterprise Storage
1st Enterprise Linux Distribution
1st Enterprise OpenStack Distribution
50%+ Development Engineers
1.4B Annual Revenue
Top 15 Worldwide System Infrastructure Software Vendor
+8% SUSE Growth vs. Other Linux in 2015*
SUSE Enterprise Storage: Enable transformation
Support today's investment. Adapt to the future.
Legacy Data Center
- Network, compute and storage silos
- Traditional protocols – Fibre Channel, iSCSI, CIFS/SMB, NFS
Process Driven
- Slow to respond
Software-defined Data Center
- Software-defined everything
Agile Infrastructure
- Supporting a DevOps model
- Business driven
Mode 1 (Gartner, Traditional): this is where you probably are today
Mode 2 (Gartner, Software-defined): this is where you need to get to
SUSE Enterprise Storage: Use cases
Video Surveillance
• Security surveillance
• Red light / traffic cameras
• License plate readers
• Body cameras for law enforcement
• Military/government visual reconnaissance
Virtual Machine Storage
Low- and mid-I/O performance for major hypervisor platforms:
• KVM – native RBD
• Hyper-V – iSCSI
• VMware – iSCSI
Bulk Storage
• SharePoint data
• Medical records
• Medical images (X-rays, MRIs, CAT scans)
• Financial records
Data Archive
Long-term storage and backup:
• HPC
• Log retention
• Tax documents
• Revenue reports
SUSE Enterprise Storage Data Capacity Utilization
Tier 0
- Ultra high performance
Tier 1
- High-value, OLTP, Revenue Generating
Tier 2
- Backup/recovery, reference data, bulk data
Tier 3
- Object archive
- Compliance archive
- Long-term retention
SUSE Enterprise Storage Roadmap
Timeline 2016 – 2018: V3, V4, V5
Information is forward-looking and subject to change at any time.
SUSE Enterprise Storage 3
Built On
• Ceph Jewel release
• SLES 12 SP1 (Server)
Manageability
• Initial Salt integration (tech preview)
Interoperability
• CephFS (Tech Preview)
• AArch64 (Tech Preview)
Availability
• Multisite object replication (Tech Preview)
• Async block mirroring (Tech Preview)
SUSE Enterprise Storage 4
Built On
• Ceph Jewel release
• SLES 12 SP2 (Server)
Manageability
• SES openATTIC management
• Initial Salt integration
Interoperability
• AArch64
• CephFS (production use cases)
• NFS Ganesha (Tech Preview)
• NFS access to S3 buckets (Tech Preview)
• CIFS Samba (Tech Preview)
• RDMA/Infiniband (Tech Preview)
Availability
• Multisite object replication
• Asynchronous block mirroring
SUSE Enterprise Storage 5
Built On
• Ceph Luminous release
• SLES 12 SP3 (Server)
Manageability
• SES openATTIC management phase 2
• SUSE Manager integration
Interoperability
• NFS Ganesha
• NFS access to S3 buckets
• CIFS Samba (Tech Preview)
• Fibre Channel (Tech Preview)
• RDMA/Infiniband
• Support for containers
Availability
• Asynchronous block mirroring
• Erasure coded block pool
Efficiency
• BlueStore back-end
• Data compression
• Quality of Service (Tech Preview)
Proposed landscape
[HP DL360 Gen9] CIFS/NFS gateway / management node
2x E5-2630v3
4x 16GB PC4-2133
1x Dual 120GB SSD M.2
2x 10GbE T
500W R-PS
[HP DL160 Gen9] monitoring node
1x E5-2603v3
1x 8GB PC4-2133
1x Dual 120GB SSD M.2
1x 10GbE T
550W PS
[HP DL380 Gen9] OSD node
2x E5-2630v3
8x 16GB PC4-2133
2x 8GB PC4-2133
1x Dual 120GB SSD M.2
2x 400GB SSD
1x 800GB SSD
12x 8TB HDD
2x 10GbE T
800W R-PS
Start with 480 TB net, then extend in 68.6 TB steps per added node
1 OSD node: 12x 8 TB = 96 TB raw
7 OSD nodes: 84x 8 TB = 672 TB raw (96x7)
Erasure code k=5, m=2: 480 TB net (672/7x5)
1 OSD node at k=5, m=2: 68.6 TB net (480/7)
(A quick arithmetic check follows.)
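A minimal sketch of that sizing arithmetic (plain Python; node counts and drive sizes are taken from this slide):

nodes, disks_per_node, disk_tb = 7, 12, 8
k, m = 5, 2
raw = nodes * disks_per_node * disk_tb   # 7 * 12 * 8 = 672 TB raw
net = raw * k / (k + m)                  # 672 * 5/7 = 480.0 TB net
print(raw, net, round(net / nodes, 1))   # 672 480.0 68.6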
[Rack diagram: 3x ProLiant DL160 Gen9 monitoring nodes, 1x ProLiant DL360 Gen9 CIFS/NFS gateway / management node, and 8x ProLiant DL380 Gen9 OSD nodes (OSD node 1-8, each drawn with 12x 4.0 TB 7.2K SATA drives), connected over a 10GbE CIFS/NFS network; 480 TB net, with spare capacity]
Erasure Coding
– Think of it as software RAID for an object
– The object is broken up into 'k' data fragments and 'm' durability (parity) pieces
– k=5, m=2 is analogous to RAID 6
[Diagram: each object is split into data fragments D1-D5 plus parity pieces P1-P2 and distributed across OSDs 1-5, with spare capacity held for rebuilds]
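A toy fragmenting demo in Python. For brevity it computes a single XOR parity (effectively m=1); real profiles such as k=5, m=2 use Reed-Solomon-style codes, so this only illustrates the split-and-add-parity idea:

from functools import reduce

def fragment(obj: bytes, k: int):
    n = -(-len(obj) // k)                # fragment size: ceil(len/k)
    frags = [obj[i*n:(i+1)*n].ljust(n, b'\0') for i in range(k)]
    parity = bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*frags))
    return frags, parity

frags, p = fragment(b'hello erasure coding!', 5)
# Any one lost fragment can be rebuilt by XOR-ing the survivors with p.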
SUSE Enterprise Storage: Extend your scale-out storage to improve resilience
1 DC: k=5, m=2
- 40% overhead
- Failure protection: 2 OSDs
2 DCs: k=5, m=5
- 100% overhead
- Failure protection: 5 OSDs / 1 datacenter
4 DCs: k=8, m=8
- 100% overhead
- Failure protection: 8 OSDs / 2 datacenters
[Diagram: the data (D) and parity (P) fragments of each object spread across one, two, or four datacenters]
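The overhead figures above follow directly from m/k; a one-line check in Python:

for name, k, m in [('1 DC', 5, 2), ('2 DCs', 5, 5), ('4 DCs', 8, 8)]:
    print(f'{name}: k={k}, m={m} -> {m/k:.0%} overhead, tolerates {m} lost OSDs')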
SUSE: All rights reserved + general disclaimer
Unpublished Work of SUSE LLC. All Rights Reserved. This work is an unpublished work and contains confidential, proprietary and trade secret information of SUSE LLC.
Access to this work is restricted to SUSE employees who have a need to know to perform tasks within the scope of their
assignments. No part of this work may be practiced, performed, copied, distributed, revised, modified, translated,
abridged, condensed, expanded, collected, or adapted without the prior written consent of SUSE.
Any use or exploitation of this work without authorization could subject the perpetrator to criminal and civil liability.
General Disclaimer. This document is not to be construed as a promise by any participating company to develop, deliver, or market a
product. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making
purchasing decisions. SUSE makes no representations or warranties with respect to the contents of this document, and
specifically disclaims any express or implied warranties of merchantability or fitness for any particular purpose. The
development, release, and timing of features or functionality described for SUSE products remains at the sole discretion
of SUSE. Further, SUSE reserves the right to revise this document and to make changes to its content, at any time,
without obligation to notify any person or entity of such revisions or changes. All SUSE marks referenced in this
presentation are trademarks or registered trademarks of Novell, Inc. in the United States and other countries. All third-
party trademarks are the property of their respective owners.