
SUSE Enterprise Storage powered by Ceph

Tom D'Hont

#opentechday #suse

Ceph: About

• Scale-out

• Object store

• Multiple interfaces

• Open source

• Community based

• Common hardware

• Self-healing & managing

https://ceph.com

Ceph: Community

Code developers: 782 total – 22 core, 53 regular, 705 casual

Total downloads: 160,015,454

Unique downloads: 21,264,047

Ceph: Object Storage Daemon (OSD)

• OSDs serve storage objects to clients

• OSDs peer with each other to perform replication, recovery and scrubbing

• The journal is often stored on faster media such as SSD (often shared)

• Each OSD sits on top of a file system (XFS) or BlueStore, backed by a physical disk or other persistent storage device

Ceph: Storage node

Put several OSDs in one storage node

Ceph: Monitor node

• Monitors are the brain cells of the cluster

- Cluster membership (cluster map)

- Consensus for distributed decision making

• Not in the performance path

- Do not serve stored objects to clients

Ceph: Reliable Autonomous Distributed Object Store Cluster

Ceph: RADOS Interfaces

Ceph: Replication options

Erasure coding – one copy plus parity

- Cost-effective durability

- e.g. 1.5x raw usage (50% overhead)

- Expensive recovery

Replication – full copies of stored objects

- Very high durability

- e.g. 3x raw usage (200% overhead)

- Quicker recovery
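As a quick check on the figures above, here is a small illustrative Python sketch (not part of the original deck) that computes the raw-usage multiplier and overhead for n-way replication and for an erasure-coded layout with k data and m coding chunks:

def replication_overhead(copies: int) -> tuple[float, float]:
    """Raw-usage multiplier and overhead fraction for n full copies."""
    multiplier = float(copies)      # each object is stored 'copies' times
    overhead = multiplier - 1.0     # extra raw capacity beyond the data itself
    return multiplier, overhead

def erasure_overhead(k: int, m: int) -> tuple[float, float]:
    """Raw-usage multiplier and overhead fraction for EC with k data + m coding chunks."""
    multiplier = (k + m) / k
    overhead = m / k
    return multiplier, overhead

# Examples matching the slide:
print(replication_overhead(3))   # (3.0, 2.0) -> 3x raw usage, 200% overhead
print(erasure_overhead(2, 1))    # (1.5, 0.5) -> 1.5x raw usage, 50% overhead
print(erasure_overhead(5, 2))    # (1.4, 0.4) -> 1.4x raw usage, 40% overhead (used in the case study later)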

Ceph: CRUSH placement algorithm

Pseudo-random data placement algorithm

- Fast calculation, no lookup table

- Repeatable, deterministic

- Statistically uniform distribution

CRUSH uses a map of OSDs in the cluster

- Includes physical topology, like row, rack, host

- Includes rules describing which OSDs to consider
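The key property described above is that placement is computed, not looked up. The following deliberately simplified Python sketch (an illustration only, not the real CRUSH algorithm, and ignoring physical topology and rules) shows how a deterministic hash can map an object to a placement group and then to a repeatable, pseudo-random set of OSDs:

import hashlib

# Simplified illustration only – not the real CRUSH algorithm. It shows the
# property the slide describes: placement is computed from a hash (no lookup
# table), is repeatable, and spreads objects roughly uniformly across OSDs.

def place(object_name: str, pg_count: int, osds: list[str], replicas: int) -> list[str]:
    # 1. Map the object to a placement group via a stable hash.
    digest = int(hashlib.sha256(object_name.encode()).hexdigest(), 16)
    pg = digest % pg_count
    # 2. Derive a deterministic, pseudo-random ordering of OSDs for this PG
    #    and take the first 'replicas' entries (a stand-in for CRUSH rules).
    ranked = sorted(osds, key=lambda osd: hashlib.sha256(f"{pg}:{osd}".encode()).hexdigest())
    return ranked[:replicas]

osds = [f"osd.{i}" for i in range(12)]
print(place("my-object", pg_count=128, osds=osds, replicas=3))
# Calling place() again with the same inputs always returns the same OSDs.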

Ceph: Placement Group (PG)

• Balance data across OSDs in the cluster

• One PG typically exists on several OSDs for replication

• One OSD typically serves many PGs

Ceph: Placement Group (PG) (continued)

• Each placement group maps to a pseudo random set of OSDs

• When an OSD fails, recovery involves many OSDs: the affected placement groups are re-replicated across the other OSDs in the pool

• Massively parallel recovery

Ceph: Pools

• Logical container for storage objects

• Number of replicas OR erasure coding settings

• Number of placement groups

Pool operations

- Create object

- Remove object

- Read object

- Write entire object

- Snapshot of the entire pool
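As a minimal sketch of the pool operations listed above, assuming the librados Python bindings (the rados module), a cluster configuration in /etc/ceph/ceph.conf, and placeholder names such as 'mypool' and 'hello.txt':

import rados  # librados Python bindings (python3-rados)

# Minimal sketch of the pool operations listed above, using placeholder names.
cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()

ioctx = cluster.open_ioctx('mypool')          # open an I/O context on a pool
ioctx.write_full('hello.txt', b'hello ceph')  # create / write an entire object
data = ioctx.read('hello.txt')                # read the object back
print(data)
ioctx.create_snap('before-cleanup')           # snapshot of the entire pool
ioctx.remove_object('hello.txt')              # remove the object

ioctx.close()
cluster.shutdown()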

Ceph: Cache tiered pools

Ceph! But why SUSE?

25+ years of open source engineering experience

2/3+ of the Fortune Global 100 use SUSE Linux Enterprise

10 awards in 2016 for SUSE Enterprise Storage

1st enterprise Linux distribution

1st enterprise OpenStack distribution

50%+ development engineers

1.4B annual revenue

Top 15 worldwide system infrastructure software vendor

+8% SUSE growth vs. other Linux in 2015*

SUSE Enterprise Storage: Enable transformation

Support today's investment. Adapt to the future.

Legacy Data Center

- Network, compute and storage silos

- Traditional protocols – Fibre Channel, iSCSI, CIFS/SMB, NFS

Process Driven

- Slow to respond

Software-defined Data Center

- Software-defined everything

Agile Infrastructure

- Supporting a DevOps model

- Business driven

Mode 1 – Gartner's term for the traditional side: this is where you probably are today

Mode 2 – Gartner's term for the software-defined side: this is where you need to get to

SUSE Enterprise Storage: Use cases

Video Surveillance

• Security surveillance

• Red light / traffic cameras

• License plate readers

• Body cameras for law enforcement

• Military/government visual reconnaissance

Virtual Machine Storage

Low and mid I/O performance for major hypervisor platforms:

• KVM – native RBD

• Hyper-V – iSCSI

• VMware – iSCSI

Bulk Storage

• SharePoint data

• Medical records

• Medical images (X-rays, MRIs, CAT scans)

• Financial records

Data Archive

Long-term storage and backup:

• HPC

• Log retention

• Tax documents

• Revenue reports

SUSE Enterprise Storage: Data Capacity Utilization

Tier 0

- Ultra high performance

Tier 1

- High-value, OLTP, Revenue Generating

Tier 2

- Backup/recovery, reference data, bulk data

Tier 3

- Object archive

- Compliance archive

- Long-term retention

SUSE Enterprise Storage 4: Major features summary

SUSE Enterprise Storage 4: openATTIC

SUSE Enterprise Storage: Roadmap

2016 – 2018: versions 3, 4 and 5

Information is forward looking and subject to change at any time.

SUSE Enterprise Storage 3

Built On

• Ceph Jewel release

• SLES 12 SP1 (Server)

Manageability

• Initial Salt integration (tech preview)

Interoperability

• CephFS (Tech Preview)

• AArch64 (Tech Preview)

Availability

• Multisite object replication (Tech Preview)

• Async block mirroring (Tech Preview)

SUSE Enterprise Storage 4

Built On

• Ceph Jewel release

• SLES 12 SP 2 (Server)

Manageability

• SES openATTIC management

• Initial Salt integration

Interoperability

• AArch64

• CephFS (production use cases)

• NFS Ganesha (Tech Preview)

• NFS access to S3 buckets (Tech Preview)

• CIFS Samba (Tech Preview)

• RDMA/Infiniband (Tech Preview)

Availability

• Multisite object replication

• Asynchronous block mirroring

SUSE Enterprise Storage 5

Built On

• Ceph Luminous release

• SLES 12 SP 3 (Server)

Manageability

• SES openATTIC management phase 2

• SUSE Manager integration

Interoperability

• NFS Ganesha

• NFS access to S3 buckets

• CIFS Samba (Tech Preview)

• Fibre Channel (Tech Preview)

• RDMA/Infiniband

• Support for containers

Availability

• Asynchronous block mirroring

• Erasure coded block pool

Efficiency

• BlueStore back-end

• Data compression

• Quality of Service (Tech Preview)

SUSE Enterprise Storage 4: Case study

4 DC campus

480 TB

CIFS / NFS


Existing landscape

Robocopy / rsync


Proposed landscape

[HP DL360 Gen9] CIFS/NFS gateway / management node

2x E5-2630v3

4x 16GB PC4-2133

1x Dual 120GB SSD M.2

2x 10GbE T

500W R-PS

[HP DL160 Gen9] monitoring node

1x E5-2603v3

1x 8GB PC4-2133

1x Dual 120GB SSD M.2

1x 10GbE T

550W PS

[HP DL380 Gen9] OSD node

2x E5-2630v3

8x 16GB PC4-2133

2x 8GB PC4-2133

1x Dual 120GB SSD M.2

2x 400GB SSD

1x 800GB SSD

12x 8TB HDD

2x 10GbE T

800W R-PS

Start with 480 TB net (usable) and extend in steps of 68.6 TB net per additional OSD node

1 OSD node: 12 x 8 TB = 96 TB raw

7 OSD nodes: 84 x 8 TB = 672 TB raw (96 x 7)

Erasure coding k=5, m=2: 480 TB net (672 / 7 x 5)

1 additional OSD node at k=5, m=2: 68.6 TB net (480 / 7)
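The capacity arithmetic can be reproduced with a short illustrative Python sketch (not from the original slides); the node count of 7 follows the slide's calculation, with the 8th OSD node in the diagram shown as spare/extension capacity:

# Illustrative sketch that reproduces the capacity arithmetic above.
DRIVES_PER_NODE = 12
DRIVE_TB = 8
NODES = 7          # nodes counted in the slide's calculation
K, M = 5, 2        # erasure-coding profile k=5, m=2

raw_per_node = DRIVES_PER_NODE * DRIVE_TB          # 96 TB raw per OSD node
raw_total = raw_per_node * NODES                   # 672 TB raw
net_total = raw_total * K / (K + M)                # 480 TB net with k=5, m=2
net_per_extra_node = raw_per_node * K / (K + M)    # ~68.6 TB net per additional node

print(raw_per_node, raw_total, net_total, round(net_per_extra_node, 1))
# -> 96 672 480.0 68.6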

[Rack diagram: one ProLiant DL360 Gen9 CIFS/NFS gateway / management node, three ProLiant DL160 Gen9 monitoring nodes and eight ProLiant DL380 Gen9 OSD nodes (each drawn with 12x 4.0 TB 7.2K SATA drives), connected over a 10GbE CIFS/NFS network; 480 TB net plus spare capacity]


Proposed Landscape

Erasure Coding

– Think of it as software RAID for an object

– The object is broken up into 'k' data fragments and 'm' additional durability (parity) fragments

– k=5, m=2 is comparable to RAID 6 (any two fragments can be lost)
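To make the fragment idea concrete, here is a deliberately simplified Python sketch (not Ceph's actual erasure-code plugin): it uses a single XOR parity fragment (m=1) to show how a lost fragment can be rebuilt, whereas the k=5, m=2 profile in this case study uses a Reed-Solomon style code that tolerates losing any two fragments:

# Simplified illustration of the idea behind erasure coding (not Ceph's
# actual plugin): with a single XOR parity fragment (k data + m=1), any one
# lost fragment can be rebuilt from the remaining fragments.

def split(data: bytes, k: int) -> list[bytes]:
    """Split data into k equally sized fragments (padded with zero bytes)."""
    size = -(-len(data) // k)                     # ceiling division
    padded = data.ljust(size * k, b'\0')
    return [padded[i * size:(i + 1) * size] for i in range(k)]

def xor_parity(fragments: list[bytes]) -> bytes:
    parity = bytes(len(fragments[0]))
    for frag in fragments:
        parity = bytes(a ^ b for a, b in zip(parity, frag))
    return parity

obj = b"an object stored in a Ceph pool"
data_frags = split(obj, k=5)
parity = xor_parity(data_frags)

# Simulate losing fragment D3 and rebuilding it from the rest plus parity.
lost = data_frags[2]
rebuilt = xor_parity(data_frags[:2] + data_frags[3:] + [parity])
assert rebuilt == lost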

[Diagram: objects are split into data fragments D1–D5 and parity fragments P1–P2, which are distributed across OSD nodes 1–5, with spare capacity shown for rebuilds]

SUSE Enterprise Storage: Extend your scale-out storage to improve resilience

1 DC: k=5, m=2 – 40% overhead – failure protection: 2 OSDs

2 DCs: k=5, m=5 – 100% overhead – failure protection: 5 OSDs or 1 datacenter

4 DCs: k=8, m=8 – 100% overhead – failure protection: 8 OSDs or 2 datacenters

[Diagram: the data (D) and parity (P) fragments are spread evenly across the OSDs of 1, 2 or 4 datacenters respectively]

SUSE Enterprise Storage 4: Demo

Questions & Answers

The End

#opentechday #suse

SUSE: All rights reserved + general disclaimer

Unpublished Work of SUSE LLC. All Rights Reserved. This work is an unpublished work and contains confidential, proprietary and trade secret information of SUSE LLC. Access to this work is restricted to SUSE employees who have a need to know to perform tasks within the scope of their assignments. No part of this work may be practiced, performed, copied, distributed, revised, modified, translated, abridged, condensed, expanded, collected, or adapted without the prior written consent of SUSE. Any use or exploitation of this work without authorization could subject the perpetrator to criminal and civil liability.

General Disclaimer: This document is not to be construed as a promise by any participating company to develop, deliver, or market a product. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. SUSE makes no representations or warranties with respect to the contents of this document, and specifically disclaims any express or implied warranties of merchantability or fitness for any particular purpose. The development, release, and timing of features or functionality described for SUSE products remains at the sole discretion of SUSE. Further, SUSE reserves the right to revise this document and to make changes to its content, at any time, without obligation to notify any person or entity of such revisions or changes. All SUSE marks referenced in this presentation are trademarks or registered trademarks of Novell, Inc. in the United States and other countries. All third-party trademarks are the property of their respective owners.