Ceph Day Berlin: Scaling an Academic Cloud
TRANSCRIPT
Scaling an Academic Cloud with Ceph
28.04.2015 | Berlin, Germany
Ceph Day Berlin
Christian Spindeldreher, Enterprise Technologist, Dell EMEA
The Cloud
The Software-Defined Datacenter
Defining “software-defined”
The capabilities
• Compute
• Storage/availability
• Networking/security & management
The benefits
• Automated & simplified
• Unlimited agility
• Maximum efficiency
(slide labels: SDN, SDS, SDC, SDE)
The basics
• Traditional system: data plane & control plane combined in purpose-built hardware & software
• Software-defined: data plane on general-purpose hardware; control plane via an open standard, e.g. OpenFlow
• Next-gen compute block: purpose-built function virtualized on general-purpose hardware, delivered as a service
The Cloud Operating System: Manage the Resources…
Ceph and OpenStack
Ceph in Academia & Research
CLIMB project
(picture from http://westcampus.yale.edu)
• Collaboration between 4 universities: Birmingham, Cardiff, Swansea & Warwick
• Ceph environment across the 4 sites
– part of an HPC cloud to deploy virtual resources for microbial bioinformatics (e.g. DNA sequencer output, …)
– shared data across the sites
– robust solution with a low €/TB ratio for mid/long-term storage
– Ceph solution by OCF, Inktank* & Dell (* now Red Hat)
– more information: http://www.climb.ac.uk
CLIMB project
• 4 Ceph clusters
– 6.9PB raw capacity (total)
– 3 replicas, at least 1 remote: 2.3PB usable capacity
– server infrastructure (per site)
› 5 MON nodes, 2 gateway nodes (R420, 4x 10GbE)
› 27 OSD nodes (R730xd, 16x 4TB HDD, 2 SSDs, 2x 10GbE)
– network infrastructure
› Brocade VDX6740T switches (48x 10GbE, 4x 40GbE)
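The capacity figures on this slide can be reproduced from the per-site hardware and the replica count; a minimal sanity check, with drive and node counts taken from the slide:

```python
# CLIMB capacity sanity check (all figures from the slide above).
SITES = 4
OSD_NODES_PER_SITE = 27
DRIVES_PER_NODE = 16
DRIVE_TB = 4
REPLICAS = 3

raw_tb = SITES * OSD_NODES_PER_SITE * DRIVES_PER_NODE * DRIVE_TB
usable_tb = raw_tb / REPLICAS  # each object is stored 3 times

print(f"raw:    {raw_tb / 1000:.1f} PB")     # 6.9 PB
print(f"usable: {usable_tb / 1000:.1f} PB")  # 2.3 PB
```

The usable figure assumes all data lives in 3-replica pools; pools with a different size would shift it.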
S3IT − Central IT, University of Zurich (UZH)
(picture: http://www.hausarztmedizin.uzh.ch/index.html)
• UZH – some interesting facts
– 26,000 enrolled students – Switzerland's largest university
– member of the League of European Research Universities (LERU)
– internationally renowned in medicine, immunology, genetics, neuroscience, structural biology, economics, …
– 12 UZH scholars have been awarded the Nobel Prize
• Scale-out storage for a scientific cloud (based on OpenStack)
– based on Ceph, commodity components, Ethernet network
– good balance between performance, capacity & cost
S3IT − Central IT, University of Zurich (UZH)
• Requirements for the high-capacity tier – 4.2PB raw capacity (1st batch)
› Cinder volumes, Glance images, ephemeral disks of VMs, radosgw (S3-like object storage)
› replication, erasure coding & cache tiering
– R630 + 2x MD1400 JBOD: 24x 4TB NL-SAS, 6x 800GB SSD (in the R630)
• Requirements for the high-performance tier – 112TB raw capacity (1st batch)
› block access, SSD pool, replicated
– R630: 8x 1.6TB SSD
• Network
– scale-out 40GbE backbone: 2x Z9500 (132x 40GbE in 3RU)
– ToR: S4810 (48x 10GbE, 4x 40GbE)
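The slide lists both replication and erasure coding for the high-capacity tier. The trade-off is usable capacity: n-way replication keeps 1/n of raw, while an EC profile with k data and m coding chunks keeps k/(k+m). A small sketch; the 8+3 EC profile is an illustrative assumption, not from the slide:

```python
# Usable-capacity fraction: replication vs. erasure coding.
# The k=8, m=3 EC profile is an illustrative assumption.

def usable_fraction_replicated(size: int) -> float:
    """Each object is stored `size` times."""
    return 1 / size

def usable_fraction_ec(k: int, m: int) -> float:
    """Each object is split into k data chunks plus m coding chunks."""
    return k / (k + m)

raw_pb = 4.2  # first batch, from the slide
print(f"3x replication: {raw_pb * usable_fraction_replicated(3):.2f} PB usable")  # 1.40 PB
print(f"EC 8+3:         {raw_pb * usable_fraction_ec(8, 3):.2f} PB usable")       # 3.05 PB
```

Erasure coding roughly doubles usable capacity here at the cost of CPU and reconstruction traffic, which is why it is paired with a replicated cache tier on the slide.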
Requirements in Academia, Science & Research today – what we see…
• Ceph stand-alone vs. OpenStack-related
• Large-scale environments – 5PB / 20PB / 100PB target capacity – usually object
• Multi-site environments – cross-site replication, unified object space, searchable metadata
› out of scope for Ceph?!
Design Considerations
Infrastructure Considerations – Storage Nodes
• Form factors
– small nodes vs. big nodes vs. super-nodes
– node count
– Ethernet-based drives
• Use of SSDs
– journaling
– cache tiering
– SSD-only pools
– check new SSD types: PCIe, form factors (1.8" size), write endurance, …
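For SSD journaling, the Ceph FileStore documentation gives a sizing rule of thumb: journal size ≈ 2 × expected throughput × filestore max sync interval. A sketch with assumed throughput and sync-interval values:

```python
# FileStore journal sizing rule of thumb (from the Ceph docs):
#   journal size = 2 * expected_throughput * filestore_max_sync_interval
# The throughput and interval values below are illustrative assumptions.

def journal_size_gb(throughput_mb_s: float, sync_interval_s: float) -> float:
    """Minimum journal size in GB for one OSD."""
    return 2 * throughput_mb_s * sync_interval_s / 1024

# e.g. an OSD whose disk sustains ~150 MB/s, with a 5 s max sync interval:
print(f"{journal_size_gb(150, 5):.2f} GB per OSD journal")  # 1.46 GB
```

With many OSD journals sharing one SSD, the SSD's write endurance and sustained write bandwidth matter more than its capacity.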
Infrastructure Considerations – Storage Node Example
• Storage node: R730xd – 2 RU, 1 or 2 CPUs, local drives
› 16x 3.5" HDD slots (+ 2x 2.5" for boot) – up to 6TB per drive today (96TB total)
› 24x 2.5" HDD slots (+ 2x 2.5" for boot)
› 8x 3.5" HDD slots + 18x 1.8" SSDs (+ 2x 2.5" for boot)
– highly flexible system
– JBOD expansion optional
Infrastructure Considerations – Storage Node Example
• Head node: R630 – 1 RU, 1 or 2 CPUs, local drives
› 10x 2.5" HDD slots or 24x 1.8" SSDs
› could host write journaling, cache tiering or SSD-only pools (then without a JBOD)
• JBOD: MD3060e – 4 RU, SAS-attached, 60x 3.5" HDD slots
› up to 6TB per drive today (360TB total)
• Voice of the customer (example): “Write journal on SSD has no real impact with 60 HDDs”
Infrastructure Considerations – Network
• Client-facing vs. cluster-internal I/O – be aware of replication traffic
• ToR – 1x or 2x 10GbE switch
› failure domain?!
– 40GbE uplinks
• Distributed core – scale-out core-switch design
– 40/50/100GbE mesh
– Virtual Link Trunking (VLT) for HA/load balancing
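The replication-traffic warning can be quantified: with a pool size of n, the primary OSD forwards each client write to n−1 secondaries over the cluster network, so back-end write bandwidth is roughly (n−1)× the client write bandwidth. A minimal sketch with assumed numbers:

```python
# Cluster-internal write amplification under replication.
# For a pool with `size` replicas, the primary OSD forwards each
# client write to (size - 1) secondaries over the cluster network.

def cluster_write_bw(client_write_mb_s: float, size: int) -> float:
    """Back-end write bandwidth generated by replication."""
    return client_write_mb_s * (size - 1)

# Illustrative: 1000 MB/s of aggregate client writes into 3-replica pools:
print(f"{cluster_write_bw(1000, 3):.0f} MB/s on the cluster network")  # 2000 MB/s
```

This is why the slide stresses separating client-facing and cluster-internal traffic: the back-end network must be sized for the larger replication load, not just client I/O.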
Infrastructure Considerations – the Site/DC…
• Power & Cooling – high density has some impacts
– example for 1 rack (42 RU), R630 & MD3060e building block, 8 units:
› input power: 21kW
› weight: ~1000kg
› raw capacity: 2.9PB
• Fresh Air Technology – use higher air temperature for cooling – 25°C vs. 30°C vs. 40°C
(pictures: TACC Stampede cluster (high density); Dell Fresh Air Hot House, Round Rock, TX)
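The ~2.9PB-per-rack figure follows from the building block itself; a quick check using 8 units per rack and 6TB drives, as on the earlier JBOD slide:

```python
# Per-rack raw capacity from the R630 + MD3060e building block
# (8 units per rack and 6TB drives, per the slides).
BLOCKS_PER_RACK = 8
DRIVES_PER_MD3060E = 60
DRIVE_TB = 6  # "up to 6TB per drive today"

raw_tb = BLOCKS_PER_RACK * DRIVES_PER_MD3060E * DRIVE_TB
print(f"{raw_tb / 1000:.1f} PB raw per rack")  # 2.9 PB
```

At 21kW and ~1000kg per rack, power feeds and floor loading become the limiting factors well before rack units do.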
Dell | Inktank (now Red Hat) Ceph Reference Architecture – HW + SW + Services

Hardware
• HW reference architecture: R730xd servers (storage and compute), Dell S/Z-Series switches
• Configuration: minimum of 6 nodes – 3x MON + 3x data

Software
• Software: Inktank ICE platform; optional OpenStack cloud software
• Operating system: RHEL; SUSE, Ubuntu, …
• Access: object & block (today)

Services
• Deployment: onsite HW install, onsite SW install, whiteboard session & training
• Support: HW – Dell ProSupport; SW – OpenStack support

Solution based on (e.g.):
• Server nodes: R730xd, … – fully populated drives
• Dell F10 10/40GbE switches
• Modules are flexible

Dell Solution Centers
• 30-90 minute briefings
• 1-4 hour design workshops
• 5-10 day proofs-of-concept for hands-on “prove it”
Thank You!