
Page 1: Seagate Exascale HPC Storage

© 2015 Seagate, Inc. All Rights Reserved.

Seagate ExaScale HPC storage

Miro Lehocky System Engineer Seagate Systems Group, HPC

Page 2: Seagate Exascale HPC Storage

55 PB Lustre File System

1.6 TB/sec Lustre File System

1 TB/sec Lustre File System

500+ GB/s Lustre File System

100+ PB Lustre File System

130+ GB/s Lustre File System

140+ GB/s Lustre File System

Page 3: Seagate Exascale HPC Storage

Market leadership…

N.B. NCSA Blue Waters: 24 PB, 1,100 GB/s (Lustre 2.1.3)

Rank | Name | Computer | Site | Total Cores | Rmax (GFLOPS) | Rpeak (GFLOPS) | Power (kW) | File system | Size | Perf
1 | Tianhe-2 | TH-IVB-FEP Cluster, Xeon E5-2692 12C 2.2GHz, TH Express-2, Intel Xeon Phi | National Super Computer Center in Guangzhou | 3,120,000 | 33,862,700 | 54,902,400 | 17,808 | Lustre/H2FS | 12.4 PB | ~750 GB/s
2 | Titan | Cray XK7, Opteron 6274 16C 2.2GHz, Cray Gemini interconnect, NVIDIA K20x | DOE/SC/Oak Ridge National Laboratory | 560,640 | 17,590,000 | 27,112,550 | 8,209 | Lustre | 10.5 PB | 240 GB/s
3 | Sequoia | BlueGene/Q, Power BQC 16C 1.60GHz, custom interconnect | DOE/NNSA/LLNL | 1,572,864 | 17,173,224 | 20,132,659 | 7,890 | Lustre | 55 PB | 850 GB/s
4 | K computer | Fujitsu, SPARC64 VIIIfx 2.0GHz, Tofu interconnect | RIKEN AICS | 705,024 | 10,510,000 | 11,280,384 | 12,659 | Lustre | 40 PB | 965 GB/s
5 | Mira | BlueGene/Q, Power BQC 16C 1.60GHz, custom interconnect | DOE/SC/Argonne National Lab. | 786,432 | 8,586,612 | 10,066,330 | 3,945 | GPFS | 28.8 PB | 240 GB/s
6 | Trinity | Cray XC40, Xeon E5-2698v3 16C 2.3GHz, Aries interconnect | DOE/NNSA/LANL/SNL | 301,056 | 8,100,900 | 11,078,861 | n/a | Lustre | 76 PB | 1,600 GB/s
7 | Piz Daint | Cray XC30, Xeon E5-2670 8C 2.6GHz, Aries interconnect, NVIDIA K20x | Swiss National Supercomputing Centre (CSCS) | 115,984 | 6,271,000 | 7,788,853 | 2,325 | Lustre | 2.5 PB | 138 GB/s
8 | Hazel Hen | Cray XC40, Xeon E5-2680v3 12C 2.5GHz, Aries interconnect | HLRS - Stuttgart | 185,088 | 5,640,170 | 7,403,520 | n/a | Lustre | 7 PB | ~100 GB/s
9 | Shaheen II | Cray XC40, Xeon E5-2698v3 16C 2.3GHz, Aries interconnect | KAUST, Saudi Arabia | 196,608 | 5,537,000 | 7,235,000 | 2,834 | Lustre | 17 PB | 500 GB/s
10 | Stampede | PowerEdge C8220, Xeon E5-2680 8C 2.7GHz, IB FDR, Intel Xeon Phi | TACC/Univ. of Texas | 462,462 | 5,168,110 | 8,520,112 | 4,510 | Lustre | 14 PB | 150 GB/s

Page 4: Seagate Exascale HPC Storage


Still The Same Concept: Fully integrated, fully balanced, no bottlenecks …

ClusterStor Scalable Storage Unit
›  Intel Ivy Bridge or Haswell CPUs
›  FDR/EDR InfiniBand, 100 GbE & 2x40GbE, all-SAS infrastructure
›  SBB v3 form factor, PCIe Gen-3
›  Embedded RAID & Lustre support

Embedded server modules run the full software stack:
›  Lustre File System (2.x)
›  ClusterStor Manager
›  Data Protection Layer (PD-RAID/Grid-RAID)
›  Linux OS
›  Unified System Management (GEM-USM)

Page 5: Seagate Exascale HPC Storage


So what's new??? Seagate's next-gen Lustre appliance, the CS L300, alongside the current CS9000:

CS9000

ClusterStor Lustre Software
›  Lustre v2.5
›  Linux v6.5

Top of Rack Switches (ToR)
›  InfiniBand (IB) Mellanox FDR or 40GbE (high-availability connectivity in the rack)

ClusterStor Management Unit (CMU)
›  4 servers in a 2U chassis, FDR / 40GbE
›  Servers 1 & 2: file system management, boot
›  Servers 3 & 4: metadata (user data location map)
›  2U24 JBOD (HDD storage for management and metadata)

ClusterStor Scalable Storage Unit (SSU)
›  5U84 enclosure
›  6Gbit SAS
›  Object Storage Servers (OSS) network I/O
›  Mellanox FDR/40GbE
›  7.2K RPM HDDs

Management Switches
›  1GbE switches (component communication)

CS L300

ClusterStor Lustre Software
›  Lustre v2.5 Phase 1, Lustre 2.7 Phase 2
›  Linux v6.5 Phase 1, Linux 7.2 Phase 2

Top of Rack Switches (ToR)
›  InfiniBand (IB) Mellanox EDR or 100/40GbE

ClusterStor System Management Unit (SMU)
›  2U24 with embedded servers in HA configuration
›  File system management, boot, storage

ClusterStor Metadata Management Unit (MMU)
›  2U24 with embedded servers in HA configuration
›  Metadata (user data location map)

ClusterStor Scalable Storage Unit (SSU)
›  5U84 enclosure
›  Object Storage Servers (OSS) network I/O
›  Mellanox EDR, 100/40GbE
›  10K and 7.2K RPM HDDs
›  2U24 enclosure with Seagate Koho Flash SSDs

Management Switches
›  1GbE switches (component communication)

Page 6: Seagate Exascale HPC Storage


ClusterStor GridRAID

[Diagram: Traditional RAID (OSS server fronting four separate parity rebuild disk pools) vs. GridRAID (OSS server fronting a single de-clustered parity rebuild disk pool)]

Feature | Benefit
De-clustered RAID 6: up to 400% faster to repair (6TB drive rebuild: MD RAID ~33.3 hours, GridRAID ~9.5 hours) | Recover from a disk failure and return to full data protection faster
Repeal Amdahl's Law (the speed of a parallel system is gated by its slowest component) | Minimizes application impact on widely striped file performance
Minimize file system fragmentation | Improved allocation and layout maximizes sequential data placement
4-to-1 reduction in OSTs | Simplifies scalability challenges
ClusterStor Integrated Management | CLI and GUI configuration, monitoring and management reduce OpEx
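A rough way to see where the repair speedup comes from: in traditional MD RAID every reconstructed byte funnels into one spare drive, so the spare's write speed gates the rebuild, while GridRAID distributes the failed drive's stripes and spare space across the whole pool so many drives share the work. A minimal sketch in Python (illustrative only, not Seagate's implementation; the 50 MB/s per-drive rebuild rate is an assumption picked to reproduce the 33.3-hour figure above):

```python
# Illustrative model: de-clustered (GridRAID) vs. traditional rebuild.
DRIVE_TB = 6.0        # capacity of the failed drive
DRIVE_MBPS = 50.0     # assumed sustained rebuild rate per participating drive

def rebuild_hours(data_tb: float, parallelism: float) -> float:
    """Hours to reconstruct data_tb when `parallelism` drives' worth of
    bandwidth can absorb the rebuild writes concurrently."""
    seconds = (data_tb * 1e6) / (DRIVE_MBPS * parallelism)
    return seconds / 3600

# Traditional RAID 6: all rebuild writes target ONE spare drive.
print(f"MD RAID:  ~{rebuild_hours(DRIVE_TB, 1.0):.1f} h")   # ~33.3 h

# GridRAID: spare space is spread across the pool; an effective
# parallelism of ~3.5x reproduces the ~9.5 h quoted above
# (i.e. "up to 400% faster").
print(f"GridRAID: ~{rebuild_hours(DRIVE_TB, 3.5):.1f} h")   # ~9.5 h
```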

Page 7: Seagate Exascale HPC Storage

ClusterStor L300 HPC Disk Drive

Page 8: Seagate Exascale HPC Storage


High-level product description: HPC-optimized performance 3.5-inch HDD

Page 9: Seagate Exascale HPC Storage

ClusterStor L300 HPC 4TB SAS HDD: an HPC industry first; best mixed-application-workload value

[Bar chart comparing the CS HPC HDD against a nearline (NL) 7.2K RPM HDD on random writes (4K IOPS, WCD), random reads (4K Q16 IOPS), and sequential data rate (MB/s); vertical axis 0 to 600]

Performance Leader: world-beating performance over other 3.5-inch HDDs, speeding data ingest, extraction and access

Capacity: a strong 4TB of storage for big data applications

Reliable Workhorse: 2M-hour MTBF and 750TB/year ratings for reliability under the toughest workloads your users throw at it

Power Efficient: Seagate's PowerBalance feature provides significant power savings for minimal performance tradeoffs


Page 10: Seagate Exascale HPC Storage


IBM Spectrum Scale based solutions

Page 11: Seagate Exascale HPC Storage

Best in Class Features Designed for HPC, Big Data and Cloud

Connectivity
›  IB: FDR, QDR and 40 GbE (EDR, 100GbE and Omni-Path on roadmap)
›  Exportable via CNFS, CIFS, object storage, HDFS connectors
›  Linux and Windows clients

Robust Feature Set
›  Global shared access with a single namespace across clusters/file systems
›  Building-block approach with scalable performance and capacity
›  Distributed file system data caching and coherency management
›  RAID 6 (8+2) with de-clustered RAID
›  Snapshot and rollback
›  Integrated lifecycle management
›  Backup to tape
›  Non-disruptive scaling, restriping, rebalancing
›  Synchronously replicated data and metadata

Management and Support
›  ClusterStor CLI-based single point of management
›  RAS/phone home
›  SNMP integration with business operation systems
›  Low-level hardware monitoring & diagnostics
›  Embedded monitoring
›  Proactive alerts

Hardware Platform
›  Industry's fastest converged scale-out storage platform
›  Latest Intel processors
›  Embedded high-availability NSD servers integrated into the data storage backplane
›  Fastest available IO – details??
›  Extremely dense storage enclosures with 84 drives in 5U

Page 12: Seagate Exascale HPC Storage

ClusterStor Spectrum Scale Performance Density Rack Configuration

[Rack diagram: base rack with ClusterStor Manager (CSM), Ethernet management switches ETN 1/2, dual high-speed networks, and embedded NSD servers]

Key components:
›  ClusterStor Manager Node (2U enclosure)
   •  2 HA management servers
   •  10 drives
›  2 management switches
›  5U84 enclosures configured as NSDs + disk
   •  2 HA embedded NSD servers
   •  76 to 80 7.2K RPM HDDs
   •  4 to 8 SSDs
›  42U reinforced rack
   •  Custom cable harness
   •  Up to 7 enclosures in each rack (base + expansion)

Performance:
›  Up to 56 GB/sec per rack (cross-checked in the sketch below)
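The 56 GB/sec rack figure lines up with the per-enclosure number quoted on the standard-configuration slide (> 8 GB/sec per 5U84); a quick cross-check, assuming all seven enclosures sustain that rate:

```python
# Back-of-envelope check using figures from these slides only.
GBPS_PER_5U84 = 8            # > 8 GB/s per 5U84 and its two NSD servers
ENCLOSURES_PER_RACK = 7      # up to 7 enclosures per rack (base + expansion)

print(f"{GBPS_PER_5U84 * ENCLOSURES_PER_RACK} GB/s per rack")  # 56 GB/s
```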

[Rack diagram: expansion rack with the same ETN 1/2 switches, dual high-speed networks and embedded NSD servers; base rack and expansion rack shown side by side]

Page 13: Seagate Exascale HPC Storage

ClusterStor Spectrum Scale Capacity Optimized Rack Configuration

[Rack diagram: base and expansion racks pairing each embedded NSD server with a SAS-attached 5U84 JBOD, plus CSM, ETN 1/2 switches and dual high-speed networks]

Key components:
›  ClusterStor Manager Node (2U enclosure)
   •  2 HA management servers
   •  10 drives
›  2 management switches
›  5U84 enclosures configured as NSDs + disk
   •  2 HA embedded NSD servers
   •  76 to 80 7.2K RPM HDDs
   •  4 to 8 SSDs
›  5U84 enclosures configured as JBODs
   •  84 7.2K RPM HDDs
   •  SAS-connected to the NSD servers, 1-to-1 ratio
›  42U reinforced rack

Performance:
›  Up to 32 GB/sec per rack

Page 14: Seagate Exascale HPC Storage

ClusterStor Spectrum Scale – Standard Configuration

Key components:
›  2U24 management server
›  5U84 disk enclosures, up to (7) in the base rack
›  NSD (MD) servers x 2 per 5U84: > 8 GB/sec per 5U84 (clustered), ~20K file creates per sec, ~2 billion files
›  ClusterStor ToR & management switch, rack, cables, PDU
›  Factory integration & test

Per NSD (MD) server (#1 and #2):
›  Metadata SSD pool: ~10K file creates/sec, ~1 billion files, 800 GB SSD x 2
›  User data pool: ~4 GB/sec, HDD x qty (40)

The clustered per-5U84 figures are simply the two servers' pools combined, as the sketch below checks.
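A trivial sanity check of that aggregation, using only this slide's numbers:

```python
# Each 5U84 is served by two NSD (MD) servers; the "clustered" figures
# are simply the two servers' metadata and data pools combined.
per_server = {
    "file_creates_per_s": 10_000,   # ~10K creates/s per metadata SSD pool
    "files": 1_000_000_000,         # ~1 billion files per metadata pool
    "data_gbps": 4,                 # ~4 GB/s per user data pool (40 HDDs)
}

clustered = {key: 2 * value for key, value in per_server.items()}
print(clustered)  # ~20K creates/s, ~2 billion files, ~8 GB/s per 5U84
```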

Page 15: Seagate Exascale HPC Storage


Object storage-based archiving solutions

Page 16: Seagate Exascale HPC Storage

ClusterStor A200 Object Store Features

ClusterStor A200
›  Global object namespace
›  Object API & portfolio of network-based interfaces
›  High-density storage: up to 3.6PB usable per rack
›  Can achieve four-nines (99.99%) system availability
›  Rapid drive rebuild (<1 hr for 8TB in a large system)
›  Integrated management & consensus-based HA
›  "Infinite" numbers of objects (2^128)
›  Performance scales with capacity (up to 10GB/s per rack)
›  8+2 network erasure coding for cost-effective protection (see the sketch below)
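The "cost-effective protection" claim is easy to quantify: with 8+2 erasure coding an object is cut into 8 data shards plus 2 parity shards, so any 2 shards can be lost while storing only 1.25x the raw data, versus 3x for triple replication. A minimal sketch of the arithmetic (illustrative Python; the A200's actual codec and shard-placement policy are not described in this deck):

```python
# 8+2 erasure coding: K data shards + M parity shards per object.
K, M = 8, 2

object_mb = 128.0
shard_mb = object_mb / K             # each shard carries 1/K of the object
footprint_mb = shard_mb * (K + M)    # raw bytes written across all shards

print(f"shard size:    {shard_mb:.0f} MB")
print(f"raw footprint: {footprint_mb:.0f} MB, "
      f"{(K + M) / K:.2f}x overhead (vs. 3.00x for triple replication)")
print(f"survives loss of any {M} shards; with one shard per SSU, "
      f"that is the 2-SSU resiliency cited on the next slide")
```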

Page 17: Seagate Exascale HPC Storage

ClusterStor A200: Resiliency Built-In

42U rack with wiring loom & power cables
›  Dual PDUs
›  2U spare space reserved for future configuration options
›  Blanking plates as required

Redundant TOR switches
›  Combine data and management network traffic
›  VLANs used to segregate network traffic
›  10GbE with 40GbE TOR uplinks

Management Unit
›  1x 2U24 enclosure
›  2x embedded controllers

Storage Units
›  Titan v2 5U84 enclosures (6x is the minimum configuration)
›  82 SMR SATA HDDs (8TB)
›  Single embedded storage controller
›  Dual 10GbE network connections
›  Resiliency with 2 SSU failures (12 SSUs minimum)

Page 18: Seagate Exascale HPC Storage

Economic Benefits of SMR Drives

[Diagram: two shingled bands; the wide write head overlaps tracks, updates destroy a portion of the next track, and the narrower read head reliably reads each trimmed track]

Shingled technology increases platter capacity by 30-40%
›  Write tracks are overlapped by up to 50% of the write width
›  The read head is much smaller and can reliably read the narrower tracks

SMR drives are optimal for object stores, as most data is static/WORM
›  Updates require special intelligence and may be expensive in terms of performance
›  Wide tracks in each band are often reserved for updates

The CS A200 manages SMR drives directly to optimize workflow and caching, as the sketch below illustrates.

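To see why updates need "special intelligence", consider a toy band model (illustrative Python, not the A200's firmware; the 20-track band geometry is an assumption): because writing any track clobbers part of the track shingled over it, an in-place update forces a rewrite of everything from that track to the end of its band.

```python
# Toy SMR band model: writing track t destroys part of track t+1, so
# updating track t means re-staging and rewriting tracks t..end-of-band.
TRACKS_PER_BAND = 20  # assumed band geometry

def tracks_rewritten(track: int) -> int:
    """Tracks that must be rewritten to update a single track in a band."""
    return TRACKS_PER_BAND - track

print(tracks_rewritten(0))   # 20: updating the first track rewrites the band
print(tracks_rewritten(19))  # 1:  only the last track updates in place

# Sequential, append-style writes fill a band front to back and never
# trigger a rewrite, which is why SMR suits static/WORM object data and
# why the A200 manages drive placement host-side.
```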

Page 19: Seagate Exascale HPC Storage

Current ClusterStor Lustre solutions line-up: CS-1500, CS-9000, L300

[Photos: base rack and expansion rack for each model]

Page 20: Seagate Exascale HPC Storage

Current ClusterStor Spectrum Scale / Active Archive line-up: G200, A200

[Photos: base rack and expansion rack for each model]

Page 21: Seagate Exascale HPC Storage