Introduction to IBM Spectrum Scale and Its Use in Life Science

#ibmedge © 2016 IBM Corporation
SBD-1266: Introduction to IBM Spectrum Scale and Its Use in Life Science
Sven Oehme, IBM Research; Konstantin Arnold, University of Basel

Upload: sandeep-patil

Post on 13-Jan-2017


TRANSCRIPT

Page 1: Introduction to IBM Spectrum Scale and Its Use in Life Science

#ibmedge © 2016 IBM Corporation

SBD-1266: Introduction to IBM Spectrum Scale and Its Use in Life Science
Sven Oehme, IBM Research
Konstantin Arnold, University of Basel

Page 2: Introduction to IBM Spectrum Scale and Its Use in Life Science


Page 3: Introduction to IBM Spectrum Scale and Its Use in Life Science


Page 4: Introduction to IBM Spectrum Scale and Its Use in Life Science

Spectrum Scale Architecture Highlights: Scalability

Page 5: Introduction to IBM Spectrum Scale and Its Use in Life Science

Spectrum Scale Architecture Highlights: HA/Reliability

Page 6: Introduction to IBM Spectrum Scale and Its Use in Life Science

Spectrum Scale Software Local Read Only Cache (LROC)

• Many NAS workloads benefit from a large read cache
  – SPECsfs
  – OpenStack, VMware and other virtualization
  – Database
• Augment the Spectrum Scale node DRAM cache with SSD/NVMe
• Used to cache:
  – Data
  – Inodes
  – Indirect blocks
• Cache consistency is ensured by standard Spectrum Scale tokens
• Assumes the SSD device is unreliable; data is protected by a checksum and verified on read
• Provides low-latency access to file system metadata and data
• Implement with consumer flash for maximum cache/$
• Enabled by FLEA’s LSA (data is written sequentially to the device, to eliminate wear leveling)
• Reach small-file performance leadership compared to other NAS devices
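The "unreliable device, checksum verified on read" idea from this slide can be illustrated with a small sketch. This is not how LROC is implemented; the class name, the in-memory dictionary standing in for the SSD, and the choice of CRC32 are all assumptions made for illustration. It only shows the pattern: store a checksum with each cached block, verify it on every read, and fall back to the authoritative storage when verification fails.

```python
import zlib
from typing import Callable, Dict, Tuple


class ChecksummedReadCache:
    """Toy read-only cache that never trusts its backing device (hypothetical)."""

    def __init__(self, read_from_disk: Callable[[str], bytes]) -> None:
        # Authoritative data source (stands in for the disk subsystem).
        self._read_from_disk = read_from_disk
        # Stand-in for the SSD: key -> (payload, CRC32 recorded at write time).
        self._device: Dict[str, Tuple[bytes, int]] = {}
        self.hits = 0
        self.misses = 0

    def read(self, key: str) -> bytes:
        entry = self._device.get(key)
        if entry is not None:
            payload, stored_crc = entry
            if zlib.crc32(payload) == stored_crc:
                self.hits += 1
                return payload
            # Checksum mismatch: discard the entry and treat it as a miss.
            del self._device[key]
        self.misses += 1
        payload = self._read_from_disk(key)
        self._device[key] = (payload, zlib.crc32(payload))
        return payload

    def corrupt(self, key: str) -> None:
        """Test helper: flip the last byte of a non-empty cached payload."""
        payload, crc = self._device[key]
        self._device[key] = (payload[:-1] + bytes([payload[-1] ^ 0xFF]), crc)


if __name__ == "__main__":
    disk = {"block-1": b"genome data"}
    cache = ChecksummedReadCache(lambda k: disk[k])
    cache.read("block-1")      # miss, fetched from disk
    cache.read("block-1")      # hit, checksum verified
    cache.corrupt("block-1")   # simulate silent corruption on the cache device
    print(cache.read("block-1"), cache.hits, cache.misses)
```

Because the cache holds only copies of data that also exists on the disk subsystem, a corrupted entry costs one extra disk read and nothing more, which is what makes consumer-grade flash acceptable in this role.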

Page 7: Introduction to IBM Spectrum Scale and Its Use in Life Science

LROC Example Speed Up

• Two consumer-grade 200 GB SSDs cache a Spectrum Scale storage system of forty-eight 300 GB 10K SAS disks
• Initially, with all data coming from the disk storage system, the client reads data from the SAS disks at ~3,000 IOPS
• As more data is cached in flash, client performance increases to 33,000 IOPS while reducing the load on the disk subsystem by more than 95%
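The figures on this slide can be put in proportion with a short calculation. The function name and the use of raw capacities (ignoring RAID overhead and file system formatting) are assumptions; the input numbers are the ones quoted above.

```python
def lroc_example_summary(
    ssd_count: int = 2,
    ssd_gb: int = 200,
    disk_count: int = 48,
    disk_gb: int = 300,
    cold_iops: int = 3_000,
    warm_iops: int = 33_000,
) -> dict:
    """Summarize the slide's example using raw capacities (an assumption)."""
    cache_gb = ssd_count * ssd_gb
    disk_raw_gb = disk_count * disk_gb
    return {
        "cache_gb": cache_gb,
        "disk_raw_gb": disk_raw_gb,
        # Share of the raw disk capacity that fits in the flash cache.
        "cache_fraction": cache_gb / disk_raw_gb,
        # Client-side IOPS gain once the cache is warm.
        "speedup": warm_iops / cold_iops,
    }


if __name__ == "__main__":
    for name, value in lroc_example_summary().items():
        print(f"{name}: {value}")
```

Under these assumptions the cache is 400 GB against 14,400 GB of raw disk, under 3% of the capacity, while the client sees an 11x increase in IOPS. That combination is only possible when the working set is much smaller than the total data set.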

Page 8: Introduction to IBM Spectrum Scale and Its Use in Life Science

Spectrum Scale RAID features

Page 9: Introduction to IBM Spectrum Scale and Its Use in Life Science

ESS (Spectrum Scale RAID Building Blocks)

• Elastic Storage Server (ESS) is a prepackaged solution based on Spectrum Scale RAID technology and commodity hardware components
• SSD/10k SAS models
  – GS1, GS2, GS4, GS6
  – 2 x high-volume servers
  – 1/2/4/6 x JBOD disk enclosures
• NL-SAS models
  – GL2, GL4, GL6
  – 2 x high-volume servers
  – 2/4/6 x JBOD disk enclosures

Page 10: Introduction to IBM Spectrum Scale and Its Use in Life Science

ESS: various models

Page 11: Introduction to IBM Spectrum Scale and Its Use in Life Science

University of Basel, Switzerland

1460: First and only university in Switzerland until the 19th century

7 faculties: Humanities, Science, Medicine, Law, Business and Economics, Psychology, Theology

7600 undergraduate students
3700 postgraduate and doctoral students
1300 academic staff
358 professors

Page 12: Introduction to IBM Spectrum Scale and Its Use in Life Science

Scientific Computing @ University of Basel

• HPC clusters, specialized for large I/O (bioinformatics) and high-speed interconnects (molecular simulations)
• Central systems administration
• Up-to-date scientific databases
• Up-to-date software stack
• Back-up service
• User training
• User support
• Developer support (code versioning, issue tracking, wiki, etc.)
• Dedicated 24/7 production server environment for web services (SWISS-MODEL, Ismara, Mirz, etc.)

Diagram labels: 3.5 PB storage; 10'000 CPU cores; HPC compute clusters; scientific software; training & support

Page 13: Introduction to IBM Spectrum Scale and Its Use in Life Science

Supporting research in Northwest Switzerland

• Hosting reference bioinformatics services
• 500 registered users
• 110 research groups
• Acknowledged in 70 life-science publications in 2016

From stellar astrophysics … to brain genomics … to structural biology … to hosting reference services …

SWISS-MODEL

Major funding

Page 14: Introduction to IBM Spectrum Scale and Its Use in Life Science

Scientific Storage and Computing Infrastructure

Once upon a time …

Diagram labels: HPC Cluster; NFS Server

Page 15: Introduction to IBM Spectrum Scale and Its Use in Life Science

Scientific Storage and Computing Infrastructure

Cluster and storage grew bigger ...

Diagram labels: HPC Cluster; NSD Server (x3)

Page 16: Introduction to IBM Spectrum Scale and Its Use in Life Science

Scientific Storage and Computing Infrastructure

Diagram labels: SONAS; NSD Server (x3); Spectrum Scale Data Hub Layer; TSM-HSM; LTFS-EE; HPC Cluster; Biomedical Research; Life Sciences Department; Physics Department; Chemistry Department; Psychology Department; Microscopy Facility; Economy Department; …; Genome Sequencing Facility

Page 17: Introduction to IBM Spectrum Scale and Its Use in Life Science

Cluster Export Services

Highly available file and object export services
- export/share configuration is straightforward
- authentication against AD or LDAP

Important for planning:
- NFS and Apple OS X
- SMB1 not supported
- mixed workload and performance
- changes in authentication

Diagram labels: NSD Server (x3); Protocol Nodes; Spectrum Scale Data Hub Layer; Active Directory; Authentication; CIFS; NFS

Page 18: Introduction to IBM Spectrum Scale and Its Use in Life Science

AFM for Data Migration, Example: SONAS Migration

Operational advantages:
- preparing and prefetching before switching clients
- migrate data while clients are working on the new share
- minimal downtime: 1 min (AFM) for a 30 TB, 30M-inode share vs. several months (using a transfer host with robocopy)

Technical advantages:
- data transfer: observed up to 1 TB/h per gateway host
- ACLs: transferred together with the data
- direct storage → storage migration, no transfer host or copy software needed (e.g. robocopy, rsync)
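The throughput figure above gives a rough lower bound on how long the background transfer takes. This is separate from the 1 minute of client downtime, which covers only the switch-over. The function name and the assumption that throughput scales linearly with the number of gateway hosts are illustrative; the slide only states an observed peak of 1 TB/h per gateway.

```python
def migration_hours(
    share_tb: float,
    gateways: int = 1,
    tb_per_hour_per_gateway: float = 1.0,
) -> float:
    """Best-case transfer time, assuming linear scaling across gateways."""
    if gateways < 1:
        raise ValueError("at least one gateway host is required")
    if tb_per_hour_per_gateway <= 0:
        raise ValueError("throughput must be positive")
    return share_tb / (gateways * tb_per_hour_per_gateway)


if __name__ == "__main__":
    # The 30 TB share from the slide, at the peak observed rate.
    print(migration_hours(30))              # one gateway host
    print(migration_hours(30, gateways=3))  # three gateway hosts
```

At the peak rate a single gateway would need about 30 hours for the 30 TB share. Since 1 TB/h is an upper observation and the share holds 30M inodes, real transfers are likely to take longer, but clients can keep working during that time.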

Diagram labels: NSD Server (x3); SONAS; Gateway Nodes; Home Cluster; Spectrum Scale Data Hub Layer

Page 19: Introduction to IBM Spectrum Scale and Its Use in Life Science

Example: Scientific web server

Protein sequences vs. protein structures

Page 20: Introduction to IBM Spectrum Scale and Its Use in Life Science

Example: Scientific web server

Protein annotation: humans vs. machines

Page 21: Introduction to IBM Spectrum Scale and Its Use in Life Science

Example: Scientific web server

Disaster recovery: AFM between two sites
- less work to develop data replication to DR site

Scientific pipeline speedup x8: big pagepool + LROC
- processing steps depend on bigger datasets, unchanged for 1 week
- update of datasets very simple, no data distribution required
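The "big pagepool + LROC" combination can be pictured as a two-tier read cache in front of the disks. The sketch below is a simplified model, not the Spectrum Scale implementation: the class name, the LRU eviction, and the rule that blocks evicted from RAM are demoted to the SSD tier are assumptions made for illustration.

```python
from collections import OrderedDict
from typing import Callable


class TwoTierReadCache:
    """Toy model: a RAM tier (pagepool-like) backed by an SSD tier (LROC-like)."""

    def __init__(
        self,
        ram_slots: int,
        ssd_slots: int,
        read_from_disk: Callable[[str], bytes],
    ) -> None:
        self.ram_slots = ram_slots
        self.ssd_slots = ssd_slots
        self._read_from_disk = read_from_disk
        self.ram: "OrderedDict[str, bytes]" = OrderedDict()
        self.ssd: "OrderedDict[str, bytes]" = OrderedDict()
        self.stats = {"ram": 0, "ssd": 0, "disk": 0}

    def read(self, key: str) -> bytes:
        if key in self.ram:
            self.ram.move_to_end(key)  # mark as most recently used
            self.stats["ram"] += 1
            return self.ram[key]
        if key in self.ssd:
            data = self.ssd.pop(key)   # promote back into RAM
            self.stats["ssd"] += 1
        else:
            data = self._read_from_disk(key)
            self.stats["disk"] += 1
        self._put_ram(key, data)
        return data

    def _put_ram(self, key: str, data: bytes) -> None:
        self.ram[key] = data
        if len(self.ram) > self.ram_slots:
            # Demote the least recently used RAM block to the SSD tier.
            old_key, old_data = self.ram.popitem(last=False)
            self.ssd[old_key] = old_data
            if len(self.ssd) > self.ssd_slots:
                self.ssd.popitem(last=False)  # drop the oldest SSD block


if __name__ == "__main__":
    disk = {name: name.encode() for name in ("a", "b", "c")}
    cache = TwoTierReadCache(ram_slots=1, ssd_slots=2, read_from_disk=disk.__getitem__)
    for key in ("a", "b", "a", "a"):
        cache.read(key)
    print(cache.stats)
```

This matches the workload described on the slide: reference datasets that stay unchanged for a week are read repeatedly, so after the first pass most reads are served from RAM or SSD on the compute node instead of from the shared disks.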

Diagram labels: NSD Server (x3); HPC Cluster; 200 km; pagepool=128GB; LROC: 1 TB SSD; AFM independent writer (replication not speed critical); Internet

Page 22: Introduction to IBM Spectrum Scale and Its Use in Life Science

Information Lifecycle Management - HSM

Use of tape to lower cost of storage

Spectrum Archive EE (LTFS-EE):
- easy to manage, direct control of tape
- use of policies for fine-grained placement
- well suited for data export
- not a full-fledged backup system

Spectrum Protect for Space Management:
- integration with backup system
- requires TSM infrastructure

Diagram labels: Disk Pool; TS3500 (x2); NSD Server (x2); Spectrum Protect for Space Management; Spectrum Archive EE; TSM Server; Spectrum Scale Data Hub Layer; Clients (x2)

Page 23: Introduction to IBM Spectrum Scale and Its Use in Life Science

Secure environment for biomedical research

Encryption
- encryption of data at rest and on network
- defined via policies
- possibility of fine-grained access groups
- encryption keys managed by key management software (IBM SKLM)
- integration with general research infrastructure
- suited for biomedical data and processing

Diagram labels: SKLM; Secure research environment; Login; HPC Cluster; NSD Server (x2); General research environment; Clients

Page 24: Introduction to IBM Spectrum Scale and Its Use in Life Science

Summary

Diagram labels: SONAS; NSD Server (x3); Spectrum Scale Data Hub Layer; TSM-HSM; LTFS-EE; HPC Cluster; Biomedical Research; Life Sciences Department; Physics Department; Chemistry Department; Psychology Department; Microscopy Facility; Economy Department; …; Genome Sequencing Facility; SKLM; Secure research environment; Login; HPC Cluster; NSD Server; Remote Site; AFM; CES: CIFS, NFS; Encryption; ILM, HSM; LROC; Remote Cluster

Page 25: Introduction to IBM Spectrum Scale and Its Use in Life Science

Spectrum Scale User Group

• The Spectrum Scale User Group is free to join and open to all using, interested in using or integrating Spectrum Scale.
• Join the User Group activities to meet your peers and get access to experts from partners and IBM.
• Next meetings:
  - APAC: October 14, Melbourne
  - Global at SC16: November 13, 1pm to 5pm, Salt Lake City
• Web page: http://www.spectrumscale.org/
• Presentations: http://www.spectrumscale.org/presentations/
• Mailing list: http://www.spectrumscale.org/join/
• Contact: http://www.spectrumscale.org/committee/
• Meet Bob Oesterlin (US Co-Principal) at Edge2016: [email protected]

Page 26: Introduction to IBM Spectrum Scale and Its Use in Life Science

Session: Futures of IBM Spectrum Scale

NDA & Customers ONLY

• Who: IBM Spectrum Scale Offering Management
  – Carl Zetie, Ron Riffe
• When: Tuesday, September 20, 2016
  – 1pm to 2pm
• Where: MGM Grand, Signature Tower 3
  – Meeting Room D
• Contact (if any questions)
  – [email protected], [email protected]

Page 27: Introduction to IBM Spectrum Scale and Its Use in Life Science

Session: How to apply Flash benefits to big data analytics and unstructured data

NDA & Customers ONLY

• Who: IBM Elastic Storage Server Offering Management
  – Alex Chen
• When: Thursday, September 22, 2016
  – 1:15pm to 2:15pm
• Where: Grand Garden Arena, Lower Level, MGM, Studio 10
• Contact (if any questions)
  – [email protected], [email protected]

Page 28: Introduction to IBM Spectrum Scale and Its Use in Life Science

Spectrum Scale Trial VM

• Download the IBM Spectrum Scale Trial VM from:
  – http://www-03.ibm.com/systems/storage/spectrum/scale/trial.html

Page 29: Introduction to IBM Spectrum Scale and Its Use in Life Science

Spectrum Scale Edge – Technical Sessions

• Just search for “Spectrum Scale” in the IBM Events mobile app. There are 15+ sessions on various topics, including lab sessions.

Lab Sessions:
• Spectrum Scale Problem Determination Lab
  Date: Sept 20th, 2:15 PM – 3:15 PM
  Location: MGM Grand, Room 317 Lab Center F
• Spectrum Scale Trial VM Lab
  Date: Sept 20th, 3:45 PM – 4:45 PM
  Location: MGM Grand, Room 317 Lab Center F

• Booth on ESS, Spectrum Scale + TCT and DeepFlash

Page 30: Introduction to IBM Spectrum Scale and Its Use in Life Science


Thank You