C O M P U T E | S T O R E | A N A L Y Z E
Tiered Storage for Big Data and Supercomputing
ASIS Webinar
July 31, 2014
Safe Harbor Statement
This presentation may contain forward-looking statements that are
based on our current expectations. Forward looking statements
may include statements about our financial guidance and expected
operating results, our opportunities and future potential, our product
development and new product introduction plans, our ability to
expand and penetrate our addressable markets and other
statements that are not historical facts. These statements are only
predictions and actual results may materially vary from those
projected. Please refer to Cray's documents filed with the SEC from
time to time concerning factors that could affect the Company and
these forward-looking statements.
Cray Workflow-Driven Storage Solutions
Agenda
● Cray Storage & Data Management Business Overview
● Trends and Challenges
  ● Big data and what it means for storage
  ● Planning and managing HSMs
● Product overview
  ● Cray Sonexion – for high-performance storage
  ● Cray Tiered Adaptive Storage (TAS)
● Case Studies
● Summary
Cray Builds Computational and Storage Tools That Help Change The World
Compute: Supercomputers, Flexible Clusters, Hybrid Architectures
Store: Tiered Storage & Data Management, Systems and Solutions
Analyze: Graph Analytics, Hadoop Solutions
Merging Big Data and Supercomputing
Cray Storage and Data Management
● Cray is a public company with steady growth
  ● Overall for Cray in 2013: 525M in revenue
  ● SDM 63M with 27% annual growth
● Track record of delivering results
  ● 120+ petabytes of deployed storage
  ● 150+ Cray-supported Lustre deployments
  ● World’s leading parallel systems team
  ● World’s fastest production file system
  ● Exascale leader in storage performance
● Growing storage portfolio
  ● Storage systems
    ● Cray Sonexion
    ● Cray Tiered Adaptive Storage (TAS)
  ● Expert systems architectures
    ● High-performance storage
    ● Tiered storage
    ● Archive solutions
  ● Services
    ● 24/7 global support organization
Expertise & Best Practices
System Architectures
Storage Systems
Cray Solutions
Data-Driven Markets Need Scalable Storage
Supercomputing Analytics Cluster Computing
● Earth Sciences: climate change & weather prediction, remote sensing
● Manufacturing: aircraft design, crash simulation & fluid dynamics
● Life Sciences: drug discovery, genomic research, complex modeling
● Higher Education: university-driven science, new energy sources & efficient combustion
● Energy: seismic imaging & reservoir simulation
● Defense & National Security: warfighter support, threat prediction & stockpile stewardship
Cray Storage and Data Solutions
Cray Storage and Data Management Solutions
• Proven experts in parallel systems, disk storage and HSM
• 150 Lustre deployments
• 120 petabytes primary storage shipped/installed
• Exascale leadership in storage performance and archiving
• Scale-as-you-go performance from GB/s to 1TB/s in a file system
• Fluid capacity scalability from terabytes to exascale-capable archives
• Quality assurance and stress testing for the largest production environments
• Simplify and reduce time to deployment
• Fastest in-production Lustre file system
• Reduced time to results by 24x at NCSA
• Reduce storage footprint by 50% for petascale systems
Massively Scalable Storage Solutions for Big Data & Supercomputing
Your Trusted Expert | Scale Optimally | Results Faster
Experts in workflow-driven storage, optimized to scale and deliver results
What Our Storage Customers are Saying
“We immediately saw success from the perspective of stability and performance. Our bandwidth numbers were higher than the previous vendor’s, using the exact same hardware. We went from the file system being our biggest issue to the least of our issues, with Cray.”
Jim Lujan, HPC Project Leader, LANL
“Some of the science teams have been able to do 3 years’ worth of work in 3 months.”
Michelle Butler, Head of Storage & Networking, NCSA Blue Waters project
“Cray was chosen at Pawsey because Cray is the most credible and reliable partner and best understood the requirements. Knowing we have Cray onsite is very important. If Cray can’t do it, nobody can.”
Dr. George Beckett, Deputy Director & Head of Supercomputing Team, Pawsey Center
Storage Trends in Data-Driven Workflows
Meta Issues for Data and Science
● Science and research moving faster than IT
● Storage and data flow (usually) the bottleneck
Other Trends – More Data to Retain, Indefinitely
Cray, Inc. - Tiered Adaptive Storage
● Massive data growth
  ● More to keep
  ● Outpacing IT’s ability to store, protect, and manage data
● Storage too expensive – for one monolithic “tier”
  ● Match storage tier to value of data based on policies/rules
  ● Each tier has to be price competitive
● Data should be continuously accessible, in many cases
● Simplicity rules
  ● Users: Keep them working (NFS, CIFS, FTP, Web)
  ● Storage ops: Needs to simplify managing the storage and data
● Data management poses biggest challenges
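The cost argument for tiering can be illustrated with a small calculation. The sketch below compares keeping all data on one tier with spreading it across tiers by policy. The tier names, the prices per terabyte, and the capacity split are hypothetical placeholders chosen for illustration; they are not Cray figures.

```python
from typing import Mapping

# Hypothetical cost per terabyte for each tier (illustrative numbers only).
COST_PER_TB = {"fast": 1000.0, "primary": 400.0, "nearline": 100.0, "offline": 30.0}


def monolithic_cost(total_tb: float, tier: str = "primary") -> float:
    """Cost of keeping every terabyte on a single tier."""
    return total_tb * COST_PER_TB[tier]


def tiered_cost(total_tb: float, split: Mapping[str, float]) -> float:
    """Cost when data is spread across tiers; split maps tier name to fraction."""
    if abs(sum(split.values()) - 1.0) > 1e-9:
        raise ValueError("tier fractions must sum to 1")
    return sum(total_tb * frac * COST_PER_TB[tier] for tier, frac in split.items())


if __name__ == "__main__":
    # Assumed placement: most data is cold and sits on the cheaper tiers.
    split = {"fast": 0.05, "primary": 0.15, "nearline": 0.30, "offline": 0.50}
    print("monolithic:", monolithic_cost(1000))
    print("tiered:    ", tiered_cost(1000, split))
```

With these assumed numbers the tiered layout costs well under half of the single-tier layout, which is the point of matching each tier to the value of the data it holds.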
“Big Data” Defined – Cray’s View
● Data Types
  ● Structured & Transactional
    ● Databases and Analysis
  ● Unstructured (our primary focus)
    ● Persistent file data
    ● Larger files and data sets
● Volume, Variety, Velocity
  ● Volume
    ● Big data sets – and file systems
  ● Variety
    ● Mainly unstructured
    ● Mix of file types and sizes
  ● Velocity
    ● Data is “fast” early in the lifecycle
    ● Data I/O - I/O from small random to “big I/O”
    ● Data movement – file movement, copying, staging, archiving
● Data continues to move, and needs to be continuously accessible throughout its lifecycle, too
Massive Volumes of Data
Velocity – from fast to slow
Variety of File Types
The Innovation Explosion in Life Sciences
● Data generated at greater rate than IT can manage
  ● Lab science changes every month
  ● IT refreshes occur over 2-10 years
● Data outputs doubling yearly
  ● Genomics, Microscopy, Radio Astronomy, Physics
● More data must be kept
  ● Deleting results not an option for many commercial companies or research organizations
Today’s Bio-IT professionals have to design, deploy and support IT infrastructures with life cycles measured over several years in the face of an innovation explosion where major laboratory and research enhancements arrive on the scene every few months.
-- the BioTeam
How do enterprise and HPC infrastructure teams keep up?
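To see why yearly doubling is hard to reconcile with 2-10 year refresh cycles, the growth can be worked through directly. This is a minimal sketch: the 100 TB starting point is a hypothetical value, and strict doubling every year is taken from the slide as a simplifying assumption.

```python
def projected_capacity_tb(initial_tb: float, years: int,
                          doubling_period_years: float = 1.0) -> float:
    """Capacity needed after `years` if data volume doubles every period."""
    return initial_tb * 2 ** (years / doubling_period_years)


if __name__ == "__main__":
    # Hypothetical starting point of 100 TB, projected over typical refresh cycles.
    for years in (2, 5, 10):
        print(years, "years:", projected_capacity_tb(100.0, years), "TB")
```

Over a 10-year cycle the same system would have to hold roughly a thousand times its starting volume, so capacity planned at deployment is outgrown long before the next refresh.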
The Data Lifecycle – for Performance
[Figure: capacity (PB, axis 0 to 30) and throughput (MB/sec, axis 0 to 3000) plotted against data age: Day 1, 30 Days, 60 Days, 90 Days, 180 Days, 360 Days, 2 Years]
Throughput and I/O
Parallel Access
Performance Scaling
The Data Lifecycle – for Capacity
[Figure: capacity (PB, axis 0 to 30) and throughput (MB/sec, axis 0 to 3000) plotted against data age: Day 1, 30 Days, 60 Days, 90 Days, 180 Days, 360 Days, 2 Years]
Maximum Efficiency
Infrequent Access
Capacity Scalability
The Tiers in Tiering—Form Follows Function
● HSM and the Tiers in Storage
  ● Tier 0: Fast (SSD and disk)
  ● Tier 1: Primary (SSD and disk)
  ● Tier 2: Nearline (disk or tape)
  ● Tier 3: Offline or offsite (disk or tape)
Fast: Performance-optimized – IO and throughput
Primary: Where data lives most of the time. Performance to capacity mix depends on use case
Nearline: Capacity-optimized archives (disk or tape)
Deep Archives: Long-term capacity- and cost-optimized (usually tape)
Data Migration
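The mapping from data age to tier can be sketched as a simple rule table. The tier names come from the slide; the age thresholds and the use of last-access time as the only criterion are assumptions for illustration, and a real HSM policy would normally weigh more attributes.

```python
from datetime import datetime, timedelta

# (maximum time since last access, tier). Thresholds are hypothetical.
TIER_RULES = [
    (timedelta(days=30), "Tier 0: Fast"),
    (timedelta(days=180), "Tier 1: Primary"),
    (timedelta(days=730), "Tier 2: Nearline"),
]
DEFAULT_TIER = "Tier 3: Offline or offsite"


def select_tier(last_access: datetime, now: datetime) -> str:
    """Return the first tier whose age threshold covers the file."""
    age = now - last_access
    for max_age, tier in TIER_RULES:
        if age <= max_age:
            return tier
    return DEFAULT_TIER


if __name__ == "__main__":
    now = datetime(2014, 7, 31)
    for days in (1, 90, 365, 1000):
        print(days, "days old ->", select_tier(now - timedelta(days=days), now))
```

Data migration then amounts to moving each file whose selected tier differs from the tier it currently lives on.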
Storage Challenges
Data & Storage Management Challenges
● Data Sprawl
  ● Multiple file systems to manage
  ● Data not easily managed across file systems
  ● Data not easily found
● Storage Cost & Complexity
  ● Linux – devices, drivers, etc.
  ● Massive complexity
  ● Configuration and testing
  ● Data wasting space or deleted
● Data Protection & Availability
  ● Best case: data found or recovered in days
  ● Worst case: data lost or corrupted
App 1 App 3 App 3
Fast Storage
Primary Archive
FS 1 FS 2 FS 3 FS 3 ./foo
Libraries
Backup SW
Where’s My Data?
Build Your Own HSM: Complex Undertaking
● Server OS and hardware validation
  ● Server, networking, storage devices (HBAs, NICs)
  ● Configuration management
● HSM software and hardware validation
  ● Stacks often proprietary
  ● Tied to local OS and file systems (DMAPI)
  ● Tied to hardware (servers and storage)
  ● Require expertise to configure
● Networking expertise
  ● Fibre Channel, iSCSI, InfiniBand
  ● NFS, CIFS, FTP, etc.
● Disk storage (SANs) expertise
  ● Massive complexity at scale
    ● LUNs, masking, zoning, multi-pathing
  ● Requires platform-specific expertise
● Tape storage & library integration
  ● Requires library support and testing
  ● API integration
  ● Data management and protection requires HSM
Data Checks In—But Never Checks Out
Traditional HSM Architecture – Complex
[Diagram labels: IB fabric (QDR, FDR); file systems fs1, fs2, fs3; FC; Ethernet; six Lustre movers (DM); six HSM movers; disk cache; six archive media units; data ingest]
Cray’s Goals for Workflow-Driven Tiered Storage
● Pay as you grow performance and capacity scalability
  ● Performance scaling for fast tiers - High Performance Storage
  ● Capacity expansion for primary and archive tiers - Tiered Storage/Archiving
● Simplify managing everything
  ● Ease of deployment, configuration and operations
  ● Upgradeable and sustainable infrastructure, over lifespan of data
  ● In-place data migration (no forklift)
● Build on open, portable and sustainable architectures
  ● Open data formats for long term storage
  ● Open source operating systems and tools
  ● Flexible storage choices to fit requirements - flash, disk and tape
● Data protection and accessibility at scale
  ● Data must be available within reasonable timeframes
● Quality and dependability for large-scale deployments
  ● Solutions that work as advertised
  ● Single point of support for entire solution
Process Store Archive
Abstracting Data Access - Across Workflows
Tier 1 Tier 2 Tier 3 Tier 4
Data Movement
Fast Persistent Efficient
High Speed Interconnect
Distribution | Access | Ingest / Creation
HPC Systems, Workflows and Applications
Cray Storage Products
Scale optimally – small to large systems
• Gigabytes to terabytes of performance
• Terabytes to exabytes of capacity
Scalable building blocks
• Best-of-breed storage technologies
• Open systems and software
Cray Sonexion Storage System
● Purpose-built system for Lustre at scale
  ● Key building block for Cray system architectures
  ● Enables data protection and availability at scale
● Simplifies deployment and management
  ● Builds on pre-configured integrated design
  ● Reduces storage footprint by 50% for petascale systems
● Optimized for scale
  ● Scale bandwidth from 5GB/s to 1TB/s per file system
  ● Optimal performance-to-drive ratio
● Proven by Cray
  ● Cray delivering sustained real-world performance at scale for supercomputing and big data
Scale-out Lustre System for Big Data and Supercomputing
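Scale-out sizing of this kind can be estimated with simple arithmetic. The sketch below assumes that bandwidth grows linearly with the number of building blocks and that one block contributes 5 GB/s, which is just the low end of the range quoted above. Neither assumption is a published Sonexion specification.

```python
import math


def blocks_needed(target_gb_per_s: float, per_block_gb_per_s: float = 5.0) -> int:
    """Building blocks required to reach a target file system bandwidth.

    Assumes linear scaling; per_block_gb_per_s is a hypothetical figure.
    """
    if target_gb_per_s <= 0 or per_block_gb_per_s <= 0:
        raise ValueError("bandwidth values must be positive")
    return math.ceil(target_gb_per_s / per_block_gb_per_s)


if __name__ == "__main__":
    for target in (5, 180, 1000):  # GB/s
        print(target, "GB/s ->", blocks_needed(target), "blocks")
```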
Cray Tiered Adaptive Storage (TAS)
Cray Tiered Adaptive Storage (TAS)
● Preserve data indefinitely
  ● Optimized for scale
  ● Data fully protected
  ● Upgrade with technology
● Simplified management and implementation
  ● Up to 5 tiers with Lustre - flexible media choices
  ● Non-disruptively upgrade storage and media
  ● Familiar SAM-QFS tools and commands
  ● Minimal impact to users or applications
● Access data forever
  ● Data protection and accessibility at scale
  ● Expert design for maximum scalability
  ● Single point of support by Cray
Open archive and deployment-ready HSM for Big Data and Supercomputing
Our partnership with Versity: Using the Versity Storage Manager (VSM)
● Based on open source SAM-QFS
  ● Versity has strong heritage with data management solutions
● Familiar tools – works like SAM
  ● Policy driven to provide constant management of user data
  ● Flexible options to classify data
  ● Integrated migration capability from 3rd party solutions
  ● Tightly integrated into file system
● Shared file system
  ● POSIX SAN shared file system
  ● Support for hundreds of native file system clients
  ● Native interface to storage tiers
  ● Integrated volume management with performance configurations
● Integrated and configured by Cray
  ● Pre-validated systems
  ● Cray support for all hardware and software
Cray TAS – Simplifying HSM
[Diagram labels: IB fabric (QDR, FDR); file systems fs1, fs2, fs3; FC; Ethernet; data movement and transparent user access; shared virtualized storage pool]
Cray Tiered Adaptive Storage Architecture
● Virtualizes storage
  ● Single interface to multiple tiers
  ● File systems appear infinitely large
  ● No user interaction required
● Protects data at scale
  ● Multiple copies of files
  ● Disaster recovery capabilities
● Flexible storage tiers
  ● Scale the correct tiers to your needs
  ● Support for both disk and tape
● Transparent for users and apps
  ● Maintain ease of use for your customers
● Extensible to Lustre file system
  ● Lustre file system integration
  ● Maintain transparency throughout
Users and Applications
Tier 1
Tier 2
Tier 3
Tier 4
File System
Policy-based Data Movement
Policy Engine
Lustre File System
Users and Applications
Transparent Data Access
Users and applications always have access to data
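One common way an HSM makes a file system "appear infinitely large" is to free space on the top tier for files that already have copies on lower tiers, once usage crosses a threshold. The sketch below illustrates that general idea only; it is not the Cray TAS or Versity policy engine, and the class, field names, and watermark values are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FileEntry:
    path: str
    size_tb: float
    last_access_day: int    # larger value means more recently used
    archive_copies: int     # copies already written to lower tiers
    resident: bool = True   # data currently held on tier 1


def release_candidates(files: List[FileEntry], capacity_tb: float,
                       high_water: float = 0.80, low_water: float = 0.60) -> List[str]:
    """Release the oldest archived files until usage falls to the low watermark."""
    used = sum(f.size_tb for f in files if f.resident)
    if used <= high_water * capacity_tb:
        return []  # below the high watermark: nothing to do
    released = []
    for f in sorted(files, key=lambda entry: entry.last_access_day):
        if used <= low_water * capacity_tb:
            break
        # Never release a file that has no archive copy yet.
        if f.resident and f.archive_copies >= 1:
            f.resident = False
            used -= f.size_tb
            released.append(f.path)
    return released


if __name__ == "__main__":
    files = [
        FileEntry("/fs/a", 4.0, last_access_day=1, archive_copies=2),
        FileEntry("/fs/b", 3.0, last_access_day=5, archive_copies=0),
        FileEntry("/fs/c", 2.0, last_access_day=10, archive_copies=1),
    ]
    print(release_candidates(files, capacity_tb=10.0))
```

A released file keeps its entry in the namespace, so users still see it; reading it would trigger a recall from the lower tier, which is what keeps access transparent.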
Cray TAS Connector for Lustre
● Data management and protection for Lustre file systems
  ● Transparently manage data within Lustre
  ● Up to 5 storage tiers
  ● Option offered with Cray Tiered Adaptive Storage (TAS) solution
● Key challenges
  ● Efficiently utilizing Lustre storage
  ● Data protection and preservation
  ● Sustaining exascale-capable archives
● Key benefits of Cray TAS Connector for Lustre
  ● TCO reduction
    ● Increase utilization of high-performance Lustre storage
  ● Flexible data protection
    ● Up to 5 copies per file protected over multiple tiers
    ● On and off-site copy options for disaster recovery
  ● Open systems and formats
    ● Support Lustre 2.5 building on Lustre HSM API
    ● Cray TAS relies on open source systems and tools and Versity open format archiving
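The data protection limits listed above can be expressed as a small policy check. In this sketch only the five-copy limit and the on-site/off-site distinction come from the slide; the `CopyTarget` type, the tier labels, and the validation messages are hypothetical and do not reflect the actual TAS configuration format.

```python
from dataclasses import dataclass
from typing import List

MAX_COPIES = 5  # from the slide: up to 5 copies per file


@dataclass(frozen=True)
class CopyTarget:
    tier: str       # hypothetical label such as "disk" or "tape"
    offsite: bool   # True if the copy is stored at another location


def validate_copy_policy(targets: List[CopyTarget],
                         require_offsite: bool = True) -> List[str]:
    """Return a list of problems; an empty list means the policy is acceptable."""
    problems = []
    if not targets:
        problems.append("at least one copy is required")
    if len(targets) > MAX_COPIES:
        problems.append(f"at most {MAX_COPIES} copies are supported")
    if require_offsite and not any(t.offsite for t in targets):
        problems.append("no off-site copy for disaster recovery")
    return problems


if __name__ == "__main__":
    policy = [CopyTarget("disk", offsite=False), CopyTarget("tape", offsite=True)]
    print(validate_copy_policy(policy))
```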
Benefits to Existing SAM-QFS Customers
● Reduce TCO by 30% over SAM-QFS
  ● 20-40% over 3 years
● Have confidence preserving data forever
  ● Cray commitment to SAM-QFS capabilities on open systems
  ● Cray driving long-term roadmap for Linux (vs. Solaris)
● Simplify managing data using familiar tools
  ● Similar toolset as SAM-QFS
  ● Geared toward HPC and big data environment on Linux
● Easy migration path from SAM-QFS to TAS
  ● In-place conversion from SAM to TAS
  ● Performed by Cray
Cray TAS offers a deployment-ready, open archiving and tiered storage solution that reduces TCO by 30% over 3 years compared to SAM-QFS, instills confidence in the future, and simplifies managing data using familiar tools.
Case Studies
Cray Customers
HLRN
Challenges
• Migrate data onto new tape libraries and from SAM-QFS
• Utilize Linux and ecosystem of administration
• Maintain familiar management environment
• Single point of support
Solution
• Over 5 PB Cray Tiered Adaptive Storage, powered by Versity, open archive system
• In-place migration & conversion from SAM-QFS to Versity Storage Manager, using TAS
Results
• Seamless migration from original SAM-QFS environment to Cray TAS
• Seamless migration from original tape libraries to new tape libraries
• Simplified management, standardized on Linux
• Single point of support by Cray.
“We wanted a uniform hardware and software landscape, utilizing Linux. From a management perspective, Cray TAS is superior to maintaining a proprietary environment.”
Dr. Steffen Schulze-Kremer, head of HPC at RRZN
Cray TAS open archiving system for big data deployed at North German Supercomputing Center for long-term data preservation
University of Chicago – Life Sciences Research
Parallel Whole Genome Analysis (WGA) workflow, Megaseq
Challenges
• Time constraints and efficiency of whole genome analysis using conventional clusters
• Scaling storage and compute to support Megaseq WGA workflow
Solution
• The Beagle XE6 supercomputer and associated storage, by Cray
• Storage solution included:
• Direct-attached Lustre (DAL)
• 600TBs of usable capacity
Results
• Reduced analysis time of 240 genomes from ~37 years of theoretical CPU time to 50.4 real time hours using the MegaSeq workflow on the Cray system, Beagle
Study published on: https://beagle.ci.uchicago.edu/science-at-beagle/
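The result above can be restated as a ratio and a per-genome figure. The sketch uses only the numbers on the slide (240 genomes, about 37 years of theoretical CPU time, 50.4 wall-clock hours); the 365.25-day year is an assumption used for the unit conversion.

```python
HOURS_PER_YEAR = 365.25 * 24  # assumed conversion: 8766 hours per year


def cpu_to_wall_ratio(cpu_years: float, wall_hours: float) -> float:
    """Ratio of theoretical single-CPU time to measured wall-clock time."""
    return cpu_years * HOURS_PER_YEAR / wall_hours


def hours_per_genome(wall_hours: float, genomes: int) -> float:
    """Average wall-clock hours spent per genome."""
    return wall_hours / genomes


if __name__ == "__main__":
    print("ratio:", round(cpu_to_wall_ratio(37, 50.4)))
    print("minutes per genome:", hours_per_genome(50.4, 240) * 60)
```

The ratio compares serial CPU time with parallel wall-clock time, so it reflects how many cores the workflow kept busy rather than a per-core speedup.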
Challenges
• Reducing time to results for production research while maintaining consistent user experience
• Meeting resiliency, performance and other stringent testing requirements
• Optimizing deployed storage assets and supporting multi-vendor products
Solution
• Single point of solution architecture and support for the entire solution: Cray
• Lustre File System by Cray
• Data Virtualization Services for scaling NFS
• NetApp E-Series storage
• Lustre Client for Cray XE6
Results
• Increased file system stability, reliability, and performance using same storage
• Users like consistency in job performance
Alliance for Computing at Extreme Scale (Cielo)
Large capability-class system supporting diverse science and weapons physics, simulations and modeling, such as asteroid impact scenarios
We immediately saw success from the perspective of stability and performance. Our bandwidth numbers were higher than the previous vendor’s, using the exact same hardware. We went from the file system being our biggest issue to the least of our issues, with Cray.
Jim Lujan, HPC Project Leader, LANL
Cielo web site: http://www.lanl.gov/orgs/hpc/cielo/
Large Integrated Oil & Gas Company
Total Capacity: 10+ PB
Total IO Performance: 180 GB/s
3rd-Party Cluster Nodes: 1400+
[Diagram labels: Cray XE6 Production (Seismic); Cray XE6 Test/Dev; 3rd Party Cluster; Cray Lustre Clients on each system; InfiniBand; Lustre fs1, Lustre fs2, Lustre Test/Dev; Cray Sonexion Scale-out Lustre System]
Challenges
• Breaking I/O bottlenecks and reducing cost of processing
• Sharing data among heterogeneous HPC compute clusters
• Interoperability across file systems
Solution
• Cray Sonexion 1600
• Lustre Client for 3rd-party Cluster
• Lustre Client for Cray XE6 & XC30
Results
• Multiple production file systems
• 10+PBs capacity
• 180 GB/s sustained IO
• Shared across all Cray compute systems and cluster nodes
Data- and IO-intensive seismic processing for discovery and analysis, from basic seismic to Full Waveform Inversion (FWI)
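To put the capacity and bandwidth figures in proportion, the time to stream the full capacity once can be estimated. This sketch assumes decimal units (1 PB = 10^6 GB) and that the quoted 180 GB/s is sustained for the whole pass, which is an idealization.

```python
def full_read_hours(capacity_pb: float, bandwidth_gb_per_s: float) -> float:
    """Hours needed to stream the whole capacity once at a sustained bandwidth."""
    capacity_gb = capacity_pb * 1_000_000  # decimal units: 1 PB = 10^6 GB
    return capacity_gb / bandwidth_gb_per_s / 3600


if __name__ == "__main__":
    print(round(full_read_hours(10, 180), 1), "hours to read 10 PB at 180 GB/s")
```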
Enabling Scientific Breakthroughs at the Petascale
● Integrated Storage at 1TB/sec
● 25+ PB of usable space
● Production Science at Full Scale
● Cray Reliability and Service
NCSA Blue Waters Case Study
Supercomputing at Sustained Petascale Performance
Summary
● Scale performance and capacity based on workflow needs
  ● Scalable tiered storage architecture for Lustre
  ● Comprehensive data management across tiers
● Reduce TCO
  ● By 30% over 3 years compared to SAM-QFS
  ● Easy, familiar management tools, like SAM-QFS
● Protect data indefinitely, at the right cost
  ● Ideal for reducing, protecting, and managing Lustre data
  ● Protect up to 5 copies, within and across locations
● Stay open and portable
  ● Using Linux, open systems, and open formats
Cray Sonexion and Cray TAS provide an easy, open and scalable data management and protection solution spanning the range of workflows for big data and supercomputing.
Seymour Cray
June 4, 1995
The future is seldom the same as the past
Legal Disclaimer
Information in this document is provided in connection with Cray Inc. products. No license, express or implied, to any intellectual property rights is granted by this document.
Cray Inc. may make changes to specifications and product descriptions at any time, without notice.
All products, dates and figures specified are preliminary based on current expectations, and are subject to change without notice.
Cray hardware and software products may contain design defects or errors known as errata, which may cause the product to deviate from published specifications. Current characterized errata are available on request.
Cray uses codenames internally to identify products that are in development and not yet publicly announced for release. Customers and other third parties are not authorized by Cray Inc. to use codenames in advertising, promotion or marketing and any use of Cray Inc. internal codenames is at the sole risk of the user.
Performance tests and ratings are measured using specific systems and/or components and reflect the approximate performance of Cray Inc. products as measured by those tests. Any difference in system hardware or software design or configuration may affect actual performance.
The following are trademarks of Cray Inc. and are registered in the United States and other countries: CRAY and design, SONEXION, URIKA, and YARCDATA. The following are trademarks of Cray Inc.: ACE, APPRENTICE2, CHAPEL, CLUSTER CONNECT, CRAYPAT, CRAYPORT, ECOPHLEX, LIBSCI, NODEKARE, THREADSTORM. The following system family marks, and associated model number marks, are trademarks of Cray Inc.: CS, CX, XC, XE, XK, XMT, and XT. The registered trademark LINUX is used pursuant to a sublicense from LMI, the exclusive licensee of Linus Torvalds, owner of the mark on a worldwide basis. Other trademarks used in this document are the property of their respective owners.