TRANSCRIPT
nci.org.au
@NCInews
The NCI High Performance Computing (HPC) and High Performance Data (HPD) Platform to Support the Analysis of Petascale Environmental Data Collections
Ben Evans (1), Lesley Wyborn (1), Tim Pugh (2), Chris Allen (1), Joseph Antony (1), Kashif Gohar (1), David Porter (1), Jon Smillie (1), Claire Trenham (1), Jingbo Wang (1), Irina Bastrakova (3), Alex Ip (3), Gavin Bell (4)
(1) ANU, (2) Bureau of Meteorology, (3) Geoscience Australia, (4) The 6th Column Project
(Second part of this talk is in the next ESSI session)
ESSI 2015-8273
• High Performance Data (HPD) – data that is carefully prepared, standardised and structured so that it can be used in data-intensive science on HPC (Evans, ISESS 2015, Springer)
  – HPC: turning compute-bound problems into IO-bound problems
  – HPD: turning IO-bound problems into ontology + semantics problems
• What are the HPC and HPD drivers?
• How do you build environments on this infrastructure that make it easy for users to do science?
© National Computational Infrastructure 2015
“NCI High Performance Computing (HPC) and High Performance Data (HPD) Platform”, EGU2015-8273, 17 April, 2015 @BenJKEvans
1/25
Top 500 Supercomputer list since 1990
• Fast and flexible access to structured data is required
• There needs to be a balance between processing power and the ability to access data (data scaling)
• The focus is on on-demand direct access to large data sources
• enabling high-performance analytics and analysis tools directly on that content
http://www.top500.org/statistics/perfdevel/
(Chart annotations: Current NCI, Next NCI.)
2/25
Elephant Flows Place Great Demands on Networks
• A physical pipe that leaks water at a rate of 0.0046% by volume → result: 99.9954% of the water is transferred.
• A network 'pipe' that drops packets at a rate of 0.0046% → result: 100% of the data is transferred, but slowly, at <<5% of optimal speed.
The latency is essentially fixed (determined by the speed of light), but with proper engineering we can minimize packet loss.
Assumptions: 10 Gbps TCP flow, 80 ms RTT. See Eli Dart, Lauren Rotman, Brian Tierney, Mary Hester, and Jason Zurawski, "The Science DMZ: A Network Design Pattern for Data-Intensive Science", Proceedings of the IEEE/ACM Annual Supercomputing Conference (SC13), Denver CO, 2013.
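The slowdown above can be sketched with the Mathis et al. TCP throughput approximation (not on the slide; the 1460-byte MSS is an assumed typical Ethernet value), which bounds a loss-limited flow by MSS/(RTT·√p):

```python
import math

def mathis_throughput_bps(mss_bytes, rtt_s, loss_rate):
    """Approximate max TCP throughput (bits/s) via the Mathis model:
    throughput <= (MSS / RTT) * (1 / sqrt(p))."""
    return (mss_bytes * 8 / rtt_s) / math.sqrt(loss_rate)

# The slide's assumptions: 10 Gbps flow, 80 ms RTT, 0.0046% packet loss.
bw = mathis_throughput_bps(mss_bytes=1460, rtt_s=0.080, loss_rate=0.000046)
print(f"{bw / 1e6:.1f} Mbps")                 # roughly 21-22 Mbps
print(f"{bw / 10e9 * 100:.2f}% of 10 Gbps")   # well under 5% of the link
```

This is why a tiny loss rate that is harmless for a water pipe cripples a long-RTT "elephant flow".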
3/25
Computational and Cloud Platforms
Raijin:
• 57,472 cores (Intel Xeon Sandy Bridge technology, 2.6 GHz) in 3592 compute nodes;
• 160 TBytes (approx.) of main memory;
• Infiniband FDR interconnect; and
• 7 PBytes (approx.) of usable fast filesystem (for short-term scratch space);
• 1.5 MW power; 100 tonnes of water in cooling.
Partner Cloud:
• Same generation of technology as Raijin (Intel Xeon Sandy Bridge, 2.6 GHz) but only 1500 cores;
• Infiniband FDR interconnect;
• Collaborative platform for services; and
• The platform for hosting non-batch services.
NCI Nectar Cloud:
• Same generation as the Partner Cloud;
• Non-managed environment;
• Weak integration.
4/25
NCI Cloud (diagram): per-tenant public IP assignments (CIDR boundaries, typically /29); OpenStack private IPs on a flat network, quota managed; FDR Infiniband interconnect; NFS, Lustre and SSD storage.
5/25
NCI's integrated high-performance environment (diagram):
• Persistent global parallel filesystems on the /g/data 56 Gb FDR IB fabric: /g/data1 7.4 PB, /g/data2 6.75 PB
• Raijin high-speed filesystems on the Raijin 56 Gb FDR IB fabric: /short 7.6 PB; /home, /system, /images, /apps
• Massdata (tape): 1.0 PB cache, 20 PB tape
• Raijin HPC compute; Raijin login + data movers; NCI data movers
• 10 GigE to the Internet and to a second data centre
6/25
/g/data3 9 PB; Cloud
10+ PB of Data for Interdisciplinary Science
Contributors: BOM, GA, CSIRO, ANU, international and other national sources.
• CMIP5: 3 PB
• Atmosphere: 2.4 PB
• Earth Observation: 2 PB
• Water/Ocean: 1.5 PB
• Weather: 340 TB
• Geophysics: 300 TB
• Astronomy (optical): 200 TB
• Bathymetry, DEM: 100 TB
• Marine videos: 10 TB
7/25
National Environment Research Data Collections (NERDC):
1. Climate/ESS Model Assets and Data Products
2. Earth and Marine Observations and Data Products
3. Geoscience Collections
4. Terrestrial Ecosystems Collections
5. Water Management and Hydrology Collections

Data Collections                                     Approx. Capacity
CMIP5, CORDEX                                        ~3 PBytes
ACCESS products                                      2.4 PBytes
LANDSAT, MODIS, VIIRS, AVHRR, INSAR, MERIS           1.5 PBytes
Digital Elevation, Bathymetry, Onshore Geophysics    700 TBytes
Seasonal Climate                                     700 TBytes
Bureau of Meteorology Observations                   350 TBytes
Bureau of Meteorology Ocean-Marine                   350 TBytes
Terrestrial Ecosystem                                290 TBytes
Reanalysis products                                  100 TBytes
8/25
Internationally sourced data:
• Satellite data (USGS, NASA, JAXA, ESA, …)
• Reanalysis (ECMWF, NCEP, NCAR, …)
• Climate data (CMIP5, AMIP, GeoMIP, CORDEX, …)
• Ocean modelling (Earth Simulator, NOAA, GFDL, …)
These will only increase as we depend on more data, and some will be replicated. How can we better keep this data in sync, versioned, and back-referenced for the supplier?
• Organise "long-tail" data that calibrates and integrates with the big data. How should we manage this data, keep it versioned, and easily attribute the supplier (researcher? collaboration? university? agency?)
9/25
Some Data Challenges
• Data formats
  • Standardize data formats – time to convert legacy and proprietary ones
  • Appropriately normalise the data models and conventions
  • Adopt HPC-enabled libraries that abstract storage
• Expose all attributes for search
  • not just collection-level search, not just datasets, but all data attributes
  • What are the handles we need to access the data?
• Provide more programmatic interfaces and link up data and compute resources
  • More server-side processing
• Add semantic meaning to the data
  • Create useful datasets (in the programming context) from data collections
  • Is it scientifically appropriate for a data service to aggregate/interpolate?
• What unique/persistent identifiers do we need?
  • DOI is only part of the story.
  • Versioning is important.
  • Born-linked data and maintaining graph infrastructure
10/25
Regularising High Performance Data using HDF5 (layer diagram):
• Storage: Lustre and other storage options
• Library Layer 1: HDF5 (MPI-enabled and serial)
• Library Layer 2: NetCDF-4 library; libgdal; [SEG-Y] airborne geophysics line data; [FITS]; BAG; …
• Metadata layer: netCDF-CF; HDF-EOS5; ISO 19115, RIF-CS, DCAT, etc.
• Compilers & tools: Fortran, C, C++; Python, R, MatLab, IDL; Ferret, CDO, NCL, NCO, GDL, GDAL, GrADS, GRASS, QGIS; Globe Claritas; Open NavSurface
11/25
Regularising High Performance Data using HDF5 – including Data Services (layer diagram): the same stack as the previous slide, with a services layer that exposes the data model + semantics:
• Services: OGC WFS, SOS, WPS, WCS, WMS; OpenDAP
• Fast "whole-of-library" catalogue
12/25
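The OpenDAP service in that layer serves subsets via constraint expressions. A minimal sketch (hypothetical server URL, dataset path and variable name) of how a DAP2 subset-request URL is formed, so a client fetches one hyperslab rather than the whole file:

```python
from urllib.parse import quote

def dap_subset_url(base, var, slices):
    """Build a DAP2 data-request URL with a constraint expression that
    selects one variable hyperslab, e.g. tas[0:1:9][100:1:119][200:1:239].
    Each slice is a (start, stride, stop) triple."""
    ce = var + "".join(f"[{a}:{s}:{b}]" for a, s, b in slices)
    return f"{base}.dods?{quote(ce, safe='[]:')}"

# Hypothetical THREDDS/OpenDAP endpoint and variable.
url = dap_subset_url(
    "http://dap.example.org/thredds/dodsC/cmip5/tas_day.nc",
    "tas",
    [(0, 1, 9), (100, 1, 119), (200, 1, 239)],
)
print(url)
```

Tools such as the netCDF library speak this protocol natively, which is what makes "on-demand direct access to large data sources" practical.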
Finding data and services (diagram): GeoNetwork catalogue; Lucene database (trialling Elasticsearch); DAP, OGC, … services; /g/data1; /g/data2; supercomputer access; virtual lab.
13/25
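Since the catalogue is trialling Elasticsearch, the kind of request it would serve can be sketched with the standard Elasticsearch query DSL (the index fields here are hypothetical; the body is only built as JSON, not sent anywhere):

```python
import json

# Hypothetical catalogue fields; the structure is standard ES query DSL.
query = {
    "query": {
        "bool": {
            "must": [{"match": {"title": "sea surface temperature"}}],
            "filter": [{"term": {"collection": "cmip5"}}],
        }
    },
    "size": 10,
}

# This body would be POSTed to a _search endpoint on the catalogue index.
print(json.dumps(query, indent=2))
```

Full-text `match` plus exact-value `filter` clauses are what enable attribute-level search across the whole library, not just collection-level metadata.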
Virtual Labs:
• Separating researchers from software builders
• Cloud is an enabler, but:
  • don't make researchers become full system admins
  • save developers from being operational
Project lifecycle – preparing for success (chart of productivity vs. perspiration across project lifecycles, from Proj1 start/end to Proj2-4 start/end).
Prototype to Production – anti-"Mine" craft
14/25
Prototype to Production – anti-"Mine" craft
Development phase in a project (chart of VL-manager and developer headspace hours for poorly, reasonably and well executed projects).
15/25
Prototype to Production – anti-"Mine" craft
Development phase in a project (the same headspace-hours chart, with the scope changed once the work is adopted broadly).
16/25
Virtual Laboratory driven software patterns (diagram): stacks are layered from basic OS functions, through common modules, to bespoke services and special configuration choices. Stacks are taken from upstream and used as bundles – e.g. an NCI stack, NCI environment stack, workflow, analytics and visualisation stacks, GridFTP, and modified or duplicated variants of existing stacks.
17/25
Transition from developer, to prototype, to DevOps
Step 1: Development
• Get a template for development
• Identify what is special; separate out what is common
• Reuse other software stacks where possible
Step 2: Prototype
• Deploy in an isolated tenant of a cloud
• Determine dependencies
• Test cases to demonstrate correct functioning
Step 3: Sustainability
• Pull the repo into the operational tenant
• Prepare the bundle for integration with the rest of the framework
• Hand back a cleaned bundle
• Establish a DevOps process
18/25
DevOps approach to building and operating environments (diagram): NCI core bundles; Community1 and Community2 repos; Virtual Laboratory operational bundle – Git controlled, pull model, continuous-integration testing.
19/25
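A minimal sketch of that Git-controlled pull model (all repository URLs, paths and the run_tests.sh entry point are hypothetical): the operational tenant pulls bundle updates itself, then runs the tests after every sync.

```python
import subprocess
from pathlib import Path

# Hypothetical bundle repositories; in a pull model the operational tenant
# fetches updates rather than having them pushed in.
BUNDLES = {
    "nci-core": "https://git.example.org/nci/core-bundle.git",
    "community1": "https://git.example.org/community1/bundle.git",
}

def sync_bundle(name, url, base=Path("/opt/bundles"), run=subprocess.run):
    """Clone the bundle on first sight, otherwise fast-forward pull, then
    run its test suite. `run` is injectable so the flow can be exercised
    without a real git remote."""
    dest = base / name
    if dest.exists():
        run(["git", "-C", str(dest), "pull", "--ff-only"], check=True)
    else:
        run(["git", "clone", url, str(dest)], check=True)
    # Continuous-integration step: every sync is followed by the tests.
    run(["./run_tests.sh"], cwd=dest, check=True)
```

Keeping the bundle fast-forward-only means the operational environment can only move to states that passed testing upstream.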
Advantages
• Separates roles and responsibilities – from gatekeeper to DevOps management:
  • specialist on a package
  • VL managers
  • system admins
• From "Architecture" to "Platform":
  • flexible with technology change
  • makes handover/maintenance easier
• Both Test/Dev/Ops and patches/rollback become business as usual (BAU)
• Sharable bundles:
  • can tag releases of software stacks
  • precondition for trusted software stacks
  • provenance – scientific / government policy scrutiny
20/25
A snapshot of layered bundles to build complex VLs
21/25
Easy analysis environments
• Increasing use of iPython Notebooks
• VDI – easy in-situ environment using virtual analysis desktops
22/25
VDI – cont …
23/25
NCI Petascale Data-Intensive Science Platform (diagram):
• 10 PB+ research data
• Server-side analysis and visualization
• Data services (THREDDS)
• VDI: cloud-scale user desktops on the data
• Web-time analytics software
24/25
Summary: Progress toward Major Milestones
• Interdisciplinary Science
To publish, catalogue and access data and software for enhancing interdisciplinary, big-data-intensive (HPD) science, with interoperable data services and protocols.
• Integrity of Science
Managed services to capture a workflow's process as a comparable, traceable output. Ease of access to data and software for enhanced workflow development and repeatable science, which can be conducted with less effort or an acceleration of outputs.
• Integrity of Data
Data repository services to ensure data integrity, provenance records, universal identifiers, and repeatable data discovery and access from workflows or interactive users.
25/25