Big Applications: Simulations, Models, Visualization, …
Scientific data management for big computers and big data
http://hdf.ncsa.uiuc.edu/HDF5/
HDF5 (serial and/or parallel)
Parallel UDM
Software Stacks
Applications and readers, often customized for particular technical fields, enable users to create, manipulate, and view scientific and engineering data. With the support of intervening libraries, common interfaces, and HDF5, scientists and engineers in many fields are able to share data and software.
Specialized libraries and common interfaces use the HDF5 layer for data management and often provide specialized metadata, context, and tools for data transformations and exchange.
The HDF5 layer provides many data management functions, including machine-independent storage of all datatypes, metadata describing datatypes, user-defined attributes, and sophisticated subsetting and subsampling capabilities.
Parallel HDF5 uses MPI-IO to provide parallel file system functionality and global file access.
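The subsetting and subsampling mentioned above can be pictured with a small sketch. HDF5 describes regular selections of this kind as hyperslabs, given by a start, a stride, and a count per dimension. The code below is plain Python on nested lists, not the HDF5 API; the helper name select and the sample grid are assumptions made for illustration only.

```python
# Toy illustration of subsetting and subsampling on a 2-D array.
# Plain Python, not the HDF5 API; select() is a hypothetical helper.

def select(data, start, count, stride=(1, 1)):
    """Return count[0] x count[1] elements beginning at `start`,
    taking every stride[k]-th element along dimension k."""
    r0, c0 = start
    nr, nc = count
    sr, sc = stride
    return [
        [data[r0 + i * sr][c0 + j * sc] for j in range(nc)]
        for i in range(nr)
    ]

# A 6 x 8 grid whose values encode their position: row * 10 + column.
grid = [[r * 10 + c for c in range(8)] for r in range(6)]

# Subsetting: a contiguous 2 x 3 window starting at row 1, column 2.
window = select(grid, start=(1, 2), count=(2, 3))

# Subsampling: every second row and every third column of the grid.
coarse = select(grid, start=(0, 0), count=(3, 3), stride=(2, 3))

print(window)  # [[12, 13, 14], [22, 23, 24]]
print(coarse)  # [[0, 3, 6], [20, 23, 26], [40, 43, 46]]
```

In a real file the point of such a selection is that only the requested elements need to be read from storage, which matters when the full dataset is far larger than memory.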
SAF LibSheaf HDF-EOS
Readers
Common Interfaces
Examples:
Thermonuclear simulations
Product modeling
Data mining tools
Visualization tools
Climate models
IDL
Storage
HDF5 virtual file layer (I/O drivers)
File on parallel file system
File
MPI I/O
Split metadata and raw data files
Split Files
Stdio
User-defined device
Custom
?
Virtual File Layer
The HDF5 VFL, or virtual file layer, provides access to many different data input and output mechanisms. The standard (stdio), split, and MPI drivers read from and write to files on storage media; the stream driver reads and writes virtual files or streams of data.
The VFL also enables the creation of custom drivers, such as the stream driver, for specialized or user-defined situations.
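The idea behind a virtual file layer, one library-level code path with interchangeable I/O drivers underneath, can be sketched in a few lines. This is a toy model in plain Python and does not use or reproduce the HDF5 driver interface; the class names FileDriver and StreamDriver and the function save_and_reload are hypothetical.

```python
import io
import os
import tempfile


class FileDriver:
    """Hypothetical driver that stores bytes in an ordinary file on disk."""

    def __init__(self, path):
        self.path = path

    def write(self, payload):
        with open(self.path, "wb") as f:
            f.write(payload)

    def read(self):
        with open(self.path, "rb") as f:
            return f.read()


class StreamDriver:
    """Hypothetical driver that keeps bytes in an in-memory stream."""

    def __init__(self):
        self.buffer = io.BytesIO()

    def write(self, payload):
        # Discard any earlier contents, then store the new payload.
        self.buffer.seek(0)
        self.buffer.truncate()
        self.buffer.write(payload)

    def read(self):
        return self.buffer.getvalue()


def save_and_reload(driver, payload):
    """Library-level code: identical regardless of which driver is used."""
    driver.write(payload)
    return driver.read()


payload = b"temperature=291.5"
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.bin")
    from_file = save_and_reload(FileDriver(path), payload)
from_stream = save_and_reload(StreamDriver(), payload)

print(from_file == from_stream == payload)  # True
```

Because save_and_reload only relies on write and read, a custom driver for a new device or transport could be added without changing the code above it, which is the property the VFL description attributes to HDF5.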
Across the network or to/from another application or library
Stream
Representative Technical Fields* in which HDF5 Is Used
* from selected HDF5 download registrations, 15 October 2001 through 22 February 2002
Tools
Various tools provide means of accessing HDF5 files, including the data, metadata, and hierarchical structure, without having to write new software.
HDFView, illustrated at the top of this image, displays the structure of a simple HDF5 file in one panel, raw data in another, and, if appropriate, an image or a portion of it in a third. The larger image is the full, independently generated gravity wave image.
HDF5 runs on almost all computers, including many parallel computers
Lawrence Livermore National Laboratory
National Center for Supercomputing Applications
University of Illinois at Urbana-Champaign
Matter & the universe
Weather and climate
A 15-projector display wall (resolution 6400 x 3072) for viewing interactive applications and pre-computed animations at Lawrence Livermore National Laboratory.
[Figure: Total Column Ozone (Dobson), August 24, 2001 and August 24, 2002; color scale labeled 60, 385, 610]
Answering big questions … involves big data …
The ASCI White system contains 8,192 interconnected processors. Its 6.2 terabyte (trillion byte) memory is about 97,000 times that of a 64-MB PC. Its 7,000 disk drives, with 160 terabytes of storage space, have about 16,000 times the storage capacity of a desktop computer with a 10-GB hard disk.
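The two ratios quoted for ASCI White can be checked with a few lines of arithmetic. The sketch assumes decimal units (1 MB = 10^6 bytes), which the "trillion byte" gloss suggests; the variable names are illustrative.

```python
# Decimal units, as the "trillion byte" gloss suggests.
MB, GB, TB = 10**6, 10**9, 10**12

# 6.2 TB is written as 6,200 GB to keep the arithmetic in whole numbers.
memory_ratio = (6_200 * GB) / (64 * MB)   # ASCI White memory vs. a 64-MB PC
storage_ratio = (160 * TB) / (10 * GB)    # ASCI White disk vs. a 10-GB drive

print(round(memory_ratio))   # 96875, i.e. about 97,000
print(round(storage_ratio))  # 16000
```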
on big computers.
Life and nature
How do we…
Describe big data? Store it? Find it? Share it? Mine it? Move it into, out of, and between computers?
A file format and software to describe, organize, store, share, and access big data:
• Store large, complex scientific and engineering data sets
• Retrieve complete data or partial data, easily and quickly
• Enable parallel I/O, remote access, specialized access
• A free, open standard developed by NCSA and the Lawrence Livermore, Sandia, and Los Alamos National Laboratories, with additional support from NASA
The name HDF5 derives from the term hierarchical data format. An HDF5 file is a hierarchically structured set of groups, datasets, and metadata.
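The hierarchy of groups, datasets, and metadata can be pictured with a toy data structure. The sketch below is plain Python and is not the HDF5 library or its file format; the classes Group and Dataset, the lookup method, and the sample names are hypothetical. The slash-separated path mirrors how objects in an HDF5 file are addressed.

```python
from dataclasses import dataclass, field


@dataclass
class Dataset:
    """Hypothetical stand-in for a dataset: values plus descriptive attributes."""
    values: list
    attrs: dict = field(default_factory=dict)


@dataclass
class Group:
    """Hypothetical stand-in for a group: a container of groups and datasets."""
    members: dict = field(default_factory=dict)
    attrs: dict = field(default_factory=dict)

    def lookup(self, path):
        """Resolve a slash-separated path such as 'simulation/density'."""
        node = self
        for name in path.strip("/").split("/"):
            node = node.members[name]
        return node


# Build a small tree: root group -> "simulation" group -> "density" dataset.
root = Group(attrs={"title": "toy file"})
root.members["simulation"] = Group()
root.members["simulation"].members["density"] = Dataset(
    values=[1.0, 1.5, 2.25],
    attrs={"units": "g/cm^3"},
)

density = root.lookup("/simulation/density")
print(density.values)          # [1.0, 1.5, 2.25]
print(density.attrs["units"])  # g/cm^3
```

Attaching attributes to both groups and datasets is what lets the data describe itself: a reader that finds the dataset also finds its units.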
Density gradient in the plasma causes the laser beam to self-focus and then split up into several "filaments".
Simulation of a NIF laser beam passing through a plasma.
Simulation by Bert Still, Visualization by Steve Langer, LLNL
HDF5 File Structure
Copyright 2002 by the Board of Trustees of the University of Illinois
HDF5
Courtesy of Arthur Mirin, LLNL
University of Illinois
NASA
National Science Foundation
DOE SciDAC
LANL LLNL, SNL TriLab NASA
Visualization courtesy of John Shalf, NERSC/Lawrence Berkeley Laboratory, using data computed on the NERSC SP2 by Dennis Pollney and the Cactus Team, Albert Einstein Institute
Aerospace
Agricultural research
Air traffic control
Aircraft emissions database
Applied mathematics
Astrophysics
Astrophysics / supernovae
Atmospheric chemistry
Atmospheric physics
Bioengineering
CEM Simulation
Climatology / hydrology
Computational fluid dynamics
Computational physics
Computational physics / education
Computational physics and computational astrophysics
Computer modeling
Computer science
Data processing
Earth observation / atmospheric science
Earth science
Environment
Fast searching, sorting and retrieval
Film making special effects
Fluid mechanics
GIS
Geodetic Science
Geology
Gravitational physics
Hydrology
Information technology
Magnetic mass spectrometer development
Marine biology / ecology
Materials science
Meteorological data products
Meteorology
Microscopy
Molecular biology
Nano device simulation
Neutron scattering
Ocean color
Ocean remote sensing
Optics / optoelectronics
Petroleum engineering
Photonic band gap studies
Photonic crystals
Photonics
Post-fire erosion analysis
Protein crystallography, molecular modeling
Protostellar accretion discs
Remote sensing
SAR processing
Satellite / weather radar remote sensing
Satellite oceanography
Semiconductor process simulation
Software engineering, distributed systems
Space geodesy
Space physics
Surface water flow and sediment transport
Theoretical chemistry
Visualization
Volcanology
Water resources management
X-ray physics
Computers and operating systems include:
MacOS X
MS Windows
UNIX
Linux
FreeBSD
OSF1
HP-UX
IBM SP
SGI IRIX64
Cray T3E
Cray SV1
Sun Solaris
IA-32 and IA-64
Clusters and high performance computers include:
ASCI Red
ASCI Blue Mountain
ASCI Blue Pacific
ASCI White
Various experimental clusters
Other HDF5 sponsors include