Challenges for Scalable Scientific Knowledge Discovery
Alok Choudhary
EECS Department, Northwestern University
Wei-keng Liao, Kui Gao, Arifa Nisar
Rob Ross, Rajeev Thakur, Rob Latham (ANL)
Many people from the SDM center
Outline
• Achievements
• Success stories
• Vision for the future (and of the past!)
[Diagram: Analytics and Mining, Scientific Data Management, and High-Performance I/O together enable Knowledge Discovery]
• Data management; query of scientific DBs; performance optimizations
• High-level interface; proactive; "what, not how?"
• In-place analytics; customized acceleration; scalable mining
Achievements
• Parallel NetCDF
  • New parallel I/O APIs
  • Scalable data file (64-bit) implementation
  • Application communities: DOE climate, astrophysics, ocean modeling
• MPI-IO
  • A coherent cache layer in ROMIO
  • Locking-protocol-aware file domain partitioning methods
  • Many optimizations
  • Use in production applications
• PVFS
  • Datatype I/O
  • Distributed file locking
• I/O benchmark
  • S3aSim: a sequence similarity search framework
Success stories
• Parallel NetCDF
  • Application communities: DOE climate, astrophysics, ocean modeling
  • FLASH-IO benchmark with the PnetCDF method
• Application
  • S3D combustion simulation from Jacqueline Chen at SNL
    • MPI collective I/O method
    • PnetCDF method
    • HDF5 method
    • ADIOS method
• I/O benchmark
  • S3aSim: a sequence similarity search framework
• Lots of downloads of software in the public domain – techniques directly and indirectly used by many applications
Illustrative pnetCDF users
• FLASH – astrophysical thermonuclear application from the ASCI/Alliances Center at the University of Chicago
• ACTM – atmospheric chemical transport model, LLNL
• WRF-ROMS – regional ocean model system I/O module from the Scientific Data Technologies group, NCSA
• ASPECT – data understanding infrastructure, ORNL
• pVTK – parallel visualization toolkit, ORNL
• PETSc – portable, extensible toolkit for scientific computation, ANL
• PRISM – PRogram for Integrated Earth System Modeling; users from C&C Research Laboratories, NEC Europe Ltd.
• ESMF – Earth System Modeling Framework, National Center for Atmospheric Research
J. Li, W. Liao, A. Choudhary, R. Ross, R. Thakur, W. Gropp, R. Latham, A. Siegel, B. Gallagher, and M. Zingale. Parallel netCDF: A Scientific High-Performance I/O Interface. SC 2003.
PnetCDF large array support
• Limitations of the current PnetCDF
  • CDF-1: < 2 GB file size and < 2 GB array size
  • CDF-2: > 2 GB file size, but still < 2 GB array size
  • File format: uses only 32-bit signed integers
  • Implementation: MPI datatype constructors use only 32-bit integers
• Large array support (a usage sketch follows below)
  • CDF-5: > 2 GB file size and > 2 GB array size
  • Changes in file format and APIs
    • Replace all 32-bit integers with 64-bit integers
    • New 64-bit integer attributes
  • Changes in implementation
    • Replace MPI functions and maintain or enhance optimizations
(Current/future work)
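A minimal sketch of what using the CDF-5 support could look like from an application, assuming a PnetCDF build that defines the NC_64BIT_DATA create mode; the file name, dimension, and variable names are illustrative only.

```c
#include <mpi.h>
#include <pnetcdf.h>

/* Create a CDF-5 file so that both the file and a single 1-D array may
 * exceed 2 GB, then write each process's slice collectively. */
int write_large_array(MPI_Comm comm, const char *path, MPI_Offset nx,
                      MPI_Offset start, MPI_Offset count, const double *buf)
{
    int ncid, dimid, varid, err;

    /* NC_64BIT_DATA selects the CDF-5 format (64-bit sizes) rather
     * than CDF-1 or CDF-2 (NC_64BIT_OFFSET). */
    err = ncmpi_create(comm, path, NC_CLOBBER | NC_64BIT_DATA,
                       MPI_INFO_NULL, &ncid);
    if (err != NC_NOERR) return err;

    ncmpi_def_dim(ncid, "x", nx, &dimid);      /* nx may exceed 2^31 - 1 */
    ncmpi_def_var(ncid, "temperature", NC_DOUBLE, 1, &dimid, &varid);
    ncmpi_enddef(ncid);

    /* collective write of this process's contiguous slice */
    err = ncmpi_put_vara_double_all(ncid, varid, &start, &count, buf);
    ncmpi_close(ncid);
    return err;
}
```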
PnetCDF subfiling
• As the number of processes in today's HPC systems increases, the problem domain size grows, and so do the array sizes
• Storing global arrays larger than 100 GB in a single netCDF file may not be effective or efficient for post-run data analysis
• Subfiling divides a netCDF dataset into multiple files while still maintaining the canonical data structure
• Arrays and subarrays are automatically reconstructed from the subfiling metadata (an index-mapping sketch follows below)
[Figure: subfiling onto a Lustre file system]
(Current/future work)
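The following is an illustrative sketch only, not the PnetCDF subfiling API: it shows the kind of index arithmetic that subfiling metadata makes possible, mapping a global 1-D offset to a (subfile, local offset) pair under a simple block partitioning.

```c
#include <stdio.h>

typedef struct {
    int       subfile;        /* which subfile holds the element    */
    long long local_offset;   /* element offset within that subfile */
} subloc;

/* Block-partition a global array of gsize elements across nsub subfiles. */
static subloc map_to_subfile(long long global_off, long long gsize, int nsub)
{
    long long chunk = (gsize + nsub - 1) / nsub;
    subloc loc;
    loc.subfile      = (int)(global_off / chunk);
    loc.local_offset = global_off % chunk;
    return loc;
}

int main(void)
{
    /* element 750,000,000 of a 1-billion-element array split into 4 subfiles */
    subloc loc = map_to_subfile(750000000LL, 1000000000LL, 4);
    printf("subfile %d, local offset %lld\n", loc.subfile, loc.local_offset);
    return 0;
}
```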
Analytical functions for pnetCDF
• A new set of APIs
  • Reduction functions, statistical functions, histograms, multidimensional transformations, and data mining
  • Enables on-line processing while data is generated
• Built on top of the existing PnetCDF data access infrastructure (a building-block sketch follows below)
(Future work)
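As a sketch of the building blocks such analytics APIs could wrap, the routine below computes a global sum of one variable using only the existing PnetCDF read path and an MPI reduction; the 1-D decomposition and simplified error handling are assumptions.

```c
#include <stdlib.h>
#include <mpi.h>
#include <pnetcdf.h>

/* Each process reads its 1-D slice of a variable and contributes to a
 * global sum; a library-provided reduction function could hide this. */
int variable_global_sum(int ncid, int varid, MPI_Offset start,
                        MPI_Offset count, MPI_Comm comm, double *sum)
{
    double *buf = (double *) malloc(count * sizeof(double));
    double local = 0.0;
    int err = ncmpi_get_vara_double_all(ncid, varid, &start, &count, buf);
    if (err == NC_NOERR) {
        for (MPI_Offset i = 0; i < count; i++) local += buf[i];
        MPI_Allreduce(&local, sum, 1, MPI_DOUBLE, MPI_SUM, comm);
    }
    free(buf);
    return err;
}
```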
MPI-IO persistent file domain
• Aims to reduce the cost of cache coherence control across multiple MPI-IO calls
  • File access domains are kept unchanged from one I/O call to the next
  • Cached data can safely stay in client-side memory without being evicted
• Implementations (a hint-based sketch follows below)
  • User-provided domain size
  • Automatically determined from the aggregate access region
K. Coloma, A. Choudhary, W. Liao, L. Ward, E. Russell, and N. Pundit. Scalable High-level Caching for Parallel I/O. IPDPS 2004.
(Past work)
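A hedged sketch of how a user-provided domain size might be passed in; the hint key "persistent_fd_size" is a placeholder invented for illustration, not a standard ROMIO hint (unrecognized hints are simply ignored by MPI).

```c
#include <mpi.h>

/* Open a file with a (hypothetical) hint fixing the file-domain size, so
 * the same partitioning, and hence the client-side cache, can persist
 * across collective I/O calls. */
int open_with_persistent_fd(MPI_Comm comm, const char *path, MPI_File *fh)
{
    MPI_Info info;
    MPI_Info_create(&info);
    MPI_Info_set(info, "persistent_fd_size", "1048576"); /* placeholder key */
    int err = MPI_File_open(comm, path, MPI_MODE_RDWR | MPI_MODE_CREATE,
                            info, fh);
    MPI_Info_free(&info);
    return err;
}
```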
MPI-IO file caching
• A coherent client-side file caching system
• Aims to improve performance across multiple I/O calls
• Implementations
  • I/O threads: one POSIX thread in each I/O aggregator (see the sketch below)
  • MPI remote memory access functions
  • I/O delegates: using MPI dynamic process management functions
[Figure: FLASH-IO performance results]
• W. Liao, A. Ching, K. Coloma, A. Choudhary, and L. Ward. An Implementation and Evaluation of Client-side File Caching for MPI-IO. IPDPS 2007.
• K. Coloma, A. Choudhary, W. Liao, L. Ward, and S. Tideman. DAChe: Direct Access Cache System for Parallel I/O. International Supercomputer Conference, 2005.
(Current/future work)
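A minimal skeleton of the I/O-thread variant, assuming one POSIX thread per aggregator that services caching requests; the state structure and service loop are placeholders, not the implementation from the papers above.

```c
#include <pthread.h>

struct cache_state {
    volatile int shutdown;
    /* cache pages, page table, request queue, ... (omitted) */
};

/* Background loop run by the caching thread on each I/O aggregator. */
static void *cache_service_loop(void *arg)
{
    struct cache_state *cs = (struct cache_state *) arg;
    while (!cs->shutdown) {
        /* serve remote read/write requests against locally cached file
         * pages; flush or evict pages as needed to keep caches coherent */
    }
    return NULL;
}

/* Started at file-open time on aggregator processes only. */
int start_cache_thread(struct cache_state *cs, pthread_t *tid)
{
    cs->shutdown = 0;
    return pthread_create(tid, NULL, cache_service_loop, cs);
}
```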
Caching with I/O delegate
• Allocates a dedicated group of processes to perform I/O (a communicator-split sketch follows below)
  • Uses a small percentage (< 10%) of additional resources
  • The entire memory space at the delegates can be used for caching
  • Collective I/O off-load
[Figure: results with an I/O delegate size of 3%]
A. Nisar, W. Liao, and A. Choudhary. Scaling Parallel I/O Performance through I/O Delegate and Caching System. SC 2008.
(Current/future work)
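The SC 2008 system starts delegates through MPI dynamic process management; the sketch below shows a simpler, assumption-laden alternative that carves roughly 3% of an existing MPI job into a delegate group with MPI_Comm_split.

```c
#include <mpi.h>

/* Split the job's communicator into a small delegate group
 * (~3% of processes, at least one) and a compute group. */
void split_delegates(MPI_Comm world, int *is_delegate, MPI_Comm *subcomm)
{
    int rank, size;
    MPI_Comm_rank(world, &rank);
    MPI_Comm_size(world, &size);

    int ndeleg = size / 33;               /* roughly 3% of the processes */
    if (ndeleg < 1) ndeleg = 1;
    *is_delegate = (rank < ndeleg);       /* lowest ranks act as delegates */

    /* color 1 = delegates, color 0 = compute processes */
    MPI_Comm_split(world, *is_delegate, rank, subcomm);
}
```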
Operations off-load
• I/O delegates are additional compute resources
  • Idle while the parallel program is in its computation stage
  • Powerful enough to run complete parallel programs
• Potential operations
  • On-line data analytical processing
  • Operations for active disks with caching support
  • Parallel programs, since delegates can communicate with each other
  • Data redundancy and reliability support – parity, mirroring across all delegates
(Future work)
MPI file domain partitioning methods
• Partitioning methods are based on the underlying file system's locking protocol (an alignment sketch follows below)
  • GPFS token-based protocol
    • Align the partitioning with the lock boundaries
  • Lustre server-based protocol
    • Static-cyclic method
    • Group-cyclic method
W. Liao and A. Choudhary. Dynamically Adapting File Domain Partitioning Methods for Collective I/O Based on Underlying Parallel File System Locking Protocols. SC 2008.
(Current/future work)
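A sketch of the lock-alignment idea for the token-based (GPFS-style) case, assuming the lock granularity (e.g. the file system block or stripe size) is known; the static-cyclic and group-cyclic methods in the SC 2008 paper are more involved.

```c
#include <mpi.h>

/* Give aggregator my_aggr of naggr a contiguous file domain whose
 * boundaries fall on lock_unit multiples, so no two aggregators ever
 * contend for the same locked extent. */
void aligned_file_domain(MPI_Offset start, MPI_Offset end,
                         int naggr, int my_aggr, MPI_Offset lock_unit,
                         MPI_Offset *fd_start, MPI_Offset *fd_end)
{
    MPI_Offset len   = end - start + 1;
    MPI_Offset share = (len + naggr - 1) / naggr;

    /* round each share up to a whole number of lock units */
    share = ((share + lock_unit - 1) / lock_unit) * lock_unit;

    *fd_start = start + (MPI_Offset) my_aggr * share;
    *fd_end   = *fd_start + share - 1;
    if (*fd_end > end)   *fd_end = end;
    if (*fd_start > end) { *fd_start = 1; *fd_end = 0; }  /* empty domain */
}
```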
S3D-IO on Cray XT: Performance/Productivity
• Problem
  • Output files are typically generated one per processor, so the number of files grows with the process count
  • Causes problems with archiving and future access
• Approach
  • Parallel I/O (MPI-IO) optimization
  • One file per variable during I/O (see the sketch below)
  • Requires multi-processor coordination during I/O
• Achievement
  • Shown to scale to tens of thousands of processors on production systems
  • Better performance while eliminating the need to create 100K+ files
(Current work)
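A sketch of the shared-file pattern, assuming a 3-D block decomposition and double-precision data; the file naming and layout are illustrative, not the actual S3D checkpoint format.

```c
#include <mpi.h>

/* All processes collectively write one variable into one shared file,
 * instead of each process creating its own file. */
void write_variable(MPI_Comm comm, const char *fname, const double *local,
                    int gsizes[3], int lsizes[3], int starts[3])
{
    MPI_File     fh;
    MPI_Datatype filetype;

    /* describe this process's block within the 3-D global array */
    MPI_Type_create_subarray(3, gsizes, lsizes, starts, MPI_ORDER_C,
                             MPI_DOUBLE, &filetype);
    MPI_Type_commit(&filetype);

    MPI_File_open(comm, fname, MPI_MODE_CREATE | MPI_MODE_WRONLY,
                  MPI_INFO_NULL, &fh);
    MPI_File_set_view(fh, 0, MPI_DOUBLE, filetype, "native", MPI_INFO_NULL);

    int count = lsizes[0] * lsizes[1] * lsizes[2];
    MPI_File_write_all(fh, local, count, MPI_DOUBLE, MPI_STATUS_IGNORE);

    MPI_File_close(&fh);
    MPI_Type_free(&filetype);
}
```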
Optimizations for PVFS
• Datatype I/O
  • Packs non-contiguous I/O requests into a single request
  • The data layout is presented as a concise description, which is passed over the network instead of (offset, length) pairs (see the sketch below)
• Distributed locking component
  • Datatype lock – covering many non-contiguous regions
  • Try-lock protocol
    • On failure, fall back to an ordered two-phase lock
[Figure: FLASH-IO performance results]
• A. Ching, A. Choudhary, W. Liao, R. Ross, and W. Gropp. Efficient Structured Data Access in Parallel File Systems. Cluster Computing 2003.
• A. Ching, R. Ross, W. Liao, L. Ward, and A. Choudhary. Noncontiguous Locking Techniques for Parallel File Systems. SC 2007.
(past work)
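A small sketch contrasting the two request descriptions (the sizes are made-up numbers): list I/O would ship one (offset, length) pair per block, while datatype I/O ships a single compact vector-datatype description and lets the file system reconstruct the regions.

```c
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    /* a strided access: 10,000 blocks of 64 bytes, one every 1,024 bytes */
    int count = 10000, blocklen = 64, stride = 1024;
    MPI_Datatype strided;
    MPI_Type_vector(count, blocklen, stride, MPI_BYTE, &strided);
    MPI_Type_commit(&strided);

    /* list I/O: 10,000 (offset, length) pairs over the network;
     * datatype I/O: one (count, blocklen, stride) description */
    printf("list I/O descriptors: %d, datatype descriptors: 1\n", count);

    MPI_Type_free(&strided);
    MPI_Finalize();
    return 0;
}
```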
I/O benchmark
• S3aSim
  • A sequence similarity search algorithm framework for MPI-IO evaluation. It uses a master-slave parallel programming model with database segmentation, which mimics the mpiBLAST access pattern.
A. Ching, W. Feng, H. Lin, X. Ma, and A. Choudhary. Exploring I/O Strategies for Parallel Sequence Database Search Tools with S3aSim. HPDC 2006.
(Past work)
Data analytic run-time library at active storage nodes
• Enhance the MPI-IO interfaces and functionality
  • Pre-defined functions
  • Plug-in user-defined functions (a hypothetical sketch follows below)
  • Embedded functions in the MPI data representation
• Active storage infrastructure
  • General-purpose CPUs with GPUs and/or FPGAs
  • FPGAs for reconfiguration and acceleration of analysis functions
• Software programming model
  • Traditional application codes
  • Acceleration codes for GPUs and FPGAs
(Future work)
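Since this is future work, the sketch below is purely hypothetical (none of these names exist in MPI-IO): it only illustrates the plug-in idea, where an application registers a user-defined analysis kernel to be run by the active storage/analytics nodes.

```c
#include <stddef.h>

/* Signature for a user-defined analysis kernel applied to a data buffer
 * as it streams through the active storage nodes. */
typedef void (*analysis_fn)(const void *buf, size_t nbytes, void *result);

struct as_registry {
    analysis_fn fn[16];
    int         nfns;
};

/* Register a kernel (e.g. a histogram or min/max scan) and return a
 * handle the writer can attach to subsequent I/O operations. */
static int as_register(struct as_registry *reg, analysis_fn fn)
{
    if (reg->nfns >= 16) return -1;
    reg->fn[reg->nfns] = fn;
    return reg->nfns++;
}
```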
THE VISION THING!
Science Goal: Understand global scale patterns in biosphere processes
Earth Science Questions:
• When and where do ecosystem disturbances occur?
• What is the scale and location of land cover change, and what is its impact?
• How are ocean, atmosphere, and land processes coupled?
Data sources:
• Weather observation stations
• High-resolution EOS satellites
  • 1982-2000: AVHRR at 1° x 1° resolution (~115 km x 115 km)
  • 2000-present: MODIS at 250 m x 250 m resolution
• Model-based data from forecast and other models
  • Sea level pressure, 1979-present, at 2.5° x 2.5°
  • Sea surface temperature, 1979-present, at 1° x 1°
• Data sets created by data fusion
Discovery of patterns from global Earth science data sets (instruments, sensors, and/or simulations)
[Figure: Earth Observing System; monthly average temperature]
Analytics/Knowledge Discovery Challenges
• Spatio-temporal nature of data
  • Traditional data mining techniques do not take advantage of spatial and temporal autocorrelation
• Scalability
  • Earth Science data sets can be very large, especially for data such as high-resolution vegetation data
  • Grid cells can range from a resolution of 2.5° x 2.5° (10K locations for the globe) to 250 m x 250 m (15M locations for just California; about 10 billion for the globe)
• High dimensionality
  • Long time series are common in Earth Science
Some Climate problems and Knowledge Discovery Challenges
Challenges
• Spatio-temporal nature of data
  • Traditional data mining techniques do not take advantage of spatial and temporal autocorrelation
• Scalability
  • The size of Earth Science data sets has increased six orders of magnitude in 20 years, and it continues to grow with higher-resolution data
  • Grid cells have gone from a resolution of 2.5° x 2.5° (10K points for the globe) to 250 m x 250 m (15M points for just California; about 10 billion for the globe)
• High dimensionality
  • Long time series are common in Earth Science
Climate problems
• Extend the range, accuracy, and utility of weather prediction
• Improve our understanding and timely prediction of severe weather, pollution, and climate events
• Improve understanding and prediction of seasonal, decadal, and century-scale climate variation on global, regional, and local scales
• Create the ability to make accurate predictions of global climate and carbon-cycle response to various forcing scenarios over the next 100 years
Astrophysics
Cosmological Simulations
• Simulate the formation and evolution of galaxies
• What is dark matter?
• What is the nature of dark energy?
• How did galaxies, quasars, and supermassive black holes form from the initial conditions in the early universe?
[Figure: snapshot from a pure N-body simulation (1 billion particles) showing the distribution of dark matter at the present time (light colors represent greater density of dark matter), post-processed to demonstrate the impact of ionizing radiation from galaxies]
SDM Future Vision
• Build “Science Intelligence and Knowledge Discoverer”
• Think of this as “Oracle”, “SAS”, “NetApp”, and “Amazon” combined into one
• Build tools for customization to application domain (potential verticals)
• Provide “Toolbox” for common applications
• Develop Scientific Warehouse infrastructure
• Build intelligence into the I/O Stack
• Develop an analytics appliance
• Develop a language and support for specifying management and analytics
• “Focus on needs” as a more important consideration than “features”
Large-Scale Scientific Data Management and Analysis
Prof. Alok Choudhary, ECE Department, Northwestern University, Evanston, IL. Email: [email protected]
ACKNOWLEDGEMENTS: Wei-Keng Liao, M. Kandemir, X. Shen, S. More, R. Thakur, G. Memik, J. No, R. Stevens
Project Web Page: http://www.ece.northwestern.edu/~wkliao/MDMS
Salishan Conference on High-Speed Computing, April 2001
Virtuous Cycle
• Problem setup (mesh, domain decomposition)
• Simulation (execute the application, generate data)
• Manage, visualize, analyze
• Measure results, learn, archive
Problems and Challenges
• Large-scale data (TB, PB ranges)
• Large-scale parallelism (unmanageable)
• Complex data formats and hierarchies
• Sharing and analysis in a distributed environment
• Non-standard systems and interoperability problems (e.g., file systems)
• Technology driven by commercial applications
  – Storage
  – File systems
  – Data management
• What about analysis? Feature extraction, mining, pattern recognition, etc.
MDMS - Goals and Objectives
• High-performance data access
  – Determine optimal parallel I/O techniques for applications
  – Data access prediction
  – Transparent data pre-fetching, pre-staging, caching, and subfiling on the storage system
  – Automatic data analysis for data mining
• Data management for large-scale scientific computations
  – Use a database to store all metadata for performance (and other information) – future (XML?)
  – Static metadata: data location, access, storage pattern, underlying storage device, etc.
  – Dynamic metadata: data usage, historical performance and access patterns, associations and relationships among datasets
  – Support for on-line and off-line data analysis and mining
Architecture
[Architecture diagram: user applications (simulation, data analysis, visualization), the MDMS (metadata: access patterns, history), and the storage systems (MPI-IO and other I/O interfaces). The labeled flows are: query, input metadata, hints, and directives; associations, OIDs, and parameters for I/O; schedule, prefetch, and cache hints (collective I/O); performance input and system metadata; the best I/O function for the given parameters (hint); and data.]
Metadata
• Application level
  – Date, run-time parameters, execution environment, comments, result summary, etc.
• Program level
  – Data types, structures
  – Association of multiple datasets and files
  – File location, file structures (single/multiple datasets, multiple/single files)
• Performance level
  – I/O functions (e.g., collective/non-collective I/O parameters)
  – Access hints, access pattern, storage pattern, dataset associations
  – Striping, pooled striping, storage association
  – Prefetching, staging, migration, caching hints
  – Historical performance
Interface
Run Application
Dataset and Access Pattern Table
Data Analysis
Visualize
Incorporating Data Analysis, Mining and Feature Detection
• Can these tasks be performed on-line?
  – It is expensive to write data out and read it back for future analysis
  – Why not embed analysis functions within the storage (I/O) runtime systems?
  – Utilize resources by partitioning the system into data generators and analyzers
Integrating Analysis
• Problem setup (mesh, domain decomposition)
• Simulation (execute the application, generate data)
• Manage, visualize, analyze
• Measure results, learn, archive
• On-line analysis and mining integrated into the cycle
Some Publications
• A. Choudhary, M. Kandemir, J. No, G. Memik, X. Shen, W. Liao, H. Nagesh, S. More, V. Taylor, R. Thakur, and R. Stevens. "Data Management for Large-Scale Scientific Computations in High Performance Distributed Systems," Cluster Computing: the Journal of Networks, Software Tools and Applications, 2000.
• A. Choudhary, M. Kandemir, H. Nagesh, J. No, X. Shen, V. Taylor, S. More, and R. Thakur. "Data Management for Large-Scale Scientific Computations in High Performance Distributed Systems," High-Performance Distributed Computing Conference '99, San Diego, CA, August 1999.
• A. Choudhary and M. Kandemir. "System-Level Metadata for High-Performance Data Management," IEEE Metadata Conference, April 1999.
• X. Shen, W. Liao, A. Choudhary, G. Memik, M. Kandemir, S. More, G. Thiruvathukal, and A. Singh. "A Novel Application Development Environment for Large-Scale Scientific Computations," International Conference on Supercomputing, 2000.
These and more available at http://www.ece.northwestern.edu/~wkliao/MDMS
Internal Architecture and Data Flow
In-Place On-Line Analytics – Software Architecture
[Software architecture diagram (labels: login, network, system I/O, active analysis): the application runs on the MPI/MPI-IO library with MPI-based analytics functions over the parallel file system/storage functions and an active storage functions/mining & analytics library, backed by traditional storage & I/O nodes and active storage & analytics nodes.]
Statistical and Data Mining Functions on an Active Storage Cluster
• Develop computational kernels common in analytics, data mining, and statistical operations for acceleration on FPGAs
• NU-MineBench data mining package
• Develop parallel versions of the data mining kernels that can be accelerated using GPUs and FPGAs
(Future work)
MineBench Project Homepage http://cucis.ece.northwestern.edu/projects/DMS
Accelerating and Computing in the Storage
[Diagram: the conventional cycle (problem setup and decomposition → application execution/simulation → I/O and storage access → analyze/manage → measure, archive) compared with an accelerated cycle in which analysis runs on-line in the storage (problem setup and decomposition → application execution/simulation → I/O and storage access → analyze (on-line) → measure, manage, archive). The software stack is the same as above: the application over the MPI/MPI-IO library with MPI-based analytics functions and the parallel file system/storage functions with an active storage functions/mining & analytics library, on traditional storage & I/O nodes and active storage & analytics nodes.]
Illustration of Acceleration: (1) Classification, (2) PCA
GPU Coprocessing
• Compared to CPUs, GPUs offer 10x higher computational capability and 10x greater memory bandwidth
  • Lower operating speed, but higher transistor count
  • More transistors devoted to computation
• In the past, general-purpose computation on GPUs was difficult
  • Hardware was specialized
  • Programming required knowledge of the rendering pipeline
• Now, however, GPUs look much more like SIMD machines
  • More of the GPU's resources can be applied toward general-purpose computation
  • Coding for the GPU no longer requires background knowledge in graphics rendering
  • Performance gains of 1-2 orders of magnitude are possible for data-parallel applications (see the sketch below)
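A plain-C sketch of the k-means assignment step (row-major array layouts assumed): every point's nearest centroid is computed independently, which is exactly the data-parallel structure that maps onto a GPU with one thread per point.

```c
/* Assign each of npts dim-dimensional points to its nearest of k centroids. */
static void assign_points(const float *pts, const float *centroids,
                          int *membership, int npts, int k, int dim)
{
    for (int i = 0; i < npts; i++) {          /* on a GPU: one thread per i */
        int   best   = 0;
        float best_d = 3.4e38f;               /* ~FLT_MAX */
        for (int c = 0; c < k; c++) {
            float d = 0.0f;
            for (int j = 0; j < dim; j++) {
                float diff = pts[i * dim + j] - centroids[c * dim + j];
                d += diff * diff;             /* squared Euclidean distance */
            }
            if (d < best_d) { best_d = d; best = c; }
        }
        membership[i] = best;
    }
}
```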
k-Means Performance (compared with host processor)
Results
• Matrix size: 2048
[Figure: PCA speedup of CUDA vs. the host processor as a function of the number of principal components (0-2500); the speedup axis ranges from 0 to 35]
Challenges in Scientific Knowledge Discovery
[Diagram: Scientific Data Management, Analytics and Mining, and High-Performance I/O together enable Knowledge Discovery]
• Data management; query of scientific DBs; performance optimizations
• High-level interface; proactive; "what, not how?"
• In-place analytics; customized acceleration; scalable mining
SDM Future Vision
• Build “Science Intelligence and Knowledge Discoverer”
• Think of this as “Oracle”, “SAS”, “NetApp”, and “Amazon” combined into one
• Build tools for customization to application domain (potential verticals)
• Provide “Toolbox” for common applications
• Develop Scientific Warehouse infrastructure
• Build intelligence into the I/O Stack
• Develop an analytics appliance
• Develop a language and support for specifying management and analytics
• “Focus on needs” as a more important consideration than “features”