Challenges for Scalable Scientific Knowledge Discovery
Alok Choudhary
EECS Department, Northwestern University
Wei-keng Liao, Kui Gao, Arifa Nisar
Rob Ross, Rajeev Thakur, Rob Latham (ANL)
Many people from the SDM center
Outline
• Achievements
• Success stories
• Vision for the future (and of the past!)
[Diagram: Analytics and Mining, Scientific Data Management, and High-Performance I/O together enable Knowledge Discovery]
• Data management; query of scientific DBs; performance optimizations
• High-level interface; proactive; "what, not how?"
• In-place analytics; customized acceleration; scalable mining
Achievements
• Parallel NetCDF
  • New parallel I/O APIs
  • Scalable data file (64-bit) implementation
  • Application communities: DOE climate, astrophysics, ocean modeling
• MPI-IO
  • A coherent cache layer in ROMIO
  • Locking-protocol-aware file domain partitioning methods
  • Many optimizations
  • Use in production applications
• PVFS
  • Datatype I/O
  • Distributed file locking
• I/O benchmark
  • S3aSim: a sequence similarity search framework
Success stories
• Parallel NetCDF
  • Application communities: DOE climate, astrophysics, ocean modeling
  • FLASH-IO benchmark with the PnetCDF method
• Application
  • S3D combustion simulation from Jacqueline Chen at SNL
    • MPI collective I/O method
    • PnetCDF method
    • HDF5 method
    • ADIOS method
• I/O benchmark
  • S3aSim: a sequence similarity search framework
• Lots of downloads of software in the public domain – techniques directly and indirectly used by many applications
Illustrative pnetCDF users
• FLASH – astrophysical thermonuclear application from the ASCI/Alliances Center at the University of Chicago
• ACTM – atmospheric chemical transport model, LLNL
• WRF-ROMS – regional ocean model system I/O module from the Scientific Data Technologies group, NCSA
• ASPECT – data understanding infrastructure, ORNL
• pVTK – parallel visualization toolkit, ORNL
• PETSc – portable, extensible toolkit for scientific computation, ANL
• PRISM – PRogram for Integrated Earth System Modeling; users from C&C Research Laboratories, NEC Europe Ltd.
• ESMF – Earth System Modeling Framework, National Center for Atmospheric Research
J. Li, W. Liao, A. Choudhary, R. Ross, R. Thakur, W. Gropp, R. Latham, A. Siegel, B. Gallagher, and M. Zingale. Parallel netCDF: A Scientific High-Performance I/O Interface. SC 2003.
PnetCDF large array support
• Limitations of the current PnetCDF
  • CDF-1: < 2 GB file size and < 2 GB array size
  • CDF-2: > 2 GB file size, but still < 2 GB array size
  • File format: uses only 32-bit signed integers
  • Implementation: MPI datatype constructors use only 32-bit integers
• Large array support (a usage sketch follows below)
  • CDF-5: > 2 GB file size and > 2 GB array size
  • Changes in file format and APIs
    • Replace all 32-bit integers with 64-bit integers
    • New 64-bit integer attributes
  • Changes in implementation
    • Replace MPI functions and maintain or enhance optimizations
(Current/future work)
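A minimal sketch of what using the CDF-5 support could look like from an application, assuming a PnetCDF build that defines the NC_64BIT_DATA create mode; the file name, dimension, and variable names are illustrative only.

```c
#include <mpi.h>
#include <pnetcdf.h>

/* Create a CDF-5 file so that both the file and a single 1-D array may
 * exceed 2 GB, then write each process's slice collectively. */
int write_large_array(MPI_Comm comm, const char *path, MPI_Offset nx,
                      MPI_Offset start, MPI_Offset count, const double *buf)
{
    int ncid, dimid, varid, err;

    /* NC_64BIT_DATA selects the CDF-5 format (64-bit sizes) rather
     * than CDF-1 or CDF-2 (NC_64BIT_OFFSET). */
    err = ncmpi_create(comm, path, NC_CLOBBER | NC_64BIT_DATA,
                       MPI_INFO_NULL, &ncid);
    if (err != NC_NOERR) return err;

    ncmpi_def_dim(ncid, "x", nx, &dimid);      /* nx may exceed 2^31 - 1 */
    ncmpi_def_var(ncid, "temperature", NC_DOUBLE, 1, &dimid, &varid);
    ncmpi_enddef(ncid);

    /* collective write of this process's contiguous slice */
    err = ncmpi_put_vara_double_all(ncid, varid, &start, &count, buf);
    ncmpi_close(ncid);
    return err;
}
```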
PnetCDF subfiling
• As the number of processes in today's HPC systems increases, the problem domain size grows, and so do the array sizes
• Storing global arrays larger than 100 GB in a single netCDF file may not be effective or efficient for post-run data analysis
• Subfiling divides a netCDF dataset into multiple files while still maintaining the canonical data structure
• Arrays and subarrays are automatically reconstructed from the subfiling metadata (an index-mapping sketch follows below)
[Figure: subfiling onto a Lustre file system]
(Current/future work)
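The following is an illustrative sketch only, not the PnetCDF subfiling API: it shows the kind of index arithmetic that subfiling metadata makes possible, mapping a global 1-D offset to a (subfile, local offset) pair under a simple block partitioning.

```c
#include <stdio.h>

typedef struct {
    int       subfile;        /* which subfile holds the element    */
    long long local_offset;   /* element offset within that subfile */
} subloc;

/* Block-partition a global array of gsize elements across nsub subfiles. */
static subloc map_to_subfile(long long global_off, long long gsize, int nsub)
{
    long long chunk = (gsize + nsub - 1) / nsub;
    subloc loc;
    loc.subfile      = (int)(global_off / chunk);
    loc.local_offset = global_off % chunk;
    return loc;
}

int main(void)
{
    /* element 750,000,000 of a 1-billion-element array split into 4 subfiles */
    subloc loc = map_to_subfile(750000000LL, 1000000000LL, 4);
    printf("subfile %d, local offset %lld\n", loc.subfile, loc.local_offset);
    return 0;
}
```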
Analytical functions for pnetCDF
• A new set of APIs
  • Reduction functions, statistical functions, histograms, multidimensional transformations, and data mining
  • Enables on-line processing while data is generated
• Built on top of the existing PnetCDF data access infrastructure (a building-block sketch follows below)
(Future work)
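As a sketch of the building blocks such analytics APIs could wrap, the routine below computes a global sum of one variable using only the existing PnetCDF read path and an MPI reduction; the 1-D decomposition and simplified error handling are assumptions.

```c
#include <stdlib.h>
#include <mpi.h>
#include <pnetcdf.h>

/* Each process reads its 1-D slice of a variable and contributes to a
 * global sum; a library-provided reduction function could hide this. */
int variable_global_sum(int ncid, int varid, MPI_Offset start,
                        MPI_Offset count, MPI_Comm comm, double *sum)
{
    double *buf = (double *) malloc(count * sizeof(double));
    double local = 0.0;
    int err = ncmpi_get_vara_double_all(ncid, varid, &start, &count, buf);
    if (err == NC_NOERR) {
        for (MPI_Offset i = 0; i < count; i++) local += buf[i];
        MPI_Allreduce(&local, sum, 1, MPI_DOUBLE, MPI_SUM, comm);
    }
    free(buf);
    return err;
}
```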
MPI-IO persistent file domain
• Aims to reduce the cost of cache coherence control across multiple MPI-IO calls
  • File access domains are kept unchanged from one I/O call to the next
  • Cached data can safely stay in client-side memory without being evicted
• Implementations (a hint-based sketch follows below)
  • User-provided domain size
  • Automatically determined from the aggregate access region
K. Coloma, A. Choudhary, W. Liao, L. Ward, E. Russell, and N. Pundit. Scalable High-level Caching for Parallel I/O. IPDPS 2004.
(Past work)
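A hedged sketch of how a user-provided domain size might be passed in; the hint key "persistent_fd_size" is a placeholder invented for illustration, not a standard ROMIO hint (unrecognized hints are simply ignored by MPI).

```c
#include <mpi.h>

/* Open a file with a (hypothetical) hint fixing the file-domain size, so
 * the same partitioning, and hence the client-side cache, can persist
 * across collective I/O calls. */
int open_with_persistent_fd(MPI_Comm comm, const char *path, MPI_File *fh)
{
    MPI_Info info;
    MPI_Info_create(&info);
    MPI_Info_set(info, "persistent_fd_size", "1048576"); /* placeholder key */
    int err = MPI_File_open(comm, path, MPI_MODE_RDWR | MPI_MODE_CREATE,
                            info, fh);
    MPI_Info_free(&info);
    return err;
}
```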
MPI-IO file caching
• A coherent client-side file caching system
• Aims to improve performance across multiple I/O calls
• Implementations
  • I/O threads: one POSIX thread in each I/O aggregator (see the sketch below)
  • MPI remote memory access functions
  • I/O delegates: using MPI dynamic process management functions
[Figure: FLASH-IO performance results]
• W. Liao, A. Ching, K. Coloma, A. Choudhary, and L. Ward. An Implementation and Evaluation of Client-side File Caching for MPI-IO. IPDPS 2007.
• K. Coloma, A. Choudhary, W. Liao, L. Ward, and S. Tideman. DAChe: Direct Access Cache System for Parallel I/O. International Supercomputer Conference, 2005.
(Current/future work)
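A minimal skeleton of the I/O-thread variant, assuming one POSIX thread per aggregator that services caching requests; the state structure and service loop are placeholders, not the implementation from the papers above.

```c
#include <pthread.h>

struct cache_state {
    volatile int shutdown;
    /* cache pages, page table, request queue, ... (omitted) */
};

/* Background loop run by the caching thread on each I/O aggregator. */
static void *cache_service_loop(void *arg)
{
    struct cache_state *cs = (struct cache_state *) arg;
    while (!cs->shutdown) {
        /* serve remote read/write requests against locally cached file
         * pages; flush or evict pages as needed to keep caches coherent */
    }
    return NULL;
}

/* Started at file-open time on aggregator processes only. */
int start_cache_thread(struct cache_state *cs, pthread_t *tid)
{
    cs->shutdown = 0;
    return pthread_create(tid, NULL, cache_service_loop, cs);
}
```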
Caching with I/O delegate
• Allocates a dedicated group of processes to perform I/O (a communicator-split sketch follows below)
  • Uses a small percentage (< 10%) of additional resources
  • The entire memory space at the delegates can be used for caching
  • Collective I/O off-load
[Figure: results with an I/O delegate size of 3%]
A. Nisar, W. Liao, and A. Choudhary. Scaling Parallel I/O Performance through I/O Delegate and Caching System. SC 2008.
(Current/future work)
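The SC 2008 system starts delegates through MPI dynamic process management; the sketch below shows a simpler, assumption-laden alternative that carves roughly 3% of an existing MPI job into a delegate group with MPI_Comm_split.

```c
#include <mpi.h>

/* Split the job's communicator into a small delegate group
 * (~3% of processes, at least one) and a compute group. */
void split_delegates(MPI_Comm world, int *is_delegate, MPI_Comm *subcomm)
{
    int rank, size;
    MPI_Comm_rank(world, &rank);
    MPI_Comm_size(world, &size);

    int ndeleg = size / 33;               /* roughly 3% of the processes */
    if (ndeleg < 1) ndeleg = 1;
    *is_delegate = (rank < ndeleg);       /* lowest ranks act as delegates */

    /* color 1 = delegates, color 0 = compute processes */
    MPI_Comm_split(world, *is_delegate, rank, subcomm);
}
```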
Operations off-load
• I/O delegates are additional compute resources
  • Idle while the parallel program is in its computation stage
  • Powerful enough to run complete parallel programs
• Potential operations
  • On-line data analytical processing
  • Operations for active disks with caching support
  • Parallel programs, since delegates can communicate with each other
  • Data redundancy and reliability support – parity, mirroring across all delegates
(Future work)
MPI file domain partitioning methods
• Partitioning methods are based on the underlying file system's locking protocol (an alignment sketch follows below)
  • GPFS token-based protocol
    • Align the partitioning with the lock boundaries
  • Lustre server-based protocol
    • Static-cyclic method
    • Group-cyclic method
W. Liao and A. Choudhary. Dynamically Adapting File Domain Partitioning Methods for Collective I/O Based on Underlying Parallel File System Locking Protocols. SC 2008.
(Current/future work)
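A sketch of the lock-alignment idea for the token-based (GPFS-style) case, assuming the lock granularity (e.g. the file system block or stripe size) is known; the static-cyclic and group-cyclic methods in the SC 2008 paper are more involved.

```c
#include <mpi.h>

/* Give aggregator my_aggr of naggr a contiguous file domain whose
 * boundaries fall on lock_unit multiples, so no two aggregators ever
 * contend for the same locked extent. */
void aligned_file_domain(MPI_Offset start, MPI_Offset end,
                         int naggr, int my_aggr, MPI_Offset lock_unit,
                         MPI_Offset *fd_start, MPI_Offset *fd_end)
{
    MPI_Offset len   = end - start + 1;
    MPI_Offset share = (len + naggr - 1) / naggr;

    /* round each share up to a whole number of lock units */
    share = ((share + lock_unit - 1) / lock_unit) * lock_unit;

    *fd_start = start + (MPI_Offset) my_aggr * share;
    *fd_end   = *fd_start + share - 1;
    if (*fd_end > end)   *fd_end = end;
    if (*fd_start > end) { *fd_start = 1; *fd_end = 0; }  /* empty domain */
}
```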
S3D-IO on Cray XT: Performance/Productivity
• Problem
  • Output files are typically generated one per processor, so the number of files grows with the process count
  • Causes problems with archiving and future access
• Approach
  • Parallel I/O (MPI-IO) optimization
  • One file per variable during I/O (see the sketch below)
  • Requires multi-processor coordination during I/O
• Achievement
  • Shown to scale to tens of thousands of processors on production systems
  • Better performance while eliminating the need to create 100K+ files
(Current work)
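A sketch of the shared-file pattern, assuming a 3-D block decomposition and double-precision data; the file naming and layout are illustrative, not the actual S3D checkpoint format.

```c
#include <mpi.h>

/* All processes collectively write one variable into one shared file,
 * instead of each process creating its own file. */
void write_variable(MPI_Comm comm, const char *fname, const double *local,
                    int gsizes[3], int lsizes[3], int starts[3])
{
    MPI_File     fh;
    MPI_Datatype filetype;

    /* describe this process's block within the 3-D global array */
    MPI_Type_create_subarray(3, gsizes, lsizes, starts, MPI_ORDER_C,
                             MPI_DOUBLE, &filetype);
    MPI_Type_commit(&filetype);

    MPI_File_open(comm, fname, MPI_MODE_CREATE | MPI_MODE_WRONLY,
                  MPI_INFO_NULL, &fh);
    MPI_File_set_view(fh, 0, MPI_DOUBLE, filetype, "native", MPI_INFO_NULL);

    int count = lsizes[0] * lsizes[1] * lsizes[2];
    MPI_File_write_all(fh, local, count, MPI_DOUBLE, MPI_STATUS_IGNORE);

    MPI_File_close(&fh);
    MPI_Type_free(&filetype);
}
```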
Optimizations for PVFS
• Datatype I/O
  • Packs non-contiguous I/O requests into a single request
  • The data layout is presented as a concise description, which is passed over the network instead of (offset, length) pairs (see the sketch below)
• Distributed locking component
  • Datatype lock – covering many non-contiguous regions
  • Try-lock protocol
    • On failure, fall back to an ordered two-phase lock
[Figure: FLASH-IO performance results]
• A. Ching, A. Choudhary, W. Liao, R. Ross, and W. Gropp. Efficient Structured Data Access in Parallel File Systems. Cluster Computing 2003.
• A. Ching, R. Ross, W. Liao, L. Ward, and A. Choudhary. Noncontiguous Locking Techniques for Parallel File Systems. SC 2007.
(past work)
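A small sketch contrasting the two request descriptions (the sizes are made-up numbers): list I/O would ship one (offset, length) pair per block, while datatype I/O ships a single compact vector-datatype description and lets the file system reconstruct the regions.

```c
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    /* a strided access: 10,000 blocks of 64 bytes, one every 1,024 bytes */
    int count = 10000, blocklen = 64, stride = 1024;
    MPI_Datatype strided;
    MPI_Type_vector(count, blocklen, stride, MPI_BYTE, &strided);
    MPI_Type_commit(&strided);

    /* list I/O: 10,000 (offset, length) pairs over the network;
     * datatype I/O: one (count, blocklen, stride) description */
    printf("list I/O descriptors: %d, datatype descriptors: 1\n", count);

    MPI_Type_free(&strided);
    MPI_Finalize();
    return 0;
}
```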
I/O benchmark
• S3aSim
  • A sequence similarity search algorithm framework for MPI-IO evaluation. It uses a master-slave parallel programming model with database segmentation, which mimics the mpiBLAST access pattern.
A. Ching, W. Feng, H. Lin, X. Ma, and A. Choudhary. Exploring I/O Strategies for Parallel Sequence Database Search Tools with S3aSim. HPDC 2006.
(Past work)
Data analytic run-time library at active storage nodes
• Enhance the MPI-IO interfaces and functionality
  • Pre-defined functions
  • Plug-in user-defined functions (a hypothetical sketch follows below)
  • Embedded functions in the MPI data representation
• Active storage infrastructure
  • General-purpose CPUs with GPUs and/or FPGAs
  • FPGAs for reconfiguration and acceleration of analysis functions
• Software programming model
  • Traditional application codes
  • Acceleration codes for GPUs and FPGAs
(Future work)
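Since this is future work, the sketch below is purely hypothetical (none of these names exist in MPI-IO): it only illustrates the plug-in idea, where an application registers a user-defined analysis kernel to be run by the active storage/analytics nodes.

```c
#include <stddef.h>

/* Signature for a user-defined analysis kernel applied to a data buffer
 * as it streams through the active storage nodes. */
typedef void (*analysis_fn)(const void *buf, size_t nbytes, void *result);

struct as_registry {
    analysis_fn fn[16];
    int         nfns;
};

/* Register a kernel (e.g. a histogram or min/max scan) and return a
 * handle the writer can attach to subsequent I/O operations. */
static int as_register(struct as_registry *reg, analysis_fn fn)
{
    if (reg->nfns >= 16) return -1;
    reg->fn[reg->nfns] = fn;
    return reg->nfns++;
}
```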
THE VISION THING!
Science Goal: Understand global scale patterns in biosphere processes
Earth Science Questions:
• When and where do ecosystem disturbances occur?
• What is the scale and location of land cover change, and what is its impact?
• How are ocean, atmosphere, and land processes coupled?
Data sources:
• Weather observation stations
• High-resolution EOS satellites
  • 1982-2000: AVHRR at 1° x 1° resolution (~115 km x 115 km)
  • 2000-present: MODIS at 250 m x 250 m resolution
• Model-based data from forecast and other models
  • Sea level pressure, 1979-present, at 2.5° x 2.5°
  • Sea surface temperature, 1979-present, at 1° x 1°
• Data sets created by data fusion
Discovery of patterns from global Earth science data sets (instruments, sensors, and/or simulations)
[Figure: Earth Observing System; monthly average temperature]
Analytics/Knowledge Discovery Challenges
• Spatio-temporal nature of data
  • Traditional data mining techniques do not take advantage of spatial and temporal autocorrelation
• Scalability
  • Earth Science data sets can be very large, especially for data such as high-resolution vegetation data
  • Grid cells can range from a resolution of 2.5° x 2.5° (10K locations for the globe) to 250 m x 250 m (15M locations for just California; about 10 billion for the globe)
• High dimensionality
  • Long time series are common in Earth Science
Some Climate problems and Knowledge Discovery Challenges
Challenges
• Spatio-temporal nature of data
  • Traditional data mining techniques do not take advantage of spatial and temporal autocorrelation
• Scalability
  • The size of Earth Science data sets has increased six orders of magnitude in 20 years, and it continues to grow with higher-resolution data
  • Grid cells have gone from a resolution of 2.5° x 2.5° (10K points for the globe) to 250 m x 250 m (15M points for just California; about 10 billion for the globe)
• High dimensionality
  • Long time series are common in Earth Science
Climate problems
• Extend the range, accuracy, and utility of weather prediction
• Improve our understanding and timely prediction of severe weather, pollution, and climate events
• Improve understanding and prediction of seasonal, decadal, and century-scale climate variation on global, regional, and local scales
• Create the ability to make accurate predictions of global climate and carbon-cycle response to various forcing scenarios over the next 100 years
Astrophysics
Cosmological Simulations
• Simulate the formation and evolution of galaxies
• What is dark matter?
• What is the nature of dark energy?
• How did galaxies, quasars, and supermassive black holes form from the initial conditions in the early universe?
[Figure: snapshot from a pure N-body simulation (1 billion particles) showing the distribution of dark matter at the present time (light colors represent greater density of dark matter), post-processed to demonstrate the impact of ionizing radiation from galaxies]
SDM Future Vision
• Build “Science Intelligence and Knowledge Discoverer”
• Think of this as “Oracle”, “SAS”, “NetApp”, and “Amazon” combined into one
• Build tools for customization to application domain (potential verticals)
• Provide “Toolbox” for common applications
• Develop Scientific Warehouse infrastructure
• Build intelligence into the I/O Stack
• Develop an analytics appliance
• Develop a language and support for specifying management and analytics
• “Focus on needs” as a more important consideration than “features”
Large-Scale Scientific Data Management and Analysis
Prof. Alok Choudhary, ECE Department, Northwestern University, Evanston, IL. Email: [email protected]
ACKNOWLEDGEMENTS: Wei-Keng Liao, M. Kandemir, X. Shen, S. More, R. Thakur, G. Memik, J. No, R. Stevens
Project Web Page: http://www.ece.northwestern.edu/~wkliao/MDMS
Salishan Conference on High-Speed Computing, April 2001
Virtuous Cycle
• Problem setup (mesh, domain decomposition)
• Simulation (execute the application, generate data)
• Manage, visualize, analyze
• Measure results, learn, archive
Problems and Challenges
• Large-scale data (TB, PB ranges)
• Large-scale parallelism (unmanageable)
• Complex data formats and hierarchies
• Sharing and analysis in a distributed environment
• Non-standard systems and interoperability problems (e.g., file systems)
• Technology driven by commercial applications
  – Storage
  – File systems
  – Data management
• What about analysis? Feature extraction, mining, pattern recognition, etc.
MDMS - Goals and Objectives
• High-performance data access
  – Determine optimal parallel I/O techniques for applications
  – Data access prediction
  – Transparent data pre-fetching, pre-staging, caching, and subfiling on the storage system
  – Automatic data analysis for data mining
• Data management for large-scale scientific computations
  – Use a database to store all metadata for performance (and other information) – future (XML?)
  – Static metadata: data location, access, storage pattern, underlying storage device, etc.
  – Dynamic metadata: data usage, historical performance and access patterns, associations and relationships among datasets
  – Support for on-line and off-line data analysis and mining
Architecture
[Architecture diagram: user applications (simulation, data analysis, visualization), the MDMS (metadata: access patterns, history), and the storage systems (MPI-IO and other I/O interfaces). The labeled flows are: query, input metadata, hints, and directives; associations, OIDs, and parameters for I/O; schedule, prefetch, and cache hints (collective I/O); performance input and system metadata; the best I/O function for the given parameters (hint); and data.]
Metadata
• Application level
  – Date, run-time parameters, execution environment, comments, result summary, etc.
• Program level
  – Data types, structures
  – Association of multiple datasets and files
  – File location, file structures (single/multiple datasets, multiple/single files)
• Performance level
  – I/O functions (e.g., collective/non-collective I/O parameters)
  – Access hints, access pattern, storage pattern, dataset associations
  – Striping, pooled striping, storage association
  – Prefetching, staging, migration, caching hints
  – Historical performance
Interface
Run Application
Dataset and Access Pattern Table
Data Analysis
Visualize
Incorporating Data Analysis, Mining and Feature Detection
• Can these tasks be performed on-line?
  – It is expensive to write data out and read it back for future analysis
  – Why not embed analysis functions within the storage (I/O) runtime systems?
  – Utilize resources by partitioning the system into data generators and analyzers
Integrating Analysis
• Problem setup (mesh, domain decomposition)
• Simulation (execute the application, generate data)
• Manage, visualize, analyze
• Measure results, learn, archive
• On-line analysis and mining integrated into the cycle
Some Publications
• A. Choudhary, M. Kandemir, J. No, G. Memik, X. Shen, W. Liao, H. Nagesh, S. More, V. Taylor, R. Thakur, and R. Stevens. "Data Management for Large-Scale Scientific Computations in High Performance Distributed Systems," Cluster Computing: the Journal of Networks, Software Tools and Applications, 2000.
• A. Choudhary, M. Kandemir, H. Nagesh, J. No, X. Shen, V. Taylor, S. More, and R. Thakur. "Data Management for Large-Scale Scientific Computations in High Performance Distributed Systems," High-Performance Distributed Computing Conference '99, San Diego, CA, August 1999.
• A. Choudhary and M. Kandemir. "System-Level Metadata for High-Performance Data Management," IEEE Metadata Conference, April 1999.
• X. Shen, W. Liao, A. Choudhary, G. Memik, M. Kandemir, S. More, G. Thiruvathukal, and A. Singh. "A Novel Application Development Environment for Large-Scale Scientific Computations," International Conference on Supercomputing, 2000.
These and more available at http://www.ece.northwestern.edu/~wkliao/MDMS
Internal Architecture and Data Flow
In-Place On-Line Analytics – Software Architecture
[Software architecture diagram (labels: login, network, system I/O, active analysis): the application runs on the MPI/MPI-IO library with MPI-based analytics functions over the parallel file system/storage functions and an active storage functions/mining & analytics library, backed by traditional storage & I/O nodes and active storage & analytics nodes.]
Statistical and Data Mining Functions on an Active Storage Cluster
• Develop computational kernels common in analytics, data mining, and statistical operations for acceleration on FPGAs
• NU-MineBench data mining package
• Develop parallel versions of the data mining kernels that can be accelerated using GPUs and FPGAs
(Future work)
MineBench Project Homepage http://cucis.ece.northwestern.edu/projects/DMS
Accelerating and Computing in the Storage
[Diagram: the conventional cycle (problem setup and decomposition → application execution/simulation → I/O and storage access → analyze/manage → measure, archive) compared with an accelerated cycle in which analysis runs on-line in the storage (problem setup and decomposition → application execution/simulation → I/O and storage access → analyze (on-line) → measure, manage, archive). The software stack is the same as above: the application over the MPI/MPI-IO library with MPI-based analytics functions and the parallel file system/storage functions with an active storage functions/mining & analytics library, on traditional storage & I/O nodes and active storage & analytics nodes.]
Illustration of Acceleration: (1) Classification, (2) PCA
GPU Coprocessing
• Compared to CPUs, GPUs offer 10x higher computational capability and 10x greater memory bandwidth
  • Lower operating speed, but higher transistor count
  • More transistors devoted to computation
• In the past, general-purpose computation on GPUs was difficult
  • Hardware was specialized
  • Programming required knowledge of the rendering pipeline
• Now, however, GPUs look much more like SIMD machines
  • More of the GPU's resources can be applied toward general-purpose computation
  • Coding for the GPU no longer requires background knowledge in graphics rendering
  • Performance gains of 1-2 orders of magnitude are possible for data-parallel applications (see the sketch below)
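A plain-C sketch of the k-means assignment step (row-major array layouts assumed): every point's nearest centroid is computed independently, which is exactly the data-parallel structure that maps onto a GPU with one thread per point.

```c
/* Assign each of npts dim-dimensional points to its nearest of k centroids. */
static void assign_points(const float *pts, const float *centroids,
                          int *membership, int npts, int k, int dim)
{
    for (int i = 0; i < npts; i++) {          /* on a GPU: one thread per i */
        int   best   = 0;
        float best_d = 3.4e38f;               /* ~FLT_MAX */
        for (int c = 0; c < k; c++) {
            float d = 0.0f;
            for (int j = 0; j < dim; j++) {
                float diff = pts[i * dim + j] - centroids[c * dim + j];
                d += diff * diff;             /* squared Euclidean distance */
            }
            if (d < best_d) { best_d = d; best = c; }
        }
        membership[i] = best;
    }
}
```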
k-Means Performance (compared with host processor)
Results
• Matrix size: 2048
[Figure: PCA speedup of CUDA vs. the host processor as a function of the number of principal components (0-2500); the speedup axis ranges from 0 to 35]
Challenges in Scientific Knowledge Discovery
[Diagram: Scientific Data Management, Analytics and Mining, and High-Performance I/O together enable Knowledge Discovery]
• Data management; query of scientific DBs; performance optimizations
• High-level interface; proactive; "what, not how?"
• In-place analytics; customized acceleration; scalable mining
SDM Future Vision
• Build “Science Intelligence and Knowledge Discoverer”
• Think of this as “Oracle”, “SAS”, “NetApp”, and “Amazon” combined into one
• Build tools for customization to application domain (potential verticals)
• Provide “Toolbox” for common applications
• Develop Scientific Warehouse infrastructure
• Build intelligence into the I/O Stack
• Develop an analytics appliance
• Develop a language and support for specifying management and analytics
• “Focus on needs” as a more important consideration than “features”