
Scalable I/O: Research to Reach the Tipping Point

Alok Choudhary, Professor
Director: Center for Ultra-Scale Computing and Security
Dept. of Electrical & Computer Engineering and Kellogg School of Management
Northwestern University
[email protected]


Consider Productivity and Performance

(Figure: scientific simulations and experiments stream terabytes to petabytes onto disks and tapes; the data then passes through data manipulation before scientific analysis and discovery. In the current state, ~80% of time goes to data manipulation and ~20% to analysis; with SDM technology the goal is to invert this to ~20% manipulation and ~80% analysis.)

Data manipulation includes:
• Getting files from storage
• Extracting subsets of data from files
• Reformatting data
• Getting data from heterogeneous, distributed systems
• Moving data over the network

Driving applications: climate modeling, astrophysics, genomics and proteomics, high energy physics.

SDM technology: optimizing shared access from storage systems, metadata, high-dimensional indexing, adaptive file caching, parallel file systems, runtime libraries.

Goals (optimize and simplify):
• Access to very large datasets
• Access to distributed data
• Access to heterogeneous data
• Data mining of very large datasets


Typical Software Layers for I/O in HEC
(based on a survey of many current applications)

• High-level: e.g., NetCDF, HDF, ABC. These are the interfaces applications use.
• Mid-level: e.g., MPI-IO. Where most of the performance experience lies.
• Low-level: e.g., file systems. Critical for the performance of the layers above.

(Figure: compute nodes connect through a switch network to I/O servers. The software stack runs applications over parallel netCDF/HDF/.., over MPI-IO, over the parallel file system; end-to-end performance is critical.)
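To make the layering concrete, here is a minimal sketch, assuming the PnetCDF library and an MPI environment (the file name, variable name, and sizes are illustrative): the application calls the high-level ncmpi_* interface, PnetCDF maps the collective write onto MPI-IO, and MPI-IO drives the parallel file system end to end.

    /* Minimal sketch: write a block-partitioned 1-D array through the
       high-level layer (PnetCDF), which runs on MPI-IO underneath.
       "demo.nc" and the sizes are illustrative, not from the slides. */
    #include <mpi.h>
    #include <pnetcdf.h>
    #include <stdlib.h>

    int main(int argc, char **argv) {
        int rank, nprocs, ncid, dimid, varid;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

        /* Collective create: every process participates. */
        ncmpi_create(MPI_COMM_WORLD, "demo.nc", NC_CLOBBER,
                     MPI_INFO_NULL, &ncid);

        /* One global dimension, partitioned evenly across processes. */
        MPI_Offset n_local = 1024, start = rank * n_local;
        ncmpi_def_dim(ncid, "x", n_local * nprocs, &dimid);
        ncmpi_def_var(ncid, "data", NC_DOUBLE, 1, &dimid, &varid);
        ncmpi_enddef(ncid);

        double *buf = malloc(n_local * sizeof(double));
        for (MPI_Offset i = 0; i < n_local; i++) buf[i] = rank;

        /* The _all suffix requests a collective MPI-IO write, letting
           the mid level aggregate the per-process slices. */
        ncmpi_put_vara_double_all(ncid, varid, &start, &n_local, buf);

        ncmpi_close(ncid);
        free(buf);
        MPI_Finalize();
        return 0;
    }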


I/O – Complex Optimization Space

The user must specify how I/O is performed, across a complex, non-portable optimization space: streaming vs. small/large requests, regular vs. irregular access, local vs. remote data, system configuration, and software layer. The consequences:
• User burdened
• Ineffective interfaces
• Non-communicating layers
• Reactive behavior

"Currently data handling, I/O, storage (speed), analysis is the main bottleneck. It is known how to scale computations based on processing power and memory. Subsequent phases are a bottleneck due to h/w and s/w infrastructure." (an application scientist)


Current Research Example: Direct Access Cache System (DAChe)

Main idea: a runtime cache in user space to capture small, irregular accesses. Portable. Four main subsystems:
• I/O interface and protocol
• Cache management
• Look-up management
• Locking subsystem

(Figure: the application sits on the DAChe interface, which coordinates the cache management, look-up, and locking subsystems.) [ISC 2005]
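Since DAChe's source is not reproduced on the slide, here is a purely hypothetical C sketch (all names invented, not the actual DAChe code) of how the four subsystems could fit together for one small read: look-up hashes a file page to the process that caches it, locking serializes access, and cache management serves the data without touching the file system.

    /* Hypothetical sketch of a DAChe-style user-space cache; the real
       implementation differs. The request is assumed to fit in one page. */
    #include <string.h>

    #define PAGE_SIZE 4096
    #define NPROCS    16            /* assumed process count */

    /* Look-up management: map a file page to the process caching it. */
    static int page_owner(long page) { return (int)(page % NPROCS); }

    /* Locking subsystem (stubs): serialize access via the page's owner. */
    static void lock_page(long page)   { /* acquire at page_owner(page) */ }
    static void unlock_page(long page) { /* release at page_owner(page) */ }

    /* Cache management (stub): fetch the page image from the owner's
       memory, falling back to the file system only on a miss. */
    static void fetch_page(long page, char *buf) { memset(buf, 0, PAGE_SIZE); }

    /* I/O interface: a small, irregular read served from the cache, so
       the file system never sees the tiny unaligned request. */
    void dache_read(long offset, long len, char *out) {
        long page = offset / PAGE_SIZE;
        char pagebuf[PAGE_SIZE];
        lock_page(page);
        fetch_page(page, pagebuf);
        memcpy(out, pagebuf + offset % PAGE_SIZE, (size_t)len);
        unlock_page(page);
    }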


(Figure: clients and their file system calls.)


Active Storage System (reconfigurable system)

(Figure: an ML310 host and ML310 boards 1 through 4 connected by a switch, with links to the external network.)

Hardware: Xilinx XC2VP30 (Virtex-II Pro family); 30,816 logic cells (3,424 CLBs); 2 embedded PPC405 cores; 2,448 Kb BRAM (136 blocks of 18 Kb); 136 dedicated 18x18 multiplier blocks.

Software: data mining, encryption, functions and runtime libraries, Linux micro-kernel.
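As a sketch of what "active" means here, the following hypothetical C filter (record layout and names invented for illustration) would run on the storage node itself, e.g. on one of the embedded PPC405 cores, so that only matching records cross the network instead of the raw dataset.

    /* Hypothetical active-storage data filter; not from the slides. */
    #include <stdio.h>
    #include <stddef.h>

    typedef struct { double time, value; } record_t;

    /* Runs next to the disk: scan a locally stored block file and keep
       only the records above a client-supplied threshold. */
    size_t filter_block(const char *path, double threshold,
                        record_t *out, size_t max_out) {
        FILE *f = fopen(path, "rb");
        if (!f) return 0;
        record_t r;
        size_t kept = 0;
        while (kept < max_out && fread(&r, sizeof r, 1, f) == 1)
            if (r.value > threshold)
                out[kept++] = r;    /* only the hits are shipped back */
        fclose(f);
        return kept;
    }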

Research Directions


Goal: Decouple "What" from "How" and Be Proactive

(Figure, current state: each application confronts the optimization space directly, streaming, small/large, regular/irregular, local/remote, configuration, software layer, leaving the user burdened with ineffective interfaces and non-communicating layers.)

(Figure, goal state: applications App 1 through App 4 run on an I/O software optimization layer that understands their behavior and proactively applies caching, collective I/O, data reorganization, load balancing, and fault tolerance across the underlying FS, DM, datasets, and HSS, targeting speed, bandwidth, latency, and QoS.)
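One way to picture the decoupling is the hypothetical interface sketch below (every name invented, not an existing API): the application declares what it will do, and the I/O software layer, not the user, proactively chooses how, i.e., caching, collective I/O, reorganization, load balance.

    /* Hypothetical "declare what, not how" interface; all names invented. */
    typedef enum { IO_CHECKPOINT, IO_STREAMING, IO_RANDOM_SMALL } io_intent_t;

    typedef struct {
        io_intent_t intent;       /* what the application will do */
        long bytes_per_step;      /* expected volume per timestep */
        int  shared_file;         /* 1 = all processes share one file */
    } io_declaration_t;

    /* The runtime maps the declaration onto a strategy; the user never
       tunes caching, collectives, or layout directly. */
    void io_declare(io_declaration_t d) {
        if (d.intent == IO_CHECKPOINT && d.shared_file) {
            /* pick collective writes + write-behind, pre-reserve space */
        } else if (d.intent == IO_RANDOM_SMALL) {
            /* pick client-side caching and request aggregation */
        } else {
            /* pick streaming read-ahead and a striped layout */
        }
    }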


(Figure: the I/O stack from top to bottom: user application, pnetCDF/HDF5, MPI-IO, client-side file system, server-side file system, storage system. Each layer is annotated with its current mechanisms and with candidate research features.)

Current mechanisms by layer:
• User application: access patterns such as shared files, individual files, data partitioning, check-pointing, data structures, inter-data relationships.
• pnetCDF/HDF5: data types (byte alignment), data structures (flexible dimensionality), hierarchical data model.
• MPI-IO: collectives and independents; I/O hints: access style (read_once, write_mostly, sequential, random, …), collective buffering, chunking, striping.
• Client-side file system: open modes (O_RDONLY, O_WRONLY, O_SYNC), file status, locking, flushing, cache invalidation; machine-dependent features such as data shipping, sparse access, double buffering.
• Server-side file system: access based on file blocks or objects; scheduling, aggregation; read-ahead, write-behind, metadata management, file striping, security, redundancy.

Candidate research features:
• Application-aware caching, pre-fetching, file grouping, "vector of bytes", flexible caching control, object-based data alignment, memory-file layout mapping, more control over hardware, shared file descriptors.
• Group locks, flexible locking control, scalable metadata management, zero-copying, QoS.
• File systems: caching, fault tolerance, read-ahead, write-behind, I/O load balance, wide-area and heterogeneous FS support, thread safety.
• Active storage: data filtering, object-based/hierarchical storage management, indexing, mining, power management.
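The MPI-IO hints named above can be passed down the stack with MPI_Info; a minimal sketch follows (the hint keys are reserved by the MPI standard, while "out.dat" and the values are illustrative).

    /* Minimal sketch of passing I/O hints through MPI-IO. */
    #include <mpi.h>

    int main(int argc, char **argv) {
        MPI_File fh;
        MPI_Info info;
        MPI_Init(&argc, &argv);

        MPI_Info_create(&info);
        /* Describe the access style so lower layers can pick a strategy. */
        MPI_Info_set(info, "access_style", "write_mostly,sequential");
        /* Enable collective buffering and suggest a striping layout. */
        MPI_Info_set(info, "collective_buffering", "true");
        MPI_Info_set(info, "striping_factor", "8");
        MPI_Info_set(info, "striping_unit", "1048576");

        MPI_File_open(MPI_COMM_WORLD, "out.dat",
                      MPI_MODE_CREATE | MPI_MODE_WRONLY, info, &fh);
        /* ... collective writes (e.g., MPI_File_write_all) go here ... */
        MPI_File_close(&fh);
        MPI_Info_free(&info);
        MPI_Finalize();
        return 0;
    }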


Some Complexity Dimensions

• Data access patterns
• Usage model (immediate/archival)
• Access type (overwrite/streaming)
• Concurrency/parallelism (small/large)
• Storage systems (active/passive)
• Software layers (high-level/MPI/file system)
• Others: fault tolerance, platform-specific, …