

Page 1

Big Data Meets HPC: Getting better acquainted

Dan Reed

Vice President for Research and Economic Development

University Computational Science and Bioinformatics Chair

Computer Science, Electrical Engineering & Computer Engineering, and Medicine

[email protected] www.hpcdan.org

Page 2

[Figure labels: quantum mechanics, spacetime, gravity, other forces, matter, Higgs. Source: http://www.preposterousuniverse.com]

Page 3

Page 4

Technical application complexity is rising

… along with multiple optimization axes

C, Fortran, C++, MPI, OpenMP

Python, Ruby, R

Cloud/Web Services

Technical and mainstream software development have diverged

Page 5

Page 6

The talent in data analytics has shifted from science to companies. We can’t compete.

Astronomy researcher

Vectors (1980s)

Mainframes

MPPs (1990s)

Clusters & Grids (2000s)

Clouds, Big Data and Devices

Page 7

Mostly similar technology issues

With substantial differences

• SAN/local storage models

• Virtualization and scheduling strategies

• Software development tools

• Culture and expectations

Page 8

Programming efficiency

Network optimization

Supply chain optimization

Generic server design

Energy optimization

Systemic resilience

Page 9

Requested 200 nodes and 2 PB for four years?

Logged onto a node and killed processes just to see what would happen?

Wished you could load containers rather than just applications?

Found your code performance limited by the I/O bandwidth of a Raspberry Pi?

Thought SAN was just a typo in a message meant for Sam?

Asked your system for recommendations?

Wondered why R came after S and C doesn’t matter?

Page 10

Data analytics stack (hardware to applications)

• Commodity X86 Racks
• Ethernet Switches
• Local Node Storage
• VMs, Containers and Cloud Services
• HDFS (Hadoop File System)
• Hbase BigTable (key-value store)
• AVRO
• Zookeeper (coordination)
• Map-Reduce, Storm
• Hive, Pig, Sqoop, Flume
• Mahout, R and Applications
• Cloud Services (e.g., AWS)

HPC stack (hardware to applications)

• x86 + GPUs or Accelerators
• IB + Enet Switches
• SAN + Local Storage
• PFS (e.g., Lustre)
• Batch Scheduler
• System Monitoring
• MPI-OpenMP, CUDA/OpenCL
• Perf & Debug (e.g., PAPI)
• NA Libs
• Domain-specific Libraries
• FORTRAN, C, C++ and IDEs
• Applications and Community Codes
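
For readers coming from the HPC side, the Map-Reduce entry in the stack above may be easier to place with a toy example. The sketch below is a single-process illustration of the map, shuffle and reduce phases using word counting. It is not Hadoop code; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are assumptions made up for this illustration.

```python
from collections import defaultdict


def map_phase(document):
    # Map: emit a (key, value) pair for every word in one document.
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    # Shuffle: group all values that share the same key.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    # Reduce: combine the grouped values for each key into one result.
    return {key: sum(values) for key, values in groups.items()}


def word_count(documents):
    # In a real framework the map and reduce calls would run on many nodes;
    # here they run sequentially in one process.
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    return reduce_phase(shuffle(pairs))


if __name__ == "__main__":
    print(word_count(["big data meets HPC", "HPC meets big data"]))
```

The point of the structure is that each map call and each per-key reduce is independent, which is what lets a framework spread them across commodity nodes with local storage.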

Page 11

SAN vs. node-local temporary data

Software-defined networks (SDNs)

Content distribution networks (CDNs)

Page 12

User allocations

Software models

Job scheduling

Page 13

Chaos Monkey

Latency Monkey

Janitor Monkey

Conformity Monkey

Doctor Monkey

Security Monkey
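
Chaos Monkey is the Netflix tool that randomly terminates running instances so that services are forced to tolerate failure. The sketch below is a toy, in-process illustration of that idea only, not Netflix's implementation: workers are killed at random while tasks are being assigned, and affected tasks are retried on survivors. The function name `run_with_chaos`, its parameters, and the rule that the last worker is never killed are all assumptions made for this example.

```python
import random


def run_with_chaos(tasks, workers, kill_probability, seed=0):
    """Assign each task to a worker while randomly killing workers.

    Returns (results, killed): results maps task -> worker that completed it,
    killed lists the workers terminated by the chaos step.
    """
    rng = random.Random(seed)  # seeded so a run can be repeated
    alive = set(workers)
    results = {}
    killed = []
    for task in tasks:
        while True:
            if not alive:
                raise RuntimeError("no workers left")
            worker = rng.choice(sorted(alive))
            # Chaos step: the chosen worker may be terminated mid-task.
            # Assumption: the last surviving worker is never killed.
            if len(alive) > 1 and rng.random() < kill_probability:
                alive.discard(worker)
                killed.append(worker)
                continue  # retry the same task on a surviving worker
            results[task] = worker
            break
    return results, killed


if __name__ == "__main__":
    done, dead = run_with_chaos(range(20), ["n1", "n2", "n3", "n4"], 0.3)
    print(len(done), "tasks completed;", "killed:", dead)
```

The contrast with a traditional batch job is that the work completes despite the failures, instead of the whole job aborting when one node disappears.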

Page 14

Virtual machines (VMs)

Containers

Implications

Page 15

Unstructured

Semi-structured

Next-generation architecture

(mostly open source and cloud based)

Page 16

Known questions, the traditional approach

Unknown questions, the big data approach

What information consumes is rather obvious: it consumes the attention of its recipients. Hence a wealth of information creates a poverty of attention, and a need to allocate that attention efficiently among the overabundance of information sources that might consume it.

Herbert Simon

Page 17

Item hierarchy (Amazon)

Attributes (Pandora)

Item similarity (Netflix)

User similarity (Walmart)

Social network (LinkedIn)

Model based (HPC challenges and needs)
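
As one concrete illustration of the item-similarity approach listed above, the sketch below scores items by the cosine similarity of their rating columns. It is a minimal example under stated assumptions, not a description of any company's system: the ratings are made-up toy data, missing ratings are treated as 0, and the helper names `cosine` and `most_similar_item` are hypothetical.

```python
import math


def cosine(u, v):
    # Cosine similarity of two equal-length vectors (0.0 if either is all zeros).
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def most_similar_item(ratings, item):
    """ratings maps user -> {item: rating}; returns (best_item, all_scores)."""
    users = sorted(ratings)
    items = sorted({i for user_ratings in ratings.values() for i in user_ratings})

    def column(i):
        # One item's ratings across all users; unrated counts as 0 (assumption).
        return [ratings[u].get(i, 0.0) for u in users]

    target = column(item)
    scores = {i: cosine(target, column(i)) for i in items if i != item}
    return max(scores, key=scores.get), scores


if __name__ == "__main__":
    toy = {
        "ann": {"A": 5, "B": 4},
        "bob": {"A": 4, "B": 5, "C": 1},
        "cat": {"C": 5},
    }
    best, scores = most_similar_item(toy, "A")
    print(best, scores)
```

User similarity follows the same pattern with rows and columns swapped: compare users by the items they rated, then recommend what similar users liked.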

Page 18

Clustering (grouping similar items)

Hidden Markov models (HMMs)

Blind signal separation/feature extraction for dimensionality reduction

Artificial neural networks (ANNs)
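
To make the first technique in this list concrete, here is a minimal k-means sketch, one common way of grouping similar items. The slide does not name a specific clustering algorithm, so k-means is an assumption chosen for illustration, as are the naive initialization (the first k points) and the fixed iteration count.

```python
def kmeans(points, k, iterations=10):
    """Group points (tuples of numbers) into k clusters.

    Returns (centers, clusters) where clusters[c] lists the points assigned
    to centers[c] during the last assignment pass.
    """
    centers = [points[i] for i in range(k)]  # naive init: first k points
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])),
            )
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its members.
        for c, members in enumerate(clusters):
            if members:
                centers[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return centers, clusters


if __name__ == "__main__":
    pts = [(0, 0), (10, 10), (0, 1), (1, 0), (10, 11), (11, 10)]
    centers, clusters = kmeans(pts, 2)
    print(centers)
    print(clusters)
```

The assignment step is independent per point, which is why clustering at scale maps naturally onto both data-parallel analytics frameworks and traditional HPC parallelism.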

Page 19

Materials, Astronomy, Sensing, Network science, IoT, Bioscience, Environment, Social media

Scheduling; Monitoring & adaptation; VMs, containers & stacks; Storage & file systems; Parallel algorithms; Libraries & tools; DSLs; Multicore/GPUs; Networks & QoS

Data Analytics Ecosystem: Spark, Mahout/MLlib, R, Python, Storm, PIG, HBase, Zookeeper

HPC Ecosystem: MPI, Lustre, SLURM, OpenMP, PAPI, OpenCL

Graphs, Streaming, Deep learning, Core machine learning

Page 20

Page 21

Discussion