Introduction HPC Devices File Systems Parallel FSs Libraries Convergence Summary References

Convergence of High Performance Computing and Big Data

Michael Kuhn

Scientific Computing, Department of Informatics

Universität Hamburg
https://wr.informatik.uni-hamburg.de

2018-05-23

Michael Kuhn Convergence of High Performance Computing and Big Data 1 / 53


About Us: Scientific Computing

Analysis of parallel I/O
I/O & energy tracing tools
Middleware optimization

Alternative I/O interfaces
Data reduction techniques
Cost & energy efficiency

We are an Intel Parallel Computing Center for Lustre (“Enhanced Adaptive Compression in Lustre”)


1 Introduction and Motivation

2 High Performance Computing

3 Storage Devices and Arrays

4 File Systems

5 Parallel Distributed File Systems

6 Libraries

7 Convergence of HPC and Big Data

8 Summary

9 References


Command Line Meets Big Data

Big Data tools may not always be the best suited for a problem [1]
Use MapReduce to compute win/loss ratios of chess games
  Archive has a size of 1.75 GB and contains two million chess games
  MapReduce job took roughly 26 minutes (which equals a throughput of 1.14 MB/s)
  Archive has a size of 3.46 GB
  Command line tools can perform the same job in roughly 12 seconds (270 MB/s)

    find . -type f -name '*.pgn' -print0 |
      xargs -0 -n4 -P4 mawk '/Result/ {
          split($0, a, "-"); res = substr(a[1], length(a[1]), 1);
          if (res == 1) white++; if (res == 0) black++; if (res == 2) draw++
        } END { print white+black+draw, white, black, draw }' |
      mawk '{ games += $1; white += $2; black += $3; draw += $4; }
        END { print games, white, black, draw }'
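The counting step of the pipeline can also be sketched in Python. This is an illustration only and makes no claim about performance; the function name `tally_results` and the sample lines are assumptions. Like the mawk script, it looks only at lines containing `Result` and inspects the character just before the first `-`.

```python
from collections import Counter


def tally_results(lines):
    """Count game outcomes from PGN lines such as '[Result "1-0"]'."""
    counts = Counter(white=0, black=0, draw=0)
    for line in lines:
        if "Result" not in line:
            continue
        # Same trick as the mawk script: the character just before the
        # first '-' is 1 (white wins), 0 (black wins) or 2 (from "1/2", draw).
        left = line.split("-", 1)[0]
        outcome = {"1": "white", "0": "black", "2": "draw"}.get(left[-1:])
        if outcome:
            counts[outcome] += 1
    return counts


# Hypothetical sample input, not taken from the archive used in [1].
sample = [
    '[Event "Test"]',
    '[Result "1-0"]',
    '[Result "0-1"]',
    '[Result "1/2-1/2"]',
    '[Result "1-0"]',
]
totals = tally_results(sample)
print(sum(totals.values()), totals["white"], totals["black"], totals["draw"])
```

The shell pipeline gets its speed from streaming and from running four mawk processes in parallel (`-P4`); this sketch only mirrors the per-line logic.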


Software Ecosystems [4, 2]

2 High Performance Computing

Architectures

Until 2005: performance increase via clock rate
Since 2005: performance increase via core count
  Clock rate cannot be increased further
  Energy consumption/heat dissipation depends on clock rate
  The largest supercomputers have more than 10,000,000 cores
Categorization using memory connection
  Shared and distributed memory
  Hybrid systems are commonly found in practice


Threads vs. Processes

Threads share one common address space
  Communication using shared memory segments
  If one thread crashes, all of them crash
Processes have their own address spaces
  Communication usually happens via messages
  Overhead is typically higher than for shared memory
Hybrid approaches in practice
  A few processes per node (for example, one per socket)
  Many threads per process (for example, one per core)
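The difference between the two communication styles can be sketched with Python's standard library. This is a minimal illustration, not how an HPC code would be written: the names `shared`, `worker` and `child_code` are assumptions, and the child process is simply a second Python interpreter that receives its input as a text message over a pipe.

```python
import subprocess
import sys
import threading

shared = []  # lives in the one address space that all threads share


def worker(value):
    # A thread can modify the parent's data directly.
    shared.append(value)


threads = [threading.Thread(target=worker, args=(i,)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# A child process has its own address space, so data is exchanged as a
# message: here, text sent over pipes to a second Python interpreter.
child_code = "import sys; print(sum(int(x) for x in sys.stdin.read().split()))"
result = subprocess.run(
    [sys.executable, "-c", child_code],
    input="0 1 2 3",
    capture_output=True,
    text=True,
    check=True,
)
process_sum = int(result.stdout)

print(sorted(shared), process_sum)
```

The thread version needs no copying, while the process version has to serialize the data into a message and parse the reply, which is where the higher overhead comes from.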


Parallelization

Parallel applications run on multiple nodes
  Independent processes, communication via messages
  OpenMP threads within processes
MPI provides communication operations
  MPI is the de facto standard
  Process groups, synchronization, communication, reduction etc.
  Point-to-point and collective communication
MPI also supports input/output
  Parallel I/O for shared files


Latencies

Level       Latency
L1 cache    ≈ 1 ns
L2 cache    ≈ 5 ns
L3 cache    ≈ 10 ns
RAM         ≈ 100 ns
InfiniBand  ≈ 500 ns
Ethernet    ≈ 100,000 ns
SSD         ≈ 100,000 ns
HDD         ≈ 10,000,000 ns

Table: Latencies [5, 3]

Processors require data fast
  Caches should be used optimally
Additional latency due to I/O and network
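To get a feeling for the gaps between the levels, the approximate values from the table can be put in relation to each other. The following is a small sketch; the dictionary `LATENCY_NS` and the helper `relative_to` are illustrative names, and the numbers are the rounded values from the table, not measurements.

```python
# Approximate latencies in nanoseconds, copied from the table above.
LATENCY_NS = {
    "L1 cache": 1,
    "L2 cache": 5,
    "L3 cache": 10,
    "RAM": 100,
    "InfiniBand": 500,
    "Ethernet": 100_000,
    "SSD": 100_000,
    "HDD": 10_000_000,
}


def relative_to(level, baseline="L1 cache"):
    """How many baseline accesses fit into one access to `level`."""
    return LATENCY_NS[level] // LATENCY_NS[baseline]


for level in LATENCY_NS:
    print(f"{level:<10} {relative_to(level):>10,} x L1")
```

With these values, a single HDD access costs as much time as roughly 100,000 RAM accesses, which is why a processor waiting on I/O sits idle for a long time.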


Performance and Energy Efficiency

Supercomputers are very complex
  Performance yield is typically not optimal
  Interaction of many different components
    Processors, caches, main memory, network, storage system
  Performance analysis is an important topic
Supercomputers are very expensive
  Should be used as efficiently as possible
  Procurement costs: € 40,000,000–250,000,000
  Operating costs (per year):
    Sunway TaihuLight (rank 1): 15.4 MW ≈ € 15,400,000 (in Germany)
    DKRZ (rank 42): 1.1 MW ≈ € 1,100,000
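The operating cost figures correspond to roughly one million euros per megawatt and year. As a rough check, the sketch below derives the electricity price this implies, assuming a constant power draw over the whole year; the function names and the constant-load assumption are illustrative and not part of the slides.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours, ignoring leap years


def implied_price_per_kwh(power_mw, yearly_cost_eur):
    """Electricity price (EUR/kWh) implied by a power draw and a yearly bill."""
    energy_kwh = power_mw * 1000 * HOURS_PER_YEAR
    return yearly_cost_eur / energy_kwh


def yearly_cost(power_mw, price_per_kwh):
    """Yearly electricity cost in EUR for a constant power draw."""
    return power_mw * 1000 * HOURS_PER_YEAR * price_per_kwh


price = implied_price_per_kwh(15.4, 15_400_000)
print(f"implied price: {price:.3f} EUR/kWh")
print(f"DKRZ at that price: {yearly_cost(1.1, price):,.0f} EUR/year")
```

Under these assumptions the implied price is a little over 0.11 EUR per kWh, and applying it to 1.1 MW reproduces the DKRZ figure.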


I/O Layers

Figure: I/O stack with the layers Application, Libraries, Parallel distributed file system, File system and Storage device/array; side labels: Performance analysis, Optimizations, Data reduction

I/O is often responsible for performance problems
  High latency causes idle processors
  I/O is often still serial
I/O stack is layered
  Many different components are involved in storing data
  An unoptimized layer can significantly decrease performance

3 Storage Devices and Arrays

Hard Disk Drives

First HDD: 1956
  IBM 350 RAMAC (3.75 MB, 8.8 KB/s, 1,200 RPM)
HDD development
  Capacity: factor of 100 every 10 years
  Throughput: factor of 10 every 10 years

Figure: HDD development [12]
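Because capacity grows faster than throughput, the time needed to read a whole drive keeps increasing. The sketch below works this out for the RAMAC and for the growth factors named above; it assumes decimal units (1 MB = 10^6 bytes) and treats the growth factors as rules of thumb, and both function names are illustrative.

```python
def full_read_seconds(capacity_bytes, throughput_bytes_per_s):
    """Time to read a whole device once at its sequential throughput."""
    return capacity_bytes / throughput_bytes_per_s


def scaled_read_seconds(base_seconds, decades, capacity_factor=100, throughput_factor=10):
    """Full-read time after `decades`, using the per-decade growth factors above."""
    return base_seconds * (capacity_factor / throughput_factor) ** decades


ramac = full_read_seconds(3.75e6, 8.8e3)  # IBM 350 RAMAC: 3.75 MB at 8.8 KB/s
print(f"RAMAC full read: {ramac / 60:.1f} min")
print(f"growth per decade: {scaled_read_seconds(1.0, 1):.0f}x")
```

The RAMAC could be read completely in about seven minutes; with capacity growing by 100 and throughput by 10 per decade, the full-read time grows by a factor of 10 per decade.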


Solid-State Drives

Benefits
  Read/write throughput: factor of 10–20
  Latency: factor of 100
  Energy consumption: factor of 1–10
Drawbacks
  Price: factor of 7–10
  Write cycles: 10,000–100,000
  Complexity
    Different optimal access sizes for reads and writes
    Address translation, thermal issues etc.


RAID

RAID
  RAID 0: striping
  RAID 1: mirroring
  RAID 2/3: bit/byte striping
  RAID 4: block striping
  RAID 5/6: block striping
Failures
  HDDs usually have roughly the same age
  Fabrication defects within same batch
Reconstruction
  Read errors on other HDDs
  Duration (30 min in 2004, 11 h in 2017)
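The parity-based levels combine block striping with redundancy; RAID 5 uses XOR parity for this. The following sketch shows why reconstruction has to read all surviving drives. It is a toy model with in-memory blocks, and the names `xor_blocks` and `reconstruct` are assumptions, not part of any RAID implementation.

```python
from functools import reduce


def xor_blocks(blocks):
    """Bytewise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def reconstruct(surviving_blocks, parity):
    """Rebuild the single missing data block of a stripe from the rest."""
    return xor_blocks(surviving_blocks + [parity])


stripe = [b"AAAA", b"BBBB", b"CCCC"]  # data blocks on three drives
parity = xor_blocks(stripe)           # parity block on a fourth drive

lost = stripe[1]                      # pretend the second drive failed
rebuilt = reconstruct([stripe[0], stripe[2]], parity)
print(rebuilt == lost)
```

Every block of every surviving drive takes part in the rebuild, so a single read error on another drive during reconstruction is enough to cause trouble, and the rebuild time grows with drive capacity.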


RAID. . . [10]

4 File Systems

Tasks

Structure
  Typically files and directories
  Hierarchical organization
  Other approaches: tagging
Management of data and metadata
  Block allocation
  Access permissions, timestamps etc.
File systems use underlying storage devices or arrays
  Logical Volume Manager (LVM) and/or mdadm


I/O Interfaces

Requests are realized through I/O interfaces
  Forwarded to the file system
Different abstraction levels
  Low-level functionality: POSIX, MPI-IO etc.
  High-level functionality: HDF, NetCDF etc.

    fd = open("/path/to/file", O_RDWR | O_CREAT | O_TRUNC, S_IRUSR | S_IWUSR);
    nb = write(fd, data, sizeof(data));
    rv = close(fd);
    rv = unlink("/path/to/file");

Initial access via path
  Afterwards access via file descriptor (few exceptions)
Functions are located in libc
  Library executes system calls
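The same sequence can be tried out from Python, whose `os` module exposes thin wrappers around these low-level calls. This is a minimal sketch; the temporary directory and the file name `example.dat` are assumptions chosen so that the example does not touch any real data.

```python
import os
import tempfile

data = b"hello, file system\n"

# The path is used only for the initial open (and for unlink at the end).
path = os.path.join(tempfile.mkdtemp(), "example.dat")
fd = os.open(path, os.O_RDWR | os.O_CREAT | os.O_TRUNC, 0o600)

# All further access goes through the file descriptor.
written = os.write(fd, data)
os.lseek(fd, 0, os.SEEK_SET)
read_back = os.read(fd, len(data))
os.close(fd)

os.unlink(path)
os.rmdir(os.path.dirname(path))
print(written, read_back == data)
```

The mode `0o600` corresponds to `S_IRUSR | S_IWUSR` in the C snippet.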


Virtual File System (Switch)

Central file system component in the kernel
  Sets file system structure and interface
Forwards applications’ requests based on mount point
  Enables supporting multiple different file systems
Applications are still portable due to POSIX
POSIX: standardized interface for all file systems
  Syntax
    open, close, creat, read, write, lseek, chmod, chown, stat etc.
  Semantics
    write: “POSIX requires that a read(2) which can be proved to occur after a write() has returned returns the new data. Note that not all filesystems are POSIX conforming.”


Virtual File System (Switch). . . [11]

Figure: The Linux Storage Stack Diagram (http://www.thomas-krenn.com/en/wiki/Linux_Storage_Stack_Diagram), created by Werner Fischer and Georg Schönberger, license: CC-BY-SA 3.0, see http://creativecommons.org/licenses/by-sa/3.0/. Labels recoverable from the diagram: applications (processes) call read(2), write(2), open(2), stat(2), chmod(2) etc. on the VFS; file systems below it include block-based ones (ext2, ext3, ext4, xfs, btrfs, iso9660, ...), network file systems (NFS, coda, smbfs, ceph, ...), pseudo file systems (proc, sysfs, pipefs, futexfs, usbfs, ...) and special-purpose ones (tmpfs, ramfs, devtmpfs); further down come the page cache, direct I/O (O_DIRECT), BIOs (block I/O) and optional stackable devices on top of “normal” block devices such as mdraid, LVM, device mapper (dm-crypt, dm-mirror, dm-thin, dm-cache), drbd and bcache.


File System Objects

User vs. system view
  Users see files and directories
  System manages inodes
    Relevant for stat etc.
Files
  Contain data as byte arrays
  Can be read and written (explicit)
  Can be mapped to memory (implicit)
Directories
  Contain files and directories
  Structure the namespace
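The system view and the two ways of accessing file data can be sketched as follows. The example uses a temporary file and Python's standard `os` and `mmap` modules; the variable names are illustrative.

```python
import mmap
import os
import tempfile

with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"0123456789")  # explicit write
    path = tmp.name

info = os.stat(path)  # metadata kept in the inode
print(info.st_ino, info.st_size)

with open(path, "r+b") as f:
    explicit = f.read(4)  # explicit read
    with mmap.mmap(f.fileno(), 0) as view:
        view[0:4] = b"abcd"  # implicit I/O: a memory store changes the file
        view.flush()

with open(path, "rb") as f:
    after = f.read()
os.unlink(path)
print(explicit, after)
```

With the mapping, no read or write call appears in the code that changes the data; the kernel transfers the modified pages to the file.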


Modern File Systems

File system demands are growing
  Data integrity, storage management, convenience functionality
Error rate for SATA HDDs: 1 in 10^14 to 10^15 bits [9]
  That is, one bit error per 12.5–125 TB
  Additional bit errors in RAM, controller, cable, driver etc.
Error rate can be problematic
  Amount can be reached in daily use
  Bit errors can occur in the superblock
File system does not have knowledge about storage array
  Knowledge is important for performance
  For example, special options for ext4
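The conversion from the bit error rate to a data volume is a simple division by eight. The sketch below reproduces the 12.5–125 TB range and estimates the expected number of errors for a given amount of data; it assumes decimal terabytes, and the function names and the 100 TB example volume are illustrative.

```python
def bytes_per_bit_error(bits_per_error):
    """Average amount of data (in bytes) read per unrecoverable bit error."""
    return bits_per_error / 8


def expected_errors(bytes_read, bits_per_error):
    """Expected number of bit errors when reading `bytes_read` bytes."""
    return bytes_read * 8 / bits_per_error


TB = 10**12
print(bytes_per_bit_error(10**14) / TB)  # lower bound of the range in TB
print(bytes_per_bit_error(10**15) / TB)  # upper bound of the range in TB
print(expected_errors(100 * TB, 10**14))
```

A storage system that moves 100 TB at the pessimistic rate should therefore expect several bit errors, which is why file-system-level checksums are attractive.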

5 Parallel Distributed File Systems

Definition

Parallel file systems
  Allow parallel access to shared resources
  Access should be as efficient as possible
Distributed file systems
  Data and metadata are distributed across multiple servers
  Single servers do not have a complete view
Naming is inconsistent
  Often just “parallel file system” or “cluster file system”

Figure labels: Clients, Server, File, Data


Architecture

Figure labels: Clients, Network, Server

Access to file system via I/O interface
  Typically standardized, frequently POSIX
Interface consists of syntax and semantics
  Syntax defines operations, semantics defines behavior


Semantics

POSIX has strong consistency/coherence requirements
  Changes have to be visible globally after write
  I/O should be atomic
POSIX for local file systems
  Requirements easy to support due to VFS
Contrast: Network File System (NFS)
  Same syntax, different semantics
Session semantics
  Changes only visible to other clients after session ends
  close writes changes and returns potential errors
MPI-IO has been designed for parallel I/O
  Less strict for higher scalability


Architecture

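Figure labels: Clients, Network, MDSs (metadata servers) with MDTs (metadata targets), OSSs (object storage servers) with OSTs (object storage targets)

Separate servers for data and metadata
  Produce different access patterns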


Architecture. . .

Figure labels: Blocks (B0 to B7), Stripes, Server

File is split into blocks, distributed across servers
  In this case, with a round-robin distribution
Distribution does not have to start at first server
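A round-robin distribution with a configurable first server can be sketched as a small mapping function. This is a generic illustration of the idea, not the layout code of Lustre or any other file system; `locate`, `stripe_size`, `num_servers` and `first_server` are assumed names.

```python
def locate(offset, stripe_size, num_servers, first_server=0):
    """Map a byte offset in the file to (server, offset in that server's object)."""
    block = offset // stripe_size                  # index of the block in the file
    server = (first_server + block) % num_servers  # round-robin placement
    local_block = block // num_servers             # blocks already on that server
    return server, local_block * stripe_size + offset % stripe_size


# Eight blocks (B0..B7) of 4 bytes over four servers, starting at server 1.
placement = [locate(b * 4, 4, 4, first_server=1)[0] for b in range(8)]
print(placement)
```

Choosing a different first server per file spreads small files evenly instead of always loading the first server.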


Performance

Parallel distributed file systems enable massive high performance storage systems
Blizzard (DKRZ, GPFS)
  Capacity: 7 PB
  Throughput: 30 GB/s
Mistral (DKRZ, Lustre)
  Capacity: 54 PiB
  Throughput: 450 GB/s (5.9 GB/s per node)
  IOPS: 80,000 operations/s (phase 1 only)
Titan (ORNL, Lustre)
  Capacity: 40 PB
  Throughput: 1.4 TB/s

6 Libraries

Overview

POSIX and MPI-IO can be used for parallel I/O
  Both interfaces are not very convenient for developers
Problems
  Exchangeability of data, complex programming, performance
Libraries offer additional functionality
  Self-describing data, internal structuring, abstract I/O
Alleviating existing problems
  SIONlib (performance)
  NetCDF, HDF (exchangeability)
  ADIOS (abstract I/O)


SIONlib

Mainly exists to circumvent deficiencies in existing file systems
  On the one hand, problems with many files
    Low metadata performance but high data performance
  On the other hand, shared file access also problematic
    POSIX requires locks, access pattern very important
Offers efficient access to process-local files
  Accesses are mapped to one or a few physical files
  Aligned to file system blocks/stripes
Backwards-compatible and convenient to use
  Wrappers for fread and fwrite
  Opening and closing via special functions
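The idea of mapping process-local files into one physical file with block alignment can be sketched as an offset calculation. This is only an illustration of the principle under simplifying assumptions; it is not SIONlib's file format or API, and `aligned_chunk_starts` is a hypothetical helper.

```python
def aligned_chunk_starts(chunk_sizes, block_size):
    """Start offset of each process's chunk, rounded up to a block boundary."""
    starts, offset = [], 0
    for size in chunk_sizes:
        starts.append(offset)
        # Round the end of this chunk up to the next multiple of block_size.
        offset += -(-size // block_size) * block_size
    return starts


# Three processes request 100, 5000 and 4096 bytes; blocks are 4096 bytes.
print(aligned_chunk_starts([100, 5000, 4096], 4096))
```

Because every chunk starts on a block boundary, no two processes write into the same file system block, while the file system only has to manage a single physical file.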


NetCDF

Developed by Unidata Program Center
  University Corporation for Atmospheric Research
Mainly used for scientific applications
  Especially in climate science, meteorology and oceanography
Consists of libraries and data formats
  1 Classic format (CDF-1)
  2 Classic format with 64 bit offsets (CDF-2)
  3 NetCDF-4 format
Data formats are open standards
  Classic formats are international standards of the Open Geospatial Consortium


HDF

Consists of libraries and data formats
  Allows managing self-describing data
Current version is HDF5
  HDF4 is still actively supported
Problems with previous versions
  Complex API, limitations regarding 32 bit addressing
Used as base for many higher-level libraries
  NetCDF-4 (climate science)
  NeXus (neutron, x-ray and muon science)


ADIOS

ADIOS is heavily abstracted
  No byte- or element-based access
  Direct support for application data structures
Designed for high performance
  Mainly for scientific applications
  Caching, aggregation, transformation etc.
I/O configuration is specified via an XML file
  Describes relevant data structures
  Can be used to generate code automatically
Developers specify I/O on a high abstraction level
  No contact to middleware or file system


ADIOS...

    <adios-config host-language="C">
      <adios-group name="checkpoint">
        <var name="rows" type="integer"/>
        <var name="columns" type="integer"/>
        <var name="matrix" type="double" dimensions="rows,columns"/>
      </adios-group>
      <method group="checkpoint" method="MPI"/>
      <buffer size-MB="100" allocate-time="now"/>
    </adios-config>

Data is combined in groups
I/O methods can be specified per group
Buffer sizes etc. can be configured
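
Because the configuration describes the data structures, a tool can derive facts about the I/O without looking at the application code. The following Python sketch reads a configuration of this shape with the standard `xml.etree.ElementTree` module and computes the payload size of one group. It is not part of ADIOS. The function `group_payload_bytes` and the element sizes in `TYPE_SIZES` (4 bytes per `integer`, 8 bytes per `double`) are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

CONFIG = """<adios-config host-language="C">
  <adios-group name="checkpoint">
    <var name="rows" type="integer"/>
    <var name="columns" type="integer"/>
    <var name="matrix" type="double" dimensions="rows,columns"/>
  </adios-group>
  <method group="checkpoint" method="MPI"/>
  <buffer size-MB="100" allocate-time="now"/>
</adios-config>"""

# Assumed element sizes in bytes; a real implementation has its own type table.
TYPE_SIZES = {"integer": 4, "double": 8}


def group_payload_bytes(config_xml, group_name, scalars):
    """Compute the payload size of one group from its variable descriptions."""
    root = ET.fromstring(config_xml)
    total = 0
    for group in root.findall("adios-group"):
        if group.get("name") != group_name:
            continue
        for var in group.findall("var"):
            count = 1
            dims = var.get("dimensions")
            if dims:
                # Dimensions refer to scalar variables of the group by name.
                for dim in dims.split(","):
                    count *= scalars[dim.strip()]
            total += count * TYPE_SIZES[var.get("type")]
    return total


# Hypothetical run-time values for the two scalar variables.
size = group_payload_bytes(CONFIG, "checkpoint", {"rows": 100, "columns": 200})
```

With 100 rows and 200 columns, the group holds two 4-byte integers plus 20,000 doubles, so `size` is 160,008 bytes.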

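Interaction...

[Figure: I/O stack of a parallel application. Library labels: ADIOS, NetCDF, HDF, SIONlib, MPI-IO, POSIX. Example path labels: Parallel application, NetCDF, HDF5, MPI-IO, ADIO (userspace); Lustre, ldiskfs (kernelspace); Storage device/array.]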

Data transformation
    Transport through all layers
    Loss of information

Complex interaction
    Optimizations and workarounds on all layers
    Information about other layers
    Analysis is complex

Convenience vs. performance
    Structured data in application
    Byte stream in POSIX
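
The loss of information between structured application data and the POSIX byte stream can be illustrated with a short Python sketch. The two matrices are made-up example data and `to_byte_stream` is a hypothetical helper. It only mimics what the lowest layer gets to see and does not call any I/O library.

```python
import struct


def to_byte_stream(matrix):
    """Flatten a row-major matrix of doubles into the bytes a write() would see."""
    flat = [value for row in matrix for value in row]
    return struct.pack("<%dd" % len(flat), *flat)


wide = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]    # 2 x 3 matrix
tall = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]  # 3 x 2 matrix

wide_bytes = to_byte_stream(wide)
tall_bytes = to_byte_stream(tall)

# Both matrices yield the same 48 bytes, so the shape cannot be
# recovered from the byte stream alone.
same_stream = wide_bytes == tall_bytes
```

Since the lower layers only see the 48 bytes, they cannot tell the two matrices apart, which is why optimizations there often need information from other layers.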

Software Ecosystems [4, 2]

Software Ecosystems...

The importance of Big Data is growing
    Performance is often not optimal due to Ethernet, HTTP etc.

Hadoop usually makes use of HDFS
    Data is copied to local storage devices
    Communication via HTTP

HPC frameworks are starting to support Big Data applications
    Lustre, OrangeFS etc.

Interfaces [8]

Big Data...

Concepts from the Big Data and cloud environments can be applied in HPC
    Elasticity to dynamically adapt the file system
    For example, to add servers during high load situations

Object stores provide efficient storage abstractions
    Many applications do not need POSIX file systems
    MPI-IO functionality can be mapped to object stores

Investigated as part of the BigStorage project
    Example: Týr is a blob storage system with support for transactions [7]
    For more information, see http://bigstorage-project.eu
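
One way to picture how offset-based access (in the spirit of MPI-IO calls such as `MPI_File_write_at`) might be mapped to an object store is the Python sketch below. It is an illustration under stated assumptions, not the approach of Týr or of any particular MPI-IO implementation: a plain `dict` stands in for the object store, and `OBJECT_SIZE`, the `(file_name, index)` key scheme, `write_at` and `read_at` are all hypothetical.

```python
OBJECT_SIZE = 8  # bytes per object; deliberately tiny for illustration


def write_at(store, file_name, offset, data):
    """Split a write at a byte offset into updates of fixed-size objects."""
    pos = 0
    while pos < len(data):
        index, start = divmod(offset + pos, OBJECT_SIZE)
        chunk = data[pos:pos + OBJECT_SIZE - start]
        key = (file_name, index)
        # Missing objects are treated as zero-filled.
        obj = bytearray(store.get(key, bytes(OBJECT_SIZE)))
        obj[start:start + len(chunk)] = chunk
        store[key] = bytes(obj)
        pos += len(chunk)


def read_at(store, file_name, offset, length):
    """Assemble a byte range from the objects that cover it."""
    out = bytearray()
    while len(out) < length:
        index, start = divmod(offset + len(out), OBJECT_SIZE)
        obj = store.get((file_name, index), bytes(OBJECT_SIZE))
        take = min(OBJECT_SIZE - start, length - len(out))
        out += obj[start:start + take]
    return bytes(out)


store = {}  # stand-in for an object store: key -> immutable blob
write_at(store, "checkpoint", 5, b"HELLOWORLD")  # spans objects 0 and 1
result = read_at(store, "checkpoint", 5, 10)
```

The byte range is covered by two independent objects, so no hierarchical namespace or other POSIX semantics are needed to serve the access.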

Big Data... [6]

Summary

Achieving high performance I/O is a complex task
    Many layers: storage devices, file systems, libraries etc.

File systems organize data and metadata
    Modern file systems provide additional functionality

Parallel distributed file systems allow efficient access
    Data is distributed across multiple servers

I/O libraries facilitate ease of use
    Exchangeability of data is an important factor

HPC and Big Data technologies are converging
    Enable more versatile and efficient systems

References

[1] Adam Drake. Command-line Tools can be 235x Faster than your Hadoop Cluster. https://adamdrake.com/command-line-tools-can-be-235x-faster-than-your-hadoop-cluster.html.

[2] J.-C. Andre, G. Antoniu, M. Asch, R. Badia Sala, M. Beck, P. Beckman, T. Bidot, F. Bodin, F. Cappello, A. Choudhary, B. de Supinski, E. Deelman, J. Dongarra, A. Dubey, G. Fox, H. Fu, S. Girona, W. Gropp, M. Heroux, Y. Ishikawa, K. Keahey, D. Keyes, W. Kramer, J.-F. Lavignon, Y. Lu, S. Matsuoka, B. Mohr, T. Moore, D. Reed, S. Requena, J. Saltz, T. Schulthess, R. Stevens, M. Swany, A. Szalay, W. Tang, G. Varoquaux, J.-P. Vilotte, R. Wisniewski, Z. Xu, and I. Zacharov. Big Data and Extreme-Scale Computing: Pathways to Convergence. Technical report, EXDCI Project of EU-H2020 Program and University of Tennessee, January 2018. http://www.exascale.org/bdec/sites/www.exascale.org.bdec/files/whitepapers/bdec2017pathways.pdf.

[3] Jian Huang, Karsten Schwan, and Moinuddin K. Qureshi. NVRAM-aware logging in transaction systems. Proc. VLDB Endow., 8(4):389–400, December 2014.

[4] John Russell. New Blueprint for Converging HPC, Big Data. https://www.hpcwire.com/2018/01/18/new-blueprint-converging-hpc-big-data/.

[5] Jonas Bonér. Latency Numbers Every Programmer Should Know. https://gist.github.com/jboner/2841832.

[6] Pierre Matri, Yevhen Alforov, Alvaro Brandon, Michael Kuhn, Philip H. Carns, and Thomas Ludwig. Could blobs fuel storage-based convergence between HPC and big data? In 2017 IEEE International Conference on Cluster Computing, CLUSTER 2017, Honolulu, HI, USA, September 5-8, 2017, pages 81–86, 2017.

[7] Pierre Matri, Alexandru Costan, Gabriel Antoniu, Jesús Montes, and María S. Pérez. Týr: blob storage meets built-in transactions. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2016, Salt Lake City, UT, USA, November 13-18, 2016, pages 573–584, 2016.

[8] OrangeFS Development Team. OrangeFS Documentation. http://docs.orangefs.com/.

[9] Seagate. Desktop HDD. http://www.seagate.com/www-content/datasheets/pdfs/desktop-hdd-8tbDS1770-9-1603DE-de_DE.pdf.

[10] Veera Deenadhayalan. General Parallel File System (GPFS) Native RAID. https://www.usenix.org/legacy/events/lisa11/tech/slides/deenadhayalan.pdf.

[11] Werner Fischer and Georg Schönberger. Linux Storage Stack Diagramm. https://www.thomas-krenn.com/de/wiki/Linux_Storage_Stack_Diagramm.

[12] Wikipedia. Hard disk drive. http://en.wikipedia.org/wiki/Hard_disk_drive.
