Easy and Instantaneous Processing for Data-Intensive Workflows


Page 1: Easy and Instantaneous Processing for Data-Intensive Workflows

November 15th, 2010 MTAGS 2010 New Orleans, USA

Easy and Instantaneous Processing for Data-Intensive Workflows
Nan Dun, Kenjiro Taura, and Akinori Yonezawa
Graduate School of Information Science and Technology
The University of Tokyo
Contact Email: [email protected]

Page 2: Easy and Instantaneous Processing for Data-Intensive Workflows

Background

✤ More computing resources

✤ Desktops, clusters, clouds, and supercomputers

✤ More data-intensive applications

✤ Astronomy, bio-genome, medical science, etc.

✤ More domain researchers using distributed computing

✤ Know their workflows well, but have little systems knowledge


Page 3: Easy and Instantaneous Processing for Data-Intensive Workflows

Motivation

Domain Researchers:
• Able to use resources provided by universities, institutions, etc.
• Know their apps well
• Know little about systems, esp. distributed systems

System People:
• Only able to control the resources they administer
• Know the system well
• Know little about domain apps

Domain researcher: "Would you please help me run my applications?"
System person: "OK, but teach me about your applications first."
Our answer: "Actually, you can do it by yourself on any machine!"

Page 4: Easy and Instantaneous Processing for Data-Intensive Workflows

Outline

✤ Brief description of our processing framework

✤ GXP parallel/distributed shell

✤ GMount distributed file system

✤ GXP Make workflow engine

✤ Experiments

✤ Practice on clusters and a supercomputer

✤ From the viewpoint of underlying data sharing, since our target is data-intensive applications!


Page 5: Easy and Instantaneous Processing for Data-Intensive Workflows

Usage: How simple it is!

1. Write workflow description in makefile

2. Resource exploration (Start from one single node!)

• $ gxpc use ssh clusterA clusterB
  $ gxpc explore clusterA[[000-020]] clusterB[[100-200]]

3. Deploy distributed file system

• $ gmnt /export/on/each/node /file/system/mountpoint

4. Run workflow

• $ gxpc make -f makefile -j N_parallelism
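For concreteness, here is a minimal sketch of the makefile in step 1 (the part*.in/part*.out names and the ./process command are hypothetical; any GNU Make rules work):

$ cat makefile
# Each part%.out depends only on its own input, so GXP Make
# can dispatch the three tasks to different nodes in parallel.
out.dat: part1.out part2.out part3.out
	cat $^ > $@
part%.out: part%.in
	./process $< > $@

With this makefile, "gxpc make -f makefile -j 3" runs the three independent tasks concurrently, then merges their outputs.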

Page 6: Easy and Instantaneous Processing for Data-Intensive Workflows

GXP Parallel/Distributed Shell

✤ GXP shell magic

✤ Install and start from one single node

✤ Implemented in Python, no compilation

✤ Support various login channels, e.g. SSH, RSH, TORQUE

✤ Efficiently issue commands and invoke processes on many nodes in parallel

[Diagram: a single "$ gxpc e ls" issued at the root node fans out over SSH, RSH, and TORQUE channels and runs "ls" on every node in parallel]
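Putting the pieces together, a typical session might look like this sketch (host names hypothetical; "gxpc e" broadcasts a command, as in the diagram):

$ gxpc use ssh clusterA              # reach clusterA hosts via the SSH channel
$ gxpc explore clusterA[[000-020]]   # log in to nodes clusterA000..020 in parallel
$ gxpc e ls                          # run "ls" on every explored node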

Page 7: Easy and Instantaneous Processing for Data-Intensive Workflows

GMount Distributed File System

✤ Building block: SSHFS-MUX

✤ Mount multiple remote directories onto a local one

✤ SFTP protocol over SSH/Socket channel

✤ Parallel mount

✤ Use the GXP shell to execute mounts on every node

[Diagram: "$ gmnt" runs SSHFS-MUX on all four nodes: node A mounts B, C, and D ("sshfsm B C D"), while B, C, and D each mount A ("sshfsm A")]
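As a sketch of what node A runs in the diagram (export paths hypothetical): SSHFS-MUX takes several host:directory sources and one local mountpoint, so a single command merges the remote directories:

$ sshfsm B:/export C:/export D:/export /file/system/mountpoint   # merge B, C, and D into one local view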

Page 8: Easy and Instantaneous Processing for Data-Intensive Workflows

GMount (Cont.)

8

✤ GMount features esp. for wide-area environments

✤ No centralized servers

✤ Locality-aware file lookup

✤ Efficient when the application has access locality

✤ New files are created locally

[Diagram: four clusters A, B, C, and D connected over the wide area, sharing files through GMount]

Page 9: Easy and Instantaneous Processing for Data-Intensive Workflows


GXP Make Workflow Engine

✤ Fully compatible with GNU Make

✤ Straightforward to write data-oriented applications

✤ Integrated in GXP shell

✤ Practical dispatching throughput in wide-area environments

✤ 62 tasks/sec in InTrigger vs. 56 tasks/sec by Swift+Falkon in TeraGrid

[Diagram: a makefile rule "out.dat: in.dat" whose work is carried out by a.job, b.job, c.job, and d.job; "$ gxpc make" dispatches the jobs to nodes A, B, C, and D, with all files shared by GMount]

Page 10: Easy and Instantaneous Processing for Data-Intensive Workflows

GXP Make (Cont.)

✤ Why is Make good?

✤ Straightforward to write data-oriented applications

✤ Expressive: embarrassingly parallel, MapReduce, etc.

✤ Fault tolerance: continues from the failure point

✤ Easy to debug: “make -n” option

✤ Concurrency control: “make -j” option

✤ Widely used and thus easy to learn
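To make these points concrete, here is a sketch of an embarrassingly parallel, MapReduce-style makefile (the data layout and the ./map and ./reduce commands are hypothetical):

$ cat mapreduce.mk
INPUTS := $(wildcard data/*.xml)
MAPPED := $(INPUTS:.xml=.out)

result: $(MAPPED)        # "reduce" step: runs once every map task has finished
	./reduce $(MAPPED) > result
%.out: %.xml             # "map" step: one independent task per input file
	./map $< > $@

$ gxpc make -f mapreduce.mk -n     # dry run: list the tasks without executing them
$ gxpc make -f mapreduce.mk -j 64  # run up to 64 map tasks concurrently

If a run is interrupted, rerunning the same command skips the .out files that already exist, which is exactly the fault-tolerance point above.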

Page 11: Easy and Instantaneous Processing for Data-Intensive Workflows

Evaluation


✤ Experimental environments

✤ InTrigger multi-cluster platform

✤ 16 clusters, 400 nodes, 1600 CPU cores

✤ Connected by heterogeneous wide-area links

✤ HA8000 cluster system

✤ 512 nodes, 8192 cores

✤ Tightly coupled network

Page 12: Easy and Instantaneous Processing for Data-Intensive Workflows

Evaluation


✤ Benchmark

✤ ParaMark

✤ Parallel metadata I/O benchmark

✤ Real-world application

✤ Event recognition from PubMed database

Pipeline: Medline XML → Text Extraction → Protein Name Recognizer → Enju Parser / Sagae's Dependency Parser → Event Recognizer → Event Structure

Page 13: Easy and Instantaneous Processing for Data-Intensive Workflows

Task Characteristics

[Chart: per-input-file processing time (sec, 0–90) and file size (KB, 0–3,000) across the input files]

Page 14: Easy and Instantaneous Processing for Data-Intensive Workflows

Experiments (Cont.)


✤ Comparison with two other data-sharing approaches

[Diagram: (left) the Gfarm distributed file system, with a metadata server (MDS), data storage servers (DSS), and clients at the master site; (right) the SSHFS-MUX All-to-One mount, where all nodes (N) mount a single master node (M) backed by an NFS RAID volume]

Page 15: Easy and Instantaneous Processing for Data-Intensive Workflows

Summary of Data Sharing

Operations           | NFS            | SSHFS-MUX All-to-One | Gfarm            | GMount
Metadata operations  | Central server | Central server       | Metadata servers | Locality-aware
I/O operations       | Central server | Central server       | Data servers     | Data servers

Page 16: Easy and Instantaneous Processing for Data-Intensive Workflows

Transfer Rate in WAN

[Chart: throughput (MB/sec, 0–20) vs. block size (4 KB–16 MB) in the WAN for Gfarm, SSHFSM (direct), SSHFS, and Iperf]

Page 17: Easy and Instantaneous Processing for Data-Intensive Workflows

Workflow in LAN

✤ Single cluster using NFS and SSHFSM All-to-One, 158 tasks, 72 workers

[Chart: parallelism (0–85) over execution time (0–3,000 sec) for NFS vs. SSHFSM All-to-One]

Page 18: Easy and Instantaneous Processing for Data-Intensive Workflows

Workflow in WAN

✤ 11 clusters using SSHFSM All-to-One, 821 tasks, 584 workers

[Chart: parallelism (0–600) over execution time (0–5,000 sec), plotted for all jobs and for long jobs only]

Long jobs dominate the execution time

Page 19: Easy and Instantaneous Processing for Data-Intensive Workflows

GMount vs. Gfarm

[Charts: (left) aggregate metadata performance, ops/sec (log scale) vs. number of concurrent clients (2–16) for Gfarm in LAN, Gfarm in WAN, and GMount in WAN; (right) aggregate I/O performance, MB/sec (log scale) vs. number of concurrent clients for Gfarm read/write and GMount read/write]

Page 20: Easy and Instantaneous Processing for Data-Intensive Workflows

GMount vs. Gfarm (Cont.)

✤ 4 clusters using Gfarm and GMount, 159 tasks, 252 workers

[Chart: parallelism (0–175) over execution time (0–3,500 sec) for GMount vs. Gfarm; GMount gives a 15% speedup because many new files are created, and GMount creates them locally]

Page 21: Easy and Instantaneous Processing for Data-Intensive Workflows

GMount vs. Gfarm (Cont.)

[Chart: elapsed time (0.01–1,000 sec, log scale) of "create" jobs on GMount vs. on Gfarm; these small jobs only create new empty files in the "/" directory]

Page 22: Easy and Instantaneous Processing for Data-Intensive Workflows

On Supercomputer

[Diagram: HA8000 compute nodes share a Lustre FS; through gateway nodes, external workers in the cloko and hongo clusters are attached via sshfs mounts, with the GXP Make master running in the hongo cluster]
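A sketch of the kind of sshfs mount the diagram implies, issued on an external cloko/hongo worker toward a gateway node (host and path names hypothetical):

$ sshfs gateway-ha8000:/lustre/shared /mnt/shared   # expose the supercomputer's Lustre FS to an external worker over SSH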

Page 23: Easy and Instantaneous Processing for Data-Intensive Workflows

On Supercomputer (Cont.)

[Chart: parallelism (0–9,000) over execution time (0–50,000 sec) for the supercomputer run]

• Allocated a 6-hour time slot

• External workers were appended

Page 24: Easy and Instantaneous Processing for Data-Intensive Workflows

Conclusion

✤ GXP shell + GMount + GXP Make

✤ Simplicity: no stack of middleware needed, wide compatibility

✤ Easiness: built effortlessly and rapidly

✤ User-level: no privileges required, usable by any user

✤ Adaptability: uniform interface for clusters, clouds, and supercomputers

✤ Scalability: scales to hundreds of nodes

✤ Performance: high throughput in wide-area environments

Page 25: Easy and Instantaneous Processing for Data-Intensive Workflows

Future Work

✤ Improve GMount for better file-create performance

✤ Improve GXP Make for smarter scheduling

✤ Better user interface: from configuration to workflow execution

✤ Further reduce installation cost

✤ Implement SSHFS-MUX in Python


Page 26: Easy and Instantaneous Processing for Data-Intensive Workflows

Open Source Software

✤ SSHFS-MUX/GMount

✤ http://sshfsmux.googlecode.com/

✤ GXP parallel/distributed shell

✤ http://gxp.sourceforge.net/

✤ ParaMark

✤ http://paramark.googlecode.com/


Page 27: Easy and Instantaneous Processing for Data-Intensive Workflows

Questions?