Reliable and Efficient Grid Data Placement using Stork and DiskRouter. Tevfik Kosar, University of Wisconsin-Madison, [email protected], April 15th, 2004

Page 1: Reliable and Efficient      Grid Data Placement using Stork and DiskRouter

Reliable and Efficient Grid Data Placement

using Stork and DiskRouter

Tevfik Kosar University of Wisconsin-Madison

[email protected]

April 15th, 2004

Page 2: Reliable and Efficient      Grid Data Placement using Stork and DiskRouter

A Single Project..

LHC (Large Hadron Collider)

• Comes online in 2006

• Will produce 1 Exabyte of data by 2012

• Accessed by ~2000 physicists, 150 institutions, 30 countries

Page 3: Reliable and Efficient      Grid Data Placement using Stork and DiskRouter

And Many Others..

• Genomic information processing applications

• Biomedical Informatics Research Network (BIRN) applications

• Cosmology applications (MADCAP)

• Methods for modeling large molecular systems

• Coupled climate modeling applications

• Real-time observatories, applications, and data-management (ROADNet)

Page 4: Reliable and Efficient      Grid Data Placement using Stork and DiskRouter

The Same Big Problem..

Need for data placement:

• Locate the data

• Send data to processing sites

• Share the results with other sites

• Allocate and de-allocate storage

• Clean up everything

Do these reliably and efficiently

Page 5: Reliable and Efficient      Grid Data Placement using Stork and DiskRouter

Outline

• Introduction

• Stork

• DiskRouter

• Case Studies

• Conclusions

Page 6: Reliable and Efficient      Grid Data Placement using Stork and DiskRouter

Stork

• A scheduler for data placement activities in the Grid

• What Condor is for computational jobs, Stork is for data placement

• Stork comes with a new concept: “Make data placement a first class citizen in the Grid.”

Page 7: Reliable and Efficient      Grid Data Placement using Stork and DiskRouter

The Concept

[Figure: the three-step view of a job (stage-in, execute the job, stage-out) expanded into Individual Jobs: allocate space for input & output data, stage-in, execute the job, release input space, stage-out, release output space.]

Page 8: Reliable and Efficient      Grid Data Placement using Stork and DiskRouter

The Concept

[Figure: the same steps (allocate space for input & output data, stage-in, execute the job, release input space, stage-out, release output space), now grouped into Data Placement Jobs and Computational Jobs.]

Page 9: Reliable and Efficient      Grid Data Placement using Stork and DiskRouter

The Concept

DAG specification:

  DaP A A.submit
  DaP B B.submit
  Job C C.submit
  …..
  Parent A child B
  Parent B child C
  Parent C child D, E
  …..

[Figure: DAGMan takes the DAG specification (nodes A to F) and feeds a Condor job queue and a Stork job queue.]
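The DAG specification on this slide can be sketched in a few lines of Python. This is only an illustration, not DAGMan's actual parser: it assumes that `DaP` nodes go to the Stork queue and `Job` nodes go to the Condor queue, the function names `parse_dag` and `dispatch_order` are hypothetical, and the elided lines of the slide's specification are left out rather than guessed.

```python
from collections import defaultdict, deque

# Only the fully specified part of the slide's DAG; elided lines are omitted.
DAG_SPEC = """\
DaP A A.submit
DaP B B.submit
Job C C.submit
Parent A child B
Parent B child C
"""


def parse_dag(spec):
    """Return (queue_of, children) parsed from a DAGMan-style specification."""
    queue_of = {}                 # node name -> "stork" or "condor"
    children = defaultdict(list)  # node name -> names of its child nodes
    for line in spec.splitlines():
        words = line.replace(",", " ").split()
        if not words:
            continue
        keyword = words[0].lower()
        if keyword == "dap":       # data placement job (assumed: Stork queue)
            queue_of[words[1]] = "stork"
        elif keyword == "job":     # computational job (assumed: Condor queue)
            queue_of[words[1]] = "condor"
        elif keyword == "parent":  # e.g. "Parent A child B"
            split_at = [w.lower() for w in words].index("child")
            for parent in words[1:split_at]:
                children[parent].extend(words[split_at + 1:])
    return queue_of, children


def dispatch_order(queue_of, children):
    """Order nodes so that every parent is dispatched before its children."""
    indegree = {name: 0 for name in queue_of}
    for kids in children.values():
        for kid in kids:
            indegree[kid] += 1
    ready = deque(sorted(n for n, d in indegree.items() if d == 0))
    order = []
    while ready:
        node = ready.popleft()
        order.append((node, queue_of[node]))
        for kid in children.get(node, []):
            indegree[kid] -= 1
            if indegree[kid] == 0:
                ready.append(kid)
    return order


if __name__ == "__main__":
    for node, queue in dispatch_order(*parse_dag(DAG_SPEC)):
        print(f"{node} -> {queue} queue")
```

A real workflow manager would wait for each node to finish before releasing its children; the sketch only computes a valid dispatch order.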

Page 10: Reliable and Efficient      Grid Data Placement using Stork and DiskRouter

Why Stork?

• Stork understands the characteristics and semantics of data placement jobs.

• It can make smart scheduling decisions for reliable and efficient data placement.

Page 11: Reliable and Efficient      Grid Data Placement using Stork and DiskRouter

Failure Recovery and Efficient Resource Utilization

• Fault tolerance: just submit a bunch of data placement jobs, and then go away..

• Control the number of concurrent transfers from/to any storage system: prevents overloading

• Space allocation and de-allocation: make sure space is available
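The first two points (retrying failed transfers and capping concurrent transfers per storage system) can be illustrated with a short Python sketch. This is not Stork's implementation: the class `TransferThrottle`, its parameters, and the choice to treat `OSError` as a retryable failure are all assumptions made for the example.

```python
import threading
from contextlib import ExitStack
from urllib.parse import urlparse


class TransferThrottle:
    """Caps concurrent transfers per storage host and retries failed ones."""

    def __init__(self, max_per_host):
        self._max_per_host = max_per_host
        self._lock = threading.Lock()
        self._slots = {}  # host -> semaphore counting free transfer slots

    def _slot(self, host):
        with self._lock:
            if host not in self._slots:
                self._slots[host] = threading.BoundedSemaphore(self._max_per_host)
            return self._slots[host]

    def run(self, src_url, dest_url, transfer, max_retry):
        """Call transfer(src_url, dest_url) up to max_retry times.

        Returns the number of the attempt that succeeded.
        """
        # Sorted, de-duplicated hosts give one global acquisition order,
        # so two opposite transfers cannot deadlock on each other's slots.
        hosts = sorted({urlparse(src_url).hostname, urlparse(dest_url).hostname})
        for attempt in range(1, max_retry + 1):
            with ExitStack() as stack:
                for host in hosts:
                    stack.enter_context(self._slot(host))
                try:
                    transfer(src_url, dest_url)
                    return attempt
                except OSError:
                    continue  # slots are released on exit; try again
        raise RuntimeError(f"transfer failed after {max_retry} attempts")
```

Holding a slot on both the source and the destination host is a design choice of this sketch; it keeps either storage system from being overloaded by transfers it did not initiate.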

Page 12: Reliable and Efficient      Grid Data Placement using Stork and DiskRouter

Support for Heterogeneity

Protocol translation using Stork memory buffer.

Page 13: Reliable and Efficient      Grid Data Placement using Stork and DiskRouter

Support for Heterogeneity

Protocol translation using Stork Disk Cache.

Page 14: Reliable and Efficient      Grid Data Placement using Stork and DiskRouter

Flexible Job Representation and Multilevel Policy Support

[
  Type = "Transfer";
  Src_Url = "srb://ghidorac.sdsc.edu/kosart.condor/x.dat";
  Dest_Url = "nest://turkey.cs.wisc.edu/kosart/x.dat";
  …………
  Max_Retry = 10;
  Restart_in = "2 hours";
]

Page 15: Reliable and Efficient      Grid Data Placement using Stork and DiskRouter

Run-time Adaptation

Dynamic protocol selection

[
  dap_type = "transfer";
  src_url = "drouter://slic04.sdsc.edu/tmp/test.dat";
  dest_url = "drouter://quest2.ncsa.uiuc.edu/tmp/test.dat";
  alt_protocols = "nest-nest, gsiftp-gsiftp";
]

[
  dap_type = "transfer";
  src_url = "any://slic04.sdsc.edu/tmp/test.dat";
  dest_url = "any://quest2.ncsa.uiuc.edu/tmp/test.dat";
]
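One way to read the `alt_protocols` field is as an ordered fallback list: try the protocol named in the URLs first, then each alternative pair in turn. The Python sketch below shows that reading only; it is not Stork's selection logic, the `try_transfer` callback and both function names are hypothetical, and the `any://` form is not handled.

```python
def with_protocol(url, protocol):
    """Return url with its scheme replaced by protocol."""
    _, rest = url.split("://", 1)
    return f"{protocol}://{rest}"


def transfer_with_fallback(src_url, dest_url, alt_protocols, try_transfer):
    """Try the URLs' own protocols first, then each 'src-dest' alternative.

    try_transfer(src_url, dest_url) must return True on success.
    Returns the (source, destination) protocol pair that worked, or None.
    """
    pairs = [(src_url.split("://", 1)[0], dest_url.split("://", 1)[0])]
    for item in alt_protocols.split(","):
        item = item.strip()
        if item:
            src_proto, dest_proto = item.split("-", 1)
            pairs.append((src_proto, dest_proto))
    for src_proto, dest_proto in pairs:
        if try_transfer(with_protocol(src_url, src_proto),
                        with_protocol(dest_url, dest_proto)):
            return src_proto, dest_proto
    return None


if __name__ == "__main__":
    # Pretend the DiskRouter endpoints are unreachable in this run.
    def fake_transfer(src, dest):
        return not src.startswith("drouter://")

    print(transfer_with_fallback(
        "drouter://slic04.sdsc.edu/tmp/test.dat",
        "drouter://quest2.ncsa.uiuc.edu/tmp/test.dat",
        "nest-nest, gsiftp-gsiftp",
        fake_transfer,
    ))
```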

Page 16: Reliable and Efficient      Grid Data Placement using Stork and DiskRouter

Run-time Adaptation

Run-time Protocol Auto-tuning

[
  link = "slic04.sdsc.edu – quest2.ncsa.uiuc.edu";
  protocol = "gsiftp";
  bs = 1024KB; // block size
  tcp_bs = 1024KB; // TCP buffer size
  p = 4;
]

Page 17: Reliable and Efficient      Grid Data Placement using Stork and DiskRouter

Outline

• Introduction

• Stork

• DiskRouter

• Case Studies

• Conclusions

Page 18: Reliable and Efficient      Grid Data Placement using Stork and DiskRouter

DiskRouter

• A mechanism for high performance, large scale data transfers

• Uses hierarchical buffering to aid in large scale data transfers

• Enables application-level overlay network for maximizing bandwidth

• Supports application-level multicast

Page 19: Reliable and Efficient      Grid Data Placement using Stork and DiskRouter

Store and Forward

Improves performance when bandwidth fluctuation between A and B is independent of the bandwidth fluctuation between B and C

[Figure: nodes A, B, and C, comparing the transfer with DiskRouter and without DiskRouter.]
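A toy model makes the store-and-forward claim concrete. In the Python sketch below, each list entry is the amount of data a link can carry in one time step. The rates are made up for illustration, and the model assumes the source always has data to send and the buffer at B is unbounded; it is not a description of DiskRouter's internals.

```python
def delivered_without_buffer(ab_rates, bc_rates):
    """No storage at B: each step moves only what the slower link allows."""
    return sum(min(ab, bc) for ab, bc in zip(ab_rates, bc_rates))


def delivered_with_buffer(ab_rates, bc_rates):
    """B buffers what it receives and forwards as fast as B-C allows."""
    buffered = 0
    delivered = 0
    for ab, bc in zip(ab_rates, bc_rates):
        buffered += ab             # A fills B's buffer at the A-B rate
        sent = min(bc, buffered)   # B drains the buffer at the B-C rate
        buffered -= sent
        delivered += sent
    return delivered


if __name__ == "__main__":
    ab = [10, 2, 10, 2]  # A-B is fast exactly when B-C is slow
    bc = [2, 10, 2, 10]
    print(delivered_without_buffer(ab, bc))  # 8
    print(delivered_with_buffer(ab, bc))     # 24
```

When the two links fluctuate together, the buffer has nothing to absorb and both functions return the same total, which matches the slide's condition that the fluctuations be independent.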

Page 20: Reliable and Efficient      Grid Data Placement using Stork and DiskRouter

DiskRouter Overlay Network

[Figure: direct path from A to B at 90 Mb/s.]

Page 21: Reliable and Efficient      Grid Data Placement using Stork and DiskRouter

DiskRouter Overlay Network

[Figure: direct path from A to B at 90 Mb/s; alternative path through DiskRouter node C with two 400 Mb/s links.]

Add a DiskRouter node C, which is not necessarily on the path from A to B, to enforce use of an alternative path.
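The benefit of the detour can be checked with a bottleneck calculation: a path is only as fast as its slowest link. The Python sketch below uses the figure's numbers and assumes the two 400 Mb/s labels belong to the A-C and C-B links; the function names are hypothetical, and the model ignores protocol overhead and any cost of buffering at C.

```python
# Link capacities in Mb/s; the 400 Mb/s assignment to A-C and C-B is assumed.
LINKS = {("A", "B"): 90, ("A", "C"): 400, ("C", "B"): 400}


def bottleneck(path, links):
    """Capacity of a path is the capacity of its slowest link."""
    return min(links[(u, v)] for u, v in zip(path, path[1:]))


def best_path(paths, links):
    """Pick the candidate path with the largest bottleneck capacity."""
    return max(paths, key=lambda p: bottleneck(p, links))


if __name__ == "__main__":
    candidates = [["A", "B"], ["A", "C", "B"]]
    chosen = best_path(candidates, LINKS)
    print(chosen, bottleneck(chosen, LINKS))  # ['A', 'C', 'B'] 400
```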

Page 22: Reliable and Efficient      Grid Data Placement using Stork and DiskRouter

Data Mover/Distributed Cache

The source writes to its closest DiskRouter, and the destination picks the data up from its own closest DiskRouter.

[Figure: Source and Destination connected through a DiskRouter Cloud.]

Page 23: Reliable and Efficient      Grid Data Placement using Stork and DiskRouter

Outline

• Introduction

• Stork

• DiskRouter

• Case Studies

• Conclusions

Page 24: Reliable and Efficient      Grid Data Placement using Stork and DiskRouter

Case Study I: SRB-UniTree Data Pipeline

• Transfer ~3 TB of DPOSS data from SRB @SDSC to UniTree @NCSA

• A data pipeline created with Stork and DiskRouter

[Figure: data pipeline with SRB Server, SDSC Cache, NCSA Cache, UniTree Server, and Submit Site.]

Page 25: Reliable and Efficient      Grid Data Placement using Stork and DiskRouter

Failure Recovery

[Figure annotations:]

• UniTree not responding

• DiskRouter reconfigured and restarted

• SDSC cache reboot & UW CS network outage

• Software problem

Page 26: Reliable and Efficient      Grid Data Placement using Stork and DiskRouter

Case Study II

Page 27: Reliable and Efficient      Grid Data Placement using Stork and DiskRouter

Dynamic Protocol Selection

Page 28: Reliable and Efficient      Grid Data Placement using Stork and DiskRouter

Runtime Adaptation

Before Tuning:

• parallelism = 1

• block_size = 1 MB

• tcp_bs = 64 KB

After Tuning:

• parallelism = 4

• block_size = 1 MB

• tcp_bs = 256 KB

Page 29: Reliable and Efficient      Grid Data Placement using Stork and DiskRouter

Conclusions

• Regard data placement as a first class citizen.

• Introduce a specialized scheduler for data placement.

• Introduce a high performance data transfer tool.

• End-to-end automation, fault tolerance, run-time adaptation, multilevel policy support, reliable and efficient transfers.

Page 30: Reliable and Efficient      Grid Data Placement using Stork and DiskRouter

Future work

• Enhanced interaction between Stork, DiskRouter and higher level planners: co-scheduling of CPU and I/O

• Enhanced authentication mechanisms

• More run-time adaptation

Page 31: Reliable and Efficient      Grid Data Placement using Stork and DiskRouter

You don’t have to FedEx your data anymore.. We deliver it for you!

For more information

Stork:

• Tevfik Kosar

• Email: [email protected]

• http://www.cs.wisc.edu/condor/stork

DiskRouter:

• George Kola

• Email: [email protected]

• http://www.cs.wisc.edu/condor/diskrouter