Modeling and Optimizing Large-Scale Wide-Area Data Transfers

Raj Kettimuthu, Gayane Vardoyan, Gagan Agrawal, and P. Sadayappan

Exploding data volumes

100,000 TB

Astronomy
– MACHO et al.: 1 TB; Palomar: 3 TB; 2MASS: 10 TB; GALEX: 30 TB; Sloan: 40 TB; Pan-STARRS: 40,000 TB

Climate
– 2004: 36 TB; 2012: 2,300 TB

Genomics
– 10^5 increase in data volumes in 6 years

Data movement

Datasets must frequently be transported over the WAN
– Analysis, visualization, archival

Data movement bandwidths are not increasing at the same rate as dataset sizes
– Major constraint for data-driven sciences

File transfer is the dominant data transfer mode

GridFTP is widely used by scientific communities
– Thousands of servers deployed worldwide move >1 PB per day

Characterize, control, and optimize transfers

GridFTP

High-performance, secure data transfer protocol optimized for high-bandwidth wide-area networks

Based on the FTP protocol; defines extensions for high-performance operation and security

The Globus implementation of GridFTP is widely used

Globus GridFTP servers support usage statistics collection
– Transfer type, size in bytes, start time of the transfer, transfer duration, etc. are collected for each transfer
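As an illustration of how the collected fields can be turned into a per-transfer throughput, here is a minimal sketch. The record layout and field names are hypothetical and are not the actual Globus usage-statistics schema; only the idea (size divided by duration) comes from the fields listed above.

```python
from dataclasses import dataclass


@dataclass
class TransferRecord:
    """Hypothetical usage-log record; field names are illustrative only."""
    transfer_type: str   # e.g. a store or retrieve operation
    size_bytes: int      # transfer size in bytes
    start_time: float    # start of the transfer, seconds since the epoch
    duration_s: float    # transfer duration in seconds


def throughput_mbps(rec: TransferRecord) -> float:
    """Average throughput of one transfer in megabits per second."""
    if rec.duration_s <= 0:
        raise ValueError("duration must be positive")
    return rec.size_bytes * 8 / rec.duration_s / 1e6


# Made-up example: 1 GB moved in 10 seconds.
rec = TransferRecord("STOR", 1_000_000_000, 0.0, 10.0)
print(throughput_mbps(rec))  # 800.0
```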

GridFTP usage log

Parallelism vs concurrency in GridFTP

[Figure: diagram with labels Data Transfer Node at Site A, Data Transfer Node at Site B, GridFTP Server Process, Parallel File System, and TCP Connection. Panels: Parallelism = 3; Concurrency = 3]
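The difference between the two knobs can be sketched by counting what is active at one time: concurrency sets how many files are in flight (one server process per file), and parallelism sets how many TCP streams each of those files uses. The function below is a simplified illustration with made-up file names, not GridFTP code.

```python
from itertools import islice


def active_streams(files, concurrency, parallelism):
    """Return the (file, stream_index) pairs that are active at the same time.

    Simplified view: `concurrency` files are in flight at once and each
    file is carried over `parallelism` TCP streams.
    """
    in_flight = list(islice(files, concurrency))
    return [(name, stream) for name in in_flight for stream in range(parallelism)]


files = ["a.dat", "b.dat", "c.dat", "d.dat"]  # hypothetical file names

# Parallelism = 3: one file, three TCP connections.
print(active_streams(files, concurrency=1, parallelism=3))
# Concurrency = 3: three files, one TCP connection each.
print(active_streams(files, concurrency=3, parallelism=1))
# Both set to 3: nine TCP connections in total.
print(len(active_streams(files, concurrency=3, parallelism=3)))  # 9
```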

Problem formulation

Objective: control bandwidth allocation for transfer(s) from a source to the destination(s)

Most large transfers are between supercomputers
– Ability to both store and process large amounts of data

Site is heavily loaded, with most bandwidth consumed by a small number of sites

Goal: develop a simple model for GridFTP
– Source concurrency: total number of ongoing transfers between endpoint A and all its major transfer endpoints
– Destination concurrency: total number of ongoing transfers between endpoint A and endpoint B
– External load: all other activities on the endpoints, including transfers to other sites
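The two concurrency definitions can be made concrete with a small counting sketch. It assumes, for simplicity, that ongoing transfers are given as (source, destination) pairs and that only transfers leaving endpoint A are counted; the endpoint names are placeholders.

```python
from collections import Counter


def concurrencies(ongoing, source, destination):
    """Return (SC, DC) for `source` and one `destination`.

    ongoing: iterable of (src, dst) pairs, one per ongoing transfer.
    SC counts all ongoing transfers from `source`; DC counts only those
    going to `destination`.
    """
    per_destination = Counter(dst for src, dst in ongoing if src == source)
    sc = sum(per_destination.values())
    dc = per_destination[destination]
    return sc, dc


# Hypothetical snapshot of ongoing transfers.
ongoing = [("A", "B"), ("A", "B"), ("A", "C"), ("A", "D"), ("X", "B")]
print(concurrencies(ongoing, "A", "B"))  # (4, 2)
```

The transfer from X to B is not counted in either number; in the formulation above it would fall under external load.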

Modeling throughput

Models that consider only source and destination concurrency (SC, DC)

Separate model for each destination

Data to train and validate models: load variation experiments

Linear models
– Y' = a1*X1 + a2*X2 + … + ak*Xk + b
– DT = a1*DC + a2*SC + b1
– DT = a3*DC/SC + b2
– Errors >15% for most cases

Log models
– DT = SC^a4 * DC^a5 * 2^b3
– log(DT) = a4*log(SC) + a5*log(DC) + b3 (log base 2)
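Because the log model is linear in log(SC) and log(DC), its coefficients can be fitted by ordinary least squares on the transformed data. The sketch below is one way to do that, not the authors' fitting procedure. It uses a small pure-Python solver to stay self-contained (a numerical library such as NumPy would normally be used), and the data and coefficients are synthetic.

```python
import math


def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("singular system: inputs are collinear")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def fit_log_model(sc, dc, dt):
    """Fit log2(DT) = a4*log2(SC) + a5*log2(DC) + b3; return (a4, a5, b3)."""
    rows = [[math.log2(s), math.log2(d), 1.0] for s, d in zip(sc, dc)]
    y = [math.log2(t) for t in dt]
    # Normal equations: (X^T X) coef = X^T y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(3)]
    return tuple(solve_linear(xtx, xty))


def predict_log_model(sc, dc, a4, a5, b3):
    """DT = SC^a4 * DC^a5 * 2^b3."""
    return sc ** a4 * dc ** a5 * 2.0 ** b3


# Synthetic training data generated from made-up coefficients.
sc = [2, 4, 8, 16, 4, 8]
dc = [1, 2, 4, 8, 1, 2]
dt = [predict_log_model(s, d, -0.5, 0.8, 7.0) for s, d in zip(sc, dc)]
a4, a5, b3 = fit_log_model(sc, dc, dt)
print(round(a4, 6), round(a5, 6), round(b3, 6))
```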

Modeling throughput

Log model is better than the linear models, but errors are still high

A model based on just SC and DC is too simplistic

Incorporate external load
– External load: network, disk, and CPU activities outside transfers
– How to measure the external load?
– How to include external load in the model(s)?

External load

Transfers are stable over short durations but vary widely over an entire day

Multiple training data: same SC, DC on different days and times

Throughput differences for the same SC, DC are attributed to differences in external load

Three different functions for external load (EL)
– EL1 = T − AT, where T is the throughput for transfer t and AT is the average throughput of all transfers with the same SC, DC as t
– EL2 = T − MT, where MT is the maximum throughput with the same SC, DC as t
– EL3 = T/MT
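The three external-load functions can be computed directly from a group of training transfers that share the same SC and DC. The sketch below uses made-up throughput values.

```python
from statistics import mean


def external_loads(throughputs):
    """Compute EL1, EL2, and EL3 for transfers with the same SC and DC.

    throughputs: observed throughput T of each transfer in the group.
    Returns one dict per transfer, in input order.
    """
    at = mean(throughputs)  # AT: average throughput of the group
    mt = max(throughputs)   # MT: maximum throughput of the group
    return [{"EL1": t - at, "EL2": t - mt, "EL3": t / mt} for t in throughputs]


# Hypothetical throughputs for one (SC, DC) pair on different days.
observed = [400.0, 500.0, 600.0]
for t, el in zip(observed, external_loads(observed)):
    print(t, el)
```

With these definitions EL2 is never positive and EL3 lies in (0, 1]. In all three, a larger value corresponds to a higher observed throughput for the same SC and DC.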

Models with external load

Linear
– DT = a6*DC + a7*SC + a8*EL + b4

Log
– DT = SC^a9 * DC^a10 * AEL{a11} * 2^b5
– AEL{a11} = EL^a11 if EL > 0, |EL|^(−a11) otherwise
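The piecewise AEL term and the log model with external load can be sketched as follows. The coefficient values are hypothetical. The slide's definition leaves EL = 0 undefined for positive a11, so this sketch rejects that input, which is a choice made here rather than something the slides specify.

```python
def ael(el, a11):
    """AEL{a11}: EL^a11 if EL > 0, |EL|^(-a11) otherwise."""
    if el == 0:
        # Assumption of this sketch: treat EL == 0 as invalid input.
        raise ValueError("AEL is not defined for EL == 0 in this sketch")
    if el > 0:
        return el ** a11
    return abs(el) ** (-a11)


def predict_log_with_load(sc, dc, el, a9, a10, a11, b5):
    """DT = SC^a9 * DC^a10 * AEL{a11} * 2^b5."""
    return sc ** a9 * dc ** a10 * ael(el, a11) * 2.0 ** b5


print(ael(4.0, 0.5))   # 2.0
print(ael(-4.0, 0.5))  # 0.5
# Hypothetical coefficients, for illustration only.
print(predict_log_with_load(4, 2, 4.0, a9=-0.5, a10=1.0, a11=0.5, b5=3.0))  # 16.0
```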

Calculating external load in practice

Unlike SC and DC, external load is unknown

Multiple data points with the same SC, DC were used to train the models

In practice, there may not be any recent transfers with the same SC, DC

With some recent transfers, assume no substantial change in external load over a few minutes
– Most recent transfer's load as current load
– Average load of transfers in the past 30 minutes as current load
– Average load in the past 30 minutes with error correction

DT = a6*DC + a7*SC + a8*EL + b4
– Given: DT; Control: DC, SC; Unknown: EL

Recent transfers load with error correction

DT = a6*DC + a7*SC + a8*EL + b4

Known: DT, DC, SC; Compute: EL

Transfers in past 30 minutes

DT = a6*DC + a7*SC + a8*EL + b4 + e

Historic transfers

Previous Transfer Method

Recent Transfers Method

Recent Transfers with Error Correction
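The three ways of estimating the current external load can be sketched with the linear model. For a completed transfer DT, DC, and SC are known, so EL follows by rearranging DT = a6*DC + a7*SC + a8*EL + b4. The coefficients and transfer tuples below are made up. The slides do not say how the error term e is derived from historic transfers, so here it is simply a value passed in by the caller.

```python
from statistics import mean


def implied_load(dt, dc, sc, coef):
    """Solve the linear model for EL given a completed transfer."""
    a6, a7, a8, b4 = coef
    return (dt - a6 * dc - a7 * sc - b4) / a8


def previous_transfer_load(transfers, coef):
    """Previous Transfer method: load of the most recent transfer."""
    latest = max(transfers, key=lambda tr: tr[0])
    return implied_load(*latest[1:], coef)


def recent_transfers_load(transfers, coef, now, window_s=30 * 60):
    """Recent Transfers method: average load over the past 30 minutes."""
    loads = [implied_load(dt, dc, sc, coef)
             for end, dt, dc, sc in transfers if 0 <= now - end <= window_s]
    if not loads:
        raise ValueError("no transfers in the window")
    return mean(loads)


def predict_linear(dc, sc, el, coef, e=0.0):
    """DT = a6*DC + a7*SC + a8*EL + b4 + e (e = 0 means no error correction)."""
    a6, a7, a8, b4 = coef
    return a6 * dc + a7 * sc + a8 * el + b4 + e


coef = (10.0, -2.0, 5.0, 100.0)  # hypothetical (a6, a7, a8, b4)
# (end_time_s, DT, DC, SC) for completed transfers; values are made up.
transfers = [(-5000.0, 150.0, 2, 4), (100.0, 127.0, 2, 4), (1000.0, 107.0, 1, 4)]

print(previous_transfer_load(transfers, coef))             # 1.0
el = recent_transfers_load(transfers, coef, now=1200.0)
print(el)                                                  # 2.0
print(predict_linear(dc=3, sc=5, el=el, coef=coef, e=-1.5))
```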

Applying models to control bandwidth

Experimental setup: DTNs at 5 XSEDE sites (Source: TACC, Destinations: PSC, NCAR, NICS, Indiana, SDSC)

Goal – control bandwidth allocation to destinations when source is saturated

Models express throughput in terms of SC, DC, and EL

Given a target throughput, determine the DC that achieves it
– Often more than one destination transfers data, so SC is also unknown

Limit DC to 20 to narrow the search space
– Even then, there is a large number of possible DC combinations (20^n)

Heuristics limit the search space to (SCmax − ND + 1)
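One way to read the (SCmax − ND + 1) search space is as a scan over candidate values of the total source concurrency, from ND (one transfer per destination) up to SCmax, solving each destination's model for DC at every candidate. The sketch below follows that reading. It is an assumption about the heuristic, not the authors' algorithm, and it uses the log model without external load and made-up coefficients to stay short.

```python
def predict(sc, dc, model):
    """Log model without external load: DT = SC^a4 * DC^a5 * 2^b3."""
    a4, a5, b3 = model
    return sc ** a4 * dc ** a5 * 2.0 ** b3


def pick_concurrencies(targets, models, sc_max, dc_limit=20):
    """Scan SC guesses in [ND, sc_max]; return (SC, {destination: DC}).

    targets: {destination: target throughput}
    models:  {destination: (a4, a5, b3)}, one model per destination
    """
    nd = len(targets)
    best = None
    for sc_guess in range(nd, sc_max + 1):  # sc_max - nd + 1 candidates
        dcs = {}
        for dest, target in targets.items():
            a4, a5, b3 = models[dest]
            # Invert the model for DC at this SC guess, then round and clamp.
            dc = (target / (sc_guess ** a4 * 2.0 ** b3)) ** (1.0 / a5)
            dcs[dest] = min(dc_limit, max(1, round(dc)))
        sc = sum(dcs.values())  # SC implied by the rounded DC values
        if sc > sc_max:
            continue
        err = sum(abs(predict(sc, dcs[d], models[d]) - t) / t
                  for d, t in targets.items())
        if best is None or err < best[0]:
            best = (err, sc, dcs)
    if best is None:
        raise ValueError("no feasible allocation within sc_max")
    return best[1], best[2]


# Hypothetical models and targets for two destinations.
models = {"B": (-0.5, 1.0, 6.0), "C": (-0.5, 1.0, 6.0)}
targets = {"B": 64.0, "C": 64.0}
print(pick_concurrencies(targets, models, sc_max=10))  # (4, {'B': 2, 'C': 2})
```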

Experiments

Ratio experiments: allocate available bandwidth at the source to destinations using a predefined ratio
– Achieve a specific fraction of bandwidth for each destination
– Four ratio combinations

Factoring experiments: increase a destination's throughput by a factor when the source is saturated
– Bandwidth increase because of certain priorities

Four models/methods (log EL1/EL3 models and RT/RTEC methods) were used
– Effective in predicting the throughputs
– 83.6% of the errors are below 15%, and 65.5% of them are below 10%
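Summary figures of this kind can be computed from pairs of predicted and observed throughputs. The sketch below assumes the error is the absolute difference relative to the observed throughput, which the slides do not state, and uses made-up numbers rather than the experimental data.

```python
def percent_error(predicted, observed):
    """Absolute prediction error as a percentage of the observed value."""
    return abs(predicted - observed) / observed * 100.0


def fraction_below(errors, threshold):
    """Fraction of errors strictly below `threshold` (in percent)."""
    return sum(e < threshold for e in errors) / len(errors)


# Hypothetical (predicted, observed) throughput pairs.
pairs = [(95.0, 100.0), (120.0, 100.0), (108.0, 100.0), (130.0, 100.0)]
errors = [percent_error(p, o) for p, o in pairs]
print(fraction_below(errors, 15.0))  # 0.5
print(fraction_below(errors, 10.0))  # 0.5
```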

Results – Ratio experiments

Ratios are 4:5:6:8:9 for Kraken, Mason, Blacklight, Gordon, and Yellowstone. Concurrencies picked by Algorithm were {1,3,3,1,1}. Model: log with EL1. Method: RTEC

Ratios are 4:5:6:8:9 for Kraken, Mason, Blacklight, Gordon, and Yellowstone. Concurrencies picked by Algorithm were {1,4,3,1,1}. Model: log with EL3. Method: RT
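A ratio such as 4:5:6:8:9 translates into per-destination target throughputs by splitting the available source bandwidth proportionally. The sketch below shows the arithmetic; the total of 3200 is a made-up figure chosen so the shares come out round, not a measured value.

```python
def ratio_targets(total_bandwidth, ratios):
    """Split `total_bandwidth` among destinations in proportion to `ratios`."""
    total_parts = sum(ratios.values())
    return {dest: total_bandwidth * part / total_parts
            for dest, part in ratios.items()}


ratios = {"Kraken": 4, "Mason": 5, "Blacklight": 6, "Gordon": 8, "Yellowstone": 9}
targets = ratio_targets(3200.0, ratios)  # hypothetical total, arbitrary units
print(targets)
# {'Kraken': 400.0, 'Mason': 500.0, 'Blacklight': 600.0, 'Gordon': 800.0, 'Yellowstone': 900.0}
```

These targets are what a model would then be asked to hit by choosing a concurrency for each destination.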

Results – Factoring experiments

Increasing Gordon’s baseline throughput by 2x. Concurrency picked by Algorithm for Gordon was 5

Increasing Yellowstone’s baseline throughput by 1.5x. Concurrency picked by Algorithm for Yellowstone was 3

Related work

Several models for predicting behavior and finding the optimal number of parallel TCP streams
– Uncongested networks, simulations

Several studies developed models to find the optimal number of streams and TCP buffer size for GridFTP
– Buffer size not needed with TCP autotuning

Major difference: we attempt to model GridFTP throughput based on end-to-end behavior
– End-system load, destinations’ capabilities, concurrent transfers

Many studies on bandwidth allocation at the router
– Our focus is application-level control

Summary

Understand performance of WAN transfers

Control bandwidth allocation at FTP level

Transfers between major supercomputing centers

Concurrency is more powerful than parallelism

Models to help control bandwidth allocation

Log models that combine total source CC, destination CC, and a measure of external load are effective

Methods that utilize both recent and historical experimental data are better at estimating external load

Questions
