1 a distributed task scheduler optimizing data transfer time taura lab. kei takahashi (56428) taura...

28
1 A distributed Task Scheduler Optimizing Data Transfer Time Taura lab. Kei Takahashi (56428)

Upload: janice-cook

Post on 03-Jan-2016

218 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: 1 A distributed Task Scheduler Optimizing Data Transfer Time Taura lab. Kei Takahashi (56428) Taura lab. Kei Takahashi (56428)

1

A distributed Task SchedulerOptimizing Data Transfer Time

Taura lab. Kei Takahashi (56428)

Page 2: 1 A distributed Task Scheduler Optimizing Data Transfer Time Taura lab. Kei Takahashi (56428) Taura lab. Kei Takahashi (56428)

2

Task SchedulersTask Schedulers A system which distributes many serial

tasks onto the grid environment►Task assignments►File transfers

Many serial tasks can be executed in parallel

Some constraints need to be considered►Machine availability►Data location

A system which distributes many serial tasks onto the grid environment►Task assignments►File transfers

Many serial tasks can be executed in parallel

Some constraints need to be considered►Machine availability►Data location

Page 3: 1 A distributed Task Scheduler Optimizing Data Transfer Time Taura lab. Kei Takahashi (56428) Taura lab. Kei Takahashi (56428)

3

Data Intensive ApplicationsData Intensive Applications

A computation using large data►Some gigabytes to petabytes►Natural language processing, data mining, etc.►A simple algorithm can extract useful general

knowledge A scheduler need additionally consider:

►Reduction in data transfers►Effective placement of data replicas

A computation using large data►Some gigabytes to petabytes►Natural language processing, data mining, etc.►A simple algorithm can extract useful general

knowledge A scheduler need additionally consider:

►Reduction in data transfers►Effective placement of data replicas

Page 4: 1 A distributed Task Scheduler Optimizing Data Transfer Time Taura lab. Kei Takahashi (56428) Taura lab. Kei Takahashi (56428)

4

An Example of Scheduling An Example of Scheduling The scheduler maps tasks on each

machine

The scheduler maps tasks on each machine

AA BB

SchedulerScheduler Task t0Requires : f0Task t0Requires : f0

Task t1Requires : f1Task t1Requires : f1

Task t2Requires : f1Task t2Requires : f1

File f1File f1 File f0File f0

t1t1

A

B

f0A

B

Shorter processing time

t2t2f1 t1t1

t2t2t0t0

t0t0

Page 5: 1 A distributed Task Scheduler Optimizing Data Transfer Time Taura lab. Kei Takahashi (56428) Taura lab. Kei Takahashi (56428)

5

Related WorkRelated Work Schedulers for data intensive applications

►GrADS: predicts transfer time, but only uses static bandwidth value between two hosts

Efficient multicast

Topology-aware bandwidth simulation►PBS: scheduling parallel jobs to nodes with

consodering network topology

Schedulers for data intensive applications►GrADS: predicts transfer time, but only uses st

atic bandwidth value between two hosts Efficient multicast

Topology-aware bandwidth simulation►PBS: scheduling parallel jobs to nodes with

consodering network topology

Page 6: 1 A distributed Task Scheduler Optimizing Data Transfer Time Taura lab. Kei Takahashi (56428) Taura lab. Kei Takahashi (56428)

6

Topology-aware TransferTopology-aware Transfer If a bandwidth topology map is given, file

transfers can be more precisely estimated►Detecting congestions

By the estimation, more precise optimization becomes possible for the task scheduling

If a bandwidth topology map is given, file transfers can be more precisely estimated►Detecting congestions

By the estimation, more precise optimization becomes possible for the task schedulingCongestionoccurs

Switch

Destination NodesSource Nodes

Page 7: 1 A distributed Task Scheduler Optimizing Data Transfer Time Taura lab. Kei Takahashi (56428) Taura lab. Kei Takahashi (56428)

7

Research Purpose Research Purpose Design and implement a distributed task

scheduler for data intensive applications►Predict data transfer time by using network

topology map with bandwidth►Map tasks onto machine►Plan efficient data transfers

Design and implement a distributed task scheduler for data intensive applications►Predict data transfer time by using network

topology map with bandwidth►Map tasks onto machine►Plan efficient data transfers

Page 8: 1 A distributed Task Scheduler Optimizing Data Transfer Time Taura lab. Kei Takahashi (56428) Taura lab. Kei Takahashi (56428)

8

Input and OutputInput and Output Given information:

►File Locations and size►Network Topology

Output:►Task Scheduling:

• Assignments of nodes to tasks

►Transfer Scheduling:• From/to which host data are transferred• Limit bandwidth during the transfer if needed

The amount of data transfer changes depending on a task schedule

Given information:►File Locations and size►Network Topology

Output:►Task Scheduling:

• Assignments of nodes to tasks

►Transfer Scheduling:• From/to which host data are transferred• Limit bandwidth during the transfer if needed

The amount of data transfer changes depending on a task schedule

Page 9: 1 A distributed Task Scheduler Optimizing Data Transfer Time Taura lab. Kei Takahashi (56428) Taura lab. Kei Takahashi (56428)

9

AgendaAgenda Background Purpose Our Approach Related Work Conclusion

Background Purpose Our Approach Related Work Conclusion

Page 10: 1 A distributed Task Scheduler Optimizing Data Transfer Time Taura lab. Kei Takahashi (56428) Taura lab. Kei Takahashi (56428)

10

Our ApproachOur Approach Task execution time is hard to predict in general

►Schedule one task for each host at a time ►Assign new tasks when a certain number of

hosts have completed, or certain time passed When the data size and bandwidth is known,

file transfer time is predictable►Optimize data transfer time by using network t

opology and bandwidth information• Multicast• Transfer priorities

Task execution time is hard to predict in general►Schedule one task for each host at a time ►Assign new tasks when a certain number of

hosts have completed, or certain time passed When the data size and bandwidth is known,

file transfer time is predictable►Optimize data transfer time by using network t

opology and bandwidth information• Multicast• Transfer priorities

Page 11: 1 A distributed Task Scheduler Optimizing Data Transfer Time Taura lab. Kei Takahashi (56428) Taura lab. Kei Takahashi (56428)

11

Problem FormulationProblem Formulation Final goal: minimizing the makespan Immediate goal: minimizing the sum of time

that takes before each task can start

More immediate goal: minimizing the sum of arrival time of every file on every node

Final goal: minimizing the makespan Immediate goal: minimizing the sum of time

that takes before each task can start

More immediate goal: minimizing the sum of arrival time of every file on every node

))(_(_

,_

taskeveryi

jifileeveryj

filetimetransfer

))(_(( ,_ _max ji

taskeveryi fileeveryj

filetimetransfer

(filei, j : the j th file required by taski )

(filei, j : the j th file required by taski )

Page 12: 1 A distributed Task Scheduler Optimizing Data Transfer Time Taura lab. Kei Takahashi (56428) Taura lab. Kei Takahashi (56428)

12

(An image will come here) (An image will come here)

Page 13: 1 A distributed Task Scheduler Optimizing Data Transfer Time Taura lab. Kei Takahashi (56428) Taura lab. Kei Takahashi (56428)

13

AlgorithmAlgorithm When some nodes are unschduled,

►Create an initial candidate task schedule►For each candidate task schedule:

• Decide priorities on each file transfer (including ongoing transfers)

• Plan efficient file transfer schedule• Estimate the usage of each link, and find the most

crowded link

►Search for a better schedule whose most crowded link is less crowded(by using heuristics like GA, SA)

When some nodes are unschduled,►Create an initial candidate task schedule►For each candidate task schedule:

• Decide priorities on each file transfer (including ongoing transfers)

• Plan efficient file transfer schedule• Estimate the usage of each link, and find the most

crowded link

►Search for a better schedule whose most crowded link is less crowded(by using heuristics like GA, SA)

Page 14: 1 A distributed Task Scheduler Optimizing Data Transfer Time Taura lab. Kei Takahashi (56428) Taura lab. Kei Takahashi (56428)

14

F0F0

F1F1

Transfer Priorities (1)Transfer Priorities (1) If several transfers share a link, they divide the bandwidth

of that link Since a task cannot be started before the “entire” file has

arrived, it is more efficient to put priorities on each transfer► If two transfers are going on, the it is more efficient to

transfer the smaller file first

If several transfers share a link, they divide the bandwidth of that link

Since a task cannot be started before the “entire” file has arrived, it is more efficient to put priorities on each transfer► If two transfers are going on, the it is more efficient to

transfer the smaller file first

F0F0

F1F1

50MBps

50MBps

100MBps

100MBps

Both F0 and F1 arrive20 second after

100MBps 100MBps 100MBps

F1F1

Transfer of F0 completes 10 seconds after, and F1 arrive 20 seconds after

1GB 1GBF0F0

1GB 1GB

Page 15: 1 A distributed Task Scheduler Optimizing Data Transfer Time Taura lab. Kei Takahashi (56428) Taura lab. Kei Takahashi (56428)

15

Transfer Priorities (2)Transfer Priorities (2) Objective: minimizing

Priorities are determined by these criteria:►A file needed by more tasks►A smaller file►A file with less replicas (sources)

Objective: minimizing

Priorities are determined by these criteria:►A file needed by more tasks►A smaller file►A file with less replicas (sources)

))(_(_

,_

taskeveryi

jifileeveryj

filetimetransfer

(filei, j : the j th file required by taski )

Page 16: 1 A distributed Task Scheduler Optimizing Data Transfer Time Taura lab. Kei Takahashi (56428) Taura lab. Kei Takahashi (56428)

16

Transfer PlanningTransfer Planning Transfers are planned from a file with

higher priority (Greedy method)►The highest priority transfer can use as much

bandwidth as possible►A transfer with less priority can use the rest of

the bandwidth The bandwidth for ongoing transfers are

also reassigned

Transfers are planned from a file with higher priority (Greedy method)►The highest priority transfer can use as much

bandwidth as possible►A transfer with less priority can use the rest of

the bandwidth The bandwidth for ongoing transfers are

also reassigned

Page 17: 1 A distributed Task Scheduler Optimizing Data Transfer Time Taura lab. Kei Takahashi (56428) Taura lab. Kei Takahashi (56428)

17

Transfer Priorities (example)Transfer Priorities (example)

Task0, Task1, Task2 are scheduled►File0: 3GB, needed by task0►File1: 2GB, needed by task0, task1, task2►File2: 1GB, needed by task1

Transferring priority:►File1 > File2 > File0

File1 is transferred using multicast pipeline

Task0, Task1, Task2 are scheduled►File0: 3GB, needed by task0►File1: 2GB, needed by task0, task1, task2►File2: 1GB, needed by task1

Transferring priority:►File1 > File2 > File0

File1 is transferred using multicast pipeline

Total BandwidthTask0 Task1 Task2

File1Task0 Task1 Task2

File2Task0 Task1 Task2 Task0 Task1 Task2

File0

Page 18: 1 A distributed Task Scheduler Optimizing Data Transfer Time Taura lab. Kei Takahashi (56428) Taura lab. Kei Takahashi (56428)

18

Pipeline Multicast (1)Pipeline Multicast (1) For a given schedule, it is known which

nodes require which files When multiple nodes need a common file,

a pipeline multicast shortens transfer time(in the case of large files)

The speed of a pipeline broadcast is limited by the narrowest link in the tree

A broadcast can be sped up by efficiently using multiple sources

For a given schedule, it is known which nodes require which files

When multiple nodes need a common file, a pipeline multicast shortens transfer time(in the case of large files)

The speed of a pipeline broadcast is limited by the narrowest link in the tree

A broadcast can be sped up by efficiently using multiple sources

Page 19: 1 A distributed Task Scheduler Optimizing Data Transfer Time Taura lab. Kei Takahashi (56428) Taura lab. Kei Takahashi (56428)

19

Pipeline Multicast (2)Pipeline Multicast (2) The tree is constructed depth-first manner Every related link is only used twice

(upward/downward) Since disk access is as slow as network, the disk

access bandwidth should be also counted

The tree is constructed depth-first manner Every related link is only used twice

(upward/downward) Since disk access is as slow as network, the disk

access bandwidth should be also counted

Source Destination

Page 20: 1 A distributed Task Scheduler Optimizing Data Transfer Time Taura lab. Kei Takahashi (56428) Taura lab. Kei Takahashi (56428)

20

Multi-source MulticastMulti-source Multicast M nodes have the same source data; N nodes need it For each link in the order of bandwidth:

► If the link connects two nodes/switches which are already connected to the source node:

→ Discard the link►Otherwise: → Adopt the link(Kruskal's Algorithm: it maximizes the narrowest link in the pipeline

s)

M nodes have the same source data; N nodes need it For each link in the order of bandwidth:

► If the link connects two nodes/switches which are already connected to the source node:

→ Discard the link►Otherwise: → Adopt the link(Kruskal's Algorithm: it maximizes the narrowest link in the pipeline

s)

Discard this link

Pipeline 1

Pipeline 2Source Destination

Page 21: 1 A distributed Task Scheduler Optimizing Data Transfer Time Taura lab. Kei Takahashi (56428) Taura lab. Kei Takahashi (56428)

21

Find Crowded LinkFind Crowded Link On a certain schedule, for each link,

►list every transfer using that link►sum up transfer size►calculate ”rough transfer time” as follows:

(total transfer size) / (bandwidth) Find the longest “rough transfer time”

On a certain schedule, for each link,►list every transfer using that link►sum up transfer size►calculate ”rough transfer time” as follows:

(total transfer size) / (bandwidth) Find the longest “rough transfer time”

Link 0 Link 1 Link 2 Link 3

Bandwidth bw0 bw1 bw2 Bw3

File 0 size0 0 0 size0

File 1 0 size1 size1 size1

File 2 0 0 0 size2

Rough transfer time

size0 / bw0 size1 / bw1 size1 / bw1 (size0 + size1 + size2 ) / bw3

Page 22: 1 A distributed Task Scheduler Optimizing Data Transfer Time Taura lab. Kei Takahashi (56428) Taura lab. Kei Takahashi (56428)

22

Improve the ScheduleImprove the Schedule After the most crowded link is found, the

scheduler tries to reduce the transfer size by altering task assignments

We are thinking of using GA or Simulated Annealing. Since the most crowded link is known, we can try to reduce the transfer of this link in the mutation phase.

After the most crowded link is found, the scheduler tries to reduce the transfer size by altering task assignments

We are thinking of using GA or Simulated Annealing. Since the most crowded link is known, we can try to reduce the transfer of this link in the mutation phase.

Page 23: 1 A distributed Task Scheduler Optimizing Data Transfer Time Taura lab. Kei Takahashi (56428) Taura lab. Kei Takahashi (56428)

23

Actual TransfersActual Transfers After the transfer schedule has determined,

the plan is performed as simulated In a pipeline transfer, the bandwidth is the s

ame through links The bandwidth of the narrowest link is limite

d to the calculated value When detecting a significant change in ban

dwidth, the schedule is reconstructed►The bandwidth is measured by using existing

methods (eg. nettimer, bprobe)

After the transfer schedule has determined, the plan is performed as simulated

In a pipeline transfer, the bandwidth is the same through links

The bandwidth of the narrowest link is limited to the calculated value

When detecting a significant change in bandwidth, the schedule is reconstructed►The bandwidth is measured by using existing

methods (eg. nettimer, bprobe)

Page 24: 1 A distributed Task Scheduler Optimizing Data Transfer Time Taura lab. Kei Takahashi (56428) Taura lab. Kei Takahashi (56428)

24

Re-scheduling TransfersRe-scheduling Transfers When a file transfer has finished, transfer

schedule is recalculated►Calculating the priority on each task►Assign bandwidth for each task

A new bandwidth value is assigned for each transfer, but the pipeline is not changed

When a file transfer has finished, transfer schedule is recalculated►Calculating the priority on each task►Assign bandwidth for each task

A new bandwidth value is assigned for each transfer, but the pipeline is not changed

Page 25: 1 A distributed Task Scheduler Optimizing Data Transfer Time Taura lab. Kei Takahashi (56428) Taura lab. Kei Takahashi (56428)

25

Task DescriptionTask Description A user need specify required files, output fil

es and dependencies for each task In order to enable flexible task description,

we provide scheduling API for script languages►Files are identified by URIs

(ex. “abc.com:/home/kay/some/location”)►The scheduler analyses dependencies from fil

epath

A user need specify required files, output files and dependencies for each task

In order to enable flexible task description, we provide scheduling API for script languages►Files are identified by URIs

(ex. “abc.com:/home/kay/some/location”)►The scheduler analyses dependencies from fil

epath

Page 26: 1 A distributed Task Scheduler Optimizing Data Transfer Time Taura lab. Kei Takahashi (56428) Taura lab. Kei Takahashi (56428)

26

Task SubmissionTask Submission A task is submitted by calling submit()

with required filepath

A task is submitted by calling submit() with required filepath

fs = sched.list_files("abc.com:/home/kay/some/location/**”)parsed_files = []

for fn in fs: new_fn = fn + ".parsed“ command = "parser " + fn + " " + new_fn sched.submit(command, [new_fn], []) parsed_files.append(new_fn)

sched.gather(parsed_files, “abc.com:/home/kay/parsed/")

Page 27: 1 A distributed Task Scheduler Optimizing Data Transfer Time Taura lab. Kei Takahashi (56428) Taura lab. Kei Takahashi (56428)

27

ConclusionConclusion Introduced new scheduling algorithm

►Predict transfer time by using network topology, and search for a better task schedule

►Efficiently transfer files by limiting bandwidth►Dynamically re-scheduling transfers

Current status:►The implementation is ongoing

Introduced new scheduling algorithm►Predict transfer time by using network

topology, and search for a better task schedule►Efficiently transfer files by limiting bandwidth►Dynamically re-scheduling transfers

Current status:►The implementation is ongoing

Page 28: 1 A distributed Task Scheduler Optimizing Data Transfer Time Taura lab. Kei Takahashi (56428) Taura lab. Kei Takahashi (56428)

28

PublicationsPublications 高橋慧 , 田浦健次朗 , 近山隆 . マイグレーションを支援する分散集合オブジ

ェクト.並列/分散/協調処理に関するサマーワークショップ( SWoPP2005 ),武雄, 2005 年 8 月.

高橋慧 , 田浦健次朗 , 近山隆 . マイグレーションを支援する分散集合オブジェクト . 先進的計算基盤シンポジウム( SACSIS 2005 ),筑波, 2005 年 5月.

高橋慧 , 田浦健次朗 , 近山隆 . マイグレーションを支援する分散集合オブジェクト.並列/分散/協調処理に関するサマーワークショップ( SWoPP2005 ),武雄, 2005 年 8 月.

高橋慧 , 田浦健次朗 , 近山隆 . マイグレーションを支援する分散集合オブジェクト . 先進的計算基盤シンポジウム( SACSIS 2005 ),筑波, 2005 年 5月.