Data Placement and Task Scheduling in Cloud, Online and Offline. 2014.11.27. 赵青 (Zhao Qing), 天津科技大学 (Tianjin University of Science and Technology)


Page 1: Data Placement and Task Scheduling in cloud, Online and Offline 2014.11.27 赵青 天津科技大学 zhaoqingtj@tust.edu.cn

Data Placement and Task Scheduling in cloud, Online and Offline

2014.11.27

赵青 (Zhao Qing), 天津科技大学 (Tianjin University of Science and Technology), zhaoqingtj@tust.edu.cn

Page 2

Motivation

● Increase response speed and throughput

● Guarantee QoS

● Energy-efficient, green computing

Page 3

Overview

● Data placement for data-intensive applications

● Task scheduling for QoS and energy efficiency

● Online task scheduling

Page 4

1. Data Placement for data-intensive applications

● Data clustering based on data correlation

If every two data items are placed on different nodes, how much would the amount of data transfer increase?

BEA

Hierarchical clustering tree

Objective: Place closely related data items together so as to decrease data transfers.

Contributions:
1. Introduced data size factors
2. Proposed “First Order Conduction Correlation” from intermediate data
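The question on this slide (how much extra transfer results when two data items sit on different nodes) can be made concrete with a small sketch. It assumes, hypothetically, that each task lists the datasets it reads, that every dataset has a known size, and that separating two datasets forces the smaller one to be moved once per task that uses both. The names `pairwise_transfer_cost`, `tasks`, and `sizes` are illustrative and do not come from the slides.

```python
from itertools import combinations


def pairwise_transfer_cost(tasks, sizes):
    """Return {(a, b): extra transfer if datasets a and b are on different nodes}.

    Assumption (not from the slides): a task that needs both datasets moves
    the smaller one, so each shared task adds min(size_a, size_b).
    """
    cost = {}
    for datasets in tasks.values():
        for a, b in combinations(sorted(set(datasets)), 2):
            cost[(a, b)] = cost.get((a, b), 0) + min(sizes[a], sizes[b])
    return cost


# Hypothetical workload: task name -> datasets it reads.
tasks = {"t1": ["d1", "d2"], "t2": ["d1", "d2", "d3"], "t3": ["d3"]}
sizes = {"d1": 10, "d2": 4, "d3": 7}  # arbitrary size units

cost = pairwise_transfer_cost(tasks, sizes)
# The pair that is most costly to separate is the best candidate to cluster.
closest_pair = max(cost, key=cost.get)
```

A matrix of such pairwise costs is the kind of input that a clustering step such as BEA could then reorder so that strongly related items end up adjacent.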

Page 5

1. Data Placement for data-intensive applications

● Data distribution

Storage capacity, computation load balance

“Tree-to-Tree” greedy allocation strategy

Modified PSO algorithm

● Cloud platform modeling

Physical network structure / BEA

Objective: Make frequent data movements happen on high-speed channels so as to improve network utilization and the efficiency of the whole cloud system.
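The transcript does not spell out the “Tree-to-Tree” strategy, so the sketch below swaps in a plain capacity-aware greedy rule purely for illustration: place the largest data cluster first, always on the data center with the most free storage. The function name, the cluster sizes, and the capacities are all hypothetical.

```python
def greedy_allocate(clusters, capacities):
    """Place each data cluster on the data center with the most free space.

    clusters:   {cluster_name: total_size}
    capacities: {data_center_name: storage_capacity}
    Largest clusters are placed first; raises ValueError if one does not fit.
    """
    free = dict(capacities)
    placement = {}
    for name, size in sorted(clusters.items(), key=lambda kv: -kv[1]):
        target = max(free, key=free.get)  # data center with most free space
        if free[target] < size:
            raise ValueError(f"cluster {name!r} does not fit anywhere")
        free[target] -= size
        placement[name] = target
    return placement, free


placement, free = greedy_allocate(
    {"c1": 50, "c2": 30, "c3": 20}, {"dc1": 60, "dc2": 60}
)
```

Choosing the emptiest data center respects the storage constraint and spreads load, but it ignores network structure, which the strategy on the slide takes into account.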

Page 6

1. Data Placement for data-intensive applications

● Runtime data placement
— Newly generated datasets are saved to the data center that has the maximum dependency with them.
— The cost of re-distribution itself is also taken into account.

● Results: by the greedy allocation strategies

[Figure: two plots over prediction error rate (10%, 20%, 30%, 50%), each comparing No.3 strategy (without runtime algorithm), No.4 strategy (with runtime algorithm), DongYuan's strategy, and No.5 strategy. First plot: Total Data Movement Amount. Second plot: Total Time Consumed by Movement. Both vertical axes run from 1800 to 2500.]
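The runtime placement rule on this slide (save a new dataset where its dependency is highest, while accounting for movement cost) might be sketched as follows. Combining the two terms by simple subtraction is an assumption, as are the function name `choose_data_center` and all of the numbers.

```python
def choose_data_center(new_size, dependency, stored_at, move_cost_per_unit):
    """Pick a data center for a newly generated dataset.

    dependency:         {existing_dataset: dependency with the new dataset}
    stored_at:          {existing_dataset: data_center}
    move_cost_per_unit: {data_center: cost to move one size unit there}
    Score (assumed form) = summed dependency at the center
                           - cost of moving the new dataset there.
    """
    score = {dc: -new_size * c for dc, c in move_cost_per_unit.items()}
    for dataset, dep in dependency.items():
        score[stored_at[dataset]] += dep
    return max(score, key=score.get), score


best, score = choose_data_center(
    new_size=2,
    dependency={"d1": 5, "d2": 3, "d3": 4},
    stored_at={"d1": "dc1", "d2": "dc2", "d3": "dc2"},
    move_cost_per_unit={"dc1": 0.0, "dc2": 0.5},
)
```

Here `dc2` wins even though moving the data there costs more, because two dependent datasets already live on it.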

Page 7

2. Task Scheduling and Virtual Machine Allocation

● Objective:
— Distribute tasks with strong data dependences to servers on a high-bandwidth connection, and turn off some of these servers that have low utilization
— Therefore:
• the response time can be reduced
• system-wide utilization can be improved
• some idle network devices can also be turned off

● Task Clustering by
— Hypergraph partitioning
— BEA Transformation

Efficient & Energy Saving!

Page 8

2. Task Scheduling and Virtual Machine Allocation

● Requirements of tasks
— Storage requirement
— Computing resource requirement: represented by VMs.

● Task scheduling and deadline constraint:

$$VM_x\ (1 \le x \le m):\ (VC_x^{cpu},\ VC_x^{mem})$$

$$t_i\ (1 \le i \le n):\ (VM_i,\ WCET_i,\ T_i^{deadline})$$

Decrease the number of VMs as much as possible, while ensuring users’ Service Level Agreements.
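One simple way to keep the VM count low while honoring deadlines is an earliest-deadline-first packing heuristic, sketched below. It is not claimed to be the method behind the slides, and it does not guarantee the minimum number of VMs. The sketch assumes all tasks are ready at time 0, that VMs are identical, and that each VM runs its tasks sequentially without preemption. The tuple layout `(name, wcet, deadline)` is illustrative.

```python
def pack_tasks_edf(tasks):
    """Greedy VM packing for tasks given as (name, wcet, deadline).

    Assumptions (not from the slides): all tasks are ready at time 0, each VM
    runs its tasks one after another without preemption, and VMs are identical.
    Tasks are taken in earliest-deadline-first order and appended to the first
    VM on which they still finish by their deadline; otherwise a new VM opens.
    """
    vms = []  # each entry: [busy_until, [task names]]
    for name, wcet, deadline in sorted(tasks, key=lambda t: t[2]):
        if wcet > deadline:
            raise ValueError(f"task {name!r} cannot meet its deadline")
        for vm in vms:
            if vm[0] + wcet <= deadline:
                vm[0] += wcet
                vm[1].append(name)
                break
        else:
            vms.append([wcet, [name]])
    return vms


tasks = [("t1", 2, 4), ("t2", 3, 4), ("t3", 1, 5), ("t4", 2, 8)]
vms = pack_tasks_edf(tasks)
```

Because tasks are appended in deadline order, adding a task never delays one that was placed earlier, so every accepted task keeps meeting its deadline.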

Page 9

2. Task Scheduling and Virtual Machine Allocation

● Physical machine allocation
— Optimization objective: energy efficiency, high-bandwidth networks, load balance
— Greedy Strategy:
• Each server’s energy efficiency
• TRD (Task Requirement Degree)
• Top-Down & Bottom-up: reduce data transfers, and improve network utilization
• Load balance
— Constraint conditions: storage capacity, CPU and memory constraints
— Other Methods: Genetic algorithms, PSO algorithms

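Optimal utilization level in terms of performance-per-watt: $Opt_y$. Commonly, $Opt_y \approx 70\%$.

Server utilization (CPU demand of the hosted VMs over the server's CPU capacity):

$$Util_y = \frac{\sum_{x} RQ_x^{CPU}}{C_y^{CPU}} \times 100\%, \quad x \in \{1, \dots, m\} \text{ hosted on server } y$$

Task Requirement Degree:

$$TRD_y = \begin{cases} 0, & Util_y = 0 \ \text{or}\ Util_y \ge Opt_y \\ [\text{expression lost in the transcript}], & 0 < Util_y < Opt_y \end{cases}$$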
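As a minimal sketch of the utilization measure on this slide, the snippet below computes a server's CPU utilization as a percentage of its capacity and checks it against the roughly 70% optimal level. It assumes utilization is the summed CPU demand of the VMs hosted on the server divided by that server's CPU capacity. The function name and the numbers are hypothetical, and the TRD value itself is not computed.

```python
def cpu_utilization(vm_cpu_requests, server_cpu_capacity):
    """Util_y in percent: summed CPU demand of hosted VMs over server capacity."""
    return 100.0 * sum(vm_cpu_requests) / server_cpu_capacity


OPT_UTIL = 70.0  # optimal utilization level quoted on the slide, in percent

# Hypothetical server with 16 CPU units hosting three VMs.
util = cpu_utilization([2, 4, 1], server_cpu_capacity=16)

# A server that is switched on but below the optimal level can take more tasks.
has_headroom = 0 < util < OPT_UTIL
```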

Page 10

3. Online Scheduling

● Problems:
— How to schedule the tasks in a fine-grained workflow?
— How to deal with variable conditions at runtime?

● Reinforcement learning based methods

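$$R(h) = \sum_{t=1}^{T} \gamma^{t-1}\, r(s_t, a_t, s_{t+1})$$

$$J(\theta) = \int p(h \mid \theta)\, R(h)\, dh$$

$$p(h \mid \theta) = p(s_1) \prod_{t=1}^{T} p(s_{t+1} \mid s_t, a_t)\, \pi(a_t \mid s_t, \theta)$$

Here $h$ denotes a trajectory and $\gamma$ the discount factor. The goal of RL is to find the optimal policy parameter

$$\theta^* = \arg\max_{\theta} J(\theta)$$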

[Figure: agent-environment interaction loop. The agent applies action a to the environment, which returns state s and reward r.]
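The expected-return objective on this slide can be approximated by sampling: roll out the policy several times, compute each trajectory's discounted return, and average. The sketch below does this for a toy stand-in environment and then picks the best parameter from a small grid. The grid search is only a crude placeholder for the argmax, not a reinforcement learning algorithm from the slides, and `toy_rollout`, `estimate_J`, and the candidate values are all hypothetical.

```python
import random


def discounted_return(rewards, gamma):
    """R(h) = sum over t of gamma**(t-1) * r_t for one trajectory h."""
    return sum(gamma ** t * r for t, r in enumerate(rewards))


def estimate_J(sample_rewards, theta, gamma, n_trajectories, seed=0):
    """Monte Carlo estimate of J(theta): average return over sampled trajectories.

    sample_rewards(theta, rng) is a hypothetical callable that rolls out the
    policy with parameter theta and returns the list of rewards it collected.
    """
    rng = random.Random(seed)
    returns = [
        discounted_return(sample_rewards(theta, rng), gamma)
        for _ in range(n_trajectories)
    ]
    return sum(returns) / n_trajectories


def toy_rollout(theta, rng, horizon=5):
    """Toy stand-in environment: reward is highest when the noisy action is near 1."""
    return [-(theta + rng.gauss(0, 0.1) - 1.0) ** 2 for _ in range(horizon)]


# Crude policy search: pick the best theta from a small candidate grid.
candidates = [0.0, 0.5, 1.0, 1.5]
best_theta = max(candidates, key=lambda th: estimate_J(toy_rollout, th, 0.9, 200))
```

In an online scheduler the rollout would be replaced by observed task executions, with the reward reflecting, for example, response time or energy use.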

Page 11

3. Online Scheduling

Example: Cart-Pole Swing-up

● Task: swing up the pole by moving the cart

● State (2-D continuous): angle $\varphi \in [0, 2\pi]$ and angular velocity $\dot{\varphi} \in [-3\pi, 3\pi]$ of the pole

● Action (1-D continuous): force applied to the cart

● Reward: $r(s_t, a_t, s_{t+1}) = \cos(\varphi_{t+1})$
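The reward for this example is simple enough to write down directly. The sketch below assumes the angle is measured in radians with 0 meaning the pole is upright, and that the state ranges are [0, 2π] for the angle and [−3π, 3π] for the angular velocity. The helper names are illustrative.

```python
import math


def reward(next_angle):
    """r(s_t, a_t, s_{t+1}) = cos(angle at t+1); angle 0 is taken as upright."""
    return math.cos(next_angle)


def clip_state(angle, velocity):
    """Keep the state inside the assumed ranges [0, 2*pi) and [-3*pi, 3*pi]."""
    angle = angle % (2 * math.pi)  # wrap the angle around the circle
    velocity = max(-3 * math.pi, min(3 * math.pi, velocity))  # saturate velocity
    return angle, velocity
```

The reward is 1 when the pole points straight up and −1 when it hangs straight down, so maximizing the return pushes the policy toward swinging the pole up and holding it there.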

Page 12

Thank you for your time!