
Page 1: COST MINIMIZATION FOR BIG DATA PROCESSING IN GEO- …cssongguo/papers/bigdata14-ppt.pdf · 2014. 12. 2. · expected task completion time in closed form. We explore the big data placement

COST MINIMIZATION FOR BIG DATA PROCESSING IN GEO-DISTRIBUTED DATA CENTERS

Song Guo

The University of Aizu

Homepage: http://www.u-aizu.ac.jp/~sguo

Email: [email protected]


System Model

• Topology:

– Geo-distributed data centers (DCs) interconnected by switches

• Cost:

– Inter-DC communication cost “CR” vs. intra-DC communication cost “CL”

– Server cost, incurred whenever a server is turned on

[Figure: topology of geo-distributed DCs, with links labeled CR and CL]


What Are the Problems?

• Where to put the data and computation? (Data & task placement)

– Same server: “0”

– Same DC: “CL” (0 < CL < CR)

– Different DCs: “CR”

• How to utilize the physical resources of servers?

– Server ON/OFF (data center resizing, DCR)

– Balancing storage and computation resources

• How to route the data transmission?

– What is the transmission rate?

– What is the transmission path? (Data flow routing)
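The per-unit cost rule above can be sketched in Python. This is a minimal illustration, assuming each server is identified by a hypothetical `Server(dc, name)` pair, that `c_local` and `c_remote` stand for CL and CR, and that the network cost of a flow is its data rate times the per-unit cost between its endpoints; the function names are illustrative, not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Server:
    dc: str    # data center the server belongs to (hypothetical label)
    name: str  # server name, unique within its DC


def unit_cost(src: Server, dst: Server, c_local: float, c_remote: float) -> float:
    """Per-unit transfer cost: 0 on the same server, CL within a DC, CR across DCs."""
    if not 0 < c_local < c_remote:
        raise ValueError("expected 0 < CL < CR")
    if src == dst:
        return 0.0
    if src.dc == dst.dc:
        return c_local
    return c_remote


def network_cost(flows, c_local: float, c_remote: float) -> float:
    """Total network cost of (source, destination, rate) flows."""
    return sum(rate * unit_cost(s, d, c_local, c_remote) for s, d, rate in flows)


a1, a2, b1 = Server("DC1", "s1"), Server("DC1", "s2"), Server("DC2", "s1")
flows = [(a1, a1, 3.0), (a1, a2, 2.0), (b1, a1, 1.0)]
print(network_cost(flows, c_local=1.0, c_remote=5.0))  # 0*3 + 1*2 + 5*1 = 7.0
```

Because CL < CR, any placement that keeps a flow inside one DC is never more expensive than the same flow crossing DCs, which is what makes placement and routing interact.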


General Problem Formulation

• What is our objective?

– To minimize the total cost: both server cost and network cost

• What are the constraints?

– Data and task placement

– Hadoop Distributed File System

– Data flow transmission

– QoS satisfaction

– 2D Markov Chain


Data and Task Placement

• Multiple copies of each data chunk, and at least one computation unit for each task, must be placed on servers

• The resources required on a server (storage, computation, etc.) must not exceed its capacity

• The total task rate over all servers shall equal the original user task rate

• If a storage or computation unit is placed on a server, that server must be turned on
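A minimal feasibility check for these four constraints, restricted to a single data chunk and its task, might look like the following. The dictionary layout, the names `check_placement` and `server_cost`, and the assumption that each copy occupies one storage unit are hypothetical simplifications, not the paper's formulation.

```python
import math


def check_placement(capacity, holders, compute, task_rate, on, copies, user_rate):
    """Return the placement constraints violated for one data chunk and its task.

    capacity:  server -> (storage capacity, computation capacity), in units
    holders:   set of servers storing a copy of the chunk (1 storage unit each)
    compute:   server -> computation units allocated to the task
    task_rate: server -> portion of the user task rate served there
    on:        set of servers that are turned on
    """
    violations = []
    if len(holders) != copies:
        violations.append(f"need {copies} copies, found {len(holders)}")
    if sum(compute.values()) < 1:
        violations.append("no computation unit placed")
    for s, (cap_storage, cap_compute) in capacity.items():
        used_storage = 1 if s in holders else 0
        used_compute = compute.get(s, 0)
        if used_storage > cap_storage or used_compute > cap_compute:
            violations.append(f"{s}: capacity exceeded")
        if (used_storage or used_compute) and s not in on:
            violations.append(f"{s}: hosts data or computation but is off")
    if not math.isclose(sum(task_rate.values()), user_rate):
        violations.append("task rates do not sum to the user task rate")
    return violations


def server_cost(on, cost_per_server):
    """Server cost is paid only for servers that are turned on."""
    return sum(cost_per_server[s] for s in on)


capacity = {"s1": (1, 2), "s2": (1, 2), "s3": (1, 2)}
good = check_placement(capacity, {"s1", "s2", "s3"}, {"s1": 2}, {"s1": 4.0},
                       on={"s1", "s2", "s3"}, copies=3, user_rate=4.0)
bad = check_placement(capacity, {"s1", "s2", "s3"}, {"s1": 3}, {"s1": 3.0},
                      on={"s1", "s2"}, copies=3, user_rate=4.0)
print(good)  # []
print(bad)   # s1 over capacity, s3 off while holding data, rates sum to 3.0
```

The last rule is what couples placement to server cost: every server that hosts a copy or a computation unit must be on, so spreading data thinly raises `server_cost`.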


Hadoop Distributed File System


• P-copy storage policy

• HDFS data distribution example (P=3)

[Figure: HDFS replica placement across Rack 1 to Rack 5]
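For illustration, the sketch below follows the default rack-aware rule HDFS documents for three replicas: the first on the writer's node, the second on a node in a different rack, and the third on another node in the same rack as the second. Whether the P-copy policy here matches this rule is an assumption; `place_replicas` and the rack layout are hypothetical, every rack is assumed to hold at least two nodes, and replicas beyond the third go to random unused nodes (a simplification of HDFS's own limits).

```python
import random


def place_replicas(racks, writer, p=3, rng=None):
    """Choose p distinct nodes for the replicas of one block (rack-aware sketch).

    racks:  rack name -> list of node names (each rack assumed to have >= 2 nodes)
    writer: node writing the block, assumed to be one of the nodes
    """
    rng = rng or random.Random(0)
    rack_of = {node: rack for rack, nodes in racks.items() for node in nodes}
    chosen = [writer]                                     # replica 1: writer's node
    if p >= 2:
        remote = rng.choice([r for r in racks if r != rack_of[writer]])
        chosen.append(rng.choice(racks[remote]))          # replica 2: a remote rack
    if p >= 3:
        peers = [n for n in racks[rack_of[chosen[1]]] if n not in chosen]
        chosen.append(rng.choice(peers))                  # replica 3: same rack as replica 2
    others = [n for n in rack_of if n not in chosen]
    chosen += rng.sample(others, p - len(chosen))         # further replicas: random unused nodes
    return chosen


racks = {f"Rack {i}": [f"r{i}n{j}" for j in range(1, 4)] for i in range(1, 6)}
replicas = place_replicas(racks, writer="r1n1", p=3, rng=random.Random(42))
print(replicas)
```

With P = 3 the block survives the loss of any single rack while only one copy crosses racks, which is the trade-off between reliability and the inter-rack cost discussed above.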


Data Flow Transmission

[Figure: storage and computation servers in Rack 1 to Rack N of DC 1 and DC 2, with link costs CL and CR]


Data Flow Transmission

• Only servers that hold a copy of the data can be flow source nodes

• The total outgoing flow from the source nodes shall not exceed the user request rate λ

• A destination receives all of its data from other servers only when it does not hold a copy of the data itself
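The three flow rules can be expressed as a checker over direct source-to-destination flows. This sketch assumes flows are given as hypothetical `(source, destination, rate)` triples, ignores multi-hop routing through switches, and reads the third rule as: a destination that holds a copy receives nothing over the network, while one without a copy must receive its entire demand from other servers.

```python
import math
from collections import defaultdict


def check_flows(flows, holders, demand, user_rate):
    """Check direct source-to-destination flows against the three flow rules.

    flows:     list of (source, destination, rate)
    holders:   set of servers storing a copy of the data
    demand:    destination -> data rate its computation needs
    user_rate: user request rate (lambda)
    """
    violations = []
    incoming = defaultdict(float)
    for src, dst, rate in flows:
        if src not in holders:
            violations.append(f"{src} sends data but holds no copy")
        incoming[dst] += rate
    if sum(rate for _, _, rate in flows) > user_rate + 1e-9:
        violations.append("total outgoing flow exceeds the user request rate")
    for dst, need in demand.items():
        # Assumed reading: a holder reads locally; a non-holder gets everything remotely.
        expected = 0.0 if dst in holders else need
        if not math.isclose(incoming[dst], expected, abs_tol=1e-9):
            violations.append(f"{dst}: receives {incoming[dst]}, expected {expected}")
    return violations


holders = {"s1", "s2"}
flows = [("s1", "s3", 1.5), ("s2", "s3", 0.5)]
print(check_flows(flows, holders, {"s3": 2.0, "s1": 1.0}, user_rate=3.0))  # []
```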


QoS Satisfaction

• Fluid flow model

– Pipelined transmission

– Computation starts as soon as the first chunk arrives

[Figure: pipelined transmission, with the bottleneck marked]
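A small numeric contrast shows why the bottleneck matters under the fluid model. Assuming hypothetical link rates along one path and ignoring propagation delay, pipelined transfer time is the data size divided by the slowest link rate, whereas store-and-forward transfer pays for every hop in turn.

```python
def pipelined_time(size, link_rates):
    """Fluid model: data streams through every link at the bottleneck (slowest) rate."""
    return size / min(link_rates)


def store_and_forward_time(size, link_rates):
    """Contrast: each hop receives the whole data before forwarding it."""
    return sum(size / rate for rate in link_rates)


rates = [8.0, 2.0, 4.0]  # hypothetical link rates along one path, e.g. in GB/s
print(pipelined_time(8.0, rates))          # 8 / 2 = 4.0
print(store_and_forward_time(8.0, rates))  # 1 + 4 + 2 = 7.0
```

Since computation can begin with the first chunk, only the bottleneck rate of the chosen path feeds into the effective transmission rate γ.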


2D Markov Chain

[Figure: cloud services connecting the User, Storage, and Computation, with data and results flows and rates λ, γ, and μ labeled]

• Step 1: User requests arrive with rate λ

• Step 2: Data is transmitted to the computation unit with rate γ

• Step 3: Computation is executed with rate μ

This process can be modeled by a 2D Markov chain.
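The three steps can be simulated directly as a continuous-time Markov chain. This sketch assumes a state (p, q), where p counts requests still waiting for their data and q counts requests whose data has arrived and that await computation, with exponential rates λ, γ, and μ and a single server per stage; the chain in the slides (indexed per server j and chunk k, with buffer B) may differ. The mean delay is recovered from the time-averaged number of tasks via Little's law.

```python
import random


def simulate_mean_delay(lam, gamma, mu, horizon=50_000.0, seed=1):
    """Simulate the (p, q) chain and return the mean delay via Little's law."""
    rng = random.Random(seed)
    t, p, q, area = 0.0, 0, 0, 0.0
    while t < horizon:
        # Only transitions enabled in state (p, q) are listed, with their rates.
        events = [("arrive", lam)]
        if p > 0:
            events.append(("transmit", gamma))
        if q > 0:
            events.append(("compute", mu))
        names = [name for name, _ in events]
        weights = [rate for _, rate in events]
        dt = rng.expovariate(sum(weights))   # time until the next transition
        area += (p + q) * dt                 # integral of the number of tasks over time
        t += dt
        kind = rng.choices(names, weights=weights)[0]
        if kind == "arrive":
            p += 1                           # Step 1: a user request arrives
        elif kind == "transmit":
            p, q = p - 1, q + 1              # Step 2: its data reaches the computation unit
        else:
            q -= 1                           # Step 3: computation finishes
    return (area / t) / lam                  # T = L / lambda


print(simulate_mean_delay(lam=1.0, gamma=2.0, mu=3.0))
```

With λ = 1, γ = 2, and μ = 3 the estimate should land close to 1.5, the value 1/(γ - λ) + 1/(μ - λ) gives for two exponential stages in tandem.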


2D Markov Chain

• 2D Markov chain

– User request rate λ

– Computation rate μ

– Data transmission rate γ

• Computation can happen only once the data has arrived

– The total system delay T is affected by λ, μ, and γ

– The computation rate μ depends on how much computation resource is allocated to each task

– The data transmission rate γ depends on the data flow path

– T must not exceed the QoS requirement
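To see how the QoS bound constrains μ, assume (as a simplification, not necessarily the closed form derived in the slides) that transmission and computation act as two exponential single-server stages in tandem, so the mean delay is T = 1/(γ - λ) + 1/(μ - λ). Requiring T ≤ D for a hypothetical delay bound D (`qos` below) then gives the smallest computation rate μ directly.

```python
import math


def mean_delay(lam, gamma, mu):
    """Mean delay of two exponential single-server stages in tandem (infinite if unstable)."""
    if lam >= gamma or lam >= mu:
        return math.inf
    return 1.0 / (gamma - lam) + 1.0 / (mu - lam)


def min_compute_rate(lam, gamma, qos):
    """Smallest mu with mean_delay(lam, gamma, mu) <= qos, or None if no mu suffices."""
    if lam >= gamma:
        return None
    slack = qos - 1.0 / (gamma - lam)   # delay budget left after data transmission
    if slack <= 0:
        return None
    return lam + 1.0 / slack


mu = min_compute_rate(lam=1.0, gamma=3.0, qos=1.0)
print(mu, mean_delay(1.0, 3.0, mu))  # 3.0 1.0
```

A slower data path (smaller γ) eats into the delay budget, so more computation resource (larger μ) is needed; once 1/(γ - λ) ≥ D, no amount of computation can meet the bound, and the data must be placed or routed differently.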


QoS Satisfaction

• By solving the ODEs, we can derive the state probability π_jk(p, q) as:

• When B goes to infinity, the mean number of tasks T_jk for chunk k on server j is:

• Finally,
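At steady state, the Kolmogorov forward ODEs reduce to balance equations: the probability flow into each state equals the flow out. Under the same simplified tandem model (infinite buffer, i.e. B → ∞, with ρ1 = λ/γ < 1 and ρ2 = λ/μ < 1), a product-form candidate π(p, q) = (1 - ρ1)ρ1^p (1 - ρ2)ρ2^q can be checked numerically against those equations, and its mean number of tasks compared with ρ1/(1 - ρ1) + ρ2/(1 - ρ2). This is an illustrative check under stated assumptions, not the derivation of π_jk(p, q) in the slides.

```python
def product_form(lam, gamma, mu):
    """Candidate stationary distribution pi(p, q) for the infinite-buffer tandem chain."""
    r1, r2 = lam / gamma, lam / mu
    if not (r1 < 1 and r2 < 1):
        raise ValueError("need lam < gamma and lam < mu")
    return lambda p, q: (1 - r1) * r1**p * (1 - r2) * r2**q


def balance_residual(lam, gamma, mu, p, q):
    """Probability inflow minus outflow at state (p, q); zero when d(pi)/dt = 0."""
    pi = product_form(lam, gamma, mu)
    outflow = pi(p, q) * (lam + (gamma if p > 0 else 0.0) + (mu if q > 0 else 0.0))
    inflow = pi(p, q + 1) * mu                # from (p, q+1): a computation finishes
    if p > 0:
        inflow += pi(p - 1, q) * lam          # from (p-1, q): a request arrives
    if q > 0:
        inflow += pi(p + 1, q - 1) * gamma    # from (p+1, q-1): data finishes transmitting
    return inflow - outflow


def mean_tasks(lam, gamma, mu, cutoff=200):
    """Mean number of tasks, summing (p + q) * pi(p, q) over a truncated grid."""
    pi = product_form(lam, gamma, mu)
    return sum((p + q) * pi(p, q) for p in range(cutoff) for q in range(cutoff))


worst = max(abs(balance_residual(1.0, 2.0, 3.0, p, q)) for p in range(10) for q in range(10))
print(worst < 1e-12, mean_tasks(1.0, 2.0, 3.0))  # mean tasks should be close to 1 + 0.5 = 1.5
```

Dividing the mean number of tasks by λ (Little's law) turns it into a mean delay that can be compared against the QoS bound.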


Notations


Formulation

[Figure: full problem formulation, with constraint groups labeled Data & Request Placement, Data Flow Transmission, and QoS Satisfaction]


Performance Evaluation


Performance Evaluation


• Our proposal outperforms the traditional mechanism in all evaluated settings

• Our proposal reduces the overall cost by approximately 20% compared with the traditional “locate computation with data” mechanism


Contributions

• We propose a two-dimensional Markov chain and derive the expected task completion time in closed form. We explore the big data placement problem to answer the following questions:

– a) how to place the data chunks on the servers,

– b) how to distribute tasks onto servers without violating the resource constraints, and

– c) how to resize data centers to minimize the operation cost.

Previous works focus only on the “locate computation with data” policy, but we show that jointly considering data and computation location yields better cost minimization.
