Department of Computer Science, Jinan University, Guangzhou, P.R. China

Lijun Lyu, Junjie Xie, Yuhui Deng, Yongtao Zhou

ICA3PP 2014: The 14th International Conference on Algorithms & Architectures for Parallel Processing. August 24-27, Dalian, China.

• Motivation

• Challenges

• Related work

• Our idea

• System architecture

• Evaluation

• Conclusion

• The Explosive Growth of Data
  - IDC: 1,800 EB of data in 2011, 40-60% annual increase
  - Larger data centers. Google: 19 data centers, > 1 million servers
  - Higher traffic. Cisco forecasts that annual traffic in global data centers will nearly triple over the next 5 years and reach 7.7 ZB by the end of 2017

[Figure: Google Data Center]

• Data Center Network
  - Node increment: Scalability?
  - Failures are common: Fault tolerance?
    - Google MapReduce in a 4,000-node cluster: 5 nodes fail during a job; 1 disk fails every 6 hours
  - Bandwidth-hungry services: Network capacity?
    - Infrastructure services: MapReduce, GFS, …
    - Network applications: Cloud disk, Video, …

• Tree-based Structure
  - Traditional tree: bandwidth bottleneck, single points of failure, expensive
  - Modified tree (Fat-tree): high capacity, limited scalability

[Figures: Traditional Tree-based Structure; Fat-tree]

• Other novel, hybrid network structures
  - Physical topology
    - Level-based, but not tree-based
    - Recursively defined
  - Routing mechanism
    - No routers, without the traditional Internet routing mechanism
    - Put routing intelligence on servers
    - Take advantage of structural properties
  - Typical structures: DCell, FiConn, BCube, Totoro…

• Physical structures

[Figures: DCell; Totoro; FiConn; BCube]

• Routing mechanisms

Compared structures, left to right: DCell, Totoro, FiConn, BCube. Values in each row are listed left to right; a value may span several adjacent structures.

  Core idea:          Divide-and-Conquer | Correct different address digits
  Calculation:        Hop by hop | Full path
  Link state:         Broadcast domain | Path probing
  Path selection:     Dijkstra + Rerouting | Greedy | Available one
  Traffic-aware:      No mention | Yes | No mention
  Shortest distance:  No | Yes

• What we achieve: Athena Routing Mechanism (ARM)
  - Routing algorithm
    - Based on Dynamic Programming
    - Finds the shortest path with lower complexity than classic algorithms
    - Supports multi-path
  - Path probing mechanism
    - Bypasses the failed nodes & links
    - Traffic-aware
  - Properties: more resilient, shorter latency, higher capacity, lower complexity

• Athena Routing Mechanism
  - Implemented on the structure of Totoro
  - Compared with the original Totoro Fault-tolerant Routing Algorithm (TFR) and the Shortest Path Algorithm (SPA, based on Floyd-Warshall)
  - Applicable to DCell, FiConn, BCube…
    - Similar topology: level-based, recursively defined
    - Put routing intelligence on servers

• Totoro
  - Two-port servers
  - Low-end switches
  - Level-based
  - Recursively defined

[Figure: Totoro Structure of One Level; each server has a two-port NIC]

• Building Totoro
  - Connect N servers to an N-port switch (here, N = 4)
  - Basic partition: Totoro0
  - Intra-switch

[Figure: A Totoro0 Structure]

• Building Totoro
  - Available ports in Totoro0: c (here, c = 4)
  - Connect n Totoro0s to n-port switches by using c/2 ports
  - Inter-switch
  - A Totoro1 structure consists of n Totoro0s.

• Building Totoro
  - Connect n Totoro_(i-1)s to n-port switches to build a Totoro_i
  - Recursively defined
  - Half of the available ports are used ⇒ Open & Scalable
  - The number of paths among Totoro_i s is n/2 times the number of paths among Totoro_(i-1)s ⇒ Multi-redundant links ⇒ High network capacity
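The construction rule can be turned into a small size calculation. The sketch below is only one reading of the slides: it assumes that every Totoro_(i-1) spends half of its free server ports on level-i switches, one port per switch, and that the port counts stay even. The function name `totoro_size` is hypothetical and not part of the original work.

```python
def totoro_size(N, n, K):
    """Return (servers, switches_per_level) for one Totoro_K.

    Assumed reading of the slides: a Totoro_0 has N servers with one free
    port each, and every level uses half of the free ports of each
    sub-partition, one port per n-port switch.
    """
    servers = N * n ** K
    switches = [n ** K]          # level 0: one N-port switch per Totoro_0
    free_ports = N               # free server ports in one Totoro_0
    for level in range(1, K + 1):
        used = free_ports // 2   # ports each Totoro_(level-1) contributes
        # `used` switches per Totoro_level, n ** (K - level) such partitions
        switches.append(used * n ** (K - level))
        free_ports = n * (free_ports - used)   # ports left in one Totoro_level
    return servers, switches


print(totoro_size(4, 4, 2))   # (64, [16, 8, 4])
```

For N = 4, n = 4, K = 2 this gives 64 servers, 16 level-0 switches, 8 level-1 switches and 4 level-2 switches, which agrees with the labels in the Totoro2 figure (level-1 switches 1-0, 0 to 1-3, 1 and level-2 switches 2-0 to 2-3).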

[Figure: Totoro2 structure with N = 4, n = 4, K = 2. Servers are labeled with three-digit addresses from 0, 0, 0 to 3, 3, 3; level-1 switches are labeled 1-0, 0 to 1-3, 1; level-2 switches are labeled 2-0 to 2-3. Legend: Level-1 Link, Level-2 Link.]

• Athena Routing Algorithm (ARA)
  - Based on Dynamic Programming (DP)
  - DP is applicable to problems which exhibit the properties of
    - Overlapping subproblems
    - Optimal substructure
  - Recursively calculated

Steps of ARA:
1. Suppose src and dst belong to two partitions.
2. Get all paths connecting these two partitions.
3. For each path, recursively calculate it.
4. Store all paths.
5. Sort all paths by length.
6. Remove the extra paths.

[Figure: ARA pseudocode, annotated with "This function is based on the corresponding structural properties." and "Cartesian product"]
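The six steps can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' implementation: the wiring rule in `switch_of`, the helper `servers_in`, and the constant `MAX_PATHS` are hypothetical and were chosen only so that a 64-server Totoro2 (N = 4, n = 4) can be built with two-port servers. Paths are listed server by server, with switches omitted.

```python
from functools import lru_cache
from itertools import product

K, N, n = 2, 4, 4      # Totoro2 with N = 4, n = 4; addresses are (a2, a1, a0)
MAX_PATHS = 2          # size of the result set kept by every call


def switch_of(server, level):
    """HYPOTHETICAL wiring: id of the level-`level` switch a server uses, or None."""
    a2, a1, a0 = server
    if level == 1 and a0 % 2 == 0:
        return (1, a2, a0 // 2)   # joins one server from each Totoro0 of a Totoro1
    if level == 2 and a0 == 1:
        return (2, a1)            # joins one server from each Totoro1
    return None


def servers_in(prefix):
    """All server addresses whose leading digits equal `prefix`."""
    free = K + 1 - len(prefix)
    ranges = [range(n)] * (free - 1) + [range(N)]
    return [prefix + rest for rest in product(*ranges)]


@lru_cache(maxsize=None)          # reuse overlapping sub-problems
def route(src, dst):
    """Return up to MAX_PATHS paths from src to dst, shortest first."""
    if src == dst:
        return ((src,),)
    d = next(i for i in range(K + 1) if src[i] != dst[i])
    level = K - d
    if level == 0:                # same Totoro0: one hop through its switch
        return ((src, dst),)
    # Step 1: src and dst lie in two different sub-partitions.
    src_part, dst_part = src[:d + 1], dst[:d + 1]
    # Step 2: collect every link (u, v) that connects the two sub-partitions.
    gateway = {}
    for u in servers_in(src_part):
        sw = switch_of(u, level)
        if sw is not None:
            gateway[sw] = u
    found = []
    for v in servers_in(dst_part):
        u = gateway.get(switch_of(v, level))
        if u is None:
            continue
        # Steps 3-4: solve both sides recursively, join by Cartesian product.
        for head, tail in product(route(src, u), route(v, dst)):
            found.append(head + tail)
    # Steps 5-6: sort by length and drop the extra paths.
    found.sort(key=len)
    return tuple(found[:MAX_PATHS])
```

Under this assumed wiring, `route((0, 0, 0), (3, 2, 3))[0]` is the four-hop path (0, 0, 0) → (0, 2, 0) → (0, 2, 1) → (3, 2, 1) → (3, 2, 3). The `lru_cache` decorator stands in for the stored sub-results that make the recursion a dynamic program.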

• Case study of ARA: work out the path from src to dst
  - Step 1: src and dst belong to two different sub-partitions.
  - Step 2: There exist two paths between these two sub-partitions.
  - Step 3: For Path 1, recursively work out the sub-paths in these sub-partitions, and join them into a full path.
  - Step 4: Similarly, work out the full path for Path 2.
  - Step 5: Add all paths to the result set.
  - Step 6: Sort the paths by length.
  - Step 7: Remove the extra paths (here, we suppose the size of the set to return is 1, i.e., only the shortest path is kept).

• Path Probing Mechanism
  - The source host sends the probing request packets
  - The destination host sends the probing reply packets
  - Intermediate servers record the link capacities in the probing packets and forward them
  - Detects the failed paths: no extra rerouting technique is required
  - Detects the link capacity: supports load balancing…
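A rough simulation of the probing idea is given below. It is a sketch under stated assumptions: the slides do not say how the source chooses among the probed paths, so picking the path with the largest bottleneck capacity (ties broken by fewer hops) is an assumption, and the names `probe`, `select_path`, and the `capacity` table are hypothetical.

```python
def probe(path, capacity):
    """Bottleneck capacity recorded along `path`, or None if a link is down.

    `capacity` maps frozenset({u, v}) to the available capacity of link u-v;
    a missing entry or a value of 0 models a failed link (assumption).
    """
    recorded = []
    for u, v in zip(path, path[1:]):
        c = capacity.get(frozenset((u, v)), 0)
        if c <= 0:
            return None               # the request is lost, no reply comes back
        recorded.append(c)            # each intermediate server adds its record
    return min(recorded) if recorded else float("inf")


def select_path(candidates, capacity):
    """Pick a live candidate path; returns None when every probe fails."""
    replies = [(probe(p, capacity), p) for p in candidates]
    alive = [(c, p) for c, p in replies if c is not None]
    if not alive:
        return None
    # Assumed policy: largest bottleneck capacity first, then fewer hops.
    return max(alive, key=lambda r: (r[0], -len(r[1])))[1]


links = {
    frozenset(("A", "B")): 100,
    frozenset(("B", "D")): 0,         # failed link
    frozenset(("A", "C")): 40,
    frozenset(("C", "D")): 60,
}
chosen = select_path([("A", "B", "D"), ("A", "C", "D")], links)
```

Here `chosen` is the path through C, because the probe along A, B, D never returns. Failed paths drop out of the candidate set this way, so no separate rerouting step is needed.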


• Protocol Implementation
  - ARM packet format

[Figures: Path-probing packet; Data packet]

• Protocol Implementation
  - Protocol: 2.5-layer protocol
  - How does an intermediate server determine the next hop?
    - A fact: two adjacent servers in a path differ in only one "bit".
    - Hence, we only store the differing "bit"s in the vector.
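The compact path vector can be illustrated as follows. The sketch assumes that a "bit" means one digit of the server address and that each vector entry stores the digit position together with its new value; the actual ARM packet layout is not given in this transcript, so `encode` and `next_hop` are hypothetical helpers.

```python
def encode(path):
    """Compress a server path into (digit position, new value) pairs."""
    vector = []
    for u, v in zip(path, path[1:]):
        diff = [i for i in range(len(u)) if u[i] != v[i]]
        if len(diff) != 1:
            raise ValueError("adjacent servers must differ in exactly one digit")
        vector.append((diff[0], v[diff[0]]))
    return vector


def next_hop(current, vector, hop_index):
    """Address of the next server, rebuilt from the current one and the vector."""
    pos, value = vector[hop_index]
    return current[:pos] + (value,) + current[pos + 1:]


path = ((0, 0, 0), (0, 2, 0), (0, 2, 1), (3, 2, 1), (3, 2, 3))
vector = encode(path)        # [(1, 2), (2, 1), (0, 3), (2, 3)]

# An intermediate server only needs its own address and the hop index.
server, rebuilt = path[0], [path[0]]
for hop in range(len(vector)):
    server = next_hop(server, vector, hop)
    rebuilt.append(server)
```

After the loop, `rebuilt` matches the original path, so the vector carries one small entry per hop instead of a full address.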

• Evaluating Path Failure & Average Path Lengths
  - ARM vs. TFR vs. SPA
  - TFR: the original Totoro Fault-tolerant Routing algorithm
  - SPA: Shortest Path Algorithm (Floyd-Warshall), used as the performance bound
• Evaluating Resource Usage

• Evaluating Path Failure & Average Path Lengths: experimental parameters

  Types of failures     Link, node, switch & rack failures
  Platform              Totoro2 (4096 servers)
  Failure ratios        2% - 20%
  Communication mode    All-to-all
  Simulation runs       20
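As a rough illustration of the SPA baseline and the two metrics, the sketch below runs Floyd-Warshall on a toy graph after some links have been removed. The metric definitions are assumptions, since the slides do not spell them out: path failure ratio is taken as the fraction of ordered node pairs with no valid path, and average path length as the mean hop count over the remaining pairs. The 4-node ring is a stand-in, not a Totoro structure.

```python
from itertools import product

INF = float("inf")


def floyd_warshall(nodes, links):
    """All-pairs hop counts over undirected, unit-cost links."""
    dist = {(u, v): (0 if u == v else INF) for u, v in product(nodes, repeat=2)}
    for u, v in links:
        dist[u, v] = dist[v, u] = 1
    # product() varies the first element slowest, so k is the outer loop.
    for k, i, j in product(nodes, repeat=3):
        if dist[i, k] + dist[k, j] < dist[i, j]:
            dist[i, j] = dist[i, k] + dist[k, j]
    return dist


def summarize(nodes, links):
    """Return (path failure ratio, average path length) for all-to-all traffic."""
    dist = floyd_warshall(nodes, links)
    pairs = [(u, v) for u, v in product(nodes, repeat=2) if u != v]
    reachable = [dist[p] for p in pairs if dist[p] < INF]
    failure_ratio = 1 - len(reachable) / len(pairs)
    average = sum(reachable) / len(reachable) if reachable else INF
    return failure_ratio, average


nodes = [0, 1, 2, 3]
ring = [(0, 1), (1, 2), (2, 3), (3, 0)]
healthy = summarize(nodes, ring)
damaged = summarize(nodes, [(0, 1), (2, 3)])   # links 1-2 and 3-0 have failed
```

In the damaged case the average path length drops to 1 hop while two thirds of the pairs have no path at all, which shows why a higher path failure ratio can come with a shorter average path length.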

• Evaluating Path Failure: path failure ratio vs. server/rack failure ratio
  - The performance of ARM and TFR is almost identical to that of SPA!

• Evaluating Path Failure: path failure ratio vs. switch failure ratio
  - The performance of ARM is almost identical to that of SPA!
  - But that of TFR isn't.

• Evaluating Path Failure: path failure ratio vs. link failure ratio
  - When the link failure ratio is high:
    - ARM achieves slightly better capacity than TFR.
    - A performance gap between ARM and SPA still exists!
  - SPA traverses all feasible links in the whole structure until it finds a valid path!
  - This is a tradeoff that ARM makes to reduce algorithmic complexity and save computation resources.

• Evaluating Average Path Lengths

ARM:
1. Better than TFR.
2. Almost identical to SPA.
3. Where ARM is shorter than SPA, this is because the path failure ratio of ARM is a bit higher than that of SPA, so our total path length is shorter.

• Evaluating Resource Usage: experimental parameters

  Testbed               Lenovo T350, quad-core, 8 GB memory
  Platform              Totoro2 (4096 servers)
  Size of each result   10 paths
  Communication mode    One-to-all in 4 Totoro1s

• Evaluating Resource Usage

[Figure: CPU usage over time; annotations: +10 nodes/s, 28%, 18 s, 0%]

CPU:
1. Load increases by 10 nodes per second
2. Peak value of 28% at 18 s
3. Benefits from the cache

Memory: each host costs at most 164 KB.

• More resilient
• Shorter latency
• Higher capacity
• Lower complexity
• In future work, we will focus on the implementation of ARM in DCell, FiConn and other structures!
