Australia/China SKA Big Data Workshop, April 10th, 2017. Fundamental Methods and Heuristics for Massive Scale Data Distribution. Xiaoying Zheng, Shanghai Advanced Research Institute (SARI), Chinese Academy of Sciences


Page 1: Fundamental Methods and Heuristics for Massive Scale Data

Australia/China SKA Big Data Workshop, April 10th, 2017

Fundamental Methods and Heuristics for Massive Scale Data Distribution

Xiaoying Zheng

Shanghai Advanced Research Institute (SARI), Chinese Academy of Sciences

Page 2: Fundamental Methods and Heuristics for Massive Scale Data

Australia/China SKA Big Data Workshop, 2017/04/10

About SARI

• Established in 2009

• Research areas

– frontier studies and advanced manufacturing

– information technology

– space technology

– energy and environment

– health sciences

[Figure: map with the labels "SARI", "Shanghai Tech. edu" and "Shanghai SRF", and the note "We're here".]

Page 3: Fundamental Methods and Heuristics for Massive Scale Data


About Me

• Associate Professor at SARI, CAS

• Research focus

– Networking

modeling and performance evaluation of networks; peer-to-peer networks; content and service distribution; congestion control; network flow control and routing

– Cloud Computing

resource allocation and scheduling


Page 4: Fundamental Methods and Heuristics for Massive Scale Data

Background

• SKA data will be transferred across the globe on a Tbit/s scale

• The network connecting SKA data centers is expected to be more stable than the Internet, and moderate in size by comparison

• The capacity bottleneck can equally be at the core or at the edge

• Proposed approaches

– Swarming

– Back-pressure based multipath routing

Page 5: Fundamental Methods and Heuristics for Massive Scale Data

Outline

• Approach 1: Swarming

• Approach 2: Back-pressure based multipath routing

• FuNet: an SDN-based network testbed at SARI

Page 6: Fundamental Methods and Heuristics for Massive Scale Data

Approach 1: Swarming

• A big success for P2P file sharing

– Files are broken into many chunks

– Receivers help each other to receive chunks

• Apply swarming to infrastructure-based SKA data distribution

– More stable content servers and networks

Figure: A snapshot of swarming. The file chunks available at each host, H1 to H4, are shown by shaded boxes; e.g., H1 has chunks 1, 3, 8 and 10. The dashed lines show the current connections; e.g., H1 is downloading chunks 2 and 4 from H2, and chunk 11 from H3.
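The snapshot can be mimicked with a small sketch. Only H1's holdings come from the caption. The chunk sets assumed for H2 to H4, the 12-chunk file size, and the helper name `plan_downloads` are hypothetical. Choosing the least-loaded peer is one simple policy picked for illustration, not necessarily the one used in the talk.

```python
def plan_downloads(me, holdings, num_chunks):
    """Assign each chunk that `me` is missing to a peer that holds it.

    For every missing chunk, pick the peer with the fewest chunks already
    assigned (ties broken by host name). Chunks no peer holds are skipped.
    """
    missing = [c for c in range(1, num_chunks + 1) if c not in holdings[me]]
    load = {host: 0 for host in holdings if host != me}
    plan = {}
    for chunk in missing:
        owners = [host for host in load if chunk in holdings[host]]
        if not owners:
            continue  # nobody can serve this chunk yet
        peer = min(owners, key=lambda host: (load[host], host))
        plan[chunk] = peer
        load[peer] += 1
    return plan


holdings = {
    "H1": {1, 3, 8, 10},   # from the figure caption
    "H2": {2, 4, 5},       # assumed for illustration
    "H3": {5, 6, 11},      # assumed for illustration
    "H4": {7, 9, 12},      # assumed for illustration
}
plan = plan_downloads("H1", holdings, num_chunks=12)
print(plan)
```

With these assumed holdings, H1 fetches chunks 2 and 4 from H2 and chunk 11 from H3, as in the caption. Chunk 5, which both H2 and H3 hold, goes to the less loaded H3.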

Page 7: Fundamental Methods and Heuristics for Massive Scale Data

Swarming: transmit by multiple multi-cast trees

• Equivalent to using multiple Steiner trees to distribute different file chunks

– Nodes that are NOT interested in the file may also participate

– They help a resource-limited distribution session and improve distribution efficiency

• Proposed problem: determine an optimal set of distribution trees as well as the data rate on each tree
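One common way to write this problem down is a tree-packing linear program for throughput maximization, offered here as a sketch rather than as the talk's exact model. The symbols are assumptions: $\mathcal{T}$ is the set of candidate distribution trees rooted at the source and reaching every receiver, $x_t$ is the data rate on tree $t$, $E$ is the set of links, and $c_e$ is the capacity of link $e$.

```latex
\begin{aligned}
\max_{x} \quad & \sum_{t \in \mathcal{T}} x_t \\
\text{s.t.} \quad & \sum_{t \in \mathcal{T} :\, e \in t} x_t \le c_e \qquad \forall e \in E, \\
& x_t \ge 0 \qquad \forall t \in \mathcal{T}.
\end{aligned}
```

The objective is the total rate delivered to every receiver, and each constraint says the trees sharing a link cannot together exceed its capacity. Since the number of trees grows exponentially with network size, a program of this shape is normally solved without listing all of $\mathcal{T}$ explicitly.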

Page 8: Fundamental Methods and Heuristics for Massive Scale Data

Swarming: example

Figure: A swarming example. (a) Node 1 distributes a file to nodes 2 and 3; node 4 is an out-of-session node. The numbers next to the links are the link capacities. (b) All possible distribution trees and the optimal solution for throughput maximization. The boxes represent file chunks. Three of the trees involve the out-of-session node 4.

Page 9: Fundamental Methods and Heuristics for Massive Scale Data

Swarming: solutions

• The min-cost Steiner tree plays the role that the min-cost path plays in unicast

• Use one min-cost Steiner tree per time slot, and switch from tree to tree across slots

• Solutions:

– Approximate min-cost Steiner tree search + column generation

– Or random min-cost Steiner tree search
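The slides do not say which approximation is used for the min-cost Steiner tree search. As a minimal sketch, the code below uses a well-known shortest-path heuristic: grow the tree from the source and repeatedly attach the nearest remaining receiver along a shortest path. The graph, its edge costs, and the function names are hypothetical. In a column-generation scheme the edge costs would typically be the current link prices rather than fixed numbers.

```python
import heapq


def shortest_paths(graph, sources):
    """Multi-source Dijkstra: distance and predecessor from the nearest source."""
    dist = {s: 0.0 for s in sources}
    prev = {}
    heap = [(0.0, s) for s in sources]
    heapq.heapify(heap)
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, cost in graph[u].items():
            nd = d + cost
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    return dist, prev


def approx_steiner_tree(graph, source, receivers):
    """Return a set of undirected edges connecting `source` to all `receivers`."""
    tree_nodes = {source}
    tree_edges = set()
    remaining = set(receivers) - tree_nodes
    while remaining:
        dist, prev = shortest_paths(graph, tree_nodes)
        nearest = min(remaining, key=lambda r: (dist.get(r, float("inf")), r))
        if nearest not in dist:
            raise ValueError("receiver %r is unreachable" % (nearest,))
        # Walk back from the receiver until the existing tree is reached.
        node = nearest
        while node not in tree_nodes:
            tree_edges.add(frozenset((prev[node], node)))
            tree_nodes.add(node)
            node = prev[node]
        remaining -= tree_nodes
    return tree_edges


# Hypothetical 4-node network: node 1 is the source, 2 and 3 are receivers,
# and node 4 is an out-of-session relay with cheap links.
graph = {
    1: {2: 4, 3: 4, 4: 1},
    2: {1: 4, 3: 5, 4: 1},
    3: {1: 4, 2: 5, 4: 1},
    4: {1: 1, 2: 1, 3: 1},
}
tree = approx_steiner_tree(graph, source=1, receivers=[2, 3])
cost = sum(graph[a][b] for a, b in (tuple(edge) for edge in tree))
print(sorted(tuple(sorted(edge)) for edge in tree), cost)
```

With these assumed costs the heuristic routes both receivers through the out-of-session node 4, which is the effect the earlier slides describe: a node that does not want the file still lowers the cost of distributing it.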

Page 10: Fundamental Methods and Heuristics for Massive Scale Data

Outline

• Approach 1: Swarming

• Approach 2: Back-pressure based multipath routing

• FuNet: an SDN-based network testbed at SARI

Page 11: Fundamental Methods and Heuristics for Massive Scale Data

Approach 2: back-pressure based multi-path routing

• Motivation:

– Bandwidth on the network between SKA data centers is expensive

– Backbone traffic volumes vary over time

– Use the time-varying leftover bandwidth to transmit non-urgent data

Page 12: Fundamental Methods and Heuristics for Massive Scale Data

Back-pressure based multi-path routing

• Difficulty: the leftover bandwidth is time-varying and unknown

– Traditional solution: traffic prediction techniques

• Our solution: back-pressure based multi-path routing

– Data packets are temporarily stored at intermediate datacenters and forwarded to a neighbor node when spare residual bandwidth is available

– Balance the buffers of two adjacent datacenter nodes as much as possible by pushing data across the link between them using the residual bandwidth, where the buffer size is regarded as the pressure of the buffer
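The balancing rule can be sketched in a few lines. This is an illustration under assumptions, not necessarily the exact rule used by the author: data volumes are treated as real numbers rather than whole packets, and the function name `amount_to_push` is hypothetical.

```python
def amount_to_push(queue_u, queue_v, residual_bandwidth):
    """Data to send from node u to neighbor v in one time slot.

    Nothing is sent when v is at least as loaded as u. Otherwise moving
    half of the difference would equalize the two buffers, and the
    residual bandwidth caps how much of that can actually be moved.
    """
    pressure_gap = queue_u - queue_v
    if pressure_gap <= 0:
        return 0.0
    return min(pressure_gap / 2, residual_bandwidth)


print(amount_to_push(10, 2, 3))    # bandwidth-limited case
print(amount_to_push(10, 2, 100))  # plenty of bandwidth: buffers end up equal
print(amount_to_push(2, 10, 5))    # no push against the pressure gradient
```

No traffic forecast appears anywhere in the rule. Each link only needs the two buffer levels and whatever bandwidth happens to be left over in the current slot.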

Page 13: Fundamental Methods and Heuristics for Massive Scale Data

Example: step 1

• A queue is maintained at each data center node for each directed link and each transmission session; new packets are pushed into the queues at the source

Page 14: Fundamental Methods and Heuristics for Massive Scale Data

Example: step 2

• Push packets across each link so as to balance the queues of the link as much as possible

Page 15: Fundamental Methods and Heuristics for Massive Scale Data

Example: step 3

• Packets arrive at the next hop; packets that reach the sink are removed

Page 16: Fundamental Methods and Heuristics for Massive Scale Data

Example: step 4

• Packets are re-allocated between queues according to the expected leftover bandwidth
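The four steps can be put together in a toy simulation of a single session. Everything specific here is an assumption made for illustration: the diamond topology S→A→D and S→B→D, the constant bandwidth figures, the fluid (real-valued) data volumes, the proportional split in step 4, and the choice to compare a link queue with the total backlog of the downstream node.

```python
LINKS = [("S", "A"), ("S", "B"), ("A", "D"), ("B", "D")]  # hypothetical topology
SOURCE, SINK = "S", "D"


def backlog(queues, node):
    """Total data buffered at `node` across its outgoing-link queues."""
    return sum(q for (u, _), q in queues.items() if u == node)


def reallocate(queues, node, total, expected):
    """Split `total` over the node's link queues in proportion to the
    expected leftover bandwidth of each link (assumed positive)."""
    outgoing = [link for link in LINKS if link[0] == node]
    weight = sum(expected[link] for link in outgoing)
    for link in outgoing:
        queues[link] = total * expected[link] / weight


def run_slot(queues, arrivals, leftover, expected):
    """Advance one time slot and return the amount delivered to the sink."""
    # Step 1: new data enters the queues at the source.
    reallocate(queues, SOURCE, backlog(queues, SOURCE) + arrivals, expected)
    # Step 2: decide how much to push over each link, moving toward balance.
    # The sink has no queues, so its backlog (pressure) is always zero.
    sent = {}
    for (u, v) in LINKS:
        gap = queues[(u, v)] - backlog(queues, v)
        sent[(u, v)] = min(max(gap, 0.0) / 2, leftover[(u, v)])
    # Step 3: data arrives at the next hop; data reaching the sink leaves.
    delivered = 0.0
    received = {}
    for (u, v), amount in sent.items():
        queues[(u, v)] -= amount
        if v == SINK:
            delivered += amount
        else:
            received[v] = received.get(v, 0.0) + amount
    # Step 4: re-split each node's buffer according to expected bandwidth.
    for node in {u for (u, _) in LINKS}:
        total = backlog(queues, node) + received.get(node, 0.0)
        reallocate(queues, node, total, expected)
    return delivered


queues = {link: 0.0 for link in LINKS}
expected = {("S", "A"): 2.0, ("S", "B"): 1.0, ("A", "D"): 2.0, ("B", "D"): 1.0}
leftover = dict(expected)  # here the actual leftover equals its expectation
delivered_total = 0.0
for slot in range(3):
    delivered_total += run_slot(queues, arrivals=6.0, leftover=leftover,
                                expected=expected)
print(delivered_total, queues)
```

In a more realistic run `leftover` would change from slot to slot while `expected` stays a long-run estimate. The buffers at the intermediate nodes absorb the difference between the two.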

Page 17: Fundamental Methods and Heuristics for Massive Scale Data

Outline

• Approach 1: Swarming

• Approach 2: Back-pressure based multipath routing

• FuNet: an SDN-based network testbed at SARI

Page 18: Fundamental Methods and Heuristics for Massive Scale Data

FuNet: an SDN-based network testbed

• CAS built a 1G/10G network testbed connecting 15 cities with 30 hosts, including Beijing, Shanghai, Hefei, Zhengzhou

• Accessible from/to GENI

• Supports the Protocol-Oblivious Forwarding (POF) protocol

– An open-source SDN protocol stack developed by Huawei

• Able to support prototype development for SKA data transmission


Page 19: Fundamental Methods and Heuristics for Massive Scale Data


Thank you!

[email protected]