
Post on 26-Dec-2015


Load Balancing Tasks with Overlapping Requirements

Milan Vojnovic, Microsoft Research

Joint work with Dan Alistarh, Christos Gkantsidis, Jennifer Iglesias, Bo Zong

2

Motivating Application Scenario: Stream Processing Platforms

3

Tasks and Requirements

4

5

Problem #1: Bi-Criteria Load Balancing

Query Assignment Problem:

• Find an assignment of tasks to machines that

Criterion 1: minimizes the total number of distinct requirements that need to be supplied to machines

Criterion 2: balances the number of tasks assigned across machines

6

Problem #2: Min-Max Load Balancing

Query Assignment Problem:

• Find an assignment of tasks to machines that minimizes the maximum number of distinct requirements needed by a machine
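To make the two problem statements concrete, the following minimal sketch evaluates a given assignment against both objectives. It assumes unit-weight requirements; the function name `evaluate_assignment`, the dictionary layout, and the toy data are illustrative assumptions, not taken from the talk.

```python
def evaluate_assignment(assignment, requirements_of, num_machines):
    """Return (total, maximum, load) for an assignment of tasks to machines.

    total   = sum over machines of the number of distinct requirements needed
    maximum = largest number of distinct requirements needed by one machine
    load    = number of tasks assigned to each machine
    """
    needed = [set() for _ in range(num_machines)]
    load = [0] * num_machines
    for task, machine in assignment.items():
        needed[machine] |= requirements_of[task]  # overlapping requirements count once
        load[machine] += 1
    per_machine = [len(s) for s in needed]
    return sum(per_machine), max(per_machine), load


# Hypothetical toy instance: four tasks, four requirements, two machines.
requirements_of = {"q1": {"a", "b"}, "q2": {"b", "c"}, "q3": {"c", "d"}, "q4": {"a"}}

split_a = {"q1": 0, "q2": 0, "q3": 1, "q4": 1}
split_b = {"q1": 0, "q4": 0, "q2": 1, "q3": 1}

print(evaluate_assignment(split_a, requirements_of, 2))  # (6, 3, [2, 2])
print(evaluate_assignment(split_b, requirements_of, 2))  # (5, 3, [2, 2])
```

Both assignments are equally balanced in task count, but the second groups tasks with shared requirements and so needs fewer requirements in total.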

7

Other Motivating Application Scenarios

• Scheduling tasks in distributed clusters of machines with data locality

• …

• Beyond resource allocation in data centres:

• Clustering of information objects (documents, images, videos)

• Summarizing topics for collections of documents

• …

8

Related Work

Standard load balancing

• Identical machines: Graham-1996

• Related machines: Aspnes et al-1993, Cho and Sahni-1988

• Restricted machines: Azar et al-1992

• Unrelated machines: Aspnes et al-1993

• Routing: Aspnes et al-1993

Min-max multiway cut

• Bansal et al-2014

• Svitkina and Tardos 2004

9

Problem #1: Bi-Criteria Load Balancing

Minimize

subject to

for

$S$ = set of requirements, $Q$ = set of tasks, $S_q \subseteq S$ for every $q \in Q$

$f(Q') = \sum_{s \in S} w(s)\, \mathbf{1}(s \text{ is required by some } q \in Q')$
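The cost function f on this slide charges each requirement's weight once, no matter how many tasks in the subset need it. A minimal sketch of that computation follows; the function name `requirement_cost`, the dictionary representation, and the example weights are assumptions made for illustration.

```python
def requirement_cost(task_subset, requirements_of, weight):
    """f(Q'): total weight of the requirements needed by at least one task in Q'."""
    needed = set()
    for q in task_subset:
        needed |= requirements_of[q]  # a shared requirement is counted only once
    return sum(weight[s] for s in needed)


# Hypothetical example data.
requirements_of = {"q1": {"a", "b"}, "q2": {"b", "c"}, "q3": {"d"}}
weight = {"a": 1.0, "b": 2.0, "c": 1.0, "d": 5.0}

print(requirement_cost({"q1", "q2"}, requirements_of, weight))  # 4.0
```

Requirement "b" is needed by both tasks but contributes its weight once, which is what makes co-locating overlapping tasks attractive.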

10

NP Hardness

• The Query Assignment Problem is NP-complete

Proof: reduction from the well-known bin packing problem

11

Random Query Assignment

• Maximum number of tasks per machine:

with probability

[Raab and Steger, 1998]

• The expected number of requirements needed by the machines:

= number of tasks needing requirement
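The formulas on this slide did not survive extraction. As a hedged illustration only, one standard way to obtain such an expectation is the following: if each task is assigned independently and uniformly at random to one of k machines, a given machine needs requirement s with probability 1 - (1 - 1/k)^(n_s), where n_s is the number of tasks needing s. Summing over machines and requirements gives the expected total. The symbols k and n_s and all names in the sketch below are assumptions, not necessarily the notation of the talk.

```python
import random
from collections import Counter


def expected_needed(tasks_per_requirement, k):
    """Expected total number of requirements needed over k machines."""
    return sum(k * (1 - (1 - 1 / k) ** n_s) for n_s in tasks_per_requirement.values())


def simulate_needed(requirements_of, k, trials=2000, seed=0):
    """Monte Carlo estimate of the same quantity under uniform random assignment."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        needed = [set() for _ in range(k)]
        for reqs in requirements_of.values():
            needed[rng.randrange(k)] |= reqs  # each task picks a machine uniformly
        total += sum(len(s) for s in needed)
    return total / trials


# Hypothetical toy instance.
requirements_of = {"q1": {"a", "b"}, "q2": {"b", "c"}, "q3": {"c", "d"}, "q4": {"a"}}
counts = Counter(s for reqs in requirements_of.values() for s in reqs)

analytic = expected_needed(counts, k=2)
estimate = simulate_needed(requirements_of, k=2)
print(analytic, estimate)
```

For this instance the closed form evaluates to 5.5, and the simulated average should land close to it.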

12

Deficiency of Random Query Assignment

[Figure: example instance with groups of sizes $n/l$ and $m/l$]

• Expected number of needed requirements:

as

• Optimal:

13

Special Case: Tasks with Singleton Requirements

• There exists a polynomial-time algorithm that guarantees a 2-approximation for singleton task requirements with arbitrary weights

14

Algorithm

15

Tasks with Arbitrary Sets of Requirements

• For unit-weight requirements, there exists a polynomial-time algorithm with approximation ratio

where is the maximum number of requirements of a task

• For arbitrary-weight requirements, the same approximation ratio holds but with an extra factor: the ratio of the max to the min weight

16

Gadget: Minimum Task Type Packing

• Given a set of requirements $S$, a set of tasks $Q$, and a real number

• Find a subset of query types that minimizes

subject to

17

Algorithm

1. Pick an empty machine

2. Find a subset of query types that approximately solves the MQP problem with parameter

3. Let be the subset of unassigned queries of type in

4. If then apply a pruning procedure

5. If there are unassigned queries, go to 1

18

Experimental Evaluation

• Random bipartite graph for subscriptions of tasks to requirements

• Number of tasks per requirement according to a Zipf distribution

• Number of requirements per task fixed to a constant

• Metric: replication factor = total number of needed requirements / m
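The replication factor can be computed directly from an assignment. The sketch below assumes that m denotes the number of distinct requirements in the instance, so a value of 1 means every requirement is supplied to exactly one machine. The function name and toy data are illustrative assumptions.

```python
def replication_factor(assignment, requirements_of, num_machines):
    """Total number of needed requirements over all machines, divided by m."""
    all_requirements = set().union(*requirements_of.values())  # m = len(all_requirements), assumed
    needed = [set() for _ in range(num_machines)]
    for task, machine in assignment.items():
        needed[machine] |= requirements_of[task]
    return sum(len(s) for s in needed) / len(all_requirements)


# Hypothetical toy instance: four tasks, four requirements, two machines.
requirements_of = {"q1": {"a", "b"}, "q2": {"b", "c"}, "q3": {"c", "d"}, "q4": {"a"}}

print(replication_factor({"q1": 0, "q2": 0, "q3": 1, "q4": 1}, requirements_of, 2))  # 1.5
print(replication_factor({"q1": 0, "q4": 0, "q2": 1, "q3": 1}, requirements_of, 2))  # 1.25
```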

19

Offline Algorithms

• MQP = defined in an earlier slide

• OffRand = uniform random assignment of a query type to a machine

• IC = Incremental cost

• MMS = Min-max traffic cost per machine

20

Performance of Offline Algorithms

[Figure: performance plot; x-axis: number of requirements per task]

21

Online Task Assignment

• LeastCost

• LeastSource

• LeastQT

22

Performance of Online Algorithms

[Figure: performance plot; x-axis: number of requirements per task]

23

Problem #2: Min-Max Load Balancing

Minimize

subject to

24

Online Task Assignment

• At each arrival of task

• Compute for every

• Assign task to machine in
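The exact quantity computed for each machine is not recoverable from this transcript. As a hedged illustration of the general pattern (score every machine on arrival, then assign), the sketch below uses one plausible greedy rule: place each arriving task on the machine that would need the fewest new requirements, breaking ties by the smaller task load. This rule, the function name `greedy_online`, and the example stream are assumptions and may differ from the rule used in the talk.

```python
def greedy_online(task_stream, num_machines):
    """Assign tasks one at a time; return (placement, needed requirement sets)."""
    needed = [set() for _ in range(num_machines)]
    load = [0] * num_machines
    placement = []
    for reqs in task_stream:
        # Score per machine: (new requirements it would need, current load, index).
        best = min(
            range(num_machines),
            key=lambda j: (len(reqs - needed[j]), load[j], j),
        )
        needed[best] |= reqs
        load[best] += 1
        placement.append(best)
    return placement, needed


# Hypothetical stream with two natural groups of tasks.
stream = [{"a", "b"}, {"b", "c"}, {"x", "y"}, {"y", "z"}, {"a", "c"}]
placement, needed = greedy_online(stream, num_machines=2)
print(placement)                     # [0, 0, 1, 1, 0]
print(max(len(s) for s in needed))   # 3
```

In this toy stream the rule keeps the tasks over {a, b, c} on one machine and the tasks over {x, y, z} on the other, which is the kind of grouping the hidden co-clustering analysis is concerned with.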

25

Hidden Co-Clustering Input

26

Recovery Theorem

• Suppose and

There exists an online assignment of tasks that guarantees asymptotic recovery of hidden clusters

Proof: coupling to a Pólya urn process

Asymptotic recovery: the fraction of tasks from the same hidden cluster that are assigned to the same bin tends to 1 as the number of tasks grows

27

Experimental Evaluation

• Dataset

• Greedy

• Random = random task arrival

• Decreasing with respect to the number of requirements

• Balance big = large tasks to least loaded, small tasks according to greedy

• Prefer big = large tasks to least loaded, delayed assignment of up to a fixed number of small tasks

28

Retail dataset

29

Conclusion

• Studied two variants of non-standard load balancing problems

• Bi-criteria and min-max

• Approximation ratios for offline problems

• Hidden clustering recovery conditions for a simple greedy online task assignment strategy

• Open questions:

• Tighter approximation ratios for offline versions of both problems?

• Similar hidden cluster recovery questions (allowing for more memory)?
