TEL AVIV UNIVERSITY The Iby and Aladar Fleischman Faculty of Engineering
The Zandman-Slaner School of Graduate Studies
ON RETRIES IN PARALLEL DISTRIBUTED LOAD
BALANCING ALGORITHMS
A thesis submitted toward the degree of
Master of Science in Electrical Engineering
by
Medina Mordechai
February 2009
This research was carried out in the Department of Electrical Engineering – Systems
under the supervision of Prof. Guy Even
February 2009
Abstract
We deal with the well studied allocation problem of assigning m balls to n bins so that
the maximum number of balls assigned to the same bin is minimized. In particular, we
focus on parallel distributed algorithms for this problem. In the classical setting, each ball
randomly chooses a few bins, one of which it is eventually assigned to. The algorithms
we consider allow retries, in the sense that a ball may fall back on an additional
randomly chosen bin if its first choices are too loaded.
Our first contribution is the observation that the lower bound presented by Adler
et al. [1] is not valid if retries are allowed. We consider the question of whether smaller
maximum loads are achievable if retries are allowed. We present and analyze an algorithm
with at most one retry per ball. The analysis differs from the analyses of previous
randomized allocation algorithms. We prove tight asymptotic bounds on the maximum
load that meet previous bounds for parallel distributed load balancing algorithms.
On the more practical side, we present a parallel algorithm with retries, and demon-
strate its improved maximum load for n in the range between 1 million and 8 million.
We obtain a maximum load of 3 using 2.5 rounds of communication.
Contents

1 Introduction
  1.1 Previous Work
    1.1.1 The Greedy Algorithm
    1.1.2 The Parallel Greedy Algorithm
    1.1.3 Threshold Algorithm
  1.2 Model
    1.2.1 The Basic Parallel Model
    1.2.2 Variations on the Model
    1.2.3 Introducing Retries
  1.3 Lower Bounds
  1.4 Main Question Addressed in this Thesis
  1.5 Contributions
    1.5.1 Retries and the Basic Parallel Model
    1.5.2 Analyzing an Algorithm with Retries
    1.5.3 Previous Techniques with Proofs
    1.5.4 Simulation of Algorithm H-retry
  1.6 Organization
2 Applications
  2.1 Hashing
  2.2 Dynamic Assignment of Tasks to Servers
3 Techniques
  3.1 Expectation
  3.2 From Binomial to Poisson
    3.2.1 The Binomial Random Variable
    3.2.2 The Poisson Distribution
    3.2.3 Approximation of Binomial Distribution By Poisson Distribution
  3.3 High Probability
  3.4 Martingales and Doob Martingales
    3.4.1 Martingales
    3.4.2 The Azuma-Hoeffding Inequality
    3.4.3 The Lipschitz Condition
  3.5 Balls into Bins Tight Bound
4 Theoretical Analysis
  4.1 Algorithm retry: Description
  4.2 Analyzing the Number of Rejected Replicas
  4.3 Analyzing the Number of Doubly Rejected Balls
  4.4 Putting it All Together
5 Simulation of Algorithm H-retry
  5.1 The New Algorithm
  5.2 Experimental Results
    5.2.1 Maximum load
    5.2.2 Progress of the algorithm
Bibliography

List of Figures

4.1 The distribution of bin loads in an experiment with 8·10^6 bins and 16·10^6 balls. The x-axis depicts the bins in descending load order. The y-axis depicts the load of each bin. The leftmost bar represents bins with load 5 or higher.
4.2 The trade-off between the threshold and the additional load from Step 2a. The x-axis denotes T, and the y-axis denotes the values of T and 1/(T·ln T).

List of Tables

5.1 Results for bin load frequencies, number of re-throws, and rejection frequencies in 50 trials per four values of n ranging from 1 million to 8 million. For each bin load, the frequencies obtained in the trials are presented by the median and half the difference between the maximum and minimum frequency.
5.2 The change in the frequencies of the bin loads during the execution of the algorithm.

List of Algorithm Listings

4.1 A balls into bins algorithm with one retry.
Chapter 1
Introduction
1.1 Previous Work
Azar et al. [2] considered the problem of allocating balls to bins in a balanced way. For
simplicity, suppose that the number of balls equals the number of bins, and is denoted by
n. If each ball selects a bin uniformly and independently at random, then the maximum
load of a bin is Θ(ln n/ ln ln n) with high probability [7]. Azar et al. proved that, if each
ball chooses two random bins and is sequentially placed in the less loaded of the two,
then w.h.p. the maximum load is only ln ln n/ ln 2 + Θ(1).
This surprising improvement in the maximum load has spurred a lot of interest in
randomized load balancing in various settings. Adler et al. [1] investigated the possibility
of finding parallel load balancing algorithms. They presented upper bounds and matching
lower bounds of Θ((log n/ log log n)^{1/r}) for parallel load balancing using r rounds of
communication. Shortly after that, Stemann [11] presented another parallel algorithm
with the same asymptotic bounds. However, there is a synchronization point in the
algorithm of Stemann, while the parallel algorithms in [1] are strictly asynchronous.
Berenbrink et al. [3] generalized the lower bound proved by [1] to a nonconstant number
r of communication rounds, provided that r ≤ log log n.
1.1.1 The Greedy Algorithm
The greedy algorithm for load balancing presented in [2] is a sequential algorithm. Each
ball, in its turn, chooses d bins uniformly and independently at random. The ball queries
each of these bins for its current load (i.e., the number of balls that have already been
assigned to it). The ball is placed in a least loaded bin among its choices. Azar et al. [2]
proved that w.h.p. the maximum load at the end of this process is ln ln n/ ln d + Θ(1).
Voecking [12] presented a surprising variation in which the bins are partitioned into d
disjoint parts of n/d bins. Each ball randomly chooses one bin from each part. In addition,
ties are broken by applying an “always-go-left” rule. These two modifications lead to
smaller constants in the analysis. In fact, experiments reported in [12] show a reduction
of the maximum load from 4 to 3 for the range 2^20 ≤ n ≤ 2^24.
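The sequential d-choice process described above is straightforward to simulate. The following sketch (function and parameter names are ours, not from [2]) illustrates the power of two choices:

```python
import random

def greedy(n_balls, n_bins, d=2, rng=random):
    """Sequential d-choice greedy: each ball probes d uniformly random
    bins and is placed in a least loaded one among them."""
    load = [0] * n_bins
    for _ in range(n_balls):
        choices = [rng.randrange(n_bins) for _ in range(d)]
        best = min(choices, key=lambda i: load[i])
        load[best] += 1
    return max(load)

# With n balls and n bins, d = 2 typically yields a maximum load near
# ln ln n / ln 2 + O(1), far below the single-choice Θ(ln n / ln ln n).
```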
1.1.2 The Parallel Greedy Algorithm
We overview the parallel greedy algorithm pgreedy presented and investigated by Adler
et al. [1]. For simplicity we present the version in which d = 2 (namely, each ball chooses
two bins). We denote the balls by t ∈ [1..n] and the bins by i ∈ [1..n]. The algorithm
works as follows:
1. Each ball t chooses two bins b1(t) and b2(t) independently and uniformly at random.
(For simplicity, assume that b1(t) ≠ b2(t).) The ball t sends requests to bins b1(t)
and b2(t). Copies of the same ball in different bins are referred to as siblings.
2. Upon receiving a request from ball t, bin i responds to ball t by reporting the
number of requests it received so far. We denote this number by hi(t), and refer to
it as the height of ball t in bin i.
3. After receiving its heights from b1(t) and b2(t), ball t sends a commit to the bin
that assigned a lower height. (Tie-breaking rules are not addressed in [1].)
Adler et al. [1] proved that the maximum load achieved by pgreedy is O(√(log n/ log log n))
with high probability. They also proved a matching lower bound. In their experimental
results, increasing the number of bins chosen by each ball to d = 3 only slightly improved
the maximum load. Moreover, setting d = 4 had an adverse effect on the maximum
load. To summarize, for n between 10^6 and 8·10^6, the maximum load was 5 or 6 (for
2 ≤ d ≤ 4).
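A minimal simulation of the three steps of pgreedy, under the assumption that the 2n requests arrive at the bins in a uniformly random global order (a stand-in for asynchrony) and with ties broken toward the first choice, since [1] leaves tie-breaking unspecified:

```python
import random

def pgreedy(n, rng):
    """Sketch of PGREEDY with d = 2: each bin reports the height
    (arrival rank) of each request; every ball commits to the bin
    that reported the lower height."""
    balls = [(rng.randrange(n), rng.randrange(n)) for _ in range(n)]
    # A random global arrival order of the 2n requests models asynchrony.
    requests = [(t, b) for t, pair in enumerate(balls) for b in pair]
    rng.shuffle(requests)
    seen = [0] * n                # requests received so far, per bin
    height = {}                   # (ball, bin) -> reported height
    for t, b in requests:
        seen[b] += 1
        height[(t, b)] = seen[b]
    load = [0] * n
    for t, (b1, b2) in enumerate(balls):
        commit = b1 if height[(t, b1)] <= height[(t, b2)] else b2
        load[commit] += 1
    return max(load)
```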
1.1.3 Threshold Algorithm
The threshold algorithm threshold(T) presented by Adler et al. [1] works differently.
A threshold parameter T is used to bound the number of balls that may be assigned to
each bin in each round. Initially, all balls are unaccepted. In each round, each unaccepted
ball chooses independently and uniformly a single random bin. Each bin accepts at most
T balls among the balls that have chosen it. The other balls, if any, are rejected. Note
that, although described “in rounds”, algorithm threshold(T) can work asynchronously
as balls may proceed to the next round as soon as the replies arrive. The idea is that
the number of unaccepted balls decreases rapidly; thus, Adler et al. prove that, if
r is constant, then setting T = O((log n/ log log n)^{1/r}) requires at most r rounds with
high probability. Another threshold algorithm was considered by Stemann [11]. Each
ball begins (like in pgreedy) by choosing two random bins and sending requests to
them. Only after every bin collects all the requests does the assignment process begin. The
assignment process is iterative and proceeds as follows. Each bin that received at most
T requests accepts all of them, and notifies the requesting balls. Each accepted ball
sends a message to the other bin to withdraw the sibling ball. Bins that receive withdraw
messages have fewer requests and hence may accept balls in the next iteration. We first
note that Stemann’s algorithm is not completely asynchronous, as each bin must know
whether it will receive more than T requests before accepting balls. Stemann was able to
match the results of [1]; namely, for n balls and n bins and for a constant α, the maximum
load after r rounds of communication, where 2 ≤ r ≤ (log log n)/(3α), is
max{ (3αr · log n/ log log n)^{1/r}, 4(α + 7) }
with probability at least 1 − 1/n^{α−1}. Plugging in r = 2 and α = 2 gives a maximum load of
√(12 log n/ log log n) in 2 rounds of communication.
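The round structure of threshold(T) can be sketched as follows (the per-round cap follows the description above; function and parameter names are ours):

```python
import random

def threshold(n, T, max_rounds=50, rng=random):
    """Sketch of THRESHOLD(T): in each round, every still-unaccepted
    ball picks one uniform bin; a bin accepts at most T of the balls
    that chose it in that round and rejects the rest."""
    load = [0] * n
    unaccepted = n
    rounds = 0
    while unaccepted and rounds < max_rounds:
        rounds += 1
        arrivals = [0] * n
        for _ in range(unaccepted):
            arrivals[rng.randrange(n)] += 1
        for i in range(n):
            load[i] += min(arrivals[i], T)      # accept up to T this round
        unaccepted -= sum(min(a, T) for a in arrivals)
    return max(load), rounds
```

By construction the maximum load is at most T times the number of rounds, and in practice the number of unaccepted balls shrinks so fast that only a few rounds are needed.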
1.2 Model
1.2.1 The Basic Parallel Model
In this section we overview the model of parallel load balancing from Adler et al. [1].
There are n balls and n bins. This model is often referred to as static since balls are not
deleted.
The communication model is as follows. In the beginning, each ball t chooses a
constant number, d, of bins. From this point on, messages are sent only between pairs
(t, i), where ball t had chosen bin i. Communication proceeds in rounds. Each round
consists of messages from balls to bins and responses from bins to balls. In the last
round, each ball commits to one of the d bins that it has chosen initially. We assume that
each bin may simultaneously send messages to all the balls that have sent it a message
(in fact, w.h.p. there are only O(log n/ log log n) such messages per bin). The length of
each message is logarithmic in n, so messages can contain the ID of a ball and a bin.
A sublogarithmic bound on the number of messages sent in each round by each bin is
implied in this model. In addition, each ball sends at most d (a constant) messages per
round.
The requirements from the algorithm are as follows.
1. Nonadaptive - the random choices of each ball are made before any communication
takes place.
2. Symmetric - all the balls run the “same” program and all the bins run the “same”
program¹. Moreover, each bin is chosen uniformly and independently at random.
3. Asynchronous - balls and bins wait only for messages addressed to them. In fact, a
bin or a ball waits for a message MSG only if the protocol guarantees the transmis-
sion of message MSG. In strict asynchronous algorithms, a ball or bin may proceed
even if a round is not completed.
¹The definition of symmetry deserves a more formal treatment to avoid using IDs of balls and bins, etc. One simple definition is that the program is fixed for all values of n.
1.2.2 Variations on the Model
In this section we consider modifications of the basic parallel model. We show that trivial
protocols exist if the requirements are relaxed:
1. Relaxing the symmetry requirement: One could think of a “macro” bin that is in
charge of log n bins. Hence, we have n/log n macro bins, each in charge of a disjoint
set of log n bins. If we relax symmetry so that “macro” bins behave differently, then
a constant maximum load is easily achievable w.h.p. as follows. Each ball sends
a request to a uniformly random macro bin. Each macro bin distributes the balls
evenly among its log n bins, leading to a constant maximum load w.h.p.
2. Relaxing the nonadaptiveness requirement: We can simulate the “macro” bins (from
the previous relaxation) by violating nonadaptiveness. A constant maximum load is
easily achievable w.h.p. as follows. Each ball sends a request to a uniformly random
bin. Contiguous disjoint blocks of log n bins “share” their requests, determine a
balanced allocation, and send messages back to the balls with a new target bin.
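The first relaxation can be checked empirically. In this sketch (an illustration, with our own names) each macro bin spreads its requests evenly over its block of log n bins, so its worst bin receives the ceiling of its request count divided by the block size:

```python
import math
import random

def macro_bins(n, rng):
    """Symmetry relaxation sketch: n/log n "macro" bins, each managing
    a disjoint block of log n bins, spread their requests evenly."""
    block = max(1, round(math.log(n)))
    n_macro = n // block
    counts = [0] * n_macro
    for _ in range(n):                      # each ball picks a macro bin
        counts[rng.randrange(n_macro)] += 1
    # An even spread over the block gives each bin at most ceil(c/block).
    return max(math.ceil(c / block) for c in counts)
```

Each macro bin receives about log n balls in expectation and O(log n) w.h.p., so dividing by the block size leaves a constant maximum load.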
1.2.3 Introducing Retries
We consider a parallel model that adds a feature to the basic parallel model as follows.
Allowing retries: the set C of random choices of each ball is divided into two disjoint
sets, C = I ∪ R. The set I consists of choices that are employed initially, and the set R
consists of choices that are used, on demand, instead of a choice in the set I.
Note that Algorithm threshold is a special case of an algorithm with retries. We
seek a more general retry algorithm (e.g., Algorithm 4.1).
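The thesis's Algorithm retry is presented in Chapter 4 (Algorithm 4.1). Purely to illustrate the C = I ∪ R split, here is a minimal one-retry scheme of our own devising (it is NOT Algorithm 4.1): each ball has one initial choice and one reserve choice, and falls back on the reserve only if the initial bin is already at a threshold T.

```python
import random

def one_retry(n, T, rng):
    """Illustrative one-retry allocation (not the thesis's algorithm):
    C = {initial} ∪ {reserve}; the reserve is used on demand, i.e.,
    only when the initial bin already holds T balls."""
    load = [0] * n
    for _ in range(n):
        initial, reserve = rng.randrange(n), rng.randrange(n)
        target = initial if load[initial] < T else reserve
        load[target] += 1
    return max(load)
```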
1.3 Lower Bounds
The technique that was used in [1] for analyzing a parallel balls-and-bins lower bound
is the witness tree method. This technique is used for bounding the probability of some
“bad event", e.g., the probability of the occurrence of a “heavily-loaded" bin. The lower
bound is expressed in terms of the number of rounds of communication, r, the number of
balls, m, the number of bins, n, and the number of choices available to each ball, d. We
will outline the technique for d = 2, r = 2, and m = n. A vertex in a graph is associated
with each bin. As d = 2 each ball can be represented by an undirected edge between the
two bins that the ball has chosen. Now, the bin assigned to a ball is modeled by orienting
the edge of the ball. The in-degree of a vertex after all edges are oriented corresponds to
the load of the bin. The communication model in this context is that for each round of
communication, every ball and bin “sees” a larger neighborhood of the graph around it.
For n balls and n bins the corresponding graph is a random graph chosen uniformly from
all the graphs with n vertices and n edges. Adler et al. [1] showed that with constant
probability there exists a subgraph, which is a rooted symmetric tree of radius r = 2,
such that the root has a degree of (√2 − o(1))·√(log n/ log log n). The symmetry of the
tree implies that the orientation procedure must orient half the edges incident to the
root towards it. This implies a maximum load of (√2/2 − o(1))·√(log n/ log log n). They
showed a general lower bound of Ω((log n/ log log n)^{1/r}) for the maximum load with at
least constant probability if the number of rounds is r. That lower bound applies for a
symmetric nonadaptive scheme, as every ball chooses its two bin candidates in advance.
For schemes with retries, that lower bound does not apply (as elaborated in Section 1.5.1).
Berenbrink et al. [3] generalized the lower bound proved by [1] to a nonconstant number
r of communication rounds, provided that r ≤ log log n.
1.4 Main Question Addressed in this Thesis
We study in this thesis the question of whether a parallel allocation process with retries
may substantially decrease the maximum load. A similar question was raised by [4] for
the sequential model. They showed a lower bound of ln ln n/ ln d + Ω(1) on the maximum
load w.h.p., so the known lower bound of [2] is unchanged even though the latter
does not apply to the retry model.
1.5 Contributions
1.5.1 Retries and the Basic Parallel Model
We noticed that the lower bound presented by [1] is not valid if retries are allowed.
The lower bound proof does not capture the fact that in the retry parallel model
(Section 1.2.3) there are additional choices that are not considered at the beginning and
might appear later (in the graph). Thus, the graph model does not deal with retries.
1.5.2 Analyzing an Algorithm with Retries
In Chapter 4 we analyze algorithm retry, which allows one retry. This analysis is based
on techniques presented in Chapter 3. To the best of our knowledge, these techniques
have not been applied to “balls and bins” algorithms. The analysis is asymptotically
tight, and the resulting maximum load matches the asymptotic bound of [1]. We
conclude that there is no asymptotic benefit in applying retries.
1.5.3 Previous Techniques with Proofs
The techniques that were used throughout Chapter 4 are summarized in Chapter 3. We
tailored these techniques to our specific parameters to obtain somewhat simpler proofs.
1.5.4 Simulation of Algorithm H-retry
The gap between log₂ log₂ n and √(log₂ n/ log₂ log₂ n) becomes noticeable only for very
large values of n (e.g., n > 2^1024). This raises the need for conducting experiments (i.e.,
simulations) with smaller values of n (i.e., n ∈ [10^6, 8·10^6]) since the asymptotic analysis
does not yield results for such values of n. Indeed, simulations by Azar et al. [2] showed
that the maximum load is 4 for n ≤ 2^24. This improves over choosing a single bin, where
the maximum load reaches even 13 for the same values of n. Voecking [12] suggested a
variation (see details in Sec. 1.1.1) that reduced the maximum load to 3-4 for n ≤ 2^24.
Adler et al. [1] also conducted experiments, and showed that their parallel algorithms
obtained a maximum load of 5-6 for n in the range between 1 million and 32 million
balls.
We present in Chapter 5 a practical parallel algorithm and demonstrate its improved
maximum load for n in the range between 1 million and 8 million. We obtain a maximum
load of 3 using 2.5 rounds of communication. Similarly to [11], our algorithm requires
a single synchronization point. In our experiments, at most 5 balls were rejected, and
at most a few hundred balls had duplicate siblings assigned to bins. One can assign
the rejected balls and delete the duplicate siblings using an additional round. From a
practical point of view, it is not clear whether the duplicates and the few rejections are
of interest.
1.6 Organization
The thesis is organized as follows: In Chapter 2 we overview two balls-in-bins applications.
In Chapter 3 we overview the techniques that were used in Chapter 4 and their proofs. In
Chapter 4 we present Algorithm retry and its tight analysis. In Chapter 5 we present
Algorithm H-retry and experimental results we have obtained with it.
Chapter 2
Applications
This chapter surveys some applications of the balls and bins model. The survey is
inspired by [8]. Section 2.1 describes a sequential application, while Section 2.2 describes
a parallel application and its relation to our new model.
2.1 Hashing
The standard hash table implementation [6] uses a single hash function to map many
keys to entries in a smaller table. If there is a collision, i.e., if two or more keys map
to the same table entry, then all the conflicting keys are stored in a linked list called a
chain. Thus, each table entry is the head of a chain and the maximum time to search for
a key in the hash table is proportional to the length of the longest chain in the table. If
the hash function is perfectly random - i.e., if each key is mapped to an entry of the
table independently and uniformly at random, and n keys are sequentially inserted into
a table with n entries - then the length of the longest chain is Θ (log n/ log log n) w.h.p.
This bound follows from the analogous bound on the maximum load in the classical
balls-and-bins problem where each ball chooses a single bin independently and uniformly
at random [7]. Now suppose that we use two perfectly random hash functions. When
inserting a key, we apply both hash functions to determine the two possible table entries
where the key can be inserted. Then, of the two possible entries, we add the key to the
shorter of the two chains. To search for an element, we have to search through the chains
at the two entries given by both hash functions. If n keys are sequentially inserted into the
table, the length of the longest chain is Θ(log log n) w.h.p., implying that the maximum
time needed to search the hash table is Θ (log log n) w.h.p. This bound also follows from
the analogous bound for the balls-and-bins problem where each ball chooses two bins at
random [2].
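A sketch of two-choice chaining as described above. The two “perfectly random” hash functions are modeled here by randomly salted built-in hashing, which is an idealizing assumption for illustration only:

```python
import random

class TwoChoiceHash:
    """Chained hash table with two hash functions: insert appends the
    key to the shorter of its two candidate chains; lookup scans both."""

    def __init__(self, size, rng=random):
        self.size = size
        self.salt1 = rng.getrandbits(64)    # models hash function h1
        self.salt2 = rng.getrandbits(64)    # models hash function h2
        self.table = [[] for _ in range(size)]

    def _slots(self, key):
        return (hash((self.salt1, key)) % self.size,
                hash((self.salt2, key)) % self.size)

    def insert(self, key):
        i, j = self._slots(key)
        shorter = self.table[i] if len(self.table[i]) <= len(self.table[j]) else self.table[j]
        shorter.append(key)

    def lookup(self, key):
        i, j = self._slots(key)
        return key in self.table[i] or key in self.table[j]
```

Lookup cost is bounded by twice the longest chain, which for n keys in n entries is Θ(log log n) w.h.p. rather than Θ(log n/ log log n).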
2.2 Dynamic Assignment of Tasks to Servers
Consider n identical servers, and n identical tasks. Suppose that the tasks arrive in
parallel and need to be assigned to a server. We would like to minimize the maximum load
of the servers. Ideally, when a task arrives (requesting a server), we would like to assign it
to the least loaded server. However, gathering complete information about the loads of all
the servers is expensive and assumed not to be justified. An alternative approach that
requires no coordination is to simply allocate each task to a random server. If there are n
tasks and n servers, then by the parallel balls-and-bins analogy, some server is assigned
Θ(√(log n/ log log n)) tasks w.h.p. The new model enables the unfortunate tasks (e.g.,
those that did not commit to a server under the parallel balls-and-bins strategy) to “try”
again. Even with the retry, however, some server is assigned Θ(√(log n/ log log n))
tasks w.h.p.
Chapter 3
Techniques
This chapter surveys the techniques that are used in Chapter 4. The survey is based
on [1, 5, 9, 10].
3.1 Expectation
Theorem 3.1. [Linearity of Expectation] For any finite collection of discrete random
variables X1, X2, ..., Xn with finite expectations,

E[ ∑_{i=1}^{n} Xi ] = ∑_{i=1}^{n} E[Xi].
Theorem 3.1 holds for any collection of random variables, even if they are not inde-
pendent.
3.2 From Binomial to Poisson
3.2.1 The Binomial Random Variable
Definition 3.2. A discrete Binomial random variable X with parameters n and p, de-
noted by B(n, p), is defined by the following probability distribution on k = 0, 1, 2, ..., n:

Pr(X = k) = (n choose k) · p^k · (1 − p)^{n−k}.
The expectation of this random variable is np and its variance is np(1− p).
3.2.2 The Poisson Distribution
Definition 3.3. A discrete Poisson random variable X with parameter λ is given by the
following probability distribution on k = 0, 1, 2, ...:

Pr(X = k) = e^{−λ} · λ^k / k!.
The expectation of this random variable as well as its variance is λ.
3.2.3 Approximation of Binomial Distribution By Poisson Dis-
tribution
Claim 3.4. Let X be a discrete Binomial random variable with parameters n and p, and
let λ = np. Then:

(λ^k / k!) · (1 − λ/n)^{n−k} ≥ Pr(X = k) ≥ (λ^k / k!) · (1 − k/n)^k · (1 − λ/n)^n.
Proof. First, let us rewrite Pr(X = k):

Pr(X = k) = (n choose k) · p^k · (1 − p)^{n−k}
          = [(n − k + 1)·(n − k + 2) ··· n / k!] · p^k · (1 − p)^{n−k}.

Plugging in p = λ/n we get:

Pr(X = k) = (λ^k / k!) · [(n − k + 1)·(n − k + 2) ··· n / n^k] · (1 − λ/n)^{n−k}.

The middle factor satisfies (1 − k/n)^k ≤ (n − k + 1) ··· n / n^k ≤ 1, and
(1 − λ/n)^{n−k} ≥ (1 − λ/n)^n. The claim follows.
Claim 3.5. Let X be a discrete Binomial random variable with parameters n and p. Let
Y be a discrete Poisson random variable with parameter λ = np. Then:

Pr(Y = k) · e^{kλ/n} > Pr(X = k) > Pr(Y = k) · e^{−k²/(n−k) − λ²/(n−λ)}.
Proof. The following holds for 0 < t < 1:

e^{−t/(1−t)} < 1 − t < e^{−t}.

We now conclude the upper and lower bounds using Claim 3.4.

Upper bound:

Pr(X = k) ≤ (λ^k / k!) · (1 − λ/n)^{n−k}
          < (λ^k / k!) · (e^{−λ/n})^{n−k}
          = (λ^k / k!) · e^{−λ} · e^{λk/n}
          = Pr(Y = k) · e^{kλ/n}.
Lower bound:

Pr(X = k) > (λ^k / k!) · (1 − k/n)^k · (1 − λ/n)^n
          > (λ^k / k!) · e^{−k²/(n−k)} · e^{−λn/(n−λ)}
          = Pr(Y = k) · e^{−k²/(n−k) − λ²/(n−λ)},

where the last equality uses λ − λn/(n − λ) = −λ²/(n − λ).
Claim 3.5 is often used to prove that the Poisson distribution “approximates” the
Binomial distribution. Of course, this approximation does not hold for all values of k. It
holds, for example, for constant k or even for k < √n.
3.3 High Probability
Definition 3.6. We say that an event X occurs with high probability if Pr(X) ≥ 1 − O(1/n).
3.4 Martingales and Doob Martingales
Martingales are sequences of random variables satisfying certain conditions. Those con-
ditions are satisfied in our balls in bins setting. We will present in this section a technique
that provides high probability bounds that are needed in Chapter 4.
3.4.1 Martingales
Definition 3.7. A sequence of random variables Z0, Z1, ... is a martingale with respect
to the sequence X0, X1, ... if, for all n ≥ 0, the following conditions hold:
• Zn is a function of X0, X1, ..., Xn.
• E [|Zn|] <∞.
• E [Zn+1 |X0, X1, ..., Xn] = Zn.
Definition 3.8. A sequence of random variables Z0, Z1, ... is called a martingale if it is a
martingale with respect to itself. That is, E[|Zn|] < ∞ and E[Zn+1 | Z0, Z1, ..., Zn] = Zn.
Definition 3.9. A Doob martingale refers to a sequence of random variables constructed
using the following exposure process. Let X0, X1, ..., Xn be a sequence of random vari-
ables, and let Y be a random variable with E [|Y |] <∞ that may “depend” on X0, X1, ..., Xn.
Let

Zi ≜ E[Y | X0, X1, ..., Xi], for i = 0, 1, ..., n.
One could think of the Doob martingale as an exposure process, that is, as the se-
quence advances more and more information is revealed, e.g., exposure of edges in a
random graph.
Claim 3.10. A Doob martingale {Zi}_{i=0}^{n} constructed as in Definition 3.9 is a martingale
with respect to X0, X1, ..., Xn.
Proof. Using the fact that E [Y |X0, X1, ..., Xi+1] is a random variable and that E [V |W ] =
E [E [V | U, W ] |W ]:
E [Zi+1 |X0, X1, ..., Xi] = E [E [Y |X0, X1, ..., Xi+1] |X0, X1, ..., Xi]
= E [Y |X0, X1, ..., Xi]
= Zi.
3.4.2 The Azuma-Hoeffding Inequality
We use tail inequalities for martingales. These inequalities are similar to Chernoff in-
equalities, and apply even when the underlying random variables are not independent.
Theorem 3.11. [Azuma-Hoeffding Inequality] Let X0, X1, ..., Xn be a martingale.
Let B1, B2, ..., Bn denote a sequence of random variables where Bk may be a function of
X0, X1, ..., Xk−1. Let {dk}_{k=1}^{n} denote a sequence of real numbers.
If

∀k : Bk ≤ Xk − Xk−1 ≤ Bk + dk,

then, for all t ≥ 0 and any λ > 0,

Pr(|Xt − X0| ≥ λ) ≤ 2·e^{−2λ² / (∑_{k=1}^{t} dk²)}.
3.4.3 The Lipschitz Condition

A real function f : R^n → R satisfies the Lipschitz condition with bound c if, for any i,
any set of values x1, x2, ..., xn, and any yi,

|f(x1, ..., xi−1, xi, xi+1, ..., xn) − f(x1, ..., xi−1, yi, xi+1, ..., xn)| ≤ c.

Let X1, X2, ..., Xn denote random variables and let X⃗ denote the n-tuple (X1, X2, ..., Xn).
Let Z0 ≜ E[f(X⃗)] and Zk ≜ E[f(X⃗) | X1, X2, ..., Xk]. By Definition 3.9, the sequence
Z0, Z1, ... is a Doob martingale. If the Xk are independent random variables, then there
exist random variables {Bk}_{k=1}^{n}, where Bk depends only on Z0, Z1, ..., Zk−1, with
Bk ≤ Zk − Zk−1 ≤ Bk + c. We may then apply Theorem 3.11 with dk ≡ c.
Theorem 3.12. Let X1, X2, ..., Xn be a sequence of independent random variables.
Let f(X⃗) = f(X1, X2, ..., Xn) be a function that satisfies the Lipschitz condition with
bound c. Let Z0 ≜ E[f(X⃗)] and Zk ≜ E[f(X⃗) | X1, X2, ..., Xk]. Then the sequence Z0, Z1, ...
is a Doob martingale and, for all t ≥ 0 and any λ > 0,

Pr(|Zt − Z0| ≥ λ) ≤ 2·e^{−2λ² / (t·c²)}.
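Theorem 3.12 can be seen in action on a classical Lipschitz function of the balls' choices: the number of empty bins changes by at most 1 if a single ball switches bins, so c = 1 and the count concentrates sharply. A Monte Carlo illustration (the sample size and deviation margin are our own choices):

```python
import math
import random

def empty_bins(choices, n):
    """f(X_1, ..., X_m) = number of empty bins after m balls land in the
    bins listed in `choices`; changing one ball's bin changes f by at
    most 1, so f is Lipschitz with bound c = 1."""
    return n - len(set(choices))

rng = random.Random(0)
n = m = 1000
samples = [empty_bins([rng.randrange(n) for _ in range(m)], n)
           for _ in range(200)]
mean = sum(samples) / len(samples)   # concentrates near n/e ≈ 367.9
# Theorem 3.12 with c = 1, t = m: deviations of λ = 5·sqrt(m) have
# probability at most 2·e^{-2·25}, so none should appear in 200 samples.
lam = 5 * math.sqrt(m)
assert all(abs(s - mean) < lam for s in samples)
```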
3.5 Balls into Bins Tight Bound

Consider the randomized process which assigns m balls into n bins independently and
uniformly at random. Our goal is to bound the maximum load obtained by this process.
The general bound is given by the following theorem, whose proof is rather elaborate
[7, 10]. Kolchin et al. [7] showed, among many results, that if m = n/lg n then the number
of balls whose height¹ is greater than lg n/lg lg n is constant. This implies a maximum load of
lg n/lg lg n + O(1). More precise constants were proved in [10].

Theorem 3.13. The randomized process which assigns m balls into n bins independently
and uniformly at random produces a maximum load of

Θ( ln n / ln(1 + (n/m)·ln n) + m/n )

w.h.p.

¹We assign heights [1..k] to the balls in a bin with k balls.
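A quick simulation comparing the observed maximum load against the prediction of Theorem 3.13, here for m = n, where the bound reduces to Θ(ln n/ ln ln n) (constants and the value of n are our own choices):

```python
import math
import random

def max_load(m, n, rng):
    """Throw m balls into n bins uniformly at random; return the
    maximum load over all bins."""
    load = [0] * n
    for _ in range(m):
        load[rng.randrange(n)] += 1
    return max(load)

rng = random.Random(0)
n = 10**5
observed = max_load(n, n, rng)
predicted = math.log(n) / math.log(math.log(n))   # ln n / ln ln n for m = n
# `observed` is a small constant multiple of `predicted`.
```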
Our analysis requires using Theorem 3.13 only for √n < m < n/ln n. Theorem 3.16 deals
with this special case of Theorem 3.13. The maximum load in that case is Θ(log n / log(n/m)).
We present a self-contained proof of Theorem 3.16 using the techniques presented in this
chapter. In addition, we use the first and second moment method as in [10]. In [1], upper
bounds are proved using a Poisson approximation. We first summarize the first and
second moment method from [10] in Claim 3.14, followed by Claim 3.15, which is helpful
in proving Theorem 3.16.
The following claim is proved using Markov’s inequality and Chebyshev’s inequality
(hence the term first and second moment).
Claim 3.14. Let X1, X2, ..., Xn be identically distributed 0-1 random variables, such that:

∀ 1 ≤ i ≤ j ≤ n : E[Xi·Xj] ≤ (1 + o(1)) · E²[X1].     (3.1)

Let S ≜ ∑_{i=1}^{n} Xi. Then:

Pr[S = 0] = 1 − o(1), if E[S] → 0;
Pr[S = 0] = o(1),     if E[S] → ∞.
Claim 3.15. Let² m ≤ e·n/ln n. Let α > 0 denote a constant. Let k ≜ α·log n / log(n/m)
and

γ(m/n) ≜ n · e^{−m/n} · (m/n)^k / k!.

Then

n^{1−3α−o(1)} < γ(m/n) ≤ n^{1−α−o(1)}.
Proof. It is sufficient to prove that:
ln(
n1−3α−o(1))
< ln(
γ(m
n
))
≤ ln(
n1−α+o(1))
.
Let us consider the expression, ln(
γ(
mn
))
:
2Note that the assumption m ≤ e · n
ln nis used in Claim 3.15 only for the lower bound: n1−3α−o(1) ≤
γ(
m
n
)
.
17
ln(
γ(m
n
))
= ln
(
n ·e−m/n
(
mn
)k
k!
)
(3.2)
= lnn + k · ln(m
n
)
−m
n− (1 + o(1)) · k · ln k
= lnn + α ·log n
log(n/m)· ln
(m
n
)
−m
n− (1 + o(1)) · α ·
log n
log(n/m)· ln
(
α ·log n
log(n/m)
)
= lnn + α ·lnn
ln(n/m)·[
ln(m
n
)
− (1 + o(1)) · ln(
α ·ln n
ln(n/m)
)]
− o(1)
= lnn − α ·lnn
ln(n/m)·[
ln( n
m
)
+ (1 + o(1)) ·[
ln α + ln ln n − ln ln(n
m)]]
− o(1) .
The assumption that m ≤ e · nln n
implies that ln ne≤ n
m, hence for n ≥ e(e2):
ln ln n
ln(n/m)≤ ln ln n
ln(
lnne
) (3.3)
=ln ln n
ln ln n− 1
= 1 +1
ln ln n− 1
≤ 2 .
Let us consider the lower bound. Combining Equations 3.2 and 3.3:

    ln(γ(m/n)) = ln n − α·(ln n/ln(n/m))·[ ln(n/m) + (1 + o(1))·[ ln α + ln ln n − ln ln(n/m) ] ] − o(1)
               = ln n − α·ln n − α·(1 + o(1))·ln α·(ln n/ln(n/m))
                      − α·(1 + o(1))·(ln n·ln ln n)/ln(n/m)
                      + α·(1 + o(1))·(ln n·ln ln(n/m))/ln(n/m) − o(1)
               > ln n − α·ln n − o(1)·ln n − α·(2 + o(1))·ln n − o(1)
               = (1 − 3α − o(1))·ln n .
Let us consider the upper bound. Equation 3.2 yields:

    ln(γ(m/n)) = ln n − α·(ln n/ln(n/m))·[ ln(n/m) + (1 + o(1))·[ ln α + ln ln n − ln ln(n/m) ] ] − o(1)
               = ln n − α·ln n − o(1)·ln n
                      − α·(ln n/ln(n/m))·(1 + o(1))·[ ln ln n − ln ln(n/m) ] − o(1)
               ≤ (1 − α − o(1))·ln n .

The claim follows.
Theorem 3.16. Let √n ≤ m ≤ e·n/ln n. Then, the randomized process which assigns m balls into n bins independently and uniformly at random produces a maximum load of Θ(log n / log(n/m)) w.h.p.

Proof. Let X_1, X_2, ..., X_n be the number of balls in each bin. Each X_i is a binomial random variable with parameters m and p = 1/n.
Let

    k ≜ α·log n/log(n/m) ,
    χ_i^{(k)} = 1 if X_i ≥ k, and 0 otherwise ,
    S ≜ Σ_{i=1}^{n} χ_i^{(k)} .

We will show that:

    Pr( ∃ 1 ≤ i ≤ n : X_i ≥ α·log n/log(n/m) ) = 1 − o(1)   if 0 < α < 1/3 ,
                                                  o(1)       if α > 1 .
The proof structure is as follows: We first bound Pr(X_i ≥ k) to obtain a tight bound for E[S] = n·Pr(X_i ≥ k). We find threshold values for α that determine whether E[S] → 0 or E[S] → ∞. The theorem follows after we show that the premise in Equation (3.1) holds, and by applying Claim 3.14. That provides us with a maximum load of Θ(log n / log(n/m)) w.h.p.
Let us consider Pr(X_i ≥ k). Let Y_i be a Poisson random variable with parameter λ = m/n ≤ e/ln n. Using Claim 3.5:

    Pr(X_i ≥ k) < Σ_{ℓ=k}^{m} Pr(Y_i = ℓ)·e^{ℓλ/m} ≤ e^{λ} · Σ_{ℓ=k}^{m} e^{−λ}·λ^ℓ/ℓ! .   (3.4)

Let a_ℓ ≜ e^{−λ}·λ^ℓ/ℓ!. Since m < n for n > e^e and ℓ ≥ 1:

    a_{ℓ+1}/a_ℓ = λ/(ℓ+1) < 1/2 .
Now we can rewrite Equation 3.4:

    Pr(X_i ≥ k) < e^{λ} · ( a_k + Σ_{ℓ=k}^{m} a_k · Π_{j=k}^{ℓ} (a_{j+1}/a_j) )
                < e^{λ} · ( a_k + a_k·(1/2)·Σ_{ℓ=0}^{∞} (1/2)^ℓ )
                < 2e^{λ} · a_k
                = (2 + o(1)) · a_k .                                                 (3.5)
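The geometric-series bound in Equation 3.5 can be checked numerically. The following is a minimal sketch of that check; the values of λ and k are arbitrary illustrative choices (λ = m/n ≤ e/ln n is small in the regime of the theorem), and the helper name is our own:

```python
import math

def poisson_pmf(lam, ell):
    # a_ell = e^{-lam} * lam^ell / ell!
    return math.exp(-lam) * lam ** ell / math.factorial(ell)

lam, k = 0.3, 3                      # illustrative values; lam = m/n is small
a_k = poisson_pmf(lam, k)

# Poisson tail from k on, truncated far enough for double precision.
tail = sum(poisson_pmf(lam, ell) for ell in range(k, 100))

# Successive term ratios a_{l+1}/a_l = lam/(l+1) stay below 1/2, so the tail
# is dominated by a geometric series: a_k < tail < 2 * a_k, as in (3.5).
ratio_ok = all(lam / (ell + 1) < 0.5 for ell in range(k, 100))
```

With λ = 0.3 and k = 3 the ratios are at most 0.075, so the tail exceeds a_k by only a few percent, well inside the factor-2 bound.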
Using Claim 3.5:

    Pr(X_i ≥ k) > Pr(X_i = k) > Pr(Y = k) · e^{−k²/(m−k) − λ²/(m−λ)} .               (3.6)

Since m ≤ e·n/ln n, then:

    λ²/(m−λ) = m/(n²−n) ≤ (e·n/ln n)/(n·(n−1)) = e/(ln n·(n−1)) → 0 .                (3.7)
The assumption √n ≤ m ≤ e·n/ln n implies that ln n/e ≤ n/m ≤ √n. Substituting k = α·log n/log(n/m):

    k²/(m−k) ≤ ( α·log n/log(n/m) )² / ( √n − α·log n/log(n/m) )
             = α²·log² n / ( log²(n/m)·√n − α·log(n/m)·log n )
             ≤ α²·log² n / ( (log(ln n/e))²·√n − α·(1/2)·log² n )
             → 0 .                                                                   (3.8)

Combining (3.7) and (3.8) and plugging them into the bound presented in Equation 3.6:

    Pr(X_i ≥ k) > (1 + o(1)) · e^{−m/n}·(m/n)^k / k! .                               (3.9)
Combining (3.5) and (3.9)³:

    (2 + o(1)) · e^{−m/n}·(m/n)^k/k!  >  Pr(X_i ≥ k)  >  (1 + o(1)) · e^{−m/n}·(m/n)^k/k! .

Due to linearity of expectation (3.1):

    E[S] = n · Pr(X_i ≥ k) = Θ( n · e^{−m/n}·(m/n)^k/k! ) = Θ( γ(m/n) ) .            (3.10)

By Claim 3.15:

    lim_{n→∞} E[S] = 0   if α > 1 ,
                     ∞   if 0 < α < 1/3 .                                            (3.11)
To apply Claim 3.14 it remains to show that Equation 3.1 holds. Indeed, as justified below:

    E[χ_i^{(k)}·χ_j^{(k)}] = Pr(X_i ≥ k, X_j ≥ k)
      = Σ_{k₁=k}^{n−k} Σ_{k₂=k}^{n−k₁} C(n,k₁)·C(n−k₁,k₂)·(1/n)^{k₁+k₂}·(1 − 2/n)^{n−(k₁+k₂)}
      ≤ Σ_{k₁=k}^{n} Σ_{k₂=k}^{n} C(n,k₁)·C(n,k₂)·(1/n)^{k₁+k₂}·(1 − 1/n)^{2n−2(k₁+k₂)}
      = [ Σ_{k₁=k}^{n} C(n,k₁)·(1/n)^{k₁}·(1 − 1/n)^{n−k₁}·(1 − 1/n)^{−k₁} ]
          · [ Σ_{k₂=k}^{n} C(n,k₂)·(1/n)^{k₂}·(1 − 1/n)^{n−k₂}·(1 − 1/n)^{−k₂} ]     (3.12)
      ≤ ( (1 − 1/n)^{−k} · Pr(X_i ≥ k) )²
      = ( (1 − 1/n)^{−k} · E[χ_i^{(k)}] )²
      ≤ (1 + o(1)) · E²[χ_i^{(k)}] .

The third step is valid since C(n−k₁, k₂) ≤ C(n, k₂) and (1 − 2/n) ≤ (1 − 1/n)². The last inequality is valid for k = α·log n/log(n/m), since:

    k/n = α·log n/(n·log(n/m)) ≤ α·log n/(n·log(ln n/e)) → 0 ,

hence:

    (1 − 1/n)^{−k} < e^{2k/n} = 1 + o(1) .

³Note how closely the tail of the Poisson distribution approximates the tail of the binomial distribution for this special value of k. This is the reason for the relative simplicity of the proof for this specific setting of the parameters.
To complete the proof we apply Claim 3.14, since Equation 3.11 and Equation 3.12 imply that the premises of Claim 3.14 hold.
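The statement of Theorem 3.16 can also be observed empirically. The sketch below is our own illustration (the parameter values and helper name are arbitrary choices satisfying √n ≤ m ≤ e·n/ln n); it tosses m balls into n bins uniformly at random and compares the maximum load against the log n / log(n/m) scale:

```python
import math
import random
from collections import Counter

def max_load(m, n, rng):
    """Toss m balls into n bins independently and uniformly at random
    and return the maximum bin load."""
    return max(Counter(rng.randrange(n) for _ in range(m)).values())

rng = random.Random(0)                # fixed seed for reproducibility
n, m = 10 ** 5, 10 ** 3               # sqrt(n) <= m <= e*n/ln n holds here
obs = max_load(m, n, rng)
pred = math.log(n) / math.log(n / m)  # the Theta(.) scale: 2.5 for these values
```

For these values the observed maximum load is a small constant, matching the predicted scale up to the hidden constant of the Θ(·) bound.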
Chapter 4
Theoretical Analysis
This chapter deals with the analysis of a load balancing algorithm with retries. We refer to this algorithm as retry (see the listing in Algorithm 4.1).
This algorithm captures the major properties of Algorithm H-retry presented in
Chapter 5. These common properties are: tossing two independent replicas for each ball,
rejection of a replica based on a threshold, and re-throwing the doubly rejected balls (i.e.,
retry).
We assign heights [1..k] to balls in a bin with k balls.
In Figure 4.1 we depict the bin loads in an experiment in which m = 16·10⁶ balls are tossed uniformly and independently at random into n = 8·10⁶ bins. The x-axis depicts the bins in descending load order. The y-axis depicts the load of each bin. We set a threshold T and consider all balls of height greater than T. These balls are called the excess replicas. A ball is doubly rejected if both of its replicas are excess replicas.
Given the number k of rejected replicas (the replicas above the threshold line in Figure 4.1), both replicas of a ball are rejected with probability k·(k−1)/(2n·(2n−1)) < (k/2n)². The conditioning on the number of rejected replicas enables us to bound the expected number of doubly rejected balls. To use this conditioning on k, we show that the number of doubly rejected balls is concentrated. The proof technique is based on showing that our setting matches that of Theorem 3.12 (the Azuma-Hoeffding inequality with the Lipschitz condition). To bound the expected number of doubly rejected balls we need a bound on the variance of the number of rejected replicas. Again, Theorem 3.12 is used to bound the variance.
Once we have a high probability bound on the number of doubly rejected balls, we use Theorem 3.13 to conclude that the maximum load that retry produces is Θ(√(lg n / lg lg n)) w.h.p.
4.1 Algorithm retry: Description
Let height(i_j(b)) denote the height of ball b in the bin i_j(b), for j ∈ {1, 2, 3}.
Algorithm 4.1 A balls-into-bins algorithm with one retry.

retry(threshold T, number of bins n):

1. Round 1:

   (a) Each ball b chooses uniformly at random two bins i_1(b), i_2(b) and sends requests to these bins.

   (b) A ball b receives reject messages from the bins. We denote the rejected replicas of a ball b by R(b), namely, R(b) ≜ { j ∈ {1, 2} : height(i_j(b)) > T }.

2. Round 2:

   (a) If |R(b)| = 2 (i.e., b is doubly rejected) then b chooses uniformly at random a bin i_3(b).
Note that Algorithm retry is non-adaptive, as i_1(b), i_2(b), i_3(b) could have been chosen before any communication took place.

One could withdraw, at the end of Round 2, one replica of each ball neither of whose replicas was rejected. Since the analysis of Algorithm retry does not benefit from this withdrawal, the description of Algorithm retry does not include one.
4.2 Analyzing the Number of Rejected Replicas
Figure 4.1: The distribution of bin loads in an experiment with 8·10⁶ bins and 16·10⁶ balls. The x-axis depicts the bins in descending load order. The y-axis depicts the load of each bin. The leftmost bar represents bins with load 5 or higher; the threshold and the rejected replicas above it are marked.
Suppose that m balls are tossed into n bins independently and uniformly at random, and let X_1^{(m)}, X_2^{(m)}, ..., X_n^{(m)} denote the number of balls in each bin. Note that if m = 2n then X_i^{(m)} equals the load of bin i at the end of Step 1a (namely, each replica is considered as one of the m balls). To simplify notation we denote X_i^{(m)} by X_i whenever the value of m is clear.

The excess load in bin i equals X_i − T (i.e., the number of replicas that are above the threshold). Let X = (X_1, ..., X_n) and

    f(X) ≜ Σ_{i=1}^{n} max(X_i − T, 0) = Σ_{b=1}^{n} |R(b)| .

If m = 2n then f(X) equals the number of reject messages in Step 1b.
The following claim bounds the expected number of reject messages. We use linearity of expectation (3.1), the binomial distribution inequalities (3.5), and bound the tail of a Poisson distribution by a geometric series. The claim states that the expected value is Θ( n·2^{T+1}/(T+1)! ).
Claim 4.1. If the threshold T satisfies 6 ≤ T ≤ √n, then

    e^{−5}·n·2^{T+1}/(T+1)!  ≤  E[f(X)]  ≤  2n·2^{T+1}/(T+1)! .

Proof. Let a_k ≜ (k − T)·e^{−2}·2^k/k!. It follows that for j ≥ T+1:

    a_{j+1}/a_j = (j+1−T)·2 / ((j−T)·(j+1)) = ( 1 + 1/(j−T) ) · 2/(j+1) ≤ 4/(T+2) .

If T ≥ 6 then:

    a_{j+1}/a_j ≤ 4/(T+2) ≤ 1/2 .
Since the random variables X_1, X_2, ..., X_n are identically distributed, by linearity of expectation:

    E[f(X)] = E[ Σ_{i=1}^{n} max(X_i − T, 0) ] = n · E[max(X_i − T, 0)] .

Note that each X_i is a binomial random variable with parameters 2n and 1/n. Then, by Claim 3.5, for any fixed k = 0, 1, 2, ..., 2n:

    Pr(X_i = k) < (e^{−2}·2^k/k!)·e^{k/n} ≤ (e^{−2}·2^k/k!)·e² .

Note that for any fixed k = 0, 1, 2, ..., √n the following holds:

    k²/(2n−k) + 2/(n−1) < n/(2n−√n) + 2 = 1/(2 − 1/√n) + 2 < 3 .
Hence for any fixed k = 0, 1, 2, ..., √n, due to Claim 3.5:

    Pr(X_i = k) > (e^{−2}·2^k/k!)·e^{−k²/(2n−k) − 2/(n−1)} > (e^{−2}·2^k/k!)·e^{−3} .

It follows that:

    n·E[max(X_i − T, 0)] ≤ n · Σ_{k=T+1}^{2n} a_k·e²
        ≤ n·e² · ( a_{T+1} + Σ_{k=T+1}^{2n} a_{T+1} · Π_{j=T+1}^{k} (a_{j+1}/a_j) )
        ≤ n·e² · ( a_{T+1} + a_{T+1} · Σ_{k=T+1}^{2n} (1/2)^{k−(T+1)+1} )
        ≤ n·e² · ( a_{T+1} + a_{T+1}·(1/2)·Σ_{k=0}^{∞} (1/2)^k )
        = 2e²·n·a_{T+1} ,

and since a_{T+1} = e^{−2}·2^{T+1}/(T+1)!, this equals 2n·2^{T+1}/(T+1)!, which completes the proof of the upper bound.

The lower bound is proved as follows:

    E[f(X)] > n · Σ_{k=T+1}^{2n} a_k·e^{−3} ≥ e^{−3}·n·a_{T+1} = e^{−5}·n·2^{T+1}/(T+1)! .
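A quick Monte Carlo sanity check of Claim 4.1 can be written as follows; this is our own sketch, and n, T, the seed, and the number of trials are arbitrary illustrative choices:

```python
import math
import random

def excess_replicas(n, T, rng):
    """f(X) = sum_i max(X_i - T, 0) when 2n replicas are tossed into n bins."""
    loads = [0] * n
    for _ in range(2 * n):
        loads[rng.randrange(n)] += 1
    return sum(max(x - T, 0) for x in loads)

rng = random.Random(3)
n, T, trials = 10 ** 5, 6, 5
avg_f = sum(excess_replicas(n, T, rng) for _ in range(trials)) / trials

central = n * 2 ** (T + 1) / math.factorial(T + 1)   # n * 2^{T+1}/(T+1)!
lower, upper = math.exp(-5) * central, 2 * central   # the bounds of Claim 4.1
```

The averaged estimate falls comfortably between the two bounds; the gap between them reflects the loose constants (e⁻⁵ versus 2) rather than the Θ(·) scale itself.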
The following claim shows that the number of rejected replicas is concentrated around its expected value. We use the Azuma-Hoeffding inequality with the Lipschitz condition (3.12). The claim states that the probability that the number of rejected replicas deviates from its expected value by ε is at most 2·e^{−ε²/n}.

For i, j ∈ ℕ let δ_{i,j} ≜ 1 if i = j, and 0 otherwise.
Claim 4.2. Pr( |f(X) − E[f(X)]| ≥ ε ) ≤ 2·e^{−ε²/n}.

Proof. There are 2n replicas of balls. We denote a replica by 1 ≤ β ≤ 2n. Let ξ_β denote the bin that replica β is sent to. The random variables {ξ_β}_{β=1}^{2n} are independent and uniformly distributed. The load in bin i satisfies X_i^{(2n)} = Σ_{β=1}^{2n} δ_{ξ_β,i}. Let ξ = (ξ_1, ..., ξ_{2n}) and

    f(ξ) ≜ Σ_{i=1}^{n} max( Σ_{β=1}^{2n} δ_{ξ_β,i} − T, 0 ) ,

then f(ξ) = f(X). Let Z_0 = E[f(ξ)] and Z_k = E[f(ξ) | ξ_1, ξ_2, ..., ξ_k]. The sequence Z_0, Z_1, ... is a Doob martingale. Note that Z_{2n} = E[f(ξ) | ξ_1, ξ_2, ..., ξ_{2n}] = f(ξ). The function f satisfies the Lipschitz condition with bound¹ 1. Now we may apply the Azuma-Hoeffding inequality special case (3.12):

    Pr( |f(ξ) − E[f(ξ)]| ≥ ε ) = Pr( |Z_{2n} − Z_0| ≥ ε ) ≤ 2·e^{−ε²/n} .

The claim follows.
We also bound the second moment.

Corollary 4.3. Pr( (f(X) − E[f(X)])² ≥ γ ) ≤ 2·e^{−γ/n}.

The following claim bounds the variance of the number of rejected replicas. We use Corollary 4.3. The claim states that the variance of the number of rejected replicas is at most 4n.

Claim 4.4. Var(f(X)) ≤ 4n.
Proof. By Corollary 4.3:

    Pr( (f(X) − E[f(X)])² ≥ α·n ) ≤ 2·e^{−α} .

¹Moving a replica may result in either: (a) no change in the excess-load summation (non-excess replica → non-excess replica, or excess replica → excess replica), (b) a decrease of 1 (excess replica → non-excess replica), or (c) an increase of 1 (non-excess replica → excess replica).
Now we can bound Var(f(X)):

    Var(f(X)) = E[ (f(X) − E[f(X)])² ]
              ≤ Σ_{k=1}^{∞} n·k · Pr( (f(X) − E[f(X)])² ∈ ((k−1)n, kn] )
              ≤ Σ_{k=1}^{∞} n·k · Pr( (f(X) − E[f(X)])² ≥ (k−1)·n )
              ≤ Σ_{k=1}^{∞} n·k · 2·e^{−(k−1)}
              = 2e·n · Σ_{k=1}^{∞} k·e^{−k}
              ≤ 2e·n · ∫_{1}^{∞} x·e^{−x} dx
              = 2e·n · [ −e^{−x}·(x+1) ]_{1}^{∞}
              = 4n .
4.3 Analyzing the Number of Doubly Rejected Balls
Let g(n) ≜ |{ 1 ≤ b ≤ n : |R(b)| = 2 }| be the function that counts the number of balls both replicas of which are rejected in Step 1b (i.e., the doubly rejected balls).

The following claim bounds the expected number of doubly rejected balls. We use conditioning on the number of excess replicas and Claim 4.4. The claim states that the expected number of doubly rejected balls is Θ( E²[f(X)] / 4n ).

Claim 4.5.  E²[f(X)]/4n − 1 ≤ E[g(n)] ≤ E²[f(X)]/4n + 1 .
Proof. Let us consider E[g(n)]. For the upper bound:

    E[g(n)] = Σ_k E[ g(n) | f(X) = k ] · Pr( f(X) = k )
            = Σ_k n · (k/2n) · ((k−1)/(2n−1)) · Pr( f(X) = k )
            ≤ Σ_k n · (k/2n)² · Pr( f(X) = k )                                       (4.1)
            = (1/4n) · Σ_k k² · Pr( f(X) = k )
            = (1/4n) · E[ f(X)² ]
            = (1/4n) · ( Var(f(X)) + E²[f(X)] ) ,

and for the lower bound:

    E[g(n)] = Σ_k E[ g(n) | f(X) = k ] · Pr( f(X) = k )
            = Σ_k n · (k/2n) · ((k−1)/(2n−1)) · Pr( f(X) = k )
            ≥ Σ_k n · [ (k/2n)² − 1/n ] · Pr( f(X) = k )                             (4.2)
            = Σ_k n · (k/2n)² · Pr( f(X) = k ) − Σ_k Pr( f(X) = k )
            = (1/4n) · Σ_k k² · Pr( f(X) = k ) − 1
            = (1/4n) · E[ f(X)² ] − 1
            = (1/4n) · ( Var(f(X)) + E²[f(X)] ) − 1 .

Claim 4.4 implies that 0 ≤ Var(f(X)) ≤ 4n; combining this with Equations 4.1 and 4.2:

    E²[f(X)]/4n − 1 ≤ E[g(n)] ≤ (1/4n)·( 4n + E²[f(X)] ) ,

and the claim follows.
The following two claims bound E[g(n)] for the range ln ln n ≤ T ≤ √(ln n / ln ln n). These claims state that

    (1/(5e¹⁰)) · n^{3/4}·ln ln n / ln n  <  E[g(n)]  <  n/ln n .

Claim 4.6. Let T ≥ ln ln n. Then E[g(n)] < n·(2^T/T!)² < n/ln n.
Proof. Let us consider the first inequality. Claims 4.5 and 4.1 imply that:

    E[g(n)] ≤ E²[f(X)]/4n + 1
            ≤ ( 2n·2^{T+1}/(T+1)! )² / 4n + 1
            = n · ( 2·2^T / (T!·(T+1)) )² + 1
            ≤ 2n · ( 4/(T+1)² ) · (2^T/T!)²
            < n · (2^T/T!)² .

Let us consider the second inequality. For T ≥ ln ln n:

    n·(2^T/T!)² = n·e^{2 ln 2·T − (2+o(1))·T·ln T}
                < n·e^{2T − (2+o(1))·T·ln T}
                = n·e^{−T·((2+o(1))·ln T − 2)}
                = n / e^{T·((2+o(1))·ln T − 2)}
                ≤ n / e^{ln ln n·(ln ln ln n − 2)}
                < n / e^{ln ln n}
                = n / ln n .
Claim 4.7. Let T ≤ √(ln n / ln ln n). Then E[g(n)] > (1/(5e¹⁰)) · n^{3/4}·ln ln n / ln n.
Proof. Claims 4.5 and 4.1 imply that:

    E[g(n)] ≥ E²[f(X)]/4n − 1
            ≥ ( e^{−5}·n·2^{T+1}/(T+1)! )² / 4n − 1                                  (4.3)
            = (e^{−10}/4) · n · ( 2^{T+1}/(T+1)! )² − 1
            = ( e^{−10}/(T+1)² ) · n · (2^T/T!)² − 1 .

Let us bound n·(2^T/T!)² for T ≤ √(ln n/ln ln n). Since for n ≥ e^{e^{e²}} we have √(ln n·ln ln n) < (1/6)·ln n, then:

    n·(2^T/T!)² = n·e^{2 ln 2·T − (2+o(1))·T·ln T}
                > n·e^{T − (2+o(1))·T·ln T}
                = n·e^{−T·((2+o(1))·ln T − 1)}
                > n / e^{3T·ln T}
                ≥ n / e^{(3/2)·√(ln n/ln ln n)·ln(ln n/ln ln n)}                     (4.4)
                > n / e^{(3/2)·√(ln n/ln ln n)·ln ln n}
                = n / e^{(3/2)·√(ln n·ln ln n)}
                > n / e^{(3/2)·(1/6)·ln n}
                = n^{3/4} .

Let us bound 1/(T+1)² for T ≤ √(ln n/ln ln n):
    1/(T+1)² ≥ 1 / ( √(ln n/ln ln n) + 1 )²
             > 1 / ( 2·√(ln n/ln ln n) )²                                            (4.5)
             = (1/4) · ln ln n / ln n .

Combining Equations 4.3, 4.4, and 4.5:

    E[g(n)] ≥ ( e^{−10}/(T+1)² ) · n · (2^T/T!)² − 1
            > (1/(4e¹⁰)) · (ln ln n/ln n) · n^{3/4} − 1
            > (1/(5e¹⁰)) · n^{3/4}·ln ln n / ln n .
The following claim shows that the number of doubly rejected balls is concentrated around its expected value. The proof is similar to the proof of Claim 4.2; we use the Azuma-Hoeffding inequality with the Lipschitz condition (Theorem 3.12). The claim states that the probability that the number of doubly rejected balls deviates from its expected value by ε is at most 2·e^{−ε²/n}.

Claim 4.8. Pr( |g(n) − E[g(n)]| ≥ ε ) ≤ 2·e^{−ε²/n}.

Proof. There are 2n replicas of balls. We denote a replica by 1 ≤ β ≤ 2n. Let ξ_β denote the bin that replica β is sent to. Hence, g(n) is a function of the random variables {ξ_β}_{β=1}^{2n}, which are independent and uniformly distributed. Let Z_0 = E[g(n)] and Z_k = E[g(n) | ξ_1, ξ_2, ..., ξ_k]. The sequence Z_0, Z_1, ... is a Doob martingale. Note that Z_{2n} = E[g(n) | ξ_1, ξ_2, ..., ξ_{2n}] = g(n). The function g satisfies the Lipschitz condition with bound² 1. Now we may apply the Azuma-Hoeffding inequality special case (3.12):

    Pr( |g(n) − E[g(n)]| ≥ ε ) = Pr( |Z_{2n} − Z_0| ≥ ε ) ≤ 2·e^{−ε²/n} .

²We consider a worst-case scenario. Changing the bin allocated to a replica may affect g(n) as follows: (a) no change in the number of rejected balls, if the replica was excess and remains excess; (b) no change, if the replica was non-excess and remains non-excess; (c) a decrease of 1, if the replica is non-excess after the change and was excess before; (d) an increase of 1, if the replica is excess after the change and was non-excess before.
Claims 4.2 and 4.8 are an adaptation of an application of the Azuma-Hoeffding inequality that appeared in [9], where a concentration result for the number of empty bins is proved. We would like to emphasize that both Claims 4.2 and 4.8 do not depend on the value of T.
Corollary 4.9. Let ln ln n ≤ T ≤ √(ln n/ln ln n). Then the following inequalities hold with probability 1 − 1/n:

1. |g(n) − E[g(n)]| ≤ √(n·ln n) .

2. g(n) = Θ( E²[f(X)] / 4n ) .

3. g(n) < (1 + o(1)) · n/ln n .
Proof. Plugging ε = √(n·ln n) into Claim 4.8 and taking the complement probability implies Part (1).

Claims 4.6 and 4.7 state:

    (1/(5e¹⁰)) · n^{3/4}·ln ln n / ln n  <  E[g(n)]  <  n/ln n .                     (4.6)
Combining Part (1) and Equation 4.6:

    g(n) ≤ E[g(n)] + √(n·ln n)
         = E[g(n)] · ( 1 + √(n·ln n)/E[g(n)] )
         < E[g(n)] · ( 1 + √(n·ln n) / ( (1/(5e¹⁰))·n^{3/4}·ln ln n/ln n ) )         (4.7)
         = E[g(n)] · ( 1 + 5e¹⁰·ln^{3/2} n / (n^{1/4}·ln ln n) )
         = E[g(n)] · (1 + o(1)) ,

and similarly:

    g(n) ≥ E[g(n)] − √(n·ln n)
         = E[g(n)] · ( 1 − √(n·ln n)/E[g(n)] )
         > E[g(n)] · ( 1 − √(n·ln n) / ( (1/(5e¹⁰))·n^{3/4}·ln ln n/ln n ) )
         = E[g(n)] · ( 1 − 5e¹⁰·ln^{3/2} n / (n^{1/4}·ln ln n) )
         = E[g(n)] · (1 − o(1)) ,

hence g(n) = (1 ± o(1))·E[g(n)]. Part (2) now follows by Claim 4.5.

Equations 4.6 and 4.7 prove Part (3).
4.4 Putting it All Together
The following theorem and claims bound the maximum load that Algorithm retry produces w.h.p. We use Corollary 4.9 and the high probability bound in Theorem 3.16. Claim 4.11 states that if the threshold T is in the interval [ln ln n, √(ln n/ln ln n)], then the optimal threshold is Θ(√(lg n/lg lg n)), which implies a maximum load of Θ(√(lg n/lg lg n)) w.h.p. Theorem 4.10 shows that there is a trade-off between the threshold and the additional load that Step 2a incurs (see Figure 4.2); i.e., thresholds other than T = Θ(√(lg n/lg lg n)) in the interval [ln ln n, √(ln n/ln ln n)] incur a higher load in Step 2a. Claim 4.12 states that selecting a threshold lower than ln ln n or higher than √(ln n/ln ln n) does not improve the maximum load.
Figure 4.2: The trade-off between the threshold and the additional load from Step 2a. The x-axis denotes the threshold T, and the y-axis depicts the two terms T and ln n/(T·ln T).
Theorem 4.10. Let ln ln n ≤ T ≤ √(ln n/ln ln n). Then the maximum load that Algorithm retry produces is Θ( T + ln n/(T·ln T) ) w.h.p.

Proof. Corollary 4.9 implies that g(n) < (1 + o(1))·n/ln n w.h.p. Then, using the high probability bound in Theorem 3.16 for throwing m = g(n) balls into n bins implies that the additional load that Step 2a incurs is Θ( ln n / ln(n/m) ) w.h.p.

Corollary 4.9 and Claim 4.1 imply that:

    g(n) = Θ( E²[f(X)] / 4n ) = Θ( n·( 2^{T+1}/(T+1)! )² ) .
Plugging m = g(n) into Θ( ln n / ln(n/m) ):

    Θ( ln n / ln(n/m) ) = Θ( ln n / ln( ((T+1)!/2^{T+1})² ) )
                        = Θ( ln n / ln( (T+1)!/2^{T+1} ) )
                        = Θ( ln n / (T·ln T) ) .

Since the maximum load that Step 1a produces is T, and since the additional load that Step 2a produces is Θ( ln n/(T·ln T) ), the maximum load is Θ( T + ln n/(T·ln T) ) w.h.p.
Claim 4.11. Let ln ln n ≤ T ≤ √(ln n/ln ln n). Then the maximum load that Algorithm retry produces is minimized for T = Θ(√(lg n/lg lg n)), and it is Θ(√(lg n/lg lg n)) w.h.p.

Proof. Theorem 4.10 implies that the maximum load is:

    Θ( T + ln n/(T·ln T) )                                                           (4.8)

for T in the interval [ln ln n, √(ln n/ln ln n)].

Bound 4.8 is minimized for T = ln n/(T·ln T), which implies that T = Θ(√(lg n/lg lg n)). We conclude that if T = Θ(√(lg n/lg lg n)) then the maximum load is Θ(T) w.h.p.
Claim 4.12. Let √(ln n/ln ln n) < T, or T < ln ln n. Then the maximum load that Algorithm retry produces is Ω(√(lg n/lg lg n)) w.h.p.

Proof. If T < ln ln n then the number of doubly rejected balls (i.e., g(n)) increases, thus increasing the additional load that Step 2a incurs.

If √(ln n/ln ln n) < T then the maximum load will be at least min{ T, ln n/ln ln n }, since the maximum load is Ω( ln n/ln ln n ) w.h.p. [7, 10].
Chapter 5
Simulation of Algorithm H-retry
5.1 The New Algorithm
The new algorithm combines ideas from pgreedy and from the threshold algorithm of
Stemann. Let T denote the threshold used by the algorithm. The algorithm is non-
adaptive, symmetric, and requires only one synchronization (similarly to [11]).
Overview. Instead of picking a bin that assigned the smallest height to ball t (as in pgreedy), the ball forwards the heights to the other bin. Only after the bin receives all the requests and the heights of the siblings of the balls in the bin are reject and accept decisions made.

First, the bin performs a safe delete step. The idea is that, if the local height is larger than the height of a sibling, then the bin can safely remove the ball. We denote the difference between the local height of ball t and the height of its sibling by δ_t. Let |bin_i| denote the number of balls that have sent requests to the i-th bin. We apply safe deletion only to at most |bin_i| − T balls. If a bin still contains too many balls after the safe delete step, then the excess delete step takes place.
Note that, after the safe delete step, every ball t in an overloaded bin has δt ≤ 0.
During the excess delete step, balls are removed until the number of balls equals the
threshold. Balls with δt closer to zero are deleted first, and among balls with the same
δt, lower balls are deleted first.
After safe deletion and excess deletion are completed, reject and accept messages are
sent to all the balls. An accept message is also accompanied by the load of the bin. Thus, each ball knows how many of its siblings were accepted. If both siblings are accepted, then the ball sends a withdraw message to the bin with the higher load. If both siblings were rejected, then we re-throw these doubly rejected balls.
Each doubly rejected ball uniformly re-chooses two random bins. These bins accept
only if accepting the new ball does not overload the bin. Our experiments show that,
even for 8 million balls, only a handful of balls are finally rejected. One could reassign
them, if needed, using an extra round. Alternatively, one could re-choose three random
bins instead of two or simply accept the re-thrown balls while increasing the maximum
load only by one with high probability.
Description. To make reading easier, we present a "sequential" description of the algorithm below. We use the same notation used for the description of pgreedy.

1a Each ball t chooses two bins b_1(t) and b_2(t) independently and uniformly at random. (For simplicity, assume that b_1(t) ≠ b_2(t).) The ball t sends requests to bins b_1(t) and b_2(t).
1b Each bin i assigns heights to the requesting balls (according to their order of arrival),
and sends the height hi(t) to ball t. Let bini denote the set of balls that have sent
requests to bin i.
2a Each ball t forwards its height in one bin to the other bin (containing the sibling
ball).
2b We define a synchronization point at this stage: for each ball t in bin i, the bin knows the local height h_i(t) and the height h_j(t) of the sibling of t in bin j. Let δ_t ≜ h_i(t) − h_j(t). (Note that δ_t is a local variable in bin i; in bin j, δ_t = h_j(t) − h_i(t).) We now apply two deletion steps: safe delete and excess delete. The safe delete step proceeds as follows. Let A_i = {t′_1, t′_2, ...} denote the subset of bin_i consisting of balls t with δ_t > 0, sorted in descending height order (i.e., h_i(t′_1) > h_i(t′_2) > ···). Let α_i ≜ min{ |A_i|, |bin_i| − T }. Each bin i removes the α_i highest balls. Thus bin_i ← bin_i \ {t′_1, ..., t′_{α_i}}.
After the safe delete step, the bin might still be overloaded. In this case, we apply the excess delete step to balls with δ_t ≤ 0. Let β_i = max{ 0, |bin_i| − T }. We need to remove β_i balls from bin_i. Note that balls with δ_t > 0 whose height is below the threshold T were not deleted in the safe delete step and are not deleted in the excess delete step.

The excess delete step proceeds as follows. We first sort the balls with δ_t ≤ 0 in each bin in lexicographic ascending order according to (−δ_t, h_i(t)). We then simply remove from bin i the β_i first balls according to this lexicographic order. Namely, we first remove balls with δ_t = 0, and among them we remove balls with the smallest height. We then continue with balls with δ_t = −1, and so on.
A reject message is sent to each ball removed from bini. An accept message is sent
to the balls that remain in bini together with the current load of the bin (i.e., |bini|).
3a For each ball t, if t received two accept messages, then t sticks with a bin with the
lower load, and sends a withdraw message to the other bin.
For each ball t, if t received two reject messages, then t re-chooses two bins b3(t), b4(t)
independently and uniformly at random. The ball t sends requests to these two bins.
We refer to these requests as re-throws of ball t.
3b (No messages are sent in this step.) A bin i that receives a withdraw message from
ball t simply removes t from bini.
A bin i that receives a re-throw request from ball t, accepts t only if |bini| < T .
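The steps above can be condensed into a sequential Python sketch. Everything here is our own illustrative rendering (data layout, tie-breaking, and parameter values); duplicates from re-throws are kept, as the text discusses:

```python
import random

def h_retry(n, T, rng):
    """Sequential sketch of the new algorithm: n balls, n bins, threshold T.
    Returns (final bin loads, number of finally rejected balls)."""
    # Steps 1a/1b: two distinct bins per ball; heights by order of arrival.
    choices, bins, height = [], [[] for _ in range(n)], {}
    for t in range(n):
        b1 = rng.randrange(n)
        b2 = rng.randrange(n)
        while b2 == b1:
            b2 = rng.randrange(n)
        choices.append((b1, b2))
        for i in (b1, b2):
            bins[i].append(t)
            height[(i, t)] = len(bins[i])
    member = [set(b) for b in bins]

    def delta(i, t):                       # local height minus sibling height
        b1, b2 = choices[t]
        j = b2 if i == b1 else b1
        return height[(i, t)] - height[(j, t)]

    # Step 2b: safe delete, then excess delete, in each overloaded bin.
    for i in range(n):
        if len(member[i]) <= T:
            continue
        safe = sorted((t for t in member[i] if delta(i, t) > 0),
                      key=lambda t: -height[(i, t)])
        for t in safe[:len(member[i]) - T]:       # remove the highest ones
            member[i].discard(t)
        if len(member[i]) > T:
            excess = sorted((t for t in member[i] if delta(i, t) <= 0),
                            key=lambda t: (-delta(i, t), height[(i, t)]))
            for t in excess[:len(member[i]) - T]:
                member[i].discard(t)

    # Step 3a: withdrawals (using the loads sent with the accept messages)
    # and re-throws of doubly rejected balls.
    sent_loads = [len(m) for m in member]
    rethrow = []
    for t in range(n):
        acc = [i for i in choices[t] if t in member[i]]
        if len(acc) == 2:                  # doubly accepted: leave the fuller bin
            drop = max(acc, key=lambda i: sent_loads[i])
            member[drop].discard(t)
        elif not acc:
            rethrow.append(t)

    # Step 3b: re-thrown replicas are accepted only by bins with load < T.
    loads = [len(m) for m in member]
    finally_rejected = 0
    for t in rethrow:
        placed = False
        for _ in range(2):
            i = rng.randrange(n)
            if loads[i] < T:
                loads[i] += 1              # duplicates possible; see Discussion
                placed = True
        finally_rejected += not placed
    return loads, finally_rejected

loads, rejected = h_retry(10 ** 4, 3, random.Random(1))
```

After step 2b every bin holds at most T balls and re-throws never push a bin past the threshold, so the final maximum load is at most T; in our runs only a handful of balls, if any, remain finally rejected, in line with the experiments of Section 5.2.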
Discussion. The new algorithm is symmetric, non-adaptive, uses a constant number
of rounds, and requires only one synchronization point.
Our experiments are, of course, synchronous. This leads to a "layering" phenomenon, since two siblings are more likely to receive the same height. In Adler et al. [1], shuffling of heights appears in the mpgreedy(d) algorithm. We note that the mpgreedy algorithm requires multiple synchronous rounds. One could move the synchronization point in our algorithm to step 1b, namely, each bin waits for all the requests. If this is the case, then the balls could be shuffled before heights are assigned. The synchronization point in step 2b would no longer be required, since the bin waits for messages containing the heights of the siblings from the balls already in the bin.
There are many ways to deal with doubly rejected balls. First, since they are so few, one could simply have each such ball choose a random bin; the re-thrown balls then incur only a constant additional load. This is perhaps the simplest solution.
A second option is to reserve a small portion of the bins for re-throws so that in step 1a
the reserved bins are not chosen. In step 3a, each doubly rejected ball chooses one or two
bins among the reserved bins.
Duplicate siblings due to re-throws can be removed by adding a round (called step
4a). Namely, send accept and reject messages in step 3b so a ball can send withdraw
messages in step 4a to eliminate duplicates. We emphasize that the complications caused
by re-throws are due to very few balls, hence, it is not clear that these issues are of
practical interest.
5.2 Experimental Results
5.2.1 Maximum load
We conducted experiments for n (the number of balls) ranging from 1 million to 8 million.
For each n, the results for 50 trials are presented in Table 5.1. The value of the threshold
was T = 3 in all cases, except for n = 8 · 106, where we also used T = 4 (last row).
The frequencies of the bin loads are presented. For example, a load of zero means the
bin is empty. For each load, the range of frequencies in the experiments is given by
the median and half the difference between the maximum and the minimum frequency.
Note that the load frequencies are sharply concentrated. The column labeled #re-throws contains the number of balls that were doubly rejected and therefore required a re-throw. Note that the number of doubly rejected balls roughly doubles as n doubles and is also sharply concentrated. We never encountered more than 5 doubly rejected balls that were still rejected after their re-throws.
    n    T    load 0               load 1                load 2               load 3             load 4        #re-throws
    1M   3    201975.5 ± 512       601483.5 ± 1028.5     191069.5 ± 608       5485 ± 129         0 ± 0         38 ± 10.5
    2M   3    404052.5 ± 784.5     1202875.5 ± 1434      382162.5 ± 721.5     10985.5 ± 255.5    0 ± 0         76 ± 31
    4M   3    808048.5 ± 1089.5    2405691.5 ± 2131      764304.5 ± 998       21931.5 ± 259.5    0 ± 0         149 ± 27
    8M   3    1616149.5 ± 1387.5   4811338.5 ± 2758      1528371 ± 1466       43972.5 ± 397      0 ± 0         310 ± 44.5
    8M   4    1596627.5 ± 1379     4840281 ± 2671.5      1529629 ± 1487.5     33367.5 ± 394.5    87 ± 21.5     0 ± 1

Table 5.1: Bin load frequencies (loads 0 through 4) and number of re-throws over 50 trials for each configuration of n (ranging from 1 million to 8 million) and threshold T. For each bin load, the frequency obtained in the trials is presented by the median and half the difference between the maximum and minimum frequency.
5.2.2 Progress of the algorithm
In table 5.2 we show how the frequencies of the bin loads change over the course of the
execution of the algorithm during four different trials (for n ranging from 1 million to 8
million and T = 3). The initial load is the load after step 1a (i.e. 2n balls in n bins).
The next two columns refer to loads in step 2b: after safe delete and after excess delete.
The safe delete step removed most of the balls in the overloaded bins. The excess delete
step removed all the remaining balls that caused overloading above the threshold. Note
that only a small fraction of the balls removed in the excess delete step turn out to be
doubly rejected.
Roughly 75% of the balls are doubly accepted. One sibling of each doubly accepted
ball is withdrawn in step 3a. Withdrawals get rid of all the current duplicates, and the
number of balls in the bins equals n minus the number of doubly rejected balls. The
final distribution of bin loads is presented in the last column (with duplicates due to
re-throws). Note that the proportions between the numbers in the tables for different
values of n remain roughly the same.
1M   Initial   After safe del   After excess del   After withdrawals of duplicates   Final
0 135640 135640 135640 202119 202104
1 270683 270683 270683 601345 601307
2 269648 269648 269648 190994 191034
3 180946 321580 324029 5542 5555
4 90499 2401 0 0 0
>=5 52584 48 0 0 0
2M   Initial   After safe del   After excess del   After withdrawals of duplicates   Final
0 270208 270208 270208 403709 403673
1 541615 541615 541615 1203520 1203448
2 541242 541242 541242 381919 382000
3 361507 642164 646935 10852 10879
4 180457 4707 0 0 0
>=5 104971 64 0 0 0
4M   Initial   After safe del   After excess del   After withdrawals of duplicates   Final
0 541389 541389 541389 807689 807623
1 1082508 1082508 1082508 2406684 2406562
2 1082659 1082659 1082659 763719 763856
3 721420 1283730 1293444 21908 21959
4 361971 9564 0 0 0
>=5 210053 150 0 0 0
8M   Initial   After safe del   After excess del   After withdrawals of duplicates   Final
0 1083559 1083559 1083559 1616708 1616604
1 2164328 2164328 2164328 4811017 4810815
2 2165986 2165986 2165986 1528105 1528297
3 1441416 2567142 2586127 44170 44284
4 723250 18675 0 0 0
>=5 421461 310 0 0 0
Table 5.2: The change in the frequencies of the bin loads during the execution of thealgorithm.
Bibliography
[1] Micah Adler, Soumen Chakrabarti, Michael Mitzenmacher, and Lars Eilstrup Ras-
mussen. Parallel randomized load balancing. Random Struct. Algorithms, 13(2):159–
188, 1998.
[2] Y. Azar, A.Z. Broder, A.R. Karlin, and E. Upfal. Balanced allocations. SIAM
journal on computing, 29(1):180–200, 2000.
[3] P. Berenbrink, F. Meyer auf der Heide, and K. Schröder. Allocating Weighted Jobs
in Parallel. Theory of Computing Systems, 32(3):281–300, 1999.
[4] Artur Czumaj and Volker Stemann. Randomized allocation processes. Random
Struct. Algorithms, 18(4):297–331, 2001.
[5] William Feller. An Introduction to Probability Theory and Its Applications, Volume
1. Wiley, January 1968.
[6] Donald E. Knuth. The Art of Computer Programming Volumes 1-3 Boxed Set.
Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA, 1998.
[7] V.F. Kolchin, B.A. Sevastyanov, and V.P. Chistyakov. Random Allocations. John
Wiley & Sons, 1978.
[8] M. Mitzenmacher, A. Richa, and R. Sitaraman. The power of two random choices:
A survey of techniques and results. Handbook of Randomized Computing, 1:255–312,
2001.
[9] M. Mitzenmacher and E. Upfal. Probability and Computing: Randomized Algorithms
and Probabilistic Analysis. Cambridge University Press, 2005.
[10] Martin Raab and Angelika Steger. "Balls into bins" - a simple and tight analysis. In RANDOM '98, pages 159-170, London, UK, 1998. Springer-Verlag.
[11] V. Stemann. Parallel balanced allocations. In Proceedings of the eighth annual ACM
symposium on Parallel algorithms and architectures, pages 261–269. ACM New York,
NY, USA, 1996.
[12] B. Voecking. How Asymmetry Helps Load Balancing. Journal of the ACM,
50(4):568–589, 2003.