TEL AVIV UNIVERSITY The Iby and Aladar Fleischman Faculty of Engineering
The Zandman-Slaner School of Graduate Studies
ON RETRIES IN PARALLEL DISTRIBUTED LOAD
BALANCING ALGORITHMS
A thesis submitted toward the degree of
Master of Science in Electrical Engineering
by
Medina Mordechai
February 2009
This research was carried out in the Department of Electrical Engineering – Systems
under the supervision of Prof. Guy Even
February 2009
Abstract
We deal with the well studied allocation problem of assigning m balls to n bins so that
the maximum number of balls assigned to the same bin is minimized. In particular, we
focus on parallel distributed algorithms for this problem. In the classical setting, each ball
randomly chooses a few bins, one of which it is eventually assigned to. The algorithms
we consider allow retries, in the sense that a ball may fall back on an additional
randomly chosen bin if its first choices are too loaded.
Our first contribution is the observation that the lower bound presented by Adler
et al. [1] is not valid if retries are allowed. We consider the question of whether smaller
maximum loads are achievable if retries are allowed. We present and analyze an algorithm
with at most one retry per ball. The analysis differs from the analyses of previous
randomized allocation algorithms. We prove tight asymptotic bounds on the maximum
load that meet previous bounds for parallel distributed load balancing algorithms.
On the more practical side, we present a parallel algorithm with retries, and demon-
strate its improved maximum load for n in the range between 1 million and 8 million.
We obtain a maximum load of 3 using 2.5 rounds of communication.
Contents

1 Introduction
  1.1 Previous Work
    1.1.1 The Greedy Algorithm
    1.1.2 The Parallel Greedy Algorithm
    1.1.3 Threshold Algorithm
  1.2 Model
    1.2.1 The Basic Parallel Model
    1.2.2 Variations on the Model
    1.2.3 Introducing Retries
  1.3 Lower Bounds
  1.4 Main Question Addressed in this Thesis
  1.5 Contributions
    1.5.1 Retries and the Basic Parallel Model
    1.5.2 Analyzing an Algorithm with Retries
    1.5.3 Previous Techniques with Proofs
    1.5.4 Simulation of Algorithm H-retry
  1.6 Organization
2 Applications
  2.1 Hashing
  2.2 Dynamic Assignment of Tasks to Servers
3 Techniques
  3.1 Expectation
  3.2 From Binomial to Poisson
    3.2.1 The Binomial Random Variable
    3.2.2 The Poisson Distribution
    3.2.3 Approximation of Binomial Distribution By Poisson Distribution
  3.3 High Probability
  3.4 Martingales and Doob Martingales
    3.4.1 Martingales
    3.4.2 The Azuma-Hoeffding Inequality
    3.4.3 The Lipschitz Condition
  3.5 Balls into Bins Tight Bound
4 Theoretical Analysis
  4.1 Algorithm retry: Description
  4.2 Analyzing the Number of Rejected Replicas
  4.3 Analyzing the Number of Doubly Rejected Balls
  4.4 Putting it All Together
5 Simulation of Algorithm H-retry
  5.1 The New Algorithm
  5.2 Experimental Results
    5.2.1 Maximum load
    5.2.2 Progress of the algorithm
Bibliography

List of Figures

4.1 The distribution of bin loads in an experiment with 8·10^6 bins and 16·10^6 balls. The x-axis depicts the bins in descending load order. The y-axis depicts the load of each bin. The leftmost bar represents bins with load 5 or higher.
4.2 The trade-off between the threshold and the additional load from Step 2a. The x-axis denotes T, and the y-axis denotes the values of T and 1/(T·ln T).

List of Tables

5.1 Results for bin load frequencies, number of re-throws, and rejection frequencies in 50 trials per four values of n ranging from 1 million to 8 million. For each bin load, the frequencies obtained in the trials are presented by the median and half the difference between the maximum and minimum frequency.
5.2 The change in the frequencies of the bin loads during the execution of the algorithm.

List of Algorithm Listings

4.1 A balls into bins algorithm with one retry.
Chapter 1
Introduction
1.1 Previous Work
Azar et al. [2] considered the problem of allocating balls to bins in a balanced way. For
simplicity, suppose that the number of balls equals the number of bins, and is denoted by
n. If each ball selects a bin uniformly and independently at random, then the maximum
load of a bin is Θ(ln n/ ln ln n) with high probability [7]. Azar et al. proved that, if each
ball chooses two random bins and is sequentially placed in the less loaded of the two,
then w.h.p. the maximum load is only ln ln n/ ln 2 + Θ(1).
This surprising improvement in the maximum load has spurred a lot of interest in
randomized load balancing in various settings. Adler et al. [1] investigated the possibility
of finding parallel load balancing algorithms. They presented upper bounds and matching
lower bounds of Θ((log n/ log log n)^{1/r}) for parallel load balancing using r rounds of
communication. Shortly after that, Stemann [11] presented another parallel algorithm
with the same asymptotic bounds. However, there is a synchronization point in the
algorithm of Stemann, while the parallel algorithms in [1] are strictly asynchronous.
Berenbrink et al. [3] generalized the lower bound proved by [1] to a nonconstant number
r of communication rounds, provided that r ≤ log log n.
1.1.1 The Greedy Algorithm
The greedy algorithm for load balancing presented in [2] is a sequential algorithm. Each
ball, in its turn, chooses d bins uniformly and independently at random. The ball queries
each of these bins for its current load (i.e., the number of balls that have already been
assigned to it). The ball is placed in a least loaded bin among its choices. Azar et al. [2]
proved that w.h.p. the maximum load at the end of this process is ln ln n/ ln d + Θ(1).
Voecking [12] presented a surprising variation in which the bins are partitioned into d
disjoint parts of n/d bins. Each ball randomly chooses one bin from each part. In addition,
ties are broken by applying an “always-go-left” rule. These two modifications lead to
smaller constants in the analysis. In fact, experiments reported in [12] show a reduction
of the maximum load from 4 to 3 for the range 2^20 ≤ n ≤ 2^24.
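The sequential d-choice process described above is straightforward to simulate. The following sketch (function and parameter names are ours, not from [2]) illustrates the power of two choices:

```python
import random

def greedy(n_balls, n_bins, d=2, rng=random):
    """Sequential d-choice greedy: each ball probes d uniformly random
    bins and is placed in a least loaded one among them."""
    load = [0] * n_bins
    for _ in range(n_balls):
        choices = [rng.randrange(n_bins) for _ in range(d)]
        best = min(choices, key=lambda i: load[i])
        load[best] += 1
    return max(load)

# With n balls and n bins, d = 2 typically yields a maximum load near
# ln ln n / ln 2 + O(1), far below the single-choice Θ(ln n / ln ln n).
```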
1.1.2 The Parallel Greedy Algorithm
We overview the parallel greedy algorithm pgreedy presented and investigated by Adler
et al. [1]. For simplicity we present the version in which d = 2 (namely, each ball chooses
two bins). We denote the balls by t ∈ [1..n] and the bins by i ∈ [1..n]. The algorithm
works as follows:
1. Each ball t chooses two bins b1(t) and b2(t) independently and uniformly at random.
(For simplicity, assume that b1(t) ≠ b2(t).) The ball t sends requests to bins b1(t)
and b2(t). Copies of the same ball in different bins are referred to as siblings.
2. Upon receiving a request from ball t, bin i responds to ball t by reporting the
number of requests it received so far. We denote this number by hi(t), and refer to
it as the height of ball t in bin i.
3. After receiving its heights from b1(t) and b2(t), ball t sends a commit to the bin
that assigned a lower height. (Tie-breaking rules are not addressed in [1].)
Adler et al. [1] proved that the maximum load achieved by pgreedy is O(√(log n/ log log n))
with high probability. They also proved a matching lower bound. In their experimental
results, increasing the number of bins chosen by each ball to d = 3 only slightly improved
the maximum load. Moreover, setting d = 4 had an adverse effect on the maximum
load. To summarize, for n between 10^6 and 8·10^6, the maximum load was 5 or 6 (for
2 ≤ d ≤ 4).
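A minimal simulation of the three steps of pgreedy, under the assumption that the 2n requests arrive at the bins in a uniformly random global order (a stand-in for asynchrony) and with ties broken toward the first choice, since [1] leaves tie-breaking unspecified:

```python
import random

def pgreedy(n, rng):
    """Sketch of PGREEDY with d = 2: each bin reports the height
    (arrival rank) of each request; every ball commits to the bin
    that reported the lower height."""
    balls = [(rng.randrange(n), rng.randrange(n)) for _ in range(n)]
    # A random global arrival order of the 2n requests models asynchrony.
    requests = [(t, b) for t, pair in enumerate(balls) for b in pair]
    rng.shuffle(requests)
    seen = [0] * n                # requests received so far, per bin
    height = {}                   # (ball, bin) -> reported height
    for t, b in requests:
        seen[b] += 1
        height[(t, b)] = seen[b]
    load = [0] * n
    for t, (b1, b2) in enumerate(balls):
        commit = b1 if height[(t, b1)] <= height[(t, b2)] else b2
        load[commit] += 1
    return max(load)
```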
1.1.3 Threshold Algorithm
The threshold algorithm threshold(T) presented by Adler et al. [1] works differently.
A threshold parameter T is used to bound the number of balls that may be assigned to
each bin in each round. Initially, all balls are unaccepted. In each round, each unaccepted
ball chooses independently and uniformly a single random bin. Each bin accepts at most
T balls among the balls that have chosen it. The other balls, if any, are rejected. Note
that, although described “in rounds”, algorithm threshold(T) can work asynchronously
as balls may proceed to the next round as soon as the replies arrive. The idea is that
the number of unaccepted balls decreases rapidly; thus, Adler et al. prove that, if
r is constant, then setting T = O((log n/ log log n)^{1/r}) requires at most r rounds with
high probability. Another threshold algorithm was considered by Stemann [11]. Each
ball begins (like in pgreedy) by choosing two random bins and sending requests to
them. Only after every bin collects all the requests does the assignment process begin. The
assignment process is iterative and proceeds as follows. Each bin that received at most
T requests accepts all of them, and notifies the requesting balls. Each accepted ball
sends a message to the other bin to withdraw the sibling ball. Bins that receive withdraw
messages have fewer requests and hence may accept balls in the next iteration. We first
note that Stemann’s algorithm is not completely asynchronous, as each bin must know
whether it will receive more than T requests before accepting balls. Stemann was able to
match the results of [1]; namely, for n balls and n bins and for a constant α, the maximum
load after r rounds of communication, where 2 ≤ r ≤ (log log n)/(3α), is
max{ (3αr · log n/ log log n)^{1/r}, 4(α + 7) }
with probability at least 1 − 1/n^{α−1}. Plugging in r = 2 and α = 2 gives a maximum load of
√(12 log n/ log log n) in 2 rounds of communication.
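The round structure of threshold(T) can be sketched as follows (the per-round cap follows the description above; function and parameter names are ours):

```python
import random

def threshold(n, T, max_rounds=50, rng=random):
    """Sketch of THRESHOLD(T): in each round, every still-unaccepted
    ball picks one uniform bin; a bin accepts at most T of the balls
    that chose it in that round and rejects the rest."""
    load = [0] * n
    unaccepted = n
    rounds = 0
    while unaccepted and rounds < max_rounds:
        rounds += 1
        arrivals = [0] * n
        for _ in range(unaccepted):
            arrivals[rng.randrange(n)] += 1
        for i in range(n):
            load[i] += min(arrivals[i], T)      # accept up to T this round
        unaccepted -= sum(min(a, T) for a in arrivals)
    return max(load), rounds
```

By construction the maximum load is at most T times the number of rounds, and in practice the number of unaccepted balls shrinks so fast that only a few rounds are needed.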
1.2 Model
1.2.1 The Basic Parallel Model
In this section we overview the model of parallel load balancing from Adler et al. [1].
There are n balls and n bins. This model is often referred to as static since balls are not
deleted.
The communication model is as follows. In the beginning, each ball t chooses a
constant number, d, of bins. From this point on, messages are sent only between pairs
(t, i), where ball t had chosen bin i. Communication proceeds in rounds. Each round
consists of messages from balls to bins and responses from bins to balls. In the last
round, each ball commits to one of the d bins that it has chosen initially. We assume that
each bin may simultaneously send messages to all the balls that have sent it a message
(in fact, w.h.p. there are only O(log n/ log log n) such messages per bin). The length of
each message is logarithmic in n, so messages can contain the ID of a ball and a bin.
A sublogarithmic bound on the number of messages sent in each round by each bin is
implied in this model. In addition, each ball sends at most d (a constant) messages per
round.
The requirements from the algorithm are as follows.
1. Nonadaptive - the random choices of each ball are made before any communication
takes place.
2. Symmetric - all the balls run the “same” program and all the bins run the “same”
program¹. Moreover, each bin is chosen uniformly and independently at random.
3. Asynchronous - balls and bins wait only for messages addressed to them. In fact, a
bin or a ball waits for a message MSG only if the protocol guarantees the transmis-
sion of message MSG. In strict asynchronous algorithms, a ball or bin may proceed
even if a round is not completed.
¹The definition of symmetry deserves a more formal treatment to avoid using IDs of balls and bins, etc. One simple definition is that the program is fixed for all values of n.
1.2.2 Variations on the Model
In this section we consider modifications of the basic parallel model. We show that trivial
protocols exist if the requirements are relaxed:
1. Relaxing the symmetry requirement: One could think of a “macro” bin that is in
charge of log n bins. Hence, we have n/log n macro bins, each in charge of a disjoint
set of log n bins. If we relax symmetry so that “macro” bins behave differently, then
a constant maximum load is easily achievable w.h.p. as follows. Each ball sends
a request to a uniformly random macro bin. Each macro bin distributes the balls
evenly among its log n bins, leading to a constant maximum load w.h.p.
2. Relaxing the nonadaptiveness requirement: We can simulate the “macro” bins (from
the previous relaxation) by violating nonadaptiveness. A constant maximum load is
easily achievable w.h.p. as follows. Each ball sends a request to a uniformly random
bin. Contiguous disjoint blocks of log n bins “share” their requests, determine a
balanced allocation, and send messages back to the balls with a new target bin.
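The first relaxation can be checked empirically. In this sketch (an illustration, with our own names) each macro bin spreads its requests evenly over its block of log n bins, so its worst bin receives the ceiling of its request count divided by the block size:

```python
import math
import random

def macro_bins(n, rng):
    """Symmetry relaxation sketch: n/log n "macro" bins, each managing
    a disjoint block of log n bins, spread their requests evenly."""
    block = max(1, round(math.log(n)))
    n_macro = n // block
    counts = [0] * n_macro
    for _ in range(n):                      # each ball picks a macro bin
        counts[rng.randrange(n_macro)] += 1
    # An even spread over the block gives each bin at most ceil(c/block).
    return max(math.ceil(c / block) for c in counts)
```

Each macro bin receives about log n balls in expectation and O(log n) w.h.p., so dividing by the block size leaves a constant maximum load.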
1.2.3 Introducing Retries
We consider a parallel model that adds a feature to the basic parallel model as follows.
Allowing retries: the set C of random choices of each ball is divided into two disjoint
sets, C = I ∪ R. The set I consists of choices that are employed initially, and the set R
consists of choices that are used, on demand, instead of a choice in the set I.
Note that Algorithm threshold is a special case of an algorithm with retries. We
seek a more general retry algorithm (e.g., Algorithm 4.1).
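The thesis's Algorithm retry is presented in Chapter 4 (Algorithm 4.1). Purely to illustrate the C = I ∪ R split, here is a minimal one-retry scheme of our own devising (it is NOT Algorithm 4.1): each ball has one initial choice and one reserve choice, and falls back on the reserve only if the initial bin is already at a threshold T.

```python
import random

def one_retry(n, T, rng):
    """Illustrative one-retry allocation (not the thesis's algorithm):
    C = {initial} ∪ {reserve}; the reserve is used on demand, i.e.,
    only when the initial bin already holds T balls."""
    load = [0] * n
    for _ in range(n):
        initial, reserve = rng.randrange(n), rng.randrange(n)
        target = initial if load[initial] < T else reserve
        load[target] += 1
    return max(load)
```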
1.3 Lower Bounds
The technique that was used in [1] for analyzing a parallel balls-and-bins lower bound
is the witness tree method. This technique is used for bounding the probability of some
“bad event", e.g., the probability of the occurrence of a “heavily-loaded" bin. The lower
bound is expressed in terms of the number of rounds of communication, r, the number of
balls, m, the number of bins, n, and the number of choices available to each ball, d. We
will outline the technique for d = 2, r = 2, and m = n. A vertex in a graph is associated
with each bin. As d = 2 each ball can be represented by an undirected edge between the
two bins that the ball has chosen. Now, the bin assigned to a ball is modeled by orienting
the edge of the ball. The in-degree of a vertex after all edges are oriented corresponds to
the load of the bin. The communication model in this context is that for each round of
communication, every ball and bin “sees” a larger neighborhood of the graph around it.
For n balls and n bins the corresponding graph is a random graph chosen uniformly from
all the graphs with n vertices and n edges. Adler et al. [1] showed that with constant
probability there exists a subgraph, which is a rooted symmetric tree of radius r = 2,
such that the root has a degree of (√2 − o(1))·√(log n/ log log n). The symmetry of the
tree implies that the orientation procedure must orient half the edges incident to the
root towards it. This implies a maximum load of (√2/2 − o(1))·√(log n/ log log n). They
showed a general lower bound of Ω((log n/ log log n)^{1/r}) for the maximum load with at
least constant probability if the number of rounds is r. That lower bound applies for a
symmetric nonadaptive scheme, as every ball chooses its two bin candidates in advance.
For schemes with retries, that lower bound does not apply (as elaborated in Section 1.5.1).
Berenbrink et al. [3] generalized the lower bound proved by [1] to a nonconstant number
r of communication rounds, provided that r ≤ log log n.
1.4 Main Question Addressed in this Thesis
We study in this thesis the question of whether a parallel allocation process with retries
may substantially decrease the maximum load. A similar question was raised by [4] for
the sequential model. They showed a lower bound of ln ln n/ ln d + Ω(1) on the maximum
load w.h.p., so the known lower bound of [2] is unchanged even though the latter
does not apply to the retry model.
1.5 Contributions
1.5.1 Retries and the Basic Parallel Model
We noticed that the lower bound presented by [1] is not valid if retries are allowed.
The lower bound proof does not capture the fact that in the retry parallel model
(Section 1.2.3) there are additional choices that are not considered at the beginning and
might appear later (in the graph). Thus, the graph model does not deal with retries.
1.5.2 Analyzing an Algorithm with Retries
In Chapter 4 we analyze algorithm retry, which allows one retry. This analysis is based
on techniques presented in Chapter 3. To the best of our knowledge, these techniques
have not been applied to “balls and bins” algorithms. The analysis is asymptotically
tight, and the resulting maximum load matches the asymptotic bound of [1]. We
conclude that there is no asymptotic benefit in applying retries.
1.5.3 Previous Techniques with Proofs
The techniques that were used throughout Chapter 4 are summarized in Chapter 3. We
tailored these techniques to our specific parameters to obtain somewhat simpler proofs.
1.5.4 Simulation of Algorithm H-retry
The gap between log₂ log₂ n and √(log₂ n/ log₂ log₂ n) becomes noticeable only for very
large values of n (e.g., n > 2^1024). This raises the need for conducting experiments (i.e.,
simulations) with smaller values of n (i.e., n ∈ [10^6, 8·10^6]) since the asymptotic analysis
does not yield results for such values of n. Indeed, simulations by Azar et al. [2] showed
that the maximum load is 4 for n ≤ 2^24. This improves over choosing a single bin, where
the maximum load reaches even 13 for the same values of n. Voecking [12] suggested a
variation (see details in Sec. 1.1.1) that reduced the maximum load to 3-4 for n ≤ 2^24.
Adler et al. [1] also conducted experiments, and showed that their parallel algorithms
obtained a maximum load of 5-6 for n in the range between 1 million and 32 million
balls.
We present in Chapter 5 a practical parallel algorithm and demonstrate its improved
maximum load for n in the range between 1 million and 8 million. We obtain a maximum
load of 3 using 2.5 rounds of communication. Similarly to [11], our algorithm requires
a single synchronization point. In our experiments, at most 5 balls were rejected, and
at most a few hundred balls had duplicate siblings assigned to bins. One can assign
the rejected balls and delete the duplicate siblings using an additional round. From a
practical point of view, it is not clear whether the duplicates and the few rejections are
of interest.
1.6 Organization
The thesis is organized as follows: In Chapter 2 we overview two balls-in-bins applications.
In Chapter 3 we overview the techniques that were used in Chapter 4 and their proofs. In
Chapter 4 we present Algorithm retry and its tight analysis. In Chapter 5 we present
Algorithm H-retry and experimental results we have obtained with it.
Chapter 2
Applications
This chapter surveys some applications of the balls and bins model. The survey is
inspired by [8]. Section 2.1 describes a sequential application, while Section 2.2 describes
a parallel application and its relation to our new model.
2.1 Hashing
The standard hash table implementation [6] uses a single hash function to map many
keys to entries in a smaller table. If there is a collision, i.e., if two or more keys map
to the same table entry, then all the conflicting keys are stored in a linked list called a
chain. Thus, each table entry is the head of a chain and the maximum time to search for
a key in the hash table is proportional to the length of the longest chain in the table. If
the hash function is perfectly random - i.e., if each key is mapped to an entry of the
table independently and uniformly at random, and n keys are sequentially inserted into
a table with n entries - then the length of the longest chain is Θ (log n/ log log n) w.h.p.
This bound follows from the analogous bound on the maximum load in the classical
balls-and-bins problem where each ball chooses a single bin independently and uniformly
at random [7]. Now suppose that we use two perfectly random hash functions. When
inserting a key, we apply both hash functions to determine the two possible table entries
where the key can be inserted. Then, of the two possible entries, we add the key to the
shorter of the two chains. To search for an element, we have to search through the chains
at the two entries given by both hash functions. If n keys are sequentially inserted into the
table, the length of the longest chain is Θ(log log n) w.h.p., implying that the maximum
time needed to search the hash table is Θ (log log n) w.h.p. This bound also follows from
the analogous bound for the balls-and-bins problem where each ball chooses two bins at
random [2].
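A sketch of two-choice chaining as described above. The two “perfectly random” hash functions are modeled here by randomly salted built-in hashing, which is an idealizing assumption for illustration only:

```python
import random

class TwoChoiceHash:
    """Chained hash table with two hash functions: insert appends the
    key to the shorter of its two candidate chains; lookup scans both."""

    def __init__(self, size, rng=random):
        self.size = size
        self.salt1 = rng.getrandbits(64)    # models hash function h1
        self.salt2 = rng.getrandbits(64)    # models hash function h2
        self.table = [[] for _ in range(size)]

    def _slots(self, key):
        return (hash((self.salt1, key)) % self.size,
                hash((self.salt2, key)) % self.size)

    def insert(self, key):
        i, j = self._slots(key)
        shorter = self.table[i] if len(self.table[i]) <= len(self.table[j]) else self.table[j]
        shorter.append(key)

    def lookup(self, key):
        i, j = self._slots(key)
        return key in self.table[i] or key in self.table[j]
```

Lookup cost is bounded by twice the longest chain, which for n keys in n entries is Θ(log log n) w.h.p. rather than Θ(log n/ log log n).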
2.2 Dynamic Assignment of Tasks to Servers
Consider n identical servers, and n identical tasks. Suppose that the tasks arrive in
parallel and need to be assigned to a server. We would like to minimize the maximum load
of the servers. Ideally, when a task arrives (requesting a server), we would like to assign it
to the least loaded server. However, gathering complete information about the loads of all
the servers is expensive and assumed not to be justified. An alternative approach that
requires no coordination is to simply allocate each task to a random server. If there are n
tasks and n servers, then by the parallel balls-and-bins analogy, some server is assigned
Θ(√(log n/ log log n)) tasks w.h.p. The new model enables the unfortunate tasks (e.g.,
those that did not commit to a server under the parallel balls-and-bins strategy) to “try”
again. Even with the retry, however, some server is assigned Θ(√(log n/ log log n))
tasks w.h.p.
Chapter 3
Techniques
This chapter surveys the techniques that are used in Chapter 4. The survey is based
on [1, 5, 9, 10].
3.1 Expectation
Theorem 3.1. [Linearity of Expectation] For any finite collection of discrete random
variables X1, X2, ..., Xn with finite expectations,

E[ ∑_{i=1}^{n} Xi ] = ∑_{i=1}^{n} E[Xi].
Theorem 3.1 holds for any collection of random variables, even if they are not inde-
pendent.
3.2 From Binomial to Poisson
3.2.1 The Binomial Random Variable
Definition 3.2. A discrete Binomial random variable X with parameters n and p, de-
noted by B(n, p), is defined by the following probability distribution on k = 0, 1, 2, ..., n:

Pr(X = k) = (n choose k) · p^k · (1 − p)^{n−k}.
The expectation of this random variable is np and its variance is np(1− p).
3.2.2 The Poisson Distribution
Definition 3.3. A discrete Poisson random variable X with parameter λ is given by the
following probability distribution on k = 0, 1, 2, ...:

Pr(X = k) = e^{−λ} · λ^k / k!.
The expectation of this random variable as well as its variance is λ.
3.2.3 Approximation of Binomial Distribution By Poisson Dis-
tribution
Claim 3.4. Let X be a discrete Binomial random variable with parameters n and p, and
let λ = np. Then:

(λ^k / k!) · (1 − λ/n)^{n−k} ≥ Pr(X = k) ≥ (λ^k / k!) · (1 − k/n)^k · (1 − λ/n)^n.
Proof. First, let us rewrite Pr(X = k):

Pr(X = k) = (n choose k) · p^k · (1 − p)^{n−k}
          = [(n − k + 1)·(n − k + 2) ··· n / k!] · p^k · (1 − p)^{n−k}.

Plugging in p = λ/n we get:

Pr(X = k) = (λ^k / k!) · [(n − k + 1)·(n − k + 2) ··· n / n^k] · (1 − λ/n)^{n−k}.

The middle factor satisfies (1 − k/n)^k ≤ (n − k + 1) ··· n / n^k ≤ 1, and
(1 − λ/n)^{n−k} ≥ (1 − λ/n)^n. The claim follows.
Claim 3.5. Let X be a discrete Binomial random variable with parameters n and p. Let
Y be a discrete Poisson random variable with parameter λ = np. Then:

Pr(Y = k) · e^{kλ/n} > Pr(X = k) > Pr(Y = k) · e^{−k²/(n−k) − λ²/(n−λ)}.
Proof. The following holds for 0 < t < 1:

e^{−t/(1−t)} < 1 − t < e^{−t}.

We now conclude the upper and lower bounds using Claim 3.4.

Upper bound:

Pr(X = k) ≤ (λ^k / k!) · (1 − λ/n)^{n−k}
          < (λ^k / k!) · (e^{−λ/n})^{n−k}
          = (λ^k / k!) · e^{−λ} · e^{λk/n}
          = Pr(Y = k) · e^{kλ/n}.
Lower bound:

Pr(X = k) > (λ^k / k!) · (1 − k/n)^k · (1 − λ/n)^n
          > (λ^k / k!) · e^{−k²/(n−k)} · e^{−λn/(n−λ)}
          = Pr(Y = k) · e^{−k²/(n−k) − λ²/(n−λ)},

where the last equality uses λ − λn/(n − λ) = −λ²/(n − λ).
Claim 3.5 is often used to prove that the Poisson distribution “approximates” the
Binomial distribution. Of course, this approximation does not hold for all values of k. It
holds, for example, for constant k or even for k < √n.
3.3 High Probability
Definition 3.6. We say that an event X occurs with high probability if Pr(X) ≥ 1 − O(1/n).
3.4 Martingales and Doob Martingales
Martingales are sequences of random variables satisfying certain conditions. Those con-
ditions are satisfied in our balls in bins setting. We will present in this section a technique
that provides high probability bounds that are needed in Chapter 4.
3.4.1 Martingales
Definition 3.7. A sequence of random variables Z0, Z1, ... is a martingale with respect
to the sequence X0, X1, ... if, for all n ≥ 0, the following conditions hold:
• Zn is a function of X0, X1, ..., Xn.
• E [|Zn|] <∞.
• E [Zn+1 |X0, X1, ..., Xn] = Zn.
Definition 3.8. A sequence of random variables Z0, Z1, ... is called a martingale if it is a
martingale with respect to itself. That is, E[|Zn|] < ∞ and E[Zn+1 | Z0, Z1, ..., Zn] = Zn.
Definition 3.9. A Doob martingale refers to a sequence of random variables constructed
using the following exposure process. Let X0, X1, ..., Xn be a sequence of random vari-
ables, and let Y be a random variable with E [|Y |] <∞ that may “depend” on X0, X1, ..., Xn.
Let

Zi ≜ E[Y | X0, X1, ..., Xi], for i = 0, 1, ..., n.
One could think of the Doob martingale as an exposure process, that is, as the se-
quence advances more and more information is revealed, e.g., exposure of edges in a
random graph.
Claim 3.10. A Doob martingale {Zi}_{i=0}^{n} constructed as in Definition 3.9 is a martingale
with respect to X0, X1, ..., Xn.
Proof. Using the fact that E [Y |X0, X1, ..., Xi+1] is a random variable and that E [V |W ] =
E [E [V | U, W ] |W ]:
E [Zi+1 |X0, X1, ..., Xi] = E [E [Y |X0, X1, ..., Xi+1] |X0, X1, ..., Xi]
= E [Y |X0, X1, ..., Xi]
= Zi.
3.4.2 The Azuma-Hoeffding Inequality
We use tail inequalities for martingales. These inequalities are similar to Chernoff in-
equalities, and apply even when the underlying random variables are not independent.
Theorem 3.11. [Azuma-Hoeffding Inequality] Let X0, X1, ..., Xn be a martingale.
Let B1, B2, ..., Bn denote a sequence of random variables where Bk may be a function of
X0, X1, ..., Xk−1. Let {dk}_{k=1}^{n} denote a sequence of real numbers.
If

∀k : Bk ≤ Xk − Xk−1 ≤ Bk + dk,

then, for all t ≥ 0 and any λ > 0,

Pr(|Xt − X0| ≥ λ) ≤ 2·e^{−2λ² / (∑_{k=1}^{t} dk²)}.
3.4.3 The Lipschitz Condition

A real function f : R^n → R satisfies the Lipschitz condition with bound c if, for any i,
any set of values x1, x2, ..., xn, and any yi,

|f(x1, ..., xi−1, xi, xi+1, ..., xn) − f(x1, ..., xi−1, yi, xi+1, ..., xn)| ≤ c.

Let X1, X2, ..., Xn denote random variables and let X⃗ denote the n-tuple (X1, X2, ..., Xn).
Let Z0 ≜ E[f(X⃗)] and Zk ≜ E[f(X⃗) | X1, X2, ..., Xk]. By Definition 3.9, the sequence
Z0, Z1, ... is a Doob martingale. If the Xk are independent random variables, then there
exist random variables {Bk}_{k=1}^{n}, where Bk depends only on Z0, Z1, ..., Zk−1, with
Bk ≤ Zk − Zk−1 ≤ Bk + c. We may then apply Theorem 3.11 with dk ≡ c.
Theorem 3.12. Let X1, X2, ..., Xn be a sequence of independent random variables.
Let f(X⃗) = f(X1, X2, ..., Xn) be a function that satisfies the Lipschitz condition with
bound c. Let Z0 ≜ E[f(X⃗)] and Zk ≜ E[f(X⃗) | X1, X2, ..., Xk]. Then the sequence Z0, Z1, ...
is a Doob martingale and, for all t ≥ 0 and any λ > 0,

Pr(|Zt − Z0| ≥ λ) ≤ 2·e^{−2λ² / (t·c²)}.
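Theorem 3.12 can be seen in action on a classical Lipschitz function of the balls' choices: the number of empty bins changes by at most 1 if a single ball switches bins, so c = 1 and the count concentrates sharply. A Monte Carlo illustration (the sample size and deviation margin are our own choices):

```python
import math
import random

def empty_bins(choices, n):
    """f(X_1, ..., X_m) = number of empty bins after m balls land in the
    bins listed in `choices`; changing one ball's bin changes f by at
    most 1, so f is Lipschitz with bound c = 1."""
    return n - len(set(choices))

rng = random.Random(0)
n = m = 1000
samples = [empty_bins([rng.randrange(n) for _ in range(m)], n)
           for _ in range(200)]
mean = sum(samples) / len(samples)   # concentrates near n/e ≈ 367.9
# Theorem 3.12 with c = 1, t = m: deviations of λ = 5·sqrt(m) have
# probability at most 2·e^{-2·25}, so none should appear in 200 samples.
lam = 5 * math.sqrt(m)
assert all(abs(s - mean) < lam for s in samples)
```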
3.5 Balls into Bins Tight Bound

Consider the randomized process which assigns m balls into n bins independently and
uniformly at random. Our goal is to bound the maximum load obtained by this process.
The general bound is given by the following theorem, whose proof is rather elaborate
[7, 10]. Kolchin et al. [7] showed, among many results, that if m = n/lg n then the number
of balls whose height¹ is greater than lg n/lg lg n is constant. This implies a maximum load of
lg n/lg lg n + O(1). More precise constants were proved in [10].

Theorem 3.13. The randomized process which assigns m balls into n bins independently
and uniformly at random produces a maximum load of

Θ( ln n / ln(1 + (n/m)·ln n) + m/n )

w.h.p.

¹We assign heights [1..k] to the balls in a bin with k balls.
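A quick simulation comparing the observed maximum load against the prediction of Theorem 3.13, here for m = n, where the bound reduces to Θ(ln n/ ln ln n) (constants and the value of n are our own choices):

```python
import math
import random

def max_load(m, n, rng):
    """Throw m balls into n bins uniformly at random; return the
    maximum load over all bins."""
    load = [0] * n
    for _ in range(m):
        load[rng.randrange(n)] += 1
    return max(load)

rng = random.Random(0)
n = 10**5
observed = max_load(n, n, rng)
predicted = math.log(n) / math.log(math.log(n))   # ln n / ln ln n for m = n
# `observed` is a small constant multiple of `predicted`.
```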
Our analysis requires using Theorem 3.13 only for √n < m < n/ln n. Theorem 3.16 deals
with this special case of Theorem 3.13. The maximum load in that case is Θ(log n / log(n/m)).
We present a self-contained proof of Theorem 3.16 using the techniques presented in this
chapter. In addition, we use the first and second moment method as in [10]. In [1], upper
bounds are proved using a Poisson approximation. We first summarize the first and
second moment method from [10] in Claim 3.14, followed by Claim 3.15, which is helpful
in proving Theorem 3.16.
The following claim is proved using Markov’s inequality and Chebyshev’s inequality
(hence the term first and second moment).
Claim 3.14. Let X1, X2, ..., Xn be identically distributed 0-1 random variables, such that:

∀ 1 ≤ i ≤ j ≤ n : E[Xi·Xj] ≤ (1 + o(1)) · E²[X1].     (3.1)

Let S ≜ ∑_{i=1}^{n} Xi. Then:

Pr[S = 0] = 1 − o(1), if E[S] → 0;
Pr[S = 0] = o(1),     if E[S] → ∞.
Claim 3.15. Let² m ≤ e·n/ln n. Let α > 0 denote a constant. Let k ≜ α·log n / log(n/m)
and

γ(m/n) ≜ n · e^{−m/n} · (m/n)^k / k!.

Then

n^{1−3α−o(1)} < γ(m/n) ≤ n^{1−α−o(1)}.
Proof. It is sufficient to prove that:
ln(
n1−3α−o(1))
< ln(
γ(m
n
))
≤ ln(
n1−α+o(1))
.
Let us consider the expression, ln(
γ(
mn
))
:
2Note that the assumption m ≤ e · n
ln nis used in Claim 3.15 only for the lower bound: n1−3α−o(1) ≤
γ(
m
n
)
.
17
ln(
γ(m
n
))
= ln
(
n ·e−m/n
(
mn
)k
k!
)
(3.2)
= lnn + k · ln(m
n
)
−m
n− (1 + o(1)) · k · ln k
= lnn + α ·log n
log(n/m)· ln
(m
n
)
−m
n− (1 + o(1)) · α ·
log n
log(n/m)· ln
(
α ·log n
log(n/m)
)
= lnn + α ·lnn
ln(n/m)·[
ln(m
n
)
− (1 + o(1)) · ln(
α ·ln n
ln(n/m)
)]
− o(1)
= lnn − α ·lnn
ln(n/m)·[
ln( n
m
)
+ (1 + o(1)) ·[
ln α + ln ln n − ln ln(n
m)]]
− o(1) .
The assumption that m ≤ e · nln n
implies that ln ne≤ n
m, hence for n ≥ e(e2):
ln ln n
ln(n/m)≤ ln ln n
ln(
lnne
) (3.3)
=ln ln n
ln ln n− 1
= 1 +1
ln ln n− 1
≤ 2 .
Let us consider the lower bound. Combining Equations 3.2 and 3.3:

    ln(γ(m/n)) = ln n − α·(ln n/ln(n/m))·[ ln(n/m) + (1 + o(1))·[ ln α + ln ln n − ln ln(n/m) ] ] − o(1)
               = ln n − α·ln n − α·(1 + o(1))·ln α·(ln n/ln(n/m))
                      − α·(1 + o(1))·(ln n·ln ln n)/ln(n/m)
                      + α·(1 + o(1))·(ln n·ln ln(n/m))/ln(n/m) − o(1)
               > ln n − α·ln n − o(1)·ln n − α·(2 + o(1))·ln n − o(1)
               = (1 − 3α − o(1))·ln n .
Let us consider the upper bound. Equation 3.2 yields:

    ln(γ(m/n)) = ln n − α·(ln n/ln(n/m))·[ ln(n/m) + (1 + o(1))·[ ln α + ln ln n − ln ln(n/m) ] ] − o(1)
               = ln n − α·ln n − o(1)·ln n
                      − α·(ln n/ln(n/m))·(1 + o(1))·[ ln ln n − ln ln(n/m) ] − o(1)
               ≤ (1 − α − o(1))·ln n .

The claim follows.
Theorem 3.16. Let √n ≤ m ≤ e·n/ln n. Then, the randomized process which assigns m balls into n bins independently and uniformly at random produces a maximum load of Θ(log n / log(n/m)) w.h.p.

Proof. Let X_1, X_2, ..., X_n be the number of balls in each bin. Each X_i is a binomial random variable with parameters m and p = 1/n.
Let

    k ≜ α·log n/log(n/m) ,
    χ_i^{(k)} = 1 if X_i ≥ k, and 0 otherwise ,
    S ≜ Σ_{i=1}^{n} χ_i^{(k)} .

We will show that:

    Pr( ∃ 1 ≤ i ≤ n : X_i ≥ α·log n/log(n/m) ) = 1 − o(1)   if 0 < α < 1/3 ,
                                                  o(1)       if α > 1 .
The proof structure is as follows: We first bound Pr(X_i ≥ k) to obtain a tight bound for E[S] = n·Pr(X_i ≥ k). We find threshold values for α that determine whether E[S] → 0 or E[S] → ∞. The theorem follows after we show that the premise in Equation (3.1) holds, and by applying Claim 3.14. That provides us with a maximum load of Θ(log n / log(n/m)) w.h.p.
Let us consider Pr(X_i ≥ k). Let Y_i be a Poisson random variable with parameter λ = m/n ≤ e/ln n. Using Claim 3.5:

    Pr(X_i ≥ k) < Σ_{ℓ=k}^{m} Pr(Y_i = ℓ)·e^{ℓλ/m} ≤ e^{λ} · Σ_{ℓ=k}^{m} e^{−λ}·λ^ℓ/ℓ! .   (3.4)

Let a_ℓ ≜ e^{−λ}·λ^ℓ/ℓ!. Since m < n for n > e^e and ℓ ≥ 1:

    a_{ℓ+1}/a_ℓ = λ/(ℓ+1) < 1/2 .
Now we can rewrite Equation 3.4:

    Pr(X_i ≥ k) < e^{λ} · ( a_k + Σ_{ℓ=k}^{m} a_k · Π_{j=k}^{ℓ} (a_{j+1}/a_j) )
                < e^{λ} · ( a_k + a_k·(1/2)·Σ_{ℓ=0}^{∞} (1/2)^ℓ )
                < 2e^{λ} · a_k
                = (2 + o(1)) · a_k .                                                 (3.5)
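The geometric-series bound in Equation 3.5 can be checked numerically. The following is a minimal sketch of that check; the values of λ and k are arbitrary illustrative choices (λ = m/n ≤ e/ln n is small in the regime of the theorem), and the helper name is our own:

```python
import math

def poisson_pmf(lam, ell):
    # a_ell = e^{-lam} * lam^ell / ell!
    return math.exp(-lam) * lam ** ell / math.factorial(ell)

lam, k = 0.3, 3                      # illustrative values; lam = m/n is small
a_k = poisson_pmf(lam, k)

# Poisson tail from k on, truncated far enough for double precision.
tail = sum(poisson_pmf(lam, ell) for ell in range(k, 100))

# Successive term ratios a_{l+1}/a_l = lam/(l+1) stay below 1/2, so the tail
# is dominated by a geometric series: a_k < tail < 2 * a_k, as in (3.5).
ratio_ok = all(lam / (ell + 1) < 0.5 for ell in range(k, 100))
```

With λ = 0.3 and k = 3 the ratios are at most 0.075, so the tail exceeds a_k by only a few percent, well inside the factor-2 bound.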
Using Claim 3.5:

    Pr(X_i ≥ k) > Pr(X_i = k) > Pr(Y = k) · e^{−k²/(m−k) − λ²/(m−λ)} .               (3.6)

Since m ≤ e·n/ln n, then:

    λ²/(m−λ) = m/(n²−n) ≤ (e·n/ln n)/(n·(n−1)) = e/(ln n·(n−1)) → 0 .                (3.7)
The assumption √n ≤ m ≤ e·n/ln n implies that ln n/e ≤ n/m ≤ √n. Substituting k = α·log n/log(n/m):

    k²/(m−k) ≤ ( α·log n/log(n/m) )² / ( √n − α·log n/log(n/m) )
             = α²·log² n / ( log²(n/m)·√n − α·log(n/m)·log n )
             ≤ α²·log² n / ( (log(ln n/e))²·√n − α·(1/2)·log² n )
             → 0 .                                                                   (3.8)

Combining (3.7) and (3.8) and plugging them into the bound presented in Equation 3.6:

    Pr(X_i ≥ k) > (1 + o(1)) · e^{−m/n}·(m/n)^k / k! .                               (3.9)
Combining (3.5) and (3.9)³:

    (2 + o(1)) · e^{−m/n}·(m/n)^k/k!  >  Pr(X_i ≥ k)  >  (1 + o(1)) · e^{−m/n}·(m/n)^k/k! .

Due to linearity of expectation (3.1):

    E[S] = n · Pr(X_i ≥ k) = Θ( n · e^{−m/n}·(m/n)^k/k! ) = Θ( γ(m/n) ) .            (3.10)

By Claim 3.15:

    lim_{n→∞} E[S] = 0   if α > 1 ,
                     ∞   if 0 < α < 1/3 .                                            (3.11)
To apply Claim 3.14 it remains to show that Equation 3.1 holds. Indeed, as justified below:

    E[χ_i^{(k)}·χ_j^{(k)}] = Pr(X_i ≥ k, X_j ≥ k)
      = Σ_{k₁=k}^{n−k} Σ_{k₂=k}^{n−k₁} C(n,k₁)·C(n−k₁,k₂)·(1/n)^{k₁+k₂}·(1 − 2/n)^{n−(k₁+k₂)}
      ≤ Σ_{k₁=k}^{n} Σ_{k₂=k}^{n} C(n,k₁)·C(n,k₂)·(1/n)^{k₁+k₂}·(1 − 1/n)^{2n−2(k₁+k₂)}
      = [ Σ_{k₁=k}^{n} C(n,k₁)·(1/n)^{k₁}·(1 − 1/n)^{n−k₁}·(1 − 1/n)^{−k₁} ]
          · [ Σ_{k₂=k}^{n} C(n,k₂)·(1/n)^{k₂}·(1 − 1/n)^{n−k₂}·(1 − 1/n)^{−k₂} ]     (3.12)
      ≤ ( (1 − 1/n)^{−k} · Pr(X_i ≥ k) )²
      = ( (1 − 1/n)^{−k} · E[χ_i^{(k)}] )²
      ≤ (1 + o(1)) · E²[χ_i^{(k)}] .

The third step is valid since C(n−k₁, k₂) ≤ C(n, k₂) and (1 − 2/n) ≤ (1 − 1/n)². The last inequality is valid for k = α·log n/log(n/m), since:

    k/n = α·log n/(n·log(n/m)) ≤ α·log n/(n·log(ln n/e)) → 0 ,

hence:

    (1 − 1/n)^{−k} < e^{2k/n} = 1 + o(1) .

³Note how closely the tail of the Poisson distribution approximates the tail of the binomial distribution for this special value of k. This is the reason for the relative simplicity of the proof for this specific setting of the parameters.
To complete the proof we apply Claim 3.14, since Equation 3.11 and Equation 3.12 imply that the premises of Claim 3.14 hold.
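The statement of Theorem 3.16 can also be observed empirically. The sketch below is our own illustration (the parameter values and helper name are arbitrary choices satisfying √n ≤ m ≤ e·n/ln n); it tosses m balls into n bins uniformly at random and compares the maximum load against the log n / log(n/m) scale:

```python
import math
import random
from collections import Counter

def max_load(m, n, rng):
    """Toss m balls into n bins independently and uniformly at random
    and return the maximum bin load."""
    return max(Counter(rng.randrange(n) for _ in range(m)).values())

rng = random.Random(0)                # fixed seed for reproducibility
n, m = 10 ** 5, 10 ** 3               # sqrt(n) <= m <= e*n/ln n holds here
obs = max_load(m, n, rng)
pred = math.log(n) / math.log(n / m)  # the Theta(.) scale: 2.5 for these values
```

For these values the observed maximum load is a small constant, matching the predicted scale up to the hidden constant of the Θ(·) bound.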
Chapter 4
Theoretical Analysis
This chapter deals with the analysis of a load balancing algorithm with retries. We refer to this algorithm as retry (see the listing in Algorithm 4.1).
This algorithm captures the major properties of Algorithm H-retry presented in
Chapter 5. These common properties are: tossing two independent replicas for each ball,
rejection of a replica based on a threshold, and re-throwing the doubly rejected balls (i.e.,
retry).
We assign heights [1..k] to balls in a bin with k balls.
In Figure 4.1 we depict the bin loads in an experiment in which m = 16·10⁶ balls are tossed uniformly and independently at random into n = 8·10⁶ bins. The x-axis depicts the bins in descending load order. The y-axis depicts the load of each bin. We set a threshold T and consider all balls of height greater than T. These balls are called the excess replicas. A ball is doubly rejected if both of its replicas are excess replicas.
Given the number k of rejected replicas (the replicas above the threshold line in Figure 4.1), both replicas of a ball are rejected with probability k·(k−1)/(2n·(2n−1)) < (k/2n)². The conditioning on the number of rejected replicas enables us to bound the expected number of doubly rejected balls. To use this conditioning on k, we show that the number of doubly rejected balls is concentrated. The proof technique is based on showing that our setting matches that of Theorem 3.12 (the Azuma-Hoeffding inequality with the Lipschitz condition). To bound the expected number of doubly rejected balls we need a bound on the variance of the number of rejected replicas. Again, Theorem 3.12 is used to bound the variance.
Once we have a high probability bound on the number of doubly rejected balls, we use Theorem 3.13 to conclude that the maximum load that retry produces is Θ(√(lg n / lg lg n)) w.h.p.
4.1 Algorithm retry: Description
Let height(i_j(b)) denote the height of ball b in the bin i_j(b), for j ∈ {1, 2, 3}.
Algorithm 4.1 A balls-into-bins algorithm with one retry.

retry(threshold T, number of bins n):

1. Round 1:

   (a) Each ball b chooses uniformly at random two bins i_1(b), i_2(b) and sends requests to these bins.

   (b) A ball b receives reject messages from the bins. We denote the rejected replicas of a ball b by R(b), namely, R(b) ≜ { j ∈ {1, 2} : height(i_j(b)) > T }.

2. Round 2:

   (a) If |R(b)| = 2 (i.e., b is doubly rejected) then b chooses uniformly at random a bin i_3(b).
Note that Algorithm retry is non-adaptive, as i_1(b), i_2(b), i_3(b) could have been chosen before any communication took place.

One could withdraw, at the end of Round 2, one replica of each ball neither of whose replicas was rejected. Since the analysis of Algorithm retry does not benefit from this withdrawal, the description of Algorithm retry does not include one.
4.2 Analyzing the Number of Rejected Replicas
Figure 4.1: The distribution of bin loads in an experiment with 8·10⁶ bins and 16·10⁶ balls. The x-axis depicts the bins in descending load order. The y-axis depicts the load of each bin. The leftmost bar represents bins with load 5 or higher; the threshold and the rejected replicas above it are marked.
Suppose that m balls are tossed into n bins independently and uniformly at random, and let X_1^{(m)}, X_2^{(m)}, ..., X_n^{(m)} denote the number of balls in each bin. Note that if m = 2n then X_i^{(m)} equals the load of bin i at the end of Step 1a (namely, each replica is considered as one of the m balls). To simplify notation we denote X_i^{(m)} by X_i whenever the value of m is clear.

The excess load in bin i equals X_i − T (i.e., the number of replicas that are above the threshold). Let X = (X_1, ..., X_n) and

    f(X) ≜ Σ_{i=1}^{n} max(X_i − T, 0) = Σ_{b=1}^{n} |R(b)| .

If m = 2n then f(X) equals the number of reject messages in Step 1b.
The following claim bounds the expected number of reject messages. We use linearity of expectation (3.1), the binomial distribution inequalities (3.5), and bound the tail of a Poisson distribution by a geometric series. The claim states that the expected value is Θ( n·2^{T+1}/(T+1)! ).
Claim 4.1. If the threshold T satisfies 6 ≤ T ≤ √n, then

    e^{−5}·n·2^{T+1}/(T+1)!  ≤  E[f(X)]  ≤  2n·2^{T+1}/(T+1)! .

Proof. Let a_k ≜ (k − T)·e^{−2}·2^k/k!. It follows that for j ≥ T+1:

    a_{j+1}/a_j = (j+1−T)·2 / ((j−T)·(j+1)) = ( 1 + 1/(j−T) ) · 2/(j+1) ≤ 4/(T+2) .

If T ≥ 6 then:

    a_{j+1}/a_j ≤ 4/(T+2) ≤ 1/2 .
Since the random variables X_1, X_2, ..., X_n are identically distributed, by linearity of expectation:

    E[f(X)] = E[ Σ_{i=1}^{n} max(X_i − T, 0) ] = n · E[max(X_i − T, 0)] .

Note that each X_i is a binomial random variable with parameters 2n and 1/n. Then, by Claim 3.5, for any fixed k = 0, 1, 2, ..., 2n:

    Pr(X_i = k) < (e^{−2}·2^k/k!)·e^{k/n} ≤ (e^{−2}·2^k/k!)·e² .

Note that for any fixed k = 0, 1, 2, ..., √n the following holds:

    k²/(2n−k) + 2/(n−1) < n/(2n−√n) + 2 = 1/(2 − 1/√n) + 2 < 3 .
Hence for any fixed k = 0, 1, 2, ..., √n, due to Claim 3.5:

    Pr(X_i = k) > (e^{−2}·2^k/k!)·e^{−k²/(2n−k) − 2/(n−1)} > (e^{−2}·2^k/k!)·e^{−3} .

It follows that:

    n·E[max(X_i − T, 0)] ≤ n · Σ_{k=T+1}^{2n} a_k·e²
        ≤ n·e² · ( a_{T+1} + Σ_{k=T+1}^{2n} a_{T+1} · Π_{j=T+1}^{k} (a_{j+1}/a_j) )
        ≤ n·e² · ( a_{T+1} + a_{T+1} · Σ_{k=T+1}^{2n} (1/2)^{k−(T+1)+1} )
        ≤ n·e² · ( a_{T+1} + a_{T+1}·(1/2)·Σ_{k=0}^{∞} (1/2)^k )
        = 2e²·n·a_{T+1} ,

and since a_{T+1} = e^{−2}·2^{T+1}/(T+1)!, this equals 2n·2^{T+1}/(T+1)!, which completes the proof of the upper bound.

The lower bound is proved as follows:

    E[f(X)] > n · Σ_{k=T+1}^{2n} a_k·e^{−3} ≥ e^{−3}·n·a_{T+1} = e^{−5}·n·2^{T+1}/(T+1)! .
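A quick Monte Carlo sanity check of Claim 4.1 can be written as follows; this is our own sketch, and n, T, the seed, and the number of trials are arbitrary illustrative choices:

```python
import math
import random

def excess_replicas(n, T, rng):
    """f(X) = sum_i max(X_i - T, 0) when 2n replicas are tossed into n bins."""
    loads = [0] * n
    for _ in range(2 * n):
        loads[rng.randrange(n)] += 1
    return sum(max(x - T, 0) for x in loads)

rng = random.Random(3)
n, T, trials = 10 ** 5, 6, 5
avg_f = sum(excess_replicas(n, T, rng) for _ in range(trials)) / trials

central = n * 2 ** (T + 1) / math.factorial(T + 1)   # n * 2^{T+1}/(T+1)!
lower, upper = math.exp(-5) * central, 2 * central   # the bounds of Claim 4.1
```

The averaged estimate falls comfortably between the two bounds; the gap between them reflects the loose constants (e⁻⁵ versus 2) rather than the Θ(·) scale itself.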
The following claim shows that the number of rejected replicas is concentrated around its expected value. We use the Azuma-Hoeffding inequality with the Lipschitz condition (3.12). The claim states that the probability that the number of rejected replicas deviates from its expected value by ε is at most 2·e^{−ε²/n}.

For i, j ∈ ℕ let δ_{i,j} ≜ 1 if i = j, and 0 otherwise.
Claim 4.2. Pr( |f(X) − E[f(X)]| ≥ ε ) ≤ 2·e^{−ε²/n}.

Proof. There are 2n replicas of balls. We denote a replica by 1 ≤ β ≤ 2n. Let ξ_β denote the bin that replica β is sent to. The random variables {ξ_β}_{β=1}^{2n} are independent and uniformly distributed. The load in bin i satisfies X_i^{(2n)} = Σ_{β=1}^{2n} δ_{ξ_β,i}. Let ξ = (ξ_1, ..., ξ_{2n}) and

    f(ξ) ≜ Σ_{i=1}^{n} max( Σ_{β=1}^{2n} δ_{ξ_β,i} − T, 0 ) ,

then f(ξ) = f(X). Let Z_0 = E[f(ξ)] and Z_k = E[f(ξ) | ξ_1, ξ_2, ..., ξ_k]. The sequence Z_0, Z_1, ... is a Doob martingale. Note that Z_{2n} = E[f(ξ) | ξ_1, ξ_2, ..., ξ_{2n}] = f(ξ). The function f satisfies the Lipschitz condition with bound¹ 1. Now we may apply the Azuma-Hoeffding inequality special case (3.12):

    Pr( |f(ξ) − E[f(ξ)]| ≥ ε ) = Pr( |Z_{2n} − Z_0| ≥ ε ) ≤ 2·e^{−ε²/n} .

The claim follows.
We also bound the second moment.

Corollary 4.3. Pr( (f(X) − E[f(X)])² ≥ γ ) ≤ 2·e^{−γ/n}.

The following claim bounds the variance of the number of rejected replicas. We use Corollary 4.3. The claim states that the variance of the number of rejected replicas is at most 4n.

Claim 4.4. Var(f(X)) ≤ 4n.
Proof. By Corollary 4.3:

    Pr( (f(X) − E[f(X)])² ≥ α·n ) ≤ 2·e^{−α} .

¹Moving a replica may result in either: (a) no change in the excess-load summation (non-excess replica → non-excess replica, or excess replica → excess replica), (b) a decrease of 1 (excess replica → non-excess replica), or (c) an increase of 1 (non-excess replica → excess replica).
Now we can bound Var(f(X)):

    Var(f(X)) = E[ (f(X) − E[f(X)])² ]
              ≤ Σ_{k=1}^{∞} n·k · Pr( (f(X) − E[f(X)])² ∈ ((k−1)n, kn] )
              ≤ Σ_{k=1}^{∞} n·k · Pr( (f(X) − E[f(X)])² ≥ (k−1)·n )
              ≤ Σ_{k=1}^{∞} n·k · 2·e^{−(k−1)}
              = 2e·n · Σ_{k=1}^{∞} k·e^{−k}
              ≤ 2e·n · ∫_{1}^{∞} x·e^{−x} dx
              = 2e·n · [ −e^{−x}·(x+1) ]_{1}^{∞}
              = 4n .
4.3 Analyzing the Number of Doubly Rejected Balls
Let g(n) ≜ |{ 1 ≤ b ≤ n : |R(b)| = 2 }| be the function that counts the number of balls both replicas of which are rejected in Step 1b (i.e., the doubly rejected balls).

The following claim bounds the expected number of doubly rejected balls. We use conditioning on the number of excess replicas and Claim 4.4. The claim states that the expected number of doubly rejected balls is Θ( E²[f(X)] / 4n ).

Claim 4.5.  E²[f(X)]/4n − 1 ≤ E[g(n)] ≤ E²[f(X)]/4n + 1 .
Proof. Let us consider E[g(n)]. For the upper bound:

    E[g(n)] = Σ_k E[ g(n) | f(X) = k ] · Pr( f(X) = k )
            = Σ_k n · (k/2n) · ((k−1)/(2n−1)) · Pr( f(X) = k )
            ≤ Σ_k n · (k/2n)² · Pr( f(X) = k )                                       (4.1)
            = (1/4n) · Σ_k k² · Pr( f(X) = k )
            = (1/4n) · E[ f(X)² ]
            = (1/4n) · ( Var(f(X)) + E²[f(X)] ) ,

and for the lower bound:

    E[g(n)] = Σ_k E[ g(n) | f(X) = k ] · Pr( f(X) = k )
            = Σ_k n · (k/2n) · ((k−1)/(2n−1)) · Pr( f(X) = k )
            ≥ Σ_k n · [ (k/2n)² − 1/n ] · Pr( f(X) = k )                             (4.2)
            = Σ_k n · (k/2n)² · Pr( f(X) = k ) − Σ_k Pr( f(X) = k )
            = (1/4n) · Σ_k k² · Pr( f(X) = k ) − 1
            = (1/4n) · E[ f(X)² ] − 1
            = (1/4n) · ( Var(f(X)) + E²[f(X)] ) − 1 .

Claim 4.4 implies that 0 ≤ Var(f(X)) ≤ 4n; combining this with Equations 4.1 and 4.2:

    E²[f(X)]/4n − 1 ≤ E[g(n)] ≤ (1/4n)·( 4n + E²[f(X)] ) ,

and the claim follows.
The following two claims bound E[g(n)] for the range ln ln n ≤ T ≤ √(ln n / ln ln n). These claims state that

    (1/(5e¹⁰)) · n^{3/4}·ln ln n / ln n  <  E[g(n)]  <  n/ln n .

Claim 4.6. Let T ≥ ln ln n. Then E[g(n)] < n·(2^T/T!)² < n/ln n.
Proof. Let us consider the first inequality. Claims 4.5 and 4.1 imply that:

    E[g(n)] ≤ E²[f(X)]/4n + 1
            ≤ ( 2n·2^{T+1}/(T+1)! )² / 4n + 1
            = n · ( 2·2^T / (T!·(T+1)) )² + 1
            ≤ 2n · ( 4/(T+1)² ) · (2^T/T!)²
            < n · (2^T/T!)² .

Let us consider the second inequality. For T ≥ ln ln n:

    n·(2^T/T!)² = n·e^{2 ln 2·T − (2+o(1))·T·ln T}
                < n·e^{2T − (2+o(1))·T·ln T}
                = n·e^{−T·((2+o(1))·ln T − 2)}
                = n / e^{T·((2+o(1))·ln T − 2)}
                ≤ n / e^{ln ln n·(ln ln ln n − 2)}
                < n / e^{ln ln n}
                = n / ln n .
Claim 4.7. Let T ≤ √(ln n / ln ln n). Then E[g(n)] > (1/(5e¹⁰)) · n^{3/4}·ln ln n / ln n.
Proof. Claims 4.5 and 4.1 imply that:

    E[g(n)] ≥ E²[f(X)]/4n − 1
            ≥ ( e^{−5}·n·2^{T+1}/(T+1)! )² / 4n − 1                                  (4.3)
            = (e^{−10}/4) · n · ( 2^{T+1}/(T+1)! )² − 1
            = ( e^{−10}/(T+1)² ) · n · (2^T/T!)² − 1 .

Let us bound n·(2^T/T!)² for T ≤ √(ln n/ln ln n). Since for n ≥ e^{e^{e²}} we have √(ln n·ln ln n) < (1/6)·ln n, then:

    n·(2^T/T!)² = n·e^{2 ln 2·T − (2+o(1))·T·ln T}
                > n·e^{T − (2+o(1))·T·ln T}
                = n·e^{−T·((2+o(1))·ln T − 1)}
                > n / e^{3T·ln T}
                ≥ n / e^{(3/2)·√(ln n/ln ln n)·ln(ln n/ln ln n)}                     (4.4)
                > n / e^{(3/2)·√(ln n/ln ln n)·ln ln n}
                = n / e^{(3/2)·√(ln n·ln ln n)}
                > n / e^{(3/2)·(1/6)·ln n}
                = n^{3/4} .

Let us bound 1/(T+1)² for T ≤ √(ln n/ln ln n):
    1/(T+1)² ≥ 1 / ( √(ln n/ln ln n) + 1 )²
             > 1 / ( 2·√(ln n/ln ln n) )²                                            (4.5)
             = (1/4) · ln ln n / ln n .

Combining Equations 4.3, 4.4, and 4.5:

    E[g(n)] ≥ ( e^{−10}/(T+1)² ) · n · (2^T/T!)² − 1
            > (1/(4e¹⁰)) · (ln ln n/ln n) · n^{3/4} − 1
            > (1/(5e¹⁰)) · n^{3/4}·ln ln n / ln n .
The following claim shows that the number of doubly rejected balls is concentrated around its expected value. The proof is similar to the proof of Claim 4.2; we use the Azuma-Hoeffding inequality with the Lipschitz condition (Theorem 3.12). The claim states that the probability that the number of doubly rejected balls deviates from its expected value by ε is at most 2·e^{−ε²/n}.

Claim 4.8. Pr( |g(n) − E[g(n)]| ≥ ε ) ≤ 2·e^{−ε²/n}.

Proof. There are 2n replicas of balls. We denote a replica by 1 ≤ β ≤ 2n. Let ξ_β denote the bin that replica β is sent to. Hence, g(n) is a function of the random variables {ξ_β}_{β=1}^{2n}, which are independent and uniformly distributed. Let Z_0 = E[g(n)] and Z_k = E[g(n) | ξ_1, ξ_2, ..., ξ_k]. The sequence Z_0, Z_1, ... is a Doob martingale. Note that Z_{2n} = E[g(n) | ξ_1, ξ_2, ..., ξ_{2n}] = g(n). The function g satisfies the Lipschitz condition with bound² 1. Now we may apply the Azuma-Hoeffding inequality special case (3.12):

    Pr( |g(n) − E[g(n)]| ≥ ε ) = Pr( |Z_{2n} − Z_0| ≥ ε ) ≤ 2·e^{−ε²/n} .

²We consider a worst-case scenario. Changing the bin allocated to a replica may affect g(n) as follows: (a) no change in the number of rejected balls, if the replica was excess and remains excess; (b) no change, if the replica was non-excess and remains non-excess; (c) a decrease of 1, if the replica is non-excess after the change and was excess before; (d) an increase of 1, if the replica is excess after the change and was non-excess before.
Claims 4.2 and 4.8 are an adaptation of an application of the Azuma-Hoeffding inequality that appeared in [9], where a concentration result for the number of empty bins is proved. We would like to emphasize that both Claims 4.2 and 4.8 do not depend on the value of T.
Corollary 4.9. Let ln ln n ≤ T ≤ √(ln n/ln ln n). Then the following inequalities hold with probability 1 − 1/n:

1. |g(n) − E[g(n)]| ≤ √(n·ln n) .

2. g(n) = Θ( E²[f(X)] / 4n ) .

3. g(n) < (1 + o(1)) · n/ln n .
Proof. Plugging ε = √(n·ln n) into Claim 4.8 and taking the complement probability implies Part (1).

Claims 4.6 and 4.7 state:

    (1/(5e¹⁰)) · n^{3/4}·ln ln n / ln n  <  E[g(n)]  <  n/ln n .                     (4.6)
Combining Part (1) and Equation 4.6:

    g(n) ≤ E[g(n)] + √(n·ln n)
         = E[g(n)] · ( 1 + √(n·ln n)/E[g(n)] )
         < E[g(n)] · ( 1 + √(n·ln n) / ( (1/(5e¹⁰))·n^{3/4}·ln ln n/ln n ) )         (4.7)
         = E[g(n)] · ( 1 + 5e¹⁰·ln^{3/2} n / (n^{1/4}·ln ln n) )
         = E[g(n)] · (1 + o(1)) ,

and similarly:

    g(n) ≥ E[g(n)] − √(n·ln n)
         = E[g(n)] · ( 1 − √(n·ln n)/E[g(n)] )
         > E[g(n)] · ( 1 − √(n·ln n) / ( (1/(5e¹⁰))·n^{3/4}·ln ln n/ln n ) )
         = E[g(n)] · ( 1 − 5e¹⁰·ln^{3/2} n / (n^{1/4}·ln ln n) )
         = E[g(n)] · (1 − o(1)) ,

hence g(n) = (1 ± o(1))·E[g(n)]. Part (2) now follows by Claim 4.5.

Equations 4.6 and 4.7 prove Part (3).
4.4 Putting it All Together
The following theorem and claims bound the maximum load that Algorithm retry produces w.h.p. We use Corollary 4.9 and the high probability bound in Theorem 3.16. Claim 4.11 states that if the threshold T is in the interval [ln ln n, √(ln n/ln ln n)], then the optimal threshold is Θ(√(lg n/lg lg n)), which implies a maximum load of Θ(√(lg n/lg lg n)) w.h.p. Theorem 4.10 shows that there is a trade-off between the threshold and the additional load that Step 2a incurs (see Figure 4.2); i.e., thresholds other than T = Θ(√(lg n/lg lg n)) in the interval [ln ln n, √(ln n/ln ln n)] incur a higher load in Step 2a. Claim 4.12 states that selecting a threshold lower than ln ln n or higher than √(ln n/ln ln n) does not improve the maximum load.
Figure 4.2: The trade-off between the threshold and the additional load from Step 2a. The x-axis denotes the threshold T, and the y-axis depicts the two terms T and ln n/(T·ln T).
Theorem 4.10. Let ln ln n ≤ T ≤ √(ln n/ln ln n). Then the maximum load that Algorithm retry produces is Θ( T + ln n/(T·ln T) ) w.h.p.

Proof. Corollary 4.9 implies that g(n) < (1 + o(1))·n/ln n w.h.p. Then, using the high probability bound in Theorem 3.16 for throwing m = g(n) balls into n bins implies that the additional load that Step 2a incurs is Θ( ln n / ln(n/m) ) w.h.p.

Corollary 4.9 and Claim 4.1 imply that:

    g(n) = Θ( E²[f(X)] / 4n ) = Θ( n·( 2^{T+1}/(T+1)! )² ) .
Plugging m = g(n) into Θ( ln n / ln(n/m) ):

    Θ( ln n / ln(n/m) ) = Θ( ln n / ln( ((T+1)!/2^{T+1})² ) )
                        = Θ( ln n / ln( (T+1)!/2^{T+1} ) )
                        = Θ( ln n / (T·ln T) ) .

Since the maximum load that Step 1a produces is T, and since the additional load that Step 2a produces is Θ( ln n/(T·ln T) ), the maximum load is Θ( T + ln n/(T·ln T) ) w.h.p.
Claim 4.11. Let ln ln n ≤ T ≤ √(ln n/ln ln n). Then the maximum load that Algorithm retry produces is minimized for T = Θ(√(lg n/lg lg n)), and it is Θ(√(lg n/lg lg n)) w.h.p.

Proof. Theorem 4.10 implies that the maximum load is:

    Θ( T + ln n/(T·ln T) )                                                           (4.8)

for T in the interval [ln ln n, √(ln n/ln ln n)].

Bound 4.8 is minimized for T = ln n/(T·ln T), which implies that T = Θ(√(lg n/lg lg n)). We conclude that if T = Θ(√(lg n/lg lg n)) then the maximum load is Θ(T) w.h.p.
Claim 4.12. Let √(ln n/ln ln n) < T, or T < ln ln n. Then the maximum load that Algorithm retry produces is Ω(√(lg n/lg lg n)) w.h.p.

Proof. If T < ln ln n then the number of doubly rejected balls (i.e., g(n)) increases, thus increasing the additional load that Step 2a incurs.

If √(ln n/ln ln n) < T then the maximum load will be at least min{ T, ln n/ln ln n }, since the maximum load is Ω( ln n/ln ln n ) w.h.p. [7, 10].
Chapter 5
Simulation of Algorithm H-retry
5.1 The New Algorithm
The new algorithm combines ideas from pgreedy and from the threshold algorithm of
Stemann. Let T denote the threshold used by the algorithm. The algorithm is non-
adaptive, symmetric, and requires only one synchronization (similarly to [11]).
Overview. Instead of picking a bin that assigned the smallest height to ball t (as in pgreedy), the ball forwards the heights to the other bin. Only after the bin receives all the requests and the heights of the siblings of the balls in the bin are reject and accept decisions made.

First, the bin performs a safe delete step. The idea is that, if the local height is larger than the height of a sibling, then the bin can safely remove the ball. We denote the difference between the local height of ball t and the height of its sibling by δ_t. Let |bin_i| denote the number of balls that have sent requests to the i-th bin. We apply safe deletion only to at most |bin_i| − T balls. If a bin still contains too many balls after the safe delete step, then the excess delete step takes place.
Note that, after the safe delete step, every ball t in an overloaded bin has δt ≤ 0.
During the excess delete step, balls are removed until the number of balls equals the
threshold. Balls with δt closer to zero are deleted first, and among balls with the same
δt, lower balls are deleted first.
After safe deletion and excess deletion are completed, reject and accept messages are
sent to all the balls. An accept message is also accompanied by the load of the bin. Thus, each ball knows how many of its siblings were accepted. If both siblings are accepted, then the ball sends a withdraw message to the bin with the higher load. If both siblings were rejected, then we re-throw these doubly rejected balls.
Each doubly rejected ball uniformly re-chooses two random bins. These bins accept
only if accepting the new ball does not overload the bin. Our experiments show that,
even for 8 million balls, only a handful of balls are finally rejected. One could reassign
them, if needed, using an extra round. Alternatively, one could re-choose three random
bins instead of two or simply accept the re-thrown balls while increasing the maximum
load only by one with high probability.
Description. To make reading easier, we present a "sequential" description of the algorithm below. We use the same notation used for the description of pgreedy.

1a Each ball t chooses two bins b_1(t) and b_2(t) independently and uniformly at random. (For simplicity, assume that b_1(t) ≠ b_2(t).) The ball t sends requests to bins b_1(t) and b_2(t).
1b Each bin i assigns heights to the requesting balls (according to their order of arrival),
and sends the height hi(t) to ball t. Let bini denote the set of balls that have sent
requests to bin i.
2a Each ball t forwards its height in one bin to the other bin (containing the sibling
ball).
2b We define a synchronization point at this stage: for each ball t in bin i, the bin knows the local height h_i(t) and the height h_j(t) of the sibling of t in bin j. Let δ_t ≜ h_i(t) − h_j(t). (Note that δ_t is a local variable in bin i; in bin j, δ_t = h_j(t) − h_i(t).) We now apply two deletion steps: safe delete and excess delete. The safe delete step proceeds as follows. Let A_i = {t′_1, t′_2, ...} denote the subset of bin_i consisting of balls t with δ_t > 0, sorted in descending height order (i.e., h_i(t′_1) > h_i(t′_2) > ···). Let α_i ≜ min{ |A_i|, |bin_i| − T }. Each bin i removes the α_i highest balls. Thus bin_i ← bin_i \ {t′_1, ..., t′_{α_i}}.
After the safe delete step, the bin might still be overloaded. In this case, we apply the excess delete step to balls with δ_t ≤ 0. Let β_i = max{ 0, |bin_i| − T }. We need to remove β_i balls from bin_i. Note that balls with δ_t > 0 whose height is below the threshold T were not deleted in the safe delete step and are not deleted in the excess delete step.

The excess delete step proceeds as follows. We first sort the balls with δ_t ≤ 0 in each bin in lexicographic ascending order according to (−δ_t, h_i(t)). We then simply remove from bin i the β_i first balls according to this lexicographic order. Namely, we first remove balls with δ_t = 0, and among them we remove balls with the smallest height. We then continue with balls with δ_t = −1, and so on.
A reject message is sent to each ball removed from bini. An accept message is sent
to the balls that remain in bini together with the current load of the bin (i.e., |bini|).
3a For each ball t, if t received two accept messages, then t sticks with a bin with the
lower load, and sends a withdraw message to the other bin.
For each ball t, if t received two reject messages, then t re-chooses two bins b3(t), b4(t)
independently and uniformly at random. The ball t sends requests to these two bins.
We refer to these requests as re-throws of ball t.
3b (No messages are sent in this step.) A bin i that receives a withdraw message from
ball t simply removes t from bini.
A bin i that receives a re-throw request from ball t, accepts t only if |bini| < T .
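The steps above can be condensed into a sequential Python sketch. Everything here is our own illustrative rendering (data layout, tie-breaking, and parameter values); duplicates from re-throws are kept, as the text discusses:

```python
import random

def h_retry(n, T, rng):
    """Sequential sketch of the new algorithm: n balls, n bins, threshold T.
    Returns (final bin loads, number of finally rejected balls)."""
    # Steps 1a/1b: two distinct bins per ball; heights by order of arrival.
    choices, bins, height = [], [[] for _ in range(n)], {}
    for t in range(n):
        b1 = rng.randrange(n)
        b2 = rng.randrange(n)
        while b2 == b1:
            b2 = rng.randrange(n)
        choices.append((b1, b2))
        for i in (b1, b2):
            bins[i].append(t)
            height[(i, t)] = len(bins[i])
    member = [set(b) for b in bins]

    def delta(i, t):                       # local height minus sibling height
        b1, b2 = choices[t]
        j = b2 if i == b1 else b1
        return height[(i, t)] - height[(j, t)]

    # Step 2b: safe delete, then excess delete, in each overloaded bin.
    for i in range(n):
        if len(member[i]) <= T:
            continue
        safe = sorted((t for t in member[i] if delta(i, t) > 0),
                      key=lambda t: -height[(i, t)])
        for t in safe[:len(member[i]) - T]:       # remove the highest ones
            member[i].discard(t)
        if len(member[i]) > T:
            excess = sorted((t for t in member[i] if delta(i, t) <= 0),
                            key=lambda t: (-delta(i, t), height[(i, t)]))
            for t in excess[:len(member[i]) - T]:
                member[i].discard(t)

    # Step 3a: withdrawals (using the loads sent with the accept messages)
    # and re-throws of doubly rejected balls.
    sent_loads = [len(m) for m in member]
    rethrow = []
    for t in range(n):
        acc = [i for i in choices[t] if t in member[i]]
        if len(acc) == 2:                  # doubly accepted: leave the fuller bin
            drop = max(acc, key=lambda i: sent_loads[i])
            member[drop].discard(t)
        elif not acc:
            rethrow.append(t)

    # Step 3b: re-thrown replicas are accepted only by bins with load < T.
    loads = [len(m) for m in member]
    finally_rejected = 0
    for t in rethrow:
        placed = False
        for _ in range(2):
            i = rng.randrange(n)
            if loads[i] < T:
                loads[i] += 1              # duplicates possible; see Discussion
                placed = True
        finally_rejected += not placed
    return loads, finally_rejected

loads, rejected = h_retry(10 ** 4, 3, random.Random(1))
```

After step 2b every bin holds at most T balls and re-throws never push a bin past the threshold, so the final maximum load is at most T; in our runs only a handful of balls, if any, remain finally rejected, in line with the experiments of Section 5.2.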
Discussion. The new algorithm is symmetric, non-adaptive, uses a constant number
of rounds, and requires only one synchronization point.
Our experiments are, of course, synchronous. This leads to a "layering" phenomenon, since two siblings are more likely to receive the same height. In Adler et al. [1], shuffling of heights appears in the mpgreedy(d) algorithm. We note that the mpgreedy algorithm requires multiple synchronous rounds. One could move the synchronization point in our algorithm to step 1b, namely, each bin waits for all the requests. If this is the case, then the balls could be shuffled before heights are assigned. The synchronization point in step 2b would no longer be required, since the bin waits for messages containing the heights of the siblings from the balls already in the bin.
There are many ways to deal with doubly rejected balls. First, since they are so few, one could simply have each such ball choose a random bin; the re-thrown balls then incur only a constant additional load. This is perhaps the simplest solution.
A second option is to reserve a small portion of the bins for re-throws so that in step 1a
the reserved bins are not chosen. In step 3a, each doubly rejected ball chooses one or two
bins among the reserved bins.
Duplicate siblings due to re-throws can be removed by adding a round (called step
4a). Namely, send accept and reject messages in step 3b so a ball can send withdraw
messages in step 4a to eliminate duplicates. We emphasize that the complications caused
by re-throws are due to very few balls, hence, it is not clear that these issues are of
practical interest.
5.2 Experimental Results
5.2.1 Maximum load
We conducted experiments for n (the number of balls) ranging from 1 million to 8 million.
For each n, the results for 50 trials are presented in Table 5.1. The value of the threshold
was T = 3 in all cases, except for n = 8 · 106, where we also used T = 4 (last row).
The frequencies of the bin loads are presented. For example, a load of zero means the
bin is empty. For each load, the range of frequencies in the experiments is given by
the median and half the difference between the maximum and the minimum frequency.
Note that the load frequencies are sharply concentrated. The column labeled #re-throws contains the number of balls that were doubly rejected and therefore required a re-throw. Note that the number of doubly rejected balls roughly doubles as n doubles and is also sharply concentrated. We never encountered more than 5 doubly rejected balls that were still rejected after their re-throws.
    n    T    load 0               load 1                load 2               load 3             load 4        #re-throws
    1M   3    201975.5 ± 512       601483.5 ± 1028.5     191069.5 ± 608       5485 ± 129         0 ± 0         38 ± 10.5
    2M   3    404052.5 ± 784.5     1202875.5 ± 1434      382162.5 ± 721.5     10985.5 ± 255.5    0 ± 0         76 ± 31
    4M   3    808048.5 ± 1089.5    2405691.5 ± 2131      764304.5 ± 998       21931.5 ± 259.5    0 ± 0         149 ± 27
    8M   3    1616149.5 ± 1387.5   4811338.5 ± 2758      1528371 ± 1466       43972.5 ± 397      0 ± 0         310 ± 44.5
    8M   4    1596627.5 ± 1379     4840281 ± 2671.5      1529629 ± 1487.5     33367.5 ± 394.5    87 ± 21.5     0 ± 1

Table 5.1: Bin load frequencies (loads 0 through 4) and number of re-throws over 50 trials for each configuration of n (ranging from 1 million to 8 million) and threshold T. For each bin load, the frequency obtained in the trials is presented by the median and half the difference between the maximum and minimum frequency.
5.2.2 Progress of the algorithm
In table 5.2 we show how the frequencies of the bin loads change over the course of the
execution of the algorithm during four different trials (for n ranging from 1 million to 8
million and T = 3). The initial load is the load after step 1a (i.e. 2n balls in n bins).
The next two columns refer to loads in step 2b: after safe delete and after excess delete.
The safe delete step removed most of the balls in the overloaded bins. The excess delete
step removed all the remaining balls that caused overloading above the threshold. Note
that only a small fraction of the balls removed in the excess delete step turn out to be
doubly rejected.
Roughly 75% of the balls are doubly accepted. One sibling of each doubly accepted
ball is withdrawn in step 3a. Withdrawals get rid of all the current duplicates, and the
number of balls in the bins equals n minus the number of doubly rejected balls. The
final distribution of bin loads is presented in the last column (with duplicates due to
re-throws). Note that the proportions between the numbers in the tables for different
values of n remain roughly the same.
1M   Initial   After safe del   After excess del   After withdrawals of duplicates   Final
0 135640 135640 135640 202119 202104
1 270683 270683 270683 601345 601307
2 269648 269648 269648 190994 191034
3 180946 321580 324029 5542 5555
4 90499 2401 0 0 0
>=5 52584 48 0 0 0
2M   Initial   After safe del   After excess del   After withdrawals of duplicates   Final
0 270208 270208 270208 403709 403673
1 541615 541615 541615 1203520 1203448
2 541242 541242 541242 381919 382000
3 361507 642164 646935 10852 10879
4 180457 4707 0 0 0
>=5 104971 64 0 0 0
4M   Initial   After safe del   After excess del   After withdrawals of duplicates   Final
0 541389 541389 541389 807689 807623
1 1082508 1082508 1082508 2406684 2406562
2 1082659 1082659 1082659 763719 763856
3 721420 1283730 1293444 21908 21959
4 361971 9564 0 0 0
>=5 210053 150 0 0 0
8M   Initial   After safe del   After excess del   After withdrawals of duplicates   Final
0 1083559 1083559 1083559 1616708 1616604
1 2164328 2164328 2164328 4811017 4810815
2 2165986 2165986 2165986 1528105 1528297
3 1441416 2567142 2586127 44170 44284
4 723250 18675 0 0 0
>=5 421461 310 0 0 0
Table 5.2: The change in the frequencies of the bin loads during the execution of thealgorithm.
Bibliography
[1] Micah Adler, Soumen Chakrabarti, Michael Mitzenmacher, and Lars Eilstrup Ras-
mussen. Parallel randomized load balancing. Random Struct. Algorithms, 13(2):159–
188, 1998.
[2] Y. Azar, A.Z. Broder, A.R. Karlin, and E. Upfal. Balanced allocations. SIAM
journal on computing, 29(1):180–200, 2000.
[3] P. Berenbrink, F. Meyer auf der Heide, and K. Schröder. Allocating Weighted Jobs
in Parallel. Theory of Computing Systems, 32(3):281–300, 1999.
[4] Artur Czumaj and Volker Stemann. Randomized allocation processes. Random
Struct. Algorithms, 18(4):297–331, 2001.
[5] William Feller. An Introduction to Probability Theory and Its Applications, Volume
1. Wiley, January 1968.
[6] Donald E. Knuth. The Art of Computer Programming Volumes 1-3 Boxed Set.
Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA, 1998.
[7] V.F. Kolchin, B.A. Sevastyanov, and V.P. Chistyakov. Random Allocations. John
Wiley & Sons, 1978.
[8] M. Mitzenmacher, A. Richa, and R. Sitaraman. The power of two random choices:
A survey of techniques and results. Handbook of Randomized Computing, 1:255–312,
2001.
[9] M. Mitzenmacher and E. Upfal. Probability and Computing: Randomized Algorithms
and Probabilistic Analysis. Cambridge University Press, 2005.
[10] Martin Raab and Angelika Steger. "Balls into bins" - a simple and tight analysis. In RANDOM '98, pages 159-170, London, UK, 1998. Springer-Verlag.
[11] V. Stemann. Parallel balanced allocations. In Proceedings of the eighth annual ACM
symposium on Parallel algorithms and architectures, pages 261–269. ACM New York,
NY, USA, 1996.
[12] B. Voecking. How Asymmetry Helps Load Balancing. Journal of the ACM,
50(4):568–589, 2003.