
Randomized Algorithms

Lecture 3: "Occupancy, Moments and Deviations, Randomized Selection"

Sotiris Nikoletseas, Associate Professor

CEID - ETY Course 2013-2014

(slides dated 2014-04-25)

Sotiris Nikoletseas, Associate Professor Randomized Algorithms - Lecture 3 1 / 34

1. Some basic inequalities (I)

(i) $\left(1 + \frac{1}{n}\right)^n \le e$

Proof: It is $\forall x \ge 0$: $1 + x \le e^x$. For $x = \frac{1}{n}$, we get
$$\left(1 + \frac{1}{n}\right)^n \le \left(e^{1/n}\right)^n = e$$

(ii) $\left(1 - \frac{1}{n}\right)^{n-1} \ge \frac{1}{e}$

Proof: It suffices that $\left(\frac{n-1}{n}\right)^{n-1} \ge \frac{1}{e} \Leftrightarrow \left(\frac{n}{n-1}\right)^{n-1} \le e$.

But $\frac{n}{n-1} = 1 + \frac{1}{n-1}$, so it suffices that $\left(1 + \frac{1}{n-1}\right)^{n-1} \le e$, which is true by (i).
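As a quick sanity check, both inequalities can be verified numerically; a minimal Python sketch (the helper name is ours):

```python
import math

def check_basic_inequalities(n: int) -> bool:
    """Check (i) (1 + 1/n)^n <= e and (ii) (1 - 1/n)^(n-1) >= 1/e for one n >= 2."""
    return (1 + 1 / n) ** n <= math.e and (1 - 1 / n) ** (n - 1) >= 1 / math.e

# Both inequalities hold for every n >= 2; spot-check a range of values.
print(all(check_basic_inequalities(n) for n in range(2, 5000)))
```

Both sequences converge to their bound monotonically, so the floating-point margin stays comfortable over this range.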

1. Some basic inequalities (II)

(iii) $n! \ge \left(\frac{n}{e}\right)^n$

Proof: It is obviously
$$\frac{n^n}{n!} \le \sum_{i=0}^{\infty} \frac{n^i}{i!}$$
But $\sum_{i=0}^{\infty} \frac{n^i}{i!} = e^n$ from Taylor's expansion of $f(x) = e^x$.

(iv) For any $k \le n$: $\left(\frac{n}{k}\right)^k \le \binom{n}{k} \le \left(\frac{ne}{k}\right)^k$

Proof: Indeed, $k \le n \Rightarrow \frac{n}{k} \le \frac{n-1}{k-1}$.

Inductively, $k \le n \Rightarrow \frac{n}{k} \le \frac{n-i}{k-i}$ $(1 \le i \le k-1)$.

Thus
$$\left(\frac{n}{k}\right)^k \le \frac{n}{k} \cdot \frac{n-1}{k-1} \cdots \frac{n-(k-1)}{k-(k-1)} = \frac{n(n-1)\cdots(n-k+1)}{k!} = \binom{n}{k}$$

For the right inequality we obviously have $\binom{n}{k} \le \frac{n^k}{k!}$, and by (iii) it is $k! \ge \left(\frac{k}{e}\right)^k$.
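Inequality (iv) is easy to spot-check with exact binomial coefficients; a small Python sketch (`binom_bounds_hold` is our own name):

```python
import math

def binom_bounds_hold(n: int, k: int) -> bool:
    """Check (n/k)^k <= C(n,k) <= (n*e/k)^k, i.e. basic inequality (iv)."""
    c = math.comb(n, k)  # exact integer binomial coefficient
    return (n / k) ** k <= c <= (n * math.e / k) ** k

# The bounds hold for every 1 <= k <= n; spot-check small cases exhaustively.
print(all(binom_bounds_hold(n, k) for n in range(1, 60) for k in range(1, n + 1)))
```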

2. Preliminaries

(i) Boole's inequality (or union bound)

Let $E_1, E_2, \ldots, E_n$ be random events. Then
$$\Pr\left\{\bigcup_{i=1}^{n} E_i\right\} = \Pr\{E_1 \cup E_2 \cup \cdots \cup E_n\} \le \sum_{i=1}^{n} \Pr\{E_i\}$$

Note: If the events are disjoint, then we get equality.
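A Monte Carlo illustration, with hypothetical events of our own choosing: take three independent dice and let $E_i$ = "die $i$ shows a six".

```python
import random

random.seed(0)
trials = 100_000
# Estimate Pr{E_1 ∪ E_2 ∪ E_3}: at least one of three dice shows a six.
union = sum(
    any(random.randint(1, 6) == 6 for _ in range(3)) for _ in range(trials)
) / trials
sum_of_probs = 3 * (1 / 6)  # the union bound's right-hand side

# Exact union probability is 1 - (5/6)^3 ≈ 0.421, safely below 0.5.
print(union <= sum_of_probs)
```

The events overlap (two dice can both show a six), which is exactly why the bound is an inequality rather than an equality here.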

2. Preliminaries

(ii) Expectation (or Mean)

Let $X$ be a discrete random variable. Its expectation is:
$$\mu_X = E[X] = \sum_x x \cdot \Pr\{X = x\}$$

If $X$ is continuous with probability density function (pdf) $f(x)$, then $\mu_X = \int_{-\infty}^{\infty} x f(x)\, dx$.

2. Preliminaries

(ii) Expectation (or Mean)

Properties:

- $\forall X_i$ $(i = 1, 2, \ldots, n)$: $E\left[\sum_{i=1}^{n} X_i\right] = \sum_{i=1}^{n} E[X_i]$. This important property is called "linearity of expectation".
- $E[cX] = cE[X]$, where $c$ is a constant.
- If $X, Y$ are stochastically independent, then $E[X \cdot Y] = E[X] \cdot E[Y]$.
- Let $f(X)$ be a real-valued function of $X$. Then $E[f(X)] = \sum_x f(x) \Pr\{X = x\}$.

2. Preliminaries

(iii) Markov's inequality

Theorem: Let $X$ be a non-negative random variable. Then, $\forall t > 0$:
$$\Pr\{X \ge t\} \le \frac{E[X]}{t}$$

Proof:
$$E[X] = \sum_x x \Pr\{X = x\} \ge \sum_{x \ge t} x \Pr\{X = x\} \ge \sum_{x \ge t} t \Pr\{X = x\} = t \sum_{x \ge t} \Pr\{X = x\} = t \Pr\{X \ge t\}$$

Note: Markov is a (rather weak) concentration inequality, e.g.
$\Pr\{X \ge 2E[X]\} \le \frac{1}{2}$, $\Pr\{X \ge 3E[X]\} \le \frac{1}{3}$, etc.
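An empirical illustration on a toy example of our own: let $X$ be the sum of two fair dice, so $E[X] = 7$, and bound the tail at $t = 10$.

```python
import random

random.seed(1)
samples = [random.randint(1, 6) + random.randint(1, 6) for _ in range(200_000)]
t = 10
tail = sum(x >= t for x in samples) / len(samples)  # true value: 6/36 = 1/6

# Markov's bound E[X]/t = 7/10 is valid but loose here.
print(tail <= 7 / t)
```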

2. Preliminaries

(iv) Variance (or second moment)

Definition: $Var(X) = E[(X - \mu)^2]$, where $\mu = E[X]$; i.e., it measures (statistically) deviations from the mean.

Properties:

- $Var(X) = E[X^2] - E^2[X]$
- $Var(cX) = c^2 Var(X)$, where $c$ is a constant.
- If $X, Y$ are independent, it is $Var(X + Y) = Var(X) + Var(Y)$.

Note: We call $\sigma = \sqrt{Var(X)}$ the standard deviation of $X$.
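The additivity of variance for independent variables can be spot-checked by simulation; a sketch on a toy example of our own (two independent dice):

```python
import random
import statistics

random.seed(3)
xs = [random.randint(1, 6) for _ in range(100_000)]
ys = [random.randint(1, 6) for _ in range(100_000)]

vx, vy = statistics.pvariance(xs), statistics.pvariance(ys)  # each ≈ 35/12
vsum = statistics.pvariance([x + y for x, y in zip(xs, ys)])

# For independent X, Y, empirically Var(X + Y) ≈ Var(X) + Var(Y).
print(abs(vsum - (vx + vy)) < 0.1)
```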

2. Preliminaries

(v) Chebyshev's inequality

Theorem: Let $X$ be a r.v. with mean $\mu = E[X]$. It is:
$$\Pr\{|X - \mu| \ge t\} \le \frac{Var(X)}{t^2} \quad \forall t > 0$$

Proof: $\Pr\{|X - \mu| \ge t\} = \Pr\{(X - \mu)^2 \ge t^2\}$. From Markov's inequality:
$$\Pr\{(X - \mu)^2 \ge t^2\} \le \frac{E[(X - \mu)^2]}{t^2} = \frac{Var(X)}{t^2}$$

Note: Chebyshev's inequality provides stronger (than Markov's) concentration bounds, e.g.
$\Pr\{|X - \mu| \ge 2\sigma\} \le \frac{1}{4}$, $\Pr\{|X - \mu| \ge 3\sigma\} \le \frac{1}{9}$, etc.
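An empirical check on a binomial-like sum of our own choosing (100 fair coin flips, so $\mu \approx 50$, $Var \approx 25$):

```python
import random
import statistics

random.seed(2)
samples = [sum(random.random() < 0.5 for _ in range(100)) for _ in range(50_000)]
mu = statistics.mean(samples)        # ≈ 50
var = statistics.pvariance(samples)  # ≈ 25
t = 10.0
tail = sum(abs(x - mu) >= t for x in samples) / len(samples)

# Chebyshev: Pr{|X - mu| >= t} <= Var/t^2 = 0.25; the true tail is ≈ 0.057.
print(tail <= var / t ** 2)
```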

3. Occupancy - importance

Occupancy procedures are actually stochastic processes (i.e., random processes in time). In particular, the occupancy process consists of randomly placing balls into bins, one at a time.

Occupancy problems/processes are of fundamental importance for the analysis of randomized algorithms, e.g. for data structures (such as hash tables), routing, etc.

3. Occupancy - definition and basic questions

General occupancy process: we uniformly at random and independently put, one at a time, $m$ distinct objects ("balls") each into one of $n$ distinct classes ("bins").

Basic questions:

- What is the maximum number of balls in any bin?
- How many balls are needed so that no bin remains empty, with high probability?
- What is the number of empty bins?
- What is the number of bins with $k$ balls in them?

Note: in the next lecture we will study the coupon collector's problem, a variant of occupancy.

3. Occupancy - the case m = n

Let us randomly place $m = n$ balls into $n$ bins.

Question: What is the maximum number of balls in any bin?

Remark: Let us first estimate the expected number of balls in any bin. For any bin $i$ $(1 \le i \le n)$ let $X_i$ = # balls in bin $i$. Clearly $X_i \sim B(m, \frac{1}{n})$ (binomial), so $E[X_i] = m \cdot \frac{1}{n} = n \cdot \frac{1}{n} = 1$.

We however expect this "mean" (expected) behaviour to be highly improbable, i.e.,

- some bins get no balls at all
- some bins get many balls
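A quick simulation (a Python sketch; the helper name is ours) shows the maximum load is indeed much larger than the mean of 1, yet stays below the bound $k^* = 3\ln n/\ln\ln n$ that Theorem 1 establishes:

```python
import math
import random
from collections import Counter

def max_load(n: int, rng: random.Random) -> int:
    """Throw n balls uniformly at random into n bins; return the largest bin load."""
    counts = Counter(rng.randrange(n) for _ in range(n))
    return max(counts.values())

rng = random.Random(42)
n = 100_000
load = max_load(n, rng)
bound = 3 * math.log(n) / math.log(math.log(n))  # k* = 3 ln n / ln ln n ≈ 14.1

print(load > 1, load <= bound)
```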

3. Occupancy - the case m = n

Theorem 1. With probability at least $1 - \frac{1}{n}$, no bin gets more than $k^* = \frac{3\ln n}{\ln\ln n}$ balls.

Proof: Let $E_j(k)$ be the event "bin $j$ gets $k$ or more balls". Because of symmetry, we first focus on a given bin (say bin 1). It is
$$\Pr\{\text{bin 1 gets exactly } i \text{ balls}\} = \binom{n}{i} \left(\frac{1}{n}\right)^i \left(1 - \frac{1}{n}\right)^{n-i}$$
since we have a binomial $B(n, \frac{1}{n})$. But
$$\binom{n}{i} \left(\frac{1}{n}\right)^i \left(1 - \frac{1}{n}\right)^{n-i} \le \binom{n}{i} \left(\frac{1}{n}\right)^i \le \left(\frac{ne}{i}\right)^i \left(\frac{1}{n}\right)^i = \left(\frac{e}{i}\right)^i$$
(from basic inequality (iv))

Thus
$$\Pr\{E_1(k)\} \le \sum_{i=k}^{n} \left(\frac{e}{i}\right)^i \le \left(\frac{e}{k}\right)^k \left(1 + \frac{e}{k} + \left(\frac{e}{k}\right)^2 + \cdots\right) = \left(\frac{e}{k}\right)^k \frac{1}{1 - \frac{e}{k}}$$

3. Occupancy - the case m = n

Now, let $k^* = \left\lceil \frac{3\ln n}{\ln\ln n} \right\rceil$. Then:
$$\Pr\{E_1(k^*)\} \le \left(\frac{e}{k^*}\right)^{k^*} \frac{1}{1 - \frac{e}{k^*}} \le 2 \left(\frac{e}{\frac{3\ln n}{\ln\ln n}}\right)^{k^*}$$
since it suffices that $\frac{1}{1 - \frac{e}{k^*}} \le 2 \Leftrightarrow \frac{k^*}{k^* - e} \le 2 \Leftrightarrow k^* \le 2k^* - 2e \Leftrightarrow k^* \ge 2e$, which is true.

But
$$2\left(\frac{e}{\frac{3\ln n}{\ln\ln n}}\right)^{k^*} = 2\left(e^{1 - \ln 3 - \ln\ln n + \ln\ln\ln n}\right)^{k^*} \le 2\left(e^{-\ln\ln n + \ln\ln\ln n}\right)^{k^*}$$
$$\le 2\exp\left(-3\ln n + \frac{6\ln n \ln\ln\ln n}{\ln\ln n}\right) \le 2\exp(-3\ln n + 0.5\ln n) = 2\exp(-2.5\ln n) \le \frac{1}{n^2}$$
for $n$ large enough.

3. Occupancy - the case m = n

Thus,
$$\Pr\{\text{any bin gets more than } k^* \text{ balls}\} = \Pr\left\{\bigcup_{j=1}^{n} E_j(k^*)\right\} \le \sum_{j=1}^{n} \Pr\{E_j(k^*)\} \le n \Pr\{E_1(k^*)\} \le n \cdot \frac{1}{n^2} = \frac{1}{n}$$
(by symmetry)

3. Occupancy - the case m = n log n

We showed that when $m = n$ the mean number of balls in any bin is 1, but the maximum can be as high as $k^* = \frac{3\ln n}{\ln\ln n}$.

The next theorem shows that when $m = n\log n$ the maximum number of balls in any bin is more or less the same as the expected number of balls in any bin.

Theorem 2. When $m = n\ln n$, then with probability $1 - o(1)$ every bin has $O(\log n)$ balls.

3. Occupancy - the case m = n - An improvement

If at each iteration we randomly pick $d$ bins and throw the ball into the bin with the smallest number of balls, we can do much better than in Theorem 1:

Theorem 3. We place $m = n$ balls sequentially in $n$ bins as follows: for each ball, $d \ge 2$ bins are chosen uniformly at random (and independently). Each ball is placed in the least full of the $d$ bins (ties broken randomly). When all balls are placed, the maximum load at any bin is at most $\frac{\ln\ln n}{\ln d} + O(1)$, with probability at least $1 - o(1)$ (in other words, a more balanced distribution of balls is achieved).
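The "power of two choices" effect is easy to see in simulation; a sketch (for simplicity, ties go to the first sampled bin rather than being broken randomly, which does not change the qualitative picture):

```python
import random

def one_choice_max(n: int, rng: random.Random) -> int:
    """m = n balls, each into one uniformly random bin; return the maximum load."""
    loads = [0] * n
    for _ in range(n):
        loads[rng.randrange(n)] += 1
    return max(loads)

def two_choice_max(n: int, rng: random.Random) -> int:
    """m = n balls; sample d = 2 bins per ball and use the less loaded one."""
    loads = [0] * n
    for _ in range(n):
        i, j = rng.randrange(n), rng.randrange(n)
        if loads[i] <= loads[j]:  # ties go to bin i (a simplification)
            loads[i] += 1
        else:
            loads[j] += 1
    return max(loads)

rng = random.Random(7)
n = 100_000
m1, m2 = one_choice_max(n, rng), two_choice_max(n, rng)
print(m1, m2)  # two choices give a visibly smaller maximum load
```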

3. Occupancy - tightness of Theorem 1

Theorem 1 shows that when $m = n$ the maximum load in any bin is $O\left(\frac{\ln n}{\ln\ln n}\right)$, with high probability. We now show that this result is tight:

Lemma 1: There is a $k = \Omega\left(\frac{\ln n}{\ln\ln n}\right)$ such that bin 1 has $k$ balls with probability at least $\frac{1}{\sqrt{n}}$.

Proof:
$$\Pr\{k \text{ balls in bin 1}\} = \binom{n}{k} \left(\frac{1}{n}\right)^k \left(1 - \frac{1}{n}\right)^{n-k} \ge \left(\frac{n}{k}\right)^k \frac{1}{n^k} \left(1 - \frac{1}{n}\right)^{n-k}$$
(from basic inequality (iv))
$$= \left(\frac{1}{k}\right)^k \left(1 - \frac{1}{n}\right)^{n-k} \ge \left(\frac{1}{k}\right)^k \frac{1}{2e} = \frac{1}{2e}\left(\frac{1}{k}\right)^k$$
(for $n \ge 2$)

3. Occupancy - tightness of Theorem 1

By putting $k = \frac{c\ln n}{\ln\ln n}$ we get
$$\Pr\left\{\frac{c\ln n}{\ln\ln n} \text{ balls in bin 1}\right\} \ge \frac{1}{2e}\left(\frac{\ln\ln n}{c\ln n}\right)^{\frac{c\ln n}{\ln\ln n}} \ge \left(\frac{1}{c\ln n}\right)^{\frac{c\ln n}{\ln\ln n}}$$
(for $n$ large enough, so that the dropped factor $\frac{1}{2e}(\ln\ln n)^{\frac{c\ln n}{\ln\ln n}}$ is at least 1)
$$= \left(\frac{1}{c\, e^{\ln\ln n}}\right)^{\frac{c\ln n}{\ln\ln n}} = \left(\frac{1}{c}\right)^{\frac{c\ln n}{\ln\ln n}} e^{-c\ln n} = \Omega(n^{-c})$$
(since for $c \le 1$ the first factor is at least 1)

Setting $c = \frac{1}{2}$ we get $\Pr\left\{\frac{\ln n}{2\ln\ln n} \text{ balls in bin 1}\right\} = \Omega\left(\frac{1}{\sqrt{n}}\right)$.

3. Occupancy - the case m = n log n

Towards a proof of Theorem 2. We use the following bound.

Theorem (Chernoff bound). Let $X$ be a r.v. $X = \sum_{i=1}^{n} X_i = X_1 + \cdots + X_n$, where for all $i$ $(1 \le i \le n)$ the $X_i$'s are independent and
$$X_i = \begin{cases} 1, & \text{with probability } p \\ 0, & \text{with probability } 1 - p \end{cases}$$
Let $E[X] = np = \mu$. Then, $\forall \delta > 0$:
$$\Pr\{X \ge \mu(1+\delta)\} \le \left(\frac{e^{\delta}}{(1+\delta)^{(1+\delta)}}\right)^{\mu}$$

When placing $m = n\log n$ balls into $n$ bins, let
$$X_i = \begin{cases} 1, & \text{if ball } i \text{ lands in bin 1 (prob} = \frac{1}{n}) \\ 0, & \text{else} \end{cases}$$
and $X = \sum_{i=1}^{m} X_i$ = # of balls in bin 1. Then $\mu = E[X] = m \cdot \frac{1}{n} = \ln n$.

3. Occupancy - the case m = n log n

Let us estimate the probability that bin 1 receives more than e.g. $10\ln n$ balls.

By Markov's inequality:
$$\Pr\{X \ge 10\ln n\} \le \frac{\ln n}{10\ln n} = \frac{1}{10}$$
(the bound is not strong)

By Chebyshev's inequality: $X$ is actually binomial, i.e. $X \sim B(m, \frac{1}{n})$, thus its variance is
$$Var(X) = m \cdot \frac{1}{n}\left(1 - \frac{1}{n}\right) = \frac{m}{n} - \frac{m}{n^2} \le \frac{m}{n}$$
Thus $\Pr\{X \ge \frac{m}{n} + k\} \le \Pr\{|X - \frac{m}{n}| \ge k\} \le \frac{Var(X)}{k^2} \le \frac{m}{nk^2}$.

For $m = n\ln n \Rightarrow \frac{m}{n} = \ln n$, and for $k = 9\ln n$ we have
$$\Pr\{X \ge 10\ln n\} = \Pr\{X \ge \ln n + 9\ln n\} \le \frac{n\ln n}{n \cdot 81\ln^2 n} = \frac{1}{81\ln n}$$
(a bound which is better than the one by Markov's inequality)

3. Occupancy - the case m = n log n

Let us estimate the probability that bin 1 receives more than e.g. $10\ln n$ balls.

By the Chernoff bound:
$$\Pr\{X \ge 10\ln n\} = \Pr\{X \ge (1 + 9)\ln n\} \le \left(\frac{e^9}{10^{10}}\right)^{\ln n} \le \frac{1}{n^{10}}$$
(much stronger)

Thus, $\Pr\{\exists \text{ bin with more than } 10\ln n \text{ balls}\} \le n \cdot \frac{1}{n^{10}} = n^{-9}$

$\Rightarrow \Pr\{\text{all bins have less than } 10\ln n \text{ balls}\} \ge 1 - n^{-9}$

A similar bound applies to the "low tail", i.e. the probability that there exists a bin with less than, say, $\frac{1}{10}\ln n$ balls tends to zero as $n$ tends to infinity. Overall, there is high concentration around the mean value of $\ln n$ balls per bin.
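Plugging in a concrete $n$ (our own arbitrary choice) makes the gap between the three bounds vivid, using the expressions derived above:

```python
import math

n = 10_000  # an arbitrary concrete value
markov = 1 / 10                                      # E[X]/t with t = 10 ln n
chebyshev = 1 / (81 * math.log(n))                   # variance-based bound
chernoff = (math.exp(9) / 10 ** 10) ** math.log(n)   # (e^9 / 10^10)^{ln n}

print(markov, chebyshev, chernoff)  # 0.1, ≈ 1.3e-3, astronomically small
```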

3. Occupancy - the case m = n log n

Note: The corresponding bounds (for any bin, via the union bound over all $n$ bins) by Markov's inequality and Chebyshev's inequality are trivial, since both exceed 1 for large $n$:

- by Markov we get $\le \frac{n}{10}$
- by Chebyshev we get $\le \frac{n}{81\ln n}$

3. Occupancy - all balls in distinct bins

Consider the experiment of sequentially putting $m$ balls randomly into $n$ bins.

Problem: How large can $m$ be so that the probability of all balls being placed in distinct bins remains high?

For $2 \le i \le m$, let $E_i$ = "the $i$th ball lands in a bin not occupied by the first $i - 1$ balls". The desired probability is:
$$\Pr\left\{\bigcap_{i=2}^{m} E_i\right\} = \prod_{i=2}^{m} \Pr\left\{E_i \,\Big|\, \bigcap_{j=2}^{i-1} E_j\right\} = \Pr\{E_2\}\Pr\{E_3|E_2\}\Pr\{E_4|E_2 E_3\} \cdots \Pr\{E_m|E_2 \ldots E_{m-1}\}$$
But $\Pr\{E_i \,|\, \bigcap_{j=2}^{i-1} E_j\} = 1 - \frac{i-1}{n} \le e^{-\frac{i-1}{n}}$, so
$$\Pr\left\{\bigcap_{i=2}^{m} E_i\right\} \le \prod_{i=2}^{m} e^{-\frac{i-1}{n}} = e^{-\sum_{i=2}^{m}\frac{i-1}{n}} = e^{-\frac{1}{n}\sum_{i=1}^{m-1} i} = e^{-\frac{m(m-1)}{2n}}$$

Thus, when $m = \lceil\sqrt{2n} + 1\rceil$ this probability is at most $\frac{1}{e}$, while as $m$ increases the probability decreases rapidly.

Note: This is similar to the classic "birthday paradox" in probability theory.
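The threshold can be checked by simulation in the birthday setting, where $n = 365$ plays the role of the bins (a sketch; the helper name is ours):

```python
import math
import random

def all_distinct(m: int, n: int, rng: random.Random) -> bool:
    """Throw m balls into n bins; True iff every ball lands in an empty bin."""
    seen = set()
    for _ in range(m):
        b = rng.randrange(n)
        if b in seen:
            return False
        seen.add(b)
    return True

rng = random.Random(0)
n = 365
m = math.ceil(math.sqrt(2 * n) + 1)  # the threshold from the slide (m = 29 here)
freq = sum(all_distinct(m, n, rng) for _ in range(20_000)) / 20_000

print(freq <= 1 / math.e)  # the e^{-m(m-1)/(2n)} bound puts this below 1/e
```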

4. The Randomized Selection Algorithm

The problem: We are given a set $S$ of $n$ distinct elements (e.g. numbers) and we are asked to find the $k$th smallest.

Notation:

- $r_S(t)$: the rank of element $t$ (e.g. the smallest element has rank 1, the largest rank $n$, and the $k$th smallest rank $k$).
- $S_{(i)}$ denotes the $i$th smallest element of $S$ (clearly, we seek $S_{(k)}$, and $r_S(S_{(k)}) = k$).

Remark: the fastest known deterministic algorithm needs $3n$ time and is quite complex. Also, any deterministic algorithm requires $2n$ time (a tight lower bound).

4. The basic idea: random sampling

We will randomly sample a subset of elements from $S$, trying to optimize the following trade-off:

- the sample should be small enough to be processed (e.g. sorted) in small time
- the sample should be large enough to contain the $k$th smallest element, with high probability

4. The Lazy Select Algorithm

1. Pick uniformly at random, with replacement, a subset $R$ of $n^{3/4}$ elements from $S$.
2. Sort $R$ using an optimal deterministic sorting algorithm.
3. Let $x = k \cdot n^{-1/4}$. Set $l = \max\{\lfloor x - \sqrt{n}\rfloor, 1\}$ and $h = \min\{\lceil x + \sqrt{n}\rceil, n^{3/4}\}$, and let $a = R_{(l)}$ and $b = R_{(h)}$. By comparing $a$ and $b$ to every element of $S$, determine $r_S(a)$ and $r_S(b)$.
4. If $k \in [n^{1/4}, n - n^{1/4}]$, let $P = \{y \in S : a \le y \le b\}$. Check whether $S_{(k)} \in P$ and $|P| \le 4n^{3/4} + 2$. If not, repeat steps 1-3 until such a $P$ is found.
5. By sorting $P$, identify $P_{(k - r_S(a) + 1)} = S_{(k)}$.
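The steps above can be sketched directly in Python (our own variable names; the edge cases $k < n^{1/4}$ and $k > n - n^{1/4}$ skipped in Step 4 are not handled):

```python
import math
import random

def lazy_select(S, k, rng=None):
    """Find the k-th smallest of the n distinct elements in S via LazySelect."""
    rng = rng or random.Random(0)
    n = len(S)
    r = round(n ** 0.75)  # |R| = n^(3/4)
    while True:
        # Steps 1-2: sample with replacement, then sort the small sample.
        R = sorted(rng.choice(S) for _ in range(r))
        # Step 3: pick the bracketing sample elements a and b.
        x = k * n ** (-0.25)
        l = max(math.floor(x - math.sqrt(n)), 1)
        h = min(math.ceil(x + math.sqrt(n)), r)
        a, b = R[l - 1], R[h - 1]
        rank_a = sum(y < a for y in S) + 1  # r_S(a), via comparisons against S
        P = sorted(y for y in S if a <= y <= b)
        # Step 4: S_(k) is in P iff its position relative to rank_a lies inside P.
        idx = k - rank_a  # 0-based position of S_(k) within sorted P
        if 0 <= idx < len(P) and len(P) <= 4 * r + 2:
            return P[idx]  # Step 5: P_(k - r_S(a) + 1)

rng = random.Random(1)
S = rng.sample(range(10 ** 6), 10_000)
k = 2_500
result = lazy_select(S, k)
expected = sorted(S)[k - 1]
print(result == expected)
```

With high probability the loop succeeds on its first pass, matching the analysis that follows.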

4. Remarks on the Lazy Select Algorithm

In Step 1, sampling is done with replacement to simplify the analysis. Sampling without replacement is marginally faster but more complex to implement.

Step 2 takes $O(n^{3/4}\log n)$ time (which is $o(n)$).

Step 3 clearly takes $2n$ time ($2n$ comparisons).

An example: assume $r_S(a) = 3$ and we want $S_{(7)}$. In the sorted list of the elements of $P$, $S_{(7)} = P_{(k - r_S(a) + 1)} = P_{(7-3+1)} = P_{(5)}$, i.e. the 5th element indeed.

4. Remarks on the Lazy Select Algorithm

In Step 4, it is easy to check (in constant time) whether $S_{(k)} \in P$ by comparing $k$ to the (now known) $r_S(a)$, $r_S(b)$.

In Step 5, sorting $P$ takes $O(n^{3/4}\log n) = o(n)$ time.

Note: we skip in Step 4 the (less interesting) cases where $k < n^{1/4}$ and $k > n - n^{1/4}$. Their analysis is similar.

4. When does Lazy Select fail?

The algorithm may fail in Step 4, either because $S_{(k)} \notin P$ or because $|P|$ is too large. We will show that the probability of failure is very small.

Lemma 1. The probability that $S_{(k)} \notin P$ is $O(n^{-1/4})$.

Proof: This happens if i) $S_{(k)} < a$ or ii) $S_{(k)} > b$.

i) $S_{(k)} < a$ $\Leftrightarrow$ fewer than $l$ (where $l = k \cdot n^{-1/4} - \sqrt{n}$) of the samples in $R$ are less than or equal to $S_{(k)}$. Let:
$$X_i = \begin{cases} 1, & \text{if the } i\text{th random sample is at most } S_{(k)} \\ 0, & \text{otherwise} \end{cases}$$
Clearly, $E[X_i] = \Pr\{X_i = 1\} = \frac{k}{n}$ and $Var(X_i) = \frac{k}{n}\left(1 - \frac{k}{n}\right)$.

Let $X = \sum_{i=1}^{|R|} X_i$ = # samples in $R$ that are at most $S_{(k)}$. Then

4. When does Lazy Select fail?

$$\mu_X = E[X] = |R| \cdot E[X_i] = n^{3/4} \cdot \frac{k}{n} = k n^{-1/4}$$
and
$$\sigma_X^2 = Var[X] = \sum_{i=1}^{|R|} Var(X_i) = n^{3/4} \cdot \frac{k}{n}\left(1 - \frac{k}{n}\right) \le \frac{n^{3/4}}{4}$$
(since the samples are independent)

Thus, by Chebyshev's inequality,
$$\Pr\{|X - \mu_X| \ge \sqrt{n}\} \le \frac{\sigma_X^2}{n} \le \frac{n^{3/4}}{4n} = O(n^{-1/4})$$
$$\Rightarrow \Pr\{X - \mu_X < -\sqrt{n}\} \le O(n^{-1/4})$$
$$\Rightarrow \Pr\{X < \mu_X - \sqrt{n}\} = \Pr\{X < \underbrace{k n^{-1/4} - \sqrt{n}}_{l}\} \le O(n^{-1/4})$$

4. When does Lazy Select fail?

ii) The case $S_{(k)} > b$ is essentially symmetric (at least $h$ of the random samples should be smaller than $S_{(k)}$), so
$$\Pr\{S_{(k)} > b\} = O(n^{-1/4})$$

Overall,
$$\Pr\{S_{(k)} \notin P\} = \Pr\{(S_{(k)} < a) \cup (S_{(k)} > b)\} = O(n^{-1/4}) + O(n^{-1/4}) = O(n^{-1/4})$$

4. The Lazy Select Algorithm

Lemma 2. The probability that $P$ contains more than $4n^{3/4} + 2$ elements is $O(n^{-1/4})$.

Proof: Very similar to the proof of Lemma 1. Let
$$k_l = \max\{1, k - 2n^{3/4}\} \quad \text{and} \quad k_h = \min\{k + 2n^{3/4}, n\}$$
If $S_{(k_l)} < a$ or $S_{(k_h)} > b$, then $P$ contains more than $4n^{3/4} + 2$ elements. For simplicity, let $k_l = k - 2n^{3/4}$ and $k_h = k + 2n^{3/4}$.

Then, it suffices to "simulate" the proof of Lemma 1 for $k = k_l$ and then for $k = k_h$.

4. The Lazy Select Algorithm

Theorem. The Lazy Select algorithm finds the correct solution with probability $1 - O(n^{-1/4})$, performing $2n + o(n)$ comparisons.

Proof: Due to Lemmata 1 and 2, the algorithm finds $S_{(k)}$ on the first pass through steps 1-5 with probability $1 - O(n^{-1/4})$ (i.e., it does not fail in Step 4, avoiding a loop back to Step 1). Step 1 obviously takes $o(n)$ time. Step 2 requires $O(n^{3/4}\log n) = o(n)$ time, and Step 3 clearly needs $2n$ comparisons (comparing each of the $n$ elements of $S$ to $a$ and $b$). Overall the time needed is thus $2n + o(n)$.

Sotiris Nikoletseas, Associate Professor Randomized Algorithms - Lecture 3 34 / 34