TRANSCRIPT
Element Distinctness, Frequency Moments, and Sliding Windows
Raphael Clifford
University of Bristol, UK
arXiv:1309.3690
FOCS 2013
Joint work with Paul Beame and Widad Machmouchi
Time-space tradeoffs
Wikipedia:Beer
1. Frequency moments. E.g. how many different beer cans?
2. Element distinctness (ED). Have I had the same can twice?
Particularly simple to solve if presorted.
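Both quantities are easy to compute once sorting is allowed; a minimal Python sketch (function names are mine, not from the talk):

```python
def f0(xs):
    """F0: the number of distinct elements."""
    return len(set(xs))

def element_distinctness(xs):
    """Return 1 if all elements are distinct, 0 otherwise.

    After sorting, any duplicate must sit next to its twin, so one
    linear scan for equal neighbours suffices -- this is why ED is
    particularly simple on presorted input.
    """
    ys = sorted(xs)
    return 0 if any(a == b for a, b in zip(ys, ys[1:])) else 1
```

Of course, the sort is exactly the step that costs T ∈ Ω(n²/S) in small space, which motivates the rest of the talk.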
Time-space tradeoffs
What is the complexity of these problems using small space?
- Any solution using sorting requires T ∈ Ω(n²/S) [Borodin-Cook 82, Beame 91].
- What is the true complexity using small space?
- Are both problems really as hard as sorting? (No.)
- How about multi-output or sliding window versions?
Sliding window ED and frequency moments
A B C B A B C
- #distinct elements (F0) over each window of length 3 = 3, 2, 3, 2, 3.
- Sliding window ED gives 1, 0, 1, 0, 1.
- We show sliding window ED is easier than sorting but sliding window F0 mod 2 is as hard as sorting.
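The example above (window length 3) can be reproduced with a naive brute-force sketch; the function names and approach are mine:

```python
def sliding_f0(xs, w):
    """F0 of every length-w window, computed naively in O(n*w) time."""
    return [len(set(xs[i:i + w])) for i in range(len(xs) - w + 1)]

def sliding_ed(xs, w):
    """Sliding window ED: 1 iff the window's w elements are all distinct."""
    return [int(len(set(xs[i:i + w])) == w) for i in range(len(xs) - w + 1)]

x = list("ABCBABC")
print(sliding_f0(x, 3))   # [3, 2, 3, 2, 3]
print(sliding_ed(x, 3))   # [1, 0, 1, 0, 1]
```

The point of the talk is how much of this O(n·w) cost can be avoided in small space.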
Our new results
Our new upper and lower bounds:

Single window:
- Frequency moments: T ∈ Ω(n·√(log(n/S) / log log(n/S))) [BSSV 03]
- Element distinctness: T ∈ Ω(n·√(log(n/S) / log log(n/S))) [BSSV 03]; T ∈ O(n√(n/S)) (New)
- F0 mod 2: T ∈ O(n²/S) [PR 98]

Sliding window:
- Frequency moments: T ∈ Ω(n²/S) (New); T ∈ O(n²/S) (New)
- Element distinctness: T ∈ O(n√(n/S)) (New)
- F0 mod 2: T ∈ O(n²/S) (New); T ∈ Ω(n²/S) (New)

Previous element distinctness lower bounds:
- Borodin et al. 1987 (comparison model): T ∈ Ω(n^(3/2)·√(log n)/S)
- Yao 1988 (comparison model): T ∈ Ω(n^(2−ε(n))/S)
- Ajtai 1999 (multi-way branching): S ∈ o(n) ⇒ T ∈ ω(n)
- Beame et al. 2003 (multi-way branching): T ∈ Ω(n·√(log(n/S) / log log(n/S)))
The overall method for our upper bounds
- Using Floyd's (Pollard's rho) cycle finding algorithm, we construct a T ∈ O(n√(n/S)) randomised branching program algorithm for single window ED.
- 1-sided error, with error probability inversely polynomial in n.
- Reduction from sliding-window ED to single window ED: T ∈ O(n√(n/S)) for a single window gives T ∈ O(n√(n/S)) for sliding windows.
- Sliding window frequency moments in T ∈ O(n²/S) in the comparison model.
Cycle finding in graphs
Floyd's "tortoise and hare" algorithm.
[Figure: a rho-shaped path; the tortoise advances one step and the hare two steps per move until they meet inside the cycle.]
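The tortoise-and-hare idea can be written in a few lines; this is my own rendering of the classic constant-space algorithm, not code from the talk:

```python
def floyd_cycle(f, x0):
    """Floyd's tortoise and hare on the rho-shaped path x0, f(x0), f(f(x0)), ...

    Returns (mu, lam): the tail length and the cycle length.
    Uses O(1) memory: two pointers moving at speeds 1 and 2.
    """
    tortoise, hare = f(x0), f(f(x0))
    while tortoise != hare:              # phase 1: meet inside the cycle
        tortoise, hare = f(tortoise), f(f(hare))
    mu, tortoise = 0, x0                 # phase 2: find the cycle entry
    while tortoise != hare:
        tortoise, hare = f(tortoise), f(hare)
        mu += 1
    lam, hare = 1, f(tortoise)           # phase 3: measure the cycle length
    while tortoise != hare:
        hare = f(hare)
        lam += 1
    return mu, lam
```

For example, iterating f(v) = (v² + 1) mod 13 from 2 gives 2, 5, 0, 1, 2, ... so `floyd_cycle` returns tail length 0 and cycle length 4.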
Randomised cycle finding
[Figure: the ED input x = 134, 921, 37, 812, 396, 452, 921, 98, 157 sits at positions 1-9. A random hash function maps each value to a position in [n], inducing a graph on the positions: from position i, step to the position the hash assigns to x_i. Following these edges traces a rho-shaped path whose cycle yields a collision; here the value 921 appears at positions 2 and 7, so those two positions share the same out-edge.]
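A sketch of this construction in Python, using the slide's input. The specific hash values in the slide's figure are not recoverable, so this sketch draws fresh random hashes (an assumption); following the induced edges with tortoise and hare yields a colliding pair of positions, which is a real duplicate exactly when the two positions hold equal values:

```python
import random

x = [134, 921, 37, 812, 396, 452, 921, 98, 157]   # the slide's ED input
n = len(x)

def collision_from(start, h):
    """Follow i -> h[x[i]] from start with tortoise and hare.

    Returns a pair (u, v), u != v, whose out-edges collide, or None
    when the start already lies on the cycle (no tail, no collision).
    """
    f = lambda i: h[x[i]]
    tortoise, hare = f(start), f(f(start))
    while tortoise != hare:                 # meet inside the cycle
        tortoise, hare = f(tortoise), f(f(hare))
    u, v = start, hare                      # advance to the merge point
    while f(u) != f(v):
        u, v = f(u), f(v)
    return None if u == v else (u, v)

def find_duplicate(trials=2000, seed=1):
    """Repeat with fresh random hashes until a real duplicate in x is found."""
    rng = random.Random(seed)
    for _ in range(trials):
        h = {v: rng.randrange(n) for v in set(x)}
        pair = collision_from(rng.randrange(n), h)
        if pair and x[pair[0]] == x[pair[1]]:
            return pair                     # positions holding equal values
    return None
```

A collision can also be a mere hash collision between distinct values; the final check `x[pair[0]] == x[pair[1]]` is what makes the error 1-sided.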
A new small-space upper bound for ED
Sampling uniformly from [n]:
- Expect to find a repeated value after Θ(√n) samples.
- Prob. of any fixed pair of numbers appearing before a repeated value approaches 2/n.
There is a reasonable chance of finding a real input duplicate in one cycle. Using constant space we can repeat Θ(n) times.
But to run faster using more space, we can't just run S instances in parallel.
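The Θ(√n) birthday estimate is easy to check empirically (this simulation is mine, not part of the talk's algorithm):

```python
import random

def samples_until_repeat(n, rng):
    """Draw uniformly from [n] until some value repeats; return the count."""
    seen = set()
    while True:
        v = rng.randrange(n)
        if v in seen:
            return len(seen) + 1
        seen.add(v)

rng = random.Random(42)
n = 10_000
trials = [samples_until_repeat(n, rng) for _ in range(2000)]
mean = sum(trials) / len(trials)
# Birthday bound: the expectation is ~ sqrt(pi*n/2), about 125 for n = 10000.
```

The empirical mean should land close to √(πn/2) ≈ 125 for n = 10000, consistent with the Θ(√n) claim.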
A new small-space upper bound for ED
- Maintain a redirection list to split cycles. Update the list whenever a new collision is found.
- Cycles roughly halve in length each time they are visited.
We find all collisions reachable from any S distinct starting points using O(S) items of space and time roughly proportional to the size of the subgraph explored.
A new small-space upper bound for ED
Theorem
There is a randomised branching program algorithm computing ED with 1-sided error that uses space S and T ∈ O(n√(n/S)).
1. Run roughly n/S independent runs of collision-finding with independent random choices of hash functions and independent choices of roughly S starting indices.
2. Use a run-time cut-off bounding the number of explored vertices at 2√(Sn).
3. On each run, check if any of the collisions found is a duplicate in x, in which case output ED(x) = 0 and halt.
4. If none is found in any round then output ED(x) = 1.
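A simplified, memory-liberal sketch of this driver loop (my own, not the paper's implementation: it records explored edges in a dict rather than using the O(S)-space redirection-list technique, which is not reproduced here):

```python
import math
import random

def ed(x, S, seed=0):
    """Sketch of the randomised ED algorithm: ~n/S rounds, each with a
    fresh random hash, ~S starting positions, and a cut-off of about
    2*sqrt(S*n) explored vertices.

    Returns 0 if a duplicate is found, else 1, so the error is 1-sided:
    "distinct" may be wrong, "duplicate" never is.
    """
    n = len(x)
    rng = random.Random(seed)
    cap = 2 * math.isqrt(S * n) + 1
    for _ in range(max(1, n // S)):
        h = {v: rng.randrange(n) for v in set(x)}   # fresh random hash
        owner = {}            # hash image -> position that stepped onto it
        explored = 0
        for i in [rng.randrange(n) for _ in range(S)]:
            while explored < cap:
                j = h[x[i]]                          # follow the induced edge
                prev = owner.get(j)
                if prev is not None and prev != i:
                    if x[prev] == x[i]:
                        return 0                     # a real duplicate in x
                    break                            # mere hash collision
                if prev == i:
                    break                            # walk is looping on itself
                owner[j] = i
                explored += 1
                i = j
    return 1
```

On a distinct input `ed` can only ever answer 1; on an input with a duplicate it answers 0 with constant probability per round, boosted by the n/S rounds.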
Sliding window ED
Theorem
Sliding window ED can be solved in time T ∈ O(n√(n/S)) with 1-sided error probability o(1/n).
Idea:
- Reduce to single window ED.
- A duplicate in one window determines a large number of outputs.
A general sequential lower bound for sliding window F0
Framework of [Borodin-Cook 82, Abrahamson 91]
T ∈ Ω(n²/S) follows if, for some random input distribution, no matter how cn input values are fixed, any fixed set of Θ(S) output values occurs with prob. 2^(−S).
- Informally, for the first S outputs, it is hard to predict the next output value even when a constant fraction of the input vector is known.
The sliding window F0 lower bound
Need to show: for the first S outputs, it is hard to predict the next output value even when a constant fraction of the input vector is known.
- Input uniform over [n]^(2n−1). But some outputs are easy to predict.
a ? ? ? ? a
- The uniform distribution gives whp Ω(n) positions i where:
  - x_i is unique in the input.
  - F0 of the window (i, i + n − 1) is in the range [0.5n, 0.85n].
- Only consider outputs for these positions and show they are hard to predict.
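A quick empirical check (mine, not from the talk) that a typical window has F0 in the stated range: for a window of n uniform samples from [n], E[F0] = n·(1 − (1 − 1/n)^n) ≈ (1 − 1/e)·n ≈ 0.632n, comfortably inside [0.5n, 0.85n], and F0 concentrates tightly around this mean:

```python
import random

rng = random.Random(7)
n = 500

# F0 of 20 independent uniform windows of length n over [n].
f0s = [len(set(rng.randrange(n) for _ in range(n))) for _ in range(20)]
# Each F0 should land near (1 - 1/e) * n ~ 316 for n = 500,
# i.e. well inside the proof's range [0.5n, 0.85n] = [250, 425].
```

This only illustrates typicality; the lower bound additionally needs the harder claim that such outputs stay unpredictable even after cn inputs are fixed.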
Summing up
- Finding two identical items is easier than sorting the input.
- Sliding window element distinctness is still easier than sorting.
- F0 mod 2 may be better than ED as an example of a hard decision problem to study.
- Sliding window F0 mod 2 has the same complexity as sorting (ignoring log factors).
- Is our new complexity for element distinctness in fact tight?
Thank you
www.background-free.com