1
“Stable Distributions, Pseudorandom Generators, Embeddings and Data Stream Computation”
Paper by Piotr Indyk
Presentation by Andy Worms and Arik Chikvashvili
2
Abstract
Stable distributions
Stream computation combining the use of stable distributions
Pseudorandom generators (PRG) and their use to reduce memory usage
3
Introduction
Input: an n-dimensional point p, whose L1 norm is given.
Storage: a sketch C(p) of O(log n / ε) words.
Property: given C(p) and C(q), we can estimate |p − q|₁ for points p and q up to a factor (1 + ε) w.h.p.
4
Introduction: Stream Computation
Given a stream S of data, each chunk of data is of the form (i, a), where i ∈ [n] = {0 … n−1} and a ∈ {−M … M}.
We want to approximate the quantity L1(S), where
L_p(S) = ( Σ_{i=0}^{n−1} | Σ_{(i,a)∈S} a |^p )^{1/p}
5
Other results
N. Alon, Y. Matias and M. Szegedy [1996]: approximation of L2(S) in space O(1/ε), w.h.p., when for each i the stream contains at most two pairs (i, a).
J. Feigenbaum, S. Kannan, M. Strauss and M. Viswanathan [1999]: the same for L1(S), also with at most two pairs per i.
6
Stable Distributions
A distribution D over ℝ is p-stable (for p ≥ 0) if for any real numbers a1 … an and i.i.d. variables X1 … Xn with distribution D, the variable Σ_i a_i·X_i has the same distribution as the variable (Σ_i |a_i|^p)^{1/p} · X, where X also has distribution D.
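This 1-stability can be checked numerically. The following sketch (not from the slides; the coefficients, sample size, seed and tolerance are arbitrary choices) generates standard Cauchy variables via the inverse CDF and compares the median of |Σ a_i·X_i| to Σ|a_i|:

```python
import math
import random
import statistics

def cauchy(rng):
    """Standard Cauchy sample via the inverse CDF: tan(pi * (U - 1/2))."""
    return math.tan(math.pi * (rng.random() - 0.5))

def trial(rng, a):
    # One sample of sum_i a_i * X_i with i.i.d. Cauchy X_i.
    return sum(ai * cauchy(rng) for ai in a)

rng = random.Random(0)
a = [3.0, -4.0, 5.0]                         # sum of |a_i| = 12
samples = [abs(trial(rng, a)) for _ in range(200_000)]
est = statistics.median(samples)
# By 1-stability, sum a_i X_i ~ (sum |a_i|) * X, and median(|c X|) = c,
# so the sample median should be close to 12.
print(est)
```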
7
Some News
The good news: stable distributions exist for every p ∈ (0, 2].
The bad news: most have no closed formula (i.e., they are non-constructive).
The Cauchy distribution is 1-stable; the Gauss distribution is 2-stable.
Source: V. M. Zolotarev, “One-dimensional Stable Distributions” (518.2 ZOL in AAS)
9
Tonight’s Program
“Obvious solution”.
Algorithm for p = 1.
The algorithm’s limitations.
Proof of correctness.
Overcoming the limitations.
Time permitting: p = 2 and all p in (0, 2].
10
The Obvious Solution
Hold a counter for each i, and update it on each pair found in the input stream.
Breaks the stream model: O(n) memory.
11
The Problem (p = 1)
Input: a stream S of pairs (i, a) such that 0 ≤ i < n and −M ≤ a ≤ M.
Output: L1(S) = Σ_{i=0}^{n−1} | Σ_{(i,a)∈S} a |,
up to an error factor of 1 ± ε, with probability 1 − δ.
12
Definitions
l = c·log(1/δ)/ε²  (c defined later)
An l × n matrix of entries X_i^j (0 ≤ j < l, 0 ≤ i < n), which are independent random variables with Cauchy distribution.
A set of buckets S_j, 0 ≤ j < l, initially zero.
14
Limitations
It assumes infinite-precision arithmetic.
It randomly and repeatedly accesses O(n) random numbers.
15
Example: n = 7, l = 3

The random matrix X (rows j = 1, 2, 3 are the buckets, columns i = 1 … 7):

  i:    1   2   3   4   5   6   7
  j=1: -5   6   3  -5  -5   3   0
  j=2:  2   1  -3   2   0   1   5
  j=3:  1   1   1   0  -3   2   1

Data stream: (2,+1) (5,−2) (4,−1)

(for illustration only)
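The whole p = 1 algorithm fits in a few lines. The sketch below follows the slides’ description but is a hypothetical implementation: the bucket count l, the seed, and storing the matrix explicitly (the memory issue addressed later in the talk) are choices made here for illustration only.

```python
import math
import random
import statistics

def l1_sketch(stream, n, l, seed=0):
    """Estimate L1(S) = sum_i |sum_{(i,a) in S} a| with l Cauchy buckets."""
    rng = random.Random(seed)
    # X[j][i]: independent standard Cauchy variables (stored explicitly here).
    X = [[math.tan(math.pi * (rng.random() - 0.5)) for _ in range(n)]
         for _ in range(l)]
    S = [0.0] * l                       # the buckets, initially zero
    for i, a in stream:                 # one pass over the stream
        for j in range(l):
            S[j] += a * X[j][i]
    # Each S_j distributes like C * Cauchy with C = L1(S), and median(|C X|) = C.
    return statistics.median(abs(s) for s in S)

stream = [(2, +1), (5, -2), (4, -1), (2, +3)]   # c_2 = 4, c_4 = -1, c_5 = -2
# True L1 = |4| + |-1| + |-2| = 7; the estimate should be close to 7.
print(l1_sketch(stream, n=7, l=5001))
```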
16
Correctness Proof
We define: c_i = Σ_{(i,a)∈S} a   (c_i = 0 if there is no pair (i, a) in S)
Therefore: L1(S) = C = Σ_i |c_i|
17
Correctness proof (cont.)
Claim 1: Each S_j has the same distribution as C·X for some random variable X with Cauchy distribution.
Proof: follows from the 1-stability of the Cauchy distribution.
18
Correctness Proof (cont.)
Lemma 1: If X has Cauchy distribution, then median(|X|) = 1 and median(a·|X|) = a.
Proof: The distribution function of |X| is
F(z) = ∫₀^z 2 / (π(1 + x²)) dx = (2/π) · arctan(z)
Since tan(π/4) = 1, F(1) = 1/2. Thus median(|X|) = 1 and median(a·|X|) = a.
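The distribution function of |X| can be sanity-checked numerically (a small sketch; the integration step count is an arbitrary choice):

```python
import math

def F(z):
    """CDF of |X| for standard Cauchy X: F(z) = (2/pi) * arctan(z)."""
    return (2.0 / math.pi) * math.atan(z)

def F_numeric(z, steps=100_000):
    # Integrate the density 2 / (pi * (1 + x^2)) from 0 to z (midpoint rule).
    h = z / steps
    return sum(2.0 / (math.pi * (1.0 + (h * (k + 0.5)) ** 2))
               for k in range(steps)) * h

print(F(1.0))                        # 0.5: tan(pi/4) = 1, so median(|X|) = 1
print(abs(F(2.0) - F_numeric(2.0)))  # tiny: closed form matches the integral
```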
20
Correctness proof (cont.)
Fact: For any distribution D on ℝ with distribution function F, take l = c·log(1/δ)/ε² independent samples X0, …, X_{l−1} of D, and let X = median(X0, …, X_{l−1}). Then
Pr[ F(X) ∈ (1/2 − ε, 1/2 + ε) ] ≥ 1 − δ
21
Correctness Proof (cont.)
The Fact in plain words:
1. You choose an error (small) and a probability (high).
2. With enough samples, you will discover the median, with high probability, to within the small error.
22
Correctness Proof (cont.)
Lemma 2: Let F be the distribution function of |X|, where X has Cauchy distribution, and let z > 0 be such that 1/2 − ε ≤ F(z) ≤ 1/2 + ε. Then, if ε is small enough, 1 − 4ε ≤ z ≤ 1 + 4ε.
Proof: F′(z) > 1/4 in a neighborhood of F⁻¹(1/2) = 1, so a deviation of at most ε in F(z) moves z by at most 4ε.
23
Correctness Proof (last)
Therefore we have proved: the algorithm correctly estimates L1(S) up to the factor (1 ± ε), with probability at least 1 − δ.
24
Correctness Proof (review)
For those who are lost:
1. Each bucket distributes like C·X, and median(|C·X|) = C.
2. “Enough” samples approximate median(|C·X|) = C “well enough”.
3. Each bucket is a sample.
25
Tonight’s Program
“Obvious solution”.
Algorithm for p = 1.
The algorithm’s limitations.
Proof of correctness.
Overcoming the limitations.
Time permitting: p = 2 and all p in (0, 2].
God willing: uses of the algorithm.
26
Bounded Precision
The numbers in the stream are integers; the problem is with the random variables.
We will show that it is sufficient to pick them from the set
V_L = { p/q : p, q ∈ {−L, …, L}, q ≠ 0 }
(in plain words: the set of fractions of small numbers)
27
Bounded Precision (cont.)
We want to generate a random variable X with Cauchy distribution:
Choose Y uniformly from [0, 1).
X = F⁻¹(Y) = tan(πY/2).
Y′ is the multiple of 1/L closest to Y.
X′ is F⁻¹(Y′) rounded to a multiple of 1/L.
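A quick sketch of this discretization (the value of L and the seed are arbitrary choices; the primed variables are written Y2 and X2):

```python
import math
import random

L = 10**6  # precision parameter: everything becomes a multiple of 1/L

def round_to_grid(v):
    """Round v to the nearest multiple of 1/L."""
    return round(v * L) / L

rng = random.Random(1)
Y = rng.random()                     # Y uniform in [0, 1)
X = math.tan(math.pi * Y / 2.0)      # X = F^{-1}(Y), since F(z) = (2/pi) arctan z
Y2 = round_to_grid(Y)                # Y': nearest multiple of 1/L
X2 = round_to_grid(math.tan(math.pi * Y2 / 2.0))   # X': rounded inverse CDF
print(abs(X - X2))  # small, as long as Y is bounded away from 1
```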
29
Bounded Precision (cont.)
Assume Y < 1 − K/L = 1 − α. The derivative of F⁻¹ near Y < 1 − α is O(1/α²). It follows that X′ = X + E, where |E| = O(1/(α²·L)) = β.
Hence, for each bucket:
S′_j = Σ_{(i,a)∈S} a·X′_i = Σ_i c_i·X′_i = Σ_i c_i·(X_i + E_i) = S_j + Σ_i c_i·E_i
30
Bounded Precision (cont.)
|S′_j − S_j| ≤ β · Σ_i |c_i|, where β bounds the rounding error of each X_i  (result from the previous slide)
median({|S′_j|}_{0 ≤ j < l}) = (1 ± ε) · Σ_i |c_i| ± β · Σ_i |c_i|  (up to ε and δ, from the algorithm’s proof)
By making β small enough, you can ignore the contribution of β · Σ_i |c_i|.
31
Memory Usage Reduction
The naïve implementation uses O(n) memory words to store the random matrix.
Couldn’t we generate the random matrix on the fly?
Yes, with a PRG. We also toss fewer coins.
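The idea can be illustrated with Python’s own seeded random (a toy stand-in for a real PRG, not an implementation of the one defined later; the seeding scheme is an arbitrary choice): regenerate X_{j,i} on demand from (seed, j, i) instead of storing an l × n matrix.

```python
import math
import random

def X(seed, j, i):
    """Regenerate the Cauchy variable X_{j,i} on demand from (seed, j, i)."""
    rng = random.Random(hash((seed, j, i)))   # deterministic per-index stream
    return math.tan(math.pi * (rng.random() - 0.5))

# Same index -> same value, no matter when or in what order it is requested,
# so only the seed needs to be stored:
print(X(42, 3, 5) == X(42, 3, 5))   # True
print(X(42, 3, 5) == X(42, 3, 6))   # almost surely False
```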
32
Not just for fun.
From the Python programming language (www.python.org).
source: http://www.python.org/doc/2.3.3/lib/module-random.html
33
Review: Probabilistic Algorithms
Allow algorithm A to: Use random bits. Make errors.
Answers correctly with high probability. for every x, Prr[A(x,r)=P(x)]>1- ε.
(for very small ε, say 10-1000).
Ainput
random bitsoutput
34
Exponential-Time Derandomization
After 20 years of research we only have the following trivial theorem.
Theorem: Probabilistic poly-time algorithms can be simulated deterministically in exponential time (time 2^poly(n)).
35
Proof: Suppose that A uses r random bits.
Run A using all 2^r choices for the random bits (000…0 through 111…1).
Take the majority vote of the outputs.
Time: 2^r · poly(n).
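The brute-force derandomization above can be sketched on a toy example (the “algorithm” A and its failure pattern are invented here for illustration: it decides whether x is even, but lies on one of the 2^r random strings):

```python
from itertools import product

def A(x, r):
    """Toy probabilistic algorithm using len(r) random bits: it decides
    whether x is even, but lies when the random string is all ones."""
    answer = (x % 2 == 0)
    return (not answer) if all(r) else answer

def derandomized(x, r_bits=3):
    # Run A on all 2^r choices of random bits and take the majority vote.
    votes = [A(x, r) for r in product((0, 1), repeat=r_bits)]
    return sum(votes) > len(votes) // 2

print(derandomized(4))   # True: 7 of the 8 runs answer correctly
print(derandomized(7))   # False: again the majority is correct
```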
36
Algorithms which use few bits
Algorithms with few random coins can be efficiently derandomized!
Time: 2^r · poly(n), so with r = O(log n) this gives a polynomial-time deterministic algorithm!
37
Derandomization paradigm
1. Given a probabilistic algorithm that uses many random bits.
2. Convert it into a probabilistic algorithm that uses few random bits.
3. Derandomize it using the previous Theorem.
38
Pseudorandom Generators
Use a short “seed” of very few truly random bits to generate a long string of pseudo-random bits.
Pseudo-randomness: no efficient algorithm can distinguish truly random bits from pseudo-random bits.
[diagram: a short seed of few truly random bits enters the PRG, which feeds many “pseudo-random” bits to A]
39
Pseudo-Random Generators (cont.)
This yields a new probabilistic algorithm: in our algorithm we need to store only the short seed, and not the whole set of pseudorandom bits.
[diagram: short seed enters the PRG, whose pseudo-random bits feed A]
40
Remember?
The random matrix from the example (rows j = 1, 2, 3, columns i = 1 … 7):

  i:    1   2   3   4   5   6   7
  j=1: -5   6   3  -5  -5   3   0
  j=2:  2   1  -3   2   0   1   5
  j=3:  1   1   1   0  -3   2   1

There exist efficient “random access” (indexable) random number generators.
41
PRG definition
Given an FSM Q and a truly random seed, convert the seed into k chunks of random bits, each of length b. Formally, G: {0,1}^m → ({0,1}^b)^k.
Let Q(x) be the state of Q after input x. G is a PRG if
‖ D[Q(x) : x uniform over {0,1}^{bk}] − D[Q(G(x)) : x uniform over {0,1}^m] ‖₁ ≤ ε
42
PRG properties
There exists a PRG G for space(S) with ε = 2^−O(S) such that:
G expands O(S · log R) bits into O(R) bits.
G requires only O(S) bits of storage in addition to its random bits.
Any length-O(S) chunk of G(x) can be computed using O(log R) arithmetic operations on O(S)-bit words.
43
Randomness reduction
Consider a fixed S_j: O(log M) space suffices to hold it.
For the X_i, assume first that the pairs (i, a) arrive in increasing order of i.
So we need O(n) chunks of randomness ⇒ there exists a PRG that needs a random seed of size O(log M · log(n/δ)) to expand it into n pseudorandom variables X1 … Xn.
44
Randomness reduction (cont.)
The variables X1 … Xn give us the S_j ⇒ L1(S).
But S_j does not depend on the order of the i’s: for each i the same X_i will be generated ⇒ the input can be unsorted.
We use l = O(log(1/δ))/ε² random seeds.
45
Theorem 2
There is an algorithm which estimates L1(S) up to a factor (1 ± ε) with probability 1 − δ, and uses (taking S = log M, R = n/δ):
O(log M · log(1/δ)/ε²) bits of random-access storage
O(log(n/δ)) arithmetic operations per pair (i, a)
O(log M · log(n/δ) · log(1/δ)/ε²) random bits
46
Further Results
When p = 2, the algorithm and analysis are the same, with the Cauchy distribution replaced by the Gaussian.
For general p in (0, 2], no closed formulas exist for the densities or distribution functions.
47
General p
Fact: p-stable random variables can be generated from two independent variables distributed uniformly over [0, 1] (Chambers, Mallows and Stuck, 1976).
It seems that Lemma 2 and the algorithm itself would work in this case as well, but there has been no need to work out the details, as there are no known applications with p different from 1 and 2.
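The Chambers–Mallows–Stuck method can be sketched as follows. This uses the standard CMS formula for symmetric p-stable variables (the formula is the commonly cited form of the 1976 result, not taken from the slides); for p = 1 it degenerates to tan(θ), i.e. the Cauchy distribution.

```python
import math
import random

def stable(p, rng):
    """Chambers-Mallows-Stuck sampler for a symmetric p-stable variable."""
    theta = math.pi * (rng.random() - 0.5)   # uniform on (-pi/2, pi/2)
    w = -math.log(1.0 - rng.random())        # exponential(1); avoids log(0)
    if p == 1.0:
        return math.tan(theta)               # the formula degenerates to Cauchy
    return (math.sin(p * theta) / math.cos(theta) ** (1.0 / p)
            * (math.cos((1.0 - p) * theta) / w) ** ((1.0 - p) / p))

rng = random.Random(0)
xs = sorted(abs(stable(1.0, rng)) for _ in range(100_000))
# For p = 1 the output is standard Cauchy, so median(|X|) should be close to 1.
print(xs[len(xs) // 2])
```

Both uniform inputs mentioned in the Fact appear here: one drives θ, the other drives the exponential variable w.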