Tight Bounds for Distributed Functional Monitoring
David Woodruff, IBM Almaden
Qin Zhang, Aarhus University, MADALGO
Distributed Functional Monitoring
[Figure: a coordinator C communicates with sites P1, P2, …, Pk over time; site Pi holds input vector xi and receives updates xi → xi + ej.]
Static case vs. Dynamic case
Problems on x1 + x2 + … + xk: sampling, p-norms, heavy hitters, compressed sensing, quantiles, entropy
Authors: Can, Cormode, Huang, Muthukrishnan, Patt-Shamir, Shafrir, Tirthapura, Wang, Yi, Zhao, many others
Motivation
• Data distributed and stored in the cloud
  – Impractical to put data on a single device
• Sensor networks
  – Communication very power-intensive
• Network routers
  – Bandwidth limitations
Problems
• Which functions f(x1, …, xk) do we care about?
• x1, …, xk are non-negative length-n vectors
• x = x1 + x2 + … + xk
• f(x1, …, xk) = |x|p = (Σj=1..n xj^p)^(1/p)
• |x|0 is the number of non-zero coordinates
What is the randomized communication cost of these problems?
I.e., the minimal cost of a protocol which, for every input, fails with probability < 1/3
Static case, Dynamic Case
Exact Answers
• An Ω(n) communication bound for computing |x|p, p ≠ 1
• Reduction from 2-Player Set-Disjointness (DISJ)
  • Alice has a set S ⊆ [n] of size n/4
  • Bob has a set T ⊆ [n] of size n/4, with either |S ∩ T| = 0 or |S ∩ T| = 1
  • Is S ∩ T = ∅?
  • |S ∩ T| = 1 ⟹ DISJ(S, T) = 1, |S ∩ T| = 0 ⟹ DISJ(S, T) = 0
  • [KS, R]: Ω(n) communication (a sketch of the resulting gap appears below)
• Prohibitive for applications
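A sketch of how the reduction can yield the gap (assuming, as is standard for this reduction, that Alice and Bob contribute the indicator vectors of S and T, so that x = χS + χT; the slide does not spell this step out):
  – If S ∩ T = ∅, then Σj xj^p = |S| + |T| = n/2
  – If |S ∩ T| = 1, then Σj xj^p = n/2 − 2 + 2^p
Since 2^p ≠ 2 for p ≠ 1, an exact value of |x|p separates the two cases, so the Ω(n) DISJ bound transfers.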
Approximate Answers
f(x1, …, xk) = (1 ± ε) |x|p
What is the randomized communication cost as a function of k, ε, and n?
Ignore log(nk/ε) factors
Previous Results
Lower bounds in static model, upper bounds in dynamic model (underlying vectors are non-negative)
• |x|0: Ω(k + ε⁻²) and O(k·ε⁻²)
• |x|p: Ω(k + ε⁻²)
• |x|2: O(k²/ε + k^1.5/ε³)
• |x|p, p > 2: O(k^(2p+1)·n^(1−2/p)·poly(1/ε))
Our Results
Lower bounds in static model, upper bounds in dynamic model (underlying vectors are non-negative)
• |x|0: Ω(k + ε⁻²) and O(k·ε⁻²) → Ω(k·ε⁻²)
• |x|p: Ω(k + ε⁻²) → Ω(k^(p−1)·ε⁻²). Talk will focus on p = 2
• |x|2: O(k²/ε + k^1.5/ε³) → O(k·poly(1/ε))
• |x|p, p > 2: O(k^(2p+1)·n^(1−2/p)·poly(1/ε)) → O(k^(p−1)·poly(1/ε))
First lower bounds to depend on the product of k and ε⁻²
Upper bound doesn't depend polynomially on n
Talk Outline
• Lower Bounds
  – Non-zero elements
  – Euclidean norm
• Upper Bounds
  – p-norm
Previous Lower Bounds
• Lower bounds for any p-norm, p ≠ 1
• [CMY]: Ω(k)
• [ABC]: Ω(ε⁻²)
  • Reduction from Gap-Orthogonality (GAP-ORT)
  • Alice, Bob have u, v ∈ {0,1}^(1/ε²), respectively
  • Promise: |Δ(u, v) − 1/(2ε²)| < 1/ε or |Δ(u, v) − 1/(2ε²)| > 2/ε, where Δ(u, v) is the Hamming distance
  • [CR, S]: Ω(ε⁻²) communication
Talk Outline
• Lower Bounds
  – Non-zero elements
  – Euclidean norm
• Upper Bounds
  – p-norm
Lower Bound for Distinct Elements
• Improve bound to optimal Ω(k·ε⁻²)
• Simpler problem: k-GAP-THRESH
  – Each site Pi holds a bit Zi
  – Zi are i.i.d. Bernoulli(β)
  – Decide if Σi Zi > βk + (βk)^(1/2) or Σi Zi < βk − (βk)^(1/2)
  – Otherwise, don't care
• Rectangle property: for any correct protocol transcript τ, Z1, Z2, …, Zk are independent conditioned on τ
A Key Lemma
• Lemma: For any protocol Π which succeeds w.pr. > .9999, the transcript τ is such that, w.pr. > 1/2, for at least k/2 different i, H(Zi | τ) < H(.01 β)
• Proof: Suppose τ does not satisfy this
  – With large probability, βk − O((βk)^(1/2)) < E[Σi Zi | τ] < βk + O((βk)^(1/2))
  – Since the Zi are independent given τ, Σi Zi | τ is a sum of independent Bernoullis
  – Since most H(Zi | τ) are large, by anti-concentration, both events occur with constant probability:
    Σi Zi | τ > βk + (βk)^(1/2)  and  Σi Zi | τ < βk − (βk)^(1/2)
  – So Π can't succeed with large probability
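A quick empirical illustration of the anti-concentration step (a minimal sketch; k, β, and the trial count are arbitrary placeholder values, not parameters from the talk):

```python
import random

# For S = sum of k i.i.d. Bernoulli(beta) bits, (beta*k)^(1/2) is on the order
# of one standard deviation of S, so both threshold events of k-GAP-THRESH
# each occur with constant probability.
k, beta, trials = 1000, 0.3, 20000   # placeholder parameters
mean, dev = beta * k, (beta * k) ** 0.5

above = below = 0
for _ in range(trials):
    s = sum(random.random() < beta for _ in range(k))
    above += s > mean + dev
    below += s < mean - dev

print(f"Pr[S > beta*k + (beta*k)^0.5] ~ {above / trials:.2f}")
print(f"Pr[S < beta*k - (beta*k)^0.5] ~ {below / trials:.2f}")
```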
Composition Idea
[Figure: coordinator C and sites P1, P2, …, Pk; each site Pi holds the bit Zi, determined by a DISJ instance with C.]
The input to Pi in k-GAP-THRESH, denoted Zi, is the output of a 2-party Disjointness (DISJ) instance between C and Pi
- Let X be a random set of size 1/(4ε²) from {1, 2, …, 1/ε²}
- For each i, if Zi = 1, then choose Yi so that DISJ(X, Yi) = 1; else choose Yi so that DISJ(X, Yi) = 0
- Distributional complexity Ω(1/ε²) [Razborov]
Can think of C as a player
Putting it All Together
• Key Lemma ⟹ for most i, H(Zi | τ) < H(.01β)
• Since H(Zi) = H(β) for all i, for most i the protocol Π solves DISJ(X, Yi) with constant probability
• Since the Zi | τ are independent, solving DISJ requires Ω(ε⁻²) communication on each of k/2 copies
• Total communication is Ω(k·ε⁻²)
• Can show a reduction:
  – |x|0 > 1/(2ε²) + 1/ε if Σi Zi > βk + (βk)^(1/2)
  – |x|0 < 1/(2ε²) − 1/ε if Σi Zi < βk − (βk)^(1/2)
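Why a (1 ± ε)-approximation of |x|0 suffices to decide this gap (a quick check, not spelled out on the slide):
  (1 + ε)·(1/(2ε²) − 1/ε) = 1/(2ε²) − 1/(2ε) − 1  <  1/(2ε²) + 1/(2ε) − 1 = (1 − ε)·(1/(2ε²) + 1/ε),
so the two cases have non-overlapping approximation ranges, and any (1 ± ε)-approximation of |x|0 solves k-GAP-THRESH, inheriting the Ω(k·ε⁻²) bound.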
Talk Outline
• Lower Bounds
  – Non-zero elements
  – Euclidean norm
• Upper Bounds
  – p-norm
Lower Bound for Euclidean Norm
• Improve Ω(k + ε⁻²) bound to optimal Ω(k·ε⁻²)
• Base problem: Gap-Orthogonality (GAP-ORT(X, Y))
  – Consider the uniform distribution on (X, Y)
• We observe an information lower bound for GAP-ORT
• Sherstov's lower bound for GAP-ORT holds for the uniform distribution on (X, Y)
• [BBCR] + [Sherstov] ⟹ for any protocol Π and t > 0, I(X, Y; Π) = Ω(1/(ε² log t)) or Π uses ≥ t communication
Information Implications
• By the chain rule, I(X, Y; Π) = Σi I(Xi, Yi; Π | X<i, Y<i) = Ω(ε⁻²), where the sum is over i = 1, …, 1/ε²
• For most i, I(Xi, Yi; Π | X<i, Y<i) = Ω(1)
• Maximum Likelihood Principle: non-trivial advantage in guessing (Xi, Yi)
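One way to justify the last step (a hedged note; the slide only names the principle): if I(Xi, Yi; Π | X<i, Y<i) = Ω(1), then H(Xi, Yi | Π, X<i, Y<i) ≤ 2 − Ω(1). Fano's inequality gives H(Xi, Yi | Π, ·) ≤ H₂(Perr) + Perr·log₂3, and the right-hand side equals 2 exactly at Perr = 3/4, so the maximum-likelihood guess of the uniform pair (Xi, Yi) from the transcript errs with probability at most 3/4 − Ω(1), i.e., it beats random guessing by a constant.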
2-BIT k-Party DISJ
• Choose a random j ∈ [k²]; four cases:
  – j doesn't occur in any Ti
  – j occurs only in T1, …, Tk/2
  – j occurs only in Tk/2+1, …, Tk
  – j occurs in all of T1, …, Tk
• All j' ≠ j occur in at most one set Ti (assume k ≥ 4)
• We show Ω(k) information cost
[Figure: sites P1, P2, …, Pk hold sets T1, T2, …, Tk ⊆ [k²].]
We compose GAP-ORT with a variant of k-Party DISJ
Rough Composition Idea
[Figure: GAP-ORT composed with 1/ε² independent 2-BIT k-party DISJ instances, one per coordinate pair (Xi, Yi).]
Show Ω(k/ε²) overall information is revealed
Bits Xi and Yi in GAP-ORT determine the output of the i-th 2-BIT k-party DISJ instance
An algorithm for approximating the Euclidean norm solves GAP-ORT, and therefore solves most 2-BIT k-party DISJ instances
- Information adds (if we condition on enough "helper" variables)
- Pi participates in all instances
Talk Outline
• Lower Bounds
  – Non-zero elements
  – Euclidean norm
• Upper Bounds
  – p-norm
Algorithm for p-norm
• We get k^(p−1)·poly(1/ε), improving k^(2p+1)·n^(1−2/p)·poly(1/ε) for general p and O(k²/ε + k^1.5/ε³) for p = 2
• Our protocol is the first 1-way protocol, that is, all communication is from sites to coordinator
• Focus on Euclidean norm (p = 2) in talk
• Non-negative vectors
• Just determine if Euclidean norm exceeds a threshold θ
The Most Naïve Thing to Do
• xi is Site i’s current vector
• x = x1 + x2 + … + xk
• Suppose Site i sees an update xi → xi + ej
• Send j to Coordinator with a certain probability that only depends on k and θ?
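A minimal sketch of this naive idea (illustrative only: the names make_site, Coordinator, estimate_norm_sq and the fixed sampling probability q are placeholders, not the paper's protocol):

```python
import random
from collections import Counter

def make_site(q, send):
    """Site-side handler: forward each update j to the coordinator w.p. q."""
    def on_update(j):
        if random.random() < q:
            send(j)
    return on_update

class Coordinator:
    def __init__(self, q):
        self.q = q
        self.samples = Counter()          # sampled occurrences per coordinate

    def receive(self, j):
        self.samples[j] += 1

    def estimate_norm_sq(self):
        # Crude plug-in estimate of |x|_2^2 = sum_j x_j^2: rescale each
        # sampled count by 1/q, then square and sum.
        return sum((c / self.q) ** 2 for c in self.samples.values())

# Example wiring: k sites sharing one coordinator, all sampling with the same q.
k, q = 10, 0.1
coord = Coordinator(q)
sites = [make_site(q, coord.receive) for _ in range(k)]
```

The next slides show the tension in picking q: a probability around 1/k keeps the communication at O(k) in the first example, but in the second example finding the heavy coordinate already at probability around 1/k² forces Ω(k²) messages from the light coordinates.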
Sample and Send
[Figure: coordinator C and sites P1, …, Pk, each holding a block of 1-valued coordinates; the two depicted example inputs are labeled |x|2 = k² and |x|2 = 2k².]
Send each update with probability at least 1/k
Communication = O(k), so okay
Suppose x has k⁴ coordinates that are 1, and may also have a unique coordinate of value k², occurring k times on each site
(both cases have |x|2² = Θ(k⁴), so the heavy coordinate changes the squared norm by a constant factor)
- Send each update with probability 1/k²
- Will find the large coordinate
- But communication is Ω(k²)
What Is Happening?
• Sampling with probability ≈ 1/k² is good for getting a few samples from the heavy item
• But all the light coordinates are in the way, making the communication Ω(k²)
• Suppose we put a barrier of k, that is, sample with probability ≈ 1/k² but only send an item if it has occurred at least k times on a site
• Now the communication is O(1) and we find the heavy coordinate
• But light coordinates also contribute to the overall |x|2 value
• Sample at different scales with different barriers
• Use the public coin to create O(log n) groups T1, …, Tlog n of the n input coordinates
• Tz contains n/2^z random coordinates
• Suppose Site i sees the update xi → xi + ej
• For each Tz containing j:
  – If xij > (θ/2^z)^(1/2)/k, then with probability (2^z/θ)^(1/2)·poly(ε⁻¹ log n), send (j, z) to the coordinator
Algorithm for Euclidean Norm
• Expected communication Õ(k)
• If a group of coordinates contributes to |x|2, there is a z for which a few coordinates in the group are sampled multiple times
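A minimal sketch of the site-side sampling rule two slides back, under assumptions that are not in the talk: a shared seed stands in for the public coin, membership in Tz is decided independently per coordinate with probability 2^(-z) (so |Tz| ≈ n/2^z), and send_scale lumps the poly(ε⁻¹ log n) factor into one illustrative constant; all names here are hypothetical.

```python
import math
import random
from collections import defaultdict

def groups_containing(j, n, seed=0):
    """Return the scales z whose group T_z contains coordinate j.
    T_z keeps each coordinate with probability 1/2^z, so |T_z| ~ n/2^z.
    A shared seed stands in for the public coin, so all sites agree."""
    zs = []
    for z in range(1, int(math.log2(n)) + 1):
        if random.Random(hash((seed, z, j))).random() < 2.0 ** (-z):
            zs.append(z)
    return zs

class Site:
    def __init__(self, n, theta, k, send, send_scale=1.0, seed=0):
        self.n, self.theta, self.k = n, theta, k
        self.send, self.send_scale, self.seed = send, send_scale, seed
        self.x = defaultdict(int)            # this site's local vector x_i

    def update(self, j):                     # process the update x_i <- x_i + e_j
        self.x[j] += 1
        for z in groups_containing(j, self.n, self.seed):
            barrier = math.sqrt(self.theta / 2 ** z) / self.k
            if self.x[j] > barrier:          # barrier for scale z
                p = min(1.0, math.sqrt(2 ** z / self.theta) * self.send_scale)
                if random.random() < p:      # sample at scale z
                    self.send((j, z))        # report (coordinate, scale)
```

The coordinator side is not sketched; as the slide above says, if a group of coordinates contributes to |x|2, there is a scale z at which a few of its coordinates are sampled multiple times, and the coordinator can detect this from the reported (j, z) pairs.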
Conclusions
• Improved communication lower and upper bounds for estimating |x|p
• Implies tight lower bounds for estimating entropy, heavy hitters, quantiles
• Implications for the data stream model
  – First lower bound for |x|0 without Gap-Hamming
  – Useful information cost lower bound for Gap-Hamming, or the protocol has very large communication
  – Improve the Ω(n^(1−2/p)/ε^(2/p)) bound for estimating |x|p in a stream to Ω(n^(1−2/p)/ε^(4/p))