Continuous Distributed Counting for Non-monotonic Streams

Milan Vojnović
Microsoft Research

Joint work with Zhenming Liu and Božidar Radunović

January 14, 2012



SUM Tracking Problem

Maintain an estimate of the sum

[Figure: sites 1, 2, 3, …, k receive the stream updates $X_1, X_2, X_3, X_4, X_5$, …; the quantity tracked is SUM.]


SUM Tracking: Applications

• Ex 1: database queries

SELECT SUM(AdBids) FROM Ads

• Ex 2: iterative solving

input data


Related Work

• Count tracking [Huang, Yi and Zhang, 2011]

– Worst-case input, monotonic sum

– Expected communication cost, for : and lower bound

• Lower bound for worst case input [Arackaparambil, Brody and Chakrabarti, 2009]

– Expected communication cost:


Questions

• Worst case complexity

Ex. +1, -1, +1, …

• Complexity under random input?

– Random permutation

– Random i.i.d.

– Fractional Brownian motion
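The +1, -1, +1, … example can be made concrete with a small sketch. It assumes a single site and a tracker that reports only when the coordinator's estimate leaves the relative tolerance, i.e. when |estimate - S| > ε|S|; the function name `forced_messages` and the choice ε = 0.1 are illustrative and do not come from the slides. On the alternating stream the sum flips between 1 and 0, so every update forces a message, while a monotonic stream of the same length needs far fewer.

```python
def forced_messages(updates, eps):
    """Count messages sent by a tracker that reports only when it must.

    The site knows the exact sum; the coordinator holds an estimate.
    A message is sent only when |estimate - sum| > eps * |sum|.
    """
    total = 0       # exact running sum at the site
    estimate = 0    # value last reported to the coordinator
    messages = 0
    for x in updates:
        total += x
        if abs(estimate - total) > eps * abs(total):
            estimate = total
            messages += 1
    return messages


n = 1000
alternating = [1 if t % 2 == 0 else -1 for t in range(n)]  # +1, -1, +1, ...
monotonic = [1] * n                                         # +1, +1, +1, ...

print(forced_messages(alternating, 0.1))  # one message per update
print(forced_messages(monotonic, 0.1))    # far fewer than n
```
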


Outline

• Upper bounds

• Lower bounds

• Applications


Our Tracker Algorithm

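• Sampling-based algorithm: upon arrival of update $X_i$, the site sends a message to the coordinator with a sampling probability

• If any site sends a message: sync all

[Figure: sites and coordinator. A site that samples update $X_i$ ($M_i = 1$) reports its local sum; the coordinator collects $S_1, \dots, S_k$, computes $S = S_1 + \dots + S_k$, and sends $S$ to the sites.]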
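A minimal simulation of the two steps above (sample on each update, then sync all sites) might look as follows. The sampling rule used here, p = min(1, c·k/(ε·S)²) with S the last synced sum, is an assumption chosen only so that every update is reported while |S| ≤ √k/ε; the slides' exact probability did not survive extraction, and the names `sampling_probability`, `track_sum` and the constant `c` are hypothetical. The sketch covers only the sampling and always-report behaviour, not the mode that uses monotonic counters.

```python
import random


def sampling_probability(s_hat, k, eps, c=1.0):
    """Assumed rule: p = min(1, c*k / (eps*s_hat)^2).

    This equals 1 whenever |s_hat| <= sqrt(c*k)/eps, i.e. "always report".
    """
    if s_hat == 0:
        return 1.0
    return min(1.0, c * k / (eps * s_hat) ** 2)


def track_sum(updates, k, eps, seed=0, c=1.0):
    """Simulate the tracker on (site, value) updates.

    Returns (number of syncs, last synced estimate of the sum).
    """
    rng = random.Random(seed)
    local = [0] * k   # exact local sums S_1, ..., S_k
    s_hat = 0         # last synced global sum, known to every site
    syncs = 0
    for site, x in updates:
        local[site] += x
        # The site samples using the last synced sum it knows about.
        if rng.random() < sampling_probability(s_hat, k, eps, c):
            s_hat = sum(local)   # sync all: S = S_1 + ... + S_k
            syncs += 1
    return syncs, s_hat


# Zero-drift i.i.d. +/-1 updates assigned to sites uniformly at random.
stream_rng = random.Random(1)
stream = [(stream_rng.randrange(4), stream_rng.choice((-1, 1)))
          for _ in range(100_000)]
print(track_sum(stream, k=4, eps=0.1))
```

Each sync costs on the order of k messages in a real deployment; the sketch counts sync events only.
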


Algorithm’s Modes

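[Figure: modes of the algorithm along the axis of the sum, with regions labelled "Always report", "Sample" (twice) and "Use two monotonic counters", and thresholds $-\sqrt{k}/\epsilon$, $\sqrt{k}/\epsilon$ and $1/(\mu^2 \epsilon)$.]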


Communication Cost Upper Bound: Single Site

• Input: i.i.d. Bernoulli

• Sampling probability with and

Expected communication cost:


Proof Key Idea

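[Figure: sample path of the sum after a message is sent at $S_t = s$, with boundaries at $s/(1+\epsilon)$ and $s/(1-\epsilon)$.]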
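One reading of the figure, offered as an illustration rather than as the argument from the talk: after a report at S_t = s, the estimate s stays valid as long as the sum remains between s/(1+ε) and s/(1-ε). For a zero-drift ±1 walk, the expected number of steps before it leaves a band that lies a steps below and b steps above its start is a·b (a standard random-walk fact), so the band is left after roughly (εs)² updates. The sketch below solves the exit-time recurrence exactly; the function names are hypothetical.

```python
import math
from fractions import Fraction


def expected_exit_time(a, b):
    """Expected steps for a symmetric +/-1 walk started at 0 to hit -a or b.

    Solves h(x) = 1 + (h(x-1) + h(x+1)) / 2 with h(-a) = h(b) = 0 by
    writing h(x) = const + slope * c, where c = h(-a + 1) is unknown.
    """
    coeffs = [(Fraction(0), Fraction(0)), (Fraction(0), Fraction(1))]
    for _ in range(a + b - 1):
        (c_prev, s_prev), (c_cur, s_cur) = coeffs[-2], coeffs[-1]
        # Rearranged recurrence: h(x+1) = 2*h(x) - h(x-1) - 2
        coeffs.append((2 * c_cur - c_prev - 2, 2 * s_cur - s_prev))
    const_b, slope_b = coeffs[-1]        # boundary condition h(b) = 0
    c = -const_b / slope_b
    const_0, slope_0 = coeffs[a]         # index a corresponds to x = 0
    return const_0 + slope_0 * c


def tolerance_distances(s, eps):
    """Steps down/up from s to the first integers outside [s/(1+eps), s/(1-eps)]."""
    lower = math.ceil(s / (1 + eps))
    upper = math.floor(s / (1 - eps))
    return s - (lower - 1), (upper + 1) - s


a, b = tolerance_distances(1000, 0.1)
print(a, b, expected_exit_time(a, b))    # compare with (eps * s)**2 = 10000
```
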


Communication Cost Upper Bound: Multiple Sites

• $k$ sites

• Updates i.i.d. Bernoulli

• large enough and

Expected communication cost:


Communication Cost Upper Bound: Unknown Drift Case

• Input: i.i.d. Bernoulli

• $\mu$: unknown drift parameter

Expected communication cost:


Communication Cost Upper Bound: Random Permutation Input

• Input: a random permutation of values

• sufficiently large and

Expected communication cost:


Communication Cost Upper Bound: Fractional Brownian Motion

• Input: a fractional Brownian motion with Hurst parameter

Expected communication cost:


Outline

• Upper bounds

• Lower bounds

• Applications


Lower Bounds: Single Site, Zero Drift

[Figure: axis of the sum with marks at $-1/\epsilon$ and $1/\epsilon$.]

• Input: i.i.d. Bernoulli

Expected communication cost:


Lower Bounds: Multiple Sites

• Input: i.i.d. Bernoulli or a random permutation

Expected communication cost:


Proof Key Ideas

• Under , maximum deviation

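[Figure: the time axis from 0 to $n$ split into blocks at $k, 2k, \dots, jk, (j+1)k, \dots$; within the block from $jk$ to $(j+1)k$, the updates $jk+1, jk+2, \dots$ go to sites $1, 2, \dots, k$.]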


Proof Key Ideas (cont’d)

• Query: or ?

• Answer: incorrect only if and the answer is under or under

• Lemma: messages are necessary to answer the query correctly with a constant positive probability

k-input problem: sites $1, 2, \dots, k$ hold inputs $X_1, X_2, \dots, X_k$, i.i.d.


Outline

• Upper bounds

• Lower bounds

• Applications


App 1: Tracking (cont’d)

• Input: random permutation of • ,

• Problem: track within a prescribed relative tolerance with high probability


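AMS Sketch

[Figure: array of counters $S_t^{i,j}$, with $j = 1, \dots, s_1$ and $i = 1, \dots, s_2$.]

$S_t^i = \frac{1}{s_1} \sum_j S_t^{i,j}$

$S_t = \mathrm{median}(S_t^1, \dots, S_t^{s_2})$

• , 4-wise independent hash function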
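The median-of-means structure above can be sketched with the textbook AMS estimator of the second frequency moment. This is a generic version under stated assumptions, not the talk's implementation: the class name `AMSSketch`, the use of a random cubic polynomial modulo the Mersenne prime 2^61 - 1 as the 4-wise independent hash, and the sign taken from the low bit (which carries a negligible bias) are all choices made here. Each internal counter is a sum of +1 and -1 contributions, which is the kind of non-monotonic sum the tracker is designed to follow.

```python
import random
from statistics import median

P = (1 << 61) - 1  # Mersenne prime used as the hash field size (assumption)


class AMSSketch:
    """Median of s2 means, each over s1 squared +/-1 counters."""

    def __init__(self, s1, s2, seed=0):
        rng = random.Random(seed)
        self.s1, self.s2 = s1, s2
        # One random cubic polynomial per counter gives 4-wise independence.
        self.coef = [[[rng.randrange(P) for _ in range(4)]
                      for _ in range(s1)] for _ in range(s2)]
        self.z = [[0] * s1 for _ in range(s2)]

    def _sign(self, i, j, item):
        a, b, c, d = self.coef[i][j]
        v = (((a * item + b) * item + c) * item + d) % P
        return 1 if v & 1 else -1

    def update(self, item, delta=1):
        """Add delta occurrences of an integer item (delta may be negative)."""
        for i in range(self.s2):
            for j in range(self.s1):
                self.z[i][j] += delta * self._sign(i, j, item)

    def estimate(self):
        """Median over rows of the mean of squared counters."""
        means = [sum(z * z for z in row) / self.s1 for row in self.z]
        return median(means)


sketch = AMSSketch(s1=16, s2=5, seed=42)
for item in [1, 2, 2, 3, 3, 3]:
    sketch.update(item)
print(sketch.estimate())  # estimates 1^2 + 2^2 + 3^2 = 14
```
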


App 1: tracking (cont’d)

• AMS: within with probability using and

• Sum tracking:

Expected total communication:


App 2: Bayesian Linear Regression

• Feature vector , output

• Prior , posterior

• Sum tracking:

• Under random permutation input, the expected communication cost =
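To show why the posterior reduces to sums that can be tracked, here is a sketch for the simplest case: a scalar weight w with model y = w·x + noise, a Gaussian prior, and known noise precision. The slide's own formulas were lost in extraction, so this uses the textbook conjugate update (posterior precision = prior precision + β·Σx², posterior mean = (prior precision · prior mean + β·Σxy) / posterior precision). The function names and default parameters are hypothetical. Note that Σxy can move in both directions, so it is a non-monotonic sum.

```python
def site_statistics(pairs):
    """Local sums held by one site: (sum of x*x, sum of x*y)."""
    sxx = sum(x * x for x, _ in pairs)
    sxy = sum(x * y for x, y in pairs)
    return sxx, sxy


def posterior(site_stats, prior_mean=0.0, prior_precision=1.0,
              noise_precision=1.0):
    """Combine per-site sums into the Gaussian posterior (mean, precision)."""
    sxx = sum(s[0] for s in site_stats)   # global sum of x^2
    sxy = sum(s[1] for s in site_stats)   # global sum of x*y (non-monotonic)
    precision = prior_precision + noise_precision * sxx
    mean = (prior_precision * prior_mean + noise_precision * sxy) / precision
    return mean, precision


# Three sites, each holding a few (x, y) observations.
sites = [
    site_statistics([(1.0, 2.1), (2.0, 3.9)]),
    site_statistics([(-1.0, -2.0)]),
    site_statistics([(0.5, 1.2), (1.5, 2.8)]),
]
print(posterior(sites))
```

In a distributed setting the coordinator would hold approximate rather than exact sums, so the resulting posterior would be approximate as well.
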


Summary

• We considered the sum tracking problem with non-monotonic distributed streams under random permutation, random i.i.d. and fractional Brownian motion inputs

• Derived a practical algorithm that has order-optimal communication complexity