heavy hitters continued tail bounds

33
Heavy Hitters Tail Bounds Anna Karlin continued

Upload: others

Post on 10-Jan-2022

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Heavy Hitters continued Tail Bounds

Heavy HittersTail Bounds

Anna Karlin

continued

Page 2: Heavy Hitters continued Tail Bounds

Problem● Input: sequence of ! elements "!, "", … , "# from a known

universe % (e.g., 8-byte integers).

● Goal: perform a computation on the input, in a single left to right pass where

○ Elements processed in real time

○ Can’t store the full data. => minimal storage requirement to maintain working “summary”

A AM

Page 3: Heavy Hitters continued Tail Bounds

Heavy Hitters: Keys that occur many times

Applications:● Determining popular products● Computing frequent search queries● Identifying heavy TCP

X Xa Xz X11

V ten fxt i times element x hasappeared in X Ha Xt

Goal Find ontput all elements with Fx ITThese etfs are heavyhitters

Page 4: Heavy Hitters continued Tail Bounds

Output has size 04

Provably impossible tosolve this

problem exactly withsublinearspace

Modifiedgoalsolve E HHproblem

If fi 7 X addedtoHH list

If Sone ett say y added

to list then wp I on

fig 32k En

Page 5: Heavy Hitters continued Tail Bounds

Count-min sketch● Maintain a short summary of the information that still

enables answering queries.

● Cousin of the Bloom filter

○ Bloom Filter solves the “membership problem”.

○ We want to extend it to solve a counting problem.

Page 6: Heavy Hitters continued Tail Bounds

Count-Min Sketch Modifiedgoalsolve E HHproblem

If FF ZTz X addedtoHH list

If Sone ett say y added

to list then wp s l oh

fxy 372 Eh

designerspecifies K E S b l

keep 2D arrayl hash tables each of size b

Page 7: Heavy Hitters continued Tail Bounds

initialize tables with all Os

when ett showsupUpdate x tf is jet increment tj hjCx

Count x return imine tj h xD

if Count x 3 add to HIT list

I 2 3 4 5 bhi1h22

he e

b 2

I x

1 32

I 4

Page 8: Heavy Hitters continued Tail Bounds

Assumptions

hash functions behave likerandommaps

h hee U 0,1 bigV xfy Pr hj x hj y s

hashhis h heare indep of

eachother

initialize tables with all Os

when ett showsupUpdate x tf is jet increment tj hjCx

Count x return jzetj h xDif Countlx 3 add to HH list

wed

Page 9: Heavy Hitters continued Tail Bounds

Fix time t Xi it had just arrivedtZ E tj hjGD

Page 10: Heavy Hitters continued Tail Bounds
Page 11: Heavy Hitters continued Tail Bounds

Count-Min Sketch● Elegant small space data structure.

● Space used is independent of n.

● Is implemented in several real systems.

○ AT&T used in network switches to analyze network traffic.

○ Google uses a version on top of Map Reduce parallel processinginfrastructure and in log analysis.

● Huge literature on sketching and streaming algorithms (algorithms like Distinct Elements, Heavy Hitters and many many other very cool algorithms).

Page 12: Heavy Hitters continued Tail Bounds

Hashfunctions

Page 13: Heavy Hitters continued Tail Bounds

6.1 Tail bounds

Most Slides by Joshua Fan and Alex Tsun

Page 14: Heavy Hitters continued Tail Bounds

Agenda● Markov’s Inequality● Chebyshev’s Inequality● The law of large numbers

Page 15: Heavy Hitters continued Tail Bounds

Markov’s Inequality (intuition)

a too

b 50

c 25

d no bound

Page 16: Heavy Hitters continued Tail Bounds

Markov’s Inequality (intuition)

Page 17: Heavy Hitters continued Tail Bounds

Markov’s Inequality (intuition)

a 100

b 50

c 25

d no bound

Page 18: Heavy Hitters continued Tail Bounds

Markov’s Inequality (intuition)

Ia 100

b 50

c 25

d no bound

Page 19: Heavy Hitters continued Tail Bounds

Markov’s Inequality

Page 20: Heavy Hitters continued Tail Bounds

Markov’s Inequality

Page 21: Heavy Hitters continued Tail Bounds
Page 22: Heavy Hitters continued Tail Bounds

Markov’s Inequality (Proof)

Page 23: Heavy Hitters continued Tail Bounds

Chebyshev’s Inequality

Page 24: Heavy Hitters continued Tail Bounds

Chebyshev’s Inequality

Page 25: Heavy Hitters continued Tail Bounds

Chebyshev’s Inequality (picture for Gaussian)

Page 26: Heavy Hitters continued Tail Bounds

Chebyshev’s Inequality (picture for Gaussian)

Page 27: Heavy Hitters continued Tail Bounds

Chebyshev’s Inequality (Proof)

Page 28: Heavy Hitters continued Tail Bounds

Chebyshev’s Inequality (Proof)

Page 29: Heavy Hitters continued Tail Bounds

Chebyshev’s Inequality (Proof)

Page 30: Heavy Hitters continued Tail Bounds

The Law of Large Numbers

Page 31: Heavy Hitters continued Tail Bounds

Proof of the WLLN

Page 32: Heavy Hitters continued Tail Bounds

Proof of the WLLN

Page 33: Heavy Hitters continued Tail Bounds

END PIC

Alex TsunJoshua Fan