One algorithm to rule them all: One join query at a time
Atri Rudra, University at Buffalo
A brief history of this talk
L2/L2 for-each sparse recovery/compressed sensing
http://www-stat.stanford.edu/~candes/stats330/index.shtml
The key technical problem
Given the three shadows, what is the largest size of the original set of points?
The key technical problem
Highly trivial: $4^3 = 64$. Still trivial: $4^2 = 16$. Correct answer: $4^{1.5} = 8$.
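A quick sanity check of the "correct answer": the slide's picture did not survive, so the point set below is our own choice of a configuration whose three shadows each have 4 points. The 2 x 2 x 2 grid has 8 points, exactly the Loomis-Whitney bound $(4 \cdot 4 \cdot 4)^{1/2} = 8$.

```python
from itertools import product

# Our example point set: the 2 x 2 x 2 grid {0, 1}^3 over attributes (A, B, C).
points = set(product(range(2), repeat=3))

# The three shadows: projections onto (A, B), (B, C) and (C, A).
shadow_ab = {(a, b) for a, b, c in points}
shadow_bc = {(b, c) for a, b, c in points}
shadow_ca = {(c, a) for a, b, c in points}

sizes = [len(shadow_ab), len(shadow_bc), len(shadow_ca)]
# Loomis-Whitney: |points| <= (|ab| * |bc| * |ca|)^(1/2)
lw_bound = (sizes[0] * sizes[1] * sizes[2]) ** 0.5
print(sizes, len(points), lw_bound)  # [4, 4, 4] 8 8.0
```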
The key technical problem
[Figure: a point set over attributes A, B, C and its three shadows R, S, T, with |R| = |S| = |T| = k]
Loomis-Whitney: the point set has at most $k^{3/2}$ points.
Algorithmic Loomis-Whitney?
An equivalent view
[Figure: triangle with vertices A, B, C and edges R, S, T]
Output all (a, b, c) such that (a, b) is in R, (b, c) is in S and (c, a) is in T.
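For contrast with what follows, here is a minimal sketch of the textbook way to evaluate this query with pairwise joins: join R and S on B, then filter with T. This plan is our illustration, not the talk's algorithm; its intermediate result R ⋈ S can be as large as |R|·|S| (the "still trivial" bound) even when few triangles exist.

```python
from collections import defaultdict

def triangle_join_pairwise(R, S, T):
    """List all (a, b, c) with (a, b) in R, (b, c) in S and (c, a) in T.

    Classic two-step plan: join R and S on attribute B, then filter with T.
    """
    c_of_b = defaultdict(list)  # index S by its B value
    for b, c in S:
        c_of_b[b].append(c)
    T_set = set(T)
    out = []
    for a, b in R:  # R join S on B: may produce many intermediate tuples
        for c in c_of_b[b]:
            if (c, a) in T_set:  # keep only the tuples that close a triangle
                out.append((a, b, c))
    return out
```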
Overview of the talk
[Figure: triangle query on A, B, C with relations R, S, T]
The take-away message
Join algo
http://welovetumblr.blogspot.com/2012/07/thor-is.html
Overview of the talk
[Figure: triangle query on A, B, C with relations R, S, T]
(Database) Joins
Codd
Attributes/Nodes: [n]
Relations/Hyperedges: $e_1, \dots, e_m \subseteq [n]$
[Figure: example hypergraph on nodes 1 to 5]
Tables/Projections: $R_1, \dots, R_m$
Output all $a = (a_1, \dots, a_n)$ such that $a$ projected down to $e_i$ is in $R_i$ for every $i \in [m]$.
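The definition above can be pinned down by a brute-force reference implementation. This is only a sketch of the semantics (it enumerates every combination of attribute values and is exponential in n), not the talk's join algorithm; attributes are numbered 0..n-1 here instead of 1..n, and the function name is ours.

```python
from itertools import product

def natural_join_bruteforce(n, edges, tables):
    """Reference semantics of a join query on attributes 0..n-1.

    edges[i]  -- tuple of attribute indices (the hyperedge e_i)
    tables[i] -- set of tuples R_i over those attributes, in the same order
    Returns every a = (a_0, ..., a_{n-1}) whose projection onto e_i is in R_i for all i.
    """
    # Candidate values for each attribute: everything that occurs in some table.
    domain = [set() for _ in range(n)]
    for e, R in zip(edges, tables):
        for t in R:
            for attr, v in zip(e, t):
                domain[attr].add(v)
    out = []
    for a in product(*(sorted(d) for d in domain)):
        if all(tuple(a[attr] for attr in e) in R for e, R in zip(edges, tables)):
            out.append(a)
    return out
```

For the triangle query, take n = 3 (A, B, C as 0, 1, 2) and edges (0, 1), (1, 2), (2, 0) for R, S, T.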
The triangle join query
[Figure: triangle with vertices A, B, C and edges R, S, T]
Output all (a, b, c) such that (a, b) is in R, (b, c) is in S and (c, a) is in T.
Bounding the output size
Atserias Grohe Marx
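[Figure: triangle on A, B, C with edges R, S, T, weighted first ½, ½, ½ and then x, y, z]
Highly trivial bound: $|R| \cdot |S| \cdot |T|$
Still trivial bound: $|R| \cdot |S|$
Loomis-Whitney bound: $|R|^{1/2} |S|^{1/2} |T|^{1/2}$ (weight ½ on every edge)
AGM bound: $|R|^x |S|^y |T|^z$ for any $x, y, z \ge 0$ with $x + z \ge 1$ (covers A), $x + y \ge 1$ (covers B) and $y + z \ge 1$ (covers C)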
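The AGM bound is the best choice of (x, y, z). A small sketch for the triangle, assuming all three sizes are at least 1: in log space this is a linear program with non-negative costs, so its minimum sits at a vertex of the feasible region, and that region has exactly four vertices. The function name is ours.

```python
def agm_triangle(r, s, t):
    """AGM bound for the triangle query, given |R|, |S|, |T| >= 1.

    Minimizes r^x s^y t^z over x + z >= 1, x + y >= 1, y + z >= 1, x, y, z >= 0.
    Taking logs makes this a linear program with non-negative costs, so the
    optimum is at one of the four vertices of the feasible region.
    """
    vertices = [(0.5, 0.5, 0.5), (1, 1, 0), (1, 0, 1), (0, 1, 1)]
    return min(r ** x * s ** y * t ** z for x, y, z in vertices)
```

With equal sizes the Loomis-Whitney vertex wins (e.g. `agm_triangle(100, 100, 100)` is about 1000, far below the trivial 10000); when one relation is tiny, a 0/1 cover such as $|R| \cdot |S|$ can match or beat it.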
Loomis Whitney
?
Algorithmic Loomis-Whitney
Loomis-Whitney bound: $|R|^{1/2} |S|^{1/2} |T|^{1/2}$
[Figure: triangle on A, B, C with a fixed value c in C and its neighbors via S and T]
Goal: count the number of triangles.
There are $|R|$ choices for edges in R.
There are $d_S(c) \cdot d_T(c)$ choices for pairs of neighbors of c, where $d_S(c)$ and $d_T(c)$ are the degrees of c in S and in T.
http://agilitrix.com/2011/03/red-pill-blue-pill/
Algorithmic Loomis-Whitney
Make this choice (the cheaper of the two options) for every c in C.
Run time of algo $= \sum_c \min(|R|,\ d_S(c)\, d_T(c))$
[Figure: triangle on A, B, C with a fixed value c]
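A minimal Python sketch of the per-c choice described above, assuming the relations fit in memory as hash sets (the function name and the hash-set representation are ours). For each c it either scans R or enumerates the pairs of neighbors of c, whichever is cheaper, so the work is $\sum_c \min(|R|, d_S(c)\, d_T(c))$ up to hashing costs.

```python
from collections import defaultdict

def triangles_loomis_whitney(R, S, T):
    """List all (a, b, c) with (a, b) in R, (b, c) in S and (c, a) in T."""
    R_set, S_set, T_set = set(R), set(S), set(T)
    b_of_c = defaultdict(list)  # neighbors of c via S: all b with (b, c) in S
    for b, c in S_set:
        b_of_c[c].append(b)
    a_of_c = defaultdict(list)  # neighbors of c via T: all a with (c, a) in T
    for c, a in T_set:
        a_of_c[c].append(a)
    out = []
    for c in b_of_c.keys() & a_of_c.keys():  # c needs neighbors in both S and T
        d_S, d_T = len(b_of_c[c]), len(a_of_c[c])
        if len(R_set) <= d_S * d_T:
            # Option 1: scan every edge of R and test it against S and T.
            for a, b in R_set:
                if (b, c) in S_set and (c, a) in T_set:
                    out.append((a, b, c))
        else:
            # Option 2: try every pair of neighbors (b, a) of c, test it against R.
            for b in b_of_c[c]:
                for a in a_of_c[c]:
                    if (a, b) in R_set:
                        out.append((a, b, c))
    return out
```

On the complete graph on 4 nodes (R = S = T = all ordered pairs of distinct nodes) every c takes option 2, since $d_S(c)\, d_T(c) = 9 < 12 = |R|$, and the sketch returns the 24 ordered triangles.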
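Analyzing the algorithm
Loomis-Whitney bound: $|R|^{1/2} |S|^{1/2} |T|^{1/2}$
$$
\begin{aligned}
\sum_c \min(|R|,\ d_S(c)\, d_T(c)) &\le \sum_c \big(|R|\, d_S(c)\, d_T(c)\big)^{1/2} \\
&= |R|^{1/2} \sum_c d_S(c)^{1/2}\, d_T(c)^{1/2} \\
&\le |R|^{1/2} \Big(\sum_c d_S(c)\Big)^{1/2} \Big(\sum_c d_T(c)\Big)^{1/2} \\
&= |R|^{1/2} |S|^{1/2} |T|^{1/2}
\end{aligned}
$$
The first step uses $\min(E, F) \le (EF)^{1/2}$, the third step is Cauchy-Schwarz, and the last step uses $\sum_c d_S(c) = |S|$ and $\sum_c d_T(c) = |T|$.
[Figure: triangle on A, B, C with a fixed value c]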
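To make the derivation concrete, the sketch below evaluates each line of it on a random instance (the instance and function name are hypothetical). The four numbers never decrease, and the last two agree up to floating-point rounding because the degrees of c in S sum to |S| and those in T sum to |T|.

```python
import math
import random
from collections import Counter

def lw_derivation(R, S, T):
    """Return the successive lines of the Loomis-Whitney derivation as numbers.

    R, S, T are sets of pairs over (A, B), (B, C) and (C, A) respectively.
    """
    d_S = Counter(c for _, c in S)  # d_S(c): number of b with (b, c) in S
    d_T = Counter(c for c, _ in T)  # d_T(c): number of a with (c, a) in T
    cs = d_S.keys() | d_T.keys()    # every c with a non-zero degree
    r = len(R)
    cost = sum(min(r, d_S[c] * d_T[c]) for c in cs)
    # min(E, F) <= (E F)^(1/2), applied term by term
    geo = math.sqrt(r) * sum(math.sqrt(d_S[c] * d_T[c]) for c in cs)
    # Cauchy-Schwarz on the vectors (d_S(c)^(1/2))_c and (d_T(c)^(1/2))_c
    cs_bound = (math.sqrt(r) * math.sqrt(sum(d_S[c] for c in cs))
                * math.sqrt(sum(d_T[c] for c in cs)))
    lw = math.sqrt(r * len(S) * len(T))
    return cost, geo, cs_bound, lw

# Hypothetical random instance with 30 values per attribute.
rng = random.Random(7)
R = {(rng.randrange(30), rng.randrange(30)) for _ in range(200)}
S = {(rng.randrange(30), rng.randrange(30)) for _ in range(200)}
T = {(rng.randrange(30), rng.randrange(30)) for _ in range(200)}
cost, geo, cs_bound, lw = lw_derivation(R, S, T)
print(cost, geo, cs_bound, lw)  # non-decreasing; last two equal up to rounding
```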
Atserias Grohe Marx
?
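Same algorithm!
AGM bound: $|R|^x |S|^y |T|^z$ for $x, y, z \ge 0$ with $x + z \ge 1$, $x + y \ge 1$ and $y + z \ge 1$
$$
\begin{aligned}
\sum_c \min(|R|,\ d_S(c)\, d_T(c)) &\le \sum_c |R|^x \big(d_S(c)\, d_T(c)\big)^{1-x} \\
&\le |R|^x \sum_c d_S(c)^y\, d_T(c)^z \\
&\le |R|^x \Big(\sum_c d_S(c)\Big)^y \Big(\sum_c d_T(c)\Big)^z \\
&= |R|^x |S|^y |T|^z
\end{aligned}
$$
The first step uses $\min(E, F) \le E^x F^{1-x}$ for $0 \le x \le 1$. The second uses $y \ge 1 - x$ and $z \ge 1 - x$, which follow from the cover constraints; only c with $d_S(c), d_T(c) \ge 1$ contribute. The third step is Hölder's inequality, using $y + z \ge 1$.
[Figure: triangle on A, B, C with a fixed value c]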
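A small numeric check that the same running time respects the AGM product for each of the four vertex covers of the triangle (the skewed random instance and the function names are ours, for illustration only).

```python
import random
from collections import Counter

def algorithm_cost(R, S, T):
    """sum over c of min(|R|, d_S(c) * d_T(c)), the running time in the derivation."""
    d_S = Counter(c for _, c in S)
    d_T = Counter(c for c, _ in T)
    return sum(min(len(R), d_S[c] * d_T[c]) for c in d_S.keys() & d_T.keys())

def agm_product(R, S, T, x, y, z):
    """|R|^x |S|^y |T|^z for weights (x, y, z) forming a fractional edge cover."""
    if min(x, y, z) < 0 or x + z < 1 or x + y < 1 or y + z < 1:
        raise ValueError("(x, y, z) is not a fractional edge cover of the triangle")
    return len(R) ** x * len(S) ** y * len(T) ** z

# Hypothetical skewed instance: R is small, S and T are larger.
rng = random.Random(3)
R = {(rng.randrange(10), rng.randrange(10)) for _ in range(15)}
S = {(rng.randrange(10), rng.randrange(40)) for _ in range(300)}
T = {(rng.randrange(40), rng.randrange(10)) for _ in range(300)}
cost = algorithm_cost(R, S, T)
for cover in [(0.5, 0.5, 0.5), (1, 1, 0), (1, 0, 1), (0, 1, 1)]:
    print(cover, cost, agm_product(R, S, T, *cover))
```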
General Join Result
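Attributes/Nodes: [n]
Relations/Hyperedges: $e_1, \dots, e_m \subseteq [n]$
Tables/Projections: $R_1, \dots, R_m$
[Figure: example hypergraph on nodes 1 to 5 with edge weights $x_1, \dots, x_4$]
Let $x_1, \dots, x_m \ge 0$ be a fractional edge cover: for every node, the weights of the hyperedges containing it sum to at least 1.
AGM bound: $|R_1|^{x_1} \cdots |R_m|^{x_m}$
Our result: $O(\text{AGM} + \text{input size})$ time
Provably worst-case optimal join algorithm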
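The general statement can be checked mechanically. A minimal sketch, with nodes numbered 1..n as on the slide and hyperedges given as Python sets; both function names are hypothetical helpers, not part of the talk.

```python
import math

def is_fractional_edge_cover(n, edges, x):
    """True iff x >= 0 and every node v in 1..n has total incident weight >= 1."""
    if any(w < 0 for w in x):
        return False
    return all(sum(w for e, w in zip(edges, x) if v in e) >= 1
               for v in range(1, n + 1))

def agm_bound(sizes, x):
    """AGM bound |R_1|^{x_1} * ... * |R_m|^{x_m} for a fractional edge cover x."""
    return math.prod(s ** w for s, w in zip(sizes, x))
```

For the triangle, `edges = [{1, 2}, {2, 3}, {3, 1}]` with `x = [0.5, 0.5, 0.5]` is a cover and recovers the Loomis-Whitney bound, while `x = [1, 0, 0]` leaves node 3 uncovered.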
List recovery
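[Figure: input lists $S_1, S_2, S_3, \dots, S_n$, each $S_i \subseteq [q]$, and a codeword $(c_1, c_2, c_3, \dots, c_n)$ with $c_i \in S_i$]
Code $C \subseteq [q]^n$
Goal: output all codewords $c \in C$ with $c_i \in S_i$ for every $i$.
Applications in expanders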
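The list-recovery task, stated as code: a brute-force sketch that scans an explicitly listed code, so it only makes sense for tiny codes. The name `list_recover` is ours.

```python
def list_recover(code, lists):
    """Return every codeword c in `code` with c[i] in lists[i] for every position i."""
    return [c for c in code if all(ci in Si for ci, Si in zip(c, lists))]

# Example: the length-3 repetition code over [5] = {0, ..., 4}.
rep = [(v, v, v) for v in range(5)]
print(list_recover(rep, [{0, 1}, {1, 2}, {1, 3}]))  # [(1, 1, 1)]
```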
An alternate view of joins
[Figure: triangle query on A, B, C with relations R, S, T]
Message in $[q]^3$: $(a, b, c)$
Codeword in $[q^2]^3$: $((a, b), (b, c), (c, a))$
Input lists: R, S, T
Constant dimension, constant block length
Large alphabet size, large input list size
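The correspondence can be spelled out in a few lines. In this sketch a symbol of $[q^2]$ is represented as a pair in $[q] \times [q]$, and list recovery is done by brute force over all $q^3$ messages (names are ours). The messages it returns are exactly the outputs of the triangle join on R, S, T.

```python
from itertools import product

def encode(msg):
    """Map a message (a, b, c) in [q]^3 to the codeword ((a, b), (b, c), (c, a))."""
    a, b, c = msg
    return ((a, b), (b, c), (c, a))

def list_recover_triangle(q, R, S, T):
    """Brute-force list recovery of this code with input lists R, S, T."""
    lists = (set(R), set(S), set(T))
    return [m for m in product(range(q), repeat=3)
            if all(sym in L for sym, L in zip(encode(m), lists))]

print(list_recover_triangle(4, {(0, 1), (1, 2)}, {(1, 2), (2, 3)}, {(2, 0), (3, 1)}))
# [(0, 1, 2), (1, 2, 3)]
```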
Overview of the talk
[Figure: triangle query on A, B, C with relations R, S, T]
Sparse Recovery/Compressed Sensing
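[Figure: an unknown signal times a matrix (to be designed) gives the observed measurements; decode them to get the output. With k = 2, the two largest entries are the heavy hitters and the rest is the tail.]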
Quantifying the approximation
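$\|x - \hat{x}\|_2 \le C \cdot \|x - x_k\|_2$, where $x$ is the unknown signal, $\hat{x}$ is the output, and $x_k$ keeps only the k heavy hitters of $x$ (so $x - x_k$ is the tail).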
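The L2/L2 guarantee as a checkable predicate. A minimal sketch in plain Python lists; `best_k_term`, `l2` and `satisfies_l2_l2` are our names, and the constant C is a free parameter.

```python
import math

def best_k_term(x, k):
    """Keep the k largest-magnitude entries of x (the heavy hitters), zero the rest."""
    keep = set(sorted(range(len(x)), key=lambda i: abs(x[i]), reverse=True)[:k])
    return [v if i in keep else 0.0 for i, v in enumerate(x)]

def l2(v):
    return math.sqrt(sum(t * t for t in v))

def satisfies_l2_l2(x, x_hat, k, C):
    """Check ||x - x_hat||_2 <= C * ||x - x_k||_2."""
    tail = l2([a - b for a, b in zip(x, best_k_term(x, k))])
    err = l2([a - b for a, b in zip(x, x_hat)])
    return err <= C * tail
```

Outputting exactly the heavy hitters $x_k$ always meets the guarantee for any $C \ge 1$; outputting all zeros fails it whenever the heavy hitters dominate the tail.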
(Most of) rest of the talk
Designing the matrix
[Figure: unknown signal (k = 2), matrix to be designed, observed measurements, decode, output]
Designing the matrix (k = 2)
[Figure: a k-expander, i.e. a bipartite graph with N left nodes (coordinates) and m right nodes (measurements)]
Measurement = [heavy hitter] + noise
Heavy tail noise: < ¼ (neighborhood)
> ½ of the neighbors of [a heavy hitter] have the "correct" value
Count-Sketch style algo (k = 2)
[Figure: bipartite graph with N coordinates and m measurements]
Estimate = median of O(log N) values
Output the top O(k) estimates
O(N log N) decoding
Indyk Ružić
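A toy version of the median-estimate decoder above. It is only a sketch under stated assumptions: the expander is replaced by plain random hashing (each of `rows` rows sends every coordinate to one of `width` buckets, with no random signs), and "top O(k)" is taken to mean the top 2k. Decoding touches every one of the N coordinates and takes a median over `rows` values, which matches the O(N log N) cost when rows = O(log N). All names are hypothetical.

```python
import random
import statistics

def build_hashes(n, rows, width, seed=0):
    """Stand-in for the expander: in each row, hash every coordinate to a random bucket."""
    rng = random.Random(seed)
    return [[rng.randrange(width) for _ in range(n)] for _ in range(rows)]

def measure(x, hashes, width):
    """Each measurement (bucket) is the sum of the coordinates hashed into it."""
    y = []
    for h in hashes:
        buckets = [0.0] * width
        for j, v in enumerate(x):
            buckets[h[j]] += v
        y.append(buckets)
    return y

def decode(y, hashes, k):
    """Estimate every coordinate as the median of its buckets; keep the top 2k."""
    n = len(hashes[0])
    est = [statistics.median([y[r][h[j]] for r, h in enumerate(hashes)])
           for j in range(n)]
    top = sorted(range(n), key=lambda j: abs(est[j]), reverse=True)[:2 * k]
    return {j: est[j] for j in top}, est

# Hypothetical example: N = 200, two heavy hitters plus a small tail.
x = [0.01 * ((7 * j) % 5) for j in range(200)]
x[17], x[150] = 5.0, -4.0
hashes = build_hashes(len(x), rows=7, width=40, seed=1)
top, est = decode(measure(x, hashes, 40), hashes, k=2)
print(sorted(top))
```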
We need a faster algorithm…
Towards a sub-linear time algo
[Figure: a small candidate set S]
Estimate = median value
Output the top O(k) estimates in S
O(|S| log N) decoding
All we need to do is to compute a small S quickly
Porat-Strauss Idea: Recursion!
[Figure: [N] viewed as $\{0,1\}^{\log N}$, split into two sub-problems over $[\sqrt{N}]$]
Solve each sub-problem in ~ $\sqrt{N}$ time
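One way to read the recursion (an assumption, since the figure is lost): write an index in [N] as log N bits, split them into a high half and a low half, each indexing $[\sqrt{N}]$, and let each sub-problem report candidate halves. The candidate set S is then every combination, so k candidates per half can give up to $k^2$ elements of S. The helper names below are hypothetical, and the sub-problem solvers themselves are not shown.

```python
def split_index(i, log_n):
    """Split i in [N], N = 2^log_n (log_n even), into (high half, low half) of its bits."""
    half = log_n // 2
    return i >> half, i & ((1 << half) - 1)

def join_index(hi, lo, log_n):
    """Inverse of split_index."""
    return (hi << (log_n // 2)) | lo

def candidates(hi_list, lo_list, log_n):
    """Candidate set S: every index whose two halves were both reported."""
    return [join_index(h, l, log_n) for h in hi_list for l in lo_list]
```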
The problem we now need to solve
Elements of S, geometrically…
[Figure: the candidates form a k × k grid]
Output size ~ $k^2$
Overall running time ~ $\sqrt{N} + k^2$
Not sub-linear for $k > \sqrt{N}$
Use a table look-up to decrease the run time
Finally…
Slightly different recursion
[Figure: [N] viewed as $\log N$ bits, split into three sub-problems over $[N^{2/3}]$ each]
Geometric problem to solve
Overall runtime: $k^{3/2} + N^{2/3}$
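A sketch of how three sub-problems over $[N^{2/3}]$ can lead back to the triangle join. This reading is an assumption (the figure did not survive): cut the $\log N = 3t$ bits of an index into blocks a, b, c of t bits each, and let the sub-problems work on the block pairs (a, b), (b, c) and (c, a), each over $2^{2t} = N^{2/3}$ values. An index survives only if all three of its block pairs were reported, which is exactly the triangle join of the three answer lists, so Loomis-Whitney caps the candidates at $k^{3/2}$ when each list has k pairs. For brevity the combining step below is a plain hash join rather than the Loomis-Whitney algorithm; names are hypothetical.

```python
def split3(i, t):
    """Split an index i in [N], N = 2^(3t), into three t-bit blocks (a, b, c)."""
    mask = (1 << t) - 1
    return (i >> (2 * t)) & mask, (i >> t) & mask, i & mask

def combine3(R, S, T, t):
    """Rebuild full indices from the block pairs reported for (a, b), (b, c), (c, a).

    R, S, T: sets of pairs of t-bit blocks. Returns the sorted candidate indices,
    i.e. the triangle join of R, S, T, with each triple packed back into one index.
    """
    c_of_b = {}
    for b, c in S:
        c_of_b.setdefault(b, []).append(c)
    out = []
    for a, b in R:  # simple hash join for brevity
        for c in c_of_b.get(b, []):
            if (c, a) in T:
                out.append((a << (2 * t)) | (b << t) | c)
    return sorted(out)
```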
Our Results
L2/L2 sparse recovery with failure probability p
Optimal $k \log(N/k)$ measurements*
$k^{1+\epsilon}\, \mathrm{poly}\log N$ decoding + space
$p \sim (N/k)^{-k/\mathrm{poly}\log k}$
Also prove a tight lower bound of $k \log(N/k) + \log(1/p)$
Only two problems so far…
[Figure: triangle query on A, B, C with relations R, S, T]
Albert Meyer (via Dick Lipton)
"Prove it for n=3 and then let 3 go to infinity"
The 3rd problem…
Big (hyper)graph G
http://pigeonsandplanes.com/2010/12/thoughts-on-net-neutrality.html
[Figure: example hypergraph on nodes 1 to 5]
Small (hyper)graph H
Compute all copies of H in G
Our join algorithm gives a worst-case optimal algorithm for any constant-sized H
Joins model many more problems, e.g. CSPs
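The reduction for the simplest pattern, H = triangle: store each undirected edge of G in both orientations, set R = S = T = E, and evaluate the triangle join. A minimal sketch (the function name is ours); the join is done with a simple nested loop for brevity, since the reduction rather than the evaluation strategy is the point here. Each triangle appears 6 times in the join, once per ordering, so only a < b < c is kept.

```python
def list_triangles(edges):
    """List the triangles of an undirected graph via the join with R = S = T = E."""
    E = set()
    for u, v in edges:
        E.add((u, v))
        E.add((v, u))  # store both orientations so E is symmetric
    nbrs = {}
    for u, v in E:
        nbrs.setdefault(u, set()).add(v)
    out = []
    for a, b in E:             # (a, b) in R = E
        for c in nbrs[b]:      # (b, c) in S = E
            # (c, a) in T = E; keep one ordering per triangle
            if (c, a) in E and a < b < c:
                out.append((a, b, c))
    return sorted(out)
```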
The take-away message
Join algo
http://welovetumblr.blogspot.com/2012/07/thor-is.html