beyond set disjointness : the communication complexity of finding the intersection

25
Beyond Set Disjointness: The Communication Complexity of Finding the Intersection Grigory Yaroslavtsev http://grigory.us Joint with Brody, Chakrabarti, Kondapally and Woodruff

Upload: garron

Post on 21-Feb-2016

32 views

Category:

Documents


0 download

DESCRIPTION

Beyond Set Disjointness : The Communication Complexity of Finding the Intersection. Grigory Yaroslavtsev http://grigory.us. Joint with Brody, Chakrabarti , Kondapally and Woodruff. Communication Complexity [Yaoโ€™79]. Shared randomness. Bob: . Alice: . โ€ฆ. - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: Beyond Set  Disjointness :  The Communication Complexity of Finding the Intersection

Beyond Set Disjointness: The Communication Complexity of Finding the

Intersection

Grigory Yaroslavtsevhttp://grigory.us

Joint with Brody, Chakrabarti, Kondapally and Woodruff

Page 2: Beyond Set  Disjointness :  The Communication Complexity of Finding the Intersection

Communication Complexity [Yaoโ€™79]

Alice: Bob:

๐’‡ (๐’™ ,๐’š )=?

Shared randomness

โ€ฆ๐’‡ (๐’™ ,๐’š )

โ€ข = min. communication (error ) โ€ข min. -round communication (error )

Page 3: Beyond Set  Disjointness :  The Communication Complexity of Finding the Intersection

Set Intersection

๐’™=๐‘† , ๐’š=๐‘‡ , ๐’‡ (๐’™ , ๐’š )=๐‘†โˆฉ๐‘‡๐‘บโŠ† [๐‘› ] ,|๐‘†|โ‰ค๐’Œ ๐‘ป โŠ† [๐‘› ] ,|๐‘‡|โ‰ค๐’Œ = ?

(-Intersection) = ?

Page 4: Beyond Set  Disjointness :  The Communication Complexity of Finding the Intersection

This talk

Let

โ€ข (-Intersection) = [Brody, Chakrabarti, Kondapally, Woodruff, Y.; PODCโ€™14]โ€ข (-Intersection) = [Saglam-Tardos FOCSโ€™13; Brody, Chakrabarti, Kondapally, Woodruff, Y.โ€™13]

{

times

(-Intersection) = for

Page 5: Beyond Set  Disjointness :  The Communication Complexity of Finding the Intersection

-Disjointnessโ€ข , iff โ€ข [Razborovโ€™92; Hastad-Wigdersonโ€™96] โ€ข [Folklore + Dasgupta, Kumar, Sivakumar; Buhrmanโ€™12, Garcia-Soriano, Matsliah, De Wolfโ€™12]

โ€ข [Saglam, Tardosโ€™13]โ€ข [Braverman, Garg, Pankratov, Weinsteinโ€™13]

Page 6: Beyond Set  Disjointness :  The Communication Complexity of Finding the Intersection

Applications

โ€ข : exact Jaccard index ( for -approximate use MinHash [Broderโ€™98; Li-Konigโ€™11; Path-Strokel-Woodruffโ€™14])โ€ข Rarity, distinct elements, joins,โ€ฆโ€ข Multi-party set intersection (later)โ€ข Contrast:

Page 7: Beyond Set  Disjointness :  The Communication Complexity of Finding the Intersection

1-round -protocol

๐’‰ : [๐’ ]โ†’[๐’Œ3]

๐‘บ ๐‘ป

๐’‰(๐‘บ) ๐’‰(๐‘ป )

[๐’ ] [๐’ ]

[๐’Œ3] [๐’Œ3]

Page 8: Beyond Set  Disjointness :  The Communication Complexity of Finding the Intersection

Hashing

log ๐’Œ

=# of buckets

๐’‰ : [๐’ ]โ†’[๐’Œ / log๐’Œ]

Expected # of elements

Page 9: Beyond Set  Disjointness :  The Communication Complexity of Finding the Intersection

Secondary Hashing

= # of hash functions

log 3๐’Œ where

Page 10: Beyond Set  Disjointness :  The Communication Complexity of Finding the Intersection

2-Round -protocol

log 3๐’Œ

log 3๐’Œ

|h๐‘– (๐‘บ )|=|h๐‘– (๐‘ป )|=๐‘‚ ( log๐’Œ log log๐’Œ )

Total communication = = O()

Page 11: Beyond Set  Disjointness :  The Communication Complexity of Finding the Intersection

Collisions

๐’Œlog๐’Œ

log 3๐’ŒPr [๐‘๐‘œ๐‘™๐‘™๐‘–๐‘ ๐‘–๐‘œ๐‘› ]=๐‘‚(1log๐’Œ )

Page 12: Beyond Set  Disjointness :  The Communication Complexity of Finding the Intersection

Collisions

log 3๐’Œ

log 3๐’Œ

Page 13: Beyond Set  Disjointness :  The Communication Complexity of Finding the Intersection

Collisions

โ€ข Second round: โ€“ For each bucket send -bit equality check (total -

communication)โ€“ Correct intersection computed in buckets where

โ€“ Expected # of items in incorrect buckets โ€“ Use 1-round protocol for incorrect bucketsโ€“ Total communication

Page 14: Beyond Set  Disjointness :  The Communication Complexity of Finding the Intersection

Main protocol

๐‘‚ (1)

=# of buckets

๐’‰ : [๐’ ]โ†’[๐’Œ]

Expected # of elements

Page 15: Beyond Set  Disjointness :  The Communication Complexity of Finding the Intersection

Verification tree -degree

โ€ฆlog ๐‘Ÿโˆ’ 1๐’Œ

buckets = leaves of the verification tree

Page 16: Beyond Set  Disjointness :  The Communication Complexity of Finding the Intersection

Verification bottom-up

๐‘บ๐Ÿโ‘ ,๐“๐Ÿ

โ‘ ๐‘บ๐Ÿโ‘ ,๐“๐Ÿ

โ‘

๐‘บ๐Ÿโ‘โˆช๐‘บ๐Ÿ ,๐“๐Ÿ

โ‘โˆช๐‘ป ๐Ÿ

๐‘บ๐Ÿโ‘โˆฉ๐“๐Ÿ

โ‘๐‘บ๐Ÿโ‘โˆฉ๐“๐Ÿ

โ‘

(๐‘บ๐Ÿโ‘โˆช๐‘บ๐Ÿ )โˆฉ(๐“ ยฟยฟ๐Ÿโ‘โˆช๐‘ป ๐Ÿ)ยฟ

Page 17: Beyond Set  Disjointness :  The Communication Complexity of Finding the Intersection

EQ()

Verification bottom-up

๐‘บ๐Ÿโ‘โˆฉ๐“๐Ÿ

โ‘๐‘บ๐Ÿโ‘โˆฉ๐“๐Ÿ

โ‘

(๐‘บ๐Ÿโ‘โˆช๐‘บ๐Ÿ )โˆฉ(๐“ ยฟยฟ๐Ÿโ‘โˆช๐‘ป ๐Ÿ)ยฟ

Correct Incorrect

Incorrect

๐‘บ๐Ÿโ‘โˆฉ๐“๐Ÿ

โ‘๐‘บ๐Ÿโ‘โˆฉ๐“๐Ÿ

โ‘

(๐‘บ๐Ÿโ‘โˆช๐‘บ๐Ÿ )โˆฉ(๐“ ยฟยฟ๐Ÿโ‘โˆช๐‘ป ๐Ÿ)ยฟ

Correct Incorrect

Page 18: Beyond Set  Disjointness :  The Communication Complexity of Finding the Intersection

Correct

EQ()

Verification bottom-up

๐‘บ๐Ÿโ‘โˆฉ๐“๐Ÿ

โ‘๐‘บ๐Ÿโ‘โˆฉ๐“๐Ÿ

โ‘

(๐‘บ๐Ÿโ‘โˆช๐‘บ๐Ÿ )โˆฉ(๐“ ยฟยฟ๐Ÿโ‘โˆช๐‘ป ๐Ÿ)ยฟ

Correct Incorrect

Incorrect

๐‘บ๐Ÿโ‘โˆฉ๐“๐Ÿ

โ‘๐‘บ๐Ÿโ‘โˆฉ๐“๐Ÿ

โ‘

(๐‘บ๐Ÿโ‘โˆช๐‘บ๐Ÿ )โˆฉ(๐“ ยฟยฟ๐Ÿโ‘โˆช๐‘ป ๐Ÿ)ยฟ

Correct Incorrect

Correct

Page 19: Beyond Set  Disjointness :  The Communication Complexity of Finding the Intersection

Verification bottom-up

๐’‘๐’“ โˆ’๐Ÿ

โ€ฆ๐’‘๐Ÿ

๐‘บ๐Ÿ๐Ÿ ,๐“๐Ÿ

๐Ÿ โ€ฆ ๐‘บ๐’Š๐Ÿ ,๐“ ๐ข

๐Ÿ๐‘บ๐Ÿ๐Ÿ ,๐“๐Ÿ

๐Ÿ ๐‘บ๐’Œ๐Ÿ ,๐“๐’Œ

๐Ÿโ€ฆ

๐’‘๐’“ โˆ’๐Ÿ

Page 20: Beyond Set  Disjointness :  The Communication Complexity of Finding the Intersection

Analysis of Stage

โ€ข = [node at stage computed correctly]โ€ข Set = โ€“ Run equality checks and basic intersection

protocols with success probability โ€“ Key lemma: [# of restarts per leaf โ€“ Cost of Equality = โ€“ Cost of Intersection in leafs =

โ€ข [protocol succeeds] =

Page 21: Beyond Set  Disjointness :  The Communication Complexity of Finding the Intersection

Lower Bound

โ€ข (-Intersection) = [Brody, Chakrabarti, Kondapally, Woodruff, Y.โ€™13]โ€ข iff , where โ€ข = solving independent instances of โ€ข reduces to -Intersection:โ€“ Given and โ€“ Construct sets with elements and

Page 22: Beyond Set  Disjointness :  The Communication Complexity of Finding the Intersection

Communication Direct Sums

โ€œSolving m copies of a communication problem requires m times more communicationโ€:โ€ข For arbitrary [โ€ฆ Braverman, Rao 10; Barak

Braverman, Chen, Rao 11, โ€ฆ.]โ€ข In general, canโ€™t go beyond

Page 23: Beyond Set  Disjointness :  The Communication Complexity of Finding the Intersection

Information cost Communication complexityโ€ข [Bar Yossef, Jayram, Kumar,Sivakumarโ€™01]

Disjointness

โ€ข Stronger direct sum for bounded-round complexity of Equality-type problems (a.k.a. โ€œunion bound is optimalโ€) [Molinaro, Woodruff, Y.โ€™13]

Specialized Communication Direct Sums

Page 24: Beyond Set  Disjointness :  The Communication Complexity of Finding the Intersection

Extensions

โ€ข Multi-party: players, , where

โ€“ Boost error probability to โ€“ Average per player (using coordinator):

in roundsโ€“Worst-case per player (using a tournament) in rounds

Page 25: Beyond Set  Disjointness :  The Communication Complexity of Finding the Intersection

Open Problems

โ€ข (-Intersection) = โ€ข Better protocols for the multi-party setting