beyond set disjointness : the communication complexity of finding the intersection
DESCRIPTION
Beyond Set Disjointness : The Communication Complexity of Finding the Intersection. Grigory Yaroslavtsev http://grigory.us. Joint with Brody, Chakrabarti , Kondapally and Woodruff. Communication Complexity [Yaoโ79]. Shared randomness. Bob: . Alice: . โฆ. - PowerPoint PPT PresentationTRANSCRIPT
Beyond Set Disjointness: The Communication Complexity of Finding the
Intersection
Grigory Yaroslavtsevhttp://grigory.us
Joint with Brody, Chakrabarti, Kondapally and Woodruff
Communication Complexity [Yaoโ79]
Alice: Bob:
๐ (๐ ,๐ )=?
Shared randomness
โฆ๐ (๐ ,๐ )
โข = min. communication (error ) โข min. -round communication (error )
Set Intersection
๐=๐ , ๐=๐ , ๐ (๐ , ๐ )=๐โฉ๐๐บโ [๐ ] ,|๐|โค๐ ๐ป โ [๐ ] ,|๐|โค๐ = ?
(-Intersection) = ?
This talk
Let
โข (-Intersection) = [Brody, Chakrabarti, Kondapally, Woodruff, Y.; PODCโ14]โข (-Intersection) = [Saglam-Tardos FOCSโ13; Brody, Chakrabarti, Kondapally, Woodruff, Y.โ13]
{
times
(-Intersection) = for
-Disjointnessโข , iff โข [Razborovโ92; Hastad-Wigdersonโ96] โข [Folklore + Dasgupta, Kumar, Sivakumar; Buhrmanโ12, Garcia-Soriano, Matsliah, De Wolfโ12]
โข [Saglam, Tardosโ13]โข [Braverman, Garg, Pankratov, Weinsteinโ13]
Applications
โข : exact Jaccard index ( for -approximate use MinHash [Broderโ98; Li-Konigโ11; Path-Strokel-Woodruffโ14])โข Rarity, distinct elements, joins,โฆโข Multi-party set intersection (later)โข Contrast:
1-round -protocol
๐ : [๐ ]โ[๐3]
๐บ ๐ป
๐(๐บ) ๐(๐ป )
[๐ ] [๐ ]
[๐3] [๐3]
Hashing
log ๐
=# of buckets
๐ : [๐ ]โ[๐ / log๐]
Expected # of elements
Secondary Hashing
= # of hash functions
log 3๐ where
2-Round -protocol
log 3๐
log 3๐
|h๐ (๐บ )|=|h๐ (๐ป )|=๐ ( log๐ log log๐ )
Total communication = = O()
Collisions
๐log๐
log 3๐Pr [๐๐๐๐๐๐ ๐๐๐ ]=๐(1log๐ )
Collisions
log 3๐
log 3๐
Collisions
โข Second round: โ For each bucket send -bit equality check (total -
communication)โ Correct intersection computed in buckets where
โ Expected # of items in incorrect buckets โ Use 1-round protocol for incorrect bucketsโ Total communication
Main protocol
๐ (1)
=# of buckets
๐ : [๐ ]โ[๐]
Expected # of elements
Verification tree -degree
โฆlog ๐โ 1๐
buckets = leaves of the verification tree
Verification bottom-up
๐บ๐โ ,๐๐
โ ๐บ๐โ ,๐๐
โ
๐บ๐โโช๐บ๐ ,๐๐
โโช๐ป ๐
๐บ๐โโฉ๐๐
โ๐บ๐โโฉ๐๐
โ
(๐บ๐โโช๐บ๐ )โฉ(๐ ยฟยฟ๐โโช๐ป ๐)ยฟ
EQ()
Verification bottom-up
๐บ๐โโฉ๐๐
โ๐บ๐โโฉ๐๐
โ
(๐บ๐โโช๐บ๐ )โฉ(๐ ยฟยฟ๐โโช๐ป ๐)ยฟ
Correct Incorrect
Incorrect
๐บ๐โโฉ๐๐
โ๐บ๐โโฉ๐๐
โ
(๐บ๐โโช๐บ๐ )โฉ(๐ ยฟยฟ๐โโช๐ป ๐)ยฟ
Correct Incorrect
Correct
EQ()
Verification bottom-up
๐บ๐โโฉ๐๐
โ๐บ๐โโฉ๐๐
โ
(๐บ๐โโช๐บ๐ )โฉ(๐ ยฟยฟ๐โโช๐ป ๐)ยฟ
Correct Incorrect
Incorrect
๐บ๐โโฉ๐๐
โ๐บ๐โโฉ๐๐
โ
(๐บ๐โโช๐บ๐ )โฉ(๐ ยฟยฟ๐โโช๐ป ๐)ยฟ
Correct Incorrect
Correct
Verification bottom-up
๐๐ โ๐
โฆ๐๐
๐บ๐๐ ,๐๐
๐ โฆ ๐บ๐๐ ,๐ ๐ข
๐๐บ๐๐ ,๐๐
๐ ๐บ๐๐ ,๐๐
๐โฆ
๐๐ โ๐
Analysis of Stage
โข = [node at stage computed correctly]โข Set = โ Run equality checks and basic intersection
protocols with success probability โ Key lemma: [# of restarts per leaf โ Cost of Equality = โ Cost of Intersection in leafs =
โข [protocol succeeds] =
Lower Bound
โข (-Intersection) = [Brody, Chakrabarti, Kondapally, Woodruff, Y.โ13]โข iff , where โข = solving independent instances of โข reduces to -Intersection:โ Given and โ Construct sets with elements and
Communication Direct Sums
โSolving m copies of a communication problem requires m times more communicationโ:โข For arbitrary [โฆ Braverman, Rao 10; Barak
Braverman, Chen, Rao 11, โฆ.]โข In general, canโt go beyond
Information cost Communication complexityโข [Bar Yossef, Jayram, Kumar,Sivakumarโ01]
Disjointness
โข Stronger direct sum for bounded-round complexity of Equality-type problems (a.k.a. โunion bound is optimalโ) [Molinaro, Woodruff, Y.โ13]
Specialized Communication Direct Sums
Extensions
โข Multi-party: players, , where
โ Boost error probability to โ Average per player (using coordinator):
in roundsโWorst-case per player (using a tournament) in rounds
Open Problems
โข (-Intersection) = โข Better protocols for the multi-party setting