Transcript
Page 1

1

Basics of information theory and information complexity

June 1, 2013

Mark Braverman

Princeton University

a tutorial

Page 2

Part I: Information theory

β€’ Information theory, in its modern form, was introduced in the 1940s to study the problem of transmitting data over physical channels.

2

[Diagram: Alice and Bob connected by a communication channel.]

Page 3

Quantifying β€œinformation”

β€’ Information is measured in bits.

β€’ The basic notion is Shannon’s entropy.

β€’ The entropy of a random variable is the (typical) number of bits needed to remove the uncertainty of the variable.

β€’ For a discrete variable: H(X) ≔ βˆ‘_x Pr[X = x] Β· log(1/Pr[X = x]).
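A quick numerical check of this definition (an illustrative sketch, not part of the original slides; the example distributions are made up):

```python
import math

def entropy(dist):
    # Shannon entropy (in bits) of a distribution given as {outcome: probability}.
    return sum(p * math.log2(1 / p) for p in dist.values() if p > 0)

print(entropy({"H": 0.5, "T": 0.5}))       # 1.0 bit (a fair coin)
print(entropy({"a": 1.0}))                  # 0.0 bits (a constant)
print(entropy({1: 1/3, 2: 1/3, 3: 1/3}))    # log2(3) ~ 1.585 bits (a uniform trit)
```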

3

Page 4

Shannon’s entropy

β€’ Important examples and properties:

– If 𝑋 = π‘₯ is a constant, then 𝐻 𝑋 = 0.

– If 𝑋 is uniform on a finite set 𝑆 of possible values, then 𝐻 𝑋 = log 𝑆.

– If 𝑋 is supported on at most 𝑛 values, then 𝐻 X ≀ log 𝑛.

– If π‘Œ is a random variable determined by 𝑋, then 𝐻 π‘Œ ≀ 𝐻(𝑋).

4

Page 5

Conditional entropy

β€’ For two (potentially correlated) variables 𝑋, π‘Œ, the conditional entropy of 𝑋 given π‘Œ is the amount of uncertainty left in 𝑋 given π‘Œ:

𝐻 𝑋 π‘Œ ≔ 𝐸𝑦~π‘ŒH X Y = y .

β€’ One can show 𝐻 π‘‹π‘Œ = 𝐻 π‘Œ + 𝐻(𝑋|π‘Œ).

β€’ This important fact is knows as the chain rule.

β€’ If 𝑋 βŠ₯ π‘Œ, then 𝐻 π‘‹π‘Œ = 𝐻 𝑋 + 𝐻 π‘Œ 𝑋 = 𝐻 𝑋 + 𝐻 π‘Œ .

5

Page 6

Example

β€’ X = (B1, B2, B3)

β€’ Y = (B1βŠ•B2, B2βŠ•B4, B3βŠ•B4, B5)

β€’ where B1, B2, B3, B4, B5 ∈_U {0,1}.

β€’ Then:

– H(X) = 3; H(Y) = 4; H(XY) = 5;

– H(X|Y) = 1 = H(XY) βˆ’ H(Y);

– H(Y|X) = 2 = H(XY) βˆ’ H(X).

6

Page 7

Mutual information

β€’ X = (B1, B2, B3)

β€’ Y = (B1βŠ•B2, B2βŠ•B4, B3βŠ•B4, B5)

[Venn diagram: H(X) and H(Y) drawn as overlapping regions; the overlap is I(X; Y), and the remaining parts are H(X|Y) and H(Y|X), with the bits B1, …, B5 and their XORs placed in the corresponding regions.]

Page 8

Mutual information

β€’ The mutual information is defined as I(X; Y) = H(X) βˆ’ H(X|Y) = H(Y) βˆ’ H(Y|X).

β€’ β€œBy how much does knowing X reduce the entropy of Y?”

β€’ Always non-negative: I(X; Y) β‰₯ 0.

β€’ Conditional mutual information: I(X; Y | Z) ≔ H(X|Z) βˆ’ H(X|YZ).

β€’ Chain rule for mutual information: I(XY; Z) = I(X; Z) + I(Y; Z | X).

β€’ Simple intuitive interpretation.
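To make the chain rule concrete, here is a small check (not part of the original slides; the joint distribution below is made up) verifying I(XY; Z) = I(X; Z) + I(Y; Z | X):

```python
import math
from collections import Counter

def H(samples):
    counts, n = Counter(samples), len(samples)
    return sum((c / n) * math.log2(n / c) for c in counts.values())

def I(a, b):          # I(A; B) = H(A) + H(B) - H(AB)
    return H(a) + H(b) - H(list(zip(a, b)))

def I_cond(a, b, c):  # I(A; B | C) = H(AC) + H(BC) - H(ABC) - H(C)
    return H(list(zip(a, c))) + H(list(zip(b, c))) - H(list(zip(a, b, c))) - H(c)

# A made-up joint distribution: 8 equally likely outcomes of (X, Y, Z).
X = [0, 0, 0, 0, 1, 1, 1, 1]
Y = [0, 0, 1, 1, 0, 0, 1, 1]
Z = [0, 1, 1, 0, 1, 1, 0, 1]

print(I(list(zip(X, Y)), Z))        # I(XY; Z)
print(I(X, Z) + I_cond(Y, Z, X))    # same value, by the chain rule
```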

8

Page 9

Example – a biased coin

β€’ A coin that is Ξ΅-biased toward either Heads or Tails is tossed several times.

β€’ Let B ∈ {H, T} be the direction of the bias, and suppose that a priori both options are equally likely: H(B) = 1.

β€’ How many tosses are needed to find B?

β€’ Let T1, …, Tk be a sequence of tosses.

β€’ Start with k = 2.

Page 10

What do we learn about B?

β€’ I(B; T1T2) = I(B; T1) + I(B; T2 | T1)
  = I(B; T1) + I(BT1; T2) βˆ’ I(T1; T2)
  ≀ I(B; T1) + I(BT1; T2)
  = I(B; T1) + I(B; T2) + I(T1; T2 | B)
  = I(B; T1) + I(B; T2)   (the tosses are independent given B, so I(T1; T2 | B) = 0)
  = 2 Β· I(B; T1).

β€’ Similarly, I(B; T1…Tk) ≀ k Β· I(B; T1).

β€’ To determine B with constant accuracy we need I(B; T1…Tk) β‰₯ c for some constant c > 0, while I(B; T1…Tk) ≀ k Β· I(B; T1).

β€’ Hence k = Ξ©(1/I(B; T1)).

Page 11

Kullback–Leibler (KL)-Divergence

β€’ A measure of β€œdistance” between two distributions on the same space (not a true metric).

β€’ Plays a key role in information theory.

D(P βˆ₯ Q) ≔ βˆ‘_x P[x] log(P[x]/Q[x]).

β€’ D(P βˆ₯ Q) β‰₯ 0, with equality if and only if P = Q.

β€’ Caution: D(P βˆ₯ Q) β‰  D(Q βˆ₯ P)!

11

Page 12

Properties of KL-divergence

β€’ Connection to mutual information: I(X; Y) = E_{y∼Y} D(X_{Y=y} βˆ₯ X).

β€’ If X βŠ₯ Y, then X_{Y=y} = X, and both sides are 0.

β€’ Pinsker’s inequality: β€–P βˆ’ Qβ€–β‚ = O(√D(P βˆ₯ Q)).

β€’ Tight: D(B_{1/2+Ξ΅} βˆ₯ B_{1/2}) = Θ(Ρ²).

12

Page 13

Back to the coin example

β€’ I(B; T1) = E_{b∼B} D(T_{1,B=b} βˆ₯ T1) = D(B_{1/2Β±Ξ΅} βˆ₯ B_{1/2}) = Θ(Ρ²).

β€’ Hence k = Ξ©(1/I(B; T1)) = Ξ©(1/Ρ²).

β€’ β€œFollow the information learned from the coin tosses”

β€’ Can be done using combinatorics, but the information-theoretic language is more natural for expressing what’s going on.
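A numerical check (an added sketch, not from the slides): I(B; T1) equals D(B_{1/2+Ξ΅} βˆ₯ B_{1/2}), which scales as Θ(Ρ²), so the number of tosses k = Ξ©(1/I(B; T1)) grows like 1/Ρ².

```python
import math

def kl_bernoulli(p, q):   # D(Ber(p) || Ber(q)) in bits
    return p * math.log2(p / q) + (1 - p) * math.log2((1 - p) / (1 - q))

for eps in [0.1, 0.05, 0.01]:
    info_per_toss = kl_bernoulli(0.5 + eps, 0.5)   # = I(B; T1)
    print(eps, info_per_toss, 1 / info_per_toss)
    # I(B; T1) = Theta(eps^2), so ~1/eps^2 tosses are needed to learn B.
```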

Page 14

Back to communication

β€’ The reason Information Theory is so important for communication is because information-theoretic quantities readily operationalize.

β€’ Can attach operational meaning to Shannon’s entropy: H(X) β‰ˆ β€œthe cost of transmitting X”.

β€’ Let C(X) be the (expected) cost of transmitting a sample of X.

14

Page 15

H(X) = C(X)?

β€’ Not quite.

β€’ Let trit T ∈_U {1, 2, 3}.

β€’ Using the code 1 β†’ 0, 2 β†’ 10, 3 β†’ 11: C(T) = 5/3 β‰ˆ 1.67.

β€’ H(T) = log 3 β‰ˆ 1.58.

β€’ It is always the case that C(X) β‰₯ H(X).

Page 16

But 𝐻 𝑋 and 𝐢(𝑋) are close

β€’ Huffman coding: C(X) ≀ H(X) + 1.

β€’ This is a compression result: β€œan uninformative message is turned into a short one”.

β€’ Therefore: H(X) ≀ C(X) ≀ H(X) + 1.

16

Page 17

Shannon’s noiseless coding

β€’ The cost of communicating many copies of X scales as H(X).

β€’ Shannon’s source coding theorem:

– Let C(X^n) be the cost of transmitting n independent copies of X. Then the amortized transmission cost is lim_{nβ†’βˆž} C(X^n)/n = H(X).

β€’ This equation gives 𝐻(𝑋) operational meaning.

17

Page 18

[Diagram: β€œH(X) operationalized” — a stream X1, …, Xn, … is sent over the communication channel at H(X) bits per copy.]

Page 19

𝐻(𝑋) is nicer than 𝐢(𝑋)

β€’ H(X) is additive for independent variables.

β€’ Let T1, T2 ∈_U {1, 2, 3} be independent trits.

β€’ H(T1T2) = log 9 = 2 log 3.

β€’ C(T1T2) = 29/9 < C(T1) + C(T2) = 2 Γ— 5/3 = 30/9.

β€’ Works well with concepts such as channel capacity.
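A small sketch (a standard Huffman construction added here for illustration; the function name is made up) that reproduces both numbers: 5/3 for a single trit and 29/9 for a pair of trits, against entropies log 3 β‰ˆ 1.58 and 2 log 3 β‰ˆ 3.17.

```python
import heapq, math
from fractions import Fraction

def huffman_expected_length(probs):
    # Expected codeword length of an optimal (Huffman) prefix code.
    # Uses the fact that it equals the sum of the merged nodes' probabilities.
    heap = [(p, i, Fraction(0)) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    next_id = len(probs)
    while len(heap) > 1:
        p1, _, c1 = heapq.heappop(heap)
        p2, _, c2 = heapq.heappop(heap)
        heapq.heappush(heap, (p1 + p2, next_id, c1 + c2 + p1 + p2))
        next_id += 1
    return heap[0][2]

trit = [Fraction(1, 3)] * 3
pair = [Fraction(1, 9)] * 9   # two independent trits
print(huffman_expected_length(trit), math.log2(3))       # 5/3 ~ 1.67 vs 1.58
print(huffman_expected_length(pair), 2 * math.log2(3))   # 29/9 ~ 3.22 vs 3.17
```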

19

Page 20

β€œProof” of Shannon’s noiseless coding

β€’ n Β· H(X) = H(X^n) ≀ C(X^n) ≀ H(X^n) + 1.
  (The first equality is additivity of entropy; the last inequality is compression via Huffman coding.)

β€’ Therefore lim_{nβ†’βˆž} C(X^n)/n = H(X).

Page 21

Operationalizing other quantities

β€’ Conditional entropy H(X|Y) (cf. the Slepian–Wolf theorem):

[Diagram: X1, …, Xn, … are sent over the communication channel at H(X|Y) bits per copy to a receiver who already holds Y1, …, Yn, ….]

Page 22

Operationalizing other quantities

β€’ Mutual information I(X; Y):

[Diagram: given X1, …, Xn, … on one side of the channel, I(X; Y) bits per copy suffice to sample the corresponding Y1, …, Yn, … on the other side.]

Page 23

Information theory and entropy

β€’ Allows us to formalize intuitive notions.

β€’ Operationalized in the context of one-way transmission and related problems.

β€’ Has nice properties (additivity, chain rule…)

β€’ Next, we discuss extensions to more interesting communication scenarios.

23

Page 24

Communication complexity

β€’ Focus on the two party randomized setting.

24

[Diagram: Alice (A) holds X and Bob (B) holds Y; using shared randomness R, A and B implement a functionality F(X, Y), e.g. F(X, Y) = β€œX = Y?”.]

Page 25

Communication complexity

β€’ Alice holds X, Bob holds Y, and they share randomness R.

β€’ Goal: implement a functionality F(X, Y).

β€’ A protocol Ο€(X, Y) computing F(X, Y) exchanges messages m1(X, R), m2(Y, m1, R), m3(X, m1, m2, R), …, until both parties can output F(X, Y).

β€’ Communication cost = number of bits exchanged.

Page 26

Communication complexity

β€’ Numerous applications/potential applications (some will be discussed later today).

β€’ Considerably more difficult to obtain lower bounds than for transmission (still much easier than for other models of computation!).

26

Page 27

Communication complexity

β€’ (Distributional) communication complexity with input distribution ΞΌ and error Ξ΅: CC(F, ΞΌ, Ξ΅). Error ≀ Ξ΅ w.r.t. ΞΌ.

β€’ (Randomized/worst-case) communication complexity: CC(F, Ξ΅). Error ≀ Ξ΅ on all inputs.

β€’ Yao’s minimax: CC(F, Ξ΅) = max_ΞΌ CC(F, ΞΌ, Ξ΅).

27

Page 28

Examples

β€’ 𝑋, π‘Œ ∈ 0,1 𝑛.

β€’ Equality 𝐸𝑄 𝑋, π‘Œ ≔ 1𝑋=π‘Œ.

β€’ 𝐢𝐢 𝐸𝑄, πœ€ β‰ˆ log1

πœ€.

β€’ 𝐢𝐢 𝐸𝑄, 0 β‰ˆ 𝑛.

28

Page 29

Equality

β€’ F is β€œX = Y?”.

β€’ ΞΌ is a distribution where w.p. Β½ X = Y, and w.p. Β½ (X, Y) are random.

β€’ Protocol: Alice sends MD5(X) [128 bits]; Bob replies with β€œX = Y?” [1 bit]. Error?

β€’ Shows that CC(EQ, ΞΌ, 2^βˆ’129) ≀ 129.
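A sketch of this protocol in code (an illustrative addition, not from the slides; the helper names are made up), using the 128-bit MD5 digest as above: Bob is always correct when X = Y, and errs only on an MD5 collision when X β‰  Y.

```python
import hashlib, os

def alice_message(x: bytes) -> bytes:
    return hashlib.md5(x).digest()                 # 128-bit fingerprint of X

def bob_answer(y: bytes, fingerprint: bytes) -> int:
    return int(hashlib.md5(y).digest() == fingerprint)   # Bob's "X = Y?" bit

x = os.urandom(32)                                    # a 256-bit input
print(bob_answer(x, alice_message(x)))                # 1: always correct when X = Y
print(bob_answer(os.urandom(32), alice_message(x)))   # 0, unless MD5(X) = MD5(Y)
# Total communication: 128 + 1 = 129 bits.
```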

Page 30

Examples

β€’ 𝑋, π‘Œ ∈ 0,1 𝑛.

β€’ Innerproduct𝐼𝑃 𝑋, π‘Œ ≔ βˆ‘ 𝑋𝑖 β‹… π‘Œπ‘–π‘– (π‘šπ‘œπ‘‘2).

β€’ 𝐢𝐢 𝐼𝑃, 0 = 𝑛 βˆ’ π‘œ(𝑛).

In fact, using information complexity:

β€’ 𝐢𝐢 𝐼𝑃, πœ€ = 𝑛 βˆ’ π‘œπœ€(𝑛).

30

Page 31

Information complexity

β€’ Information complexity IC(F, Ξ΅) is to communication complexity CC(F, Ξ΅)
  as Shannon’s entropy H(X) is to transmission cost C(X).

31

Page 32

Information complexity

β€’ The smallest amount of information Alice and Bob need to exchange to solve 𝐹.

β€’ How is information measured?

β€’ Communication cost of a protocol?

– Number of bits exchanged.

β€’ Information cost of a protocol?

– Amount of information revealed.

32

Page 33

Basic definition 1: The information cost of a protocol

β€’ Prior distribution: (X, Y) ∼ ΞΌ; Alice holds X, Bob holds Y; running the protocol Ο€ produces the transcript Ξ .

IC(Ο€, ΞΌ) = I(Ξ ; Y | X) + I(Ξ ; X | Y)
= what Alice learns about Y + what Bob learns about X.

Page 34

Example

β€’ F is β€œX = Y?”.

β€’ ΞΌ is a distribution where w.p. Β½ X = Y, and w.p. Β½ (X, Y) are random.

β€’ Protocol: Alice sends MD5(X) [128 bits]; Bob replies with β€œX = Y?” [1 bit].

IC(Ο€, ΞΌ) = I(Ξ ; Y | X) + I(Ξ ; X | Y) β‰ˆ 1 + 64.5 = 65.5 bits
= what Alice learns about Y + what Bob learns about X.

Page 35

Prior πœ‡ matters a lot for information cost!

β€’ If πœ‡ = 1 π‘₯,𝑦 a singleton,

𝐼𝐢 πœ‹, πœ‡ = 0.

35

Page 36

Example

β€’ F is β€œX = Y?”.

β€’ ΞΌ is a distribution where (X, Y) are just uniformly random.

β€’ Protocol: Alice sends MD5(X) [128 bits]; Bob replies with β€œX = Y?” [1 bit].

IC(Ο€, ΞΌ) = I(Ξ ; Y | X) + I(Ξ ; X | Y) β‰ˆ 0 + 128 = 128 bits
= what Alice learns about Y + what Bob learns about X.

Page 37

Basic definition 2: Information complexity

β€’ Communication complexity:

𝐢𝐢 𝐹, πœ‡, πœ€ ≔ minπœ‹π‘π‘œπ‘šπ‘π‘’π‘‘π‘’π‘ 

πΉπ‘€π‘–π‘‘β„Žπ‘’π‘Ÿπ‘Ÿπ‘œπ‘Ÿβ‰€πœ€

πœ‹ .

β€’ Analogously:

𝐼𝐢 𝐹, πœ‡, πœ€ ≔ infπœ‹π‘π‘œπ‘šπ‘π‘’π‘‘π‘’π‘ 

πΉπ‘€π‘–π‘‘β„Žπ‘’π‘Ÿπ‘Ÿπ‘œπ‘Ÿβ‰€πœ€

𝐼𝐢(πœ‹, πœ‡).

37

Needed!

Page 38

Prior-free information complexity

β€’ Using a minimax, one can get rid of the prior.

β€’ For communication, we had: CC(F, Ξ΅) = max_ΞΌ CC(F, ΞΌ, Ξ΅).

β€’ For information:

IC(F, Ξ΅) ≔ inf_{Ο€ computes F with error ≀ Ξ΅} max_ΞΌ IC(Ο€, ΞΌ).

38

Page 39

Ex: The information complexity of Equality

β€’ What is 𝐼𝐢(𝐸𝑄, 0)?

β€’ Consider the following protocol.

β€’ Alice holds X in {0,1}^n, Bob holds Y in {0,1}^n; A is a non-singular matrix in Z^{nΓ—n} with rows A1, A2, ….

β€’ Alice sends A1Β·X, Bob replies with A1Β·Y; Alice sends A2Β·X, Bob replies with A2Β·Y; and so on.

β€’ Continue for n steps, or until a disagreement is discovered.

Page 40

Analysis (sketch)

‒ If X≠Y, the protocol will terminate in O(1) rounds on average, and thus reveal O(1) information.

β€’ If X=Y… the players only learn the fact that X=Y (≀1 bit of information).

β€’ Thus the protocol has O(1) information complexity for any prior πœ‡.
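A simulation sketch (an added illustration, not from the slides). For simplicity it uses independent uniformly random rows A_i over GF(2) instead of an explicit non-singular matrix; that is enough to see the O(1) expected number of rounds when X β‰  Y (the non-singularity only matters for certifying X = Y after n rounds).

```python
import random

def rounds_until_disagreement(x, y, rng):
    # One round = Alice sends A_i . X (mod 2), Bob replies with A_i . Y (mod 2).
    n = len(x)
    for rounds in range(1, n + 1):
        a = [rng.randint(0, 1) for _ in range(n)]          # random row A_i
        ax = sum(ai * xi for ai, xi in zip(a, x)) % 2
        ay = sum(ai * yi for ai, yi in zip(a, y)) % 2
        if ax != ay:                                        # disagreement => X != Y
            return rounds
    return n

rng = random.Random(1)
n = 64
x = [rng.randint(0, 1) for _ in range(n)]
y = x.copy(); y[0] ^= 1                                     # Y differs from X in one bit
trials = [rounds_until_disagreement(x, y, rng) for _ in range(5000)]
print(sum(trials) / len(trials))   # about 2 rounds on average when X != Y
```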

40

Page 41

Operationalizing IC: Information equals amortized communication

β€’ Recall [Shannon]: lim_{nβ†’βˆž} C(X^n)/n = H(X).

β€’ Turns out: lim_{nβ†’βˆž} CC(F^n, ΞΌ^n, Ξ΅)/n = IC(F, ΞΌ, Ξ΅), for Ξ΅ > 0. [Error Ξ΅ allowed on each copy.]

β€’ For Ξ΅ = 0: lim_{nβ†’βˆž} CC(F^n, ΞΌ^n, 0+)/n = IC(F, ΞΌ, 0).

β€’ [lim_{nβ†’βˆž} CC(F^n, ΞΌ^n, 0)/n is an interesting open problem.]

Page 42

Information = amortized communication

β€’ limπ‘›β†’βˆž

𝐢𝐢(𝐹𝑛, πœ‡π‘›, πœ€)/𝑛 = 𝐼𝐢 𝐹, πœ‡, πœ€ .

β€’ Two directions: β€œβ‰€β€ and β€œβ‰₯”.

β€’ 𝑛 β‹… 𝐻 𝑋 = 𝐻 𝑋𝑛 ≀ 𝐢 𝑋𝑛 ≀ 𝐻 𝑋𝑛 + 1.

Additivity of entropy

Compression (Huffman)

Page 43

The β€œβ‰€β€ direction

β€’ limπ‘›β†’βˆž

𝐢𝐢(𝐹𝑛, πœ‡π‘›, πœ€)/𝑛 ≀ 𝐼𝐢 𝐹, πœ‡, πœ€ .

β€’ Start with a protocol πœ‹ solving 𝐹, whose 𝐼𝐢 πœ‹, πœ‡ is close to 𝐼𝐢 𝐹, πœ‡, πœ€ .

β€’ Show how to compress many copies of πœ‹ into a protocol whose communication cost is close to its information cost.

β€’ More on compression later.

43

Page 44

The β€œβ‰₯” direction

β€’ limπ‘›β†’βˆž

𝐢𝐢(𝐹𝑛, πœ‡π‘›, πœ€)/𝑛 β‰₯ 𝐼𝐢 𝐹, πœ‡, πœ€ .

β€’ Use the fact that 𝐢𝐢 𝐹𝑛,πœ‡π‘›,πœ€

𝑛β‰₯

𝐼𝐢 𝐹𝑛,πœ‡π‘›,πœ€

𝑛.

β€’ Additivity of information complexity: 𝐼𝐢 𝐹𝑛, πœ‡π‘›, πœ€

𝑛= 𝐼𝐢 𝐹, πœ‡, πœ€ .

44

Page 45

Proof: Additivity of information complexity

β€’ Let 𝑇1(𝑋1, π‘Œ1) and 𝑇2(𝑋2, π‘Œ2) be two two-party tasks.

β€’ E.g. β€œSolve 𝐹(𝑋, π‘Œ) with error ≀ πœ€ w.r.t. πœ‡β€

β€’ Then

IC(T1 Γ— T2, ΞΌ1 Γ— ΞΌ2) = IC(T1, ΞΌ1) + IC(T2, ΞΌ2).

β€’ β€œβ‰€β€ is easy.

β€’ β€œβ‰₯” is the interesting direction.

Page 46

𝐼𝐢 𝑇1, πœ‡1 + 𝐼𝐢 𝑇2, πœ‡2 ≀𝐼𝐢 𝑇1 Γ— 𝑇2, πœ‡1 Γ— πœ‡2

β€’ Start from a protocol πœ‹ for 𝑇1 Γ— 𝑇2 with prior πœ‡1 Γ— πœ‡2, whose information cost is 𝐼.

β€’ Show how to construct two protocols πœ‹1 for 𝑇1 with prior πœ‡1 and πœ‹2 for 𝑇2 with prior πœ‡2, with information costs 𝐼1 and 𝐼2, respectively, such that 𝐼1 + 𝐼2 = 𝐼.

46

Page 47

πœ‹( 𝑋1, 𝑋2 , π‘Œ1, π‘Œ2 )

πœ‹1 𝑋1, π‘Œ1 β€’ Publicly sample 𝑋2 ∼ πœ‡2 β€’ Bob privately samples

π‘Œ2 ∼ πœ‡2|X2

β€’ Run πœ‹( 𝑋1, 𝑋2 , π‘Œ1, π‘Œ2 )

πœ‹2 𝑋2, π‘Œ2 β€’ Publicly sample π‘Œ1 ∼ πœ‡1 β€’ Alice privately samples

𝑋1 ∼ πœ‡1|Y1

β€’ Run πœ‹( 𝑋1, 𝑋2 , π‘Œ1, π‘Œ2 )

Page 48

πœ‹1 𝑋1, π‘Œ1 β€’ Publicly sample 𝑋2 ∼ πœ‡2 β€’ Bob privately samples

π‘Œ2 ∼ πœ‡2|X2

β€’ Run πœ‹( 𝑋1, 𝑋2 , π‘Œ1, π‘Œ2 )

Analysis - πœ‹1

β€’ Alice learns about π‘Œ1: 𝐼 Ξ ; π‘Œ1 X1X2)

β€’ Bob learns about 𝑋1: 𝐼 Ξ ; 𝑋1 π‘Œ1π‘Œ2𝑋2).

β€’ 𝐼1 = 𝐼 Ξ ; π‘Œ1 X1X2) + 𝐼 Ξ ; 𝑋1 π‘Œ1π‘Œ2𝑋2).

48

Page 49

πœ‹2 𝑋2, π‘Œ2 β€’ Publicly sample π‘Œ1 ∼ πœ‡1 β€’ Alice privately samples

𝑋1 ∼ πœ‡1|Y1

β€’ Run πœ‹( 𝑋1, 𝑋2 , π‘Œ1, π‘Œ2 )

Analysis - πœ‹2

β€’ Alice learns about π‘Œ2: 𝐼 Ξ ; π‘Œ2 X1X2Y1)

β€’ Bob learns about 𝑋2: 𝐼 Ξ ; 𝑋2 π‘Œ1π‘Œ2).

β€’ 𝐼2 = 𝐼 Ξ ; π‘Œ2 X1X2Y1) + 𝐼 Ξ ; 𝑋2 π‘Œ1π‘Œ2).

49

Page 50

Adding 𝐼1 and 𝐼2

I1 + I2 = I(Ξ ; Y1 | X1X2) + I(Ξ ; X1 | Y1Y2X2) + I(Ξ ; Y2 | X1X2Y1) + I(Ξ ; X2 | Y1Y2)
= [I(Ξ ; Y1 | X1X2) + I(Ξ ; Y2 | X1X2Y1)] + [I(Ξ ; X2 | Y1Y2) + I(Ξ ; X1 | Y1Y2X2)]
= I(Ξ ; Y1Y2 | X1X2) + I(Ξ ; X1X2 | Y1Y2) = I   (by the chain rule).

50

Page 51

Summary

β€’ Information complexity is additive.

β€’ Operationalized via β€œInformation = amortized communication”.

β€’ limπ‘›β†’βˆž

𝐢𝐢(𝐹𝑛, πœ‡π‘›, πœ€)/𝑛 = 𝐼𝐢 𝐹, πœ‡, πœ€ .

β€’ Seems to be the β€œright” analogue of entropy for interactive computation.

51

Page 52

Entropy vs. Information Complexity

                  Entropy                      IC
Additive?         Yes                          Yes
Operationalized   lim_{nβ†’βˆž} C(X^n)/n          lim_{nβ†’βˆž} CC(F^n, ΞΌ^n, Ξ΅)/n
Compression?      Huffman: C(X) ≀ H(X) + 1    ???!

Page 53

Can interactive communication be compressed?

β€’ Is it true that CC(F, ΞΌ, Ξ΅) ≀ IC(F, ΞΌ, Ξ΅) + O(1)?

β€’ Less ambitiously: CC(F, ΞΌ, O(Ξ΅)) = O(IC(F, ΞΌ, Ξ΅))?

β€’ (Almost) equivalently: given a protocol Ο€ with IC(Ο€, ΞΌ) = I, can Alice and Bob simulate Ο€ using O(I) communication?

β€’ Not known in general…

53

Page 54

Direct sum theorems

β€’ Let 𝐹 be any functionality.

β€’ Let 𝐢(𝐹) be the cost of implementing 𝐹.

β€’ Let 𝐹𝑛 be the functionality of implementing 𝑛 independent copies of 𝐹.

β€’ The direct sum problem:

β€œDoes 𝐢(𝐹𝑛) β‰ˆ 𝑛 βˆ™ 𝐢(𝐹)?”

β€’ In most cases it is obvious that 𝐢(𝐹𝑛) ≀ 𝑛 βˆ™ 𝐢(𝐹).

54

Page 55

Direct sum – randomized communication complexity

β€’ Is it true that

𝐢𝐢(𝐹𝑛, πœ‡π‘›, πœ€) = Ξ©(𝑛 βˆ™ 𝐢𝐢(𝐹, πœ‡, πœ€))?

β€’ Is it true that 𝐢𝐢(𝐹𝑛, πœ€) = Ξ©(𝑛 βˆ™ 𝐢𝐢(𝐹, πœ€))?

55

Page 56

Direct product – randomized communication complexity

β€’ Direct sum: CC(F^n, ΞΌ^n, Ξ΅) = Ξ©(n Β· CC(F, ΞΌ, Ξ΅))?

β€’ Direct product: CC(F^n, ΞΌ^n, 1 βˆ’ (1 βˆ’ Ξ΅)^n) = Ξ©(n Β· CC(F, ΞΌ, Ξ΅))? (I.e., does the bound hold even when the success probability is only required to be (1 βˆ’ Ξ΅)^n?)

56

Page 57

Direct sum for randomized CC and interactive compression

Direct sum:

β€’ CC(F^n, ΞΌ^n, Ξ΅) = Ξ©(n Β· CC(F, ΞΌ, Ξ΅))?

In the limit:

β€’ n Β· IC(F, ΞΌ, Ξ΅) = Ξ©(n Β· CC(F, ΞΌ, Ξ΅))?

Interactive compression:

β€’ CC(F, ΞΌ, Ξ΅) = O(IC(F, ΞΌ, Ξ΅))?

Same question!

57

Page 58

The big picture

𝐢𝐢(𝐹, πœ‡, πœ€) 𝐢𝐢(𝐹𝑛, πœ‡π‘›, πœ€)/𝑛

𝐼𝐢(𝐹, πœ‡, πœ€) 𝐼𝐢(𝐹𝑛, πœ‡π‘›, πœ€)/𝑛

additivity (=direct sum)

for information

information = amortized

communication direct sum for

communication?

interactive compression?

Page 59

Current results for compression

A protocol Ο€ that has C bits of communication, conveys I bits of information over prior ΞΌ, and works in r rounds can be simulated:

β€’ using OΜƒ(I + r) bits of communication;

β€’ using OΜƒ(√(I Β· C)) bits of communication;

β€’ using 2^{O(I)} bits of communication;

β€’ if ΞΌ = ΞΌ_X Γ— ΞΌ_Y, then using O(I polylog C) bits of communication.

Page 60

Their direct sum counterparts

β€’ 𝐢𝐢(𝐹𝑛, πœ‡π‘›, πœ€) = Ξ© (𝑛1/2 βˆ™ 𝐢𝐢(𝐹, πœ‡, πœ€)).

β€’ 𝐢𝐢(𝐹𝑛, πœ€) = Ξ© (𝑛1/2 βˆ™ 𝐢𝐢(𝐹, πœ€)).

For product distributions πœ‡ = πœ‡π‘‹ Γ— πœ‡π‘Œ,

β€’ 𝐢𝐢(𝐹𝑛, πœ‡π‘›, πœ€) = Ξ© (𝑛 βˆ™ 𝐢𝐢(𝐹, πœ‡, πœ€)).

When the number of rounds is bounded by π‘Ÿ β‰ͺ 𝑛, a direct sum theorem holds.

60

Page 61

Direct product

β€’ The best one can hope for is a statement of the form:

CC(F^n, ΞΌ^n, 1 βˆ’ 2^{βˆ’O(n)}) = Ξ©(n Β· IC(F, ΞΌ, 1/3)).

β€’ Can prove:

CC(F^n, ΞΌ^n, 1 βˆ’ 2^{βˆ’O(n)}) = Ω̃(n^{1/2} Β· CC(F, ΞΌ, 1/3)).

61

Page 62

Proof 2: Compressing a one-round protocol

β€’ Say Alice speaks: IC(Ο€, ΞΌ) = I(M; X | Y).

β€’ Recall KL-divergence: I(M; X | Y) = E_{XY} D(M_{XY} βˆ₯ M_Y) = E_{XY} D(M_X βˆ₯ M_Y), since Alice’s message depends only on X.

β€’ Bottom line:

– Alice has M_X; Bob has M_Y;

– Goal: sample from M_X using ∼ D(M_X βˆ₯ M_Y) communication.

62

Page 63

The dart board

β€’ Interpret the public randomness as random points (β€œdarts”) in U Γ— [0,1], where U is the universe of all possible messages.

β€’ The first dart under the histogram of M is distributed ∼ M.

[Figure: darts q1, q2, q3, … scattered over U Γ— [0,1]; those landing under the histogram of M over u1, u2, … are the accepted samples.]
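A small simulation (an added sketch, not part of the original slides; the toy distributions are made up) checking the two facts behind the dart board: the first dart under the histogram of M_X is distributed as M_X, and the number of doubling rounds until that dart also lies under 2^k Β· M_Y is typically small, governed by the ratio M_X(u)/M_Y(u).

```python
import random

def first_dart_under(M, darts):
    # Index of the first dart lying strictly under the histogram of M.
    return next(i for i, (u, p) in enumerate(darts) if p < M[u])

rng = random.Random(0)
MX = {"a": 0.5, "b": 0.3, "c": 0.2}   # Alice's message distribution
MY = {"a": 0.2, "b": 0.3, "c": 0.5}   # Bob's estimate of that distribution
universe = list(MX)

counts = {u: 0 for u in universe}
doublings = []
for _ in range(10000):
    darts = [(rng.choice(universe), rng.random()) for _ in range(100)]
    u, p = darts[first_dart_under(MX, darts)]
    counts[u] += 1
    k = 0                               # rounds until the dart is under 2^k * MY
    while p >= (2 ** k) * MY[u]:
        k += 1
    doublings.append(k)

print({u: c / 10000 for u, c in counts.items()})  # close to MX = {a: .5, b: .3, c: .2}
print(sum(doublings) / len(doublings))            # small; grows with log(MX(u)/MY(u))
```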

Page 64

Proof Idea

β€’ Sample using O(log(1/Ξ΅) + D(M_X βˆ₯ M_Y)) communication, with statistical error Ξ΅.

[Figure: the same ~|U| public darts are drawn twice, once under Alice’s histogram M_X and once under Bob’s histogram M_Y.]

Page 65

Proof Idea (cont.)

[Figure: Alice’s first dart under M_X is u4; she sends the hash values h1(u4), h2(u4); Bob’s candidate under M_Y is a different dart, u2.]

Page 66

Proof Idea (cont.)

[Figure: Bob doubles his histogram to 2Β·M_Y, bringing u4 into his candidate set; Alice keeps sending hashes h3(u4), h4(u4), …, h_{log 1/Ξ΅}(u4) until Bob isolates u4.]

Page 67

67

Analysis

β€’ If M_X(u4) β‰ˆ 2^k Β· M_Y(u4), then the protocol will reach round k of doubling.

β€’ There will be β‰ˆ 2^k candidates.

β€’ About k + log(1/Ξ΅) hashes are needed to narrow them down to one.

β€’ The contribution of u4 to the cost: M_X(u4) Β· (log(M_X(u4)/M_Y(u4)) + log(1/Ξ΅)).

β€’ Done! Summing over u, this is exactly D(M_X βˆ₯ M_Y) ≔ βˆ‘_u M_X(u) log(M_X(u)/M_Y(u)), plus the O(log(1/Ξ΅)) overhead.

Page 68

External information cost

β€’ (𝑋, π‘Œ)~πœ‡.

A B

X Y

Protocol Ο€ Protocol

transcript Ο€

𝐼𝐢𝑒π‘₯𝑑(πœ‹, πœ‡) = 𝐼(Ξ ; π‘‹π‘Œ)

what Charlie learns about (X,Y)

C

Page 69

Example

β€’ F is β€œX = Y?”.

β€’ ΞΌ is a distribution where w.p. Β½ X = Y and w.p. Β½ (X, Y) are random.

β€’ Protocol: Alice sends MD5(X); Bob replies with β€œX = Y?”.

IC^ext(Ο€, ΞΌ) = I(Ξ ; XY) = 129 bits
= what Charlie learns about (X, Y).

Page 70

External information cost

β€’ It is always the case that 𝐼𝐢𝑒π‘₯𝑑 πœ‹, πœ‡ β‰₯ 𝐼𝐢 πœ‹, πœ‡ .

β€’ If πœ‡ = πœ‡π‘‹ Γ— πœ‡π‘Œ is a product distribution, then

𝐼𝐢𝑒π‘₯𝑑 πœ‹, πœ‡ = 𝐼𝐢 πœ‹, πœ‡ .

70

Page 71

External information complexity

β€’ 𝐼𝐢𝑒π‘₯𝑑 𝐹, πœ‡, πœ€ ≔ infπœ‹π‘π‘œπ‘šπ‘π‘’π‘‘π‘’π‘ 

πΉπ‘€π‘–π‘‘β„Žπ‘’π‘Ÿπ‘Ÿπ‘œπ‘Ÿβ‰€πœ€

𝐼𝐢𝑒π‘₯𝑑(πœ‹, πœ‡).

β€’ Can it be operationalized?

71

Page 72

Operational meaning of 𝐼𝐢𝑒π‘₯𝑑?

β€’ Conjecture: zero-error communication scales like external information:

lim_{nβ†’βˆž} CC(F^n, ΞΌ^n, 0)/n = IC^ext(F, ΞΌ, 0)?

β€’ Recall:

lim_{nβ†’βˆž} CC(F^n, ΞΌ^n, 0+)/n = IC(F, ΞΌ, 0).

72

Page 73

Example – transmission with a strong prior

β€’ 𝑋, π‘Œ ∈ 0,1

β€’ πœ‡ is such that 𝑋 βˆˆπ‘ˆ 0,1 , and 𝑋 = π‘Œ with a very high probability (say 1 βˆ’ 1/ 𝑛).

β€’ 𝐹 𝑋, π‘Œ = 𝑋 is just the β€œtransmit 𝑋” function.

β€’ Clearly, πœ‹ should just have Alice send 𝑋 to Bob.

β€’ 𝐼𝐢 𝐹, πœ‡, 0 = 𝐼𝐢 πœ‹, πœ‡ = 𝐻1

𝑛= o(1).

β€’ 𝐼𝐢𝑒π‘₯𝑑 𝐹, πœ‡, 0 = 𝐼𝐢𝑒π‘₯𝑑 πœ‹, πœ‡ = 1. 73

Page 74

Example – transmission with a strong prior

β€’ 𝐼𝐢 𝐹, πœ‡, 0 = 𝐼𝐢 πœ‹, πœ‡ = 𝐻1

𝑛= o(1).

β€’ 𝐼𝐢𝑒π‘₯𝑑 𝐹, πœ‡, 0 = 𝐼𝐢𝑒π‘₯𝑑 πœ‹, πœ‡ = 1.

β€’ 𝐢𝐢 𝐹𝑛, πœ‡π‘›, 0+ = π‘œ 𝑛 .

β€’ 𝐢𝐢 𝐹𝑛, πœ‡π‘›, 0 = Ξ©(𝑛).

Other examples, e.g. the two-bit AND function fit into this picture.

74

Page 75

Additional directions

75

Information complexity

Interactive coding

Information theory in TCS

Page 76

Interactive coding theory

β€’ So far we have focused the discussion on noiseless coding.

β€’ What if the channel has noise?

β€’ [What kind of noise?]

β€’ In the non-interactive case, each channel has a capacity 𝐢.

76

Page 77

Channel capacity

β€’ The amortized number of channel uses needed to send X over a noisy channel of capacity C is H(X)/C.

β€’ Decouples the task from the channel!

77

Page 78

Example: Binary Symmetric Channel

β€’ Each bit gets independently flipped with probability πœ€ < 1/2.

β€’ One-way capacity: 1 βˆ’ H(Ξ΅).

[Diagram: binary symmetric channel — each input bit 0/1 is received correctly with probability 1 βˆ’ Ξ΅ and flipped with probability Ξ΅.]

Page 79

Interactive channel capacity

β€’ It is not clear that one can decouple the channel from the task in such a clean way.

β€’ Capacity is much harder to calculate/reason about.

β€’ Example: the binary symmetric channel.

β€’ One-way capacity: 1 βˆ’ H(Ξ΅).

β€’ Interactive capacity (for simple pointer jumping, [Kol-Raz’13]): 1 βˆ’ Θ(√H(Ξ΅)).

Page 80

Information theory in communication complexity and beyond

β€’ A natural extension would be to multi-party communication complexity.

β€’ Some success in the number-in-hand case.

β€’ What about the number-on-forehead?

β€’ Explicit bounds for β‰₯ log n players would imply explicit ACC^0 circuit lower bounds.

80

Page 81

NaΓ―ve multi-party information cost

𝐼𝐢 πœ‹, πœ‡ = 𝐼(Ξ ; 𝑋|π‘Œπ‘)+ 𝐼(Ξ ; π‘Œ|𝑋𝑍)+ 𝐼(Ξ ; 𝑍|π‘‹π‘Œ)

81

A B C

YZ XZ XY

Page 82

NaΓ―ve multi-party information cost

𝐼𝐢 πœ‹, πœ‡ = 𝐼(Ξ ; 𝑋|π‘Œπ‘)+ 𝐼(Ξ ; π‘Œ|𝑋𝑍)+ 𝐼(Ξ ; 𝑍|π‘‹π‘Œ)

β€’ Doesn’t seem to work.

β€’ Secure multi-party computation [Ben-Or,Goldwasser, Wigderson], means that anything can be computed at near-zero information cost.

β€’ Although, these construction require the players to share private channels/randomness.

82

Page 83

Communication and beyond…

β€’ The rest of today:

– Data structures;

– Streaming;

– Distributed computing;

– Privacy.

β€’ Exact communication complexity bounds.

β€’ Extended formulations lower bounds.

β€’ Parallel repetition?

β€’ …

Page 84

84

Thank You!

Page 85

Open problem: Computability of IC

β€’ Given the truth table of F(X, Y), ΞΌ, and Ξ΅, compute IC(F, ΞΌ, Ξ΅).

β€’ Via IC(F, ΞΌ, Ξ΅) = lim_{nβ†’βˆž} CC(F^n, ΞΌ^n, Ξ΅)/n one can compute a sequence of upper bounds.

β€’ But the rate of convergence as a function of 𝑛 is unknown.

85

Page 86

Open problem: Computability of IC

β€’ Can compute the π‘Ÿ-roundπΌπΆπ‘Ÿ 𝐹, πœ‡, πœ€ information complexity of 𝐹.

β€’ But the rate of convergence as a function of π‘Ÿ is unknown.

β€’ Conjecture:

πΌπΆπ‘Ÿ 𝐹, πœ‡, πœ€ βˆ’ 𝐼𝐢 𝐹, πœ‡, πœ€ = 𝑂𝐹,πœ‡,πœ€1

π‘Ÿ2.

β€’ This is the relationship for the two-bit AND.

86

