locally decodable codes

Locally Decodable Codes

Uri Nadav

Contents

• What is a Locally Decodable Code (LDC)?

• Constructions

• Lower Bounds

• Reduction from Private Information Retrieval (PIR) to LDC

Minimum Distance

For every x ≠ y: d(C(x),C(y)) ≥ δ

• Error correction problem is solvable for less than δ/2 errors

• Error Detection problem is solvable for less than δ errors

[Figure: radius δ/2 around a codeword]

Error-correction

[Figure: input x → Encoding → codeword C(x) → Errors (worst case error assumption) → corrupted codeword y → Decoding, given the bit to decode i → decoded bit x[i]]

Query Complexity

• Number of indices decoder is allowed to read from (corrupted) codeword

• Decoding can always be done with query complexity O(|C(x)|), by reading the whole codeword

• We are interested in constant query complexity

Adversarial Model

We can view the error model as an adversary that chooses which positions to destroy, and has access to the decoding/encoding scheme (but not to the decoder's random coins)

The adversary is allowed to insert at most δm errors

Why not decode in blocks?

The adversary is worst case, so it can destroy more than a δ fraction of some blocks, and less of others.

[Figure: 'nice' errors versus the worst case, with many errors in the same block]

Ideal Code $C:\{0,1\}^n \to \Sigma^m$

Constant information rate: n/m > c

Resilient against constant fraction of errors (linear minimum distance)

Efficient Decoding (constant query complexity)

No Such Code!

Definition of LDC

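$C:\{0,1\}^n \to \Sigma^m$ is a $(q,\delta,\varepsilon)$ locally decodable code if there exists a prob. algorithm A such that:

$\forall x \in \{0,1\}^n$, $\forall y \in \Sigma^m$ with distance $d(y,C(x)) < \delta m$, and $\forall i \in \{1,\dots,n\}$: $\Pr[A(y,i) = x_i] > \tfrac12 + \varepsilon$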

A reads at most q indices of y (of its choice)

The Probability is over the coin tosses of A

Queries are not allowed to be adaptive

A must be probabilistic if q < δm

A has oracle access to y

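Example: Hadamard Code

• Hadamard is a (2,δ, ½ -2δ) LDC

• Construction: the source word $x = (x_1, x_2, \dots, x_n)$ is encoded as the codeword $(\langle x,1\rangle, \langle x,2\rangle, \dots, \langle x,2^n-1\rangle)$, where $\langle x,a\rangle$ is the inner product mod 2 of x with the n-bit representation of a

• Relative minimum distance ½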

Example: Hadamard Code Reconstruction

• Pick $a \in_R \{0,1\}^n$

• 2 queries: $\langle x,a\rangle$ and $\langle x,a+e_i\rangle$, where $e_i = (0,\dots,0,1,0,\dots,0)$ has its 1 in the i’th entry

• Reconstruction formula: $x_i = \langle x,a\rangle + \langle x,a+e_i\rangle$

• If less than δ fraction of errors, then reconstruction probability is at least 1-2δ
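The two-query reconstruction can be sketched in Python as follows. This is a minimal illustration, not the author's code: the function names are hypothetical, positions are indexed by the integer a whose bits select coordinates of x, and position a = 0 is included (the slide starts at 1) purely to keep indexing simple.

```python
import random

def inner_product_mod2(x_bits, a):
    """<x, a> over GF(2); bit k of the integer a selects x_bits[k]."""
    return sum(bit for k, bit in enumerate(x_bits) if (a >> k) & 1) % 2

def hadamard_encode(x_bits):
    """Position a of the codeword holds <x, a>.

    Assumption: a ranges over 0 .. 2^n - 1, so a = 0 is stored as well.
    """
    n = len(x_bits)
    return [inner_product_mod2(x_bits, a) for a in range(2 ** n)]

def hadamard_decode_bit(y, n, i, rng=random):
    """Two-query decoder: x_i = y[a] + y[a + e_i] (mod 2) for a random a."""
    a = rng.randrange(2 ** n)
    return y[a] ^ y[a ^ (1 << i)]  # XOR with 1 << i adds e_i to a over GF(2)

x = [1, 0, 1, 1]
codeword = hadamard_encode(x)
decoded = [hadamard_decode_bit(codeword, len(x), i) for i in range(len(x))]
print(decoded == x)
```

Each of the two queried positions is uniformly distributed on its own, so with a δ fraction of corrupted positions a union bound says both answers are correct with probability at least 1-2δ.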

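Another Construction…

• View the source word x as a $\sqrt{n} \times \sqrt{n}$ matrix

• The codeword is a $2^{\sqrt{n}} \times 2^{\sqrt{n}}$ matrix, with one entry for every pair of subsets $A, B \subseteq \{1,\dots,\sqrt{n}\}$

• The entry at position (A, B) is $\bigoplus_{i \in A,\, j \in B} x_{i,j}$

Reconstruction of bit $x_{i,j}$: XOR the entries at 1) A,B 2) A⊕{i},B 3) A,B⊕{j} 4) A⊕{i},B⊕{j}

Probability of 1-4δ for correct decoding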
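A small Python sketch of this four-query scheme is given below. It assumes the message is given as a square 0/1 matrix and represents the subsets A and B as bitmasks; the helper names are hypothetical and chosen for illustration only.

```python
import random

def encode_matrix_code(x):
    """Entry (A, B) of the codeword is the XOR of x[i][j] over i in A, j in B."""
    s = len(x)                      # s plays the role of sqrt(n)
    size = 2 ** s
    code = [[0] * size for _ in range(size)]
    for A in range(size):
        for B in range(size):
            bit = 0
            for i in range(s):
                if (A >> i) & 1:
                    for j in range(s):
                        if (B >> j) & 1:
                            bit ^= x[i][j]
            code[A][B] = bit
    return code

def decode_entry(code, s, i, j, rng=random):
    """XOR of the four entries (A,B), (A+{i},B), (A,B+{j}), (A+{i},B+{j})."""
    A = rng.randrange(2 ** s)
    B = rng.randrange(2 ** s)
    A2 = A ^ (1 << i)               # toggle membership of i in A
    B2 = B ^ (1 << j)               # toggle membership of j in B
    return code[A][B] ^ code[A2][B] ^ code[A][B2] ^ code[A2][B2]

x = [[1, 0, 1],
     [0, 1, 1],
     [1, 1, 0]]
code = encode_matrix_code(x)
recovered = [[decode_entry(code, 3, i, j) for j in range(3)] for i in range(3)]
print(recovered == x)
```

The four XORed entries cancel every term except x[i][j], because the entry at (A, B) is bilinear over GF(2) in the indicator vectors of A and B.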

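Generalization…

• View x as an $n^{1/k} \times \dots \times n^{1/k}$ cube (k dimensions)

• The codeword is a $2^{n^{1/k}} \times \dots \times 2^{n^{1/k}}$ cube

• $2^k$ queries, $m = 2^{k n^{1/k}}$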

Smoothly Decodable Code

$C:\{0,1\}^n \to \Sigma^m$ is a $(q,c,\varepsilon)$ smoothly decodable code if there exists a prob. algorithm A such that:

1. $\forall x \in \{0,1\}^n$ and $\forall i \in \{1,\dots,n\}$: $\Pr[A(C(x),i) = x_i] > \tfrac12 + \varepsilon$. The probability is over the coin tosses of A, and A has access to a non-corrupted codeword.

2. A reads at most q indices of C(x) (of its choice). Queries are not allowed to be adaptive.

3. $\forall i \in \{1,\dots,n\}$ and $\forall j \in \{1,\dots,m\}$: $\Pr[A(\cdot,i) \text{ reads } j] \le c/m$. The event is: A reads index j of C(x) to reconstruct index i.

LDC is also Smooth Code

Claim: Every (q,δ,ε) LDC is a (q,q/δ,ε) smooth code.

Intuition – If the code is resilient against a linear number of errors, then no position of the codeword can be queried too often (or else the adversary will choose to corrupt it)

Proof: LDC is Smooth

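A - a reconstruction algorithm for a (q,δ,ε) LDC

$S_i = \{\, j \mid \Pr[A(\cdot,i) \text{ queries } j] > q/(\delta m) \,\}$ (the set of indices read ‘too’ often)

There are at most q queries, so the sum of the probabilities over j is at most q. Each j in $S_i$ contributes more than $q/(\delta m)$ to that sum, thus $|S_i| < \delta m$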

Proof: LDC is Smooth

A’ – uses A as a black box, returns whatever A returns as $x_i$

A’ gives A oracle access to a corrupted codeword C(x)’, returning true values only for indices not in $S_i$:

$[C(x)']_j = 0$ if $j \in S_i$, and $[C(x)']_j = C(x)_j$ otherwise

A reconstructs $x_i$ with probability at least 1/2 + ε, because there are at most $|S_i| < \delta m$ errors

A’ is a (q,q/δ, ε) smooth decoding algorithm
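The counting step and the masking step of this proof can be checked numerically with a short Python sketch. The query distribution below is a made-up example for a hypothetical 2-query decoder over m = 10 positions; only the threshold q/(δm) and the masking rule come from the proof.

```python
def heavy_indices(query_prob, q, delta):
    """S_i = { j : Pr[A reads j] > q / (delta * m) }."""
    m = len(query_prob)
    threshold = q / (delta * m)
    return {j for j, p in enumerate(query_prob) if p > threshold}

def masked_view(codeword, heavy):
    """What A' shows to A: 0 on the heavy indices, the true symbol elsewhere."""
    return [0 if j in heavy else c for j, c in enumerate(codeword)]

# Hypothetical decoder: positions 0 and 1 are read far more often than the rest.
q, delta = 2, 0.5
query_prob = [0.6, 0.5] + [0.9 / 8] * 8     # probabilities sum to q = 2
S = heavy_indices(query_prob, q, delta)
view = masked_view([1] * len(query_prob), S)

print(sorted(S), len(S) < delta * len(query_prob))
print(view)
```

Because A’ never needs to read an index in S, every index it does read is read with probability at most q/(δm), which is the smoothness bound with c = q/δ.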

Proof: LDC is Smooth

[Figure: A wants C(x) but gets C(x)’, in which the indices that A reads too often have been fixed arbitrarily (to 0) by A’]

Smooth Code is LDC

• With no errors, a bit can be reconstructed with ε advantage using q uniformly distributed queries

• With probability at least (1-qδ), all the queries are to non-corrupted indices.

Remember: Adversary does not know decoding procedure’s random coins

Lower Bounds

• Non existence for q = 1 [KT]

• Non linear rate for q ≥ 2 [KT]

• Exponential rate for linear code, q=2 [Goldreich et al]

• Exponential rate for every code, q=2 [Kerenidis,de Wolf] (using quantum arguments)

Information Theory basics

• Entropy: $H(x) = -\sum_i \Pr[x=i] \log(\Pr[x=i])$

• Mutual Information: $I(x,y) = H(x) - H(x|y)$

Information Theory cont…

• The entropy of multiple variables is at most the sum of their entropies (with equality when all variables are mutually independent):

H(x1x2…xn) ≤ ∑ H(xi)

• Highest entropy is of a uniformly distributed random variable.
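These facts can be checked on small distributions with a Python sketch such as the one below. The two joint distributions are made-up examples, entropies are measured in bits (log base 2), and the helper names are hypothetical.

```python
import math
from itertools import product

def entropy(dist):
    """H = -sum p log2 p for a dict mapping outcomes to probabilities."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

def marginal(joint, k):
    """Distribution of coordinate k of a joint distribution over tuples."""
    out = {}
    for outcome, p in joint.items():
        out[outcome[k]] = out.get(outcome[k], 0.0) + p
    return out

def mutual_information(joint):
    """I(x,y) = H(x) - H(x|y), computed as H(x) + H(y) - H(x,y)."""
    return entropy(marginal(joint, 0)) + entropy(marginal(joint, 1)) - entropy(joint)

# Two independent uniform bits, and two bits that are always equal.
independent = {(a, b): 0.25 for a, b in product((0, 1), repeat=2)}
correlated = {(0, 0): 0.5, (1, 1): 0.5}

for name, joint in (("independent", independent), ("correlated", correlated)):
    total = entropy(marginal(joint, 0)) + entropy(marginal(joint, 1))
    print(name, entropy(joint), "<=", total, "I =", mutual_information(joint))
```

For the independent pair the joint entropy equals the sum of the marginals, while for the correlated pair it is strictly smaller and the gap is exactly the mutual information.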

IT result from [KT]

Thm: Let $C:\{0,1\}^n \to R$ be a function. Assume there exists an algorithm A s.t. $\forall i \in [n]$, $\Pr_x[A(C(x),i) = x_i] > 1/2 + \varepsilon$ (prob' taken over all random coins of A, and all x). Then

$\log(|R|) \ge (1 - H(1/2 + \varepsilon))\, n$

Single query (q=1)

Claim: If $C:\{0,1\}^n \to \Sigma^m$ is (1,δ,ε) locally decodable then:

$n \le \dfrac{\log|\Sigma|}{\delta\,(1 - H(1/2 + \varepsilon))}$

No such family of codes!

Good Index

Index j is said to be ‘good’ for i, if $\Pr_x[A(C(x),i) = x_i \mid A \text{ reads } j] > \tfrac12 + \varepsilon$

Single query (q=1)

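There exists at least one index j1 which is good for i:

$\tfrac12 + \varepsilon < \Pr_x[A(C(x),i) = x_i] = \sum_{j \in \{1..m\}} \Pr_x[A(C(x),i) = x_i \mid A(\cdot,i) \text{ reads } j] \cdot \Pr[A(\cdot,i) \text{ reads } j]$

The inequality holds by definition of LDC; the equality is conditional prob., summing over disjoint events. If no index were good, every conditional probability would be at most 1/2 + ε, and so would their weighted average.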

Perturbation Vector

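Def: Perturbation vector $\Delta_{j_1,j_2,\dots}$ takes random values uniformly distributed from ∑ in positions $j_1, j_2, \dots$ and 0 otherwise.

[Figure: a vector that is 0 everywhere except at positions j1 and j2, whose values are drawn uniformly from ∑]

Destroys the specified indices in the most unpredictable way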

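Adding perturbation

$\tfrac12 + \varepsilon < \Pr_x[A(C(x) \oplus \Delta_{j_1}, i) = x_i] = \sum_{j \in \{1..m\}} \Pr_x[A(C(x) \oplus \Delta_{j_1}, i) = x_i \mid A(\cdot,i) \text{ reads } j] \cdot \Pr[A(\cdot,i) \text{ reads } j]$

The inequality holds because A is resilient against at least 1 error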

So, there exists at least one index, j2 ‘good’ for i.

j2 ≠ j1 , because j1 can not be good!

Single query (q=1)

$\sum_{j \in [m]} \Pr_x[A(C(x) \oplus \Delta_{j_1,j_2,\dots,j_{\delta m}}, i) = x_i \mid A(\cdot,i) \text{ reads } j] \cdot \Pr[A(\cdot,i) \text{ reads } j] > \tfrac12 + \varepsilon$

The inequality holds because A is resilient against δm errors

So, there are at least δm indices of the codeword ‘good’ for every i.

By the pigeonhole principle, there exists an index j’ in {1..m} that is ‘good’ for δn indices.

Single query (q=1)

Think of C(x[1..δn]) projected on j’ as a function of the δn indices of the input. The range is ∑, and each bit of the input can be reconstructed w.p. ½ + ε. Thus by the IT result:

$n \le \dfrac{\log(|\Sigma|)}{\delta\,(1 - H(1/2 + \varepsilon))}$

Case q≥2

$m = \Omega(n^{q/(q-1)})$

Constant time reconstruction procedures are impossible for codes having constant rate!

Case q≥2 Proof Sketch

• An LDC C is also smooth

• A q-query smooth code has a small enough subset of indices that still encodes a linear amount of information

• So, by the IT result, $m^{(q-1)/q} = \Omega(n)$

Applications?

• Better locally decodable codes have applications to PIR

• Applications to the practice of fault-tolerant data storage/transmission?

What about Locally Encodable?

• A ‘Respectable Code’ is resilient against Ω(m) errors.

• We expect a bit of the encoding to depend on many bits of the input

Otherwise, there exists an input bit which influences less than a 1/n fraction of the encoding.

Open Issues

• Adaptive vs Non-Adaptive Queries

• Closing the gap

Guess the first q-1 answers, with success probability $|\Sigma|^{-(q-1)}$

Logarithmic number of queries

• View message as a polynomial $p: F^k \to F$ of degree d (F is a field, |F| >> d)

• Encode message by evaluating p at all $|F|^k$ points

• To encode an n-bit message, can have |F| polynomial in n, and d, k around polylog(n)

To reconstruct p(x)

• Pick a random line in $F^k$ passing through x

• Evaluate p on d+1 points of the line

• By interpolation, find the degree-d univariate polynomial that agrees with p on the line

• Use the interpolated polynomial to estimate p(x)

• Algorithm reads p in d+1 points, each uniformly distributed

[Figure: the points x, x+y, x+2y, …, x+(d+1)y on the line]
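The line-interpolation decoder can be sketched in Python as below. This is an illustration under stated assumptions: the field is the prime field with P = 13 elements, the polynomial and its degree are made-up examples, the codeword is stored as a dictionary of evaluations, and all function names are hypothetical. It requires Python 3.8 or later for the modular inverse via `pow(den, -1, P)`.

```python
import random
from itertools import product

P = 13  # small prime, so arithmetic mod P is a field (illustrative choice)

def evaluate(poly, point):
    """poly maps exponent tuples to coefficients; returns p(point) in F_P."""
    total = 0
    for exponents, coeff in poly.items():
        term = coeff
        for value, e in zip(point, exponents):
            term = term * pow(value, e, P) % P
        total = (total + term) % P
    return total

def interpolate_at_zero(ts, values):
    """Lagrange interpolation: value at t = 0 of the polynomial through (ts, values)."""
    result = 0
    for k, (tk, vk) in enumerate(zip(ts, values)):
        num, den = 1, 1
        for l, tl in enumerate(ts):
            if l != k:
                num = num * (-tl) % P
                den = den * (tk - tl) % P
        result = (result + vk * num * pow(den, -1, P)) % P
    return result

def local_decode(oracle, x, d, rng=random):
    """Read the codeword at x + t*y for t = 1..d+1 and interpolate back to t = 0."""
    y = [rng.randrange(P) for _ in x]          # random direction of the line
    ts = list(range(1, d + 2))                 # d+1 distinct nonzero parameters (needs d+1 < P)
    values = [oracle(tuple((xi + t * yi) % P for xi, yi in zip(x, y))) for t in ts]
    return interpolate_at_zero(ts, values)

# Example message polynomial of total degree 3 in k = 2 variables: 3*x1^2*x2 + 5*x2 + 7.
poly = {(2, 1): 3, (0, 1): 5, (0, 0): 7}
table = {pt: evaluate(poly, pt) for pt in product(range(P), repeat=2)}  # the codeword
oracle = table.__getitem__

print(local_decode(oracle, (4, 9), d=3) == evaluate(poly, (4, 9)))
```

The point x itself is never read: each query x + t·y with t ≠ 0 is uniformly distributed over the whole space when y is uniform, which is what makes the decoder tolerant to a small fraction of corrupted positions.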

Private Information Retrieval (PIR)

• Query a public database, without revealing the queried record.

• Example: A broker needs to query the NASDAQ database about a stock, but doesn’t want anyone to know which stock he is interested in.

PIR

A k-server PIR scheme of one round, for database length n, consists of:

• k query functions $Q_1,\dots,Q_k : [n] \times \{0,1\}^{l_{rnd}} \to \{0,1\}^{l_q}$

• k answer functions $A_1,\dots,A_k : \{0,1\}^n \times \{0,1\}^{l_q} \to \{0,1\}^{l_a}$

• a reconstruction function $R : [n] \times \{0,1\}^{l_{rnd}} \times (\{0,1\}^{l_a})^k \to \{0,1\}$

PIR – definition

• These functions should satisfy:

Correctness: $\forall x \in \{0,1\}^n,\ i \in [n],\ r \in \{0,1\}^{l_{rnd}}$: $R(i, r, A_1(x, Q_1(i,r)), \dots, A_k(x, Q_k(i,r))) = x_i$

Privacy: For every $i, j \in [n]$, $s \in [k]$ and $q \in \{0,1\}^{l_q}$: $\Pr_r[Q_s(i,r) = q] = \Pr_r[Q_s(j,r) = q]$

Simple Construction of PIR

• 2 servers, one round

• Each server holds bits x1,…, xn.

• To request bit i, choose uniformly a subset A of [n]

• Send the first server A.

• Send the second server A+{i} (add i to A if it is not there, remove it if it is there)

• Each server returns the XOR of the bits at the indices of its request S in [n].

• XOR the answers.
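The scheme above can be sketched in a few lines of Python. The function names and the example database are hypothetical, indices are 0-based, and both "servers" are simulated by the same function acting on the same database.

```python
import random

def server_answer(database, subset):
    """A server returns the XOR of the database bits at the requested indices."""
    answer = 0
    for j in subset:
        answer ^= database[j]
    return answer

def make_queries(n, i, rng=random):
    """Uniform subset A for the first server, and A with i toggled for the second."""
    A = {j for j in range(n) if rng.getrandbits(1)}
    return A, A ^ {i}               # set symmetric difference adds or removes i

def retrieve(database, i, rng=random):
    """User side: send one query to each server and XOR the two answers."""
    q1, q2 = make_queries(len(database), i, rng)
    return server_answer(database, q1) ^ server_answer(database, q2)

database = [1, 0, 0, 1, 1, 0, 1, 0]
print(all(retrieve(database, i) == database[i] for i in range(len(database))))
```

Each server on its own sees a uniformly random subset of [n] regardless of i, so neither query reveals the requested index as long as the servers do not compare notes.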

Lower Bounds On Communication Complexity

• To achieve privacy in the case of a single server, we need n bits of communication.

• (not too far from the one round 2 server scheme we suggested).

Reduction from PIR to LDC

• A codeword is the concatenation of all possible answers from both servers

• A query procedure is made of 2 queries to the database
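As a hedged illustration of this reduction, the Python sketch below applies it to the simple 2-server subset-XOR scheme: the codeword lists each server's answer to every possible query, and the decoder reads one answer from each half. Queries are encoded as bitmasks and all names are hypothetical; for this particular scheme each half of the codeword coincides with a Hadamard-style encoding of the database.

```python
import random

def answer(database, mask):
    """Server answer to query `mask`: XOR of the bits x_j with bit j of mask set."""
    bit = 0
    for j, x_j in enumerate(database):
        if (mask >> j) & 1:
            bit ^= x_j
    return bit

def pir_to_codeword(database):
    """Concatenate the answers of server 1 and server 2 to all 2^n possible queries."""
    n = len(database)
    table = [answer(database, mask) for mask in range(2 ** n)]
    return table + table            # both servers compute the same answer function

def decode_bit(codeword, n, i, rng=random):
    """Two queries to the codeword, mirroring the two PIR queries A and A + {i}."""
    mask = rng.randrange(2 ** n)
    first = codeword[mask]                          # server 1's answer to A
    second = codeword[2 ** n + (mask ^ (1 << i))]   # server 2's answer to A + {i}
    return first ^ second

x = [0, 1, 1, 0, 1]
codeword = pir_to_codeword(x)
print([decode_bit(codeword, len(x), i) for i in range(len(x))] == x)
```

The privacy of the PIR scheme is what makes each of the two codeword positions uniformly distributed within its half, which is the smoothness property needed for local decoding.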
