Locally Decodable Codes
Uri Nadav
Contents
• What is a Locally Decodable Code (LDC)?
• Constructions
• Lower Bounds
• Reduction from Private Information Retrieval (PIR) to LDC
Minimum Distance
Every x ≠ y satisfy d(C(x),C(y)) ≥ δ
• Error correction problem is solvable for less than δ/2 errors
• Error detection problem is solvable for less than δ errors
[Figure: ball of radius δ/2 around a codeword]
Error-correction
Encoding: x (input) → C(x) (codeword)
Errors: C(x) → y (corrupted codeword), under a worst-case error assumption
Decoding: (y, i) → x[i], where i is the bit to decode and x[i] is the decoded bit
Query Complexity
• Number of indices decoder is allowed to read from (corrupted) codeword
• Decoding can be done with query complexity Ω(|C(x)|)
• We are interested in constant query complexity
Adversarial Model
We can view the error model as an adversary that chooses positions to destroy, and has access to the decoding/encoding scheme (but not to the random coins)
The adversary is allowed to insert at most δm errors
Why not decode in blocks?
Adversary is worst case so it can destroy more than δ fraction of some blocks, and less from others.
[Figure: "nice" errors are spread across the blocks; in the worst case, many errors fall in the same block]
Ideal Code C: {0,1}^n → Σ^m
Constant information rate: n/m > c
Resilient against constant fraction of errors (linear minimum distance)
Efficient Decoding (constant query complexity)
No Such Code!
Definition of LDC
C: {0,1}^n → Σ^m is a (q,δ,ε) locally decodable code if there exists a prob. algorithm A such that:
∀x ∈ {0,1}^n, ∀y ∈ Σ^m with distance d(y,C(x)) < δm, and ∀i ∈ {1,..,n}: Pr[ A(y,i) = x_i ] > ½ + ε
A reads at most q indices of y (of its choice)
The probability is over the coin tosses of A
Queries are not allowed to be adaptive
A must be probabilistic if q < δm
A has oracle access to y
Example: Hadamard Code
• Hadamard is a (2, δ, ½ − 2δ) LDC
• Construction:
source word: x_1, x_2, …, x_n
Encoding ↓
codeword: <x,1>, <x,2>, …, <x,2^n − 1>
Relative minimum distance ½
Example: Hadamard Code, Reconstruction
source word: x_1, x_2, …, x_n
codeword: <x,1>, <x,2>, …, <x,2^n − 1>
Decoding x_i with 2 queries:
• Pick a ∈_R {0,1}^n
• Query the codeword at <x,a> and <x,a+e_i>, where e_i = (0,…,0,1,0,…,0) has its 1 in the i’th entry
• Reconstruction formula: x_i = <x,a> + <x,a+e_i>
If less than δ fraction of errors, then the reconstruction probability is at least 1−2δ
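The two-query reconstruction can be sketched in a few lines of Python. This is a minimal illustration, not the presenter's code: words are stored as n-bit integers, bits are numbered from 0, the trivial position a = 0 is included in the codeword for simplicity, and the names `inner`, `hadamard_encode` and `decode_bit` are made up for this example.

```python
import random

def inner(x, a):
    """Inner product <x, a> over GF(2); x and a are n-bit integers."""
    return bin(x & a).count("1") % 2

def hadamard_encode(x, n):
    """Codeword of length 2**n: position a holds <x, a>."""
    return [inner(x, a) for a in range(2 ** n)]

def decode_bit(y, i, n, rng=random):
    """Recover bit i of x with 2 queries to the (possibly corrupted) word y."""
    a = rng.randrange(2 ** n)          # uniformly random a in {0,1}^n
    return y[a] ^ y[a ^ (1 << i)]      # <x,a> + <x,a+e_i> = x_i over GF(2)

if __name__ == "__main__":
    n, x = 4, 0b1011
    y = hadamard_encode(x, n)
    print([decode_bit(y, i, n) for i in range(n)])  # bits of x, lowest first
```

Each query on its own is uniformly distributed, so a single corrupted position is hit with probability at most 2/2^n per decoding attempt, which is where the 1−2δ bound comes from.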
Another Construction…
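x is arranged as a √n × √n matrix
The codeword is a 2^√n × 2^√n matrix, indexed by subsets A ⊆ {1,...,√n} and B ⊆ {1,...,√n}
Entry (A,B) of the codeword is ⊕_{i∈A, j∈B} x_{i,j}
Reconstruction of bit x_{i,j}: choose A and B uniformly at random, query the four entries
1) (A, B)  2) (A⊕{i}, B)  3) (A, B⊕{j})  4) (A⊕{i}, B⊕{j})
and XOR the answers (⊕{i} adds i to the set if it is absent and removes it if it is present)
Probability of 1−4δ for correct decoding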
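A small Python sketch of this four-query scheme, under the assumption that the message is an s × s bit matrix (s = √n) and that subsets A, B are represented as s-bit masks. The function names are hypothetical and the encoder is deliberately naive.

```python
import random

def encode(x):
    """x is an s-by-s list of bits; returns a 2**s by 2**s codeword matrix.
    Entry [A][B] is the XOR of x[i][j] over i in A and j in B (A, B bitmasks)."""
    s = len(x)

    def entry(A, B):
        bit = 0
        for i in range(s):
            if (A >> i) & 1:
                for j in range(s):
                    if (B >> j) & 1:
                        bit ^= x[i][j]
        return bit

    return [[entry(A, B) for B in range(2 ** s)] for A in range(2 ** s)]

def decode_bit(y, i, j, s, rng=random):
    """Recover x[i][j] with 4 queries to the (possibly corrupted) codeword y."""
    A = rng.randrange(2 ** s)
    B = rng.randrange(2 ** s)
    Ai, Bj = A ^ (1 << i), B ^ (1 << j)   # symmetric difference with {i}, {j}
    return y[A][B] ^ y[Ai][B] ^ y[A][Bj] ^ y[Ai][Bj]

if __name__ == "__main__":
    x = [[1, 0, 1], [0, 1, 1], [1, 1, 0]]
    y = encode(x)
    print(decode_bit(y, 0, 2, 3))  # x[0][2]
```

The XOR of the four entries cancels every term except x[i][j], and each of the four queried positions is individually uniform, so a union bound over the queries gives the 1−4δ success probability.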
Generalization…
x is arranged as an n^(1/k) × … × n^(1/k) cube (k dimensions)
The codeword is a 2^(n^(1/k)) × … × 2^(n^(1/k)) cube
2^k queries, m = 2^(k·n^(1/k))
Smoothly Decodable Code
C: {0,1}^n → Σ^m is a (q,c,ε) smoothly decodable code if there exists a prob. algorithm A such that:
1. ∀x ∈ {0,1}^n and ∀i ∈ {1,..,n}: Pr[ A(C(x),i) = x_i ] > ½ + ε
   The probability is over the coin tosses of A. A has access to a non-corrupted codeword.
2. A reads at most q indices of C(x) (of its choice). Queries are not allowed to be adaptive.
3. ∀i ∈ {1,..,n} and ∀j ∈ {1,..,m}: Pr[ A(·,i) reads j ] ≤ c/m
   The event is: A reads index j of C(x) to reconstruct index i
LDC is also Smooth Code
Claim: Every (q,δ,ε) LDC is a (q,q/δ,ε) smooth code.
Intuition: if the code is resilient against a linear number of errors, then no position of the codeword can be queried too often (or else the adversary will choose it)
Proof: LDC is Smooth
A: a reconstruction algorithm for a (q,δ,ε) LDC
S_i = { j | Pr[A queries j] > q/(δm) }   (the set of indices read ‘too’ often)
There are at most q queries, so the sum of the probabilities over j is at most q, thus |S_i| < δm
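The counting step behind |S_i| < δm can be written out as a one-line derivation. Here p_j is shorthand, introduced only for this note, for Pr[A queries j]; the strict inequality applies when S_i is non-empty, and the conclusion is trivial otherwise.

```latex
\[
q \;\ge\; \sum_{j=1}^{m} p_j \;\ge\; \sum_{j \in S_i} p_j \;>\; |S_i| \cdot \frac{q}{\delta m}
\quad\Longrightarrow\quad |S_i| < \delta m .
\]
```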
Proof: LDC is Smooth
A’ uses A as a black box and returns whatever A returns as x_i
A’ gives A oracle access to a corrupted codeword C(x)’, returning only indices not in S_i:
[C(x)’]_j = 0 if j ∈ S_i, and C(x)_j otherwise
A reconstructs x_i with probability at least 1/2 + ε, because there are at most |S_i| < δm errors
A’ is a (q, q/δ, ε) smooth decoding algorithm
Proof: LDC is Smooth
[Figure: A wants C(x) but gets C(x)’ from A’; the indices that A reads too often are fixed arbitrarily (to 0) by A’]
Smooth Code is LDC
• A bit can be reconstructed using q uniformly distributed queries, with ε advantage , when no errors
• With probability at least (1−qδ), all the queries are to non-corrupted indices.
Remember: Adversary does not know decoding procedure’s random coins
Lower Bounds
• Non existence for q = 1 [KT]
• Non linear rate for q ≥ 2 [KT]
• Exponential rate for linear code, q=2 [Goldreich et al]
• Exponential rate for every code, q=2 [Kerenidis,de Wolf] (using quantum arguments)
Information Theory basics
• Entropy: H(x) = −∑ Pr[x=i] log(Pr[x=i])
• Mutual Information: I(x,y) = H(x) − H(x|y)
Information Theory cont…
• The entropy of multiple variables is at most the sum of their entropies (with equality when all variables are mutually independent):
H(x1x2…xn) ≤ ∑ H(xi)
• Highest entropy is of a uniformly distributed random variable.
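A quick numeric check of these facts, as a sketch only: distributions are represented as dictionaries from outcomes to probabilities, and the helper names `entropy` and `marginal` are chosen for this example.

```python
from collections import defaultdict
from math import log2

def entropy(dist):
    """Shannon entropy (in bits) of a distribution {outcome: probability}."""
    return -sum(p * log2(p) for p in dist.values() if p > 0)

def marginal(joint, k):
    """Marginal distribution of coordinate k of a joint distribution over tuples."""
    m = defaultdict(float)
    for outcome, p in joint.items():
        m[outcome[k]] += p
    return dict(m)

if __name__ == "__main__":
    independent = {(0, 0): 0.25, (0, 1): 0.25, (1, 0): 0.25, (1, 1): 0.25}
    correlated = {(0, 0): 0.5, (1, 1): 0.5}
    for name, joint in [("independent", independent), ("correlated", correlated)]:
        total = sum(entropy(marginal(joint, k)) for k in range(2))
        print(name, entropy(joint), "<=", total)
```

For two independent uniform bits the joint entropy equals the sum of the marginal entropies; for two perfectly correlated bits it is strictly smaller.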
IT result from [KT]
Thm: Let C: {0,1}^n → R be a function. Assume there exists an algorithm A s.t.
∀i ∈ [n], Pr[A(C(x),i) = x_i] > 1/2 + ε
(prob. taken over all random coins of A, and all x)
then
log(|R|) ≥ (1 − H(1/2 + ε))·n
Proof
• I(x, C(x)) ≤ H(C(x)) ≤ log(|R|)
• I(x, C(x)) = H(x) − H(x | C(x)) ≥ H(x) − ∑_{i=1}^{n} H(x_i | C(x)) ≥ (1 − H(1/2 + ε))·n
Combined …
log(|R|) ≥ (1 − H(1/2 + ε))·n
Single query (q=1)
Claim: If C: {0,1}^n → Σ^m is (1,δ,ε) locally decodable then:
n ≤ log|Σ| / (δ·(1 − H(1/2 + ε)))
No such family of codes!
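To get a feel for the single-query bound n ≤ log|Σ| / (δ·(1 − H(1/2 + ε))), the right-hand side can be evaluated numerically. The sketch below assumes base-2 logarithms and 0 < ε ≤ 1/2; the function names are illustrative.

```python
from math import log2

def binary_entropy(p):
    """H(p) = -p log p - (1-p) log(1-p), in bits, with H(0) = H(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)

def max_message_length(sigma_size, delta, eps):
    """Upper bound on n for a (1, delta, eps) LDC over an alphabet of the given size."""
    return log2(sigma_size) / (delta * (1 - binary_entropy(0.5 + eps)))

if __name__ == "__main__":
    for eps in (0.1, 0.25, 0.5):
        print(eps, max_message_length(2, 0.1, eps))
```

The bound does not depend on the codeword length m, so for fixed Σ, δ and ε the message length n is bounded by a constant, which is why no family of single-query codes exists.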
Good Index
Index j is said to be ‘good’ for i, if Pr[A(C(x),i) = x_i | A reads j] > ½ + ε
Single query (q=1)
There exists at least one index j1 which is good for i.
1/2 + ε < Pr_x[A(C(x),i) = x_i]   (by definition of LDC)
= ∑_{j∈{1..m}} Pr_x[A(C(x),i) = x_i | A(·,i) reads j] · Pr[A(·,i) reads j]   (conditional prob., summing over disjoint events)
Perturbation Vector
Def: Perturbation vector Δ_{j1,j2,…} takes random values uniformly distributed from Σ in positions j1,j2,… and 0 otherwise.
[Figure: a vector that is 0 everywhere except positions j1, j2, …, which are drawn uniformly from Σ]
Destroys the specified indices in the most unpredictable way
Adding perturbation
1/2 + ε < Pr_x[A(C(x) ⊕ Δ_{j1}, i) = x_i]   (A is resilient against at least 1 error)
= ∑_{j∈{1..m}} Pr_x[A(C(x) ⊕ Δ_{j1}, i) = x_i | A(·,i) reads j] · Pr[A(·,i) reads j]
So, there exists at least one index j2 ‘good’ for i.
j2 ≠ j1, because j1 cannot be good!
Single query (q=1)
∑_{j∈[m]} Pr_x[A(C(x) ⊕ Δ_{j1,j2,…,jδm}, i) = x_i | A(·,i) reads j] · Pr[A(·,i) reads j] > 1/2 + ε   (A is resilient against δm errors)
So, there are at least δm indices of the codeword ‘good’ for every i.
By the pigeonhole principle, there exists an index j’ in {1..m} that is ‘good’ for δn indices.
Single query (q=1)
Think of C(x[1..δn]) projected on j’ as a function from the δn indices of the input. The range is Σ, and each bit of the input can be reconstructed w.p. ½ + ε. Thus by the IT result:
n ≤ log(|Σ|) / (δ·(1 − H(1/2 + ε)))
Case q≥2
m = Ω(n^(q/(q−1)))
Constant time reconstruction procedures are impossible for codes having constant rate!
Case q≥2 Proof Sketch
• An LDC C is also smooth
• A q-query smooth code has a small enough subset of indices that still encodes a linear amount of information
• So, by the IT result, m^((q−1)/q) = Ω(n)
Applications?
• Better locally decodable codes have applications to PIR
• Applications to the practice of fault-tolerant data storage/transmission?
What about Locally Encodable
• A ‘Respectable Code’ is resilient against Ω(m) errors.
• We expect a bit of the encoding to depend on many bits of the input
Otherwise, there exists an input bit which influences less than a 1/n fraction of the encoding.
Open Issues
• Adaptive vs Non-Adaptive Queries
  guess the first q−1 answers, with success probability |Σ|^(−(q−1))
• Closing the gap
Logarithmic number of queries
• View message as a polynomial p: F^k → F of degree d (F is a field, |F| >> d)
• Encode message by evaluating p at all |F|^k points
• To encode an n-bit message, can have |F| polynomial in n, and d, k around polylog(n)
To reconstruct p(x)
• Pick a random line in F^k passing through x;
• evaluate p on d+1 points of the line;
• by interpolation, find the degree-d univariate polynomial that agrees with p on the line
• Use the interpolated polynomial to estimate p(x)
• Algorithm reads p in d+1 points, each uniformly distributed
[Figure: the points x, x+y, x+2y, …, x+(d+1)y on a line through x]
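The line-interpolation decoder can be sketched as follows. This is an illustrative sketch under stated assumptions: the field is the integers modulo a small prime P with P > d+1, the polynomial is stored as a dictionary from exponent tuples to coefficients, and the oracle is any function returning the (possibly corrupted) codeword symbol at a point. The modular inverse via `pow(den, -1, P)` needs Python 3.8 or later.

```python
import random

P = 13  # prime field size, assumed larger than d + 1

def evaluate(poly, point):
    """Evaluate a multivariate polynomial {exponent tuple: coefficient} mod P."""
    total = 0
    for exps, coeff in poly.items():
        term = coeff
        for v, e in zip(point, exps):
            term = term * pow(v, e, P) % P
        total = (total + term) % P
    return total

def local_decode(oracle, x, d, rng=random):
    """Estimate p(x) from d+1 oracle queries on a random line through x."""
    y = [rng.randrange(P) for _ in x]            # random direction in F^k
    ts = range(1, d + 2)                         # points x + t*y, t = 1..d+1
    values = [oracle(tuple((xi + t * yi) % P for xi, yi in zip(x, y))) for t in ts]
    # Lagrange interpolation of g(t) = p(x + t*y), evaluated at t = 0.
    result = 0
    for t, v in zip(ts, values):
        num, den = 1, 1
        for u in ts:
            if u != t:
                num = num * (-u) % P
                den = den * (t - u) % P
        result = (result + v * num * pow(den, -1, P)) % P
    return result

if __name__ == "__main__":
    poly = {(2, 1): 3, (0, 1): 5, (0, 0): 7}     # 3*a^2*b + 5*b + 7, degree 3
    x = (4, 9)
    print(local_decode(lambda pt: evaluate(poly, pt), x, 3), evaluate(poly, x))
```

The decoder never queries x itself; for t ≠ 0 and a uniform direction y, each point x + t·y is uniform over F^k, which is what makes the queries individually uniform.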
Private Information Retrieval (PIR)
• Query a public database, without revealing the queried record.
• Example: A broker needs to query the NASDAQ database about a stock, but does not want anyone to know which stock he is interested in.
PIR
A k server PIR scheme of one round, for database length n consists of:
• k query functions Q_1,…,Q_k : [n] × {0,1}^(l_rnd) → {0,1}^(l_q)
• k answer functions A_1,…,A_k : {0,1}^n × {0,1}^(l_q) → {0,1}^(l_a)
• a reconstruction function R : [n] × {0,1}^(l_rnd) × ({0,1}^(l_a))^k → {0,1}
PIR – definition
• These functions should satisfy:
Correctness: ∀x ∈ {0,1}^n, i ∈ [n], r ∈ {0,1}^(l_rnd):
R(i, r, A_1(x, Q_1(i,r)), …, A_k(x, Q_k(i,r))) = x_i
Privacy: For every i, j ∈ [n], s ∈ [k] and q ∈ {0,1}^(l_q):
Pr[Q_s(i,r) = q] = Pr[Q_s(j,r) = q].
Simple Construction of PIR
• 2 servers, one round
• Each server holds bits x_1,…,x_n.
• To request bit i, choose uniformly a subset A of [n]
• Send first server A.
• Send second server A+{i} (add i to A if it is not there, remove it if it is there)
• Each server returns the XOR of the bits in the indices of its request S in [n].
• XOR the answers.
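The steps above can be sketched in Python. Indices are numbered from 0, both servers are simulated in one process, and the function names are hypothetical.

```python
import random

def make_queries(n, i, rng=random):
    """Return the two requests: a uniform subset A of [n], and A with i toggled."""
    A = {j for j in range(n) if rng.random() < 0.5}
    return A, A ^ {i}            # set symmetric difference adds or removes i

def server_answer(db, subset):
    """What a server returns: the XOR of the database bits at the requested indices."""
    bit = 0
    for idx in subset:
        bit ^= db[idx]
    return bit

def retrieve(db, i, rng=random):
    """Run the whole protocol locally and reconstruct bit i."""
    q1, q2 = make_queries(len(db), i, rng)
    return server_answer(db, q1) ^ server_answer(db, q2)

if __name__ == "__main__":
    db = [1, 0, 1, 1, 0, 0, 1]
    print([retrieve(db, i) for i in range(len(db))])  # reproduces db
```

Each request on its own is a uniformly random subset of [n], so a single server learns nothing about i; the price is that each request costs n bits of communication.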
Lower Bounds On Communication Complexity
• To achieve privacy in the case of a single server, we need an n-bit message.
• (not too far from the one-round 2-server scheme we suggested).
Reduction from PIR to LDC
• A codeword is a Concatenation of all possible answers from both servers
• A query procedure is made of 2 queries to the database
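As an illustration of the reduction, the sketch below applies it to the simple subset-XOR 2-server scheme: the codeword lists each server's answer to every possible request, and decoding a bit reads one position from each server's block. The database is stored as an n-bit integer, requests are n-bit masks, bits are numbered from 0, and all names are made up for this example.

```python
import random

def answer(db_bits, query_mask):
    """Server answer in the subset-XOR scheme: XOR of the database bits selected by the mask."""
    return bin(db_bits & query_mask).count("1") % 2

def pir_codeword(db_bits, n):
    """Concatenate all possible answers of server 1, then all possible answers of server 2."""
    per_server = [answer(db_bits, q) for q in range(2 ** n)]
    return per_server + per_server   # both servers use the same answer function here

def decode_bit(codeword, i, n, rng=random):
    """Two queries: request a in server 1's block, request a with i toggled in server 2's block."""
    block = 2 ** n
    a = rng.randrange(block)
    return codeword[a] ^ codeword[block + (a ^ (1 << i))]

if __name__ == "__main__":
    n, x = 4, 0b0110
    c = pir_codeword(x, n)
    print([decode_bit(c, i, n) for i in range(n)])  # bits of x, lowest first
```

For this particular scheme each server's block is exactly a Hadamard codeword of x, so the resulting 2-query LDC coincides with the Hadamard example.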