Post on 15-Jan-2016

Group Testing and Coding Theory

Atri Rudra University at Buffalo, SUNY

Or, A Theoretical Computer Scientist’s (Biased) View of Group Testing

Group testing overview

Test a soldier for a disease

WWII example: syphilis

Test an army for a disease

What if only one soldier has the disease?

Can we do better?

Communicating with my 2 year old

x → C(x) → y = C(x) + error → x, or give up

“Code” C: “Akash English”

C(x) is a “codeword”

The setup

x → C(x) → y = C(x) + error → x, or give up

Mapping C: error-correcting code, or just code

Encoding: x → C(x)

Decoding: y → x

C(x) is a codeword

The fundamental tradeoff

Correct as many errors as possible with as little redundancy as possible

Can one achieve the “optimal” tradeoff with efficient encoding and decoding?

The main message

Coding Theory

Group Testing

Asymptotic view

n!

10n^2

n^2

O() notation

≤ is O with glasses

poly(n) is O(n^c) for some fixed c

Group testing overview

Test an army for a disease

WWII example: syphilis

What if only one soldier has the disease?

Can pool blood samples and check if at least one soldier has the disease

Group testing

Set of items: (unknown) vector x in {0,1}^n

At most d positives: |x| ≤ d

Tests: a subset S of {1,...,n}

Result of a test: OR of the x_i's such that i is in S

Goal 1: Figure out x

Goal 2: Minimize the number of tests t

Non-adaptive tests: all tests are fixed a priori

[Figure: t × n binary test matrix; rows are tests 1, 2, 3, ..., t and columns are items 1, 2, 3, ..., n; the visible rows begin 1 0 0 1 ..., 0 0 1 0 ..., 0 0 0 1 ..., 1 1 1 0 ...]

t = O(d^2 log n) is possible

Tons of applications

Output the positive (+) items
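
The test model above can be sketched in a few lines of Python. This is only an illustration: the 4 × 4 matrix is the visible corner of the matrix drawn on the slide, and the input vector `x` and the helper name `run_tests` are assumptions made for the example.

```python
def run_tests(M, x):
    """Return the result vector r: r[i] is the OR of x[j] over items j in test i."""
    return [int(any(m_ij and x_j for m_ij, x_j in zip(row, x))) for row in M]

# Toy 4 x 4 test matrix (rows = tests, columns = items); hypothetical example size.
M = [
    [1, 0, 0, 1],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
    [1, 1, 1, 0],
]
x = [0, 0, 1, 0]   # assumed input: only item 3 is positive
r = run_tests(M, x)
print(r)           # [0, 1, 0, 1]
```

Because the tests are non-adaptive, the whole matrix is fixed before any result is seen; the function never looks at earlier results when evaluating a later test.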

The decoding step

[Figure: r = M × x, where the t × n test matrix M is to be designed, the vector x = (x_1, x_2, x_3, ..., x_n) is unknown, and the result vector r = (r_1, r_2, r_3, ..., r_t) is observed]

How fast can this step be done?

An application: heavy hitters

Stream items are numbers in the range {1,…,n}

Output all items that occur at least a 1/d fraction of the time

One pass, poly log space, poly log update time, poly log report time

Cormode-Muthukrishnan idea

Use group testing: maintain counters for each test

Heavy tail property: Total frequency of non-heavy items < 1/d

[Figure: t × n test matrix with one counter per test (row): c_1, c_2, c_3, ..., c_t]

Maintain count of items in tests

Maintain total count m

r_i = 1 iff c_i ≥ m/d

x_j = 1 iff j is a heavy item (|x| ≤ d)

r = M × x

Reporting the heavy items is just decoding!
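
A minimal sketch of the counter scheme, assuming a small hand-picked test matrix and stream (both hypothetical). Note that this toy update loop touches every counter, so it does not achieve the poly log update time the slides ask for; it only shows how the result vector is read off the counters.

```python
def result_vector(stream, M, d):
    """One pass over the stream; returns r with r[i] = 1 iff c_i >= m/d."""
    counters = [0] * len(M)      # c_1, ..., c_t, one per test
    m = 0                        # total count of stream items
    for item in stream:          # items are numbers in {1, ..., n}
        m += 1
        for i, row in enumerate(M):
            if row[item - 1]:
                counters[i] += 1
    # c_i >= m/d, written with integers to avoid floating point
    return [1 if c * d >= m else 0 for c in counters]

# Hypothetical 4 x 6 matrix: column j lists the tests that contain item j+1.
M = [
    [1, 1, 1, 0, 0, 0],
    [1, 0, 0, 1, 1, 0],
    [0, 1, 0, 1, 0, 1],
    [0, 0, 1, 0, 1, 1],
]
stream = [4, 4, 1, 4, 4, 6, 4, 4]     # item 4 is heavy: 6 of 8 occurrences
r = result_vector(stream, M, d=2)
print(r)                              # [0, 1, 1, 0]
```

In this example the non-heavy items account for 2/8 < 1/d of the stream, so the result vector equals the column of item 4, exactly as if item 4 were the single positive in a group test.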

Requirements from group testing

[Figure: the t × n test matrix with counters c_1, c_2, c_3, ..., c_t, as on the previous slide]

Non-adaptiveness is crucial

Minimize t (space)

Strongly explicit matrix

Minimize decoding time (report time)

An overview of results

# tests (t)        Decoding time
O(d^2 log n)       poly(t)         [INR10, NPR11]
O(d^2 log n)       O(nt)           [DR82], [PR08]
O(d^4 log n)       O(t)            [GI04]
O(d^2 log^2 n)     poly(t)         [GI04, implicit]

d is O(log n)

Big savings

Tackling the first row

# tests (t)        Decoding time
O(d^2 log n)       poly(t)         [INR10, NPR11]
O(d^2 log n)       O(nt)           [DR82], [PR08]
O(d^4 log n)       O(t)            [GI04]
O(d^2 log^2 n)     poly(t)         [GI04, implicit]

d-disjunct matrices

Sufficient condition for group testing

[Figure: d columns (the set of positives) and one disjoint column; there exists a row reading 1 in the disjoint column and 0 in all d columns: 1 0 0 0 ... 0]

True for every subset of d columns and a disjoint column

Test result = 0

Every non-positive column has one 0 test result

Naïve decoder for d-disjunct matrices

[Figure: d columns (the set of positives) and a witness row 1 0 0 0 ... 0; the decoder may also be run on only L candidate columns]

If r_j = 0 then for every column i that is in test j, set x_i = 0

If x_i = 1 then all tests column i participates in will have a 1

O(nt) time on all n columns; O(Lt) time on L columns
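
The naïve decoder can be sketched as follows. The 4 × 6 matrix is a hypothetical example chosen for this sketch (its columns are the six 2-element subsets of the four tests, so no column contains another, which makes it 1-disjunct); it does not come from the slides.

```python
def naive_decode(M, r):
    """Start with every item marked positive; a negative test clears all its items."""
    n = len(M[0])
    x = [1] * n
    for row, r_j in zip(M, r):
        if r_j == 0:                 # test j came back negative
            for i in range(n):
                if row[i]:           # column i is in test j
                    x[i] = 0
    return x                         # O(nt) time overall

# Hypothetical 1-disjunct matrix: column j = the tests that contain item j.
M = [
    [1, 1, 1, 0, 0, 0],
    [1, 0, 0, 1, 1, 0],
    [0, 1, 0, 1, 0, 1],
    [0, 0, 1, 0, 1, 1],
]
r = [0, 1, 1, 0]                     # results when only item index 3 is positive
x = naive_decode(M, r)
print(x)                             # [0, 0, 0, 1, 0, 0]
```

With a d-disjunct matrix and at most d positives, every non-positive column appears in some negative test, so the surviving columns are exactly the positives.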

What is known

[Figure: naïve decoding of a d-disjunct matrix from the results r_1, r_2, r_3, ..., r_t recovers the set of positives in O(nt) time]

Strongly explicit d-disjunct matrix with t = O(d^2 log^2 n) [Kautz-Singleton 1964]

Deterministic d-disjunct matrix with t = O(d^2 log n) [Porat-Rothschild 2008]

Lower bound of Ω(d^2 log n/log d) [Dyachkov-Rykov 1982]

Randomized d-disjunct matrix with t = O(d^2 log n) [Dyachkov-Rykov 1982]

Up next

# tests (t)        Decoding time
O(d^2 log n)       poly(t)         [INR10, NPR11]
O(d^2 log n)       O(nt)           [DR82], [PR08]
O(d^4 log n)       O(t)            [GI04]
O(d^2 log^2 n)     poly(t)         [GI04, implicit]

Error-correcting codes

x → C(x) → y → x, or give up

Mapping C : Σ^k → Σ^m

Dimension k, block length m, m ≥ k

Rate R = k/m ≤ 1

Decoding time complexity: efficient means polynomial in m

Noise model

Errors are worst case (Hamming): error locations, arbitrary symbol changes

Limit on total number of errors

Hamming’s 60 yr old observation

[Figure: two codewords at distance ≥ D, each surrounded by a ball of radius D/2]

Large “distance” is good

All you need to remember about Reed-Solomon codes -- Part I

q is a prime power

q^(q/(d+1)) vectors from [q]^q where every two agree in < q/(d+1) positions

How do we get binary codes?

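Concatenation of codes [Forney 66]

C1: ({0,1}^k)^K → ({0,1}^k)^M (Outer code)

C2: {0,1}^k → {0,1}^m (Inner code)

C1 ∘ C2: {0,1}^(kK) → {0,1}^(mM)

Typically k = O(log M)

[Figure: the message x = (x_1, x_2, ..., x_K) is encoded by C1 into C1(x) = (w_1, w_2, ..., w_M); each w_i is then encoded by C2, giving C1 ∘ C2(x) = (C2(w_1), C2(w_2), ..., C2(w_M))]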

Disjunct matrices from RS codes

n = q^(q/(d+1))

Column i gets ith codeword

Code concatenation: each symbol x in [q] is replaced by the length-q binary vector 0 0 1 ... 0 with a single 1 in position x (q rows per symbol)

t = q^2 = O(d^2 log^2 n)

d-disjunct matrix [Kautz, Singleton]

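A q=3 example

[Figure: nine codewords over {0,1,2}, one per column, and their binary images under the map 0 → 100, 1 → 010, 2 → 001]

Codeword symbols (columns 1 to 9):

0 0 0 1 1 1 2 2 2
0 1 2 1 2 0 2 0 1

After replacing each symbol by its binary image:

100 100 100 010 010 010 001 001 001
100 010 001 010 001 100 001 100 010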
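
The construction can be reproduced with a short sketch. It assumes q is prime (so arithmetic mod q is a field), messages of length k = 2, and evaluation at all q points Y = 0, 1, 2; the third evaluation point is an assumption, since the transcript shows only two symbol rows. The function names are hypothetical, and the brute-force disjunctness check is only feasible for tiny matrices.

```python
from itertools import combinations, product

def rs_codewords(q, k):
    """All q**k RS codewords over GF(q), q prime: f(Y) evaluated at Y = 0..q-1."""
    return [[sum(c * pow(y, i, q) for i, c in enumerate(msg)) % q
             for y in range(q)]
            for msg in product(range(q), repeat=k)]

def kautz_singleton(q, k):
    """Replace each symbol by its one-hot vector; returns a q*q by q**k 0/1 matrix."""
    words = rs_codewords(q, k)
    return [[1 if w[pos] == sym else 0 for w in words]
            for pos in range(q) for sym in range(q)]

def is_d_disjunct(M, d):
    """Brute force: no column is covered by the union of d other columns."""
    n = len(M[0])
    cols = [{i for i, row in enumerate(M) if row[j]} for j in range(n)]
    for S in combinations(range(n), d):
        covered = set().union(*(cols[j] for j in S))
        if any(j not in S and cols[j] <= covered for j in range(n)):
            return False
    return True

words = rs_codewords(3, 2)
print([w[0] for w in words])      # [0, 0, 0, 1, 1, 1, 2, 2, 2]
print([w[1] for w in words])      # [0, 1, 2, 1, 2, 0, 2, 0, 1]
M = kautz_singleton(3, 2)
print(len(M), len(M[0]))          # 9 9
print(is_d_disjunct(M, 2))        # True
print(is_d_disjunct(M, 3))        # False
```

The first two printed rows match the symbol rows of the slide's example. Two distinct codewords agree in at most one of the three positions, so two positives can never cover all three 1s of another column, while three positives can.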

1-Agreement between two columns

[Figure: the same q=3 matrices with two columns highlighted; they agree in ≤ 1 position]

Agreement in binary = Agreement among RS codewords < q/(d+1)

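d-disjunct matrices

Sufficient condition for group testing

[Figure: d columns (the set of positives) and one disjoint column; there exists a row reading 1 in the disjoint column and 0 in all d columns: 1 0 0 0 ... 0]

True for every subset of d columns and a disjoint column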

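d-disjunctness of Kautz-Singleton

[Figure: d columns and one disjoint column; each of the d columns agrees with the disjoint column in < q/(d+1) positions]

Rows with a 1 in the disjoint column and 0 in all d columns: > q - q·d/(d+1) > 0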
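
The counting behind this slide can be written out as a short worked inequality. The disjoint column has exactly one 1 per RS position (q ones in total), and each of the d positive columns can cover one of those 1s only at a position where the two codewords agree, which happens in fewer than q/(d+1) positions:

```latex
\#\{\text{positions where none of the } d \text{ columns agrees with the disjoint column}\}
\;>\; q - d\cdot\frac{q}{d+1} \;=\; \frac{q}{d+1} \;>\; 0 .
```

Any such position gives a row with a 1 in the disjoint column and 0 in all d columns, which is the witness row that d-disjunctness requires.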

Up next

# tests (t)        Decoding time
O(d^2 log n)       poly(t)         [INR10, NPR11]
O(d^2 log n)       O(nt)           [DR82], [PR08]
O(d^4 log n)       O(t)            [GI04]
O(d^2 log^2 n)     poly(t)         [GI04, implicit]

The basic idea

[Figure: r = M × x, with the vector x = (x_1, x_2, x_3, ..., x_n) unknown and the result vector r = (r_1, r_2, r_3, ..., r_t) observed]

Every column is a codeword

Show that recovering x from r is the same as “decoding” the code

n = # codewords = exp(m)

t = poly(m)

Decoding

C(x) sent, y received

x ∈ Σ^k, y ∈ Σ^m

How much of y must be correct to recover x?

At least k symbols must be correct

At most (m-k)/m = 1-R fraction of errors

1-R is the information-theoretic limit

ρ: the fraction of errors decoder can handle

Information-theoretic limit implies ρ ≤ 1-R

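x → C(x) → y, R = k/m

Can we get to the limit of 1-R?

Not if we always want to uniquely recover the original message

Limit for unique decoding: (1-R)/2

[Figure: codewords c1 and c2 at distance 1-R, with the received word r at distance (1-R)/2 from each; a second panel marks R, 1-R and (1-R)/2 on the codeword]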
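
A short derivation of where (1-R)/2 comes from, combining the "distance D, radius D/2" picture from the Hamming slide with the Singleton bound D ≤ m - k + 1 (a standard fact that is not stated on the slides). If e errors are to be corrected uniquely, then e < D/2, so

```latex
\frac{e}{m} \;<\; \frac{D}{2m} \;\le\; \frac{m-k+1}{2m}
\;=\; \frac{1-R}{2} + \frac{1}{2m},
\qquad R = \frac{k}{m}.
```

For large block length m the extra 1/(2m) term vanishes, leaving the (1-R)/2 limit for unique decoding.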

List decoding [Elias57, Wozencraft58]

Always insisting on unique codeword is restrictive

The “pathological” cases are rare

“Typical” received word can be decoded beyond (1-R)/2

Better Error-Recovery Model

Output a list of answers

List Decoding

Example: Spell Checker

(1-R)/2

Almost all the space in higher dimension.

All but an exponential (in m) fraction

Information theoretic limit

• ρ < 1 - R: information-theoretic limit

• Can handle twice as many errors

[Figure: fraction of errors (ρ) versus rate (R), with curves for unique decoding and the information-theoretic limit]

Achievable by random codes.

NOT ALGORITHMIC!

Other applications of list decoding

Cryptography

Cryptanalysis of certain block-ciphers [Jakobsen98]

Efficient traitor tracing scheme [Silverberg, Staddon, Walker 03]

Complexity Theory

Hardcore predicates from one way functions [Goldreich, Levin 89; Impagliazzo 97; Ta-Shma, Zuckerman 01]

Worst-case vs. average-case hardness [Cai, Pavan, Sivakumar 99; Goldreich, Ron, Sudan 99; Sudan, Trevisan, Vadhan 99; Impagliazzo, Jaiswal, Kabanets 06]

Other algorithmic applications

IP Traceback [Dean, Franklin, Stubblefield 01; Savage, Wetherall, Karlin, Anderson 00]

Guessing Secrets [Alon, Guruswami, Kaufman, Sudan 02; Chung, Graham, Leighton 01]

Algorithmic list decoding results

ρ = 1 - R - ε for any ε > 0: Folded RS codes [Guruswami, R. 06]

[Figure: fraction of errors (ρ) versus rate (R), with curves for unique decoding, Guruswami-Sudan 98, Parvaresh-Vardy 05, Folded RS, and the information-theoretic limit]

Concatenated codes

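Concatenation of codes [Forney 66]

C1: ({0,1}^k)^K → ({0,1}^k)^M (Outer code)

C2: {0,1}^k → {0,1}^m (Inner code)

C1 ∘ C2: {0,1}^(kK) → {0,1}^(mM)

Typically k = O(log M)

[Figure: the message x = (x_1, x_2, ..., x_K) is encoded by C1 into C1(x) = (w_1, w_2, ..., w_M); each w_i is then encoded by C2, giving C1 ∘ C2(x) = (C2(w_1), C2(w_2), ..., C2(w_M))]

Brute force decoding for inner code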

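List decoding C1 ∘ C2

[Figure: received blocks y_1, y_2, ..., y_M in {0,1}^m are each decoded into lists S_1, S_2, ..., S_M of symbols in {0,1}^k]

How do we “list decode” from lists?

List recovery

[Figure: sets S_1, S_2, S_3, ..., S_M, each S_i a subset of [q] with |S_i| ≤ d, and a codeword (c_1, c_2, c_3, ..., c_M) with c_i in S_i]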

All you need to remember about (Reed-Solomon) codes -- Part II

q is a prime power

q^(q/(d+1)) vectors from [q]^q where every two agree in < q/(d+1) positions

poly(q) time algorithm for list recovery

[Figure: sets S_1, S_2, S_3, ..., S_q, each S_i a subset of [q] with |S_i| ≤ d, and a codeword (c_1, c_2, c_3, ..., c_q)]

Back to the example

[Figure: the q=3 codeword and binary matrices from the earlier example, with the positive (+) items marked]

Result vector: 101, 001, 011

Sets: {1,2}, {2}, {0,2}
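
A sketch of how the result vector turns into a list-recovery instance. It assumes the q = 3, k = 2 code with evaluation points Y = 0, 1, 2 (the third point is an assumption, as the transcript shows only two symbol rows), and it uses brute force over all nine codewords rather than the poly(q) list-recovery algorithm the slides refer to. The order in which the three sets are listed in the transcript differs from the block order used here.

```python
from itertools import product

q, k = 3, 2
# Codeword of message (x0, x1): f(Y) = x0 + x1*Y evaluated at Y = 0, 1, 2 (mod 3).
codewords = {msg: [(msg[0] + msg[1] * y) % q for y in range(q)]
             for msg in product(range(q), repeat=k)}

def result_to_sets(r, q):
    """Split r into q blocks of q bits; block j lists the symbols seen at position j."""
    return [{s for s in range(q) if r[j * q + s]} for j in range(q)]

def list_recover(sets):
    """Brute force: every message whose codeword has its j-th symbol in S_j for all j."""
    return [msg for msg, w in codewords.items()
            if all(w[j] in sets[j] for j in range(q))]

r = [1, 0, 1,  0, 0, 1,  0, 1, 1]     # result vector 101 001 011 from the slide
sets = result_to_sets(r, q)
print(sets)                           # [{0, 2}, {2}, {1, 2}]
positives = list_recover(sets)
print(positives)                      # [(0, 2), (2, 0)]
```

Under these assumptions the two recovered messages have codewords (0, 2, 1) and (2, 2, 2), whose symbol sets per position are exactly the three sets above.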

All you ever needed to know about (Reed-Solomon) codes… at least for this talk

q is a prime power

q^(q/(d+1)) vectors from [q]^q where every two agree in < q/(d+1) positions

poly(q) time algorithm for list recovery

[Figure: sets S_1, S_2, S_3, ..., S_q, each S_i a subset of [q] with |S_i| ≤ d, and a codeword (c_1, c_2, c_3, ..., c_q)]

What does this imply?

[Figure: from the results r_1, r_2, r_3, ..., r_t of the KS matrix, list recovery narrows the columns down to d^2 candidate columns in poly(t) time; the d positive columns are then found among them in O(d^2 t) time]

t = O(d^2 log^2 n)

Implicit in [Guruswami-Indyk 04]

Up next

# tests (t)        Decoding time
O(d^2 log n)       poly(t)         [INR10, NPR11]
O(d^2 log n)       O(nt)           [DR82], [PR08]
O(d^4 log n)       O(t)            [GI04]
O(d^2 log^2 n)     poly(t)         [GI04, implicit]

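Filter-evaluate decoding paradigm

[Figure: a “filtering” matrix with results y_1, y_2, y_3, ..., y_t' is decoded in poly(t') time to L candidate columns; a d-disjunct matrix with results r_1, r_2, r_3, ..., r_t is then decoded naïvely on those L columns in O(Lt) time to find the d positive columns]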
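
The "evaluate" half of the paradigm can be sketched as the naïve decoder restricted to a candidate list. The matrix, the result vector and the candidate list below are hypothetical; in the paradigm the candidates would come from decoding the filtering matrix, which this sketch does not implement.

```python
def evaluate(M, r, candidates):
    """Keep candidate column j only if every test that contains j came back positive."""
    t = len(M)
    return [j for j in candidates
            if all(r[i] for i in range(t) if M[i][j])]   # O(t) work per candidate

# Hypothetical 1-disjunct matrix: columns are the 2-element subsets of the 4 tests.
M = [
    [1, 1, 1, 0, 0, 0],
    [1, 0, 0, 1, 1, 0],
    [0, 1, 0, 1, 0, 1],
    [0, 0, 1, 0, 1, 1],
]
r = [0, 1, 1, 0]                 # results when only column index 3 is positive
candidates = [1, 3, 5]           # assumed output of the filtering stage (L = 3)
found = evaluate(M, r, candidates)
print(found)                     # [3]
```

Only the L candidate columns are inspected, which is where the O(Lt) bound comes from, compared with O(nt) when every column is checked.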

So all we need to do

o(d^2 log n/log d) tests

[Indyk, Ngo, R. 10]

[Ngo, Porat, R. 11]

Overview of the results

# tests (t)        Decoding time
O(d^2 log n)       poly(t)         [INR10, NPR11]
O(d^2 log n)       O(nt)           [DR82], [PR08]
O(d^4 log n)       O(t)            [GI04]
O(d^2 log^2 n)     poly(t)         [GI04, implicit]

The main message

Coding Theory

Group Testing

Open Questions

Close the gap between upper and lower bounds

Other applications of group testing? Complexity Theory?

Strongly explicit construction of optimal disjunct matrices?

More on Coding Theory

http://www.cse.buffalo.edu/~atri/courses/coding-theory/book/index.html

Questions?

The filtering matrix

New* object: (d,L)-list disjunct matrix

[Figure: d columns (the set of positives) among d+L columns]

Running naïve decoder returns ≤ L bogus columns

*Independently considered by [Cheraghchi 09]

(d,d)-list disjunct matrices exist with O(d log n) tests

Reed-Solomon codes

Message: (x_0, x_1, …, x_{k-1}) ∈ F^k

View as poly. f(Y) = x_0 + x_1 Y + … + x_{k-1} Y^(k-1)

Encoding, RS(f) = ( f(1), f(2), …, f(m) ), F = {1,2,…,m}

[Figure: codeword positions f(1), f(2), f(3), f(4), ..., f(m)]

Alphabet size is at least m
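
A minimal encoding sketch, assuming the field is the integers mod a prime p with m ≤ p so that the evaluation points 1, ..., m are distinct. The function name `rs_encode` and the example parameters are hypothetical.

```python
def rs_encode(msg, m, p):
    """Evaluate f(Y) = x0 + x1*Y + ... + x_{k-1}*Y^(k-1) at Y = 1..m over GF(p)."""
    if not 1 <= m <= p:
        raise ValueError("need 1 <= m <= p so the evaluation points are distinct")
    codeword = []
    for y in range(1, m + 1):
        acc = 0
        for coeff in reversed(msg):      # Horner's rule, highest degree first
            acc = (acc * y + coeff) % p
        codeword.append(acc)
    return codeword

c1 = rs_encode([1, 2], m=5, p=7)         # f(Y) = 1 + 2Y
c2 = rs_encode([3, 0], m=5, p=7)         # f(Y) = 3
print(c1)                                # [3, 5, 0, 2, 4]
print(c2)                                # [3, 3, 3, 3, 3]
print(sum(a == b for a, b in zip(c1, c2)))   # 1
```

Two distinct polynomials of degree below k agree on at most k - 1 points, which is why the two codewords above (k = 2) agree in exactly one position. This small agreement is the property the disjunct-matrix construction relies on.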

Revisiting the decoding algorithm

[Figure: the result vector r is split into q blocks, one per RS position 1, 2, ..., j, ..., q; each block is a d-disjunct matrix decoded by the naïve decoder, and block j yields a set S_j of symbols with |S_j| ≤ d]

Works but hits a d^3 barrier

Connection to List Recovery

[Figure: the result vector r is split into q blocks, one per RS position 1, 2, ..., j, ..., q; the 1 entries of block j give the set S_j of symbols at position j, with |S_j| ≤ d]

Decoding: Output all codewords that match the test results

List recover from S_1,…,S_q to get the positive codewords

Revisiting the decoding algorithm-II

[Figure: as before, but each block is a (d,d)-list disjunct matrix decoded by the naïve decoder, so block j yields a set S_j with |S_j| ≤ 2d]

Need to change the parameters of the Reed-Solomon codes a bit.

http://www.impawards.com/2007/are_we_done_yet.html

How we get our hands on…

[Figure: q blocks, one per position 1, 2, ..., j, ..., q of an RS codeword; each block is a (d,d)-list disjunct matrix with d log q rows]

n ~ q^(q/d)

t = q × (d log q)

~ (d × log n/log q) × (d log q)

= d^2 log n

Solution 1 [Indyk, Ngo, R. 10]

(d,d)-list disjunct matrix with d log q rows in each of the q blocks

Pick “inner” codes at random

Solution 2 [Ngo, Porat, R. 11]

(d,d)-list disjunct matrix with d log q rows in each of the q blocks

Use explicit expanders!

Some comments:

Left degree of the expander not important

d^(1+o(1)) log q rows possible [GUV 07, Cheraghchi 09]

Use PV codes instead of RS codes