Group Testing and Coding Theory

Atri Rudra University at Buffalo, SUNY

Or, A Theoretical Computer Scientist’s (Biased) View of Group Testing

Group testing overview

Test soldier for a disease

WWII example: syphilis

Test an army for a disease

What if only one soldier has the disease?

Can we do better?

Communicating with my 2 year old

[Figure: message x is encoded as C(x); the receiver sees y = C(x) + error and either recovers x or gives up]

“Code” C: “Akash English”

C(x) is a “codeword”

The setup

[Figure: message x is encoded as C(x); the receiver sees y = C(x) + error and either recovers x or gives up]

Mapping C: error-correcting code, or just code

Encoding: x → C(x)

Decoding: y → x

C(x) is a codeword

The fundamental tradeoff

Correct as many errors as possible with as little redundancy as possible

Can one achieve the “optimal” tradeoff with efficient encoding and decoding ?

The main message

Coding Theory

Group Testing

Asymptotic view

O() notation

≤ is O with glasses

poly(n) is O(n^c) for some fixed c

Test an army for a disease

What if only one soldier has the disease?

Can pool blood samples and check if at least one soldier has the disease
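The pooling idea already gives a large saving when exactly one soldier is infected. The sketch below is only an illustration under stated assumptions: it uses an adaptive strategy (each pool is chosen after seeing earlier results, unlike the non-adaptive tests discussed later), soldiers are numbered 0..n-1, and `pooled_test` is a hypothetical stand-in for the blood test.

```python
def find_single_positive(infected, n):
    """Locate the one infected soldier among 0..n-1 using pooled tests."""
    tests_used = 0

    def pooled_test(group):
        # One blood test on a pooled sample: positive iff the pool
        # contains the infected soldier.
        nonlocal tests_used
        tests_used += 1
        return infected in group

    lo, hi = 0, n  # the infected soldier is known to lie in [lo, hi)
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if pooled_test(range(lo, mid)):
            hi = mid      # positive pool: search the left half
        else:
            lo = mid      # negative pool: search the right half
    return lo, tests_used


soldier, tests = find_single_positive(infected=37, n=1000)
print(soldier, tests)
```

For n = 1000 this uses at most 10 pooled tests instead of 1000 individual ones, since each test halves the range that can still contain the infected soldier.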

Group testing

Set of items: (Unknown) vector x in {0,1}^n

At most d positives: |x| ≤ d

Tests: a subset S of {1,..,n}

Result of a test: OR of the x_i’s such that i in S

Goal 1: Figure out x

Goal 2: Minimize the number of tests t

Non-adaptive tests: all tests are fixed a priori
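The definitions above can be made concrete with a tiny non-adaptive instance. This is a minimal sketch: the 4 x 4 matrix reuses the example rows shown on the slides, truncated to n = 4 items (an assumption), and `run_tests` is a hypothetical helper name.

```python
def run_tests(M, x):
    """Return r, where r[i] is the OR of x[j] over the items j in test i."""
    return [int(any(m_ij and x_j for m_ij, x_j in zip(row, x))) for row in M]


# Four tests (rows) over n = 4 items (columns); M[i][j] = 1 iff item j is in test i.
M = [
    [1, 0, 0, 1],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
    [1, 1, 1, 0],
]
x = [0, 0, 1, 0]   # the third item is the only positive
r = run_tests(M, x)
print(r)           # [0, 1, 0, 1]
```

All rows of M are fixed before any result is seen, which is what makes the scheme non-adaptive.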

[Figure: t × n binary test matrix with columns indexed by the items 1, 2, 3, …, n and one row per test, e.g. rows 1 0 0 1 …, 0 0 1 0 …, 0 0 0 1 …, 1 1 1 0 …]

t = O(d^2 log n) is possible

Tons of applications

Output: + (positive) items

The decoding step

[Figure: r = M × x. The test matrix M is to be designed, x is unknown, and the result vector r is observed.]

How fast can this step be done?

An application: heavy hitters

Stream items are numbers in the range {1,…,n}

Output all items that occur at least a 1/d fraction of the time

One pass, poly log space, poly log update, poly log report

Cormode-Muthukrishnan idea

Use group testing: maintain counters for each test

Heavy tail property: Total frequency of non-heavy items < 1/d

[Figure: test matrix M with one column per item 1, 2, 3, …, n and one row per test]

Maintain count of items in tests

Maintain total count m

r_i = 1 iff c_i ≥ m/d

x_j = 1 iff j is a heavy item (|x| ≤ d)

r = M × x

Reporting the heavy items is just decoding!

Requirements from group testing

[Figure: test matrix M with one column per item 1, 2, 3, …, n and one row per test]

Non-adaptiveness is crucial

Minimize t (space)

Strongly explicit matrix

Minimize decoding time (report time)

An overview of results

# tests (t)        Decoding time    Reference
O(d^2 log n)       poly(t)          [INR10, NPR11]
O(d^2 log n)       O(nt)            [DR82], [PR08]
O(d^4 log n)       O(t)             [GI04]
O(d^2 log^2 n)     poly(t)          [GI04, implicit]

d is O(log n)

Big savings

Tackling the first row

d-disjunct matrices

Sufficient condition for group testing

[Figure: d columns (the set of positives) next to a disjoint column; there exists a row 1 0 0 0 … 0 with a 1 in the disjoint column and 0 in each of the d columns]

True for every subset of d columns and a disjoint column

Test result = 0

Every non-positive column has one 0 test result
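The disjunctness condition can be checked by brute force on small matrices. This is a sketch for intuition only: it enumerates every choice of d columns, so its running time is exponential in d, and both `is_d_disjunct` and the 4 x 6 example matrix are illustrative assumptions rather than anything from the talk.

```python
from itertools import combinations


def is_d_disjunct(M, d):
    """True iff no column's support is covered by the union of d other columns."""
    t, n = len(M), len(M[0])
    supports = [{i for i in range(t) if M[i][j]} for j in range(n)]
    for j in range(n):
        others = [c for c in range(n) if c != j]
        for group in combinations(others, d):
            covered = set().union(*(supports[c] for c in group))
            if supports[j] <= covered:   # no row has a 1 in column j and 0 in the group
                return False
    return True


# Columns are the six 2-element subsets of the four tests.
M = [
    [1, 1, 1, 0, 0, 0],
    [1, 0, 0, 1, 1, 0],
    [0, 1, 0, 1, 0, 1],
    [0, 0, 1, 0, 1, 1],
]
print(is_d_disjunct(M, 1), is_d_disjunct(M, 2))   # True False
```

The example is 1-disjunct because no 2-element set contains another, but it is not 2-disjunct: the first column's tests {0, 1} are covered by the second and fourth columns together.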

Naïve decoder for d-disjunct matrices

[Figure: the d columns of the set of positives, among L columns, with a row 1 0 0 0 … 0]

If r_j = 0 then for every column i that is in test j, set x_i = 0

If x_i = 1 then all tests column i participates in will have a 1

O(nt) time; O(Lt) time on L columns
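The two rules above translate directly into code. A minimal sketch, assuming the matrix is given as a list of rows; the 4 x 6 example matrix (columns are the 2-element subsets of the four tests, which makes it 1-disjunct) is an illustrative assumption.

```python
def naive_decode(M, r):
    """Naive decoder: O(n*t) time for a t x n matrix M (list of rows) and results r."""
    n = len(M[0])
    x = [1] * n                  # start by assuming every item is positive
    for row, r_j in zip(M, r):
        if r_j == 0:             # a negative test clears every item it contains
            for i, in_test in enumerate(row):
                if in_test:
                    x[i] = 0
    return x


M = [
    [1, 1, 1, 0, 0, 0],
    [1, 0, 0, 1, 1, 0],
    [0, 1, 0, 1, 0, 1],
    [0, 0, 1, 0, 1, 1],
]
r = [0, 1, 1, 0]                 # results when the fourth item is the only positive
print(naive_decode(M, r))        # [0, 0, 0, 1, 0, 0]
```

Correctness relies on disjunctness: every non-positive column appears in at least one negative test, so it is cleared, while a positive column never appears in a negative test.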

What is known

[Figure: d-disjunct matrix; the set of positives (d columns) is decoded in O(nt) time]

Strongly explicit d-disjunct matrix with t = O(d^2 log^2 n) [Kautz-Singleton 1964]

Deterministic d-disjunct matrix with t = O(d^2 log n) [Porat-Rothschild 2008]

Lower bound of Ω(d^2 log n / log d) [Dyachkov-Rykov 1982]

Randomized d-disjunct matrix with t = O(d^2 log n) [Dyachkov-Rykov 1982]

Up next

Error-correcting codes

[Figure: the decoder outputs x or gives up]

Mapping C: Σ^k → Σ^m

Dimension k, block length m, m ≥ k

Rate R = k/m ≤ 1

Decoding time complexity: efficient means polynomial in m

Noise model

Errors are worst case (Hamming): error locations, arbitrary symbol changes

Limit on total number of errors

Hamming’s 60 yr old observation

Large “distance” is good

All you need to remember about Reed-Solomon codes– Part I

q is a prime power

q^(q/(d+1)) vectors from [q]^q where every two agree in < q/(d+1) positions
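The agreement property can be checked numerically on a toy instance. This is a sketch under assumptions: q is taken to be prime (so arithmetic mod q is a field; general prime powers are not handled), k plays the role of q/(d+1), and the codewords are the evaluations of all polynomials of degree below k at the points 0..q-1.

```python
from itertools import combinations, product


def rs_codewords(q, k):
    """Evaluations over Z_q (q prime) of every polynomial of degree < k at 0..q-1."""
    return [
        tuple(sum(c * pow(a, e, q) for e, c in enumerate(coeffs)) % q for a in range(q))
        for coeffs in product(range(q), repeat=k)
    ]


q, k = 5, 2
code = rs_codewords(q, k)
max_agreement = max(
    sum(u == v for u, v in zip(c1, c2)) for c1, c2 in combinations(code, 2)
)
print(len(code), max_agreement)   # 25 1
```

There are q^k = 25 codewords, and two distinct ones agree in at most k - 1 = 1 position, because the difference of two distinct polynomials of degree below k has fewer than k roots.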

How do we get binary codes ?

Concatenation of codes [Forney 66]

C1: ({0,1}^k)^K → ({0,1}^k)^M (Outer code)

C2: {0,1}^k → {0,1}^m (Inner code)

C1 ∘ C2: {0,1}^(kK) → {0,1}^(mM)

Typically k = O(log M)

C1(x) = (w1, w2, …, wM)

C1 ∘ C2(x) = (C2(w1), C2(w2), …, C2(wM))

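Disjunct matrices from RS codes

n = q^(q/(d+1))

Column i gets the ith codeword

[Figure: each symbol x in [q] is replaced by the length-q binary vector 0 0 1 … 0 whose single 1 is in position x (q rows per symbol)]

t = q^2 = O(d^2 log^2 n)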

d-disjunct matrix [Kautz,Singleton]

Code Concatenation

A q=3 example

1-Agreement between two columns: ≤ 1 agr

Agreement in binary = Agreement among RS codewords

< q/(d+1)
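The q = 3 case can be reproduced in a few lines. This is a sketch under assumptions: q is prime, the outer code consists of polynomials of degree below k = 2, and the inner code maps each symbol to a unit vector of length q; `kautz_singleton` is a hypothetical helper name.

```python
from itertools import combinations, product


def kautz_singleton(q, k):
    """t x n binary matrix with t = q*q rows and n = q**k columns (q prime)."""
    polys = list(product(range(q), repeat=k))
    M = [[0] * len(polys) for _ in range(q * q)]
    for j, coeffs in enumerate(polys):
        for a in range(q):   # a-th RS symbol of codeword j
            symbol = sum(c * pow(a, e, q) for e, c in enumerate(coeffs)) % q
            M[a * q + symbol][j] = 1   # unit vector with its 1 at position `symbol`
    return M


M = kautz_singleton(3, 2)    # 9 tests, 9 items
columns = list(zip(*M))
overlap = max(
    sum(u & v for u, v in zip(c1, c2)) for c1, c2 in combinations(columns, 2)
)
print(len(M), len(columns), overlap)   # 9 9 1
```

Every column has exactly q ones, and two columns share a 1 exactly where their RS codewords agree, so the maximum overlap of 1 matches the agreement bound for degree-1 polynomials.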

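d-disjunct matrices

Sufficient condition for group testing

[Figure: d columns (the set of positives) next to a disjoint column; there exists a row 1 0 0 0 … 0 with a 1 in the disjoint column and 0 in each of the d columns]

True for every subset of d columns and a disjoint column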

d-disjunctness of Kautz-Singleton

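[Figure: d columns, each with < q/(d+1) agr with the disjoint column]

> q - q*d/(d+1) > 0 rows where the disjoint column has a 1 and all d columns have 0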
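The counting behind this slide can be written out as a short derivation. The notation is assumed for illustration: c_j is the RS codeword of the disjoint column, c_{i_1}, …, c_{i_d} are the codewords of the d columns, and agr(·,·) counts the positions where two codewords agree.

```latex
\[
\#\{a : c_j(a) \in \{c_{i_1}(a), \dots, c_{i_d}(a)\}\}
\;\le\; \sum_{\ell=1}^{d} \operatorname{agr}(c_j, c_{i_\ell})
\;<\; d \cdot \frac{q}{d+1},
\]
\[
\#\{a : c_j(a) \notin \{c_{i_1}(a), \dots, c_{i_d}(a)\}\}
\;>\; q - \frac{q\,d}{d+1} \;=\; \frac{q}{d+1} \;>\; 0 .
\]
```

For any position a counted in the second line, the row indexed by (a, c_j(a)) has a 1 in column j and a 0 in each of the d columns, which is exactly the row that d-disjunctness asks for.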

Up next

The basic idea

[Figure: r = M × x. Every column of M is a codeword, x is unknown, and the result vector r is observed.]

Show that recovering x from r is the same as `decoding’ the code

n= # codewords = exp(m)

t = poly(m)

Decoding

C(x) sent, y received

x ∈ Σ^k, y ∈ Σ^m

How much of y must be correct to recover x?

At least k symbols must be correct

At most (m-k)/m = 1-R fraction of errors

1-R is the information-theoretic limit

ρ: the fraction of errors decoder can handle

Information theoretic limit implies ρ ≤ 1-R

[Figure: x → C(x) → y, with R = k/m]

Can we get to the limit of 1-R?

Not if we always want to uniquely recover the original message

Limit for unique decoding: (1-R)/2
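The (1-R)/2 limit can be traced back to the distance of the code. The derivation below relies on two standard facts that the slides do not state explicitly, so treat them as assumptions here: a code with minimum distance Δ can be uniquely decoded only from fewer than Δ/2 errors, and the Singleton bound gives Δ ≤ m - k + 1.

```latex
\[
\#\text{errors} \;<\; \frac{\Delta}{2} \;\le\; \frac{m-k+1}{2}
\quad\Longrightarrow\quad
\frac{\#\text{errors}}{m} \;<\; \frac{1-R}{2} + \frac{1}{2m},
\qquad R = \frac{k}{m}.
\]
```

With Δ/2 or more errors, a received word can be equally close to two codewords, so no decoder can always return the unique original message. As m grows, the 1/(2m) term vanishes and the bound becomes (1-R)/2.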

List decoding [Elias57, Wozencraft58]

Always insisting on unique codeword is restrictive

The “pathological” cases are rare

“Typical” received word can be decoded beyond (1-R)/2

Better Error-Recovery Model

Output a list of answers

List Decoding

Example: Spell Checker

Almost all the space in higher dimension.

All but an exponentially small (in m) fraction

Information theoretic limit

• ρ < 1 - R: information-theoretic limit

• Can handle twice as many errors

[Plot: unique decoding and inf. theoretic limit curves against rate (R)]

Achievable by random codes.

NOT ALGORITHMIC!

Other applications of list decoding

Cryptography

Cryptanalysis of certain block-ciphers [Jakobsen 98]

Efficient traitor tracing scheme [Silverberg, Staddon, Walker 03]

Complexity Theory

Hardcore predicates from one way functions [Goldreich, Levin 89; Impagliazzo 97; Ta-Shma, Zuckerman 01]

Worst-case vs. average-case hardness [Cai, Pavan, Sivakumar 99; Goldreich, Ron, Sudan 99; Sudan, Trevisan, Vadhan 99; Impagliazzo, Jaiswal, Kabanets 06]

Other algorithmic applications

IP Traceback [Dean, Franklin, Stubblefield 01; Savage, Wetherall, Karlin, Anderson 00]

Guessing Secrets [Alon, Guruswami, Kaufman, Sudan 02; Chung, Graham, Leighton 01]

Algorithmic list decoding results

1 - R - ε for any ε > 0: Folded RS codes [Guruswami, R. 06]

[Plot: unique decoding, Guruswami-Sudan 98, Parvaresh-Vardy 05, Folded RS, and inf. theoretic limit curves against rate (R)]

Concatenated codes

Concatenation of codes [Forney 66]

C1: ({0,1}^k)^K → ({0,1}^k)^M (Outer code)

C2: {0,1}^k → {0,1}^m (Inner code)

C1 ∘ C2: {0,1}^(kK) → {0,1}^(mM)

Typically k = O(log M)

C1(x) = (w1, w2, …, wM)

C1 ∘ C2(x) = (C2(w1), C2(w2), …, C2(wM))

List decoding C1 ∘ C2

Received word: y1, y2, …, yM, each in {0,1}^m

Brute force decoding for inner code

Lists S1, S2, …, SM, each with elements in {0,1}^k

How do we “list decode” from lists?

List recovery

Input: S1, S2, S3, …, SM, where each Si is a subset of [q] and |Si| ≤ d

Output: codewords (c1, c2, c3, …, cM) with ci in Si

All you need to remember about (Reed-Solomon) codes-- Part II

q is a prime power

poly(q) time algorithm for list recovery

[Figure: lists S1, S2, S3, …, Sq, each Si a subset of [q] with |Si| ≤ d, and a codeword (c1, c2, c3, …, cq) with ci in Si]
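To make the list recovery task concrete, here is a brute-force version for tiny parameters. It is explicitly not the poly(q) time algorithm mentioned on the slide: it enumerates all q^k polynomials, assumes q is prime, and requires agreement with the lists in every position; `list_recover` is a hypothetical helper name.

```python
from itertools import product


def list_recover(lists, q, k):
    """All polynomials p of degree < k over Z_q with p(a) in lists[a] for every a."""
    found = []
    for coeffs in product(range(q), repeat=k):
        word = [sum(c * pow(a, e, q) for e, c in enumerate(coeffs)) % q
                for a in range(q)]
        if all(sym in S for sym, S in zip(word, lists)):
            found.append(coeffs)
    return found


q, k = 5, 2
# Lists built from the codewords of 1 + 2x and 3 + x, so |S_a| <= d = 2.
lists = [{1, 3}, {3, 4}, {0}, {2, 1}, {4, 2}]
print(list_recover(lists, q, k))   # [(1, 2), (3, 1)]
```

In the group testing setting, one natural reading is that S_a collects the symbols v for which test (a, v) came back positive, so the recovered codewords are the candidate positive columns.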

Back to the example

[Figure labels: + items, result vector]

All you ever needed to know about (Reed-Solomon) codes…at least for this talk

q is a prime power

poly(q) time algorithm for list recovery

[Figure: lists S1, S2, S3, …, Sq, each Si a subset of [q] with |Si| ≤ d, and a codeword (c1, c2, c3, …, cq) with ci in Si]

d^2 columns

What does this imply?

[Figure: KS matrix; set of positives (d columns), row 1 0 0 0 … 0]

poly(t) time

O(d^2 t) time

t = O(d^2 log^2 n)

Implicit in [Guruswami-Indyk 04]

Up next

Filter-evaluate decoding paradigm

“Filtering” matrix: poly(t’) time, leaves L columns

d-disjunct matrix: O(Lt) time, leaves the set of positives (d columns)
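The evaluate half of this paradigm is the naive decoder run on the surviving candidates only. A minimal sketch, assuming the filtering stage is given: the `candidates` list stands in for its output (it must contain every positive), and the small 1-disjunct matrix is an illustrative assumption.

```python
def evaluate(columns, r, candidates):
    """Naive decoder restricted to the candidate columns: O(L*t) for L candidates."""
    return [j for j in candidates
            if all(r[i] for i, bit in enumerate(columns[j]) if bit)]


# columns[j] is column j of a 1-disjunct 4 x 6 matrix (all 2-subsets of 4 tests).
columns = [(1, 1, 0, 0), (1, 0, 1, 0), (1, 0, 0, 1),
           (0, 1, 1, 0), (0, 1, 0, 1), (0, 0, 1, 1)]
r = [0, 1, 1, 0]         # results when item 3 is the only positive
candidates = [1, 3, 5]   # hypothetical output of the filtering stage (L = 3)
print(evaluate(columns, r, candidates))   # [3]
```

Only the L candidate columns are inspected, so the cost drops from O(nt) to O(Lt) provided the filtering stage itself is fast.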

So all we need to do

o(d^2 log n / log d) tests

[Indyk, Ngo, R. 10]

[Ngo, Porat, R. 11]

Overview of the results

The main message

Coding Theory

Group Testing

Open Questions

Close the gap between upper and lower bounds

Other applications of group testing? Complexity Theory?

Strongly explicit construction of optimal disjunct matrices ?
