Sublinear


DESCRIPTION

Sublinear algorithms. Sloan Digital Sky Survey: 4 petabytes (~1MG), 10 petabytes/yr. Biomedical imaging: 150 petabytes/yr. Massive input, output: sample a tiny fraction. Approximate MST [CRT ’01]: optimal! - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: Sublinear
Page 2: Sublinear

Sloan Digital Sky Survey: 4 petabytes (~1MG), 10 petabytes/yr

Biomedical imaging: 150 petabytes/yr

Page 3: Sublinear
Page 4: Sublinear
Page 5: Sublinear

massive input → output

Sublinear algorithms: sample a tiny fraction

Page 6: Sublinear
Page 7: Sublinear

Approximate MST [CRT ’01]

Page 8: Sublinear

Reduces to counting connected components

Page 9: Sublinear

E[estimator] = no. of connected components

var(estimator) << (no. of connected components)²
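The estimator on the slides above can be sketched in a few lines: every vertex u contributes 1/|C(u)| to the component count, so the mean of n/|C(u)| over random vertices u is an unbiased estimate, and truncating each search keeps the work sublinear. This is a minimal Python sketch of the sampling idea behind [CRT ’01], not the full approximate-MST algorithm; the function name and the `samples`/`cap` parameters are illustrative, not from the talk.

```python
import random
from collections import deque

def estimate_components(adj, n, samples=400, cap=100):
    """Estimate the number of connected components by sampling:
    every vertex u contributes 1/|C(u)| to the count, so the mean
    of n/|C(u)| over random vertices u equals the number of
    components.  Truncating each BFS at roughly `cap` vertices
    bounds the work per sample (simplified [CRT '01]-style sketch)."""
    total = 0.0
    for _ in range(samples):
        u = random.randrange(n)
        seen = {u}
        queue = deque([u])
        while queue and len(seen) < cap:   # give up on huge components
            v = queue.popleft()
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    queue.append(w)
        total += 1.0 / len(seen)           # exact 1/|C(u)| if |C(u)| < cap
    return n * total / samples
```

For the MST connection: with integer weights in {1, …, w}, the MST cost equals n − w + the sum over i < w of the number of components of the subgraph of edges of weight ≤ i, so estimating component counts at each threshold estimates the MST weight.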

Page 10: Sublinear

Shortest Paths [CLM ’03]

Page 11: Sublinear

Ray Shooting

Volume, Intersection, Point location

[CLM ’03]

Page 12: Sublinear
Page 13: Sublinear

low-entropy data

Takens embeddings, Markov models (speech)

Page 14: Sublinear

Self-Improving Algorithms

Arbitrary, unknown random source

Sorting, Matching, MaxCut, All-pairs shortest paths, Transitive closure, Clustering

Page 15: Sublinear

Self-Improving Algorithms

Arbitrary, unknown random source

1. Run algorithm for best worst-case behavior, or best under uniform distribution, or best under some postulated prior.

2. Learning phase: algorithm fine-tunes itself as it learns about the random source through repeated use.

3. Algorithm settles to stationary status: optimal expected complexity under the (still unknown) random source.

Page 16: Sublinear

Self-Improving Algorithms

time T1, time T2, time T3, time T4, time T5

E[Tk] → optimal expected time for the random source

Page 17: Sublinear

Sorting (x1, x2, … , xn)

each xi independent from Di

H = entropy of rank distribution
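The three phases above can be made concrete with a toy self-improving sorter: the learning phase records samples of each coordinate, and the steady state sorts by bucketing against boundaries learned from those samples. The actual self-improving sorting result builds near-entropy-optimal search structures per coordinate to reach O(n + H) limiting expected time; everything here (the class name, `training_rounds`, the pooled boundary list) is an illustrative stand-in, not the algorithm from the talk.

```python
import bisect
import random

class SelfImprovingSorter:
    """Toy self-improving sorter: each x_i is drawn independently
    from its own unknown distribution D_i.  Phase 1 falls back to
    ordinary sorting while recording samples; once trained, phase 2
    buckets each x_i using learned boundaries and sorts the small
    buckets.  Plain binary search stands in for the per-coordinate
    learned search structures of the real algorithm."""

    def __init__(self, n, training_rounds=10):
        self.n = n
        self.samples = [[] for _ in range(n)]   # learning-phase data
        self.rounds = 0
        self.training_rounds = training_rounds
        self.v = None                           # learned bucket boundaries

    def sort(self, xs):
        if self.rounds < self.training_rounds:
            # Learning phase: record where each coordinate tends to fall.
            self.rounds += 1
            for i, x in enumerate(xs):
                self.samples[i].append(x)
            if self.rounds == self.training_rounds:
                pool = sorted(x for s in self.samples for x in s)
                self.v = pool[:: max(1, len(pool) // self.n)]
            return sorted(xs)                   # fall back to mergesort
        # Steady state: bucket each x_i, then sort the small buckets.
        buckets = [[] for _ in range(len(self.v) + 1)]
        for x in xs:
            buckets[bisect.bisect_left(self.v, x)].append(x)
        out = []
        for b in buckets:
            out.extend(sorted(b))
        return out
```

When the source matches the learned boundaries, most buckets stay small, which is the informal reason the limiting time can track the entropy H of the rank distribution rather than n log n.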

Page 18: Sublinear

Clustering: K-median (k=2)

Page 19: Sublinear

Minimize sum of distances

Hamming cube {0,1}^d

Page 20: Sublinear

Minimize sum of distances

Hamming cube {0,1}^d

Page 21: Sublinear

Minimize sum of distances

Hamming cube {0,1}^d

[KSS]
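For intuition about the objective on these slides (sum of Hamming distances to the nearer of two centers), a plain local-search baseline is easy to state: for a fixed cluster, the best center in {0,1}^d is the coordinate-wise majority vector. This is not the [KSS] algorithm, just a Lloyd-style sketch for the same k-median (k=2) objective, with illustrative names throughout.

```python
import random

def hamming(a, b):
    """Hamming distance between two equal-length 0/1 tuples."""
    return sum(x != y for x, y in zip(a, b))

def two_median_hamming(points, iters=20, seed=0):
    """Local search for 2-median on the Hamming cube: minimize the
    sum of Hamming distances to the nearer of two centers.  For a
    fixed cluster, the coordinate-wise majority vector minimizes
    the sum of Hamming distances, so we alternate assignment and
    majority recentering (a baseline, not the [KSS] algorithm)."""
    rng = random.Random(seed)
    centers = rng.sample(points, 2)
    for _ in range(iters):
        clusters = [[], []]
        for p in points:
            clusters[hamming(p, centers[0]) > hamming(p, centers[1])].append(p)
        new = []
        for c, old in zip(clusters, centers):
            if not c:
                new.append(old)     # keep old center for an empty cluster
                continue
            # coordinate-wise majority minimizes sum of Hamming distances
            new.append(tuple(int(sum(col) * 2 > len(c)) for col in zip(*c)))
        if new == centers:
            break
        centers = new
    cost = sum(min(hamming(p, centers[0]), hamming(p, centers[1]))
               for p in points)
    return centers, cost
```

Like Lloyd's method, this only converges to a local optimum, which is why the talk's tail/core split and the [KSS] subroutine are needed for guarantees.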

Page 22: Sublinear

How to achieve linear limiting expected time?

Input space {0,1}^dn

Identify core

Tail: prob < O(dn)/KSS; use KSS

Page 23: Sublinear

How to achieve linear limiting expected time?

Store sample of precomputed KSS

Incremental algorithm: nearest neighbor

NP vs P: input vicinity → algorithmic vicinity

Page 24: Sublinear

Main difficulty: How to spot the tail?

Page 25: Sublinear
Page 26: Sublinear

1. Data is accessible before noise

2. Or it’s not

Page 27: Sublinear

1. Data is accessible before noise

Page 28: Sublinear

encode decode

Page 29: Sublinear

Data inaccessible before noise

Assumptions are necessary!

Page 30: Sublinear

Data inaccessible before noise

1. Sorted sequence

2. Bipartite graph, expander

3. Solid w/ angular constraints

4. Low-dim attractor set

Page 31: Sublinear

Data inaccessible before noise

data must satisfy some property P, but does not quite

Page 32: Sublinear

f(x) = ?

x

f(x)

But life being what it is…

data

f = access function

Page 33: Sublinear

f(x) = ?

x

f(x) data

Page 34: Sublinear


Humans

Define distance from any object to data class

Page 35: Sublinear

f(x) = ?

x

g(x)

x1, x2,…

f(x1), f(x2),…

filter

g is access function for:

Page 36: Sublinear

Similar to Self-Correction [RS96, BLR’93]

except:

about data, not functions

error-free

allows O(distance to property)

Page 37: Sublinear

Monotone function: [n] → R^d

Filter requires polylog(n) queries

Page 38: Sublinear

Offline reconstruction

Page 39: Sublinear

Offline reconstruction

Page 40: Sublinear

Online reconstruction

Page 41: Sublinear

Online reconstruction

Page 42: Sublinear

Online reconstruction

Page 43: Sublinear

Online reconstruction

Page 44: Sublinear

monotone function

[figure: bar chart of function values over points 1-20, y-axis 0-700]

Page 45: Sublinear

[figure: bar chart of function values over points 1-20, y-axis 0-700]

Page 46: Sublinear

Frequency of a point x

Smallest interval I containing > |I|/2 violations involving f(x)

Page 47: Sublinear

Frequency of a point

Page 48: Sublinear

Given x:

1. estimate its frequency

2. if nonzero, find “smallest” interval around x with both endpoints having zero frequency

3. interpolate between f(endpoints)
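The three steps above can be written out exactly (though not sublinearly) as a toy filter: the real filter estimates frequencies by sampling in polylog(n) time, whereas this sketch counts violations by brute force over all intervals. Function names and the array-based access are illustrative, not from the talk.

```python
def frequency_nonzero(f, x):
    """x has nonzero frequency if some interval I containing x has
    more than |I|/2 points that violate monotonicity with f(x)."""
    n = len(f)
    for lo in range(x + 1):
        for hi in range(x, n):
            bad = sum(1 for y in range(lo, hi + 1)
                      if (y < x and f[y] > f[x]) or (y > x and f[y] < f[x]))
            if bad > (hi - lo + 1) / 2:
                return True
    return False

def monotone_filter(f, x):
    """Online reconstruction of a monotone function: answer queries
    with values of some function close to f that is monotone off
    the noisy spots (brute-force toy version of the 3-step recipe;
    the real filter uses polylog(n) sampled queries)."""
    if not frequency_nonzero(f, x):              # step 1: check frequency
        return f[x]
    lo, hi = x, x                                # step 2: grow to the
    while lo > 0 and frequency_nonzero(f, lo):   # nearest zero-frequency
        lo -= 1                                  # endpoints around x
    while hi < len(f) - 1 and frequency_nonzero(f, hi):
        hi += 1
    return (f[lo] + f[hi]) / 2                   # step 3: interpolate
```

On f = [1, 2, 3, 100, 5, 6, 7], only x = 3 has nonzero frequency; the filter returns the interpolated value 4 there and leaves every other position untouched.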

Page 49: Sublinear

To prove:

1. Frequencies can be estimated in polylog time

2. Function is monotone over the zero-frequency (ZF) domain

3. ZF domain occupies a (1-2ε) fraction

Page 50: Sublinear

Bivariate concave function

Filter requires polylog(n) queries

Page 51: Sublinear

bipartite graph

k-connectivity

expander

Page 52: Sublinear

denoising low-dim attractor sets

Page 53: Sublinear
Page 54: Sublinear

Priced computation & accuracy

spectrometry/cloning/gene chip
PCR/hybridization/chromatography
gel electrophoresis/blotting

001100001010001111110011001101011100001100000101111o1o1100001100

Linear programming

Page 55: Sublinear

computation

experimentation

Page 56: Sublinear

Pricing data

Ongoing project w/ Nir Ailon

Factoring is easy. Here’s why…
Gaussian mixture sample: 00100101001001101010101…

Page 57: Sublinear

Collaborators: Nir Ailon, Seshadri Comandur, Ding Liu