
Scalable machine learning for massive datasets: Fast summation algorithms

Getting good enough solutions as fast as possible

Vikas Chandrakant Raykar ([email protected])
University of Maryland, College Park
March 8, 2007

Outline of the proposal

1 Motivation
2 Key Computational tasks
3 Thesis contributions
  Algorithm 1: Sums of Gaussians
    Kernel density estimation
    Gaussian process regression
    Implicit surface fitting
  Algorithm 2: Sums of Hermite × Gaussians
    Optimal bandwidth estimation
    Projection pursuit
  Algorithm 3: Sums of error functions
    Ranking
4 Conclusions

Motivation

Supervised learning: learning from examples

Classification [spam / not spam]
Regression [predicting the amount of snowfall]
Ranking [predicting movie preferences]

Learning can be viewed as function estimation f : X → Y, where X = R^d are the features/attributes.
Classification: Y = {−1, +1}. Regression: Y = R.

Task                  X                        Y
spam filtering        word frequencies         spam (+1) / not spam (−1)
snowfall prediction   temperature, humidity    inches of snow
movie preferences     ratings by other users   movie 1 ≻ movie 2

Three components of learning

Learning can be viewed as function estimation f : X → Y. Three tasks:

Training → learning the function f from examples {x_i, y_i}_{i=1}^N.
Prediction → given a new x, predict y.
Model selection → what kind of function f to use.

Two approaches to learning: parametric approach

Assumes a known parametric form for the function to be learnt.
Training ⇔ estimating the unknown parameters.
Once the model has been trained, the training examples can be discarded for future prediction.
The essence of the training examples has been captured in the model parameters.
Leads to erroneous inference unless the model is known a priori.

Two approaches to learning: non-parametric approach

Does not make any assumptions on the form of the underlying function.
Lets the data speak for themselves.
Can perform better than parametric methods.
However, all the available data has to be retained while making the prediction.
Also known as memory-based methods.

Scalable machine learning

Say we have N training examples.

Many state-of-the-art learning algorithms scale as O(N^2) or O(N^3).
They also have O(N^2) memory requirements.
Huge data sets containing millions of training examples are relatively easy to gather.
We would like to have algorithms that scale as O(N).

Example: a kernel density estimation with 1 million points would take around 2 days.

Previous approaches: use only a subset of the data, or online algorithms.

Goals of this dissertation

1 Identify the key computational primitives contributing to the O(N^3) or O(N^2) complexity.
2 Speed up these primitives with approximate algorithms that scale as O(N) and provide high accuracy guarantees.
3 Demonstrate the speedup achieved on massive datasets.
4 Release the source code for the algorithms developed under the LGPL.

Fast matrix-vector multiplication

The key computational primitive at the heart of various algorithms is a "structured" matrix-vector product (MVP).

Tools and applications

We use ideas and techniques from
  computational physics → fast multipole methods,
  scientific computing → iterative methods,
  computational geometry → clustering, kd-trees,
to design these algorithms, and have applied them to
  kernel density estimation [59,63,64,67]
  optimal bandwidth estimation [60,61]
  projection pursuit [60,61]
  implicit surface fitting
  Gaussian process regression [59,64]
  ranking [62,65,66]
  collaborative filtering [65,66]


Key Computational tasks

                      Training       Prediction      Choosing
                      (N examples)   (at N points)   parameters
Kernel regression     O(N^2)         O(N^2)          O(N^2)
Gaussian processes    O(N^3)         O(N^2)          O(N^3)
SVM                   O(N_sv^3)      O(N_sv N)       O(N_sv^3)
Ranking               O(N^2)
KDE                                  O(N^2)          O(N^2)
Laplacian eigenmaps   O(N^3)
Kernel PCA            O(N^3)

The key computational primitives contributing to O(N^2) or O(N^3):
Matrix-vector multiplication.
Solving large linear systems.

Kernel machines

Minimize the regularized empirical risk functional R_reg[f]:

  min_{f ∈ H}  R_reg[f] = (1/N) ∑_{i=1}^N L[f(x_i), y_i] + λ ‖f‖_H^2,    (1)

where H denotes a reproducing kernel Hilbert space (RKHS).

Theorem (Representer Theorem)
If k : X × X → R is the kernel of the RKHS H, then the minimizer of Equation 1 is of the form

  f(x) = ∑_{i=1}^N q_i k(x, x_i).    (2)

The same sum appears in several settings:

  f(x) = ∑_{i=1}^N q_i k(x, x_i).

Kernel machines: f is the regression/classification function [Representer theorem].
Density estimation: f is the kernel density estimate.
Gaussian processes: f is the mean prediction.

Prediction

Given N training examples {x_i}_{i=1}^N, the key computational task is to compute a weighted linear combination of local kernel functions centered on the training data, i.e.,

  f(x) = ∑_{i=1}^N q_i k(x, x_i).

The computational complexity to predict at M points given N training examples scales as O(MN).
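To make the O(MN) cost concrete, here is a minimal NumPy sketch (not from the proposal; the names predict_direct, sources, targets, and weights are illustrative) of the direct prediction step with a Gaussian kernel:

```python
import numpy as np

def gaussian_kernel(y, x, h):
    """k(y, x) = exp(-||y - x||^2 / h^2)."""
    return np.exp(-np.sum((y - x) ** 2) / h ** 2)

def predict_direct(targets, sources, weights, h):
    """Direct evaluation of f(y_j) = sum_i q_i k(y_j, x_i): two nested
    loops over M targets and N sources, hence O(MN) kernel evaluations."""
    f = np.zeros(len(targets))
    for j, y in enumerate(targets):
        for x, q in zip(sources, weights):
            f[j] += q * gaussian_kernel(y, x, h)
    return f

rng = np.random.default_rng(0)
X = rng.random((500, 3))       # N = 500 training (source) points in d = 3
Y = rng.random((200, 3))       # M = 200 test (target) points
q = rng.standard_normal(500)   # weights q_i produced by some trained model
print(predict_direct(Y, X, q, h=0.4)[:5])
```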

Training

Training these models scales as O(N^3), since most involve solving the linear system of equations

  (K + λI)ξ = y,

where K is the dense N × N Gram matrix with [K]_ij = k(x_i, x_j), I is the identity matrix, and λ is a regularization parameter or noise variance.

Direct inversion requires O(N^3) operations and O(N^2) storage.
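A minimal sketch (illustrative, not the thesis code) of this direct training step: form the dense Gram matrix and solve the regularized system with a standard O(N^3) factorization.

```python
import numpy as np
from scipy.spatial.distance import cdist

def train_direct(X, y, h, lam):
    """Solve (K + lam*I) xi = y with a Gaussian Gram matrix K.
    Building K costs O(N^2) memory; the dense solve costs O(N^3) time."""
    K = np.exp(-cdist(X, X, "sqeuclidean") / h ** 2)   # [K]_ij = k(x_i, x_j)
    return np.linalg.solve(K + lam * np.eye(len(X)), y)

rng = np.random.default_rng(0)
X = rng.random((1000, 3))
y = np.sin(X.sum(axis=1)) + 0.1 * rng.standard_normal(1000)
xi = train_direct(X, y, h=0.5, lam=1e-2)
print(xi[:5])
```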

N-body problems in statistical learning

O(N^2) because the computations involve considering pairwise elements.
These are called N-body problems in statistical learning, in analogy with the Coulombic N-body problems occurring in computational physics.
Those are potential-based problems involving forces or charges; in our case the potential corresponds to the kernel function.

Fast Matrix-Vector Multiplication

We need a fast algorithm to compute

  f(y_j) = ∑_{i=1}^N q_i k(y_j, x_i),   j = 1, . . . , M.

Matrix-vector multiplication f = Kq:

  [ f(y_1) ]   [ k(y_1, x_1)  k(y_1, x_2)  ...  k(y_1, x_N) ] [ q_1 ]
  [ f(y_2) ] = [ k(y_2, x_1)  k(y_2, x_2)  ...  k(y_2, x_N) ] [ q_2 ]
  [   ...  ]   [     ...          ...      ...      ...     ] [ ... ]
  [ f(y_M) ]   [ k(y_M, x_1)  k(y_M, x_2)  ...  k(y_M, x_N) ] [ q_N ]

Direct computation is O(MN).
Reduce from O(MN) to O(M + N).

Why should O(M + N) be possible?

Structured matrix
A dense matrix of order M × N is called a structured matrix if its entries depend only on O(M + N) parameters.

K is a structured matrix:
  [K]_ij = k(x_i, y_j) = e^{−‖x_i − y_j‖²/h²}   (Gaussian kernel)
depends only on x_i and y_j.

Motivating toy example
Consider
  G(y_j) = ∑_{i=1}^N q_i (x_i − y_j)²   for j = 1, . . . , M.
Direct summation is O(MN).

Motivating toy example: factorize and regroup

  G(y_j) = ∑_{i=1}^N q_i (x_i − y_j)²
         = ∑_{i=1}^N q_i (x_i² − 2 x_i y_j + y_j²)
         = [∑_{i=1}^N q_i x_i²] − 2 y_j [∑_{i=1}^N q_i x_i] + y_j² [∑_{i=1}^N q_i]
         = M_2 − 2 y_j M_1 + y_j² M_0

The moments M_2, M_1, and M_0 can be pre-computed in O(N).
Hence the computational complexity is O(M + N).
The information in the sources is encapsulated in the moments.
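A small NumPy sketch of this factorization (illustrative variable names): the moments M_0, M_1, M_2 are computed once in O(N), each G(y_j) then costs O(1), and the result matches the direct O(MN) sum up to round-off.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.random(100_000)            # N source points
y = rng.random(1_000)              # M target points
q = rng.standard_normal(100_000)   # weights q_i

# Direct O(MN) summation: G(y_j) = sum_i q_i (x_i - y_j)^2
G_direct = np.array([np.sum(q * (x - yj) ** 2) for yj in y])

# Factorize and regroup: pre-compute the three moments in O(N) ...
M0, M1, M2 = q.sum(), (q * x).sum(), (q * x ** 2).sum()
# ... then evaluate every target in O(1) each, for O(M + N) overall.
G_fast = M2 - 2 * y * M1 + y ** 2 * M0

print(np.max(np.abs(G_direct - G_fast)))   # agreement up to round-off
```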

Direct vs Fast

In general

For any kernel K(x, y) we can expand

  K(x, y) = ∑_{k=1}^p Φ_k(x) Ψ_k(y) + error.

The fast summation is then of the form

  G(y_j) = ∑_{k=1}^p A_k Ψ_k(y_j) + error,

where the moments A_k can be pre-computed as

  A_k = ∑_{i=1}^N q_i Φ_k(x_i).

Organize the computation using data structures to use this effectively.
Give accuracy guarantees.
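A sketch of this general Φ_k/Ψ_k/A_k scheme (illustrative; it reuses the toy kernel from the previous slide, for which the expansion is exact with p = 3, so the "error" term vanishes):

```python
import numpy as np

# For K(x, y) = (x - y)^2 the separable expansion is exact with p = 3 terms:
# (x - y)^2 = x^2 * 1 + x * (-2y) + 1 * y^2.
Phi = [lambda x: x ** 2, lambda x: x, lambda x: np.ones_like(x)]
Psi = [lambda y: np.ones_like(y), lambda y: -2 * y, lambda y: y ** 2]

def fast_sum(x, q, y):
    """G(y_j) = sum_k A_k Psi_k(y_j) with A_k = sum_i q_i Phi_k(x_i):
    O(pN) to form the moments, O(pM) to evaluate."""
    A = [np.sum(q * phi(x)) for phi in Phi]            # moments A_k
    return sum(a * psi(y) for a, psi in zip(A, Psi))   # evaluation

rng = np.random.default_rng(1)
x, y, q = rng.random(50_000), rng.random(500), rng.standard_normal(50_000)
direct = np.array([np.sum(q * (x - yj) ** 2) for yj in y])
print(np.max(np.abs(direct - fast_sum(x, q, y))))      # ~ round-off only
```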

Notion of ε-exact approximation

Direct computation is O(MN).
We will compute G(y_j) approximately so as to reduce the computational complexity to O(N + M).
Speedup comes at the expense of reduced precision.
The user provides an accuracy parameter ε.
The algorithm computes an approximation Ĝ(y_j) such that |Ĝ(y_j) − G(y_j)| < ε.
The constant in O(N + M) depends on the accuracy ε.
The lower the required precision, the larger the speedup.
ε can be arbitrarily small.
For machine-level precision there is no difference between the direct and the fast methods.

Two aspects of the problem

1 Approximation theory → series expansions and error bounds.
2 Computational geometry → effective data structures.

A class of techniques using only good space-division schemes, called dual-tree methods, has been proposed.


Thesis contributions

Algorithm 1: Sums of Gaussians

The most commonly used kernel function in machine learning is the Gaussian kernel

  K(x, y) = e^{−‖x−y‖²/h²},

where h is called the bandwidth of the kernel.

[Figure: Gaussian kernels of bandwidth h centered at the points x_i.]

Discrete Gauss Transform

  G(y_j) = ∑_{i=1}^N q_i e^{−‖y_j − x_i‖²/h²}.

{q_i ∈ R}_{i=1,...,N} are the N source weights.
{x_i ∈ R^d}_{i=1,...,N} are the N source points.
{y_j ∈ R^d}_{j=1,...,M} are the M target points.
h ∈ R+ is the source scale or bandwidth.
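A vectorized NumPy sketch of the discrete Gauss transform as defined above (illustrative; it forms the full M × N matrix, so it is the O(MN) baseline that the fast algorithms below avoid):

```python
import numpy as np
from scipy.spatial.distance import cdist

def gauss_transform_direct(y, x, q, h):
    """G(y_j) = sum_i q_i exp(-||y_j - x_i||^2 / h^2), all targets at once.
    cdist builds the full M x N squared-distance matrix, so this is O(MN)
    in both time and memory."""
    return np.exp(-cdist(y, x, "sqeuclidean") / h ** 2) @ q

rng = np.random.default_rng(0)
x = rng.random((2000, 3))       # N sources in d = 3
y = rng.random((1500, 3))       # M targets
q = rng.standard_normal(2000)   # source weights
print(gauss_transform_direct(y, x, q, h=0.4)[:3])
```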

Fast Gauss Transform (FGT)

ε-exact approximation algorithm.
Computational complexity is O(M + N).
Proposed by Greengard and Strain and applied successfully to a few low-dimensional applications in mathematics and physics.
However, the algorithm has not been widely used in statistics, pattern recognition, and machine learning applications, where higher dimensions occur commonly.

Constants are important

FGT ~ O(p^d (M + N)).
We propose a method, the Improved FGT (IFGT), which scales as ~ O(d^p (M + N)).

[Figure: growth of the constant with dimension d for p = 5; FGT ~ p^d grows much faster than IFGT ~ d^p.]
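The gap between the two growth rates is easy to check numerically. A tiny sketch, taking the slide's shorthand p^d versus d^p at face value with p = 5 as in the figure:

```python
# Constant factors for p = 5: FGT grows like p^d, IFGT like d^p.
p = 5
for d in (1, 3, 5, 10, 20, 30):
    print(f"d={d:2d}   p^d = {p ** d:.3e}   d^p = {d ** p:.3e}")
```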

Brief idea of IFGT

[Figure: the space is partitioned into clusters with centers c_k and cluster radius r_x^k; a target point y_j gathers contributions from clusters within a cutoff radius r_y^k.]

Step 0 Determine the parameters of the algorithm based on the specified error bound, kernel bandwidth, and data distribution.
Step 1 Subdivide the d-dimensional space using a k-center clustering based geometric data structure (O(N log K)).
Step 2 Build a p-truncated representation of the kernels inside each cluster using a set of decaying basis functions (O(Ndp)).
Step 3 Collect the influence of all the data in a neighborhood using coefficients at the cluster centers and evaluate (O(Mdp)).
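The following is a deliberately simplified 1-D, single-cluster sketch of Steps 2 and 3 (an assumption-laden illustration, not the actual multivariate IFGT): the Gaussian is expanded about the cluster center c and truncated after p terms, and the maximum error is reported relative to Q = ∑|q_i|, matching the normalization used in the error plots below.

```python
import numpy as np
from math import factorial

def ifgt_1d_single_cluster(y, x, q, h, p):
    """Illustrative 1-D sketch of IFGT Steps 2-3 with one cluster:
    exp(-(y-x)^2/h^2) = exp(-dy^2/h^2) exp(-dx^2/h^2) exp(2 dy dx / h^2),
    with the last factor Taylor-expanded about the center c and truncated
    after p terms. Cost: O(Np) for coefficients + O(Mp) for evaluation."""
    c = x.mean()                      # cluster center
    dx, dy = x - c, y - c
    # Step 2: p-truncated coefficients collected from all sources, O(Np).
    C = np.array([(2.0 / h ** 2) ** k / factorial(k)
                  * np.sum(q * np.exp(-dx ** 2 / h ** 2) * dx ** k)
                  for k in range(p)])
    # Step 3: evaluate the truncated series at every target, O(Mp).
    return np.exp(-dy ** 2 / h ** 2) * sum(C[k] * dy ** k for k in range(p))

rng = np.random.default_rng(0)
x = rng.random(20_000)
q = rng.standard_normal(20_000)
y = rng.random(400)
h, eps = 0.4, 1e-6
direct = np.array([np.sum(q * np.exp(-(yj - x) ** 2 / h ** 2)) for yj in y])
Q = np.sum(np.abs(q))
for p in (5, 10, 15, 20):             # larger truncation p -> smaller error
    err = np.max(np.abs(direct - ifgt_1d_single_cluster(y, x, q, h, p)))
    print(p, err, err / Q < eps)      # error normalized by Q, as in the plots
```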

Sample result

For example, in three dimensions with 1 million training and test points [h = 0.4]:
IFGT – 6 minutes.
Direct – 34 hours.
with an error of 10^−8.

FIGTree
We have also combined the IFGT with a kd-tree based nearest-neighbor search algorithm.

Speedup as a function of d [h = 1.0]. FGT cannot be run for d > 3.

[Figure: running time (sec) and maximum absolute error (normalized by Q) versus dimension d for Direct, FGT, and FIGTree.]

Speedup as a function of d [h = 0.5√d]. FIGTree scales well with d.

[Figure: running time (sec) versus dimension d up to d = 50 for Direct, FGT, and FIGTree.]

Speedup as a function of ε. Better speedup for lower precision.

[Figure: running time (sec) and maximum absolute error (normalized by Q) versus the desired error ε for Direct, FIGTree, and dual-tree methods.]

Speedup as a function of h. Scales well with bandwidth.

[Figure: running time (sec) versus bandwidth h for Direct, FIGTree, and dual-tree methods, for d = 2, 3, 4, 5.]

Applications

Direct application
Kernel density estimation.
Prediction in Gaussian process regression, SVM, RLS.

Embed in iterative or optimization methods
Training of kernel machines and Gaussian processes.
Computing eigenvectors in unsupervised learning tasks.

Application 1: Kernel density estimation

Estimate the density p from an i.i.d. sample x_1, . . . , x_N drawn from p.
The most popular method is the kernel density estimator (also known as the Parzen window estimator):

  p(x) = (1/N) ∑_{i=1}^N (1/h) K((x − x_i)/h).

The most widely used kernel is the Gaussian:

  p(x) = (1/N) ∑_{i=1}^N (2πh²)^{−d/2} e^{−‖x−x_i‖²/2h²}.    (3)

The computational cost of evaluating this sum at M points due to N data points is O(NM).
The proposed FIGTree algorithm can be used to compute the sum approximately to ε precision in O(N + M) time.
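A direct NumPy sketch of the Gaussian KDE of Equation 3 (illustrative; this is exactly the O(NM) sum that FIGTree replaces with an ε-exact O(N + M) computation):

```python
import numpy as np
from scipy.spatial.distance import cdist

def kde_gaussian_direct(x_eval, x_data, h):
    """p(x) = (1/N) sum_i (2 pi h^2)^(-d/2) exp(-||x - x_i||^2 / (2 h^2)).
    Direct evaluation at M points from N samples: O(NM)."""
    N, d = x_data.shape
    norm = (2.0 * np.pi * h ** 2) ** (d / 2.0)
    sq_dists = cdist(x_eval, x_data, "sqeuclidean")
    return np.exp(-sq_dists / (2 * h ** 2)).sum(axis=1) / (N * norm)

rng = np.random.default_rng(0)
data = rng.standard_normal((5000, 2))        # N samples in d = 2
grid = rng.uniform(-3, 3, size=(1000, 2))    # M evaluation points
print(kde_gaussian_direct(grid, data, h=0.2)[:5])
```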

KDE experimental results. N = M = 44,484, ε = 10^−2. SARCOS dataset.

 d   Optimal h   Direct time (sec.)   FIGTree time (sec.)   Speedup
 1   0.024730        168.500                 0.110          1531.818
 2   0.033357        180.156                 0.844           213.455
 3   0.041688        189.438                 6.094            31.086
 4   0.049527        196.375                19.047            10.310
 5   0.056808        208.453                97.156             2.146
 6   0.063527        221.906               130.250             1.704
 7   0.069711        226.375               121.829             1.858
 8   0.075400        236.781               106.203             2.230
 9   0.080637        247.235                88.250             2.801
10   0.085465        254.547                98.718             2.579

KDE experimental results. N = M = 7000, ε = 10^−2.

[Figure: running time (sec) versus bandwidth h at d = 4 for Direct, FIGTree, and dual-tree methods.]

Application 2: Gaussian process regression

Regression problem
Training data D = {x_i ∈ R^d, y_i ∈ R}_{i=1}^N.
Predict y for a new x.
Also get uncertainty estimates.

[Figure: a one-dimensional regression data set with inputs x in [0, 1] and outputs y in [−1.5, 1.5].]

Gaussian process regression

Bayesian non-linear non-parametric regression.
The regression function is represented by an ensemble of functions, on which we place a Gaussian prior.
This prior is updated in the light of the training data.
As a result we obtain predictions together with valid estimates of uncertainty.

Gaussian process model

Model: y = f(x) + ε, where ε is N(0, σ²).
f(x) is a zero-mean Gaussian process with covariance function K(x, x′).
The most common covariance function is the Gaussian.

Infer the posterior
Given the training data D and a new input x∗, our task is to compute the posterior p(f∗ | x∗, D).

Solution

The posterior is a Gaussian.
The mean is used as the prediction.
The variance is the uncertainty associated with the prediction.

[Figure: the GP fit on the one-dimensional data set, showing the predictive mean with σ, 2σ, and 3σ uncertainty bands.]

Direct training

  ξ = (K + σ²I)^{−1} y

Direct computation of the inverse of a matrix requires O(N^3) operations and O(N^2) storage.
Impractical even for problems of moderate size (typically a few thousand points).
For example, N = 25,600 takes around 10 hours, assuming you have enough RAM.

Iterative methods

  (K + λI)ξ = y.

An iterative method generates a sequence of approximate solutions ξ_k which converge to the true solution ξ.
We can use the conjugate-gradient (CG) method.

Computational cost of conjugate gradient
Requires one matrix-vector multiplication and 5N flops per iteration.
Four vectors of length N are required for storage.
Hence the computational cost reduces to O(kN^2) for k iterations.
For example, N = 25,600 takes around 17 minutes (compare to 10 hours).

CG + FIGTree

The core computational step in each conjugate-gradient iteration is the multiplication of the matrix K with a vector, say q.
Coupled with CG, the IFGT reduces the computational cost of GP regression to O(N).
For example, N = 25,600 takes around 3 seconds (compare to 10 hours [direct] or 17 minutes [CG]).
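A sketch of the CG + fast-MVP idea using SciPy (illustrative, under assumed sizes and parameters): the system (K + σ²I)ξ = y is solved with conjugate gradients through a LinearOperator, so the exact dense product inside matvec could be swapped for an ε-exact IFGT/FIGTree product without changing the solver.

```python
import numpy as np
from scipy.spatial.distance import cdist
from scipy.sparse.linalg import LinearOperator, cg

rng = np.random.default_rng(0)
N = 2000
X = rng.random((N, 3))
y = np.sin(X.sum(axis=1)) + 0.1 * rng.standard_normal(N)
h, sigma2 = 0.5, 0.1

# Dense Gram matrix, only for this small demo.
K = np.exp(-cdist(X, X, "sqeuclidean") / h ** 2)

def matvec(q):
    # One CG iteration costs one product with (K + sigma^2 I); replacing
    # K @ q with a fast eps-exact Gauss transform gives the O(N) training.
    return K @ q + sigma2 * q

A = LinearOperator((N, N), matvec=matvec, dtype=float)
xi, info = cg(A, y, maxiter=1000)     # info == 0 means CG converged
print(info, np.linalg.norm(matvec(xi) - y))
```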

Results on the robotarm dataset

[Figure: training time (secs) versus m for the SD, SR, and PP approximations, CG+FIGTree, and CG+dual-tree.]
[Figure: test error (SMSE) versus m.]
[Figure: testing time (secs) versus m.]

How to choose ε for inexact CG?

The matrix-vector product may be performed in an increasingly inexact manner as the iteration progresses and still allow convergence to the solution.

[Figure: the accuracy ε used for the fast matrix-vector product at each CG iteration, ranging between 10^−7 and 10^−5 over about 14 iterations.]

Application 3: Implicit surface fitting

Implicit surface fitting as regression

[Figure: a surface with on-surface points and surface normals, plus positive and negative off-surface points generated along the normals.]

Using the proposed approach we can handle point clouds containing millions of points.

Page 100: Scalable machine learning for massive datasets: Fast summation algorithms · 2007-03-08 · Scalable machine learning for massive datasets: Fast summation algorithms Getting good

Thesis contributions Algorithm 2: Sums of Hermite × Gaussians

Algorithm 2: Sums of Hermite × Gaussians

The FIGTree can be used in any kernel machine where we encountersums of Gaussians.

Vikas C. Raykar (Univ. of Maryland) Doctoral dissertation March 8, 2007 53 / 69

Page 101: Scalable machine learning for massive datasets: Fast summation algorithms · 2007-03-08 · Scalable machine learning for massive datasets: Fast summation algorithms Getting good

Thesis contributions Algorithm 2: Sums of Hermite × Gaussians

Algorithm 2: Sums of Hermite × Gaussians

The FIGTree can be used in any kernel machine where we encountersums of Gaussians.

Most kernel methods require choosing some hyperparameters (e.g.bandwidth h of the kernel).

Vikas C. Raykar (Univ. of Maryland) Doctoral dissertation March 8, 2007 53 / 69

Page 102: Scalable machine learning for massive datasets: Fast summation algorithms · 2007-03-08 · Scalable machine learning for massive datasets: Fast summation algorithms Getting good

Thesis contributions Algorithm 2: Sums of Hermite × Gaussians

Algorithm 2: Sums of Hermite × Gaussians

The FIGTree can be used in any kernel machine where we encountersums of Gaussians.

Most kernel methods require choosing some hyperparameters (e.g.bandwidth h of the kernel).

Optimal procedures to choose these parameters are O(N2).

Vikas C. Raykar (Univ. of Maryland) Doctoral dissertation March 8, 2007 53 / 69

Page 103: Scalable machine learning for massive datasets: Fast summation algorithms · 2007-03-08 · Scalable machine learning for massive datasets: Fast summation algorithms Getting good

Thesis contributions Algorithm 2: Sums of Hermite × Gaussians

Algorithm 2: Sums of Hermite × Gaussians

The FIGTree can be used in any kernel machine where we encountersums of Gaussians.

Most kernel methods require choosing some hyperparameters (e.g.bandwidth h of the kernel).

Optimal procedures to choose these parameters are O(N2).

Most of these procedures involve solving some optimization whichinvolves taking the derivatives of kernel sums.

Vikas C. Raykar (Univ. of Maryland) Doctoral dissertation March 8, 2007 53 / 69

Page 104: Scalable machine learning for massive datasets: Fast summation algorithms · 2007-03-08 · Scalable machine learning for massive datasets: Fast summation algorithms Getting good

Thesis contributions Algorithm 2: Sums of Hermite × Gaussians

Algorithm 2: Sums of Hermite × Gaussians

The FIGTree can be used in any kernel machine where we encounter sums of Gaussians.

Most kernel methods require choosing some hyperparameters (e.g. the bandwidth h of the kernel).

Optimal procedures to choose these parameters are O(N²).

Most of these procedures involve solving some optimization which involves taking the derivatives of kernel sums.

The derivatives of Gaussian sums involve sums of products of Hermite polynomials and Gaussians.

G_r(y_j) = \sum_{i=1}^{N} q_i \, H_r\!\left(\frac{y_j - x_i}{h}\right) e^{-(y_j - x_i)^2 / 2h^2}, \qquad j = 1, \ldots, M.
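For reference, a direct O(NM) evaluation of this sum, a sketch assuming the probabilists' Hermite polynomials He_r (the convention produced by differentiating a Gaussian with variance h²; the choice of convention is an assumption here). The fast ε-exact algorithm replaces this quadratic loop:

```python
import numpy as np

def hermite_prob(r, t):
    """Probabilists' Hermite polynomial He_r(t) via the recurrence
    He_0 = 1, He_1 = t, He_{k+1} = t*He_k - k*He_{k-1}."""
    h_prev, h = np.ones_like(t), t
    if r == 0:
        return h_prev
    for k in range(1, r):
        h_prev, h = h, t * h - k * h_prev
    return h

def hermite_gauss_sum_direct(q, x, y, h, r):
    """Direct O(N*M) evaluation of
    G_r(y_j) = sum_i q_i H_r((y_j - x_i)/h) exp(-(y_j - x_i)^2 / (2 h^2))."""
    t = (y[:, None] - x[None, :]) / h          # M x N matrix of scaled differences
    return (q[None, :] * hermite_prob(r, t) * np.exp(-0.5 * t**2)).sum(axis=1)
```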


Kernel density estimation

The most popular method for density estimation is the kernel density estimator (KDE).

p(x) = \frac{1}{N} \sum_{i=1}^{N} \frac{1}{h} K\!\left(\frac{x - x_i}{h}\right)

FIGTree can be directly used to accelerate KDE.

Efficient use of KDE requires choosing h optimally.
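For concreteness, a naive O(NM) evaluation of the univariate Gaussian KDE at a set of query points (a reference sketch only; FIGTree replaces this quadratic computation with an ε-exact, linear-time one):

```python
import numpy as np

def kde_gaussian_direct(x, queries, h):
    """Naive O(N*M) Gaussian kernel density estimate
    p(y) = (1/N) sum_i (1/h) K((y - x_i)/h)  with  K(t) = exp(-t^2/2)/sqrt(2*pi)."""
    t = (queries[:, None] - x[None, :]) / h
    K = np.exp(-0.5 * t**2) / np.sqrt(2.0 * np.pi)
    return K.sum(axis=1) / (len(x) * h)

# Example: densities = kde_gaussian_direct(samples, grid, h=0.1)
```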


The bandwidth h is a crucial parameter

As h decreases towards 0, the number of modes increases to the number of data points and the KDE is very noisy.

As h increases towards ∞, the number of modes drops to 1, so that any interesting structure has been smeared away and the KDE just displays a unimodal pattern.

[Figure: KDE of the same data with a small bandwidth h = 0.01 and a large bandwidth h = 0.2.]


Application 1: Fast optimal bandwidth selection

The state-of-the-art method for optimal bandwidth selection for kernel density estimation scales as O(N²).

We present a fast computational technique that scales as O(N).

The core part is the fast ε-exact algorithm for kernel density derivative estimation, which reduces the computational complexity from O(N²) to O(N).

For example, for N = 409,600 points:

Direct evaluation → 12.76 hours. Fast evaluation → 65 seconds, with an error of around 10⁻¹².


Marron Wand normal mixtures

[Figure: the 15 Marron Wand normal mixture densities (panels 1-15), each plotted over [-3, 3].]


Speedup for Marron Wand normal mixtures

Density | h_direct | h_fast   | T_direct (sec) | T_fast (sec) | Speedup | Rel. Err.
1       | 0.122213 | 0.122215 | 4182.29        | 64.28        | 65.06   | 1.37e-005
2       | 0.082591 | 0.082592 | 5061.42        | 77.30        | 65.48   | 1.38e-005
3       | 0.020543 | 0.020543 | 8523.26        | 101.62       | 83.87   | 1.53e-006
4       | 0.020621 | 0.020621 | 7825.72        | 105.88       | 73.91   | 1.81e-006
5       | 0.012881 | 0.012881 | 6543.52        | 91.11        | 71.82   | 5.34e-006
6       | 0.098301 | 0.098303 | 5023.06        | 76.18        | 65.93   | 1.62e-005
7       | 0.092240 | 0.092240 | 5918.19        | 88.61        | 66.79   | 6.34e-006
8       | 0.074698 | 0.074699 | 5912.97        | 90.74        | 65.16   | 1.40e-005
9       | 0.081301 | 0.081302 | 6440.66        | 89.91        | 71.63   | 1.17e-005
10      | 0.024326 | 0.024326 | 7186.07        | 106.17       | 67.69   | 1.84e-006
11      | 0.086831 | 0.086832 | 5912.23        | 90.45        | 65.36   | 1.71e-005
12      | 0.032492 | 0.032493 | 8310.90        | 119.02       | 69.83   | 3.83e-006
13      | 0.045797 | 0.045797 | 6824.59        | 104.79       | 65.13   | 4.41e-006
14      | 0.027573 | 0.027573 | 10485.48       | 111.54       | 94.01   | 1.18e-006
15      | 0.023096 | 0.023096 | 11797.34       | 112.57       | 104.80  | 7.05e-007


Projection pursuit

The idea of projection pursuit is to search for projections from high- to low-dimensional space that are most interesting.

1 Given N data points in a d-dimensional space, project each data point onto the direction vector a ∈ R^d, i.e., z_i = a^T x_i.

2 Compute the univariate nonparametric kernel density estimate, p, of the projected points z_i.

3 Compute the projection index I(a) based on the density estimate.

4 Locally optimize over the choice of a, to get the most interesting projection of the data (see the sketch below).
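A minimal sketch of steps 1-4 above, using SciPy's gaussian_kde and a generic local optimizer as stand-ins for the fast KDE, fast bandwidth selection, and fast derivative computations developed in this thesis; the Monte Carlo estimate of the entropy index is only illustrative:

```python
import numpy as np
from scipy.stats import gaussian_kde
from scipy.optimize import minimize

def projection_index(a, X):
    """Entropy-style projection index I(a) = integral p(z) log p(z) dz,
    estimated by averaging log p over the projected, standardized samples."""
    a = a / np.linalg.norm(a)
    z = X @ a                                # step 1: project onto direction a
    z = (z - z.mean()) / z.std()             # zero mean, unit variance
    p = gaussian_kde(z)                      # step 2: univariate KDE of the projections
    return np.mean(np.log(p(z) + 1e-300))    # step 3: sample estimate of E[log p(z)]

def most_interesting_direction(X, a0=None):
    """Step 4: locally optimize a to maximize I(a), i.e. find the most
    non-normal direction (the standard normal minimizes the index)."""
    d = X.shape[1]
    a0 = np.ones(d) / np.sqrt(d) if a0 is None else a0
    res = minimize(lambda a: -projection_index(a, X), a0)
    return res.x / np.linalg.norm(res.x)
```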


Projection index

The projection index is designed to reveal specific structure in the data, like clusters, outliers, or smooth manifolds.

The entropy index based on Rényi's order-1 entropy is given by

I(a) = \int p(z) \log p(z) \, dz.

The density of zero mean and unit variance which uniquely minimizes this is the standard normal density.

Thus the projection index finds the direction which is most non-normal.


Speedup

The computational burden is reduced in the following three instances.

1 Computation of the kernel density estimate.

2 Estimation of the optimal bandwidth.

3 Computation of the first derivative of the kernel density estimate, which is required in the optimization procedure.


Image segmentation via PP

[Figure: image segmentation via projection pursuit, panels (a)-(d).]

Image segmentation via PP with optimal KDE took 15 minutes, while the direct method takes around 7.5 hours.


Algorithm 3: Sums of error functions

Another sum which we have encountered in ranking algorithms is

E(y) = \sum_{i=1}^{N} q_i \, \mathrm{erfc}(y - x_i).

[Figure: the complementary error function erfc(z) plotted for z ∈ [-5, 5].]
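A direct O(NM) reference evaluation of this sum (the fast ε-exact erfc summation developed in the thesis replaces this quadratic computation):

```python
import numpy as np
from scipy.special import erfc

def erfc_sum_direct(q, x, y):
    """Naive O(N*M) evaluation of E(y_j) = sum_{i=1}^N q_i erfc(y_j - x_i)."""
    q, x, y = map(np.asarray, (q, x, y))
    return (q[None, :] * erfc(y[:, None] - x[None, :])).sum(axis=1)
```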


Example

N = M = 51,200.

Direct evaluation takes about 18 hours.

We specify ε = 10⁻⁶.

Fast evaluation takes just 5 seconds.

The actual error is around 10⁻¹⁰.


Application 1: Ranking

For some applications, ranking or ordering the elements is more important.

Information retrieval.
Movie recommendation.
Medical decision making.

Compare two instances and predict which one is better.

Various ranking algorithms train the models using pairwise preference relations.

These are computationally expensive to train due to the quadratic scaling in the number of pairwise constraints.


Fast ranking algorithm

We propose a new ranking algorithm.

Although our algorithm also uses pairwise comparisons, the runtime is still linear.

This is made possible by fast approximate summation of erfc functions.

The proposed algorithm is as accurate as the best available methods in terms of ranking accuracy.

It is several orders of magnitude faster.

For a dataset with 4,177 examples the algorithm took around 2 seconds.

Direct training took 1736 seconds, and the best competitor, RankBoost, took 63 seconds.
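To make the connection to erfc sums concrete, here is an illustrative pairwise quantity (not necessarily the exact objective used in the thesis): for scores s⁺ assigned to preferred items and s⁻ to less-preferred items, a penalty of the form Σᵢ Σⱼ erfc(s⁺ᵢ - s⁻ⱼ) is simply the sum over i of the erfc sum E(s⁺ᵢ) with unit weights, so replacing the inner sum with the fast ε-exact erfc summation turns a quadratic computation into a (near-)linear one.

```python
import numpy as np
from scipy.special import erfc

def pairwise_erfc_penalty(s_pos, s_neg):
    """Illustrative pairwise ranking penalty  sum_i sum_j erfc(s_pos[i] - s_neg[j]).
    Each term of the outer sum is an erfc sum E(s_pos[i]) with weights q_j = 1;
    swapping the naive inner sum below for a fast eps-exact erfc summation
    brings the cost down from quadratic to (near-)linear in the number of examples."""
    s_pos, s_neg = np.asarray(s_pos), np.asarray(s_neg)
    E = erfc(s_pos[:, None] - s_neg[None, :]).sum(axis=1)   # E(s_pos[i]) for each i
    return E.sum()
```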


Outline of the proposal

1 Motivation

2 Key Computational tasks

3 Thesis contributions
  Algorithm 1: Sums of Gaussians
    Kernel density estimation
    Gaussian process regression
    Implicit surface fitting
  Algorithm 2: Sums of Hermite × Gaussians
    Optimal bandwidth estimation
    Projection pursuit
  Algorithm 3: Sums of error functions
    Ranking

4 Conclusions


Conclusions

We identified the key computationally intensive primitives in machine learning.

We presented linear-time algorithms.

We gave high-accuracy guarantees.

Unlike methods which rely on choosing a subset of the dataset, we use all the available points and still achieve O(N) complexity.

We applied these algorithms to a few machine learning tasks.


Publications

Conference papers

A fast algorithm for learning large scale preference relations. Vikas C. Raykar, Ramani Duraiswami, and Balaji Krishnapuram. In Proceedings of AISTATS 2007, Puerto Rico, March 2007. [also submitted to PAMI]

Fast optimal bandwidth selection for kernel density estimation. Vikas C. Raykar and Ramani Duraiswami. In Proceedings of the Sixth SIAM International Conference on Data Mining, Bethesda, April 2006, pp. 524-528. [in preparation for JCGS]

The Improved Fast Gauss Transform with applications to machine learning. Vikas C. Raykar and Ramani Duraiswami. To appear in Large Scale Kernel Machines, MIT Press, 2006. [also submitted to JMLR]

Technical reports

Fast weighted summation of erfc functions. Vikas C. Raykar, R. Duraiswami, and B. Krishnapuram. CS-TR-4848, Department of Computer Science, University of Maryland, College Park.

Very fast optimal bandwidth selection for univariate kernel density estimation. Vikas C. Raykar and R. Duraiswami. CS-TR-4774, Department of Computer Science, University of Maryland, College Park.

Fast computation of sums of Gaussians in high dimensions. Vikas C. Raykar, C. Yang, R. Duraiswami, and N. Gumerov. CS-TR-4767, Department of Computer Science, University of Maryland, College Park.

Software releases under LGPL

The FIGTree algorithm.

Fast optimal bandwidth estimation.

Fast erfc summation (coming soon)
