Modelling. - University of California, Berkeley


Page 1: Modelling. - University of California, Berkeley

Today.

Modelling.

An Analysis of the Power of PCA.

Musing (rant?) about algorithms in the real world.

Page 4: Modelling. - University of California, Berkeley

Two populations.

DNA data:

human1: A · · · C · · · T · · · A
human2: C · · · C · · · A · · · T
human3: A · · · G · · · T · · · T

Single Nucleotide Polymorphism.

Same population?

Model: same population breeds.

Population 1: snp 843: Pr[A] = .4, Pr[T] = .6
Population 2: snp 843: Pr[A] = .6, Pr[T] = .4

Individual: x1, x2, x3, ..., xn.

Which population?

Comment: snps could be movie preferences, populations could be types.

E.g., republican/democrat, shopper/saver.
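A quick sketch of this model in code, assuming made-up per-snp probabilities and using a log-likelihood comparison to answer "which population?" (the numbers and the decision rule are illustrative, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 500                                    # number of snps (assumed, for illustration)

# Per-snp probabilities that x_i = 1 (say, allele A) in each population;
# small random differences play the role of Pr[A] = .4 vs .6 at snp 843.
p1 = np.clip(0.5 + rng.normal(0, 0.05, d), 0.01, 0.99)
p2 = np.clip(0.5 + rng.normal(0, 0.05, d), 0.01, 0.99)

# One individual drawn from population 1: x_i ~ Bernoulli(p_i^(1)).
x = (rng.random(d) < p1).astype(float)

# "Which population?"  Compare log-likelihoods under the two models.
ll1 = np.sum(x * np.log(p1) + (1 - x) * np.log(1 - p1))
ll2 = np.sum(x * np.log(p2) + (1 - x) * np.log(1 - p2))
print("guess: population", 1 if ll1 > ll2 else 2)
```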

Page 16: Modelling. - University of California, Berkeley

Which population?

Population 1: snp 843: Pr[A] = .4, Pr[T] = .6
Population 2: snp 843: Pr[A] = .6, Pr[T] = .4

Individual: x1, x2, x3, ..., xn.

Population 1: snp i: Pr[x_i = 1] = p_i^(1)

Population 2: snp i: Pr[x_i = 1] = p_i^(2)

Simpler Calculation:

Population 1: Gaussian with mean µ1 ∈ R^d, std dev. σ in each dim.
Population 2: Gaussian with mean µ2 ∈ R^d, std dev. σ in each dim.
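The simpler calculation is easy to simulate. A minimal sketch, with assumed values for d, σ, and ε, drawing one individual from population 1 and assigning it to the closer of the two (known) means:

```python
import numpy as np

rng = np.random.default_rng(1)
d, sigma, eps = 1000, 1.0, 0.2             # assumed: dimension, per-snp std dev, per-snp mean gap

mu1 = np.zeros(d)
mu2 = mu1 + eps                            # the two means differ by eps in every coordinate

x = mu1 + sigma * rng.normal(size=d)       # an individual drawn from population 1

# With both means known, assign x to the closer center.
guess = 1 if np.sum((x - mu1) ** 2) < np.sum((x - mu2) ** 2) else 2
print("guess: population", guess)
```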

Page 22: Modelling. - University of California, Berkeley

Gaussians

Population 1: Gaussian with mean µ1 ∈ R^d, std dev. σ in each dim.
Population 2: Gaussian with mean µ2 ∈ R^d, std dev. σ in each dim.

Difference between humans: σ per snp.
Difference between populations: ε per snp.

How many snps to collect to determine population for individual x?

x in population 1.

E[(x−µ1)^2] = dσ^2

E[(x−µ2)^2] ≥ (d−1)σ^2 + (µ1−µ2)^2.

If (µ1−µ2)^2 = dε^2 >> σ^2, then different.

→ take d >> σ^2/ε^2.

Variance of estimator? Roughly dσ^4.
Signal is difference between expectations: roughly dε^2.

Signal >> Noise ↔ dε^2 >> √d σ^2. Need d >> σ^4/ε^4.
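A small Monte Carlo check of these quantities, under assumed σ, ε, and d: the gap between E[(x−µ2)^2] and E[(x−µ1)^2] is about dε^2, while the fluctuation of (x−µ1)^2 is of order √d σ^2:

```python
import numpy as np

rng = np.random.default_rng(2)
d, sigma, eps, trials = 2000, 1.0, 0.2, 2000      # assumed parameters

mu1 = np.zeros(d)
mu2 = mu1 + eps
X = mu1 + sigma * rng.normal(size=(trials, d))    # many draws of x from population 1

d1 = np.sum((X - mu1) ** 2, axis=1)               # (x - mu1)^2 summed over coordinates
d2 = np.sum((X - mu2) ** 2, axis=1)

print("E[(x-mu1)^2] ~", d1.mean(), "  theory d*sigma^2 =", d * sigma**2)
print("signal E[(x-mu2)^2 - (x-mu1)^2] ~", (d2 - d1).mean(), "  theory d*eps^2 =", d * eps**2)
print("noise: std of (x-mu1)^2 ~", d1.std(), "  order sqrt(d)*sigma^2 =", np.sqrt(d) * sigma**2)
```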

Page 36: Modelling. - University of California, Berkeley

Projection

Population 1: Gaussian with mean µ1 ∈ R^d, std dev. σ in each dim.
Population 2: Gaussian with mean µ2 ∈ R^d, std dev. σ in each dim.

Difference between humans: σ per snp.
Difference between populations: ε per snp.

Project x onto unit vector v in direction µ2−µ1.

E[((x−µ1)·v)^2] = σ^2 if x is population 1.

E[((x−µ2)·v)^2] ≥ (µ1−µ2)^2 if x is population 1.

Std deviation is σ^2! versus √d σ^2!

No loss in signal!

dε^2 >> σ^2 → d >> σ^2/ε^2.

Versus d >> σ^4/ε^4.

A quadratic difference in amount of data!
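The same Monte Carlo check, after projecting onto v = (µ2−µ1)/|µ2−µ1| (assumed parameters again): the signal (µ1−µ2)^2 survives, while the noise is now of order σ^2 rather than √d σ^2:

```python
import numpy as np

rng = np.random.default_rng(3)
d, sigma, eps, trials = 2000, 1.0, 0.05, 2000    # assumed; d*eps^2 = 5 vs sigma^2 = 1

mu1 = np.zeros(d)
mu2 = mu1 + eps
v = (mu2 - mu1) / np.linalg.norm(mu2 - mu1)      # unit vector in direction mu2 - mu1

X = mu1 + sigma * rng.normal(size=(trials, d))   # x's drawn from population 1

p1 = ((X - mu1) @ v) ** 2                        # ((x - mu1) . v)^2
p2 = ((X - mu2) @ v) ** 2                        # ((x - mu2) . v)^2

print("E[((x-mu1).v)^2] ~", p1.mean(), "  theory sigma^2 =", sigma**2)
print("E[((x-mu2).v)^2] ~", p2.mean(), "  theory >= (mu1-mu2)^2 =", d * eps**2)
print("noise: std of ((x-mu1).v)^2 ~", p1.std(), "  order sigma^2, not sqrt(d)*sigma^2")
```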

Page 48: Modelling. - University of California, Berkeley

Don’t know much about...

Don’t know µ1 or µ2?

Page 49: Modelling. - University of California, Berkeley

Without the means?

Sample of n people.

Some (say half) from population 1, some from population 2.

Which are which?

Near Neighbors Approach

Compute Euclidean distance squared. Cluster using threshold.

Signal E[d(x1,x2)] − E[d(x1,y1)] should be larger than noise in d(x,y),
where x's are from one population, y's from the other.

Signal is proportional to dε^2.

Noise is proportional to √d σ^2.

d >> σ^4/ε^4 → same-type people closer to each other.

d >> (σ^4/ε^4) log n suffices for threshold clustering.

log n factor for union bound over (n choose 2) pairs.

Best one can do?
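A sketch of the near-neighbors approach with assumed parameters: compute pairwise squared distances, then threshold halfway between the expected same-population and cross-population distances (the specific threshold is an illustrative choice):

```python
import numpy as np

rng = np.random.default_rng(4)
n, d, sigma, eps = 40, 3000, 1.0, 0.6                # assumed; d comfortably above (sigma^4/eps^4) log n

mu1 = np.zeros(d)
mu2 = mu1 + eps
labels = np.array([1] * (n // 2) + [2] * (n // 2))   # half from each population
X = np.where(labels[:, None] == 1, mu1, mu2) + sigma * rng.normal(size=(n, d))

# Pairwise squared Euclidean distances.
D = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)

# Illustrative threshold: halfway between expected same-pair (2*d*sigma^2)
# and cross-pair (2*d*sigma^2 + d*eps^2) squared distances.
threshold = 2 * d * sigma**2 + d * eps**2 / 2
same = D < threshold
correct = (same == (labels[:, None] == labels[None, :]))
print("fraction of pairs clustered correctly:", correct.mean())
```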

Page 65: Modelling. - University of California, Berkeley

Principal components analysis. Remember Projection!

Don't know µ1 or µ2?

Principal component analysis:
Find direction, v, of maximum variance.
Maximize ∑_x (x·v)^2 (zero center the points).

Recall: (x·v)^2 could determine population.

Typical direction: variance ≈ nσ^2.
Direction along µ1−µ2: variance ∝ n(µ1−µ2)^2 ∝ ndε^2.

Need d >> σ^2/ε^2 at least.

When will PCA pick the correct direction with good probability?

Union bound over directions. How many directions?

Infinity and beyond!
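A sketch of the PCA route with assumed parameters: zero center the points, take the first eigenvector of B = A^T A, project, and split by the sign of the projection (the sign split is an illustrative clustering rule):

```python
import numpy as np

rng = np.random.default_rng(5)
n, d, sigma, eps = 200, 800, 1.0, 0.35           # assumed parameters

mu1 = np.zeros(d)
mu2 = mu1 + eps
labels = rng.integers(1, 3, size=n)              # each person from population 1 or 2
A = np.where(labels[:, None] == 1, mu1, mu2) + sigma * rng.normal(size=(n, d))

A_c = A - A.mean(axis=0)                         # zero center the points
B = A_c.T @ A_c
eigvals, eigvecs = np.linalg.eigh(B)
v = eigvecs[:, -1]                               # direction of maximum variance

proj = A_c @ v                                   # project each person onto v
guess = np.where(proj > 0, 1, 2)                 # illustrative rule: split by sign

acc = max(np.mean(guess == labels), np.mean(guess != labels))  # clusters only defined up to swapping labels
u = (mu2 - mu1) / np.linalg.norm(mu2 - mu1)
print("clustering accuracy:", acc)
print("alignment |v . u| with the mu2 - mu1 direction:", abs(v @ u))
```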

Page 82: Modelling. - University of California, Berkeley

Nets

“δ-Net.”

Set D of directions where all others, v, are close to some x ∈ D:

x·v ≥ 1−δ.

δ-Net: vectors [· · ·, iδ/d, · · ·] with integers i ∈ [−d/δ, d/δ].

Total of N ∝ (d/δ)^O(d) vectors in net.

Signal >> Noise times log N = O(d log(d/δ)) to isolate direction.

log N is due to union bound over vectors in net.

Signal (exp. projection): ∝ ndε^2.
Noise (std dev.): √n σ^2.

nd >> (σ^4/ε^4) log d and d >> σ^2/ε^2 works.

Nearest neighbor works with very high d > σ^4/ε^4.

PCA can reduce d to the “knowing centers” case, with a reasonable number of sample points.
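A tiny numerical illustration of the grid net, for small assumed d and δ: rounding any unit vector coordinate-wise to multiples of δ/d gives a net point x with x·v ≥ 1−δ, and the net has about (2d/δ + 1)^d = (d/δ)^O(d) points:

```python
import numpy as np

rng = np.random.default_rng(6)
d, delta = 6, 0.2                                # assumed small example

# Net: vectors whose coordinates are integer multiples of delta/d, in [-1, 1].
# Rounding a unit vector v coordinate-wise to this grid gives a net point x with
# |x_i - v_i| <= delta/(2d) in every coordinate, hence x . v >= 1 - delta.
step = delta / d
for _ in range(5):
    v = rng.normal(size=d)
    v /= np.linalg.norm(v)
    x = np.round(v / step) * step                # nearest net point
    print(x @ v >= 1 - delta)

print("net size ~ (2d/delta + 1)^d =", (2 * d / delta + 1) ** d)
```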

Page 96: Modelling. - University of California, Berkeley

PCA calculation.

Matrix A where rows are points.

First eigenvector of B = A^T A is the maximum variance direction.

Av gives the projections onto v.
v^T B v = (Av)^T (Av) is ∑_x (x·v)^2.

First eigenvector, v, of B maximizes x^T B x.

Bv = λv for maximum λ → v^T B v = λ for unit v.

Eigenvectors form an orthonormal basis.

Any other vector av + x, with x·v = 0:
x is composed of eigenvectors with possibly smaller eigenvalues.

→ v^T B v ≥ (av+x)^T B (av+x) for unit v, av+x.
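A short numerical check of these identities on random centered points: v^T B v equals ∑_x (x·v)^2, equals λ for the first eigenvector, and no other unit direction does better:

```python
import numpy as np

rng = np.random.default_rng(7)
n, d = 50, 10
A = rng.normal(size=(n, d))                      # rows are points
A -= A.mean(axis=0)                              # zero centered

B = A.T @ A
eigvals, eigvecs = np.linalg.eigh(B)             # eigenvalues in increasing order
v = eigvecs[:, -1]                               # first (top) eigenvector

# v^T B v = (Av)^T (Av) = sum over points x of (x . v)^2, and equals lambda for unit v.
print(np.allclose(v @ B @ v, np.sum((A @ v) ** 2)))
print(np.allclose(v @ B @ v, eigvals[-1]))

# No other unit direction gives a larger sum of squared projections.
W = rng.normal(size=(1000, d))
W /= np.linalg.norm(W, axis=1, keepdims=True)    # 1000 random unit directions
print(np.all(np.sum((A @ W.T) ** 2, axis=0) <= eigvals[-1] + 1e-9))
```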

Page 109: Modelling. - University of California, Berkeley

Computing eigenvalues.

Power method:
Choose random x.
Repeat: Let x = Bx. Scale x to a unit vector.

x = a_1 v_1 + a_2 v_2 + ···

x_t ∝ B^t x = a_1 λ_1^t v_1 + a_2 λ_2^t v_2 + ···

Mostly v_1 after a while, since λ_1^t ≫ λ_2^t.

Cluster Algorithm:
Choose random partition.
Repeat: Compute means of partition. Project, cluster.

Choose a random +1/−1 vector. Multiply by A^T (direction between means), multiply by A (project points), cluster (round to a +1/−1 vector).

Sort of repeatedly multiplying by AA^T. Power method.
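A minimal sketch of the power method as described above (assuming NumPy and a symmetric matrix B = A^T A; the function name and toy data are illustrative):

```python
import numpy as np

def power_method(B, iterations=100, seed=0):
    """Estimate the top eigenvector of a symmetric matrix B by
    repeating x <- Bx and rescaling x to a unit vector."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=B.shape[0])        # random starting vector
    x /= np.linalg.norm(x)
    for _ in range(iterations):
        x = B @ x                          # x_t is proportional to B^t x
        x /= np.linalg.norm(x)             # scale to a unit vector
    return x, x @ B @ x                    # eigenvector estimate, eigenvalue estimate

# Usage with a hypothetical data matrix A whose rows are points:
rng = np.random.default_rng(1)
A = rng.normal(size=(200, 5))
A -= A.mean(axis=0)
v, lam = power_method(A.T @ A)
```

Since x_t ∝ B^t x = a_1 λ_1^t v_1 + a_2 λ_2^t v_2 + ···, the v_1 term dominates once λ_1^t ≫ λ_2^t, so the loop converges to the principal component whenever the starting vector has a nonzero component along v_1.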


Page 124: Modelling. - University of California, Berkeley

Sum up.

Clustering a mixture of Gaussians.

Nearest neighbor works with sufficient data.

Projection onto the subspace of the means is better.

Principal component analysis can find the subspace of the means.

The power method computes the principal component.

The generic clustering algorithm is a rounded version of the power method.
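To make the last point concrete, here is a minimal sketch (same hypothetical NumPy setup; helper names are illustrative) of the clustering iteration read as a rounded power method: each pass multiplies the ±1 membership vector by A^T and then by A, and rounds back to ±1, which is roughly repeated multiplication by AA^T.

```python
import numpy as np

def cluster_power_method(A, iterations=20, seed=0):
    """One reading of the generic clustering loop:
    s is a +1/-1 partition of the rows of A;
    A^T s points (up to scaling) from one cluster mean toward the other,
    A (A^T s) projects every point onto that direction,
    and rounding to +/-1 re-clusters the points."""
    rng = np.random.default_rng(seed)
    s = rng.choice([-1.0, 1.0], size=A.shape[0])     # random partition
    for _ in range(iterations):
        direction = A.T @ s                          # direction between the means
        projection = A @ direction                   # project the points
        s = np.where(projection >= 0, 1.0, -1.0)     # cluster (round to +/-1)
    return s

# Usage on a hypothetical two-cluster data set:
rng = np.random.default_rng(2)
A = np.vstack([rng.normal(+1.0, 1.0, size=(100, 5)),
               rng.normal(-1.0, 1.0, size=(100, 5))])
A -= A.mean(axis=0)
labels = cluster_power_method(A)
```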


Page 130: Modelling. - University of California, Berkeley

See you on Tuesday.