
Page 1: Exploiting k-Nearest Neighbor Information with Many Data

Exploiting k-Nearest Neighbor Information with Many Data

Yung-Kyun Noh

Robotics Lab., Seoul National University

2017 NVIDIA DEEP LEARNING WORKSHOP

2017. 10. 31 (Tue.)

Page 2: Exploiting k-Nearest Neighbor Information with Many Data

Page 3: Exploiting k-Nearest Neighbor Information with Many Data

Neural Information Processing Systems (NIPS 2015)

• Oral talks: 15

• Spotlights: 37

• Accepted papers: 403

• Single session: more than 3,000 participants listen to each presentation.

• Poster session every day from 7pm to 12am (5 hr).

From Neil Lawrence's blog: a look at what the poster session looks like.

Page 4: Exploiting k-Nearest Neighbor Information with Many Data

Asian Conference on Machine Learning

• ACML 2017


Page 5: Exploiting k-Nearest Neighbor Information with Many Data

Contents

• Nonparametric methods for estimating density functions
  – Nearest neighbor methods
  – Kernel density estimation methods

• Metric learning for nonparametric methods
  – Generative approach for metric learning

• Theoretical properties and applications

Page 6: Exploiting k-Nearest Neighbor Information with Many Data

Representation of Data

• Each datum is one point in a data space, e.g. the vector $[1, 2, 5, 10, \ldots]^{\top}$.

[Figure: points in a data space]

Page 7: Exploiting k-Nearest Neighbor Information with Many Data

Nearest Points

[Figure: images with their labels: ship, ship, ship, airplane, ship / ship, ship, ship, deer, ship]

Page 8: Exploiting k-Nearest Neighbor Information with Many Data

Nearest Points

[Figure: images with their labels: automobile, truck, cat, ship, ship / automobile, automobile, ship, ship, ship]

Page 9: Exploiting k-Nearest Neighbor Information with Many Data

Classification with Nearest Neighbors

• Use majority voting (k-nearest neighbor classification).

• k = 9 (five / four ).

• Classify a testing point ( ) as class 1 ( ).

[Figure: data space with two classes ( : class 1, : class 2)]

(A minimal sketch of this voting rule follows below.)
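For illustration only, here is a minimal sketch of the majority-voting rule above (not the speaker's code); the toy data, the query point, and k = 9 are placeholders.

```python
# Minimal sketch of k-NN majority voting (k = 9).
# The data, labels, and query point are made up for illustration.
import numpy as np
from collections import Counter

def knn_classify(X_train, y_train, x_query, k=9):
    """Return the majority label among the k nearest training points."""
    dists = np.linalg.norm(X_train - x_query, axis=1)   # Euclidean distances to the query
    nn_idx = np.argsort(dists)[:k]                      # indices of the k nearest neighbors
    votes = Counter(y_train[nn_idx])                    # count the labels among the neighbors
    return votes.most_common(1)[0][0]

# Toy example: two Gaussian blobs standing in for class 1 and class 2.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(2, 1, (50, 2))])
y = np.array([1] * 50 + [2] * 50)
print(knn_classify(X, y, x_query=np.array([1.0, 1.0]), k=9))
```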

Page 10: Exploiting k-Nearest Neighbor Information with Many Data

Bayes Classification

• Bayes classification using the underlying density functions is optimal.

• In general, we do not know the underlying density.

• Error: the Bayes risk.
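The formulas on this slide did not survive extraction. The standard statements being referred to, for two classes with densities $p_1, p_2$ and priors $\pi_1, \pi_2$, are

\[
\hat{y}(\mathbf{x}) = \arg\max_{c \in \{1,2\}} \pi_c\, p_c(\mathbf{x}),
\qquad
R^{*} = \mathbb{E}_{\mathbf{x}}\!\left[\min\big(P(C_1 \mid \mathbf{x}),\, P(C_2 \mid \mathbf{x})\big)\right].
\]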

Page 11: Exploiting k-Nearest Neighbor Information with Many Data

Nearest Neighbors and Bayes Classification

• A surrogate for using the underlying density functions: count nearest neighbors!
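One standard way to make "count nearest neighbors" precise (restated here, not extracted from the slide): if $k_c$ of the $k$ nearest neighbors of $\mathbf{x}$ belong to class $c$, $N_c$ of the $N$ training points belong to class $c$, and $V$ is the volume of the ball containing the $k$ neighbors, then

\[
\hat{p}(\mathbf{x} \mid C_c) \approx \frac{k_c}{N_c V},
\qquad
\hat{P}(C_c \mid \mathbf{x}) = \frac{\hat{p}(\mathbf{x} \mid C_c)\, \hat{P}(C_c)}{\hat{p}(\mathbf{x})} = \frac{k_c}{k},
\]

so majority voting over the k neighbors approximates the Bayes rule.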

Page 12: Exploiting k-Nearest Neighbor Information with Many Data

• Thomas M. Cover (8/7/1938 – 3/26/2012)
  • B.S. in Physics from MIT
  • Ph.D. in EE from Stanford
  • Professor in EE and Statistics, Stanford

• Peter E. Hart (born c. 1940s)
  • M.S., Ph.D. from Stanford
  • A strong advocate of artificial intelligence in industry
  • Currently Group Senior Vice President at the Ricoh Company, Ltd.

Page 13: Exploiting k-Nearest Neighbor Information with Many Data

• Early in 1966 when I first began teaching at Stanford, a student, Peter Hart, walked into my office with an interesting problem.

• Charles Cole and he were using a pattern classification scheme which, for lack of a better word, they described as the nearest neighbor procedure.

• The proper goal would be to relate the probability of error of this procedure to the minimal probability of error … namely, the Bayes risk.

Page 14: Exploiting k-Nearest Neighbor Information with Many Data

Nearest Neighbors and Bayes Risk

[T. Cover and P. Hart, 1967]

• 1-NN error and k-NN error are bounded in terms of the Bayes risk, uniformly!
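The bounds themselves were lost in extraction; the classical results of the cited paper, for two classes with Bayes risk $R^{*}$, asymptotic 1-NN risk $R_{1\text{-NN}}$, and asymptotic k-NN risk $R_{k\text{-NN}}$, are

\[
R^{*} \;\le\; R_{1\text{-NN}} \;\le\; 2R^{*}\,(1 - R^{*}) \;\le\; 2R^{*},
\qquad
R_{k\text{-NN}} \;\to\; R^{*} \quad \text{as } k \to \infty,\; k/N \to 0.
\]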

Page 15: Exploiting k-Nearest Neighbor Information with Many Data

Bias in the Expected Error

• Assumption: a nearest neighbor appears at nonzero .

• The expected error is the sum of ① the asymptotic NN error and ② a residual due to finite sampling; the metric-dependent terms appear in ②.

R. R. Snapp et al. (1998) Asymptotic expansions of the k nearest neighbor risk, The Annals of Statistics

Y.-K. Noh et al. (2010) Generative local metric learning for nearest neighbor classification, NIPS
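Schematically (my notation, not the slide's), the expansion developed in these references has the form

\[
\mathbb{E}\big[\mathrm{err}_N\big] \;\approx\; \underbrace{\mathrm{err}_{\infty}}_{\text{①}} \;+\; \underbrace{c(A)\, N^{-2/D}}_{\text{②}},
\]

where $D$ is the dimensionality and $c(A)$ roughly collects the metric-dependent terms for a metric $A$; see the cited papers for the exact terms and constants.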

Page 16: Exploiting k-Nearest Neighbor Information with Many Data

Metric Dependency of Nearest Neighbors

• A different metric changes which class the nearest neighbors belong to.

[Figure: two panels, "Classified as red" and "Classified as blue"]

Mahalanobis-type distance:
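The distance formula itself was lost in extraction; the usual Mahalanobis-type distance meant here, with a symmetric positive definite matrix $A$, is

\[
d_A(\mathbf{x}, \mathbf{y}) \;=\; \sqrt{(\mathbf{x} - \mathbf{y})^{\top} A\, (\mathbf{x} - \mathbf{y})},
\qquad A \succ 0.
\]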

Page 17: Exploiting k-Nearest Neighbor Information with Many Data

Conventional Idea of Metric Learning

[Figure: Class 1 / Class 2 shown before and after metric learning]

Page 18: Exploiting k-Nearest Neighbor Information with Many Data

Many Data Situation with Overlap


Page 19: Exploiting k-Nearest Neighbor Information with Many Data

Conventional Metric Learning


Page 20: Exploiting k-Nearest Neighbor Information with Many Data

Generative Local Metric Learning (GLML)

[Figure annotation: 20% increase]

Page 21: Exploiting k-Nearest Neighbor Information with Many Data

Bayes Classification with True Model

• Two Gaussians
  – same means, random covariance matrices
  – Number of data: 20 per class

Page 22: Exploiting k-Nearest Neighbor Information with Many Data

Bayes Classification with True Model

• Two Gaussians
  – same means, random covariance matrices
  – Number of data: 50 per class

Page 23: Exploiting k-Nearest Neighbor Information with Many Data

Bayes Classification with True Model

• Two Gaussians (a simulation sketch of this setup follows after this list)
  – same means, random covariance matrices
  – Number of data: 100 per class
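A rough simulation sketch of the setup on slides 21-23 (my reconstruction, not the speaker's code): same means, random covariance matrices, k-NN with a Euclidean metric against the plug-in Bayes rule that uses the true densities. The dimensionality, test-set size, and random seed are arbitrary choices.

```python
# Two-Gaussian experiment sketch: same mean, random covariances, compare
# k-NN (Euclidean metric) with the Bayes rule that knows the true model.
import numpy as np
from scipy.stats import multivariate_normal
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
D, N_per_class, k = 10, 100, 5            # illustrative values, not from the talk

def random_cov(d):
    """Random symmetric positive definite covariance matrix."""
    M = rng.normal(size=(d, d))
    return M @ M.T + 0.1 * np.eye(d)

mu = np.zeros(D)                          # same mean for both classes
S1, S2 = random_cov(D), random_cov(D)

def sample(n):
    X = np.vstack([rng.multivariate_normal(mu, S1, n),
                   rng.multivariate_normal(mu, S2, n)])
    y = np.array([0] * n + [1] * n)
    return X, y

X_tr, y_tr = sample(N_per_class)
X_te, y_te = sample(5000)                 # large test set to estimate the error

# k-NN with the Euclidean metric
knn = KNeighborsClassifier(n_neighbors=k).fit(X_tr, y_tr)
err_knn = np.mean(knn.predict(X_te) != y_te)

# Bayes rule with the *true* model (equal priors): pick the class with higher density
log_p1 = multivariate_normal(mu, S1).logpdf(X_te)
log_p2 = multivariate_normal(mu, S2).logpdf(X_te)
err_bayes = np.mean((log_p2 > log_p1).astype(int) != y_te)

print(f"k-NN error: {err_knn:.3f}   true-model Bayes error: {err_bayes:.3f}")
```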

Page 24: Exploiting k-Nearest Neighbor Information with Many Data

k-NN Beats True Model With Metric Learning!

[Figure: results for N = 1000/class, k = 5 and N = 3000/class, k = 5]

Page 25: Exploiting k-Nearest Neighbor Information with Many Data

Manifold Embedding (Isomap)

• Use Dijkstra's algorithm to calculate the manifold distance from nearest neighbor distances.

• Apply MDS using the manifold distance.

(A minimal code sketch of these two steps follows below.)
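A minimal sketch of the two steps above, assuming a standard k-NN graph, Dijkstra shortest paths, and classical MDS; the dataset and the neighborhood size are placeholders, and the k-NN graph is assumed to be connected.

```python
# Minimal Isomap sketch: k-NN graph -> Dijkstra shortest paths -> classical MDS.
# A rough illustration of the two steps on the slide, not the speaker's implementation.
import numpy as np
from sklearn.neighbors import kneighbors_graph
from scipy.sparse.csgraph import shortest_path

def isomap(X, n_neighbors=10, n_components=2):
    # 1) Geodesic (manifold) distances: shortest paths over the k-NN graph (Dijkstra).
    knn = kneighbors_graph(X, n_neighbors, mode="distance")
    G = shortest_path(knn, method="D", directed=False)   # assumes a connected graph

    # 2) Classical MDS on the geodesic distance matrix.
    n = G.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n                  # centering matrix
    B = -0.5 * J @ (G ** 2) @ J                          # double-centered squared distances
    w, V = np.linalg.eigh(B)
    idx = np.argsort(w)[::-1][:n_components]             # top eigenpairs
    return V[:, idx] * np.sqrt(np.maximum(w[idx], 0))

# Usage on toy data (random points standing in for a manifold-valued dataset):
Y = isomap(np.random.default_rng(0).normal(size=(200, 5)))
```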

Page 26: Exploiting k-Nearest Neighbor Information with Many Data

Manifold Embedding (Isomap)


Page 27: Exploiting k-Nearest Neighbor Information with Many Data

Isomap with LMNN Metric


Page 28: Exploiting k-Nearest Neighbor Information with Many Data

Isomap with GLM Metric


Page 29: Exploiting k-Nearest Neighbor Information with Many Data

Nadaraya-Watson Estimator

• Classification

• Regression
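The estimator formulas were lost in extraction; the standard Nadaraya-Watson form, with kernel $K_h$ of bandwidth $h$ and targets $y_i$ (class indicators in the classification case), is

\[
\hat{f}(\mathbf{x}) \;=\; \frac{\sum_{i=1}^{N} K_h(\mathbf{x} - \mathbf{x}_i)\, y_i}{\sum_{i=1}^{N} K_h(\mathbf{x} - \mathbf{x}_i)}.
\]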

Page 30: Exploiting k-Nearest Neighbor Information with Many Data

Kernel regression (Nadaraya-Watson regression) with metric learning

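As an illustration of what "with metric learning" changes in practice (a sketch under my own assumptions, not the metric learned in the talk): the kernel is evaluated under a Mahalanobis distance $d_A$ instead of the Euclidean one.

```python
# Sketch of Nadaraya-Watson regression with a Mahalanobis-type metric: the Gaussian
# kernel is evaluated on d_A(x, x_i) rather than the Euclidean distance.
# A, h, and the toy data below are placeholders.
import numpy as np

def nw_regress(X, y, x_query, A, h=0.5):
    """Nadaraya-Watson prediction at x_query with a Gaussian kernel under metric A."""
    diff = X - x_query
    d2 = np.einsum("ij,jk,ik->i", diff, A, diff)      # squared Mahalanobis distances
    w = np.exp(-d2 / (2 * h ** 2))                    # Gaussian kernel weights
    return w @ y / w.sum()

# Toy usage with the identity metric (plain Euclidean kernel regression):
rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, (200, 2))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=200)
print(nw_regress(X, y, x_query=np.array([0.5, 0.0]), A=np.eye(2)))
```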

Page 31: Exploiting k-Nearest Neighbor Information with Many Data

Kernel regression (Nadaraya-Watson regression) with metric learning


Page 32: Exploiting k-Nearest Neighbor Information with Many Data

For x & y Jointly Gaussian

• The learned metric is not sensitive to the bandwidth.

Page 33: Exploiting k-Nearest Neighbor Information with Many Data

For x & y Jointly Gaussian

• The learned metric is not sensitive to the bandwidth.

Page 34: Exploiting k-Nearest Neighbor Information with Many Data

Benchmark Data


Page 35: Exploiting k-Nearest Neighbor Information with Many Data

Two Theoretical Properties for Gaussians

• There exists a symmetric positive definite matrix A that eliminates the first term of the bias.

• With the optimal bandwidth h minimizing the leading-order terms, the minimum mean squared error reduces to the squared bias in the limit of infinitely high dimension.

Page 36: Exploiting k-Nearest Neighbor Information with Many Data

Diffusion Decision Model

• Choosing between two alternatives under time pressure with uncertain information. (A simulation sketch follows after the figure below.)

[Figure: evidence accumulates over time toward decision bounds +z (class 1) and -z (class 2), reaching a decision at time T; annotations: speed-accuracy tradeoff, increase accuracy vs. increase speed, confidence level 0.8 and confidence level 0.9]
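For illustration, a minimal simulation of the two-alternative diffusion decision rule sketched above; the drift, noise, bound, and time step are made-up values, not parameters from the talk.

```python
# Minimal drift-diffusion (diffusion decision model) simulation: accumulate noisy
# evidence until it crosses +z (choose class 1) or -z (choose class 2).
import numpy as np

def ddm_decide(drift=0.1, noise=1.0, z=1.5, dt=0.01, max_steps=10_000, rng=None):
    """Return (choice, decision_time) for one trial of the diffusion decision model."""
    rng = rng or np.random.default_rng()
    x = 0.0
    for step in range(1, max_steps + 1):
        x += drift * dt + noise * np.sqrt(dt) * rng.normal()
        if x >= z:
            return 1, step * dt          # upper bound reached: class 1
        if x <= -z:
            return 2, step * dt          # lower bound reached: class 2
    return 0, max_steps * dt             # no decision within the time limit

# Larger bounds z give slower but more accurate decisions (speed-accuracy tradeoff).
choices, times = zip(*(ddm_decide(rng=np.random.default_rng(i)) for i in range(1000)))
print("P(class 1):", np.mean(np.array(choices) == 1), " mean decision time:", np.mean(times))
```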

Page 37: Exploiting k-Nearest Neighbor Information with Many Data

Summary

• Nearest neighbor methods and their asymptotic properties

• Nadaraya-Watson regression with metric learning

• Diffusion decision making and nearest neighbor methods

Page 38: Exploiting k-Nearest Neighbor Information with Many Data

THANK YOU

Yung-Kyun Noh

[email protected]