A Survey on Distance Metric Learning (Part 2)

Gerry Tesauro

IBM T.J. Watson Research Center



TRANSCRIPT

Page 1: A Survey on Distance Metric Learning (Part 2)


A Survey on Distance Metric Learning (Part 2)

Gerry Tesauro

IBM T.J.Watson Research Center

Page 2

Acknowledgement

• Lecture material shamelessly adapted from the following sources:
– Kilian Weinberger:
• “Survey on Distance Metric Learning” slides
• IBM summer intern talk slides (Aug. 2006)

– Sam Roweis slides (NIPS 2006 workshop on “Learning to Compare Examples”)

– Yann LeCun talk slides (CVPR 2005, 2006)

Page 3

Outline – Part 2

Neighbourhood Components Analysis (Goldberger et al.), Metric Learning by Collapsing Classes (Globerson & Roweis)

Metric Learning for Kernel Regression (Weinberger & Tesauro)

Metric learning for RL basis function construction (Keller et al.)

Similarity learning for image processing (LeCun et al.)

Page 4

Neighborhood Component Analysis

(Goldberger et al. 2004)
Distance metric for visualization and kNN

Page 5

Metric Learning for Kernel Regression

Weinberger & Tesauro, AISTATS 2007

Page 6

Killing three birds with one stone: we construct a method for linear dimensionality reduction that generates a meaningful distance metric, optimally tuned for distance-based kernel regression.

Page 7

Kernel Regression

• Given training set {(x_j, y_j), j = 1,…,N} where x is a d-dimensional vector and y is real-valued, estimate the value of a test point x_i by a weighted average of the samples:

ŷ_i = Σ_j k_ij y_j / Σ_j k_ij

where k_ij = k_D(x_i, x_j) is a distance-based kernel function using distance metric D
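As a concrete illustration, here is a minimal numpy sketch of this weighted-average (Nadaraya–Watson) estimator with a plain Euclidean distance and a Gaussian kernel; the function name and toy data are illustrative, not from the slides.

```python
import numpy as np

def kernel_regression(X, y, x_test, sigma=1.0):
    """Nadaraya-Watson estimate: y_hat = sum_j k_ij y_j / sum_j k_ij,
    with Gaussian kernel weights on squared Euclidean distance."""
    d2 = np.sum((X - x_test) ** 2, axis=1)   # squared distances D_ij^2
    k = np.exp(-d2 / sigma ** 2)             # kernel weights k_ij
    return np.dot(k, y) / np.sum(k)          # weighted average of targets

# toy data on the line y = 2x; the estimate at x = 0.5 should be near 1.0
X = np.array([[0.0], [0.4], [0.6], [1.0]])
y = np.array([0.0, 0.8, 1.2, 2.0])
print(kernel_regression(X, y, np.array([0.5])))
```

Learning the metric D (below) amounts to replacing the Euclidean distance inside `d2` with a learned Mahalanobis distance.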

Page 8

Choice of Kernel

• Many functional forms for k_ij can be used in MLKR; our empirical work uses the Gaussian kernel

k_ij = exp(−D_ij² / σ²)

where σ is a kernel width parameter (we can set σ = 1 W.L.O.G. since we learn D)

• This gives a softmax regression estimate similar to Roweis’ softmax classifier:

ŷ_i = Σ_j y_j exp(−D_ij²) / Σ_j exp(−D_ij²)

Page 9

Distance Metric for Nearest Neighbor Regression

Learn a linear transformation that lets us estimate the value of a test point from its nearest neighbors

Page 10

Mahalanobis Metric

The distance function is a pseudo-Mahalanobis metric (generalizes Euclidean distance)
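The pseudo-Mahalanobis distance is D_ij = ||A(x_i − x_j)||² = x_ij^T M x_ij with M = A^T A; setting A to the identity recovers squared Euclidean distance. A minimal sketch (names are illustrative):

```python
import numpy as np

def mahalanobis_sq(xi, xj, A):
    """Pseudo-Mahalanobis squared distance D_ij = ||A(x_i - x_j)||^2,
    i.e. x_ij^T M x_ij with M = A^T A (positive semidefinite)."""
    d = A @ (xi - xj)
    return float(d @ d)

xi = np.array([1.0, 2.0])
xj = np.array([0.0, 0.0])
# A = identity recovers squared Euclidean distance: 1^2 + 2^2 = 5
print(mahalanobis_sq(xi, xj, np.eye(2)))
# rescaling one axis changes the metric: (2*1)^2 + 2^2 = 8
print(mahalanobis_sq(xi, xj, np.diag([2.0, 1.0])))
```

Because M = A^T A is always positive semidefinite, any real matrix A yields a valid (pseudo-)metric, which is why the optimization below works on A directly.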

Page 11

General Metric Learning Objective

• Find a parameterized distance function D_θ that minimizes the total leave-one-out cross-validation loss

L = Σ_i (ŷ_i − y_i)²

– e.g. params θ = elements A_ij of the A matrix

• Since we’re solving for A, not M, the optimization is non-convex ⇒ use gradient descent

Page 12

Gradient Computation

∂L/∂A = 4A Σ_i (ŷ_i − y_i) Σ_j (ŷ_j − y_j) k_ij x_ij x_ij^T

where x_ij = x_i − x_j

For a fast implementation:
– Don’t sum over all i–j pairs; only go up to ~1000 nearest neighbors for each sample i
– Maintain the nearest neighbors in a heap-tree structure, updating the heap tree every 15 gradient steps
– Ignore sufficiently small values of k_ij (< e^−34)
– Even better data structures: cover trees, k-d trees
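A minimal numpy sketch of the leave-one-out estimates and this gradient expression, using full pairwise sums rather than the nearest-neighbor truncation described above; function names and the toy data are illustrative, not from the slides.

```python
import numpy as np

def mlkr_loo_estimates(X, y, A):
    """Leave-one-out kernel regression estimates under D_ij = ||A(x_i-x_j)||^2."""
    Z = X @ A.T                                          # transformed points
    D2 = np.sum((Z[:, None, :] - Z[None, :, :]) ** 2, axis=2)
    K = np.exp(-D2)
    np.fill_diagonal(K, 0.0)                             # exclude j = i (leave-one-out)
    return (K @ y) / K.sum(axis=1)

def mlkr_gradient(X, y, A):
    """Gradient expression from the slide:
    dL/dA = 4 A sum_i (yhat_i - y_i) sum_j (yhat_j - y_j) k_ij x_ij x_ij^T."""
    Z = X @ A.T
    D2 = np.sum((Z[:, None, :] - Z[None, :, :]) ** 2, axis=2)
    K = np.exp(-D2)
    np.fill_diagonal(K, 0.0)
    err = mlkr_loo_estimates(X, y, A) - y                # residuals yhat_i - y_i
    W = np.outer(err, err) * K                           # (yhat_i-y_i)(yhat_j-y_j) k_ij
    diff = X[:, None, :] - X[None, :, :]                 # x_ij = x_i - x_j
    G = np.einsum('ij,ija,ijb->ab', W, diff, diff)       # sum of weighted outer products
    return 4.0 * A @ G

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
y = 2.0 * X[:, 0]                                        # target depends on dim 0 only
A = np.eye(3)
print(mlkr_gradient(X, y, A).shape)
```

Note that when every leave-one-out residual is zero the gradient vanishes, as it should at a perfect fit.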

Page 13

Learned Distance Metric example

[Two panels: neighborhoods with orig. Euclidean D < 1 vs. learned D < 1]

Page 14

“Twin Peaks” test

n = 8000

Training:
– we added 3 dimensions with 1000% noise
– we rotated 5 dimensions randomly

Page 15

Input Variance

Noise Signal

Page 16

Test data


Page 17


Test data

Page 18

Output Variance

Signal Noise

Page 19

Dim. Reduction with MLKR
• FG-NET face data: 82 persons, 984 face images w/ age


Page 21

Dim. Reduction with MLKR
– Force A to be rectangular
– Project onto eigenvectors of A
– Allows visualization of data
– Power Management data (d = 21)

Page 22

Robot arm results (8- and 32-dim)

regression error

Page 23

© 2006 IBM Corporation

IBM

Unity Data Center Prototype

Objective: Learn long-range resource value estimates for each application manager

State Variables (~48):
– Arrival rate
– Response Time
– Queue Length
– iatVariance
– rtVariance

Action: # of servers allocated by Arbiter

Reward: SLA(Resp. Time)

[Architecture diagram: a Resource Arbiter allocates 8 xSeries servers among Trade3 App Managers (WebSphere 5.1 + DB2) and a Batch App Manager. Each App Manager reports Value(#srvrs) estimates, driven by demand (HTTP req/sec) and Value(RT) from its SLA; allocations are revised every 5 sec to maximize total SLA revenue.]

(Tesauro, AAAI 2005; Tesauro et al., ICAC 2006)

Page 24


Power & Performance Management

Objective: Manage systems to multiple, cross-discipline objectives: minimize Resp. Time and minimize Power Usage

State Variables (21):

– Power Cap

– Power Usage

– CPU Utilization

– Temperature

– # of requests arrived

– Workload intensity (# Clients)

– Response Time

Action: Power Cap

Reward: SLA(Resp. Time) – Power Usage

(Kephart et al., ICAC 2007)

Page 25

IBM Regression Results — test error

[Bar chart comparing test error of MLKR against other regression methods]

Page 26

Metric Learning for RL basis function construction (Keller et al. ICML 2006)

• RL dataset of state–action–reward tuples {(s_i, a_i, r_i), i = 1,…,N}

Page 27

Value Iteration

• Define an iterative “bootstrap” calculation:

V_{k+1}(s) = max_a Σ_{s'} P^a_{ss'} [ R^a_{ss'} + γ V_k(s') ]

• Each round of VI must iterate over all states in the state space
• Try to speed this up using state aggregation (Bertsekas & Castanon, 1989)
• Idea: Use NCA to aggregate states:
– project states into a lower-dim representation; keep states with similar Bellman error close together
– use the projected states to define a set of basis functions {φ_i}
– learn a linear value function over the basis functions: V = Σ_i θ_i φ_i
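The bootstrap update can be sketched in a few lines of numpy for a tabular MDP; the toy transition and reward arrays below are illustrative, not from the slides.

```python
import numpy as np

def value_iteration(P, R, gamma=0.9, iters=100):
    """Tabular VI: V_{k+1}(s) = max_a sum_s' P[a,s,s'] * (R[a,s,s'] + gamma * V_k(s')).
    P[a,s,s'] are transition probabilities, R[a,s,s'] immediate rewards."""
    n_states = P.shape[1]
    V = np.zeros(n_states)
    for _ in range(iters):
        # Q[a,s] = expected reward plus discounted value of the successor state
        Q = np.einsum('ast,ast->as', P, R + gamma * V[None, None, :])
        V = Q.max(axis=0)   # greedy backup over actions
    return V

# two-state, two-action toy MDP: action 1 moves to the rewarding state 1
P = np.array([[[1.0, 0.0], [0.0, 1.0]],    # action 0: stay put
              [[0.0, 1.0], [0.0, 1.0]]])   # action 1: go to state 1
R = np.zeros((2, 2, 2))
R[:, :, 1] = 1.0                            # reward 1 for landing in state 1
V = value_iteration(P, R)
print(V)
```

With gamma = 0.9 this converges to V ≈ [10, 10]: each full sweep touches every state, which is exactly the cost the state-aggregation idea above tries to reduce.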

Page 28

Chopra et al. 2005: Similarity metric for image verification

Problem: Given a pair of face images, decide if they are from the same person.

Page 29

Chopra et al. 2005: Similarity metric for image verification

Too difficult for a linear mapping!

Problem: Given a pair of face images, decide if they are from the same person.
