Visualization using tSNE Yan Xu Jun 7, 2013

Upload: yan-xu

Post on 08-Jul-2015

DESCRIPTION

An introduction to tSNE in the background of dimension reduction

TRANSCRIPT

Page 1: Visualization using tSNE

Visualization using tSNE

Yan Xu

Jun 7, 2013

Page 2: Visualization using tSNE

Dimension Reduction Overview

Dimension reduction: Linear (PCA) or Nonlinear; Parametric (LDA) or Non-parametric; Global (ISOMAP, MDS) or Local (LLE, SNE)

tSNE (t-distributed Stochastic Neighbor Embedding)

Timeline: MDS -> SNE (2002) -> sym SNE -> UNI-SNE (2007) -> tSNE (2008) -> Barnes-Hut-SNE (2013)

Annotations: Local + probability; crowding problem; easier implementation; more stable and faster solution; O(N^2) -> O(N log N)

Page 3: Visualization using tSNE

MDS: Multi-Dimensional Scaling

• Multi-Dimensional Scaling arranges the low-dimensional points so as to minimize the discrepancy between the pairwise distances in the original space and the pairwise distances in the low-D space.

$$\mathrm{Cost} = \sum_{i<j} (d_{ij} - \hat{d}_{ij})^2, \qquad d_{ij} = \|x_i - x_j\|^2, \qquad \hat{d}_{ij} = \|y_i - y_j\|^2$$
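As a rough illustration of the MDS objective on this slide, the sketch below evaluates the cost for a tiny point set in plain Python. The names `sq_dist` and `mds_cost` and the toy data are assumptions made for this example. Squared distances are used on both sides, which is one reading of the slide's formula.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points given as sequences."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def mds_cost(X, Y):
    """Sum over pairs i < j of (d_ij - dhat_ij)^2, where d_ij is measured
    in the original space (X) and dhat_ij in the low-D map (Y)."""
    n = len(X)
    cost = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            cost += (sq_dist(X[i], X[j]) - sq_dist(Y[i], Y[j])) ** 2
    return cost


# Toy data (hypothetical): three 3-D points and two candidate 1-D layouts.
X = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0)]
Y_spread = [(0.0,), (1.0,), (-2.0,)]   # keeps two of the three distances
Y_collapsed = [(0.0,), (0.0,), (0.0,)]  # all points on top of each other

print(mds_cost(X, Y_spread), mds_cost(X, Y_collapsed))
```

The spread-out layout gets the lower cost because it reproduces two of the three pairwise distances exactly, while the collapsed layout reproduces none.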

Page 4: Visualization using tSNE

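$$\mathrm{Cost} = \sum_{ij} \left( \frac{\|x_i - x_j\| - \|y_i - y_j\|}{\|x_i - x_j\|} \right)^2$$

Here $\|x_i - x_j\|$ is the high-D distance and $\|y_i - y_j\|$ is the low-D distance.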

Sammon mapping from MDS

It puts too much emphasis on getting very small distances exactly right. It’s slow to optimize and also gets stuck in different local optima each time.
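To make the remark about very small distances concrete, the following sketch evaluates the Sammon cost for two point pairs that carry the same absolute error. The function names and the numbers are assumptions for illustration, and the sum runs over ordered pairs i != j, which is one possible reading of the slide.

```python
import math


def dist(a, b):
    """Euclidean distance between two points given as sequences."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def sammon_cost(X, Y):
    """Sum over ordered pairs i != j of the squared relative distance error."""
    cost = 0.0
    for i in range(len(X)):
        for j in range(len(X)):
            if i != j:
                d_high = dist(X[i], X[j])
                d_low = dist(Y[i], Y[j])
                cost += ((d_high - d_low) / d_high) ** 2
    return cost


# The same absolute error (0.1) on a short and on a long distance.
short_pair = sammon_cost([(0.0,), (0.2,)], [(0.0,), (0.3,)])
long_pair = sammon_cost([(0.0,), (10.0,)], [(0.0,), (10.1,)])
print(short_pair, long_pair)
```

Because the error is divided by the high-D distance, the short pair is penalised far more heavily than the long pair, even though both are off by the same amount.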

Global to Local?

Page 5: Visualization using tSNE

The idea is to make the local configurations of points in the low-dimensional space resemble the local configurations in the high-dimensional space.

Maps that preserve local geometry

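$$\mathrm{Cost} = \sum_i \Big\| x_i - \sum_{j \in N(i)} w_{ij} x_j \Big\|^2, \qquad \sum_{j \in N(i)} w_{ij} = 1$$

$$\mathrm{Cost} = \sum_i \Big\| y_i - \sum_{j \in N(i)} w_{ij} y_j \Big\|^2 \qquad \text{(fixed weights)}$$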

Find the y that minimizes the cost, subject to the constraint that the y have unit variance on each dimension.

LLE (Locally Linear Embedding)

Page 6: Visualization using tSNE

A probabilistic version of local MDS:

Stochastic Neighbor Embedding (SNE)

• It is more important to get local distances right than non-local ones.

• Stochastic neighbor embedding has a probabilistic way of deciding if a pairwise distance is “local”.

• Convert each high-dimensional similarity into the probability that one data point will pick the other data point as its neighbor.

$$p_{j|i} = \frac{\exp(-\|x_i - x_j\|^2 / 2\sigma_i^2)}{\sum_{k \neq i} \exp(-\|x_i - x_k\|^2 / 2\sigma_i^2)}$$

probability of picking j given i in high D

$$q_{j|i} = \frac{\exp(-\|y_i - y_j\|^2)}{\sum_{k \neq i} \exp(-\|y_i - y_k\|^2)}$$

probability of picking j given i in low D
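The two neighbor-picking probabilities can be sketched in a few lines of plain Python. This is a minimal illustration, not a reference implementation: the function names, the `sigma=None` convention for the low-D case, and the toy points are all assumptions.

```python
import math


def sq_dist(a, b):
    """Squared Euclidean distance between two points given as sequences."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def conditional_probs(points, i, sigma=None):
    """Return {j: probability that point i picks point j as its neighbor}.

    With a numeric sigma this is the high-D form p_{j|i} (Gaussian with
    variance sigma^2 centred on point i). With sigma=None it is the low-D
    form q_{j|i}, which has no 2*sigma^2 scaling.
    """
    scale = 1.0 if sigma is None else 2.0 * sigma ** 2
    weights = {
        j: math.exp(-sq_dist(points[i], points[j]) / scale)
        for j in range(len(points))
        if j != i
    }
    total = sum(weights.values())
    return {j: w / total for j, w in weights.items()}


# Toy data (hypothetical): point 1 is close to point 0, point 2 is far away.
X = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0)]
p = conditional_probs(X, 0, sigma=1.0)
q = conditional_probs(X, 0)
print(p, q)
```

In both cases the nearby point receives almost all of the probability mass, which is the sense in which the method decides probabilistically whether a pairwise distance is "local".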

Page 7: Visualization using tSNE

Picking the radius of the Gaussian that is used to compute the p’s

• We need to use different radii in different parts of the space so that we keep the effective number of neighbors about constant.

• A big radius leads to a high entropy for the distribution over neighbors of i. A small radius leads to a low entropy.

• So decide what entropy you want and then find the radius that produces that entropy.

• It’s easier to specify perplexity:

$$p_{j|i} = \frac{\exp(-\|x_i - x_j\|^2 / 2\sigma_i^2)}{\sum_{k \neq i} \exp(-\|x_i - x_k\|^2 / 2\sigma_i^2)}$$
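The radius search described on this slide can be sketched as a bisection on sigma. The sketch assumes the usual definition of perplexity as 2 raised to the Shannon entropy (in bits) of the distribution over neighbors. The function names, search bounds, and toy distances are illustrative assumptions.

```python
import math


def cond_probs(sq_dists, sigma):
    """p_{j|i} for one point i, from its squared distances to the other points."""
    d_min = min(sq_dists)  # shift for numerical stability; cancels on normalising
    weights = [math.exp(-(d - d_min) / (2.0 * sigma ** 2)) for d in sq_dists]
    total = sum(weights)
    return [w / total for w in weights]


def perplexity(probs):
    """2 ** H, where H is the Shannon entropy of probs in bits."""
    entropy = -sum(p * math.log2(p) for p in probs if p > 0.0)
    return 2.0 ** entropy


def sigma_for_perplexity(sq_dists, target, lo=1e-3, hi=1e3, steps=100):
    """Bisection on sigma: a bigger radius gives a higher entropy."""
    for _ in range(steps):
        mid = (lo + hi) / 2.0
        if perplexity(cond_probs(sq_dists, mid)) < target:
            lo = mid  # distribution too peaked, widen the Gaussian
        else:
            hi = mid  # distribution too flat, narrow the Gaussian
    return (lo + hi) / 2.0


# Toy data (hypothetical): squared distances from point i to five other points.
sq_dists = [1.0, 4.0, 9.0, 16.0, 25.0]
sigma = sigma_for_perplexity(sq_dists, target=3.0)
print(sigma, perplexity(cond_probs(sq_dists, sigma)))
```

Running the search separately for every point i gives each point its own sigma, so that points in dense and sparse regions end up with roughly the same effective number of neighbors.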

Page 8: Visualization using tSNE

The cost function for a low-dimensional representation

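$$\mathrm{Cost} = \sum_i KL(P_i \| Q_i) = \sum_i \sum_j p_{j|i} \log \frac{p_{j|i}}{q_{j|i}}$$

Gradient descent:

$$\frac{\partial C}{\partial y_i} = 2 \sum_j (y_i - y_j)(p_{j|i} - q_{j|i} + p_{i|j} - q_{i|j})$$

Gradient update with a momentum term, controlled by a learning rate and a momentum coefficient.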
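A minimal sketch of the SNE cost, its gradient, and a momentum update is given below in plain Python. The slide's own update formula is not reproduced in this transcript, so the update here is a standard heavy-ball step and should be read as an assumption. All function names, the toy matrix `P`, and the learning-rate and momentum values are likewise hypothetical.

```python
import math


def cond_probs_low(Y):
    """Q[i][j] = q_{j|i} from the low-D Gaussian; Q[i][i] is 0."""
    n = len(Y)
    Q = []
    for i in range(n):
        w = [0.0 if j == i else
             math.exp(-sum((a - b) ** 2 for a, b in zip(Y[i], Y[j])))
             for j in range(n)]
        s = sum(w)
        Q.append([wj / s for wj in w])
    return Q


def sne_cost(P, Q):
    """Sum over i of KL(P_i || Q_i), with P[i][j] = p_{j|i}."""
    n = len(P)
    return sum(P[i][j] * math.log(P[i][j] / Q[i][j])
               for i in range(n) for j in range(n) if i != j)


def sne_gradient(P, Y):
    """dC/dy_i = 2 * sum_j (y_i - y_j)(p_j|i - q_j|i + p_i|j - q_i|j)."""
    Q = cond_probs_low(Y)
    n, dim = len(Y), len(Y[0])
    grad = [[0.0] * dim for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                coeff = 2.0 * (P[i][j] - Q[i][j] + P[j][i] - Q[j][i])
                for d in range(dim):
                    grad[i][d] += coeff * (Y[i][d] - Y[j][d])
    return grad


def momentum_step(Y, velocity, grad, learning_rate, momentum):
    """Heavy-ball update: keep part of the previous step, then descend."""
    velocity = [[momentum * v - learning_rate * g for v, g in zip(vr, gr)]
                for vr, gr in zip(velocity, grad)]
    Y = [[y + v for y, v in zip(yr, vr)] for yr, vr in zip(Y, velocity)]
    return Y, velocity


# Toy problem (hypothetical): fixed conditionals P and a 1-D starting map.
P = [[0.0, 0.9, 0.1], [0.8, 0.0, 0.2], [0.3, 0.7, 0.0]]
Y = [[0.0], [1.0], [0.5]]
velocity = [[0.0] for _ in Y]
initial_cost = sne_cost(P, cond_probs_low(Y))
for _ in range(200):
    grad = sne_gradient(P, Y)
    Y, velocity = momentum_step(Y, velocity, grad,
                                learning_rate=0.05, momentum=0.5)
final_cost = sne_cost(P, cond_probs_low(Y))
print(initial_cost, final_cost)
```

The velocity term carries part of the previous step forward, which is what the momentum coefficient controls, while the learning rate scales the contribution of the current gradient.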

Page 9: Visualization using tSNE

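Simpler version SNE: Turning conditional probabilities into pairwise probabilities

$$p_{ij} = \frac{p_{j|i} + p_{i|j}}{2n}, \qquad \sum_j p_{ij} > \frac{1}{2n}$$

$$p_{ij} = \frac{\exp(-\|x_i - x_j\|^2 / 2\sigma^2)}{\sum_{k \neq l} \exp(-\|x_k - x_l\|^2 / 2\sigma^2)}$$

$$\mathrm{Cost} = KL(P \| Q) = \sum_i \sum_j p_{ij} \log \frac{p_{ij}}{q_{ij}}$$

$$\frac{\partial C}{\partial y_i} = 4 \sum_j (p_{ij} - q_{ij})(y_i - y_j)$$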
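The symmetrisation step can be checked numerically. The sketch below averages the two conditional probabilities for each pair and divides by n. The function name `symmetrize` and the toy matrix are assumptions for illustration.

```python
def symmetrize(P_cond):
    """Turn conditionals P_cond[i][j] = p_{j|i} into pairwise p_ij.

    p_ij = (p_{j|i} + p_{i|j}) / (2n), so the result is symmetric and
    sums to 1 over all pairs when every row of P_cond sums to 1.
    """
    n = len(P_cond)
    return [[(P_cond[i][j] + P_cond[j][i]) / (2.0 * n) for j in range(n)]
            for i in range(n)]


# Toy conditionals (hypothetical); each row sums to 1, the diagonal is 0.
P_cond = [[0.0, 0.9, 0.1],
          [0.8, 0.0, 0.2],
          [0.3, 0.7, 0.0]]
P = symmetrize(P_cond)
row_sums = [sum(row) for row in P]
print(P, row_sums)
```

Every row sum stays above 1/(2n), so each data point keeps a non-negligible share of the cost function.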

Page 10: Visualization using tSNE

MNIST database of handwritten digits: 28×28 images

Problem?

Page 11: Visualization using tSNE

Why SNE does not have gaps between classes

Crowding problem: the area accommodating moderately distant datapoints is not large enough compared with the area accommodating nearby datapoints.

A uniform background model (UNI-SNE) eliminates this effect and allows gaps between classes to appear: $q_{ij}$ can never fall below $\frac{2\rho}{n(n-1)}$, where $\rho$ is the mixing proportion of the uniform background.

Page 12: Visualization using tSNE
Page 13: Visualization using tSNE

From UNI-SNE to t-SNE

$$q_{ij} = \frac{(1 + \|y_i - y_j\|^2)^{-1}}{\sum_{k \neq l} (1 + \|y_k - y_l\|^2)^{-1}}$$

High dimension: Convert distances into probabilities using a Gaussian distribution.

Low dimension: Convert distances into probabilities using a probability distribution that has much heavier tails than a Gaussian: Student’s t-distribution (V: the number of degrees of freedom).

[Figure: standard normal distribution compared with the t-distribution with V = 1]
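The effect of the heavier tails can be seen by comparing the two kernels at a moderate distance and by building the low-D joint distribution with the Student-t kernel. This is a small sketch in plain Python. The function names and the toy map points are assumptions for illustration.

```python
import math


def gaussian_kernel(d):
    """Unnormalised Gaussian similarity used by SNE in the low-D map."""
    return math.exp(-d ** 2)


def student_t_kernel(d):
    """Unnormalised Student-t similarity with one degree of freedom."""
    return 1.0 / (1.0 + d ** 2)


def tsne_q(Y):
    """Pairwise q_ij over all pairs k != l, using the Student-t kernel."""
    n = len(Y)
    W = [[0.0 if i == j else
          student_t_kernel(math.sqrt(sum((a - b) ** 2
                                         for a, b in zip(Y[i], Y[j]))))
          for j in range(n)] for i in range(n)]
    total = sum(sum(row) for row in W)
    return [[w / total for w in row] for row in W]


# At distance 3 the Gaussian similarity is almost zero, the t similarity is not.
print(gaussian_kernel(3.0), student_t_kernel(3.0))

# Toy map points (hypothetical).
Y = [(0.0, 0.0), (1.0, 0.0), (4.0, 0.0)]
Q = tsne_q(Y)
print(Q)
```

Because moderately distant pairs still receive noticeable similarity under the t kernel, they can be placed further apart in the map without a large penalty, which is how the heavy tails relieve crowding.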

Page 14: Visualization using tSNE

Compare tSNE with SNE and UNI-SNE

[Figure: plots comparing tSNE with SNE and UNI-SNE]

Page 15: Visualization using tSNE

Optimization method for tSNE

$$q_{ij} = \frac{(1 + \|y_i - y_j\|^2)^{-1}}{\sum_{k \neq l} (1 + \|y_k - y_l\|^2)^{-1}}$$

$$p_{j|i} = \frac{\exp(-\|x_i - x_j\|^2 / 2\sigma_i^2)}{\sum_{k \neq i} \exp(-\|x_i - x_k\|^2 / 2\sigma_i^2)}$$

Page 16: Visualization using tSNE

Optimization method for tSNE

Tricks:

1. Keep momentum term small until the map points have become moderately well organized.

2. Use adaptive learning rate described by Jacobs (1988), which gradually increases the learning rate in directions where the gradient is stable.

3. Early compression: force map points to stay close together at the start of the optimization.

4. Early exaggeration: multiply all the $p_{ij}$’s by 4, in the initial stages of the optimization.
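Tricks 1 and 4 can be sketched in a short optimization loop. The sketch assumes the standard t-SNE gradient, $\partial C / \partial y_i = 4 \sum_j (p_{ij} - q_{ij})(y_i - y_j)(1 + \|y_i - y_j\|^2)^{-1}$, which does not appear in this transcript. The iteration counts, learning rate, and momentum values (0.5, then 0.8) are illustrative assumptions. The adaptive learning rate and early compression are left out to keep the example short.

```python
import math


def tsne_q(Y):
    """Return (Q, W): normalised q_ij and the raw Student-t weights."""
    n = len(Y)
    W = [[0.0 if i == j else
          1.0 / (1.0 + sum((a - b) ** 2 for a, b in zip(Y[i], Y[j])))
          for j in range(n)] for i in range(n)]
    total = sum(sum(row) for row in W)
    return [[w / total for w in row] for row in W], W


def kl_cost(P, Q):
    """KL(P || Q) summed over all ordered pairs i != j."""
    n = len(P)
    return sum(P[i][j] * math.log(P[i][j] / Q[i][j])
               for i in range(n) for j in range(n) if i != j)


def tsne_gradient(P, Y):
    """4 * sum_j (p_ij - q_ij) (y_i - y_j) / (1 + ||y_i - y_j||^2)."""
    Q, W = tsne_q(Y)
    n, dim = len(Y), len(Y[0])
    grad = [[0.0] * dim for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                coeff = 4.0 * (P[i][j] - Q[i][j]) * W[i][j]
                for d in range(dim):
                    grad[i][d] += coeff * (Y[i][d] - Y[j][d])
    return grad


def optimize(P, Y, iters=300, exaggeration=4.0, exaggeration_iters=50,
             switch_iter=100, learning_rate=0.1):
    velocity = [[0.0] * len(Y[0]) for _ in Y]
    for t in range(iters):
        # Trick 4: early exaggeration of the p_ij in the initial stage.
        factor = exaggeration if t < exaggeration_iters else 1.0
        P_used = [[factor * p for p in row] for row in P]
        # Trick 1: small momentum first, larger momentum later.
        momentum = 0.5 if t < switch_iter else 0.8
        grad = tsne_gradient(P_used, Y)
        velocity = [[momentum * v - learning_rate * g
                     for v, g in zip(vr, gr)]
                    for vr, gr in zip(velocity, grad)]
        Y = [[y + v for y, v in zip(yr, vr)] for yr, vr in zip(Y, velocity)]
    return Y


# Toy problem (hypothetical): a symmetric P that sums to 1 and a 2-D start.
P = [[0.0, 0.3, 0.05], [0.3, 0.0, 0.15], [0.05, 0.15, 0.0]]
Y_start = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]]
Y_final = optimize(P, Y_start)
print(kl_cost(P, tsne_q(Y_start)[0]), kl_cost(P, tsne_q(Y_final)[0]))
```

While the $p_{ij}$ are exaggerated they sum to 4 but the $q_{ij}$ still sum to 1, so the attractive terms dominate and points with high $p_{ij}$ are pulled together early in the run.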

Page 17: Visualization using tSNE

[Figure: 6000 MNIST digits embedded with Isomap, Locally Linear Embedding, t-SNE, and Sammon mapping]

Page 18: Visualization using tSNE

tSNE vs Diffusion maps

Diffusion distance:

Diffusion maps:

$$p_{ij}^{(1)} = e^{-\|x_i - x_j\|^2}$$

$$p_{ij}^{(t)} = \sum_{k=1}^{n} p_{ik}^{(t-1)} p_{kj}^{(t-1)}$$

Page 19: Visualization using tSNE

Weakness

1. It’s unclear how t-SNE performs on general dimensionality reduction tasks;

2. The relatively local nature of t-SNE makes it sensitive to the curse of the intrinsic dimensionality of the data;

3. It’s not guaranteed to converge to a global optimum of its cost function.

Page 20: Visualization using tSNE

References:

t-SNE homepage:

http://homepage.tudelft.nl/19j49/t-SNE.html

Advanced Machine Learning: Lecture11: Non-linear Dimensionality Reduction

http://www.cs.toronto.edu/~hinton/csc2535/lectures.html

Plugin Ad: tSNE in Farsight

// Create the SNE plot window, set its perplexity, attach the data, and show it.
splot = new SNEPlotWindow(this);
splot->setPerplexity(perplexity);
splot->setModels(table, selection);
splot->show();