one the role and impact of the metaparameters in t ... · • sne and t-sne are nowadays considered...

39
One the Role and Impact of the Metaparameters in t-distributed Stochastic Neighbor Embedding John A. Lee and Michel Verleysen Machine Learning Group Université catholique de Louvain Louvain-la-Neuve, Belgium [email protected]

Upload: others

Post on 10-Jun-2020

5 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: One the Role and Impact of the Metaparameters in t ... · • SNE and t-SNE are nowadays considered as ‘good’ methods for NDLR • Examples Compstat 2010 On the role and impact

One the Role and Impact of the Metaparameters

in t-distributed Stochastic Neighbor Embedding

John A. Lee and Michel VerleysenMachine Learning Group Université catholique de LouvainLouvain-la-Neuve, [email protected]

Page 2: One the Role and Impact of the Metaparameters in t ... · • SNE and t-SNE are nowadays considered as ‘good’ methods for NDLR • Examples Compstat 2010 On the role and impact

Motivation for nonlinear dimensionality reduction

• High-dimensional data are– difficult to represent– difficult to understand– difficult to analyze

• Motivation #1:– To visualize data living in a d-dimensional space (d > 3)

• Motivation #2:– Models (regression, classification, clustering) based on high-dimensional

data suffer from the curse of dimensionality– Need to reduce the dimension of data while keeping information content!

Compstat 2010 On the role and impact of the metaparameters in t-distributed SNE 2

Motivation

Page 3: One the Role and Impact of the Metaparameters in t ... · • SNE and t-SNE are nowadays considered as ‘good’ methods for NDLR • Examples Compstat 2010 On the role and impact

Visualization

• These are data• It is difficult to see something…

Compstat 2010 On the role and impact of the metaparameters in t-distributed SNE 3

Motivation

annual increase (%), infant mortality (‰), illiteracy ratio (%), school attendance (%), GIP, annual GIP increase (%)

Page 4: One the Role and Impact of the Metaparameters in t ... · • SNE and t-SNE are nowadays considered as ‘good’ methods for NDLR • Examples Compstat 2010 On the role and impact

Visualization

• These are the same data• under different visualization paradigms• possible to see groups, relations, outliers, …

Compstat 2010 On the role and impact of the metaparameters in t-distributed SNE 4

Motivation

Page 5: One the Role and Impact of the Metaparameters in t ... · • SNE and t-SNE are nowadays considered as ‘good’ methods for NDLR • Examples Compstat 2010 On the role and impact

Not all NLDR methods perform equally !

Compstat 2010 On the role and impact of the metaparameters in t-distributed SNE 5

Motivation

Geodesic NLM

CDAIsomap

Page 6: One the Role and Impact of the Metaparameters in t ... · • SNE and t-SNE are nowadays considered as ‘good’ methods for NDLR • Examples Compstat 2010 On the role and impact

Stochastic Neighbor Embedding

• SNE and t-SNE are nowadays considered as ‘good’ methods for NDLR• Examples

Compstat 2010 On the role and impact of the metaparameters in t-distributed SNE 6

Motivation

From: L. Van der Maaten & G. Hinton, Visualizing Data using t-SNE, Journal of Machine Learning Research 9 (2008) 2579-2605

t-SNE MDS

Page 7: One the Role and Impact of the Metaparameters in t ... · • SNE and t-SNE are nowadays considered as ‘good’ methods for NDLR • Examples Compstat 2010 On the role and impact

Stochastic Neighbor Embedding

• SNE and t-SNE are nowadays considered as ‘good’ methods for NDLR• Examples

Compstat 2010 On the role and impact of the metaparameters in t-distributed SNE 7

Motivation

From: L. Van der Maaten & G. Hinton, Visualizing Data using t-SNE, Journal of Machine Learning Research 9 (2008) 2579-2605

t-SNE MDS

Page 8: One the Role and Impact of the Metaparameters in t ... · • SNE and t-SNE are nowadays considered as ‘good’ methods for NDLR • Examples Compstat 2010 On the role and impact

Outline

• NDLR: a historical perspective– stress function– intrusion and extrusions– geodesic distances

• SNE and t-SNE– algorithm– gradient– transformed distances

• Experiments– with Euclidean distances– with geodesic distances

• Conclusions

Compstat 2010 On the role and impact of the metaparameters in t-distributed SNE 8

NDLR: a historical perspective

Page 9: One the Role and Impact of the Metaparameters in t ... · • SNE and t-SNE are nowadays considered as ‘good’ methods for NDLR • Examples Compstat 2010 On the role and impact

From MDS to more general cost functions

• MDS follows the idea of

• Extension:

to give more importance to– small distances– close data– …

Compstat 2010 On the role and impact of the metaparameters in t-distributed SNE 9

NDLR: a historical perspective → Stress function

( )∑<

−ji

ijijX

d222min δ

jiij

jiij

xxd

yy

−=

−=δ where

( )∑<

−ji

ijijijX

dw222min δ

Traditional « stress » function:

( )∑<

−ji

ijijijX

dw2

min δ

Breakthrough #1

Page 10: One the Role and Impact of the Metaparameters in t ... · • SNE and t-SNE are nowadays considered as ‘good’ methods for NDLR • Examples Compstat 2010 On the role and impact

Limitations of linear projections

• Even simple manifolds can be poorly projected

Compstat 2010 On the role and impact of the metaparameters in t-distributed SNE 10

NDLR: a historical perspective → Intrusions and extrusions

Page 11: One the Role and Impact of the Metaparameters in t ... · • SNE and t-SNE are nowadays considered as ‘good’ methods for NDLR • Examples Compstat 2010 On the role and impact

Limitations of linear projections

• Even simple manifolds can be poorly projected

• Points originally far from eachother are projected close: this is an intrusion

Compstat 2010 On the role and impact of the metaparameters in t-distributed SNE 11

NDLR: a historical perspective → Intrusions and extrusions

Page 12: One the Role and Impact of the Metaparameters in t ... · • SNE and t-SNE are nowadays considered as ‘good’ methods for NDLR • Examples Compstat 2010 On the role and impact

Nonlinear projections

• Goal: to unfold, rather than to project (linearly)

Compstat 2010 On the role and impact of the metaparameters in t-distributed SNE 12

NDLR: a historical perspective → Intrusions and extrusions

Page 13: One the Role and Impact of the Metaparameters in t ... · • SNE and t-SNE are nowadays considered as ‘good’ methods for NDLR • Examples Compstat 2010 On the role and impact

Nonlinear projections

• Goal: to unfold, rather than to project (linearly)

• Intrusions can be hopefully decreased, but extrusions could appear

Compstat 2010 On the role and impact of the metaparameters in t-distributed SNE 13

NDLR: a historical perspective → Intrusions and extrusions

Page 14: One the Role and Impact of the Metaparameters in t ... · • SNE and t-SNE are nowadays considered as ‘good’ methods for NDLR • Examples Compstat 2010 On the role and impact

The user’s point of view

• Favouring intrusions or extrusions is related to the application (user’s point of view)

• General way of handling the compromise:

• Nowadays, few methods acknowledge this need for a trade-off !

Compstat 2010 On the role and impact of the metaparameters in t-distributed SNE 14

NDLR: a historical perspective → Intrusions and extrusions

( )

−+

=

σδ

λσ

λ ijijij f

dfw 1 Breakthrough #2

allows intrusions allows extrusions

Page 15: One the Role and Impact of the Metaparameters in t ... · • SNE and t-SNE are nowadays considered as ‘good’ methods for NDLR • Examples Compstat 2010 On the role and impact

Geodesic distances

• Goal: to measure distances along the manifold

• Such distances are more easily preserved

Compstat 2010 On the role and impact of the metaparameters in t-distributed SNE 15

NDLR: a historical perspective → Geodesic distances

Breakthrough #3

Page 16: One the Role and Impact of the Metaparameters in t ... · • SNE and t-SNE are nowadays considered as ‘good’ methods for NDLR • Examples Compstat 2010 On the role and impact

Geodesic and graph distances

• Geodesic distances: finding the shortest way between data along the manifold

• Problem: the manifold is unknown → approximate it by a graph• It exists efficient algorithms for finding shortest paths• The graph can be built by connecting data in a k-neighborhood, or in

a ε-ball

Compstat 2010 On the role and impact of the metaparameters in t-distributed SNE 16

2-d data

Approximation ofGeodesic distance

NDLR: a historical perspective → Geodesic distances

Page 17: One the Role and Impact of the Metaparameters in t ... · • SNE and t-SNE are nowadays considered as ‘good’ methods for NDLR • Examples Compstat 2010 On the role and impact

Distance preservation methods

Euclideandistances in HD space

Geodesicdistances in HD space

Metric MDS Isomap

Favorsintrusions

SammonNLM

GeodesicNLM

Favorsextrusions CCA CDA

Compstat 2010 On the role and impact of the metaparameters in t-distributed SNE 17

NDLR: a historical perspective

( ) ( )( )2

1,,,∑

=−=

N

jixy jidjidE

( ) ( )( )( )∑

<=

−=

N

jii y

xyNLM jid

jidjidE

1

2

,

,,

( ) ( )( ) ( )( )∑<=

−=N

jii

xxyCCA jidFjidjidE1

2 ,,, λ

Page 18: One the Role and Impact of the Metaparameters in t ... · • SNE and t-SNE are nowadays considered as ‘good’ methods for NDLR • Examples Compstat 2010 On the role and impact

Distance preservation methods

Euclideandistances in HD space

Geodesicdistances in HD space

Metric MDS Isomap

Favorsintrusions

SammonNLM

GeodesicNLM

Favorsextrusions CCA CDA

Compstat 2010 On the role and impact of the metaparameters in t-distributed SNE 18

NDLR: a historical perspective

( ) ( )( )2

1,,,∑

=−=

N

jixy jidjidE

( ) ( )( )( )∑

<=

−=

N

jii y

xyNLM jid

jidjidE

1

2

,

,,

( ) ( )( ) ( )( )∑<=

−=N

jii

xxyCCA jidFjidjidE1

2 ,,, λ

Computational load ↓Performances ↓

Computational load ↑Performances ↑

Page 19: One the Role and Impact of the Metaparameters in t ... · • SNE and t-SNE are nowadays considered as ‘good’ methods for NDLR • Examples Compstat 2010 On the role and impact

Outline

• NDLR: a historical perspective– stress function– intrusion and extrusions– geodesic distances

• SNE and t-SNE– algorithm– gradient– transformed distances

• Experiments– with Euclidean distances– with geodesic distances

• Conclusions

Compstat 2010 On the role and impact of the metaparameters in t-distributed SNE 19

SNE and t-SNE

Page 20: One the Role and Impact of the Metaparameters in t ... · • SNE and t-SNE are nowadays considered as ‘good’ methods for NDLR • Examples Compstat 2010 On the role and impact

SNE and t-SNE

• In the original space, the similarity between yi and yj is defined as

• Similarities are not symmetric (individual widths) !• pj|i is the empirical probability of yj to be a neighbor of yi

Compstat 2010 On the role and impact of the metaparameters in t-distributed SNE 20

SNE and t-SNE → Algorithm

( ) ( )( )

=

=∑≠

otherwise

if0

ikiik

iijiij

g

gji

pλδ

λδλ ( )

−=

2exp

2uug

Page 21: One the Role and Impact of the Metaparameters in t ... · • SNE and t-SNE are nowadays considered as ‘good’ methods for NDLR • Examples Compstat 2010 On the role and impact

SNE and t-SNE

• In the original space, the similarity between yi and yj is defined as

• Similarities are not symmetric (individual widths) !• pj|i is the empirical probability of yj to be a neighbor of yi

• Individuals widths λi: set (individually) through a global « perplexity » parameter

Compstat 2010 On the role and impact of the metaparameters in t-distributed SNE 21

SNE and t-SNE → Algorithm

( )

−=

2exp

2uug

( )PPXTijpH

=2

( ) ( )( )

=

=∑≠

otherwise

if0

ikiik

iijiij

g

gji

pλδ

λδλ

Page 22: One the Role and Impact of the Metaparameters in t ... · • SNE and t-SNE are nowadays considered as ‘good’ methods for NDLR • Examples Compstat 2010 On the role and impact

SNE and t-SNE

• In the embedding space, the similarity between xi and xj is defined as

• Similarities are symmetric• t(u,n) is proportional to a Student t with n degrees of freedom

(n controls the thickness of the tail)• SNE: n → ∞ t-SNE: n = 1

Compstat 2010 On the role and impact of the metaparameters in t-distributed SNE 22

SNE and t-SNE → Algorithm

( ) ( )( )

=

=∑≠

otherwise,

, if0

lkkl

ijij

ndt

ndtji

nq ( )

+=

+−21

21,

n

nu

nut

Page 23: One the Role and Impact of the Metaparameters in t ... · • SNE and t-SNE are nowadays considered as ‘good’ methods for NDLR • Examples Compstat 2010 On the role and impact

SNE and t-SNE

• Now that similarties are defined in both spaces, how to compare them?

– This seems to be a major difference with respect to other methods, basedon square erros!

• E is minimized by gradient descent, to find locations xi.

Compstat 2010 On the role and impact of the metaparameters in t-distributed SNE 23

SNE and t-SNE → gradient

( )qpDE KL=

( ) ( ) ( )∑=

−+

−+=

∂∂ N

jji

ij

ijij

ixx

nd

nqp

nn

xE

121

22 λ

Page 24: One the Role and Impact of the Metaparameters in t ... · • SNE and t-SNE are nowadays considered as ‘good’ methods for NDLR • Examples Compstat 2010 On the role and impact

SNE and t-SNE

• Now that similarties are defined in both spaces, how to compare them?

– This seems to be a major difference with respect to other methods, basedon square erros!

• E is minimized by gradient descent, to find locations xi.

Compstat 2010 On the role and impact of the metaparameters in t-distributed SNE 24

SNE and t-SNE → gradient

( )qpDE KL=

( ) ( ) ( )∑=

−+

−+=

∂∂ N

jji

ij

ijij

ixx

nd

nqp

nn

xE

121

22 λ

xi moves towards xj

Page 25: One the Role and Impact of the Metaparameters in t ... · • SNE and t-SNE are nowadays considered as ‘good’ methods for NDLR • Examples Compstat 2010 On the role and impact

SNE and t-SNE

• Now that similarties are defined in both spaces, how to compare them?

– This seems to be a major difference with respect to other methods, basedon square erros!

• E is minimized by gradient descent, to find locations xi.

Compstat 2010 On the role and impact of the metaparameters in t-distributed SNE 25

SNE and t-SNE → gradient

( )qpDE KL=

( ) ( ) ( )∑=

−+

−+=

∂∂ N

jji

ij

ijij

ixx

nd

nqp

nn

xE

121

22 λ

xi moves towards xj

Similarity error –adjusts amplitude

Page 26: One the Role and Impact of the Metaparameters in t ... · • SNE and t-SNE are nowadays considered as ‘good’ methods for NDLR • Examples Compstat 2010 On the role and impact

SNE and t-SNE: gradient

• Now that similarities are defined in both spaces, how to compare them?

– This seems to be a major difference with respect to other methods, basedon square erros!

• E is minimized by gradient descent, to find locations xi.

Compstat 2010 On the role and impact of the metaparameters in t-distributed SNE 26

SNE and t-SNE → gradient

( )qpDE KL=

( ) ( ) ( )∑=

−+

−+=

∂∂ N

jji

ij

ijij

ixx

nd

nqp

nn

xE

121

22 λ

xi moves towards xj

Similarity error –adjusts amplitudeDamping

factor

Page 27: One the Role and Impact of the Metaparameters in t ... · • SNE and t-SNE are nowadays considered as ‘good’ methods for NDLR • Examples Compstat 2010 On the role and impact

SNE and t-SNE: gradient

• Damping factor is similar to in CCA and CDA:– Large distances are less important– Distances in the embedding space are used, to allow tears (favoring

extrusions)

Compstat 2010 On the role and impact of the metaparameters in t-distributed SNE 27

SNE and t-SNE → gradient

( ) ( ) ( )∑=

−+

−+=

∂∂ N

jji

ij

ijij

ixx

nd

nqp

nn

xE

121

22 λ

xi moves towards xj

Similarity error –adjusts amplitudeDamping

factor

( )ijdFλ

Page 28: One the Role and Impact of the Metaparameters in t ... · • SNE and t-SNE are nowadays considered as ‘good’ methods for NDLR • Examples Compstat 2010 On the role and impact

SNE and t-SNE: distributions

• Why different distributions for pij and qij ?• Remember that distances have often to be enlarged: heavier tails (in

the embedding space) help!

Compstat 2010 On the role and impact of the metaparameters in t-distributed SNE 28

SNE and t-SNE → distributions

( ) ( ) ( )∑=

−+

−+=

∂∂ N

jji

ij

ijij

ixx

nd

nqp

nn

xE

121

22 λ

xi moves towards xj

Similarity error –adjusts amplitudeDamping

factor

Page 29: One the Role and Impact of the Metaparameters in t ... · • SNE and t-SNE are nowadays considered as ‘good’ methods for NDLR • Examples Compstat 2010 On the role and impact

SNE and t-SNE: distributions

• Non-trivial solution of min E• After some (rough) approximations:

• Properties– f is monotonically increasing– with SNE (n → ∞):– if δij << λi, then

• t-SNE tries to preserved streched distances• SNE distances are scaled by λi

• n and λi act more or less in the same way

Compstat 2010 On the role and impact of the metaparameters in t-distributed SNE 29

SNE and t-SNE → distributions

( )( )

nn

nfdi

ijijij −

+=≈

2

2

1exp

λ

δδ

( ) iijijf λδδ =

( ) ( )1+= nf iijij λδδ

Page 30: One the Role and Impact of the Metaparameters in t ... · • SNE and t-SNE are nowadays considered as ‘good’ methods for NDLR • Examples Compstat 2010 On the role and impact

Outline

• NDLR: a historical perspective– stress function– intrusion and extrusions– geodesic distances

• SNE and t-SNE– algorithm– gradient– transformed distances

• Experiments– with Euclidean distances– with geodesic distances

• Conclusions

Compstat 2010 On the role and impact of the metaparameters in t-distributed SNE 30

Page 31: One the Role and Impact of the Metaparameters in t ... · • SNE and t-SNE are nowadays considered as ‘good’ methods for NDLR • Examples Compstat 2010 On the role and impact

Experiments

• Data: swiss roll

• Quality measures: in a K-neighborhood, we count the number of intrusions and extrusions. Then– QNX(K) measures the overall number of intrusions and extrusions

(higher QNX(K) means better quality)– BNX(K) measures the difference between the number of intrusions and

extrusions (positiveBNX(K) means intrusive)

• Use of both Euclidean and geodesic distances

Compstat 2010 On the role and impact of the metaparameters in t-distributed SNE 31

Experiments

Page 32: One the Role and Impact of the Metaparameters in t ... · • SNE and t-SNE are nowadays considered as ‘good’ methods for NDLR • Examples Compstat 2010 On the role and impact

Results with Euclidean distances

Compstat 2010 On the role and impact of the metaparameters in t-distributed SNE 32

Experiments → with Euclidean distances

increasingperplexity

Page 33: One the Role and Impact of the Metaparameters in t ... · • SNE and t-SNE are nowadays considered as ‘good’ methods for NDLR • Examples Compstat 2010 On the role and impact

Results with Euclidean distances

• Difficult problem! (lowvalues of QNX(K))

• t-SNE largely dependson perplexity

Compstat 2010 On the role and impact of the metaparameters in t-distributed SNE 33

Experiments → with Euclidean distances

increasingperplexity

Page 34: One the Role and Impact of the Metaparameters in t ... · • SNE and t-SNE are nowadays considered as ‘good’ methods for NDLR • Examples Compstat 2010 On the role and impact

Results with Euclidean distances

Compstat 2010 On the role and impact of the metaparameters in t-distributed SNE 34

Experiments → with Euclidean distances

Page 35: One the Role and Impact of the Metaparameters in t ... · • SNE and t-SNE are nowadays considered as ‘good’ methods for NDLR • Examples Compstat 2010 On the role and impact

Results with geodesic distances

Compstat 2010 On the role and impact of the metaparameters in t-distributed SNE 35

Experiments → with geodesic distances

increasingperplexity

Page 36: One the Role and Impact of the Metaparameters in t ... · • SNE and t-SNE are nowadays considered as ‘good’ methods for NDLR • Examples Compstat 2010 On the role and impact

Results with geodesic distances

• Geodesic distances facilitate the task

• CCA performs well!

• t-SNE still dependson perplexity, but large values help

Compstat 2010 On the role and impact of the metaparameters in t-distributed SNE 36

Experiments → with geodesic distances

increasingperplexity

Page 37: One the Role and Impact of the Metaparameters in t ... · • SNE and t-SNE are nowadays considered as ‘good’ methods for NDLR • Examples Compstat 2010 On the role and impact

Outline

• NDLR: a historical perspective– stress function– intrusion and extrusions– geodesic distances

• SNE and t-SNE– algorithm– gradient– transformed distances

• Experiments– with Euclidean distances– with geodesic distances

• Conclusions

Compstat 2010 On the role and impact of the metaparameters in t-distributed SNE 37

Page 38: One the Role and Impact of the Metaparameters in t ... · • SNE and t-SNE are nowadays considered as ‘good’ methods for NDLR • Examples Compstat 2010 On the role and impact

Conclusions

• t-SNE is a distance preservation method

• Stretching distances : good idea!• But transformation in t-SNE not always optimal (not data driven)• Careful tuning of parameters!

• Damping factor for large distances: good idea• But this does not solve the issue of non-Euclidean manifolds (ex:

hollow sphere)• Situation is better with clustered data (stretching large distances

improves the separation between clusters)

Compstat 2010 On the role and impact of the metaparameters in t-distributed SNE 38

Conclusions

Page 39: One the Role and Impact of the Metaparameters in t ... · • SNE and t-SNE are nowadays considered as ‘good’ methods for NDLR • Examples Compstat 2010 On the role and impact

Advertisement

Nonlinear Dimensionality ReductionSpringer, Series: Information Science and StatisticsLee, John A. - Verleysen, Michel 2007, Approx. 330 p. 8 illus. in color., HardcoverISBN: 978-0-387-39350-6

Software available at http://www.dice.ucl.ac.be/mlg/index.php?page=NLDR

Compstat 2010 On the role and impact of the metaparameters in t-distributed SNE 39

References