one the role and impact of the metaparameters in t ... · • sne and t-sne are nowadays considered...
TRANSCRIPT
One the Role and Impact of the Metaparameters
in t-distributed Stochastic Neighbor Embedding
John A. Lee and Michel VerleysenMachine Learning Group Université catholique de LouvainLouvain-la-Neuve, [email protected]
Motivation for nonlinear dimensionality reduction
• High-dimensional data are– difficult to represent– difficult to understand– difficult to analyze
• Motivation #1:– To visualize data living in a d-dimensional space (d > 3)
• Motivation #2:– Models (regression, classification, clustering) based on high-dimensional
data suffer from the curse of dimensionality– Need to reduce the dimension of data while keeping information content!
Compstat 2010 On the role and impact of the metaparameters in t-distributed SNE 2
Motivation
Visualization
• These are data• It is difficult to see something…
Compstat 2010 On the role and impact of the metaparameters in t-distributed SNE 3
Motivation
annual increase (%), infant mortality (‰), illiteracy ratio (%), school attendance (%), GIP, annual GIP increase (%)
Visualization
• These are the same data• under different visualization paradigms• possible to see groups, relations, outliers, …
Compstat 2010 On the role and impact of the metaparameters in t-distributed SNE 4
Motivation
Not all NLDR methods perform equally !
Compstat 2010 On the role and impact of the metaparameters in t-distributed SNE 5
Motivation
Geodesic NLM
CDAIsomap
Stochastic Neighbor Embedding
• SNE and t-SNE are nowadays considered as ‘good’ methods for NDLR• Examples
Compstat 2010 On the role and impact of the metaparameters in t-distributed SNE 6
Motivation
From: L. Van der Maaten & G. Hinton, Visualizing Data using t-SNE, Journal of Machine Learning Research 9 (2008) 2579-2605
t-SNE MDS
Stochastic Neighbor Embedding
• SNE and t-SNE are nowadays considered as ‘good’ methods for NDLR• Examples
Compstat 2010 On the role and impact of the metaparameters in t-distributed SNE 7
Motivation
From: L. Van der Maaten & G. Hinton, Visualizing Data using t-SNE, Journal of Machine Learning Research 9 (2008) 2579-2605
t-SNE MDS
Outline
• NDLR: a historical perspective– stress function– intrusion and extrusions– geodesic distances
• SNE and t-SNE– algorithm– gradient– transformed distances
• Experiments– with Euclidean distances– with geodesic distances
• Conclusions
Compstat 2010 On the role and impact of the metaparameters in t-distributed SNE 8
NDLR: a historical perspective
From MDS to more general cost functions
• MDS follows the idea of
• Extension:
to give more importance to– small distances– close data– …
Compstat 2010 On the role and impact of the metaparameters in t-distributed SNE 9
NDLR: a historical perspective → Stress function
( )∑<
−ji
ijijX
d222min δ
jiij
jiij
xxd
yy
−=
−=δ where
( )∑<
−ji
ijijijX
dw222min δ
Traditional « stress » function:
( )∑<
−ji
ijijijX
dw2
min δ
Breakthrough #1
Limitations of linear projections
• Even simple manifolds can be poorly projected
Compstat 2010 On the role and impact of the metaparameters in t-distributed SNE 10
NDLR: a historical perspective → Intrusions and extrusions
Limitations of linear projections
• Even simple manifolds can be poorly projected
• Points originally far from eachother are projected close: this is an intrusion
Compstat 2010 On the role and impact of the metaparameters in t-distributed SNE 11
NDLR: a historical perspective → Intrusions and extrusions
Nonlinear projections
• Goal: to unfold, rather than to project (linearly)
Compstat 2010 On the role and impact of the metaparameters in t-distributed SNE 12
NDLR: a historical perspective → Intrusions and extrusions
Nonlinear projections
• Goal: to unfold, rather than to project (linearly)
• Intrusions can be hopefully decreased, but extrusions could appear
Compstat 2010 On the role and impact of the metaparameters in t-distributed SNE 13
NDLR: a historical perspective → Intrusions and extrusions
The user’s point of view
• Favouring intrusions or extrusions is related to the application (user’s point of view)
• General way of handling the compromise:
• Nowadays, few methods acknowledge this need for a trade-off !
Compstat 2010 On the role and impact of the metaparameters in t-distributed SNE 14
NDLR: a historical perspective → Intrusions and extrusions
( )
−+
=
σδ
λσ
λ ijijij f
dfw 1 Breakthrough #2
allows intrusions allows extrusions
Geodesic distances
• Goal: to measure distances along the manifold
• Such distances are more easily preserved
Compstat 2010 On the role and impact of the metaparameters in t-distributed SNE 15
NDLR: a historical perspective → Geodesic distances
Breakthrough #3
Geodesic and graph distances
• Geodesic distances: finding the shortest way between data along the manifold
• Problem: the manifold is unknown → approximate it by a graph• It exists efficient algorithms for finding shortest paths• The graph can be built by connecting data in a k-neighborhood, or in
a ε-ball
Compstat 2010 On the role and impact of the metaparameters in t-distributed SNE 16
2-d data
Approximation ofGeodesic distance
NDLR: a historical perspective → Geodesic distances
Distance preservation methods
Euclideandistances in HD space
Geodesicdistances in HD space
Metric MDS Isomap
Favorsintrusions
SammonNLM
GeodesicNLM
Favorsextrusions CCA CDA
Compstat 2010 On the role and impact of the metaparameters in t-distributed SNE 17
NDLR: a historical perspective
( ) ( )( )2
1,,,∑
=−=
N
jixy jidjidE
( ) ( )( )( )∑
<=
−=
N
jii y
xyNLM jid
jidjidE
1
2
,
,,
( ) ( )( ) ( )( )∑<=
−=N
jii
xxyCCA jidFjidjidE1
2 ,,, λ
Distance preservation methods
Euclideandistances in HD space
Geodesicdistances in HD space
Metric MDS Isomap
Favorsintrusions
SammonNLM
GeodesicNLM
Favorsextrusions CCA CDA
Compstat 2010 On the role and impact of the metaparameters in t-distributed SNE 18
NDLR: a historical perspective
( ) ( )( )2
1,,,∑
=−=
N
jixy jidjidE
( ) ( )( )( )∑
<=
−=
N
jii y
xyNLM jid
jidjidE
1
2
,
,,
( ) ( )( ) ( )( )∑<=
−=N
jii
xxyCCA jidFjidjidE1
2 ,,, λ
Computational load ↓Performances ↓
Computational load ↑Performances ↑
Outline
• NDLR: a historical perspective– stress function– intrusion and extrusions– geodesic distances
• SNE and t-SNE– algorithm– gradient– transformed distances
• Experiments– with Euclidean distances– with geodesic distances
• Conclusions
Compstat 2010 On the role and impact of the metaparameters in t-distributed SNE 19
SNE and t-SNE
SNE and t-SNE
• In the original space, the similarity between yi and yj is defined as
• Similarities are not symmetric (individual widths) !• pj|i is the empirical probability of yj to be a neighbor of yi
Compstat 2010 On the role and impact of the metaparameters in t-distributed SNE 20
SNE and t-SNE → Algorithm
( ) ( )( )
=
=∑≠
otherwise
if0
ikiik
iijiij
g
gji
pλδ
λδλ ( )
−=
2exp
2uug
SNE and t-SNE
• In the original space, the similarity between yi and yj is defined as
• Similarities are not symmetric (individual widths) !• pj|i is the empirical probability of yj to be a neighbor of yi
• Individuals widths λi: set (individually) through a global « perplexity » parameter
Compstat 2010 On the role and impact of the metaparameters in t-distributed SNE 21
SNE and t-SNE → Algorithm
( )
−=
2exp
2uug
( )PPXTijpH
=2
( ) ( )( )
=
=∑≠
otherwise
if0
ikiik
iijiij
g
gji
pλδ
λδλ
SNE and t-SNE
• In the embedding space, the similarity between xi and xj is defined as
• Similarities are symmetric• t(u,n) is proportional to a Student t with n degrees of freedom
(n controls the thickness of the tail)• SNE: n → ∞ t-SNE: n = 1
Compstat 2010 On the role and impact of the metaparameters in t-distributed SNE 22
SNE and t-SNE → Algorithm
( ) ( )( )
=
=∑≠
otherwise,
, if0
lkkl
ijij
ndt
ndtji
nq ( )
+=
+−21
21,
n
nu
nut
SNE and t-SNE
• Now that similarties are defined in both spaces, how to compare them?
– This seems to be a major difference with respect to other methods, basedon square erros!
• E is minimized by gradient descent, to find locations xi.
Compstat 2010 On the role and impact of the metaparameters in t-distributed SNE 23
SNE and t-SNE → gradient
( )qpDE KL=
( ) ( ) ( )∑=
−+
−+=
∂∂ N
jji
ij
ijij
ixx
nd
nqp
nn
xE
121
22 λ
SNE and t-SNE
• Now that similarties are defined in both spaces, how to compare them?
– This seems to be a major difference with respect to other methods, basedon square erros!
• E is minimized by gradient descent, to find locations xi.
Compstat 2010 On the role and impact of the metaparameters in t-distributed SNE 24
SNE and t-SNE → gradient
( )qpDE KL=
( ) ( ) ( )∑=
−+
−+=
∂∂ N
jji
ij
ijij
ixx
nd
nqp
nn
xE
121
22 λ
xi moves towards xj
SNE and t-SNE
• Now that similarties are defined in both spaces, how to compare them?
– This seems to be a major difference with respect to other methods, basedon square erros!
• E is minimized by gradient descent, to find locations xi.
Compstat 2010 On the role and impact of the metaparameters in t-distributed SNE 25
SNE and t-SNE → gradient
( )qpDE KL=
( ) ( ) ( )∑=
−+
−+=
∂∂ N
jji
ij
ijij
ixx
nd
nqp
nn
xE
121
22 λ
xi moves towards xj
Similarity error –adjusts amplitude
SNE and t-SNE: gradient
• Now that similarities are defined in both spaces, how to compare them?
– This seems to be a major difference with respect to other methods, basedon square erros!
• E is minimized by gradient descent, to find locations xi.
Compstat 2010 On the role and impact of the metaparameters in t-distributed SNE 26
SNE and t-SNE → gradient
( )qpDE KL=
( ) ( ) ( )∑=
−+
−+=
∂∂ N
jji
ij
ijij
ixx
nd
nqp
nn
xE
121
22 λ
xi moves towards xj
Similarity error –adjusts amplitudeDamping
factor
SNE and t-SNE: gradient
• Damping factor is similar to in CCA and CDA:– Large distances are less important– Distances in the embedding space are used, to allow tears (favoring
extrusions)
Compstat 2010 On the role and impact of the metaparameters in t-distributed SNE 27
SNE and t-SNE → gradient
( ) ( ) ( )∑=
−+
−+=
∂∂ N
jji
ij
ijij
ixx
nd
nqp
nn
xE
121
22 λ
xi moves towards xj
Similarity error –adjusts amplitudeDamping
factor
( )ijdFλ
SNE and t-SNE: distributions
• Why different distributions for pij and qij ?• Remember that distances have often to be enlarged: heavier tails (in
the embedding space) help!
Compstat 2010 On the role and impact of the metaparameters in t-distributed SNE 28
SNE and t-SNE → distributions
( ) ( ) ( )∑=
−+
−+=
∂∂ N
jji
ij
ijij
ixx
nd
nqp
nn
xE
121
22 λ
xi moves towards xj
Similarity error –adjusts amplitudeDamping
factor
SNE and t-SNE: distributions
• Non-trivial solution of min E• After some (rough) approximations:
• Properties– f is monotonically increasing– with SNE (n → ∞):– if δij << λi, then
• t-SNE tries to preserved streched distances• SNE distances are scaled by λi
• n and λi act more or less in the same way
Compstat 2010 On the role and impact of the metaparameters in t-distributed SNE 29
SNE and t-SNE → distributions
( )( )
nn
nfdi
ijijij −
+=≈
2
2
1exp
λ
δδ
( ) iijijf λδδ =
( ) ( )1+= nf iijij λδδ
Outline
• NDLR: a historical perspective– stress function– intrusion and extrusions– geodesic distances
• SNE and t-SNE– algorithm– gradient– transformed distances
• Experiments– with Euclidean distances– with geodesic distances
• Conclusions
Compstat 2010 On the role and impact of the metaparameters in t-distributed SNE 30
Experiments
• Data: swiss roll
• Quality measures: in a K-neighborhood, we count the number of intrusions and extrusions. Then– QNX(K) measures the overall number of intrusions and extrusions
(higher QNX(K) means better quality)– BNX(K) measures the difference between the number of intrusions and
extrusions (positiveBNX(K) means intrusive)
• Use of both Euclidean and geodesic distances
Compstat 2010 On the role and impact of the metaparameters in t-distributed SNE 31
Experiments
Results with Euclidean distances
Compstat 2010 On the role and impact of the metaparameters in t-distributed SNE 32
Experiments → with Euclidean distances
increasingperplexity
Results with Euclidean distances
• Difficult problem! (lowvalues of QNX(K))
• t-SNE largely dependson perplexity
Compstat 2010 On the role and impact of the metaparameters in t-distributed SNE 33
Experiments → with Euclidean distances
increasingperplexity
Results with Euclidean distances
Compstat 2010 On the role and impact of the metaparameters in t-distributed SNE 34
Experiments → with Euclidean distances
Results with geodesic distances
Compstat 2010 On the role and impact of the metaparameters in t-distributed SNE 35
Experiments → with geodesic distances
increasingperplexity
Results with geodesic distances
• Geodesic distances facilitate the task
• CCA performs well!
• t-SNE still dependson perplexity, but large values help
Compstat 2010 On the role and impact of the metaparameters in t-distributed SNE 36
Experiments → with geodesic distances
increasingperplexity
Outline
• NDLR: a historical perspective– stress function– intrusion and extrusions– geodesic distances
• SNE and t-SNE– algorithm– gradient– transformed distances
• Experiments– with Euclidean distances– with geodesic distances
• Conclusions
Compstat 2010 On the role and impact of the metaparameters in t-distributed SNE 37
Conclusions
• t-SNE is a distance preservation method
• Stretching distances : good idea!• But transformation in t-SNE not always optimal (not data driven)• Careful tuning of parameters!
• Damping factor for large distances: good idea• But this does not solve the issue of non-Euclidean manifolds (ex:
hollow sphere)• Situation is better with clustered data (stretching large distances
improves the separation between clusters)
Compstat 2010 On the role and impact of the metaparameters in t-distributed SNE 38
Conclusions
Advertisement
Nonlinear Dimensionality ReductionSpringer, Series: Information Science and StatisticsLee, John A. - Verleysen, Michel 2007, Approx. 330 p. 8 illus. in color., HardcoverISBN: 978-0-387-39350-6
Software available at http://www.dice.ucl.ac.be/mlg/index.php?page=NLDR
Compstat 2010 On the role and impact of the metaparameters in t-distributed SNE 39
References