Math 285 Project, Fall 2015
Wilson A. Florero-Salinas
Dan Li
Out of sample extensions of PCA, kernel PCA, and
MDS
TABLE OF CONTENTS
1. Introduction
2.1 Principal Component Analysis (PCA)
2.2 The out of sample extension of PCA
2.3 The out of sample extension of PCA DEMO
3.1 Kernel PCA
3.2 The out of sample extension of Kernel PCA
3.3 The out of sample extension of KPCA (demo)
4.1 Multidimensional Scaling (MDS)
4.2 The out of sample extension of MDS
4.3 The out of sample extension of MDS (DEMO)
4.4 The out of sample extension of MDS (DEMO)
5. Conclusion
6. Appendix
7. References
1. INTRODUCTION
Classification is the problem of categorizing a new observation based on a set of observations called the “training set”, whose
membership is already known. Classification problems usually involve training a model on the training set in order to later
classify new observations into one of the known categories. In recent years the collection of huge amounts of data has
been eased by improvements in technology, and it is now common to have observations with thousands, if not millions, of features.
Despite these improvements, many modern computers are still not able to efficiently handle observations
with a very large number of features, which in some cases makes model training infeasible. However, in many cases it is still possible
to train a model with a subset of the features, or with a transformation of the feature space into a smaller space in which feature
selection is possible. This leads us to the idea of dimensionality reduction.
Dimensionality reduction (DR) is the process of reducing the number of variables under consideration for the purpose of feature
selection or feature extraction. Dimensionality reduction allows the modeler to train models using fewer variables and, in
some cases, to obtain a visualization of the data set in two or three dimensions. Three common DR techniques known in the literature
are Principal Component Analysis (PCA), Kernel PCA, and Classical Multidimensional Scaling (MDS). To perform the corresponding
transformation, each of these methods uses the entire data set. The question is now: “If new data becomes available, how can these
new observations be incorporated in the new feature space?”
In some cases redoing the DR is enough, but that is not our present concern. In other cases the data set may be so large that retraining
is no longer feasible. In this context, we need a way to incorporate these new observations in the new feature space without
retraining and, if possible, by recycling information already obtained the first time we performed dimensionality reduction. This
idea of bringing new observations into the new feature space is known in the literature as “out-of-sample extension”, which is
the focus of this paper. In the following we briefly review PCA, kernel PCA, and MDS before considering their corresponding
out-of-sample extensions (see footnote 1).
2.1 PRINCIPAL COMPONENT ANALYSIS (PCA)
Principal Component Analysis (PCA) is a linear feature extraction tool for DR. PCA attempts to find a linear subspace of lower dimension
than the original feature space in which the new features have the largest variance [B2006]. One way to derive the principal
components of a data set $\{x_i\} \subset \mathbb{R}^d$ is by maximizing the trace of the covariance matrix of the data points $\{y_i\} \subset \mathbb{R}^k$, given by

$$S_Y := \frac{1}{n}\sum_{i=1}^{n} (y_i - \bar{y})(y_i - \bar{y})^T, \qquad \bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i,$$

where we assume that there is a $V^T$ such that $y_i = V^T x_i$. An equivalent way to derive the principal components is to find the
“best” $k$-dimensional subspace (a line when $k = 1$) that minimizes the orthogonal distances to the data. We take this approach here.
Concretely, let $\{x_i\} \subset \mathbb{R}^d$, where $i = 1, 2, \dots, n$. The task is to solve [S2003]

$$\min_{S} \sum_{i=1}^{n} \| x_i - P_S(x_i) \|_2^2,$$

where $P_S(\cdot)$ is the projection onto the subspace $S$. Let $m \in \mathbb{R}^d$ represent a fixed point and $B \in \mathbb{R}^{d \times k}$ an orthonormal basis of
$S$. If $x = m + B\alpha$ is a parametric equation for the plane, then $P_S(x_i) = m + BB^T(x_i - m)$. It can be shown that the above
minimization problem is equivalent to solving

$$\min_{B} \| \tilde{X} - \tilde{X} B B^T \|_F^2,$$

where $\tilde{X}$ denotes the centered data matrix. This minimum is achieved when $\tilde{X} B B^T = \tilde{X}_k$ and $m = \frac{1}{n}\sum_{i=1}^{n} x_i$. Here $\tilde{X}_k$ is the best
rank-$k$ approximation to $\tilde{X}$ under the Frobenius norm, and $B = V_{d \times k}$, where the columns correspond to the first $k$ columns
(see footnote 2) of $V$ in the SVD of $\tilde{X}$, that is, $\tilde{X} = U \Sigma V^T$. The above can be summarized in the following theorem.

Theorem: The projection of $\tilde{X}$ onto the best-fit $k$-plane is

$$\tilde{X}_k = U_{n \times k} \Sigma_{k \times k} V_{d \times k}^T.$$

The new coordinates with respect to the basis $V_{d \times k}$, i.e., the rows of $\tilde{X} V_{d \times k} = U_{n \times k} \Sigma_{k \times k}$, are called the principal components.

This theorem allows us to easily find the first $k$ principal components of $X$ using the following algorithm.

Footnote 1: The reader is referred to the references for additional details.
Algorithm 1a: Principal Component Analysis (PCA)
Input: Data set $X = [x_1, x_2, \dots, x_n]^T$
Output: Top $k$ principal components
1. Center the data: $\tilde{x}_i = x_i - \bar{x}$ for all $i$
2. Perform SVD on: $\tilde{X} = U \Sigma V^T$
3. Return the rows of: $\tilde{X} V_{d \times k} = U_{n \times k} \Sigma_{k \times k}$
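The three steps of Algorithm 1a can be sketched in a few lines of MATLAB. This is only an illustrative sketch on a small deterministic data set; the variable names (`Xc`, `Vk`, `err`) and the toy data are assumptions made for the example and are not taken from the project code in the appendix.

```matlab
% Sketch of Algorithm 1a on a small deterministic data set (toy data, illustrative names).
n = 50; k = 2;
t = (1:n)';
X = [cos(t), sin(2*t), t/n, cos(3*t).^2, sin(t).*t/n];   % n observations as rows, d = 5

xbar = mean(X, 1);                   % 1-by-d training mean
Xc = bsxfun(@minus, X, xbar);        % step 1: center the data
[U, S, V] = svd(Xc, 'econ');         % step 2: thin SVD, Xc = U*S*V'
Vk = V(:, 1:k);                      % first k right singular vectors
Y = Xc * Vk;                         % step 3: rows are the top-k principal components

% Both expressions in step 3 should agree up to round-off.
err = norm(Y - U(:, 1:k) * S(1:k, 1:k), 'fro');
fprintf('||Xc*Vk - Uk*Sk||_F = %.2e\n', err);
```

The check at the end only confirms the identity stated in step 3 of the algorithm; it says nothing about how well two components describe the data.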
2.2 THE OUT OF SAMPLE EXTENSION OF PCA
In the previous section, we obtained the matrix $V_{d \times k}$, which maps points $x_i \in \mathbb{R}^d$ to points $y_i \in \mathbb{R}^k$ via a linear map. If the
points are centered to begin with, then $P_S(x_i) = V_{d \times k} V_{d \times k}^T x_i$, or in matrix form $P_S(X) = X V_{d \times k} V_{d \times k}^T$. To obtain the coordinates of the
points in the subspace of dimension $k$, we simply consider $X V_{d \times k}$, where the subspace $S$ was constructed using the data set
$X = [x_1, x_2, \dots, x_n]^T$. If a new data set becomes available, how can PCA be extended to this new data set? We illustrate this
extension with Figure 1 in Section 2.3. Assume we have a data set in $\mathbb{R}^2$. According to Algorithm 1a, we center the data.
Then, using this centered data, we construct the line $S$ and map these points to the line. If a new data set $Z \in \mathbb{R}^{m \times 2}$ becomes
available, then we can map it to the line using the matrix $V_{d \times k}$. However, because the original data set has been centered, the data
points and the line (subspace) live in a new set of axes. To bring the new data set $Z$ into the current set of axes, it must be centered
exactly the same way $X$ was centered, that is, by subtracting the training mean $\bar{x}$. Finally, we may project the centered data set $\tilde{Z}$ via the
matrix $V_{d \times k}$ [G1966]. This is summarized in Algorithm 1b and visualized in Section 2.3.

Footnote 2: In this paper, given a matrix $A \in \mathbb{R}^{M \times N}$, we define $A_{M \times k}$ to be the matrix that includes only the first $k$ columns of $A$. We also
use this notation to denote its dimensions.
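Algorithm 1b: Out of sample (PCA)
Input: New data set $Z = [z_1, z_2, \dots, z_m]^T$, $V_{d \times k}$, and the training mean $\bar{x}$
Output: Top $k$ principal components of the new data
1. Center the data: $\tilde{z}_i = z_i - \bar{x}$ for all $i$
2. Return the rows of: $\tilde{Z} V_{d \times k}$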
Both Algorithms 1a and 1b written as a function in MATLAB can be found in Appendix 6.1.
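As a small illustration of Algorithm 1b, the sketch below fits a best-fit line to two-dimensional training data and then maps new points onto it. The key point is that the new points are centered with the training mean, not with their own mean. The data and the names (`Zc`, `Yz`, `Pz`) are assumptions made for this example only.

```matlab
% Sketch of Algorithm 1b in R^2 (toy data, illustrative names).
t = (1:40)';
X = [t/10 + cos(t), 0.5*t/10 + sin(t)];   % training data, rows are observations
xbar = mean(X, 1);                        % training mean
Xc = bsxfun(@minus, X, xbar);
[~, ~, V] = svd(Xc, 'econ');
Vk = V(:, 1);                             % direction of the best-fit line (k = 1)

Z = [1 2; 3 1; 0 0];                      % new, uncentered observations
Zc = bsxfun(@minus, Z, xbar);             % step 1: center with the TRAINING mean
Yz = Zc * Vk;                             % step 2: coordinates along the line

% Projected points in the original axes; the residual is orthogonal to the line.
Pz = bsxfun(@plus, Yz * Vk', xbar);
orthoErr = norm((Z - Pz) * Vk);
fprintf('residual component along the line: %.2e\n', orthoErr);
```

Centering `Z` with `mean(Z,1)` instead would move the new points to a different origin than the one in which the line was fitted, which is the situation the section warns about.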
2.3 THE OUT OF SAMPLE EXTENSION OF PCA DEMO
Figure 1: (L) Original data set (M) centered data along with best-fit line (R) current projection space and new uncentered data (red points).
Figure 2: (L) New data points brought to current axis by centering (R) new points projected onto the current space.
It is worth comparing the out-of-sample extension with a retrained model that uses the entire data set. Notice that the two results differ.
3.1 KERNEL PCA
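PCA is a linear method that cannot properly handle nonlinear data. If the data is nonlinear, the main idea of Kernel PCA is to use a
map $\phi(\cdot)$ that takes each data vector $x_i$ to a vector $\phi(x_i)$ in a higher dimensional space (called the feature space) where PCA can be
applied [W2012]. Concretely, let $n$ data points $x_i \in \mathbb{R}^d$ be given, and suppose $\phi : \mathbb{R}^d \to \mathbb{R}^D$, where $D \gg d$. Assume further
that $\frac{1}{n}\sum_{i=1}^{n} \phi(x_i) = 0$, meaning that the feature vectors have zero mean. Define $\Phi := [\phi(x_1)\ \phi(x_2)\ \cdots\ \phi(x_n)]^T \in \mathbb{R}^{n \times D}$, and
consider the SVD $\Phi = U_{n \times n} \Sigma_{n \times D} V_{D \times D}^T$. Applying PCA to $\Phi$ via Algorithm 1a, the new coordinates with respect to the basis
$V_{D \times k}$ are given by the rows of $\Phi V_{D \times k} = U_{n \times k} \Sigma_{k \times k}$. Usually the $\phi(x_i)$ are unknown and it is not possible to work out the
decomposition explicitly. To remedy this, define $\kappa(x_i, x_j) := \phi(x_i)^T \phi(x_j)$ and consider

$$\Phi \Phi^T = [\phi(x_i)^T \phi(x_j)] = [\kappa(x_i, x_j)] =: K.$$

The matrix $K$ is called the kernel matrix, which under a proper mapping $\phi(\cdot)$ is positive semi-definite. If the data is not centered
in the feature space, it can be shown that by considering $\tilde{\phi}(x_i) = \phi(x_i) - \frac{1}{n}\sum_{j=1}^{n} \phi(x_j)$ and constructing $\tilde{\Phi} \in \mathbb{R}^{n \times D}$ we obtain
the matrix (see footnote 3)

$$\tilde{K} = K - \frac{1}{n}\mathbf{1}_n K - \frac{1}{n} K \mathbf{1}_n + \frac{1}{n^2}\mathbf{1}_n K \mathbf{1}_n,$$

for which we can obtain similar equations by replacing $K$ by $\tilde{K}$ in the previous formulas and in the formulas that follow (see footnote 4) [S1998].
Proceeding with our analysis, we have

$$K = \Phi\Phi^T = U_{n \times n}\Sigma_{n \times D}V_{D \times D}^T \left(U_{n \times n}\Sigma_{n \times D}V_{D \times D}^T\right)^T = U_{n \times n}\Sigma_{n \times D}\Sigma_{n \times D}^T U_{n \times n}^T,$$

so that $K U_{n \times n} = U_{n \times n}\Sigma^2_{n \times n}$, where $\Sigma^2_{n \times n} := \Sigma_{n \times D}\Sigma_{n \times D}^T$, which is an eigenvalue problem. In other words, by solving for the
eigenvalues and eigenvectors of $K$, we are able to obtain the matrices needed in the SVD of $\Phi$. Note that if we consider
$\Phi^T\Phi V_{D \times D} = V_{D \times D}\Sigma^2_{D \times D}$ we obtain $\Phi^T\Phi V_{D \times k} = V_{D \times k}\Sigma^2_{k \times k}$. For the purpose of principal component extraction, we choose the
directions so that $\|v_i\|^2 = v_i^T v_i = 1$. Writing $v_i = \Phi^T \alpha_i$ with $\alpha_i$ proportional to the unit eigenvector $u_i$ of $K$, we get
$v_i^T v_i = \alpha_i^T K \alpha_i = \sigma_i^2 \|\alpha_i\|^2$, which leads to $\|\alpha_i\| = 1/\sigma_i$, where $\sigma_i^2 = \lambda_i(\Phi\Phi^T) = \lambda_i(K)$ for $i = 1, \dots, k$.
This is equivalent to working with $U_{n \times k}\Sigma^{-1}_{k \times k}$, which still satisfies $K\,(U_{n \times k}\Sigma^{-1}_{k \times k}) = (U_{n \times k}\Sigma^{-1}_{k \times k})\,\Sigma^2_{k \times k}$: we scale
the $u_i$ by $1/\sigma_i$ to obtain $\alpha_i = u_i/\sigma_i$, so that $\|\sigma_i \alpha_i\| = 1$. KPCA is summarized in the following Algorithm 2a.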
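Algorithm 2a: Kernel PCA
Input: Data set $X = [x_1, x_2, \dots, x_n]^T$
Output: Top $k$ principal components
1. Construct the kernel matrix $K = [\kappa(x_i, x_j)]$
2. Center $K_{n \times n}$ via $\tilde{K} = K - \frac{1}{n}\mathbf{1}_n K - \frac{1}{n} K \mathbf{1}_n + \frac{1}{n^2}\mathbf{1}_n K \mathbf{1}_n$
3. Solve the eigenvalue problem: $\tilde{K} U_{n \times n} = U_{n \times n}\Sigma^2_{n \times n}$
4. Return the rows of $U_{n \times k}\Sigma_{k \times k} = [\sigma_1 u_1\ \cdots\ \sigma_k u_k]$, where the $u_i$ are unit-norm eigenvectors of $\tilde{K}$
   (the rescaled vectors $u_i/\sigma_i$, of norm $1/\sigma_i$, are the ones used in the out-of-sample map)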
Footnote 3: Here define $\mathbf{1}_n := [1] \in \mathbb{R}^{n \times n}$, the $n \times n$ matrix of ones.
Footnote 4: Place a tilde on all variables, and the results are similar.
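One way to sanity-check the centering formula and Algorithm 2a is to use the linear kernel, for which the feature map is the identity and Kernel PCA should reduce to ordinary PCA. The sketch below does this on toy data; the data and all variable names are assumptions made for illustration, and the embeddings are compared through their Gram matrices because eigenvectors are only determined up to sign.

```matlab
% Sketch: Algorithm 2a with the linear kernel kappa(x,y) = x'*y (toy data, illustrative names).
n = 30; k = 2;
t = (1:n)';
X = [cos(t), sin(2*t), t/n];

K = X * X';                                   % kernel matrix for phi(x) = x
One = ones(n, n);                             % the matrix 1_n of footnote 3
Kc = K - One*K/n - K*One/n + One*K*One/n^2;   % centered kernel matrix

% For the linear kernel, Kc is the Gram matrix of the centered data.
Xc = bsxfun(@minus, X, mean(X, 1));
errCenter = norm(Kc - Xc*Xc', 'fro');

% Eigen-decomposition of the (symmetrized) centered kernel matrix.
[U, L] = eig((Kc + Kc')/2);
[lam, idx] = sort(diag(L), 'descend');        % lam(i) = sigma_i^2
Uk = U(:, idx(1:k));
Yk = Uk * diag(sqrt(lam(1:k)));               % KPCA embedding U_k * Sigma_k

% Ordinary PCA scores for comparison (equal up to the sign of each column).
[~, ~, V] = svd(Xc, 'econ');
Yp = Xc * V(:, 1:k);
errGram = norm(Yk*Yk' - Yp*Yp', 'fro');
fprintf('centering error %.2e, embedding mismatch %.2e\n', errCenter, errGram);
```

With a nonlinear kernel such as the Gaussian kernel used in Appendix 6.2, only the line that builds `K` would change; the agreement with linear PCA would of course no longer hold.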
3.2 THE OUT OF SAMPLE EXTENSION OF KERNEL PCA
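To obtain an out-of-sample extension we proceed as before: center the new data with respect to the training set, and then apply the
projection matrix. Here the transformation $V_{D \times k}$ is not known explicitly, but we can use kernels to our advantage as in the previous
section. Assume that $Z = [z_1\ z_2\ \cdots\ z_m]^T$ is the new data set. Define $\Phi_Z := [\phi(z_1)\ \phi(z_2)\ \cdots\ \phi(z_m)]^T \in \mathbb{R}^{m \times D}$. This set may need
centering, so define $\tilde{\phi}(z_i) = \phi(z_i) - \frac{1}{n}\sum_{j=1}^{n}\phi(x_j)$ to obtain the matrix $\tilde{\Phi}_Z := [\tilde{\phi}(z_1)\ \tilde{\phi}(z_2)\ \cdots\ \tilde{\phi}(z_m)]^T$. Applying $V_{D \times k}$ gives

$$\Phi_Z V_{D \times k} = \Phi_Z \Phi^T \Phi V_{D \times k}\Sigma^{-2}_{k \times k} = (\Phi_Z \Phi^T)(\Phi V_{D \times k})\Sigma^{-2}_{k \times k},$$

where we are assuming that $\Phi$ and $\Phi_Z$ contain centered data. Otherwise consider

$$\tilde{\Phi}_Z V_{D \times k} = \tilde{\Phi}_Z \tilde{\Phi}^T \tilde{\Phi} V_{D \times k}\Sigma^{-2}_{k \times k} = (\tilde{\Phi}_Z \tilde{\Phi}^T)(\tilde{\Phi} V_{D \times k})\Sigma^{-2}_{k \times k},$$

with $V$, $U$ and $\Sigma$ computed from the centered data. Let $\tilde{K}_Z := [\tilde{\phi}(z_i)^T \tilde{\phi}(x_j)] \in \mathbb{R}^{m \times n}$; then, since $\tilde{\Phi} V_{D \times k} = U_{n \times k}\Sigma_{k \times k}$,

$$\tilde{\Phi}_Z V_{D \times k} = \tilde{K}_Z U_{n \times k}\Sigma_{k \times k}\Sigma^{-2}_{k \times k} = \tilde{K}_Z U_{n \times k}\Sigma^{-1}_{k \times k}.$$

Note that $\tilde{K}_Z$ was constructed using centered data in the feature space, which is not known explicitly; however, it is possible to write
this kernel matrix in terms of the original kernel matrices $K_Z = [\kappa(z_i, x_j)]$ and $K$ as

$$\tilde{K}_Z = K_Z - \frac{1}{n}\mathbf{1}_{n \times m}^T K - \frac{1}{n} K_Z \mathbf{1}_n + \frac{1}{n^2}\mathbf{1}_{n \times m}^T K \mathbf{1}_n.$$

The out of sample KPCA is summarized in Algorithm 2b.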
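Algorithm 2b: out of sample Kernel PCA
Input: New data set $Z = [z_1\ z_2\ \cdots\ z_m]^T$, $U_{n \times k}$, $\Sigma_{k \times k}$
Output: Top $k$ principal components for new data
1. Construct the kernel matrix $K_Z = [\kappa(z_i, x_j)]$
2. Center $K_Z$ via (see footnote 5) $\tilde{K}_Z = K_Z - \frac{1}{n}\mathbf{1}_{n \times m}^T K - \frac{1}{n} K_Z \mathbf{1}_n + \frac{1}{n^2}\mathbf{1}_{n \times m}^T K \mathbf{1}_n$
3. Return the rows of: $\tilde{K}_Z U_{n \times k}\Sigma^{-1}_{k \times k} = \tilde{K}_Z \left[\frac{1}{\sigma_1}u_1\ \cdots\ \frac{1}{\sigma_k}u_k\right]$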
Both Algorithms 2a and 2b written as a function in MATLAB can be found in Appendix 6.2.
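Footnote 5: Here $\mathbf{1}_{n \times m}^T \in \mathbb{R}^{m \times n}$, where $\mathbf{1}_{n \times m} := [1]$ is the $n \times m$ matrix of ones [S1998].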
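A quick consistency check of Algorithm 2b is to pass some of the training points through the out-of-sample map: they should land exactly on their training embedding. The sketch below does this with a Gaussian kernel. The helper handles `sqdist` and `kern`, the kernel width `s2`, and the toy data are assumptions made for this example; they are not the functions used in Appendix 6.2.

```matlab
% Sketch: Algorithms 2a and 2b with a Gaussian kernel (toy data, illustrative names).
sqdist = @(A, B) bsxfun(@plus, sum(A.^2, 2), sum(B.^2, 2)') - 2*(A*B');
kern   = @(A, B, s2) exp(-0.5 * sqdist(A, B) / s2);   % exp(-||a-b||^2 / (2*s2))

n = 25; k = 2; s2 = 1;
t = (1:n)';
r = 1 + mod(t, 2);                       % alternate between radius 1 and radius 2
X = [r.*cos(t), r.*sin(t)];              % two concentric rings

% Training (Algorithm 2a)
K  = kern(X, X, s2);
On = ones(n, n);
Kc = K - On*K/n - K*On/n + On*K*On/n^2;
[U, L] = eig((Kc + Kc')/2);
[lam, idx] = sort(diag(L), 'descend');
Uk = U(:, idx(1:k));
Sk = diag(sqrt(lam(1:k)));               % Sigma_k
Ytr = Uk * Sk;                           % training embedding

% Out of sample (Algorithm 2b), using the first five training points as "new" data
Z = X(1:5, :); m = size(Z, 1);
Kz  = kern(Z, X, s2);
Omn = ones(m, n);
Kzc = Kz - Omn*K/n - Kz*On/n + Omn*K*On/n^2;
Yz  = Kzc * Uk / Sk;                     % rows of Kzc * U_k * Sigma_k^{-1}

err = norm(Yz - Ytr(1:5, :), 'fro');
fprintf('mismatch with training embedding: %.2e\n', err);
```

For genuinely new points there is no reference embedding to compare against, so this check only verifies that the centering of the new kernel matrix is consistent with the centering used in training.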
3.3 THE OUT OF SAMPLE EXTENSION OF KPCA (DEMO)
Figure 3. Left: Data set in original dimensions. Green points correspond to out-of-sample observations. Middle & Right: out-of-sample extensions. Green
observations are mapped correctly to their corresponding clusters.
Figure 4: A 2D perspective of the above plot.
4.1 MULTIDIMENSIONAL SCALING (MDS)
Multidimensional scaling (MDS) visualizes a set of high dimensional points in lower dimensions (usually two or three) based on their
pairwise distances [Y1985]. The problem MDS solves is to map the original data into lower dimensions while preserving pairwise
distances. In other words, two points that are far apart remain far apart in the reduced dimension, and points that are
close together are mapped close to each other. Mathematically: given a set of $n$ points and their pairwise distances $d_{ij}$, find $n$
points $\{y_i\} \subset \mathbb{R}^k$ such that

$$\sum_{i,j} \left( \| y_i - y_j \|_2 - d_{ij} \right)^2$$

is minimized. To solve this problem consider the proximity matrix $D = [d_{ij}^2]$.
The proximity matrix is invariant to changes in location and rotation, and we can obtain a unique solution provided that we assume
$\sum_i y_i = 0$. If we consider the equality $\| y_i - y_j \| = d_{ij}$, it can be shown that $\tilde{D} = YY^T$, where $Y = [y_1\ y_2\ \cdots\ y_n]^T \in \mathbb{R}^{n \times k}$ and
$\tilde{D}$ is the centered proximity matrix (see footnote 6). To explicitly find the $y_i$ we use the fact that $\tilde{D}$ is symmetric, hence unitarily
diagonalizable, and obtain $Y = U_{n \times k}\Lambda^{1/2}_{k \times k}$, where $\tilde{D} = U \Lambda U^T$. The above approach has been seen before: from data points
$\{x_i\}_{i=1}^{n}$, create a “neighborhood” or similarity matrix $D$, center this matrix if needed, and then solve an eigenvalue problem [B2003; pg 1].
Concretely, if we let $S_i = \sum_j D_{ij}$ be the $i$th row sum of $D$, the centering is done via

$$\tilde{D}_{ij} = -\frac{1}{2}\left( D_{ij} - \frac{1}{n} S_i - \frac{1}{n} S_j + \frac{1}{n^2}\sum_k S_k \right)$$

and the embedding corresponds to the rows of $Y = [\sqrt{\lambda_1}\,u_1\ \cdots\ \sqrt{\lambda_k}\,u_k]$ [B2003; pg 2]. This is summarized in Algorithm 3a.

Footnote 6: Some properties of $\tilde{D}$ include: (1) $\tilde{D}^T = \tilde{D}$, (2) $\tilde{D} = [y_i^T y_j]$, and (3) $\tilde{D}\mathbf{1} = 0$ and $\mathbf{1}^T\tilde{D} = 0$.
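Algorithm 3a: MDS
Input: Data set $X = [x_1, x_2, \dots, x_n]^T$ (or skip to step 2)
Output: Embedding $Y = [y_1\ y_2\ \cdots\ y_n]^T \in \mathbb{R}^{n \times k}$
1. Construct the pairwise squared distance matrix $D = [d_{ij}^2]$
2. Center $D_{n \times n}$ via $\tilde{D}_{ij} = -\frac{1}{2}\left( D_{ij} - \frac{1}{n} S_i - \frac{1}{n} S_j + \frac{1}{n^2}\sum_k S_k \right)$
3. Diagonalize $\tilde{D} = U \Lambda U^T$
4. Return the rows of $U_{n \times k}\Lambda^{1/2}_{k \times k} = [\sqrt{\lambda_1}\,u_1\ \cdots\ \sqrt{\lambda_k}\,u_k]$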
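The steps of Algorithm 3a can be sketched as follows. The example starts from a small planar configuration, keeps only its squared distances, and checks that the MDS embedding reproduces them. The points `P` and all variable names are assumptions chosen for illustration.

```matlab
% Sketch of Algorithm 3a on five planar points (toy data, illustrative names).
P = [0 0; 3 0; 3 4; 0 4; 1.5 2];
n = size(P, 1); k = 2;

% Step 1: squared pairwise distances D_ij = ||p_i - p_j||^2
g = sum(P.^2, 2);
D = bsxfun(@plus, g, g') - 2*(P*P');

% Step 2: centering with the row sums S_i
S  = sum(D, 2);
Dt = -0.5 * (D - bsxfun(@plus, S, S')/n + sum(S)/n^2);

% Step 3: diagonalize the (symmetrized) centered matrix
[U, L] = eig((Dt + Dt')/2);
[lam, idx] = sort(diag(L), 'descend');

% Step 4: embedding, rows of U_k * Lambda_k^(1/2)
Y = U(:, idx(1:k)) * diag(sqrt(lam(1:k)));

% The embedding reproduces the squared distances (the points are exactly planar).
gy = sum(Y.^2, 2);
Dy = bsxfun(@plus, gy, gy') - 2*(Y*Y');
err = norm(Dy - D, 'fro');
fprintf('distance reconstruction error: %.2e\n', err);
```

The recovered coordinates generally differ from `P` by a translation and a rotation or reflection, which is why the check compares distances rather than coordinates.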
4.2 THE OUT OF SAMPLE EXTENSION OF MDS
One fact to note about the out of sample extension of MDS is that simultaneously embedding $m$ new objects is not
equivalent to individually embedding the same objects one at a time, as individual embeddings do not attempt to approximate
the dissimilarities between pairs of new objects [T2008; pg 3]. For simplicity let $m = 1$. The extension to $m > 1$ is straightforward.
Suppose that $d \in \mathbb{R}^n$ denotes the squared dissimilarities of the new object $x$ from the original $n$ objects. In other words,
$d = [d_1^2\ d_2^2\ \cdots\ d_n^2]^T$, where $d_i = \| x - x_i \|$. Construct a new matrix

$$A = \begin{bmatrix} D & d \\ d^T & 0 \end{bmatrix} \in \mathbb{R}^{(n+1) \times (n+1)}.$$

Before we formalize the out of sample extension, let us introduce notation and a few definitions.

Definition: Given $w \in \mathbb{R}^m$, we say that $y_1, \dots, y_m \in \mathbb{R}^k$ is $w$-centered if and only if $\sum_{j=1}^{m} w_j y_j = 0$.

Definition: For $w \in \mathbb{R}^m$ such that $\mathbf{1}_m^T w \neq 0$, where $\mathbf{1}_m \in \mathbb{R}^m$ is the vector of ones, define

$$\tau_w(C) := -\frac{1}{2}\left( I - \frac{\mathbf{1}_m w^T}{\mathbf{1}_m^T w} \right) C \left( I - \frac{w \mathbf{1}_m^T}{\mathbf{1}_m^T w} \right).$$
Notice that $\tau_{\mathbf{1}_n}(D)$ gives the same matrix that was used in the centering of $D$ in Algorithm 3a. Moreover, under appropriate
choices of $w$, the matrix $\tau_w(C)$ has special properties that allow it to be factored for the purpose of MDS [T2008], exactly as we
did with the matrix $\tilde{D}$. The usefulness of $\tau_w(C)$ will be shown in the discussion that follows, but before we start our analysis, let
us revisit the question of why applying MDS to $A$ does not solve the out-of-sample extension problem. The reason is
that applying MDS to $D$ entails approximating the inner products $\tau_{\mathbf{1}_n}(D)$, which are computed with respect to the centroid of the
original $n$ points. On the other hand, if we let $\mathbf{1}_{n+1} = [\mathbf{1}_n^T\ 1]^T \in \mathbb{R}^{n+1}$, applying MDS to $A$ entails approximating the inner products
$\tau_{\mathbf{1}_{n+1}}(A)$, computed with respect to the centroid of all $n+1$ points [T2008; pg 5]. Thus, to solve the out-of-sample problem we
must preserve the original centering. To do this, let $w = [\mathbf{1}_n^T\ 0]^T \in \mathbb{R}^{n+1}$ and construct

$$\tau_w(A) =: B = \begin{bmatrix} \tau_{\mathbf{1}_n}(D) & b \\ b^T & \beta \end{bmatrix},$$

where $b \in \mathbb{R}^n$ and $\beta$ is a scalar. The goal is to find a $y_x \in \mathbb{R}^k$ that corresponds to the out of sample extension. Let
$Y = [y_1\ y_2\ \cdots\ y_n]^T \in \mathbb{R}^{n \times k}$ represent the embedding of the original $n$ points and construct a new matrix
$Y_* = [Y^T\ y_x]^T \in \mathbb{R}^{(n+1) \times k}$. Then the problem reduces to approximating $B$ via $Y_* Y_*^T$. Up to a term that does not depend on $y_x$,

$$\min_{y_x \in \mathbb{R}^k} \| B - Y_* Y_*^T \|^2 \quad\text{amounts to}\quad \min_{y_x \in \mathbb{R}^k}\ 2\sum_{i=1}^{n} \left( b_i - y_i^T y_x \right)^2 + \left( \beta - y_x^T y_x \right)^2 .$$

If the term $(\beta - y_x^T y_x)^2$ is dropped, the objective becomes convex with solution $Y^T Y y_x = Y^T b$, where $y_x$ represents an
approximation (see footnote 7) of an out-of-sample extension corresponding to a new and unknown data point $x \in \mathbb{R}^d$. If $Y$ has full rank, then
$y_x = (Y^T Y)^{-1} Y^T b$. There is a strong relationship between PCA and CMDS [G1966], in which it can be shown that $y_x$ is a solution to
$X_k^T X_k y_x = X_k^T b$, where $X_k$ holds the $k$-dimensional PCA coordinates of the original points, and that
$b = [\tilde{x}_1^T (x - \bar{x})\ \cdots\ \tilde{x}_n^T (x - \bar{x})]^T$, so that $y_x = \Lambda^{-1/2}_{k \times k} U_{n \times k}^T b$ [T2010, pg 3-4]. In other words, to
obtain an out-of-sample extension for CMDS this way, we would first have to compute PCA (see footnote 8) on the original data set $X$, which is unknown by
assumption, and thus not very useful. To overcome this difficulty, we can explicitly compute $b$ and write it in terms of known values.
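The step from the reduced objective to the linear system $Y^T Y y_x = Y^T b$ can be spelled out briefly. The following is a short worked derivation, with the symbol $f$ introduced here only for illustration; it uses nothing beyond the quantities already defined in this section.

```latex
% Reduced objective after dropping the quartic term (f is an illustrative name)
f(y_x) = 2\sum_{i=1}^{n}\left(b_i - y_i^{T} y_x\right)^2 = 2\,\lVert b - Y y_x \rVert_2^2 ,
\qquad
\nabla f(y_x) = -4\,Y^{T}\left(b - Y y_x\right).

% f is a convex quadratic, so its minimizers are exactly the stationary points:
\nabla f(y_x) = 0 \;\Longleftrightarrow\; Y^{T} Y\, y_x = Y^{T} b .

% With Y = U_{n\times k}\Lambda_{k\times k}^{1/2} and U_{n\times k}^{T}U_{n\times k} = I:
Y^{T}Y = \Lambda_{k\times k}
\;\Longrightarrow\;
y_x = \Lambda_{k\times k}^{-1}\Lambda_{k\times k}^{1/2}\,U_{n\times k}^{T} b
    = \Lambda_{k\times k}^{-1/2}\,U_{n\times k}^{T} b .
```

The last line assumes that the top $k$ eigenvalues in $\Lambda_{k \times k}$ are strictly positive, which is also what the full-rank condition on $Y$ requires.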
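First, by definition, and using $\mathbf{1}_{n+1}^T w = n$,

$$\begin{aligned}
B = \begin{bmatrix} \tilde{D} & b \\ b^T & \beta \end{bmatrix}
&= -\frac{1}{2}\left( I - \frac{1}{n}\mathbf{1}_{n+1} w^T \right) A \left( I - \frac{1}{n} w \mathbf{1}_{n+1}^T \right) \\
&= -\frac{1}{2}\left( A - \frac{1}{n}\mathbf{1}_{n+1} w^T A - \frac{1}{n} A w \mathbf{1}_{n+1}^T + \frac{1}{n^2}\mathbf{1}_{n+1} w^T A w \mathbf{1}_{n+1}^T \right) \\
&= -\frac{1}{2}\left(
\begin{bmatrix} D & d \\ d^T & 0 \end{bmatrix}
- \frac{1}{n}\begin{bmatrix} \mathbf{1}_n\mathbf{1}_n^T D & \mathbf{1}_n\mathbf{1}_n^T d \\ \mathbf{1}_n^T D & \mathbf{1}_n^T d \end{bmatrix}
- \frac{1}{n}\begin{bmatrix} D\mathbf{1}_n\mathbf{1}_n^T & D\mathbf{1}_n \\ d^T\mathbf{1}_n\mathbf{1}_n^T & d^T\mathbf{1}_n \end{bmatrix}
+ \frac{\mathbf{1}_n^T D \mathbf{1}_n}{n^2}\,\mathbf{1}_{n+1}\mathbf{1}_{n+1}^T
\right).
\end{aligned}$$

Since we are only interested in $b$, taking either the last column or the last row of $B$ (without its final entry $\beta$), we have

$$b = -\frac{1}{2}\left( d - \frac{1}{n}\mathbf{1}_n\mathbf{1}_n^T d - \frac{1}{n} D \mathbf{1}_n + \frac{1}{n^2}\mathbf{1}_n\mathbf{1}_n^T D \mathbf{1}_n \right).$$

Footnote 7: To obtain the optimal solution, the above nonlinear optimization problem must be solved numerically.
Footnote 8: MDS and PCA attempt to find the most accurate data representation in a lower dimensional space. MDS preserves the most “similarities” of the original data set, while PCA reduces dimension by preserving most of the covariance of the data.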
The out-of-sample extension for the case $m = 1$ is summarized in Algorithm 3b, and its corresponding implementation for the
approximate case is found in Appendix 6.3.
Algorithm 3b: out of sample MDS (m = 1)
Input: $U_{n \times k}$ and $\Lambda^{1/2}_{k \times k}$ from MDS, the squared distance matrix $D$, and $d \in \mathbb{R}^n$, the
squared dissimilarities between the new object and the original $n$ objects
Output: out of sample $y_x$
1. Construct the vector $b = -\frac{1}{2}\left( d - \frac{1}{n}\mathbf{1}_n\mathbf{1}_n^T d - \frac{1}{n} D \mathbf{1}_n + \frac{1}{n^2}\mathbf{1}_n\mathbf{1}_n^T D \mathbf{1}_n \right)$
2. For the approximate out-of-sample extension, return the (column) vector $y_x = \Lambda^{-1/2}_{k \times k} U_{n \times k}^T b$
3. For the optimal out-of-sample extension, construct $Y_* = [Y^T\ y_x]^T$ and
   $B := \begin{bmatrix} \tilde{D} & b \\ b^T & \beta \end{bmatrix}$ (see footnote 9), and return the $y_x$ that solves $\min_{y_x \in \mathbb{R}^k} \| B - Y_* Y_*^T \|^2$
Notice that the objective function for the optimal solution is a fourth degree polynomial in $y_x$, which can be minimized numerically using
gradient-based methods [T2008; pg 10].
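Steps 1 and 2 of Algorithm 3b (the approximate extension) can be sketched as follows. The example embeds five planar points with classical MDS and then places a sixth point using only its squared distances to the first five. Because the toy data are exactly two-dimensional and $k = 2$, the approximate solution should reproduce those distances; for real data this is not guaranteed. The data and all variable names are assumptions made for illustration and are independent of the code in Appendix 6.3.

```matlab
% Sketch of Algorithm 3b, approximate case (toy data, illustrative names).
P = [0 0; 3 0; 3 4; 0 4; 1.5 2];      % original objects (only used to build distances)
x = [1 1];                            % new object
n = size(P, 1); k = 2;

% Classical MDS on the original objects (Algorithm 3a)
g = sum(P.^2, 2);
D = bsxfun(@plus, g, g') - 2*(P*P');            % squared distances
meanD  = mean(D, 1);
mmeanD = mean(meanD);
Dt = 0.5 * (bsxfun(@plus, meanD', meanD) - D - mmeanD);
[U, L] = eig((Dt + Dt')/2);
[lam, idx] = sort(diag(L), 'descend');
Uk = U(:, idx(1:k)); lamk = lam(1:k);
Y = Uk * diag(sqrt(lamk));

% Step 1: vector b from the squared distances d of the new object
d = sum(bsxfun(@minus, P, x).^2, 2);            % n-by-1
b = -0.5 * (d - mean(d) - mean(D, 2) + mmeanD);

% Step 2: approximate out-of-sample coordinates y_x = Lambda^(-1/2) * U_k' * b
y_x = diag(1 ./ sqrt(lamk)) * (Uk' * b);        % k-by-1

% Squared distances from y_x to the embedded points, compared with d
dHat = sum(bsxfun(@minus, Y, y_x').^2, 2);
err = norm(dHat - d);
fprintf('distance mismatch for the new point: %.2e\n', err);
```

Note that `mean(d)`, `mean(D,2)` and `mmeanD` are the averaged forms of the three correction terms in the formula for $b$, which avoids forming the matrix of ones explicitly.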
4.3 THE OUT OF SAMPLE EXTENSION OF MDS (DEMO)
Figure 5: Map of 12 Chinese cities based on their pairwise distances. Top left: MDS applied to the entire data except Lhasa. Bottom left: out-of-sample extension
for Lhasa. Top right and bottom right: the same analysis and comparison done for Hohhot.
Footnote 9: Here $\beta = \frac{1}{n}\sum_{i=1}^{n} d_i^2 - \frac{1}{2n^2}\sum_{i=1}^{n}\sum_{j=1}^{n} d_{ij}^2$.
Figure 6: MDS on the seed data set (see footnote 10). Left: MDS applied to the entire data set. Red crosses correspond to points that will be traced in the out-of-sample extension.
Right: MDS on a subset of the data set. Green points correspond to the out-of-sample extensions.
4.4 THE OUT OF SAMPLE EXTENSION OF MDS (DEMO)
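In Section 4.2, we provided the out-of-sample algorithm for MDS for the case $m = 1$. For the case $m > 1$, analogously define
$A_{n+m}$ to be the matrix of squared dissimilarities between all $n+m$ objects. First, consider the matrix $\tau_{\mathbf{1}_{n+m}}(A_{n+m})$, which
yields the MDS embedding of all $n+m$ objects. As before, let $Y = [y_1\ y_2\ \cdots\ y_n]^T \in \mathbb{R}^{n \times k}$ represent the embedding of the
original $n$ points, and let $Z = [z_1\ z_2\ \cdots\ z_m]^T \in \mathbb{R}^{m \times k}$ be the matrix containing the $k$-dimensional embedding of the $m$ new objects.
To find the out-of-sample extension for these $m$ new objects, let $w = [\mathbf{1}_n^T\ 0\ \cdots\ 0]^T \in \mathbb{R}^{n+m}$, construct the matrix [T2008; pg 2]

$$\tau_w(A_{n+m}) =: B = \begin{bmatrix} \tau_{\mathbf{1}_n}(D) & B_{YZ} \\ B_{YZ}^T & B_{ZZ} \end{bmatrix},$$

and obtain the optimal $Z$ by solving the optimization problem, stated up to a term that does not depend on $Z$:

$$\min_{Z \in \mathbb{R}^{m \times k}} \left\| B - \begin{bmatrix} Y \\ Z \end{bmatrix} \begin{bmatrix} Y^T & Z^T \end{bmatrix} \right\|^2
\quad\text{amounts to}\quad
\min_{Z \in \mathbb{R}^{m \times k}}\ 2\,\| B_{YZ} - Y Z^T \|^2 + \| B_{ZZ} - Z Z^T \|^2 .$$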
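To make the block structure of $B$ and the objective above concrete, the following sketch builds $\tau_w(A_{n+m})$ for toy data and evaluates the objective at a candidate $Z$. It also checks numerically that the full Frobenius objective splits into the three block terms. The points `P` and `Q`, the handle `objSplit`, and the perturbation size are assumptions made for illustration; no numerical optimizer is run here.

```matlab
% Sketch: building B = tau_w(A) for m > 1 and evaluating the objective (toy data).
P = [0 0; 3 0; 3 4; 0 4; 1.5 2];      % n original objects
Q = [1 1; 2 3];                       % m new objects
n = size(P, 1); m = size(Q, 1);

% Squared dissimilarities between all n+m objects
Xall = [P; Q];
g = sum(Xall.^2, 2);
A = bsxfun(@plus, g, g') - 2*(Xall*Xall');

% tau_w(A) with w = [1_n; 0], which keeps the centroid of the ORIGINAL points
w = [ones(n, 1); zeros(m, 1)];
e = ones(n + m, 1);
Pw = eye(n + m) - e*w'/(e'*w);
B = -0.5 * Pw * A * Pw';
Byy = B(1:n, 1:n); Byz = B(1:n, n+1:end); Bzz = B(n+1:end, n+1:end);

% Exact coordinates relative to the original centroid give a perfect fit here
c = mean(P, 1);
Y = bsxfun(@minus, P, c);
Z = bsxfun(@minus, Q, c);
objExact = norm(B - [Y; Z]*[Y; Z]', 'fro')^2;

% For any candidate Z the full objective splits into the three block terms
objSplit = @(Zc) norm(Byy - Y*Y', 'fro')^2 + 2*norm(Byz - Y*Zc', 'fro')^2 ...
                 + norm(Bzz - Zc*Zc', 'fro')^2;
Zp = Z + 0.1;                                   % a perturbed candidate
objFull = norm(B - [Y; Zp]*[Y; Zp]', 'fro')^2;
gap = abs(objFull - objSplit(Zp));
fprintf('objective at exact Z: %.2e, split identity gap: %.2e\n', objExact, gap);
```

The first block term does not involve `Z`, which is why it can be dropped from the minimization; a numerical solver would be applied to the remaining two terms.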
5. CONCLUSION
In this paper we discussed the out-of-sample extensions for PCA, Kernel PCA, and MDS. All three extensions share a “centering” step,
which is applied directly to the data in the PCA case and to the kernel matrix or dissimilarity matrix for Kernel PCA and
MDS, respectively. To find the out-of-sample extension for PCA, centering with the training mean and mapping to the best-fit subspace is enough.
For Kernel PCA, an additional kernel matrix between the training and new data needs to be created and centered before
applying the transformation built from the previously computed matrices. For the third method, MDS, the construction of matrices involving
training and new data, together with centering, is also required. Unlike the previous two methods, an exact analytic solution cannot be
found; instead, either an approximate solution (based on PCA) or an optimal solution (by solving an optimization problem) can be computed. We
have also provided some examples which show that out-of-sample extensions are not equivalent to an embedding or
transformation of the entire data set, especially for methods that are sensitive to outliers.
Footnote 10: Can be downloaded at http://archive.ics.uci.edu/ml/datasets/seeds
6. APPENDIX
Appendix 6.1
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%% Principal Component Analysis (PCA)
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Xtr = training data set. Each row is an observation.
% Xtst = new observations (out of sample)
% k = # of components to keep.
function [XtrPCA,V,XtstPCA] = PCA(Xtr,Xtst,k)
meanXtr = mean(Xtr);
% center the data
X_tilde = Xtr - repmat(meanXtr, size(Xtr,1), 1);
[U,S,V] = svds(X_tilde,k);
XtrPCA = U*S;
V = V(:,1:k); % projection matrix
% out of sample extension
XtstPCA = (Xtst-repmat(meanXtr, size(Xtst,1), 1))*V;
Appendix 6.2
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%% Kernel PCA
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Xtr = data in which each row is an observation.
% Xtst = new observations (out of sample)
% k = reduced dimension; var = sigma^2
% Here we use the Gaussian Kernel. Code can easily be modified for
% other kernels.
function [XtrKPCA,XtstKPCA] = KPCA(Xtr,Xtst,k,var)
% Calculate pairwise distance matrices
Dtr = pdist2(Xtr,Xtr);
% Constructing Kernel matrix using Gaussian Kernel
K = Kn(Dtr, var);
% centering
n = size(Xtr,1);
Kc = K - K*ones(n,n)/n - ones(n,n)*K/n + ones(n,n)*K*ones(n,n)/(n^2);
% Obtain the evectors of K_tilde that correspond to the largest evalues.
% Those evectors, scaled by the square roots of their evalues, are the data
% points already projected onto the respective principal components.
[U,S] = eig(Kc,'vector');
[~,indx] = sort(S,'descend');
S = S(indx);
S = diag(S);
U = U(:,indx);
Sk = abs(S(1:k,1:k));
XtrKPCA = U(:,1:k)*sqrt(Sk);
% out of sample extension of KPCA
Dtst = pdist2(Xtst,Xtr);
Kz = Kn(Dtst, var);
% centering
m = size(Xtst,1);
Kzc = Kz - ones(m,n)*K/n - Kz*ones(n,n)/n + ones(m,n)*K*ones(n,n)/(n^2);
%XtstKPCA = Kzc*U(:,1:k)*inv(sqrt(Sk));
% Kzc (not the undefined Kyc) is the centered out-of-sample kernel matrix
XtstKPCA = bsxfun(@rdivide,Kzc*XtrKPCA,diag(Sk)');
end
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Gram matrix function
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function K = Kn(D, var)
% k(xi,xj) = exp(-0.5*||xi-xj||^2/var_j)
K = exp(bsxfun(@rdivide, -0.5*D.^2, var));
end
Appendix 6.3
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%% Math 285 Project Function: MDS
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%INPUT: X = distance matrix, assumed to be symmetric.
% k = target dimension; Y = new (lower) coordinates in dimension k
% d = distance of new point with n original points as a row vector.
function [Y, y_x, stress] = mds(X,d,k)
n = size(X,1);
D = X.^2;
meanD = mean(D,1);
mmeanD = mean(meanD);
%D_tilde = 0.5*(-D + ones(n,n)*D/n + D*ones(n,n)/n - ones(n,n)*D*ones(n,n)/n^2)
D_tilde = 0.5*(repmat(meanD', 1, n) + repmat(meanD, n, 1) - D - mmeanD);
% Constructing matrix Y
[U,S] = eig(D_tilde,'vector');
[~,indx] = sort(S,'descend');
S = S(indx);
S = abs(diag(S));
U = U(:,indx);
Y = U(:,1:k)*sqrt(S(1:k,1:k));
% Computing stress: stress = sqrt(sum(sum(x,1))/(n^2*l_dotdot));
stress = sqrt(2*sum((squareform(X) - pdist(Y)).^2)/mmeanD)/n;
% Out-of-sample extension
d = d'.^2;
% b = -0.5*(d - ones(n,n)*d/n - D*ones(n,1)/n + ones(n,n)*D*ones(n,1)/n^2);
b = -0.5*(d - mean(d) - mean(D,2) + mmeanD);
y_x = b'*Y/S(1:k,1:k); % (sqrt(S)U'b)'; return as row vector
7. REFERENCES
[A2003] Anderson, M. J. and Robinson, J. Generalized discriminant analysis based on distances. Australian & New Zealand Journal of
Statistics, 45:301–318, 2003.
[B1997] Borg, I. and Groenen, P. Modern multidimensional scaling: theory and applications. New York: Springer, 1997.
[B2003] Bengio, Y., Paiement, J.-F., and Vincent, P. Out-of-sample extensions for LLE, Isomap, MDS, eigenmaps, and spectral
clustering. Technical Report 1238, Département d'Informatique et Recherche Opérationnelle, Université de Montréal, Montréal,
Québec, Canada, July 2003.
[B2006] Bishop, Christopher M. Pattern Recognition and Machine Learning. Springer, Cambridge, U.K., 2006.
[G1966] Gower, J. C. Some distance properties of latent root and vector methods in multivariate analysis. Biometrika, 53:325–338,
1966.
[S1998] Schölkopf, B., Smola, A., and Müller, K.-R. Nonlinear component analysis as a kernel eigenvalue problem. Neural
Computation, 1998.
[S1999] Schölkopf, B., Smola, A., and Müller, K.-R. Kernel principal component analysis. In B. Schölkopf, C. J. C. Burges, and A.
J. Smola, editors, Advances in Kernel Methods – SV Learning, pages 327–352. MIT Press, Cambridge, MA, 1999.
[S2003] Shlens, J. A Tutorial on Principal Component Analysis: Derivation, Discussion, and Singular Value Decomposition.
[T2008] Trosset, M. W. and Priebe, C. E. The out-of-sample problem for classical multidimensional scaling. Computational Statistics
and Data Analysis, 52:4635–4642, June 2008.
[T2010] Trosset, M. W. and Tang, M. The out-of-sample problem for classical multidimensional scaling: Addendum, November 2010.
[W2012] Wang, Q. Kernel Principal Component Analysis and its Applications in Face Recognition and Active Shape Models. CoRR,
2012.
[Y1985] Young, Forrest W., University of North Carolina. In Kotz-Johnson (Ed.), Encyclopedia of Statistical Sciences, Volume 5. Copyright
(c) 1985 by John Wiley & Sons, Inc.