RANDOM PROJECTION AND SVD METHODS IN HYPERSPECTRAL IMAGING

BY

JIANI ZHANG

A Thesis Submitted to the Graduate Faculty of
WAKE FOREST UNIVERSITY GRADUATE SCHOOL OF ARTS AND SCIENCES
in Partial Fulfillment of the Requirements for the Degree of
MASTER OF ARTS
Mathematics

August, 2012
Winston-Salem, North Carolina

Approved By:
Jennifer Erway, Ph.D., Co-advisor
Robert Plemmons, Ph.D., Co-advisor
Miaohua Jiang, Ph.D., Chair
Xiaofei Hu, Ph.D.
V. Paul Pauca, Ph.D.


Acknowledgments
I would like to thank my advisors for their hard work and patience. When I first became interested in hyperspectral imaging, I had not taken any courses related to this topic. Dr. Erway spent a lot of time explaining the definitions and theorems of numerical linear algebra to me and answering my questions. Dr. Plemmons is learned and broad-minded; I not only learned about hyperspectral imaging from him, but also how to be a good researcher. He helped me revise this thesis several times and gave me many useful suggestions to make it better.
I would also like to thank Dr. Hu and Peter Zhang. Your intelligent ideas and rich experience in imaging helped me with the numerical experiments, and I appreciate that you taught and shared your knowledge with me. Thanks to Dr. Jiang and Dr. Pauca for serving as my committee members, to Dr. Kirkman for your encouragement and suggestions, and to Jennifer Blevins, a tutor at the Writing Center, for helping me proofread my thesis very carefully.
Last, I would like to thank my family and friends. Thank you for your love and support.
Contents

2.1 Numerical Linear Algebraic Preliminaries
2.2 Statistics Preliminaries
Chapter 3 SVD and PCA in Hyperspectral Imaging
3.1 SVD in Hyperspectral Imaging
3.2 PCA in Hyperspectral Imaging
Chapter 4 Compressive-Projection Principal Component Analysis
4.1 Introduction
4.2.2 Reconstruction of the Principal Components
4.3 Performance
5.1 Introduction
5.2 Algorithms
5.4.3 Classification
6.1 Comparing Randomized SVD and Truncated SVD
6.2 Comparing Randomized SVD and CPPCA
Bibliography
A.1 Algorithms
Vita
Abstract
Hyperspectral imaging provides researchers with abundant information with which to study the characteristics of objects in a scene. Processing the massive hyperspectral imagery datasets in a way that efficiently provides useful information becomes an important issue. In this thesis, we consider methods which reduce the dimension of hyperspectral data while retaining as much useful information as possible.
Traditional deterministic methods for low-rank approximation are not always adaptable to processing huge datasets in an effective way, and therefore probabilistic methods are useful for dimension reduction of hyperspectral images. In this thesis, we begin by introducing the background and motivation of this work. Next, we summarize the preliminary knowledge and the applications of the SVD and PCA. After these descriptions, we present a probabilistic method, the randomized Singular Value Decomposition (rSVD), for the purposes of dimension reduction, compression, reconstruction, and classification of hyperspectral data, and we discuss some variations of this method. These variations offer the opportunity to obtain a more accurate reconstruction of a matrix whose singular values decay gradually, to process matrices without a known target rank, and to obtain the rSVD with only a single pass over the original data. Moreover, we compare the method with Compressive-Projection Principal Component Analysis (CPPCA). The numerical results show that rSVD has better performance in compression and reconstruction than the truncated SVD and CPPCA. We also apply rSVD to classification methods for the hyperspectral data provided by the National Geospatial-Intelligence Agency (NGA).
Chapter 1: Introduction
The history of remote sensing using imaging dates back to the middle of the 20th
century. At that time, people started to take photos from the sky by fixing the camera
to a balloon, a kite, or a pigeon. These rudimentary experiments demonstrated
the basic idea of remote sensing [1, 2]. With the invention of the airplane, aerial
photography became possible. During World War I and World War II, people started
to recognize the importance of information gathered from a remote location and
began using this information in strategic ways [3]. The development of artificial
satellites in the latter half of the 20th century made remote sensing possible for civil,
research, and military purposes on a global scale [4]. After the development of these
satellites, remote sensing became a new scientific area. Fig. 1.1 illustrates this brief
history of remote sensing.
Figure 1.1: Remote sensing developed from rudimentary ideas, to aerial photography, and then to spectral imaging.
Spectral analysis is important in the area of remote sensing. Researchers apply the
spectral information of objects in environmental remote sensing, in monitoring chemical
and oil spills, and in military target discrimination [5, 6, 7]. To collect spectral
data, color imagery, color infrared imagery, and multispectral imagery were invented.
However, these kinds of images still cannot offer enough information to construct
the "spectrum" of an object. Thus, when hyperspectral imagery was invented, it
became an important milestone in remote sensing. Compared to multispectral sen-
sors, hyperspectral sensors measure energy in many narrow bands (Fig. 1.2). As a
result, hyperspectral imaging produces the spectra of all pixels so that every pixel
contains abundant information about the object, which allows us to learn more about
the characteristics of objects in a scene. However, the hyperspectral imagery datasets
are so massive that the traditional technologies are not always adapted to process
them well. In hyperspectral imaging, classification and target detection are major
objectives. To achieve any of them with these huge data sets, dimension reduction
is an important task. In this thesis, methods are presented to reduce the dimension
and classify hyperspectral images.
Visually, a hyperspectral image is represented as a cube, and a common way to handle
it is to reorganize the image as a matrix. For example, a 100 × 100 × 200 hyperspectral
image with 100 × 100 pixels and 200 bands can be represented as a 10000 × 200
matrix. More details about how to reorganize hyperspectral images into matrices are
given in Chapter 2.
The idea of dimension reduction is to transform data in a high-dimensional space
into a space of fewer dimensions. There are many methods for this, such as manifold
learning [8], non-negative matrix factorization [9], principal component analysis (PCA)
[10], and the singular value decomposition (SVD) [11]. For these methods, low-rank
matrix approximation is often useful; for example, PCA and the truncated SVD are
essentially low-rank approximations [12]. The general form of a low-rank approximation is

A_{m×n} ≈ B_{m×k} C_{k×n}    (1.1)

where k (k < min{m, n}) is the numerical rank of A.
Low-rank approximation is widely used in many fields, since the factors can be stored
inexpensively and multiplied rapidly with vectors or other matrices. For example,
researchers often use low-rank approximation in data analysis [12], in solving least-squares
problems [13], and in model reduction or coarse graining for the solution of PDEs [14].
Generally, a low-rank approximation can be obtained by two kinds of algorithms,
deterministic and probabilistic. The classical deterministic methods for low-rank
approximation are based on the QR factorization, the eigenvalue decomposition, and
the singular value decomposition, all of which are challenged by hyperspectral imaging.
The major reasons are: first, they are not always adapted to solve such large-scale
problems; second, they are unable to handle matrices with missing or inaccurate data;
third, they often require several passes over the data [15]. For example, the full SVD
often cannot even complete the factorization of a hyperspectral data matrix because
of its operation count and memory requirements. Even though the truncated SVD,
which gives the optimal low-rank representation [16], can produce the factorization,
it often requires considerable time.
Compared with deterministic methods, probabilistic methods are generally faster
and more robust in practice [17]. These methods begin by projecting the original
matrix onto a lower-dimensional space by multiplying it by a random matrix; one then
factorizes the matrix in the lower-dimensional space. The aim of probabilistic methods
is to capture most of the information in the original data while performing the
processing on a reduced-size matrix.
Many studies of random projection in hyperspectral imaging with large amounts of
data, both algorithmic [18, 19] and experimental [11, 20, 21], have shown positive
results. For example, Fowler proposed Compressive-Projection Principal Component
Analysis (CPPCA) [19], which uses random projection to reduce the dimension in a
lightweight encoder system, transmits the projected data to a decoder on the ground,
and reconstructs the original data in the decoder system. This process is driven by
Rayleigh-Ritz theory and achieved by convex-set optimization. CPPCA effectively
shifts the computational burden from the resource-constrained encoder to the decoder,
and the reconstruction obtained by CPPCA is more accurate than that obtained by
popular methods related to compressed sensing. However, CPPCA recovers coefficients
of a known sparsity pattern in an unknown basis and requires an additional step to
recover the eigenvectors.
In this thesis, we present a randomized Singular Value Decomposition (rSVD) method
for the purposes of dimension reduction, compression, reconstruction, and classification
of hyperspectral data. Moreover, we discuss several variations of this method for
different cases. These variations offer the opportunity to obtain a more accurate
reconstruction of a matrix whose singular values decay gradually, to process matrices
without a known target rank, and to obtain the rSVD with only a single pass over
the original data. The good performance of rSVD is demonstrated by numerical
experiments and by comparisons of the computation time and accuracy of rSVD and
CPPCA on real hyperspectral data.
The structure of this thesis is as follows. In Chapter 2, we summarize the background
in numerical linear algebra, statistics, and hyperspectral imaging in sections 2.1, 2.2,
and 2.3, respectively. In Chapter 3, we describe the applications of the singular value
decomposition and principal component analysis in hyperspectral imaging. In Chapter
4, we introduce the CPPCA method in section 4.1, show the CPPCA algorithm in
detail in section 4.2, and analyze its performance in section 4.3. In Chapter 5, a
general introduction to rSVD is given in section 5.1, the related algorithms are
presented in section 5.2, the performance of rSVD is analyzed in section 5.3, and the
applications to hyperspectral imaging are shown in section 5.4. In Chapter 6, we
compare the performance of rSVD with that of the truncated SVD and of CPPCA.
Finally, we present conclusions and directions for future research.
Chapter 2: Background Preparation
To better illustrate the ideas and the numerical experiments of this thesis, this
chapter reviews some useful background tools. We start with some definitions and
theorems in numerical linear algebra.
2.1 Numerical Linear Algebraic Preliminaries
To begin with, we review some classical deterministic matrix decomposition meth-
ods, including the singular value decomposition, QR factorization, and eigenvalue
decomposition [22].
Definition 1. Given any matrix A ∈ R^{m×n} (m > n), there is a singular value decomposition (SVD) of A. It can be expressed as

A = U_{m×m} S_{m×n} V^T_{n×n}    (2.1)

where U is an m × m orthonormal matrix, S is an m × n diagonal matrix with S = diag(σ_1, σ_2, ..., σ_n), and V is an n × n orthonormal matrix. The diagonal entries of S, σ_1 ≥ σ_2 ≥ ... ≥ σ_n ≥ 0, are known as the singular values of A.
When the size of A is large, the calculation of the SVD is expensive. Thus, the
approximation of the SVD, the truncated Singular Value Decomposition, turns out
to be a more widespread method in practice than the full SVD for large matrices.
Definition 2. Given a matrix A ∈ R^{m×n}, the truncated Singular Value Decomposition of A can be expressed as

A ≈ U_{m×k} S_{k×k} V^T_{k×n}    (2.2)

where k is the numerical rank, U and V are orthonormal matrices, and S is a diagonal matrix.
Theorem 2.1. Given a matrix A ∈ R^{m×n}, m ≥ n, there exists a factorization A_{m×n} = Q_{m×n} R_{n×n}, where Q is an m × n matrix with orthonormal columns and R is an upper triangular matrix. This factorization is termed the QR factorization.
Definition 3. Given a square matrix A ∈ R^{m×m}, the eigenvalue decomposition of A is expressed as

A = X_{m×m} Λ_{m×m} X^{−1}_{m×m}    (2.3)

where X is a nonsingular matrix whose ith column is an eigenvector of A and Λ is a diagonal matrix whose ith diagonal entry is the corresponding eigenvalue.
In particular, when A is symmetric, X can be chosen to be orthogonal, and its columns coincide, up to sign, with the singular vectors of A. Next,
we introduce an important theoretical foundation for this thesis.
Theorem 2.2. (Johnson-Lindenstrauss lemma, 1984) For any 0 < ε < 1 and any integer n, if k ≥ 4(ε^2/2 − ε^3/3)^{−1} ln(n), then for any set X of n points in R^d there is a Lipschitz function f : R^d → R^k such that

(1 − ε) ‖u − v‖^2 ≤ ‖f(u) − f(v)‖^2 ≤ (1 + ε) ‖u − v‖^2    (2.4)

for any u, v ∈ X.
This theorem shows that the distances between points can be preserved by a projection
from a high-dimensional space to a lower-dimensional subspace. A proof of this
theorem is given in [23]. Both CPPCA and rSVD rely on this result; we show the
details in Chapter 4 and Chapter 5.
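As a small illustration of the lemma, the following MATLAB sketch projects a set of random points with a scaled Gaussian matrix and compares the pairwise squared distances before and after projection. The dimensions d and n, the distortion eps0, and the choice of a scaled Gaussian map are illustrative assumptions, not part of the thesis.

% Sketch: an empirical check of the Johnson-Lindenstrauss lemma.
% The dimensions d, k and the number of points n are illustrative choices.
d = 2000; n = 100; eps0 = 0.3;
k = ceil(4*(eps0^2/2 - eps0^3/3)^(-1)*log(n));   % target dimension from the lemma
X = randn(d, n);                                 % n random points in R^d (columns)
Y = (1/sqrt(k)) * randn(k, d) * X;               % scaled Gaussian random projection
G1 = sum(X.^2, 1);  D1 = G1' + G1 - 2*(X'*X);    % squared pairwise distances in R^d
G2 = sum(Y.^2, 1);  D2 = G2' + G2 - 2*(Y'*Y);    % squared pairwise distances in R^k
mask = triu(true(n), 1);
maxDistortion = max(abs(D2(mask) - D1(mask)) ./ D1(mask))   % typically below eps0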
Last, we need to mention some preliminary results of numerical linear algebra
related to the rSVD before moving on to the next section. Theorem 2.3 relates the
error of the approximation of SVD, Ak, to the singular value σk+1.
Theorem 2.3. Given a matrix A ∈ R^{m×n} and a target rank k less than the rank of A, if the rank-k approximation of A is

A_k = U_k S_k V_k^T,    (2.5)

where U_k and V_k contain the first k left and right singular vectors and S_k = diag(σ_1, ..., σ_k), then the error between A and A_k is ‖A − A_k‖_2 = σ_{k+1}.

Therefore the approximation A_k is less accurate for a fixed k when σ_1 is large and the singular values decay gradually.
Usually, we can check whether the singular values of a matrix decay rapidly or gradually by plotting them on a log scale, as in Fig. 2.1. Theorem 2.4 gives the average spectral error with an oversampling parameter.

Figure 2.1: Plots of the first 50 singular values of two matrices of the same size on a log scale.
Theorem 2.4. (Average spectral error) Given a matrix A ∈ R^{m×n}, a target rank k, an oversampling parameter p (a small positive integer), and a Gaussian random matrix Ω ∈ R^{n×(k+p)}, form the sample matrix

Y_{k+p} = A Ω    (2.6)

and let A_{k+p} = Q Q^T A, where Q is an orthonormal basis for the range of Y_{k+p}. Then the average spectral error satisfies

E ‖A − A_{k+p}‖_2 ≤ ( 1 + √(k/(p−1)) ) σ_{k+1} + (e √(k+p) / p) ( Σ_{j>k} σ_j^2 )^{1/2},    (2.7)

where E denotes the expectation with respect to Ω [17].
In other words, by these two theorems, when the singular values decay gradually the approximation error may be large. Therefore, in this case, we use the power iteration (A A^T)^q A for a small integer q in place of A in (2.6) to reduce the error. Here, we give the average spectral error for the power iteration.

Theorem 2.5. (Average spectral error for the power iteration) Under the hypotheses of Theorem 2.4, let Z = (A A^T)^q A Ω, where q is a small positive integer, and let Q be an orthonormal basis for the range of Z. Then the average spectral error satisfies

E ‖A − Q Q^T A‖_2 ≤ [ ( 1 + √(k/(p−1)) ) σ_{k+1}^{2q+1} + (e √(k+p) / p) ( Σ_{j>k} σ_j^{2(2q+1)} )^{1/2} ]^{1/(2q+1)},    (2.8)

where σ_i is the ith singular value of A and E is the expectation with respect to Ω.
To find the proofs of Theorems 2.3, 2.4 and 2.5, please refer to [17].
2.2 Statistics Preliminaries
Hyperspectral imagery datasets contain abundant information on wavelength bands.
Often, the information on one wavelength band resembles the information on a differ-
ent wavelength band. This phenomenon makes data analysis difficult; not only does
it cause redundant calculation, but it also makes the data analysis more complex.
Therefore, it is natural to create a small number of new variables to be surrogates
for the original large number of variables. Principal component analysis (PCA) is an
efficient method for executing this. We introduce it in detail in this section.
PCA uses an orthogonal linear transformation to convert a set of observations of possibly correlated variables into a set of values of linearly uncorrelated variables.

Consider a set of correlated variables X = (X_1, ..., X_n)^T with expectation E(X) = µ and covariance matrix D(X) = Σ. The linear transform can be expressed as

Z_i = w_i^T X,  i = 1, 2, ..., n,    (2.9)

where the vectors (w_1, w_2, ..., w_n) form the transform matrix W. The variance and covariance of Z can be calculated from W and Σ as follows:
Var(Z_i) = w_i^T Σ w_i    (2.10)

and

Cov(Z_i, Z_j) = w_i^T Σ w_j,    (2.11)

where i, j = 1, 2, ..., n.
If Z_1 includes most of the information of X_1, ..., X_n, it can be treated as a surrogate for X_1, ..., X_n. But how do we measure this "information"? In the classical measure, the more information Z_1 includes, the greater the value of Var(Z_1). When Z_1 fails to express enough of the information in X_1, ..., X_n, we can add Z_2 to complement it. Generally, we hope to use as few of the Z_i's as possible. To make Z_2 include as much new information as possible, Z_2 should not contain the information Z_1 already includes, i.e., Cov(Z_2, Z_1) = 0. Now that we have introduced the main idea of PCA, we give the formal definition of principal components.
Definition 4. Given a set of correlated variables X = (X_1, ..., X_n)^T, Z_i = w_i^T X is the ith principal component of X if

(1) w_i^T w_i = 1, i = 1, ..., n;

(2) for i > 1, w_i^T Σ w_j = 0, j = 1, ..., i − 1;

(3) Var(Z_i) = max_{w^T w = 1, w^T Σ w_j = 0 (j = 1, ..., i−1)} Var(w^T X).
By this definition, the problem of obtaining the first principal component Z_1 = w_1^T X is equivalent to the problem of obtaining w_1. It can be treated as the optimization problem

maximize_{w_1}  Var(Z_1) = w_1^T Σ w_1   subject to   w_1^T w_1 = 1.    (2.12)
The technique of Lagrange multipliers can be used to solve this problem. Consider the function

f(w_1) = w_1^T Σ w_1 − λ (w_1^T w_1 − 1).    (2.15)
Differentiating with respect to w_1 and λ gives

∂f/∂w_1 = 2(Σ − λI) w_1 = 0,
∂f/∂λ = w_1^T w_1 − 1 = 0.    (2.16)
Since w_1 ≠ 0, |Σ − λI| = 0 is used to find the eigenvalues and eigenvectors of Σ. To decide which of these eigenvectors gives Z_1 with maximum variance, we observe that

Var(Z_1) = w_1^T Σ w_1 = w_1^T λ w_1 = λ w_1^T w_1 = λ.    (2.17)
Thus, to maximize λ, we choose the eigenvector w_1 corresponding to the largest eigenvalue. Generally, we can obtain the ith principal component from the eigenvector corresponding to the ith largest eigenvalue [24]. Theorem 2.6 gives a more formal statement of the preceding discussion. In addition, the theorem shows that obtaining the transform matrix W is equivalent to finding the eigenvectors of the covariance matrix Σ.
Theorem 2.6. Consider a set of correlated variables X = (X_1, ..., X_n)^T, where X_i ∈ R^m, and let the covariance matrix of X be D(X) = Σ = X X^T / m. If the eigenvalues of Σ are λ_1 ≥ λ_2 ≥ ... ≥ λ_n ≥ 0, and w_1, w_2, ..., w_n are the corresponding eigenvectors, then Z_i = w_i^T X is the ith principal component of X.
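A minimal MATLAB sketch of Theorem 2.6 follows; the data matrix X and its sizes are hypothetical stand-ins used only to show the computation.

% PCA via the eigendecomposition of the covariance matrix (Theorem 2.6).
n = 50; m = 1000;
X = randn(n, m);                       % illustrative data: n variables, m samples
X = X - mean(X, 2);                    % center so that E(X) = 0
Sigma = (X * X') / m;                  % covariance matrix, n-by-n
[W, Lambda] = eig(Sigma);              % eigenvectors and eigenvalues
[lambda, idx] = sort(diag(Lambda), 'descend');
W = W(:, idx);                         % columns ordered by decreasing eigenvalue
Z = W' * X;                            % Z(i,:) is the i-th principal component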
2.3 Hyperspectral Imaging Preliminaries
Let us envision a hyperspectral image, so we can place it into a visual context. A
hyperspectral image looks like a cube formed from several "images" (Fig. 2.2). Each
"image" contains all pixels in one wavelength band, so each pixel is measured over
many contiguous wavelength bands. Two of the most widely used hyperspectral
imaging spectrometers, NASA's Airborne Visible/Infrared Imaging Spectrometer
(AVIRIS) and the Naval Research Laboratory's Hyperspectral Digital Imagery
Collection Experiment (HYDICE), generate 224 and 210 wavelength bands,
respectively [25].
When compared with traditional multispectral imaging, hyperspectral imaging
provides much more information because of its greater number of wavelength bands.
Also, the reflectance curve of each pixel through the wavelength bands is essentially
continuous in hyperspectral images. Thus, each pixel has an entire spectrum which
can be used to determine a spectral signature. Every material has a particular spectral
signature, so this is useful to identify an object by extracting the spectral signature
at each pixel and comparing it with known spectral signatures.
Figure 2.2: Hyperspectral image [26].

Hyperspectral imaging also has some disadvantages. The main disadvantage is
that the computational cost for processing can be very large. Is it possible to reduce
the computational cost? Let us keep this question in mind as we continue to introduce
the hyperspectral image.
To process a hyperspectral image, we first need to reorganize its data. Consider a
hyperspectral image with a × b pixels and n wavelength bands, and let m = a · b.
Form an m × n matrix A whose entry A_{i,j} (i = 1, ..., m and j = 1, ..., n) is the
reflectance of the ith pixel in the jth wavelength band. Every row contains the
reflectance of one pixel across all wavelength bands, and every column contains the
reflectance of all pixels in one wavelength band. We use this matrix in the later chapters.
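As a concrete illustration, the unfolding just described can be done in MATLAB with a single reshape; the cube H and its dimensions below are hypothetical placeholders.

% Reorganize an a-by-b-by-n hyperspectral cube H into an m-by-n matrix A,
% where m = a*b pixels and n wavelength bands (illustrative sizes).
a = 100; b = 100; n = 200;
H = rand(a, b, n);                 % placeholder cube; in practice, load real data
m = a * b;
A = reshape(H, m, n);              % row i holds the spectrum of the i-th pixel
H2 = reshape(A, a, b, n);          % the inverse reshape recovers the cube
isequal(H, H2)                     % returns logical 1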
Chapter 3: SVD and PCA in Hyperspectral Imaging

3.1 SVD in Hyperspectral Imaging
Singular Value Decomposition (SVD) is a powerful tool in hyperspectral image anal-
ysis. The SVD of a matrix can be directly used for noise reduction, data compression,
and dimension reduction. In addition, it is also related to the processes of classifica-
tion and unmixing. In this section, we review these uses of the SVD in hyperspectral
imaging. Here, we use the matrix A, the hyperspectral data in matrix form, which
we introduced in Chapter 2.
As we know, image noise is undesirable but cannot be avoided during image capture,
so denoising is usually the first step in hyperspectral image processing. Since most
of the action of a matrix is contained in its largest singular values and the
corresponding singular vectors, the truncated SVD can be used to denoise the matrix
by discarding the small singular values, which mainly represent the noise. The
truncated SVD, U_k S_k V_k^T, then represents the denoised version of A, where k is
the numerical rank. For example, [11] shows how to denoise hyperspectral images by
the SVD and how to unmix them based on a compressive sensing method.
The truncated SVD can also represent a compressed hyperspectral dataset. Recall
the hyperspectral data we introduced in Chapter 2. The sizes of the hyperspectral
datasets are usually large. Thus, compression is an important topic in hyperspectral
imaging. We can compress them from two directions. One direction deals with
the data of the wavelength bands. The other direction deals with the data of the
pixels. Reference [27] shows the methods to compress hyperspectral data by random
projections.
One way to reduce the dimension is to project the data in the high-dimensional space
onto a lower-dimensional subspace which captures most of the action of the data.
Two methods that can be used to reduce the dimension are band selection and
feature extraction. The SVD is well suited to feature extraction: the dimension of
the data can be reduced to the space spanned by the first k columns of U. The
projection A_p is expressed as

A_p = U_k^T A,    (3.1)

where A is an m × n matrix and A_p is a k × n matrix. The row dimension of A_p is
generally much smaller than that of A.
For classification, we consider the matrix X, the transpose of A. The projection X_p is

X_p = V_k^T X,    (3.2)

where the columns of V_k are the first k right singular vectors of A and X_p includes
the "most common" information from all pixels. It is then appropriate to use X_p for
unsupervised classification of the hyperspectral image data. With different
user-defined numerical ranks k, we obtain different levels of accuracy in classification.
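The following MATLAB sketch illustrates the projections (3.1) and (3.2) using svds; the matrix A and the rank k are illustrative stand-ins rather than the actual Gulfport data.

% Dimension reduction with the truncated SVD, following (3.1) and (3.2).
m = 10000; n = 200; k = 10;            % illustrative sizes and target rank
A = randn(m, n);                       % stand-in for a pixels-by-bands data matrix
[U, S, V] = svds(A, k);                % k largest singular triplets
Ap = U' * A;                           % k-by-n projection of A (feature extraction)
X  = A';                               % bands-by-pixels matrix used for classification
Xp = V' * X;                           % k-by-m projection with the "most common" information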
In short, the SVD is useful in the initial processing of hyperspectral images because
it provides a natural, ordered hierarchy for the compressed representation of the
information and an orthogonal basis for the range of the hyperspectral data matrix.
3.2 PCA in Hyperspectral Imaging
PCA is commonly used in feature extraction, unmixing, and target detection with
hyperspectral images. The main reason is that PCA can convert a large set of
hyperspectral data into a smaller set of linearly uncorrelated variables. As we
introduced in Chapter 2, this means PCA can be used to reduce the dimension while
losing as little of the original information as possible. Compared to other linear
projections used to reduce the dimension, PCA better preserves target detection and
classification capabilities after dimension reduction in most cases [28].
Based on PCA, some useful methods have been developed. For example, Jia and
Ricard [29] proposed the segmented principal components transformation, which
exploits the property that hyperspectral data are highly correlated in neighbouring
spectral bands, so that blocks of high correlation appear along the diagonal (Fig. 3.1).
They partition the hyperspectral data into different subsets along the diagonal and
apply PCA to each subset to obtain a more accurate reconstruction.
Figure 3.1: The hyperspectral data present high correlation in blocks along the diagonal [30].
A similar idea is used in the method of class-dependent compressive-projection
PCA [31]. This method partitions the image into several subsets such that each
subset represents a unique class and has higher correlation than the subsets produced
by the segmented principal components transformation.
Moreover, directed principal component analysis, selective principal component
analysis, standard principal component analysis, and residual-scaled principal
component analysis are often utilized in hyperspectral imaging to improve the
performance of traditional PCA [32]. In the next chapter, we introduce
Compressive-Projection Principal Component Analysis (CPPCA), a recently proposed
method that improves traditional PCA with the idea of compressive projection.
Before we end this chapter, let us observe some relationships between PCA and the
SVD. Consider the matrix X ∈ R^{n×m} and the matrix A = X^T. The covariance
matrix Σ of X can be expressed as

Σ = X X^T / m = W Λ W^T    (3.3)

where Λ is a diagonal matrix whose entries are the eigenvalues λ_1(Σ), λ_2(Σ), ..., λ_n(Σ),
and the columns of W are the corresponding eigenvectors. W is also called the
transform matrix in PCA.
Since

A = U S V^T and X = A^T = V S U^T,    (3.4)

(3.3) is equivalent to

Σ = V S U^T U S V^T / m = V S^2 V^T / m.    (3.5)
Therefore, we conclude that the matrix V for A is equal to the matrix W for X, and

S^2 / m = Λ.    (3.6)

Moreover, the principal components W^T X, V^T X, and S U^T are equal.
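A small MATLAB check of these relations; the synthetic data and sizes are illustrative assumptions, and the comparison is made up to the sign ambiguity of eigenvectors.

% Numerical check of the PCA-SVD relationship (3.3)-(3.6).
n = 20; m = 500;
A = randn(m, n);  A = A - mean(A, 1);  % m-by-n data, columns centered
X = A';                                % n-by-m
Sigma = (X * X') / m;
[W, Lambda] = eig(Sigma);
[lambda, idx] = sort(diag(Lambda), 'descend');
W = W(:, idx);
[U, S, V] = svd(A, 'econ');
norm(diag(S).^2 / m - lambda)              % close to zero: S^2/m = Lambda
max(abs(abs(diag(W' * V)) - 1))            % close to zero: V matches W up to signs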
Chapter 4: Compressive-Projection Principal Component Analysis
As we introduced in Chapter 2, PCA is a data-dependent transform which results
from the eigenvalue decomposition of the covariance matrix of a dataset. We showed
in Chapter 3 that PCA plays a central role in dimension reduction, but its use is
limited in many resource-constrained settings, such as the hyperspectral sensing
platform of a satellite-borne device. One reason is that the PCA transform has to be
calculated in this resource-constrained setting before it can be applied to the dataset;
this places the computational burden on the encoder system, which may not be able
to execute such a task.
Fowler [19] proposed a method called Compressive-Projection Principal Component
Analysis (CPPCA). The CPPCA encoder projects the dataset at the signal sensor
onto lower-dimensional subspaces chosen at random; the CPPCA decoder then
reconstructs not only the PCA transform matrix for the transmitted dataset but also
an approximation of the principal components from these random projections, which
are known a priori. This process successfully transfers the computational burden
from the encoder to the decoder. The data flow is shown in Fig. 4.1.
Figure 4.1: Data flow of CPPCA [33].
In this chapter, we review CPPCA in Section 4.1, present the CPPCA algorithm
in Section 4.2, perform CPPCA on a real dataset and observe the results in Section
4.3, and apply it to hyperspectral image data compression in Section 4.4.
4.1 Introduction
Consider a dataset of correlated variables X ∈ R^{n×m} with expectation E(X) = µ
and covariance matrix D(X) = Σ, where each column X_i ∈ R^n and Σ = (1/m) X X^T.
Theorem 2.6 shows that the PCA transform matrix W is formed by the eigenvectors
of Σ, so W can be calculated from the eigenvalue decomposition

Σ = W Λ W^T    (4.1)

where Λ is a diagonal matrix whose entries are the eigenvalues λ_1(Σ), λ_2(Σ), ..., λ_n(Σ).
Instead of calculating the PCA transform in the encoder, CPPCA shifts this
calculation to the decoder. Suppose we have an orthonormal matrix P ∈ R^{n×k}
(k ≤ n) whose columns form a basis of a k-dimensional subspace P. The orthogonal
projection of X onto the subspace P is P P^T X, and the projected data
Y = P^T X (Y ∈ R^{k×m}) is transmitted from the encoder to the decoder.
projected covariance matrix Σ can be expressed as
Σ = P TX(P TX)T
Σ = U ΛUT (4.3)
where Λ = diag(λ1(Σ), λ2(Σ)..., λk(Σ)). Define λ1(Σ), λ2(Σ)..., λk(Σ) as Ritz values.
From (4.2) and (4.3), we find out that
P TΣP = U ΛU (4.4)
⇒ Σ = PU ΛUP T (4.5)
19
where the ui’s are the columns of U . Define Pui = ui as Ritz vectors, i = 1, 2, ..., k,
where ||ui|| = 1.
Also, the orthogonal projection of w_j onto P, normalized to unit length, is defined as the normalized projection v_j:

v_j = P P^T w_j / ‖P P^T w_j‖_2,    (4.6)

where j = 1, ..., n.
Generally, the Ritz vector u_i cannot be used to approximate an arbitrary v_j, j = 1, ..., n.
But if the subspace P is chosen randomly and the eigenvalues of Σ are sufficiently
separated, i.e., λ_1(Σ) ≫ λ_2(Σ) ≫ ... ≫ λ_k(Σ), then the normalized projection v_i is
very close to the Ritz vector u_i corresponding to the Ritz value λ_i(Σ̃), i.e., u_i ≈ v_i,
i = 1, ..., k (Fig. 4.2). For more details, see [19].
Figure 4.2: The projection of x onto the subspace P . The Ritz vector ui is close to the normalized projection vi [33].
After we have approximated v_i by u_i, an algorithm based on projections onto convex
sets (POCS) can reconstruct the first L eigenvectors. These are assembled into the
approximation of the L-component transform matrix W, denoted by Ψ, which is then
used together with Y and P to recover the principal components Z.
Before introducing the CPPCA algorithm, let us first review the method of POCS [34]. POCS is an iterative algorithm aimed at finding a vector w in the intersection of a given sequence {C_j}_{j=1}^{J} of closed convex sets. That is,

w ∈ C_0 = ∩_{j=1}^{J} C_j.    (4.7)
4.2 CPPCA Algorithm
First, we introduce the CPPCA encoder algorithm. The CPPCA encoder splits
X = [X_1, X_2, ..., X_m] into J partitions of columns, X_j. Each X_j is paired with its
own random projection matrix P_j, j = 1, 2, ..., J. Then Y_j = P_j^T X_j is formed by Algorithm 1.
Algorithm 1 CPPCA Encoder
1: Draw a length-J cell array of n × k projection matrices P{1}, ..., P{J}.
2: for j = 1 to J do
3:   X{j} ← X(:, j : J : m);
4:   Y{j} ← P{j}^T X{j};
5: end for
This algorithm is based on the assumption that each X_j resembles X statistically,
so that it has approximately the same eigenvalue decomposition as X [19].
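A MATLAB sketch of the encoder in Algorithm 1 follows. The data matrix X here is a synthetic stand-in for the n-by-m zero-mean hyperspectral matrix, and drawing each P_j by orthogonalizing a Gaussian matrix is one possible (assumed) way of producing the random orthonormal projections.

% Sketch of the CPPCA encoder (Algorithm 1).
n = 58; m = 11520;                     % illustrative sizes
X = randn(n, m);  X = X - mean(X, 2);  % zero-mean synthetic data
J = 20; k = 20;
P = cell(1, J);  Y = cell(1, J);  Xpart = cell(1, J);
for j = 1:J
    [Q, ~] = qr(randn(n, k), 0);       % random n-by-k matrix with orthonormal columns
    P{j}     = Q;
    Xpart{j} = X(:, j:J:m);            % j-th partition of the columns of X
    Y{j}     = P{j}' * Xpart{j};       % k-by-(m/J) projected data sent to the decoder
end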
4.2.1 Reconstruction of the Transform Matrix of PCA
As we discussed in the introduction, the CPPCA decoder has access to neither the
original data X nor the covariance matrix Σ. Thus, the transform matrix W of PCA
cannot be calculated directly at the decoder by the eigenvalue decomposition (4.1).
The main goal of CPPCA is to approximate the transform matrix W from the
projected data Y and the a priori known projections P. With this approximation of
W, we can easily obtain approximations of the principal components Z and of the
original dataset X.
In this part, we introduce the algorithm for reconstructing the first L eigenvectors
of Σ, i.e., the first L columns of the PCA transform matrix. Given the normalized
projection v of an eigenvector w onto the subspace P, we form the subspace C as

C = P^⊥ ⊕ span{v}.    (4.8)

Thus, C is the direct sum of the orthogonal complement of P and the span of v.
In order to form the subspaces C_1, C_2, ..., C_J, we generate J random subspaces
P_1, P_2, ..., P_J, spanned by the orthonormal projection matrices P_1, P_2, ..., P_J and
containing v_1, v_2, ..., v_J respectively. Then C_1, C_2, ..., C_J can be formed by (4.8).
Figure 4.3: w_1 is projected onto two different planes [33].

Figure 4.4: These planes have an intersection, and w_1 can be found in the intersection [33].
From Fig. 4.3 and Fig. 4.4 (here J = 2), we can see that w lies in the intersection
C_1 ∩ ... ∩ C_J. Since C_1, C_2, ..., C_J are closed and convex, it is appropriate to use the
POCS method, with u_i in place of v_i, to approximate w when the eigenvalues of Σ
are sufficiently separated.
The iterative estimate of w is formed as

w^t = (1/J) Σ_{j=1}^{J} Q_j Q_j^T w^{t−1},    (4.9)

where t = 1, 2, ... and Q_j is an orthonormal basis for C_j, so that Q_j Q_j^T performs the projection onto C_j. The approximation of w_i is the normalized limit of the iterates w_i^t, while the initial vector w_i^0 is the average of the Ritz vectors:

w_i^0 = (1/J) Σ_{j=1}^{J} u_i^j,    (4.10)

where u_i^j is the ith Ritz vector of the jth partition. This process is carried out by Algorithm 2.
Algorithm 2 POCS Method
1: Initialize w_i^0 and Q by (4.10) and (4.8), respectively.
2: max_iteration ← 100;
3: tolerance ← 0.001;
4: for j = 1 to J do
5:   wj_previous ← w_i^0;
6:   QQ ← [QQ  Q{j}*Q{j}'];
7:   for i = 1 to max_iteration do
8:     wj ← QQ * repmat(wj_previous, [J 1]) / J;
9:     s ← the angle between wj_previous and wj;
10:    if s is greater than 90 degrees then
11:      s ← 180 − s;
12:    end if
13:    if s is less than tolerance then
14:      return;
15:    end if
16:  end for
17: end for
4.2.2 Reconstruction of the Principal Components

In this section, we introduce the algorithm to reconstruct the principal components Z_1, ..., Z_J. Since the approximation of the principal components is Z_j = Ψ^T X_j, we have

Y_j ≈ P_j^T Ψ Z_j.    (4.11)

Thus, once we obtain the L-component approximation Ψ of the transform matrix W, we can reconstruct Z_j with a least-squares solver. The solution is

Z_j = (P_j^T Ψ)^+ Y_j,    (4.12)

where (P_j^T Ψ)^+ is the pseudoinverse of P_j^T Ψ.
Algorithm 3 Reconstruction of Principal Components by CPPCA

1: Input: P{j}, Ψ, Y{j};
2: L ← the number of columns of Ψ;
3: Z{j} ← pinv(P{j}' * Ψ) * Y{j};
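A corresponding MATLAB sketch of the decoder-side reconstruction follows; it continues from the encoder sketch above (reusing X, P, Y, and J). For illustration only, Psi is taken here from the true covariance matrix; in CPPCA it would come from the POCS reconstruction of Algorithm 2.

% Sketch of the decoder-side reconstruction (Algorithm 3 and (4.12)).
L = 3;
Sigma = (X * X') / size(X, 2);
[W, D] = eig(Sigma);
[~, idx] = sort(diag(D), 'descend');
Psi = W(:, idx(1:L));                  % stand-in for the reconstructed eigenvectors
Z    = cell(1, J);                     % reconstructed principal components
Xrec = cell(1, J);                     % reconstructed data partitions
for j = 1:J
    Z{j}    = pinv(P{j}' * Psi) * Y{j};    % least-squares solution of (4.11)
    Xrec{j} = Psi * Z{j};                  % approximate reconstruction of X{j}
end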
4.3 Performance
In this section, we examine the performance of CPPCA on an actual hyperspectral
imagery dataset, part of the hyperspectral data collected by Gader et al. [35]. We
call this the Gulfport dataset. The Gulfport dataset is rotated and cropped into an
HSI cube with 320 × 360 pixels and 58 wavelength bands, and then unfolded into a
large matrix of size 115200 × 58. Set the number of pixels to m = 115200 and the
number of wavelength bands to n = 58. Define the hyperspectral data matrix as
X ∈ R^{n×m}, from which the mean vector has been removed so that E(X) = 0.
First, we test whether the hypothesis that the Ritz vectors u_i are close to the
normalized projections v_i, i = 1, 2, ..., holds in practice. Let θ_i denote the angle
between u_i and v_i. Let us first observe the plot of the eigenvalues of the Gulfport
hyperspectral data on a log scale (Fig. 4.5).
Figure 4.5: Plot of the eigenvalues of the hyperspectral data matrix of Gulfport on a log scale.

From this plot, we observe that λ_1(Σ) ≫ λ_2(Σ) ≫ λ_3(Σ) > ... > λ_n(Σ) > 0. We
generate 1000 random orthonormal projections P ∈ R^{n×k} and record the angles
between the first six Ritz vectors u_i and the normalized projections v_i. The
distribution of the angles is presented in Fig. 4.6 and the average angles are given in
Table 4.1. From the figure, we can see that the angles are concentrated when
i = 1, 2, 3, whereas for i = 4, 5, 6 they are widely spread; thus the results are stable
only for i = 1, 2, 3. From the table, we find that:
1. The angles θ_i for this real hyperspectral data are larger than the angles for the data in the numerical experiments of [19]. The reason is that the eigenvalues of the Gulfport hyperspectral data are not as well separated as those in the numerical experiments of [19].

2. θ_1 is very close to 0, so the Ritz vector u_1 is very close to v_1.

3. θ_2 is also close to 0, but not as close as θ_1.

4. θ_3, θ_4, θ_5, θ_6 are much greater than θ_1 and θ_2.

5. θ_i increases as i increases, so we should not use many u_i's to form the initial average w_i^0 in (4.10), since the approximation of the eigenvector w becomes incorrect if we replace v_i in (4.6) by a u_i that is not close to it.
Figure 4.6: The distribution of the angles θ_i (in degrees) for the hyperspectral data of Gulfport, i = 1, 2, ..., 6.
Table 4.1: The average angle (in degrees) between the Ritz vectors and the normalized projections.

Angle    θ_1    θ_2    θ_3     θ_4     θ_5     θ_6
Value    1.50   8.60   20.73   27.46   30.14   48.41
Considering the above observations, it is appropriate to use the first three Ritz
vectors to obtain the initial average w_i^0 in (4.10).
Next, let us reconstruct the eigenvectors w_1 and w_2 with J = 15, 20, 30, 50, 60.
Let ξ_i denote the average angle between the eigenvector w_i and its reconstruction,
i = 1, 2.
Fig. 4.7 and Fig. 4.8 show that for larger J the average angles ξ_1 and ξ_2 are smaller.
However, a larger J also requires generating more projections P to form the projected
data Y, so the CPPCA algorithm performs more computation later. Therefore, we
look for a value of J which balances the accuracy of the approximation and the
amount of computation.
Figure 4.7: The angle degrees ξ1 of hyperspectral data of Gulfport with J = 15, 20, 30, 50, 60.
As observed, the average angles ξ_1 and ξ_2 with J = 15 are much greater than those
obtained with the other values of J, while the average angles for J = 20, 30, 50, 60
are close to one another. Thus, we use J = 20 in the following experiments.
Figure 4.8: The angle degrees ξ_2 of the hyperspectral data of Gulfport with J = 15, 20, 30, 50, 60.

Last, we reconstruct an approximation X̂ of the original data X, where X = [x_1, x_2, ..., x_m] and X̂ = [x̂_1, x̂_2, ..., x̂_m]. We use the signal-to-noise ratio (SNR), in dB, to measure the quality of the reconstruction of a vector [33]:

SNR(x_j, x̂_j) = 10 log_10 ( var(x_j) / MSE(x_j, x̂_j) ),    (4.13)

where var(x_j) is the variance of x_j ∈ R^{n×1} and

MSE(x_j, x̂_j) = (1/n) ‖x_j − x̂_j‖^2.    (4.14)
The mean of SNR(x_j, x̂_j), j = 1, ..., m, is used to measure the quality of the
reconstruction of X. Fig. 4.9 shows the reconstruction performance on the
hyperspectral data of Gulfport for different values of k. We compare the CPPCA
algorithm with the rSVD algorithm in Chapter 6.
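A small MATLAB helper implementing (4.13)-(4.14) is sketched below. X and Xrec are illustrative stand-ins for the original and reconstructed n-by-m matrices, and the population variance is used here as one possible reading of var(x_j).

% Mean SNR in dB between the columns of X and their reconstructions Xrec.
n = 58; m = 1000;
X    = randn(n, m);
Xrec = X + 0.01 * randn(n, m);             % a slightly perturbed "reconstruction"
mse     = mean((X - Xrec).^2, 1);          % 1-by-m vector of MSE(x_j, x_j_hat)
snr_dB  = 10 * log10(var(X, 1, 1) ./ mse); % SNR of each column, in dB
meanSNR = mean(snr_dB)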
Figure 4.9: Reconstruction performance of the hyperspectral data of Gulfport by the CPPCA algorithm.
5.1 Introduction
Since the advent of large datasets in hyperspectral image processing, the classical
matrix factorization methods introduced in Chapter 2 cannot always be adapted to
such large-scale problems; they often carry a huge computational cost.
To overcome the disadvantages of classical methods, randomized methods are
considered by researchers to be appropriate for constructing the approximate matrix
factorization. They are appropriate because the random sampling method is effective
when estimating characteristics of the whole population by a relatively small sample
and the Johnson-Lindenstrauss Lemma guarantees that the distances between points
can be preserved by the projection from the high dimension space to a lower dimen-
sional subspace. Chen et al. [28] have also shown that for classification algorithms
and classical target detection for HSI, even with a completely random projection,
the dimensionality can be reduced to 1/5 ∼ 1/3 of the original dimensionality with-
out severely affecting the algorithm performance. Thus, randomized methods are
appropriate to use in hyperspectral imaging.
Our goal is to compute a low-rank SVD approximation (2.2) of hyperspectral imagery
via random projections. Algorithms for obtaining a low-rank SVD approximation
have previously been proposed in [17]. Here, we introduce the ideas, explain how to
execute the algorithms under different conditions, and show how to apply them in
hyperspectral imaging. We consider a matrix A ∈ R^{m×n} with target rank k,
k ≤ n ≤ m, and let ε denote the approximation error.
In the first step, the aim is to find an approximate basis matrix Q ∈ R^{m×k} for the range of A using as few columns as possible. Meanwhile, Q should satisfy

‖A − Q Q^T A‖_2 ≤ ε    (5.1)

to ensure the accuracy of the approximation.
In the second step, the aim is to finish the approximate SVD factorization with a small amount of computation. Let B = Q^T A; the size of B is k × n, which is much smaller than A, so it can be factorized directly as B = U_B S V^T, where U_B and V are orthogonal. Then we obtain the approximate factorization A ≈ Q U_B S V^T. Denoting Q U_B by U, which still has orthonormal columns, we can view U as an approximation of the matrix of left singular vectors of A. We define this factorization as the randomized SVD (rSVD) of A,

A ≈ U S V^T.    (5.2)

How accurate is this method? Define the error e_k as

e_k = ‖A − U S V^T‖_2.    (5.3)

This error e_k should be compared to the theoretical error σ_{k+1},

σ_{k+1} = ‖A − A_k‖_2,    (5.4)

which we defined in Theorem 2.3.
5.2 Algorithms
This section includes the algorithms to solve problems under different conditions.
Case 1: If we already have a target rank k and the singular values of A decay rapidly,
we construct a Gaussian random matrix Ω of size n × (k + p). The oversampling
parameter p is set to five, based on experience [17]. In stating the algorithms, we
absorb k + p into k. Form a random sample of the matrix A as Y = AΩ, which has
lower dimension. The columns of Y are linearly independent with probability one, so
we can obtain an approximate basis for the range of A from the rank-revealing QR
factorization Y = QR, where Q provides an orthonormal basis (Algorithm 4).
Algorithm 4
1: Input: an m × n matrix A with numerical rank k, k ≤ n ≤ m.
2: Generate a Gaussian random matrix Ω ∈ R^{n×k};
3: Form the matrix Y ∈ R^{m×k}, Y = AΩ;
4: Construct a matrix Q ∈ R^{m×k} whose columns form an orthonormal basis for the range of Y;
5: Form the small matrix B = Q^T A;
6: Compute the SVD of B, B = U_B S V^T;
7: Form the rSVD of A, A ≈ Q U_B S V^T = U S V^T;
8: Output: U, S, V^T.
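A runnable MATLAB sketch of Algorithm 4 follows. It assumes, as in the text, that the oversampling has been folded into k, and it uses a plain economy-size QR to orthonormalize Y; the function name rsvd_basic is an illustrative choice.

function [U, S, V] = rsvd_basic(A, k)
% Randomized SVD of A with target rank k (a sketch of Algorithm 4).
Omega = randn(size(A, 2), k);      % Gaussian test matrix, n-by-k
Y = A * Omega;                     % sample the range of A, m-by-k
[Q, ~] = qr(Y, 0);                 % orthonormal basis for the range of Y
B = Q' * A;                        % small k-by-n matrix
[UB, S, V] = svd(B, 'econ');       % SVD of the small matrix
U = Q * UB;                        % approximate left singular vectors of A
end

After the call [U, S, V] = rsvd_basic(A, k + 5), the product U*S*V' approximates A.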
Fig. 5.1 [36] compares the error e_k and the theoretical error σ_{k+1} for a matrix A
whose singular values decay rapidly. It shows that e_k given by (5.3) is close to
σ_{k+1} with high probability. However, by Theorem 2.4, e_k is not always close to
σ_{k+1}; for example, the q = 0 curve in Fig. 5.2 shows a case in which e_k is far from
the theoretical error.
Case 2: If the singular values of A decay gradually, so that σ_k/σ_1 is not small,
Algorithm 4 may lose accuracy. Consider instead forming Y = (A A^T)^q A Ω by power
iteration. The matrix (A A^T)^q A has the same singular vectors as A, but its ith
singular value equals σ_i^{2q+1}. The singular values of (A A^T)^q A thus decay more
rapidly, so the error ‖A − Q Q^T A‖_2 is smaller by Theorems 2.3 and 2.5. Fig. 5.2
gives an example with a 1000 × 1000 matrix, and Algorithm 5 shows how to handle
this case with relatively accurate computation.
Case 3: In practice we may not know the target rank k. Thus, we need to determine
how many columns of Q to use for a given tolerance ε such that ‖A − Q Q^T A‖_2 ≤ ε.
We start with, say, l columns and observe the resulting error; if ‖A − Q_l Q_l^T A‖_2
exceeds ε, we add more columns to Q until the bound is satisfied (Algorithm 6).

Figure 5.1: The comparison of the error e_k and the theoretical error σ_{k+1} [36].
Case 4: The algorithms above require us to revisit the input matrix, which may not
be feasible for very large matrices. Here, we introduce methods that require only one
pass over the matrix A to construct the matrix Q and the rSVD of A.
Algorithm 7 is used for symmetric matrices. We define B = Q^T A Q. Using Q Q^T A ≈ A and multiplying on the right by Q^T Ω, we have

B Q^T = Q^T A Q Q^T ≈ Q^T A,    (5.5)

B (Q^T Ω) ≈ Q^T A Ω = Q^T Y.    (5.6)

Q, Ω, and Y are known, so we can solve equation (5.6) in the least-squares sense to obtain the matrix B. From B = Q^T A Q we can then obtain the approximation A ≈ Q B Q^T.
Algorithm 5
1: Input: an m × n matrix A with numerical rank k, k ≤ n ≤ m.
2: Generate a Gaussian random matrix Ω ∈ R^{n×k};
3: Form the matrix Y ∈ R^{m×k}, Y = AΩ;
4: Compute the rank-revealing QR factorization Y = QR;
5: for j = 1 to q do
6:   Form Y = A^T Q and compute the rank-revealing QR factorization of Y;
7:   Form Y = A Q and compute the rank-revealing QR factorization of Y;
8: end for
9: Form the small matrix B = Q^T A;
10: Compute the SVD of B, B = U_B S V^T;
11: Form the rSVD of A, A ≈ Q U_B S V^T = U S V^T;
12: Output: U, S, V^T.
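A MATLAB sketch of the power-iteration variant in Algorithm 5 is given below. Re-orthonormalizing after each application of A or A^T is one common stabilization choice assumed here, and q is a small integer such as 1 or 2.

function [U, S, V] = rsvd_power(A, k, q)
% Randomized SVD with q steps of power iteration (a sketch of Algorithm 5).
Omega = randn(size(A, 2), k);
Y = A * Omega;
[Q, ~] = qr(Y, 0);
for j = 1:q
    [Q, ~] = qr(A' * Q, 0);        % orthonormal basis for range(A'*Q)
    [Q, ~] = qr(A  * Q, 0);        % orthonormal basis for range(A*Q)
end
B = Q' * A;
[UB, S, V] = svd(B, 'econ');
U = Q * UB;
end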
Algorithm 6
1: Input: an m × n matrix A and a tolerance ε.
2: Form an empty basis matrix Q and an empty matrix Ω; set e = 1 and i = 0;
3: while e > ε do
4:   i = i + 1;
5:   Form the vector y_i = A r_i, where r_i is a Gaussian random vector;
6:   Form q_i = (I − Q_{i−1} Q_{i−1}^T) y_i;
7:   Normalize q_i = q_i / ‖q_i‖;
8:   Q = [Q q_i];
9:   Ω = [Ω r_i];
10:  Compute the error e = ‖A − Q Q^T A‖_2;
11: end while
12: Form the small matrix B = Q^T A;
13: Compute the SVD of B, B = U_B S V^T;
14: Form the rSVD of A, A ≈ Q U_B S V^T = U S V^T;
15: Output: U, S, V^T.
Algorithm 7
1: Input: an m × m symmetric matrix A with numerical rank k, k ≤ m.
2: Generate a Gaussian random matrix Ω ∈ R^{m×k};
3: Form the matrix Y ∈ R^{m×k}, Y = AΩ;
4: Construct a matrix Q ∈ R^{m×k} whose columns form an orthonormal basis for the range of Y;
5: Use a standard least-squares solver to find B_approx satisfying B_approx (Q^T Ω) ≈ Q^T Y;
6: Compute the eigenvalue decomposition of B_approx, B_approx = V Λ V^T;
7: Form the approximate eigenvectors U = Q V;
8: The approximation of A can be expressed as A ≈ U Λ U^T;
9: Output: U and Λ.
Figure 5.2: The comparison of the error e_k and the theoretical error σ_{k+1}. The pink curve shows that the error e_k is greater than the theoretical error σ_{k+1} when q = 0 [36].

Similarly, we can process nonsymmetric matrices by Algorithm 8.
5.3 Performance Analysis
We consider n × n symmetric Toeplitz matrices, n = 15, 30, ..., 1500. The singular
values of these matrices decay rapidly, as seen in Fig. 5.3, so they are appropriate
test matrices for Algorithm 4.
First, we examine the relative error between the rSVD and the original matrix A in Fig. 5.4. It shows that the relative errors

‖A − U S V^T‖_2 / ‖A‖_2,    (5.7)

with U ∈ R^{m×k}, S ∈ R^{k×k}, and V ∈ R^{n×k}, fluctuate around 1.5 × 10^{−7}. The result also shows that the algorithm remains relatively accurate as the sizes of the matrices increase.
Algorithm 8
1: Input: an m × n matrix A with numerical rank k, k ≤ n ≤ m.
2: Generate Gaussian random matrices Ω ∈ R^{n×k} and Ψ ∈ R^{m×k};
3: Form the matrices Y ∈ R^{m×k} and Z ∈ R^{n×k}, Y = AΩ and Z = A^T Ψ;
4: Construct a matrix Q ∈ R^{m×k} whose columns form an orthonormal basis for the range of Y, and a matrix F ∈ R^{n×k} whose columns form an orthonormal basis for the range of Z;
5: Find B_approx satisfying B_approx (F^T Ω) ≈ Q^T Y and B_approx^T (Q^T Ψ) ≈ F^T Z;
6: Compute the SVD of B_approx, B_approx = U_B S V_B^T;
7: Form the approximate left singular vectors U = Q U_B and the approximate right singular vectors V = F V_B;
8: The approximation of A can be expressed as A ≈ U S V^T;
9: Output: U, S and V.
Second, we examine the computation times of the rSVD for the n × n Toeplitz
matrices. Fig. 5.5 illustrates that the computation times increase linearly but stay
very small. From these results, the rSVD is an accurate and efficient method for
factorizing matrices. We also compare its relative error and computation time with
those of the truncated SVD in the next chapter.
5.4 Applications
In this section, we apply the rSVD to hyperspectral imaging. As introduced earlier, one can obtain a good approximation of a matrix A, A ≈ QB, by the rSVD; here Q and B are smaller matrices than A. Generally, since the number of wavelength bands generated by hyperspectral spectrometers is less than 250, the reorganized hyperspectral data form an m × n matrix with n ≪ m. As a result, the size of B is very small. For example, consider the hyperspectral data of the Gulfport dataset, reorganized as a 115200 × 58 matrix in the way we introduced in Chapter 2. Let this matrix be A. A is approximated with target rank 25 by

A ≈ Q Q^T A = Q B,    (5.8)

where Q is a 115200 × 25 matrix and B is a 25 × 58 matrix. Therefore, we consider compressing the matrix A into B and Q on a hyperspectral sensing platform, then transmitting B and Q to a decoder station and using them to reconstruct the matrix A.

Figure 5.3: We use a 1000 × 1000 matrix as an example to show that the singular values of Toeplitz matrices decay rapidly by observing the 100 largest singular values.

In practice, to avoid producing a separate random projection for each column of A, the rSVD splits A = [A_1, A_2, ..., A_m]^T into J partitions A_j, j = 1, ..., J. Each A_j is paired with a random matrix Ω_j.
5.4.1 Compression on a Hyperspectral Sensing Platform
On the hyperspectral sensing platform, we first generate J random Gaussian matrices
Ω_j. Then, we use Algorithm 9, the rSVD encoder, to compress A into B and Q. The
total number of bytes used to store B and Q is smaller than that needed to store Y,
the compressed data produced by CPPCA.
Figure 5.4: The relative errors between the rSVD and original matrix A are shown by the red curve.
Figure 5.5: The computation times (seconds) of rSVD.
Algorithm 9 rSVD Encoder
1: Draw a length-J cell array of n × k random Gaussian matrices Ω{1}, ..., Ω{J}.
2: for j = 1 to J do
3:   A{j} ← A(j : J : m, :);
4:   Y{j} ← A{j} Ω{j};
5:   Construct the matrix Q{j} whose columns form an orthonormal basis for the range of Y{j};
6:   Form B{j} = Q{j}^T A{j};
7: end for
8: Output: a length-J cell array B and a length-J cell array Q.
5.4.2 Reconstruction at a Ground Receiving Station
After receiving the arrays B and Q, the task of reconstructing A can be completed
easily by forming QB = Q Q^T A, thanks to the orthonormal columns of the Q{j}.
The procedure is shown as Algorithm 10.
Algorithm 10 Reconstruction by rSVD
1: Input: a length-J cell array B and a length-J cell array Q.
2: for j = 1 to J do
3:   A{j} = Q{j} B{j};
4: end for
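A MATLAB sketch combining Algorithms 9 and 10 follows. The matrix A is a synthetic stand-in for the pixels-by-bands data, the row-wise partitioning mirrors the splitting described above, and J and k are illustrative parameters.

% Sketch of the partitioned rSVD encoder (Algorithm 9) and the
% reconstruction at the ground station (Algorithm 10).
m = 11520; n = 58; J = 20; k = 25;
A = randn(m, n);
Q = cell(1, J);  B = cell(1, J);
for j = 1:J
    Aj = A(j:J:m, :);                  % j-th partition of the rows (pixels) of A
    Y  = Aj * randn(n, k);             % random sample of the range of Aj
    [Q{j}, ~] = qr(Y, 0);              % orthonormal basis, transmitted with B{j}
    B{j} = Q{j}' * Aj;                 % small k-by-n matrix
end
Arec = zeros(m, n);                    % reconstruction from the transmitted Q and B
for j = 1:J
    Arec(j:J:m, :) = Q{j} * B{j};
end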
Next, let us use the SNR (4.13) to measure the quality of the reconstruction by the
rSVD. Fig. 5.6 shows relatively accurate reconstruction performance by the rSVD
when J = 20. We compare it with CPPCA in terms of accuracy and computation
time in Chapter 6.
5.4.3 Classification
Figure 5.6: Reconstruction performance of the hyperspectral data of the Gulfport dataset by the rSVD algorithm. Here, k is the target rank and n is the number of columns of A.

Since the projection X_p given in (3.2) contains the "most common" information of
the matrix X, we perform unsupervised classification of the hyperspectral image of
the Gulfport dataset using X_p. Here, we use the k-means algorithm for our numerical
experiment. Fig. 5.7 shows the result of the classification. We can see that the water
and shadows are in yellow, the trees are in red, the grasses are in dark red, the
pavements are in green, the beach sands are in dark blue, and the sandy/dirt grasses
are in blue and light blue. The classification performance is
compared to that obtained from the original matrix X. Only 13 of the 115200 pixels
are classified differently between the original matrix and the projected data, so the
total classification accuracy is above 99 percent. This result demonstrates that it is
suitable to use the projected data X_p for classification. Fig. 5.8 shows the images
associated with the first eight columns of X_p.
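A MATLAB sketch of this classification experiment follows. It uses kmeans from the Statistics Toolbox; the number of classes is an illustrative choice, and a random stand-in replaces the actual Gulfport matrix.

% Unsupervised classification of the pixels using the projected data Xp.
A = randn(320*360, 58);                % stand-in for the 115200-by-58 Gulfport matrix
k = 25;                                % target rank of the truncated SVD
numClasses = 7;                        % illustrative number of land-cover classes
[~, ~, V] = svds(A, k);                % right singular vectors of the pixels-by-bands matrix
Xp = V' * A';                          % k-by-m projected data, one column per pixel
labels = kmeans(Xp', numClasses);      % cluster the m pixels in the reduced space
classMap = reshape(labels, 320, 360);  % fold the labels back into the image grid
imagesc(classMap); axis image;         % display the classification map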
Figure 5.7: The classification result obtained with the k-means method.
Figure 5.8: The plots of the first eight columns of X_p. From the first sub-figure, we can see that most of the information in the hyperspectral image is contained in the first column of X_p. From the second sub-figure, we can see that the second column contains almost all of the information that the first column does not. The fifth column of X_p contains the main identification of the four targets.
6.1 Comparing Randomized SVD and Truncated SVD
In this section, we compare the two methods of randomized SVD (rSVD) and
truncated SVD (tSVD). svds is a Matlab function that calculates the truncated SVD
(2.2); the command [U, S, V] = svds(A, k) returns the k largest singular values and
the associated singular vectors of a matrix A. svds is considered an efficient way to
obtain the tSVD, so we compare it with the rSVD, which is also coded in Matlab.
First, we compare the computation times of these two methods. We generate random
test matrices A ∈ R^{n×n}, n = 101, ..., 2000, set the target rank to k = 6, and use
Algorithm 4 to compute the rSVD. The results are shown in Fig. 6.1.
Figure 6.1: Computation times (seconds) of svds and rSVD when the target rank is six.
From Fig. 6.1, we find that svds is almost as fast as rSVD when n is relatively small.
However, when n becomes large, the computation time of svds increases quickly and
becomes much greater than that of rSVD, which stays in the range from 0 to 1 seconds.
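The timing comparison can be reproduced with a sketch along these lines; rsvd_basic is the sketch given after Algorithm 4, and the matrix sizes are illustrative.

% Compare the run times of svds and the randomized SVD for growing n.
k = 6;
for n = 500:500:2000
    A = randn(n, n);
    tic; [U1, S1, V1] = svds(A, k);       t_svds = toc;
    tic; [U2, S2, V2] = rsvd_basic(A, k); t_rsvd = toc;
    fprintf('n = %4d: svds %.3f s, rSVD %.3f s\n', n, t_svds, t_rsvd);
end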
6.2 Comparing Randomized SVD and CPPCA
In this section, we compare the rSVD and CPPCA algorithms in terms of accuracy
and computation time. First, let us use the hyperspectral Gulfport dataset to compare
the reconstruction accuracy of the two methods. For the rSVD, the Gulfport dataset
is reorganized as a matrix A ∈ R^{115200×58}; for CPPCA, it is reorganized as a matrix
X ∈ R^{58×115200}, with X = A^T and E(X) = E(A) = 0. When we obtain the
reconstructed matrices  and X̂, we use Â^T and X̂, which are both 58 × 115200
matrices, to compute the SNR. Fig. 6.2 shows that the rSVD outperforms CPPCA
in reconstruction accuracy.
Figure 6.2: Comparison of the reconstruction performances by rSVD and CPPCA with J = 20.
Second, we compare the computation times of the full datacube reconstructions by
the rSVD and CPPCA. Table 6.1 shows that the rSVD takes a little longer than
CPPCA to complete the reconstruction with target rank k when k/n = 0.2, 0.3, 0.4, 0.5.
Table 6.1: Computation times in seconds of the rSVD and CPPCA.

k/n                         0.1     0.15    0.2     0.3     0.4     0.5
Computation time of rSVD    0.212   0.292   0.390   0.707   0.897   1.264
Computation time of CPPCA   0.247   0.305   0.331   0.368   0.399   0.509
Last, we compare the accuracy of the reconstructions of the eigenvectors, wi, of
the covariance matrix of X by these two methods. The motivation is that if we can
obtain a good reconstruction of the eigenvectors wi from the rSVD, it is helpful in
obtaining the principle components wi TX. PCA is a very useful tool in hyperspectral
imaging, such as in the process of classification. Thus, accurate reconstructions of
the eigenvectors improve the performance of PCA in hyperspectral imaging.
The first row of Fig. 6.3 shows the histograms of the angles between the first
four reconstructions of wi by the rSVD and the true eigenvectors, and the second
row shows the histograms of the angles between the the first four reconstructions of
wi by the CPPCA. We can see that the reconstructions of wi by the rSVD are more
accurate than those CPPCA. Moreover, this advantage appears more when the index
i of w increases, since the angles in the second row apparently increase.
Figure 6.3: Comparison of the reconstruction performances of the first four wi by rSVD and CPPCA.
Chapter 7: Conclusions and Future Research
Recently, researchers have shown that randomization is very useful in low-rank matrix approximation. In this thesis, we have presented one such method, the randomized SVD, and investigated its performance in hyperspectral imaging, in particular its accuracy and computation time. From our numerical experiments, we draw the following observations:
• Compared with the classical deterministic truncated SVD as implemented in Matlab, randomized SVD processes the matrices in less time. This advantage becomes more pronounced as the matrices grow larger.
• Compared with the popular CPPCA method, randomized SVD produces more accurate reconstructions.
• We have applied randomized SVD to classification, and it works well. In our example, only 13 of the 115200 pixels are classified differently between the original data and the projected data, so the classification accuracy is above 99 percent.
Thus, the randomized SVD method performs well on large matrices. Although CPPCA has particular advantages in resource-constrained settings, randomized SVD is more convenient and more accurate in most situations.

In future research we will focus on further applications of randomized SVD in hyperspectral imaging. For example, we will use randomized SVD for classification with segmented subsets, for anomaly detection, and for unmixing of hyperspectral images.
Bibliography
[1] Nicholas M. Short, Sr. History of Remote Sensing: In the Beginning; Launch Vehicles. 2009.
[3] Wikipedia contributors. Hyperspectral Imaging [Internet]. Wikipedia, The Free Encyclopedia; 2012 May 26, 04:26 UTC [cited 2012 June 12]. Available from: http://en.wikipedia.org/wiki/Hyperspectral_imaging.
[4] Wikipedia contributors. Remote Sensing [Internet]. Wikipedia, The Free Encyclopedia; 2012 June 7, 13:28 UTC [cited 2012 June 12]. Available from: http://en.wikipedia.org/wiki/Remote_sensing.
[5] M. T. Eismann, Hyperspectral Remote Sensing. SPIE Press, 2012.
[6] H. F. Grahn and P. Geladi. Techniques and Applications of Hyperspectral Image Analysis. Wiley, 2007.
[7] J. Bioucas-Dias, A. Plaza, N. Dobigeon, M. Parente, Q. Du, P. Gader, and
J. Chanussot. Hyperspectral unmixing overview: Geometrical, statistical, and
sparse regression-based approaches. IEEE Journal of Selected Topics in Applied
Earth Observations and Remote Sensing, vol. 99, pp. 1-16, 2012.
[8] Y. Chen. Improved nonlinear manifold learning for land cover classification
via intelligent landmark selection. Geoscience and Remote Sensing Symposium,
pp.545-548, 2006.
[9] D. D. Lee and H. S. Seung. Learning the parts of objects with nonnegative matrix
factorization. Nature, 401:788-791, 1999.
[10] P. Cunningham. Dimension reduction, Technical Report on Dimension Reduction
UCD-CSI-2007-7. August 2007.
[11] C. Li, T. Sun, K. Kelly, and Y. Zhang. A compressive sensing and unmix-
ing scheme for hyperspectral data processing. IEEE Trans Image Process,
21(3):1200-1210, 2012.
[12] T. Hastie, R. Tibshirani, and J. Friedman. The Elements of Statistical Learning: data mining, inference, and prediction. Springer, Berlin, 2008.
[13] V. Rokhlin and M. Tygert. A fast randomized algorithm for overdetermined
linear least-squares regression. PNAS, vol. 105, no. 36, pp. 13212-13217, 2008.
[14] B. Engquist and O. Runborg. Wavelet-based numerical homogenization with applications. Multiscale and Multiresolution Methods: Theory and Applications, T. J. Barth et al., ed., vol. 20 of LNCSE, Springer, Berlin, pp. 97-148, 2001.
[15] M. Johnson. Randomized algorithms for computation of singular value decom-
position of large matrices, presentation slides, 2010.
[16] H. Wang, S. Babacan, and K. Sayood. Lossless hyperspectral-image compression
using context-based conditional average. Geoscience and Remote Sensing, IEEE
Transactions on, vol. 45, no. 12, pp. 4187–4193, 2007.
[17] N. Halko, P. G. Martinsson and J. A. Tropp. Finding structure with randomness:
probabilistic algorithms for constructing approximate matrix Decompositions.
SIAM Review, 53(2):217-288, 2011.
[18] Q. Zhang, R. Plemmons, D. Kittle, D. Brady, and S. Prasad. Joint segmentation
and reconstruction of hyperspectral data with compressed measurements. Applied
Optics. vol. 50, no. 22, pp. 4417–4435, 2011.
[19] J. Fowler. Compressive-projection principal component analysis. IEEE Transac-
tions on Image Processing, 18(10):2230-2242, 2009.
[20] M. Gehm, R. John, D. Brady, R. Willett, and T. Schulz. Single-shot compres-
sive spectral imaging with a dual-disperser architecture. Optics Express, vol. 15,
no. 21, pp. 14 013–14 027, 2007.
[21] A. Wagadarikar, R. John, R. Willett, and D. Brady. Single disperser design for coded aperture snapshot spectral imaging. Applied Optics, vol. 47, no. 10, pp. B44-B51, 2008.
[22] L. Trefethen and D. Bau, Numerical Linear Algebra. Society for Industrial Math-
ematics, no. 50, 1997.
[23] S. Dasgupta and A. Gupta. An elementary proof of a theorem of Johnson and Lindenstrauss. Random Structures and Algorithms, 22(1):60-65, 2003.
[24] I.T. Jolliffe. Principal Component Analysis, Second Edition. Springer, NY, 2002.
[25] C. Liu, C. Zhao and L. Zhang. A new method of hyperspectral remote sens-
ing image dimensional reduction. Journal of Image and Graphics, 10(2):218-222,
2005.
[26] P. Shippert. Introduction to Hyperspectral Image Analysis. Remote Sensing of
Earth via Satellite, no. 3, Winter 2003.
[27] J. Zhang, J. Erway, X. Hu, Q. Zhang, R. Plemmons. Randomized SVD in hy-
perspectral imaging, preprint, May 2012.
[28] Y. Chen, N. Nasrabadi and T. Tran. Effects of linear projections on the perfor-
mance of target detection and classification in hyperspectral imagery. Journal of
Applied Remote Sensing, vol. 5, no. 1, pp. 053563-1-053563-25, 2011.
[29] X. Jia and J. A. Richard. Segmented principal components transformation for
efficient hyperspectral remote-sensing image display and classification. IEEE
Trans. Geoscience and Remote Sensing, vol. 37, no. 1, pp. 538-542, Jan. 1999.
[30] G. Motta and F. Rizzo and J. Storer. Hyperspectral Data Compression. Springer,
NY, 2006.
[31] W. Li, S. Prasad, J. Fowler, L.M. Bruce. Class dependent compressive-projection
principal component analysis for hyperspectral image reconstruction. 3rd IEEE
Workshop on Hyperspectral Signal and Image Processing: Evolution in Remote
Sensing (WHISPERS), 2011.
[32] B. Zhang and L. Gao. Hyperspectral Image Classification and Target Detection.
Science Press, China, 2011.
[33] J. Fowler and Q. Du. Reconstruction from compressive random projections of
hyperspectral imagery. Optical Remote Sensing: Advances in Signal Processing
and Exploitation Techniques, ch.3:31-48, 2011.
[34] A. K. Brodzik and J. M. Mooney. Convex projections algorithm for restoration
of limited-angle chromotomographic images. Journal of the Optical Society of
America A, 16(2):246-257, 1999.
[35] P. Gader, A. Zare, R. Close, and G. Tuell, Co-registered hyperspectral and Li-
DAR Long Beach, Mississippi data collection, 2010, University of Florida, Uni-
versity of Missouri, and Optech International.
[36] G. Martinsson. Randomized methods for computing the Singular Value Decom-
position (SVD) of very large matrices. Workshop on Algorithms for Modern Mas-
sive Data Sets, Palo Alto, June 2010.
Appendix A: Related Matlab Code
Here we include the Matlab code used in the previous chapters: the code for the algorithms and the scripts that generate the figures and tables.
A.1 Algorithms
>> For Algorithm 1, Algorithm 2, and Algorithm 3, please refer to http://www.ece.msstate.edu/~fowler/CPPCA/.
>> Algorithm 4
function [U,S,V] = randProjSVD_I(A,k)
[m,n] = size(A);
O = randn(n,k);              % Gaussian test matrix
Y = A*O;                     % sample the range of A
[Q,R] = qr(Y,0);             % orthonormal basis for the sampled range
B = Q'*A;                    % small k x n matrix
[U,S,V] = svd(B);
U = Q*U;                     % lift the left singular vectors back to R^m
>> Algorithm 5
function [U,S,V] = randProjSVD_II(A,k,q)
[m,n] = size(A);
O = randn(n,k);
Y = A*O;
[Q,R] = qr(Y,0);
for j = 1:q                  % q steps of power iteration
    Y = A'*Q; [Q,R] = qr(Y,0);
    Y = A*Q;  [Q,R] = qr(Y,0);
end
B = Q'*A;
[U,S,V] = svd(B);
U = Q*U;
>> Algorithm 6
[Q,j] = ITERandrangefinder(A,10);
B = Q'*A; [U,S,V] = svd(B); UU = Q*U;

function [Q,j] = ITERandrangefinder(A,r)
[m,n] = size(A);
Y = zeros(m,r);
for i = 1:r                                  % draw r initial sample vectors
    O = randn(n,r); yi = A*O(:,i); Y(:,i) = Y(:,i) + yi;
end
Q = []; N = zeros(1,r);
for k = 1:r
    N(k) = norm(Y(:,k),2);
end
max_N = max(N); j = 0;
epsilon = 10^-3;
h = epsilon/(10*sqrt(2/pi));                 % error threshold
step = 0;
while max_N > h && step < 25
    step = step + 1; j = j + 1;
    if j > 1
        Y(:,j) = Y(:,j) - Q*(Q'*Y(:,j));     % re-orthogonalize against the current basis
    end
    qj = Y(:,j)/norm(Y(:,j),2);
    Q = [Q, qj];
    omega = randn(n,1);
    y = A*omega - Q*(Q'*A*omega);            % draw a fresh sample and project out the basis
    Y = [Y, y]; N = [N, norm(y,2)];
    for i = j+1 : j+r-1
        Y(:,i) = Y(:,i) - qj*(qj'*Y(:,i)); N(i) = norm(Y(:,i),2);
    end
    max_N = max(N(j+1:j+r));                 % update the stopping criterion
end
>> Algorithm 7
% Input:
load A.mat
A = A(1:58,:);                % A is a symmetric matrix.
Omiga = randn(58,30);         % 30 = 25 + 5: 25 is the target rank, 5 is the oversampling parameter.
% We add an oversampling parameter since we want to keep the accuracy.
% The error norm(A - U*Lambda*U') produced by Algorithm 7 can be larger than
% the error resulting from Algorithm 4.
Y = A*Omiga;
[Q,R] = qr(Y,0);
Q = Q(:,1:25);
B = Q'*Y*pinv(Q'*Omiga);
% We define B = Q'*A*Q; multiplying by Q'*Omiga on each side gives
% B*Q'*Omiga = Q'*A*Q*Q'*Omiga.  Since A*Q*Q' approx A, we have
% B*Q'*Omiga approx Q'*A*Omiga = Q'*Y.  (Step 5 in Algorithm 7.)
[V,D] = eig(B);
U = Q*V;
% Since B approx Q'*A*Q, we have A approx Q*B*Q' = (Q*V)*D*(Q*V)'.
>> Algorithm 8
>> Algorithm 9
function [B, Q, Apart] = RSVD_Encoder(A, Omiga)
[M,N] = size(A);
J = 20;                                  % number of partitions
[N,K] = size(Omiga{1});
for j = 1:J
    Apart{j} = A(j:J:M, :);              % j-th partition of the rows of A
    Y_tilde{j} = Apart{j}*Omiga{j};      % random projection of the partition
    [Q{j}, R{j}] = qr(Y_tilde{j}, 0);
    B{j} = Q{j}'*Apart{j};
end
>> Algorithm 10
function [A_TILDE, U_TILDE, S_TILDE, V_TILDE] = RSVD_Decoder(B, Q)
J = 20;
for j = 1:J
    A_TILDE{j} = Q{j}*B{j};              % reconstruct the j-th partition
end
A.2 Figures and Tables
>> Fig. 5.4
clear;
R = [];
for i = 1:1:100
    [A,b,x] = gravity(15*i, 2, 0, 1, .5);     % gravity test problem
    [Q,j] = ITERandrangefinder(A, 10);
    B = Q'*A; [U,S,V] = svd(B); UU = Q*U;
    R(i,1) = norm(UU*S*V' - A)/norm(A);       % relative approximation error
end
figure
x = 15:15:1500;
plot(x, R(1:100,1), 'r')
>> Fig. 5.5
clear; clc;
t = zeros(100,1);
for i = 1:1:100
    [A,b,x] = gravity(15*i, 2, 0, 1, .5);
    tic
    [Q,j] = ITERandrangefinder(A, 10);
    B = Q'*A; [U,S,V] = svd(B); UU = Q*U;
    t(i) = toc;                               % time for the randomized SVD
end
figure
x = 15:15:1500;
plot(x, t, '-*');
>> Fig. 5.7 and Fig. 5.8
function [overall_accuracy, each_accuracy] = CLASSIFICATION()
load TESTDATA_CLASSIFICATION.mat
P = 6;                                 % 6 classes
[m,n] = size(A);                       % mean(A) = 0, std(A) = 1
Omiga = randn(58,25);                  % if it works slowly, please change it to a smaller number
Y = A*Omiga;
[Q,R] = qr(Y,0);
B = Q'*A;
[U,S,V] = svd(B);
U = Q*U;
WW = S'*U';                            % projected data
% figure; imagesc(reshape(WW(1,:),307,1280));
% figure; imagesc(reshape(WW(2,:),307,1280));
% figure; imagesc(reshape(WW(3,:),307,1280));
% figure; imagesc(reshape(WW(4,:),307,1280));
% figure; imagesc(reshape(WW(5,:),307,1280));
% figure; imagesc(reshape(WW(6,:),307,1280));
% figure; imagesc(reshape(WW(7,:),307,1280));
% figure; imagesc(reshape(WW(8,:),307,1280));
K = WW(1:2,:)';
options = statset('MaxIter',500);
[IDX, C] = kmeans(K, P, 'options', options);      % k-means on the projected data
IDX = reshape(IDX, [320 360]);
imagesc(IDX);
figure
[IDX1, C1] = kmeans(A, P, 'options', options);    % k-means on the original data
IDX1 = reshape(IDX1, [320 360]);
imagesc(IDX1);
a(1) = sum(sum(IDX==1));  a(2) = sum(sum(IDX==2));  a(3) = sum(sum(IDX==3));
a(4) = sum(sum(IDX==4));  a(5) = sum(sum(IDX==5));  a(6) = sum(sum(IDX==6));
a = sort(a);
b(1) = sum(sum(IDX1==1)); b(2) = sum(sum(IDX1==2)); b(3) = sum(sum(IDX1==3));
b(4) = sum(sum(IDX1==4)); b(5) = sum(sum(IDX1==5)); b(6) = sum(sum(IDX1==6));
b = sort(b);
s = sum(abs(a-b));
overall_accuracy = 1 - s/(m*n);
for i = 1:P
    each_accuracy(i) = abs(a(i)-b(i))/b(i);
end
each_accuracy = diag(eye(P)) - each_accuracy';
>> Fig. 6.2
function all_reconstructions_compare_2()
% Compare the reconstruction performances of the different methods.
clc; clear
% Input:
load X.mat;
% X is the standard data.  The mean vector of X has been removed, i.e. E(X) = 0.
% Parameters:
[N, M] = size(X);
Ls = [3 3 3 3 3 3];          % number of eigenvectors used to form the approximated transform matrix
J = 20;                      % split the original matrix into J partitions
relative_dimensions = [0.1 0.15 0.2 0.3 0.4 0.5];
Ks = round(relative_dimensions * N);      % target ranks
projection_matrix_file = ['projections.' num2str(N) '.' num2str(J) '.mat'];
% Psi_PCA = PCA_Train(X);    % original transform matrix
X_original = X;
A = X_original';
A_original = A;
q = 5;
%% CPPCA
for index1 = 1:length(Ks)
    K = Ks(index1);
    L = Ls(index1);
    P{index1} = CPPCA_GenerateProjections(N, K, J, projection_matrix_file);
    % Generate the cell of projection matrices.  The length of the cell is J.
    [Y_tilde{index1}, X] = CPPCA_Encoder(X_original, P{index1});
    % Compress the data X to Y.  Here the outputs X and Y are cells.
    tic
    [X_check_CPPCA, Psi_CPPCA] = CPPCA_Decoder(Y_tilde{index1}, P{index1}, L);
    toc
    % X_check_CPPCA is the reconstruction of the cell X, and Psi_CPPCA is the
    % approximated transform matrix.  To get Psi_CPPCA, Rayleigh-Ritz theory
    % and convex-set optimization are used.
    D_CPPCA(index1) = SNR_Dataset(X, X_check_CPPCA, ...
        [Psi_CPPCA zeros(N, N - L)]);
end
% To see the SNR of X and X_check_CPPCA.
%% Randomized SVD directly
for index1 = 1:length(Ks)
    K = Ks(index1);
    Omiga{index1} = RSVD_Generaterandommatrices(N, K, J);
    % Generate the cell of Gaussian random matrices.  The length of the cell is J.
    [B{index1}, Q{index1}, A] = RSVD_Encoder(A_original, Omiga{index1});
    % Compress the data A to B and Q.  Here the outputs B, Q and A are cells.
    % B is the small matrix.  Q is the orthogonal matrix.
    tic
    A_check_RSVD = RSVD_Decoder(B{index1}, Q{index1});
    toc
    % A_check_RSVD is the reconstruction of the cell A.
    A1 = cell2mat(A');
    A2 = cell2mat(A_check_RSVD');
    % Change A and A_check_RSVD to A1 and A2, which have the same size as X and X_check_CPPCA.
    A1 = A1'; A2 = A2';
    D_RSVD(index1) = mean(SNR(A1, A2));
end
% To see the SNR of A1 and A2.
%% Randomized SVD with power iteration
% Randomized SVD with power iteration is used for matrices whose singular values decay gradually.
for index1 = 1:length(Ks)
    K = Ks(index1);
    Omiga{index1} = RSVD_Generaterandommatrices(N, K, J);
    % Generate the cell of Gaussian random matrices.  The length of the cell is J.
    [B_POWER{index1}, Q_POWER{index1}, A_POWER] = RSVD_Power_iteration_Encoder(A_original, q, Omiga{index1});
    % Compress the data A to B and Q.  Here the outputs B, Q and A are cells.
    % B is the small matrix.  Q is the orthogonal matrix.
    tic
    A_check_RSVD_POWER = RSVD_Decoder(B_POWER{index1}, Q_POWER{index1});
    toc
    % A_check_RSVD_POWER is the reconstruction of the cell A.
    A1 = cell2mat(A_POWER');
    A2 = cell2mat(A_check_RSVD_POWER');
    A1 = A1'; A2 = A2';
    D_RSVD_POWER(index1) = mean(SNR(A1, A2));
end
%%
D_CPPCA = D_CPPCA(3:length(D_CPPCA));
D_RSVD = D_RSVD(3:length(D_RSVD));
D_RSVD_POWER = D_RSVD_POWER(3:length(D_RSVD_POWER));
relative_dimensions = relative_dimensions(3:length(relative_dimensions));
figure(1); clf;
plot(relative_dimensions, D_CPPCA, 'LineWidth', 2); hold on
plot(relative_dimensions, D_RSVD, 'r', 'LineWidth', 2); hold on
plot(relative_dimensions, D_RSVD_POWER, 'g', 'LineWidth', 2);
grid on
xlabel('Relative subspace dimension, K/N');
ylabel('Average SNR (dB)');
legend('CPPCA', 'Randomized SVD', 'Randomized SVD POWER')
>> Fig. 6.3 and Table 6.1
% Compare the accuracy of the reconstruction of the first four columns of the
% transform matrix by rSVD and CPPCA for data whose singular values decay gradually.
% v1 = the angle between V(:,1) and the true first column of the transform matrix
% v2 = the angle between V(:,2) and the true second column of the transform matrix
% w1 = the angle between Psi_CPPCA(:,1) and the true first column of the transform matrix
% w2 = the angle between Psi_CPPCA(:,2) and the true second column of the transform matrix
clear;
load A.mat                     % mean(A) = 0
[M, N] = size(A);
L = 4;                         % number of eigenvectors used to form the approximated transform matrix
J = 20;                        % split the original matrix into J partitions
k = 25;                        % k is the target rank
X_original = A';
num_trials = 10;
[Psi, S] = PCA_Train(X_original);
for trial = 1:num_trials
    %% rsvd
    Omiga = randn(58,25);
    Y = A*Omiga;
    [Q,R] = qr(Y,0);
    B = Q'*A;
    [U,S,V] = svd(B);
    s1_rsvd = SAM(Psi(:,1), V(:,1));
    if (s1_rsvd > 90); s1_rsvd = 180 - s1_rsvd; end
    omega1_rsvd(trial) = s1_rsvd;
    s2_rsvd = SAM(Psi(:,2), V(:,2));
    if (s2_rsvd > 90); s2_rsvd = 180 - s2_rsvd; end
    omega2_rsvd(trial) = s2_rsvd;
    s3_rsvd = SAM(Psi(:,3), V(:,3));
    if (s3_rsvd > 90); s3_rsvd = 180 - s3_rsvd; end
    omega3_rsvd(trial) = s3_rsvd;
    s4_rsvd = SAM(Psi(:,4), V(:,4));
    if (s4_rsvd > 90); s4_rsvd = 180 - s4_rsvd; end
    omega4_rsvd(trial) = s4_rsvd;
    %% cppca
    P = CPPCA_GenerateProjections(N, k, J);
    [Y_tilde, X] = CPPCA_Encoder(X_original, P);
    [X_check_CPPCA, Psi_CPPCA] = CPPCA_Decoder(Y_tilde, P, L);
    s1_cppca = SAM(Psi(:,1), Psi_CPPCA(:,1));
    if (s1_cppca > 90); s1_cppca = 180 - s1_cppca; end
    omega1_cppca(trial) = s1_cppca;
    s2_cppca = SAM(Psi(:,2), Psi_CPPCA(:,2));
    if (s2_cppca > 90); s2_cppca = 180 - s2_cppca; end
    omega2_cppca(trial) = s2_cppca;
    s3_cppca = SAM(Psi(:,3), Psi_CPPCA(:,3));
    if (s3_cppca > 90); s3_cppca = 180 - s3_cppca; end
    omega3_cppca(trial) = s3_cppca;
    s4_cppca = SAM(Psi(:,4), Psi_CPPCA(:,4));
    if (s4_cppca > 90); s4_cppca = 180 - s4_cppca; end
    omega4_cppca(trial) = s4_cppca;
end
subplot(2,4,1); hist(omega1_rsvd(:), linspace(0,90,100));
A = axis; A(1) = -10; A(2) = 90; axis(A);
ylabel('Frequency'); xlabel('Angle v_1 (degrees)');
disp(['Average omega1_rsvd = ' num2str(mean(omega1_rsvd(:))) ' degrees']);
subplot(2,4,2); hist(omega2_rsvd(:), linspace(0,90,100));
A = axis; A(1) = -10; A(2) = 90; axis(A);
ylabel('Frequency'); xlabel('Angle v_2 (degrees)');
disp(['Average omega2_rsvd = ' num2str(mean(omega2_rsvd(:))) ' degrees']);
subplot(2,4,3); hist(omega3_rsvd(:), linspace(0,90,100));
A = axis; A(1) = -10; A(2) = 90; axis(A);
ylabel('Frequency'); xlabel('Angle v_3 (degrees)');
disp(['Average omega3_rsvd = ' num2str(mean(omega3_rsvd(:))) ' degrees']);
subplot(2,4,4); hist(omega4_rsvd(:), linspace(0,90,100));
A = axis; A(1) = -10; A(2) = 90; axis(A);
ylabel('Frequency'); xlabel('Angle v_4 (degrees)');
disp(['Average omega4_rsvd = ' num2str(mean(omega4_rsvd(:))) ' degrees']);
subplot(2,4,5); hist(omega1_cppca, linspace(0,90,100));
A = axis; A(1) = -10; A(2) = 90; axis(A);
ylabel('Frequency'); xlabel('Angle w_1 (degrees)');
disp(['Average omega1_cppca = ' num2str(mean(omega1_cppca(:))) ' degrees']);
subplot(2,4,6); hist(omega2_cppca(:), linspace(0,90,100));
A = axis; A(1) = -10; A(2) = 90; axis(A);
ylabel('Frequency'); xlabel('Angle w_2 (degrees)');
disp(['Average omega2_cppca = ' num2str(mean(omega2_cppca(:))) ' degrees']);
subplot(2,4,7); hist(omega3_cppca(:), linspace(0,90,100));
A = axis; A(1) = -10; A(2) = 90; axis(A);
ylabel('Frequency'); xlabel('Angle w_3 (degrees)');
disp(['Average omega3_cppca = ' num2str(mean(omega3_cppca(:))) ' degrees']);
subplot(2,4,8); hist(omega4_cppca(:), linspace(0,90,100));
A = axis; A(1) = -10; A(2) = 90; axis(A);
ylabel('Frequency'); xlabel('Angle w_4 (degrees)');
disp(['Average omega4_cppca = ' num2str(mean(omega4_cppca(:))) ' degrees']);
Vita
Jiani Zhang was born on February 18, 1987. She graduated with a Bachelor of Science degree in Mathematics from Beijing Institute of Technology, China, in July 2010. She then joined the Department of Mathematics at Wake Forest University and will receive an MA in Mathematics from Wake Forest in August 2012. She will continue her studies in Mathematics by pursuing a Ph.D. at Tufts University.