International Journal of Innovative Computing, Information and Control
ICIC International © 2005, ISSN 1349-4198, Volume x, Number 0x, x 2005, pp. 0-0
QUANTIZED PRINCIPAL COMPONENT ANALYSIS WITH APPLICATIONS TO
LOW-BANDWIDTH IMAGE COMPRESSION AND COMMUNICATION
D. Wooden
School of Electrical and Computer Engineering
Georgia Institute of Technology
Atlanta, GA 30332, USA
M. Egerstedt
School of Electrical and Computer Engineering
Georgia Institute of Technology
Atlanta, GA 30332, USA
B.K. Ghosh
Department of Electrical and Systems Engineering
Washington University in St. Louis
St. Louis, MO 63130, USA
Abstract. In this paper we show how Principal Component Analysis can be mapped
to a quantized domain in an optimal manner. In particular, given a low-bandwidth
communication channel over which a given set of data is to be transmitted, we show how
to best compress the data. Applications to image compression are described and examples
are provided that support the practical soundness of the proposed method.
Keywords: principal component analysis, quantization, image compression
1. Introduction. Principal Component Analysis (PCA) is an algebraic tool for com-
pressing large sets of statistical data in a structured manner. However, the reduction
results in real-valued descriptions of the data. In this paper, we take the compression
one step further by insisting on the use of only a finite number of bits for its repre-
sentation. This is necessary in a number of applications where the data is transmitted
over low-bandwidth communication channels. In particular, the inspiration for this work
came from the need for multiple mobile robots to share visual information about their
environment.
Assuming that the data x1, . . . , xN take on values in a d-dimensional space, one can
identify the d principal directions, coinciding with the eigenvectors of the covariance
matrix, given by
C = (1/N) Σ_{i=1}^N (x_i − m)(x_i − m)^T,   (1)
where m is the mean of the data.
If our goal is to compress the data set to a set of dimension n < d, we would pick
the n dominant directions, i.e. the directions of maximum variation of the data. This
results in an optimal (in the sense of least squared error) reduction of the dimension from
d to n. For example, if n = 0 then only the mean is used, while n = 1 corresponds to a
1-dimensional representation of the data. The fact that the reduction can be done in a
systematic and optimal manner has led to the widespread use of PCA in a number of
areas, ranging from process control [7], to weather prediction models [3], to image com-
pression [2]. In this paper, we focus on and draw inspiration from the image processing
problem in particular, even though the results are of a general nature.
2. Principal Component Analysis. Suppose we have a stochastic process with samples x_k ∈ R^d, k = 1, . . . , N, where N is the number of samples taken. Let

1. m = (1/N) Σ_{k=1}^N x_k be the mean of the data;
2. e_i ∈ R^d be the ith principal direction of the system, where i ∈ {1, . . . , d};
3. a_i ∈ R be the ith principal component, i.e.,

    a_i = e_i^T (x − m),

associated with the sample point x ∈ R^d. We can then reconstruct x perfectly from its principal components and the system's principal directions as

    x = m + Σ_{i=1}^d a_i e_i.   (2)
If we wish to reduce the system complexity from a d-dimensional data set to n dimensions,
only the n principal directions (corresponding to the n largest eigenvalues of the covariance
matrix) should be chosen.
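As a concrete illustration (our sketch, not part of the original development; the names pca, m, E are ours), the following minimal numpy snippet computes the mean, the principal directions, and the reconstruction of Eqs. (1)-(2):

    import numpy as np

    def pca(X):
        """Minimal sketch of Eqs. (1)-(2). X is N x d, one sample per row.

        Returns the mean m, the principal directions as the columns of E
        (sorted by decreasing eigenvalue), and the eigenvalues.
        """
        m = X.mean(axis=0)
        C = (X - m).T @ (X - m) / X.shape[0]   # covariance, Eq. (1)
        w, E = np.linalg.eigh(C)               # C is symmetric
        order = np.argsort(w)[::-1]            # dominant directions first
        return m, E[:, order], w[order]

    # Eq. (2): with a = E.T @ (x - m), we recover x = m + E @ a; keeping
    # only the first n columns of E gives the optimal rank-n approximation.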
The main contribution in this paper is not the problem of reducing the dimension of
the data set, but rather the problem of communicating the data. Given a number n ≤ d of
transmittable real numbers, the optimal choice for the reconstruction of x from these num-
bers is simply given by the n largest (in magnitude) principal components. But because
the ai’s are all real-valued, we are required to quantize them prior to communication, and
we wish then to transmit only the most significant quanta. In this paper we derive a
mapping of the PCA algorithm to a quantized counterpart.
3. Quantized Components. Let r ∈ N be the resolution of our system. For example,
if r = 10, then we are communicating decimal integers. If r = 16, then we are communi-
cating nibbles (i.e., half-bytes). Now, let K ∈ Z be the largest integer exponent of r such that

    max_{i∈{1,...,d}} |a_i| / r^K ∈ [1, r − 1].   (3)
With this definition of r and K, we are equipped to define the quantities by which we will decompose the principal components. We name the first quantity the quantized component,

    z_i = arg min_{ζ∈Z} |a_i − ζ|,   (4)

where Z = { r^K(−r + 1), r^K(−r + 2), . . . , r^K(r − 1) }. As a result, we have that

    0 ≤ |z_i| ≤ (r − 1) r^K.   (5)

In other words, z_i is obtained from the integer in the range [−r + 1, r − 1] which, when scaled by r^K, minimizes the distance to the principal component a_i.
The second quantity, called the remainder component, is simply defined as

    y_i = a_i − z_i,   (6)

and therefore,

    0 ≤ |y_i| < (1/2) r^K.   (7)

The remainder component is equivalent to the round-off error between z_i and a_i.
With these definitions, we define the quantized version of the original principal components to be a_i^Q := z_i, i = 1, . . . , d. And, in a manner similar to the reconstruction of x from its principal components, we can reconstruct a quantized version of x from its quantized principal components:

    x^Q = m + Σ_{i=1}^d a_i^Q e_i.   (8)
We presume that sufficient resources may be allocated during a start-up period in which
the transmitter and receiver can agree upon the real-valued mean and principal directions.
Thereafter, regular transmission of quantized components commences.
Now, the question remains, if we may only transmit one quantized component, which
one should we pick?
      a_i     r    K     z_i     y_i
    345.6    10    2     300    45.6
    345.6    24    1     336     9.6
   −984.1    10    3   −1000    15.9
   −924.1    10    2    −900   −24.1

Table 1. Example set of principal, quantized, and remainder components.
Problem 1. Identify the quantized component which minimizes the error between x^Q and x. In other words, solve

    arg min_{k∈{1,...,d}} ‖ ( m + Σ_{i=1}^d δ_{ik} a_i^Q e_i ) − x ‖²   (9)

where

    δ_{ik} = { 1, if i = k
             { 0, otherwise.
For the sake of clarity, we present Table 1 as an example of principal components and
their corresponding value of K, as well as their quantized and remainder components.
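To make these quantities concrete, here is a minimal numpy sketch (our code, not the authors') of Eqs. (3), (4) and (6). One subtlety: when the leading mantissa of max|a_i| rounds up past r − 1, as in the a_i = −984.1 row of Table 1, no integer K satisfies Eq. (3) exactly; the sketch then increments K by one, which reproduces the table.

    import numpy as np

    def quantize_components(a, r):
        """Sketch of Eqs. (3), (4), (6); returns (K, z, y) with a = z + y."""
        a = np.asarray(a, dtype=float)
        amax = np.max(np.abs(a))
        # Eq. (3): largest integer K with max|a_i| / r^K in [1, r-1].
        K = int(np.floor(np.log(amax) / np.log(r)))
        if round(amax / r**K) > r - 1:  # mantissa rounds past r-1 ...
            K += 1                      # ... e.g. a = -984.1, r = 10 gives K = 3
        scale = float(r) ** K
        # Eq. (4): nearest element of Z = {r^K(-r+1), ..., r^K(r-1)}.
        z = np.clip(np.round(a / scale), -(r - 1), r - 1) * scale
        y = a - z                       # Eq. (6): remainder components
        return K, z, y

    # Reproduces Table 1, e.g.:
    # quantize_components([345.6], 10)  -> (2, [300.], [45.6])
    # quantize_components([-984.1], 10) -> (3, [-1000.], [15.9])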
4. Main Result. Let S = {1, . . . , d} and S_z = { s ∈ S : |z_s| ≥ |z_m|, ∀m ∈ S }.

Theorem 4.1. If |S_z| = 1, i.e., ∃n ∈ S such that |z_n| > |z_m| ∀m ∈ S, m ≠ n, then n is the solution to Problem 1, i.e., z_n is the optimal component to transmit. (Here |S_z| denotes the cardinality of S_z.)
Proof. Define the cost function

    J(a^Q) = ‖ x^Q − x ‖²
           = ‖ (m + Σ_{i=1}^d a_i^Q e_i) − (m + Σ_{i=1}^d a_i e_i) ‖²
           = ‖ Σ_{i=1}^d e_i (a_i^Q − a_i) ‖²
           = Σ_{i=1}^d [ (a_i^Q)² − 2 a_i^Q a_i + a_i² ]        (the e_i are orthonormal)
           = Σ_{i=1}^d [ z_i² − 2 z_i (z_i + y_i) + a_i² ]
           = − Σ_{i=1}^d ( z_i² + 2 z_i y_i ) + Σ_{i=1}^d a_i².   (10)
Now, define a similar cost function

    J_k(a^Q) = ‖ m + Σ_{i=1}^d δ_{ik} a_i^Q e_i − x ‖².

Hence,

    J_k(a^Q) = Σ_{i=1}^d [ δ_{ik} ( (a_i^Q)² − 2 a_i^Q a_i ) + a_i² ]
             = (a_k^Q)² − 2 a_k^Q a_k + Σ_{i=1}^d a_i²
             = − ( z_k² + 2 z_k y_k ) + Σ_{i=1}^d a_i².   (11)
Taking n, m ∈ S, we may extend Eq. (11) to write

    J_n − J_m = − (z_n² + 2 z_n y_n) + (z_m² + 2 z_m y_m)   (12)
              = − z_n² − 2 |z_n||y_n| sgn(z_n) sgn(y_n) + z_m² + 2 |z_m||y_m| sgn(z_m) sgn(y_m),
where sgn(z_i) indicates the sign (+ or −) of z_i. Since sgn(z_i) sgn(y_i) ∈ {−1, +1}, we may write

    J_n − J_m ≤ − z_n² + 2 |z_n||y_n| + z_m² + 2 |z_m||y_m|.   (13)

Now, assume that |z_n| > |z_m|, which gives us

    |z_n| = |z_m| + α r^K,   (14)

where α ∈ Z⁺. Hence, we wish to show that J_n − J_m < 0. Substituting Eq. (14) into Eq. (13),

    J_n − J_m ≤ − (|z_m| + α r^K)² + 2 |y_n| (|z_m| + α r^K) + z_m² + 2 |z_m||y_m|.

Using Eq. (7), we conclude that

    J_n − J_m < − (|z_m| + α r^K)² + (|z_m| + α r^K) r^K + z_m² + |z_m| r^K
              = 2 |z_m| r^K − 2 |z_m| α r^K − r^{2K} (α² − α)
              = (1 − α) (2 |z_m| r^K + α r^{2K}).

And finally, recalling that α ≥ 1,

    J_n − J_m < 0,   (15)

and the theorem follows. ∎
Now, define S_= = { s ∈ S_z : sgn(z_s) = sgn(y_s) } and S_≠ = S_z \ S_=. Moreover, define S_y^+ = { s ∈ S_= : |y_s| ≥ |y_l|, ∀l ∈ S_= }. In other words, S_y^+ refers to the principal component(s) with the largest quantized and remainder components which also have equal signs.
Theorem 4.2. If |S_z| > 1 and |S_y^+| > 0, i.e., ∃n ∈ S_y^+ such that |y_n| ≥ |y_m|, |z_n| = |z_m| ∀m ∈ S_z and sgn(z_n) = sgn(y_n), then n is the solution to Problem 1, i.e., z_n is the optimal component to transmit.

Proof. By assumption, |S_z| > 1, and it is a direct consequence of the proof of Theorem 4.1 that we should choose between the elements of S_z for a quantized component to transmit. Let n ∈ S_y^+ and m ∈ S_z. Recalling Eq. (12),

    J_n − J_m = − z_n² − 2 z_n y_n + z_m² + 2 z_m y_m
              = − 2 z_n y_n + 2 z_m y_m
              = − 2 |z_n| ( |y_n| − |y_m| sgn(z_m) sgn(y_m) ).

It is true that either m ∈ S_≠ or m ∈ S_=. When m ∈ S_≠, sgn(z_m) sgn(y_m) = −1 and therefore

    J_n − J_m = − 2 |z_n| (|y_n| + |y_m|) < 0.   (16)

On the other hand, when m ∈ S_=, sgn(z_m) sgn(y_m) = +1 and

    J_n − J_m = − 2 |z_n| (|y_n| − |y_m|).   (17)

Note again that n ∈ S_y^+ (i.e., |y_n| ≥ |y_m|). Hence, Eq. (17) becomes

    J_n − J_m ≤ 0.   (18)

Furthermore, J_n − J_m = 0 only when |y_n| = |y_m| and |z_n| = |z_m|. In other words, |a_n| = |a_m|, and clearly the two costs are equal. ∎
Finally, define S_y^− = { s ∈ S_≠ : |y_s| ≤ |y_l|, ∀l ∈ S_≠ }.

Theorem 4.3. If |S_z| > 1 and |S_y^+| = 0, i.e., sgn(z_s) ≠ sgn(y_s) ∀s ∈ S_z, and ∃n ∈ S_y^− such that |z_n| = |z_m| and |y_n| ≤ |y_m| ∀m ∈ S_z, then n is the solution to Problem 1, i.e., z_n is the optimal component to transmit.
Proof. As was the case in Theorem 4.2, |S_z| > 1, but S_= and S_y^+ are empty (indicating that the signs of the quantized and remainder components differ for all those in S_z). We prove then that the optimal quantized component to transmit is the one with the largest |z_i| and the smallest |y_i|. Let n ∈ S_y^− and m ∈ S_z. Recalling again Eq. (12),

    J_n − J_m = − z_n² − 2 z_n y_n + z_m² + 2 z_m y_m
              = − 2 z_n y_n + 2 z_m y_m
              = 2 |z_n| (|y_n| − |y_m|)
              ≤ 0,

and the theorem follows. ∎
Theorem 4.1 tells us that we should transmit the quantized component largest in magnitude. If this is not unique, Theorem 4.2 tells us to send the z_i which also has the largest remainder component that "points" in the same direction as its quantized component (i.e., sgn(z_i) = sgn(y_i)). According to Theorem 4.3, if no such remainder component exists (i.e., all y_i "point" opposite of their quantized counterparts), then we send the z_i with the smallest remainder component.

When a unique largest (in magnitude) quantized component exists, it is a direct result of Eqs. (6) and (7) that it corresponds to the largest (in magnitude) principal component. In other words, Theorem 4.1 tells us to send the quantized component of the largest principal component. As a consequence of Eq. (6), when the remainder component has the same sign as the quantized component, the corresponding principal component is larger (again, in magnitude) than the quantized component. Moreover, given |z_i|, and given that z_i and y_i have matching signs, |a_i| is maximized by the largest possible value of |y_i|. Theorem 4.2 thus says: among the z_i largest in magnitude, transmit the one whose remainder component matches the sign of its quantized component and has the largest |y_i|. In other words, Theorem 4.2 tells us to transmit the z_i corresponding to the largest |a_i|.

Similarly, given |z_i| and mismatching signs of z_i and y_i, |a_i| is maximized by the smallest possible value of |y_i|. Theorem 4.3 thus says: among the z_i largest in magnitude, transmit the one whose remainder component mismatches the sign of its quantized component and has the smallest |y_i|. Again, this is the z_i corresponding to the largest |a_i|.

To summarize: Theorems 4.1, 4.2, and 4.3 tell us, given a set of principal, quantized, and remainder components, which quantized component should be transmitted. This optimal quantized component is always the one corresponding to the principal component largest in magnitude.
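Implemented directly, the three theorems amount to the following short selection routine (our sketch and naming; per the discussion above, it returns arg max_i |a_i| = arg max_i |z_i + y_i|):

    import numpy as np

    def optimal_index(z, y):
        """Select the component to transmit per Theorems 4.1-4.3."""
        z, y = np.asarray(z), np.asarray(y)
        # Theorem 4.1: restrict to the components with the largest |z_i|.
        S_z = np.flatnonzero(np.abs(z) == np.max(np.abs(z)))
        # Theorem 4.2: among ties, prefer sgn(z) = sgn(y), largest |y_i|.
        same = S_z[np.sign(z[S_z]) == np.sign(y[S_z])]
        if same.size > 0:
            return same[np.argmax(np.abs(y[same]))]
        # Theorem 4.3: otherwise take the smallest |y_i|.
        return S_z[np.argmin(np.abs(y[S_z]))]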
5. Iterative Algorithm. Naturally, in a practical application, we will want to send a succession of quantized components rather than just one, and so an iterative algorithm is necessary. Fortunately, the nature of our approach lends itself easily to this, and the algorithm is presented below:
Given a sample set of training data:
    Compute the sample mean and principal directions.
Given a new sample x and the system resolution r:
    Compute x's principal components a_i, i = 1, . . . , d.
    Set b_i = a_i.
    Compute K via Eq. (3).
    Compute the quantized components z_i via Eq. (4).
    Compute the remainder components y_i = b_i − z_i.
    For the number of desired transmissions:
        Determine the optimal quantized component z_opt via Theorems 4.1-4.3.
        Transmit z_opt.
        Set b_opt = y_opt.
        Recompute z_opt from b_opt via Eq. (4).
        Recompute K via Eq. (3).
    End
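A compact sender-side sketch of this loop, reusing quantize_components and optimal_index from the earlier sketches (since K in Eq. (3) depends on the maximum over all components, the sketch simply recomputes every z_i and y_i after each transmission, which is equivalent to the per-component recomputation listed above):

    import numpy as np

    def qpca_transmit(x, m, E, r, n_iters):
        """Yield (index, z_opt) pairs for n_iters transmissions.

        x : new sample; m, E : mean and principal directions (columns
        of E) agreed upon during the start-up period; r : resolution.
        """
        b = E.T @ (x - m)                        # principal components a_i
        K, z, y = quantize_components(b, r)      # Eqs. (3), (4), (6)
        for _ in range(n_iters):
            k = optimal_index(z, y)              # Theorems 4.1-4.3
            yield k, z[k]                        # transmit z_opt
            b[k] = y[k]                          # b_opt <- y_opt
            K, z, y = quantize_components(b, r)  # recompute K, z, y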
The simplicity of this algorithm is appealing. Moreover, at each iteration, there is little
to compute, as little changes from one loop to the next. Indeed the real burden of this
approach comes at the first step in the computation of the principal directions. In the
next section, we will discuss some of the complexity issues associated with this problem
as well as some methods for reducing it through a particular image compression example.
6. Image Compression Examples. We apply the proposed quantization scheme to
a problem in which images are to be compressed and transmitted over low-bandwidth
channels. Two separate data sets are used. The first set comprises 12 grayscale images, 352×352 pixels each, of very similar scenes (Figure 4). The second comprises 118 grayscale images, 64×96 pixels each, of very different scenes (Figure 5).
The images themselves are represented in two different ways. In the first method, as is common practice in image processing [4, 5], the images are broken into distinct 8×8 blocks. Principal components and directions are computed over these 64-pixel pieces, and
the quantization algorithm is applied to each block. At each additional iteration, one
more quantized component is transmitted per block of the image. In the second method,
principal directions and components are computed over the entire image, as a whole.
Under this method, only a single quantized component is communicated per iteration.
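For the first method, the 8×8 tiling might be done as in the following numpy sketch (to_blocks is our name; it assumes the image dimensions are multiples of 8). PCA is then run on the resulting 64-dimensional samples:

    import numpy as np

    def to_blocks(img, b=8):
        """Tile an H x W grayscale image into flattened b x b blocks,
        one row (64 pixels for b = 8) per block; b must divide H and W."""
        H, W = img.shape
        return (img.reshape(H // b, b, W // b, b)
                   .swapaxes(1, 2)
                   .reshape(-1, b * b))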
There are computational advantages and disadvantages associated with either method. In the first case, a greater number of transmissions is required for the same drop in error, but computing the principal directions is easy, and the memory required to hold them is small. In the second case, the mean-squared error drops off very quickly, but computing and maintaining the principal directions can be practically intractable.
In other words, the computational burden associated with computing larger and larger
principal directions can be justified by the lower number of zi’s needed to reconstruct the
image.
Figure 1 shows successive iterations of the algorithm on selected images from our two
data sets. Each pairing in the figure ( (a) with (b), (c) with (d), etc. ) shows two
progressions of an image, where the first progression is based on block representations
of the image, and the second is based on full-image representations. The value of r was
set to 16, meaning that at each iteration, 4 additional bits of information per block were
transmitted.
Figure 1. Progression of quantized images, as {Version (Image Set, Image Number): iterations shown}: (a) Block (1,4): 1, 3, 9, 27, 40, 100; (b) Whole (1,4): 1, 3, 8, 14, 20, 25; (c) Block (2,23): 1, 5, 10, 15, 30, 50, 80; (d) Whole (2,23): 1, 10, 24, 50, 90, 140, 200; (e) Block (2,80): 1, 5, 10, 15, 30, 50, 80; (f) Whole (2,80): 1, 10, 24, 50, 90, 140, 200; (g) Block (2,120): 1, 5, 10, 15, 30, 50, 80; (h) Whole (2,120): 1, 10, 24, 50, 90, 140, 200.
Note that the PCA algorithm and our Quantized PCA algorithm rely on the sender and receiver sharing knowledge of the data mean and the principal directions. Though these values may indeed not be stationary, it is possible to update them in an online fashion. This can be accomplished, for example, by using the Generalized Hebbian Algorithm [1, 6]. Hence, even in a changing environment, as for
mobile robots, a realistic model of the surroundings can be maintained. Note moreover
that whether the image sets are broken into blocks or not, the mean-squared error of any
image drops off at approximately a log-linear rate with respect to the number of quantized
components transmitted. In the following section, we formalize this observation and show
how to predict the linear slope.
7. Error Regression. With a small amount of a priori information, we show how it is
possible to predict the rate at which the log of the mean-squared error falls off. This is
useful because, given this rate, we may determine ahead of time how many transmissions
are needed. At each iteration of our algorithm, we transmit one quantized component.
As a result, the error between the true principal components a ∈ R^d and the receiver's reconstruction â is reduced along one of its d dimensions.
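On the receiver side, â_{i,j} is simply the running sum of the quanta received for index i, since each transmitted z further quantizes the previous remainder. A sketch (our names, pairing with the qpca_transmit sketch of Section 5):

    import numpy as np

    def qpca_receive(stream, m, E):
        """Fold received (index, z) pairs into a_hat and yield the
        reconstruction x_hat = m + E a_hat after each iteration."""
        a_hat = np.zeros(E.shape[1])
        for k, z_k in stream:
            a_hat[k] += z_k          # successive quanta of component k
            yield m + E @ a_hat      # x_hat after the jth iteration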
Per iteration, we can compute the error of our quantization and reconstruction as

    e_j = x − x̂_j = Σ_{i=1}^d (a_i − â_{i,j}) e_i,   (19)

where x = Σ_{i=1}^d a_i e_i + m, x̂_j = Σ_{i=1}^d â_{i,j} e_i + m, and â_{i,j} is the receiver's ith reconstructed principal component after the jth iteration. The mean-squared error at the jth iteration then is

    mse_j = (1/d) ‖ Σ_{i=1}^d (a_i − â_{i,j}) e_i ‖²   (20)
          = (1/d) Σ_{i=1}^d (a_i − â_{i,j})².   (21)
By our previous definitions, we note that a_i − â_{i,j} = y_{i,j}, i.e., the ith remainder component after the jth iteration, and

    mse_j = (1/d) Σ_{i=1}^d y_{i,j}².   (22)
More useful to us is the logarithm of the mean-squared error, which we will denote lmse_j. Taking the first difference of lmse_j gives

    Δlmse_j = lmse_j − lmse_{j−1}   (23)
            = ( log_r(1/d) + log_r Σ_{i=1}^d y_{i,j}² ) − ( log_r(1/d) + log_r Σ_{i=1}^d y_{i,j−1}² )
            = log_r( Σ_{i=1}^d y_{i,j}² / Σ_{i=1}^d y_{i,j−1}² ).   (24)
Now, define the total set of indexes L = {1, . . . , d}, and let L_0 be given by L_0 = { i ∈ L : a_i = 0 }. Clearly, these components never have any effect on the error, as y_{i,j} = 0 for all iterations j and ∀i ∈ L_0.
It is intuitively clear, and a direct result of Theorem 4.1, that the value of K never increases over iterations. (For example, see Figure 2.)

Figure 2. Example of the value of K over iterations (Image Set 2, Image 80, r = 16).

In fact, K is constant over blocks
of iterations. After an initial start-up period (e.g. after 41 iterations in Figure 2), the
length of these blocks will be the same on average. For image sets that are very similar,
these block lengths can be very short. For dissimilar image sets, they will be longer,
bounded above by d.
Over these blocks of iterations, yi,j will change value at most once. It is thus possible
to compute the total change in error over an entire block, rather than the change at each
iteration. The question then is, how can we characterize the lengths of these blocks, and
how much should we expect the lmse to change? If we let K_j denote the value of K at the jth iteration, and define M_κ as the set of iterations for which K_j equals some constant κ, we can define

    L_1^κ = { i ∈ L\L_0 : y_{i,j} = y_{i,k}, ∀j, k ∈ M_κ }   (25)

and

    L_2^κ = { i ∈ L\L_0 : y_{i,j} ≠ y_{i,k}, ∀j, k ∈ M_κ, j ≠ k }.   (26)
Then L_0 ∪ L_1^κ ∪ L_2^κ = L, and the three sets are pairwise disjoint. Naturally, L_0 does not need to be indexed by κ, as it is invariant over iterations.

We now make the assumption that, ∀j and ∀i ∉ L_0, the z_{i,j}'s are independently and identically distributed (with respect to both i and j), and that |z_{i,j}|/r^{K_j} is uniformly distributed over {0, . . . , r − 1}. With these definitions, we use Eq. (23) to construct an average change in log-mean-squared error. We compare the difference in MSE between n, the last iteration of M_κ, and m, the last iteration of M_{κ+1}. This is the average Δlmse over the M_κ interval:
    Δlmse_ave = log_r( ( Σ_{i∈L_0} y_{i,n}² + Σ_{i∈L_1^κ} y_{i,n}² + Σ_{i∈L_2^κ} y_{i,n}² ) / ( Σ_{i∈L_0} y_{i,m}² + Σ_{i∈L_1^{κ+1}} y_{i,m}² + Σ_{i∈L_2^{κ+1}} y_{i,m}² ) ).   (27)

Recalling that y_i = 0 ∀i ∈ L_0, this reduces to

    Δlmse_ave = log_r( ( Σ_{i∈L_1^κ} y_{i,n}² + Σ_{i∈L_2^κ} y_{i,n}² ) / ( Σ_{i∈L_1^{κ+1}} y_{i,m}² + Σ_{i∈L_2^{κ+1}} y_{i,m}² ) ).   (28)
We moreover have that |y_{i,j}| = γ_{i,j} r^{K_j}, where γ_{i,j} ∈ [0, 1/2). By definition, y_{i,j} is constant over M_κ, and hence y_{i,m} = y_{i,n} ∀i ∈ L_1^{κ+1}. In other words,

    y_{i,m} = { γ_{i,n} r^κ,      if i ∈ L_1^{κ+1}
              { γ_{i,m} r^{κ+1},  if i ∈ L_2^{κ+1}.

Similarly,

    y_{i,n} = { γ_{i,p} r^{κ−1},  if i ∈ L_1^κ
              { γ_{i,n} r^κ,      if i ∈ L_2^κ,

where p is the last iteration of M_{κ−1}. Hence,

    Δlmse_ave = log_r( ( Σ_{i∈L_1^κ} γ_{i,p}² r^{2(κ−1)} + Σ_{i∈L_2^κ} γ_{i,n}² r^{2κ} ) / ( Σ_{i∈L_1^{κ+1}} γ_{i,n}² r^{2κ} + Σ_{i∈L_2^{κ+1}} γ_{i,m}² r^{2(κ+1)} ) ).   (29)

The γ_{i,j}'s cancel out, which gives

    Δlmse_ave = log_r( ( Σ_{i∈L_1^κ} r^{2κ} r^{−2} + Σ_{i∈L_2^κ} r^{2κ} ) / ( Σ_{i∈L_1^{κ+1}} r^{2κ} + Σ_{i∈L_2^{κ+1}} r^{2κ} r² ) ).   (30)
To continue, we first draw some conclusions about the relative sizes of L_1^κ and L_2^κ. Obviously, there are r elements in the set {0, . . . , r − 1}. For all j ∈ M_κ and for all i ∉ L_0, the probability that z_{i,j} = 0 is 1/r. The existence of z_{i,j}'s that are zero over the interval M_κ implies that the corresponding remainder components at the end of the last interval block (i.e., y_{i,j} where j ∈ M_{κ+1}) are equal to the remainder components at the end of the current interval block (i.e., y_{i,j} where j ∈ M_κ). In other words, these remainder components do not change for a certain K_j, hence belong to L_1^κ, and Δlmse over this interval is not increased or decreased by QPCA as a result of these quantized components.

Another set of indexes also belongs to L_1^κ. These correspond to the quantized components which under the interval M_κ would have a z_i value of 1·r^κ, but actually will not be transmitted until the subsequent M_{κ−1} interval block. For example, consider the following simple three-dimensional system:
Suppose r = 10 and b = [2543, 1251, 931]. At this iteration, we calculate that K = 3. Thus, by Eq. (4), z = [3000, 1000, 1000]. We transmit, therefore, z_1 = 3000 (which in turn makes y_1 = −457). Now, according to Theorem 4.2, we transmit z_2 = 1000 (⇒ y_2 = 251). But we do not transmit z_3 = 1000 yet. Instead, K is re-evaluated to K = 2, and we transition from M_3 to M_2. Now, z = [−500, 300, 900] and we transmit z_3 = 900.
From this example, we see that some of the z_{i,j}'s in an interval M_κ that are equal to r^κ should not be transmitted until we enter M_{κ−1}. Those z_i's which are transmitted are those which in the previous interval M_{κ+1} had a corresponding remainder component |y_{i,j}| ≥ r^κ. Given the assumption that y_{i,j} and z_{i,j} are uniformly distributed, this constitutes half of the z_{i,j}'s in M_κ with |z_{i,j}| = 1·r^κ. Those that are not transmitted in the M_κ interval had a remainder component |y_{i,j}| < r^κ. Consequently, half of the expected fraction 1/r of z_{i,j}'s in M_κ that are equal to r^κ are not transmitted in the M_κ interval.
In other words, the set of indexes L_1^κ is a fraction of the total number of available indexes from L\L_0, namely 1/r + 0.5/r = 1.5/r, and L_2 makes up the remaining fraction, 1 − 1.5/r. By definition, |L_1^κ| + |L_2^κ| is invariant over κ, and so we define the length ratio of L_i as

    ρ_i = |L_i| / ( |L_1| + |L_2| ).   (31)

Specifically,

    ρ_1 = 1.5/r,   (32)
    ρ_2 = 1 − 1.5/r.   (33)
We plug this back into Eq. (30):

    Δlmse_ave = log_r( (ρ_1 r^{−2} + ρ_2) / (ρ_1 + ρ_2 r²) ) = log_r( r^{−2} ) = −2.   (34)
In other words, Δlmse_ave drops by 2 over each block of iterations M_κ, whose length is |M_κ| = |L_2^κ|.
For our examples, we used images with d = 3136 and d = 6144 for image sets 1 and 2, respectively. The numbers of sample images used to compute the mean and principal directions were 12 and 118, respectively. Consequently, the spaces spanned by our two image sets were 11- and 117-dimensional, which is equal to |L_1| + |L_2|. For example, with r = 10, the average length of L_2 for the first image set was

    11 ρ_2 = 11 (1 − 1.5/r) = 9.35,

and for image set 2 it was

    117 ρ_2 = 117 (1 − 1.5/r) = 99.45.
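As a quick arithmetic check of Eqs. (33)-(34) (a sketch; average_block_length is our name, not the paper's):

    def average_block_length(span_dim, r):
        """Average |L_2| from Eq. (33): the expected number of iterations
        over which log_r(MSE) drops by 2, per Eq. (34)."""
        return span_dim * (1 - 1.5 / r)

    print(average_block_length(11, 10))   # 9.35  (image set 1)
    print(average_block_length(117, 10))  # 99.45 (image set 2)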
We computed the slope of the error from our experimental results for image sets 1 and 2, with r equal to 10, 24, and 128, as shown in Figure 3. Note that the experimental results coincide with the computed results to a very high degree; in fact, the predicted slope of the error has a percentage error of less than 0.02.
Several important facts should be noted. First, the length of L_2^κ deviates from the average in the early "start-up" region of the algorithm, and hence so does the slope; in the start-up region the error drops off much faster, as the lengths of the M_κ blocks are much shorter there. Second, in general, image samples will not span such a small subspace of the original d-dimensional space; consequently, the average length of L_2 grows, and the error drops off at a lower rate. Third, as can be seen in Figure 3, the lmse does not drop off at a constant rate, but has "humps". These "humps" recur with a period equal to |L_2^κ|.
8. Conclusions. We have proposed a method for transmitting quantized versions of the
principal components associated with a given data set in an optimal manner. This method
was shown to be applicable to the problem of image compression and transmission over
low-bandwidth communication channels. We also showed how the progression of the error
per iteration of our algorithm can be accurately predicted, based on the parameters of the algorithm.

Figure 3. Measured and predicted error (log_r(MSE) versus iteration, with the predicted slope overlaid) for QPCA on (a) Image Set 1 (IS1), r = 10; (b) IS1, r = 24; (c) IS1, r = 128; (d) IS2, r = 10; (e) IS2, r = 24; (f) IS2, r = 128.
REFERENCES

[1] L. Chen and S. Chang, An adaptive learning algorithm for principal component analysis, IEEE Transactions on Neural Networks, vol. 6, no. 5, pp. 1255-1263, 1995.
[2] X. Du, B. K. Ghosh and P. Ulinski, Decoding the position of a visual stimulus from the cortical waves of turtles, Proceedings of the 2003 American Control Conference, vol. 1, pp. 477-482, 2003.
[3] R. Duda, P. Hart and D. Stork, Pattern Classification, John Wiley and Sons, New York, 2001.
[4] M. Kunt, Block coding of graphics: A tutorial review, Proceedings of the IEEE, vol. 68, no. 7, pp. 770-786, 1980.
[5] M. Marcellin, M. Gormish, A. Bilgin and M. Boliek, An overview of JPEG-2000, Proceedings of the IEEE Data Compression Conference, pp. 523-541, 2000.
[6] T. D. Sanger, Optimal unsupervised learning in a single-layer linear feed-forward neural network, Neural Networks, vol. 2, no. 6, pp. 459-473, 1989.
[7] C. Undey and A. Cinar, Statistical monitoring of multistage, multiphase batch processes, IEEE Control Systems, vol. 22, no. 5, pp. 40-52, 2002.
Figure 4. Image Set 1.
Figure 5. Image Set 2.