

Hindawi Publishing Corporation
EURASIP Journal on Advances in Signal Processing
Volume 2007, Article ID 24106, 11 pages
doi:10.1155/2007/24106

Research Article
Compression at the Source for Digital Camcorders

Nir Maor,1 Arie Feuer,1 and Graham C. Goodwin2

1 Department of Electrical Engineering (EE), Technion-Israel Institute of Technology, Haifa 32000, Israel
2 School of Electrical Engineering and Computer Science, University of Newcastle, Callaghan NSW 2308, Australia

Received 2 October 2006; Accepted 30 March 2007

Recommended by Russell C. Hardie

Typical sensors (CCD or CMOS) used in home digital camcorders have the potential of generating high definition (HD) video sequences. However, the data read out rate is a bottleneck which, invariably, forces significant quality deterioration in recorded video clips. This paper describes a novel technology for achieving a better utilization of sensor capability, resulting in HD quality video clips with essentially the same hardware. The technology is based on the use of a particular type of nonuniform sampling strategy. This strategy combines infrequent high spatial resolution frames with more frequent low resolution frames. This combination allows the data rate constraint to be met while retaining an HD quality output. Post processing via filter banks is used to combine the high and low spatial resolution frames to produce the HD quality output. The paper provides full details of the reconstruction algorithm as well as proofs of all key supporting theory.

Copyright © 2007 Nir Maor et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

1. INTRODUCTION: CURRENT TECHNOLOGY

In many digital systems, one faces the problem of having a source which generates data at a rate higher than that which can be transmitted over an associated communication channel. As a consequence, some means of compression are required at the source. A specific case of this problem arises in the current technology of digital home camcorders.

Figure 1 shows a schematic diagram of a typical digital camcorder. Images are captured on a two-dimensional sensor array (either CCD or CMOS). Each sensor in the array gives a spatial sample of the continuous image, a pixel, and the whole array gives a temporal sample of the time-varying image, a frame. The result is a 3D sampling process of a signal having two spatial dimensions and one temporal dimension. A hard constraint on the spatial resolution in each frame is determined by the number of sensors in the sensor array, while a hard constraint on the frame rate is determined by the minimum exposure time required by the sensor technology. The result is a uniformly sampled digital video sequence which perfectly captures a time-varying image whose spectrum is bandlimited to a box as shown in Figure 2. We note that the cross-sectional area of the box depends on the spatial resolution constraint, while the other dimension of the box depends on the maximal frame rate. As it turns out, the spectrum of typical time-varying images can reasonably be assumed to be contained in this box. Thus, the sensor technology of most home digital camcorders can, in principle, generate video of high quality (high definition).

However, in current sensor technology, there is a third hard constraint, namely, the rate at which the data from the sensor can be read. This turns out to be the dominant constraint since this rate is typically lower than the rate of data generated by the sensor. Thus, downsampling (either spatially and/or temporally) is necessary to meet the read out constraint. In current technology, this downsampling is done uniformly. The result is a uniformly sampled digital video sequence (see Figure 1) which can perfectly capture only those scenes which have a spectrum limited to a box of considerably smaller dimensions. This is illustrated in Figure 3 where the box in dashed lines represents the full sensor capability and the solid box represents the reduced capability resulting from the use of uniform downsampling. The end result is quite often unsatisfactory due to the associated spatial and/or temporal information loss.

With the above as background, the question addressed in the current paper is whether a different compression mechanism can be utilized, which will lead to significantly less information loss. We show, in the sequel, that the technology we present achieves this goal. An important point is that the new mechanism does not require a new sensor array, but, instead, achieves a better utilization of existing capabilities. The idea thus yields resolution gains without requiring major hardware modification. (The core idea is the subject of


Figure 1: Schematic diagram of a typical digital camcorder (blocks: TG, image sensor, image processing pipeline, memory, display (TV or LCD); input: time-varying scene; output: uniformly sampled video sequence).

Figure 2: Sensor array potential capacity (a box in (ωx, ωy, ωt) space whose cross-section is proportional to the maximal sensor resolution and whose ωt extent is proportional to the maximal frame rate).

a recent patent application by a subset of the current authors [1].)

Previous relevant work includes the work done by Shechtman et al. [2]. In the latter paper, the authors use a number of camcorders recording the same scene to overcome single camcorder limitations. Some of these camcorders have high spatial resolution but slow frame rate and others have reduced spatial resolution but high frame rate. The resulting data is fused to generate a single high quality video sequence (with high spatial resolution and fast frame rate). This approach has limited practical use because of the use of multiple camcorders. Also, the idea involves some technical difficulties such as the need to perform registration of the data from the different camcorders. The idea described in the current paper avoids these difficulties.

The layout of the remainder of the paper is as follows: in Section 2 we describe the spectral properties of typical video clips. This provides the basis for our approach as presented in Section 3. Note that we describe our approach both heuristically and formally. In Section 4 we present experimental

Figure 3: Digital camcorder actual capacity (the actual data rate transferred, shown in (ωx, ωy, ωt) space).

Figure 4: 3D spectrum |I(ωx, 0, ωt)| of a typical video clip (shows only the ωt and ωx axes).

results using our approach. Finally, in Section 5 we provide conclusions.

2. VIDEO SPECTRAL PROPERTIES

The technology that we present here is based on the premise that the data read out rate, or equivalently, the volume in Figure 3, is a hard constraint. We deal with this constraint by modifying the downsampling scheme (data compression) so as to better fit the characteristics of the data. We do this by appropriate use of nonuniform sampling so as to avoid the redundancy inherent in uniform sampling. Background to this idea is contained in [3] which discusses the potential redundancy frequently associated with uniform sampling (see also [4]).

To support our idea we have conducted a thorough study of the spectral properties of over 150 typical video clips. To illustrate our findings, we show the spectrum of one of these clips in Figure 4. (We show only one of the spatial frequency axes, with similar results for the second spatial frequency.) We note in this figure that the spectral energy is concentrated


Figure 5: Spectral support shape of video clips (in (ωx, ωy, ωt) space).

around the spatial frequency plane and the temporal frequency axis. This characteristic is common to all clips studied. We will see in the sequel that this observation is, indeed, the cornerstone of our method.

To further support our key observation, we passed a large number of video clips through three ideal lowpass filters having spectral support of a fixed volume but different shapes. The first and second filters had a box-like support representing either uniform spatial or temporal decimation. A third filter had the more intricate shape shown in Figure 5. The outputs of these filters were compared both quantitatively (using PSNR) and qualitatively (by viewing the video clips) to the original input clip. On average, the third filter produced a 10 dB advantage over the other two. In all cases examined, the qualitative comparisons were even more favorable than suggested by the quantitative comparison. Full details of the study are presented in [5]. Our technology (as will be shown in the sequel) can accommodate more intricate shapes which may constitute a better fit to actual spectral supports. Indeed, we are currently experimenting with the dimensions and shape of the filter support as illustrated in Figure 5 to better fit the "foot print" of typical video spectra (see also Remark 2).
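The quantitative comparison above relies on PSNR. As a minimal sketch (our own, not from the paper), the metric can be computed as follows; the `peak` default of 255 assumes 8-bit pixel values:

```python
import numpy as np

def psnr(reference, test, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized arrays."""
    mse = np.mean((np.asarray(reference, float) - np.asarray(test, float)) ** 2)
    if mse == 0:
        return float("inf")  # identical signals: infinite PSNR
    return 10.0 * np.log10(peak ** 2 / mse)
```

A frame-wise average of this quantity over a clip is one way the roughly 10 dB gap reported above could be measured.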

3. NONUNIFORMLY SAMPLED VIDEO

3.1. Heuristic explanation of the sampling strategy

By examining the spectral properties of typical video clips, as described in the previous section, we have observed that there is hardly any information which has simultaneously both high spatial and high temporal frequencies. This observation leads to the intuitive idea of interweaving a combination of two sequences: one of high spatial resolution but slow frame rate and a second with low spatial resolution but high frame rate. The result is a nonuniformly sampled video sequence as schematically depicted in Figure 6. Note that there is a time gap inserted following each of the high resolution frames since these frames require more time to be read out (see Remark 3). In the remainder of the paper, we will formally prove that sampling schemes of the type shown in Figure 6 do indeed allow perfect reconstruction of

Figure 6: Nonuniformly sampled video sequence (frame type A: full spatial resolution, pitch Δx × Δy; frame type B: reduced resolution, pitch (2N+1)Δx × (2N+1)Δy; temporal spacings along t involve Δt, (2M1+1)Δt, and (2M2+1)Δt).

signals which have a frequency domain "footprint" of the type shown in Figure 5.

3.2. Perfect reconstruction from nonuniformly sampled data

To develop the associated theoretical results, we will utilize ideas related to sampling lattices. For background on these concepts the reader is referred to [3, 6, 7]. A central tool in our discussion will be the multidimensional generalized sampling expansion (GSE). For completeness, we have included a brief (without proofs) exposition of the GSE in Appendix A. A more detailed discussion can be found in, for example, [3, 8, 9].

In the sequel, we will first demonstrate that the sampling pattern used (see Figure 6) is a form of recurrent sampling. We will then employ the GSE tool. In particular, we will utilize the idea that perfect reconstruction from a recurrent sampling is possible if the sampling pattern and the signal spectral support are such that the resulting matrix H (see (A.24) in Appendix A) is nonsingular. Specifically, we will show that the sampling pattern in Figure 6 allows perfect reconstruction of signals having spectral support as in Figure 5.

To simplify the presentation we will consider only the case where one of the spatial dimensions is affected while the other spatial dimension is untouched during the process (namely, it has the full available spatial resolution of the sensor array). The extension to the more general case is straightforward but involves more complex notation. Thus, we will examine a sampling pattern of the type shown in Figure 7. Furthermore, to simplify notation, we let $z = [x,\; t]^T$. We also use Δx and Δt to represent full spatial and temporal resolution. However, we note that the use of these sampling intervals in a uniform pattern would lead to a data rate that could not be read off the sensor array.


Figure 7: The nonuniform sampling pattern considered (samples in the (x, t) plane; Δx and Δt denote the full-resolution spacings).

Also, to make the presentation easier, we will ignore the extra time interval after the high resolution frames shown in Figure 6. (See also Remark 3.) More formally, we consider the sampling lattice

$$\mathrm{LAT}(T) = \left\{ Tn : n \in \mathbb{Z}^2 \right\}, \tag{1}$$

where

$$T = \begin{bmatrix} (2L+1)\Delta x & 0 \\ 0 & (2M+1)\Delta t \end{bmatrix}. \tag{2}$$

In each unit cell of this lattice we add 2(L + M) samples

$$\left\{ \begin{bmatrix} \ell \Delta x \\ 0 \end{bmatrix} \right\}_{\ell=1}^{2L} \cup \left\{ \begin{bmatrix} 0 \\ m \Delta t \end{bmatrix} \right\}_{m=1}^{2M} \tag{3}$$

to obtain the sampling pattern

$$\Psi = \bigcup_{q=1}^{2(L+M)+1} \left\{ \mathrm{LAT}(T) + z_q \right\}, \tag{4}$$

where

$$z_q = \begin{cases} 0, & q = 1, \\[1ex] \begin{bmatrix} (q-1)\Delta x \\ 0 \end{bmatrix}, & q = 2, \ldots, 2L+1, \\[2ex] \begin{bmatrix} 0 \\ (q-2L-1)\Delta t \end{bmatrix}, & q = 2(L+1), \ldots, 2(L+M)+1. \end{cases} \tag{5}$$
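As an illustrative sketch (our own code, not from the paper, with Δx = Δt = 1), the unit-cell pattern of (3)-(5) can be enumerated with NumPy; the helper name `unit_cell_mask` is ours:

```python
import numpy as np

def unit_cell_mask(L, M):
    """Boolean mask of one (2L+1) x (2M+1) unit cell (rows: x, cols: t).

    The frame at t = 0 is read at full spatial resolution; every other
    frame in the cell keeps only the line x = 0, i.e. every (2L+1)-th
    line of the full sensor."""
    mask = np.zeros((2 * L + 1, 2 * M + 1), dtype=bool)
    mask[:, 0] = True   # full-resolution frame: all x at t = 0
    mask[0, 1:] = True  # low-resolution frames: only x = 0 for t = 1..2M
    return mask

cell = unit_cell_mask(2, 3)           # the L = 2, M = 3 case discussed below
samples_per_cell = int(cell.sum())    # = 2(L+M)+1 = 11 of the 35 cell points
```

Of the 35 points in a 5 × 7 cell only 11 are read, matching the count 2(L + M) + 1 of (3)-(5).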

As shown in Appendix A, this constitutes a recurrent sampling pattern. Moreover, we readily observe that this is exactly the sampling pattern portrayed in Figure 7 (for L = 2 and M = 3). (Note that, for these values, every seventh frame has full resolution while, in the low resolution frames, only every fifth line is read.) The unit cell we consider for the reciprocal lattice, $\mathrm{LAT}(2\pi T^{-T}) = \{2\pi T^{-T} n : n \in \mathbb{Z}^2\}$, is

$$\mathrm{UC}\left(2\pi T^{-T}\right) = \left\{ \omega : \left|\omega_x\right| < \frac{\pi}{(2L+1)\Delta x},\; \left|\omega_t\right| < \frac{\pi}{(2M+1)\Delta t} \right\}. \tag{6}$$

Figure 8: Unit cell of the reciprocal lattice (the cell of (6), shown inside the dashed box |ωx| < π/Δx, |ωt| < π/Δt).

This is illustrated in Figure 8. (Note that the dashed box in the figure represents the sensor data generation capacity which, as previously noted, exceeds the data transmission capability.) We next construct the set

$$S = \bigcup_{p=1}^{2(L+M)+1} \left\{ \mathrm{UC}\left(2\pi T^{-T}\right) + c_p \right\}, \tag{7}$$

where

$$c_p = \begin{cases} 0, & p = 1, \\[1ex] \begin{bmatrix} \dfrac{2\pi(p-1)}{(2L+1)\Delta x} \\ 0 \end{bmatrix}, & p = 2, \ldots, L+1, \\[3ex] \begin{bmatrix} \dfrac{2\pi(L+1-p)}{(2L+1)\Delta x} \\ 0 \end{bmatrix}, & p = L+2, \ldots, 2L+1, \\[3ex] \begin{bmatrix} 0 \\ \dfrac{2\pi(p-2L-1)}{(2M+1)\Delta t} \end{bmatrix}, & p = 2L+2, \ldots, 2L+M+1, \\[3ex] \begin{bmatrix} 0 \\ \dfrac{2\pi(2L+M+1-p)}{(2M+1)\Delta t} \end{bmatrix}, & p = 2L+M+2, \ldots, 2(L+M)+1. \end{cases} \tag{8}$$

This set is illustrated in Figure 9. We observe that the set in Figure 9 is, in fact, a cross-section of the set in Figure 5 (at ωy = 0). We are now ready to state our main technical result.

Theorem 1. Let I(z) be a signal bandlimited to the set S as given in (7) and let this signal be sampled on Ψ as given in (4). Then, I(z) can be perfectly reconstructed from the sampled data {I(z)}z∈Ψ.

Proof. See Appendix B.

Theorem 1 establishes our key claim, namely, that perfect reconstruction is indeed possible using the proposed nonuniform sampling pattern. We next give an explicit form for the reconstruction.

Theorem 2. With assumptions as in Theorem 1, perfect signal reconstruction can be achieved using

$$I(z) = \sum_{n \in \mathbb{Z}^2} \sum_{q=1}^{2(L+M)+1} I\left(Tn + z_q\right) \varphi_q(z - Tn), \tag{9}$$

where ϕq(z) has the form

$$\varphi_1(z) = \left[ \left( \frac{1}{2L+1} + \frac{1}{2M+1} - 1 \right) + \frac{1}{2L+1}\, \frac{2 \sin\!\left( \frac{\pi L x}{(2L+1)\Delta x} \right) \cos\!\left( \frac{\pi (L+1) x}{(2L+1)\Delta x} \right)}{\sin\!\left( \frac{\pi x}{(2L+1)\Delta x} \right)} + \frac{1}{2M+1}\, \frac{2 \sin\!\left( \frac{\pi M t}{(2M+1)\Delta t} \right) \cos\!\left( \frac{\pi (M+1) t}{(2M+1)\Delta t} \right)}{\sin\!\left( \frac{\pi t}{(2M+1)\Delta t} \right)} \right] \times \frac{\sin\!\left( \frac{\pi x}{(2L+1)\Delta x} \right)}{\frac{\pi x}{(2L+1)\Delta x}}\, \frac{\sin\!\left( \frac{\pi t}{(2M+1)\Delta t} \right)}{\frac{\pi t}{(2M+1)\Delta t}},$$

$$\varphi_q(z) = \begin{cases} \dfrac{\sin\!\left( \frac{\pi}{\Delta x}\left( x - (q-1)\Delta x \right) \right)}{\frac{\pi}{\Delta x}\left( x - (q-1)\Delta x \right)}\, \dfrac{\sin\!\left( \frac{\pi t}{(2M+1)\Delta t} \right)}{\frac{\pi t}{(2M+1)\Delta t}}, & q = 2, \ldots, 2L+1, \\[3ex] \dfrac{\sin\!\left( \frac{\pi x}{(2L+1)\Delta x} \right)}{\frac{\pi x}{(2L+1)\Delta x}}\, \dfrac{\sin\!\left( \frac{\pi}{\Delta t}\left( t - (q-2L-1)\Delta t \right) \right)}{\frac{\pi}{\Delta t}\left( t - (q-2L-1)\Delta t \right)}, & q = 2(L+1), \ldots, 2(L+M)+1. \end{cases} \tag{10}$$

Proof. See Appendix C.

Figure 9: The spectrum covering set S (a cross-shaped region in the (ωx, ωt) plane within |ωx| < π/Δx, |ωt| < π/Δt).

The reconstruction formula (9) can be rewritten as

$$\begin{aligned} I(z) &= \sum_{n \in \mathbb{Z}^2} \sum_{q=1}^{2(L+M)+1} I_q(Tn)\, \varphi_q(z - Tn) \\ &= \sum_{q=1}^{2(L+M)+1} \left[ I_q(z) \sum_{n \in \mathbb{Z}^2} \delta(z - Tn) \right] * \varphi_q(z) \\ &= \sum_{q=1}^{2(L+M)+1} I_q^d(z) * \varphi_q(z), \end{aligned} \tag{11}$$

where $I_q(z) = I(z + z_q)$ is the original continuous signal shifted by $z_q$ and $I_q^d(z) = I_q(z) \sum_{n \in \mathbb{Z}^2} \delta(z - Tn)$ is its (impulse) sampled version (on the lattice LAT(T)). We see that the reconstruction can be viewed as having 2(L + M) + 1 signals, each passing through a filter with impulse response ϕq(z). The outputs of these filters are then summed to produce the final result. A further embellishment is possible on noting that, in practice, we are usually not interested in achieving a continuous final result of the type described above. Rather, we wish to generate a high resolution, uniformly sampled video clip which can be fed into a digital high resolution display device. Specifically, say that we are interested in obtaining samples on the lattice $\{I(T_1 m)\}_{m \in \mathbb{Z}^2}$, where

$$T_1 = \begin{bmatrix} \Delta x & 0 \\ 0 & \Delta t \end{bmatrix}. \tag{12}$$

Note that LAT(T) is a strict subset of LAT(T1). Thus, our goal is to convert the nonuniformly sampled data to (high resolution) uniformly sampled data. This can be directly achieved by utilizing (11). Specifically, from (11), we obtain

$$I\left(T_1 m\right) = \sum_{q=1}^{2(L+M)+1} \sum_{k \in \mathbb{Z}^2} I_q\left(T_1 k\right)\, \varphi_q\left(T_1(m-k)\right) \tag{13}$$


Figure 10: The reconstruction process (each channel Iq(Tn) is upsampled (↑R) and filtered by ϕq; the filter outputs are summed).

Figure 11: Alternative spectral covering set.

which is the discrete equivalent of (11). Iq(T1k) denotes the zero-padded (interpolated) version of Iq(Tn). (This is easily seen since LAT(T) ⊂ LAT(T1).) This process is illustrated in Figure 10.
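The zero-padding step can be sketched in one dimension as follows; this is an illustrative fragment (function name and NumPy usage are ours), with the reconstruction filters ϕq left abstract:

```python
import numpy as np

def zero_pad_upsample(samples, factor):
    """Insert factor-1 zeros between consecutive samples: the one-dimensional
    analog of embedding LAT(T) data into the finer lattice LAT(T1)."""
    samples = np.asarray(samples, dtype=float)
    out = np.zeros(len(samples) * factor)
    out[::factor] = samples  # original samples land on every factor-th slot
    return out

# Each channel would then be filtered, e.g. with
#   np.convolve(zero_pad_upsample(iq, R), phi_q_taps, mode="same"),
# and the channel outputs summed, as in Figure 10.
```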

Remark 1. The compression achieved by the use of the specific sampling patterns and spectral covering sets described above is given by

$$\alpha = \frac{2(L+M)+1}{(2L+1)(2M+1)}. \tag{14}$$
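A small sketch of (14) in exact rational arithmetic (our own helper, not from the paper):

```python
from fractions import Fraction

def compression_ratio(L, M):
    """alpha = (2(L+M)+1) / ((2L+1)(2M+1)) from Remark 1:
    samples kept per unit cell over total cell points."""
    return Fraction(2 * (L + M) + 1, (2 * L + 1) * (2 * M + 1))
```

For example, the L = M = 2 setting used in Section 4 gives α = 9/25.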

Remark 2. Heuristically, we could achieve even greater compression by using more detailed information about typical video spectral "footprints." Ongoing work is aimed at finding nonuniform sampling patterns which apply to more general spectral covering sets. Figure 11 illustrates a possible spectral "footprint."

Remark 3. As noted earlier, we have assumed in the above development that a fixed time interval is used between all frames. This was done for ease of exposition. A parallel derivation for the case when these intervals are not equal (as in Figure 6) has been carried out and is presented in [5]. Indeed, this more general scheme was used in all our experiments.

Figure 12: Frames from original clip.

4. EXAMPLE

To illustrate the potential benefits of our approach we chose a clip consisting of 320 by 240 pixels per frame at 30 frames per second; Figure 12 shows six frames out of this clip. The pixel rate of this clip is 2,304,000 pixels per second. We create a nonuniformly sampled sequence by using the methods described in Section 3 with L = M = 2. The resulting compression ratio is (see (14)) 9/25. The reconstruction functions in this case are

$$\varphi_1(z) = 5 \left[ -3 + \frac{2 \sin\frac{2\pi x}{5\Delta x} \cos\frac{3\pi x}{5\Delta x}}{\sin\frac{\pi x}{5\Delta x}} + \frac{2 \sin\frac{2\pi t}{5\Delta t} \cos\frac{3\pi t}{5\Delta t}}{\sin\frac{\pi t}{5\Delta t}} \right] \times \frac{\sin\frac{\pi x}{5\Delta x}}{\frac{\pi x}{\Delta x}}\, \frac{\sin\frac{\pi t}{5\Delta t}}{\frac{\pi t}{\Delta t}},$$

$$\varphi_q(z) = \frac{\sin\frac{\pi\left(x-(q-1)\Delta x\right)}{\Delta x}}{\frac{\pi\left(x-(q-1)\Delta x\right)}{\Delta x}}\, \frac{\sin\frac{\pi t}{5\Delta t}}{\frac{\pi t}{5\Delta t}}, \quad \text{for } q = 2, 3, 4, 5,$$

$$\varphi_q(z) = \frac{\sin\frac{\pi x}{5\Delta x}}{\frac{\pi x}{5\Delta x}}\, \frac{\sin\frac{\pi\left(t-(q-5)\Delta t\right)}{\Delta t}}{\frac{\pi\left(t-(q-5)\Delta t\right)}{\Delta t}}, \quad \text{for } q = 6, 7, 8, 9. \tag{15}$$
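As a sanity check of ϕ1 in (15) (our own numerical sketch, with Δx = Δt = 1), the kernel should equal 1 at the origin and vanish at every other sampling position, as an interpolating kernel must:

```python
import numpy as np

def dirichlet_term(u, K, D):
    """2 sin(pi K u / D) cos(pi (K+1) u / D) / sin(pi u / D),
    with its limiting value 2K taken where sin(pi u / D) vanishes."""
    s = np.sin(np.pi * u / D)
    if abs(s) < 1e-12:
        return 2.0 * K
    return 2.0 * np.sin(np.pi * K * u / D) * np.cos(np.pi * (K + 1) * u / D) / s

def phi1(x, t, L=2, M=2):
    """phi_1 of (10)/(15), written in the form of (10), Delta_x = Delta_t = 1."""
    Dx, Dt = 2 * L + 1, 2 * M + 1
    bracket = (1.0 / Dx + 1.0 / Dt - 1.0
               + dirichlet_term(x, L, Dx) / Dx
               + dirichlet_term(t, M, Dt) / Dt)
    # np.sinc(u) = sin(pi u)/(pi u), so sinc(x/Dx) matches the last factors
    return bracket * np.sinc(x / Dx) * np.sinc(t / Dt)
```

Evaluating at z = 0 gives 1, while at the offsets (ℓΔx, 0) and (0, mΔt), ℓ, m = 1, …, 4, and at lattice points such as (5Δx, 0) the kernel vanishes.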

Figures 13 and 14 show ϕ1(z) and ϕ5(z). Using these functions we have reconstructed the clip. Figure 15 shows the reconstructed frames corresponding to the frames in Figure 12. We observe that the reconstructed frames are almost identical (both spatially and temporally) to the frames from the original clip.


Figure 13: ϕ1(z).

Figure 14: ϕ5(z).

To illustrate that our method offers advantages relative to other strategies, we also tested uniform spatial decimation achieved by removing, in all frames, 3 out of every 5 columns. This results in a compression ratio of ∼0.4 (> 9/25). While the compression ratio is larger (less compression), the resulting clip is of significantly poorer quality, as can be seen in the frames of Figure 16. We also applied temporal decimation by removing 3 out of every 5 frames, resulting again in a compression ratio of 0.4. Temporal interpolation was then used to fill in the missing frames. The results are shown in Figure 17. We note that the reconstructed motion differs from the original (see the fourth and sixth frames from the left).
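The ratios being compared are simple fractions; as a quick sketch (our own, with variable names chosen for illustration):

```python
from fractions import Fraction

nonuniform = Fraction(9, 25)   # proposed scheme, L = M = 2 (Remark 1)
column_drop = Fraction(2, 5)   # keep 2 of every 5 columns
frame_drop = Fraction(2, 5)    # keep 2 of every 5 frames

# Uniform decimation keeps slightly MORE data (0.4 > 0.36) yet, as the
# figures show, yields visibly poorer reconstructions.
assert column_drop > nonuniform
```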

5. CONCLUSIONS

This paper has addressed the problem of underutilization of sensor capabilities in digital camcorders arising from a constrained data read out rate. Accepting the data read out rate as a hard constraint of a camcorder, it has been shown that, by generating a nonuniformly sampled digital video sequence, it is possible to generate improved resolution video clips with the same read out rate. The approach presented here utilizes prior information regarding the "footprint" of typical video clip spectra. Specifically, we have exploited the observation that high spatial frequencies seldom occur simultaneously with high temporal frequencies. Reconstruction of an improved resolution (both spatial and temporal) digital video clip from the nonuniform samples has been presented in the form of a filter bank.

Figure 15: Frames from reconstructed clip.

Figure 16: Frames from the spatially decimated clip.

Figure 17: Frames from the temporally decimated clip.

APPENDICES

A. MULTIDIMENSIONAL GSE

A.1. General

Consider a bandlimited signal f(z), z ∈ ℝᴰ, and a sampling lattice LAT(T). Assume that LAT(T) is not a Nyquist lattice for f(z), namely, the signal cannot be reconstructed from its samples on this lattice. Let UC(2πT⁻ᵀ) be a unit cell for the reciprocal lattice LAT(2πT⁻ᵀ). Then, there always exists a set of points $\{c_p\}_{p=1}^{P} \subset \mathrm{LAT}(2\pi T^{-T})$ such that

$$\operatorname{support}\left\{ \hat{f}(\omega) \right\} \subset \bigcup_{p=1}^{P} \left\{ \mathrm{UC}\left(2\pi T^{-T}\right) + c_p \right\}. \tag{A.16}$$

Suppose now that the signal is passed through a bank of shift-invariant filters $\{h_q(\omega)\}_{q=1}^{Q}$ and the filter outputs $f_q(z)$ are then sampled on the given lattice to generate the data set $\{f_q(Tn)\}_{n \in \mathbb{Z}^D,\, q=1,\ldots,Q}$. We then have the following result.

Theorem 3. The signal f(z), under assumption (A.16), can be reconstructed from the data set $\{f_q(Tn)\}_{n \in \mathbb{Z}^D,\, q=1,\ldots,Q}$ if and only if the matrix

$$H(\omega) = \begin{bmatrix} h_1\left(\omega + c_1\right) & h_2\left(\omega + c_1\right) & \cdots & h_Q\left(\omega + c_1\right) \\ h_1\left(\omega + c_2\right) & h_2\left(\omega + c_2\right) & \cdots & h_Q\left(\omega + c_2\right) \\ \vdots & \vdots & \ddots & \vdots \\ h_1\left(\omega + c_P\right) & h_2\left(\omega + c_P\right) & \cdots & h_Q\left(\omega + c_P\right) \end{bmatrix} \in \mathbb{C}^{P \times Q} \tag{A.17}$$

has full row rank for all ω ∈ UC(2πT⁻ᵀ). The reconstruction formula is given by

$$f(z) = \sum_{q=1}^{Q} \sum_{n \in \mathbb{Z}^D} f_q(Tn)\, \varphi_q(z - Tn), \tag{A.18}$$


where

$$\varphi_q(z) = \frac{|\det T|}{(2\pi)^D} \int_{\mathrm{UC}(2\pi T^{-T})} \Phi_q(\omega, z)\, e^{j\omega^T z}\, d\omega \tag{A.19}$$

and where Φq(ω, z) are the solutions of the following set of linear equations:

$$H(\omega) \begin{bmatrix} \Phi_1(\omega, z) \\ \Phi_2(\omega, z) \\ \vdots \\ \Phi_Q(\omega, z) \end{bmatrix} = \begin{bmatrix} e^{j c_1^T z} \\ e^{j c_2^T z} \\ \vdots \\ e^{j c_P^T z} \end{bmatrix}. \tag{A.20}$$

A.2. Perfect reconstruction from recurrent sampling

The above GSE result can be applied to reconstruction from recurrent sampling. By recurrent sampling we refer to a sampling pattern Ψ given by

$$\Psi = \bigcup_{q=1}^{Q} \left\{ \mathrm{LAT}(T) + z_q \right\}, \tag{A.21}$$

where, w.l.o.g., we assume that {zq} ⊂ UC(T). (Otherwise, one can redefine them as $z_q - Tn_q \in \mathrm{UC}(T)$ and Ψ will remain the same.) The data set we have is {f(z)}z∈Ψ and our goal is to perfectly reconstruct f(z).

Let us define $h_q(\omega) = e^{j\omega^T z_q}$; then $f_q(z) = f(z + z_q)$ and

$$\left\{ f_q(Tn) \right\}_{n \in \mathbb{Z}^D,\, q=1,\ldots,Q} = \left\{ f(z) \right\}_{z \in \Psi}. \tag{A.22}$$

Thus, we can apply the GSE reconstruction. In the current case

$$H(\omega) = \begin{bmatrix} e^{j(\omega+c_1)^T z_1} & e^{j(\omega+c_1)^T z_2} & \cdots & e^{j(\omega+c_1)^T z_Q} \\ e^{j(\omega+c_2)^T z_1} & e^{j(\omega+c_2)^T z_2} & \cdots & e^{j(\omega+c_2)^T z_Q} \\ \vdots & \vdots & \ddots & \vdots \\ e^{j(\omega+c_P)^T z_1} & e^{j(\omega+c_P)^T z_2} & \cdots & e^{j(\omega+c_P)^T z_Q} \end{bmatrix} = H \cdot \operatorname{diag}\left\{ e^{j\omega^T z_1}, \ldots, e^{j\omega^T z_Q} \right\}, \tag{A.23}$$

where

$$H = \begin{bmatrix} e^{j c_1^T z_1} & e^{j c_1^T z_2} & \cdots & e^{j c_1^T z_Q} \\ e^{j c_2^T z_1} & e^{j c_2^T z_2} & \cdots & e^{j c_2^T z_Q} \\ \vdots & \vdots & \ddots & \vdots \\ e^{j c_P^T z_1} & e^{j c_P^T z_2} & \cdots & e^{j c_P^T z_Q} \end{bmatrix} \in \mathbb{C}^{P \times Q}. \tag{A.24}$$

Note that the matrix $\operatorname{diag}\{e^{j\omega^T z_1}, \ldots, e^{j\omega^T z_Q}\}$ is always nonsingular, hence, by Theorem 3, perfect reconstruction is possible if and only if H has full row rank. Clearly, a necessary condition is Q ≥ P. For simplicity, one often uses Q = P.
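The rank condition can be checked numerically for the pattern of Section 3; the following sketch (our own code, with Δx = Δt = 1 and our own helper names) builds H of (A.24) from the sample offsets (5) and spectral shifts (8):

```python
import numpy as np

def offsets(L, M):
    """Sample offsets z_q of (5) and spectral shifts c_p of (8), Dx = Dt = 1."""
    zq = [(0.0, 0.0)]
    zq += [(float(l), 0.0) for l in range(1, 2 * L + 1)]
    zq += [(0.0, float(m)) for m in range(1, 2 * M + 1)]
    cp = [(0.0, 0.0)]
    cp += [(2 * np.pi * (p - 1) / (2 * L + 1), 0.0) for p in range(2, L + 2)]
    cp += [(2 * np.pi * (L + 1 - p) / (2 * L + 1), 0.0) for p in range(L + 2, 2 * L + 2)]
    cp += [(0.0, 2 * np.pi * (p - 2 * L - 1) / (2 * M + 1)) for p in range(2 * L + 2, 2 * L + M + 2)]
    cp += [(0.0, 2 * np.pi * (2 * L + M + 1 - p) / (2 * M + 1)) for p in range(2 * L + M + 2, 2 * (L + M) + 2)]
    return np.array(zq), np.array(cp)

def gse_matrix(L, M):
    """H of (A.24): H[p, q] = exp(j c_p . z_q), a square (Q = P) matrix."""
    zq, cp = offsets(L, M)
    return np.exp(1j * cp @ zq.T)
```

For both the small L = M = 1 case and the L = 2, M = 3 case of Section 3, the resulting H has full rank, consistent with Theorem 1.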

B. PROOF OF THEOREM 1

The sampling pattern described in (4) is clearly a recurrent sampling pattern. We can thus apply the result of Theorem 3. As the discussion in Appendix A.2 concludes, all we need to show is that the matrix H in (A.24) is nonsingular. We note that here we have Q = P = 2(L + M) + 1. By the definitions of $\{z_q\}_{q=1}^{2(L+M)+1}$ and $\{c_p\}_{p=1}^{2(L+M)+1}$ in (4) and (8), respectively, we observe that the resulting matrix H can be written as

$$H = \begin{bmatrix} 1 & \mathbf{1}_{2(L+M)}^T \\ \mathbf{1}_{2(L+M)} & \bar{H} \end{bmatrix}, \tag{A.25}$$

where $\mathbf{1}_P$ denotes a P-dimensional vector of ones and

$$\bar{H} = \begin{bmatrix} A & \mathbf{1}_{2L}\mathbf{1}_{2M}^T \\ \mathbf{1}_{2M}\mathbf{1}_{2L}^T & B \end{bmatrix}, \tag{A.26}$$

$$A_{p,q} = \begin{cases} e^{j(2\pi/(2L+1))qp}, & p = 1, 2, \ldots, L, \\ e^{j(2\pi/(2L+1))q(L-p)}, & p = L+1, \ldots, 2L, \end{cases} \qquad q = 1, 2, \ldots, 2L, \tag{A.27}$$

$$B_{p,q} = \begin{cases} e^{j(2\pi/(2M+1))qp}, & p = 1, 2, \ldots, M, \\ e^{j(2\pi/(2M+1))q(M-p)}, & p = M+1, \ldots, 2M, \end{cases} \qquad q = 1, 2, \ldots, 2M. \tag{A.28}$$

From (A.25) we can readily show that

$$H^{-1} = \begin{bmatrix} 1 + \mathbf{1}_{2(L+M)}^T \left( \bar{H} - \mathbf{1}_{2(L+M)}\mathbf{1}_{2(L+M)}^T \right)^{-1} \mathbf{1}_{2(L+M)} & -\mathbf{1}_{2(L+M)}^T \left( \bar{H} - \mathbf{1}_{2(L+M)}\mathbf{1}_{2(L+M)}^T \right)^{-1} \\ -\left( \bar{H} - \mathbf{1}_{2(L+M)}\mathbf{1}_{2(L+M)}^T \right)^{-1} \mathbf{1}_{2(L+M)} & \left( \bar{H} - \mathbf{1}_{2(L+M)}\mathbf{1}_{2(L+M)}^T \right)^{-1} \end{bmatrix}. \tag{A.29}$$

Hence, H⁻¹ exists if and only if $(\bar{H} - \mathbf{1}_{2(L+M)}\mathbf{1}_{2(L+M)}^T)^{-1}$ exists. Using (A.26) we can write

$$\left( \bar{H} - \mathbf{1}_{2(L+M)}\mathbf{1}_{2(L+M)}^T \right)^{-1} = \begin{bmatrix} \left( A - \mathbf{1}_{2L}\mathbf{1}_{2L}^T \right)^{-1} & 0 \\ 0 & \left( B - \mathbf{1}_{2M}\mathbf{1}_{2M}^T \right)^{-1} \end{bmatrix}. \tag{A.30}$$

Hence, we need to establish that $(A - \mathbf{1}_{2L}\mathbf{1}_{2L}^T)^{-1}$ and $(B - \mathbf{1}_{2M}\mathbf{1}_{2M}^T)^{-1}$ exist. Using the definitions of A and B in (A.27) and (A.28) we can readily show that

$$A^H A = A A^H = (2L+1)\, I_{2L} - \mathbf{1}_{2L}\mathbf{1}_{2L}^T, \qquad B^H B = B B^H = (2M+1)\, I_{2M} - \mathbf{1}_{2M}\mathbf{1}_{2M}^T, \tag{A.31}$$

$$A \mathbf{1}_{2L} = A^H \mathbf{1}_{2L} = -\mathbf{1}_{2L}, \qquad B \mathbf{1}_{2M} = B^H \mathbf{1}_{2M} = -\mathbf{1}_{2M}, \tag{A.32}$$


where $(\cdot)^H$ denotes the conjugate transpose of $(\cdot)$. Hence,

$$A^{-1} = \frac{1}{2L+1}\left( A^H - \mathbf{1}_{2L}\mathbf{1}_{2L}^T \right), \qquad B^{-1} = \frac{1}{2M+1}\left( B^H - \mathbf{1}_{2M}\mathbf{1}_{2M}^T \right), \tag{A.33}$$

so that

$$\left( A - \mathbf{1}_{2L}\mathbf{1}_{2L}^T \right)^{-1} = \frac{1}{2L+1} A^H, \qquad \left( B - \mathbf{1}_{2M}\mathbf{1}_{2M}^T \right)^{-1} = \frac{1}{2M+1} B^H. \tag{A.34}$$

This establishes that these inverses exist. Hence, the inverse of H also exists. This completes the proof.
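The identities (A.31)-(A.32) are easy to spot-check numerically; the following sketch (our own code, for L = 2) builds A from (A.27):

```python
import numpy as np

def build_A(L):
    """The 2L x 2L matrix A of (A.27):
    A[p-1, q-1] = exp(j 2 pi q p / (2L+1))        for p = 1..L,
                = exp(j 2 pi q (L - p) / (2L+1))  for p = L+1..2L."""
    w = 2 * np.pi / (2 * L + 1)
    A = np.empty((2 * L, 2 * L), dtype=complex)
    for p in range(1, 2 * L + 1):
        r = p if p <= L else L - p  # row "frequency": 1..L, then -1..-L
        for q in range(1, 2 * L + 1):
            A[p - 1, q - 1] = np.exp(1j * w * q * r)
    return A
```

With L = 2 one verifies AᴴA = 5 I − 1·1ᵀ, A·1 = −1, and the inverse formula (A.34).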

C. PROOF OF THEOREM 2

The proof follows from a straightforward application of the GSE results of Theorem 3 to the case at hand.

We first combine (A.29), (A.30), (A.32), and (A.34) to obtain

$$H^{-1} = \begin{bmatrix} 1 - \dfrac{2L}{2L+1} - \dfrac{2M}{2M+1} & \dfrac{1}{2L+1}\mathbf{1}_{2L}^T & \dfrac{1}{2M+1}\mathbf{1}_{2M}^T \\[2ex] \dfrac{1}{2L+1}\mathbf{1}_{2L} & \dfrac{1}{2L+1} A^H & 0 \\[2ex] \dfrac{1}{2M+1}\mathbf{1}_{2M} & 0 & \dfrac{1}{2M+1} B^H \end{bmatrix}. \tag{A.35}$$

Denoting

$$\gamma_1(z) = \begin{bmatrix} e^{j c_2^T z} \\ e^{j c_3^T z} \\ \vdots \\ e^{j c_{2L+1}^T z} \end{bmatrix}, \qquad \gamma_2(z) = \begin{bmatrix} e^{j c_{2L+2}^T z} \\ e^{j c_{2L+3}^T z} \\ \vdots \\ e^{j c_{2(L+M)+1}^T z} \end{bmatrix}, \tag{A.36}$$

we can use (A.20) and (A.23) to obtain

$$\begin{bmatrix} \Phi_1(\omega, z) \\ \Phi_2(\omega, z) \\ \vdots \\ \Phi_{2(L+M)+1}(\omega, z) \end{bmatrix} = \operatorname{diag}\left\{ e^{-j\omega^T z_1}, \ldots, e^{-j\omega^T z_{2(L+M)+1}} \right\} \cdot H^{-1} \begin{bmatrix} 1 \\ \gamma_1(z) \\ \gamma_2(z) \end{bmatrix} \tag{A.37}$$

so that, by (A.35),

$$\begin{bmatrix} \Phi_1(\omega, z) \\ \Phi_2(\omega, z) \\ \vdots \\ \Phi_{2(L+M)+1}(\omega, z) \end{bmatrix} = \begin{bmatrix} e^{-j\omega^T z_1} \left( 1 - \dfrac{2L}{2L+1} - \dfrac{2M}{2M+1} + \dfrac{1}{2L+1}\mathbf{1}_{2L}^T \gamma_1(z) + \dfrac{1}{2M+1}\mathbf{1}_{2M}^T \gamma_2(z) \right) \\[2ex] \operatorname{diag}\left\{ e^{-j\omega^T z_2}, \ldots, e^{-j\omega^T z_{2L+1}} \right\} \left( \dfrac{1}{2L+1}\mathbf{1}_{2L} + \dfrac{1}{2L+1} A^H \gamma_1(z) \right) \\[2ex] \operatorname{diag}\left\{ e^{-j\omega^T z_{2L+2}}, \ldots, e^{-j\omega^T z_{2(L+M)+1}} \right\} \left( \dfrac{1}{2M+1}\mathbf{1}_{2M} + \dfrac{1}{2M+1} B^H \gamma_2(z) \right) \end{bmatrix}. \tag{A.38}$$

From (8) and (A.36) we have

$$\mathbf{1}_{2L}^T \gamma_1(z) = \sum_{p=2}^{2L+1} e^{j c_p^T z} = \sum_{r=1}^{L} \left( e^{j 2\pi r x/((2L+1)\Delta x)} + e^{-j 2\pi r x/((2L+1)\Delta x)} \right) = \frac{2 \sin\!\left( \frac{\pi L x}{(2L+1)\Delta x} \right) \cos\!\left( \frac{\pi (L+1) x}{(2L+1)\Delta x} \right)}{\sin\!\left( \frac{\pi x}{(2L+1)\Delta x} \right)} \tag{A.39}$$

and similarly

\[
1_{2M}^T\gamma_2(z)
= \frac{2\sin\!\left(\dfrac{\pi Mt}{(2M+1)\Delta t}\right)\cos\!\left(\dfrac{\pi(M+1)t}{(2M+1)\Delta t}\right)}{\sin\!\left(\dfrac{\pi t}{(2M+1)\Delta t}\right)}.
\tag{A.40}
\]
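The closed forms in (A.39)–(A.40) are instances of the standard cosine-sum (Dirichlet-kernel) identity $\sum_{r=1}^{L} 2\cos(2ru) = 2\sin(Lu)\cos((L+1)u)/\sin(u)$. A quick numerical check at arbitrary points (the specific values of $L$, $\Delta x$, and $x$ below are chosen for illustration only):

```python
import numpy as np

# Verify (A.39): sum_{r=1}^{L} 2 cos(2 pi r x / ((2L+1) dx)) equals
# 2 sin(pi L x / ((2L+1) dx)) cos(pi (L+1) x / ((2L+1) dx)) / sin(pi x / ((2L+1) dx)).
L, dx = 5, 0.7
for x in [0.13, 1.9, -2.4]:               # arbitrary test points (sin term nonzero)
    u = np.pi * x / ((2 * L + 1) * dx)
    lhs = sum(2 * np.cos(2 * r * u) for r in range(1, L + 1))
    rhs = 2 * np.sin(L * u) * np.cos((L + 1) * u) / np.sin(u)
    assert np.isclose(lhs, rhs)
```

(A.40) is the same identity with $L \to M$, $x \to t$, $\Delta x \to \Delta t$.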

Also, from (8), (A.27), and (A.36) we obtain, after some algebra,

\[
\left(A^H\gamma_1(z)\right)_r
= \sum_{s=1}^{L}\left(e^{-j2\pi rs/(2L+1)}e^{j2\pi sx/((2L+1)\Delta x)} + e^{j2\pi rs/(2L+1)}e^{-j2\pi sx/((2L+1)\Delta x)}\right)
= \frac{2\sin\!\left(\dfrac{\pi L(x - r\Delta x)}{(2L+1)\Delta x}\right)\cos\!\left(\dfrac{\pi(L+1)(x - r\Delta x)}{(2L+1)\Delta x}\right)}{\sin\!\left(\dfrac{\pi(x - r\Delta x)}{(2L+1)\Delta x}\right)}
\tag{A.41}
\]

for $r = 1, \ldots, 2L$, and similarly, from (8), (A.28), and (A.36),

\[
\left(B^H\gamma_2(z)\right)_r
= \frac{2\sin\!\left(\dfrac{\pi M(t - r\Delta t)}{(2M+1)\Delta t}\right)\cos\!\left(\dfrac{\pi(M+1)(t - r\Delta t)}{(2M+1)\Delta t}\right)}{\sin\!\left(\dfrac{\pi(t - r\Delta t)}{(2M+1)\Delta t}\right)}
\tag{A.42}
\]

for $r = 1, \ldots, 2M$.
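A similar numerical check confirms (A.41), comparing the direct sum against the shifted closed form; the ordering of $\gamma_1$'s entries as the $+s$ frequencies followed by the $-s$ frequencies is an assumption here:

```python
import numpy as np

# Check (A.41): the r-th entry of A^H gamma_1(z), computed as a direct sum,
# matches the sin/cos closed form with x shifted by r*dx.
L, dx, x = 3, 0.8, 1.37
s = np.arange(1, L + 1)
g_pos = np.exp(2j * np.pi * s * x / ((2 * L + 1) * dx))    # +s entries of gamma_1
g_neg = g_pos.conj()                                       # -s entries of gamma_1
for r in range(1, 2 * L + 1):
    direct = np.sum(np.exp(-2j * np.pi * r * s / (2 * L + 1)) * g_pos
                    + np.exp(2j * np.pi * r * s / (2 * L + 1)) * g_neg)
    u = np.pi * (x - r * dx) / ((2 * L + 1) * dx)
    closed = 2 * np.sin(L * u) * np.cos((L + 1) * u) / np.sin(u)
    assert np.isclose(direct.real, closed) and np.isclose(direct.imag, 0)
```

The analogous check applies to (A.42) with $L \to M$ and $x, \Delta x \to t, \Delta t$.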


Substituting (A.39)–(A.42) into (A.38) we obtain

\[
\Phi_q(\omega,z)
=
\begin{cases}
1 - \dfrac{2L}{2L+1} - \dfrac{2M}{2M+1}
+ \dfrac{1}{2L+1}\cdot\dfrac{2\sin\!\left(\dfrac{\pi Lx}{(2L+1)\Delta x}\right)\cos\!\left(\dfrac{\pi(L+1)x}{(2L+1)\Delta x}\right)}{\sin\!\left(\dfrac{\pi x}{(2L+1)\Delta x}\right)}
+ \dfrac{1}{2M+1}\cdot\dfrac{2\sin\!\left(\dfrac{\pi Mt}{(2M+1)\Delta t}\right)\cos\!\left(\dfrac{\pi(M+1)t}{(2M+1)\Delta t}\right)}{\sin\!\left(\dfrac{\pi t}{(2M+1)\Delta t}\right)},
& \text{for } q = 1, \\[6mm]
\dfrac{1}{2L+1}\,e^{-j(q-1)\Delta x\,\omega_x}
\left(1 + \dfrac{2\sin\!\left(\dfrac{\pi L\left(x - (q-1)\Delta x\right)}{(2L+1)\Delta x}\right)\cos\!\left(\dfrac{\pi(L+1)\left(x - (q-1)\Delta x\right)}{(2L+1)\Delta x}\right)}{\sin\!\left(\dfrac{\pi\left(x - (q-1)\Delta x\right)}{(2L+1)\Delta x}\right)}\right),
& \text{for } q = 2, \ldots, 2L+1, \\[6mm]
\dfrac{1}{2M+1}\,e^{-j(q-2L-1)\Delta t\,\omega_t}
\left(1 + \dfrac{2\sin\!\left(\dfrac{\pi M\left(t - (q-2L-1)\Delta t\right)}{(2M+1)\Delta t}\right)\cos\!\left(\dfrac{\pi(M+1)\left(t - (q-2L-1)\Delta t\right)}{(2M+1)\Delta t}\right)}{\sin\!\left(\dfrac{\pi\left(t - (q-2L-1)\Delta t\right)}{(2M+1)\Delta t}\right)}\right),
& \text{for } q = 2(L+1), \ldots, 2(L+M)+1.
\end{cases}
\tag{A.43}
\]

Substituting (2), (6), and (A.43) into (A.19) and, after some further algebra, we obtain (10). This completes the proof.


Nir Maor received his B.S. degree from the Technion-Israel Institute of Technology, in 1998, in electrical engineering. He is at the final stages of studies toward the M.S. degree in electrical engineering in the Department of Electrical Engineering, Technion-Israel Institute of Technology. Since 1998 he has been with the Zoran Microelectronics Corporation, Haifa, Israel, where he has been working on IC design for digital cameras. He has ten years of experience in SoC architecture and design, and in a wide range of other digital camera aspects. He designed, developed, and implemented the infrastructure of the digital camera products. In particular, his specialty includes image acquisition and digital processing, together with a wide range of logic design aspects. While with Zoran, he established a worldwide application support team. He took a major part in the definition of the HW architecture and in the development and implementation of the image processing algorithms and other HW accelerators. Later, he led the SW group, verification group, and VLSI group of the digital camera product line. At present, he is managing a digital camera project.

Arie Feuer has been with the Electrical Engineering Department at the Technion-Israel Institute of Technology since 1983, where he is currently a Professor and Head of the Control and Robotics Lab. He received his B.S. and M.S. degrees from the Technion in 1967 and 1973, respectively, and his Ph.D. degree from Yale University in 1978. From 1967 to 1970 he was in industry working in automation design, and between 1978 and 1983 he was with Bell Labs in Holmdel. Between the years 1994 and 2002 he served as the President of the Israel Association of Automatic Control, and he was a member of the IFAC Council during the years 2002–2005. Arie Feuer is a Fellow of the IEEE. In the last 17 years he has been regularly visiting the Electrical Engineering and Computer Science Department at the University of Newcastle. His current research interests include: (1) resolution enhancement of digital images and videos, (2) sampling and combined representations of signals and images, and (3) adaptive systems in signal processing and control.


Graham C. Goodwin obtained his B.S. (physics), B.E. (electrical engineering), and Ph.D. degrees from the University of New South Wales. He is currently Professor of Electrical Engineering at the University of Newcastle, Australia, and holds honorary doctorates from the Lund Institute of Technology, Sweden, and the Technion, Israel. He is the coauthor of eight books, four edited volumes, and many technical papers. Graham is the recipient of the Control Systems Society 1999 Hendrik Bode Lecture Prize, a Best Paper award from the IEEE Transactions on Automatic Control, a Best Paper award from the Asian Journal of Control, and two Best Engineering Text Book awards from the International Federation of Automatic Control. He is a Fellow of the IEEE; an Honorary Fellow of the Institute of Engineers, Australia; a Fellow of the Australian Academy of Science; a Fellow of the Australian Academy of Technology, Science, and Engineering; a Member of the International Statistical Institute; a Fellow of the Royal Society, London; and a Foreign Member of the Royal Swedish Academy of Sciences.