Fast Block Motion Estimation Using Gray‐Code Kernels
Yair Moshe
THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE
REQUIREMENTS FOR THE MASTER DEGREE
University of Haifa
Faculty of Social Sciences
Department of Computer Science
November, 2007
Fast Block Motion Estimation Using Gray‐Code Kernels
By: Yair Moshe
Supervised By: Dr. Hagit Hel‐Or
THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE MASTER DEGREE
University of Haifa Faculty of Social Sciences
Department of Computer Science
November, 2007
Approved by: ____________________________ Date: ___________________
(Supervisor)
Approved by: ____________________________ Date: ___________________
(Chairperson of M.A. Committee)
Acknowledgment
To be written.
Table of Contents
ABSTRACT ............................................................................................................................................... IV
LIST OF FIGURES AND TABLES ................................................................................................................. V
1. INTRODUCTION .................................................................................................................... 1
1.1. FUNDAMENTALS OF VIDEO COMPRESSION ................................................................. 1
1.2. FAST MOTION ESTIMATION TECHNIQUES .................................................................... 8
1.3. ORGANIZATION OF THESIS .......................................................................................... 14
2. FAST PATTERN MATCHING USING WALSH‐HADAMARD PROJECTION KERNELS ................ 15
3. THE GRAY‐CODE KERNELS .................................................................................................. 20
4. THE FME‐GCK ALGORITHM ................................................................................................ 28
5. COMPLEXITY ANALYSIS ...................................................................................................... 33
6. FME‐GCK RESULTS ............................................................................................................. 36
6.1. SIMULATION RESULTS ................................................................................................ 36
6.2. VIDEO ENCODING RESULTS ........................................................................................ 46
7. AN ADAPTIVE FME‐GCK ..................................................................................................... 55
8. ADAPTIVE FME‐GCK RESULTS ............................................................................................ 61
9. CONCLUSION ..................................................................................................................... 63
BIBLIOGRAPHY ....................................................................................................................................... 64
Fast Block Motion Estimation Using Gray‐Code Kernels
Yair Moshe
ABSTRACT
Motion estimation plays an important role in modern video coders. In such coders, motion is
estimated using a block matching algorithm that estimates the amount of motion on a block‐
by‐block basis. A full search technique for finding the best matching blocks delivers good
accuracy but is usually not practical because of its high computational complexity. In this
dissertation, a novel fast block‐based motion estimation algorithm is proposed. This
algorithm uses an efficient projection framework which bounds the distance between a
template block and candidate blocks. Fast projection is performed with a family of highly
efficient filter kernels – the Gray‐Code Kernels – using only 2 operations per pixel for each
filter kernel. The projection framework is combined with a rejection scheme which allows
rapid rejection of candidate blocks that are distant from the template block. The tradeoff
between computational complexity and quality of results can be easily controlled in the
proposed algorithm, enabling adaptivity to image content to further improve the
results. Experiments show that the proposed adaptive algorithm significantly outperforms
popular fast motion estimation algorithms, such as three‐step search and diamond search.
List of Figures and Tables
Figure 1: Classical hybrid DPCM‐transform video encoding scheme. ........................................ 4
Figure 2: Block matching algorithm for motion estimation. ...................................................... 5
Figure 3: Frame prediction using block‐based motion estimation. ........................................... 5
Figure 4: Three step search. ....................................................................................................... 9
Figure 5: Diamond search. ....................................................................................................... 10
Figure 6: A pyramid image structure. ...................................................................................... 12
Figure 7: Projection of $d$ onto vector $u$ produces lower bound on distance $d_E$. ......... 16
Figure 8: The projection vectors of the WHT of order. ............................................................ 19
Figure 9: The set of Gray‐Code Kernels and their recursive definition visualized as a binary tree. ...... 21
Figure 10: Efficient filtering using GCK. .................................................................................... 23
Figure 11: GCK with initial vector creates the WH basis set. ................................................... 24
Figure 12: Extension of GCK to two dimensions. ..................................................................... 25
Figure 13: ‘Snake’ ordering of WH kernels. .............................................................................. 26
Figure 14: Increasing frequency ordering of WH kernels. ....................................................... 27
Figure 15: The FME‐GCK algorithm. ......................................................................................... 30
Figure 16: Image padding for rapid boundary calculation. ...................................................... 31
Figure 17: Motion information as overlaid arrows. ................................................................. 37
Figure 18: Effect of different values of the parameter on motion estimation accuracy. ........ 39
Figure 19: FME‐GCK motion estimation accuracy vs. three‐step motion estimation accuracy. ...... 40
Figure 20: FME‐GCK motion estimation accuracy relative to the optimal results. .................. 41
Figure 21: Effect of different values of the parameter on motion estimation accuracy. ........ 44
Figure 22: Effect of size of the search area on motion estimation accuracy. .......................... 45
Figure 23: FME‐GCK rate‐distortion video encoding results for Container QCIF. .................... 47
Figure 24: FME‐GCK rate‐distortion video encoding results for Silent Voice QCIF. ................. 48
Figure 25: FME‐GCK rate‐distortion video encoding results for Foreman QCIF. ..................... 49
Figure 26: FME‐GCK rate‐distortion video encoding results for Paris CIF. ............................... 50
Figure 27: FME‐GCK rate‐distortion video encoding results for Foreman CIF. ........................ 51
Figure 28: FME‐GCK rate‐distortion video encoding results for Tempete CIF. ........................ 52
Figure 29: FME‐GCK rate‐distortion video encoding results for Mobile CIF. ........................... 53
Figure 30: Size of residual signal using FME‐GCK with constant and different values of. ....... 57
Figure 31: Size of residual signal using FME‐GCK with constant and different values of. ....... 58
Figure 32: Adaptive FME‐GCK results (QCIF resolution). ......................................................... 62
Figure 33: Adaptive FME‐GCK results (CIF resolution). ............................................................ 62
Table 1: Video sequences used for simulation experiments. .................................................. 36
Table 2: Video sequences used for video coding experiments. .............................................. 46
1. Introduction
1.1. Fundamentals of Video Compression
This chapter gives a short overview of video compression fundamentals, not intended to be
a complete overview of this topic. Details that are irrelevant to the rest of the discussion are
intentionally ignored or only briefly introduced. For more details regarding video
compression the reader is referred to [1‐6].
Digital video is a representation of a natural visual scene sampled spatially and temporally. It
is a sequence of images, called frames, displayed at a certain frame rate to create the illusion
of animation. This rate, as well as the image size and pixel depth, depends heavily on the
application [1]. Even a very economical application, such as video streaming to a cellular phone,
might generate 15 fps (frames per second) with QCIF (176 × 144) image size and with bit
depth of 12 bits per pixel. This results in 15 x 176 x 144 x 12 = 4,561,920 bps (bits per
second). However, available bandwidth for this application is smaller by two orders of
magnitude. This situation is similar for most video applications, so significant bit rate
reduction is a necessary requirement.
Digital video compression has become an essential part of modern multimedia systems since
it enables significant bit rate reduction of the video signal for transmission or storage. Video
compression is normally lossy, namely the decompressed video sequence differs from the
original, but is 'close enough' to be useful in many applications. The goal of a video
compression algorithm is to achieve efficient compression while minimizing the distortion
introduced by the compression process.
A video coder compacts a digital video sequence by decreasing redundancies, namely
components that are not necessary for faithful reproduction of the data:
• Spatial redundancy – Neighboring pixels of a frame are statistically correlated. Most of
the intensity values within an image change continuously from pixel to pixel.
• Temporal redundancy ‐ In a video sequence, the difference between consecutive frames
is small. This is true since natural video scenes typically involve smooth camera or object
motion and since the time interval between two consecutive frames is relatively short.
• Psychovisual redundancy – The visual perception of the human visual system is not
uniformly sensitive to the information contained in a video sequence. For example, it is
more sensitive to low spatial frequencies than to high spatial frequencies. Another
example is that it is more sensitive to changes in luma (intensity) than to changes in
chroma (color).
• Statistical redundancy – In a video stream, some data symbols may appear more
frequently than others.
Various techniques might be used for reducing these redundancies. Spatial redundancy is
reduced by predictive coding, predicting a pixel from its neighbors, and by transform coding.
Psychovisual redundancies are mainly reduced by careful quantization. Statistical
redundancy is reduced by entropy coding. One form of entropy coding is variable length
coding in which symbols with a higher occurrence probability are encoded with shorter
lengths.
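To make the idea of variable length coding concrete, the following Python sketch (an illustration only; names and the toy input are assumptions, not part of this thesis) builds a Huffman-style prefix code in which symbols with higher occurrence probability receive shorter codewords.

```python
import heapq
from collections import Counter

def huffman_code(symbols):
    """Build a prefix code where frequent symbols get shorter codewords."""
    freq = Counter(symbols)
    if len(freq) == 1:  # degenerate case: a single distinct symbol
        return {next(iter(freq)): "0"}
    # Each heap entry: (frequency, unique tie-breaker, {symbol: codeword-so-far})
    heap = [(f, i, {s: ""}) for i, (s, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        f0, _, c0 = heapq.heappop(heap)   # two least frequent subtrees
        f1, _, c1 = heapq.heappop(heap)
        merged = {s: "0" + w for s, w in c0.items()}
        merged.update({s: "1" + w for s, w in c1.items()})
        heapq.heappush(heap, (f0 + f1, tie, merged))
        tie += 1
    return heap[0][2]

code = huffman_code("aaaaaaabbbccd")
# 'a' occurs most often, so its codeword is no longer than any other
assert all(len(code["a"]) <= len(code[s]) for s in code)
```

Decoding relies on the prefix property: no codeword is a prefix of another, so a bit stream can be parsed unambiguously.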
Reduction of temporal redundancy of a video sequence is the most significant of these
redundancies. Changes between consecutive frames are attributed to the translation of
moving objects in the image. It is thus imperative to apply a process of estimating motion
vectors (displacement vectors) from one frame to the previous. This is referred to as motion
estimation. Translational motion estimation is a very simple model. It cannot accommodate
motions other than translation, such as rotation or camera zooming. Occlusion and
disocclusion of objects, together with lighting changes and various noise artifacts existing in
the frames, complicate the situation even further. Therefore, in order to attain good‐quality
frames in the receiver, coding of the residual (prediction error) is necessary. Differential
signals between the intensity values in the current frame and those of their counterparts in
the previous frame, translated by the estimated motion vectors, are encoded. By adding the
transmitted residual frame to the predicted frame, the decoder can reconstruct the current
frame with satisfactory quality. This reconstruction process is referred to as
motion compensation. Through appropriate manipulations, the total amount of data for
both the motion vectors and residual is expected to be much less than the raw data existing
in the image frames, thus resulting in significant data compression [2].
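The prediction/residual round trip described above can be sketched in a few lines of NumPy. All names, sizes, and the motion vector are illustrative assumptions; since no quantization is applied here, the reconstruction is exact.

```python
import numpy as np

# Toy single-block example of motion-compensated prediction and reconstruction.
rng = np.random.default_rng(0)
prev = rng.integers(0, 256, (32, 32)).astype(np.int16)   # reference frame
mv = (3, 5)                                              # estimated motion vector (dy, dx)

# "Current" block: the reference content displaced by mv, plus a small change.
cur_block = prev[8 + mv[0]:24 + mv[0], 8 + mv[1]:24 + mv[1]] + 2

pred = prev[8 + mv[0]:24 + mv[0], 8 + mv[1]:24 + mv[1]]  # motion-compensated predictor
residual = cur_block - pred                              # encoded and transmitted
recon = pred + residual                                  # decoder-side reconstruction
assert np.array_equal(recon, cur_block)
```

In a real coder the residual is transformed and quantized before transmission, so the reconstruction would only approximate the current block.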
In order to encourage interworking and competition, it has been necessary to define
standard methods of video encoding and decoding to allow products from different
manufacturers to communicate effectively. This has led to the development of several key
international standards for video compression, including the MPEG and H.26x series of
standards. Compression involves a complementary pair of systems, an encoder and a
decoder. The encoder converts the source data into a compressed form, occupying a
reduced number of bits, prior to transmission or storage, and the decoder converts the
compressed form back into a representation of the original video. The standards do not
define an encoder; rather, they define the output that an encoder should produce. A
decoding method is defined in each standard but manufacturers are free to develop
alternative decoders as long as they achieve results in accord with the standard [4]. Most of
the key international standards for video coding, including MPEG‐1 [7], MPEG‐2 [8], MPEG‐4
[9], H.261 [10], H.263 [11], and H.264/MPEG‐4 AVC [12], share the same hybrid DPCM‐
transform model, that will now be briefly described.
Figure 1 shows a classical hybrid DPCM‐transform video encoding scheme. The video
sequence is divided into GOPs (groups of pictures). The first frame of every GOP is an intra
frame and every other frame in the GOP is an inter frame. Intra frames are self‐sufficient and
are coded independently of previous frames. They are used as anchors for temporal
prediction. Inter frames are coded using motion‐compensated prediction from the previous
frame, which could be either an intra or an inter frame. The algorithm processing the frames
of a video sequence is block‐based. A video frame is divided into nonoverlapping rectangular
blocks, called macroblocks, each of the same size, usually 16 × 16
pixels. Each macroblock is divided into smaller equal‐size regions, called blocks.
Blocks of an intra macroblock are transformed, quantized, and entropy coded. The purpose
of the transform is to decorrelate the picture content. Quantization reduces the number of
bits to encode by adaptively weighting the transform coefficients according to the human
visual system sensitivity. Entropy coding assigns longer code words for symbols with lower
probability.
Figure 1: Classical hybrid DPCM‐transform video encoding scheme.
For inter macroblocks, a more complicated process that involves motion estimation and
motion compensation is used. The most frequently used technique in motion estimation for
video coding is the block matching algorithm (BMA). Each macroblock is assumed to move as
one, that is, all pixels in a macroblock share the same motion vector. As illustrated in Figure
2, a template macroblock in the current frame is compared to candidate blocks in a search
area, usually centered on the current macroblock position. The candidate block that
minimizes a matching criterion is chosen as ‘best match’ and used as a predictor. The relative
position of each template macroblock in the current frame and its best match in the
previous frame produce a motion vector. The selected best matching region in the reference
frame is subtracted from the current macroblock to produce a residual (difference)
macroblock that is transformed, quantized, entropy coded, and transmitted together with
the motion vector. Figure 3 shows an example of frame prediction using motion
compensation.
Figure 2: Block matching algorithm for motion estimation.
(a) Frame N−1. (b) Frame N. (c) Frame N with superimposed motion vectors. (d) Residual image – subtraction of the Nth motion compensated residual frame from frame N. Mean gray represents zero while brighter and darker intensities represent higher residual values (after contrast enhancement).
Figure 3: Frame prediction using block‐based motion estimation.
Various distortion measures could be used for finding the best match for a macroblock in the
motion estimation process. Mean Squared Error (MSE) provides a measure of the energy
remaining in the difference macroblock. MSE for a $k \times k$-sample macroblock can be
calculated as follows:

$$MSE = \frac{1}{k^2} \sum_{i=0}^{k-1} \sum_{j=0}^{k-1} \left( C_{ij} - R_{ij} \right)^2 \qquad (1.1)$$

where $C_{ij}$ is a sample of the template macroblock and $R_{ij}$ is a sample of the candidate block.
Mean Absolute Error (MAE) provides a reasonably good approximation of residual energy
and is easier to calculate than MSE, since it requires a magnitude calculation instead of a
square calculation for each pair of samples:

$$MAE = \frac{1}{k^2} \sum_{i=0}^{k-1} \sum_{j=0}^{k-1} \left| C_{ij} - R_{ij} \right| \qquad (1.2)$$

The comparison may be simplified further by neglecting the $\frac{1}{k^2}$ term and simply calculating
the Sum of Absolute Differences (SAD):

$$SAD = \sum_{i=0}^{k-1} \sum_{j=0}^{k-1} \left| C_{ij} - R_{ij} \right| \qquad (1.3)$$
SAD gives a reasonable approximation to block energy and so Equation (1.3) is a commonly
used matching criterion for block‐based motion estimation [4].
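A minimal NumPy sketch of the SAD and MAE criteria of Equations (1.2) and (1.3); the function names and toy blocks are illustrative assumptions.

```python
import numpy as np

def sad(c, r):
    """Sum of Absolute Differences (Eq. 1.3) between template c and candidate r."""
    return int(np.abs(c.astype(np.int32) - r.astype(np.int32)).sum())

def mae(c, r):
    """Mean Absolute Error (Eq. 1.2): SAD normalized by the number of samples."""
    return sad(c, r) / c.size

c = np.array([[10, 20], [30, 40]])
r = np.array([[12, 18], [30, 45]])
assert sad(c, r) == 9        # |-2| + |2| + |0| + |-5|
assert mae(c, r) == 9 / 4
```

Casting to a wider integer type before subtracting avoids overflow when the blocks are stored as 8-bit samples.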
Lately, a new distortion measure for motion estimation has been proposed – the Sum of
Absolute Transformed Differences (SATD). This measure takes a frequency transform, usually
a Hadamard transform, of the differences between the pixels in the template macroblock
and the corresponding pixels in a candidate block:
$$D_{ij} = C_{ij} - R_{ij}, \qquad SATD = \frac{1}{2} \sum_{i=0}^{k-1} \sum_{j=0}^{k-1} \left| \left( H D H \right)_{ij} \right| \qquad (1.4)$$

where $H$ is the kernel matrix of the Hadamard transform and $D$ is the matrix of differences. The constant $\frac{1}{2}$ can, of course, be
neglected. SATD is considerably slower than SAD but it more accurately predicts quality
from the viewpoints of both objective and subjective metrics. Therefore it is used in the
H.264 reference model software [13, 14], as well as in other new video encoders.
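The SATD computation of Equation (1.4) can be sketched as follows, assuming a 4 × 4 unnormalized Hadamard kernel built by the Sylvester construction (an assumption for illustration; block and kernel sizes vary between encoders).

```python
import numpy as np

H2 = np.array([[1, 1], [1, -1]])
H = np.kron(H2, H2)  # 4x4 unnormalized Hadamard kernel

def satd(c, r):
    """Sum of Absolute Transformed Differences, Eq. (1.4), constant kept."""
    d = c.astype(np.int64) - r.astype(np.int64)   # difference matrix D
    return int(np.abs(H @ d @ H).sum()) // 2

c = np.arange(16).reshape(4, 4)
assert satd(c, c) == 0        # identical blocks -> zero distortion
assert satd(c, c + 3) == 24   # a constant offset maps entirely to the DC coefficient
```

Because a constant difference concentrates in a single transform coefficient, SATD penalizes structured differences less than scattered ones, which is part of why it tracks perceived quality better than SAD.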
Frame reconstruction is the reverse process that starts with entropy decoding. For intra
macroblocks, each quantized block is then rescaled and inverse transformed to produce a
reconstructed macroblock. Note that the nonreversible quantization process implies that the
reconstructed macroblock is not identical to the original. For inter macroblocks, each
quantized block is rescaled and inverse transformed to produce a decoded residual. The
motion compensated prediction is added to the residual to produce a reconstructed
macroblock. The reconstructed frame is stored so it may be used as a reference frame for
the next encoded frame. This is necessary to ensure the encoder and decoder use identical
reference frames for motion compensated prediction.
There are many variations on the basic motion estimation and compensation process. The
reference frame may be a previous frame, a future frame or a combination of predictions
from two or more previously encoded frames. If a future frame is chosen as the reference, it
is necessary to encode this frame before the current frame (that is, frames must be encoded
out of order). Moving objects in a video scene rarely follow 16×16‐pixel boundaries and so it
may be more efficient to use a variable block size for motion estimation and compensation.
Objects may move by a fractional number of pixels between frames so a better prediction
may be obtained by interpolating the reference frame to sub‐pixel positions before
searching these positions for the best match [4]. In this dissertation, we ignore these
variations for the sake of simplicity. This does not reduce the generality of the proposed
solution since it could be readily extended to support these variations.
1.2. Fast Motion Estimation Techniques
Motion estimation, although efficient in reducing temporal redundancy, incurs high
computational complexity. A brute force technique for finding the best matching region
within the search area in the reference frame is called full search. It is performed by
comparing all candidate blocks in the search area with the template macroblock. Full search
is usually impractical for real‐time applications due to the large number of comparisons
required. Measurements of video encoders' complexity using full search motion estimation
show that motion estimation comprises about 50%–90% of the overall encoder's complexity
[15]. So, many alternative ‘fast search’ motion estimation algorithms have been proposed in
the literature. A fast and accurate block matching algorithm is a critical part of every
practical video coder with significant impact on coding efficiency. According to [15], the main
concepts of these fast algorithms can be classified into six categories: reduction in search
positions, predictive search, simplification of matching criterion, bitwidth reduction,
hierarchical search, and fast full search.
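For reference, a minimal full search implementation with the SAD criterion might look as follows; the function name, block size, and search range are illustrative assumptions.

```python
import numpy as np

def full_search(cur, ref, top, left, bs=16, sr=7):
    """Exhaustive block matching: test every candidate in a (2*sr+1)^2 search area."""
    block = cur[top:top + bs, left:left + bs].astype(np.int32)
    best, best_mv = None, (0, 0)
    for dy in range(-sr, sr + 1):
        for dx in range(-sr, sr + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + bs > ref.shape[0] or x + bs > ref.shape[1]:
                continue  # candidate falls outside the reference frame
            cand = ref[y:y + bs, x:x + bs].astype(np.int32)
            cost = int(np.abs(block - cand).sum())  # SAD criterion
            if best is None or cost < best:
                best, best_mv = cost, (dy, dx)
    return best_mv, best

# A scene shifted by (2, 3) should be recovered exactly with zero residual energy.
rng = np.random.default_rng(1)
ref = rng.integers(0, 256, (64, 64))
cur = np.roll(np.roll(ref, -2, axis=0), -3, axis=1)
mv, cost = full_search(cur, ref, 24, 24)
assert mv == (2, 3) and cost == 0
```

The nested loops make the cost per macroblock proportional to the search-area size, which is exactly the complexity the fast algorithms below try to avoid.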
The most popular category is the reduction in search positions. These algorithms reduce
search complexity by limiting the number of candidate blocks. In doing so, they use the
assumption that the matching error monotonically increases with the distance from the
search position with the minimum distortion (the optimal motion vector). This assumption is
not always valid and the process may converge to a local minimum on the error surface
rather than to the global minimum as in the full search algorithm. Well‐known algorithms in
this category are the two‐dimensional logarithmic search [16], three‐step search [17], four‐
step search [18], cross search [19], diamond search [20], and center‐biased diamond search
[21]. Diamond search based algorithms achieve significantly better speed and quality than
earlier fast algorithms; they are about 30‐100 times faster than full search with only a 0.3‐
3 dB drop in PSNR [15]. However, due to its simplicity, three‐step search is
still commonly used.
In three‐step search [17], the first step computes the matching criteria for nine points in the
search area (see Figure 4). Of these nine points, the one corresponding to the minimum
matching error is selected. In the next step, another set of nine points are chosen
surrounding the selected point in a similar fashion to the first step, with the distances
between the nine points reduced by half. The third and final step is similar with a set of
candidate points located in an even smaller grid. Figure 4 demonstrates this procedure.
Figure 4: Three step search. Points (i+4,j+4), (i+6,j+4), and (i+7,j+5) give the minimum distortion in steps 1, 2, and 3, respectively.
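The three steps above can be sketched as follows, using SAD as the matching criterion; the names and the toy test scene are assumptions for illustration.

```python
import numpy as np

def sad(a, b):
    return int(np.abs(a.astype(np.int32) - b.astype(np.int32)).sum())

def three_step_search(cur, ref, top, left, bs=16):
    """Three-step search: 9 candidates per step, step size halved each time (4, 2, 1)."""
    block = cur[top:top + bs, left:left + bs]
    cy, cx = 0, 0  # current best displacement
    for step in (4, 2, 1):
        best = None
        for dy in (-step, 0, step):
            for dx in (-step, 0, step):
                y, x = top + cy + dy, left + cx + dx
                if y < 0 or x < 0 or y + bs > ref.shape[0] or x + bs > ref.shape[1]:
                    continue
                cost = sad(block, ref[y:y + bs, x:x + bs])
                if best is None or cost < best:
                    best, move = cost, (dy, dx)
        cy, cx = cy + move[0], cx + move[1]  # recenter on the step's winner
    return cy, cx

# A bright square moved up-left between frames; its content in the current
# frame is found at displacement (+3, +5) in the reference.
ref = np.zeros((64, 64), dtype=np.uint8)
ref[30:40, 30:40] = 200
cur = np.zeros_like(ref)
cur[27:37, 25:35] = 200
mv = three_step_search(cur, ref, 24, 24)
assert mv == (3, 5)
```

Only 25 candidates are evaluated (9 + 8 + 8), against 225 for a full search over the same ±7 range.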
Diamond search [20] employs two search patterns ‐ a large diamond search pattern and a
small diamond search pattern. The large pattern comprises of nine sampled points forming a
diamond shape. The small pattern comprises of five sampled points forming a smaller
diamond shape. In the first stage, the large search pattern is used repeatedly until the
minimum matching error occurs at the center point of the diamond pattern. The search
pattern is then replaced with the small search pattern for the second search stage. Of the
five sampled points in this stage, the position yielding the minimum matching error is
selected as the best matching block. Figure 5 demonstrates this procedure.
Figure 5: Diamond search. Points (i+1,j+1), (i+1,j+3), and again (i+1,j+3) give the minimum distortion in the first step, and point (i+3,j+3) gives the minimum distortion in the second and final step.
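The two-stage procedure can be sketched as follows; the pattern offsets and names are illustrative, and a real implementation would avoid re-evaluating already-tested points.

```python
import numpy as np

LARGE = [(0, 0), (-2, 0), (2, 0), (0, -2), (0, 2), (-1, -1), (-1, 1), (1, -1), (1, 1)]
SMALL = [(0, 0), (-1, 0), (1, 0), (0, -1), (0, 1)]

def sad(a, b):
    return int(np.abs(a.astype(np.int32) - b.astype(np.int32)).sum())

def diamond_search(cur, ref, top, left, bs=16):
    """Large diamond pattern until the center wins, then one small-diamond refinement."""
    block = cur[top:top + bs, left:left + bs]

    def cost(dy, dx):
        y, x = top + dy, left + dx
        if y < 0 or x < 0 or y + bs > ref.shape[0] or x + bs > ref.shape[1]:
            return None
        return sad(block, ref[y:y + bs, x:x + bs])

    cy, cx = 0, 0
    while True:  # first stage: repeat the large diamond until the center is best
        cand = [(cost(cy + dy, cx + dx), cy + dy, cx + dx) for dy, dx in LARGE]
        best = min(c for c in cand if c[0] is not None)
        if (best[1], best[2]) == (cy, cx):
            break
        cy, cx = best[1], best[2]
    # second stage: one pass of the small diamond
    cand = [(cost(cy + dy, cx + dx), cy + dy, cx + dx) for dy, dx in SMALL]
    best = min(c for c in cand if c[0] is not None)
    return best[1], best[2]

ref = np.zeros((64, 64), dtype=np.uint8)
ref[30:40, 30:40] = 200
cur = np.zeros_like(ref)
cur[27:37, 25:35] = 200
mv = diamond_search(cur, ref, 24, 24)
assert mv == (3, 5)
```

Unlike three-step search, the number of large-diamond iterations is unbounded, which lets diamond search follow large motions without committing to a fixed number of refinement steps.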
For video sequences with fast movement, fast search algorithms such as three‐step search
and diamond search perform poorly due to the frequent failure of the monotonically
increasing distortion model assumption. Predictive motion estimation [22, 23] utilizes the
motion information in the spatial and/or temporal neighboring macroblocks to form an
initial estimate of the current motion vector, thus it can effectively reduce the search area as
well as the computation. The motion vector predictors can be taken as the median or the
actual values of neighboring macroblocks on the left, top, and top right. Zero motion
vectors, or the motion vectors of the collocated macroblocks in the previous frame, may also
be used.
Another approach for fast motion estimation is to speed up the calculation of matching error
for each candidate block. This is usually achieved by subsampling the pixels in the template
and candidate blocks. Aliasing effects can be avoided by using low‐pass filtering or by
periodic alternation of different subsampling patterns [24, 25]. Finding the optimal match
with minimum matching error using this technique is not guaranteed. It may be combined
with the former two techniques to limit the number of search positions and to predict the
current motion vector.
Bitwidth reduction is a fast motion estimation technique that is rarely used and has some
relative advantages only for specific hardware configurations. Details of this approach could
be found in [15].
Another approach uses a multiresolution structure, also known as a pyramid structure,
which is a powerful computational configuration for image processing tasks. An example of
this structure is shown in Figure 6. Pyramids of the image frames are constructed by
successive two‐dimensional filtering and subsampling of the current and past image frames.
In this hierarchical search, conventional block matching, either full search or any fast
method, is first applied to the highest level of the pyramid. This motion vector is further
refined in the following levels [26]. The search area at the finer levels can be much smaller
than the original search range. Similar to the previously described approach, this technique
also has the disadvantage of possibly being trapped in a local minimum. In spite of this fact,
it has been regarded as one of the most efficient methods for motion estimation with very
large frames and search areas. In [15] it is reported to be about 10‐30 times faster with
about 0.2‐0.5 dB drop in PSNR compared to full search.
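A pyramid of the kind shown in Figure 6 can be built by repeated 2 × 2 mean filtering and subsampling; the following sketch is a minimal illustration, and the choice of a simple mean filter is an assumption.

```python
import numpy as np

def build_pyramid(frame, levels=3):
    """Successive 2x2-mean filtering and subsampling (a simple image pyramid)."""
    pyr = [frame.astype(np.float64)]
    for _ in range(levels - 1):
        f = pyr[-1]
        h, w = (f.shape[0] // 2) * 2, (f.shape[1] // 2) * 2  # trim odd edges
        coarse = (f[0:h:2, 0:w:2] + f[1:h:2, 0:w:2]
                  + f[0:h:2, 1:w:2] + f[1:h:2, 1:w:2]) / 4.0
        pyr.append(coarse)
    return pyr

pyr = build_pyramid(np.ones((64, 64)), levels=3)
assert [p.shape for p in pyr] == [(64, 64), (32, 32), (16, 16)]
```

A motion vector found at a coarse level is scaled by two at each finer level and then refined within a small local search area.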
A different approach for fast motion estimation is to use some matching criteria to rule out
search positions while ensuring the global minimal matching error is still attained. First, a
simple test is performed to determine the candidate blocks that are possibly the optimal
one. Then, only these blocks are further processed with more precise distortion calculations.
Using an appropriate test, many search positions are determined as suboptimal and can be
excluded from being further considered in the motion vector search. Thus, search
complexity is reduced.
Figure 6: A pyramid image structure.
One well‐known example of this approach is the successive elimination algorithm [27] that
eliminates impossible candidate blocks by testing whether the absolute difference between
template macroblock pixel sum and candidate block pixel sum is larger than the up‐to‐date
minimum SAD. The sum of all pixels in the template macroblock only has to be computed once,
and the sum of all pixels in a candidate block can be computed efficiently by exploiting
common sums. Another example of this approach is the block sum pyramid [28]. This
algorithm constructs, for each macroblock, the same pyramid structure described earlier.
Successive elimination is then performed hierarchically from the top level to the bottom
level of the pyramid. An improvement of this algorithm based on a winner‐update strategy is
presented in [29]. An ascending list of lower bounds on the matching error for each search
position is maintained. Computation of the matching error can be avoided if one of its lower
bounds is larger than the global minimum matching error. The algorithm computes the
lower bounds only when the previous lower bounds in the same list are smaller than the
global minimum matching error. In [15] the winner‐update strategy is reported to be about
10 times faster than full search (thus 3‐10 times slower than diamond search).
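The core rejection test of the successive elimination algorithm can be sketched as follows. This is a simplified, single-level illustration with assumed names; a real implementation computes candidate sums incrementally by exploiting common sums between overlapping windows.

```python
import numpy as np

def sea_prune(block, candidates, exact_sad):
    """Skip a candidate when the cheap bound |sum(block) - sum(cand)|
    already exceeds the best SAD found so far (a valid lower bound on SAD)."""
    block_sum = int(block.sum())
    best_sad, best_idx, tested = None, None, 0
    for idx, cand in enumerate(candidates):
        bound = abs(block_sum - int(cand.sum()))  # lower bound on SAD
        if best_sad is not None and bound >= best_sad:
            continue  # rejected without computing the full SAD
        tested += 1
        s = exact_sad(block, cand)
        if best_sad is None or s < best_sad:
            best_sad, best_idx = s, idx
    return best_idx, best_sad, tested

def sad(a, b):
    return int(np.abs(a.astype(np.int32) - b.astype(np.int32)).sum())

block = np.zeros((8, 8), dtype=np.int32)
cands = [block + 1, np.full((8, 8), 255), block.copy()]
idx, best, tested = sea_prune(block, cands, sad)
# The all-255 candidate is rejected by the bound alone; only 2 SADs are computed.
assert (idx, best, tested) == (2, 0, 2)
```

The bound is valid because the triangle inequality gives |Σb − Σc| ≤ Σ|b − c| = SAD, so no candidate that could still be optimal is ever discarded.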
Orthogonal transforms have also been shown to be useful for block motion estimation.
However, only very few algorithms using the Walsh‐Hadamard Transform (WHT) for block
motion estimation have been proposed in the literature. In [30] a hierarchical motion
estimation algorithm is proposed in which the SAD of the WHT coefficients is used as a
distortion measure in four search levels. This algorithm is reported to be 17 times faster than
full search with < 0.1 dB drop in PSNR on average.
In [31] a fast full search algorithm using the MSE is proposed. The MSE is calculated using the
WHT coefficients with the lowest coefficients first. An early termination criterion based on
the successive elimination algorithm [27] is used for early exclusion of impossible
candidates. Efficient calculation of the transform coefficients is performed based on the
overlapping nature of search regions, using approximately 2 operations per pixel per
transform coefficient. This result is similar to ours but is less general and suffers from
constants that degrade the overall algorithm performance significantly. Our algorithm also
has some important advantages compared to [31], as will be described in detail later. This
algorithm is reported to be 16‐26 times faster than full search for 16x16 macroblocks.
Very recently, another fast motion estimation algorithm using the WHT has been proposed
[32]. This algorithm uses a winner‐update strategy [29] in the Walsh Hadamard (WH)
domain together with a simple scheme for predictive motion estimation. The algorithm
requires about 65 times fewer operations than full search with < 0.25 dB drop in PSNR. It is
difficult to compare the performance of this algorithm to other competing algorithms since it
considerably benefits from its predictive motion estimation scheme, which can be combined
with other algorithms.
1.3. Organization of Thesis
This dissertation is organized as follows: Fast pattern matching algorithms using WH
projection kernels and Gray‐Code Kernels (GCK) are first described in Chapter 2 and Chapter
3, respectively. The proposed fast block motion estimation algorithm, based on these fast
pattern matching algorithms, is presented in Chapter 4. Complexity analysis and results are
given in Chapter 5 and Chapter 6, respectively. The proposed algorithm is further refined for
adaptivity in Chapter 7. Adaptive algorithm results are given in Chapter 8. Finally,
conclusions are drawn in Chapter 9.
2. Fast Pattern Matching Using Walsh‐Hadamard Projection Kernels
The block motion estimation problem is a variant of the pattern matching problem. In this
chapter a novel pattern matching technique, suggested in [33, 34], is presented. The
suggested approach uses an efficient WH projection scheme which bounds the distance
between a pattern and an image window using very few operations on average. The
projection framework is combined with a rejection scheme which allows rapid rejection of
image windows that are distant from the pattern.
The pattern matching problem involves finding a particular pattern in an image where the
pattern is usually much smaller than the image. This can be performed naively by scanning
the entire image and evaluating the similarity between the pattern and a local 2D window
about each pixel. Assume a 2D pattern p(i, j) of size k x k is to be matched within an image
I(x, y) of size n x n. For each pixel location (x, y) in the image, the Euclidean distance may
be calculated:
d_E(p, I_{x,y}) = \left\{ \sum_{i,j=0}^{k-1} \left( I(x+i, y+j) - p(i,j) \right)^2 \right\}^{1/2}    (2.1)
where I_{x,y} denotes a local k x k window of I at coordinates (x, y). In the context of motion
estimation, this procedure is equivalent to full search block matching of a k x k template
block to a set of candidate blocks in a search area with the MSE criterion.
Referring to the pattern and window as vectors p, w in R^{k^2}, d = p - w is the difference
vector between p and w. The Euclidean distance can then be rewritten in vectorial form:
d_E(p, w) = \sqrt{d^T d} = \|d\|    (2.2)
Now assume, as illustrated in Figure 7, that p and w are not given but only the values of
their projection onto a vector u. Let
b = u^T d = u^T p - u^T w    (2.3)
be the projected distance value. Since the Euclidean distance is a norm, it follows from the
Cauchy-Schwarz inequality that a lower bound on the actual Euclidean distance can be
inferred from the projection values. Using the Cauchy-Schwarz inequality for norms, it follows
that:
\|u\| \, \|d\| \geq |u^T d|    (2.4)
This implies:
d_E(p, w) = \|p - w\| \geq \frac{|u^T (p - w)|}{\|u\|} = \frac{|u^T p - u^T w|}{\|u\|} = \frac{|b|}{\|u\|}    (2.5)
and:
d_E^2(p, w) \geq b^2 / \|u\|^2    (2.6)
Figure 7: Projection of p and w onto vector u produces a lower bound on the distance d.
If a collection of projection vectors u_1 ... u_m is given along with the corresponding
projected distance values b_1 ... b_m, the lower bound on the distance can then be tightened
(see [34] for details):
d_E^2(p, w) \geq b^T (U^T U)^{-1} b = LB_m^2(p, w)    (2.7)
where U = [u_1 ... u_m] and b = (b_1 ... b_m)^T so that b = U^T d. As the number of projection
vectors increases, the lower bound LB_m(p, w) becomes tighter. In the
extreme case, when the rank of U equals k^2, the lower bound reaches the Euclidean
distance.
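As a sanity check on equations (2.3)-(2.7), the following Python sketch (vector values chosen arbitrarily for illustration; the projection vectors are the orthonormal WH basis of order 4) accumulates projections one at a time and verifies that the lower bound only tightens and reaches the Euclidean distance at full rank:

```python
import math

# Pattern p and window w as flat vectors (arbitrary illustrative values).
p = [4.0, 1.0, 3.0, 2.0]
w = [1.0, 0.0, 2.0, 5.0]
d = [pi - wi for pi, wi in zip(p, w)]
d_E = math.sqrt(sum(x * x for x in d))  # true Euclidean distance

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

# Orthonormal projection vectors (normalized WH vectors of order 4).
h = 0.5
U = [[h, h, h, h], [h, h, -h, -h], [h, -h, -h, h], [h, -h, h, -h]]

# With orthonormal vectors, equation (2.10) gives LB_m^2 = sum of b_i^2,
# so the bound tightens monotonically as projections accumulate.
lb2 = 0.0
for u in U:
    b = dot(u, p) - dot(u, w)   # projected distance value, equation (2.3)
    lb2 += b * b                # iterative update (2.9) with gamma = 1
    assert math.sqrt(lb2) <= d_E + 1e-9   # always a valid lower bound

# At full rank the lower bound reaches the Euclidean distance.
assert abs(math.sqrt(lb2) - d_E) < 1e-9
```

Each new projection only adds a nonnegative term, which is exactly why an early-termination rejection test is safe: a window rejected after few projections would also be rejected by the exact distance.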
An iterative scheme for calculating the lower bound is also possible. Given an additional
projection vector u_{m+1} and projection value b_{m+1}, the previously computed lower bound
can be updated without recalculating the inverse of the entire system U^T U. The
component of u_{m+1} in the kernel of U_m^T is calculated as:
\hat{u}_{m+1} = u_{m+1} - U_m U_m^T u_{m+1}    (2.8)
so that U_m^T \hat{u}_{m+1} = 0. If the projection vectors are orthogonal, an updated lower bound is:
LB_{m+1}^2(p, w) = LB_m^2(p, w) + \gamma \, b_{m+1}^2    (2.9)
where \gamma is a normalizing factor (see [34] for details).
If the projection vectors are also orthonormal, the distance lower bound after m
projections can be reduced to:
LB_m^2(p, w) = b^T (U^T U)^{-1} b = b^T b    (2.10)
and the normalizing factor in equation (2.9), \gamma, can be discarded.
Returning to pattern matching, a window can be determined as being far from the
pattern if the lower bound is above a certain threshold. Windows can thus be rejected as
non-pattern without actually computing the true distance. In this context, since
lower bounds are compared, the true lower bound is also not required. Thus, even if the
projection vectors are orthogonal but not orthonormal, the normalizing factor in equation
(2.9), \gamma, can be discarded.
In order for this approach to be efficient, the projection vectors should be chosen according
to the following two necessary requirements:
• The projection vectors should be likely to be nearly parallel to the difference vector
d = p - w.
• Projections of image windows onto the projection vectors should be fast to compute.
The first requirement implies that, on average, the first few projection vectors produce a
tight lower bound on the pattern-window distance. This, in turn, allows rapid rejection of
image windows that are distant from the pattern. The second requirement arises from the
fact that the projection calculations are performed many times for each window of the
image. Thus, the complexity of calculating the projection plays an important role when
choosing the appropriate projection vectors.
A set of projection vectors shown in [33, 34] to satisfy the above two requirements are the
WH basis vectors. For natural images, these vectors capture a large portion of the pattern‐
window distance with few projections on average. In addition, an efficient method for
calculating the projection values for these vectors was introduced (but not used here).
The Walsh-Hadamard transform has long been used for image representation in
numerous applications [35]. The (nonnormalized) WH basis vectors are
orthogonal and contain only binary values (±1). Thus, computation of the transform requires
only integer additions and subtractions. The WHT of an image window of size k x k (with k a
power of 2) is obtained by projecting the window onto k^2 WH basis vectors. In the case of
pattern matching within an image, it is required to project each k x k window of the
image onto the k^2 vectors. This results in a highly overcomplete image representation.
The projection vectors associated with the 2D WHT of order 8 are shown in Figure 8.
Each basis vector is of size 8x8, where white represents the value +1 and black represents
the value ‐1. In Figure 8, the basis vectors are displayed in order of increasing sequency (the
number of sign changes along rows and columns of the basis vector). The algorithm
suggested in [33, 34] induces an ordering of basis vectors that is not exactly according to
sequency, and it is shown by experiments that this ordering still captures the increase in
spatial frequency.
Figure 8: The projection vectors of the 2D WHT of order 8.
Projection vectors are ordered with increasing spatial frequency. White represents the value +1 and black represents the value -1.
As discussed above, the second critical requirement of the projection vectors is the
efficiency of computation. A method for calculating the projections of all image windows
onto a sequence of WH vectors is discussed in the next chapter. This method, in addition to
being very efficient, does not bind the algorithm to any fixed ordering of basis vectors.
Finally, we note that the projection approach described above deals with the Euclidean
distance; however, it is applicable to any distance measure that forms a norm. The
correctness of the iterative scheme is proved in [33, 34] only for norm-2. In our
case, however, the iterative projection scheme will be used with the well-known SAD (norm-1)
distance measure. This is applicable since, as more projections are performed, a lower bound
on the SATD (see previous chapter) is tightened. In [36], the correctness of the iterative
projection scheme with the SAD as the distance measure is proven.
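The same idea carries over to norm-1: with the (unnormalized) WH vectors, the SATD of a difference vector is the sum of |b_i| over all projections, so partial sums of |b_i| form monotonically tightening lower bounds. A toy Python illustration (arbitrary values, order-4 basis):

```python
# Unnormalized WH basis of order 4 (entries are +1/-1 only).
U = [[1, 1, 1, 1], [1, 1, -1, -1], [1, -1, -1, 1], [1, -1, 1, -1]]

p = [4, 1, 3, 2]                       # pattern (arbitrary values)
w = [1, 0, 2, 5]                       # candidate window
d = [a - b for a, b in zip(p, w)]      # difference vector

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

b = [dot(u, d) for u in U]             # all projected distance values
satd = sum(abs(x) for x in b)          # SATD of the difference vector

# Partial norm-1 sums are monotonically tightening lower bounds on SATD.
lb = 0
for x in b:
    lb += abs(x)
    assert lb <= satd
assert lb == satd                      # tight once all projections are used
```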
3. The Gray‐Code Kernels
In [37, 38] a family of filter kernels – the Gray‐Code Kernels (GCK) – is introduced. Filtering an
image with a sequence of Gray‐Code Kernels is highly efficient and requires only 2
operations per pixel for each filter kernel, independent of the size or dimension of the
kernel. This family of kernels includes the WH kernels among others, thus it enables very
efficient projection onto the WH basis vectors.
Consider first the 1D case where signal and kernels are one-dimensional vectors. Denote by
V_s^{(k)} a set of 1D filter kernels expanded recursively from an initial seed vector s as follows:
V_s^{(0)} = \{ s \}
V_s^{(k)} = \left\{ \left[ v_s^{(k-1)} \;\; \alpha v_s^{(k-1)} \right] \; \middle| \; v_s^{(k-1)} \in V_s^{(k-1)}, \; \alpha \in \{+1, -1\} \right\}    (3.1)
where \alpha v indicates the multiplication of kernel v by the value \alpha and [\cdot \; \cdot] denotes
concatenation.
The set of kernels and the recursive definition can be visualized as a binary tree of depth k.
An example is shown in Figure 9 for k = 3. The nodes of the binary tree at level j represent
the kernels of V_s^{(j)}. The leaves of the tree represent the eight kernels of V_s^{(3)}. The branches
are marked with the values of \alpha used to create the kernels (where +/- indicates +1/-1).
Denote t = |s| the length of s. It is easily shown that V_s^{(k)} is an orthogonal set of 2^k kernels
of length 2^k t. Furthermore, given an orthogonal set of seed vectors s_1 ... s_r, it can be shown
that the union set V_{s_1}^{(k)} ∪ ... ∪ V_{s_r}^{(k)} is orthogonal with r 2^k vectors of length 2^k t. If r = t the
set forms a basis. Figure 9 also demonstrates the fact that the values \alpha_1 ... \alpha_k along the tree
branches uniquely define a kernel in V_s^{(k)}.
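Definition (3.1) translates directly into code. The following sketch builds V_s^(k) for the seed s = [1] and depth 3, and checks that the resulting eight kernels of length 8 are mutually orthogonal:

```python
def expand(kernels):
    # One step of definition (3.1): concatenate each kernel with
    # +1 or -1 times itself.
    return [v + [a * x for x in v] for v in kernels for a in (1, -1)]

def gck_set(seed, k):
    # V_s^(k): 2^k kernels of length 2^k * len(seed), built recursively.
    kernels = [list(seed)]
    for _ in range(k):
        kernels = expand(kernels)
    return kernels

V3 = gck_set([1], 3)   # seed s = [1], depth k = 3
assert len(V3) == 8 and all(len(v) == 8 for v in V3)

# The set is orthogonal: distinct kernels have zero inner product,
# and each kernel has squared norm 8.
for a in V3:
    for b in V3:
        ip = sum(x * y for x, y in zip(a, b))
        assert ip == (8 if a == b else 0)
```

With seed [1] the leaves are exactly the WH basis vectors of order 8, in the order induced by the tree.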
The sequence \alpha_1 ... \alpha_k, \alpha_i \in \{+1, -1\}, that uniquely defines a kernel v \in V_s^{(k)} is called
the \alpha-index of v. Two kernels v, v' are defined to be \alpha-related if and only if the
Hamming distance between their \alpha-indices (the number of positions for which their \alpha-indices
are different) is one. Without loss of generality, let the \alpha-indices of two \alpha-related kernels be
(\alpha_1 ... \alpha_{i-1}, +1, \alpha_{i+1} ... \alpha_k) and (\alpha_1 ... \alpha_{i-1}, -1, \alpha_{i+1} ... \alpha_k). We denote the corresponding kernels as
v^+ and v^- respectively. Since \alpha_1 ... \alpha_{i-1} uniquely define a kernel in V_s^{(i-1)}, two \alpha-related
kernels always share the same prefix vector of length \Delta = 2^{i-1} t. The arrows of Figure 9
indicate examples of \alpha-related kernels in the binary tree of depth 3.
Of special interest are sequences of kernels that are consecutively \alpha-related. An ordered set
of kernels v_1 ... v_m that are consecutively \alpha-related forms a sequence of Gray-Code
Kernels (GCK). The sequence is called a Gray-Code Sequence (GCS). The term Gray Code
relates to the fact that the series of \alpha-indices associated with a GCS forms a Gray code [39].
The kernels at the leaves of the tree in Figure 9, in a left-to-right scan, are consecutively
\alpha-related, thus forming a GCS. Note, however, that this sequence is not unique and that
there are many possible ways of reordering the kernels to form a GCS.
Figure 9: The set of Gray-Code Kernels and their recursive definition visualized as a binary tree.
In this example, the tree is of depth 3 and creates kernels of length 8t. Arrows indicate examples of pairs of kernels that are \alpha-related.
The main idea presented in [37, 38] relies on the fact that two \alpha-related kernels share a
special relationship. Given two \alpha-related kernels v^+, v^-, their sum v_p and their
difference v_m are defined as follows:
v_p = v^+ + v^-
v_m = v^+ - v^-    (3.2)
In [38] it is proven that the following relation holds:
[0_\Delta \; v_p] = [v_m \; 0_\Delta]    (3.3)
where \Delta is the length of the common prefix and 0_\Delta denotes a vector with \Delta zeros.
For example, consider the two \alpha-related kernels from Figure 9 whose \alpha-indices are
(+1, +1, +1) and (+1, -1, +1):
v^+ = [s  s  s  s  s  s  s  s]
v^- = [s  s  -s  -s  s  s  -s  -s]    (3.4)
They share a common prefix of length \Delta = 2t. Then:
v_p = [2s  2s  0_t  0_t  2s  2s  0_t  0_t]
v_m = [0_t  0_t  2s  2s  0_t  0_t  2s  2s]    (3.5)
and equation (3.3) holds with:
[0_{2t} \; v_p] = [0_t  0_t  2s  2s  0_t  0_t  2s  2s  0_t  0_t] = [v_m \; 0_{2t}]    (3.6)
For simplicity of explanation, we now expand v to an infinite sequence such that v(i) = 0 for
i < 0 and for i \geq 2^k t. Using this convention, equation (3.3) can be rewritten in
a new notation:
v_p(i - \Delta) = v_m(i)    (3.7)
and this gives rise to the following corollary:
v^+(i) = v^+(i - \Delta) + v^-(i) + v^-(i - \Delta)
v^-(i) = v^+(i) - v^+(i - \Delta) - v^-(i - \Delta)    (3.8)
Equation (3.8) is the core principle behind an efficient filtering scheme.
Let b^+ and b^- be the signals resulting from convolving a signal x with filter kernels v^+ and v^-
respectively:
b^+(i) = \sum_j x(j) \, v^+(i - j)
b^-(i) = \sum_j x(j) \, v^-(i - j)    (3.9)
Then, by linearity of the convolution operation and corollary (3.8), we have the following:
b^+(i) = b^+(i - \Delta) + b^-(i) + b^-(i - \Delta)
b^-(i) = b^+(i) - b^+(i - \Delta) - b^-(i - \Delta)    (3.10)
This forms the basis of an efficient scheme for convolving a signal with a set of GCK. Given
the result of convolving the signal with the filter kernel v^+ (v^-), convolving with the filter
kernel v^- (v^+) requires only two operations per pixel independent of the kernel size. This
scheme is illustrated in Figure 10.
Figure 10: Efficient filtering using GCK.
Given b^+ (the convolution of a signal with the filter kernel v^+), the convolution result b^- can be computed using 2 operations per pixel regardless of kernel size.
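The update (3.10) can be verified numerically. The sketch below uses the α-related WH pair v+ = [1, 1, 1, 1] and v- = [1, 1, -1, -1], whose common prefix [1, 1] gives Δ = 2, and an arbitrary test signal; given b+ computed directly, b- is obtained with two additions/subtractions per output sample and matches direct convolution:

```python
def conv(x, v):
    # Full (zero-padded) linear convolution.
    n = len(x) + len(v) - 1
    return [sum(x[j] * v[i - j] for j in range(len(x)) if 0 <= i - j < len(v))
            for i in range(n)]

x = [3, 1, 4, 1, 5, 9, 2, 6]   # arbitrary test signal
vp = [1, 1, 1, 1]              # v+
vm = [1, 1, -1, -1]            # v-, alpha-related to v+
delta = 2                      # length of the common prefix [1, 1]

b_plus = conv(x, vp)           # assume this result is already known

# GCK update (3.10): two additions/subtractions per output sample,
# independent of the kernel length (out-of-range samples are zero).
b_minus = [0] * len(b_plus)
for i in range(len(b_plus)):
    prev_p = b_plus[i - delta] if i >= delta else 0
    prev_m = b_minus[i - delta] if i >= delta else 0
    b_minus[i] = b_plus[i] - prev_p - prev_m

assert b_minus == conv(x, vm)  # matches the direct convolution
```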
Considering definition (3.1) and setting the seed vector to s = [1], we obtain that V_{[1]}^{(k)} is
the WH basis set of order 2^k. A binary tree can be designed such that its leaves are the WH
kernels ordered in dyadic (or Paley) order [35] of increasing sequency and they form a GCS
(i.e., are consecutively \alpha-related). An example for k = 2 is shown in Figure 11, where every
two consecutive kernels are \alpha-related. Thus, given the result of filtering an image with the
first WH kernel, filtering with the second kernel requires only two operations
(additions/subtractions) per pixel. Subsequently, by ordering the WH kernels to form a GCS,
filtering with the other kernels can be performed using only 2 operations per pixel per kernel
regardless of signal and kernel size.
Figure 11: GCK with initial vector [1] creates the WH basis set.
Using initial vector s = [1] and depth k = 2, a binary tree creates the WH basis set of order 4. Consecutive kernels are \alpha-related, as shown by the arrows.
For separable kernels, such as the WHT, the previous definitions and results can be
generalized to two (and more) dimensions. The computation cost remains at two operations
per pixel per kernel regardless of the dimension. For example, Figure 12 shows the two‐
dimensional WH kernels of size 4x4. In this figure, every pair of horizontally or vertically
neighboring kernels is α-related. For more details the reader is referred to [38].
It was shown that successive filtering with α‐related kernels can be applied efficiently.
However, the efficiency of using the GCK in a particular application is determined not only by
the computational complexity of applying each kernel, but also by the total number of
kernels taking part in the process. This, in turn, depends upon the order in which the kernels
are applied. It is desired to order kernels into an optimal GCS. To do so, a priority value
should be assigned to each kernel, representing its contribution in achieving the goal of the
process ‐ in our case, the ability of matching macroblocks based on the projection values of
the specific kernel. If the order of the kernels within the sequence is insignificant, this
problem is shown in [38] to be NP-hard.
Figure 12: Extension of GCK to two dimensions.
The outer product of two sets of one-dimensional Gray-Code Kernels forms the set of two-dimensional kernels. With initial vector [1], the Walsh-Hadamard kernels of size 4x4 are obtained.
One possible sequence of 2D WH kernels is that in which kernels are ordered with increasing
sequency (the number of sign changes along each dimension of the kernel ‐ analogous to
frequency). The sequency order is known to perform well on natural images due to energy
compaction in the low order sequencies. However, consecutive kernels in the 2D WH
sequency order are not necessarily α‐related, thus they do not form a GCS. Luckily,
horizontally or vertically neighboring kernels in the 2D WH array are α‐related, so a ‘snake’
ordering is possible, as depicted by overlaid arrows in Figure 13. The ‘snake’ ordering,
originally suggested here, forms a GCS and, although not exactly according to sequency,
captures the increase in spatial frequency.
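A 'snake' (boustrophedon) traversal is straightforward to generate. The sketch below uses a simple row-wise variant; what matters, and what the assertion checks, is that every step moves to a horizontal or vertical neighbor in the kernel grid, which is the property that keeps consecutive kernels α-related:

```python
def snake(n):
    # Boustrophedon traversal of an n x n grid of kernel positions:
    # left-to-right on even rows, right-to-left on odd rows.
    order = []
    for r in range(n):
        cols = range(n) if r % 2 == 0 else range(n - 1, -1, -1)
        order.extend((r, c) for c in cols)
    return order

path = snake(4)
assert len(path) == 16

# Every consecutive pair of positions differs by exactly one step in
# one coordinate, i.e., the kernels are grid neighbours.
for (r1, c1), (r2, c2) in zip(path, path[1:]):
    assert abs(r1 - r2) + abs(c1 - c2) == 1
```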
Figure 13: 'Snake' ordering of WH kernels.
The projection vectors of the 2D WHT. 'Snake' ordering is depicted by overlaid arrows and numbers.
'Snake' ordering approximates the increase in spatial frequency while forming a GCS. Thus,
filtering with any kernel in a snake-ordered sequence (except the first kernel) requires
maintaining only the projection onto the previous kernel in the sequence and the signal
itself. However, it is possible to select an ordering of kernels such that consecutive kernels
are not necessarily α-related, but rather every kernel is α-related to at least one kernel that
precedes it anywhere in the sequence. This still allows an efficient projection, using a
projection onto a preceding α-related kernel and the signal, but incurs higher memory
complexity since preceding projections must be maintained in memory. One such ordering is
the 'increasing frequency' ordering, originally suggested here, and depicted in Figure 14. In
this ordering the kernels are arranged in increasing spatial frequency, thus it has better energy
compaction in the first kernels compared to 'snake' ordering. In the algorithm presented in
the next chapter, memory complexity will not be an issue, so the increasing frequency order
is used.
Figure 14: Increasing frequency ordering of WH kernels.
The projection vectors of the 2D WHT. Increasing frequency ordering is depicted by overlaid arrows and numbers.
4. The FME‐GCK Algorithm
In this chapter, a novel fast block motion estimation algorithm is presented. The algorithm is
based on the fast pattern matching technique described in previous chapters, hence it is
denoted FME‐GCK. The motivation for the FME‐GCK algorithm comes from the fact that the
block motion estimation problem is a variant of the pattern matching problem. Therefore,
the fast pattern matching technique described in previous chapters can be tailored for the
fast block motion estimation, with proper adjustments.
The FME‐GCK algorithm has all the advantages of the fast pattern matching techniques
described in earlier chapters and it also exploits additional redundancies of the block motion
estimation process. It is fast and efficient (see Chapter 5 and Chapter 6), involves integer
computations only, and incurs sequential memory access. In contrast to most classical
motion estimation algorithms, the FME-GCK enables adaptivity to image content (see
Chapter 7 and Chapter 8).
There are a few differences between the pattern matching problem and the block motion
estimation problem. The pattern matching problem involves finding one pattern in an image
while the block motion estimation problem involves finding many different macroblocks in a
frame. Every candidate region of the reference frame in the block motion estimation
problem is a candidate for a best match for several neighboring macroblocks from the
current frame. Furthermore, the current frame forms the reference frame for block motion
estimation of the consecutive image in the video sequence. The FME‐GCK exploits these
additional redundancies in the block motion estimation problem.
In block motion estimation, it is assumed that differences between consecutive frames are
due to translation of complete macroblocks. This is a very simple model that might result in
non‐negligible residual or noise. Therefore, instead of searching for an exact pattern match,
a 'noisy' version of the template macroblock is sought. The candidate region that produces the
lowest lower bound is considered the best match. The fast pattern matching
techniques described in previous chapters are shown in [33, 34] to be effective even under
very noisy conditions, hence their appropriateness for the block motion estimation problem.
Assume a video sequence is composed of images I_1, I_2, I_3, ... of size n x n, macroblocks
of size k x k, and search areas of size l x l. Also assume a set of m WH basis vectors
u_0 ... u_{m-1} is given such that every basis vector is α-related to at least one basis vector that
precedes it in the sequence. Denote by b_i^{(j)} the projection values of macroblocks of image I_j
onto WH basis vector u_i. Denote by p_{x,y}^{(j)} or w_{x,y}^{(j)} a square region of size k x k of image I_j at
coordinates (x, y), and denote by sa(p) the search area around macroblock p.
The FME-GCK algorithm
For each image I_j:
1) Project I_j onto u_0 ... u_{m-1} to obtain b_0^{(j)} ... b_{m-1}^{(j)} and store the resulting projections in
memory.
2) For each Inter macroblock p_{x1,y1}^{(j)}:
2.1) For each candidate region w_{x2,y2}^{(j-1)} in sa(p_{x1,y1}^{(j)}):
2.1.1) Calculate the norm-1 lower bound on the distance between p_{x1,y1}^{(j)} and w_{x2,y2}^{(j-1)}
using b^{(j)} and b^{(j-1)}.
2.2) Calculate the actual SAD between p_{x1,y1}^{(j)} and the q candidate regions from
sa(p_{x1,y1}^{(j)}) with the smallest distance lower bounds.
2.3) Of these q candidate regions, select the one with the smallest SAD from
sa(p_{x1,y1}^{(j)}) as the best matching macroblock.
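Steps 2.1-2.3 can be illustrated on toy data (hypothetical values; projections are computed directly here for brevity, whereas FME-GCK obtains them incrementally via the GCK scheme):

```python
# Unnormalized WH basis of order 4 used as projection vectors.
U = [[1, 1, 1, 1], [1, 1, -1, -1], [1, -1, -1, 1], [1, -1, 1, -1]]

def project(block, m):
    # First m projection values of a flattened block.
    return [sum(u[i] * block[i] for i in range(4)) for u in U[:m]]

def sad(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

p = [10, 12, 11, 13]   # template macroblock (flattened, toy values)
candidates = [[9, 12, 11, 14], [30, 2, 7, 40], [10, 12, 12, 13], [0, 0, 0, 0]]

m, q = 2, 2            # projections used / candidates kept for exact SAD
bp = project(p, m)

# Step 2.1.1: norm-1 lower bound from the first m projection values.
lbs = [sum(abs(x - y) for x, y in zip(bp, project(w, m))) for w in candidates]
# Step 2.2: keep the q candidates with the smallest lower bounds ...
best_q = sorted(range(len(candidates)), key=lambda i: lbs[i])[:q]
# Step 2.3: ... and pick the one with the smallest actual SAD.
best = min(best_q, key=lambda i: sad(p, candidates[i]))
```

Only the q surviving candidates ever pay the full SAD cost; the obviously distant candidates are rejected from their lower bounds alone.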
A block diagram of the FME‐GCK algorithm is shown in Figure 15.
Figure 15: The FME-GCK algorithm.
Block diagram: each image of the input video sequence is projected onto the WH basis vectors, and the current image projections {b_i^{(j)}} and previous image projections {b_i^{(j-1)}} are stored in memory. Each image is split into macroblocks; for each macroblock, lower bounds are computed over its search area, the q best candidates are selected, their actual SAD is calculated, and the candidate with the smallest SAD is selected as the 'best' matching macroblock.
Step 1 of the algorithm, image projections, is performed for all frames, both Inter and Intra,
while the following steps are performed only for Inter frames, where motion information is
required. Image projections are stored in memory since they are required for motion
estimation of the following image in the video sequence.
In order to perform efficient GCK calculations, each basis vector should be α‐related to at
least one basis vector that precedes it in the sequence (from within the projection values
stored in memory). The order of kernels used within the FME‐GCK is depicted in Figure 14 as
overlaid arrows and numbers. This ‘increasing frequency’ ordering has been chosen due to
its good energy compaction property.
Step 1 of the algorithm is performed using GCK with only 2 operations per pixel for each WH
kernel. An exception to this efficient calculation is the first kernel (the DC component), which
can be calculated using 4 operations per pixel as described in [40].
Notice that the GCK approach cannot be used efficiently for projecting macroblocks on the
top and left image boundaries. This limitation, although seemingly minor, might increase
algorithm complexity substantially. In an experiment with the Foreman video sequence at
CIF (352x288) resolution, boundary macroblock projections were performed by direct
filtering with WH basis vectors and nonboundary macroblock projections were performed
using GCK. In CIF resolution, only about 0.7% of the candidate regions are top or left
boundary regions. However, boundary projections were found to require about 55% of the
calculation time spent on non‐DC projections.
A solution to this problem is to zero-pad the upper and left boundaries of the image by
k - 1 rows and k - 1 columns respectively. This, naturally, also increases the size
of the projection images. The upper k - 1 rows and left k - 1 columns of these projection
images are filled with zeros. This is correct since projecting a zero macroblock
onto any kernel results in zero. For all other image pixels, starting from the k-th row and the
k-th column, projections are performed using the efficient GCK method. The proposed
technique for fast boundary calculation is depicted in Figure 16.
Figure 16: Image padding for rapid boundary calculation.
Zero-padding each image with k - 1 rows and k - 1 columns enables rapid boundary calculation. The upper k - 1 rows and left k - 1 columns of each corresponding projection image are filled with zeros. GCK-based computations start at the k-th row and column.
Step 2.1.1 of the algorithm is based on the projection framework described in Chapter 2.
Although the WH basis vectors are not orthonormal, they are orthogonal. Therefore, the
normalizing term (U^T U)^{-1} in equation (2.7) can be ignored. The projection scheme is used
with norm-1 since this forms a partial calculation of the SATD distance measure. As additional
projections are applied, a better approximation of the SATD is obtained. In [36], the
correctness of the iterative projection scheme with the SAD distance measure is proven.
The FME-GCK algorithm gives a good time-quality tradeoff compared to classical fast block
motion estimation techniques. This is described in detail in Chapter 6. Usually, only a few
projections are required for highly accurate motion estimation. However, if m = k^2 the
algorithm results are guaranteed to be identical to those of full search, though this is
not a common configuration. Thus, convergence to the optimal solution is guaranteed. In
this sense, FME-GCK can be considered a fast full search motion estimation algorithm.
In Chapter 7 a variant of the FME-GCK that adaptively changes algorithm parameters is
described.
5. Complexity Analysis
It has already been mentioned in the previous chapter that the FME-GCK algorithm uses two
parameters that affect the tradeoff between complexity and accuracy of motion estimation.
These parameters are m, the number of projections to perform for each image, and q, the
number of candidate macroblocks for which the SAD value is calculated. Larger m produces
more accurate results at the cost of higher time and memory complexity. Memory
complexity is affected since the m projections of image I_j and the m projections of image I_{j-1}
must be stored in memory; thus, memory complexity is approximately 2(m+1)N, where N is
the size of the video frames in pixels. Larger q also produces more accurate results at the
cost of higher time complexity; it does not, however, affect memory.
Let us assume 1 time unit for each operation of addition, subtraction, multiplication,
absolute value, and minimum of two numbers. We obtain that performing a single SAD
computation between two k x k macroblocks requires 3k^2 - 1 time units.
Performing the FME-GCK algorithm involves m projections of each image. Time complexity
of this step is 2 time units per pixel for every projection except for the first projection, which
requires 4 time units per pixel to calculate, for a total of 2(m+1)k^2 time units per template
macroblock. Calculating the lower bound for candidate macroblocks within the search area
requires another (3m-1)l^2 time units per template macroblock, where the search area
is of size l x l. Finding the q candidate regions with the smallest lower bounds, if
performed naively, requires no more than ql^2 time units. Calculating the SAD for these q
candidate regions and selecting the one with the minimal SAD requires q(3k^2 - 1) +
(q - 1) = 3qk^2 - 1 time units. Thus, a total of 2(m+1)k^2 + (3m-1)l^2 + ql^2 +
3qk^2 - 1 time units are required per template macroblock. Additional calculations are
required due to the aforementioned boundary padding. However, this extra overhead can
be compensated by a non-naive algorithm for finding the q candidate regions with the
lowest lower bounds. Efficient algorithms for selecting the q smallest values in a list are
described in [41].
Two possible configurations of m and q are ones that result in FME-GCK complexity that is
approximately equal to that of three-step search [17] or of diamond search [20, 21]. These
configurations allow comparison of the accuracy of motion information produced by
FME-GCK to the accuracy of motion information produced by three-step search or by
diamond search under the same computational constraints. Three-step search incurs 25
block matching operations per macroblock. Thus, performing three-step search requires
25(3k^2 - 1) time units plus 24 time units for calculating the minimum over all SAD values
per macroblock. With k = 16, this sums up to 19,199 time units. Diamond search is shown
in [21] to reduce block matching operations from 25 to an average of 15.5 per macroblock
with k = 16, l = 15. Thus, performing a diamond search requires 15.5(3k^2 - 1) time units
plus 14.5 time units for calculating the minimum over all SAD values per macroblock. In the
given configuration this sums to 11,903 time units per template macroblock. Note that when
comparing the number of time units needed to perform FME-GCK with the time units needed
to perform three-step search or diamond search, a correction factor should multiply FME-GCK's
complexity. This factor is added since in FME-GCK both Inter and Intra macroblocks must be
projected, in contrast to zero calculations for Intra macroblocks incurred by three-step search
and diamond search. The value of this factor depends on the Intra periodicity in the video
sequence; a typical value is 1.10.
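The time-unit arithmetic above is easy to reproduce; the sketch below also evaluates the per-macroblock total derived earlier in this chapter for one configuration, with the typical 1.10 Intra-projection factor applied:

```python
k, l = 16, 15                    # macroblock side, search-area side

sad_cost = 3 * k**2 - 1          # one SAD between two k x k blocks: 767
tss = 25 * sad_cost + 24         # three-step search per macroblock
ds = 15.5 * sad_cost + 14.5      # diamond search per macroblock (average)
assert tss == 19199
assert ds == 11903.0

def fme_gck_cost(m, q):
    # Per-macroblock time units, times the 1.10 Intra-projection factor.
    total = (2 * (m + 1) * k**2 + (3 * m - 1) * l**2
             + q * l**2 + 3 * q * k**2 - 1)
    return 1.10 * total

# (m = 5, q = 4) lands just under diamond search's budget.
assert ds * 0.9 < fme_gck_cost(5, 4) < ds
```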
Considering the calculation above, we obtain that (m = 11, q = 4) and (m = 5, q = 13) are
FME-GCK configurations similar in their computational complexity to three-step search, and
that (m = 5, q = 4) is an FME-GCK configuration similar in its computational complexity to
diamond search. Note that it has been verified by real-time code profiling that these
configurations are indeed similar to their counterparts (see next chapter for details). These
configurations will be used in the next chapter for comparison purposes.
It is important to note that the theoretical complexity comparison of FME-GCK to three-step
search and to diamond search does not take into account the fact that FME-GCK incurs
sequential memory access while three-step search and diamond search incur many
unpredictable branches. This difference might have a significant effect on running times in
favor of the FME-GCK algorithm, depending on the specific hardware configuration. One
hardware configuration where sequential memory access is highly beneficial is DSP (Digital
Signal Processor) chips. DSP chips are widely used for many signal processing applications.
Work is currently being performed in the Signal and Image Processing Laboratory at the
Technion - IIT, comparing FME-GCK performance to three-step search performance and to
diamond search performance using the DM642 and DM6437 DSP chips from Texas
Instruments.
6. FME‐GCK Results
FME-GCK was implemented in highly efficient ANSI-C code, together with its full search,
three-step search, and diamond search counterparts, in order to enable a fair time and quality
comparison. Implementation was performed and measured on a Pentium 4 PC at 3 GHz
running Windows XP. In general, computational complexity was found to coincide with the
theoretical complexity calculation described in Chapter 5. Both diamond search and
FME-GCK with (m = 5, q = 4) execute on this hardware configuration at a speed of about
110 CIF frames per second.
First, an extensive set of simulations was performed. Then, FME‐GCK and its counterparts
were integrated with a video encoder in order to measure the effect on the real video
encoding. In both simulation and video encoding tests, motion estimation was performed
with GOP size of 15 (for every 15th frame no motion estimation was performed) for the
luminance (Y) component with macroblocks of size 16x16 and search area of size 15x15,
except when noted otherwise.
6.1. Simulation Results
All simulation results were obtained using the video sequences that appear in Table 1.
Table 1: Video sequences used for simulation experiments. For each resolution, video sequences are sorted in ascending order of estimated coding difficulty.

QCIF (176x144)             CIF (352x288)
Sequence       Frames      Sequence    Frames
Akiyo          300         Akiyo       300
Miss_america   150         Silent      300
Trevor         150         Foreman     300
Carphone       300         Tempete     260
Coastguard     300         Mobile      300
Foreman        300         Stefan      300
(sequences lower in each column have higher coding difficulty)
Figure 17 shows frame number 169 of the Foreman CIF video sequence. In this frame the
background is approximately static with head motion to the right. Thus, most motion vectors
in the background region are zero and most motion vectors in the head region point to the
left. The computed motion vectors for full search, diamond search and FME-GCK with
(m = 4, q = 4) are displayed as overlaid arrows. While all three resulting motion fields look
similar, it can be observed that the motion information produced by FME‐GCK is closer to
the optimal (full search) results compared to motion information produced by diamond
search.
Figure 17: Motion information as overlaid arrows.
Frame number 169 of the Foreman CIF video sequence. The computed motion vectors for (a) full search, (b) diamond search, and (c) FME-GCK with (m = 4, q = 4) are displayed as overlaid arrows. While all three resulting motion fields look similar, the motion information produced by FME-GCK is closer to the optimal (full search) results than that produced by diamond search.
Figure 18 depicts the effect of different values of the parameter m on FME‐GCK motion
estimation accuracy with a constant q = 4. Motion estimation accuracy is measured as the mean
SAD per macroblock between macroblocks and their ‘best’ matching counterparts. Full‐search,
three‐step search, and diamond search results are displayed as a reference. As
expected, the FME‐GCK results converge to the optimal ones: increasing the
number of projections produces lower SAD values, approaching the full‐search SAD values.
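The accuracy measure used in these simulations (mean SAD per macroblock against the best full‐search match) can be sketched as follows. This is a minimal Python illustration; the function names and the small search window are chosen for the example and are not taken from the thesis code:

```python
import numpy as np

def sad(block_a, block_b):
    """Sum of absolute differences between two equal-sized blocks."""
    return int(np.abs(block_a.astype(np.int64) - block_b.astype(np.int64)).sum())

def mean_sad_per_macroblock(frame, ref_frame, mb_size=16, search=7):
    """Mean SAD between each macroblock of `frame` and its best (full-search)
    match in `ref_frame`, within a +/-`search` pixel window.  A 15x15 search
    area, as used in the experiments, corresponds to search=7."""
    height, width = frame.shape
    total = 0
    count = 0
    for y in range(0, height - mb_size + 1, mb_size):
        for x in range(0, width - mb_size + 1, mb_size):
            macroblock = frame[y:y + mb_size, x:x + mb_size]
            best = None
            # Exhaustively test every candidate displacement in the window.
            for dy in range(-search, search + 1):
                for dx in range(-search, search + 1):
                    ry, rx = y + dy, x + dx
                    if 0 <= ry <= height - mb_size and 0 <= rx <= width - mb_size:
                        candidate = ref_frame[ry:ry + mb_size, rx:rx + mb_size]
                        cost = sad(macroblock, candidate)
                        if best is None or cost < best:
                            best = cost
            total += best
            count += 1
    return total / count
```

A fast method such as FME‐GCK evaluates only a subset of the candidate displacements, so its mean SAD can only be greater than or equal to this full‐search baseline.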
[Figure 18: twelve panels (akiyo.cif, akiyo.qcif, miss america.qcif, silent.cif, trevor.qcif, foreman.cif, foreman.qcif, carphone.qcif, tempete.cif, coastguard.qcif, mobile.cif, stefan.cif), each plotting mean SAD per macroblock against m (the number of projections) for full search, three‐step search, diamond search, and FME‐GCK.]
Figure 18: Effect of different values of the parameter m on motion estimation accuracy. Results are for a constant q = 4. Full search, three‐step search, and diamond search results are displayed as a reference. FME‐GCK algorithm results converge to the optimal ones. FME‐GCK significantly outperforms three‐step search (indicated by a light arrow) and is comparable to diamond search (indicated by a dark arrow).
For all video sequences except one, FME‐GCK outperforms three‐step search for m ≥ 5. For
9 out of the 12 video sequences, m = 4 is sufficient to outperform three‐step search. Note
that an FME‐GCK configuration equal in computational complexity to three‐step search,
indicated by a light arrow, is m = 11, q = 4, so for the same motion accuracy the gain in
computation time from using FME‐GCK instead of three‐step search is significant. For a
configuration comparable in computation time to diamond search, indicated by a dark
arrow, m = 5, q = 4, FME‐GCK outperforms diamond search only for a few video sequences.
This will change in favor of FME‐GCK with the introduction of an adaptive FME‐GCK in
Chapter 7.
Figure 19 depicts FME‐GCK motion estimation accuracy vs. three‐step search motion estimation
accuracy for similar computational complexity – FME‐GCK with m = 11, q = 4. It is again
shown that for this configuration, FME‐GCK significantly outperforms three‐step search for
all video sequences except one (Tempete CIF).
Figure 19: FME‐GCK motion estimation accuracy vs. three‐step search motion estimation accuracy. The comparison is performed for similar computational complexity – FME‐GCK with m = 11, q = 4. For this configuration, FME‐GCK significantly outperforms three‐step search for all video sequences except one.
[Figure 19: bar chart of mean SAD per macroblock for FME‐GCK (m=11, q=4) and three‐step search over all twelve video sequences.]
Looking at Figure 18, it is not obvious that the motion estimation accuracy of FME‐GCK can be
approximately predicted from image content. Figure 20 depicts the same FME‐GCK
convergence lines that appear in Figure 18, but now the y‐axis represents
(SAD_FME‐GCK − SAD_Full Search) / SAD_Full Search. Thus, Figure 20 represents FME‐GCK
motion estimation accuracy compared to the optimal (full‐search) motion estimation
accuracy. All 12 video sequences are sorted in ascending order of their coding‐difficulty.
Easier‐to‐code video sequences are plotted as lighter lines while more difficult‐to‐code
video sequences are plotted as darker lines. It can be observed that, in general, more
difficult‐to‐code video sequences produce larger values; they thus require more projections
in order to reach, relative to the optimal results, the same motion estimation accuracy as
easy‐to‐code video sequences. This fact is exploited in the adaptive FME‐GCK algorithm,
as described in Chapter 7.
Figure 20: FME‐GCK motion estimation accuracy relative to the optimal results. Results are for a constant q = 4. Video sequences are sorted in ascending order of their coding‐difficulty. Easier‐to‐code video sequences are plotted as lighter lines while more difficult‐to‐code video sequences are plotted as darker lines. In general, more difficult‐to‐code video sequences produce larger values, thus requiring more projections to reach the same motion estimation accuracy as easy‐to‐code video sequences.
[Figure 20: (SAD_FME‐GCK − SAD_Full Search) / SAD_Full Search plotted against m (the number of projections) for all twelve video sequences, ordered by coding difficulty.]
It is also possible to keep the parameter m constant, m = 5 in our case, and select
different values of the parameter q, as depicted in Figure 21. In this case too, the FME‐GCK
results converge to the optimal ones: increasing the number of SAD calculations
per macroblock produces lower SAD values, approaching the full‐search SAD values. For 9
out of the 12 video sequences, q = 5 is sufficient to outperform three‐step search. An
FME‐GCK configuration equal in computational complexity to three‐step search, indicated
by a light arrow, is m = 5, q = 13, so for the same motion accuracy the gain in computation
time from using FME‐GCK instead of three‐step search is significant. A comparison of diamond
search results and FME‐GCK results with m = 5, q = 4 was performed in the context of
Figure 18 and is indicated in Figure 21 again by a dark arrow.
Figure 22 depicts the effect of the size of the search area on FME‐GCK motion estimation
accuracy. Experiments were performed for the Carphone QCIF and Foreman CIF video
sequences with search areas of sizes 7x7, 15x15, and 31x31. The y‐axis represents
(SAD_FME‐GCK − SAD_Diamond) / SAD_Diamond or
(SAD_FME‐GCK − SAD_TSS) / SAD_TSS; it thus represents FME‐GCK
motion estimation accuracy compared to diamond search or to three‐step search motion
estimation accuracy, respectively. It can be observed that for larger search areas, as more
projections are performed, FME‐GCK results improve both compared to three‐step search
and compared to diamond search. Today, with the advent of high‐resolution video
sequences, large search areas are commonly used. Thus, FME‐GCK is expected to show even
better results compared to three‐step search and to diamond search in the near future.
[Figure 21: panels (foreman.cif, silent.cif, akiyo.qcif, trevor.qcif, miss america.qcif, stefan.cif, mobile.cif, tempete.cif, foreman.qcif, coastguard.qcif, carphone.qcif), each plotting mean SAD per macroblock against q (the number of SADs per macroblock) for full search, three‐step search, diamond search, and FME‐GCK.]
Figure 21: Effect of different values of the parameter q on motion estimation accuracy. Results are for a constant m = 5. Full search, three‐step search, and diamond search results are displayed as a reference. FME‐GCK algorithm results converge to the optimal ones. FME‐GCK significantly outperforms three‐step search (indicated by a light arrow) and is comparable to diamond search (indicated by a dark arrow).
We summarize the simulation results section by stating that the FME‐GCK algorithm
significantly outperforms three‐step search and produces motion information that is almost
as accurate as that of diamond search. This will further improve in favor of FME‐GCK with the
introduction of an adaptive FME‐GCK in Chapter 7. In addition, when larger search areas are
used, FME‐GCK results improve compared to both three‐step search and diamond search.
[Figure 22: four panels (Carphone QCIF and Foreman CIF, each relative to three‐step search and to diamond search), plotting (SAD_FME‐GCK − SAD_TSS) / SAD_TSS or (SAD_FME‐GCK − SAD_Diamond) / SAD_Diamond against m (the number of projections) for search areas 7x7, 15x15, and 31x31.]
Figure 22: Effect of the size of the search area on motion estimation accuracy. Results are given for the Carphone QCIF and Foreman CIF video sequences with a constant q = 4, relative to three‐step search and diamond search results, with search areas of size 7x7, 15x15, and 31x31. Relative to both three‐step search and diamond search, FME‐GCK results improve for larger search areas.
6.2. Video Encoding Results
In order to evaluate FME‐GCK as part of real video encoding, FME‐GCK and its
counterparts were integrated into a video encoder. The standard JVT H.264/AVC reference
software [13] was used for all tests, with some features disabled. The disabled features
are those not currently supported by the FME‐GCK implementation: B pictures, motion
estimation for sub‐macroblock partitions smaller than 16x16, subpixel motion estimation,
and multiple reference frames for motion estimation. It is important to note that FME‐GCK can
readily support all these features in a future version of its implementation.
All experiments were performed according to the common testing conditions recommended
in [42]. Therefore, the video sequences that appear in Table 2 were coded with QP
(quantization parameters) values of 28, 32, 36, 40.
Table 2: Video sequences used for video coding experiments. For each resolution, video sequences are sorted by ascending order of estimated coding‐difficulty.
QCIF (176x144)            CIF (352x288)
Sequence        Frames    Sequence    Frames
Container       300       Paris       300
Silent Voice    300       Foreman     300
Foreman         300       Tempete     260
                          Mobile      300
(Within each column, sequences lower in the table have higher coding difficulty.)
Rate‐distortion results for all seven video sequences can be found in Figures 23–29. Rate‐
distortion results for three‐step search and diamond search are displayed as a reference. For
every QP value in these figures, distortion (PSNR) was kept roughly constant and mean
Δbitrate results were computed relative to full search according to [43]. A smaller Δbitrate
indicates more accurate motion estimation.
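The idea behind the Δbitrate comparison can be illustrated by averaging the per‐QP relative bitrate differences directly. This is only a sketch of the concept: [43] prescribes a specific averaging procedure that this simplified function does not reproduce, and the function name is illustrative.

```python
def mean_delta_bitrate(test_bitrates, ref_bitrates):
    """Mean relative bitrate difference (in percent) of a tested motion
    estimation method versus the full-search reference, averaged over the
    QP operating points.  Assumes distortion (Y-PSNR) is roughly equal at
    each QP, as in the tables of Figures 23-29."""
    pairs = list(zip(test_bitrates, ref_bitrates))
    # Relative difference at each QP point, then a plain average.
    return 100.0 * sum((t - r) / r for t, r in pairs) / len(pairs)
```

A positive result means the tested method spends more bits than full search for roughly the same quality, i.e. its motion estimation is less accurate.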
[Figure 23: rate‐distortion curves, Y‐PSNR (dB) vs. bitrate (Kbits/sec), for the five methods tabulated below.]

            Full Search     Three‐step       Diamond         FME‐GCK         FME‐GCK
                            Search           Search          (m=5,q=4)       (m=5,q=13)
QP          bitrate Y‐PSNR  bitrate Y‐PSNR   bitrate Y‐PSNR  bitrate Y‐PSNR  bitrate Y‐PSNR
28          127.54  36.13   127.57  36.12    127.61  36.13   128.60  36.12   127.77  36.12
32           82.81  33.37    82.83  33.37     82.84  33.37    83.70  33.37    82.98  33.37
36           54.32  30.68    54.31  30.68     54.33  30.68    55.05  30.67    54.44  30.68
40           38.21  28.26    38.18  28.27     38.17  28.26    38.89  28.26    38.33  28.26
Δbitrate / ΔY‐PSNR vs. full search:  0.00% / 0.00    0.02% / 0.00    1.29% / -0.08   0.24% / -0.02
Figure 23: FME‐GCK rate‐distortion video encoding results for Container QCIF. Bitrate is given in Kbits/sec; Y‐PSNR is given in dB. Full search, three‐step search, and diamond search results are displayed as a reference. FME‐GCK is outperformed by both three‐step search and diamond search.
[Figure 24: rate‐distortion curves, Y‐PSNR (dB) vs. bitrate (Kbits/sec), for the five methods tabulated below.]

            Full Search     Three‐step       Diamond         FME‐GCK         FME‐GCK
                            Search           Search          (m=5,q=4)       (m=5,q=13)
QP          bitrate Y‐PSNR  bitrate Y‐PSNR   bitrate Y‐PSNR  bitrate Y‐PSNR  bitrate Y‐PSNR
28          175.26  35.51   180.39  35.51    175.29  35.51   187.15  35.44   179.68  35.48
32          108.66  32.63   112.64  32.64    108.86  32.64   115.38  32.55   111.14  32.60
36           68.32  30.16    70.79  30.17     68.45  30.17    71.41  30.07    69.49  30.13
40           45.33  27.78    46.79  27.78     45.51  27.79    46.68  27.73    45.75  27.76
Δbitrate / ΔY‐PSNR vs. full search:  3.35% / -0.19   0.03% / 0.00    6.84% / -0.37   2.49% / -0.14
Figure 24: FME‐GCK rate‐distortion video encoding results for Silent Voice QCIF. Bitrate is given in Kbits/sec; Y‐PSNR is given in dB. Full search, three‐step search, and diamond search results are displayed as a reference. FME‐GCK outperforms three‐step search and is outperformed by diamond search.
[Figure 25: rate‐distortion curves, Y‐PSNR (dB) vs. bitrate (Kbits/sec), for the five methods tabulated below.]

            Full Search     Three‐step       Diamond         FME‐GCK         FME‐GCK
                            Search           Search          (m=5,q=4)       (m=5,q=13)
QP          bitrate Y‐PSNR  bitrate Y‐PSNR   bitrate Y‐PSNR  bitrate Y‐PSNR  bitrate Y‐PSNR
28          356.01  35.05   409.16  34.93    374.29  35.02   361.12  35.03   357.64  35.05
32          205.46  32.11   241.40  31.96    217.58  32.07   209.14  32.10   206.57  32.10
36          115.57  29.42   136.95  29.24    122.69  29.37   118.21  29.40   116.23  29.41
40           68.85  27.04    79.94  26.81     72.81  26.98    71.00  27.03    69.24  27.04
Δbitrate / ΔY‐PSNR vs. full search:  21.38% / -0.95  6.84% / -0.32   2.36% / -0.11   0.70% / -0.03
Figure 25: FME‐GCK rate‐distortion video encoding results for Foreman QCIF. Bitrate is given in Kbits/sec; Y‐PSNR is given in dB. Full search, three‐step search, and diamond search results are displayed as a reference. FME‐GCK outperforms both three‐step search and diamond search.
[Figure 26: rate‐distortion curves, Y‐PSNR (dB) vs. bitrate (Kbits/sec), for the five methods tabulated below.]

            Full Search     Three‐step       Diamond         FME‐GCK         FME‐GCK
                            Search           Search          (m=5,q=4)       (m=5,q=13)
QP          bitrate Y‐PSNR  bitrate Y‐PSNR   bitrate Y‐PSNR  bitrate Y‐PSNR  bitrate Y‐PSNR
28          992.99  35.52   1028.56 35.51    996.18  35.52   1027.17 35.50   1003.11 35.52
32          655.76  32.41   682.75  32.40    658.00  32.41   680.22  32.38   663.46  32.40
36          415.56  29.46   435.22  29.45    417.32  29.46   432.19  29.42   420.89  29.44
40          261.27  26.76   274.30  26.76    262.42  26.76   272.10  26.71   264.85  26.74
Δbitrate / ΔY‐PSNR vs. full search:  4.49% / -0.29   0.38% / -0.02   4.39% / -0.28   1.43% / -0.09
Figure 26: FME‐GCK rate‐distortion video encoding results for Paris CIF. Bitrate is given in Kbits/sec; Y‐PSNR is given in dB. Full search, three‐step search, and diamond search results are displayed as a reference. FME‐GCK outperforms three‐step search and is outperformed by diamond search.
[Figure 27: rate‐distortion curves, Y‐PSNR (dB) vs. bitrate (Kbits/sec), for the five methods tabulated below.]

            Full Search     Three‐step       Diamond         FME‐GCK         FME‐GCK
                            Search           Search          (m=5,q=4)       (m=5,q=13)
QP          bitrate Y‐PSNR  bitrate Y‐PSNR   bitrate Y‐PSNR  bitrate Y‐PSNR  bitrate Y‐PSNR
28          1286.80 35.54   1517.26 35.39    1335.48 35.51   1318.74 35.50   1295.45 35.53
32          727.59  32.84   878.20  32.66    761.02  32.80   748.11  32.80   733.34  32.82
36          403.07  30.42   492.51  30.21    425.67  30.39   416.81  30.38   406.69  30.41
40          241.83  28.30   289.39  28.04    257.27  28.26   252.10  28.28   244.27  28.29
Δbitrate / ΔY‐PSNR vs. full search:  26.22% / -1.02  5.87% / -0.25   4.02% / -0.17   1.17% / -0.05
Figure 27: FME‐GCK rate‐distortion video encoding results for Foreman CIF. Bitrate is given in Kbits/sec; Y‐PSNR is given in dB. Full search, three‐step search, and diamond search results are displayed as a reference. FME‐GCK outperforms both three‐step search and diamond search.
[Figure 28: rate‐distortion curves, Y‐PSNR (dB) vs. bitrate (Kbits/sec), for the five methods tabulated below.]

            Full Search     Three‐step       Diamond         FME‐GCK         FME‐GCK
                            Search           Search          (m=5,q=4)       (m=5,q=13)
QP          bitrate Y‐PSNR  bitrate Y‐PSNR   bitrate Y‐PSNR  bitrate Y‐PSNR  bitrate Y‐PSNR
28          2757.57 34.06   2782.19 34.04    2759.60 34.06   2862.51 33.99   2802.58 34.03
32          1707.69 30.75   1726.73 30.74    1709.22 30.75   1779.18 30.67   1739.32 30.71
36          966.05  27.70   979.39  27.70    967.17  27.70   1009.48 27.62   986.46  27.66
40          508.25  25.03   514.24  25.01    507.85  25.03   531.92  24.96   520.31  25.00
Δbitrate / ΔY‐PSNR vs. full search:  1.35% / -0.07   0.08% / 0.00    5.81% / -0.30   2.68% / -0.14
Figure 28: FME‐GCK rate‐distortion video encoding results for Tempete CIF. Bitrate is given in Kbits/sec; Y‐PSNR is given in dB. Full search, three‐step search, and diamond search results are displayed as a reference. FME‐GCK outperforms three‐step search and is outperformed by diamond search.
[Figure 29: rate‐distortion curves, Y‐PSNR (dB) vs. bitrate (Kbits/sec), for the five methods tabulated below.]

            Full Search     Three‐step       Diamond         FME‐GCK         FME‐GCK
                            Search           Search          (m=5,q=4)       (m=5,q=13)
QP          bitrate Y‐PSNR  bitrate Y‐PSNR   bitrate Y‐PSNR  bitrate Y‐PSNR  bitrate Y‐PSNR
28          1286.80 35.54   1517.26 35.39    1335.48 35.51   1318.74 35.50   1295.45 35.53
32          727.59  32.84   878.20  32.66    761.02  32.80   748.11  32.80   733.34  32.82
36          403.07  30.42   492.51  30.21    425.67  30.39   416.81  30.38   406.69  30.41
40          241.83  28.30   289.39  28.04    257.27  28.26   252.10  28.28   244.27  28.29
Δbitrate / ΔY‐PSNR vs. full search:  26.22% / -1.02  5.87% / -0.25   4.02% / -0.17   1.17% / -0.05
Figure 29: FME‐GCK rate‐distortion video encoding results for Mobile CIF. Bitrate is given in Kbits/sec; Y‐PSNR is given in dB. Full search, three‐step search, and diamond search results are displayed as a reference. FME‐GCK outperforms both three‐step search and diamond search.
For six out of the seven video sequences that appear in Table 2, FME‐GCK outperforms
three‐step search. For three out of these seven video sequences, FME‐GCK also outperforms
diamond search.
The video coding results corroborate the simulation results of Section 6.1. The FME‐GCK
algorithm significantly outperforms three‐step search and produces motion information that
is almost as accurate as that of diamond search. This will further improve in favor of FME‐GCK
with the introduction of an adaptive FME‐GCK in Chapter 7.
7. An Adaptive FME‐GCK
An important advantage of FME‐GCK compared to classical fast block motion estimation
techniques is that it enables adaptivity to image content. In this chapter, the adaptive
capabilities of FME‐GCK are exploited to produce a varying complexity block motion
estimation algorithm. For some video coding applications, a varying complexity block motion
estimation algorithm is a necessity. Furthermore, even when not strictly necessary, adaptively
varying the complexity may significantly improve motion estimation accuracy compared with a
non‐adaptive method.
Following are some example scenarios in which flexibility in controlling the tradeoff between
complexity and quality is required [3]:
1. Software video codec – Encoding is carried out in software. The upper bound on
computational complexity depends on the available processing resources. These
resources are likely to vary from platform to platform (for example, depending on the
specification of a PC) and may also vary depending on the number of other applications
contending for resources.
2. Power‐limited video codec ‐ In a mobile or handheld computing platform, power
consumption is at a premium. It is now common for a processor in a portable PC or
personal digital assistant to be power‐aware, e.g. a laptop PC may change the processor
clock speed depending on whether it is running from a battery or from an AC supply.
Power consumption increases depending on the activity of peripherals, e.g. hard disk
accesses, display activity, etc. There is therefore a need to manage and limit
computation in order to maximize battery life.
3. Multichannel video coding – One of the tasks of a video server might be to encode
several video sequences simultaneously. Available computational resources are limited
and should be divided between different coding processes. It might be beneficial to
allocate more computational resources to the difficult‐to‐code video sequences in an
effort to equate the quality of the coded sequences.
In all scenarios, desired algorithm complexity may depend on external parameters, on the
characteristics of the input video sequences, or on both. Since external parameters are
application specific, the rest of this chapter will deal with adaptively changing FME‐GCK
parameters based only on the characteristics of the input video sequence.
It is well‐known that some video scenes are more difficult‐to‐code (or less code‐able) than
others. Material containing an abundance of spatial detail and/or rapid, possibly non‐
translational, movement generally requires more encoded bits than material containing little
detail and/or simple motion. The less code‐able material is not modeled well by the
translational block‐based motion model used in modern video coders, thus resulting in
relatively large values in the residual signal. These relatively large values require many bits to
code, thus the coding efficiency of these video scenes is low. Increasing the computational
resources for motion estimation of difficult‐to‐code scenes, if performed wisely, should
improve their coding efficiency.
FME‐GCK uses two parameters that affect the tradeoff between complexity and accuracy of
the resulting motion vectors. These parameters are m, the number of projections to perform for
each image, and q, the number of candidate macroblocks for which the SAD value is
calculated (Step 2.2 in the algorithm). Larger m and larger q produce more accurate results at
the cost of higher (time and memory) complexity.
Figure 30 shows the mean SAD between macroblocks and their ‘best’ matching regions in
the previous frame for the sequences Akiyo and Stefan, of length 300 frames each, in CIF
(352x288) resolution. A large SAD indicates large values in the residual signal; a small SAD
indicates small values in the residual signal. Akiyo is a ‘talking head’ sequence with a small
amount of simple motion, while the Stefan sequence comprises complex local and global
motions. The reconstructed macroblocks were produced by FME‐GCK with a constant q = 4
and with different values of m. Since Akiyo is easy‐to‐code, its residual signal is small, and
since Stefan is difficult‐to‐code, its residual signal is substantially larger. The difference is
more than an order of magnitude. As expected, for both sequences, larger values of m
(more projections) produce a smaller residual signal. More important, however, is the fact
that increasing the number of projections produces a substantially greater reduction in SAD
for the Stefan sequence than for the Akiyo sequence. For example, increasing the number of
projections from 2 to 3 reduces the mean SAD per macroblock in the Stefan sequence by
328.29, whereas in the Akiyo sequence it reduces the mean SAD per macroblock only by
23.66. Thus, using more projections for Stefan is much more effective in raising mean coding
efficiency than using more projections for Akiyo. In addition, since mean SAD is a measure
of subjective image quality (though not a very good one), using more projections for the
Stefan sequence raises its subjective quality towards Akiyo’s. Thus, using more projections
for the Stefan sequence helps the encoder achieve constant image quality across different
video scenes.
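These reductions can be read directly off the per‐m mean SAD values annotated in Figure 30 (a small Python check; the helper name is illustrative):

```python
# Mean SAD per macroblock for m = 2..7, as annotated in Figure 30.
akiyo_cif = {2: 180.72, 3: 157.06, 4: 153.28, 5: 151.70, 6: 151.08, 7: 150.67}
stefan_cif = {2: 3492.93, 3: 3164.64, 4: 3075.58, 5: 2967.82, 6: 2932.12, 7: 2911.59}

def sad_reduction(series, m_from, m_to):
    """Reduction in mean SAD per macroblock when the number of projections
    is increased from m_from to m_to."""
    return round(series[m_from] - series[m_to], 2)

# Increasing m from 2 to 3 helps Stefan far more than Akiyo:
# sad_reduction(stefan_cif, 2, 3) -> 328.29
# sad_reduction(akiyo_cif, 2, 3)  -> 23.66
```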
Figure 30: Size of the residual signal using FME‐GCK with a constant q = 4 and different values of m.
m (number of projections)            2         3         4         5         6         7
Akiyo CIF  (mean SAD/macroblock)  180.72    157.06    153.28    151.70    151.08    150.67
Stefan CIF (mean SAD/macroblock) 3492.93   3164.64   3075.58   2967.82   2932.12   2911.59
Results are for the Akiyo and Stefan video sequences, 300 frames in length each, at CIF (352x288) resolution. Akiyo is easy‐to‐code while Stefan is difficult‐to‐code. For both sequences, larger values of m (more projections) result in a smaller residual signal, but the expected improvement in coding efficiency from adding more projections is substantially larger for Stefan than for Akiyo.
Figure 31 shows that similar conclusions can be drawn when m is kept constant, m = 5 in
this case, and q varies. As expected, for both sequences, larger values of q (more SAD
calculations) result in a smaller residual signal. However, the reduction in SAD produced by
using larger q values is substantially greater for the Stefan sequence than for the Akiyo
sequence. For example, increasing the number of SAD calculations per macroblock from 2 to 3
reduces the mean SAD per macroblock by 86.35 for the Stefan sequence but only by 3.38 for
the Akiyo sequence. Thus, performing more SAD calculations for the Stefan sequence is much
more effective in raising mean coding efficiency than performing more SAD calculations for the
Akiyo sequence. As before, performing more computations for motion estimation of the
Stefan sequence helps the encoder achieve a constant image quality.
Figure 31: Size of the residual signal using FME‐GCK with a constant m = 5 and different values of q.
q (number of SADs per macroblock)    2         3         4         5         6         7
Akiyo CIF  (mean SAD/macroblock)   156.29    152.91    151.70    151.07    150.80    150.59
Stefan CIF (mean SAD/macroblock)  3101.97   3015.62   2967.82   2936.29   2912.70   2894.51
Results are for the Akiyo and Stefan video sequences, 300 frames in length each, at CIF (352x288) resolution. Akiyo is easy‐to‐code while Stefan is difficult‐to‐code. For both sequences, larger values of q (more SAD calculations) result in a smaller residual signal, but the expected improvement in coding efficiency from adding more SAD calculations is substantially larger for Stefan than for Akiyo.
Let us assume that we are required to encode a large set of video sequences containing a
variety of video scenes of varying coding‐difficulty. Classical block motion estimation
algorithms typically allot the same amount of time to computing motion vectors for all video
scenes, resulting in varying coding efficiency. Scenes with complex motion result in
substantially larger residuals than scenes with simple motion or with a small amount of
motion. If a constant bitrate is required at the output of the encoding process, the residual
signals of difficult‐to‐code scenes will be more coarsely quantized, leading to reduced
subjective image quality. Figure 30 and Figure 31 show that this undesirable effect might be
mitigated by selecting larger values of m and q for more difficult‐to‐code video scenes.
Setting nonconstant values of m and q is practical only if it is possible to change the
computational resource allocation dynamically in the encoding system. This leads to a
higher mean subjective quality, or to a closer‐to‐constant subjective quality, of the encoded
video sequences.
In order to change m and q dynamically, an estimate of the resulting encoding bitrate is
required. This estimate is produced by a video bitrate control algorithm. Any practical video
encoder contains a bitrate control algorithm that attempts to maximize the visual quality
while achieving a desired target of encoded bits. The bitrate control estimates the number of
bits required for coding each picture. This estimate might be used to control m and q. Some
examples of well‐known video bitrate control algorithms are MPEG‐2 Test Model 5 [44],
H.263 Test Model 8 [45], and MPEG‐4 Annex‐L [9]. MPEG‐2 Test Model 5 and MPEG‐4 Annex‐L
are frame‐level rate‐control algorithms, estimating the output bitrate at the frame level, while
H.263 Test Model 8 also estimates the output bitrate at the macroblock level.
If a bitrate estimate is not available, a simple estimate of image code‐ability can be used.
One example of such a simple estimate is the size of the residual. Easy‐to‐code material is
expected to result in a small residual signal while difficult‐to‐code material is expected to
result in a larger one. The size of the residual of the previous frame can be used to
estimate the code‐ability of the current frame in the video sequence. Due to temporal
redundancy this is a good estimate, except for the first frame of every scene (following a
scene change).
The change in m, namely the number of projections to perform for each image, is associated
with a complete frame. On the other hand, adaptivity of q, namely the number of candidate
macroblocks for which the SAD value is calculated, is applied at the macroblock level, with
q varying as a function of the code‐ability of each macroblock. Changing the value of m can
use a simple frame‐based estimate of code‐ability. Changing the value of q, on the other hand,
should use a more accurate, macroblock‐dependent estimate due to its spatial locality.
Changing the value of m has two disadvantages. Since computation of the lower bound
requires both current and previous image projections, changing m takes effect with a delay of
one frame. In addition, for the same reason, raising m raises not only time but also
memory complexity. Adaptivity of q does not have these two disadvantages.
In the adaptive results given in the next chapter, the size of the residual of the previous frame
is used to estimate the coding‐difficulty of the current frame. This estimate is used to
adaptively control m.
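This control strategy can be sketched as follows. The thresholds and candidate m values below are illustrative placeholders only; the experiments use several threshold configurations, and none of them is fixed here:

```python
def select_m(prev_mean_sad, thresholds=(500.0, 1500.0, 3000.0), m_values=(3, 5, 7, 9)):
    """Choose the number of projections m for the current frame from the
    previous frame's mean SAD per macroblock (a simple code-ability
    estimate): easy-to-code frames get few projections, difficult-to-code
    frames get more.  Thresholds and m values are illustrative only."""
    for threshold, m in zip(thresholds, m_values):
        if prev_mean_sad < threshold:
            return m
    # Residual larger than all thresholds: hardest content, most projections.
    return m_values[-1]
```

Moving the thresholds trades time for accuracy, which is exactly the tradeoff explored in the next chapter.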
8. Adaptive FME‐GCK Results
Following are adaptive FME‐GCK simulation results with a constant q = 4 and with variable
values of the parameter m. The size of the residual of every frame was used as a simple
code‐ability estimate for its consecutive frame. This code‐ability estimate is used to control
the parameter m. As before, macroblocks are of size 16x16 and the search area is of size 15x15.
Figure 32 shows time, measured in operations per macroblock, vs. motion accuracy,
measured in mean SAD per macroblock, for a video sequence that is a concatenation of all
six QCIF video sequences that appear in Table 1. Results for different configurations of
thresholds for transitioning between values of m are plotted. It is shown that different time‐
accuracy tradeoffs can be obtained according to the threshold selection. In all configurations,
more projections are performed for more difficult‐to‐code scenes and fewer projections are
performed for easier‐to‐code video scenes. Through this adaptivity, the mean SAD for the
concatenated video sequence is reduced. One adaptive FME‐GCK configuration shown in
Figure 32 has computational complexity similar to that of diamond search, yet outperforms it.
Figure 33 shows the same time vs. motion-accuracy plot for a video sequence that is a
concatenation of all six CIF video sequences that appear in Table 1. Results are again
plotted for different configurations of the thresholds used for transitioning between numbers
of projections. As with QCIF, an adaptive FME-GCK configuration with a computational
complexity similar to that of diamond search outperforms it. We conclude that if the
thresholds of the adaptive FME-GCK are appropriately selected, it significantly outperforms
diamond search on average.
It should be noted that a simple residual-based code-ability estimate was used to produce
Figure 32 and Figure 33. A more sophisticated estimate is expected to improve adaptive
FME-GCK performance. Such an estimate could also be used to adaptively control the number of
candidate macroblocks, further improving FME-GCK performance.
[Figure: mean SAD per macroblock vs. operations per macroblock; curves for diamond search and adaptive FME-GCK]
Figure 32: Adaptive FME-GCK results (QCIF resolution). Results are given for a concatenation of six QCIF video sequences. Different time-accuracy tradeoffs can be obtained by threshold selection. For the same computational complexity, adaptive FME-GCK outperforms diamond search.
[Figure: mean SAD per macroblock vs. operations per macroblock; curves for diamond search and adaptive FME-GCK]
Figure 33: Adaptive FME-GCK results (CIF resolution). Results are given for a concatenation of six CIF video sequences. Different time-accuracy tradeoffs can be obtained by threshold selection. For the same computational complexity, adaptive FME-GCK outperforms diamond search.
9. Conclusion
In this thesis, a novel fast block motion estimation algorithm called FME-GCK has been
presented. FME‐GCK uses an efficient projection framework which bounds the distance
between a template block and candidate blocks using highly efficient filter kernels.
Candidate regions that are distant from the template macroblock are quickly rejected using
a rapid computation of lower bounds. For the few remaining candidate blocks, the SAD
distortion measure is used.
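The rejection scheme summarized above can be sketched as follows. This is a minimal skeleton, not the thesis's implementation: the function name and the callback-based interface are assumptions, and in FME-GCK itself the lower bounds come from Gray-Code Kernel projections rather than from an arbitrary function:

```python
def fme_gck_skeleton(candidates, lower_bound_fn, sad_fn, num_retained):
    """Rank candidate blocks by a cheap projection-based lower bound on
    their SAD distance from the template, keep only the `num_retained`
    most promising candidates, and compute the exact (expensive) SAD
    only for those survivors."""
    # Candidates whose lower bound is large are distant from the
    # template and are rejected without ever computing their SAD.
    ranked = sorted(candidates, key=lower_bound_fn)
    survivors = ranked[:num_retained]
    # Exact SAD is evaluated only for the few remaining candidates.
    return min(survivors, key=sad_fn)
```

Because the bound is a true lower bound on the SAD, increasing `num_retained` can only move the result toward the full-search optimum, which is the convergence property noted below.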
The FME‐GCK algorithm enables flexibility in the tradeoff between coding efficiency and
computational complexity by allowing adaptivity of the motion estimation process based on
image content and complexity limitations. The algorithm's results are guaranteed to converge
to the optimal (full-search) results as the allowed computation increases.
When tuned to a computational complexity equal to that of three-step search or of diamond
search, and when its adaptivity parameters are appropriately selected, the FME-GCK algorithm
significantly outperforms both. In addition, FME-GCK requires only integer arithmetic and
sequential memory access, so it is appropriate for embedded systems or for any other
application, provided the constraints on memory complexity are not very tight.
Bibliography
[1] D. Salomon, Data Compression: The Complete Reference, 4th ed. London: Springer, 2007.
[2] Y. Q. Shi and H. Sun, Image and Video Compression for Multimedia Engineering: Fundamentals, Algorithms, and Standards. Boca Raton, Fla: CRC Press, 1999.
[3] I. E. G. Richardson, Video Codec Design: Developing Image and Video Compression Systems. Chichester: Wiley, 2002.
[4] I. E. G. Richardson, H.264 and MPEG‐4 Video Compression: Video Coding for Next Generation Multimedia. Chichester ; Hoboken, NJ: Wiley, 2003.
[5] M. Ghanbari, Standard Codecs: Image Compression to Advanced Video Coding. London: Institution of Electrical Engineers, 2003.
[6] R. Schafer and T. Sikora, "Digital Video Coding Standards and Their Role in Video Communications," Proceedings of the IEEE, vol. 83 (6), pp. 907‐924, 1995.
[7] "Information Technology – Coding of Moving Pictures and Associated Audio for Digital Storage Media at Up to About 1.5 Mbit/s ‐ Part 2: Video," ISO/IEC 11172‐2 (MPEG‐1 Video), 1993.
[8] "Information Technology ‐ Generic Coding of Moving Pictures and Associated Audio Information: Video ": ISO/IEC 13818‐2 and ITU‐T Rec. H.262 (MPEG‐2 Video) 1995.
[9] "Information Technology ‐ Coding of Audio Visual Objects ‐ Part 2: Visual," ISO/IEC 14496‐2 (MPEG‐4 Video), 1999.
[10] "Video Codec for Audiovisual Services at p x 64 Kbit/s," ITU‐T Recommendation H.261, 1993.
[11] "Video Coding for Low Bit Rate Communication," ITU‐T Recommendation H.263, 1998.
[12] "Advanced Video Coding for Generic Audiovisual Services," ITU‐T Recommendation H.264 and ISO/IEC 14496‐10 AVC, 2003.
[13] "H.264/AVC Reference Software ver. 11.1," Joint Video Team (JVT) of ISO/IEC MPEG & ITU‐T VCEG, http://iphome.hhi.de/suehring/tml/, August 2006.
[14] K.‐P. Lim, G. Sullivan, and T. Wiegand, "Text Description of Joint Model Reference Encoding Methods and Decoding Concealment Methods," Joint Video Team (JVT) of ISO/IEC MPEG & ITU‐T VCEG Doc. JVT‐X101, July 2007.
[15] Y.‐W. Huang, C.‐Y. Chen, C.‐H. Tsai, C.‐F. Shen, and L.‐G. Chen, "Survey on Block Matching Motion Estimation Algorithms and Architectures with New Results," Journal of VLSI Signal Processing, vol. 42 (3), pp. 297–320, 2006.
[16] J. R. Jain and A. K. Jain, "Displacement Measurement and Its Application in Interframe Image Coding," IEEE Transactions on Communications, vol. 29 (12), pp. 1799–1808, 1981.
[17] T. Koga, K. Iinuma, A. Hirano, Y. Iijima, and T. Ishiguro, "Motion‐Compensated Interframe Coding for Video Conferencing," Proceedings of the National Telecommunications Conference (NTC'81), pp. G5.3.1‐5, 1981.
[18] L. M. Po and W. C. Ma, "A Novel Four‐step Search Algorithm for Fast Block Motion Estimation," IEEE Transactions on Circuits and Systems for Video Technology, vol. 6 (3), pp. 313–7, 1996.
[19] M. Ghanbari, "The Cross‐Search Algorithm for Motion Estimation," IEEE Transactions on Communications, vol. 38 (7), pp. 950–3, 1990.
[20] S. Zhu and K.‐K. Ma, "A New Diamond Search Algorithm for Fast Block Matching Motion Estimation," Proceedings of IEEE International Conference on Information, Communications, and Signal Processing (ICICS’97), pp. 292–296, 1997.
[21] J. Y. Tham, S. Ranganath, M. Ranganath, and A. A. Kassim, "A Novel Unrestricted Center‐Biased Diamond Search Algorithm for Block Motion Estimation," IEEE Transactions on Circuits and Systems for Video Technology, vol. 8 (4), pp. 369‐377, 1998.
[22] C. H. Hsieh, P. C. Lu, J. S. Shyn, and E. H. Lu, "Motion Estimation Algorithm Using Interblock Correlation," IEEE Electronic Letters, vol. 26 (5), pp. 276–277, 1990.
[23] J. Chalidabhongse and C. C. J. Kuo, "Fast Motion Vector Estimation Using Multiresolution‐Spatio‐Temporal Correlations," IEEE Transactions on Circuits and Systems for Video Technology, vol. 7 (3), pp. 477–488, 1997.
[24] A. Zaccarin and B. Liu, "Fast Algorithms for Block Motion Estimation," Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP'92), pp. 449–452, 1992.
[25] B. Liu and A. Zaccarin, "New Fast Algorithms for the Estimation of Block Motion Vectors," IEEE Transactions on Circuits and Systems for Video Technology, vol. 3 (2), pp. 148–157, 1993.
[26] D. Tzovaras, M. G. Strintzis, and H. Sahinoglou, "Evaluation of Multiresolution Block Matching Techniques for Motion and Disparity Estimation," Signal Processing: Image Communication, vol. 6, pp. 56–67, 1994.
[27] W. Li and E. Salari, "Successive Elimination Algorithm for Motion Estimation," IEEE Transactions on Image Processing, vol. 3 (1), pp. 105–107, 1995.
[28] C.‐H. Lee and L.‐H. Chen, "A Fast Motion Estimation Algorithm Based on the Block Sum Pyramid," IEEE Transactions on Image Processing, vol. 6 (11), pp. 1587‐91, 1997.
[29] Y.‐S. Chen, Y.‐P. Hung, and C.‐S. Fuh, "A Fast Block Matching Algorithm Based on the Winner‐Update Strategy," IEEE Transactions on Image Processing, vol. 10 (8), pp. 1212‐22, 2001.
[30] S.‐Y. Choi and S.‐I. Chae, "Hierarchical Motion Estimation in Hadamard Transform Domain," Electronics Letters, vol. 35 (25), pp. 2187‐8, 1999.
[31] M. Brunig and B. Menser, "A Fast Exhaustive Search Algorithm Using Orthogonal Transforms," Proceedings of the 7th International Workshop on Systems, Signals, and Image Processing (IWSSIP'2000), pp. 111‐4, 2000.
[32] S.‐W. Liu, S.‐D. Wei, and S.‐H. Lai, "Winner Update on Walsh‐Hadamard Domain for Fast Motion Estimation," Proceedings of the 18th International Conference on Pattern Recognition (ICPR'06), vol. 3, pp. 794‐797, 2006.
[33] Y. Hel‐Or and H. Hel‐Or, "Real‐Time Pattern Matching Using Projection Kernels," Proceedings of the 9th IEEE International Conference on Computer Vision (ICCV'03), pp. 1486‐93, 2003.
[34] Y. Hel‐Or and H. Hel‐Or, "Real‐Time Pattern Matching Using Projection Kernels," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27 (9), pp. 1430‐ 1445, 2005.
[35] K. G. Beauchamp, Applications of Walsh and Related Functions. London: Academic Press, 1984.
[36] N. Li, C.‐M. Mak, and W.‐K. Cham, "Fast Block Matching Algorithm in Walsh Hadamard Domain," Proceedings of the 7th Asian Conference on Computer Vision (ACCV'06), pp. 712‐721, 2006.
[37] G. Ben‐Artzi, H. Hel‐Or, and Y. Hel‐Or, "Filtering with Gray‐Code Kernels," Proceedings of the 17th International Conference on Pattern Recognition (ICPR'04), vol. 1, pp. 556‐9, 2004.
[38] G. Ben‐Artzi, H. Hel‐Or, and Y. Hel‐Or, "The Gray‐Code Filter Kernels," IEEE Transactions on Pattern Analysis and Machine Intelligence., vol. 29 (3), pp. 382‐393, 2007.
[39] M. Gardner, "The Binary Gray Code," in Knotted Doughnuts and Other Mathematical Entertainments, W. H. Freeman, Ed., 1986, pp. 11‐27.
[40] P. Simard, L. Bottou, P. Haffner, and Y. LeCun, "Boxlets: A Fast Convolution Algorithm for Neural Networks and Signal Processing," Advances in Neural Information Processing Systems, 1999.
[41] D. Knuth, The Art of Computer Programming, 3rd ed. vol. 3: Sorting and Searching. Redwood City, CA: Addison‐Wesley, 1997.
[42] T. Tan, G. Sullivan, and T. Wedi, "Recommended Simulation Common Conditions for Coding Efficiency Experiments," ITU‐T Q.6/SG16, Document VCEG‐AA10d1, October 2005.
[43] G. Bjontegaard, "Calculation of average PSNR differences between RD‐curves," ITU‐T Q.6/SG16, Document VCEG‐M33, April 2001.
[44] "MPEG‐2 Video Test Model 5," ISO/IEC JTC1/SC29/WG11 Document 93/457, 1993.
[45] "Rate Control for Low‐delay Video Communications [H.263 TM8 rate control]," ITU‐T Q6/SG16 Document Q15‐A‐20, 1997.