
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, VOL. 16, NO. 1, JANUARY 2006 3

Kalman Filtering Based Rate-Constrained Motion Estimation for Very Low Bit Rate Video Coding

Chung-Ming Kuo, Shu-Chiang Chung, and Po-Yi Shih

Abstract—Rate-constrained (R-D) motion estimation techniques have been presented to improve the conventional block-matching algorithm by using a joint rate and distortion criterion. This paper presents two motion estimation algorithms that use the Kalman filter to further enhance the performance of conventional R-D motion estimation at a relatively low computational cost. The Kalman filter exploits the correlation of block motion to achieve higher precision in motion estimation and compensation. In the first algorithm, the Kalman filter is used as a postprocessing step to raise the motion compensation accuracy of conventional R-D motion estimation. In the second algorithm, the Kalman filter is embedded into the optimization process of R-D motion estimation by defining a new R-D criterion, which further improves the rate-distortion performance significantly.

Index Terms—Kalman filter, motion model, R-D motion estimation.

I. INTRODUCTION

MOTION estimation plays an important role in video coding systems [1]–[7], such as H.26x and MPEG-x, yielding significant bit rate reduction. Among the various motion estimation approaches, the block-matching algorithm (BMA) is the most popular because of its simplicity and reasonable performance. In BMA, an image frame is divided into nonoverlapping rectangular blocks of equal or variable size, and all pixels in each block are assumed to have the same motion. The motion vector (MV) of a block is estimated by searching for its best match within a search window in the previous frame, using the distortion between the current block and each candidate block as the matching criterion. The resulting MV is used to generate a motion-compensated prediction block. The motion-compensated prediction difference blocks (called residue blocks) and the MVs are encoded and sent to the decoder. In high-quality applications, the bit rate for MVs, $R_{\mathrm{MV}}$, is much less than that for residues, $R_{\mathrm{res}}$; thus, $R_{\mathrm{MV}}$ can be neglected in MV estimation. However, in low or very low bit rate applications such as videoconferencing and videophone, the share of the MV bit rate increases as the overall rate budget decreases, so the coding of MVs takes up a significant portion of the bandwidth [9].

Therefore, in very low bit rate compression, motion compensation must take the assigned MV rate into account. Thus, joint rate and distortion (R-D) optimal motion estimation has been developed to achieve a trade-off between MV coding and residue coding [8]–[16]. In [13], a globally optimal R-D motion estimation scheme is developed. The scheme achieves a significant performance improvement, but it employs the Viterbi algorithm for optimization, which is very complicated and results in a significant time delay. In [14], a locally optimal R-D motion estimation criterion was presented. It effectively reduces the complexity at the cost of performance degradation.

Manuscript received August 3, 2003; revised January 9, 2004. This work was supported by the National Science Council of Taiwan, R.O.C., under Grant NSC 92-2213-E-214-044, and in part by I-Shou University under Grant ISU-93-07-03. This paper was recommended by Associate Editor H. Sun.

C.-M. Kuo and S.-C. Chung are with the Department of Information Engineering, I-Shou University, Kaohsiung 840, Taiwan, R.O.C.

P.-Y. Shih is with VIA Technologies Inc., Hsin-Chu 300, Taiwan, R.O.C.

Digital Object Identifier 10.1109/TCSVT.2005.857287

In this paper, we propose two Kalman filter-based methods to improve conventional R-D motion estimation, which are referred to as the enhanced algorithm and the embedded algorithm, respectively. In the enhanced algorithm, the Kalman filter is employed as a postprocessing step on the MV, which extends the integer-pixel accuracy of the MV to fractional-pixel accuracy, thus enhancing the performance of motion compensation. Because the same Kalman filter exists in both the encoder and the decoder, the method achieves higher compensation quality without increasing the MV bit rate.

In the embedded algorithm, the Kalman filter is applied directly during the optimization process of motion estimation. Since R-D motion estimation considers the compensation error (distortion) and the bit rate simultaneously, applying the Kalman filter reduces the distortion and thus lowers the cost function. Therefore, the new algorithm can improve distortion and bit rate simultaneously.

Moreover, the rate constraint used in this paper is a general criterion for motion estimation. Thus, the proposed approaches can be combined with existing advanced motion estimation algorithms, such as overlapped block motion compensation (OBMC) [17], [18], and those recommended in H.264/MPEG-4 AVC [19], [20].

The paper is organized as follows. A brief review of related work is presented in Section II. In Section III, we derive the enhanced R-D motion estimation based on the Kalman filter. In Section IV, we describe how to embed the Kalman filter into the R-D optimization process of motion estimation. Simulation results are presented in Section V, and conclusions are given in Section VI.

II. REVIEW OF RELATED WORKS

A. Kalman Filter

The Kalman filtering algorithm estimates the states of a system from noisy measurements [21], [22]. A Kalman filter has two major features: its mathematical formulation is described in terms of a state-space representation, and its solution is computed recursively. It consists of two consecutive stages: prediction and updating. We summarize the Kalman filter algorithm as follows.

1051-8215/$20.00 © 2006 IEEE

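State (process) equation

$$\mathbf{x}(k) = \boldsymbol{\Phi}(k-1)\,\mathbf{x}(k-1) + \boldsymbol{\Gamma}(k-1)\,\mathbf{w}(k-1) \tag{1}$$

Measurement equation

$$\mathbf{z}(k) = \mathbf{H}(k)\,\mathbf{x}(k) + \mathbf{v}(k) \tag{2}$$

where $\mathbf{x}(k)$ and $\mathbf{z}(k)$ are the state and measurement vectors at time $k$, and $\boldsymbol{\Phi}(k)$, $\mathbf{H}(k)$, and $\boldsymbol{\Gamma}(k)$ are the state transition, measurement, and driving matrices, respectively. Generally, we assume that $\mathbf{w}(k)$ and $\mathbf{v}(k)$ are white Gaussian with $E[\mathbf{w}(k)\mathbf{w}^T(l)] = \mathbf{Q}(k)\,\delta_{kl}$, $E[\mathbf{v}(k)\mathbf{v}^T(l)] = \mathbf{R}(k)\,\delta_{kl}$, and $E[\mathbf{w}(k)\mathbf{v}^T(l)] = \mathbf{0}$ for all $k$, $l$. Let $\hat{\mathbf{x}}^+(0) = E[\mathbf{x}(0)]$ and $\mathbf{P}^+(0) = E\{[\mathbf{x}(0) - \hat{\mathbf{x}}^+(0)][\mathbf{x}(0) - \hat{\mathbf{x}}^+(0)]^T\}$ be the initial conditions.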

Prediction:

State prediction
$$\hat{\mathbf{x}}^-(k) = \boldsymbol{\Phi}(k-1)\,\hat{\mathbf{x}}^+(k-1) \tag{3}$$

Prediction-error covariance
$$\mathbf{P}^-(k) = \boldsymbol{\Phi}(k-1)\,\mathbf{P}^+(k-1)\,\boldsymbol{\Phi}^T(k-1) + \boldsymbol{\Gamma}(k-1)\,\mathbf{Q}(k-1)\,\boldsymbol{\Gamma}^T(k-1) \tag{4}$$

Updating:

State updating
$$\hat{\mathbf{x}}^+(k) = \hat{\mathbf{x}}^-(k) + \mathbf{K}(k)\left[\mathbf{z}(k) - \mathbf{H}(k)\,\hat{\mathbf{x}}^-(k)\right] \tag{5}$$

Updating-error covariance
$$\mathbf{P}^+(k) = \left[\mathbf{I} - \mathbf{K}(k)\,\mathbf{H}(k)\right]\mathbf{P}^-(k) \tag{6}$$

Kalman gain matrix
$$\mathbf{K}(k) = \mathbf{P}^-(k)\,\mathbf{H}^T(k)\left[\mathbf{H}(k)\,\mathbf{P}^-(k)\,\mathbf{H}^T(k) + \mathbf{R}(k)\right]^{-1} \tag{7}$$

The matrix $\mathbf{P}(k)$ is the error covariance matrix associated with the state estimate $\hat{\mathbf{x}}(k)$. It is defined as

$$\mathbf{P}(k) = E\left\{\left[\mathbf{x}(k) - \hat{\mathbf{x}}(k)\right]\left[\mathbf{x}(k) - \hat{\mathbf{x}}(k)\right]^T\right\} \tag{8}$$

This matrix provides a statistical measure of the uncertainty in $\hat{\mathbf{x}}(k)$. The superscripts "−" and "+" denote "before" and "after" measurement, respectively.

B. R-D Motion Estimation

In conventional motion estimation, the main goal is to reduce the motion-compensated prediction error so that the coding rate for the prediction error can be reduced. This is appropriate for high-rate applications, because the bit rate for MVs, $R_{\mathrm{MV}}$, is only a very small part of the total transmission rate. However, at low or very low bit rates, $R_{\mathrm{MV}}$ is a significant part of the available rate budget. For this reason, $R_{\mathrm{MV}}$ should be taken into account during motion estimation, and the motion estimation criterion must be modified accordingly.

In 1994, Bernd Girod first addressed this problem. He proposed a theoretical framework for rate-constrained motion estimation, together with a new region-based motion estimation scheme [12].

Fig. 1. Generic hybrid video coding system.

In motion-compensated hybrid coding, the bit rate can be divided among the displacement vector field, the prediction error, and additional side information. At low or very low bit rates, very accurate motion compensation is not the key to better picture quality; the real problem is how to optimally allocate a limited rate budget between the displacement vector field and the motion-compensated prediction error.

In 1998, Chen and Willson revisited this point and analyzed the issue thoroughly [13]. They explained a new estimation criterion in detail and proposed a new rate-constrained motion estimation method for general video coding systems. Because the performance of video compression depends not only on motion compensation but also on the rate budget, which includes the bit rate for MVs and the bit rate for the prediction error, the optimal solution can be searched for over the convex hull of all possible R-D pairs by minimizing the total Lagrangian cost function

$$J = \sum_{i}\left[D_i(\mathbf{d}_i, Q_i) + \lambda\left(R_{\mathrm{MV},i}(\mathbf{d}_i) + R_{\mathrm{res},i}(\mathbf{d}_i, Q_i)\right)\right] \tag{9}$$

where $\mathbf{d}_i$ and $Q_i$ are the MV and the quantization parameter of block $i$, respectively, and $\lambda$ is the Lagrange multiplier. This approach, however, is computationally intensive, because it involves a joint optimization of motion estimation/compensation and prediction-residual coding. From (9), the discrete cosine transform (DCT) and quantization operations must be performed for every MV candidate in order to obtain $D_i(\mathbf{d}_i, Q_i)$ and $R_{\mathrm{res},i}(\mathbf{d}_i, Q_i)$. This heavy computation makes the scheme impractical for most software or hardware implementations. Thus, Chen and Willson simplify (9) by considering only the motion estimation error and the bit rate for MVs.

Fig. 2. Block diagram of the proposed enhanced R-D motion estimation algorithm.

Assume a frame is partitioned into $N$ blocks. Let $\mathbf{d}_i$ be the MV estimated for block $i$, $i = 1, \ldots, N$. Then the motion field of a frame is described by the $N$-tuple vector $\mathbf{F} = (\mathbf{d}_1, \mathbf{d}_2, \ldots, \mathbf{d}_N)$. The joint R-D optimization can be interpreted as finding the MV field that minimizes the distortion under a given rate constraint, which can be formulated by the Lagrange multiplier method as

$$\mathbf{F}^* = \arg\min_{\mathbf{F}} \sum_{i=1}^{N}\left[D_i(\mathbf{d}_i) + \lambda R_i(\mathbf{d}_i)\right] \tag{10}$$

where $\lambda$ is the Lagrange multiplier, and $D_i$ and $R_i$ are the motion-compensated distortion and the number of bits associated with the MV of block $i$, respectively. In most video coding standards, the MVs are differentially coded using Huffman codes, so the rate of each MV depends on its neighbors and the blocks are coded dependently. However, this simplification has two evident defects: 1) it is still too complex, and 2) its performance is degraded.
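When the rate of each MV depends on a neighboring MV, (10) couples the blocks, and an exact minimizer can be found by dynamic programming, similar in spirit to the Viterbi optimization attributed to [13] in the Introduction. The Python sketch below is an illustration under a stated assumption, not the scheme of [13]: the blocks form a 1-D chain (for example, one row) in which each MV is coded against the MV of the previous block only. The distortion table `D`, the candidate list, and the toy `l1_rate` model are hypothetical inputs rather than the H.263 setup.

```python
def viterbi_rd(D, candidates, rate, lam):
    """Exactly minimize sum_i [D[i][j_i] + lam * rate(c_{j_i}, c_{j_{i-1}})] over a chain.

    D[i][j]:    distortion of block i when it uses candidates[j] (hypothetical input)
    rate(a, b): bits for MV a coded against predictor MV b (b is None for the first block)
    Returns (best_cost, chosen candidate index for each block).
    """
    n, m = len(D), len(candidates)
    # cost[j]: best cumulative cost of blocks 0..i given that block i uses candidate j
    cost = [D[0][j] + lam * rate(candidates[j], None) for j in range(m)]
    back = []  # back[i - 1][j]: best candidate of block i - 1 given candidate j at block i
    for i in range(1, n):
        new_cost, ptr = [], []
        for j in range(m):
            # the rate term couples the previous MV and the current MV
            trans = [cost[q] + lam * rate(candidates[j], candidates[q]) for q in range(m)]
            q_best = min(range(m), key=trans.__getitem__)
            new_cost.append(trans[q_best] + D[i][j])
            ptr.append(q_best)
        cost = new_cost
        back.append(ptr)
    # backtrack from the cheapest final state
    j = min(range(m), key=cost.__getitem__)
    best, path = cost[j], [j]
    for ptr in reversed(back):
        j = ptr[j]
        path.append(j)
    return best, path[::-1]


def l1_rate(mv, pred):
    """Toy rate model (assumption): |dx - px| + |dy - py| bits, predictor (0, 0) for block 0."""
    pred = (0, 0) if pred is None else pred
    return sum(abs(a - b) for a, b in zip(mv, pred))


cands = [(0, 0), (1, 0), (0, 1)]
D = [[5, 1, 4], [3, 2, 0], [1, 4, 4]]
best, path = viterbi_rd(D, cands, l1_rate, lam=1.0)
print(best, path)  # prints 6.0 [1, 1, 0]
```

The search visits every (previous, current) candidate pair once per block, so its cost grows with the square of the number of candidates; this is the kind of complexity that motivates the per-block simplification discussed next.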

In the same year, Coban and Mersereau proposed a different scheme for the R-D optimization problem [14]. They noted that (10) gives the globally optimal solution of the R-D problem but is difficult to implement. They observed that if each block is coded independently, the solution of (10) reduces to minimizing the Lagrangian cost function of each block, i.e.,

$$\mathbf{d}_i^* = \arg\min_{\mathbf{d}_i}\left[D_i(\mathbf{d}_i) + \lambda R_i(\mathbf{d}_i)\right], \quad i = 1, \ldots, N. \tag{11}$$

To simplify the problem, although the MVs are coded differentially, the blocks are treated as if they were coded independently. This leads to a locally optimal but globally suboptimal solution. In this way, the framework of R-D optimal motion estimation becomes close to that of conventional motion estimation. Although this approach saves computation by ignoring the dependence between blocks, it reduces the overall performance.
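The per-block criterion (11) amounts to an ordinary block search with a Lagrangian cost. The following is a minimal NumPy sketch under stated assumptions: SAD is used as the distortion, and the MV rate is approximated by the signed Exp-Golomb code length of the MV difference from a given predictor (as used for syntax elements in H.264), which stands in for the H.263 Huffman table used in the paper. The function names and parameters are hypothetical.

```python
import numpy as np


def exp_golomb_bits(v: int) -> int:
    """Length of the signed Exp-Golomb codeword for integer v (stand-in MV rate model)."""
    code_num = 2 * v - 1 if v > 0 else -2 * v   # 0, 1, -1, 2, -2, ... -> 0, 1, 2, 3, 4, ...
    return 2 * ((code_num + 1).bit_length() - 1) + 1


def mv_bits(mv, pred) -> int:
    """Approximate bits for an MV coded differentially against the predictor pred."""
    return sum(exp_golomb_bits(int(a - b)) for a, b in zip(mv, pred))


def rd_block_search(cur, ref, top, left, bsize, srange, pred, lam):
    """Full search minimizing J = SAD + lam * R(mv - pred) for one block, in the spirit of (11).

    The block at (top, left) in cur is matched against ref; an MV (dx, dy) points to
    ref[top + dy : top + dy + bsize, left + dx : left + dx + bsize].
    Returns (cost, (dx, dy)) of the best candidate.
    """
    block = cur[top:top + bsize, left:left + bsize].astype(np.int64)
    best = None
    for dy in range(-srange, srange + 1):
        for dx in range(-srange, srange + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + bsize > ref.shape[0] or x + bsize > ref.shape[1]:
                continue  # candidate block would fall outside the reference frame
            cand = ref[y:y + bsize, x:x + bsize].astype(np.int64)
            sad = int(np.abs(block - cand).sum())
            cost = sad + lam * mv_bits((dx, dy), pred)
            if best is None or cost < best[0]:
                best = (cost, (dx, dy))
    return best
```

For example, if `cur = np.roll(ref, shift=(-2, 1), axis=(0, 1))`, an interior block of `cur` matches `ref` displaced by (dx, dy) = (−1, 2); with `lam=0` the search returns that MV, whereas a very large `lam` pulls the choice back to the predictor. Each block is decided on its own, which is exactly why the result is only locally optimal.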

III. ENHANCED R-D MOTION ESTIMATION USING KALMAN FILTER

R-D motion estimation often yields smoother MV fields than conventional BMAs [13], [14]. In other words, the resulting MVs are highly correlated. In this work, we try to fully exploit this correlation of MVs by using the Kalman filter. This is motivated by our previous works [23], [24], in which the Kalman filter is combined with conventional BMAs to improve the estimation accuracy of MVs.

Fig. 3. Block diagram of the proposed embedded R-D motion estimation algorithm.

A generic hybrid video coding system [1] is depicted in Fig. 1. Fig. 2 shows the block diagram of the proposed motion estimation technique, which consists of two cascaded stages: measurement of the MV and Kalman filtering. We employ an R-D fast search scheme [9]–[11], [13]–[15] to obtain the measured MV. We then model the MVs and generate the predicted MV by exploiting the interblock correlation. Based on the measured and predicted MVs, a Kalman filter is applied to obtain an optimal estimate of the MV.

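For simplicity of implementation, we employ a first-order autoregressive (AR) model to characterize the MV correlation. The MV of the block at location $(m,n)$ of the $t$-th frame is denoted by $\mathbf{d}(m,n;t) = [d_x(m,n;t),\, d_y(m,n;t)]^T$, and its horizontal and vertical components are modeled as

$$d_x(m,n;t) = a_x\, d_x(m,n-1;t) + w_x(m,n-1;t) \tag{12}$$

$$d_y(m,n;t) = a_y\, d_y(m,n-1;t) + w_y(m,n-1;t) \tag{13}$$

where $a_x$ and $a_y$ are the model coefficients, and $w_x$ and $w_y$ represent the model error components. To derive the state-space representation, the time indexes $k$ and $k-1$ are used to represent the current block location $(m,n)$ and the left-neighbor block location $(m,n-1)$, respectively. Consequently, the state-space representation of (12) and (13) is

$$\begin{bmatrix} d_x(k) \\ d_y(k) \end{bmatrix} = \begin{bmatrix} a_x & 0 \\ 0 & a_y \end{bmatrix}\begin{bmatrix} d_x(k-1) \\ d_y(k-1) \end{bmatrix} + \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} w_x(k-1) \\ w_y(k-1) \end{bmatrix} \tag{14}$$

or

$$\mathbf{x}(k) = \boldsymbol{\Phi}\,\mathbf{x}(k-1) + \boldsymbol{\Gamma}\,\mathbf{w}(k-1) \tag{15}$$

where we let $\mathbf{x}(k) = [d_x(k),\, d_y(k)]^T$, $\boldsymbol{\Phi} = \mathrm{diag}(a_x, a_y)$, $\boldsymbol{\Gamma} = \mathbf{I}$, and $\mathbf{w}(k-1) = [w_x(k-1),\, w_y(k-1)]^T$. The error components $w_x$ and $w_y$ are assumed to be Gaussian with zero mean and the same variance $\sigma_w^2$. The measurement equations for the horizontal and vertical directions are expressed by

$$\mathbf{z}(k) = \begin{bmatrix} z_x(k) \\ z_y(k) \end{bmatrix} = \mathbf{H}\,\mathbf{x}(k) + \mathbf{v}(k), \qquad \mathbf{H} = \mathbf{I} \tag{16}$$

where $\mathbf{v}(k) = [v_x(k),\, v_y(k)]^T$, and $v_x(k)$ and $v_y(k)$ denote the two measurement error components, with the same variance $\sigma_v^2$.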

In general, the model error and the measurement error may be colored noises. Each colored noise can be modeled by a low-order difference equation excited by white Gaussian noise, and the states associated with the colored-noise models can be augmented to the original state-space representation; the recursive filter is then applied to the augmented system. However, this procedure requires considerable computation and is not suitable for our application. Moreover, in this paper, the measurements are obtained by the R-D fast search algorithm [14], in which the blocks are processed independently. Thus, we can assume that the measurement errors are independent. For simplicity, the model error and the measurement error are assumed to be zero-mean Gaussian with variances $\sigma_w^2$ and $\sigma_v^2$, respectively.

TABLE I
COMPARISONS OF COMPRESSION PERFORMANCE, IN TERMS OF PSNR, OVERALL BIT RATE, AND MV BIT RATE, FOR VARIOUS MOTION ESTIMATION ALGORITHMS USING THE CIF-CLAIRE SEQUENCE (100 FRAMES AT 15 FRAMES/S)

TABLE II
COMPARISONS OF COMPRESSION PERFORMANCE, IN TERMS OF PSNR, OVERALL BIT RATE, AND MV BIT RATE, FOR VARIOUS MOTION ESTIMATION ALGORITHMS USING THE CIF-SALESMAN SEQUENCE (100 FRAMES AT 15 FRAMES/S)

TABLE III
COMPARISONS OF COMPRESSION PERFORMANCE, IN TERMS OF PSNR, OVERALL BIT RATE, AND MV BIT RATE, FOR VARIOUS MOTION ESTIMATION ALGORITHMS USING THE QCIF-FOREMAN SEQUENCE (100 FRAMES AT 10 FRAMES/S)

In the above equations, the measurement matrix $\mathbf{H}$ is constant, and the state transition matrix $\boldsymbol{\Phi}$ can be estimated by the least-squares method. Since the motion field in low bit rate applications is rather smooth, we assume that the model coefficients $a_x$ and $a_y$ have fixed values.
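The least-squares estimate mentioned above has a closed form for the first-order AR model: minimizing the squared model errors of (12) over all left-neighbor pairs gives $a_x = \sum d_x(m,n)\,d_x(m,n-1) / \sum d_x(m,n-1)^2$, and likewise for $a_y$. A minimal NumPy sketch, assuming the MV field of one frame is stored as an array of shape (rows, cols, 2); the function name and the fallback value for an all-zero field are our assumptions.

```python
import numpy as np


def ls_ar1_coeffs(mv_field: np.ndarray) -> tuple[float, float]:
    """Least-squares AR(1) coefficients (a_x, a_y) from left-neighbor MV pairs.

    mv_field[m, n] = (d_x, d_y) for the block in row m, column n (hypothetical layout).
    """
    cur = mv_field[:, 1:, :].astype(float)    # d(m, n) for n >= 1
    left = mv_field[:, :-1, :].astype(float)  # d(m, n - 1), its left neighbor
    num = (cur * left).sum(axis=(0, 1))       # sum of d(m, n) * d(m, n - 1) per component
    den = (left * left).sum(axis=(0, 1))      # sum of d(m, n - 1)^2 per component
    # if a component is identically zero the coefficient is undetermined; fall back to 1.0
    a = np.where(den > 0, num / np.where(den > 0, den, 1.0), 1.0)
    return float(a[0]), float(a[1])
```

Fixing the coefficients, as the paper does, avoids this per-frame estimation and keeps the encoder and decoder filters identical without transmitting the coefficients.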

The proposed algorithm is summarized as follows.

Step 1) Measure MV. Measure the MV of a moving block, $\mathbf{z}(k) = [z_x(k),\, z_y(k)]^T$, by any R-D search algorithm [9]–[11], [13]–[15]. Encode the MV using the H.263 Huffman table [6], [7].

Step 2) Kalman filtering.
a) The predicted MV is obtained by
$$\hat{\mathbf{x}}^-(k) = \boldsymbol{\Phi}\,\hat{\mathbf{x}}^+(k-1).$$
b) Calculate the prediction-error covariance by
$$\mathbf{P}^-(k) = \boldsymbol{\Phi}\,\mathbf{P}^+(k-1)\,\boldsymbol{\Phi}^T + \sigma_w^2\,\boldsymbol{\Gamma}\boldsymbol{\Gamma}^T.$$
c) Obtain the Kalman gain by
$$\mathbf{K}(k) = \mathbf{P}^-(k)\,\mathbf{H}^T\left[\mathbf{H}\,\mathbf{P}^-(k)\,\mathbf{H}^T + \sigma_v^2\,\mathbf{I}\right]^{-1}.$$
d) The MV estimate is updated by
$$\hat{\mathbf{x}}^+(k) = \hat{\mathbf{x}}^-(k) + \mathbf{K}(k)\left[\mathbf{z}(k) - \mathbf{H}\,\hat{\mathbf{x}}^-(k)\right].$$
This is the final estimate output.
e) Calculate the filtering-error covariance by
$$\mathbf{P}^+(k) = \left[\mathbf{I} - \mathbf{K}(k)\,\mathbf{H}\right]\mathbf{P}^-(k).$$

Step 3) Go to Step 1 for the next block.

TABLE IV
COMPARISONS OF COMPRESSION PERFORMANCE, IN TERMS OF PSNR, OVERALL BIT RATE, AND MV BIT RATE, FOR VARIOUS MOTION ESTIMATION ALGORITHMS USING THE QCIF-MOTHER & DAUGHTER SEQUENCE (120 FRAMES AT 10 FRAMES/S)

TABLE V
COMPARISONS OF COMPRESSION PERFORMANCE, IN TERMS OF PSNR, OVERALL BIT RATE, AND MV BIT RATE FOR VARIOUS

TABLE VI
COMPARISONS OF COMPRESSION PERFORMANCE, IN TERMS OF PSNR, OVERALL BIT RATE, AND MV BIT RATE FOR VARIOUS

In the above algorithm, the optimal estimate $\hat{\mathbf{x}}^+(k)$ is usually real-valued, thus yielding an estimate with fractional-pixel accuracy. A conventional BMA can also obtain a fractional-pixel MV by increasing the resolution through interpolation and matching the higher-resolution data on the new sampling grid. However, this not only increases the computational complexity significantly but also raises the overhead bit rate for the MV. In contrast, the computational overhead of the Kalman filter is much lower than that of a conventional BMA with fractional-pixel matching. In addition, by using the same Kalman filter as the encoder, the decoder can estimate the fractional part of the MV from the received integer MV. In summary, the new method achieves fractional-pixel performance with the same MV bit rate as an integer-search BMA, at the cost of a small increase in computational load at the decoder. A detailed analysis of the computational complexity is given in Section V. Furthermore, because the Kalman filter is independent of the motion estimation, it can be combined with any existing R-D motion estimation scheme to improve its performance.
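Because Φ, Γ, and H are identity matrices in this application (as stated in the complexity analysis of Section V) and both components share the same noise variances, Step 2 collapses to a scalar recursion whose gain is common to the x and y components. The sketch below filters one row of integer MVs from left to right. The class name, the default values of `sigma_w2`, `sigma_v2`, `p0`, and `x0`, and the choice to restart the filter at the beginning of each row are assumptions, not the experimentally chosen settings of the paper.

```python
from dataclasses import dataclass


@dataclass
class MVKalman:
    """Kalman filter for block MVs with Phi = Gamma = H = I (Step 2, Section III).

    With equal variances for x and y and a diagonal P+(0) = p0 * I, the covariance and
    gain are the same for both components, so a single scalar p is propagated.
    All default values are placeholders, not the paper's experimentally chosen ones.
    """
    sigma_w2: float = 1.0          # model error variance sigma_w^2
    sigma_v2: float = 2.0          # measurement error variance sigma_v^2
    p0: float = 10.0               # initial error covariance, P+(0) = p0 * I
    x0: tuple = (0.0, 0.0)         # initial state estimate x+(0)

    def filter_row(self, measured):
        """Filter the integer MVs z(k) of one block row (left to right)."""
        x, p = list(self.x0), self.p0
        estimates = []
        for z in measured:
            # a) x-(k) = x+(k-1): nothing to compute when Phi = I
            p_minus = p + self.sigma_w2                          # b) P-(k)
            gain = p_minus / (p_minus + self.sigma_v2)           # c) K(k), shared by x and y
            x = [xi + gain * (zi - xi) for xi, zi in zip(x, z)]  # d) x+(k)
            p = (1.0 - gain) * p_minus                           # e) P+(k)
            estimates.append(tuple(x))
        return estimates


row = [(2, 0), (3, -1), (2, 0), (3, 0)]      # example integer MVs from an R-D search
encoder_mvs = MVKalman().filter_row(row)      # fractional MVs used for compensation
decoder_mvs = MVKalman().filter_row(row)      # decoder repeats the filter on received MVs
assert encoder_mvs == decoder_mvs
```

The final assertion illustrates the design point of the enhanced algorithm: only the integer MVs are transmitted, and because the recursion is deterministic, the decoder reconstructs the same fractional MVs bit for bit.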

IV. KALMAN FILTER EMBEDDED R-D MOTION ESTIMATION

The main feature of the above enhanced scheme is that it obtains fractional-pixel MV accuracy through estimation instead of actual searching. Hence, no extra bit rate is needed for the fractional part of the MV. However, because the enhanced algorithm does not take part in the estimation process of the MV, the obtained MV is not optimal from the viewpoint of distortion. To address this problem, we develop a new R-D motion estimation method in which the Kalman filter is included in the estimation process of the R-D scheme. We refer to it as Kalman filter embedded R-D motion estimation and describe the details in the following.

TABLE VII
COMPARISONS OF COMPRESSION PERFORMANCE, IN TERMS OF PSNR, OVERALL BIT RATE, AND MV BIT RATE, FOR VARIOUS MOTION ESTIMATION ALGORITHMS USING THE QCIF-CARPHONE SEQUENCE (120 FRAMES AT 10 FRAMES/S)

TABLE VIII
COMPARISONS OF COMPRESSION PERFORMANCE, IN TERMS OF PSNR, OVERALL BIT RATE, AND MV BIT RATE, FOR VARIOUS MOTION ESTIMATION ALGORITHMS USING THE QCIF-CARPHONE SEQUENCE (120 FRAMES AT 10 FRAMES/S)

TABLE IX
EXTRA COMPUTATION REQUIRED BY KALMAN FILTERING FOR EACH ALGORITHM

The cost function of Kalman filter embedded R-D motion estimation can be formulated as

$$J\big(\mathbf{z}(k)\big) = D\big(\hat{\mathbf{x}}^+(k)\big) + \lambda R\big(\mathbf{z}(k)\big) \tag{17}$$

where $\mathbf{z}(k)$ is a candidate integer MV and $\hat{\mathbf{x}}^+(k)$ is its Kalman-filtered estimate. The term $D(\hat{\mathbf{x}}^+(k))$ is the distortion of Kalman filter-based motion compensation: the integer-point MV $\mathbf{z}(k)$ is Kalman filtered, and the resulting floating-point MV $\hat{\mathbf{x}}^+(k)$ is used to generate the motion-compensated prediction. In this case, the MV is represented with integer precision, but it produces motion compensation with fractional-pixel accuracy. Therefore, the bit rate assigned to the MV, $R(\mathbf{z}(k))$, is not affected by the Kalman filter, but the total cost function is reduced because of the increased compensation accuracy. Fig. 3 is the block diagram of the embedded algorithm. For simplicity, we select (11) as the criterion for motion estimation.

The Kalman filter embedded R-D motion estimation algo-rithm is summarized as follows.

Step 1) Kalman filter-based motion estimation.
a) Select a location in the search range and denote it as a candidate measurement of the MV, $\mathbf{z}(k)$.
b) Apply the Kalman filter to $\mathbf{z}(k)$ using the procedure of Step 2 in the previous section. We then obtain an optimal estimate of the MV, $\hat{\mathbf{x}}^+(k)$, which has fractional accuracy. Calculate the distortion $D(\hat{\mathbf{x}}^+(k))$ according to $\hat{\mathbf{x}}^+(k)$.

Step 2) Calculate the bit rate of the MV, $R(\mathbf{z}(k))$, according to the H.263 Huffman table [6], [7]. Notice that the transmitted MV is $\mathbf{z}(k)$, which is an integer; thus, the required MV bit rate is not affected by the Kalman filter.

Step 3) Calculate the cost function using (17). If the best match (the candidate with the minimum cost) is found, go to Step 4; otherwise, go back to Step 1 to select the next location for estimation.

Step 4) Go to Step 1 for the next block.
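Steps 1–3 can be sketched as a search in which every integer candidate is first Kalman filtered, the distortion is measured at the resulting fractional position by bilinear interpolation, and the rate is charged for the integer candidate only. This is a minimal NumPy sketch under assumptions: Φ = Γ = H = I with a shared scalar covariance, SAD as the distortion, a caller-supplied `rate` function standing in for the H.263 Huffman table, and candidates whose interpolation window leaves the frame are skipped. All names are hypothetical. Because the gain does not depend on the candidate, it is computed once per block, and the returned covariance is the one the next block should start from.

```python
import numpy as np


def bilinear_block(ref, top, left, bsize, dx, dy):
    """bsize x bsize block of ref at (top + dy, left + dx), bilinearly interpolated."""
    y0, x0 = int(np.floor(top + dy)), int(np.floor(left + dx))
    fy, fx = top + dy - y0, left + dx - x0                     # fractional offsets in [0, 1)
    a = ref[y0:y0 + bsize + 1, x0:x0 + bsize + 1].astype(float)
    return ((1 - fy) * (1 - fx) * a[:-1, :-1] + (1 - fy) * fx * a[:-1, 1:]
            + fy * (1 - fx) * a[1:, :-1] + fy * fx * a[1:, 1:])


def embedded_kf_search(cur, ref, top, left, bsize, srange, x_prev, p_prev,
                       sigma_w2, sigma_v2, lam, rate):
    """Search minimizing J(z) = D(x+(z)) + lam * R(z) in the spirit of (17).

    x_prev, p_prev: filtered MV and scalar covariance of the left-neighbor block.
    rate(z): caller-supplied bit count for the integer candidate z (hypothetical).
    Returns (cost, z, x_plus, p_plus) for the best candidate.
    """
    block = cur[top:top + bsize, left:left + bsize].astype(float)
    p_minus = p_prev + sigma_w2                    # prediction with Phi = Gamma = I
    gain = p_minus / (p_minus + sigma_v2)          # the gain does not depend on z
    best = None
    for dy in range(-srange, srange + 1):
        for dx in range(-srange, srange + 1):
            # Kalman update of the candidate measurement z = (dx, dy)
            xp = (x_prev[0] + gain * (dx - x_prev[0]), x_prev[1] + gain * (dy - x_prev[1]))
            ry, rx = np.floor(top + xp[1]), np.floor(left + xp[0])
            if ry < 0 or rx < 0 or ry + bsize + 1 > ref.shape[0] or rx + bsize + 1 > ref.shape[1]:
                continue                           # interpolation window leaves the frame
            dist = np.abs(block - bilinear_block(ref, top, left, bsize, xp[0], xp[1])).sum()
            cost = dist + lam * rate((dx, dy))     # rate is charged for the integer z only
            if best is None or cost < best[0]:
                best = (cost, (dx, dy), xp)
    if best is None:
        raise ValueError("no candidate keeps the interpolation window inside the frame")
    cost, z, xp = best
    return cost, z, xp, (1.0 - gain) * p_minus     # P+(k) after accepting z
```

Compared with the enhanced algorithm, the filter update and an interpolated block match are now performed at every search location rather than once per block, which is where the extra encoder complexity of the embedded method comes from.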

In the enhanced algorithm, the Kalman filter is not applied during the block search. It is used only to enhance the performance once the MV has been obtained by R-D motion estimation. Therefore, the Kalman filter can be viewed as a postprocessing step of motion estimation. In the embedded algorithm, however, the Kalman filter is applied at every search location within the joint R-D criterion. Thus, it can be considered a new R-D motion estimation approach. Since it includes the Kalman filter in the optimization process, the embedded method performs better than the enhanced version at the cost of higher computational complexity.

Fig. 4. Comparisons of PSNR performance using the QCIF-Foreman sequence, 120 frames at 10 frames/s, fixed coding bit rate at 2000 bits. Block size = 8 × 8, search range = [−15, 16].

V. SIMULATION RESULTS

The performance of the proposed R-D motion estimation with Kalman filter (RD-Kalman) was evaluated using a set of standard image sequences, including Foreman, Mother and Daughter, Carphone, Salesman, and Claire. All sequences have CIF (352 × 288) or QCIF (176 × 144) resolution and a frame rate of 10 Hz. Since RD-Kalman motion estimation has fractional-pel accuracy, the results are compared with the conventional RD algorithm and the MSE-optimal scheme at both integer and half-pixel accuracy. A block size of 16 × 16 and a search range of 64 × 64 were chosen for CIF format, and a block size of 16 × 16 (or 8 × 8) and a search range of 31 × 31 for QCIF format. The conventional RD and RD-Kalman methods adopted the same motion estimation strategy as in [13]. Specifically, for the current block, the MVs of the left-neighbor and upper-neighbor blocks, together with the MV obtained with the MSE criterion, were selected as the predicted search centers, and then a small 3 × 3 search was performed.

For the KF-based motion estimation, the parameters, namely, the model coefficients $a_x$ and $a_y$, the model error variance $\sigma_w^2$, the measurement error variance $\sigma_v^2$, the initial error covariance $\mathbf{P}^+(0)$, and the initial state $\hat{\mathbf{x}}^+(0)$, were chosen experimentally. As is evident from [19], the estimated MVs are real values rather than integers, so the displaced pixels may not lie on the sampling grid. Therefore, the well-known bilinear interpolation is adopted to generate the motion-compensated prediction frame. A Huffman codebook adopted from the H.263 standard was used to code the 2-D differentially coded MVs. The various algorithms were compared in terms of R-D performance. The common peak signal-to-noise ratio (PSNR), defined as

$$\mathrm{PSNR} = 10\log_{10}\frac{255^2}{\mathrm{MSE}}\ \text{(dB)} \tag{18}$$

where MSE is the mean squared error between the original and the reconstructed frame, was selected to evaluate distortion performance.

Moreover, rate performance was evaluated by the number of bits required to encode an image frame or a motion field.
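Equation (18) in code form, as a small sketch that assumes 8-bit frames stored as NumPy arrays; the function name is ours. For example, a reconstruction that is off by exactly one gray level everywhere has MSE = 1 and therefore PSNR = 10 log10(255²) ≈ 48.13 dB.

```python
import numpy as np


def psnr(original: np.ndarray, reconstructed: np.ndarray, peak: float = 255.0) -> float:
    """PSNR in dB as in (18); peak = 255 assumes 8-bit samples."""
    diff = original.astype(float) - reconstructed.astype(float)  # avoid uint8 wraparound
    mse = np.mean(diff ** 2)
    if mse == 0:
        return float("inf")  # identical frames
    return float(10.0 * np.log10(peak ** 2 / mse))
```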

The Lagrange multiplier $\lambda$, which controls the overall performance in the R-D sense, is a very important parameter. Generally, an iterative method is needed to determine the value of $\lambda$; however, this is computationally very expensive. As pointed out in [14], for typical video coding applications $\lambda$ is insensitive to different frames of a video sequence; thus, a constant $\lambda = 20$ is adopted in our simulations.

The simulation was carried out by incorporating the various motion estimation algorithms into an H.263-based motion-compensated DCT video coding system. To make the comparisons fair, we fixed the overall coding bit rate at 4000 bits per frame for CIF-Claire and CIF-Salesman. For QCIF format, two block sizes were tested for each sequence, each with a different bit rate per frame. The preset bit rates are 2000 bits (8 × 8) and 1600 bits (16 × 16) for Foreman, 1400 bits and 1000 bits for Mother & Daughter, and 2000 bits and 1400 bits for Carphone, respectively.

The averaged results for 100 frames of the CIF sequences are summarized in Tables I and II. The Kalman-based R-D motion estimation approach outperformed the MSE-optimal and conventional RD algorithms in terms of PSNR. Since the Kalman filter provides fractional-pel accuracy at the rate of integer MVs, it achieves significant PSNR improvement, as expected. When the integer-based Kalman filter is compared with the half-pixel motion estimation methods, it still achieves better PSNR, although the gain is less significant. We found that the Kalman filter with half-pixel accuracy performs only slightly better than that with integer-pixel accuracy. This may be due to the limitation of bilinear interpolation; i.e., the accuracy improvement saturates when too many interpolations are performed. The performance may be further enhanced with advanced interpolation filters [25], [26]; however, this is not a major issue in this paper.

Fig. 5. Comparisons of PSNR performance using the QCIF-Foreman sequence, 120 frames at 10 frames/s, fixed coding bit rate at 2000 bits. Block size = 16 × 16, search range = [−31, 32].

Fig. 6. Comparisons of PSNR performance using the QCIF-Mother & Daughter sequence, 200 frames at 10 frames/s, fixed coding bit rate at 2000 bits. Block size = 8 × 8, search range = [−15, 16].

At the same bit rate level and integer-pixel accuracy, the enhanced algorithm achieved an average gain of 1.23 dB over MSE-optimal and 0.34 dB over the conventional RD. The embedded version achieved an average gain of 1.77 dB over MSE-optimal and 0.88 dB over the conventional RD. Note that the new methods also have lower bit rates. Tables III–VIII summarize the average results for the QCIF sequences. For both block sizes of 16 × 16 and 8 × 8, the Kalman filter-based R-D motion estimation approaches achieve significant PSNR improvement. In particular, the embedded Kalman R-D algorithm achieves the best performance because it reduces both the MV rate and the compensation distortion.

Fig. 7. Comparisons of PSNR performance using the QCIF-Mother and Daughter sequence, 200 frames at 10 frames/s, fixed coding bit rate at 1400 bits. Block size = 16 × 16, search range = [−31, 32].

Fig. 8. Comparisons of PSNR performance using the QCIF-Carphone sequence, 120 frames at 10 frames/s, fixed coding bit rate at 2000 bits. Block size = 8 × 8, search range = [−15, 16].

Figs. 4–9 compare the MSE-optimal, conventional R-D, enhanced Kalman R-D, and embedded Kalman R-D schemes, with both integer and half-pixel accuracy, in terms of PSNR at an approximately fixed bit rate for each sequence. Figs. 10–12 compare these algorithms in terms of bit rate at an approximately fixed PSNR for each sequence. As expected, the results indicate that the proposed schemes achieve better R-D performance.

The MV fields generated by the various algorithms are shown in Figs. 13–17. The test sequences mainly contain small rotations and camera panning. As expected, the proposed algorithms produce smoother motion fields because of the filtering effect of the Kalman filter.

Analysis of Computational Complexity: Consider the casethat a block size is , the maximum displacement is p, andthe matching criterion is mean absolute difference (MAD), i.e.,


Fig. 9. Comparisons of PSNR performance using the QCIF-Carphone sequence, 120 frames at 10 frames/s, fixed coding bit rate at 1400 bits. Block size =16 � 16, search range = [�31; 32].

Fig. 10. Comparisons of bit rate performance using the CIF-Salesman sequence, 120 frames at 10 frames/s, fixed average PSNR full search at 39.93 dB, halfpixel search at 40.01 dB, half pixel RD at 39.92 dB, half pixel RD with KF(En) at 39.95 dB, half pixel RD with KF(Em) at 40.05 dB, RD-Optimal at 39.88 dB,RD with KF(En) at 39.95 dB, and RD with KF(Em) at 39.98 dB.

where is the pixel intensity at the location ofblock in the current frame , and

is the pixel intensity at the location with the dis-placement in the previous frame . The full search algo-rithm (FSA) requires search locations. Each search lo-cation corresponds to one MAD computation, which consists of

additions and absolute operations. Therefore, thetotal computation is additions and

absolute operations.In general, the Kalman filtering is computationally expensive.

However, in our application, the computational complexity isrelative small because the calculation of Kalman filtering can

be significantly simplified. We evaluate the required calcula-tions for Kalman filtering as follows. The Kalman filtering con-tains two stages: prediction and updating. In our application,the state transition matrix , driven matrix and mea-surement matrix are all 2 2 identity matrix. The covari-ance of model and measurement error is 2 2 diagonal matrixwith the same constant value and , respectively. For sim-plicity, we denote the addition and multiplication as and

, respectively.Prediction:


Fig. 11. Comparisons of bit rate performance using the QCIF-Mother and Daughter sequence, 200 frames at 10 frames/s, fixed average PSNR full search at38.80 dB, half pixel search at 38.82 dB, half pixel RD at 38.75 dB, half pixel RD with KF(En) at 38.81 dB, half pixel RD with KF(Em) at 38.89 dB, RD-Optimalat 38.77 dB, RD with KF(En) at 38.83 dB, and RD with KF(Em) at 38.87 dB.

Fig. 12. Comparisons of bit rate performance using the QCIF-Carphone sequence, 120 frames at 10 frames/s, fixed average PSNR Full Search at 37.23 dB, HalfPixel Search at 37.32 dB, Half Pixel RD at 37.20 dB, Half Pixel RD with KF(En) at 37.29 dB, Half Pixel RD with KF(Em) at 37.37 dB, RD-Optimal at 37.14 dB,RD with KF(En) at 37.28 dB, and RD with KF(Em) at 37.30 dB.

No calculation is needed

The is 2 2 diagonal matrix with. Thus, can be expressed as

It contains only two additions .Updating:

The is the covariance matrix, i.e.,, which is 2 2 diagonal matrix with. The expansion of is

Thus we have and. It contains only (here, division is

regarded as multiplication) and

Obviously the calculation contains and .


(a)

(b)

(c)

(d)

Fig. 13. (a) Motion field estimated by the conventional MSE-Optimal schemeon the CIF- Claire sequence frame 15. The PSNR quality is 39.87 dB and itrequires 1980 bits to encode using the H.263 Huffman codebook. (b) Motionfield estimated by the Michael C. Chen proposed RD-Optimal scheme on theCIF-Claire sequence frame 15. The PSNR quality is 39.26 dB and it requires1378 bits to encode using the H.263 Huffman codebook. (c) Motion fieldestimated by the R-D Optimal with Enhanced Algorithm on the QIF-Clairesequence frame 15. The PSNR quality is 39.45 dB and it requires 1378 bits toencode using the H.263 Huffman codebook. (d) Motion field estimated by theR-D Optimal with Embedded Algorithm on the CIF-Claire sequence frame 15.The PSNR quality is 39.94 dB and it requires 1030 bits to encode using theH.263 Huffman codebook.

Finally, we consider the calculation of

(a)

(b)

(c)

(d)

Fig. 14. (a) Motion field estimated by the conventional MSE-Optimal schemeon the CIF- Salesman sequence frame 07. The PSNR quality is 36.32 dB andit requires 1246 bits to encode using the H.263 Huffman codebook. (b) Motionfield estimated by the Half Pixel with RD-Optimal scheme on the CIF- Salesmansequence frame 07. The PSNR quality is 36.11 dB and it requires 1095 bits toencode using the H.263 Huffman codebook. (c) Motion field estimated by theHalf Pixel RD with Enhanced Algorithm scheme on the CIF-Salesman sequenceframe 07. The PSNR quality is 36.24 dB and it requires 1095 bits to encode usingthe H.263 Huffman codebook. (d) Motion field estimated by the Half Pixel RDwith Embedded Algorithm scheme on the CIF-Salesman sequence frame 07.The PSNR quality is 36.47 dB and it requires 1016 bits to encode using theH.263 Huffman codebook.

Obviously the calculation of can be simpli-fied as and


(a)

(b)

(c)

(d)

Fig. 15. (a) Motion field estimated by the conventional Half Pixel scheme onthe QCIF- Foreman sequence frame 204. The PSNR quality is 34.56 dB and itrequires 1230 bits to encode using the H.263 Huffman codebook. (b) Motionfield estimated by the Half Pixel with RD-Optimal on the QCIF-Foremansequence frame 204. The PSNR quality is 34.15 dB and it requires 1158 bitsto encode using the H.263 Huffman codebook. (c) Motion field estimated bythe Half Pixel RD with Enhanced Algorithm scheme on the QCIF-Foremansequence frame 204. The PSNR quality is 34.27 dB and it requires 1158 bitsto encode using the H.263 Huffman codebook. (d) Motion field estimated bythe Half Pixel RD with Embedded Algorithm scheme on the QCIF-Foremansequence frame 204. The PSNR quality is 34.66 dB and it requires 889 bits toencode using the H.263 Huffman codebook.

. The calculation contains and.

(a)

(b)

(c)

(d)

Fig. 16. (a) Motion field estimated by the conventional Half Pixel schemeon the QCIF- Mother and Daughter sequence frame 28. The PSNR qualityis 34.83 dB and it requires 1476 bits to encode using the H.263 Huffmancodebook (b) Motion field estimated by the Half Pixel with RD-Optimalscheme on the QCIF- Mother & Daughter sequence frame 28. The PSNRquality is 34.52 dB and it requires 1112 bits to encode using the H.263 Huffmancodebook. (c) Motion field estimated by the Half Pixel RD with EnhancedAlgorithm scheme on the QCIF- Mother and Daughter sequence frame 28.The PSNR quality is 34.67 dB and it requires 1120 bits to encode using theH.263 Huffman codebook. (d) Motion field estimated by the Half Pixel RDwith Embedded Algorithm scheme on the QCIF- Mother & Daughter frame28. The PSNR quality is 34.95 dB and it requires 868 bits to encode using theH.263 Huffman codebook.

The computation above counts the horizontal and vertical components together. We assume that the two components use



Fig. 17. (a) Motion field estimated by the conventional Half Pixel scheme on the QCIF-Carphone sequence frame 60. The PSNR quality is 34.50 dB and it requires 1302 bits to encode using the H.263 Huffman codebook. (b) Motion field estimated by the Half Pixel with RD-Optimal scheme on the QCIF-Carphone sequence frame 60. The PSNR quality is 34.07 dB and it requires 1132 bits to encode using the H.263 Huffman codebook. (c) Motion field estimated by the Half Pixel RD with Enhanced Algorithm scheme on the QCIF-Carphone sequence frame 60. The PSNR quality is 34.21 dB and it requires 1132 bits to encode using the H.263 Huffman codebook. (d) Motion field estimated by the Half Pixel RD with Embedded Algorithm on the QCIF-Carphone sequence frame 60. The PSNR quality is 34.63 dB and it requires 830 bits to encode using the H.263 Huffman codebook.

the same Kalman filter; therefore, the actual computation is only half of that in the above analysis. For the embedded algorithm, we must calculate the distortion function for each search location. Thus, interpolation is necessary. For each

pixel, the bilinear interpolation requires and , so a block with size needs and . The computational complexity of the enhanced algorithm differs from that of the embedded algorithm, since the filtering operation is performed once per block for the former but once for each search location for the latter. The decoder contains the same Kalman filter; therefore, it performs one Kalman filtering operation after the MVs are received, regardless of whether the enhanced or the embedded algorithm is used. The extra computational load required by the proposed algorithms is summarized in Table IX, which indicates that the extra computation introduced by the proposed methods is small.
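As a rough illustration of why the embedded algorithm needs interpolation and why its cost grows with the search, the Python sketch below predicts a block at a real-valued displacement by bilinear interpolation and counts how often the filter runs per block. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names are hypothetical, SAD stands in for the paper's distortion function, the displaced block (plus one extra row and column) is assumed to lie inside the reference frame, and a full integer search over ±`search_range` pels is assumed for the embedded case. With 1 - fx and 1 - fy computed once per block, this particular formulation costs 6 multiplications and 3 additions per pixel; the paper's own per-pixel counts may differ.

```python
import math
from typing import List

Frame = List[List[float]]  # reference frame or block, stored as rows of pixels


def bilinear_block(ref: Frame, top: int, left: int, size: int,
                   dy: float, dx: float) -> Frame:
    """Predict a size x size block at (top, left) displaced by the real MV (dy, dx).

    Assumption: the displaced block plus one extra row and column (read by
    the right/bottom interpolation taps) lies entirely inside ref.
    """
    iy, ix = math.floor(dy), math.floor(dx)  # integer part of the displacement
    fy, fx = dy - iy, dx - ix                # fractional part, in [0, 1)
    wy0, wx0 = 1.0 - fy, 1.0 - fx            # computed once per block
    pred = []
    for r in range(size):
        y = top + iy + r
        row = []
        for c in range(size):
            x = left + ix + c
            # 4 multiplications + 2 additions for the two horizontal blends ...
            upper = wx0 * ref[y][x] + fx * ref[y][x + 1]
            lower = wx0 * ref[y + 1][x] + fx * ref[y + 1][x + 1]
            # ... then 2 multiplications + 1 addition for the vertical blend.
            row.append(wy0 * upper + fy * lower)
        pred.append(row)
    return pred


def sad(cur: Frame, pred: Frame) -> float:
    """Sum of absolute differences, a stand-in for the paper's distortion."""
    return sum(abs(a - b) for ra, rb in zip(cur, pred) for a, b in zip(ra, rb))


def filter_runs_per_block(search_range: int, embedded: bool) -> int:
    """Kalman filtering passes per block, assuming a full integer search."""
    # Enhanced: one pass after the R-D search has selected the MV.
    # Embedded: one pass for every candidate location in the search window.
    return (2 * search_range + 1) ** 2 if embedded else 1
```

For example, with a hypothetical `search_range` of 7, the embedded variant runs the filter (and the block interpolation) 225 times per block, versus once for the enhanced variant, which is the per-block versus per-location difference described above.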

VI. CONCLUSION

In this paper, we have presented two efficient Kalman filter-based R-D motion estimation algorithms in which a simple 1-D Kalman filter is applied to improve the performance of conventional R-D motion estimation. Since equivalent Kalman filters are used in both the encoder and the decoder, no extra information bits for the MVs need to be sent to the decoder. The new algorithms achieve significant PSNR gains with only a slight increase in complexity. The enhanced algorithm is a postprocessing step and can easily be combined with any conventional R-D motion estimation scheme. The embedded algorithm is a new R-D motion estimation algorithm that exploits the correlation of block motion more effectively. In the future, we will develop an adaptive scheme to further improve the performance of motion compensation. In addition, the combination with variable block size and OBMC searching, and the investigation of robust transmission, are interesting issues that will be studied in the future.
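The claim that equivalent filters in the encoder and decoder remove the need for side information can be illustrated with a minimal Python sketch of a scalar (1-D) Kalman filter run separately on the horizontal and vertical MV components. Everything specific here is an assumption for illustration, not taken from the paper: the first-order motion model x(k) = phi * x(k-1) + w, the values of `phi`, `q`, `r`, the initial state, the fixed block processing order, and the hypothetical names `ScalarKalman` and `filter_mvs`.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ScalarKalman:
    """1-D Kalman filter for one MV component (all parameters hypothetical)."""
    phi: float = 0.95  # assumed state-transition coefficient of the motion model
    q: float = 0.5     # assumed process-noise variance
    r: float = 1.0     # assumed measurement-noise variance
    x: float = 0.0     # current state estimate (filtered MV component)
    p: float = 1.0     # current estimation-error variance

    def step(self, z: float) -> float:
        # Prediction: x^- = phi * x, P^- = phi^2 * P + Q
        x_pred = self.phi * self.x
        p_pred = self.phi * self.phi * self.p + self.q
        # Update with H = 1: K = P^- / (P^- + R), x = x^- + K * (z - x^-)
        k = p_pred / (p_pred + self.r)
        self.x = x_pred + k * (z - x_pred)
        self.p = (1.0 - k) * p_pred
        return self.x


def filter_mvs(mvs: List[Tuple[int, int]]) -> List[Tuple[float, float]]:
    """Filter measured MVs (dx, dy) in a fixed order agreed by both sides."""
    # Same filter design for both components; each keeps its own state.
    fx, fy = ScalarKalman(), ScalarKalman()
    return [(fx.step(mx), fy.step(my)) for mx, my in mvs]


# The encoder filters the MVs it transmits and the decoder filters the MVs
# it receives; identical inputs and parameters give identical outputs.
transmitted = [(2, 0), (3, -1), (3, -1), (4, -2)]
assert filter_mvs(transmitted) == filter_mvs(transmitted)
```

Because `p` and `k` in this sketch never depend on the measurements, the two component filters follow the same gain sequence, which is one way to read the earlier remark that sharing one Kalman filter halves the computation. In a real codec, bit-exact agreement between encoder and decoder would also require deterministic arithmetic (for example, fixed point) across platforms.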

ACKNOWLEDGMENT

The authors would like to thank the anonymous reviewers for their valuable comments and suggestions.

REFERENCES

[1] J. R. Jain and A. K. Jain, “Displacement measurement and its application in interframe image coding,” IEEE Trans. Commun., vol. 29, no. 12, pp. 1799–1808, Dec. 1981.

[2] T. Koga, K. Iinuma, A. Hirano, Y. Iijima, and T. Ishiguro, “Motion-compensated interframe coding for video conferencing,” in Proc. NTC81, New Orleans, LA, 1981, pp. C9.6.1–9.6.5.

[3] R. Srinivasan and K. Rao, “Predictive coding based on efficient motion estimation,” IEEE Trans. Commun., vol. 33, no. 8, pp. 888–896, Aug. 1985.

[4] “MPEG-4 Visual Fixed Draft International Standard,” ISO/IEC 14496-2, 1998.

[5] “MPEG-4 Video Verification Model Version 18.0,” MPEG Video Group, ISO/IEC JTC1/SC29/WG11 N3908, 2001.

[6] “Video Coding for Low Bitrate Communication,” ITU Telecom. Standardization Sector of ITU, ITU-T Recommendation H.263, 1996.

[7] “Video Coding for Low Bitrate Communication,” ITU Telecom. Standardization Sector of ITU, Draft ITU-T Rec. H.263 Version 2, 1997.

[8] H. Li, A. Lundmark, and R. Forchheimer, “Image sequence coding at very low bit rates: A review,” IEEE Trans. Image Process., vol. 3, no. 5, pp. 589–609, Sep. 1994.

[9] T. Wiegand, M. Lightstone, D. Mukherjee, T. G. Campbell, and S. K. Mitra, “Rate-distortion optimized mode selection for very low bit rate video coding and the emerging H.263 standard,” IEEE Trans. Circuits Syst. Video Technol., vol. 6, no. 2, pp. 182–190, Apr. 1996.


[10] F. Kossentini, Y.-W. Lee, M. J. T. Smith, and R. K. Ward, “Predictive RD optimized motion estimation for very low bit rate video coding,” IEEE J. Sel. Areas Commun., vol. 15, no. 6, pp. 1752–1763, Dec. 1997.

[11] D. T. Hoang, P. M. Long, and J. S. Vitter, “Efficient cost measure for motion estimation at low bit rate,” IEEE Trans. Circuits Syst. Video Technol., vol. 8, no. 5, pp. 488–500, Aug. 1998.

[12] B. Girod, “Rate-constrained motion estimation,” Proc. SPIE Visual Commun. Image Process., vol. 2308, pp. 1026–1034, Nov. 1994.

[13] M. C. Chen and A. N. Willson, “Rate-distortion optimal motion estimation algorithm for motion-compensated transform video coding,” IEEE Trans. Circuits Syst. Video Technol., vol. 8, no. 3, pp. 147–158, Apr. 1998.

[14] M. Z. Coban and R. M. Mersereau, “A fast exhaustive search algorithm for rate-constrained motion estimation,” IEEE Trans. Image Process., vol. 7, no. 5, pp. 769–773, May 1998.

[15] J. C. H. Ju, Y. K. Chen, and S. Y. Kung, “A fast rate-optimized motion estimation algorithm for low-bit rate video coding,” IEEE Trans. Circuits Syst. Video Technol., vol. 9, no. 7, pp. 994–1002, Oct. 1999.

[16] Y. Y. Sheila and S. Hemami, “Generalized rate-distortion optimization for motion-compensated video coders,” IEEE Trans. Circuits Syst. Video Technol., vol. 10, no. 6, pp. 942–955, Sep. 2000.

[17] M. T. Orchard and G. J. Sullivan, “Overlapped block motion compensation: An estimation-theoretic approach,” IEEE Trans. Image Process., vol. 3, no. 5, pp. 693–699, Sep. 1994.

[18] J. K. Su and R. M. Mersereau, “Motion estimation methods for overlapped block motion compensation,” IEEE Trans. Image Process., vol. 9, no. 6, pp. 1509–1521, Sep. 2000.

[19] “Draft ITU-T Recommendation and Final Draft International Standard of Joint Video Specification (ITU-T Rec. H.264 | ISO/IEC 14496-10 AVC),” Joint Video Team (JVT), Doc. JVT-G050, 2003.

[20] T. Wiegand, G. Sullivan, G. Bjøntegaard, and A. Luthra, “Overview of the H.264/AVC video coding standard,” IEEE Trans. Circuits Syst. Video Technol., vol. 13, no. 7, pp. 560–576, Jul. 2003.

[21] C. K. Chui and G. Chen, Kalman Filtering With Real-Time Applications, ser. Springer Series in Information Sciences. New York: Springer, 1987, vol. 17.

[22] M. S. Grewal and A. P. Andrews, Kalman Filtering Theory and Practice. Englewood Cliffs, NJ: Prentice-Hall, 1993.

[23] C. M. Kuo, C. H. Hsieh, Y. D. Jou, H. C. Lin, and P. C. Lu, “Motion estimation for video compression using Kalman filtering,” IEEE Trans. Broadcast., vol. 42, no. 2, pp. 110–116, Jun. 1996.

[24] C. M. Kuo, C. H. Hsieh, H. C. Lin, and P. C. Lu, “Motion estimation algorithm with Kalman filter,” Electron. Lett., vol. 30, no. 7, pp. 1204–1206, Jul. 1994.

[25] T. Wedi and H. G. Musmann, “Motion- and aliasing-compensated prediction for hybrid video coding,” IEEE Trans. Circuits Syst. Video Technol., vol. 13, no. 7, pp. 577–586, Jul. 2003.

[26] T. Wedi, “Adaptive interpolation filter for motion compensated prediction,” in Proc. 2002 IEEE Int. Conf. Image Process., vol. 2, Sep. 22–25, 2002, pp. II-509–II-512.

Chung-Ming Kuo received the B.S. degree from the Chinese Naval Academy, Kaohsiung, Taiwan, R.O.C., in 1982, and the M.S. and Ph.D. degrees from Chung Cheng Institute of Technology, Taiwan, R.O.C., in 1988 and 1994, respectively, all in electrical engineering.

From 1988 to 1991, he was an Instructor in the Department of Electrical Engineering, Chinese Naval Academy, where he became an Associate Professor in January 1995. From 2000 to 2003, he was an Associate Professor in the Department of Information Engineering, I-Shou University, Kaohsiung, Taiwan, R.O.C., and became a Professor in February 2004. His research interests include video compression, image/video retrieval, multimedia signal processing, and optimal estimation.

Shu-Chiang Chung received the B.S. and M.S. degrees in system engineering from Chung Cheng Institute of Technology, Taiwan, R.O.C., in 1983 and 1994, respectively. He is currently working toward the Ph.D. degree at I-Shou University, Kaohsiung, Taiwan, R.O.C.

From 1994 to 1999, he was an Instructor at the Chinese Naval Academy, Kaohsiung, Taiwan, R.O.C. In November 1999, he joined the Naval Shipbuilding and Development Center as a Researcher with the Department of Information Systems. His research interests include video coding, image processing, digital signal processing, and optimal estimation.

Po-Yi Shih received the B.S. degree in 2001 and the M.S. degree in information engineering in 2003, both from I-Shou University, Kaohsiung, Taiwan, R.O.C.

In 2003, he joined VIA Technologies, Inc., Hsin-Chu, Taiwan, R.O.C., where he is involved in the research and design of video encoder and decoder systems. His current research interests include video coding, image processing, and optimal estimation.
