hill, p. r., chiew, t-k., & bull, d. r. (2006). interpolation free ...istortion results,...

5
Hill, P. R., Chiew, T-K., & Bull, D. R. (2006). Interpolation free sub-pixel motion estimation for H.264. In 2006 IEEE International Conference on Image Processing, Atlanta, GA, United States. (pp. 1337 - 1340). Institute of Electrical and Electronics Engineers (IEEE). DOI: 10.1109/ICIP.2006.312581 Peer reviewed version Link to published version (if available): 10.1109/ICIP.2006.312581 Link to publication record in Explore Bristol Research PDF-document University of Bristol - Explore Bristol Research General rights This document is made available in accordance with publisher policies. Please cite only the published version using the reference above. Full terms of use are available: http://www.bristol.ac.uk/pure/about/ebr-terms.html

Upload: others

Post on 25-Dec-2020

0 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Hill, P. R., Chiew, T-K., & Bull, D. R. (2006). Interpolation free ...istortion results, Bot-values for the far neighbours (see eq. 5). Fallbackcheck I is therefore a checkto see if

Hill, P. R., Chiew, T-K., & Bull, D. R. (2006). Interpolation free sub-pixelmotion estimation for H.264. In 2006 IEEE International Conference onImage Processing, Atlanta, GA, United States. (pp. 1337 - 1340). Institute ofElectrical and Electronics Engineers (IEEE). DOI:10.1109/ICIP.2006.312581

Peer reviewed version

Link to published version (if available):10.1109/ICIP.2006.312581

Link to publication record in Explore Bristol ResearchPDF-document

University of Bristol - Explore Bristol ResearchGeneral rights

This document is made available in accordance with publisher policies. Please cite only the publishedversion using the reference above. Full terms of use are available:http://www.bristol.ac.uk/pure/about/ebr-terms.html

Page 2: Hill, P. R., Chiew, T-K., & Bull, D. R. (2006). Interpolation free ...istortion results, Bot-values for the far neighbours (see eq. 5). Fallbackcheck I is therefore a checkto see if

INTERPOLATION FREE SUB-PIXEL MOTION ESTIMATION FOR H.264

P.R. Hill, T K. Chiew and D.R. Bull

Dept. of Electrical and Electronic Engineering, The University of Bristol, Bristol, BS5 IUB, UK.paul.hill @bristol.ac.uk

ABSTRACT

Sub-pixel motion compensation plays an important role incompression efficiency within modern video codecs such asMPEG2, MPEG4 and H.264. Sub-pixel motion compensation is implemented within these standards using interpolatedpixel values at 1/2 or 1/4 pixel accuracy. Such interpolationgives a good reduction in residual energy for each predictedmacroblock and therefore improves compression. However,such interpolation is very computationally complex for theencoder. This is especially true for H.264 where the cost of anexhaustive set of macroblock segmentations need to be esti-mated in order to obtain an optimal mode for prediction. Thispaper presents a novel interpolation-free scheme for sub-pixelmotion compensation using the result of the full pixel SADdistribution of each motion compensated block applied to anH.264 encoder. This system produces reduced complexitymotion compensation with a controllable trade-off betweencompression performance and encoder speed. These methodsfacilitate the generation of a real time software H.264 encode.

Index Terms- Video coding, Interpolation, Motion com-pensation

1. INTRODUCTION

Sub-pixel accuracy for motion compensation is traditionallymade possible using interpolated reference frames. The cre-ation and use of such interpolated reference frames has a sig-nificant implication for the computational load of the encoderand memory bandwidth requirements. The use of multiplereference frames within the H.264 encoder is required for en-hanced prediction (resulting in better compression) and errorresilience. Such use of multiple reference frames will ob-viously have considerable memory requirements when usinginterpolated reference frames. This paper introduces an inter-polation free method for sub-pixel block based motion esti-mation in order to reduce memory bandwidth requiremenLtsand improve computational efficiency in order to facilitatereal time encoding and in situations with limited memory andmemory bandwidth.

In section 2 of this paper, a parabolic model of the sub-pixel resolution motion estimation cost is described that usesthe SAD cost at the best whole pixel resolution position and

its neighbours. Section 3 describes the generation of the pa-rameters for the model described in section 2 using the wholepixel resolution results. Once the parameters for the modelare generated, the minimum value of the model is estimatedusing the techniques described in section 4. This methodwhen used in a simple and direct way was found to give worseresults (in a rate-distortion sense) than the fully interpolatedcase. In order to improve the location of the actual minimumusing the developed method an interpolated fallback was de-veloped as described in section 5. The check used for the fall-back mechanism is described in 5.1 supported by the resultsgiven for the two fallback checks given in section 6. Finally,a conclusion and summary of the developed methods is givenin section 7.

2. MODEL DESCRIPTION

In the H.264 reference software, quarter pixel motion esti-mation is enabled using interpolated reference frames. Ourmethod obtains sub-pixel accuracy by finding the minimum ofa 2D parabolic curve. This curve is fitted to the 9 whole pixelSAD neighbourhood values surrounding the whole pixel mo-tion estimation minimum. This model is reasonable for anystationary 2D signal and extremely close to the actual interpo-lation surface with Gaussian shaped autocorrelation functions[11

The parametrically controlled parabolic surface used toestimate the sub-pixel SAD values is defined by the eq. [2]:

SADi(() A 2+ B(Y + C(Qxx + D-x+E(y + F (1)

Where SADi(() is the estimated SAD value of the ithblock. The Qx and (y values are the co-ordinates of the es-timation, centered at the best motion vector at whole-pixelresolution (they vary from 1.0 to -1.0) with the co-ordinate(0,0) being the best motion vector at whole-pixel resolution.Therefore for the quarter-pixel case used in our experimen-tal H.264 encoder, the parameters (x and (y take the values[X31 4,01rl11/4 21 ,1]L1,- 3/4,- I/ 2S- 1/4,0, 1/4iI/ 2, 3/4,11 -

After the motion vector at full pixel resolution is obtained,the SAD values of its nearest neighbours are made available(calculated or retrieved) giving 8 nearest SAD neighboursfrom which the parameters A, B, C, D, E and F can be es-

1-4244-0481-9/06/$20.00 C2006 IEEE 1337 ICIP 2006

Authorized licensed use limited to: UNIVERSITY OF BRISTOL. Downloaded on January 23, 2009 at 05:50 from IEEE Xplore. Restrictions apply.

Page 3: Hill, P. R., Chiew, T-K., & Bull, D. R. (2006). Interpolation free ...istortion results, Bot-values for the far neighbours (see eq. 5). Fallbackcheck I is therefore a checkto see if

timated. These neighbours are shown in figure 1. For the sub-sequently described techniques, the 8 neighbours are eitherlabeled near neighbours (with even indices) or far neighbours(odd indices).

5 6 7 Pixel co-ordinates with respect to centre pixel:Pixel 8: (0,0) Pixel 2: (0,1) Pixel 5: (-1,-1)Pixel 0: (1,0) Pixel 3: (- 1,1) Pixel 6: (0,-1)

4 8 )o Pixel 1: (1,1) Pixel 4: (-1,0) Pixel 7: (1,-1)Centre pixel: Pixel 8Near neighbours: Pixels 0, 2, 4, 6

3)2 1 Far neighbours: Pixels 1, 3, 5, 7

Fig. 1. Illustration of position of neighbour pixels

There are 9 potential points for the estimation of the 6 pa-rameters within eq. 1. The system is therefore overdescribedand there are therefore many possible estimation methods.Chiew [3] presented three models for this parameter estima-tion. The first method used an underdetermined model basedjust on the near neighbours (defined as the Near-NeighboursModel (NNM)). A second method used an Overcomplete Sys-tem Model (OSM) that used all the 9 neighbour points anda pseudo matrix inverse method for obtaining A-F. Thesemethods were proved to give inferior results to our chosenmethod, the Complete-System Model (CSM).

3. THE COMPLETE SYSTEM MODEL (CSM) FORPARABOLIC PARAMETER ESTIMATION

The CSM parameter estimation model has been found to bethe most efficient computationally and in terms of compres-sion [3]. In this model, the parameters A, B, D, E and F arecalculated from eqs. 2 [the values of S are defined in [3]].Eqs 2 are derived from the simple insertion of neighbour val-ues into eq. 1 shown in figure 1. The value of C is firstlyset to zero in the under-determined case (NNM). Howeverwithin the CSM method the value of C is chosen from theset C1l, C3, C5, C7 where Ck is the value of C found with thecomplete system of equations using points 0, 2, 4, 6, 8 and k(as defined in figure 1). Then the value of C is chosen usingeq. 3, where Ski is the estimate of Si using eq. 1 with para-meter set A, B, Ck, D, E, F. This model determines whichof the far-neighbours best fits the model and ignores the other3, thus removing the effect of outliers. Its complexity is sim-ilar to that of the NNM method whilst also guaranteeing theexistence of a minimum point.

A -S8 +(So +S) B -S + S( +S6);D (So S), S -6); F = S8 (2)-2 O 42 2 61 ?8 V

C = argmi Si Ski (3)k= 1 ?3 ,5 7

i=1,3,5,7

4. OBTAINING THE SURFACE MINIMUM

A parabolic surface (i.e. second order polynomial) as definedin eq. 1 will only have one minimum in continuous space.However this minimum is not guaranteed to be within the (-1,1) x (-1,1) area. Therefore analytical methods are not appro-priate. The technique adopted in [3] is to evaluate the valueof S(x, y) for all sub-pixel locations within the (-1,1)x(-1,1)area. This achieves the desired result but has an associatedhigh computational complexity due to its exhaustive searchand the large number of multiplications associated with eq. 1.

We now present three methods for calculating the mini-mum of the parabolic surface without an exhaustive search.

4.1. Obtaining Parabolic Minimum Methods I & 2

The basis of this method is to start at an origin and checkthe S() value of its near (4-connected) neighbours (at quarterpixel accuracy). If any of these values are less than the valueof So at the origin then move the origin to the position of thenew value and then repeat until the minimum is found.

This method is similar to the block based gradient de-scent motion compensation scheme utilised within the H.263standard reference software and defined by Liu and Feig[4].However, this is only using interpolated values whereas ourmethod uses the estimated SAD values derived from the pa-rameterised parabolic surface (eq. 1). This method is notguaranteed to find the minimum value of S(k) within the(-1,1) x(-1,1) area as the descent of the gradient can getcaught in local minima. This problem is reduced using an8-connected search. Method 1 using an 8-connected search isdefined as method 2.

4.2. Obtaining Parabolic Minimum Method 3

A two stage algorithm checks the value of the 8-connected1/2 pixel resolution positions (and the origin) relative to theorigin. The final minimum of S() is then the minimum of an8-connected 1/4 pixel resolution check centred on the mini-mum at the 1/2 pixel resolution (including the minimum atthe 1/2 pixel resolution).

4.3. Results for Obtaining Minimum Methods

Figure 2 shows the results of using the three methods de-scribed above. The top figure shows an insignificant differ-ence in rate distortion efficiency between using any of thesemethods. However, when examined closely (see inset) it canbe seen that all three methods have a slight cost involved whencompared to the full search The cost for each method is vaiable over the rate distortion curve hut is approximately 0.2%of the rate at a fixed PSNR value. This cost is the highestfor method I using a 4-connected search followed by the 8-connected search of method 2 and finally method 3. However,

1338

Authorized licensed use limited to: UNIVERSITY OF BRISTOL. Downloaded on January 23, 2009 at 05:50 from IEEE Xplore. Restrictions apply.

Page 4: Hill, P. R., Chiew, T-K., & Bull, D. R. (2006). Interpolation free ...istortion results, Bot-values for the far neighbours (see eq. 5). Fallbackcheck I is therefore a checkto see if

the speed of these methods shown in the bottom figure of fig-ure 2 shows the significant speed improvement using method1. As the compression performance for all the methods is in-significant compared to the full search (top figure) method 1is preferred.

40 -

38 -

35.3536 -

35.3.34-

35.2532-

0.315 0.32 0.3250.2 0.4 0.6 0.8 1 1.2

.... Method 3-Method 1(4 connected)

12 -Method 2(8 connected)-Full search

11 ......_._._._._._ ._ ._ ._ ._. _. _. _.._. ....._ ._ ._ ._._.._. _..... ._......10 _ ----- --_______________________

9 ... ..............I.............................I........................ ...

9

(DU)

LL

0 0.2 0.4 0.6 0.8bit rate Mb/s

Fig. 2. Results for obtaining the parabolic miniman Sequence: 100 frames, CIF). Top: Rate dtomn: Encoder speed in frames per second.

41

40

39

38

37

cf) 36

35

34

33

32

31

/- i

...Method 1, Fallback_ - - Full interp _

- Methodl, No Fallback_ Whole Pixel

36.3. t

36.1 Il

36

35.9 I

357

0.4 0.5 0.6

0.2 0.4 0.6 0.8 1 1.2 1.4 1.6Bitrate Mb/s

Fig. 3. Comparison of results for sub-pixel methods: Foreman

check I is therefore a measure of how well the parabolic sur-fice. modle fits the SA Tn valaes, for the far neiqhhnnr-ir This1 1.2 1. 4

IIIk Ji U %.I Ill3 LIAlk KJ AL- V4IULLt., l%UI liM, lUl lllJlt< La 1113l

is achieved by obtaining a measure of the divergence from themodel DivMod. DiThvAod is the average difference between

imum methods (Fore- the predicted value of S() (using eq. 1) with the actual SADistortion results, Bot- values for the far neighbours (see eq. 5). Fallback check I is

therefore a check to see if this value is over a threshold (seeeq. 5).

5. INTERPOLATED FALL-BACK

The results from method 1, provides a reduction in bitratewhen compared to the simple full pixel case. However, thismethod is measurably worse (in terms of rate distortion per-formance) compared to the fully interpolated case (as shownin figure 3).

We therefore adopt the concept of the interpolated fall-back. This involves a check that measures the quality of theresult from method 1. The motion vector from method I iskept without any further method if the result of the fallbackcheck is positive. However if the fallback check is negativethen the usual interpolation method is used. This has an im-pact on computationally efficiency. The greater the use of in-terpolation, the better the bit rate but higher the computationalload. This therefore allows a trade off between computationalefficiency and compression according to a quality threshold.

5.1. Fallback check

The fallback check should reflect how good method 1 (4connected) is at achieving a SAD minimum similar to thatwhich would have been obtained by interpolation. Fallbackcheck 1: The CSM method for obtaining the orrect parame-ters for the parabolic minimum surface does not give an exactmatch for the far neighbours. i.e S() in eq. does not alwaysequal the actual SAD values for the far neighbours. Fallback

DivMod Ei=1,3,5,7 Si Ski (4)Fcallbackcheckl DivModl(Bh x Bw) > threshll (5)

Where Si is the predicted SAD values at far neighbour posi-tions 1,3,5 alnd 7. Ski is the actual SAD values (same posi-tions). Bw and Bh indicate the mode block size.

Bw and Bh are included to normalise the effect of dif-ferent sized blocks. Figure 4 shows how the variation of thethreshold changes the percentage of interpolation fallback andits associated compression and timings. The threshold sets thetrade off between compression and speed. Fallback check 2:A more direct method of checking how good method I is, isby getting the actual interpolated SAD value at the quarterpixel motion vector position indicated by the minimum foundby method 1. The absolute difference between the actual in-terpolated SAD value at this point and the predicted SADvalue (SO in eq. 1) using method 1 is compared to a threshold(as in eq. 5) to form Fallback check 2. As with fallback check1, there is also a trade off between compression and speed ac-cording to the value of the threshold. ThLs is also shown infigure 4. N.B. the cost function of each block is based on theactual interpolated SAD value. This significantly reduces thebitrate compared to using the estimated SAD value.

1339

.. -

" II .I.

" .1

Grzn

1 .8

Authorized licensed use limited to: UNIVERSITY OF BRISTOL. Downloaded on January 23, 2009 at 05:50 from IEEE Xplore. Restrictions apply.

Page 5: Hill, P. R., Chiew, T-K., & Bull, D. R. (2006). Interpolation free ...istortion results, Bot-values for the far neighbours (see eq. 5). Fallbackcheck I is therefore a checkto see if

6. FALLBACK CHECK RESULTS

Figure 4 shows the results of the two fallback check meth-ods described above. The top line of the top graph shows thebit rate of a system without any interpolation and the bottomline of the same graph shows the bit-rate of the system withfull interpolation. Between these two lines the two thresholdmethods decrease the bit rate as the proportion of interpola-tion increases. However this is a much greater increase thanfrom a random selection of interpolation. The lower graphshows the frame rate of the systems with the top and bot-tom lines similarly delimiting the non and full interpolationmodes. The decreasing graph shows how both systems be-come quicker the less interpolation is performed.

Fallback check I was seen to give moderately better re-

sults and was chosen. As bitrate conservation was chosen as

a priority the threshold of 2.0 (threshold 1) was chosen givinga 40% proportion of interpolation (approximately). As shownin figure 4, this gives a significant reduction in bit rate com-pared to the non-interpolated case but also has a reasonablespeed-up (8fps). This method and threshold were used for allsubsequent experiments (in the tables below).

0.691

0.68

.Q 0.67-06

o 0.66

m 0.65

0.64

0.63)5 0.1 0.15 0.2 0.25 0.3

Fallback checkl ---- Fallback check2

0o Full Interp 05 0

No Fallback

1 010-

9 -

8 -

..............................................................................................

0.05 0.1 0.15 0.2 0.25 0.3 0.35 0.4 0.45 0.5 0.55Proportion of Interpolation over entire sequence

Fig. 4. Comparison of threshold methods. Foreman CIF= 26, almost exactly the same PSNR over bitrate range

100. QP

Table 1. The speeds of methods in frames per second CIFx00

Table 2. Difference between rate distortion curves (measured in %PSNR difference over logaiithmiic scale [5]) comipared to full inter-polation curve: CIF x 100

7. CONCLUSION

Quarter pixel motion compensation provides a key contribu-tion to the compression performance of H.264. However, theinterpolation required to give an accurate estimation for quar-

ter pixel motion compensation is very computationally inten-sive. This is compounded by the exhaustive search used bythe H.264 encoder to check all possible prediction modes. In-terpolation free quarter pixel motion estimation is able to givea substantial reduction in computational complexity. This isshown in table I where the speed improvement of the non

fallback method over full interpolation ranges from 25% to60%. However, as is showrn in figure 3 this method does notmatch the performance (in a rate-distortion sense) of a fullyinterpolated system. The fall back methodology introduced inthis paper improves the rate distortion performance to a levelclose to full interpolation with only a slight increase in com-plexity (compared to the non fallback system). Table 2 illus-trates this, comparing the difference in rate distortion curves

of the three methods.

8. REFERENCES

[1] G. Giunta, "Fine estimators of 2D parameters and ap-

plication to spatial shift estimation," IEEE Trans. SignalProcessing, vol. 46, no. 12, pp. 3201-3207, Dec 1999.

[2] G. Giunta, "Fast estimators of time delay and Dopplerstretch based on discrete-time methods," IEEE Trcans.Signal Processing, vol. 46, pp. 1785-1797, July 1998.

[3] T.K. Chiew, Rapid Segmentation for Video Processing,Ph.D thesis, The University of Bristol, 2004.

[4] L K. Liu and E. Feig, "A Block Based Gradient DescentSearch Algorithm for Block Motion Estimation in VideoCoding," IEEE Trans. Circuits and Systems for Video

Technology, vol. 6, no. 4, pp. 419-423, 1996.

[5] G. Bjontegaard, Calculation of average PSNR differ-

ences between RD-curves," VCEG document VCEG-M33, March 2001.

1340

Fallbac~k No Fallback Whole PixelForeman 0.4004 1.5703 10.9342Akiyo 0.0027 0.2316 0.7219Coast 0.2333 0.7102 8.5228

alul 0.0000 0.0378 0.4380Stefan 0.1249 1.9357 16.5843Mobile 0.2066 3.5145 21. 3099

Full Interp FBack No FBack Whole PixelForeman 7.3034 8.5351 10.5877 14.9931Akiyo 9.7557 11.7719 12.2695 15.6095Coast 7.1907 7.6270 12.1715 16.2123Hall 10.902 13.197 13.7286 17.8572Stefan 6.4978 7.2846 103925 14.9289Mobile 5 8278 8 0765 11 3227 15 1986.

0,-d

0)

or

Authorized licensed use limited to: UNIVERSITY OF BRISTOL. Downloaded on January 23, 2009 at 05:50 from IEEE Xplore. Restrictions apply.