EE 5359

Multimedia Processing – Summer 2008

Interim Project Report

on

H.264 to VP6 Transcoder

Submitted by

Jay R. Padia

1000 60 5145

Date: July 17, 2008

Abstract

VP6 is a video coding standard developed by On2 Technologies, Inc. It is the preferred codec for

Macromedia Flash 8 video. VP6 assumes importance with Macromedia Flash emerging as a widely

adopted video streaming technology over the internet. H.264 is currently one of the most widely accepted

video coding standards in the industry. It enables high quality video at low bitrates. So there is increasing

importance of techniques which can convert video from H.264 to VP6 and thereby enable high quality

video transmission over the internet using Flash. Existing research shows that H.263 video, coded with a

previous-generation standard relative to H.264, can be transcoded to VP6 with complexity reduced by up to

50%. The similarities and dissimilarities between the two encoders are used to reduce the complexity

using Dynamic Search Range and Dynamic Search Window. The success in reducing complexity in the

H.263 to VP6 transcoder and the available reference material related to transcoding algorithms enable us

to propose a new study to find an algorithm for transcoding H.264 coding standard to VP6 coding

standard. It is proposed to explore the similarities and dissimilarities between the two standards to find the

right transcoding technique.

Importance of the H.264 Standard

H.264 [4] was proposed by the Joint Video Team (JVT) of the ITU-T Video Coding Experts Group

(VCEG) and ISO/IEC Moving Pictures Experts Group (MPEG) in 2003. It is currently one of the most

widely accepted industry standards. It can provide good quality video at substantially lower bitrates

compared to the previous standards. It also shows more error robustness [1] [2].

H.264 has a set of innovations which can together provide a vast improvement in performance over

previous generations of video codecs. MPEG-2 [21] was the most widely used video codec before the

emergence of H.264. H.264 provides the same quality as MPEG-2 at a third to half the data rate. At the

same data rate, H.264 can provide up to 4 times the frame size, as can be seen in Table 1. H.264 provides

better image quality when reaching its limits. It does not break into blocks but degrades much more

smoothly, making the image softer as compression increases. H.264 is an emerging standard, and its

quality and performance can be expected to improve over the years, just as other standards have

improved [3].

Table 1. H.264 data rate at various resolutions [3]

Overview of H.264 Standard

H.264 introduces many new features that are significantly different from the previous generation codecs.

These new features make it vastly different from the existing codecs and make it much more effective.

Given below is an overview of the features of H.264 video codec.

Profiles and levels

Like any comprehensive standard, the H.264 standard defines a set of profiles and levels to set points of

conformance for various classes of applications and services. In each profile, specific encoding tools are

permitted to best meet the needs of the intended scenario. H.264 includes six profiles as shown in figure 1

[4]:

• Baseline. Intended for low-complexity applications such as video conferencing and mobile multimedia.

• Main. Intended for the majority of general uses such as the Internet, mobile multimedia, and stored

content.

• Extended. Intended for streaming applications, where stream switching technologies can be beneficial.

• Three High profiles (also known as Fidelity Range Extension or FRExt). Consists of three separate

High profiles (High, High 10, and High 4:2:2), intended for high-end professional uses [3] [5].

Fig 1. H.264 profile levels [3]

4x4 integer transform.

H.264 is designed to operate on much smaller blocks of pixels than other common codecs, which

mitigates blocking, smearing, and ringing artifacts. So H.264 video is crystal clear even in areas of fine

detail. Because the transform is a precisely specified integer transform, it provides bit-precise

reconstruction (that is, exact-match decoding) rather than statistically generated reconstruction. As a

result, there can be no drift among various decoder implementations, so any compliant H.264 decoder will

decode the video exactly as the content author intended it to look [3] [6].

X = input matrix; $C_f X C_f^T$ = core 2D transformation for X; $E_f$ = matrix formed by scaling factors a, b, c
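As a minimal sketch of the core transform described above, the Python below computes $W = C_f X C_f^T$ for one 4x4 block using integer arithmetic only. The entries of `CF` are the commonly published H.264 core transform matrix and are not given in this report; the scaling by $E_f$ is left out (it is usually folded into quantization), and the helper names are illustrative.

```python
# Core 4x4 forward transform matrix Cf (assumed from the standard H.264 description).
CF = [
    [1,  1,  1,  1],
    [2,  1, -1, -2],
    [1, -1, -1,  1],
    [1, -2,  2, -1],
]

def matmul(a, b):
    """Plain matrix product of two nested-list matrices."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]

def transpose(m):
    return [list(col) for col in zip(*m)]

def core_transform(x):
    """Return W = Cf * X * Cf^T for a 4x4 block x (integers in, integers out)."""
    return matmul(matmul(CF, x), transpose(CF))

# A flat block has energy only in the DC position of W.
flat_block = [[10] * 4 for _ in range(4)]
w = core_transform(flat_block)
```

Because every entry of `CF` is an integer, encoder and decoder compute identical values, which is the property behind the drift-free reconstruction mentioned in the text.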

Increased precision in motion estimation.

H.264 also benefits from increased precision in motion estimation, which is the process of simplifying

redundant data across a series of frames. By expressing information to 1/4-pixel resolution (fig 2) as

opposed to 1/2-pixel resolution like most other codecs, H.264 represents both fast- and slow-moving

scenes more precisely. So objects in motion are more crisply reconstructed during decode, providing a

better representation of the source material [7].

Fig 2. Motion vectors in H.264 [7]

Flexible block sizes in motion estimation.

During motion estimation, traditional codecs commonly process frames at the macroblock level (16 pixels

by 16 pixels). H.264 can process on segments within a macroblock, ranging in size from the commonly

used 16x16 to as small as 4x4 as shown in fig 3, which helps to code complex motion in areas of high

detail. The ability of H.264 to perform its processing on a variety of block sizes means that scenes with

complicated motion are more expressively described, providing higher quality in lower data rates [7].

Fig 3(a). Macroblock partitions – 16x16, 16x8, 8x16 & 8x8 [7]

Fig 3(b). Macroblock sub-partitions – 8x8, 8x4, 4x8 & 4x4 [7]

Intraframe prediction.

H.264 is able to gain much of its efficiency by simplifying redundant data not only across a series of

frames, but also within a single frame, a technique called intraframe prediction (figure 4). The H.264

encoder uses intraframe prediction with more ways to reference neighboring pixels, so it compresses

details and gradients better than previous codecs. Intraframe prediction is especially beneficial in high

motion areas, which are traditionally difficult to encode. With H.264, high-motion video can achieve

stunning quality at much lower data rates [3] [8].

Fig 4. 4x4 block intra prediction modes in H.264 [8]

Adaptively tuned deblocking filter.

H.264 also features a robust deblocking filter as observed in figure 5, which operates on 4x4 block

boundaries to remove jagged blocking artifacts. Its filtering is adaptively tuned per block boundary,

making it a very effective smoothing filter during the decoding of a finished bit stream. In addition to

making smoother pictures for display, this filter is used during the encoding process to provide a more

coherent reference picture for subsequent frames, which helps to improve image quality. This advanced

filter technology effectively eliminates blocking artifacts, resulting in a smooth, clean picture [9].

Fig 5(a). H.264 Encoder – Basic encoding structure

Fig 5(b). H.264 Decoder – Basic decoding structure

VP6 Coding standard

TrueMotion VP6 [10] is a new compression technology from On2 Technologies Inc. Macromedia has

licensed it for its Flash suite of products [12]. It features as the main codec for Flash 8 and onwards. It has

interesting features as it gives a very good quality at very high compression.

TrueMotion VP6 is among the best video codecs on the market today. It offers better image quality and

faster decoding performance than Windows Media 9 [22], Real 9 [23], H.264 [4], and QuickTime MPEG-

4 [10]. In internal testing at On2 Technologies Inc, TrueMotion VP6 could beat many H.264

implementations, Windows Media 9 and Real Networks 10 in PSNR comparisons using standard MPEG-

2 test source clips [10]. The VP6 clips were more detailed and contained fewer artifacts than Windows

Media 9 and maintained more texture and detail than Real or H.264 [10].

VP6.2, the latest version of TrueMotion VP6, features a drastic increase in performance from the previous

versions of VP6 [10].

Emerging Importance of VP6 Coding Standard

Flash Video is rapidly changing the landscape of video on the Web. It is emerging as the preferred

solution for providing video services online over Windows Media Player, Apple QuickTime and

RealNetworks RealPlayer [11].

The advantages of Flash Player over its rivals are its small size and its completeness as a website

development package. Its ability to support multiple platforms has made it popular [11].

Macromedia adopted the VP6 coding standard from On2 Technologies, Inc. as the video coding standard

for its Flash player in 2005. It listed quality, portability, stability, low memory usage and performance as

the main criteria for selecting VP6 [12].

It can be observed that significant quality improvement can be obtained with VP6 in Flash 8 over the

Sorenson Spark codec (based on H.263) which was the basis of Flash MX video (as shown in fig 6). It

provides better performance with low contrast video images, removes color oversaturation and also

provides a smoother picture true to the original by removing blockiness in the old format [10].

Improvement in Performance on using VP6

Figure 6 compares the performance of Flash Video using VP6 with Flash MX, the older version which

used the Sorenson Spark codec, which was based on H.263.

The images in Figure 6 (with the exception of the cartoons) are excerpts from a 12:30 minute video of

coral reef exploration. The original source was shot on DVCAM and was stored using photo-JPEG

compression. The only tool used for compressing this video was Flix Professional, using default settings.

The file was preprocessed as follows: since the source was direct from a camera, the 720x486 DV source

needed to have some over-scan cropped out. It was also de-interlaced and sized to 320x240. All

preprocessing was performed in Flix Professional.

In all the comparisons listed, the image on the left side is from VP6 video.

Fig 6(a). Over-saturation of colors in MX (right). [10]

Fig 6(b). Blockiness can be observed in MX (right) [10]

Fig 6(c). Artificial details can be observed in MX (right) [10]

Fig 6(d). Block artifacts in presence of low contrast background. VP6 performs quite well here [10]

Fig 6(e). Severe degradation with MX (right) in low contrast images [10]

It can be observed that VP6 shows significant gains over the old Sorenson Spark codec used in Flash

MX. With these advantages, VP6 is finding a place in other applications too and is gaining

importance as a coding standard.

This creates the need to find a transcoding technique to convert video from H.264 video coding standard

to VP6 video standard.

Comparison of H.264 and VP6

It would be most interesting to observe how VP6 would fare against H.264. A comparative study of

Hulu’s 360p (VP6 based) and 480p (H.264 based) was done (fig 7). The 360p content is VP6 at 700kbps

with a screen resolution of 480×360, while the 480p is H.264 at 1000kbps (or 1Mbps) with a resolution of

640×480. Some screenshots of the videos played side by side are shown in figure 7.

Fig 7 (a). Comparison of Hulu’s 360p (VP6 based) and 480p (H.264 based) videos [13]

Fig 7(b). Comparison of Hulu’s 360p (VP6 based) and 480p (H.264 based) videos [13]

It can be observed that H.264 at 480p offers better quality than VP6 at 360p. However, even at the

lower resolution and a much lower bitrate, VP6 shows no visible loss of information in the images.

It also shows less blockiness. The color resolution at 480p is significantly better than at the lower resolution.

Another observation, on 5-second clips in QuickTime (H.264) at 640 x 480 and Flash (VP6) at 720 x 540, shows

that at similar resolutions, VP6 can give very high compression gains with insignificant loss in visual

quality. Snapshots from each of the clips are shown in figure 8. The size of the .flv clip (5 s) is 610 kbytes,

whereas the size of the QuickTime clip (5 s) is 4223 kbytes [14].

It can be observed that VP6 gives significant compression gain with very little loss of visual quality, making

it an excellent choice for video streaming applications.
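To put the two clip sizes quoted from [14] on a common scale, the short calculation below converts them to average bitrates and a size ratio. It is only an illustration: it assumes 1 kbyte = 1000 bytes and that both clips last exactly 5 seconds, and the function name is hypothetical.

```python
def average_bitrate_kbps(size_kbytes, duration_s):
    """Average bitrate in kbit/s, assuming 1 kbyte = 1000 bytes."""
    return size_kbytes * 8 / duration_s

flv_kbytes, mov_kbytes, duration_s = 610, 4223, 5

flv_rate = average_bitrate_kbps(flv_kbytes, duration_s)   # VP6 clip
mov_rate = average_bitrate_kbps(mov_kbytes, duration_s)   # H.264 clip
size_ratio = mov_kbytes / flv_kbytes                      # how many times larger the H.264 clip is

print(f"VP6: {flv_rate:.0f} kbit/s, H.264: {mov_rate:.0f} kbit/s, ratio: {size_ratio:.1f}x")
```

Under these assumptions the VP6 clip averages just under 1 Mbit/s and is roughly seven times smaller than the QuickTime clip. The two clips were not necessarily encoded with comparable settings, so the ratio describes these files rather than the codecs in general.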

Fig 8(a). 720x540 Flash clip – significantly smaller file size [14]

Fig 8(b). 640x480 H.264 Clip on Quicktime [14]

Existing Research work

A transcoding technique to convert from the previous generation H.263 standard to VP6 standard has

been proposed [15]. The transcoder has been designed on the basis of the similarities and dissimilarities

between the two standards. Comparison can be found in table 2.

Table 2. Comparison of H.263 and VP6 features [15]

This research is particularly important considering that the older Sorenson Spark codec used

in Flash MX was based on the H.263 standard. With the increasing importance of VP6 in streaming

media over the internet, this algorithm is especially relevant. It is also useful for

converting old Flash video formats into the new VP6-based video formats. The transcoding algorithms reuse

the information from the H.263 decoding stage and accelerate the VP6 encoding stage. Experimental

results show that the proposed algorithms are able to reduce the encoding complexity by up to 52% while

reducing the PSNR by at most 0.42 dB in the worst case [15].

The goal is to effectively reuse the information gathered during the H.263 decoding stage and speed up

the VP6 encoding stage. The effectiveness of this reuse depends on the similarities and differences

between the input and output video formats. The differences between H.263 and VP6 make transform-domain

transcoding complex, so the authors employed pixel-domain transcoding [15].

Transcoder H.263 to VP6

VP6 is also a hybrid codec that uses motion compensated transform coding at its core. The codec has

Intra and Inter pictures similar to MPEG video codecs. Intra pictures are coded independent of other

coded pictures and Inter pictures use previously coded pictures for prediction. Motion compensation

supports 16x16 and 8x8 blocks similar to H.263 but the Inter 8x8 macro blocks can have mixed blocks;

i.e., one or more 8x8 blocks can be coded in Intra mode without using any prediction. The Inter MBs in

VP6 can be coded using 9 different modes. The modes are characterized by the number of motion vectors

(1 vs. 4), reference frame used, and whether motion vectors are coded. Where motion vectors are not coded,

the motion vectors are predicted from previously decoded MBs. The VP6 codec uses an 8x8 integer DCT for

transform coding, and a de-blocking filter is applied at the block boundaries [15].

It can be observed that many features in VP6 are different from H.263 but are similar to H.264. A

comparison between the two standards is presented again later.

The similarities and differences between H.263 and VP6 provide opportunities for reusing H.263 MB

coding mode details for reducing the transcoder complexity. The fact that both H.263 and VP6 support 1

MV and 4 MV modes means that motion vectors can be reused to some extent. However, the fact that

VP6 supports a larger number of MB modes than H.263 means that the H.263 MB modes and motion

vectors cannot be used directly. The differences between the codecs mean that an Inter 16x16 MB in H.263 is

not necessarily coded as an Inter 16x16 MB in VP6. Table 3 shows a typical example of MB coding modes

when encoding H.263 decoder output using VP6. For this example, a Foreman video sequence at

352x288 resolution and 297 frames is encoded using H.263 at 384 Kbps and then transcoded to VP6

using full re-encoding at 291 Kbps. The full details of VP6 modes are not given here due to space

considerations. In brief, Nearest and Near MB modes do not code motion vectors and derive their MVs

from previously coded MBs; Golden frames are long term reference frames, and Inter 0,0 forces the use

of a 0,0 motion vector. Each row corresponds to an H.263 MB coding mode and the columns give the VP6

mode used to code those MBs. For example, of all the MBs that are coded as Inter 4V in H.263, 3% were

coded as Inter 0,0 mode, 1% coded as Intra, 30% coded as Inter+MV, 11% nearest, 7% near, and 47% are

coded as Inter 4V MBs. Thus, if an Inter 4V MB in H.263 is mapped directly to Inter 4V in VP6, the mapping

is correct in only about half (47%) of the cases. Direct mode mapping will therefore lead to poor results, and more

efficient algorithms are necessary [15].
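The Inter 4V example above can be checked with a few lines of Python. The percentages are the ones quoted in the text from [15]; the dictionary layout and function name are illustrative, and only this one row of Table 3 is reproduced.

```python
# VP6 modes chosen (in percent) for MBs that were coded as Inter 4V in H.263,
# as quoted in the text from [15].
inter4v_row = {
    "Inter 0,0": 3,
    "Intra": 1,
    "Inter+MV": 30,
    "Nearest": 11,
    "Near": 7,
    "Inter 4V": 47,
}

def direct_mapping_hit_rate(row, same_mode):
    """Fraction of MBs for which copying the input mode matches the encoder's choice."""
    total = sum(row.values())  # 99 here, because the published figures are rounded
    return row[same_mode] / total

hit_rate = direct_mapping_hit_rate(inter4v_row, "Inter 4V")
print(f"Direct Inter 4V mapping is right for {hit_rate:.0%} of MBs")
```

More than half of these MBs would receive a mode the full encoder would not have chosen, which is why [15] constrains the set of candidate modes rather than copying the mode directly.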

Table 3. MB mode mapping H.263 to VP6 in [15]

The large mismatch of MB coding modes will create poor RD performance if direct mapping of motion

vectors is used. In [15], the authors evaluate patterns that allow them to restrict the VP6 modes tried for each H.263 mode. Near and

Nearest are computationally inexpensive to evaluate and are allowed in all cases. Inter 4V, on the other

hand, takes significant computation and is evaluated only when input MB is also in the Inter 4V mode.

The transcoding algorithms thus reduce the complexity by placing constraints on MB modes evaluated

and further reduce the complexity by using:

1) Dynamic search range and 2) Dynamic search window.
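The mode constraints described above can be sketched as a small lookup. Only two rules come from the text: Nearest and Near are always evaluated, and Inter 4V is evaluated only when the input MB is Inter 4V. The remaining modes in `ASSUMED_BASE_MODES` are a hypothetical placeholder, since the full restriction patterns of [15] are not reproduced in this report.

```python
ALWAYS_ALLOWED = {"Nearest", "Near"}  # cheap to evaluate, allowed in all cases (from the text)

# Hypothetical: modes assumed to be tried for every input MB in this sketch.
ASSUMED_BASE_MODES = {"Inter 0,0", "Inter+MV", "Intra"}

def candidate_vp6_modes(h263_mode):
    """Return the set of VP6 MB modes to evaluate for one H.263 input MB."""
    modes = set(ALWAYS_ALLOWED) | set(ASSUMED_BASE_MODES)
    if h263_mode == "Inter 4V":
        # Inter 4V is expensive, so it is evaluated only for Inter 4V input MBs.
        modes.add("Inter 4V")
    return modes

print(sorted(candidate_vp6_modes("Inter")))
print(sorted(candidate_vp6_modes("Inter 4V")))
```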

Complexity Reduction Using Dynamic Search Range

The dynamic search range approach sets the search range used for motion estimation for each MB.

Typically this range is fixed throughout the encoding process and is set to 15 in the experiments. With the

knowledge of motion vectors in H.263, the search range no longer has to be fixed. The search range is

changed based on the maximum motion vector component for the current MB. Figure 9 shows the

dynamic search range selection based on H.263 motion vectors. The RD performance is compared to the

baseline transcoder. The results for three of the sequences evaluated are shown and the performance of

the algorithm closely tracks the RD performance of the baseline transcoder. The PSNR drop is higher for

the Stefan sequence because of large motion in the sequence [15].
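A minimal sketch of the dynamic search range idea follows. The text states only that the range follows the maximum motion vector component of the current MB and that the fixed baseline range is 15; the exact rule of Figure 9(a) is not reproduced here, so the added `margin` and the fallback for MBs without motion vectors are assumptions.

```python
DEFAULT_RANGE = 15  # fixed search range of the baseline encoder, as stated in the text

def dynamic_search_range(h263_mvs, margin=1, max_range=DEFAULT_RANGE):
    """Pick a per-MB search range from the H.263 motion vectors.

    h263_mvs: list of (mvx, mvy) tuples in integer pixels (1 or 4 vectors per MB).
    margin:   assumed extra slack around the largest component (hypothetical).
    """
    if not h263_mvs:
        # No motion information (e.g. an intra MB): fall back to the full range.
        return max_range
    largest = max(max(abs(mvx), abs(mvy)) for mvx, mvy in h263_mvs)
    return min(largest + margin, max_range)

print(dynamic_search_range([(2, -5)]))    # small motion gives a small range
print(dynamic_search_range([(20, 3)]))    # large motion is clamped to the default
```

Since full-search cost grows with the square of the range, shrinking the range for low-motion MBs removes most candidate positions, which is where the complexity reduction comes from.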

Fig 9(a). Dynamic Search Range [15]

Fig 9(b). Dynamic Search Window [15]

Complexity Reduction Using Dynamic Search Window

Using a dynamic refinement window further reduces the complexity by reusing the H.263 motion vectors.

Unlike the dynamic search range method where window location is fixed and the window size or search

range is varied, the dynamic search window approach uses the H.263 motion vectors to determine the

position of the fixed sized window. Window sizes of 1x1 and 3x3 for the new motion vector search were

evaluated by the authors (fig 9(b)). This approach reduced the complexity more than the dynamic range

approach due to an even smaller search space. This reduction in complexity comes at a slight increase in

PSNR loss. Figure 9(b) shows the dynamic window derived based on the H.263 motion vectors of an MB.

Figure 10(b) shows an RD plot comparing the dynamic window approach to the baseline approach [15].
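The dynamic search window can be illustrated with a small refinement search centred on the incoming H.263 motion vector. This is a sketch under simplifying assumptions, not the implementation of [15]: it works at integer-pel precision only, uses the sum of absolute differences (SAD) as the cost, and all function names are hypothetical. A window of 1 tests only the H.263 vector itself; a window of 3 tests it and its eight neighbours.

```python
def sad(cur, ref, x, y, mvx, mvy, size):
    """Sum of absolute differences between the current block and a displaced reference block."""
    total = 0
    for j in range(size):
        for i in range(size):
            total += abs(cur[y + j][x + i] - ref[y + j + mvy][x + i + mvx])
    return total

def refine_mv(cur, ref, x, y, h263_mv, window=3, size=16):
    """Search a window x window grid of offsets centred on the H.263 motion vector.

    Returns (best_cost, (mvx, mvy)), or None if no candidate lies inside the frame.
    """
    half = window // 2
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-half, half + 1):
        for dx in range(-half, half + 1):
            mvx, mvy = h263_mv[0] + dx, h263_mv[1] + dy
            # Skip candidates that would read outside the reference frame.
            inside = (0 <= x + mvx and x + mvx + size <= width and
                      0 <= y + mvy and y + mvy + size <= height)
            if not inside:
                continue
            cost = sad(cur, ref, x, y, mvx, mvy, size)
            if best is None or cost < best[0]:
                best = (cost, (mvx, mvy))
    return best
```

With a 3x3 window only nine positions are tested per block, compared with 31 x 31 positions for a full search at range 15, which is consistent with the larger complexity reduction and the slightly higher PSNR loss reported for this approach.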

In [15] the TMN 3.2 H.263 encoder from University of British Columbia which is based on Telenor's

H.263 implementation was used. The input video is coded at 384 Kbps in baseline profile with advanced

motion options and one I frame (first frame). A decoder based on the same H.263 implementation is used

in the decoding stage of the transcoder. The VP6 encoding stage is based on the optimized VP6 encoder

software provided by On2 Technologies. The VP6 video is encoded with I frame frequency of 120 and at

multiple bitrates to assess the RD performance of the transcoder. The results are compared with the

baseline transcoder that performs full encoding in the VP6 stage.

Fig 10(a). RD performance -Dynamic Search [15]

Fig 10(b). RD performance - Dynamic Window [15]

The results show that the proposed transcoder is able to reduce the complexity by more than 50% without

a significant loss in PSNR. Given that the VP6 implementation used is highly optimized, the resulting

savings of 50% is considered significant. Transcoders based on this approach will be able to transcode at

least 50% more streams for the same hardware configuration.

Comparison of H.264 with the current research work

The authors in [15] show a comparison between H.263 baseline profile and VP6 codec. The similarities

and dissimilarities in the two codecs help design the right transcoder for the application.

On the same lines, a similar comparison is provided in Table 4. It compares the VP6 features with H.264

baseline features. Certain features in H.264 which are available in Main and High profiles of H.264 are

not included here. It can be observed that there are a lot of similarities between the VP6 and H.264

baseline profile, especially in the features where H.264 differs from other codecs. VP6 supports the use of

integer DCT. It also has a deblocking filter like H.264 and supports ¼ pixel accuracy in the motion vectors.

Feature | H.263 Baseline | VP6 | H.264 Baseline
Picture type | I, P | I, P | I, P
Transform size | 8x8 | 8x8 | 4x4
Transform | DCT | Integer DCT | Integer DCT
Intra prediction | None | None | Yes
Motion compensation block size | 16x16, 8x8 | 16x16, 8x8 | 16x16, 16x8, 8x16, 8x8, 8x4, 4x8, 4x4
Total MB modes | 4 | 10 | 7 inter + (9 + 4) intra
Motion vectors | ½ pixel | ¼ pixel | ¼ pixel
Deblocking filter | None | Yes | Yes
Reference frames | 1 | Max 2 | Multiple

Table 4. Comparison of features in H.263 Baseline profile, VP6 and H.264 Baseline profile

Various Transcoding techniques and their applications in H.264 transcoding

A review paper on various techniques and research issues (fig 11) [16] involved in video transcoding

compares the Open-Loop and Closed Loop Transcoder architectures.

Fig 11. Selection of transcoding function for various applications [16]

Open-Loop Transcoding architecture

Open-Loop transcoding Architecture is the most straightforward transcoding architecture. Here a decoder

and encoder are directly cascaded as shown in figure 12(a). The incoming video stream is fully decoded

and re-encoded into the target video with the desired bit rate or format, so there is little degradation in visual

quality due to transcoding. However, decoding of a transcoded video would result in errors if the

predictors of the decoder are different from those in the original encoder. These errors would accumulate

through the whole group of pictures (GOP). The error accumulation resulting from encoder / decoder

predictor mismatch is called ‘drift’ error.

Open loop transcoders contain no feedback loop in the transcoding architecture for compensating the drift

error. Closed-Loop transcoders contain a feedback loop in the transcoding architecture in order to correct

the transcoding distortion by compensating the drift in the transcoder [16] [17].

Fig 12(a). Cascaded decoder and encoder transcoder [16]

Fig 12(b). Cascaded decoder and encoder transcoder [16]

Hybrid Domain Closed-Loop Transcoding Architecture

Various transcoding algorithms provide tradeoff between the computational complexity and reconstructed

video quality. In order to reduce the computational complexity while maintaining the reconstructed video

quality, ME should be omitted and DCT/IDCT should be avoided if possible. One such architecture uses

MC for P frames only. I frames are intra coded, which need no ME and MC, and thus, IDCT/DCT for I

frames can be omitted in principle. But since I frames are the anchors for subsequent P and B frames, the

IDCT at the decoder stage, inverse quantization and IDCT at the encoder stage for I frames are still

needed to reconstruct the reference frames, while DCT at the encoder stage can be omitted. Since P

frames are also the anchors for the following P and B frames, MC, DCT, and IDCT cannot be omitted.

Transcoding delay can be further reduced without degrading the video quality in this architecture. P

frames with frequent scene changes and rapid motion may contain a large number of INTRA blocks. One

can further omit the IDCT/DCT and MC operation of these INTRA blocks in P frames. In other words,

blocks of I and B pictures and INTRA blocks of P pictures are transcoded in the frequency domain, while

spatial-domain motion compensation is done only when the block is an inter block in a P frame. This

transcoding architecture is known as hybrid domain transcoding architecture (HDTA), as shown in Fig.

13.
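The rule that closes the description above (which blocks stay in the frequency domain and which need spatial-domain motion compensation) can be written as a small decision function. This is a sketch of that rule only; the function name and the string labels are illustrative.

```python
def transcoding_domain(picture_type, block_is_intra):
    """Return the processing domain for one block under the HDTA rule in the text.

    picture_type:   "I", "P" or "B"
    block_is_intra: True if the block is INTRA coded
    """
    if picture_type in ("I", "B"):
        return "frequency"
    if picture_type == "P":
        # Only inter blocks of P pictures need spatial-domain motion compensation.
        return "frequency" if block_is_intra else "spatial"
    raise ValueError(f"unknown picture type: {picture_type!r}")

for ptype, intra in [("I", True), ("P", True), ("P", False), ("B", False)]:
    print(ptype, "intra" if intra else "inter", "->", transcoding_domain(ptype, intra))
```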

Heterogeneous Transcoder

A heterogeneous transcoder provides conversion between various standards (fig 14).

A heterogeneous transcoder needs a syntax conversion module, and may change the picture type, picture

resolution, directionality of MVs, and picture rate. A heterogeneous transcoder must adjust the features of

the incoming video to enable the features of the outgoing video. Due to spatial-temporal subsampling,

and different encoding format of the output sequence, the encoder and decoder motion compensation

loops in a heterogeneous transcoder are more complex [17].

Fig 13. Hybrid domain closed-loop transcoder [16]

Generic Heterogeneous Transcoder

A generic heterogeneous transcoder is shown in Fig 14. In this architecture, syntax conversion (SC) is

needed to convert the syntax of source video to that of the target video. A higher resolution decoder

decodes the incoming bitstream. The extracted MVs are then post-processed according to the desired

output encoding structure, and if required, they are properly scaled down to suit the lower spatial-

temporal resolution encoder. In case post-processing is not sufficient, the extracted MVs are refined to

improve the encoding efficiency. The decoded pictures are accordingly down-sampled spatially or

temporally, and the down-sampled images are encoded with the new MVs. Since the incoming MVs are

re-employed and other encoding decisions, such as macroblock types can be extracted from the incoming

bitstream, the architecture of this transcoder can be further simplified. In this architecture, the MVs of the

incoming bitstream are employed in the outgoing one; the extracted MVs have to be converted to be

compatible with the encoding nature of the output bitstream. Note that the nature of extraction of the MVs

and their usage depend on the picture type. The algorithm assumes the motion between the pictures is

uniform, such that the forward and the reverse MVs are images of each other; or an inter-frame MV is a

scaled version of a larger picture distance and so on. In case no MV is found, one might either use a (0, 0)

MV or in the worst-case encode the underlying macroblock using intra-frame coding. The incoming

motion parameters of a sub GOP of up to multiple frames can produce several candidate MVs for the

outgoing picture. All the MVs estimated are compared, and the one that gives the least coding error in

terms of sum of absolute differences (SAD) can be chosen. The best MV can then be refined to produce

near-optimum results.
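The motion vector reuse steps described above (scale the incoming MVs, compare the candidates, fall back to (0, 0) when none is available) can be sketched as follows. The function names and the cost callback are hypothetical; in a real transcoder the cost would be the SAD of the macroblock predicted with each candidate, and the winner would then be refined.

```python
def scale_mv(mv, num, den):
    """Scale a motion vector by num/den, e.g. 1/2 for 2:1 spatial down-sampling.

    The result is rounded to integer precision with Python's built-in round().
    """
    return (round(mv[0] * num / den), round(mv[1] * num / den))

def choose_mv(candidates, cost_fn, fallback=(0, 0)):
    """Pick the candidate MV with the lowest cost (e.g. SAD).

    When no candidate is available, the (0, 0) vector is used, as suggested in the text.
    """
    if not candidates:
        return fallback
    return min(candidates, key=cost_fn)

# Usage with a toy cost function standing in for a real SAD computation.
incoming = [(8, -6), (4, 0)]
scaled = [scale_mv(mv, 1, 2) for mv in incoming]      # 2:1 down-sampling
toy_cost = lambda mv: abs(mv[0] - 3) + abs(mv[1])     # hypothetical cost
best = choose_mv(scaled + [(0, 0)], toy_cost)
```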

Fig 14. Heterogeneous video transcoder [16]

Analysis of current topic based on available literature

The main issues in transcoding H.264 to or from other standards arise from the differences between H.264

and previous generation standards. VP6 has many features which are similar to H.264 (table 4).

One of the important aspects of H.264 is the use of an integer discrete cosine transform instead of the

conventional DCT. DCT-based codecs suffer residual losses because precision is lost when the results are

rounded to integers. This has been overcome in H.264. Like H.264, VP6 also uses an integer DCT [15]

(table 4). The main issue with the block transform is the use of a 4x4 integer DCT in

H.264 versus an 8x8 integer DCT in VP6.

In [24] a method for 8x8 DCT block conversion (from an MPEG-2 video stream) to 4x4 integer DCT

block used in H.264/AVC is proposed. Instead of using IDCT and DCT blocks in cascade, DCT

conversion can be performed in the DCT domain (fig 15). This reduces the computational complexity

significantly, as shown in table 5. A similar approach can be used in the current scenario to perform the

conversion in the DCT domain itself.

Fig 15. DCT block conversion in DCT domain compared to a cascade pixel domain transcoder [24]

Table 5. Reduction in number of operations on using proposed method as shown in fig 15 [24]

M = multiplication operation; A = addition operation

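The DCT conversion can be obtained in several steps, as shown below.

$B_i = L_i B R_i$, where $B$ is the 8x8 DCT matrix and $B_i$ ($i = 0, 1, 2, 3$) are 4x4 matrices

$L_0 = L_1 = (I_{4\times4},\ 0_{4\times4})_{4\times8}$, $L_2 = L_3 = (0_{4\times4},\ I_{4\times4})_{4\times8}$

$R_0 = R_2 = (I_{4\times4};\ 0_{4\times4})_{8\times4}$, $R_1 = R_3 = (0_{4\times4};\ I_{4\times4})_{8\times4}$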

Using the distributive property of the DCT

If H is the matrix used for getting the integer DCT from DCT, we have

However, to get the H.264 coefficients we need the modified H matrix, H’

For modified H matrix H’, we have

So the H.264 transform coefficients can be obtained as below

Thus the 4x4 integer DCT coefficient matrix used in the H.264 standard is obtained from an 8x8 DCT.

A similar technique can be used to get 4x4 H.264 integer DCT from 8x8 VP6 integer DCT with slight

change.
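The distributive property used in this derivation can be checked numerically. The sketch below is not the method of [24]: it uses the real-valued orthonormal DCT at both sizes and leaves out the H and H’ matrices that map to the H.264 integer transform. It shows that the 4x4 DCT of each sub-block can be computed directly from the 8x8 DCT coefficients as $(T_4 L_i T_8^T)\, X_8\, (T_8 R_i T_4^T)$, where $T_4$ and $T_8$ are DCT matrices and $X_8$ is the 8x8 coefficient block. These symbols and all function names are introduced here for illustration.

```python
import math

def dct_matrix(n):
    """Orthonormal n x n DCT-II matrix."""
    rows = []
    for k in range(n):
        scale = math.sqrt((1 if k == 0 else 2) / n)
        rows.append([scale * math.cos((2 * j + 1) * k * math.pi / (2 * n)) for j in range(n)])
    return rows

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(m):
    return [list(col) for col in zip(*m)]

def left_selector(lower):
    """4x8 matrix (I, 0) for the upper half or (0, I) for the lower half."""
    off = 4 if lower else 0
    return [[1.0 if c == r + off else 0.0 for c in range(8)] for r in range(4)]

T4, T8 = dct_matrix(4), dct_matrix(8)

def selectors(i):
    """L_i (4x8) and R_i (8x4) for sub-block i = 0..3, as defined in the text."""
    return left_selector(i >= 2), transpose(left_selector(i % 2 == 1))

def subblock_dct_pixel_domain(x, i):
    """Reference path: cut sub-block i out of the pixels, then apply the 4x4 DCT."""
    l, r = selectors(i)
    xi = matmul(matmul(l, x), r)
    return matmul(matmul(T4, xi), transpose(T4))

def subblock_dct_transform_domain(x8, i):
    """Direct path from 8x8 DCT coefficients; no inverse transform is applied."""
    l, r = selectors(i)
    left = matmul(matmul(T4, l), transpose(T8))    # 4x8, can be precomputed once
    right = matmul(matmul(T8, r), transpose(T4))   # 8x4, can be precomputed once
    return matmul(matmul(left, x8), right)

pixels = [[(7 * r + 3 * c) % 17 for c in range(8)] for r in range(8)]
coeffs8 = matmul(matmul(T8, pixels), transpose(T8))  # 8x8 DCT of the block
max_error = max(
    abs(a - b)
    for i in range(4)
    for row_a, row_b in zip(subblock_dct_pixel_domain(pixels, i),
                            subblock_dct_transform_domain(coeffs8, i))
    for a, b in zip(row_a, row_b)
)
```

Both paths agree to floating-point precision. The saving comes from precomputing the `left` and `right` matrices once per sub-block position, so that no IDCT followed by a DCT is needed at run time.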

Also, the presence of the deblocking filter in H.264 is a common issue that is considered in various

transcoding techniques. VP6 also supports a deblocking filter [15], so a comparative study of the

deblocking filters in H.264 and VP6 is required. The unavailability of the VP6 standard definition and

source code due to licensing problems delays the study. The applicability of the H.264 deblocking filter

to VP6 transcoding will be investigated.

H.264 baseline profile does not support B frames. So absence of B frames in VP6 standard does not come

up as an issue as the present basis of study is the conversion of H.264 baseline profile to VP6 standard.

H.264 supports multiple reference frames whereas VP6 supports up to 2 reference frames [15]. It would be

interesting to study the reuse of the reference frames and selection of up to a maximum of 2 reference

frames. Research in [18] shows that the use of multiple reference frames and the use of quarter pel

accuracy achieve similar RD-results. It is observed that it is not necessary to use multiple reference

frames if quarter-pel accuracy interpolation is used.

Like H.264 and unlike most other codecs, VP6 allows 1 or 4 motion vectors per MB with up to quarter-pixel

resolution. However, the difference in block sizes and the large number of block size combinations

make it difficult to reuse the motion vectors. The techniques used in [15] for H.263 to VP6

transcoding can be useful for searching the motion vectors based on the available motion vectors, thereby

enabling complexity reduction. The dynamic search window and dynamic search range techniques

used in [15] to reuse the MV information when encoding VP6 were discussed earlier. The research described in

[19] and [20] also provides a basis of making decision on MB modes and motion vectors in the context of

the present problem. [20] explains block type conversion and motion vector mapping as shown in the next

section. It discusses the transcoding from H.264 to MPEG-4. A similar approach can be used in the

context of the current problem.

Block Type Conversion and Motion Vector Mapping

Performing brute-force ME and mode decision for each MB causes a transcoder to have high

computational complexity. To reduce this computational complexity, the incoming motion vectors are

used for motion vector mapping. In the given transcoder in [20], the MPEG-4 encoder utilizes the motion

vectors and MB information contained in each MB in the H.264 bitstream. Table 6 lists the MB modes in

H.264 and MPEG-4 and how they are converted when a pixel domain cascade transcoder is used.

Table 6. MB mode conversions observed in cascaded pixel domain H.264 to MPEG-4 transcoder [20]

Fig 15. Block type conversion and motion vector mapping from H.264 to MPEG-4 [20]

This information is used to decide the MB mode conversion in [20]. Fig 15 shows the conversion criteria
and the resulting conversion of MB modes from H.264 to MPEG-4. Similar decision criteria can be used in
the proposed transcoder.
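
A block type conversion of this kind could be sketched for VP6 as below. The mapping is an assumption made for illustration, not the criteria of Table 6 or Fig 15 in [20]: it spreads the motion vectors of an H.264 macroblock partition over the four 8x8 blocks of a macroblock and then chooses between a one-vector and a four-vector mode. The function names and the labels `one_mv` and `four_mv` are hypothetical and are not VP6 bitstream names.

```python
def map_partitions_to_8x8(mode, mvs):
    """Spread H.264 partition vectors over four 8x8 blocks.

    Blocks are in raster order: top-left, top-right, bottom-left, bottom-right.
    mode: H.264 macroblock partition type; mvs: one (dx, dy) per partition.
    """
    if mode == "16x16":
        return [mvs[0]] * 4
    if mode == "16x8":  # mvs = [top, bottom]
        return [mvs[0], mvs[0], mvs[1], mvs[1]]
    if mode == "8x16":  # mvs = [left, right]
        return [mvs[0], mvs[1], mvs[0], mvs[1]]
    if mode == "8x8":  # mvs already in raster order
        return list(mvs)
    raise ValueError(f"unsupported partition type: {mode}")


def choose_vp6_mode(block_mvs):
    """Use a single vector when all four 8x8 blocks share the same vector."""
    if all(mv == block_mvs[0] for mv in block_mvs):
        return "one_mv", [block_mvs[0]]
    return "four_mv", list(block_mvs)


blocks = map_partitions_to_8x8("16x8", [(2, 0), (4, 0)])
print(choose_vp6_mode(blocks))
```

Sub-partitions smaller than 8x8 are left out; they would first have to be merged into one vector per 8x8 block, for example by averaging.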

H.264 supports intra prediction, as shown in figure 3, which is not supported in VP6 or in most other
standards. According to the study in [18], however, the most probable intra-coding modes in H.264 are
vertical, horizontal and DC. This information can be leveraged in designing the transcoder.

The available references and study of various transcoding algorithms will help design the transcoder to

convert H.264 video to VP6 video.

As the license agreement is completed and the algorithm for the VP6 codec becomes available, comparison
between H.264 and VP6 will be easier. A new transcoding algorithm can then be proposed by using the
results available in the literature and drawing inferences on how to apply the various techniques to the
present problem.

VP6 is a proprietary codec of On2 Technologies, Inc. It is licensed by Adobe Systems, Inc. for Flash 8
and later versions. The Multimedia Laboratory, Electrical Engineering Department, University of Texas at
Arlington, is in the process of acquiring an evaluation license for VP6 from On2 Technologies, Inc. for
research on the H.264 to VP6 transcoder.

References:

1. S. Kwon, A. Tamhankar and K. R. Rao, “Overview of H.264 / MPEG-4 Part 10”, J VCIR, vol 17,

pp 186-216, April 2006

2. I. Richardson, V-Codex, “White Paper – An overview of H.264 Advanced Video Coding”,

www.vcodex.com, 2007

3. Apple Inc., “Technology Brief – Quicktime and MPEG-4”, http://www.apple.com, 2008

4. ITU-T Recommendation H.264 – Advanced Video Coding for Generic Audio-Visual services

5. G. J. Sullivan, P. Topiwala and A. Luthra, “The H.264/AVC Advanced Video Coding Standard:

Overview and Introduction to the Fidelity Range Extensions”, SPIE Conference on Applications of

Digital Image Processing XXVII, vol 5558, pp 53-74, Special Session on Advances in the New

Emerging Standard: H.264/AVC, August, 2004

6. I. Richardson, V-Codex, “White Paper - H.264 / MPEG-4 Part 10 : Transform & Quantization”, 2007,

www.vcodex.com.

7. I. Richardson, V-Codex, “White Paper - H.264 / MPEG-4 Part 10 : Inter Prediction”, 2007,

www.vcodex.com.

8. I. Richardson, V-Codex, “White Paper - H.264 / MPEG-4 Part 10 : Intra Prediction”, 2007,

www.vcodex.com.

9. I. Richardson, V-Codex, “White Paper - H.264 / MPEG-4 Part 10 : Intra Prediction – Loop Filter”,

2007, www.vcodex.com.

10. On2 Technologies, Inc., “White Paper – On2 VP6 for Flash 8 Video”, http://www.On2.com,

September 12, 2005

11. J. Emigh, “New Flash Player rises in the Web-Video Market”, IEEE Computer, vol 39, pp 14-16, 2006

12. T. Uro, “The quest for a new video codec in Flash 8”,

http://www.kaourantin.net/2005/08/quest-for-new-videocodec-in-flash-8.html, August 13, 2005

13. A. Beach, Real World Video Compression, realworldvideocompression.com.

14. A. Hall, alexandtia.com.

15. C. Holder and H. Kalva, “H.263 to VP6 Video Transcoder”, SPIE, vol 6822 (VCIP), pp 68222B-68222B,

San Jose, CA, Jan 2008

16. I. Ahmad, et al, “Video Transcoding: An Overview of Various Techniques and Research Issues”,

IEEE Transactions on Multimedia, vol 7, pp 793-804, October 2005

17. J. Xin, C. Lin and M. Sun, “Digital Video Transcoding”, Proceedings of the IEEE, Vol 93, pp 84-96,

January 2005

18. J. Bialkowski, M. Barkowsky and A. Kaup, “Overview of Low-Complexity Video Transcoding from

H.263 to H.264”, IEEE Conference on Multimedia and Expo 2006, vol 9, pp 49-52, July 2006

19. S. Kim, J. Han and J. Kim, “Efficient Motion Estimation Algorithm for MPEG-4 to H.264

Transcoder”, IEEE Conference on Image Processing, ICIP 2005, vol 3, pp 656-659, September 2005

20. J. Hur and Y. Lee, “H.264 to MPEG-4 Transcoding using Block-Type Information”, IEEE Region 10

TENCON 2005, pp 1-6, November 2005

21. S. Eckart and C. Fogg, “ISO-IEC MPEG-2 software video codec”, SPIE Proceedings, vol. 2419, pp

100-109, Oct 2004

22. J. Loomis and M. Wasson, “VC-1 Technical Overview”,

http://www.microsoft.com/windows/windowsmedia/howto/articles/vc1techoverview.aspx, Microsoft

Corporation, Oct 2007

23. “Real Video 10 – Technical Overview, version 1.0”, Real Networks,

http://docs.real.com/docs/rn/rv10/RV10_Tech_Overview.pdf, 2003

24. J. Lee and K. Chung, “DCT Block Conversion for H.264/AVC Video Transcoding”, Euro-Par 2005,

LNCS 3648, pp 919-927, 2005
