EE 5359
Multimedia Processing – Summer 2008
Interim Project Report
on
H.264 to VP6 Transcoder
Submitted by
Jay R. Padia
1000 60 5145
Date: July 17, 2008
Abstract
VP6 is a video coding standard developed by On2 Technologies, Inc. It is the preferred codec for
Macromedia Flash 8 video, and it has gained importance as Macromedia Flash has become a widely
adopted technology for video streaming over the Internet. H.264 is currently one of the most widely accepted
video coding standards in the industry and enables high-quality video at low bitrates. Techniques that
convert video from H.264 to VP6 are therefore increasingly important, because they enable high-quality
video transmission over the Internet using Flash. Existing research shows that H.263 video (H.263 is the
previous-generation standard to H.264) can be transcoded to VP6 with the encoding complexity reduced by
up to 50%. That work exploits the similarities and dissimilarities between the two encoders to reduce the
complexity using a Dynamic Search Range and a Dynamic Search Window. The success in reducing complexity
in the H.263 to VP6 transcoder, together with the available literature on transcoding algorithms, motivates
a new study to find an algorithm for transcoding from the H.264 coding standard to the VP6 coding
standard. The proposed work will explore the similarities and dissimilarities between the two standards to
find the right transcoding technique.
Importance of the H.264 Standard
H.264 [4] was standardized by the Joint Video Team (JVT) of the ITU-T Video Coding Experts Group
(VCEG) and the ISO/IEC Moving Picture Experts Group (MPEG) in 2003. It is currently one of the most
widely accepted industry standards. It provides good-quality video at substantially lower bitrates than
previous standards and is also more robust to errors [1] [2].
H.264 combines a set of innovations that together provide a large improvement in performance over
previous generations of video codecs. MPEG-2 [21] was the most widely used video codec before the
emergence of H.264. H.264 provides the same quality as MPEG-2 at one third to one half of the data rate. At
the same data rate, H.264 can deliver up to 4 times the frame size, as can be seen in Table 1. H.264 also
degrades gracefully when pushed to its limits: instead of breaking into blocks, the image becomes softer as
compression increases. As an emerging standard, H.264 can be expected to improve over the years, just as
other standards have improved in quality and performance [3].
Table 1. H.264 data rate at various resolutions [3]
Overview of H.264 Standard
H.264 introduces many new features that differ significantly from previous-generation codecs and make it
much more efficient. An overview of the main features of the H.264 video codec is given below.
Profiles and levels
Like any comprehensive standard, H.264 defines a set of profiles and levels that set points of
conformance for various classes of applications and services. Each profile permits a specific set of
encoding tools chosen to best meet the needs of the intended scenario. H.264 includes six profiles, as shown
in figure 1 [4]:
• Baseline. Intended for low-complexity applications such as video conferencing and mobile multimedia.
• Main. Intended for the majority of general uses such as the Internet, mobile multimedia, and stored
content.
• Extended. Intended for streaming applications, where stream switching technologies can be beneficial.
• Three High profiles (also known as Fidelity Range Extension or FRExt). Consists of three separate
High profiles (High, High 10, and High 4:2:2), intended for high-end professional uses [3] [5].
Fig 1. H.264 profile levels [3]
4x4 integer transform.
H.264 is designed to operate on much smaller blocks of pixels (4x4) than other common codecs (8x8), which
mitigates blocking, smearing, and ringing artifacts, so H.264 video preserves fine detail well. Because the
transform is a precisely specified integer transform, it provides bit-exact reconstruction (that is,
exact-match decoding) rather than reconstruction that only matches the encoder within a statistical
tolerance. As a result, there can be no drift among decoder implementations, and any compliant H.264
decoder decodes the video exactly as the content author intended [3] [6].
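The forward 4x4 transform can be written as
$$ Y = \left( C_f X C_f^{T} \right) \otimes E_f $$
where X = input matrix; C_f X C_f^T = core 2-D transform of X; E_f = matrix formed by scaling factors a, b, c;
and ⊗ denotes element-by-element multiplication.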
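The core step uses only integer additions, subtractions and shifts, which is what makes exact-match decoding possible. A minimal Python sketch of the core step C_f X C_f^T is given below. C_f is the standard H.264 4x4 forward core matrix. The E_f post-scaling, normally merged into quantization, is omitted, and the helper names are illustrative only.

```python
# H.264 4x4 forward core transform matrix C_f (integer entries only)
CF = [
    [1, 1, 1, 1],
    [2, 1, -1, -2],
    [1, -1, -1, 1],
    [1, -2, 2, -1],
]


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m):
    return [list(row) for row in zip(*m)]


def core_transform(x):
    """Return C_f X C_f^T for a 4x4 residual block X (the E_f scaling is omitted)."""
    return matmul(matmul(CF, x), transpose(CF))


# A flat block puts all of its energy into the DC coefficient (16 * value).
flat = [[5] * 4 for _ in range(4)]
print(core_transform(flat)[0][0])  # 80
```

Because every entry of C_f is an integer, the result is an exact integer on any platform. This is the property that rules out encoder/decoder mismatch.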
Increased precision in motion estimation.
H.264 also benefits from increased precision in motion estimation, the process of finding how blocks are
displaced between frames so that redundancy across a series of frames can be removed. By representing motion
vectors at 1/4-pixel resolution (fig 2) rather than the 1/2-pixel resolution of most earlier codecs, H.264
represents both fast- and slow-moving scenes more precisely. Objects in motion are therefore reconstructed
more crisply during decoding, giving a better representation of the source material [7].
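The sketch below shows, in one dimension and assuming 8-bit samples, how H.264 luma sub-pixel samples are formed:

* Half-pixel samples come from the 6-tap filter (1, -5, 20, 20, -5, 1), with rounding and clipping.
* Quarter-pixel samples are the rounded average of the nearest integer-pixel and half-pixel samples.

The 2-D case (filtering along columns and the centre half-pixel position) is omitted. The function names are hypothetical.

```python
TAPS = (1, -5, 20, 20, -5, 1)  # H.264 6-tap luma half-sample filter


def clip8(v):
    """Clip to the 8-bit sample range."""
    return max(0, min(255, v))


def half_pel(row, i):
    """Half-sample value between row[i] and row[i + 1] (needs row[i - 2] .. row[i + 3])."""
    acc = sum(t * row[i - 2 + k] for k, t in enumerate(TAPS))
    return clip8((acc + 16) >> 5)  # divide by 32 with rounding


def quarter_pel(row, i):
    """Quarter-sample value between row[i] and the half-sample to its right."""
    return (row[i] + half_pel(row, i) + 1) >> 1


ramp = [0, 10, 20, 30, 40, 50]
print(half_pel(ramp, 2), quarter_pel(ramp, 2))  # 25 23
```

The clipping matters because the negative taps can overshoot near sharp edges.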
Fig 2. Motion vectors in H.264 [7]
Flexible block sizes in motion estimation.
During motion estimation, traditional codecs commonly process frames at the macroblock level (16 pixels
by 16 pixels). H.264 can also divide a macroblock into smaller partitions, ranging in size from 16x16 down to
4x4 as shown in fig 3, which helps to code complex motion in areas of high detail. Because H.264 can
perform its processing on a variety of block sizes, scenes with complicated motion are described more
accurately, giving higher quality at lower data rates [7].
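The cost of this flexibility is side information. Every partition carries its own motion vector, so one macroblock can need from 1 to 16 motion vectors per reference list. The sketch below counts them for a given partitioning. Its names are hypothetical.

```python
# Motion vectors contributed by each macroblock partition (8x8 is handled separately)
MB_PARTITIONS = {"16x16": 1, "16x8": 2, "8x16": 2}
# Motion vectors contributed by each sub-partition of one 8x8 block
SUB_PARTITIONS = {"8x8": 1, "8x4": 2, "4x8": 2, "4x4": 4}


def motion_vector_count(mb_partition, sub_partitions=None):
    """Motion vectors (per reference list) for one macroblock.

    sub_partitions gives the split of each of the four 8x8 blocks and is only
    used when the macroblock itself is split into 8x8 partitions.
    """
    if mb_partition != "8x8":
        return MB_PARTITIONS[mb_partition]
    if sub_partitions is None:
        sub_partitions = ["8x8"] * 4
    if len(sub_partitions) != 4:
        raise ValueError("an 8x8-partitioned macroblock has exactly four 8x8 blocks")
    return sum(SUB_PARTITIONS[s] for s in sub_partitions)


print(motion_vector_count("16x8"))                               # 2
print(motion_vector_count("8x8", ["4x4", "8x8", "8x4", "4x8"]))  # 9
print(motion_vector_count("8x8", ["4x4"] * 4))                   # 16
```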
Fig 3(a). Macroblock partitions – 16x16, 16x8, 8x16 & 8x8 [7]
Fig 3(b). Macroblock sub-partitions – 8x8, 8x4, 4x8 & 4x4 [7]
Intraframe prediction.
H.264 gains much of its efficiency by removing redundant data not only across a series of frames, but also
within a single frame, using a technique called intraframe prediction (figure 4). The H.264 encoder can
predict each block from neighboring, previously coded pixels in more ways than earlier codecs, so it
compresses details and gradients better. Intraframe prediction is also beneficial in high-motion areas,
which are traditionally difficult to encode; with H.264, high-motion video can achieve much better quality
at lower data rates [3] [8].
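To make intraframe prediction concrete, the sketch below forms three of the nine H.264 4x4 luma prediction modes: vertical, horizontal and DC. Each is built from the reconstructed pixels above and to the left of the block. The sketch assumes all neighbours are available; the standard defines fallbacks for the DC mode at picture edges. The function name is illustrative.

```python
def predict_4x4(mode, above, left):
    """Predict a 4x4 block from its reconstructed neighbours.

    above: the 4 pixels directly above the block; left: the 4 pixels to its left.
    Only three of the nine 4x4 luma modes are covered.
    """
    if mode == "vertical":      # mode 0: copy the row above downwards
        return [list(above) for _ in range(4)]
    if mode == "horizontal":    # mode 1: copy the left column across
        return [[left[y]] * 4 for y in range(4)]
    if mode == "dc":            # mode 2: rounded mean of the 8 neighbours
        dc = (sum(above) + sum(left) + 4) >> 3
        return [[dc] * 4 for _ in range(4)]
    raise ValueError(f"mode not covered by this sketch: {mode}")


print(predict_4x4("dc", [10, 20, 30, 40], [50, 60, 70, 80])[0])  # [45, 45, 45, 45]
```

The encoder codes only the residual between the actual block and the chosen prediction.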
Fig 4. 4x4 block intra prediction modes in H.264 [8]
Adaptively tuned deblocking filter.
H.264 also features an adaptive deblocking filter (figure 5), which operates on 4x4 block boundaries to
remove blocking artifacts. Its strength is adaptively tuned for each block boundary, making it a very
effective smoothing filter when decoding a finished bit stream. Besides producing smoother pictures for
display, the filter is applied inside the encoding loop, so that subsequent frames are predicted from a
cleaner reference picture, which improves image quality. This filter effectively removes blocking
artifacts, resulting in a smooth, clean picture [9].
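The adaptivity comes from a per-edge decision. A step across a block boundary is smoothed only when it is small enough to be a coding artifact rather than a real image edge. The sketch below shows this decision and the p0/q0 update of the normal (boundary strength < 4) luma filter. It makes the following simplifications:

* The thresholds alpha and beta and the clipping limit tc are passed in directly. The standard derives them from QP-indexed tables and the boundary strength.
* The p1/q1 updates and the strong filter are omitted.

```python
def clip3(lo, hi, v):
    return max(lo, min(hi, v))


def filter_edge(p1, p0, q0, q1, alpha, beta, tc):
    """Filter one line of samples across a block edge (p side | q side).

    Returns the new (p0, q0). alpha, beta and tc are inputs here; in H.264 they
    come from QP-indexed tables and the boundary strength.
    """
    # Leave the samples alone if the step looks like a real edge, not an artifact
    if abs(p0 - q0) >= alpha or abs(p1 - p0) >= beta or abs(q1 - q0) >= beta:
        return p0, q0
    delta = clip3(-tc, tc, ((q0 - p0) * 4 + (p1 - q1) + 4) >> 3)
    return clip3(0, 255, p0 + delta), clip3(0, 255, q0 - delta)


print(filter_edge(60, 60, 70, 70, alpha=15, beta=5, tc=2))  # (62, 68): blocky step smoothed
print(filter_edge(40, 40, 70, 70, alpha=15, beta=5, tc=2))  # (40, 70): strong edge kept
```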
Fig 5(a). H.264 Encoder – Basic encoding structure
Fig 5(b). H.264 Decoder – Basic decoding structure
VP6 Coding standard
TrueMotion VP6 [10] is a compression technology from On2 Technologies Inc. Macromedia licensed it for its
Flash suite of products [12], and it is the main video codec in Flash 8 and later versions. It is designed to
give very good quality at high compression.
According to On2 Technologies [10], TrueMotion VP6 is among the best video codecs on the market, offering
better image quality and faster decoding than Windows Media 9 [22], Real 9 [23], H.264 [4], and QuickTime
MPEG-4 [10]. In On2's internal testing, TrueMotion VP6 beat many H.264 implementations, Windows Media 9
and RealNetworks 10 in PSNR comparisons using standard MPEG-2 test clips [10]. The VP6 clips were more
detailed and contained fewer artifacts than Windows Media 9, and maintained more texture and detail than
Real or H.264 [10].
VP6.2, the latest version of TrueMotion VP6, significantly improves performance over previous versions of
VP6 [10].
Emerging Importance of VP6 Coding Standard
Flash Video is rapidly changing the landscape of video on the Web. It is emerging as the preferred
solution for online video services, ahead of Windows Media Player, Apple QuickTime and RealNetworks
RealPlayer [11].
The advantages of Flash Player over its rivals are its small size and its completeness as a website
development package, and its support for multiple platforms has made it popular [11].
Macromedia adopted the VP6 coding standard from On2 Technologies, Inc. as the video coding standard
for its Flash Player in 2005. It cited quality, portability, stability, low memory usage and performance as
the main criteria for selecting VP6 [12].
VP6 in Flash 8 gives a significant quality improvement over the Sorenson Spark codec (based on H.263),
which was the basis of Flash MX video (fig 6). It performs better with low-contrast video images, avoids
color oversaturation, and gives a smoother picture that is truer to the original by removing the blockiness
of the old format [10].
Improvement in Performance on using VP6
Figure 6 compares the performance of Flash Video using VP6 with Flash MX, the older version, which used
the Sorenson Spark codec based on H.263.
The images in Figure 6 (with the exception of the cartoons) are excerpts from a 12 min 30 s video of
coral reef exploration. The original source was shot on DVCAM and stored using Photo-JPEG
compression. The only tool used to compress this video was Flix Professional, with default settings.
The file was preprocessed as follows: since the source came directly from a camera, some over-scan had to
be cropped out of the 720x486 DV source. It was also de-interlaced and resized to 320x240. All
preprocessing was performed in Flix Professional.
In all the comparisons listed, the image on the left side is from VP6 video.
Fig 6(a). Over-saturation of colors in MX (right). [10]
Fig 6(b). Blockiness can be observed in MX (right) [10]
Fig 6(c). Artificial details can be observed in MX (right) [10]
Fig 6(d). Block artifacts in presence of low contrast background. VP6 performs quite well here [10]
Fig 6(e). Severe artifacts with MX (right) in low-contrast images [10]
VP6 clearly shows significant gains over the older Sorenson Spark codec used in Flash MX. With these
advantages, VP6 is also finding a place in other applications and is gaining importance as a coding
standard.
This creates the need for a technique to transcode video from the H.264 video coding standard to the VP6
video coding standard.
Comparison of H.264 and VP6
It is instructive to see how VP6 fares against H.264. A comparative study of Hulu's 360p (VP6-based) and
480p (H.264-based) video was carried out [13]. The 360p content is VP6 at 700 kbps with a resolution of
480×360, while the 480p content is H.264 at 1000 kbps (1 Mbps) with a resolution of 640×480. Screenshots
of the two videos played side by side are shown in figure 7.
Fig 7 (a). Comparison of Hulu’s 360p (VP6 based) and 480p (H.264 based) videos [13]
Fig 7(b). Comparison of Hulu’s 360p (VP6 based) and 480p (H.264 based) videos [13]
H.264 at 480p offers better quality than VP6 at 360p. However, even at the lower resolution and a much lower
bitrate, VP6 preserves the image content well and shows less blockiness. The color rendition of the 480p
H.264 video is significantly better than that of the lower-resolution VP6 video.
Another comparison, of a 5-second clip in QuickTime (H.264, 640 x 480) and Flash (VP6, 720 x 540), shows
that at similar resolutions VP6 can achieve much higher compression with little visible loss in quality.
Snapshots from each of the clips are shown in figure 8. The 5 s .flv clip is 610 kbytes, whereas the 5 s
QuickTime clip is 4223 kbytes [14].
VP6 thus gives a significant compression gain at a very small loss of visual quality, making it an excellent
choice for video streaming applications.
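As a rough check of the quoted file sizes, the average bitrates of the two 5 s clips can be computed. This assumes 1 kbyte = 1000 bytes and treats each file as video only:

$$ R_{\mathrm{FLV}} = \frac{610 \times 8\ \text{kbit}}{5\ \text{s}} = 976\ \text{kbit/s}, \qquad R_{\mathrm{QT}} = \frac{4223 \times 8\ \text{kbit}}{5\ \text{s}} \approx 6757\ \text{kbit/s} $$

The QuickTime clip is therefore about 4223/610 ≈ 6.9 times larger. This holds even though the VP6 clip has the larger frame size: 720 x 540 = 388,800 pixels versus 640 x 480 = 307,200 pixels.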
Fig 8(a). 720x540 Flash clip – significantly smaller file size [14]
Fig 8(b). 640x480 H.264 Clip on Quicktime [14]
Existing Research work
A technique for transcoding from the previous-generation H.263 standard to the VP6 standard has been
proposed in [15]. The transcoder is designed on the basis of the similarities and dissimilarities between the
two standards, which are compared in table 2.
Table 2. Comparison of H.263 and VP6 features [15]
This research is particularly important because the older Sorenson Spark codec used in Flash MX was
based on the H.263 standard. With the increasing importance of VP6 for streaming media over the
Internet, the algorithm is useful for converting old Flash video into the newer VP6-based format. The
transcoding algorithms reuse the information from the H.263 decoding stage to accelerate the VP6 encoding
stage. Experimental results show that the proposed algorithms reduce the encoding complexity by up to 52%
while reducing the PSNR by at most 0.42 dB in the worst case [15].
The goal is to reuse the information gathered during the H.263 decoding stage effectively and to speed up
the VP6 encoding stage. How effective this reuse is depends on the similarities and differences between
the input and output video formats. Because the differences between H.263 and VP6 make transform-domain
transcoding complex, the authors used pixel-domain transcoding [15].
Transcoder H.263 to VP6
VP6 is also a hybrid codec that uses motion-compensated transform coding at its core. Like MPEG video
codecs, it has Intra and Inter pictures: Intra pictures are coded independently of other pictures, and Inter
pictures use previously coded pictures for prediction. Motion compensation supports 16x16 and 8x8 blocks
as in H.263, but Inter 8x8 macroblocks can contain mixed blocks; i.e., one or more 8x8 blocks can be coded
in Intra mode without using any prediction. Inter MBs in VP6 can be coded using 9 different modes,
characterized by the number of motion vectors (1 vs. 4), the reference frame used, and whether motion
vectors are coded. When motion vectors are not coded, they are predicted from previously decoded MBs. The
VP6 codec uses an 8x8 integer DCT for transform coding, and a deblocking filter is applied at the block
boundaries [15].
Many features of VP6 differ from H.263 but are similar to H.264. A comparison of VP6 with H.264 is
presented later (Table 4).
The similarities and differences between H.263 and VP6 provide opportunities to reuse H.263 MB coding-mode
details to reduce the transcoder complexity. Because both H.263 and VP6 support 1 MV and 4 MV modes,
motion vectors can be reused to some extent. However, because VP6 supports many more MB modes than
H.263, the H.263 MB mode and motion vectors cannot be used directly: an Inter 16x16 MB in H.263 is not
necessarily coded as an Inter 16x16 MB in VP6. Table 3 shows a typical example of the MB coding modes
chosen when the H.263 decoder output is encoded with VP6. In this example, the Foreman sequence at
352x288 resolution (297 frames) is encoded using H.263 at 384 kbps and then transcoded to VP6 using full
re-encoding at 291 kbps. The full details of the VP6 modes are not given here due to space considerations.
In brief, the Nearest and Near MB modes do not code motion vectors but derive their MVs from previously
coded MBs; Golden frames are long-term reference frames; and Inter 0,0 forces the use of a (0,0) motion
vector. Each row corresponds to an H.263 MB coding mode, and the columns give the VP6 modes used to
code those MBs. For example, of all the MBs coded as Inter 4V in H.263, 3% were coded as Inter 0,0, 1% as
Intra, 30% as Inter+MV, 11% as Nearest, 7% as Near, and 47% as Inter 4V. Thus, if an Inter 4V MB in H.263
is mapped to Inter 4V in VP6, the mapping is correct in only about half of the cases. Direct mode mapping
therefore leads to poor results, and more efficient algorithms are necessary [15].
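Statistics like those in Table 3 can be gathered in two steps. First, fully re-encode the decoded H.263 video with VP6. Then record, for each MB, the pair (H.263 mode, chosen VP6 mode). A minimal sketch of the row-normalised tabulation is given below. The function names and mode strings are illustrative. The data in the usage example is synthetic, not the data of [15].

```python
from collections import Counter, defaultdict


def mode_mapping_table(pairs):
    """Row-normalised mode mapping, in percent.

    pairs: iterable of (input_mode, output_mode), one per macroblock, e.g. the
    H.263 mode of an MB and the VP6 mode chosen when fully re-encoding it.
    Returns {input_mode: {output_mode: percent}}.
    """
    counts = defaultdict(Counter)
    for src, dst in pairs:
        counts[src][dst] += 1
    table = {}
    for src, row in counts.items():
        total = sum(row.values())
        table[src] = {dst: 100.0 * n / total for dst, n in row.items()}
    return table


def direct_mapping_hit_rate(table, mode):
    """Percentage of MBs in `mode` that copying the mode would map correctly."""
    return table.get(mode, {}).get(mode, 0.0)


pairs = [("Inter 4V", "Inter 4V")] * 3 + [("Inter 4V", "Nearest")] + [("Inter", "Inter+MV")] * 2
table = mode_mapping_table(pairs)
print(table["Inter 4V"])                           # {'Inter 4V': 75.0, 'Nearest': 25.0}
print(direct_mapping_hit_rate(table, "Inter 4V"))  # 75.0
```

The diagonal entry of each row is the success rate of direct mode mapping. For the Inter 4V row of Table 3, that entry is 47%.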
Table 3. MB mode mapping H.263 to VP6 in [15]
The large mismatch in MB coding modes would give poor RD performance if the modes and motion vectors
were mapped directly. In [15], the observed mapping patterns are used to restrict the VP6 modes evaluated for
each MB. Near and Nearest are computationally inexpensive to evaluate and are allowed in all cases. Inter
4V, on the other hand, takes significant computation and is evaluated only when the input MB is also in the
Inter 4V mode. The transcoding algorithms thus reduce the complexity by constraining the MB modes
evaluated, and reduce it further by using:
1) a dynamic search range and 2) a dynamic search window.
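The mode restriction can be sketched as a candidate filter. Only two rules come from [15]:

* Near and Nearest are always evaluated.
* Inter 4V is evaluated only when the H.263 MB is Inter 4V.

Keeping every other mode as a candidate is an assumption of this sketch. Golden-frame modes are left out for brevity.

```python
# VP6 macroblock modes named in the discussion of Table 3 (golden-frame modes omitted)
VP6_MODES = ["Inter 0,0", "Intra", "Inter+MV", "Nearest", "Near", "Inter 4V"]


def candidate_vp6_modes(h263_mode):
    """VP6 modes to evaluate for one MB, given the MB's H.263 coding mode."""
    candidates = []
    for mode in VP6_MODES:
        if mode == "Inter 4V" and h263_mode != "Inter 4V":
            continue  # expensive 4-MV search: only tried if the input MB used 4 MVs
        # Near/Nearest are cheap and always kept; keeping the rest is an assumption
        candidates.append(mode)
    return candidates


print(candidate_vp6_modes("Inter"))  # ['Inter 0,0', 'Intra', 'Inter+MV', 'Nearest', 'Near']
```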
Complexity Reduction Using Dynamic Search Range
The dynamic search range approach sets the motion estimation search range individually for each MB.
Typically this range is fixed throughout the encoding process; it is set to 15 in the experiments. With the
knowledge of the H.263 motion vectors, the search range no longer has to be fixed: it is set based on the
maximum motion vector component of the current MB. Figure 9(a) shows the dynamic search range
selection based on H.263 motion vectors. The RD performance is compared with that of the baseline
transcoder. For the three sequences shown, the RD performance of the algorithm closely tracks that of the
baseline transcoder. The PSNR drop is higher for the Stefan sequence because of the large motion in that
sequence [15].
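A minimal sketch of the idea is given below, assuming integer-pixel motion vectors and a SAD cost. The exact mapping from the H.263 motion vector to the search range in [15] (fig 9(a)) is not reproduced. The rule used here is only an illustration: the largest absolute MV component plus a hypothetical margin of 2, capped at the fixed range of 15. All function names are hypothetical.

```python
import random


def sad(cur, ref, bx, by, dx, dy, n):
    """SAD between the n x n block at (bx, by) in `cur` and the block
    displaced by (dx, dy) in `ref`."""
    return sum(abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
               for y in range(n) for x in range(n))


def dynamic_search_range(h263_mv, max_range=15, margin=2):
    """Hypothetical rule: largest incoming MV component plus a margin, capped."""
    return min(max_range, max(abs(h263_mv[0]), abs(h263_mv[1])) + margin)


def full_search(cur, ref, bx, by, n, search_range):
    """Integer-pel full search in +/- search_range around (0, 0).
    Returns (best_sad, (dx, dy))."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            # skip candidate blocks that would fall outside the reference frame
            if not (0 <= bx + dx <= width - n and 0 <= by + dy <= height - n):
                continue
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if best is None or cost < best[0]:
                best = (cost, (dx, dy))
    return best


# Usage: a random reference frame, and a current frame whose content is the
# reference shifted so that the true displacement is (3, -2).
rng = random.Random(0)
ref = [[rng.randrange(256) for _ in range(48)] for _ in range(48)]
cur = [[ref[min(max(y - 2, 0), 47)][min(max(x + 3, 0), 47)] for x in range(48)] for y in range(48)]
r = dynamic_search_range((3, -2))                # 5 instead of the fixed 15
print(r, full_search(cur, ref, 16, 16, 16, r))  # 5 (0, (3, -2))
```

With a range of 5 the search visits 11 x 11 = 121 positions instead of 31 x 31 = 961.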
Fig 9(a). Dynamic Search Range [15]
Fig 9(b). Dynamic Search Window [15]
Complexity Reduction Using Dynamic Search Window
A dynamic search window reduces the complexity further by reusing the H.263 motion vectors. Unlike the
dynamic search range method, where the window location is fixed and the window size (search range) is
varied, the dynamic search window approach uses the H.263 motion vectors to determine the position of a
fixed-size window. The authors evaluated window sizes of 1x1 and 3x3 for the new motion vector search
(fig 9(b)). Because the search space is even smaller, this approach reduces the complexity more than the
dynamic range approach, at the cost of a slightly larger PSNR loss. Figure 9(b) shows the dynamic window
derived from the H.263 motion vectors of an MB, and Figure 10(b) shows an RD plot comparing the dynamic
window approach with the baseline approach [15].
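In contrast to the dynamic range approach, the window below has a fixed size and is centred on the incoming H.263 motion vector. Setting half_width = 0 gives the 1x1 window, where the H.263 vector is reused as is. Setting half_width = 1 gives the 3x3 window. This sketch assumes integer-pixel vectors and a SAD cost, and its function names are hypothetical.

```python
import random


def sad(cur, ref, bx, by, dx, dy, n):
    """SAD between the n x n block at (bx, by) in `cur` and the block
    displaced by (dx, dy) in `ref`."""
    return sum(abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
               for y in range(n) for x in range(n))


def window_search(cur, ref, bx, by, n, center_mv, half_width=1):
    """Search a (2 * half_width + 1)^2 window centred on the incoming MV.
    Returns (best_sad, (dx, dy))."""
    height, width = len(ref), len(ref[0])
    cx, cy = center_mv
    best = None
    for dy in range(cy - half_width, cy + half_width + 1):
        for dx in range(cx - half_width, cx + half_width + 1):
            if not (0 <= bx + dx <= width - n and 0 <= by + dy <= height - n):
                continue  # candidate block outside the reference frame
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if best is None or cost < best[0]:
                best = (cost, (dx, dy))
    return best


rng = random.Random(0)
ref = [[rng.randrange(256) for _ in range(48)] for _ in range(48)]
cur = [[ref[min(max(y - 2, 0), 47)][min(max(x + 3, 0), 47)] for x in range(48)] for y in range(48)]
# The incoming MV (2, -2) is one pixel off; the 3x3 window recovers (3, -2).
print(window_search(cur, ref, 16, 16, 16, (2, -2), half_width=1))  # (0, (3, -2))
```

Only 9 (or 1) SAD evaluations are needed per MB, which is why this approach saves more than the dynamic range. The cost is that it cannot recover from an incoming vector that is far off.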
In [15], the TMN 3.2 H.263 encoder from the University of British Columbia, which is based on Telenor's
H.263 implementation, was used. The input video is coded at 384 kbps in the baseline profile with advanced
motion options and a single I frame (the first frame). A decoder based on the same H.263 implementation is
used in the decoding stage of the transcoder. The VP6 encoding stage is based on the optimized VP6 encoder
software provided by On2 Technologies. The VP6 video is encoded with an I-frame interval of 120 frames and
at multiple bitrates to assess the RD performance of the transcoder. The results are compared with the
baseline transcoder, which performs full encoding in the VP6 stage.
Fig 10(a). RD performance - Dynamic Search Range [15]
Fig 10(b). RD performance - Dynamic Window [15]
The results show that the proposed transcoder reduces the complexity by more than 50% without a
significant loss in PSNR. Since the VP6 implementation used is already highly optimized, savings of 50% are
significant. Transcoders based on this approach should be able to handle at least 50% more streams on the
same hardware configuration.
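The stream-count claim can be made explicit under simple, assumed conditions:

* The transcoder is compute-bound.
* A fraction f of each stream's processing time is spent in VP6 encoding.
* That part is reduced by a fraction r.

The number of streams per machine then scales by

$$ S = \frac{1}{1 - f\,r} $$

With r = 0.5, S = 2 when encoding dominates (f ≈ 1). At least 50% more streams (S ≥ 1.5) requires f ≥ 2/3, since 1/(1 - 0.5 f) ≥ 1.5 exactly when f ≥ 2/3.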
Comparison of H.264 with the current research work
The authors of [15] compare the H.263 baseline profile with the VP6 codec; the similarities and
dissimilarities between the two codecs help in designing the right transcoder for the application.
Along the same lines, Table 4 compares the VP6 features with the H.264 baseline profile features. Features
of H.264 that are available only in the Main and High profiles are not included. There are many similarities
between VP6 and the H.264 baseline profile, especially in the features where H.264 differs from other
codecs: VP6 uses an integer DCT, has a deblocking filter like H.264, and supports ¼-pixel accuracy in the
motion vectors.
Feature                          H.263 Baseline   VP6            H.264 Baseline
Picture type                     I, P             I, P           I, P
Transform size                   8x8              8x8            4x4
Transform                        DCT              Integer DCT    Integer DCT
Intra prediction                 None             None           Yes
Motion compensation block size   16x16, 8x8       16x16, 8x8     16x16, 16x8, 8x16, 8x8, 8x4, 4x8, 4x4
Total MB modes                   4                10             7 inter + (9 + 4) intra
Motion vectors                   ½ pixel          ¼ pixel        ¼ pixel
Deblocking filter                None             Yes            Yes
Reference frames                 1                Max 2          Multiple
Table 4. Comparison of features in H.263 Baseline profile, VP6 and H.264 Baseline profile
Various Transcoding techniques and their applications in H.264 transcoding
A review paper [16] on the techniques and research issues involved in video transcoding (fig 11) compares
the open-loop and closed-loop transcoder architectures.
Fig 11. Selection of transcoding function for various applications [16]
Open-Loop Transcoding architecture
The open-loop transcoding architecture is the most straightforward transcoding architecture. Here a decoder
and an encoder are directly cascaded, as shown in figure 12(a). The incoming video stream is fully decoded
and re-encoded into the target video with the desired bit rate or format, so transcoding itself causes little
degradation in visual quality. However, decoding the transcoded video results in errors if the predictors of
the decoder differ from those in the original encoder. These errors accumulate through the whole group of
pictures (GOP). The error accumulation resulting from this encoder/decoder predictor mismatch is called
‘drift’ error.
Open-loop transcoders contain no feedback loop to compensate for the drift error. Closed-loop transcoders
contain a feedback loop that corrects the transcoding distortion by compensating for the drift in the
transcoder [16] [17].
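Drift can be illustrated with a deliberately simplified scalar model. Each "frame" is a single number, P frames are coded as the difference from the previous frame, and the transcoder requantizes with a coarser step. The two variants behave differently:

* In the open-loop version, the incoming residuals are requantized independently, so the requantization errors accumulate at the decoder.
* In the closed-loop version, the residual is recomputed against the transcoder's own reconstruction. This feeds the error back and keeps it bounded.

This toy model and its function names are assumptions for illustration, not the architectures of [16] or [17].

```python
def quantize(value, step):
    """Uniform quantizer (Python's round() rounds halves to even)."""
    return step * round(value / step)


def open_loop(frames, step):
    """Requantize each incoming residual on its own: no feedback of the error."""
    recon, prev_in, prev_out = [], None, None
    for f in frames:
        if prev_in is None:
            out = quantize(f, step)                        # I frame
        else:
            out = prev_out + quantize(f - prev_in, step)   # residual from the input stream
        recon.append(out)
        prev_in, prev_out = f, out
    return recon


def closed_loop(frames, step):
    """Recompute each residual against the transcoder's own reconstruction."""
    recon, prev_out = [], None
    for f in frames:
        if prev_out is None:
            out = quantize(f, step)                        # I frame
        else:
            out = prev_out + quantize(f - prev_out, step)  # error is fed back
        recon.append(out)
        prev_out = out
    return recon


frames = [0, 3, 6, 9, 12, 15]   # a slowly brightening "pixel" over one GOP
print(open_loop(frames, 5))     # [0, 5, 10, 15, 20, 25]: error grows by 2 per frame
print(closed_loop(frames, 5))   # [0, 5, 5, 10, 10, 15]: error stays within step / 2
```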
Fig 12(a). Cascaded decoder and encoder transcoder [16]
Fig 12(b). Cascaded decoder and encoder transcoder [16]
Hybrid Domain Closed-Loop Transcoding Architecture
Transcoding algorithms trade off computational complexity against reconstructed video quality. To reduce
the computational complexity while maintaining the reconstructed video quality, ME should be omitted and
DCT/IDCT should be avoided where possible. One such architecture performs MC for P frames only. I frames
are intra coded and need no ME or MC, so the IDCT/DCT for I frames can in principle be omitted. However,
since I frames are the anchors for subsequent P and B frames, the IDCT at the decoder stage and the inverse
quantization and IDCT at the encoder stage are still needed for I frames to reconstruct the reference frames,
while the DCT at the encoder stage can be omitted. Since P frames are also anchors for the following P and
B frames, MC, DCT and IDCT cannot be omitted for them.
Transcoding delay can be reduced further in this architecture without degrading the video quality. P frames
with frequent scene changes and rapid motion may contain a large number of INTRA blocks, and the
IDCT/DCT and MC operations can also be omitted for these INTRA blocks. In other words, blocks of I and B
pictures and INTRA blocks of P pictures are transcoded in the frequency domain, and spatial-domain motion
compensation is performed only for inter blocks in P frames. This transcoding architecture is known as the
hybrid domain transcoding architecture (HDTA), shown in Fig. 13.
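The per-block choice of processing domain in HDTA reduces to a simple rule. It is sketched below with a hypothetical function name, with picture types given as "I", "P" or "B".

```python
def hdta_domain(picture_type, is_intra_block):
    """Domain in which a block is transcoded in the hybrid-domain architecture."""
    if picture_type == "P" and not is_intra_block:
        # Inter blocks of P pictures serve as anchors for later pictures and
        # need spatial-domain motion compensation to avoid drift.
        return "spatial"
    # Blocks of I and B pictures, and INTRA blocks of P pictures.
    return "frequency"


for ptype, intra in [("I", True), ("P", True), ("P", False), ("B", False)]:
    print(ptype, "intra" if intra else "inter", "->", hdta_domain(ptype, intra))
```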
Heterogeneous Transcoder
A heterogeneous transcoder converts between different standards (fig 14).
A heterogeneous transcoder needs a syntax conversion module, and may change the picture type, picture
resolution, directionality of MVs, and picture rate. It must adapt the features of the incoming video to the
features of the outgoing video. Because of spatio-temporal subsampling and the different encoding format
of the output sequence, the encoder and decoder motion compensation loops in a heterogeneous transcoder
are more complex [17].
Fig 13. Hybrid domain closed-loop transcoder [16]
Generic Heterogeneous Transcoder
A generic heterogeneous transcoder is shown in Fig 14. In this architecture, syntax conversion (SC) converts
the syntax of the source video to that of the target video. A higher-resolution decoder decodes the incoming
bitstream. The extracted MVs are then post-processed according to the desired output encoding structure
and, if required, scaled down to suit the lower spatio-temporal resolution of the encoder. If post-processing
is not sufficient, the extracted MVs are refined to improve the encoding efficiency. The decoded pictures are
down-sampled spatially or temporally as required, and the down-sampled pictures are encoded with the new
MVs. Since the incoming MVs are reused and other encoding decisions, such as macroblock types, can be
extracted from the incoming bitstream, the architecture of this transcoder can be simplified further. Because
the MVs of the incoming bitstream are reused in the outgoing one, the extracted MVs must be converted to
be compatible with the encoding structure of the output bitstream. The way the MVs are extracted and used
depends on the picture type. The algorithm assumes that motion between pictures is uniform, so that, for
example, the forward and backward MVs are mirror images of each other, or an MV over a short picture
distance is a scaled version of the MV over a larger picture distance. If no MV is found, one can either use a
(0, 0) MV or, in the worst case, encode the underlying macroblock using intra-frame coding. The incoming
motion parameters of a sub-GOP spanning multiple frames can produce several candidate MVs for the
outgoing picture. All the candidate MVs are compared, and the one that gives the lowest coding error in
terms of the sum of absolute differences (SAD) is chosen. The best MV can then be refined to produce
near-optimum results.
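The candidate-selection step can be sketched as follows, assuming integer-pixel vectors, a SAD criterion, and uniform motion for MV scaling. The (0, 0) vector is always tried as a fallback. An intra decision is taken if even the best SAD exceeds a threshold. That threshold, the rounding, and all function names are hypothetical, and the final refinement step is omitted.

```python
import random


def sad(cur, ref, bx, by, mv, n):
    dx, dy = mv
    return sum(abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
               for y in range(n) for x in range(n))


def scale_mv(mv, spatial=1.0, temporal=1.0):
    """Scale an incoming MV for a resolution change (spatial) and a change of
    picture distance (temporal), assuming uniform motion; rounds to integer pel."""
    return (round(mv[0] * spatial * temporal), round(mv[1] * spatial * temporal))


def select_mv(cur, ref, bx, by, n, candidates, intra_threshold):
    """Pick the candidate (plus the (0, 0) fallback) with the lowest SAD.
    Returns ("inter", mv), or ("intra", None) if the best SAD is too high."""
    height, width = len(ref), len(ref[0])
    best_mv, best_cost = None, None
    for mv in list(candidates) + [(0, 0)]:
        dx, dy = mv
        if not (0 <= bx + dx <= width - n and 0 <= by + dy <= height - n):
            continue  # candidate points outside the reference frame
        cost = sad(cur, ref, bx, by, mv, n)
        if best_cost is None or cost < best_cost:
            best_mv, best_cost = mv, cost
    if best_cost is None or best_cost > intra_threshold:
        return "intra", None
    return "inter", best_mv


rng = random.Random(0)
ref = [[rng.randrange(256) for _ in range(48)] for _ in range(48)]
cur = [[ref[min(max(y - 2, 0), 47)][min(max(x + 3, 0), 47)] for x in range(48)] for y in range(48)]
# Candidates derived from incoming MVs: one halved for a 2:1 resolution change,
# one halved because it spans two pictures instead of one.
candidates = [scale_mv((6, -4), spatial=0.5), scale_mv((10, 6), temporal=0.5)]
print(candidates)                                                         # [(3, -2), (5, 3)]
print(select_mv(cur, ref, 16, 16, 16, candidates, intra_threshold=1000))  # ('inter', (3, -2))
```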
Fig 14. Heterogeneous video transcoder [16]
Analysis of current topic based on available literature
The main issues in transcoding H.264 to or from other standards arise from the differences between H.264
and previous-generation standards. VP6 has many features that are similar to H.264 (table 4).
One important aspect of H.264 is its use of an integer transform (an integer approximation of the DCT)
instead of the floating-point DCT. In codecs based on the floating-point DCT, precision is lost when
transform results are converted to integers, and different inverse-transform implementations may not match
exactly. H.264 avoids this problem. VP6 also uses an integer DCT like H.264 [15] (table 4). The main issue in
the choice of block transform is therefore that H.264 uses a 4x4 integer DCT whereas VP6 uses an 8x8
integer DCT.
In [24], a method is proposed for converting 8x8 DCT blocks (from an MPEG-2 video stream) into the 4x4
integer DCT blocks used in H.264/AVC. Instead of cascading IDCT and DCT blocks, the conversion is
performed directly in the DCT domain (fig 15), which reduces the computational complexity significantly, as
shown in table 5. A similar approach could be used in the current scenario to perform the conversion in the
DCT domain itself.
Fig 15. DCT block conversion in DCT domain compared to a cascade pixel domain transcoder [24]
Table 5. Reduction in number of operations on using proposed method as shown in fig 15 [24]
M = multiplication operation; A = addition operation
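The DCT conversion can be obtained in several steps, as shown below:
$$ B_i = L_i \, B \, R_i, \qquad i = 0, 1, 2, 3 $$
where B is the 8 x 8 DCT matrix and B_i is a 4 x 4 matrix, with
$$ L_0 = L_1 = \begin{pmatrix} I_{4\times4} & 0_{4\times4} \end{pmatrix}_{4\times8}, \qquad L_2 = L_3 = \begin{pmatrix} 0_{4\times4} & I_{4\times4} \end{pmatrix}_{4\times8} $$
$$ R_0 = R_2 = \begin{pmatrix} I_{4\times4} \\ 0_{4\times4} \end{pmatrix}_{8\times4}, \qquad R_1 = R_3 = \begin{pmatrix} 0_{4\times4} \\ I_{4\times4} \end{pmatrix}_{8\times4} $$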
Using the distributive property of the DCT
If H is the matrix used for getting the integer DCT from DCT, we have
However, to get the H.264 coefficients we need the modified H matrix, H'.
For modified H matrix H’, we have
So the H.264 transform coefficients can be obtained as below
This yields the 4x4 integer DCT coefficient matrix used in the H.264 standard from an 8x8 DCT.
A similar technique, with slight modifications, can be used to obtain the 4x4 H.264 integer DCT from the
8x8 VP6 integer DCT.
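Every step in the chain is a matrix product: the inverse 8x8 DCT, the quadrant selection by L_i and R_i (applied to the inverse-transformed 8x8 block), and the H.264 4x4 core transform. They can therefore be folded into two precomputed 4x8 matrices and applied directly to the 8x8 DCT coefficients. The sketch below checks this against the cascaded pixel-domain path. Its assumptions and omissions:

* It uses the orthonormal floating-point 8x8 DCT (as in MPEG-2) rather than the VP6 integer DCT.
* It uses the unscaled core matrix H.
* The fast factorization and the scaled matrix H' of [24] are not reproduced.
* All names are illustrative.

```python
import math


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m):
    return [list(row) for row in zip(*m)]


# Orthonormal 8-point DCT-II matrix: T8[k][n] = c(k) cos((2n + 1) k pi / 16)
T8 = [[(math.sqrt(1 / 8) if k == 0 else 0.5) * math.cos((2 * n + 1) * k * math.pi / 16)
       for n in range(8)] for k in range(8)]
# H.264 4x4 forward core transform (the scaling that would give H' is omitted)
H = [[1, 1, 1, 1], [2, 1, -1, -2], [1, -1, -1, 1], [1, -2, 2, -1]]
# Row selectors: (I 0) picks rows 0-3, (0 I) picks rows 4-7; R_i are their transposes
L_TOP = [[1 if j == i else 0 for j in range(8)] for i in range(4)]
L_BOTTOM = [[1 if j == i + 4 else 0 for j in range(8)] for i in range(4)]
# Precomputed 4x8 conversion matrices M = H L T8^T
M_TOP = matmul(matmul(H, L_TOP), transpose(T8))
M_BOTTOM = matmul(matmul(H, L_BOTTOM), transpose(T8))


def dct8x8(block):
    return matmul(matmul(T8, block), transpose(T8))


def convert_in_dct_domain(coeffs):
    """8x8 DCT coefficients -> four 4x4 H.264 core-transform blocks (B0..B3)."""
    return [matmul(matmul(row_m, coeffs), transpose(col_m))
            for row_m in (M_TOP, M_BOTTOM) for col_m in (M_TOP, M_BOTTOM)]


def convert_by_cascade(coeffs):
    """Reference path: inverse 8x8 DCT, split into 4x4 blocks, forward core transform."""
    pixels = matmul(matmul(transpose(T8), coeffs), T8)
    blocks = []
    for r0 in (0, 4):
        for c0 in (0, 4):
            sub = [row[c0:c0 + 4] for row in pixels[r0:r0 + 4]]
            blocks.append(matmul(matmul(H, sub), transpose(H)))
    return blocks


block = [[(3 * x + 5 * y) % 17 for x in range(8)] for y in range(8)]
fast = convert_in_dct_domain(dct8x8(block))
slow = convert_by_cascade(dct8x8(block))
# The two paths agree up to floating-point rounding error
print(max(abs(f - s) for bf, bs in zip(fast, slow) for rf, rs in zip(bf, bs) for f, s in zip(rf, rs)))
```

Once M_TOP and M_BOTTOM are precomputed, each 4x4 output block costs two small matrix products on the coefficients. No explicit IDCT is needed.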
The presence of the deblocking filter in H.264 is also a common issue that is considered in various
transcoding techniques. VP6 also supports a deblocking filter [15], so a comparative study of the deblocking
filters in H.264 and VP6 is required. The unavailability of the VP6 specification and source code due to
licensing restrictions delays this study. How the H.264 deblocking filter should be handled in H.264 to VP6
transcoding will be investigated.
The H.264 baseline profile does not support B frames, so the absence of B frames in VP6 is not an issue,
since the present study concerns conversion from the H.264 baseline profile to VP6.
H.264 supports multiple reference frames, whereas VP6 supports up to 2 reference frames [15]. It would be
interesting to study how the reference frames can be reused and how up to a maximum of 2 reference
frames can be selected. Research in [18] shows that using multiple reference frames and using quarter-pel
accuracy achieve similar RD results, so multiple reference frames are not necessary if quarter-pel
interpolation is used.
Like H.264, VP6 allows motion vectors with up to quarter-pixel resolution, with 1 or 4 motion vectors per
MB. However, the difference in block sizes and the large number of block-size combinations in H.264 make it
difficult to reuse the motion vectors. The techniques used in [15] for H.263 to VP6 transcoding, namely the
dynamic window search and dynamic range search discussed earlier, can be useful for searching for motion
vectors around the available motion vectors and thereby reducing complexity. The research described in
[19] and [20] also provides a basis for deciding MB modes and motion vectors in the context of the present
problem. [20], which discusses transcoding from H.264 to MPEG-4, explains block type conversion and
motion vector mapping, as described in the next section. A similar approach can be used for the current
problem.
Block Type Conversion and Motion Vector Mapping
Performing brute-force ME and mode decision for each MB makes a transcoder computationally complex. To
reduce this complexity, the incoming motion vectors are used for motion vector mapping. In the transcoder
of [20], the MPEG-4 encoder uses the motion vectors and MB information contained in each MB of the
H.264 bitstream. Table 6 lists the MB modes in H.264 and MPEG-4 and how they are converted when a
cascaded pixel-domain transcoder is used.
Table 6. MB mode conversions observed in cascaded pixel domain H.264 to MPEG-4 transcoder [20]
Fig 15. Block type conversion and motion vector mapping from H.264 to MPEG-4 [20]
This information is used to decide the MB mode conversion in [20]. Fig 15 shows the conversion criteria
used and the conversion of MB modes from H.264 to MPEG-4. Similar decision criteria can be used in the
proposed transcoder.
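One possible way to map H.264 partitions onto VP6's 1-MV/4-MV structure is sketched below, in the spirit of the block type conversion of [20]. It works in two steps:

1. The partitions covering each 8x8 quadrant are collapsed into one vector by an area-weighted mean.
2. The MB is coded with one vector if all four quadrant vectors agree, and with four otherwise.

The merging rule, the integer-pixel rounding and the function names are assumptions of this sketch. A 16x16 H.264 partition is represented as four quadrants carrying the same vector.

```python
def merge_to_8x8(parts):
    """Collapse the H.264 partitions covering one 8x8 block into a single MV.

    parts: list of ((mvx, mvy), area_in_pixels) covering the 8x8 block.
    Uses an area-weighted mean rounded with Python's round() (half to even);
    a median would be another reasonable choice.
    """
    total = sum(area for _, area in parts)
    mx = sum(mv[0] * area for mv, area in parts) / total
    my = sum(mv[1] * area for mv, area in parts) / total
    return (round(mx), round(my))


def to_vp6_mv_mode(quadrants):
    """quadrants: four part lists (top-left, top-right, bottom-left, bottom-right)."""
    mvs = [merge_to_8x8(parts) for parts in quadrants]
    if all(mv == mvs[0] for mv in mvs):
        return "1MV", [mvs[0]]
    return "4MV", mvs


# An 8x8 block split into two 8x4 partitions whose MVs average to its neighbours' MV
quads = [[((4, 0), 32), ((8, 0), 32)], [((6, 0), 64)], [((6, 0), 64)], [((6, 0), 64)]]
print(to_vp6_mv_mode(quads))  # ('1MV', [(6, 0)])
```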
H.264 supports intra prediction, as shown in figure 3; like most other codecs, however, VP6 does not
support it. According to the study in [18], the most probable H.264 intra-coding modes are vertical,
horizontal and DC. This information can be leveraged in designing
the transcoder.
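For reference, the three modes named above can be sketched for a 4x4 luma block as follows, together with an SAD comparison restricted to them. The predictors follow the H.264 Intra_4x4 vertical, horizontal and DC rules (DC shown only for the case where both the top and left neighbours are available); the helper names and the idea of limiting the mode comparison to these three candidates are illustrative assumptions, not a method taken from [18].

```python
def predict_4x4(mode, top, left):
    """H.264 Intra_4x4 prediction for three of its nine modes.

    top:  the 4 reconstructed pixels directly above the block.
    left: the 4 reconstructed pixels directly to its left.
    Returns a 4x4 list of rows.
    """
    if mode == "vertical":      # copy the row above downwards
        return [list(top) for _ in range(4)]
    if mode == "horizontal":    # copy the left column rightwards
        return [[left[r]] * 4 for r in range(4)]
    if mode == "dc":            # rounded mean of the 8 neighbours
        dc = (sum(top) + sum(left) + 4) >> 3
        return [[dc] * 4 for _ in range(4)]
    raise ValueError(f"unsupported mode: {mode}")


def sad(block, pred):
    """Sum of absolute differences between two 4x4 blocks."""
    return sum(abs(b - p) for brow, prow in zip(block, pred)
               for b, p in zip(brow, prow))


def best_of_three(block, top, left):
    """Pick the cheapest of vertical/horizontal/DC by SAD (illustrative)."""
    costs = {m: sad(block, predict_4x4(m, top, left))
             for m in ("vertical", "horizontal", "dc")}
    return min(costs, key=costs.get), costs
```

For a block whose four rows are all [10, 20, 30, 40], with top neighbours [10, 20, 30, 40] and left neighbours all 10, the SAD costs are 0 (vertical), 240 (horizontal) and 176 (DC, predictor value 18), so vertical is chosen.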
The available references and the study of various transcoding algorithms will help in designing a transcoder
that converts H.264 video to VP6 video.
Once the license agreement is completed and the VP6 codec algorithm is available, comparing H.264
and VP6 will be easier. A new transcoding algorithm can then be proposed by drawing on the results
available in the literature and adapting the various techniques to the present problem.
VP6 is a proprietary codec of On2 Technologies, Inc. It is licensed by Adobe Systems, Inc. for Flash 8
and later versions. The Multimedia Laboratory, Electrical Engineering Department,
University of Texas at Arlington is in the process of acquiring an evaluation license for VP6 from On2
Technologies, Inc. for research on the H.264 to VP6 transcoder.
References:
1. S. Kwon, A. Tamhankar and K. R. Rao, “Overview of H.264 / MPEG-4 Part 10”, J. VCIR, vol. 17,
pp. 186-216, April 2006
2. I. Richardson, V-Codex, “White Paper – An overview of H.264 Advanced Video Coding”,
www.vcodex.com, 2007
3. Apple Inc., “Technology Brief – Quicktime and MPEG-4”, http://www.apple.com, 2008
4. ITU-T Recommendation H.264 – Advanced Video Coding for Generic Audio-Visual services
5. G. J. Sullivan, P. Topiwala and A. Luthra, “The H.264/AVC Advanced Video Coding Standard:
Overview and Introduction to the Fidelity Range Extensions”, SPIE Conference on Applications of
Digital Image Processing XXVII, vol. 5558, pp. 53-74, Special Session on Advances in the New
Emerging Standard: H.264/AVC, August 2004
6. I. Richardson, V-Codex, “White Paper - H.264 / MPEG-4 Part 10: Transform & Quantization”, 2007,
www.vcodex.com.
7. I. Richardson, V-Codex, “White Paper - H.264 / MPEG-4 Part 10: Inter Prediction”, 2007,
www.vcodex.com.
8. I. Richardson, V-Codex, “White Paper - H.264 / MPEG-4 Part 10: Intra Prediction”, 2007,
www.vcodex.com.
9. I. Richardson, V-Codex, “White Paper - H.264 / MPEG-4 Part 10: Intra Prediction – Loop Filter”,
2007, www.vcodex.com.
10. On2 Technologies, Inc., “White Paper – On2 VP6 for Flash 8 Video”, http://www.On2.com,
September 12, 2005
11. J. Emigh, “New Flash Player rises in the Web-Video Market”, IEEE Computer, vol. 39, pp. 14-16, 2006
12. T. Uro, “The quest for a new video codec in Flash 8”, http://www.kaourantin.net/2005/08/quest-for-new-videocodec-in-flash-8.html, August 13, 2005
13. A. Beach, Real World Video Compression, realworldvideocompression.com.
14. A. Hall, alexandtia.com.
15. C. Holder and H. Kalva, “H.263 to VP6 Video Transcoder”, SPIE, vol. 6822 (VCIP), pp. 68222B,
San Jose, CA, Jan. 2008
16. I. Ahmad et al., “Video Transcoding: An Overview of Various Techniques and Research Issues”,
IEEE Transactions on Multimedia, vol. 7, pp. 793-804, October 2005
17. J. Xin, C. Lin and M. Sun, “Digital Video Transcoding”, Proceedings of the IEEE, vol. 93, pp. 84-96,
January 2005
18. J. Bialkowski, M. Barkowsky and A. Kaup, “Overview of Low-Complexity Video Transcoding from
H.263 to H.264”, IEEE Conference on Multimedia and Expo 2006, vol. 9, pp. 49-52, July 2006
19. S. Kim, J. Han and J. Kim, “Efficient Motion Estimation Algorithm for MPEG-4 to H.264
Transcoder”, IEEE Conference on Image Processing, ICIP 2005, vol. 3, pp. 656-659, September 2005
20. J. Hur and Y. Lee, “H.264 to MPEG-4 Transcoding using Block-Type Information”, IEEE Region 10
TENCON 2005, pp. 1-6, November 2005
21. S. Eckart and C. Fogg, “ISO-IEC MPEG-2 software video codec”, SPIE Proceedings, vol. 2419,
pp. 100-109, Oct 2004
22. J. Loomis and M. Wasson, “VC-1 Technical Overview”,
http://www.microsoft.com/windows/windowsmedia/howto/articles/vc1techoverview.aspx, Microsoft
Corporation, Oct 2007
23. “Real Video 10 – Technical Overview, version 1.0”, Real Networks,
http://docs.real.com/docs/rn/rv10/RV10_Tech_Overview.pdf, 2003
24. J. Lee and K. Chung, “DCT Block Conversion for H.264/AVC Video Transcoding”, Euro-Par 2005,
LNCS 3648, pp. 919-927, 2005