US 20050207495A1
(12) Patent Application Publication (10) Pub. No.: US 2005/0207495 A1 (19) United States
Ramasastry et al. (43) Pub. Date: Sep. 22, 2005
(54) METHODS AND APPARATUSES FOR COMPRESSING DIGITAL IMAGE DATA WITH MOTION PREDICTION
(76) Inventors: Jayaram Ramasastry, Woodinville, CA (US); Partho Choudhury, Maharashtra (IN); Ramesh Prasad, Maharashtra (IN)
Correspondence Address: BLAKELY SOKOLOFF TAYLOR & ZAFMAN 12400 WILSHIRE BOULEVARD SEVENTH FLOOR LOS ANGELES, CA 90025-1030 (US)
(21) Appl. No.: 11/076,746
(22) Filed: Mar. 9, 2005
Related U.S. Application Data
(60) Provisional application No. 60/552,153, filed on Mar. 10, 2004. Provisional application No. 60/552,356, filed on Mar. 10, 2004. Provisional application No. 60/552,270, filed on Mar. 10, 2004.
Publication Classification
(51) Int. Cl. H04N 7/12
(52) U.S. Cl. 375/240.16; 375/240.19; 375/240.12; 375/240.11; 375/240.24
(57) ABSTRACT
Methods and apparatuses for compressing digital image data with motion prediction are described herein. In one embodiment, for each two consecutive frames of an image sequence, a motion prediction is performed between the consecutive frames by tracking motion on a luminance map of the frames to generate motion prediction information for the luminance component. The motion prediction information of the luminance component is then applied to the chrominance maps. In response to the motion prediction, the wavelet coefficients of each frame and the motion prediction information are encoded into a bit stream based on a target transmission rate, where the encoded wavelet coefficients satisfy a predetermined threshold according to a predetermined algorithm. Other methods and apparatuses are also described.
[Front-page drawing. Recoverable labels: Encoder; Data Acquisition; Optional Decoder; Server; Network (e.g., wired and/or wireless); Optional Encoder; Client.]
Patent Application Publication Sep. 22, 2005 Sheet 1 of 18 US 2005/0207495 A1
Physical Layer (W-CDMA, CDMA 1.x, cdma2000, GSM-GPRS, UMTS, iDEN) (1)
Data Link Control (DLC) (2)
Third party ISO protocol stack (TCP/IP/UDP) (3)
Streaming protocol stack (RTP, RTSP, RTCP, SDP) (4)
Billing and other ancillary services (5)
Network Aware Layer (NAL) (6)
Application Layer APIs for QwikStream, QwikVu and Qwiklex (7)
Content Generation Engine (8)
Data Repository (9)
Fig. 3
Raw YUV color frame data
Wavelet Transform filter bank
Source Encoder (ARIES)
Channel encoding (Tree partitioning, CRC, RCPC)
Compressed File (e.g., .qvx file)
Fig. 4A
Compressed Image (.qvx file format)
Channel decoding (Tree merging, CRC, RCPC)
Source Decoder (I-ARIES)
Inverse Wavelet Transform
Raw YUV data
Fig. 4B
Perform a wavelet transformation on each image pixel to transform the pixel into one or more coefficients in one or more wavelet maps.
Encode each wavelet map by representing the significance, sign and bit plane information of the pixel using a single bit in a bit stream.
Encode the significant bits into a context variable dependent upon the information represented by the bit and the location of the coefficient being coded (e.g., the probability of occurrence of a predetermined set of bits immediately preceding the current bit).
Transmit the content of the context variable as a bit stream as an output representing the encoded pixels.
Fig. 5
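The second step of the flowchart above (significance, sign and bit plane information carried as single bits) can be illustrated with a minimal sketch. It assumes integer wavelet coefficients and one power-of-two threshold per pass; the function name `bitplane_passes`, the emission order and the output layout are hypothetical choices made for illustration and are not taken from the application. The context-modelling step is omitted.

```python
def bitplane_passes(coeffs):
    """Scan integer coefficients from the most significant bit plane down.

    Assumed emission rule (illustration only): in each pass, emit one
    significance bit per not-yet-significant coefficient, a sign bit when a
    coefficient first becomes significant, and one refinement bit for every
    coefficient that became significant in an earlier pass.
    """
    max_mag = max((abs(c) for c in coeffs), default=0)
    if max_mag == 0:
        return []
    top_plane = max_mag.bit_length() - 1          # floor(log2(max_mag))
    significant = [False] * len(coeffs)
    passes = []
    for plane in range(top_plane, -1, -1):
        threshold = 1 << plane
        bits = []
        for i, c in enumerate(coeffs):
            mag = abs(c)
            if significant[i]:
                bits.append((mag >> plane) & 1)   # refinement bit
            elif mag >= threshold:
                significant[i] = True
                bits.append(1)                    # significance bit
                bits.append(1 if c < 0 else 0)    # sign bit (1 = negative)
            else:
                bits.append(0)                    # still insignificant
        passes.append((threshold, bits))
    return passes


for threshold, bits in bitplane_passes([5, -3, 0, 1]):
    print(threshold, bits)
```

Because each pass halves the threshold, truncating the output after any pass still yields a coarser approximation of every coefficient, which is the property a rate-controlled bit stream relies on.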
Sub-tree 1 (HL)
Sub-tree 3 (HH)
Fig. 6
Determine a number of iterations (nl) based on a number of quantization levels, which may be determined based on the largest wavelet coefficient, and set an initial quantization threshold T = 2^[exponent illegible in source].
Populate all insignificant pixels in IPQ, all insignificant pixels having descendants in ISQ, and all significant pixels in SPQ.
For each type I entry of ISQ, if the entry is significant with respect to a current quantization threshold, remove the respective entry from ISQ and append it in the SPQ.
For each type I entry of ISQ, if the entry is insignificant with respect to a current quantization threshold, remove the respective entry from ISQ and append it in the IPQ.
If the respective type I entry includes descendants, remove the entry from the ISQ and append it at the end of ISQ as a type II entry for next iteration; otherwise, the entry is purged.
For each type II entry of ISQ, if the entry is significant with respect to a current quantization threshold, all offspring of the current ISQ entry are appended to the end of ISQ as type I entries for next iteration.
Remove any entry in IPQ that is significant with respect to the current quantization threshold and append it in the [remainder illegible in source].
Fig. 8
Raw YUV color frame data
Wavelet Transform filter bank
Buffer
ME/MC'
I-ARIES I/II
CABAC coded motion information
Source Encoder (ARIES I/II)
Channel encoding (Tree partitioning, CRC, RCPC)
Compressed file
Optional
Streaming data
Fig. 9A
Raw YUV color frame data
Inverse Discrete Wavelet Transform (I-DWT)
I-ARIES I/II
ME/MC'
Bypass ME/MC' for I frames
CABAC coded motion information
Discrete Wavelet Transform (DWT)
Source Encoder (ARIES I/II)
Channel encoding (Tree partitioning, CRC, RCPC)
Compressed File
Streaming data
Fig. 9B
Streaming data
Optional
Compressed Video (.qsx file format)
CABAC coded motion information
Channel decoding (Tree merging, CRC, RCPC)
Source Decoder (I-ARIES I/II)
Bypass MC' for I frames
Frame Buffer
MC' frames
Inverse Wavelet Transform
Raw YUV data
Fig. 10A
Streaming data
Compressed Video (.qsx file format)
CABAC coded motion information
Channel decoding (Tree merging, CRC, RCPC)
Source Decoder (I-ARIES I/II)
Frame Buffer
MC' frames
Inverse Wavelet Transform
Raw YUV data
Fig. 10B
Identify a reference frame (e.g., the first frame or an I-frame).
Perform ME/MC on the coarsest subbands as parent subbands of a current frame other than the I-frame with respect to the identified reference frame to generate one or more motion vectors for the coarsest subbands.
Estimate the spatial shifting of pixels of child subbands using the motion vectors of the parent subbands to determine a search area of the child subbands.
Perform ME/MC for the child subbands to determine the motion vectors of the child subbands.
More child subbands?
Perform compression on the predicted/compensated data into compressed data (e.g., see Figs. 5 and 8).
Fig. 11
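The parent-to-child steps of the flowchart above can be sketched as follows. This is a minimal illustration under stated assumptions: a dyadic decomposition (so a parent vector is doubled to centre the child search), a sum-of-absolute-differences matching cost, and subbands held as lists of rows. The names `block_sad`, `search` and `child_search_center` are hypothetical and are not taken from the application.

```python
def block_sad(ref, cur, bx, by, dx, dy, size):
    """Sum of absolute differences between the block of `cur` at (bx, by)
    and the block of `ref` displaced by (dx, dy). Returns None when the
    displaced block falls outside `ref`."""
    h, w = len(ref), len(ref[0])
    if not (0 <= bx + dx and bx + dx + size <= w
            and 0 <= by + dy and by + dy + size <= h):
        return None
    total = 0
    for y in range(size):
        for x in range(size):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total


def search(ref, cur, bx, by, size, center=(0, 0), radius=1):
    """Exhaustive search over a (2*radius+1)^2 window around `center`."""
    best_mv, best_cost = None, None
    for dy in range(center[1] - radius, center[1] + radius + 1):
        for dx in range(center[0] - radius, center[0] + radius + 1):
            cost = block_sad(ref, cur, bx, by, dx, dy, size)
            if cost is not None and (best_cost is None or cost < best_cost):
                best_mv, best_cost = (dx, dy), cost
    return best_mv, best_cost


def child_search_center(parent_mv, scale=2):
    """Assumed mapping: a child subband has `scale` times the parent's
    resolution, so the parent vector is scaled before refinement."""
    return (parent_mv[0] * scale, parent_mv[1] * scale)


# Synthetic child subband: `cur` is `ref` shifted left by two samples.
ref = [[x * x + 10 * y * y + x * y for x in range(8)] for y in range(8)]
cur = [[ref[y][x + 2] if x + 2 < 8 else 0 for x in range(8)] for y in range(8)]

center = child_search_center((1, 0))      # parent vector found at the coarser level
mv, cost = search(ref, cur, 2, 2, 2, center=center, radius=1)
print(center, mv, cost)
```

Seeding the child search with the scaled parent vector keeps the refinement window small, which is the point of estimating the child search area from the parent subbands.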
k = level of sub band
o = orientation (LL, HL, LH, HH)
Boundary of the Search Area for refinement MVs
Refinement Vector for level k, orientation o
Block Neighborhood
Motion Vector
Fig. 12
Integer Motion Prediction
Fig. 13
Integer Motion Prediction
Half-Pel Motion Prediction
Fig. 14
Block currently being tested
Matching block
Current block being tested is in 1MV mode
Current block being tested is in 4MV mode
Motion Vector (identical colors denote MVs of the same block)
Displaced MV to translate matching block to the relative position of macroblock currently being tested
Fig. 15
METHODS AND APPARATUSES FOR COMPRESSING DIGITAL IMAGE DATA WITH
MOTION PREDICTION
[0001] This application claims the benefit of U.S. Provisional Application No. 60/552,153, filed Mar. 10, 2004, U.S. Provisional Application No. 60/552,356, filed Mar. 10, 2004, and U.S. Provisional Application No. 60/552,270, filed Mar. 10, 2004. The above-identified applications are hereby incorporated by reference.
FIELD OF THE INVENTION
[0002] The present invention relates generally to multimedia applications. More particularly, this invention relates to compressing digital image data with motion prediction.
BACKGROUND OF THE INVENTION
[0003] A variety of systems have been developed for the encoding and decoding of audio/video data for transmission over wireline and/or wireless communication systems over the past decade. Most systems in this category employ standard compression/transmission techniques, such as, for example, the ITU-T Rec. H.264 (also referred to as H.264) and ISO/IEC Rec. 14496-10 AVC (also referred to as MPEG-4) standards. However, due to their inherent generality, they lack the specific qualities needed for seamless implementation on low power, low complexity systems (such as hand held devices including, but not restricted to, personal digital assistants and smart phones) over noisy, low bit rate wireless channels.
[0004] Due to the likely business models rapidly emerging in the wireless market, in which cost incurred by the consumer is directly proportional to the actual volume of transmitted data, and also due to the limited bandwidth, processing capability, storage capacity and battery power, efficiency and speed in compression of audio/video data to be transmitted are a major factor in the eventual success of any such multimedia content delivery system. Most systems in use today are retrofitted versions of identical systems used on higher end desktop workstations. Unlike desktop systems, where error control is not a critical issue due to the inherent reliability of cable LAN/WAN data transmission, and bandwidth may be assumed to be almost unlimited, transmission over limited capacity wireless networks requires integration of such systems that may leverage suitable processing and error-control technologies to achieve the level of fidelity expected of a commercially viable multimedia compression and transmission system.
[0005] Conventional video compression engines, or codecs, can be classified into two broad categories. One class of coding strategies, known as a download-and-play (D&P) profile, not only requires the entire file to be downloaded onto the local memory before playback, leading to a large latency time (depending on the available bandwidth and the actual file size), but also makes stringent demands on the amount of buffer memory to be made available for the downloaded payload. Even with the more sophisticated streaming profile, the physical limitations of current generation transmission equipment at the physical layer force service providers to incorporate a pseudo-streaming capability, which requires an initial period of latency (at the beginning of transmission), and continuous buffering henceforth, which imposes a strain on the limited processing capabilities of the hand-held processor. Most commercial compression solutions in the market today do not possess a progressive transmission capability, which means that transmission is possible only until the last integral frame, packet or bit before bandwidth drops below the minimum threshold. In case of video codecs, if the connection breaks before the transmission of the current frame, this frame is lost forever.
[0006] Another drawback in conventional video compression codecs is the introduction of blocking artifacts due to the block-based coding schemes used in most codecs. Apart from the degradation in subjective visual quality, such systems suffer from poor performance due to bottlenecks introduced by the additional de-blocking filters. Yet another drawback is that, due to the limitations in the word size of the computing platform, the coded coefficients are truncated to an approximate value. This is especially prominent along object boundaries, where Gibbs' phenomenon leads to the generation of a visual phenomenon known as mosquito noise. Due to this, the blurring along the object boundaries becomes more prominent, leading to degradation in overall frame quality.
[0007] Additionally, the local nature of motion prediction in some codecs introduces motion-induced artifacts, which cannot be easily smoothed by a simple filtering operation. Such problems arise especially in cases of fast motion clips and systems where the frame rate is below that of natural video (e.g., 25 or 30 fps non-interlaced video). In either case, the temporal redundancy between two consecutive frames is extremely low (since much of the motion is lost between the frames themselves), leading to poorer tracking of the motion across frames. This effect is cumulative in nature, especially for a longer group of frames (GoF).
[0008] Furthermore, mobile end-user devices are constrained by low processing power and storage capacity. Due to the limitations on the silicon footprint, most mobile and hand-held systems in the market have to time-share the resources of the central processing unit (microcontroller or RISC/CISC processor) to perform all their DSP, control and communication tasks, with little or no provision for a dedicated processor to take the video/audio processing load off the central processor. Moreover, most general-purpose central processors lack the unique architecture needed for optimal DSP performance. Therefore, a mobile video-codec design must have minimal client-end complexity while maintaining consistency on the efficiency and robustness front.
SUMMARY OF THE INVENTION
[0009] Methods and apparatuses for compressing digital image data with motion prediction are described herein. In one embodiment, for each two consecutive frames of an image sequence, a motion prediction is performed between the consecutive frames by tracking motion on a luminance map of the frames to generate motion prediction information for the luminance component. The motion prediction information of the luminance component is then applied to the chrominance maps. In response to the motion prediction, the wavelet coefficients of each frame and the motion prediction information are encoded into a bit stream based on a target transmission rate, where the encoded wavelet coefficients satisfy a predetermined threshold according to a predetermined algorithm.
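Reusing the luminance motion information for the chrominance maps, as summarized above, might look like the following minimal sketch. It assumes per-block motion vectors stored in a dictionary and chrominance planes subsampled by a fixed factor in each direction (2 by 2, as in 4:2:0 video); the application text above does not state the chroma format, and the names `chroma_vector` and `apply_luma_motion` are hypothetical.

```python
def chroma_vector(luma_mv, sub_x=2, sub_y=2):
    """Scale a luma motion vector to a chroma plane that is subsampled by
    (sub_x, sub_y). The subsampling factors are an assumption of this sketch."""
    dx, dy = luma_mv
    return (dx / sub_x, dy / sub_y)


def apply_luma_motion(luma_mvs, sub_x=2, sub_y=2):
    """Track motion once on the luminance map (Y) and reuse the result for
    both chrominance maps (U and V) instead of searching them separately."""
    chroma = {blk: chroma_vector(mv, sub_x, sub_y) for blk, mv in luma_mvs.items()}
    return {"Y": dict(luma_mvs), "U": chroma, "V": dict(chroma)}


fields = apply_luma_motion({(0, 0): (4, -2), (0, 1): (0, 6)})
print(fields["U"])
```

Only one motion search per block is needed under this arrangement, so the chrominance maps add no search cost and no extra motion information to the bit stream.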