us20050207495

US 20050207495A1

(12) Patent Application Publication (10) Pub. No.: US 2005/0207495 A1 (19) United States

Ramasastry et al. (43) Pub. Date: Sep. 22, 2005

(54) METHODS AND APPARATUSES FOR COMPRESSING DIGITAL IMAGE DATA WITH MOTION PREDICTION

(76) Inventors: Jayaram Ramasastry, Woodinville, CA (US); Partho Choudhury, Maharashtra (IN); Ramesh Prasad, Maharashtra (IN)

Correspondence Address: BLAKELY SOKOLOFF TAYLOR & ZAFMAN 12400 WILSHIRE BOULEVARD SEVENTH FLOOR LOS ANGELES, CA 90025-1030 (US)

(21) Appl. No.: 11/076,746

(22) Filed: Mar. 9, 2005

Related U.S. Application Data

(60) Provisional application No. 60/552,153, filed on Mar. 10, 2004. Provisional application No. 60/552,356, filed on Mar. 10, 2004. Provisional application No. 60/552,270, filed on Mar. 10, 2004.

Publication Classification

(51) Int. Cl. H04N 7/12
(52) U.S. Cl. 375/240.16; 375/240.19; 375/240.12; 375/240.11; 375/240.24

(57) ABSTRACT

Methods and apparatuses for compressing digital image data with motion prediction are described herein. In one embodiment, for each pair of consecutive frames of an image sequence, motion prediction is performed between the consecutive frames by tracking motion on a luminance map of the frames to generate motion prediction information for the luminance component. The motion prediction information of the luminance component is then applied to the chrominance maps. In response to the motion prediction, the wavelet coefficients of each frame and the motion prediction information are encoded into a bit stream based on a target transmission rate, where the encoded wavelet coefficients satisfy a predetermined threshold according to a predetermined algorithm. Other methods and apparatuses are also described.

[Front-page figure: data acquisition, encoder, optional decoder, server, network (e.g., wired and/or wireless), optional encoder, client.]

Patent Application Publication Sep. 22, 2005 Sheet 1 of 18 US 2005/0207495 A1



Fig. 3 (layered system architecture, numbered as in the figure):
(1) Physical Layer (W-CDMA, CDMA 1x, cdma2000, GSM-GPRS, UMTS, iDEN)
(2) Data Link Control (DLC)
(3) Third-party ISO protocol stack (TCP/IP/UDP)
(4) Streaming protocol stack (RTP, RTSP, RTCP, [illegible])
(5) Billing and other ancillary services
(6) Network Aware Layer (NAL)
(7) Application Layer APIs for QwikStream™, QwikVu™ and Qwiklex™
(8) Content Generation Engine
(9) Data Repository


Fig. 4A (encoder): Raw YUV color frame data -> Wavelet Transform filter bank -> Source Encoder (ARIES) -> Channel encoding (Tree partitioning, CRC, RCPC) -> Compressed File (e.g., .qvx file).


Fig. 4B (decoder): Compressed Image (.qvx file format) -> Channel decoding (Tree merging, CRC, RCPC) -> Source Decoder (I-ARIES) -> Inverse Wavelet Transform -> Raw YUV data.
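Neither figure names the wavelet filter used in the filter bank. As a minimal sketch, assuming a reversible integer Haar filter (the S-transform) as a stand-in, one level of 2-D decomposition into LL/HL/LH/HH subbands and its exact inverse could look like the following. The function names and the subband naming convention (first letter = horizontal filter, second = vertical filter) are assumptions, and the input is assumed to have even width and height.

```python
def haar_pair(a, b):
    """Forward reversible (integer) Haar step on one pair: returns (low, high)."""
    d = a - b
    s = b + (d >> 1)  # equals floor((a + b) / 2) for integers
    return s, d


def inv_haar_pair(s, d):
    """Exact inverse of haar_pair."""
    b = s - (d >> 1)
    return d + b, b


def dwt2_haar(img):
    """One 2-D decomposition level of an image given as a list of rows
    (even width and height). Returns the LL, HL, LH and HH subbands."""
    h, w = len(img), len(img[0])
    lo, hi = [], []
    for row in img:  # horizontal pass: low half and high half of every row
        pairs = [haar_pair(row[2 * j], row[2 * j + 1]) for j in range(w // 2)]
        lo.append([s for s, _ in pairs])
        hi.append([d for _, d in pairs])

    def vertical(band):  # vertical pass on one half-width band
        top, bottom = [], []
        for i in range(h // 2):
            pairs = [haar_pair(band[2 * i][j], band[2 * i + 1][j]) for j in range(w // 2)]
            top.append([s for s, _ in pairs])
            bottom.append([d for _, d in pairs])
        return top, bottom

    ll, lh = vertical(lo)
    hl, hh = vertical(hi)
    return {"LL": ll, "HL": hl, "LH": lh, "HH": hh}


def idwt2_haar(bands):
    """Inverse of dwt2_haar: rebuilds the original integer image exactly."""
    h2, w2 = len(bands["LL"]), len(bands["LL"][0])

    def inv_vertical(top, bottom):
        out = [[0] * w2 for _ in range(2 * h2)]
        for i in range(h2):
            for j in range(w2):
                out[2 * i][j], out[2 * i + 1][j] = inv_haar_pair(top[i][j], bottom[i][j])
        return out

    lo = inv_vertical(bands["LL"], bands["LH"])
    hi = inv_vertical(bands["HL"], bands["HH"])
    img = []
    for lrow, hrow in zip(lo, hi):
        row = []
        for s, d in zip(lrow, hrow):
            row.extend(inv_haar_pair(s, d))  # undo the horizontal pass
        img.append(row)
    return img
```

For example, decomposing `[[10, 12, 14, 16], [10, 12, 14, 16], [20, 20, 20, 20], [20, 20, 20, 20]]` gives LL = `[[11, 15], [20, 20]]` and an all-zero HH band, and `idwt2_haar` recovers the input exactly. Applying `dwt2_haar` again to the LL band produces the coarser levels.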


Flowchart:
Perform a wavelet transformation on each image pixel to transform the pixel into one or more coefficients in one or more wavelet maps.
Encode each wavelet map by representing the significance, sign and bit-plane information of the pixel using a single bit in a bit stream.
Encode the significant bits into a context variable dependent upon the information represented by the bit and the location of the coefficient being coded (e.g., the probability of occurrence of a predetermined set of bits immediately preceding the current bit).
Transmit the content of the context variable as a bit stream as an output representing the encoded pixels.
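The flowchart does not define the context variable in detail. A minimal sketch, assuming a binary adaptive model whose context is simply the k bits immediately preceding the current bit (the example given in the third step), shows why such context modeling lowers the ideal code length that an arithmetic coder could approach. It is not the ARIES coder itself; `context_code_length`, the all-zero starting history and the add-one prior are assumptions.

```python
import math


def context_code_length(bits, k=2):
    """Ideal adaptive code length, in bits, of a 0/1 sequence when each bit is
    predicted from a context made of the k bits immediately preceding it.
    An arithmetic coder driven by the same model would come close to this."""
    counts = {}          # context tuple -> [zeros seen, ones seen]
    history = (0,) * k   # assumption: an all-zero history before the first bit
    total = 0.0
    for b in bits:
        c = counts.setdefault(history, [1, 1])  # add-one (Laplace) prior
        p = c[b] / (c[0] + c[1])                # adaptive probability of this bit
        total -= math.log2(p)
        c[b] += 1                               # update the model after coding
        history = (history + (b,))[1:]          # slide the context window
    return total
```

On a strictly alternating sequence such as `[0, 1] * 50`, a one-bit context (`k=1`) needs only a small fraction of the bits required by a context-free model (`k=0`), which spends roughly one bit per symbol.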

[Figs. 6 and 7: wavelet sub-tree diagrams; recoverable labels include Sub-tree 1 (HL) and Sub-tree 3 (HH).]


Flowchart:
Determine a number of iterations (n) based on a number of quantization levels, which may be determined from the largest wavelet coefficient, and set an initial quantization threshold T = 2^[illegible].
Populate all insignificant pixels in the IPQ, all insignificant pixels having descendants in the ISQ, and all significant pixels in the SPQ.
For each type I entry of the ISQ, if the entry is significant with respect to the current quantization threshold, remove the entry from the ISQ and append it to the SPQ.
For each type I entry of the ISQ, if the entry is insignificant with respect to the current quantization threshold, remove the entry from the ISQ and append it to the IPQ.
If the type I entry has descendants, remove it from the ISQ and append it at the end of the ISQ as a type II entry for the next iteration; otherwise, the entry is purged.
For each type II entry of the ISQ, if the entry is significant with respect to the current quantization threshold, append all offspring of the entry to the end of the ISQ as type I entries for the next iteration.
Remove any entry in the IPQ that is significant with respect to the current quantization threshold and append it to the [illegible].
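The three queues in this flowchart play the same roles as the lists of the well-known SPIHT coder (IPQ ≈ list of insignificant pixels, ISQ ≈ list of insignificant sets, SPQ ≈ list of significant pixels), and its type I/type II entries match SPIHT's type A/type B sets. The following is a minimal sketch of SPIHT-style sorting passes under that reading, not the patent's exact procedure: the coefficient tree is a generic `children` mapping, the initial threshold follows SPIHT's T = 2^floor(log2 max|c|) because the flowchart's exponent is illegible, the refinement pass is omitted, and all names are hypothetical.

```python
from collections import deque


def descendants(node, children):
    """All descendants of `node` in the coefficient tree (not including node)."""
    out, stack = [], list(children.get(node, []))
    while stack:
        n = stack.pop()
        out.append(n)
        stack.extend(children.get(n, []))
    return out


def sorting_passes(coef, children, roots):
    """SPIHT-style significance passes (refinement pass omitted).
    coef: node -> integer coefficient; children: node -> offspring list;
    roots: top-level nodes. Returns (emitted bits, SPQ in order of significance)."""
    peak = max(abs(v) for v in coef.values())
    n = peak.bit_length() - 1                      # floor(log2(peak)); assumed T0 = 2**n
    ipq = list(roots)                              # insignificant pixel queue
    isq = [(r, "I") for r in roots if children.get(r)]  # insignificant set queue
    spq, bits = [], []                             # significant pixel queue, output

    def emit_pixel(p, T):
        sig = abs(coef[p]) >= T
        bits.append(int(sig))
        if sig:
            bits.append(int(coef[p] < 0))          # sign bit: 1 = negative
            spq.append(p)
        return sig

    for k in range(n, -1, -1):
        T = 2 ** k
        ipq = [p for p in ipq if not emit_pixel(p, T)]  # significant IPQ -> SPQ
        queue, kept = deque(isq), []
        while queue:
            node, kind = queue.popleft()
            if kind == "I":                        # type I: all descendants of node
                members = descendants(node, children)
            else:                                  # type II: descendants minus offspring
                members = [g for c in children[node] for g in descendants(c, children)]
            sig = any(abs(coef[m]) >= T for m in members)
            bits.append(int(sig))
            if not sig:
                kept.append((node, kind))          # stays in the ISQ for the next pass
            elif kind == "I":
                for c in children[node]:
                    if not emit_pixel(c, T):
                        ipq.append(c)              # tested again in the next pass
                if any(children.get(c) for c in children[node]):
                    queue.append((node, "II"))     # re-queued as a type II entry
                # otherwise the entry is purged
            else:
                for c in children[node]:
                    if children.get(c):            # offspring that have descendants
                        queue.append((c, "I"))
        isq = kept
    return bits, spq
```

With a toy tree `{0: [1, 2], 1: [3, 4], 2: [5, 6]}` and coefficients `{0: 9, 1: -5, 2: 1, 3: 0, 4: 3, 5: 0, 6: 0}`, coefficients enter the SPQ in the order 0, 1, 4, 2, i.e. by decreasing magnitude.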


Fig. 9A (encoder): Raw YUV color frame data -> Wavelet Transform filter bank -> ME/MC -> Source Encoder (ARIES I/II) -> Channel encoding (Tree partitioning, CRC, RCPC) -> compressed file (optional) or streaming data. The ME/MC stage produces CABAC-coded motion information; an I-ARIES I/II block also appears in the figure.

Fig. 9B (encoder): Raw YUV color frame data -> ME/MC (bypassed for I-frames), producing CABAC-coded motion information -> Discrete Wavelet Transform (DWT) -> Source Encoder (ARIES I/II) -> Channel encoding (Tree partitioning, CRC, RCPC) -> Compressed File or streaming data. A reconstruction path through I-ARIES I/II and the Inverse Discrete Wavelet Transform (I-DWT) also appears in the figure.


Fig. 10A (decoder): Streaming data or Compressed Video (.qsx file format) -> Channel decoding (Tree merging, CRC, RCPC) -> Source Decoder (I-ARIES I/II), with CABAC-coded motion information -> MC using a frame buffer (bypassed for I-frames) -> Inverse Wavelet Transform -> Raw YUV data.


Fig. 10B (decoder): blocks include Streaming data, Compressed Video (.qsx file format), CABAC-coded motion information, Channel decoding (Tree merging, CRC, RCPC), Source Decoder (I-ARIES I/II), Frame Buffer and MC, Inverse Wavelet Transform, and Raw YUV data.


Fig. 11 (flowchart):
Identify a reference frame (e.g., the first frame or an I-frame).
Perform ME/MC on the coarsest subbands, as parent subbands, of a current frame other than the I-frame with respect to the identified reference frame to generate one or more motion vectors for the coarsest subbands.
Estimate the spatial shifting of pixels of the child subbands using the motion vectors of the parent subbands to determine a search area for the child subbands.
Perform ME/MC for the child subbands to determine the motion vectors of the child subbands.
More child subbands? If so, repeat the two preceding steps for the next child subbands.
Perform compression on the predicted/compensated data to produce compressed data (e.g., see Figs. 5 and 8).
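The flowchart leaves the matching criterion, block sizes and search ranges open. A minimal sketch, assuming a sum-of-absolute-differences (SAD) criterion, exhaustive block search, and dyadic subbands in which a child block is co-located at twice its parent's coordinates, shows how a parent vector narrows the child search area. Every function name and default radius here is hypothetical.

```python
def sad(cur, ref, bx, by, dx, dy, bs):
    """Sum of absolute differences between the bs x bs block of `cur` at
    (bx, by) and the block of `ref` displaced by (dx, dy)."""
    return sum(abs(cur[y][x] - ref[y + dy][x + dx])
               for y in range(by, by + bs) for x in range(bx, bx + bs))


def block_search(cur, ref, bx, by, bs, center, radius):
    """Exhaustive search of integer displacements within `radius` of `center`;
    candidates reaching outside `ref` are skipped."""
    h, w = len(ref), len(ref[0])
    best_cost, best_mv = None, (0, 0)
    for dy in range(center[1] - radius, center[1] + radius + 1):
        for dx in range(center[0] - radius, center[0] + radius + 1):
            if not (0 <= by + dy <= h - bs and 0 <= bx + dx <= w - bs):
                continue
            cost = sad(cur, ref, bx, by, dx, dy, bs)
            if best_cost is None or cost < best_cost:
                best_cost, best_mv = cost, (dx, dy)
    return best_mv


def hierarchical_mv(parent_cur, parent_ref, child_cur, child_ref, bx, by, bs,
                    parent_radius=2, refine_radius=1):
    """Full search on the coarse (parent) subband, then a small refinement
    search on the next finer (child) subband around the doubled parent vector."""
    parent_mv = block_search(parent_cur, parent_ref, bx, by, bs, (0, 0), parent_radius)
    # Dyadic decomposition: the co-located child block sits at twice the
    # coordinates, is twice as large, and moves roughly twice as far.
    center = (2 * parent_mv[0], 2 * parent_mv[1])
    child_mv = block_search(child_cur, child_ref, 2 * bx, 2 * by, 2 * bs, center, refine_radius)
    return parent_mv, child_mv
```

Repeating the refinement level by level, with each child becoming the next parent, follows the loop in the flowchart; the child search costs only (2 × `refine_radius` + 1)² SAD evaluations instead of a full-range search.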


Fig. 12: [diagram; no labels are recoverable]

[Figure: motion-vector refinement across subband levels. Labels: k = level of subband; o = orientation (LL, HL, LH, HH); boundary of the search area for refinement MVs; refinement vector for level k, orientation o; block neighborhood; motion vector.]

Fig. 14: integer motion prediction and half-pel motion prediction (diagram).
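The figure contrasts integer and half-pel motion prediction but does not specify how half-pel samples are formed. A minimal sketch, assuming bilinear interpolation with rounding (the scheme used for half-sample prediction in H.263 and MPEG-4 Visual) and a refinement search over the eight half-pel neighbours of the best integer vector, could look like this; the function names are hypothetical.

```python
def half_pel_sample(ref, x2, y2):
    """Sample `ref` at (x2 / 2, y2 / 2), coordinates given in half-pel units,
    by bilinear averaging of the surrounding integer pixels with rounding."""
    x0, y0 = x2 // 2, y2 // 2
    x1, y1 = x0 + (x2 & 1), y0 + (y2 & 1)  # same pixel again on integer positions
    return (ref[y0][x0] + ref[y0][x1] + ref[y1][x0] + ref[y1][x1] + 2) // 4


def half_pel_refine(cur, ref, bx, by, bs, mv):
    """Refine an integer motion vector `mv` by testing it and its 8 half-pel
    neighbours with SAD; returns the best vector in half-pel units.
    The caller must keep all tested positions inside `ref`."""
    best_cost, best_mv = None, None
    for ddy in (-1, 0, 1):
        for ddx in (-1, 0, 1):
            vx, vy = 2 * mv[0] + ddx, 2 * mv[1] + ddy
            cost = sum(abs(cur[y][x] - half_pel_sample(ref, 2 * x + vx, 2 * y + vy))
                       for y in range(by, by + bs) for x in range(bx, bx + bs))
            if best_cost is None or cost < best_cost:
                best_cost, best_mv = cost, (vx, vy)
    return best_mv
```

For a block whose true horizontal displacement is 1.5 pixels, starting from the integer vector (1, 0) returns (3, 0) in half-pel units, a match that no integer vector can reach.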


Fig. 15: motion vectors for a block being tested in 1MV mode and in 4MV mode. Labels: block currently being tested; matching block; motion vector (identical colors denote MVs of the same block); displaced MV to translate the matching block to the relative position of the macroblock currently being tested.
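Fig. 15 distinguishes blocks coded with one motion vector (1MV mode) from blocks coded with four (4MV mode), but the mode decision is not described in this section. A common encoder heuristic for the four-vector mode of H.263 and MPEG-4 Visual is to compare the best single-vector SAD of a 16x16 macroblock with the summed SADs of its four 8x8 quadrants plus a bias that favours 1MV. The sketch below assumes exactly that; the block sizes, search radius, `penalty` value and function names are assumptions, not values from the source.

```python
def sad(cur, ref, bx, by, dx, dy, bs):
    """SAD between the bs x bs block of `cur` at (bx, by) and `ref` shifted by (dx, dy)."""
    return sum(abs(cur[y][x] - ref[y + dy][x + dx])
               for y in range(by, by + bs) for x in range(bx, bx + bs))


def best_vector(cur, ref, bx, by, bs, radius):
    """Exhaustive integer search within +/-radius; returns (cost, (dx, dy))."""
    h, w = len(ref), len(ref[0])
    candidates = [(sad(cur, ref, bx, by, dx, dy, bs), (dx, dy))
                  for dy in range(-radius, radius + 1)
                  for dx in range(-radius, radius + 1)
                  if 0 <= by + dy <= h - bs and 0 <= bx + dx <= w - bs]
    return min(candidates, key=lambda c: c[0])


def choose_mv_mode(cur, ref, bx, by, mb=16, radius=2, penalty=100):
    """Pick 1MV (one vector per macroblock) or 4MV (one vector per quadrant).
    `penalty` is a hypothetical bias for the cost of sending three extra vectors."""
    cost1, mv1 = best_vector(cur, ref, bx, by, mb, radius)
    half = mb // 2
    quads = [best_vector(cur, ref, bx + ox, by + oy, half, radius)
             for oy in (0, half) for ox in (0, half)]  # TL, TR, BL, BR
    cost4 = sum(cost for cost, _ in quads) + penalty
    if cost4 < cost1:
        return "4MV", [mv for _, mv in quads]
    return "1MV", [mv1]
```

A macroblock that moves rigidly stays in 1MV mode, while one whose quadrants move in different directions switches to 4MV once the extra prediction accuracy outweighs `penalty`.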


METHODS AND APPARATUSES FOR COMPRESSING DIGITAL IMAGE DATA WITH MOTION PREDICTION

[0001] This application claims the benefit of U.S. Provisional Application No. 60/552,153, filed Mar. 10, 2004, U.S. Provisional Application No. 60/552,356, filed Mar. 10, 2004, and U.S. Provisional Application No. 60/552,270, filed Mar. 10, 2004. The above-identified applications are hereby incorporated by reference.

FIELD OF THE INVENTION

[0002] The present invention relates generally to multimedia applications. More particularly, this invention relates to compressing digital image data with motion prediction.

BACKGROUND OF THE INVENTION

[0003] Over the past decade, a variety of systems have been developed for encoding and decoding audio/video data for transmission over wireline and/or wireless communication systems. Most systems in this category employ standard compression/transmission techniques, such as, for example, the ITU-T Rec. H.264 (also referred to as H.264) and ISO/IEC 14496-10 AVC (also referred to as MPEG-4 Part 10) standards. However, due to their inherent generality, they lack the specific qualities needed for seamless implementation on low-power, low-complexity systems (such as hand-held devices including, but not restricted to, personal digital assistants and smart phones) over noisy, low-bit-rate wireless channels.

[0004] Given the business models likely to emerge rapidly in the wireless market, in which the cost incurred by the consumer is directly proportional to the actual volume of transmitted data, and given the limited bandwidth, processing capability, storage capacity and battery power of wireless devices, the efficiency and speed with which audio/video data is compressed for transmission is a major factor in the eventual success of any such multimedia content delivery system. Most systems in use today are retrofitted versions of systems originally used on higher-end desktop workstations. Unlike desktop systems, where error control is not a critical issue due to the inherent reliability of cable LAN/WAN data transmission and bandwidth may be assumed to be almost unlimited, transmission over limited-capacity wireless networks requires systems that integrate suitable processing and error-control technologies to achieve the level of fidelity expected of a commercially viable multimedia compression and transmission system.

[0005] Conventional video compression engines, or codecs, can be broadly classified into two categories. One class of coding strategies, known as the download-and-play (D&P) profile, not only requires the entire file to be downloaded into local memory before playback, leading to a long latency (depending on the available bandwidth and the actual file size), but also makes stringent demands on the amount of buffer memory available for the downloaded payload. Even with the more sophisticated streaming profile, the physical limitations of current-generation transmission equipment at the physical layer force service providers to incorporate a pseudo-streaming capability, which requires an initial period of latency at the beginning of transmission and continuous buffering thereafter, straining the limited processing capabilities of the hand-held processor. Most commercial compression solutions on the market today lack a progressive transmission capability, which means that transmission is possible only up to the last complete frame, packet or bit before the bandwidth drops below the minimum threshold. In the case of video codecs, if the connection breaks before the current frame has been fully transmitted, that frame is lost entirely.

[0006] Another drawback of conventional video codecs is the introduction of blocking artifacts due to the block-based coding schemes used in most codecs. Apart from the degradation in subjective visual quality, such systems suffer from poor performance due to bottlenecks introduced by the additional de-blocking filters. Yet another drawback is that, due to the limited word size of the computing platform, the coded coefficients are truncated to approximate values. This is especially prominent along object boundaries, where the Gibbs phenomenon produces a visual artifact known as mosquito noise. As a result, blurring along object boundaries becomes more prominent, degrading overall frame quality.

[0007] Additionally, the local nature of motion prediction in some codecs introduces motion-induced artifacts, which cannot easily be smoothed by a simple filtering operation. Such problems arise especially in fast-motion clips and in systems where the frame rate is below that of natural video (e.g., 25 or 30 fps non-interlaced video). In either case, the temporal redundancy between two consecutive frames is extremely low (since much of the motion occurs between the captured frames), leading to poorer tracking of motion across frames. This effect is cumulative, especially for a longer group of frames (GoF).

[0008] Furthermore, mobile end-user devices are constrained by low processing power and storage capacity. Due to limits on silicon footprint, most mobile and hand-held systems on the market have to time-share the resources of the central processing unit (microcontroller or RISC/CISC processor) to perform all their DSP, control and communication tasks, with little or no provision for a dedicated processor to take the video/audio processing load off the central processor. Moreover, most general-purpose central processors lack the specialized architecture needed for optimal DSP performance. Therefore, a mobile video-codec design must minimize client-end complexity while maintaining efficiency and robustness.

SUMMARY OF THE INVENTION

[0009] Methods and apparatuses for compressing digital image data with motion prediction are described herein. In one embodiment, for each pair of consecutive frames of an image sequence, motion prediction is performed between the consecutive frames by tracking motion on a luminance map of the frames to generate motion prediction information for the luminance component. The motion prediction information of the luminance component is then applied to the chrominance maps. In response to the motion prediction, the wavelet coefficients of each frame and the motion prediction information are encoded into a bit stream based on a target transmission rate, where the encoded wavelet coefficients satisfy a predetermined threshold according to a predetermined algorithm.
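The summary states only that the luminance motion information is applied to the chrominance maps. Assuming the chroma planes are subsampled relative to luma (the document mentions raw YUV data but not its sampling format), applying a luma vector to a chroma plane amounts to dividing each vector component by that plane's subsampling factor. A minimal sketch follows; `chroma_mv` and the format table are hypothetical names.

```python
from fractions import Fraction

# Chroma subsampling factors (horizontal, vertical) for common YUV formats.
SUBSAMPLING = {"4:4:4": (1, 1), "4:2:2": (2, 1), "4:2:0": (2, 2)}


def chroma_mv(luma_mv, fmt="4:2:0"):
    """Reuse a luminance motion vector (in luma pixels; may be fractional,
    e.g. 1.5 for half-pel) on a chrominance plane by dividing each component
    by that plane's subsampling factor. Fractions keep the result exact."""
    sx, sy = SUBSAMPLING[fmt]
    return Fraction(luma_mv[0]) / sx, Fraction(luma_mv[1]) / sy
```

With 4:2:0 video, for instance, a luma vector of (3, -2) maps to (3/2, -1) on the U and V planes, so chroma prediction generally needs sub-pixel interpolation even when the luma vector is an integer.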
