© 2002 by Yu Hen Hu, ECE533 Digital Image Processing


JPEG 2000: An Introduction


Agenda

• Overview
• Wavelet transform
• EBCOT: JPEG2000 coefficient modeling and context encoding
• MQ arithmetic coding
• ROI: Region of Interest


Overview


Introduction

The Joint Photographic Experts Group (JPEG) is an ISO standards committee whose mission is “Coding and compression of still images”.

JPEG coding standard (1988): DCT (discrete cosine transform) based transform coding to compress bit-map images.

JPEG2000 efforts started in 1996 to use new methods such as fractals or wavelets. The target delivery date was the year 2000, hence the name.


JPEG2000 Features

• High compression efficiency
• Lossless color transformations
• Lossy and lossless coding in one algorithm
• Embedded lossy to lossless coding
• Progressive by resolution and quality
• Static and dynamic Region-of-Interest
• Error resilience
• Visual (fixed and progressive) coding
• Multiple component images
• Palettized images


Handling Large Images

Partition in both the spatial and the frequency domain.

• Spatial domain partition: tile, frame
  » bit streams of different tiles or frames are not independent
  » artifacts may occur at boundaries
• Special wavelet transforms:
  » spatially segmented wavelet transform (SSWT)
  » line-based wavelet transform
• Block: independent partition in the frequency domain (wavelet coefficients)
  » bit streams are independently generated


JPEG at 0.125 bpp (enlarged)

C. Christopoulos, A. Skodras, T. Ebrahimi, JPEG2000 (online tutorial)


JPEG2000 at 0.125 bpp



DWT-based Image Coding


Wavelet Based Image Coding

[Figure: block diagram Discrete Wavelet Transform → Context-based Quantization → Entropy Coding, with two image panels (axes ticked from 10 to 60).]

The 2D discrete wavelet transform converts images into “sub-bands”. The upper left is the DC coefficient; the lower right holds the higher-frequency sub-bands.


1D Discrete Wavelet Transform

[Figure: three-stage analysis filter bank. At each stage the input, starting with x(n), is filtered by H0 and H1 and down-sampled by 2. The outputs are y0, y1, y2, y3.]

H0: low-pass digital filter; H1: high-pass digital filter; $z^{-1}$: delay; ↓2: down-sample by 2.

Recursive application of the wavelet transform in the spatial domain corresponds to a dyadic partition of the data in the frequency domain.
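The recursion can be illustrated with a minimal Python sketch. It assumes the Haar filter pair (pairwise average for H0, pairwise half-difference for H1), which the slides do not specify, and the function names `haar_step` and `dwt_1d` are hypothetical. Each level halves the low-pass band, so the detail bands cover dyadic frequency ranges.

```python
def haar_step(x):
    """One analysis stage: filter with H0/H1 (Haar, assumed) and down-sample by 2."""
    half = len(x) // 2
    low = [(x[2 * i] + x[2 * i + 1]) / 2 for i in range(half)]   # H0 branch
    high = [(x[2 * i] - x[2 * i + 1]) / 2 for i in range(half)]  # H1 branch
    return low, high


def dwt_1d(x, levels):
    """Apply the stage recursively to the low-pass output only."""
    details = []
    low = list(x)
    for _ in range(levels):
        low, high = haar_step(low)
        details.append(high)  # details[0] is the finest (highest-frequency) band
    return low, details


low, details = dwt_1d([4, 6, 10, 12, 8, 6, 5, 5], levels=3)
print(low, details)
```

The input length is assumed to be divisible by 2 at every level; a real codec would use signal extension at the boundaries instead.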


2D Separate DWT

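1D DWT applied alternately in the vertical and horizontal directions, line by line.

The LL band is recursively decomposed, first vertically, and then horizontally.

This is the Mallat method. Other methods have also been proposed.

[Figure: the image in the spatial domain is split into L and H halves, then into LL, HL, LH, and HH sub-bands; the LL sub-band is split again into LL, HL, LH, and HH at each further level.]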
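One level of the separable decomposition can be sketched as follows. This is an illustration under assumptions: it uses the Haar pair as the 1D filter, filters rows before columns (for a separable linear transform the order does not change the result), and follows the naming used later in the slides, where HL is high-pass horizontally. The helper names are hypothetical.

```python
def haar_pairs(values):
    """1D Haar analysis (assumed filter): pairwise average and half-difference."""
    half = len(values) // 2
    low = [(values[2 * i] + values[2 * i + 1]) / 2 for i in range(half)]
    high = [(values[2 * i] - values[2 * i + 1]) / 2 for i in range(half)]
    return low, high


def filter_rows(block):
    """Apply the 1D transform to every row; return the low and high halves."""
    pairs = [haar_pairs(row) for row in block]
    return [lo for lo, _ in pairs], [hi for _, hi in pairs]


def transpose(block):
    return [list(col) for col in zip(*block)]


def dwt2_level(image):
    """One decomposition level: returns the LL, HL, LH, HH sub-bands."""
    L, H = filter_rows(image)                # horizontal filtering
    LL_t, LH_t = filter_rows(transpose(L))   # vertical filtering of the L half
    HL_t, HH_t = filter_rows(transpose(H))   # vertical filtering of the H half
    return transpose(LL_t), transpose(HL_t), transpose(LH_t), transpose(HH_t)


image = [[1, 3, 5, 5], [1, 3, 5, 5], [2, 2, 8, 8], [4, 4, 6, 6]]
LL, HL, LH, HH = dwt2_level(image)
print(LL, HL, LH, HH)
```

Calling `dwt2_level` again on `LL` gives the next level of the Mallat decomposition.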


Bit Plane Coding

• Coefficients are represented in sign-magnitude format.
• Bit-plane coding starts from the most significant bit (MSB).
• The sign bit is encoded after the MSB is encoded.
• The context (surrounding bit patterns) at each bit plane is examined.
• Key: exploit patterns in the binary bit planes.

Example coefficients:

 3  -1   7
 4  -5   2
 6   1  -2

Bit planes, from MSB to LSB (a + or - marks the sign, attached to a coefficient's first 1 bit):

MSB:
 0   0   1+
 1+  1-  0
 1+  0   0

Middle:
 1+  0   1
 0   0   1+
 1   0   1-

LSB:
 1   1-  1
 0   1   0
 0   1+  0
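The decomposition of the 3 × 3 example into bit planes can be reproduced with a short Python sketch. The function name `bit_planes` and the string encoding of each entry ("1+" for a first 1 bit of a positive coefficient) are illustrative choices, not part of the standard.

```python
def bit_planes(coeffs, nbits):
    """Split sign-magnitude coefficients into bit planes, MSB first.

    The sign is attached to the first 1 bit of each coefficient,
    mirroring the rule that the sign is coded right after that bit.
    """
    seen = [[False] * len(row) for row in coeffs]  # has a 1 bit appeared yet?
    planes = []
    for p in range(nbits - 1, -1, -1):
        plane = []
        for y, row in enumerate(coeffs):
            out = []
            for x, c in enumerate(row):
                bit = (abs(c) >> p) & 1
                if bit and not seen[y][x]:
                    seen[y][x] = True
                    out.append("1" + ("-" if c < 0 else "+"))
                else:
                    out.append(str(bit))
            plane.append(out)
        planes.append(plane)
    return planes


for plane in bit_planes([[3, -1, 7], [4, -5, 2], [6, 1, -2]], nbits=3):
    print(plane)
```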


SPIHT

• Set Partitioning in Hierarchical Trees.
• Amir Said and William Pearlman (IEEE Trans. CSVT, 1996).
• Based on zero-tree wavelet coding.
• Main ideas:
  » Partial magnitude sorting of wavelet transform coefficients
  » Ordered bit-plane transmission
  » Exploitation of the self-similarity among wavelet coefficients between sub-bands having parent-descendant relations


JPEG2000

Image components, tiles, and sub-band structures

Wavelet transform Coefficient modeling Arithmetic coding


Tiling

XTOsiz + XTsiz > XOsiz, YTOsiz + YTsiz > YOsiz


Image Structure

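[Figure: image structure hierarchy: image, image components, tiles, resolutions, sub-bands, precincts, code blocks, packets, layers.]

[Figure: sub-band labels of the wavelet decomposition: 4LL, 3HL, 3LH, 3HH, 2HL, 2LH, 2HH, 1HL, 1LH, 1HH.]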


Layered Bit stream

Each bit stream is organized as a succession of layers

Each layer contains additional contributions from each block (some contributions might be empty)

Block truncation points associated with each layer are optimal in the rate distortion sense

Rate distortion optimization can be performed but it does not need to be standardized


DC level shift and component transform

The purpose of the component transform is to de-correlate the components. For multi-spectral images, PCA may be used. There are reversible and irreversible transforms.

Forward reversible component transform

Inverse reversible component transform


Reversible Color Transform

• Makes lossless color coding possible.
• All components must have identical sub-sampling parameters and the same depth.
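The equations for the reversible transform did not survive in the slide text. As a sketch, the integer transform commonly given for JPEG2000 Part 1 is Y = ⌊(R + 2G + B)/4⌋, U = B − G, V = R − G; the function names below are hypothetical. The example shows why it is exactly invertible with integer arithmetic.

```python
def rct_forward(r, g, b):
    """Reversible color transform on integer samples (floor division)."""
    y = (r + 2 * g + b) // 4
    u = b - g
    v = r - g
    return y, u, v


def rct_inverse(y, u, v):
    """Exact inverse: y equals g + floor((u + v) / 4), so g is recoverable."""
    g = y - (u + v) // 4
    r = v + g
    b = u + g
    return r, g, b


print(rct_forward(200, 120, 35))
print(rct_inverse(*rct_forward(200, 120, 35)))
```

Python's `//` rounds toward negative infinity, which matches the floor operation the inverse relies on when `u + v` is negative.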


IDWT (NL = 2)


IDWT Procedure

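IDWT procedure (flowchart):

1. Set lev = NL.
2. Compute $a_{(lev-1)LL}(u,v) = \mathrm{2D\_SR}(a_{levLL}(u,v), a_{levHL}(u,v), a_{levLH}(u,v), a_{levHH}(u,v))$.
3. Set lev = lev − 1.
4. If lev = 0, set $I(x,y) = a_{0LL}(x,y)$ and stop; otherwise go to step 2.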


Periodic Symmetric Signal Extension


Lossless 1D DWT

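• Reversible integer DWT.
• DWT coefficients are integers without any truncation error, provided the image component pixel values are also integer-valued.
• The transform is exactly reversible.
• Non-causal filter.
• Xext(), Yext(): symmetrically, periodically extended signals.

Forward transform, index range of each lifting step:
  Step 1 (odd samples): $i_0 - 1 \le 2n+1 < i_1 + 1$
  Step 2 (even samples): $i_0 \le 2n < i_1$

Reverse transform, index range of each lifting step:
  Step 1 (even samples): $i_0 - 1 \le 2n < i_1 + 1$
  Step 2 (odd samples): $i_0 \le 2n+1 < i_1$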
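The lifting equations themselves are missing from the extracted slide. The sketch below assumes the reversible (5,3) lifting steps usually given for JPEG2000, Y(2n+1) = X(2n+1) − ⌊(X(2n) + X(2n+2))/2⌋ followed by Y(2n) = X(2n) + ⌊(Y(2n−1) + Y(2n+1) + 2)/4⌋, with whole-sample symmetric extension at the borders. The names `reflect`, `fwd53`, and `inv53` are hypothetical, and the signal starts at index 0 and needs at least two samples.

```python
def reflect(i, n):
    """Whole-sample symmetric extension: map any index into 0..n-1 (n >= 2)."""
    period = 2 * (n - 1)
    i = abs(i) % period
    return period - i if i >= n else i


def fwd53(x):
    """Forward reversible lifting: returns (low-pass, high-pass) integer bands."""
    n = len(x)
    y = list(x)
    for i in range(1, n, 2):   # step 1: odd samples become high-pass
        y[i] = x[i] - (x[reflect(i - 1, n)] + x[reflect(i + 1, n)]) // 2
    for i in range(0, n, 2):   # step 2: even samples become low-pass
        y[i] = x[i] + (y[reflect(i - 1, n)] + y[reflect(i + 1, n)] + 2) // 4
    return y[0::2], y[1::2]


def inv53(low, high):
    """Reverse transform: undo the two lifting steps in the opposite order."""
    n = len(low) + len(high)
    y = [0] * n
    y[0::2], y[1::2] = low, high
    x = list(y)
    for i in range(0, n, 2):   # step 1: recover even samples
        x[i] = y[i] - (y[reflect(i - 1, n)] + y[reflect(i + 1, n)] + 2) // 4
    for i in range(1, n, 2):   # step 2: recover odd samples
        x[i] = y[i] + (x[reflect(i - 1, n)] + x[reflect(i + 1, n)]) // 2
    return x


signal = [10, 12, 14, 200, 13, 7, 0, 255, 3]
low, high = fwd53(signal)
print(low, high, inv53(low, high) == signal)
```

Because each lifting step adds an integer that the inverse can recompute from untouched samples, reconstruction is exact regardless of the rounding.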


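Lossy 1D DWT

Daubechies’ (9,7) filter in the lifting format.

Reverse transform, index range of each lifting step:

Step 1: $i_0 - 3 \le 2n < i_1 + 3$
Step 2: $i_0 - 2 \le 2n+1 < i_1 + 2$
Step 3: $i_0 - 3 \le 2n < i_1 + 3$
Step 4: $i_0 - 2 \le 2n+1 < i_1 + 2$
Step 5: $i_0 - 1 \le 2n < i_1 + 1$
Step 6: $i_0 \le 2n+1 < i_1$

Lifting constants: $\alpha = -1.586\,134\,342$, $\beta = -0.052\,980\,118$, $\gamma = 0.882\,911\,075$, $\delta = 0.443\,506\,852$, $K = 1.230\,174\,105$

Forward transform, index range of each lifting step:

Step 1: $i_0 - 3 \le 2n+1 < i_1 + 3$
Step 2: $i_0 - 2 \le 2n < i_1 + 2$
Step 3: $i_0 - 1 \le 2n+1 < i_1 + 1$
Step 4: $i_0 \le 2n < i_1$
Step 5: $i_0 \le 2n+1 < i_1$
Step 6: $i_0 \le 2n < i_1$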


Row-based Wavelet Transform

Problem with the traditional wavelet transform:
» Filtering has to be performed in both the vertical and horizontal directions. While access in one direction is easy, access in the other requires the whole image to be buffered.
» Difficult to implement on PDAs or other hand-held devices with a limited amount of main memory.

Row-based wavelet transform:
» consumes the minimum amount of resources,
» gives the same results as the traditional wavelet transform.

Method:
» Use a rolling window for each decomposition level to keep a sufficient number (five) of rows of image data in on-chip memory.


Context coding: EBCOT


Context Coding Algorithm: EBCOT

Embedded Block Coding with Optimal Truncation

Block Coding
» Divide each sub-band into code blocks of samples, which are coded independently.
» For each block, a separate bit-stream is generated without using any information from any of the other blocks.

Optimal Truncation
» The bit-stream of each block can be truncated to a variety of discrete lengths, each with an associated distortion.
» A post-processing step, after all blocks are compressed, determines the truncation point for each block.


EBCOT Block Coding

Taubman and Zakhor (IEEE Trans. IP, Sep. 94): Layered Zero Coding with Fractional Bit-Planes.
» For each bit plane, the encoding is applied in three passes.

Four types of coding operations for arithmetic entropy coding:
» Zero Coding (ZC)
» Run-Length Coding (RLC)
» Sign Coding (SC)
» Magnitude Refinement (MR)

Usage rule:
If a coefficient is not yet significant, use ZC and RLC to encode whether it becomes significant in the current bit plane. If so, use SC to encode its sign. If a coefficient is already significant, use MR to encode the new bit position.


Two Tiered Coding in EBCOT

[Figure: blocks of sub-band samples → T1 (low-level embedded block coding engine) → embedded block bit-streams and summary information → T2 (layer formation and block summary information coding engine) → full-featured bit-stream.]

All the complexity is concentrated in the low-level block coding engine, T1, which generates embedded block bit-streams. The second tier, T2, plays a vital role in efficiently representing the individually coded blocks in a full-featured bit-stream.

ISO/IEC JTC 1/SC 29/WG 1 N1422


Illustration of Layered Coding

Illustration of block contributions to bit-stream layers. Only five layers are shown with seven code blocks, for simplicity.

Notice that not all code blocks need contribute to every layer and that the number of bytes contributed by blocks to any given layer is generally highly variable.

Notice also that the block coding operation proceeds vertically through each code block independently, whereas the layered bit-stream organization is horizontal, distributing the embedded bit-streams for each block throughout the bit-stream.

[Figure: grid of layers 1 to 5 (rows) against the bit-streams of blocks 1 to 7 (columns); several cells are marked empty.]


Embedded Block Bit Stream

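[Figure: embedded bit-stream of block i as a sequence of coding passes, from $P_i^{M_i-1,3}$ through $P_i^{M_i-2,1}$, $P_i^{M_i-2,2}$, $P_i^{M_i-2,3}$, and so on, down to $P_i^{0,1}$, $P_i^{0,2}$, $P_i^{0,3}$.]

$P_i^{p,k}$: k-th pass of the p-th bit plane of the i-th block ($1 \le p \le M_i - 1$)

Scanning order:
for i = 1, 2, …
  for p = 1, 2, …, $M_i - 1$
    for k = 1, 2, 3

Three-pass process:
» Significance Propagation Pass ($P_i^{p,1}$)
» Magnitude Refinement Pass ($P_i^{p,2}$)
» Cleanup Pass ($P_i^{p,3}$)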


Coefficient Bit Modeling

Wavelet coefficients are associated with different sub-bands arising from the applied 2D separable transform.

These coefficients are then arranged into rectangular blocks within each sub-band, called code-blocks.

These code-blocks are then coded a bit-plane at a time starting from the most significant bit-plane with a non-zero element to the least significant bit-plane.

For each bit-plane in a code-block, a special code-block scan pattern is used for each of three coding passes.

Each coefficient bit in the bit-plane is coded in only one of the three coding passes:
» significance propagation,
» magnitude refinement, and
» cleanup.


Three Passes Scanning

Significance Pass
» Scan all insignificant samples that have at least one significant neighbor to determine whether each becomes significant at the current bit plane.
» Use ZC to encode whether a sample is still insignificant.
» If a sample becomes significant, also apply SC to encode its sign bit.

Magnitude Refinement Pass
» Scan samples that became significant in a previous bit-plane, using MR encoding.

Normalization (Cleanup) Pass
» Scan all remaining samples and encode them using ZC + RLC.
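How each coefficient bit of a bit plane ends up in exactly one pass can be sketched in Python. This is a simplified illustration: it scans in plain raster order (the standard uses a stripe-based scan), uses the 8-connected neighborhood, and only labels the bits instead of entropy coding them. The name `classify_passes` and the labels "SP", "MR", "CU" are hypothetical.

```python
def classify_passes(mag, sig, p):
    """Label each coefficient's bit in plane p with the pass that codes it.

    mag: 2D list of coefficient magnitudes
    sig: 2D list of significance states at the start of plane p
    Returns (labels, updated significance states).
    """
    h, w = len(mag), len(mag[0])
    was_sig = [row[:] for row in sig]      # states before this bit plane
    sig = [row[:] for row in sig]          # updated as samples become significant
    labels = [[None] * w for _ in range(h)]

    def has_sig_neighbor(y, x):
        return any(sig[j][i]
                   for j in range(max(0, y - 1), min(h, y + 2))
                   for i in range(max(0, x - 1), min(w, x + 2))
                   if (j, i) != (y, x))

    for y in range(h):                     # pass 1: significance propagation
        for x in range(w):
            if not sig[y][x] and has_sig_neighbor(y, x):
                labels[y][x] = "SP"
                if (mag[y][x] >> p) & 1:
                    sig[y][x] = True       # affects neighbors scanned later
    for y in range(h):                     # pass 2: magnitude refinement
        for x in range(w):
            if was_sig[y][x]:
                labels[y][x] = "MR"
    for y in range(h):                     # pass 3: cleanup takes the rest
        for x in range(w):
            if labels[y][x] is None:
                labels[y][x] = "CU"
                if (mag[y][x] >> p) & 1:
                    sig[y][x] = True
    return labels, sig


mag = [[3, 1, 7], [4, 5, 2], [6, 1, 2]]
state = [[False] * 3 for _ in range(3)]
for plane in (2, 1, 0):
    labels, state = classify_passes(mag, state, plane)
    print(plane, labels)
```

On the first non-zero bit plane nothing is significant yet, so every bit falls into the cleanup pass.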


Scanning Order within a code block

Each bit plane within a code block is scanned during the context coding process in a specific order.

All quantized transform coefficients are represented in sign-magnitude representation.

For a particular sub-band, there is a maximum number of magnitude bits, Mb. The “significance state” changes from insignificant to significant at the bit plane where the most significant 1 bit is found.

For a code-block, the number of bit-planes starting from the most significant bit-plane that are all zero, is signaled in the packet header


Neighboring states used to form context

Each coefficient in a code-block has an associated binary state variable called its significance state.

Significance states are initialized to 0 (coefficient is insignificant) and may become 1 (coefficient is significant) during the course of the coding of the code-block.

Four different context formation rules are defined, one for each of the four coding operations:
» significance propagation pass:
  – significance coding,
  – sign coding,
» magnitude refinement pass:
  – magnitude refinement coding,
» cleanup pass:
  – cleanup coding.

The current context obtained during context coding is provided to the arithmetic MQ coder.


Bit plane encoding orders

The number of bit-planes starting from the most significant bit that have no significant coefficients (only insignificant bits) is signaled in the packet headers.

The first bit-plane with a non-zero element has a cleanup pass only. The remaining bit-planes are coded in three coding passes.

Each coefficient bit is coded in exactly one of the three coding passes. Which pass a coefficient bit is coded in depends on the conditions for that pass.

In general, the significance propagation pass includes the coefficients that are predicted, or “most likely,” to become significant and their sign bits, as appropriate.

The magnitude refinement pass includes bits from already significant coefficients.

The cleanup pass includes all the remaining coefficients.


Context of Significance and Cleanup Passes

H, V, D: number of significant horizontal, vertical, and diagonal neighbors.

LL and LH sub-bands      HL sub-bands             HH sub-bands
(vertical high pass)     (horizontal high pass)   (diagonally high pass)   Context
H     V     D            H     V     D            (H+V)   D                label
2     x     x            x     2     x            x       ≥3               8
1     ≥1    x            ≥1    1     x            ≥1      2                7
1     0     ≥1           0     1     ≥1           0       2                6
1     0     0            0     1     0            ≥2      1                5
0     2     x            2     0     x            1       1                4
0     1     x            1     0     x            0       1                3
0     0     ≥2           0     0     ≥2           ≥2      0                2
0     0     1            0     0     1            1       0                1
0     0     0            0     0     0            0       0                0

x: don’t care
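The table lookup can be written as a small function. This sketch assumes the "greater than or equal" entries as they appear in the JPEG2000 context table (the relational signs did not survive in the slide text); the function name `zc_context` is hypothetical. The HL column is the LL/LH column with the roles of H and V exchanged, which the code exploits.

```python
def zc_context(band, h, v, d):
    """Context label (0..8) for significance coding.

    band: "LL", "LH", "HL" or "HH"
    h, v: number of significant horizontal / vertical neighbors (0..2)
    d:    number of significant diagonal neighbors (0..4)
    """
    if band == "HH":
        hv = h + v
        if d >= 3:
            return 8
        if d == 2:
            return 7 if hv >= 1 else 6
        if d == 1:
            return 5 if hv >= 2 else (4 if hv == 1 else 3)
        return min(hv, 2)
    if band == "HL":
        h, v = v, h            # same table as LL/LH with H and V swapped
    if h == 2:
        return 8
    if h == 1:
        if v >= 1:
            return 7
        return 6 if d >= 1 else 5
    if v == 2:
        return 4
    if v == 1:
        return 3
    return min(d, 2)


print(zc_context("LL", 1, 0, 2), zc_context("HL", 2, 0, 3), zc_context("HH", 1, 1, 1))
```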


Significance propagation pass

The significance propagation pass includes only bits of coefficients that were insignificant (the significance bit has yet to be encountered) and have a non-zero context. All other coefficients are skipped.

The context is delivered to the arithmetic decoder (along with the bit stream) and the decoded coefficient bit is returned.

If the value of this bit is 1 then the significance state is set to 1 and the immediate next bit to be decoded is the sign bit for the coefficient. Otherwise, the significance state remains 0.

When the contexts of successive coefficients and coding passes are considered, the most current significance state for this coefficient is used.


Sign Bit Coding

Two phases:
» Summarize the contributions of the vertical and horizontal neighbors.
» Reduce these contributions to one of 5 context labels.

Neighbor states: S = significant, I = insignificant, P = positive, N = negative.

V0 (or H0)   V1 (or H1)   V (or H) contribution
S, P         S, P          1
S, N         S, P          0
I            S, P          1
S, P         S, N          0
S, N         S, N         -1
I            S, N         -1
S, P         I             1
S, N         I            -1
I            I             0

The context labels are sent to the MQ arithmetic coder.

Signbit = AC(contextlabel) XOR XORbit
» Signbit: sign bit of the current coefficient.
» AC(contextlabel): the value returned from the arithmetic decoder, given the context label and the bit stream.

H contribution   V contribution   Context label   XORbit
 1                1               13              0
 1                0               12              0
 1               -1               11              0
 0                1               10              0
 0                0                9              0
 0               -1               10              1
-1                1               11              1
-1                0               12              1
-1               -1               13              1
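The two phases can be sketched in Python. The encoding of a neighbor as 0 (insignificant), +1 (significant, positive), or −1 (significant, negative) is an assumption made for the example, as are the function names; the negative contributions follow the JPEG2000 sign-coding tables, where the minus signs are present.

```python
def contribution(n0, n1):
    """Phase 1: combine two neighbors (each 0, +1 or -1) into -1, 0 or +1."""
    return max(-1, min(1, n0 + n1))


# Phase 2: (H contribution, V contribution) -> (context label, XORbit)
SIGN_CONTEXT = {
    (1, 1): (13, 0), (1, 0): (12, 0), (1, -1): (11, 0),
    (0, 1): (10, 0), (0, 0): (9, 0), (0, -1): (10, 1),
    (-1, 1): (11, 1), (-1, 0): (12, 1), (-1, -1): (13, 1),
}


def sign_context(h0, h1, v0, v1):
    """Return (context label, XORbit) from the four neighbor states."""
    return SIGN_CONTEXT[(contribution(h0, h1), contribution(v0, v1))]


def decode_sign(ac_bit, xor_bit):
    """Signbit = AC(contextlabel) XOR XORbit."""
    return ac_bit ^ xor_bit


label, xor_bit = sign_context(h0=-1, h1=0, v0=1, v1=1)
print(label, xor_bit, decode_sign(0, xor_bit))
```

Here `ac_bit` stands in for the value returned by the arithmetic decoder, which is outside the scope of this sketch.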


Magnitude Refinement

The magnitude refinement pass includes the bits from coefficients that are already significant (except those that have just become significant in the immediately preceding significance propagation pass).

The context used is determined by the summation of the significance state of the horizontal, vertical, and diagonal neighbors. These are the states as currently known to the decoder, not the states used before the significance decoding pass.

Further, it is dependent on whether this is the first refinement bit (the bit immediately after the significance and sign bits) or not.

H + V + D    First refinement for this coefficient    Context label
x            false                                    16
≥1           true                                     15
0            true                                     14


Cleanup Pass

• The first and only coding pass for the first significant bit-plane.
• The third and last pass for all the remaining bit-planes.
• Uses both the neighbor context, as in the significance propagation pass, and run-length coding.


Context-based Arithmetic Entropy Coding

The MQ-coder, a low complexity entropy coder is used.

Contexts are based on the significance of horizontal, vertical, diagonal neighbors of the pixel concerned.

Currently there are 46 contexts.


Tagged Tree

Each node has an associated current value, which is initialized to zero (the minimum).

A 0 bit in the tag tree means that the minimum (or the value in the case of the highest level) is larger than the current value; a 1 bit means that the minimum (or the value in the case of the highest level) is equal to the current value.

For each contiguous 0 bit in the tag tree, the current value is incremented by one.

Nodes at higher levels cannot be coded until lower-level node values are fixed (i.e., a 1 bit is coded).

The top node on level 0 (the lowest level) is queried first. The next corresponding node on level 1 is then queried, and so on.


Tagged tree encoding example

K = 0 (top level)
  t0(0,0) = 0 (initialize)
  t0(0,0) = 0 < q0(0,0) = 1: output 0, t0(0,0) = t0(0,0) + 1 = 1
  t0(0,0) = 1 = q0(0,0) = 1: output 1, K = K + 1 = 1
  Note: q0(0,0) is encoded.

K = 1
  t1(0,0) = q0(0,0) = 1 (initialize)
  t1(0,0) = 1 = q1(0,0): output 1, K = K + 1 = 2
  Note: q1(0,0) is encoded.

K = 2
  t2(0,0) = q1(0,0) = 1 (initialize)
  t2(0,0) = 1 = q2(0,0) = 1: output 1, K = K + 1 = 3
  Note: q2(0,0) is encoded.

K = 3
  t3(0,0) = q2(0,0) = 1 (initialize)
  t3(0,0) = q3(0,0) = 1: output 1, done
  Note: q3(0,0) is encoded.

Thus, the code for q3(0,0) is 01111.

[Figure: tag tree path with q0(0,0) = 1 (code 01), q1(0,0) = 1 (code 1), q2(0,0) = 1 (code 1), q3(0,0) = 1 (code 1).]


Example continued

Next, encode q3(1,0). Since its parent node q2(0,0) is known, we start with K = 3:

K = 3
  t3(1,0) = q2(0,0) = 1 (initialize)
  t3(1,0) = 1 < q3(1,0) = 3: output 0, t3(1,0) = t3(1,0) + 1 = 2
  t3(1,0) = 2 < q3(1,0) = 3: output 0, t3(1,0) = t3(1,0) + 1 = 3
  t3(1,0) = 3 = q3(1,0): output 1, done
  Note: q3(1,0) is encoded as 001.

Now, consider q3(2,0). Its parent is q2(1,0), which needs to be encoded first.

K = 2
  t2(1,0) = q1(0,0) = 1
  t2(1,0) = 1 = q2(1,0): output 1, K = K + 1 = 3

K = 3
  t3(2,0) = q2(1,0) = 1
  t3(2,0) = 1 < q3(2,0) = 2: output 0, t3(2,0) = t3(2,0) + 1 = 2
  t3(2,0) = 2 = q3(2,0): output 1, done

Hence q3(2,0) is encoded as 101.

[Figure: the same tree extended with q3(1,0) = 3 (code 001), q2(1,0) = 1 (code 1), and q3(2,0) = 2 (code 01).]
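The worked example can be reproduced with a small encoder sketch. It builds each level as the minimum over 2 × 2 groups of the level below and codes every node at most once, as in the example. The class name, the 3 × 6 leaf array, and the use of full-value coding (no threshold) are assumptions made for illustration; only the values q3(0,0) = 1, q3(1,0) = 3, and q3(2,0) = 2 and their ancestors come from the slides.

```python
def build_levels(leaves):
    """Return [root level, ..., leaf level]; each parent is the min of its children."""
    levels = [leaves]
    while len(levels[0]) > 1 or len(levels[0][0]) > 1:
        child = levels[0]
        h, w = len(child), len(child[0])
        parent = [[min(child[y][x]
                       for y in range(2 * py, min(2 * py + 2, h))
                       for x in range(2 * px, min(2 * px + 2, w)))
                   for px in range((w + 1) // 2)]
                  for py in range((h + 1) // 2)]
        levels.insert(0, parent)
    return levels


class TagTreeEncoder:
    def __init__(self, leaves):
        self.levels = build_levels(leaves)
        self.coded = set()                 # nodes whose value is already known

    def encode(self, m, n):
        """Code leaf q(m, n): m is the horizontal index, n the vertical index."""
        depth = len(self.levels) - 1
        bits = []
        start = 0                          # t is initialized from the parent value
        for k in range(depth + 1):
            x, y = m >> (depth - k), n >> (depth - k)
            q = self.levels[k][y][x]
            if (k, x, y) not in self.coded:
                bits.append("0" * (q - start) + "1")   # 0 per increment, then 1
                self.coded.add((k, x, y))
            start = q
        return "".join(bits)


leaves = [[1, 3, 2, 3, 2, 3],
          [2, 2, 1, 4, 3, 2],
          [2, 2, 2, 2, 1, 2]]
enc = TagTreeEncoder(leaves)
print(enc.encode(0, 0), enc.encode(1, 0), enc.encode(2, 0))
```

The three calls share one encoder so that ancestors coded for an earlier leaf are not repeated for later ones.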


Layers

• The bit-stream is a succession of layers.
• Each layer contains the contributions from each code block.
• The block truncation points associated with each layer are optimal in the rate-distortion sense.
• A single layer can achieve “progressive in resolution”.
• Multiple layers can achieve “progressive in SNR”.


MQ Arithmetic Coding


Basic Arithmetic Coding

MPS: more probable symbol with probability Pe

LPS: less probable symbol with probability Qe

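If an MPS (M) is encoded, the current interval becomes the Pe part; otherwise it becomes the Qe part (bottom). The interval length is kept in variable A.

Code string C points to the base of the current interval.

[Figure: the unit interval from 0.0 to 1.0 split into a Pe part and a Qe part, subdivided successively for the symbol sequence M M L M.]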


Encoding of the Sequence MMLM

[Figure: the current interval A, starting from A(0), and the code string pointer C, starting from 0, as the sequence M M L M is encoded; Qe is the LPS sub-interval at each step.]

if MPS is encoded
    C ← C + Qe
    A ← A − Qe
else (LPS is encoded)
    A ← Qe
end
if A < 0.75
    Renormalize A and C; update Qe
end

• Interval A is kept between 0.75 and 1.5. Hexadecimal 0x8000 is used to represent 0.75 to make the comparison easy.
• Each time A is doubled, C is doubled as well. The high-order byte of the C register overflows into an external buffer (the compressed code stream).


Decoding of the sequence MMLM

[Figure: the current interval A, starting from A(0), and the code string pointer C, starting from 0, as the sequence M M L M is decoded; Qe is the LPS sub-interval at each step.]

if C >= Qe (MPS is decoded)
    C ← C − Qe
    A ← A − Qe
else (LPS is decoded)
    A ← Qe
end
if A < 0.75
    Renormalize A and C; update Qe
end
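The encoding and decoding rules can be tried out with a simplified Python sketch. It is not the MQ coder: it keeps Qe fixed instead of updating it, uses exact fractions instead of fixed-point registers, produces a single fraction instead of a byte stream, and the decoder is told how many symbols to expect. The function names are hypothetical.

```python
from fractions import Fraction

THRESHOLD = Fraction(3, 4)   # A is renormalized whenever it drops below 0.75


def encode(symbols, qe):
    """Encode a string of 'M' (MPS) and 'L' (LPS); return the code value."""
    a, c, shifts = Fraction(1), Fraction(0), 0
    for s in symbols:
        if s == "M":
            c += qe          # MPS takes the upper part of the interval
            a -= qe
        else:
            a = qe           # LPS takes the bottom part of length Qe
        while a < THRESHOLD: # renormalize: double A and C together
            a *= 2
            c *= 2
            shifts += 1
    return c / 2 ** shifts   # undo the doublings to get a value in [0, 1)


def decode(code, count, qe):
    """Recover `count` symbols from the code value."""
    a, c = Fraction(1), Fraction(code)
    out = []
    for _ in range(count):
        if c >= qe:          # MPS is decoded
            out.append("M")
            c -= qe
            a -= qe
        else:                # LPS is decoded
            out.append("L")
            a = qe
        while a < THRESHOLD:
            a *= 2
            c *= 2
    return "".join(out)


code = encode("MMLM", Fraction(1, 4))
print(code, decode(code, 4, Fraction(1, 4)))
```

The sketch requires 0 < Qe < 0.75 so that both sub-intervals stay non-empty; the replacement of A·Qe by Qe is the approximation that keeping A close to 1 makes acceptable.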


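JPEG2000 Arithmetic Codec

[Figure: encoder: uncompressed data → context model → decision (D) and context (CX) → arithmetic encoder → compressed data, with a probability estimator supplying Qe and MPS. Decoder: compressed data → arithmetic decoder → decision (D) → context model → uncompressed data, with the context model supplying the context (CX) and a probability estimator supplying Qe and MPS.]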


Encoder Register Structure

• “a” bits: fractional bits in the A-register (the current interval value).
• “x” bits: fractional bits in the code register.
• “s” bits: spacer bits, which provide useful constraints on carry-over.
• “b” bits: bit positions from which the completed bytes of the data are removed from the C-register.
• “c” bit: a carry bit.


Encoding

[Flowchart: encode → D = 0? → yes: Code 0; no: Code 1 → done.]


Encode MPS, LPS

A total of 46 context symbols are listed. Encoding is similar to a finite state machine: from the current row, find the next row depending on MPS or LPS, and output the code stream.


Region of Interest (ROI) Coding


Region of Interest Coding

An ROI is a part of an image that is coded earlier in the code stream than the rest of the image (the background).

The coding is also done in such a way that the information associated with the ROI precedes the information associated with the background.

The method used is the Maxshift method.

ROI allows certain parts of the image to be coded in better quality.

Static:
» The ROI is decided and coded once and for all at the encoder side.

Dynamic:
» The ROI can be decided and decoded on the fly from the same bit stream.


MaxShift Method

Encoding
1. Generate the ROI mask, M(x,y).
   – M(x,y) = 1: wavelet coefficient (x,y) is needed for the ROI.
   – M(x,y) = 0: wavelet coefficient (x,y) belongs to the background and can be sacrificed without affecting the ROI.
2. Find the scaling value s, and scale up all ROI wavelet coefficients by s bits so that ROI coefficients > $2^s$ > background coefficients.
3. Write the scaling value s into the code stream using the RGN marker.

Decoding
1. Get s from the RGN marker.
2. Scale background wavelet coefficients by $2^s$.
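A minimal sketch of the idea on a flat list of integer coefficients follows. It assumes one common convention: the encoder shifts ROI coefficients up by s bits, and the decoder recognizes them purely by magnitude and shifts them back down, which restores the same relative scale between ROI and background as scaling the background. The function names and the list-based mask are hypothetical.

```python
def maxshift_encode(coeffs, mask):
    """Shift ROI coefficients (mask == 1) above every background coefficient."""
    max_bg = max((abs(c) for c, m in zip(coeffs, mask) if not m), default=0)
    s = max_bg.bit_length()          # smallest s with 2**s > max background magnitude
    scaled = [c * (1 << s) if m else c for c, m in zip(coeffs, mask)]
    return scaled, s                 # s would be signaled in the RGN marker


def maxshift_decode(scaled, s):
    """No mask needed: any magnitude >= 2**s must belong to the ROI."""
    limit = 1 << s
    # ROI values are exact multiples of 2**s, so floor division is exact.
    return [c // limit if abs(c) >= limit else c for c in scaled]


coeffs = [5, -3, 12, 0, -7, 2]
mask = [0, 1, 1, 0, 1, 0]
scaled, s = maxshift_encode(coeffs, mask)
print(s, scaled, maxshift_decode(scaled, s))
```

Because every non-zero ROI coefficient ends up in higher bit planes than any background coefficient, bit-plane coding sends the ROI first without transmitting its shape.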


ROI Mask Computation

Must track wavelet coefficients that will contribute to ROI region pixels.



Scale Operation



Advantages of Maxshift method

• Support for arbitrarily shaped ROIs with minimal complexity
• No need to send shape information
• No need for a shape encoder and decoder
• No need for an ROI mask at the decoder side
• Decoder as simple as a non-ROI-capable decoder
• Can decide in which sub-band the ROI will begin
  » therefore it can give similar results to the general scaling method


Conclusion

JPEG2000 is an emerging image coding standard for the next generation of digital imaging.

No IPR (intellectual property right) on part I of the standard (free licensing)

More complex than JPEG but designed with hardware implementation in mind.

Many companies are working to incorporate JP2 into the next generation of digital cameras and scanners.
