
Accelerating M-JPEG Compression with Temporal Information

Holger Bönisch, Konrad Froitzheim, Peter Schulthess
Department of Distributed Systems, University of Ulm
Oberer Eselsberg, D-89069 Ulm, Germany

Software implementation of multimedia compression is an enabling technology for the widespread use of computer based multimedia communications. M-JPEG offers reasonable compression at reasonable computational cost. This paper presents modifications to the well-known JPEG compression algorithm to achieve a 60 - 70% speed-up of digital video compression. The scheme presented exploits dependencies between frames to predict the DCT coefficients in a frame based on previous frames in the sequence. This knowledge is used to reduce the computational complexity of the DCT transforms. The process-based approach taken introduces only a mild overhead of about 0.4% into the compression. Performance measurements with real video sequences demonstrate the increased performance of the modified JPEG process for digital video.

1 Introduction

Digital video is an integral part of multimedia systems. Most applications require real-time features, reasonable compression ratios, and high flexibility. The latter requirement has made software compression preferable to hardware solutions if the real-time requirements are met. The increasing performance of modern PCs allows the use of software codecs even in a real-time environment. On the other hand the trade-off between compression ratio and complexity of the compression algorithm limits the choice of applicable compression schemes. Sophisticated compression schemes need state-of-the-art implementations to fulfil the performance requirements.

Motion JPEG (M-JPEG) is a good compromise in such a scenario: It offers reasonable compression ratios as well as moderate compression complexity. M-JPEG creates a sequence of JPEG encoded frames. Each frame is subdivided into blocks and a symmetric compression scheme (figure 1) is applied to each block. Improved compression can be achieved with a conditional replenishment scheme [M69, FW97].

[Figure 1: JPEG compression/decompression — encoder: color space conversion, DCT, quantization, zigzag reordering, entropy coding; transmission of compressed data; decoder: entropy decoding, matrix reconstruction, dequantization, inverse DCT, color space conversion]


M-JPEG is limited to intraframe compression only for historical reasons. There is no interframe compression in spite of the presence of temporal redundancy. All image streams suitable for human consumption contain temporal redundancy. The central idea of our approach is to take advantage of this redundancy to speed up the image compression process. This extends an algorithm introduced by Froitzheim and Wolf [FW95]. They showed how to accelerate JPEG decompression using knowledge of the frequency distribution within a block of a JPEG encoded frame. The advantage of this structural knowledge is that the discrete cosine transform can be redefined to reduce the number of operations executed in each step. Froitzheim and Wolf proposed a reduced, one dimensional inverse DCT (IDCT). All frequency distribution data is extracted during matrix reconstruction, thus minimizing the overhead to generate the structural knowledge.

Unfortunately, there is no direct compression equivalent to this approach, because the frequency distributions are unknown until quantization, i.e. during the preceding transform. After the quantization the structural knowledge can be accessed for further use. This paper proposes to use this knowledge for the same area in the next frame of a stream. For a well-known frequency distribution we introduce a reduced JPEG compression process which includes both a reduced DCT and reduced quantization and entropy coding (figure 2).

[Figure 2: Conceptual view of the M-JPEG acceleration — frame n runs through the full JPEG pipeline (color space conversion, DCT, quantization, zigzag reordering, entropy coding) while the frequency distribution is acquired; frame n+1 uses the evaluated frequency distribution to run the reduced JPEG process]

The issues addressed in this paper are presented in the following order: Section 2 covers essential basics of JPEG compression as specified in [JPG92]. In section 3, the standard JPEG algorithm is modified to a reduced JPEG process. Reduced JPEG works content based and its adaptive behavior is described in section 4. The experimental results are presented in section 5. In section 6 we summarize and suggest further improvements.

2 Baseline Sequential JPEG

2.1 Color Space Conversion

Source images are separated into component layers so that every color model can be processed by JPEG. Nevertheless, the usage of standard color spaces (one for gray scale, called luminance (Y), and two additional chrominance components (Cr, Cb) for color images) is recommended. Color images may be subsampled after having been transformed from RGB to YCrCb color space (e.g. according to ITU-R 601-1).
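As an illustration, a minimal sketch of the ITU-R BT.601-style RGB to YCbCr conversion commonly used with JPEG; the function name and the floating-point formulation are ours, and production encoders typically use fixed-point lookup tables:

```c
#include <stdint.h>

/* Per-pixel RGB -> YCbCr conversion with ITU-R BT.601 luma weights,
   8-bit samples, chroma centered at 128 (JFIF convention). */
static void rgb_to_ycbcr(uint8_t r, uint8_t g, uint8_t b,
                         uint8_t *y, uint8_t *cb, uint8_t *cr)
{
    *y  = (uint8_t)( 0.299   * r + 0.587   * g + 0.114   * b);
    *cb = (uint8_t)(-0.16874 * r - 0.33126 * g + 0.5     * b + 128.0);
    *cr = (uint8_t)( 0.5     * r - 0.41869 * g - 0.08131 * b + 128.0);
}
```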


2.2 Discrete Cosine Transform (DCT)

The two dimensional DCT applies an invertible coordinate transform to the frame. It transforms a matrix of image data (n × n) from the spatial domain f(j,k) to the frequency domain F(u,v) by a set of orthogonal base functions (see equation 1a). This is similar to the vector representation (equation 1b), where X is the input vector, Y is the output vector and Q is the transform kernel. The two dimensional DCT can be considered as a sequence of two consecutive one dimensional transforms due to the separability property of the transform [RY90].

The distribution of energy within the matrix changes significantly after the transform. Most of the energy is concentrated in the upper left block corner due to the structure of the base functions. F(0,0) is the mean energy of the block (DC coefficient). The other values are called AC coefficients, representing increasing frequencies from left to right and from top to bottom respectively. Values of low frequencies are considered more important, because they contain more visible information than higher frequencies. There is no loss of information by the transform except arithmetic inaccuracies.

a)  F(u,v) = \frac{2}{n}\, C(u)\, C(v) \sum_{j=0}^{n-1} \sum_{k=0}^{n-1} f(j,k)\, \cos\frac{(2j+1)\,u\,\pi}{2n}\, \cos\frac{(2k+1)\,v\,\pi}{2n}

b)  Y = Q \cdot X, \qquad Q_{i,j} = \sqrt{\tfrac{2}{n}}\, C(i)\, \cos\frac{(2j+1)\,i\,\pi}{2n}

    C(w) = \tfrac{1}{\sqrt{2}} \text{ for } w = 0, \qquad C(w) = 1 \text{ for } w = 1 \ldots n-1

Equation 1: Definition of the 2D-DCT
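To make the definition concrete, here is a direct, unoptimized transcription of equation 1a in C; it is purely illustrative, since practical codecs (including ours, see section 5.1) use factored fast algorithms:

```c
#include <math.h>

#ifndef M_PI
#define M_PI 3.14159265358979323846
#endif
#define N 8

/* Straightforward 8x8 forward DCT exactly as written in equation 1a.
   O(n^4) per block and far slower than Chen/Linzer-Feig style transforms. */
static void dct_2d(const double f[N][N], double F[N][N])
{
    for (int u = 0; u < N; u++) {
        for (int v = 0; v < N; v++) {
            double sum = 0.0;
            for (int j = 0; j < N; j++)
                for (int k = 0; k < N; k++)
                    sum += f[j][k]
                         * cos((2 * j + 1) * u * M_PI / (2.0 * N))
                         * cos((2 * k + 1) * v * M_PI / (2.0 * N));
            double cu = (u == 0) ? 1.0 / sqrt(2.0) : 1.0;
            double cv = (v == 0) ? 1.0 / sqrt(2.0) : 1.0;
            F[u][v] = (2.0 / N) * cu * cv * sum;
        }
    }
}
```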

2.3 Quantization and Entropy Coding

Transformed blocks are quantized proportional to the amount of energy they contain or according to the importance of a coefficient. The quantization step divides the transformed data by a scaled quantization matrix and rounds the result, such that most coefficients are set to zero. Only a few highly relevant coefficients in the upper left block corner remain nonzero. Different matrices for luminance and chrominance are applied.

After a zigzag reordering of the coefficients, two entropy compression schemes are applied, zero suppression and a variable length code [RB91].
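For reference, a small sketch of the zigzag reordering step; the table is the standard JPEG scan order for an 8×8 block (row-major indices), and the function name is ours:

```c
/* Standard JPEG zigzag order: maps the position in the 1-D entropy-coding
   sequence to the row-major coefficient index (u*8+v) of an 8x8 block. */
static const int zigzag[64] = {
     0,  1,  8, 16,  9,  2,  3, 10,
    17, 24, 32, 25, 18, 11,  4,  5,
    12, 19, 26, 33, 40, 48, 41, 34,
    27, 20, 13,  6,  7, 14, 21, 28,
    35, 42, 49, 56, 57, 50, 43, 36,
    29, 22, 15, 23, 30, 37, 44, 51,
    58, 59, 52, 45, 38, 31, 39, 46,
    53, 60, 61, 54, 47, 55, 62, 63
};

/* Reorder a quantized block (row-major) into zigzag sequence. */
static void zigzag_reorder(const int block[64], int out[64])
{
    for (int i = 0; i < 64; i++)
        out[i] = block[zigzag[i]];
}
```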

3 Accelerated JPEG compression

3.1 Reduced DCT

Depending on the selected compression ratio only a few values remain nonzero after transforming and quantizing a block. A large amount of operations is wasted during the DCT stage to compute values which will be discarded in the quantization step. The calculation could be restricted to relevant matrix positions if the positions of the nonzero values were known in advance. Although this is impossible for still images, it is not in video processing. Due to the interframe relations between consecutive images it is possible to predict the frequency distribution. Only the first video frame has to be treated like a still image. This prediction algorithm is the core of the work presented. It has to adapt to changes in the image sequence quickly and it must not incur too much additional computation.

The next chapter shows how to adapt to changing block contents to minimize errors. For now we assume a given distribution of nonzero coefficients. This distribution is represented by one bitset for each 1-D transform. If at least one coefficient of a given row (column) is nonzero, the corresponding bit of the row (column) bitset is set. We adopt the term degree (G) from [FW95] to describe a reduced one dimensional DCT. The degree of a transform is derived from the corresponding bitset and equals the position of its most significant set bit. The operation of a reduced DCT can be described as follows:

( ) ( )( )

F u v F u v u G v G

F u vT TR

R

else

, , ,

,

= < <

=1 2

0

Equation 2: Definition of a reduced DCT

Hence, fewer coefficients have to be calculated. Consider a target block of known degree (G_row = 5, G_col = 4). All input samples have to be read, but after that it is sufficient to compute 4 output samples per column and 5 output samples per row. Furthermore, we can restrict the second step to the 4 upper rows as shown in figure 3. In figure 3 nonzero coefficients are shown in color to visualize their distribution. We call the area with nonzero coefficients the block shape.
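The degree rule can be written as a small helper; this is an illustrative sketch assuming bit i of a bitset marks a nonzero coefficient in row/column i:

```c
#include <stdint.h>

/* Degree of a reduced 1-D transform, derived from a row or column bitset:
   the position of the most significant set bit determines how many output
   coefficients have to be computed. */
static int degree_from_bitset(uint8_t bits)
{
    int g = 0;
    while (bits) {      /* index of highest set bit, plus one */
        bits >>= 1;
        g++;
    }
    return g;           /* 0 means an all-zero transform that can be skipped */
}
```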

[Figure 3: Example transform of a given block shape (bitsets 1 1 1 0 1 0 0 0 and 1 1 1 1 0 0 0 0); column transform followed by row transform. Only colored coefficients have to be computed.]
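The loop structure implied by figure 3 and equation 2 might look like the following sketch; the naive O(n²) 1-D transform stands in for the truncated fast DCT of the actual implementation, and all names are illustrative:

```c
#include <math.h>

#ifndef M_PI
#define M_PI 3.14159265358979323846
#endif
#define N 8

/* Reduced 1-D DCT on one row or column: only the first `degree` output
   coefficients are computed (equation 2), the rest are forced to zero. */
static void reduced_dct_1d(const double in[N], double out[N], int degree)
{
    for (int u = 0; u < degree; u++) {
        double c = (u == 0) ? 1.0 / sqrt(2.0) : 1.0;
        double sum = 0.0;
        for (int j = 0; j < N; j++)
            sum += in[j] * cos((2 * j + 1) * u * M_PI / (2.0 * N));
        out[u] = sqrt(2.0 / N) * c * sum;
    }
    for (int u = degree; u < N; u++)
        out[u] = 0.0;
}

/* Reduced 2-D DCT driven by the predicted block shape (cf. figure 3):
   all N columns are transformed with only g_col outputs each, then only
   the g_col upper rows are transformed with g_row outputs each. */
static void reduced_dct(double block[N][N], int g_row, int g_col)
{
    double in[N], out[N];

    for (int c = 0; c < N; c++) {                 /* first pass: columns */
        for (int r = 0; r < N; r++) in[r] = block[r][c];
        reduced_dct_1d(in, out, g_col);
        for (int r = 0; r < N; r++) block[r][c] = out[r];
    }
    for (int r = 0; r < g_col; r++) {             /* second pass: rows */
        reduced_dct_1d(block[r], out, g_row);
        for (int v = 0; v < N; v++) block[r][v] = out[v];
    }
}
```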

According to equation 2, the output of a reduced DCT is a rectangular area of size G_row × G_col. That is why the example row transform will generate output values at column 4, although it does not contain nonzero values. This is not necessary, but it makes the implementation easier and probably faster. Two questions arise out of this:

• Which transform has to be executed first to yield a minimum of instructions?
• Which overall speed-up can be expected?

The relationship between the execution times of the standard and the reduced DCT can be computed as follows (A is the number of instructions of a transform of degree G):

( ) ( )( )

t

G A G G A G

G A Grel

Max T T T

Max Max

=+1 1 2

2

Equation 3: Relative calculation time of a reduced DCTas a function of the transform’s degree

Since A(G) increases monotonically (chapter 5, table 1), equation 3 provides an answer to the first question: in order to minimize the numerator it is advisable to choose the transform of lower degree first. The distribution of nonzero coefficients has a different shape for each block. As discussed in [FW95], asymmetric shapes are more frequent than symmetric configurations.

3.2 Reduced Quantization and Entropy Coding

The quantization step can be optimized in the same way as the DCT. Only the relevant part of the block is computed. All other coefficients are set to zero.

F_Q(u,v) = \begin{cases} \mathrm{round}\!\left( \dfrac{F_T(u,v)}{Q(u,v)} \right) & \text{if } u < G_{T1} \text{ and } v < G_{T2} \\ 0 & \text{else} \end{cases}

Equation 4: Definition of a reduced quantization

Execution time increases linearly with the number of nonzero coefficients. There is a considerable additional overhead in setting the other coefficients to zero. This could be avoided if entropy coding could be restricted to the used part of the block. However, there is no simple mapping between the nonzero area and the zigzag scan. Figure 4 illustrates the relation between the quantized area and the respective zigzag scan. It is advisable to limit the upper bound of the zigzag scan to F(G_col, G_row). Masking the zigzag overhead is theoretically possible but computationally expensive, because zero values are not encoded and are only used to select the proper Huffman table entry. Therefore, the overall speed-up of this step is not as high as the definition promises.
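A sketch of the reduced quantization of equation 4 (names are illustrative; g_u and g_v denote the predicted degrees in the two directions):

```c
#include <math.h>

#define N 8

/* Reduced quantization: only the predicted rectangle is divided by the
   quantization matrix and rounded; everything outside is forced to zero. */
static void reduced_quantize(const double F[N][N], const double Q[N][N],
                             int g_u, int g_v, int out[N][N])
{
    for (int u = 0; u < N; u++) {
        for (int v = 0; v < N; v++) {
            if (u < g_u && v < g_v)
                out[u][v] = (int)lround(F[u][v] / Q[u][v]);
            else
                out[u][v] = 0;   /* the zeroing overhead discussed above */
        }
    }
}
```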

[Figure 4: Zigzag scan of reduced entropy coding]

4 Estimation of Nonzero Coefficients

So far we assumed sufficient knowledge about the position of nonzero coefficients and consequently about the shape of the block. Under these circumstances no additional errors occur compared to standard JPEG. Our proposal to gain knowledge about block shapes is to predict them from the previous frame. This knowledge is not accurate, however. If we limit the coefficient computation to the old nonzero positions, we do not only risk errors if the input - and the block shape - changes: we would not even be able to detect these changes. A buffer zone is used to achieve adaptation to different block shapes. The buffer zone extends the predicted block shape as shown in figure 6 to observe changes of block shapes. It thus includes zero coefficients in the computed coefficient block to check for nonzero values in the transformed data indicating changes in the block shape. The errors caused by this strategy depend on how the predictors are estimated and on the size of the buffer zone.


4.1 Predictor Estimation

The naive way to calculate the current block shape is to compare a value to zero after its quantization. This is a generic but not particularly efficient method. A faster estimation is possible by checking the coefficients of the current degree and of the adaptation buffer only. Penalties of this method are its relatively slow convergence and a complex indexing within the reduced JPEG implementation.

An efficient solution is to embed the estimation process into the entropy coder. The Huffman coder loads and compares coefficients in any case. Since we have to load the coefficients in this step, we might as well add the two instructions necessary to update the proper bitsets with respect to the zigzag sequence. The algorithm starts with a minimum bitset. For each nonzero coefficient this bitset is extended as described in figure 5.

[Figure 5: Computation of new row and column bitsets by iterating OR operations of the present bitset and two bitmasks determined by the current zigzag position]
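A possible C rendering of this embedding; the paper's assembly version uses two precomputed per-position bitmask tables, whereas this sketch derives the row and column masks on the fly from the zigzag scan table for clarity:

```c
#include <stdint.h>

/* Update the row and column bitsets while the entropy coder walks the
   zigzag-ordered coefficients (cf. figure 5). zz_coeff holds the block in
   zigzag order, zigzag[] is the standard scan table mapping zigzag index
   to row-major position (as shown in section 2.3). */
static void update_shape(const int zz_coeff[64], const int zigzag[64],
                         uint8_t *row_bits, uint8_t *col_bits)
{
    for (int i = 0; i < 64; i++) {
        if (zz_coeff[i] != 0) {           /* coefficient is loaded anyway */
            int pos = zigzag[i];          /* row-major position u*8 + v   */
            *row_bits |= (uint8_t)(1u << (pos / 8));   /* mark its row    */
            *col_bits |= (uint8_t)(1u << (pos % 8));   /* mark its column */
        }
    }
}
```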

4.2 Adaptive Prediction

The next open issue is how to apply the predicted area to the next frame. This depends on the degree of changes between two frames, or on the image contents respectively. A change of image content causes a change of the block shape, where the block will either grow or shrink. We have to accommodate both changes, one to maintain image quality, the other to increase compression speed. In figure 6 we extend the predicted area by a buffer zone (G_B) of degree 1 for both row and column transform. The buffer zone increases the transform degree as follows:

G_T[\mathrm{Image}_{n+1}](u,v) = G_T[\mathrm{Image}_{n}](u,v) + G_B

Equation 6: Transform degree of a block using a buffer zone

The changes of a given block can be classified as follows:


[Figure 6: Types of possible changes of block shapes: shrinking, continuous expansion and discontinuous expansion — predictor block (frame n) vs. real situation (frame n+1)]

If the block to be encoded shrinks (figure 6, left) nothing has to be done. If it contains more information than expected (figure 6, middle) not all relevant coefficients are calculated. The buffer zone is then extended together with the predicted shape for the next step, so all relevant coefficients will be computed in subsequent frames. In the example above this will happen one frame later due to the buffer zone, which causes an expansion of the predicted area. This is equivalent to a higher quantization for the time of one frame and there are almost no visible consequences. Moving edges are blurred for the time of one or sometimes two frames (or less than 133 ms in a video stream of 15 fps).

A buffer zone of degree one is a compromise between performance and tolerated errors. It does produce persistent errors, because it does not detect coefficients introduced by discontinuous expansion (figure 6, right). Such isolated high frequency coefficients are rare and appear only in image content where even standard JPEG has its problems. These contents are typically b/w text or dithered images.

5 Experimental Results

5.1 Implementation

To compare our reduced JPEG to the standard algorithm we implemented both in assembly language (on a PowerPC processor). Our implementation is based on an algorithm with a balance between the number of additions and the number of multiplications in order to take advantage of fused multiply accumulate instructions (MAC) of modern multimedia processors. It is a modification of Chen's algorithm [CH77] according to Linzer and Feig [LF91]. The following table shows how many instructions can be saved when using a reduced DCT of a certain degree (standard JPEG corresponds to degree 8):

Table 1: Number of instructions as a function of the DCT's degree

degree                                   |  1 |  2 |  3 |  4 |  5 |  6 |  7 |  8
add/mult                                 |  8 | 18 | 22 | 26 | 27 | 29 | 31 | 33
load/store                               |  9 | 10 | 11 | 12 | 13 | 14 | 15 | 16
Δ to G_max (instructions saved vs. degree 8) | 32 | 21 | 16 | 11 |  9 |  6 |  3 |  -
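As a rough sanity check of equation 3 against these counts (assuming A(G) is the sum of the add/mult and load/store rows, which the paper does not state explicitly), the example block of section 3.1 with G_row = 5, G_col = 4 and the column transform executed first gives:

```c
#include <stdio.h>

/* Instruction counts from table 1, indexed by degree 1..8 (index 0 unused). */
static const int add_mult[9]   = {0, 8, 18, 22, 26, 27, 29, 31, 33};
static const int load_store[9] = {0, 9, 10, 11, 12, 13, 14, 15, 16};

static int A(int g) { return add_mult[g] + load_store[g]; }

int main(void)
{
    /* t_rel = (G_Max*A(G_T1) + G_T1*A(G_T2)) / (2*G_Max*A(G_Max)),
       with G_T1 = 4 (columns first) and G_T2 = 5. */
    double t_rel = (8.0 * A(4) + 4.0 * A(5)) / (2.0 * 8.0 * A(8));
    printf("relative DCT time: %.2f\n", t_rel);   /* roughly 0.59 here */
    return 0;
}
```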


Standard JPEG depends on the following parameters:
• color depth - determines the complexity of the color space conversion
• image size - determines the number of blocks to process
• image quality - determines the number of relevant coefficients to be encoded at entropy coding

JPEG acceleration with our algorithm depends on two more parameters:
• image content - represented by the number and distribution of relevant coefficients as described above
• image quality - due to the non-linear speed-up of the reduced DCT (see table 1) and the reduced quantization and entropy coding

In order to be independent of the standard JPEG parameters we used several test series to calculate their effects. That includes different color depths (gray and color images) and image sizes. We used QCIF (video conferencing) and a typical computer video format (320x240 pels) to cover typical multimedia scenarios. Image content and quality can be handled by classification. Image content is represented by images of different detail levels from the frequency domain's point of view. An increasing overall detail level causes a higher number of relevant coefficients per block and a lower speed-up.

The detail level can be classified as follows:
• low detail level - typically landscapes or images containing wide smooth areas in general
• medium detail level - portraits or indoor recordings
• high detail level - b/w images, drawings or dithered images

The image quality of digital video can be described according to table 2:

Table 2: Classification of digital video quality (M-JPEG compressed)

compression ratio | quality          | quantization ratio¹ | characteristics
15:1 - 40:1       | still image      | 1                   | errors almost invisible, slightly reduced sharpness
40:1 - 80:1       | standard video   | 2                   | minor errors like visible block edges and slight color fluctuations
80:1 - 120:1      | video conference | 3                   | major errors like visible block edges, color fluctuations, missing details

5.2 Results

A complete set of results is included in Annex A as plots. The speed-up of a real codec covers a wide range depending heavily on the color depth parameter (figure 7). This is due to the time consuming color conversion step, which is not optimized by the reduced JPEG, while gray scale images need only a simple data type conversion.

¹ According to the quantization ratio used in Annex A.


[Figure 7: Minimum / maximum speed-up of the reduced JPEG vs. standard JPEG at standard video quality — bar chart comparing 16 bit RGB and gray scale images, speed-up range 0% to 80%]

There is a strong relation between speed-up and image content or detail level respectively. The worst results in figure 7 are caused by images of the highest detail level. A video is a composition of images of different types and can be considered as an average of the results. Because the exact mix of a certain video is unknown we can only present the average speed-up of our test series assuming a uniform distribution. Because color space conversion has to be optimized with entirely different techniques it will not be considered in the comparison of the two JPEG versions.

A JPEG encoder using our reduced JPEG algorithm generates higher frame rates of:
• approximately 60% for standard video quality
• approximately 70% for video conference quality

During the tests the data about the frequency distribution within a block was stored in block-related bitsets. The overhead of calculating these bitsets is always less than 0.4% of the encoding process.

5.3 Errors

The conceptual approach of a buffer zone will not detect nonzero coefficients outside the buffer. We now want to discuss in which situations this approach is applicable. A test image series was encoded iteratively to measure the effects of missing coefficients. Starting with the DC value, the encoding was repeated until the file size did not change any more. Then all images were compared to their originals of the same quality level, both using the program fracomp [KA94] and by human viewers.

Table 3: Error characteristics of the test series encoded with reduced JPEG (16 bit color depth)

image quality              | still image | standard video
signal to noise ratio (dB) | 40-46       | 36-42
average deviation          | <1          | <2
root mean square error     | 1-2         | 2-4
perceived overall error    | minimal     | minimal

Table 3 shows that only very few coefficients are not detected, resulting in almost no visible distortions in the video stream. The situation changes if certain image content appears, e.g. b/w text, or if special image transforms were applied before compression, e.g. dithering. Then the error rate increases rapidly due to the unrecognized high frequency components. To illustrate these effects all images of the test series were dithered (table 4).

Table 4: Results of table 3 after color depth reduction to 8 bit with dithering

image quality              | still image | standard video
signal to noise ratio (dB) | 28-32       | 25-30
average deviation          | 4-6         | 4-7
root mean square error     | 7-9         | 8-10
perceived overall error    | obvious     | obvious

Errors of this order of magnitude cannot be tolerated if they appear over a longer time (not only in a single frame). Since the algorithm is not able to detect such situations automatically, this must be remedied by the user by adjusting the width of the buffer zone manually.
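For completeness, a sketch of how the error metrics of tables 3 and 4 can be computed for 8-bit samples; the paper does not specify which SNR variant fracomp reports, so the peak-signal formulation below is an assumption:

```c
#include <math.h>
#include <stddef.h>

/* Root mean square error, average (mean absolute) deviation, and a
   peak-signal SNR in dB for two 8-bit images of n samples each. */
static void error_metrics(const unsigned char *ref, const unsigned char *test,
                          size_t n, double *rmse, double *avg_dev, double *snr_db)
{
    double sum_sq = 0.0, sum_abs = 0.0;
    for (size_t i = 0; i < n; i++) {
        double d = (double)ref[i] - (double)test[i];
        sum_sq  += d * d;
        sum_abs += fabs(d);
    }
    *rmse    = sqrt(sum_sq / (double)n);
    *avg_dev = sum_abs / (double)n;
    *snr_db  = 20.0 * log10(255.0 / (*rmse > 0.0 ? *rmse : 1e-9));
}
```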

6 Conclusions

Slow changes of the image content between consecutive frames of digital video sequences cause temporal redundancy. MPEG takes advantage of this redundancy to achieve high compression. These compression ratios come at a price, however: extremely high computational load and significant delay. Many multimedia applications require a careful balance between speed and compression ratio. That is why we propose to use M-JPEG and to accelerate the compression based on temporal dependencies to avoid unnecessary operations.

Temporal dependencies are represented by slowly changing frequency distributions within JPEG blocks at the same position in the frame. Our implementation stores the frequency distribution of a certain block in two bitsets. We embedded the knowledge acquisition into the entropy encoding stage. The overhead of this method does not exceed 0.4% of the overall processing time. This knowledge allows us to predict the frequency distribution of the next frame. The prediction mechanism has to be adaptive to detect changes of the image content. In our implementation we chose a buffer zone of degree 1 around the computed block shape. This is sufficient as long as no irregular image content appears.

Under these assumptions we defined and implemented a reduced JPEG compression process including a reduced DCT, a reduced quantization and a reduced entropy coding. We found that the reduced JPEG codec generates frame rates about 60 to 70% higher than standard JPEG. Further compression speed-up could be gained either from a further refinement of the knowledge acquisition (more than one bit per row/column) combined with a matched DCT, or from symbol-parallel entropy encoding.

The color space conversion step does not take advantage of our algorithm. An improvement of this step could be achieved with parallel computation of the different color components, e.g. with special multimedia processor instructions.

7 References

[CH77] Chen, W.H./ Smith, C.H./ Fralick, S.C.: A Fast Computational Algorithm for the Discrete Cosine Transform, IEEE Trans. Commun., vol. COM-25, no. 9, Sept. 1977, pp. 1004-1009.

[FW95] Froitzheim, K./ Wolf, H.: A Knowledge-based Approach to JPEG Acceleration, SPIE Proceedings, Digital Video Compression: Algorithms and Technologies 1995, 2419-30, p. 318.

[FW97] Froitzheim, K./ Wolf, H.: WebVideo - a Tool for WWW-based Teleoperation; IEEE International Symposium on Industrial Electronics - ISIE '97, Guimarães, 1997.

[JPG92] ISO/IEC Draft International Standard: JPEG, ISO/IEC 10918-1, 1992.


[LF91] Linzer, E./ Feig, E.: New Scaled DCT Algorithms for Fused Multiply/Add Architectures, IEEE Intl. Conference on Acoustics, Speech and Signal Processing, 1991, p. 2201.

[KA94] Kassler, A.: fracomp: Fractal Image Compression with Windows, 1994, ftp://www-vs.informatik.uni-ulm.de/pub/fracomp/

[M69] Mounts, F.W.: A Video Encoding System with Conditional Picture-Element Replenishment; Bell System Technical Journal, vol. 48, no. 7, Sept. 1969, pp. 2545-2554.

[RB91] Rabbani, M./ Jones, P.W.: Digital Image Compression Techniques, SPIE Optical Engineering Press 1991, ISBN 0-8194-0648-1.

[RY90] Rao, K.R./ Yip, P.: Discrete Cosine Transform, Academic Press 1990, ISBN 0-12-580203-X.


Annex A

The following diagrams contain measured frame rates of standard and reduced JPEG. All tests were performed on Power Macintosh machines (PPC 601/66 MHz and 604/120 MHz) with Mac OS System 7, at two frame sizes (320x240 and QCIF 176x144).

• x-axis: quantization scale according to ITU's standard quantization matrices
• y-axis: frame rate [Hz]

[Six frame-rate plots, each comparing standard and reduced JPEG for the four test configurations 601/66-320x240, 604/120-320x240, 601/66-176x144 and 604/120-176x144: low detail level - gray scale; medium detail level - gray scale; high detail level - gray scale; low detail level - 16 bit RGB; medium detail level - 16 bit RGB; high detail level - 16 bit RGB]