cosmic: a compatible scheme for moving image coding

Signal Processing: Image Communication 5 (1993) 91-103, Elsevier

COSMIC: A compatible scheme for moving image coding

Geoffrey Morrison and Ian Parke BT Labs, Martlesham Heath, Ipswich IP5 7RE, UK

Abstract. This paper describes a video coding algorithm submitted for consideration by ISO/MPEG in the phase of its work targeted at bit-rates up to about 10 Mbit/s. Distinguishing features of the submission were the attempt to meet a wide range of requirements, rather than concentrating solely on picture quality, and compatibility with the earlier MPEG draft standard. This was achieved with a hierarchical structure with splitting in the pel domain. The reasons for this approach are explained and the impact on implementation complexity is also covered.

Keywords. B-ISDN; compatibility; embedded bit stream; HDTV; hierarchical video coding; layered video coding; MPEG; scalability; video compression.

1. Summary

The subject of this paper is the COSMIC video coding algorithm submitted in response to a call for proposals from ISO/IEC JTC1/SC2/WG8. The background to the ISO activity is given briefly in Section 2 together with details of the collaborative grouping which developed COSMIC. In Section 3 the topic of compatibility is introduced and several general means of achieving it are discussed in Section 4. From these, layered coding is selected for deeper treatment of several approaches in Section 5 followed in Section 6 by the algorithmic details of the technique selected for COSMIC. Other advantages and features are covered in Section 7 and the implementation consequences are dealt with in Section 8.

2. Background

2.1. MPEG

The Moving Picture coding Experts Group (MPEG) was established by ISO/IEC JTC1 in 1988 to standardise coding of moving images for storage applications. A work plan was devised with three phases, distinguished mainly by their coded bit-rates. The first phase to be tackled was aimed primarily at bit-rates in the region of 1.5 Mbit/s where storage media such as CD-ROM were beginning to appear. After compiling an agreed list of requirements, MPEG issued a call for proposals in mid 1989 to be evaluated later that year. Almost all of the 15 responses were based on a hybrid of motion compensated interframe prediction and the discrete cosine transform (DCT). Both these components had earlier been selected by the CCITT SGXV Specialist Group on coding for visual telephony who were approaching the completion of their studies towards Recommendation H.261 [4]. The DCT was also a part of the still picture coding algorithm being developed in the Joint Photographic Experts Group (JPEG) [6].

During 1990 MPEG converged by means of software Simulation Models to agreed text for the video part of Committee Draft 11172 [7]. Accompanying parts of that document address coding of associated audio and the means to multiplex audio, video and other data into one bit stream and permit the decoded versions to be presented in synchronism with each other.

0923-5965/93/$06.00 © 1993 Elsevier Science Publishers B.V. All rights reserved


Also during 1990, as the first phase algorithm approached its final form, some attention was devoted to the second phase. There was widespread feeling that rates in the region of 5 to 10 Mbit/s could provide quality on a par with or even exceeding the best analogue over the air or over cable broadcasting systems. Though there were no readily available storage media providing these throughputs and the same order of replay time as CD-ROM for the first phase, there were good reasons to suppose that these would be developed if the demand warranted. Thus MPEG began to prepare for a second standard. To distinguish the two, the terms MPEG-1 and MPEG-2 were coined although these have no formal status in the ISO.

Meanwhile, the CCITT, having completed Recommendation H.261, had established an Experts Group for video coding for Asynchronous Transfer Mode (ATM) networks. ATM will be the foundation of the Broadband Integrated Services Digital Network (B-ISDN) [5]. As there was clearly some overlap in the technological possibilities and possibly in applications, liaison statements between the ISO and CCITT groups were exchanged about collaboration. This was agreed and the CCITT delegates began to participate in joint sessions though they continued to hold their own for those items outside the interest of MPEG.

In the spring of 1991, MPEG invited pre-registration of proposals and by the late summer, the details of the evaluation tests and required documentation were finalised. Picture quality would be assessed by formal subjective tests on five short sequences, three encoded at 4 Mbit/s and three at 9 Mbit/s [1]. Hardware implementation complexity would also be taken into account by a ranking procedure performed by VLSI experts considering supporting documentation containing general architectural plans and detailing such factors as the amount and cycle time of random access memory, the arithmetic operations performed, etc. It would have been unreasonable, in terms of time and effort, to require that proposers provide working hardware to validate their paper claims. Consequently, the complexity assessment was largely a subjective exercise, depending heavily on the thoroughness of proposers and the skill of the VLSI experts.

Several other essential or desirable features were to be addressed by the proposals, but few of them were to be tested because of difficulties in deriving and agreeing suitable test methods. As a consequence, there was a very great temptation for proposers to concentrate on achieving the best results under the well defined but limited range of conditions of the picture quality tests.

2.2. VADIS and COST 211ter

VADIS is an acronym for 'Video Audio Digital Interactive System', a collaborative European project established under Eureka auspices to run alongside MPEG in a co-operative manner but also to complement it by proceeding to solutions, including VLSI, for full applications, this being outside the scope of MPEG. Thirty organisations joined the project and the vast majority of them had experience in picture coding and were eager to submit proposals to MPEG.

COST 211 is another European project with many years of experience in picture coding, having developed the algorithm which was adopted as CCITT Recommendation H.120. The follow-on phase, COST 211bis, contributed strongly to H.261. COST 211ter is currently working on video coding for a wide range of applications. Many of the participating organisations are also in VADIS, but to benefit from the expertise of those who are not, the two projects enjoy close co-operation on those topics where the contractual obligations of VADIS are not compromised.

From the VADIS and COST 211ter partners, several groupings formed to work together on specific approaches and present proposals to MPEG. The development and submission of the COSMIC proposal were led by BT Labs with support from France Telecom CNET, Italtel, the National Technical University of Athens and Siemens.


3. Compatibility

This item featured so much in the design of the proposal that it was incorporated in the name given to it. The initial desire was compatibility with the earlier schemes emanating from ISO and CCITT, namely MPEG-1 and H.261, but the concept is also applicable to future standards for higher definition picture formats.

3.1. Forms of compatibility

MPEG has recognised that compatibility has several forms and uses the following terms and definitions.

3.1.1. Upward and downward compatibility

These forms of compatibility refer to a system where different picture formats can be used in the encoder and decoder. Different picture formats do not imply different standards. The system is

- upward compatible if a higher resolution decoder is able to decode pictures from the signal produced by a lower resolution encoder.

- downward compatible if a lower resolution decoder is able to decode pictures from the signal, or part of the signal, produced by a higher resolution encoder.

3.1.2. Forward and backward compatibility

These terms refer to a system where different standards can be used by the encoder and decoder, i.e. an existing standard (MPEG-1 or H.261 for example) and a new standard. The picture formats of these standards can, but need not, differ. The system is

- forward compatible if the new standard decoder is able to decode pictures from the signal, or part of the signal, of an existing standard encoder.

- backward compatible if an existing standard decoder is able to decode pictures from the signal, or part of the signal, of the new standard encoder.

3.2. Benefits of compatibility

The benefits of compatibility accrue to users of the standard at both coding and decoding ends.

A higher quality service can be initiated even though the installed base of new decoders for it is still small.

Users who have invested in coding and decoding equipment for the earlier standard are not alienated. They are not forced to choose between investing again in the new standard or becoming islands of obsolescence. It is not just the cost of encoders and decoders which must be considered. Costs arising from supporting infrastructures, such as dual inventories of coded material, can also be very significant.

Other benefits and merits, including a degree of scalability, which result from the particular method by which COSMIC achieves compatibility are covered in Section 7.

4. Methods to achieve compatibility

4.1. Simulcast

The simulcast method achieves compatibility at the service level rather than by design of the algorithm. The required quality levels are obtained from parallel encoders operating independently and producing separate and simultaneous bit streams. The total bit-rate required is the sum of the bit-rates for each; in principle this is wasteful as some information is carried twice.

Because there is no interaction between the encoders, the simulcast method provides the highest degree of freedom in the design of the new standard.

Simulcast provides downward and backward compatibilities, but upward and forward compatibilities are not guaranteed. A decoder designed on the assumption that only decoding of the new standard need be provided will not be able to handle material which is provided only in the existing standard.

Vol. 5, Nos. 1-2, February 1993


4.2. Syntactic extension

In this case the encoder produces only one data stream and its syntax is an extension of the existing standard. This permits upward and forward compatibilities as the new standard decoder is able to handle the existing standard as a subset of the new one. Downward and backward compatibilities are not provided as the existing standard decoder is not equipped to deal with the additional features of the syntax of the new standard.

4.3. Switchable encoder

This scheme is suitable mainly for services in which the type of decoder can be identified by the encoder. The encoder then switches to the existing standard or the new standard to match the decoder. This method is suitable for one-to-one applications such as conversational services, but is not a solution for distribution applications where there may be no reverse channel for signalling or there may be multiple decoders conforming to the existing and new standards.

4.4. Embedded bit stream

The encoder produces a bit stream with two constituents. One conforms to the existing standard. The other contains incremental information which the new standard decoder uses in addition to the data in the existing standard part. Downward and backward compatibilities are automatically achieved if the existing standard decoders are able to ignore the second constituent; the Video and System parts of the MPEG-1 standard already provide facilities to do this. Upward and forward compatibilities are also achieved as null incremental data is a valid condition for the new standard.

In principle there is no waste of bit-rate since the two constituents contain complementary information. In practice, however, the constraint of the existing standard may limit the coding efficiency of the incremental component in comparison with the clean slate opportunity of simulcast.

The embedded bit stream approach is a special case of layered coding methods. We selected them for further investigation because the other methods examined above give only limited degrees of compatibility.
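The compatibility forms that this section attributes to each method can be collected in a small lookup table. The sketch below is only an illustrative summary: the dictionary layout and the function name `methods_offering` are assumptions made here, and the switchable encoder is left out because its behaviour depends on the encoder identifying the decoder rather than on a fixed set of forms.

```python
# Compatibility forms attributed to each method in Section 4.
# Simulcast: upward/forward are "not guaranteed", so they are omitted.
# Embedded bit stream: downward/backward assume that existing decoders
# can ignore the incremental constituent.
METHODS = {
    "simulcast": {"downward", "backward"},
    "syntactic extension": {"upward", "forward"},
    "embedded bit stream": {"upward", "downward", "forward", "backward"},
}


def methods_offering(*required):
    """Return, sorted by name, the methods offering all required forms."""
    wanted = set(required)
    return sorted(name for name, forms in METHODS.items() if wanted <= forms)


print(methods_offering("backward"))
print(methods_offering("forward", "backward"))
```

Asking for both forward and backward compatibility leaves only the embedded bit stream, which matches the reason given above for pursuing layered methods.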

5. Layered coding methods

5.1. Sub-band coding

Sub-band coders incorporate band splitting filters as an early part of their operation and initially they appear promising for the compatibility and scalability features sought. Sub-band coders can be designed with many partitions of the 3-dimensional volume of spatial and temporal frequencies, yielding fine granularity of scalability. However, the classical sub-band coders are not well suited to applications where the bit-rate of individual or groups of bands must be fixed. The subsampling in higher bands and the lack of connections between the coders of the separated bands mean that any coding distortion arising in a lower band cannot be corrected. The allocation of fixed bit-rates for the bands is therefore problematical. If a sufficient number of bits is assigned to a lower band for the most demanding occasions, this will be wasteful at other times. However, if a lower number sufficient for typical pictures is given, the result on more difficult images will be visible distortion in that band even though higher ones may be coded well. For the best subjective performance, it would have been better to distribute the bits differently to obtain less distortion in the lower band and more distortion in a higher one.

5.2. DCT split

The DCT is effectively a filter bank and separation can be achieved by selecting coefficients. For example it is possible to perform a 16 × 16 DCT, extract the square of 64 coefficients representing the lower frequencies and apply these to an 8 × 8 inverse DCT to recover a low pass filtered version. However, because of edge effects resulting from the DCT being applied to blocks in a jumping manner, unlike a finite impulse response filter running smoothly over the input data, the reconstructed 64 pel block is not identical to the result which would have been obtained if the original 256 pel block had been low pass filtered, subsampled, transformed with an 8 × 8 DCT and inverse transformed. The consequence is that it is not possible to optimise simultaneously the quality in the low resolution and high resolution decoded pictures.
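The coefficient selection described above can be tried numerically. The following is a minimal sketch, not the method of any cited proposal: it assumes an orthonormal DCT-II written out in plain Python, and the factor 0.5 applied to the retained coefficients is an assumption that follows from that normalisation (it keeps the mean level unchanged when moving from a 16 × 16 to an 8 × 8 block). The function names are hypothetical.

```python
import math


def dct_matrix(n):
    """Orthonormal DCT-II basis matrix (row = frequency, column = sample)."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * (2 * x + 1) * k / (2 * n))
                     for x in range(n)])
    return rows


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(m):
    return [list(row) for row in zip(*m)]


def dct2(block):
    """Forward 2-D DCT: C * X * C^T."""
    c = dct_matrix(len(block))
    return matmul(matmul(c, block), transpose(c))


def idct2(coeffs):
    """Inverse 2-D DCT: C^T * Y * C."""
    c = dct_matrix(len(coeffs))
    return matmul(matmul(transpose(c), coeffs), c)


def dct_split_lowpass(block16):
    """16x16 DCT, keep the 8x8 low-frequency corner, inverse 8x8 DCT."""
    coeffs = dct2(block16)
    # 0.5 compensates for the change of block size (assumed normalisation).
    low = [[0.5 * coeffs[u][v] for v in range(8)] for u in range(8)]
    return idct2(low)


flat = [[100.0] * 16 for _ in range(16)]
small = dct_split_lowpass(flat)
print(len(small), len(small[0]), round(small[0][0], 6))
```

A flat block is reproduced exactly at the lower resolution; for blocks with detail, the result differs from a filtered and subsampled version because of the block edge effects noted above.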

The DCT split would also suffer from the same difficulty as sub-band coders over fixed bit-rate allocations. This can be overcome by subtracting the quantised values of the lower coefficients so that the coding errors in them are passed to the higher layer in addition to the other coefficients. Higher layers therefore have the opportunity to balance distortion in the low and high resolution components of the decoded composite.

A DCT split algorithm was investigated and proposed to MPEG by Ter Horst et al. [3].

5.3. Pel split

We considered the pel split approach to offer more flexibility. The DCT split is effectively a specific down-sampling filter whose characteristics are 'hard-wired' into the DCT equations. This also means that both the down-sampling ratios and the block size ratios are restricted. The proposal by Ter Horst et al. employed a 16 horizontal pel × 8 vertical line block for their composite configuration and an 8 × 8 block for the compatible one. That algorithm also employed temporal splitting, with alternate fields decomposed in the above manner and the others coded without spatial splitting. Only such an integer ratio is possible.

By contrast, the pel split does not have these restrictions. The spatial and temporal ratios can be non-integer. It is quite feasible to have a lower layer operating with the CCITT Rec. H.261 Common Intermediate Format (CIF) of 288 lines repeating at 29.97 Hz and an upper layer with 575 lines at 50 Hz or 485 lines at 59.94 Hz. Further, the spatial and temporal down-sampling and up-sampling filter characteristics are available for optimisation though the latter must be standardised as it appears in both coder and decoder.

The pel split is therefore a hierarchical arrangement. Figure 1 shows the classical configuration in which the lower layers are produced by filtering and down-sampling. The coded versions are up-sampled and subtracted from the originals to yield inputs for higher layers. The COSMIC algorithm is similar except that the subtractions are not automatic. Instead, as shown in Fig. 2, the up-sampled coded lower layer is fed to the upper layer where it can be used as a predictor but need not be. The choice is adaptive.

Fig. 1. Hierarchical compatible coding. [Block diagram from 601 input to 601 output; legend: filtering and down-sampling, up-sampling and interpolation.]

Fig. 2. COSMIC compatible coding. [Block diagram from 601 input to 601 output: MPEG-1 layer producing MPEG-1 bits, second layer with predictor fed from the MPEG-1 layer, MUX to DEMUX; legend: filtering and down-sampling, up-sampling and interpolation.]
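The classical configuration of Fig. 1 can be illustrated on a one-dimensional signal. The sketch below rests on several simplifying assumptions that are not taken from the proposal: the down-sampling filter is a two-tap average, up-sampling is by sample repetition, each "coder" is replaced by uniform quantisation, and the input has an even number of samples. All function names are hypothetical.

```python
def down_sample(x):
    """Filter and down-sample by 2: average each pair of samples."""
    return [(x[i] + x[i + 1]) / 2.0 for i in range(0, len(x) - 1, 2)]


def up_sample(x):
    """Up-sample by 2 using sample repetition (a crude interpolator)."""
    return [v for v in x for _ in range(2)]


def quantise(x, step):
    """Stand-in for a real coder: uniform quantisation with the given step."""
    return [step * round(v / step) for v in x]


def encode_two_layers(signal, base_step, upper_step):
    """Return (coded lower layer, coded difference for the upper layer)."""
    base = quantise(down_sample(signal), base_step)
    prediction = up_sample(base)
    residual = quantise([s - p for s, p in zip(signal, prediction)], upper_step)
    return base, residual


def decode_two_layers(base, residual):
    """Full-resolution decode: up-sampled lower layer plus upper layer."""
    return [p + r for p, r in zip(up_sample(base), residual)]


signal = [10, 12, 20, 24, 30, 30, 5, 7]
base, residual = encode_two_layers(signal, base_step=4, upper_step=1)
print(up_sample(base))                    # what a lower-layer-only decoder shows
print(decode_two_layers(base, residual))  # what a two-layer decoder shows
```

Because the upper layer codes the difference against the already coded lower layer, coarse quantisation in the lower layer is corrected at full resolution, which is the property the sub-band arrangement of Section 5.1 lacks. In COSMIC the subtraction is replaced by an optional prediction chosen adaptively.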

Although the specific proposal made to ISO/MPEG consisted of two layers, it is evident that the technique is applicable with more. An important point is that the arrangement chosen for COSMIC has the interaction between the layers taking place outside the loops. This allows complete separation of the internal features of the loops and extra layers can be added at the top with no impact on those below. Thus, the coding technique for these new layers can be decided at the time they are required. It should also be noted that the aspect ratios of the layers do not need to be the same. A 4:3 lower layer can be fed into a 16:9 one and only the relevant portion used for predictions. In this manner a 3 layer architecture could provide HDTV, standard TV and reduced quality TV from one bit stream. Though the bottom layer in this case is constrained to be the DCT scheme of ISO/MPEG CD 11172 for compatibility reasons, the other two could use wavelet transforms, vector quantisation, fractal coding or another method yet to be discovered. As stated earlier, there is no requirement that the scanning formats of the layers have simple integer relationships and so HDTV need not be 'twice' standard TV.

It is possible to devise configurations such as shown in Fig. 3 in which the prediction errors are embedded. These are based on good correlation between the prediction errors of the layers. However, prediction errors are very sensitive to motion vector resolution. We surmised, but because of the short timescale for proposals did not confirm by experiment, that for optimum results both the vector resolutions and the sizes of the blocks they apply to in the upper layers should be scaled up by the sampling density ratios. The performance of the new layers would be limited by the parameters already enshrined in MPEG-1. Because of this and the closing of doors to other coding methods for higher layers added in the future, we rejected the embedding of prediction errors.

Fig. 3. Layered coding with embedded prediction. [Block diagram from 601 input to 601 output: MPEG-1 layer producing MPEG-1 bits, second layer bits, MUX to DEMUX, field stores, down-sampling.]

6. Algorithm details

This section gives the details of the simulations of the algorithm as submitted for testing. Though many parameters of an encoding process can be left open in the eventual standard, it was necessary to select specific values to produce simulation results.

6.1. MPEG-1 layer

The MPEG-1 bit stream is derived in the normal way by first down-sampling the CCIR 601 resolution input pictures to Source Input Format (SIF). For 25 Hz SIF the input pictures are down-sampled horizontally, the interlace is removed by vertical line shifting and the 50 Hz fields are averaged to reduce the picture rate to 25 Hz. The coding of the SIF pictures uses Simulation Model 3 from the MPEG-1 phase. The main parameters for this are one intra (I) picture in ten, one bi-directional (B) picture between predicted (P) pictures and a bit-rate of 1.15 Mbit/s.

After local decoding the resulting SIF pictures are up-sampled to CCIR 601 by picture repeating to obtain 50 Hz, vertical line shifting to reintroduce the interlace and horizontal interpolation. These pictures can then be used as a prediction mode for the second layer.

As mentioned earlier, it is also possible to substitute H.261 for this layer. This is of particular interest for applications requiring compatibility with the installed user base for that standard.

6.2. Second layer

The second coding layer is shown in Fig. 4. The CCIR 601 input is first converted to a 4:2:0 sampling structure in which, in an analogous manner to H.261 and MPEG-1, the colour difference components have half the spatial resolution of luminance both horizontally and vertically. The coding is on a field basis using three field types: intra, predicted and extrapolated. The intra and predicted types are the same as in MPEG-1. The extrapolated field type is a non-recursive prediction from the previous intra or predicted field and is similar to the B pictures of MPEG-1. The extrapolated fields are never used to predict other fields and therefore are not entered into the field stores in the prediction loops of the coder and decoder.

Fig. 4. COSMIC coder. [Block diagram: 601 input, 4:2:0 conversion, forward coder, reverse coder, motion compensation with two field stores, up-sampled and interpolated locally decoded SIF, second layer output; write inhibit on extrapolated fields.]

Each field has a slice and macroblock structure as in MPEG-1 with 18 slices per field and 44 macroblocks per slice. A macroblock consists of four luminance blocks and two chrominance blocks. A block is 8 pels x 8 lines.
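The field dimensions implied by these figures can be checked with a few lines of arithmetic. The sketch below makes two assumptions not stated explicitly in the text: that the four luminance blocks of a macroblock form a 2 × 2 square, as in MPEG-1, and that each slice is a single row of macroblocks across the field.

```python
SLICES_PER_FIELD = 18
MACROBLOCKS_PER_SLICE = 44
LUMA_BLOCKS_PER_MB = 4
CHROMA_BLOCKS_PER_MB = 2
BLOCK_SIZE = 8  # pels per line and lines per block

macroblocks_per_field = SLICES_PER_FIELD * MACROBLOCKS_PER_SLICE
blocks_per_field = macroblocks_per_field * (LUMA_BLOCKS_PER_MB + CHROMA_BLOCKS_PER_MB)

# Assumption: 2 x 2 luminance blocks per macroblock, i.e. 16 pels x 16 lines.
mb_side = 2 * BLOCK_SIZE
# Assumption: one slice is one row of macroblocks.
luma_pels_per_line = MACROBLOCKS_PER_SLICE * mb_side
luma_lines_per_field = SLICES_PER_FIELD * mb_side

print(macroblocks_per_field, blocks_per_field)
print(luma_pels_per_line, luma_lines_per_field)
```

Under these assumptions the coded luminance area is 704 pels by 288 lines per field, with 792 macroblocks and 4752 blocks per field.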

There are three possible prediction modes for each macroblock. These are prediction from the decoded MPEG-1 picture, prediction from the previously coded field of the same parity and prediction from the previously coded field of the opposite parity. The choice is made on the basis of minimum prediction error power.
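The selection rule can be sketched as follows. This is only an illustration of choosing the candidate with minimum prediction error power: the macroblock is flattened to a short list of samples, error power is taken as mean squared error, and the mode names are hypothetical labels rather than syntax elements of the proposal.

```python
def error_power(block, prediction):
    """Mean squared prediction error over the samples of a macroblock."""
    return sum((b - p) ** 2 for b, p in zip(block, prediction)) / len(block)


def choose_prediction(block, candidates):
    """Return (mode, power) for the candidate with least error power.

    candidates maps a mode name to its predicted samples.
    """
    scored = ((mode, error_power(block, pred)) for mode, pred in candidates.items())
    return min(scored, key=lambda item: item[1])


block = [10, 12, 14, 16]
candidates = {
    "mpeg1_upsampled": [9, 12, 15, 16],
    "same_parity_field": [10, 12, 14, 20],
    "opposite_parity_field": [0, 0, 0, 0],
}
print(choose_prediction(block, candidates))
```

In this toy case the up-sampled MPEG-1 picture wins; on other macroblocks one of the field predictions may be chosen instead, which is the adaptive behaviour described in Section 5.3.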

The coding of macroblocks is the same as in MPEG-1 using the same DCT and quantisation with a weighting matrix. Coefficient scanning is changed from zig-zag to vertical. The VLC tables are as for MPEG-1 except for the macroblock types and coded block pattern.

6.3. Results

The coding scheme was evaluated in three modes. These are compatible, simulcast which does not make use of the MPEG-1 bit-rate and non-compatible where the MPEG-1 coder is switched off and the total bit-rate given entirely to the new layer. The results for the sequence 'Mobile and calendar' at total bit-rates of 4 and 9 Mbit/s are shown in Table 1.

The results show that the compatible prediction mode provides gains in picture quality over simulcast and these gains become larger the greater the percentage of the total bits used to code the MPEG-1 layer. The non-compatible mode provides the best performance and can be invoked when compatibility can be sacrificed in order to gain the best quality or lowest bit-rate.
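The trend can be checked against the luminance figures of Table 1. The sketch below simply takes the difference between the compatible and simulcast SNR values and sets it beside the share of the total bit-rate occupied by the 1150 kbit/s MPEG-1 layer; the data structure and function name are illustrative assumptions.

```python
MPEG1_KBITS = 1150

# Luminance SNR (dB) from Table 1, keyed by total bit-rate in kbit/s.
TABLE1_Y = {
    4000: {"compatible": 27.39, "simulcast": 26.36},
    9000: {"compatible": 31.96, "simulcast": 31.57},
}


def compatible_gain(total_kbits):
    """Return (MPEG-1 share of total in percent, Y gain over simulcast in dB)."""
    snr = TABLE1_Y[total_kbits]
    share = 100.0 * MPEG1_KBITS / total_kbits
    gain = snr["compatible"] - snr["simulcast"]
    return round(share, 2), round(gain, 2)


for total in sorted(TABLE1_Y):
    print(total, compatible_gain(total))
```

At 4 Mbit/s the MPEG-1 layer takes 28.75% of the bits and the luminance gain is 1.03 dB; at 9 Mbit/s the share falls to 12.78% and the gain to 0.39 dB.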

Table 1

Results for 'Mobile and calendar' at total bit-rates of 4 and 9 Mbit/s

Mode             MPEG-1     2nd layer   SNR (dB)
                 (kbit/s)   (kbit/s)    Y       U       V
Compatible       1150       2850        27.39   30.68   32.05
Simulcast        0          2850        26.36   29.80   31.12
Non-compatible   0          4000        28.04   31.08   32.41
Compatible       1150       7850        31.96   34.16   35.44
Simulcast        0          7850        31.57   33.81   35.08
Non-compatible   0          9000        32.33   34.43   35.69

Table 2

Results of compatible coding at total bit-rates of 4 and 9 Mbit/s

Sequence         Bit-rate       SNR (dB)
                 (kbit/s)       Y       U       V
Flower garden    1150 + 2850    28.62   31.28   32.51
                 1150 + 7850    33.40   34.91   35.41
Table tennis     1150 + 2850    31.66   36.17   37.28
                 1150 + 7850    35.29   38.79   40.28
Popple           1150 + 7850    34.77   37.81   38.32

The results of coding the remaining test sequences using the COSMIC algorithm are shown in Table 2. The signal to noise ratios (SNR) in this paper are averages over the durations of the 625 line versions of the test sequences. They are evaluated on the 4:2:0 signals.

7. Other advantages and features

7.1. Fast search modes

In a conventional non-hierarchical coding scheme, shuttle modes such as fast forward and fast reverse are achieved by skipping pictures in a sequence. For example, in an MPEG sequence with one intra (I) picture every ten and one bi-directional (B) picture between predicted (P) pictures, skipping the B pictures and displaying only the I and P pictures achieves a two times speed up. If the B and P pictures are skipped, this should give a ten times speed up factor, but the effective bit-rate will be approximately four times the normal bit-rate because I pictures require significantly more coded bits than the average over all pictures. The bit delivery mechanism or the decoder or both may not be able to handle this. This can be mitigated by displaying each I picture twice, giving a five times speed up factor. However, speed up factors achieved in this manner produce very jerky motion.
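The speed up factors quoted for picture skipping follow directly from the picture pattern. The sketch below builds the pattern described (one I picture in ten, one B picture between P pictures) and counts how many display periods are needed per group; the function names are hypothetical, and the bit-rate increase mentioned in the text is not modelled because it depends on picture sizes that are not given.

```python
def picture_pattern(length=10, b_between=1):
    """Picture types for one group: an I picture, then B and P alternating."""
    types = []
    for i in range(length):
        if i == 0:
            types.append("I")
        elif i % (b_between + 1) == 0:
            types.append("P")
        else:
            types.append("B")
    return "".join(types)


def speed_up(pattern, keep, repeats=1):
    """Speed up factor when only picture types in `keep` are displayed,
    each shown `repeats` times."""
    shown = sum(1 for t in pattern if t in keep) * repeats
    return len(pattern) / shown


pattern = picture_pattern()
print(pattern)
print(speed_up(pattern, keep="IP"))            # skip B pictures
print(speed_up(pattern, keep="I"))             # skip B and P pictures
print(speed_up(pattern, keep="I", repeats=2))  # show each I picture twice
```

The three cases give factors of two, ten and five respectively, matching the figures in the text.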

With COSMIC, in which the codings of the layers are very similar, it is possible to use the full power of the decoder to decode only the MPEG-1 pictures but at a much faster rate. An MPEG-1 bit stream encoded at 1.5 Mbit/s can be decoded at about 6 times normal speed by a decoder intended for use up to 10 Mbit/s. Double speed fast forward can be achieved without the necessity to skip any pictures. The 25 Hz MPEG-1 pictures are decoded at twice normal speed and fed to a standard 50 Hz display. Choosing to display only two out of three pictures gives a speed up factor of three and so on up to speed up factors of six. For greater speed up factors the B pictures must be ignored. COSMIC is thus able to provide much smoother motion for the fast forward mode than other single layer proposals.
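The figures in this paragraph can be reproduced with simple arithmetic. The sketch below assumes that decoding speed scales with the ratio of the decoder's bit-rate capability to the stream bit-rate, which is a simplification introduced here, and that the display runs at 50 Hz while the MPEG-1 layer carries 25 pictures per second. The function names are hypothetical.

```python
from fractions import Fraction


def whole_speed_limit(decoder_limit_mbit, stream_mbit):
    """Largest whole-number speed up, assuming speed scales with bit-rate."""
    return int(decoder_limit_mbit // stream_mbit)


def displayed_fraction(speed, source_hz=25, display_hz=50):
    """Fraction of decoded pictures that a display of display_hz can show."""
    decoded_per_second = source_hz * speed
    return min(Fraction(1), Fraction(display_hz, decoded_per_second))


print(whole_speed_limit(10, 1.5))
for speed in (2, 3, 6):
    print(speed, displayed_fraction(speed))
```

At double speed every decoded picture is displayed, at three times speed two out of three are displayed, and at six times speed one in three, all without discarding any picture before decoding.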

Fast reverse is similar, the I and P pictures being decoded for a Group of Pictures and displayed in reverse order. No extra storage is required because the picture stores for the second layer decoding process can be used.

When fast forward and fast reverse shuttling stops the display can revert to full resolution.

7.2. Error resilience

All high compression codecs operate by removing redundancy. This makes decoders susceptible to errors in the coded bit stream. H.261 includes a Forward Error Corrector to combat the errors likely to be encountered on the networks on which it is intended to be used. The MPEG philosophy is that of a generic algorithm which can be used in many applications with various storage and transmission media. The logical conclusion to this is that error correction be located outside the MPEG algorithm and tailored to the specific error characteristics of the digital medium.

While this approach has previously been quite successful, there are at least two transport media where it may be insufficient.

7.2.1. B-ISDN

The Broadband Integrated Services Digital Network, which is under development by the CCITT, employs a switching arrangement based on cells containing about 46 bytes of user data. Loss of one cell could have a large effect on a decoder. Various strategies have been devised to cope with this, of which layered coding is one of the most promising. Our earlier work on a two layer coder with H.261 as the first layer and intra coding in the second of quantising errors showed very high resilience to cell loss of the second layer data [2]. The B-ISDN standards will support two levels of cell priority. Thus there is potential for coding algorithms which separate their coded data into two streams with the most important being conveyed at the higher priority. Though the COSMIC algorithm employs predictive coding in the second layer it still displays good resilience to cell loss there.
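A rough illustration of mapping two layers onto two cell priorities is given below. It is a sketch under stated assumptions: the "about 46 bytes" of user data is treated as exactly 46, cell headers and adaptation overheads are ignored, and giving the MPEG-1 layer the higher priority is an inference from the remark that the most important data should travel at the higher priority. The names `packetise` and `cells_per_second` are hypothetical.

```python
CELL_PAYLOAD_BYTES = 46  # "about 46 bytes" in the text; taken as exact here


def cells_per_second(bit_rate_kbit):
    """Cell rate needed to carry a layer, ignoring all overheads."""
    return bit_rate_kbit * 1000 / (8 * CELL_PAYLOAD_BYTES)


def packetise(data, high_priority):
    """Split one layer's coded bytes into (priority flag, payload) cells."""
    return [(high_priority, data[i:i + CELL_PAYLOAD_BYTES])
            for i in range(0, len(data), CELL_PAYLOAD_BYTES)]


# Assumed mapping: lower (MPEG-1) layer at high priority, second layer at low.
base_cells = packetise(bytes(100), high_priority=True)
upper_cells = packetise(bytes(250), high_priority=False)
print(len(base_cells), len(upper_cells))
print(cells_per_second(1150))
```

With these assumptions the 1.15 Mbit/s MPEG-1 layer needs 3125 cells per second, so only a modest part of the cell stream has to be protected by the higher priority.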

7.2.2. RF broadcasting

An important application for MPEG-2 is likely to be TV broadcasting. This could supplement and later supplant analogue systems. Digital transmission systems have a very rapid transition between low and high error rates. This is unlike the slow increase in noise as analogue systems are operated towards the limits of their service areas. Such graceful degradation is favoured by broadcasters. A means of achieving a similar end with MPEG-2 would involve a layered scheme. The most important layer would be transmitted at higher power. Errors would first occur in the second layer where they are less visible.

7.3. Scalability

MPEG defines a scalable bit-stream as one having the property that part or parts of it may be discarded, yet a decoder can still produce useful results. Examples are the ability to retrieve images which have less spatial resolution or less temporal resolution or more noise than those represented by the complete bit stream as produced by the encoder. One reason for wishing to do this is to permit use of decoders with less processing power, which would therefore be cheaper, in circumstances where the reduced quality would still be acceptable, yet the full quality would remain available to those with full complexity decoders.

Compatible schemes, including COSMIC, based on layering inherently provide a degree of scalability. The granularity available depends on the number of layers. Scalable algorithms, however, do not necessarily give compatibility.

Advantages of scalable methods are readily apparent in scenarios in which the decoder is fed via a network. Pictures can be received via a lower bit-rate channel without the need for any decoding and recoding, enabling a high quality picture to be directly branched into a lower rate channel. An example of this is a multipoint videoconference where some participants can only have access at a lower bit-rate. A multipoint control unit can arrange that only the base layer is sent to them, while the others enjoy the higher quality of the complete bit stream. In the reverse direction, the single layer from the coders with restricted access is understood by all decoders.
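The multipoint example amounts to forwarding, for each participant, as many layers as fit the access rate, lowest layer first, without any decoding or recoding. A minimal sketch is given below; the layer names are hypothetical and the bit-rates are borrowed from the 4 Mbit/s simulation purely for illustration.

```python
# (layer name, bit-rate in kbit/s), lowest layer first. Rates are borrowed
# from the 4 Mbit/s simulation for illustration only.
LAYERS = [("base", 1150), ("second", 2850)]


def layers_for(access_kbit):
    """Layers a multipoint control unit could forward over a given access rate."""
    chosen, used = [], 0
    for name, rate in LAYERS:
        if used + rate > access_kbit:
            break  # higher layers depend on lower ones, so stop here
        chosen.append(name)
        used += rate
    return chosen


print(layers_for(1500))
print(layers_for(4000))
```

A participant with restricted access receives the base layer alone, while one with 4 Mbit/s access receives the complete bit stream.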

Even in one-to-one and one-to-many broadcast applications the user of the decoder has a choice in paying for delivery bandwidth and decoder complexity appropriate to his requirements. This choice can be exercised in three ways:

- on a once off basis when initially obtaining equipment and signing up for services.

- on an occasion by occasion basis. The user has the highest level equipment and access which meets all his requirements, but drops back to a lower level, incurring lower bandwidth related charges, when this is adequate for the application.

- dynamically. This is similar to the preceding case but choice is exercised within an application. An example is browsing cheaply at the lowest layer and then paying more for higher quality on the selected items of interest.

8. Implementation complexity

The goal of the ISO/MPEG activity is to define a standard which will find widespread use and support. To do this it must be cost effective compared to alternative (proprietary) solutions. MPEG therefore also examined proposals on the basis of their implementation complexity. For some applications the number of decoders is many times larger than the number of encoders and the cost of the latter is then of secondary importance. Nevertheless, MPEG also considered the encoder complexity.

8.1. Decoder complexity

An initial consideration of the decoder for the proposed algorithm might suggest that the layered approach has a high complexity relative to non-layered proposals but this is not substantiated. The complexity of a decoder is affected by the following factors.

8.1.1. Coded data buffering

These buffers are usually characterised by the time they represent. Though the multi-layer algorithm has a larger number of buffers, the total memory requirement in bits is the same and can be configured in a common memory device. There will be a small overhead for maintaining the additional read and write pointers.

8.1.2. Parser speed

The required speed of parsing the input bit stream and decoding the variable length codes is a function of the total coded bit-rate and is independent of the number of layers.

8.1.3. Parser table size

Tables are required for the parser's state machine and for decoding of variable length code words. For the COSMIC proposal the additional overheads are minimised by the use of identical or similar syntax and many common VLC tables in the two layers.

8.1.4. Inverse quantisation

The quantisation methods are very similar for both layers, so the impact will mainly be on the throughput rate required. The multi-layer algorithm may have slightly more code words packed into the same bit-rate. However, inverse quantisation is seldom a decoder bottleneck.

8.1.5. Inverse DCT

This algorithm uses the same 8 × 8 inverse DCT for all blocks in all layers. The number of blocks to process is 25% higher than a single layer 4:2:0 algorithm. However, to put this figure in perspective, the proposed algorithm has 6% fewer blocks than a single layer 4:2:2 algorithm.
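The two percentages quoted for the inverse DCT load can be checked with a short calculation. The sketch below assumes, as labelled in the code, that the first layer is a 4:2:0 picture covering one quarter of the pels of the second layer (SIF under CCIR 601) and that a 16 × 16 luminance area contributes 6 blocks in 4:2:0 and 8 blocks in 4:2:2; the function and variable names are illustrative only.

```python
from fractions import Fraction

# 8x8 blocks per 16x16 luminance area: 4 luminance plus the chrominance blocks.
BLOCKS_PER_MACROBLOCK = {"4:2:0": 4 + 2, "4:2:2": 4 + 4}


def two_layer_blocks(base_area_ratio=Fraction(1, 4)):
    """Blocks per full-resolution macroblock area for a 4:2:0 second layer
    plus a 4:2:0 base layer assumed to cover a quarter of the pels."""
    full = BLOCKS_PER_MACROBLOCK["4:2:0"]
    return full + full * base_area_ratio


layered = two_layer_blocks()                                   # 6 + 1.5 = 7.5
extra_vs_420 = layered / BLOCKS_PER_MACROBLOCK["4:2:0"] - 1    # 1/4
saving_vs_422 = 1 - layered / BLOCKS_PER_MACROBLOCK["4:2:2"]   # 1/16
print(f"{float(extra_vs_420):.2%} more than 4:2:0, "
      f"{float(saving_vs_422):.2%} fewer than 4:2:2")
```

Under these assumptions the result is 25% more blocks than single layer 4:2:0 and 6.25% fewer than single layer 4:2:2, which agrees with the figures in the text.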

8.1.6. Prediction storage

The MPEG Implementation Studies group found that picture buffers are the dominant cost in implementations. Although this algorithm has 25% more pels per picture than a single layer 4:2:0 version, the prediction storage is modest because the upper layer uses only forward prediction. Therefore, only one 4:2:0 CCIR 601 picture memory is required in the second layer loop in addition to the two SIF picture memories in the MPEG-1 decoding loop. This total is actually less than that of most of the proposed single layer schemes using bi-directional prediction techniques copied from MPEG-1.
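The prediction storage comparison can be made concrete with a rough pel count. The picture sizes below are assumptions introduced for illustration and are not taken from the proposal: a 720 × 576 active CCIR 601 picture and a base layer of exactly one quarter of that size (360 × 288), both in 4:2:0. The single layer comparison assumes two full-resolution reference pictures for bi-directional prediction.

```python
def pels_420(width, height):
    """Luminance plane plus two chrominance planes of a quarter the size each."""
    return width * height * 3 // 2


CCIR601 = pels_420(720, 576)  # assumed 625-line active picture
SIF = pels_420(360, 288)      # assumed exact quarter-size base layer

# Two SIF memories in the MPEG-1 loop plus one forward-only second layer memory.
cosmic_store = 2 * SIF + 1 * CCIR601
# Single layer scheme holding two CCIR 601 reference pictures.
bidirectional_store = 2 * CCIR601

print(cosmic_store, bidirectional_store, cosmic_store / bidirectional_store)
```

With these assumed sizes the two-layer decoder holds 933 120 pels of prediction storage against 1 244 160 for the bi-directional single layer case, that is three quarters of the memory.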


8.1.7. Architecture

The above analysis is based on an architecture employing time division multiplexing of common hardware elements between the two layers. This is eased by the very large amounts of identical or similar operation in the two layers. Though it might be considered that the switching between the two layers could introduce considerable complexity, it should be remembered that the decoders for single layer algorithms already operate in a multiplexed fashion. They have three decoding loops, one luminance and two chrominance, being time shared on the one piece of hardware. Extension to multi-layer is therefore perhaps more of a conceptual problem than a real one.

8.1.8. Up-sampling filter

The up-sampling filter is the only significant extra item compared to a decoder for a single layer algorithm. It comprises a few pel and line delays to give the taps for the horizontal and vertical spatial filters plus arithmetic processing. In some designs it will be possible to utilise the storage already provided for conversion from block scan to raster scan as part of the vertical up-sampling filter.

Any decoder system which is intended to decode MPEG-1 in addition to MPEG-2 will require an up-sampling filter. Thus, in many cases there will be only a minimal hardware penalty arising from the possible need to switch its characteristics between a proprietary one for MPEG-1 display and the standardised one for producing the MPEG-1 prediction for the second layer.
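The structure of such a filter, a horizontal pass needing pel delays followed by a vertical pass needing line delays, can be sketched as below. The tap values are an assumption made for illustration (simple linear interpolation with the edge pel repeated) and are not the filter specified in the COSMIC proposal; `upsample_1d` and `upsample_2d` are hypothetical names.

```python
def upsample_1d(samples):
    """Double the length: keep each pel and insert the rounded mean of it
    and its right-hand neighbour (the last pel is repeated at the edge)."""
    out = []
    for i, s in enumerate(samples):
        right = samples[i + 1] if i + 1 < len(samples) else s
        out.extend([s, (s + right + 1) // 2])
    return out


def upsample_2d(picture):
    """Separable 2:1 up-sampling of a picture given as a list of rows."""
    rows = [upsample_1d(r) for r in picture]            # horizontal: pel delays
    cols = [upsample_1d(list(c)) for c in zip(*rows)]   # vertical: line delays
    return [list(r) for r in zip(*cols)]


print(upsample_1d([0, 4, 8]))
print(upsample_2d([[0, 4], [8, 12]]))
```

Because the filter is separable, the vertical pass needs only one stored line per tap, which is why storage already present for block scan to raster scan conversion may be reusable.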

8.2. Encoder complexity

Much of the above analysis of the decoder also applies to the encoder but there are two areas deserving fuller comment.

8.2.1. Prediction

Because the second layer uses forward prediction only, the effort required to determine motion vectors is considerably less than for schemes incorporating bi-directional prediction at the CCIR 601 level. The COSMIC simulations used independent searches in the two layers. It is likely that the MPEG-1 layer vectors could be used as initial candidates in the second layer, thereby further reducing the encoder complexity.

8.2.2. Encoder delay

The COSMIC encoder incorporates an MPEG-1 coder and decoder which introduce delay. The component caused by the rate smoothing buffering can be avoided by feeding the decoder part from the input side of the encoder buffer. However, the delay introduced by the reordering of pictures at the MPEG-1 encoder input and decoder output cannot be avoided. A padding delay must be added to the uncoded CCIR 601 input to the second layer. This is 5 fields of 4:2:0 CCIR 601. Fortunately this is a single bulk delay of a shift register type and requires neither random access nor access to the individual fields. Consequently, the penalty is primarily in memory size and not in memory bandwidth.

Delay is also necessary to compensate the delay of the MPEG-1 part of the decoder so that the outputs of the two layers can be combined. This delay can be performed on the encoded data for the second layer and therefore requires much less memory than mentioned above. This delay can be placed at the encoder or at the decoder. Because decoders are usually more cost sensitive we chose to put this matching delay at the encoder.

The delay of H.261 is much less than that of MPEG-1 because the simpler prediction arrangement does not require field reordering and because the rate smoothing buffer does not need to accommodate the intraframe peaks of MPEG. Thus, when COSMIC is configured to provide H.261 compatibility by using that algorithm in the first layer, the above factors are less troublesome.
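The 5 field padding delay on the uncoded second layer input, described under encoder delay, behaves like a first-in first-out store. The sketch below models it that way and estimates its size. The field dimensions (720 × 288 luminance with 4:2:0 chrominance) and 8 bits per pel are assumptions for illustration, and `BulkDelay` is a hypothetical name.

```python
from collections import deque


class BulkDelay:
    """Fixed delay line: each push returns the item pushed `depth` pushes earlier.
    Only the two ends are touched, so no random access is needed."""

    def __init__(self, depth):
        self.fifo = deque([None] * depth)

    def push(self, field):
        self.fifo.append(field)
        return self.fifo.popleft()


FIELD_PELS = 720 * 288 * 3 // 2   # assumed 4:2:0 field of a 720 x 576 picture
DELAY_FIELDS = 5                  # padding delay quoted in the text
delay_bytes = DELAY_FIELDS * FIELD_PELS  # assuming 8 bits per pel

delay = BulkDelay(DELAY_FIELDS)
outputs = [delay.push(n) for n in range(7)]  # field numbers stand in for field data
print(delay_bytes, outputs)
```

Under these assumptions the delay occupies about 1.56 Mbyte, and field 0 emerges as field 5 is written, which illustrates why the cost lies in memory size and not in memory bandwidth.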

9. Conclusion

The COSMIC proposal to ISO/MPEG has been outlined together with the reasons which governed the decisions made in its development. Though the algorithm was not amongst the highest scoring ones in the picture quality assessments, the authors believe that it offers a good all-round solution in terms of picture quality, features and implementation complexity. Undoubtedly there are several areas in which these could be improved and we hope that the eventual MPEG-2 standard will benefit to some extent from the ideas and effort which went into the COSMIC proposal.

Acknowledgments

Thanks are due to our colleagues at BT Labs, France Telecom CNET, Italtel, the National Technical University of Athens and Siemens for their contributions to the development of COSMIC. The work was partly funded by the Eureka programme. The authors have sole responsibility for any opinions expressed in this paper.

References

[1] T. Hidaka and K. Ozawa, "ISO/IEC JTC1/SC29/WG11; Report on MPEG-2 Subjective Assessment at Kurihama", Signal Processing: Image Communication, Vol. 5, Nos. 1-2, February 1993, pp. 127-157.

[2] D.G. Morrison and D.O. Beaumont, "Two-layer video coding for ATM networks", Signal Processing: Image Communication, Vol. 3, Nos. 2-3, June 1991, pp. 179-195.

[3] R. ter Horst, A. Koster, K. Rijkse, E. Fert, G. Nocture and L. Tranchard, "MUPCOS: A multi-purpose coding scheme", Signal Processing: Image Communication, Vol. 5, Nos. 1-2, February 1993, pp. 75-89.

[4] CCITT Recommendation H.261: Video codec for audiovisual services at p × 64 kbit/s.

[5] CCITT Recommendation I.121: Broadband aspects of ISDN.

[6] ISO/IEC DIS 10918-1: Information technology - Digital compression and coding of continuous-tone still images.

[7] ISO/IEC DIS 11172: Coding of moving pictures and associated audio for digital storage media at up to about 1.5 Mbit/s.
