

JULY 2003 IEEE SIGNAL PROCESSING MAGAZINE 71

An Introduction to Super Audio CD and DVD-Audio

For many, the audio CD is considered the most successful new format in consumer electronics. Since its introduction in 1982, the CD has become the distribution medium of choice for both music and data. Using a sampling frequency of 44.1 kHz, analog audio is sampled and converted to pulse code modulated (PCM) samples. Each sample has 16 b of resolution and provides a signal-to-noise ratio in excess of 96 dB. Audio CDs use no compression; however, a cross-interleaved Reed-Solomon code (CIRC) allows audio CDs to satisfy the need for error detection and correction during playback.
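The 96 dB figure follows from the word length: each added bit doubles the number of quantization levels and so adds about 6.02 dB. The sketch below is only an illustration of that arithmetic. The function name `pcm_dynamic_range_db` is an assumption of this example, and the simple full-scale-to-one-step ratio ignores the roughly 1.76 dB sine-wave correction used in some SNR definitions.

```python
import math

def pcm_dynamic_range_db(bits: int) -> float:
    """Ratio of full scale to one quantization step, in decibels."""
    return 20.0 * math.log10(2 ** bits)

# Word lengths mentioned in the article: CD (16 b) and DVD-Audio (up to 24 b).
for bits in (16, 20, 24):
    print(f"{bits}-b PCM: {pcm_dynamic_range_db(bits):.1f} dB")
```

With 16 b this gives a little over 96 dB, and with 24 b a little over 144 dB, in line with the dynamic-range entries quoted for CD and DVD-Audio in Table 1.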

In 1997, the introduction of DVD-Video provided not only superior video quality but also allowed consumers to experience and enjoy multichannel surround sound in the home. DVD movies do provide multichannel audio, but it is stored compressed using the lossy Dolby Digital or MPEG-audio formats. The sampling rate for compressed audio in the DVD-Video format is 48 kHz.

The DVD-Audio specification allows for up to 24-b PCM data and uses the Meridian lossless packing (MLP) algorithm to provide up to six channels of high-quality, multichannel audio at sampling rates of up to 96 kHz for six channels or 192 kHz for two channels.

Super Audio CD (SACD), introduced in March 1999, integrates a variety of new technologies, such as the hybrid disc, direct stream digital (DSD), and direct stream transfer coding. Unlike DVD-Audio, SACD does not use PCM encoding. The idea behind DSD is to remove the front-end and back-end decimation and interpolation filters and directly record 1-b data, sampled at approximately 2,822.4 kHz. The resulting DSD signals yield a dynamic range greater than 120 dB. Table 1 summarizes the key characteristics of audio CDs, SACD, and DVD-Audio.

The purpose of the next two articles is to present and highlight the latest developments in consumer audio, specifically in DVD-Audio and SACD.

The first article, by B.H. Suzuki, N. Fuchigami, and J.R. Stuart, from JVC and the Meridian Group, presents an overview of the DVD-Audio specification. The authors place special emphasis on the design and implementation of the MLP coder, especially as it relates to the coding of multichannel audio.

The second article, by E. Janssen and D. Reefman, from Philips Research Laboratories, introduces SACD. Here, the authors place special emphasis on the new challenges in digital signal processing using DSD streams.

I hope that readers will enjoy reading both articles and that this Forum will stimulate a healthy and interesting exchange of ideas in future columns.

Table 1. Main parameters in DVD-Audio, SACD, and audio CD recordings.

Parameter            DVD-Audio                                SACD          CD
Audio coding         16-, 20-, or 24-b LPCM                   1-b DSD       16-b LPCM
Sampling rate        44.1, 48, 88.2, 96, 176.4, or 192 kHz    2,822.4 kHz   44.1 kHz
Channels             1-6                                      2-6           2
Compression          Yes (MLP)                                Yes (DST)     None
Content protection   Yes                                      Yes           No
Playback time        62-843 min*                              70-80 min     74 min
Frequency response   DC-96 kHz                                DC-100 kHz    DC-20 kHz
Dynamic range        Up to 144 dB                             Over 120 dB   96 dB

*For 62 min, we assume f_s = 96 kHz, 20-b samples, and five channels. For 843 min, we assume f_s = 44.1 kHz, 16-b samples, and one channel.
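The playback-time range in Table 1 can be sanity-checked from the raw LPCM bit rate. The sketch below is a rough estimate only: it assumes a single-layer capacity of 4.7 × 10^9 bytes devoted entirely to uncompressed audio, with no allowance for pack headers, file-system data, or other content, so it yields an upper bound rather than the tabulated values. The constant and function names are illustrative.

```python
DISC_BYTES = 4.7e9  # assumed single-layer DVD capacity (decimal gigabytes)

def raw_playback_minutes(fs_hz: float, bits: int, channels: int,
                         disc_bytes: float = DISC_BYTES) -> float:
    """Upper bound on playing time for uncompressed LPCM with zero overhead."""
    bits_per_second = fs_hz * bits * channels
    return disc_bytes * 8 / bits_per_second / 60

# The two cases named in the table footnote.
print(round(raw_playback_minutes(96_000, 20, 5), 1))
print(round(raw_playback_minutes(44_100, 16, 1), 1))
```

Under these assumptions the two cases come out at roughly 65 min and 888 min, a few percent above the 62 min and 843 min in the table, which would be consistent with some capacity being spent on headers and navigation data.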

1053-5888/03/$17.00©2003IEEE

Konstantinos Konstantinides


DVD-Audio Specifications

Since 1983 audio CD (CD-DA) has been the primary carrier for high-quality stereo music. In the intervening years optical media and signal processing technologies have greatly advanced and enabled the development of higher-density discs such as DVD (digital versatile disc). DVD offers more than six times the capacity of CD.

The DVD Forum [1] has developed a family of specifications to accommodate both prerecorded (DVD-ROM) and recordable (DVD-R, RW, and RAM) applications. DVD-Video [2] and DVD-Audio [3] are both applications of DVD-ROM.

DVD-Video provides a carrier for high-quality video assets, and its specification reflects a balance between both data capacity and data rate for the sound and picture components of a movie. By contrast, DVD-Audio uses the capacity of DVD-ROM to provide a music carrier with supreme audio quality and better sound-field reproduction. The DVD-Audio Specification was drawn up in cooperation with the music industry, which demanded very high-quality audio in stereo and multichannel, multimedia functions, and security.

The lossless coding scheme adopted in DVD-Audio is a key technology that allows high-quality multichannel music to be available on the huge, but still limited, capacity DVD disc. This article mainly focuses on MLP Lossless [4] since it will be of most interest for readers of this magazine. (MLP stands for Meridian Lossless Packing and is a trademark of Dolby Laboratories Inc.)

Features of DVD-Audio

Working Group 4 (WG4) of the DVD Forum developed the DVD-Audio Specification, and during this effort they received 15 key requirements from the international music industry, represented by the International Steering Committee (ISC). The main features of DVD-Audio, which closely reflect these requirements, are summarized below.

High-Quality Audio

The first requirement from artists, the music industry, and listeners is to provide sound quality that is superior to CD and as close to the original master as possible. DVD-Audio meets this requirement flexibly by offering several options based on linear pulse code modulation (LPCM). By including options that increase both the sampling frequency and quantization word length, DVD-Audio offers four times the frequency range and 256 times the resolution of CD. DVD-Audio supports:
• up to 192 kHz/24-b for two-channel stereo
• up to 96 kHz/24-b for multichannel (up to six channels).

In some combinations the audio can be stored on the disc as raw LPCM. However, to record efficiently the huge amount of data resulting from high-resolution multichannel audio, lossless compression must be used. As the name implies, MLP compresses data without any loss and decompresses it into data that is exactly identical to the original LPCM master. MLP enables any of the audio formats permitted on DVD-Audio to play for longer than 74 min (on a single-layer 4.7 GB disc).

DVD-Audio also supports the sampling frequencies 44.1, 88.2, and 176.4 kHz for making good use of assets that may be issued on CD.

DVD-Audio offers great flexibility to producers by supporting not only the high-bit/high-sampling parameters but also more conventional coding such as 44.1 kHz/16-b/two channel. Titles can be made in a wide variety of formats.

Attractive Multichannel Surround

Although DVD-Audio provides for super high-quality, two-channel audio, perhaps its most attractive feature is the support for high-quality multichannel sound. The superb quality of sound field that can be achieved by devoting the whole bit rate of DVD to audio data is clearly superior to that obtained using the lossy-compressed surround sound options of DVD-Video.

DVD-Audio also offers methods to enjoy multichannel music in two-channel listening environments by providing backward compatibility for the listener who has a two-channel speaker system or who may be using headphones.


Bike H. Suzuki, Norihiko Fuchigami, and J. Robert Stuart


A Variety of Value-Added Contents

Another important requirement for this new generation medium is the provision of a variety of value-added content to supplement the high-quality audio experience. DVD-Audio offers attractive value-added content: still pictures that can be enjoyed in conjunction with the high-quality audio, DVD-Video compatible content suitable for video clips such as interviews, and a visual menu for selecting songs or additional information. These features were developed by transplanting or expanding on the multimedia capability of DVD-Video.

Compatibility with DVD-Video

As will be explained later, DVD-Video compatible content is directly compatible with the DVD-Video Specification, which means that such content can be played on existing DVD-Video players.

Additionally, since functions or operations for visual content such as still pictures and menus are almost the same as those of DVD-Video, users of DVD-Video easily become familiar with DVD-Audio. Thus, DVD-Audio turns the compatibility with DVD-Video to good account for the propagation of the format.

Supporting Copyright Management System

Illegal copying damages the rights of content owners and so new generation carriers must provide a credible protection system against illegal copying. DVD-Video compatible content makes use of the content scramble system (CSS) encryption method [5] that is used to protect DVD-Video movies. For the high-quality audio, DVD-Audio adopts a newer copy protection system: content protection for prerecorded media (CPPM) [6]. Additionally, audio data on a DVD-Audio disc may contain an “audio watermark” that enables downstream control of illegal copying.

A User’s View of DVD-Audio Content

As with CD-DA, DVD-Audio defines the “track” as the unit corresponding to one tune. So as to efficiently manage the large number of tracks that are possible on this large capacity disc, a higher layer, “group,” is introduced. A disc can contain 1 to 9 groups, each of which in turn can contain up to 99 tracks. Users can access a target tune by selecting a group and a track.

Furthermore, a track may be segmented into multiple “index” portions. As with CD-DA, index 0, and 1 to 99 can be specified.

Besides audio data, a DVD-Audio disc can include still pictures, DVD-Video compatible content, and so on. DVD-Video compatible contents are also represented as tracks at the user level. A “visual menu” may also be included if required. Figure 1 shows the user’s view of DVD-Audio.

For distinction, we call tracks for high-quality audio contents audio tracks and those for DVD-Video compatible contents video tracks.

Internal Data Structure of DVD-Audio

The internal data structure of DVD-Audio is somewhat different from that implied by the user’s view. Because it can store a variety of content, DVD-Audio has a fairly complex data structure, especially when compared to the rather simpler structures of CD-DA. Figure 2 shows how data is allocated on a disc from inside to outside. In the figure, the dotted-line boxes indicate optional content.

Since DVD-Audio is an application format of DVD-ROM (DVD Specifications for Read-Only disc), the DVD-Audio disc complies with Part 1: Physical Specifications of DVD-ROM [7] and the data is stored as files in compliance with Part 2: File System Specifications [8].

Data for DVD-Audio is recorded as several files in the AUDIO_TS directory, in compliance with Part 4: Audio Specifications, the body of DVD-Audio format.

Audio manager (AMG) manages the total presentation. The audio data is recorded in audio objects (AOB) in an audio title set (ATS). An AOB is a program stream (PS) complying with MPEG-2 System (ISO/IEC 13818-1) [9] and contains either a packetized LPCM stream or a packetized MLP stream. Each pack of the MPEG PS on a DVD disc is stored in a logical sector of 2,048 B, where roughly 2,000 B are used for the main data (audio data in the case of AOB) and the rest is for headers.

Figure 1. The user’s view of DVD-Audio.

Still pictures to be presented with audio data are recorded in an audio still video set (ASVS) separately from the audio data. Each still picture, called audio still video (ASV), is also an MPEG PS containing one MPEG intra picture.

DVD-Video compatible content, if any, is recorded in the “VIDEO_TS” directory in compliance with Part 3: Video Specifications. The video-with-audio data for the video track is recorded as a video object (VOB) in a video title set (VTS). This video data is also managed (i.e., referred to) by the AMG for playback on DVD-Audio players. VMG is the video manager information used by DVD-Video players.

Audio Specifications

This section describes the general audio specifications for audio data in the ATS.

Audio Coding Schemes and Audio Parameters

DVD-Audio supports two audio coding schemes: linear PCM (LPCM) and MLP (MLP is also called “packed PCM” in the format book).

Audio data in an audio track is coded in one of these two schemes and all DVD-Audio players support their playback. MLP, which is described later, is a lossless compression system for LPCM audio data, and it is used for two key purposes:
• to extend playing time: lossless compression extends playing time, by typically reducing the data on the disc by a factor of 2
• to enable the highest quality multichannel data: the maximum bit rate allowed for audio data in DVD-Audio is 9.6 Mb/s. Without compression, the highest quality audio that could be accommodated is 96 kHz/20-b/five-channel LPCM, which runs at 9.6 Mb/s. MLP also reduces the maximum data rate on the disc and meets the highest quality requirement of 96 kHz/24-b/six channel, which would otherwise require a carrier capable of 13.824 Mb/s in native LPCM format.
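The two bit rates quoted above are simply sampling frequency × word length × channel count. A minimal sketch of that check follows; the function and constant names are assumptions of this example, and the 192 kHz stereo case is added only for comparison.

```python
MAX_AUDIO_RATE_BPS = 9_600_000  # DVD-Audio ceiling for audio data quoted above

def lpcm_bit_rate(fs_hz: int, bits: int, channels: int) -> int:
    """Raw LPCM bit rate in bits per second."""
    return fs_hz * bits * channels

for fs, bits, ch in [(96_000, 20, 5), (96_000, 24, 6), (192_000, 24, 2)]:
    rate = lpcm_bit_rate(fs, bits, ch)
    verdict = "fits as LPCM" if rate <= MAX_AUDIO_RATE_BPS else "needs MLP"
    print(f"{fs / 1000:g} kHz/{bits}-b/{ch} ch: {rate / 1e6:.3f} Mb/s ({verdict})")
```

The 96 kHz/20-b/five-channel case lands exactly on the 9.6 Mb/s limit, while 96 kHz/24-b/six-channel needs 13.824 Mb/s and therefore cannot be carried without lossless compression.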

Table 1 shows the combinations of sampling frequency, quantization word length, and number of channels (one to six channels) supported on DVD-Audio.

Figure 2. Data structure of a DVD-Audio disc.

Table 1. Supported parameter configurations for audio tracks.

DVD-Audio supports up to quadruple sampling (192 kHz, 176.4 kHz) for two-channel stereo and up to double sampling (96 kHz, 88.2 kHz) for multichannel.

As 44.1 kHz sampling and its multiples are also supported, master materials for CD can be transferred to DVD-Audio discs without a sampling conversion process.

Channel Assignment

DVD-Audio supports up to six full bandwidth audio channels. The channel assignment follows the layout for movie (video), i.e., front three channels, rear-surround two channels (or one channel) plus LFE (low frequency effect). This generally means that users are not required to change the speaker layout for playback of DVD-Video and DVD-Audio.

This layout is basically assumed to comply with the ITU-R BS.775-1 recommendation [10]; however, it is important to note that no exact layout or relation of playback speakers (distance, azimuth, etc.) is defined in the DVD-Audio Specification.

Multichannel and Stereo Compatible Presentation Methods

Multichannel audio is a primary feature of DVD-Audio; however, a number of users may have two-channel stereo equipment at home. DVD-Audio provides two methods for multichannel and stereo compatible presentation.

Down-Mix Method

If a track contains only multichannel audio data then two-channel stereo is derived from the multichannel data by a downmix process. The downmix coefficients are specified during the process of authoring.

The actual processing is slightly different depending on the coding scheme used.
• In the case of linear PCM, the player performs the downmix to two channel according to downmix coefficients stored on the disc. Different tracks basically can have different coefficients.
• In the case of MLP, the downmix is performed during the authoring process (MLP encoding) according to specified coefficients, and the downmixed substream (called “L0, R0”) is stored on the disc together with another substream containing the rest of channels (e.g., C, LFE, Ls, Rs). For two-channel stereo reproduction, players directly decode and play back the L0, R0 substream.
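As a rough illustration of the downmix described above, the sketch below folds one six-channel sample frame into an L0/R0 pair using a coefficient table. The coefficient values, the `DOWNMIX` table, and the function name are hypothetical; the article does not give actual values, which are chosen at authoring time.

```python
from typing import Dict, Tuple

# Hypothetical coefficients for illustration only; on a real disc they are
# chosen at authoring time and are not specified in this article.
DOWNMIX: Dict[str, Tuple[float, float]] = {
    # channel: (gain into L0, gain into R0)
    "L":   (1.0,   0.0),
    "R":   (0.0,   1.0),
    "C":   (0.707, 0.707),
    "LFE": (0.0,   0.0),
    "Ls":  (0.707, 0.0),
    "Rs":  (0.0,   0.707),
}

def downmix_frame(frame: Dict[str, float]) -> Tuple[float, float]:
    """Fold one multichannel sample frame down to a stereo (L0, R0) pair."""
    l0 = sum(DOWNMIX[ch][0] * x for ch, x in frame.items())
    r0 = sum(DOWNMIX[ch][1] * x for ch, x in frame.items())
    return l0, r0

frame = {"L": 0.2, "R": -0.1, "C": 0.5, "LFE": 0.3, "Ls": 0.0, "Rs": 0.1}
print(downmix_frame(frame))
```

In the LPCM case a computation of this kind would run in the player; in the MLP case it would run once in the encoder, with the result stored as its own substream.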

Audio Selection

In the case of audio selection, a track logically contains separate multichannel and two-channel stereo data: for example, “audio #1: multichannel” and “audio #2: two-channel stereo.”

The coding schemes for audio #1 and #2 can be any combination of LPCM or MLP. Obviously, in terms of data efficiency, using lossless compression (MLP) for both selections would be the best.

When playing back a track with audio selection, players automatically select the appropriate version (i.e., multichannel for multichannel players or two channel for two-channel stereo players), and it is also possible for users to switch between audio #1 and #2 with the “audio” key on a remote control unit (RCU).

Using this method, content providers can include dedicated two-channel mixes which may have used sound effects processing such as reverberation, equalization, and delay in addition to downmix in their production. However, providing two-channel compatibility by audio selection does require more data space, since the disc must hold both the multichannel and two-channel mixes in separate locations.

In the case of DVD-Video, multiple audio streams (if any) are multiplexed with the video stream. In that case, the sum of bit rates for the video stream and all audio streams must not exceed 9.8 Mb/s (the maximum bit rate of DVD-Video). By contrast, in the case of DVD-Audio, audio selections are put on the disc as separate audio data (AOB) without multiplexing. Therefore, each AOB can adopt a bit rate up to 9.6 Mb/s. From the viewpoint of audio data recording, this scheme is equivalent to storing audio #1 and #2 as two separate tracks.

MLP Lossless

MLP for DVD-Audio is a tailored subset of the MLP algorithm that complies with the MPEG program stream system and DVD specific features. This section describes the MLP system [11].

Lossless Compression

Unlike perceptual or lossy data reduction, lossless coding does not alter the final decoded signal in any way but merely “packs” the audio data more efficiently into a smaller data rate.

Audio information that is of interest to the human listener contains some redundancy. On music signals, the information content varies with time and the input channel information capacity is rarely fully exercised.

The aim of lossless compression is to reduce incoming audio to a data rate that closely reflects the inherent information content plus a minimum overhead.

An important insight then is that the coded output of a lossless compressor will have a variable data rate on normal audio content.

While normal music may be compressed by a factor of 2, we can reasonably expect wider variations in compressed rate. There are also pathological signals: for example, silence or near-silence will compress greatly and signals that are nearly random will not. Indeed, should a section of channel data appear to be truly random, then no compression is possible. Fortunately it turns out that real acoustic signals tend not to provide full-scale white noise in all channels for any significant duration!

Previous lossless audio data compression systems have been optimized for reducing average data rate (i.e., minimizing compressed file size).

Additionally, it is important to reduce the instantaneous peak data rate for optimum results at high sampling rates such as 96 kHz or 192 kHz for data-limited, disc-based applications like DVD-Audio.

MLP tackles this by attempting to maximize the compression at all times using this set of techniques:
• looking for “dead air,” channels that do not exercise all the available word length
• looking for channels that do not use the available bandwidth
• removing interchannel correlations
• efficiently coding the residual information
• smoothing coded information by buffering.

How Does It Work?

MLP coding is based on established concepts; however, there are some important novel techniques used in this system, including:
• lossless processing
• lossless matrixing
• lossless use of infinite impulse response (IIR) filters
• managed first-in, first-out (FIFO) buffering across transmission
• decoder lossless self-check
• operation on heterogeneous channel sampling frequency.

These methods are described next, in the context of the encoder.

MLP Encoder

The MLP encoder core is illustrated in Figure 3. The steps for encoding blocks of data are the following:
1) Incoming channels may be remapped to optimize the use of substreams (described later).
2) Each channel is shifted to recover unused capacity (e.g., less than 24-b precision or less than full scale).
3) A lossless matrix technique optimizes the channel use by reducing interchannel correlations.
4) The signal in each channel is decorrelated using a separate predictor for each channel.
5) The decorrelated audio is further optimized using entropy coding.
6) Each substream is buffered using a FIFO memory system to smooth the encoded data rate.
7) Multiple data substreams are interleaved.
8) The stream is packetized for fixed or variable data rate and for the target carrier.
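Step 2 can be pictured with a small example. The sketch below is not the MLP algorithm itself, only one plausible way to detect unused low-order bits in a block (for instance 16-b material carried in a 24-b word) and shift them out; the function names are assumptions of this example. Headroom at the top of the word (less than full scale) is not handled here.

```python
from typing import List, Tuple

def strip_unused_lsbs(block: List[int]) -> Tuple[List[int], int]:
    """Shift out the low-order zero bits shared by every sample in a block."""
    combined = 0
    for sample in block:
        combined |= sample  # OR keeps any bit that is set in any sample
    if combined == 0:
        return list(block), 0  # digital silence: no bits to measure
    # Isolate the lowest set bit, then count the zeros below it.
    shift = (combined & -combined).bit_length() - 1
    return [s >> shift for s in block], shift

def restore_lsbs(shifted: List[int], shift: int) -> List[int]:
    """Decoder side: put the zero bits back, recovering the input exactly."""
    return [s << shift for s in shifted]

# 16-b samples stored in a 24-b container: the low 8 bits are always zero.
block = [0x001200, -0x003400, 0x7FFF00]
shifted, shift = strip_unused_lsbs(block)
print(shift, shifted, restore_lsbs(shifted, shift) == block)
```

Only the shift count needs to be sent alongside the shortened samples, so the later stages work on narrower words at no cost in accuracy.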

Lossless Matrix

A multichannel audio mix will usually share some common information between channels.

On occasion the correlations will be weak, but there are other cases where the correlations can be high. Examples include multitrack recordings where a mix-down to the delivered channels may pan signals between channels and thus place common information in some channels.

There are also specific examples where high interchannel correlations occur, including:
• mono presented as dual-mono with identical left and right (common in “talking book” or archive recordings)
• derived surround signals based on left minus right
• multichannel speaker feeds resulting from a hierarchical upscale
• multichannel speaker feeds resulting from an ambisonic decode from B-format WXYZ.

The MLP encoder uses a matrix that allows the encoder to reduce correlations, thereby concentrating larger amplitude signals in fewer channels. A trivial (though important) example would be the tendency of the matrix process to rotate a stereo mix from left/right to sum/difference. In general the encoded data rate is minimized by reducing the commonality between channels.

Figure 3. Block diagram of the lossless encoder core. (Blocks shown: n PCM audio channels, remap, per-channel shift, lossless matrix with LSB bypass, per-channel decorrelator and entropy coder, substream interleave and add headers for encoding parameters.)

However, conventional matrixing is not lossless: a conventional inverse matrix reconstructs the original signals but with rounding errors.

The MLP encoder decomposes the general matrix into a cascade of affine transformations. Each affine transformation modifies just one channel by adding a quantized linear combination of the other channels; see Figure 4. For example, if the encoder subtracts a particular linear combination, then the decoder must add it back. The quantizers Q in Figure 4 ensure constant input-output wordwidth and lossless operation on different computing platforms.
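The reason a single affine step is exactly invertible is that the other channels pass through unchanged, so the decoder can recompute the identical quantized combination and add it back. The sketch below illustrates one such step in integer arithmetic. The fixed-point format (`FRAC_BITS`), the function names, and the coefficient values are assumptions of this example, not details taken from the MLP specification.

```python
from typing import List, Sequence

FRAC_BITS = 14  # assumed fixed-point precision of the matrix coefficients

def quantized_mix(samples: Sequence[int], coeffs: Sequence[int], target: int) -> int:
    """Q(linear combination of the other channels), using integers only."""
    acc = sum(c * s for i, (c, s) in enumerate(zip(coeffs, samples)) if i != target)
    return acc >> FRAC_BITS  # floor division by 2**FRAC_BITS, same on any platform

def matrix_encode(samples: Sequence[int], coeffs: Sequence[int], target: int) -> List[int]:
    """Modify only the target channel by subtracting the quantized mix."""
    out = list(samples)
    out[target] -= quantized_mix(samples, coeffs, target)
    return out

def matrix_decode(samples: Sequence[int], coeffs: Sequence[int], target: int) -> List[int]:
    """The other channels are untouched, so the same mix can be added back."""
    out = list(samples)
    out[target] += quantized_mix(samples, coeffs, target)
    return out

# Stereo frame with nearly identical channels; coefficient 1.0 on L turns R into R - L.
frame = [1000, 990]
coeffs = [1 << FRAC_BITS, 0]
encoded = matrix_encode(frame, coeffs, target=1)
print(encoded, matrix_decode(encoded, coeffs, target=1) == frame)
```

A general matrix would be built by applying several such steps in sequence, each to a different target channel, with the decoder undoing them in reverse order.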

Prediction

If the values of future audio samples can be estimated, then it is only necessary to transmit the rules of prediction along with the difference between the estimated and actual signals. This is the function of the decorrelator (optimal coding shows no correlation between the currently transmitted difference signal and its previous values). It is useful to consider how prediction operates in the frequency (Shannon) domain.

Figure 5 shows the short-term spectrum of a music excerpt. If this spectrum were flat, a linear prediction filter could make no gains. However, it is far from flat, so a decorrelator can make significant gains by flattening it, ideally leaving a transmitted difference signal with a flat spectrum, essentially being white noise. The Gerzon/Craven theorems [20] show that the level of the optimally decorrelated signal is given by the average of the original signal spectrum when plotted as decibels versus linear frequency. As illustrated in Figure 5, this decibel average can have significantly less power than the original signal, hence the reduction in data rate. In fact this power reduction represents the information content of the signal as defined by Shannon [13].
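The relationship stated above can be tried on a toy spectrum. The sketch below compares the level of the original signal (mean power, expressed in dB) with the average of the spectrum in dB over linearly spaced frequency bins. The spectrum values and function names are made up for illustration, and the conversion of about 6.02 dB per bit is the usual rule of thumb rather than a figure from the article.

```python
import math
from typing import Sequence

def level_db(power_spectrum: Sequence[float]) -> float:
    """Level of the original signal: mean power across bins, in dB."""
    return 10 * math.log10(sum(power_spectrum) / len(power_spectrum))

def decorrelated_level_db(power_spectrum: Sequence[float]) -> float:
    """Average of the spectrum in dB over linearly spaced bins (all bins > 0)."""
    return sum(10 * math.log10(p) for p in power_spectrum) / len(power_spectrum)

# Toy spectrum (made-up values): strong low frequencies, weak high frequencies.
spectrum = [1.0, 0.1, 0.01, 0.001]
gain_db = level_db(spectrum) - decorrelated_level_db(spectrum)
print(f"prediction gain ~ {gain_db:.1f} dB, ~ {gain_db / 6.02:.1f} bits per sample")
```

For a perfectly flat spectrum the two averages coincide and the gain is zero, which matches the statement that a linear predictor can make no gains in that case.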

In practice, the degree to which any section of music data can be “whitened” depends on the content and the complexity allowed in the prediction filter.

Infinite complexity could theoretically achieve a prediction at the entropy level shown in Figure 5; however, all the coefficients which define this decorrelator would then need to be transmitted to the decoder (as well as the residual signal) to reconstruct (recorrelate) the signal. There is therefore a need to obtain a good balance between predictor complexity and performance.

FIR and IIR Prediction

Most previous lossless compression schemes use finite impulse response (FIR) prediction filters and can achieve creditable reduction of data rate on conventional CD-type material [14]-[16]. However, we pointed out in [17]-[19] that IIR filters have advantages in some situations, particularly:
• cases where control of peak data rate is important
• cases where the input spectrum exhibits an extremely wide dynamic range.

Figure 4. A single lossless matrix encode and decode.

Figure 5. Spectra of a signal and its average level.

The ARA proposal [12] pointed out the particularly increased likelihood of wide dynamic range in the spectrum of audio sampled at higher rates such as 96 or 192 kHz. Spectral energy at high frequencies is normally quite low and may be further attenuated by microphone response or air absorption.

The ARA also indicated the desirability that a music provider should have the freedom to control lossless data rate by adjusting supersonic filtering during mastering. A powerful lossless compression system will require the use of FIR and IIR prediction.

Track 6 of the CD “Hello, I Must be Going!” by Phil Collins shows an example that is quite difficult to compress. The original signal spectrum in Figure 6 includes a percussion instrument with an unusually extended treble response. An eighth-order FIR filter is able to flatten the major portion of the spectrum. However, it is completely unable to deal with the drop above 20 kHz caused by the anti-alias filter. A fourth-order denominator IIR filter is able to do this very effectively as shown.

In this case the improvement in compression is small, as there is only 2 kHz of underutilized spectrum between the 20 kHz cut-off and the Nyquist frequency of 22.05 kHz. IIR filtering gives a bigger improvement if filtering leaves a larger region of the spectrum unoccupied, for example if audio is sampled at 96 kHz but a filter or other roll-off is in place at say 30 or 35 kHz (see [19]) (as may happen with a microphone).

Lossless Prediction in MLP

The MLP encoder uses a separate predictor for each encoded channel. The encoder is free to select IIR or FIR filters up to eighth order from a wide palette. These extensive options ensure that good data reduction can be provided on as many types of audio as possible.
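To make the idea of lossless prediction concrete, the sketch below uses a fixed second-order FIR predictor with integer coefficients. It is a stand-in for illustration, not MLP's actual filter design or selection logic; the coefficient scaling (`SHIFT`) and all names are assumptions. Because encoder and decoder form the same integer prediction from the same past samples, the residual reconstructs the input exactly.

```python
from typing import List, Sequence

SHIFT = 4  # assumed fixed-point scaling: coefficients are in units of 1/16

def predict(history: Sequence[int], coeffs: Sequence[int]) -> int:
    """Integer prediction from the most recent samples (newest first)."""
    acc = sum(c * h for c, h in zip(coeffs, history))
    return acc >> SHIFT

def encode(samples: Sequence[int], coeffs: Sequence[int]) -> List[int]:
    """Replace each sample by its difference from the prediction."""
    history = [0] * len(coeffs)
    residual = []
    for x in samples:
        residual.append(x - predict(history, coeffs))
        history = [x] + history[:-1]
    return residual

def decode(residual: Sequence[int], coeffs: Sequence[int]) -> List[int]:
    """Rebuild the samples by adding the same prediction back."""
    history = [0] * len(coeffs)
    samples = []
    for r in residual:
        x = r + predict(history, coeffs)
        samples.append(x)
        history = [x] + history[:-1]
    return samples

coeffs = [32, -16]  # 2*x[n-1] - x[n-2], i.e. linear extrapolation
signal = [0, 10, 20, 30, 40, 50, 45, 40]
residual = encode(signal, coeffs)
assert decode(residual, coeffs) == signal
print(residual)
```

On this ramp-like signal most residual values are zero, which is the kind of small-amplitude output that the entropy coder can then represent in few bits.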

The effectiveness of the encoder tac-tics described so far can be seen in Fig-

ure 7, which graphs the data ratethrough a 30-s, 96 kHz/24-b/six-chan-nel orchestral excerpt.

The lowest curve in Figure 7 is thedata rate for the normal MLP en-coder; the flat-topped sections will beexplained later.

The middle curve shows the impact of switching off the lossless matrix and illustrates that in this case a significant improvement in coding ratio was obtained by removing interchannel correlations. The upper curve shows the further loss of effectiveness when the predictor choices are constrained to a simple FIR. The top line shows the 9.6 Mb/s data-rate limit for DVD-Audio.

Figure 6. Spectra for a signal excerpt and the residuals using eighth-order FIR and fourth-order IIR predictors.

Figure 7. Resulting data rate in MLP encodes showing the benefit of the encoder stages.

The input data rate is 13.824 Mb/s, so in this example the options of IIR and lossless matrixing improved the coding ratio from 1.64:1 to 2.08:1.
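The arithmetic behind these figures can be checked with a short sketch. It assumes that the quoted coding ratios apply to the excerpt as a whole, which the text does not state explicitly; the variable and function names are illustrative only.

```python
CHANNELS = 6
SAMPLE_RATE_HZ = 96_000
BITS_PER_SAMPLE = 24

# Uncompressed LPCM rate in bits per second.
input_rate = CHANNELS * SAMPLE_RATE_HZ * BITS_PER_SAMPLE


def compressed_rate(input_bps, ratio):
    """Data rate after compression at the given coding ratio."""
    return input_bps / ratio


fir_only = compressed_rate(input_rate, 1.64)    # simple FIR, no matrix
full_encoder = compressed_rate(input_rate, 2.08)  # IIR and lossless matrix

print(f"input:        {input_rate / 1e6:.3f} Mb/s")
print(f"FIR only:     {fir_only / 1e6:.2f} Mb/s")
print(f"full encoder: {full_encoder / 1e6:.2f} Mb/s")
```

Under that assumption the two ratios correspond to roughly 8.4 Mb/s and 6.6 Mb/s, both of which sit below the 9.6 Mb/s limit.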

Entropy Coding

Once the cross-channel and intersample correlations have been removed, it remains to encode the individual samples of the decorrelated signal as efficiently as possible. “Entropy coding” is the general name given to this process, with its aim being to match the coding of each value to the probability that it occurs. Infrequent values are coded to a large number of bits, but this is more than compensated by coding frequent values to a small number of bits.

Audio signals tend to be peaky, and so linear coding is inefficient. For example, in PCM one has to allocate enough bits to describe the highest peak, and the most significant bits (MSBs) will be used infrequently. Audio signals often have a Laplacian distribution (cf. [14]-[16]), that is, the histogram is a two-sided decaying exponential; this appears to be true even after decorrelation. The MLP encoder may choose from a number of entropy coding methods.
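The text does not list the entropy coding methods here. As one illustration, Rice (power-of-two Golomb) codes are a common choice for Laplacian-like residuals; the sketch below is an assumption for illustration rather than a description of the MLP bitstream, and the residual values and the parameter K are made up.

```python
def zigzag(v):
    """Map signed integers to non-negative ones: 0,-1,1,-2,2 -> 0,1,2,3,4."""
    return 2 * v if v >= 0 else -2 * v - 1


def unzigzag(u):
    """Inverse of zigzag."""
    return u // 2 if u % 2 == 0 else -(u + 1) // 2


def rice_encode(values, k):
    """Encode each value as a unary quotient plus k remainder bits."""
    parts = []
    for v in values:
        u = zigzag(v)
        quotient, remainder = u >> k, u & ((1 << k) - 1)
        parts.append("1" * quotient + "0")
        if k:
            parts.append(format(remainder, f"0{k}b"))
    return "".join(parts)


def rice_decode(bits, k, count):
    """Decode `count` values from a string of '0'/'1' characters."""
    values, pos = [], 0
    for _ in range(count):
        quotient = 0
        while bits[pos] == "1":
            quotient += 1
            pos += 1
        pos += 1  # skip the terminating '0'
        remainder = int(bits[pos:pos + k], 2) if k else 0
        pos += k
        values.append(unzigzag((quotient << k) | remainder))
    return values


residuals = [0, -2, 3, 1, -1, 0, 12, -4]  # mostly small, one peak
K = 2
bits = rice_encode(residuals, K)
decoded = rice_decode(bits, K, len(residuals))
print(len(bits), "bits, versus", 16 * len(residuals), "bits as 16-b PCM")
```

Small, frequent values cost three or four bits each, while the rare peak of 12 costs nine; this is the trade described above.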

Buffering

We have explained that while normal audio signals can be well predicted, there will be occasional fragments like sibilants, synthesized noise, or percussive events that have high entropy.

MLP uses a particular form of stream buffering that can reduce the variations in transmitted data rate, absorbing transients that are hard to compress.

FIFO memory buffers are used in the encoder and decoder (see Figures 9 and 10). These buffers are configured to give a constant notional delay across encode and decode. This overall delay is small, typically of the order of 75 ms. To allow rapid startup or cueing, the FIFO management minimizes the part of the delay due to the decoder buffer. So, this buffer is normally empty and fills only ahead of sections with high instantaneous data rate.

During these sections, the decoder’s buffer empties and is thus able to deliver data to the decoder core at a higher rate than the transmission channel is able to provide. In the context of a disc, this strategy has the effect of moving excess data away from the stress peaks, to a preceding quieter passage.

The encoder can use the buffering for a number of purposes, e.g.,
• keeping the data rate below a preset (format) limit
• minimizing the peak data rate over an encoded section.

Figure 8 shows how hard-to-compress signals can be squeezed below a preset format limit. This 30-s, 96 kHz/24-b recording features closely recorded cymbals in six channels. At the crescendo this signal is virtually random and the underlying compressed data rate is 12.03 Mb/s. Buffering allows the MLP encoder to hold the transmitted data rate below 9.2 Mb/s by filling the decoder buffer to a short-term maximum of 86 KB (bottom curve).
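The idea of filling the decoder buffer ahead of a peak can be sketched as a simple backward pass over per-frame bit counts. This is a simplified model with made-up numbers in arbitrary units, not MLP's actual FIFO management: it computes how much data must already be in the decoder buffer at the start of each frame so that a channel with a fixed capacity never starves the decoder.

```python
def decoder_buffer_needed(frame_bits, channel_bits_per_frame):
    """Minimum decoder-buffer fill required at the start of each frame.

    frame_bits[i] is the compressed size of frame i; the channel can
    deliver at most channel_bits_per_frame during any one frame.
    """
    need = [0] * (len(frame_bits) + 1)
    # Work backward: whatever the channel cannot deliver during frame i
    # (plus what later frames need) must have been buffered beforehand.
    for i in range(len(frame_bits) - 1, -1, -1):
        shortfall = need[i + 1] + frame_bits[i] - channel_bits_per_frame
        need[i] = max(0, shortfall)
    return need[:-1]


# Hypothetical frame sizes: a quiet passage, a two-frame peak, then quiet.
frames = [5, 5, 12, 12, 5]
CHANNEL_LIMIT = 9

fill = decoder_buffer_needed(frames, CHANNEL_LIMIT)
print(fill)       # buffer fill required before each frame
print(max(fill))  # peak buffer requirement
```

In this toy case the buffer is empty at the start, fills during the quiet frame that precedes the peak, and drains while the peak is decoded, which mirrors the behavior described for Figure 8.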

Use of Substreams

The MLP stream contains a hierarchical structure of substreams. Incoming channels can be matrixed into two (or more) substreams. This method allows simpler decoders to access a subset of the overall signal.

This substream principle is illustrated for the encoder in Figure 9 and the decoder in Figure 10; note that each substream is separately buffered.

MLP Decoder

The MLP decoder unwinds each encoder process in reverse order.

The decoder has relatively low complexity. A decoder capable of extracting a two-channel stream at 192 kHz requires approximately 27 MIPS (million instructions per second), while 40 MIPS will be required to decode six channels at 96 kHz.

Two-Channel Downmix

It is often useful to provide a means for accessing high-resolution multichannel audio streams on two-channel playback devices. In an application such as DVD-Audio, the content provider can place separate multi- and two-channel streams on the disc as described earlier. However, to do this requires separate mix, mastering, and authoring processes and uses disc capacity.

Figure 8. Buffering allows a difficult passage to remain below a hard format limit.

In cases where only one multichannel stream is available, there are very few options at replay; one is to use either a fixed or guided downmix. However, to create such a downmix it is first necessary to decode the full multichannel signal; this contravenes the desirable principle that decoder complexity should decrease with functionality.

Performing Mix-Down in the Lossless Encoder

MLP provides an elegant and unique solution. The encoder combines lossless matrixing with the use of two substreams in such a way as to optimally encode both the L0, R0 downmix and the multichannel version. This method can be explained by referring to Figures 9 and 10.

Downmix instructions are fed to the matrix 1 to determine some coefficients for the lossless matrices. The matrices then perform a rotation such that the two channels on substream 0 decode to the desired stereo mix and combine with substream 1 to provide full multichannel.
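One general way to make an integer matrix operation exactly invertible is to modify a single channel at a time by adding a quantized mix of the other, unmodified channels; the decoder subtracts the identical quantity. The sketch below illustrates that idea under assumed fixed-point coefficients and a hypothetical channel order (L, R, C, Ls, Rs, LFE). It is not a description of the MLP matrix format or of how the downmix coefficients are carried.

```python
SHIFT = 14  # assumed fixed-point format: 14 fractional bits


def _quantized_mix(others, coeffs):
    """Rounded integer mix of the unmodified channels."""
    acc = sum(c * x for c, x in zip(coeffs, others))
    return (acc + (1 << (SHIFT - 1))) >> SHIFT


def matrix_encode(channels, target, coeffs):
    """Add a quantized mix of the other channels into channels[target]."""
    out = list(channels)
    others = [x for i, x in enumerate(channels) if i != target]
    out[target] += _quantized_mix(others, coeffs)
    return out


def matrix_decode(channels, target, coeffs):
    """Undo matrix_encode exactly by subtracting the same quantity."""
    out = list(channels)
    others = [x for i, x in enumerate(channels) if i != target]
    out[target] -= _quantized_mix(others, coeffs)
    return out


# One sample instant, hypothetical order: L, R, C, Ls, Rs, LFE.
sample = [1000, -500, 300, -200, 50, 10]
# Turn channel 0 into L + 0.707*C + 0.707*Ls (weights for R, C, Ls, Rs, LFE).
g = round(0.7071 * (1 << SHIFT))
coeffs = [0, g, g, 0, 0]

mixed = matrix_encode(sample, 0, coeffs)
restored = matrix_decode(mixed, 0, coeffs)
print(mixed[0], restored == sample)
```

Because the other channels pass through untouched, the decoder can recompute the same rounded mix, so the rounding never causes a loss.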

Because the two-channel downmix is a linear combination of the multichannel mix, strictly no new information has been added. In the example shown in Figure 9 there are still only six independent channels in the encoded stream. So, theoretically, the addition of the two-channel version should require only a modest increase in overall data rate (typically 1 b per sample, e.g., 96 kb/s at 96 kHz).

The advantages of this method are considerable.
• The quality of the mix-down is guaranteed. The producer can listen to it at the encoding stage and the lossless method delivers it bit-accurate to the end user.
• A two-channel-only playback device does not need to decode the multichannel stream and then perform mix-down. Instead, the lossless decoder need only decode substream 0.
• A more complex decoder may access both the two-channel and multichannel versions losslessly.
• The downmix coefficients do not have to be constant for a whole track, but can be varied under artistic control.

How Much Compression?

In specifying a lossy system, the critical compression measure is the final bit rate for a given perceptual quality, and this is independent of the input word width. With lossless compression, increases in incoming precision (i.e., additional LSBs on the input) must be losslessly reproduced. However, these LSBs typically contain little redundancy and so contribute directly to the transmitted data rate. Therefore we tend to quote the saving of data rate, as this measure is relatively independent of incoming precision. The saving of data rate (in bits per original sample) is indicated in Table 2.

At 44.1 or 48 kHz, the peak data rate can almost always be reduced by at least 4 b/sample, i.e., 16-b audio can be losslessly compressed to fit into a 12-b channel.

At 96 kHz, the peak data rate can similarly be reduced by 8 b/sample, i.e., 24-b audio can be compressed to 16 b and 16-b/96 kHz audio can be losslessly compressed to fit into an 8-b channel.

The important parameter for transmission applications is the reduction of the peak rate, and in the case of DVD-Audio peak rate is a key parameter because the encoded stream must always operate below the audio buffer data-rate limit of 9.6 Mb/s.

Figure 9. Illustrating two substreams in encoding. [Block diagram; labels: six 24-b channels, Matrix, Encoder Core 0, Encoder Core 1, Sub 0, Sub 1, FIFO Buffer (two), Packetizer, Additional Nonaudio Data.]

Figure 10. Decoding two substreams. [Block diagram; labels: Depacketizer, Substream 0, Substream 1, Additional Data, FIFO Buffer (two), Decoder Core 0, Decoder Core 1, Matrix, m0 to m5, six 24-b channels.]

The average number in Table 2 indicates the degree of compression that could be obtained when using MLP in an archive, mastering, or editing environment. For example, a peak data rate reduction of 8 b/sample means that a 96 kHz/24-b channel can be carried on the disc with a rate equal to that of a 24 − 8 = 16 b LPCM channel. However, the space used on the disc is estimated by the average saving; in this case the residual will be 24 − 11 = 13 b/channel. Consider that an 11-b saving represents a compression ratio of 1.85:1 with 24-b material, whereas the same saving compresses 16-b audio by 3.2:1!
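The relation between a saving in bits per sample and the resulting compression ratio is simple to state as code; the function name below is illustrative.

```python
def compression_ratio(word_bits, saving_bits):
    """Ratio of original word length to the residual after the saving."""
    return word_bits / (word_bits - saving_bits)


ratio_24 = compression_ratio(24, 11)  # 24 b in, 13 b residual
ratio_16 = compression_ratio(16, 11)  # 16 b in, 5 b residual
print(f"{ratio_24:.2f}:1 and {ratio_16:.1f}:1")
```

The same 11-b saving therefore looks very different as a ratio depending on the input precision, which is why the saving itself is the more stable figure to quote.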

Playing Time on DVD-Audio

A DVD-Audio disc holds approximately 4.7 GB of data and has a maximum data transfer rate of 9.6 Mb/s for an audio stream. Six channels of 96 kHz/24-b LPCM audio have a data rate of 13.824 Mb/s, which is well in excess of 9.6 Mb/s.

Also, at 13.824 Mb/s, the data capacity of the disc would be used up in approximately 45 min.

So, lossless compression is needed to reduce the data on the disc to extend playing time to the industry norm of 74 min and to guarantee a minimum reduction of 31% in instantaneous data rate.
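These figures can be reproduced with a few lines of arithmetic. The sketch assumes that 4.7 GB means 4.7 × 10^9 bytes and that the whole capacity is available for audio; both are simplifications, and the names are illustrative.

```python
DISC_BYTES = 4.7e9   # assumed decimal gigabytes, all used for audio
LIMIT_BPS = 9.6e6    # maximum audio stream rate


def lpcm_rate(channels, sample_rate_hz, bits):
    """Uncompressed LPCM data rate in bits per second."""
    return channels * sample_rate_hz * bits


def playing_minutes(disc_bytes, rate_bps):
    """Playing time in minutes at a constant data rate."""
    return disc_bytes * 8 / rate_bps / 60


rate = lpcm_rate(6, 96_000, 24)
uncompressed_minutes = playing_minutes(DISC_BYTES, rate)

# Smallest reduction that brings the stream under the rate limit.
min_reduction = 1 - LIMIT_BPS / rate

# Average rate, and bits per sample per channel, that fit 74 min on the disc.
avg_rate_74 = DISC_BYTES * 8 / (74 * 60)
avg_bits_74 = avg_rate_74 / (6 * 96_000)

print(f"{uncompressed_minutes:.1f} min uncompressed")
print(f"{min_reduction:.1%} minimum peak reduction")
print(f"{avg_bits_74:.1f} b/sample/channel average for 74 min")
```

Under these assumptions a 74-min program needs an average of about 14.7 b per sample per channel, that is, an average saving of a little over 9 b from 24-b material.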

MLP meets this requirement with a sophisticated encoder, a simple decoder, and a specific subset of features limited to two substreams and six channels.

Here are some examples of playing times that can be obtained:

• 5.1 channels, 96 kHz/24-b: 100 min
• 6 channels, 96 kHz/24-b: 86 min
• 2 channels, 96 kHz/24-b: 4 h
• 2 channels, 192 kHz/24-b: 2 h
• 2 channels, 44.1 kHz/16-b: 12 h
• 1 channel, 44.1 kHz/16-b: 25 h (talking book).

Other Features

Still Picture

The audio track can be accompanied by multiple still pictures (ASVs) and these can be presented in “slideshow” or “browsable” modes. In slideshow mode ASVs are presented at predetermined timings. In browsable mode users can change (or browse) the pictures freely. Browsable pictures are a unique feature of DVD-Audio which is not supported in DVD-Video.

A set of ASV(s) to be presented together with one or more audio track(s) is called an ASV unit (ASVU). Because the audio may use the full data-rate capability of DVD-Audio, ASVUs are preloaded into a memory buffer in the player in advance of the audio presentation. Since each picture (ASV) in the ASVU is read from the memory, changing the ASVs does not affect the audio presentation at all. Moreover, since the display timing of each ASV can be flexibly controlled, the browsable functionality becomes possible.

The maximum size of one ASVU is 2 MB (2,097,152 B), which typically allows 15 to 20 MPEG-intra pictures at 720 × 480 pixels to be stored. The lower the resolution of the stored images, the more pictures can be contained in an ASVU.

Video Track

A video track or DVD-Video compatible content is used mainly for a music video clip accompanied by moving pictures. It is optionally recorded in the VIDEO_TS directory in compliance with a restricted subset of Part 3, Video Specifications. The restrictions forbid the use of some DVD-Video functions such as multistory (story branching), region control, and parental control, which are not necessary for an audio application.

Video tracks can also be used to contain the same tunes as the audio tracks (in lower quality) so that the disc can be compatible with DVD-Video players.

Visual Menu

The visual menu allows content in a track or group of a DVD-Audio disc to be accessed through a menu picture on the screen. The menu can also give access to other information such as liner notes, discography, and lyrics. The visual menu of DVD-Audio provides capabilities which are quite like those of the title menu in DVD-Video.

Copyright Management System

DVD-Audio accommodates a copyright management system called CPPM. CPPM technology together with copyright management information on the disc protects the recorded assets and aims to control any copying.

CPPM and CSS

CPPM uses encryption to protect content in the AUDIO_TS directory. CSS, the copy protection system for DVD-Video, is used for contents in the VIDEO_TS directory. Note that CPPM and CSS are systems developed by organizations outside of the DVD Forum. Further information may be found in [5] and [6].

Encryption of Contents

A primary function of CPPM or CSS is to encrypt the content. Since encrypted data cannot be decrypted without the correct key, a direct copy of the encrypted data (very likely on a PC) has no meaning. As the decryption key is obtained only through official contract and managed securely, the chance of key hacking is extremely low.

Table 2. Data-rate reduction.

Sampling (kHz)    Data-rate reduction (b/sample/channel)
                  Peak    Average
48                4       5 to 11
96                8       9 to 13
192               9       9 to 14

Furthermore, as a countermeasure against the theft or hacking of the keys, CPPM provides a method to disable any stolen key; that is, discs manufactured after hacking can be protected from being decrypted illegally with that key.

Audio Watermark (Watermark)

An audio track can optionally include an audio watermark [22], [6]. The watermark can carry copy control information (WM-CCI) and other information about the copyright holder. The watermark is primarily intended for the following:
• Tracing illegal copies. Even if audio data is captured from a player’s analog output and illegally distributed in some way, the watermark can identify the original rights holder, etc.
• Stopping playback of some pirated discs. Under the CPPM contract, audio data must be CPPM-encrypted when a WM-CCI is included that restricts copying. Therefore, a non-CPPM disc which contains such a watermark will be an illegal disc. Players supporting CPPM detect the watermark and stop playing back such discs.
• Control of copying via an analog interface. The watermark is the only way to confirm the copyright of data when it is transmitted via an analog interface. If the destination recording device complies with the contract for the same watermark, the device will detect the watermark to control the copying.

Conclusions

DVD-Audio offers a broad and blank canvas for music producers and artists with flexible high-quality audio configurations, multimedia features, and compatibility with DVD-Video platforms. One can produce multichannel music sampled at 96 kHz/24-b with the aid of MLP Lossless and super-wide-range stereo music sampled at 192 kHz. At CD quality, the large capacity of DVD can contain complete works, such as all Beethoven’s symphonies, on one disc. Visual images that would otherwise be distributed in a paper booklet can be included, and as with a DVD-Video disc, moving pictures can offer value-added contents such as music clips or interviews.

Security has recently become a topic of great importance for commercial media. DVD-Audio supports CPPM, a state-of-the-art content protection system, and an audio watermark to take care of analog-based copying. Both measures cooperate to prevent piracy, unbounded casual copying, or clone-copying. In addition, DVD-Audio allows for administrable copying to a secure destination media such as DVD audio recording, which has been under development by the DVD Forum. The DVD Audio Recording Specification supports the same audio specifications as DVD-Audio (including MLP Lossless) and has options for some lossy coding schemes for longer time recording.

DVD-Audio in collaboration with DVD audio recording will provide high-quality and secure platforms for music assets, and the authors hope that music lovers will recognize the value of such storage media.

Acknowledgments

We wish to thank everyone concerned in the establishment of the DVD-Audio specifications, its promotion, and business. We are also grateful to the developers of MLP.

References

[1] DVD Forum secretariat. Tokyo, Japan. Available: http://www.dvdforum.org/

[2] “DVD Specifications for Read-Only Disc Part 3: VIDEO SPECIFICATIONS Version 1.13,” in DVD Forum, Tokyo, Japan, Mar. 2002.

[3] “DVD Specifications for Read-Only Disc Part 4: AUDIO SPECIFICATIONS Version 1.2,” in DVD Forum, Tokyo, Japan, Mar. 2001.

[4] “DVD Specifications for Read-Only Disc Part 4: AUDIO SPECIFICATIONS, Packed PCM: MLP Reference Information Version 1.0,” in DVD Forum, Tokyo, Japan, Mar. 1999.

[5] DVD Copy Control Association (DVD CCA). Available: http://www.dvdcca.org/

[6] 4C Entity, LLC. Available: http://www.4centity.com/

[7] “DVD Specifications for Read-Only Disc Part 1: PHYSICAL SPECIFICATIONS Version 1.04,” in DVD Forum, Tokyo, Japan, June 2002.

[8] “DVD Specifications for Read-Only Disc Part 2: FILE SYSTEM SPECIFICATIONS Version 1.04,” in DVD Forum, Tokyo, Japan, June 2002.

[9] Information Technology—Generic Coding of Moving Pictures and Associated Audio, ISO/IEC Standard 13818, 1994.

[10] Multi-Channel Stereophonic Sound System with and Without Accompanying Picture, ITU-R Recommendation BS.775-1, 1994.

[11] M.A. Gerzon, J.R. Stuart, R.J. Wilson, P.G. Craven, and M.J. Law, “The MLP Lossless compression system,” in AES 17th Int. Conf. High-Quality Audio Coding, Florence, Italy, Sept. 1999, pp. 61-75.

[12] Acoustic Renaissance for Audio. (1995, Feb.). A Proposal for High-Quality Application of High-Density CD Carriers. [Private publication]. Available: http://www.meridian-audio.com/ara

[13] C.E. Shannon, “A mathematical theory of communication,” Bell Syst. Tech. J., vol. 27, pp. 379-423, 623-656, July 1948, Oct. 1948.

[14] A. Robinson, “Shorten: Simple lossless and near-lossless waveform compression,” Cambridge Univ., Cambridge, U.K., Tech. Rep. CUED/F-INFENG/TR.156, Dec. 1994.

[15] A.A.M.L. Bruekers, A.W.J. Oomen, and R.J. van der Vleuten, “Lossless coding for DVD audio,” in Proc. AES 101st Conv., Los Angeles, CA, Preprint 4358, Nov. 1996.

[16] C. Cellier, P. Chenes, and M. Rossi, “Lossless audio bit rate reduction,” in Proc. AES UK Conf. Managing the Bit Budget, May 1994, pp. 107-122.

[17] P.G. Craven and M.A. Gerzon, “Lossless coding for audio discs,” J. Audio Eng. Soc., vol. 44, pp. 706-720, Sept. 1996.

[18] P.G. Craven and M.A. Gerzon, “Lossless coding method for waveform data,” International Patent Application PCT/GB96/01164, May 15, 1996.

[19] P.G. Craven, M.J. Law, and J.R. Stuart, “Lossless compression using IIR prediction filters,” J. Audio Eng. Soc. (Abstracts), vol. 45, p. 404, Mar. 1997.

[20] P.G. Craven and M.A. Gerzon, “Optimal noise shaping and dither of digital signals,” presented at the AES 87th Convention, New York, Preprint 2822, Oct. 1989.

[21] Documentation—International Standard Recording Code (ISRC), ISO Standard 3901, 1986.

[22] Verance Corporate Headquarters. CA. Available: http://www.verance.com/index.html
