
NAB ENGINEERING HANDBOOK
Copyright © 2007 Focal Press. All rights of reproduction in any form reserved.

CHAPTER 1.15

Digital Audio Standards and Practices

Updated for the 10th Edition by

TIM CARROLL, Linear Acoustic Inc., Lancaster, Pennsylvania
CHIP MORGAN, CMBE, El Dorado Hills, California
RANDALL HOFFNER, ABC, Inc., New York, New York

INTRODUCTION

Digital audio technology has supplanted analog audio technology in U.S. television and radio production and broadcast facilities. Like digital video, digital audio offers many advantages in production, editing, distribution, and routing. Digital audio is remarkably robust and far less susceptible to degradation from hum, noise, level anomalies, and stereo phase errors than analog audio. Each analog audio recording generation and processing step adds its own measure of noise to the signal, but in the digital domain audio is not subject to such noise buildup. However, perceptual coding artifacts can be a problem; see Chapter 3.7, “Digital Audio Data Compression,” for additional information.

Digital audio may be stored on magnetic, optical, or magneto-optical discs and in solid state memories. When audio samples have been reduced to a series of numbers, processing and manipulation become largely mathematical operations, easily accommodated by microprocessors. Nonlinear editing is an example of a process that cannot be done in the analog domain. The technology of digital compression creates new economies in the storage and transport of digital audio and permits the broadcast of digital audio within a reasonable segment of spectral bandwidth.

The distribution and routing of audio in the digital domain present the broadcaster with new options as well, such as the capability to embed digital audio within a serial digital video signal, facilitating the carriage of video and multiple audio channels on a single coaxial cable. Although digital audio presents its own unique set of challenges, its advantages far outweigh its disadvantages.

Digital audio systems are inherently free of the hum and noise problems that can invade analog audio systems. The nature of the digital domain gives rise to a new set of considerations for the facility planner and designer. Digital audio signals operate in the multiple-megahertz frequency domain that video engineers are well acquainted with, raising such considerations as signal reflections and impedance discontinuities. In digital audio system engineering, just as in analog audio system engineering, cognizance of the potential pitfalls to be avoided and the application of good engineering practices will result in facilities that function well.

Since the 9th edition of the NAB Engineering Handbook was published, digital audio standards and practices have not seen a dramatic change in their basic definitions, but their use has grown in popularity. Digital audio is no longer a format used only by the largest broadcasters and postproduction facilities; it has come to be relied upon as the only way to handle modern broadcast audio requirements. Digital audio has proven to be as robust and flexible as it was originally designed to be. Routing, distribution, storage, and signal processing have advanced to the point of making it difficult or impossible to find analog versions of these processes with the same features and capabilities. With multichannel and surround sound audio having gained remarkable popularity, the thought of handling these signals in the analog domain quickly becomes overwhelming, and the efficiency and consistency of digital audio truly make it possible to process multichannel sound with the precision required.¹

Some useful new formats and standards have emerged, along with necessary revisions to existing standards. For example, new transport methods based on TCP/IP computer networks, although not yet standardized, are gaining popularity, and the requirement for audio metadata in digital television is presenting new challenges. In addition, new audio data rate reduction (i.e., compression) schemes have found their place in the chain, and better standards have been developed to support them.

Proper synchronization of the digital audio plant still seems to be a challenge. It is difficult for audio-centric facilities and staff to realize that they now have timing requirements like their video counterparts, but AES11 provides accurate guidance that, when followed, results in a properly timed plant and high-quality audio.

DIGITAL AUDIO STANDARDS

Following are standards that are relevant to digital audio in broadcast facilities. Many of them have been presented before and remain important parts of standardized digital audio systems. In recent years, there have been significant advances made by the Society of Motion Picture and Television Engineers (SMPTE), and it is necessary to consider these standards along with their Audio Engineering Society (AES) counterparts where they exist.

AES3-2003

AES3-2003 is the “AES Recommended Practice for Digital Audio Engineering—Serial Transmission Format for Two-Channel Linearly Represented Digital Audio Data.” This is the baseline standard for digital audio developed by the AES and the European Broadcasting Union (EBU) and is commonly referred to as the AES/EBU standard.

AES3 defines a digital protocol and physical and electrical interfaces for the carriage of two discrete audio channels, accompanied by various housekeeping, status, and user information in a single serial digital bitstream. As its title indicates, AES3 was designed to carry linearly quantized (uncompressed PCM) digital audio. Compressed digital audio may be carried on the IEC 958 digital audio interface.² IEC 958 is identical to AES3 in protocol, but can have slightly different electrical characteristics for support of consumer electronics. It addresses a professional implementation (AES/EBU) and a consumer implementation (S/PDIF). The AES3 interface has the capacity to carry linearly sampled digital audio at bit depths from 16 to 24, data descriptive of such factors as channel status and sample validity, along with parity checking data and user data. Total bit count per sample, including audio and housekeeping, is 32 bits. An ancillary standard, AES5 (discussed later), recommends use of the professional audio sample rate of 48 kHz on AES3, while recognizing the use of sample rates of 44.1 kHz, 32 kHz, and 96 kHz. AES3 carries audio samples using time-division multiplexing, in which samples from each of the two represented audio channels alternate.

Data Structure

The data carried on the AES/EBU interface is divided into blocks, frames, and subframes. An AES block is constructed of 192 frames, each frame being composed of two subframes, each subframe containing a single audio sample. A subframe begins with a preamble that provides sync information and describes what type of subframe it is, and ends with a validity bit, a user bit, a channel status bit, and a parity bit.

The subframe is divided into 32 time slots, each time slot being one sample bit in duration. The first four time slots are filled with a 4-bit preamble. The 24 time slots following the preamble may be filled in one of two ways. As shown in Figure 1.15-1(a),³ an audio sample word of up to 24 bits may fill all the time slots. Figure 1.15-1(b)⁴ illustrates that the first four time slots of the audio sample word space may be filled with auxiliary bits, which can represent user data or low-quality audio for informational or cueing purposes, for example. In all cases, the audio word is represented least significant bit (LSB) first, most significant bit (MSB) last. If digital audio words of bit depth less than the maximum are represented, the unused bits are set to logic 0. Time slots 28, 29, 30, and 31 are filled with a validity bit (V), a user bit (U), a channel status bit (C), and a parity bit (P), respectively. The subframes are assembled into frames and blocks as shown in Figure 1.15-2.
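Because the subframe layout is fully deterministic, it can be expressed compactly in code. The following Python sketch (illustrative only; the function name and MSB-justification of short words are assumptions consistent with the description above, not text from the standard) packs time slots 4–31 for one audio sample:

```python
def pack_subframe_slots(sample, bits=24, v=0, u=0, c=0):
    """Sketch: fill AES3 time slots 4-31 for one audio sample.

    The 4-slot preamble (slots 0-3) is omitted here because it is a
    biphase-mark violation pattern added at channel-coding time, not
    ordinary data bits.
    """
    if not 16 <= bits <= 24:
        raise ValueError("AES3 carries 16- to 24-bit audio words")
    word = sample & ((1 << bits) - 1)
    slots = [0] * (24 - bits)                        # unused low-order slots set to logic 0 (or aux data)
    slots += [(word >> i) & 1 for i in range(bits)]  # audio word, LSB first, MSB last
    slots += [v, u, c]                               # slots 28-30: validity, user, channel status
    slots.append(sum(slots) & 1)                     # slot 31: even parity over slots 4-31
    return slots                                     # 28 bits covering slots 4 through 31
```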

¹ It would be virtually impossible to handle multichannel audio to the precision required if the audio were handled via traditional analog means.

² Since 1997, the IEC document numbering system has added 60000 to the old IEC standard number, so the official number of this standard is now IEC 60958. Because the three-digit number is more widely known, this number will be used descriptively within the chapter.

³ AES3–1992 (r1997), AES Recommended Practice for Digital Audio Engineering—Serial Transmission Format for Two-Channel Linearly Represented Digital Audio Data, Figure 1.

⁴ Ibid.

FIGURE 1.15-1 Subframe formats: (a) 16–20 bit audio word; (b) 16–24 bit audio word.


Each subframe begins with one of three preambles. The first subframe in the 192-frame block, a Channel 1 subframe, starts with Preamble Z. All other Channel 1 subframes in the block start with Preamble X. All Channel 2 subframes start with Preamble Y. Figure 1.15-2 represents the last frame of a block and the first two frames of the following block. Subframe 1 of Frame 0, the first subframe of the block, begins with Preamble Z, uniquely identifying the beginning of the block. After the first subframe, the successive subframes are marked by Preamble Y and Preamble X, to identify Channel 2 and Channel 1 subframes, respectively.

A frame, consisting of two 32-bit subframes, is made up of 64 bits, and the data rate of the interface signal may be readily calculated by multiplying the sampling rate by 64. In the case of the 48 kHz sample rate, the total data rate of the signal is 64 times 48,000, or 3.072 Mbps. As will be explained later, the interface employs channel coding with an embedded clock that runs at twice the data rate, making the actual frequency of this signal about 6.1 MHz.
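As a quick check of this arithmetic, the following sketch reproduces the rates quoted above (variable names are illustrative only):

```python
# Interface rate arithmetic for the 48 kHz case described above.
sample_rate = 48_000                       # frames per second (one frame per sample pair)
bits_per_frame = 64                        # two 32-slot subframes
data_rate = sample_rate * bits_per_frame   # 3,072,000 bits/s = 3.072 Mbps
channel_rate = 2 * data_rate               # biphase-mark coding: two states per bit
print(data_rate, channel_rate)             # 3072000 6144000 (about 6.1 MHz on the wire)
```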

Encoding

All time slots except the preambles are encoded using biphase-mark coding to prevent the transmission of long strings of logic 0’s or logic 1’s on the interface, and thereby minimize the dc component on the transmission line; facilitate rapid clock recovery from the serial data stream; and make the interface insensitive to the polarity of connections. The preambles intentionally violate the rules of biphase-mark coding by differing in at least two states from any valid biphase code to avoid the possibility of other data being mistaken for a preamble. Biphase-mark coding requires a clock that runs at twice the rate of the data being transmitted, and each bit that is transmitted is represented by a symbol that is composed of two binary states. Figure 1.15-3 illustrates these relationships.

The top sequence of Figure 1.15-3 illustrates the interface clock pulses, running at a speed twice the source coded sample rate. The middle sequence shows the source coding, which is the series of pulse code modulated (PCM) digital audio samples. The bottom sequence shows how the source coded data is represented in biphase-mark coding. In biphase-mark coding, each source coded bit is represented by a symbol that is composed of two consecutive binary states. The first binary state of a biphase-mark symbol is always different from the second state of the symbol preceding it. A logic 0 is represented in biphase-mark coding by a symbol containing two identical binary states. A logic 1 is represented in biphase-mark coding by a symbol containing two different binary states. This relationship may be seen by examining the first full source coding bit at the left in the figure, which is a logic 1. Note that the duration of this bit is two clock pulses. Because the symbol immediately before it ended with a logic 0, the biphase-mark symbol representing it begins with a logic 1. As the bit to be transmitted is a logic 1, the second state of the biphase-mark symbol representing it is different from the first, a logic 0. The second source coded bit to be transmitted is a logic 0. Its first biphase-mark binary state is a logic 1, because the immediately previous state was a logic 0, and the second state is also a logic 1. The fact that the first binary state of a biphase-mark symbol is always different from the last binary state of the previous symbol ensures that the signal on the interface does not dwell at either logic 0 or logic 1 for a period longer than two clock pulses. Because biphase-mark coding does not depend on the absolute logic state of the symbols representing the source coded data, but rather on their relative states, the absolute polarity of a biphase-mark coded signal has no effect on the information transmitted, and the interface is insensitive to the polarity of connections.

FIGURE 1.15-3 AES channel coding.
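The encoding rules just described reduce to a few lines of code. This minimal sketch (names are illustrative) encodes source bits into biphase-mark half-bit states:

```python
def biphase_mark_encode(bits, last_state=0):
    """Sketch: encode source bits as biphase-mark symbols (two states each).

    Every symbol begins with a transition from the preceding state; a
    logic 1 adds a second transition mid-symbol, a logic 0 does not.
    """
    out = []
    state = last_state
    for b in bits:
        state ^= 1              # transition at every symbol boundary
        out.append(state)
        if b:                   # logic 1: second transition mid-symbol
            state ^= 1
        out.append(state)
    return out

# biphase_mark_encode([1, 0]) with last_state=0 returns [1, 0, 1, 1],
# matching the walk-through of Figure 1.15-3 above.
```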

Ancillary Data

FIGURE 1.15-2 AES3 block and frame structure.

The last four time slots in a subframe are occupied by various housekeeping and user data. The validity bit (V) indicates whether the audio sample word is suitable for conversion to an analog audio signal. The channel status bit (C) from each subframe is assembled into a sequence spanning the duration of an entire AES3 block, and these 192-bit blocks of channel status data describe a number of aspects of the signal. Examples of channel status data include:

• the length of audio sample words,
• sampling frequency,
• sampling frequency scaling flag,
• number of audio channels in use,
• emphasis information,
• consumer or professional interface implemented,
• audio or data being transmitted on the interface,
• a variety of other possible information.

The 192 channel status bits in each block are subdivided into 24 bytes. There is a separate channel status block for each audio channel, so channel status may be different for each of the audio channels. User data, or U-bits, may be used in any way desired. The parity bit (P) facilitates the detection of data errors in the subframe by applying even parity, ensuring that time slots 4–31 carry an even number of logic 1’s and logic 0’s.
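The assembly of C bits into the 24-byte channel status structure can be sketched as follows. The bit-within-byte packing shown (first bit of the block into the most significant bit of byte 0) is an assumption for illustration, not something mandated by the text above:

```python
def channel_status_block(c_bits):
    """Sketch: pack the 192 C bits of one block into 24 bytes."""
    if len(c_bits) != 192:
        raise ValueError("one channel status block spans exactly 192 frames")
    out = bytearray(24)
    for i, bit in enumerate(c_bits):
        if bit:
            out[i // 8] |= 0x80 >> (i % 8)   # assumed MSB-first packing
    return bytes(out)
```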

Electrical Interface

The electrical interface specified by AES3 is a two-wire, transformer-balanced signal. The AES interface was devised by audio engineers, with the intent of creating a digital audio signal that could be carried on the same balanced, shielded, twisted pair cables and XLR-3 type connectors that are used for analog audio signals. The specified source impedance for AES3 line drivers and the specified input impedance for AES3 line receivers is 110 Ω, which is the approximate characteristic impedance of shielded twisted pair cable as used for analog audio. The permitted signal level on the interface ranges from 2–7 V peak-to-peak.

The balanced, twisted pair electrical interface can give rise to some problems in implementation. XLR-type connectors and audio patch panels, for example, are not impedance-matched devices. This is not critical when the highest frequency of interest is 20 kHz, but it can cause serious problems when a 6 MHz signal must be passed. These considerations, plus the familiarity of television engineers with unbalanced coaxial transmission of analog video and the need for higher connector density for a given product size, generated the requirement for standardization of an unbalanced, coaxial electrical interface for the AES3 signal. Such an electrical interface is standardized in SMPTE 276M, which describes carriage of the AES/EBU interface on standard 75 Ω video cable using BNC connectors, at a signal level of 1 V peak-to-peak. The fact that the 110 Ω balanced and 75 Ω unbalanced signal formats coexist in many systems frequently presents the requirement to translate between these two signals. Devices to perform such translations are readily available, and SMPTE 276M has an informative annex explaining how to build them. For density and compatibility reasons, most modern multichannel audio equipment is being designed to support SMPTE 276M.

AES-2id–1996 (r2001) is an information document containing guidelines for the use of the AES3 interface. AES-3id–2001 is an information document containing descriptive information about the unbalanced coaxial interface for AES3 audio.

AES5–2003

AES5–2003 is the “AES Recommended Practice for Professional Digital Audio—Preferred Sampling Frequencies for Applications Employing Pulse-Code Modulation.” This companion document to AES3 contains the recommended digital audio sample rate for signals to be carried on the interface. The professional digital audio sample rate of 48 kHz is recommended, with recognition given to the use of the compact disc sample rate of 44.1 kHz, a low bandwidth sample rate of 32 kHz, and higher bandwidth sampling frequencies, also referred to as Double Rate (62–108 kHz) and Quadruple Rate (124–216 kHz), for applications requiring a higher bandwidth or more relaxed anti-alias filtering.

SMPTE EG 32, the engineering guideline on AES/EBU audio emphasis and sample rates for use in television systems, also recommends that the 48 kHz sample rate be used. Variations on these sample rates are encountered. Varispeed operation requires the ability to adjust these sample rates by about ±12%, and of course, accommodation to 59.94 Hz video requires operation at 48 kHz/1.001.

AES10–2003

AES10–2003 is the “AES Recommended Practice for Digital Audio Engineering—Serial Multichannel Audio Digital Interface (MADI).” MADI is a multichannel digital audio interface that is based on AES3. It is designed for the carriage of a maximum of 64 audio channels (at 48 kHz sample rate) on a single coaxial cable or optical fiber. MADI preserves the AES3 subframe protocol except for the preamble. A MADI frame is composed of 64 channels, which are analogous to AES3 subframes. Each MADI channel contains 32 time slots, as does an AES3 subframe. The first four time slots contain synchronization data, channel activity status (channel on/off), and other such information. The following 28 time slots are filled in the same way as in an AES3 subframe—24 audio bits, followed by a V bit, a U bit, a C bit, and a P bit.

The MADI coaxial cable interface is based on the Fiber Distributed Data Interface (FDDI) standardized in ISO 9314, for which chip sets are available. Data is transmitted using non-return-to-zero inverted (NRZI), polarity-free coding and a 4-bit-to-5-bit encoding format, in which each channel’s 32 bits are grouped into 8 words of 4 bits each, and each 4-bit word is then encoded into a 5-bit word. The data rate on the interface is a constant 125 Mbps, with the payload data rate running between approximately 50 and 100 Mbps, depending on the sample rate in use. Sample rates may vary from 32 to 96 kHz ±12.5%. The specified coaxial cable length for the MADI signal is up to 50 m. A standard for carriage on optical fiber is under consideration.
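A worked check of these figures for the fully loaded 64-channel, 48 kHz case (a sketch; variable names are illustrative):

```python
# MADI payload arithmetic for 64 channels at 48 kHz.
channels, slots_per_channel, fs = 64, 32, 48_000
payload = channels * slots_per_channel * fs   # 98,304,000 bits/s of channel data
line_rate = payload * 5 // 4                  # 4-bit-to-5-bit coding adds 25%
print(payload / 1e6, line_rate / 1e6)         # 98.304 and 122.88 Mbps
# Both figures fit within the constant 125 Mbps link rate; the remaining
# capacity carries the sync codes described under MADI Synchronization.
```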

MADI finds frequent use in multitrack audio facilities, for example, as an interface between multitrack audio recorders and consoles. It is conceivable that the MADI interface could be transmitted over very long distances using, for example, a synchronous optical network (SONET) circuit.

AES-10id–2005 is an information document containing engineering guidelines for the implementation and use of the MADI interface.

AES11–2003

AES11–2003 is the “AES Recommended Practice for Digital Audio Engineering—Synchronization of Digital Audio Equipment in Studio Operations.” This document describes a systematic approach to the synchronization of AES3 digital audio signals. Synchronism between two digital audio signals is defined as that state in which the signals have identical frame frequencies, and the timing difference between them is maintained within a recommended tolerance on a sample-by-sample basis.

AES11 recommends that each piece of digital audio equipment have an input connector that is dedicated to the reference signal. Four methods of synchronization are proposed: (a) the use of a master digital audio reference signal (DARS), ensuring that all input/output equipment sample clocks are locked to a single reference; (b) the use of the sample rate clock embedded within the digital audio program signal that is input to the equipment; (c) the use of video, from which a DARS signal is developed; and (d) the use of GPS to reference a DARS generator, providing frequency and phase from one-second pulses, as well as time-of-day sample address codes in bytes 18–21 of channel status. Methods (a), (c), and (d) are preferred for normal studio practice, as method (b) may increase the timing error between pieces of equipment in a cascaded implementation.

The digital audio reference signal is to have the format and electrical configuration of the two-channel AES3 interface, but implementation of only the basic structure of the interface format, where only the preamble is active, is acceptable as a reference signal. A digital audio reference signal may be categorized in one of two grades. A grade 1 reference signal must maintain a long-term frequency accuracy within ±1 ppm, whereas a grade 2 reference signal has a tolerance of ±10 ppm.
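The grade classification amounts to a simple ppm calculation, sketched below (function name is illustrative):

```python
def dars_grade(measured_hz, nominal_hz=48_000.0):
    """Sketch: classify a reference signal by long-term frequency accuracy."""
    ppm = abs(measured_hz - nominal_hz) / nominal_hz * 1e6
    if ppm <= 1.0:
        return "grade 1"       # within +/-1 ppm
    if ppm <= 10.0:
        return "grade 2"       # within +/-10 ppm
    return "out of tolerance"

# dars_grade(48_000.03) -> 'grade 1' (a 0.625 ppm error)
```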

AES17–1998 (r2004)

AES17–1998 (r2004) is the “AES Standard Method for Digital Audio Engineering—Measurement of Digital Audio Equipment.” This standard defines a number of tests and test conditions for specifying digital audio equipment. Many of these tests are substantially the same as those used for testing analog audio equipment, but the unique nature of digital audio dictates that additional tests are necessary beyond those used for analog audio equipment.

AES18–1996 (r2002)

AES18–1996 (r2002) is the “AES Recommended Practice for Digital Audio Engineering—Format for the User Data Channel of the AES Digital Audio Interface.” This standard describes a method of formatting the user data channels within the AES3 digital audio interface using a packet-based transmission format. This method has gained popularity in some broadcast facilities for carrying nonaudio ancillary data such as song titles and other information. It is critical to note, however, that user and other channel status bits are notoriously unreliable. In an effort to save data space, most storage equipment does not preserve this data and instead generates static values prior to output. If a facility design relies on using this data space, it is imperative to verify that all equipment in the chain supports it.

ATSC A/52B–2005 Digital Audio Compression (AC-3) Standard

Digital television broadcasting as described by the Advanced Television Systems Committee (ATSC) standard utilizes the AC-3 digital audio standard. Use of this standard will necessitate the carriage of AC-3 compressed digital audio streams between pieces of DTV equipment. An example of this is the interface between an AC-3 encoder and the program data stream multiplexer of a DTV transmission system. The former Annex B of the ATSC AC-3 Digital Audio Standard for digital television broadcast, which described the carriage of compressed AC-3 elementary streams on the IEC 958 digital audio interface, has been replaced by IEC 61937.

IEC 60958 Digital Audio Interface

IEC 60958 (IEC 958) is logically identical to the AES3 digital audio interface. Electrically, it provides for both the 110 Ω balanced and the 75 Ω unbalanced interfaces. Two versions are described: a consumer version, the Sony/Philips Digital Interface (S/PDIF), in which bit 0 of the channel status word is set at logic 0; and a professional version, the AES/EBU interface, in which bit 0 of the channel status word is set at logic 1. Provision is made in the location of time slots 12–27, which are normally used to carry linear 16-bit PCM audio words, to permit some recording equipment to record and play back either linear 16-bit PCM audio or encoded data streams (compressed digital audio). The consumer implementation permits only the 32-bit mode, in which channel 1 and channel 2 subframes are simultaneously employed to carry 32-bit words. The professional implementation permits either the 32-bit mode or the 16-bit mode, in which each subframe carries a 16-bit digital audio word.
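The consumer/professional distinction reduces to a single-bit test, sketched here. The assumption that bit 0 arrives as the least significant bit of the first channel status byte is for illustration; actual byte packing depends on the receiving equipment:

```python
def interface_mode(channel_status_byte0):
    """Sketch: distinguish S/PDIF from AES/EBU via channel status bit 0."""
    if channel_status_byte0 & 0x01:     # assumed LSB carries bit 0
        return "professional (AES/EBU)"
    return "consumer (S/PDIF)"
```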

The consumer implementation may carry either two channels of linear PCM digital audio, or one or more compressed audio bitstreams accompanied by time stamps. The professional implementation may carry two channels of linear PCM digital audio, two sets of compressed audio bitstreams with time stamps, or one channel of linear PCM digital audio and one set of compressed audio bitstreams with time stamps. Note that the consumer implementation may also present output levels that are lower than the specified 1 V peak-to-peak of the professional version, and care is advised when connecting consumer and professional devices.

SMPTE STANDARDS AND RECOMMENDED PRACTICES CONCERNING THE USE OF AES DIGITAL AUDIO IN TELEVISION SYSTEMS

SMPTE 272M–2004

SMPTE 272M–2004 is “Television—Formatting AES/EBU Audio and Auxiliary Data into Digital Video Ancillary Data Space.” This standard defines the embedding of AES/EBU digital audio into the standard definition serial digital interface specified in SMPTE 259M, 10-Bit 4:2:2 Component and 4fsc NTSC Composite Digital Signals—Serial Digital Interface. With such embedding, up to 16 channels of digital audio in the AES3 format may be carried on the serial digital video interface signal that travels on a single coaxial cable.

SMPTE 276M–1995

SMPTE 276M–1995 is “Television—Transmission of AES/EBU Digital Audio Signals over Coaxial Cable.” This SMPTE standard defines the unbalanced 75 Ω coaxial cable electrical interface for the AES3 bitstream.

SMPTE 299M–2004

SMPTE 299M–2004 is “Television—24-Bit Digital Audio Format for HDTV Bit-Serial Interface.” This standard defines the embedding of AES/EBU digital audio data into the high-definition serial digital video interface specified in SMPTE 292M, Bit-Serial Digital Interface for High-Definition Television Systems. This is the high-definition counterpart to SMPTE 272M.

SMPTE 302M–2002

SMPTE 302M–2002 is “Television—Mapping of AES3 Data into MPEG-2 Transport Stream.” This SMPTE standard describes how the 20-bit audio payload of an AES/EBU signal is mapped into an MPEG-2 transport stream in a bit-for-bit accurate manner. This format can be found in most modern MPEG-2 encoders and integrated receiver/decoders (IRDs), and is a method used to carry uncompressed 20-bit PCM audio as well as mezzanine compressed audio such as Dolby E, high-density multiplexed AC-3, and Linear e-squared formats. Although it can be used to carry a single AC-3 stream, it is very inefficient and is incompatible with consumer equipment. ISO/IEC 13818 describes the proper manner for multiplexing an AC-3 stream into an MPEG-2 transport stream.

SMPTE 320M–1999

SMPTE 320M–1999 is “Television—Channel Assignments and Levels on Multichannel Audio Media.” This often-overlooked standard defines proper channel ordering for multichannel audio soundtracks. The standard for television is as follows: 1 = Left front, 2 = Right front, 3 = Center, 4 = Low Frequency Effects (LFE or subwoofer), 5 = Left surround, 6 = Right surround, 7 = Left or Lt (“left total,” for matrix surround encoded systems), 8 = Right or Rt (“right total,” for matrix surround encoded systems). It is possible for the film format to differ slightly, and the channel ordering is detailed in this specification, but for use within television facilities, film channel formatting should be corrected to match the order shown above.
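For facilities that label or reorder tracks in software, the television assignment above can be kept as a simple lookup table (a sketch; the constant name is illustrative):

```python
# SMPTE 320M television channel assignments, exactly as listed above.
SMPTE_320M_TV_ORDER = {
    1: "Left front",
    2: "Right front",
    3: "Center",
    4: "Low Frequency Effects (LFE)",
    5: "Left surround",
    6: "Right surround",
    7: "Lt (left total, matrix surround)",
    8: "Rt (right total, matrix surround)",
}
```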

SMPTE 337M through 341M

These standards cover “Formatting of Non-PCM Audio and Data in AES3 Serial Digital Audio Interface.” They describe standardized methods for carrying compressed audio and other data types within AES3 signals, specifically:

• 338M–2000: Data types
• 339M–2000: Generic data types
• 340M–2000: ATSC A/52 (AC-3) data type
• 341M–2000: Captioning data type

They will become increasingly important as new professional equipment is developed to support compressed audio formats.

SMPTE RP 155–2004

SMPTE RP 155–2004 is “Reference Levels for Digital Audio Systems.” This recommended practice describes a reference level lineup signal for use in digital audio recording on digital television tape recorders, and recommends the proper setting for the lineup signal on the recorder’s digital audio level meters. The reference signal is the digital representation of a 1000 Hz sine wave, the level of which is 20 dB below the system maximum (full-scale digital). Meters are to be calibrated with this signal to indicate –20 dBFS (i.e., 20 dB below full-scale digital).

SMPTE EG 32–1996

SMPTE EG 32–1996 is “Emphasis of AES/EBU Audio in Television Systems and Preferred Audio Sampling Rate.” This engineering guideline recommends that no emphasis be used on digital audio recordings for television applications and that the professional digital audio sample frequency of 48 kHz be used.

IMPLEMENTATION ISSUES

The key to realizing the benefits of digital audio on a systemwide scale is a thorough understanding of the principles underlying digital signal distribution, routing, and switching. There are, as explained, two electrical interfaces available for AES3 signals, and both require good engineering practices for successful implementation. Digital audio’s data rate dictates that uncompressed digital audio signals occupy a bandwidth similar to that of analog video. Regardless of the electrical interface, a well-engineered interconnect requires proper matching of source, destination, and characteristic cable impedances. Prior to the 1992 revision of AES3, any equipment manufactured to AES3–1985 violated this principle, as that standard specified a 250 Ω load impedance for receivers and a 110 Ω source impedance for transmitters. Beginning in 1992, AES3 specifies impedance matching among transmitter, receiver, and cable.

Choice of Cable

The use of the unbalanced coaxial cable interface for AES3 data transmission is often preferred by video engineers. SMPTE 276M and AES-3id provide guidance for using the 75 Ω unbalanced AES3 interface. Any high-quality video cable will be found quite acceptable for unbalanced AES3 signals. Those engineers designing facilities dealing only with audio may prefer the use of balanced, shielded, twisted pair cables with XLR-type connectors to carry AES3 signals, but should be aware of the cable length restrictions of this implementation and of the possibility that problems will arise from impedance mismatches at connectors and patch panels. For balanced transmission of AES3 signals, special low-capacitance twisted pair cable intended especially for digital audio use is recommended over the standard twisted pair cables used for analog audio, as the higher capacitance of analog audio cable tends to distort square wave signals by rolling off the higher frequency components.

Digital Audio Distribution

The use of analog video distribution and routing equipment is generally not recommended for AES3 signals, as such equipment may distort AES signal shapes and rise times, adversely affecting the decoding of the signal at the receiving equipment. The spurious high frequency signal energy that may be generated by such distortions of signal shape can cause crosstalk-related bit errors that are difficult to detect and analyze. Distribution of the AES3 signal using high-quality digital audio distribution amplifiers will maintain the proper frequency and phase relationships, as well as signal shapes and rise times.

System Synchronization

When possible, all digital audio signals should be synchronous in order to avoid objectionable digital artifacts. In a large plant, it is necessary to provide a single master reference signal to which all interconnected systems are synchronized. The master reference, fed to all pieces of equipment, allows audio data to be retimed and synchronized within specified tolerances.

Large facilities, in particular, will benefit from the conversion of digital audio signals from sources without external sync capability to a standard, synchronized audio sample rate. Broadcast digital audio plants typically contain consumer and other nonsynchronizable equipment that requires sample rate conversion. Audio sample rate converters perform a function similar to video standards converters, in that a dynamic low pass filter continually adjusts the offending signal’s phase at the output of the converter. In some cases, the output and input sample rates can be locked together via an integer relationship in a process known as synchronous sample rate conversion. For example, 48 kHz and 44.1 kHz are related by the integer ratio of 160 to 147. Modern sample rate conversion can be accomplished with full 24-bit resolution and THD+N below –140 dBm and as such has become an audibly lossless process. However, it is important to note that in systems utilizing compressed audio, such as AC-3, Dolby E, and Linear e-squared, bit-for-bit accuracy of the AES3 audio payload is imperative and will be corrupted by sample rate conversion—even when used in 1:1 modes for retiming (i.e., when 48 kHz is reclocked to a local 48 kHz reference).
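The integer ratio quoted above falls directly out of a greatest-common-divisor reduction, as this short check shows:

```python
from math import gcd

# Verifying the synchronous-conversion ratio for 48 kHz and 44.1 kHz.
a, b = 48_000, 44_100
g = gcd(a, b)            # 300
print(a // g, b // g)    # 160 147, i.e., 160 output samples per 147 input samples
```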

MADI Synchronization

It is necessary for the equipment transmitting MADI data to include timing information that the receiving equipment can extract and use for synchronization. At least one sync code must be sent per frame; a sync code consists of two consecutive 5-bit words not used in the 4-bit-to-5-bit encoding scheme. The total MADI interface data rate is higher than the required payload data rate, the difference between these two rates being sufficient to include sync codes within each frame. The Fiber Distributed Data Interface (FDDI) chip set used for MADI implementation automatically handles the required synchronizing and coding operations.

AES3 Synchronization

AES3 is inherently synchronous, the clock signal being readily recovered from the AES3 bitstream. However, the use of a master digital audio reference ensures that all digital audio equipment in a system will be frequency and phase locked and free of cascaded timing errors, and is highly recommended by AES11. The master reference signal may come from the digital audio console in a facility on the scale of a single room, or from an external reference generator in larger facilities. The master sync signal should be sent to all equipment capable of accepting external sync signals.

Digital audio phase integrity must remain intact during the conversion of multiple audio channels between the digital and analog domains. Perfect phase synchronization requires use of an SDIF-2 word clock or an AES3 signal as the common master clock. Digital audio recording and processing equipment forces any AES3 input signal into a common AES3 frame phase. When such an AES3 frame alignment is performed, a phase error will result if there are any deviations in the frame phase of analog-to-digital (A/D) converters.

When digital audio signals are transferred to a piece of equipment that is not synchronized using a master sync signal, sample rate converters must be used at the inputs to the receiving equipment to prevent clicks and pops.

Word Clock Synchronization

SDIF-2 word clock, commonly referred to as simply word clock, is a square wave signal at the digital audio sample rate. Word clock is commonly used as a reference signal in small, audio-only facilities. In facilities that handle both video and audio, black burst is commonly used as the reference for both video and AES audio signal synchronization. Note that most professional audio equipment does not accept word clock as a reference signal, but instead relies on the AES11 standard, whereby an AES/EBU signal with its embedded clock reference is used to derive proper synchronization. This eliminates the difficulties of distributing a high-frequency word clock square wave throughout a facility.

Signal Routing

Asynchronous routing is the simplest and most cost-efficient method of routing digital audio. It passes digital audio signals at any sample rate, a degree of flexibility that is ideal in situations in which a number of different audio sample rates are encountered. However, the lack of synchronization to a master reference makes it a poor choice for on-air applications or any other situation in which frame-accurate switching or editing is required.

An asynchronous router may be thought of as an electronic patching system, functioning as though simple wires were used to connect inputs to outputs. In an asynchronous system, it is imperative that the destination equipment be capable of locking to the sample rate of the signal routed to it; otherwise, muting usually takes place.

The disadvantage of asynchronous routers is that their output signal is almost always corrupted when a switch is made between input signals. A switch typically results in one or more AES frames being damaged, and this may cause destination equipment to momentarily lose lock, causing muting or the generation of pops and clicks.

Synchronous routing ensures precise timing and no corruption of the data stream during switches. It is considerably more complex and costly than asynchronous routing, as it requires that a transition between two inputs be made at an AES frame boundary. All inputs to a synchronous router must be locked to a common digital audio reference. A digital audio console is essentially a synchronous router with many controls. Note that when routing compressed audio such as Dolby E, switching must occur not only at an AES frame boundary, but also at an AES frame boundary located near the video vertical interval switch point to prevent corruption of the compressed audio packets. Systems like the Linear Acoustic StreamStacker-HD require switching only on the AES frame boundary. Routing and switching AC-3 encoded signals is of greater difficulty, as the encoded packets from one stream to the next are not phase aligned.

Jitter

Jitter is short-term frequency variation in the input data stream to a digital audio device. It can result from a number of causes, including such things as the coupling of excessive noise into a transmission link. Some jitter buildup is inevitable in a system, as certain components of the system inherently generate some amount of jitter. For example, noise in the phase-locked loops that control clock frequencies in the components of the system unavoidably generates some jitter. The presence of out-of-specification jitter on a digital audio signal or clock can result in bit errors that generate clicking and popping sounds. High levels of jitter may cause a receiving device to lose lock, while a relatively small amount may have no apparent negative effect unless present in devices performing analog-to-digital (A/D) or digital-to-analog (D/A) conversions. Excessive jitter is seldom a problem when only two pieces of equipment are involved, but typically builds up when larger numbers of equipment are interconnected. Jitter may be eliminated through the use of synchronizing digital-to-analog converters or a common synchronization signal. Jitter on the synchronization signal itself can cause degradation of all digital audio in devices locked to it.

Levels and Metering

When an analog audio signal is converted to digital, the greatest analog voltage level that may be represented digitally is called full-scale digital (FSD). When quantized, this voltage level causes all digital audio bits to be set to logic 1, and this level is called 0 dBFS (full scale). This is an inflexible limit, and any excursion of the analog signal above this level will be clipped off, as the digital audio word does not have the capacity to faithfully represent it. In practice, the FSD level is often set about 1 dB above the analog clip level in an effort to assure that digital clipping never occurs.
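Expressing a sample's level relative to full scale is a one-line calculation, sketched here assuming 24-bit two's-complement coding (the full-scale constant and function name are illustrative):

```python
from math import log10

FULL_SCALE_24BIT = 2**23 - 1   # assumed: largest positive 24-bit sample code

def dbfs(peak_code):
    """Sketch: level of a peak sample code relative to full scale, in dBFS."""
    return 20.0 * log10(abs(peak_code) / FULL_SCALE_24BIT)

# A lineup tone 20 dB below full scale peaks at one tenth of full scale:
# dbfs(round(0.1 * FULL_SCALE_24BIT)) -> approximately -20.0
```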

When signals are converted between the analog and digital domains, the analog reference levels of A/D and D/A converters may be set to any number of values. If the analog reference level is improperly calibrated in any of the converters in the path, A/D and D/A conversions may result in an increase or a decrease in the level of the recovered analog signal.

Consistency in the type of digital audio metering device used, good operator training, and the establishment of strict house standard reference levels and alignment practices are the best defenses when it comes to accurate audio level control.

There is no U.S. standard for a specific digital audio level meter. Digital audio meters are often of the instantaneous response type, with no integration time, permitting them to respond with full excursion to a peak as brief as a single digital audio sample. Contrast this with the standard volume indicator (VU meter), which is an average-responding device, and the typical peak-program meter, which does not respond with full excursion to peaks with durations less than 10 ms. Typically, digital audio metering devices display a maximum value of 0 dBFS, and reference level lineup tone is set to a designated point below 0 dBFS to accommodate peaks without digital clipping.

Figure 1.15-4 shows a representative digital audio meter, the display device of which is usually an array of light-emitting diodes or other such devices. This representative meter displays a range of –40 dB to 0 dBFS, with lineup tone being calibrated at –20 dBFS.

FIGURE 1.15-4 Representative digital audio level meter. (Courtesy Dorrough Electronics.)

For television applications, SMPTE RP 155 recommends adjusting the level of lineup tone to read –20 dBFS on digital audio meters used on digital videotape recorders. Other industry segments have variously used lineup tone levels of –15, –18, and –20 dBFS. These varying reference levels may cause inconsistent results when digital audio recordings are interchanged. It is therefore important to establish common digital audio reference and operating levels when exchanging digital audio recordings.

Loudness metering is best accomplished with meters designed to measure loudness. VU- and PPM-type meters are not truly appropriate for accurately judging loudness, as the results are often a mixture of meter readout and user interpretation and are thus unreliable for producing consistent results.

SUMMARY

Digital audio, with its many advantages, is inherently not susceptible to many of the problems that are encountered in analog audio systems. It does harbor some potential hazards of its own, however. With care and attention to good engineering practices in the design and maintenance of digital audio facilities, and observance of the recommendations described in AES/EBU, IEC, and SMPTE standards, outstanding results will be realized.

Standards

[1] AES3–1992 (r2003), AES Recommended Practice for Digital Audio Engineering—Serial Transmission Format for Two-Channel Linearly Represented Digital Audio Data, New York: Audio Engineering Society, 2003.

[2] AES5–1998 (r2003), AES Recommended Practice for Professional Digital Audio—Preferred Sampling Frequencies for Applications Employing Pulse-Code Modulation, New York: Audio Engineering Society, 2003.

[3] AES10–1991 (r2003), AES Recommended Practice for Digital Audio Engineering—Serial Multichannel Audio Digital Interface (MADI), New York: Audio Engineering Society, 2003.

[4] AES11–1997 (r2003), AES Recommended Practice for Digital Audio Engineering—Synchronization of Digital Audio Equipment in Studio Operations, New York: Audio Engineering Society, 2003.

[5] AES17–1998, AES Standard Method for Digital Audio Engineering—Measurement of Digital Audio Equipment, New York: Audio Engineering Society, 1998.

[6] AES18–1996, AES Recommended Practice for Digital Audio Engineering—Format for the User Data Channel of the AES Digital Audio Interface, New York: Audio Engineering Society, 1996.

[7] ATSC A/52B–2005, Digital Audio Compression (AC-3) Standard, Washington: Advanced Television Systems Committee, 2005.

[8] IEC 60958 (1999), Digital Audio Interface, Geneva: International Electrotechnical Commission, 1999.

[9] SMPTE 259M–1993, 10-Bit 4:2:2 Component and 4fsc NTSC Composite Digital Signals—Serial Digital Interface, White Plains: Society of Motion Picture and Television Engineers, 1993.

[10] SMPTE 272M–1994 (r2004), Formatting AES/EBU Audio and Auxiliary Data into Digital Video Ancillary Data Space, White Plains: Society of Motion Picture and Television Engineers, 2004.

[11] SMPTE 276M–1995, Transmission of AES/EBU Digital Audio Signals over Coaxial Cable, White Plains: Society of Motion Picture and Television Engineers, 1995.

[12] SMPTE 292M–1996, Bit-Serial Digital Interface for High-Definition Television Systems, White Plains: Society of Motion Picture and Television Engineers, 1996.

[13] SMPTE 299M–1997 (r2004), 24-Bit Digital Audio Format for HDTV Bit-Serial Interface, White Plains: Society of Motion Picture and Television Engineers, 2004.

[14] SMPTE 302M–1998/2000, Mapping of AES3 Data into MPEG-2 Transport Stream, White Plains: Society of Motion Picture and Television Engineers, 2000.

[15] SMPTE 320M–1999, Channel Assignments and Levels on Multichannel Audio Media, White Plains: Society of Motion Picture and Television Engineers, 1999.

[16] SMPTE 337M through SMPTE 341M, Formatting of Non-PCM Audio and Data in AES3 Serial Digital Audio Interface, White Plains: Society of Motion Picture and Television Engineers.

[17] SMPTE RP 155, Audio Levels for Digital Audio Records on Digital Television Tape Recorders, White Plains: Society of Motion Picture and Television Engineers, 2004.

[18] IEC 61937-1, Digital Audio—Interface for Non-Linear PCM Encoded Bitstreams Applying IEC 60958, Part 1: General, Geneva: International Electrotechnical Commission.

[19] IEC 61937-3, Digital Audio—Interface for Non-Linear PCM Encoded Bitstreams Applying IEC 60958, Part 3: Non-Linear PCM Bitstreams According to the AC-3 Format, Geneva: International Electrotechnical Commission.

Recommended Practices and Information Documents

AES-2id–1996, AES Information Document for Digital Audio Engineering—Guidelines for the Use of the AES3 Interface, New York: Audio Engineering Society, 1996.

AES-3id–1995, AES Information Document for Digital Audio Engineering—Transmission of AES3 Formatted Data by Unbalanced Coaxial Cable, New York: Audio Engineering Society, 1995.

AES-10id–1995, AES Information Document for Digital Audio Engineering—Engineering Guidelines for the Multichannel Audio Digital Interface (MADI) AES10, New York: Audio Engineering Society, 1995.

SMPTE Recommended Practice RP 155–1997, Audio Levels for Digital Audio Records on Digital Television Tape Recorders, White Plains: Society of Motion Picture and Television Engineers, 1997.

SMPTE Engineering Guideline EG 32–1996, Emphasis of AES/EBU Audio in Television Systems and Preferred Audio Sampling Rate, White Plains: Society of Motion Picture and Television Engineers, 1996.