ban dich digitalaudio

Upload: hoang-tuan-viet

Post on 02-Jun-2018


  • 8/10/2019 Ban Dich DigitalAudio

    1/71

    9/14/2006 Nguyen Chan Hung - Faculty of Electronics & Telecommunications - HUT 1

    Part 2: Digital Audio Compression

    MPEG Audio: Specifications

    MPEG-1 (ISO/IEC 11172-3) provides:
      - Single-channel ('mono') and two-channel ('stereo' or 'dual mono') coding of digitized sound waves at sampling rates of 32, 44.1, and 48 kHz.
      - Predefined bit rates from 32 to 448 kbit/s for Layer I, from 32 to 384 kbit/s for Layer II, and from 32 to 320 kbit/s for Layer III.

    MPEG-2 BC (ISO/IEC 13818-3) provides:
      - A backwards-compatible (BC) multichannel extension to MPEG-1: up to 5 main channels plus a 'low frequency enhancement' (LFE) channel can be coded, and the bit-rate range is extended up to about 1 Mbit/s.
      - An extension of MPEG-1 towards lower sampling rates (16, 22.05, and 24 kHz) for bit rates from 32 to 256 kbit/s (Layer I) and from 8 to 160 kbit/s (Layer II & Layer III).

    MPEG Audio: Specifications (2)

    MPEG-2 AAC (ISO/IEC 13818-7) provides:
      - A very high-quality audio coding standard for 1 to 48 channels at sampling rates of 8 to 96 kHz, with multichannel, multilingual, and multiprogram capabilities.
      - Bit rates from 8 kbit/s for a monophonic speech signal up to in excess of 160 kbit/s per channel for very high-quality coding that permits multiple encode/decode cycles.
      - Three profiles of AAC that provide varying levels of complexity and scalability.

    MPEG-4 (ISO/IEC 14496-3) provides:
      - Coding and composition of natural and synthetic audio objects.
      - Scalability of the bit rate of an audio bitstream.
      - Scalability of encoder or decoder complexity.
      - Structured Audio: a universal language for score-driven sound synthesis.
      - TTSI: an interface for text-to-speech conversion systems.

    MPEG-7 (ISO/IEC 15938) will provide:
      - Standardized descriptions and description schemes of audio structures and sound content.
      - A language to specify such descriptions and description schemes.

    Related specifications

    - MUSICAM (Masking pattern adapted Universal Sub-band Integrated Coding And Multiplexing): designed to be suitable for DAB (Digital Audio Broadcasting).
    - ASPEC (Adaptive Spectral Perceptual Entropy Coding): designed for high degrees of compression to allow audio transmission on ISDN.
    - NICAM 728: used for European PAL television audio.
    - Dolby AC-3: designed for ATSC digital TV.

    Background of audio compression

    Audio compression takes advantage of two facts:
      - First, in typical audio signals, not all frequencies are simultaneously present.
      - Second, because of the phenomenon of masking, human hearing cannot perceive every detail of an audio signal.

    Audio compression splits the audio spectrum into bands by filtering or transforms, and uses less data to describe bands in which the level is low.

    Where masking prevents or reduces the audibility of a particular band, even less data needs to be sent.

    Background of audio compression (2)

    Audio compression is relatively harder than video compression because of the acuity of hearing.

    1. Masking:
      - Masking only works properly when the masking and the masked sounds coincide spatially.
      - Spatial coincidence is always the case in mono recordings but not in stereo recordings, where low-level signals can still be heard if they are in a different part of the soundstage.
      - Consequently, in stereo and surround sound systems, only a lower compression factor is allowable for a given quality.

    2. Speaker quality:
      - Delayed resonances in poor loudspeakers actually mask compression artifacts.
      - Testing a compressor with poor speakers gives a false result: signals that are apparently satisfactory may be disappointing when heard on good equipment.

    The characteristics of human hearing

    - The top figure shows that the threshold of hearing is a function of frequency. Naturally, the greatest sensitivity is in the speech range.
    - The bottom figure shows the hearing threshold in the presence of a single tone. Note that the threshold is raised for tones at higher frequencies and, to some extent, at lower frequencies: this is the masking effect.
    - A complex input spectrum, such as music, raises the threshold at nearly all frequencies. As a result, the hiss from an analog audio cassette is only audible during quiet passages in music.

    The characteristics of human hearing (2)

    - A sound must be present for at least about 1 millisecond before it becomes audible.
    - Because of this slow response, masking can still take place even when the two signals involved are not simultaneous.
    - Forward and backward masking occur when the masking sound continues to mask sounds at lower levels before and after the masking sound's actual duration.

    Masking

    - Masking raises the threshold of hearing. Compressors take advantage of this effect by raising the noise floor, which allows the audio waveform to be expressed with fewer bits.
    - The noise floor can only be raised at frequencies at which there is effective masking.
    - To maximize effective masking, it is necessary to split the audio spectrum into different frequency bands so that different amounts of companding and noise can be introduced in each band.
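The link between wordlength and noise floor can be checked numerically. The sketch below is only an illustration and is not taken from the slides: it requantizes a full-scale test tone with a plain uniform quantizer and measures the resulting signal-to-noise ratio. The helper names `quantize` and `snr_db`, the tone frequency, and the sample count are all assumptions made for the example.

```python
import math

def quantize(x, bits):
    """Round x in [-1, 1] to the nearest level of a uniform quantizer with the given wordlength."""
    step = 2.0 / (2 ** bits)
    return round(x / step) * step

def snr_db(bits, n=4096):
    """Measure the signal-to-noise ratio of a full-scale sine after requantizing it."""
    signal_power = 0.0
    noise_power = 0.0
    for i in range(n):
        x = math.sin(2 * math.pi * 37 * i / n)  # 37 cycles: an arbitrary test tone
        err = quantize(x, bits) - x             # quantization error for this sample
        signal_power += x * x
        noise_power += err * err
    return 10 * math.log10(signal_power / noise_power)

for bits in (16, 12, 8):
    print(bits, "bits:", round(snr_db(bits), 1), "dB")
```

Each bit removed from the wordlength should raise the measured noise floor by roughly 6 dB, which is the amount a coder can only afford in bands where masking hides the extra noise.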

    MPEG Audio: General encoder model

    [Figure: encoder block diagram. Blocks: Input, Sub-band Filter, Compute Masking, Bit Allocation, Bit-stream Generation, Output.]

    MPEG Audio encoding algorithm

    1. Use sub-band filters to divide the audio signal into 32 frequency sub-bands that approximate the 32 critical bands.
    2. Determine the amount of masking for each band caused by nearby bands, using the psychoacoustic model.
    3. If the power in a band is below the masking threshold, ignore it. Otherwise, determine the number of bits needed to represent the coefficient such that the noise introduced by quantization is below the masking effect (1 bit ≈ 6 dB).
    4. Generate the bitstream.
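The per-band decision in this algorithm can be sketched in a few lines. This is a simplified reading of the rule on the slide, not the allocation procedure of a real encoder. The function names and the example levels and thresholds are hypothetical, and 6.02 dB per bit is the usual approximation that the slide rounds to 6 dB.

```python
import math

DB_PER_BIT = 6.02  # approximate noise reduction per bit of wordlength

def bits_for_band(level_db, mask_db):
    """Return 0 if the band is masked, else the bits needed to push quantization noise below the mask."""
    if level_db < mask_db:
        return 0  # band is inaudible: do not code it
    return math.ceil((level_db - mask_db) / DB_PER_BIT)

def allocate(levels_db, masks_db):
    """Apply the rule independently to every band."""
    return [bits_for_band(level, mask) for level, mask in zip(levels_db, masks_db)]

# Hypothetical band levels and masking thresholds, in dB.
print(allocate([40, 5, 22], [10, 9, 22]))
```

A band whose level sits far above its masking threshold gets many bits, while a band at or below the threshold gets none.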

    MPEG Audio: Coding example

    After analysis, the levels of the first 16 of the 32 bands are:

    Band          1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16
    Level (dB)    0   8  12  10   6   2  10  60  35  20  15   2   3   5   3   1

    The level of the 8th band is 60 dB. If it gives a masking of 12 dB in the 7th band and 15 dB in the 9th, then:
      - The level in the 7th band is 10 dB (< 12 dB), so ignore it.
      - The level in the 9th band is 35 dB (> 15 dB), so encode it. It can be encoded with up to 2 bits (= 12 dB) of quantization error.
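The arithmetic of this example can be reproduced directly. The sketch below uses the band levels and masking values from the slide, with the slide's rule of thumb of 6 dB per bit. The function name `decide` is an assumption made for illustration.

```python
levels_db = [0, 8, 12, 10, 6, 2, 10, 60, 35, 20, 15, 2, 3, 5, 3, 1]  # bands 1..16
masking_db = {7: 12, 9: 15}  # masking caused by the 60 dB signal in band 8

def decide(level_db, mask_db, db_per_bit=6):
    """Return ('ignore', 0) for a masked band, else ('encode', whole bits of error the mask hides)."""
    if level_db < mask_db:
        return ("ignore", 0)
    return ("encode", mask_db // db_per_bit)

for band, mask in sorted(masking_db.items()):
    level = levels_db[band - 1]
    action, error_bits = decide(level, mask)
    print(f"band {band}: level {level} dB, mask {mask} dB -> {action}, {error_bits} bits of error tolerated")
```

For band 9, the 15 dB of masking covers two whole bits (12 dB) of quantization error, which matches the figure quoted on the slide.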

    Sub-band coding (SBC) - Companding

    - The figure shows a band-splitting compandor.
    - The band-splitting filter is a set of narrow-band, linear-phase filters that overlap and all have the same bandwidth.
    - The output in each band consists of samples representing a waveform.
    - In each frequency band, the audio input is amplified up to maximum level prior to transmission. Afterwards, each level is returned to its correct value.
    - Noise picked up in the transmission is reduced in each band.
    - If the noise reduction is compared with the threshold of hearing, it can be seen that greater noise can be tolerated in some bands because of masking.
    - Consequently, in each band after companding, it is possible to reduce the wordlength of samples.
    - This technique achieves compression because the noise introduced by the loss of resolution is masked.

    MPEG Audio Layer I

    - The figure shows a simplified band-splitting coder used in MPEG Layer I.
    - The digital audio input is fed to a band-splitting filter that divides the spectrum of the signal into a number of bands (32 bands).
    - The time axis is divided into blocks of equal length. In MPEG Layer I, a block is 384 input samples, so in the output of the filter there are 12 samples in each of 32 bands.
    - Within each band, the level is amplified by multiplication to bring the level up to maximum.
    - The gain required is constant for the duration of a block.
    - A single scale factor is transmitted with each block for each band in order to allow the process to be reversed at the decoder.
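The block-wise gain described here can be sketched as follows. This is a minimal illustration under stated assumptions: it uses the raw peak of the block as the scale factor, whereas a real Layer I encoder picks the scale factor from a predefined table. The function names and sample values are hypothetical.

```python
def compand_block(samples):
    """Scale one block of subband samples so that its peak reaches full scale (1.0)."""
    peak = max(abs(s) for s in samples)
    if peak == 0.0:
        return 1.0, list(samples)  # silent block: nothing to scale
    scale_factor = peak            # one gain value for the whole block
    return scale_factor, [s / scale_factor for s in samples]

def expand_block(scale_factor, scaled):
    """Decoder side: undo the gain using the transmitted scale factor."""
    return [s * scale_factor for s in scaled]

block = [0.01 * k for k in range(-6, 6)]  # 12 low-level samples, as in one Layer I band
sf, scaled = compand_block(block)
restored = expand_block(sf, scaled)
print("scale factor:", sf, "peak after scaling:", max(abs(s) for s in scaled))
```

Because the gain is constant over the 12 samples, a single number per band and block is enough for the decoder to restore the original level.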

    MPEG Audio Layer I (cont.)

    - The filter bank output is also analyzed to determine the spectrum of the input signal.
    - This analysis drives a masking model that determines the degree of masking that can be expected in each band.
    - The more masking available, the less accurate the samples in each band need to be.
    - The sample accuracy is reduced by requantizing to reduce wordlength.
    - This reduction is also constant for every word in a band, but different bands can use different wordlengths.
    - The wordlength needs to be transmitted as a bit allocation code for each band to allow the decoder to deserialize the bitstream properly.

    MPEG Layer I audio bit stream

    The top figure shows an MPEG Layer I audio bit stream, which includes:
      - A synchronizing pattern and the header.
      - 32 bit allocation codes of four bits each. These codes describe the wordlength of samples in each subband.
      - 32 scale factors used in the companding of each band. These scale factors determine the gain needed in the decoder to return the audio to the correct level.
      - The audio data in each band.
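The fields of this bit stream can be added up to see how a frame's bits are spent. The sketch below is a rough budget for a mono frame under several assumptions that are not stated on this slide: a 32-bit header, no CRC, 6 bits per transmitted scale factor, scale factors sent only for bands that carry samples, and wordlengths given directly rather than as 4-bit codes. The function name and the example wordlengths are hypothetical.

```python
HEADER_BITS = 32          # sync pattern plus header (assumed size)
ALLOC_BITS_PER_BAND = 4   # one 4-bit allocation code per subband
SCALE_FACTOR_BITS = 6     # assumption: 6 bits per transmitted scale factor
SUBBANDS = 32
SAMPLES_PER_BAND = 12

def layer1_frame_budget(wordlengths):
    """Count the bits of a mono frame given the wordlength of each subband (0 = band not sent)."""
    active = [w for w in wordlengths if w > 0]
    allocation = SUBBANDS * ALLOC_BITS_PER_BAND
    scale_factors = len(active) * SCALE_FACTOR_BITS
    samples = sum(w * SAMPLES_PER_BAND for w in active)
    return {
        "header": HEADER_BITS,
        "allocation": allocation,
        "scale_factors": scale_factors,
        "samples": samples,
        "total": HEADER_BITS + allocation + scale_factors + samples,
    }

# Hypothetical allocation: 8 bands at 8 bits, 8 bands at 4 bits, 16 bands skipped.
budget = layer1_frame_budget([8] * 8 + [4] * 8 + [0] * 16)
print(budget)
```

The allocation codes are a fixed cost of 128 bits per channel, while the scale factor and sample fields grow with the number of bands that are actually coded.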

    MPEG Layer I decoder

    - The synchronization pattern is detected by the timing generator, which deserializes the bit allocation and scale factor data.
    - The bit allocation data then allows deserialization of the variable-length samples.
    - The requantizing is reversed, and the compression is reversed by the scale factor data to put each band back to the correct level.
    - These 32 separate bands are then combined in a combiner filter, which produces the audio output.

    MPEG Audio: The concept of Layers

    - There are three layers in MPEG audio: Layer I, II, and III.
    - The basic model is similar for all layers.
    - CODEC complexity increases with each layer.
    - A decoder for a higher layer can also decode streams of the lower layers (e.g. a Layer III decoder can decode a Layer II stream).
    - A psychoacoustic model is used to determine the bit allocation for each subband.

    Compression ratios (the original bit rate is 1.4 Mbps for CD-quality audio):
      - 1:4 by Layer I (corresponds to 384 kbps for a stereo signal)
      - 1:6 to 1:8 by Layer II (corresponds to 256 to 192 kbps for a stereo signal)
      - 1:10 to 1:12 by Layer III (corresponds to 128 to 112 kbps for a stereo signal)
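The quoted ratios follow from the CD bit rate. The short calculation below assumes the standard CD parameters (44.1 kHz, 16 bits, two channels), which give the 1.4 Mbps figure. The helper name `ratio` is an assumption.

```python
CD_SAMPLE_RATE = 44_100  # Hz
CD_BITS = 16             # bits per sample
CD_CHANNELS = 2

cd_bitrate = CD_SAMPLE_RATE * CD_BITS * CD_CHANNELS  # bits per second

def ratio(coded_kbps):
    """Compression ratio of a coded stream relative to CD audio."""
    return cd_bitrate / (coded_kbps * 1000)

for layer, kbps in (("Layer I", 384), ("Layer II", 256), ("Layer II", 192),
                    ("Layer III", 128), ("Layer III", 112)):
    print(f"{layer} at {kbps} kbps: about 1:{ratio(kbps):.1f}")
```

The computed values are close to, but not exactly, the round ratios on the slide, which are approximations.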

    MPEG Audio: Filter type

    Layer I:
      - DCT-type filter with one frame and equal frequency spread per band.
      - The psychoacoustic model only uses frequency masking.

    Layer II:
      - Uses three frames in the filter (1152 samples in total).
      - The psychoacoustic model uses a little temporal masking.

    Layer III:
      - Uses a better critical-band filter (non-equal frequency bands).
      - The psychoacoustic model includes temporal masking effects.
      - Takes stereo redundancy into account.
      - Uses a Huffman coder.

    MPEG-1 Audio Encoder (Layer I & II)

    [Figure: encoder block diagram. Blocks: PCM input, Analysis filter bank (32 subbands, 0 to 31), Scaler, Quantizer, Quantized sample encoder, Psychoacoustic model, Bit-rate allocation, Bit-rate allocation encoder, Scale factor encoder, Multiplexer, Output. Signals: SFn, Rn, SMRn.]

    SF = Scale factor, R = Rate, SMR = Signal to Mask Ratio

    MPEG-1 Audio Encoder (cont.)

    - The input audio stream passes through a filter bank that divides the input into multiple frequency subbands.
    - The input audio stream simultaneously passes through a psychoacoustic model that determines the ratio of the signal energy to the masking threshold for each subband.
    - The bit or noise allocation block uses the signal-to-mask ratios to decide how to apportion the total number of code bits available for the quantization of the subband signals, so as to minimize the audibility of the quantization noise.
    - Finally, the multiplexer takes the representation of the quantized subband samples and formats this data and side information into a coded bitstream.
    - Ancillary data not necessarily related to the audio stream can be inserted within the coded bitstream.
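One common way to apportion a fixed pool of bits from the signal-to-mask ratios is a greedy loop that repeatedly gives one more bit to the band whose quantization noise is currently the most audible. The sketch below illustrates that idea only. It is a simplification and an assumption, not the exact procedure of the standard: it ignores the cost of scale factors and assumes every extra bit of wordlength lowers the noise by about 6.02 dB. The function name and example values are hypothetical.

```python
DB_PER_BIT = 6.02      # approximate noise reduction per extra bit
SAMPLES_PER_BAND = 12  # Layer I: one extra bit of wordlength costs 12 bits per band

def allocate_bits(smr_db, bit_pool, max_bits=15):
    """Greedy allocation: keep giving one more bit to the band whose noise is most audible."""
    bits = [0] * len(smr_db)
    while bit_pool >= SAMPLES_PER_BAND:
        # Noise-to-mask ratio of each band at its current wordlength.
        nmr = [smr - DB_PER_BIT * b for smr, b in zip(smr_db, bits)]
        candidates = [i for i in range(len(bits)) if bits[i] < max_bits and nmr[i] > 0]
        if not candidates:
            break  # all remaining noise is already below the masking threshold
        worst = max(candidates, key=lambda i: nmr[i])
        bits[worst] += 1
        bit_pool -= SAMPLES_PER_BAND
    return bits

# Hypothetical signal-to-mask ratios (dB) for four bands.
print(allocate_bits([20.0, -3.0, 9.0, 14.0], bit_pool=120))
```

When the pool is large enough, the loop stops as soon as every band's noise is masked. When it is too small, the bits go first to the bands with the highest signal-to-mask ratio.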

    MPEG Audio: Subband sample grouping

    - Layer I: 12 * 32 = 384 samples
    - Layer II, III: 12 * 3 * 32 = 1152 samples

    [Figure: audio samples in, feeding subband filters 1 to 32, each producing groups of 12 samples. One group of 12 samples from every subband forms a Layer I frame; three groups from every subband form a Layer II or III frame.]
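The sample counts above also fix how much audio one frame covers. The small calculation below derives the frame length and its duration. The 48 kHz sampling rate is just one of the rates MPEG-1 supports, and the function names are assumptions.

```python
SUBBANDS = 32
GROUP = 12  # samples per subband in one group

def samples_per_frame(layer):
    """Layer I frames hold one group per subband; Layer II and III frames hold three."""
    groups = 1 if layer == 1 else 3
    return GROUP * groups * SUBBANDS

def frame_duration_ms(layer, sample_rate_hz):
    """Time span of audio covered by one frame, in milliseconds."""
    return 1000.0 * samples_per_frame(layer) / sample_rate_hz

for layer in (1, 2, 3):
    print("Layer", layer, samples_per_frame(layer), "samples,",
          frame_duration_ms(layer, 48_000), "ms at 48 kHz")
```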

    Psychoacoustic model: Layer I & II

    The separator identifies and separates the tonal and noise-like (non-tonal) components of the audio signal, because the masking abilities of the two types of signal differ.

    [Figure: block diagram. Blocks: PCM input, Fast Fourier Transform (FFT) with 512 or 1024 frequencies, Tonal/nontonal separator, Compute tonal masking threshold function, Compute nontonal masking threshold function, Compute quiet threshold, Masking threshold function, Calculate minimum, Compute signal power. Signals: Mn, Sn, SMRn.]
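The last stage of this model, from the masking threshold function to SMRn, can be sketched as below. The sketch rests on assumptions that the slide does not spell out: that the masking threshold is already available per frequency line in dB, that the lines divide evenly among the subbands, that Mn is the minimum threshold inside subband n, and that SMRn = Sn - Mn. The function names and toy data are hypothetical.

```python
def minimum_mask_per_subband(mask_db, subbands):
    """Take the lowest masking threshold among the frequency lines that fall in each subband."""
    lines_per_band = len(mask_db) // subbands
    return [min(mask_db[n * lines_per_band:(n + 1) * lines_per_band])
            for n in range(subbands)]

def signal_to_mask_ratio(signal_db, mask_db):
    """SMRn = Sn - Mn, with all levels in dB."""
    minima = minimum_mask_per_subband(mask_db, subbands=len(signal_db))
    return [s - m for s, m in zip(signal_db, minima)]

# Toy data: 2 subbands with 4 frequency lines each (a real model uses 32 subbands).
mask = [30.0, 25.0, 28.0, 40.0, 12.0, 15.0, 10.0, 11.0]
signal = [35.0, 8.0]
print(signal_to_mask_ratio(signal, mask))
```

A positive SMR means the band's signal rises above its masking threshold and needs bits, while a negative SMR means the band is masked.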

    MPEG-1 Audio Layer III Encoder (mp3)

    [Figure: encoder block diagram. Blocks: PCM input, Analysis filter bank (32 subbands, 0 to 31), MDCT (sub-subbands), Scaler, Quantizer, Quantized sample Huffman encoder, Buffer, Psychoacoustic model, "Calculate window sizes, scale factor bands, bit rate allocation and quantization, taking buffer fullness into account", Side information encoder, Scale factor encoder, Multiplexer, Output. Signals: SMRn, Scale_factors, Side information, Buffer fullness.]

    Frame formats of 3 layers

    - SCFSI = Scale Factor Selection Information.
    - The side information of an mp3 frame is 17 bytes (136 bits) in single-channel mode and 32 bytes (256 bits) in dual-channel mode.
    - The CRC is optional.
    - While a Layer I frame contains only 384 samples, Layer II and Layer III frames contain 1152 samples.
    - The main data of an mp3 frame may contain data of neighboring frames (see next slide).

    [Figure: frame layouts, field sizes in bits]
    Layer I:    Header (32) | CRC (0,16) | Bit Allocation (128,256) | Scale factor (0-384) | Samples | Ancillary data
    Layer II:   Header (32) | CRC (0,16) | Bit Allocation (128,256) | SCFSI (0-60) | Scale factor (0-384) | Samples | Ancillary data
    Layer III:  Header (32) | CRC (0,16) | Side information (136, 256) | Main Data (may belong to other frames) | Ancillary data

    MP3 frame

    The main data section contains the coded scale factor values and the Huffman-coded frequency lines.
        Its length depends on the bitrate and on the length of the ancillary data.
        The length of the scale factor part depends on whether scale factors are reused, and also on the window length (short or long).
        The scale factors are used in the requantization of the samples.
    The demand for Huffman code bits varies with time during the coding process.
        The variable bitrate (VBR) format can be used to handle this, but a fixed bitrate is often required for applications such as broadcasting.
        Therefore there is also a bit reservoir technique, which allows unused main data storage in one frame to be used by up to two following frames.
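    The dependence of the main data length on the bitrate can be sketched with a little arithmetic. A frame of 1152 samples lasts 1152 / sample_rate seconds, so at a fixed bitrate it occupies 144 * bitrate / sample_rate bytes. The sketch below assumes MPEG-1 Layer III at a constant bitrate, treats ancillary data as part of the remaining space, and uses hypothetical function names.

```python
HEADER_BYTES = 4                    # 32-bit header
CRC_BYTES = 2                       # 16-bit CRC, present only when enabled
SIDE_INFO_BYTES = {1: 17, 2: 32}    # one channel / two channels
SAMPLES_PER_FRAME = 1152


def frame_length_bytes(bitrate_bps, sample_rate_hz, padding=False):
    """Whole bytes in one constant-bitrate frame (one extra byte if padded)."""
    base = (SAMPLES_PER_FRAME // 8) * bitrate_bps // sample_rate_hz
    return base + (1 if padding else 0)


def main_data_slot_bytes(bitrate_bps, sample_rate_hz, channels,
                         crc=False, padding=False):
    """Bytes left in the frame after header, optional CRC and side information.

    This space is shared by main data and any ancillary data.
    """
    overhead = HEADER_BYTES + (CRC_BYTES if crc else 0) + SIDE_INFO_BYTES[channels]
    return frame_length_bytes(bitrate_bps, sample_rate_hz, padding) - overhead


if __name__ == "__main__":
    print(frame_length_bytes(128000, 44100))        # 417
    print(main_data_slot_bytes(128000, 44100, 2))   # 381
```

    For example, a 128 kbit/s stereo frame at 44.1 kHz without CRC is 417 bytes long, of which 381 bytes remain for main and ancillary data.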

    MP3 frame - Bit Reservoir

    The design of the Layer III bitstream better fits the encoder's time-varying demand for code bits.
    As with Layer II, Layer III processes the audio data in frames of 1,152 samples.
    Unlike Layer II, the coded data representing these samples do not necessarily fit into a fixed-length frame in the coded bitstream.
    The encoder can donate bits to a reservoir when it needs fewer than the average number of bits to code a frame.
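    The reservoir bookkeeping can be sketched as a small simulation: a frame that needs less than its own slot leaves the remainder for later frames, and a later frame may use its slot plus whatever earlier frames left behind. This is a simplified model, not an encoder. The function name and the byte-level granularity are assumptions; the 511-byte cap follows from the 9-bit `main_data_begin` pointer in the side information, assuming it counts bytes.

```python
MAX_RESERVOIR_BYTES = 2 ** 9 - 1   # largest value a 9-bit pointer can hold


def simulate_reservoir(slot_bytes, demands):
    """Return (main_data_begin, bytes_granted) for each frame.

    slot_bytes: main data space that each frame contributes.
    demands:    bytes each frame would like to spend, in coding order.
    """
    reservoir = 0
    result = []
    for demand in demands:
        # The frame's data starts this many bytes before its own slot,
        # inside space donated by earlier frames only.
        main_data_begin = reservoir
        available = slot_bytes + reservoir
        granted = min(demand, available)       # cannot borrow from the future
        reservoir = min(available - granted, MAX_RESERVOIR_BYTES)
        result.append((main_data_begin, granted))
    return result


if __name__ == "__main__":
    # Two easy frames donate bytes; the third, harder frame borrows them.
    print(simulate_reservoir(381, [300, 300, 500, 400]))
    # [(0, 300), (81, 300), (162, 500), (43, 400)]
```

    In this run the third frame spends 500 bytes although its own slot holds only 381, because the first two frames left 162 bytes unused.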

    MP3 frame - Bit Reservoir (2)

    Later, when the encoder needs more than the average number of bits to code a frame, it can borrow bits from the reservoir.
    The encoder can only borrow bits donated from past frames; it cannot borrow from future frames.
    The MP3 bitstream includes a 9-bit pointer, "main_data_begin", in each frame's side information. It points to the location of the starting byte of the audio data for that frame.

    MP3: Hybrid frequency analysis

    Purpose
        Increase the frequency resolution within the subbands for better perceptual coding.
        Allow for some cancellation of the aliasing caused by the polyphase analysis subband filters.
    MDCT (Modified Discrete Cosine Transform)
        50% overlapped transform.
        Short-window MDCT: 6 sub-subbands (12-point DCT) in each subband. Better time resolution.
        Long-window MDCT: 18 sub-subbands (36-point DCT) in each subband. Better frequency resolution.
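    The window sizes above can be illustrated with a textbook MDCT: a block of N samples yields N/2 coefficients, so 36 inputs give 18 values and 12 inputs give 6. The sketch below is a direct, unoptimized implementation with a sine window, written for this example. It shows how 50% overlap lets consecutive blocks cancel each other's time-domain aliasing. It does not model MP3's block switching, its other window shapes, or the separate alias-reduction step between polyphase subbands.

```python
import math


def sine_window(n_len):
    """Sine window; squares of samples half a block apart sum to 1."""
    return [math.sin(math.pi * (n + 0.5) / n_len) for n in range(n_len)]


def mdct(block):
    """N input samples -> N/2 MDCT coefficients."""
    n_len = len(block)
    half = n_len // 2
    n0 = 0.5 + half / 2
    return [sum(block[n] * math.cos(math.pi / half * (n + n0) * (k + 0.5))
                for n in range(n_len))
            for k in range(half)]


def imdct(coeffs):
    """N/2 coefficients -> N samples that still contain time-domain aliasing."""
    half = len(coeffs)
    n0 = 0.5 + half / 2
    return [(2.0 / half) * sum(coeffs[k] * math.cos(math.pi / half * (n + n0) * (k + 0.5))
                               for k in range(half))
            for n in range(2 * half)]


def overlap_add_middle(signal, n_len):
    """Code two 50%-overlapped blocks and rebuild the half they share."""
    half = n_len // 2
    w = sine_window(n_len)
    outputs = []
    for start in (0, half):
        block = signal[start:start + n_len]
        y = imdct(mdct([w[n] * block[n] for n in range(n_len)]))
        outputs.append([w[n] * y[n] for n in range(n_len)])
    # Second half of block 0 plus first half of block 1: the aliasing cancels.
    return [outputs[0][half + n] + outputs[1][n] for n in range(half)]


if __name__ == "__main__":
    print(len(mdct([0.0] * 36)), len(mdct([0.0] * 12)))   # 18 6
    signal = [math.sin(0.3 * n) + 0.1 * n for n in range(54)]
    rebuilt = overlap_add_middle(signal, 36)
    print(max(abs(rebuilt[n] - signal[18 + n]) for n in range(18)) < 1e-9)
```

    The last line checks that the 18 samples shared by the two long blocks are recovered to within rounding error, even though each block alone produces only 18 coefficients for 36 inputs.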

    MP3 Decoder

    [Figure: MP3 decoder]

    MP3 Performance

    Sound quality     Bandwidth   Mode     Bitrate          Reduction ratio
    Telephone sound   2.5 kHz     mono     8 kbps *         96:1
    Short wave        4.5 kHz     mono     16 kbps          48:1
    AM radio          7.5 kHz     mono     32 kbps          24:1
    FM radio          11 kHz      stereo   56...64 kbps     26...24:1
    Near-CD           15 kHz      stereo   96 kbps          16:1
    CD                >15 kHz     stereo   112...128 kbps   14...12:1
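    The reduction ratios in the table can be reproduced, approximately, by dividing an uncompressed PCM bitrate by the coded bitrate. The table does not state its reference format; the sketch below assumes 16-bit PCM at 48 kHz, which is an inference from the numbers (for example 768 kbit/s / 8 kbit/s = 96 for mono). The function name is made up for this example.

```python
def reduction_ratio(bitrate_kbps, channels,
                    sample_rate_hz=48000, bits_per_sample=16):
    """Ratio of the uncompressed PCM bitrate to the coded bitrate."""
    pcm_kbps = sample_rate_hz * bits_per_sample * channels / 1000
    return pcm_kbps / bitrate_kbps


if __name__ == "__main__":
    rows = [("Telephone sound", 8, 1), ("Short wave", 16, 1),
            ("AM radio", 32, 1), ("FM radio", 64, 2),
            ("Near-CD", 96, 2), ("CD", 128, 2)]
    for name, kbps, channels in rows:
        print(f"{name}: {reduction_ratio(kbps, channels):.1f}:1")
```

    Under this assumption most rows match the table exactly. The 56 kbps stereo case comes out near 27:1 rather than the tabulated 26:1, so the table values are best read as rounded figures.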

    MPEG-2 Audio

    Differences between MPEG-1 and MPEG-2 audio for two-channel stereo:
        The initial PCM sampling rates are extended to include 16, 22.05, and 24 kHz.
        The pre-assigned bitrates are extended to as low as 8 kbit/s.
        Better quantization tables are provided for the lower rates.
        The coding efficiency of scale_factor and intensity_mode stereo in Layer III is improved.
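    The added sampling rates are exactly half of the MPEG-1 rates, which also halves the highest audio frequency that can be represented. A minimal sketch, with constant and function names invented for this example:

```python
MPEG1_RATES_HZ = (32000, 44100, 48000)
MPEG2_LOW_RATES_HZ = (16000, 22050, 24000)   # rates added by MPEG-2


def classify_sampling_rate(rate_hz):
    """Tell which set of rates a given sampling rate belongs to."""
    if rate_hz in MPEG1_RATES_HZ:
        return "MPEG-1"
    if rate_hz in MPEG2_LOW_RATES_HZ:
        return "MPEG-2 lower sampling rate"
    raise ValueError(f"unsupported sampling rate: {rate_hz}")


def max_audio_bandwidth_hz(rate_hz):
    """Nyquist limit: half the sampling rate."""
    return rate_hz / 2


if __name__ == "__main__":
    for rate in MPEG2_LOW_RATES_HZ:
        print(rate, classify_sampling_rate(rate), max_audio_bandwidth_hz(rate))
```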

    MPEG-2 Audio: Backward Compatible (BC)

    Defines five-channel surround sound:
        Front left (L), front right (R), front center (C), side/rear left (LS), side/rear right (RS), and an optional low-frequency enhancement (LFE) channel.
        An MPEG-1 decoder can decode the L and R signals.
    Coding method:
        The L and R channels are coded as in MPEG-1.
        The additional channels are coded as ancillary data in the MPEG-1 audio stream.
    3/2 stereo: L, R, C, LS, RS
    5.1-channel stereo: L, R, C, LS, RS, LFE
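    The backward-compatible split can be pictured as dividing each channel layout into the part an MPEG-1 decoder understands and the part carried as an extension. The sketch below only models the channel labels from the slide; the dictionary and function names are hypothetical, and no actual bitstream handling is shown.

```python
CHANNEL_LAYOUTS = {
    "3/2 stereo": ("L", "R", "C", "LS", "RS"),
    "5.1 channel stereo": ("L", "R", "C", "LS", "RS", "LFE"),
}
MPEG1_CHANNELS = ("L", "R")   # what an MPEG-1 decoder can decode


def split_for_backward_compatibility(layout_name):
    """Return (channels seen by an MPEG-1 decoder, extension channels)."""
    channels = CHANNEL_LAYOUTS[layout_name]
    basic = tuple(c for c in channels if c in MPEG1_CHANNELS)
    extension = tuple(c for c in channels if c not in MPEG1_CHANNELS)
    return basic, extension


if __name__ == "__main__":
    for name in CHANNEL_LAYOUTS:
        print(name, split_for_backward_compatibility(name))
```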

    MPEG-2 Audio frame

    As the frame layout shows, the MPEG-2 Audio frame is an extension of the MPEG-1 frame that supports multi-channel and multi-lingual audio.

    MPEG-1 frame fields:  Header | CRC | Bit Allocation | SCFSI | Scale factor | Samples | Ancillary data 1
    Multi-Channel (MC) audio data information:  MC Header | MC CRC | MC Bit Allocation | MC SCFSI | MC Predictor | MC Samples | Multi-lingual Commentary | Ancillary data 2

    MPEG-2 BC and MPEG-1 compatibility

    Standard   Part            Channels        Sampling rates (kHz)   Layers
    MPEG-1                     Mono & Stereo   32, 44.1, 48           Layer I, Layer II, Layer III
    MPEG-2     Low Frequency   Mono & Stereo   16, 22.05, 24          Layer I, Layer II, Layer III
    MPEG-2     Multi-Channel   5 channels      32, 44.1, 48           Layer I, Layer II, Layer III

    MPEG-2 Audio: Advanced Audio Coding (AAC)

    Aims to further improve the quality of compressed audio using state-of-the-art technologies.
    It was designated MPEG-2 NBC (Non-Backward-Compatible).
    Initial PCM sampling rate: 8 kHz to 96 kHz.
    Supports from mono up to 48 audio channels.

    Key Points

    MPEG Audio specifications
    MPEG Audio mechanism
        Human hearing & audio masking
        Sub-band coding (SBC) mechanism
        Psychoacoustic model
        The concept of layers
    MPEG-1 Audio encoding/decoding
        Layer I
        Layer II
        Layer III
        Differences
    MPEG-2 Audio BC
    MPEG-2 AAC (NBC)