Lecture 16 (104 Slides)


TRANSCRIPT

  • Slide 1/104

    Digital Speech Processing

    Speech Coding Methods Based on Speech Models

  • Slide 2/104

    Waveform Coding versus Block Processing

    Waveform coding: sample-by-sample matching of waveforms; coding quality measured using SNR

    Source modeling (block processing): block processing of the signal => a vector of outputs for every block; blocks may be overlapped

    (figure: overlapped analysis blocks: Block 1, Block 2, Block 3)

  • Slide 3/104

    so far we have carried out waveform coding based on optimizing and maximizing SNR

    achieved bit rate reductions on the order of 4:1 (i.e., from 128 Kbps PCM to 32 Kbps ADPCM) while achieving toll quality SNR for telephone-bandwidth speech

    to lower the rate further without reducing speech quality, we need to exploit features of the speech production model, including:
      source modeling
      spectrum modeling
      use of codebook methods for coding efficiency

    we also need a new way of comparing performance of different waveform and model-based coding methods: SNR is not meaningful for model-based coders since they operate on blocks of speech and don't follow the waveform on a sample-by-sample basis

    new subjective measures need to be used that measure user-perceived quality

  • Slide 4/104

    Enhancements for ADPCM Coders
      pitch prediction
      noise shaping

    Analysis-by-Synthesis Speech Coders
      multipulse linear prediction coder (MPLPC)
      code-excited linear prediction (CELP)

    Open-Loop Speech Coders
      two-state excitation model
      LPC vocoder
      residual-excited linear predictive coder
      mixed excitation systems

    speech coding quality measures - MOS

    speech coding standards

  • Slide 5/104

    Differential Quantization

    (figure: DPCM encoder/decoder block diagram with input $x[n]$, difference signal $d[n]$, quantized difference $\hat{d}[n]$, codewords $c[n]$, prediction $\tilde{x}[n]$, and reconstructed signal $\hat{x}[n]$)

    $P(z)$: simple predictor of the vocal tract response,

    $P(z) = \sum_{k=1}^{p} \alpha_k z^{-k}$

  • Slide 6/104

    Issues with Differential Quantization

    the difference signal retains the character of the excitation signal: it switches back and forth between quasi-periodic and noise-like signals

    the prediction duration (even when using p = 20) is on the order of 2.5 msec (for an 8 kHz sampling rate); the predictor is predicting the vocal tract response, not the excitation period (for voiced sounds)

    Solution: incorporate two stages of prediction, namely a short-time predictor for the vocal tract response and a long-time predictor for the pitch period

  • Slide 7/104

    Pitch Prediction

    (figure: transmitter and receiver block diagrams with the cascaded predictors $P_1(z)$ and $P_2(z)$ operating on the input $x[n]$ to produce the residual, and the combined predictor $P_c(z)$)

    first stage pitch predictor:

    $P_1(z) = \beta\, z^{-M}$

    second stage linear predictor (vocal tract predictor):

    $P_2(z) = \sum_{k=1}^{p} \alpha_k z^{-k}$

  • Slide 8/104

    first stage pitch predictor:

    $P_1(z) = \beta\, z^{-M}$

    this predictor model assumes that the pitch period, $M$, is an integer number of samples and that $\beta$ is a gain constant allowing for variations in pitch period over time (for unvoiced or background frames, the values of $\beta$ and $M$ are irrelevant)

    a more general pitch predictor is of the form:

    $P_1(z) = \sum_{k=-1}^{1} \beta_k\, z^{-(M+k)} = \beta_{-1} z^{-(M-1)} + \beta_0 z^{-M} + \beta_1 z^{-(M+1)}$

    this more advanced form provides a way to handle a non-integer pitch period through interpolation around the nearest integer pitch period value, $M$

  • Slide 9/104

    The combined inverse system in the decoder is the cascade:

    $H_c(z) = \frac{1}{1 - P_c(z)} = \frac{1}{1 - P_1(z)} \cdot \frac{1}{1 - P_2(z)}$

    so that

    $1 - P_c(z) = [1 - P_1(z)][1 - P_2(z)]$

    $P_c(z) = P_1(z) + P_2(z)[1 - P_1(z)]$

    which is implemented as a parallel combination of the two predictors $P_1(z)$ and $P_2(z)[1 - P_1(z)]$.

    The prediction signal, $\tilde{x}[n]$, can then be expressed as

    $\tilde{x}[n] = \beta\, x[n-M] + \sum_{k=1}^{p} \alpha_k \big( x[n-k] - \beta\, x[n-k-M] \big)$
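    As an illustration of the parallel-form predictor above, here is a minimal NumPy sketch (not from the lecture) that computes the prediction signal, assuming the pitch lag M, pitch gain beta, and vocal-tract coefficients alpha_k are already known; names are illustrative.

```python
import numpy as np

def combined_prediction(x, alphas, beta, M):
    """Prediction signal of the cascaded pitch + vocal-tract predictor:
    x_tilde[n] = beta*x[n-M] + sum_k alphas[k-1]*(x[n-k] - beta*x[n-M-k])."""
    x = np.asarray(x, dtype=float)
    # long-term (pitch) prediction term: beta * x[n - M]
    x_pitch = np.zeros_like(x)
    x_pitch[M:] = beta * x[:-M]
    # short-term predictor applied to the pitch-prediction residual r[n]
    r = x - x_pitch
    x_tilde = x_pitch.copy()
    for k, a in enumerate(alphas, start=1):
        x_tilde[k:] += a * r[:-k]
    return x_tilde
```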

  • Slide 10/104

  • Slide 11/104

  • Slide 12/104

    first search for the value of $M$ that maximizes the correlation measure, giving $M_{opt}$

    solve for a more accurate pitch predictor by minimizing the variance of the expanded pitch predictor

    solve for the optimum vocal tract predictor coefficients, $\alpha_k,\ k = 1, 2, \ldots, p$
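    A minimal sketch of the first step (searching for the pitch lag M), assuming a single-tap pitch predictor $\beta z^{-M}$ and a search range typical of 8 kHz speech; the function name and range are illustrative, not from the lecture.

```python
import numpy as np

def pitch_lag_search(d, M_min=20, M_max=147):
    """Find the lag M maximizing (sum d[n]d[n-M])^2 / sum d[n-M]^2, i.e. the
    lag giving the largest reduction in prediction-error variance; also return
    the corresponding single-tap gain beta."""
    d = np.asarray(d, dtype=float)
    best_M, best_score, best_beta = M_min, -np.inf, 0.0
    for M in range(M_min, M_max + 1):
        num = np.dot(d[M:], d[:-M])        # sum d[n] d[n-M]
        den = np.dot(d[:-M], d[:-M])       # sum d[n-M]^2
        if den <= 0.0:
            continue
        score = num * num / den            # variance reduction for this lag
        if score > best_score:
            best_M, best_score, best_beta = M, score, num / den
    return best_M, best_beta
```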

  • Slide 13/104

    Pitch Prediction

    (figure: comparison of vocal tract prediction alone versus combined pitch and vocal tract prediction)

  • Slide 14/104

    Noise Shaping in DPCM Systems

  • Slide 15/104

  • Slide 16/104

    Noise Shaping

    (figures: (a) basic ADPCM encoder and decoder; (b) equivalent ADPCM encoder and decoder; (c) noise shaping ADPCM encoder and decoder)

  • Slide 17/104

    Noise Shaping

    The equivalence of parts (b) and (a) is shown by the following. From part (a) we see that:

    $\hat{x}[n] = x[n] + e[n] \;\Leftrightarrow\; \hat{X}(z) = X(z) + E(z)$

    with

    $D(z) = X(z) - P(z)\hat{X}(z) = [1 - P(z)]X(z) - P(z)E(z)$

    $\hat{D}(z) = D(z) + E(z) = [1 - P(z)]X(z) + [1 - P(z)]E(z)$

    showing the equivalence of parts (b) and (a). Further, since

    $\hat{X}(z) = H(z)\hat{D}(z) = \frac{1}{1 - P(z)}\hat{D}(z) = \frac{1}{1 - P(z)}\big([1 - P(z)]X(z) + [1 - P(z)]E(z)\big) = X(z) + E(z)$

    feeding back the quantization error through the predictor ensures that the reconstructed signal, $\hat{x}[n]$, differs from $x[n]$ only by the quantization error, $e[n]$, incurred in quantizing the difference signal, $d[n]$.

  • Slide 18/104

    Shaping the Quantization Noise

    To shape the quantization noise we simply replace the feedback filter $P(z)$ by a different system function, $F(z)$, giving the reconstructed signal as:

    $X'(z) = H(z)\hat{D}'(z) = \frac{1}{1 - P(z)}\hat{D}'(z) = X(z) + \frac{1 - F(z)}{1 - P(z)}E'(z)$

    Thus if $x[n]$ is coded by this encoder, the $z$-transform of the reconstructed signal at the receiver is:

    $X'(z) = X(z) + \tilde{E}'(z), \qquad \tilde{E}'(z) = \frac{1 - F(z)}{1 - P(z)}E'(z)$

    where $\frac{1 - F(z)}{1 - P(z)}$ is the effective noise shaping filter.

  • Slide 19/104

    Noise shaping filter options:

    1. $F(z) = 0$: if we assume the quantization noise has a flat spectrum, then the noise and speech spectrum have the same shape

    2. $F(z) = P(z)$: then the equivalent system is the standard DPCM system, where $\tilde{E}'(z) = E'(z)$ with a flat noise spectrum

    3. $F(z) = P(z/\gamma) = \sum_{k=1}^{p} \gamma^{k}\alpha_k z^{-k}$ with $\gamma < 1$: this choice "hides" the noise beneath the spectral peaks of the speech signal; the zeros of $1 - F(z)$ have the same angles in the $z$-plane as those of $1 - P(z)$, but with a radius reduced by the factor $\gamma$.

  • Slide 20/104

    Noise Shaping Filter

  • Slide 21/104

    Noise Shaping Filter

    If $\sigma_{e'}^2$ is the power of the quantization noise $e'[n]$, then the power spectrum of the shaped noise is of the form:

    $P_{\tilde{e}'}\big(e^{j2\pi F/F_S}\big) = \sigma_{e'}^2\,\frac{\big|1 - F\big(e^{j2\pi F/F_S}\big)\big|^2}{\big|1 - P\big(e^{j2\pi F/F_S}\big)\big|^2}$

    (figure: speech spectrum with the shaped noise spectrum overlaid)
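    To make the formula concrete, here is a small sketch (assuming SciPy is available, and assuming the bandwidth-expanded choice $F(z) = P(z/\gamma)$ from the options above) that evaluates the shaped-noise power spectrum on a frequency grid.

```python
import numpy as np
from scipy.signal import freqz

def shaped_noise_spectrum(alphas, sigma2_e, gamma=0.85, fs=8000, nfft=512):
    """Shaped-noise power spectrum sigma_e'^2 * |1-F|^2 / |1-P|^2, assuming
    F(z) = P(z/gamma), i.e. the k-th predictor coefficient scaled by gamma**k."""
    a = np.asarray(alphas, dtype=float)
    k = np.arange(1, len(a) + 1)
    one_minus_P = np.concatenate(([1.0], -a))                # 1 - P(z)
    one_minus_F = np.concatenate(([1.0], -a * gamma**k))     # 1 - P(z/gamma)
    f, HP = freqz(one_minus_P, worN=nfft, fs=fs)
    _, HF = freqz(one_minus_F, worN=nfft, fs=fs)
    return f, sigma2_e * np.abs(HF)**2 / np.abs(HP)**2
```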

  • Slide 22/104

    Fully Quantized Adaptive Predictive Coder

  • Slide 23/104

    Input is x[n]

    $P_2(z)$ is the short-term (vocal tract) predictor

    Signal v[n] is the short-term prediction error

    Goal of the encoder is to obtain a quantized representation of this excitation

  • Slide 24/104

    Total bit rate for the ADPCM coder:

    $I_{ADPCM} = B \cdot F_S + B_{\Delta} \cdot F_{\Delta} + B_P \cdot F_P$

    where $B$ is the number of bits for the quantization of the difference signal, $B_{\Delta}$ is the number of bits for encoding the step size at frame rate $F_{\Delta}$, and $B_P$ is the total number of bits allocated to the predictor coefficients (both long- and short-term) with frame rate $F_P$.

    Typically $F_S = 8000$ Hz, so even with $B = 1$ to $4$ bits we need between 8000 and 32,000 bps for quantization of the difference signal.

    Typically we need about 3000-4000 bps for the side information (step size and predictor coefficients).

    Overall we need between 11,000 and 36,000 bps for a fully quantized system.
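    A one-line check of the bit-rate bookkeeping above; the side-information defaults are illustrative values in the 3000-4000 bps range mentioned on the slide, not a specific standard.

```python
def adpcm_bit_rate(B, F_S=8000, B_delta=6, F_delta=100, B_P=30, F_P=100):
    """Total ADPCM bit rate I = B*F_S + B_delta*F_delta + B_P*F_P (bps)."""
    return B * F_S + B_delta * F_delta + B_P * F_P

# adpcm_bit_rate(B=1) -> 11600 bps, adpcm_bit_rate(B=4) -> 35600 bps,
# consistent with the 11,000-36,000 bps range quoted above
```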

  • Slide 25/104

    Bit Rate for LP Coding

    speech and residual sampling rate: F_s = 8 kHz
    LP analysis frame rate: F_Δ = F_P = 50-100 frames/sec
    quantizer step size: 6 bits/frame
    predictor parameters:
      M (pitch period): 7 bits/frame
      vocal tract predictor coefficients: PARCORs 16-20, 46-50 bits/frame
    prediction residual: 1-3 bits/sample
    total bit rate: BR = 72*F_P + F_s (minimum)

  • Slide 26/104

    Two-Level (B = 1 bit) Quantizer

    (figure: prediction residual, quantizer input, quantizer output, reconstructed pitch residual, original pitch residual, and reconstructed speech waveforms)

  • Slide 27/104

    Three-Level Center-Clipped Quantizer

    (figure: prediction residual, quantizer input, quantizer output, reconstructed pitch residual, original pitch residual, and reconstructed speech waveforms)

  • Slide 28/104

    Summary of Using LP in Speech Coding

    the predictor can be more sophisticated than a vocal tract response predictor: it can also utilize periodicity (for voiced speech frames)

    the quantization noise spectrum can be shaped by noise feedback so that the noise is concentrated beneath the formant peaks in the speech, thereby utilizing the perceptual masking power of the human auditory system

    we now move on to more advanced LP coding of speech using Analysis-by-Synthesis methods

  • Slide 29/104

    Analysis-by-Synthesis Speech Coders

  • Slide 30/104

    the closed-loop adaptive predictive coder was generalized (in terms of the input/excitation to the vocal tract model) to reach lower bit rates while maintaining very high quality at those rates

  • Slide 31/104

    A-b-S Speech Coding

    Replace the quantizer for generating the excitation signal with an optimization process (denoted as Error Minimization above), whereby the excitation signal, $\hat{d}[n]$, is constructed based on minimization of the mean-squared value of the synthesis error, $d[n] = x[n] - \tilde{x}[n]$; this utilizes a Perceptual Weighting filter.

  • Slide 32/104

    A-b-S Speech Coding

    Basic operation of each loop of the closed-loop A-b-S system:

    1. at the beginning of each loop (and only once per loop), the speech signal, $x[n]$, is used to generate an optimum $p$-th order LPC filter of the form:

    $H(z) = \frac{1}{1 - P(z)} = \frac{1}{1 - \sum_{k=1}^{p}\alpha_k z^{-k}}$

    2. the difference signal, $d[n] = x[n] - \tilde{x}[n]$, based on an initial estimate of the speech signal, $\tilde{x}[n]$, is perceptually weighted by a speech-adaptive filter of the form:

    $W(z) = \frac{1 - P(z)}{1 - P(z/\gamma)}$   (see next vugraph)

    3. the error minimization box and the excitation generator create a sequence of error signals that iteratively (once per loop) improve the match to the weighted error signal

    4. the resulting excitation signal, $\hat{d}[n]$, which is an improved estimate of the actual prediction error signal for each loop iteration, is used to excite the LPC filter, and the loop processing is iterated until the resulting error signal meets some criterion for stopping the closed-loop iterations.

  • Slide 33/104

  • Slide 34/104

  • Slide 35/104

    Implementation of A-b-S Speech Coding

    Problem: find an excitation for the vocal tract filter that produces high quality synthetic output, while maintaining a structured representation that makes it easy to code the excitation at low data rates

    Solution: use a set of basis functions which allow you to iteratively build up an optimal excitation, adding one scaled basis function at each iteration of the A-b-S loop

  • Slide 36/104

    Implementation of A-b-S Speech Coding

    Assume we are given a set of $Q$ basis functions of the form:

    $\{f_1[n], f_2[n], \ldots, f_Q[n]\}, \qquad 0 \le n \le L-1$

    where each basis function is 0 outside the defining interval.

    At each iteration of the A-b-S loop, we select the basis function from this set that maximally reduces the perceptually weighted mean-squared error:

    $E = \sum_{n=0}^{L-1}\Big( \big(x[n] - \hat{d}[n]*h[n]\big) * w[n] \Big)^2$

    where $h[n]$ and $w[n]$ are the vocal tract and perceptual weighting filters.

    We denote the optimal basis function at the $k$-th iteration as $f_{\gamma_k}[n]$, giving the excitation component $\hat{d}_k[n] = \beta_k f_{\gamma_k}[n]$, where $\beta_k$ is the optimal weighting coefficient for basis function $f_{\gamma_k}[n]$.

    The A-b-S iteration continues until the perceptually weighted error falls below some desired threshold, or until a maximum number of iterations, $N$, is reached, giving the final excitation signal as:

    $\hat{d}[n] = \sum_{k=1}^{N} \beta_k f_{\gamma_k}[n]$
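    A minimal NumPy sketch of this greedy build-up (not from the lecture): `basis` is a Q x L array of candidate basis functions, `h` and `w` are the vocal-tract and weighting impulse responses, and each pass picks the filtered basis function that most reduces the weighted error; names are illustrative.

```python
import numpy as np

def abs_excitation(x, basis, h, w, max_iter=8):
    """Greedy analysis-by-synthesis sketch: at each iteration pick the basis
    function whose filtered version y_q = f_q * h * w best reduces the
    remaining weighted error, with amplitude beta = <e, y_q> / ||y_q||^2."""
    L = len(x)
    hw = np.convolve(h, w)                                   # cascaded filters
    Y = np.array([np.convolve(f, hw)[:L] for f in basis])    # filtered basis set
    e = np.convolve(x, w)[:L]                                # weighted target
    d_hat = np.zeros(L)
    for _ in range(max_iter):
        num = Y @ e                                          # <e, y_q>
        den = np.sum(Y * Y, axis=1)                          # ||y_q||^2
        q = int(np.argmax(np.where(den > 0, num**2 / den, 0)))
        beta = num[q] / den[q]
        d_hat += beta * basis[q]                             # accumulate excitation
        e = e - beta * Y[q]                                  # update weighted error
    return d_hat, e
```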

  • Slide 37/104

    Closed Loop Coder / Reformulated Closed Loop Coder

    (figure: original and reformulated closed-loop coder block diagrams)

  • Slide 38/104

  • Slide 39/104

    We now begin iteration $k$ of the A-b-S loop. We optimally select one of the basis set (call this $f_{\gamma_k}[n]$) and determine the amplitude $\beta_k$, giving:

    $\hat{d}_k[n] = \beta_k f_{\gamma_k}[n], \qquad k = 1, 2, \ldots, N$

    We then form the new perceptually weighted error as:

    $e'_k[n] = e'_{k-1}[n] - \beta_k f_{\gamma_k}[n]*h'[n] = e'_{k-1}[n] - \beta_k y'_{\gamma_k}[n]$

    with the resulting mean-squared error

    $E_k = \sum_{n=0}^{L-1}\big(e'_k[n]\big)^2 = \sum_{n=0}^{L-1}\big(e'_{k-1}[n] - \beta_k y'_{\gamma_k}[n]\big)^2$
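    For completeness (this step is implied but not written out on the slide): setting $\partial E_k/\partial\beta_k = 0$ gives the optimal amplitude for the selected basis function,

    $$\beta_k = \frac{\displaystyle\sum_{n=0}^{L-1} e'_{k-1}[n]\, y'_{\gamma_k}[n]}{\displaystyle\sum_{n=0}^{L-1} \big(y'_{\gamma_k}[n]\big)^2}$$

    and substituting back shows that the best basis function at iteration $k$ is the one maximizing $\big(\sum_n e'_{k-1}[n]\, y'_{\gamma_k}[n]\big)^2 \big/ \sum_n \big(y'_{\gamma_k}[n]\big)^2$.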

  • Slide 40/104

  • Slide 41/104

    Our final results are the relations:

    $\tilde{x}'[n] = \sum_{k=1}^{N}\beta_k\, f_{\gamma_k}[n]*h'[n] = \sum_{k=1}^{N}\beta_k\, y'_{\gamma_k}[n]$

    $E_N = \sum_{n=0}^{L-1}\big(x'[n]\big)^2 - \sum_{k=1}^{N}\beta_k \sum_{n=0}^{L-1} x'[n]\, y'_{\gamma_k}[n]$

    where the re-optimized $\beta_k$'s satisfy the relation:

    $\sum_{n=0}^{L-1} x'[n]\, y'_{\gamma_j}[n] = \sum_{k=1}^{N}\beta_k \sum_{n=0}^{L-1} y'_{\gamma_k}[n]\, y'_{\gamma_j}[n], \qquad j = 1, 2, \ldots, N$

    At the receiver, the set of $\{\beta_k, \gamma_k\},\ k = 1, \ldots, N$, is used along with the basis functions $f_{\gamma_k}[n]$ to create the excitation $\hat{d}[n] = \sum_{k=1}^{N}\beta_k f_{\gamma_k}[n]$, which is passed through the synthesis filter $h[n]$.

  • Slide 42/104

    Analysis-by-Synthesis Coding

    Multipulse linear predictive coding (MPLPC): basis functions are impulses, $f_q[n] = \delta[n-q],\ q = 0, 1, \ldots, L-1$
      B. S. Atal and J. R. Remde, "A new model of LPC excitation for producing natural-sounding speech at low bit rates," Proc. IEEE Conf. Acoustics, Speech and Signal Proc., 1982.

    Code-excited linear prediction (CELP): basis functions are vectors of white Gaussian noise, $f_q[n],\ q = 1, 2, \ldots, Q$
      M. R. Schroeder and B. S. Atal, "Code-excited linear prediction (CELP)," Proc. IEEE Conf. Acoustics, Speech and Signal Proc., 1985.

    Self-excited linear predictive vocoder (SEV): basis functions are shifted versions of the previous excitation source
      R. C. Rose and T. P. Barnwell, "The self-excited vocoder," Proc. IEEE Conf. Acoustics, Speech and Signal Proc., 1986.

  • Slide 43/104

  • Slide 44/104

    Multipulse LP Coder

    Multipulse uses impulses as the basis functions; thus the basic error minimization reduces to:

    $E = \sum_{n=0}^{L-1}\Big( x[n] - \sum_{k}\beta_k\, h[n - n_k] \Big)^2$

  • Slide 45/104

    Multipulse Analysis

    1. find the best $\beta_1$ and $n_1$ for the single-pulse solution
    2. subtract out the effect of this impulse from the speech waveform and repeat the process
    3. continue until the desired minimum error is obtained, i.e., until the result is perceptually close to the original
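    A minimal sketch of the greedy pulse search described above, assuming the impulse response h[n] of the (optionally perceptually weighted) synthesis filter is available; the function name and pulse count are illustrative.

```python
import numpy as np

def multipulse_search(x, h, n_pulses=8):
    """Greedy multipulse sketch: at each stage pick the pulse location n_k that
    maximizes the normalized correlation between the remaining error and the
    shifted impulse response h[n - n_k], then remove its contribution."""
    L = len(x)
    h_pad = np.concatenate((np.asarray(h, float), np.zeros(L)))
    H = np.array([np.concatenate((np.zeros(n), h_pad))[:L] for n in range(L)])
    e = np.asarray(x, dtype=float).copy()
    energy = np.sum(H * H, axis=1)                 # ||h[n - n_k]||^2
    locations, amplitudes = [], []
    for _ in range(n_pulses):
        corr = H @ e
        n_k = int(np.argmax(np.where(energy > 0, corr**2 / energy, 0)))
        beta = corr[n_k] / energy[n_k]
        e -= beta * H[n_k]                         # subtract this pulse's effect
        locations.append(n_k)
        amplitudes.append(beta)
    return locations, amplitudes
```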

  • Slide 46/104

    Multipulse Analysis

    (figure: multipulse analysis block diagram, with the output from the previous frame feeding the current frame's analysis)

    B. S. Atal and J. R. Remde, "A new model of LPC excitation for producing natural-sounding speech at low bit rates," Proc. IEEE Conf. Acoustics, Speech and Signal Proc., 1982.

  • Slide 47/104

    Examples of Multipulse LPC

    (figure: multipulse LPC examples)

    B. S. Atal and J. R. Remde, "A new model of LPC excitation for producing natural-sounding speech at low bit rates," Proc. IEEE Conf. Acoustics, Speech and Signal Proc., 1982.

  • Slide 48/104

    800 impulses/sec x 9 bits/impulse => 7200 bps

    need 2400 bps for A(z) => total bit rate of 9600 bps

    code pulse locations differentially ($\Delta_i = N_i - N_{i-1}$) to reduce the range of the variable

    amplitudes normalized to reduce dynamic range

  • Slide 49/104

    basic idea is that primary pitch pulses are correlated and predictable over adjacent pitch periods, i.e., $s[n] \approx s[n-M]$

    break the correlation of speech into a short-term component (used to provide spectral estimates) and a long-term component (used to provide pitch pulse estimates)

    first remove short-term correlation by short-term prediction, followed by removing long-term correlation by long-term prediction

  • Slide 50/104

    prediction error filter:

    $A(z) = 1 - P(z) = 1 - \sum_{k=1}^{p}\alpha_k z^{-k}$

    the short-term residual, $u[n]$, includes primary pitch pulses that can be removed by a long-term predictor of the form

    $B(z) = 1 - b\,z^{-M}$

    giving

    $v[n] = u[n] - b\,u[n-M]$

    with fewer large pulses to code than in $u[n]$

  • Slide 51/104

    Analysis-by-Synthesis

    impulses are selected to represent the output of the long-term predictor, rather than the output of the short-term predictor

    most impulses still come in the vicinity of the primary pitch pulse

    => the result is high quality speech coding at 8-9.6 Kbps

  • Slide 52/104

    Code Excited Linear Prediction (CELP)

  • Slide 53/104

    Code Excited LP

    basic idea is to represent the residual after long-term (pitch period) and short-term (vocal tract) prediction on each frame by vectors from a codebook rather than by multiple pulses

    the residual generator in the previous design is replaced by a codebook of excitation vectors, one vector per frame at an 8 kHz sampling rate

    can use either a deterministic or a stochastic codebook

    deterministic codebooks are derived from a training set of vectors => problems with channel mismatch conditions

    the histogram of the residual from the long-term predictor is roughly a Gaussian pdf => construct the codebook from white Gaussian random numbers with unit variance

    CELP is used in the STU-3 at 4800 bps and in cellular coders at around 8000 bps

  • Slide 54/104

    Code Excited LP

    the amplitude distribution of the residual from the long-term pitch predictor output is roughly identical to a Gaussian distribution with the same mean and variance.

  • Slide 55/104

    CELP Encoder

    (figure: CELP encoder block diagram)

  • Slide 56/104

    For each of the excitation VQ codebook vectors, the following operations occur:

    the codebook vector is scaled by the LPC gain

    the error signal, e[n], is used to excite the long-term pitch predictor, yielding the estimate of the speech signal, $\tilde{x}[n]$, for the current codebook vector

    the signal, d[n], is generated as the difference between the speech signal, x[n], and the estimated speech signal, $\tilde{x}[n]$

    the difference signal is perceptually weighted, and the codebook vector yielding the minimum weighted error is selected
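    A highly simplified sketch of such a codebook search loop (not the standard's procedure): it keeps only a single stochastic-codebook stage, ignores the adaptive/pitch codebook and filter memory between frames, and uses a one-parameter weighting W(z) = A(z)/A(z/gamma); all names are illustrative.

```python
import numpy as np
from scipy.signal import lfilter

def celp_codebook_search(x, codebook, a, gamma=0.8):
    """For each codeword: synthesize with 1/A(z), weight the error with
    W(z) = A(z)/A(z/gamma), and keep the (index, gain) with minimum error."""
    A = np.concatenate(([1.0], -np.asarray(a, dtype=float)))   # A(z) = 1 - P(z)
    A_bw = A * gamma ** np.arange(len(A))                      # A(z/gamma)
    target = lfilter(A, A_bw, x)                               # weighted target
    best = (None, 0.0, np.inf)                                 # (index, gain, error)
    for i, c in enumerate(codebook):
        y = lfilter([1.0], A, c)                               # synthesis 1/A(z)
        yw = lfilter(A, A_bw, y)                               # weighted synthetic
        den = np.dot(yw, yw)
        if den <= 0:
            continue
        g = np.dot(target, yw) / den                           # optimal gain
        err = np.sum((target - g * yw) ** 2)
        if err < best[2]:
            best = (i, g, err)
    return best
```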

  • Slide 57/104

    Stochastic Code (CELP) Excitation Analysis

  • Slide 58/104

  • Slide 59/104

  • Slide 60/104

    Goal is to suppress the noise below the masking threshold at all frequencies, using a weighting filter of the form:

    $W(z) = \frac{1 - \sum_{k=1}^{p}\alpha_k\,\gamma_1^{k}\,z^{-k}}{1 - \sum_{k=1}^{p}\alpha_k\,\gamma_2^{k}\,z^{-k}}$

    where the typical ranges of the parameters are:

    $0.8 \le \gamma_1 \le 0.9, \qquad 0.2 \le \gamma_2 \le 0.4$

    this allows relatively more noise near the formant peaks, where it is masked by the speech, while keeping the noise low in the valleys without distorting the speech.
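    A small sketch that builds and applies this weighting filter from a set of LPC coefficients, using default gamma values taken from the ranges quoted above (gamma1 in the numerator, gamma2 in the denominator, as reconstructed here); SciPy's lfilter does the filtering.

```python
import numpy as np
from scipy.signal import lfilter

def perceptual_weighting(err, alphas, gamma1=0.9, gamma2=0.4):
    """Apply W(z) = (1 - sum_k a_k*gamma1^k z^-k) / (1 - sum_k a_k*gamma2^k z^-k)
    to an error signal."""
    k = np.arange(1, len(alphas) + 1)
    num = np.concatenate(([1.0], -np.asarray(alphas) * gamma1 ** k))
    den = np.concatenate(([1.0], -np.asarray(alphas) * gamma2 ** k))
    return lfilter(num, den, err)
```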

  • Slide 61/104

  • Slide 62/104

    the codebook is stored as a single array of Gaussian random numbers, where most of the samples of adjacent codewords are identical

    such overlapping codebooks typically use shifts of one or two samples, and provide large complexity reductions for storage and for searching the codebook in a given frame

  • Slide 63/104

    Overlapped Stochastic Codebook

    (figure: two codewords which are identical except for a shift of two samples)
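    A sketch of how such an overlapped codebook can be realized with a single shared Gaussian array (the sizes are illustrative): with a shift of 2 samples, 1024 codewords of length 40 need only about 2 thousand stored samples instead of roughly 41 thousand.

```python
import numpy as np

def overlapped_codebook(n_codewords=1024, length=40, shift=2, seed=0):
    """All codewords are windows into one Gaussian array, each shifted by
    `shift` samples from the previous one, so storage is a single vector of
    n_codewords*shift + length samples instead of n_codewords*length."""
    rng = np.random.default_rng(seed)
    pool = rng.standard_normal(n_codewords * shift + length)   # shared sample pool
    windows = np.lib.stride_tricks.sliding_window_view(pool, length)
    return windows[::shift][:n_codewords]
```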

  • Slide 64/104

  • Slide 65/104

    CELP Waveforms

    (figure panels: (a) original speech; (b) synthetic speech output; (c) prediction residual; (d) reconstructed LPC residual; (e) prediction residual after pitch prediction; excitation from a 10-bit random codebook)

  • Slide 66/104

  • Slide 67/104

  • Slide 68/104

    FS-1016 Encoder/Decoder

  • Slide 69/104

  • Slide 70/104

    Three sets of features are produced by the encoding system, namely:

    1. the LPC spectral parameters (coded as a set of 10 LSP parameters) for each 30 msec frame

    2. the codeword and gain of the adaptive codebook vector for each 7.5 msec sub-frame

    3. the codeword and gain of the stochastic codebook vector for each 7.5 msec sub-frame

  • Slide 71/104

  • Slide 72/104

  • Slide 73/104

    Total delay of any coder is the time taken by an input speech sample to be processed, transmitted, and decoded, plus any transmission delay, including:

    buffering delay at the encoder (length of analysis frame window): ~20-30 msec
    processing delay at the encoder (compute and encode all coder parameters): ~20-40 msec
    buffering delay at the decoder (collect all parameters for a frame of speech): ~20-40 msec
    processing delay at the decoder (time to compute a frame of output using the speech synthesis model): ~10-20 msec

    the total delay (not counting transmission delay, multiplexing of signals, forward error correction, etc.) is on the order of 70-130 msec

  • Slide 74/104

    the delay of conventional CELP coders is large due to forward adaptation for estimating the vocal tract and pitch parameters

    backward adaptive methods generally produced poor quality speech

    Chen showed how a backward adaptive coder could perform as well as a conventional forward adaptive coder at bit rates of 8 and 16 kbps

  • Slide 75/104

    Low Delay (LD) CELP Coder

    weighting filter:

    $W(z) = \frac{1 - P(z/0.9)}{1 - P(z/0.4)}$

  • Slide 76/104

    only the excitation sequence is transmitted to the receiver; the long- and short-term predictors are combined into one higher-order predictor whose coefficients are updated by performing LPC analysis on the previously quantized speech signal

    the excitation gain is updated by using the gain information embedded in the previously quantized excitation

    the excitation is coded at 2 bits/sample at an 8 kHz rate; using a codeword length of 5 samples, each excitation vector is coded using a 10-bit gain-shape codebook

    a closed-loop optimization procedure is used to populate the shape codebook, using the same weighted error criterion as is used to select the best codeword in the CELP coder

  • Slide 77/104

    16 kbps LD-CELP Characteristics

    8 kHz sampling rate, 2 bits/sample for coding the residual

    5 samples per frame are encoded by VQ using a 10-bit gain-shape codebook:
      3 bits (2 bits and sign) for gain (backward adaptive)
      7 bits for wave shape

    a recursive autocorrelation method is used to compute autocorrelation values from past synthetic speech

    LDLD--CELP Decoder CELP Decoder

  • 8/12/2019 Lecture 16 (104 Slides)

    78/104

    a pre c or an ga n va ues are er ve romcoded speech as at the encoder

    1

    1 110 11 ( )

    = M P z

    78

    3 1110 21 ( )

    p P z

  • Slide 79/104

  • Slide 80/104

    analysis-by-synthesis is used to derive an excitation signal that closely matches the speech while being efficient to code: this is the CELP (code-excited LPC) idea

  • Slide 81/104

  • Slide 82/104

  • Slide 83/104

  • Slide 84/104

    Model-Based Coding

    assume we model the vocal tract transfer function as

    $H(z) = \frac{G}{A(z)} = \frac{G}{1 - P(z)}, \qquad P(z) = \sum_{k=1}^{p}\alpha_k z^{-k}$

    LPC coder: 100 frames/sec, 13 parameters/frame (p = 10 LPC coefficients, pitch period, voicing decision, gain) => 1300 parameters/second for coding, versus 8000 samples/sec for the waveform
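    A minimal sketch of the synthesis side of this model, assuming the excitation (an impulse train for voiced frames, white noise for unvoiced) has already been generated; names are illustrative.

```python
import numpy as np
from scipy.signal import lfilter

def lpc_synthesize(excitation, alphas, gain):
    """All-pole synthesis for H(z) = G / A(z) with A(z) = 1 - sum_k a_k z^-k:
    filter the excitation through G/A(z)."""
    A = np.concatenate(([1.0], -np.asarray(alphas, dtype=float)))
    return lfilter([gain], A, excitation)
```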

  • Slide 85/104

    don't use the predictor coefficients directly (large dynamic range, high spectral sensitivity) => use poles, PARCOR coefficients, etc.

    code the LP parameters optimally using estimated pdfs for each parameter:

    1. V/UV: 1 bit => 100 bps
    2. Pitch period: 6 bits (uniform) => 600 bps
    3. Gain: 5 bits (non-uniform) => 500 bps
    4. LPC poles: 10 bits (non-uniform) per pole, 5 bits for bandwidth and 5 bits for center frequency, for each of 6 poles => 6000 bps

    Total required bit rate: 7200 bps

    (audio demo: S5-original versus S5-synthetic; some loss from original speech quality; quality is limited by the simple impulse/noise excitation model)
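    The per-frame bookkeeping behind the 7200 bps total, with the pole allocation reconstructed from the stated sub-totals:

```python
def lpc_vocoder_rate(frames_per_sec=100):
    """1 (V/UV) + 6 (pitch) + 5 (gain) + 6 poles * 10 bits = 72 bits/frame,
    i.e. 7200 bps at 100 frames/sec."""
    return (1 + 6 + 5 + 6 * 10) * frames_per_sec   # -> 7200
```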

  • Slide 86/104

    use of PARCOR coefficients ($|k_i| < 1$): roughly uniform pdf with small spectral sensitivity => 5-6 bits for coding each coefficient

    can achieve 4800 bps with almost the same quality as the 7200 bps system above

    can achieve 2400 bps with 20 msec frames => 50 frames/sec

  • Slide 87/104

  • Slide 88/104

    the key problems with speech coders based on all-pole linear prediction models:

    limitations of the all-pole production model

    idealization of the source as either a pulse train or random noise

    lack of accounting for parameter correlation when using a one-dimensional scalar quantization method => aided greatly by using VQ methods

  • Slide 89/104

    VQ-Based LPC Coder

    VQ is applied to the PARCOR coefficients.

    Case 1: same quality as the 2400 bps LPC vocoder
      10-bit codebook of PARCOR vectors
      44.4 frames/sec
      additional bits for pitch, gain and V/UV, plus 2 bits for frame synchronization
      total bit rate of 800 bps

    Case 2: same bit rate, higher quality
      22-bit codebook => 4.2 million codewords to be searched
      never achieved good quality, due to computation, storage, and graininess of quantization at cell boundaries

    bottom line: to dramatically improve quality we need an improved excitation model

  • Slide 90/104

    network: 64 Kbps PCM (8 kHz sampling rate, 8-bit log quantization)
    international: 32 Kbps ADPCM
    teleconferencing: 16 Kbps LD-CELP
    wireless: 13, 8, 6.7, 4 Kbps CELP-based coders
    secure telephony: 4.8, 2.4 Kbps LPC-based coders (MELP)
    storage for voice mail, answering machines, ...

  • Slide 91/104

  • Slide 92/104

    bit rate: 2400 to 128,000 bps
    quality: subjective (MOS), objective (SNR, intelligibility)
    delay: echo, reverberation; block coding delay, processing delay, multiplexing delay, transmission delay: ~100 msec
    telephone bandwidth speech: 200-3200 Hz, 8 kHz sampling rate
    wideband speech: 50-7000 Hz, 16 kHz sampling rate

  • Slide 93/104

    Network Speech Coding Standards

    G.711: companded PCM, 64 Kbps, toll quality
    G.726/727: ADPCM, 16-40 Kbps, toll quality
    G.722: 48, 56, 64 Kbps (wideband)
    G.728: LD-CELP, 16 Kbps, toll quality
    G.729A: CS-ACELP, 8 Kbps, toll quality
    G.723.1: MP-MLQ & ACELP, 6.3/5.3 Kbps

  • Slide 94/104

  • Slide 95/104

    Secure Telephony Speech Coding Standards

    FS-1015: LPC-10e, 2.4 Kbps
    FS-1016: CELP, 4.8 Kbps

  • Slide 96/104

  • Slide 97/104

    2 types of coders:

    waveform-approximating coders (PCM, DPCM, ADPCM): produce a reconstructed signal which converges toward the original signal with decreasing quantization error

    parametric (model-based) coders: produce a reconstructed signal which does not converge to the original signal with decreasing quantization error

    (figure: quality versus bit rate; the waveform-approximating coder converges to the quality of the original speech, while the parametric coder converges to a maximum quality below the original, due to the inaccuracy of the model in representing speech)

  • Slide 98/104

    talker and language dependency: especially for parametric coders that estimate pitch, which is highly variable across men, women and children; language dependency is related to sounds of the language (e.g., clicks) that are not well reproduced by model-based coders

    signal levels: speech is normalized to a maximum level; when actual samples are lower than this level, the coder is not operating at full efficiency, causing loss of quality

    background noise: noise sources and interfering talkers; levels of background noise vary, making optimal coding based on clean speech problematic

    multiple encodings: tandem encodings in a multi-link communication system, teleconferencing with multiple encoders

    channel errors: especially an issue for cellular communications; errors are either random or bursty (fades); redundancy methods are often used

    non-speech sounds: e.g., music on hold, DTMF tones; sounds that are poorly coded by the system

  • Slide 99/104

    Measures of Speech Coder Quality

    Intelligibility: Diagnostic Rhyme Test (DRT)

  • Slide 100/104

    Intelligibility: Diagnostic Rhyme Test (DRT)
      compare words that differ in the leading consonant
      identify the spoken word as one of a pair of choices
      high scores (~90%) are obtained for all coders above 4 Kbps

    Subjective Quality: Mean Opinion Score (MOS)
      5: excellent quality
      4: good quality
      3: fair quality
      2: poor quality
      1: bad quality
      MOS scores for high quality wideband speech are ~4.5, and for high quality telephone bandwidth speech ~4.1

  • Slide 101/104

    Evolution of Speech Coder Performance

    (figure: subjective quality (Excellent/Good/Fair/Poor/Bad) versus bit rate (kb/s) for ITU Recommendations, Cellular Standards, and Secure Telephony, comparing a 1980 profile with a 2000 profile; North American TDMA is marked)

  • Slide 102/104

    (figure: MOS quality (Good (4) / Fair (3) / Poor (2) / Bad (1)) versus bit rate (kb/s) for standard coders G.711, G.726, G.728, G.729, G.723.1, IS-127, IS-54, FS1016, MELP, and FS1015, with curves for 1980, 1990, 1995, and 2000)

  • Slide 103/104

    64 kbps Mu-Law PCM
    32 kbps CCITT G.721 ADPCM
    8 kbps CELP
    4.8 kbps CELP for STU-3

  • Slide 104/104

    Wideband Speech Coding

    Male talker:
      3.2 kHz, uncoded
      7 kHz, uncoded
      7 kHz, coded at 64 kbps (G.722)
      7 kHz, coded at 32 kbps (LD-CELP)
      7 kHz, coded at 16 kbps (BE-CELP)

    Female talker:
      3.2 kHz, uncoded
      7 kHz, uncoded
      7 kHz, coded at 64 kbps (G.722)
      7 kHz, coded at 32 kbps (LD-CELP)
      7 kHz, coded at 16 kbps (BE-CELP)