
CS 294-9 :: Fall 2003

Audio Coding

Ketan Mayer-Patel

Overview of Today

• PCM (sampling techniques)
  – Linear
  – μ-law
• DPCM and ADPCM (generic coding techniques)
• MPEG-1 (psychoacoustic coding)
• Vocoding (speech-specific techniques)

Audio Signals

• Analog audio is basically voltage as a continuous function of time.
• Unlike video, which is 3D, audio is a 1D signal.
  – Can capture without having to discretize the higher dimensions.
• Audio sampling basically boils down to quantizing signal level to a set of values.
• Digital audio parameters:
  – bits per sample
  – sampling rate
  – number of channels

Sampling

• Pulse Amplitude Modulation (PAM)
  – Each sample’s amplitude is represented by 1 analog value.
• Sampling theory (Nyquist)
  – If the input signal has maximum frequency (bandwidth) f, the sampling frequency must be at least 2f.
  – With a low-pass filter to interpolate between samples, the input signal can be fully reconstructed.
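The Nyquist condition can be illustrated with a short sketch. The function names are hypothetical and the model is idealized (pure tones, no anti-aliasing filter): it shows which frequency a tone appears to have once it is sampled, and that anything above half the sampling rate folds back into the band.

```python
def alias_frequency(f_hz: float, fs_hz: float) -> float:
    """Apparent frequency of a pure tone at f_hz after sampling at fs_hz."""
    folded = f_hz % fs_hz                 # sampling cannot tell f from f + k*fs
    return min(folded, fs_hz - folded)    # fold into the range 0 .. fs/2


def min_sampling_rate(bandwidth_hz: float) -> float:
    """Smallest sampling rate that satisfies the 2f rule for a given bandwidth."""
    return 2.0 * bandwidth_hz


print(alias_frequency(1000, 8000))   # below fs/2: unchanged
print(alias_frequency(5000, 8000))   # above fs/2: folds back to 3000
print(min_sampling_rate(20_000))     # 20 kHz bandwidth needs at least 40 kHz
```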

PCM

• Pulse Code Modulation (PCM)
  – Each sample’s amplitude is represented by an integer code-word.
  – Each bit of resolution adds 6 dB of dynamic range.
  – The number of bits required depends on the amount of noise that is tolerated.

$n = \dfrac{\mathrm{SNR} - 4.77}{6.02}$  (n in bits, SNR in dB)

[Figure: example bit stream 010000110010000100001001101010111100 with the quantization error (“noise”) marked]
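As a rough worked example of the relation between bits and SNR, the sketch below evaluates the slide's formula in both directions. The function names are made up for illustration, and the constants 4.77 and 6.02 are taken directly from the slide.

```python
import math


def bits_for_snr(snr_db: float) -> int:
    """Smallest whole number of bits n with n >= (SNR - 4.77) / 6.02."""
    return max(1, math.ceil((snr_db - 4.77) / 6.02))


def snr_for_bits(n_bits: int) -> float:
    """Inverse of the same relation: SNR in dB reached with n_bits."""
    return 6.02 * n_bits + 4.77


print(bits_for_snr(96))   # 16
print(bits_for_snr(48))   # 8
```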

Linear PCM

• Uses evenly spaced quantization levels.
• Typically 16 bits per sample.
• Provides a large dynamic range.
• At this resolution, quantization noise is difficult for humans to perceive.
• Compact Discs
  – 16-bit linear sampling
  – 44.1 kHz sampling rate
  – 2 channels
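The three digital audio parameters multiply directly into a raw bit rate. A minimal sketch (the function name is hypothetical) using the Compact Disc figures from the slide:

```python
def pcm_bit_rate(bits_per_sample: int, sample_rate_hz: int, channels: int) -> int:
    """Uncompressed PCM bit rate in bits per second."""
    return bits_per_sample * sample_rate_hz * channels


cd_rate = pcm_bit_rate(16, 44_100, 2)
print(cd_rate)   # 1411200 bits per second, roughly 1.4 Mb/s
```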

Non-linear Sampling

• If we try to use 8 bits per sample, the dynamic range is reduced significantly and quantization noise can be heard.
• In particular, we end up with too few levels for the lower amplitudes.
• The solution is to sample more densely at the lower amplitudes and less densely at the higher amplitudes.
• Sort of like a log scale.

[Figure: Non-linear Sampling Illustrated. Labels: Input, Output.]

μ-law and A-law

• Non-linear sampling is called “companding”.
• 8 bits companded provide a dynamic range equivalent to 12 bits.
• μ-law and A-law are companding standards defined in G.711.
• They differ in the exact shape of the piece-wise linear companding function.

μ-Law Companding

$f(x) = 127 \cdot \operatorname{sign}(x) \cdot \dfrac{\ln(1 + \mu |x|)}{\ln(1 + \mu)}$  (x normalized to [-1, 1])

• Provides 14-bit quality (dynamic range) with an 8-bit encoding.
• Used in North American & Japanese ISDN voice service.
• Simple to compute encoding.
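The companding curve can be tried out numerically. The sketch below is a direct floating-point evaluation of the formula, not the piece-wise linear table used in practice. It assumes μ = 255 (the value G.711 uses; the slide does not state it), and the function names are hypothetical.

```python
import math

MU = 255  # assumption: the G.711 mu-law constant, not given on the slide


def mu_law_compress(x: float, mu: float = MU) -> int:
    """Map x in [-1, 1] to a signed code in [-127, 127] using the formula above."""
    x = max(-1.0, min(1.0, x))
    magnitude = math.log(1.0 + mu * abs(x)) / math.log(1.0 + mu)
    return round(math.copysign(127 * magnitude, x))


def mu_law_expand(code: int, mu: float = MU) -> float:
    """Approximate inverse: map a signed code back to a value in [-1, 1]."""
    magnitude = abs(code) / 127
    return math.copysign(((1.0 + mu) ** magnitude - 1.0) / mu, code)


for x in (0.001, 0.01, 0.1, 1.0):
    code = mu_law_compress(x)
    print(x, code, mu_law_expand(code))
```

Running the loop shows the log-like behaviour: an input at 1% of full scale already uses more than a fifth of the code range, so low amplitudes get many more levels than they would under linear quantization.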

μ-Law Encoding

[Figure: Sender: high-resolution PCM encoding (12, 14, 16 bits) → table lookup → 8-bit μ-law encoding. Receiver: inverse table lookup → 14-bit decoding.]

Input Amplitude   Step Size   Segment   Quantization   Code Value
0-1               1           000       0000           0
1-3               2           000       0001           1
...               ...         ...       ...            ...
29-31             2           000       1111           15
31-35             4           001       0000           16
...               ...         ...       ...            ...
91-95             4           001       1111           31
95-103            8           010       0000           32
...               ...         ...       ...            ...
215-223           8           010       1111           47
223-239           16          011       0000           48
...               ...         ...       ...            ...
463-479           16          011       1111           63
...               ...         ...       ...            ...

μ-Law Decoding

[Figure: Sender: high-resolution PCM encoding (12, 14, 16 bits) → table lookup → 8-bit μ-law encoding. Receiver: inverse table lookup → 14-bit decoding.]

μ-Law Encoding   Multiplier   Decode Amplitude
0000000          1            0
0000001          2            2
...              ...          ...
0001111          2            30
0010000          4            33
...              ...          ...
0011111          4            93
0100000          8            99
...              ...          ...
0101111          8            219
0110000          16           231
...              ...          ...
0111111          16           471
...              ...          ...
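The encoding and decoding tables can be generated with shifts instead of lookups. The sketch below handles magnitudes only: the sign bit and the bit inversion applied on the wire are left out. It uses the conventional G.711 formulation with a bias of 33, which the slides do not state, so treat the constants and the function names as assumptions. It does reproduce the rows shown in the two tables.

```python
BIAS = 33              # assumption: conventional G.711 bias, not stated on the slides
MAX_MAGNITUDE = 8158   # largest magnitude that still falls in segment 7 after biasing


def mu_law_encode_magnitude(amplitude: int) -> int:
    """Map a non-negative linear amplitude to a 7-bit code: 3-bit segment, 4-bit quantization."""
    biased = min(amplitude, MAX_MAGNITUDE) + BIAS
    segment = biased.bit_length() - 6                 # 0..7, one segment per power of two
    quantization = (biased >> (segment + 1)) & 0x0F   # 16 intervals per segment
    return (segment << 4) | quantization


def mu_law_decode_magnitude(code: int) -> int:
    """Return the midpoint of the input interval that the 7-bit code stands for."""
    segment, quantization = code >> 4, code & 0x0F
    return ((2 * quantization + 33) << segment) - BIAS


for amplitude in (0, 2, 30, 31, 95, 223, 478):
    code = mu_law_encode_magnitude(amplitude)
    print(amplitude, format(code, "07b"), mu_law_decode_magnitude(code))
```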

Difference Encoding

[Figure: example PCM bit stream 010000110010000100001001101010111100]

• Differential PCM (DPCM)
  – Exploit temporal redundancy in samples.
  – The difference between 2 x-bit samples can be represented with significantly fewer than x bits.
  – Transmit the difference (rather than the sample).
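A minimal sketch of the idea, with hypothetical function names and a starting predictor of 0 chosen for illustration: the encoder sends each sample as its difference from the previous one, and the decoder accumulates the differences.

```python
from typing import List


def dpcm_encode(samples: List[int]) -> List[int]:
    """Replace each sample with its difference from the previous sample."""
    previous = 0
    differences = []
    for sample in samples:
        differences.append(sample - previous)
        previous = sample
    return differences


def dpcm_decode(differences: List[int]) -> List[int]:
    """Rebuild the samples by accumulating the differences."""
    previous = 0
    samples = []
    for difference in differences:
        previous += difference
        samples.append(previous)
    return samples


pcm = [1000, 1004, 1009, 1011, 1008]
print(dpcm_encode(pcm))   # [1000, 4, 5, 2, -3]
```

After the first value, the differences are small even though the samples themselves need many bits, which is the redundancy DPCM exploits.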

Slope Overload Problem

[Figure: example PCM bit stream with a region marked “Slope Overload”]

• Differences in high-frequency signals near the Nyquist frequency cannot be represented with a smaller number of bits!
  – The error introduced leads to severe distortion in the higher frequencies.
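Slope overload is easy to reproduce. The sketch below (function name and test signals are made up for illustration) runs DPCM with the difference clipped to a fixed number of bits and returns what the decoder would reconstruct.

```python
def dpcm_clipped(samples, diff_bits):
    """DPCM where each difference must fit in a signed diff_bits-bit field.

    Returns the decoder's reconstruction of the input.
    """
    lo, hi = -(1 << (diff_bits - 1)), (1 << (diff_bits - 1)) - 1
    reconstructed = []
    previous = 0
    for sample in samples:
        difference = max(lo, min(hi, sample - previous))   # clip to what fits
        previous += difference                             # decoder tracks this value
        reconstructed.append(previous)
    return reconstructed


slow = [0, 3, 6, 9, 12]
fast = [0, 100, -100, 100, -100]
print(dpcm_clipped(slow, 4))   # [0, 3, 6, 9, 12]: tracked exactly
print(dpcm_clipped(fast, 4))   # [0, 7, -1, 6, -2]: cannot follow the swings
```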


Adaptive DPCM (ADPCM)

• Use a larger step-size to encode differences between high-frequency samples & a smaller step-size for differences between low-frequency samples

• Use previous sample values to estimate changes in the signal in the near future

ADPCM

[Figure: ADPCM encoder block diagram. Blocks: Predictor, Difference Quantizer, Step-Size Adjuster, Dequantizer. Input: y-bit PCM sample; output: x-bit ADPCM “difference”; feedback signal: predicted PCM sample n+1.]

• To ensure differences are always small...
  – Adaptively change the step-size (quanta).
  – (Adaptively) attempt to predict the next sample value.

IMA’s Proposed ADPCM

[Figure: IMA ADPCM encoder block diagram. Blocks: Register, Difference Quantizer, Step-Size Adjuster, Dequantizer. Input: 16-bit PCM sample; output: 4-bit ADPCM difference; feedback signal: PCM sample n–1.]

• Predictor is not adaptive and simply uses the last sample value.
• Quantization step-size increases logarithmically with signal frequency.

IMA Difference Quantization

[Figure: IMA ADPCM encoder block diagram. Blocks: Register, Difference Quantizer, Step-Size Adjuster, Dequantizer. Input: 16-bit PCM sample; output: 4-bit ADPCM difference (in step-size units); feedback signal: PCM sample n–1.]

Quantization                                  Quantizer Output   Step-Size Multiples
difference < 1/4 step_size                    000                0.0
1/4 step_size < difference < 1/2 step_size    001                0.25
1/2 step_size < difference < 3/4 step_size    010                0.50
3/4 step_size < difference < step_size        011                0.75
step_size < difference < 5/4 step_size        100                1.0
5/4 step_size < difference < 3/2 step_size    101                1.25
3/2 step_size < difference < 7/4 step_size    110                1.5
7/4 step_size < difference                    111                1.75

IMA Step-size Table

Index          0      1      2      3      4      5      6      7      8      9
Step Size      7      8      9     10     11     12     13     14     16     17

Index         10     11     12     13     14     15     16     17     18     19
Step Size     19     21     23     25     28     31     34     37     41     45

Index         20     21     22     23     24     25     26     27     28     29
Step Size     50     55     60     66     73     80     88     97    107    118

Index         30     31     32     33     34     35     36     37     38     39
Step Size    130    143    157    173    190    209    230    253    279    307

Index         40     41     42     43     44     45     46     47     48     49
Step Size    337    371    408    449    494    544    598    658    724    796

Index         50     51     52     53     54     55     56     57     58     59
Step Size    876    963   1060   1166   1282   1411   1552   1707   1878   2066

Index         60     61     62     63     64     65     66     67     68     69
Step Size   2272   2499   2749   3024   3327   3660   4026   4428   4871   5358

Index         70     71     72     73     74     75     76     77     78     79
Step Size   5894   6484   7132   7845   8630   9493  10442  11487  12635  13899

Index         80     81     82     83     84     85     86     87     88
Step Size  15289  16818  18500  20350  22385  24623  27086  29794  32767

Adaptive Step-size Selection

[Figure: IMA ADPCM encoder with the Step-Size Adjuster expanded. Labels: Quantizer Output → Step-Size Table Index Adjustment Lookup → Index Adjustment; Previous Index (Register); Range Limit (0 to 88); Step-Size Table Lookup → New Step-Size.]

Adaptive Step-size Selection

[Figure: Step-Size Adjuster detail. Labels: Difference Quantizer; Step-Size Table Index Adjustment Lookup → Index Adjustment; Previous Index (Register); Range Limit (0 to 88); Step-Size Table Lookup → New Step-Size.]

Quantization                                  Quantizer Output   Index Adjustment   Step-Size Adjustment
difference < 1/4 step_size                    000                -1                 × 0.91
1/4 step_size < difference < 1/2 step_size    001                -1                 × 0.91
1/2 step_size < difference < 3/4 step_size    010                -1                 × 0.91
3/4 step_size < difference < step_size        011                -1                 × 0.91
step_size < difference < 5/4 step_size        100                2                  × 1.21
5/4 step_size < difference < 3/2 step_size    101                4                  × 1.46
3/2 step_size < difference < 7/4 step_size    110                6                  × 1.77
7/4 step_size < difference                    111                8                  × 2.14

IMA ADPCM Example

[Figure: IMA ADPCM encoder block diagram (Register, Difference Quantizer, Step-Size Adjuster, Dequantizer) with input Xn and register output Xn–1.]

Column key: X = input; Diff = difference; Step = step size; Q = quantizer output; Adj = index adjustment; I = step-size table index; M = step-size multiplier; Recon = reconstituted difference; Decode = predicted value.

X     Diff   Step   Q     Adj   I    M      Recon   Decode
150          7                  0                   150
155   5      7      010   -1    0    0.5    3.5     154
167   13     7      111   8     8    1.75   12      166
170   4      16     001   -1    7    0.25   4       170
250   80     14     111   8     15   1.75   24.5    195
250   55     31     111   8     23   1.75   54      249
250   1      66     000   -1    22   0.0    0       249
250   1      60     000   -1    21   0.0    0       249
200   -49    55     011   -1    20   0.75   -41     208
200   (six further input samples of 200; the remaining columns are left blank on the slide)
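The example rows can be reproduced with a small simulation of the simplified scheme described on these slides. This is a sketch under stated assumptions, not a production IMA ADPCM codec: the round-half-up rule is inferred from the example values, placing the sign in the high bit of the 4-bit code is an assumed convention, clamping to the 16-bit range is omitted, the function name is hypothetical, and the step-size values follow the commonly published IMA table.

```python
from typing import List, Tuple

# Commonly published IMA step-size table (89 entries, index 0 to 88).
STEP_SIZES = [
    7, 8, 9, 10, 11, 12, 13, 14, 16, 17, 19, 21, 23, 25, 28, 31, 34, 37,
    41, 45, 50, 55, 60, 66, 73, 80, 88, 97, 107, 118, 130, 143, 157, 173,
    190, 209, 230, 253, 279, 307, 337, 371, 408, 449, 494, 544, 598, 658,
    724, 796, 876, 963, 1060, 1166, 1282, 1411, 1552, 1707, 1878, 2066,
    2272, 2499, 2749, 3024, 3327, 3660, 4026, 4428, 4871, 5358, 5894, 6484,
    7132, 7845, 8630, 9493, 10442, 11487, 12635, 13899, 15289, 16818, 18500,
    20350, 22385, 24623, 27086, 29794, 32767,
]
INDEX_ADJUSTMENT = [-1, -1, -1, -1, 2, 4, 6, 8]   # indexed by 3-bit quantizer output


def ima_encode(samples: List[int], first: int, index: int = 0) -> Tuple[List[int], List[int]]:
    """Encode samples that follow the known sample `first`.

    Returns (4-bit codes, values the decoder would reconstruct).
    """
    predicted = first
    codes, decoded = [], []
    for sample in samples:
        step = STEP_SIZES[index]
        difference = sample - predicted
        # Quantize the magnitude in quarter-step units, saturating at 111.
        magnitude = min(7, 4 * abs(difference) // step)
        sign = -1 if difference < 0 else 1
        # Reconstituted difference is (magnitude / 4) * step; round half up.
        predicted = (4 * predicted + sign * magnitude * step + 2) // 4
        # Adapt the step size for the next sample, limited to 0..88.
        index = max(0, min(88, index + INDEX_ADJUSTMENT[magnitude]))
        codes.append((8 if sign < 0 else 0) | magnitude)
        decoded.append(predicted)
    return codes, decoded


codes, decoded = ima_encode([155, 167, 170, 250, 250, 250, 250, 200], first=150)
print(decoded)   # [154, 166, 170, 195, 249, 249, 249, 208]
```

The jump from 170 to 250 shows the adaptation at work: the decoded value needs two samples (195, then 249) to catch up while the step size grows from 14 to 66.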

Networking Considerations

[Figure: IMA ADPCM block diagram (Step-Size Adjuster, Dequantizer, Register; PCM sample n–1) shown alongside the difference quantization table and the step-size table index adjustment table.]

• The IMA codec is reasonably robust to errors.
• An interval with a low-level signal will correct any step-size error.
  – Small differences produce repeated -1 index adjustments, so encoder and decoder both end up at the range limit (index 0).
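The self-correcting behaviour can be sketched by tracking only the step-size index on both sides. The scenario is hypothetical: the sender is at index 20, the receiver has been knocked to index 35 by a transmission error, and a quiet passage then produces quantizer output 000 for every sample.

```python
INDEX_ADJUSTMENT = [-1, -1, -1, -1, 2, 4, 6, 8]   # indexed by 3-bit quantizer output


def track_index(index: int, magnitudes) -> list:
    """Step-size table index after each received quantizer output."""
    history = []
    for magnitude in magnitudes:
        index = max(0, min(88, index + INDEX_ADJUSTMENT[magnitude]))
        history.append(index)
    return history


quiet = [0] * 40                     # low-level signal: quantizer output 000 throughout
sender = track_index(20, quiet)
receiver = track_index(35, quiet)    # hypothetical index corrupted by an error
first_match = next(i for i, (s, r) in enumerate(zip(sender, receiver)) if s == r)
print(first_match + 1)               # 35 samples until both sit at index 0
```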

Psychoacoustic Properties

• Human perception of sound is a function of frequency and signal strength.
  – (MPEG exploits this relationship.)

[Figure: sound level (dB, 0 to 100) against frequency (kHz, 0.02 to 20), with the plane divided into “Audible” and “Inaudible” regions.]

Auditory Masking

[Figure: sound level (dB, 0 to 100) against frequency (kHz, 0.02 to 20) with “Audible” and “Inaudible” regions, showing a masking tone and a masked tone.]

• The presence of tones at certain frequencies makes us unable to perceive tones at other “nearby” frequencies.
  – Humans cannot distinguish between tones within 100 Hz of each other at low frequencies, or within 4 kHz at high frequencies.

MPEG Encoder Block Diagram

[Figure: block diagram. Blocks: Mapping, Quantizer, Coding, Frame Packing, Psychoacoustic Model. Input: PCM audio samples (32, 44.1, 48 kHz); additional input: ancillary data; output: encoded bitstream.]

Subband Filter

• Transforms the signal from the time domain to the frequency domain.
  – 32 PCM samples yield 32 subband samples.
• Each subband corresponds to a frequency band; the bands are evenly spaced from 0 to the Nyquist frequency.
  – The filter actually works on a window of 512 samples that is shifted over 32 samples at a time.
• Subband coefficients are analyzed with the psychoacoustic model, quantized, and coded.
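The band layout and the sliding window follow from simple arithmetic. The sketch below only computes the band edges and window positions. It does not implement the filter bank itself, and the function names are made up for illustration.

```python
def subband_edges(sample_rate_hz: float, subbands: int = 32):
    """Lower and upper edge (Hz) of each evenly spaced subband from 0 to Nyquist."""
    nyquist = sample_rate_hz / 2
    width = nyquist / subbands
    return [(k * width, (k + 1) * width) for k in range(subbands)]


def window_starts(total_samples: int, window: int = 512, hop: int = 32):
    """Start offsets of the 512-sample analysis window as it advances 32 samples at a time."""
    return list(range(0, total_samples - window + 1, hop))


edges = subband_edges(48_000)
print(edges[0], edges[-1])    # (0.0, 750.0) (23250.0, 24000.0)
print(window_starts(576))     # [0, 32, 64]
```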

Layer 1

• 384 samples per frame.
• Iterative bit allocation process:
  – For each subband, determine the MNR (mask-to-noise ratio).
  – Increase the number of quantization bits for the subband with the smallest MNR.
  – Iterate until all bits are used.
• Fixed allocation of bits among subbands for a particular frame.
• Up to 448 kb/s.
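The iterative allocation loop can be sketched as a greedy procedure. This is a simplification under several assumptions: MNR is taken as SNR minus the signal-to-mask ratio (SMR) supplied by the psychoacoustic model, SNR is approximated as about 6 dB per quantization bit, and one "bit" here means one more bit of resolution for a subband rather than the actual per-frame bit cost. The function name, the `max_bits` cap, and the sample SMR values are hypothetical.

```python
from typing import List

DB_PER_BIT = 6.02   # assumption: each quantization bit adds about 6 dB of SNR


def allocate_bits(smr_db: List[float], total_bits: int, max_bits: int = 15) -> List[int]:
    """Greedy allocation: repeatedly give one bit to the subband with the lowest MNR.

    smr_db holds one signal-to-mask ratio per subband (hypothetical model output).
    """
    bits = [0] * len(smr_db)
    for _ in range(total_bits):
        candidates = [i for i in range(len(bits)) if bits[i] < max_bits]
        if not candidates:
            break
        # MNR = SNR - SMR, with SNR approximated as DB_PER_BIT * bits.
        worst = min(candidates, key=lambda i: DB_PER_BIT * bits[i] - smr_db[i])
        bits[worst] += 1
    return bits


print(allocate_bits([30.0, 10.0, -5.0], total_bits=6))   # [5, 1, 0]
```

The subband whose signal sits furthest above the masking threshold receives most of the bits, while the subband that is already masked (negative SMR) receives none.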

Layer 2

• 1152 samples per frame.
• Iterative bit allocation.
• Subband allocation is dynamic.
• Up to 384 kb/s.

Layer 3

• 1152 samples per frame.
  – Up to 320 kb/s.
• Each subband is further analyzed using the MDCT to create 576 frequency lines.
  – 4 different windowing schemes, depending on whether the samples contain the “attack” of new frequencies.
• Lots of bit allocation options for quantizing frequency coefficients.
• Quantized coefficients are Huffman coded.

Vo-coding

• Concept: develop a mathematical model of the vocal cords & throat.
  – Derive/compute the model parameters for a short interval and transmit them to the decoder.
  – Use the parameters to synthesize speech at the decoder.
• So what is a good model?
  – A “buzzer” in a “tube”!
  – The buzzer is characterized by its intensity & pitch.
  – The tube is characterized by its formants.

Vocoding - Basic Concepts

[Figure: spectrum of a speech signal. Amplitude (0 to 75) against frequency (kHz).]

• Formant: frequency maxima & minima in the spectrum of the speech signal.
• Vocoders group and code portions of the signal by amplitude.

“Buzzer” and “Tube” Model

[Figure: speech input “yadda yadda yadda”]

• Vocoding principles:
  – voice = formants + buzz pitch & intensity
  – voice – estimated formants = “residue”
• Linear Predictive Coding (LPC)
  – A sample is represented as a linear combination of p previous samples:

$y(n) = \sum_{k=1}^{p} a_k \, y(n-k) + G \cdot x(n)$
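The LPC relation, y(n) as a weighted sum of the p previous outputs plus a scaled excitation G·x(n), can be run directly as a synthesis filter. A minimal sketch: the function name is hypothetical, outputs before n = 0 are assumed to be zero, and the coefficients below are arbitrary illustration values rather than ones derived from speech.

```python
from typing import List, Sequence


def lpc_synthesize(excitation: Sequence[float], a: Sequence[float], gain: float) -> List[float]:
    """y(n) = sum over k = 1..p of a[k-1] * y(n-k), plus gain * x(n); y(n) = 0 for n < 0."""
    y: List[float] = []
    for n, x_n in enumerate(excitation):
        acc = gain * x_n                       # the "buzzer" contribution
        for k, a_k in enumerate(a, start=1):
            if n - k >= 0:
                acc += a_k * y[n - k]          # the "tube": feedback from past outputs
        y.append(acc)
    return y


# A single pulse through a one-coefficient filter decays geometrically.
print(lpc_synthesize([1, 0, 0, 0], a=[0.5], gain=1.0))   # [1.0, 0.5, 0.25, 0.125]
```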

LPC

• Decoder artificially generates speech via formant synthesis.
  – A mathematical simulation of the vocal tract as a series of bandpass filters.
  – Encoder codes & transmits filter coefficients, pitch period, gain factor, & nature of excitation.
• Standards:
  – Regular Pulse Excited Linear Predictive Coder (RPE-LPC)
    • Digital cellular standard GSM 06.10 (13 kbps)
  – Code Excited Linear Predictive Coder (CELP)
    • US Federal Standard 1016 (4.8 kbps)
  – Linear Predictive Coder (LPC)
    • US Federal Standard 1015 (2.4 kbps)

Networking Concerns

• Audio bandwidth is actually quite small.
• But human sensitivity to loss and noise is quite high.
• Networking concerns:
  – Loss concealment
  – Jitter control
• Especially for telephony applications.