1 audio coding. 2 digitization processing signal encoder signal decoder samplingquantization storage...

1

Audio Coding

2

Digitization Processing

Signal encoder

Signal decoder

sampling quantization storage

Analog signal

Digital data

3

Overview of Today

• PCM
– Linear
– μ-Law

• DPCM

• ADPCM

• MPEG-1

• Vocoding

Sampling Techniques

Generic Coding Techniques

Psychoacoustic Coding

Speech Specific Techniques

4

Encoder Design

• Bandlimiting filter
– Smooth analog signals

• Analog to digital converter (ADC)
– Sample and quantize analog signals

5

Bandlimiting filter

Pass only frequency components up to half the sampling rate (the Nyquist frequency).

6

Analog to digital converter

7

Sampling

• Pulse Amplitude Modulation (PAM)
– Each sample’s amplitude is represented by 1 ________ value

• Sampling theory (_________)
– If input signal has ________ frequency (bandwidth) f, sampling frequency must be at least ____
– With a _____-pass filter to interpolate between samples, the input signal can be fully reconstructed

8

PCM

• Pulse Code Modulation (PCM)
– Each sample’s amplitude represented by an ________ code-word
– Each bit of resolution adds __ dB of dynamic range
– Number of bits required depends on the amount of noise that is tolerated

$$n = \frac{\mathrm{SNR} - 4.77}{6.02}$$

(n = number of bits, SNR in dB)

[Figure: sampled waveform with its PCM bitstream 010000110010000100001001101010111100; the quantization error (“noise”) is marked]
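The relation between bit count and SNR on this slide can be turned into a small calculator. The sketch below is only an illustration: the function names `bits_for_snr` and `snr_for_bits` are hypothetical, and rounding up to a whole number of bits is an assumption, since a code-word cannot have a fractional length.

```python
import math

def bits_for_snr(snr_db: float) -> int:
    """Smallest whole number of bits n such that 6.02*n + 4.77 >= snr_db."""
    # Direct use of n = (SNR - 4.77) / 6.02, rounded up (assumption).
    return max(1, math.ceil((snr_db - 4.77) / 6.02))

def snr_for_bits(n: int) -> float:
    """Inverse relation: SNR in dB offered by n bits per sample."""
    return 6.02 * n + 4.77

if __name__ == "__main__":
    for target in (40.0, 60.0, 90.0):
        print(target, "dB needs", bits_for_snr(target), "bits")
```

Rounding up means the chosen word length always meets or exceeds the tolerated noise target.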

9

Linear PCM

• Quantization levels are _________ spaced.

• ___ bit samples provide plenty of dynamic range.

• Compact discs do this.
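A uniform quantizer of the kind described on this slide might look like the following. This is a minimal sketch: the input is assumed to be normalized to [-1, 1], the code range is assumed to be symmetric, and the names `quantize_linear` and `dequantize_linear` are made up for illustration.

```python
def quantize_linear(x: float, bits: int) -> int:
    """Map x in [-1, 1] to a signed integer code with evenly spaced levels."""
    max_code = (1 << (bits - 1)) - 1      # e.g. 127 for 8 bits
    x = max(-1.0, min(1.0, x))            # clip out-of-range input
    return round(x * max_code)

def dequantize_linear(code: int, bits: int) -> float:
    """Map an integer code back to an amplitude in [-1, 1]."""
    max_code = (1 << (bits - 1)) - 1
    return code / max_code

if __name__ == "__main__":
    for x in (-1.0, -0.3, 0.0, 0.3, 1.0):
        c = quantize_linear(x, 8)
        print(x, c, dequantize_linear(c, 8))
```

Because the levels are evenly spaced, the reconstruction error is bounded by half a step regardless of the signal amplitude.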

10

Under Sampling

• Sample rate under the Nyquist rate
• The bandlimiting filter is also called an antialiasing filter

Alias components are added to the original signal and cause distortion.
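The effect of under sampling can be checked numerically. In the sketch below (function names and the example frequencies are illustrative assumptions), a 7 kHz cosine sampled at 10 kHz yields the same sample values as a 3 kHz cosine, so after sampling the two cannot be told apart.

```python
import math

def sample_tone(freq_hz: float, sample_rate_hz: float, count: int) -> list:
    """Return `count` samples of a unit-amplitude cosine at freq_hz."""
    return [math.cos(2 * math.pi * freq_hz * n / sample_rate_hz)
            for n in range(count)]

def alias_frequency(freq_hz: float, sample_rate_hz: float) -> float:
    """Apparent frequency after sampling, folded into [0, sample_rate/2]."""
    f = freq_hz % sample_rate_hz
    return min(f, sample_rate_hz - f)

if __name__ == "__main__":
    high = sample_tone(7000, 10000, 8)
    low = sample_tone(3000, 10000, 8)
    print(alias_frequency(7000, 10000))
    print(max(abs(a - b) for a, b in zip(high, low)))
```

This is why the bandlimiting filter has to remove the out-of-band components before the sampler, not after it.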

11

Quantization intervals

12

Associated waveform set

13

μ-Law companding (ITU Rec. G.711)

• Non-linear quantization of the signal’s amplitude
– Quantization step-size decreases logarithmically with signal ______
– Low-amplitude samples represented with ______ accuracy than high-amplitude samples
– Humans are less sensitive to changes in “____” sounds than “_____” sounds

14

μ-Law companding

$$f(x) = 127 \times \operatorname{sign}(x) \times \frac{\ln(1 + \mu|x|)}{\ln(1 + \mu)}$$

(x normalized to [-1, 1])

• Provides __-bit quality (dynamic range) with an _-bit encoding

• Used in North American & Japanese ISDN voice service

• Simple to compute encoding
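The companding curve on this slide can be evaluated directly. The sketch below implements the continuous formula and its inverse; μ = 255 is an assumption here (it is the value normally associated with G.711, but the slide does not state it), and real G.711 codecs use a piecewise-linear approximation rather than this exact expression. The function names are hypothetical.

```python
import math

MU = 255.0  # assumed value of mu; not given on the slide

def mu_law_compress(x: float, mu: float = MU) -> float:
    """f(x) = 127 * sign(x) * ln(1 + mu*|x|) / ln(1 + mu), x in [-1, 1]."""
    x = max(-1.0, min(1.0, x))
    sign = (x > 0) - (x < 0)
    return 127.0 * sign * math.log(1.0 + mu * abs(x)) / math.log(1.0 + mu)

def mu_law_expand(y: float, mu: float = MU) -> float:
    """Inverse of mu_law_compress for y in [-127, 127]."""
    sign = (y > 0) - (y < 0)
    return sign * (math.exp(abs(y) / 127.0 * math.log(1.0 + mu)) - 1.0) / mu

if __name__ == "__main__":
    for x in (0.001, 0.01, 0.1, 0.5, 1.0):
        print(x, round(mu_law_compress(x), 2))
```

Printing a few values shows how much of the output range is spent on small amplitudes compared with a linear mapping of the same input.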

15


Difference Encoding

• Differential-PCM (DPCM)
– Exploit _________ redundancy in samples
– ___________ between 2 x-bit samples can be represented with significantly fewer than x-bits
– Transmit the difference (rather than the ________)
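Difference encoding as outlined above can be sketched in a few lines. This minimal version is lossless and uses the previous sample as the reference; starting from an implicit previous value of 0 is an assumption, and the function names are illustrative only.

```python
def dpcm_encode(samples: list) -> list:
    """Replace each sample by its difference from the previous sample."""
    diffs, prev = [], 0          # assumed initial reference value of 0
    for s in samples:
        diffs.append(s - prev)
        prev = s
    return diffs

def dpcm_decode(diffs: list) -> list:
    """Rebuild the samples by accumulating the transmitted differences."""
    samples, prev = [], 0
    for d in diffs:
        prev += d
        samples.append(prev)
    return samples

if __name__ == "__main__":
    pcm = [10, 12, 13, 13, 11]
    print(dpcm_encode(pcm))                 # [10, 2, 1, 0, -2]
    print(dpcm_decode(dpcm_encode(pcm)))    # [10, 12, 13, 13, 11]
```

For a slowly varying signal, the differences span a much smaller range than the samples themselves, which is what allows fewer bits per value.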

16

DPCM Working Principle

Previous sampling value

17

Slope Overload Problem

[Figure: waveform with a region marked “Slope Overload”]

• Differences in high-frequency signals near the ___________ frequency cannot be represented with a smaller number of bits!
– Error introduced leads to severe distortion in the ______ frequencies
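Slope overload can be reproduced by limiting the number of bits available for each difference. In the sketch below, `dpcm_track` is a hypothetical helper that returns what a decoder would reconstruct when every difference is clamped to a signed `bits`-bit range; the test signals are made up for illustration.

```python
def dpcm_track(samples: list, bits: int) -> list:
    """Reconstruction obtained when each difference is clamped to `bits` bits."""
    lo = -(1 << (bits - 1))          # most negative representable difference
    hi = (1 << (bits - 1)) - 1       # most positive representable difference
    recon, prev = [], 0
    for s in samples:
        d = max(lo, min(hi, s - prev))   # clamp: this is where overload occurs
        prev += d                         # decoder only ever sees d
        recon.append(prev)
    return recon

if __name__ == "__main__":
    slow = [0, 1, 2, 3, 4, 5]
    fast = [0, 20, 0, 20]
    print(dpcm_track(slow, 3))   # [0, 1, 2, 3, 4, 5]
    print(dpcm_track(fast, 3))   # [0, 3, 0, 3]
```

The slowly changing signal is tracked exactly, while the rapidly changing one is flattened because its differences exceed the representable range.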

18

Adaptive DPCM (ADPCM)

• Use a larger step-size to encode differences between ______-frequency samples & a smaller step-size for differences between ____-frequency samples

• Use ________ sample values to estimate changes in the signal in the near future

19

[Figure: ADPCM block diagram. Input: y-bit PCM sample. Output: x-bit ADPCM “difference”. Blocks: Difference Quantizer, Step-Size Adjuster, Dequantizer, and Predictor (predicted PCM sample n+1), joined by adders.]

ADPCM

• To ensure differences are always small...
– Adaptively change the ____-size (quanta)
– (Adaptively) attempt to _____ next sample value
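The two ideas above, adapting the quantizer and predicting the next sample, can be combined in a toy codec. The sketch below is not IMA ADPCM or G.726: the 3-bit code range, the doubling/halving rule in `_next_step`, and the use of the previous reconstructed sample as the prediction are all simplifying assumptions chosen for illustration.

```python
def _next_step(step: int, code: int) -> int:
    """Hypothetical adaptation rule: grow on large codes, shrink on small ones."""
    if abs(code) >= 3:
        return step * 2
    if abs(code) <= 1:
        return max(1, step // 2)
    return step

def adpcm_encode(samples: list) -> list:
    """Emit one 3-bit code (-4..3) per sample."""
    codes, predicted, step = [], 0, 1
    for s in samples:
        code = max(-4, min(3, round((s - predicted) / step)))
        codes.append(code)
        predicted += code * step          # mirror what the decoder will compute
        step = _next_step(step, code)
    return codes

def adpcm_decode(codes: list) -> list:
    """Rebuild samples using the same prediction and step-size adaptation."""
    samples, predicted, step = [], 0, 1
    for code in codes:
        predicted += code * step
        samples.append(predicted)
        step = _next_step(step, code)
    return samples

if __name__ == "__main__":
    signal = [0] * 3 + [100] * 12
    print(adpcm_decode(adpcm_encode(signal)))
```

Because encoder and decoder apply the same adaptation rule to the same codes, the step size never needs to be transmitted. On a sudden jump the step grows until the reconstruction catches up, then shrinks again for fine tracking.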

20

Psychoacoustic Fundamentals

• Absolute threshold of hearing

• Critical band frequency analysis

• Frequency masking

• Temporal masking

21

Absolute Threshold of Hearing

• Human perception of sound is a function of ________ and signal __________
– (MPEG exploits this relationship.)

• Sampled segments of the source audio waveform are analyzed but only those features _____________ to the ear are transmitted.

• Psychoacoustic model is used to identify _________ masking and ________ masking and eliminate them from the transmitted signal.

[Figure: absolute threshold of hearing. Sound level (dB, 0 to 100) versus frequency (kHz, 0.02 to 20). The curve separates the “Inaudible” and “Audible” regions and is labeled as the maximum allowable energy level for coding distortion.]
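The shape of the threshold curve can be approximated numerically. The sketch below uses a widely quoted closed-form approximation of the threshold in quiet (commonly attributed to Terhardt); the slides do not give a formula, so both the expression and the helper names are assumptions added for illustration.

```python
import math

def threshold_in_quiet_db(freq_hz: float) -> float:
    """Approximate absolute threshold of hearing in dB SPL at freq_hz."""
    f = freq_hz / 1000.0                       # frequency in kHz
    return (3.64 * f ** -0.8
            - 6.5 * math.exp(-0.6 * (f - 3.3) ** 2)
            + 1e-3 * f ** 4)

def is_audible(level_db: float, freq_hz: float) -> bool:
    """True if a lone tone at level_db lies above the threshold curve."""
    return level_db > threshold_in_quiet_db(freq_hz)

if __name__ == "__main__":
    for f in (50, 100, 1000, 3300, 10000, 15000):
        print(f, "Hz:", round(threshold_in_quiet_db(f), 1), "dB")
```

Under this approximation the ear is most sensitive at a few kHz, and components below the curve could be discarded or coarsely quantized without being heard.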

22

Auditory Masking

[Figure: sound level (dB, 0 to 100) versus frequency (kHz, 0.02 to 20), showing the “Inaudible” and “Audible” regions with a masking tone and a masked tone.]

• The presence of tones at certain frequencies makes us unable to perceive tones at other “_________” frequencies
– Humans cannot distinguish between tones within _____ Hz at low frequencies and _____ kHz at high frequencies

23

MPEG Encoder Block Diagram

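[Figure: MPEG encoder block diagram. PCM audio samples (32, 44.1, 48 kHz) pass through Mapping, Quantizer and Coding, and Frame Packing to produce the encoded bitstream. A Psychoacoustic Model and Ancillary Data are additional inputs.]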

24

Vocoding

• Concept: Develop a __________ model of the vocal cords & throat
– Derive/compute _____ parameters for a short interval and transmit to the decoder
– Use the parameters to _______ speech at the decoder

• So what is a good model?
– A “buzzer” in a “tube”!
– The buzzer is characterized by its _________ & _______
– The tube is characterized by its ___________s

25

Vocoding - Basic Concepts

[Figure: speech spectrum, amplitude (0 to 75) versus frequency (kHz)]

• Formant: frequency maxima & minima in the spectrum of the speech signal

• Vocoders code
– _____
– Period
– _________, and
– signaling vocal tract _________ parameters

• Voiced sounds: m, v, and l.

• Unvoiced sounds: f and s.

26

“Buzzer” and “Tube” Model

[Figure: speech example “yadda yadda yadda”]

• Vocoding principles:
– voice = _________s + buzz ______ & intensity
– voice – estimated ________s = “residue”

• Linear Predictive Coding (LPC)
– A sample is represented as a linear combination of ___ previous ________s

$$y(n) = \sum_{k=1}^{p} a_k \, y(n-k) + G \times x(n)$$
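The LPC recurrence on this slide, y(n) = Σ a_k y(n−k) + G·x(n), can be run directly as a synthesis filter. In the sketch below, the function name `lpc_synthesize` is hypothetical, samples before n = 0 are assumed to be zero, and the coefficients and excitation are made-up values.

```python
def lpc_synthesize(coeffs: list, gain: float, excitation: list) -> list:
    """Compute y(n) = sum_{k=1..p} a_k * y(n-k) + G * x(n).

    coeffs[k-1] holds a_k; y(n) is taken as 0 for n < 0 (assumption).
    """
    p = len(coeffs)
    y = []
    for n, x_n in enumerate(excitation):
        acc = gain * x_n                      # the "buzzer" contribution
        for k in range(1, p + 1):
            if n - k >= 0:
                acc += coeffs[k - 1] * y[n - k]   # the "tube" feedback
        y.append(acc)
    return y

if __name__ == "__main__":
    impulse = [1.0, 0.0, 0.0, 0.0]
    print(lpc_synthesize([0.5], 1.0, impulse))   # [1.0, 0.5, 0.25, 0.125]
```

Feeding a periodic pulse train as the excitation would correspond to the “buzzer”, with the coefficients shaping the output as the “tube” does.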

27

LPC

• Decoder artificially generates speech via _________ synthesis
– A mathematical simulation of the _______ as a series of bandpass filters
– Encoder codes & transmits filter _______, pitch period, gain factor, & nature of excitation
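The slides do not say how the encoder derives the filter parameters. One common approach, shown here purely as an assumption, is the autocorrelation method solved with the Levinson-Durbin recursion. The helper names are hypothetical, and the test signal is a synthetic first-order decay rather than real speech.

```python
def autocorrelation(x: list, max_lag: int) -> list:
    """r[k] = sum over n of x[n] * x[n-k], for k = 0..max_lag."""
    return [sum(x[n] * x[n - k] for n in range(k, len(x)))
            for k in range(max_lag + 1)]

def levinson_durbin(r: list, order: int):
    """Solve for a_1..a_p in y(n) ~ sum a_k * y(n-k); returns (coeffs, error)."""
    a = [0.0] * (order + 1)        # a[0] is unused
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err              # reflection coefficient for this stage
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= (1.0 - k * k)       # remaining prediction error energy
    return a[1:], err

if __name__ == "__main__":
    frame = [0.9 ** n for n in range(200)]     # synthetic "speech" frame
    coeffs, err = levinson_durbin(autocorrelation(frame, 2), 2)
    print(coeffs, err)
```

For this synthetic frame the recovered first coefficient is close to 0.9 and the second close to 0, matching the way the signal was generated.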

28

LPC Schematic

29

LPC Related Standards

• Standards:
– Regular Pulse Excited Linear Predictive Coder (RPE-LPC)
  • Digital cellular standard GSM 06.10 (___ kbps)
– Code Excited Linear Predictive Coder (CELP)
  • US Federal Standard 1016 (_____ kbps)
  • Waveform template based to improve sound quality.
– Linear Predictive Coder (LPC)
  • US Federal Standard 1015 (______ kbps)
  • Very synthetic and used primarily in military applications with very limited bandwidth.

30

Networking Concerns

• Audio bandwidth is actually quite small.

• But human sensitivity to loss and noise is quite ________.

• Networking concerns:
– _______ concealment
– ________ control

• Especially for telephony applications.
