1
Audio Coding
2
Digitization Processing
[Figure: analog signal → signal encoder (sampling, quantization) → digital data (storage) → signal decoder]
3
Overview of Today
• PCM (sampling techniques)
  – Linear
  – μ-Law
• DPCM (generic coding techniques)
• ADPCM (generic coding techniques)
• MPEG-1 (psychoacoustic coding)
• Vocoding (speech-specific techniques)
4
Encoder Design
• Bandlimiting filter
  – Smooths the analog signal
• Analog-to-digital converter (ADC)
  – Samples and quantizes the analog signal
5
Bandlimiting filter
Passes only frequency components up to half the sampling rate (the Nyquist frequency).
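In an ADC front end the bandlimiting filter is an analog circuit, but the same low-pass idea can be sketched digitally, e.g. before reducing a sample rate. The sketch below assumes a windowed-sinc FIR design with a Hamming window and 63 taps; `lowpass_fir`, `fir_filter`, and `gain_at` are hypothetical helper names, not part of any library.

```python
import math


def lowpass_fir(cutoff_hz: float, fs_hz: float, num_taps: int = 63) -> list[float]:
    """Windowed-sinc low-pass FIR taps (Hamming window), scaled to unity gain at DC."""
    if num_taps % 2 == 0:
        raise ValueError("use an odd number of taps for a symmetric (linear-phase) filter")
    fc = cutoff_hz / fs_hz  # cutoff as a fraction of the sampling rate
    mid = num_taps // 2
    taps = []
    for n in range(num_taps):
        k = n - mid
        # Ideal (infinitely long) low-pass impulse response, truncated to num_taps
        ideal = 2 * fc if k == 0 else math.sin(2 * math.pi * fc * k) / (math.pi * k)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    total = sum(taps)
    return [t / total for t in taps]


def fir_filter(signal: list[float], taps: list[float]) -> list[float]:
    """Direct-form convolution; samples before the start of the signal are treated as 0."""
    return [
        sum(t * signal[i - j] for j, t in enumerate(taps) if i - j >= 0)
        for i in range(len(signal))
    ]


def gain_at(taps: list[float], freq_hz: float, fs_hz: float) -> float:
    """Magnitude of the filter's frequency response at freq_hz."""
    w = 2 * math.pi * freq_hz / fs_hz
    re = sum(t * math.cos(w * n) for n, t in enumerate(taps))
    im = sum(t * math.sin(w * n) for n, t in enumerate(taps))
    return math.hypot(re, im)
```

With `taps = lowpass_fir(4000, 48000)`, `gain_at(taps, 1000, 48000)` is close to 1 while `gain_at(taps, 12000, 48000)` is well below 0.01: components above the cutoff are suppressed before they can alias.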
6
Analog to digital converter
7
Sampling
• Pulse Amplitude Modulation (PAM)
  – Each sample’s amplitude is represented by 1 ________ value
• Sampling theory (_________)
  – If the input signal has ________ frequency (bandwidth) f, the sampling frequency must be at least ____
  – With a _____-pass filter to interpolate between samples, the input signal can be fully reconstructed
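The reconstruction claim can be illustrated with Whittaker-Shannon (sinc) interpolation, which is what an ideal low-pass filter does to a sample sequence. A minimal sketch, assuming a finite list of samples taken at `fs_hz`; with finitely many samples the result is only approximate and is most accurate away from the ends of the record. `sinc` and `reconstruct` are hypothetical helpers.

```python
import math


def sinc(x: float) -> float:
    """Normalized sinc: sin(pi*x) / (pi*x), with sinc(0) = 1."""
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)


def reconstruct(samples: list[float], fs_hz: float, t: float) -> float:
    """Estimate the band-limited signal at time t (seconds) from samples taken at fs_hz.

    Each sample contributes a sinc pulse centred on its own sampling instant; this is the
    impulse response of an ideal low-pass filter with cutoff fs/2.
    """
    return sum(s * sinc(fs_hz * t - n) for n, s in enumerate(samples))
```

For a 1 kHz tone sampled at 8 kHz, `reconstruct` returns the stored sample at each sampling instant and closely matches `sin(2*pi*1000*t)` halfway between samples near the middle of a 400-sample record.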
8
PCM
• Pulse Code Modulation (PCM)
  – Each sample’s amplitude is represented by an ________ code-word
  – Each bit of resolution adds __ dB of dynamic range
  – The number of bits required depends on the amount of noise that can be tolerated
[Figure: PCM bit stream, with the quantization error (“noise”) marked]
$n = \dfrac{\mathrm{SNR} - 4.77}{6.02}$ (n = number of bits, SNR in dB)
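Rearranging the formula above gives the bit depth needed for a target SNR. A small sketch, assuming SNR is in dB and rounding up to a whole number of bits; the additive constant depends on what is assumed about the signal (for a full-scale sine wave the figure usually quoted is 1.76 dB rather than 4.77 dB). The function names are hypothetical.

```python
import math


def bits_for_snr(snr_db: float) -> int:
    """Smallest whole number of bits n with 6.02*n + 4.77 >= snr_db."""
    return max(1, math.ceil((snr_db - 4.77) / 6.02))


def snr_for_bits(n_bits: int) -> float:
    """The same rule read the other way: SNR (dB) delivered by n-bit linear PCM."""
    return 6.02 * n_bits + 4.77
```

Under this rule a 40 dB target needs `bits_for_snr(40) == 6` bits and a 96 dB target needs 16.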
9
Linear PCM
• Quantization levels are _________ spaced.
• ___ bit samples provide plenty of dynamic range.
• Compact Discs (CDs) use linear PCM.
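Uniform (linear) quantization amounts to scaling and rounding. This sketch assumes input normalized to [-1, 1] and a symmetric code range of ±(2^(bits-1) - 1), a simplification: real 16-bit PCM also uses the code -32768. The names are hypothetical.

```python
def quantize_linear(x: float, bits: int = 16) -> int:
    """Map x in [-1, 1] to a signed integer code on uniformly spaced levels."""
    max_code = 2 ** (bits - 1) - 1  # 32767 for 16-bit samples
    x = max(-1.0, min(1.0, x))      # clip out-of-range input
    return round(x * max_code)      # nearest level; error is at most half a step


def dequantize_linear(code: int, bits: int = 16) -> float:
    """Map an integer code back to its level value in [-1, 1]."""
    return code / (2 ** (bits - 1) - 1)
```

The round-trip error never exceeds half a step (0.5/32767 for 16 bits), and the step is the same size everywhere in the range.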
10
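Undersampling
• Sample rate below the Nyquist rate
  – Frequency components above half the sampling rate alias: they are added to the original signal and cause distortion.
• The bandlimiting low-pass filter (LPF) is also called an antialiasing filter.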
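The distortion comes from aliasing: after sampling at fs, a tone at f is indistinguishable from one folded into the range 0 to fs/2. A sketch assuming a real-valued tone; `alias_frequency` and `sample_cosine` are hypothetical helpers.

```python
import math


def alias_frequency(f_hz: float, fs_hz: float) -> float:
    """Apparent frequency (between 0 and fs/2) of a real tone at f_hz sampled at fs_hz."""
    f = f_hz % fs_hz            # the sampled spectrum repeats every fs
    return min(f, fs_hz - f)    # fold into the first Nyquist zone


def sample_cosine(f_hz: float, fs_hz: float, n_samples: int) -> list[float]:
    """Samples of a unit-amplitude cosine at f_hz."""
    return [math.cos(2 * math.pi * f_hz * n / fs_hz) for n in range(n_samples)]
```

Sampling at 8 kHz, a 7 kHz tone yields the same samples as a 1 kHz tone (`alias_frequency(7000, 8000) == 1000`), which is why such components must be filtered out before sampling.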
11
Quantization intervals
12
Associated waveform set
13
μ-Law companding (ITU Rec. G.711)
• Non-linear quantization of the signal’s amplitude
  – Quantization step-size decreases logarithmically with signal ______
  – Low-amplitude samples are represented with ______ accuracy than high-amplitude samples
  – Humans are less sensitive to changes in “____” sounds than “_____” sounds
14
$f(x) = 127 \cdot \operatorname{sgn}(x) \cdot \dfrac{\ln(1 + \mu\lvert x\rvert)}{\ln(1 + \mu)}$ ($x$ normalized to $[-1, 1]$)
μ-Law companding
• Provides __-bit quality (dynamic range) with an _-bit encoding
• Used in North American & Japanese ISDN voice service
• Simple to compute encoding
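The companding curve f(x) can be sketched directly. This assumes μ = 255 (the value G.711 uses) and implements the continuous formula with 127 levels per polarity; the actual G.711 codec approximates the curve with piecewise-linear segments and its own 8-bit code layout, which this sketch does not reproduce. The names are hypothetical.

```python
import math

MU = 255  # assumption: the mu value used by G.711


def mulaw_compress(x: float, mu: int = MU) -> int:
    """f(x) = 127 * sgn(x) * ln(1 + mu*|x|) / ln(1 + mu), rounded to an integer code."""
    x = max(-1.0, min(1.0, x))
    y = math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)
    return round(127 * y)


def mulaw_expand(code: int, mu: int = MU) -> float:
    """Inverse of the curve: |x| = ((1 + mu) ** |y| - 1) / mu, with y = code / 127."""
    y = code / 127
    return math.copysign(((1 + mu) ** abs(y) - 1) / mu, y)
```

With only 255 codes, a quiet sample such as 0.01 comes back with an error well under 0.001, while a loud sample such as 0.9 may be off by more than 0.01: small amplitudes get fine steps and large amplitudes coarse ones.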
15
Difference Encoding
• Differential PCM (DPCM)
  – Exploit _________ redundancy in samples
  – ___________ between 2 x-bit samples can be represented with significantly fewer than x bits
  – Transmit the difference (rather than the ________)
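A lossless DPCM sketch, assuming integer samples and the simplest predictor (the previous sample, with 0 assumed before the first one). It shows only the "transmit the difference" step; a real coder would then spend fewer bits on the small differences. The names are hypothetical.

```python
def dpcm_encode(samples: list[int]) -> list[int]:
    """Replace each sample by its difference from the previous sample."""
    diffs, prev = [], 0  # both ends assume a predicted value of 0 at the start
    for s in samples:
        diffs.append(s - prev)
        prev = s
    return diffs


def dpcm_decode(diffs: list[int]) -> list[int]:
    """Undo dpcm_encode by accumulating the differences."""
    samples, prev = [], 0
    for d in diffs:
        prev += d
        samples.append(prev)
    return samples
```

`dpcm_encode([100, 102, 105, 104])` gives `[100, 2, 3, -1]`: after the first value, the differences need far fewer bits than the samples themselves.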
16
DPCM Working Principle
[Figure: DPCM working principle; each sample is coded relative to the previous sample value]
17
[Figure: “Slope Overload”]
Slope Overload Problem
• Differences in high-frequency signals near the ___________ frequency cannot be represented with a smaller number of bits!
  – The error introduced leads to severe distortion in the ______ frequencies
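Slope overload can be reproduced with a closed-loop DPCM whose differences are limited to a few quantizer steps. A sketch assuming a fixed step of 0.05 and at most 4 steps up or down per sample (9 difference codes); the names are hypothetical.

```python
import math


def tone(freq_hz: float, fs_hz: float, n_samples: int) -> list[float]:
    """Unit-amplitude sine wave."""
    return [math.sin(2 * math.pi * freq_hz * n / fs_hz) for n in range(n_samples)]


def dpcm_fixed_step(samples: list[float], step: float, max_steps: int) -> list[float]:
    """Closed-loop DPCM with a fixed step; returns what the decoder reconstructs."""
    recon, prev = [], 0.0
    for s in samples:
        q = round((s - prev) / step)            # difference in units of step
        q = max(-max_steps, min(max_steps, q))  # few bits: large differences are clipped
        prev += q * step                        # encoder tracks the decoder's output
        recon.append(prev)
    return recon


def max_error(a: list[float], b: list[float]) -> float:
    return max(abs(x - y) for x, y in zip(a, b))
```

At 8 kHz the largest change this coder can follow is 4 × 0.05 = 0.2 per sample. A 200 Hz tone stays within half a step of the input, while a 3 kHz tone, whose samples change by up to about 1.7 per sample, is tracked badly and the error exceeds 0.5.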
18
Adaptive DPCM (ADPCM)
• Use a larger step-size to encode differences between ______-frequency samples & a smaller step-size for differences between ____-frequency samples
• Use ________ sample values to estimate changes in the signal in the near future
19
[Figure: ADPCM encoder. The predicted PCM sample n+1 is subtracted from the incoming y-bit PCM sample; a difference quantizer, driven by a step-size adjuster, emits the x-bit ADPCM “difference”; a dequantizer feeds that difference back, added to the prediction, into the predictor.]
ADPCM
• To ensure differences are always small...
  – Adaptively change the ____-size (quanta)
  – (Adaptively) attempt to _____ next sample value
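A toy ADPCM sketch, assuming the previous reconstructed sample as the predictor and a made-up adaptation rule (grow the step by a factor of 1.5 after a code of magnitude 3 or more, shrink it by 0.9 otherwise). It is not the IMA or G.726 algorithm, which use their own step tables and predictors; all names and constants are hypothetical. What it shows is that the decoder can repeat the step-size adaptation from the codes alone, so the step size itself never has to be transmitted.

```python
import math


def tone(freq_hz: float, fs_hz: float, n_samples: int) -> list[float]:
    """Unit-amplitude sine wave used as test input."""
    return [math.sin(2 * math.pi * freq_hz * n / fs_hz) for n in range(n_samples)]


def adapt_step(step: float, code: int) -> float:
    """Hypothetical rule: widen the step after large codes, narrow it after small ones."""
    factor = 1.5 if abs(code) >= 3 else 0.9
    return min(1.0, max(1e-4, step * factor))  # keep the step in a sane range


def adpcm_encode(samples: list[float], step: float = 0.01, max_code: int = 4):
    """Return (codes, reconstruction); codes are integers in [-max_code, max_code]."""
    codes, recon, prev = [], [], 0.0
    for s in samples:
        code = max(-max_code, min(max_code, round((s - prev) / step)))
        prev += code * step            # same value the decoder will compute
        codes.append(code)
        recon.append(prev)
        step = adapt_step(step, code)  # encoder and decoder adapt identically
    return codes, recon


def adpcm_decode(codes: list[int], step: float = 0.01) -> list[float]:
    """Rebuild the signal from codes, repeating the encoder's step adaptation."""
    out, prev = [], 0.0
    for code in codes:
        prev += code * step
        out.append(prev)
        step = adapt_step(step, code)
    return out
```

Starting from a step of only 0.01, far too small for a full-scale 200 Hz tone sampled at 8 kHz, the coder widens the step within a few samples and then follows the tone closely.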
20
Psychoacoustic Fundamentals
• Absolute threshold of hearing
• Critical band frequency analysis
• Frequency masking
• Temporal masking
21
Absolute Threshold of Hearing
• Human perception of sound is a function of ________ and signal __________
  – MPEG exploits this relationship.
• Sampled segments of the source audio waveform are analyzed, but only those features _____________ to the ear are transmitted.
• A psychoacoustic model is used to identify _________ masking and ________ masking and to eliminate the masked components from the transmitted signal.
[Figure: absolute threshold of hearing, sound level (dB, 0 to 100) vs. frequency (kHz, 0.02 to 20); sounds below the curve are inaudible, sounds above it audible; the curve marks the maximum allowable energy level for coding distortion]
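The threshold curve is often approximated in closed form by Terhardt's formula, sketched below. The formula and helper names are assumptions here (the slide shows only the plotted curve), and levels are in dB SPL rather than any coder's internal scale.

```python
import math


def threshold_in_quiet_db(f_hz: float) -> float:
    """Approximate absolute threshold of hearing in dB SPL (Terhardt's approximation)."""
    f = f_hz / 1000.0  # kHz
    return 3.64 * f ** -0.8 - 6.5 * math.exp(-0.6 * (f - 3.3) ** 2) + 1e-3 * f ** 4


def is_audible(level_db: float, f_hz: float) -> bool:
    """A component is audible only if it rises above the threshold in quiet."""
    return level_db > threshold_in_quiet_db(f_hz)
```

The curve dips below 0 dB around 3 to 4 kHz, where the ear is most sensitive, and rises steeply at both ends (above 80 dB at 20 Hz). Coding noise kept below it is inaudible, which is why the curve serves as the maximum allowable energy level for coding distortion.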
22
[Figure: auditory masking, sound level (dB) vs. frequency (kHz, 0.02 to 20); a masking tone pushes a nearby masked tone into the inaudible region]
Auditory Masking
• The presence of tones at certain frequencies makes us unable to perceive tones at other “_________” frequencies
  – Humans cannot distinguish between tones within _____ Hz at low frequencies and _____ kHz at high frequencies
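The width of the frequency region over which masking acts (the critical band) grows with frequency. A sketch using Zwicker and Terhardt's published approximations for critical bandwidth and for the Bark scale; the formulas and helper names are assumptions, not something the slide specifies.

```python
import math


def critical_bandwidth_hz(f_hz: float) -> float:
    """Approximate critical bandwidth around f_hz (Zwicker & Terhardt)."""
    return 25 + 75 * (1 + 1.4 * (f_hz / 1000) ** 2) ** 0.69


def bark(f_hz: float) -> float:
    """Approximate critical-band rate (Bark) for a frequency in Hz."""
    return 13 * math.atan(0.00076 * f_hz) + 3.5 * math.atan((f_hz / 7500) ** 2)


def same_critical_band(f1_hz: float, f2_hz: float) -> bool:
    """Rough test: two tones less than 1 Bark apart fall in the same critical band."""
    return abs(bark(f1_hz) - bark(f2_hz)) < 1
```

The bandwidth is about 100 Hz at low frequencies and grows to a few kHz at high frequencies (roughly 2.3 kHz at 10 kHz), and the whole audible range up to 20 kHz maps to roughly 25 Bark.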
23
MPEG Encoder Block Diagram
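[Figure: MPEG encoder. PCM audio samples (32, 44.1, or 48 kHz) → mapping → quantizer and coding → frame packing → encoded bitstream; a psychoacoustic model guides the quantizer and coding; ancillary data is added during frame packing]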
24
Vocoding
• Concept: Develop a __________ model of the vocal cords & throat
  – Derive/compute _____ parameters for a short interval and transmit them to the decoder
  – Use the parameters to _______ speech at the decoder
• So what is a good model?
  – A “buzzer” in a “tube”!
  – The buzzer is characterized by its _________ & _______
  – The tube is characterized by its ___________s
25
[Figure: speech spectrum, amplitude vs. frequency (kHz)]
Vocoding - Basic Concepts
• Formant: frequency maxima & minima in the spectrum of the speech signal
• Vocoders code
  – _____
  – Period
  – _________, and
  – signaling vocal tract _________ parameters
• Voiced sounds: m, v, and l
• Unvoiced sounds: f and s
26
“Buzzer” and “Tube” Model
[Figure: “yadda yadda yadda”]
• Vocoding principles:
  – voice = _________s + buzz ______ & intensity
  – voice − estimated ________s = “residue”
• Linear Predictive Coding (LPC)
  – A sample is represented as a linear combination of ___ previous ________s:
$y(n) = \sum_{k=1}^{p} a_k \, y(n-k) + G \cdot x(n)$
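The predictor coefficients a_k are commonly found with the autocorrelation method and the Levinson-Durbin recursion. A minimal sketch under that assumption (real coders also window each frame and may apply pre-emphasis, both omitted here); the residual it computes corresponds to the “residue” above. The function names are hypothetical.

```python
def autocorrelation(x: list[float], max_lag: int) -> list[float]:
    """r[k] = sum_n x[n] * x[n - k] for k = 0..max_lag."""
    return [sum(x[n] * x[n - k] for n in range(k, len(x))) for k in range(max_lag + 1)]


def levinson_durbin(r: list[float], p: int) -> tuple[list[float], float]:
    """Solve for a[1..p] so that y(n) ~ sum_k a[k] * y(n - k).

    Returns (a, err): a[0] is an unused placeholder so indices match a_k, and err is the
    remaining prediction-error energy (assumed to stay > 0 at every order).
    """
    a = [0.0] * (p + 1)
    err = r[0]
    for i in range(1, p + 1):
        # Reflection coefficient for order i
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1 - k * k
    return a, err


def lpc_residual(y: list[float], a: list[float]) -> list[float]:
    """Residue: what is left of y after subtracting the linear prediction."""
    p = len(a) - 1
    return [y[n] - sum(a[k] * y[n - k] for k in range(1, p + 1) if n - k >= 0)
            for n in range(len(y))]
```

For the decaying sequence y(n) = 0.9^n, which obeys y(n) = 0.9 · y(n-1), a second-order analysis returns a_1 ≈ 0.9 and a_2 ≈ 0, and the residual after the first sample is essentially zero.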
27
LPC
• Decoder artificially generates speech via _________ synthesis
  – A mathematical simulation of the _______ as a series of bandpass filters
• Encoder codes & transmits filter _______, pitch period, gain factor, & nature of excitation
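On the decoder side, the LPC equation y(n) = Σ a_k y(n−k) + G·x(n) turns the transmitted parameters back into sound: the excitation x(n) plays the “buzzer” and the all-pole filter plays the “tube”. A sketch assuming an impulse train at the pitch period for voiced frames and Gaussian white noise for unvoiced ones, the classic simple LPC excitation model; the names are hypothetical.

```python
import random
from typing import Optional


def excitation(n_samples: int, pitch_period: Optional[int] = None, seed: int = 0) -> list[float]:
    """Voiced: impulse every pitch_period samples. Unvoiced (pitch_period=None): white noise."""
    if pitch_period is not None:
        return [1.0 if n % pitch_period == 0 else 0.0 for n in range(n_samples)]
    rng = random.Random(seed)  # seeded so the example is repeatable
    return [rng.gauss(0.0, 1.0) for _ in range(n_samples)]


def lpc_synthesize(a: list[float], gain: float, x: list[float]) -> list[float]:
    """All-pole filter y(n) = sum_{k=1..p} a[k] * y(n - k) + gain * x(n); a[0] is unused."""
    p = len(a) - 1
    y: list[float] = []
    for n, xn in enumerate(x):
        acc = gain * xn
        for k in range(1, p + 1):
            if n - k >= 0:
                acc += a[k] * y[n - k]
        y.append(acc)
    return y
```

For instance, `lpc_synthesize([0.0, 0.5], 1.0, [1.0, 0.0, 0.0, 0.0])` gives `[1.0, 0.5, 0.25, 0.125]`, the impulse response of a one-pole “tube”. Coefficients obtained with the autocorrelation method give a stable filter (in exact arithmetic), so the output does not grow without bound.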
28
LPC Schematic
29
LPC Related Standards
• Standards:
  – Regular Pulse Excited Linear Predictive Coder (RPE-LPC)
    • Digital cellular standard GSM 06.10 (___ kbps)
  – Code Excited Linear Predictive Coder (CELP)
    • US Federal Standard 1016 (_____ kbps)
    • Based on waveform templates to improve sound quality.
  – Linear Predictive Coder (LPC)
    • US Federal Standard 1015 (______ kbps)
    • Very synthetic; used primarily in military applications with very limited bandwidth.
30
Networking Concerns
• Audio bandwidth is actually quite small.
• But human sensitivity to loss and noise is quite ________.
• Networking concerns:
  – _______ concealment
  – ________ control
• Especially for telephony applications.
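One simple form of loss concealment is to replay the last correctly received frame, fading it out if losses continue. A sketch assuming frames arrive as lists of samples with `None` marking a lost packet; the fade factor of 0.5 and the function name are hypothetical, and more elaborate concealment schemes exist.

```python
from typing import Optional


def conceal(frames: list[Optional[list[float]]], frame_len: int, fade: float = 0.5) -> list[list[float]]:
    """Fill lost frames (None) with an attenuated copy of the previous output frame."""
    out: list[list[float]] = []
    last = [0.0] * frame_len  # silence if the very first packets are lost
    for frame in frames:
        if frame is None:     # packet lost: repeat and fade
            last = [s * fade for s in last]
        else:
            last = list(frame)
        out.append(last)
    return out
```

Repeating recent audio is generally less noticeable than inserting silence for a short gap, while the fade keeps a long burst of losses from turning into a sustained repeated sound.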