-
8/13/2019 Speech Coders for Wireless Communication
1/53
Speech Coders for Wireless Communication
-
Courtesy: Communication Networks Research (CNR) Lab., EECS, KAIST
Digital representation of the speech waveform
x(t) → Sampler → x(n) = x(nT) → Quantizer → x̂(n)
x(t): continuous-time, continuous-amplitude
x(n): discrete-time, continuous-amplitude
x̂(n): discrete-time, discrete-amplitude
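The sampler/quantizer chain can be illustrated with a minimal sketch. The uniform mid-rise quantizer, the 440 Hz test tone, and the function names `sample` and `quantize` are assumptions chosen for illustration; the slide does not specify a quantizer type.

```python
import math

def sample(x, fs, n_samples):
    """Sample a continuous-time function x(t) at rate fs: x(n) = x(n*T), T = 1/fs."""
    T = 1.0 / fs
    return [x(n * T) for n in range(n_samples)]

def quantize(samples, bits, full_scale=1.0):
    """Uniform mid-rise quantizer (assumed) with 2**bits levels over [-full_scale, full_scale)."""
    levels = 2 ** bits
    step = 2.0 * full_scale / levels
    out = []
    for s in samples:
        idx = math.floor(s / step)
        idx = max(-levels // 2, min(levels // 2 - 1, idx))  # clip to the valid index range
        out.append((idx + 0.5) * step)                      # reconstruct at the cell midpoint
    return out

def tone(t):
    """Hypothetical test input: a 440 Hz sine with amplitude 0.8."""
    return 0.8 * math.sin(2 * math.pi * 440.0 * t)

x_n = sample(tone, fs=8000, n_samples=80)   # discrete-time, continuous-amplitude
x_hat = quantize(x_n, bits=8)               # discrete-time, discrete-amplitude
max_err = max(abs(a - b) for a, b in zip(x_n, x_hat))
```

As long as the input stays inside the full-scale range, the quantization error is bounded by half a step.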
-
Three acoustic signals
Signal             Frequency range   Sampling rate   PCM bits per sample   PCM bit rate
Telephone speech   300-3,400 Hz*     8 kHz           8                     64 kb/s
Wideband speech    50-7,000 Hz       16 kHz          14                    224 kb/s
Wideband audio     10-20,000 Hz      48 kHz          16                    768 kb/s
* Bandwidth in Europe; 200-3,200 Hz in the United States and Japan
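The PCM bit rates in the table follow from multiplying the sampling rate by the bits per sample. A short check, with `pcm_bit_rate` as an illustrative helper name:

```python
def pcm_bit_rate(sampling_rate_hz, bits_per_sample):
    """PCM bit rate in bits per second."""
    return sampling_rate_hz * bits_per_sample

# (sampling rate in Hz, bits per sample), taken from the table
signals = {
    "Telephone speech": (8_000, 8),
    "Wideband speech": (16_000, 14),
    "Wideband audio": (48_000, 16),
}

rates = {name: pcm_bit_rate(fs, bits) for name, (fs, bits) in signals.items()}
for name, rate in rates.items():
    print(f"{name}: {rate / 1000:g} kb/s")
```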
Frequency response of Telephone transmission channel
-
[Figure: bandwidth axis marked 0 Hz, 4 kHz, 7 kHz, 20 kHz, with labels Telephone, Speech, Music (CD Quality)]
[Figure: Talk → A/D → Encoder (compress) → Storage (record/store) → Decoder (decompress) → D/A → Listen (play)]
-
Hybrid coders
Multi-Pulse Excitation
Efficient at medium bit rates.
A sequence of nonuniformly spaced pulses as an excitation signal
Amplitudes and positions are excitation parameters
Regular-Pulse Excitation (RPE)
Efficient at medium bit rates.
A sequence of uniformly spaced pulses as an excitation signal
The position of the first pulse within a vector and the amplitudes are excitation parameters
Code-Excited Linear Prediction (CELP)
Efficient at low bit rates (below 8 kbps)
A codebook of excitation sequences
Two key issues: the design and the search of the codebook
-
[Figure: Examples of excitations
a) multipulse: pulses with gains g1 ... gk at positions n1 ... nk
b) regular-pulse: pulses with gains g1 ... g6 at uniform spacing K
c) code-excited linear prediction: codebook of codevectors #1 ... #N, with N = 2^M (M = number of transmitted bits)]
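The codebook idea (N = 2^M codevectors, so M bits identify one of them) can be sketched as an exhaustive search. This is a simplified illustration only: it matches the target directly against the codevectors and leaves out the synthesis filter and perceptual weighting that a real CELP search applies. The Gaussian random codebook and the names `build_codebook` and `search_codebook` are assumptions.

```python
import random

def build_codebook(m_bits, length, seed=0):
    """Hypothetical random codebook with N = 2**m_bits codevectors."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(length)]
            for _ in range(2 ** m_bits)]

def search_codebook(target, codebook):
    """Return (index, gain) minimising ||target - gain * codevector||^2."""
    best_index, best_gain, best_score = 0, 0.0, -1.0
    for i, c in enumerate(codebook):
        corr = sum(t * v for t, v in zip(target, c))
        energy = sum(v * v for v in c)
        if energy == 0.0:
            continue
        score = corr * corr / energy      # error reduction with the optimal gain
        if score > best_score:
            best_index, best_gain, best_score = i, corr / energy, score
    return best_index, best_gain

codebook = build_codebook(m_bits=7, length=40)   # N = 128 codevectors
target = [2.5 * v for v in codebook[37]]         # a scaled copy of one entry
index, gain = search_codebook(target, codebook)
```

Only the index (M bits) and a quantized gain need to be transmitted, which is why the codebook size directly sets the excitation bit cost.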
-
Speech Compression Standards
64 kbps          μ-law/A-law PCM (CCITT G.711)
64 kbps          7 kHz Subband/ADPCM (CCITT G.722)
32 kbps          ADPCM (CCITT G.721)
16 kbps          Low Delay CELP (CCITT G.728)
13 kbps          RPE-LTP (GSM 06.10)
12.2 kbps        ACELP (GSM 06.60)
13 kbps          QCELP (US CDMA Cellular)
8 kbps           QCELP (US CDMA Cellular)
8 kbps           VSELP (US TDMA Cellular)
8 kbps           CS-ACELP (ITU G.729)
6.7 kbps         VSELP (Japan Digital Cellular)
6.4 kbps         IMBE (Inmarsat Voice Coding Standard)
5.3 & 6.3 kbps   True Speech Coder (ITU G.723.1)
4.8 kbps         CELP (Fed. Standard 1016, STU-3)
2.4 kbps         LPC (Fed. Standard 1015, LPC-10E)
-
Performance of speech codec
Speech Quality (SNR/SEGSNR, MOS, etc)
Bit Rate (bits per second)
Complexity (MIPS)
Coding Delay (msec)
-
Requirements of speech codec
for digital cellular
More channel capacity
Noise immunity
Encryption
Reasonable complexity and encoding delay
-
Vocoders
-
Anatomy of Speech Organs:
The source of most speech occurs in the larynx.
It contains two folds of tissue called the vocal folds
or vocal cords which can open and shut like a pair of
fans.
The gap between the vocal cords is called the glottis
and as air is forced through the glottis the vocal cords
will start to vibrate and modulate the air flow.
This process is known as phonation.
The frequency of vibration determines the pitch of the voice:
for a male speaker it is typically in the range 50-200 Hz;
for a female speaker it can be up to 500 Hz.
-
Glottal Pulse
[Figure: glottal pulse, amplitude vs. time (ms), showing the opening phase, closing phase, and closure]
Period = 12.5 ms
Fundamental frequency = 1/0.0125 s = 80 Hz
Rosenberg, JASA 49, 1971
-
Spectrum of glottal pulse
[Figure: intensity vs. frequency (Hz)]
Harmonics of the spectrum are spaced at 80 Hz, corresponding to a pitch period of 12.5 ms.
-
Spectrum of glottal pulse
filtered by the vocal tract
[Figure: intensity vs. frequency (Hz)]
Harmonics of the spectrum are spaced at 80 Hz, corresponding to a pitch period of 12.5 ms.
-
/ee/ /ar/ /uu/
-
Properties of Speech in Brief
Vowels: "ee" in key, "o" in spot, "oo" in blue, "e" in again
Quasi-periodic
Relatively high signal power
Consonants: "s" in spot, "k" in key
Non-periodic (random)
Relatively low signal power
-
Wrong /r/ /o/ /ng/
-
Moving /m/ /uu/ /v/ /i/ /ng/
-
Southampton /s/ /ou/ /th/ /aa/ /m/ /p/ /t/ /a/ /n/
-
Digital speech model
A basic digital model for speech production
[Figure: a periodic signal generator and a random signal generator provide the excitation, which is multiplied by a gain and passed through a linear time-variant filter]
-
Vocoder
Send three kinds of information to the
receiver:
(1) voiced or unvoiced signal,
(2) if it is voiced, the period of the excitation
signal,
(3) the parameters of the prediction filter
-
Vocoder
[Figure: vocoder encoder/decoder block diagram. Encoder: voice classification, pitch recognition, determine filter coeff. Decoder: excitation signal gen., digital filter.]
-
LPC Introduction
These speech coders are called vocoders (voice coders).
They usually provide more bandwidth compression than is possible with waveform coding (2400-9600 bps).
Basic Idea
Estimate parameters → Encode parameters → Transmit parameters → Decode parameters → Synthesise speech
-
Generalities
LP Model
Parameter Estimation
Typical Memory requirements
-
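LP Model
[Figure: an impulse generator (driven by the pitch period) and a white noise generator feed a voice/unvoice switch; the selected excitation is scaled by the gain and passed through an all-pole filter, which combines the glottal filter, vocal tract filter, and lip radiation filter, to produce the speech signal]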
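The LP model can be sketched as a small synthesis routine. The sign convention s(n) = G·e(n) + Σ a_k·s(n−k), the unit-amplitude impulse train, the example coefficients, and the name `synthesize` are assumptions made for illustration.

```python
import random

def synthesize(a, gain, n_samples, voiced, pitch_period=80, seed=0):
    """All-pole synthesis: s(n) = gain * e(n) + sum_k a[k-1] * s(n-k).

    voiced=True  -> excitation is an impulse train with the given pitch period
    voiced=False -> excitation is white Gaussian noise
    """
    rng = random.Random(seed)
    s = []
    for n in range(n_samples):
        if voiced:
            e = 1.0 if n % pitch_period == 0 else 0.0
        else:
            e = rng.gauss(0.0, 1.0)
        y = gain * e
        for k, ak in enumerate(a, start=1):
            if n - k >= 0:
                y += ak * s[n - k]       # feedback from past output samples
        s.append(y)
    return s

# Hypothetical stable 2nd-order filter; 80-sample period = 100 Hz pitch at 8 kHz
voiced_frame = synthesize([1.3, -0.6], gain=1.0, n_samples=160, voiced=True)
unvoiced_frame = synthesize([1.3, -0.6], gain=0.1, n_samples=160, voiced=False)
```

Per frame, only the filter coefficients, the gain, the voiced/unvoiced flag, and the pitch period are needed to drive this model.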
-
Parameter Estimation
Therefore, for each frame:
estimate the LP coefficients (the a_i's)
estimate the gain
estimate the type of excitation (voiced or unvoiced)
estimate the pitch
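One common way to estimate the LP coefficients of a frame is the autocorrelation method solved with the Levinson-Durbin recursion, sketched below. The slides do not say which method is intended here, so this choice, the predictor convention ŝ(n) = Σ a_k·s(n−k), and the function names are assumptions.

```python
def autocorr(frame, max_lag):
    """Autocorrelation r(0..max_lag) of one frame (rectangular window)."""
    n = len(frame)
    return [sum(frame[i] * frame[i + k] for i in range(n - k))
            for k in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Solve the LP normal equations from autocorrelation values r.

    Returns (a, reflection, error): predictor coefficients a[0..order-1],
    reflection coefficients, and the final prediction error energy.
    Assumes r[0] > 0 (a non-silent frame).
    """
    a = [0.0] * (order + 1)              # a[0] is unused
    err = r[0]
    reflection = []
    for i in range(1, order + 1):
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err
        reflection.append(k)
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= (1.0 - k * k)             # error energy shrinks at each order
    return a[1:], reflection, err

# Autocorrelation of an ideal first-order process with coefficient 0.5
coeffs, refl, err = levinson_durbin([1.0, 0.5, 0.25], order=2)
```

The recursion yields the reflection coefficients as a by-product, and the final error energy is commonly used to derive the gain.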
-
V/UV Estimation
Several methods:
Energy of Signal
Zero Crossing Rate
Autocorrelation Coefficient
[Figure: speech waveform with segments labelled S, U, and V]
-
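Speech Measurements (1)
Zero Crossing Rate
Log Energy E_s
$$E_s = 10 \log_{10}\left(\frac{1}{N}\sum_{n=1}^{N} s^2(n)\right)$$
Normalized Autocorrelation Coefficient C_1
$$C_1 = \frac{\sum_{n=1}^{N} s(n)\,s(n-1)}{\sqrt{\left(\sum_{n=1}^{N} s^2(n)\right)\left(\sum_{n=0}^{N-1} s^2(n)\right)}}$$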
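The three measurements can be computed per frame as in the sketch below. The small constant `eps` guarding the logarithm, the per-sample normalisation of the zero crossing rate, and the function names are assumptions; the frame is taken as samples s(0) ... s(N).

```python
import math

def zero_crossing_rate(s):
    """Fraction of consecutive sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(s, s[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(s) - 1)

def log_energy(s, eps=1e-12):
    """Log energy in dB; eps (an assumption) avoids log(0) on silent frames."""
    return 10.0 * math.log10(eps + sum(v * v for v in s) / len(s))

def normalized_autocorr_lag1(s):
    """C_1: correlation between adjacent samples, in [-1, 1]."""
    num = sum(s[n] * s[n - 1] for n in range(1, len(s)))
    den = math.sqrt(sum(v * v for v in s[1:]) * sum(v * v for v in s[:-1]))
    return num / den if den > 0 else 0.0

alternating = [1.0, -1.0, 1.0, -1.0, 1.0]   # noise-like: many crossings, C_1 = -1
constant = [1.0, 1.0, 1.0, 1.0]             # strongly correlated: no crossings, C_1 = 1
```

Voiced frames tend to show high energy, a low zero crossing rate, and C_1 close to 1, while unvoiced frames tend toward the opposite, which is what makes these features useful for V/UV classification.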
-
[Figure: comparison between actual data and V/U/S determination results, with segments labelled V, U, and S]
-
Pitch Detection
Voiced sounds
Produced by forcing air through the glottis
Vocal cords oscillate and modulate the air flow into quasi-periodic pulses
Pulses excite resonances in the remainder of the vocal tract
Different sounds are produced as muscles work to change the shape of the vocal tract
Resonant frequencies, or formant frequencies
Fundamental frequency, or pitch: the rate of the pulses
-
Pitch Detection
Short sections of voiced speech and unvoiced speech
[Figure: two panels, voiced speech and unvoiced speech, amplitude vs. sample number]
-
Time-domain pitch estimation
Well studied area
Variations of the fundamental frequency are evident
Time-domain speech processing should be capable of detecting the pitch frequency
[Figure: speech waveform, amplitude vs. sample number]
-
Pitch Period Estimation Using the Auto-correlation Function
Periodic signals have a periodic auto-correlation function
Basic problem in choosing the window length N:
Speech changes over time, so N should be small, but the window must span at least 2 periods of the waveform
Approaches:
Choose the window to catch the longest period
Adaptive N
Use the modified short-time auto-correlation function
$$R_n(k) = \sum_{m=0}^{N-1-k} [x(n+m)\,w'(m)]\,[x(n+m+k)\,w'(k+m)]$$
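A minimal sketch of pitch estimation by picking the auto-correlation peak follows. It assumes a rectangular window (w'(m) = 1), a frame starting at n = 0, and a lag search range of 20 to 160 samples; the synthetic two-harmonic test signal and the function names are also assumptions.

```python
import math

def short_time_autocorr(x, k):
    """R(k) = sum_{m=0}^{N-1-k} x(m) * x(m+k), rectangular window assumed."""
    n = len(x)
    return sum(x[m] * x[m + k] for m in range(n - k))

def estimate_pitch_period(frame, min_lag, max_lag):
    """Return the lag in [min_lag, max_lag] with the largest auto-correlation."""
    return max(range(min_lag, max_lag + 1),
               key=lambda k: short_time_autocorr(frame, k))

fs = 8000
# Synthetic "voiced" frame: 100 Hz fundamental plus its second harmonic
frame = [math.sin(2 * math.pi * 100 * n / fs)
         + 0.5 * math.sin(2 * math.pi * 200 * n / fs)
         for n in range(400)]

lag = estimate_pitch_period(frame, min_lag=20, max_lag=160)
f0 = fs / lag   # pitch frequency in Hz
```

The 400-sample frame spans five pitch periods, which satisfies the requirement of at least two periods inside the window.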
-
Pitch Period Estimation Using the Auto-correlation Function (Contd.)
The auto-correlation representation retains too much of the information in the speech signal
=> the auto-correlation function has many peaks
[Figure: auto-correlation function of a speech frame]
-
Spectrum-flattening techniques
Remove the effects of the vocal tract transfer function
Center clipping: a nonlinear transformation whose clipping value depends on the maximum amplitude
=> Strong peak at the pitch frequency
[Figure: speech waveform (amplitude vs. sample number) and two further plots over the same 0-700 range]
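Center clipping can be sketched as follows. The clipping level is set to a fraction of the frame's maximum amplitude; the value 0.3 and the name `center_clip` are illustrative assumptions, not taken from the slides.

```python
def center_clip(frame, ratio=0.3):
    """Center clipper: zero out samples within +/- c, shift the rest toward zero.

    c = ratio * max|frame|, so the clipping value tracks the maximum amplitude.
    """
    c = ratio * max(abs(v) for v in frame)
    out = []
    for v in frame:
        if v > c:
            out.append(v - c)
        elif v < -c:
            out.append(v + c)
        else:
            out.append(0.0)
    return out

clipped = center_clip([1.0, 0.2, -0.5, -1.0], ratio=0.3)
```

Only the largest excursions survive, so the auto-correlation of the clipped signal is dominated by the pitch pulses rather than by formant structure.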
-
Fundamental Frequency
F0 estimation (Hess): determining the main period in a quasi-periodic waveform, usually using the autocorrelation function (ACF) and the average magnitude difference function (AMDF)
$$\mathrm{AMDF}_t(m) = \frac{1}{N_p} \sum_{n} \left| s_t(n) - s_t(n+m) \right|, \quad 0 \le n,\; n+m \le L-1$$
where L is the frame length and N_p is the number of point pairs (a peak in the ACF and a valley in the AMDF indicate F0)
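The AMDF-based estimate can be sketched as below: the lag with the deepest valley is taken as the pitch period. The sawtooth test signal, the lag range of 20 to 120 samples, and the function names are assumptions for illustration.

```python
def amdf(frame, m):
    """Average magnitude difference at lag m over all valid point pairs."""
    pairs = len(frame) - m               # N_p: number of point pairs
    return sum(abs(frame[n] - frame[n + m]) for n in range(pairs)) / pairs

def estimate_pitch_amdf(frame, min_lag, max_lag):
    """Return the lag in [min_lag, max_lag] where the AMDF is smallest."""
    return min(range(min_lag, max_lag + 1), key=lambda m: amdf(frame, m))

# Synthetic periodic frame: sawtooth with a period of 80 samples
frame = [(n % 80) / 80.0 for n in range(400)]
lag = estimate_pitch_amdf(frame, min_lag=20, max_lag=120)
```

The AMDF needs only subtractions and absolute values, which made it attractive for low-complexity coders. Valleys also appear at multiples of the period, so the search range matters.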
-
Typical Memory Requirements
Pitch coefficient (6 bits)
Gain (5 bits)
Model parameters:
LP coefficients (8-10 bits)
Small changes in the LP coefficients result in large changes in the pole positions.
Reflection coefficients (6 bits)
If |r_k| is near 1, quantization causes large distortion.
Log-area ratios:
A non-linear transformation of the reflection coefficients that expands the scale near |r_k| = 1.
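The log-area ratio mapping can be sketched as follows, assuming the common definition LAR_k = log((1 + r_k)/(1 − r_k)); sign conventions differ between texts, and the function names are illustrative.

```python
import math

def reflection_to_lar(r):
    """LAR_k = log((1 + r_k) / (1 - r_k)); requires |r_k| < 1."""
    return [math.log((1.0 + rk) / (1.0 - rk)) for rk in r]

def lar_to_reflection(lar):
    """Inverse mapping: r_k = tanh(LAR_k / 2)."""
    return [math.tanh(g / 2.0) for g in lar]

r = [0.10, 0.11, 0.98, 0.99]
lar = reflection_to_lar(r)
# The same 0.01 step in r covers a much larger LAR interval near |r| = 1
step_small_r = lar[1] - lar[0]
step_large_r = lar[3] - lar[2]
```

Because the scale is stretched near |r_k| = 1, a uniform quantizer applied to the LARs spends its resolution where the reflection coefficients are most sensitive.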
-
The main difference among LP vocoders is how the excitation source is calculated.
LPC-10
-
LPC-10
[Figure: LPC-10 encoder and decoder block diagram]
Encoder: speech signal → ADC (8 kHz) → window (180 samples) → LP analysis (covariance method) → reflection coefficients (4 bits) → non-linear warping → LAR coefficients (4 bits and 5 bits); AMDF and zero crossing → voice/unvoice switch (1 bit) and pitch frequency (7 bits)
Channel
Decoder: impulse generator (pitch period, 7 bits) or white noise generator, selected by the voice/unvoice switch (1 bit) → gain (5 bits) → 1/A(z) with 10 reflection coefficients (5 bits for one and 4 bits for the others) → synthesized speech signal
-
RELP
A simple vocoder offers poor sound quality and is usually unsatisfactory.
An improvement is to use the prediction error, rather than the periodic pulse train (for voiced signals) or the random noise (for unvoiced signals), to excite the digital filter that reproduces the speech. The prediction error is also called the residual.
This scheme is called Residual Excited Linear Prediction (RELP) coding.
-
RELP
[Figure: RELP encoder: determine filter coeff., digital filter, subtraction (-), quantization, encoder]
-
RELP
RELP follows essentially the same idea as DPCM.
However, in RELP the speech signal is divided into blocks (20 ms/block), and the optimum linear predictor is designed for each block.
For each block, the filter coefficients and the prediction error are sent to the receiver.
In DPCM, the predictor can be fixed or adaptive, and only the prediction error is sent to the receiver.
-
Modeling of the prediction
error
In each block of the speech signal (a frame), the prediction error may still be correlated.
To decorrelate the prediction error, each frame is further divided into 4 sub-frames (5 ms each). The
prediction error u(n) is then modelled as
where M (40
-
Long-term prediction
The decorrelation of the prediction error is
called long-term prediction.
[Figure: s(n) → determine filter coeff. and digital filter A(z) → u(n), U(z) → long-term prediction → e(n) → encoder]
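Long-term prediction can be sketched under the assumption that the residual is modelled as u(n) = h·u(n−M) + e(n), with lag M and gain h as named in the RPE-LTP slides. The explicit model equation is not recoverable from this deck, so this form, the lag range of 40 to 120 samples, and the name `long_term_predict` are assumptions.

```python
import random

def long_term_predict(u, start, length, min_lag, max_lag):
    """Find lag M and gain h minimising sum e(n)^2 over one sub-frame,
    where e(n) = u(n) - h * u(n - M). Requires start >= max_lag.
    Returns (lag, gain, residual e)."""
    best_lag, best_gain, best_score = min_lag, 0.0, -1.0
    span = range(start, start + length)
    for lag in range(min_lag, max_lag + 1):
        corr = sum(u[n] * u[n - lag] for n in span)
        energy = sum(u[n - lag] ** 2 for n in span)
        if energy > 0.0 and corr > 0.0:
            score = corr * corr / energy     # error reduction with the optimal gain
            if score > best_score:
                best_lag, best_gain, best_score = lag, corr / energy, score
    e = [u[n] - best_gain * u[n - best_lag] for n in span]
    return best_lag, best_gain, e

# Synthetic residual that repeats with lag 70 and gain 0.8
rng = random.Random(1)
u = [rng.gauss(0.0, 1.0) for _ in range(70)]
for n in range(70, 200):
    u.append(0.8 * u[n - 70])

lag, gain, e = long_term_predict(u, start=160, length=40, min_lag=40, max_lag=120)
```

For this perfectly repetitive input the long-term residual e(n) is essentially zero, which is the decorrelation effect the slide refers to.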
-
RPE-LTP
The RPE-LTP has been adopted as the
speech coding method in the GSM 06.10
standard
[Figure: RPE-LTP encoder: determine filter coeff., digital filter, subtraction (-), long-term prediction, regular pulse selection and coding]
-
RPE-LTP
Speech is sampled at 8 kHz and quantised to 8 bits/sample.
The speech signal is pre-processed to remove any DC component and to pre-emphasise the high-frequency components, partly compensating for their low energy.
The signal is then divided into frames (20 ms, 160 samples). An eighth-order optimum linear predictor is designed using the Schur algorithm.
The reflection coefficients (related to the filter coefficients) are nonlinearly mapped to another set of values called log-area ratios (LAR).
-
RPE-LTP
The 8 LAR parameters are quantized using 6, 6, 5, 5, 4, 4, 3, 3 bits, giving a total of 36 bits for the LARs (that is, for the filter coefficients).
The frame is passed through this filter, producing u(n).
u(n) is then divided into 4 sub-frames (5 ms each, 40 samples).
Long-term prediction is performed for each sub-frame. The lag M is quantized to 7 bits and the gain h is represented by 2 bits.
Long-term prediction produces e(n).
-
RPE-LTP
e(n) is down-sampled by a factor of 3. For each sub-frame, there are 4 down-sampling patterns, so 2 bits are needed to specify the pattern used.
The down-sampled e(n) has 13 samples. The maximum of them is quantized to 6 bits; the samples are then normalised and each represented by 3 bits.
So in each sub-frame, e(n) is represented by 6 + 13*3 = 6 + 39 = 45 bits.
A frame has 4 sub-frames: 4*(6+39) = 180 bits.
The above method is called regular-pulse excitation (RPE).
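The grid selection step can be sketched as below. Choosing the pattern with the highest energy is an assumption (the slides do not state the selection criterion), quantization of the maximum and the pulses is omitted, and `rpe_select` is an illustrative name.

```python
def rpe_select(e, factor=3, pulses=13):
    """Pick one of the down-sampling patterns for a sub-frame residual e.

    Pattern m keeps samples e[m], e[m+3], ..., e[m+36]; for 40 samples there
    are 4 such patterns. Returns (pattern index, block maximum, normalised pulses).
    """
    n_patterns = len(e) - factor * (pulses - 1)      # 40 - 36 = 4
    candidates = [[e[m + factor * i] for i in range(pulses)]
                  for m in range(n_patterns)]
    pattern = max(range(n_patterns),
                  key=lambda m: sum(v * v for v in candidates[m]))
    chosen = candidates[pattern]
    x_max = max(abs(v) for v in chosen)
    normalised = [v / x_max if x_max > 0 else 0.0 for v in chosen]
    return pattern, x_max, normalised

# Sub-frame with energy concentrated on positions 2 and 5 (pattern 2)
e = [0.0] * 40
e[2], e[5] = 5.0, -3.0
pattern, x_max, pulses = rpe_select(e)
```

The pattern index corresponds to the 2-bit pattern code, the block maximum to the 6-bit value, and the 13 normalised pulses to the 3-bit samples.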
-
13 kbps RPE-LTP coder
- Encoder -
[Figure: encoder block diagram. Input signal → pre-processing → short-term LPC analysis and short-term analysis filter → LTP analysis → RPE grid selection and coding, with a feedback path through RPE grid decoding and the synthesis filter 1/A(z/5). Outputs: reflection coefficients (36 bits / 20 ms), LTP parameters, RPE parameters (13 pulses / 5 ms).]
-
- Decoder -
[Figure: decoder block diagram. RPE parameters → RPE grid decoding → synthesis filter 1/A(z/5) (driven by the LTP parameters) → short-term synthesis (driven by the reflection coefficients) → post-processing → output signal.]
-
RPE-LTP
Summary
8 LAR coefficients      36 bits
For each sub-frame:
  pattern code          2
  lag                   7
  gain                  2
  regular pulse         6+39
  total                 56
4 sub-frames            4*56 = 224
Total for one frame     224+36 = 260
Bit rate                260 bits / 20 ms = 13 kbps
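The bit budget can be re-derived with a few lines of arithmetic. The dictionary keys are illustrative labels; the numbers are those given in the summary.

```python
lar_bits = [6, 6, 5, 5, 4, 4, 3, 3]          # 8 LAR coefficients

per_subframe = {
    "pattern code": 2,
    "lag": 7,
    "gain": 2,
    "block maximum": 6,
    "13 pulses x 3 bits": 13 * 3,
}

subframe_bits = sum(per_subframe.values())        # 56
frame_bits = sum(lar_bits) + 4 * subframe_bits    # 36 + 224 = 260
frames_per_second = 1000 // 20                    # one frame every 20 ms
bit_rate = frame_bits * frames_per_second         # bits per second

print(f"{frame_bits} bits/frame -> {bit_rate / 1000:g} kbps")
```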