PATTERN COMPARISON TECHNIQUES
Test pattern: $T = \{t_1, t_2, t_3, \ldots, t_I\}$
Reference pattern: $R^j = \{t_1^j, t_2^j, \ldots, t_J^j\}$
4.3 DISTORTION MEASURES - MATHEMATICAL CONSIDERATIONS
x and y: two feature vectors defined on a vector space X.
The properties of a metric or distance function d:
(a) $0 \le d(x, y) < \infty$ for $x, y \in X$, and $d(x, y) = 0$ if and only if $x = y$;
(b) $d(x, y) = d(y, x)$ for $x, y \in X$;
(c) $d(x, y) \le d(x, z) + d(y, z)$ for $x, y, z \in X$.
A distance function is called invariant if
(d) $d(x + z, y + z) = d(x, y)$.
PERCEPTUAL CONSIDERATIONS
Spectral changes that do not fundamentally change the perceived sound include:
Spectral changes that lead to phonetically different sounds include:
Just-discriminable change: known as the JND (just-noticeable difference), DL (difference limen), or differential threshold.
4.4 DISTORTION MEASURES - PERCEPTUAL CONSIDERATIONS
Spectral Distortion Measures
Spectral Density
Fourier Coefficients of Spectral Density
Autocorrelation Function
Short-term autocorrelation
Then $S(\omega)$ is an energy spectral density.
If $\sigma/A(z)$ is the all-pole model for the speech spectrum, the residual energy resulting from “inverse filtering” the input signal with an all-zero filter $A(z)$ is:
Important properties of all-pole modeling:
The recursive minimization relationship:
CEPSTRAL DISTANCES
The complex cepstrum of a signal is defined as the Fourier transform of the log of the signal spectrum.
The Fourier series representation of $\log S(\omega)$ can be expressed as:
$$\log S(\omega) = \sum_{n=-\infty}^{\infty} c_n e^{-jn\omega},$$
where $c_n = c_{-n}$ are real and referred to as the cepstral coefficients.
Note that:
$$c_0 = \int_{-\pi}^{\pi} \log S(\omega)\,\frac{d\omega}{2\pi}.$$
For a pair of spectra, by applying Parseval's theorem, we can relate the $L_2$ cepstral distance to the rms log spectral distance:
$$d_2^2 = \int_{-\pi}^{\pi} \left|\log S(\omega) - \log S'(\omega)\right|^2 \frac{d\omega}{2\pi} = \sum_{n=-\infty}^{\infty} (c_n - c_n')^2.$$
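The Parseval relation above can be checked numerically. The following is a minimal sketch, assuming the spectra are sampled on a uniform FFT grid; the two test spectra and the names `cepstrum_from_spectrum`, `S1`, and `S2` are illustrative choices, not taken from the slides.

```python
import numpy as np

def cepstrum_from_spectrum(S):
    """Cepstral coefficients of a power spectrum sampled on N uniform FFT bins."""
    # log S(w_k) = sum_n c_n exp(-j n w_k), so c is the inverse DFT of log S.
    return np.fft.ifft(np.log(S)).real

N = 512
w = 2 * np.pi * np.arange(N) / N
# Two hypothetical all-pole power spectra, chosen only for illustration.
S1 = 1.0 / np.abs(1 - 0.9 * np.exp(-1j * w)) ** 2
S2 = 1.0 / np.abs(1 - 0.7 * np.exp(-1j * w)) ** 2

c1 = cepstrum_from_spectrum(S1)
c2 = cepstrum_from_spectrum(S2)

# The mean over the grid approximates the integral of |log S - log S'|^2 dw/2pi.
d2_spectral = np.mean((np.log(S1) - np.log(S2)) ** 2)
# The sum runs over all N bins, i.e. over positive and negative indices n.
d2_cepstral = np.sum((c1 - c2) ** 2)

print(d2_spectral, d2_cepstral)
```

On a discrete grid the two quantities agree up to rounding, because the DFT satisfies its own Parseval identity.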
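Laurent expansion:
$$\log[\sigma/A(z)] = \log\sigma + \sum_{n=1}^{\infty} c_n z^{-n}.$$
Differentiating both sides of the equation with respect to $z^{-1}$ and equating the coefficients of like powers of $z^{-1}$, we derive:
$$c_n = -a_n - \frac{1}{n}\sum_{k=1}^{n-1} k\,c_k\,a_{n-k} \quad \text{for } n > 0,$$
where $a_0 = 1$ and $a_k = 0$ for $k > p$.
In terms of the log power spectrum, the Taylor series expansion becomes:
$$\log\left[\sigma^2/|A(e^{j\omega})|^2\right] = \sum_{n=-\infty}^{\infty} c_n e^{-jn\omega}, \quad \text{where } c_0 = \log\sigma^2 \text{ and } c_{-n} = c_n.$$
Truncated cepstral distance:
$$d_c^2(L) = \sum_{n=1}^{L} (c_n - c_n')^2.$$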
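The recursion from LPC coefficients to cepstral coefficients, and the truncated cepstral distance built on it, can be sketched as follows. This assumes the convention $A(z) = 1 + a_1 z^{-1} + \ldots + a_p z^{-p}$; the function names are hypothetical.

```python
import numpy as np

def lpc_to_cepstrum(a, L):
    """Cepstral coefficients c_1..c_L of sigma/A(z).

    `a` holds [a_1, ..., a_p] for A(z) = 1 + a_1 z^-1 + ... + a_p z^-p.
    """
    p = len(a)
    c = np.zeros(L + 1)  # c[0] is left unused; it would hold the gain term
    for n in range(1, L + 1):
        a_n = a[n - 1] if n <= p else 0.0
        # sum_{k=1}^{n-1} k * c_k * a_{n-k}, with a_m = 0 for m > p
        acc = sum(k * c[k] * a[n - k - 1] for k in range(1, n) if n - k <= p)
        c[n] = -a_n - acc / n
    return c[1:]

def truncated_cepstral_distance(c, c_ref):
    """d_c^2(L): sum of squared differences of the first L cepstral coefficients."""
    diff = np.asarray(c, dtype=float) - np.asarray(c_ref, dtype=float)
    return float(np.sum(diff ** 2))

# One-pole example: 1/A(z) with A(z) = 1 - 0.9 z^-1 has c_n = 0.9**n / n.
c = lpc_to_cepstrum([-0.9], 4)
print(c)
```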
Weighted Cepstral Distances and Liftering
It can be shown that under certain regularity conditions, the cepstral coefficients, except $c_0$, have:
1) Zero means
2) Variances essentially inversely proportional to the square of the coefficient index:
$$E\{c_n^2\} \sim \frac{1}{n^2}$$
If we normalize the cepstral distance by the inverse of the variance:
$$d_{2W}^2 = \sum_{n=-\infty}^{\infty} n^2 (c_n - c_n')^2$$
Differentiating both sides of the Fourier series equation of the spectrum:
$$\frac{\partial \log S(\omega)}{\partial\omega} = \sum_{n=-\infty}^{\infty} (-jn)\,c_n e^{-jn\omega}$$
This is an $L_2$ distance based upon the differences between the spectral slopes.
Cepstral Weighting or Liftering Procedure
$h$ is usually chosen as $L/2$, and $L$ is typically 10 to 16.
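The slide does not reproduce the lifter itself. A minimal sketch, assuming the commonly used raised-sine form $w(n) = 1 + h\sin(n\pi/L)$ for $n = 1, \ldots, L$ (an assumption, not confirmed by the transcript), could look like this; the function names are hypothetical.

```python
import numpy as np

def raised_sine_lifter(L, h=None):
    """Assumed lifter w(n) = 1 + h*sin(n*pi/L), n = 1..L; h defaults to L/2."""
    if h is None:
        h = L / 2
    n = np.arange(1, L + 1)
    return 1.0 + h * np.sin(np.pi * n / L)

def weighted_cepstral_distance(c, c_ref, w):
    """Sum over n of (w(n) * (c_n - c_n'))^2 for n = 1..L."""
    diff = np.asarray(w) * (np.asarray(c, dtype=float) - np.asarray(c_ref, dtype=float))
    return float(np.sum(diff ** 2))

w = raised_sine_lifter(12)
print(w)
```

Passing `w = np.arange(1, L + 1)` instead gives the index-weighted (spectral slope) distance truncated to L terms.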
Likelihood Distortions
Previously defined: the Itakura-Saito distortion measure,
where $\sigma_\infty^2$ and ${\sigma'_\infty}^2$ are the one-step prediction errors of $S(\omega)$ and $S'(\omega)$, as defined:
The residual energy can be easily evaluated by:
By replacing $S(\omega)$ by its optimal p-th order LPC model spectrum:
If we set $\sigma^2$ to match the residual energy $\alpha$:
which is often referred to as the Itakura distortion measure.
Another way to write the Itakura distortion measure is:
Another gain-independent distortion measure is called the Likelihood Ratio distortion:
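4.5.4 Likelihood Distortions
$$d_{LR}\left(\frac{1}{|A_p|^2}, \frac{1}{|A|^2}\right) = d_{IS}\left(\frac{1}{|A_p|^2}, \frac{1}{|A|^2}\right) = \int_{-\pi}^{\pi} \frac{|A(e^{j\omega})|^2}{|A_p(e^{j\omega})|^2}\,\frac{d\omega}{2\pi} - 1 = \frac{a^t R_p a}{\sigma_p^2} - 1.$$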
Note that, for $u > 0$,
$$u = \exp(\log u) = 1 + \log u + \frac{1}{2!}(\log u)^2 + \frac{1}{3!}(\log u)^3 + \ldots,$$
and
$$d_I\left(\frac{1}{|A_p|^2}, \frac{1}{|A|^2}\right) = \log\frac{a^t R_p a}{\sigma_p^2}.$$
So, for $a^t R_p a/\sigma_p^2 \approx 1$,
$$d_{LR}\left(\frac{1}{|A_p|^2}, \frac{1}{|A|^2}\right) \approx d_I\left(\frac{1}{|A_p|^2}, \frac{1}{|A|^2}\right).$$
That is, when the distortion is small, the Itakura distortion measure is not very different from the LR distortion measure.
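The closeness of the two measures for small distortions can be illustrated numerically. The sketch below approximates the ratio $\int |A|^2/|A_p|^2\, d\omega/2\pi$ on an FFT grid rather than through $a^t R_p a/\sigma_p^2$; the two first-order inverse filters are hypothetical examples.

```python
import numpy as np

def residual_ratio(a, a_p, n_fft=4096):
    """Grid approximation of the integral of |A|^2 / |A_p|^2 over dw/2pi.

    a, a_p: full coefficient vectors [1, a_1, ..., a_p] of the inverse filters.
    """
    A = np.fft.fft(np.asarray(a, dtype=float), n_fft)
    A_p = np.fft.fft(np.asarray(a_p, dtype=float), n_fft)
    return float(np.mean(np.abs(A) ** 2 / np.abs(A_p) ** 2))

def likelihood_ratio_distortion(a, a_p):
    return residual_ratio(a, a_p) - 1.0

def itakura_distortion(a, a_p):
    return float(np.log(residual_ratio(a, a_p)))

# Two nearby one-pole models (illustrative values only).
a_p = [1.0, -0.9]
a = [1.0, -0.85]
d_lr = likelihood_ratio_distortion(a, a_p)
d_i = itakura_distortion(a, a_p)
print(d_lr, d_i)
```

Since $\log u \le u - 1$, the Itakura distortion never exceeds the likelihood ratio distortion, and the gap shrinks as the ratio approaches 1.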
Consider the Itakura-Saito distortion between the input and output of a linear system $H(z) = A(z)/B(z)$, with input $X(n)$ of spectrum $S(\omega)$ and output $X'(n)$ of spectrum $S'(\omega)$:
$$S'(\omega) = S(\omega)\,|H(e^{j\omega})|^2,$$
$$V(\omega) = \log S(\omega) - \log S'(\omega) = -\log|H(e^{j\omega})|^2,$$
$$d_{IS}(S, S') = \int_{-\pi}^{\pi} \left[\frac{1}{|H(e^{j\omega})|^2} + \log|H(e^{j\omega})|^2 - 1\right]\frac{d\omega}{2\pi}.$$
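With
$$A(z) = 1 + \sum_{i=1}^{p} a_i z^{-i}, \qquad B(z) = 1 + \sum_{i=1}^{p} b_i z^{-i},$$
both monic with all roots inside the unit circle, the log term integrates to zero, and
$$d_{IS}(S, S') = \int_{-\pi}^{\pi} \frac{1}{|H(e^{j\omega})|^2}\,\frac{d\omega}{2\pi} - 1 = \int_{-\pi}^{\pi} \frac{|B(e^{j\omega})|^2}{|A(e^{j\omega})|^2}\,\frac{d\omega}{2\pi} - 1 = d_{IS}\left(\frac{1}{|A|^2}, \frac{1}{|B|^2}\right).$$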
4.5.5 Variations of Likelihood Distortions
Symmetric distortion measures:
$$d_x^{(m)}(S, S') = \left\{\frac{1}{2}\left[d_{IS}^{\,m}(S, S') + d_{IS}^{\,m}(S', S)\right]\right\}^{1/m}.$$
COSH distortion: for $m = 1$, with $V(\omega) = \log S(\omega) - \log S'(\omega)$,
$$d_x^{(1)}(S, S') = \frac{1}{2}\left[d_{IS}(S, S') + d_{IS}(S', S)\right] = \int_{-\pi}^{\pi} \frac{1}{2}\left[e^{V(\omega)} - V(\omega) - 1 + e^{-V(\omega)} + V(\omega) - 1\right]\frac{d\omega}{2\pi}$$
$$= \int_{-\pi}^{\pi} \left[\cosh(V(\omega)) - 1\right]\frac{d\omega}{2\pi} = d_{COSH}(S, S').$$
$$\cosh V = 1 + \frac{V^2}{2!} + \frac{V^4}{4!} + \ldots,$$
so: $d_{COSH}(S, S') \approx \frac{1}{2}\, d_2^2(S, S')$.
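The symmetrization and the small-distortion approximation can be checked on sampled spectra. This is a minimal sketch, assuming spectra on a uniform frequency grid and the form $d_{IS} = \int [e^{V} - V - 1]\, d\omega/2\pi$ with $V = \log S - \log S'$; the test spectra `S_a` and `S_b` are hypothetical.

```python
import numpy as np

def itakura_saito(S, S_ref):
    """Grid approximation of the integral of exp(V) - V - 1, V = log S - log S'."""
    V = np.log(S) - np.log(S_ref)
    return float(np.mean(np.exp(V) - V - 1.0))

def cosh_distortion(S, S_ref):
    """Grid approximation of the integral of cosh(V) - 1."""
    V = np.log(S) - np.log(S_ref)
    return float(np.mean(np.cosh(V) - 1.0))

N = 1024
w = 2 * np.pi * np.arange(N) / N
# Two nearby one-pole spectra, chosen only for illustration.
S_a = 1.0 / np.abs(1 - 0.90 * np.exp(-1j * w)) ** 2
S_b = 1.0 / np.abs(1 - 0.88 * np.exp(-1j * w)) ** 2

d_cosh = cosh_distortion(S_a, S_b)
d_sym = 0.5 * (itakura_saito(S_a, S_b) + itakura_saito(S_b, S_a))
half_d2 = 0.5 * float(np.mean((np.log(S_a) - np.log(S_b)) ** 2))
print(d_cosh, d_sym, half_d2)
```

Unlike the Itakura-Saito distortion, the COSH distortion gives the same value when the two spectra are swapped.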
4.5.6 Spectral Distortion Using a Warped Frequency Scale
Psychophysical studies have shown that human perception of the frequency content of sounds does not follow a linear scale. This research has led to the idea of defining the subjective pitch of pure tones.
For each tone with an actual frequency, f, measured in Hz, a subjective pitch is measured on a scale called the “mel” scale.
As a reference point, the pitch of a 1 kHz tone, 40 dB above the perceptual hearing threshold, is defined as 1000 mels.
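The slides do not give a formula for the mel scale. As an illustration only, the sketch below uses a widely quoted analytic approximation, $m = 2595 \log_{10}(1 + f/700)$; this formula and the function names are assumptions, not part of the source material. It maps 1 kHz to approximately 1000 mels, in line with the reference point above.

```python
import math

def hz_to_mel(f_hz):
    """Assumed approximation: m = 2595 * log10(1 + f/700)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(m):
    """Inverse of the approximation above."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

for f in (100.0, 1000.0, 4000.0):
    print(f, hz_to_mel(f))
```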
Examples of critical bandwidth
Critical band number | Center frequency (Hz) | Critical band (Hz) | Lower cutoff frequency (Hz) | Upper cutoff frequency (Hz)
1 | 50 | - | - | 100
2 | 150 | 100 | 100 | 200
3 | 250 | 100 | 200 | 300
4 | 350 | 100 | 300 | 400
5 | 450 | 110 | 400 | 510
6 | 570 | 120 | 510 | 630
7 | 700 | 140 | 630 | 770
8 | 840 | 150 | 770 | 920
9 | 1,000 | 160 | 920 | 1,080
10 | 1,170 | 190 | 1,080 | 1,270
11 | 1,370 | 210 | 1,270 | 1,480
12 | 1,600 | 240 | 1,480 | 1,720
13 | 1,850 | 280 | 1,720 | 2,000
14 | 2,150 | 320 | 2,000 | 2,320
15 | 2,500 | 380 | 2,320 | 2,700
16 | 2,900 | 450 | 2,700 | 3,150
17 | 3,400 | 550 | 3,150 | 3,700
18 | 4,000 | 700 | 3,700 | 4,400
19 | 4,800 | 900 | 4,400 | 5,300
20 | 5,800 | 1,100 | 5,300 | 6,400
21 | 7,000 | 1,300 | 6,400 | 7,700
22 | 8,500 | 1,800 | 7,700 | 9,500
23 | 10,500 | 2,500 | 9,500 | 12,000
24 | 13,500 | 3,500 | 12,000 | 15,500
Warped cepstral distance
$$\tilde{d}_2^2(S, S') = \int_{-B}^{B} \left|\log S(\theta(b)) - \log S'(\theta(b))\right|^2 \frac{db}{2B}$$
$$= \sum_{i}\sum_{k} (c_i - c_i')(c_k - c_k') \int_{-B}^{B} e^{j\theta(b)(i-k)}\,\frac{db}{2B} = \sum_{i}\sum_{k} (c_i - c_i')(c_k - c_k')\,w_{ik}.$$
$b$ is the frequency in Barks, $S(\theta(b))$ is the spectrum on a Bark scale, and $B$ is the Nyquist frequency in Barks.
$$\tilde{d}_c^2(L) = \sum_{i=-L}^{L}\sum_{k=-L}^{L} (c_i - c_i')(c_k - c_k')\,w_{ik},$$
$$w_{ik} = \int_{-B}^{B} e^{j\theta(b)(i-k)}\,\frac{db}{2B},$$
where the warping function $\theta(b)$ is defined by
$$\theta(b) = \theta_1(b) = \frac{\pi}{3333}\cdot\frac{1000}{0.76}\tan\left(\frac{b}{13}\right) \quad \text{for } |b| \le 6,$$
$$\theta(b) = \theta_2(b) = \frac{\pi}{3333}\,(1000)\,10^{(b - 8.776)/10} \quad \text{for } |b| \ge 13,$$
$$\theta(b) = \frac{1}{2}\left[\theta_1(b) + \theta_2(b)\right] \quad \text{for } 6 < |b| < 13.$$
Mel-frequency cepstrum:
$$\tilde{c}_n = \sum_{k=1}^{K} (\log \tilde{S}_k)\cos\left[n\left(k - \frac{1}{2}\right)\frac{\pi}{K}\right], \quad n = 1, 2, \ldots, L,$$
where $\tilde{S}_k$, $k = 1, \ldots, K$, is the output power of the triangular filters.
Mel-frequency cepstral distance:
$$\tilde{d}_c^2(L) = \sum_{n=1}^{L} (\tilde{c}_n - \tilde{c}_n')^2.$$
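The cosine transform of the log filter-bank outputs can be sketched directly from the definition. The filter-bank design itself is outside this example: `filter_powers` is assumed to already hold the K triangular-filter output powers, and the function names are hypothetical.

```python
import numpy as np

def mel_cepstrum(filter_powers, L):
    """c~_n = sum_{k=1..K} log(S~_k) * cos(n (k - 1/2) pi / K), for n = 1..L."""
    log_S = np.log(np.asarray(filter_powers, dtype=float))
    K = log_S.size
    n = np.arange(1, L + 1)[:, None]   # column of cepstral indices
    k = np.arange(1, K + 1)[None, :]   # row of filter indices
    return (np.cos(n * (k - 0.5) * np.pi / K) * log_S).sum(axis=1)

def mel_cepstral_distance(c, c_ref):
    """Sum of squared differences of the first L mel-cepstral coefficients."""
    diff = np.asarray(c, dtype=float) - np.asarray(c_ref, dtype=float)
    return float(np.sum(diff ** 2))

# Hypothetical filter-bank powers for two frames (K = 20 filters).
frame_a = np.arange(1.0, 21.0)
frame_b = 5.0 * frame_a            # same spectral shape, different gain
c_a = mel_cepstrum(frame_a, 12)
c_b = mel_cepstrum(frame_b, 12)
print(mel_cepstral_distance(c_a, c_b))
```

Because the index n starts at 1, a constant gain factor on all filter outputs leaves the coefficients, and hence the distance, essentially unchanged.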
4.5.7 Alternative Spectral Representations and Distortion Measures
Wave reflection occurs at each sectional boundary, with reflection coefficients denoted by $k_i$, $i = 1, 2, \ldots, p$. With $A_i$ the area of the ith section:
$$g_i = \frac{A_{i+1}}{A_i} = \frac{1 + k_i}{1 - k_i}, \quad i = 1, 2, \ldots, p,$$
$$\log g_i = \log\frac{A_{i+1}}{A_i} = \log\frac{1 + k_i}{1 - k_i}.$$
$$d_g^2 = \sum_{i=1}^{p} (\log g_i - \log g_i')^2$$
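A log area ratio distance of this kind can be sketched as below, assuming the sign convention $g_i = (1 + k_i)/(1 - k_i)$ and reflection coefficients strictly inside $(-1, 1)$; the function names and sample values are hypothetical.

```python
import numpy as np

def log_area_ratios(k):
    """log g_i = log((1 + k_i) / (1 - k_i)) for reflection coefficients |k_i| < 1."""
    k = np.asarray(k, dtype=float)
    if np.any(np.abs(k) >= 1.0):
        raise ValueError("reflection coefficients must satisfy |k_i| < 1")
    return np.log((1.0 + k) / (1.0 - k))

def log_area_ratio_distance(k, k_ref):
    """Sum over i of (log g_i - log g_i')^2."""
    diff = log_area_ratios(k) - log_area_ratios(k_ref)
    return float(np.sum(diff ** 2))

k_test = [0.5, -0.3, 0.1]
k_ref = [0.45, -0.25, 0.05]
print(log_area_ratios(k_test))
print(log_area_ratio_distance(k_test, k_ref))
```

The mapping stretches the region near $|k_i| = 1$, where small changes in a reflection coefficient correspond to large spectral changes.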
Another possible parametric representation of the all-pole spectrum is the set of line spectral frequencies (LSFs), defined as the roots of the following two polynomials based upon the inverse filter $A(z)$:
$$P(z) = A(z) + z^{-(p+1)} A(z^{-1}),$$
$$Q(z) = A(z) - z^{-(p+1)} A(z^{-1}).$$
These two polynomials are equivalent to artificially augmenting the p-section nonuniform acoustic tube with an extra section that is either completely closed (area = 0) or completely open (area = ∞). LSF parameters, due to their particular structure, possess properties similar to those of the formant frequencies and bandwidths.
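One way to compute LSFs is by direct root finding on P(z) and Q(z). The sketch below uses `numpy.roots`, which is adequate for illustration at small orders; practical coders typically use more specialized root searches. The function name and the second-order example filter are hypothetical.

```python
import numpy as np

def line_spectral_frequencies(a):
    """LSFs (radians, in (0, pi)) of A(z) given a = [1, a_1, ..., a_p].

    Assumes A(z) is minimum phase, so the roots of P and Q lie on the unit circle.
    """
    a = np.asarray(a, dtype=float)
    a_ext = np.concatenate([a, [0.0]])   # A(z), padded to powers z^0 .. z^-(p+1)
    a_rev = a_ext[::-1]                  # coefficients of z^-(p+1) * A(1/z)
    P = a_ext + a_rev
    Q = a_ext - a_rev
    roots = np.concatenate([np.roots(P), np.roots(Q)])
    angles = np.angle(roots)
    # Keep one root of each conjugate pair; drop the trivial roots at z = 1 and z = -1.
    keep = (angles > 1e-9) & (angles < np.pi - 1e-9)
    return np.sort(angles[keep])

# A(z) = 1 - 1.2 z^-1 + 0.5 z^-2 has roots of magnitude sqrt(0.5) (stable).
lsf = line_spectral_frequencies([1.0, -1.2, 0.5])
print(lsf)
```

For this example, P(z) factors as $(1 + z^{-1})(1 - 1.7z^{-1} + z^{-2})$ and Q(z) as $(1 - z^{-1})(1 - 0.7z^{-1} + z^{-2})$, so the two LSFs are $\arccos(0.85)$ and $\arccos(0.35)$.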
Weighted slope metric proposed by Klatt:
$$d_{WSM}(S, S') = u_E\,|E_S - E_{S'}| + \sum_{i=1}^{K} u(i)\left[\Delta_S(i) - \Delta_{S'}(i)\right]^2$$
$K$: the total number of critical bands
$u_E$: the weighting constant for the absolute energy difference $|E_S - E_{S'}|$
$u(i)$: the weighting coefficient for the critical band spectral slope difference $\Delta_S(i) - \Delta_{S'}(i)$
In one implementation:
$$u(i) = [u_S(i) + u_{S'}(i)]/2, \quad \text{where } u_S(i) = \frac{u_{LM}}{u_{LM} + V_{LM}^{S}(i)}\cdot\frac{u_{GM}}{u_{GM} + V_{GM}^{S}(i)}.$$
$V_{LM}(i)$ and $V_{GM}(i)$ are the log spectral differences (in dB) between the spectral magnitude at the ith critical band and its nearest local maximum (LM) and the global maximum (GM) spectral peaks, respectively. The coefficients $u_{LM}$ and $u_{GM}$ are used to balance the contributions due to the local and the global spectral characteristics and to prevent singularity in $u_S(i)$ and $u_{S'}(i)$.
Summary of Spectral Distortion Measures
Distortion measure | Notation | Expression | Computation
$L_p$ metric | $d_p$ | $(d_p)^p = \int_{-\pi}^{\pi} |\log S(\omega) - \log S'(\omega)|^p\,\frac{d\omega}{2\pi}$ | 2 FFTs, logs, integrals
Truncated cepstral distance | $d_c^2(L)$ | $\sum_{n=1}^{L} (c_n - c_n')^2$ | L *, +
Weighted (liftered) cepstral distance | $d_{cW}^2$ | $\sum_{n=1}^{L} \left(w(n)(c_n - c_n')\right)^2$ | L *, +
Itakura-Saito distortion | $d_{IS}$ | $\int_{-\pi}^{\pi} \frac{S(\omega)}{S'(\omega)}\,\frac{d\omega}{2\pi} - \log\frac{\sigma_\infty^2}{{\sigma'_\infty}^2} - 1$; for all-pole models, $\frac{\sigma_p^2}{\sigma^2}\int_{-\pi}^{\pi} \frac{|A|^2}{|A_p|^2}\,\frac{d\omega}{2\pi} - \log\frac{\sigma_p^2}{\sigma^2} - 1$ | p *, +
Itakura distortion | $d_I$ | $\log\int_{-\pi}^{\pi} \frac{|A|^2}{|A_p|^2}\,\frac{d\omega}{2\pi} = \log\frac{a^t R_p a}{\sigma_p^2}$ | p *, +
Likelihood ratio distortion | $d_{LR}$ | $\int_{-\pi}^{\pi} \frac{|A|^2}{|A_p|^2}\,\frac{d\omega}{2\pi} - 1 = \frac{a^t R_p a}{\sigma_p^2} - 1$ | p *, +
COSH distance | $d_{COSH}$ | $\int_{-\pi}^{\pi} \cosh\left[\log\frac{S(\omega)}{S'(\omega)}\right]\frac{d\omega}{2\pi} - 1 = \frac{1}{2}\left[d_{IS}(S, S') + d_{IS}(S', S)\right]$ | 2p *, +
Weighted likelihood ratio distortion | $d_{WLR}$ | $\sum_{n=1}^{L} \left(\frac{r(n)}{\sigma^2} - \frac{r'(n)}{\sigma'^2}\right)(c_n - c_n')$ | L *, +
Weighted slope metric | $d_{WSM}$ | $u_E\,|E_S - E_{S'}| + \sum_{i=1}^{K} u(i)\left[\Delta_S(i) - \Delta_{S'}(i)\right]^2$ | K *, +
4.6 INCORPORATION OF SPECTRAL DYNAMIC FEATURES INTO THE DISTORTION MEASURE
A first-order differential (log) spectrum is defined by:
$$\frac{\partial \log S(\omega, t)}{\partial t} = \sum_{n=-\infty}^{\infty} \frac{\partial c_n(t)}{\partial t}\,e^{-jn\omega}.$$
Fitting the cepstral trajectory by a second-order polynomial, choose $h_1$, $h_2$, $h_3$ such that
$$E = \sum_{t=-M}^{M} \left[c(t) - (h_1 + h_2 t + h_3 t^2)\right]^2$$
is minimized.
Differentiating $E$ with respect to $h_1$, $h_2$, and $h_3$ and setting the results to zero gives 3 equations:
$$\sum_{t=-M}^{M} \left[c(t) - h_1 - h_2 t - h_3 t^2\right] = 0,$$
$$\sum_{t=-M}^{M} \left[c(t)\,t - h_1 t - h_2 t^2 - h_3 t^3\right] = 0,$$
$$\sum_{t=-M}^{M} \left[c(t)\,t^2 - h_1 t^2 - h_2 t^3 - h_3 t^4\right] = 0.$$
The solutions to these equations are:
$$h_2 = \frac{\sum_{t=-M}^{M} t\,c(t)}{T_M},$$
$$h_3 = \frac{T_M \sum_{t=-M}^{M} c(t) - (2M+1)\sum_{t=-M}^{M} t^2 c(t)}{T_M^2 - (2M+1)\sum_{t=-M}^{M} t^4},$$
$$h_1 = \frac{1}{2M+1}\left[\sum_{t=-M}^{M} c(t) - h_3 T_M\right],$$
$$T_M = \sum_{t=-M}^{M} t^2.$$
The first and second time derivatives of $c_n$ can be obtained by differentiating the fitting curve, giving
$$\left.\frac{\partial c_n(t)}{\partial t}\right|_{t=0} = h_2 = \frac{\sum_{t=-M}^{M} t\,c_n(t)}{T_M},$$
$$\left.\frac{\partial^2 c_n(t)}{\partial t^2}\right|_{t=0} = 2h_3 = \frac{2\left[T_M \sum_{t=-M}^{M} c_n(t) - (2M+1)\sum_{t=-M}^{M} t^2 c_n(t)\right]}{T_M^2 - (2M+1)\sum_{t=-M}^{M} t^4}.$$
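The closed-form fit and the derivative estimates can be sketched as follows, for a single cepstral coefficient observed over a window of 2M + 1 frames centered at t = 0. The function names are hypothetical, and the quadratic test trajectory is chosen so that the fit is exact.

```python
import numpy as np

def polynomial_fit_coefficients(c, M):
    """Least-squares fit c(t) ~ h1 + h2*t + h3*t^2 for t = -M..M.

    `c` holds the 2M+1 samples c(-M), ..., c(M). Returns (h1, h2, h3).
    """
    c = np.asarray(c, dtype=float)
    t = np.arange(-M, M + 1)
    T_M = np.sum(t ** 2)
    T_4 = np.sum(t ** 4)
    h2 = np.sum(t * c) / T_M
    h3 = (T_M * np.sum(c) - (2 * M + 1) * np.sum(t ** 2 * c)) / (T_M ** 2 - (2 * M + 1) * T_4)
    h1 = (np.sum(c) - h3 * T_M) / (2 * M + 1)
    return h1, h2, h3

def delta_features(c, M):
    """First and second time derivatives of the fitted curve at t = 0."""
    _, h2, h3 = polynomial_fit_coefficients(c, M)
    return h2, 2.0 * h3

M = 3
t = np.arange(-M, M + 1)
trajectory = 1.0 + 2.0 * t + 0.5 * t ** 2   # exact quadratic, for checking
h = polynomial_fit_coefficients(trajectory, M)
print(h, delta_features(trajectory, M))
```

The same coefficients are returned, in reverse order, by `np.polyfit(t, trajectory, 2)`, which provides an independent check of the closed-form solution.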
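A differential spectral distance:
$$d_{2\delta^{(1)}}^2 = \int_{-\pi}^{\pi} \left|\frac{\partial \log S(\omega, t)}{\partial t} - \frac{\partial \log S'(\omega, t)}{\partial t}\right|^2 \frac{d\omega}{2\pi} = \sum_{n=-\infty}^{\infty} \left(\delta_n^{(1)} - {\delta_n'}^{(1)}\right)^2,$$
A second differential spectral distance:
$$d_{2\delta^{(2)}}^2 = \int_{-\pi}^{\pi} \left|\frac{\partial^2 \log S(\omega, t)}{\partial t^2} - \frac{\partial^2 \log S'(\omega, t)}{\partial t^2}\right|^2 \frac{d\omega}{2\pi} = \sum_{n=-\infty}^{\infty} \left(\delta_n^{(2)} - {\delta_n'}^{(2)}\right)^2,$$
where
$$\delta_n^{(1)} = \left.\frac{\partial c_n(t)}{\partial t}\right|_{t=0} \quad \text{and} \quad \delta_n^{(2)} = \left.\frac{\partial^2 c_n(t)}{\partial t^2}\right|_{t=0}.$$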
Combining the first and second differential spectral distances with the cepstral distance results in:
$$d^2 = \gamma_1 d_2^2 + \gamma_2 d_{2\delta^{(1)}}^2 + \gamma_3 d_{2\delta^{(2)}}^2, \quad \text{usually } \gamma_1 + \gamma_2 + \gamma_3 = 1.$$
Cepstral weighting or liftering by differentiating:
$$\frac{\partial^2}{\partial\omega\,\partial t}\left[\log S(\omega, t)\right] = \sum_{n=-\infty}^{\infty} (-jn)\,\frac{\partial c_n(t)}{\partial t}\,e^{-jn\omega} = \sum_{n=-\infty}^{\infty} (-jn)\,\delta_n^{(1)}(t)\,e^{-jn\omega}.$$
A weighted differential cepstral distance:
$$d_{2W\delta^{(1)}}^2 = \int_{-\pi}^{\pi} \left|\frac{\partial^2 \log S(\omega, t)}{\partial\omega\,\partial t} - \frac{\partial^2 \log S'(\omega, t)}{\partial\omega\,\partial t}\right|^2 \frac{d\omega}{2\pi} = \sum_{n=-\infty}^{\infty} n^2 \left(\delta_n^{(1)} - {\delta_n'}^{(1)}\right)^2.$$
Other operators can be added to produce a combined representation of the spectrum and the differential spectra. As an example:
$$\left(\frac{\partial}{\partial\omega} + \frac{\partial^2}{\partial\omega\,\partial t}\right)\log S(\omega, t) = \sum_{n=-\infty}^{\infty} (-jn)\left[c_n(t) + \delta_n^{(1)}(t)\right] e^{-jn\omega}.$$
Taking the $L_2$ distance:
$$\int_{-\pi}^{\pi} \left|\left(\frac{\partial}{\partial\omega} + \frac{\partial^2}{\partial\omega\,\partial t}\right)\left[\log S(\omega, t) - \log S'(\omega, t)\right]\right|^2 \frac{d\omega}{2\pi}$$
$$= \sum_{n} n^2 \left\{\left[c_n(t) - c_n'(t)\right] + \left[\delta_n^{(1)}(t) - {\delta_n'}^{(1)}(t)\right]\right\}^2$$
$$= \sum_{n} n^2 \left[c_n(t) - c_n'(t)\right]^2 + \sum_{n} n^2 \left[\delta_n^{(1)}(t) - {\delta_n'}^{(1)}(t)\right]^2 + 2\sum_{n} n^2 \left[c_n(t) - c_n'(t)\right]\left[\delta_n^{(1)}(t) - {\delta_n'}^{(1)}(t)\right]$$
$$= d_{2W}^2 + d_{2W\delta^{(1)}}^2 + 2\sum_{n} n^2 \left[c_n(t) - c_n'(t)\right]\left[\delta_n^{(1)}(t) - {\delta_n'}^{(1)}(t)\right].$$