SGN-14006 Audio and Speech Processing
Acoustic Phonetics
Slides for this lecture are based on those created by Katariina Mahkonen for the TUT course ”Puheenkäsittelyn menetelmät”.
Other sources: Quatieri: Discrete-Time Speech Signal Processing – Principles & Practice.
K. Koppinen: ”Puheenkäsittelyn menetelmät”, lecture notes, TTY, http://www.cs.tut.fi/courses/SGN-4010/sgn4010.pdf
Acoustic Phonetics
• Acoustically, a speech signal, like any sound, can be viewed as a variation in air pressure level
• Acoustic phonetics studies the acoustic characteristics of speech and their relationship to speech production
Link: Longitudinal waves: http://www.acs.psu.edu/drussell/Demos/waves/wavemotion.html
Speech waveform/spectrum is quasi-stationary only for 5-20 ms
• Speech is processed in short frames (frame-by-frame)
• The length of the frame/window in speech processing is usually 10-20 ms
• Hanning/Hamming type windows are commonly used
• Remember how windowing works: each frame is multiplied sample by sample with the window function
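As a reminder of the frame-by-frame processing described above, here is a minimal sketch in plain Python. The 16 kHz sampling rate, the 10 ms hop between frames and the 200 Hz test tone are assumptions made for illustration only; the slides specify only the 10-20 ms frame length and the window type.

```python
import math

def hamming(n_samples):
    """Hamming window: w[n] = 0.54 - 0.46 * cos(2*pi*n / (N - 1))."""
    if n_samples == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (n_samples - 1))
            for n in range(n_samples)]

def windowed_frames(signal, fs, frame_ms=20.0, hop_ms=10.0):
    """Cut `signal` into overlapping frames and multiply each by a Hamming window.

    hop_ms (the shift between frame starts) is an assumed value.
    """
    frame_len = int(round(fs * frame_ms / 1000.0))
    hop = int(round(fs * hop_ms / 1000.0))
    window = hamming(frame_len)
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        frames.append([s * w for s, w in zip(frame, window)])
    return frames

fs = 16000  # sampling rate in Hz (assumed)
# 100 ms of a 200 Hz sine as a stand-in for a speech signal
signal = [math.sin(2 * math.pi * 200 * n / fs) for n in range(fs // 10)]
frames = windowed_frames(signal, fs)
print(len(frames), len(frames[0]))
```

With these values a 20 ms frame is 320 samples and the hop is 160 samples, so the 1600-sample signal yields 9 overlapping frames.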
What are speech signals like?
• In the time domain
  • Which phonemes have a lot of energy?
  • How does the voiced/unvoiced difference appear in the signal?
  • What do plosives look like?
• In the frequency domain
  • How does the voiced nature of a phoneme appear in the frequency domain?
  • How do you find the pitch from the speech spectrum?
  • Which phoneme is easiest to recognize from the spectrogram?
  • What special feature is visible in the spectrum of a nasal?
• Software for speech signal visualization: Audacity, Wavesurfer, Praat, RTgram (Windows), Baudline (Linux)
Links: Audacity download; Windows: RTgram download; Linux: Baudline download
Example sentence: ”He knew what taboos…” (arctic_b0510.wav)
[Figure: the example sentence annotated with its segments: h, e, (k)n, ew, w(h), a, t, t, a, b, oo, s]
Modeling speech
Quatieri: Discrete-Time Speech Signal Processing - Principles and Practice
Larynx excitation (glottis signal)
Function of the vocal folds
A: the glottis is closed when swallowing
B: in voiced phonemes, the vocal folds vibrate periodically
C: when whispering, airflow passes only through the interarytenoid space
D: in glottal fricatives (/h/), the vocal folds are narrowly open
E: rest/breathing position
F: yawning
What kind of signal (and spectrum) is produced from these?
Electroglottograph (EGG)
• Measures vocal fold contact area
Inverse filtering
What does the glottal signal look like?
• In voiced phonemes, vibrating vocal folds cause pressure pulses at the vibration rate (fundamental frequency F0) (position B above)
• In unvoiced phonemes, the narrow opening in the larynx causes turbulence in the airstream from the lungs (position C)
Waveshapes of periodic glottal signal
Spectra of periodic glottal signal
More on this topic on the University of New South Wales (Australia) pages:
http://www.phys.unsw.edu.au/jw/glottis-vocal-tract-voice.html
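To illustrate why a periodic glottal signal has a line spectrum, the sketch below uses an idealised excitation: one unit pulse per glottal cycle. Real glottal pulses are smoother, so this shows only where the harmonics fall, not their relative strengths. The sampling rate (8 kHz) and F0 (100 Hz) are assumed values.

```python
import cmath

fs = 8000           # sampling rate in Hz (assumed)
f0 = 100            # fundamental frequency in Hz (assumed)
period = fs // f0   # 80 samples between pulses
n_samples = 800     # 100 ms, an integer number of periods

# Idealised voiced excitation: one unit pulse per glottal cycle.
pulses = [1.0 if n % period == 0 else 0.0 for n in range(n_samples)]

def dft_magnitude(x, k):
    """Magnitude of DFT bin k of the sequence x (zero samples are skipped)."""
    n_total = len(x)
    return abs(sum(v * cmath.exp(-2j * cmath.pi * k * n / n_total)
                   for n, v in enumerate(x) if v != 0.0))

bin_hz = fs / n_samples  # 10 Hz per DFT bin
harmonic_bins = [k for k in range(n_samples // 2)
                 if dft_magnitude(pulses, k) > 1e-6]
print([k * bin_hz for k in harmonic_bins[:5]])
```

The only non-zero bins are at 0 Hz (the DC component of the pulse train) and at integer multiples of F0, so the spacing of the harmonics in a voiced spectrum gives the pitch.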
Other sources of sound energy in speech
1. A constriction (narrow point) in the vocal tract causes turbulence in the air stream that passes through
   • Difference from whispering: here the turbulence occurs in the vocal tract and not in the glottis
2. The vocal tract is closed to build up pressure, which is then released, and the air ”explodes” out (plosives)
Modeling speech – influence of vocal tract on speech spectrum
Quatieri: Discrete-Time Speech Signal Processing - Principles and Practice
Links: Amazing grace (overtone singing); Spectrograms of overtone singing
Formants: resonances of vocal tract
• The most important characteristics of the vocal tract are its resonances (formants)
• Due to standing waves in the vibrating air column
• Formants (F1, F2, ...) can usually be seen in the spectrum as boosted frequency regions
• In addition to frequency, a formant is characterized by its intensity and bandwidth
• Different vocal tract configurations correspond to different formant frequencies → all vowels can be classified based on formants
[Figure: spectrum of phoneme /a/]
Link: Standing waves
Vowels - formants
Mathematical modeling of the vocal tract
• Calculating the vocal tract resonances based on the shape of the vocal tract is analytically intractable (numerical solutions exist)
  - Should take into account
    • Different larynx excitations
    • Time-varying and location-dependent changes in the vocal tract shape
    • Nasal tract opening/closing
    • Sound radiation from the lips
    • Energy losses
    • Turbulences
    • etc.
• However, by studying simplified models we can gain a fair amount of understanding of speech production
Acoustics of simple tubes
• Resonances of the vocal tract are due to standing waves in the vibrating air column (similar to e.g. wind instruments)
• In a tube of uniform cross-sectional area and length $l$, closed at one end and open at the other, the wavelengths $\lambda$ of standing waves are $\lambda_k = \frac{4l}{2k-1}$, $k = 1, 2, 3, \ldots$, which gives the resonance frequencies $F_k = \frac{c}{\lambda_k} = \frac{(2k-1)\,c}{4l}$
• Substituting the typical vocal tract length (male 0.17 m, female 0.15 m) and $c = 340$ m/s, the resonance frequencies for $l = 0.17$ m would be 500 Hz, 1500 Hz, 2500 Hz, …
• In the vocal tract, the cross-sectional area varies and thus the resonance frequencies vary, but as a rule of thumb there is roughly 1 resonance per 1 kHz
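The resonance figures above can be reproduced with a few lines of Python. The sketch assumes the uniform tube is closed at the glottis and open at the lips (a quarter-wavelength resonator, which is what yields 500, 1500, 2500 Hz for a 0.17 m tube) and takes the speed of sound as 340 m/s; the function name is illustrative.

```python
def tube_resonances(length_m, n_resonances=3, c=340.0):
    """Resonance frequencies in Hz of a uniform tube closed at one end and
    open at the other: F_k = (2k - 1) * c / (4 * l), k = 1, 2, 3, ..."""
    return [(2 * k - 1) * c / (4.0 * length_m)
            for k in range(1, n_resonances + 1)]

male = tube_resonances(0.17)     # typical male vocal tract length
female = tube_resonances(0.15)   # typical female vocal tract length
print([round(f) for f in male])
print([round(f) for f in female])
```

The shorter 0.15 m tube gives correspondingly higher resonances (roughly 567, 1700 and 2833 Hz), still close to one resonance per kHz.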
Acoustics of simple tubes
• The acoustics of a uniform tube can be solved exactly, and this helps us in the following (notation x, l, S on the previous slide)
• The acoustically interesting variables are the particle velocity v(x,t) in the tube at point x and time t, and the pressure p(x,t)
• For simplicity, we assume planar pressure waves that travel along the tube. For convenience we use volume velocity u(x,t) instead of particle velocity: u(x,t) = S v(x,t)
• The relationship between pressure and volume velocity is governed by the so-called wave equations:
$$-\frac{\partial p}{\partial x} = \frac{\rho}{S}\,\frac{\partial u}{\partial t}$$
$$-\frac{\partial u}{\partial x} = \frac{S}{\rho c^{2}}\,\frac{\partial p}{\partial t}$$
where ρ denotes the density of air and c is the speed of sound
• Intuition (first equation): if the particle at point x is not moving but the pressure is higher on its right side, the pressure difference causes the particle to accelerate to the left.
• Intuition (second equation): if the pressure at point x is zero but the particle velocity is higher on the left-hand side, particles ”pile up” at point x and the pressure increases.
Solution to the wave equation
• It is quite easy to see that, given an arbitrary function f(y), the following is a solution to the wave equations:
$$u(x,t) = f(t - x/c)$$
$$p(x,t) = \frac{\rho c}{S}\, f(t - x/c)$$
which is simply a forward-traveling wave at speed c. That it travels at speed c can be verified by substituting t → t+1 and x → x+c and noting that the function gets the same values (the wave travels for one second).
• Similarly, a backward-traveling wave is a solution to the wave equations.
• Altogether, we can write the solution in generic form as
$$u(x,t) = f(t - x/c) - b(t + x/c)$$
$$p(x,t) = \frac{\rho c}{S}\,\bigl[f(t - x/c) + b(t + x/c)\bigr]$$
where f and b are arbitrary forward- and backward-traveling waves, respectively
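As a numerical sanity check, the generic solution can be substituted into both wave equations using finite differences. Everything concrete in the sketch below is an assumption chosen for the check: the air density, the tube area, the wave shapes f and b, and the test point.

```python
import math

RHO = 1.2    # density of air in kg/m^3 (assumed value)
C = 340.0    # speed of sound in m/s
S = 5e-4     # tube cross-sectional area in m^2 (assumed value)

def f(y):
    """Forward-travelling wave shape (an arbitrary smooth pulse)."""
    return math.exp(-(1000.0 * y) ** 2)

def b(y):
    """Backward-travelling wave shape (an arbitrary sinusoid)."""
    return 0.5 * math.sin(2000.0 * y)

def u(x, t):
    """Volume velocity of the generic solution."""
    return f(t - x / C) - b(t + x / C)

def p(x, t):
    """Pressure of the generic solution."""
    return (RHO * C / S) * (f(t - x / C) + b(t + x / C))

def d_dx(g, x, t, h=1e-5):
    """Central-difference estimate of the partial derivative in x."""
    return (g(x + h, t) - g(x - h, t)) / (2.0 * h)

def d_dt(g, x, t, h=1e-7):
    """Central-difference estimate of the partial derivative in t."""
    return (g(x, t + h) - g(x, t - h)) / (2.0 * h)

x0, t0 = 0.1, 0.0002  # arbitrary test point (metres, seconds)
lhs1, rhs1 = -d_dx(p, x0, t0), (RHO / S) * d_dt(u, x0, t0)
lhs2, rhs2 = -d_dx(u, x0, t0), (S / (RHO * C ** 2)) * d_dt(p, x0, t0)
print(lhs1, rhs1)  # the two sides of the first wave equation
print(lhs2, rhs2)  # the two sides of the second wave equation
```

Each printed pair should agree up to the small finite-difference error, whatever smooth shapes are chosen for f and b.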
Modeling the vocal tract with simple tubes
• The vocal tract is straightened and modeled using slices with constant length and uniform cross-sectional area
On the shape of the vocal tract “tube”: http://www.davidmhoward.com/voiceSoundModifiers.htm
Reflection of the pressure wave
[Figure: at the junction of two tubes, the incoming wave $f_n$ splits into a reflected part $k_n f_n$ and a transmitted part $(1 - k_n) f_n$]
• When two simple tubes are joined, reflections occur at the boundary
• The reflection coefficient $k_n$ indicates how large a part of the volume-velocity wave traveling from one tube to the next is reflected back (tube cross-sectional areas $S_n$ and $S_{n+1}$):
$$k_n = \frac{S_n - S_{n+1}}{S_n + S_{n+1}}$$
Reflection of the pressure wave
• Areas are positive, therefore $-1 < k_n < 1$
• If $S_{n+1} = 0$, then $k_n = 1$, and the wave is reflected back as it is
• If $S_{n+1}$ is large, $k_n \approx -1$, and the wave is reflected in its entirety, but in opposite phase
• If $S_n = S_{n+1}$, no reflection occurs
Link: Reflection of waves: www.acs.psu.edu/drussell/Demos/reflect/reflect.html
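The limiting cases listed above follow directly from the formula $k_n = (S_n - S_{n+1})/(S_n + S_{n+1})$, as this small sketch shows. The function name and the area values are hypothetical and serve only as illustration.

```python
def reflection_coefficient(s_n, s_next):
    """k_n = (S_n - S_{n+1}) / (S_n + S_{n+1}); areas in any common unit."""
    return (s_n - s_next) / (s_n + s_next)

# The three special cases from the slide (area values are made up):
print(reflection_coefficient(2.0, 0.0))     # next tube closed: k_n = 1
print(reflection_coefficient(2.0, 2000.0))  # very wide next tube: k_n close to -1
print(reflection_coefficient(2.0, 2.0))     # equal areas: k_n = 0, no reflection

# Reflection coefficients for a whole (hypothetical) area profile:
areas = [2.0, 3.0, 1.0]
ks = [reflection_coefficient(a, a_next) for a, a_next in zip(areas, areas[1:])]
print(ks)
```

A profile of N+1 areas gives N junctions, hence N reflection coefficients; a widening junction gives a negative coefficient and a narrowing one a positive coefficient.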
Modeling wave reflections in z-plane – lattice structure (Kelly-Lochbaum structure)
• $f_n$ is the forward-traveling sound wave in the tube and $b_n$ is the backward-traveling wave
• Let’s model the wave propagation and reflection using the Kelly-Lochbaum lattice structure shown in the figure on the right
• Let’s sample the behaviour of the structure so that the wave is delayed by one time unit ($z^{-1}$) when it travels the length of one tube section
→ length of one model slice = 340 m/s / fs
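The relation between sampling rate and slice length can be made concrete with a short calculation. The sketch assumes the 340 m/s speed of sound from the slide and a 0.17 m vocal tract; the sampling rates and function names are illustrative.

```python
C = 340.0  # speed of sound in m/s, as on the slide

def section_length_m(fs_hz):
    """Length of one tube section when the wave takes one sample to cross it."""
    return C / fs_hz

def sections_needed(tract_length_m, fs_hz):
    """Number of sections that best covers a vocal tract of the given length."""
    return round(tract_length_m / section_length_m(fs_hz))

for fs in (8000, 16000):  # example sampling rates in Hz (assumed)
    print(fs, section_length_m(fs), sections_needed(0.17, fs))
```

At 8 kHz each slice is 4.25 cm long, so a 17 cm tract needs 4 sections; doubling the sampling rate halves the slice length and doubles the number of sections.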
Kelly-Lochbaum equations
• Wave behaviour can be described using the following equations.
• From the figure we obtain:
$$f_{n+1}(z) = (1 - k_n)\, z^{-1} f_n(z) - k_n\, b_{n+1}(z)$$
$$b_n(z) = k_n\, z^{-2} f_n(z) + (1 + k_n)\, z^{-1}\, b_{n+1}(z)$$
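Kelly-Lochbaum equations
• Let’s solve $f_{n+1}(z)$ and $b_{n+1}(z)$ as a function of $f_n(z)$ and $b_n(z)$
• Solution in matrix format:
$$\begin{bmatrix} f_{n+1}(z) \\ b_{n+1}(z) \end{bmatrix} = \begin{bmatrix} \dfrac{z^{-1}}{1+k_n} & \dfrac{-k_n z}{1+k_n} \\[2ex] \dfrac{-k_n z^{-1}}{1+k_n} & \dfrac{z}{1+k_n} \end{bmatrix} \begin{bmatrix} f_n(z) \\ b_n(z) \end{bmatrix}$$
$$\begin{bmatrix} f_{n+1}(z) \\ b_{n+1}(z) \end{bmatrix} = H_n \begin{bmatrix} f_n(z) \\ b_n(z) \end{bmatrix}$$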
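The matrix form can be checked numerically against the two Kelly-Lochbaum equations $f_{n+1} = (1-k_n)z^{-1}f_n - k_n b_{n+1}$ and $b_n = k_n z^{-2} f_n + (1+k_n) z^{-1} b_{n+1}$. The sketch below picks an arbitrary reflection coefficient, a point z on the unit circle and arbitrary wave values (all assumed), computes $f_{n+1}$ and $b_n$ from those equations, and then recovers $f_{n+1}$ and $b_{n+1}$ with the matrix $H_n$.

```python
import cmath

def step_matrix(k, z):
    """H_n = 1/(1+k) * [[z^-1, -k*z], [-k*z^-1, z]] evaluated at complex z."""
    g = 1.0 / (1.0 + k)
    return [[g / z, -g * k * z],
            [-g * k / z, g * z]]

def mat_vec(m, v):
    """Product of a 2x2 matrix and a length-2 vector."""
    return [m[0][0] * v[0] + m[0][1] * v[1],
            m[1][0] * v[0] + m[1][1] * v[1]]

k = 0.3                      # reflection coefficient (arbitrary test value)
z = cmath.exp(0.7j)          # a point on the unit circle (arbitrary)
f_n = 1.0 + 0.5j             # forward wave in section n (arbitrary)
b_next = -0.2 + 0.1j         # backward wave in section n+1 (arbitrary)

# Kelly-Lochbaum equations in their original form
f_next = (1 - k) / z * f_n - k * b_next
b_n = k / z ** 2 * f_n + (1 + k) / z * b_next

# The matrix form should map (f_n, b_n) back to (f_next, b_next)
f_chk, b_chk = mat_vec(step_matrix(k, z), [f_n, b_n])
print(abs(f_chk - f_next), abs(b_chk - b_next))
```

Both printed differences should be at the level of floating-point rounding error.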
Vocal tract model using the lattice structure
• We obtain a discrete-time representation for the tube-segment model of the vocal tract by concatenating lattice elements
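Lattice structure
• The model has been found to work sufficiently well also with the simplified assumption $b_0 = 0$ and $b_N = 0$
• The transfer function of the entire lattice structure is obtained as:
$$\begin{bmatrix} f_N(z) \\ b_N(z) \end{bmatrix} = H_{N-1} \begin{bmatrix} f_{N-1}(z) \\ b_{N-1}(z) \end{bmatrix} = H_{N-1} H_{N-2} \begin{bmatrix} f_{N-2}(z) \\ b_{N-2}(z) \end{bmatrix} = \dots = H_{N-1} \cdots H_1 H_0 \begin{bmatrix} f_0(z) \\ b_0(z) \end{bmatrix}$$
• So:
$$\begin{bmatrix} S(z) \\ 0 \end{bmatrix} = \begin{bmatrix} f_N(z) \\ b_N(z) \end{bmatrix} = H_{N-1} \cdots H_1 H_0 \begin{bmatrix} f_0(z) \\ b_0(z) \end{bmatrix} = H(z) \begin{bmatrix} G(z) \\ 0 \end{bmatrix}$$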
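The order of the matrix product matters: the element nearest the input, $H_0$, is applied first and ends up rightmost. The sketch below multiplies per-junction matrices of the form $H_n = \frac{1}{1+k_n}\begin{bmatrix} z^{-1} & -k_n z \\ -k_n z^{-1} & z \end{bmatrix}$ at a single point z. The helper names and test values are assumptions for illustration; as a simple check, a uniform tube (all $k_n = 0$) reduces to a pure delay of N samples for the forward wave.

```python
import cmath

def step_matrix(k, z):
    """H_n = 1/(1+k) * [[z^-1, -k*z], [-k*z^-1, z]] evaluated at complex z."""
    g = 1.0 / (1.0 + k)
    return [[g / z, -g * k * z],
            [-g * k / z, g * z]]

def matmul2(a, b):
    """Product of two 2x2 matrices."""
    return [[a[i][0] * b[0][j] + a[i][1] * b[1][j] for j in range(2)]
            for i in range(2)]

def cascade(ks, z):
    """H_{N-1} ... H_1 H_0 for reflection coefficients ks = [k_0, ..., k_{N-1}]."""
    total = [[1.0, 0.0], [0.0, 1.0]]
    for k in ks:                      # k_0 first, so H_0 ends up rightmost
        total = matmul2(step_matrix(k, z), total)
    return total

z = cmath.exp(0.3j)                   # a point on the unit circle (arbitrary)
uniform = cascade([0.0] * 4, z)       # four junctions without any reflection
print(abs(uniform[0][0] - z ** -4))   # forward path: pure delay z^-4
print(abs(uniform[0][1]), abs(uniform[1][0]))  # no coupling between f and b
```

With non-zero reflection coefficients the off-diagonal entries no longer vanish, which is where the resonances of the tube model come from.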
The filter used to model the vocal tract
• We observe that the filter H(z) is of all-pole type.
• Thus the vocal tract is modeled using the above-described all-pole structure, aka autoregressive (AR) model. That is, a system whose transfer function is of the form
$$H(z) = \frac{1}{A(z)}$$
• The next lecture will discuss a mathematical technique called LINEAR PREDICTION (LP) that allows us to determine the coefficients of the function A(z). In practice, linear prediction can be computed using the (fast) Levinson-Durbin algorithm.
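To see how an all-pole filter $H(z) = 1/A(z)$ produces formant-like peaks, the sketch below builds A(z) by hand from one complex-conjugate pole pair per formant and evaluates the magnitude response on the unit circle. This is not linear prediction. The formant frequencies are borrowed from the uniform-tube example, and the 100 Hz bandwidths, the 8 kHz sampling rate and the common pole-radius approximation r = exp(-π·BW/fs) are assumptions.

```python
import cmath
import math

def all_pole_coeffs(formants_hz, bandwidths_hz, fs):
    """Coefficients [1, a_1, ..., a_P] of A(z), one conjugate pole pair per formant."""
    a = [1.0]
    for freq, bw in zip(formants_hz, bandwidths_hz):
        r = math.exp(-math.pi * bw / fs)       # pole radius from bandwidth (approximation)
        theta = 2.0 * math.pi * freq / fs      # pole angle from centre frequency
        section = [1.0, -2.0 * r * math.cos(theta), r * r]
        out = [0.0] * (len(a) + 2)             # polynomial product a * section
        for i, a_i in enumerate(a):
            for j, s_j in enumerate(section):
                out[i + j] += a_i * s_j
        a = out
    return a

def magnitude_response(a, freq_hz, fs):
    """|H| = 1 / |A| evaluated at z = exp(j*2*pi*f/fs)."""
    z_inv = cmath.exp(-2j * math.pi * freq_hz / fs)
    return 1.0 / abs(sum(c * z_inv ** n for n, c in enumerate(a)))

fs = 8000.0                                    # sampling rate in Hz (assumed)
formants = [500.0, 1500.0, 2500.0]             # from the uniform-tube example
a = all_pole_coeffs(formants, [100.0] * 3, fs)
grid = [10.0 * i for i in range(401)]          # 0 ... 4000 Hz in 10 Hz steps
mags = [magnitude_response(a, f_hz, fs) for f_hz in grid]
peaks = [grid[i] for i in range(1, len(grid) - 1)
         if mags[i - 1] < mags[i] > mags[i + 1]]
print(peaks)
```

The response has one peak near each chosen formant frequency, and narrowing a bandwidth moves the corresponding poles closer to the unit circle and sharpens the peak.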
But before that…
Some vocal talent: http://www.youtube.com/watch?v=ZxcnloCzxq4