Spectrogram & Its Reading, by Tae-Yeoub Jang
Post on 18-Dec-2015
Reviving Sonus 2
What is a spectrogram?
  In use since the 1940s
  Another representation of frequency-domain analysis
  The most popular way of representing spectral information
  Three-dimensional representation
    X-axis: Time
    Y-axis: Frequency
    Darkness (or color): Energy
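The three dimensions listed above (time, frequency, energy) can be sketched with a short-time Fourier transform. This is a minimal illustration, not the method used to produce the slides' figures: the function name `spectrogram_db`, the Hann window, the 16 kHz sample rate and the synthetic 1 kHz tone are all assumptions made for the example.

```python
import numpy as np

def spectrogram_db(signal, sample_rate, window_ms=15.0, shift_ms=1.0):
    """Return (times_s, freqs_hz, energy_db) for a 1-D signal.

    Rows of energy_db are time frames (X-axis), columns are frequency
    bins (Y-axis), and the values are energy in dB (darkness).
    """
    win_len = int(round(window_ms * 1e-3 * sample_rate))
    hop = max(1, int(round(shift_ms * 1e-3 * sample_rate)))
    window = np.hanning(win_len)  # assumed window shape
    n_frames = 1 + (len(signal) - win_len) // hop
    # Slice the signal into overlapping, windowed frames.
    frames = np.stack([signal[i * hop : i * hop + win_len] * window
                       for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    energy_db = 10.0 * np.log10(power + 1e-12)  # small floor avoids log(0)
    times_s = (np.arange(n_frames) * hop + win_len / 2) / sample_rate
    freqs_hz = np.fft.rfftfreq(win_len, d=1.0 / sample_rate)
    return times_s, freqs_hz, energy_db

# Synthetic test signal (assumption): 0.5 s of a 1 kHz tone at 16 kHz.
sample_rate = 16000
t = np.arange(sample_rate // 2) / sample_rate
tone = np.sin(2 * np.pi * 1000.0 * t)

times_s, freqs_hz, energy_db = spectrogram_db(tone, sample_rate)
peak_hz = freqs_hz[energy_db.mean(axis=0).argmax()]
print(energy_db.shape, peak_hz)
```

Plotting `energy_db.T` against `times_s` and `freqs_hz` with any image-plot routine would give the familiar picture, with the tone showing up as a single dark horizontal band.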
Wideband vs. narrowband spectrograms of the question "Is Pat sad, or mad?" The 5th, 10th and 15th harmonics are marked by white squares in two of the vowels.
Types of spectrogram
  Wideband spectrogram
    Better time resolution
    e.g., 15 msec window, 1 msec shift, 125 Hz bandwidth
  Narrowband spectrogram
    Better frequency resolution
    e.g., 50 msec window, 1 msec shift, 40 Hz bandwidth
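The trade-off above (a short window gives a wide analysis bandwidth, a long window a narrow one) can be checked numerically. The sketch below measures the -3 dB main-lobe width of a Hann window of each length. The helper name `mainlobe_bandwidth_hz`, the Hann window, the 16 kHz sample rate and the -3 dB criterion are assumptions; the exact figure depends on window shape and on how bandwidth is defined, so the output need not match the 125 Hz and 40 Hz quoted above.

```python
import numpy as np

def mainlobe_bandwidth_hz(window_ms, sample_rate=16000, drop_db=3.0,
                          n_fft=1 << 16):
    """Two-sided main-lobe width (Hz) at `drop_db` below the peak."""
    n = int(round(window_ms * 1e-3 * sample_rate))
    window = np.hanning(n)  # assumed window shape
    # Zero-padded FFT gives a finely sampled frequency response.
    mag = np.abs(np.fft.rfft(window, n_fft))
    mag_db = 20.0 * np.log10(mag / mag[0] + 1e-12)
    # First frequency bin where the response has dropped by drop_db.
    first_below = int(np.argmax(mag_db < -drop_db))
    bin_hz = sample_rate / n_fft
    return 2.0 * first_below * bin_hz

wide_bw = mainlobe_bandwidth_hz(15.0)    # wideband example: 15 msec window
narrow_bw = mainlobe_bandwidth_hz(50.0)  # narrowband example: 50 msec window
print(f"15 msec window: ~{wide_bw:.0f} Hz, 50 msec window: ~{narrow_bw:.0f} Hz")
```

The ratio of the two bandwidths stays close to the inverse ratio of the window lengths (50/15), which is the point of the comparison: bandwidth scales as one over window duration.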
Advantages & Disadvantages
  Advantages
    Time alignment
  Disadvantages
    Less reliable than waveform
Vowel Spectrogram
  Formant frequencies are critical cues for vowel distinction
    F1: Height
      High vowels: low F1
    F2: Backness
      Back vowels: low F2
Example formant frequencies (Hz) of English monophthongs
  F1   280   400   550   690   710   450   310   900   640
  F2  2250  1900  1770  1660  1100  1030   870  1500  1190
  F3  2900  2550  2490  2490  2640  2380  2300  2500  2390
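The F1/height and F2/backness cues can be sketched as a rough rule of thumb applied to (F1, F2) pairs taken from the example values. The function name `describe_vowel` and the cut-off values (400 Hz, 600 Hz, 1500 Hz) are illustrative assumptions, not figures from the slides; real boundaries vary with speaker and language.

```python
def describe_vowel(f1_hz, f2_hz, high_max_f1=400.0, low_min_f1=600.0,
                   front_min_f2=1500.0):
    """Map (F1, F2) to a coarse (height, backness) label.

    The three thresholds are hypothetical values chosen only for
    illustration. Low F1 means a high vowel; low F2 means a back vowel.
    """
    if f1_hz < high_max_f1:
        height = "high"
    elif f1_hz > low_min_f1:
        height = "low"
    else:
        height = "mid"
    backness = "front" if f2_hz >= front_min_f2 else "back"
    return height, backness

# (F1, F2) pairs taken from the example values in the slide.
for f1, f2 in [(280, 2250), (690, 1660), (710, 1100), (310, 870)]:
    print(f1, f2, describe_vowel(f1, f2))
```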
"heed, hid, head, had, hod, hawed, hood, who'd" (a male speaker, American English)
Consonant Spectrogram
  General
    Acoustic structure more complicated than vowels
    Adjacent sounds (especially vowels) convey important information (locus)
    High-frequency characteristics, especially for fricatives and affricates
What is LOCUS?
  Information carried by formant transitions from vowels into obstruents or from obstruents into vowels
  The target frequency that each formant transition heads toward as an obstruction is made, or the frequency the transition comes from as the obstruction is released
  Characteristic of the consonantal place and manner; roughly the same in different vowel contexts
Stops
  General
    Fairly distinct locus for each place
    Burst
    Silence during the closure (only at syllable onset position)
    Virtually no difference during the closure
Stops (cntd.)
  Voicing distinction
    Voiced: vertical striations, less abrupt burst, frequently weakened to resemble fricatives or approximants
    Voiceless: generally abrupt burst in the higher-frequency area
Stops (cntd.)
  Place distinction
    Bilabial
      Relatively low F2, F3 locus
      Rising into and falling out of vowel
      Weak and spread vertical lines
    Alveolar
      F2 locus about 1800 Hz
      Strong vertical lines
    Velar
      Velar pinch: F2 and F3 of the vowel merging
      Often double burst
      Long formant transitions
Stops (cntd.)
  Manner distinction
    Silence duration, VOT, vowel F0

               silence  VOT    F0
    aspirated  short    long   high
    tense      long     short  high
    lax        med      med    low
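The three-cue table above can be read as a lookup: each manner has a profile of (silence, VOT, F0), and an observed token is assigned to the manner whose profile it matches on the most cues. The sketch below encodes only the qualitative labels from the table; the names `CUE_PROFILES` and `guess_manner` and the simple match-counting rule are assumptions made for illustration.

```python
# Cue profiles transcribed from the table: (silence, VOT, F0).
CUE_PROFILES = {
    "aspirated": ("short", "long", "high"),
    "tense": ("long", "short", "high"),
    "lax": ("med", "med", "low"),
}

def guess_manner(silence, vot, f0):
    """Return (best_manner, scores) by counting matching cues.

    Match counting is an illustrative assumption; ties go to the
    manner listed first in CUE_PROFILES.
    """
    observed = (silence, vot, f0)
    scores = {
        manner: sum(obs == exp for obs, exp in zip(observed, profile))
        for manner, profile in CUE_PROFILES.items()
    }
    best = max(scores, key=scores.get)
    return best, scores

print(guess_manner("short", "long", "high"))
print(guess_manner("med", "med", "high"))
```

The second call shows why several cues are listed together: a token with a conflicting F0 cue is still assigned to the manner that agrees on silence duration and VOT.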
Fricatives
  General
    Random noise pattern, especially in high-frequency regions
  Place distinction
    Labiodental [f, v]: rising locus into the following vowel
    Dental [θ, ð]: major energy above 6000 Hz
    Alveolar [s, z]: major energy above 4000 Hz
    Alveopalatal [š, ž]: major energy above 6000 Hz
    Glottal [h]: the trace of formant frequencies of neighbouring vowels
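The "major energy above N Hz" descriptions can be turned into a simple measurement: the fraction of a segment's spectral energy that lies at or above a cut-off. The sketch below is an illustration under assumptions: the helper name `high_band_energy_fraction`, the 16 kHz sample rate and the two pure tones standing in for low- and high-frequency sounds are not from the slides. Only the 4000 Hz cut-off is taken from the alveolar entry above.

```python
import numpy as np

def high_band_energy_fraction(signal, sample_rate, cutoff_hz):
    """Fraction of total spectral energy at or above cutoff_hz."""
    power = np.abs(np.fft.rfft(signal)) ** 2
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    total = power.sum()
    if total == 0:
        return 0.0
    return float(power[freqs >= cutoff_hz].sum() / total)

# Stand-in signals (assumption): 100 ms pure tones at 200 Hz and 5000 Hz.
sample_rate = 16000
t = np.arange(sample_rate // 10) / sample_rate
low_tone = np.sin(2 * np.pi * 200.0 * t)
high_tone = np.sin(2 * np.pi * 5000.0 * t)

low_fraction = high_band_energy_fraction(low_tone, sample_rate, 4000.0)
high_fraction = high_band_energy_fraction(high_tone, sample_rate, 4000.0)
print(low_fraction, high_fraction)
```

Applied to a real fricative segment, a fraction close to 1 at the 4000 Hz cut-off would be consistent with the [s, z] pattern described above, while a vowel-like segment would score near 0.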
Fricatives (cntd.)
  Weak vs. strong
    Strong [s, z, š, ž]: darker bands
    Weak [f, v, θ, ð]: spread and fainter
    Voiced [v, ð]: often so weak that they are confused with nasals or approximants
    Cue to tell [θ] from [f]: higher formants of [θ] fall into adjacent vowels
Nasals
  General
    Formants similar to vowels but fainter
    Very low F1 (about 250 Hz), F2 (about 2500 Hz), and F3 (about 3250 Hz)
  Place distinction
    Bilabial [m]: downward F2, F3 locus
    Alveolar [n]: smaller F2 transition
    Velar [ŋ]: velar pinch
Liquids & Approximants
  General
    Formants similar to vowels but fainter (especially in high-frequency regions)
    Approximately F1 (250 Hz), F2 (1200 Hz), F3 (2400 Hz)
    Change in formant structure
Liquids & Approximants (cntd.)
  Phone-specific properties
    Labial glide [w]
      Very low F1 and F2 (600-1000 Hz), which come very close to each other
      Relatively low F3
      Rapid falloff of spectral amplitude
    Palatal glide [y]
      Extremely low F1
      Extremely high F2, F3
Liquids & Approximants (cntd.)
  Phone-specific properties (cntd.)
    Flap [ɾ]: soft burst, short duration
    Retroflex [r]
      F3 dipping down close to F2
      General lowering of F3, F4
    Lateral [l]
      Low F1, F2 (approx. F1 250 Hz, F2 1200 Hz)
      Usually substantial energy in the high-frequency region
Final remarks
  The spectrogram is not the only cue for acoustic distinction of speech sounds
  Very often, the waveform is more reliable
References & Links
  http://cslu.cse.ogi.edu/tutordemos/SpectrogramReading/spectrogram_reading.html
  http://hctv.humnet.ucla.edu/departments/linguistics/VowelsandConsonants/course
  http://www.cs.indiana.edu/~port/teach/306/speech.acoustics.html
  http://www.phon.ucl.ac.uk/courses/spsci/b203/week2-5.