psy1302 psychology of language lecture 6 speech perception i
TRANSCRIPT
Psy1302 Psychology of Language
Lecture 6
Speech Perception I
Understanding Speech
Seemingly easy for us.
– 1 sec. of conversation contains roughly:
  8-10 phonemes
  3-4 syllables
  2-3 words
Difficult for engineers…
Overview
Preliminaries
– Sound Waves, Spectrogram
– Phonetics/Phonology
Why is Speech Perception Difficult?
– Lack of invariance
– Coarticulation
So How Do We Do It?
– Categorical Perception (our endowment)
– Motor Theory vs. Auditory Theory (next Tues)
– Fine-tuning of Categorical Perception (next Tues)
– Top-down Influences (next Thurs)
Sounds
A wave is a disturbance of a medium which transports energy through the medium without permanently transporting matter.
Listening
Hearing frequency range: between 20 Hz and 20000 Hz
Speech: 200-8000 Hz
Most sensitive to 1000-3500 Hz
Telephones: 300-3400 Hz
Complex Sounds
[Figure: two simple waves (pressure over time) add up to one complex wave; the accompanying spectrum plots amplitude (0-100) against frequency (0-600) for the blue wave and the red wave.]
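A minimal Python sketch of the idea on this slide: adding two simple waves gives a complex wave, and a spectrum recovers the components. The frequencies (100 Hz and 300 Hz), amplitudes, and sample rate are illustrative assumptions, not values read from the figure, and the helper names are hypothetical.

```python
import math

def sine_wave(freq_hz, amplitude, sample_rate, n_samples):
    """Samples of a simple (sinusoidal) pressure wave."""
    return [amplitude * math.sin(2 * math.pi * freq_hz * n / sample_rate)
            for n in range(n_samples)]

def amplitude_spectrum(samples, sample_rate):
    """Map frequency (Hz) to amplitude using a plain discrete Fourier transform."""
    n = len(samples)
    spectrum = {}
    for k in range(n // 2):
        re = sum(s * math.cos(2 * math.pi * k * i / n) for i, s in enumerate(samples))
        im = -sum(s * math.sin(2 * math.pi * k * i / n) for i, s in enumerate(samples))
        spectrum[k * sample_rate / n] = 2 * math.hypot(re, im) / n
    return spectrum

# Assumed values for illustration only.
SAMPLE_RATE = 2000   # samples per second
N_SAMPLES = 200      # 0.1 s of signal, so spectrum bins are 10 Hz apart

blue = sine_wave(100, 1.0, SAMPLE_RATE, N_SAMPLES)
red = sine_wave(300, 0.5, SAMPLE_RATE, N_SAMPLES)
complex_wave = [b + r for b, r in zip(blue, red)]   # blue + red = complex wave

spectrum = amplitude_spectrum(complex_wave, SAMPLE_RATE)
peaks = sorted(f for f, a in spectrum.items() if a > 0.1)
print(peaks)
```

The spectrum of the summed wave has energy only at the two component frequencies, which is what the amplitude-by-frequency plot displays.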
Speaking
Vocal Tract: runs from the vocal folds to the lips (modeled as a tube)
Average Man - Length = 17.4 cm
Vocal Tract Model
Vocal tract length L = 17.4 cm
Speed of sound = 34800 cm/sec
Speed = Distance/Time = Wavelength x Frequency
Freq = Speed/Wavelength
F1: λ = 4L, Freq = 500 Hz
F2: λ = 4L/3, Freq = 1500 Hz
F3: λ = 4L/5, Freq = 2500 Hz
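The arithmetic on this slide can be checked with a short Python sketch. It assumes the tube is closed at the vocal folds and open at the lips, which is what the wavelengths 4L, 4L/3, 4L/5 imply; the function name is hypothetical.

```python
def tube_resonances(length_cm, speed_cm_per_s, count=3):
    """Resonant frequencies (Hz) of a tube closed at one end and open at the other.

    The n-th resonance has wavelength 4L / (2n - 1), and Freq = Speed / Wavelength.
    """
    resonances = []
    for n in range(1, count + 1):
        wavelength_cm = 4 * length_cm / (2 * n - 1)
        resonances.append(speed_cm_per_s / wavelength_cm)
    return resonances

# Values from the slide: L = 17.4 cm, speed of sound = 34800 cm/sec.
formants = tube_resonances(17.4, 34800)
print([round(f) for f in formants])
```

Because length sits in the denominator, a shorter vocal tract gives higher resonances and a longer one gives lower resonances.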
Vocal Tract
http://www.exploratorium.edu/exhibits/vocal_vowels/vocal_vowels.html
ee, oo, oh, ah, eh
Vowels
Vowels: unimpeded sound through vibrating vocal cords
[Figure: vowel chart with the example words heed, hid, hay, head, had, hall, hoe, hood, who, hot, hut, her. One axis is labeled "Decrease of F2 (Place)", the other "Increase of F1 (Tongue Height)"; the lips are marked on the chart.]
http://www.youtube.com/watch?v=v9Wdf-RwLcs
Phonetics and Phonology
Phonetics: the study of speech sounds
– (articulatory, acoustic, auditory phonetics)
– how are speech sounds produced, what are the physical properties of the speech sounds, how are they interpreted?
Phonology: investigation of the organization of speech sounds in languages
– what are the phonotactic rules of the language, which sounds (phonemes) affect meaning?
Phonemes & Phones
Phones: speech sounds
Phonemes: The unit of sound that makes a contribution to meaning
– Minimal Pairs
  [b] vs. [p], e.g., bat vs. pat
  [r] vs. [l], e.g., rump vs. lump
– Crosslinguistic Differences
  "The flied lice was yummy." ([r] and [l] do not contrast in every language)
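A minimal Python sketch of the minimal-pair test: two words form a minimal pair when their sound sequences differ in exactly one position. The function name and the simplified transcriptions are assumptions made for illustration.

```python
def is_minimal_pair(word_a, word_b):
    """True if two sound sequences have equal length and differ in exactly one position."""
    if len(word_a) != len(word_b):
        return False
    differences = sum(1 for a, b in zip(word_a, word_b) if a != b)
    return differences == 1

# Simplified transcriptions (assumed for illustration).
bat = ["b", "æ", "t"]
pat = ["p", "æ", "t"]
rump = ["r", "ʌ", "m", "p"]
lump = ["l", "ʌ", "m", "p"]

print(is_minimal_pair(bat, pat))
print(is_minimal_pair(rump, lump))
print(is_minimal_pair(bat, lump))
```

If swapping one sound changes the word, as in bat vs. pat, the two sounds count as separate phonemes in that language.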
Vowel articulation
Chambers in mouth (above the glottis):
– Oral cavity
– Pharynx (behind tongue)
– Area between lips
– (Nasal cavity)
Length and shape of each chamber affect the ‘resonance’ (or the properties of the vibration) of the vowel sound
[Figure label: pharynx]
Tongue body position
Sagittal view of tongue positions in vowels
1) Tip 2) Body 3) Root
Articulatory Description
4-part classification system for vowels:
1) Tongue height
2) Frontness vs. backness of tongue
3) Tenseness
4) Lip rounding
[also (5) Nasality (in many languages)]
1) Vowel Height
High vowels: tongue body is raised
Mid vowels: tongue body is intermediate
Low vowels: tongue body is lowered
2) Vowel Front/Backness
Front vowels: tongue body is pushed forward
Back vowels: tongue body is pulled back
Central vowels: tongue body is neutral
3) Vowel Tenseness
Tense: more extreme position of the tongue or lips
Lax: less tense position of tongue or lips
4) Vowel Roundness
Rounded: produced with rounded lips
Unrounded: produced with unrounded lips
Many languages also have front rounded vowels (e.g., French)
– lit “bed”; lu “read”; loup “wolf”
Articulatory Description
For consonants, three-part classification system:
1) Voicing
2) Place (of articulation)
3) Manner (of articulation)
e.g., voiced labiodental fricative = [v]
1) Voicing
Voicing: what is happening at the LARYNX?
– Are the vocal folds spread apart (voiceless), or are they close together and vibrating (voiced)?
[Figure: vocal folds and glottis, labeled front to back]
1. [p] pat   [b] bat
2. [d] die   [t] tie
3. [g] gill  [k] kill
4. [f] fat   [v] vat
5. [s] sip   [z] zip
Voiced or Voiceless?
Voiceless       Voiced
[p] pat         [b] bat
[t] tie         [d] die
[k] kill        [g] gill
[f] fat         [v] vat
[s] sip         [z] zip
[θ] thigh       [ð] thy
[š] dilution    [ž] delusion
[č] etch        [ǰ] edge
2) Place
Place (of articulation): WHERE in the vocal tract is the constriction being made?
Place of Articulation
Place of contact creating the obstruction in making the consonant
1. Bilabial
2. Labiodental
3. Dental/Interdental
4. Alveolar
5. Palatoalveolar
6. Palatal
7. Velar
8. Uvular
9. Glottal
Place
Bilabial: w/ both lips
– [b], [p], [m], [w]
Labiodental: w/ lower lip and upper teeth
– [f], [v]
(Inter-)dental: tip of tongue btw. the teeth
– [θ] thin, [ð] then
Alveolar: tongue tip on alveolar ridge
– [t], [d], [n], [l], [s], [z]
Place
(Alveo-)Palatal: w/ tongue at or near hard palate
– Alveopalatal: [š] shill, [ž] azure, [č] church, [ǰ] jill
– Palatal: [j] you
Velar: w/ tongue at or near soft palate, or velum
– [k], [g], [ŋ] king
Glottal: produced at the larynx
– [ʔ] uh-oh, [h]
3) Manner
Manner (of articulation): HOW is the air being modified as it moves through the vocal tract?
Manner
Stop: full obstruction in oral cavity (w/ velum raised/closed)
– [p], [b], [t], [d], [k], [g], [ʔ]
Fricative: partial obstruction w/ turbulence
– [f], [v], [θ], [ð], [s], [z], [š], [ž]
Affricate: stop followed by fricative (stop + fricative)
– [č], [ǰ]
Manner
Nasal: full obstruction in oral cavity w/ velum lowered/open
– [m], [n], [ŋ]
Liquid: constriction but no turbulence
– [l] = lateral liquid
– [r] = retroflex liquid
Glide: slightly more constriction than a vowel
– [w], [j] (and shows additional evidence of “consonantness”: patterns with consonants)
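The three-part classification (voicing, place, manner) can be sketched as a lookup table in Python. The table covers only consonants whose voicing, place, and manner are all listed on the slides; the dictionary layout and function names are assumptions made for illustration.

```python
# Feature table: symbol -> (voicing, place, manner), taken from the slide lists.
CONSONANTS = {
    "p": ("voiceless", "bilabial", "stop"),
    "b": ("voiced", "bilabial", "stop"),
    "t": ("voiceless", "alveolar", "stop"),
    "d": ("voiced", "alveolar", "stop"),
    "k": ("voiceless", "velar", "stop"),
    "g": ("voiced", "velar", "stop"),
    "f": ("voiceless", "labiodental", "fricative"),
    "v": ("voiced", "labiodental", "fricative"),
    "s": ("voiceless", "alveolar", "fricative"),
    "z": ("voiced", "alveolar", "fricative"),
}

def describe(symbol):
    """Return the three-part description of a consonant symbol."""
    return " ".join(CONSONANTS[symbol])

def find_consonant(voicing, place, manner):
    """Return the symbol matching a three-part description, or None if not in the table."""
    for symbol, features in CONSONANTS.items():
        if features == (voicing, place, manner):
            return symbol
    return None

print(describe("p"))
print(find_consonant("voiced", "labiodental", "fricative"))
```

Pairs such as [p]/[b] or [s]/[z] share place and manner and differ only in voicing.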
Consonants & Vowel Trivia
Number of Consonants and Vowels
– Varies dialectally (Mary, Marry, Merry)
– American Heritage: 25 consonants and 18 vowels
Consonants vs. Vowels
Silbo Gomero, a Whistling Language
Example: 4 vowels and 4 consonants, 4000+ words
– A: Hey, Servando!
  B: What?
  A: Look, go tell Julio to bring the castanets.
  B: OK.
  A: Hey, Julio!
  B: What?
  A: Lili says you should go get the kids and have them bring the castanets for the party.
  B: OK. OK. OK.
http://www.cnn.com/2003/TECH/science/11/18/whistle.language.ap/
http://www.agulo.net/silbo/silbo.mp3
Why is it so difficult? (engineer’s lament)
Segmentation Problem:
Speech sounds seem separable and sequential: like beads on a string
Reality:
– Speech sounds overlap
– Each speech sound is affected by the elements around it
– (See your spectrograms!)
Co-articulation
Each speech sound is affected by the elements around it
/š/ (sh) in sheep, shoe
/k/ (k) in keep, cook
/d/ (d) in dee, do
Why is it so difficult? (engineer’s lament)
The “Lack of Invariance” problem
– local coarticulation effects
– speech rate & style
– prosody
– individual variability
  accent/dialect
  speaker’s vocal tract – length matters.
How do we do it?
How do we perceive the sounds?
What acoustic cues do we use?
Spectrogram Revisited
Steady State
Transitional State
Categorical Perception (Idealized Data)
[Figure: % identification (0-100) plotted against stimulus ID (-9 to 9), with separate curves for ba, da, and ga.]
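A rough Python sketch of what an idealized categorical identification function looks like: responses stay near 100% for one category across a range of stimuli, then switch abruptly at a boundary. The boundary positions (-3.5 and 3.5), the steepness, the ba-da-ga ordering along the axis, and the function names are all assumptions for illustration; they are not values taken from the figure.

```python
import math

def logistic(x, boundary, steepness):
    """Smooth step from 0 to 1 centered on the boundary."""
    return 1.0 / (1.0 + math.exp(-steepness * (x - boundary)))

def identification(stimulus, low_boundary=-3.5, high_boundary=3.5, steepness=4.0):
    """Percent 'ba', 'da', 'ga' responses for one stimulus ID (hypothetical parameters)."""
    past_low = logistic(stimulus, low_boundary, steepness)
    past_high = logistic(stimulus, high_boundary, steepness)
    return {
        "ba": 100.0 * (1.0 - past_low),        # heard as ba below the first boundary
        "da": 100.0 * (past_low - past_high),  # heard as da between the boundaries
        "ga": 100.0 * past_high,               # heard as ga above the second boundary
    }

def label(stimulus):
    """Most frequent response for a stimulus."""
    responses = identification(stimulus)
    return max(responses, key=responses.get)

# Stimulus IDs run from -9 to 9, as on the figure's horizontal axis.
labels = [label(s) for s in range(-9, 10)]
print(labels)
```

Equal steps along the stimulus axis do not produce equal changes in the percept: stimuli within a category get the same label, and the label flips only at a boundary.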
Categorical Perception of Sounds
Is the Categorical Perception phenomenon unique to speech sounds or can we find it in other sounds?
Which part of the sound are we using to do Categorical Perception?