psy1302 psychology of language lecture 6 speech perception i


Upload: lillie-wagstaff

Post on 14-Dec-2015


Psy1302 Psychology of Language

Lecture 6

Speech Perception I

Understanding Speech

Seemingly easy for us.
– 1 sec. of conversation contains roughly 8-10 phonemes, 3-4 syllables, 2-3 words

Difficult for engineers…
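To make the rate concrete, the per-second counts above can be turned into time per unit. This is a minimal arithmetic sketch; `RATES_PER_SEC` and `ms_per_unit` are illustrative names, and the ranges are simply the slide's rough figures.

```python
# Rough per-second counts from the slide, as (low, high) ranges.
RATES_PER_SEC = {
    "phonemes": (8, 10),
    "syllables": (3, 4),
    "words": (2, 3),
}

def ms_per_unit(low, high):
    """Convert a range of units per second into a (shortest, longest) range of milliseconds per unit."""
    return 1000 / high, 1000 / low

for unit, (low, high) in RATES_PER_SEC.items():
    shortest, longest = ms_per_unit(low, high)
    print(f"{unit}: about {shortest:.0f}-{longest:.0f} ms each")
# phonemes: about 100-125 ms each
# syllables: about 250-333 ms each
# words: about 333-500 ms each
```

At this rate a listener has to resolve a new phoneme roughly every tenth of a second.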

Big Picture

Overview

Preliminaries
– Sound Waves, Spectrogram
– Phonetics/Phonology

Why is Speech Perception Difficult?
– Lack of invariance
– Coarticulation

So How Do We Do It?
– Categorical Perception (our endowment)
– Motor Theory vs. Auditory Theory (next Tues)
– Fine-tuning of Categorical Perception (next Tues)
– Top-down Influences (next Thurs)

Sounds

A wave is a disturbance of a medium that transports energy through the medium without permanently transporting matter.

Listening

Hearing frequency range: 20 Hz to 20,000 Hz

Speech: 200-8000 Hz

Most sensitive to 1000-3500 Hz

Telephones: 300-3400 Hz
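The ranges above can be compared directly as intervals. This is a minimal sketch; `overlap` is an illustrative helper, and the bands are the slide's numbers, with the 300-3400 Hz band read as the telephone band.

```python
def overlap(band_a, band_b):
    """Return the shared (low, high) range of two frequency bands in Hz, or None if they don't meet."""
    low = max(band_a[0], band_b[0])
    high = min(band_a[1], band_b[1])
    return (low, high) if low < high else None

# Bands from the slide, in Hz.
SPEECH = (200, 8000)
MOST_SENSITIVE = (1000, 3500)
TELEPHONE = (300, 3400)

print(overlap(SPEECH, TELEPHONE))          # (300, 3400)
print(overlap(MOST_SENSITIVE, TELEPHONE))  # (1000, 3400)
```

So a phone line drops the speech range below 300 Hz and above 3400 Hz, while keeping most of the 1000-3500 Hz region where hearing is most sensitive.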

Complex Sounds

[Figure: two simple waves (pressure vs. time), a blue wave and a red wave, add (+ =) to form a complex wave; the accompanying spectra plot amplitude (0-100) vs. frequency (0-600 Hz) for the blue and red waves]
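The idea behind the figure, that a complex wave is a sum of simple waves and a spectrum recovers each component's amplitude at its frequency, can be sketched with a single-bin discrete Fourier transform. The frequencies (100 Hz, 300 Hz), amplitudes, and sample rate below are assumptions for illustration, not values read off the figure.

```python
import cmath
import math

SAMPLE_RATE = 2000  # samples per second (assumed)
N = 2000            # one second of signal, so integer frequencies fall exactly on DFT bins

def sine(freq_hz, amplitude):
    """Sample a pure tone: pressure as a function of time."""
    return [amplitude * math.sin(2 * math.pi * freq_hz * n / SAMPLE_RATE) for n in range(N)]

# Two simple waves (hypothetical frequencies and amplitudes) ...
blue = sine(100, 80)
red = sine(300, 40)
# ... add point by point into one complex wave.
complex_wave = [b + r for b, r in zip(blue, red)]

def amplitude_at(signal, freq_hz):
    """Amplitude of one frequency component, via a single-bin discrete Fourier transform."""
    total = sum(x * cmath.exp(-2j * math.pi * freq_hz * n / SAMPLE_RATE)
                for n, x in enumerate(signal))
    return 2 * abs(total) / len(signal)

for f in (100, 200, 300):
    print(f, round(amplitude_at(complex_wave, f), 3))
# 100 80.0
# 200 0.0
# 300 40.0
```

The spectrum shows energy only at the two component frequencies, which is what a spectrogram displays slice by slice over time.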

Sounds and Spectrograms

Speaking

Vocal tract: from vocal folds to lips (modeled as a tube)

Average man: length = 17.4 cm

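Vocal Tract Model

Vocal tract length: $L = 17.4$ cm; speed of sound: $c = 34800$ cm/sec

Speed = Distance/Time = Wavelength × Frequency, so Frequency = Speed/Wavelength

A tube closed at one end (vocal folds) and open at the other (lips) resonates at wavelengths of $4L$, $4L/3$, $4L/5$, giving the first three formants:

$\lambda_1 = 4L \;\Rightarrow\; F_1 = \frac{c}{4L} = \frac{34800}{4 \times 17.4} = 500$ Hz

$\lambda_2 = \frac{4L}{3} \;\Rightarrow\; F_2 = \frac{3c}{4L} = 1500$ Hz

$\lambda_3 = \frac{4L}{5} \;\Rightarrow\; F_3 = \frac{5c}{4L} = 2500$ Hz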
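The same tube model can be written as a small function, $F_n = (2n-1)\,c/(4L)$. This is a minimal sketch assuming a uniform tube. `tube_formants` is an illustrative name, and the 14.5 cm length is a hypothetical shorter vocal tract used only to show that length shifts every formant.

```python
SPEED_OF_SOUND_CM_S = 34800.0  # speed of sound used on the slide, in cm/sec

def tube_formants(length_cm, n_formants=3, c=SPEED_OF_SOUND_CM_S):
    """Resonances of a uniform tube closed at one end (vocal folds) and open at the other (lips).

    The n-th resonance has wavelength 4L/(2n-1), so its frequency is (2n-1) * c / (4L).
    """
    return [(2 * n - 1) * c / (4 * length_cm) for n in range(1, n_formants + 1)]

print([round(f) for f in tube_formants(17.4)])  # [500, 1500, 2500]
print([round(f) for f in tube_formants(14.5)])  # [600, 1800, 3000]  (hypothetical shorter tract)
```

A shorter vocal tract raises all the formants, one reason the same vowel from different speakers does not have identical acoustics.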

Speaking

[Figure: vocal tract from vocal folds to lips, panels a-d; spectrogram of "on top of his deck"]

Vocal Tract

http://www.exploratorium.edu/exhibits/vocal_vowels/vocal_vowels.html

ee, oo, oh, ah, eh

Vowels

Vowels: unimpeded sound through vibrating vocal cords

[Figure: vowel chart with key words (heed), (hid), (hay), (head), (had), (hall), (hoe), (hood), (who), (hot), (hut), (her); horizontal axis: decrease of F2 (place); vertical axis: increase of F1 (tongue height); lips labeled]

http://www.youtube.com/watch?v=v9Wdf-RwLcs

Spectrogram of 3 vowels

[Figure: spectrogram of ee, oo, ah; frequency (Hz, 1000 Hz marked) vs. time (s)]
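Because the vowels differ mainly in F1 and F2, a very simple recognizer can label a vowel by its nearest formant target. This is a sketch only: the (F1, F2) targets are rough assumed values for an adult male voice, not measurements from these spectrograms, and `classify_vowel` is a hypothetical helper.

```python
import math

# Rough, assumed (F1, F2) targets in Hz for an adult male voice; real values vary by speaker.
VOWEL_TARGETS = {
    "ee": (270.0, 2290.0),  # high front: low F1, high F2
    "oo": (300.0, 870.0),   # high back: low F1, low F2
    "ah": (730.0, 1090.0),  # low back: high F1, fairly low F2
}

def classify_vowel(f1, f2, targets=VOWEL_TARGETS):
    """Return the label whose (F1, F2) target is closest in Euclidean distance."""
    return min(targets, key=lambda v: math.dist((f1, f2), targets[v]))

print(classify_vowel(320, 2100))  # ee
print(classify_vowel(700, 1200))  # ah
```

Fixed targets like these break down as soon as speakers differ in vocal tract length, which is the lack-of-invariance problem in miniature.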

Phonetics and Phonology

Phonetics: the study of speech sounds
– (articulatory, acoustic, auditory phonetics)
– How are speech sounds produced, what are the physical properties of speech sounds, and how are they interpreted?

Phonology: investigation of the organization of speech sounds in languages
– What are the phonotactic rules of the language, and which sounds (phonemes) affect meaning?

Phonemes & Phones

Phones: speech sounds

Phonemes: the unit of sound that makes a contribution to meaning
– Minimal Pairs
[b] vs. [p], e.g., bat vs. pat
[r] vs. [l], e.g., rump vs. lump
– Crosslinguistic Differences
The flied lice was yummy.
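A minimal pair can be checked mechanically if each word is written as a sequence of phonemes: same length, exactly one position different. This is a simplified sketch; the transcriptions are hypothetical broad ones, one symbol per phoneme, and `is_minimal_pair` is an illustrative name.

```python
def is_minimal_pair(word_a, word_b):
    """True if two phoneme sequences have the same length and differ in exactly one position."""
    if len(word_a) != len(word_b):
        return False
    differences = sum(1 for a, b in zip(word_a, word_b) if a != b)
    return differences == 1

# Hypothetical broad transcriptions, one symbol per phoneme.
bat = ("b", "æ", "t")
pat = ("p", "æ", "t")
rump = ("r", "ʌ", "m", "p")
lump = ("l", "ʌ", "m", "p")

print(is_minimal_pair(bat, pat))    # True
print(is_minimal_pair(rump, lump))  # True
print(is_minimal_pair(bat, lump))   # False
```

For a listener whose language does not contrast [r] and [l], the two words in rump/lump fall into the same category, so the pair no longer signals a difference in meaning.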

Phonemes

Two Kinds of Phonemes
– Vowels
– Consonants

Vowel articulation

Chambers in mouth (above the glottis):
– Oral cavity
– Pharynx (behind tongue)
– Area between lips
– (Nasal cavity)

Length and shape of each chamber affect the ‘resonance’ (or the properties of the vibration) of the vowel sound

Tongue body position

Sagittal view of tongue positions in vowels: 1) tip, 2) body, 3) root

Articulatory Description

4-part classification system for vowels:

1) Tongue height

2) Frontness vs. backness of tongue

3) Tenseness

4) Lip rounding

[also (5) Nasality (in many languages)]
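The 4-part system above maps naturally onto a small record per vowel. This sketch covers only a few English vowels; the IPA symbols and feature values are standard textbook descriptions added here (the slides list key words), and `Vowel` and `vowels_where` are illustrative names. Nasality is left out.

```python
from typing import NamedTuple

class Vowel(NamedTuple):
    height: str      # "high", "mid", or "low"
    backness: str    # "front", "central", or "back"
    tense: bool      # True = tense, False = lax
    rounded: bool    # True = produced with rounded lips

# A few English vowels, keyed by IPA symbol, with the key word from the vowel chart.
VOWELS = {
    "i": ("heed", Vowel("high", "front", True, False)),
    "ɪ": ("hid",  Vowel("high", "front", False, False)),
    "e": ("hay",  Vowel("mid", "front", True, False)),
    "ɛ": ("head", Vowel("mid", "front", False, False)),
    "u": ("who",  Vowel("high", "back", True, True)),
    "ʊ": ("hood", Vowel("high", "back", False, True)),
    "o": ("hoe",  Vowel("mid", "back", True, True)),
}

def vowels_where(**features):
    """Return symbols of vowels whose features match every keyword given, e.g. rounded=True."""
    return [sym for sym, (_, v) in VOWELS.items()
            if all(getattr(v, name) == value for name, value in features.items())]

print(vowels_where(rounded=True))                # ['u', 'ʊ', 'o']
print(vowels_where(height="high", tense=False))  # ['ɪ', 'ʊ']
```

Each tense/lax pair (heed/hid, who/hood) differs in a single feature, which is exactly what the classification is designed to expose.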


1) Vowel Height

High vowels: tongue body is raised

Mid vowels: tongue body is intermediate

Low vowels: tongue body is lowered


2) Vowel Front/Backness

Front vowels: tongue body is pushed forward

Back vowels: tongue body is pulled back

Central vowels: tongue body is neutral


3) Vowel Tenseness

Tense: more extreme position of the tongue or lips

Lax: less extreme (more relaxed) position of the tongue or lips


4) Vowel Roundness

Rounded: produced with rounded lips

Unrounded: produced with unrounded lips

Many languages also have front rounded vowels (e.g., French):
lit “bed” [i]; lu “read” [y]; loup “wolf” [u]


Consonants

Consonants: Impeded sound

Articulatory Description

For consonants, three-part classification system:

1) Voicing

2) Place (of articulation)

3) Manner (of articulation)

e.g., voiced labiodental fricative = [v]
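The three-part description works like a lookup key: voicing, place, and manner together pick out one consonant. This sketch covers only a subset of English consonants in IPA symbols; `CONSONANTS` and `describe` are illustrative names.

```python
# (voicing, place, manner) -> IPA symbol for a subset of English consonants.
CONSONANTS = {
    ("voiceless", "bilabial", "stop"): "p",
    ("voiced", "bilabial", "stop"): "b",
    ("voiced", "bilabial", "nasal"): "m",
    ("voiceless", "labiodental", "fricative"): "f",
    ("voiced", "labiodental", "fricative"): "v",
    ("voiceless", "interdental", "fricative"): "θ",
    ("voiced", "interdental", "fricative"): "ð",
    ("voiceless", "alveolar", "stop"): "t",
    ("voiced", "alveolar", "stop"): "d",
    ("voiceless", "alveolar", "fricative"): "s",
    ("voiced", "alveolar", "fricative"): "z",
    ("voiced", "alveolar", "nasal"): "n",
    ("voiceless", "velar", "stop"): "k",
    ("voiced", "velar", "stop"): "g",
    ("voiced", "velar", "nasal"): "ŋ",
}

def describe(symbol):
    """Invert the table: return the 'voicing place manner' description of a symbol."""
    for (voicing, place, manner), sym in CONSONANTS.items():
        if sym == symbol:
            return f"{voicing} {place} {manner}"
    raise KeyError(symbol)

print(CONSONANTS[("voiced", "labiodental", "fricative")])  # v
print(describe("k"))                                        # voiceless velar stop
```

Pairs such as [p]/[b] or [s]/[z] share place and manner and differ only in the voicing key.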

1) Voicing

Voicing: what is happening at the LARYNX?
– Are the vocal folds spread apart (voiceless), or are they close together and vibrating (voiced)?

[Figure: view of the larynx (front and back marked) showing the vocal folds and the glottis]

Voiced or Voiceless?

voiced   voiceless

1. [p] pat   [b] bat
2. [d] die   [t] tie
3. [g] gill   [k] kill
4. [f] fat   [v] vat
5. [s] sip   [z] zip

Voiced or Voiceless?

Voiceless        Voiced
[p] pat          [b] bat
[t] tie          [d] die
[k] kill         [g] gill
[f] fat          [v] vat
[s] sip          [z] zip
[θ] thigh        [ð] thy
[š] dilution     [ž] delusion
[č] etch         [ǰ] edge

[Figure: consonant chart, voicing value]

2) Place

Place (of articulation): WHERE in the vocal tract is the constriction being made?

Place of Articulation
Place of contact creating the obstruction in making the consonant
1. Bilabial
2. Labiodental
3. Dental/Interdental
4. Alveolar
5. Palatoalveolar
6. Palatal
7. Velar
8. Uvular
9. Glottal

Place

Bilabial: w/ both lips
– [b], [p], [m], [w]

Labiodental: w/ lower lip and upper teeth
– [f], [v]

(Inter-)dental: tip of tongue btw. the teeth
– [θ] thin, [ð] then

Alveolar: tongue tip on alveolar ridge
– [t], [d], [n], [l], [s], [z]

Place

(Alveo-)Palatal: w/ tongue at or near hard palate
– Alveopalatal: [š] shill, [ž] azure, [č] church, [ǰ] Jill
– Palatal: [j] you

Velar: w/ tongue at or near soft palate, or velum
– [k], [g], [ŋ] king

Glottal: produced at the larynx
– [ʔ] uh-oh, [h]

3) Manner

Manner (of articulation): HOW is the air being modified as it moves through the vocal tract?

Manner

Stop: full obstruction in oral cavity (w/ velum raised/closed)
– [p], [b], [t], [d], [k], [g], [ʔ]

Fricative: partial obstruction w/ turbulence
– [f], [v], [θ], [ð], [s], [z], [š], [ž]

Affricate: stop followed by fricative
– [č], [ǰ]

Manner

Nasal: full obstruction in oral cavity w/ velum lowered/open
– [m], [n], [ŋ]

Liquid: constriction but no turbulence
– [l] = lateral liquid
– [r] = retroflex liquid

Glide: slightly more constriction than a vowel
– [w], [j] (and shows additional evidence of “consonantness”: patterns with consonants)

[Figure: consonant chart, manner]

Consonants & Vowel Trivia

Number of Consonants and Vowels
– Varies dialectally (Mary, Marry, Merry)
– American Heritage: 25 consonants and 18 vowels

Consonants vs. Vowels

Silbo Gomero, a Whistling Language
– 4 vowels and 4 consonants
– 4000+ words
– Example:
A: Hey, Servando!
B: What?
A: Look, go tell Julio to bring the castanets.
B: OK.
A: Hey, Julio!
B: What?
A: Lili says you should go get the kids and have them bring the castanets for the party.
B: OK. OK. OK.

http://www.cnn.com/2003/TECH/science/11/18/whistle.language.ap/
http://www.agulo.net/silbo/silbo.mp3

Why is it so difficult? (engineer’s lament)

Segmentation Problem:
Speech sounds seem separable and sequential: like beads on a string

Reality:
– Speech sounds overlap
– Each speech sound is affected by the elements around it
– (See your spectrograms!)

Co-articulation

A speech sound is affected by the elements around it

/š/ (sh) in sheep, shoe

/k/ (k) in keep, cook

/d/ (d) in dee, do

Speech Rate

“I am going to leave.”

Why is it so difficult? (engineer’s lament)

The “Lack of Invariance” problem
– local coarticulation effects
– speech rate & style
– prosody
– individual variability
  – accent/dialect
  – speaker’s vocal tract (length matters)

How do we do it?

How do we perceive the sounds? What acoustic cues do we use?

http://www.haskins.yale.edu/featured/patplay.html

Pattern Playback Machine

Spectrogram Revisited

[Figure: spectrograms of ba, da, ga, frequency (Hz) vs. time (msec), with the transitional state and the steady state labeled]

Categorical Perception (Idealized Data)

[Figure: % identification (0-100) as ba, da, or ga plotted against stimulus ID (-9 to 9)]
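The idealized curves amount to a step function: each stimulus on the continuum is labeled one way 100% of the time. In one common idealization, listeners can then tell two stimuli apart only when they carry different labels. The sketch below assumes boundaries at -3 and 3; the slide does not give the actual boundaries, and `identify` and `predicted_discriminable` are illustrative names.

```python
# Hypothetical category boundaries on the -9..9 continuum (assumed for illustration).
BA_DA_BOUNDARY = -3
DA_GA_BOUNDARY = 3

def identify(stimulus_id):
    """Idealized identification: every stimulus gets exactly one label, 100% of the time."""
    if stimulus_id < BA_DA_BOUNDARY:
        return "ba"
    if stimulus_id < DA_GA_BOUNDARY:
        return "da"
    return "ga"

def predicted_discriminable(s1, s2):
    """Idealized categorical prediction: two stimuli are told apart only if labeled differently."""
    return identify(s1) != identify(s2)

print([identify(s) for s in range(-9, 10, 3)])  # ['ba', 'ba', 'da', 'da', 'ga', 'ga', 'ga']
print(predicted_discriminable(-6, -4))  # False: 2-step difference, both labeled 'ba'
print(predicted_discriminable(-4, -2))  # True: same 2-step difference, but it crosses a boundary
```

The two comparisons are physically equal in size, yet only the one that crosses a category boundary is predicted to be heard as different.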

Categorical Perception

Categorical Perception of Sounds

Is the Categorical Perception phenomenon unique to speech sounds, or can we find it in other sounds?

Which part of the sound are we using to do Categorical Perception?

Categorical Perception of Sounds

VS.

Mattingly et al. (1971)

Which of the following manipulations results in categorical perception?

NO YES NO

NO NO NO