week 11: speech perception: how do we perceive...

Post on 21-Jul-2018


LING 40030 - General Linguistics

• Week 11: Speech Perception: How do we

perceive speech?

o Variability inherent in the production of speech

o Making sense of variability in speech sounds

o Acoustics and perceptual cues

o Patterns found in speech

o A phonological spin on perception

Recap on Articulatory Phonetics

• Consonants are classified using 3 labels:

Voicing (voiced or voiceless)

Place of Articulation ( e.g. Bilabial, Alveolar,

Velar)

Manner of Articulation (e.g. stop/plosive,

fricative, affricate)

What sound is this?

It’s [ � ]

Recap on Articulatory Phonetics

• Vowels are also classified using 3 labels:

Height (high, high-mid, low-mid, low)

Fronting ( front, central, back)

Lip rounding (rounded or unrounded)

• Can you produce:

• A pulmonic ingressive voiceless labio-dental

fricative?

• High Front Vowel?

• Voiced postalveolar affricate, low back unrounded

vowel followed by a voiced alveolar nasal?

Speech Communication

Return of the Danish Eth [ð]

• IPA symbols correspond to one and only one

articulatory configuration

• The question was raised: why does Danish [ð] (e.g. [mað] ‘food’) sound different to English [ð]?

• Language internally, each eth corresponds to one

and only one configuration

• Partial answer: broad transcription (less detailed)

versus narrow transcription.

Return of the Danish Eth [ð]

• Partial answer: historical differences in the

transcription system [ not sound system ]

• Basbøll (2005) notes the Dania transcription system

first developed 1890.

• IPA started 1888 (currently at 2005 revision)

• Basbøll (2005) provides a narrow transcription in

IPA of Danish ‘soft’ d as [ D ] with the description of an “alveolar non-lateral approximant”

Return of the Danish Eth [ð]

• Yet another partial answer: the difference between

[d] and [ð] in both languages can be viewed as a matter of sonority (loudness)

• vowels>approximants>nasals>fricatives>stops

• Do you want to know why some sounds are louder

than others?

• Even if you don’t, here’s the answer…

Source-Filter Theory of Speech Production

• Please rotate your head slightly!

Phonetics and Phonology

• Phonetics/Phonology focuses on sounds (or

gestures), their articulations, their patterning

within language and how they are organised by

the language user in the communication chain.

• We should be able to describe, record and

model how the structure of language is related

to, and affected by, the medium of

communication.

Phonetics and Phonology

• Traditionally, phonology did not admit much by way of detailed phonetics.

• Phonetics has been viewed as being on the periphery of grammar while phonology is within the grammar proper.

• Phonetics deals with sounds as articulatory or acoustic entities.

• Phonology deals with sounds as contrastive entities.

Spectrogram: “She said sushi”

ʃ i s ɛ d s u ʃ i

Is Phonetic Detail Relevant to

Phonology?

• From little acorns…

• Sound changes often have ‘minor’ predictable

phonetic differences as their genesis

• For instance, the development of secondary

articulations in Irish

• What is a secondary articulation?

Secondary Articulations

Secondary Articulations

• Archaic Irish:

• [ b�l� ] (nom.) versus [ b�li ] (gen.) ‘limb’

• No contrast between palatalised and velarised

sounds.

• Narrower transcription: [b�l�� ] versus [ b�li ]

• Word-final vowels were lost

• Old Irish

• [b�l� ] versus [ b�l ]

• Contrast between palatalised and velarised

sounds arose

Variability: Stress

• Stress is the relative emphasis that may be

given to certain syllables in a word.

• Stress is considered to be a property of the

syllable, but its effects most often (though by

no means exclusively) show themselves on

vowels.

Variability: Stress

• Stressed vowels (syllables) can have the

following attributes:

• additional (relative) loudness (acoustically: amplitude)

• longer duration

• pitch movement

• IPA recognises two degrees of stress:

• Primary stress: [ st� s ]

• Secondary stress [ �f�n�t���n ]

Transcription of Stress

• Photo

• Photograph

• Photography

• Photographic

• Details

• Manmade

• Bernard

fot����æff�t����fifot����æf�k

foto

ditelzmænmedb��n��d

d�telzmænmed

b��n��d

Isn’t there a load of Schwas in those

transcriptions?

• In English, unstressed syllables typically have a reduced vowel quality

• Often the vowels in unstressed syllables become more centralised. Often they become schwa [ə]

• Vowels followed by [ɹ] can be rhoticised (i.e. have a degree of retroflexion)

• Additional retroflexion can be indicated with the rhoticity diacritic [ ˞ ]

• In English, schwa before [ɹ] is rhoticised in unstressed syllables, e.g. murder [mə˞də˞], banter [bæntə˞]

Variability: Vowels

• Vowels (and vowel systems) exhibit

considerable variability

• Irish English Speakers: How front is your

[u]?

• The further north on the island you go the

more fronted [u] is.

• The further south and west you go the

more backed [ u ] is

Variability: Vowels

u

Across Irish English dialects, the so-called

“high back vowel” can range from [u] through

[ʉ] to [y]: basically it is high and rounded

Variability: Consonants

• Sounds are not uttered in isolation

• Variability is induced during each articulatory phase

by sounds preceding and following

• Approach Phase: articulators are moved towards their

intended target

• Hold Phase: articulators reach target and maintain

that configuration

• Release Phase: articulators move away from the

target position

Variability: Consonants

• This phenomenon is called coarticulation

• Coarticulation refers to the fact that speech

production is a continually varying structure.

The timing of one articulatory configuration is

not absolutely separable from the next (or the

previous) configuration

An Eye into Coarticulation

• See handout (or overhead, if you’re watching!)

• The degree of observable coarticulation varies,

though coarticulation can be quite pervasive

e.g. free Ontario, sinistre structure

Coarticulation / Assimilation

• In connected speech, morphemes are added to words

and words are combined with other words to build

phrases.

• Each of these operations is likely to provide scope for

coarticulatory or assimilatory effects to be witnessed.

• Assimilation can be defined as the process by which

a speech segment becomes more like another

segment within a word, or at a word

boundary. That is not terribly different from

coarticulation

Coarticulation / Assimilation

• The difference between coarticulation and assimilation

can be viewed as being a matter of degree.

• Coarticulation is certainly the genesis of assimilation

• Nasal-Consonant clusters in Mexican Spanish:

– [ ka mp o] ‘country’

– [ a �f � a] ‘amphora’

– [ ma n!t! o] ‘cloak’

– [ ma ns o] ‘gentle’

– [ ma "t� o] ‘I stain’

– [ ma #k o] ‘one-handed’

• English prefix ‘in-’

– inedible, improbable, indescribable, i[n/#]complete, irregular,

illegal
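The behaviour of the English prefix can be sketched in a few lines of code. This is a spelling-based simplification, not a phonological analysis: the function name `negative_prefix` is hypothetical, it only looks at the first letter of the stem, and it ignores the optional velar assimilation in i[n/ŋ]complete because English spelling does not show it.

```python
def negative_prefix(stem: str) -> str:
    """Attach the prefix 'in-' and assimilate its nasal to the following sound.

    Assumption: the first letter of the stem stands in for its first sound.
    """
    first = stem[0].lower()
    if first in "pbm":       # labial sounds: the nasal becomes labial [m]
        return "im" + stem
    if first == "r":         # total assimilation to a following r
        return "ir" + stem
    if first == "l":         # total assimilation to a following l
        return "il" + stem
    return "in" + stem       # elsewhere the nasal surfaces unchanged


for stem in ["edible", "probable", "describable", "complete", "regular", "legal"]:
    print(negative_prefix(stem))
```

The "elsewhere" branch comes last on purpose: the specific contexts are checked first and the default form covers everything that is left.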

Assimilation and other

Connected Speech Processes

• Assimilation is not restricted to place of articulation:

• French speaker ordering a Jack Daniels [$a� danj lz]

• Ukrainian saying “just don’t..” [d$'zd dont]

• Lithuanian speaker “fast growing” [ faz��o�# ] (note: two

processes)

• Elision (deletion): Make amends [ mek am nz]

• Fusion: e.g. tune in Irish English [ tʃun ] versus RP [ tjun ]

• Liaison: Most famously in French: de temps en temps [ d!�t!�(z � t!�( ]

• Insertion: in Irish gorm ‘blue’ [ �' ��m� ]

• Juncture: I scream/ice cream; nitrate/night rate; why

choose/white shoes

Do you notice all kinds of variability?

• Consider the following words from Irish English

• [ t*�p ], [ t/�ai ], [ stai ], [ w�s1�� ], [ b's1 ]

• Prior to last week, would you have realised:

– Top begins with a voiceless aspirated alveolar stop?

– Try begins with a voiceless retracted alveolar stop?

– The second sound in sty is a voiceless unaspirated alveolar

stop?

– The middle letter in water is a voiceless retracted alveolar

fricative?

– The final letter in but is (or can be) a voiceless retracted

alveolar fricative?

Do you notice all kinds of variability?

• Consider the following words from Irish English

• [ t�p ], [ t��ai ], [ stai ], [ w�s��� ], [ b's� ]

• What did you think about these sounds?

• That they’re all kinds of ‘t’

• In actual fact, they’re all kinds of /t/

• These sounds function as a single unit, the

phoneme

• Each sound here is an allophone of the /t/

phoneme

The Phoneme: Bringing Order to Chaos

• Phonemes are the units which signal a difference to

meaning

• A speech sound which speakers of a language can

recognise as a distinctive sound (in their language) and

affects meaning

• A psychologically real speech sound, recognised as

different from other speech sounds e.g. in speech errors:

speech capabilities → skeech papabilities

• An abstract mental representation: phonemes exist only

in opposition (contrast) to other phonemes

The Phoneme Rules

/PhonemeX/ → [allophoneA] / contextP
             [allophoneB] / contextQ

• This rule can be read as “/X/ is realised as

allophoneA in contextP and as allophoneB in

contextQ”
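The rule schema can also be read as a small lookup procedure. The sketch below is only an illustration under stated assumptions: a context is modelled as a pair (preceding segment, following segment) with "#" as the word boundary, and the names `realise`, `context_p` and `context_q` are hypothetical, standing in for whatever contextP and contextQ turn out to be for a given phoneme.

```python
from typing import Callable, List, Tuple

Context = Tuple[str, str]                      # (preceding segment, following segment)
Rule = Tuple[str, Callable[[Context], bool]]   # (allophone, test for its context)


def realise(rules: List[Rule], context: Context) -> str:
    """Return the first allophone whose context test matches."""
    for allophone, applies in rules:
        if applies(context):
            return allophone
    raise ValueError(f"no rule covers context {context!r}")


def context_p(context: Context) -> bool:
    """Hypothetical contextP: word-initial position."""
    return context[0] == "#"


def context_q(context: Context) -> bool:
    """Hypothetical contextQ: every other position."""
    return context[0] != "#"


phoneme_x = [("allophoneA", context_p), ("allophoneB", context_q)]
print(realise(phoneme_x, ("#", "a")))   # word-initial
print(realise(phoneme_x, ("a", "a")))   # between two vowels
```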

Alternatively

• Is it a bird? Is it a plane? No, it’s...

/ Kal-El / → [ Superman ] / Lois Lane needs help
             [ Clark Kent ] / at the Daily Planet

How do we Identify Phonemes?

• Phonemes are composed of allophones (i.e.,

determining how a phoneme is realised under

specific conditions)

• Step 1: Ascertain the environments of sounds

under investigation

• Step 2: Analyse patterns observed in their

distribution

• Step 3: Determine the rule(s) required to

describe the pattern you observe.
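Step 1 is mechanical enough to sketch in code. The example below is a minimal illustration, assuming each character of a transcription is one segment and using "#" for the word boundary; the function name `environments` is hypothetical, and the four transcriptions are plain-ASCII forms taken from the Spanish data set used in this lecture.

```python
from collections import defaultdict


def environments(words, targets):
    """Collect the environment (preceding_following) of every target sound."""
    chart = defaultdict(list)
    for word in words:
        padded = ["#"] + list(word) + ["#"]   # add word boundaries on both sides
        for i in range(1, len(padded) - 1):
            if padded[i] in targets:
                chart[padded[i]].append(f"{padded[i - 1]}_{padded[i + 1]}")
    return dict(chart)


words = ["da", "dias", "dando", "banda"]
chart = environments(words, {"b", "d"})
for sound, envs in chart.items():
    print(sound, envs)
```

Steps 2 and 3 still need a linguist: the chart only lists where each sound occurs, and spotting the generalisation (for example "only between vowels") is the analytical part.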

Consider the following data from Spanish

[ baŋko ] bank          [ boðeɣa ] bodega
[ ambɔs ] both          [ sɛmbla ] seem
[ deβe ] have to        [ aβe ] have
[ da ] give             [ dias ] days
[ dando ] giving        [ banda ] ribbon
[ naða ] nothing        [ ablaðo ] spoken
[ ɡanar ] gain          [ ɡata ] gate
[ lɛŋɡwa ] language     [ saŋɡɾia ] sangria
[ dɾoɣa ] drug          [ muɣa ] boundary

and in particular, the sounds [ d, ð, b, β, ɡ, ɣ ]

Environment Chart

b      β      d      ð      ɡ      ɣ
#_a    e_e    #_e    a_a    #_a    o_a
m_ɔ    a_e    #_a    o_e    ŋ_ɾ    e_a
#_o           #_ɾ    a_o           u_a
m_l           #_i
#_a           n_a

How do we Identify Phonemes?

• Step 1: Ascertain the environments of sounds

under investigation. Done

• Step 2: Analyse patterns observed in their

distribution:

• The general assumption is that a phoneme has a

regular distribution, occurring in all environments

• #_, _#, C_V, C_C, V_C, V_V

• Where do the fricative allophones occur?

Generalising the Patterns

b      β      d      ð      ɡ      ɣ
#_a    e_e    #_e    a_a    #_a    o_a
m_ɔ    a_e    #_a    o_e    ŋ_ɾ    e_a
#_o           #_ɾ    a_o           u_a
m_l           #_i
#_a           n_a

Stops [ b, d, ɡ ]: #_V, C_V, C_C

Fricatives [ β, ð, ɣ ]: V_V

How do we Identify Phonemes?

• Step 1: Ascertain the environments of sounds

under investigation. Done

• Step 2: Analyse patterns observed in their

distribution. Done

• Step 3: Determine the rule(s) required to

describe the pattern you observe:

• Related allophones should have phonetic

similarity (share at least two articulatory labels)

Three Rules for Three Phonemes

• This rule states /b/ becomes a fricative when it occurs intervocalically

• This rule states /d/ is realised as [ð] intervocalically, and as [d] in all

other contexts

• /g/ is produced as a fricative when it is in between two vowels

/d/ → [ ð ] / V_V
      [ d ] / elsewhere

/b/ → [ β ] / V_V
      [ b ] / elsewhere

/g/ → [ ɣ ] / V_V
      [ ɡ ] / elsewhere
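The three rules can be tried out on some underlying forms. The sketch below is an illustration only, assuming that each character is one segment, that the vowels are the five letters a, e, i, o, u, and that the fricative allophones are [β], [ð] and [ɣ]; the function name `surface_form` and the phonemic input strings are hypothetical.

```python
VOWELS = set("aeiou")

# One entry per rule: the phoneme and its intervocalic (V_V) allophone.
FRICATIVE_ALLOPHONE = {"b": "β", "d": "ð", "g": "ɣ"}


def surface_form(phonemes: str) -> str:
    """Apply the three V_V rules; every other context keeps the stop."""
    out = []
    for i, seg in enumerate(phonemes):
        between_vowels = (
            0 < i < len(phonemes) - 1
            and phonemes[i - 1] in VOWELS
            and phonemes[i + 1] in VOWELS
        )
        if seg in FRICATIVE_ALLOPHONE and between_vowels:
            out.append(FRICATIVE_ALLOPHONE[seg])
        else:
            out.append(seg)               # the "elsewhere" case
    return "".join(out)


for word in ["bodega", "nada", "dando", "banda"]:
    print(word, "->", surface_form(word))
```

Note that the three dictionary entries do exactly the same job for three different phonemes, which is the redundancy that phonological features are meant to remove.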

Consider the following data from English

and in particular, the sounds [ �i, ai ]

b�it bite bai by b�aib bribe

sai sigh baid bide ��is rice

��ip ripe w�if wife �ail rile

dai� dire �aiz rise dai die

nain nine d�ik dike waiv wives

�ai rye �ai guy daim dime

Environment Chart

ai �i

s_# �_# b_t

d_� �_d �_p

n_n �_l w_f

�_# d_# d_k

b_# w_v �_s

b_d d_m

Distributions

ai �i

s_# �_# b_t

d_� �_d �_p

n_n �_l w_f

�_# d_# d_k

b_# w_v �_s

b_d d_m

C_#

C_CC_C

C1 =

s,d,n, ,b,g,d,w

C2 =

#, ,n,d,l,v,m

C1 =

b, ,w,d

C2 =

t,p,f,k,s

Proposed Rule

/ai/ → [ �i ] / followed by voiceless sounds
       [ ai ] / elsewhere

• This rule reads “/ai/ is realised as [ �i ] before

voiceless sounds, and as [ai] in other

environments”

Consider the following data from Farsi

and in particular, the sounds [ r, r̥, ɾ ]

[ rah ] road [ ruz ] day

[ baz�ir6 ] towel [ si ini ] pastry

[ zi a ] because [ omr6 ] life

[ ran ] paint [ bar� ] leaf

[ farsi ] Persian [ bi an ] pale

Environments / Distributions

r r6

#_a i_# i_a

#_a m_# i_i

a_s i_a

#_u

a_�

#_V

V_C

V_#

C_# V_V

Proposed Rule for Farsi

• This rule reads “/r/ is realised as [ r̥ ] word

finally, as [ ɾ ] intervocalically and as [ r ] in

all other environments”

/r/ → [ ɾ ] / V_V
      [ r̥ ] / _#
      [ r ] / elsewhere

Speech Perception

• The phoneme thus far has been deduced from

transcriptions...speech production data

• The phoneme is as much a speech perception

entity as a speech production one

• Our conception of the phoneme needs to be

broader

• What set of sounds are included in perceptual

phoneme /t/?

• [t*], [t] [t], [s], but also [ ] and [ 8 ]

Speech Perception

• What’s involved?

• Decoding acoustic information.

• Determining perceptual (acoustic) cues.

• Ascertaining potential articulatory correlates

Acoustics

• Consult Ladefoged (2006), Fry (1979), Rogers

(2000) or Hayward (2000) for greater detail

• What are sounds acoustically made of?

• We can use a spectrogram to see...

Brief Intro to Spectrography

• Spectrography is a means to graphically

represent sounds

• A spectrogram is a 3-dimensional depiction

illustrating:

– frequency of a sound

– loudness of a sound

– change over time

What is a Spectrum?

• A spectrum is a recipe of the ingredients of a

sound

• The loudness of sound is measured in dB

(decibels)

• The frequency of a sound is measured in Hz

(Hertz) = how often a sound repeats every

second
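The "recipe" idea can be made concrete with a small calculation. The sketch below is a minimal illustration, not how a speech analysis package works internally: it builds an artificial sound from two sine waves (100 Hz and 200 Hz, values chosen purely for illustration) and uses a plain discrete Fourier transform to recover which frequencies are present and how strong each one is. The function name `magnitude_spectrum` is hypothetical.

```python
import cmath
import math


def magnitude_spectrum(samples, sample_rate):
    """Return {frequency in Hz: amplitude} using a plain discrete Fourier transform."""
    n = len(samples)
    spectrum = {}
    for k in range(n // 2):               # only frequencies below half the sample rate
        total = sum(
            samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)
        )
        spectrum[k * sample_rate / n] = 2 * abs(total) / n
    return spectrum


sample_rate = 800                         # samples per second (illustrative value)
n = 80                                    # 0.1 s of sound, giving 10 Hz resolution
samples = [
    1.0 * math.sin(2 * math.pi * 100 * t / sample_rate)
    + 0.5 * math.sin(2 * math.pi * 200 * t / sample_rate)
    for t in range(n)
]

spectrum = magnitude_spectrum(samples, sample_rate)
ingredients = {f: round(a, 3) for f, a in spectrum.items() if a > 0.01}
print(ingredients)                        # the frequencies that are actually present
```

A spectrogram repeats this kind of analysis on successive short stretches of the signal, which is where the time dimension comes from.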

FYI: dB is a non-linear scale

- 0 dB → threshold of audibility [ NOT no sound ]

- 10 dB → rustle of leaves

- 20 dB → ticking of watch at ear

- 30 dB → whispered conversation

- 60 dB → conversation at 1 m

- 70 dB → busy traffic

- 90 dB → pneumatic drill at 1 m

- 100 dB → car horn at 5 m

- 120 dB → amplified rock band

- 130 dB → aeroplane at 30 m
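What "non-linear" means here can be shown with a short calculation. The sketch below assumes the figures above are sound pressure levels, which use the standard formula dB = 20 · log10(p / p0) with the reference pressure p0 = 20 micropascals; the function names are hypothetical.

```python
import math

P_REF = 20e-6   # reference pressure in pascals (threshold of audibility)


def spl_db(pressure_pa: float) -> float:
    """Sound pressure level in dB for a pressure given in pascals."""
    return 20 * math.log10(pressure_pa / P_REF)


def pressure_ratio(db_difference: float) -> float:
    """How many times greater the pressure is for a given dB difference."""
    return 10 ** (db_difference / 20)


print(spl_db(P_REF))          # the reference pressure itself sits at 0 dB
print(pressure_ratio(20))     # a 20 dB step is a tenfold increase in pressure
print(pressure_ratio(60))     # a 60 dB step is a thousandfold increase
```

This is why 0 dB is not "no sound": it is simply the level at which the measured pressure equals the reference pressure.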

Sample Spectrum

(Frequency Domain)

Spectrogram of Same Spectrum

(Time Domain)

Vowels and their formants

• Formants are the primary perceptual cues for

vowels. F1, F2, F3 ...

• Formants are regions of energy within the

frequency spectrum

• As a rule of thumb, there is approximately 1

formant per 1,000 Hz.

• Experiments have proved that only the first

two formants are necessary for vowel

perception

Graphing Vowels using their formants

Vowel   F1        F2
i       250 Hz    2500 Hz
a       850 Hz    1500 Hz
u       300 Hz    900 Hz

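Graphing Vowels using their formants (charting 1st Formant against 2nd Formant)

[Figure: vowel chart with F1 (200 to 800 Hz) on the vertical axis and F2 (2500 to 500 Hz) on the horizontal axis, showing the positions of i, u and a]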
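Because the first two formants place each vowel at a point on this chart, a measured (F1, F2) pair can be matched to the closest reference vowel. The sketch below is only an illustration: the pairing of the six frequency values with [i], [a] and [u] is an assumption based on typical formant patterns, real speakers vary considerably, and the function name `nearest_vowel` is hypothetical.

```python
import math

# Approximate (F1, F2) values in Hz; which value belongs to which vowel is an assumption.
REFERENCE_VOWELS = {
    "i": (250, 2500),   # high front: low F1, high F2
    "a": (850, 1500),   # low: high F1
    "u": (300, 900),    # high back rounded: low F1, low F2
}


def nearest_vowel(f1: float, f2: float) -> str:
    """Return the reference vowel closest to the measured formant pair."""
    return min(
        REFERENCE_VOWELS,
        key=lambda vowel: math.dist((f1, f2), REFERENCE_VOWELS[vowel]),
    )


print(nearest_vowel(280, 2300))
print(nearest_vowel(800, 1400))
print(nearest_vowel(320, 1000))
```

Straight-line distance in Hz is the simplest possible choice; it treats a 100 Hz difference in F1 the same as a 100 Hz difference in F2, which is a simplification.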

Consonants and their formant

transitions

• Stops/Plosives are articulatory configurations

involving no airflow

• No airflow = (almost) no acoustic output

• There may be evidence of vocal fold vibration

• However, stops can be characterised by the

formant transitions following or approaching

vowels

Consonants and their formant

transitions

i b i

Consonants and their formant

transitions

The Frequency of Fricatives

• Fricatives involve a turbulent airflow

• On a spectrogram, we see ‘noise’ not a

formant structure

• The frequency range differentiates between

fricatives

The Frequency of Fricatives

f θ s ʃ

Other Sounds

• Nasals, laterals and rhotics (r-sounds) all have

formant structure, resembling vowels

• However, these sounds are less sonorous

(quieter), and appear less dark (= less energy)

on a spectrogram

• Laterals and Rhotics form a natural class

called liquids

• Rhotics exhibit a prominent lowering of F3,

laterals do not.

Nasals

a m a a n a

Laterals and Rhotics

a l a a a

Is that all?!!

• There are numerous acoustic cues for every

sound.

• Problem of invariance...as well as

articulatory/acoustic mismatch.

• However, this “redundancy” in perceptual cues

is good for communication purposes.

• It doesn’t always work

• My slip of the ear ≈ “felines do not emit

sounds like canines”

Speech Perception

• Speech seems to utilise categorical perception

• Despite articulatory/acoustic domain being

continuous in nature, we perceive categories

• We can be relatively insensitive to some

changes in acoustics, but at certain thresholds

we become aware of differences

Perception of Place

Speech Perception: Duplex & Bimodal

• What do you think this noise is?

• Or this?

• Now can you guess what it is?

• Outside of a context, speech sounds don’t have

to sound like speech!

• What have your eyes got to do with perceiving

speech?

Theories of Speech Perception

• There are many competing theories

• Can be divided into:

• Passive theories - essentially cue counting

• Active theories - using detected cues to infer

speaker’s articulatory actions

• Probably a mixture of both required.

When Perception Meets Phonology

• How many vowels do you hear in this?

• Answer: As many as we want!!!

• You also perceive the sounds you expect to

hear!

Dealing with Patterns

• Consider the following data from Scottish English:

Bruce [ b �s ] Greece [ g is ]

spoon [ sp�n ]

brute [ b �t ] greet [ g it ]brew [ b � ]

brood [ b �d ] greed [ g id ]

brewed [b �9d] agreed [ �g i9d]

• “Scottish Vowel Length Rule”: (in part) morphologically

conditioned

• Also notice, in Scottish English loose is [lʉs], while look is

[lʉk]: Irish English realisations would be [ lus ] and [ lʊk ]

respectively. How do we explain this?

Looking at Diversity

• How can we make generalisations for the diversity of

patterns witnessed in English vowel systems?

• Wells (1982) “Accents of English” proposes “lexical sets”

for English which can be defined as “groups of words which

share a phonetic feature”

KIT (ship, rip, dim)

DRESS (step, ebb, hem)

TRAP (bad, cab, ham)

LOT (stop, rob, swan)

STRUT (cub, rub, hum)

FOOT (full, look, could)

BATH (staff, clasp, dance)

CLOTH (cough, long, gone)

NURSE (hurt, term, work)

FLEECE (seed, key, seize)

FACE (weight, rein, steak)

PALM (calm, bra, father)

THOUGHT (taut, hawk, broad)

GOAT (soap, soul, home)

GOOSE (who, group, few)

PRICE (ripe, tribe, aisle)

CHOICE (boy, void, coin)

MOUTH (pouch, noun, crowd)

NEAR (beer, pier, fierce)

SQUARE (care, air, wear)

START (far, sharp, farm)

NORTH (war, storm, for)

FORCE (floor, coarse, ore)

CURE (poor, tour, fury)

Patterns in Lexical Sets

• Dialects can be described in terms of the phonetic quality of the

unit illustrated by a particular lexical set

• Dialects can also be described in terms of contrasts between lexical

sets, or the sets may pattern together or merge.

• For example:

• In Scottish English, GOOSE is [ʉ]. There is a merger between

GOOSE and FOOT sets.

• In RP (Received Pronunciation, BBC English), there is a merger

between THOUGHT, NORTH and FORCE (RP is “non-rhotic”:

no postvocalic [ɹ])

• In Irish English, GOOSE can range from [u] to [ʉ:]. Merger

between TRAP and BATH (and possibly PALM). Is Sam =

psalm? NORTH/FORCE contrast?
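Describing a dialect by its lexical sets amounts to a table from set name to vowel quality, and a merger is simply two sets that share a value. The sketch below is a minimal illustration, assuming [ʉ] for Scottish GOOSE and FOOT and [u] versus [ʊ] for the Irish English pair; only a handful of sets are included, and the function name `merged_sets` is hypothetical.

```python
from collections import defaultdict


def merged_sets(dialect):
    """Return groups of lexical sets that share one vowel quality."""
    by_vowel = defaultdict(list)
    for lexical_set, vowel in dialect.items():
        by_vowel[vowel].append(lexical_set)
    return [sorted(sets) for sets in by_vowel.values() if len(sets) > 1]


# Partial descriptions only; the vowel symbols are assumptions for illustration.
scottish = {"GOOSE": "ʉ", "FOOT": "ʉ", "FLEECE": "i"}
irish = {"GOOSE": "u", "FOOT": "ʊ", "FLEECE": "i"}

print(merged_sets(scottish))   # GOOSE and FOOT pattern together
print(merged_sets(irish))      # no merger among these three sets
```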

New Zealand Vowels

• Listen to this sample of New Zealand English

• In particular, listen out for words from the

TRAP, DRESS, and KIT sets.

New Zealand Vowels

u

æ

æ = [+front, +low]

Detail Voice : VOT

• Lisker & Abramson (1968) proposed Voice Onset Time

English /b/: [ + voice ]

English /p/: [ - voice ]
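Voice Onset Time is a continuous measurement, yet listeners report hearing either /b/ or /p/, which is the categorical perception pattern discussed earlier. The sketch below illustrates the idea with a steep S-shaped identification curve. The 25 ms boundary and the steepness value are hypothetical numbers chosen for illustration, not measurements from this lecture, and the function names are hypothetical too.

```python
import math


def proportion_heard_as_p(vot_ms: float, boundary_ms: float = 25.0,
                          steepness: float = 0.5) -> float:
    """Proportion of /p/ responses for a given VOT (logistic curve, illustrative)."""
    return 1 / (1 + math.exp(-steepness * (vot_ms - boundary_ms)))


def perceived_category(vot_ms: float, boundary_ms: float = 25.0) -> str:
    """The category a listener reports: 'b' below the boundary, 'p' at or above it."""
    return "p" if vot_ms >= boundary_ms else "b"


for vot in [0, 10, 20, 30, 40, 50]:
    print(vot, perceived_category(vot), round(proportion_heard_as_p(vot), 3))
```

Equal 10 ms steps in VOT barely change the response far from the boundary but flip it completely near the boundary, which is what "insensitive to some changes, but aware at certain thresholds" looks like in numbers.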

Phonological Features

• Also called phonetic features or distinctive

features

• Used to capture classes of sounds

• Sonorants are [+sonorant]

• Obstruents are [-sonorant]

• Many features...see handout

Using Phonological Features

• In the Spanish dataset, we proposed three rules.

• But the rules were very similar...better to

express patterns as a generalisation

• Fricatives are [+continuant]

• Our three rules become one:

• [-sonorant, +voiced] → [+continuant] / V_V

• Voiced stops become fricatives intervocalically
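The single feature-based rule can be sketched as follows. This is an illustration under stated assumptions: the feature table is deliberately tiny and hypothetical (a place label is added so that each stop can find its matching fricative), vowels are the five letters a, e, i, o, u, and each character is one segment. The function name `spirantise` is hypothetical.

```python
from typing import Dict, Tuple

# (sonorant, voiced, continuant, place): a small, hypothetical feature table.
SEGMENTS: Dict[str, Tuple[bool, bool, bool, str]] = {
    "p": (False, False, False, "labial"),
    "b": (False, True, False, "labial"),
    "β": (False, True, True, "labial"),
    "t": (False, False, False, "coronal"),
    "d": (False, True, False, "coronal"),
    "ð": (False, True, True, "coronal"),
    "k": (False, False, False, "dorsal"),
    "g": (False, True, False, "dorsal"),
    "ɣ": (False, True, True, "dorsal"),
    "m": (True, True, False, "labial"),
    "n": (True, True, False, "coronal"),
}
VOWELS = set("aeiou")
BY_FEATURES = {features: segment for segment, features in SEGMENTS.items()}


def spirantise(word: str) -> str:
    """[-sonorant, +voiced] -> [+continuant] / V_V, stated once for all places."""
    out = list(word)
    for i in range(1, len(word) - 1):
        features = SEGMENTS.get(word[i])
        if features is None:              # vowels and unlisted segments are skipped
            continue
        sonorant, voiced, _, place = features
        if not sonorant and voiced and word[i - 1] in VOWELS and word[i + 1] in VOWELS:
            out[i] = BY_FEATURES[(sonorant, voiced, True, place)]
    return "".join(out)


for word in ["bodega", "abe", "mata", "ama"]:
    print(word, "->", spirantise(word))
```

Unlike a phoneme-by-phoneme list, nothing in the rule mentions /b/, /d/ or /g/ by name: any voiced obstruent in the table is affected, and voiceless stops and nasals are left alone because their features do not match.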

And finally, some Perceptual Magic

• Speech Perception is non-trivial

• Listen to this: “slit”

• Broken into “s” and “lit”

• Add silence to “s”

• Now listen to “s” plus silence plus “lit”
