
Chapter 1 – Perceiving Speech

The external stimuli for various perceptual experiences . . .

Psychological Characteristic External Stimulus Characteristic

Brightness of lights – Intensity of lights
Hue – Wavelength (OK, so the relationship is not obvious)
Loudness of sounds – Intensity of sounds
Pitch of sounds – Frequency of sounds

Words?? What are the external stimulus characteristics that result in differences in words?

If we wanted to have a machine make speech, what would we tell it to do?

The Sounds of Speech – G9 p 318

The building blocks of speech – phonemes.

Phoneme: The shortest segment of speech that, if changed, would change the meaning of a word

English: 39 phonemes – 24 consonants and 15 vowels
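The definition of a phoneme is easiest to see with minimal pairs – words whose transcriptions differ in exactly one segment, yet mean different things. A tiny sketch (the broad transcriptions below are illustrative spellings, not IPA):

```python
# Minimal-pair check: two equal-length phoneme sequences that differ in
# exactly one segment. Transcriptions are illustrative, not IPA.
def differs_by_one_segment(a, b):
    """True if sequences a and b differ in exactly one position."""
    return len(a) == len(b) and sum(x != y for x, y in zip(a, b)) == 1

print(differs_by_one_segment(["b", "i", "t"], ["p", "i", "t"]))  # True
print(differs_by_one_segment(["b", "i", "t"], ["b", "i", "t"]))  # False
```

Since "bit" and "pit" mean different things, /b/ and /p/ count as distinct phonemes of English.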

Vowel: (from Wikipedia): a sound in spoken language, such as an English ah! [ɑː] or oh! [oʊ], pronounced with an open vocal tract so that there is no build-up of air pressure at any point above the glottis.

Glottis: the space in between the vocal folds (see below)

Consonant: (from Wikipedia): a speech sound that is articulated with complete or partial closure of the vocal tract

Different languages have different phonemes and different numbers of phonemes.

Representing Phonemes visually: International Phonetic Alphabet (IPA): A special set of symbols with a unique symbol or combination of symbols and accent marks for each different phoneme.

The ability to hear and distinguish between phonemes develops and begins to be refined at an early age. G9 p 333

By adulthood, it is very difficult for us to perceive, or learn to perceive, differences among phonemes of languages other than our own.

Topic 19: Speech Intro - 1 3/21/5

Producing Speech – Y1 p. 358, G9 p 318

The physical movements of the vocal tract that accompany the production of a phoneme.


[Figure: the vocal tract – nasal cavity, oral cavity, tongue]

Vowels – G9 p 319

Sound is primarily the result of vibration of the vocal folds/cords in the larynx.

Males: Fundamental frequencies range from 85 – 180 Hz

Females: Fundamental frequencies range from 165-255 Hz.

Children: Fundamental frequencies are higher than 300 Hz
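The quoted ranges can be folded into a small lookup – purely illustrative, using the lecture's numbers rather than any diagnostic thresholds:

```python
# Typical fundamental-frequency ranges quoted above (Hz). Illustrative only.
F0_RANGES = {
    "male": (85, 180),
    "female": (165, 255),
}

def likely_groups(f0_hz):
    """Return the speaker groups whose typical F0 range contains f0_hz."""
    groups = [g for g, (lo, hi) in F0_RANGES.items() if lo <= f0_hz <= hi]
    if f0_hz > 300:          # children: "higher than 300 Hz"
        groups.append("child")
    return groups

print(likely_groups(120))   # ['male']
print(likely_groups(170))   # ['male', 'female']  (the ranges overlap)
print(likely_groups(320))   # ['child']
```

Note the 165-180 Hz overlap: F0 alone does not cleanly separate speakers, which foreshadows the variability problem discussed later.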

Different vowel sounds are produced by changing the shape of the oral cavity

The vocal cords produce essentially the same sound for all vowels.

We change the shape of the oral cavity to produce the different vowel sounds.

Changing the shape of the oral cavity changes the spectrum of the sound waves produced by the vibrations of the vocal folds.

Certain harmonics are attenuated relative to others.
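This source-filter idea can be sketched numerically: a harmonic-rich source at some F0, multiplied by a filter with resonance peaks. The F0, the Gaussian-shaped resonances, and the formant frequencies (roughly those often reported for the vowel /ɑ/) are all assumptions for illustration:

```python
import numpy as np

f0 = 120.0                               # fundamental frequency (Hz), assumed
harmonics = f0 * np.arange(1, 31)        # harmonics of the vocal-fold source
source = 1.0 / np.arange(1, 31)          # falling harmonic amplitudes

def formant_filter(freqs, formants=(730, 1090, 2440), bw=120.0):
    """Oral-cavity stand-in: Gaussian resonance peaks at each formant."""
    gain = np.zeros_like(freqs, dtype=float)
    for f in formants:
        gain += np.exp(-((freqs - f) ** 2) / (2 * bw ** 2))
    return gain / gain.max()

output = source * formant_filter(harmonics)   # spectrum after filtering
peak = harmonics[np.argmax(output)]
print(peak)   # the strongest surviving harmonic lies near the first formant
```

Harmonics far from any formant are attenuated; those near a formant survive, which is exactly the "certain harmonics are attenuated relative to others" point above.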


The different vowel sounds

Here's the spectrum of "ahh" just beyond the vocal folds (Y1 – p 359). Note the full complement of harmonics. ("Ahh, if you could hear my ahh as I hear it.")

Here’s the effect of the oral cavity on the various components of the sound produced by the vocal folds

And here’s the spectrum of the resulting sound, after being filtered by the oral cavity

The groups of high intensity frequencies are called formants.

Most vowels have 2 or 3 prominent formants.


The formants of a couple of common vowel sounds

/I/ the i sound in "zip"

From G9, p 319

/U/ the u sound in “put”


[Figure: spectra of /I/ and /U/ with formants F1, F2, and F3 marked]

Spectrogram – a graphical way of displaying speech sounds over time

From G9, p 319. A spectrogram of the word "had".

Note that frequency is on the vertical axis.

Time is the horizontal axis.

Intensity is represented by the darkness of the ink.
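Those three axes are exactly what a short-time Fourier transform produces. A minimal sketch (window and hop sizes are arbitrary choices), using a steady sinusoid as a stand-in for a formant:

```python
import numpy as np

fs = 8000                                  # sample rate (Hz), assumed
t = np.arange(0, 0.5, 1 / fs)
signal = np.sin(2 * np.pi * 700 * t)       # steady 700 Hz stand-in signal

win, hop = 256, 128                        # window and hop lengths (samples)
frames = [signal[i:i + win] * np.hanning(win)
          for i in range(0, len(signal) - win, hop)]
spec = np.abs(np.fft.rfft(frames, axis=1)).T   # rows: frequency, cols: time

freqs = np.fft.rfftfreq(win, 1 / fs)       # frequency axis (Hz)
peak_bin = spec.mean(axis=1).argmax()
print(freqs[peak_bin])                     # within one FFT bin of 700 Hz
```

Plotting `spec` with rows as frequency, columns as time, and magnitude as darkness reproduces the spectrogram layout described above.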

Here’s a SigView spectrogram of Mike’s “had”. Note that the vertical axis is upside down.

I believe that the lowest frequency band is noise caused by my computer fan. There should be only three bands – one at around 600-800 Hz, one at about 1500 Hz, and one at about 2200-2400 Hz.


[Figure: SigView spectrogram with formants F1, F2, and F3 marked; "?" labels the presumed fan-noise band]

Consonants are produced based on three characteristics, or combinations of them:

1) Place of articulation – for American English

The place at which the air stream is obstructed

A) Alveolar ridge – /d/ /t/

B) Bottom lip and upper front teeth (labiodental) – /f/ /v/

C) Bilabial – both lips – /m/ /w/

2) Manner of articulation

A) Stop – air stopped, then released – /d/ /t/

B) Fricative – air forced through a small opening – /f/ /v/

C) Affricate – stop, then release, as in "chip"

D) Nasal – air forced through the nose – /m/

E) Approximant – /w/

3) Voicing

Whether the vocal cords vibrate during production of the consonant – /f/ vs /v/
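The three-way description above can be written as a lookup table. "Labiodental" is the standard name for the "bottom lip and upper front teeth" place; only the lecture's example phonemes are included:

```python
# The lecture's three-feature description of consonants as a lookup table.
CONSONANTS = {
    "d": ("alveolar",    "stop",        "voiced"),
    "t": ("alveolar",    "stop",        "voiceless"),
    "f": ("labiodental", "fricative",   "voiceless"),
    "v": ("labiodental", "fricative",   "voiced"),
    "m": ("bilabial",    "nasal",       "voiced"),
    "w": ("bilabial",    "approximant", "voiced"),
}

def contrast(a, b):
    """List which of the three features distinguish phoneme a from b."""
    names = ("place", "manner", "voicing")
    return [n for n, x, y in zip(names, CONSONANTS[a], CONSONANTS[b]) if x != y]

print(contrast("f", "v"))   # ['voicing'] – the lecture's /f/ vs /v/ example
print(contrast("t", "m"))   # ['place', 'manner', 'voicing']
```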

Consonants do NOT have “nice” spectrographic displays, as do vowels.


Issues in the study of speech perception . . . G9 p 320

1. Variability due to changes in context – called the co-articulation issue

A given phoneme may have a very different spectrogram in one context vs. another and yet be perceived to be the same phoneme.

Consider the phoneme, /d/, as in /di/ and /du/. The /d/ sounds the same in both, but its frequencies have been shown to be different.

When followed by /i/, the frequencies that make up the sound, /d/, rise, as shown in blue.

But when followed by /u/, the same /d/ sound has very different frequencies

This is an example of variability in the sound stimulus due to differences in context – variability that we do not perceive.

As Goldstein mentions, this is an example of a perceptual constancy – a perception of no change when the stimulus actually changes quite a bit. Phonemic constancy?


2. Variability due to different speakers G9 p 321

a. Differences in fundamental frequencies between speakers

b. Variations in the spectrum – the formants - of each vowel in different people, times, and dialects. Suppose several different people say, “eeee”. The frequencies of the formants are different for those different people, yet we perceive “eeee” from all.

c. Differences in both of the above in the same speaker at different times – in sickness and in health

All of these contribute to enormous variability in the acoustic properties of individual phonemes spoken by different people, or by the same person at different times; of words built from those phonemes; and of speech built from those words.
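One classic response to speaker variability is Lobanov normalization: z-score each speaker's formant values against that speaker's own mean and spread, so vowels from different vocal tracts land on a comparable scale. The formant values below are invented for illustration:

```python
import numpy as np

def lobanov(formants_hz):
    """Z-score a speaker's formant measurements against their own mean/spread."""
    f = np.asarray(formants_hz, dtype=float)
    return (f - f.mean()) / f.std()

# Two hypothetical speakers producing the same two vowels: the raw F1 values
# differ, but the normalized patterns coincide.
speaker_a = [300, 700]   # invented F1 values (Hz)
speaker_b = [390, 910]   # same vowels, different vocal tract
print(lobanov(speaker_a))   # [-1.  1.]
print(lobanov(speaker_b))   # [-1.  1.]
```

Whether the auditory system does anything like this is exactly the constancy question the next paragraphs raise.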

Yet we are able to extract the correct phonemes from that variability and understand the speech they represent.

This is another constancy issue – just like shape constancy or color constancy – our experience is constant despite the fact that the external stimuli are continually changing.

It might be argued that most of the work the brain does is the processing required to keep our experience constant across changes in the stimulus.

This is a major problem in machine speech recognition, although it is being solved as this is written. Try “Siri”.

Of course, figuring out how to get a machine to recognize speech and figuring out how we recognize speech are two different things.


3. Identification of boundaries between phonemes – Categorical Perception

This issue refers to the fact that we categorize sounds as being identical across a range of changes in physical characteristics – another example of constancy, by the way.

Example of the perception of /t/ vs /d/ depending on the range of Voice Onset Times (VOTs).

When we say, /ta/, there is a fairly long delay between the /t/ sound and the /a/ sound. This delay is called the Voice Onset Time of the syllable.

When we say, /da/, there is a fairly short delay between the /d/ sound and the /a/ sound.

Play “Mike’s ta and da.wav” using Audacity

See below for a description of the effect of VOT on perception


Eimas & Corbit (1973) study demonstrating categorical perception. G9 p 323

Presented a sound stimulus in which the Voice Onset Time was manipulated in small increments.

The sound stimulus began with a combination /t//d/ sound, which was the same for all presentations.

The /a/ sound that followed was also the same for all presentations.

The only thing that changed was the Voice Onset Time – the time between the presentation of /t//d/ and the presentation of /a/. That VOT varied in small increments.

Dependent Variable: The researchers asked participants to indicate whether or not they heard /da/.

This effect is very much like our perception of colors across continuous changes in the wavelength of light. Green is green until it’s blue or yellow.

At all short wavelengths, we perceive blue – analogous to /da/ in this example.

At all longer wavelengths, we perceive green – analogous to /ta/ in this example.

Hue perception analogy . . .


Sound is perceived as /ta/ across all VOTs from 50-60 ms

Sound is perceived as /da/ across all VOTs from 0-30 ms
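The step-like labeling can be sketched as a hard category boundary somewhere between the two ranges – the exact boundary value below is an assumption consistent with the ranges above, not a measured one:

```python
# Hard category boundary for VOT labeling. BOUNDARY_MS is an assumed value
# placed between the /da/ range (0-30 ms) and the /ta/ range (50-60 ms).
BOUNDARY_MS = 35.0

def percept(vot_ms):
    """Category label reported for a stimulus with this voice onset time."""
    return "/da/" if vot_ms < BOUNDARY_MS else "/ta/"

# Equal 10 ms physical steps: identical percepts within a category,
# but a flip across the boundary.
print(percept(10), percept(20))   # /da/ /da/
print(percept(30), percept(40))   # /da/ /ta/
```

The point of categorical perception is that the 30-to-40 ms step sounds like a different phoneme while the physically identical 10-to-20 ms step sounds like no change at all.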

4. Use of facial cues to identify phonemes G9 p 324

The McGurk effect – a demonstration of the importance of facial cues

A person listens to the sounds /ba-ba/ and reports hearing /ba-ba/.

Then a visual cue of someone miming /ga-ga/ in synch with the sound, /ba-ba/ is added.

When the visual cue is added, the listener now reports hearing, /da-da/.

The Web VL 13.2 demonstrating the McGurk effect is in the link below.

Listen first with eyes closed.

Then listen again with eyes open and notice how the sight of the speaker’s lip movements changes what you hear.

C:\Users\Michael\Desktop\Desktop folders\Class Videos\McGurk Effect from G9 Web VL 13p2.mp4

HAVE STUDENTS CLOSE THEIR EYES BEFORE BEGINNING.


Perception: /da-da/

Sight: /ga-ga/

Sound: /ba-ba/

5. Using meaning to identify phonemes and words.

5a. We use the meaningfulness of combinations of words to aid perception. Which of the following would be easiest to understand?

Gadgets simplify work around the house.
versus
Gadgets kill passengers from the eyes.
versus
Between gadgets highways passengers the steal.

5b. We use the rules of grammar.

There is no time to question
versus
Question, no time there is

5c. We use meaning to create perceptual breaks between words.

How do we know when one phoneme ends and another begins, when one word ends and another begins?

Boundaries are sometimes present as silence in the auditory stream. These are easy.

More often, there is no easily identifiable separation between sounds representing successive phonemes.

Play Mike’s “Now is the time for all good men to come to the aid of their party” using Audacity.

“to the aid of” has no breaks in it.

...to the aaaaaaaaaaaaiiiiiiiiiiiiddd of
C:\Users\Michael\Desktop\Desktop folders\Class Videos\To the aid of.mp4

The point is that there are no absolute silences in the above waveform, yet we are able to perceive the words that are being spoken.
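The point can be demonstrated with the simplest possible segmenter – threshold the energy of short frames to find pauses. It succeeds only when an acoustic break actually exists; the parameters and signals below are invented for illustration:

```python
import numpy as np

def silent_frames(signal, fs, frame_ms=20, thresh=0.01):
    """Boolean per frame: True where mean energy falls below thresh."""
    n = int(fs * frame_ms / 1000)
    frames = signal[:len(signal) // n * n].reshape(-1, n)
    return np.mean(frames ** 2, axis=1) < thresh

fs = 8000
t = np.arange(0, 0.2, 1 / fs)
tone = np.sin(2 * np.pi * 200 * t)          # stand-in for voiced speech
with_pause = np.concatenate([tone, np.zeros(len(t)), tone])
continuous = np.concatenate([tone, tone, tone])

print(silent_frames(with_pause, fs).any())   # True  – a real pause is found
print(silent_frames(continuous, fs).any())   # False – no acoustic break
```

For continuously voiced stretches like "to the aid of", an energy threshold finds nothing to cut on, so the word boundaries we hear must come from somewhere other than the acoustics.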

The absence of acoustic breaks in our own speech is made quite obvious when we listen to someone speak in a foreign language and try to identify the breaks between words.

The breaks are there, as anyone who can speak the language will tell you.


But many of them are identified not by silence, but by content.


Neural Basis of Speech Perception and Production

From Yantis – to show the differences between the hemispheres

From Goldstein

Note that the two speech areas –Wernicke’s area and Broca’s area– are only in the left hemisphere for most people.

Wernicke’s (Vair – nicka) area – left temporal lobe – an area that’s involved in speech recognition

Broca's area – left frontal lobe – an area that's probably specialized for speech production


Two speech streams . . . G9 p 329

As Goldstein says on p 329, it has been proposed that there are two large streams associated with speech recognition and production.

The yellow ventral stream – a what pathway – contains neurons that are involved in speech recognition.

The blue stream – a where/action pathway – is responsible for speech production.


Language Disorders

Aphasia – an impairment of language (producing or understanding speech)

Some specific types

Wernicke’s (pronounced Vair – nicka) Aphasia, also known as Sensory aphasia.

Inability to comprehend words or to arrange sounds into coherent speech.

It's been proposed that sufferers can't hear phonemic distinctions – just as native Japanese speakers have difficulty hearing the difference between /r/ and /l/ – so they can't comprehend speech.

May speak, but the phonemes get mixed up, creating word salad. It’s sometimes called fluent aphasia – aphasia with words.

Play Class Videos \Wernicke's Aphasia recordings of patients.mp4

Broca’s Aphasia, also known as Motor, Expressive, or Nonfluent aphasia.

Difficulty in speaking, although persons with this aphasia continue to understand speech.

Sufferers speak very slowly and deliberately, using very simple grammatical structure. All forms of a verb are likely to be reduced to the infinitive or participle; nouns are expressed only in the singular; and conjunctions, adjectives, adverbs, and articles are very uncommon.

Play the 9 min compilation video of Sarah Scott, a stroke victim suffering from Expressive Aphasia. C:\Users\Michael\Desktop\Desktop folders\Class Videos\Sarah Scott to 6 years poststroke.mp4

Below are links to the complete collection of interviews.
http://www.youtube.com/watch?v=1aplTvEQ6ew (7:19 min). 2009
http://www.youtube.com/watch?v=6zNKz7YoUao (6:04 min). 16 months post-stroke
http://www.youtube.com/watch?v=1l9P4H1BKEU (12:24 min). May 2011, 2 years post-stroke
http://www.youtube.com/watch?v=rUTHNS45Qmc (9:56 min). 3 years post-stroke
http://www.youtube.com/watch?v=WiKc2y_Hc2w January 2013, 4 years post-stroke
https://www.youtube.com/watch?v=a9z6eX85Zn4 6 years after the stroke


The End.
