
Auditory Objects of Attention

Chris Darwin

University of Sussex

With thanks to:

• Rob Hukin (RA)

• Nick Hill (DPhil)

• Gustav Kuhn (3rd-year project)

• MRC

Need for sound segregation

• Ears receive mixture of sounds

• We hear each sound source as having its own appropriate timbre, pitch, location

• Stored information about sounds (e.g. acoustic/phonetic relations) concerns a single source

Mechanisms of segregation

• Primitive grouping mechanisms based on general heuristics

• Schema-based mechanisms based on specific knowledge.

A Paradox

• We can attend to sounds coming from a particular direction

– everyday experience

– Auditory RTs faster to cued side (Spence & Driver, 1994)

• Interaural time differences (ITDs) are the main cue to the location of a complex sound (Wightman & Kistler, 1992).

A Paradox

On the other hand

• ITDs are ineffective at grouping together sounds from a single sound source (Culling & Summerfield, 1995; Darwin & Hukin, 1995)

Culling & Summerfield (1995): 4 noise bands

ITD versus ILD

[Figure: stimulus layout by cue (ILD, ITD). Labels: vowels AR and EE under each cue; delay; Treatment; Control.]

ILD segregates; ITD does not

[Figure: results by lateralisation cue (ILD, ITD); series: Treatment, Control; axis scale 0 to 100 (label not recovered).]

Coincidence detection and ITD

[Figure: left cochlea and right cochlea feeding the MSO in frequency channels at 200, 500, 1000 and 2000 Hz. Labels: ITDs of +600 µs and -600 µs; vowels EE and AR.]

Two models of attention

Attend to common ITD (stages in order):

• Peripheral filtering into frequency components

• Establish ITD of frequency components

• Attend to common ITD across components

Attend to direction of object (stages in order):

• Peripheral filtering into frequency components

• Establish ITD of frequency components

• Group components by harmonicity, onset-time etc.

• Establish direction of grouped object

• Attend to direction of grouped object

Plan

• Check whether the Culling & Summerfield result holds for more natural sounds

• Show evidence for grouping before across-frequency ITD is calculated

• Show that ITD can be a very powerful sequential grouping cue

Phoneme boundary shift

[Figure: ILD condition. Target vowel /I/ or // in the carrier sentence "Hello, you'll hear the sound X now". Labels: 600-Hz; no 600-Hz; Left; Right.]

ILD segregates; ITD does not

[Figure: results by cue (ILD, ITD); series: Vowel in Sentence, Vowel Alone; axis scale -10 to 40 (label not recovered).]

Phase Ambiguity

500 Hz: period = 2 ms

• L lags by 1.5 ms is equivalent to L leads by 0.5 ms

• Cross-correlation peaks at +0.5 ms and -1.5 ms

• Auditory system is weighted towards the peak closest to zero
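The arithmetic on this slide can be checked with a minimal sketch. It assumes an idealised pure tone, whose cross-correlation peaks repeat once per period, and it uses the slide's sign convention (negative = left ear lags). The function names and the ±2 ms search window are illustrative assumptions, not part of the talk.

```python
import math

def correlation_peaks_ms(true_itd_ms, freq_hz, max_abs_ms=2.0):
    """Delays (ms) within +/-max_abs_ms at which an idealised pure tone
    of freq_hz shows a cross-correlation peak, given its true ITD.
    The window max_abs_ms is an assumption chosen for illustration."""
    period_ms = 1000.0 / freq_hz
    # Peaks sit at true_itd + k * period for every integer k.
    k_min = math.ceil((-max_abs_ms - true_itd_ms) / period_ms)
    k_max = math.floor((max_abs_ms - true_itd_ms) / period_ms)
    return [round(true_itd_ms + k * period_ms, 3)
            for k in range(k_min, k_max + 1)]

def preferred_peak_ms(peaks):
    """Pick the peak closest to zero delay, as the slide describes."""
    return min(peaks, key=abs)

# 500-Hz tone, left ear lagging by 1.5 ms (negative = left lags).
peaks = correlation_peaks_ms(-1.5, 500.0)
print(peaks)                     # [-1.5, 0.5]
print(preferred_peak_ms(peaks))  # 0.5
```

With a single frequency the two peaks are indistinguishable, so a bias towards zero picks the 0.5-ms lead rather than the true 1.5-ms lag.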

Disambiguating phase-ambiguity

• Narrowband noise at 500 Hz with ITD of 1.5 ms (3/4 cycle) heard at lagging side.

• Increasing noise bandwidth changes location to the leading side.

Explained by across-frequency consistency of ITD.

(Jeffress, Trahiotis & Stern)

Resolving phase ambiguity

500 Hz: period = 2 ms. L lags by 1.5 ms or L leads by 0.5 ms?

300 Hz: period = 3.3 ms. L lags by 1.5 ms or L leads by 1.8 ms?

Left ear actually lags by 1.5 ms

[Figure: cross-correlation peaks for noise delayed in one ear by 1.5 ms. x-axis: delay of cross-correlator (ms), -2.5 to 3.5; y-axis: frequency of auditory filter (Hz), 200 to 800; the actual delay is marked.]
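Across-frequency consistency can be illustrated with a short sketch. It assumes each auditory filter behaves like an idealised narrowband channel whose cross-correlation peaks repeat at its own period, and it searches the delay range shown on the figure axis (-2.5 to 3.5 ms). The function names are hypothetical, and real auditory filters are broader than this idealisation.

```python
import math

def channel_peaks_ms(delay_ms, centre_hz, lo_ms=-2.5, hi_ms=3.5):
    """Cross-correlator delays (ms) at which one idealised narrowband
    channel peaks, for a sound delayed in one ear by delay_ms."""
    period_ms = 1000.0 / centre_hz
    k_min = math.ceil((lo_ms - delay_ms) / period_ms)
    k_max = math.floor((hi_ms - delay_ms) / period_ms)
    return sorted(round(delay_ms + k * period_ms, 3)
                  for k in range(k_min, k_max + 1))

def consistent_delays_ms(delay_ms, centres_hz):
    """Delays at which every channel has a peak (set intersection)."""
    common = None
    for centre_hz in centres_hz:
        peaks = set(channel_peaks_ms(delay_ms, centre_hz))
        common = peaks if common is None else common & peaks
    return sorted(common)

print(channel_peaks_ms(1.5, 500.0))              # [-2.5, -0.5, 1.5, 3.5]
print(channel_peaks_ms(1.5, 300.0))              # [-1.833, 1.5]
print(consistent_delays_ms(1.5, [300.0, 500.0])) # [1.5]
```

Each channel alone is ambiguous, but only the actual 1.5-ms delay lines up across channels, which is one way to picture why widening the noise band moves the image to the correct side.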

Segregation by onset-time

[Figure: schematic of the Synchronous and Asynchronous conditions. y-axis: frequency (Hz), 200 to 800; x-axis: duration (ms), marked 0 and 400 (Synchronous) and 0, 80 and 400 (Asynchronous).]

ITD: ± 1.5 ms (3/4 cycle at 500 Hz)

Segregated tone changes location

[Figure: pointer IID (dB), -20 to 20, with axis ends labelled R and L, against onset asynchrony (ms): 0, 20, 40, 80; series: Pure, Complex.]

Segregation by mistuning

[Figure: schematic of the In tune and Mistuned conditions. y-axis: frequency (Hz), 200 to 800; x-axis: duration (ms), marked 0 and 400 in one panel and 0, 80 and 400 in the other.]

Mistuned tone changes location

Interim Summary

• ITD ineffective for simultaneous segregation

• Integration of ITD across frequency influenced by grouping cues

Question: Can attention be directed on the basis of ITD to grouped objects?

Attending to one sentence

Could you please write the word dog down now

…dog...

You’ll also hear the sound bird this time

Continuity of attention expt

"Could you please write the word dog down now"

ITD = +45 µs, +45 µs, +45 µs

Fo = 106 Hz, 106 Hz, 100 Hz

"You'll also hear the sound bird this time"

ITD = -45 µs, -45 µs, -45 µs

Fo = 100 Hz, 100 Hz, 106 Hz

[Time axis: 0.0 to 2.0 s]

Continuity of Fo vs ITD

• Fo differences: 0, 1, 2, 4 semitones

• ITD differences: ± 45, 91, 181 µs

• Normal: Fo & ITD work together

• Switched: Fo & ITD opposed
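As a quick check on the units above, the sketch below converts semitone differences to Fo values and ITDs to sample delays. The semitone formula is the standard equal-tempered ratio of 2^(1/12) per semitone. The 22.05-kHz sample rate is purely an assumption made here, because the listed ITDs happen to fall close to 1, 2 and 4 samples at that rate; the slides do not state a sample rate.

```python
def shift_semitones(fo_hz, semitones):
    """Fo after shifting by a number of equal-tempered semitones."""
    return fo_hz * 2.0 ** (semitones / 12.0)

def itd_in_samples(itd_us, sample_rate_hz=22050):
    """ITD expressed in samples. The default rate is an assumption,
    not a value given in the slides."""
    return itd_us * 1e-6 * sample_rate_hz

# Fo differences used in the experiment, relative to a 100-Hz base.
for semitones in (0, 1, 2, 4):
    print(semitones, round(shift_semitones(100.0, semitones), 1))

# ITD differences used in the experiment.
for itd_us in (45, 91, 181):
    print(itd_us, round(itd_in_samples(itd_us), 2))
```

One semitone above 100 Hz is about 106 Hz, which matches the pair of Fo values shown for the two sentences on the preceding slide.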

Monotone Fo continuity ineffective

Continuity of ITD very effective

[Figure: results against difference in Fo (semitones): 0, 1, 2, 4; series: ITD of ±45 µs, ±91 µs, ±181 µs; axis scale 50 to 100 (label not recovered).]

Summary

• ITD ineffective for simultaneous grouping

• ITD provides good spatial separation for grouped objects

• Monotone pitch contours ineffective for source continuity

New questions

• Reverberation?

• Natural prosody?

• Talker differences?

Simulated reverberant room

Reverberation impairs ITD

Natural prosodic contours

Natural prosody good against reverb

Vocal tract change

• Me (m)

• Higher pitch

• Shorter vocal tract (higher formants)

• Both (-> f)

Vocal tract good against reverb

Effect of reverberation on relative strength of ITD, prosody and vocal tract

[Figure: y-axis: change in % correct by ITD when opposed by prosody, 0 to 100; x-axis: Fo together, Fo original, Fo apart, Fo original + VT; series: RT60 = 0, RT60 = 0.5 s; ITD = ±91 µs.]
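For readers unfamiliar with RT60, it is the time taken for reverberant sound to decay by 60 dB, so RT60 = 0 corresponds to no reverberation. The sketch below assumes an ideal exponential decay, which is a simplification and not a description of the simulated room used in the talk. The function names are illustrative.

```python
import math

def decay_gain(t_s, rt60_s):
    """Amplitude gain of an ideal exponential reverberant tail at time
    t_s. With rt60_s == 0 there is no tail at all."""
    if rt60_s == 0:
        return 1.0 if t_s == 0 else 0.0
    # A 60-dB drop is a factor of 10**-3 in amplitude.
    return 10.0 ** (-3.0 * t_s / rt60_s)

def level_db(gain):
    """Level in dB relative to the starting amplitude."""
    return 20.0 * math.log10(gain)

print(round(level_db(decay_gain(0.5, 0.5)), 1))   # -60.0
print(round(level_db(decay_gain(0.25, 0.5)), 1))  # -30.0
```

With RT60 = 0.5 s the tail is still only 30 dB down after 250 ms, far longer than the ITDs of tens of microseconds at stake, so reflected energy overlaps the direct sound on which the ITD is based.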

Shadowing sentences

Jemma felt stiff and tired after 3 hours in the hot and stuffy room and she would have liked ||

…to go outdoors for a breath of fresh air

We had spent our entire time from Cairo to Luxor in a tiny bus with no proper windows and really wanted ||

…the air conditioning to be switched on

…liked the airconditioning...

Shadowing results

[Figure: y-axis: switches (against ITD) in shadowing (%), 0 to 50; x-axis: Normal, Swapped; series: Same VT, Different VT; ITD = ±91 µs. Condition labels: +ITD+Prosody, +ITD+Prosody+Vocal Tract, +ITD-Prosody, +ITD-Prosody-Vocal Tract. Significance markers: p<0.05, p<0.05, p<0.002.]

Summary

• ITD no good for simultaneous grouping

• …but great for locating grouped objects

• ITD messed up by reverberation

• Prosody and speaker characteristics less messed up by reverberation
