Rob van der Willigen
http://~robvdw/cnpa04/coll1/AudPerc_2007_P8.ppt

Auditory Perception

Today’s goal

Understanding the problem of

Auditory Scene Analysis (ASA):

- Higher-level principles of organization

- Complex waveform analysis

- Neural Activity Patterns (NAP) analysis

OBJECTS?

Psychoacoustics

The Problem of Auditory Scene Analysis (ASA)

“Conversion of auditory sensory input into an adequate representation of reality.”

ASA allows an organism to obtain and react appropriately to complex sounds from the environment

Albert S. Bregman (Auditory scene analysis, 1999; p. 1)

“Only by being aware of how sound is created and shaped in the world can we know how to use it to derive the properties of the sound-producing events around us”


Psychoacoustics

ASA: Objects compared to Streams

In vision we intuitively focus on objects.

In fact, the visual system uses light reflections to form separate descriptions of the individual objects. These descriptions include the object’s shape, size, distance, color, etc.

But how is object information determined from sound?


Psychoacoustics

Information Carrying Capacity and ASA

For an ideal coding system, Shannon showed that

$$C = B \log_2\left(1 + \frac{S}{N}\right) \quad [\text{bits/s}]$$

where C is the channel capacity, B is the channel bandwidth (in Hz), S and N are the average received signal and noise powers respectively, and the noise is additive white Gaussian noise.

This equation is referred to as the Shannon–Hartley law, and it acts as a benchmark for various practical modulation/demodulation schemes since it defines the absolute maximum information rate, R, which can be reliably (without error) sent over the channel.

        Receptors   Axons     Capacity (bits/s)
Eyes    2x10^8      2x10^6    10^7
Ears    3x10^4      2x10^4    10^5
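The Shannon–Hartley law can be evaluated numerically. The following is a minimal sketch; the function name `channel_capacity` and the bandwidth and power values are hypothetical illustrations and do not come from the slides.

```python
import math


def channel_capacity(bandwidth_hz: float, signal_power: float, noise_power: float) -> float:
    """Shannon-Hartley capacity C = B * log2(1 + S/N), in bits per second."""
    if bandwidth_hz < 0 or signal_power < 0 or noise_power <= 0:
        raise ValueError("bandwidth and signal power must be >= 0, noise power > 0")
    return bandwidth_hz * math.log2(1.0 + signal_power / noise_power)


# Hypothetical channel: 3 kHz bandwidth, signal power 1000 times the noise power.
capacity = channel_capacity(bandwidth_hz=3000.0, signal_power=1000.0, noise_power=1.0)
print(f"{capacity:.0f} bits/s")
```

Because the signal-to-noise ratio sits inside the logarithm, doubling the bandwidth doubles the capacity, whereas doubling the signal power adds at most B bits per second.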



Intensity differences of 1 dB over a range of about 120 dB

120 levels can encode 7 bits (2^7=128).

24 non-overlapping frequency bands

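The step from 120 discriminable intensity levels to 7 bits can be checked with a few lines of code. This is a small sketch; the helper name `bits_needed` is hypothetical.

```python
def bits_needed(levels: int) -> int:
    """Smallest number of bits b such that 2**b >= levels."""
    if levels < 1:
        raise ValueError("levels must be at least 1")
    return (levels - 1).bit_length()


range_db = 120   # usable intensity range from the slide
step_db = 1      # just-noticeable intensity difference from the slide
levels = range_db // step_db

print(levels, "levels need", bits_needed(levels), "bits")
```

Seven bits can label up to 128 levels, so the 120 levels from the slide fit with a few codes to spare.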


Intelligibility of natural sounds (INS)

Acoustic Input is Tonotopic

TONOTOPIC PLACE MAP

LOGARITHMIC:

20 Hz -> 200 Hz

2 kHz -> 20 kHz

each occupies 1/3 of the basilar membrane
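A purely logarithmic place map can be sketched as follows. The sketch assumes the idealization on the slide (20 Hz to 20 kHz mapped uniformly in log frequency); the function name `place_fraction` is hypothetical, and position is measured from the low-frequency end of the map.

```python
import math

F_LOW_HZ = 20.0       # lower limit of the map, from the slide
F_HIGH_HZ = 20_000.0  # upper limit of the map, from the slide


def place_fraction(freq_hz: float) -> float:
    """Position on an idealized log place map, as a fraction (0..1) of its length."""
    if not F_LOW_HZ <= freq_hz <= F_HIGH_HZ:
        raise ValueError("frequency outside the 20 Hz to 20 kHz range")
    return math.log10(freq_hz / F_LOW_HZ) / math.log10(F_HIGH_HZ / F_LOW_HZ)


for f in (20.0, 200.0, 2000.0, 20_000.0):
    print(f"{f:>8.0f} Hz -> {place_fraction(f):.2f}")
```

Under this assumption every factor of ten in frequency covers the same length, which is why the decade 20 Hz to 200 Hz and the decade 2 kHz to 20 kHz each take up one third of the map.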

Analogy with Visual Input


Intelligibility of natural sounds (INS)

Most studies on auditory perception involve measurements of neuronal responses to acoustically simple sounds such as short duration pure tones.

The logic behind this approach stems from the classic portrayal of auditory processing as spectral decomposition. Accordingly, it is well known that tonotopic “maps” exist in every mammalian auditory cortex studied to date.

Within these maps, neurons are “tuned” to particular sound frequencies organized from low to high across the cortex, preserving the topography observed in the cochlea.

Traditional Approach to INS

The outline of the response area (red) represents the pure tone tuning curve or frequency threshold curve.

The frequency for which threshold in dB is minimum is the characteristic frequency (CF).



Representation of Natural Sounds

Any acoustic signal can be graphically or mathematically depicted in either of two domains: Temporal domain versus Spectral domain

Spectrogram: 2D plot of log energy (power) across time and frequency (log frequency scale)

Natural sounds:

Human speech

Animal vocalizations
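How a spectrogram moves a signal from the temporal to the spectral domain can be sketched with a short-time Fourier transform. This is a minimal pure-Python sketch, not an efficient implementation; the function name `spectrogram_db`, the Hann window, the frame sizes, and the test tone are all assumptions chosen for illustration. The rows use linear DFT bins, so a log frequency axis would be a separate display step.

```python
import cmath
import math


def spectrogram_db(samples, frame_len, hop):
    """Log power (dB) per frame and DFT bin (0 .. frame_len // 2)."""
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        # Hann window to reduce spectral leakage between bins.
        windowed = [
            x * 0.5 * (1.0 - math.cos(2.0 * math.pi * n / (frame_len - 1)))
            for n, x in enumerate(frame)
        ]
        row = []
        for k in range(frame_len // 2 + 1):
            acc = sum(
                x * cmath.exp(-2j * math.pi * k * n / frame_len)
                for n, x in enumerate(windowed)
            )
            row.append(10.0 * math.log10(abs(acc) ** 2 + 1e-12))
        frames.append(row)
    return frames


# Hypothetical test signal: a 1 kHz pure tone sampled at 8 kHz.
sample_rate = 8000
frame_len = 64
samples = [math.sin(2.0 * math.pi * 1000.0 * n / sample_rate) for n in range(512)]

spec = spectrogram_db(samples, frame_len=frame_len, hop=32)
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
peak_hz = peak_bin * sample_rate / frame_len
print("strongest bin in first frame:", peak_bin, "=", peak_hz, "Hz")
```

Each row of `spec` is one time frame and each column one frequency bin, so the pure tone shows up as a horizontal ridge at a single frequency.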


[Figure: the utterance “… pure pleasure …” annotated with grouping cues: onset synchrony, offset synchrony, common amplitude modulation (AM), continuity (frequency modulation (FM) sweep), harmonicity]


Simultaneous (spectral, harmonic) integration:

• simultaneity of onsets

• coherence of changes

– frequency, SPL, spectral envelope

• harmonicity

Sequential (temporal, melodic) integration:

• proximity (pitch, time, location)

• similarity (timbre, loudness)

• lack of sudden changes

“Information bearing elements” of natural sounds are difficult to separate!

Primates, including humans, discriminate and identify natural sounds by selectively attending to spectral and/or temporal modulations.

Paradoxically, time and frequency are difficult to separate because natural sounds are highly characterized by FM-sweeps.

Important Concepts

• Intelligibility of natural sounds: Classic acoustic features, such as harmonics, across-frequency patterns of coherent amplitude modulations, and simultaneous onsets/offsets, cannot fully account for INS. In natural sounds it is important to distinguish between spectrotemporal properties of the envelope and those of the fine structure.

• FM sweep: The combined modulation in frequency and time.

• Inseparability: Mathematically, this means that the entire spectrogram cannot be described by the outer product of one temporal and one spectral function.

• Spectral-temporal elements: Information bearing elements of natural sounds that can be separated.

• Spectrotemporal elements: Information bearing elements of natural sounds that cannot be separated.
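The outer-product definition of inseparability can be tested directly on a small matrix. The sketch below is an illustration under stated assumptions: rows are time frames, columns are frequency channels, and the function name `is_separable` and both toy spectrograms are hypothetical.

```python
def is_separable(spectrogram, tol=1e-9):
    """True if the matrix equals the outer product of one temporal and one spectral vector."""
    # Locate any non-zero entry to use as a pivot.
    pivot = None
    for i, row in enumerate(spectrogram):
        for j, value in enumerate(row):
            if abs(value) > tol:
                pivot = (i, j)
                break
        if pivot is not None:
            break
    if pivot is None:
        return True  # an all-zero matrix is trivially an outer product
    pi, pj = pivot
    p = spectrogram[pi][pj]
    # The matrix has rank one exactly when every 2x2 minor through the pivot vanishes.
    for i, row in enumerate(spectrogram):
        for j, value in enumerate(row):
            if abs(value * p - spectrogram[i][pj] * spectrogram[pi][j]) > tol:
                return False
    return True


temporal = [1.0, 2.0, 3.0]
spectral = [0.5, 1.0, 0.25, 2.0]
separable = [[t * s for s in spectral] for t in temporal]

# Toy FM sweep: the energy moves to a higher channel in every time frame.
sweep = [[1.0 if f == t else 0.0 for f in range(4)] for t in range(4)]

print(is_separable(separable), is_separable(sweep))
```

The sweep fails the test because its spectral profile changes from frame to frame, so no single pair of temporal and spectral functions can reproduce it.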

Frequency modulations (FM) and amplitude modulations (AM) are prominent features of animal vocalizations and human speech, and they play an important role in the communication of information (Hauser, 1996).


Modulated envelope of a Dynamic Ripple:

Velocity: ω = 2 [cycles per second, Hz]
Density: Ω = 0.4 [cycles per octave]
Phase: ϕ = -90 [degrees]
Amplitude: ΔM = 10 [relative dB]

The ripple has a 10 dB amplitude modulation around a 60 dB base, shown with spectral and temporal 1-D sections. Ripple phase changes linearly with time and spectral position (in octaves) (Depireux, 2001; Kowalski et al., 1996a,b).
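The ripple envelope can be written as a sinusoid whose phase grows linearly with time and with spectral position. The sketch below assumes the form base + ΔM·sin(2π(ωt + Ωx) + ϕ), with ΔM treated as the peak deviation from the base level and ϕ given in degrees; the sign convention and the function name `ripple_envelope_db` are assumptions, not taken from the cited papers.

```python
import math


def ripple_envelope_db(t_s, x_oct, velocity_hz=2.0, density_cyc_per_oct=0.4,
                       phase_deg=-90.0, depth_db=10.0, base_db=60.0):
    """Envelope level (dB) of a dynamic ripple at time t_s and spectral position x_oct."""
    phase = (2.0 * math.pi * (velocity_hz * t_s + density_cyc_per_oct * x_oct)
             + math.radians(phase_deg))
    return base_db + depth_db * math.sin(phase)


# Temporal section at x = 0 octaves and spectral section at t = 0 seconds.
temporal_section = [ripple_envelope_db(t / 20.0, 0.0) for t in range(21)]
spectral_section = [ripple_envelope_db(0.0, x / 2.0) for x in range(11)]
print(min(temporal_section), max(temporal_section))
```

With these parameters the envelope repeats every 0.5 s along the time axis and every 2.5 octaves along the frequency axis.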

Frequency Modulation (FM) Approach to INS


Contrary to what one would expect if frequency decomposition were the primary encoding mechanism of the auditory system, pure tones are not the best stimuli for evoking responses.

Using reverse correlation techniques, deCharms et al. (1998) showed that complex stimuli consisting of trains of Frequency Modulation (FM) sweeps evoke much higher firing rates than best-frequency pure tones.

FM sweeps can be described as Moving Auditory Gratings, or Dynamic Ripples.

An idealized STRF:

FM-responsive cells in IC (monkey) and cortex (monkey, cat, ferret) show selectivity for stimulus parameters such as direction and speed of the FM (Versnel & Opstal 2009; Shamma et al. 1993).

STRF


How is object information determined from sound?



The Problem of Auditory Scene Analysis (ASA)

Complex waveforms are the superposition of all individual sounds plus the acoustic effects of the environment between transmitter and receiver.

The input to the auditory system consists of complex waveforms.

An important part of building a representation of individual sounds is to determine which parts of the sensory stimulation (complex waveforms) originate from the same event and environmental object.

The individual waveforms are not easily recognizable from the mixture.
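Superposition simply means that the pressure waveforms add sample by sample. The sketch below mixes two hypothetical tones; the helper names `tone` and `superpose` and all signal parameters are assumptions, and the environmental effects mentioned above are left out.

```python
import math


def tone(freq_hz, amplitude, sample_rate, n_samples):
    """Sampled sine wave."""
    return [amplitude * math.sin(2.0 * math.pi * freq_hz * n / sample_rate)
            for n in range(n_samples)]


def superpose(*waveforms):
    """Sample-by-sample sum of equally long waveforms."""
    length = len(waveforms[0])
    if any(len(w) != length for w in waveforms):
        raise ValueError("all waveforms must have the same length")
    return [sum(samples) for samples in zip(*waveforms)]


sample_rate = 8000
low = tone(440.0, 1.0, sample_rate, 80)
high = tone(1200.0, 0.5, sample_rate, 80)
mixture = superpose(low, high)
print(mixture[:5])
```

The ear receives only `mixture`. Nothing in the list of samples marks which part came from which tone, which is the core of the scene analysis problem.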

The Problem of ASA

Problem I: Sound localization can only result from the neural processing of acoustic cues in the tonotopic input of the (two) ear(s)!

Problem II: How does the auditory system parse the superposition of distinct sounds into the original acoustic input?

The Problem of Hearing

Tonotopy is preserved in the auditory system up to and including the auditory cortex.

“The composition of a sound from separate tones can be compared to the way white light splits into separate colors when it passes through a prism.”

John A.J. van Opstal (Al kijkend hoort men, 2006; p. 8)


Psychoacoustics


Streams play the same role in audition as objects do in vision.

The conceptual point is that the notions of objects and streams require a mental representation of the visual and auditory input, respectively.

The line drawing shows an object with three legs, or does it? Can the same occur in the auditory domain?

Gestalt psychologists would argue that “laws” of perceptual organization are innate and transcend modality.


Psychoacoustics

Gestalt Principles in Vision

Proximity: – grouping of nearby dots

Similarity: – grouping of similar dots

Closure: – recognition of incomplete patterns

Good continuation: – e.g. 2 lines crossing


Proximity

Why perceive rows vs. columns?


Similarity

Why perceive rows vs. columns?


Closure


Good continuation


Psychoacoustics

Gestalt Principles in Audition?

Questions:

1. Which factors determine an “auditory object”?

2. Which cues separate “auditory objects” from each other?

3. Which rules organize our auditory perceptions?

Seminal book: “Auditory Scene Analysis” (Albert S. Bregman, 1990)


From events to streams

Listeners are capable of parsing an acoustic scene (a complex sound) to form a mental representation of each sound source (a stream) in the perceptual process of auditory scene analysis (Bregman, 1990).

Two conceptual processes of ASA:

Segmentation. Decompose the acoustic mixture into sensory elements (segments).

Grouping. Combine segments into streams, so that segments in the same stream originate from the same source.


Psychoacoustics


AI CSAITT STIOTOS

AI CSAITT STIOTOS

The upper pattern of letters does not appear to be meaningful.

The lower pattern HAS meaning due to spatial segregation.
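The letter pattern can be pulled apart in code. The sketch assumes that the two messages alternate letter by letter within each word, which is how the string decodes cleanly; the function name `split_interleaved` is hypothetical.

```python
def split_interleaved(text):
    """Split each word into its even-position and odd-position letters."""
    first, second = [], []
    for word in text.split():
        first.append(word[0::2])   # letters at positions 0, 2, 4, ...
        second.append(word[1::2])  # letters at positions 1, 3, 5, ...
    return " ".join(first), " ".join(second)


message_a, message_b = split_interleaved("AI CSAITT STIOTOS")
print(message_a)
print(message_b)
```

Once the letters are assigned to two separate rows, each row reads as an ordinary sentence, in the same way that separated streams become interpretable in audition.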


Psychoacoustics


Stream segregation in a cycle of six tones:

An example of good continuation / Proximity?

http://www.psych.mcgill.ca/labs/auditory/bregmancd.html

Compact disk of demonstrations of auditory scene analysis




Psychoacoustics


Stream segregation due to Frequency gap:

An example of Good continuation/Proximity?

Loss of rhythmic information as a result of stream segregation.

When a repeating cycle breaks into two streams, the rhythm of the full sequence is lost and replaced by those of the component streams (Panel 1). This change can be heard clearly if the rhythm of the whole sequence is quite different from those of the component streams. In the present example, we use triplets of tones separated by silences, HLH-HLH-HLH-... (where H represents a high tone, L a low one, and the hyphen corresponds to a silence equal in duration to a single tone).

We perceive this pattern as having a galloping rhythm.

An interesting fact about this pattern is that when it breaks up into high and low streams, neither the high nor the low one has a galloping rhythm. We hear two concurrent streams of sound in each of which the tones are isochronous (equally spaced in time).
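The loss of the galloping rhythm can be checked by listing inter-onset intervals. In this sketch every tone and every silence lasts one time unit, and the tones are assumed to be already labelled H or L, so segregation is modelled simply as selecting by label; the function names are hypothetical.

```python
def gallop_sequence(n_triplets, tone_dur=1.0):
    """Onset times and labels for the pattern HLH-HLH-... (hyphen = silence of one tone)."""
    events = []
    for k in range(n_triplets):
        start = 4 * k * tone_dur  # three tones plus one silence per triplet
        events += [(start, "H"), (start + tone_dur, "L"), (start + 2 * tone_dur, "H")]
    return events


def inter_onset_intervals(events, label=None):
    """Intervals between successive onsets, optionally restricted to one stream."""
    onsets = [t for t, lab in events if label is None or lab == label]
    return [b - a for a, b in zip(onsets, onsets[1:])]


events = gallop_sequence(3)
print("whole sequence:", inter_onset_intervals(events))
print("high stream:   ", inter_onset_intervals(events, "H"))
print("low stream:    ", inter_onset_intervals(events, "L"))
```

The whole sequence alternates short and long intervals (the gallop), while each stream on its own has a single constant interval and is therefore isochronous.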


Psychoacoustics


“Good continuation” dominates

“Pitch proximity” dominates




Psychoacoustics


Gliding tone through a noise burst:

An example of good continuation?


Psychoacoustics

Gestalt Principles in Speech Perception

[Figure: speech example annotated with grouping cues: onset synchrony, offset synchrony, common AM, continuity, harmonicity]



Psychoacoustics

Gestalt Principles in Audition

Auditory peripheral processing amounts to a decomposition of the acoustic signal.

ASA cues essentially reflect structural coherence of a sound source.

A subset of cues believed to be strongly involved in ASA:

Simultaneous organization: Periodicity, temporal modulation, onset.

Sequential organization: Location, pitch contour and other source characteristics (e.g. vocal tract).


Psychoacoustics

Gestalt Principles in Audition

Sequential (temporal, melodic) integration

• proximity (pitch, time, location)

• similarity (timbre, loudness)

• lack of sudden changes

Simultaneous (spectral, harmonic) integration

• simultaneity of onsets

• coherence of changes

– frequency, SPL, spectral envelope

• harmonicity


Multiple positions have identical ILD, ITD
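The ambiguity can be illustrated for the interaural time difference (ITD) with a deliberately simple model. The sketch assumes two point receivers in free space and a distant source, so that ITD = (d/c)·sin(azimuth); the ear distance is a hypothetical value, the head itself is ignored, and the interaural level difference (ILD) is not modelled.

```python
import math

SPEED_OF_SOUND_M_S = 343.0
EAR_DISTANCE_M = 0.18  # hypothetical distance between the two ears


def itd_seconds(azimuth_deg):
    """ITD for a distant source; azimuth 0 = straight ahead, 90 = directly to one side."""
    return EAR_DISTANCE_M / SPEED_OF_SOUND_M_S * math.sin(math.radians(azimuth_deg))


front = itd_seconds(30.0)   # source in front, 30 degrees to the side
back = itd_seconds(150.0)   # mirror position behind the listener
print(f"front: {front * 1e6:.0f} us, back: {back * 1e6:.0f} us")
```

In this model a source in front and its mirror image behind the listener produce the same ITD, so the cue alone cannot tell the two positions apart.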

Psychoacoustics

Auditory versus Visual Scene Analysis


Elevation

[Figure: amplitude (dB) versus frequency (kHz) for sound-source elevations from -40 to +60 deg]

In humans, mid-frequencies also exhibit a prominent notch that varies in frequency with changes in sound source elevation (6 – 11 kHz).

Psychoacoustics



Distance estimation: Determine how far away a sound is.

Cue: the relative amounts of direct vs. reverberant energy; a closer sound delivers more direct energy.
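The direct-to-reverberant cue can be sketched numerically. The sketch assumes that direct energy falls with the square of the distance and that reverberant energy is the same everywhere in the room; the second assumption is a common simplification, and the function name and energy values are hypothetical.

```python
import math


def direct_to_reverberant_db(distance_m, source_energy=1.0, reverberant_energy=1.0):
    """Direct-to-reverberant energy ratio in dB under the assumptions stated above."""
    if distance_m <= 0:
        raise ValueError("distance must be positive")
    direct_energy = source_energy / distance_m ** 2  # inverse-square law
    return 10.0 * math.log10(direct_energy / reverberant_energy)


for distance in (1.0, 2.0, 4.0, 8.0):
    print(f"{distance:.0f} m: {direct_to_reverberant_db(distance):+.1f} dB")
```

Under these assumptions every doubling of the distance lowers the ratio by about 6 dB, so the ratio by itself carries information about distance.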

Psychoacoustics

Effect of reverberation on the speech signal

Top is the original, clean signal. Bottom is the signal convolved with the room impulse response function (IRF).

“two oh six”
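Convolving a clean signal with a room impulse response can be sketched with a direct implementation of discrete convolution. The signal and the impulse response below are hypothetical toy values: a direct path followed by two weaker reflections.

```python
def convolve(signal, impulse_response):
    """Discrete linear convolution of two finite sequences."""
    if not signal or not impulse_response:
        raise ValueError("both sequences must be non-empty")
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h  # each input sample launches a scaled copy of the response
    return out


clean = [1.0, 0.0, 0.0, 0.5]
room_irf = [1.0, 0.0, 0.6, 0.3]  # direct sound, then two delayed and attenuated echoes
reverberant = convolve(clean, room_irf)
print(reverberant)
```

The output is longer than the input and the echoes of the first sample overlap the second one, which is how reverberation smears a speech signal in time.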


Psychoacoustics

Initial reflections contribute to sense of room character

• zero point (first reflection) omitted
• initial part consists of discrete echoes
• later part is more continuous
• echoes can enhance intelligibility. How?
• best concert halls have an initial time-delay gap of 15-30 ms

How to deal with reverberation?

• use better, more directional microphones to minimize energy from reflections
• use microphone arrays (e.g. in conference rooms) to do noise (i.e. echo) cancellation


Psychoacoustics


Gestalt principles focus on similarities between modalities such as vision and audition, but there are differences as well, owing to differences in physical properties.

In audition, sound-emitting properties rather than sound-reflecting properties of the environment are important.

Sound is used to discover the time and frequency pattern of the source, not its spatial shape. In other words, acoustic events are transparent; they do not occlude energy from what lies behind.

Echoes (reflections) obscure the original properties of sounds. Although echoes are delayed copies (containing all the original information), the superposition of the original sound and its echoes creates redundant information.

Acoustic information can only be effectively used from large objects such as rooms or mountains. Only then are effects of the environment between transmitter and receiver noticeable.


Psychoacoustics

How does ASA work? What does the brain need/do?

Spectrogram: Plot of log energy across time and frequency (linear frequency scale)

Cochleogram: Cochlear filtering by the gammatone filter bank (or other models of cochlear filtering), followed by a stage of nonlinear rectification; the latter corresponds to hair cell transduction, modeled by either a hair cell model or simple compression operations (log and cube root)

Quasi-logarithmic frequency scale; filter bandwidth is frequency-dependent

Previous work suggests better resilience to noise than the spectrogram

At a coarse temporal scale, cochleograms and spectrograms look similar except for the scale of the frequency axis.

Spectrogram

Cochleogram
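The rectification and compression stage of a cochleogram can be sketched on its own. The sketch assumes that the output of one cochlear filter channel is already available (no gammatone filter is implemented here) and uses half-wave rectification followed by cube-root compression; the function names and sample values are hypothetical.

```python
def half_wave_rectify(samples):
    """Keep positive deflections and set negative ones to zero."""
    return [max(0.0, x) for x in samples]


def cube_root_compress(samples):
    """Compress the dynamic range of non-negative samples."""
    return [x ** (1.0 / 3.0) for x in samples]


# Hypothetical output of a single filter channel.
channel = [0.0, 8.0, -8.0, 0.001, -0.001, 1.0]
transduced = cube_root_compress(half_wave_rectify(channel))
print(transduced)
```

After compression the ratio between the largest and the smallest non-zero sample shrinks from 8000:1 to about 20:1, which is the range reduction that the compression operation is meant to provide.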


Psychoacoustics


Simulation of (a) the basilar membrane motion, (b) the neural activity pattern (NAP), and (c) the stabilized auditory image produced by a pulse train with a rate of 125 pulses per second.

The narrow, low-frequency filters isolate individual harmonics of 125 Hz; the broader high-frequency filters emit impulse responses (a).

The transduction process compresses the dynamic range and sharpens the features in the pattern at the same time. The temporal integration mechanism stabilizes the pattern and removes global phase differences (c).

The auditory image produced in response to a single acoustic pulse is shown on an expanded time scale in (d).

NAP in response to repetitive clicks (b)

NAP in response to a single click (d)
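Which harmonics of the 125 Hz pulse train end up isolated in a single filter can be estimated with a rough rule. This sketch swaps in the equivalent rectangular bandwidth (ERB) formula of Glasberg and Moore (1990), which is not part of the slides, and counts a harmonic as isolated when the ERB at its frequency is narrower than the harmonic spacing; both the formula choice and the criterion are assumptions.

```python
def erb_hz(center_hz):
    """Equivalent rectangular bandwidth (Glasberg & Moore, 1990) at a centre frequency."""
    return 24.7 * (4.37 * center_hz / 1000.0 + 1.0)


def resolved_harmonics(f0_hz, max_harmonic):
    """Harmonic numbers whose filter bandwidth is narrower than the harmonic spacing f0."""
    return [n for n in range(1, max_harmonic + 1) if erb_hz(n * f0_hz) < f0_hz]


print(resolved_harmonics(125.0, 20))
```

Under this crude criterion only the lowest few harmonics are isolated; in higher channels several harmonics fall inside one filter, which responds to each pulse with its impulse response.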


Psychoacoustics


[Figure: block diagram with stages labeled Auditory Filterbanks (L and R inputs), Binaural Cue Extraction, Azimuth Localization, Pattern Analysis, and Resynthesis, with Target and Noise labels]


Psychoacoustics

Fundamental problem of ASA

Auditory scene analysis requires:

Analysis over long time windows

Analysis over broad spectral widths

A sensitive auditory system requires:

Analysis over very short time windows

Analysis over narrow frequency bands
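Time resolution and frequency resolution pull against each other, which a one-line relation makes concrete. The sketch uses the spacing of discrete Fourier transform bins, 1/T for an analysis window of duration T, as a stand-in for frequency resolution; the window durations are hypothetical.

```python
def frequency_resolution_hz(window_s):
    """Spacing of DFT bins for an analysis window of the given duration."""
    if window_s <= 0:
        raise ValueError("window duration must be positive")
    return 1.0 / window_s


for window_s in (0.005, 0.05, 0.5):
    print(f"{window_s * 1000:.0f} ms window -> {frequency_resolution_hz(window_s):.0f} Hz")
```

A 5 ms window follows fast events but cannot separate components closer than roughly 200 Hz, whereas a 500 ms window separates components 2 Hz apart at the cost of temporal detail.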
