Post on 20-Dec-2015
Rob van der Willigen
http://~robvdw/cnpa04/coll1/AudPerc_2007_P8.ppt
Auditory Perception
Today’s goal
Understanding the problem of
Auditory Scene Analysis (ASA):
- Higher-level principles of organization
- Complex waveform analysis
- Neural Activity Pattern (NAP) analysis
OBJECTS?
Psychoacoustics
The Problem of Auditory Scene Analysis (ASA)
“Conversion of auditory sensory input into an adequate representation of reality.”
ASA allows an organism to perceive and react appropriately to complex sounds from the environment.
Albert S. Bregman (Auditory scene analysis, 1999; p. 1)
“Only by being aware of how sound is created and shaped in the world can we know how to use it to derive the properties of the sound-producing events around us”
ASA: Objects compared to Streams
In vision we intuitively focus on objects.
In fact, the visual system uses light reflections to form separate descriptions of the individual objects. These descriptions include each object’s shape, size, distance, color, etc.
But how is object information determined from sound?
Information Carrying Capacity and ASA
For an ideal coding system, Shannon showed that
$$C = B \log_2\left(1 + \frac{S}{N}\right) \quad [\text{bits/s}]$$
where C is the channel capacity, B is the channel bandwidth (in Hz), S and N are the average received signal and noise powers respectively, and the noise is additive white Gaussian noise.
This equation is referred to as the Shannon–Hartley law. It acts as a benchmark for practical modulation/demodulation schemes, since it defines the absolute maximum information rate, R, that can be sent reliably (without error) over the channel.
Eyes: 2 x 10^8 receptors, 2 x 10^6 axons, capacity 10^7 bits/s
Ears: 3 x 10^4 receptors, 2 x 10^4 axons, capacity 10^5 bits/s
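The Shannon–Hartley law above can be evaluated directly. The sketch below is only an illustration: the function name and the example channel (3 kHz bandwidth, 30 dB signal-to-noise ratio) are assumptions chosen for the demonstration and do not come from the slides.

```python
import math


def channel_capacity(bandwidth_hz, signal_power, noise_power):
    """Shannon-Hartley capacity C = B * log2(1 + S/N), in bits per second."""
    if bandwidth_hz < 0 or signal_power < 0 or noise_power <= 0:
        raise ValueError("bandwidth and signal power must be >= 0, noise power > 0")
    return bandwidth_hz * math.log2(1.0 + signal_power / noise_power)


# Hypothetical example channel (not from the slides): 3 kHz wide, 30 dB SNR.
snr_linear = 10 ** (30 / 10)  # convert dB to a power ratio
capacity = channel_capacity(3000.0, snr_linear, 1.0)
print(f"capacity: {capacity:.0f} bits/s")
```

Because only the ratio S/N enters the formula, the noise power can be set to 1 and the signal power to the linear signal-to-noise ratio.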
Information Carrying Capacity and ASA
Ears: 3 x 10^4 receptors, 2 x 10^4 axons, capacity 10^5 bits/s
Intensity differences of 1 dB can be resolved over a range of about 120 dB.
120 levels can be encoded in 7 bits (2^7 = 128).
24 non-overlapping frequency bands
$$C = B \log_2\left(1 + \frac{S}{N}\right) \quad [\text{bits/s}]$$
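The bit count for the intensity levels can be checked with a few lines of arithmetic. This is a minimal sketch; the helper name `bits_needed` is hypothetical, and the code only reproduces the "120 levels fit in 7 bits" step, not the full capacity estimate.

```python
def bits_needed(n_levels):
    """Smallest number of bits whose 2**bits codes cover n_levels distinct levels."""
    if n_levels < 1:
        raise ValueError("need at least one level")
    # (n - 1).bit_length() is an exact integer version of ceil(log2(n)).
    return (n_levels - 1).bit_length()


range_db = 120   # usable intensity range from the slide
step_db = 1      # just-noticeable intensity difference from the slide
levels = range_db // step_db
bits = bits_needed(levels)
print(levels, "levels ->", bits, "bits; 2 **", bits, "=", 2 ** bits)
```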
Intelligibility of natural sounds (INS)
Acoustic Input is Tonotopic
TONOTOPIC PLACE MAP
LOGARITHMIC:
20 Hz -> 200 Hz
2 kHz -> 20 kHz
each occupies 1/3 of the basilar membrane
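A logarithmic place map means that equal frequency ratios occupy equal lengths. The sketch below assumes an idealized, purely logarithmic map from 20 Hz to 20 kHz, with 0 at the low-frequency end and 1 at the high-frequency end; the function name and this orientation are assumptions made for illustration.

```python
import math

F_LOW_HZ = 20.0       # lower edge of the map (from the slide)
F_HIGH_HZ = 20000.0   # upper edge of the map (from the slide)


def relative_place(freq_hz):
    """Fraction (0..1) of the map length at which freq_hz falls on a log map."""
    if not F_LOW_HZ <= freq_hz <= F_HIGH_HZ:
        raise ValueError("frequency outside the mapped range")
    return math.log10(freq_hz / F_LOW_HZ) / math.log10(F_HIGH_HZ / F_LOW_HZ)


for f in (20.0, 200.0, 2000.0, 20000.0):
    print(f"{f:8.0f} Hz -> {relative_place(f):.3f} of the map length")
```

Under this assumption each factor of ten in frequency covers one third of the map, which is the sense in which each decade occupies 1/3 of the basilar membrane.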
Analogy with Visual Input
Most studies on auditory perception involve measurements of neuronal responses to acoustically simple sounds such as short-duration pure tones.
The logic behind this approach stems from the classic portrayal of auditory processing as spectral decomposition. Accordingly, it is well known that tonotopic “maps” exist in every mammalian auditory cortex studied to date.
Within these maps, neurons are “tuned” to particular sound frequencies organized from low to high across the cortex, preserving the topography observed in the cochlea.
Traditional Approach to INS
The outline of the response area (red) represents the pure-tone tuning curve, or frequency threshold curve.
The frequency at which the threshold (in dB) is lowest is the characteristic frequency (CF).
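Reading the CF off a tuning curve amounts to finding the frequency with the lowest threshold. The sketch below uses a made-up tuning curve; the data points and the function name are hypothetical and serve only to show the definition.

```python
def characteristic_frequency(tuning_curve):
    """Return the (frequency_hz, threshold_db) pair with the lowest threshold."""
    if not tuning_curve:
        raise ValueError("tuning curve is empty")
    return min(tuning_curve, key=lambda point: point[1])


# Hypothetical frequency threshold curve: (tone frequency in Hz, threshold in dB).
curve = [(500, 70), (1000, 45), (2000, 20), (4000, 55), (8000, 80)]
cf_hz, cf_threshold_db = characteristic_frequency(curve)
print(f"CF = {cf_hz} Hz at {cf_threshold_db} dB")
```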
Representation of Natural Sounds
Any acoustic signal can be depicted graphically or mathematically in either of two domains: the temporal domain or the spectral domain.
Spectrogram: 2D plot of log energy (power) across time and frequency (log frequency scale)
Natural sounds: human speech, animal vocalizations
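A spectrogram can be sketched as a sequence of short-time Fourier transforms. The example below is a minimal, deliberately slow pure-Python version under stated assumptions: a Hann window, a naive DFT, and linearly spaced frequency bins (a log-frequency axis as on the slide would need an extra remapping step). The function name, frame length, hop size and test tone are all choices made for illustration.

```python
import cmath
import math


def log_spectrogram(signal, frame_len, hop):
    """Return one row per time frame; each row holds log power (dB) per DFT bin."""
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        # A Hann window reduces leakage of energy into neighbouring bins.
        windowed = [x * (0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len))
                    for n, x in enumerate(frame)]
        row = []
        for k in range(frame_len // 2 + 1):
            coeff = sum(x * cmath.exp(-2j * math.pi * k * n / frame_len)
                        for n, x in enumerate(windowed))
            # A small offset avoids log10(0) for empty bins.
            row.append(10 * math.log10(abs(coeff) ** 2 + 1e-12))
        frames.append(row)
    return frames


fs = 8000  # sampling rate in Hz (assumed)
tone = [math.sin(2 * math.pi * 1000 * n / fs) for n in range(256)]
spec = log_spectrogram(tone, frame_len=64, hop=32)
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
print("peak at", peak_bin * fs / 64, "Hz")
```

For real signals a library FFT would replace the inner loop; the structure (window, transform, log power) stays the same.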
offset synchrony
onset synchrony
common Amplitude Modulation (AM)
Continuity: Frequency Modulation (FM) sweep
“… pure pleasure … ”
harmonicity
Simultaneous (spectral, harmonic) integration:
• simultaneity of onsets
• coherence of changes
– frequency, SPL, spectral envelope
• harmonicity
Sequential (temporal, melodic) integration:
• proximity (pitch, time, location)
• similarity (timbre, loudness)
• lack of sudden changes
“information bearing elements” of Natural sounds
“information bearing elements” of Natural sounds
are difficult to separate!
Primates, including humans, discriminate and identify natural sounds by selectively attending to spectral and/or temporal modulations.
Paradoxically, time and frequency are difficult to separate because natural sounds are largely characterized by FM sweeps.
Important Concepts
• Intelligibility of natural sounds
Classic acoustic features, such as harmonics, across-frequency patterns of coherent amplitude modulation, and simultaneous onsets/offsets, cannot fully account for INS.
In natural sounds it is important to distinguish between spectrotemporal properties of the envelope and those of the fine structure.
• FM sweep
The combined modulation in frequency and time.
• Inseparability
Mathematically, this means that the entire spectrogram cannot be described by the outer product of one temporal and one spectral function.
• Spectral-temporal elements
Information-bearing elements of natural sounds that can be separated.
• Spectrotemporal
Information-bearing elements of natural sounds that cannot be separated.
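The outer-product definition of separability can be tested numerically: a spectrogram is separable exactly when it has rank one. The sketch below is an illustration with toy matrices; the function names, the tolerance and the two example spectrograms are assumptions, and rows are taken to be time frames and columns frequency channels.

```python
def outer(temporal, spectral):
    """Outer product: entry [t][f] = temporal[t] * spectral[f]."""
    return [[t * s for s in spectral] for t in temporal]


def is_separable(spectrogram, tol=1e-9):
    """True if the matrix equals the outer product of one temporal and one spectral function."""
    rows, cols = len(spectrogram), len(spectrogram[0])
    # Use the largest entry as pivot; a rank-one matrix satisfies
    # S[i][j] * S[p][q] == S[i][q] * S[p][j] for every i, j.
    p, q = max(((i, j) for i in range(rows) for j in range(cols)),
               key=lambda ij: abs(spectrogram[ij[0]][ij[1]]))
    pivot = spectrogram[p][q]
    if pivot == 0:
        return True  # an all-zero matrix is trivially separable
    return all(abs(spectrogram[i][j] * pivot - spectrogram[i][q] * spectrogram[p][j]) <= tol
               for i in range(rows) for j in range(cols))


# A fixed spectral shape scaled over time is separable.
steady = outer([1.0, 2.0, 3.0], [0.5, 1.0, 0.25])
# A toy FM sweep: the energy moves to a new frequency channel at every time step.
sweep = [[1.0, 0.0, 0.0],
         [0.0, 1.0, 0.0],
         [0.0, 0.0, 1.0]]
print(is_separable(steady), is_separable(sweep))
```

The sweep fails the test because no single spectral profile, however it is scaled over time, can put the energy in a different channel at each moment.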
Frequency modulations (FM) and amplitude modulations (AM) are prominent features of animal vocalizations and human speech, and they play an important role in the communication of information (Hauser, 1996).
Modulated envelope of a Dynamic Ripple:
Velocity: ω = 2 [cycles per sec, Hz]
Density: Ω = 0.4 [cycles per octave]
Phase: ϕ = -90
Amplitude: ΔM = 10 [relative dB]
with a 10 dB amplitude modulation around a 60 dB base, with spectral and temporal 1-D sections. Ripple phase changes linearly with time and spectral position (in octaves) (Depireux, 2001; Kowalski et al., 1996a,b)
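The ripple parameters above can be turned into a small envelope function. This is a sketch under assumptions that the slide does not state: a sinusoidal envelope in dB, the phase given in degrees, and the sign convention ωt + Ωx for the moving grating. The function and argument names are hypothetical.

```python
import math


def ripple_envelope_db(t_s, x_oct, velocity_hz=2.0, density_cyc_per_oct=0.4,
                       phase_deg=-90.0, depth_db=10.0, base_db=60.0):
    """Envelope level (dB) of a dynamic ripple at time t_s and spectral position x_oct."""
    # The phase advances linearly with time and with spectral position in octaves.
    phase = (2 * math.pi * (velocity_hz * t_s + density_cyc_per_oct * x_oct)
             + math.radians(phase_deg))
    return base_db + depth_db * math.sin(phase)


# Temporal 1-D section at x = 0 octaves, sampled every 125 ms.
for step in range(5):
    t = step * 0.125
    print(f"t = {t:.3f} s -> {ripple_envelope_db(t, 0.0):.1f} dB")
```

With the slide's values the envelope repeats every 0.5 s in time and every 2.5 octaves along the spectral axis, swinging between 50 and 70 dB.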
Frequency Modulation (FM) Approach to INS
Contrary to what one would expect if frequency decomposition were the primary encoding mechanism of the auditory system, pure tones are not the best stimuli for evoking responses.
Using reverse correlation techniques, deCharms et al. (1998) showed that complex stimuli consisting of trains of Frequency Modulation (FM) sweeps evoke much higher firing rates than best-frequency pure tones.
FM sweeps can be described as Moving Auditory Gratings, or Dynamic Ripples.
An idealized STRF:
FM-responsive cells in IC (monkey) and cortex (monkey, cat, ferret) show selectivity for stimulus parameters such as direction and speed of the FM (Versnel & Opstal 2009; Shamma et al. 1993).
STRF
How is object information determined from sound?
The Problem of Auditory Scene Analysis (ASA)
Complex waveforms are the superposition of all individual sounds plus the acoustic effects of the environment between transmitter and receiver.
The input to the auditory system consists of complex waveforms.
An important part of building a representation of individual sounds is to determine which parts of the sensory stimulation (complex waveforms) originate from the same event and environmental object.
The individual waveforms are not easily recognizable from the mixture.
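Superposition simply means that the waveforms add sample by sample. The sketch below mixes two pure tones; the tone frequencies, amplitudes, sampling rate and function names are arbitrary choices for illustration, and the acoustic effects of the environment are left out.

```python
import math


def tone(freq_hz, duration_s, fs_hz, amplitude=1.0):
    """Sampled sine wave of the given frequency, duration and amplitude."""
    n_samples = int(round(duration_s * fs_hz))
    return [amplitude * math.sin(2 * math.pi * freq_hz * i / fs_hz)
            for i in range(n_samples)]


def mix(*sources):
    """Superpose equal-length waveforms by adding them sample by sample."""
    length = len(sources[0])
    if any(len(source) != length for source in sources):
        raise ValueError("all sources must have the same length")
    return [sum(samples) for samples in zip(*sources)]


fs = 8000
source_a = tone(440, 0.01, fs)
source_b = tone(1000, 0.01, fs, amplitude=0.5)
mixture = mix(source_a, source_b)
print(len(mixture), "samples in the mixture")
```

The ear receives only `mixture`; recovering `source_a` and `source_b` from it is the parsing problem that ASA has to solve.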
The Problem of ASA
Problem I: Sound localization can only result from the neural processing of acoustic cues in the tonotopic input of the (two) ear(s)!
Problem II: How does the auditory system parse the superposition of distinct sounds into the original acoustic input?
The Problem of Hearing
Tonotopy is preserved in the auditory system up to and including the auditory cortex.
“The composition of a sound from separate tones can be compared to the way white light splits into separate colors when it passes through a prism.”
John A.J. van Opstal (Al kijkend hoort men, 2006; p. 8)
ASA: Objects compared to Streams
Streams play the same role in audition as objects do in vision.
The conceptual point is that the notions of objects and streams require a mental representation of the visual and auditory input, respectively.
The line drawing shows an object with three legs, or does it? Can the same occur in the auditory domain?
Gestalt psychologists would argue that “laws” of perceptual organization are innate and transcend modality.
Gestalt Principles in Vision
Proximity: – grouping of nearby dots
Similarity: – grouping of similar dots
Closure: – recognition of incomplete patterns
Good continuation: – e.g. 2 lines crossing
Proximity
Why perceive rows vs. columns?
Similarity
Why perceive rows vs. columns?
Closure
Good continuation
Gestalt Principles in Audition?
Questions:
1. Which factors determine an “auditory object”?
2. Which cues separate “auditory objects” from each other?
3. Which rules organize our auditory perceptions?
Seminal book: “Auditory Scene Analysis” (Albert S. Bregman, 1990)
Listeners are capable of parsing an acoustic scene (a complex sound) to form a mental representation of each sound source (a stream) in the perceptual process of auditory scene analysis (Bregman, 1990): from events to streams.
Two conceptual processes of ASA:
Segmentation. Decompose the acoustic mixture into sensory elements (segments).
Grouping. Combine segments into streams, so that segments in the same stream originate from the same source.
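The grouping step can be illustrated with a toy heuristic. The sketch below assumes segmentation has already produced a list of (onset, frequency) segments and groups them purely by pitch proximity. It is not Bregman's model: the function name, the 4-semitone threshold and the example segments are hypothetical.

```python
import math


def group_into_streams(segments, max_jump_semitones=4.0):
    """Assign each (onset_s, freq_hz) segment to the stream whose last tone is closest in pitch."""
    streams = []
    for onset_s, freq_hz in sorted(segments):
        best, best_dist = None, max_jump_semitones
        for stream in streams:
            last_freq = stream[-1][1]
            dist = abs(12 * math.log2(freq_hz / last_freq))  # distance in semitones
            if dist <= best_dist:
                best, best_dist = stream, dist
        if best is None:
            streams.append([(onset_s, freq_hz)])  # too far from every stream: start a new one
        else:
            best.append((onset_s, freq_hz))
    return streams


# Alternating low and high tones, as in a stream segregation demonstration.
segments = [(0.0, 400), (0.1, 1600), (0.2, 420), (0.3, 1700), (0.4, 410), (0.5, 1650)]
streams = group_into_streams(segments)
for index, stream in enumerate(streams):
    print("stream", index, [freq for _, freq in stream])
```

A fuller model would combine several cues (onset synchrony, harmonicity, location) instead of a single frequency threshold.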
ASA: Objects compared to Streams
AI CSAITT STIOTOS
AI CSAITT STIOTOS
The upper pattern of letters appears to be meaningless.
The lower pattern HAS meaning due to spatial segregation.
ASA: Objects compared to Streams
Stream segregation in a cycle of six tones:
An example of good continuation / Proximity?
http://www.psych.mcgill.ca/labs/auditory/bregmancd.html
Compact disk of demonstrations of auditory scene analysis
ASA: Objects compared to Streams
Stream segregation due to Frequency gap:
An example of Good continuation/Proximity?
Loss of rhythmic information as a result of stream segregation.
When a repeating cycle breaks into two streams, the rhythm of the full sequence is lost and replaced by those of the component streams (Panel 1). This change can be heard clearly if the rhythm of the whole sequence is quite different from those of the component streams. In the present example, we use triplets of tones separated by silences, HLH-HLH-HLH-... (where H represents a high tone, L a low one, and the hyphen corresponds to a silence equal in duration to a single tone).
We perceive this pattern as having a galloping rhythm.
An interesting fact about this pattern is that when it breaks up into high and low streams, neither the high nor the low one has a galloping rhythm. We hear two concurrent streams of sound in each of which the tones are isochronous (equally spaced in time).
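The loss of the galloping rhythm can be verified by listing tone onsets. The sketch below encodes the HLH- pattern in time slots of one tone duration each and compares the inter-onset intervals of the whole sequence with those of the separate high and low streams. The function names are hypothetical.

```python
def onset_slots(pattern, repeats):
    """Return the slot indices at which H and L tones start; '-' is a silent slot."""
    slots = {"H": [], "L": []}
    for index, symbol in enumerate(pattern * repeats):
        if symbol in slots:
            slots[symbol].append(index)
    return slots


def intervals(onsets):
    """Inter-onset intervals between successive tones, in slots."""
    return [later - earlier for earlier, later in zip(onsets, onsets[1:])]


def is_isochronous(onsets):
    """True if all inter-onset intervals are equal."""
    return len(set(intervals(onsets))) <= 1


slots = onset_slots("HLH-", repeats=4)
whole = sorted(slots["H"] + slots["L"])
print("whole sequence:", intervals(whole), is_isochronous(whole))
print("high stream:   ", intervals(slots["H"]), is_isochronous(slots["H"]))
print("low stream:    ", intervals(slots["L"]), is_isochronous(slots["L"]))
```

The uneven intervals of the whole sequence are the gallop; each stream on its own has a single constant interval.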
ASA: Objects compared to Streams
“Good continuation” dominates
“Pitch proximity” dominates
ASA: Objects compared to Streams
Gliding tone through a noise burst:
An example of good continuation?
Gestalt Principles in Speech Perception
offset synchrony
onset synchrony
common AM
continuity
“… pure pleasure … ”
harmonicity
Gestalt Principles in Audition
Auditory peripheral processing amounts to a decomposition of the acoustic signal.
ASA cues essentially reflect structural coherence of a sound source.
A subset of cues believed to be strongly involved in ASA:
Simultaneous organization: Periodicity, temporal modulation, onset.
Sequential organization: Location, pitch contour and other source characteristics (e.g. vocal tract).
Gestalt Principles in Audition
Sequential (temporal, melodic) integration
• proximity (pitch, time, location)
• similarity (timbre, loudness)
• lack of sudden changes
Simultaneous (spectral, harmonic) integration
• simultaneity of onsets
• coherence of changes
– frequency, SPL, spectral envelope
• harmonicity
Multiple positions have identical ILD, ITD
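The ambiguity of interaural cues can be illustrated for the interaural time difference (ITD) with a strongly simplified model. The sketch assumes two point receivers without a head between them, a distant source, and azimuth measured from straight ahead (0 degrees) towards one ear (90 degrees); the ear distance and speed of sound are assumed values, and level differences (ILD) are not modelled.

```python
import math


def itd_seconds(azimuth_deg, ear_distance_m=0.18, speed_of_sound_m_s=343.0):
    """Interaural time difference for a distant source under a two-receiver model."""
    return ear_distance_m / speed_of_sound_m_s * math.sin(math.radians(azimuth_deg))


front = itd_seconds(30.0)    # source 30 degrees to the side, in front
back = itd_seconds(150.0)    # mirror position behind the interaural axis
print(f"front: {front * 1e6:.1f} us, back: {back * 1e6:.1f} us")
```

In this model a source in front and its mirror image behind the interaural axis produce the same ITD, so the cue alone cannot tell the two positions apart.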
Auditory versus Visual Scene Analysis
[Figure: amplitude (dB) versus frequency (kHz) at source elevations from -40 to +60 deg]
In humans, the mid-frequencies (6–11 kHz) also exhibit a prominent notch whose frequency varies with changes in sound source elevation.
Elevation
Distance estimation: Determine how far away a sound is.
Cue: relative amounts of direct vs. reverberant energy; a closer sound has more direct energy.
Effect of reverberation on a speech signal
Top is the original, clean signal. Bottom is the signal convolved with the room IRF.
“two oh six”
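Convolving a clean signal with a room impulse response adds delayed, attenuated copies of the signal to itself. The sketch below uses a direct-form convolution and a made-up four-sample impulse response; both the function name and the data are assumptions for illustration.

```python
def convolve(signal, impulse_response):
    """Direct-form linear convolution of two sample sequences."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h  # each input sample launches a scaled copy of the response
    return out


clean = [1.0, 0.0, 0.0, 0.0]        # a single click
room_ir = [1.0, 0.0, 0.5, 0.25]     # hypothetical: direct sound plus two decaying echoes
reverberant = convolve(clean, room_ir)
print(reverberant)
```

For a click the output is the impulse response itself; for speech every sample is smeared in the same way, which blurs onsets and offsets.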
Initial reflections contribute to the sense of room character
zero point (first reflection) omitted
• initial part consists of discrete echoes
• later part is more continuous
• echoes can enhance intelligibility. How?
• best concert halls have an initial time-delay gap of 15-30 ms.
How to deal with reverberation?
• use better, more directional microphones to minimize energy from reflections
• use microphone arrays (e.g. in conference rooms) to do noise (i.e. echo) cancellation
Auditory versus Visual Scene Analysis
Gestalt principles focus on similarities between modalities such as vision and audition, but there are also differences, which stem from their different physical properties.
In audition, the sound-emitting rather than the sound-reflecting properties of the environment are important.
Sound is used to discover the time and frequency pattern of the source, not its spatial shape. In other words, acoustic events are transparent; they do not occlude energy from what lies behind.
Echoes (reflections) obscure the original properties of sounds. Although echoes are delayed copies (containing all the original information), the superposition of the original sound and its echoes creates redundant information.
Acoustic information can only be used effectively from large objects such as rooms or mountains. Only then are the effects of the environment between transmitter and receiver noticeable.
How does ASA work? What does the brain need/do?
Spectrogram: Plot of log energy across time and frequency (linear frequency scale)
Cochleogram: Cochlear filtering by the gammatone filterbank (or other models of cochlear filtering), followed by a stage of nonlinear rectification; the latter corresponds to hair cell transduction by either a hair cell model or simple compression operations (log and cube root)
Quasi-logarithmic frequency scale, and filter bandwidth is frequency-dependent
Previous work suggests better resilience to noise than the spectrogram
At a coarse temporal scale, cochleograms and spectrograms look similar except for the scale of the frequency axis.
Spectrogram
Cochleogram
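One channel of a cochleogram front end can be sketched as a gammatone filter followed by rectification and compression. The code below assumes the commonly used Glasberg and Moore ERB formula for the frequency-dependent bandwidth and a fourth-order gammatone impulse response; these specific choices, the function names and the parameter values are assumptions and are not taken from the slides.

```python
import math


def erb_hz(center_hz):
    """Equivalent rectangular bandwidth (Glasberg & Moore formula) at center_hz."""
    return 24.7 * (4.37 * center_hz / 1000.0 + 1.0)


def gammatone_ir(center_hz, fs_hz, duration_s=0.025, order=4):
    """Impulse response t^(n-1) * exp(-2*pi*b*t) * cos(2*pi*fc*t) of a gammatone filter."""
    b = 1.019 * erb_hz(center_hz)  # bandwidth parameter tied to the ERB
    n_samples = int(round(duration_s * fs_hz))
    return [(i / fs_hz) ** (order - 1)
            * math.exp(-2 * math.pi * b * i / fs_hz)
            * math.cos(2 * math.pi * center_hz * i / fs_hz)
            for i in range(n_samples)]


def rectify_and_compress(samples):
    """Half-wave rectification followed by cube-root compression."""
    return [max(x, 0.0) ** (1.0 / 3.0) for x in samples]


for fc in (250, 1000, 4000):
    print(f"{fc:5d} Hz channel: bandwidth about {erb_hz(fc):6.1f} Hz")

channel = rectify_and_compress(gammatone_ir(1000, 16000))
```

Filtering a sound with many such channels, spaced along a quasi-logarithmic frequency axis, and stacking the compressed outputs over time gives the cochleogram.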
How does ASA work? What does the brain need/do?
Simulation of (a) the basilar membrane motion, (b) the neural activity pattern (NAP), and (c) the stabilized auditory image produced by a pulse train with a rate of 125 pulses per second.
The narrow, low-frequency filters isolate individual harmonics of 125 Hz; the broader high-frequency filters emit impulse responses (a).
The transduction process compresses the dynamic range and sharpens the features in the pattern at the same time. The temporal integration mechanism stabilizes the pattern and removes global phase differences (c).
The auditory image produced in response to a single acoustic pulse is shown on an expanded time scale in (d).
NAP in response to repetitive clicks (b) NAP in response to a single click (d)
How does ASA work? What does the brain need/do?
[Block diagram with stages: Auditory Filterbanks (L, R), Binaural Cue Extraction, Azimuth Localization, Pattern Analysis, Resynthesis; signals: Target, Noise]
Fundamental problem of ASA
Auditory scene analysis requires:
Analysis over long time windows
Analysis over broad spectral widths
A sensitive auditory system requires:
Analysis over very short time windows
Analysis over narrow frequency bands
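The tension between these requirements follows from the trade-off between time and frequency resolution: an analysis window of length T can separate frequencies only to within roughly 1/T. The sketch below tabulates this rule of thumb; the function name and the window lengths are arbitrary choices for illustration.

```python
def frequency_resolution_hz(window_s):
    """Approximate frequency resolution (Hz) of an analysis window lasting window_s seconds."""
    if window_s <= 0:
        raise ValueError("window length must be positive")
    return 1.0 / window_s


for window_s in (0.001, 0.01, 0.1, 1.0):
    resolution = frequency_resolution_hz(window_s)
    print(f"window {window_s * 1000:7.1f} ms -> resolution about {resolution:7.1f} Hz")
```

A single window therefore cannot be both very short and very narrow in frequency, which is why an auditory system has to analyse the input at several scales.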