computational perceptionlewicki/cp-s08/auditory-coding1.pdf · what principles should guide the...
TRANSCRIPT
Computational Perception15-485/785
Auditory Coding 1
February 5, 2008
Michael S. Lewicki ◇ Carnegie MellonCP08:: Auditory Coding 1
What are the problems of sensory coding?
• What should the sensor sense?
• How is energy transduced?
• How to deal with noise?
• How to compress dynamic range?
• How to prevent the sensor from being damaged?
2
Michael S. Lewicki ◇ Carnegie MellonCP08:: Auditory Coding 1
Two approaches to the study of systems
1. Experimental/behavioral approach:
• describe and characterize behavior
• understand range and limitations
• investigate system properties and organization
• develop theories to better understand functional roles
2. Theoretical/computational approach:
• define problem
• develop models and algorithms
• understand range and limitations
• develop more algorithms: more general/specialized; faster/less resources
3
Michael S. Lewicki ◇ Carnegie MellonCP08:: Auditory Coding 1
The complexity of the auditory system
from Warren, 1999
The cochlea and inner ear
4
Michael S. Lewicki ◇ Carnegie MellonCP08:: Auditory Coding 1
What principles should guide the choice of representation?
5
Unsupervised approaches:
• find useful “features”
• adapted to the patterns of interest
• useful in a wide range of tasks
Supervised approaches:
• Maximize performance on given task
At low-levels, we have to use unsupervised approaches.
Michael S. Lewicki ◇ Carnegie MellonCP08:: Auditory Coding 1
Linear superposition
6
Linear superposition
Goal is to describe the data to desired precision.Code signal by linear superposition of basis functions:
x = !a1s1 + !a2s2 + · · · + !aLsL + !"
= As + !
• x(t) is represented by a vector x• !ai are the basis vectors• A is the basis (could be Fourier, wavelet, etc.)• si are the coe!cients
Can solve for s in the no noise case
s = A!1x
Computational Perception and Scene Analysis, Jan 29, 2004 / Michael S. Lewicki, CMU !! !
!
? 19
Michael S. Lewicki ◇ Carnegie MellonCP08:: Auditory Coding 1
An information theoretic approachInformation Theoretic Approaches
Want algorithm to choose optimal A (basis matrix).
Generative model for data is:x = As + !
Probability of pattern x given representation s
P (x|A, s) ! f(x"As,!, I)
Computational Perception and Scene Analysis, Jan 29, 2004 / Michael S. Lewicki, CMU !! !
!
? 207
Michael S. Lewicki ◇ Carnegie MellonCP08:: Auditory Coding 1
Learning objectiveLearning Objective
Objective: maximize coding e!ciency! maximize probability of data ensemble
Probability of pattern ensemble is:
P (x1,x2, ...,xN |A) =!
k
P (xk|A)
Computational Perception and Scene Analysis, Jan 29, 2004 / Michael S. Lewicki, CMU !! !
!
? 228
Michael S. Lewicki ◇ Carnegie MellonCP08:: Auditory Coding 1
Optimal coding of an acoustic waveformE!cient coding: focus on coding waveform directly
• We do not assume a Fourier or spectral representation.• Goal:
Predict optimal transformation of acoutsic waveformfrom statistics of the acoustic environment.
• Use a simple model: bank of linear filters
Computational Perception and Scene Analysis, Jan 29, 2004 / Michael S. Lewicki, CMU !! !
!
? 239
Michael S. Lewicki ◇ Carnegie MellonCP08:: Auditory Coding 1
Goal : Encode the patters to desiredprecision:
x = !a1s1 + · · · + !aLsL + !"
= As + !
Posterior :
P (s|x,A) =P (s)P (x|s,A)
P (x|A)
Prior : si’s are independent and sparse:
P (s) =!
i
P (si)
P (si) ! exp""
####si
#i
####qi$
Coding patterns with a statistical model
x = !a1s1 + !a2s2
!a1
!a2
Michael S. Lewicki ◇ Carnegie MellonCP08:: Auditory Coding 1
Goal : Encode the patters to desiredprecision:
x = !a1s1 + · · · + !aLsL + !"
= As + !
Posterior :
P (s|x,A) =P (s)P (x|s,A)
P (x|A)
Prior : si’s are independent and sparse:
P (s) =!
i
P (si)
P (si) ! exp""
####si
#i
####qi$
Coding patterns with a statistical model
q=4
q=2 (Gaussian)
q=1 (Laplacian)
q=0.5
Michael S. Lewicki ◇ Carnegie MellonCP08:: Auditory Coding 1
Goal : Encode the patters to desiredprecision:
x = !a1s1 + · · · + !aLsL + !"
= As + !
Posterior :
P (s|x,A) =P (s)P (x|s,A)
P (x|A)
Prior : si’s are independent and sparse:
P (s) =!
i
P (si)
P (si) ! exp""
####si
#i
####qi$
Likelihood : Assume ! # Gaussian,
P (x|s,!) ! exp""1
2!T!!1!
$
Inference: use the MAP value:
s = arg maxs
P (s|x,A)
Simple special case: no noise (ICA)
s = A!1x
Inference (or recognition or coding):
finds most e!cient representation ofpattern x in a given basis A
Coding patterns with a statistical model
Michael S. Lewicki ◇ Carnegie MellonCP08:: Auditory Coding 1
Learning: Optimizing the model parameters
Learning objective:
maximize coding e!ciency! maximize P (x|A) over A.
Probability of pattern ensemble is:
P (x1,x2, ...,xN |A) =!
k
P (xk|A)
P (x|A) is obtained by marginalization:
P (x|A) ="
dsP (x|A, s)P (s)
=P (s)
|detA|
Use independent component analysis(ICA) to learn A:
!A " AAT !!A log P (x|A)
= #A(zsT # I) ,
where z = (log P (s))!. Assumegeneralized Gaussians:
P (si) $ N qi(si|µ,!).
This learning rule:
• learns the feature set that captures themost structure
• optimizes basis to maximize thee!ciency of the code
Michael S. Lewicki ◇ Carnegie MellonCP08:: Auditory Coding 1
Learning the optimal codes
Learning procedure:
- random sound segments (8 msec)
- optimize features using ICA
What sounds to use?
Predict optimal transformation of sound waveform from statistics of the acoustic environment
Goal:
Use a variety of sound ensembles:
- non-harmonic environmental sounds (e.g. footsteps, stream sounds, etc.)
- animal vocalizations (rainforest mammals, e.g. chirps, screeches, cries, etc.)
- speech (samples from 100 male & female speakers from the TIMIT corpus)
What tasks are auditory systems adapted to do?
- localization ⇒ environmental sounds
- communication ⇒ vocalizations
- general sound recognition
14
Michael S. Lewicki ◇ Carnegie MellonCP08:: Auditory Coding 1
vocalizations
environmental sounds
transient ambient
Natural sounds
15
fox
squirrel
walking on leaves
rustling leaves
cracking branches
stream by waterfall
Michael S. Lewicki ◇ Carnegie MellonCP08:: Auditory Coding 1
vocalizations
environmental sounds
transient ambient
Natural sounds
16
fox
squirrel
walking on leaves
rustling leaves
cracking branches
stream by waterfall
Michael S. Lewicki ◇ Carnegie MellonCP08:: Auditory Coding 1
vocalizations
environmental sounds
transient ambient
Natural sounds
17
fox
squirrel
walking on leaves
rustling leaves
cracking branches
stream by waterfall
Michael S. Lewicki ◇ Carnegie MellonCP08:: Auditory Coding 1
vocalizations
environmental sounds
transient ambient
Natural sounds
18
fox
squirrel
walking on leaves
rustling leaves
cracking branches
stream by waterfall
Michael S. Lewicki ◇ Carnegie MellonCP08:: Auditory Coding 1
vocalizations
environmental sounds
transient ambient
Natural sounds
19
fox
squirrel
walking on leaves
rustling leaves
cracking branches
stream by waterfall
Michael S. Lewicki ◇ Carnegie MellonCP08:: Auditory Coding 1
vocalizations
environmental sounds
transient ambient
Natural sounds
20
fox
squirrel
walking on leaves
rustling leaves
cracking branches
stream by waterfall
Michael S. Lewicki ◇ Carnegie MellonCP08:: Auditory Coding 1
vocalizations
environmental sounds
transient ambient
Natural sounds
21
fox
squirrel
walking on leaves
rustling leaves
cracking branches
stream by waterfall
Michael S. Lewicki ◇ Carnegie MellonCP08:: Auditory Coding 1
Optimal linear filters for natural sounds
environmental sounds vocalizations
speech
The optimal code depends on theclass of sounds being encoded: - a wavelet-like transform is best for environmental sounds - a Fourier-like transform is best for vocalizations - an intermediate transform is best for speech or general natural sounds
23
Michael S. Lewicki ◇ Carnegie MellonCP08:: Auditory Coding 1
Characterizing the filter population
0 2 4 6 8time (ms)
!"# $ % # $!!&!!'!!%!!$!!
()*+,*-./012345!"# $ % # $!
!&!!'!!%!!$!!
()*+,*-./012345
0 2 4 6 8time (ms)
time-frequency distributions
24
Michael S. Lewicki ◇ Carnegie MellonCP08:: Auditory Coding 1
Schematic time-frequency distributions
time time
Fourier typical wavelet
frequ
ency
25
0 2 4 6 80
1
2
3
4
5
6
7
time (ms)
frequ
ency
(kHz
)
0 2 4 6 80
1
2
3
4
5
6
7
8
time (ms)
frequ
ency
(kHz
)
0 2 4 6 80
1
2
3
4
5
6
7
time (ms)
frequ
ency
(kHz
)
Env.Sounds
Speech
Vocalizations
0 2 4 6 8time (ms)
0 2 4 6 8time (ms)
0.5 1 2 5 10−40−30−20−10
0
frequency (kHz)
0 2 4 6 8time (ms)
0.5 1 2 5 10−40−30−20−10
0
frequency (kHz)0.5 1 2 5 10
−40−30−20−10
0
frequency (kHz)
0.5 1 2 5 10−40−30−20−10
0
frequency (kHz)0.5 1 2 5 10
−40−30−20−10
0
frequency (kHz)0.5 1 2 5 10
−40−30−20−10
0
frequency (kHz)
0 2 4 6 8time (ms)
0 2 4 6 8time (ms)
0 2 4 6 8time (ms)
0 2 4 6 8time (ms)
0 2 4 6 8time (ms)
0.5 1 2 5 10−40−30−20−10
0
frequency (kHz)0.5 1 2 5 10
−40−30−20−10
0
frequency (kHz)
0 2 4 6 8time (ms)
0.5 1 2 5 10−40−30−20−10
0
frequency (kHz)
vocalizations
mag
nitu
de (d
B)
speech
mag
nitu
de (d
B)m
agni
tude
(dB)
env. sounds
Michael S. Lewicki ◇ Carnegie MellonCP08:: Auditory Coding 1
0.2 0.5 1 2 5 10 20
1
2
5
10
20
center frequency (kHz)
Q10
dB
+ vocalizationso speech/combinedx environmental sounds
Theory
0.2 0.5 1 2 5 10 20
1
2
5
10
20
center frequency (kHz)
Q10
dB
Theory
0.2 0.5 1 2 5 10 20
1
2
5
10
20
characteristic frequency (kHz)
Q10
dB
Evans, 1975Rhode and Smith, 1985
0.2 0.5 1 2 5 10 20
1
2
5
10
20
characteristic frequency (kHz)
Q10
dB
Evans, 1975Rhode and Smith, 1985
Comparison to cat auditory nerve data
Q10dB = fc/w10dB
Data
Filter sharpness:
28
Next time:non-linear coding