computational perceptionlewicki/cp-s08/auditory-coding1.pdf · what principles should guide the...

Computational Perception15-485/785

Auditory Coding 1

February 5, 2008

Michael S. Lewicki ◇ Carnegie MellonCP08:: Auditory Coding 1

What are the problems of sensory coding?

• What should the sensor sense?

• How is energy transduced?

• How to deal with noise?

• How to compress dynamic range?

• How to prevent the sensor from being damaged?

2


Two approaches to the study of systems

1. Experimental/behavioral approach:

• describe and characterize behavior

• understand range and limitations

• investigate system properties and organization

• develop theories to better understand functional roles

2. Theoretical/computational approach:

• define problem

• develop models and algorithms

• understand range and limitations

• develop more algorithms: more general/specialized; faster/less resources

3


The complexity of the auditory system

from Warren, 1999

The cochlea and inner ear

4


What principles should guide the choice of representation?

5

Unsupervised approaches:

• find useful “features”

• adapted to the patterns of interest

• useful in a wide range of tasks

Supervised approaches:

• Maximize performance on given task

At low-levels, we have to use unsupervised approaches.


Linear superposition

6

Linear superposition

Goal is to describe the data to desired precision.Code signal by linear superposition of basis functions:

x = !a1s1 + !a2s2 + · · · + !aLsL + !"

= As + !

• x(t) is represented by a vector x• !ai are the basis vectors• A is the basis (could be Fourier, wavelet, etc.)• si are the coe!cients

Can solve for s in the no noise case

s = A!1x

Computational Perception and Scene Analysis, Jan 29, 2004 / Michael S. Lewicki, CMU !! !

!

? 19


An information theoretic approachInformation Theoretic Approaches

Want algorithm to choose optimal A (basis matrix).

Generative model for data is:x = As + !

Probability of pattern x given representation s

P (x|A, s) ! f(x"As,!, I)


!

? 207


Learning objectiveLearning Objective

Objective: maximize coding e!ciency! maximize probability of data ensemble

Probability of pattern ensemble is:

P (x1,x2, ...,xN |A) =!

k

P (xk|A)


!

? 228


Optimal coding of an acoustic waveformE!cient coding: focus on coding waveform directly

• We do not assume a Fourier or spectral representation.• Goal:

Predict optimal transformation of acoutsic waveformfrom statistics of the acoustic environment.

• Use a simple model: bank of linear filters


!

? 239


Goal : Encode the patters to desiredprecision:

x = !a1s1 + · · · + !aLsL + !"

= As + !

Posterior :

P (s|x,A) =P (s)P (x|s,A)

P (x|A)

Prior : si’s are independent and sparse:

P (s) =!

i

P (si)

P (si) ! exp""

####si

#i

####qi$

Coding patterns with a statistical model

x = !a1s1 + !a2s2

!a1

!a2



x = !a1s1 + · · · + !aLsL + !"

= As + !

Posterior :


P (x|A)


P (s) =!

i

P (si)

P (si) ! exp""

####si

#i

####qi$


q=4

q=2 (Gaussian)

q=1 (Laplacian)

q=0.5



x = !a1s1 + · · · + !aLsL + !"

= As + !

Posterior :


P (x|A)


P (s) =!

i

P (si)

P (si) ! exp""

####si

#i

####qi$

Likelihood : Assume ! # Gaussian,

P (x|s,!) ! exp""1

2!T!!1!

$

Inference: use the MAP value:

s = arg maxs

P (s|x,A)

Simple special case: no noise (ICA)

s = A!1x

Inference (or recognition or coding):

finds most e!cient representation ofpattern x in a given basis A



Learning: Optimizing the model parameters

Learning objective:

maximize coding e!ciency! maximize P (x|A) over A.

Probability of pattern ensemble is:

P (x1,x2, ...,xN |A) =!

k

P (xk|A)

P (x|A) is obtained by marginalization:

P (x|A) ="

dsP (x|A, s)P (s)

=P (s)

|detA|

Use independent component analysis(ICA) to learn A:

!A " AAT !!A log P (x|A)

= #A(zsT # I) ,

where z = (log P (s))!. Assumegeneralized Gaussians:

P (si) $ N qi(si|µ,!).

This learning rule:

• learns the feature set that captures themost structure

• optimizes basis to maximize thee!ciency of the code


Learning the optimal codes

Learning procedure:

- random sound segments (8 msec)

- optimize features using ICA

What sounds to use?

Predict optimal transformation of sound waveform from statistics of the acoustic environment

Goal:

Use a variety of sound ensembles:

- non-harmonic environmental sounds (e.g. footsteps, stream sounds, etc.)

- animal vocalizations (rainforest mammals, e.g. chirps, screeches, cries, etc.)

- speech (samples from 100 male & female speakers from the TIMIT corpus)

What tasks are auditory systems adapted to do?

- localization ⇒ environmental sounds

- communication ⇒ vocalizations

- general sound recognition

14


vocalizations

environmental sounds

transient ambient

Natural sounds

15

fox

squirrel

walking on leaves

rustling leaves

cracking branches

stream by waterfall


vocalizations


transient ambient

Natural sounds

16

fox

squirrel

walking on leaves

rustling leaves

cracking branches

stream by waterfall


vocalizations


transient ambient

Natural sounds

17

fox

squirrel

walking on leaves

rustling leaves

cracking branches

stream by waterfall


vocalizations


transient ambient

Natural sounds

18

fox

squirrel

walking on leaves

rustling leaves

cracking branches

stream by waterfall


vocalizations


transient ambient

Natural sounds

19

fox

squirrel

walking on leaves

rustling leaves

cracking branches

stream by waterfall


vocalizations


transient ambient

Natural sounds

20

fox

squirrel

walking on leaves

rustling leaves

cracking branches

stream by waterfall


vocalizations


transient ambient

Natural sounds

21

fox

squirrel

walking on leaves

rustling leaves

cracking branches

stream by waterfall


Optimal linear filters for natural sounds

environmental sounds vocalizations

speech

The optimal code depends on theclass of sounds being encoded: - a wavelet-like transform is best for environmental sounds - a Fourier-like transform is best for vocalizations - an intermediate transform is best for speech or general natural sounds

23


Characterizing the filter population

0 2 4 6 8time (ms)

!"# $ % # $!!&!!'!!%!!$!!

()*+,*-./012345!"# $ % # $!

!&!!'!!%!!$!!

()*+,*-./012345

0 2 4 6 8time (ms)

time-frequency distributions

24


Schematic time-frequency distributions

time time

Fourier typical wavelet

frequ

ency

25

0 2 4 6 80

1

2

3

4

5

6

7

time (ms)

frequ

ency

(kHz

)

0 2 4 6 80

1

2

3

4

5

6

7

8

time (ms)

frequ

ency

(kHz

)

0 2 4 6 80

1

2

3

4

5

6

7

time (ms)

frequ

ency

(kHz

)

Env.Sounds

Speech

Vocalizations

0 2 4 6 8time (ms)

0 2 4 6 8time (ms)

0.5 1 2 5 10−40−30−20−10

0

frequency (kHz)

0 2 4 6 8time (ms)

0.5 1 2 5 10−40−30−20−10

0

frequency (kHz)0.5 1 2 5 10

−40−30−20−10

0

frequency (kHz)

0.5 1 2 5 10−40−30−20−10

0


−40−30−20−10

0


−40−30−20−10

0

frequency (kHz)

0 2 4 6 8time (ms)

0 2 4 6 8time (ms)

0 2 4 6 8time (ms)

0 2 4 6 8time (ms)

0 2 4 6 8time (ms)

0.5 1 2 5 10−40−30−20−10

0


−40−30−20−10

0

frequency (kHz)

0 2 4 6 8time (ms)

0.5 1 2 5 10−40−30−20−10

0

frequency (kHz)

vocalizations

mag

nitu

de (d

B)

speech

mag

nitu

de (d

B)m

agni

tude

(dB)

env. sounds


0.2 0.5 1 2 5 10 20

1

2

5

10

20

center frequency (kHz)

Q10

dB

+ vocalizationso speech/combinedx environmental sounds

Theory

0.2 0.5 1 2 5 10 20

1

2

5

10

20

center frequency (kHz)

Q10

dB

Theory

0.2 0.5 1 2 5 10 20

1

2

5

10

20

characteristic frequency (kHz)

Q10

dB

Evans, 1975Rhode and Smith, 1985

0.2 0.5 1 2 5 10 20

1

2

5

10

20

characteristic frequency (kHz)

Q10

dB

Evans, 1975Rhode and Smith, 1985

Comparison to cat auditory nerve data

Q10dB = fc/w10dB

Data

Filter sharpness:

28

Next time:non-linear coding

computational perceptionlewicki/cp-s08/auditory-coding1.pdf · what principles should guide the...

Documents