computational perceptionlewicki/cp-s08/auditory-coding1.pdf · what principles should guide the...

29
Computational Perception 15-485/785 Auditory Coding 1 February 5, 2008

Upload: others

Post on 10-Aug-2020

3 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Computational Perceptionlewicki/cp-s08/auditory-coding1.pdf · What principles should guide the choice of representation? 5 Unsupervised approaches: • find useful “features”

Computational Perception15-485/785

Auditory Coding 1

February 5, 2008

Page 2: Computational Perceptionlewicki/cp-s08/auditory-coding1.pdf · What principles should guide the choice of representation? 5 Unsupervised approaches: • find useful “features”

Michael S. Lewicki ◇ Carnegie MellonCP08:: Auditory Coding 1

What are the problems of sensory coding?

• What should the sensor sense?

• How is energy transduced?

• How to deal with noise?

• How to compress dynamic range?

• How to prevent the sensor from being damaged?

2

Page 3: Computational Perceptionlewicki/cp-s08/auditory-coding1.pdf · What principles should guide the choice of representation? 5 Unsupervised approaches: • find useful “features”

Michael S. Lewicki ◇ Carnegie MellonCP08:: Auditory Coding 1

Two approaches to the study of systems

1. Experimental/behavioral approach:

• describe and characterize behavior

• understand range and limitations

• investigate system properties and organization

• develop theories to better understand functional roles

2. Theoretical/computational approach:

• define problem

• develop models and algorithms

• understand range and limitations

• develop more algorithms: more general/specialized; faster/less resources

3

Page 4: Computational Perceptionlewicki/cp-s08/auditory-coding1.pdf · What principles should guide the choice of representation? 5 Unsupervised approaches: • find useful “features”

Michael S. Lewicki ◇ Carnegie MellonCP08:: Auditory Coding 1

The complexity of the auditory system

from Warren, 1999

The cochlea and inner ear

4

Page 5: Computational Perceptionlewicki/cp-s08/auditory-coding1.pdf · What principles should guide the choice of representation? 5 Unsupervised approaches: • find useful “features”

Michael S. Lewicki ◇ Carnegie MellonCP08:: Auditory Coding 1

What principles should guide the choice of representation?

5

Unsupervised approaches:

• find useful “features”

• adapted to the patterns of interest

• useful in a wide range of tasks

Supervised approaches:

• Maximize performance on given task

At low-levels, we have to use unsupervised approaches.

Page 6: Computational Perceptionlewicki/cp-s08/auditory-coding1.pdf · What principles should guide the choice of representation? 5 Unsupervised approaches: • find useful “features”

Michael S. Lewicki ◇ Carnegie MellonCP08:: Auditory Coding 1

Linear superposition

6

Linear superposition

Goal is to describe the data to desired precision.Code signal by linear superposition of basis functions:

x = !a1s1 + !a2s2 + · · · + !aLsL + !"

= As + !

• x(t) is represented by a vector x• !ai are the basis vectors• A is the basis (could be Fourier, wavelet, etc.)• si are the coe!cients

Can solve for s in the no noise case

s = A!1x

Computational Perception and Scene Analysis, Jan 29, 2004 / Michael S. Lewicki, CMU !! !

!

? 19

Page 7: Computational Perceptionlewicki/cp-s08/auditory-coding1.pdf · What principles should guide the choice of representation? 5 Unsupervised approaches: • find useful “features”

Michael S. Lewicki ◇ Carnegie MellonCP08:: Auditory Coding 1

An information theoretic approachInformation Theoretic Approaches

Want algorithm to choose optimal A (basis matrix).

Generative model for data is:x = As + !

Probability of pattern x given representation s

P (x|A, s) ! f(x"As,!, I)

Computational Perception and Scene Analysis, Jan 29, 2004 / Michael S. Lewicki, CMU !! !

!

? 207

Page 8: Computational Perceptionlewicki/cp-s08/auditory-coding1.pdf · What principles should guide the choice of representation? 5 Unsupervised approaches: • find useful “features”

Michael S. Lewicki ◇ Carnegie MellonCP08:: Auditory Coding 1

Learning objectiveLearning Objective

Objective: maximize coding e!ciency! maximize probability of data ensemble

Probability of pattern ensemble is:

P (x1,x2, ...,xN |A) =!

k

P (xk|A)

Computational Perception and Scene Analysis, Jan 29, 2004 / Michael S. Lewicki, CMU !! !

!

? 228

Page 9: Computational Perceptionlewicki/cp-s08/auditory-coding1.pdf · What principles should guide the choice of representation? 5 Unsupervised approaches: • find useful “features”

Michael S. Lewicki ◇ Carnegie MellonCP08:: Auditory Coding 1

Optimal coding of an acoustic waveformE!cient coding: focus on coding waveform directly

• We do not assume a Fourier or spectral representation.• Goal:

Predict optimal transformation of acoutsic waveformfrom statistics of the acoustic environment.

• Use a simple model: bank of linear filters

Computational Perception and Scene Analysis, Jan 29, 2004 / Michael S. Lewicki, CMU !! !

!

? 239

Page 10: Computational Perceptionlewicki/cp-s08/auditory-coding1.pdf · What principles should guide the choice of representation? 5 Unsupervised approaches: • find useful “features”

Michael S. Lewicki ◇ Carnegie MellonCP08:: Auditory Coding 1

Goal : Encode the patters to desiredprecision:

x = !a1s1 + · · · + !aLsL + !"

= As + !

Posterior :

P (s|x,A) =P (s)P (x|s,A)

P (x|A)

Prior : si’s are independent and sparse:

P (s) =!

i

P (si)

P (si) ! exp""

####si

#i

####qi$

Coding patterns with a statistical model

x = !a1s1 + !a2s2

!a1

!a2

Page 11: Computational Perceptionlewicki/cp-s08/auditory-coding1.pdf · What principles should guide the choice of representation? 5 Unsupervised approaches: • find useful “features”

Michael S. Lewicki ◇ Carnegie MellonCP08:: Auditory Coding 1

Goal : Encode the patters to desiredprecision:

x = !a1s1 + · · · + !aLsL + !"

= As + !

Posterior :

P (s|x,A) =P (s)P (x|s,A)

P (x|A)

Prior : si’s are independent and sparse:

P (s) =!

i

P (si)

P (si) ! exp""

####si

#i

####qi$

Coding patterns with a statistical model

q=4

q=2 (Gaussian)

q=1 (Laplacian)

q=0.5

Page 12: Computational Perceptionlewicki/cp-s08/auditory-coding1.pdf · What principles should guide the choice of representation? 5 Unsupervised approaches: • find useful “features”

Michael S. Lewicki ◇ Carnegie MellonCP08:: Auditory Coding 1

Goal : Encode the patters to desiredprecision:

x = !a1s1 + · · · + !aLsL + !"

= As + !

Posterior :

P (s|x,A) =P (s)P (x|s,A)

P (x|A)

Prior : si’s are independent and sparse:

P (s) =!

i

P (si)

P (si) ! exp""

####si

#i

####qi$

Likelihood : Assume ! # Gaussian,

P (x|s,!) ! exp""1

2!T!!1!

$

Inference: use the MAP value:

s = arg maxs

P (s|x,A)

Simple special case: no noise (ICA)

s = A!1x

Inference (or recognition or coding):

finds most e!cient representation ofpattern x in a given basis A

Coding patterns with a statistical model

Page 13: Computational Perceptionlewicki/cp-s08/auditory-coding1.pdf · What principles should guide the choice of representation? 5 Unsupervised approaches: • find useful “features”

Michael S. Lewicki ◇ Carnegie MellonCP08:: Auditory Coding 1

Learning: Optimizing the model parameters

Learning objective:

maximize coding e!ciency! maximize P (x|A) over A.

Probability of pattern ensemble is:

P (x1,x2, ...,xN |A) =!

k

P (xk|A)

P (x|A) is obtained by marginalization:

P (x|A) ="

dsP (x|A, s)P (s)

=P (s)

|detA|

Use independent component analysis(ICA) to learn A:

!A " AAT !!A log P (x|A)

= #A(zsT # I) ,

where z = (log P (s))!. Assumegeneralized Gaussians:

P (si) $ N qi(si|µ,!).

This learning rule:

• learns the feature set that captures themost structure

• optimizes basis to maximize thee!ciency of the code

Page 14: Computational Perceptionlewicki/cp-s08/auditory-coding1.pdf · What principles should guide the choice of representation? 5 Unsupervised approaches: • find useful “features”

Michael S. Lewicki ◇ Carnegie MellonCP08:: Auditory Coding 1

Learning the optimal codes

Learning procedure:

- random sound segments (8 msec)

- optimize features using ICA

What sounds to use?

Predict optimal transformation of sound waveform from statistics of the acoustic environment

Goal:

Use a variety of sound ensembles:

- non-harmonic environmental sounds (e.g. footsteps, stream sounds, etc.)

- animal vocalizations (rainforest mammals, e.g. chirps, screeches, cries, etc.)

- speech (samples from 100 male & female speakers from the TIMIT corpus)

What tasks are auditory systems adapted to do?

- localization ⇒ environmental sounds

- communication ⇒ vocalizations

- general sound recognition

14

Page 15: Computational Perceptionlewicki/cp-s08/auditory-coding1.pdf · What principles should guide the choice of representation? 5 Unsupervised approaches: • find useful “features”

Michael S. Lewicki ◇ Carnegie MellonCP08:: Auditory Coding 1

vocalizations

environmental sounds

transient ambient

Natural sounds

15

fox

squirrel

walking on leaves

rustling leaves

cracking branches

stream by waterfall

Page 16: Computational Perceptionlewicki/cp-s08/auditory-coding1.pdf · What principles should guide the choice of representation? 5 Unsupervised approaches: • find useful “features”

Michael S. Lewicki ◇ Carnegie MellonCP08:: Auditory Coding 1

vocalizations

environmental sounds

transient ambient

Natural sounds

16

fox

squirrel

walking on leaves

rustling leaves

cracking branches

stream by waterfall

Page 17: Computational Perceptionlewicki/cp-s08/auditory-coding1.pdf · What principles should guide the choice of representation? 5 Unsupervised approaches: • find useful “features”

Michael S. Lewicki ◇ Carnegie MellonCP08:: Auditory Coding 1

vocalizations

environmental sounds

transient ambient

Natural sounds

17

fox

squirrel

walking on leaves

rustling leaves

cracking branches

stream by waterfall

Page 18: Computational Perceptionlewicki/cp-s08/auditory-coding1.pdf · What principles should guide the choice of representation? 5 Unsupervised approaches: • find useful “features”

Michael S. Lewicki ◇ Carnegie MellonCP08:: Auditory Coding 1

vocalizations

environmental sounds

transient ambient

Natural sounds

18

fox

squirrel

walking on leaves

rustling leaves

cracking branches

stream by waterfall

Page 19: Computational Perceptionlewicki/cp-s08/auditory-coding1.pdf · What principles should guide the choice of representation? 5 Unsupervised approaches: • find useful “features”

Michael S. Lewicki ◇ Carnegie MellonCP08:: Auditory Coding 1

vocalizations

environmental sounds

transient ambient

Natural sounds

19

fox

squirrel

walking on leaves

rustling leaves

cracking branches

stream by waterfall

Page 20: Computational Perceptionlewicki/cp-s08/auditory-coding1.pdf · What principles should guide the choice of representation? 5 Unsupervised approaches: • find useful “features”

Michael S. Lewicki ◇ Carnegie MellonCP08:: Auditory Coding 1

vocalizations

environmental sounds

transient ambient

Natural sounds

20

fox

squirrel

walking on leaves

rustling leaves

cracking branches

stream by waterfall

Page 21: Computational Perceptionlewicki/cp-s08/auditory-coding1.pdf · What principles should guide the choice of representation? 5 Unsupervised approaches: • find useful “features”

Michael S. Lewicki ◇ Carnegie MellonCP08:: Auditory Coding 1

vocalizations

environmental sounds

transient ambient

Natural sounds

21

fox

squirrel

walking on leaves

rustling leaves

cracking branches

stream by waterfall

Page 22: Computational Perceptionlewicki/cp-s08/auditory-coding1.pdf · What principles should guide the choice of representation? 5 Unsupervised approaches: • find useful “features”
Page 23: Computational Perceptionlewicki/cp-s08/auditory-coding1.pdf · What principles should guide the choice of representation? 5 Unsupervised approaches: • find useful “features”

Michael S. Lewicki ◇ Carnegie MellonCP08:: Auditory Coding 1

Optimal linear filters for natural sounds

environmental sounds vocalizations

speech

The optimal code depends on theclass of sounds being encoded: - a wavelet-like transform is best for environmental sounds - a Fourier-like transform is best for vocalizations - an intermediate transform is best for speech or general natural sounds

23

Page 24: Computational Perceptionlewicki/cp-s08/auditory-coding1.pdf · What principles should guide the choice of representation? 5 Unsupervised approaches: • find useful “features”

Michael S. Lewicki ◇ Carnegie MellonCP08:: Auditory Coding 1

Characterizing the filter population

0 2 4 6 8time (ms)

!"# $ % # $!!&!!'!!%!!$!!

()*+,*-./012345!"# $ % # $!

!&!!'!!%!!$!!

()*+,*-./012345

0 2 4 6 8time (ms)

time-frequency distributions

24

Page 25: Computational Perceptionlewicki/cp-s08/auditory-coding1.pdf · What principles should guide the choice of representation? 5 Unsupervised approaches: • find useful “features”

Michael S. Lewicki ◇ Carnegie MellonCP08:: Auditory Coding 1

Schematic time-frequency distributions

time time

Fourier typical wavelet

frequ

ency

25

Page 26: Computational Perceptionlewicki/cp-s08/auditory-coding1.pdf · What principles should guide the choice of representation? 5 Unsupervised approaches: • find useful “features”

0 2 4 6 80

1

2

3

4

5

6

7

time (ms)

frequ

ency

(kHz

)

0 2 4 6 80

1

2

3

4

5

6

7

8

time (ms)

frequ

ency

(kHz

)

0 2 4 6 80

1

2

3

4

5

6

7

time (ms)

frequ

ency

(kHz

)

Env.Sounds

Speech

Vocalizations

Page 27: Computational Perceptionlewicki/cp-s08/auditory-coding1.pdf · What principles should guide the choice of representation? 5 Unsupervised approaches: • find useful “features”

0 2 4 6 8time (ms)

0 2 4 6 8time (ms)

0.5 1 2 5 10−40−30−20−10

0

frequency (kHz)

0 2 4 6 8time (ms)

0.5 1 2 5 10−40−30−20−10

0

frequency (kHz)0.5 1 2 5 10

−40−30−20−10

0

frequency (kHz)

0.5 1 2 5 10−40−30−20−10

0

frequency (kHz)0.5 1 2 5 10

−40−30−20−10

0

frequency (kHz)0.5 1 2 5 10

−40−30−20−10

0

frequency (kHz)

0 2 4 6 8time (ms)

0 2 4 6 8time (ms)

0 2 4 6 8time (ms)

0 2 4 6 8time (ms)

0 2 4 6 8time (ms)

0.5 1 2 5 10−40−30−20−10

0

frequency (kHz)0.5 1 2 5 10

−40−30−20−10

0

frequency (kHz)

0 2 4 6 8time (ms)

0.5 1 2 5 10−40−30−20−10

0

frequency (kHz)

vocalizations

mag

nitu

de (d

B)

speech

mag

nitu

de (d

B)m

agni

tude

(dB)

env. sounds

Page 28: Computational Perceptionlewicki/cp-s08/auditory-coding1.pdf · What principles should guide the choice of representation? 5 Unsupervised approaches: • find useful “features”

Michael S. Lewicki ◇ Carnegie MellonCP08:: Auditory Coding 1

0.2 0.5 1 2 5 10 20

1

2

5

10

20

center frequency (kHz)

Q10

dB

+ vocalizationso speech/combinedx environmental sounds

Theory

0.2 0.5 1 2 5 10 20

1

2

5

10

20

center frequency (kHz)

Q10

dB

Theory

0.2 0.5 1 2 5 10 20

1

2

5

10

20

characteristic frequency (kHz)

Q10

dB

Evans, 1975Rhode and Smith, 1985

0.2 0.5 1 2 5 10 20

1

2

5

10

20

characteristic frequency (kHz)

Q10

dB

Evans, 1975Rhode and Smith, 1985

Comparison to cat auditory nerve data

Q10dB = fc/w10dB

Data

Filter sharpness:

28

Page 29: Computational Perceptionlewicki/cp-s08/auditory-coding1.pdf · What principles should guide the choice of representation? 5 Unsupervised approaches: • find useful “features”

Next time:non-linear coding