
Page 1: Latent Variable Framework for Modeling and Separating

COGNITIVE AND NEURAL SYSTEMS, BOSTON UNIVERSITY
Boston, Massachusetts

Madhusudana Shashanka
[email protected]

17 August 2007
Dissertation Defense

Latent Variable Framework for Modeling and Separating Single-Channel Acoustic Sources

Committee
Chair: Prof. Daniel Bullock
Readers: Prof. Barbara Shinn-Cunningham, Dr. Paris Smaragdis, Prof. Frank Guenther
Reviewers: Dr. Bhiksha Raj, Prof. Eric Schwartz

Page 2: Latent Variable Framework for Modeling and Separating

Outline

• Introduction
• Time-Frequency Structure Background
• Latent Variable Decomposition: A Probabilistic Framework (Contributions)
• Sparse Overcomplete Decomposition (Contributions)
• Conclusions

Page 3: Latent Variable Framework for Modeling and Separating


Introduction

The achievements of the ear are indeed fabulous. While I am writing, my elder son rattles the fire rake in the stove, the infant babbles contentedly in his baby carriage, the church clock strikes the hour, …… In the vibrations of air striking my ear, all these sounds are superimposed into a single extremely complex stream of pressure waves. Without doubt the achievements of the ear are greater than those of the eye.

Wolfgang Metzger, in Gesetze des Sehens (1953)
Abridged in English and quoted by Reinier Plomp (2002)


Page 4: Latent Variable Framework for Modeling and Separating


Cocktail Party Effect


(Cocktail Party by SLAW, Maniscalco Gallery. From slides of Prof. Shinn-Cunningham, ARO 2006)

Colin Cherry (1953)

Our ability to follow one speaker in the presence of other sounds.

The auditory system separates the input into distinct auditory objects.

A challenging problem from a computational perspective.

Page 5: Latent Variable Framework for Modeling and Separating


Cocktail Party Effect

• Fundamental questions
  – How does the brain solve it?
  – Is it possible to build a machine capable of solving it in a satisfactory manner?
    • Need not mimic the brain

• Two cases
  – Multi-channel (the human auditory system is an example, with two sensors)
  – Single-channel


Page 6: Latent Variable Framework for Modeling and Separating


Source Separation: Formulation

• Given just the mixture x(t), how do we separate the sources s_j(t)?

  x(t) = \sum_{j=1}^{N} s_j(t)

• Problem: Indeterminacy. There are multiple ways in which the source signals s_1(t), …, s_N(t) can be reconstructed from the available information.

Page 7: Latent Variable Framework for Modeling and Separating


Source Separation: Approaches

• Exact solutions are not possible, but approximations can be obtained by utilizing information about the problem

• Psychoacoustically/biologically inspired approach
  – Understand how the auditory system solves the problem
  – Utilize the insights gained (rules and heuristics) in the artificial system

• Engineering approach
  – Utilize probability and signal processing theories to take advantage of known or hypothesized structure/statistics of the source signals and/or the mixing process

Page 8: Latent Variable Framework for Modeling and Separating


Source Separation: Approaches

• Psychoacoustically inspired approach
  – Seminal work of Bregman (1990): Auditory Scene Analysis (ASA)
  – Computational Auditory Scene Analysis (CASA): computational implementations of the views outlined by Bregman (Rosenthal and Okuno, 1998)
  – Limitations: how to reconcile subjective concepts (e.g. "similarity", "continuity") with strictly deterministic computational platforms? Difficulty incorporating statistical information

• Engineering approach
  – Most work has focused on multi-channel signals
  – Blind Source Separation: beamforming and ICA
  – Unsuitable for single-channel signals

Page 9: Latent Variable Framework for Modeling and Separating


Source Separation: This Work

• We take a machine learning approach in a supervised setting
  – Assumption: one or more sources present in the mixture are "known"
  – Analyze sample waveforms of the known sources and extract characteristics unique to each one
  – Utilize the learned information for source separation and other applications

• Focus on developing a probabilistic framework for modeling single-channel sounds
  – Computational perspective; the goal is not to explain human auditory processing
  – Provide a framework grounded in theory that allows principled extensions
  – The aim is not just to build a particular separation system

Page 10: Latent Variable Framework for Modeling and Separating


Outline

• Introduction
• Time-Frequency Structure
  – We need a representation of audio to proceed
• Latent Variable Decomposition: A Probabilistic Framework
• Sparse Overcomplete Decomposition
• Conclusions

Page 11: Latent Variable Framework for Modeling and Separating


Representation

Time-domain representation: a sampled waveform, where each sample represents the sound pressure level at a particular time instant (amplitude vs. time).

Time-Frequency (TF) representation: shows the energy in TF bins, explicitly displaying the variation along time and frequency (frequency vs. time).

Page 12: Latent Variable Framework for Modeling and Separating


TF Representations


Short Time Fourier Transform (STFT; Gabor, 1946)

• Time-frames: successive fixed-width snippets of the waveform (windowed and overlapping)

• Spectrogram: Fourier transforms of all time-frames; the result for a given frame is a spectral vector

• Other TF representations are possible (different filter banks), but only the STFT is considered in this work:
  • Constant-Q (Brown, 1991)
  • Gammatone (Patterson et al., 1995)
  • Gamma-chirp (Irino and Patterson, 1997)
  • TF distributions (Laughlin et al., 1994)

(Example spectrograms: piano, cymbals, female speaker.)
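To make the representation concrete, here is a minimal numpy sketch of a magnitude spectrogram (the window length, hop size, and Hann window are illustrative choices, not necessarily the settings used in this work):

```python
import numpy as np

def magnitude_spectrogram(x, n_fft=1024, hop=256):
    """Magnitude STFT of a mono waveform x (1-D float array)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    # successive fixed-width, windowed, overlapping snippets of the waveform
    frames = np.stack([x[i * hop : i * hop + n_fft] * window
                       for i in range(n_frames)], axis=1)
    # Fourier transform of each frame; each column is one spectral vector
    return np.abs(np.fft.rfft(frames, axis=0))   # shape: (n_fft // 2 + 1, n_frames)
```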

Page 13: Latent Variable Framework for Modeling and Separating


Magnitude Spectrograms


Short Time Fourier Transform (STFT; Gabor, 1946)

• Magnitude spectrograms: TF entries represent energy-like quantities that can be approximated as adding linearly in the case of sound mixtures

• Phase information is ignored. Enough information is present in the magnitude spectrogram; a simple test:
  • Speech reconstructed with the phase of cymbals
  • Cymbals reconstructed with the phase of a piano

Page 14: Latent Variable Framework for Modeling and Separating


TF Masks

Time-Frequency Masks
• Popular in the CASA literature

• Assign higher weight to areas of the spectrogram where the target is dominant

• Intuition: the dominant source masks the energy of weaker ones in any TF bin, so the "dominant" TF bins alone are sufficient for reconstruction

• Reformulate the problem: the goal is to estimate the TF mask (Ideal Binary Mask; Wang, 2005)

• Utilize cues like harmonicity, F0 continuity, common onsets/offsets, etc.:
  – Synchrony strands (Cooke, 1991)
  – TF maps (Brown and Cooke, 1994)
  – Correlograms (Weintraub, 1985; Slaney and Lyon, 1990)

(Figure: target, mixture, and "masked" mixture spectrograms.)
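The ideal binary mask itself is a one-liner; a sketch, assuming the target and interference magnitude spectrograms are available (which is exactly what makes the mask "ideal" rather than directly computable from the mixture):

```python
import numpy as np

def ideal_binary_mask(target_mag, interference_mag):
    """1 in TF bins where the target dominates, 0 elsewhere."""
    return (target_mag > interference_mag).astype(float)

# Reconstruction keeps only the dominant bins:
#   masked_stft = ideal_binary_mask(T, I) * mixture_stft
```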

Page 15: Latent Variable Framework for Modeling and Separating


TF Masks

Time-Frequency Masks: Limitations

• Implementations of "fuzzy" rules and heuristics from ASA; ad hoc methods with difficulty incorporating statistical information (Roweis, 2000)

• Assumption: energy is sparsely distributed, i.e. different sources are disjoint in their spectro-temporal content (Yilmaz and Rickard, 2004)
  – Performs well only on mixtures that exhibit well-defined regions in the TF plane corresponding to the various sources (van der Kouwe et al., 2001)

Page 16: Latent Variable Framework for Modeling and Separating


Basis Decomposition Methods

Basis Decomposition

• Idea: an observed data vector can be expressed as a linear combination of a set of "basis components"

• Data vectors → spectral vectors

• Intuition: every source exhibits characteristic structure that can be captured by a finite set of components

Each data vector v_t is a weighted sum of basis components w_k with mixture weights h_{kt}:

  v_t = \sum_{k=1}^{K} h_{kt} w_k,   i.e.   V = WH
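In matrix form, the decomposition is just a factorization of the F × T spectrogram; a toy sketch with arbitrary dimensions:

```python
import numpy as np

F, T, K = 513, 200, 20       # frequency bins, time frames, basis components
W = np.random.rand(F, K)     # basis components, one per column
H = np.random.rand(K, T)     # mixture weights, one column per frame
V = W @ H                    # column t of V is sum_k H[k, t] * W[:, k]
```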

Page 17: Latent Variable Framework for Modeling and Separating


Basis Decomposition Methods

Basis Decomposition Methods

• Many matrix factorization methods are available, e.g. PCA, ICA

• Toy example: PCA components can have negative values

• But spectrogram values are non-negative; what is the interpretation of negative components?

(Figure: a spectrogram and its PCA components, which contain both positive and negative values.)

Page 18: Latent Variable Framework for Modeling and Separating


Basis Decomposition Methods

• Non-negative Matrix Factorization (Lee and Seung, 1999)
• Explicitly enforces non-negativity on both factored matrices
• Useful for analyzing spectrograms (Smaragdis, 2004; Virtanen, 2006)

• Issues
  – Cannot incorporate prior biases
  – Restricted to 2D representations

(Figure: a spectrogram factored into basis components W and mixture weights H.)
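For reference, a sketch of Lee and Seung's multiplicative updates for the KL-divergence cost, the variant typically used on spectrograms (initialization and iteration count are arbitrary choices):

```python
import numpy as np

def nmf_kl(V, K, n_iter=200, eps=1e-12):
    """Factor a non-negative V (F x T) as W @ H with multiplicative KL updates."""
    rng = np.random.default_rng(0)
    W = rng.random((V.shape[0], K)) + eps
    H = rng.random((K, V.shape[1])) + eps
    for _ in range(n_iter):
        W *= ((V / (W @ H + eps)) @ H.T) / (H.sum(axis=1) + eps)
        H *= (W.T @ (V / (W @ H + eps))) / (W.sum(axis=0)[:, None] + eps)
    return W, H
```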

Page 19: Latent Variable Framework for Modeling and Separating


Outline

• Introduction
• Time-Frequency Structure
• Latent Variable Decomposition: A Probabilistic Framework
  – Our alternative approach: latent variable decomposition, treating spectrograms as histograms
• Sparse Overcomplete Decomposition
• Conclusions

Page 20: Latent Variable Framework for Modeling and Separating


Latent Variables

• Widely used in the social and behavioral sciences
  – Traced back to Spearman (1904) and factor-analytic models for intelligence testing

• Latent Class Models (Lazarsfeld and Henry, 1968)
  – Principle of local independence (or the common-cause criterion)
  – If a latent variable underlies a number of observed variables, the observed variables conditioned on the latent variable should be independent

Page 21: Latent Variable Framework for Modeling and Separating


Spectrograms as Histograms

Generative Model
• Spectral vectors represent energy at various frequency bins

• Model them as histograms of multiple draws from a frame-specific multinomial distribution over frequencies

• Each draw → "a quantum of energy"

(Figure: urn analogy for frame t. Pick a ball, note its color and update the histogram, place it back. The t-th spectral vector is the histogram of draws from the underlying multinomial distribution P_t(f).)
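A minimal sketch of one frame of this generative process (toy sizes; numpy's samplers stand in for the urn):

```python
import numpy as np

rng = np.random.default_rng(0)
F = 8                                          # number of frequency bins
P_t = rng.dirichlet(np.ones(F))                # frame-specific multinomial P_t(f)
V_t = 1000                                     # total energy = number of draws
spectral_vector = rng.multinomial(V_t, P_t)    # the frame, as a histogram
```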

Page 22: Latent Variable Framework for Modeling and Separating


Model

(Figure: the frame-specific distribution P_t(f) over frequency f.)

Page 23: Latent Variable Framework for Modeling and Separating


Model

(Figure: latent urns P(f|z), mixed with weights P_t(z).)

Generative Model
• Mixture multinomial:

  P_t(f) = \sum_z P(f|z) P_t(z)

• Procedure
  – Pick a latent variable z (an urn) according to P_t(z)
  – Pick a frequency f from that urn according to P(f|z)
  – Repeat the process V_t times, where V_t is the total energy in the t-th frame

Page 24: Latent Variable Framework for Modeling and Separating

Model

Generative Model
• Mixture multinomial:

  P_t(f) = \sum_z P(f|z) P_t(z)

  where P_t(f) is the frame-specific spectral distribution, P(f|z) are the source-specific basis components, and P_t(z) are the frame-specific mixture weights.

• Procedure
  – Pick a latent variable z (an urn) according to P_t(z)
  – Pick a frequency f from that urn according to P(f|z)
  – Repeat the process V_t times, where V_t is the total energy in the t-th frame


Page 28: Latent Variable Framework for Modeling and Separating


The mixture multinomial as a point in a simplex

(Figure: the basis components P(f|z) and the mixture multinomial P_t(f) as points in a simplex.)

Page 29: Latent Variable Framework for Modeling and Separating


Learning the Model

Analysis

• Given the spectrogram V, estimate the parameters P(f|z) and P_t(z)

• The basis components P(f|z) represent the latent structure; they underlie all the frames and hence characterize the source

  P_t(f) = \sum_z P(f|z) P_t(z)

  (frame-specific spectral distribution; source-specific basis components; frame-specific mixture weights)

Page 30: Latent Variable Framework for Modeling and Separating


Learning the Model: Geometry

(Figure: 3 basis vectors in a simplex, showing the simplex boundary, data points, basis vectors, and their convex hull.)

• Spectral distributions and basis components are points in a simplex

• Estimation process: find corners of a convex hull that surrounds the normalized spectral vectors in the simplex

Page 31: Latent Variable Framework for Modeling and Separating


Learning the Model: Parameter Estimation

• Expectation-Maximization Algorithm

E-step:

  P_t(z|f) = \frac{P_t(z) P(f|z)}{\sum_{z'} P_t(z') P(f|z')}

M-step:

  P_t(z) = \frac{\sum_f V_{ft} P_t(z|f)}{\sum_{z'} \sum_f V_{ft} P_t(z'|f)}

  P(f|z) = \frac{\sum_t V_{ft} P_t(z|f)}{\sum_{f'} \sum_t V_{f't} P_t(z|f')}

where V_{ft} are the entries of the training spectrogram.
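These updates vectorize directly; a minimal numpy sketch (posteriors held as an F × K × T array, which is fine at sketch scale; initialization and iteration count are arbitrary):

```python
import numpy as np

def latent_variable_decomposition(V, K, n_iter=200, eps=1e-12):
    """EM for P_t(f) = sum_z P(f|z) P_t(z), given an F x T spectrogram V."""
    F, T = V.shape
    rng = np.random.default_rng(0)
    W = rng.random((F, K)); W /= W.sum(axis=0)       # P(f|z), columns sum to 1
    H = rng.random((K, T)); H /= H.sum(axis=0)       # P_t(z), columns sum to 1
    for _ in range(n_iter):
        # E-step: posterior P_t(z|f) for every (f, t)
        R = W[:, :, None] * H[None, :, :]            # F x K x T
        R /= R.sum(axis=1, keepdims=True) + eps
        # M-step: reweight the posteriors by the observed energies V_ft
        VR = V[:, None, :] * R
        W = VR.sum(axis=2); W /= W.sum(axis=0) + eps
        H = VR.sum(axis=0); H /= H.sum(axis=0) + eps
    return W, H
```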

Page 32: Latent Variable Framework for Modeling and Separating


Example Bases

(Figure: example basis components P(f|z), indexed by z and shown over frequency f, learned from speech, harp, and piano spectrograms.)

Page 33: Latent Variable Framework for Modeling and Separating


Source Separation

(Audio examples: speech and harp.)

Page 34: Latent Variable Framework for Modeling and Separating


Source Separation

(Figure: a speech + harp mixture, together with the learned speech bases and harp bases.)

Page 35: Latent Variable Framework for Modeling and Separating


Source Separation

• Mixture spectrogram model: a linear combination of the individual sources

  P_t(f) = P_t(s_1) P(f|s_1) + P_t(s_2) P(f|s_2)

  P_t(f) = P_t(s_1) \sum_{z \in \{z_{s_1}\}} P_t(z|s_1) P_{s_1}(f|z) + P_t(s_2) \sum_{z \in \{z_{s_2}\}} P_t(z|s_2) P_{s_2}(f|z)

Page 36: Latent Variable Framework for Modeling and Separating


Source Separation

• Expectation-Maximization Algorithm, with the basis components P_s(f|z) kept fixed at their learned values:

  P_t(s, z|f) = \frac{P_t(s) P_t(z|s) P_s(f|z)}{\sum_{s'} P_t(s') \sum_{z' \in \{z_{s'}\}} P_t(z'|s') P_{s'}(f|z')}

  P_t(s) = \frac{\sum_f V_{ft} \sum_{z \in \{z_s\}} P_t(s, z|f)}{\sum_{s'} \sum_f V_{ft} \sum_{z \in \{z_{s'}\}} P_t(s', z|f)}

  P_t(z|s) = \frac{\sum_f V_{ft} P_t(s, z|f)}{\sum_{z' \in \{z_s\}} \sum_f V_{ft} P_t(s, z'|f)}

• The contribution of source s is

  P_t(f, s) = P_t(s) \sum_{z \in \{z_s\}} P_t(z|s) P_s(f|z)

where V_{ft} are the entries of the mixture spectrogram.

Page 37: Latent Variable Framework for Modeling and Separating


Source Separation

Given the mixture model

  P_t(f) = P_t(s_1) \sum_{z \in \{z_{s_1}\}} P_t(z|s_1) P_{s_1}(f|z) + P_t(s_2) \sum_{z \in \{z_{s_2}\}} P_t(z|s_2) P_{s_2}(f|z)

each source is reconstructed by redistributing the mixture spectrogram V_{ft} according to its share P_t(f, s) of the total:

  \hat{V}_{ft}(s) = \frac{P_t(f, s)}{P_t(f)} V_{ft}
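A sketch of the separation step (it assumes two or more sources with pre-learned bases, passed as F × K_s arrays; for compactness it folds P_t(s) and P_t(z|s) into a single weight vector over all bases, an equivalent reparameterization of P_t(s, z) = P_t(s) P_t(z|s)):

```python
import numpy as np

def separate(V, W_list, n_iter=200, eps=1e-12):
    """Separate mixture spectrogram V (F x T) given fixed per-source bases."""
    W = np.concatenate(W_list, axis=1)        # all sources' bases, side by side
    K, T = W.shape[1], V.shape[1]
    H = np.full((K, T), 1.0 / K)              # joint weights P_t(s, z)
    for _ in range(n_iter):
        R = W[:, :, None] * H[None, :, :]     # E-step posteriors, F x K x T
        R /= R.sum(axis=1, keepdims=True) + eps
        H = (V[:, None, :] * R).sum(axis=0)   # M-step: weights only,
        H /= H.sum(axis=0) + eps              # the bases stay fixed
    P_tf = W @ H + eps                        # P_t(f) for every frame
    outs, k0 = [], 0
    for Ws in W_list:                         # per-source redistribution
        k1 = k0 + Ws.shape[1]
        P_tfs = W[:, k0:k1] @ H[k0:k1, :]     # P_t(f, s)
        outs.append(V * P_tfs / P_tf)         # V_hat_ft(s)
        k0 = k1
    return outs
```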

Page 38: Latent Variable Framework for Modeling and Separating


Source Separation

With V the magnitude spectrogram of the mixture, Φ the phase of the mixture, O the magnitude of the original signal, and R the magnitude of the reconstruction:

  g(X) = 10 \log_{10} \left( \frac{\sum_{f,t} O_{ft}^2}{\sum_{f,t} |O_{ft} e^{j\Phi_{ft}} - X_{ft} e^{j\Phi_{ft}}|^2} \right)

  SNR = g(R) - g(V)

(Audio: the mixture and the two reconstructions.)

SNR improvements: 5.25 dB and 5.30 dB
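The metric is a direct transcription (array names are placeholders; all arrays share the mixture's shape):

```python
import numpy as np

def g(X, O, Phi):
    """10 log10 of reference energy over residual energy, sharing the phase Phi."""
    num = np.sum(O ** 2)
    den = np.sum(np.abs(O * np.exp(1j * Phi) - X * np.exp(1j * Phi)) ** 2)
    return 10.0 * np.log10(num / den)

# SNR improvement of a reconstruction R over the raw mixture V:
#   snr_gain = g(R, O, Phi) - g(V, O, Phi)
```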

Page 39: Latent Variable Framework for Modeling and Separating


Source Separation: Semi-Supervised

• Separation is possible even if only one source is "known"
  – Bases of the other source are estimated during separation

• "Raise My Rent" by David Gilmour
  – Background music bases learned from 5 seconds of music-only clips in the song
  – Lead guitar bases learned from the rest of the song

• "Sunrise" by Norah Jones
  – Harder: the wave file is clipped
  – Background music bases learned from 5 seconds of music-only segments of the song

• More examples: http://cns.bu.edu/~mvss/courses/speechseg/

• Denoising: noisy recording = speech + noise, with only the speech known

Page 40: Latent Variable Framework for Modeling and Separating


Outline

• Introduction
• Time-Frequency Structure
• Latent Variable Decomposition: A Probabilistic Framework
• Sparse Overcomplete Decomposition
  – Learning more structure than the dimensionality will allow
• Conclusions

Page 41: Latent Variable Framework for Modeling and Separating


Limitation of the Framework

• Real signals exhibit complex spectral structure
  – The number of components required to model this structure could potentially be large
  – However, the latent variable framework has a limitation: the number of components that can be extracted is limited by the number of frequency bins in the TF representation (an arbitrary choice that has no relation to the ground truth)
  – Extracting an overcomplete set of components leads to the problem of indeterminacy

Page 42: Latent Variable Framework for Modeling and Separating


Overcompleteness: Geometry

(Figure: simplex plots of the same data with 3, 4, 7, and 10 basis vectors; simplex boundary and data points shown.)

• Overcomplete case
  – As the number of bases increases, the basis components migrate towards the corners of the simplex
  – They represent the data accurately, but lose data-specificity

Page 43: Latent Variable Framework for Modeling and Separating


Indeterminacy in the Overcomplete Case

• Multiple solutions have zero error → indeterminacy

Page 44: Latent Variable Framework for Modeling and Separating


Sparse Coding

• Restriction → use the fewest number of corners
  – At least three are required for an accurate representation
  – The number of possible solutions is greatly reduced (e.g. ABD, ACE, ACD, ABE, ABG, ACG, ACF), but still indeterminate
  – Instead, we minimize the entropy of the mixture weights

Page 45: Latent Variable Framework for Modeling and Separating


Sparsity

• Sparsity originated as a theoretical basis for sensory coding (Kanerva, 1988; Field, 1994; Olshausen and Field, 1996)
  – Following Attneave (1954) and Barlow (1959, 1961) in using information-theoretic principles to understand perception
  – Has utility in computational models and engineering methods

• How to measure sparsity?
  – Fewer active components → more sparsity
    • Number of non-zero mixture weights, i.e. the L0 norm
  – L0 is hard to optimize; L1 (or L2 in certain cases) is used as an approximation
  – We use the entropy of the mixture weights as the measure

Page 46: Latent Variable Framework for Modeling and Separating


Learning Sparse Codes: Entropic Prior

• Model: P_t(f) = \sum_z P(f|z) P_t(z)
  – Estimate P_t(z) such that the entropy of P_t(z) is minimized

• Impose an entropic prior on P_t(z) (Brand, 1999):

  P_e(\theta) \propto e^{-\beta H(\theta)}

  – where H(\theta) = -\sum_i \theta_i \log \theta_i is the entropy
  – \beta is the sparsity parameter that can be controlled
  – for \beta > 0, mixture weights P_t(z) with high entropy are penalized with low probability
  – the MAP formulation is solved using Lambert's W function
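The sparse M-step replaces the plain normalization of the mixture weights. A hedged sketch (my own transcription of the fixed point, following Brand, 1999): for one frame, let ω_z = Σ_f V_ft P_t(z|f) be the expected counts; the stationarity condition ω_z/θ_z + β log θ_z + β − λ = 0 is solved on the lower real branch of Lambert's W, alternating with a re-estimate of the Lagrange multiplier λ. The iteration count and clipping are pragmatic choices.

```python
import numpy as np
from scipy.special import lambertw

def sparse_weights(omega, beta, n_iter=20, eps=1e-12):
    """MAP mixture weights theta under the entropic prior exp(-beta * H(theta)).

    Solves theta_z = -(omega_z/beta) / W_{-1}(-(omega_z/beta) * e^{1 - lam/beta}).
    """
    theta = omega / (omega.sum() + eps)          # start from the ML estimate
    for _ in range(n_iter):
        # Lagrange multiplier implied by the current estimate
        lam = omega.sum() + beta + beta * np.sum(theta * np.log(theta + eps))
        arg = -(omega / beta) * np.exp(1.0 - lam / beta)
        arg = np.clip(arg, -1.0 / np.e, -eps)    # keep W_{-1} real-valued
        theta = -(omega / beta) / np.real(lambertw(arg, k=-1))
        theta = np.maximum(theta, eps)
        theta /= theta.sum()                     # back onto the simplex
    return theta
```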

Page 47: Latent Variable Framework for Modeling and Separating


Geometry of Sparse Coding

• Sparse overcomplete case
  – Sparse mixture weights → spectral vectors must be close to a small number of corners, forcing the convex hull to be compact
  – The simplex formed by the bases shrinks to fit the data

(Figure: simplex plots with 7 basis vectors for sparsity parameters 0.01, 0.05, 0.1, and 0.3; simplex boundary and data points shown.)

Page 48: Latent Variable Framework for Modeling and Separating

Sparse-coding: Illustration

(Figure: 3 basis components P(f|z), over frequency f, and their mixture weights P_t(z), over time t, learned with no sparsity.)

Page 49: Latent Variable Framework for Modeling and Separating

Sparse-coding: Illustration

(Figure: 4 basis components P(f|z), over frequency f, and their mixture weights P_t(z), over time t, learned with no sparsity.)

Page 50: Latent Variable Framework for Modeling and Separating

Sparse-coding: Illustration

(Figure: 4 basis components P(f|z), over frequency f, and their mixture weights P_t(z), over time t, learned with sparse mixture weights.)

Page 51: Latent Variable Framework for Modeling and Separating


Speech Bases

(Figure: speech bases over frequency f. Top: trained without sparse mixture weights, a compact code. Bottom: trained with sparse mixture weights, a sparse-overcomplete code.)

Page 52: Latent Variable Framework for Modeling and Separating


Entropy Trade-off

Sparse-coding Geometry

• Sparse mixture weights → bases which are holistic representations of the data

• A decrease in mixture-weight entropy → an increase in basis-component entropy; the components become more "informative"
  – Empirical evidence (figure below)

(Figure: average entropy of P(f|z) and average entropy of P_t(z) as functions of the sparsity parameter β, for 750 and 1000 components.)

Page 53: Latent Variable Framework for Modeling and Separating


Source Separation: Results

Red: CC, compact code, 100 components
Blue: SC, sparse-overcomplete code, 1000 components, β = 0.7

(Figure: separated spectrograms for two mixtures.)

SNR improvements per source:
  Mixture 1: CC 5.25 dB, 5.30 dB; SC 8.33 dB, 8.02 dB
  Mixture 2: CC 3.82 dB, 3.80 dB; SC 9.16 dB, 8.90 dB

Page 54: Latent Variable Framework for Modeling and Separating


Source Separation: Results

Results

• The sparse-overcomplete code leads to better separation

• Separation quality increases with increasing sparsity before tapering off at high sparsity values (β > 0.7)

(Figure: SNR and SER in dB vs. sparsity parameter β for Mixture 1, with 1000 and 750 basis vectors.)

Page 55: Latent Variable Framework for Modeling and Separating


Other Applications

• The framework is general; it operates on non-negative data
  – Text data (word counts), images, etc.

• Examples

Page 56: Latent Variable Framework for Modeling and Separating


Other Applications

• The framework is general; it operates on non-negative data
  – Text data (word counts), images, etc.

• Image examples: feature extraction

(Figure: extracted image features, panels (a)-(e), for the settings α = β = 0; α = 0.05, β = 0; α = −150, β = 0; α = 0, β = 0.3; and α = 0, β = −10.)

Page 57: Latent Variable Framework for Modeling and Separating


Other Applications

• The framework is general; it operates on non-negative data
  – Text data (word counts), images, etc.

• Image examples: hand-written digit classification

(Figure: a pixel-image data vector expressed in terms of basis components (basis vectors) and mixture weights; bases shown for β = 0, 0.05, 0.2, and 0.5.)

Page 58: Latent Variable Framework for Modeling and Separating


Other Applications

• The framework is general; it operates on non-negative data
  – Text data (word counts), images, etc.

• Image examples: hand-written digit classification

(Figure: percentage classification error vs. sparsity parameter, for 25, 50, 75, 100, and 200 components.)

Page 59: Latent Variable Framework for Modeling and Separating


Outline

• Introduction
• Time-Frequency Structure
• Latent Variable Decomposition: A Probabilistic Framework
• Sparse Overcomplete Decomposition
• Conclusions
  – In conclusion…

Page 60: Latent Variable Framework for Modeling and Separating


Thesis Contributions

• Modeling single-channel acoustic signals, with important applications in various fields

• Provides a probabilistic framework, amenable to principled extensions and improvements

• Incorporates the idea of sparse coding in the framework
• Points to other extensions, in the form of priors
• Theoretical analysis of models and algorithms
• Applicability to other data domains

• Six refereed publications in international conferences and workshops (ICASSP, ICA, NIPS), two manuscripts under review (IEEE TPAMI, NIPS)

Page 61: Latent Variable Framework for Modeling and Separating


Future Work

• Representation
  – Other TF representations (e.g. constant-Q, gammatone)
  – Multidimensional representations (correlograms, higher-order spectra)
  – Utilize phase information in the representation

• Model and Theory

• Applications

Page 62: Latent Variable Framework for Modeling and Separating


Future Work

• Representation

• Model and Theory
  – Employ priors on parameters to impose known/hypothesized structure (Dirichlet, mixture Dirichlet, logistic normal)
  – Explicitly model time structure using HMMs or other dynamic models
  – Utilize the discriminative learning paradigm
  – Extract components that form independent subspaces, which could be used for unsupervised separation
  – Relation between sparse decomposition and non-negative ICA
  – Extensions/improvements to inference algorithms (e.g. tempered EM)

• Applications

Page 63: Latent Variable Framework for Modeling and Separating


Future Work

• Representation

• Model and Theory

• Applications
  – Other audio applications such as music transcription, speaker recognition, audio classification, language identification, etc.
  – Explore applications in data mining, text semantic analysis, brain-imaging analysis, radiology, chemical spectral analysis, etc.

Page 64: Latent Variable Framework for Modeling and Separating


Acknowledgements

• Prof. Barbara Shinn-Cunningham
• Dr. Bhiksha Raj and Dr. Paris Smaragdis
• Thesis committee members
• Faculty/staff at CNS and the Hearing Research Center
• Scientists/staff at Mitsubishi Electric Research Laboratories
• Friends and well-wishers

  – Supported in part by the Air Force Office of Scientific Research (AFOSR FA9550-04-1-0260), the National Institutes of Health (NIH R01 DC05778), the National Science Foundation (NSF SBE-0354378), and the Office of Naval Research (ONR N00014-01-1-0624)
  – Arts and Sciences Dean's Fellowship, Teaching Fellowship
  – Internships at Mitsubishi Electric Research Laboratories

Page 65: Latent Variable Framework for Modeling and Separating


Additional Slides

Page 66: Latent Variable Framework for Modeling and Separating


Publications

Refereed Publications and Manuscripts Under Review

• MVS Shashanka, B Raj, P Smaragdis. "Probabilistic Latent Variable Model for Sparse Decompositions of Non-negative Data," submitted to IEEE Transactions on Pattern Analysis and Machine Intelligence

• MVS Shashanka, B Raj, P Smaragdis. "Sparse Overcomplete Latent Variable Decomposition of Counts Data," submitted to NIPS 2007

• P Smaragdis, B Raj, MVS Shashanka. "Supervised and Semi-Supervised Separation of Sounds from Single-Channel Mixtures," Intl. Conf. on ICA, London, Sep 2007

• MVS Shashanka, B Raj, P Smaragdis. "Sparse Overcomplete Decomposition for Single Channel Speaker Separation," Intl. Conf. on Acoustics, Speech and Signal Processing, Honolulu, Apr 2007

• B Raj, R Singh, MVS Shashanka, P Smaragdis. "Bandwidth Expansion with a Polya Urn Model," Intl. Conf. on Acoustics, Speech and Signal Processing, Honolulu, Apr 2007

• B Raj, P Smaragdis, MVS Shashanka, R Singh. "Separating a Foreground Singer from Background Music," Intl. Symposium on Frontiers of Research on Speech and Music, Mysore, India, Jan 2007

• P Smaragdis, B Raj, MVS Shashanka. "A Probabilistic Latent Variable Model for Acoustic Modeling," Workshop on Advances in Models for Acoustic Processing, NIPS 2006

• B Raj, MVS Shashanka, P Smaragdis. "Latent Dirichlet Decomposition for Single Channel Speaker Separation," Intl. Conf. on Acoustics, Speech and Signal Processing, Paris, May 2006
