
A Framework for Complex Probabilistic Latent Semantic Analysis and its Application to Single-Channel Source Separation
Brian King, [email protected]
Advised by Les Atlas
Electrical Engineering, University of Washington
This research was funded by the Air Force Office of Scientific Research


Problem Statement
Develop a theoretical framework for complex probabilistic latent semantic analysis (CPLSA) and its application to single-channel source separation. I will discuss the limitations of current methods and why my method overcomes them.

Outline
Introduction
Background
My current contributions
Proposed work

Nonnegative Matrix Factorization (NMF)

X_{f,t} ≈ B_{f,k} W_{k,t}
where f indexes frequency, t indexes time, and k is the basis index.

[1] D.D. Lee and H.S. Seung, Algorithms for Non-Negative Matrix Factorization, Neural Information Processing Systems, 2001, pp. 556-562.

Using Matrix Factorization for Source Separation
Find bases: take the STFT* of individual training signals x_indiv and factor |X_indiv| to learn bases B.
Find weights: take the STFT of the mixture x_mixed and, holding B fixed, estimate the weights W.
Separation: partition B and W by source to form spectrogram estimates Y1 and Y2, then apply the ISTFT** to obtain the time-domain signals y1 and y2.

*Short-Time Fourier Transform
**Inverse Short-Time Fourier Transform

Using Matrix Factorization for Synthesis / Source Separation
Matrix factorization: X ≈ BW, with bases B_{f,k} and weights W_{k,t}.
Synthesis: a single synthesized signal Y_{f,t} = BW.
Source separation: split the factors into (B1, W1) and (B2, W2), giving separated signals Y1 = B1 W1 and Y2 = B2 W2.
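A minimal sketch of the separation step, assuming per-source factors (B1, W1) and (B2, W2) have already been estimated. The soft (Wiener-style) mask and the reuse of the mixture phase are common choices in NMF separation, not necessarily the exact reconstruction rule used in the talk:

```python
import numpy as np

def separate(X_mix, B1, W1, B2, W2, eps=1e-12):
    """Rebuild two sources from a complex mixture STFT X_mix using
    per-source nonnegative factors: form magnitude models Bi @ Wi,
    apply a soft mask, and reuse the mixture phase."""
    M1 = B1 @ W1                     # magnitude model, source 1
    M2 = B2 @ W2                     # magnitude model, source 2
    total = M1 + M2 + eps
    Y1 = (M1 / total) * X_mix        # complex estimate of source 1
    Y2 = (M2 / total) * X_mix        # complex estimate of source 2
    return Y1, Y2
```

Because the two masks sum to one, the source estimates add back up to the mixture, which is one way this scheme sidesteps (rather than solves) the superposition problem discussed later.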

NMF Cost Function: Frobenius Norm with Sparsity
f(B, W) = ||X − BW||²_F + λ ||W||₁
where X_{f,t} ≈ B_{f,k} W_{k,t}. The first term is the squared Frobenius norm of the reconstruction error; the second is an L1 sparsity penalty on the weights.
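A minimal sketch of NMF under this cost, using Lee and Seung's multiplicative updates with the L1 term folded into the denominator of the weight update; the function name and defaults are illustrative, not from the talk:

```python
import numpy as np

def nmf(X, K, n_iter=200, lam=0.0, seed=0):
    """Sparse NMF: approximately minimize ||X - BW||_F^2 + lam*sum(W)
    with multiplicative updates, which keep B and W nonnegative."""
    rng = np.random.default_rng(seed)
    F, T = X.shape
    B = rng.random((F, K)) + 1e-3    # nonnegative random init
    W = rng.random((K, T)) + 1e-3
    eps = 1e-12                      # guard against division by zero
    for _ in range(n_iter):
        W *= (B.T @ X) / (B.T @ B @ W + lam + eps)
        B *= (X @ W.T) / (B @ W @ W.T + eps)
    return B, W
```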

Probabilistic Latent Semantic Analysis (PLSA)
Views the magnitude spectrogram as a joint probability distribution.


[2] M. Shashanka, B. Raj, and P. Smaragdis, Probabilistic Latent Variable Models as Nonnegative Factorizations, Computational Intelligence and Neuroscience, vol. 2008, 2008, pp. 1-9.

Probabilistic Latent Semantic Analysis (PLSA)
Uses the following generative model:
Pick a time, P(t)
Pick a base from that time, P(k|t)
Pick a frequency of that base, P(f|k)
Increment the chosen (f, t) bin by one
Repeat
Can be written as P(f, t) = P(t) Σ_k P(f|k) P(k|t).
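The generative steps above can be sketched directly; the function name and argument layout are illustrative:

```python
import numpy as np

def plsa_sample(Pt, Pk_given_t, Pf_given_k, n_draws, seed=0):
    """Draw a count 'spectrogram' from the PLSA generative model:
    pick t ~ P(t), then k ~ P(k|t), then f ~ P(f|k), increment
    bin (f, t), and repeat n_draws times."""
    rng = np.random.default_rng(seed)
    F, K = Pf_given_k.shape
    T = len(Pt)
    X = np.zeros((F, T))
    for _ in range(n_draws):
        t = rng.choice(T, p=Pt)
        k = rng.choice(K, p=Pk_given_t[:, t])
        f = rng.choice(F, p=Pf_given_k[:, k])
        X[f, t] += 1
    return X
```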

Probabilistic Latent Semantic Analysis (PLSA): Relationship to NMF
The factors are multinomial distributions:
P(t) is the sum of all magnitude at time t (normalized)
P(k|t) is similar to the weight matrix W_{k,t}
P(f|k) is similar to the basis matrix B_{f,k}

NMF: X_{f,t} ≈ B_{f,k} W_{k,t}
PLSA: P(f, t) = P(t) Σ_k P(f|k) P(k|t), where the latent variable (often written z in the literature) plays the role of NMF's basis index k.
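The correspondence can be made concrete by renormalizing NMF factors into PLSA's probabilities; this helper (my naming, not from the talk) preserves the product up to a global normalization:

```python
import numpy as np

def nmf_to_plsa(B, W, eps=1e-12):
    """Renormalize nonnegative NMF factors B (F x K) and W (K x T)
    into PLSA's multinomials P(f|k), P(k|t), P(t)."""
    col = B.sum(axis=0) + eps
    Pf_given_k = B / col                 # each basis sums to 1 over f
    W_scaled = W * col[:, None]          # fold basis scale into weights
    tot = W_scaled.sum(axis=0) + eps     # total magnitude per time
    Pk_given_t = W_scaled / tot          # each column sums to 1 over k
    Pt = tot / tot.sum()                 # time marginal
    return Pf_given_k, Pk_given_t, Pt
```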

Probabilistic Latent Semantic Analysis: Extensibility
Advantage of PLSA over NMF: extensibility. The generative-model framework connects to a tremendous amount of applicable literature and allows extensions that are currently impossible in NMF, e.g.:
Entropic priors [2]
HMMs with state-dependent dictionaries [6]

[2] M. Shashanka, B. Raj, and P. Smaragdis, Probabilistic Latent Variable Models as Nonnegative Factorizations, Computational Intelligence and Neuroscience, vol. 2008, 2008, pp. 1-9.
[6] G.J. Mysore, A Non-Negative Framework for Joint Modeling of Spectral Structures and Temporal Dynamics in Sound Mixtures, PhD thesis, Stanford University, 2010.

...but superposition?
(Figure: original sources #1 and #2, their mixture, the proper separation, and the NMF separation.)
Sources superpose in the complex STFT domain, not in magnitude, so NMF's separation can differ from the proper separation. This is a core point: nonnegative is an unsuitable number space for mixtures.
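The failure of magnitude additivity is easy to verify numerically in a single STFT bin; this illustration is mine, not from the talk:

```python
import numpy as np

# Two STFT bins with equal magnitude but opposite phase cancel in the
# complex domain, so the mixture magnitude is far from the magnitude sum.
x1 = 1.0 * np.exp(1j * 0.0)
x2 = 1.0 * np.exp(1j * np.pi)   # opposite phase
mix = x1 + x2
# |x1 + x2| is ~0, while |x1| + |x2| is 2.0
```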

CMF Cost Function: Frobenius Norm with Sparsity
f(B, W, φ) = ||X − X̂||²_F + λ ||W||₁, where X̂_{f,t} = Σ_k B_{f,k} W_{k,t} e^{jφ_{f,k,t}}
Here X is the complex STFT, B and W are nonnegative, and each basis carries its own time-frequency phase φ. The first term is the squared Frobenius norm of the complex reconstruction error; the second is an L1 sparsity penalty on the nonnegative weights.
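A direct evaluation of this objective, assuming the phase is indexed per (f, k, t) as in Kameoka et al. [3]; the helper name is illustrative:

```python
import numpy as np

def cmf_cost(X, B, W, phi, lam):
    """CMF objective: ||X - Xhat||_F^2 + lam * ||W||_1, where
    Xhat[f,t] = sum_k B[f,k] * W[k,t] * exp(1j * phi[f,k,t])."""
    Xhat = np.einsum('fk,kt,fkt->ft', B, W, np.exp(1j * phi))
    return np.linalg.norm(X - Xhat) ** 2 + lam * np.abs(W).sum()
```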

[3] H. Kameoka, N. Ono, K. Kashino, and S. Sagayama, Complex NMF: A New Sparse Representation for Acoustic Signals, International Conference on Acoustics, Speech, and Signal Processing, 2009.

Comparing NMF and CMF via ASR: Introduction
Data:
Boston University Radio News Corpus [7]
150 utterances (72 minutes)
Two talkers synthetically mixed at 0 dB target-to-masker ratio
1 minute each of clean speech used for training
Recognizers:
Sphinx-3 (CMU)
SRI

[7] M. Ostendorf, The Boston University Radio Corpus, 1995.

Comparing NMF and CMF via ASR: Results
(Figure: word accuracy (%) for unprocessed, nonnegative, and complex processing; error bars mark the 95% confidence level.)
Even a few percentage points of improvement is significant, and these results challenge the traditional thinking that NMF performs satisfactorily and that ignoring phase is acceptable. At the same time, the low absolute accuracies point to the need for a better solution.

Comparing NMF and CMF via ASR: Conclusion
Incorporating phase estimates into matrix factorization can improve source separation performance.
Complex matrix factorization is worth further research.

[4] B. King and L. Atlas, Single-Channel Source Separation Using Complex Matrix Factorization, IEEE Transactions on Audio, Speech, and Language Processing (submitted).
[5] B. King and L. Atlas, Single-Channel Source Separation Using Simplified-Training Complex Matrix Factorization, International Conference on Acoustics, Speech, and Signal Processing, Dallas, TX, 2010.

...but overparameterization?
Allowing a free phase for every basis can result in a potentially infinite number of solutions, which is not a good thing. Example: estimating an observation with 3 bases (#1, #2, #3) can yield multiple decompositions with the same bases, weights, and sparsity.

Review of Current Methods
NMF: difficult to extend; a nonnegative model cannot represent superposition.
PLSA: extensible, but likewise cannot represent superposition.
CMF: models superposition, but is overparameterized (no unique solution).
?: what is needed is a method that is extensible, unique, and additive.

Proposed Solution: Complex Probabilistic Latent Semantic Analysis (CPLSA)
Goal: incorporate phase observation and estimation into the current nonnegative PLSA framework.
Implicitly solves: extensibility, superposition.
Proposed to solve: overparameterization.

Proposed Solution: Outline
Transform complex to nonnegative data
3 CPLSA variants
Phase constraints for STFT consistency (for a unique solution)

Transform Complex to Nonnegative Data
Why is this important? CPLSA models the observed data X_{f,t} as a probability mass function. PMFs are nonnegative and real, so the observation must first be transformed to be nonnegative and real.


Transform Complex to Nonnegative Data
Starting point: Shashanka [8], which maps N real values to N+1 nonnegative values.
Algorithm:
Project onto N+1-length orthogonal vectors (A_{N+1,N})
Apply an affine transform (for nonnegativity)
Normalize
My new, proposed method: N complex values → 2N real values (real and imaginary parts) → 2N+1 nonnegative values. This complex-to-nonnegative transform is a new contribution.

[8] M. Shashanka, Simplex Decompositions for Real-Valued Datasets, IEEE International Workshop on Machine Learning for Signal Processing, 2009, pp. 1-6.
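One way such a transform could look. This is an illustrative, invertible shift-and-normalize sketch in the spirit of the proposal, not the talk's exact algorithm; the parameters c (offset) and total (fixed sum) are my assumptions:

```python
import numpy as np

def complex_to_nonneg(x, c, total):
    """N complex -> 2N real (real/imag parts) -> 2N+1 nonnegative
    entries summing to 1: shift by c > max|.| for nonnegativity,
    then pad with a slack entry so the scale 'total' is recoverable."""
    v = np.concatenate([x.real, x.imag])       # 2N real values
    u = v + c                                  # nonnegative if c > max|v|
    slack = total - u.sum()                    # (2N+1)-th coordinate
    return np.concatenate([u, [slack]]) / total

def nonneg_to_complex(w, c, total, n):
    """Invert complex_to_nonneg."""
    v = w[:2 * n] * total - c
    return v[:n] + 1j * v[n:]
```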

Transform Complex to Nonnegative Data
(Figure: illustration of the complex-to-nonnegative transform.)

3 Variants of CPLSA
#1: Complex bases

Phase is associated with the bases; not a good model for the STFT.
#2: Nonnegative bases + base-dependent phases

Good model for audio, but overparameterized

3 Variants of CPLSA
#3: Nonnegative bases + source-dependent phases

Additive source model
Good model for audio
Fewer parameters (compare with CPLSA #2)
Simplifies to NMF in the single-source case


Phase Constraints for STFT Consistency
An STFT is consistent when it is the STFT of some time-domain signal, i.e., when STFT(ISTFT(Y)) = Y.

Incorporate STFT consistency [9] into the phase estimation step for the separated sources, yielding a unique solution.

[9] J. Le Roux, N. Ono, and S. Sagayama, Explicit Consistency Constraints for STFT Spectrograms and Their Application to Phase Reconstruction, 2008.
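Consistency can be checked numerically: a spectrogram is consistent precisely when re-analyzing its inverse reproduces it. A self-contained sketch using a 50%-overlap periodic Hann STFT (my implementation, not the talk's):

```python
import numpy as np

def do_stft(x, n=256):
    """STFT with a periodic Hann window at 50% overlap; the shifted
    windows sum to one, so plain overlap-add inverts it exactly."""
    hop = n // 2
    w = np.hanning(n + 1)[:-1]                  # periodic Hann
    pad = np.concatenate([np.zeros(hop), x, np.zeros(hop)])
    starts = range(0, len(pad) - n + 1, hop)
    return np.stack([np.fft.rfft(pad[s:s + n] * w) for s in starts], axis=1)

def do_istft(Y, n=256, length=None):
    """Inverse STFT by overlap-add of the inverse-FFT'd frames."""
    hop = n // 2
    T = Y.shape[1]
    y = np.zeros(hop * (T - 1) + n)
    for i in range(T):
        y[i * hop:i * hop + n] += np.fft.irfft(Y[:, i], n)
    y = y[hop:]                                  # drop the leading pad
    return y if length is None else y[:length]

def inconsistency(Y, n=256):
    """||STFT(ISTFT(Y)) - Y||_F: zero iff Y is a consistent STFT."""
    L = (n // 2) * (Y.shape[1] - 1)              # matching signal length
    return np.linalg.norm(do_stft(do_istft(Y, n, L), n) - Y)
```

A true signal's STFT has inconsistency near machine precision, while a randomly phased complex array of the same shape does not; this gap is what the proposed phase constraint exploits.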

Summary of Proposed Theory
Goal: incorporate phase observation and estimation into the current nonnegative PLSA framework (extensible, additive, unique).
Theory:
Transform complex to nonnegative data
3 CPLSA variants
Phase constraints for STFT consistency

Proposed Experiments
Separating speech in structured, nonstationary noise (chosen because it is common and typically difficult to deal with).
Methods: CPLSA, PLSA, CMF (NMF and CMF have already been compared above).
Noise: babble noise, automotive noise.
Measurements: objective perceptual measures, ASR.

Objective Measurement Tests
Goals:
Explore the parameter space and how parameters affect CPLSA performance
Find the best-performing parameters
Compare the performance of CPLSA with PLSA and CMF
Data: TIMIT corpus [10]
Measurements: Blind Source Separation Evaluation Toolbox [11], Perceptual Evaluation of Speech Quality (PESQ) [12]

[10] J.S. Garofolo, L.F. Lamel, W.M. Fisher, J.G. Fiscus, D.S. Pallett, and N.L. Dahlgren, DARPA TIMIT Acoustic-Phonetic Continuous Speech Corpus, NIST, 1993.
[11] E. Vincent, R. Gribonval, and C. Fevotte, Performance Measurement in Blind Audio Source Separation, IEEE Transactions on Audio, Speech, and Language Processing, vol. 14, 2006, pp. 1462-1469.
[12] A. Rix, J. Beerends, M. Hollier, and A. Hekstra, Perceptual Evaluation of Speech Quality (PESQ) - A New Method for Speech Quality Assessment of Telephone Networks and Codecs, ICASSP, 2001, pp. 749-752.

Automatic Speech Recognition Tests
Goal: test the robustness of the parameters.
Use the best-performing parameters from the objective measurements
Compare the performance of CPLSA with PLSA and CMF
Data: Wall Street Journal corpus [13]
ASR system: Sphinx-3 (CMU)

[13] D.B. Paul and J.M. Baker, The Design for the Wall Street Journal-Based CSR Corpus, Proceedings of the Workshop on Speech and Natural Language, Stroudsburg, PA, USA: Association for Computational Linguistics, 1992, pp. 357-362.


Examples

(Spectrograms: frequency (Hz) vs. time (s); dB figures are SDR improvement — see the BSS Evaluation Measures slide for the definition.)
Subway Noise, NMF: 4.3 dB improvement
Subway Noise, NMF: 4.2 dB improvement

Fountain Noise Example #1
Target speaker synthetically added at -3 dB SNR
Speaker model trained on 60 seconds of clean speech

Fountain Noise Example #2
No clean speech available for training the target talker
Generic speaker model used

Mixed Speech (0 dB, no reverb)

Mixed Speech (0 dB, reverb)

Thank you!

Backup: Why not encode phase into bases? Individual phase term
(Figure: worked example of X ≈ BW with an individual phase term per observation.)

Backup: Why not encode phase into bases? Complex B and W
(Figure: worked example with complex-valued B and W, whose phases are left undetermined.)

BSS Evaluation Measures
