a study of sparse non-negative matrix factor 2-d deconvolution combined with mask application for...

1

A Study of Sparse Non-negative Matrix Factor 2-D Deconvolution Combined With Mask Application for Blind Source Separation of Frog Species

Reporter: Jain-De Lee

Advisor: Wen-Ping Chen

Department of Electrical Engineering, National Kaohsiung University of Applied Sciences

Network Application Laboratory

2

Outline

Introduction and Motivation

Background

Research Method

Experiment Results

Conclusion and Future Works

Research Results

3

Introduction

Current Technology of the Ecological Survey

◦ Sensor networks

◦ Wireless network

Advantage

◦ Reduce the cost of human resources and time

◦ Save and share the raw data conveniently

Disadvantage

◦ Large amount of raw data needs to be analyzed

Voiceprint Recognition System

◦ Analyze raw data fast

4

Introduction

5

Introduction Blind Source Separation

◦ Cocktail party problem

6

Introduction

Independent Subspace Analysis:

◦ M. A. Casey and A. Westner [2000], Proceedings of the International Computer Music Conference

◦ Md. K. I. Molla and K. Hirose [2007], IEEE Transactions on Audio, Speech, and Language Processing

Wiener Filter:

◦ L. Bonaroya and F. Bimbot [2003], International Symposium on Independent Component Analysis and Blind Signal Separation

◦ E. M. Grais and H. Erdogan [2011], IEEE Digital Signal Processing, Sedona, Arizona

Non-negative Matrix Factorization:

◦ P. Smaragdis [2004], International Symposium on Independent Component Analysis and Blind Source Separation

7

Introduction

Independent Component Analysis Combined with Other Methods:

◦ J. Lin and A. Zhang [2005], NDT & E International

◦ M. E. Davies and C. J. James [2007], Signal Processing

◦ X. Cheng, N. Li, Y. Cheng and Z. Chen [2007], International Conference on Bioinformatics and Biomedical Engineering

◦ B. Mijović, M. D. Vos, I. Gligorijević, J. Taelman and S. V. Huffel [2010], IEEE Transactions on Biomedical Engineering

8

Motivation

Single Channel Blind Source Separation

Preprocessing of Voiceprint Recognition System

Improve Quality of Separated Signals

9

Background

Pre-processing: Whitening, T-F Representation

Blind Source Separation: ICA, NMF

Post-processing: Reconstruct Signal

10

Background

Independent Component Analysis

◦ Finds statistically independent components in the observed signals and estimates the de-mixing matrix

◦ Constraint conditions

The components are statistically independent

At most one Gaussian source is allowed

At least as many sensor responses as source signals

◦ Processing steps

Pre-processing

Centering

Whitening

Measurement of non-Gaussian component
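The centering and whitening steps listed above can be sketched as follows. This is a minimal illustration, assuming observations are stored as a NumPy array of shape (signals, samples) and that whitening is done by eigendecomposition of the covariance matrix; the function name, the mixing matrix and the Laplace test sources are hypothetical and not taken from the slides.

```python
import numpy as np

def center_and_whiten(X):
    """Center and whiten observations X of shape (n_signals, n_samples)."""
    # Centering: remove the mean of every observed signal.
    Xc = X - X.mean(axis=1, keepdims=True)
    # Whitening: rotate and scale so the covariance becomes the identity.
    cov = Xc @ Xc.T / Xc.shape[1]
    eigvals, eigvecs = np.linalg.eigh(cov)
    whitening = eigvecs @ np.diag(1.0 / np.sqrt(eigvals)) @ eigvecs.T
    return whitening @ Xc

# Hypothetical example: two non-Gaussian sources observed through a mixing matrix.
rng = np.random.default_rng(0)
sources = rng.laplace(size=(2, 5000))
mixing = np.array([[1.0, 0.5], [0.3, 1.0]])
Z = center_and_whiten(mixing @ sources)
```

After this step the rows of `Z` are uncorrelated with unit variance, which is the usual starting point for the non-Gaussianity measurement.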

11

Background

Measurement of Non-Gaussian Component

◦ Kurtosis

Random Variable y | kurt(y)
Gaussian | kurt(y) = 0
Non-Gaussian, Super-Gaussian | kurt(y) > 0
Non-Gaussian, Sub-Gaussian | kurt(y) < 0

◦ Mutual Information

[Figure: diagram relating the entropies H(x) and H(y), the joint entropy H(x,y), the conditional entropies H(x|y) and H(y|x), and the mutual information I(x,y)]

◦ Neg-entropy

$$J(Y) = H(Y_{gauss}) - H(Y)$$

J(Y): Neg-Entropy

H(Ygauss): Entropy of Gaussian Distribution

H(Y): Entropy of Random Variable
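The kurtosis classification above can be checked numerically. The sketch below assumes the excess-kurtosis convention (fourth standardized moment minus 3), which is the one under which a Gaussian variable gives 0; the sample distributions are illustrative choices, not data from the study.

```python
import numpy as np

def excess_kurtosis(y):
    """Excess kurtosis: E[z^4] - 3 for the standardized variable z."""
    y = np.asarray(y, dtype=float)
    z = (y - y.mean()) / y.std()
    return float(np.mean(z ** 4) - 3.0)

rng = np.random.default_rng(0)
n = 200_000
k_gauss = excess_kurtosis(rng.normal(size=n))          # close to 0
k_super = excess_kurtosis(rng.laplace(size=n))         # positive (super-Gaussian)
k_sub = excess_kurtosis(rng.uniform(-1, 1, size=n))    # negative (sub-Gaussian)
```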

12

Background

Non-negative Matrix Factorization

[Figure: NMF of V into basis vectors W1, W2, … and coefficients H1, H2, …, with V1 = W1 H1 + W2 H2 + …]

13

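Background

Non-negative Matrix Factorization

◦ Cost function

Based on Euclidean Distance

$$\|V - \tilde{V}\|^2 = \sum_{m,n} \left( V_{mn} - \tilde{V}_{mn} \right)^2$$

Based on Kullback–Leibler Divergence

$$D(V \,\|\, \tilde{V}) = \sum_{m,n} \left( V_{mn} \log \frac{V_{mn}}{\tilde{V}_{mn}} - V_{mn} + \tilde{V}_{mn} \right)$$

$V$: Original Signal

$\tilde{V}$: Reconstructed Signal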

14

Background

[Flowchart: NMF algorithm]

Initialize W and H

Reconstruct Signal

Update the Matrix W

Update the Matrix H

Calculate the Cost Function Value

Convergent? (Y: End; N: continue iterating)
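The iteration in the flowchart can be sketched with the well-known multiplicative update rules for the Euclidean cost. This is an illustration under assumptions: the slides do not state which update rules were used, and the function name, tolerance and random test matrix are hypothetical.

```python
import numpy as np

def nmf(V, rank, max_iter=500, tol=1e-6, eps=1e-9, seed=0):
    """Plain NMF, V ~ W @ H, with multiplicative updates (Euclidean cost)."""
    rng = np.random.default_rng(seed)
    W = rng.random((V.shape[0], rank)) + eps   # initialize W and H
    H = rng.random((rank, V.shape[1])) + eps
    costs = []
    for _ in range(max_iter):
        W *= (V @ H.T) / (W @ H @ H.T + eps)   # update the matrix W
        H *= (W.T @ V) / (W.T @ W @ H + eps)   # update the matrix H
        V_hat = W @ H                          # reconstruct the signal
        costs.append(float(np.sum((V - V_hat) ** 2)))  # cost function value
        if len(costs) > 1 and costs[-2] - costs[-1] < tol:
            break                              # convergence check
    return W, H, costs

# Hypothetical non-negative test matrix of rank 3.
rng = np.random.default_rng(1)
V = rng.random((20, 3)) @ rng.random((3, 30))
W, H, costs = nmf(V, rank=3)
```

Because the updates only multiply by non-negative ratios, `W` and `H` stay non-negative without any explicit projection.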


15

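Background

Sparse Non-negative Matrix Factor 2-D Deconvolution (SNMF2D)

◦ Obtain temporal structure and the pitch change

◦ Control the sparse degree of non-negative matrix factorization

Non-negative Matrix Factor 2-D Deconvolution

◦ τ basis matrix and φ coefficient matrix

◦ Shift operator

Sparse Coding

◦ Take a few units to represent the data effectively

◦ Parts-based representations

Shift operator example:

$$A = \begin{pmatrix} 1 & 2 & 3 & 4 \\ 5 & 6 & 7 & 8 \\ 9 & 10 & 11 & 12 \\ 13 & 14 & 15 & 16 \end{pmatrix}, \quad
A^{\rightarrow 1} = \begin{pmatrix} 0 & 1 & 2 & 3 \\ 0 & 5 & 6 & 7 \\ 0 & 9 & 10 & 11 \\ 0 & 13 & 14 & 15 \end{pmatrix}, \quad
A^{\downarrow 2} = \begin{pmatrix} 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 1 & 2 & 3 & 4 \\ 5 & 6 & 7 & 8 \end{pmatrix}$$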
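The shift operator in the example can be written as two small helper functions, one moving columns to the right and one moving rows down, with zeros filled in. The function names are illustrative.

```python
import numpy as np

def shift_right(A, k):
    """Shift the columns of A right by k positions, zero-filling on the left."""
    out = np.zeros_like(A)
    out[:, k:] = A[:, :max(A.shape[1] - k, 0)]
    return out

def shift_down(A, k):
    """Shift the rows of A down by k positions, zero-filling at the top."""
    out = np.zeros_like(A)
    out[k:, :] = A[:max(A.shape[0] - k, 0), :]
    return out

A = np.arange(1, 17).reshape(4, 4)
A_right_1 = shift_right(A, 1)
A_down_2 = shift_down(A, 2)
```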

16

Background

[Figure: 2-D deconvolution model, V approximated from shifted copies of the factors W_d and H_d]
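As a sketch of how shifted factors combine into a reconstruction, the code below uses the standard NMF2D model, in which each basis matrix indexed by the time shift τ is shifted down by φ and each coefficient matrix indexed by the pitch shift φ is shifted right by τ. Whether the figure follows exactly this indexing is an assumption; the array layout and function names are hypothetical.

```python
import numpy as np

def shift_right(A, k):
    out = np.zeros_like(A)
    out[:, k:] = A[:, :max(A.shape[1] - k, 0)]
    return out

def shift_down(A, k):
    out = np.zeros_like(A)
    out[k:, :] = A[:max(A.shape[0] - k, 0), :]
    return out

def nmf2d_reconstruct(W, H):
    """W: (n_tau, n_freq, n_components); H: (n_phi, n_components, n_time)."""
    V_hat = np.zeros((W.shape[1], H.shape[2]))
    for tau in range(W.shape[0]):
        for phi in range(H.shape[0]):
            # Basis shifted in frequency (down), coefficients shifted in time (right).
            V_hat += shift_down(W[tau], phi) @ shift_right(H[phi], tau)
    return V_hat

rng = np.random.default_rng(0)
W = rng.random((2, 6, 3))    # 2 time shifts, 6 frequency bins, 3 components
H = rng.random((4, 3, 10))   # 4 pitch shifts, 3 components, 10 time frames
V_hat = nmf2d_reconstruct(W, H)
```

With a single τ and a single φ the model reduces to the ordinary product `W[0] @ H[0]`.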

17

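Background

Sparse Non-negative Matrix Factor 2-D Deconvolution

◦ Cost function

Based on Euclidean Distance

$$\frac{1}{2} \|V - \tilde{V}\|^2 + \lambda f(H) = \frac{1}{2} \sum_{i,j} \left( V_{ij} - \tilde{V}_{ij} \right)^2 + \lambda f(H)$$

Based on Kullback–Leibler Divergence

$$D(V \,\|\, \tilde{V}) + \lambda f(H) = \sum_{i,j} \left( V_{ij} \log \frac{V_{ij}}{\tilde{V}_{ij}} - V_{ij} + \tilde{V}_{ij} \right) + \lambda f(H)$$

λ: Sparse Factor

f(•): Sparse Function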
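The two sparse cost functions can be evaluated as below. The sparse function f is not specified on the slide; this sketch assumes the sum of the entries of H (the L1 norm for a non-negative matrix), and the small `eps` guarding the logarithm is also an added assumption.

```python
import numpy as np

def ls_cost(V, V_hat, H, lam):
    """Euclidean cost with sparsity penalty (f(H) assumed to be sum of entries)."""
    return 0.5 * np.sum((V - V_hat) ** 2) + lam * np.sum(H)

def kl_cost(V, V_hat, H, lam, eps=1e-12):
    """Kullback-Leibler cost with the same assumed sparsity penalty."""
    div = np.sum(V * np.log((V + eps) / (V_hat + eps)) - V + V_hat)
    return div + lam * np.sum(H)

V = np.array([[1.0, 2.0], [3.0, 4.0]])
H = np.ones((2, 2))
perfect_ls = ls_cost(V, V, H, lam=5.0)   # only the sparsity term remains
perfect_kl = kl_cost(V, V, H, lam=5.0)
```

A larger sparse factor λ makes the penalty term dominate, which pushes the optimization toward coefficient matrices with fewer active entries.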

18

Background

[Flowchart: SNMF2D algorithm]

Initialize W and H

Normalize the Matrix W

Update the Matrix W

Update the Matrix H

Calculate the Cost Function Value

Convergent? (Y: End; N: continue iterating)


19

Research Method

Signal Input

Pre-processing

SNMF2D: Data update, Reconstruct Signals

Mask: Binary Mask, Signals Correction

Post-processing

Separated Signals

20

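Research Method

Pre-processing

◦ The time-domain signal is converted to a time-frequency signal

Analysis windows

$$\hat{s}(n) = \begin{cases} s(n), & 0 \le n \le N-1 \\ 0, & \text{otherwise} \end{cases}$$

Window function

$$w(n) = 0.54 - 0.46 \cos\left( \frac{2 \pi n}{N-1} \right), \quad 0 \le n \le N-1$$

Signal conversion

$$X(k) = \sum_{n=0}^{N-1} \hat{s}(n) \, e^{-j \frac{2 \pi}{N} k n}$$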
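The pre-processing chain (framing, Hamming window, DFT) can be sketched as a short-time Fourier transform. The frame length of 512 samples and the 50% overlap follow the experiment parameters reported in the slides; applying the window by multiplying each frame before the DFT is an assumption, and the function names and test tone are hypothetical.

```python
import numpy as np

def hamming(N):
    """Hamming window w(n) = 0.54 - 0.46 cos(2 pi n / (N - 1))."""
    n = np.arange(N)
    return 0.54 - 0.46 * np.cos(2 * np.pi * n / (N - 1))

def stft(signal, frame_len=512, overlap=0.5):
    """Return a complex spectrogram of shape (frequency bins, frames)."""
    hop = int(frame_len * (1 - overlap))
    window = hamming(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([
        signal[i * hop : i * hop + frame_len] * window   # analysis window
        for i in range(n_frames)
    ])
    return np.fft.fft(frames, axis=1).T                  # DFT of every frame

# Hypothetical test tone that falls exactly on frequency bin 32.
n = np.arange(2048)
tone = np.sin(2 * np.pi * 32 * n / 512)
X = stft(tone)
```

The magnitude `np.abs(X)` is the non-negative matrix that a factorization would operate on, while `np.angle(X)` can be kept for resynthesis.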

21

Research Method

[Figure: SNMF2D factorization of V into the factors W_d and H_d, with panels labeled W1, W2, W3, Wd1, Wd2, H1, H2, Hd1 and Hd2]

22

Research Method

[Figure: Reconstructed Signal of Latouche's Frog; Reconstructed Signal of Sauter's Brown Frog]

23

Mask Correction

[Flowchart: Reconstructed Signal → Binary Mask → Signal Extraction → Ratio Calculation → Signal Correction]

24

Mask Correction

Binary Mask

◦ The reconstructed signal is converted to a binary mask

◦ Find a suitable threshold T

$$M(x,y) = \begin{cases} 1, & G(x,y) \ge T \\ 0, & G(x,y) < T \end{cases}$$

M(x,y): Binary Mask

G(x,y): Reconstructed Signal
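Thresholding a reconstructed spectrogram into a binary mask is a one-line operation. In this sketch the cell equal to the threshold is assigned to 1, which is an assumption about the boundary case; the array values are made up for illustration.

```python
import numpy as np

def binary_mask(G, T):
    """Return 1 where the reconstructed signal reaches the threshold, else 0."""
    return (G >= T).astype(float)

G = np.array([[0.2, 0.8],
              [0.5, 0.1]])
M = binary_mask(G, T=0.5)
```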

25

Mask Correction

Otsu Method

◦ Create a histogram

[Figure: histogram, x-axis Element, y-axis Number]

26

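Mask Correction

[Figure: histogram over Element with candidate thresholds T marked]

$$D_i = W_1 W_2 (M_1 - M_2)^2$$

$$T = \arg\max_{1 \le i \le L} \{ D_i \}$$

$$W_1 = \sum_{i=1}^{T} P_i, \qquad W_2 = \sum_{i=T+1}^{L} P_i$$

$$M_1 = \frac{1}{W_1} \sum_{i=1}^{T} i \, P_i, \qquad M_2 = \frac{1}{W_2} \sum_{i=T+1}^{L} i \, P_i$$

Each candidate threshold i is tried in turn as T when evaluating D_i; the candidate with the largest D_i is kept.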
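The Otsu search can be sketched as an exhaustive loop over candidate thresholds on a histogram with L levels. Interpreting P_i as the normalized histogram count of level i is an assumption, as are the function name, the default of 256 levels and the synthetic two-cluster data.

```python
import numpy as np

def otsu_threshold(values, n_levels=256):
    """Return the threshold (in data units) maximizing between-class variance."""
    hist, edges = np.histogram(values, bins=n_levels)
    P = hist / hist.sum()                       # assumed meaning of P_i
    levels = np.arange(1, n_levels + 1)
    best_T, best_D = 1, -1.0
    for T in range(1, n_levels):                # candidate threshold levels
        W1, W2 = P[:T].sum(), P[T:].sum()       # class weights
        if W1 == 0 or W2 == 0:
            continue
        M1 = (levels[:T] * P[:T]).sum() / W1    # class means
        M2 = (levels[T:] * P[T:]).sum() / W2
        D = W1 * W2 * (M1 - M2) ** 2            # between-class variance
        if D > best_D:
            best_T, best_D = T, D
    return edges[best_T]                        # upper edge of level best_T

# Synthetic two-cluster data standing in for spectrogram values.
rng = np.random.default_rng(0)
low = rng.normal(1.0, 0.1, 1000)
high = rng.normal(5.0, 0.1, 1000)
T = otsu_threshold(np.concatenate([low, high]))
```

For well-separated clusters the returned threshold falls in the gap between them, so every low value is below it and every high value is at or above it.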

27

Mask Correction

28

Mask Correction

Signal Extraction

$$S(x,y) = V(x,y) \, M(x,y)$$

V(x,y): Original Mixed-Signal

S(x,y): Extraction of Signal

29

Mask Correction

30

Mask Correction

Find a Ratio of Mixture Components

$$R_i(x,y) = \frac{G_i(x,y)}{G_T(x,y)}, \quad 1 \le i \le N$$

GT(x,y): Sum of reconstructed signals

Gi(x,y): Reconstructed Signal

Ri(x,y): Ratio of mixture

N: Total Number of Reconstructed Signals
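The ratio of each reconstructed signal to the sum of all reconstructed signals can be computed cell by cell. The array layout (sources first) and the small `eps` that avoids division by zero are assumptions of this sketch; the numbers are made up.

```python
import numpy as np

def mixture_ratios(G, eps=1e-12):
    """G: reconstructed signals of shape (N, freq, time). Returns R_i = G_i / G_T."""
    G = np.asarray(G, dtype=float)
    G_T = G.sum(axis=0)              # sum of reconstructed signals
    return G / (G_T + eps)

G = np.array([[[3.0, 0.0]],
              [[1.0, 2.0]]])         # two sources, one frequency bin, two frames
R = mixture_ratios(G)
```

Wherever the summed reconstruction is non-zero, the ratios of all sources add up to one.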

31

Mask Correction

Signal correction

$$\tilde{S}_i(x,y) = S_i(x,y) \, R_i(x,y), \quad 1 \le i \le N$$

$\tilde{S}_i(x,y)$: Revised Signals

$S_i(x,y)$: Extraction of Signals

32

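Mask Correction

Signal correction

$$\hat{S}_i(x,y) = S_i(x,y) - \tilde{S}_j(x,y), \quad i \ne j$$

$\hat{S}_i(x,y)$: Corrected Signals

$$\hat{S}(x,y) = \begin{cases} \hat{S}(x,y), & \hat{S}(x,y) \ge 0 \\ 0, & \hat{S}(x,y) < 0 \end{cases}$$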
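The whole mask-correction chain for a two-source mixture can be sketched in a few lines: binary mask, signal extraction, ratio weighting, subtraction of the other source's revised signal, and clipping of negative values to zero. The subtraction between the extracted and revised signals is inferred from the clipping step and should be read as an assumption, as should the function name, the per-source thresholds and the toy numbers.

```python
import numpy as np

def mask_correction(V, G, thresholds, eps=1e-12):
    """V: mixture magnitude (freq, time); G: two reconstructions (2, freq, time)."""
    G = np.asarray(G, dtype=float)
    T = np.asarray(thresholds, dtype=float)[:, None, None]
    M = (G >= T).astype(float)          # binary masks
    S = V[None] * M                     # extracted signals
    R = G / (G.sum(axis=0) + eps)       # ratios of the mixture components
    S_rev = S * R                       # revised signals
    S_hat = S - S_rev[::-1]             # subtract the other source's revised signal
    return np.maximum(S_hat, 0.0)       # negative values are set to zero

V = np.array([[4.0, 2.0]])              # toy mixture: one bin, two frames
G = np.array([[[3.0, 0.0]],
              [[1.0, 2.0]]])            # toy reconstructions of the two sources
S_hat = mask_correction(V, G, thresholds=[1.0, 1.0])
```

In this toy case the mixture equals the sum of the two reconstructions, and the corrected signals come back as the reconstructions themselves.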

33

Mask Correction

35

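Post-processing

Phase Information → IDFT → Window Function

$$\hat{s}(n) = \frac{1}{N} \sum_{k=0}^{N-1} X(k) \, e^{j \frac{2 \pi}{N} k n}$$

[Equation: window function applied to s(n) with sin² and cos² weights]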
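One frame of the post-processing step can be sketched as recombining a magnitude spectrum with phase information and applying the inverse DFT. Taking the phase from the original mixture frame is an assumption about what "Phase Information" refers to; the function name and test frame are hypothetical, and the final windowing and overlap of frames is left out.

```python
import numpy as np

def frame_to_time(magnitude, phase):
    """Rebuild one time-domain frame from a magnitude spectrum and a phase."""
    X = magnitude * np.exp(1j * phase)
    return np.real(np.fft.ifft(X))       # IDFT back to the time domain

# Round trip on a hypothetical frame: the DFT magnitude and phase recover it.
rng = np.random.default_rng(0)
frame = rng.normal(size=512)
spectrum = np.fft.fft(frame)
rebuilt = frame_to_time(np.abs(spectrum), np.angle(spectrum))
```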

36

Experiment Results

Parameter Items | Parameter Value
STFT: Window Size | 512 samples
STFT: Window Overlapping | 50%
STFT: Window Function | Hamming Window
STFT: Frequency Bin | 512
SNMF2D: Basis Matrix | [1…3]
SNMF2D: Coefficient Matrix | [1…5]
SNMF2D: Sparse Factor | 5
Frog Species | 8
Mixture Items | 7

37

Experiment Results

Performance Measurement: SDR (Signal-to-Distortion Ratio)

[Figure: bar charts of SSDR and MSDR in dB at 30, 50 and 80 iterations. Panels: White-lipped tree frog and Japanese tree frog; Japanese tree frog, Latouche's frog and Heymons's narrow-mouthed toad; Taipei green tree frog and Latouche's frog]

38


[Figure: bar charts of SSDR and MSDR in dB at 30, 50 and 80 iterations, continued. Panel labels: Eiffinger's tree frog and Latouche's frog; Moltrecht's green tree frog and Latouche's frog; Moltrecht's green tree frog and Heymons's narrow-mouthed toad; Olive frog]

39

Experiment Results

Performance Measurement: SIR (Source-to-Interference Ratio)

[Figure: bar charts of SSIR and MSIR in dB at 30, 50 and 80 iterations. Panel labels: Japanese tree frog; Japanese tree frog, Latouche's frog and Heymons's narrow-mouthed toad; Latouche's frog]

40


[Figure: bar charts of SSIR and MSIR in dB at 30, 50 and 80 iterations, continued. Panel labels: Moltrecht's green tree frog and Latouche's frog; Eiffinger's tree frog and Latouche's frog; Moltrecht's green tree frog; Olive frog]

41

Experiment Results

Method | Iterations | Variance
SNMF2D | 30 | 10.71275
SNMF2D | 50 | 7.56728
SNMF2D+MASK | 30 | 27.73557
SNMF2D+MASK | 50 | 19.40138

42

Experiment Results

Parameter Items | Parameter Value
Frame Length | 512 samples
Frame Overlapping | 50%
Window Function | Hamming Window
Frequency Bin | 512
Feature Parameters | Mel-Frequency Cepstral Coefficient
Feature Dimensions | 15D
Test Syllable | 410

43

Experiment Results

Recognition Experiment

Method | Iterations | Total Syllables | Correct Syllables | Accuracy (%)
SNMF2D | 30 | 410 | 203 | 49.51
SNMF2D | 50 | 410 | 200 | 48.78
SNMF2D | 80 | 410 | 205 | 50.00
SNMF2D+MASK | 30 | 410 | 318 | 77.56
SNMF2D+MASK | 50 | 410 | 323 | 78.78
SNMF2D+MASK | 80 | 410 | 334 | 81.46
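The accuracy column and the average gain of the mask method can be rechecked from the syllable counts in the table. The short script below only redoes that arithmetic; the variable names are illustrative.

```python
# Correct syllables out of 410 at 30, 50 and 80 iterations, as listed in the table.
correct = {"SNMF2D": [203, 200, 205], "SNMF2D+MASK": [318, 323, 334]}
total = 410

accuracy = {m: [100 * c / total for c in counts] for m, counts in correct.items()}
mean_accuracy = {m: sum(a) / len(a) for m, a in accuracy.items()}
improvement = mean_accuracy["SNMF2D+MASK"] - mean_accuracy["SNMF2D"]

print([round(a, 2) for a in accuracy["SNMF2D+MASK"]])  # [77.56, 78.78, 81.46]
print(round(improvement, 2))                            # 29.84
```

The difference is expressed in percentage points: the mean accuracy rises from about 49.43% to about 79.27%.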

44

Conclusion and Future Works

The proposed method

◦ Improves the quality of separated signals effectively

◦ Uses less time to improve the quality of separated signals

◦ Enhances the recognition rate of separated signals; the average recognition rate improves by 29.84 percentage points

45

Conclusion and Future Works

Future Works

◦ Study de-noising methods

◦ Determine the number of species in the raw data

◦ Study the initial value setting

◦ Collect sounds of more species to improve the recognition rate

46

Research Results

Competition

◦ 7th Digital Signal Processing Creative Design Competition: Finalist

Patent

◦ Method for Separating Mixed Frog Calls: Under review

47

Thank you for your attention !!
