1
A Study of Sparse Non-negative Matrix Factor 2-D Deconvolution Combined With
Mask Application for Blind Source Separation of Frog Species
Reporter : Jain-De Lee Advisor : Wen-Ping Chen
Department of Electrical Engineering, National Kaohsiung University of Applied Sciences
Network Application Laboratory
2
Outline
Introduction and Motivation
Background
Research Method
Experiment Results
Conclusion and Future Works
Research Results
3
Introduction
Current Technology of the Ecological Survey
◦ Sensor networks
◦ Wireless network
Advantage
◦ Reduces the cost in human resources and time
◦ Raw data can be saved and shared conveniently
Disadvantage
◦ A large amount of raw data needs to be analyzed
Voiceprint Recognition System
◦ Analyzes raw data quickly
4
Introduction
5
Introduction Blind Source Separation
◦ Cocktail party problem
6
Introduction
Independent Subspace Analysis:
◦ M. A. Casey and A. Westner [2000], Proceedings of the International Computer Music Conference
◦ Md. K. I. Molla and K. Hirose [2007], IEEE Transactions on Audio, Speech, and Language Processing
Wiener Filter:
◦ L. Benaroya and F. Bimbot [2003], International Symposium on Independent Component Analysis and Blind Signal Separation
◦ E. M. Grais and H. Erdogan [2011], IEEE Digital Signal Processing, Sedona, Arizona
Non-negative Matrix Factorization:
◦ P. Smaragdis [2004], International Symposium on Independent Component Analysis and Blind Source Separation
7
Introduction
Independent Component Analysis Combined with Other Methods:
◦ J. Lin and A. Zhang [2005], NDT & E International
◦ M. E. Davies and C. J. James [2007], Signal Processing
◦ X. Cheng, N. Li, Y. Cheng and Z. Chen [2007], International Conference on Bioinformatics and Biomedical Engineering
◦ B. Mijović, M. D. Vos, I. Gligorijević, J. Taelman and S. V. Huffel [2010], IEEE Transactions on Biomedical Engineering
8
Motivation
◦ Single-channel blind source separation
◦ Preprocessing for the voiceprint recognition system
◦ Improve the quality of the separated signals
9
Background
Pre-processing: Whitening, T-F Representation
Blind Source Separation: ICA, NMF
Post-processing: Reconstruct Signal
10
Background
Independent Component Analysis
◦ Finds statistically independent components in the observed signals and estimates the de-mixing matrix
◦ Constraint conditions
  The components are statistically independent
  At most one Gaussian source is allowed
  At least as many sensor responses as source signals
◦ Processing steps
  Pre-processing
    Centering
    Whitening
  Measurement of non-Gaussian component
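The two pre-processing steps named above can be sketched as follows. This is a minimal illustration, not the implementation used in the study: the function name `center_and_whiten`, the Laplacian test sources and the mixing matrix are all assumptions made for the example, and whitening is done here through an eigendecomposition of the covariance matrix.

```python
import numpy as np

def center_and_whiten(X):
    """Center each channel of X (channels x samples), then whiten it.

    After whitening, the channels are uncorrelated and have unit variance.
    """
    Xc = X - X.mean(axis=1, keepdims=True)           # centering
    cov = Xc @ Xc.T / Xc.shape[1]                    # sample covariance
    eigvals, eigvecs = np.linalg.eigh(cov)           # cov = E diag(d) E^T
    whitening = eigvecs @ np.diag(1.0 / np.sqrt(eigvals)) @ eigvecs.T
    return whitening @ Xc

# Hypothetical two-sensor mixture of two non-Gaussian sources.
rng = np.random.default_rng(0)
sources = rng.laplace(size=(2, 5000))
mixing = np.array([[1.0, 0.6], [0.4, 1.0]])
Z = center_and_whiten(mixing @ sources)
cov_Z = Z @ Z.T / Z.shape[1]                         # approximately the identity
```

Whitening only removes second-order correlation; the remaining rotation is what the non-Gaussianity measure is used to find.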
11
Background
Measurement of Non-Gaussian Component
◦ Kurtosis
  Random variable y, kurt(y)
  Gaussian: kurt(y) = 0
  Non-Gaussian
    Super-Gaussian: kurt(y) > 0
    Sub-Gaussian: kurt(y) < 0
◦ Mutual Information
  [Figure: Venn diagram relating H(x), H(y), H(x,y), H(x|y), H(y|x) and I(x,y)]
◦ Neg-entropy
  $$J(Y) = H(Y_{gauss}) - H(Y)$$
  J(Y): Neg-Entropy
  H(Y_gauss): Entropy of Gaussian Distribution
  H(Y): Entropy of Random Variable
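The kurtosis criterion can be checked numerically. The sketch below assumes the usual excess-kurtosis definition, kurt(y) = E[y^4] - 3 for a standardized y, which the slide does not spell out; the three test distributions are chosen only for illustration.

```python
import numpy as np

def excess_kurtosis(y):
    """Excess kurtosis of a sample: E[z^4] - 3 with z the standardized sample."""
    y = np.asarray(y, dtype=float)
    z = (y - y.mean()) / y.std()
    return np.mean(z ** 4) - 3.0

rng = np.random.default_rng(0)
n = 200_000
k_gauss = excess_kurtosis(rng.normal(size=n))      # Gaussian: close to 0
k_super = excess_kurtosis(rng.laplace(size=n))     # super-Gaussian: positive
k_sub = excess_kurtosis(rng.uniform(-1, 1, n))     # sub-Gaussian: negative
```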
12
Background
Non-negative Matrix Factorization
[Figure: a non-negative matrix V is factorized into basis vectors W1, W2 and coefficient rows H1, H2]
13
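Background
Non-negative Matrix Factorization
◦ Cost function
  Based on Euclidean Distance
  $$\|V - \tilde{V}\|^2 = \sum_{n,m} \left( V_{nm} - \tilde{V}_{nm} \right)^2$$
  Based on Kullback–Leibler Divergence
  $$D(V \| \tilde{V}) = \sum_{n,m} \left( V_{nm} \log \frac{V_{nm}}{\tilde{V}_{nm}} - V_{nm} + \tilde{V}_{nm} \right)$$
V: Original Signal
$\tilde{V}$: Reconstructed Signal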
14
Background
NMF procedure (flowchart)
Initial W and H → Reconstruct Signal → Update the Matrix W → Reconstruct Signal → Update the Matrix H → Calculate the Cost Function Value → Convergent? (N: repeat the updates; Y: End)
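The loop in the flowchart can be sketched with the well-known multiplicative update rules for the Euclidean cost. The slides do not state which update rules were used, so the rules, the function name `nmf`, the stopping tolerance and the test matrix are assumptions for illustration only.

```python
import numpy as np

def nmf(V, rank, n_iter=200, tol=1e-6, seed=0):
    """Factorize a non-negative V (rows x cols) as W @ H with multiplicative updates."""
    rng = np.random.default_rng(seed)
    eps = 1e-12                                    # avoids division by zero
    W = rng.random((V.shape[0], rank)) + eps       # initial W
    H = rng.random((rank, V.shape[1])) + eps       # initial H
    costs = []
    for _ in range(n_iter):
        W *= (V @ H.T) / (W @ H @ H.T + eps)       # update the matrix W
        H *= (W.T @ V) / (W.T @ W @ H + eps)       # update the matrix H
        costs.append(np.sum((V - W @ H) ** 2))     # Euclidean cost value
        if len(costs) > 1 and costs[-2] - costs[-1] < tol:
            break                                  # convergent
    return W, H, costs

# Hypothetical rank-3 non-negative test matrix.
rng = np.random.default_rng(1)
V = rng.random((20, 3)) @ rng.random((3, 30))
W, H, costs = nmf(V, rank=3)
```

Because the updates only multiply by non-negative ratios, W and H stay non-negative without any explicit projection.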
15
Background
Sparse Non-negative Matrix Factor 2-D Deconvolution (SNMF2D)
◦ Captures the temporal structure and the pitch changes
◦ Controls the degree of sparsity of the non-negative matrix factorization
Non-negative Matrix Factor 2-D Deconvolution
◦ τ basis matrix and φ coefficient matrix
◦ Shift operator
Sparse Coding
◦ Takes a few units to represent the data effectively
◦ Parts-based representations
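$$A = \begin{pmatrix} 1 & 2 & 3 & 4 \\ 5 & 6 & 7 & 8 \\ 9 & 10 & 11 & 12 \\ 13 & 14 & 15 & 16 \end{pmatrix}, \quad \overset{\rightarrow 1}{A} = \begin{pmatrix} 0 & 1 & 2 & 3 \\ 0 & 5 & 6 & 7 \\ 0 & 9 & 10 & 11 \\ 0 & 13 & 14 & 15 \end{pmatrix}, \quad \overset{\downarrow 2}{A} = \begin{pmatrix} 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 1 & 2 & 3 & 4 \\ 5 & 6 & 7 & 8 \end{pmatrix}$$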
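The shift operator can be written in a few lines. The helper names `shift_right` and `shift_down` are chosen for this sketch; both assume a shift between 0 and the matrix size, and both fill the vacated entries with zeros, as in the 4 x 4 example.

```python
import numpy as np

def shift_right(A, k):
    """Shift the columns of A to the right by k, zero-filling on the left."""
    out = np.zeros_like(A)
    out[:, k:] = A[:, :A.shape[1] - k]
    return out

def shift_down(A, k):
    """Shift the rows of A down by k, zero-filling at the top."""
    out = np.zeros_like(A)
    out[k:, :] = A[:A.shape[0] - k, :]
    return out

A = np.arange(1, 17).reshape(4, 4)
A_right1 = shift_right(A, 1)   # first column becomes zeros
A_down2 = shift_down(A, 2)     # first two rows become zeros
```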
16
Background
[Figure: the NMF2D model, V ≈ WH with shifted factors, illustrated with per-source basis matrices W_d^τ and coefficient matrices H_d^φ]
17
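Background
Sparse Non-negative Matrix Factor 2-D Deconvolution
◦ Cost function
  Based on Euclidean Distance
  $$\frac{1}{2} \sum_{i,j} \left( V_{ij} - \tilde{V}_{ij} \right)^2 + \lambda f(H)$$
  Based on Kullback–Leibler Divergence
  $$D(V \| \tilde{V}) + \lambda f(H) = \sum_{i,j} \left( V_{ij} \log \frac{V_{ij}}{\tilde{V}_{ij}} - V_{ij} + \tilde{V}_{ij} \right) + \lambda f(H)$$
λ: Sparse Factor
f(•): Sparse Function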
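To make the sparse cost concrete, the sketch below evaluates the Euclidean version for a given set of factors. Two things are assumptions that the slides do not confirm: the reconstruction follows the NMF2D model as it is usually written in the literature, with the reconstruction equal to the sum over τ and φ of (W^τ shifted down by φ) times (H^φ shifted right by τ), and the sparse function f is taken to be the sum of the entries of H (an L1 penalty). Shifts are indexed from 0 here, and all function names are hypothetical.

```python
import numpy as np

def shift_right(A, k):
    out = np.zeros_like(A)
    out[:, k:] = A[:, :A.shape[1] - k]
    return out

def shift_down(A, k):
    out = np.zeros_like(A)
    out[k:, :] = A[:A.shape[0] - k, :]
    return out

def reconstruct(W, H):
    """W: (n_tau, freq, d) basis matrices, H: (n_phi, d, time) coefficient matrices."""
    V_rec = np.zeros((W.shape[1], H.shape[2]))
    for tau in range(W.shape[0]):
        for phi in range(H.shape[0]):
            V_rec += shift_down(W[tau], phi) @ shift_right(H[phi], tau)
    return V_rec

def sparse_ls_cost(V, W, H, lam):
    """Euclidean cost with an assumed L1 sparse function f(H) = sum of |H|."""
    V_rec = reconstruct(W, H)
    return 0.5 * np.sum((V - V_rec) ** 2) + lam * np.sum(np.abs(H))

rng = np.random.default_rng(0)
W = rng.random((3, 16, 2))          # 3 time shifts, 16 frequency bins, 2 sources
H = rng.random((5, 2, 40))          # 5 pitch shifts, 2 sources, 40 frames
V = reconstruct(W, H)               # a mixture that the factors explain exactly
cost_fit_only = sparse_ls_cost(V, W, H, lam=0.0)   # no sparsity penalty
cost_sparse = sparse_ls_cost(V, W, H, lam=5.0)     # penalty term only
```

With a single shift in each direction the model reduces to the ordinary product W @ H, which is a quick sanity check on the reconstruction.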
18
Background
SNMF2D procedure (flowchart)
Initial W and H → Normalize the Matrix W → Reconstruct Signal → Update the Matrix W → Reconstruct Signal → Update the Matrix H → Calculate the Cost Function Value → Convergent? (N: repeat from normalization; Y: End)
19
Research Method
Signal Input → Pre-processing → SNMF2D (Data update, Reconstruct Signals) → Mask (Binary Mask, Signals Correction) → Post-processing → Separated Signals
20
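Research Method
Pre-processing
◦ The time-domain signal is converted to a time-frequency signal
  Analysis windows
  $$\hat{s}(n) = \begin{cases} s(n), & 0 \le n \le N-1 \\ 0, & \text{otherwise} \end{cases}$$
  Window function
  $$w(n) = 0.54 - 0.46 \cos\left( \frac{2\pi n}{N-1} \right), \quad 0 \le n \le N-1$$
  Signal conversion
  $$X(k) = \sum_{n=0}^{N-1} \hat{s}(n)\, e^{-j 2\pi n k / N}$$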
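The three steps above (framing, Hamming window, DFT) amount to a short-time Fourier transform. The sketch below uses the window size of 512 samples and 50% overlap listed in the experiment parameters; the 8 kHz sampling rate and the 1 kHz test tone are assumptions made only to have something to transform.

```python
import numpy as np

def hamming(N):
    """w(n) = 0.54 - 0.46 cos(2 pi n / (N - 1)), 0 <= n <= N - 1."""
    n = np.arange(N)
    return 0.54 - 0.46 * np.cos(2 * np.pi * n / (N - 1))

def stft(signal, N=512, hop=256):
    """Return the complex spectrogram with shape (N frequency bins, frames)."""
    w = hamming(N)
    n_frames = 1 + (len(signal) - N) // hop
    frames = np.stack([signal[i * hop:i * hop + N] * w for i in range(n_frames)])
    return np.fft.fft(frames, axis=1).T

fs = 8000                                   # hypothetical sampling rate in Hz
t = np.arange(fs) / fs                      # one second of samples
tone = np.sin(2 * np.pi * 1000 * t)         # 1 kHz test tone
X = stft(tone)
peak_bin = int(np.argmax(np.abs(X[:256, 0])))   # 1000 / 8000 * 512 = bin 64
```

The magnitude `np.abs(X)` is the non-negative matrix V that the factorization works on, while the phase `np.angle(X)` is kept aside for reconstruction.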
21
Research Method
[Figure: SNMF2D decomposition of the mixed signal V into basis matrices W and coefficient matrices H, grouped by source d]
22
Research Method
[Figure: Reconstructed Signal of Latouche's Frog; Reconstructed Signal of Sauter's Brown Frog]
23
Mask Correction
Reconstructed Signal → Binary Mask → Signal Extraction → Ratio Calculation → Signal Correction
24
Mask Correction
Binary Mask
◦ The reconstructed signal is converted to a binary mask
◦ Find a suitable threshold T
$$M(x,y) = \begin{cases} 1, & G(x,y) \ge T \\ 0, & G(x,y) < T \end{cases}$$
M(x,y): Binary Mask
G(x,y): Reconstructed Signal
25
Mask Correction
Otsu Method
◦ Create a histogram
[Figure: histogram with axes Element and Number]
26
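Mask Correction
[Figure: histogram over Element with candidate thresholds T]
$$D_i = W_1 W_2 (M_1 - M_2)^2$$
$$T = \arg\max_{1 \le i \le L} \{ D_i \}$$
$$W_1 = \sum_{i=1}^{T} P_i, \qquad W_2 = \sum_{i=T+1}^{L} P_i$$
$$M_1 = \sum_{i=1}^{T} \frac{i P_i}{W_1}, \qquad M_2 = \sum_{i=T+1}^{L} \frac{i P_i}{W_2}$$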
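The threshold search can be sketched directly from these formulas: build the histogram, evaluate D for every candidate split, keep the maximizer, then binarize. The number of histogram levels (256), the function name `otsu_threshold` and the synthetic bimodal data are assumptions for the example.

```python
import numpy as np

def otsu_threshold(G, levels=256):
    """Return the threshold (in the units of G) maximizing W1 * W2 * (M1 - M2)^2."""
    hist, edges = np.histogram(G, bins=levels)
    P = hist / hist.sum()                        # P_i: probability of level i
    i = np.arange(1, levels + 1)
    best_T, best_D = 1, -1.0
    for T in range(1, levels):                   # candidate split after level T
        W1, W2 = P[:T].sum(), P[T:].sum()
        if W1 == 0 or W2 == 0:
            continue
        M1 = (i[:T] * P[:T]).sum() / W1          # mean level of class 1
        M2 = (i[T:] * P[T:]).sum() / W2          # mean level of class 2
        D = W1 * W2 * (M1 - M2) ** 2             # between-class separation
        if D > best_D:
            best_T, best_D = T, D
    return edges[best_T]                         # upper edge of level best_T

# Synthetic "reconstructed signal": weak background plus strong call energy.
rng = np.random.default_rng(0)
low = np.abs(rng.normal(1.0, 0.1, 500))
high = np.abs(rng.normal(5.0, 0.1, 500))
G = np.concatenate([low, high]).reshape(20, 50)
T = otsu_threshold(G)
M = (G >= T).astype(int)                         # binary mask M(x, y)
```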
27
Mask Correction
28
Mask Correction
Signal Extraction
$$S(x,y) = V(x,y) \cdot M(x,y)$$
V(x,y): Original Mixed Signal
S(x,y): Extracted Signal
29
Mask Correction
30
Mask Correction
Find a Ratio of Mixture Components
$$R_i(x,y) = \frac{G_i(x,y)}{G_T(x,y)}, \quad 1 \le i \le N$$
G_T(x,y): Sum of Reconstructed Signals
G_i(x,y): Reconstructed Signal
R_i(x,y): Ratio of Mixture
N: Total Number of Reconstructed Signals
31
Mask Correction
Signal Correction
$$\tilde{S}_i(x,y) = S_i(x,y) \cdot R_i(x,y), \quad 1 \le i \le N$$
$\tilde{S}_i(x,y)$: Revised Signals
$S_i(x,y)$: Extracted Signals
32
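Mask Correction
Signal Correction
$$\hat{S}_i(x,y) = S_i(x,y) - \tilde{S}_j(x,y), \quad i \ne j$$
$\hat{S}_i(x,y)$: Corrected Signals
$$\hat{S}(x,y) = \begin{cases} \hat{S}(x,y), & \hat{S}(x,y) \ge 0 \\ 0, & \hat{S}(x,y) < 0 \end{cases}$$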
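The whole correction chain (mask, extraction, ratio, correction, clipping at zero) can be traced on a tiny example. This sketch covers the two-source case only and reads the correction as a subtraction of the other source's revised signal; the function name, the fixed thresholds and the 2 x 2 matrices are all made up for illustration.

```python
import numpy as np

def mask_correction(V, G, thresholds):
    """Two-source mask correction on magnitude spectrograms.

    V: mixed signal, G: list of two reconstructed signals,
    thresholds: one binary-mask threshold per reconstructed signal.
    """
    G_total = G[0] + G[1] + 1e-12                              # G_T, guarded against 0
    S = [V * (g >= t) for g, t in zip(G, thresholds)]          # S_i = V . M_i
    R = [g / G_total for g in G]                               # R_i = G_i / G_T
    S_rev = [s * r for s, r in zip(S, R)]                      # revised signals
    # Subtract the other source's revised signal and clip negatives to zero.
    return [np.maximum(S[i] - S_rev[1 - i], 0.0) for i in range(2)]

G1 = np.array([[4.0, 0.0], [1.0, 3.0]])
G2 = np.array([[1.0, 2.0], [3.0, 1.0]])
V = G1 + G2                                   # mixture of the two components
S1_hat, S2_hat = mask_correction(V, [G1, G2], thresholds=(1.0, 1.0))
```

In this example the masks overlap almost everywhere, so each extracted signal initially contains the full mixture; after the correction the two outputs match G1 and G2.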
33
Mask Correction
34
35
Post-processing
Phase Information → IDFT → Window Function
$$\hat{s}(n) = \frac{1}{N} \sum_{k=0}^{N-1} X(k)\, e^{j 2\pi n k / N}$$
$$s(n) \sin^2(\cdot) + s(n) \cos^2(\cdot) = s(n)$$
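A round trip through analysis and synthesis shows how the post-processing stage can return to the time domain. The sketch below assumes that a separated magnitude is combined with the phase of the mixture, and it compensates for the Hamming analysis window by dividing the overlap-added frames by the summed window; the slide appears to use a sine/cosine window identity instead, so this is a stand-in, not the method of the study.

```python
import numpy as np

def stft(signal, N=512, hop=256):
    """Hamming-windowed STFT, shape (N bins, frames)."""
    w = np.hamming(N)
    n_frames = 1 + (len(signal) - N) // hop
    frames = np.stack([signal[i * hop:i * hop + N] * w for i in range(n_frames)])
    return np.fft.fft(frames, axis=1).T

def istft(X, hop=256):
    """IDFT of every frame followed by overlap-add with window compensation."""
    N, n_frames = X.shape
    w = np.hamming(N)
    out = np.zeros(hop * (n_frames - 1) + N)
    norm = np.zeros_like(out)
    frames = np.fft.ifft(X.T, axis=1).real
    for i in range(n_frames):
        out[i * hop:i * hop + N] += frames[i]     # overlap-add the frame
        norm[i * hop:i * hop + N] += w            # accumulated window weight
    return out / norm

rng = np.random.default_rng(0)
x = rng.standard_normal(256 * 9 + 512)            # hypothetical test signal
X_mix = stft(x)
magnitude = np.abs(X_mix)                         # stands in for a separated magnitude
X_sep = magnitude * np.exp(1j * np.angle(X_mix))  # reuse the mixture phase
x_rec = istft(X_sep)
```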
36
Experiment Results
Group     Parameter Items       Parameter Value
STFT      Window Size           512 samples
STFT      Window Overlapping    50%
STFT      Window Function       Hamming Window
STFT      Frequency Bin         512
SNMF2D    Basis Matrix          [1…3]
SNMF2D    Coefficient Matrix    [1…5]
SNMF2D    Sparse Factor         5
          Frog Species          8
          Mixture Items         7
37
Experiment Results
Performance Measurement: SDR (Signal-to-Distortion Ratio)
[Figure: bar charts of SSDR and MSDR in dB for the series 30, 50 and 80, one panel per mixture]
◦ White-lipped tree frog, Japanese tree frog
◦ Japanese tree frog, Latouche's frog, Heymons's narrow-mouthed toad
◦ Taipei green tree frog, Latouche's frog
38
[Figure: SDR results, continued; bar charts of SSDR and MSDR in dB for the series 30, 50 and 80, one panel per mixture]
◦ Eiffinger's tree frog, Latouche's frog
◦ Moltrecht's green tree frog, Taipei green tree frog, Latouche's frog
◦ Moltrecht's green tree frog, Heymons's narrow-mouthed toad
◦ Heymons's narrow-mouthed toad, White-lipped tree frog, Olive frog
39
Experiment Results
Performance Measurement: SIR (Source-to-Interference Ratio)
[Figure: bar charts of SSIR and MSIR in dB for the series 30, 50 and 80, one panel per mixture]
◦ White-lipped tree frog, Japanese tree frog
◦ Japanese tree frog, Latouche's frog, Heymons's narrow-mouthed toad
◦ Taipei green tree frog, Latouche's frog
40
[Figure: SIR results, continued; bar charts of SSIR and MSIR in dB for the series 30, 50 and 80, one panel per mixture]
◦ Moltrecht's green tree frog, Taipei green tree frog, Latouche's frog
◦ Eiffinger's tree frog, Latouche's frog
◦ Moltrecht's green tree frog, Heymons's narrow-mouthed toad
◦ Heymons's narrow-mouthed toad, White-lipped tree frog, Olive frog
41
Experiment Results
Method         Iterations    Variance
SNMF2D         30            10.71275
SNMF2D         50            7.56728
SNMF2D+MASK    30            27.73557
SNMF2D+MASK    50            19.40138
42
Experiment Results
Parameter Items       Parameter Value
Frame Length          512 samples
Frame Overlapping     50%
Window Function       Hamming Window
Frequency Bin         512
Feature Parameters    Mel-Frequency Cepstral Coefficient
Feature Dimensions    15D
Test Syllable         410
43
Experiment Results
Recognition Experiment
Method         Iterations    Total Syllable    Correct Syllable    Accuracy (%)
SNMF2D         30            410               203                 49.51
SNMF2D         50            410               200                 48.78
SNMF2D         80            410               205                 50.00
SNMF2D+MASK    30            410               318                 77.56
SNMF2D+MASK    50            410               323                 78.78
SNMF2D+MASK    80            410               334                 81.46
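The accuracy column and the average gain between the two methods can be recomputed from the syllable counts in the table. The snippet is only an arithmetic check; the variable names are arbitrary.

```python
# Correct syllables out of 410 at 30, 50 and 80 iterations, taken from the table.
correct = {"SNMF2D": [203, 200, 205], "SNMF2D+MASK": [318, 323, 334]}
total = 410

# Accuracy in percent, rounded to two decimals as in the table.
accuracy = {m: [round(100 * c / total, 2) for c in cs] for m, cs in correct.items()}

# Mean accuracy per method and the gain of the masked method, in percentage points.
mean_acc = {m: sum(a) / len(a) for m, a in accuracy.items()}
gain = round(mean_acc["SNMF2D+MASK"] - mean_acc["SNMF2D"], 2)
```

The gain is a difference between two percentages, so it is expressed in percentage points.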
44
Conclusion and Future Works
The proposed method
◦ Improves the quality of the separated signals effectively
◦ Takes less time to improve the quality of the separated signals
◦ Enhances the recognition rate of the separated signals; the average recognition rate improves by 29.84 percentage points
45
Conclusion and Future Works
Future Works
◦ Study of de-noising methods
◦ Determine the number of species present in the raw data
◦ Study of the initial value setting
◦ Collect sounds of more species to improve the recognition rate
46
Research Results
Competition
◦ 7th Digital Signal Processing Creative Design Competition: finalist
Patent
◦ Method for separating mixed frog calls: under review
47
Thank you for your attention !!