Speech Processing Laboratory, Temple University
May 5, 2004
Structure-Based Speech Classification Using Nonlinear Embedding Techniques
Uchechukwu Ofoegbu
Advisor: Dr. Robert E. Yantorno
Committee: Dr. Saroj K. Biswas, Dr. Henry M. Sendaula
Acknowledgment
Dr. Robert Yantorno
Dr. Saroj Biswas
Dr. Henry Sendaula
Speech Lab Members
Air Force Research Laboratory, Rome, NY
Overview
Voiced and Unvoiced Speech
Usable and Unusable Speech
Nonlinearities in Speech
Non-Linear Embedding
Research Goal
Proposed Research
Voiced and Unvoiced Speech
Voiced/Unvoiced Characteristics
Voiced
Quasi-periodic excitation
Modulation by vocal tract
Production of vowels, voiced fricatives & plosives
Unvoiced
No periodic vibration of vocal cords
Noise-like nature
Production of unvoiced fricatives and plosives
Usable Speech
Portions of co-channel speech still usable for applications such as Speaker ID and Speech Recognition.
Low-energy (unvoiced/silence) segments overlap with high-energy (voiced) segments
Target-to-interferer Ratio (TIR) > 20dB
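The TIR criterion above can be sketched in a few lines. This is a minimal illustration, assuming TIR is the ratio of frame energies of the target and interfering talkers expressed in dB, and that both signals are available separately before mixing (true only in a controlled experiment). The function names `tir_db` and `is_usable` are hypothetical.

```python
import numpy as np

def tir_db(target, interferer):
    """Target-to-interferer ratio of one frame in dB (assumed: ratio of frame energies)."""
    target = np.asarray(target, dtype=float)
    interferer = np.asarray(interferer, dtype=float)
    e_t = float(np.sum(target ** 2))
    e_i = float(np.sum(interferer ** 2))
    if e_i == 0.0:
        return float("inf")   # no interference at all
    if e_t == 0.0:
        return float("-inf")  # no target energy at all
    return 10.0 * np.log10(e_t / e_i)

def is_usable(target, interferer, threshold_db=20.0):
    """Label a co-channel frame usable when its TIR exceeds the threshold."""
    return bool(tir_db(target, interferer) > threshold_db)
```

A frame whose target is 100 times larger in amplitude than the interferer has a TIR of 40 dB and would be labeled usable under this rule.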
Nonlinearities in Speech
Glottal waveform changes
Shape varies with amplitude
Physical observations
Flow in vocal tract is non-laminar
Coupling between vocal tract and folds
When glottis is open, prominent changes are observed in formant characteristics
Nonlinear Embedding
Nonlinear Systems
Point moving along some trajectory in an abstract state space
Coordinates of the point are independent degrees of freedom of the system
State space could be reconstructed from a scalar signal
Nonlinear Embedding (cont’d)
Takens’ Method of Delays
A state space representation topologically equivalent to the original state space of a system can be reconstructed from a single observable dimension
Vectors in m-dimensional state space are formed from time-delayed values of a signal
Nonlinear Embedding (cont’d)
$$x(i) = \left[\, s(i),\ s(i+d),\ s(i+2d),\ \ldots,\ s(i+(m-1)d) \,\right]$$
m = embedding dimension
d = delay value
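The method of delays can be sketched as follows. This is a minimal illustration, assuming each embedded vector is built from the samples s(i), s(i+d), ..., s(i+(m-1)d); the function name `delay_embed` and its defaults are hypothetical.

```python
import numpy as np

def delay_embed(s, m=3, d=1):
    """Return an (n_vectors, m) array whose row i is [s(i), s(i+d), ..., s(i+(m-1)d)]."""
    s = np.asarray(s, dtype=float)
    n_vectors = len(s) - (m - 1) * d
    if n_vectors <= 0:
        raise ValueError("signal too short for this embedding dimension and delay")
    # Column k holds the signal shifted by k*d samples.
    return np.column_stack([s[k * d : k * d + n_vectors] for k in range(m)])
```

With m = 3 each row is a point in a three-dimensional state space, so a frame of speech becomes a trajectory that can be plotted or measured geometrically.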
Nonlinear Embedding (cont’d)
Delay value, d:
Dependent on sampling rate and signal properties
Large enough such that nonlinearities are taken into account by the reconstructed trajectory
Small enough to retain reasonable time resolution
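The slides do not say how d was chosen. One common heuristic from nonlinear time-series analysis, shown here only as an assumption and not as the method used in this work, is to take the first lag at which the autocorrelation of the frame drops to zero or below. The function name `first_zero_crossing_delay` is hypothetical.

```python
import numpy as np

def first_zero_crossing_delay(s, max_delay=None):
    """Smallest lag >= 1 where the autocorrelation is <= 0, or None if there is none."""
    s = np.asarray(s, dtype=float)
    s = s - s.mean()
    n = len(s)
    if max_delay is None:
        max_delay = n - 1
    # Full autocorrelation; keep only the non-negative lags.
    acf = np.correlate(s, s, mode="full")[n - 1:]
    for lag in range(1, min(max_delay, n - 1) + 1):
        if acf[lag] <= 0:
            return lag
    return None
```

For a pure sinusoid this returns roughly a quarter of the period, which balances the two constraints listed above: the delayed coordinates are no longer nearly identical, yet the delay stays short relative to the frame.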
Nonlinear Embedding (cont’d)
Dimension, m:
Generation of voiced speech constitutes a low-dimensional system
Generation of unvoiced speech constitutes a relatively high-dimensional system
Using a low dimension (such as m = 3) sufficiently reconstructs voiced but not unvoiced speech
Embedded Voiced and Unvoiced Speech
[Figure: two panels, "Embedded Voiced Speech" and "Embedded Unvoiced Speech"]
Embedded Usable and Unusable Speech
[Figure: two panels, "Embedded Co-channel Speech of 30dB TIR" and "Embedded Co-channel Speech of 10dB TIR"]
Research Goal
Feature Extraction
Difference-Mean Comparison (DMC) Measure
– Voiced/unvoiced classification
Nodal Density Measure
– Voiced/unvoiced classification
– Usable/unusable classification
Difference-Mean Comparison (DMC) Measure
Voiced/Unvoiced Classification
Introduction
3rd order difference computation along first non-singleton dimension
1st order difference of an N×N matrix X given by
$$
\begin{bmatrix}
X(2,1)-X(1,1) & X(2,2)-X(1,2) & \cdots & X(2,N)-X(1,N) \\
X(3,1)-X(2,1) & X(3,2)-X(2,2) & \cdots & X(3,N)-X(2,N) \\
\vdots & \vdots & & \vdots \\
X(N,1)-X(N-1,1) & X(N,2)-X(N-1,2) & \cdots & X(N,N)-X(N-1,N)
\end{bmatrix}
$$
Length(3rd order diff. > mean) observed
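The DMC computation can be sketched as below. This is a minimal illustration under stated assumptions: the input is the embedded frame with one embedded vector per row, the third-order difference is taken down the rows, and "mean" refers to the mean of the third-order difference values themselves, which the slide does not spell out. The function name `dmc_measure` is hypothetical.

```python
import numpy as np

def dmc_measure(embedded):
    """Count third-order difference values that exceed their own mean (assumed definition)."""
    x = np.asarray(embedded, dtype=float)
    if x.ndim == 1:
        x = x[:, None]  # treat a plain signal as a single-column matrix
    if x.shape[0] < 4:
        raise ValueError("need at least 4 rows for a third-order difference")
    # Difference along the rows, applied three times.
    third_diff = np.diff(x, n=3, axis=0)
    return int(np.count_nonzero(third_diff > third_diff.mean()))
```

A smooth trajectory tends to produce small third-order differences and an irregular one produces large values of either sign, so the count would be expected to differ between voiced and unvoiced frames.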
Embedded Voiced and Unvoiced Speech
[Figure: two panels, "Embedded Voiced Speech" and "Embedded Unvoiced Speech"]
Difference-Mean Comparison Distribution
[Figure: "Clean Speech"; probability vs. difference-mean comparison, voiced and unvoiced]
Difference-Mean Comparison Distribution
[Figure: "Speech + 15dB Pink Noise"; probability vs. difference-mean comparison, voiced and unvoiced]
Difference-Mean Comparison Distribution
[Figure: "Speech + 15dB White Noise"; probability vs. difference-mean comparison, voiced and unvoiced]
DMC-Based Decisions
[Figure: "Clean Speech => 1:V; 0:Don't Care; -1:UV"; amplitude and decision panels vs. sample number]
DMC-Based Decisions
[Figure: "Speech + 15dB Pink Noise => 1:V; 0:Don't Care; -1:UV"; amplitude and decision panels vs. sample number]
DMC-Based Decisions
[Figure: "Speech + 15dB White Noise => 1:V; 0:Don't Care; -1:UV"; amplitude and decision panels vs. sample number]
DMC-Based Decisions
[Figure: "Clean Speech => 1:V; 0:Don't Care; -1:UV"; amplitude and decision panels vs. sample number]
DMC-Based Decisions
[Figure: "Speech + 15dB Pink Noise => 1:V; 0:Don't Care; -1:UV"; amplitude and decision panels vs. sample number]
DMC-Based Decisions
[Figure: "Speech + 15dB White Noise => 1:V; 0:Don't Care; -1:UV"; amplitude and decision panels vs. sample number]
Results
Hits Minus False Alarms for Voiced Speech
[Figure: bar chart for Clean, 15dB Pink and 15dB White conditions; methods FR/RE, E/ZC and DMC]
Results (cont’d)
Hits Minus False Alarms for Unvoiced Speech
[Figure: bar chart of percent for Clean, 15dB Pink and 15dB White conditions; methods FR/RE, E/ZC and DMC]
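The slides do not define the "hits minus false alarms" score. A plausible reading, given here as an assumption, is the percentage of target frames correctly detected minus the percentage of non-target frames wrongly detected. The function name `hits_minus_false_alarms` is hypothetical.

```python
import numpy as np

def hits_minus_false_alarms(truth, decision):
    """Percent hits minus percent false alarms for one class (assumed definition)."""
    truth = np.asarray(truth, dtype=bool)
    decision = np.asarray(decision, dtype=bool)
    # Hits: target frames that were detected, as a percentage of all target frames.
    hits = 100.0 * np.count_nonzero(decision & truth) / np.count_nonzero(truth)
    # False alarms: non-target frames that were detected, as a percentage of all non-target frames.
    false_alarms = 100.0 * np.count_nonzero(decision & ~truth) / np.count_nonzero(~truth)
    return hits - false_alarms
```

Under this definition a perfect detector scores 100 and a detector that fires at random scores about 0.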
Nodal Density Measure
Voiced/Unvoiced Classification
Usable/Unusable Classification
Introduction
Smallest cube which encloses the signal is determined
This cube is divided into N smaller cubes
Edges of the smaller cubes are defined as nodes
Number of nodes spanned by the signal is determined
Ratio of number of nodes spanned to total number of nodes is defined as nodal density
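The steps above can be sketched as follows. This is a minimal illustration, not the author's implementation: it assumes the nodes are the grid points of a regular lattice over the enclosing cube, and that a node counts as spanned when at least one embedded point is nearest to it. The function name `nodal_density` and the parameter `cells_per_axis` are hypothetical.

```python
import numpy as np

def nodal_density(points, cells_per_axis=10):
    """Fraction of grid nodes in the enclosing cube that the embedded points land on."""
    points = np.asarray(points, dtype=float)
    dims = points.shape[1]
    lo = points.min(axis=0)
    # Edge length of the smallest axis-aligned cube anchored at the minimum corner.
    side = (points.max(axis=0) - lo).max()
    total_nodes = (cells_per_axis + 1) ** dims
    if side == 0.0:
        return 1.0 / total_nodes  # all points coincide on a single node
    # Snap every point to its nearest grid node and count the distinct nodes hit.
    idx = np.rint((points - lo) / side * cells_per_axis).astype(int)
    spanned = len(np.unique(idx, axis=0))
    return spanned / total_nodes
```

A trajectory confined to a thin structure touches few nodes and gives a low density, while a space-filling cloud touches many, which is the contrast the measure is meant to capture.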
Voiced/Unvoiced Classification
Embedded Voiced and Unvoiced Speech Frames with Grids
[Figure: two panels, "Voiced" and "Unvoiced"]
Nodes Spanned by Embedded Voiced and Unvoiced Speech Frames
[Figure: two panels, "Voiced" and "Unvoiced"]
Nodal-Density Distribution
[Figure: "Clean Speech"; probability vs. nodal density, voiced and unvoiced]
Nodal-Density Distribution
[Figure: "Speech + 15dB Pink Noise"; probability vs. nodal density, voiced and unvoiced]
Nodal-Density Distribution
[Figure: "Speech + 15dB White Noise"; probability vs. nodal density, voiced and unvoiced]
Filtering
Moving Average Filter
Order, M = 10
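A moving average filter of this kind can be sketched as below. The slides do not say which sequence is filtered or whether "order 10" means 10 or 11 taps, so this is only an illustration: it assumes a causal filter that averages the current sample with the previous M - 1 samples of some 1-D sequence. The function name `moving_average` is hypothetical.

```python
import numpy as np

def moving_average(x, order=10):
    """Causal moving average: y[n] = (1/M) * sum of x[n-k] for k = 0..M-1."""
    x = np.asarray(x, dtype=float)
    kernel = np.ones(order) / order
    # Full convolution truncated to the input length; samples before x[0] count as zero.
    return np.convolve(x, kernel)[: len(x)]
```

The first M - 1 outputs are attenuated because the window is only partly filled; after that the output settles to the local mean of the input.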
Nodal-Density Distributions after Filtering
[Figure: two panels of probability vs. nodal density for "Clean Speech", voiced and unvoiced]
Nodal-Density Distributions after Filtering
[Figure: two panels of probability vs. nodal density for "Speech + 15dB Pink Noise", voiced and unvoiced]
Nodal-Density Distributions after Filtering
[Figure: two panels of probability vs. nodal density for "Speech + 15dB White Noise", voiced and unvoiced]
Results
Hits Minus False Alarms for Voiced Speech
[Figure: bar chart for Clean, 15dB Pink and 15dB White conditions; methods ND and ND_Filt]
Results (cont’d)
Hits Minus False Alarms for Unvoiced Speech
[Figure: bar chart of percent for Clean, 15dB Pink and 15dB White conditions; methods ND and ND_Filt]
Proposed Research
Usable/Unusable Classification
Embedded Usable and Unusable Speech Frames with Grids
[Figure: two panels, "Embedded Co-channel Speech of 10dB TIR with Grids" and "Embedded Co-channel Speech of 30dB TIR with Grids"]
Nodes Spanned by Embedded Usable and Unusable Speech Frames
[Figure: panels titled "Nodes Spanned by Embedded Co-channel Speech of 30dB TIR"]
Preliminary Results
[Figure: "ROC Curve for Usable Speech Detection Using the Nodal Density Measure"; hits vs. false alarms, both axes from 0 to 1]
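An ROC curve of this kind is obtained by sweeping a decision threshold over the feature and recording the hit and false-alarm rates at each setting. The sketch below is a generic illustration: it assumes a larger score means "usable" (negate the scores if the opposite holds for nodal density, which the slides do not state). The function name `roc_points` is hypothetical.

```python
import numpy as np

def roc_points(scores, labels):
    """Return (false_alarm_rates, hit_rates) for thresholds swept from high to low."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    # Start above every score so the curve begins at (0, 0).
    thresholds = np.concatenate(([np.inf], np.unique(scores)[::-1]))
    hits = np.array([np.mean(scores[labels] >= t) for t in thresholds])
    false_alarms = np.array([np.mean(scores[~labels] >= t) for t in thresholds])
    return false_alarms, hits
```

Plotting hits against false alarms gives the curve; the closer it bends toward the top-left corner, the better the feature separates the two classes.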
Summary
Speech, then Nonlinear Embedding, feeding two features:
Difference-Mean Comparison: V/UV Classification
Nodal Density: V/UV Classification and Usable/Unusable Classification
Future Proposed Research
Determine optimum filter for nodal density-based voiced/unvoiced classification
Develop nodal density measure for usable/unusable classification
Investigate the presence of complementary information between the two features (DMC and nodal density) for voiced/unvoiced classification
Perform decision-level fusion of both features
If you understood this presentation ... please ask QUESTIONS!