1.130 - Wavelets, Filter Banks and Applications

Wavelet-Based Feature Extraction for Phoneme Recognition and Classification

Ghinwa Choueiter

Outline

• Introduction: What are wavelets and phonemes?

• Problem specification

• Motivation

• Experimental Setup

• Wavelet-based feature extractor architecture

• Results

• Conclusions

• References

What are Wavelets

• A wavelet is a function that is well localized in both the time and frequency domains

• Proposed as an alternative to the short-time Fourier transform (STFT), whose fixed time-frequency resolution is a limitation when analyzing nonstationary signals

• Uses a constant-Q analysis to represent the signal in a time-scale plane

• Has shown potential in speech processing applications such as speech analysis, pitch detection, and speech compression
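To make the constant-Q point concrete, here is a minimal Python sketch. It assumes an idealized dyadic (octave-band) DWT in which every level splits the remaining band exactly in half; the helper names `dyadic_bands` and `quality_factor` and the 16 kHz sampling rate are illustrative choices.

```python
import math

def dyadic_bands(fs, levels):
    """Ideal band edges (Hz) of a dyadic DWT: detail bands from fine to coarse,
    followed by the final approximation (low-pass) band."""
    bands = []
    high = fs / 2.0
    for _ in range(levels):
        low = high / 2.0
        bands.append((low, high))   # detail band produced at this level
        high = low
    bands.append((0.0, high))       # remaining approximation band
    return bands

def quality_factor(low, high):
    """Q = geometric center frequency divided by bandwidth."""
    return math.sqrt(low * high) / (high - low)

bands = dyadic_bands(fs=16000, levels=4)
for low, high in bands[:-1]:
    print(f"{low:7.1f}-{high:7.1f} Hz  Q = {quality_factor(low, high):.3f}")
```

Under this idealization every detail band has the same Q, while the bandwidth halves at each level. An STFT, by contrast, keeps the bandwidth fixed, so its Q grows with frequency.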

What are Wavelets (2)

Daubechies 4-tap filter

Wavelet equation: $\psi(t) = \sqrt{2} \sum_k g_k \, \phi(2t - k)$

Scaling equation: $\phi(t) = \sqrt{2} \sum_k h_k \, \phi(2t - k)$

(written with the filter coefficients normalized so that $\sum_k h_k = \sqrt{2}$)
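As a small numerical companion to the Daubechies 4-tap filter, the sketch below builds the standard closed-form coefficients and checks the properties an orthonormal two-channel filter pair should satisfy. The normalization (low-pass coefficients summing to √2) and the alternating-flip construction of the high-pass filter are conventions assumed here; the slide itself does not state which convention it uses.

```python
import numpy as np

s3 = np.sqrt(3.0)

# Daubechies 4-tap low-pass (scaling) filter, normalized so that sum(h) = sqrt(2)
h = np.array([1 + s3, 3 + s3, 3 - s3, 1 - s3]) / (4 * np.sqrt(2.0))

# High-pass (wavelet) filter via the alternating flip g[k] = (-1)^k * h[3 - k]
g = np.array([(-1) ** k * h[3 - k] for k in range(4)])

checks = {
    "sum(h) equals sqrt(2)": np.isclose(h.sum(), np.sqrt(2.0)),
    "sum(g) equals 0": np.isclose(g.sum(), 0.0),
    "h has unit energy": np.isclose(np.dot(h, h), 1.0),
    "h is orthogonal to its shift by 2": np.isclose(h[0] * h[2] + h[1] * h[3], 0.0),
    "h is orthogonal to g": np.isclose(np.dot(h, g), 0.0),
}
for name, ok in checks.items():
    print(f"{name}: {bool(ok)}")
```

The zero sum of `g` is what makes the wavelet filter reject constant signals, and the shift-by-2 orthogonality is what allows perfect reconstruction after downsampling by two.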

What are Wavelets (3)

• P. P. Vaidyanathan, “Lossless systems in wavelet transforms”. IEEE International Symposium on Circuits and Systems, 1991.

Figure: discrete-time wavelet transform and magnitude responses of the wavelet filters at three different scales

What are Phonemes

• Phonemes are the smallest units in the sound system of a language that distinguish between the meanings of words

• Phoneme categories:

1. Vowels are produced with periodic excitation and are thus characterized by resonance frequencies (200 Hz-3500 Hz)

2. Fricatives are generated by turbulence at a narrow constriction and are characterized by a noisy, broad spectrum

3. Plosives are produced by a complete closure of the vocal tract followed by its sudden release. Their spectral content is usually weak in energy

Problem Specification

• Mel-frequency cepstral coefficients (MFCCs) are the most widely used speech features in speech recognition

Speech Waveform → Power Spectrum Computation → Mel Spectrum Computation → Natural Logarithm → DCT → MFCC

• The mel-scaled filterbank is a series of triangular band-pass filters (BPFs) designed to simulate the human auditory system
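The MFCC pipeline can be sketched in a few lines of NumPy. This is an illustrative sketch, not the configuration used in the experiments: the mel formula `2595 * log10(1 + f / 700)`, the 26 filters, the Hamming window, the 400-sample frame, and the unnormalized DCT-II are all common choices assumed here.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, fs):
    """Triangular band-pass filters whose edges are equally spaced on the mel scale."""
    edges_hz = mel_to_hz(np.linspace(0.0, hz_to_mel(fs / 2.0), n_filters + 2))
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / fs)
    bank = np.zeros((n_filters, freqs.size))
    for i in range(n_filters):
        lo, mid, hi = edges_hz[i], edges_hz[i + 1], edges_hz[i + 2]
        rising = (freqs - lo) / (mid - lo)
        falling = (hi - freqs) / (hi - mid)
        bank[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return bank

def dct2(x):
    """Unnormalized DCT-II of a 1-D array."""
    n = x.shape[-1]
    k = np.arange(n)[:, None]
    m = np.arange(n)[None, :]
    return x @ np.cos(np.pi * k * (2 * m + 1) / (2 * n)).T

def mfcc(frame, fs, n_filters=26, n_coeffs=13):
    windowed = frame * np.hamming(frame.size)           # windowing (assumed)
    power = np.abs(np.fft.rfft(windowed)) ** 2          # power spectrum
    mel_energies = mel_filterbank(n_filters, frame.size, fs) @ power
    log_mel = np.log(mel_energies + 1e-10)              # natural logarithm
    return dct2(log_mel)[:n_coeffs]                     # keep the first coefficients

rng = np.random.default_rng(0)
frame = rng.standard_normal(400)       # synthetic 25 ms frame at 16 kHz
coeffs = mfcc(frame, fs=16000)
print(coeffs.shape)
```

Keeping 13 coefficients matches the 13-dimensional MFCC vectors mentioned in the experimental setup; everything else is a placeholder that a real front end would tune.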

Problem Specification (2)

• In this work we extract features based on wavelet analysis, exploiting the flexibility it provides in trading time resolution against frequency resolution, in order to design appropriate classifiers for the different types of signals at hand.

Problem Specification (3)

• Perform phoneme recognition among three classes (vowels, fricatives, plosives)

• Perform phoneme recognition within each category:

1. Vowels: ‘ae’ (bat), ‘aa’ (Bob), ‘iy’ (beat), ‘uw’ (boot)

2. Fricatives: ‘sh’ (she), ‘v’ (vowel), ‘s’ (see), ‘dh’ (thee)

3. Plosives (stops): ‘b’ (bob), ‘p’ (poop), ‘d’ (dot), ‘k’ (cot)

Motivation

Sample vowel spectrograms: low-frequency formants

Motivation (2)

Sample fricative spectrograms: strong high-frequency content

Motivation (3)

Sample plosive spectrograms: weak overall frequency content

The Experimental Setup

• TIMIT speech database

• Speech signals sampled at 16 kHz

• Phonemes extracted from 200 training utterances and 150 test utterances

Phoneme class   Training Data   Test Data
Vowels          322             299
Stops           370             381
Fricatives      396             347

The Experimental Setup (2)

Vowels       Training Data   Test Data
‘ae’         100             86
‘iy’         100             100
‘aa’         100             93
‘uw’         22              20

Fricatives   Training Data   Test Data
‘sh’         96              52
‘v’          100             97
‘s’          100             100
‘dh’         100             98

Stops        Training Data   Test Data
‘b’          70              81
‘p’          100             100
‘d’          100             100
‘k’          100             100

The Experimental Setup (3)

• Features extracted:

1. 13-dimensional MFCC vectors

2. Wavelet-DCT vectors whose dimension varies with the phoneme class

• Maximum likelihood (ML) and maximum a posteriori (MAP) classifiers, using Gaussian mixture models with 4 mixture components
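The difference between the ML and MAP decisions can be sketched as follows. The diagonal-covariance mixture, the function names, and the example log-likelihood values are assumptions made for illustration. The priors are taken from the training token counts listed earlier (322 vowels, 370 stops, 396 fricatives), which is one plausible choice; the slides do not say how the priors were set.

```python
import numpy as np

def gmm_log_likelihood(x, weights, means, variances):
    """Log-likelihood of vector x under a diagonal-covariance Gaussian mixture.
    weights: (M,), means and variances: (M, D)."""
    x = np.asarray(x, dtype=float)
    weights, means, variances = map(np.asarray, (weights, means, variances))
    log_norm = -0.5 * np.sum(np.log(2.0 * np.pi * variances), axis=1)
    log_exp = -0.5 * np.sum((x - means) ** 2 / variances, axis=1)
    log_terms = np.log(weights) + log_norm + log_exp
    peak = np.max(log_terms)                      # log-sum-exp for stability
    return peak + np.log(np.sum(np.exp(log_terms - peak)))

def classify(log_likelihoods, priors=None):
    """ML decision when priors is None, MAP decision otherwise."""
    scores = dict(log_likelihoods)
    if priors is not None:
        scores = {c: s + np.log(priors[c]) for c, s in scores.items()}
    return max(scores, key=scores.get)

# A 4-component mixture in 13 dimensions, evaluated at the origin (toy parameters)
toy_ll = gmm_log_likelihood(np.zeros(13), np.full(4, 0.25),
                            np.zeros((4, 13)), np.ones((4, 13)))

# Hypothetical per-class log-likelihoods for one test token
log_likelihoods = {"vowel": -10.0, "stop": -10.1, "fricative": -10.15}
counts = {"vowel": 322, "stop": 370, "fricative": 396}
total = sum(counts.values())
priors = {c: n / total for c, n in counts.items()}

ml_decision = classify(log_likelihoods)
map_decision = classify(log_likelihoods, priors)
print(ml_decision, map_decision)
```

When the class likelihoods are close, as in this made-up example, the prior can tip the MAP decision toward the more frequent class while the ML decision ignores class frequency.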

Previous Work

• Mel wavelet cepstral coefficients

• Applying wavelet analysis to speech segmentation and classification

• Mel-scaled discrete wavelet coefficients

• Applying sampled continuous wavelet transform in phoneme recognition

• Symmetric octave filter bank

A Basic Feature Extractor Architecture

Provides us with three degrees of freedom:

• The wavelet type

• The fractional moment degree k

• The decomposition type

Speech Waveform → Hamming Window → DWT → Fractional Moment Computation → Natural Logarithm → DCT → Wavelet-DCT Coefficients

Inputs to the pipeline: Wavelet Type, Decomposition Type, Fractional Moment Degree k
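A rough sketch of this feature extractor is shown below. Several details are assumptions, since the slides do not define them: the fractional moment is taken to be the mean of |c|^k over the coefficients c of each subband, the decomposition is a plain dyadic (octave-band) tree, and the Haar wavelet is implemented by hand so the example needs only NumPy.

```python
import numpy as np

def haar_dwt_step(x):
    """One level of the orthonormal Haar DWT: returns (approximation, detail)."""
    x = x[: x.size - x.size % 2]            # drop a trailing sample if length is odd
    even, odd = x[0::2], x[1::2]
    return (even + odd) / np.sqrt(2.0), (even - odd) / np.sqrt(2.0)

def haar_subbands(x, levels):
    """Dyadic decomposition: detail bands from fine to coarse, then the approximation."""
    bands = []
    approx = np.asarray(x, dtype=float)
    for _ in range(levels):
        approx, detail = haar_dwt_step(approx)
        bands.append(detail)
    bands.append(approx)
    return bands

def dct2(x):
    """Unnormalized DCT-II of a 1-D array."""
    n = x.shape[-1]
    k = np.arange(n)[:, None]
    m = np.arange(n)[None, :]
    return x @ np.cos(np.pi * k * (2 * m + 1) / (2 * n)).T

def wavelet_dct_features(frame, levels=5, k=1.0):
    windowed = frame * np.hamming(frame.size)                  # Hamming window
    bands = haar_subbands(windowed, levels)                    # DWT
    moments = np.array([np.mean(np.abs(b) ** k) for b in bands])  # assumed moment definition
    return dct2(np.log(moments + 1e-12))                       # log, then DCT

rng = np.random.default_rng(0)
frame = rng.standard_normal(512)           # synthetic frame
features = wavelet_dct_features(frame, levels=5, k=0.85)
print(features.shape)
```

The three degrees of freedom map onto the code directly: swapping `haar_dwt_step` changes the wavelet type, `k` sets the moment degree, and `haar_subbands` fixes the decomposition. The feature dimension equals the number of subbands, which is why it varies with the decomposition chosen for each phoneme class.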

The Vowel-Fricative/Stop Feature Extractor

• The wavelet type: ‘sym4’

• k=1

• The decomposition: [figure]

The Plosive-Fricative Feature Extractor

• The wavelet type: ‘haar’

• k=1

• The decomposition: [figure]

The Vowel Feature Extractor

• The wavelet type: ‘sym4’

• k=1

• The decomposition: [figure]

The Fricative Feature Extractor

• The wavelet type: ‘sym6’

• k=1

• The decomposition: [figure]

The Plosive Feature Extractor

• The wavelet type: ‘haar’

• k=0.85

• The decomposition: [figure]

The Complete Classifier Architecture

[Block diagram. Blocks: Speech Waveform; Vowel-Fric Classifier; Vowel-Stop Classifier; Stop-Fric Classifier; Vowel Classifier; Stop Classifier; Fricative Classifier. Decision labels: “Vowel, Stop, or Fricative?” and “Vowel or Stop/Fricative?”. Branch labels: Vowel, Fric/Stop, Vowel, Fric, Stop.]
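The exact wiring of the diagram is not recoverable from the transcript, so the following is only one possible reading of it: a cascade that first decides the category and then hands the token to the classifier for that category. All function and parameter names are hypothetical, and the stub classifiers in the usage lines stand in for trained models with their own feature extractors.

```python
def classify_phoneme(waveform, is_vowel, is_fricative, within_category):
    """Two-stage cascade (assumed structure): pick the category, then the phoneme.

    is_vowel, is_fricative: hypothetical binary classifiers (callables on the waveform).
    within_category: dict mapping 'vowel', 'fricative', 'stop' to a phoneme classifier.
    """
    if is_vowel(waveform):
        category = "vowel"
    elif is_fricative(waveform):
        category = "fricative"
    else:
        category = "stop"
    return category, within_category[category](waveform)

# Stub classifiers, used only to show the control flow
stubs = {
    "vowel": lambda w: "iy",
    "fricative": lambda w: "sh",
    "stop": lambda w: "b",
}
result = classify_phoneme([0.0] * 16, is_vowel=lambda w: False,
                          is_fricative=lambda w: False, within_category=stubs)
print(result)
```

Passing the raw waveform to every stage reflects the fact that each classifier in the deck has its own feature extractor with its own wavelet, moment degree, and decomposition.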

Preliminaries: Consistency

[figure]

Preliminaries: Discrimination

[figure; labels: v, s]

Preliminaries: Behavior

[figure]

Results for Vowels

Four confusion matrices from a 2×2 grid (feature set: Wavelet-DCT, MFCC; classifier: Maximum Likelihood, Maximum A Posteriori), listed in transcript order. Rows: true phoneme; columns: assigned phoneme.

Matrix 1: 86.622 %
      ae   iy   aa   uw
ae    70    4   11    1
iy     1   96    0    3
aa    11    0   78    4
uw     1    2    2   15

Matrix 2: 86.288 %
      ae   iy   aa   uw
ae    73    5    8    0
iy     7   92    1    0
aa    10    0   82    1
uw     2    2    5   11

Matrix 3: 85.953 %
      ae   iy   aa   uw
ae    68    7    7    4
iy     9   94    0    2
aa     7    0   82    4
uw     1    1    5   13

Matrix 4: 82.943 %
      ae   iy   aa   uw
ae    67   10    9    0
iy     5   93    1    1
aa    13    0   77    3
uw     1    1    7   11
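The quoted percentages are the overall accuracy, that is, the sum of the diagonal divided by the number of test tokens. The sketch below recomputes it for the first vowel matrix (the one quoted at 86.622 %). It treats rows as the true phoneme, which is an inference from the row sums matching the vowel test counts (86, 100, 93, 20); the variable names are illustrative.

```python
import numpy as np

labels = ["ae", "iy", "aa", "uw"]
confusion = np.array([
    [70,  4, 11,  1],
    [ 1, 96,  0,  3],
    [11,  0, 78,  4],
    [ 1,  2,  2, 15],
])

# Overall accuracy: correctly classified tokens over all test tokens
accuracy = 100.0 * np.trace(confusion) / confusion.sum()

# Per-phoneme recognition rate: diagonal over row sums
per_class = 100.0 * np.diag(confusion) / confusion.sum(axis=1)

print(f"overall: {accuracy:.3f} %")
for name, rate in zip(labels, per_class):
    print(f"{name}: {rate:.1f} %")
```

The per-phoneme rates show what the single overall figure hides: ‘uw’, with only 20 test tokens, is recognized less reliably than ‘iy’.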

Results for Fricatives

Four confusion matrices from a 2×2 grid (feature set: Wavelet-DCT, MFCC; classifier: Maximum Likelihood, Maximum A Posteriori), listed in transcript order. Rows: true phoneme; columns: assigned phoneme.

Matrix 1: 79.827 %
      sh    v    s   dh
sh    42    1    9    0
v      0   75    1   21
s     20    2   78    0
dh     0   16    0   82

Matrix 2: 76.081 %
      sh    v    s   dh
sh    42    0    9    1
v      0   64    2   31
s     21    1   77    1
dh     0   16    1   81

Matrix 3: 72.911 %
      sh    v    s   dh
sh    42    0   10    0
v      0   68    0   29
s     20    0   80    0
dh     0   34    1   63

Matrix 4: 78.963 %
      sh    v    s   dh
sh    44    1    7    0
v      0   77    0   20
s     17    0   82    1
dh     1   26    0   71

Results for Plosives

Four confusion matrices from a 2×2 grid (feature set: Wavelet-DCT, MFCC; classifier: Maximum Likelihood, Maximum A Posteriori), listed in transcript order. Rows: true phoneme; columns: assigned phoneme.

Matrix 1: 54.331 %
       b    p    d    k
b     50   10    8   13
p     11   54   12   23
d     23    7   52   18
k      5   27   17   51

Matrix 2: 54.068 %
       b    p    d    k
b     50   13    9    9
p     10   57    9   24
d     19   11   53   17
k      3   37   14   46

Matrix 3: 55.906 %
       b    p    d    k
b     46   15   18    2
p     11   64    8   17
d     23    9   52   16
k      4   28   17   51

Matrix 4: 52.000 % as stated on the slide (the diagonal of this matrix sums to 224 of 381 tokens, which is 58.793 %)
       b    p    d    k
b     48   17   11    5
p      6   68   10   16
d     23   11   53   13
k      2   27   16   55

Results of Category Classification

• Wavelets perform considerably better than MFCC in discriminating between vowels on one side and fricatives (95% vs. 90%) or plosives (98% vs. 95%) on the other.

• For classifying between fricatives and plosives, wavelets fall only marginally behind MFCC (90% vs. 91%).

Conclusions and Future Work

• The results obtained from the wavelet-based feature extraction are quite promising.

• Design wavelets that are optimized for the task at hand.

• Develop an algorithm that selects the optimum decomposition for a family of signals.

• Incorporate confidence scoring.

• Investigate the fractional moments further.

References

• G. Strang and T. Nguyen, Wavelets and Filter Banks. Wellesley-Cambridge Press, 1997.

• P. P. Vaidyanathan, “Lossless systems in wavelet transforms”. IEEE International Symposium on Circuits and Systems, 1991.

• K. Kim, D. H. Youn, and C. Lee, “Evaluation of wavelet filters for speech recognition”. IEEE International Conference on Systems, Man, and Cybernetics, vol. 4, pp. 2891-2894, 2000.

• Z. Tufekci and J. N. Gowdy, “Feature extraction using discrete wavelet transform for speech recognition”. Proceedings of the IEEE Southeastcon 2000, pp. 116-123, 2000.

• B. T. Tan, M. Fu, and A. Spray, “The use of wavelet transforms in phoneme recognition”. Proceedings of the Fourth International Conference on Spoken Language (ICSLP 96), vol. 4, pp. 2431-2434, Oct. 3-6, 1996.

• B. T. Tan, R. Lang, H. Schroder, A. Spray, and P. Dermody, “Applying wavelet analysis to speech segmentation and classification”. Wavelet Applications, Harold H. Szu, Editor, Proc. SPIE 2242, pp. 750-761, 1994.