

Neural network adaptive wavelets for signal representation and classification

Harold H. Szu
Brian Telfer
Department of the Navy
Naval Surface Warfare Center
Code R44
10901 New Hampshire Avenue
Silver Spring, Maryland 20903-5000

Shubha Kadambe
A. I. duPont Institute
Applied Science and Engineering Laboratories
Wilmington, Delaware 19899

Abstract. Methods are presented for adaptively generating wavelet templates for signal representation and classification using neural networks. Different network structures and energy functions are necessary and are given for representation and classification. The idea is introduced of a "super-wavelet," a linear combination of wavelets that itself is treated as a wavelet. The super-wavelet allows the shape of the wavelet to adapt to a particular problem, which goes beyond adapting parameters of a fixed-shape wavelet. Simulations are given for 1-D signals, with the concepts extendable to imagery. Ideas are discussed for applying the concepts in the paper to phoneme and speaker recognition.

Subject terms: wavelet transforms; classification; feature selection; neural networks; phoneme recognition; signal approximation; signal representation; speaker recognition.

Optical Engineering 31(9), 1907-1916 (September 1992).

Paper WT-011 received April 20, 1992; revised manuscript received June 4, 1992; accepted for publication June 8, 1992. © 1992 Society of Photo-Optical Instrumentation Engineers. 0091-3286/92/$2.00.

1 Introduction

Wavelets show promise for both signal (or image) representation and classification. Signal representation using wavelets has received by far the most attention.1-6 Representation and classification both can be viewed as feature extraction problems in which the goal is to find a set of daughter wavelets (dilations and shifts of a mother wavelet) that either best represent the signal or best separate various signal classes in the resulting feature space. However, the best set of wavelets for representation will not necessarily be the same as the best set for classification, or vice versa. This is because representation emphasizes the humps of a distribution, while classification emphasizes the overlapping tails, which tend to be close to the decision boundaries.

We present examples of how wavelets can be adaptively computed for representation and classification. By "adaptive," we mean that either the wavelet parameters or the wavelet shape are iteratively computed to minimize an energy function for a specific application. This differs from most previous applications, which test a large fixed set of wavelets and then discard the ones that contribute least. In addition, a new concept of a "superposition-wavelet," or in short, "super-wavelet," is introduced. The super-wavelet is a linear combination of adaptive wavelets that is itself treated as a wavelet, in that dilations of a super-wavelet handle scale changes in a signal. The introduction of a super-wavelet means that the fundamental shape of the wavelet can be adapted to particular applications, rather than just the parameters of a fixed-shape wavelet.

As noted above, wavelets have rarely been applied to classification. One paper has considered this,7 but has used human-selected rather than adaptive wavelets. We now review major approaches that have been used for representation. Most approaches, e.g., Refs. 1 and 2, use a fixed mother wavelet with varying shift and dilation parameters. Clearly, higher compression ratios can be obtained by choosing the mother wavelet that best fits the data. Coifman and Wickerhauser3 find the best (in terms of minimum entropy) mother wavelets from a library of possible mother wavelets for compressing a signal. They have produced a fast algorithm (N log N, where N is the signal length) for doing so. One would expect that higher compression ratios can be obtained by adaptively computing the wavelet, rather than selecting one from a fixed library, although this is likely to be at the cost of greater computation time. Tewfik, Singha, and Jorgensen4 propose a complex but efficient approach for computing the best orthogonal mother wavelet from a scaling function. Several methods that combine neural networks and wavelets have been considered. Daugman1 uses a neural network to learn the best set of coefficients for approximating an image with a set of Gabor wavelets. Pati and Krishnaprasad5 also find the best coefficients for a wavelet expansion using a neural network. However, a more adaptive approach is to learn the wavelet parameters using neural networks, as Zhang and Benveniste6 have done for approximating a function. Our approach is similar to Ref. 6 but differs in important aspects, namely, the wavelet function, the approximation function, the learning algorithm, and the super-wavelet concept. We also address classification as well as representation.

Throughout this paper, we use examples based on speech signals. That is because we feel this adaptive wavelet approach has great potential for both speech and machine-made sounds (as well as images). However, the principal purpose of the paper is to demonstrate the concept of adaptive wavelets, and not to actually solve any aspect of the speech problem, which remains for future work.

OPTICAL ENGINEERING / September 1992 / Vol. 31 No. 9 / 1907


Section 2 formulates the problem and the different neural network structures used for representation and classification. Simulation examples are given in Sec. 3, which also discusses the super-wavelet concept. Section 4 discusses how this approach might apply to phoneme and speaker recognition. Section 5 offers conclusions. An appendix provides physiological insights into speech characteristics.

2 Formulation

A network and formulation for signal representation is offered first and is followed by signal classification.

2.1 Representation

A signal s(t) can be approximated by daughters of a mother wavelet h(t) according to

    ŝ(t) = Σ_{k=1}^{K} w_k h[(t − b_k)/a_k] ,   (1)

where the w_k, b_k, and a_k are the weight coefficients, shifts, and dilations for each daughter wavelet. This approximation can be expressed as the neural network of Fig. 1, which contains wavelet nonlinearities in the artificial neurons rather than the standard sigmoidal nonlinearities.

Fig. 1 Example neural network architecture for wavelet signal approximation, where the time value t feeds into the K nodes with wavelet nonlinearities.

This architecture is similar to a radial basis function (RBF) neural network,31,32 because symmetric wavelets form a family of RBFs specified by the dilation parameter. The network parameters w_k, b_k, and a_k can be optimized by minimizing an energy function. We employ the least-mean-squares (LMS) energy for signal representation,

    E = (1/2) Σ_{t=1}^{T} [s(t) − ŝ(t)]² .   (2)

A simple extension of Eq. (2) would be to produce an approximation over multiple realizations of a particular waveform to reduce noise and extract commonality. Adopting the mother wavelet

    h(t) = cos(1.75t) exp(−t²/2) ,   (3)

and letting t' = (t − b_k)/a_k, the gradients of E [Eq. (2)] are

    g(w)_k = ∂E/∂w_k = −Σ_{t=1}^{T} [s(t) − ŝ(t)] cos(1.75t') exp(−t'²/2) ,   (4)

    g(b)_k = ∂E/∂b_k = −Σ_{t=1}^{T} [s(t) − ŝ(t)] w_k [1.75 sin(1.75t') exp(−t'²/2)/a_k + cos(1.75t') exp(−t'²/2) t'/a_k] ,   (5)

    g(a)_k = ∂E/∂a_k = −Σ_{t=1}^{T} [s(t) − ŝ(t)] w_k [1.75 sin(1.75t') exp(−t'²/2) t'/a_k + cos(1.75t') exp(−t'²/2) t'²/a_k] .   (6)

We use a conjugate gradient method8 to minimize E. Forming the column vectors g(w) and w from the elements g(w)_k and w_k, the i'th iteration for minimizing E with respect to w proceeds according to the following two steps [s(w) is the search direction as a function of w]:

    1. If i is a multiple of n, then
           s(w)^i = −g(w)^i ,   (7)
       else
           s(w)^i = −g(w)^i + {[g(w)^i]^T g(w)^i / [g(w)^{i−1}]^T g(w)^{i−1}} s(w)^{i−1} .   (8)

    2. w^{i+1} = w^i + α s(w)^i .   (9)

Step 1 computes a search direction s at iteration i. Step 2 computes the new weight vector using a variable step size α. At each iteration, steps 1 and 2 are computed for the representation parameter vectors w, a, and b. It is preferable to perform a line search to find the best step size (this can greatly reduce the number of iterations needed for convergence), but to demonstrate the concept of adaptive wavelets we use fixed step sizes for simplicity.

2.2 Classification

Wavelets appear promising as a feature space for classification. The features extracted in this case are the vector inner products of a set of wavelets with the input signal. These features can then be input to a classifier. A major issue is which wavelets to select. As an example of an adaptive solution to this problem, we consider the combined classifier and wavelet feature detector given by

    v_n = σ(u_n) = σ{Σ_{k=1}^{K} w_k Σ_{t=1}^{T} s_n(t) h[(t − b_k)/a_k]} ,   (10)

where v_n is the output for the n'th training vector s_n(t) and σ(z) = 1/[1 + exp(−z)], a sigmoidal function. This classifier can be depicted as the neural network of Fig. 2, which uses wavelet weights rather than the wavelet nonlinearities of the representation network in Fig. 1. The lower part of Fig. 2 produces inner products of the signal and wavelets, with the first wavelet on the left and the K'th wavelet on the right.

Fig. 2 Example neural network architecture for classifier with wavelet features (after synthesis, all weights compress to a single layer because of linearities).

Figure 2 shows two layers of weights, but once the network is synthesized, the two layers collapse into one because a nonlinearity does not exist between layers. The classification parameters w_k, b_k, and a_k can be optimized by minimizing, e.g., for two well-separated classes,

    E = Σ_{n=1}^{N} (d_n − v_n)² ,   (11)

where d_n is the desired classifier output for s_n(t). We set d_n = 1 for one class and d_n = 0 for the other. Extensions or other approaches are certainly possible. For example, a more complex multilayer network or a network with multiple output elements to handle more than two classes could be adopted instead of the classifier in Eq. (10). More than one mother wavelet could be included in Eq. (10) [e.g., in addition to h(t), one could include another mother g(t)]. Also, a different measure could be used to determine the optimal features, such as the Fisher ratio9 or a minimax criterion,10 which minimizes the intraclass variation of the training vectors and maximizes the interclass separation, or a minimum-misclassification-error criterion for overlapping classes.11 However, the approach of Eqs. (10) and (11) suffices to demonstrate the concept of adaptive wavelet feature generation. Employing the wavelet of Eq. (3) and letting t' = (t − b_k)/a_k and σ'(u) = ∂σ(u)/∂u = σ(u)[1 − σ(u)], the gradients of E [Eq. (11)] are

    g(w)_k = −Σ_{n=1}^{N} (d_n − v_n) σ'(u_n) Σ_{t=1}^{T} s_n(t) cos(1.75t') exp(−t'²/2) ,   (12)

    g(b)_k = −Σ_{n=1}^{N} (d_n − v_n) σ'(u_n) Σ_{t=1}^{T} s_n(t) w_k [1.75 sin(1.75t') exp(−t'²/2)/a_k + cos(1.75t') exp(−t'²/2) t'/a_k] ,   (13)

    g(a)_k = −Σ_{n=1}^{N} (d_n − v_n) σ'(u_n) Σ_{t=1}^{T} s_n(t) w_k [1.75 sin(1.75t') exp(−t'²/2) t'/a_k + cos(1.75t') exp(−t'²/2) t'²/a_k] .   (14)

Conjugate gradient descent is used to minimize Eq. (11), as described in Sec. 2.1. We have proposed different network structures and energy functions for representation and classification. We next present results run on sample data for the formulations in this section.
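As a concrete illustration of Sec. 2.1, the wavelet approximation of Eq. (1) and the gradients of Eqs. (4)-(6) can be sketched in a few lines of numpy. This is a minimal sketch, not the authors' code: plain fixed-step gradient descent stands in for the conjugate-gradient recursion of Eqs. (7)-(9), and the step sizes and initial shifts below are illustrative assumptions.

```python
import numpy as np

def h(t):
    """Mother wavelet of Eq. (3): a cosine-modulated Gaussian."""
    return np.cos(1.75 * t) * np.exp(-t**2 / 2)

def approximate(t, w, a, b):
    """Eq. (1): s_hat(t) = sum_k w_k h[(t - b_k)/a_k]."""
    tp = (t[:, None] - b[None, :]) / a[None, :]   # t' for every (t, k) pair
    return h(tp) @ w

def gradients(t, s, w, a, b):
    """Analytic gradients of the LMS energy of Eq. (2), i.e., Eqs. (4)-(6)."""
    tp = (t[:, None] - b[None, :]) / a[None, :]
    env = np.exp(-tp**2 / 2)
    r = s - approximate(t, w, a, b)               # residual s - s_hat
    # -dh/dt' = 1.75 sin(1.75 t') env + t' cos(1.75 t') env
    dh = 1.75 * np.sin(1.75 * tp) * env + np.cos(1.75 * tp) * env * tp
    gw = -(r[:, None] * np.cos(1.75 * tp) * env).sum(axis=0)   # Eq. (4)
    gb = -(r[:, None] * w * dh / a).sum(axis=0)                # Eq. (5)
    ga = -(r[:, None] * w * dh * tp / a).sum(axis=0)           # Eq. (6)
    return gw, ga, gb

def fit(t, s, K=6, iters=300, lr_w=1e-2, lr_a=1e-3, lr_b=1e-3):
    """Adapt w, a, b by fixed-step gradient descent (illustrative step sizes)."""
    w = np.zeros(K)                               # weights start at 0, as in Sec. 3.1
    a = np.full(K, 8.0)                           # dilations start at 8, as in Sec. 3.1
    b = np.linspace(t.min(), t.max(), K)          # assumed initial shift spacing
    for _ in range(iters):
        gw, ga, gb = gradients(t, s, w, a, b)
        w -= lr_w * gw
        a -= lr_a * ga
        b -= lr_b * gb
    return w, a, b
```

Running `fit` on a sampled signal returns the adapted weights, dilations, and shifts, from which `approximate` reconstructs ŝ(t).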

3 Simulations

Simulation results for signal representation are offered first, followed by signal classification.

3.1 Representation

To demonstrate how adaptive wavelets can approximate functions, we consider three phonemes, "a," "e," and "i," that were extracted from speech signals and which are shown in Fig. 3. (These are long vowels spoken in isolation.) In this section, we consider the phonemes as generic signals to demonstrate the neural network's operation. In Sec. 4, we consider how this method could be applied to phoneme and speaker recognition in future work. Note that each phoneme in Fig. 3 is periodic. We approximate only a single period of each. The solid lines in Fig. 4 show the extracted periods, where the signal on either side of the period has been windowed with a Gaussian falloff with a standard deviation of two pixels. The dashed lines in Fig. 4 show the wavelet approximations, which we now describe. The wavelet in Eq. (3) with a = 8 is shown in Fig. 5. To determine the number and initial placement for each wavelet, we convolve this wavelet with the signal and place a mother wavelet at each location where a peak occurs. The number of wavelets selected for "a," "e," and "i" was 11, 6, and 14. All weights and dilations were initialized to 0 and 8, respectively. The gradient descent algorithm was run for 500 iterations (batch mode) with step sizes α_w = 10^-2 and α_a = α_b = 10^-2 and a restart cycle of n = 10. Letting e = s − ŝ (where the boldface denotes column vectors containing the time samples of the signal, etc.), we measure a normalized approximation error e^T e / s^T s, so that the error between s and an all-zero vector 0 equals 1. The approximation errors for "a," "e," and "i" are 0.210, 0.231, and 0.083. Tables 1, 2, and 3 show the final parameters for each approximation. Note that the dilations range over an order of magnitude, with the smallest being 0.207 times the initial value of 8, and the largest being twice the initial value. The maximum change between an initial and final shift is 9.8, or roughly half of the initial spacing between shifts. Thus, the neural network has adaptively created a wide range of daughter wavelets and has produced good approximations.
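The convolution-based initialization just described (place a mother wavelet at each peak of the wavelet/signal convolution) can be sketched as follows. This is a generic numpy version; the kernel truncation half-width is an assumed detail, not taken from the paper.

```python
import numpy as np

def h(t):
    """Mother wavelet of Eq. (3)."""
    return np.cos(1.75 * t) * np.exp(-t**2 / 2)

def initial_shifts(signal, a=8.0, halfwidth=32):
    """Convolve the dilated mother wavelet (a = 8, as in Sec. 3.1) with the
    signal and return the sample indices of local maxima of the response;
    each index becomes an initial shift b_k, and their count sets K.
    `halfwidth` truncates the kernel support and is an assumed detail."""
    u = np.arange(-halfwidth, halfwidth + 1, dtype=float)
    kernel = h(u / a)
    resp = np.convolve(signal, kernel, mode="same")
    return np.array([i for i in range(1, len(resp) - 1)
                     if resp[i - 1] < resp[i] >= resp[i + 1]])
```

Because the wavelet is oscillatory, the response has side-lobe maxima as well as main-lobe peaks, which is consistent with the method placing several wavelets per signal period.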

The purpose of these simulations is to show the potential of neural networks for adaptively creating a wavelet approximation, not to produce an efficient production code for doing so. The program required three CPU minutes for approximating the single period of "i," which is represented by 14 wavelets, and required proportionately less for the other two phonemes. The speed of neural network synthesis could be dramatically improved by (1) incorporating



Fig. 3 Phonemes: (a) "a," (b) "e," and (c) "i."

Fig. 4 Single periods of phonemes extracted from Fig. 3 signals (solid lines) and adaptive wavelet approximations (dashed lines): (a) "a," (b) "e," and (c) "i."


Table 1 Weights, dilations, and shifts for adaptive wavelet approximation of "a."

Wavelet   Weights   Dilations   Shifts b
Number    w         a           Initial   Final
1          165      4.72         86        90.7
2          238      2.42         98        94.9
3          700      11.0        111       111
4         15.0      8.89        127       128
5         85.9      6.73        140       139
6         -110      8.47        152       151
7         49.8      6.39        166       165
8         -220      10.7        181       185
9         -394      8.22        190       200
10        -468      2.66        201       204
11        -259      1.66        214       209

a line search that computes the best step sizes α at each iteration, (2) including a stopping criterion, (3) creating better initial values for the weights and dilations, and (4) running the algorithm on specialized neural network hardware.

Note that the wavelet approximation of each signal describes a super-wavelet, which is a linear combination of wavelets that itself can be treated as a wavelet. Dilations of the super-wavelet could represent the same period of speech spoken at different speeds. Section 4 provides a more detailed discussion.
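As a concrete illustration of the super-wavelet idea, the fitted linear combination can be wrapped up as one template and dilated as a whole. This is a small sketch, not from the paper; the dilation argument `A` is our illustrative notation.

```python
import numpy as np

def h(t):
    """Mother wavelet of Eq. (3)."""
    return np.cos(1.75 * t) * np.exp(-t**2 / 2)

def super_wavelet(w, a, b):
    """Wrap fitted parameters into one template S(t) = sum_k w_k h[(t - b_k)/a_k]."""
    w, a, b = (np.asarray(x, dtype=float) for x in (w, a, b))
    def S(t, A=1.0):
        """Evaluate the super-wavelet dilated as a whole, S(t/A); A > 1
        stretches the template, e.g., the same speech period spoken
        more slowly."""
        tp = (np.asarray(t, dtype=float)[..., None] / A - b) / a
        return h(tp) @ w
    return S
```

The point of the construction is that a single scalar A rescales the entire learned shape, whereas the individual a_k control the shape itself.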

3.2 Classification

To generate a simple training set for demonstration purposes, we segmented three single-period training vectors from each of the "a," "e," and "i" signals. The length of each period was adjusted to be identical, and all nine signals were normalized to unit norm. These training vectors are plotted in Figs. 6(a) through 6(c). Because the three classes are quite different, they pose a simple recognition problem. To make the problem more challenging, 10 additional vectors for each class were synthesized by adding Gaussian noise with σ = 0.2 (SNR = 6 dB) to the first period extracted from each class. A representative noisy "a" training vector is plotted in Fig. 6(d). Thus the training set contained 39 vectors. A test set was not used because we are simply demonstrating the concept of adaptive wavelet feature selection and not testing a real application. As in Sec. 3.1, these signals are treated as generic and static (we compute

Table 2 Weights, dilations, and shifts for adaptive wavelet approximation of "e."

Wavelet   Weights   Dilations   Shifts b
Number    w         a           Initial   Final
1          96.9     10.1         98        96.4
2           588     12.4        114       114
3          -193     9.32        136       139
4           182     9.24        158       158
5          -372     16.0        186       184
6         -65.3     9.81        203       203

Table 3 Weights, dilations, and shifts for adaptive wavelet approximation of "i."

Wavelet   Weights   Dilations   Shifts b
Number    w         a           Initial   Final
1          62.6     7.86         92        93.9
2          1720     3.68        103       103
3          35.3     7.05        112       113
4          1070     4.13        121       122
5          -173     9.09        131       130
6           138     7.89        139       140
7          -511     4.34        148       147
8           136     7.28        157       157
9          -104     7.48        164       164
10          104     7.39        172       172
11         -271     7.29        181       180
12         -295     7.84        190       193
13         -492     9.29        203       205
14         1050     2.20        215       212

only vector inner products, not correlations), and this is not intended to be a test of phoneme recognition.

A two-class case was tested, with all "a" vectors forming one class (desired output of 1) and all "e" and "i" vectors forming the other (desired output of 0). We chose to use four wavelet features after empirically determining that number was sufficient to classify the data. The Eq. (3) wavelets were initialized to equal dilations a_k = 16 (k = 1, ..., 4), shifts evenly spaced across the signals, b_1 = 62, b_2 = 87, b_3 = 112, b_4 = 137, and zero-valued weights. Leaving the wavelets fixed at the initial values, we first minimized E [Eq. (11)] by adapting only the weights (150 iterations, α_w = 0.1, α_a = α_b = 0.0, restart cycle of 10). This resulted in five classification errors, or a 13% misclassification rate. Minimizing with adaptive wavelet features (150 iterations, α_w = α_a = α_b = 0.1) reduced the classification errors to 1, or 2.5%. This demonstrates the point that adaptive wavelet features can produce much better classification rates than ad hoc fixed wavelet features. Table 4 gives the resulting parameters for each case and shows that for the adaptive case, the dilations and shifts changed from their initial values. Figure 7 shows the resulting wavelet features for each case. Because the classifier linearly combines the wavelet features, Fig. 7 plots the linear combination of the weighted wavelets. (Linear combination of wavelets has previously been used to form a detection filter,7 but the wavelets were fixed and not adaptive.) Figure 7 clearly shows the differences between the fixed and adaptive wavelet features. In a real application, one would want to pick good initial values for the wavelet parameters and then optimize them as demonstrated in this paper. Good initial values are important to avoid local minima of energy functions such as Eq. (11). The combination of wavelets in Fig. 7(b) forms a super-wavelet meant for classification rather than representation.
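The adaptive-feature training loop of Sec. 2.2 [Eqs. (10)-(14)] can be sketched as below. This is not the authors' code: fixed-step gradient descent replaces the conjugate-gradient minimization, the constant factor of 2 from differentiating Eq. (11) is absorbed into the step size, and the initialization only loosely mirrors Sec. 3.2.

```python
import numpy as np

def h(t):
    """Mother wavelet of Eq. (3)."""
    return np.cos(1.75 * t) * np.exp(-t**2 / 2)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_classifier(X, d, K=4, iters=150, lr=0.1):
    """Adapt wavelet features and weights for the classifier of Eq. (10)
    by descending the energy of Eq. (11) with the gradients of
    Eqs. (12)-(14). X is an (N, T) array of training vectors, d the
    0/1 desired outputs."""
    N, T = X.shape
    t = np.arange(T, dtype=float)
    w = np.zeros(K)                                  # zero initial weights, as in Sec. 3.2
    a = np.full(K, 16.0)                             # equal initial dilations, as in Sec. 3.2
    b = np.linspace(t[0], t[-1], K + 2)[1:-1]        # evenly spaced initial shifts
    for _ in range(iters):
        tp = (t[:, None] - b) / a                    # (T, K)
        env = np.exp(-tp**2 / 2)
        H = np.cos(1.75 * tp) * env                  # daughter wavelets, one per column
        F = X @ H                                    # inner-product features of Eq. (10)
        v = sigmoid(F @ w)                           # classifier outputs v_n
        delta = (d - v) * v * (1.0 - v)              # (d_n - v_n) sigma'(u_n)
        dh = 1.75 * np.sin(1.75 * tp) * env + np.cos(1.75 * tp) * env * tp
        gw = -(delta @ F)                            # Eq. (12)
        gb = -(delta @ (X @ (dh / a))) * w           # Eq. (13)
        ga = -(delta @ (X @ (dh * tp / a))) * w      # Eq. (14)
        w -= lr * gw
        a -= lr * ga
        b -= lr * gb
    return w, a, b
```

Setting the step sizes for a and b to zero recovers the fixed-feature baseline, in which only the weights adapt.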

    Features rather than raw data are used for classificationfor several reasons: (1) reducing the dimension of data makes

Fig. 5 Wavelet given by Eq. (3) with a = 8.

Table 4 Weights, dilations, and shifts for fixed and adaptive wavelet features for classification.

Wavelet   Fixed Features          Adaptive Features
Number    w      a      b         w      a      b
1         2.59   16.0    62.0     3.89    9.8    59.7
2         2.73   16.0    87.0     3.59   13.7    84.7
3         0.24   16.0   112.0     0.76   15.7   111.9
4         2.40   16.0   137.0     4.30   13.5   133.4

the problem more overdetermined by the training set and therefore can increase the classification rate, (2) reducing the dimension of the data speeds up training, and (3) features can incorporate invariances such as scale, translation, etc., to avoid impractically large training sets. The adaptive wavelet features we have demonstrated primarily address the first reason. The adaptive nature requires more computation time than fixed features, but this second reason is a relatively minor issue, since training is normally performed off-line. This paper has not addressed the third reason, which represents an important issue for future work. Invariances seem less important for speech signals than for images, since scale changes in speech can conceivably be handled by wavelet dilations, while classifying objects in images often requires rotation invariance.

Sections 2 and 3 clearly show that representation and classification significantly differ in terms of the network structure, the type of criterion that is optimized, and the resulting wavelets. However, both approaches can be used for recognition, as described in Sec. 4.

4 Speech Case Study

To make the concepts presented in the previous sections more concrete, we consider how these ideas might apply to phoneme and speaker recognition. Implementation of these ideas for a particular application remains for future work.

American English speech is composed of 42 basic sounds, or phonemes.12 The phonemes are broadly classified as being voiced, unvoiced, and plosive.13 Voiced sounds are periodic


Fig. 6 Training vectors: three single periods (solid, dashed, and dashed-dot lines) from (a) "a," (b) "e," (c) "i," and (d) noisy "a" training vector (ten noisy training vectors used for each class).


Fig. 7 Wavelet features: (a) fixed and (b) adaptive.

or semiperiodic (e.g., the "a," "e," and "i" phonemes in Fig. 3). Unvoiced sounds are higher frequency and more noiselike. The waveform of a phoneme varies from phoneme to phoneme, from speaker to speaker, and from pronunciation to pronunciation for the same speaker. Figure 8 shows examples of speech using the words "the seat," in which the periodic nature of the voiced sounds is visible and the high-frequency noiselike quality of unvoiced sounds can be seen in the "s" and "t." Also, the two signals give an idea of how speech differs in frequency and envelope between two speakers, particularly between male and female speakers. More details on the physiology of speech characteristics and speaker differences are given in Sec. 6. We make use of the waveform variability of phonemes to suggest a phoneme recognition system, and the variability of the same phoneme from speaker to speaker to suggest a speaker recognition system using adaptive super-wavelets. Sections 4.1 and 4.2 discuss a phoneme recognition and a speaker recognition system, respectively.

Fig. 8 American (a) male and (b) female speakers saying "the seat," extracted from conversational speech with a 16-kHz sampling rate.

4.1 Phoneme Recognition

Phoneme recognition systems are used in automatic speech recognition systems, and related phoneme generators are used in speech synthesis systems. Several phoneme recognizers have been developed.14-18 These systems exploit the variations in the phonemes' spectra by computing the spectrum of a small segment of a speech signal and then computing the mel or bark scale coefficients from the power spectrum. These features are classified by classical methods14 and neural network approaches.15-18
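The mel-scale coefficients mentioned above can be sketched as triangular filters spaced evenly on the mel scale and applied to a frame's power spectrum. Filter count, FFT size, and the test tone below are illustrative choices, not values from the paper.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank_energies(power_spectrum, sample_rate, n_filters=12):
    """Apply triangular mel-spaced filters to a one-sided power spectrum."""
    n_bins = len(power_spectrum)
    # Filter edge frequencies spaced evenly on the mel scale
    mel_points = np.linspace(0.0, hz_to_mel(sample_rate / 2.0), n_filters + 2)
    bins = np.floor(mel_to_hz(mel_points) / (sample_rate / 2.0)
                    * (n_bins - 1)).astype(int)
    energies = np.zeros(n_filters)
    for i in range(n_filters):
        lo, c, hi = bins[i], bins[i + 1], bins[i + 2]
        for k in range(lo, hi):
            if k < c and c > lo:        # rising edge of the triangle
                weight = (k - lo) / (c - lo)
            elif k >= c and hi > c:     # falling edge
                weight = (hi - k) / (hi - c)
            else:
                weight = 0.0
            energies[i] += weight * power_spectrum[k]
    return energies

# Example: power spectrum of one windowed frame of a 16-kHz signal
frame = np.hanning(512) * np.sin(2 * np.pi * 500.0 * np.arange(512) / 16000.0)
ps = np.abs(np.fft.rfft(frame)) ** 2
feats = mel_filterbank_energies(ps, 16000.0)
```

A 500-Hz tone lands in one of the low-frequency filters, illustrating how the mel spacing concentrates resolution where speech energy lies.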

Adaptive wavelets offer two potential approaches. First, a super-wavelet could be generated for each phoneme using the function representation method of Sec. 2.1, optimized over multiple speakers. The super-wavelet might be fashioned to represent several periods of a phoneme to improve the SNR. The set of super-wavelets then forms a bank of filters that can be correlated with a speech signal. The correlation peaks would identify the phonemes. Dilated versions of the super-wavelets could be used to identify speech at different speeds. This idea is shown conceptually in Fig. 9, which plots correlations of the super-wavelets in Fig. 4 (normalized to unit norm) with the full phoneme signals in Fig. 3. The correlation peaks clearly indicate the occurrence of each period and the type of phoneme. For example, Fig. 9(a) plots the correlations of the three super-wavelets with the "a" phoneme, and the highest correlation peaks are in the solid-line plot produced by the "a" super-wavelet. The correlation peaks decrease over time because the signal strength is decreasing. Clearly, the local signal strength must be taken into account in such an approach. This is a simplistic example, in that Fig. 9 is testing on the training data (at least for the first period of the signal) and the wavelet approximations have not been produced over multiple phoneme realizations, but Fig. 9 is only meant to show the concept. This approach resembles template matching, except that the super-wavelet is produced from multiple phoneme realizations and wavelet dilations handle speed changes.
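The filter-bank idea above can be sketched as follows, with synthetic sinusoid templates standing in for the super-wavelets of Fig. 4 (an assumption; the real templates would come from the adaptive representation network). Each unit-norm template slides along the signal, and the class with the highest correlation peak wins.

```python
import numpy as np

def correlate_template(signal, template):
    """Sliding inner product of a unit-norm template with the signal."""
    template = template / np.linalg.norm(template)
    return np.correlate(signal, template, mode='valid')

def dilate(template, a):
    """Resample a template to simulate dilation by factor a, to match
    speech produced at a different speed."""
    n = len(template)
    x = np.linspace(0, n - 1, int(round(n * a)))
    return np.interp(x, np.arange(n), template)

# Synthetic stand-ins for super-wavelet templates (one per phoneme class)
t = np.linspace(0, 2 * np.pi, 64)
templates = {'a': np.sin(t), 'e': np.sin(2 * t), 'i': np.sin(3 * t)}

# A "speech" signal containing three repetitions of the 'e' pattern
signal = np.tile(templates['e'], 3)

peaks = {name: float(correlate_template(signal, tpl).max())
         for name, tpl in templates.items()}
best = max(peaks, key=peaks.get)   # class with the highest correlation peak
```

As the text notes, a practical version would also normalize by local signal strength so that decaying signal amplitude does not suppress later peaks.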

In the second approach, a classifier with adaptive wavelet features as in Sec. 2.2 could be used to identify phonemes. This is similar to the classifiers described above, except that instead of features taken from a spectrogram, the adaptive wavelets generate wideband transient features that are tailored to the problem. By optimizing the features for the problem, fewer features should be needed and better classification rates could result. The adaptive wavelet approach seems best suited to the better characterized waveforms of voiced rather than unvoiced phonemes.
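A minimal sketch of this second approach, assuming a real-valued Morlet-style wavelet and a toy two-class problem (both assumptions), with a finite-difference gradient standing in for analytic gradients: the wavelet parameters (a, b) are iteratively adjusted to reduce an energy function that rewards class separation, which is the "adaptive" step described in the text.

```python
import numpy as np

def morlet(t):
    # Real-valued Morlet-style wavelet; the exact mother wavelet used in
    # the paper is not restated here, so this is an assumption.
    return np.cos(1.75 * t) * np.exp(-t**2 / 2.0)

def features(signal, params):
    """One feature per (a, b) pair: inner product with psi((t - b) / a)."""
    t = np.arange(len(signal))
    return np.array([np.dot(signal, morlet((t - b) / a)) for a, b in params])

def numeric_grad(loss_fn, params, eps=1e-4):
    """Finite-difference gradient of a scalar loss w.r.t. the (a, b) params."""
    g = np.zeros_like(params)
    for idx in np.ndindex(params.shape):
        p = params.copy(); p[idx] += eps
        m = params.copy(); m[idx] -= eps
        g[idx] = (loss_fn(p) - loss_fn(m)) / (2 * eps)
    return g

# Two toy classes: low- and high-frequency signals
t = np.arange(128)
class0 = np.sin(2 * np.pi * t / 64.0)
class1 = np.sin(2 * np.pi * t / 16.0)

params = np.array([[8.0, 32.0], [8.0, 96.0]])  # initial (a, b) per feature
init_params = params.copy()

def loss(p):
    # Push the two classes' feature vectors apart (a simple stand-in for
    # the classification energy functions discussed in the paper)
    return -np.sum((features(class0, p) - features(class1, p)) ** 2)

for _ in range(20):                 # a few gradient-descent steps
    params -= 1e-4 * numeric_grad(loss, params)
```

After a few steps the dilations and shifts drift toward values that make the two classes' feature vectors more distinct, mirroring how adaptive features emphasize class differences rather than signal energy.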

4.2 Speaker Recognition

Two main applications of speaker recognition are (1) verifying a person's identity prior to admitting him to a secured place or to a telephone transaction and (2) associating a person with a voice in police investigations.19 Due to fewer applications of speaker recognition compared to speech recognition and a lack of complete knowledge about which characteristics of a speech signal help in identifying a speaker, speaker recognition has received less emphasis.20-24

In general, automatic speaker recognizers exploit the variability in speech characteristics of different speakers caused by variations in the vocal cords and vocal tract. The differences in different speakers' vocal cords introduce variations in the pitch period (fundamental frequency of a speech signal), and differences in the vocal tract introduce variations in its resonant frequencies and, hence, variations in the waveform or spectrum of a phoneme.
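The pitch period mentioned above can be estimated from a voiced frame with a simple autocorrelation sketch; the lag limits and the synthetic frame are illustrative choices, not values from the paper.

```python
import numpy as np

def pitch_period(frame, min_lag=20, max_lag=400):
    """Estimate the pitch period (in samples) of a voiced frame as the
    lag of the largest autocorrelation peak in a plausible lag range."""
    frame = frame - frame.mean()
    ac = np.correlate(frame, frame, mode='full')[len(frame) - 1:]
    return min_lag + int(np.argmax(ac[min_lag:max_lag]))

# Synthetic voiced frame: 125-Hz fundamental plus one harmonic, 16-kHz rate
fs = 16000
t = np.arange(1024)
frame = (np.sin(2 * np.pi * 125.0 * t / fs)
         + 0.3 * np.sin(2 * np.pi * 250.0 * t / fs))

period = pitch_period(frame)   # expected near fs / 125 = 128 samples
```

A speaker recognizer would track this period over many voiced frames, since its distribution reflects the speaker's vocal cords.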

The speaker recognizers developed so far can be broadly classified into text-dependent and text-independent systems. Text-dependent systems use a specially designed utterance, whereas text-independent systems operate on previously unknown speech utterances. The error rate is lower in the case of text-dependent systems; however, text-independent systems are more flexible and foolproof. Hence, we consider a text-independent speaker recognition system.

Generally, text-independent speaker recognition systems use a feature set averaged over a long utterance for classification purposes.25-27 The main disadvantage of long-term statistics is that they are often impractical for real-time text-independent applications. However, this problem can be overcome by using phonemes. Few phoneme-based speaker recognition systems have been developed.24,28 One of these24 uses linear-predictive-coding (LPC) cepstral coefficients as features for a quadratic classifier.

Fig. 9 Correlation of (a) "a," (b) "e," and (c) "i" phonemes in Fig. 3 with "a," "e," and "i" super-wavelets in Fig. 4, plotted with solid, dashed, and dashed-dot lines, respectively.
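The LPC cepstral features just mentioned can be sketched as follows: LPC coefficients from the autocorrelation method via the Levinson-Durbin recursion, converted to cepstral coefficients by the standard recursion. The model order and the AR(1) test signal are illustrative assumptions, not details from Ref. 24.

```python
import numpy as np

def lpc(frame, order):
    """LPC coefficients (a[0] = 1) via the autocorrelation method and
    the Levinson-Durbin recursion."""
    n = len(frame)
    r = np.array([np.dot(frame[:n - k], frame[k:]) for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[1:i][::-1])
        k = -acc / err                      # reflection coefficient
        a[1:i] = a[1:i] + k * a[1:i][::-1]  # update previous coefficients
        a[i] = k
        err *= 1.0 - k * k
    return a

def lpc_cepstrum(a, n_ceps):
    """Cepstral coefficients of the LPC model 1/A(z), standard recursion."""
    order = len(a) - 1
    c = np.zeros(n_ceps + 1)
    for n in range(1, n_ceps + 1):
        acc = -a[n] if n <= order else 0.0
        for k in range(1, n):
            if n - k <= order:
                acc -= (k / n) * c[k] * a[n - k]
        c[n] = acc
    return c[1:]

# Illustrative test signal: AR(1) process x[t] = 0.5 x[t-1] + noise
rng = np.random.default_rng(0)
e = rng.standard_normal(4096)
x = np.zeros(4096)
for i in range(1, 4096):
    x[i] = 0.5 * x[i - 1] + e[i]

a = lpc(x, 8)               # a[1] should come out near -0.5
ceps = lpc_cepstrum(a, 8)   # first cepstral coefficient is c_1 = -a[1]
```

For speech, such cepstra computed per voiced frame capture the vocal tract resonances that distinguish speakers.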

The same adaptive wavelet approaches outlined in Sec. 4.1 can be applied to speaker recognition, except in this case the same phoneme from different speakers would form different classes. However, speaker recognition has the advantage that the spectral features of vowels (voiced phonemes) are most useful for speaker recognizers.29 This is convenient for adaptive wavelets, which can better capture the waveforms of voiced sounds than unvoiced ones.

5 Conclusion

Wavelets frequently have been applied to representation, but rarely to classification. We have shown how wavelets can be adaptively computed for either task, using different neural network structures and energy functions best suited for each. The new concept of a super-wavelet allows the wavelet shape to be adaptively computed for a particular problem, rather than only adaptively computing the parameters of a fixed-shape wavelet.

Our concern is primarily with classification rather than representation. For representation, orthogonal wavelets have proven very useful for efficient and fast data compression, e.g., Ref. 3. The adaptive wavelets we studied are not orthogonal, but we see this as less of an issue for classification, where we are trying to find features that separate the classes rather than orthogonal features.
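The non-orthogonality is easy to check numerically: form the Gram matrix of the normalized daughter wavelets and inspect the off-diagonal entries. The sketch below assumes a Mexican-hat mother wavelet and reuses the adaptive (a, b) values from Table 4 purely for illustration.

```python
import numpy as np

def mexican_hat(t):
    # Assumed mother wavelet for illustration; not necessarily the paper's.
    return (1.0 - t**2) * np.exp(-t**2 / 2.0)

# Daughter wavelets with the adaptive (a, b) parameters from Table 4
params = [(9.8, 59.7), (13.7, 84.7), (15.7, 111.9), (13.5, 133.4)]
t = np.arange(160)
daughters = np.array([mexican_hat((t - b) / a) for a, b in params])
daughters /= np.linalg.norm(daughters, axis=1, keepdims=True)

gram = daughters @ daughters.T
off_diag = gram - np.diag(np.diag(gram))
# Nonzero off-diagonal entries confirm the daughters are not orthogonal
max_overlap = float(np.abs(off_diag).max())
```

Overlapping, non-orthogonal daughters are acceptable here because the goal is class separation, not a minimal-redundancy expansion.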

The concepts of adaptive wavelets have been demonstrated on 1-D signals, and a discussion was presented on how these concepts could apply to phoneme and speaker recognition. However, these concepts should also apply to images. In particular, the idea of using dilations of a super-wavelet to handle input scale changes applies to both 1-D signals and images. The idea of adaptively generating an optimal set of wavelet features seems like a powerful approach for both signals and images.

6 Appendix: Speech Characteristics

Speech characteristics arise from physiology. This appendix reviews physiological details to provide insight into interphoneme and interspeaker differences. The human speech production system consists of the lungs, trachea (windpipe), pharynx (throat cavity), and vocal tract (which includes the oral and nasal cavities). Speech sounds are produced by the passage of forced air from the lungs through the trachea into the pharynx. The upper portion of the trachea contains a cartilaginous structure called the larynx. The larynx houses two liplike ligaments called the vocal cords. The slitlike opening between these two vocal cords is called the glottis. The vocal cords are held by arytenoid cartilage. This cartilage facilitates adjusting the tension in the vocal cords. The air from the pharynx then passes through the oral or nasal cavity of the vocal tract depending on whether the velum (soft palate at the rear of the roof of the mouth) is closed or open.

A language can be described by a set of linguistic units called phonemes (distinct speech sounds).19 For example, American English can be described by a set of 42 phonemes.12 The nature of each phoneme varies based on the source of excitation (forced air), i.e., manner of articulation, and the shape of the vocal tract, i.e., place of articulation. The shape of the vocal tract varies while producing various sounds based on the movements of articulators such as the glottis, the pharynx, the velum, the jaw, the tongue, and the lips. The variations in the characteristics of a phoneme based on the shape of the vocal tract can be explained as follows: The vocal tract can be considered similar to an acoustic tube. The forced air from the lungs and the pharyngeal cavity causes the vocal tract to resonate, which modulates the sound waveform, and the resonant frequencies (formant frequencies) of the vocal tract vary depending on the length and shape of the vocal tract. The length of the vocal tract is fixed for a given speaker but varies from speaker to speaker.

Based on the source of excitation, phonemes can be broadly classified into voiced, unvoiced, and mixed voiced or mixed unvoiced sounds.13 The voiced sounds are produced by the periodic or the semiperiodic vibrations of the vocal cords. The period of the vocal cords' vibrations depends on the mass and compliance of the vocal cords and the subglottal pressure (air pressure below the glottis). The unvoiced sounds are produced by a turbulent flow of air created by some constriction in the vocal tract. During the production of unvoiced sounds, the vocal cords are held apart and the glottis is fully open. The mixed sounds are produced by the abrupt release of air pressure built up due to closure at some point in the vocal tract. The abrupt release of air pressure provides transient excitation of the vocal tract. The transient excitation can occur with or without vocal cord vibrations, producing mixed voiced or mixed unvoiced sounds.
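The voiced/unvoiced distinction described above (periodic and low frequency versus noiselike and high frequency) suggests a classic short-time sketch based on the zero-crossing rate; the threshold below is an illustrative assumption, not a value from the paper.

```python
import numpy as np

def frame_stats(frame):
    """Short-time energy and zero-crossing rate of one frame."""
    energy = float(np.mean(frame ** 2))
    zcr = float(np.mean(np.abs(np.diff(np.sign(frame))) > 0))
    return energy, zcr

def is_voiced(frame, zcr_threshold=0.2):
    """Crude voiced/unvoiced decision: voiced frames are periodic and
    low frequency, so their zero-crossing rate is comparatively low."""
    _, zcr = frame_stats(frame)
    return zcr < zcr_threshold

fs = 16000
t = np.arange(512)
voiced = np.sin(2 * np.pi * 150.0 * t / fs)   # periodic, low ZCR
rng = np.random.default_rng(1)
unvoiced = rng.standard_normal(512)           # noiselike, high ZCR
```

Real systems combine the zero-crossing rate with energy and pitch evidence, but even this crude rule separates the two synthetic frames.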

Based on the place of articulation, the speech sounds can be classified into the following eight groups19:

1. Labials: If both lips are held together, the sound is called bilabial; if the lower lip is in contact with the upper teeth, the sound is called labiodental.

2. Dental: If the tongue tip or blade touches the edge or back of the upper incisor teeth, the sound is called dental.

3. Alveolar: If the tongue tip or blade approaches or touches the alveolar ridge (the ridge in the jaw where the teeth sockets are located), then the sound is called alveolar.

4. Palatal: If the tongue blade (dorsum) constricts against the hard palate or if the tongue tip curls, the sound is called palatal.

5. Velar: If the dorsum approaches the soft palate, the sound is called velar.

6. Uvular: If the tongue dorsum approaches the uvula, the sound is called uvular.

7. Pharyngeal: If the pharynx constricts, the sound is called pharyngeal.

8. Glottal: If the vocal cords are either closed or constricted, the sound is called glottal.

From the above description, it is clear that the variety of body parts involved in producing speech creates a rich variation in the spectral and waveform nature of different phonemes.

Acknowledgments

The support of this research by the Naval Surface Warfare Center Dahlgren Division White Oak (NSWCDDWO) Independent Research Program and an Office of Naval Research Young Navy Scientist Award is gratefully acknowledged.

References

1. J. Daugman, "Complete discrete 2-D Gabor transforms by neural networks for image analysis and compression," IEEE Trans. Acoust., Speech, Signal Proc. 36, 1169-1179 (July 1988).
2. R. DeVore, B. Jawerth, and B. Lucier, "Image compression through wavelet transform coding," IEEE Trans. Inf. Theory 38, 719-746 (March 1992).
3. R. Coifman and M. Wickerhauser, "Entropy based algorithms for best basis selection," IEEE Trans. Inf. Theory 38, 713-718 (March 1992).
4. A. Tewfik, D. Sinha, and P. Jorgensen, "On the optimal choice of a wavelet for signal representation," IEEE Trans. Inf. Theory 38, 747-765 (March 1992).
5. Y. Pati and P. Krishnaprasad, "Analysis and synthesis of feedforward neural networks using discrete affine wavelet transformations," Tech. Rep. SRC-TR-90-44, Univ. Maryland Systems Research Center (1991).
6. Q. Zhang and A. Benveniste, "Approximation by nonlinear wavelet networks," Proc. IEEE Int. Conf. Acoustics, Speech, and Signal Processing 5, 3417-3420 (May 1991).
7. D. Casasent, J.-S. Smokelin, and A. Ye, "Optical Gabor and wavelet transforms for scene analysis," Proc. SPIE 1702 (April 1992).
8. R. Fletcher, Practical Methods of Optimization, John Wiley and Sons, New York (1987).
9. R. Duda and P. Hart, Pattern Classification and Scene Analysis, John Wiley and Sons, New York (1973).
10. H. Szu, "Neural networks based on Peano curves and hairy neurons," Telematics Informatics 7(3/4), 403-430 (1990).
11. B. Telfer and H. Szu, "Implementing the minimum-misclassification-error energy function for target recognition," Proc. IEEE Int. Joint Conf. Neural Networks, Baltimore, 4, 214-219 (June 1992).
12. L. R. Rabiner and R. W. Schafer, Digital Processing of Speech Signals, Prentice-Hall, New Jersey (1978).
13. J. W. Pickett, The Sounds of Speech Communication: a Primer of Acoustic Phonetics and Speech Perception, University Park Press, Baltimore (1980).
14. K. Tanaka, "A parametric representation and a clustering method for phoneme recognition: application to stops in a CV environment," IEEE Trans. Acoust., Speech, Signal Proc. 29, 1117-1127 (December 1981).
15. A. Waibel, T. Hanazawa, G. Hinton, K. Shikano, and K. Lang, "Phoneme recognition using time-delay neural networks," IEEE Trans. Acoust., Speech, Signal Proc. 37, 328-339 (March 1989).
16. F. Greco, A. Paoloni, and G. Ravaioli, "A recurrent time-delay neural network for improved phoneme recognition," Proc. IEEE Int. Conf. Acoustics, Speech, and Signal Processing 1, 81-84 (May 1991).
17. M. Nakamura, S. Tamura, and S. Sagayama, "Phoneme recognition by phoneme filter neural network," Proc. IEEE Int. Conf. Acoustics, Speech, and Signal Processing 1, 85-88 (May 1991).
18. J. Takami and S. Sagayama, "A pairwise discriminant approach to robust phoneme recognition by time delay neural networks," Proc. IEEE Int. Conf. Acoustics, Speech, and Signal Processing 1, 89-92 (May 1991).
19. D. O'Shaughnessy, Speech Communication: Human and Machine, Addison-Wesley, New York (1990).
20. U. Goldstein, "Speaker-identifying features based on formant tracks," J. Acoust. Soc. Am. 59, 176-182 (1976).
21. G. Doddington, "Speaker recognition: identifying people from their voice," Proc. IEEE 73, 1651-1664 (1985).
22. F. Soong, A. Rosenberg, L. Rabiner, and B. Juang, "A vector quantization approach to speaker recognition," Proc. IEEE Int. Conf. Acoustics, Speech, and Signal Processing, pp. 387-390 (May 1985).
23. H. Hattori, "Text-independent speaker recognition using neural networks," Proc. IEEE Int. Conf. Acoustics, Speech, and Signal Processing II, 153-156 (March 1992).
24. M. Savic and I. Sorensen, "Phoneme based speaker verification," Proc. IEEE Int. Conf. Acoustics, Speech, and Signal Processing II, 165-168 (March 1992).
25. H. Hollien and W. Majewski, "Speaker identification by long-term spectra under normal and distorted speech conditions," J. Acoust. Soc. Am. 62, 975-980 (1977).
26. J. Markel and S. Davis, "Text-independent speaker recognition from a large linguistically unconstrained time-spaced data base," IEEE Trans. Acoust., Speech, Signal Proc. 27, 74-82 (1979).
27. K. Li and G. Hughes, "Talker differences as they appear in correlation matrices of continuous speech spectra," J. Acoust. Soc. Am. 55, 833-837 (1974).
28. A. Higgins and R. Wohlford, "A new method of text-independent speaker recognition," Proc. IEEE Int. Conf. Acoustics, Speech, and Signal Processing, pp. 869-872 (1986).
29. F. Nolan, The Phonetic Bases of Speaker Recognition, Cambridge University Press, Cambridge (1983).
30. J. L. Flanagan, Speech Analysis, Synthesis and Perception, 2nd expanded ed., Springer-Verlag, New York (1972).
31. D. Broomhead and D. Lowe, "Multi-variable functional interpolation and adaptive networks," Complex Syst. 2, 321 (1988).
32. J. Moody and C. Darken, "Fast learning in networks of locally tuned processing units," Neural Computation 1, 281-294 (1989).

    Harold H. Szu: Biography and photograph appear with the specialsection guest editorial.

Brian Telfer: Biography and photograph appear with the paper "Causal analytical wavelet transform" in this issue.

Shubha Kadambe received her undergraduate degrees in physics and electronics from Mysore University and Madras Institute of Technology, India, in 1977 and 1980, respectively. She received her MS (EE) from Tuskegee University, Alabama, in 1986 and her PhD (EE) from the University of Rhode Island in 1991. From 1980 to 1981, she was a trainee at Bhabha Atomic Research Center, a premier research organization in India. She was a scientific officer at the same organization from 1981 to 1984. She is currently a postdoctoral research fellow at the Applied Science and Engineering Labs, A. I. duPont Institute, Wilmington, Delaware, conducting research in developing speech aids for the handicapped. Her research interests include speech analysis and synthesis, speech modeling, speech enhancement, time-frequency and time-scale representations, neural networks, and image processing.
