
On Learning with Kernels for Audio Signal Processing: the old and the new

Hachem Kadri

QARMA team - LIF, Aix-Marseille University

hachem.kadri@lif.univ-mrs.fr

GIPSA-Lab 2013

Background - Functional Learning (1/2)

$y_i = f(x_i) + \varepsilon_i$

• Supervised learning
  → Data: $n$ training examples $\{(x_1, y_1), \ldots, (x_n, y_n)\}$
  → Goal: learn $f$

Predictor → Response:
  $\mathbb{R}^d \to \{-1, 1\}$: binary classification
  $\mathbb{R}^d \to \{1, 2, 3, \ldots\}$: multi-class classification
  $\mathbb{R}^d \to \mathbb{R}$: multiple regression


Background - Functional Learning (2/2)

• Minimization problem

  $\min_{f \in \mathcal{F}} \sum_{i=1}^{n} V\big(y_i, f(x_i)\big)$

  → $V$: loss function, e.g. the square loss $\big(y_i - f(x_i)\big)^2$

• Overfitting problem

• Regularized minimization

  $\min_{f \in \mathcal{F}} \sum_{i=1}^{n} V\big(y_i, f(x_i)\big) + \lambda\,\Omega(f)$

  → $\Omega$: regularization term, e.g. the L2-norm $\Omega(f) = \|f\|_{\mathcal{F}}^2$
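As a minimal numerical sketch of this regularized objective in its simplest setting (square loss, linear predictor, L2 penalty), the ridge closed form can be computed directly; the synthetic data and the value of $\lambda$ below are illustrative, not from the slides.

```python
# Minimal sketch: regularized least squares (square loss + L2 penalty)
# for a linear predictor f(x) = <a, x>; synthetic data, illustrative lambda.
import numpy as np

rng = np.random.default_rng(0)
n, d = 50, 3
X = rng.normal(size=(n, d))                     # n training inputs in R^d
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=n)

lam = 0.1                                       # regularization parameter
# Closed-form minimizer of sum_i (y_i - <a, x_i>)^2 + lam * ||a||^2
a = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)
print("estimated coefficients:", a)
```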


Background - Learning with Kernels (1/2)

$y_i = f(x_i) + \varepsilon_i$ ; $y_i \in \mathbb{R}$

• Linear model: $f(x) = \langle a, x \rangle + b$

• Kernels: nonlinear/nonparametric estimation

  input space → feature space

  The RKHS associated with a positive definite kernel $k$ gives a desired feature space!


Background - Learning with Kernels (2/2)

2 perspectives

• Feature space
  → nonlinear in the input space
  → projecting data into a feature space
  → linear in the feature space
  → kernel trick: $\langle \Phi(x_1), \Phi(x_2) \rangle = k(x_1, x_2)$

[Diagram: $\Phi: \mathcal{X} \to \mathcal{F}_{\mathcal{X}}$, $g: \mathcal{F}_{\mathcal{X}} \to \mathbb{R}$, with $f = g \circ \Phi$]

• RKHS theory
  → Mercer theorem: integral operator + positive kernel
  → reproducing property: $\langle f, k(x, \cdot) \rangle = f(x)$
  → representer theorem: $f(\cdot) = \sum_i \alpha_i\, k(x_i, \cdot)$ ; $\alpha_i \in \mathbb{R}$
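To make the reproducing property and the representer theorem concrete, here is a short kernel ridge regression sketch with a Gaussian kernel, where the learned function has exactly the representer form $f(\cdot) = \sum_i \alpha_i k(x_i, \cdot)$; the data, $\gamma$, and $\lambda$ are illustrative.

```python
# Sketch of the representer theorem in action: kernel ridge regression.
import numpy as np

def gaussian_kernel(A, B, gamma=10.0):
    """k(a, b) = exp(-gamma * ||a - b||^2), computed pairwise."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

rng = np.random.default_rng(0)
x = rng.uniform(size=(40, 1))
y = np.sin(2 * np.pi * x[:, 0]) + 0.1 * rng.normal(size=40)

lam = 1e-3
K = gaussian_kernel(x, x)
alpha = np.linalg.solve(K + lam * np.eye(len(x)), y)   # (K + lam*I) alpha = y

x_test = np.linspace(0, 1, 5).reshape(-1, 1)
f_test = gaussian_kernel(x_test, x) @ alpha            # f(x*) = sum_i alpha_i k(x_i, x*)
print(f_test)
```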


Background - Applications in audio processing

• Music segmentation (Davy et al., 2006 - IRCCyN)

• Speaker verification (Louradour et al., 2007 - IRIT)

• Speaker change detection (Harchaoui et al., 2008 - LTCI)

• Sound recognition (Rabaoui et al., 2008 - LAGIS)

• Speech inversion (Toutios et al., 2008 - LORIA)


Learning with kernels - Limitations and Challenges

+ Geometric intuition and interpretation
− Choosing the kernel in advance
− Sequential, time-varying characteristics
− Limited to a single task / scalar outputs

• "Sophisticated" kernel methods
  • learning kernels → MKL: Multiple Kernel Learning
  • probability distributions → RKHS embedding of distributions
  • geometric / time-varying connection → FDA
  • multi-task / complex outputs → operator-valued kernels
  • Deep Learning - Representation Learning → ? ...


→ Learning with kernels / RKHS embedding
  1950: Aronszajn; 1995-2002: Vapnik, Cortes, Schölkopf, Smola; 2005-...: Gretton, Le Song, Fukumizu

→ Learning kernels (MKL)
  2004: Lanckriet, Bach; 2006: Sonnenburg; 2007: Rakotomamonjy; 2010: Cortes, Kloft

→ Learning with operator-valued kernels
  1958-1960: Pedrick, Schwartz; 2005: Micchelli & Pontil; 2008: Caponnetto; 2010/2011: Kadri, d'Alché-Buc

→ Learning operator-valued kernels
  2011: Dinuzzo; 2012: Kadri; 2013: Sindhwani


FDA - Examples

[Figure: four examples of functional data: spectrometric curves (absorbances vs. wavelengths), speech (amplitude vs. frequencies), handwriting (meters vs. meters), and electricity consumption (differentiated log vs. months)]

Tasks: regression, classification, time warping, forecasting

Ramsay and Silverman (2002) - Ferraty and Vieu (2006)


FDA - Functional inputs & functional outputs

$y_i = f(x_i) + \varepsilon_i$

Predictor → Response: $L^2 \to L^2$ (functional model, functional responses)

[Figure: (a) temperature (Deg C) and (b) precipitation (mm), each as a function of the day of the year]

• Operator estimation

  $\min_{f \in \mathcal{F}} \sum_{i=1}^{n} \|y_i - f(x_i)\|_{\mathcal{Y}}^2 + \lambda \|f\|_{\mathcal{F}}^2$


Learning from functional responses - Discrete case

Learning from multiple response data:

• Statistics: multiple output regression; the C&W procedure (Breiman and Friedman, 1997)

• Machine learning: learning vector-valued functions; multi-task learning (Micchelli and Pontil, 2005)


Reproducing kernels - From scalar to functional

Scalar-valued vs. function-valued

[Diagram: scalar case $\Phi: \mathcal{X} \to \mathcal{F}_{\mathcal{X}}$ with $f = g \circ \Phi: \mathcal{X} \to \mathbb{R}$; functional case $\Phi: \mathcal{X} \to \mathcal{F}_{\mathcal{X}\mathcal{Y}}$ with $f = g \circ \Phi: \mathcal{X} \to \mathcal{Y}$]

• Operator-valued kernels & function-valued RKHS → nonlinear FDA


Outline

• Hilbert space of operators with reproducing kernels
  → function-valued RKHS
  → operator-valued kernels

• Operator estimation
  → $L^2$-regularized operator learning algorithm
  → block operator kernel matrix inversion

• Application to audio and speech processing
  → speech inversion
  → environmental sound recognition


Operator-valued kernels - Definition

• $(x_i(s), y_i(t))_{i=1}^{n} \in \mathcal{X} \times \mathcal{Y}$

• $\mathcal{X}: \Omega_x \to \mathbb{R}$ ; $\mathcal{Y}: \Omega_y \to \mathbb{R}$

• $\Omega \subseteq \mathbb{R}$: curve ; $\Omega \subseteq \mathbb{R}^2$: image

Definition. $K_{\mathcal{F}}(\cdot, \cdot): \mathcal{X} \times \mathcal{X} \to \mathcal{L}(\mathcal{Y})$

• $K_{\mathcal{F}}$ is Hermitian if $K_{\mathcal{F}}(w, z) = K_{\mathcal{F}}(z, w)^*$,

• it is nonnegative on $\mathcal{X}$ if for any $\{(w_i, u_i)\}_{i=1,\ldots,r} \in \mathcal{X} \times \mathcal{Y}$:

  $\sum_{i,j} \langle K_{\mathcal{F}}(w_i, w_j)\,u_i, u_j \rangle_{\mathcal{Y}} \;\ge\; 0$
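As a quick numerical sanity check of this definition, here is a sketch for a separable kernel $K(w, z) = k(w, z)\,T$ with the output space discretized; the sizes, the Gaussian scalar kernel, and the PSD matrix $T$ are illustrative choices, not from the slides.

```python
# Sanity check (a sketch): for K(w, z) = k(w, z) T with k positive definite
# and T positive semidefinite, the block Gram matrix [K(w_i, w_j)] is
# symmetric and the quadratic form in the definition is nonnegative.
import numpy as np

rng = np.random.default_rng(0)
r, m = 5, 4                                   # r inputs; Y discretized on m points
W = rng.normal(size=(r, 2))

k = np.exp(-((W[:, None] - W[None, :]) ** 2).sum(-1))   # scalar Gaussian kernel
B = rng.normal(size=(m, m))
T = B @ B.T                                   # PSD stand-in for the operator T

K_block = np.kron(k, T)                       # (r*m) x (r*m) block kernel matrix
U = rng.normal(size=(r, m))                   # arbitrary u_1, ..., u_r in Y
u = U.ravel()
print("Hermitian:", np.allclose(K_block, K_block.T))
print("sum_ij <K(w_i,w_j)u_i, u_j> =", u @ K_block @ u, "(should be >= 0)")
```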


Operator-valued kernels - Function-valued RKHS

• Extending real/vector-valued RKHS theory to FDA (Kadri et al., AISTATS 2010)

• RKHS of function-valued functions

Definition. A Hilbert space $\mathcal{F} = \{f : \mathcal{X} \to \mathcal{Y}\}$ is called a reproducing kernel Hilbert space if there is an operator-valued kernel $K_{\mathcal{F}}$ such that:

• $h: z \mapsto K_{\mathcal{F}}(w, z)\,g$ belongs to $\mathcal{F}$, $\forall w \in \mathcal{X}$ and $g \in \mathcal{Y}$

• $\forall f \in \mathcal{F}$, $\langle f, K_{\mathcal{F}}(w, \cdot)g \rangle_{\mathcal{F}} = \langle f(w), g \rangle_{\mathcal{Y}}$ (reproducing property)


Operator-valued kernels - Uniqueness & Bijection

Lemma. If $\mathcal{F}$ is a function-valued RKHS, then $K_{\mathcal{F}}(w, z)$ is unique.

• Proof sketch: let $K$ and $K'$ both reproduce $\mathcal{F}$. Then
  $\langle K'(w', \cdot)g', K(w, \cdot)g \rangle_{\mathcal{F}} = \langle K'(w', w)g', g \rangle_{\mathcal{Y}}$
  $\langle K(w, \cdot)g, K'(w', \cdot)g' \rangle_{\mathcal{F}} = \langle K(w, w')g, g' \rangle_{\mathcal{Y}} = \langle g, K(w, w')^* g' \rangle_{\mathcal{Y}} = \langle g, K(w', w)g' \rangle_{\mathcal{Y}}$
  so the two kernels coincide.

Theorem. $K_{\mathcal{F}}(w, z)$ nonnegative ⟺ $\mathcal{F}$ is an RKHS.

• Proof sketch:
  (⟸) $\sum_{i,j=1}^{n} \langle K(w_i, w_j)u_i, u_j \rangle_{\mathcal{Y}} = \sum_{i,j=1}^{n} \langle K(w_i, \cdot)u_i, K(w_j, \cdot)u_j \rangle_{\mathcal{F}} \ge 0$
  (⟹) build $\mathcal{F}_0$ with $\forall f \in \mathcal{F}_0$, $f(\cdot) = \sum_{i=1}^{n} K_{\mathcal{F}}(w_i, \cdot)\,\alpha_i$, and complete it.


Operator-valued kernels - Construction

• Multi-task kernel ⟹ $K(w, z) = k(w, z)\,T$
  • $k$: real-valued kernel
  • $T$: diagonal matrix + low-rank matrix (finite dimension)

• FDA kernel ⟹ $T \in \mathcal{L}(\mathcal{Y})$ (infinite dimension)?
  • Concurrent functional linear model
    → $y(t) = \alpha(t) + \beta(t)\,x(t)$
    → multiplication operator
    → varying coefficient model (Hastie and Tibshirani, 1993)
  • Functional linear model for functional responses (Ramsay and Silverman, 2005)
    → $y(t) = \alpha(t) + \int \beta(s, t)\,x(s)\,ds$
    → Hilbert-Schmidt integral operator


Operator-valued kernels - Examples

1. Multiplication operator
   $K_{\mathcal{F}}: \mathcal{X} \times \mathcal{X} \to \mathcal{L}(\mathcal{Y})$, $(x_1, x_2) \mapsto k_x(x_1, x_2)\,T_{k_y}$ ; $(T_h\,y)(t) \triangleq h(t)\,y(t)$

2. Hilbert-Schmidt integral operator
   $K_{\mathcal{F}}: \mathcal{X} \times \mathcal{X} \to \mathcal{L}(\mathcal{Y})$, $(x_1, x_2) \mapsto k_x(x_1, x_2)\,T_{k_y}$ ; $(T_h\,y)(t) \triangleq \int h(s, t)\,y(s)\,ds$

3. Composition operator
   $K_{\mathcal{F}}: \mathcal{X} \times \mathcal{X} \to \mathcal{L}(\mathcal{Y})$, $(x_1, x_2) \mapsto C_{\psi(x_1)}\,C^*_{\psi(x_2)}$ ; $C_{\varphi}: f \mapsto f \circ \varphi$
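On a discretized grid, the first two operators become plain matrices acting on sampled functions. Here is a small sketch; the grid size and the particular functions $h$ are illustrative choices, not from the slides.

```python
# Sketch: multiplication and integral operators on a grid t_1..t_m.
# (T_h y)(t) = h(t) y(t) is a diagonal matrix; (T_h y)(t) = \int h(s,t) y(s) ds
# is a dense matrix weighted by the quadrature step ds.
import numpy as np

m = 100
t = np.linspace(0, 1, m)
dt = t[1] - t[0]

# Multiplication operator with the illustrative h(t) = 1 + t^2
T_mult = np.diag(1.0 + t ** 2)

# Integral operator with an illustrative Gaussian h(s, t)
S, Tt = np.meshgrid(t, t, indexing="ij")      # S: s along rows, Tt: t along cols
H = np.exp(-10.0 * (S - Tt) ** 2)             # H[s_idx, t_idx] = h(s, t)
T_int = H.T * dt                              # row t, column s, weighted by ds

y = np.sin(2 * np.pi * t)
print("(T_mult y) first values:", (T_mult @ y)[:3])
print("(T_int  y) first values:", (T_int @ y)[:3])
```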


Operator-valued kernels - Feature map (Kadri et al., ICML 2011)

• An operator-valued kernel admits a feature map representation
  → $\langle K(x_1, x_2)y_1, y_2 \rangle_{\mathcal{Y}} = \langle \Phi(x_1, y_1), \Phi(x_2, y_2) \rangle_{\mathcal{L}(\mathcal{X},\mathcal{Y})}$
  → $\langle K(x_1, \cdot)y_1, K(x_2, \cdot)y_2 \rangle_{\mathcal{F}} = \langle K(x_1, x_2)y_1, y_2 \rangle_{\mathcal{Y}}$

• Complex/infinite-dimensional inputs
  → multiple functional data $x_i \in (L^2)^p$

• FDA viewpoint
  → one observation = one continuous curve

Real-valued RKHS: $\Phi_k : (L^2)^p \to \mathcal{L}((L^2)^p, \mathbb{R})$, $x \mapsto k(x, \cdot)$ ; dim: $p \to 1$

Function-valued RKHS: $\Phi_K^y : (L^2)^p \to \mathcal{L}((L^2)^p, L^2)$, $x \mapsto K(x, \cdot)y$ ; dim: $p \to \infty$


Optimization problem - Representer theorem

Theorem. The solution of the minimization problem

$\min_{f \in \mathcal{F}} \sum_{i=1}^{n} \|y_i - f(x_i)\|_{\mathcal{Y}}^2 + \lambda \|f\|_{\mathcal{F}}^2$

is achieved by a function of the form

$f^*(\cdot) = \sum_{i=1}^{n} K_{\mathcal{F}}(x_i, \cdot)\,\beta_i$


Optimization problem - Solution

$\min_{f \in \mathcal{F}} \sum_{i=1}^{n} \|y_i - f(x_i)\|_{\mathcal{Y}}^2 + \lambda \|f\|_{\mathcal{F}}^2$

using the representer theorem & the reproducing property

⟺ $\min_{\beta_i \in \mathcal{Y}} \sum_{i=1}^{n} \Big\| y_i - \sum_{j=1}^{n} K_{\mathcal{F}}(x_i, x_j)\beta_j \Big\|_{\mathcal{Y}}^2 + \lambda \sum_{i,j} \langle K_{\mathcal{F}}(x_i, x_j)\beta_i, \beta_j \rangle_{\mathcal{Y}}$

• Discretization (Kadri et al., AISTATS 2010)
  → grid $\{t_1, \ldots, t_m\}$ ⟹ $\beta_i(t_1), \ldots, \beta_i(t_m)$

• Approximation (Kadri et al., Tech. Report 2011)
  → $\mathcal{Y}$ a real RKHS ⟹ $\beta_i = \sum_{l=1}^{m} \alpha_{il}\, k(t_l, \cdot)$

• Analytic solution (Kadri et al., ICML 2011)
  → $(\mathbf{K} + \lambda I)\beta = \mathbf{y}$ ; $\beta \in \mathcal{Y}^n$ and $\mathbf{K} \in [\mathcal{L}(\mathcal{Y})]^{n \times n}$


Optimization problem - Block operator kernel matrix inversion

• (Block) numerical range: spectral and operator theory

• Spectral theory of block operator matrices (C. Tretter, 2008)

• $K(x_i, x_j) = G(x_i, x_j)\,T$, $\forall x_i, x_j \in \mathcal{X}$

• Kronecker product (see the numerical check below)

  $\mathbf{K} = \begin{pmatrix} G(x_1, x_1)T & \cdots & G(x_1, x_n)T \\ \vdots & \ddots & \vdots \\ G(x_n, x_1)T & \cdots & G(x_n, x_n)T \end{pmatrix} = G \otimes T$

  → $\mathbf{K}^{-1} = G^{-1} \otimes T^{-1}$
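A quick numpy check of this identity, with illustrative sizes: the block operator kernel matrix never has to be inverted directly.

```python
# Verify (G ⊗ T)^{-1} = G^{-1} ⊗ T^{-1} numerically on small SPD matrices.
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(4, 4)); G = A @ A.T + 4 * np.eye(4)   # SPD scalar Gram matrix
B = rng.normal(size=(3, 3)); T = B @ B.T + 3 * np.eye(3)   # SPD operator block

lhs = np.linalg.inv(np.kron(G, T))
rhs = np.kron(np.linalg.inv(G), np.linalg.inv(T))
print(np.allclose(lhs, rhs))   # True
```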


Algorithm 1: $L^2$-Regularized Operator Learning Algorithm

Input: data $x_i \in (L^2([0,1]))^p$, $y_i \in L^2([0,1])$, size $n$

1. Eigendecomposition of $G = (G(x_i, x_j))_{i,j=1}^{n} \in \mathbb{R}^{n \times n}$:
   eigenvalues $\alpha_i \in \mathbb{R}$, eigenvectors $v_i \in \mathbb{R}^n$, size $n$

2. Eigendecomposition of $T \in \mathcal{L}(\mathcal{Y})$:
   initialize $k$, the number of eigenfunctions;
   eigenvalues $\delta_i \in \mathbb{R}$, eigenfunctions $w_i \in L^2([0,1])$, size $k$

3. Eigendecomposition of $\mathbf{K} = G \otimes T$, where $\mathbf{K} = (K(x_i, x_j))_{i,j=1}^{n} \in (\mathcal{L}(\mathcal{Y}))^{n \times n}$:
   eigenvalues $\theta_i \in \mathbb{R}$, eigenfunctions $z_i \in (L^2([0,1]))^n$, size $n \times k$;
   $\theta = \alpha \otimes \delta$, $z = v \otimes w$

4. Solution $\beta = (\mathbf{K} + \lambda I)^{-1}\mathbf{y}$:
   initialize $\lambda$, the regularization parameter;
   $\beta = \sum_{i=1}^{n \times k} (\theta_i + \lambda)^{-1} \sum_{j=1}^{n} \langle z_{ij}, y_j \rangle\, z_i$
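Below is a discretized Python sketch of Algorithm 1, assuming the separable form $\mathbf{K} = G \otimes T$ with $\mathcal{Y}$ sampled on an $m$-point grid; the function name and test data are illustrative. It recovers $\beta = (\mathbf{K} + \lambda I)^{-1}\mathbf{y}$ from the two small eigendecompositions, without ever forming $\mathbf{K}$.

```python
# Discretized sketch of Algorithm 1 for K = G ⊗ T.
import numpy as np

def l2_operator_solve(G, T, Y, lam):
    """Solve (G ⊗ T + lam*I) vec(beta) = vec(Y) via eigendecompositions.
    G: (n, n) scalar Gram matrix; T: (m, m) discretized operator on Y;
    Y: (n, m) functional responses sampled on the grid. Returns (n, m) beta."""
    alpha, V = np.linalg.eigh(G)      # eigenvalues alpha_i, eigenvectors v_i
    delta, W = np.linalg.eigh(T)      # eigenvalues delta_i, eigenfunctions w_i
    theta = np.outer(alpha, delta)    # eigenvalues of G ⊗ T: theta = alpha ⊗ delta
    Y_hat = V.T @ Y @ W               # coordinates of y in the z = v ⊗ w basis
    return V @ (Y_hat / (theta + lam)) @ W.T

rng = np.random.default_rng(0)
n, m = 20, 50
A = rng.normal(size=(n, n)); G = A @ A.T / n     # SPD stand-in for G(x_i, x_j)
B = rng.normal(size=(m, m)); T = B @ B.T / m     # SPD stand-in for T
Y = rng.normal(size=(n, m))

beta = l2_operator_solve(G, T, Y, lam=0.1)

# Agreement with the direct dense solve on the full Kronecker system:
direct = np.linalg.solve(np.kron(G, T) + 0.1 * np.eye(n * m), Y.ravel())
print(np.allclose(beta.ravel(), direct))         # True
```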


Applications - Speech inversion

Figure : Acoustic to articulatory inversion [speech production vs. speech inversion of the speech signal (amplitude vs. time); articulators: 1 upper lip, 2 lower lip, 3 jaw, 4 tongue tip, 5 tongue body, 6 velum, 7 glottis]

• Speech inversion
  → learning the acoustic-to-articulatory mapping
  → from MFCCs to vocal-tract time functions (VTTF)
  → improving speech technology and understanding
  → helping individuals with speech and hearing disorders


Applications - Speech inversion

[Figure: estimated vocal-tract time functions (LA, LP, TTCD, TTCL, TBCD, TBCL, VEL, GLO) plotted over time for the utterances "beautiful", "conversation", and "smooth"]


Applications - Speech inversion

Tab. 2: Average RSSE for the tract variables

VT variable | ε-SVR  | Multi-task | Functional
LA          | 2.763  | 2.341      | 1.562
LP          | 0.532  | 0.512      | 0.528
TTCD        | 3.345  | 1.975      | 1.647
TTCL        | 7.752  | 5.276      | 3.463
TBCD        | 2.155  | 2.094      | 1.582
TBCL        | 15.083 | 9.763      | 7.215
VEL         | 0.032  | 0.034      | 0.029
GLO         | 0.041  | 0.052      | 0.064
Total       | 3.962  | 2.755      | 2.011

• ε-SVR (Mitra et al., ICASSP 2009)

• Multi-task kernel (Kadri et al., ICASSP 2011)


Applications - Sound recognition

• Sound recognition
  → surveillance and security applications


Applications - Sound recognition

• Feature extraction → temporal, spectral, cepstral, ... characteristics

[Figure: a sound waveform (amplitude vs. time) and three extracted feature trajectories: evolution of the zero crossing rate (ZCR), spectral roll-off (SRF), and 13 cepstral coefficients (MFCC)]
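A minimal sketch of this feature extraction step, assuming the librosa library is available; the file name is a placeholder, and default frame parameters are used.

```python
# Sketch: compute the three feature trajectories shown above with librosa.
import librosa

y, sr = librosa.load("gunshot.wav", sr=None)          # hypothetical audio file

zcr = librosa.feature.zero_crossing_rate(y)           # shape (1, n_frames)
srf = librosa.feature.spectral_rolloff(y=y, sr=sr)    # shape (1, n_frames)
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)    # shape (13, n_frames)

print(zcr.shape, srf.shape, mfcc.shape)
```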


Applications - Sound recognition

• Limitations - multivariate data modeling
  → features contain discrete values of various parameters
  → a feature vector $\in \mathbb{R}^{DP}$ is built by concatenating samples of different features

• Solution - multivariate functional data modeling (see the sketch below)
  → modeling each audio signal by a vector of functions in $(L^2)^D$
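One simple way to realize this modeling step, sketched below under illustrative assumptions: rather than concatenating raw samples into one vector in $\mathbb{R}^{DP}$, resample each of the $D$ feature trajectories onto a common grid, so that one recording becomes a discretized element of $(L^2)^D$. The helper name and grid size are hypothetical.

```python
# Sketch: turn D variable-length feature trajectories into D curves on a
# common grid, i.e., a discretized vector of functions in (L^2)^D.
import numpy as np

def to_functional(features, n_grid=100):
    """features: list of D 1-D arrays of possibly different lengths.
    Returns a (D, n_grid) array, each trajectory linearly interpolated
    onto the common grid [0, 1]."""
    grid = np.linspace(0.0, 1.0, n_grid)
    curves = [np.interp(grid, np.linspace(0.0, 1.0, len(f)), f)
              for f in features]
    return np.vstack(curves)

# e.g. ZCR, roll-off, and one MFCC trajectory with different frame counts
traj = [np.random.rand(300), np.random.rand(150), np.random.rand(150)]
X_func = to_functional(traj)
print(X_func.shape)   # (3, 100)
```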


Applications - Sound recognition

Tab. 3: Classes of sounds and number of samples in the database used for performance evaluation.

Class           |    | Train | Test | Total | Duration (s)
Human screams   | C1 | 40    | 25   | 65    | 167
Gunshots        | C2 | 36    | 19   | 55    | 97
Glass breaking  | C3 | 48    | 25   | 73    | 123
Explosions      | C4 | 41    | 21   | 62    | 180
Door slams      | C5 | 50    | 25   | 75    | 96
Phone rings     | C6 | 34    | 17   | 51    | 107
Children voices | C7 | 58    | 29   | 87    | 140
Machines        | C8 | 40    | 20   | 60    | 184
Total           |    | 327   | 181  | 508   | 18 mn 14 s


Applications - Sound recognition

Figure : Structural similarities between two different classes


Applications - Sound recognition

Figure : Structural diversity inside the same sound class and between classes


Applications - Sound recognition

Tab. 4: Confusion matrix obtained when using the Regularized Least Squares Classification (RLSC) algorithm (Rifkin et al., 2003)

   | C1 | C2 | C3    | C4 | C5    | C6   | C7    | C8
C1 | 92 | 4  | 4.76  | 0  | 5.27  | 11.3 | 6.89  | 0
C2 | 0  | 52 | 0     | 14 | 0     | 2.7  | 0     | 0
C3 | 0  | 20 | 76.2  | 0  | 0     | 0    | 17.24 | 5
C4 | 0  | 16 | 0     | 66 | 0     | 0    | 0     | 0
C5 | 4  | 8  | 0     | 4  | 84.21 | 0    | 6.8   | 0
C6 | 4  | 0  | 0     | 0  | 10.52 | 86   | 0     | 0
C7 | 0  | 0  | 0     | 8  | 0     | 0    | 69.07 | 0
C8 | 0  | 0  | 19.04 | 8  | 0     | 0    | 0     | 95

Total recognition rate = 77.56%


Applications - Sound recognition

Tab. 5: Confusion matrix obtained when using the Functional Regularized Least Squares algorithm

   | C1  | C2 | C3   | C4 | C5    | C6   | C7   | C8
C1 | 100 | 0  | 0    | 2  | 0     | 5.3  | 3.4  | 0
C2 | 0   | 82 | 0    | 8  | 0     | 0    | 0    | 0
C3 | 0   | 14 | 90.9 | 8  | 0     | 0    | 3.4  | 0
C4 | 0   | 4  | 0    | 78 | 0     | 0    | 0    | 0
C5 | 0   | 0  | 0    | 1  | 89.47 | 0    | 6.8  | 0
C6 | 0   | 0  | 0    | 0  | 10.53 | 94.7 | 0    | 0
C7 | 0   | 0  | 0    | 0  | 0     | 0    | 86.4 | 0
C8 | 0   | 0  | 9.1  | 3  | 0     | 0    | 0    | 100

Total recognition rate = 90.18%


Applications - Beyond Audio Processing

• Functional outputs - BCI

[Figure: five EEG channels (Ch. 1-5) plotted against time samples, alongside the corresponding finger movement state and finger movement trajectories]

• Structured outputs - image, text, graph prediction

[Figure: a sequence of small predicted image patches]

• Tensor outputs - multilinear multitask learning
  e.g. athlete performance rated by several juries (Jury 1, Jury 2, Jury 3) on several criteria (technical score, artistic score, ...)


Conclusion & Perspectives

• Conclusion
  → RKHS framework for functional data - nonlinear FDA
  → FDA kernels
  → audio and speech processing applications

• Perspectives
  → mixed data (discrete, continuous, ...)
  → learning the operator-valued kernel
  → multilinear representation learning

