introduction to kernels (part ii) application to sequencesthe use of this kernel with svm...

17
Introduction to Kernels (part II) Application to sequences Liva Ralaivola [email protected] School of Information and Computer Science Institute for Genomics and Bioinformatics Introduction to Kernels (part II)Application to sequences – p.1

Upload: others

Post on 22-Sep-2020

9 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Introduction to Kernels (part II) Application to sequencesthe use of this kernel with SVM outperformed the results of the spectrum kernel and compared favorably to several state-of-the-art

Introduction to Kernels (part II)Application to sequences

Liva [email protected]

School of Information and Computer ScienceInstitute for Genomics and Bioinformatics

Introduction to Kernels (part II)Application to sequences – p.1

Page 2: Introduction to Kernels (part II) Application to sequencesthe use of this kernel with SVM outperformed the results of the spectrum kernel and compared favorably to several state-of-the-art

Outline

Classifying sequences

A rough scale on the difficulty to classify objects

Gram matricesCombining Gram matrices to classify proteins

Building sequence kernels

Introduction to Kernels (part II)Application to sequences – p.2

Page 3: Introduction to Kernels (part II) Application to sequencesthe use of this kernel with SVM outperformed the results of the spectrum kernel and compared favorably to several state-of-the-art

Classifying sequences

Problem of general interestclassification of textsclassification of music/sound/speechclassification of web logs (user modeling). . .

and of particular interest in bioinformaticsremote homology detection between proteins from theirsequences of amino acidsproteins structure predictionprediction of DNA splice sites. . .

Introduction to Kernels (part II)Application to sequences – p.3

Page 4: Introduction to Kernels (part II) Application to sequencesthe use of this kernel with SVM outperformed the results of the spectrum kernel and compared favorably to several state-of-the-art

A rough scale on classificationproblems difficulty

sequences graphs

difficulty

fewmany# methods

hard"easy"

�� � � � � � � ��

Typical difficulties encountered with structured objectscan be of various sizesno straightforward ways to do calculusno all-purpose similarities. . .

Introduction to Kernels (part II)Application to sequences – p.4

Page 5: Introduction to Kernels (part II) Application to sequencesthe use of this kernel with SVM outperformed the results of the spectrum kernel and compared favorably to several state-of-the-art

(Kernel) Gram matrices (1/2)

Let

��� �� � �

be a Mercer kernel (

may be a space ofsequences)

for a set of patterns

�� ����� �� � � � ��� �

��� �

��� �� !�� � �� " � !�� � �$# " % % % � !�� � �� "

� !�� � �$# " � !�$# � �$# " % % % � !�# � �� "

% % % % % %� !�� � ��� " � !��# � ��� " % % % � !��� � �� "&

'�' (

is the Gram matrix of

�with respect to

if corresponding targets )� �� � � � )� are available* ��� is sufficient for any Kernel Machine to be trained

Introduction to Kernels (part II)Application to sequences – p.5

Page 6: Introduction to Kernels (part II) Application to sequencesthe use of this kernel with SVM outperformed the results of the spectrum kernel and compared favorably to several state-of-the-art

(Kernel) Gram matrices (2/2)

A property of the Gram matrix (Mercer’s property)

Proposition 1 (Semi-Positiveness of the Gram matrix). Let� � �� � �

be a symmetric function.�

is a Mercer kernel +, �� � �� �� � � � ��� � � �.- / �

, 0 �1� 0 2 3 � , 0 / ��

This means that for any Mercer kernel

and any set of patterns

,the Gram matrix

� � has only nonnegative eigenvalues

This gives, in particular,�� and

�# being Mercer kernels� 4� � 5 / 6

is a Mercer kernel7 �� 8�9 �# � 7 �9 : 3 is a Mercer kernel�� �# is a Mercer kernel

Introduction to Kernels (part II)Application to sequences – p.6

Page 7: Introduction to Kernels (part II) Application to sequencesthe use of this kernel with SVM outperformed the results of the spectrum kernel and compared favorably to several state-of-the-art

Combining Gram matrices toclassify proteins (1/2)

Reference: [Lanckriet et al., 2004]

Problems addressed (a few thousands of yeast proteins)ribosomal protein classificationmembrane protein classification

Methoduse of genome-wide data sets coded as similarity/Grammatricesconvex combination of these kernel functionsSemi-definite programming and Support Vector Learning

Introduction to Kernels (part II)Application to sequences – p.7

Page 8: Introduction to Kernels (part II) Application to sequencesthe use of this kernel with SVM outperformed the results of the spectrum kernel and compared favorably to several state-of-the-art

Combining Gram matrices toclassify proteins (2/2)

Seven kernels used (all coming from some biological knowledge)�1; < � �>= � �>? @�AB : based on amino acids sequence similaritymeasures provided by state-of-the-art algorithms(Smith-Waterman, BLAST and Pfam HMM)�DC CE : a Fast Fourier Transform-based kernel matrix usinghydropathy profiles of the proteins�GFH � �GI : kernel matrices based on protein interactioninformation�>J : a similarity matrix based on microarray gene expressionmeasurements

Omitting the technical details (SDP)a kernel

�K L- �- � L- 2 3 � M N �� � � � O is looked forat the same time an SVM classifier based on

is learnedthe results obtained are the best ever

Introduction to Kernels (part II)Application to sequences – p.8

Page 9: Introduction to Kernels (part II) Application to sequencesthe use of this kernel with SVM outperformed the results of the spectrum kernel and compared favorably to several state-of-the-art

Building sequence kernels (1/4)

Question: what if no such an ’exhaustive’ or relevant knowledge isavailable?Answer: build you own sequence kernel

How to do that?strong idea: comparing subsequences of sequences (cf.convolution kernels [Haussler, 1999])another idea: Fisher kernels [Jaakkola and Haussler, 1998]

Introduction to Kernels (part II)Application to sequences – p.9

Page 10: Introduction to Kernels (part II) Application to sequencesthe use of this kernel with SVM outperformed the results of the spectrum kernel and compared favorably to several state-of-the-art

Building sequence kernels (2/4)

The spectrum kernel [Leslie et al., 2002]counts the number of common

-mers in two sequencescomputes a value from this countsfor proteins: alphabet

P

of

Q 3

symbols

Introduction to Kernels (part II)Application to sequences – p.10

Page 11: Introduction to Kernels (part II) Application to sequencesthe use of this kernel with SVM outperformed the results of the spectrum kernel and compared favorably to several state-of-the-art

Building sequence kernels (2/4)

The spectrum kernel [Leslie et al., 2002]counts the number of common

-mers in two sequencescomputes a value from this countsfor proteins: alphabet

P

of

Q 3

symbols

aabbcbbecced

1RS R TVU WX YU ZX X X[>\ ]^_

Introduction to Kernels (part II)Application to sequences – p.10

Page 12: Introduction to Kernels (part II) Application to sequencesthe use of this kernel with SVM outperformed the results of the spectrum kernel and compared favorably to several state-of-the-art

Building sequence kernels (2/4)

The spectrum kernel [Leslie et al., 2002]counts the number of common

-mers in two sequencescomputes a value from this countsfor proteins: alphabet

P

of

Q 3

symbols

aabbcbbecced

1

1

`a ` bVc de fc ge e eh>i jklIntroduction to Kernels (part II)Application to sequences – p.10

Page 13: Introduction to Kernels (part II) Application to sequencesthe use of this kernel with SVM outperformed the results of the spectrum kernel and compared favorably to several state-of-the-art

Building sequence kernels (2/4)

The spectrum kernel [Leslie et al., 2002]counts the number of common

-mers in two sequencescomputes a value from this countsfor proteins: alphabet

P

of

Q 3

symbols

1

1

1

aabbcbbecced

m>n opq rs r tVu vw xu yw w w

Introduction to Kernels (part II)Application to sequences – p.10

Page 14: Introduction to Kernels (part II) Application to sequencesthe use of this kernel with SVM outperformed the results of the spectrum kernel and compared favorably to several state-of-the-art

Building sequence kernels (3/4)

Computation of the spectrum kernel

explicit construction of the feature spacez � {| { }

feature vectors are very sparse (few nonzero elements)using a suffix tree structure [Ukkonen, 1995]

makes it possible to compute the kernels efficientlymakes it possible to manage space efficiently

Results on a protein classification problemcomparable to other methods (but not better)raises the need of another sequence kernel

Introduction to Kernels (part II)Application to sequences – p.11

Page 15: Introduction to Kernels (part II) Application to sequencesthe use of this kernel with SVM outperformed the results of the spectrum kernel and compared favorably to several state-of-the-art

Building sequence kernels (4/4)

Mismatch String Kernel [Leslie et al., 2003]based on idea of the spectrum kernelallows mismatches in

-mers comparisonsfor

� ~

· spectrum kernel would consider the similarity between� � � and � � � as being 0· (3,1)-mismatch string kernel would assign a value>0 to

this similaritycan use the suffix tree data structure for efficient computations

Results on a protein classification taskthe use of this kernel with SVM outperformed the results of thespectrum kerneland compared favorably to several state-of-the-art methods

Introduction to Kernels (part II)Application to sequences – p.12

Page 16: Introduction to Kernels (part II) Application to sequencesthe use of this kernel with SVM outperformed the results of the spectrum kernel and compared favorably to several state-of-the-art

Conclusion

Importance ofGram matricescombination of kernels (cf. Combining Classifiers and CombiningKernels)

Building kernelsconvolution and spectral kernelssubstructures enumerationcomplexity of kernel computationFisher kernels (cf. presentation by a group a students ?)

Introduction to Kernels (part II)Application to sequences – p.13

Page 17: Introduction to Kernels (part II) Application to sequencesthe use of this kernel with SVM outperformed the results of the spectrum kernel and compared favorably to several state-of-the-art

References[Haussler, 1999] Haussler, D. (1999). Convolution Kernels on Discrete Structures. Technical

Report UCS-CRL-99-10, UC Santa Cruz.

[Jaakkola and Haussler, 1998] Jaakkola, T. and Haussler, D. (1998). Exploiting generativemodels in discriminative classifiers. In Adv. in Neural Information Processing Systems,volume 11.

[Lanckriet et al., 2004] Lanckriet, G. R. G., De Bie, T., Cristianini, N., Jordan, M. I., and Noble,W. S. (2004). A statistical framework for genomic data fusion. Bioinformatics.

[Leslie et al., 2003] Leslie, C., Eskin, E., Cohen, A., Weston, J., and Noble, W. S. (2003).Mismatch string kernels for SVM protein classification. In Adv. in Neural InformationProcessing Systems, volume 15.

[Leslie et al., 2002] Leslie, C., Eskin, E., and Noble, W. S. (2002). The spectrum kernel: Astring kernel for svm protein classification. In Proc. of the Pacific Symposium onBiocomputing, 2002, pages 564–575.

[Ukkonen, 1995] Ukkonen, E. (1995). On–line construction of suffix trees. Algorithmica,14:249–60.

Introduction to Kernels (part II)Application to sequences – p.14