Introduction to Kernels (part II): Application to sequences
Liva [email protected]
School of Information and Computer Science, Institute for Genomics and Bioinformatics
Outline
Classifying sequences
A rough scale of the difficulty of classifying objects
Gram matrices
Combining Gram matrices to classify proteins
Building sequence kernels
Classifying sequences
Problems of general interest:
- classification of texts
- classification of music/sound/speech
- classification of web logs (user modeling)
- ...
and of particular interest in bioinformatics:
- remote homology detection between proteins from their amino acid sequences
- protein structure prediction
- prediction of DNA splice sites
- ...
A rough scale of the difficulty of classification problems
[Figure: a qualitative difficulty scale, ranging from vectors ("easy", many available methods) to structured objects such as sequences and graphs (hard, few available methods); axes: number of methods vs. difficulty]
Typical difficulties encountered with structured objects:
- they can be of various sizes
- no straightforward way to do calculus
- no all-purpose similarities
- ...
(Kernel) Gram matrices (1/2)
Let $k : \mathcal{X} \times \mathcal{X} \to \mathbb{R}$ be a Mercer kernel ($\mathcal{X}$ may be a space of sequences).

For a set of patterns $S = \{x_1, \ldots, x_n\} \subseteq \mathcal{X}$,

$$K = \begin{pmatrix} k(x_1, x_1) & k(x_1, x_2) & \cdots & k(x_1, x_n) \\ k(x_1, x_2) & k(x_2, x_2) & \cdots & k(x_2, x_n) \\ \vdots & \vdots & \ddots & \vdots \\ k(x_1, x_n) & k(x_2, x_n) & \cdots & k(x_n, x_n) \end{pmatrix}$$

is the Gram matrix of $k$ with respect to $S$.

If corresponding targets $y_1, \ldots, y_n$ are available, $(K, y)$ is sufficient for any kernel machine to be trained.
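A minimal sketch of this construction, assuming Python with NumPy and using a Gaussian RBF kernel on toy vectors as a stand-in for a sequence kernel $k$:

```python
import numpy as np

def rbf_kernel(x, y, gamma=1.0):
    """Gaussian RBF, a standard Mercer kernel, standing in for k."""
    return np.exp(-gamma * np.sum((x - y) ** 2))

def gram_matrix(patterns, kernel):
    """K[i, j] = k(x_i, x_j) for the set of patterns {x_1, ..., x_n}."""
    n = len(patterns)
    K = np.zeros((n, n))
    for i in range(n):
        for j in range(i, n):                    # k is symmetric
            K[i, j] = K[j, i] = kernel(patterns[i], patterns[j])
    return K

X = np.random.default_rng(0).normal(size=(5, 3)) # five toy patterns in R^3
K = gram_matrix(X, rbf_kernel)
print(K.shape)                                   # (5, 5)
```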
(Kernel) Gram matrices (2/2)
A property of the Gram matrix (Mercer's property)

Proposition 1 (Semi-positiveness of the Gram matrix). Let $k : \mathcal{X} \times \mathcal{X} \to \mathbb{R}$ be a symmetric function. $k$ is a Mercer kernel $\iff$ for all $\{x_1, \ldots, x_n\} \subseteq \mathcal{X}$, $n \ge 1$, and all $c \in \mathbb{R}^n$,
$$c^\top K c \ge 0.$$

This means that for any Mercer kernel $k$ and any set of patterns $S$, the Gram matrix $K$ has only nonnegative eigenvalues.

This gives, in particular, for $k_1$ and $k_2$ being Mercer kernels:
- $k_1 + k_2$ is a Mercer kernel
- $\alpha k_1 + \beta k_2$, with $\alpha, \beta \ge 0$, is a Mercer kernel
- $k_1 k_2$ is a Mercer kernel
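The proposition and these closure rules can be checked numerically; the sketch below (Python with NumPy; the linear and polynomial kernels and the tolerance are illustrative assumptions, not part of the slides) verifies that sums, nonnegative scalings and pointwise products of Gram matrices stay positive semi-definite:

```python
import numpy as np

# Numerical check of Mercer's property on toy Gram matrices: a symmetric
# kernel is Mercer iff every Gram matrix it produces is positive
# semi-definite, i.e. has no negative eigenvalues (up to round-off).
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 3))
K1 = X @ X.T                    # Gram matrix of the linear kernel
K2 = (X @ X.T + 1.0) ** 2       # Gram matrix of a polynomial kernel

def is_psd(M, tol=1e-8):
    return np.linalg.eigvalsh(M).min() >= -tol  # eigvalsh: M is symmetric

assert is_psd(K1) and is_psd(K2)
assert is_psd(K1 + K2)          # sum of Mercer kernels
assert is_psd(2.5 * K1)         # nonnegative scaling
assert is_psd(K1 * K2)          # pointwise product (Schur product theorem)
print("all closure checks passed")
```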
Combining Gram matrices to classify proteins (1/2)
Reference: [Lanckriet et al., 2004]
Problems addressed (a few thousand yeast proteins):
- ribosomal protein classification
- membrane protein classification
Method:
- genome-wide data sets coded as similarity/Gram matrices
- convex combination of these kernel functions
- semidefinite programming and support vector learning
Combining Gram matrices to classify proteins (2/2)
Seven kernels used (all coming from some biological knowledge):
- $K_{SW}$, $K_B$, $K_{Pfam}$: based on amino acid sequence similarity measures provided by state-of-the-art algorithms (Smith-Waterman, BLAST and Pfam HMMs)
- $K_{FFT}$: a Fast Fourier Transform-based kernel matrix using hydropathy profiles of the proteins
- $K_{LI}$, $K_D$: kernel matrices based on protein interaction information
- $K_E$: a similarity matrix based on microarray gene expression measurements
Omitting the technical details (SDP):
- a kernel $K = \sum_{i=1}^{7} \mu_i K_i$, with $\mu_i \ge 0$, is looked for
- at the same time, an SVM classifier based on $K$ is learned
- the results obtained were the best reported at the time
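A sketch of the combination step only (the weights below are fixed by hand, whereas [Lanckriet et al., 2004] learn them jointly with the SVM by semidefinite programming; scikit-learn and the toy Gram matrices are assumptions of this example, not part of the slides):

```python
import numpy as np
from sklearn.svm import SVC

def combine_kernels(gram_matrices, mu):
    """Convex combination of precomputed Gram matrices: K = sum_i mu_i K_i."""
    mu = np.asarray(mu, dtype=float)
    assert (mu >= 0).all(), "weights must be nonnegative to stay Mercer"
    return sum(m * G for m, G in zip(mu, gram_matrices))

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 5))
y = (X[:, 0] > 0).astype(int)
grams = [X @ X.T, (X @ X.T + 1) ** 2]      # two toy Mercer Gram matrices
K = combine_kernels(grams, mu=[0.7, 0.3])

clf = SVC(kernel="precomputed").fit(K, y)  # SVM trained on the combined kernel
print(clf.score(K, y))
```

An SVM with kernel="precomputed" accepts any valid Gram matrix, so any nonnegative combination of Mercer Gram matrices can be plugged in directly.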
Building sequence kernels (1/4)
Question: what if no such exhaustive or relevant knowledge is available?
Answer: build your own sequence kernel
How to do that?
- strong idea: comparing subsequences of sequences (cf. convolution kernels [Haussler, 1999])
- another idea: Fisher kernels [Jaakkola and Haussler, 1998]
Building sequence kernels (2/4)
The spectrum kernel [Leslie et al., 2002]:
- counts the number of common $k$-mers in two sequences
- computes a value from this count
- for proteins: alphabet $\mathcal{A}$ of 20 symbols
[Figure: the spectrum feature map illustrated step by step on the example sequence aabbcbbecced, incrementing the count of each $k$-mer as it is read off the sequence]
Building sequence kernels (3/4)
Computation of the spectrum kernel
- explicit construction of the feature space, of dimension $|\mathcal{A}|^k$
- feature vectors are very sparse (few nonzero elements)
- using a suffix tree structure [Ukkonen, 1995]:
· makes it possible to compute the kernel efficiently
· makes it possible to manage space efficiently

Results on a protein classification problem:
- comparable to other methods (but not better)
- motivates the need for another sequence kernel
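A minimal sketch of the spectrum kernel in Python, using a hash map of $k$-mer counts rather than the suffix tree of [Ukkonen, 1995] (sufficient for short sequences; the example string aabbcbbecced is the one from the figure above):

```python
from collections import Counter

def spectrum_features(s, k):
    """Sparse feature map: the count of every k-mer occurring in s."""
    return Counter(s[i:i + k] for i in range(len(s) - k + 1))

def spectrum_kernel(s, t, k=3):
    """K(s, t) = <Phi(s), Phi(t)>, computed on the sparse counts only."""
    fs, ft = spectrum_features(s, k), spectrum_features(t, k)
    if len(fs) > len(ft):                 # iterate over the smaller map
        fs, ft = ft, fs
    return sum(c * ft[u] for u, c in fs.items())

print(spectrum_kernel("aabbcbbecced", "abbcbbe", k=3))
```

Because only the $k$-mers that actually occur are stored, the sparsity of the feature vectors is exploited directly.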
Building sequence kernels (4/4)
Mismatch string kernel [Leslie et al., 2003]:
- based on the idea of the spectrum kernel
- allows mismatches in the $k$-mer comparisons
- for $k = 3$:
· the spectrum kernel would consider the similarity between two 3-mers differing in one position as being 0
· the (3,1)-mismatch string kernel would assign a value > 0 to this similarity
- can use the suffix tree data structure for efficient computations

Results on a protein classification task:
- the use of this kernel with an SVM outperformed the results of the spectrum kernel
- and compared favorably to several state-of-the-art methods
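A naive sketch of the mismatch kernel, enumerating the full mismatch neighborhood of every $k$-mer (unlike the efficient mismatch-tree computation of [Leslie et al., 2003]; the 5-letter alphabet is a toy assumption, proteins would use the 20 amino acids):

```python
from collections import Counter
from itertools import combinations, product

def mismatch_neighborhood(kmer, alphabet, m):
    """All strings of length len(kmer) within Hamming distance <= m of kmer."""
    hood = set()
    for n_sub in range(m + 1):
        for positions in combinations(range(len(kmer)), n_sub):
            for letters in product(alphabet, repeat=n_sub):
                cand = list(kmer)
                for p, a in zip(positions, letters):
                    cand[p] = a
                hood.add("".join(cand))
    return hood

def mismatch_features(s, k, m, alphabet):
    """Phi_u(s) = number of k-mers of s within Hamming distance m of u."""
    phi = Counter()
    for i in range(len(s) - k + 1):
        phi.update(mismatch_neighborhood(s[i:i + k], alphabet, m))
    return phi

def mismatch_kernel(s, t, k=3, m=1, alphabet="abcde"):
    fs = mismatch_features(s, k, m, alphabet)
    ft = mismatch_features(t, k, m, alphabet)
    return sum(c * ft[u] for u, c in fs.items())

# Two 3-mers differing in one position: the spectrum kernel (m = 0) scores
# them 0, while the (3,1)-mismatch kernel gives a positive similarity.
print(mismatch_kernel("abc", "abd", k=3, m=0))   # 0
print(mismatch_kernel("abc", "abd", k=3, m=1))   # > 0
```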
Conclusion
Importance of:
- Gram matrices
- combination of kernels (cf. Combining Classifiers and Combining Kernels)
Building kernels:
- convolution and spectrum kernels
- substructure enumeration
- complexity of kernel computation
- Fisher kernels (cf. presentation by a group of students?)
References

[Haussler, 1999] Haussler, D. (1999). Convolution Kernels on Discrete Structures. Technical Report UCSC-CRL-99-10, UC Santa Cruz.
[Jaakkola and Haussler, 1998] Jaakkola, T. and Haussler, D. (1998). Exploiting generativemodels in discriminative classifiers. In Adv. in Neural Information Processing Systems,volume 11.
[Lanckriet et al., 2004] Lanckriet, G. R. G., De Bie, T., Cristianini, N., Jordan, M. I., and Noble,W. S. (2004). A statistical framework for genomic data fusion. Bioinformatics.
[Leslie et al., 2003] Leslie, C., Eskin, E., Cohen, A., Weston, J., and Noble, W. S. (2003).Mismatch string kernels for SVM protein classification. In Adv. in Neural InformationProcessing Systems, volume 15.
[Leslie et al., 2002] Leslie, C., Eskin, E., and Noble, W. S. (2002). The spectrum kernel: A string kernel for SVM protein classification. In Proc. of the Pacific Symposium on Biocomputing, pages 564–575.
[Ukkonen, 1995] Ukkonen, E. (1995). On-line construction of suffix trees. Algorithmica, 14:249–260.