Kernel-based data fusion
DESCRIPTION
Kernel-based data fusion: discussion of a paper by G. Lanckriet. Problem: aggregation of heterogeneous data. Idea: different data are represented by different kernels. Question: how to combine different kernels in an elegant/efficient way?
TRANSCRIPT
1
Kernel-based data fusion
Discussion of a paper by G. Lanckriet
2
Paper
3
Overview
Problem: Aggregation of heterogeneous data
Idea: Different data are represented by different kernels
Question: How to combine different kernels in an elegant/efficient way?
Solution: Linear combination and SDP
Application: Recognition of ribosomal and membrane proteins
4
Linear combination of kernels

K = Σᵢ μᵢ Kᵢ        (μᵢ: weight, Kᵢ: kernel)

The resulting kernel K is positive definite (xᵀKx > 0 for all x), provided μᵢ > 0 and xᵀKᵢx > 0, since

xᵀKx = Σᵢ μᵢ xᵀKᵢx > 0

Elegant aggregation of heterogeneous data
More efficient than training of individual SVMs
KCCA uses an unweighted sum over the individual kernels
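The weighted combination above can be sketched in a few lines of NumPy. The data sources and weights here are made-up placeholders; the point is only that a nonnegative combination of positive semidefinite Gram matrices stays positive semidefinite:

```python
import numpy as np

# Toy stand-ins for two heterogeneous data sources (hypothetical values).
rng = np.random.default_rng(0)
X1 = rng.normal(size=(5, 3))   # e.g. features from source 1
X2 = rng.normal(size=(5, 4))   # e.g. features from source 2

K1 = X1 @ X1.T                 # linear kernel on source 1 (PSD by construction)
K2 = X2 @ X2.T                 # linear kernel on source 2

mu = np.array([0.7, 0.3])      # nonnegative kernel weights (illustrative)
K = mu[0] * K1 + mu[1] * K2    # combined kernel K = sum_i mu_i K_i

# A nonnegative combination of PSD matrices is PSD: smallest eigenvalue >= 0.
assert np.linalg.eigvalsh(K).min() >= -1e-10
```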
5
Support Vector Machine

Primal (soft-margin) form:

min over w, b, ξ:   ½‖w‖² + C Σᵢ ξᵢ         (square norm + penalty term)
subject to:         yᵢ(wᵀxᵢ + b) ≥ 1 − ξᵢ   (hyperplane constraint)
                    ξᵢ ≥ 0                  (slack variables)
6
Dual form

max over α:   Σᵢ αᵢ − ½ Σᵢⱼ αᵢ αⱼ yᵢ yⱼ K(xᵢ, xⱼ)   (quadratic, convex)
subject to:   Σᵢ αᵢ yᵢ = 0                          (scalar = 0)
              0 ≤ αᵢ ≤ C                            (αᵢ: Lagrange multipliers)

Maximization instead of minimization
Equality constraints
Lagrange multipliers instead of w, b, ξ
Quadratic program (QP)

The kernel matrix K is positive definite, so the problem is convex.
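As a sketch, this dual QP can be solved on a tiny toy problem with a general-purpose solver. Here scipy's SLSQP stands in for a dedicated QP solver; the data points, C value, and tolerances are all illustrative assumptions:

```python
import numpy as np
from scipy.optimize import minimize

# Tiny made-up 2D problem: two points per class, linearly separable.
X = np.array([[0.0, 0.0], [1.0, 0.0],
              [3.0, 3.0], [4.0, 3.0]])
y = np.array([-1.0, -1.0, 1.0, 1.0])
C = 10.0

K = X @ X.T                        # linear kernel (Gram matrix)
Q = (y[:, None] * y[None, :]) * K  # Q_ij = y_i y_j K(x_i, x_j)

def neg_dual(a):
    # Negative dual objective: -(sum_i a_i - 0.5 a^T Q a), since we minimize.
    return -(a.sum() - 0.5 * a @ Q @ a)

cons = {"type": "eq", "fun": lambda a: a @ y}  # sum_i alpha_i y_i = 0
bnds = [(0.0, C)] * len(y)                     # box constraint 0 <= alpha_i <= C
res = minimize(neg_dual, np.zeros(len(y)), bounds=bnds, constraints=cons)

alpha = res.x
w = (alpha * y) @ X                # recover primal weight vector
sv = alpha > 1e-6                  # support vectors have nonzero multipliers
b = np.mean(y[sv] - X[sv] @ w)     # bias from the support vectors
```

Only the support vectors end up with nonzero αᵢ, which is why the slide's list notes "Lagrange multipliers instead of w, b, ξ".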
7
Inserting the linear combination

Substituting K = Σᵢ μᵢ Kᵢ into the dual gives a problem in both α and the weights μ.

The combined kernel must lie within the cone of positive semidefinite matrices
A fixed trace avoids the trivial solution (unbounded weights)
The resulting problem is no longer a plain QP (ugly)
8
Cone and other stuff
http://www.convexoptimization.com/dattorro/positive_semidefinate_cone.html
Positive semidefinite: a symmetric matrix A with xᵀAx ≥ 0 for all x
Positive semidefinite cone: the set of all symmetric positive semidefinite matrices of a particular dimension
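A small NumPy sketch of cone membership: for a symmetric matrix, xᵀAx ≥ 0 for all x is equivalent to all eigenvalues being nonnegative, and clipping negative eigenvalues projects a matrix onto the PSD cone (the example matrix is made up):

```python
import numpy as np

# A made-up symmetric matrix that is NOT positive semidefinite.
A = np.array([[2.0, -3.0],
              [-3.0, 2.0]])

w, V = np.linalg.eigh(A)  # eigenvalues are -1 and 5
# A lies outside the PSD cone because it has a negative eigenvalue.

# Projection onto the PSD cone (nearest PSD matrix in Frobenius norm):
# clip negative eigenvalues to zero and rebuild the matrix.
A_psd = V @ np.diag(np.clip(w, 0.0, None)) @ V.T
```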
9
Semidefinite program (SDP)
positive semidefinite constraints
Fixed trace avoids the trivial solution
10
Dual form
Quadratically constrained quadratic program (QCQP): the dual carries a quadratic constraint.
QCQPs can be solved more efficiently than SDPs (O(n³) vs. O(n⁴·⁵)) by interior point methods.
11
Interior point algorithm
Linear program:

maximize    cᵀx
subject to  Ax ≤ b, x ≥ 0

The classical simplex method follows the edges of the polyhedron
Interior point methods walk through the interior of the feasible region
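The LP in the slide's form can be illustrated with scipy.optimize.linprog, which minimizes, so the objective is negated. The numbers are a made-up example; scipy's default HiGHS backend offers both simplex and interior-point solvers:

```python
import numpy as np
from scipy.optimize import linprog

# Made-up LP in the slide's form: maximize c^T x  s.t.  Ax <= b, x >= 0.
c = np.array([3.0, 2.0])
A = np.array([[1.0, 1.0],
              [2.0, 1.0]])
b = np.array([4.0, 6.0])

# linprog minimizes, so negate the objective; bounds default to x >= 0.
res = linprog(-c, A_ub=A, b_ub=b)

# The optimum is the vertex x = (2, 2) with objective value 10.
```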
12
Application

Recognition of ribosomal and membrane proteins in yeast

3 types of data:
Amino acid sequences
Protein-protein interactions
mRNA expression profiles

7 kernels:
Empirical kernel map → sequence homology: BLAST (B), Smith-Waterman (SW), Pfam
FFT → sequence hydropathy: KD hydropathy profiles, padding, low-pass filter, FFT, RBF
Interaction kernel (LI) → PPI
Diffusion kernel (D) → PPI
RBF kernel (E) → gene expression
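As one concrete kernel from the list, an RBF (Gaussian) kernel on expression profiles might be computed as follows; the random profiles and the bandwidth σ = 1 are placeholder assumptions:

```python
import numpy as np

# Placeholder "expression profiles": rows = proteins, columns = conditions.
rng = np.random.default_rng(2)
E = rng.normal(size=(6, 10))

# Gaussian (RBF) kernel: K(x, z) = exp(-||x - z||^2 / (2 * sigma^2))
sigma = 1.0  # assumed bandwidth
sq_dists = np.sum((E[:, None, :] - E[None, :, :]) ** 2, axis=-1)
K_E = np.exp(-sq_dists / (2.0 * sigma ** 2))

# The RBF kernel matrix is symmetric, has ones on the diagonal, and is PSD.
```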
13
Results
Combination of kernels performs better than individual kernels
Gene expression (E) is most important for ribosomal protein recognition
PPI (D) is most important for membrane protein recognition
14
Results

Small improvement compared to uniform weights (all weights = 1)
SDP is robust in the presence of noise
Open question: how does SDP perform versus kernel weights derived from the accuracy of individual SVMs?

Membrane protein recognition:
Other methods use sequence information only
TMHMM is designed for topology prediction
TMHMM is not trained on yeast only
15
Why is this cool?
Everything you ever dreamed of:
Optimization of C included (in the 2-norm soft-margin SVM, C enters the kernel as (1/C)·I added to the diagonal)
Hyperkernels (optimize the kernel itself)
Transduction (learn from labeled & unlabeled samples in polynomial time)
SDP has many applications (graph theory, combinatorial optimization, …)
16
Literature

Learning the kernel matrix with semidefinite programming, G. R. G. Lanckriet et al., 2004
Kernel-based data fusion and its application to protein function prediction in yeast, G. R. G. Lanckriet et al., 2004
Machine learning using hyperkernels, C. S. Ong, A. J. Smola, 2003
Semidefinite optimization, M. J. Todd, 2001
http://www-user.tu-chemnitz.de/~helmberg/semidef.html
17
Software

SeDuMi (SDP)
Mosek (QCQP; Java, C++; commercial)
YALMIP (Matlab)
…
http://www-user.tu-chemnitz.de/~helmberg/semidef.html