Kernel-based data fusion
DESCRIPTION
Kernel-based data fusion: discussion of a paper by G. Lanckriet. Problem: aggregation of heterogeneous data. Idea: different data are represented by different kernels. Question: how to combine different kernels in an elegant/efficient way?
TRANSCRIPT
1
Kernel-based data fusion
Discussion of a paper by G. Lanckriet
2
Paper
3
Overview
Problem: Aggregation of heterogeneous data
Idea: Different data are represented by different kernels
Question: How to combine different kernels in an elegant/efficient way?
Solution: Linear combination and SDP
Application: Recognition of ribosomal and membrane proteins
4
Linear combination of kernels

K = Σᵢ μᵢ Kᵢ        (μᵢ: weight, Kᵢ: kernel)

The resulting kernel K is positive definite (xᵀKx > 0 for all x), provided μᵢ > 0 and xᵀKᵢx > 0, since

xᵀKx = Σᵢ μᵢ xᵀKᵢx > 0

Elegant aggregation of heterogeneous data
More efficient than training of individual SVMs
KCCA uses an unweighted sum over the individual kernels
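The weighted combination above can be sketched in a few lines of NumPy. The data sources and weights here are made-up placeholders; the point is only that a nonnegative combination of positive semidefinite Gram matrices stays positive semidefinite:

```python
import numpy as np

# Toy stand-ins for two heterogeneous data sources (hypothetical values).
rng = np.random.default_rng(0)
X1 = rng.normal(size=(5, 3))   # e.g. features from source 1
X2 = rng.normal(size=(5, 4))   # e.g. features from source 2

K1 = X1 @ X1.T                 # linear kernel on source 1 (PSD by construction)
K2 = X2 @ X2.T                 # linear kernel on source 2

mu = np.array([0.7, 0.3])      # nonnegative kernel weights (illustrative)
K = mu[0] * K1 + mu[1] * K2    # combined kernel K = sum_i mu_i K_i

# A nonnegative combination of PSD matrices is PSD: smallest eigenvalue >= 0.
assert np.linalg.eigvalsh(K).min() >= -1e-10
```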
5
Support Vector Machine

Primal (soft-margin) form:

min over w, b, ξ:   ½‖w‖² + C Σᵢ ξᵢ         (square norm + penalty term)
subject to:         yᵢ(wᵀxᵢ + b) ≥ 1 − ξᵢ   (hyperplane constraint)
                    ξᵢ ≥ 0                  (slack variables)
6
Dual form

max over α:   Σᵢ αᵢ − ½ Σᵢⱼ αᵢ αⱼ yᵢ yⱼ K(xᵢ, xⱼ)   (quadratic, convex)
subject to:   Σᵢ αᵢ yᵢ = 0                          (scalar = 0)
              0 ≤ αᵢ ≤ C                            (αᵢ: Lagrange multipliers)

Maximization instead of minimization
Equality constraints
Lagrange multipliers instead of w, b, ξ
Quadratic program (QP)

The kernel matrix K is positive definite, so the problem is convex.
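As a sketch, this dual QP can be solved on a tiny toy problem with a general-purpose solver. Here scipy's SLSQP stands in for a dedicated QP solver; the data points, C value, and tolerances are all illustrative assumptions:

```python
import numpy as np
from scipy.optimize import minimize

# Tiny made-up 2D problem: two points per class, linearly separable.
X = np.array([[0.0, 0.0], [1.0, 0.0],
              [3.0, 3.0], [4.0, 3.0]])
y = np.array([-1.0, -1.0, 1.0, 1.0])
C = 10.0

K = X @ X.T                        # linear kernel (Gram matrix)
Q = (y[:, None] * y[None, :]) * K  # Q_ij = y_i y_j K(x_i, x_j)

def neg_dual(a):
    # Negative dual objective: -(sum_i a_i - 0.5 a^T Q a), since we minimize.
    return -(a.sum() - 0.5 * a @ Q @ a)

cons = {"type": "eq", "fun": lambda a: a @ y}  # sum_i alpha_i y_i = 0
bnds = [(0.0, C)] * len(y)                     # box constraint 0 <= alpha_i <= C
res = minimize(neg_dual, np.zeros(len(y)), bounds=bnds, constraints=cons)

alpha = res.x
w = (alpha * y) @ X                # recover primal weight vector
sv = alpha > 1e-6                  # support vectors have nonzero multipliers
b = np.mean(y[sv] - X[sv] @ w)     # bias from the support vectors
```

Only the support vectors end up with nonzero αᵢ, which is why the slide's list notes "Lagrange multipliers instead of w, b, ξ".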
7
Inserting the linear combination

Substituting K = Σᵢ μᵢ Kᵢ into the dual gives a problem in both α and the weights μ.

The combined kernel must lie within the cone of positive semidefinite matrices
A fixed trace avoids the trivial solution (unbounded weights)
The resulting problem is no longer a plain QP (ugly)
8
Cone and other stuff
http://www.convexoptimization.com/dattorro/positive_semidefinate_cone.html
Positive semidefinite: a symmetric matrix A with xᵀAx ≥ 0 for all x
Positive semidefinite cone: the set of all symmetric positive semidefinite matrices of a particular dimension
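A small NumPy sketch of cone membership: for a symmetric matrix, xᵀAx ≥ 0 for all x is equivalent to all eigenvalues being nonnegative, and clipping negative eigenvalues projects a matrix onto the PSD cone (the example matrix is made up):

```python
import numpy as np

# A made-up symmetric matrix that is NOT positive semidefinite.
A = np.array([[2.0, -3.0],
              [-3.0, 2.0]])

w, V = np.linalg.eigh(A)  # eigenvalues are -1 and 5
# A lies outside the PSD cone because it has a negative eigenvalue.

# Projection onto the PSD cone (nearest PSD matrix in Frobenius norm):
# clip negative eigenvalues to zero and rebuild the matrix.
A_psd = V @ np.diag(np.clip(w, 0.0, None)) @ V.T
```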
9
Semidefinite program (SDP)
positive semidefinite constraints
Fixed trace avoids the trivial solution
10
Dual form
Quadratically constrained quadratic program (QCQP): the dual carries a quadratic constraint.
QCQPs can be solved more efficiently than SDPs (O(n³) vs. O(n⁴·⁵)) by interior point methods.
11
Interior point algorithm
Linear program:

maximize    cᵀx
subject to  Ax ≤ b, x ≥ 0

The classical simplex method follows the edges of the polyhedron
Interior point methods walk through the interior of the feasible region
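The LP in the slide's form can be illustrated with scipy.optimize.linprog, which minimizes, so the objective is negated. The numbers are a made-up example; scipy's default HiGHS backend offers both simplex and interior-point solvers:

```python
import numpy as np
from scipy.optimize import linprog

# Made-up LP in the slide's form: maximize c^T x  s.t.  Ax <= b, x >= 0.
c = np.array([3.0, 2.0])
A = np.array([[1.0, 1.0],
              [2.0, 1.0]])
b = np.array([4.0, 6.0])

# linprog minimizes, so negate the objective; bounds default to x >= 0.
res = linprog(-c, A_ub=A, b_ub=b)

# The optimum is the vertex x = (2, 2) with objective value 10.
```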
12
Application

Recognition of ribosomal and membrane proteins in yeast

3 types of data:
Amino acid sequences
Protein-protein interactions
mRNA expression profiles

7 kernels:
Empirical kernel map → sequence homology: BLAST (B), Smith-Waterman (SW), Pfam
FFT → sequence hydropathy: KD hydropathy profiles, padding, low-pass filter, FFT, RBF
Interaction kernel (LI) → PPI
Diffusion kernel (D) → PPI
RBF kernel (E) → gene expression
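As one concrete kernel from the list, an RBF (Gaussian) kernel on expression profiles might be computed as follows; the random profiles and the bandwidth σ = 1 are placeholder assumptions:

```python
import numpy as np

# Placeholder "expression profiles": rows = proteins, columns = conditions.
rng = np.random.default_rng(2)
E = rng.normal(size=(6, 10))

# Gaussian (RBF) kernel: K(x, z) = exp(-||x - z||^2 / (2 * sigma^2))
sigma = 1.0  # assumed bandwidth
sq_dists = np.sum((E[:, None, :] - E[None, :, :]) ** 2, axis=-1)
K_E = np.exp(-sq_dists / (2.0 * sigma ** 2))

# The RBF kernel matrix is symmetric, has ones on the diagonal, and is PSD.
```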
13
Results
Combination of kernels performs better than individual kernels
Gene expression (E) is most important for ribosomal protein recognition
PPI (D) is most important for membrane protein recognition
14
Results

Small improvement compared to uniform weights (all weights = 1)
SDP is robust in the presence of noise
Open question: how does SDP perform versus kernel weights derived from the accuracy of individual SVMs?

Membrane protein recognition:
Other methods use sequence information only
TMHMM is designed for topology prediction
TMHMM is not trained on yeast only
15
Why is this cool?
Everything you ever dreamed of:
Optimization of C included (in the 2-norm soft-margin SVM, C enters the kernel as (1/C)·I added to the diagonal)
Hyperkernels (optimize the kernel itself)
Transduction (learn from labeled & unlabeled samples in polynomial time)
SDP has many applications (graph theory, combinatorial optimization, …)
16
Literature

Learning the kernel matrix with semidefinite programming, G. R. G. Lanckriet et al., 2004
Kernel-based data fusion and its application to protein function prediction in yeast, G. R. G. Lanckriet et al., 2004
Machine learning using hyperkernels, C. S. Ong, A. J. Smola, 2003
Semidefinite optimization, M. J. Todd, 2001
http://www-user.tu-chemnitz.de/~helmberg/semidef.html
17
Software

SeDuMi (SDP)
Mosek (QCQP; Java, C++; commercial)
YALMIP (Matlab)
…
http://www-user.tu-chemnitz.de/~helmberg/semidef.html