support vector machines

Graduate Student and Graduate Research Assistant

Epi-GRA presentation ● robert yu ([email protected]@rice.edu) ● november 2003

Computer Science

Statistics, bioinformatics

Molecular Genetics

Human Genetics

Linkage Analysis

Statistical Learning Methods

Data Classification

Statistical Learning Methods

Support Vector Machines (SVM)

Case Projects

Discussion

Data Classification

Dataset

Examples    Attributes                   Categories
e1          x11, x12, x13, ……, x1n       y1
e2          x21, x22, x23, ……, x2n       y2
e3          x31, x32, x33, ……, x3n       y3
em          xm1, xm2, xm3, ……, xmn       ym

Dataset

Examples    Attributes                   Categories
e1          x11, x12, x13, ……, x1n        1
e2          x21, x22, x23, ……, x2n       -1
e3          x31, x32, x33, ……, x3n        1
em          xm1, xm2, xm3, ……, xmn       -1

Dataset

Examples    Attributes                   Categories
e1          x11, x12, x13, ……, x1n        1
e3          x31, x32, x33, ……, x3n        1
e2          x21, x22, x23, ……, x2n       -1
em          xm1, xm2, xm3, ……, xmn       -1
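The dataset layout can be sketched in a few lines of Python. This is only an illustration, assuming NumPy; the attribute values are made up, and the names `X`, `y` and `order` are hypothetical. It stores m = 4 examples with n = 3 attributes, codes the categories as +1 / -1, and then groups the examples by category as in the reordered table.

```python
import numpy as np

# Hypothetical toy data: one row per example, one column per attribute.
X = np.array([
    [0.9, 1.2, 0.3],   # e1
    [0.1, 0.4, 2.2],   # e2
    [1.1, 0.8, 0.5],   # e3
    [0.2, 0.3, 1.9],   # em
])
y = np.array([1, -1, 1, -1])   # categories coded as +1 / -1

# Group the examples by category (+1 first). A stable sort keeps the
# original order inside each group, giving e1, e3, e2, em.
order = np.argsort(-y, kind="stable")
X_sorted, y_sorted = X[order], y[order]
print(order, y_sorted)
```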

Statistical Learning Methods

Unsupervised Learning Methods
  - No output is available to supervise the learning
  - Usually used to find the underlying process
  - Clustering analysis, etc.

Supervised Learning Methods
  - Use the output in the training set to supervise the learning
  - Usually used to classify unseen data
  - Support vector machines, supervised clustering analysis, etc.

Brief History of SVM

1960s: VC Theory, grounded on statistical learning theory, by Vapnik, Lerner, Chervonenkis (Russia)

1980s ~ 90s: VC Theory developed into learning machines that can generalize to unseen data

Late 1990s: SVM emerged from the work of Vapnik et al at AT&T Bell Labs, initially used in OCR and object recognition tasks.

SVM Strategy and Methods

Learning from training data

Predicting on unseen data

using generalization theory

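Example

A 2D input space → 3D feature space

[Figure: left, the 2D input space with axes $x_1$ and $x_2$; right, the 3D feature space with axes $x_1^2$, $x_2^2$ and $\sqrt{2} x_1 x_2$.]

Decision boundary: input space $x_1^2 + x_2^2 \le 1$ → feature space $(x_1^2, x_2^2, \sqrt{2} x_1 x_2)$

Training pairs: $x_i = (x_{i,1}, x_{i,2})$ → $y_i$

Mapping function: $x$ → $f(x, \alpha)$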
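The mapping in this example can be checked numerically. The following is a minimal sketch, assuming NumPy; the names `phi`, `inside_circle`, `u` and `v` are introduced here for illustration and are not from the slides. It shows that the circular boundary in input space becomes a plane in feature space, and that the dot product of two mapped points equals the squared dot product of the original points.

```python
import numpy as np

def phi(x):
    """Map a 2D input (x1, x2) to the 3D features (x1^2, x2^2, sqrt(2)*x1*x2)."""
    x1, x2 = x
    return np.array([x1 ** 2, x2 ** 2, np.sqrt(2.0) * x1 * x2])

# In feature space the circle x1^2 + x2^2 = 1 is the plane w.z + b = 0
# with w = (1, 1, 0) and b = -1, so a linear rule reproduces the circle.
w = np.array([1.0, 1.0, 0.0])
b = -1.0

def inside_circle(x):
    """Linear decision rule in feature space for x1^2 + x2^2 <= 1."""
    return float(w @ phi(x) + b) <= 0.0

# The mapping satisfies phi(u).phi(v) == (u.v)^2, so feature-space dot
# products can be computed from input-space dot products alone.
u = np.array([0.5, -1.5])
v = np.array([2.0, 0.25])
lhs = phi(u) @ phi(v)
rhs = (u @ v) ** 2
print(lhs, rhs, inside_circle((0.5, 0.5)), inside_circle((1.0, 1.0)))
```

The last identity is the reason a kernel function can stand in for an explicit mapping.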

Example (cont'd)

A 2D input space → 3D feature space

A close-up, projected onto the first 2 dimensions, of the optimal separator, with margin and support vectors.

[Figure: two panels with axes $x_1^2$ and $x_2^2$, labeled Hypothesis 1 ($\alpha_1$) and Hypothesis 2 ($\alpha_2$).]

SVM Strategy

Striking balance between capacity and accuracy

Minimizing generalization errors

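SVM Learning - find a linear classifier

[Figure: two classes, $y_i = +1$ and $y_i = -1$, separated by the hyperplane $w \cdot x + b = 0$ with normal vector $w$ and margin $\gamma$. Points with $y_i = +1$ satisfy $w \cdot x + b \ge +1$; points with $y_i = -1$ satisfy $w \cdot x + b \le -1$. The offset of the hyperplane from the origin is labeled $b / \|w\|$.]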

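SVM Learning - maximize the margin

$x_i \cdot w + b \ge +1$ for $y_i = +1$

$x \cdot w + b = 0$ (separating hyperplane)

$x_i \cdot w + b \le -1$ for $y_i = -1$

Combined: $y_i (x_i \cdot w + b) - 1 \ge 0$ for all $i$

Margin $= 2 / \|w\|$; maximizing $2 / \|w\|$ is equivalent to minimizing $\|w\|$.

$\|w\|$: Euclidean norm of $w$, equal to $\sqrt{w \cdot w}$

Lagrangian $L_P$ and its multipliers $\alpha_i \ge 0$

Minimize $L_P = \frac{1}{2}\|w\|^2 - \sum_i \alpha_i y_i (x_i \cdot w + b) + \sum_i \alpha_i$ with respect to $w$ and $b$, which gives $w = \sum_i \alpha_i y_i x_i$

The Karush-Kuhn-Tucker (KKT) Conditions

$\partial L_P / \partial w_\nu = w_\nu - \sum_i \alpha_i y_i x_{i\nu} = 0$ and $\alpha_i (y_i (w \cdot x_i + b) - 1) = 0$

These conditions are the key to solving for $w$, $b$ and $\alpha$.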
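The margin constraints and the KKT conditions can be checked on a small example. This is a sketch under stated assumptions: it uses NumPy, and the four points, the hyperplane `(w, b)` and the multipliers `alpha` are hand-picked toy values, not the output of any solver.

```python
import numpy as np

# Hypothetical toy data: two points per class on the line x1 = x2.
X = np.array([[1.0, 1.0], [2.0, 2.0], [-1.0, -1.0], [-2.0, -2.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])

# Hand-picked maximum-margin hyperplane for this data.
w = np.array([0.5, 0.5])
b = 0.0

# Constraint values y_i (x_i . w + b) - 1, which must all be >= 0.
slack = y * (X @ w + b) - 1.0
feasible = bool(np.all(slack >= -1e-12))

# Margin 2 / ||w||: here the distance between (1, 1) and (-1, -1).
margin = 2.0 / np.linalg.norm(w)

# Points whose constraint is tight lie on the margin (support vectors).
on_margin = np.isclose(slack, 0.0)

# KKT check with hand-picked multipliers: w = sum_i alpha_i y_i x_i, and
# alpha_i is nonzero only where the constraint is tight.
alpha = np.array([0.25, 0.0, 0.25, 0.0])
w_from_alpha = (alpha * y) @ X
complementary = np.allclose(alpha * slack, 0.0)
print(feasible, margin, on_margin, w_from_alpha, complementary)
```

Only the two points closest to the hyperplane carry nonzero multipliers, which is why the solution depends on the support vectors alone.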

SVM Generalization - To minimize the error (risk): Risk Minimization
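For reference, one standard way to state the risk being minimized is Vapnik's VC bound, sketched here with the learning machine written as $f(x, \alpha)$. The symbols $l$ (number of training examples), $h$ (VC dimension) and $\eta$ (confidence parameter) are introduced here for the statement and do not appear on the slide.

```latex
% Expected risk R is bounded by the empirical risk plus a capacity term.
R(\alpha) \le R_{\mathrm{emp}}(\alpha)
  + \sqrt{\frac{h\left(\ln\frac{2l}{h} + 1\right) - \ln\frac{\eta}{4}}{l}}
\qquad \text{with probability at least } 1 - \eta,

% Empirical risk on the training set, for y_i and f(x_i, \alpha) in \{-1, +1\}.
R_{\mathrm{emp}}(\alpha) = \frac{1}{2l} \sum_{i=1}^{l} \left| y_i - f(x_i, \alpha) \right|
```

The second term grows with $h$, so a machine with lower capacity has a tighter bound for the same training error. This is the trade-off between capacity and accuracy that the SVM strategy aims to balance.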

Kernel Functions

dot: $k(x, y) = x * y = \sum_{i=1}^{n} x_i y_i$

polynomial: $k(x, y) = (x * y + 1)^d$

radial: $k(x, y) = \exp(-\gamma \|x - y\|^2)$

neural: $k(x, y) = \tanh(a \, x * y + b)$

anova: $k(x, y) = \left( \sum_i \exp(-\gamma (x_i - y_i)) \right)^d$

user defined: $k(x, y) = \Phi(x, y)$
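The kernel formulas above translate directly into code. The following is a minimal sketch, assuming NumPy; the function names and default parameter values are chosen here for illustration and are not taken from any particular SVM package. The anova kernel follows the formula as it appears on the slide.

```python
import numpy as np

def dot_kernel(x, y):
    """k(x, y) = sum_i x_i * y_i"""
    return float(np.dot(x, y))

def polynomial_kernel(x, y, d=2):
    """k(x, y) = (x*y + 1)^d"""
    return float((np.dot(x, y) + 1.0) ** d)

def radial_kernel(x, y, gamma=1.0):
    """k(x, y) = exp(-gamma * ||x - y||^2)"""
    diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return float(np.exp(-gamma * np.dot(diff, diff)))

def neural_kernel(x, y, a=1.0, b=0.0):
    """k(x, y) = tanh(a * x*y + b)"""
    return float(np.tanh(a * np.dot(x, y) + b))

def anova_kernel(x, y, gamma=1.0, d=2):
    """k(x, y) = (sum_i exp(-gamma * (x_i - y_i)))^d, as written on the slide."""
    diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return float(np.sum(np.exp(-gamma * diff)) ** d)

def gram_matrix(kernel, X):
    """Matrix of kernel values k(x_i, x_j) for all pairs of examples in X."""
    return np.array([[kernel(xi, xj) for xj in X] for xi in X])

# Hypothetical example points.
p = np.array([1.0, 2.0])
q = np.array([3.0, 4.0])
K = gram_matrix(radial_kernel, [p, q])
print(dot_kernel(p, q), polynomial_kernel(p, q), K)
```

A kernel SVM only ever needs the values in `K`, never the mapped feature vectors themselves.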

Summary of SVM

To find (learn) linear classifiers

By the idea of maximizing the margin

Minimizing the VC dimension

Efficient extension to non-linear SVMs through use of kernels

Algorithms and Implementation

Pseudocode for the general working set method

  Given training set S
  α ← 0
  select an arbitrary working set Ŝ ⊆ S
  repeat
      solve optimisation problem on Ŝ
      select new working set from data not satisfying KKT conditions
  until stopping criterion satisfied
  return α

Computational complexity: O(n³)
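The working set loop can be illustrated with a deliberately simplified sketch. It is not the algorithm of SVMLight or mySVM. The assumptions are: NumPy, a linear kernel, a working set of size one (the example that violates the KKT conditions most), and a bias folded into the weight vector through a constant feature, which also regularizes the bias. The function name, the parameters `C`, `tol` and `max_iter`, and the toy data are all hypothetical.

```python
import numpy as np

def train_working_set_svm(X, y, C=10.0, tol=1e-6, max_iter=10_000):
    """Dual coordinate ascent with a one-element working set (a sketch)."""
    # Append a constant feature so the bias becomes part of the weights.
    Xa = np.hstack([X, np.ones((len(X), 1))])
    Z = y[:, None] * Xa
    Q = Z @ Z.T                       # Q_ij = y_i y_j <x_i, x_j>
    alpha = np.zeros(len(y))          # alpha <- 0
    for _ in range(max_iter):         # repeat
        grad = 1.0 - Q @ alpha        # gradient of the dual objective
        # An example violates the KKT conditions if its multiplier can
        # still move in a direction that increases the dual objective.
        can_increase = (alpha < C) & (grad > tol)
        can_decrease = (alpha > 0.0) & (grad < -tol)
        violation = np.where(can_increase | can_decrease, np.abs(grad), 0.0)
        i = int(np.argmax(violation))  # select the new working set
        if violation[i] == 0.0:        # stopping criterion satisfied
            break
        # Solve the one-variable problem exactly, then clip to [0, C].
        alpha[i] = np.clip(alpha[i] + grad[i] / Q[i, i], 0.0, C)
    w_aug = (alpha * y) @ Xa
    return alpha, w_aug[:-1], w_aug[-1]

# Hypothetical linearly separable toy data.
X = np.array([[2.0, 2.0], [3.0, 3.0], [0.0, 0.0], [1.0, 0.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
alpha, w, b = train_working_set_svm(X, y)
predictions = np.sign(X @ w + b)
print(alpha, w, b, predictions)
```

Production solvers use larger working sets, caching and shrinking; the sketch only mirrors the structure of the pseudocode.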

Publicly Available Software

MINOS
  - Stanford Optimization Laboratory
  - using hybrid strategy

LOQO
  - uses a primal dual interior-point method

quadprog in MatLab
  - using quadratic program subroutine

SVMLight and mySVM
  - by Thorsten Joachims, Cornell University
  - developed at University of Dortmund
  - freely available on the Internet

mySVM

Based on the VC theory by Vapnik 1998

Using optimization algorithm of SVMLight by Joachims 1999

Possible application to
  - Pattern recognition
  - Regression
  - Distribution estimation

Case Projects - to facilitate the discussion of SVM method

RA: “ibd sharing” dataset

“lung cancer” dataset

Goal:
  - to establish the SVM;
  - predict clinical categories.

Case Projects - to facilitate the discussion of SVM method

RA: “ibd sharing” dataset

“lung cancer” dataset

Goal:
  - to group patients from genes behavior;
  - to categorize lung cancer genes.

Discussion

Data preparation
  - Representativity of training data
  - Testing data
  - Attribute set

SVM Training and Prediction
  - VC dimension
  - Cross-validation: when training data are insufficient, it helps to assess and improve generalization.
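Cross-validation can be sketched as follows. This is a minimal illustration in plain Python; the function name `k_fold_indices` and the choice of 10 examples and 3 folds are hypothetical. Each example is used for testing exactly once and for training in the remaining folds, so a small dataset is reused rather than permanently split. The sketch does not shuffle, which a real run would normally do first.

```python
def k_fold_indices(n_examples, k):
    """Yield (train, test) index lists for k-fold cross-validation."""
    if not 2 <= k <= n_examples:
        raise ValueError("k must be between 2 and n_examples")
    indices = list(range(n_examples))
    # Spread the remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_examples // k + (1 if i < n_examples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size

folds = list(k_fold_indices(10, 3))
for train, test in folds:
    print("train:", train, "test:", test)
```

In use, a model would be trained on each `train` list and scored on the matching `test` list, and the scores averaged to estimate performance on unseen data.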

Knowledge-based analysis of microarray gene expression data by using support vector machines

Michael P. S. Brown, et al
Department of Computer Science and Center for Molecular Biology of RNA, Department of Biology, University of California, Santa Cruz, Santa Cruz, CA

Vol. 97, Issue 1, 262-267, January 4, 2000
http://www.cse.ucsc.edu/research/compbio/genex/

Acknowledgement

also served as a summary of my GRA work of the past 17 months

Dr. M. Spitz, Dr. Chris Amos, Dakai Zhu

Linkage Analysis: Dakai Zhu, Joe Zhou, Jack Liu

Positional Id: Sanjay Shete, Jianfang Chen

Pedgen Database: Dong Zeng, Carlos Barcenas, Phillip Lum

SVMs: Dr. Amos, Dr. Spitz, Dr. Carol Etzel, Wei Chen

and others including

John Gu, Qing Zhang, Wenfu Wang, David Ma, etc.

http://epi.mdanderson.org/~ryu/

Thank You
