r berrt you o - dartmouth collegemorgan.dartmouth.edu/docs/informatics/docs/20031119_ryu.pdf ·...
Support Vector Machines
Graduate Student and Graduate Research Assistant
Epi-GRA presentation ● robert yu ([email protected]@rice.edu) ● November 2003
Computer Science
Statistics, bioinformatics
Molecular Genetics
Human Genetics
Linkage Analysis
Statistical Learning Methods
Data Classification
Statistical Learning Methods
Support Vector Machines (SVM)
Case Projects
Discussion
Data Classification
Dataset
Examples   Attributes                  Categories
e1         x11, x12, x13, ……, x1n      y1
e2         x21, x22, x23, ……, x2n      y2
e3         x31, x32, x33, ……, x3n      y3
…          …                           …
em         xm1, xm2, xm3, ……, xmn      ym
Dataset
Examples   Attributes                  Categories
e1         x11, x12, x13, ……, x1n      1
e2         x21, x22, x23, ……, x2n      -1
e3         x31, x32, x33, ……, x3n      1
…          …                           …
em         xm1, xm2, xm3, ……, xmn      -1
Dataset
Examples   Attributes                  Categories
e1         x11, x12, x13, ……, x1n      1
e3         x31, x32, x33, ……, x3n      1
…          …                           …
e2         x21, x22, x23, ……, x2n      -1
em         xm1, xm2, xm3, ……, xmn      -1
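The dataset layout on these slides (examples as rows, attributes as columns, one category per example) can be sketched in code. This is only an illustration: the attribute values are invented, and the names `examples`, `labels` and `group_by_category` are hypothetical, not taken from the presentation.

```python
# Hypothetical toy dataset: m = 4 examples, n = 3 attributes each.
# The numeric values are invented purely for illustration.
examples = [
    [0.2, 1.5, 3.1],  # e1
    [4.0, 0.3, 2.2],  # e2
    [0.1, 1.7, 2.9],  # e3
    [3.8, 0.5, 2.0],  # e4
]
labels = [1, -1, 1, -1]  # categories encoded as +1 / -1


def group_by_category(examples, labels):
    """Reorder rows so that all +1 examples come before all -1 examples."""
    # sorted() is stable, so the original order is kept within each category.
    order = sorted(range(len(labels)), key=lambda i: -labels[i])
    return [examples[i] for i in order], [labels[i] for i in order], order


sorted_examples, sorted_labels, order = group_by_category(examples, labels)
print([f"e{i + 1}" for i in order])
print(sorted_labels)
```

Grouping the rows by category changes nothing about the learning problem; it only makes the two classes easier to read off, as in the reordered table.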
Statistical Learning Methods
Unsupervised Learning Methods
- No output is available to teach the learner
- Usually used to find the underlying process
- Clustering analysis, etc.
Supervised Learning Methods
- Use the output in the training set to supervise learning
- Usually used to classify unseen data
- Support vector machines, supervised clustering analysis, etc.
Brief History of SVM
1960s: VC theory, a foundation of statistical learning theory, developed by Vapnik, Lerner, and Chervonenkis (Russia)
1980s to 1990s: VC theory developed into learning machines that can generalize to unseen data
1990s: SVM emerged from the work of Vapnik et al. at AT&T Bell Labs, initially used in OCR and object recognition tasks
SVM Strategy and Methods
Learning from training data
Predicting on unseen data using generalization theory
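Example
A 2D input space mapped to a 3D feature space
[Figure: input space with axes $x_1$, $x_2$; feature space with axes $x_1^2$, $x_2^2$, $\sqrt{2}\,x_1 x_2$.]
Decision boundary in input space: $x_1^2 + x_2^2 \le 1$
Corresponding point in feature space: $(x_1^2,\ x_2^2,\ \sqrt{2}\,x_1 x_2)$
Training examples: $x_i = (x_{i,1}, x_{i,2})$ with category $y_i$
Mapping function: $x \mapsto f(x, \alpha)$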
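The mapping on this slide can be checked numerically. The sketch below assumes the feature map $(x_1, x_2) \mapsto (x_1^2, x_2^2, \sqrt{2}\,x_1 x_2)$ from the slide; the function names and the sample points are hypothetical. It illustrates two things: the circular boundary becomes a linear rule in feature space, and the inner product in feature space equals the squared inner product in input space.

```python
import math


def feature_map(x):
    """Map a 2D point (x1, x2) to (x1^2, x2^2, sqrt(2)*x1*x2)."""
    x1, x2 = x
    return (x1 * x1, x2 * x2, math.sqrt(2.0) * x1 * x2)


def dot(u, v):
    """Plain Euclidean inner product."""
    return sum(a * b for a, b in zip(u, v))


def inside_circle_linear(x):
    """Evaluate x1^2 + x2^2 <= 1 as a linear rule w . phi(x) <= 1 in feature space."""
    w = (1.0, 1.0, 0.0)  # weights on (x1^2, x2^2, sqrt(2)*x1*x2)
    return dot(w, feature_map(x)) <= 1.0


# Hypothetical sample points, chosen only for illustration.
x, z = (0.5, -0.2), (1.5, 2.0)

# Inner product in feature space vs. squared inner product in input space.
lhs = dot(feature_map(x), feature_map(z))
rhs = dot(x, z) ** 2
print(math.isclose(lhs, rhs))
print(inside_circle_linear(x), inside_circle_linear(z))
```

The $\sqrt{2}$ factor is what makes the feature-space inner product collapse to $(x \cdot z)^2$, so the mapping never has to be computed explicitly when only inner products are needed.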
Example (cont'd)
A 2D input space mapped to a 3D feature space
A close-up, projected onto the first 2 dimensions, of the optimal separator, with margin and support vectors.
[Figure: two panels, Hypothesis 1 ($\alpha_1$) and Hypothesis 2 ($\alpha_2$), each with axes $x_1^2$ and $x_2^2$.]
SVM Strategy
Striking balance between capacity and accuracy
Minimizing generalization errors
SVM Learning - find a linear classifier
[Figure: two classes, $y_i = +1$ and $y_i = -1$, separated by a hyperplane with normal vector $w$, margin $\gamma$, and offset $b / \|w\|$.]
$w \cdot x_i + b \ge +1$ for $y_i = +1$
$w \cdot x + b = 0$ on the hyperplane
$w \cdot x_i + b \le -1$ for $y_i = -1$
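SVM Learning - maximize the margin
$w \cdot x_i + b \ge +1$ for $y_i = +1$
$w \cdot x + b = 0$ on the hyperplane
$w \cdot x_i + b \le -1$ for $y_i = -1$
Combined constraint: $y_i (w \cdot x_i + b) - 1 \ge 0$
Margin $= 2 / \|w\|$; maximizing $2 / \|w\|$ is equivalent to minimizing $\|w\|$.
$\|w\|$: Euclidean norm of $w$, equal to $\sqrt{w \cdot w}$
Lagrangian $L_P$ and its multipliers $\alpha_i$
Minimize $L_P = \tfrac{1}{2}\|w\|^2 - \sum_i \alpha_i y_i (w \cdot x_i + b) + \sum_i \alpha_i$, which gives $w = \sum_i \alpha_i y_i x_i$
The Karush-Kuhn-Tucker (KKT) conditions
$\partial L_P / \partial w_\nu = w_\nu - \sum_i \alpha_i y_i x_{i\nu} = 0$ and $\alpha_i \left( y_i (w \cdot x_i + b) - 1 \right) = 0$
These conditions are the key to finding solutions for $w$, $b$ and $\alpha$.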
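The margin and KKT conditions can be verified by hand on a very small problem. The sketch below is an illustration only: the three data points and the multipliers `alpha` are hypothetical and were worked out by hand for this toy case, not produced by a solver. It also checks $\sum_i \alpha_i y_i = 0$, the stationarity condition with respect to $b$, which is standard but is not written on the slide.

```python
import math

# Hypothetical toy data: two support vectors and one point beyond the margin.
X = [(1.0, 1.0), (-1.0, -1.0), (2.0, 2.0)]
y = [1, -1, 1]
alpha = [0.25, 0.25, 0.0]  # hand-derived multipliers for this toy data
b = 0.0


def dot(u, v):
    """Plain Euclidean inner product."""
    return sum(p * q for p, q in zip(u, v))


# Stationarity in w: w = sum_i alpha_i * y_i * x_i
w = tuple(sum(a * yi * xi[k] for a, yi, xi in zip(alpha, y, X)) for k in range(2))

# Stationarity in b (not shown on the slide): sum_i alpha_i * y_i = 0
balance = sum(a * yi for a, yi in zip(alpha, y))

# Margin = 2 / ||w||
margin = 2.0 / math.sqrt(dot(w, w))

# Primal feasibility: y_i (w . x_i + b) - 1 >= 0
slack = [yi * (dot(w, xi) + b) - 1.0 for xi, yi in zip(X, y)]

# Complementary slackness: alpha_i * (y_i (w . x_i + b) - 1) = 0
complementarity = [a * s for a, s in zip(alpha, slack)]

print(w, margin)
print(slack, complementarity)
```

Only the two points lying exactly on the margin carry a nonzero multiplier; the third point satisfies its constraint with room to spare, so its multiplier is zero. Those nonzero-multiplier points are the support vectors.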
SVM Generalization
- To minimize the error (risk): Risk Minimization
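For reference, risk minimization is commonly stated through the VC bound from statistical learning theory. The slide gives no formula, so the symbols below are assumptions in their usual textbook meaning: $R(\alpha)$ is the expected risk, $R_{emp}(\alpha)$ the empirical risk on the training set, $h$ the VC dimension, $l$ the number of training examples, and the bound holds with probability at least $1 - \eta$.

```latex
R(\alpha) \;\le\; R_{emp}(\alpha) + \sqrt{\frac{h\left(\ln(2l/h) + 1\right) - \ln(\eta/4)}{l}}
```

The second term grows with $h$, which is why a learner with lower capacity can have a tighter bound on its generalization error even when its training error is slightly higher.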
Kernel Functions
dot: $k(x, y) = x \cdot y = \sum_{i=1}^{n} x_i y_i$
polynomial: $k(x, y) = (x \cdot y + 1)^d$
radial: $k(x, y) = \exp(-\gamma \|x - y\|^2)$
neural: $k(x, y) = \tanh(a\, x \cdot y + b)$
anova: $k(x, y) = \left( \sum_i \exp(-\gamma (x_i - y_i)) \right)^d$
user defined: $k(x, y) = \Phi(x, y)$
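The kernel formulas on this slide translate directly into code. The sketch below is a plain transcription under stated assumptions: the function names and default parameter values (`d`, `gamma`, `a`, `b`) are hypothetical, and the anova kernel follows the formula exactly as it appears on the slide.

```python
import math


def dot_kernel(x, y):
    """k(x, y) = sum_i x_i * y_i"""
    return sum(xi * yi for xi, yi in zip(x, y))


def polynomial_kernel(x, y, d=2):
    """k(x, y) = (x . y + 1)^d"""
    return (dot_kernel(x, y) + 1.0) ** d


def radial_kernel(x, y, gamma=1.0):
    """k(x, y) = exp(-gamma * ||x - y||^2)"""
    sq_dist = sum((xi - yi) ** 2 for xi, yi in zip(x, y))
    return math.exp(-gamma * sq_dist)


def neural_kernel(x, y, a=1.0, b=0.0):
    """k(x, y) = tanh(a * x . y + b)"""
    return math.tanh(a * dot_kernel(x, y) + b)


def anova_kernel(x, y, gamma=1.0, d=2):
    """k(x, y) = (sum_i exp(-gamma * (x_i - y_i)))^d, as written on the slide."""
    return sum(math.exp(-gamma * (xi - yi)) for xi, yi in zip(x, y)) ** d


# Hypothetical sample vectors, chosen only for illustration.
u, v = (1.0, 2.0), (3.0, 4.0)
print(dot_kernel(u, v), polynomial_kernel(u, v), radial_kernel(u, v, gamma=0.5))
```

Each function takes two attribute vectors and returns a single number, so any of them can be swapped into a learner that only accesses the data through pairwise kernel values.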
Summary of SVM
To find (learn) linear classifiers
By maximizing the margin
Minimizing the VC dimension
Efficient extension to non-linear SVMs through the use of kernels
Algorithms and Implementation
Pseudocode for the general working set method
    Given training set S
    α ← 0
    select an arbitrary working set Ŝ ⊂ S
    repeat
        solve optimisation problem on Ŝ
        select new working set from data not satisfying KKT conditions
    until stopping criterion satisfied
    return α
Computational complexity: $O(n^3)$
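The working set loop can be made concrete in its smallest form. The sketch below is a simplification, not the method of any particular package: it assumes a classifier without the bias term $b$, so the dual has only the box constraint $0 \le \alpha_i \le C$ and a working set of a single example can be solved in closed form. The names `train_working_set` and `linear_kernel`, the parameter `C`, and the toy data are all hypothetical.

```python
def linear_kernel(u, v):
    """Plain inner product, used here as the default kernel."""
    return sum(a * b for a, b in zip(u, v))


def train_working_set(X, y, kernel=linear_kernel, C=10.0, tol=1e-6, max_iter=10000):
    """Working-set loop with a working set of size one (assumes no bias term)."""
    n = len(X)
    K = [[kernel(X[i], X[j]) for j in range(n)] for i in range(n)]
    alpha = [0.0] * n                       # alpha <- 0
    for _ in range(max_iter):               # repeat
        worst, worst_violation, worst_grad = -1, tol, 0.0
        for i in range(n):
            f_i = sum(alpha[j] * y[j] * K[i][j] for j in range(n))
            grad = 1.0 - y[i] * f_i         # gradient of the dual objective
            # KKT violation, taking the box constraint 0 <= alpha_i <= C into account
            if alpha[i] <= 0.0:
                violation = max(grad, 0.0)
            elif alpha[i] >= C:
                violation = max(-grad, 0.0)
            else:
                violation = abs(grad)
            if violation > worst_violation:
                worst, worst_violation, worst_grad = i, violation, grad
        if worst < 0:                       # until no example violates KKT
            break
        # Solve the one-variable sub-problem exactly, then clip to the box.
        step = worst_grad / K[worst][worst]
        alpha[worst] = min(max(alpha[worst] + step, 0.0), C)
    return alpha


# Hypothetical toy data: separable through the origin, so no bias is needed.
X = [(1.0, 1.0), (2.0, 2.0), (-1.0, -1.0), (-2.0, -2.0)]
y = [1, 1, -1, -1]
alpha = train_working_set(X, y)
w = tuple(sum(a * yi * xi[k] for a, yi, xi in zip(alpha, y, X)) for k in range(2))
print(alpha, w)
```

With a bias term the dual gains the equality constraint $\sum_i \alpha_i y_i = 0$, and a working set then needs at least two examples so that both can move together without breaking it.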
Publicly Available Software
MINOS
- Stanford Optimization Laboratory
- using hybrid strategy
LOQO
- uses a primal dual interior-point method
quadprog in MATLAB
- using quadratic program subroutine
SVMLight and mySVM
- by Thorsten Joachims, Cornell University
- developed at University of Dortmund
- freely available on the Internet
mySVM
Based on the VC theory by Vapnik 1998
Using optimization algorithm of SVMLight by Joachims 1999
Possible application to
- Pattern recognition
- Regression
- Distribution estimation
Case Projects - to facilitate the discussion of the SVM method
RA: “ibd sharing” dataset
“lung cancer” dataset
Goal: - to establish the SVM;
- predict clinical categories.
Case Projects - to facilitate the discussion of the SVM method
RA: “ibd sharing” dataset
“lung cancer” dataset
Goal: - to group patients by gene behavior;
- to categorize lung cancer genes.
Discussion
Data preparation
- Representativeness of training data
- Testing data
- Attribute set
SVM Training and Prediction
- VC dimension
- Cross-validation: when training data are insufficient, use it to improve generalization.
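The cross-validation point can be illustrated with a short k-fold sketch. Everything here is hypothetical: the helper names, the one-dimensional toy data, and the threshold learner, which stands in for an SVM only to keep the example self-contained. It assumes `k` is no larger than the number of examples.

```python
def k_fold_indices(n, k):
    """Split indices 0..n-1 into k contiguous folds of near-equal size."""
    folds, start = [], 0
    for f in range(k):
        size = n // k + (1 if f < n % k else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def cross_validate(X, y, train, predict, k=4):
    """Average held-out accuracy over k folds."""
    accuracies = []
    for fold in k_fold_indices(len(X), k):
        held_out = set(fold)
        train_X = [x for i, x in enumerate(X) if i not in held_out]
        train_y = [t for i, t in enumerate(y) if i not in held_out]
        model = train(train_X, train_y)
        correct = sum(1 for i in fold if predict(model, X[i]) == y[i])
        accuracies.append(correct / len(fold))
    return sum(accuracies) / len(accuracies)


def train_threshold(X, y):
    """Placeholder learner: midpoint between the two class means (1D data)."""
    pos = [x for x, t in zip(X, y) if t == 1]
    neg = [x for x, t in zip(X, y) if t == -1]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2.0


def predict_threshold(threshold, x):
    """Predict +1 above the learned threshold, -1 otherwise."""
    return 1 if x > threshold else -1


# Hypothetical 1D toy data with alternating categories.
X = [-2.0, 2.0, -1.5, 1.5, -1.0, 1.0, -2.5, 2.5]
y = [-1, 1, -1, 1, -1, 1, -1, 1]
accuracy = cross_validate(X, y, train_threshold, predict_threshold, k=4)
print(accuracy)
```

Every example is held out exactly once, so all of the data contribute to both training and testing, which is the appeal when few examples are available.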
Knowledge-based analysis of microarray gene expression data by using support vector machines
Michael P. S. Brown, et al.
Department of Computer Science and Center for Molecular Biology of RNA, Department of Biology, University of California, Santa Cruz, Santa Cruz, CA
Vol. 97, Issue 1, 262-267, January 4, 2000
http://www.cse.ucsc.edu/research/compbio/genex/
Acknowledgement
also serves as a summary of my GRA work over the past 17 months
Dr. M. Spitz, Dr. Chris Amos, Dakai Zhu
Linkage Analysis: Dakai Zhu, Joe Zhou, Jack Liu
Positional Id: Sanjay Shete, Jianfang Chen
Pedgen Database: Dong Zeng, Carlos Barcenas, Phillip Lum
SVM’s: Dr. Amos, Dr. Spitz, Dr. Carol Etzel, Wei Chen
and others including
John Gu, Qing Zhang, Wenfu Wang, David Ma, etc.
http://epi.mdanderson.org/~ryu/
Thank You