Classifying and clustering using Support Vector Machine


Page 1: Classifying and clustering using Support Vector Machine

Classifying and clustering using Support Vector Machine

2nd PhD report
PhD title: Data mining in unstructured data
Daniel I. MORARIU, MSc
PhD Supervisor: Lucian N. VINŢAN

Sibiu, 2005

Page 2: Classifying and clustering using Support Vector Machine

Contents

- Classification (clustering) steps
- Reuters Database processing
- Feature extraction and selection
  - Information Gain
  - Support Vector Machine
- Support Vector Machine
  - Binary classification
  - Multiclass classification
  - Clustering
  - Sequential Minimal Optimization (SMO)
  - Probabilistic outputs
- Experiments & results
  - Binary classification: aspects and results
  - Feature subset selection: a comparative approach
  - Multiclass classification: quantitative aspects
  - Clustering: quantitative aspects
- Conclusions and further work

Page 3: Classifying and clustering using Support Vector Machine

Classifying (clustering) steps

- Text mining – feature extraction
- Feature selection
- Classifying or clustering
- Testing results

Page 4: Classifying and clustering using Support Vector Machine

Reuters Database Processing

- 806791 total documents; 126 topics, 366 regions, 870 industry codes
- Industry category selected: "system software"
- 7083 documents: 4722 training samples, 2361 testing samples
- 19038 attributes (features), 68 classes (topics)
- Binary classification: topic "c152" (only 2096 of the 7083 documents)

Page 5: Classifying and clustering using Support Vector Machine

Feature extraction

- Start from the large frequency vector over all terms
- Apply term-frequency counting, stopword removal, stemming, and a frequency threshold
- Result: the reduced frequency vector (sketched in code below)
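A minimal sketch of this step, assuming whitespace tokenization and a tiny illustrative stopword list; stemming is omitted and all names are illustrative, since the slides do not show the report's actual code:

```python
from collections import Counter

STOPWORDS = {"the", "a", "an", "of", "and", "in", "to", "is"}  # illustrative only

def frequency_vector(text, threshold=2):
    """Build a term-frequency vector: tokenize, drop stopwords,
    then discard terms occurring fewer than `threshold` times."""
    tokens = [t.lower() for t in text.split() if t.isalpha()]
    counts = Counter(t for t in tokens if t not in STOPWORDS)
    return {term: n for term, n in counts.items() if n >= threshold}

print(frequency_vector("the cat saw the cat and the dog"))  # {'cat': 2}
```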

Page 6: Classifying and clustering using Support Vector Machine

Feature selection

Information Gain:

$$Ent(S) = -\sum_{i=1}^{c} p_i \log_2 p_i$$

$$Gain(S, A) = Ent(S) - \sum_{v \in Values(A)} \frac{|S_v|}{|S|}\, Ent(S_v)$$

SVM feature selection: linear kernel – weight vector
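A small self-contained sketch of these two formulas, assuming discrete feature values (function names are illustrative):

```python
import math
from collections import Counter

def entropy(labels):
    """Ent(S) = -sum_i p_i * log2(p_i) over the class proportions in S."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def information_gain(labels, feature_values):
    """Gain(S, A) = Ent(S) - sum_{v in Values(A)} |S_v|/|S| * Ent(S_v)."""
    total = len(labels)
    gain = entropy(labels)
    by_value = {}
    for label, v in zip(labels, feature_values):
        by_value.setdefault(v, []).append(label)
    for subset in by_value.values():
        gain -= len(subset) / total * entropy(subset)
    return gain

print(information_gain(["pos", "pos", "neg", "neg"], [1, 1, 0, 0]))  # 1.0
```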

Page 7: Classifying and clustering using Support Vector Machine

Contents

- Classification (clustering) steps
- Reuters Database processing
- Feature extraction and selection
  - Information Gain
  - Support Vector Machine
- Support Vector Machine
  - Binary classification
  - Multiclass classification
  - Clustering
  - Sequential Minimal Optimization (SMO)
  - Probabilistic outputs
- Experiments & results
  - Binary classification: aspects and results
  - Feature subset selection: a comparative approach
  - Multiclass classification: quantitative aspects
  - Clustering: quantitative aspects
- Conclusions and further work

Page 8: Classifying and clustering using Support Vector Machine

Support Vector Machine – binary classification

- Optimal hyperplane
- Higher-dimensional feature space
- Primal optimization problem
- Dual optimization problem – Lagrange multipliers
- Karush-Kuhn-Tucker conditions
- Support vectors
- Kernel trick
- Decision function

Page 9: Classifying and clustering using Support Vector Machine

Optimal Hyperplane

$$f(x) = \mathrm{sgn}(\langle w, x \rangle + b)$$

[Figure: in the (X1, X2) plane, the separating hyperplane $\{x \mid \langle w, x \rangle + b = 0\}$ lies between the margin hyperplanes $\{x \mid \langle w, x \rangle + b = +1\}$ and $\{x \mid \langle w, x \rangle + b = -1\}$; points with $y_i = +1$ and $y_i = -1$ fall on opposite sides, $w$ is normal to the hyperplane, and the margin is the gap between the two margin hyperplanes.]

Page 10: Classifying and clustering using Support Vector Machine

Higher-dimensional feature space

[Figure: input points x mapped into a higher-dimensional feature space, where a linear separation becomes possible.]

Page 11: Classifying and clustering using Support Vector Machine

Primal optimization problem

$$\min_{w,\, b} \ \frac{1}{2}\|w\|^2 \quad \text{subject to} \quad y_i(\langle w, x_i \rangle + b) \ge 1, \quad i = 1, \dots, m$$

Lagrange formulation:

$$L(w, b, \alpha) = \frac{1}{2}\|w\|^2 - \sum_{i=1}^{m} \alpha_i \big( y_i(\langle w, x_i \rangle + b) - 1 \big)$$

Dual optimization problem

Maximize:

$$W(\alpha) = \sum_{i=1}^{m} \alpha_i - \frac{1}{2} \sum_{i,j=1}^{m} \alpha_i \alpha_j y_i y_j \langle x_i, x_j \rangle$$

subject to:

$$0 \le \alpha_i \le C, \quad i = 1, \dots, m, \qquad \sum_{i=1}^{m} \alpha_i y_i = 0$$

At the saddle point the weight vector is recovered as $w = \sum_{i=1}^{m} \alpha_i y_i x_i$.

Page 12: Classifying and clustering using Support Vector Machine

SVM – characteristics

- Karush-Kuhn-Tucker (KKT) conditions: only the Lagrange multipliers that are non-zero at the saddle point contribute.
- Support vectors: the patterns $x_i$ for which $\alpha_i > 0$.
- Kernel trick: a positive definite kernel $k(x, x') = \langle \Phi(x), \Phi(x') \rangle$ replaces the inner product.
- Decision function (sketched in code below):

$$f(x) = \mathrm{sgn}\left( \sum_{i=1}^{m} y_i \alpha_i \langle x, x_i \rangle + b \right)$$
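A direct transcription of this decision function into Python; the support vectors, labels, multipliers, and bias are assumed to come from a trained SVM, and all names are illustrative:

```python
def decision(x, support_x, support_y, alphas, b, kernel):
    """f(x) = sgn( sum_i y_i * alpha_i * k(x, x_i) + b )."""
    s = sum(a * y * kernel(x, xi)
            for a, y, xi in zip(alphas, support_y, support_x))
    return 1 if s + b >= 0 else -1

# example with a plain inner-product (linear) kernel
linear = lambda u, v: sum(ui * vi for ui, vi in zip(u, v))
label = decision([1.0, 2.0], [[0.0, 1.0], [1.0, 0.0]], [1, -1], [0.5, 0.5], 0.0, linear)
print(label)  # 1
```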

Page 13: Classifying and clustering using Support Vector Machine

Multi-class classification

Separate one class versus the rest (a code sketch follows below):

$$f(x) = \arg\max_{j=1,\dots,M} g_j(x), \quad \text{where} \quad g_j(x) = \sum_{i=1}^{m} y_i^j \alpha_i^j \, k(x, x_i) + b^j$$
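A hedged one-versus-rest sketch; scikit-learn's SVC (a LibSVM wrapper) stands in for the report's own SVM implementation, and the ±1 relabeling builds each per-class classifier g_j:

```python
import numpy as np
from sklearn.svm import SVC  # stand-in for the report's SVM implementation

def one_vs_rest_predict(X_train, y_train, X_test, M):
    """Train one binary SVM per class j and predict argmax_j g_j(x)."""
    scores = np.zeros((len(X_test), M))
    for j in range(M):
        y_binary = np.where(y_train == j, 1, -1)           # class j vs. the rest
        clf = SVC(kernel="poly", degree=2).fit(X_train, y_binary)
        scores[:, j] = clf.decision_function(X_test)       # g_j(x)
    return scores.argmax(axis=1)
```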

Page 14: Classifying and clustering using Support Vector Machine

Clustering – characteristics

- The data are mapped into a higher-dimensional space.
- Search for the minimal enclosing sphere.

Primal optimization problem:

$$\|\Phi(x_j) - a\|^2 \le R^2 + \xi_j, \quad j = 1, \dots, m$$

Lagrange formulation:

$$L = R^2 - \sum_j \big( R^2 + \xi_j - \|\Phi(x_j) - a\|^2 \big)\, \beta_j - \sum_j \xi_j \mu_j + C \sum_j \xi_j$$

Karush-Kuhn-Tucker conditions:

$$\sum_j \beta_j = 1, \qquad a = \sum_j \beta_j \Phi(x_j), \qquad \beta_j = C - \mu_j$$
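The slides do not include the clustering code; as a stand-in sketch, scikit-learn's OneClassSVM solves the closely related ν-formulation of this enclosing-sphere problem (with an RBF kernel the two formulations coincide):

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))                      # illustrative data
model = OneClassSVM(kernel="rbf", nu=0.1).fit(X)   # nu bounds the fraction of outliers
inside = model.predict(X) == 1                     # +1 = inside the learned boundary
print(f"{inside.mean():.2%} of points inside the sphere")
```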

Page 15: Classifying and clustering using Support Vector Machine

Contents

- Classification (clustering) steps
- Reuters Database processing
- Feature extraction and selection
  - Information Gain
  - Support Vector Machine
- Support Vector Machine
  - Binary classification
  - Multiclass classification
  - Clustering
  - Sequential Minimal Optimization (SMO)
  - Probabilistic outputs
- Experiments & results
  - Binary classification: aspects and results
  - Feature subset selection: a comparative approach
  - Multiclass classification: quantitative aspects
  - Clustering: quantitative aspects
- Conclusions and further work

Page 16: Classifying and clustering using Support Vector Machine

SMO characteristics

Only two parameters are updated at each step (the minimal possible working set, since the equality constraint $\sum_{i=1}^{m} \alpha_i y_i = 0$ couples the multipliers).

Benefits:
- no extra matrix storage is needed
- no numerical QP optimization step is needed
- more iterations are needed to converge, but each needs only a few operations, which leads to an overall speed-up

Components:
- an analytic method to solve the problem for two Lagrange multipliers
- heuristics for choosing the points

Page 17: Classifying and clustering using Support Vector Machine

SMO – components

Analytic method: with all other multipliers fixed, the equality constraint forces

$$\alpha_1 y_1 + \alpha_2 y_2 = \alpha_1^{old} y_1 + \alpha_2^{old} y_2, \qquad 0 \le \alpha_1, \alpha_2 \le C$$

and the second multiplier is updated as

$$\alpha_2 = \alpha_2^{old} + \frac{y_2 (E_1 - E_2)}{\eta}, \quad \text{where} \quad \eta = \|\Phi(x_1) - \Phi(x_2)\|^2, \quad E_i = f(x_i) - y_i$$

Heuristics for choosing the points:
- Choice of the 1st point (x1/α1): find KKT violations.
- Choice of the 2nd point (x2/α2): update the pair α1, α2 that causes a large change, which in turn results in a large increase of the dual objective; in practice, maximize the quantity |E1 − E2|.

(The two-multiplier update is sketched in code below.)
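A sketch of the analytic two-multiplier step implied by these formulas, with η computed from kernel values and the box clipping that keeps both multipliers in [0, C] (variable names are illustrative):

```python
def smo_pair_update(a1, a2, y1, y2, E1, E2, k11, k22, k12, C):
    """One analytic SMO step for the pair (alpha1, alpha2).
    eta = k11 + k22 - 2*k12 is ||Phi(x1) - Phi(x2)||^2 expressed via kernel values."""
    eta = k11 + k22 - 2.0 * k12
    if eta <= 0:                        # degenerate pair; full SMO treats this case separately
        return a1, a2
    a2_new = a2 + y2 * (E1 - E2) / eta  # unconstrained optimum along the constraint line
    # Clip so both multipliers stay in [0, C] while a1*y1 + a2*y2 is preserved
    if y1 == y2:
        lo, hi = max(0.0, a1 + a2 - C), min(C, a1 + a2)
    else:
        lo, hi = max(0.0, a2 - a1), min(C, C + a2 - a1)
    a2_new = min(max(a2_new, lo), hi)
    a1_new = a1 + y1 * y2 * (a2 - a2_new)  # keeps the equality constraint satisfied
    return a1_new, a2_new
```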

Page 18: Classifying and clustering using Support Vector Machine

Probabilistic outputs

$$P(\text{class} \mid \text{input}) = P(y = 1 \mid x) = p(x) = \frac{1}{1 + \exp(-f(x))}$$

Fitting a parametric sigmoid to the SVM output $f$:

$$P(y = 1 \mid f) = \frac{1}{1 + \exp(A f + B)}$$
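A hedged sketch of fitting A and B: Platt's original method uses a dedicated Newton iteration, but an ordinary logistic regression on the SVM outputs f(x_i) recovers the same sigmoid form (the fitted weight and intercept correspond to −A and −B):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

f = np.array([-2.1, -0.8, -0.3, 0.4, 1.7, 2.5]).reshape(-1, 1)  # illustrative SVM outputs f(x_i)
y = np.array([0, 0, 0, 1, 1, 1])                                # true labels
sigmoid = LogisticRegression().fit(f, y)
prob_pos = sigmoid.predict_proba(f)[:, 1]  # estimates of P(y = 1 | f)
print(prob_pos.round(2))
```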

Page 19: Classifying and clustering using Support Vector Machine

Feature selection using SVM

Linear kernel: $k(x, x') = 2\langle x, x' \rangle$

Primal form of the decision function: $f(x) = \mathrm{sgn}(\langle w, x \rangle + b)$

Keep only the features whose weight in the learned $w$ vector is greater than a threshold:

$$\text{keep all } w_i \text{ with } w_i > threshold, \quad i = 1, \dots, m$$

(sketched in code below)
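A sketch of this weight-based selection, with scikit-learn's LinearSVC standing in for the linear-kernel SVM; the threshold value is illustrative, and absolute weights are used here, which the slide leaves implicit:

```python
import numpy as np
from sklearn.svm import LinearSVC  # stand-in for the report's linear-kernel SVM

def select_features(X, y, threshold=0.1):
    """Keep only the features whose learned weight exceeds the threshold."""
    w = LinearSVC().fit(X, y).coef_.ravel()
    keep = np.where(np.abs(w) > threshold)[0]   # indices of retained features
    return keep, X[:, keep]
```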

Page 20: Classifying and clustering using Support Vector Machine

Contents

- Classification (clustering) steps
- Reuters Database processing
- Feature extraction and selection
  - Information Gain
  - Support Vector Machine
- Support Vector Machine
  - Binary classification
  - Multiclass classification
  - Clustering
  - Sequential Minimal Optimization (SMO)
  - Probabilistic outputs
- Experiments & results
  - Binary classification: aspects and results
  - Feature subset selection: a comparative approach
  - Multiclass classification: quantitative aspects
  - Clustering: quantitative aspects
- Conclusions and further work

Page 21: Classifying and clustering using Support Vector Machine

Kernels used

Polynomial kernel:

$$k(x, x') = (2\langle x, x' \rangle + d)^d$$

Gaussian kernel:

$$k(x, x') = \exp\left( -\frac{\|x - x'\|^2}{n \cdot C} \right)$$
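Both kernels transcribed directly into NumPy; here n is taken to be the input dimensionality, as in the report's Gaussian kernel:

```python
import numpy as np

def poly_kernel(x, x2, d):
    """k(x, x') = (2 * <x, x'> + d) ** d"""
    return (2.0 * np.dot(x, x2) + d) ** d

def gaussian_kernel(x, x2, C):
    """k(x, x') = exp(-||x - x'||^2 / (n * C)), with n = len(x)."""
    n = x.shape[0]
    return np.exp(-np.sum((x - x2) ** 2) / (n * C))
```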

Page 22: Classifying and clustering using Support Vector Machine

Data representation

Binary: using the values "0" and "1".

Nominal:

$$TF(d, t) = \frac{n(d, t)}{\max_{\tau} n(d, \tau)}$$

Connell SMART:

$$TF(d, t) = \begin{cases} 0 & \text{if } n(d, t) = 0 \\ 1 + \log\big(1 + \log(n(d, t))\big) & \text{otherwise} \end{cases}$$

(All three are sketched in code below.)
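The three weighting schemes as small functions of the raw term count n(d, t) (max_n denotes the largest term count in the document):

```python
import math

def tf_binary(n):
    """Binary: 1 if the term occurs in the document, else 0."""
    return 1 if n > 0 else 0

def tf_nominal(n, max_n):
    """Nominal: n(d, t) / max_tau n(d, tau), normalized to [0, 1]."""
    return n / max_n if max_n else 0.0

def tf_connell_smart(n):
    """Connell SMART: 0 if n == 0, else 1 + log(1 + log(n))."""
    return 0.0 if n == 0 else 1.0 + math.log(1.0 + math.log(n))
```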

Page 23: Classifying and clustering using Support Vector Machine

Binary classification – 63 features

[Chart: accuracy (%) vs. kernel degree for the Binary, Nominal, and Connell SMART representations; values as in the table below.]

d – kernel degree |     1 |     2 |     3 |     4 |     5 |     6 |     7 |    10
Binary            | 40.13 | 64.78 | 66.54 | 27.23 | 46.54 | 71.62 | 56.95 | 55.19
Nominal           | 38.96 | 62.65 | 67.93 | 82.03 | 16.62 | 11.95 | 83.99 | 64.08
Connell SMART     | 40.24 | 63.32 | 62.41 | 14.41 |  7.78 | 49.72 | 68.27 | 49.65

Page 24: Classifying and clustering using Support Vector Machine

Binary classification – 7999 features

[Chart: accuracy (%) vs. kernel degree for the Binary, Nominal, and Connell SMART representations; values as in the table below.]

d – kernel degree |     1 |     2 |     3 |     4 |     5 |     6 |     7 |    10
Binary            | 35.77 | 41.74 | 61.88 | 77.64 | 69.21 | 81.87 | 10.95 | 35.77
Nominal           | 56.69 | 26.83 | 28.06 | 28.27 | 29.14 | 41.38 | 36.19 | 34.05
Connell SMART     | 50.44 | 35.28 | 41.17 | 59.28 | 79.82 | 81.81 | 82.32 | 17.85

Page 25: Classifying and clustering using Support Vector Machine

Influence of vector size – polynomial kernel

[Chart: accuracy (%) vs. kernel degree (1–10) for vector sizes 63, 1309, 2488, and 7999.]

Page 26: Classifying and clustering using Support Vector Machine

Influence of vector size – Gaussian kernel

[Chart: accuracy (%) vs. parameter C (C0.01–C2.1) for vector sizes 41, 63, 1309, 2488, and 7999.]

Page 27: Classifying and clustering using Support Vector Machine

IG versus SVM – 427 features (polynomial kernel)

[Chart: accuracy (%) vs. kernel degree (1–10), comparing IG- and SVM-based feature selection for the Binary, Nominal, and Connell SMART representations.]

Page 28: Classifying and clustering using Support Vector Machine

IG versus SVM – 427 features (Gaussian kernel)

[Chart: accuracy (%) vs. parameter C (0.01–2.7), comparing IG- and SVM-based feature selection for the Binary, Nominal, and Connell SMART representations.]

Page 29: Classifying and clustering using Support Vector Machine

LibSvm versus UseSvm – 2493 features (polynomial kernel)

UseSvm kernel: $k(x, x') = (2\langle x, x' \rangle + d)^d$

LibSVM kernel: $k(x, x') = (\text{gamma} \cdot \langle x, x' \rangle + \text{coef0})^d$

[Chart: accuracy (%) vs. kernel degree (1–10) for LibSVM, LibSVM+coef0, and UseSVM.]
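How the slide's polynomial kernel maps onto LibSVM's parametrization (sketched with scikit-learn's LibSVM-backed SVC): since LibSVM computes (gamma·⟨x, x'⟩ + coef0)^degree, choosing gamma = 2 and coef0 = d reproduces (2⟨x, x'⟩ + d)^d:

```python
from sklearn.svm import SVC  # LibSVM-backed

d = 2  # kernel degree, illustrative
clf = SVC(kernel="poly", degree=d, gamma=2.0, coef0=float(d))
# LibSVM kernel: (gamma * <x, x'> + coef0) ** degree  ==  (2 * <x, x'> + d) ** d
```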

Page 30: Classifying and clustering using Support Vector Machine

LibSvm versus UseSvm – 2493 features (Gaussian kernel)

UseSvm kernel: $k(x, x') = \exp\left( -\frac{\|x - x'\|^2}{n \cdot C} \right)$

LibSVM kernel: $k(x, x') = \exp\left( -\text{gamma} \cdot \|x - x'\|^2 \right)$

[Chart: accuracy (%) vs. parameter C (0.01–2.7) for LibSVM, LibSVM+gamma, and UseSVM.]
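Likewise for the Gaussian case: LibSVM's RBF kernel is exp(−gamma·‖x − x'‖²), so the slide's exp(−‖x − x'‖²/(n·C)) corresponds to gamma = 1/(n·C):

```python
from sklearn.svm import SVC  # LibSVM-backed

n, C_param = 2493, 1.0  # feature count and the slide's C parameter, illustrative
clf = SVC(kernel="rbf", gamma=1.0 / (n * C_param))
# LibSVM kernel: exp(-gamma * ||x - x'||^2)  ==  exp(-||x - x'||^2 / (n * C))
```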

Page 31: Classifying and clustering using Support Vector Machine

Multiclass classification – polynomial kernel, 2488 features

[Chart: accuracy (%) vs. kernel degree (2–5) for the Binary, Nominal, and Connell SMART representations.]

Page 32: Classifying and clustering using Support Vector Machine

Multiclass classification – Gaussian kernel, 2488 features

[Chart: accuracy (%) vs. parameter C (C0.05–C2.7) for the Binary, Nominal, and Connell SMART representations.]

Page 33: Classifying and clustering using Support Vector Machine

Clustering using SVM

[Chart: accuracy vs. parameter ν (0.01, 0.1, 0.5) for vector sizes 41, 63, 1309, and 2111; values as in the table below.]

ν \ #features |    41 |    63 |  1309 |  2111
0.01          |  0.6% |  0.6% |  0.7% |  0.6%
0.1           |  0.5% |  0.5% |  0.5% |  0.5%
0.5           | 25.2% | 25.1% | 25.1% | 25.1%

Page 34: Classifying and clustering using Support Vector Machine

Conclusions – best results

- Polynomial kernel and nominal representation (degree 5 and 6)
- Gaussian kernel and Connell SMART (C = 2.7)
- Reduced number of support vectors for the polynomial kernel in comparison with the Gaussian kernel (24.41% versus 37.78%)
- Number of features between 6% (1309) and 10% (2488)
- Multiclass classification follows the binary classification results
- Clustering produces a smaller number of support vectors
- Clustering follows binary classification

Page 35: Classifying and clustering using Support Vector Machine

Further work

Feature extraction and selection:
- Association rules between words (Mutual Information)
- The synonymy and polysemy problem
- Better implementation of SVM with a linear kernel
- Using families of words (WordNet)
- SVM with kernel degree greater than 1

Classification and clustering:
- Using classification and clustering together