function of rival similarity in a cognitive data analysis

FUNCTION OF RIVAL SIMILARITY IN A COGNITIVE DATA ANALYSIS. Nikolay Zagoruiko, Irina Borisova, Vladimir Dyubanov, Olga Kutnenko. Institute of Mathematics of the Siberian Division of the Russian Academy of Sciences, Pr. Koptyug 4, 630090 Novosibirsk, Russia, [email protected]

Upload: maxim-kazantsev

Posted on 26-May-2015


Page 1: FUNCTION OF RIVAL SIMILARITY IN A COGNITIVE DATA ANALYSIS

FUNCTION OF RIVAL SIMILARITY IN A COGNITIVE DATA ANALYSIS

Nikolay Zagoruiko, Irina Borisova, Vladimir Dyubanov, Olga Kutnenko

Institute of Mathematics of the Siberian Division of the Russian Academy of Sciences,

Pr. Koptyug 4, 630090 Novosibirsk, Russia,

[email protected]

Page 2

Data Analysis, Pattern Recognition, Empirical Prediction, Discovery of Regularities, Data Mining, Machine Learning, Knowledge Discovery, Intelligent Data Analysis, Cognitive Computations

Special attention is paid to a person's ability:

- to estimate similarities and differences between objects,
- to classify objects,
- to recognize whether new objects belong to existing classes,
- to discover natural dependences between characteristics, and
- to use these dependences (knowledge) for forecasting.

Page 3

Specificity of Data Mining tasks:

• Polytypic attributes (attributes of different types)

• Number of attributes >> number of objects

• Presence of noise, outliers and missing values (blanks)

• No information about the distributions

Page 4

Situation in Data Mining

Thousands of algorithms. Reasons: types of scales, dependences between features, laws of distribution, linear or nonlinear decision rules, small or large training sets, etc.

How can we build algorithms that are invariant to these factors?

Which function is common to all DM algorithms?

The basic function a person uses in clustering, recognition, feature selection, etc. is the estimation of similarity between objects.

Page 5

Measures of Similarity

$S(a,b) = 1 - \left(\sum_{i=1}^{n} (x_i^a - x_i^b)^2\right)^{1/2}$,

$S(a,b) = 1 - \sum_{i=1}^{n} |x_i^a - x_i^b|$,

$S(a,b) = 1 - \max_{i} |x_i^a - x_i^b|$,

$S(a,b) = \frac{1}{n}\sum_{i=1}^{n} \frac{\min(x_i^a, x_i^b)}{\max(x_i^a, x_i^b)}$,

$S(a,b) = e^{-\|x^a - x^b\|}$,

....
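The similarity measures listed on this slide can be sketched in Python. This is a minimal illustration, not the authors' code: it assumes coordinates scaled to a small range (so that "1 minus a distance" stays meaningful), strictly positive coordinates for the min/max ratio, and the exact form of the exponential measure is an assumption. All function names are hypothetical.

```python
import math
from typing import Sequence

Vec = Sequence[float]

def s_euclidean(a: Vec, b: Vec) -> float:
    """1 minus the Euclidean distance between a and b."""
    return 1.0 - math.dist(a, b)

def s_city_block(a: Vec, b: Vec) -> float:
    """1 minus the sum of absolute coordinate differences."""
    return 1.0 - sum(abs(x - y) for x, y in zip(a, b))

def s_chebyshev(a: Vec, b: Vec) -> float:
    """1 minus the largest absolute coordinate difference."""
    return 1.0 - max(abs(x - y) for x, y in zip(a, b))

def s_min_max(a: Vec, b: Vec) -> float:
    """Mean per-coordinate ratio min/max; assumes strictly positive values."""
    return sum(min(x, y) / max(x, y) for x, y in zip(a, b)) / len(a)

def s_exponential(a: Vec, b: Vec) -> float:
    """exp(-distance): equals 1 for identical objects and tends to 0 far apart (assumed form)."""
    return math.exp(-math.dist(a, b))

a, b = [0.5, 0.2], [0.3, 0.2]
for measure in (s_euclidean, s_city_block, s_chebyshev, s_min_max, s_exponential):
    print(measure.__name__, round(measure(a, b), 4))
```

All of these are "absolute" similarities: each depends on a and b only, which is the limitation the next slides address.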

Page 6

Similarity is not an absolute but a relative category

Is object b similar to a or not? Do objects a and b belong to the same class?

[Figure: panels with objects a, b; a, b, c; a, b, c]

To answer, we need to know: in competition with what?

Page 7

Measure F(z,a|b) of similarity of object z to object a in competition with object b

Locality: F depends only on the distances (z,a) and (z,b).

Normality: If z=a, F(z,a|b)=+1. If z=b, F(z,a|b)=-1.

If (z,a)=(z,b), F(z,a|b)=F(z,b|a)=0.

Invariance to translation and rotation of coordinates.

Antisymmetry: F(z,a|b)= -F(z,b|a)

======================================

Symmetry: F(z,a|b)=F(z,b|a)

Triangularity: F(z,a|b)+F(a,b|z)≥F(b,z|a)

======================================

Competitive Space

Page 8

Function of Concurrent (Rival) Similarity (FRiS)

[Figure: object z and competitors A and B at distances r1 and r2; F ranges from +1 to -1 between them]

$F(z,1|2) = \frac{r_2 - r_1}{r_2 + r_1}$
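A minimal Python sketch of the FRiS-function, written from F(z,1|2) = (r2 - r1)/(r2 + r1), where r1 is the distance from z to the "own" object and r2 the distance to the competitor. Euclidean distance and the value returned when all three objects coincide are assumptions; the name `fris` is hypothetical. The usage lines check the normality, zero-point and antisymmetry properties listed for F(z,a|b).

```python
import math
from typing import Sequence

Point = Sequence[float]

def fris(z: Point, a: Point, b: Point) -> float:
    """Rival similarity F(z, a | b) = (r2 - r1) / (r2 + r1).

    r1 is the distance from z to a, r2 the distance from z to the competitor b.
    Euclidean distance is an assumption; the slides do not fix the metric.
    """
    r1 = math.dist(z, a)
    r2 = math.dist(z, b)
    if r1 + r2 == 0.0:
        # z, a and b coincide: F is undefined; returning 0 is an arbitrary convention
        return 0.0
    return (r2 - r1) / (r2 + r1)

a, b = (0.0, 0.0), (1.0, 0.0)
print(fris(a, a, b))             # 1.0  (normality, z = a)
print(fris(b, a, b))             # -1.0 (normality, z = b)
print(fris((0.5, 2.0), a, b))    # 0.0  (equal distances to a and b)
z = (0.3, 0.1)
print(fris(z, a, b), -fris(z, b, a))   # antisymmetry: the two values are equal
```

Because F uses only the two distances, it is automatically local and invariant to translation and rotation whenever the distance is.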

Page 9

DM methods that use the FRiS-function make it possible to improve old algorithms

and to solve some new tasks:

• Quantitative estimation of compactness
• Choice of informative attributes
• Construction of decision rules
• Censoring of the training set
• Generalized classification
• Filling of blanks (imputation)
• Forecasting
• Ordering of objects

Page 10

[Figure: three examples of patterns A and B]

All pattern recognition methods are based on the compactness hypothesis

Braverman E.M., 1962

Patterns are compact if:
- the number of boundary points is small compared with the total number of points;
- compact patterns are separated from each other by borders that are not too elaborate.

Compactness

Page 11

Compactness

High compactness requires:

Maximum similarity between objects of the same pattern

Minimum similarity between objects of different patterns

Page 12

[Figure: patterns A and B with objects i, j and b and distances r1, r2]

Compactness (max in)

Compact patterns should satisfy the condition of maximal similarity between objects of the same pattern:

$D_i = \frac{1}{M_A}\sum_{j=1}^{M_A} F(j,i|b)$, $i, j \in A$

$F(j,i|b) = (r_2 - r_1)/(r_2 + r_1)$, where $r_1$ is the distance from j to i and $r_2$ the distance from j to b

Page 13

Compactness (min out)

[Figure: patterns A and B with objects i, j, q and s and distances r1, r2]

Compact patterns should satisfy the condition of maximal difference of these objects from the objects of other patterns:

$T_i = \frac{1}{M_B}\sum_{q=1}^{M_B} F(q,s|i)$, $i \in A$, $q, s \in B$

$F(q,s|i) = (r_2 - r_1)/(r_2 + r_1)$, where $r_1$ is the distance from q to s and $r_2$ the distance from q to i

$C_i = (D_i + T_i)/2$

$C_A = \frac{1}{M_A}\sum_{i=1}^{M_A} C_i$, $C_B = \frac{1}{M_B}\sum_{q=1}^{M_B} C_q$, $C = C_A \cdot C_B$

Page 14

Algorithm FRiS-Stolp for selection of the standards (“stolps”)

$\max_i C_i$, where $C_i = (D_i + T_i)/2$

Decision rules
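The compactness terms D_i, T_i, C_i and the FRiS-Stolp choice max C_i can be sketched as follows. This is an illustrative reading, not the authors' implementation: it assumes Euclidean distance, takes the competitor b of an object j of A to be the nearest object of B, takes s to be the object of B nearest to q (other than q itself), and picks a single stolp for pattern A. The helper names `nearest`, `candidate_score` and `fris_stolp` are hypothetical, and B must contain at least two objects.

```python
import math
from typing import List, Optional, Sequence, Tuple

Point = Sequence[float]

def fris(z: Point, a: Point, b: Point) -> float:
    """F(z, a | b) = (r2 - r1) / (r2 + r1), Euclidean distances assumed."""
    r1, r2 = math.dist(z, a), math.dist(z, b)
    return 0.0 if r1 + r2 == 0 else (r2 - r1) / (r2 + r1)

def nearest(z: Point, points: List[Point], skip: Optional[int] = None) -> Point:
    """Nearest element of `points` to z, ignoring the element at index `skip`."""
    idx = min((k for k in range(len(points)) if k != skip),
              key=lambda k: math.dist(z, points[k]))
    return points[idx]

def candidate_score(i: int, A: List[Point], B: List[Point]) -> float:
    """C_i = (D_i + T_i) / 2 for the candidate stolp A[i]."""
    s = A[i]
    # D_i: similarity of each j in A to the candidate, competing with j's nearest object in B
    d_i = sum(fris(j, s, nearest(j, B)) for j in A) / len(A)
    # T_i: similarity of each q in B to its nearest other object in B, competing with the candidate
    t_i = sum(fris(q, nearest(q, B, skip=k), s) for k, q in enumerate(B)) / len(B)
    return (d_i + t_i) / 2

def fris_stolp(A: List[Point], B: List[Point]) -> Tuple[int, List[float]]:
    """Index of the object of A with maximal C_i, together with all C_i values."""
    scores = [candidate_score(i, A, B) for i in range(len(A))]
    best = max(range(len(A)), key=lambda i: scores[i])
    return best, scores

A = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0)]
B = [(5.0, 5.0), (5.0, 6.0), (6.0, 5.0)]
best, scores = fris_stolp(A, B)
print(best, [round(c, 3) for c in scores])
```

Averaging C_i over A (and the analogous C_q over B) gives C_A and C_B, whose product is the overall compactness C.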

Page 15

Decision rules

Page 16

Recognition
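Recognition with the selected stolps can be sketched as a nearest-stolp rule in which the FRiS value doubles as a measure of confidence. This is an assumption about how the decision rules are applied (these slides show only titles); the function `recognize` and the dict-of-stolps layout are hypothetical, and at least two classes are required.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Sequence[float]

def fris(z: Point, a: Point, b: Point) -> float:
    """F(z, a | b) = (r2 - r1) / (r2 + r1), Euclidean distances assumed."""
    r1, r2 = math.dist(z, a), math.dist(z, b)
    return 0.0 if r1 + r2 == 0 else (r2 - r1) / (r2 + r1)

def recognize(z: Point, stolps: Dict[str, List[Point]]) -> Tuple[str, float]:
    """Assign z to the class of its nearest stolp; confidence is F(z, own | rival)."""
    # nearest stolp of every class
    nearest = {label: min(pts, key=lambda s: math.dist(z, s)) for label, pts in stolps.items()}
    # the closest class wins; the second closest is the rival
    own, rival = sorted(nearest, key=lambda label: math.dist(z, nearest[label]))[:2]
    return own, fris(z, nearest[own], nearest[rival])

stolps = {"A": [(0.0, 0.0)], "B": [(4.0, 0.0), (4.0, 4.0)]}
print(recognize((1.0, 0.0), stolps))   # ('A', 0.5)
```

A confidence near 0 flags objects lying close to the border between classes.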

Page 17

k=K

Page 18

k=K+2

Page 19

k=K+11

Page 20

k=K+29

Page 21

Censoring of the training set

Censoring

Page 22

Censoring of the training set

Censoring

Page 23

Censoring of the training set

Censoring

Page 24

Censoring of the training set

1. 0.8689 - 90 (90) - 20
2. 0.8902 - 90 (90) - 20
3. 0.9084 - 90 (90) - 20
4. 0.9167 - 90 (90) - 20
5. 0.8903 - 90 (90) - 20
6. 0.7309 - 88 (90) - 9
7. 0.2324 - 86 (90) - 7

[Formula: H defined from C, k, m, m' and M]

$l^* = \arg\max_{l} |r|(H, P)$, $l = 1, 2, \ldots, 7$

Censoring

$l^* = 4$ or $5$

Page 25

Informativeness by Fisher for a normal distribution

$FI = \frac{|\mu_1 - \mu_2|}{\sigma_1^2 + \sigma_2^2}$

Compactness has the same meaning and can be used as a criterion of informativeness that is invariant to the law of distribution and to the ratio N:M.

Comparative studies have shown a noticeable advantage of this criterion over the number of errors in cross-validation.

Criteria of informativeness
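The Fisher criterion for one attribute and two classes, FI = |mu1 - mu2| / (sigma1^2 + sigma2^2), can be sketched with the standard library. This is a minimal illustration: the use of population rather than sample variances is an assumption, the attribute ranking helper is hypothetical, and FI is undefined when both variances are zero.

```python
import statistics
from typing import List, Sequence, Tuple

def fisher_informativeness(x1: Sequence[float], x2: Sequence[float]) -> float:
    """FI = |mu1 - mu2| / (sigma1^2 + sigma2^2) for one attribute and two classes."""
    mu1, mu2 = statistics.fmean(x1), statistics.fmean(x2)
    # population variances; sample variances would be an equally reasonable choice
    var1, var2 = statistics.pvariance(x1), statistics.pvariance(x2)
    return abs(mu1 - mu2) / (var1 + var2)

def rank_attributes(class1: List[List[float]], class2: List[List[float]]) -> Tuple[List[int], List[float]]:
    """Attribute indices sorted by decreasing FI; rows are objects, columns are attributes."""
    n = len(class1[0])
    fi = [fisher_informativeness([row[k] for row in class1], [row[k] for row in class2])
          for k in range(n)]
    return sorted(range(n), key=lambda k: fi[k], reverse=True), fi

class1 = [[0.0, 1.0], [2.0, 3.0]]
class2 = [[4.0, 2.0], [6.0, 4.0]]
print(rank_attributes(class1, class2))   # ([0, 1], [2.0, 0.5])
```

FI relies on means and variances, which is why it is tied to the normal distribution; the slide's point is that compactness does not need that assumption.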

Page 26

Comparison of the criteria (CV vs. FRiS)

Order of attributes by informativeness: C = 0.661 and C = 0.883

[Figure: curves Fs and U versus noise level from 0.05 to 0.3; vertical axis from 0.6 to 1.1]

N = 100, M = 2*100

m_t = 2*35, m_C = 2*65 + noise

Criteria

Page 27

Algorithm GRAD is based on a combination of two greedy approaches:

forward and backward search.

At the forward stage, the Addition algorithm is used.

At the backward stage, the Deletion algorithm is used.

GRAD

Page 28

Algorithm AdDel. To reduce the influence of accumulated errors, a relaxation method is applied: n1 is the number of most informative attributes added to the subsystem (Addition); n2 < n1 is the number of least informative attributes removed from the subsystem (Deletion).

AdDel relaxation method: n steps forward, n/2 steps back

Algorithm AdDel: reliability (R) of recognition in spaces of different dimension.

R(AdDel) > R(DelAd) > R(Ad) > R(Del)

GRAD
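The AdDel relaxation scheme can be sketched as alternating greedy passes: n1 Addition steps, then n2 Deletion steps, until the subsystem reaches the target size. This is a minimal sketch under stated assumptions: `score` is a placeholder for any informativeness criterion (for example, compactness), the defaults n1 = 2 and n2 = 1 follow "n steps forward, n/2 steps back", and the function name and toy additive score are hypothetical.

```python
from typing import Callable, List, Sequence

Score = Callable[[Sequence[int]], float]

def ad_del(n_features: int, score: Score, target_size: int, n1: int = 2, n2: int = 1) -> List[int]:
    """Greedy AdDel with relaxation: n1 steps of Addition, then n2 steps of Deletion."""
    if not 0 < n2 < n1:
        raise ValueError("relaxation requires 0 < n2 < n1")
    if not 1 <= target_size <= n_features:
        raise ValueError("target_size must lie in 1..n_features")
    selected: List[int] = []
    while True:
        # Addition: each step adds the attribute that gives the best score
        for _ in range(n1):
            rest = [f for f in range(n_features) if f not in selected]
            selected.append(max(rest, key=lambda f: score(selected + [f])))
            if len(selected) == target_size:
                return selected
        # Deletion: each step drops the attribute whose removal keeps the best score
        for _ in range(n2):
            worst = max(selected, key=lambda f: score([g for g in selected if g != f]))
            selected.remove(worst)

weights = [0.1, 0.9, 0.5, 0.7]
toy_score = lambda subset: sum(weights[f] for f in subset)
print(ad_del(4, toy_score, 3))   # [1, 3, 2]
```

Since n1 > n2, every round grows the subsystem by n1 - n2 attributes, so the loop always terminates; the Deletion pass gives earlier greedy choices a chance to be revised.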

Page 29

Algorithm GRAD
• AdDel can work not only with single attributes but also with groups of attributes (granules) of different size m = 1, 2, 3, …

The granules can be formed by exhaustive search.

• But: the problem of combinatorial explosion!

Solution: rely on the individual informativeness of attributes

[Figure: frequency f with which an attribute enters the informative subsystem versus its rank L by individual informativeness]

This makes it possible to granulate only the most informative part of the attributes.

GRAD

Page 30

Algorithm GRAD (Granulated AdDel)

1. Independent testing of the N attributes.
Selection of the m1 << N best (m1 granules of size 1).

2. Forming $C_{m_1}^2$ combinations.
Selection of the m2 << $C_{m_1}^2$ best (m2 granules of size 2).

3. Forming $C_{m_1}^3$ combinations.
Selection of the m3 << $C_{m_1}^3$ best (m3 granules of size 3).

M = <m1, m2, m3>: the set of secondary attributes (granules). AdDel(M) selects the m* << |M| best granules, which include n* attributes.

Example: $X = \{x_2, x_3, x_5, x_6, x_9, x_{25}, \ldots\}$

GRAD
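Steps 1 to 3 of GRAD can be sketched with `itertools.combinations`: attributes are first scored individually, and pairs and triples are built only from the m1 best ones, which avoids the combinatorial explosion. This is a minimal sketch, assuming a generic `score` function on granules (a placeholder for an informativeness criterion such as compactness); the name `form_granules` and the toy weights are hypothetical. The resulting set M would then be passed to AdDel.

```python
from itertools import combinations
from typing import Callable, List, Tuple

Granule = Tuple[int, ...]

def form_granules(n_features: int, score: Callable[[Granule], float],
                  m1: int, m2: int, m3: int) -> List[Granule]:
    """Steps 1-3 of GRAD: the set M of the best granules of size 1, 2 and 3."""
    # Step 1: test every attribute on its own and keep the m1 best
    singles = sorted(((f,) for f in range(n_features)), key=score, reverse=True)[:m1]
    top = [g[0] for g in singles]
    # Steps 2-3: only the C(m1, 2) pairs and C(m1, 3) triples of the m1 best attributes are tested
    pairs = sorted(combinations(top, 2), key=score, reverse=True)[:m2]
    triples = sorted(combinations(top, 3), key=score, reverse=True)[:m3]
    return singles + pairs + triples

weights = [0.1, 0.9, 0.5, 0.7, 0.2, 0.3]
toy_score = lambda granule: sum(weights[f] for f in granule)
print(form_granules(6, toy_score, m1=3, m2=2, m3=1))
```

With m1 = 3 only 3 pairs and 1 triple are evaluated, instead of all C(6, 2) = 15 pairs and C(6, 3) = 20 triples.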

Page 31

Value of FRiS for points on a plane

Page 32

Classification (Algorithm FRiS-Class)

FRiS-Cluster divides objects into clusters; FRiS-Tax unites clusters into classes (taxons).

Using the FRiS-function makes it possible:
- to build taxons of any shape;
- to search for the optimal number of taxons.

[Figure: distances r1 and r2*]

Page 33

Page 34

Examples of taxonomy by the FRiS-Class algorithm

Page 35

Comparison of FRiS-Class with other taxonomy algorithms

[Figure: curves for FRiS-Cluster, Kmeans, Forel, Scat and FRiS-Tax versus K = 2…15; vertical axis from 0.3 to 0.9]

Page 36

Taxonomic Decision Rule

Page 37

Taxonomic Decision Rule

Page 38

Taxonomic Decision Rule

Page 39

Universal classification

Labeled   Semilabeled   Unlabeled

(Pattern Rec)   (ТРФ)   (Clustering)

Page 40

Universal classification

Unlabeled   Semilabeled   Labeled

(Clustering)   (Pattern Rec)

=================================

FRiS-TDR

Page 41

Some real DM tasks

Task                                                 K    M        N
Medicine:
  Diagnostics of type II diabetes                    3    43       5520
  Diagnostics of prostate cancer                     4    322      17153
  Recognition of the type of leukemia                2    38       7129
Physics:
  Complex analysis of spectra                        7    20-400   1024
Commerce:
  Forecasting of book sales (Data Mining Cup 2009)   -    4812     1862

Page 42

Data Mining Cup 2009, http://www.prudsys.de/Service/Downloads/bin

Prediction of data on an absolute scale

[Figure: data table with attribute columns 1…1856 and target columns 1…8]

TRAINING: rows 1…2394; 84% of values = 0; A = 0 - 2300

CONTROL: rows 1…2418

To predict: 19344 cells (2418 rows × 8 targets)

Page 43

DMC 2009

618 teams from 164 universities in 42 countries participated.

231 teams submitted solutions; 49 were selected for the ranking.

NN   Team                                   Errors
1    Uni Karlsruhe TH_II                    17260
2    TU Dortmund                            17912
3    TU Dresden                             18163
4    Novosibirsk State University           18353
5    Uni Karlsruhe TH_I                     18763
6    FH Brandenburg_I                       19814
7    FH Brandenburg_II                      20140
8    Hochschule Anhalt                      20767
9    Uni Hamburg                            21064
10   KTH Royal Institute of Technology      21195
11   RWTH Aachen_I                          21780
14   Budapest University of Technology      23277
15   Isfahan University of Technology       23488
16   TU Graz                                23626
18   Uni Weimar_I                           23796
19   Zhejiang University of Sc. and Tech    23952
20   University Laval                       24884
24   University of Southampton              25694
25   Telkom Institute of Technology         25829
26   University of Central Florida          26254
32   Indian Institute of Technology         28517
34   Anna University Coimbatore             28670
38   Technical University of Kosice         32841
39   University of Edinburgh                45096
48   Warsaw School of Economics             77551
49   FH Hannover                            1938612

Page 44

Comparison with 10 methods of feature selection

• Jeffery I., Higgins D., Culhane A. Comparison and evaluation of methods for generating differentially expressed gene lists from microarray data. // http://www.biomedcentral.com/1471-2105/7/359

• 9 tasks on microarray data. 10 feature-selection methods. Attributes evaluated independently. Selection of the n first (best) attributes. Criterion: minimum of errors in CV (10 times, 50% split).

4 decision rules: Support Vector Machine (SVM), Between Group Analysis (BGA), Naive Bayes Classification (NBC), K-Nearest Neighbors (KNN).

40 solutions for each of the 9 tasks (10 methods × 4 decision rules)
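The evaluation protocol "10 times by 50%" can be sketched as repeated random half splits with the mean test error as the criterion. This is an illustrative sketch, not the protocol code of the cited comparison: stratifying the split by class, using a 1-nearest-neighbour classifier (one of the four rule families, KNN, with k = 1) and the fixed random seed are all assumptions, and the function names are hypothetical. Each class needs at least two objects.

```python
import math
import random
from typing import Dict, Hashable, List, Sequence, Tuple

Sample = Tuple[Sequence[float], Hashable]

def one_nn(train: List[Sample], z: Sequence[float]) -> Hashable:
    """Label of the training object nearest to z (Euclidean distance)."""
    return min(train, key=lambda item: math.dist(item[0], z))[1]

def repeated_half_split_errors(data: List[Sample], repeats: int = 10, seed: int = 0) -> float:
    """Mean test error over `repeats` random 50% / 50% splits, stratified by class."""
    rng = random.Random(seed)
    by_class: Dict[Hashable, List[Sample]] = {}
    for z, y in data:
        by_class.setdefault(y, []).append((z, y))
    rates = []
    for _ in range(repeats):
        train: List[Sample] = []
        test: List[Sample] = []
        for items in by_class.values():
            items = items[:]          # shuffle a copy, keep the caller's data intact
            rng.shuffle(items)
            half = len(items) // 2
            train += items[:half]
            test += items[half:]
        errors = sum(one_nn(train, z) != y for z, y in test)
        rates.append(errors / len(test))
    return sum(rates) / len(rates)

data = [((0.0, 0.1 * i), "A") for i in range(10)] + [((100.0, 0.1 * i), "B") for i in range(10)]
print(repeated_half_split_errors(data))
```

In a feature-selection study the same loop would be run on the projection of the data onto each candidate attribute subset.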

Page 45

Methods of selection

Method                                                          Result
Significance analysis of microarrays (SAM)                      42
Analysis of variance (ANOVA)                                    43
Empirical Bayes t-statistic                                     32
Template matching                                               38
maxT                                                            37
Between group analysis (BGA)                                    43
Area under the receiver operating characteristic curve (ROC)    37
Welch t-statistic                                               39
Fold change                                                     47
Rank products                                                   42
FRiS-GRAD                                                       12

Empirical Bayes t-statistic: for medium-sized sets of objects
Area under the ROC curve: for low noise and large sets
Rank products: for high noise and small sets

Page 46

Results on tasks

Task       N0      m1/m2    max of 4   GRAD
ALL1       12625   95/33    100.0      100.0
ALL2       12625   24/101   78.2       80.8
ALL3       12625   65/35    59.1       73.8
ALL4       12625   26/67    82.1       83.9
Prostate   12625   50/53    90.2       93.1
Myeloma    12625   36/137   82.9       81.4
ALL/AML    7129    47/25    95.9       100.0
DLBCL      7129    58/19    94.3       89.8
Colon      2000    22/40    88.6       89.5

Page 47

Recognition of two types of leukemia: ALL and AML

               Total   ALL   AML
Training set   38      27    11
Control set    34      20    14

N = 7129

I. Guyon, J. Weston, S. Barnhill, V. Vapnik. Gene Selection for Cancer Classification using Support Vector Machines. Machine Learning, 2002, 46(1-3), pp. 389-422.

Page 48

Training set: 38; test set: 34

Ng     Vsuc   Vext    Vmed   Tsuc   Text    Tmed   P
7129   0.95   0.01    0.42   0.85   -0.05   0.42   29
4096   0.82   -0.67   0.30   0.71   -0.77   0.34   24
2048   0.97   0.00    0.51   0.85   -0.21   0.41   29
1024   1.00   0.41    0.66   0.94   -0.02   0.47   32
512    0.97   0.20    0.79   0.88   0.01    0.51   30
256    1.00   0.59    0.79   0.94   0.07    0.62   32
128    1.00   0.56    0.80   0.97   -0.03   0.46   33
64     1.00   0.45    0.76   0.94   0.11    0.51   32
32     1.00   0.45    0.65   0.97   0.00    0.39   33
16     1.00   0.25    0.66   1.00   0.03    0.38   34
8      1.00   0.21    0.66   1.00   0.05    0.49   34
4      0.97   0.01    0.49   0.91   -0.08   0.45   31
2      0.97   -0.02   0.42   0.88   -0.23   0.44   30
1      0.92   -0.19   0.45   0.79   -0.27   0.23   27

Pentium, T = 3 hours

FRiS      Decision rules (name of gene/weight)    P
0.72656   537/1, 1833/1, 2641/2, 4049/2           34
0.71373   1454/1, 2641/1, 4049/1                  34
0.71208   2641/1, 3264/1, 4049/1                  34
0.71077   435/1, 2641/2, 4049/2, 6800/1           34
0.70993   2266/1, 2641/2, 4049/2                  34
0.70973   2266/1, 2641/2, 2724/1, 4049/2          34
0.70711   2266/1, 2641/2, 3264/1, 4049/2          34
0.70574   2641/2, 3264/1, 4049/2, 4446/1          34
0.70532   435/1, 2641/2, 2895/1, 4049/2           34
0.70243   2641/2, 2724/1, 3862/1, 4049/2          34
          2641/1, 4049/1                          33
          2641/1                                  32

In the first 27 subspaces P = 34/34

Pentium, T = 15 sec

I. Guyon, J. Weston, S. Barnhill, V. Vapnik (SVM) vs. Zagoruiko N., Borisova I., Dyubanov V., Kutnenko O. (FRiS)

Best features      SVM        FRiS
FRE 803, 4846      30 (88%)   33 (97%)
4846               27 (79%)   30 (88%)

Page 49

Projection of the training set onto features 2641 and 4049

[Figure: classes AML and ALL]

Page 50

Type II diabetes: ordering of patients

M = 43 (17 + 8 + 18), N = 5520

• Average similarity Fav of patients to healthy people

[Figure: objects ordered by Fav between F = +1 and F = -1; groups: healthy people, group of risk, patients]

The group of risk did not take part in training.

This is useful for early diagnosis of diseases and for monitoring the course of treatment.
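Ordering objects by their average rival similarity to healthy people can be sketched as below. The exact definition of Fav is not given on the slide, so this is an assumption: Fav(z) is taken as the mean of F(z, h | p) over all healthy objects h, with p the patient nearest to z, and Euclidean distance is used. The function names are hypothetical. Objects that took no part in training (such as the group of risk) can be ordered the same way.

```python
import math
from typing import List, Sequence

Point = Sequence[float]

def fris(z: Point, a: Point, b: Point) -> float:
    """F(z, a | b) = (r2 - r1) / (r2 + r1), Euclidean distances assumed."""
    r1, r2 = math.dist(z, a), math.dist(z, b)
    return 0.0 if r1 + r2 == 0 else (r2 - r1) / (r2 + r1)

def average_similarity_to_healthy(z: Point, healthy: List[Point], patients: List[Point]) -> float:
    """Fav(z): mean FRiS of z to each healthy object, competing with z's nearest patient."""
    rival = min(patients, key=lambda p: math.dist(z, p))
    return sum(fris(z, h, rival) for h in healthy) / len(healthy)

def order_objects(objects: List[Point], healthy: List[Point], patients: List[Point]) -> List[Point]:
    """Objects sorted from most healthy-like (Fav near +1) to most patient-like (Fav near -1)."""
    return sorted(objects, key=lambda z: average_similarity_to_healthy(z, healthy, patients),
                  reverse=True)

healthy = [(0.0, 0.0), (0.0, 1.0)]
patients = [(10.0, 0.0), (10.0, 1.0)]
print(order_objects([(9.0, 0.0), (5.0, 0.0), (1.0, 0.0)], healthy, patients))
```

Because Fav lies between -1 and +1, objects near 0 are natural candidates for the group of risk.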

Page 51

DM methods that use the FRiS-function make it possible to improve old algorithms

and to solve some new tasks:

• Quantitative estimation of compactness
• Choice of informative attributes
• Construction of decision rules
• Censoring of the training set
• Generalized classification
• Filling of blanks (imputation)
• Forecasting
• Ordering of objects

Page 52

Open problems

• Stolp + corridor (FRiS + LDR)
• Imputation of polytypic tables
• Unification of tasks of different types (UC + X)
• Optimization of algorithms
• Implementation of the software system (OTEX 2)
• Applications (medicine, genetics, …)
• …

Page 53

Conclusion

FRiS-function:
1. Provides an effective measure of similarity, informativeness and compactness.
2. Provides unification of methods and invariance to task parameters, the law of distribution and the ratio M:N.
3. Provides sufficiently high quality of solutions.

Publications:

http://math.nsc.ru/~wwwzag

Page 54

Thank you!

• Questions, please?