function of rival similarity in a cognitive data analysis

FUNCTION OF RIVAL SIMILARITY IN A COGNITIVE DATA ANALYSIS. Nikolay Zagoruiko, Irina Borisova, Vladimir Dyubanov, Olga Kytnenko. Institute of Mathematics of the Siberian Division of the Russian Academy of Sciences, Pr. Koptyg 4, 630090 Novosibirsk, Russia, [email protected]


Post on 08-Jul-2015

TRANSCRIPT

Page 1: FUNCTION OF RIVAL SIMILARITY IN A COGNITIVE DATA ANALYSIS

FUNCTION OF RIVAL SIMILARITY IN A COGNITIVE DATA ANALYSIS

Nikolay Zagoruiko, Irina Borisova, Vladimir Dyubanov, Olga Kytnenko

Institute of Mathematics of the Siberian Division of the Russian Academy of Sciences,

Pr. Koptyg 4, 630090 Novosibirsk, Russia,

[email protected]

Page 2: FUNCTION OF RIVAL SIMILARITY IN A COGNITIVE DATA ANALYSIS

Data Analysis, Pattern Recognition, Empirical Prediction, Discovery of Regularities, Data Mining, Machine Learning, Knowledge Discovery, Intelligent Data Analysis, Cognitive Calculations

Special attention is paid to a person's ability:

- To estimate similarities and differences between objects,
- To classify objects,
- To recognize which of the available classes a new object belongs to,
- To discover natural dependences between characteristics, and
- To use these dependences (knowledge) for forecasting.

Page 3: FUNCTION OF RIVAL SIMILARITY IN A COGNITIVE DATA ANALYSIS

Specifics of Data Mining tasks:

• Attributes of different types (polytypic)

• Number of attributes >> number of objects

• Presence of noise, spikes and blanks

• No information on distributions

Page 4: FUNCTION OF RIVAL SIMILARITY IN A COGNITIVE DATA ANALYSIS

Some real DM tasks

K: number of classes, M: number of objects, N: number of attributes

Task                                                  K    M        N
Medicine:
  Diagnostics of type II diabetes                     3    43       5520
  Diagnostics of prostate cancer                      4    322      17153
  Recognition of the type of leukemia                 2    38       7129
Physics:
  Complex analysis of spectra                         7    20-400   1024
Commerce:
  Forecasting of book sales (Data Mining Cup 2009)    -    4812     1862

Page 5: FUNCTION OF RIVAL SIMILARITY IN A COGNITIVE DATA ANALYSIS

Data Mining Cup 2009, http://www.prudsys.de/Service/Downloads/bin

Forecasting data on an absolute scale

[Figure: layout of the data table. TRAINING part: rows 1…2394, columns 1…8 and 1…1856; 84% of the values = 0, A = 0 - 2300. CONTROL part: rows 1…2418.]

To predict: 19344 cells

Page 6: FUNCTION OF RIVAL SIMILARITY IN A COGNITIVE DATA ANALYSIS

DMC 2009

618 teams from 164 universities in 42 countries participated.

231 teams submitted solutions; 49 were selected for the rating.

NN  Team                                 Errors    NN  Team                                  Errors
 1  Uni Karlsruhe TH_II                  17260     16  TU Graz                               23626
 2  TU Dortmund                          17912     18  Uni Weimar_I                          23796
 3  TU Dresden                           18163     19  Zhejiang University of Sc. and Tech   23952
 4  Novosibirsk State University         18353     20  University Laval                      24884
 5  Uni Karlsruhe TH_I                   18763     24  University of Southampton             25694
 6  FH Brandenburg_I                     19814     25  Telkom Institute of Technology        25829
 7  FH Brandenburg_II                    20140     26  University of Central Florida         26254
 8  Hochschule Anhalt                    20767     32  Indian Institute of Technology        28517
 9  Uni Hamburg_                         21064     34  Anna University Coimbatore            28670
10  KTH Royal Institute of Technology    21195     38  Technical University of Kosice        32841
11  RWTH Aachen_I                        21780     39  University of Edinburgh               45096
14  Budapest University of Technology    23277     48  Warsaw School of Economics            77551
15  Isfahan University of Technology     23488     49  FH Hannover                           1938612

Page 7: FUNCTION OF RIVAL SIMILARITY IN A COGNITIVE DATA ANALYSIS

Comparison with 10 methods

• Jeffery I., Higgins D., Culhane A. Comparison and evaluation of methods for generating differentially expressed gene lists from microarray data. http://www.biomedcentral.com/1471-2105/7/359

• 9 tasks on microarray data. 10 feature selection methods. Attributes are treated as independent; the n first (best) are selected. Criterion: minimum number of errors in cross-validation (10 times by 50%).

Decision rules: Support Vector Machine (SVM), Between Group Analysis (BGA), Naive Bayes Classification (NBC), K-Nearest Neighbors (KNN).
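The evaluation criterion on this slide (errors on cross-validation, "10 times by 50%") can be sketched as follows. This is only an illustration under an assumption: that "10 times by 50%" means ten random splits with half of the objects used for training and half for testing. The names `repeated_half_split_error`, `fit_predict` and `majority_rule` are hypothetical and do not come from the slides.

```python
import random


def repeated_half_split_error(objects, labels, fit_predict, repeats=10, seed=0):
    """Mean test error over `repeats` random 50/50 splits (assumed protocol)."""
    rng = random.Random(seed)
    indices = list(range(len(objects)))
    half = len(indices) // 2
    errors = []
    for _ in range(repeats):
        rng.shuffle(indices)
        train, test = indices[:half], indices[half:]
        # fit_predict is any decision rule: it trains on the first two
        # arguments and returns predicted labels for the third.
        predicted = fit_predict(
            [objects[i] for i in train],
            [labels[i] for i in train],
            [objects[i] for i in test],
        )
        wrong = sum(p != labels[i] for p, i in zip(predicted, test))
        errors.append(wrong / len(test))
    return sum(errors) / repeats


def majority_rule(train_x, train_y, test_x):
    """Toy decision rule: always predict the most frequent training label."""
    most_common = max(set(train_y), key=train_y.count)
    return [most_common] * len(test_x)
```

Any of the decision rules named above (SVM, BGA, NBC, KNN) could be plugged in as `fit_predict`; the toy `majority_rule` is there only to keep the sketch self-contained.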

Page 8: FUNCTION OF RIVAL SIMILARITY IN A COGNITIVE DATA ANALYSIS

Methods of selection

Method                                                          Result
Significance analysis of microarrays (SAM)                      42
Analysis of variance (ANOVA)                                    43
Empirical Bayes t-statistic                                     32
Template matching                                               38
maxT                                                            37
Between group analysis (BGA)                                    43
Area under the receiver operating characteristic curve (ROC)    37
Welch t-statistic                                               39
Fold change                                                     47
Rank products                                                   42
FRiS-GRAD                                                       12

Empirical Bayes t-statistic: for a medium-sized set of objects
Area under a ROC curve: for small noise and a large set
Rank products: for large noise and a small set

Page 9: FUNCTION OF RIVAL SIMILARITY IN A COGNITIVE DATA ANALYSIS

Results of comparison

Task       N0      m1/m2    Max of 4   GRAD
ALL1       12625   95/33    100.0      100.0
ALL2       12625   24/101   78.2       80.8
ALL3       12625   65/35    59.1       73.8
ALL4       12625   26/67    82.1       83.9
Prostate   12625   50/53    90.2       93.1
Myeloma    12625   36/137   82.9       81.4
ALL/AML    7129    47/25    95.9       100.0
DLBCL      7129    58/19    94.3       89.8
Colon      2000    22/40    88.6       89.5

Page 10: FUNCTION OF RIVAL SIMILARITY IN A COGNITIVE DATA ANALYSIS

Recognition of two types of leukemia: ALL and AML

N = 7129

               Total   ALL   AML
Training set   38      27    11
Control set    34      20    14

I. Guyon, J. Weston, S. Barnhill, V. Vapnik. Gene Selection for Cancer Classification using Support Vector Machines. Machine Learning, 2002, 46(1-3), pp. 389-422.

Page 11: FUNCTION OF RIVAL SIMILARITY IN A COGNITIVE DATA ANALYSIS

I. Guyon, J. Weston, S. Barnhill, V. Vapnik (training set 38, test set 34). Pentium, T = 3 hours.

N g    Vsuc   Vext    Vmed   Tsuc   Text    Tmed   P
7129   0.95    0.01   0.42   0.85   -0.05   0.42   29
4096   0.82   -0.67   0.30   0.71   -0.77   0.34   24
2048   0.97    0.00   0.51   0.85   -0.21   0.41   29
1024   1.00    0.41   0.66   0.94   -0.02   0.47   32
512    0.97    0.20   0.79   0.88    0.01   0.51   30
256    1.00    0.59   0.79   0.94    0.07   0.62   32
128    1.00    0.56   0.80   0.97   -0.03   0.46   33
64     1.00    0.45   0.76   0.94    0.11   0.51   32
32     1.00    0.45   0.65   0.97    0.00   0.39   33
12     1.00    0.25   0.66   1.00    0.03   0.38   34
8      1.00    0.21   0.66   1.00    0.05   0.49   34
4      0.97    0.01   0.49   0.91   -0.08   0.45   31
2      0.97   -0.02   0.42   0.88   -0.23   0.44   30
1      0.92   -0.19   0.45   0.79   -0.27   0.23   27

Zagoruiko N., Borisova I., Dyubanov V., Kutnenko O. Pentium, T = 13 sec. The 10 best rules (each entry is name of gene/weight):

FRiS      Decision rule                          P
0.72656   537/1, 1833/1, 2641/2, 4049/2          34
0.71373   1454/1, 2641/1, 4049/1                 34
0.71208   2641/1, 3264/1, 4049/1                 34
0.71077   435/1, 2641/2, 4049/2, 6800/1          34
0.70993   2266/1, 2641/2, 4049/2                 34
0.70973   2266/1, 2641/2, 2724/1, 4049/2         34
0.70711   2266/1, 2641/2, 3264/1, 4049/2         34
0.70574   2641/2, 3264/1, 4049/2, 4446/1         34
0.70532   435/1, 2641/2, 2895/1, 4049/2          34
0.70243   2641/2, 2724/1, 3862/1, 4049/2         34
          2641/1, 4049/1                         33
          2641/1                                 32

On the first 27 rules P = 34/34.

Page 12: FUNCTION OF RIVAL SIMILARITY IN A COGNITIVE DATA ANALYSIS

Projection of the training set onto the 2-dimensional space of genes 2641 and 4049

[Figure: training objects of the classes ALL and AML.]

Page 13: FUNCTION OF RIVAL SIMILARITY IN A COGNITIVE DATA ANALYSIS

Type II diabetes: ordering of patients

M = 43 (17 + 8 + 18), N = 5520

• Average similarity Fav of patients to healthy people

[Figure: scale from F = +1 to F = -1 with the groups Healthy, Group of risk and Patients.]

The group of risk did not participate in training.

This is useful for early diagnostics of diseases and for monitoring the course of treatment.

Page 14: FUNCTION OF RIVAL SIMILARITY IN A COGNITIVE DATA ANALYSIS

The reason for the abundance of methods:

Absence of a uniform approach to solving tasks of different types

Types of scales, dependences between features, laws of distribution, linear or nonlinear decision rules, small or big training sets, …

A uniform approach can be founded on the following hypothesis:

The basic function used by a person in classification, recognition, feature selection, etc. is a method of estimating the similarity between objects.

Page 15: FUNCTION OF RIVAL SIMILARITY IN A COGNITIVE DATA ANALYSIS

Measures of Similarity

$$S_1(a,b) = 1 - \sqrt{\sum_{i=1}^{n} \alpha_i \left(x_i^a - x_i^b\right)^2},$$

$$S_2(a,b) = 1 - \sum_{i=1}^{n} \alpha_i \left|x_i^a - x_i^b\right|,$$

$$S_3(a,b) = 1 - \max_i \alpha_i \left|x_i^a - x_i^b\right|,$$

$$S_4(a,b) = \sum_{i=1}^{n} \frac{\min(x_i^a, x_i^b)}{\max(x_i^a, x_i^b)},$$

$$S_5(a,b) = e^{\ldots}, \quad \ldots$$

Page 16: FUNCTION OF RIVAL SIMILARITY IN A COGNITIVE DATA ANALYSIS

Similarity is not an absolute but a relative category

Is object b similar to a or not? Do objects a and b belong to one class?

[Figure: objects a and b, then objects a, b and c.]

We should know the answer to the question: in competition with what?

Page 17: FUNCTION OF RIVAL SIMILARITY IN A COGNITIVE DATA ANALYSIS

Function of Concurrent (Rival) Similarity (FRiS)

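[Figure: object z between patterns A and B; r1 is the distance from z to A, r2 is the distance from z to B. F changes from +1 at A to -1 at B.]

$$F = \frac{r_2 - r_1}{r_2 + r_1}$$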
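The FRiS function F = (r2 - r1) / (r2 + r1) can be written directly in code. The sketch below assumes Euclidean distance and takes r1 and r2 as the distances from z to the nearest object of each pattern; both choices, the function names, and the handling of the degenerate case r1 = r2 = 0 are assumptions made for illustration, not something the slide specifies.

```python
import math


def fris(r1: float, r2: float) -> float:
    """Rival similarity F = (r2 - r1) / (r2 + r1), ranging from -1 to +1."""
    if r1 + r2 == 0:
        return 0.0  # assumption: z coincides with both rivals
    return (r2 - r1) / (r2 + r1)


def fris_of_object(z, pattern_a, pattern_b) -> float:
    """Similarity of z to pattern A in competition with pattern B."""
    r1 = min(math.dist(z, a) for a in pattern_a)  # distance to nearest object of A
    r2 = min(math.dist(z, b) for b in pattern_b)  # distance to nearest rival in B
    return fris(r1, r2)
```

For example, `fris_of_object((1, 0), [(0, 0)], [(4, 0)])` uses r1 = 1 and r2 = 3, so F = (3 - 1) / (3 + 1) = 0.5. F = +1 means z coincides with an object of A, F = -1 that it coincides with an object of B, and F = 0 that it is equally far from both.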

Page 18: FUNCTION OF RIVAL SIMILARITY IN A COGNITIVE DATA ANALYSIS

Compactness

[Figure: four examples of patterns A and B.]

All pattern recognition methods are based on the hypothesis of compactness

Braverman E.M., 1962. Patterns are compact if:
- the number of boundary points is small in comparison with their total number;
- compact patterns are separated from each other by not too elaborate boundaries.

Page 19: FUNCTION OF RIVAL SIMILARITY IN A COGNITIVE DATA ANALYSIS

Compactness

Similarity between objects of one pattern should be maximal

Similarity between objects of different patterns should be minimal

Page 20: FUNCTION OF RIVAL SIMILARITY IN A COGNITIVE DATA ANALYSIS

Compactness

Compact patterns should satisfy the condition of maximal similarity between objects of the same pattern.

[Figure: objects i and j of pattern A and object b of pattern B, with distances r1 and r2.]

Defensive capacity:

$$D_i = \frac{1}{M_A} \sum_{j=1}^{M_A} F(j, i \mid b), \qquad F(j, i \mid b) = \frac{r_2 - r_1}{r_2 + r_1}$$

Page 21: FUNCTION OF RIVAL SIMILARITY IN A COGNITIVE DATA ANALYSIS

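Compactness

Compact patterns should also satisfy the condition of maximal difference of these objects from the objects of other patterns.

[Figure: objects i and j of pattern A and objects q, s and b of pattern B, with distances r1 and r2.]

Tolerance:

$$T_i = \frac{1}{M_B} \sum_{q=1}^{M_B} F(q, s \mid i), \qquad F(q, s \mid i) = \frac{r_2 - r_1}{r_2 + r_1}$$

$$C_i = \frac{D_i + T_i}{2}, \qquad C_A = \frac{1}{M_A} \sum_{i=1}^{M_A} C_i, \qquad C_B = \frac{1}{M_B} \sum_{q=1}^{M_B} C_q, \qquad C = C_A \cdot C_B$$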

Page 22: FUNCTION OF RIVAL SIMILARITY IN A COGNITIVE DATA ANALYSIS

Selection of the standards (stolps): algorithm FRiS-Stolp

$$C_i = \frac{D_i + T_i}{2} \to \max$$
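A minimal sketch of the selection criterion C_i = (D_i + T_i) / 2 for choosing one stolp of pattern A against a rival pattern B. It covers only this single step, not the full FRiS-Stolp algorithm. It assumes Euclidean distance, takes the rival b as the nearest object of B and s as the nearest other object of B, and requires B to contain at least two objects; these readings and all function names are assumptions made for illustration.

```python
import math


def fris(r1, r2):
    """F = (r2 - r1) / (r2 + r1); 0.0 in the degenerate case r1 = r2 = 0."""
    return (r2 - r1) / (r2 + r1) if r1 + r2 > 0 else 0.0


def nearest_distance(x, others):
    return min(math.dist(x, o) for o in others)


def defensive_capacity(i, A, B):
    """D_i: mean F(j, i | b) over all j in A, with b the nearest rival of j in B."""
    total = 0.0
    for a_j in A:
        r1 = math.dist(a_j, A[i])      # distance from j to the candidate stolp i
        r2 = nearest_distance(a_j, B)  # distance from j to its nearest rival b
        total += fris(r1, r2)
    return total / len(A)


def tolerance(i, A, B):
    """T_i: mean F(q, s | i) over all q in B, with s the nearest other object of B."""
    total = 0.0
    for q, b_q in enumerate(B):
        rest = [b for k, b in enumerate(B) if k != q]
        r1 = nearest_distance(b_q, rest)  # distance from q to its own neighbour s
        r2 = math.dist(b_q, A[i])         # distance from q to the candidate stolp i
        total += fris(r1, r2)
    return total / len(B)


def select_stolp(A, B):
    """Return (index, C_i) of the object of A with maximal C_i = (D_i + T_i) / 2."""
    scores = [(defensive_capacity(i, A, B) + tolerance(i, A, B)) / 2
              for i in range(len(A))]
    best = max(range(len(A)), key=scores.__getitem__)
    return best, scores[best]
```

With `A = [(0, 0), (1, 0), (2, 0)]` and `B = [(10, 0), (11, 0)]`, the middle object of A gets the highest score, which matches the intuition that a good stolp sits in the core of its own pattern.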

Page 23: FUNCTION OF RIVAL SIMILARITY IN A COGNITIVE DATA ANALYSIS
Page 24: FUNCTION OF RIVAL SIMILARITY IN A COGNITIVE DATA ANALYSIS
Page 25: FUNCTION OF RIVAL SIMILARITY IN A COGNITIVE DATA ANALYSIS
Page 26: FUNCTION OF RIVAL SIMILARITY IN A COGNITIVE DATA ANALYSIS
Page 27: FUNCTION OF RIVAL SIMILARITY IN A COGNITIVE DATA ANALYSIS
Page 28: FUNCTION OF RIVAL SIMILARITY IN A COGNITIVE DATA ANALYSIS

Censoring of the training set

Page 29: FUNCTION OF RIVAL SIMILARITY IN A COGNITIVE DATA ANALYSIS

Censoring of the training set

Page 30: FUNCTION OF RIVAL SIMILARITY IN A COGNITIVE DATA ANALYSIS

Censoring of the training set

Page 31: FUNCTION OF RIVAL SIMILARITY IN A COGNITIVE DATA ANALYSIS

Censoring of the training set

1.  0.8689 - 90(90) - 20
2.  0.8902 - 90(90) - 20
3.  0.9084 - 90(90) - 20
4.  0.9167 - 90(90) - 20
5.  0.8903 - 90(90) - 20
6.  0.7309 - 88(90) - 9
7.  0.2324 - 86(90) - 7

MMmmkCH /',...)/( == α

α H P

=argmax |r|(H,P)αα =1,2,…7

Page 32: FUNCTION OF RIVAL SIMILARITY IN A COGNITIVE DATA ANALYSIS

Fisher informativeness for a normal distribution:

$$I_F = \frac{|\mu_1 - \mu_2|}{\sigma_1^2 + \sigma_2^2}$$

Compactness has the same meaning and can be used as a criterion of informativeness that is invariant to the law of distribution and to the ratio N:M.

Comparative studies have shown an appreciable advantage of this criterion over the commonly used number of errors in cross-validation.
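For comparison, the Fisher informativeness of a single attribute, I_F = |mu1 - mu2| / (sigma1^2 + sigma2^2), can be computed as below. The sketch assumes population variances and two non-degenerate samples (a zero total variance would raise a division error); the function name is hypothetical.

```python
from statistics import mean, pvariance


def fisher_informativeness(x1, x2):
    """|mu1 - mu2| / (sigma1^2 + sigma2^2) for one attribute and two classes."""
    # pvariance is the population variance; using it here is an assumption.
    return abs(mean(x1) - mean(x2)) / (pvariance(x1) + pvariance(x2))
```

For the samples `[0, 2]` and `[4, 6]` the means are 1 and 5 and both variances equal 1, so the value is 4 / 2 = 2.0.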

Criteria

Page 33: FUNCTION OF RIVAL SIMILARITY IN A COGNITIVE DATA ANALYSIS

Comparison of the criteria (CV and FRiS)

N = 100, M = 2*100; mt = 2*35, mC = 2*65 + noise

Order of attributes by informativeness: C = 0.661 … C = 0.883

[Figure: curves Fs and U versus noise level (0.05 to 0.3); vertical axis from 0.6 to 1.1.]

Page 34: FUNCTION OF RIVAL SIMILARITY IN A COGNITIVE DATA ANALYSIS

Algorithm GRAD is based on a combination of two greedy approaches: forward and backward search.

At the forward stage the algorithm Addition is used.

At the backward stage the algorithm Deletion is used.

GRAD

Page 35: FUNCTION OF RIVAL SIMILARITY IN A COGNITIVE DATA ANALYSIS

Algorithm AdDel. To reduce the influence of accumulated errors, a relaxation method is applied: n1 is the number of most informative attributes added to the subsystem (Addition), and n2 < n1 is the number of least informative attributes eliminated from the subsystem (Deletion).

AdDel relaxation method: n steps forward, n/2 steps back.

Reliability (R) of recognition in spaces of different dimension:

R(AdDel) > R(DelAd) > R(Ad) > R(Del)
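The relaxation scheme (n1 greedy additions followed by n2 < n1 greedy deletions) can be sketched as follows. This is an illustrative reading, not the authors' implementation: `score` is any user-supplied quality criterion for a subset of attributes (for example compactness or a cross-validation estimate), and the function name, the stopping rule (stop when every attribute has been tried), and the tracking of the best subset seen are all assumptions.

```python
def addel(features, score, n1=2, n2=1):
    """Alternate n1 greedy additions with n2 greedy deletions (n2 < n1).

    Returns the best-scoring subset seen along the way and its score.
    """
    if not 0 <= n2 < n1:
        raise ValueError("need 0 <= n2 < n1")
    selected, remaining = [], list(features)
    best_subset, best_score = [], float("-inf")

    def remember():
        nonlocal best_subset, best_score
        current = score(selected)
        if current > best_score:
            best_subset, best_score = list(selected), current

    while remaining:
        # Addition: add the attribute that improves the subset most, n1 times.
        for _ in range(n1):
            if not remaining:
                break
            candidate = max(remaining, key=lambda f: score(selected + [f]))
            selected.append(candidate)
            remaining.remove(candidate)
            remember()
        if not remaining:
            break
        # Deletion: drop the attribute whose removal hurts least, n2 times.
        for _ in range(n2):
            if len(selected) <= 1:
                break
            candidate = max(
                selected, key=lambda f: score([s for s in selected if s != f])
            )
            selected.remove(candidate)
            remaining.append(candidate)
            remember()
    return best_subset, best_score
```

Because each cycle adds more attributes than it removes, the subset grows by at least one attribute per cycle, so the loop terminates once all attributes have been included.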

GRAD

Page 36: FUNCTION OF RIVAL SIMILARITY IN A COGNITIVE DATA ANALYSIS

Algorithm GRAD

• AdDel can work not only with single attributes but also with groups of attributes (granules) of different capacity m = 1, 2, 3, …

The granules can be formed by exhaustive search.

• But: the problem of combinatorial explosion!

Solution: rely on the individual informativeness of attributes.

[Figure: dependence of the frequency f with which an attribute enters an informative subsystem on its serial number L in the ordering by individual informativeness.]

This makes it possible to granulate only the most informative part of the attributes.

Page 37: FUNCTION OF RIVAL SIMILARITY IN A COGNITIVE DATA ANALYSIS

Algorithm GRAD (Granulated AdDel)

1. Independent testing of the N attributes. Selection of the m1 << N first best (m1 granules of power 1).

2. Forming the $C_{m_1}^{2}$ combinations. Selection of the m2 << $C_{m_1}^{2}$ first best (m2 granules of power 2).

3. Forming the $C_{m_1}^{3}$ combinations. Selection of the m3 << $C_{m_1}^{3}$ first best (m3 granules of power 3).

M = <m1, m2, m3>: the set of secondary attributes (granules). AdDel(M) selects the m* << |M| best granules, which include n* attributes.

2 6 9 25,3 ,5 , ,...X x x x x=< >
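The granule-forming steps of GRAD can be sketched as below, assuming that pairs and triples are formed only from the m1 individually best attributes and that `score` is a user-supplied informativeness criterion taking a tuple of attributes. The function name and this reading of the steps are assumptions; the final AdDel pass over the granules is not shown.

```python
from itertools import combinations


def build_granules(features, score, m1, m2, m3):
    """Form granules of power 1, 2 and 3 from the m1 individually best attributes."""
    # Step 1: rank single attributes and keep the m1 best.
    ranked = sorted(features, key=lambda f: score((f,)), reverse=True)
    top = ranked[:m1]
    power1 = [(f,) for f in top]
    # Step 2: all pairs of the m1 best attributes, keep the m2 best pairs.
    power2 = sorted(combinations(top, 2), key=score, reverse=True)[:m2]
    # Step 3: all triples of the m1 best attributes, keep the m3 best triples.
    power3 = sorted(combinations(top, 3), key=score, reverse=True)[:m3]
    return power1 + power2 + power3
```

Restricting the combinations to the m1 best attributes keeps the search at C(m1, 2) + C(m1, 3) candidates instead of all combinations of the N attributes, which is the point of the remark about combinatorial explosion.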

GRAD

Page 38: FUNCTION OF RIVAL SIMILARITY IN A COGNITIVE DATA ANALYSIS

Value of FRiS for points on a plane

Page 39: FUNCTION OF RIVAL SIMILARITY IN A COGNITIVE DATA ANALYSIS

Classification (algorithm FRiS-Class)

FRiS-Cluster divides the objects into clusters.
FRiS-Tax unites the clusters into classes (taxons).

Using the FRiS function makes it possible:
- To form taxons of any shape;
- To find the optimal number of taxons.

[Figure: distances r1 and r2*.]

Page 40: FUNCTION OF RIVAL SIMILARITY IN A COGNITIVE DATA ANALYSIS
Page 41: FUNCTION OF RIVAL SIMILARITY IN A COGNITIVE DATA ANALYSIS

Examples of taxonomies produced by the FRiS-Class algorithm

Page 42: FUNCTION OF RIVAL SIMILARITY IN A COGNITIVE DATA ANALYSIS

Examples of taxonomy by the FRiS-Class algorithm

Page 43: FUNCTION OF RIVAL SIMILARITY IN A COGNITIVE DATA ANALYSIS

Comparison of FRiS-Class with other taxonomy algorithms

[Figure: curves for FRiS-Cluster, Kmeans, Forel, Scat and FRiS-Tax versus K (2 to 15); vertical axis from 0.3 to 0.9.]

Page 44: FUNCTION OF RIVAL SIMILARITY IN A COGNITIVE DATA ANALYSIS

Universal classification

Unlabeled        Semilabeled      Labeled
(Clustering)     (ТРФ)            (Pattern Rec)
=================================
FRiS-TDR

Page 45: FUNCTION OF RIVAL SIMILARITY IN A COGNITIVE DATA ANALYSIS

New methods of DM, using FRiS-function

• Quantitative estimation of compactness
• Choice of informative attributes
• Construction of decision rules
• Censoring of the training set
• Universal classification
• Filling of blanks (imputation)
• Forecasting
• Detection of spikes

Page 46: FUNCTION OF RIVAL SIMILARITY IN A COGNITIVE DATA ANALYSIS

Unsettled problems

• Censoring of the training set
• Recognition with boundary
• Stolp + corridor (FRiS + LDR)
• Imputation
• Associations
• Uniting tasks of different types (UC+X)
• Optimization of algorithms
• Realization of the program system (OTEX 2)
• Applications (medicine, genetics, …)
• …

Page 47: FUNCTION OF RIVAL SIMILARITY IN A COGNITIVE DATA ANALYSIS

Conclusion

FRiS-function:

1. Provides an effective measure of similarity, informativeness and compactness

2. Provides invariance to the parameters of tasks, the law of distribution, and the ratio M:N

3. Provides high quality of decisions

Page 48: FUNCTION OF RIVAL SIMILARITY IN A COGNITIVE DATA ANALYSIS

Conclusion

FRiS-function:

1. Provides an effective measure of similarity, informativeness and compactness

2. Provides unification of methods

3. Provides high quality of decisions

Publications: http://math.nsc.ru/~wwwzag

Page 49: FUNCTION OF RIVAL SIMILARITY IN A COGNITIVE DATA ANALYSIS

Thank you!

• Questions, please?