
Selective Gaussian Naive Bayes Model for Diffuse Large-B-Cell Lymphoma Classification: Some Improvements in Preprocessing and Variable Elimination

Barcelona, July 2005. Andrés Cano, F. Javier García, Andrés Masegosa and Serafín Moral

Dept. Computer Science and Artificial Intelligence

University of Granada


Introduction: Gene Expression Data

MicroArray: a biochip that measures the expression level of thousands of genes in only one experiment.

It is a micromatrix.

Each row contains genetic material of a given gene.

Each tumoral pattern is put in each column and hybridized (each cell is colored).

The micromatrix is scanned and the data are obtained.

[Figure: hybridization of the Lymphochip.]


Diffuse Large-B-Cell Lymphoma Classification

60% of patients with Diffuse Large-B-Cell Lymphoma (DLBCL) succumb to this disease.

Alizadeh et al. (2000) discovered, using the Lymphochip, that DLBCL comprises two different diseases: GCB, with a high survival index, and ABC, with a low survival index.

They provide a data set with 42 cases, 21 of GCB and 21 of ABC, each one with the measured expression level of 4096 genes.

Using this sort of data set, the problem is to:

Build an automatic classifier for the prediction of the subtype of a DLBCL pattern.

Find a minimum subset of genes that makes this classification possible.


Bayesian Classification of gene expression

Data domain: the classifier can work either on the continuous expression levels or on discretized data, with a conditional probability table per gene. Example of a discretized table for one gene:

  p(X1|C)   C1    C2
  x1        0.3   0.8
  x2        0.7   0.2

Data dependences:

[Figure: Naive Bayes structure: the class C is the only parent of the genes X1, X2, X3.]

[Figure: TAN structure: the class C plus a tree of dependences among the genes X1, X2, X3.]
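The slide contrasts the discretized, table-based model with one for continuous data; the Gaussian variant in the title keeps the expression levels continuous and gives every gene a class-conditional normal density. A minimal sketch of that idea in Python (not the authors' code; the toy data at the end is invented for illustration):

import numpy as np

class GaussianNaiveBayes:
    """Naive Bayes with one Gaussian per (class, gene): p(x_i | c) = N(mu_ci, var_ci)."""

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.priors_ = np.array([np.mean(y == c) for c in self.classes_])
        self.mu_ = np.array([X[y == c].mean(axis=0) for c in self.classes_])
        self.var_ = np.array([X[y == c].var(axis=0) + 1e-9 for c in self.classes_])
        return self

    def predict(self, X):
        # Posterior (up to a constant): log p(c) + sum_i log N(x_i | mu_ci, var_ci).
        log_post = np.log(self.priors_) + np.stack(
            [-0.5 * (np.log(2 * np.pi * v) + (X - m) ** 2 / v).sum(axis=1)
             for m, v in zip(self.mu_, self.var_)], axis=1)
        return self.classes_[np.argmax(log_post, axis=1)]

# Toy example: 6 samples x 3 genes, two subtypes.
X = np.array([[0.1, 1.2, -0.3], [0.2, 1.1, -0.1], [0.0, 1.3, -0.2],
              [1.1, -0.2, 0.9], [1.3, -0.1, 1.1], [0.9, -0.3, 1.0]])
y = np.array(["GCB", "GCB", "GCB", "ABC", "ABC", "ABC"])
print(GaussianNaiveBayes().fit(X, y).predict(X))   # expect the training labels back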

Feature Selection with Gene Expression Data

These data sets have:

High dimensionality: between 4000 and 20000 genes.

Low number of cases: between 40 and 200 cases.

FSS problems:

High risk of overfitting.

Low reliability of the results.

Solutions:

Filter methods: select the best features using a reasonable criterion that is independent of the classifier. Advantage: very efficient. Problem: the criterion is not associated to the problem.

Wrapper methods: for each subset of features, try to solve the problem and keep the subset that is best under this final criterion. Advantage: very powerful. Problem: very time consuming.

Filter method + wrapper method: combine both approaches.
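To make the filter/wrapper distinction concrete: a filter scores each gene with a criterion that is independent of the classifier, while a wrapper scores a whole candidate subset by the estimated accuracy of the classifier itself. A hedged sketch using scikit-learn (the ANOVA F statistic and leave-one-out accuracy are illustrative choices, not necessarily the criteria used in this work); the "filter + wrapper" combination then keeps only the top filter-ranked genes and runs the wrapper search on that reduced pool:

import numpy as np
from sklearn.feature_selection import f_classif            # filter criterion (ANOVA F)
from sklearn.naive_bayes import GaussianNB                  # classifier wrapped by the wrapper
from sklearn.model_selection import cross_val_score, LeaveOneOut

def filter_scores(X, y):
    """Filter view: one cheap, classifier-independent score per gene."""
    f_stat, _ = f_classif(X, y)
    return f_stat

def wrapper_score(X, y, subset):
    """Wrapper view: estimated accuracy of the classifier on a candidate gene subset."""
    return cross_val_score(GaussianNB(), X[:, list(subset)], y, cv=LeaveOneOut()).mean()

def filter_reduce(X, y, n_keep=50):
    """Filter + wrapper: keep the n_keep best-ranked genes before the wrapper search."""
    return np.argsort(filter_scores(X, y))[::-1][:n_keep]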

Preordering the features

FSS search (greedy wrapper forward selection over the non-selected features):

Step 1: search in the non-selected features; each candidate is added to the current naive Bayes model and evaluated (e.g. X1: accuracy 83%, X2: 89%, X8: 84%).

Step 2: select the best node (X3 joins the model, accuracy 91%).

Step 3: follow the search over the remaining features (e.g. X1: 88%, X2: 91%), until the stop condition (final model with X3, X7 and X5, accuracy 93%).

Changes proposed:

Introduction of a preorder over the features: a filter preorder or an accuracy preorder.

Limit of the search space to the N first features of the preorder.
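The two preorders named on the slide can be read as two rankings of the individual genes: a filter preorder ranks them by an independent score, and an accuracy preorder ranks them by how well each single-gene classifier already predicts the class. The exact criteria are not spelled out in the slides, so the following sketch is an assumption (ANOVA F for the filter preorder, leave-one-out accuracy of a one-gene Gaussian naive Bayes for the accuracy preorder):

import numpy as np
from sklearn.feature_selection import f_classif
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import cross_val_score, LeaveOneOut

def filter_preorder(X, y):
    """Filter preorder: genes sorted by a classifier-independent score, best first."""
    return np.argsort(f_classif(X, y)[0])[::-1]

def accuracy_preorder(X, y):
    """Accuracy preorder: genes sorted by the leave-one-out accuracy of the
    single-gene Gaussian naive Bayes classifier, best first."""
    acc = [cross_val_score(GaussianNB(), X[:, [g]], y, cv=LeaveOneOut()).mean()
           for g in range(X.shape[1])]
    return np.argsort(acc)[::-1]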

Preordering the features

Limited FSS search:

The features are preordered (X3, X5, X7, X1, X2, X6, ...) and only the N first non-selected features of the preorder form the limited search space.

Step 1: search in the non-selected features of the limited search space (e.g. X3: accuracy 88%, X5: 88%, X7: 84%).

Step 2: select the best node (X3 joins the model, accuracy 88%).

Step 3: follow the search; the window moves along the preorder (e.g. X5: 89%, X7: 87%), until the stop condition (final model with X3, X1 and X5, accuracy 95%).
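A possible sketch of the limited forward selection itself: at every step only the first few not-yet-selected genes of the preorder are evaluated, and the search stops when no candidate improves the current accuracy (the window size and the stop condition are assumptions, not taken from the slides):

import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import cross_val_score, LeaveOneOut

def limited_forward_selection(X, y, preorder, window=10):
    """Greedy wrapper forward selection restricted to a moving window of the preorder."""
    def loo_acc(subset):
        return cross_val_score(GaussianNB(), X[:, subset], y, cv=LeaveOneOut()).mean()

    remaining = list(preorder)
    selected, best_acc = [], 0.0
    while remaining:
        candidates = remaining[:window]                      # limited search space
        scores = [loo_acc(selected + [g]) for g in candidates]
        best = int(np.argmax(scores))
        if scores[best] <= best_acc:                         # assumed stop condition
            break
        best_acc = scores[best]
        selected.append(candidates[best])                    # best node joins the model
        remaining.remove(candidates[best])
    return selected, best_acc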

Irrelevant Variable Elimination

Heuristic for irrelevant features:

[Figure: on the train set, the cases right-classified and non-right-classified by the classifier built with the selected features X = (X1, X2, X3) are compared with those of the single-variable classifiers built on candidates Y and Z; Y is irrelevant with respect to X, while Z is not irrelevant with respect to X.]
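The slide only shows the heuristic as a picture, so the concrete rule below is an assumption consistent with it: a candidate gene is taken as irrelevant with respect to the selected subset when its single-gene classifier correctly classifies no train case that the selected-subset classifier misses.

import numpy as np
from sklearn.naive_bayes import GaussianNB

def right_classified(X, y, subset):
    """Mask of train cases correctly classified by a Gaussian NB built on `subset`."""
    model = GaussianNB().fit(X[:, list(subset)], y)
    return model.predict(X[:, list(subset)]) == y

def is_irrelevant(X, y, candidate, selected):
    """Assumed reading of the heuristic: `candidate` adds nothing if every case its
    single-gene classifier gets right is already right-classified by `selected`."""
    missed_by_selected = ~right_classified(X, y, selected)
    return not np.any(right_classified(X, y, [candidate]) & missed_by_selected)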

Limited Search with Variable Elimination

Wrapper search with variable elimination:

Step 1: search in the non-selected features of the limited search space (X3, X5, X7, ...).

Step 2: select the best node (X3).

Step 3: eliminate the features of the search space that are irrelevant with respect to X3; they leave the limited search space.

Step 4: follow the search; the window is refilled with the next features of the preorder (X7, X2, X6, X8, X4, X9, ...), until the stop condition.

Final state: the final subset selected (X3, X6, X4), the irrelevant features eliminated along the way (X5, X1, X8, X7), and the remaining not-selected features (X9, X10, ...).
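Putting the pieces together, LFSS-VE (the limited search with variable elimination evaluated in the next slides) can be sketched as the limited forward selection above with one extra step: after a gene is added, candidates that are irrelevant with respect to the current subset are dropped from the pool. The window size, stop condition and irrelevance test are assumptions; this is an illustration, not the authors' implementation:

import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import cross_val_score, LeaveOneOut

def lfss_ve(X, y, preorder, window=10):
    """Limited forward selection with variable elimination (illustrative sketch)."""
    def loo_acc(subset):
        return cross_val_score(GaussianNB(), X[:, subset], y, cv=LeaveOneOut()).mean()

    def right_mask(subset):
        model = GaussianNB().fit(X[:, subset], y)
        return model.predict(X[:, subset]) == y

    remaining = list(preorder)
    selected, best_acc = [], 0.0
    while remaining:
        candidates = remaining[:window]                        # Step 1: limited search space
        scores = [loo_acc(selected + [g]) for g in candidates]
        best = int(np.argmax(scores))
        if scores[best] <= best_acc:                           # assumed stop condition
            break
        best_acc = scores[best]
        selected.append(candidates[best])                      # Step 2: best node joins
        remaining.remove(candidates[best])
        missed = ~right_mask(selected)                         # cases the model still misses
        # Step 3: variable elimination -- drop candidates irrelevant w.r.t. the subset
        remaining = [g for g in remaining if np.any(right_mask([g]) & missed)]
    return selected, best_acc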

Classifying Diffuse Large-B-Cell Lymphoma

Data Base I: taken from the work of Alizadeh et al. (2000).

42 samples (21 GCB + 21 ABC).

348 genes.

Validation scheme: leave-one-out validation.

Data Base II: taken from the work of Wright et al. (2004).

217 samples (134 GCB + 83 ABC).

8503 genes.

Validation scheme:

10 train and test sets of equal size.

Each train set is reduced by a filter method.

Number of filtered genes: 78.7 ± 4.4.

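A small sketch of the two validation schemes described above (the fit_and_predict callable is a placeholder for whatever selective classifier is being validated; the 50/50 split size and the stratification are assumptions):

import numpy as np
from sklearn.model_selection import LeaveOneOut, train_test_split

def loo_accuracy(X, y, fit_and_predict):
    """Data Base I scheme: leave-one-out over the samples."""
    hits = 0
    for train_idx, test_idx in LeaveOneOut().split(X):
        pred = fit_and_predict(X[train_idx], y[train_idx], X[test_idx])
        hits += int(pred[0] == y[test_idx][0])
    return hits / len(y)

def ten_equal_splits(y, seed=0):
    """Data Base II scheme: 10 random train/test partitions of equal size.
    (The filter that reduces each train set to ~79 genes is applied per split.)"""
    rng = np.random.RandomState(seed)
    return [train_test_split(np.arange(len(y)), test_size=0.5, stratify=y,
                             random_state=int(rng.randint(10**6)))
            for _ in range(10)]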

Experimental Results I

Feature preorder:

Data Base  Data       Random Preorder  Filter Preorder  Accuracy Preorder
DB1        Accuracy   80.9 ± 4.9       81.0 ± 4.9       92.8 ± 2.1
DB1        No. Genes  4.3 ± 0.5        3.2 ± 0.1        3.8 ± 0.5
DB2        Accuracy   88.9 ± 0.6       91.0 ± 0.4       89.1 ± 0.5
DB2        No. Genes  8.0 ± 3.2        9.0 ± 5.1        7.6 ± 4.0

Preorder limit:

Data Base  Data       LFSS
DB1        Accuracy   92.8 ± 2.1
DB1        No. Genes  3.8 ± 0.3
DB2        Accuracy   91.8 ± 0.4
DB2        No. Genes  7.8 ± 3.0

Experimental Results II

Elimination of irrelevant features:

Data Base  Data       LFSS-VE     LFSS        FSS
DB1        Accuracy   95.2 ± 1.4  92.8 ± 2.1  80.9 ± 4.9
DB1        No. Genes  5.4 ± 0.1   3.8 ± 0.3   4.3 ± 0.5
DB1        No. Eval   1882        2840        74900
DB2        Accuracy   93.0 ± 0.4  91.8 ± 0.4  88.9 ± 0.6
DB2        No. Genes  8.1 ± 5.6   7.8 ± 3.0   8.0 ± 3.2
DB2        No. Eval   1018        1080        8002

Filter preorder vs accuracy preorder:

Data Base  Data       LFSS-VE (Filter Preorder)  LFSS-VE (Accuracy Preorder)
DB1        Accuracy   88.1 ± 3.3                 95.2 ± 1.4
DB1        No. Genes  3.9 ± 0.1                  5.4 ± 0.1
DB2        Accuracy   90.7 ± 0.5                 93.0 ± 0.4
DB2        No. Genes  7.6 ± 2.7                  8.1 ± 5.6

Experimental Results III

Results comparison on the test data sets:

Wright et al. classifier (validated in one partition, 27 genes selected):

True class   Predicted ABC   Predicted GCB   Unclass.
ABC          38              1               2
GCB          2               57              8

Wrapper + Abduction (validated in 10 partitions, 7.0 genes selected on average):

True class   Predicted ABC   Predicted GCB   Unclass.
ABC          32.7            3.5             4.8
GCB          3.2             58.8            5.0

LFSS-VE (validated in 10 partitions, 8.1 genes selected on average):

True class   Predicted ABC   Predicted GCB   Unclass.
ABC          32.7            1.3             7.0
GCB          1.7             57.4            7.9

Conclusions and Future Work

The wrapper technique is a powerful method for supervised classification tasks.

Its main disadvantage is its high computational cost, in particular on gene expression data bases, due to their high dimensionality.

LFSS-VE overcomes these disadvantages in the DLBCL classification by using a preordering of the features and a limited search space.

The elimination of irrelevant features is a good way to enhance the performance of a wrapper method.

The future line of work is the validation of our model with other data sets: breast cancer, colon cancer, leukemia, ...
