geneCBR: a case-based reasoning tool for cancer diagnosis using microarray datasets

Post on 21-Jan-2016

Page 1: geneCBR a case-base reasoning tool for cancer diagnosis using microarray datasets

geneCBR: a case-based reasoning tool for cancer diagnosis using microarray datasets

Dr. Florentino Fdez-Riverola, University of Vigo

Computer System of New Generation

Page 2: geneCBR a case-base reasoning tool for cancer diagnosis using microarray datasets

Microarrays Bioinformatics-AI CBR systems geneCBR Demo

Outline

DNA microarray technology: characteristics and model operation overview
Bioinformatics and AI: new challenges and emerging research areas
CBR systems: case-based reasoning
GENE-CBR: human genome analysis using CBR systems
Demo: geneCBR in action, cancer diagnosis using microarrays

Page 3: geneCBR a case-base reasoning tool for cancer diagnosis using microarray datasets

Microarrays: characteristics

microarrays are silicon chips that can measure the expression levels of thousands of genes simultaneously

microarrays are based on a database of over 40,000 gene fragments called expressed sequence tags (ESTs)

for the first time, they allow us to obtain a “global” view of the cells belonging to:
• different individuals
• different time intervals for the same individual
• different tissues of the same individual

gene expression profiles can be used as inputs to large-scale data analysis:
• as fingerprints to build more accurate molecular classifications
• to discover hidden taxonomies
• to increase our understanding of normal and disease states

Page 4: geneCBR a case-base reasoning tool for cancer diagnosis using microarray datasets

Microarrays: model operation overview

how does the chip work?

microarray chips incorporate different dyed genes tiled in a grid-like fashion

the individual’s DNA to be analysed is dyed with a different colour

both sets of labelled DNA strands are allowed to hybridize (bind)

hybridization events are detected by identifying fluorescence changes in the strands of DNA

a scanner and its associated software perform various forms of image analysis to measure and report raw gene expression values

the scanned intensities show how active the genes represented by the ESTs are in the cell:
• strong fluorescence indicates that the gene is very active in the cell
• no fluorescence indicates that the gene is inactive in the cell

[Figure: scanner, preprocessing, microarray data file]

Page 5: geneCBR a case-base reasoning tool for cancer diagnosis using microarray datasets

Available data

bone marrow samples from 43 adult patients with Acute Myeloid Leukemia (AML) plus 6 healthy individuals:
• 10 patients with Acute Promyelocytic Leukemia [APL]
• 4 patients with Acute Myeloid Leukemia with inv(16) [AML-inv(16)]
• 7 patients with Acute Monocytic Leukemia [AML-mono]
• 22 patients with Acute non-Monocytic Leukemia [AML-other]
• 6 samples belonging to healthy individuals [control samples]

volume of information processed:
• each microarray contains 22,283 ESTs (genes)
• 49 microarrays = 1,091,867 gene expression values

data available today:
• 150 microarrays (Human Genome 133A) + 210 microarrays (Human Genome - plus)
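
The volume figure on this slide follows directly from the cohort sizes. A quick check, using only the counts quoted above (the variable names are illustrative):

```python
# Cohort sizes as quoted on the slide; the dictionary keys are just labels.
samples_per_class = {
    "APL": 10,
    "AML-inv(16)": 4,
    "AML-mono": 7,
    "AML-other": 22,
    "control": 6,
}
ESTS_PER_MICROARRAY = 22_283

# One microarray per sample, so the total is the sum over all classes.
n_microarrays = sum(samples_per_class.values())
n_patients = n_microarrays - samples_per_class["control"]
n_expression_values = n_microarrays * ESTS_PER_MICROARRAY

print(n_patients, n_microarrays, n_expression_values)
```

The ratio is the important part: roughly 22,000 attributes per sample against only 49 samples.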

Page 6: geneCBR a case-base reasoning tool for cancer diagnosis using microarray datasets

Challenges for microarray data mining

three main types of data analysis needed for biomedical applications:

gene selection (attribute selection in AI)
• finding the genes most strongly related to a particular class

classification (supervised classification in AI)
• classifying diseases or predicting outcomes based on gene expression patterns, and perhaps even identifying the best treatment for a given genetic signature

clustering (unsupervised classification in AI)
• finding new biological classes or refining existing ones

three parallel research areas:
• convenient visualization of experiments and results
• discovery of biological knowledge (metabolic pathways, etc.)
• low-level analysis providing better readouts (preprocessing, normalization, etc.)

Page 7: geneCBR a case-base reasoning tool for cancer diagnosis using microarray datasets

Problems with existing data

the analysis of microarrays presents a number of unique challenges for machine learning and data mining techniques; the technology's capacity for generating enormous amounts of data is also a handicap:

great amount of data belonging to each individual (thousands of genes)
• efficiency and memory problems

lack of initial knowledge
• what is the significance level of each gene?

given the difficulty of collecting microarray samples, the number of samples is likely to remain small in many interesting cases

having so many fields relative to so few samples creates a high likelihood of finding false positives

these problems are compounded by the potential errors present in microarray data (systematic and random errors)

sophisticated data analysis techniques and robust methods are required, capable of extracting biologically meaningful knowledge from the raw data
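
The false-positive risk can be made concrete with a rough calculation. This is only an illustration, not a method from the talk: the 0.05 significance level is an assumption, and the Bonferroni correction is shown as one standard conservative remedy.

```python
N_GENES = 22_283   # ESTs per microarray, as quoted earlier in the talk
ALPHA = 0.05       # assumed per-gene significance level (not from the talk)

# If no gene were truly related to the class, about ALPHA * N_GENES genes
# would still pass an uncorrected per-gene test purely by chance.
expected_false_positives = ALPHA * N_GENES

# Bonferroni correction: a conservative per-gene threshold that keeps the
# chance of even one false positive across all genes at or below ALPHA.
bonferroni_threshold = ALPHA / N_GENES

print(round(expected_false_positives), bonferroni_threshold)
```

With these assumed numbers, over a thousand genes would look significant by chance alone, which is why gene selection has to be handled carefully.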

Page 8: geneCBR a case-base reasoning tool for cancer diagnosis using microarray datasets

CBR systems (Case-Based Reasoning)

a problem-solving paradigm in AI that can be viewed as a methodology for reasoning and learning. Kolodner (1983a, 1983b)

“reasoning by re-using past cases is a powerful and frequently applied way to solve problems for humans” Joh (1997)

the memory of the system (case base) stores a certain number of previously experienced situations

CASE = PROBLEM description + applied SOLUTION [ + RESULT ]

a new problem is solved by finding similar past cases and reusing them in the new problem situation. Riesbeck et al. (1989)

4 cyclical steps are performed when it is necessary to solve a new problem. Kolodner (1993); Aamodt and Plaza (1994); Watson (1997)

case-based reasoning is, in effect, a cyclic and integrated process of solving a problem, learning from this experience, solving a new problem, and so on...

Page 9: geneCBR a case-base reasoning tool for cancer diagnosis using microarray datasets

The CBR cycle

(1) RETRIEVE: retrieving one or more previously experienced cases
(2) REUSE: reusing the case(s) in one way or another
(3) REVISE: revising the solution based on reusing a previous case(s)
(4) RETAIN: retaining the new experience by incorporating it into the existing knowledge base (case base)

[Figure: the CBR cycle. Labels: new problem, case base (memory), most similar cases, proposed solution, confirmed solution]
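
The four steps can be sketched as a small loop. This is a toy illustration under stated assumptions, not geneCBR's implementation: the class and method names are hypothetical, retrieval is plain Euclidean nearest neighbours, and reuse is a majority vote.

```python
from dataclasses import dataclass
from math import dist


@dataclass
class Case:
    problem: tuple   # problem description (a numeric feature vector)
    solution: str    # applied solution (here, a class label)


class CaseBase:
    """Toy case base; all names here are hypothetical."""

    def __init__(self, cases):
        self.cases = list(cases)

    def retrieve(self, problem, k=3):
        # (1) RETRIEVE: the k stored cases closest to the new problem.
        return sorted(self.cases, key=lambda c: dist(c.problem, problem))[:k]

    def reuse(self, retrieved):
        # (2) REUSE: propose the most common solution among retrieved cases.
        solutions = [c.solution for c in retrieved]
        return max(set(solutions), key=solutions.count)

    def revise(self, proposed, expert_feedback=None):
        # (3) REVISE: an expert may confirm or override the proposal.
        return expert_feedback if expert_feedback is not None else proposed

    def retain(self, problem, confirmed):
        # (4) RETAIN: store the solved problem as a new case.
        self.cases.append(Case(tuple(problem), confirmed))

    def solve(self, problem, expert_feedback=None):
        retrieved = self.retrieve(problem)
        proposed = self.reuse(retrieved)
        confirmed = self.revise(proposed, expert_feedback)
        self.retain(problem, confirmed)
        return confirmed
```

Each call to `solve` grows the case base by one case, which is the sense in which the system learns from every problem it solves.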

Page 10: geneCBR a case-base reasoning tool for cancer diagnosis using microarray datasets

Main characteristics of CBR systems

adaptive and dynamic systems: the number of cases stored in the memory of the model changes, allowing the system to adapt to new situations

CBR systems allow general knowledge to be used in solving a particular problem

CBR systems facilitate the indexing of the available information

CBR systems can use incomplete cases

CBR systems are aware of their limitations (perhaps a problem has no solution)

CBR systems facilitate the use of representative and flexible data structures

case adaptation helps to discover interconnections and hidden structures in the available data

CBR systems can be completely automated

Page 11: geneCBR a case-base reasoning tool for cancer diagnosis using microarray datasets

GENE-CBR

Page 12: geneCBR a case-base reasoning tool for cancer diagnosis using microarray datasets

Goals

objectives:
• develop an effective and reliable system able to diagnose cancer subtypes based on the analysis of microarray data
• implement a flexible tool for designing and testing new techniques and experiments
• construct an advanced editing module for run-time modification of coded techniques (BeanShell programmer interface)

GENE-CBR: a CBR (Case-Based Reasoning) system
“Solve new problems (new patient) based on previous experience (diagnosed patients)”

uses AI techniques: selection, clustering, inference…

users: doctor, research group, programmer

Page 13: geneCBR a case-base reasoning tool for cancer diagnosis using microarray datasets

Logic architecture

[Figure: logic architecture. Modes: Diagnostic Mode (diagnosing), Expert Mode (testing techniques), Programming Mode (BeanShell). Users: doctor, research group, programmer. Core: CBR (JCBR) running [1] RETRIEVE, [2] REUSE, [3] REVISE, [4] RETAIN over the case base. Other labels: wizard, DFP, GCS]

Page 14: geneCBR a case-base reasoning tool for cancer diagnosis using microarray datasets

Model overview

[Figure: gene-CBR model overview. Stages: Gene Selection, Clustering, Prediction, Knowledge Discovery. Outputs: most relevant genes (= DFP), genetically similar patients, initial prediction, revised prediction and final diagnostic, reclassification]

Page 15: geneCBR a case-base reasoning tool for cancer diagnosis using microarray datasets

GENE-CBR :: [i] retrieval

objectives:
perform gene selection without losing information
• extracting simplified fuzzy patterns (FP) for each pathology
make it possible to use AI techniques that were initially discarded

main phases:
supervised fuzzy discretisation of gene expression values
• Low, Medium, High and overlapping labels (LM, MH)
supervised gene selection for each pathology

advantages:
• independence from the ordering of the data
• takes data variability into account
• allows new knowledge to be discovered
• the results obtained are interpretable
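
The discretisation into Low, Medium, High and the overlapping labels can be sketched as follows. The membership shapes, the anchor points `low`, `mid`, `high` and the 0.4 threshold are all assumptions made for illustration; the talk does not give them, and in the tool they are obtained in a supervised way.

```python
def memberships(value, low, mid, high):
    """Membership degrees in Low, Medium and High for one expression value.

    low < mid < high are hypothetical anchor points for one gene. Low and
    High are shoulder-shaped; Medium is triangular and peaks at mid.
    """
    if value <= low:
        return {"L": 1.0, "M": 0.0, "H": 0.0}
    if value >= high:
        return {"L": 0.0, "M": 0.0, "H": 1.0}
    if value <= mid:
        m = (value - low) / (mid - low)
        return {"L": 1.0 - m, "M": m, "H": 0.0}
    h = (value - mid) / (high - mid)
    return {"L": 0.0, "M": 1.0 - h, "H": h}


def fuzzy_label(value, low, mid, high, threshold=0.4):
    """Return L, M, H, or an overlapping label (LM, MH).

    Adjacent memberships always sum to 1, so with a threshold of at most
    0.5 at least one label is active, and L and H are never active together.
    """
    degrees = memberships(value, low, mid, high)
    active = [name for name in ("L", "M", "H") if degrees[name] >= threshold]
    return "".join(active)


# Hypothetical anchors for one gene: 100 (low), 500 (mid), 900 (high).
for value in (150, 300, 500, 700, 1000):
    print(value, fuzzy_label(value, 100, 500, 900))
```

Values near a boundary receive an overlapping label instead of being forced into one class, which is one way to take data variability into account.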

Page 16: geneCBR a case-base reasoning tool for cancer diagnosis using microarray datasets

GENE-CBR :: [i] retrieval

[Figure: classes healthy, APL (Acute Promyelocytic Leukemia), AML-inv(), AML-mono, AML-other]

Page 17: geneCBR a case-base reasoning tool for cancer diagnosis using microarray datasets

GENE-CBR :: [i] retrieval

[Figure: classes healthy, APL, AML-inv(), AML-mono, AML-other; one fuzzy pattern per class (FP_healthy, FP_APL, FP_AML-inv(), FP_AML-monocytic, FP_AML-other); DFP]

Page 18: geneCBR a case-base reasoning tool for cancer diagnosis using microarray datasets

GENE-CBR :: [ii] reuse

objectives:
unsupervised identification of genetic similarities between patients
• taking into account only the previously selected genes (DFP)

main phases:
training a DFP-dimensional GCS network
• Growing Cell Structures. Fritzke, B. (1993)
presenting the new patient to the network
classifying using a proportional weighted voting scheme

advantages:
• clustering without taking the patient class into account
• definition of an indexing and similarity structure between nodes (relating patients)
• generation of clusters containing new, unknown cancer subtypes (knowledge discovery)
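
The voting step can be sketched on its own. This does not implement a GCS network: it assumes the neighbours (for example, the patients sharing a cluster with the new patient) have already been found, and the weight 1 / (1 + distance) is a hypothetical choice of similarity.

```python
from collections import defaultdict
from math import dist


def weighted_vote(new_patient, neighbours):
    """Classify by similarity-weighted voting.

    neighbours: non-empty list of (expression_vector, class_label) pairs.
    Each neighbour votes for its own class with weight 1 / (1 + distance),
    so closer patients count for more.
    """
    if not neighbours:
        raise ValueError("at least one neighbour is required")
    votes = defaultdict(float)
    for vector, label in neighbours:
        votes[label] += 1.0 / (1.0 + dist(vector, new_patient))
    total = sum(votes.values())
    # Normalise so the weights read as proportions of the total vote.
    shares = {label: weight / total for label, weight in votes.items()}
    winner = max(shares, key=shares.get)
    return winner, shares


# Made-up two-gene vectors: three AML-inv() neighbours, one AML-other.
neighbours = [
    ((1.0, 1.0), "AML-inv()"),
    ((1.2, 0.9), "AML-inv()"),
    ((0.8, 1.1), "AML-inv()"),
    ((3.0, 3.0), "AML-other"),
]
winner, shares = weighted_vote((1.0, 1.0), neighbours)
print(winner, shares)
```

Returning the shares alongside the winner lets a doctor see how decisive the vote was, not just its outcome.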

Page 19: geneCBR a case-base reasoning tool for cancer diagnosis using microarray datasets

GENE-CBR :: [ii] reuse

[Figure: table with columns PAT., gene expression values, DFP, CLASS; a new patient (?) shown next to patients labelled AML-inv(), AML-inv(), AML-inv() and AML-other on a scale from more similarity (+) to less similarity (-); assigned class AML-inv()]

Page 20: geneCBR a case-base reasoning tool for cancer diagnosis using microarray datasets

GENE-CBR :: [iii] revise

objectives:
provide doctors with meaningful information about the classification carried out by the system
help in discovering new knowledge
• if-then rules as a decision-making support mechanism

information supplied:
identification of similar patients (from a genetic point of view)
proportional weighted voting and assigned weights
rule generation using See5. Quinlan, J.R. (2000)
• DFP genes belonging to the set of patients retrieved by the GCS network

advantages:
• doctors can supervise the final decision proposed by the system
• new knowledge generation in the form of easily understandable rules

Page 21: geneCBR a case-base reasoning tool for cancer diagnosis using microarray datasets

GENE-CBR :: [iii] revise

[Figure: retrieved patients labelled AML-inv(), AML-inv(), AML-inv() and AML-other; proposed class AML-inv(); panels: KARYOTYPE, BIOLOGICAL AND CLINICAL CHARACTERISTICS]

Rule 6: (45 / 4, lift 1.1)
If X65962 (AFFX-HSAC07/X00351_5_at) is LOW then
    If U96781 (AFFX-BioDn-3_at) is LOW-MEDIUM then AML-other
    Else If D87845 (AFFX-hum_alu_at) is HIGH then AML-inv() [0.968]
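
Rule 6 can be read as a small function over fuzzy labels. The nesting used here (the second and third conditions sit inside the first) is one interpretation of the slide layout, and the function name and the dictionary input format are hypothetical.

```python
def rule_6(labels):
    """Apply Rule 6 to one patient.

    labels maps a gene accession (e.g. "X65962") to its fuzzy label
    ("LOW", "LOW-MEDIUM", "MEDIUM", "MEDIUM-HIGH" or "HIGH").
    Returns the predicted class, or None when the rule does not fire.
    """
    if labels.get("X65962") == "LOW":
        if labels.get("U96781") == "LOW-MEDIUM":
            return "AML-other"
        if labels.get("D87845") == "HIGH":
            return "AML-inv()"
    return None


print(rule_6({"X65962": "LOW", "U96781": "HIGH", "D87845": "HIGH"}))
```

Because the rule only names three genes and their labels, a doctor can check it against the patient's data by hand.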

Page 22: geneCBR a case-base reasoning tool for cancer diagnosis using microarray datasets

GENE-CBR :: [iv] retain

objectives:
feed new knowledge back into the system
• new subclassification of existing cancer pathologies
• reclassification of existing patients
• identification of correlated genes
• discovery of new markers able to distinguish new pathologies
• identification of prototypical patients and rare cases

main phases:
• update the case base with a new microarray every time a new classification is generated
• modification of the parameters of the model

advantages:
• possibility of easily integrating new biological knowledge into the hybrid system

Page 23: geneCBR a case-base reasoning tool for cancer diagnosis using microarray datasets

Applied technologies

100% Java
• Swing
• BeanShell
• Log4j
• JFreeChart

Design patterns
• Action
• Future
• MVC
• Singleton
• Wizard

Unified Modeling Language
• Poseidon for UML

Page 24: geneCBR a case-base reasoning tool for cancer diagnosis using microarray datasets

Future work

moving to a plug-in architecture
• designing a core where each technique is implemented as a plug-in => aiBENCH

implementing k-fold cross-validation
• automatic generation of multiple training and test cases

supporting standard microarray data formats
• MIAME: Minimum Information About a Microarray Experiment

deploying GENE-CBR with Java Web Start
• remote and automatic access to the latest versions of the GENE-CBR project

on-line access to genetic sequence databases
• GenBank (http://www.ncbi.nlm.nih.gov/Genbank)
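
The automatic generation of training and test cases mentioned under cross-validation could look like the sketch below. It is plain, unstratified k-fold splitting over sample indices; the function name and parameters are hypothetical and nothing here is taken from GENE-CBR.

```python
import random


def k_fold_splits(n_samples, k=5, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    # Shuffle with a fixed seed so the same splits can be regenerated.
    random.Random(seed).shuffle(indices)
    # Deal indices round-robin so fold sizes differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# 49 microarrays split into 7 folds of 7 samples each.
for train, test in k_fold_splits(49, k=7):
    print(len(train), len(test))
```

With classes as small as 4 patients, a stratified variant that keeps every class represented in each training set would probably be preferable; this sketch does not attempt that.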

Page 25: geneCBR a case-base reasoning tool for cancer diagnosis using microarray datasets

Demo :: GENE-CBR in action
