The KEDRI Integrated System for Personalised Modelling:

Software development and experiment results

Prof. Nikola Kasabov, Dr. Raphael Hu, Gary Chen

The Knowledge Engineering and Discovery Research Institute (KEDRI), Auckland University of Technology

www.kedri.info

23/11/2011 [email protected]; [email protected]


Overview

• Introduction

• The development of new algorithms and methods for personalised modelling in KEDRI

• Software prototype demo

• Conclusion and future direction


KEDRI: The Knowledge Engineering and Discovery Research Institute at AUT

(www.kedri.info)

• Established in 2002 by Prof. Nikola Kasabov

• Focus: novel information processing methods, technologies and applications for discoveries across different areas of science

• Methods are mainly based on personalised modelling, brain information processing, evolution, genetics and quantum physics




Computational Modelling Techniques


Global, local and personalised modelling are the three main approaches to modelling and pattern discovery in machine learning [1].

Global modelling creates a single model from the data that covers the entire problem space and is represented by a single function, e.g. a regression function, an RBF network, an MLP neural network or an SVM.

Local modelling builds a set of local models from the data, each representing a sub-space (e.g. a cluster) of the whole problem space. These models can be a set of rules, a set of local regressions, etc.

Personalised modelling uses transductive reasoning to create a specific model for each single data point (e.g. a data vector or a patient record) within a localised problem space.
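The idea of building a separate model for each data point can be illustrated with a minimal sketch. It assumes the simplest possible personalised model, a majority vote over the k nearest stored samples; the function names, the toy data and the choice of Euclidean distance are illustrative assumptions, not the KEDRI implementation.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def personalised_predict(x, samples, labels, k=3):
    """Build a throwaway model for the single query vector x.

    The 'model' is just the k nearest stored samples and a majority
    vote over their labels; nothing is shared between queries.
    """
    ranked = sorted(range(len(samples)), key=lambda i: euclidean(x, samples[i]))
    neighbours = ranked[:k]
    votes = Counter(labels[i] for i in neighbours)
    return votes.most_common(1)[0][0], neighbours


# Toy data (made up for illustration): two features per record.
samples = [[0.1, 0.2], [0.2, 0.1], [0.15, 0.25], [0.9, 0.8], [0.8, 0.9], [0.85, 0.75]]
labels = ["control", "control", "control", "disease", "disease", "disease"]

label, neighbours = personalised_predict([0.82, 0.8], samples, labels, k=3)
```

A global model would be fitted once on all six records; here the neighbourhood, and therefore the model, is recomputed for every new query.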


Why Personalised Modelling?

• The problem with global modelling for prediction: a global model is derived from all available data for the target problem and then applied to any new patient, anywhere and at any time. Prediction and treatment based on global models are effective for only some patients (approx. 70%) [2].

• The rationale for personalised modelling: since each person is different, the most effective treatment can only be based on a detailed analysis of that particular patient.

• A variety of data is now available: DNA, RNA, protein expression, inheritance, disease, etc.

• The benefits of personalised models for medical applications:

– To produce better results for classification and prediction

– To create a profile for each individual

– To suggest a possible improvement scenario for the individual, where one exists

Research Objectives of Personalised Modelling

• To create accurate personalised computational models: each model is specific to one individual and uses the available information from other individuals related to the same problem.

• To develop new algorithms and methods for personalised modelling.

• To apply these algorithms and methods to data from different sources: gene expression data, protein data, SNP (single-nucleotide polymorphism) data, clinical data, etc.


The Integrated Method for Personalised Modelling (IMPM) for Data Analysis

The proposed framework and system using IMPM for biomedical data analysis [2]

[Diagram: components of the IMPM framework]

• Data Repository

• Feature selection

• Similarity measurement

• Neighbour creation

• Learning models, e.g. risk probability evaluation, disease classification, etc.

• Outcome visualisation (personalised profiling, risk probability)

• Optimisation (evolutionary computation, SNN)

Optimisation


Coevolutionary algorithm (CEA): a CEA is derived from the evolutionary algorithm. Its individuals come from two or more populations, and each individual's fitness is assigned based on its interactions with individuals from the other populations.

A sample of a simple 2-species coevolutionary model.
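A 2-species scheme of this kind can be sketched as follows. This is a cooperative variant on a made-up objective, written only to show how fitness in one population depends on a representative of the other; the objective, the mutation scheme and all parameter values are assumptions and are not taken from the IMPM software.

```python
import random


def joint_fitness(x, y):
    """Toy objective whose optimum (x = 2, y = 2) needs both species to agree."""
    return -((x - y) ** 2 + (x + y - 4) ** 2)


def evolve_species(population, score, rng, sigma=0.3):
    """One generation: keep the better half, refill with mutated copies."""
    ranked = sorted(population, key=score, reverse=True)
    parents = ranked[: len(ranked) // 2]
    children = [p + rng.gauss(0.0, sigma) for p in parents]
    return parents + children


def coevolve(generations=50, size=10, seed=0):
    rng = random.Random(seed)
    pop_x = [rng.uniform(-5, 5) for _ in range(size)]
    pop_y = [rng.uniform(-5, 5) for _ in range(size)]
    best_x, best_y = pop_x[0], pop_y[0]
    history = [joint_fitness(best_x, best_y)]
    for _ in range(generations):
        # Species X is scored against the current representative of species Y.
        pop_x = evolve_species(pop_x + [best_x], lambda x: joint_fitness(x, best_y), rng)
        best_x = max(pop_x, key=lambda x: joint_fitness(x, best_y))
        # Species Y is then scored against the updated representative of X.
        pop_y = evolve_species(pop_y + [best_y], lambda y: joint_fitness(best_x, y), rng)
        best_y = max(pop_y, key=lambda y: joint_fitness(best_x, y))
        history.append(joint_fitness(best_x, best_y))
    return best_x, best_y, history


best_x, best_y, history = coevolve()
```

Because the current representative is always added back into its population before selection, the joint fitness recorded in `history` never decreases from one generation to the next.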

Software Architecture of IMPM


An example of software architecture of ISPM

An Integrated Optimisation System for Personalised Modelling (IOSPM)

• Cross-platform – implemented in Qt, so it can be compiled on different platforms such as Microsoft Windows, Mac OS and Linux.

• Integrated – combines methods/functions written in different languages (e.g. MATLAB, Python, Java and C/C++).

• Extensible – new methods/functions can be plugged in by editing the system schema, from which the GUI is generated dynamically.

[Diagram: an overview of the IOSPM system]

• Main GUI (UI): Select Data file(), Select Data pre-processing(), Select Modelling Method(), Select Optimisation Method()

• Data Loading, Data Pre-processing, PMOptimisation

• Spiking Neural Network, LibSVM, K-Nearest Neighbor, DENFIS, WKNN/WWKNN

• Visualisation GUI (UI): Select Visualisation Mode(), Create Results Report(); Visualisation Mode 2; Data Report Generator

• Qt XML GUI, MATLAB executable package, OpenGL, Python package and C++ code, connected through Interface 1 and Interface 2

An overview of the IOSPM system



Implementation of ISPM

An exemplar content of the modules is given below:

• Module for Neighbourhood Creation: Euclidean distance method; Hamming distance method; Cosine distance method; Kernel distance methods; other methods.

• Module for Classification/Prediction:

– Classification methods, such as: MLR, MLP, ECF, wkNN, wwkNN, TWNFI, SVM, eSNN.

– Probability prediction methods, such as: DENFIS, TWNFI.

• Module for Optimisation: evolutionary computation (EC), quantum-inspired evolutionary algorithm, particle swarm optimisation (PSO), quantum-inspired PSO, other methods.

• Module for Task Distribution Centre: this module controls the whole optimisation process, communicates with the user and visualises the results.
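Three of the distance methods named for neighbourhood creation, together with a distance-weighted nearest-neighbour vote, can be sketched as follows. The slides do not give the weighting formula used by wkNN, so the inverse-distance weight here is an assumption, as are all function names.

```python
import math


def euclidean_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def hamming_distance(a, b):
    """Number of positions at which the two vectors differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


def cosine_distance(a, b):
    """1 - cosine similarity; assumes neither vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def weighted_knn(x, samples, labels, k=3, distance=euclidean_distance, eps=1e-9):
    """Distance-weighted k-nearest-neighbour vote (inverse-distance weights)."""
    nearest = sorted(zip(samples, labels), key=lambda pair: distance(x, pair[0]))[:k]
    scores = {}
    for sample, label in nearest:
        weight = 1.0 / (distance(x, sample) + eps)
        scores[label] = scores.get(label, 0.0) + weight
    return max(scores, key=scores.get)


# One very close neighbour of class "a" outweighs two distant neighbours of class "b".
prediction = weighted_knn([0.0], [[0.1], [2.0], [2.1]], ["a", "b", "b"], k=3)
```

Passing the distance as a parameter mirrors the modular design described above: the neighbourhood module can be swapped without touching the classifier.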


Global Modelling vs. Personalised Modelling

Colon cancer gene expression data

Model                 Overall accuracy   Class 1   Class 2
MLR (global)          72.58%             75.00%    68.18%
RBF (global)          79.03%             90.00%    59.09%
IMPM (personalised)   87.10%             90.00%    81.82%
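The reported overall accuracies can be cross-checked against the per-class figures. The sketch below assumes class sizes of 40 and 22 samples; these sizes are not stated on the slide and are an assumption, chosen because they reproduce the reported percentages.

```python
# Assumed class sizes (not stated on the slide): 40 samples in class 1, 22 in class 2.
CLASS_SIZES = (40, 22)

# Per-class accuracies as reported in the table, in percent.
reported = {
    "MLR (global)": (75.00, 68.18),
    "RBF (global)": (90.00, 59.09),
    "IMPM (personalised)": (90.00, 81.82),
}


def overall_accuracy(class_accuracies, class_sizes=CLASS_SIZES):
    """Overall accuracy in percent = correctly classified samples / all samples."""
    correct = sum(round(acc / 100 * n) for acc, n in zip(class_accuracies, class_sizes))
    return 100 * correct / sum(class_sizes)


overall = {model: round(overall_accuracy(accs), 2) for model, accs in reported.items()}
```

Under this assumption the three models classify 45, 49 and 54 samples correctly, which matches the overall accuracy column.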

Personalised Modelling for Bioinformatics Research

[Figure, panel (a): "Compact GA Evolution"; axis label: generation; each bit represents one feature]

[Figure, panel (b): "Weighted importance of selected features"; axes: index of genes vs. weighted importance]

(a) The evolution of feature selection for sample #32 using 600 generations of GA optimisation;

(b) The weighted importance of the selected features for sample #32 after one run of the method;
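The bit-string encoding in panel (a), where each bit marks one feature as selected, can be illustrated with a minimal compact GA sketch. The fitness function is a toy stand-in; in a real run it would be something like the classification accuracy in the sample's neighbourhood. The virtual population size and the set of "informative" features are assumptions made for illustration.

```python
import random


def compact_ga(n_bits, fitness, generations=600, virtual_pop=50, seed=0):
    """Compact GA: evolve a probability vector instead of a population.

    Each bit of a candidate marks one feature as selected (1) or not (0).
    """
    rng = random.Random(seed)
    prob = [0.5] * n_bits
    step = 1.0 / virtual_pop
    for _ in range(generations):
        a = [1 if rng.random() < p else 0 for p in prob]
        b = [1 if rng.random() < p else 0 for p in prob]
        winner, loser = (a, b) if fitness(a) >= fitness(b) else (b, a)
        for i in range(n_bits):
            if winner[i] != loser[i]:
                # Shift the probability of bit i towards the winner's value.
                shift = step if winner[i] == 1 else -step
                prob[i] = min(1.0, max(0.0, prob[i] + shift))
    return prob


# Toy stand-in for a real fitness such as local classification accuracy:
# features 1 and 4 are 'informative', every selected feature costs a little.
INFORMATIVE = {1, 4}


def toy_fitness(bits):
    hits = sum(1 for i in INFORMATIVE if bits[i] == 1)
    return hits - 0.1 * sum(bits)


probabilities = compact_ga(n_bits=8, fitness=toy_fitness)
selected = [i for i, p in enumerate(probabilities) if p > 0.5]
```

The final probability vector can be read as a per-feature selection tendency; features whose probability ends above 0.5 are treated as selected in this sketch.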

An example: applying PM on gene expression data for colon cancer diagnosis

Results from a simple experiment on colon cancer gene expression data


[Figure, panel (c): "Visualizing the results of PFS with 3 features"; axes: f377, f1285, f1892. Legend: blue circle points - actual value of this gene; green upward triangle - healthy; red downward triangle - diseased]

[Figure, panel (d): axes: index of selected genes vs. gene expression level. Legend as in panel (c)]

[Figure, panel (f): "Colon cancer data - area under curve: 0.87727"; axes: threshold vs. classification accuracy. Legend: ROC curve, overall accuracy, class 1 accuracy, class 2 accuracy]

(c) Sample #32 (a blue dot) plotted with its neighbouring samples (red triangles represent cancer samples, green triangles represent controls) in the 3D space of the top 3 gene variables from (b);

(d) The profile of sample #32 (blue dots) versus the average local profile of the control (green) and cancer (red triangles) samples, using the features from (b);

(e) The 17 most frequently selected features for all samples; the method is run 20 times for each sample;

(f) The accuracy of personalised diagnosis across all 60 samples when the 17 markers from (e) are used in a leave-one-out cross validation. For the ROC curve, the x axis represents the false positive rate (1 - specificity) and the y axis the true positive rate (sensitivity); the area under the curve is 0.87727 and the overall accuracy is 87.10%.
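The ROC axes and the area under the curve described in (f) can be computed with a short sketch. The scores and labels are made up for illustration and are unrelated to the colon cancer results; the function names are hypothetical.

```python
def roc_points(scores, labels):
    """Return (FPR, TPR) pairs obtained by sweeping a threshold over the scores.

    labels are 1 for the positive class and 0 for the negative class.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


def area_under_curve(points):
    """Trapezoidal area under a curve given as (x, y) points sorted by x."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Made-up scores for six samples, for illustration only.
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
labels = [1, 1, 0, 1, 0, 0]
auc = area_under_curve(roc_points(scores, labels))
```

For this toy data 8 of the 9 positive/negative pairs are ranked correctly, so the area is 8/9.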


Personalised Modelling (PM) for CVD Diagnosis and Risk Prognosis

[Figure: "The dependency of classification accuracy on number of neighbors"; axes: number of neighbours (k) vs. classification accuracy. Legend: overall accuracy, class 1 accuracy, class 2 accuracy]

This study aims at personalised modelling for cardiovascular disease (CVD) diagnosis.

The PM method automatically optimises the number of neighbouring samples K, which can be unique for every input sample or chosen as a single optimal value for all samples.
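One simple way to choose a single K for all samples is to compare leave-one-out accuracy over a set of candidate values. The sketch below does this for a plain k-NN classifier on made-up one-dimensional data; it is not the optimisation procedure used by the PM method, and the per-sample variant is not shown.

```python
import math
from collections import Counter


def knn_label(x, samples, labels, k):
    """Majority label among the k samples closest to x (Euclidean distance)."""
    order = sorted(range(len(samples)), key=lambda i: math.dist(x, samples[i]))
    return Counter(labels[i] for i in order[:k]).most_common(1)[0][0]


def loo_accuracy(samples, labels, k):
    """Leave-one-out accuracy of k-NN: each sample is predicted from all others."""
    correct = 0
    for i, (x, y) in enumerate(zip(samples, labels)):
        rest_x = samples[:i] + samples[i + 1:]
        rest_y = labels[:i] + labels[i + 1:]
        correct += knn_label(x, rest_x, rest_y, k) == y
    return correct / len(samples)


def best_k(samples, labels, candidates):
    """Return the candidate k with the highest leave-one-out accuracy."""
    return max(candidates, key=lambda k: loo_accuracy(samples, labels, k))


# Made-up data: two well-separated groups of four samples each.
toy_samples = [[0.0], [0.1], [0.2], [0.3], [1.0], [1.1], [1.2], [1.3]]
toy_labels = [0, 0, 0, 0, 1, 1, 1, 1]
chosen_k = best_k(toy_samples, toy_labels, candidates=[1, 3, 7])
```

With k = 7 every held-out sample is outvoted by the four samples of the other group, which shows why accuracy can depend strongly on the neighbourhood size.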


Software Demo



Conclusion

• The proposed IMPM has a major advantage: the modelling process starts with all relevant variables available for a person, rather than with a fixed set of variables required by a global model.

• The proposed IMPM leads to better prognostic accuracy and a computed personalised profile.

• With global optimisation using IMPM, a small set of variables (potential markers) can be identified from the selected variable set across the whole population.

• The proposed algorithms and models of IMPM are generic and, with certain constraints, can potentially be incorporated into a variety of applications for data analysis and knowledge discovery, such as financial risk analysis, time series prediction, etc.

• We hope that this study will motivate applications of personalised modelling in different research areas.

Reference List:

1. Kasabov, N.: Global, local and personalized modelling and pattern discovery in bioinformatics: An integrated approach. Pattern Recognition Letters 28(6) (2007) 673–685.

2. Amnon Shabo. Health record banks: integrating clinical and genomic data into patient-centric longitudinal and cross-institutional health records. Personalised Medicine, 4(4):453–455, 2007.

3. Kasabov, N and Hu, Y (2011) Integrated optimisation method for personalised modelling and case study applications, Int. J. Functional Informatics and Personalised Medicine, vol. 3, no.3, pp. 236-256, 2010.


Questions?

