the kedri integrated system for personalised modelling
Raphael Hu, KEDRI, Auckland University of Technology (Wednesday, 1.15, Data Analysis Workshop)
The KEDRI Integrated System for Personalised Modelling:
Software development and experiment results
Prof. Nikola Kasabov, Dr. Raphael Hu, Gary Chen
The Knowledge Engineering and Discovery Research Institute (KEDRI) Auckland University of Technology
www.kedri.info
23/11/2011 [email protected]; [email protected]
Overview
• Introduction
• The development of new algorithms and methods for personalised modelling in KEDRI
• Software prototype demo
• Conclusion and future direction
KEDRI: The Knowledge Engineering and Discovery Research Institute at AUT
(www.kedri.info)
• Established in 2002 by Prof. Nikola Kasabov
• Focus: novel information processing methods, technologies and applications for discoveries across different areas of science
• Methods are mainly based on personalised modelling, brain information processing, evolution, genetics and quantum physics
KEDRI
Computational Modelling Techniques
Global, local and personalised modelling are the three main approaches to modelling and pattern discovery in machine learning [1].
Global modelling creates a model from the data that covers the entire problem space and is represented by a single function, e.g. a regression function, an RBF network, an MLP neural network, an SVM, etc.
Local modelling builds a set of local models from data, each representing a sub-space (e.g. a cluster) of the whole problem space. These models can be a set of rules or a set of local regressions, etc.
Personalised modelling uses transductive reasoning to create a specific model for each single data point (e.g. a data vector, a patient record) within a localised problem space.
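The contrast between a global and a personalised model can be sketched in a few lines. This is a minimal illustration, not the KEDRI implementation: it assumes a plain least-squares linear model and Euclidean nearest neighbours, and the names `fit_linear`, `global_predict` and `personalised_predict` are hypothetical.

```python
import numpy as np

def fit_linear(X, y):
    """Least-squares linear model with a bias term; returns the coefficients."""
    A = np.hstack([X, np.ones((len(X), 1))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict_linear(coef, x):
    """Evaluate the linear model at a single input vector x."""
    return float(np.append(x, 1.0) @ coef)

def global_predict(X, y, x_new):
    """Global modelling: one function fitted on all available data."""
    return predict_linear(fit_linear(X, y), x_new)

def personalised_predict(X, y, x_new, k=10):
    """Personalised modelling: a model fitted only on the k nearest
    neighbours of x_new (Euclidean distance is an assumption here)."""
    d = np.linalg.norm(X - x_new, axis=1)
    idx = np.argsort(d)[:k]
    return predict_linear(fit_linear(X[idx], y[idx]), x_new)

# Toy data (made up for illustration): a nonlinear target on one variable.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0])
x_new = np.array([1.5])

g = global_predict(X, y, x_new)        # single line through all points
p = personalised_predict(X, y, x_new)  # line through the local neighbourhood
```

On this toy problem the personalised prediction follows the local shape of the target, while the single global line cannot.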
Why Personalised Modelling?
• The issue with using global modelling for prediction problems: a global model is derived from all available data for the target and then applied to any new patient, anywhere, at any time. Prediction and treatment based on global models are only effective for some patients (approx. 70%) [2].
• Personalised modelling: the rationale behind the personalised modelling paradigm is that, since each person is different, the most effective treatment can only be based on a detailed analysis of that particular patient.
• The availability of a variety of data: DNA, RNA, protein expression, inheritance, disease, etc.
• The benefits of using personalised models for medical applications:
  – To produce better results for classification and prediction
  – To create a profile for each individual
  – To provide a potential improvement scenario for individuals, where possible
Research Objectives of Personalised Modelling
• To create accurate personalised computational models: each model is specific to an individual and utilises the available information from other individuals related to the same problem.
• To develop new algorithms and methods for personalised modelling.
• To apply the proposed algorithms and methods to data from different sources: gene expression data, protein data, SNP (single-nucleotide polymorphism) data, clinical data, etc.
The Integrated Method for Personalised Modelling (IMPM) for Data Analysis
The proposed framework and system using IMPM for biomedical data analysis [2]
[Figure: IMPM framework. Components: data repository; feature selection; similarity measurement; neighbour creation; learning models (e.g. risk probability evaluation, disease classification, etc.); outcome visualisation (personalised profiling, risk probability); optimisation (evolutionary computation, SNN).]
Optimisation
Coevolutionary algorithm (CEA): a CEA is derived from the evolutionary algorithm. Its individuals come from two or more populations, and their fitness values are assigned based on their interactions with individuals from the other populations.
A sample of a simple 2-species coevolutionary model.
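A two-population scheme of this kind can be sketched as follows. This is a minimal cooperative variant written for illustration only: the objective `cost`, the truncation-plus-mutation step `evolve`, and all parameter values are assumptions, not details taken from the slides.

```python
import random

def cost(x, y):
    # Hypothetical joint objective with an interaction term between the
    # two species; its minimum lies at x = 1.6, y = -2.4.
    return (x - 1.0) ** 2 + (y + 2.0) ** 2 + 0.5 * x * y

def evolve(pop, fitness, rng, sigma=0.3):
    """One generation: keep the better half, refill with mutated copies.
    The best individual always ends up at index 0."""
    ranked = sorted(pop, key=fitness)
    parents = ranked[: len(pop) // 2]
    children = [p + rng.gauss(0.0, sigma) for p in parents]
    return parents + children

def coevolve(generations=50, size=20, seed=0):
    rng = random.Random(seed)
    pop_x = [rng.uniform(-5, 5) for _ in range(size)]
    pop_y = [rng.uniform(-5, 5) for _ in range(size)]
    best_x, best_y = pop_x[0], pop_y[0]
    for _ in range(generations):
        # An x-individual is scored together with the current
        # representative of the y-population, and vice versa.
        pop_x = evolve(pop_x, lambda x: cost(x, best_y), rng)
        best_x = pop_x[0]
        pop_y = evolve(pop_y, lambda y: cost(best_x, y), rng)
        best_y = pop_y[0]
    return best_x, best_y

best_x, best_y = coevolve(generations=200)
```

Neither population can be evaluated on its own: the fitness of every individual depends on the partner drawn from the other population, which is the defining property described above.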
Software Architecture of IMPM
An example of the software architecture of ISPM
An Integrated Optimisation System for Personalised Modelling (IOSPM)
• Cross-platform – implemented in Qt, so it can be compiled on different platforms such as Microsoft Windows, Mac OS and Linux.
• Integrated – combines methods/functions written in different languages (e.g. MATLAB, Python, Java and C/C++).
• Extensible – new methods/functions can be plugged in easily by editing the system schema, from which the GUI is generated dynamically.
[Figure: UML-style diagram of the IOSPM system. Main GUI operations: select data file, select optimisation method, select modelling method, select data pre-processing. Modules: Data Loading, Data Pre-processing, PM Optimisation, Spiking Neural Network, LibSVM, K-Nearest Neighbor, DENFIS, WKNN/WWKNN. Visualisation GUI operations: select visualisation mode, create results report. Visualisation components: Visualisation Modes 1, 2 and 3, Data Report Generator. Implementation layers: Qt XML GUI, MATLAB executable package, OpenGL, Python package, C++ code; Interface 1; Interface 2.]
An overview of the IOSPM system
Implementation of ISPM
An exemplar content of the modules is given below:
• Module for Neighbourhood Creation: Euclidean distance method; Hamming distance method; Cosine distance method; Kernel distance methods; other methods.
• Module for Classification/Prediction:
  – Classification methods, such as MLR, MLP, ECF, wkNN, wwkNN, TWNFI, SVM, eSNN.
  – Probability prediction methods, such as DENFIS, TWNFI.
• Module for Optimisation: evolutionary computation (EC), quantum-inspired evolutionary algorithm, particle swarm optimisation (PSO), quantum-inspired PSO, other methods.
• Module for Task Distribution Centre: this module controls the whole optimisation process, communicates with the user and visualises the results.
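The neighbourhood-creation step can be illustrated with three of the distance measures named in the module list. This is a sketch under stated assumptions: the function names are hypothetical, the Hamming distance is taken as the fraction of differing positions, and the cosine distance as one minus the cosine similarity.

```python
import numpy as np

def euclidean(a, b):
    """Straight-line distance between two vectors."""
    return float(np.sqrt(np.sum((a - b) ** 2)))

def hamming(a, b):
    """Fraction of positions at which two vectors differ."""
    return float(np.mean(a != b))

def cosine_distance(a, b):
    """One minus the cosine similarity; undefined for zero vectors."""
    return 1.0 - float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def neighbourhood(X, x_new, k, dist=euclidean):
    """Indices of the k samples in X closest to x_new under `dist`,
    ordered from nearest to farthest."""
    d = np.array([dist(row, x_new) for row in X])
    return np.argsort(d, kind="stable")[:k]

# Made-up samples for illustration.
X = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]])
nearest = neighbourhood(X, np.array([0.9, 0.9]), k=2)
```

Passing the distance as a parameter keeps the neighbourhood step independent of the measure, so another measure (for example a kernel distance) could be swapped in without touching the rest.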
Global Modelling vs. Personalised Modelling
Colon cancer gene expression data

Model                  Overall accuracy   Class 1 accuracy   Class 2 accuracy
MLR (global)           72.58%             75.00%             68.18%
RBF (global)           79.03%             90.00%             59.09%
IMPM (personalised)    87.10%             90.00%             81.82%
Personalised Modelling for Bioinformatics Research
[Figure (a): Compact GA evolution. x-axis: feature bits (each bit represents one feature); y-axis: generation.]
[Figure (b): Weighted importance of selected features. x-axis: index of genes; y-axis: weighted importance.]
(a) The evolution of feature selection for sample #32 using 600 generations of GA optimisation;
(b) The weighted importance of the selected features for sample #32 after one run of the method;
An example: applying PM on gene expression data for colon cancer diagnosis
Results from a simple experiment on colon cancer gene expression data
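Feature selection with a compact GA over a bit string (one bit per feature) can be sketched as below. This is an illustration only: the standard compact GA update is used, while `toy_fitness`, the `informative` mask and all parameter values are made-up stand-ins. In a personalised modelling setting the fitness would instead be something like the accuracy of the model built on the selected features, which the slides do not specify.

```python
import numpy as np

def compact_ga(fitness, n_bits, generations=600, pop_size=50, seed=0):
    """Compact GA: evolve a probability vector instead of a population.
    Returns p, where p[i] is the probability that feature bit i is set."""
    rng = np.random.default_rng(seed)
    p = np.full(n_bits, 0.5)
    for _ in range(generations):
        # Sample two candidate feature masks from the current vector.
        a = rng.random(n_bits) < p
        b = rng.random(n_bits) < p
        winner, loser = (a, b) if fitness(a) >= fitness(b) else (b, a)
        differ = winner != loser
        # Shift p by 1/pop_size towards the winner where the two differ.
        p[differ] += np.where(winner[differ], 1.0, -1.0) / pop_size
        np.clip(p, 0.0, 1.0, out=p)
    return p

# Hypothetical stand-in fitness: reward masks that agree with a hidden
# set of informative features (indices 2, 5 and 11 out of 20).
informative = np.zeros(20, dtype=bool)
informative[[2, 5, 11]] = True

def toy_fitness(mask):
    return int(np.sum(mask == informative))

p = compact_ga(toy_fitness, n_bits=20)
selected = np.flatnonzero(p > 0.5)  # features the vector has settled on
```

The probability vector plays the role shown in panel (a): each entry drifts towards 0 or 1 over the generations, and the entries that end near 1 mark the selected features.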
[Figure (c): Visualising the results of PFS with 3 features. Axes: f377, f1285, f1892. Legend: blue circle points, actual value of this gene; green upward triangle, healthy; red downward triangle, diseased.]
[Figure (d): x-axis: index of selected genes; y-axis: gene expression level. Legend: blue circle points, actual value of this gene; green upward triangle, healthy; red downward triangle, diseased.]
[Figure (e): selection counts per gene index.]
[Figure (f): Colon cancer data, area under curve: 0.87727. x-axis: threshold; y-axis: classification accuracy. Curves: ROC curve, overall accuracy, class 1 accuracy, class 2 accuracy.]
(c) Sample 32 (a blue dot) is plotted with its neighbouring samples (red triangles represent cancer samples and green triangles represent controls) in the 3D space of the top 3 gene variables from (b);
(d) The profile of sample #32 (blue dots) versus the average local profile of the control (green) and cancer (red triangles) samples, using the features from (b);
(e) The 17 most frequently selected features for all samples; the method is run 20 times for each sample;
(f) The accuracy of personalised diagnosis across all 60 samples when the 17 markers from (e) are used in a leave-one-out cross-validation. For the ROC curve, the x-axis represents the false positive rate (1 - specificity) and the y-axis the true positive rate (sensitivity). The area under the curve is 0.87727 and the overall accuracy is 87.10%.
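The ROC axes and the area under the curve described in (f) can be computed as in the sketch below. It assumes each sample has a real-valued score where higher means "more likely class 1"; the function names and the six example scores are hypothetical and unrelated to the colon cancer results.

```python
import numpy as np

def roc_points(scores, labels):
    """Sweep a decision threshold over the scores and return the
    false positive rate and true positive rate at each threshold."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    # Start above every score so the curve begins at (0, 0).
    thresholds = np.r_[np.inf, np.unique(scores)[::-1]]
    pos, neg = labels.sum(), (~labels).sum()
    tpr = np.array([(scores[labels] >= t).sum() / pos for t in thresholds])
    fpr = np.array([(scores[~labels] >= t).sum() / neg for t in thresholds])
    return fpr, tpr

def auc(fpr, tpr):
    """Area under the ROC curve by the trapezoidal rule."""
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))

# Made-up scores and labels for illustration.
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4]
labels = [1, 1, 0, 1, 0, 0]
fpr, tpr = roc_points(scores, labels)
area = auc(fpr, tpr)
```

In a leave-one-out setting, each sample's score would come from the model built while that sample was held out, and the curve is then drawn over all held-out scores.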
Personalised Modelling (PM) for CVD Diagnosis and Risk Prognosis
[Figure: The dependency of classification accuracy on number of neighbors. x-axis: number of neighbors (k); y-axis: classification accuracy. Curves: overall accuracy, class 1 accuracy, class 2 accuracy.]
This study aims at personalised modelling for cardiovascular disease (CVD) diagnosis.
The PM method automatically optimises the number of neighbouring samples K, which can be unique for every input sample or chosen as a single optimal value for all samples.
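Choosing one value of K for all samples can be sketched as a search over candidate values scored by leave-one-out accuracy. This is a minimal illustration with a plain majority-vote kNN classifier; the slides do not state how K is actually optimised in the system, and the function names and toy data are assumptions.

```python
import numpy as np

def knn_predict(X, y, x_new, k):
    """Majority vote among the k nearest training samples (Euclidean)."""
    d = np.linalg.norm(X - x_new, axis=1)
    idx = np.argsort(d, kind="stable")[:k]
    return int(np.bincount(y[idx]).argmax())

def loo_accuracy(X, y, k):
    """Leave-one-out accuracy: each sample is classified by a model
    built from all the other samples."""
    hits = 0
    for i in range(len(X)):
        keep = np.arange(len(X)) != i
        hits += int(knn_predict(X[keep], y[keep], X[i], k) == y[i])
    return hits / len(X)

def best_k(X, y, candidates):
    """Return the candidate k with the highest leave-one-out accuracy
    (the first one on ties), together with all scores."""
    scores = {k: loo_accuracy(X, y, k) for k in candidates}
    return max(scores, key=scores.get), scores

# Made-up two-class data for illustration: two well-separated clusters.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (20, 2)), rng.normal(10.0, 1.0, (20, 2))])
y = np.array([0] * 20 + [1] * 20)
k, scores = best_k(X, y, candidates=[1, 3, 5])
```

Plotting `scores` against the candidate values gives a curve of the same kind as the accuracy-versus-k figure above. A per-sample K would repeat the same search inside each sample's own neighbourhood.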
Software Demo
Conclusion
• The proposed IMPM has a major advantage: the modelling process starts with all relevant variables available for a person, rather than with a fixed set of variables required by a global model.
• The proposed IMPM leads to better prognostic accuracy and a computed personalised profile.
• With global optimisation using IMPM, a small set of variables (potential markers) can be identified from the selected variable set across the whole population.
• The proposed IMPM algorithms and models are generic and, within certain constraints, can potentially be incorporated into a variety of applications for data analysis and knowledge discovery, such as financial risk analysis, time series prediction, etc.
• We hope that this study will motivate applications of personalised modelling in different research areas.
Reference List:
1. Kasabov, N.: Global, local and personalized modelling and pattern discovery in bioinformatics: An integrated approach. Pattern Recognition Letters 28(6) (2007) 673–685.
2. Shabo, A.: Health record banks: integrating clinical and genomic data into patient-centric longitudinal and cross-institutional health records. Personalised Medicine 4(4) (2007) 453–455.
3. Kasabov, N., Hu, Y.: Integrated optimisation method for personalised modelling and case study applications. Int. J. Functional Informatics and Personalised Medicine 3(3) (2010) 236–256.
Questions?