Datamining @ ARTreat

Veljko Milutinović, Zoran Babović, Nenad Korolija, Goran Rakočević, Marko Novaković

Agenda

ARTreat – the project

Atherosclerosis – the basics

Plaque classification

Hemodynamic analysis

Data mining for the hemodynamic problem

Data mining from patient records

ARTreat – the project

ARTreat aims to provide a patient-specific computational model of the cardiovascular system, used to improve the quality of prediction of atherosclerosis progression and its propagation into life-threatening events.

FP7 Large-scale Integrating Project (IP)

16 partners

Funding: 10,000,000 €

Atherosclerosis

Atherosclerosis is the condition in which an artery wall thickens as the result of a build-up of fatty materials such as cholesterol

Atherosclerotic plaque

Begins as a fatty streak, an ill-defined yellow lesion; this fatty plaque then develops edges that evolve into fibrous plaques, whitish lesions with a grumous, lipid-rich core

Plaque components

Fibrous, Lipid, Calcified, Intra-plaque Hemorrhage

Plaque classification

Different types of plaque pose different risks

Manual plaque classification (done by doctors) is a difficult and error-prone task

Idea: develop an AI algorithm to distinguish between different types of plaque

Visual data mining

Plaque classification (2)

Developed by the Foundation for Research and Technology

Based on Support Vector Machines

Works on IVUS and MRI images that are hand-labeled by physicians

Up to 90% accurate
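The slides say only that the FORTH classifier is SVM-based. The following is therefore a minimal sketch of the underlying technique, not of their system: a linear soft-margin SVM trained with Pegasos-style stochastic subgradient descent. The feature vectors, the two classes (+1 / -1, e.g. two plaque types) and the regularization value are hypothetical. A real IVUS/MRI pipeline would first extract image features and would likely need a kernel and more than two classes.

```python
import random

def _augment(x):
    # A constant 1.0 feature lets the last weight act as the bias term
    return list(x) + [1.0]

def train_linear_svm(samples, labels, lam=0.01, epochs=200, seed=0):
    """Pegasos-style stochastic subgradient descent for a linear soft-margin SVM.

    samples: feature vectors (lists of floats); labels: +1 or -1.
    lam: regularization strength (hypothetical default).
    Returns the weight vector; its last entry is the bias.
    """
    rng = random.Random(seed)
    data = [(_augment(x), y) for x, y in zip(samples, labels)]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            t += 1
            eta = 1.0 / (lam * t)  # step size shrinks over time
            margin = y * sum(wj * xj for wj, xj in zip(w, x))
            # Gradient of the L2 regularizer shrinks every weight
            w = [(1.0 - eta * lam) * wj for wj in w]
            if margin < 1.0:
                # Hinge loss is active: move w towards classifying x correctly
                w = [wj + eta * y * xj for wj, xj in zip(w, x)]
    return w

def predict(w, x):
    """Return +1 or -1 depending on which side of the hyperplane x falls."""
    score = sum(wj * xj for wj, xj in zip(w, _augment(x)))
    return 1 if score >= 0 else -1

# Toy, made-up 2D features for two classes
X = [[2, 2], [3, 1], [1, 3], [-2, -2], [-3, -1], [-1, -3]]
Y = [1, 1, 1, -1, -1, -1]
w = train_linear_svm(X, Y)
print([predict(w, x) for x in X])
```

A linear kernel keeps the sketch short and dependency-free. The reported accuracy of up to 90% came from the project's own classifier and features, not from anything like this toy.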

Data mining task in Belgrade

Two separate paths:

Data mining from the results of hemodynamic simulations

Data mining from medical patient records

Goal: to provide input on the progression of the disease, to be used for medical decision support

Hemodynamics – the basics

Study of the flow of blood through the blood vessels

Maximum Wall Shear Stress (MWSS): an important parameter for the prognosis of plaque development
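To make wall shear stress concrete, consider the idealized case of steady, fully developed laminar (Poiseuille) flow in a rigid straight tube. There it has the closed form τ_w = 4μQ/(πR³). Real carotid flow is pulsatile and the geometry is far from a straight tube, which is why CFD is needed. The sketch below only shows the scaling, and its parameter values are illustrative assumptions, not ARTreat data.

```python
import math

def poiseuille_wall_shear_stress(mu, flow_rate, radius):
    """Wall shear stress in Pa for steady laminar flow in a rigid straight tube.

    tau_w = 4 * mu * Q / (pi * R**3)
    mu: dynamic viscosity (Pa*s), flow_rate: Q (m^3/s), radius: R (m)
    """
    return 4.0 * mu * flow_rate / (math.pi * radius ** 3)

# Illustrative values only: mu = 3.5 mPa*s, Q = 6 mL/s, R = 3 mm
tau = poiseuille_wall_shear_stress(mu=0.0035, flow_rate=6e-6, radius=0.003)
print(f"{tau:.2f} Pa")  # prints "0.99 Pa"
```

Because of the 1/R³ dependence, halving the lumen radius at the same flow rate multiplies the wall shear stress by eight.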

Hemodynamics - CFD

Classical methods for hemodynamic calculations employ Computational Fluid Dynamics (CFD)

Involves solving the Navier-Stokes equation

…but involves solving it millions of times!

One simulation can take weeks
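CFD is expensive because discretized flow equations are solved iteratively over a mesh, often for many time steps. The sketch below is a deliberately tiny, hypothetical 1D analogue: steady flow between two parallel plates, solved with Jacobi iteration on a finite-difference grid. It is nothing like a full 3D Navier-Stokes solver. It only illustrates why iterative solvers are slow: even 19 unknowns need on the order of a thousand sweeps, and for this method the sweep count grows roughly with the square of the grid size.

```python
def solve_channel_flow(n_interior, height, dpdx, mu, tol=1e-10, max_iter=1_000_000):
    """Jacobi iteration for steady plane Poiseuille flow between two plates.

    Solves mu * u''(y) = dpdx with no-slip walls u(0) = u(height) = 0 on a
    uniform grid of n_interior unknowns. Returns (interior velocities, sweeps).
    """
    dy = height / (n_interior + 1)
    # Discretized equation: u[i-1] - 2*u[i] + u[i+1] = dy**2 * dpdx / mu
    rhs = dy * dy * dpdx / mu
    u = [0.0] * (n_interior + 2)  # first and last entries are the wall nodes
    for sweep in range(1, max_iter + 1):
        new_u = u[:]
        for i in range(1, n_interior + 1):
            new_u[i] = 0.5 * (u[i - 1] + u[i + 1] - rhs)
        change = max(abs(a - b) for a, b in zip(new_u, u))
        u = new_u
        if change < tol:
            return u[1:-1], sweep
    return u[1:-1], max_iter

# Hypothetical values: height 1, viscosity 1, dp/dx = -2, so the exact profile is u = y * (1 - y)
u, sweeps = solve_channel_flow(19, 1.0, -2.0, 1.0)
dy = 1.0 / 20
exact = [(i * dy) * (1.0 - i * dy) for i in range(1, 20)]
max_err = max(abs(a - b) for a, b in zip(u, exact))
print(f"{sweeps} sweeps, max error {max_err:.1e}")
```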

Data mining from hemodynamic simulations (first path)

Idea: use results of previously done simulations

Train a data mining AI system capable of regression analysis

Use the system to estimate the desired values in a much shorter time

Neural Networks - background

Systems inspired by the principle of operation of biological neural systems (the brain)

Neural Networks – the basics

A parallel, distributed information processing structure

Each processing element has a single output which branches (“fans out”) into as many collateral connections as desired

One input layer, one output layer, and one or more hidden layers

Artificial neurons

Each node (neuron) consists of two segments: an integration function and an activation function

A common activation function: the sigmoid, f(x) = 1 / (1 + e^(-x))
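A single artificial neuron can be sketched as a weighted sum plus bias (integration) fed through the sigmoid (activation). The bias term and the numerically stable form of the sigmoid are implementation choices of this sketch, not details from the slides.

```python
import math

def sigmoid(x):
    """Logistic activation 1 / (1 + e^-x), written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)  # safe for large negative x, where exp(-x) would overflow
    return z / (1.0 + z)

def neuron_output(inputs, weights, bias):
    # Integration function: weighted sum of the inputs plus a bias
    net = sum(w * x for w, x in zip(weights, inputs)) + bias
    # Activation function: squash the integrated value into (0, 1)
    return sigmoid(net)

print(neuron_output([1.0, 2.0], [0.5, -0.25], 0.0))  # prints 0.5
```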

Neural Networks - backpropagation

A training method for neural networks

Try to minimize the error function by adjusting the weights

Gradient descent:

Calculate the “blame” of each input for the output error

Adjust the weights by Δw = -γ ∂E/∂w, where E is the error function and γ the learning rate
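Backpropagation with gradient descent can be sketched for the kind of network used here: a linear input layer, one sigmoid hidden layer, and a sigmoid output. This minimal version assumes the per-sample squared error E = ½(t − o)² and plain online updates. The hidden-layer size, initialization range and toy data are hypothetical, and this is not the ARTreat training code.

```python
import math
import random

def sigmoid(x):
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

class OneHiddenLayerNet:
    """Linear (pass-through) inputs, one sigmoid hidden layer, one sigmoid output."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        # Each weight row ends with a bias weight (its input is a constant 1.0)
        self.w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
                         for _ in range(n_hidden)]
        self.w_out = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]

    def forward(self, x):
        xb = list(x) + [1.0]
        h = [sigmoid(sum(w * xi for w, xi in zip(row, xb))) for row in self.w_hidden]
        hb = h + [1.0]
        o = sigmoid(sum(w * hj for w, hj in zip(self.w_out, hb)))
        return xb, hb, o

    def train_step(self, x, target, lr):
        """One online backpropagation step for E = 0.5 * (target - o)**2."""
        xb, hb, o = self.forward(x)
        # Output "blame": dE/d(net input of the output unit)
        delta_out = (o - target) * o * (1.0 - o)
        # Hidden "blame": output blame passed back through each output weight
        delta_hidden = [delta_out * self.w_out[j] * hb[j] * (1.0 - hb[j])
                        for j in range(len(self.w_hidden))]
        # Gradient descent: w <- w - lr * dE/dw
        self.w_out = [w - lr * delta_out * hj for w, hj in zip(self.w_out, hb)]
        for j, row in enumerate(self.w_hidden):
            self.w_hidden[j] = [w - lr * delta_hidden[j] * xi for w, xi in zip(row, xb)]
        return 0.5 * (target - o) ** 2

def mean_error(net, data):
    return sum(0.5 * (t - net.forward([x])[2]) ** 2 for x, t in data) / len(data)

# Synthetic toy data (not ARTreat data): t = 0.1 + 0.3 * x on [0, 1]
data = [(x / 4.0, 0.1 + 0.3 * x / 4.0) for x in range(5)]
net = OneHiddenLayerNet(n_in=1, n_hidden=3)
initial = mean_error(net, data)
for epoch in range(2000):
    for x, t in data:
        net.train_step([x], t, lr=0.6)
final = mean_error(net, data)
print(f"error before: {initial:.4f}, after: {final:.4f}")
```

The hidden "blame" is computed with the output weights as they were before the update, as backpropagation requires.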

Input data set

Carotid artery

11 geometric parameters and the MWSS value

The model

One hidden layer

Input layer: linear

Hidden and output: sigmoid

Learning rate 0.6

500K training cycles

Decay and momentum
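The slide lists momentum and "decay" without details. One common reading of "decay" is a learning rate that shrinks as training proceeds. Weight decay is another reading; it would instead add a penalty term λw to the gradient. The sketch below assumes the former, and its decay and momentum values are hypothetical. Because the output unit is a sigmoid, MWSS targets must also be rescaled into the sigmoid's (0, 1) range before training and mapped back afterwards, so the sketch includes scaling helpers.

```python
def decayed_learning_rate(base_lr, epoch, decay_rate=1e-3):
    """Hypothetical schedule: learning rate shrinks as 1 / (1 + decay_rate * epoch)."""
    return base_lr / (1.0 + decay_rate * epoch)

def momentum_step(weight, grad, prev_delta, lr, momentum=0.2):
    """Gradient-descent step plus a fraction of the previous step (momentum).
    Returns the new weight and the step taken, to be passed in next time."""
    delta = -lr * grad + momentum * prev_delta
    return weight + delta, delta

def scale_targets(values, lo=0.1, hi=0.9):
    """Linearly map targets into [lo, hi], inside the sigmoid's (0, 1) output range."""
    vmin, vmax = min(values), max(values)
    if vmax == vmin:
        raise ValueError("cannot scale a constant target")
    scaled = [lo + (hi - lo) * (v - vmin) / (vmax - vmin) for v in values]
    return scaled, (vmin, vmax)

def unscale_output(y, vmin, vmax, lo=0.1, hi=0.9):
    """Map a network output back to the original target units."""
    return vmin + (y - lo) * (vmax - vmin) / (hi - lo)

# Usage with made-up numbers: base learning rate 0.6 (as on the slide), epoch 1000
lr = decayed_learning_rate(0.6, epoch=1000)  # 0.6 / 2 = 0.3
w, step = momentum_step(weight=1.0, grad=0.5, prev_delta=0.0, lr=lr)
scaled, (vmin, vmax) = scale_targets([2.0, 4.0, 6.0])
```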

Current results

Average error: 8.6%

Maximum error: 16.9%
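The slides do not define how the average and maximum errors were computed. If they are per-sample relative errors of the predicted MWSS against the CFD value (an assumption), they can be obtained as follows. The numbers are made up.

```python
def error_summary(predicted, actual):
    """Average and maximum relative error, as fractions (0.086 == 8.6%).

    Assumes per-sample relative errors |predicted - actual| / |actual|;
    actual values must be non-zero.
    """
    errors = [abs(p - a) / abs(a) for p, a in zip(predicted, actual)]
    return sum(errors) / len(errors), max(errors)

# Hypothetical MWSS predictions vs. CFD reference values
avg_err, max_err = error_summary([11.0, 9.0, 10.0], [10.0, 10.0, 10.0])
print(f"average {avg_err:.1%}, maximum {max_err:.1%}")  # prints "average 6.7%, maximum 10.0%"
```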

The “dreaded” line 4

Line 4 of the original test set proved difficult to predict

Error was over 30%

Turned out to be an outlier

Combination of parameters was such that it couldn’t

But both the CFD simulation and the NN worked

Visually, the geometry looked fine

This shows how challenging data preprocessing can be
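The authors do not describe such a check. One hypothetical way to flag a case like line 4 automatically is a robust outlier test on per-sample errors, such as the modified z-score based on the median absolute deviation (MAD). Values above about 3.5 are commonly treated as outliers. The error values below are made up.

```python
import statistics

def mad_outliers(values, threshold=3.5):
    """Return indices whose modified z-score 0.6745 * (x - median) / MAD exceeds threshold."""
    med = statistics.median(values)
    mad = statistics.median(abs(v - med) for v in values)
    if mad == 0:
        return []  # no spread around the median: the score is undefined
    return [i for i, v in enumerate(values)
            if abs(0.6745 * (v - med) / mad) > threshold]

# Hypothetical per-sample relative errors; the last one stands out
errors = [0.05, 0.08, 0.07, 0.06, 0.31]
print(mad_outliers(errors))  # prints [4]
```

The median and MAD are used rather than the mean and standard deviation because a single extreme value would inflate the latter and could hide itself.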

Dataset analysis

Two distinct areas of MWSS values:

a subset with lower MWSS values, where a similar clear pattern can be seen against all of the input variables;

a scattered cloud of values in the subset with higher MWSS values.

The histogram shows most values grouped in the lower half of the range, with only a small number of points in the upper half.
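A histogram like the one described counts values in equal-width bins. With two bins, the counts directly compare the lower and upper halves of the MWSS range. Here is a minimal sketch with made-up values.

```python
def histogram(values, n_bins):
    """Count values in n_bins equal-width bins spanning [min, max]."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        idx = int((v - lo) / width) if width > 0 else 0
        counts[min(idx, n_bins - 1)] += 1  # the maximum value belongs in the last bin
    return counts

# Made-up MWSS-like values: most are low, one is high
print(histogram([1, 1, 2, 2, 3, 10], 2))  # prints [5, 1]
```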

MWSS value prediction

Two approaches:

Single model

Two models (one for low MWSS values, one for high values), with a classifier that chooses the appropriate model

Models based on Linear Regression and SVM
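The two-model approach can be sketched as a gated combination. Split the training data at an MWSS threshold, fit one regressor per subset, and let a classifier decide which regressor to use for a new geometry. In this sketch both regressors are ordinary least-squares linear models, matching the LR variant. The threshold is a hypothetical parameter, and the classifier is a simple nearest-centroid rule standing in for the SVM used in the slides.

```python
def fit_linear(X, y):
    """Ordinary least squares via the normal equations (A^T A) b = A^T y.
    A constant column is appended so the last coefficient is the intercept."""
    A = [list(row) + [1.0] for row in X]
    n = len(A[0])
    M = [[sum(r[i] * r[j] for r in A) for j in range(n)] for i in range(n)]
    v = [sum(r[i] * t for r, t in zip(A, y)) for i in range(n)]
    # Gaussian elimination with partial pivoting
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        v[c], v[p] = v[p], v[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n):
                M[r][k] -= f * M[c][k]
            v[r] -= f * v[c]
    coef = [0.0] * n
    for c in reversed(range(n)):
        coef[c] = (v[c] - sum(M[c][k] * coef[k] for k in range(c + 1, n))) / M[c][c]
    return lambda x: sum(b * xi for b, xi in zip(coef, list(x) + [1.0]))

def fit_two_model(X, y, threshold):
    """Separate low/high regressors plus a gating rule that picks one per input."""
    low = [(x, t) for x, t in zip(X, y) if t <= threshold]
    high = [(x, t) for x, t in zip(X, y) if t > threshold]
    low_model = fit_linear([x for x, _ in low], [t for _, t in low])
    high_model = fit_linear([x for x, _ in high], [t for _, t in high])

    def centroid(rows):
        return [sum(col) / len(rows) for col in zip(*rows)]

    # Stand-in classifier: nearest centroid in input space (the slides use an SVM)
    c_low, c_high = centroid([x for x, _ in low]), centroid([x for x, _ in high])

    def predict(x):
        d_low = sum((a - b) ** 2 for a, b in zip(x, c_low))
        d_high = sum((a - b) ** 2 for a, b in zip(x, c_high))
        return low_model(x) if d_low <= d_high else high_model(x)

    return predict

# Synthetic piecewise data: a gentle low regime and a steep high regime
X = [[float(x)] for x in range(10)]
y = [2.0 * x[0] + 1.0 if x[0] < 5 else 10.0 * x[0] + 50.0 for x in X]
model = fit_two_model(X, y, threshold=20.0)
print(model([1.0]), model([8.0]))
```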

Results

Model                  | Root mean square error | Correlation coef.
Single model, LR       | 19%                    | 0.7
Single model, SVM      | 17%                    | 0.77
Low-value model, LR    | 11%                    | 0.81
Low-value model, SVM   | 7%                     | 0.91
High-value model, LR   | 42%                    | 0.21
High-value model, SVM  | 31%                    | 0.07

Classifier | Correctly classified | Kappa | F measure
SVM        | 93.2%                | 0.64  | 0.517

Poor results for higher MWSS values: too few samples to train a model
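The metrics in these tables can be computed as below. The slides report RMSE as a percentage, presumably normalized in a way that is not stated, so this sketch computes plain (unnormalized) RMSE. Pearson's correlation coefficient, Cohen's kappa and the F measure follow their standard definitions. Treating "high" MWSS as the positive class is an assumption. The gap between 93.2% correct classification and an F measure of 0.517 is consistent with an imbalanced data set containing few high-MWSS samples.

```python
import math

def rmse(pred, actual):
    """Root mean square error."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(pred, actual)) / len(actual))

def pearson(pred, actual):
    """Pearson correlation coefficient between predictions and true values."""
    n = len(actual)
    mp, ma = sum(pred) / n, sum(actual) / n
    cov = sum((p - mp) * (a - ma) for p, a in zip(pred, actual))
    sp = math.sqrt(sum((p - mp) ** 2 for p in pred))
    sa = math.sqrt(sum((a - ma) ** 2 for a in actual))
    return cov / (sp * sa)

def kappa_and_f1(pred, actual, positive="high"):
    """Cohen's kappa and F measure (F1) for a two-class problem."""
    pairs = list(zip(pred, actual))
    n = len(pairs)
    tp = sum(1 for p, a in pairs if p == positive and a == positive)
    fp = sum(1 for p, a in pairs if p == positive and a != positive)
    fn = sum(1 for p, a in pairs if p != positive and a == positive)
    tn = n - tp - fp - fn
    p_observed = (tp + tn) / n
    # Agreement expected by chance from the predicted and actual class frequencies
    p_expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / (n * n)
    kappa = (p_observed - p_expected) / (1 - p_expected) if p_expected < 1 else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return kappa, f1

kappa, f1 = kappa_and_f1(["low", "high", "high", "high"], ["low", "low", "high", "high"])
print(rmse([1, 2, 3], [1, 2, 4]), pearson([1, 2, 3], [2, 4, 6]), kappa, f1)
```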

MWSS position

A few outliers and “strange” values in the data set

After eliminating them:

Coordinate | LR RMSE | LR CC  | SVM RMSE | SVM CC
X          | 0.2389  | 0.9721 | 0.277    | 0.9691
Y          | 0.1733  | 0.8953 | 0.1671   | 0.9136
Z          | 0.0736  | 0.8086 | 0.1221   | 0.8304

(RMSE: root mean square error; CC: correlation coefficient)

Further investigation of the data and the “outlier” values is needed, although there are only a few of them

Genetic data

Single coronary angiography

Blood chemistry

Medications

Single Nucleotide Polymorphism (SNP) data on selected DNA sequences
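The slides do not say how the SNP data was encoded for mining. A common convention in genetic association studies, shown here only as an assumption, is additive coding. Each genotype becomes the number of copies of the minor allele (0, 1 or 2), which turns SNP calls into numeric features. The SNP identifiers and alleles below are hypothetical.

```python
def encode_snp_additive(genotype, minor_allele):
    """Additive coding: number of minor-allele copies (0, 1 or 2).
    genotype: two-character string such as "AG"; None for a missing call."""
    if genotype is None:
        return None
    return sum(1 for allele in genotype if allele == minor_allele)

def encode_genotypes(genotypes, minor_alleles, snp_ids):
    """Turn one patient's genotype calls into a numeric feature vector."""
    return [encode_snp_additive(genotypes.get(s), minor_alleles[s]) for s in snp_ids]

# Hypothetical SNP identifiers and minor alleles
print(encode_genotypes({"snp1": "AG", "snp2": "TT"},
                       {"snp1": "G", "snp2": "T", "snp3": "C"},
                       ["snp1", "snp2", "snp3"]))  # prints [1, 2, None]
```

Missing calls are kept as None so that a later step can choose how to impute them, rather than silently treating them as 0.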

…and now for something completely different

Questions
