
Page 1: Analysis of Death Causes in the STULONG Data Set Jan Burian, Jan Rauch EuroMISE – Cardio University of Economics Prague

Analysis of Death Causes in the STULONG Data Set

Jan Burian, Jan Rauch

EuroMISE – Cardio

University of Economics Prague

Page 2

Discovery Challenge 2003

DEATH CAUSE               PATIENTS       %
myocardial infarction           80    20.6
coronary heart disease          33     8.5
stroke                          30     7.7
other causes                    79    20.3
sudden death                    23     5.9
unknown                          8     2.0
tumorous disease               114    29.3
general atherosclerosis         22     5.7
TOTAL                          389   100.0

Page 3

Data matrix ENTRY

General characteristics: Marital status, Transport to a job, Physical activity in a job, Activity after a job, Education, Responsibility, Age, Weight, Height

Examinations: Chest pain, Breathlessness, Cholesterol, Urine, Subscapular, Triceps

Vices: Alcohol, Liquors, Beer 10, Beer 12, Wine, Smoking, Former smoker, Duration of smoking, Tea, Sugar, Coffee

Page 4

Analytic questions

Are there strong relations concerning death cause?

General characteristics (?) ≈ Death cause (?)

Examinations (?) ≈ Death cause (?)

Vices (?) ≈ Death cause (?)

Combinations (?) ≈ Death cause (?)

Page 5

Example of relation: founded implication

A ⇒ 0.63;15 S

A: Cholesterol<250;273> & Coffee(3 and more cups)
S: Death cause (tumorous disease)

         S    ¬S
A       15     9    24
¬A      99   266   365
       114   275   389

About 63% of patients satisfying A also satisfy S (15/24 = 0.625)

There are 15 patients satisfying both A and S
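The check behind a founded implication can be sketched in a few lines. This is a minimal illustration, assuming the usual reading of the two parameters p;Base as "a/(a+b) ≥ p and a ≥ Base"; the class and function names are hypothetical and are not part of 4ft-Miner. Note that the unrounded ratio is 0.625, so the rule passes a threshold such as p = 0.5 but would not pass p = 0.63 exactly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FourFoldTable:
    """Four-fold contingency table of antecedent A and succedent S."""
    a: int  # patients satisfying A and S
    b: int  # patients satisfying A but not S
    c: int  # patients satisfying S but not A
    d: int  # patients satisfying neither A nor S


def founded_implication(t: FourFoldTable, p: float, base: int) -> bool:
    """Assumed definition: a/(a+b) >= p and a >= base."""
    covered = t.a + t.b
    if covered == 0:
        return False
    return t.a / covered >= p and t.a >= base


# Counts taken from the table on this slide.
table = FourFoldTable(a=15, b=9, c=99, d=266)
confidence = table.a / (table.a + table.b)

print(f"confidence = {confidence:.3f}")            # confidence = 0.625
print(founded_implication(table, p=0.5, base=15))  # True
```

Keeping all four counts in one record makes it easy to evaluate other relations on the same table without recounting patients.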

Page 6

Example of relation: above average

A ⇒+ 0.76;15 S

A ⇒ 0.1;15 S

A: Age( 65)
S: Death cause (general atherosclerosis)

         S    ¬S
A       15   136   151
¬A       7   231   238
        22   367   389

Relative frequency of S: 22/389 = 0.057

Relative frequency of S if A: 15/151 = 0.099

The relative frequency of S if A is 76 per cent higher than the relative frequency of S

There are 15 patients satisfying both A and S
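The arithmetic of the above-average relation can be reproduced as follows. This is a sketch under the assumption that the relation with parameters +p;Base means "a/(a+b) ≥ (1+p)·(a+c)/n and a ≥ Base"; the function name is hypothetical. The unrounded improvement is about 0.7565, so the rule passes a threshold of 0.75 but not a threshold of exactly 0.76.

```python
def above_average(a: int, b: int, c: int, d: int, p: float, base: int) -> bool:
    """Assumed definition: a/(a+b) >= (1+p) * (a+c)/n and a >= base."""
    n = a + b + c + d
    if a + b == 0 or n == 0:
        return False
    return a / (a + b) >= (1 + p) * (a + c) / n and a >= base


# Counts from the slide: a = A and S, b = A and not S, c = not A and S, d = neither.
a, b, c, d = 15, 136, 7, 231
n = a + b + c + d                  # 389 patients in total

freq_s = (a + c) / n               # relative frequency of S: 22/389
freq_s_given_a = a / (a + b)       # relative frequency of S if A: 15/151
improvement = freq_s_given_a / freq_s - 1

print(f"{freq_s:.3f} {freq_s_given_a:.3f} {improvement:.2f}")  # 0.057 0.099 0.76
print(above_average(a, b, c, d, p=0.75, base=15))              # True
```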

Page 7

Example of task

Vices(?) ⇒+ 0.55;15 Death cause (?)

For which combinations of vices is the relative frequency of some death cause at least 55 per cent higher than the relative frequency of the same death cause among all patients?

We require at least 15 patients who have the particular death cause and also satisfy the particular condition.

Relations to be examined:

Liquors(?) & Smoking(?) ⇒+ 0.55;15 Death cause(?)

Alcohol(?) & Tea(?) ⇒+ 0.55;15 Death cause(?)

Beer 12(?) & Wine(?) ⇒+ 0.55;15 Death cause(?)

Liquors(?) & Smoking(?) & Coffee(?) & Beer 12(?) ⇒+ 0.55;15 Death cause(?)

????? ⇒+ 0.55;15 Death cause(?)
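A task of this kind can be pictured as a search over conjunctions of attribute values. The sketch below is a naive brute-force enumeration, not the procedure used by 4ft-Miner; the function name, the attribute names, and the ten toy records are all hypothetical and do not come from STULONG. It assumes the same above-average condition as the task: a/(a+b) ≥ (1+p)·(a+c)/n and a ≥ Base.

```python
from itertools import combinations, product


def mine_above_average(rows, antecedent_attrs, target, p, base, max_len=2):
    """Return (antecedent, target value, a, a+b) for every rule that passes."""
    n = len(rows)
    target_values = sorted({r[target] for r in rows})
    results = []
    for k in range(1, max_len + 1):
        for attrs in combinations(antecedent_attrs, k):
            value_sets = [sorted({r[x] for r in rows}) for x in attrs]
            for values in product(*value_sets):
                # Rows satisfying the conjunction attr1=value1 & attr2=value2 ...
                covered = [r for r in rows
                           if all(r[x] == v for x, v in zip(attrs, values))]
                if not covered:
                    continue
                for s in target_values:
                    a = sum(1 for r in covered if r[target] == s)
                    s_total = sum(1 for r in rows if r[target] == s)
                    if a >= base and a / len(covered) >= (1 + p) * s_total / n:
                        results.append((dict(zip(attrs, values)), s, a, len(covered)))
    return results


# Hypothetical toy records, for illustration only.
rows = (
    [{"beer": "yes", "wine": "yes", "cause": "tumour"}] * 3
    + [{"beer": "yes", "wine": "yes", "cause": "stroke"}] * 1
    + [{"beer": "yes", "wine": "no", "cause": "stroke"}] * 2
    + [{"beer": "no", "wine": "yes", "cause": "tumour"}] * 1
    + [{"beer": "no", "wine": "no", "cause": "stroke"}] * 3
)

results = mine_above_average(rows, ["beer", "wine"], "cause", p=0.5, base=3)
for antecedent, cause, a, covered in results:
    print(antecedent, "->", cause, f"({a} of {covered})")
```

The number of candidate conjunctions grows quickly with the number of attributes and values, which is why the minimum-support parameter Base and a bound on antecedent length matter in practice.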

Page 8

4ft-Miner application

Vices(?) ⇒+ 0.55;15 Death cause (?)

Vices(?) = Antecedent +0.75;15

Death cause(?)

Page 9

Dealing with attributes

An example – Age

Predefined intervals length 10: Age<40,50), Age<50,60), …, Age<70,80)

Predefined intervals length 5: Age<40,45), Age<45,50), …, Age<70,75)

Sliding window length 10

Sliding window length 5

Sliding window length 2

Page 10

Sliding window length 5

Age values: 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, ....., 67, 68, 69, 70, 71, 72, 73, 74

Each window covers five consecutive values: 44 to 48, then 45 to 49, and so on up to 70 to 74.
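The two ways of deriving Age categories can be contrasted with a short sketch. The helper names are hypothetical, and the code only illustrates the idea of disjoint predefined intervals versus overlapping sliding windows of consecutive values; it is not how 4ft-Miner itself represents them.

```python
def predefined_intervals(start: int, stop: int, length: int) -> list[tuple[int, int]]:
    """Disjoint half-open intervals <lo, lo+length) from start up to stop."""
    return [(lo, lo + length) for lo in range(start, stop, length)]


def sliding_windows(values: list[int], length: int) -> list[list[int]]:
    """All runs of `length` consecutive values, each shifted by one position."""
    return [values[i:i + length] for i in range(len(values) - length + 1)]


print(predefined_intervals(40, 80, 10))  # [(40, 50), (50, 60), (60, 70), (70, 80)]

ages = list(range(44, 75))               # 44, 45, ..., 74
windows = sliding_windows(ages, 5)
print(windows[0])                        # [44, 45, 46, 47, 48]
print(windows[-1])                       # [70, 71, 72, 73, 74]
print(len(windows))                      # 27
```

Sliding windows produce many more candidate categories than predefined intervals, so they give a finer search at the cost of more verifications.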

Page 11

Dealing with attributes

Another example – Marital status

Marital status(divorced): 39 patients (10.0 %)

Marital status(single): 28 patients (7.2 %)

Remaining shares in the chart: 81.5 % and 1.3 %

Page 12

Dealing with attributes

Some further examples

Predefined intervals, sliding windows: Cholesterol, Subscapular, Height, Weight, …

Particular values: Activity after job, Physical activity in a job, Education, Transport, Responsibility, …

Page 13

4ft-Miner result example

Beer 12(yes) & Wine(yes) ⇒+ 0.55;15 Death cause (tumorous disease)

Page 14

Tasks: Antecedent ≈ Death cause (?)

Antecedent                                 Relation       Rules   Verifications
General characteristics (9 attributes)     ⇒ 0.5;15           6          70 422
                                           ⇒+ 0.75;15         3          58 685
Examinations (6 attributes)                ⇒ 0.5;15           1           5 754
                                           ⇒+ 0.5;15          5          16 836
Vices (5 attributes)                       ⇒ 0.5;15           0          22 755
                                           ⇒+ 0.55;15         9          20 610
Combinations (1 general + 1 other)         ⇒ 0.5;15          11         186 690
                                           ⇒+ 0.75;15        22         294 288

Solution time in all cases ≤ 8 sec on an Intel Pentium at 3 GHz with 512 MB RAM

Page 15

Conclusions

Only 389 patients with death code

Some potentially interesting rules

Fast work with 4ft-Miner

Possibility of tuning work with attributes: predefined intervals, sliding windows