Discovery Challenge – ECML/PKDD2004
September 20, 2004, Pisa, Italy
Atherosclerosis
Marie Tomečková EuroMISE Centre – Cardio
Institute of Computer Science, Academy of Sciences of the CR, Prague, The Czech Republic
Supported by the project LN00B107 of the Ministry of Education of the Czech Republic
Atherosclerosis
• a complex disease affecting blood vessels throughout the whole organism
• a dynamic process that begins in childhood and adolescence and continues throughout life
• views on the origin and progression of the disease are still evolving
• interaction and influence of genetic predisposition and the external environment
• the influence of so-called risk factors is still taken into account
• on the other hand, there are also so-called protective factors
Risk factors of atherosclerosis
• non-modifiable: sex, age, family history
• modifiable:
  • lifestyle factors: physical activity, smoking, reaction to stress
  • blood pressure, metabolic factors (lipid and glucose levels, homocysteine)
• many other factors: coagulopathies, infections, inflammation, factors changing the function of the endothelium, social and psychological factors
• combinations, clustering and interactions: Reaven's syndrome
STULONG (acronym)
• LONGitudinal twenty-year STUdy of risk factors of atherosclerosis
• The study was carried out in the years 1975-2000 at the 2nd Dept. of Internal Medicine, 1st Faculty of Medicine of Charles University, Prague, the Czech Republic
• The data were transferred to electronic form by the European Centre of Medical Informatics, Statistics, and Epidemiology of Charles University and the Academy of Sciences of the Czech Republic
STULONG
Main aims of the study:
• To determine the prevalence of the risk factors of atherosclerosis in middle-aged men
• To follow the development of the risk factors
• To assess the possibilities and the influence of complex intervention on the incidence and values of the risk factors and on cardiovascular mortality
Population:
• urban population of middle-aged men (centre of Prague)
• 2370 men were invited
• 1417 men were examined; the response rate was 59%
• Middle-aged men are the population most threatened by atherosclerosis and its consequences
Definition of risk factors
| Risk factor | Defining value |
|---|---|
| blood pressure | 160/95 mm Hg |
| cholesterol | 260 mg% (6.7 mmol/l) |
| smoking | 15 cigarettes/day |
| obesity | 15% above optimal weight |
| positive family history | premature death from atherosclerotic disease (parents, siblings) |
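The definitions above can be read as simple threshold rules. The following is a minimal sketch, not the study's actual procedure. It assumes that each value is a lower bound ("at or above"), that hypertension is flagged when either the systolic or the diastolic limit is reached, and that "15% above optimal weight" corresponds to a Broca index of 115. The `EntryExam` class and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class EntryExam:
    """Hypothetical container for one man's entry examination."""
    systolic: float            # mm Hg
    diastolic: float           # mm Hg
    cholesterol_mmol: float    # mmol/l
    cigarettes_per_day: int
    broca_index: float         # % of optimal weight (100 = optimal)
    family_history: bool       # premature atherosclerotic death in parents/siblings


def risk_factors(exam):
    """Return the names of the risk factors present, using the slide's limits."""
    found = []
    # Assumption: reaching either blood pressure limit counts as hypertension.
    if exam.systolic >= 160 or exam.diastolic >= 95:
        found.append("hypertension")
    if exam.cholesterol_mmol >= 6.7:
        found.append("hypercholesterolemia")
    if exam.cigarettes_per_day >= 15:
        found.append("smoking")
    # Assumption: 15% above optimal weight means Broca index >= 115.
    if exam.broca_index >= 115:
        found.append("obesity")
    if exam.family_history:
        found.append("positive family history")
    return found


# Made-up example values, not taken from the STULONG data.
example = EntryExam(165, 90, 6.1, 20, 108.0, False)
print(risk_factors(example))  # ['hypertension', 'smoking']
```

The length of the returned list gives the number of risk factors per man, which is the quantity used to group the men in the mortality and survival slides.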
STULONG - analysis
Statistical:
- descriptive statistics
- logistic regression
- survival analysis
Data mining:
- different methods
- resulting in different conclusions
Basic characteristics of men in STULONG (risk group: at least 1 RF, without the disease)
• Prevalence of risk factors at entry

| RF | n | % |
|---|---|---|
| hypercholesterolemia | 290 | 34.2 |
| hypertension | 287 | 34.0 |
| smoking | 543 | 63.3 !!! |
| obesity | 196 | 23.0 |
| positive family history | 216 | 25.3 |
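Prevalence of risk factors in risk group (stacked bar chart, % of men)

| Risk factor | n | yes (%) | no (%) |
|---|---|---|---|
| smoking | 860 | 63.1 | 36.9 |
| TCH (total cholesterol) | 851 | 34.3 | 65.7 |
| HT (hypertension) | 848 | 34.2 | 65.8 |
| RA (family history) | 855 | 25.4 | 74.6 |
| obesity | 856 | 23.2 | 76.8 |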
Basic characteristics of men in STULONG (risk group, age 46.1±3.6)

| Characteristic | mean | s (±) |
|---|---|---|
| Nr of RF | 1.7 | 2.0 |
| cholesterol (mmol/l) | 6.25 | 5.4 |
| systolic blood pressure (mm Hg) | 134.4 | 67.3 |
| diastolic blood pressure (mm Hg) | 85.3 | 47.5 |
| Nr of cig/day | 9.4 | 25.0 |
| Broca index (%) | 106.8 | 47.4 |
Mortality depending on the number of RF (atherosclerotic cardiovascular diseases)

| Number of RF | Mortality (per 1000) |
|---|---|
| 0 | 1.6 |
| 1 | 5.0 |
| 2 | 8.5 |
| 3 | 10.8 |
| 4 | 20.0 |
Survival analysis
| Year | N | 1 RF | 2 RF | 3 RF | 4 RF |
|---|---|---|---|---|---|
| 10th year | 99 % | 98 % | 93 % | 93 % | 91 % |
| 20th year | 97 % | 89 % | 84 % | 80 % | 63 % |
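The slide reports survival percentages but does not say which estimator produced them. As an illustration only, the sketch below implements the Kaplan-Meier product-limit estimator, a common choice for this kind of follow-up data. The function name and the toy follow-up times are assumptions and are not taken from the STULONG data.

```python
def kaplan_meier(durations, events):
    """Return [(time, survival probability)] for each time with at least one death.

    durations: follow-up time of each man (e.g. in years)
    events:    1 if the man died at that time, 0 if he was censored
    """
    data = sorted(zip(durations, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group all men who share the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            # Product-limit step: multiply by the fraction surviving time t.
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


# Made-up follow-up times (years) and event flags.
durations = [5, 8, 12, 20, 20, 20]
events = [1, 0, 1, 0, 0, 0]
for time, prob in kaplan_meier(durations, events):
    print(time, round(prob, 3))
```

Running the estimator separately for each group (N, 1 RF, 2 RF, and so on) and reading the curve at 10 and 20 years would give a table with the same layout as the one above.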
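The relative risk of death caused by atherosclerotic CVD

| variable | category | RR | p |
|---|---|---|---|
| age | 35-44 | 1.00 | ref. |
| age | 45-55 | 2.08 | 0.001 |
| education | basic | 1.00 | ref. |
| education | vocational | 0.80 | 0.398 |
| education | middle | 0.67 | 0.151 |
| education | university | 0.36 | 0.003 |
| smoking | no | 1.00 | ref. |
| smoking | yes | 2.36 | <0.001 |
| BP | <=140/90 mm Hg | 1.00 | ref. |
| BP | 141/91-159/94 mm Hg | 1.36 | 0.301 |
| BP | >=160/95 mm Hg | 2.49 | <0.001 |
| TCH | <5.2 mmol/l | 1.00 | ref. |
| TCH | 5.2-6.6 mmol/l | 1.50 | 0.154 |
| TCH | >=6.7 mmol/l | 1.87 | 0.034 |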
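The slide does not state how the RR values were estimated; they may well come from a regression model. To show what a relative risk expresses, here is a minimal sketch that computes a crude (unadjusted) relative risk as the ratio of two cumulative incidences. The function name and all counts are hypothetical.

```python
def relative_risk(exposed_events, exposed_total, reference_events, reference_total):
    """Crude relative risk: incidence in the exposed group / incidence in the reference group."""
    risk_exposed = exposed_events / exposed_total
    risk_reference = reference_events / reference_total
    return risk_exposed / risk_reference


# Made-up counts: 30 deaths among 500 smokers, 10 deaths among 400 non-smokers.
rr = relative_risk(30, 500, 10, 400)
print(round(rr, 2))  # 2.4
```

A value of 1.00 marks the reference category, which is why each variable in the table has one row with RR 1.00 and no p-value.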
Discovery Challenges
Atherosclerosis: a growing number of papers
• 2002, Helsinki: 5 papers
• 2003, Cavtat: 9 papers
• 2004, Pisa: 11 papers
Four data files for analysis (data mining)
• Entry: attributes obtained from the entry examination; 1417 men, 244 attributes per man
• Control: attributes recorded during the follow-up (changes in social and health status, values of the followed risk factors, therapy …); 10 600 examinations, each with 66 attributes
• Letter: additional information collected at the end of the study by postal questionnaire (men who had left the follow-up); 403 men, 62 attributes per man
• Death: date and cause of death; 389 men
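Because the four files describe the same men at different points in time, most analyses start by linking them per man. The sketch below shows one way to do that with plain Python dictionaries. The key name `patient_id`, the field names, and all records are hypothetical; the real files' column names are not given in these slides.

```python
from collections import defaultdict

# Tiny made-up records standing in for the Entry, Control and Death files.
entry = [{"patient_id": 1, "smoker": True}, {"patient_id": 2, "smoker": False}]
control = [
    {"patient_id": 1, "year": 1985, "systolic": 145},
    {"patient_id": 1, "year": 1980, "systolic": 150},
    {"patient_id": 2, "year": 1981, "systolic": 130},
]
death = [{"patient_id": 1, "year": 1995, "cause": "myocardial infarction"}]


def link_records(entry, control, death):
    """Group control examinations and death records under each man's entry record."""
    visits = defaultdict(list)
    for row in control:
        visits[row["patient_id"]].append(row)
    deaths = {row["patient_id"]: row for row in death}

    linked = {}
    for man in entry:
        pid = man["patient_id"]
        linked[pid] = {
            "entry": man,
            # One-to-many: a man has several control examinations, ordered by year.
            "controls": sorted(visits[pid], key=lambda r: r["year"]),
            # One-to-zero-or-one: None when no death record exists.
            "death": deaths.get(pid),
        }
    return linked


linked = link_records(entry, control, death)
print(len(linked[1]["controls"]), linked[2]["death"])  # 2 None
```

The Letter file could be attached in the same way as the Death file, since it holds at most one questionnaire per man.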
Four groups of analytic questions
Related to:
• the entry examination
• the long-term observation (follow-up)
• the postal questionnaire at the end of the study
• the relations between the entry examination, control examinations, and death
Approaches to solve the analytic questions – 1:
given in the past Discovery Challenges
• Univariate and bivariate data analysis
• Association rules
• SDS rules (Set Differs from Set)
• Trend analysis
• Time windows analysis
• ROC analysis
• Discriminant function
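Of the approaches listed above, association rules are the easiest to illustrate briefly. The sketch below computes the two standard measures of a rule "antecedent -> consequent": support (share of all records containing both sides) and confidence (share of records with the antecedent that also contain the consequent). It does not reproduce any specific tool used in the challenge papers; the function name and the records are made up.

```python
def rule_metrics(records, antecedent, consequent):
    """Return (support, confidence) of the rule antecedent -> consequent.

    records: list of sets, each holding the attributes present for one man
    """
    n = len(records)
    if n == 0:
        return 0.0, 0.0
    has_antecedent = [r for r in records if antecedent <= r]
    has_both = [r for r in has_antecedent if consequent <= r]
    support = len(has_both) / n
    confidence = len(has_both) / len(has_antecedent) if has_antecedent else 0.0
    return support, confidence


# Made-up records: each set lists the risk factors found in one man.
records = [
    {"smoking", "hypertension"},
    {"smoking", "hypertension", "obesity"},
    {"smoking"},
    {"obesity"},
]
support, confidence = rule_metrics(records, {"smoking"}, {"hypertension"})
print(support, round(confidence, 3))  # 0.5 0.667
```

Here the rule "smoking -> hypertension" holds in 2 of 4 records (support 0.5) and in 2 of the 3 smokers (confidence about 0.667).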
Approaches to solve the analytic questions – 2:
• Fuzzy approximate dependencies, fuzzy logic
• Functional dependencies
• Inductive logic programming technique
• Explicit relations
• The selection of the strongest emerging patterns
• Genetic approach
• Approach to generate a mathematical algebraic model
Analytic questions - some results
• Protective influence of the number of visits
• Protective influence of beer drinking, but not of wine drinking
• Correlation of Body Mass Index with skinfolds: very good discrimination of the three basic groups of men (normal, risk, pathological)
Further use and publications of the STULONG data
are possible only under the condition of the following explicit quotation:
„The study (STULONG) was realized at the 2nd Department of Internal Medicine, 1st Faculty of Medicine of Charles University and University Hospital, Prague 2, Czech Republic (head Prof. M. Aschermann, MD, SDr, FECS), under the supervision of Prof. F. Boudík, MD, SDr, with the collaboration of M. Tomečková, MD, PhD, and Ass. Prof. J. Bultas, MD, PhD. The data were transferred to the electronic form by the European Centre of Medical Informatics, Statistics, and Epidemiology of Charles University and Academy of Sciences of Czech Republic (head Prof. RNDr J. Zvárová, SDr).”
At present, the data analysis is supported by project No. LN00B107 of the Ministry of Education of the CR.
Thank you
for your effort in the STULONG data set analysis
and for your attention
Marie Tomečková
EuroMISE Centre – Cardio
Pod Vodárenskou věží 2
182 07 Prague, The Czech Republic
http://www.euromise.cz