Figure 3 - THE 4 STEPS OF THE DATA MINING APPROACH
1 - SELECTION OF ALL THE PATIENTS' FEATURES (see Figure 2): gender, mean age (48 years), comorbidities (10 groups), opportunistic infections (5 groups). Explained variable: average cost (mean annual cost per patient: € 5,981).
A DATA SCIENTIST CONFIGURES THE ALGORITHM: information gain criterion, learning sample ratio, maximal depth, observation weights, statistical pruning, stopping criterion.
2 - THE ALGORITHM SPLITS PATIENTS INTO 2 SUBGROUPS using the most discriminative variable for the cost.
3 - REPETITION OF STEP 2 ON EACH SUBGROUP UNTIL the remaining variables are not discriminative enough, OR there are too few patients in each subgroup to be split again, OR the number of subgroups reaches a given threshold.
4 - RESULTS: 165 PROFILES; each leaf of the tree is a profile of patients.
Example tree (all patients: 30,294, mean annual cost € 5,981):
DID THE PATIENTS HAVE AN INFECTION UNRELATED TO HIV?
- NO: 26,706 patients (88.1%), € 4,473 → PRESENCE OF A BACTERIAL INFECTION?
  - NO: 25,369 patients (83.7%), € 3,850 → PRESENCE OF A CANCER?
    - NO: 21,980 patients (72.6%), € 1,280
    - YES: 3,389 patients (11.2%), € 9,190
  - YES: 1,337 patients (4.4%), € 16,293 → PRESENCE OF AN ENDOCRINO-METABOLIC COMPLICATION?
    - NO: 804 patients (2.6%), € 12,016
    - YES: 533 patients (1.8%), € 22,746
- YES: 3,463 patients (11.4%), € 17,602 → PRESENCE OF A CANCER?
  - NO: 2,447 patients (8.1%), € 13,084
  - YES: 1,016 patients (3.5%), € 28,484
EXAMPLE OF A PROFILE (804 patients, € 12,016): 1 - no infection unrelated to HIV; 2 - presence of a bacterial infection; 3 - absence of an endocrino-metabolic comorbidity.
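The tree in Figure 3 is read as a chain of yes/no questions, and each leaf is a patient profile. The minimal sketch below encodes the example branches of that tree as a small Python data structure and routes one patient to the leaf matching their answers. The `Node` class and the feature keys (`unrelated_infection`, `bacterial_infection`, `cancer`, `endocrino_metabolic`) are hypothetical names chosen for illustration; only the patient counts and mean costs come from the figure.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """A subgroup of patients; internal nodes also carry the yes/no question tested."""
    patients: int
    mean_cost: int                   # mean annual cost per patient, in euros
    question: Optional[str] = None   # hypothetical feature key; None for a leaf (profile)
    no: Optional["Node"] = None
    yes: Optional["Node"] = None


# Example branches, with counts and mean costs as printed in Figure 3.
TREE = Node(30_294, 5_981, "unrelated_infection",
            no=Node(26_706, 4_473, "bacterial_infection",
                    no=Node(25_369, 3_850, "cancer",
                            no=Node(21_980, 1_280),
                            yes=Node(3_389, 9_190)),
                    yes=Node(1_337, 16_293, "endocrino_metabolic",
                             no=Node(804, 12_016),
                             yes=Node(533, 22_746))),
            yes=Node(3_463, 17_602, "cancer",
                     no=Node(2_447, 13_084),
                     yes=Node(1_016, 28_484)))


def find_profile(node: Node, patient: dict) -> Node:
    """Follow the patient's yes/no answers from the root until a leaf is reached."""
    while node.question is not None:
        node = node.yes if patient[node.question] else node.no
    return node


if __name__ == "__main__":
    answers = {"unrelated_infection": False, "bacterial_infection": True,
               "endocrino_metabolic": False}
    leaf = find_profile(TREE, answers)
    print(leaf.patients, leaf.mean_cost)
```

Only the answers on the patient's own path are needed; the example patient above lands in the 804-patient profile used as the poster's example.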
Prediction of costs associated with the hospital management of HIV patients in France
ADVANCED DATA MINING APPROACH
Martin PRODEL (1,3), Camille AMAZ (1), Alexandre VAINCHTOCK (1), Laurent FINKIELSZTEJN (2), Céline AUBIN (2)
(1) HEVA, Lyon, France; (2) ViiV Healthcare, Marly-le-Roi, France; (3) Mines, Saint-Étienne, France
INTRODUCTION
Data mining is not new and has been used effectively in other domains (banking, cybernetics, marketing, energy, etc.) (1). The amount of data collected in medical information systems is tremendous, but much of it remains unused because of its complexity (2). This ever-increasing volume of data requires new analytical approaches that are efficient, sensitive and better suited than classical statistics to handling Big Data (1).
CONCLUSION
CRITERIA: DATA MINING vs STATISTICS
PARADIGM
- Data mining: explanatory approach; discovers the unknown with no preconception.
- Statistics: confirms or refutes predefined hypotheses; risk of experimenter bias.
MAIN FEATURE
- Data mining: unrivalled quality of predictions and explanations, assessed with cross-validation; extrapolation is used for missing data.
- Statistics: precise measurement of uncertainty, comparison of 2 populations, confidence assessment of measurements.
SECOND FEATURE
- Data mining: high adaptability and reusability in other domains.
- Statistics: mathematically defined statistical tests ensure a rigorous assessment of significance.
EXAMPLES OF TECHNIQUES
- Data mining: decision tree, neural network, Bayesian methods, genetic algorithm, support vector machine.
- Statistics: principal component analysis, p-value testing, multivariate and univariate regression.
VOLUME
- Data mining: capable of handling millions of records ("Big Data").
- Statistics: suitable for medium-size datasets (< 1 million observations).
TYPICAL RESULTS
- Data mining: selects and reports the non-linear combinations of many variables that best explain the value of the target variable.
- Statistics: logistic regression tells you that X% of the cost is explained by having one comorbidity or another; PCA tells you whether variables are redundant.
OBJECTIVE
The main objective of this study is to assess whether a state-of-the-art data mining technique can be applied to a healthcare database to better understand the drivers of healthcare expenditure and the management of diseases. We used the approach to identify the patient features that are the main drivers of the costs associated with the hospital management of HIV patients. A secondary objective is to assess how reproducible such an approach is on other medical databases with potentially different patient features.
RESULTS
KEY FIGURES
Study period: January 2013 to December 2014
Number of HIV patients: 30,294
Number of stays: 70,180
Total annual cost: € 180 million
Mean annual cost per patient: € 5,981 (± 10,661)
Number of patient profiles found using data mining: 165
METHODS
DATA EXTRACTION
The PMSI-MSO (French Medical Information System - Medicine, Surgery, Obstetrics units) database was used to extract all hospital stays in 2013 with at least one of the following HIV ICD-10 (International Classification of Diseases, 10th revision) codes as principal diagnosis, related diagnosis or significantly associated diagnosis: B20*, B21*, B22*, B23*, B24*.
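The inclusion rule above amounts to a prefix match on the diagnosis codes of each stay. Below is a minimal Python sketch, assuming each stay is a dictionary with hypothetical keys `year`, `principal`, `related` and `associated` (a list), holding ICD-10 codes as strings such as "B20.1" or "B201"; the real PMSI field names and formats are not described on the poster.

```python
# ICD-10 categories that define an HIV stay (B20* to B24*).
HIV_PREFIXES = ("B20", "B21", "B22", "B23", "B24")


def is_hiv_stay(stay: dict) -> bool:
    """True if the principal, related or any significantly associated diagnosis
    starts with one of the HIV categories B20 to B24 (dots and case ignored)."""
    codes = [stay.get("principal"), stay.get("related")]
    codes += stay.get("associated", [])
    return any(
        code is not None and code.upper().replace(".", "").startswith(HIV_PREFIXES)
        for code in codes
    )


def select_inclusion_stays(stays: list[dict]) -> list[dict]:
    """Keep the 2013 stays that satisfy the HIV inclusion rule."""
    return [s for s in stays if s.get("year") == 2013 and is_hiv_stay(s)]
```

`str.startswith` accepts a tuple of prefixes, so one call covers the five categories.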
HIV HOSPITALIZATION COST
The cost of stays was evaluated from an NHS perspective. The PMSI database provides detailed information on the cost of stays, as it was specifically designed for the economic evaluation of hospital activities.
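The key figures (mean annual cost per patient € 5,981 ± 10,661) come from summing stay costs per patient and then summarizing across patients. A minimal sketch with Python's standard `statistics` module; the `(patient_id, cost)` input format is an assumption, and the poster does not say whether the ± value is a sample or population standard deviation (the sketch uses the sample version, `stdev`, which needs at least two patients).

```python
from collections import defaultdict
from statistics import mean, stdev


def annual_cost_per_patient(stays: list[tuple[str, float]]) -> dict[str, float]:
    """Sum the cost of every stay belonging to the same patient over the follow-up year."""
    totals: dict[str, float] = defaultdict(float)
    for patient_id, cost in stays:
        totals[patient_id] += cost
    return dict(totals)


def summarize(costs: dict[str, float]) -> tuple[float, float, float]:
    """Return (total cost, mean cost per patient, sample standard deviation)."""
    values = list(costs.values())
    return sum(values), mean(values), stdev(values)
```

As a sanity check on the reported figures, 30,294 patients × € 5,981 ≈ € 181 million, consistent with the total annual cost of about € 180 million.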
OUR DATA MINING ALGORITHM
Data were analyzed with a supervised-learning data mining technique: an enhanced decision tree algorithm based on Breiman's CART model (3) (Figure 3).
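The enhancements of the authors' algorithm are not described, so the following is only a sketch of the plain CART idea it builds on: with a numeric target (annual cost) and yes/no patient features, each split is the one that most reduces the sum of squared errors, and growth stops under the three conditions listed in Figure 3. The parameter names `min_gain`, `min_leaf` and `max_leaves` are hypothetical stand-ins for those stopping rules; observation weights, statistical pruning and the learning sample ratio are not reproduced.

```python
from dataclasses import dataclass


def sse(costs):
    """Sum of squared errors around the mean: the CART impurity for a numeric target."""
    if not costs:
        return 0.0
    m = sum(costs) / len(costs)
    return sum((c - m) ** 2 for c in costs)


def best_split(rows, costs, features, min_leaf):
    """Return (gain, feature) for the yes/no feature that most reduces the SSE, or None."""
    parent = sse(costs)
    best = None
    for f in features:
        yes = [c for r, c in zip(rows, costs) if r[f]]
        no = [c for r, c in zip(rows, costs) if not r[f]]
        if len(yes) < min_leaf or len(no) < min_leaf:
            continue  # rule 2: too few patients on one side to split
        gain = parent - sse(yes) - sse(no)
        if best is None or gain > best[0]:
            best = (gain, f)
    return best


@dataclass
class Profile:
    """A leaf of the tree: the answers that define it and the patients it contains."""
    path: tuple
    rows: list
    costs: list

    @property
    def mean_cost(self):
        return sum(self.costs) / len(self.costs)


def grow(rows, costs, features, min_gain=0.0, min_leaf=1, max_leaves=8):
    """Best-first CART-style growth: always split the leaf with the largest SSE reduction."""
    leaves = [Profile((), rows, costs)]
    while len(leaves) < max_leaves:  # rule 3: number of subgroups reaches a threshold
        candidates = []
        for i, leaf in enumerate(leaves):
            split = best_split(leaf.rows, leaf.costs, features, min_leaf)
            if split is not None and split[0] > min_gain:  # rule 1: still discriminative
                candidates.append((split[0], i, split[1]))
        if not candidates:
            break
        _, i, f = max(candidates)
        leaf = leaves.pop(i)
        for answer in (False, True):
            keep = [k for k, r in enumerate(leaf.rows) if bool(r[f]) == answer]
            leaves.append(Profile(leaf.path + ((f, answer),),
                                  [leaf.rows[k] for k in keep],
                                  [leaf.costs[k] for k in keep]))
    return leaves
```

Best-first growth is used here because it makes the "number of subgroups reaches a given threshold" rule straightforward to enforce; a depth-first recursion with a maximal depth would be an equally valid reading of the figure.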
JESS 2017, 20 April 2017
Télécom ParisTech
The present study is a pilot that successfully demonstrates how a data mining technique from the field of artificial intelligence can help us better understand hospital costs for HIV patients: our data mining algorithm identified specific patient profiles that explain the differentiating cost drivers in HIV inpatient care. Classical statistical approaches would struggle to provide such detailed profiles with numerous combinations of variables.
1 - This approach could work with other data sources, especially with more clinical and laboratory data (e.g. viral load). It can also explain the drivers of any available target variable (cost, death, virologically suppressed patients or patients dropping out of care).
2 - Such collaborative efforts between healthcare professionals and engineers can lead to enhanced uses of health data that provide new answers and thus improve disease management. Example: preventing a bacterial infection in newly diagnosed HIV patients with endocrino-metabolic comorbidities reduces costs much more than preventing a fungal infection.
3 - Based on the results of data mining analyses, clinicians can identify which medical interventions to focus on and prioritize. Example: results show that newly diagnosed HIV patients are more likely to get bacterial infections if they already have neurological and renal comorbidities (predictive capabilities).
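Point 1 notes that the same approach can explain binary targets such as death or virological suppression. For such targets, a split can be scored with an information gain criterion (the entropy of the parent group minus the size-weighted entropy of the two subgroups) instead of the cost variance. A minimal sketch, assuming the outcome is encoded as 0/1 per patient; this is the textbook entropy-based gain, not necessarily the exact criterion of the authors' enhanced algorithm.

```python
from math import log2


def entropy(labels: list[int]) -> float:
    """Shannon entropy, in bits, of a list of 0/1 outcomes."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return -sum(q * log2(q) for q in (p, 1 - p) if q > 0)


def information_gain(labels: list[int], split: list[bool]) -> float:
    """Entropy reduction obtained by splitting the patients on a yes/no feature."""
    yes = [y for y, s in zip(labels, split) if s]
    no = [y for y, s in zip(labels, split) if not s]
    n = len(labels)
    children = len(yes) / n * entropy(yes) + len(no) / n * entropy(no)
    return entropy(labels) - children
```

A feature that perfectly separates the two outcomes in a balanced group yields a gain of 1 bit; a feature unrelated to the outcome yields a gain close to 0.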
REFERENCES
1 - Karahoca A, Karahoca D, Şanver M (2012). Survey of Data Mining and Applications (Review from 1996 to Now). In: Karahoca A (Ed.), Data Mining Applications in Engineering and Medicine. InTech. DOI: 10.5772/48803.
2 - Herland M, Khoshgoftaar TM, Wald R (2014). A review of data mining using big data in health informatics. Journal of Big Data, 2014: 1-35.
3 - Breiman L, Friedman JH, Olshen RA, Stone CJ (1984). Classification and Regression Trees. Wadsworth, Belmont, CA. Republished by CRC Press.
Figure 1 - METHODOLOGICAL DIVERGENCES BETWEEN DATA MINING AND STATISTICS
1-YEAR FOLLOW-UP OF PATIENTS & CLASSIFICATION OF STAYS
For each patient hospitalized at least once in 2013 with an HIV code, a 12-month follow-up was performed to capture any further hospital stays. The same algorithm and medical review as for the inclusion stay were applied to these additional stays in order to exclude stays unrelated to HIV. All stays were classified by HIV clinician experts into 10 groups of comorbidities and 5 groups of opportunistic infections (OI) using ICD-10 codes (Figure 2).
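A minimal sketch of the 12-month follow-up, assuming each stay is a dict with hypothetical keys `patient`, `admission` (a `datetime.date`) and `codes` (ICD-10 strings). The algorithm plus medical review is represented by a caller-supplied predicate `is_hiv_related`, the index stay is taken to be the patient's first HIV stay in 2013, and 12 months is approximated as 365 days. The chapter-letter mapping `GROUP_BY_FIRST_LETTER` is purely illustrative and is not the experts' classification.

```python
from datetime import date, timedelta


def follow_up_stays(stays, is_hiv_related, days=365):
    """Collect, per included patient, the HIV-related stays within `days` of the index stay."""
    index_dates = {}
    for s in stays:  # index stay: first HIV-related stay admitted in 2013
        if s["admission"].year == 2013 and is_hiv_related(s):
            pid = s["patient"]
            if pid not in index_dates or s["admission"] < index_dates[pid]:
                index_dates[pid] = s["admission"]
    followed = {pid: [] for pid in index_dates}
    for s in stays:
        start = index_dates.get(s["patient"])
        if start is None or not is_hiv_related(s):
            continue  # patient not included, or stay excluded as unrelated to HIV
        if start <= s["admission"] < start + timedelta(days=days):
            followed[s["patient"]].append(s)
    return followed


# Hypothetical chapter-level mapping, for illustration only.
GROUP_BY_FIRST_LETTER = {"C": "Cancer", "E": "Endocrino-metabolic", "F": "Psychiatric",
                         "G": "Neurological", "I": "Cardiovascular", "J": "Respiratory diseases"}


def patient_flags(followed):
    """Yes/no input variables: did any followed stay carry a code from each group?"""
    flags = {group: False for group in GROUP_BY_FIRST_LETTER.values()}
    for s in followed:
        for code in s["codes"]:
            group = GROUP_BY_FIRST_LETTER.get(code[:1].upper())
            if group:
                flags[group] = True
    return flags


if __name__ == "__main__":
    demo = [{"patient": "p1", "admission": date(2013, 3, 1), "codes": ["B20.1", "E11.9"]}]
    print(patient_flags(follow_up_stays(demo, lambda s: True)["p1"]))
```

The resulting per-patient flags are the kind of yes/no input variables a decision tree can split on.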
Figure 4 panels: annual cost per patient (axis from € 1,500 to € 43,000) for the ENTIRE COHORT (30,294 patients, mean cost € 5,981) and PROFILES 1 to 4, each defined by the answers (NO / YES) to five questions:
1 - Infections unrelated to HIV?
2 - Bacterial opportunistic infection?
3 - Presence of a cancer?
4 - Fungal opportunistic infection?
5 - Endocrino-metabolic comorbidity?
Subgroups shown: 25,369 p. (84%, mean cost € 3,850); 19,078 p. (63%, mean cost € 2,388); 533 p. (2%, mean cost € 22,746); 632 p. (2%, mean cost € 33,830).
Figure 2 - INPUT VARIABLES FOR THE ANALYSIS OF PATIENT PROFILES WITH DATA MINING
Comorbidities (patients, %):
Endocrino-metabolic: 8,373 p. (28%)
Cardiovascular: 7,616 p. (25%)
Psychiatric: 7,439 p. (25%)
Hepatic: 6,906 p. (23%)
Cancer: 4,724 p. (16%)
Neurological: 4,576 p. (15%)
Respiratory diseases: 4,084 p. (13%)
Infections unrelated to HIV: 3,486 p. (12%)
Renal: 3,011 p. (10%)
Bone: 2,885 p. (10%)
Opportunistic infections (patients, %):
Bacterial: 2,058 p. (7%)
Fungal: 1,419 p. (5%)
Viral: 1,369 p. (5%)
Protozoan: 660 p. (2%)
Other: 614 p. (2%)
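The percentages in Figure 2 are consistent with the number of patients in each group divided by the 30,294 patients of the cohort. A quick check in Python using the counts from the figure; the denominator is inferred from this agreement and is not stated explicitly on the poster.

```python
COHORT = 30_294

# Patients per input group, as printed in Figure 2.
COUNTS = {
    "Endocrino-metabolic": 8_373, "Cardiovascular": 7_616, "Psychiatric": 7_439,
    "Hepatic": 6_906, "Cancer": 4_724, "Neurological": 4_576,
    "Respiratory diseases": 4_084, "Infections unrelated to HIV": 3_486,
    "Renal": 3_011, "Bone": 2_885,
    "Bacterial OI": 2_058, "Fungal OI": 1_419, "Viral OI": 1_369,
    "Protozoan OI": 660, "Other OI": 614,
}


def prevalence(counts: dict[str, int], cohort: int) -> dict[str, int]:
    """Share of the cohort in each group, rounded to a whole percentage."""
    return {group: round(100 * n / cohort) for group, n in counts.items()}


for group, pct in prevalence(COUNTS, COHORT).items():
    print(f"{group}: {COUNTS[group]:,} p. ({pct}%)")
```

Because a patient can belong to several groups, the shares are not expected to sum to 100%.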
Figure 4 - DISCOVERY OF PATIENT PROFILES
70,180 stays; € 180 million total annual cost