discovering patterns in adverse drug reactions
Post on 19-Jan-2016
Discovering Patterns in Adverse Drug Reactions
Student: Ernst Joham
Supervisor: Associate Prof Jiuyong Li
Associate Supervisor: Dr. Jan Stanek
Outline
• Background
• Motivation
• Research questions
• Literature review
• Data mining process
• Results
• Conclusion
Background
• What is data mining?
  • Data mining is used to discover unexpected, interesting and valuable information in datasets.
• A high percentage of hospital admissions and prolonged hospital stays are due to adverse drug reactions (ADRs).
• What can cause ADRs?
  • The dosage given to the patient
  • More than one drug taken at the same time
  • Ingredients in a drug that can result in an adverse reaction
Background
• Problems with medical datasets
  • Medical data is more diverse and complex
  • Ethical and legal issues
  • Data quality
    • Missing values
    • Noise
  • Ownership
  • Lack of information
Motivation
• To discover patterns in medical datasets successfully
• To find the most suitable algorithms for handling noise and missing values in medical datasets
• To better handle the complexity and diversity of medical datasets
Research Questions
• The aim of the research was to use data mining methods in an attempt to produce relevant results from real-world medical data.
• The following research questions were answered:
  (1) Is it possible to discover patterns in sparse datasets?
  (2) What patterns can be identified through data mining for ADRs?
Literature review (techniques)
• Decision tree, logic programming, k-nearest neighbour and Bayesian classifier techniques have been applied to medical datasets (Lavrač 1999).
• Lee et al. (2000) state that techniques that easily extract specific knowledge are the key to medical decision making.
• A study on drug discovery showed that neural networks performed better than logistic regression, but decision trees performed better in identifying active compounds (Obenshain 2004).
Literature review (process model)
• Medical data mining applications that are expected to discover new knowledge should follow a five-stage process model (Wang & Wang 2008):
  • planning tasks
  • developing data mining hypotheses
  • preparing data
  • selecting data mining tools
  • evaluating data mining results
• Cios & Moore (2002) state that success requires following the DMKD process model, which adds several steps to the CRISP-DM model and has been applied to several medical problem domains.
Literature review (problems with medical datasets)
• Brown & Kros (2003) focused on the impact of missing data and how existing methods can help. They categorise methods for dealing with missing data into:
  • Use complete data only
  • Delete selected cases or variables
  • Data imputation
  • Model-based approaches
• Some researchers have focused on data cleansing tools to help eliminate noise, but these achieve only a reasonable result (Zhu & Wu 2004).
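Two of the categories listed by Brown & Kros, using complete data only and data imputation, can be sketched as follows. This is an illustration only: the records and the field names `agedays` and `route` are hypothetical, and median imputation is just one simple imputation method, not one taken from the cited paper.

```python
from statistics import median

# Hypothetical records; None marks a missing value.
records = [
    {"agedays": 400, "route": "Oral"},
    {"agedays": None, "route": "Oral"},
    {"agedays": 4000, "route": None},
    {"agedays": 1500, "route": "Intravenous"},
]

def complete_cases(rows):
    """Use complete data only: keep rows that have no missing values."""
    return [r for r in rows if all(v is not None for v in r.values())]

def impute_median(rows, column):
    """Data imputation: fill missing values in `column` with the median
    of the observed values. Returns new rows; the input is not modified."""
    observed = [r[column] for r in rows if r[column] is not None]
    fill = median(observed)
    return [{**r, column: fill if r[column] is None else r[column]} for r in rows]

complete = complete_cases(records)
imputed = impute_median(records, "agedays")
print(len(complete), "complete rows out of", len(records))
print([r["agedays"] for r in imputed])
```

Complete-case analysis discards rows, which matters in a sparse dataset; imputation keeps every row but replaces unknown values with estimates.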
Literature review
• Attribute noise is more difficult to handle and includes (Zhu & Wu 2004):
  • (1) Incorrect attribute values
  • (2) Missing or don't-know attribute values
  • (3) Incomplete attributes or don't-care values
Data Mining Process
• The project followed the six-step CRISP-DM data mining process
  • Understand the main aim of the project
  • Understand the dataset

ADRDATE    | Agedays | BRAND       | DRUG          | ID  | Prob | ROUTE   | Recov | Severity | URNO    | ATC
31/01/2007 |         | Lyclear     | Permethrin    | 707 | Cert | Topical | Rec   | Minor    | unknown | P03AC04
9/06/2003  | 14367   | Tegretol CR | Carbamazepine | 4   | Cert | Oral    | Rec   |          | ax6cx8z | N03AF01
11/06/2003 | 1 4173  | Zoloft      | Sertraline    | 5   | Unc  | Oral    |       |          | ax66486 | N06AB06
Summary of missing values
Total 1286 records
[Table: missing-value and unknown-value counts for ADRDATE, AGEDAYS, ROUTE, RECOV and ATC; extracted cell values in order: 0, 1, 570, 344, 188, 191, NRREC, 82657]
Data Mining Process
• Data in .csv format
• R programming language
• Rattle tool for data mining
• Data preparation
  • Remove duplicates
  • Correct misspelled words
  • Correct meanings of values
  • Find missing ATC (Anatomical Therapeutic Chemical) values
  • Leave missing values for the rest of the dataset
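The preparation steps can be sketched roughly as below. The project itself used R and Rattle; this Python version is only an illustration. The spelling table and the misspelled value "Orla" are hypothetical, while the field names and the two drugs come from the sample records shown earlier.

```python
# Hypothetical spelling corrections; a real list would come from inspecting the data.
SPELLING_FIXES = {"Orla": "Oral", "Topcal": "Topical"}

def prepare(rows):
    """Fix known misspellings in ROUTE, then drop exact duplicate records.
    Missing values (None) are left untouched."""
    seen = set()
    cleaned = []
    for row in rows:
        fixed = dict(row)
        route = fixed.get("ROUTE")
        if route is not None:
            fixed["ROUTE"] = SPELLING_FIXES.get(route, route)
        # Build a hashable key from the record so duplicates can be detected.
        key = tuple(sorted(fixed.items(), key=lambda item: item[0]))
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(fixed)
    return cleaned

rows = [
    {"ID": 4, "DRUG": "Carbamazepine", "ROUTE": "Oral"},
    {"ID": 4, "DRUG": "Carbamazepine", "ROUTE": "Orla"},  # duplicate once the typo is fixed
    {"ID": 5, "DRUG": "Sertraline", "ROUTE": None},        # missing value is kept as-is
]
prepared = prepare(rows)
print(prepared)
```

Correcting spelling before removing duplicates matters here: two records that differ only by a typo would otherwise both survive.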
Data Mining Process
• Data transformation
  • Date when the patient was admitted to hospital for ADRs (October-March = 1, April-September = 0)
  • Age of the patient, categorised into groups with equal numbers of records (0-2 years old = 1, 2-5 years old = 2, 5-11 years old = 3, 11-16 years old = 4, above 16 years of age = 5)
  • Route of administration of the medication that caused the ADR, either oral or intravenous (Oral = 1, Intravenous = 0)
  • Whether the patient recovered from the ADR (Recovered = 0, Not recovered = 1)
  • Whether the drug given to the patient was an antibiotic (Antibiotic = 1, Not antibiotic = 0)
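The encodings above could be written as small functions, as in this sketch. Several details are assumptions, not taken from the slides: dates are read as dd/mm/yyyy, age in days is converted with 365.25 days per year, each age group excludes its upper bound, and a drug counts as an antibiotic when its ATC code falls in group J01 (antibacterials for systemic use).

```python
from datetime import datetime

def encode_adrdate(date_text):
    """October-March = 1, April-September = 0."""
    month = datetime.strptime(date_text, "%d/%m/%Y").month
    return 1 if month >= 10 or month <= 3 else 0

def encode_age(agedays):
    """Age group 1-5 from age in days (boundary handling is an assumption)."""
    years = agedays / 365.25
    for group, upper in enumerate((2, 5, 11, 16), start=1):
        if years < upper:
            return group
    return 5

def encode_route(route):
    """Oral = 1, Intravenous = 0; any other or missing route gives None."""
    return {"Oral": 1, "Intravenous": 0}.get(route)

def encode_recov(recovered):
    """Recovered = 0, Not recovered = 1."""
    return 0 if recovered else 1

def encode_antibiotic(atc):
    """Antibiotic = 1, otherwise 0 (assumes antibiotic means ATC group J01)."""
    return 1 if atc.startswith("J01") else 0

# Values from the sample record for Tegretol CR shown earlier.
print(encode_adrdate("9/06/2003"), encode_age(14367),
      encode_route("Oral"), encode_recov(True), encode_antibiotic("N03AF01"))
```

Returning None for routes such as Topical keeps those records visibly unencoded instead of forcing them into one of the two classes.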
Data Mining Process
[Figure: plots of the variables ADRDATE, AGE, RECOV, ATC and ROUTE]
Data Mining Process
• Modelling phase
  • Logistic regression
  • Decision tree
  • Risk pattern algorithm
• Evaluation phase
• Deployment
Results
• Results for the logistic regression technique

Coefficients:
             Estimate   Std. Error  z value  Pr(>|z|)
(Intercept)  -1.901353  0.466304    -4.077   4.55e-05 ***
ADRDATE       0.136312  0.285722     0.477   0.633
AGEDAYS       0.002067  0.115482     0.018   0.986
ROUTE         0.059532  0.290016     0.205   0.837
ANTIBIOTICS  -0.181255  0.300150    -0.604   0.546
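As a reading aid, the z value and Pr(>|z|) columns can be recomputed from the estimates and standard errors. The sketch below assumes the usual Wald test with a two-sided standard normal p-value, which is how such coefficient tables are normally produced; the odds ratio column is an added illustration and does not appear in the slides.

```python
import math

# (estimate, standard error) pairs copied from the coefficient table.
coefficients = {
    "(Intercept)": (-1.901353, 0.466304),
    "ADRDATE": (0.136312, 0.285722),
    "AGEDAYS": (0.002067, 0.115482),
    "ROUTE": (0.059532, 0.290016),
    "ANTIBIOTICS": (-0.181255, 0.300150),
}

def wald_test(estimate, std_error):
    """Return (z, two-sided p-value) under a standard normal reference."""
    z = estimate / std_error
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p

for name, (estimate, std_error) in coefficients.items():
    z, p = wald_test(estimate, std_error)
    odds_ratio = math.exp(estimate)  # multiplicative change in odds per unit increase
    print(f"{name:12s} z={z:7.3f}  p={p:.3g}  exp(estimate)={odds_ratio:.3f}")
```

Apart from the intercept, every p-value in the table is well above 0.05, so none of the four predictors is individually significant in this model.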
Results
• Decision tree result

1) root 1035 473 1 (0.4570048 0.5429952)
  2) AGE>=3.5 407 140 0 (0.6560197 0.3439803)
    4) ADRDATE< 0.5 203 61 0 (0.6995074 0.3004926) *
    5) ADRDATE>=0.5 204 79 0 (0.6127451 0.3872549)
      10) AGE>=4.5 100 35 0 (0.6500000 0.3500000)
        20) ROUTE>=0.5 79 27 0 (0.6582278 0.3417722) *
        21) ROUTE< 0.5 21 8 0 (0.6190476 0.3809524)
          42) RECOV=Yes 18 6 0 (0.6666667 0.3333333) *
          43) RECOV=NO 3 1 1 (0.3333333 0.6666667) *
Results
• Decision tree result (continued)

      11) AGE< 4.5 104 44 0 (0.5769231 0.4230769)
        22) ROUTE< 0.5 77 30 0 (0.6103896 0.3896104) *
        23) ROUTE>=0.5 27 13 1 (0.4814815 0.5185185) *
  3) AGE< 3.5 628 206 1 (0.3280255 0.6719745)
    6) ROUTE< 0.5 236 109 1 (0.4618644 0.5381356)
      12) RECOV=NO 24 6 0 (0.7500000 0.2500000)
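The tree listing looks like the text printout of R's rpart, where each line gives the node number, the split, the number of records n, the number misclassified (loss), the predicted class and the class probabilities, with * marking a leaf. Assuming that layout, the probabilities in parentheses follow directly from n and loss, as this small check shows.

```python
def class_probabilities(n, loss, yval):
    """Return (P(class 0), P(class 1)) for a two-class node,
    given the node size, the misclassified count and the predicted class."""
    majority = (n - loss) / n
    minority = loss / n
    return (majority, minority) if yval == 0 else (minority, majority)

# (n, loss, predicted class) copied from the listing.
nodes = {
    "root": (1035, 473, 1),
    "AGE>=3.5": (407, 140, 0),
    "AGE< 3.5": (628, 206, 1),
}

for split, (n, loss, yval) in nodes.items():
    p0, p1 = class_probabilities(n, loss, yval)
    print(f"{split:10s} n={n:4d}  P(0)={p0:.7f}  P(1)={p1:.7f}")

# The two children of the root together account for every record in the root.
print(nodes["AGE>=3.5"][0] + nodes["AGE< 3.5"][0] == nodes["root"][0])
```

Read this way, node 3 says that 628 records have AGE below 3.5, that is, fall in age groups 1 to 3, and about 67% of them belong to class 1.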
Results
• Risk patterns for NO

1  3  3.0324  2.4852  26  9   7   ADRDATE = 1, AGEDAYS = 3, ANTIBIOTICS = 0
2  2  3.1002  2.5582  62  46  16  AGEDAYS = 3, ANTIBIOTICS = 0
3  3  2.5663  2.1904  25  9   6   ADRDATE = 1, AGEDAYS = 4, ROUTE = 1
4  3  2.5375  2.1757  34  26  8   AGEDAYS = 4, ROUTE = 1, ANTIBIOTICS = 0

• Pattern 1, where risk ratio = 2.48
  • Agedays = between 5-11 years old
  • Adrdate = months between October-March
  • Antibiotics = No
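A risk ratio compares how often the outcome occurs among records that match a pattern with how often it occurs among records that do not. The sketch below shows the calculation, together with the related odds ratio, on a 2x2 table. The counts are hypothetical: the slides do not say how the numeric columns of the risk pattern table map onto such a table, so no values are taken from it.

```python
def risk_ratio(pattern_outcome, pattern_no_outcome, other_outcome, other_no_outcome):
    """Risk of the outcome with the pattern divided by the risk without it."""
    risk_with = pattern_outcome / (pattern_outcome + pattern_no_outcome)
    risk_without = other_outcome / (other_outcome + other_no_outcome)
    return risk_with / risk_without

def odds_ratio(pattern_outcome, pattern_no_outcome, other_outcome, other_no_outcome):
    """Odds of the outcome with the pattern divided by the odds without it."""
    return (pattern_outcome * other_no_outcome) / (pattern_no_outcome * other_outcome)

# Hypothetical counts: 20 of 100 matching records have the outcome,
# against 10 of 100 records that do not match the pattern.
example = (20, 80, 10, 90)
print(f"risk ratio = {risk_ratio(*example):.4f}")
print(f"odds ratio = {odds_ratio(*example):.4f}")
```

A risk ratio of 2.48, as reported for pattern 1, would mean the outcome is about two and a half times as likely for records matching the pattern as for the rest.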
Conclusion
• Build a data mining process to answer the problem posed.
• Use algorithms that work for medical applications.
• Noise and missing values do pose a problem, but reasonable results can still be achieved.
• More relevant patterns can be produced for medical experts if maximum information is included in the dataset.
References
• Brown, ML & Kros, JF 2003, 'Data mining and the impact of missing data', Industrial Management & Data Systems, vol. 103, pp. 611-621.
• Cios, KJ & Moore, GW 2002, 'Uniqueness of medical data mining', Artificial Intelligence in Medicine, vol. 26, no. 1-2, pp. 1-24.
• CRISP-DM 2000, Cross Industry Standard Process for Data Mining, viewed 27 August 2008, <http://www.crisp-dm.org/Partners/index.htm>.
• Li, J, Fu, AW-c, He, H, Chen, J, Jin, H, McAullay, D, Williams, G, Sparks, R & Kelman, C 2005, Mining risk patterns in medical data, ACM, Chicago, Illinois, USA.
• Lavrač, N 1999, 'Selected techniques for data mining in medicine', Artificial Intelligence in Medicine, vol. 16, no. 1, pp. 3-23.
• Lee, I-N, Liao, S-C & Embrechts, M 2000, 'Data mining techniques applied to medical information', Medical Informatics & the Internet in Medicine, vol. 25, no. 2, pp. 81-102.
• Obenshain, MK 2004, 'Application of Data Mining Techniques to Healthcare Data', Infection Control and Hospital Epidemiology, vol. 25, no. 8, pp. 690-695.
• Safety of Medicines 2002, A Guide to Detecting and Reporting Adverse Drug Reactions: Why Health Professionals Need to Take Action, WHO publications, viewed 15 April 2008, <http://whqlibdoc.who.int/hq/2002/WHO_EDM_QSM_2002.2.pdf>.
• Wang, H & Wang, S 2008, 'Medical knowledge acquisition through data mining', paper presented at the IT in Medicine and Education, 2008. ITME 2008. IEEE International Symposium on, Xiamen
• Zhu, X, Khoshgoftaar, T, Davidson, I & Zhang, S 2007, 'Editorial: Special issue on mining low-quality data', Knowledge and Information Systems, vol. 11, no. 2, pp. 131-136.