Discovering Patterns in Adverse Drug Reactions

Student: Ernst Joham

Supervisor: Associate Prof Jiuyong Li

Associate Supervisor: Dr. Jan Stanek

• Background
• Motivation
• Research questions
• Literature review
• Data mining process
• Results
• Conclusion

Outline

• What is data mining?

Data mining is used to discover unexpected, interesting and valuable information in datasets.

• A high percentage of hospital admissions and prolonged hospitalisations are due to adverse drug reactions (ADRs).

• What can cause ADRs?
  • The dosage given to the patient
  • More than one drug taken at the same time
  • Ingredients in drugs that can result in an adverse reaction

Background

• Problems with medical datasets
  • Medical data is more diverse and complex
  • Ethical and legal issues
  • Data quality
    • Missing values
    • Noise
  • Ownership
  • Lack of information

Motivation

• Achieve a successful outcome in discovering patterns in medical datasets

• Find the most suitable algorithms for handling noise and missing values in medical datasets

• Improve the handling of the complexity and diversity of medical datasets

• The aim of the research was to use data mining methods in an attempt to produce relevant results from real world medical data.

• The following research questions were answered:

(1) Is it possible to discover patterns in sparse datasets?

(2) What patterns can be identified through data mining for ADRs?

Research Questions

• Decision tree, inductive logic programming, k-nearest neighbour and Bayesian classifier techniques have been applied to medical datasets (Lavrač 1999).

• Lee et al. (2000) state that techniques which make it easy to extract specific knowledge are key to medical decision making.

• A study on drug discovery showed that neural networks performed better than logistic regression, but decision trees performed better at identifying active compounds (Obenshain 2004).

Literature review (techniques)

• Medical data mining applications that are expected to discover new knowledge should follow a five-stage process model (Wang & Wang 2008):
  • planning tasks
  • developing data mining hypotheses
  • preparing data
  • selecting data mining tools
  • evaluating data mining results

• Cios & Moore (2002) state that success requires following the DMKD process model, which adds several steps to the CRISP-DM model and has been applied to several medical problem domains.

Literature review (process model)

• Brown & Kros (2003) focused on the impact of missing data and how existing methods can help.

They categorise methods for dealing with missing data into:
  • Use complete data only
  • Delete selected cases or variables
  • Data imputation
  • Model-based approaches
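The first three categories above can be illustrated with a small sketch. This is not code from the project (which used R and Rattle); it is plain Python, and the toy records, field values and helper names are all hypothetical. Model-based approaches are not shown.

```python
from collections import Counter

# Hypothetical toy records; None marks a missing value.
records = [
    {"ROUTE": "Oral", "RECOV": "Rec"},
    {"ROUTE": None, "RECOV": "Rec"},
    {"ROUTE": "Oral", "RECOV": None},
    {"ROUTE": "Topical", "RECOV": "Rec"},
]


def complete_cases(rows):
    """Use complete data only: keep rows with no missing values."""
    return [r for r in rows if all(v is not None for v in r.values())]


def drop_variable(rows, name):
    """Delete a selected variable from every row."""
    return [{k: v for k, v in r.items() if k != name} for r in rows]


def impute_mode(rows, name):
    """Data imputation: fill missing values with the most common observed value."""
    observed = [r[name] for r in rows if r[name] is not None]
    mode = Counter(observed).most_common(1)[0][0]
    return [{**r, name: mode if r[name] is None else r[name]} for r in rows]
```

Keeping only complete cases shrinks the dataset, whereas imputation keeps every record at the cost of introducing values that were never observed.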

• Some researchers have focused on data cleansing tools to help eliminate noise but this can only achieve a reasonable result (Zhu & Wu 2004).

Literature review (problems with medical datasets)

• Attribute noise is more difficult to handle and includes (Zhu & Wu 2004):
  • (1) Incorrect attribute values
  • (2) Missing or "don't know" attribute values
  • (3) Incomplete attributes or "don't care" values

Literature review

• The project followed the six-step CRISP-DM data mining process

• Understand the main aim of the project
• Understand the dataset

ADRDATE | Agedays | BRAND | DRUG | ID | Prob | ROUTE | Recov | Severity | URNO | ATC
31/01/2007 | | Lyclear | Permethrin | 707 | Cert | Topical | Rec | Minor | unknown | P03AC04
9/06/2003 | 14367 | Tegretol CR | Carbamazepine | 4 | Cert | Oral | Rec | | ax6cx8z | N03AF01
11/06/2003 | 1 4173 | Zoloft | Sertraline | 5 | Unc | Oral | | | ax66486 | N06AB06

Data Mining Processing

Data mining Process

Summary of missing values (total 1286 records)

ADRDATE AGEDAYS ROUTE RECOV ATC
Missing values / Unknown
0 1 570 344
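A per-column tally of this kind could be produced along the following lines. This is a hedged Python sketch rather than the R code used in the project; the sample rows are made up for illustration, and treating the literal text "unknown" as a separate category is an assumption.

```python
import csv
import io

# Hypothetical sample in the same .csv layout; not the project's data.
SAMPLE = """ADRDATE,AGEDAYS,ROUTE,RECOV,ATC
31/01/2007,,Topical,Rec,P03AC04
9/06/2003,14367,Oral,Rec,N03AF01
11/06/2003,,Oral,,unknown
"""


def count_missing(csv_text, unknown_tokens=("unknown",)):
    """Count empty cells and 'unknown' cells separately for each column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    counts = {name: {"missing": 0, "unknown": 0} for name in reader.fieldnames}
    for row in reader:
        for name in reader.fieldnames:
            value = (row[name] or "").strip()
            if value == "":
                counts[name]["missing"] += 1
            elif value.lower() in unknown_tokens:
                counts[name]["unknown"] += 1
    return counts


for column, tally in count_missing(SAMPLE).items():
    print(column, tally)
```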

• Data in .csv format

• R programming language

• Rattle tool for data mining

• Data preparation
  • Remove duplicates
  • Correct misspelled words
  • Correct meanings of values
  • Find missing ATC (Anatomical Therapeutic Chemical) values
  • Leave missing values for the rest of the dataset

Data Mining Process

• Data transformation
  • Date when the patient was admitted to hospital for the ADR (October-March = 1, April-September = 0)
  • Patient age, categorised into groups with equal numbers of records (0-2 years old = 1, 2-5 years old = 2, 5-11 years old = 3, 11-16 years old = 4, above 16 years of age = 5)
  • Route of administration of the medication that caused the ADR, either oral or intravenous (Oral = 1, Intravenous = 0)
  • Whether the patient recovered from the ADR (Recovered = 0, Not recovered = 1)
  • Whether the drug given to the patient was an antibiotic (Antibiotic = 1, Not antibiotic = 0)

Data mining Process
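The coding scheme above can be sketched as a handful of small functions. This is an illustrative Python sketch, not the project's R code. The function names are hypothetical, and several details are assumptions: dates are taken to be in dd/mm/yyyy form, age is taken to arrive in days and is converted with 365.25 days per year, and each age band is treated as including its lower bound and excluding its upper bound.

```python
from datetime import datetime


def encode_season(adr_date):
    """October-March -> 1, April-September -> 0 (date assumed dd/mm/yyyy)."""
    month = datetime.strptime(adr_date, "%d/%m/%Y").month
    return 1 if month >= 10 or month <= 3 else 0


def encode_age(age_days):
    """Map age in days to the five age groups (band edges are an assumption)."""
    years = age_days / 365.25
    for upper, code in ((2, 1), (5, 2), (11, 3), (16, 4)):
        if years < upper:
            return code
    return 5


def encode_route(route):
    """Oral -> 1, Intravenous -> 0, anything else -> None (left missing)."""
    return {"oral": 1, "intravenous": 0}.get(route.strip().lower())


def encode_recovered(recovered):
    """Recovered -> 0, not recovered -> 1."""
    return 0 if recovered else 1


def encode_antibiotic(is_antibiotic):
    """Antibiotic -> 1, not an antibiotic -> 0."""
    return 1 if is_antibiotic else 0
```

Routes other than oral or intravenous are returned as None here so that they stay visibly missing instead of being forced into one of the two codes.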

Data Mining Processing

[Figure panels: ADRDATE, AGE, RECOV, ATC, ROUTE]

• Modelling phase
  • Logistic regression
  • Decision tree
  • Risk pattern algorithm

• Evaluation phase

• Deployment

Data Mining Process

• Results for the logistic regression technique

Coefficients:
              Estimate  Std. Error  z value  Pr(>|z|)
(Intercept)  -1.901353    0.466304   -4.077  4.55e-05 ***
ADRDATE       0.136312    0.285722    0.477     0.633
AGEDAYS       0.002067    0.115482    0.018     0.986
ROUTE         0.059532    0.290016    0.205     0.837
ANTIBIOTICS  -0.181255    0.300150   -0.604     0.546

Results
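The z value and Pr(>|z|) columns in the coefficient table follow from the estimates and standard errors. The Python sketch below recomputes them as a check, assuming the usual Wald test (z = estimate / standard error, two-sided p-value from the standard normal distribution). The function name is hypothetical; the numbers are copied from the table.

```python
import math

# (estimate, standard error) pairs copied from the coefficient table.
coefficients = {
    "(Intercept)": (-1.901353, 0.466304),
    "ADRDATE": (0.136312, 0.285722),
    "AGEDAYS": (0.002067, 0.115482),
    "ROUTE": (0.059532, 0.290016),
    "ANTIBIOTICS": (-0.181255, 0.300150),
}


def wald_test(estimate, std_error):
    """Return the z statistic and its two-sided normal p-value."""
    z = estimate / std_error
    # For a standard normal Z, P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p


for name, (estimate, std_error) in coefficients.items():
    z, p = wald_test(estimate, std_error)
    print(f"{name:12s} z = {z:7.3f}  p = {p:.3g}")
```

Only the intercept has a small p-value; the four predictors all have p-values above 0.5, so none of them is statistically significant in this model.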

• Decision Tree Result

1) root 1035 473 1 (0.4570048 0.5429952)
  2) AGE>=3.5 407 140 0 (0.6560197 0.3439803)
    4) ADRDATE< 0.5 203 61 0 (0.6995074 0.3004926) *
    5) ADRDATE>=0.5 204 79 0 (0.6127451 0.3872549)
      10) AGE>=4.5 100 35 0 (0.6500000 0.3500000)
        20) ROUTE>=0.5 79 27 0 (0.6582278 0.3417722) *
        21) ROUTE< 0.5 21 8 0 (0.6190476 0.3809524)
          42) RECOV=Yes 18 6 0 (0.6666667 0.3333333) *
          43) RECOV=NO 3 1 1 (0.3333333 0.6666667) *

Results

• Decision Tree Result (continued)

      11) AGE< 4.5 104 44 0 (0.5769231 0.4230769)
        22) ROUTE< 0.5 77 30 0 (0.6103896 0.3896104) *
        23) ROUTE>=0.5 27 13 1 (0.4814815 0.5185185) *
  3) AGE< 3.5 628 206 1 (0.3280255 0.6719745)
    6) ROUTE< 0.5 236 109 1 (0.4618644 0.5381356)
      12) RECOV=NO 24 6 0 (0.7500000 0.2500000)

Results
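The tree listing appears to be in the format printed by R's rpart package, where each line reads: node number, split, n (records at the node), loss (records not in the predicted class), yval (predicted class), and the class probabilities in brackets, with * marking a leaf. Under that assumption, the probabilities can be recovered from n and loss. The Python sketch below shows the check; the function name is hypothetical.

```python
def class_probabilities(n, loss, yval):
    """Return (P(class 0), P(class 1)) for a two-class tree node.

    n    -- number of records reaching the node
    loss -- records at the node that are NOT in the predicted class
    yval -- predicted class at the node (0 or 1)
    """
    other = loss / n
    return (1 - other, other) if yval == 0 else (other, 1 - other)


# Root node: 1035 records, 473 not in predicted class 1.
print(class_probabilities(1035, 473, 1))
# Node 2 (AGE>=3.5): 407 records, 140 not in predicted class 0.
print(class_probabilities(407, 140, 0))
```

The record counts are also consistent down the tree: the two children of each listed split add up to the parent, for example 407 + 628 = 1035 at the root.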

• Risk patterns for NO

1 3 3.0324 2.4852 26 9 7 ADRDATE 1 AGEDAYS 3 ANTIBIOTICS 0
2 2 3.1002 2.5582 62 46 16 AGEDAYS 3 ANTIBIOTICS 0
3 3 2.5663 2.1904 25 9 6 ADRDATE 1 AGEDAYS 4 ROUTE 1
4 3 2.5375 2.1757 34 26 8 AGEDAYS 4 ROUTE 1 ANTIBIOTICS 0

• Pattern 1, where risk ratio = 2.48:
  • Agedays = between 5 and 11 years old
  • Adrdate = months between October and March
  • Antibiotics = No

Results
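For readers unfamiliar with the measure, a risk ratio compares how often the outcome occurs among records that match a pattern with how often it occurs among records that do not. The Python sketch below uses the standard epidemiological definitions of risk ratio and odds ratio on a 2x2 table. The counts are invented purely for illustration and are not taken from the study, and the function names are hypothetical.

```python
def risk_ratio(a, b, c, d):
    """Risk ratio from a 2x2 table.

    a -- pattern present, outcome present
    b -- pattern present, outcome absent
    c -- pattern absent,  outcome present
    d -- pattern absent,  outcome absent
    """
    risk_with_pattern = a / (a + b)
    risk_without_pattern = c / (c + d)
    return risk_with_pattern / risk_without_pattern


def odds_ratio(a, b, c, d):
    """Odds ratio from the same 2x2 table."""
    return (a * d) / (b * c)


# Invented counts: 20 of 100 matching records have the outcome,
# compared with 10 of 100 non-matching records.
print(risk_ratio(20, 80, 10, 90))
print(odds_ratio(20, 80, 10, 90))
```

A risk ratio of about 2.48, as reported for pattern 1, would mean the outcome is roughly two and a half times as frequent among records matching the pattern as among the rest.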

• Build a data mining process to answer the problem posed.

• Use algorithms that work for medical applications.

• Noise and missing values do pose a problem, but reasonable results can still be achieved.

• More relevant patterns can be produced for medical experts if maximum information is included in the dataset.

Conclusion

• Brown, ML & Kros, JF 2003, 'Data mining and the impact of missing data', Industrial Management & Data Systems, vol. 103, pp. 611-621.

• Cios, KJ & Moore, GW 2002, 'Uniqueness of medical data mining', Artificial Intelligence in Medicine, vol. 26, no. 1-2, pp. 1-24.

• CRISP_DM 2000, Cross Industry Standard Process for Data Mining, viewed 27 August 2008, <http://www.crisp-dm.org/Partners/index.htm>.

• Li, J, Fu, AW-c, He, H, Chen, J, Jin, H, McAullay, D, Williams, G, Sparks, R & Kelman, C 2005, Mining risk patterns in medical data, ACM, Chicago, Illinois, USA.

• Lavrač, N 1999, 'Selected techniques for data mining in medicine', Artificial Intelligence in Medicine, vol. 16, no. 1, pp. 3-23.

• Lee, I-N, Liao, S-C & Embrechts, M 2000, 'Data mining techniques applied to medical information', Medical Informatics & the Internet in Medicine, vol. 25, no. 2, pp. 81-102.

• Obenshain, MK 2004, 'Application of data mining techniques to healthcare data', Infection Control and Hospital Epidemiology, vol. 25, no. 8, pp. 690-695.

• Safety of Medicines 2002, A Guide to Detecting and Reporting Adverse Drug Reactions: Why Health Professionals Need to Take Action, WHO publications, viewed 15 April 2008, <http://whqlibdoc.who.int/hq/2002/WHO_EDM_QSM_2002.2.pdf>.

• Wang, H & Wang, S 2008, 'Medical knowledge acquisition through data mining', paper presented at the IT in Medicine and Education, 2008. ITME 2008. IEEE International Symposium on, Xiamen.

• Zhu, X, Khoshgoftaar, T, Davidson, I & Zhang, S 2007, 'Editorial: Special issue on mining low-quality data', Knowledge and Information Systems, vol. 11, no. 2, pp. 131-136.

References
