July 21 - AIME 2009 Carol Friedman, PhD Department of Biomedical Informatics Columbia University Discovering Novel Adverse Drug Events Using Natural Language Processing and Mining of Electronic Health Records

July 21 - AIME 2009

Carol Friedman, PhD

Department of Biomedical Informatics

Columbia University

Discovering Novel Adverse Drug Events Using Natural Language Processing and Mining of Electronic Health Records

Motivation: Severity of Problem

• Clinical trials do not test a broad population

• Adverse Drug Events (ADEs) world-wide problem

• *Expense from ADEs is $5.6 billion annually

• *Estimated that over 2 million patients hospitalized due to ADEs

• *ADEs are fourth leading cause of death

*In US alone

Motivation: Limitations of Approaches

• Manual review of case reports (Venulet J 1988)

• Spontaneous reporting to designated agency (Evans JM 2001; Eland IA 1999; Wysowski DK 2005)

  – Serious ADEs reported less than 1-10% of time
  – Reporting is voluntary for physicians/patients
  – Recognition of ADEs is highly subjective
  – Difficult to determine cause of ADE
  – Biased by length of time on market and other factors
  – Cannot determine number of patients on drug or percent at risk

• Drug prescribing/claims data (Hershman D 2007; Ray WA 2009)

Severity of Under Reporting

A study showed that physicians ignored patient reports of known ADEs 87% of the time

(Golomb et al. Physician response to patient reports of adverse drug effects. Drug Safety 2007)

Related Work

• Automated methods mainly based on spontaneous reporting databases
  – Most methods use (Evans SJ 2001; Szarfman A 2002)
    • Surrogate observed-to-expected ratios
    • Incidence of drug-event reporting compared to background reporting across all drugs and events

• Some research aimed at improving effectiveness of SPR databases
  – Create ontology of higher order adverse events
    • MedDRA
  – Avoid fragmentation of signal
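
As an illustration of the observed-to-expected idea above, the following minimal sketch computes a proportional reporting ratio (PRR), one common disproportionality measure for spontaneous reporting data. The slide does not say which ratio the cited methods use, so the choice of PRR, the function name, and the counts are all assumptions made for illustration.

```python
def proportional_reporting_ratio(a: int, b: int, c: int, d: int) -> float:
    """PRR from a 2x2 table of spontaneous reports.

    a: reports mentioning the drug and the event
    b: reports mentioning the drug with other events
    c: reports mentioning other drugs and the event
    d: reports mentioning other drugs with other events
    """
    drug_rate = a / (a + b)          # event rate among reports for this drug
    background_rate = c / (c + d)    # event rate among reports for all other drugs
    return drug_rate / background_rate


# Hypothetical counts, for illustration only.
prr = proportional_reporting_ratio(a=20, b=980, c=100, d=98900)
print(round(prr, 2))
```

A ratio well above 1 means the event is reported more often with this drug than with the background of all other drugs. The sketch does not guard against empty rows, which a real implementation would need to handle.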

Related Work

• Pharmacoepidemiology databases used to confirm suspicions
  – General practice research database (GPRD) (Wood & Martinez 2004)
  – New Zealand Intensive Medicines Monitoring (IMMP) (Coulter 1998)
  – Medicine Monitoring Unit (MEMO) (Evans et al. 2001)

• EHR databases used to find signals (Brown JS et al. 2007; Berlowitz DR et al. 2006; Wang X et al. 2009)
  – Mainly coded data used
  – Has potential for active real time surveillance
  – Should reduce biased reporting

Related Work

• Consortiums involving multiple EHRs
  – EU-ADR project (http://www.alert-project.org/)
  – eHealth initiative (http://www.ehealthinitiative.org/drugSafety/)

• Related work using EHR to detect known ADEs, not aimed at discovering novel ADEs (Bates DW 2003; Honigman B 2001)

Exploiting the Electronic Health Record

[Figure: EHR data sources feed centralized data, which NLP + integration turns into executable data for applications. Sources shown: text notes (primary care, specialties, inpatient progress, admit history), labs (hct 22, inr 1.3, bun 83, …), orders (pepcid, lasix, …)]

Applications

• Decision support
• Patient Safety
• Acquire knowledge
• Discovery
• Guidelines
• Surveillance
• Patient management
• Clinical Trial recruitment
• Improved documentation
• Quality assurance

The Electronic Health Record (EHR)

• Rich source of patient information
• Mostly untapped
• Primary use for EHR
  – Documenting care in multi-provider environment
  – Manual review by providers
• More complete than coded ICD-9 codes
  – Symptoms
  – Clinical conditions not beneficial for billing
• Fragmented
• Heterogeneous
• Noisy

Research Opportunities: NLP Issues

• Occurrence of clinical events in natural language
  – Drugs, diseases, symptoms
  – Temporal information is critical
• Irregularity of reports
  – Section headings important but abbreviated/missing
  – Use of indentation, lists, run on sentences
  – Tables & semi-structured data in reports
• Abbreviations
  – 2/2 meaning secondary to
  – co meaning cardiac output or complaining of
• Mapping terms in text to an ontology/controlled vocabulary
  – infiltrate in chest x-ray means chest infiltrate
  – ontology terms more limited than language

Research Opportunities: Statistical Issues

• Find associations between drug, symptoms, and diseases
  – Not explicit in EHR
• Large volumes of data
  – Statistical significance vs. clinical significance
• Statistical associations, not relationships
  – Drug treats condition / Drug causes condition
• Integrating time sequences is important
  – For treats: condition must precede drug event
  – For causes: drug event must precede condition
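
The temporal rule in the last bullet can be sketched as follows. This is a minimal illustration that assumes each drug event and condition already has a known date, which, as later slides note, is often not the case in narrative reports. The function name and the returned labels are hypothetical.

```python
from datetime import date


def candidate_relation(drug_date: date, condition_date: date) -> str:
    """Label a drug-condition pair using only the order of the two events.

    The order makes a relation possible; it does not establish it.
    """
    if condition_date < drug_date:
        return "treats"        # condition came first, so the drug may treat it
    if drug_date < condition_date:
        return "causes"        # drug came first, so the condition may be an ADE
    return "undetermined"      # same day: order cannot be decided from dates


print(candidate_relation(date(2004, 3, 1), date(2004, 3, 10)))
```

Time order only narrows the set of candidate relations; the confounding issues described on the following slides still apply to every pair labeled "causes".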

Research Opportunities: Statistical Issues

• Confounding (indirect associations)
  – Metolazone treats heart failure (HF)
  – HF is manifested by shortness of breath (SOB)
  – Metolazone and SOB indirectly related
• Higher order associations
  – Drug interactions: Drug1, drug2, condition
  – Drug-contraindications: Drug, disease, condition
• Rare ADEs

Other Research Opportunities: Knowledge Acquisition

• Structured Knowledge bases
  – UMLS relations (may_be_treated_by)
  – Proprietary ones, usually unavailable
• Text/Semi-Structured Knowledge (need NLP)
  – Spontaneous reporting databases: indications, drugs, adverse events
  – Literature (Medline)
  – Web sites (WebMD, Micromedex)
  – Online medical textbooks
  – Claims Data (Health IT payors)

Text Mining for Knowledge Acquisition

• Statistical methods: co-occurrences
  – Discovered associations between diseases and diets from literature (Weeber M 2002)
  – Identified disease candidate genes (Hristovski D 2005)
• NLP systems
  – Trends in medications based on the literature and narrative clinical reports (Chen ES 2007, 2008)
  – Semantic relations in the literature (Hristovski D 2006)

Overview of Our NLP-EHR based Pharmacovigilance System

[Figure: system pipeline with components EHR (narrative records, coded data), MedLEE NLP, Standardize & integrate, Selecting & filtering, Detect associations, Eliminate confounding, Medical knowledge, ADE Signals]

Natural Language Processing of EHR

[Figure: system pipeline diagram from the overview slide: EHR (narrative records, coded data), MedLEE NLP, Standardize & integrate, Selecting & filtering, Detect associations, Eliminate confounding, Medical knowledge, ADE Signals]

Meds:
Tegretol xr
Zocor

All:
Several sz meds

PMHx:
sz d/o - well controlled on tegretol
high chol - on zocor
CAD - 60% lesion in LADM by cath
MR - secondary to mitral prolapse

PSHx:
rib fx in 2001, shoulder fx secondary to trauma

Vitals: 130/80 12 80

A/P: 54 y/o m with mult med problems, all relatively well controlled. Pt sz free, not anemic as of 2/2003. Concerned of MR and its possible long term effects.

Coded Output from NLP

med:tegretol xr
  sectname>> report medication item
  code>> UMLS:C0592163_Tegretol XR
med:zocor
  sectname>> report medication item
  code>> UMLS:C0678181_Zocor
.........
problem:mitral valve regurgitation
  sectname>> report past history item
  code>> UMLS:C0026266_Mitral Valve Insufficiency
……..
problem:rib fracture
  date>> 2001
  sectname>> report past history item

Coding Issues

• Not all conditions have codes
  – Non-communicative
• Some conditions are combinations of codes
  – Difficulty sleeping
  – Vascular injury
• Granularity of coding system
  – Many different codes for a concept
    Asthma: asthma exacerbation, asthma disturbing sleep, moderate asthma, suspected asthma, …

Standardizing Coded Data

[Figure: system pipeline diagram from the overview slide, annotated with an example of standardized coded data: lab value HCT:20 and code C0744727: low hematocrit]

Standardizing Coded EHR Data: Laboratory Tests and Medications

• Lab values denoting normal/abnormal vary
  – Abnormal range may depend on age, sex, ethnicity, weight
  – Change in lab values and duration must be considered
• Standardizing medications is complex & requires additional knowledge
  – Tradename to generic (Avandia to rosiglitazone)
  – Handling of combination medications
    • 1.5% Lidocaine with 1:200,000 Epinephrine
  – Handling of dose & route
    • Diazepam 2 MG Oral Tablet
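
To make the medication cases above concrete, here is a minimal sketch that reduces a medication string to its generic ingredients. The two lookup tables are tiny hypothetical stand-ins for the "additional knowledge" the slide refers to, and the token-matching approach is an assumption for illustration, not the method used in the talk.

```python
import re

# Hypothetical lookup tables; a real system would draw on a drug vocabulary.
TRADENAME_TO_GENERIC = {"avandia": "rosiglitazone"}
KNOWN_INGREDIENTS = {"rosiglitazone", "lidocaine", "epinephrine", "diazepam"}


def normalize_medication(text: str) -> list[str]:
    """Return the sorted generic ingredients mentioned in a medication string.

    Dose, strength, route, and form tokens are dropped because they are not
    in the ingredient table.
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    found = set()
    for token in tokens:
        generic = TRADENAME_TO_GENERIC.get(token, token)  # map tradename if known
        if generic in KNOWN_INGREDIENTS:
            found.add(generic)
    return sorted(found)


for med in ("Avandia",
            "1.5% Lidocaine with 1:200,000 Epinephrine",
            "Diazepam 2 MG Oral Tablet"):
    print(med, "->", normalize_medication(med))
```

Discarding dose and route is a simplification: the discussion slides list drug dosage as something that should be explored, so a fuller version would keep those fields alongside the ingredient.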

Selecting and Filtering

[Figure: system pipeline diagram from the overview slide, annotated with the notes below]

• Select using UMLS classes (diseases, medications)
• Filter out:
  – negations, past info, …
  – wrong time order
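
The select-then-filter step can be sketched as below. The `Finding` record, its field names, and the class labels are hypothetical simplifications of NLP output; actual output carries UMLS codes and more modifiers, and the time-order filter is omitted here.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    term: str
    semantic_class: str       # hypothetical label, e.g. "disease" or "medication"
    negated: bool = False     # e.g. "no fever"
    historical: bool = False  # past information, e.g. "rib fx in 2001"


SELECTED_CLASSES = {"disease", "medication"}


def select_and_filter(findings: list[Finding]) -> list[Finding]:
    """Keep findings of the selected classes that are current and asserted."""
    return [
        f for f in findings
        if f.semantic_class in SELECTED_CLASSES
        and not f.negated
        and not f.historical
    ]


notes = [
    Finding("zocor", "medication"),
    Finding("anemia", "disease", negated=True),
    Finding("rib fracture", "disease", historical=True),
    Finding("divorce", "finding"),
]
print([f.term for f in select_and_filter(notes)])
```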

Selecting and Filtering

• Dependence on accuracy of semantic classification
  – UMLS classification errors
    - Finding: birth history, cardiac output, divorce
    + Finding: cardiomegaly, fever
• Temporal information difficult to obtain
  – An adverse drug event should only follow drug event
  – Processing of explicit time information is complex and vague
    • Yesterday, last admission, 2/5
  – Information typically occurs in reports without dates

Detect Associations

[Figure: system pipeline diagram from the overview slide, annotated with the notes below]

• Obtain event frequencies
• Co-occurrence frequencies
• Form 2x2 tables
• Calculate associations
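
The four steps above can be sketched for a single drug-event pair as follows. The slide does not name the association statistic, so the Pearson chi-square used here is an assumption, as are the function names and the toy records.

```python
def two_by_two(records, drug, event):
    """Count co-occurrence of a drug and an event.

    records: iterable of sets of coded events, one set per report.
    Returns (a, b, c, d): both present, drug only, event only, neither.
    """
    a = b = c = d = 0
    for events in records:
        has_drug, has_event = drug in events, event in events
        if has_drug and has_event:
            a += 1
        elif has_drug:
            b += 1
        elif has_event:
            c += 1
        else:
            d += 1
    return a, b, c, d


def chi_square(a, b, c, d):
    """Pearson chi-square statistic for a 2x2 table (no continuity correction).

    Assumes every row and column total is non-zero.
    """
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


# Toy records with hypothetical codes, for illustration only.
records = [{"drugX", "rash"}, {"drugX", "rash"}, {"drugX"}, {"rash"},
           set(), set(), set(), set()]
table = two_by_two(records, "drugX", "rash")
print(table, round(chi_square(*table), 3))
```

With the large volumes of data mentioned earlier, a statistic like this can be significant for clinically unimportant pairs, and it says nothing about whether the relation is treats, causes, or indirect.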

Detect Associations

• Correct temporal sequence is critical
  – Drug event should precede adverse event
  – Dates are not usually stated along with events
  – Section of reports helpful surrogate
• Statistical associations correspond to different clinical relations
  – For pharmacovigilance:
    • Want drug causes adverse event
• Confounding caused by dependencies in data

Confounding Interdependencies

[Figure: three nodes and their relations. Drug "Treats" Disease; Disease "Manifested by" Adverse Event; Drug "Cause_ADE" Adverse Event]

Confounding Interdependencies

[Figure: ML, HD, and SOB shown as three linked nodes]

ML: Metolazone; HD: Hypertensive Disease; SOB: Shortness of Breath

Drug Associations Network

[Figure: network linking diagnoses (Dx, Dx1-n), symptoms (Sx, Sx1-n), and drugs (Rx, Rx1-n) through edges labeled process, treatment, association, and ADE]

Reduce Confounding

[Figure: system pipeline diagram from the overview slide: EHR (narrative records, coded data), MedLEE NLP, Standardize & integrate, Selecting & filtering, Detect associations, Eliminate confounding, Medical knowledge, ADE Signals]

Reduce Confounding

• Collect knowledge from external sources and associations
  – Drug-treat-disease
  – Disease-manifested by-symptom
  – Drug-interacts with-drug
• Use Information theory
  – Mutual Information (MI)
  – Data processing inequality: MI3 < min(MI1, MI2)

[Figure: Drug, Disease, and Adverse Event nodes with the pairwise mutual information values labeled MI1, MI2, and MI3]
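
A minimal sketch of the information-theoretic check: mutual information is computed from 2x2 co-occurrence counts, and a drug-adverse event link is flagged as possibly indirect when its MI is smaller than both links that pass through the disease, as the data processing inequality predicts for a chain drug, disease, adverse event. It is assumed here that MI3 in the diagram denotes the drug-adverse event pair; the function names and the example values are hypothetical.

```python
from math import log2


def mutual_information(a: int, b: int, c: int, d: int) -> float:
    """MI in bits between two binary variables from 2x2 counts.

    a: both present, b: first only, c: second only, d: neither.
    """
    n = a + b + c + d
    cells = (
        (a, a + b, a + c),  # first present, second present
        (b, a + b, b + d),  # first present, second absent
        (c, c + d, a + c),  # first absent, second present
        (d, c + d, b + d),  # first absent, second absent
    )
    mi = 0.0
    for joint, row_total, col_total in cells:
        if joint:  # empty cells contribute nothing
            mi += (joint / n) * log2(joint * n / (row_total * col_total))
    return mi


def is_possibly_indirect(mi_drug_disease: float,
                         mi_disease_event: float,
                         mi_drug_event: float) -> bool:
    """True when the drug-event MI is below both links through the disease."""
    return mi_drug_event < min(mi_drug_disease, mi_disease_event)


print(mutual_information(50, 0, 0, 50))       # perfectly dependent variables
print(is_possibly_indirect(0.30, 0.40, 0.10))
```

A pair flagged this way is only a candidate for removal. A drug can both treat a disease and directly cause one of its symptoms, which is the "Either" category used in the second study.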

Initial Study: Methods

• 6 drugs chosen
  – Ibuprofen, Morphine, Warfarin: long time on market with known ADEs
  – Bupropion, Paroxetine, Rosiglitazone: ADEs discovered after 2004
  – 1 drug class: ACE inhibitors
• 25,074 textual discharge summaries in 2004 from NYPH processed using MedLEE NLP
• Reference standard created using expert knowledge sources
• Drug-potential ADE pairs determined
• Recall/precision calculated
• Qualitative analysis performed to classify drug-potential ADE pairs detected

Initial Study: Results

• Quantitative
  – recall (.75), precision (.30)
• Qualitative analysis: potential drug-ADE pairs
  a. Known drug-ADEs: 30%
  b. Drug-indication pairs: 30%
  c. Remote drug-indication pair: 33%
  d. Unknown clinical associations: 6%
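
For reference, recall and precision over detected drug-ADE pairs can be computed as in this minimal sketch. The pairs below are placeholders and not data from the study; the function name is hypothetical.

```python
def precision_recall(detected: set, reference: set) -> tuple[float, float]:
    """Precision and recall of detected pairs against a reference standard."""
    true_positives = len(detected & reference)
    precision = true_positives / len(detected) if detected else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    return precision, recall


# Placeholder pairs, for illustration only.
detected = {("drugA", "event1"), ("drugA", "event2"),
            ("drugB", "event3"), ("drugB", "event4")}
reference = {("drugA", "event1"), ("drugB", "event3"), ("drugB", "event5")}

precision, recall = precision_recall(detected, reference)
print(round(precision, 2), round(recall, 2))
```

Low precision against a reference standard of known ADEs does not by itself mean the detected pairs are wrong: the qualitative breakdown above shows that many of them are drug-indication pairs, which is what motivates the confounding step.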

Confounding Interdependencies

[Figure: the Drug / Disease / Adverse Event diagram (Treats, Manifested by, Cause_ADE) with an additional node, Disease2]

Study 2: Reduction of Confounding

• Evaluation set
  • 14 associations related to 2 drugs from Study 1
• Reference standard
• Drug-ADE associations determined and MI, DPI used to automatically classify them

Drug-ADE Relation
  Direct:   Side effects of the drug (Rosiglitazone-headache)
  Indirect: Conditions related to the disease/symptoms the drug treats (Metolazone-shortness of breath)
  Either:   Conditions in both ‘direct’ and ‘indirect’ categories (Rosiglitazone-chest pain)

Results

• Precision
  • 0.86 when handling confounding
  • 0.31 without handling confounding

Discussion: Limitations & Future Directions

• Mutual information only strategy to handle confounding
  – More complex MI strategy will be explored
  – Other statistical/knowledge based methods will be explored
• Inpatient data only/sicker patient population
  – The same methods could be used for outpatient data as well, possibly more noisy
• Drug dosage, drug-drug and more complex interactions should be explored

Discussion: Limitations & Future Directions

• Small evaluation data set
  – More comprehensive evaluation

• Limitations inherent from NLP, coding, association detection

• Limitations due to fragmented/incomplete patient data

Summary

• Need for more pharmacovigilance research
  – Based on the EHR
  – Using available databases and text
• Studies demonstrated promising results
• Many interesting research opportunities
  – Natural language processing
  – Statistical methods
  – Integrating different sources of data
  – Gathering knowledge from different sources
  – Automated knowledge acquisition for evidence based medicine

Acknowledgement

• NLP Data Mining group at DBMI at Columbia
  – George Hripcsak
  – Marianthi Markatou
  – Herb Chase
  – Xiaoyan Wang
  – David Albers
  – Jung-wei Fan
  – Lyudmila Shagina
  – Noemie Elhadad
• Grants
  – R01 LM007659 from NLM
  – R01 LM008635 from NLM
  – R01 LM06910 from NLM
  – 5T15LM007079 from NLM training grant

QUESTIONS

THANK YOU!