biocuration 2014 - effective automated classification of adverse events using ontology-based...

Post on 21-Nov-2014


EFFECTIVE AUTOMATED CLASSIFICATION USING ONTOLOGY-BASED ANNOTATION : EXPERIENCE WITH ANALYSIS OF ADVERSE EVENT REPORTS

Mélanie Courtot, mcourtot@gmail.com
Current: PhD Candidate, Terry Fox Laboratory, BC Cancer Agency
Starting April 14th 2014: PDF, MBB Dept., Simon Fraser University (and affiliation with BC Public Health Microbiology and Research Laboratory)

Background and problem statement
• Surveillance of Adverse Events Following Immunization is important
  • Detection of issues with a vaccine
  • Importance of vaccine-risk communication
• Analysis of AE reports is a subjective, time-consuming, and costly process
  • Manual review of the textual reports

Hypothesis

[Figure: overview diagram. Labels: Health agencies; Data repositories; Brighton guideline; Other guideline(s); Adverse Event Reporting Ontology (AERO); Brighton annotations; Automatic case classification; Clinician; General population; Information, recalls, SOPs; Guideline representation; Data integration and answering queries.]

Encoding Brighton guidelines in OWL allows automated classification of adverse events at similar accuracy

Test case

• VAERS dataset
  • Vaccine Adverse Event Reporting System
  • 6032 reports: ~5800 negative, ~230 positive
  • Post-H1N1 immunization, 2009/2010
  • Manually classified for anaphylaxis
• MedDRA (Medical Dictionary for Regulatory Activities) is used to represent clinical findings

Example VAERS report

[Figure: example VAERS report, showing the free-text part of the report and the MedDRA-encoded structured data.]

Automated Diagnosis workflow

[Figure: workflow diagram with steps labelled A to D. Components: VAERS dataset (ASCII files, MySQL); Brighton annotations (~800 MedDRA terms mapped to 32 Brighton terms); OWL/RDF export; Adverse Event Reporting Ontology (AERO); reasoner; manually curated dataset.]
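The annotation step of this workflow, in which MedDRA terms are mapped to Brighton terms, can be sketched as a simple lookup. This is a minimal illustration only: the dictionary entries below are hypothetical placeholders, not the actual mapping used in the talk (which covers ~800 MedDRA terms and 32 Brighton terms), and `annotate_report` is an assumed helper name.

```python
# Hypothetical excerpt of a MedDRA-to-Brighton mapping table.
# These four entries are illustrative placeholders, not the real mapping.
MEDDRA_TO_BRIGHTON = {
    "Urticaria": "generalized urticaria",
    "Rash generalised": "generalized erythema",
    "Wheezing": "bilateral wheeze",
    "Hypotension": "measured hypotension",
}


def annotate_report(meddra_terms):
    """Return the set of Brighton terms for a report's MedDRA terms.

    MedDRA terms without a mapping are skipped.
    """
    return {
        MEDDRA_TO_BRIGHTON[term]
        for term in meddra_terms
        if term in MEDDRA_TO_BRIGHTON
    }


# "Fatigue" has no entry in the toy table, so it is dropped.
annotations = annotate_report(["Urticaria", "Wheezing", "Fatigue"])
```

In the workflow described here, annotations of this kind would then be exported to OWL/RDF and classified by a reasoner; that step is not shown.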

Results

At best cut-off point: Sensitivity 57% Specificity 97%
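For reference, sensitivity and specificity compare the automated classification against the manual one. The sketch below shows the computation on a made-up set of ten labels; the function name and the toy data are assumptions for illustration and do not reproduce the reported figures.

```python
def sensitivity_specificity(manual_labels, automated_labels):
    """Compare automated labels with manual (reference) labels.

    Both arguments are equal-length sequences of booleans,
    True meaning "positive for the adverse event".
    """
    tp = fn = tn = fp = 0
    for manual, automated in zip(manual_labels, automated_labels):
        if manual and automated:
            tp += 1  # true positive
        elif manual:
            fn += 1  # missed positive
        elif automated:
            fp += 1  # false alarm
        else:
            tn += 1  # true negative
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Toy data (hypothetical): 4 manually positive and 6 manually negative reports.
manual = [True] * 4 + [False] * 6
automated = [True, True, False, False, False, False, False, False, False, True]
sens, spec = sensitivity_specificity(manual, automated)
```

On the toy data, two of the four positives are detected and five of the six negatives are correctly rejected.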

Standardized MedDRA Queries
• SMQs are an existing MedDRA-based screening method
• Retrieval of documents based on the Anaphylaxis SMQ alone is only fair: 54% sensitivity, 97% specificity
• Idea:
  • Identify MedDRA terms that are significantly associated with the diagnosis outcome using contingency tables
  • Augment the existing MedDRA SMQ with those terms
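One way to test whether a MedDRA term is associated with the diagnosis outcome is a 2x2 contingency table per term. The slides do not say which statistical test was used; the sketch below assumes a one-sided Fisher exact test, and the reports and terms are hypothetical toy data.

```python
from math import comb


def contingency_table(reports, term):
    """Return (a, b, c, d) for one term.

    a: positive reports with the term    b: negative reports with the term
    c: positive reports without it       d: negative reports without it
    """
    a = b = c = d = 0
    for terms, is_positive in reports:
        if term in terms:
            if is_positive:
                a += 1
            else:
                b += 1
        elif is_positive:
            c += 1
        else:
            d += 1
    return a, b, c, d


def fisher_one_sided(a, b, c, d):
    """Probability of at least `a` positive reports carrying the term,
    if term and outcome were independent (hypergeometric upper tail)."""
    n_term, n_pos, n = a + b, a + c, a + b + c + d
    upper = min(n_term, n_pos)
    tail = sum(
        comb(n_pos, k) * comb(n - n_pos, n_term - k)
        for k in range(a, upper + 1)
    )
    return tail / comb(n, n_term)


# Toy data (hypothetical): (set of MedDRA terms, manually classified as positive?)
reports = [
    ({"Urticaria", "Wheezing"}, True),
    ({"Urticaria"}, True),
    ({"Wheezing", "Cough"}, True),
    ({"Fatigue"}, False),
    ({"Headache", "Fatigue"}, False),
    ({"Urticaria"}, False),
    ({"Pyrexia"}, False),
    ({"Headache"}, False),
]
table = contingency_table(reports, "Urticaria")
p_value = fisher_one_sided(*table)
```

Terms whose p-value falls below a chosen threshold would be the candidates for augmenting the SMQ; the threshold itself is a design choice that the slides do not specify.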

Cosine similarity method
• Represent documents (query and report) as vectors of terms
• Compare the cosine measure of the angle they form
  • Cosine ~ 1: Query ~ Report
  • Cosine ~ 0: Query != Report

Example
• Vector MEDDRA SMQ: 'Choking', 'Cough', 'Oedema', 'Rash'
• Vector REPORT#72: 'Oedema', 'Rash', 'Vomiting'
• Vector REPORT#104: 'Palpitations', 'Fatigue', 'Neuropathy'
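The example vectors above can be compared with a few lines of code. This sketch assumes binary term vectors (a term is either present or absent); the slides do not state the weighting scheme actually used, and `cosine_similarity` is an illustrative helper name.

```python
import math


def cosine_similarity(query_terms, report_terms):
    """Cosine of the angle between two binary term vectors.

    With 0/1 weights the dot product is the number of shared terms,
    and each vector's length is the square root of its term count.
    """
    query, report = set(query_terms), set(report_terms)
    if not query or not report:
        return 0.0
    shared = len(query & report)
    return shared / math.sqrt(len(query) * len(report))


smq = ["Choking", "Cough", "Oedema", "Rash"]
report_72 = ["Oedema", "Rash", "Vomiting"]
report_104 = ["Palpitations", "Fatigue", "Neuropathy"]

score_72 = cosine_similarity(smq, report_72)    # 2 / sqrt(4 * 3), about 0.58
score_104 = cosine_similarity(smq, report_104)  # no shared terms: 0.0
```

Report #72 shares two terms with the query and scores well above zero, while report #104 shares none and scores zero. A cut-off on this score then separates reports to flag from reports to skip.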

Results - Expanded MedDRA SMQ

At best cut-off point: Sensitivity 92%, Specificity 87%

Discussion
• Using the ontology, the sensitivity is too low for efficient screening
  • Brighton guidelines are not meant for screening, but for diagnosis confirmation
• We improved on the screening result and reached 92% sensitivity, 87% specificity
• Using both approaches concurrently yields the best screening results

Key outcomes
• Current encoding standards don't allow for complete representation of events
  • e.g., missing temporality descriptors (sudden onset, rapid progression)
  • Critical for diagnosis confirmation and causality assessment
• Information lacking in reports from surveillance systems
  • Not assessed? Not recorded? Negative?
• Logical translation of guidelines allows for better detection of inconsistencies and errors
  • We are working with the Brighton Collaboration towards adding a logical formalization to the existing case definitions

Use of the ontology for reporting
• In current systems:
  • Fast screening -> fast detection of potentially positive reports
  • Reporter can be sent a more detailed report, e.g. "Brighton-based anaphylaxis report form"
• In future systems:
  • Implementation of the ontology-based system at the time of data entry
  • Provides labels and textual definitions for each term
  • Enables consistency checking

Next steps: IRIDA project
• Integrated Rapid Infectious Disease Analysis
• http://www.irida.ca
• IRIDA is a bioinformatics platform for genomic epidemiology analysis to improve outbreak surveillance and detection
• Collaboration between academia and public health
• Ontologies will be developed to annotate clinical, lab and epidemiology data, and integrate for further analysis

Acknowledgements
• Ryan Brinkman, BC Cancer Agency, Vancouver, Canada
• Alan Ruttenberg, University at Buffalo, New York, USA
• Julie Lafleche, Robert Pless, Barbara Law, Public Health Agency of Canada, Ottawa, Ontario
• Jan Bonhoeffer, Brighton Collaboration, Basel, Switzerland
• IRIDA project: Fiona Brinkman, William Hsiao
