biocuration 2014 - effective automated classification of adverse events using ontology-based...
EFFECTIVE AUTOMATED CLASSIFICATION USING ONTOLOGY-BASED ANNOTATION: EXPERIENCE WITH ANALYSIS OF ADVERSE EVENT REPORTS
Mélanie Courtot
Current: PhD Candidate, Terry Fox Laboratory, BC Cancer Agency
Starting April 14th 2014: PDF, MBB Dept., Simon Fraser University (and affiliation with BC Public Health Microbiology and Research Laboratory).
Background and problem statement
• Surveillance of Adverse Events Following Immunization is important
  • Detection of issues with vaccines
  • Importance of vaccine-risk communication
• Analysis of AE reports is a subjective, time-consuming and costly process
  • Manual review of the textual reports
Hypothesis
[Figure: overview of the Adverse Event Reporting Ontology (AERO). Diagram labels: Health agencies; Data repositories; Clinician; General population; Brighton guideline; Other guideline(s); Brighton annotations; Automatic case classification; (1) Guideline representation; (2) Information recall, SOPs; (3) Data integration and answering queries.]
Encoding Brighton guidelines in OWL allows automated classification of adverse events at similar accuracy
Test case
• VAERS dataset
  • Vaccine Adverse Event Reporting System
  • 6032 reports: ~5800 negative, ~230 positive
  • Post H1N1 immunization 2009/2010
  • Manually classified for anaphylaxis
• MedDRA (Medical Dictionary for Regulatory Activities) is used to represent clinical findings
[Figure: example VAERS report, showing the free-text part of the report and the MedDRA-encoded structured data.]
Automated Diagnosis workflow
[Figure: automated diagnosis workflow, steps A to D. Diagram labels: VAERS dataset; ASCII files; MySQL; Brighton annotations; ~800 MedDRA terms mapped to 32 Brighton terms; OWL/RDF export; Adverse Event Reporting Ontology (AERO); Reasoner; Manually curated dataset.]
Results
At best cut-off point: Sensitivity 57% Specificity 97%
Standardized MedDRA Queries
• SMQs are an existing MedDRA-based screening method
• Retrieval of documents based on the Anaphylaxis SMQ alone is only fair: 54% sensitivity, 97% specificity
• Idea:
  • Identify MedDRA terms that are significantly associated with the diagnosis outcome using contingency tables
  • Augment the existing MedDRA SMQ with those terms
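The term-selection idea can be sketched in a few lines. The slides do not say which association test or significance threshold was used, so the one-sided Fisher exact test, the `alpha=0.05` threshold, the function names and the toy reports below are all assumptions made for illustration, not the method actually applied to VAERS.

```python
from math import comb

def fisher_one_sided(a, b, c, d):
    """One-sided Fisher exact p-value for the 2x2 contingency table
        term present: a positive reports, b negative reports
        term absent:  c positive reports, d negative reports
    i.e. the probability of at least `a` positive reports carrying the
    term if the term and the diagnosis outcome were independent."""
    with_term = a + b
    positives = a + c
    n = a + b + c + d
    tail = sum(
        comb(with_term, k) * comb(n - with_term, positives - k)
        for k in range(a, min(with_term, positives) + 1)
    )
    return tail / comb(n, positives)

def augment_smq(reports, smq_terms, alpha=0.05):
    """Add to the query every term significantly associated with a
    positive outcome. `reports` is a list of (set_of_terms, is_positive)."""
    candidates = set().union(*(terms for terms, _ in reports)) - set(smq_terms)
    added = set()
    for term in candidates:
        a = sum(1 for terms, pos in reports if term in terms and pos)
        b = sum(1 for terms, pos in reports if term in terms and not pos)
        c = sum(1 for terms, pos in reports if term not in terms and pos)
        d = sum(1 for terms, pos in reports if term not in terms and not pos)
        if fisher_one_sided(a, b, c, d) < alpha:
            added.add(term)
    return set(smq_terms) | added

# Toy data (hypothetical, not taken from VAERS): 6 positive, 14 negative reports.
TOY_REPORTS = (
    [({"Wheezing"}, True)] * 5 + [({"Fatigue"}, True)]
    + [({"Wheezing"}, False)] + [({"Fatigue"}, False)] * 13
)
TOY_SMQ = {"Rash"}

print(sorted(augment_smq(TOY_REPORTS, TOY_SMQ)))
```

In this toy set only 'Wheezing' is enriched among positive reports, so it is the only term added to the query. With several hundred candidate MedDRA terms, a correction for multiple testing would probably be needed as well.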
Cosine similarity method
• Represent documents (query and report) as vectors of terms
• Compare the cosine measure of the angle they form
  • Cosine ~ 1: query and report are similar
  • Cosine ~ 0: query and report are dissimilar
Example
• Vector MEDDRA SMQ: 'Choking', 'Cough', 'Oedema', 'Rash'
• Vector REPORT#72: 'Oedema', 'Rash', 'Vomiting'
• Vector REPORT#104: 'Palpitations', 'Fatigue', 'Neuropathy'
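The example vectors can be scored with a minimal sketch of the cosine measure. It assumes binary term vectors (a term is either present or absent); the slides do not state whether any term weighting was applied, and the names `cosine`, `SMQ`, `REPORT_72` and `REPORT_104` are illustrative.

```python
from math import sqrt

def cosine(query_terms, report_terms):
    """Cosine similarity of two binary term vectors, given as term sets.

    For 0/1 vectors the dot product is the number of shared terms and
    each vector norm is the square root of its number of terms."""
    q, r = set(query_terms), set(report_terms)
    if not q or not r:
        return 0.0
    return len(q & r) / sqrt(len(q) * len(r))

SMQ = {"Choking", "Cough", "Oedema", "Rash"}
REPORT_72 = {"Oedema", "Rash", "Vomiting"}
REPORT_104 = {"Palpitations", "Fatigue", "Neuropathy"}

print(round(cosine(SMQ, REPORT_72), 3))   # 2 shared terms / sqrt(4 * 3) = 0.577
print(cosine(SMQ, REPORT_104))            # no shared terms = 0.0
```

Report #72 shares two terms with the query and scores about 0.58, while report #104 shares none and scores 0. A cut-off on this score then decides which reports are flagged.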
Results - Expanded MedDRA SMQ
At best cut-off point: Sensitivity 92%, Specificity 87%
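The slides do not define how the "best cut-off point" was chosen. One common choice, assumed here purely for illustration, is the cut-off that maximizes Youden's J (sensitivity + specificity - 1). The scores and labels below are invented toy values, not VAERS results.

```python
def sens_spec(scores, labels, cutoff):
    """Sensitivity and specificity when reports scoring >= cutoff are flagged."""
    tp = sum(1 for s, y in zip(scores, labels) if y and s >= cutoff)
    fn = sum(1 for s, y in zip(scores, labels) if y and s < cutoff)
    tn = sum(1 for s, y in zip(scores, labels) if not y and s < cutoff)
    fp = sum(1 for s, y in zip(scores, labels) if not y and s >= cutoff)
    return tp / (tp + fn), tn / (tn + fp)

def best_cutoff(scores, labels):
    """Cut-off among the observed scores that maximizes Youden's J."""
    return max(
        sorted(set(scores)),
        key=lambda c: sum(sens_spec(scores, labels, c)) - 1,
    )

# Toy similarity scores (hypothetical): 4 positive and 5 negative reports.
SCORES = [0.9, 0.8, 0.6, 0.2, 0.5, 0.3, 0.1, 0.0, 0.0]
LABELS = [1, 1, 1, 1, 0, 0, 0, 0, 0]

cut = best_cutoff(SCORES, LABELS)
print(cut, sens_spec(SCORES, LABELS, cut))
```

For screening, a cut-off that favours sensitivity over specificity may be preferable to the J-optimal one, since a missed positive report costs more than an extra manual review.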
Discussion
• Using the ontology, the sensitivity is too low for efficient screening
  • Brighton guidelines are not meant for screening, but for diagnosis confirmation
• With the expanded MedDRA SMQ, we improved on the screening result and reached 92% sensitivity, 87% specificity
• Using both approaches concurrently yields the best screening results
Key outcomes
• Current encoding standards don’t allow for complete representation of events
  • e.g., missing temporality descriptors (sudden onset, rapid progression)
  • Critical for diagnosis confirmation and causality assessment
• Information lacking in reports from surveillance systems
  • Not assessed? Not recorded? Negative?
• Logical translation of guidelines allows for better detection of inconsistencies and errors
  • We are working with the Brighton Collaboration towards adding a logical formalization to the existing case definitions
Use of the ontology for reporting
• In current systems:
  • Fast screening -> fast detection of potentially positive reports
  • Reporter can be sent a more detailed report, e.g. “Brighton-based anaphylaxis report form”
• In future systems:
  • Implementation of the ontology-based system at the time of data entry
  • Provides labels and textual definitions for each term
  • Enables consistency checking
Next steps: IRIDA project
• Integrated Rapid Infectious Disease Analysis
• http://www.irida.ca
• IRIDA is a bioinformatics platform for genomic epidemiology analysis to improve outbreak surveillance and detection
• Collaboration between academia and public health
• Ontologies will be developed to annotate clinical, lab and epidemiology data, and integrate them for further analysis
Acknowledgements
• Ryan Brinkman, BC Cancer Agency, Vancouver, Canada
• Alan Ruttenberg, University at Buffalo, New York, USA
• Julie Lafleche, Robert Pless, Barbara Law, Public Health Agency of Canada, Ottawa, Ontario
• Jan Bonhoeffer, Brighton Collaboration, Basel, Switzerland
• IRIDA project: Fiona Brinkman, William Hsiao