1 cTAKES The clinical Text Analysis and Knowledge Extraction System

Upload: robyn-stanley

Post on 03-Jan-2016


Page 1: 1 cTAKES The clinical Text Analysis and Knowledge Extraction System

cTAKES

The clinical Text Analysis and Knowledge Extraction System

Page 2:

cTAKES Overview

• Open source software
• Natural Language Processing (NLP)
• Developed at Mayo Clinic
• Contributed to the Open Health Natural Language Processing (OHNLP) Consortium
• Built on the Apache UIMA framework
  • Unstructured Information Management Architecture
  • UIMA framework itself is also open source

Page 3:

Open Health Natural Language Processing (OHNLP) Consortium

Goal: Foster an open-source collaborative community around clinical NLP that can deliver best-of-breed annotators, leverage the dynamic features of UIMA flow-control, and establish the infrastructure for clinical NLP.

www.ohnlp.org

Page 4:

www.ohnlp.org

Gateway to
• News
• Documentation
• Downloads
• Forums for asking questions
• Bug tracker for reporting issues
• List of publications

Page 5:

cTAKES Goals

• Phenotype extraction
• Generic – to be used for a variety of retrievals and use cases
• Expandable – at the level of both the information model and the methods
• Modular
• Cutting-edge technologies – best methods combining existing practices and novel research with rapid technology transfer
• Best software practices (80M+ notes)
• Commitment to both R and D in R&D

Page 6:

Original cTAKES Components

• Sentence boundary detection (OpenNLP technology)
• Tokenization (rule-based)
• Morphologic normalization (NLM’s LVG)
• POS tagging (OpenNLP technology)
• Shallow parsing (OpenNLP technology)
• Named Entity Recognition
  • Dictionary mapping (lookup algorithm)
  • Negation and context identification (both based on NegEx)
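
To make the flow of these components concrete, here is a minimal sketch in Python of three of the stages: sentence splitting, rule-based tokenization, and dictionary lookup with a NegEx-style negation check. It is a toy illustration, not cTAKES code. The two-entry dictionary, the short trigger list, the five-token negation window, and every function name are assumptions made for this example; cTAKES itself is built from UIMA annotators and uses far larger dictionaries.

```python
import re
from dataclasses import dataclass

# Hypothetical two-entry dictionary, for illustration only.
TOY_DICTIONARY = {
    ("unstable", "angina"): "disease/disorder",
    ("chest", "pain"): "sign/symptom",
}

# A few NegEx-style trigger phrases (an illustrative subset, not the NegEx list).
NEGATION_TRIGGERS = [("no", "evidence", "of"), ("denies",), ("no",)]

# Number of tokens after a trigger that are treated as negated (assumed value).
NEGATION_WINDOW = 5


@dataclass
class Mention:
    text: str
    entity_type: str
    negated: bool


def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def tokenize(sentence):
    """Rule-based tokenizer: runs of word characters, or single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", sentence)


def negated_positions(lowered_tokens):
    """Return the token indices that fall inside the window after a trigger."""
    negated = set()
    for i in range(len(lowered_tokens)):
        for trigger in NEGATION_TRIGGERS:
            n = len(trigger)
            if tuple(lowered_tokens[i:i + n]) == trigger:
                end = min(len(lowered_tokens), i + n + NEGATION_WINDOW)
                negated.update(range(i + n, end))
    return negated


def find_mentions(sentence):
    """Look up dictionary terms in one sentence and flag the negated ones."""
    tokens = tokenize(sentence)
    lowered = [t.lower() for t in tokens]
    negated = negated_positions(lowered)
    mentions = []
    for i in range(len(tokens)):
        for term, entity_type in TOY_DICTIONARY.items():
            n = len(term)
            if tuple(lowered[i:i + n]) == term:
                mentions.append(
                    Mention(" ".join(tokens[i:i + n]), entity_type, i in negated)
                )
    return mentions


def annotate(text):
    """Run the toy pipeline: sentences, then tokens, then lookup and negation."""
    return [m for s in split_sentences(text) for m in find_mentions(s)]
```

Calling `annotate("No evidence of unstable angina. Patient reports chest pain.")` yields one negated disease/disorder mention and one non-negated sign/symptom mention. Negation is scoped per sentence, which is why the trigger in the first sentence does not affect the second.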

Page 7:

Original cTAKES Named Entities

• Drug mentions
• Disease/disorder mentions
• Sign/symptom mentions
• Anatomical site mentions

With these attributes:
• RxNorm code or Concept Unique Identifier (CUI) and SNOMED-CT codes
• Negation (denies chest pain)
• Status (history of, family history of, possible/probable)

Page 8:

Additional cTAKES Components

• Smoking status classifier
• More detailed drug mention annotator
  • dosage
  • route
  • form
  • drug change status
  • and more
• Peripheral Artery Disease (PAD) annotator
• Dependency parser

Page 9:

Output Example: Disorder Object

“No evidence of unstable angina.”

• Text: unstable angina
• Associated codes:
  • SNOMED 4557003
  • UMLS CUI C0002965
• Named entity type: disease/disorder
• Negation: true
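
The disorder object above can be pictured as a small record. The following Python sketch is one possible representation, written only to illustrate the fields listed on this slide. The class name, field names, and helper function are hypothetical and do not reflect the actual cTAKES type system, which is defined through UIMA.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class NamedEntity:
    """Hypothetical record mirroring the attributes shown on the slide."""
    text: str
    entity_type: str
    codes: Dict[str, str] = field(default_factory=dict)
    negated: bool = False
    # Status values named in the slides include "history of",
    # "family history of" and "possible/probable"; None means not set.
    status: Optional[str] = None


def asserted_entities(entities: List[NamedEntity]) -> List[NamedEntity]:
    """Keep only the entities that are not negated."""
    return [e for e in entities if not e.negated]


# The example from the slide: "No evidence of unstable angina."
unstable_angina = NamedEntity(
    text="unstable angina",
    entity_type="disease/disorder",
    codes={"SNOMED": "4557003", "UMLS CUI": "C0002965"},
    negated=True,
)
```

Because the mention is negated, `asserted_entities([unstable_angina])` returns an empty list. Downstream phenotype queries would typically need this kind of filter so that "no evidence of" findings are not counted as positive.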

Page 10:

cTAKES Configuration Options

• XML configuration files
• Control many things, such as
  • Dictionary location
  • Dictionary format
  • Which dictionaries to use
  • Type of input (plain text or CDA)
• Forums contain details on creating your own dictionary
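
As a rough illustration of how options like these could be read from an XML file, the Python sketch below parses a made-up configuration. The XML element names, attribute names, and file paths are all invented for this example and are not the real cTAKES descriptor schema; consult the cTAKES documentation and forums for the actual file layout.

```python
import xml.etree.ElementTree as ET

# Hypothetical configuration: every element and attribute name here is an
# assumption for illustration, not the actual cTAKES XML format.
HYPOTHETICAL_CONFIG = """
<pipelineConfig>
  <input type="plain text"/>
  <dictionaries>
    <dictionary name="disorders" format="csv"
                location="dict/disorders.csv" enabled="true"/>
    <dictionary name="drugs" format="csv"
                location="dict/drugs.csv" enabled="false"/>
  </dictionaries>
</pipelineConfig>
"""


def load_config(xml_text):
    """Return the input type and the list of enabled dictionaries."""
    root = ET.fromstring(xml_text.strip())
    input_type = root.find("input").get("type")
    dictionaries = [
        {
            "name": d.get("name"),
            "format": d.get("format"),
            "location": d.get("location"),
        }
        for d in root.iter("dictionary")
        if d.get("enabled") == "true"
    ]
    return {"input_type": input_type, "dictionaries": dictionaries}


config = load_config(HYPOTHETICAL_CONFIG)
```

Here `config` records the input type and only the dictionary marked as enabled, which corresponds to the "which dictionaries to use" option in the list above.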

Page 11:

cTAKES Methods

Mayo clinical Text Analysis and Knowledge Extraction System (cTAKES): architecture, component evaluation and applications. Guergana K Savova, James J Masanz, Philip V Ogren, Jiaping Zheng, Sunghwan Sohn, Karin C Kipper-Schuler, Christopher G Chute. JAMIA 2010;17:507-513.

Page 12:

References

http://www.ohnlp.org

http://uima.apache.org