1
cTAKES
The clinical Text Analysis and Knowledge Extraction System
2
cTAKES Overview
• Open source software
• Natural Language Processing (NLP)
• Developed at Mayo Clinic
• Contributed to the Open Health Natural Language Processing (OHNLP) Consortium
• Built on the Apache UIMA framework
  • Unstructured Information Management Architecture
  • UIMA framework itself is also open source
3
Open Health Natural Language Processing (OHNLP) Consortium
Goal: Foster an open-source collaborative community around clinical NLP that can deliver best-of-breed annotators, leverage the dynamic features of UIMA flow-control, and establish the infrastructure for clinical NLP.
www.ohnlp.org
4
www.ohnlp.org
Gateway to
• News
• Documentation
• Downloads
• Forums for asking questions
• Bug tracker for reporting issues
• List of publications
5
cTAKES Goals
• Phenotype extraction
• Generic – to be used for a variety of retrievals and use cases
• Expandable – at the information model level and methods
• Modular
• Cutting edge technologies – best methods combining existing practices and novel research with rapid technology transfer
• Best software practices (80M+ notes)
• Commitment to both R and D in R&D
6
Original cTAKES Components
• Sentence boundary detection (OpenNLP technology)
• Tokenization (rule-based)
• Morphologic normalization (NLM’s LVG)
• POS tagging (OpenNLP technology)
• Shallow parsing (OpenNLP technology)
• Named Entity Recognition
  • Dictionary mapping (lookup algorithm)
  • Negation and context identification (both based on NegEx)
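To make the flow of these components concrete, here is a minimal Python sketch of a toy pipeline. It is not cTAKES code: the real system is built on UIMA and uses OpenNLP, LVG, and NegEx-based components, whereas this sketch substitutes a naive regex sentence splitter, a regex tokenizer, a tiny hypothetical dictionary, and a crude trigger-word check loosely inspired by NegEx. Morphologic normalization, POS tagging, and shallow parsing are skipped entirely. All names (`DICTIONARY`, `NEGATION_TRIGGERS`, `run_pipeline`, and so on) are assumptions made for illustration.

```python
import re
from dataclasses import dataclass

# Hypothetical toy dictionary: tuple of lowercase tokens -> entity type.
DICTIONARY = {
    ("unstable", "angina"): "disease/disorder",
    ("chest", "pain"): "sign/symptom",
    ("aspirin",): "drug",
}

# Hypothetical trigger words; real NegEx uses a much richer trigger list.
NEGATION_TRIGGERS = {"no", "denies", "without"}
NEGATION_WINDOW = 5  # how many preceding tokens to inspect


@dataclass
class Mention:
    text: str
    entity_type: str
    negated: bool


def split_sentences(text):
    """Naive sentence boundary detection: split after ., ! or ?."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    """Rule-based tokenization: runs of word characters or single punctuation."""
    return re.findall(r"\w+|[^\w\s]", sentence)


def find_mentions(tokens):
    """Longest-match dictionary lookup plus a simple negation check."""
    lowered = [t.lower() for t in tokens]
    max_len = max(len(key) for key in DICTIONARY)
    mentions = []
    i = 0
    while i < len(tokens):
        matched = False
        # Try the longest candidate span first.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            key = tuple(lowered[i:i + n])
            if key in DICTIONARY:
                window = lowered[max(0, i - NEGATION_WINDOW):i]
                negated = any(w in NEGATION_TRIGGERS for w in window)
                mentions.append(
                    Mention(" ".join(tokens[i:i + n]), DICTIONARY[key], negated)
                )
                i += n
                matched = True
                break
        if not matched:
            i += 1
    return mentions


def run_pipeline(text):
    """Run sentence splitting, tokenization, lookup, and negation in order."""
    mentions = []
    for sentence in split_sentences(text):
        mentions.extend(find_mentions(tokenize(sentence)))
    return mentions


if __name__ == "__main__":
    for m in run_pipeline("No evidence of unstable angina. Patient takes aspirin daily."):
        print(m)
```

Processing one sentence at a time keeps a negation trigger in one sentence from affecting a mention in the next, which is one reason sentence boundary detection comes first in the list above.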
7
Original cTAKES Named Entities
• Drug mentions
• Disease/disorder mentions
• Sign/symptom mentions
• Anatomical site mentions
With these attributes
• RxNorm code or Concept Unique Identifier (CUI) and SNOMED-CT codes
• Negation (denies chest pain)
• Status (history of, family history of, possible/probable)
8
Additional cTAKES Components
• Smoking status classifier
• More detailed drug mention annotator
  • dosage
  • route
  • form
  • drug change status
  • and more
• Peripheral Artery Disease (PAD) annotator
• Dependency parser
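As a rough illustration of what attributes such as dosage, route, and form look like once pulled out of text, the following Python sketch uses a single regular expression. This is an assumption-laden toy, not how the cTAKES drug mention annotator works: the pattern, the unit and route vocabularies, and the function name `parse_drug_mention` are all hypothetical, and the pattern only handles the fixed order drug, dosage, route, form.

```python
import re

# Hypothetical pattern: "<drug> <number> <unit> <route> <form>" in that order.
# The unit, route, and form lists are tiny illustrative vocabularies.
DRUG_PATTERN = re.compile(
    r"(?P<drug>[A-Za-z]+)\s+"
    r"(?P<dosage>\d+(?:\.\d+)?\s*(?:mg|mcg|g|ml))\s+"
    r"(?P<route>oral|iv|topical)\s+"
    r"(?P<form>tablet|capsule|solution)",
    re.IGNORECASE,
)


def parse_drug_mention(text):
    """Return a dict of drug attributes, or None if the pattern does not match."""
    match = DRUG_PATTERN.search(text)
    return match.groupdict() if match else None


if __name__ == "__main__":
    print(parse_drug_mention("Started aspirin 81 mg oral tablet daily."))
```

Clinical notes rarely follow one fixed order, so a production annotator needs far more flexible matching than this; the sketch only shows the shape of the extracted attributes.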
9
Output Example: Disorder Object
“No evidence of unstable angina.”
• Text: unstable angina
• Associated codes:
  • SNOMED 4557003
  • UMLS CUI C0002965
• Named entity type: disease/disorder
• Negation: true
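The disorder object above can be pictured as a small record. The Python sketch below is one possible representation, assuming hypothetical class and field names (`NamedEntityMention`, `codes`, `status`); cTAKES itself stores annotations in UIMA type-system objects, not in this class. The values are the ones shown on the slide.

```python
from dataclasses import dataclass, field, asdict
from typing import Dict, Optional


@dataclass
class NamedEntityMention:
    """Hypothetical container for one extracted named entity."""
    text: str
    entity_type: str
    codes: Dict[str, str] = field(default_factory=dict)  # coding system -> code
    negated: bool = False
    status: Optional[str] = None  # e.g. "history of"; not given in the slide example


# The example from the slide: "No evidence of unstable angina."
example = NamedEntityMention(
    text="unstable angina",
    entity_type="disease/disorder",
    codes={"SNOMED": "4557003", "UMLS_CUI": "C0002965"},
    negated=True,
)

if __name__ == "__main__":
    print(asdict(example))
```

Keeping negation as an attribute of the mention, instead of discarding negated mentions, lets downstream retrieval distinguish "no evidence of unstable angina" from a positive finding.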
10
cTAKES Configuration Options
• XML configuration files
• Control many things, such as
  • Dictionary location
  • Dictionary format
  • Which dictionaries to use
  • Type of input (plain text or CDA)
• Forums contain details on creating your own dictionary
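To give a feel for configuration by XML, the Python sketch below reads name/value settings from a small XML fragment. The fragment is a simplified stand-in loosely modeled on UIMA-style name/value parameter settings; it is not an actual cTAKES descriptor, and the parameter names (`DictionaryLocation`, `DictionaryFormat`, `InputType`), the file path, and the function `read_settings` are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Simplified, hypothetical configuration fragment (not a real cTAKES file).
SAMPLE_CONFIG = """
<configurationParameterSettings>
  <nameValuePair>
    <name>DictionaryLocation</name>
    <value><string>resources/example_dictionary.csv</string></value>
  </nameValuePair>
  <nameValuePair>
    <name>DictionaryFormat</name>
    <value><string>csv</string></value>
  </nameValuePair>
  <nameValuePair>
    <name>InputType</name>
    <value><string>plain text</string></value>
  </nameValuePair>
</configurationParameterSettings>
"""


def read_settings(xml_text):
    """Collect each <nameValuePair> into a {name: value} dictionary."""
    root = ET.fromstring(xml_text.strip())
    settings = {}
    for pair in root.findall("nameValuePair"):
        name = pair.findtext("name")
        value = pair.findtext("value/string")
        settings[name] = value
    return settings


if __name__ == "__main__":
    for key, value in read_settings(SAMPLE_CONFIG).items():
        print(f"{key} = {value}")
```

For the real parameter names and file layout, consult the cTAKES documentation and forums mentioned above.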
11
cTAKES Methods
Mayo clinical Text Analysis and Knowledge Extraction System (cTAKES): architecture, component evaluation and applications.
Guergana K Savova, James J Masanz, Philip V Ogren, Jiaping Zheng, Sunghwan Sohn, Karin C Kipper-Schuler, Christopher G Chute.
JAMIA 2010;17:507-513