The UMLS* Metathesaurus*: Lessons for Metadata Registries
Betsy L. Humphreys
http://www.nlm.nih.gov
* UMLS and Metathesaurus are registered trademarks of the National Library of Medicine.
Outline of Presentation
• Brief overview -- NLM’s Unified Medical Language System (UMLS) Project and its products
• Description of the UMLS Metathesaurus: content, construction methods, characteristics
• Metadata questions/issues interspersed throughout
UMLS Purpose
• Make it easy for health professionals and researchers to retrieve and integrate relevant information from disparate automated sources, e.g.:
   – computer-based patient records
   – factual databanks
   – bibliographic and full-text databases
   – expert systems
UMLS Focus -- Conceptual Connections
• Build knowledge sources that intelligent programs can use to overcome:
   – disparities in the language used by different users and in different information sources
   – difficulty in identifying which of many information sources is relevant
UMLS Knowledge Sources
Multi-purpose tools or “intellectual middleware” for System Developers
• Metathesaurus
• SPECIALIST lexicon and lexical programs
• Semantic Network
UMLS Knowledge Sources Distribution
• Annual updates since 1990
• Free under a license agreement with NLM
   – Separate license agreements with vocabulary producers are needed for some uses of some vocabularies in the Metathesaurus
• Available to licensed users (~900) via Internet server and on CDs
• Relational format (ASN.1 retired due to lack of use, XML being developed)
1999 UMLS Metathesaurus
• 626,313 concepts
• 1,134,413 “terms” (Eye, Eyes, eye = 1 term)
• 1,358,891 “strings”, i.e., concept names (Eye, Eyes, eye = 3 strings)
• ~50 source vocabularies
UMLS Metathesaurus
• Concepts, terms, and attributes from many controlled “vocabularies”
• New inter-source relationships, definitional information, use information
• Scope determined by combined scope of source vocabularies
UMLS Source “Vocabularies”
• Widely varying purposes, structures, and properties, but all are in essence “sets of valid values” for data elements:
   – Thesauri, e.g., MeSH
   – Statistical classifications, e.g., ICD
   – Billing codes, e.g., CPT
   – Clinical coding systems, e.g., SNOMED
   – Lists of controlled terms, e.g., COSTAR, HL7 value sets
Metathesaurus Construction
• Convert machine-readable vocabulary sources to UMLS “normal” form, making source semantics explicit
• Merge, using source semantics and lexical processing techniques
• Edit the results, adding relationships and semantic information
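The merge step can be illustrated with a union-find over source terms. This is a minimal sketch under stated assumptions, not NLM's actual procedure: each term is a hypothetical `(source, source_id, name, rel)` tuple, where preferred terms and synonyms share their source concept but a narrower entry term (`rel == "nar"`) stays separate ("source semantics"). The cross-source lexical match is a crude placeholder (case, commas, and word order ignored). The Read codes `R1` and `R2` are invented for illustration.

```python
from collections import defaultdict


class UnionFind:
    """Disjoint-set forest used to group source terms into concepts."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def source_key(src, sid, name, rel):
    # Source semantics: preferred terms and synonyms share their source concept,
    # but a narrower entry term ("nar") is kept apart as a concept of its own.
    return (src, sid, name) if rel == "nar" else (src, sid)


def lexical_key(name):
    # Hypothetical crude lexical match: ignore case, commas, and word order.
    return " ".join(sorted(name.lower().replace(",", " ").split()))


def merge(terms):
    """terms: list of (source, source_id, name, rel); returns groups of (source, name)."""
    uf = UnionFind()
    first_seen = {}  # merge key -> index of the first term carrying that key
    for i, term in enumerate(terms):
        uf.find(i)  # register every term, even if it never merges
        for key in (("src",) + source_key(*term), ("lex", lexical_key(term[2]))):
            if key in first_seen:
                uf.union(first_seen[key], i)
            else:
                first_seen[key] = i
    groups = defaultdict(list)
    for i, (src, _sid, name, _rel) in enumerate(terms):
        groups[uf.find(i)].append((src, name))
    return list(groups.values())


# MeSH data from the presentation; Read source ids R1/R2 are hypothetical.
TERMS = [
    ("MeSH", "D015154", "Esophageal Motility Disorders", "pref"),
    ("MeSH", "D015154", "Esophageal Dysmotility", "syn"),
    ("MeSH", "D015154", "Nutcracker Esophagus", "nar"),
    ("Read", "R1", "Esophageal motility disorders", "pref"),
    ("Read", "R1", "Esophageal dysmotility", "syn"),
    ("Read", "R1", "Oesophageal dysmotility", "syn"),
    ("Read", "R2", "Nutcracker esophagus", "pref"),
    ("Read", "R2", "Symptomatic esophageal peristalsis", "syn"),
]

for group in merge(TERMS):
    print(group)
```

Honoring the "nar" flag is what keeps Nutcracker Esophagus out of the Esophageal Motility Disorders concept. Merging every MeSH entry term into its heading would collapse the two concepts.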
$100,000 Metadata Questions
• What constitutes “explicit semantics” for Metadata?
   – At a minimum, interpretable by humans
   – Preferably, interpretable by machines
• How will the significant human effort required to create useful Metadata registries be organized and funded?
Metathesaurus Characteristics (1)
• Concept organization
• Many sources in a common database format
• Representation of the meaning in each source vocabulary
• Explicit tagging of each source vocabulary’s information
Current MeSH -- Organized by Preferred Term
D015154 Esophageal Motility Disorders (MH)
   Esophageal Dysmotility (ET -- syn)
   Nutcracker Esophagus (ET -- nar)
(MH = MeSH heading; ET = entry term; syn = synonym; nar = narrower)
UMLS Metathesaurus -- Organized by Concept
C0014858 Esophageal Motility Disorders (MeSH, Read)
   Esophageal Dysmotility (MeSH, Read)
   Oesophageal Dysmotility (Read)
C0028705 Nutcracker Esophagus (MeSH, Read, ....)
   Symptomatic esophageal peristalsis (Read)
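Organizing by concept amounts to keying every name on a concept identifier rather than on one source's preferred term. The sketch below builds that index from the two Metathesaurus concepts on the slide. The function names are hypothetical, and the sources elided on the slide ("....") are not filled in.

```python
from collections import defaultdict

# Concept-organized rows from the slide: (concept id, name, sources).
# The slide's "(MeSH, Read, ....)" elides further sources; only MeSH and Read are listed.
ROWS = [
    ("C0014858", "Esophageal Motility Disorders", ("MeSH", "Read")),
    ("C0014858", "Esophageal Dysmotility", ("MeSH", "Read")),
    ("C0014858", "Oesophageal Dysmotility", ("Read",)),
    ("C0028705", "Nutcracker Esophagus", ("MeSH", "Read")),
    ("C0028705", "Symptomatic esophageal peristalsis", ("Read",)),
]


def build_indexes(rows):
    """Return (concept id -> [(name, sources)], lower-cased name -> concept id)."""
    by_concept = defaultdict(list)
    by_name = {}
    for cui, name, sources in rows:
        by_concept[cui].append((name, sources))
        by_name[name.lower()] = cui
    return dict(by_concept), by_name


def names_for(name, by_concept, by_name):
    """Every name, from any source, attached to the concept that `name` expresses."""
    cui = by_name.get(name.lower())
    return [] if cui is None else [n for n, _ in by_concept[cui]]


BY_CONCEPT, BY_NAME = build_indexes(ROWS)
print(names_for("oesophageal dysmotility", BY_CONCEPT, BY_NAME))
# ['Esophageal Motility Disorders', 'Esophageal Dysmotility', 'Oesophageal Dysmotility']
```

Unlike the MeSH view, a lookup on any name for C0014858 never returns Nutcracker Esophagus, because that name belongs to its own concept.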
Metadata Question
• What is the operational definition of synonymy in the realm of Metadata element names?
   – Or: when does a distinction make a difference in Metadata?
Metadata Question
• Will the Metathesaurus approach to “multiple meanings” work for data element names?
   – E.g., Country:
      • Country of Birth
      • Country of Residence
      • Country of Publication
   – REMINDER: different data elements can have the SAME set of valid values
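The reminder can be made concrete with a small model in which several data elements point to one shared set of valid values. The `ValueDomain` and `DataElement` classes are hypothetical, loosely in the spirit of ISO/IEC 11179 terminology rather than any registry's API, and the country codes are placeholders.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ValueDomain:
    """A named set of valid values (hypothetical model, not a registry API)."""
    name: str
    values: frozenset


@dataclass(frozen=True)
class DataElement:
    """A data element: its own name plus the value domain it draws from."""
    name: str
    domain: ValueDomain

    def is_valid(self, value):
        return value in self.domain.values


# Placeholder country codes, for illustration only.
COUNTRY = ValueDomain("Country", frozenset({"FR", "GB", "US"}))
ELEMENTS = [
    DataElement("Country of Birth", COUNTRY),
    DataElement("Country of Residence", COUNTRY),
    DataElement("Country of Publication", COUNTRY),
]


def group_by_domain(elements):
    """Group element names by value domain. Shared valid values do NOT imply synonymy."""
    groups = defaultdict(list)
    for element in elements:
        groups[element.domain.name].append(element.name)
    return dict(groups)


print(group_by_domain(ELEMENTS))
```

All three elements land in one group, so a merge keyed only on the set of valid values would wrongly treat them as synonyms. Their identity has to come from the element name and meaning, not from the domain.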
Source records for concept C0007452 (Cattle):
SO|C0007452|L0007452|S0023004|LCH90|PT|U000852|0|
SO|C0007452|L0007452|S0023004|MSH99|MH|D002417|0|
SO|C0007452|L0007452|S0023004|PSY94|PT|08010|3|
SO|C0007452|L0007452|S0023004|SNM2|RT|E-4994|3|
SO|C0007452|L0007452|S0023004|SNMI98|SY|L-80100|3|
SO|C0007452|L0010229|S0002635|RCD98|PT|X79op|3|
SO|C0007452|L0010229|S0002635|SNM2|RT|E-4994|3|
SO|C0007452|L0010229|S0002635|SNMI98|SY|L-80100|3|
SO|C0007452|L0010229|S0364778|PSY94|ET|12270|3|
SO|C0007452|L0010229|S0417039|AOD95|DE|0000014422|0|
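These source (SO) rows can be parsed with a few lines of code to show how each string carries explicit source tags. The field names are an assumption, not stated on the slide: concept id, term id, string id, source abbreviation, term type, source code, and source restriction level.

```python
from collections import defaultdict

# ASSUMED meaning of the seven fields after the "SO" tag (not confirmed by the slide).
SO_FIELDS = ("cui", "lui", "sui", "sab", "tty", "code", "srl")

RAW_SO = """\
SO|C0007452|L0007452|S0023004|LCH90|PT|U000852|0|
SO|C0007452|L0007452|S0023004|MSH99|MH|D002417|0|
SO|C0007452|L0007452|S0023004|PSY94|PT|08010|3|
SO|C0007452|L0007452|S0023004|SNM2|RT|E-4994|3|
SO|C0007452|L0007452|S0023004|SNMI98|SY|L-80100|3|
SO|C0007452|L0010229|S0002635|RCD98|PT|X79op|3|
SO|C0007452|L0010229|S0002635|SNM2|RT|E-4994|3|
SO|C0007452|L0010229|S0002635|SNMI98|SY|L-80100|3|
SO|C0007452|L0010229|S0364778|PSY94|ET|12270|3|
SO|C0007452|L0010229|S0417039|AOD95|DE|0000014422|0|
"""


def parse_so(text):
    """Turn pipe-delimited SO rows into dicts keyed by SO_FIELDS."""
    records = []
    for line in text.splitlines():
        if not line.strip():
            continue
        tag, *values = line.rstrip("|").split("|")
        if tag != "SO" or len(values) != len(SO_FIELDS):
            raise ValueError(f"unexpected SO row: {line!r}")
        records.append(dict(zip(SO_FIELDS, values)))
    return records


def sources_by_string(records):
    """For each string id, list the (source, term type, source code) tags asserting it."""
    tags = defaultdict(list)
    for r in records:
        tags[r["sui"]].append((r["sab"], r["tty"], r["code"]))
    return dict(tags)


RECORDS = parse_so(RAW_SO)
for sui, tags in sources_by_string(RECORDS).items():
    print(sui, tags)
```

Codes are kept as strings so values such as `0000014422` keep their leading zeros.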
Metadata Question
• What level of explicit tagging is needed in Metadata Registries?
Metathesaurus Characteristics (2)
• Added relationships between concepts and terms from different vocabularies
• Added definitional and use information
• “Context-free” unique identifiers – the concept “names” that never change
• Normalized word and string indexes produced using UMLS lexical tools
Concept names for C0007452:
CON|C0007452|ENG|P|L0007452|PF|S0023004|Cattle|
CON|C0007452|ENG|S|L0010229|PF|S0002635|Cow|
CON|C0007452|ENG|S|L0010229|VC|S0417039|cow|
CON|C0007452|ENG|S|L0010229|VP|S0364778|Cows|
CON|C0007452|ENG|S|L0530279|PF|S0604672|Bovine, NOS|
CON|C0007452|ENG|S|L0530279|VO|S1428975|bovines|
CON|C0007452|ENG|S|L0530314|PF|S0604663|Bovine species|
CON|C0007452|ENG|S|L0530314|VC|S0596242|BOVINE SPECIES|
CON|C0007452|ENG|S|L1120284|PF|S1344523|bovid|
CON|C0007452|ENG|S|L1193708|PF|S1428974|Bovidae|
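The CON rows show the three identifier levels: concept (C…), term (L…), and string (S…). The sketch below nests strings under terms under concepts. The field names are an assumption: concept id, language, term status, term id, string type, string id, string. The rows are copied from the slide, with word spacing restored in "Bovine, NOS" and "Bovine species".

```python
from collections import defaultdict

# ASSUMED meaning of the fields after the "CON" tag: concept id, language,
# term status, term id, string type, string id, string.
CON_FIELDS = ("cui", "lat", "ts", "lui", "stt", "sui", "str")

RAW_CON = """\
CON|C0007452|ENG|P|L0007452|PF|S0023004|Cattle|
CON|C0007452|ENG|S|L0010229|PF|S0002635|Cow|
CON|C0007452|ENG|S|L0010229|VC|S0417039|cow|
CON|C0007452|ENG|S|L0010229|VP|S0364778|Cows|
CON|C0007452|ENG|S|L0530279|PF|S0604672|Bovine, NOS|
CON|C0007452|ENG|S|L0530279|VO|S1428975|bovines|
CON|C0007452|ENG|S|L0530314|PF|S0604663|Bovine species|
CON|C0007452|ENG|S|L0530314|VC|S0596242|BOVINE SPECIES|
CON|C0007452|ENG|S|L1120284|PF|S1344523|bovid|
CON|C0007452|ENG|S|L1193708|PF|S1428974|Bovidae|
"""


def parse_con(text):
    """Turn pipe-delimited CON rows into dicts keyed by CON_FIELDS."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue
        tag, *values = line.rstrip("|").split("|")
        if tag != "CON" or len(values) != len(CON_FIELDS):
            raise ValueError(f"unexpected CON row: {line!r}")
        rows.append(dict(zip(CON_FIELDS, values)))
    return rows


def concept_tree(rows):
    """Nest strings under terms (term id) under concepts (concept id)."""
    tree = defaultdict(lambda: defaultdict(list))
    for r in rows:
        tree[r["cui"]][r["lui"]].append(r["str"])
    return tree


TREE = concept_tree(parse_con(RAW_CON))
for lui, strings in TREE["C0007452"].items():
    print(lui, strings)
```

"Cow", "cow", and "Cows" share one term id but keep three string ids. This is the same term/string distinction as "Eye, Eyes, eye" in the 1999 counts.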
Context of C0007452 in the Read thesaurus (RCD98):
CXT|C0007452|S0002635|RCD98|X79op|1|ANC|1|Read thesaurus|C0338370|.....|||
CXT|C0007452|S0002635|RCD98|X79op|1|ANC|2|Organisms|C0029235|XM0Nm|||
CXT|C0007452|S0002635|RCD98|X79op|1|ANC|3|Animal|C0003062|X79ol|||
CXT|C0007452|S0002635|RCD98|X79op|1|ANC|4|Vertebrate|C0042567|XM0OI|||
CXT|C0007452|S0002635|RCD98|X79op|1|ANC|5|Mammal|C0024660|X79pW|||
CXT|C0007452|S0002635|RCD98|X79op|1|CCP||Cow|C0007452|X79op|||
CXT|C0007452|S0002635|RCD98|X79op|1|SIB||Bat -animal|C0008139|X79om|||
CXT|C0007452|S0002635|RCD98|X79op|1|SIB||Cat|C0677516|X79oo|||
CXT|C0007452|S0002635|RCD98|X79op|1|SIB||Deer|C0011133|X79oq|||
CXT|C0007452|S0002635|RCD98|X79op|1|SIB||Dog|C0012984|X79or|||
CXT|C0007452|S0002635|RCD98|X79op|1|SIB||Horse|C0019944|X79ou|||
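The CXT rows preserve the source's hierarchy, so the Read thesaurus path to "Cow" and its siblings can be rebuilt from them. This sketch assumes a field layout not stated on the slide. After the tag come: concept id, string id, source, source code, context number, context level (assumed ANC = ancestor, CCP = the concept itself, SIB = sibling), rank, context string, related concept id, hierarchical code, and two trailing fields that are empty here. The text "Read thesaurus" is shown with its space restored.

```python
# ASSUMED meaning of the twelve fields after the "CXT" tag (not confirmed by the slide).
CXT_FIELDS = ("cui", "sui", "sab", "code", "cxn", "cxl", "rnk", "cxs",
              "cui2", "hcd", "rela", "xc")

RAW_CXT = """\
CXT|C0007452|S0002635|RCD98|X79op|1|ANC|1|Read thesaurus|C0338370|.....|||
CXT|C0007452|S0002635|RCD98|X79op|1|ANC|2|Organisms|C0029235|XM0Nm|||
CXT|C0007452|S0002635|RCD98|X79op|1|ANC|3|Animal|C0003062|X79ol|||
CXT|C0007452|S0002635|RCD98|X79op|1|ANC|4|Vertebrate|C0042567|XM0OI|||
CXT|C0007452|S0002635|RCD98|X79op|1|ANC|5|Mammal|C0024660|X79pW|||
CXT|C0007452|S0002635|RCD98|X79op|1|CCP||Cow|C0007452|X79op|||
CXT|C0007452|S0002635|RCD98|X79op|1|SIB||Bat -animal|C0008139|X79om|||
CXT|C0007452|S0002635|RCD98|X79op|1|SIB||Cat|C0677516|X79oo|||
CXT|C0007452|S0002635|RCD98|X79op|1|SIB||Deer|C0011133|X79oq|||
CXT|C0007452|S0002635|RCD98|X79op|1|SIB||Dog|C0012984|X79or|||
CXT|C0007452|S0002635|RCD98|X79op|1|SIB||Horse|C0019944|X79ou|||
"""


def parse_cxt(text):
    """Turn CXT rows into dicts; empty fields (e.g. a sibling's rank) are kept as ''."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue
        parts = line.split("|")
        # Drop the leading tag and the empty piece after the final "|".
        tag, values = parts[0], parts[1:-1]
        if tag != "CXT" or len(values) != len(CXT_FIELDS):
            raise ValueError(f"unexpected CXT row: {line!r}")
        rows.append(dict(zip(CXT_FIELDS, values)))
    return rows


def ancestor_path(rows):
    """Ancestors ordered by rank, followed by the concept's own entry."""
    ancestors = sorted((r for r in rows if r["cxl"] == "ANC"), key=lambda r: int(r["rnk"]))
    self_rows = [r for r in rows if r["cxl"] == "CCP"]
    return [r["cxs"] for r in ancestors + self_rows]


def siblings(rows):
    return [r["cxs"] for r in rows if r["cxl"] == "SIB"]


CXT_ROWS = parse_cxt(RAW_CXT)
print(" > ".join(ancestor_path(CXT_ROWS)))
# Read thesaurus > Organisms > Animal > Vertebrate > Mammal > Cow
print("siblings:", ", ".join(siblings(CXT_ROWS)))
```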
Metadata Question
• In the realm of Metadata, what requires unique, permanent, context-free identifiers?
Normalization -- example
• disorder esophageal motility = normalized form of:
   – Esophageal Motility Disorders
   – Esophageal Motility Disorder
   – Motility Disorder, Esophageal
   – Disorder, Esophageal Motility
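A toy normalizer reproduces the example: lower-case, drop punctuation, uninflect each word, and sort the words alphabetically. This is a rough approximation, not the SPECIALIST lexical programs themselves. The hypothetical `uninflect` only strips a plural "s", whereas the real tools rely on the SPECIALIST lexicon.

```python
import re


def uninflect(word):
    """Crude stand-in for lexicon-based uninflection: drop a simple plural "s"."""
    if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word


def normalize(name):
    """Lower-case, drop punctuation, uninflect each word, and sort the words."""
    words = re.findall(r"[a-z0-9]+", name.lower())
    return " ".join(sorted(uninflect(w) for w in words))


VARIANTS = [
    "Esophageal Motility Disorders",
    "Esophageal Motility Disorder",
    "Motility Disorder, Esophageal",
    "Disorder, Esophageal Motility",
]
print({normalize(v) for v in VARIANTS})  # {'disorder esophageal motility'}
```

The plural rule is the weak point. It maps "analysis" to "analysi", which is why a lexicon is needed to decide what each word's base form is.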
Metadata Questions
• Are similar lexical resources needed as adjuncts to Metadata Registries?
• Are the UMLS lexical tools directly useful for Metadata efforts?