![Page 1: NLM Resources for Mining Biomedical Text2013/06/18 · NLM Resources for Mining Biomedical Text 3rd Annual i2b2 AUG Meeting & NLP Workshop Boston, MA June 18, 2013 Olivier Bodenreider](https://reader033.vdocument.in/reader033/viewer/2022041908/5e655dd164836152ca11f95e/html5/thumbnails/1.jpg)
NLM Resources for Mining Biomedical Text
3rd Annual i2b2 AUG Meeting & NLP Workshop Boston, MA June 18, 2013
Olivier Bodenreider
Lister Hill National Center for Biomedical Communications Bethesda, Maryland - USA
![Page 2: NLM Resources for Mining Biomedical Text2013/06/18 · NLM Resources for Mining Biomedical Text 3rd Annual i2b2 AUG Meeting & NLP Workshop Boston, MA June 18, 2013 Olivier Bodenreider](https://reader033.vdocument.in/reader033/viewer/2022041908/5e655dd164836152ca11f95e/html5/thumbnails/2.jpg)
Overview
• An example
• Types of resources for mining biomedical text
• Three types of resources
  – Lexical resources
  – Terminological resources
  – Ontological resources
![Page 3: NLM Resources for Mining Biomedical Text2013/06/18 · NLM Resources for Mining Biomedical Text 3rd Annual i2b2 AUG Meeting & NLP Workshop Boston, MA June 18, 2013 Olivier Bodenreider](https://reader033.vdocument.in/reader033/viewer/2022041908/5e655dd164836152ca11f95e/html5/thumbnails/3.jpg)
An example
Neurofibromatosis 2
![Page 4: NLM Resources for Mining Biomedical Text2013/06/18 · NLM Resources for Mining Biomedical Text 3rd Annual i2b2 AUG Meeting & NLP Workshop Boston, MA June 18, 2013 Olivier Bodenreider](https://reader033.vdocument.in/reader033/viewer/2022041908/5e655dd164836152ca11f95e/html5/thumbnails/4.jpg)
Neurofibromatosis type 2 (NF2) is often not recognised as a distinct entity from peripheral neurofibromatosis. NF2 is a predominantly intracranial condition whose hallmark is bilateral vestibular schwannomas. NF2 results from a mutation in the gene named merlin, located on chromosome 22.
[Uppal, S., and A. P. Coatesworth. “Neurofibromatosis Type 2.” Int J Clin Pract, 57, no. 8, 2003, pp. 698-703.]
Neurofibromatosis 2
![Page 5: NLM Resources for Mining Biomedical Text2013/06/18 · NLM Resources for Mining Biomedical Text 3rd Annual i2b2 AUG Meeting & NLP Workshop Boston, MA June 18, 2013 Olivier Bodenreider](https://reader033.vdocument.in/reader033/viewer/2022041908/5e655dd164836152ca11f95e/html5/thumbnails/5.jpg)
Neurofibromatosis type 2 (NF2) is often not recognised as a distinct entity from peripheral neurofibromatosis. NF2 is a predominantly intracranial condition whose hallmark is bilateral vestibular schwannomas. NF2 results from a mutation in the gene named merlin, located on chromosome 22.
Entity recognition
missed partial ambiguous
Lexical resources
Ontologies
![Page 6: NLM Resources for Mining Biomedical Text2013/06/18 · NLM Resources for Mining Biomedical Text 3rd Annual i2b2 AUG Meeting & NLP Workshop Boston, MA June 18, 2013 Olivier Bodenreider](https://reader033.vdocument.in/reader033/viewer/2022041908/5e655dd164836152ca11f95e/html5/thumbnails/6.jpg)
Neurofibromatosis type 2 (NF2) is often not recognised as a distinct entity from peripheral neurofibromatosis. NF2 is a predominantly intracranial condition whose hallmark is bilateral vestibular schwannomas. NF2 results from a mutation in the gene named merlin, located on chromosome 22.
Relation extraction
• vestibular schwannomas manifestation of neurofibromatosis 2
• neurofibromatosis 2 associated with mutation of NF2 gene
• NF2 gene located on chromosome 22
Ontologies
![Page 7: NLM Resources for Mining Biomedical Text2013/06/18 · NLM Resources for Mining Biomedical Text 3rd Annual i2b2 AUG Meeting & NLP Workshop Boston, MA June 18, 2013 Olivier Bodenreider](https://reader033.vdocument.in/reader033/viewer/2022041908/5e655dd164836152ca11f95e/html5/thumbnails/7.jpg)
Types of resources for mining biomedical text
![Page 8: NLM Resources for Mining Biomedical Text2013/06/18 · NLM Resources for Mining Biomedical Text 3rd Annual i2b2 AUG Meeting & NLP Workshop Boston, MA June 18, 2013 Olivier Bodenreider](https://reader033.vdocument.in/reader033/viewer/2022041908/5e655dd164836152ca11f95e/html5/thumbnails/8.jpg)
Types of resources
• Lexical resources
  – Collections of lexical items
  – Additional information: part of speech, spelling variants
  – Useful for entity recognition
  – Examples: UMLS SPECIALIST Lexicon, WordNet
• Terminological resources
  – Collections of lexical items + identifiers
  – Useful for entity resolution
  – Example: UMLS Metathesaurus
• Ontological resources
  – Collections of kinds of entities (substances, qualities, processes) and relations among them
  – Useful for relation extraction
  – Examples: UMLS Semantic Network, BioTop
![Page 9: NLM Resources for Mining Biomedical Text2013/06/18 · NLM Resources for Mining Biomedical Text 3rd Annual i2b2 AUG Meeting & NLP Workshop Boston, MA June 18, 2013 Olivier Bodenreider](https://reader033.vdocument.in/reader033/viewer/2022041908/5e655dd164836152ca11f95e/html5/thumbnails/9.jpg)
Types of resources (revisited)
• Lexical and terminological resources
  – Mostly collections of names for biomedical entities
  – Often have some kind of hierarchical organization (e.g., relations)
• Ontological resources
  – Mostly collections of relations among biomedical entities
  – Sometimes also collect names
![Page 10: NLM Resources for Mining Biomedical Text2013/06/18 · NLM Resources for Mining Biomedical Text 3rd Annual i2b2 AUG Meeting & NLP Workshop Boston, MA June 18, 2013 Olivier Bodenreider](https://reader033.vdocument.in/reader033/viewer/2022041908/5e655dd164836152ca11f95e/html5/thumbnails/10.jpg)
Lexical / Ontological MeSH
Addison Disease
Endocrine system diseases
Adrenal gland diseases
Adrenal Insufficiency
Disease
Immune system diseases
Autoimmune diseases
http://www.nlm.nih.gov/mesh/MBrowser.html
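The MeSH fragment above can be read as a small graph in which one term has several parents. The sketch below is only an illustration: the parent links are an assumed reading of the slide's diagram, and `PARENTS` and `paths_to_root` are hypothetical names, not part of any MeSH tool.

```python
from typing import Dict, List

# Parent links as read off the slide (an assumption about the diagram layout).
# A term may have more than one parent.
PARENTS: Dict[str, List[str]] = {
    "Addison Disease": ["Adrenal Insufficiency", "Autoimmune diseases"],
    "Adrenal Insufficiency": ["Adrenal gland diseases"],
    "Adrenal gland diseases": ["Endocrine system diseases"],
    "Endocrine system diseases": ["Disease"],
    "Autoimmune diseases": ["Immune system diseases"],
    "Immune system diseases": ["Disease"],
}


def paths_to_root(term: str, parents: Dict[str, List[str]] = PARENTS) -> List[List[str]]:
    """Return every path from `term` up to a node that has no parent."""
    ups = parents.get(term, [])
    if not ups:
        return [[term]]
    return [[term] + rest for up in ups for rest in paths_to_root(up, parents)]


if __name__ == "__main__":
    for path in paths_to_root("Addison Disease"):
        print(" < ".join(path))
```

Under these assumed links, the term is reachable both through the endocrine branch and through the immune branch, which is the kind of multiple inheritance a text-mining system has to handle when it generalizes from a term to its ancestors.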
![Page 11: NLM Resources for Mining Biomedical Text2013/06/18 · NLM Resources for Mining Biomedical Text 3rd Annual i2b2 AUG Meeting & NLP Workshop Boston, MA June 18, 2013 Olivier Bodenreider](https://reader033.vdocument.in/reader033/viewer/2022041908/5e655dd164836152ca11f95e/html5/thumbnails/11.jpg)
Lexical / Ontological FMA http://fme.biostr.washington.edu/index.html
![Page 12: NLM Resources for Mining Biomedical Text2013/06/18 · NLM Resources for Mining Biomedical Text 3rd Annual i2b2 AUG Meeting & NLP Workshop Boston, MA June 18, 2013 Olivier Bodenreider](https://reader033.vdocument.in/reader033/viewer/2022041908/5e655dd164836152ca11f95e/html5/thumbnails/12.jpg)
Unified Medical Language System
• SPECIALIST Lexicon (lexical resources; tools: LVG / Norm)
  – 475,000 lexical items
  – Part of speech and variant information
• Metathesaurus (terminological resources; tool: MetaMap)
  – 8.5M normalized names
  – 2.9M concepts
  – >10M relations
• Semantic Network (ontological resources; tool: SemRep)
  – 133 high-level categories
  – 7000 relations among them
![Page 13: NLM Resources for Mining Biomedical Text2013/06/18 · NLM Resources for Mining Biomedical Text 3rd Annual i2b2 AUG Meeting & NLP Workshop Boston, MA June 18, 2013 Olivier Bodenreider](https://reader033.vdocument.in/reader033/viewer/2022041908/5e655dd164836152ca11f95e/html5/thumbnails/13.jpg)
Lexical resources
SPECIALIST Lexicon and lexical tools
http://umlslex.nlm.nih.gov/
![Page 14: NLM Resources for Mining Biomedical Text2013/06/18 · NLM Resources for Mining Biomedical Text 3rd Annual i2b2 AUG Meeting & NLP Workshop Boston, MA June 18, 2013 Olivier Bodenreider](https://reader033.vdocument.in/reader033/viewer/2022041908/5e655dd164836152ca11f95e/html5/thumbnails/14.jpg)
SPECIALIST Lexicon
• Content
  – English lexicon
  – Many words from the biomedical domain
• 475,000 lexical items
• Word properties
  – morphology
  – orthography
  – syntax
• Used by the lexical tools
![Page 15: NLM Resources for Mining Biomedical Text2013/06/18 · NLM Resources for Mining Biomedical Text 3rd Annual i2b2 AUG Meeting & NLP Workshop Boston, MA June 18, 2013 Olivier Bodenreider](https://reader033.vdocument.in/reader033/viewer/2022041908/5e655dd164836152ca11f95e/html5/thumbnails/15.jpg)
Morphology
• Inflection
  – noun: nucleus, nuclei
  – verb: cauterize, cauterizes, cauterized, cauterizing
  – adjective: red, redder, reddest
• Derivation
  – verb -- noun: cauterize -- cauterization
  – adjective -- noun: red -- redness
![Page 16: NLM Resources for Mining Biomedical Text2013/06/18 · NLM Resources for Mining Biomedical Text 3rd Annual i2b2 AUG Meeting & NLP Workshop Boston, MA June 18, 2013 Olivier Bodenreider](https://reader033.vdocument.in/reader033/viewer/2022041908/5e655dd164836152ca11f95e/html5/thumbnails/16.jpg)
Orthography
• Spelling variants
  – oe/e: oesophagus - esophagus
  – ae/e: anaemia - anemia
  – ise/ize: cauterise - cauterize
  – genitive mark: Addison's disease, Addison disease, Addisons disease
![Page 17: NLM Resources for Mining Biomedical Text2013/06/18 · NLM Resources for Mining Biomedical Text 3rd Annual i2b2 AUG Meeting & NLP Workshop Boston, MA June 18, 2013 Olivier Bodenreider](https://reader033.vdocument.in/reader033/viewer/2022041908/5e655dd164836152ca11f95e/html5/thumbnails/17.jpg)
Syntax
• Complementation
  – verbs
    intransitive: I'll treat.
    transitive: He treated the patient.
    ditransitive: He treated the patient with a drug.
  – nouns
    prepositional phrase: Valve of coronary sinus
• Position for adjectives
![Page 18: NLM Resources for Mining Biomedical Text2013/06/18 · NLM Resources for Mining Biomedical Text 3rd Annual i2b2 AUG Meeting & NLP Workshop Boston, MA June 18, 2013 Olivier Bodenreider](https://reader033.vdocument.in/reader033/viewer/2022041908/5e655dd164836152ca11f95e/html5/thumbnails/18.jpg)
SPECIALIST Lexicon record
{
base=hemoglobin (base form)
spelling_variant=haemoglobin
entry=E0031208 (identifier)
cat=noun (part of speech)
variants=uncount (no plural)
variants=reg (plural: hemoglobins, haemoglobins)
}
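A record like the one above is a set of `slot=value` lines in which a slot such as `variants` may repeat. The following is a minimal parsing sketch, assuming one slot per line and no annotations in parentheses; `parse_record` is a hypothetical helper, not one of the NLM lexical tools.

```python
from collections import defaultdict
from typing import Dict, List

# The record from the slide, without the explanatory annotations.
RECORD = """{base=hemoglobin
spelling_variant=haemoglobin
entry=E0031208
cat=noun
variants=uncount
variants=reg
}"""


def parse_record(text: str) -> Dict[str, List[str]]:
    """Collect slot=value pairs; repeated slots accumulate in a list."""
    fields: Dict[str, List[str]] = defaultdict(list)
    for line in text.splitlines():
        line = line.strip().strip("{}").strip()
        if "=" not in line:
            continue  # skip the closing brace and blank lines
        slot, value = line.split("=", 1)
        fields[slot.strip()].append(value.strip())
    return dict(fields)


if __name__ == "__main__":
    record = parse_record(RECORD)
    print(record["base"], record["variants"])
```

Keeping every slot as a list avoids silently dropping the second `variants` value, which matters here because the noun is both uncountable and regularly pluralized.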
![Page 19: NLM Resources for Mining Biomedical Text2013/06/18 · NLM Resources for Mining Biomedical Text 3rd Annual i2b2 AUG Meeting & NLP Workshop Boston, MA June 18, 2013 Olivier Bodenreider](https://reader033.vdocument.in/reader033/viewer/2022041908/5e655dd164836152ca11f95e/html5/thumbnails/19.jpg)
Lexical tools
• To manage lexical variation in biomedical terminologies
• Major tools
  – Normalization
  – Indexes
  – Lexical Variant Generation program (lvg)
• Based on the SPECIALIST Lexicon
• Used by noun phrase extractors, search engines
![Page 20: NLM Resources for Mining Biomedical Text2013/06/18 · NLM Resources for Mining Biomedical Text 3rd Annual i2b2 AUG Meeting & NLP Workshop Boston, MA June 18, 2013 Olivier Bodenreider](https://reader033.vdocument.in/reader033/viewer/2022041908/5e655dd164836152ca11f95e/html5/thumbnails/20.jpg)
Normalization
Input: Hodgkin’s diseases, NOS
• Remove genitive: Hodgkin diseases, NOS
• Remove stop words: Hodgkin diseases,
• Lowercase: hodgkin diseases,
• Strip punctuation: hodgkin diseases
• Uninflect: hodgkin disease
• Sort words: disease hodgkin
![Page 21: NLM Resources for Mining Biomedical Text2013/06/18 · NLM Resources for Mining Biomedical Text 3rd Annual i2b2 AUG Meeting & NLP Workshop Boston, MA June 18, 2013 Olivier Bodenreider](https://reader033.vdocument.in/reader033/viewer/2022041908/5e655dd164836152ca11f95e/html5/thumbnails/21.jpg)
Normalization: Example
Hodgkin Disease
HODGKINS DISEASE
Hodgkin's Disease
Disease, Hodgkin's
Hodgkin's, disease
HODGKIN'S DISEASE
Hodgkin's disease
Hodgkins Disease
Hodgkin's disease NOS
Hodgkin's disease, NOS
Disease, Hodgkins
Diseases, Hodgkins
Hodgkins Diseases
Hodgkins disease
hodgkin's disease
Disease, Hodgkin
All of the above normalize to: disease hodgkin
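The normalization steps can be sketched as a chain of small string functions. This is a toy approximation, not the NLM norm program: the stop-word list is a hypothetical stand-in, and uninflection is reduced to stripping a trailing "s", whereas the real tool relies on the SPECIALIST Lexicon.

```python
import re
import string

# Hypothetical, deliberately tiny stop-word list (assumption for illustration).
STOP_WORDS = {"nos", "of", "and", "the", "with", "by", "for", "in", "to", "on"}


def remove_genitive(text: str) -> str:
    # Drop 's written with a straight or a typographic apostrophe.
    return re.sub(r"['’]s\b", "", text, flags=re.IGNORECASE)


def remove_stop_words(text: str) -> str:
    kept = [t for t in text.split()
            if t.strip(string.punctuation).lower() not in STOP_WORDS]
    return " ".join(kept)


def strip_punctuation(text: str) -> str:
    return " ".join(re.sub(r"[^\w\s]", " ", text).split())


def uninflect(text: str) -> str:
    # Naive singularization; a stand-in for lexicon-based uninflection.
    def singular(word: str) -> str:
        if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
            return word[:-1]
        return word
    return " ".join(singular(w) for w in text.split())


def sort_words(text: str) -> str:
    return " ".join(sorted(text.split()))


def normalize(term: str) -> str:
    """Apply the steps in the order given on the slide."""
    text = remove_genitive(term)
    text = remove_stop_words(text)
    text = text.lower()
    text = strip_punctuation(text)
    text = uninflect(text)
    return sort_words(text)


if __name__ == "__main__":
    print(normalize("Hodgkin’s diseases, NOS"))
```

The naive plural rule is enough for this example but would mangle words such as "virus"; that limitation is the reason a lexicon-driven tool is used in practice.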
![Page 22: NLM Resources for Mining Biomedical Text2013/06/18 · NLM Resources for Mining Biomedical Text 3rd Annual i2b2 AUG Meeting & NLP Workshop Boston, MA June 18, 2013 Olivier Bodenreider](https://reader033.vdocument.in/reader033/viewer/2022041908/5e655dd164836152ca11f95e/html5/thumbnails/22.jpg)
Normalization Applications
• Model for lexical resemblance
• Help find lexical variants for a term
  – Terms that normalize the same usually share the same LUI
• Help find candidates for synonymy among terms
• Help map input terms to UMLS concepts
![Page 23: NLM Resources for Mining Biomedical Text2013/06/18 · NLM Resources for Mining Biomedical Text 3rd Annual i2b2 AUG Meeting & NLP Workshop Boston, MA June 18, 2013 Olivier Bodenreider](https://reader033.vdocument.in/reader033/viewer/2022041908/5e655dd164836152ca11f95e/html5/thumbnails/23.jpg)
Indexes
• Word index
  – word to Metathesaurus strings
  – one word index per language
• Normalized word index
  – normalized word to Metathesaurus strings
  – English only
• Normalized string index
  – normalized term to Metathesaurus strings
  – English only
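The three indexes can be pictured as dictionaries from keys to sets of strings. The sketch below is an assumption-laden illustration: `simple_norm` is a crude stand-in for real normalization (lowercase, strip punctuation, sort words), and `build_indexes` is a hypothetical helper, not how the UMLS index files are actually produced.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, Iterable, Set, Tuple

Index = DefaultDict[str, Set[str]]


def simple_norm(term: str) -> str:
    """Crude stand-in for norm: lowercase, drop punctuation, sort words."""
    cleaned = "".join(c.lower() if c.isalnum() else " " for c in term)
    return " ".join(sorted(cleaned.split()))


def build_indexes(strings: Iterable[str],
                  norm: Callable[[str], str] = simple_norm) -> Tuple[Index, Index, Index]:
    word_index: Index = defaultdict(set)          # raw word -> strings
    norm_word_index: Index = defaultdict(set)     # normalized word -> strings
    norm_string_index: Index = defaultdict(set)   # normalized term -> strings
    for s in strings:
        for word in s.split():  # whitespace tokens, kept exactly as written
            word_index[word].add(s)
        normalized = norm(s)
        for word in normalized.split():
            norm_word_index[word].add(s)
        norm_string_index[normalized].add(s)
    return word_index, norm_word_index, norm_string_index


if __name__ == "__main__":
    strings = ["Hodgkin Disease", "Disease, Hodgkin", "HODGKIN DISEASE"]
    _, _, by_norm_string = build_indexes(strings)
    print(by_norm_string["disease hodgkin"])
```

In this toy version the raw word index distinguishes "Disease," from "Disease", while the two normalized indexes bring all three strings together under the same keys.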
![Page 24: NLM Resources for Mining Biomedical Text2013/06/18 · NLM Resources for Mining Biomedical Text 3rd Annual i2b2 AUG Meeting & NLP Workshop Boston, MA June 18, 2013 Olivier Bodenreider](https://reader033.vdocument.in/reader033/viewer/2022041908/5e655dd164836152ca11f95e/html5/thumbnails/24.jpg)
Lexical Variant Generation program
• Tool for specialists (linguists)
• Performs atomic lexical transformations
  – generating inflectional variants
  – lowercase
  – …
• Performs sequences of atomic transformations
  – a specialized sequence of transformations provides the normalized form of a term (the norm program)
![Page 25: NLM Resources for Mining Biomedical Text2013/06/18 · NLM Resources for Mining Biomedical Text 3rd Annual i2b2 AUG Meeting & NLP Workshop Boston, MA June 18, 2013 Olivier Bodenreider](https://reader033.vdocument.in/reader033/viewer/2022041908/5e655dd164836152ca11f95e/html5/thumbnails/25.jpg)
Related NLM tools
http://umlslex.nlm.nih.gov/
![Page 26: NLM Resources for Mining Biomedical Text2013/06/18 · NLM Resources for Mining Biomedical Text 3rd Annual i2b2 AUG Meeting & NLP Workshop Boston, MA June 18, 2013 Olivier Bodenreider](https://reader033.vdocument.in/reader033/viewer/2022041908/5e655dd164836152ca11f95e/html5/thumbnails/26.jpg)
Lexical resources
Other resources
![Page 27: NLM Resources for Mining Biomedical Text2013/06/18 · NLM Resources for Mining Biomedical Text 3rd Annual i2b2 AUG Meeting & NLP Workshop Boston, MA June 18, 2013 Olivier Bodenreider](https://reader033.vdocument.in/reader033/viewer/2022041908/5e655dd164836152ca11f95e/html5/thumbnails/27.jpg)
Similar resources
• BioLexicon
  – Developed as part of the EU project BOOTStrep
  – Focus on biological entities (genes, proteins, chemical entities, organisms) in support of the extraction of gene regulation events
• BioThesaurus
  – Developed as part of the Protein Information Resource
  – Focus on proteins
![Page 28: NLM Resources for Mining Biomedical Text2013/06/18 · NLM Resources for Mining Biomedical Text 3rd Annual i2b2 AUG Meeting & NLP Workshop Boston, MA June 18, 2013 Olivier Bodenreider](https://reader033.vdocument.in/reader033/viewer/2022041908/5e655dd164836152ca11f95e/html5/thumbnails/28.jpg)
Need for additional resources
• More generic
  – WordNet
• More specific
  – Lexical items specific to specialized subdomains
  – Not listed in biolexicons
  – Not amenable to normalization
• Examples
  – Genes, proteins: MAPK3 / Mapk3 / mapk3
  – Chemicals: 5’-3’ exonuclease / 3’-5’ exonuclease
  – Drugs
  – Acronyms
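Why such names are not amenable to normalization can be shown with a deliberately naive normalizer. The function below is a hypothetical sketch (lowercase, strip punctuation, sort words), not the NLM norm program, but it shares the steps that cause the problem.

```python
import re


def naive_norm(term: str) -> str:
    """Lowercase, replace punctuation with spaces, then sort the words."""
    words = re.sub(r"[^\w\s]", " ", term.lower()).split()
    return " ".join(sorted(words))


if __name__ == "__main__":
    # Word sorting erases the direction of the exonuclease activity.
    print(naive_norm("5’-3’ exonuclease"))
    print(naive_norm("3’-5’ exonuclease"))
    # Lowercasing erases case distinctions in gene symbols.
    print(naive_norm("MAPK3"), naive_norm("Mapk3"))
```

Both exonuclease names collapse to the same key even though they denote opposite activities, and the case distinctions in the gene symbols, which by common convention can signal the species, are lost.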
![Page 29: NLM Resources for Mining Biomedical Text2013/06/18 · NLM Resources for Mining Biomedical Text 3rd Annual i2b2 AUG Meeting & NLP Workshop Boston, MA June 18, 2013 Olivier Bodenreider](https://reader033.vdocument.in/reader033/viewer/2022041908/5e655dd164836152ca11f95e/html5/thumbnails/29.jpg)
Gene and protein names
• Additional resources
  – Genew http://www.gene.ucl.ac.uk/nomenclature/
  – Entrez Gene http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=gene
  – UniProt http://www.ebi.uniprot.org/index.shtml
• Additional identification methods
  – e.g., ABGene (Tanabe & Wilbur, NCBI)
  – BioCreAtIvE: gene mention identification, gene normalization
![Page 30: NLM Resources for Mining Biomedical Text2013/06/18 · NLM Resources for Mining Biomedical Text 3rd Annual i2b2 AUG Meeting & NLP Workshop Boston, MA June 18, 2013 Olivier Bodenreider](https://reader033.vdocument.in/reader033/viewer/2022041908/5e655dd164836152ca11f95e/html5/thumbnails/30.jpg)
Chemical names
• Additional resources
  – PubChem http://pubchem.ncbi.nlm.nih.gov/
  – ChemIDplus http://chem.sis.nlm.nih.gov/chemidplus/chemidlite.jsp
  – ChEBI http://www.ebi.ac.uk/chebi/
![Page 31: NLM Resources for Mining Biomedical Text2013/06/18 · NLM Resources for Mining Biomedical Text 3rd Annual i2b2 AUG Meeting & NLP Workshop Boston, MA June 18, 2013 Olivier Bodenreider](https://reader033.vdocument.in/reader033/viewer/2022041908/5e655dd164836152ca11f95e/html5/thumbnails/31.jpg)
Drug names
• Covered by UMLS
• Specialized resource: RxNorm
  – Branded names / generic names
  – Various levels of aggregation
    Ingredient
    Ingredient + dose
    Ingredient + form
    Ingredient + dose + form
  – Codes in various reference systems
  – Mostly US drugs, few “over-the-counter” drugs
![Page 32: NLM Resources for Mining Biomedical Text2013/06/18 · NLM Resources for Mining Biomedical Text 3rd Annual i2b2 AUG Meeting & NLP Workshop Boston, MA June 18, 2013 Olivier Bodenreider](https://reader033.vdocument.in/reader033/viewer/2022041908/5e655dd164836152ca11f95e/html5/thumbnails/32.jpg)
Acronyms
• Many resources available
  – AcroMine http://www.nactem.ac.uk/software/acromine/
  – ARGH: Biomedical Acronym Resolver http://lethargy.swmed.edu/ARGH/argh.asp
  – Stanford Biomedical Abbreviation Server http://bionlp.stanford.edu/abbreviation/
  – AcroMed http://medstract.med.tufts.edu/acro1.1/index.htm
  – SaRAD http://www.hpl.hp.com/research/idl/projects/abbrev.html
![Page 33: NLM Resources for Mining Biomedical Text2013/06/18 · NLM Resources for Mining Biomedical Text 3rd Annual i2b2 AUG Meeting & NLP Workshop Boston, MA June 18, 2013 Olivier Bodenreider](https://reader033.vdocument.in/reader033/viewer/2022041908/5e655dd164836152ca11f95e/html5/thumbnails/33.jpg)
Terminological resources
UMLS Metathesaurus
http://www.nlm.nih.gov/research/umls/
![Page 34: NLM Resources for Mining Biomedical Text2013/06/18 · NLM Resources for Mining Biomedical Text 3rd Annual i2b2 AUG Meeting & NLP Workshop Boston, MA June 18, 2013 Olivier Bodenreider](https://reader033.vdocument.in/reader033/viewer/2022041908/5e655dd164836152ca11f95e/html5/thumbnails/34.jpg)
Source Vocabularies (2013AA)
• 168 source vocabularies
• 21 languages
• Broad coverage of biomedicine
  – 8.5M normalized names
  – 2.9M concepts
  – >10M relations
• Common presentation
![Page 35: NLM Resources for Mining Biomedical Text2013/06/18 · NLM Resources for Mining Biomedical Text 3rd Annual i2b2 AUG Meeting & NLP Workshop Boston, MA June 18, 2013 Olivier Bodenreider](https://reader033.vdocument.in/reader033/viewer/2022041908/5e655dd164836152ca11f95e/html5/thumbnails/35.jpg)
Organize terms
• Synonymous terms clustered into a concept
• Preferred term
• Unique identifier (CUI)
Example: Addison's disease (C0001403)
  – Addison Disease (MeSH D000224)
  – Primary hypoadrenalism (MedDRA 10036696)
  – Primary adrenocortical insufficiency (ICD-10 E27.1)
  – Addison's disease (disorder) (SNOMED CT 363732003)
![Page 36: NLM Resources for Mining Biomedical Text2013/06/18 · NLM Resources for Mining Biomedical Text 3rd Annual i2b2 AUG Meeting & NLP Workshop Boston, MA June 18, 2013 Olivier Bodenreider](https://reader033.vdocument.in/reader033/viewer/2022041908/5e655dd164836152ca11f95e/html5/thumbnails/36.jpg)
Organize concepts
• Inter-concept relationships: hierarchies from the source vocabularies
• Redundancy: multiple paths
• One graph instead of multiple trees (multiple inheritance)
[Figure: several source trees over nodes A to H combined into a single graph]
![Page 37: NLM Resources for Mining Biomedical Text2013/06/18 · NLM Resources for Mining Biomedical Text 3rd Annual i2b2 AUG Meeting & NLP Workshop Boston, MA June 18, 2013 Olivier Bodenreider](https://reader033.vdocument.in/reader033/viewer/2022041908/5e655dd164836152ca11f95e/html5/thumbnails/37.jpg)
Integrating subdomains
UMLS
• Biomedical literature: MeSH
• Genome annotations: GO
• Model organisms: NCBI Taxonomy
• Genetic knowledge bases: OMIM
• Clinical repositories: SNOMED CT
• Anatomy: FMA
• Other subdomains: …
![Page 38: NLM Resources for Mining Biomedical Text2013/06/18 · NLM Resources for Mining Biomedical Text 3rd Annual i2b2 AUG Meeting & NLP Workshop Boston, MA June 18, 2013 Olivier Bodenreider](https://reader033.vdocument.in/reader033/viewer/2022041908/5e655dd164836152ca11f95e/html5/thumbnails/38.jpg)
Integrating subdomains
Biomedical literature
Genome annotations
Model organisms
Genetic knowledge bases
Clinical repositories
Other subdomains
Anatomy
![Page 39: NLM Resources for Mining Biomedical Text2013/06/18 · NLM Resources for Mining Biomedical Text 3rd Annual i2b2 AUG Meeting & NLP Workshop Boston, MA June 18, 2013 Olivier Bodenreider](https://reader033.vdocument.in/reader033/viewer/2022041908/5e655dd164836152ca11f95e/html5/thumbnails/39.jpg)
Neurofibromatosis type 2 (NF2) is often not recognised as a distinct entity from peripheral neurofibromatosis. NF2 is a predominantly intracranial condition whose hallmark is bilateral vestibular schwannomas. NF2 results from a mutation in the gene named merlin, located on chromosome 22.
Entity mention vs. resolution
UMLS:C0254123 EG:4771 HGNC:7773 UniProt:P35240
UMLS:C0027832 MeSH:D016518 SNOMEDCT:92503002 OMIM:101000
![Page 40: NLM Resources for Mining Biomedical Text2013/06/18 · NLM Resources for Mining Biomedical Text 3rd Annual i2b2 AUG Meeting & NLP Workshop Boston, MA June 18, 2013 Olivier Bodenreider](https://reader033.vdocument.in/reader033/viewer/2022041908/5e655dd164836152ca11f95e/html5/thumbnails/40.jpg)
Trans-namespace resolution (1)
UMLS concept C0027832 connects the names and codes used in different subdomains:
• Biomedical literature (MeSH): Neurofibromatosis 2 (D016518)
• Clinical repositories (SNOMED CT): Neurofibromatosis, type 2 (92503002)
• Genetic knowledge bases (OMIM): NEUROFIBROMATOSIS, TYPE II (101000)
Other subdomains shown: Genome annotations (GO), Model organisms (NCBI Taxonomy), Anatomy (FMA), …
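Resolution across namespaces can be sketched as a two-step lookup: from a source code to its UMLS concept, then from the concept to the code in the target namespace. The identifiers below are the ones shown on the slides; the table layout and the `translate` function are hypothetical and do not reflect the Metathesaurus file format or any UMLS API.

```python
from typing import Dict, Optional, Tuple

# CUI -> {namespace: code}, using only identifiers shown on the slides.
CUI_TO_CODES: Dict[str, Dict[str, str]] = {
    "C0027832": {"MeSH": "D016518", "SNOMEDCT": "92503002", "OMIM": "101000"},
    "C0254123": {"EG": "4771", "HGNC": "7773", "UniProt": "P35240"},
}


def build_reverse(table: Dict[str, Dict[str, str]]) -> Dict[Tuple[str, str], str]:
    """Invert the table into (namespace, code) -> CUI."""
    return {(ns, code): cui
            for cui, codes in table.items()
            for ns, code in codes.items()}


def translate(namespace: str, code: str, target: str,
              table: Dict[str, Dict[str, str]] = CUI_TO_CODES) -> Optional[str]:
    """Map a code to the target namespace via the shared concept, if any."""
    cui = build_reverse(table).get((namespace, code))
    if cui is None:
        return None
    return table[cui].get(target)


if __name__ == "__main__":
    print(translate("MeSH", "D016518", "SNOMEDCT"))
    print(translate("OMIM", "101000", "MeSH"))
```

Routing every translation through the concept identifier means that adding a new namespace requires one new column per concept rather than a mapping to every other vocabulary.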
![Page 41: NLM Resources for Mining Biomedical Text2013/06/18 · NLM Resources for Mining Biomedical Text 3rd Annual i2b2 AUG Meeting & NLP Workshop Boston, MA June 18, 2013 Olivier Bodenreider](https://reader033.vdocument.in/reader033/viewer/2022041908/5e655dd164836152ca11f95e/html5/thumbnails/41.jpg)
RxNorm
Trans-namespace resolution (2)
Nizoral, 200 mg oral tablet (MMSL:2140)
Ketoconazole 200 MG Oral Tablet [Nizoral] (RxNorm:201896)
Ketoconazole 200 MG Oral Tablet (RxNorm:197853)
Ketoconazole Tab 200 MG (MDDB:13317)
tradename of
Nizoral (RxNorm:202692)
has ingredient
Source: Multum [generic drug]
Target: Medi-Span [generic drug]
Ketoconazole (RxNorm:6135)
tradename of
http://mor.nlm.nih.gov/download/rxnav/
![Page 43: NLM Resources for Mining Biomedical Text2013/06/18 · NLM Resources for Mining Biomedical Text 3rd Annual i2b2 AUG Meeting & NLP Workshop Boston, MA June 18, 2013 Olivier Bodenreider](https://reader033.vdocument.in/reader033/viewer/2022041908/5e655dd164836152ca11f95e/html5/thumbnails/43.jpg)
MetaMap
• UMLS-based entity recognition system
• Linguistically motivated
• Exploits both the SPECIALIST Lexicon and Metathesaurus
• In practice, used to identify UMLS concepts in biomedical text
• Freely available (UMLS license)
• Two versions
  – Web-based
  – Standalone (MMTx)
![Page 44: NLM Resources for Mining Biomedical Text2013/06/18 · NLM Resources for Mining Biomedical Text 3rd Annual i2b2 AUG Meeting & NLP Workshop Boston, MA June 18, 2013 Olivier Bodenreider](https://reader033.vdocument.in/reader033/viewer/2022041908/5e655dd164836152ca11f95e/html5/thumbnails/44.jpg)
Neurofibromatosis type 2 (NF2) is often not recognised as a distinct entity from peripheral neurofibromatosis. NF2 is a predominantly intracranial condition whose hallmark is bilateral vestibular schwannomas. NF2 results from a mutation in the gene named merlin, located on chromosome 22.
MetaMap Example
C0027832 C0027832
C0027831 C0027832
C0027859 C0027832
C0026882 C0254123
C0008665
Neurofibromin 2 MeSH Merlin SNOMED CT Schwannomin MeSH Schwannomerlin NCI Thesaurus
C0254123
![Page 45: NLM Resources for Mining Biomedical Text2013/06/18 · NLM Resources for Mining Biomedical Text 3rd Annual i2b2 AUG Meeting & NLP Workshop Boston, MA June 18, 2013 Olivier Bodenreider](https://reader033.vdocument.in/reader033/viewer/2022041908/5e655dd164836152ca11f95e/html5/thumbnails/45.jpg)
MetaMap Recent developments
• Negation processing (NegEx)
• Word sense disambiguation
• Integration in NLP pipelines
  – UIMA wrapper
• Performance improvements
• User-defined acronyms
• Applications
  – MetaMap is a key component of the Medical Text Indexer (MEDLINE indexing)
  – From indexing support to “first-line indexing”
![Page 46: NLM Resources for Mining Biomedical Text2013/06/18 · NLM Resources for Mining Biomedical Text 3rd Annual i2b2 AUG Meeting & NLP Workshop Boston, MA June 18, 2013 Olivier Bodenreider](https://reader033.vdocument.in/reader033/viewer/2022041908/5e655dd164836152ca11f95e/html5/thumbnails/46.jpg)
Terminological resources
Other NER systems
![Page 47: NLM Resources for Mining Biomedical Text2013/06/18 · NLM Resources for Mining Biomedical Text 3rd Annual i2b2 AUG Meeting & NLP Workshop Boston, MA June 18, 2013 Olivier Bodenreider](https://reader033.vdocument.in/reader033/viewer/2022041908/5e655dd164836152ca11f95e/html5/thumbnails/47.jpg)
BioPortal Annotator
Results filtered to SNOMED CT and MeSH
![Page 48: NLM Resources for Mining Biomedical Text2013/06/18 · NLM Resources for Mining Biomedical Text 3rd Annual i2b2 AUG Meeting & NLP Workshop Boston, MA June 18, 2013 Olivier Bodenreider](https://reader033.vdocument.in/reader033/viewer/2022041908/5e655dd164836152ca11f95e/html5/thumbnails/48.jpg)
TerMine
![Page 49: NLM Resources for Mining Biomedical Text2013/06/18 · NLM Resources for Mining Biomedical Text 3rd Annual i2b2 AUG Meeting & NLP Workshop Boston, MA June 18, 2013 Olivier Bodenreider](https://reader033.vdocument.in/reader033/viewer/2022041908/5e655dd164836152ca11f95e/html5/thumbnails/49.jpg)
Whatizit
![Page 50: NLM Resources for Mining Biomedical Text2013/06/18 · NLM Resources for Mining Biomedical Text 3rd Annual i2b2 AUG Meeting & NLP Workshop Boston, MA June 18, 2013 Olivier Bodenreider](https://reader033.vdocument.in/reader033/viewer/2022041908/5e655dd164836152ca11f95e/html5/thumbnails/50.jpg)
Ontological resources
![Page 51: NLM Resources for Mining Biomedical Text2013/06/18 · NLM Resources for Mining Biomedical Text 3rd Annual i2b2 AUG Meeting & NLP Workshop Boston, MA June 18, 2013 Olivier Bodenreider](https://reader033.vdocument.in/reader033/viewer/2022041908/5e655dd164836152ca11f95e/html5/thumbnails/51.jpg)
Ontological resources
• Provide background knowledge
• For resolving ambiguity in entity recognition
  – Merlin: Protein or Bird?
• For relation extraction
  – Template relations between high-level concepts
  – Used in combination with clues from linguistic phenomena in text
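The "Merlin: Protein or Bird?" case can be sketched as filtering candidate concepts by the semantic types that the context allows. Everything here is illustrative: the semantic type attached to C0254123 is an assumption, the bird candidate uses a placeholder identifier, and `filter_by_type` is a hypothetical helper rather than a MetaMap or SemRep function.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass(frozen=True)
class Candidate:
    concept_id: str
    name: str
    semantic_type: str


# Candidates for the mention "merlin". The type of C0254123 is assumed,
# and the second identifier is a placeholder, not a real CUI.
CANDIDATES = [
    Candidate("C0254123", "Neurofibromin 2", "Amino Acid, Peptide, or Protein"),
    Candidate("PLACEHOLDER-BIRD", "Merlin (bird)", "Bird"),
]


def filter_by_type(candidates: Iterable[Candidate],
                   expected_types: Set[str]) -> List[Candidate]:
    """Keep only the candidates whose semantic type fits the context."""
    return [c for c in candidates if c.semantic_type in expected_types]


if __name__ == "__main__":
    # "the gene named merlin" suggests a gene or gene-product reading.
    context_types = {"Gene or Genome", "Amino Acid, Peptide, or Protein"}
    for candidate in filter_by_type(CANDIDATES, context_types):
        print(candidate.concept_id, candidate.name)
```

How the expected types are derived from the surrounding text is the hard part and is left out; the sketch only shows where high-level categories enter the decision.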
![Page 52: NLM Resources for Mining Biomedical Text2013/06/18 · NLM Resources for Mining Biomedical Text 3rd Annual i2b2 AUG Meeting & NLP Workshop Boston, MA June 18, 2013 Olivier Bodenreider](https://reader033.vdocument.in/reader033/viewer/2022041908/5e655dd164836152ca11f95e/html5/thumbnails/52.jpg)
Ontological resources
• Various levels of formality
  – Formal top-level ontologies (e.g., BioTop)
  – Informal top-level ontologies (e.g., UMLS Semantic Network)
  – Domain-Range constraints for roles in DL-based terminologies (e.g., SNOMED CT, NCI Thesaurus)
  – Relations in terminologies
• Various levels of granularity
  – UMLS Semantic Network: 133 types
  – Foundational Model of Anatomy: 70,000 classes
![Page 53: NLM Resources for Mining Biomedical Text2013/06/18 · NLM Resources for Mining Biomedical Text 3rd Annual i2b2 AUG Meeting & NLP Workshop Boston, MA June 18, 2013 Olivier Bodenreider](https://reader033.vdocument.in/reader033/viewer/2022041908/5e655dd164836152ca11f95e/html5/thumbnails/53.jpg)
Ontological resources
UMLS Semantic Network
http://semanticnetwork.nlm.nih.gov/
![Page 54: NLM Resources for Mining Biomedical Text2013/06/18 · NLM Resources for Mining Biomedical Text 3rd Annual i2b2 AUG Meeting & NLP Workshop Boston, MA June 18, 2013 Olivier Bodenreider](https://reader033.vdocument.in/reader033/viewer/2022041908/5e655dd164836152ca11f95e/html5/thumbnails/54.jpg)
“Biologic Function” hierarchy (isa)
Biologic Function
  Physiologic Function
    Organism Function
      Mental Process
    Organ or Tissue Function
    Cell Function
    Molecular Function
      Genetic Function
  Pathologic Function
    Disease or Syndrome
      Mental or Behavioral Dysfunction
      Neoplastic Process
    Cell or Molecular Dysfunction
    Experimental Model of Disease
![Page 55: NLM Resources for Mining Biomedical Text2013/06/18 · NLM Resources for Mining Biomedical Text 3rd Annual i2b2 AUG Meeting & NLP Workshop Boston, MA June 18, 2013 Olivier Bodenreider](https://reader033.vdocument.in/reader033/viewer/2022041908/5e655dd164836152ca11f95e/html5/thumbnails/55.jpg)
Associative (non-isa) relationships
[Figure: Semantic Network diagram connecting semantic types through associative relations]
• Semantic types shown: Organism, Organism Attribute, Anatomical Structure, Embryonic Structure, Anatomical Abnormality, Congenital Abnormality, Acquired Abnormality, Fully Formed Anatomical Structure, Body System, Body Part, Organ or Organ Component, Tissue, Cell, Cell Component, Gene or Genome, Body Space or Junction, Body Location or Region, Body Substance, Finding, Laboratory or Test Result, Sign or Symptom, Biologic Function, Physiologic Function, Pathologic Function, Injury or Poisoning
• Relations shown: part of, conceptual part of, location of, adjacent to, contains, produces, property of, process of, evaluation of, disrupts, co-occurs with
![Page 56: NLM Resources for Mining Biomedical Text2013/06/18 · NLM Resources for Mining Biomedical Text 3rd Annual i2b2 AUG Meeting & NLP Workshop Boston, MA June 18, 2013 Olivier Bodenreider](https://reader033.vdocument.in/reader033/viewer/2022041908/5e655dd164836152ca11f95e/html5/thumbnails/56.jpg)
[Figure: Metathesaurus concepts linked to their semantic types in the Semantic Network]
Concepts (Metathesaurus): Heart; Esophagus; Left Phrenic Nerve; Heart Valves; Fetal Heart; Mediastinum; Saccular Viscus; Angina Pectoris; Cardiotonic Agents; Tissue Donors
Semantic Types (Semantic Network): Anatomical Structure; Fully Formed Anatomical Structure; Embryonic Structure; Body Part, Organ or Organ Component; Pharmacologic Substance; Disease or Syndrome; Population Group
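Because every Metathesaurus concept is categorized by at least one semantic type, a text-mining pipeline can filter recognized concepts by type. The sketch below is illustrative only: the concept-to-type assignments are assumptions written out by hand for a few concepts named on this slide, and `SEMANTIC_TYPE` and `concepts_of_type` are hypothetical names, not an NLM API.

```python
# Illustrative concept -> semantic type assignments (assumed, hand-written).
SEMANTIC_TYPE = {
    "Heart": "Body Part, Organ or Organ Component",
    "Fetal Heart": "Embryonic Structure",
    "Angina Pectoris": "Disease or Syndrome",
    "Cardiotonic Agents": "Pharmacologic Substance",
    "Tissue Donors": "Population Group",
}


def concepts_of_type(concepts, semantic_type):
    """Keep only the concepts categorized by the given semantic type."""
    return [c for c in concepts if SEMANTIC_TYPE.get(c) == semantic_type]


mentions = ["Heart", "Angina Pectoris", "Cardiotonic Agents"]
diseases = concepts_of_type(mentions, "Disease or Syndrome")
```

In practice the assignments would be read from the Metathesaurus distribution rather than typed in by hand.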
![Page 57: NLM Resources for Mining Biomedical Text2013/06/18 · NLM Resources for Mining Biomedical Text 3rd Annual i2b2 AUG Meeting & NLP Workshop Boston, MA June 18, 2013 Olivier Bodenreider](https://reader033.vdocument.in/reader033/viewer/2022041908/5e655dd164836152ca11f95e/html5/thumbnails/57.jpg)
Ontological resources
SemRep
![Page 58: NLM Resources for Mining Biomedical Text2013/06/18 · NLM Resources for Mining Biomedical Text 3rd Annual i2b2 AUG Meeting & NLP Workshop Boston, MA June 18, 2013 Olivier Bodenreider](https://reader033.vdocument.in/reader033/viewer/2022041908/5e655dd164836152ca11f95e/html5/thumbnails/58.jpg)
Neurofibromatosis type 2 (NF2) is often not recognised as a distinct entity from peripheral neurofibromatosis. NF2 is a predominantly intracranial condition whose hallmark is bilateral vestibular schwannomas. NF2 results from a mutation in the gene named merlin, located on chromosome 22.
SemRep Relation extraction
Concept identifiers annotated on the text, in order of appearance: C0027832, C0027832, C0027831, C0027832, C0027859, C0027832, C0026882, C0254123, C0008665
Extracted relation: Neurofibromin 2 (C0254123) part of Chromosomes, Human, Pair 22 (C0008665)
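A relation of this kind is commonly stored as a subject-predicate-object triple over concept identifiers. The following is a minimal sketch of such a record, using the two concepts and the "part of" relation from this slide; the `Predication` class and its field names are assumptions made for illustration, not SemRep's actual output format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Predication:
    """One extracted relation between two Metathesaurus concepts (illustrative)."""
    subject_cui: str
    subject_name: str
    predicate: str
    object_cui: str
    object_name: str

    def as_triple(self):
        """Return the identifier-only form (subject, predicate, object)."""
        return (self.subject_cui, self.predicate, self.object_cui)


# The relation shown on the slide.
nf2_on_chr22 = Predication(
    subject_cui="C0254123",
    subject_name="Neurofibromin 2",
    predicate="part of",
    object_cui="C0008665",
    object_name="Chromosomes, Human, Pair 22",
)
```

Keeping the identifiers alongside the names lets relations extracted from different sentences be merged even when the surface wording differs.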
![Page 59: NLM Resources for Mining Biomedical Text2013/06/18 · NLM Resources for Mining Biomedical Text 3rd Annual i2b2 AUG Meeting & NLP Workshop Boston, MA June 18, 2013 Olivier Bodenreider](https://reader033.vdocument.in/reader033/viewer/2022041908/5e655dd164836152ca11f95e/html5/thumbnails/59.jpg)
SemRep Recent developments
All of MEDLINE processed
  60M predications
Use of graph theory principles for selecting salient associations
Applications
  Literature-based discovery
    Relationship between testosterone, cortisol and sleep quality in aging men
  Automatic summarization
    Semantic MEDLINE http://skr3.nlm.nih.gov/SemMedDemo/
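The slide does not say which graph-theoretic principles are used to select salient associations. As one possible illustration, the sketch below treats predications as edges of a graph and keeps those that touch the most highly connected concepts (degree centrality). The choice of measure, the function names and the placeholder node labels are all assumptions, not the method used for Semantic MEDLINE.

```python
from collections import Counter


def degree_centrality(predications):
    """Count how many predications each concept takes part in."""
    degree = Counter()
    for subject, _predicate, obj in predications:
        degree[subject] += 1
        degree[obj] += 1
    return degree


def salient(predications, top_n=1):
    """Keep predications that involve one of the top_n most connected concepts."""
    degree = degree_centrality(predications)
    hubs = {node for node, _count in degree.most_common(top_n)}
    return [p for p in predications if p[0] in hubs or p[2] in hubs]


# Hypothetical toy graph with placeholder concept labels.
toy = [
    ("A", "affects", "B"),
    ("A", "affects", "C"),
    ("D", "part of", "A"),
    ("E", "treats", "F"),
]
summary = salient(toy, top_n=1)
```

Here concept "A" takes part in three predications, so the three edges touching it are kept and the isolated edge between "E" and "F" is dropped.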
![Page 60: NLM Resources for Mining Biomedical Text2013/06/18 · NLM Resources for Mining Biomedical Text 3rd Annual i2b2 AUG Meeting & NLP Workshop Boston, MA June 18, 2013 Olivier Bodenreider](https://reader033.vdocument.in/reader033/viewer/2022041908/5e655dd164836152ca11f95e/html5/thumbnails/60.jpg)
Ontological resources
Other resources
![Page 61: NLM Resources for Mining Biomedical Text2013/06/18 · NLM Resources for Mining Biomedical Text 3rd Annual i2b2 AUG Meeting & NLP Workshop Boston, MA June 18, 2013 Olivier Bodenreider](https://reader033.vdocument.in/reader033/viewer/2022041908/5e655dd164836152ca11f95e/html5/thumbnails/61.jpg)
Other ontological resources
Ontology integration systems
  NCBO BioPortal http://www.bioontology.org/BioPortal
Ontologies
  Top-level ontologies (e.g., BioTop)
  Domain ontologies (e.g., FMA, SNOMED CT, NCI Thesaurus)
![Page 62: NLM Resources for Mining Biomedical Text2013/06/18 · NLM Resources for Mining Biomedical Text 3rd Annual i2b2 AUG Meeting & NLP Workshop Boston, MA June 18, 2013 Olivier Bodenreider](https://reader033.vdocument.in/reader033/viewer/2022041908/5e655dd164836152ca11f95e/html5/thumbnails/62.jpg)
Other relation extraction systems
Many relation extraction systems available
Specialized
  Protein-protein interaction (e.g., Info-PubMed, TextPresso, …)
  BioCreAtIvE (task 2)
More generic (e.g., MedLEE / BioMedLEE)
Commercial systems (TeSSI, Linguamatics, …)
![Page 63: NLM Resources for Mining Biomedical Text2013/06/18 · NLM Resources for Mining Biomedical Text 3rd Annual i2b2 AUG Meeting & NLP Workshop Boston, MA June 18, 2013 Olivier Bodenreider](https://reader033.vdocument.in/reader033/viewer/2022041908/5e655dd164836152ca11f95e/html5/thumbnails/63.jpg)
Future directions
Evolution of existing resources
  Modularization
  Compatibility with NLP pipelines (UIMA, GATE)
  Services (web services, CTS2)
Clinical text
  Limited availability to non-clinical institutions
  Opportunity to develop new tools (e.g., de-identification)
Integration with knowledge bases
  Definitional vs. assertional knowledge
Collaborative development
![Page 64: NLM Resources for Mining Biomedical Text2013/06/18 · NLM Resources for Mining Biomedical Text 3rd Annual i2b2 AUG Meeting & NLP Workshop Boston, MA June 18, 2013 Olivier Bodenreider](https://reader033.vdocument.in/reader033/viewer/2022041908/5e655dd164836152ca11f95e/html5/thumbnails/64.jpg)
ORBIT project
http://orbit.nlm.nih.gov/
![Page 65: NLM Resources for Mining Biomedical Text2013/06/18 · NLM Resources for Mining Biomedical Text 3rd Annual i2b2 AUG Meeting & NLP Workshop Boston, MA June 18, 2013 Olivier Bodenreider](https://reader033.vdocument.in/reader033/viewer/2022041908/5e655dd164836152ca11f95e/html5/thumbnails/65.jpg)
References
Bodenreider O. Lexical, terminological and ontological resources for biological text mining. In: Ananiadou S, McNaught J, editors. Text mining for biology and biomedicine: Artech House; 2006. p. 43-66.
![Page 66: NLM Resources for Mining Biomedical Text2013/06/18 · NLM Resources for Mining Biomedical Text 3rd Annual i2b2 AUG Meeting & NLP Workshop Boston, MA June 18, 2013 Olivier Bodenreider](https://reader033.vdocument.in/reader033/viewer/2022041908/5e655dd164836152ca11f95e/html5/thumbnails/66.jpg)
Medical Ontology Research
Olivier Bodenreider
Lister Hill National Center for Biomedical Communications Bethesda, Maryland - USA
Contact: [email protected]
Web: mor.nlm.nih.gov