Biomedical Knowledge Visualization · 7/6/2004

Olivier Bodenreider, Lister Hill National Center for Biomedical Communications, Bethesda, Maryland, USA. Biomedical Knowledge Visualization. Bethesda, MD, July 6, 2004. 7th International Protégé Conference, 2nd Workshop on Visualizing Information in Knowledge Engineering (VIKE’04).

TRANSCRIPT

  • Olivier Bodenreider

    Lister Hill National Center for Biomedical Communications, Bethesda, Maryland, USA

    Biomedical Knowledge Visualization

    Bethesda, MD, July 6, 2004

    7th International Protégé Conference, 2nd Workshop on Visualizing Information in Knowledge Engineering (VIKE’04)

  • UMLS Semantic Navigator (SemNav)

    http://umlsks.nlm.nih.gov*

    SN Resources Semantic Navigator (* free UMLS registration required)

  • Unified Medical Language System®

    ◆ Developed at NLM since 1990

    ◆ 15th edition in 2004

    ◆ Integrates some 60 terminological resources

    ● Clinical vocabularies (including specialties)

    ● Core terminologies (anatomy, drugs, med. devices)

    ● Administrative terminologies, standards

    ◆ Integration

    ● Synonymous terms are clustered in a concept

    ● Hierarchies (trees) are combined in a graph structure

  • Terminology integration: Terms

    Duchenne muscular dystrophy: MeSH, SNOMED, CTV3, Jablonski, CRISP, DxPlain, MedDRA, LOINC

    pseudohypertrophic muscular dystrophy: MeSH, CTV3, SNOMED

    X-linked recessive muscular dystrophy: Jablonski

    Duchenne de Boulogne muscular dystrophy: Jablonski

    Duchenne’s muscular dystrophy: COSTAR

    severe generalized familial muscular dystrophy: SNOMED

    Duchenne type progressive muscular dystrophy: SNOMED
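The clustering of synonymous terms into one concept can be sketched as a small data structure. This is a minimal Python illustration, not the UMLS implementation: the term and source pairs are taken from the slide, while the `Concept` class, the `build_index` helper, the lowercase normalization, and the identifier `DMD-EXAMPLE` are hypothetical names introduced only for this sketch.

```python
from dataclasses import dataclass, field

# (term, sources) pairs as listed on the slide.
TERMS = [
    ("Duchenne muscular dystrophy",
     ["MeSH", "SNOMED", "CTV3", "Jablonski", "CRISP", "DxPlain", "MedDRA", "LOINC"]),
    ("pseudohypertrophic muscular dystrophy", ["MeSH", "CTV3", "SNOMED"]),
    ("X-linked recessive muscular dystrophy", ["Jablonski"]),
    ("Duchenne de Boulogne muscular dystrophy", ["Jablonski"]),
    ("Duchenne's muscular dystrophy", ["COSTAR"]),
    ("severe generalized familial muscular dystrophy", ["SNOMED"]),
    ("Duchenne type progressive muscular dystrophy", ["SNOMED"]),
]


@dataclass
class Concept:
    """One concept: a cluster of synonymous terms, each with its sources."""
    concept_id: str  # hypothetical identifier, not a real UMLS CUI
    terms: dict = field(default_factory=dict)  # term -> set of source names

    def add_term(self, term, sources):
        self.terms.setdefault(term, set()).update(sources)

    def sources(self):
        """All source vocabularies contributing at least one term."""
        return set().union(*self.terms.values())


def build_index(concept):
    """Map each lowercased term to the identifier of its concept."""
    return {term.lower(): concept.concept_id for term in concept.terms}


concept = Concept("DMD-EXAMPLE")
for term, term_sources in TERMS:
    concept.add_term(term, term_sources)
index = build_index(concept)
```

Looking up any of the seven terms in `index` returns the same identifier, which is the point of the integration: many names from many vocabularies, one concept.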

  • Terminology integration: Relationships

    [Figure: hierarchies from SNOMED, MeSH, AOD and Read Codes combined in the UMLS. Concepts shown: Adrenal Gland Diseases, Adrenal Cortex Diseases, Adrenal Gland Hypofunction, Adrenal cortical hypofunction, Hypoadrenalism, Addison’s Disease]
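Combining per-source trees into one graph can be sketched as follows. The concept and source names come from the slide, but the figure's edges are not recoverable from this transcript, so every child-to-parent link below is a hypothetical placeholder, and `SOURCE_TREES`, `combine`, and `parents_of` are names invented for this sketch.

```python
from collections import defaultdict

# Hypothetical child -> parent links, one small tree per source vocabulary.
# These edges are illustrative only; they are NOT read from the slide.
SOURCE_TREES = {
    "SNOMED": {"Addison's Disease": "Adrenal cortical hypofunction"},
    "MeSH": {
        "Addison's Disease": "Adrenal Gland Hypofunction",
        "Adrenal Gland Hypofunction": "Adrenal Gland Diseases",
    },
    "AOD": {"Addison's Disease": "Adrenal Cortex Diseases"},
    "Read Codes": {"Addison's Disease": "Hypoadrenalism"},
}


def combine(source_trees):
    """Merge per-source trees into one graph: (child, parent) -> set of sources."""
    graph = defaultdict(set)
    for source, tree in source_trees.items():
        for child, parent in tree.items():
            graph[(child, parent)].add(source)
    return dict(graph)


def parents_of(graph, concept):
    """Every parent asserted for a concept by any source."""
    return sorted(parent for (child, parent) in graph if child == concept)


graph = combine(SOURCE_TREES)
addison_parents = parents_of(graph, "Addison's Disease")
```

Each source on its own is a tree (one parent per node), but the merged structure gives a concept several parents, which is why the result has to be treated as a graph.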

  • UMLS

    ◆ Two-level structure

    ● Semantic Network

    ■ 135 Semantic Types (STs)

    ■ 54 types of relationships among STs

    ● Metathesaurus

    ■ >1M concepts

    ■ ~12 M inter-concept relationships

    ● Link = categorization

    [Diagram: a Concept in the Metathesaurus is linked to a Semantic Type in the Semantic Network by categorization]

  • Heart

    [Figure: the concept Heart and neighboring concepts in the Metathesaurus, categorized by semantic types in the Semantic Network]

    Concepts (Metathesaurus): Heart; Esophagus; Left Phrenic Nerve; Heart Valves; Fetal Heart; Mediastinum; Saccular Viscus; Angina Pectoris; Cardiotonic Agents; Tissue Donors

    Semantic Types (Semantic Network): Anatomical Structure; Fully Formed Anatomical Structure; Embryonic Structure; Body Part, Organ or Organ Component; Pharmacologic Substance; Disease or Syndrome; Population Group
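The two-level structure (concepts categorized by semantic types, which are themselves organized by isa links) can be sketched as two lookup tables. The concept and type names appear on the slide, but the specific pairings below are assumptions about what the figure shows, and `CATEGORIZATION`, `ST_ISA`, and `semantic_types` are hypothetical names for this sketch.

```python
# Categorization links: Metathesaurus concept -> semantic type(s).
# The pairings are assumed; the figure's links are not in the transcript.
CATEGORIZATION = {
    "Heart": ["Body Part, Organ, or Organ Component"],
    "Fetal Heart": ["Embryonic Structure"],
    "Angina Pectoris": ["Disease or Syndrome"],
    "Cardiotonic Agents": ["Pharmacologic Substance"],
    "Tissue Donors": ["Population Group"],
}

# isa links among semantic types in the Semantic Network (small subset).
ST_ISA = {
    "Body Part, Organ, or Organ Component": "Fully Formed Anatomical Structure",
    "Fully Formed Anatomical Structure": "Anatomical Structure",
    "Embryonic Structure": "Anatomical Structure",
}


def semantic_types(concept, inherited=True):
    """Return the semantic types of a concept, optionally with their ancestors."""
    result = []
    for st in CATEGORIZATION.get(concept, []):
        if st not in result:
            result.append(st)
        # Walk up the isa chain of the Semantic Network.
        while inherited and st in ST_ISA:
            st = ST_ISA[st]
            if st not in result:
                result.append(st)
    return result


heart_types = semantic_types("Heart")
fetal_heart_direct = semantic_types("Fetal Heart", inherited=False)
```

Keeping the categorization table separate from the isa table mirrors the split between the Metathesaurus and the Semantic Network: a concept stores only its direct types, and broader types are derived from the network.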

  • MeSH Browser

  • SemNav: Visualization options

  • SemNav: Relationships

    [Figure: relationships between concepts and their semantic types]

    Concepts: Dystrophin; Muscular Dystrophy, Duchenne (55)

    Semantic Types: Amino Acid, Peptide or Protein; Disease or Syndrome; Biologically Active Substance

  • Gene Ontology browser

    http://mor.nlm.nih.gov/perl/gennav.pl

  • Gene Ontology™

    ◆ Developed by the GO Consortium

    ◆ Several components (GO database)

    ● Ontology (~17,000 concepts)

    ■ Molecular functions

    ■ Cellular components

    ■ Biological processes

    ● Gene products (~1.6M)

    ● Associations between gene products and GO concepts (~6.8M)

  • Material and Methods

  • Technical details

  • Technical details

    ◆ Simple web/cgi technology (Apache, Perl)

    ◆ dot (GraphViz)

    ● PNG file (-Tpng)

    ● Client-side map (-Tcmap)

    ◆ Precompute the transitive closure on hierarchical relations so that ancestor and descendant lookups are fast

    ◆ Remove cycles (UMLS)
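The pipeline listed on this slide can be sketched in a few functions. The slides name Perl as the implementation language; Python is used here only for illustration, and this is not the author's code. The `dot` flags `-Tpng` and `-Tcmap` are the ones named on the slide; the function names, the example URL pattern, and the choice to break cycles by dropping depth-first back edges are assumptions of this sketch.

```python
from collections import defaultdict


def to_dot(edges, url_for=None):
    """Serialize (child, parent) edges as DOT text; names must not contain quotes."""
    nodes = sorted({n for edge in edges for n in edge})
    lines = ["digraph hierarchy {", "  rankdir=BT;"]
    for n in nodes:
        # A URL attribute on a node is what makes it clickable in the -Tcmap output.
        attrs = f' [URL="{url_for(n)}"]' if url_for else ""
        lines.append(f'  "{n}"{attrs};')
    for child, parent in sorted(edges):
        lines.append(f'  "{child}" -> "{parent}";')
    lines.append("}")
    return "\n".join(lines)


def dot_commands(dot_file, stem):
    """Command lines for the two outputs on the slide: image and client-side map."""
    return [
        ["dot", "-Tpng", dot_file, "-o", stem + ".png"],
        ["dot", "-Tcmap", dot_file, "-o", stem + ".map"],
    ]


def remove_cycles(edges):
    """Drop every edge that points back to a node still on the DFS stack."""
    parents = defaultdict(list)
    for child, parent in sorted(edges):
        parents[child].append(parent)
    state, kept = {}, set()  # state: node -> "open" (on stack) or "done"

    def visit(node):
        state[node] = "open"
        for parent in parents.get(node, []):
            if state.get(parent) == "open":
                continue  # back edge: keeping it would close a cycle
            kept.add((node, parent))
            if parent not in state:
                visit(parent)
        state[node] = "done"

    for node in sorted(parents):
        if node not in state:
            visit(node)
    return kept


def transitive_closure(edges):
    """Precompute node -> set of all ancestors (input must be acyclic)."""
    parents = defaultdict(set)
    for child, parent in edges:
        parents[child].add(parent)
    ancestors = {}

    def collect(node):
        if node not in ancestors:
            found = set()
            for p in parents.get(node, ()):
                found.add(p)
                found |= collect(p)
            ancestors[node] = found
        return ancestors[node]

    for node in {n for edge in edges for n in edge}:
        collect(node)
    return ancestors


# Illustrative input with one cycle: a -> b -> c -> a.
acyclic = remove_cycles({("a", "b"), ("b", "c"), ("c", "a")})
closure = transitive_closure(acyclic)
dot_text = to_dot(acyclic, url_for=lambda n: "/concept/" + n)  # hypothetical URL pattern
commands = dot_commands("graph.dot", "graph")
```

Each entry of `commands` could be passed to `subprocess.run` on a machine where Graphviz is installed. Removing cycles first matters because the closure routine recurses along parent links and assumes they never loop.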

  • Discussion: Issues and Challenges

  • Issues

    ◆ Size

    ● Large number of concepts (>1 million)

    ◆ Complexity

    ● Polyhierarchical structures

    ● Multiple information sources

    ● Multiple properties

    ◆ Lack of formality

    ● Redundant relations

    ● Hierarchies vs. hierarchical relations

  • Challenges

    ◆ Restrict information space

    ● To selected information sources (SemNav)

    ● To selected organisms (GenNav)

    ◆ Reduce complexity (SemNav)

    ● Group concepts by semantic groups

    ● Transitive reduction on hierarchical relations

    ● Select co-occurring concepts

    ◆ Reduce the cognitive burden on the user

    ● Use graph-based rather than tree-based representations

  • SemNav: Semantic groups
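Grouping concepts by semantic group can be sketched as a two-step lookup: concept to semantic type, then semantic type to a coarser group. The group names and type assignments below are a small assumed subset chosen for illustration, and `SEMANTIC_GROUP`, `CONCEPT_TYPE`, and `group_concepts` are hypothetical names, not part of SemNav.

```python
from collections import defaultdict

# Semantic type -> semantic group (assumed subset for illustration).
SEMANTIC_GROUP = {
    "Body Part, Organ, or Organ Component": "Anatomy",
    "Embryonic Structure": "Anatomy",
    "Disease or Syndrome": "Disorders",
    "Pharmacologic Substance": "Chemicals & Drugs",
    "Population Group": "Living Beings",
}

# Concept -> semantic type (assumed assignments for illustration).
CONCEPT_TYPE = {
    "Heart": "Body Part, Organ, or Organ Component",
    "Fetal Heart": "Embryonic Structure",
    "Angina Pectoris": "Disease or Syndrome",
    "Cardiotonic Agents": "Pharmacologic Substance",
    "Tissue Donors": "Population Group",
}


def group_concepts(concept_type, semantic_group):
    """Bucket concepts by the semantic group of their semantic type."""
    groups = defaultdict(list)
    for concept, st in sorted(concept_type.items()):
        groups[semantic_group.get(st, "Ungrouped")].append(concept)
    return dict(groups)


groups = group_concepts(CONCEPT_TYPE, SEMANTIC_GROUP)
```

Displaying a handful of groups instead of many individual semantic types is one way to reduce the number of visual categories a user has to track.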


  • SemNav: Transitive reduction
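Transitive reduction removes every hierarchical link that is already implied by a longer path, which leaves fewer edges to draw. Below is a minimal sketch for an acyclic set of (child, parent) edges; the function name and the example edges are hypothetical, and this is one straightforward way to compute the reduction rather than the method used in SemNav.

```python
from collections import defaultdict


def transitive_reduction(edges):
    """Keep an edge only if its parent is not reachable by any other path.

    Assumes the (child, parent) edges form a directed acyclic graph.
    """
    parents = defaultdict(set)
    for child, parent in edges:
        parents[child].add(parent)

    def reachable_without(start, target, skipped):
        """Is target reachable from start when the edge `skipped` is ignored?"""
        stack, seen = [start], set()
        while stack:
            node = stack.pop()
            for p in parents.get(node, ()):
                if (node, p) == skipped:
                    continue
                if p == target:
                    return True
                if p not in seen:
                    seen.add(p)
                    stack.append(p)
        return False

    return {
        (child, parent)
        for child, parent in edges
        if not reachable_without(child, parent, (child, parent))
    }


# Illustrative edges: the direct link from the leaf to the top is redundant.
edges = {
    ("leaf", "middle"),
    ("middle", "top"),
    ("leaf", "top"),
}
reduced = transitive_reduction(edges)
```

The reduction is the counterpart of the transitive closure: the closure adds all implied links to speed up queries, while the reduction drops them to simplify the picture.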


  • Medical Ontology Research

    Olivier Bodenreider

    Lister Hill National Center for Biomedical Communications, Bethesda, Maryland, USA

    Contact: [email protected]

    Web: mor.nlm.nih.gov