
Requirements for Semantic Biobanks

André Q ANDRADE a,b, Markus KREUZTHALER b, Janna HASTINGS d,e, Maria KRESTYANINOVA f,g, Stefan SCHULZ b,c

a School of Information Science, Federal University of Minas Gerais, Brazil
b Medical University of Graz, Austria
c University Medical Center Freiburg, Germany
d European Bioinformatics Institute, Hinxton, UK
e University of Geneva, Switzerland
f Helsinki University, Finland
g Uniquer, Lausanne, Switzerland

• Semantic interoperability: systems exchange data + meaning

• Formal ontologies provide unambiguous descriptions of what is universally true for all objects of a certain type

• A growing number of biomedical vocabularies are ontology-based (OBO Foundry, SNOMED CT, …)

• Blood and tissue sampling for research

• Samples from several biobanks are needed to retrieve data for a specific research question

• Comprehensive annotations with lab data and clinical data

[Figure: Semantic Biobanks, combining data with a model of meaning]

(Generalized) Biomedical Retrieval Scenario

• Retrieval:
  – Heterogeneous resources of interest are distributed
  – Most retrieval scenarios are recall-oriented
• Resources are used by multiple researchers around the world for multiple purposes
• Effective retrieval depends on querying resource metadata:
  – Provenance information
  – Content-based semantic annotations (structured vocabulary)
  – Access regulations

Does this sound familiar?

Analogy


• Global bibliographic database
• Resources: publications from different publishers
• Annotations:
  – Bibliographic data
  – Abstract
  – Semantic representation (MeSH) of paper content
• Local access conditions to the full resource apply


Biobank “Broker”

• Global biobank sample database
• Resources: biological specimens (blood, tissue, …)
• Annotations:
  – Sample information (staining, etc.)
  – Semantic representation of both lab and selected patient-related information (information models / ontologies)
• Local access conditions to the full resource apply
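The broker analogy can be sketched as a central index that stores only annotations and points back to the biobank holding each specimen, whose local access conditions still apply. This is a minimal sketch, not the authors' design: the class names (`Annotation`, `Broker`), the biobank labels and the toy is-a hierarchy are hypothetical, and no real ontology API is used. Matching subclasses of the query concept reflects the recall-oriented retrieval described earlier.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Annotation:
    """Hypothetical metadata record that a local biobank publishes to the broker."""
    sample_id: str
    biobank: str                  # provenance: which biobank holds the specimen
    concepts: frozenset[str]      # semantic annotations (ontology class labels)
    access: str                   # local access condition, enforced by the biobank


@dataclass
class Broker:
    """Hypothetical central index of annotations; specimens stay local."""
    parents: dict[str, set[str]]  # toy is-a hierarchy: concept -> direct parents
    records: list[Annotation] = field(default_factory=list)

    def publish(self, record: Annotation) -> None:
        self.records.append(record)

    def ancestors(self, concept: str) -> set[str]:
        """The concept itself plus all of its transitive superclasses."""
        seen, stack = set(), [concept]
        while stack:
            c = stack.pop()
            if c not in seen:
                seen.add(c)
                stack.extend(self.parents.get(c, ()))
        return seen

    def search(self, concept: str) -> list[Annotation]:
        """Recall-oriented search: match records annotated with the query
        concept or with any of its subclasses."""
        return [r for r in self.records
                if any(concept in self.ancestors(c) for c in r.concepts)]


broker = Broker(parents={"gastric mucosa": {"mucosa"}, "colon mucosa": {"mucosa"}})
broker.publish(Annotation("S1", "BB-A", frozenset({"gastric mucosa"}), "on request"))
broker.publish(Annotation("S2", "BB-B", frozenset({"colon mucosa"}), "consent check"))
broker.publish(Annotation("S3", "BB-A", frozenset({"blood"}), "on request"))
print([(r.sample_id, r.biobank, r.access) for r in broker.search("mucosa")])
# → [('S1', 'BB-A', 'on request'), ('S2', 'BB-B', 'consent check')]
```

The broker returns pointers plus access conditions rather than the specimens themselves, mirroring how a bibliographic database returns citations while full-text access stays with each publisher.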

Data resources for biobanking

• Sample-related information:
  – Type of sample
  – Preparation of sample
  – Time
  – Storage information
  – Physical location
  – Associated information (lab data, genotype, …)
• Donor-related information:
  – Demographic data
  – Phenotype data
  – Time-indexed clinical data (EHR extracts)
• Relevant donor-related information keeps accumulating after the samples are taken
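The split between sample-related and donor-related information, and the fact that donor data keep arriving after sampling, can be sketched as a small data model. This is a minimal sketch under stated assumptions: the class and field names (`Sample`, `Donor`, `ClinicalEvent`, `events_after`) are hypothetical and only mirror the bullet points above; they do not follow any particular information model.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Sample:
    """Sample-related information (hypothetical field names)."""
    sample_id: str
    sample_type: str      # type of sample
    preparation: str      # preparation of sample
    taken_on: date        # time of sampling
    storage: str          # storage information
    location: str         # physical location


@dataclass
class ClinicalEvent:
    """One time-indexed entry from an EHR extract (hypothetical)."""
    recorded_on: date
    concept: str          # e.g. a diagnosis label or ontology class


@dataclass
class Donor:
    donor_id: str
    birth_year: int                                   # demographic data
    samples: list[Sample] = field(default_factory=list)
    events: list[ClinicalEvent] = field(default_factory=list)

    def add_event(self, event: ClinicalEvent) -> None:
        """EHR data keep arriving after sampling; keep events in time order."""
        self.events.append(event)
        self.events.sort(key=lambda e: e.recorded_on)

    def events_after(self, sample: Sample) -> list[ClinicalEvent]:
        """Clinical information that accrued after a given sample was taken."""
        return [e for e in self.events if e.recorded_on > sample.taken_on]


donor = Donor("D1", 1950)
sample = Sample("S1", "gastric mucosa biopsy", "FFPE", date(2001, 5, 3),
                "room temperature", "archive A")
donor.samples.append(sample)
donor.add_event(ClinicalEvent(date(2009, 2, 1), "stomach cancer"))
donor.add_event(ClinicalEvent(date(1999, 7, 1), "gastritis"))
print([e.concept for e in donor.events_after(sample)])  # → ['stomach cancer']
```

Keeping clinical events as an append-only, time-indexed list is what later allows queries that relate the sampling date to diagnoses made years afterwards.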

[Figure: timeline, 1960 to 2010]

[Figure: Centralized broker for biobanking information, linked to four local sites, each combining a biobank with an EHR]

Language for semantic annotations of biobank data

• Formal ontologies
  – Precise, logical descriptions of annotations and queries
  – High expressiveness through compositionality
  – OWL-DL: Semantic Web standard for description logics; allows formulating axioms about what is universally true of all instances of a kind
• Specific components
  – Ground axioms provided by an upper-level ontology (BioTop)
  – Set of disjoint upper-level categories and relations, together with related constraints
  – Ontological description of the domain: SNOMED CT, OBO Foundry, …
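OWL-DL axioms state what holds for every instance of a class, e.g. "every gastric mucosa is a mucosa" (subclass) and "nothing is both a material entity and a process" (disjointness, the kind of constraint an upper-level ontology supplies). The toy check below hand-codes only these two axiom types to show how disjoint upper-level categories catch inconsistent annotations. It is a sketch, not an OWL reasoner, and the class names are illustrative rather than actual BioTop or SNOMED CT identifiers.

```python
# Toy illustration only: NOT an OWL reasoner; class names are illustrative.

# SubClassOf axioms: class -> direct superclasses
SUBCLASS_OF = {
    "GastricMucosa": {"Mucosa"},
    "Mucosa": {"Tissue"},
    "Tissue": {"MaterialEntity"},
    "Biopsy": {"Process"},
}
# DisjointClasses axioms between upper-level categories
DISJOINT = {frozenset({"MaterialEntity", "Process"})}


def superclasses(cls: str) -> set[str]:
    """Reflexive-transitive closure of SubClassOf for one class."""
    result, stack = set(), [cls]
    while stack:
        c = stack.pop()
        if c not in result:
            result.add(c)
            stack.extend(SUBCLASS_OF.get(c, ()))
    return result


def is_consistent(asserted_types: set[str]) -> bool:
    """An individual with all `asserted_types` is inconsistent if its
    inferred types include both members of any disjointness axiom."""
    inferred = set().union(*(superclasses(t) for t in asserted_types))
    return not any(pair <= inferred for pair in DISJOINT)


print(is_consistent({"GastricMucosa"}))            # → True
print(is_consistent({"GastricMucosa", "Biopsy"}))  # → False (tissue typed as process)
```

A DL reasoner such as HermiT performs this kind of inference in general, over arbitrary OWL-DL class expressions; the sketch only shows why disjoint top-level categories make annotation errors detectable.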

BioTop categories and example axiom

Description logics representation and retrieval

• “Retrieve all gastric mucosa samples taken before 2003 from patients who had cancer of the stomach after 2008”
• Representation language: OWL DL
• Editor: Protégé 4.2
• Reasoner: HermiT

Requirements

• Formal representations
  – Ontological representation of information models and terminologies
  – Ontological representation of data about specimens
  – Joint, universally used clinical terminology
  – Expressive and stable upper-level ontologies (+ ontological relations)
• Scope and granularity of the EHR extract of interest for biobank-related queries
• Specification of the structure and function of the central repository
• Steps for information translation from legacy systems
  – Mappings
  – Interfaces
  – Update policies
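The "Mappings" step can be illustrated with a translation table from a site's legacy codes to ontology classes. This is a minimal sketch with hypothetical codes and labels (`GM-BX`, `BLD-EDTA`, the `translate` helper); real mappings would target SNOMED CT or OBO terms. Unmapped codes are reported rather than silently dropped, since a silent loss would hurt recall-oriented retrieval.

```python
# Hypothetical mapping from one site's legacy sample-type codes to
# ontology class labels (illustrative only).
LEGACY_TO_ONTOLOGY = {
    "GM-BX": "gastric mucosa biopsy",
    "BLD-EDTA": "EDTA blood sample",
}


def translate(records: list[dict[str, str]]) -> tuple[list[dict[str, str]], list[str]]:
    """Translate legacy records; return the translated records plus the codes
    that have no mapping yet, so they can be curated instead of lost."""
    translated, unmapped = [], []
    for rec in records:
        concept = LEGACY_TO_ONTOLOGY.get(rec["type_code"])
        if concept is None:
            unmapped.append(rec["type_code"])
            continue
        translated.append({"sample_id": rec["sample_id"], "concept": concept})
    return translated, unmapped


legacy = [
    {"sample_id": "A-17", "type_code": "GM-BX"},
    {"sample_id": "A-18", "type_code": "XYZ"},
]
ok, missing = translate(legacy)
print(ok)       # → [{'sample_id': 'A-17', 'concept': 'gastric mucosa biopsy'}]
print(missing)  # → ['XYZ']
```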

Challenges

• Prototype status of DL reasoners and editors
• Performance problems with expressive ontologies
• Modularization of large clinical terminologies according to the data and queries under scrutiny
• Organization of:
  – Central repository
  – Local mappings / translations
• Logistics (samples)
• Privacy and IP issues
• Business model
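One simple way to modularize a large terminology for a given data set and query is to keep only the terms they use plus all of their ancestors, so that subsumption between those terms is preserved. The sketch below does this over a toy is-a table. It is a plain upward closure, not the locality-based module extraction offered by OWL tooling, and the terms and the `extract_module` helper are hypothetical.

```python
def extract_module(is_a: dict[str, set[str]], signature: set[str]) -> dict[str, set[str]]:
    """Keep the signature terms and all their ancestors, with the is-a edges
    among them (simple upward closure; hypothetical helper)."""
    keep, stack = set(), list(signature)
    while stack:
        term = stack.pop()
        if term not in keep:
            keep.add(term)
            stack.extend(is_a.get(term, ()))
    return {t: is_a.get(t, set()) & keep for t in keep}


# Toy terminology: term -> direct parents (illustrative labels)
terminology = {
    "adenocarcinoma of stomach": {"cancer of stomach"},
    "cancer of stomach": {"cancer", "disorder of stomach"},
    "disorder of stomach": {"disorder"},
    "cancer": {"disorder"},
    "fracture of femur": {"disorder"},
}
module = extract_module(terminology, {"adenocarcinoma of stomach"})
print(sorted(module))
# → ['adenocarcinoma of stomach', 'cancer', 'cancer of stomach', 'disorder', 'disorder of stomach']
```

Unrelated branches such as "fracture of femur" are dropped, which is the intended effect: the reasoner only has to handle the part of the terminology that the data and query actually touch.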

Thanks

• CAPES (Brazil) – Programa de Doutorado no País com Estágio no Exterior (doctoral program in Brazil with a research stay abroad)

• FP7 – NoE SemanticHealthNet

Andrade et al.: Requirements for Semantic Biobanks