keynote rise 2015_jonquet

Atelier Recherche d’Information Sémantique (Semantic Information Retrieval Workshop), RISE’15, 30 June 2015, Rennes. Clement Jonquet – [email protected]. A few contributions of the SIFR (Semantic Indexing of French Biomedical Resources) project and how we reuse NCBO technology.

Page 1: Keynote rise 2015_jonquet

Atelier Recherche d’Information Sémantique (Semantic Information Retrieval Workshop), RISE’15

30 June 2015 – Rennes

Clement Jonquet – [email protected]

A few contributions of the SIFR (Semantic Indexing of French Biomedical Resources) project and how we reuse NCBO technology

Page 2: Keynote rise 2015_jonquet

How is this relevant to RISE?

Semantic information retrieval models

Information extraction

Semantic annotation

Semantic indexing

Ontology alignment and mappings for information retrieval

Knowledge representation languages for information retrieval

Use of semantic distances for information retrieval

Atelier RISE 2015

30 juin 2015, Rennes

Page 3: Keynote rise 2015_jonquet

A few introductory words

Page 4: Keynote rise 2015_jonquet

Biologists have adopted ontologies

To provide canonical representations of scientific knowledge

To annotate experimental data to enable interpretation, comparison, and discovery across databases

To facilitate knowledge-based applications for

Decision support

Natural language processing

Data integration

But ontologies are spread out, in different formats, of different sizes, and with different structures

Page 5: Keynote rise 2015_jonquet

Working with terminologies & ontologies – a portal please!

You’ve built an ontology, how do you let the world know?

You need an ontology, where do you go to get it?

How do you know whether an ontology is any good?

How do you find resources that are relevant to the domain of the ontology (or to specific terms)?

How could you leverage your ontology to enable new science?

How could you use ontologies without managing them?

Page 6: Keynote rise 2015_jonquet

Comparison of the approaches

[IWBBIO'14]

Page 7: Keynote rise 2015_jonquet

Annotation challenge

Explosion of biomedical data: diverse, distributed, unstructured… not linked to ontologies

Hard for biomedical researchers to find the data they need

Data integration problem

Translational discoveries are prevented

Good examples

GO annotations

PubMed (biomedical literature) indexed with MeSH headings

Annotate data with ontology concepts

Horizontal approach (figure labels: ONTOLOGIES, RESOURCES)

Page 8: Keynote rise 2015_jonquet

Good use of the semantics (1/2)

Simple keyword-based searches miss results
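A minimal sketch of why plain keyword search misses results, and how ontology knowledge (synonyms and subclasses) can recover them. The mini-ontology, the documents and the function names are all hypothetical and only serve the illustration; this is not how BioPortal implements search.

```python
# Hypothetical mini-ontology: labels and hierarchy are illustrative only.
SYNONYMS = {
    "cancer": {"cancer", "malignant neoplasm"},
    "melanoma": {"melanoma", "malignant melanoma"},
}
CHILDREN = {"cancer": ["melanoma"]}

# Hypothetical documents to search.
DOCS = {
    1: "New therapy for cancer patients",
    2: "A case report of malignant melanoma",
    3: "Malignant neoplasm registry data",
}


def keyword_search(query, docs):
    """Return ids of documents that literally contain the query string."""
    q = query.lower()
    return sorted(d for d, text in docs.items() if q in text.lower())


def expand(concept):
    """Collect the labels of a concept and of all its descendants."""
    labels = set(SYNONYMS.get(concept, {concept}))
    for child in CHILDREN.get(concept, []):
        labels |= expand(child)
    return labels


def semantic_search(concept, docs):
    """Return ids of documents mentioning the concept, a synonym or a subclass."""
    labels = expand(concept)
    return sorted(
        d for d, text in docs.items()
        if any(label in text.lower() for label in labels)
    )


print(keyword_search("cancer", DOCS))   # only the literal match
print(semantic_search("cancer", DOCS))  # also synonyms and subclasses
```

With this toy data, the keyword query only finds the document containing the literal word, while the expanded query also reaches the documents that use a synonym or a more specific term.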

Page 9: Keynote rise 2015_jonquet

Good use of the semantics (2/2)

Page 10: Keynote rise 2015_jonquet

A few words about the SIFR project

Page 11: Keynote rise 2015_jonquet

Semantic Indexing of French Biomedical Data Resources project

… in collaboration with…

Page 12: Keynote rise 2015_jonquet

People

Young researchers

Clement Jonquet

Mathieu Roche

Sandra Bringay

Advisors

Stefano A. Cerri

Maguelonne Teisseire

Pascal Poncelet

Staff

Vincent Emonet

Students

Juan Antonio Lossio Ventura

Guillaume Surroca

~3 MSc students / year

Close collaborators

Philippe Lemoisson (TETIS)

Pierre Larmande (IRD / IBC)

Mark Musen (BMIR)

Stefan Darmoni (CISMEF)

Sebastien Harispe (LGI2P)

Page 13: Keynote rise 2015_jonquet

Increasing amount of biomedical data + multilingualism

Limits of keyword-based indexing

The biomedical community has turned to ontologies to describe its data and turn it into structured, formalized knowledge

Ontologies are put to use by creating semantic annotations

Crucial need for tools & services for French biomedical data

Biomedical data integration challenge

New potential scientific discoveries hidden in data

Translational research

Page 14: Keynote rise 2015_jonquet

Use ontologies for indexing, mining and searching (French) biomedical data

Obj1: Design, development and deployment of the French Annotator.

Obj2: Obtain new research results to exploit and enhance ontology-based indexing services.

semantic distances

ontology alignment

ontology enrichment and disambiguation

Obj3: Valorization of indexing services

Page 15: Keynote rise 2015_jonquet

A French biomedical Annotator

Page 16: Keynote rise 2015_jonquet

Use of biomedical ontology-based annotations in end-user applications

Page 17: Keynote rise 2015_jonquet

Reuse of the NCBO technology

Page 18: Keynote rise 2015_jonquet

BioPortal: a “one-stop shop” for biomedical ontologies

Web repository for biomedical ontologies

Make ontologies accessible and usable – abstraction on format, locations, structure, etc.

Users can publish, download, browse, search, comment, align ontologies and use them for annotations both online and via a web services API.

Online support for ontology

Peer review

Notes (comments and discussion)

Versioning

Mapping

Search

Resources

Page 19: Keynote rise 2015_jonquet

http://bioportal.bioontology.org

BioPortal Ontology Repository

Page 20: Keynote rise 2015_jonquet

http://data.bioontology.org

Ontology Services: Search, Traverse, Comment, Download

Widgets: Tree-view, Auto-complete, Graph-view

Annotation: Term recognition

Data Access: Search “data” annotated with a given term

Mapping Services: Create, Upload, Download
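A minimal sketch of how a client might build requests against the REST endpoint named on this slide. The `/search` and `/annotator` paths and the `q`, `text`, `ontologies` and `apikey` parameter names reflect my understanding of the NCBO REST API and should be checked against its documentation; the helper names and the `KEY` placeholder are hypothetical. The code only builds URLs and performs no network call.

```python
from urllib.parse import urlencode

# Base URL as shown on the slide.
REST_URL = "http://data.bioontology.org"


def search_url(term, api_key, ontologies=None):
    """Build a term-search URL (assumed endpoint: /search)."""
    params = {"q": term, "apikey": api_key}
    if ontologies:
        # Assumed parameter: comma-separated ontology acronyms.
        params["ontologies"] = ",".join(ontologies)
    return f"{REST_URL}/search?{urlencode(params)}"


def annotator_url(text, api_key):
    """Build a text-annotation URL (assumed endpoint: /annotator)."""
    params = {"text": text, "apikey": api_key}
    return f"{REST_URL}/annotator?{urlencode(params)}"


# "KEY" is a placeholder; a real API key is required to query the service.
print(search_url("melanoma", "KEY", ontologies=["MESH"]))
print(annotator_url("soft contact lens", "KEY"))
```

The same helpers could point at a local BioPortal instance by changing `REST_URL`, assuming that instance exposes the same API.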

http://bioportal.bioontology.org

Page 21: Keynote rise 2015_jonquet

Current axes of research

Page 22: Keynote rise 2015_jonquet

SIFR axes of research (1/8): Design of the SIFR (French) Annotator service

Deployment of a local instance of BioPortal at LIRMM

16 French terminologies imported from UMLS, EHTOP & BioPortal

UTF-8 compliant Mgrep concept recognizer (Univ. of Michigan)

http://bioportal.lirmm.fr/annotator

New improvements to the annotation workflow

Automatic term extraction measures (C-value, LIDF-value, etc.)

Scoring of annotations & representation in RDF using the AO

[SWAT4LS 2014]
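A minimal sketch of dictionary-based concept recognition on accented French text. It is a simplified stand-in and not Mgrep; the dictionary entries and the `EX:` identifiers are made up for illustration. Unicode normalization is used so that composed and decomposed accents match, which is one reason UTF-8 compliance matters for French.

```python
import re
import unicodedata

# Hypothetical dictionary: term -> concept identifier (identifiers are made up).
DICTIONARY = {
    "mélanome": "EX:0001",
    "sarcome à cellules fusiformes": "EX:0002",
    "cancer": "EX:0003",
}


def normalize(text):
    """Lowercase and apply NFC so composed/decomposed accents compare equal."""
    return unicodedata.normalize("NFC", text).lower()


def annotate(text, dictionary=DICTIONARY):
    """Return (start, end, term, concept) tuples for whole-word dictionary matches.

    Offsets refer to the normalized text, which may differ in length
    from the raw input when accents were decomposed.
    """
    norm = normalize(text)
    annotations = []
    for term, concept in dictionary.items():
        # Lookarounds keep matches on word boundaries, including accented letters.
        pattern = r"(?<!\w)" + re.escape(normalize(term)) + r"(?!\w)"
        for match in re.finditer(pattern, norm):
            annotations.append((match.start(), match.end(), term, concept))
    return sorted(annotations)


for annotation in annotate("Un mélanome et un sarcome à cellules fusiformes"):
    print(annotation)
```

A production recognizer would use a far more efficient matching structure than one regular expression per term; the loop above only keeps the idea readable.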

Page 23: Keynote rise 2015_jonquet

Improving the Annotator(s) – example with scoring

Objective: improve the Annotator(s) results by ranking the annotations according to their relevance

Without changing the service implementation

Take into account annotation frequencies (as originally proposed in 2009 and later removed)

Add a term extraction measure, called C-value, used to positively discriminate annotations generated from matches with multi-word terms.

Two new scoring methods that score and rank annotations by their importance in the given input data

Interesting results validated against PubMed manual annotations

[SWAT4LS 2014]
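A minimal sketch of the general idea of scoring annotations by frequency while favoring multi-word matches. It is not the scoring method of the cited paper: the base weights, the logarithmic length bonus and the data layout are assumptions chosen for illustration only.

```python
import math
from collections import defaultdict

# Hypothetical base weights per match type; not the values used by the authors.
BASE_WEIGHT = {"preferred": 10, "synonym": 8}


def score_annotations(annotations):
    """Rank concepts from (concept_id, matched_term, match_type) annotations.

    Each occurrence adds to the concept score (frequency effect), and
    multi-word matches get a bonus of 1 + log2(number of words).
    """
    scores = defaultdict(float)
    for concept, term, match_type in annotations:
        n_words = len(term.split())
        bonus = 1 + math.log2(n_words)
        scores[concept] += BASE_WEIGHT[match_type] * bonus
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical annotations returned for one input text.
annotations = [
    ("C1", "cancer", "preferred"),
    ("C1", "cancer", "preferred"),
    ("C2", "soft contact lens", "preferred"),
    ("C3", "tumour", "synonym"),
]
for concept, score in score_annotations(annotations):
    print(concept, round(score, 2))
```

Because the ranking is computed from the annotations alone, a post-processing step of this kind can sit on top of an annotation service without changing its implementation.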

Page 24: Keynote rise 2015_jonquet

SIFR axes of research (2/8): Dealing with multilingualism within BioPortal

Status of multilingualism in BioPortal – quite negative

Set of proposals [MSW 2014]

Representation of the natural language property for an ontology

Representation of the distinction between ontologies

Representation of relations between ontologies

Representation of multilingual translation mappings

Reconciliation of multilingual mappings (possible PhD collaboration with ESI)

Currently being tested/implemented within our local instance

Page 25: Keynote rise 2015_jonquet

What does being multilingual mean?

Interface internationalization = displaying static elements of the user interface (e.g., menu names, help, etc.) in different languages

Content internationalization = displaying BioPortal content (e.g., ontology labels, mappings, etc.) in different languages

Multilingual = internationalization (display) + enabling full use of the functionalities and services of BioPortal for multilingual or monolingual ontologies

Languages, translations, multilingual mappings, etc. completely and properly addressed

Rich semantic description

Being able to parse multilingual content in ontologies (from xml:lang to Lemon)

Page 26: Keynote rise 2015_jonquet

Multilingual ontology

en:disease fr:maladie

... en:cancer fr:cancer

en:spindle cell sarcoma fr:sarcome à cellules fusiformes

en:melanoma fr:mélanome

Language-specific ontologies (monolingual)

disease, ..., cancer, spindle cell sarcoma, melanoma

maladie, ..., cancer, sarcome à cellules fusiformes, mélanome
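A minimal sketch of the simplest case of parsing multilingual content: reading labels tagged with `xml:lang` from an RDF/XML fragment and grouping them by language. The fragment, the `example.org` URIs and the function name are hypothetical; only the Python standard library is used, and richer models such as Lemon are out of scope here.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical RDF/XML fragment with English and French labels.
RDF_XML = """
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">
  <rdf:Description rdf:about="http://example.org/onto#disease">
    <rdfs:label xml:lang="en">disease</rdfs:label>
    <rdfs:label xml:lang="fr">maladie</rdfs:label>
  </rdf:Description>
  <rdf:Description rdf:about="http://example.org/onto#melanoma">
    <rdfs:label xml:lang="en">melanoma</rdfs:label>
    <rdfs:label xml:lang="fr">mélanome</rdfs:label>
  </rdf:Description>
</rdf:RDF>
"""

RDF = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}"
RDFS = "{http://www.w3.org/2000/01/rdf-schema#}"
# ElementTree exposes xml:lang under the reserved XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def labels_by_language(rdf_xml):
    """Return {language: {concept URI: label}} for every rdfs:label found."""
    root = ET.fromstring(rdf_xml)
    labels = defaultdict(dict)
    for description in root.iter(RDF + "Description"):
        uri = description.get(RDF + "about")
        for label in description.findall(RDFS + "label"):
            lang = label.get(XML_LANG, "und")  # "und" = language not specified
            labels[lang][uri] = label.text
    return labels


labels = labels_by_language(RDF_XML)
print(labels["fr"]["http://example.org/onto#melanoma"])
```

Splitting the result per language gives one monolingual view per language tag, while the shared URIs keep the link between translations.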

Page 27: Keynote rise 2015_jonquet

SIFR axes of research (3/8): Automatic extraction of biomedical terminology from text

Context: PhD of Juan Antonio Lossio [LBM 2013][TALN 2014][PolTAL 2014]

BioTex software: http://tubo.lirmm.fr/biotex [ISWC 2014]

Works in French, English and Spanish

Motivations for automatic terminology extraction

Experiment and validate approaches for French data

Contribute to the ontology enrichment process

Acquire some NLP expertise for the annotation workflow

Page 28: Keynote rise 2015_jonquet

Page 29: Keynote rise 2015_jonquet

Statistical methods

C-value: improves the extraction of the longest terms

Example: “soft contact” nested in “soft contact lens”

Frantzi, K., Ananiadou, S., & Mima, H. (2000). Automatic recognition of multi-word terms: the C-value/NC-value method. International Journal on Digital Libraries, 3(2), 115-130.
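A minimal sketch of the C-value measure as defined by Frantzi et al. (2000): for a candidate term a with |a| words and frequency f(a), C-value(a) = log2|a| · f(a) if a is not nested in a longer candidate, and log2|a| · (f(a) − mean frequency of the longer candidates containing a) otherwise. The frequencies below are hypothetical, and the function names are mine; this is not the BioTex implementation.

```python
import math


def contains(longer, shorter):
    """True if the word sequence `shorter` occurs contiguously inside `longer`."""
    n, m = len(longer), len(shorter)
    return n > m and any(longer[i:i + m] == shorter for i in range(n - m + 1))


def c_value(freq):
    """Compute C-value for each candidate term.

    freq maps a candidate term to its total corpus frequency.
    Single-word candidates score 0 because log2(1) = 0; the measure
    targets multi-word terms.
    """
    tokens = {term: tuple(term.split()) for term in freq}
    scores = {}
    for term, words in tokens.items():
        # Longer candidates in which this term is nested.
        longer = [t for t, w in tokens.items() if contains(w, words)]
        f = freq[term]
        if longer:
            f -= sum(freq[t] for t in longer) / len(longer)
        scores[term] = math.log2(len(words)) * f
    return scores


# Hypothetical frequencies for the nested-term example of the slide.
freq = {"soft contact lens": 5, "soft contact": 6, "contact lens": 12}
for term, score in sorted(c_value(freq).items(), key=lambda kv: -kv[1]):
    print(f"{term}: {score:.2f}")
```

With these numbers, "soft contact" is penalized because almost all of its occurrences come from the longer term "soft contact lens", which is the behavior the slide refers to.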

Page 30: Keynote rise 2015_jonquet

Page 31: Keynote rise 2015_jonquet

Page 32: Keynote rise 2015_jonquet

Include BioTex into BioPortal

Use BioPortal dictionary for validation

New ontology enrichment service… given a corpus of data, see which terms are not yet covered

Page 33: Keynote rise 2015_jonquet

SIFR axes of research (4/8): Semantic distance framework

Automatically compute existing semantic similarity measures (Rada, Wu & Palmer, Resnik) over BioPortal ontologies

For a given concept, get all semantically close concepts

Get the semantic distance between 2 concepts

Collaboration with LGI2P to reuse the Semantic Measures Library (SML) within BioPortal

1st prototype: http://tubo.lirmm.fr/BioMedicalSemantic/web/app_dev.php

Goal: include SML within the BioPortal backend to bring semantic distance services to the ontologies and annotated data
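A minimal sketch of two of the measures named on this slide, computed over a toy is-a tree: the Rada distance (shortest path length between two concepts) and the Wu & Palmer similarity (2 · depth(lcs) / (depth(a) + depth(b)), where lcs is the deepest common ancestor). The hierarchy is hypothetical, the root is assumed to have depth 1, and this does not use the SML API. Resnik is left out because it needs information content estimated from a corpus.

```python
# Hypothetical single-inheritance hierarchy: concept -> parent.
PARENT = {
    "disease": None,
    "cancer": "disease",
    "infection": "disease",
    "melanoma": "cancer",
    "spindle cell sarcoma": "cancer",
}


def ancestors(concept):
    """Path from the concept up to the root, concept included."""
    path = []
    while concept is not None:
        path.append(concept)
        concept = PARENT[concept]
    return path


def depth(concept):
    """Number of nodes on the path to the root (assumption: root has depth 1)."""
    return len(ancestors(concept))


def lcs(a, b):
    """Least common subsumer: the deepest ancestor shared by a and b."""
    ancestors_a = set(ancestors(a))
    for candidate in ancestors(b):
        if candidate in ancestors_a:
            return candidate
    raise ValueError("concepts do not share a root")


def rada_distance(a, b):
    """Length in edges of the shortest is-a path between a and b."""
    common = lcs(a, b)
    return (depth(a) - depth(common)) + (depth(b) - depth(common))


def wu_palmer(a, b):
    """Wu & Palmer similarity, in (0, 1]."""
    return 2 * depth(lcs(a, b)) / (depth(a) + depth(b))


print(rada_distance("melanoma", "spindle cell sarcoma"))
print(round(wu_palmer("melanoma", "infection"), 2))
```

Real ontologies allow multiple parents, so a practical implementation works on a directed acyclic graph rather than on a tree as assumed here.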

Page 34: Keynote rise 2015_jonquet

SIFR axes of research (5/8): Informal patient data analysis

Dealing with public patient data on blogs, forums and tweets (Sandra Bringay)

Detection of emotion [EGC 2014][eTELEMED 2014]

Patient vocabulary (crabe vs. cancer)

Project “Parlons de nous” (www.lirmm.fr/patient-mind)

MSH-M

A patient vocabulary currently being constructed [IC 2015]

Hosted and available in our local instance of BioPortal

Used for annotations, indexing, information retrieval

Page 35: Keynote rise 2015_jonquet

SIFR axes of research (6/8): Viewpoint: a subjective knowledge representation formalism

Collaboration with P. Lemoisson (CIRAD) & PhD of G. Surroca

Graph-based knowledge representation formalism

Linked data from the semantic Web and user contributions from the social Web.

Unified topological approach

First prototype for semantic search over HAL-LIRMM publications [IC 2014]

Capture the phenomenon of serendipity (i.e., incidental learning) [IC 2015]

Page 36: Keynote rise 2015_jonquet

SIFR axes of research (7/8): Pharmacogenomics use case

PGx studies how individual gene variations cause variability in drug responses

Validation of state-of-the-art pharmacogenomics knowledge on the basis of practice-based evidence

Compare pharmacogenomics literature (in English) and electronic health records (in French)

EHRs from Paris (HEGP) & St Etienne hospitals

Upcoming improvements to the Annotators to handle clinical data: negation, disambiguation, modularity, temporality

Project submitted to ANR generic call 2015 (April 27th)

Collaborative action led by Adrien Coulet (LORIA)

Stanford is in the loop (Russ, Mark, Michel, Nigam)

Page 37: Keynote rise 2015_jonquet

SIFR axes of research (8/8): Application to agronomy & plants

Within the Institute of Computational Biology of Montpellier

Design of a semantic annotation workflow for plant data, in collaboration with the IBC project [CO-PDI 2014]

AgroLD: build an RDF knowledge base to house plant data resources: SouthGreen, Gramene, OryGeneDB… [RDA 2014]

AgroPortal: reference ontology repository for the agronomic domain [IN-OVIVE 2015]

Experiment with NCBO technologies for the plant community

4 driving agronomic use cases

Page 38: Keynote rise 2015_jonquet

Objectives of the AgroPortal project

Develop and support a reference ontology repository for the agronomic domain

One-stop-shop for plant/agronomic related ontologies

Primary focus on the agronomic & plant domain

Reusing the NCBO BioPortal technology

Avoid re-implementing what has already been done

Facilitate interoperability

Reusing the scientific outcomes, experience & methods of the biomedical domain

Enable straightforward use of agronomic related ontologies

Respect the requirements of the agronomic community

Fully semantic web compliant infrastructure

Page 39: Keynote rise 2015_jonquet

AgroPortal: 50 ontologies relevant to the agronomic and plant domains

Page 40: Keynote rise 2015_jonquet

A few conclusions

Page 41: Keynote rise 2015_jonquet

Near future

Continue to move different prototypes into production

Release of the French Annotator

Find more use cases

Collaboration with the plant/agro community

Continue reusing and contributing to NCBO technology

Page 42: Keynote rise 2015_jonquet

Online resources

Web page: www.lirmm.fr/sifr

https://www.researchgate.net/projects

Code repository: https://github.com/sifrproject

13 developers

10 repositories

Publications: http://bit.ly/194ImnR

Direct link to the HAL-LIRMM platform with advanced search features

Portals & services:

http://bioportal.lirmm.fr

http://agroportal.lirmm.fr

Page 43: Keynote rise 2015_jonquet

Questions & remarks?
