keynote rise 2015_jonquet
TRANSCRIPT
Atelier Recherche d’Information Sémantique (Semantic Information Retrieval Workshop), RISE’15
30 June 2015 – Rennes
Clement Jonquet – [email protected]
A few contributions of the SIFR (Semantic Indexing of French Biomedical Resources) project
and how we reuse NCBO technology
How is this relevant to RISE?
Semantic information retrieval models
Information extraction
Semantic annotation
Semantic indexing
Ontology alignment and mappings for information retrieval
Knowledge representation languages for information retrieval
Use of semantic distances for information retrieval
Atelier RISE 2015 (RISE 2015 Workshop)
30 June 2015, Rennes
A few introductory words
Biologists have adopted ontologies
To provide canonical representations of scientific knowledge
To annotate experimental data to enable interpretation, comparison, and discovery across databases
To facilitate knowledge-based applications for
Decision support
Natural language processing
Data integration
But ontologies are spread out, in different formats, of different sizes, with different structures
Working with terminologies &
ontologies – a portal please!
You’ve built an ontology, how do you let the world know?
You need an ontology, where do you go to get it?
How do you know whether an ontology is any good?
How do you find resources that are relevant to the domain of the ontology (or to specific terms)?
How could you leverage your ontology to enable new science?
How could you use ontologies without managing them?
Comparison of the
approaches
[IWBBIO'14]
Annotation challenge
Explosion of biomedical data: diverse, distributed, unstructured… not linked to ontologies
Hard for biomedical researchers to find the data they need
Data integration problem
Translational discoveries are prevented
Good examples
GO annotations
PubMed (biomedical literature) indexed with MeSH headings
Annotate data with ontology concepts
Horizontal approach
ONTOLOGIES
RESOURCES
Good use of the semantics (1/2)
Simple keyword-based search misses results
Good use of the semantics (2/2)
A few words about the SIFR project
Semantic Indexing of
French Biomedical Data
Resources project
… in collaboration with…
People
Young researchers
Clement Jonquet
Mathieu Roche
Sandra Bringay
Advisors
Stefano A. Cerri
Maguelonne Teisseire
Pascal Poncelet
Staff
Vincent Emonet
Students
Juan Antonio Lossio Ventura
Guillaume Surroca
~3 MSc students / year
Close collaborators
Philippe Lemoisson (TETIS)
Pierre Larmande (IRD / IBC)
Mark Musen (BMIR)
Stefan Darmoni (CISMEF)
Sebastien Harispe (LGI2P)
Increasing amount of biomedical data + multilingualism
Limits of keyword-based indexing
The biomedical community has turned to ontologies to describe its
data and turn them into structured and formalized knowledge
Ontologies are put to use by creating semantic annotations
Crucial need for tools & services for French biomedical data
Biomedical data integration challenge
New potential scientific discoveries hidden in data
Translational research
Use ontologies for indexing, mining
and searching (French) biomedical
data
Obj1: Design, development and deployment of the French Annotator.
Obj2: Obtain new research results to exploit and enhance ontology-based indexing services.
semantic distances
ontology alignment
ontology enrichment and disambiguation
Obj3: Valorization of indexing services
A French biomedical Annotator
Use biomedical ontology-based annotations in end-user applications
Reuse of the NCBO
technology
BioPortal: a “one-stop shop”
for biomedical ontologies
Web repository for biomedical ontologies
Make ontologies accessible and usable – abstraction on format, locations, structure, etc.
Users can publish, download, browse, search, comment, align ontologies and use them for annotations both online and via a web services API.
Online support for ontology
Peer review
Notes (comments and discussion)
Versioning
Mapping
Search
Resources
http://bioportal.bioontology.org
BioPortal Ontology Repository
http://data.bioontology.org
Ontology Services: Search, Traverse, Comment, Download
Widgets: Tree-view, Auto-complete, Graph-view
Mapping Services: Create, Upload, Download
Annotation: Term recognition
Data Access: Search “data” annotated with a given term
Current axes of research
SIFR axes of research (1/8):
Design of the SIFR (French)
Annotator service
Deployment of a local instance of BioPortal at LIRMM
16 French terminologies imported from UMLS, EHTOP & BioPortal
UTF-8-compliant Mgrep concept recognizer (Univ. of Michigan)
http://bioportal.lirmm.fr/annotator
New improvements to the annotation workflow
Automatic term extraction measures (C-value, LIDF-value, etc.)
Scoring of annotations & representation in RDF using the AO
[SWAT4LS 2014]
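The concept recognition step can be illustrated with a toy dictionary-based matcher. This is only a sketch of the general idea, not Mgrep or the SIFR Annotator implementation: the dictionary, the `EX:` identifiers and the function names are hypothetical, and the reported offsets refer to the normalized text.

```python
import re
import unicodedata

# Hypothetical mini-dictionary: French label -> made-up concept identifier.
DICTIONARY = {
    "mélanome": "EX:0001",
    "sarcome à cellules fusiformes": "EX:0002",
    "cancer": "EX:0003",
}

def normalize(text):
    """Lowercase and apply Unicode NFC so accented characters compare reliably."""
    return unicodedata.normalize("NFC", text).lower()

def annotate(text, dictionary=DICTIONARY):
    """Return (start, end, matched_text, concept_id) for each dictionary label found."""
    normalized = normalize(text)
    annotations = []
    for label, concept_id in dictionary.items():
        # Match the label only on word boundaries, so "cancer" does not
        # match inside a longer word.
        pattern = r"(?<!\w)" + re.escape(normalize(label)) + r"(?!\w)"
        for match in re.finditer(pattern, normalized):
            annotations.append((match.start(), match.end(), match.group(), concept_id))
    return sorted(annotations)

if __name__ == "__main__":
    for annotation in annotate("Le patient a un mélanome et un cancer."):
        print(annotation)
```

Normalizing both the text and the labels is one simple way to handle accented French terms consistently; a production recognizer would also need to deal with inflection and overlapping matches.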
Improving the Annotator(s) –
example with scoring
Objective: improve the Annotator(s) results by ranking annotations according to their relevance
Without changing the service implementation
Take annotation frequencies into account (as originally proposed in 2009 and later removed)
Add a term extraction measure, called C-value, used to positively discriminate annotations generated from matches with multi-word terms.
Two new scoring methods that score and rank annotations by their importance in the given input data
Interesting results validated against PubMed manual annotations
[SWAT4LS 2014]
SIFR axes of research (2/8):
Dealing with multilingualism within
BioPortal
Status of multilingualism in BioPortal – quite negative
Set of propositions [MSW 2014]
Representation of natural language property for an ontology
Representation of the distinction between ontologies
Representation of relation between ontologies
Representation of multilingual translation mappings
Reconciliation of multilingual mappings (possible PhD collaboration with
ESI)
Currently being tested/implemented within our local instance
What does being multilingual mean?
Interface internationalization = displaying static elements of the user interface (e.g., menu names, help, etc.) in different languages
Content internationalization = displaying BioPortal content (e.g., ontology labels, mappings, etc.) in different languages
Multilingual = internationalization (display) + enabling full use of the functionalities and services of BioPortal for multilingual or monolingual ontologies
completely and properly addressed (languages, translations, multilingual mappings, etc.)
rich semantic description
Being able to parse multilingual content in ontologies (from xml:lang to Lemon)
Multilingual ontology
en:disease / fr:maladie
... en:cancer / fr:cancer
en:spindle cell sarcoma / fr:sarcome à cellules fusiformes
en:melanoma / fr:mélanome
Language-specific ontologies (monolingual)
English: disease, ... cancer, spindle cell sarcoma, melanoma
French: maladie, ... cancer, sarcome à cellules fusiformes, mélanome
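The relation between a multilingual ontology and its language-specific views can be sketched as follows. The labels come from the slide example; the concept identifiers (`C1` to `C4`), the dictionary layout and the function name are assumptions made for illustration and do not reflect how BioPortal stores labels.

```python
# Labels taken from the slide example; concept identifiers are hypothetical.
MULTILINGUAL = {
    "C1": {"en": "disease", "fr": "maladie"},
    "C2": {"en": "cancer", "fr": "cancer"},
    "C3": {"en": "spindle cell sarcoma", "fr": "sarcome à cellules fusiformes"},
    "C4": {"en": "melanoma", "fr": "mélanome"},
}

def monolingual_view(ontology, lang):
    """Keep, for each concept, only the label in the requested language.

    Concepts without a label in that language are left out of the view.
    """
    return {
        concept_id: labels[lang]
        for concept_id, labels in ontology.items()
        if lang in labels
    }

if __name__ == "__main__":
    print(monolingual_view(MULTILINGUAL, "fr"))
    print(monolingual_view(MULTILINGUAL, "en"))
```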
SIFR axes of research (3/8):
Automatic extraction of biomedical
terminology from text
Context of the PhD of Juan Antonio Lossio [LBM 2013][TALN 2014][PolTAL 2014]
BioTex software: http://tubo.lirmm.fr/biotex [ISWC 2014]
Work in French, English and Spanish
Motivations for automatic terminology extraction
Experiment and validate approaches for French data
Contribute to the ontology enrichment process
Acquire some NLP expertise for the annotation workflow
Statistical methods
C-value: improves the extraction of the longest terms
Example: “soft contact” vs. “soft contact lens”
Frantzi, K., Ananiadou, S., & Mima, H. (2000). Automatic recognition of multi-word terms: the C-value/NC-value method. International Journal on Digital Libraries, 3(2), 115-130.
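A minimal sketch of the C-value formula from Frantzi et al. (2000) may help here. For a candidate term a with |a| words and frequency f(a), C-value(a) = log2|a| · f(a) when a is not nested in a longer candidate, and log2|a| · (f(a) − mean frequency of the longer candidates containing a) otherwise. The frequencies below are invented for illustration, the function names are hypothetical, and this is not the BioTex implementation.

```python
import math

def contains(longer, shorter):
    """True if the word tuple `shorter` occurs contiguously inside the strictly longer `longer`."""
    n = len(shorter)
    return len(longer) > n and any(
        longer[i:i + n] == shorter for i in range(len(longer) - n + 1)
    )

def c_value(frequencies):
    """Compute the C-value of each candidate term.

    `frequencies` maps a candidate term (tuple of words) to its corpus frequency.
    """
    scores = {}
    for term, freq in frequencies.items():
        # Frequencies of the longer candidates in which this term is nested.
        nesting = [f for other, f in frequencies.items() if contains(other, term)]
        if nesting:
            adjusted = freq - sum(nesting) / len(nesting)
        else:
            adjusted = freq
        scores[term] = math.log2(len(term)) * adjusted
    return scores

if __name__ == "__main__":
    # Invented counts: "soft contact" only ever appears inside "soft contact lens".
    candidates = {("soft", "contact"): 5, ("soft", "contact", "lens"): 5}
    for term, score in c_value(candidates).items():
        print(" ".join(term), round(score, 2))
```

With these counts the nested candidate "soft contact" gets a score of 0 while "soft contact lens" keeps its full weight, which is the behaviour the slide describes. Note that log2 of 1 is 0, so the measure as written only ranks multi-word candidates.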
Include BioTex into BioPortal
Use BioPortal dictionary for validation
New ontology enrichment service: submit a corpus of data and
see which terms are not yet covered
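The idea behind such an enrichment service can be sketched as a comparison between extracted terms and the labels already known to the portal. This is an assumption about the general principle only; the function name and inputs are hypothetical and no BioTex or BioPortal API is used.

```python
def uncovered_terms(extracted_terms, ontology_labels):
    """Return extracted terms that match no known ontology label.

    Comparison is case-insensitive; duplicates are reported once,
    in order of first appearance.
    """
    known = {label.casefold() for label in ontology_labels}
    seen = set()
    result = []
    for term in extracted_terms:
        key = term.casefold()
        if key not in known and key not in seen:
            seen.add(key)
            result.append(term)
    return result

if __name__ == "__main__":
    extracted = ["Melanoma", "soft contact lens", "melanoma"]
    labels = ["melanoma", "cancer"]
    print(uncovered_terms(extracted, labels))
```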
SIFR axes of research (4/8):
Semantic distance framework
Automatically compute existing semantic similarity measures
(Rada, Wu & Palmer, Resnik) over BioPortal ontologies
For a given concept, get all semantically close concepts
Get the semantic distance between 2 concepts
Collaboration with LGI2P to reuse the Semantic Measure Library
(SML) within BioPortal
1st prototype: http://tubo.lirmm.fr/BioMedicalSemantic/web/app_dev.php
Goal: include SML in the BioPortal backend to bring semantic
distance services to ontologies and annotated data
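Two of the measures named above can be sketched on a toy is-a hierarchy borrowed from the multilingual example (disease > cancer > melanoma, spindle cell sarcoma). This sketch assumes a single-parent tree and counts the root at depth 1 (conventions vary); it does not use the Semantic Measure Library, and Resnik's measure is left out because it needs corpus-based information content.

```python
# Toy is-a hierarchy (child -> parent); the root has no parent.
PARENT = {
    "disease": None,
    "cancer": "disease",
    "melanoma": "cancer",
    "spindle cell sarcoma": "cancer",
}

def ancestors(concept):
    """Path from the concept up to the root, the concept itself included."""
    path = []
    while concept is not None:
        path.append(concept)
        concept = PARENT[concept]
    return path

def depth(concept):
    """Number of nodes on the path to the root (the root has depth 1)."""
    return len(ancestors(concept))

def lowest_common_ancestor(a, b):
    """Deepest concept that is an ancestor of both a and b."""
    ancestors_of_a = set(ancestors(a))
    for candidate in ancestors(b):
        if candidate in ancestors_of_a:
            return candidate
    raise ValueError("concepts do not share a root")

def rada_distance(a, b):
    """Rada: number of is-a edges on the shortest path between a and b."""
    lca_depth = depth(lowest_common_ancestor(a, b))
    return (depth(a) - lca_depth) + (depth(b) - lca_depth)

def wu_palmer_similarity(a, b):
    """Wu & Palmer: 2 * depth(lca) / (depth(a) + depth(b))."""
    lca_depth = depth(lowest_common_ancestor(a, b))
    return 2 * lca_depth / (depth(a) + depth(b))

if __name__ == "__main__":
    print(rada_distance("melanoma", "spindle cell sarcoma"))
    print(wu_palmer_similarity("melanoma", "spindle cell sarcoma"))
```

Real biomedical ontologies often allow multiple parents, in which case the ancestor search has to work on a directed acyclic graph rather than a single path.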
SIFR axes of research (5/8):
Informal patient data analysis
Dealing with public patient data on blogs, forums and
tweets (Sandra Bringay)
Detection of emotion [EGC 2014][eTELEMED 2014]
Patient vocabulary (crabe vs. cancer)
Project “Parlons de nous” (www.lirmm.fr/patient-mind)
MSH-M
A patient vocabulary currently being constructed [IC 2015]
Hosted and available in our local instance of BioPortal
Used for annotations, indexing, information retrieval
SIFR axes of research (6/8):
Viewpoint: a subjective knowledge
representation formalism
Collaboration with P. Lemoisson (CIRAD) & PhD of G. Surroca
Graph based knowledge representation formalism
Linked data from the semantic Web and user contributions
from the social Web.
Unified topological approach
First prototype for semantic search over HAL-LIRMM
publications [IC 2014]
Capture the phenomenon of Serendipity
(i.e., incidental learning) [IC 2015]
SIFR axes of research (7/8):
Pharmacogenomics use case
Pharmacogenomics (PGx) studies how individual gene variations cause variability in
drug responses
Validation of state-of-the-art pharmacogenomics knowledge on
the basis of practice-based evidence
Compare pharmacogenomics literature (in English) and electronic
health records (in French)
EHRs from Paris (HEGP) & St Etienne hospitals
Upcoming improvements to the Annotators to handle clinical data:
negation, disambiguation, modularity, temporality
Project submitted to ANR generic call 2015 (April 27th)
Collaborative action led by Adrien Coulet (LORIA)
Stanford is in the loop (Russ, Mark, Michel, Nigam)
SIFR axes of research (8/8):
Application to agronomy & plants
Within the Institute of Computational Biology of
Montpellier
Design of a semantic annotation workflow for plant data -
collaboration with IBC project [CO-PDI 2014]
AgroLD: to build an RDF knowledge base to house plant data
resources: SouthGreen, Gramene, OryGeneDB… [RDA 2014]
AgroPortal: reference ontology repository for the agronomic
domain [IN-OVIVE 2015]
Experiment with NCBO technologies for the plant community
4 driving agronomic use cases
Objectives of the AgroPortal project
Develop and support a reference ontology repository for the
agronomic domain
One-stop-shop for plant/agronomic related ontologies
Primary focus on the agronomic & plant domain
Reusing the NCBO BioPortal technology
Avoid re-implementing what has already been done
Facilitate interoperability
Reusing the scientific outcomes, experience & methods of the
biomedical domain
Enable straightforward use of agronomic related ontologies
Respect the requirements of the agronomic community
Fully semantic web compliant infrastructure
AgroPortal
50 ontologies relevant to the agronomic and plant domains
A few conclusions
Near future
Continue to move different prototypes into production
Release of the French Annotator
Find more use cases
Collaboration with the plant/agro community
Continue reusing and contributing to NCBO technology
Online resources
Web page: www.lirmm.fr/sifr
https://www.researchgate.net/projects
Code repository: https://github.com/sifrproject
13 developers
10 repositories
Publications: http://bit.ly/194ImnR
Direct link to HAL-LIRMM platform
with advanced search features
Portals & services:
http://bioportal.lirmm.fr
http://agroportal.lirmm.fr
Questions & remarks?