integration of biomedical literature and databases
2nd European Conference on Scientific Publishing in Biomedicine and Medicine, Rikshospitalet, Oslo, Norway, September 5-6, 2008
Lars Juhl Jensen, EMBL Heidelberg
biomedical databases
DNA sequences
GenBank
protein sequences
UniProt
protein structures
PDB
expression
ArrayExpress
GEO (Gene Expression Omnibus)
modifications
Phospho.ELM
PhosphoSite
interactions
BioGRID
DIP (Database of Interacting Proteins)
IntAct
MINT (Molecular INTeraction database)
chemical compounds
PubChem
database of databases
Duncan Hull, nodalpoint.org
freely available
literature mining
PubMed
exponential increase
some things never change
“graph calculus” = ~50 seconds per paper
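One way to read the “~50 seconds per paper” figure is as a back-of-the-envelope division: if you read every new PubMed paper around the clock, how long could you spend on each one? The sketch below assumes roughly 600,000 new papers per year. That number is an illustrative assumption, not a figure from the slides, and this reading of the slide is an interpretation rather than the author's stated derivation.

```python
# Back-of-the-envelope check of the "~50 seconds per paper" figure.
# ASSUMPTION: ~600,000 new PubMed papers per year (illustrative only).
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # 31,536,000 s, reading 24/7


def seconds_per_paper(papers_per_year: int) -> float:
    """Time available per paper if one read every new paper non-stop."""
    return SECONDS_PER_YEAR / papers_per_year


if __name__ == "__main__":
    print(f"{seconds_per_paper(600_000):.1f} s per paper")  # 52.6 s per paper
```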
information retrieval
find the relevant papers
ad hoc retrieval
user-specified query
“yeast AND cell cycle”
stemming
yeast / yeasts
dynamic query expansion
yeast / S. cerevisiae
Mitotic cyclin (Clb2)-bound Cdc28 (Cdk1 homolog) directly phosphorylated Swe1 and this modification served as a priming step to promote subsequent Cdc5-dependent Swe1 hyperphosphorylation and degradation
no tool will find it
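A toy version of the retrieval steps above shows what stemming and query expansion add, and why the example sentence is still missed: it is about the yeast cell cycle but mentions neither “yeast” nor “cell cycle”. The plural-stripping stemmer, the expansion table, and documents d2 and d3 are hypothetical and chosen only for illustration; real search engines are far more elaborate.

```python
from __future__ import annotations

import re

# Toy corpus. "d1" is the example sentence from the slide; "d2" and "d3"
# are short hypothetical documents added only for contrast.
DOCS = {
    "d1": ("Mitotic cyclin (Clb2)-bound Cdc28 (Cdk1 homolog) directly "
           "phosphorylated Swe1 and this modification served as a priming "
           "step to promote subsequent Cdc5-dependent Swe1 "
           "hyperphosphorylation and degradation"),
    "d2": "Yeasts arrest the cell cycle in G1 upon pheromone treatment",
    "d3": "In S. cerevisiae, Swe1 delays the cell cycle at G2/M",
}

# Hypothetical expansion table used for dynamic query expansion.
EXPANSIONS = {
    "yeast": ["yeast", "S. cerevisiae", "Saccharomyces cerevisiae"],
}


def stem(token: str) -> str:
    """Naive stemmer: strip a trailing plural 's' (yeasts -> yeast)."""
    return token[:-1] if len(token) > 3 and token.endswith("s") else token


def tokens(text: str) -> list[str]:
    """Lower-case, split on non-alphanumerics, and stem every token."""
    return [stem(t) for t in re.findall(r"[a-z0-9]+", text.lower())]


def contains_phrase(doc_tokens: list[str], phrase: str) -> bool:
    """True if the stemmed phrase occurs as a contiguous run of tokens."""
    p = tokens(phrase)
    return any(doc_tokens[i:i + len(p)] == p
               for i in range(len(doc_tokens) - len(p) + 1))


def search(concepts: list[str], expand: bool = True) -> list[str]:
    """Boolean AND over concepts; each concept may match any of its variants."""
    hits = []
    for doc_id, text in DOCS.items():
        doc_tokens = tokens(text)
        if all(any(contains_phrase(doc_tokens, v)
                   for v in (EXPANSIONS.get(c, [c]) if expand else [c]))
               for c in concepts):
            hits.append(doc_id)
    return hits


print(search(["yeast", "cell cycle"], expand=False))  # ['d2']
print(search(["yeast", "cell cycle"]))                # ['d2', 'd3']
```

Stemming is what lets d2 (“Yeasts”) match at all, expansion adds d3 (“S. cerevisiae”), and d1 is never returned.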
entity recognition
identify the substance(s)
Mitotic cyclin (Clb2)-bound Cdc28 (Cdk1 homolog) directly phosphorylated Swe1 and this modification served as a priming step to promote subsequent Cdc5-dependent Swe1 hyperphosphorylation and degradation
good synonyms list
orthographic variation
CDC28
Cdc28p
disambiguation
Cdc2
SDS
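The three ingredients on the entity-recognition slides (a synonyms list, handling of orthographic variation, and disambiguation) can be sketched as a dictionary lookup. In this sketch the synonym list and the identifier labels are hypothetical, normalization only covers case and the yeast-style trailing “p” (Cdc28p), and ambiguity is merely flagged, not resolved. Cdc2 can refer to human CDK1 or to fission-yeast cdc2, and SDS can be a gene symbol or the detergent.

```python
import re

# Hypothetical synonym list: lower-cased name -> candidate identifiers.
# The identifiers are illustrative labels, not real database accessions.
SYNONYMS = {
    "cdc28": ["yeast:CDC28"],
    "cdk1": ["yeast:CDC28", "human:CDK1"],
    "clb2": ["yeast:CLB2"],
    "swe1": ["yeast:SWE1"],
    "cdc5": ["yeast:CDC5"],
    "cdc2": ["human:CDK1", "fission_yeast:cdc2"],
    "sds": ["human:SDS", "chemical:sodium dodecyl sulfate"],
}


def normalize(token: str) -> str:
    """Absorb orthographic variation: case (CDC28/Cdc28) and a trailing 'p' (Cdc28p)."""
    key = token.lower()
    if key not in SYNONYMS and key.endswith("p") and key[:-1] in SYNONYMS:
        key = key[:-1]
    return key


def tag(text: str):
    """Return (surface form, candidate ids, is_ambiguous) for each dictionary hit."""
    hits = []
    for token in re.findall(r"[A-Za-z0-9]+", text):
        key = normalize(token)
        if key in SYNONYMS:
            ids = SYNONYMS[key]
            hits.append((token, ids, len(ids) > 1))
    return hits


for surface, ids, ambiguous in tag("CDC28, Cdc28p and Cdc2 were run on SDS gels"):
    print(surface, ids, "AMBIGUOUS" if ambiguous else "")
```

A real tagger would resolve the flagged cases from context, for example from the organism discussed in the rest of the text.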
information extraction
formalize the facts
co-mentioning
NLP (Natural Language Processing)
Mitotic cyclin (Clb2)-bound Cdc28 (Cdk1 homolog) directly phosphorylated Swe1 and this modification served as a priming step to promote subsequent Cdc5-dependent Swe1 hyperphosphorylation and degradation
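The difference between co-mentioning and NLP can be seen on the example sentence itself. Co-mentioning reports every pair of proteins found together in a sentence, including pairs such as Clb2 and Cdc5 that the sentence does not directly relate. A rule-based extractor recovers a single directed fact. The protein dictionary and the one “phosphorylated” rule below are hypothetical stand-ins; real NLP systems parse sentence structure rather than relying on word proximity.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical dictionary of protein names to recognize (lower-cased).
PROTEINS = {"clb2", "cdc28", "swe1", "cdc5"}


def words(sentence):
    """Lower-cased alphanumeric tokens of a sentence."""
    return [t.lower() for t in re.findall(r"[A-Za-z0-9]+", sentence)]


def co_mentions(sentences):
    """Co-mentioning: count each unordered protein pair sharing a sentence."""
    counts = Counter()
    for sentence in sentences:
        found = sorted(set(words(sentence)) & PROTEINS)
        for pair in combinations(found, 2):
            counts[pair] += 1
    return counts


def extract_phosphorylation(sentence):
    """One hand-written rule standing in for NLP: the nearest protein before
    'phosphorylated' is the kinase, the nearest one after it is the substrate."""
    toks = words(sentence)
    facts = []
    for i, tok in enumerate(toks):
        if tok == "phosphorylated":
            kinase = next((t for t in reversed(toks[:i]) if t in PROTEINS), None)
            substrate = next((t for t in toks[i + 1:] if t in PROTEINS), None)
            if kinase and substrate:
                facts.append((kinase, "phosphorylates", substrate))
    return facts


S = ("Mitotic cyclin (Clb2)-bound Cdc28 (Cdk1 homolog) directly phosphorylated "
     "Swe1 and this modification served as a priming step to promote "
     "subsequent Cdc5-dependent Swe1 hyperphosphorylation and degradation")
print(len(co_mentions([S])))         # 6 pairs, each with count 1
print(extract_phosphorylation(S))    # [('cdc28', 'phosphorylates', 'swe1')]
```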
integration tools
“document-centric” tools
Reflect
browser add-on
real-time tagging service
any HTML document
augmented document
information from databases
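The “augmented document” idea can be sketched as tagging recognized names inside an HTML page while leaving the markup untouched. This is not how Reflect is implemented. The entity dictionary and the link target (example.org) are hypothetical placeholders, and splitting on tags with a regular expression is a simplification that ignores script and style contents.

```python
import re
from html import escape

# Hypothetical dictionary: name -> identifier used to build a link.
ENTITIES = {"Cdc28": "yeast:CDC28", "Swe1": "yeast:SWE1"}
LINK = "https://example.org/entity/{}"  # placeholder URL, not a real service

NAME_RE = re.compile(r"\b(" + "|".join(map(re.escape, ENTITIES)) + r")\b")


def link(match):
    """Replace one recognized name with a link to its (hypothetical) entry."""
    name = match.group(1)
    href = escape(LINK.format(ENTITIES[name]))
    return f'<a class="entity" href="{href}">{name}</a>'


def augment(html_doc: str) -> str:
    """Tag entity names in text segments only; tags and attributes are kept as-is."""
    # re.split with one capturing group alternates text, tag, text, tag, ...
    parts = re.split(r"(<[^>]+>)", html_doc)
    return "".join(part if i % 2 else NAME_RE.sub(link, part)
                   for i, part in enumerate(parts))


print(augment("<p>Cdc28 phosphorylates Swe1</p>"))
```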
iHOP
web interface
precomputed index
abstracts
find text about a protein
link proteins and text
experimental interactions
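A precomputed index in the spirit of the iHOP slide maps each protein to the abstract sentences that mention it, so that a lookup is a dictionary access and each returned sentence links on to the other proteins it mentions. The gene list and the two abstracts below are made up for illustration, and this sketch does not reproduce iHOP's actual indexing.

```python
import re
from collections import defaultdict

# Hypothetical gene dictionary and toy abstracts (IDs are made up).
GENES = {"cdc28", "swe1", "cdc5", "clb2"}
ABSTRACTS = {
    "A1": "Cdc28 directly phosphorylated Swe1. Swe1 inhibits Cdc28.",
    "A2": "Cdc5 promotes Swe1 degradation.",
}


def split_sentences(text):
    """Naive sentence splitter on ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def build_index(abstracts):
    """Precompute, once and offline: gene -> [(abstract id, sentence, linked genes)]."""
    index = defaultdict(list)
    for doc_id, text in abstracts.items():
        for sent in split_sentences(text):
            names = {t.lower() for t in re.findall(r"[A-Za-z0-9]+", sent)} & GENES
            for gene in names:
                index[gene].append((doc_id, sent, sorted(names - {gene})))
    return index


def about(index, gene):
    """Find text about a protein: a lookup in the precomputed index."""
    return index.get(gene.lower(), [])


INDEX = build_index(ABSTRACTS)
for doc_id, sent, linked in about(INDEX, "Swe1"):
    print(doc_id, sent, "->", linked)
```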
“entity-centric” tools
STRING & STITCH
functional associations
heterogeneous evidence
information extraction
curated knowledge
interaction data
expression data
genomic context
quality scores
probabilistic framework
cross-species transfer
association networks
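Scoring heterogeneous evidence in a probabilistic framework can be illustrated by treating each evidence channel's quality score as a probability and, assuming the channels are independent, combining them as 1 - ∏(1 - s_i). This is a simplified sketch and not necessarily STRING's exact scoring. The channel scores and the 0.4 cutoff used to build the association network are invented for illustration.

```python
from math import prod

# Hypothetical per-channel scores (probabilities in [0, 1]) for protein pairs.
# Values are invented for illustration only.
EVIDENCE = {
    ("CDC28", "SWE1"): {"text_mining": 0.9, "experiments": 0.8},
    ("CDC5", "SWE1"): {"text_mining": 0.7, "coexpression": 0.3},
    ("CLB2", "CDC5"): {"coexpression": 0.2},
}


def combined_score(channel_scores):
    """Combine channel scores assuming independence: 1 - prod(1 - s_i)."""
    return 1.0 - prod(1.0 - s for s in channel_scores.values())


def network(evidence, cutoff=0.4):
    """Keep pairs whose combined score reaches `cutoff` (hypothetical threshold)."""
    return {pair: round(combined_score(ch), 3)
            for pair, ch in evidence.items()
            if combined_score(ch) >= cutoff}


print(network(EVIDENCE))  # {('CDC28', 'SWE1'): 0.98, ('CDC5', 'SWE1'): 0.79}
```

Two moderately supported channels give a higher combined score than either alone, which is the point of integrating heterogeneous evidence.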
Acknowledgments
STRING & STITCH
– Christian von Mering
– Michael Kuhn
– Manuel Stark
– Samuel Chaffron
– Philippe Julien
– Jean Muller
– Tobias Doerks
– Jan Korbel
– Berend Snel
– Martijn Huynen
– Peer Bork
Natural Language Processing
– Jasmin Saric
– Rossitza Ouzounova
– Isabel Rojas
– Peer Bork
Reflect
– Evangelos Pafilis
– Heiko Horn
– Michael Kuhn
– Sean O’Donoghue
– Reinhard Schneider
hands-on exercises
Exercises
Find literature on human CDC2
Find data and literature on the targets and cytochrome P450 enzymes of Aspirin and Viagra, as well as of similar compounds
Find information on the genes in doi:10.1371/journal.pgen.1000120
Construct an interaction network of genes that cause G2/M delays in the budding yeast cell cycle
Tools
http://www.ihop-net.org
http://string.embl.de
http://stitch.embl.de
http://reflect.ws