Integration of biomedical literature and databases

Lars Juhl Jensen, EMBL Heidelberg

DESCRIPTION

2nd European Conference on Scientific Publishing in Biomedicine and Medicine, Rikshospitalet, Oslo, Norway, September 5-6, 2008

TRANSCRIPT

biomedical databases

DNA sequences

GenBank

protein sequences

UniProt

protein structures

PDB

expression

ArrayExpress

GEO (Gene Expression Omnibus)

modifications

Phospho.ELM

PhosphoSite

interactions

BioGRID

DIP (Database of Interacting Proteins)

IntAct

MINT (Molecular Interactions Database)

chemical compounds

PubChem

database of databases

Duncan Hull, nodalpoint.org

freely available

literature mining

PubMed

exponential increase

some things never change

“graph calculus”

=

~50 seconds per paper

information retrieval

find the relevant papers

ad hoc retrieval

user-specified query

“yeast AND cell cycle”

stemming

yeast / yeasts
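
As a rough illustration of how stemming lets a query such as “yeast AND cell cycle” also match “yeasts”, here is a minimal sketch. The `stem` rule (strip one trailing "s") is a deliberately crude stand-in for a real stemmer, and the function names are hypothetical, not taken from any tool mentioned in this talk.

```python
import re


def stem(token: str) -> str:
    """Crude stemmer (assumption): lowercase and strip one trailing plural 's'."""
    token = token.lower()
    if len(token) > 3 and token.endswith("s") and not token.endswith("ss"):
        return token[:-1]
    return token


def stemmed_tokens(text: str) -> list[str]:
    """Split text into alphanumeric tokens and stem each one."""
    return [stem(t) for t in re.findall(r"[A-Za-z0-9]+", text)]


def contains_phrase(doc_tokens: list[str], phrase: str) -> bool:
    """True if the stemmed phrase occurs as a contiguous run in the document."""
    wanted = stemmed_tokens(phrase)
    n = len(wanted)
    return any(doc_tokens[i:i + n] == wanted
               for i in range(len(doc_tokens) - n + 1))


def matches(query: str, text: str) -> bool:
    """Boolean AND retrieval: every AND-separated term must occur in the text."""
    doc = stemmed_tokens(text)
    return all(contains_phrase(doc, term) for term in query.split(" AND "))


if __name__ == "__main__":
    print(matches("yeast AND cell cycle", "Cell cycles of budding yeasts"))
    print(matches("yeast AND cell cycle", "Yeast genetics"))
```

Because the same rule is applied to both query and document, even a crude stemmer keeps singular and plural forms aligned; a production system would use an established stemming algorithm instead.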

dynamic query expansion

yeast / S. cerevisiae
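
Query expansion can be sketched as rewriting each query term into an OR-group of known equivalents before searching. The synonym table below is a hypothetical hand-made example containing only the pair shown on the slide; a real system would draw on a much larger, curated resource.

```python
# Hypothetical synonym table (assumption): query term -> equivalent terms.
SYNONYMS = {
    "yeast": ["yeast", "S. cerevisiae"],
}


def quote(term: str) -> str:
    """Quote multi-word terms so they are searched as phrases."""
    return f'"{term}"' if " " in term else term


def expand_query(query: str) -> str:
    """Expand each AND-separated term into an OR-group of its synonyms."""
    clauses = []
    for term in query.split(" AND "):
        variants = SYNONYMS.get(term.lower(), [term])
        clause = " OR ".join(quote(v) for v in variants)
        clauses.append(f"({clause})" if len(variants) > 1 else clause)
    return " AND ".join(clauses)


if __name__ == "__main__":
    print(expand_query("yeast AND cell cycle"))
    # prints: (yeast OR "S. cerevisiae") AND "cell cycle"
```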

Mitotic cyclin (Clb2)-bound Cdc28 (Cdk1 homolog) directly phosphorylated Swe1 and this modification served as a priming step to promote subsequent Cdc5-dependent Swe1 hyperphosphorylation and degradation

no tool will find it

entity recognition

identify the substance(s)

Mitotic cyclin (Clb2)-bound Cdc28 (Cdk1 homolog) directly phosphorylated Swe1 and this modification served as a priming step to promote subsequent Cdc5-dependent Swe1 hyperphosphorylation and degradation

good synonyms list

orthographic variation

CDC28

Cdc28p

disambiguation

Cdc2

SDS
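
The three ingredients above (a synonyms list, tolerance for orthographic variation, and some form of disambiguation) can be sketched as a small dictionary-based tagger. Everything here is an assumption made for illustration: the dictionary holds only names from the example sentence, the normalization rule covers just case and a trailing "p", and the blocklist is a blunt substitute for real context-based disambiguation. The `SDS` dictionary entry is hypothetical, standing in for any name that collides with a common abbreviation.

```python
import re

# Hypothetical synonyms list (assumption): canonical name -> known spellings.
SYNONYMS = {
    "CLB2": ["Clb2"],
    "CDC28": ["Cdc28"],
    "SWE1": ["Swe1"],
    "CDC5": ["Cdc5"],
    "SDS": ["SDS"],  # hypothetical entry that collides with a common abbreviation
}

# Names considered too ambiguous to tag without further context (assumption).
BLOCKLIST = {"SDS"}


def normalize(name: str) -> str:
    """Fold case and drop a protein-style trailing 'p' (Cdc28p -> CDC28)."""
    key = name.upper()
    if re.fullmatch(r"[A-Z]{3}\d+P", key):
        key = key[:-1]
    return key


# Lookup from normalized spelling to canonical name.
LOOKUP = {normalize(s): canon for canon, syns in SYNONYMS.items() for s in syns}


def tag_entities(text: str) -> list[str]:
    """Return canonical names found in the text, in order of first mention."""
    found = []
    for token in re.findall(r"[A-Za-z0-9]+", text):
        key = normalize(token)
        if key in BLOCKLIST:
            continue  # skip ambiguous names rather than risk a false positive
        canon = LOOKUP.get(key)
        if canon and canon not in found:
            found.append(canon)
    return found


if __name__ == "__main__":
    print(tag_entities("CDC28 is also written Cdc28p"))
    print(tag_entities("Samples were run on an SDS gel"))
```

Skipping every blocklisted name trades recall for precision; a fuller approach would look at the surrounding text before deciding.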

information extraction

formalize the facts

co-mentioning

NLP (Natural Language Processing)

Mitotic cyclin (Clb2)-bound Cdc28 (Cdk1 homolog) directly phosphorylated Swe1 and this modification served as a priming step to promote subsequent Cdc5-dependent Swe1 hyperphosphorylation and degradation
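
Co-mentioning, the simpler of the two extraction approaches named above, can be sketched as counting how often two entities appear in the same sentence. The input is assumed to be already tagged (one list of entity names per sentence); the second sentence in the usage example is made up for illustration.

```python
from collections import Counter
from itertools import combinations


def comention_counts(tagged_sentences):
    """Count unordered entity pairs that occur together in a sentence.

    tagged_sentences: iterable of lists of entity names, one list per sentence.
    """
    counts = Counter()
    for entities in tagged_sentences:
        # Sort so that each pair is always stored in the same order.
        for pair in combinations(sorted(set(entities)), 2):
            counts[pair] += 1
    return counts


if __name__ == "__main__":
    sentences = [
        ["Clb2", "Cdc28", "Swe1", "Cdc5"],  # entities in the example sentence
        ["Cdc28", "Swe1"],                  # hypothetical second sentence
    ]
    for pair, n in comention_counts(sentences).most_common(3):
        print(pair, n)
```

Counts like these say that two entities are related but not how: the example sentence states that Cdc28 phosphorylates Swe1, and recovering that direction and type of relation is what the NLP step is for.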

integration tools

“document-centric” tools

Reflect

browser add-on

real-time tagging service

any HTML document

augmented document

information from databases

iHOP

web interface

precomputed index

abstracts

find text about a protein

link proteins and text

experimental interactions

“entity-centric” tools

STRING & STITCH

functional associations

heterogeneous evidence

information extraction

curated knowledge

interaction data

expression data

genomic context

quality scores

probabilistic framework
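
One simple way to combine quality scores from heterogeneous evidence in a probabilistic way is a noisy-OR: treat each score as the probability that the association is real and assume the evidence types are independent. The sketch below illustrates that idea only; the slides do not spell out the scoring scheme, so this should not be read as the exact method used by STRING or STITCH.

```python
from math import prod


def combine_scores(scores):
    """Noisy-OR combination of independent evidence scores in [0, 1].

    The association is missed only if every evidence type misses it,
    so the combined score is 1 - prod(1 - s_i).
    """
    scores = list(scores)
    for s in scores:
        if not 0.0 <= s <= 1.0:
            raise ValueError(f"score out of range: {s}")
    return 1.0 - prod(1.0 - s for s in scores)


if __name__ == "__main__":
    # Hypothetical scores for one protein pair from three evidence types.
    print(combine_scores([0.5, 0.5, 0.2]))
```

Under this scheme several weak but independent lines of evidence add up to a stronger combined score, and adding evidence can never lower it.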

cross-species transfer

association networks

Acknowledgments

STRING & STITCH
– Christian von Mering
– Michael Kuhn
– Manuel Stark
– Samuel Chaffron
– Philippe Julien
– Jean Muller
– Tobias Doerks
– Jan Korbel
– Berend Snel
– Martijn Huynen
– Peer Bork

Natural Language Processing
– Jasmin Saric
– Rossitza Ouzounova
– Isabel Rojas
– Peer Bork

Reflect
– Evangelos Pafilis
– Heiko Horn
– Michael Kuhn
– Sean O’Donoghue
– Reinhardt Schneider

hands-on exercises

Exercises

Find literature on human CDC2

Find data and literature on targets and cytochrome P450 enzymes for Aspirin, Viagra, as well as for similar compounds

Find information on the genes in doi:10.1371/journal.pgen.1000120

Construct an interaction network of genes that cause G2/M delays in the budding yeast cell cycle

Tools

http://www.ihop-net.org

http://string.embl.de

http://stitch.embl.de

http://reflect.ws