Integration of biomedical literature and databases

Lars Juhl Jensen, EMBL Heidelberg

Uploaded by lars-juhl-jensen on 10 May 2015

DESCRIPTION

Nordic Conference for Scholarly Communication 2008, Scandic Star Hotel, Lund, Sweden, April 21-23, 2008

TRANSCRIPT

Page 1: Integration of biomedical literature and databases

Integration of biomedical literature and databases

Lars Juhl Jensen, EMBL Heidelberg

why integration?

why biomedicine?

why literature?

why databases?

open access databases

a lot of them

Duncan Hull, nodalpoint.org

PubChem

19.2 million compounds

GenBank

85 million sequences

89 billion nucleotides

UniProt

5.6 million sequences

PDB

50,000 protein structures

BIND: Biomolecular Interaction Network Database

DIP: Database of Interacting Proteins

MINT: Molecular Interactions Database

IntAct

BioGRID

204,000 interactions

too many

incomplete

literature mining

MEDLINE

17.9 million citations

too much to read

information retrieval

finding the papers

user-specified query

“yeast AND cell cycle”
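
A user-specified Boolean query such as “yeast AND cell cycle” is typically answered from an inverted index that maps each word to the documents containing it. The sketch below is a toy illustration, not any particular search engine: the three documents are made up, tokenization is a simple regular expression, and the phrase “cell cycle” is treated as two separate AND-ed words rather than as an exact phrase (which would need word positions in the index).

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Inverted index: map each token to the set of document IDs containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index


def boolean_and(index, terms):
    """Return the IDs of documents that contain every query term."""
    postings = [index.get(term.lower(), set()) for term in terms]
    return set.intersection(*postings) if postings else set()


# Made-up toy collection
docs = {
    1: "Cell cycle arrest in yeast",
    2: "Yeast metabolism under stress",
    3: "The cell cycle of human fibroblasts",
}
index = build_index(docs)

# "yeast AND cell cycle", with "cell cycle" split into two AND-ed words
print(sorted(boolean_and(index, ["yeast", "cell", "cycle"])))  # [1]
```

Because the index matches exact tokens, a paper that only says “yeasts” would be missed by this query.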

stemming

yeast / yeasts
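
Stemming reduces inflected forms to a common stem so that “yeast” and “yeasts” hit the same index entry. Production systems usually rely on an established algorithm such as the Porter stemmer; the function below is a deliberately crude, hypothetical rule set that only handles a few English plural endings.

```python
def simple_stem(word):
    """Toy stemmer (hypothetical rules): strip a few English plural endings."""
    word = word.lower()
    if word.endswith("ies") and len(word) > 4:
        return word[:-3] + "y"      # studies -> study
    if word.endswith("s") and not word.endswith("ss") and len(word) > 3:
        return word[:-1]            # yeasts -> yeast, genes -> gene
    return word                     # yeast, class stay unchanged


for w in ["yeast", "yeasts", "genes", "studies", "class"]:
    print(w, "->", simple_stem(w))
```

The same function must be applied both when indexing the papers and when parsing the query; only then does “yeasts” in a paper match “yeast” in a query.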

dynamic query expansion

yeast / S. cerevisiae
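
Query expansion rewrites each query term into an OR-group of equivalent terms, so that a search for “yeast” also retrieves papers that only say “S. cerevisiae”. The synonym table here is a small hand-written assumption; in practice the alternatives would come from a curated taxonomy or thesaurus.

```python
# Hypothetical synonym table; a real system would use a curated thesaurus
SYNONYMS = {
    "yeast": ["saccharomyces cerevisiae", "s. cerevisiae", "baker's yeast"],
}


def expand_query(terms, synonyms=SYNONYMS):
    """Rewrite each term as an OR-group of the term and its synonyms, AND-ed together."""
    groups = []
    for term in terms:
        alternatives = [term] + synonyms.get(term.lower(), [])
        groups.append("(" + " OR ".join(f'"{alt}"' for alt in alternatives) + ")")
    return " AND ".join(groups)


print(expand_query(["yeast", "cell cycle"]))
# ("yeast" OR "saccharomyces cerevisiae" OR "s. cerevisiae" OR "baker's yeast") AND ("cell cycle")
```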

ranking
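
Ranking orders the retrieved papers by estimated relevance instead of returning an unordered set. TF-IDF weighting is one classic scheme and is used here purely as an illustrative assumption, since the slide does not name a method: a query term counts more when it is frequent in a document but rare across the collection. The three documents are made up.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_rank(docs, query):
    """Rank document IDs by the summed TF-IDF weight of the query terms."""
    counts = {doc_id: Counter(tokenize(text)) for doc_id, text in docs.items()}
    n_docs = len(docs)
    query_terms = tokenize(query)
    # Document frequency: in how many documents each query term occurs
    df = {t: sum(1 for c in counts.values() if t in c) for t in query_terms}
    scores = {}
    for doc_id, c in counts.items():
        length = sum(c.values())
        # Term frequency (normalized by document length) times inverse document frequency
        scores[doc_id] = sum(
            (c[t] / length) * math.log(n_docs / df[t])
            for t in query_terms
            if c[t] > 0
        )
    return sorted(scores, key=scores.get, reverse=True)


docs = {
    1: "yeast cell cycle regulation in yeast",
    2: "yeast fermentation",
    3: "cell cycle checkpoints in human cells",
}
print(tfidf_rank(docs, "yeast cell cycle"))  # [1, 2, 3]
```

Document 1 ranks first because it contains all three query terms, one of them twice.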

Mitotic cyclin (Clb2)-bound Cdc28 (Cdk1 homolog) directly phosphorylated Swe1 and this modification served as a priming step to promote subsequent Cdc5-dependent Swe1 hyperphosphorylation and degradation

no tool will find it

entity recognition

identifying the substance(s)

Mitotic cyclin (Clb2)-bound Cdc28 (Cdk1 homolog) directly phosphorylated Swe1 and this modification served as a priming step to promote subsequent Cdc5-dependent Swe1 hyperphosphorylation and degradation

Cdc28 yeast

Cdc28 cell cycle

synonyms list
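
Dictionary-based entity recognition scans text for names from a synonyms list and maps each hit to a canonical identifier. In this sketch the five-entry dictionary and its canonical names are illustrative assumptions, and it assumes a yeast-only context. Matching is single-token and case-insensitive; real taggers also handle multi-word names (typically by longest match) and far larger dictionaries.

```python
import re

# Hypothetical synonyms list: lowercase name -> canonical gene name
SYNONYMS = {
    "cdc28": "CDC28",
    "cdk1": "CDC28",   # assumes a yeast-only context, where Cdc28 is the Cdk1 kinase
    "clb2": "CLB2",
    "swe1": "SWE1",
    "cdc5": "CDC5",
}


def tag_entities(text, synonyms=SYNONYMS):
    """Return (surface form, canonical name, character offset) for each dictionary hit."""
    hits = []
    for match in re.finditer(r"[A-Za-z0-9]+", text):
        canonical = synonyms.get(match.group().lower())
        if canonical is not None:
            hits.append((match.group(), canonical, match.start()))
    return hits


sentence = ("Mitotic cyclin (Clb2)-bound Cdc28 (Cdk1 homolog) directly "
            "phosphorylated Swe1")
print([canonical for _, canonical, _ in tag_entities(sentence)])
# ['CLB2', 'CDC28', 'CDC28', 'SWE1']
```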

orthographic variation

CDC28

Cdc28p
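
Orthographic variants such as “CDC28”, “Cdc28” and “Cdc28p” (in yeast nomenclature the trailing “p” denotes the protein product of a gene) can be collapsed by normalizing both the dictionary entries and the text before lookup. The rules below are a hypothetical minimal set; over-aggressive normalization risks merging names that are genuinely different, so such rules need tuning.

```python
import re


def normalize_name(name):
    """Collapse orthographic variants of a yeast gene/protein name to one lookup key.

    Hypothetical rules: ignore case, drop spaces and hyphens, and strip a
    trailing 'p' that directly follows a digit (Cdc28p -> cdc28).
    """
    key = re.sub(r"[\s-]", "", name.lower())
    return re.sub(r"(?<=\d)p$", "", key)


variants = ["CDC28", "Cdc28", "Cdc28p", "cdc-28"]
print({normalize_name(v) for v in variants})  # {'cdc28'}
```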

disambiguation

Cdc2

SDS
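
Disambiguation decides what an ambiguous name means in a given context. “SDS” is usually the detergent sodium dodecyl sulfate in a methods section, but it is also the human gene symbol for serine dehydratase; “Cdc2” names different proteins in different organisms. A simple approach, sketched here under the assumption of a hand-made sense inventory with cue words, picks the sense whose cue words overlap most with the surrounding text and abstains when nothing overlaps.

```python
import re

# Hypothetical sense inventory: ambiguous name -> {sense: cue words}
SENSES = {
    "SDS": {
        "sodium dodecyl sulfate": {"gel", "electrophoresis", "page", "detergent", "buffer"},
        "serine dehydratase": {"gene", "enzyme", "expression", "serine", "liver"},
    },
}


def disambiguate(term, context, senses=SENSES):
    """Pick the sense whose cue words overlap most with the context; None if no overlap."""
    options = senses.get(term)
    if not options:
        return None
    words = set(re.findall(r"[a-z]+", context.lower()))
    best = max(options, key=lambda sense: len(options[sense] & words))
    return best if options[best] & words else None


print(disambiguate("SDS", "Proteins were separated by SDS gel electrophoresis"))
# sodium dodecyl sulfate
print(disambiguate("SDS", "Hepatic expression of the SDS gene encoding the enzyme"))
# serine dehydratase
```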

still too much to read

information extraction

formalizing the facts

co-mentioning
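
Co-mentioning treats two entities as potentially associated when they are named in the same sentence or abstract. Given the output of an entity recognizer, counting co-mentioned pairs is straightforward; the three tagged sentences below are made up for illustration.

```python
from collections import Counter
from itertools import combinations


def co_mentions(tagged_units):
    """Count how often each pair of entities is named in the same unit.

    tagged_units: iterable of sets of canonical entity names, one set per
    sentence (or abstract), as produced by an entity recognizer.
    """
    counts = Counter()
    for entities in tagged_units:
        # Sort so that each unordered pair is always counted under the same key
        for a, b in combinations(sorted(entities), 2):
            counts[(a, b)] += 1
    return counts


units = [
    {"CDC28", "SWE1", "CLB2"},
    {"CDC28", "SWE1"},
    {"CDC5", "SWE1"},
]
print(co_mentions(units).most_common(1))  # [(('CDC28', 'SWE1'), 2)]
```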

statistical methods
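
Raw co-mention counts favor entities that are simply mentioned often, so statistical methods ask whether a pair co-occurs more often than expected by chance. One standard choice, assumed here rather than taken from the talk, is a one-sided hypergeometric test: given N documents, n_a of which mention A and n_b of which mention B, how likely is it to see at least n_ab documents mentioning both if the two sets were independent?

```python
from math import comb


def cooccurrence_pvalue(n_total, n_a, n_b, n_ab):
    """One-sided hypergeometric p-value P(X >= n_ab).

    X is the number of documents mentioning both A and B when the n_b
    documents mentioning B are drawn at random from n_total documents,
    n_a of which mention A.
    """
    upper = min(n_a, n_b)
    tail = sum(
        comb(n_a, k) * comb(n_total - n_a, n_b - k)
        for k in range(n_ab, upper + 1)
    )
    return tail / comb(n_total, n_b)


# Hypothetical counts: 1000 abstracts, 20 mention A, 30 mention B, 5 mention both
print(cooccurrence_pvalue(1000, 20, 30, 5))
```

Here only 20 × 30 / 1000 = 0.6 shared documents would be expected by chance, so 5 yields a small p-value. When thousands of pairs are tested, a multiple-testing correction is also needed.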

NLP: Natural Language Processing

Gene and protein names

Cue words for entity recognition

Verbs for relation extraction

[nxexpr The expression of [nxgene the cytochrome genes [nxpg CYC1 and CYC7]]] is controlled by [nxpg HAP1]
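
The bracketed example shows a parser grouping words into chunks (an expression phrase, a gene group, individual gene names) and linking them through the verb phrase “is controlled by”. The sketch below is far cruder: a single hypothetical regular expression that recognizes this one construction in plain text and turns it into (regulator, relation, target) triples. It illustrates what relation extraction outputs, not how the NLP system behind the slide works.

```python
import re

# One hypothetical hand-written pattern: "expression of ... GENES is <verb> by REGULATOR"
PATTERN = re.compile(
    r"expression of [a-z ]*?"
    r"(?P<targets>[A-Z0-9]+(?: and [A-Z0-9]+)*)"
    r" is (?P<verb>controlled|regulated|activated|repressed) by "
    r"(?P<regulator>[A-Z0-9]+)"
)

# Passive verb in the text -> active relation name in the output
ACTIVE = {"controlled": "controls", "regulated": "regulates",
          "activated": "activates", "repressed": "represses"}


def extract_relations(sentence):
    """Return (regulator, relation, target) triples matched by PATTERN."""
    triples = []
    for m in PATTERN.finditer(sentence):
        for target in m.group("targets").split(" and "):
            triples.append((m.group("regulator"), ACTIVE[m.group("verb")], target))
    return triples


print(extract_relations(
    "The expression of the cytochrome genes CYC1 and CYC7 is controlled by HAP1"))
# [('HAP1', 'controls', 'CYC1'), ('HAP1', 'controls', 'CYC7')]
```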

Mitotic cyclin (Clb2)-bound Cdc28 (Cdk1 homolog) directly phosphorylated Swe1 and this modification served as a priming step to promote subsequent Cdc5-dependent Swe1 hyperphosphorylation and degradation

yet another database

integration

augmented browsing

semantic tagging

association networks

curated knowledge

genomic context

phylogenetic profiles
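
Phylogenetic profiling records, for each gene family, the genomes in which it is present; families with matching presence/absence patterns tend to be functionally linked because they are gained and lost together. The profiles below are made-up vectors over six hypothetical genomes, and similarity is measured by Hamming distance as a simple assumption; published methods often use more refined scores.

```python
from itertools import combinations

# Hypothetical presence (1) / absence (0) of each gene family across six genomes
PROFILES = {
    "famA": (1, 1, 0, 1, 0, 1),
    "famB": (1, 1, 0, 1, 0, 1),
    "famC": (1, 0, 1, 1, 1, 0),
    "famD": (1, 1, 0, 1, 0, 0),
}


def hamming(p, q):
    """Number of genomes in which two profiles disagree."""
    return sum(a != b for a, b in zip(p, q))


def similar_profiles(profiles, max_distance=0):
    """Pairs of families whose profiles differ in at most max_distance genomes."""
    return [
        (a, b)
        for a, b in combinations(sorted(profiles), 2)
        if hamming(profiles[a], profiles[b]) <= max_distance
    ]


print(similar_profiles(PROFILES))     # [('famA', 'famB')]
print(similar_profiles(PROFILES, 1))  # [('famA', 'famB'), ('famA', 'famD'), ('famB', 'famD')]
```

Families present in every genome (or in none) carry no signal in this scheme, since they match each other trivially.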

gene neighborhood
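
The gene neighborhood method looks for genes that stay next to each other on the chromosome across many genomes, which in prokaryotes often reflects membership in the same operon. This sketch counts, for a pair of gene families, the genomes in which they are neighbors. The gene orders are hypothetical, and strand, intergenic distance and circular chromosomes are ignored for simplicity.

```python
# Hypothetical gene orders (gene family labels along one chromosome) in three genomes
GENOMES = {
    "genome1": ["famX", "famA", "famB", "famY"],
    "genome2": ["famA", "famB", "famZ"],
    "genome3": ["famB", "famQ", "famA"],
}


def adjacent(order, a, b, max_gap=0):
    """True if a and b occur with at most max_gap genes between them."""
    positions_a = [i for i, g in enumerate(order) if g == a]
    positions_b = [i for i, g in enumerate(order) if g == b]
    return any(abs(i - j) - 1 <= max_gap for i in positions_a for j in positions_b)


def neighborhood_support(genomes, a, b, max_gap=0):
    """Count the genomes in which families a and b are chromosomal neighbors."""
    return sum(adjacent(order, a, b, max_gap) for order in genomes.values())


print(neighborhood_support(GENOMES, "famA", "famB"))             # 2
print(neighborhood_support(GENOMES, "famA", "famB", max_gap=1))  # 3
```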

experimental data

physical interactions

genetic interactions

literature mining

restricted access

Bayesian framework
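
Integrating genomic context, experimental data, curated knowledge and literature mining requires putting them on a common scale. The slide does not spell out the formula; the sketch below assumes each evidence channel has already been calibrated to a probability of association and that the channels are independent. Under that assumption the association is missed only if every channel misses it, so the combined score is 1 - prod(1 - s_i). This is similar in spirit to how STRING combines its evidence channels, but calibration against a benchmark and any correction for the prior probability are omitted here.

```python
from math import prod


def combine_scores(channel_scores):
    """Combine per-channel probabilities, assuming the channels are independent.

    combined = 1 - prod(1 - s_i): the pair is judged unassociated only if
    every channel independently fails to support it.
    """
    for s in channel_scores:
        if not 0.0 <= s <= 1.0:
            raise ValueError(f"score out of range: {s}")
    return 1.0 - prod(1.0 - s for s in channel_scores)


# Hypothetical scores, e.g. from gene neighborhood and literature mining
print(combine_scores([0.5, 0.5]))  # 0.75
```

Two weak, independent lines of evidence thus give more confidence than either alone, which is the core argument for data integration.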

summary

literature mining is good

data integration is better

open access

Acknowledgments

STRING & STITCH

– Christian von Mering

– Michael Kuhn

– Manuel Stark

– Samuel Chaffron

– Philippe Julien

– Tobias Doerks

– Jan Korbel

– Berend Snel

– Martijn Huynen

– Peer Bork

Reflect

– Evangelos Pafilis

– Michael Kuhn

– Sean O’Donoghue

– Reinhardt Schneider

Natural Language Processing

– Jasmin Saric

– Rossitza Ouzounova

– Isabel Rojas

– Peer Bork
