data integration and functional association networks
Exploring Modular Protein Architecture, European Molecular Biology Laboratory, Heidelberg, Germany, December 3-5, 2008
Lars Juhl Jensen
Data integration and functional association
networks
if this is your plan
STRING
Jensen, Kuhn et al., Nucleic Acids Research, 2009
data integration
functional associations
Frishman et al., Modern Genome Annotation, 2009
the basis
630 genomes
model organism databases
Ensembl
RefSeq
genomic context methods
gene fusion
Korbel et al., Nature Biotechnology, 2004
conserved neighborhood
Korbel et al., Nature Biotechnology, 2004
phylogenetic profiles
Korbel et al., Nature Biotechnology, 2004
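A minimal sketch of the phylogenetic-profile idea: two genes that are present and absent in the same genomes are candidates for a functional association. The Jaccard measure, the function name, and the toy profiles below are assumptions chosen for illustration, not the scoring scheme used in STRING.

```python
def jaccard_profile_similarity(profile_a, profile_b):
    """Jaccard similarity between two presence/absence profiles.

    Each profile is a sequence of 0/1 flags, one per genome, in the same
    genome order. Illustrative measure only.
    """
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must cover the same genomes")
    both = sum(1 for a, b in zip(profile_a, profile_b) if a and b)
    either = sum(1 for a, b in zip(profile_a, profile_b) if a or b)
    return both / either if either else 0.0


# Toy profiles over six hypothetical genomes (1 = gene present, 0 = absent).
gene_x = [1, 1, 0, 0, 1, 1]
gene_y = [1, 1, 0, 0, 1, 0]
gene_z = [0, 0, 1, 1, 0, 0]

print(jaccard_profile_similarity(gene_x, gene_y))  # 0.75
print(jaccard_profile_similarity(gene_x, gene_z))  # 0.0
```

Genes x and y share most of their occurrence pattern and score high; x and z never co-occur and score zero.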
primary experimental data
gene coexpression
GEO (Gene Expression Omnibus)
protein interactions
Jensen & Bork, Science, 2008
BIND (Biomolecular Interaction Network Database)
BioGRID (General Repository for Interaction Datasets)
DIP (Database of Interacting Proteins)
IntAct
MINT (Molecular Interactions Database)
HPRD (Human Protein Reference Database)
PDB (Protein Data Bank)
curated knowledge
complexes
MIPS (Munich Information Center for Protein Sequences)
Gene Ontology
pathways
Letunic & Bork, Trends in Biochemical Sciences, 2008
KEGG (Kyoto Encyclopedia of Genes and Genomes)
MetaCyc
Reactome
PID (NCI-Nature Pathway Interaction Database)
literature mining
MEDLINE
SGD (Saccharomyces Genome Database)
The Interactive Fly
OMIM (Online Mendelian Inheritance in Man)
thesaurus
co-mentioning
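A rough sketch of thesaurus-based co-mentioning: map every name found in an abstract to a canonical gene identifier, then count how often two genes appear in the same abstract. The thesaurus entries, the abstracts, and the function name are made-up toy values; real pipelines also need tokenization and disambiguation steps that are not shown here.

```python
import re
from collections import Counter
from itertools import combinations


def count_comentions(abstracts, thesaurus):
    """Count abstracts in which each pair of genes is mentioned together.

    thesaurus maps a lowercase name or synonym to a canonical gene id
    (toy mapping, assumed for illustration).
    """
    counts = Counter()
    for text in abstracts:
        tokens = set(re.findall(r"[a-z0-9]+", text.lower()))
        genes = sorted({thesaurus[t] for t in tokens if t in thesaurus})
        for pair in combinations(genes, 2):
            counts[pair] += 1
    return counts


# Toy thesaurus and abstracts, invented for this example.
toy_thesaurus = {"cyc1": "CYC1", "cyc7": "CYC7", "hap1": "HAP1", "cyp1": "HAP1"}
toy_abstracts = [
    "HAP1 controls expression of CYC1 and CYC7.",
    "CYP1 binds the CYC1 promoter.",
    "CYC7 is induced by hypoxia.",
]

print(count_comentions(toy_abstracts, toy_thesaurus))
```

Because the synonym is resolved through the thesaurus, the second abstract adds to the same CYC1/HAP1 pair as the first.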
NLP (Natural Language Processing)
Gene and protein names
Cue words for entity recognition
Verbs for relation extraction
[nxgene The GAL4 gene]
[nxexpr The expression of [nxgene the cytochrome genes [nxpg CYC1 and CYC7]]] is controlled by [nxpg HAP1]
too easy …
… to be true
many data types
not comparable
different error rates
many sources
different file formats
different gene identifiers
redundancy
spread over 630 genomes
raw quality scores
reproducibility
von Mering et al., Nucleic Acids Research, 2005
intergenic distances
Korbel et al., Nature Biotechnology, 2004
benchmarking
calibrate vs. gold standard
von Mering et al., Nucleic Acids Research, 2005
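One simple way to picture calibration against a gold standard: bin the raw scores and, for each bin, take the fraction of scored pairs that the gold standard confirms. The binning approach, the function name, and the toy data are assumptions for illustration; the slides do not specify the calibration procedure, and a fitted curve could be used instead of bins.

```python
def calibrate_by_bins(scored_pairs, gold_positive, bin_edges):
    """Fraction of gold-standard pairs per raw-score bin.

    scored_pairs: list of (pair, raw_score) tuples.
    gold_positive: set of pairs treated as true associations.
    bin_edges: ascending edges; bin i covers [edges[i], edges[i+1]).
    Returns one fraction per bin, or None for an empty bin.
    """
    counts = [[0, 0] for _ in range(len(bin_edges) - 1)]  # [hits, total]
    for pair, score in scored_pairs:
        for i in range(len(bin_edges) - 1):
            if bin_edges[i] <= score < bin_edges[i + 1]:
                counts[i][1] += 1
                if pair in gold_positive:
                    counts[i][0] += 1
                break
    return [hits / total if total else None for hits, total in counts]


# Toy raw scores and a toy gold standard (hypothetical protein names).
scored = [
    (("A", "B"), 0.90), (("A", "C"), 0.80), (("B", "D"), 0.85),
    (("B", "C"), 0.20), (("C", "D"), 0.10), (("A", "D"), 0.30),
]
gold = {("A", "B"), ("B", "D"), ("C", "D")}

print(calibrate_by_bins(scored, gold, [0.0, 0.5, 1.0]))
```

In this toy data the high-score bin is confirmed twice as often as the low-score bin, which is the kind of mapping that turns a raw score into a probability.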
raw quality scores
probabilistic scores
transfer by orthology
von Mering et al., Nucleic Acids Research, 2005
two modes
COG mode
von Mering et al., Nucleic Acids Research, 2005
protein mode
von Mering et al., Nucleic Acids Research, 2005
combine all evidence
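Once every evidence type is expressed as a probability, the scores can be combined. A common sketch, assuming the evidence channels are independent, is to take the probability that at least one channel is right: 1 − Π(1 − pᵢ). The slide does not state the formula, so treat the function below as an assumption-laden illustration rather than the exact STRING computation.

```python
from math import prod


def combine_scores(scores):
    """Combine per-channel probabilities assuming independent evidence.

    Returns 1 - product(1 - p) over all channels; an empty list gives 0.0.
    """
    for p in scores:
        if not 0.0 <= p <= 1.0:
            raise ValueError("each score must be a probability in [0, 1]")
    return 1.0 - prod(1.0 - p for p in scores)


# Two moderately confident channels reinforce each other.
print(combine_scores([0.5, 0.5]))  # 0.75
```

The combined score is never lower than the strongest single channel, so weak but consistent evidence from several sources can add up to a confident association.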
visualize
Frishman et al., Modern Genome Annotation, 2009
related resources
STITCH
protein–chemical network
Reflect
eggNOG
orthologous groups
NetworKIN
Linding, Jensen, Ostheimer et al., Cell, 2007
Acknowledgments
NetworKIN.info
– Rune Linding
– Gerard Ostheimer
– Francesca Diella
– Karen Colwill
– Jing Jin
– Pavel Metalnikov
– Vivian Nguyen
– Adrian Pasculescu
– Jin Gyoon Park
– Leona D. Samson
– Rob Russell
– Peer Bork
– Michael Yaffe
– Tony Pawson
STRING.embl.de
– Michael Kuhn
– Manuel Stark
– Samuel Chaffron
– Chris Creevey
– Jean Muller
– Tobias Doerks
– Philippe Julien
– Alexander Roth
– Milan Simonovic
– Peer Bork
– Christian von Mering
STITCH.embl.de
– Michael Kuhn
– Christian von Mering
– Monica Campillos
– Peer Bork
eggNOG.embl.de
– Philippe Julien
– Michael Kuhn
– Christian von Mering
– Jean Muller
– Tobias Doerks
– Peer Bork
Reflect.ws
– Sean O’Donoghue
– Evangelos Pafilis
– Heiko Horn
– Michael Kuhn
– Peer Bork
– Reinhard Schneider