cross-species data integration
Centre for Molecular Biology and Neuroscience, Rikshospitalet-Radiumhospitalet, Oslo, Norway, October 25, 2006
Lars Juhl Jensen
EMBL Heidelberg
Lars Juhl Jensen
promoter analysis
Jensen et al., Bioinformatics, 2000
genome visualization
Pedersen et al., Journal of Molecular Biology, 2000
protein function prediction
data integration
Jensen et al., Drug Discovery Today: Targets, 2004
cell cycle
temporal interaction network
de Lichtenberg et al., Science, 2005
cross-species comparison
Jensen et al., Nature, 2006
STRING
373 proteomes
Genome Reviews
RefSeq
Ensembl
model organism databases
functional interactions
genomic context methods
gene neighborhood
gene fusion
phylogenetic profiles
Cell
Cellulosomes
Cellulose
correct interactions
wrong associations
gene neighborhood
sum of intergenic distances
gene fusion
sequence similarity
phylogenetic profiles
SVD (Singular Value Decomposition)
Euclidean distance
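A minimal sketch of the distance step for phylogenetic profiles, assuming each gene is represented by a presence/absence vector over the same set of genomes. The gene names and profile values are hypothetical, and the SVD reduction mentioned on the slide is left out to keep the example dependency-free.

```python
import math

# Hypothetical presence (1) / absence (0) calls across five genomes.
profiles = {
    "geneA": [1, 1, 0, 1, 0],
    "geneB": [1, 1, 0, 1, 0],
    "geneC": [0, 1, 1, 0, 1],
}

def euclidean_distance(p, q):
    """Euclidean distance between two equal-length profiles."""
    if len(p) != len(q):
        raise ValueError("profiles must cover the same genomes")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def rank_pairs(profiles):
    """Return (distance, gene1, gene2) tuples, most similar pair first."""
    names = sorted(profiles)
    pairs = []
    for i, g1 in enumerate(names):
        for g2 in names[i + 1:]:
            pairs.append((euclidean_distance(profiles[g1], profiles[g2]), g1, g2))
    return sorted(pairs)

for dist, g1, g2 in rank_pairs(profiles):
    print(f"{g1}\t{g2}\t{dist:.3f}")
```

A small distance means the two genes tend to be present and absent in the same genomes, which is the raw signal that is later calibrated.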
raw quality scores
not comparable
sum of intergenic distances
sequence similarity
Euclidean distance
benchmarking
calibrate vs. gold standard
raw quality scores
probabilistic scores
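One way to picture the calibration step, as a sketch under stated assumptions: rank the predicted pairs by raw score, cut the ranking into bins, and record the fraction of pairs in each bin that the gold standard confirms. The function name, bin size, and toy data below are illustrative and not taken from the talk.

```python
def calibrate(scored_pairs, gold_standard, bin_size):
    """Map raw scores to the fraction of gold-standard pairs per bin.

    scored_pairs:  list of (raw_score, pair); higher raw_score is assumed better.
    gold_standard: set of pairs treated as true functional associations.
    Returns one (mean_raw_score, fraction_correct) tuple per bin.
    """
    ranked = sorted(scored_pairs, key=lambda sp: sp[0], reverse=True)
    curve = []
    for start in range(0, len(ranked), bin_size):
        chunk = ranked[start:start + bin_size]
        mean_raw = sum(score for score, _ in chunk) / len(chunk)
        hits = sum(1 for _, pair in chunk if pair in gold_standard)
        curve.append((mean_raw, hits / len(chunk)))
    return curve

# Toy data: pairs are frozensets so that (a, b) and (b, a) are the same pair.
scored = [
    (9.0, frozenset({"a", "b"})),
    (8.0, frozenset({"a", "c"})),
    (3.0, frozenset({"b", "d"})),
    (1.0, frozenset({"c", "d"})),
]
gold = {frozenset({"a", "b"}), frozenset({"a", "c"}), frozenset({"c", "d"})}

for mean_raw, fraction in calibrate(scored, gold, bin_size=2):
    print(f"raw score ~{mean_raw:.1f} -> fraction correct {fraction:.2f}")
```

Because every evidence type is mapped onto the same "fraction correct" scale, scores that were not comparable as raw values become comparable afterwards.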
curated knowledge
KEGG (Kyoto Encyclopedia of Genes and Genomes)
Reactome
MIPS (Munich Information Center for Protein Sequences)
STKE (Signal Transduction Knowledge Environment)
primary experimental data
many sources
many parsers
physical protein interactions
BIND (Biomolecular Interaction Network Database)
GRID (General Repository for Interaction Datasets)
MINT (Molecular Interactions Database)
DIP (Database of Interacting Proteins)
HPRD (Human Protein Reference Database)
merge data by publication
topology-based scores
von Mering et al., Nucleic Acids Research, 2005
co-expression
GEO (Gene Expression Omnibus)
correlation coefficient
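A minimal sketch of the co-expression score, assuming the correlation coefficient on the slide is the Pearson correlation between two genes' expression values over the same set of experiments. The expression values are made up for illustration.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two expression vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two vectors of equal length >= 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a flat profile carries no co-expression signal
    return sxy / math.sqrt(sxx * syy)

# Hypothetical expression levels of two genes across four experiments.
gene1 = [1.0, 2.0, 3.0, 4.0]
gene2 = [2.1, 3.9, 6.2, 8.0]
print(f"r = {pearson(gene1, gene2):.3f}")
```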
literature mining
different gene identifiers
synonym lists
MEDLINE
SGD (Saccharomyces Genome Database)
The Interactive Fly
OMIM (Online Mendelian Inheritance in Man)
co-mentioning
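Co-mentioning can be sketched as counting how often two genes appear in the same abstract after mapping every synonym to one identifier. The synonym table and the three abstracts below are hypothetical stand-ins for the synonym lists and MEDLINE text named on the slide.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical synonym list: canonical identifier -> names used in text.
synonyms = {
    "CDC28": {"CDC28", "CDK1"},
    "CLN2": {"CLN2"},
    "SIC1": {"SIC1"},
}

# Hypothetical abstracts standing in for MEDLINE records.
abstracts = [
    "Cdk1 phosphorylates Sic1 at the G1/S transition.",
    "CLN2 and CDC28 form an active kinase complex.",
    "Sic1 inhibits Cdc28 until it is degraded.",
]

def genes_in_text(text, synonyms):
    """Return canonical identifiers whose synonyms occur as whole tokens."""
    tokens = set(re.findall(r"[A-Za-z0-9]+", text.upper()))
    return {gene for gene, names in synonyms.items()
            if tokens & {name.upper() for name in names}}

def co_mention_counts(abstracts, synonyms):
    """Count abstracts in which each pair of genes is mentioned together."""
    counts = Counter()
    for text in abstracts:
        found = sorted(genes_in_text(text, synonyms))
        counts.update(combinations(found, 2))
    return counts

counts = co_mention_counts(abstracts, synonyms)
for (g1, g2), n in counts.most_common():
    print(f"{g1}\t{g2}\t{n}")
```

Raw co-mention counts like these are not probabilities; they would go through the same gold-standard calibration as the other evidence types.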
NLP (Natural Language Processing)
Gene and protein names
Cue words for entity recognition
Verbs for relation extraction
[nxgene The GAL4 gene]
[nxexpr The expression of [nxgene the cytochrome genes [nxpg CYC1 and CYC7]]] is controlled by [nxpg HAP1]
calibrate vs. gold standard
combine all evidence
spread over many species
transfer by orthology
von Mering et al., Nucleic Acids Research, 2005
two modes
orthologous groups
von Mering et al., Nucleic Acids Research, 2005
fuzzy orthology
von Mering et al., Nucleic Acids Research, 2005
Bayesian scoring scheme
Bork et al., Current Opinion in Structural Biology, 2005
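The slides do not spell out the scoring formula. A common way to combine calibrated scores, assumed here purely for illustration, is to treat the evidence types as independent and compute the probability that at least one of them is correct. The function name and example values are hypothetical.

```python
def combine_scores(scores):
    """Combine independent probabilistic scores as 1 - prod(1 - s_i).

    Assumes each score is a calibrated probability in [0, 1] and that the
    evidence types are independent, which is a simplification.
    """
    remaining = 1.0  # probability that every evidence type is wrong
    for s in scores:
        if not 0.0 <= s <= 1.0:
            raise ValueError(f"score out of range: {s}")
        remaining *= 1.0 - s
    return 1.0 - remaining

# Hypothetical calibrated scores for one protein pair:
# gene neighborhood, co-expression, literature mining.
evidence = [0.6, 0.5, 0.2]
print(f"combined score = {combine_scores(evidence):.2f}")
```

Under this assumption several weak but independent lines of evidence can add up to a confident association, while a single strong line is never weakened by the others.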
predicting “mode of action”
Jensen et al., Drug Discovery Today: Targets, 2004
NetworKIN
the idea
mass spectrometry
phosphorylation sites
in vivo
kinases are unknown
sequence motifs
kinase families
overprediction
in vitro
protein networks
STRING
context
in vivo
the algorithm
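The idea outlined on the preceding slides can be sketched as two steps: a sequence motif narrows a phosphorylation site to a kinase family, and the protein network then picks the family member best connected to the substrate. This is a toy illustration, not the published NetworKIN implementation; the family table, the network scores, and the function name are all hypothetical.

```python
def best_kinase(substrate, motif_family, family_members, network_score):
    """Pick the kinase in the motif-predicted family with the strongest
    network association to the substrate.

    Returns (kinase, score), or None if no family member is linked.
    """
    candidates = family_members.get(motif_family, [])
    scored = []
    for kinase in candidates:
        score = network_score.get(frozenset((kinase, substrate)), 0.0)
        if score > 0.0:
            scored.append((score, kinase))
    if not scored:
        return None
    score, kinase = max(scored)
    return kinase, score

# Hypothetical inputs: one kinase family and made-up association scores.
family_members = {"PIKK": ["ATM", "ATR", "PRKDC"]}
network_score = {
    frozenset(("ATM", "RAD50")): 0.9,
    frozenset(("ATR", "RAD50")): 0.4,
}

print(best_kinase("RAD50", "PIKK", family_members, network_score))
```

The motif alone cannot separate members of the same family, which is the overprediction problem noted earlier; the network context is what makes the prediction specific.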
benchmarking
Phospho.ELM
ATM signaling
ATM phosphorylates Rad50
summary
integration
high-throughput data
computational methods
biological discoveries
hypotheses
highly specific
testable
Acknowledgments
The STRING team (EMBL)
– Christian von Mering
– Berend Snel
– Martijn Huynen
– Sean Hooper
– Samuel Chaffron
– Julien Lagarde
– Mathilde Foglierini
– Peer Bork
Literature mining project (EML Research)
– Jasmin Saric
– Rossitza Ouzounova
– Isabel Rojas
Cell cycle project (CBS)
– Ulrik de Lichtenberg
– Thomas Skøt Jensen
– Søren Brunak
The NetworKIN project
– Rune Linding
– Gerard Ostheimer
– Francesca Diella
– Karen Colwill
– Jing Jin
– Rob Russell
– Michael Yaffe
– Tony Pawson