Gene association networks
Large-scale integration of data and text
Lars Juhl Jensen
9.6 million genes
association network
guilt by association
genomic context
gene fusion
Korbel et al., Nature Biotechnology, 2004
phylogenetic profiles
Korbel et al., Nature Biotechnology, 2004
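A minimal sketch of the idea behind phylogenetic profiles: genes that are present in the same set of genomes are candidates for a functional association. The Jaccard index used here is an assumed stand-in for illustration, not the scoring scheme of the cited paper, and the gene and genome names are hypothetical.

```python
def profile_similarity(profile_a, profile_b):
    """Jaccard similarity between two presence/absence profiles.

    Each profile is the set of genomes in which a gene is present.
    Jaccard is an illustrative choice, not the published method.
    """
    a, b = set(profile_a), set(profile_b)
    union = a | b
    if not union:
        return 0.0  # two empty profiles carry no evidence
    return len(a & b) / len(union)


# Hypothetical toy data: genomes in which each gene occurs
profiles = {
    "geneA": {"genome1", "genome2", "genome4"},
    "geneB": {"genome1", "genome2", "genome4"},
    "geneC": {"genome3"},
}
```

Here `geneA` and `geneB` share an identical profile and score 1.0, while `geneA` and `geneC` never co-occur and score 0.0.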
experimental data
gene coexpression
physical interactions
Jensen & Bork, Science, 2008
curated knowledge
protein complexes
pathways
Letunic & Bork, Trends in Biochemical Sciences, 2008
many databases
different formats
different identifiers
variable quality
not comparable
hard work
(Ph.D. students)
parsers
mapping files
quality scores
affinity purification
von Mering et al., Nucleic Acids Research, 2005
score calibration
gold standard
von Mering et al., Nucleic Acids Research, 2005
implicit weighting by quality
common scale
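Score calibration against a gold standard can be sketched as follows: raw scores are binned, and each bin is assigned the fraction of its pairs that appear in the gold standard. Binning is an assumption made here for simplicity (a fitted curve could serve instead), and all names and data are hypothetical.

```python
def calibrate(scored_pairs, gold_positives, bin_edges):
    """Fraction of gold-standard pairs in each raw-score bin.

    scored_pairs   -- iterable of (pair, raw_score), pair as a frozenset
    gold_positives -- set of pairs known to be associated
    bin_edges      -- ascending edges; bin i covers [edge_i, edge_i+1)
    Returns one value per bin, or None for an empty bin.
    """
    n_bins = len(bin_edges) - 1
    hits = [0] * n_bins
    totals = [0] * n_bins
    for pair, score in scored_pairs:
        for i in range(n_bins):
            if bin_edges[i] <= score < bin_edges[i + 1]:
                totals[i] += 1
                if pair in gold_positives:
                    hits[i] += 1
                break
    return [h / t if t else None for h, t in zip(hits, totals)]


# Hypothetical toy data
scored = [
    (frozenset({"a", "b"}), 0.9),
    (frozenset({"a", "c"}), 0.8),
    (frozenset({"b", "c"}), 0.2),
    (frozenset({"c", "d"}), 0.1),
]
gold = {frozenset({"a", "b"}), frozenset({"a", "c"})}
calibrated = calibrate(scored, gold, [0.0, 0.5, 1.0])
```

Because every evidence type is mapped to the same kind of value (an estimated fraction of correct pairs), the calibrated scores become comparable across sources, and low-quality sources are down-weighted automatically.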
cross-species transfer
Franceschini et al., Nucleic Acids Research, 2013
missing most of the data
>10 km
too much to read
text mining
comprehensive lexicon
cyclin dependent kinase 1
CDC2
orthographic variation
spaces and hyphens
cyclin dependent kinase 1
cyclin-dependent kinase 1
prefixes and suffixes
CDC2
hCdc2
“black list”
SDS
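The matching rules above can be sketched with regular expressions: spaces and hyphens are treated as interchangeable and optional, a short prefix such as "h" is tolerated, and black-listed names are skipped. The lexicon, the prefix list, and the black list below are hypothetical toy values, and this is an illustration of the idea rather than the tagger used in the talk.

```python
import re

BLACK_LIST = {"sds"}   # hypothetical: names too ambiguous to tag
PREFIXES = ("h",)      # hypothetical: e.g. "h" for human, as in hCdc2


def name_pattern(name):
    """Compile a pattern tolerant of space/hyphen and prefix variation."""
    tokens = re.split(r"[\s-]+", name.strip())
    body = r"[\s-]?".join(re.escape(t) for t in tokens)
    prefix = "(?:" + "|".join(re.escape(p) for p in PREFIXES) + ")?"
    # Lookarounds keep the match from starting or ending inside a longer word
    return re.compile(
        r"(?<![A-Za-z0-9])" + prefix + body + r"(?![A-Za-z0-9])",
        re.IGNORECASE,
    )


def find_mentions(text, lexicon):
    """Return (matched_text, identifier) tuples in order of appearance."""
    found = []
    for name, identifier in lexicon.items():
        if name.lower() in BLACK_LIST:
            continue
        for match in name_pattern(name).finditer(text):
            found.append((match.start(), match.group(0), identifier))
    return [(text_hit, ident) for _, text_hit, ident in sorted(found)]


lexicon = {
    "cyclin dependent kinase 1": "CDK1",
    "CDC2": "CDK1",
    "SDS": "SDS",
}
text = "Cyclin-dependent kinase 1 (hCdc2) was resolved by SDS gel."
mentions = find_mentions(text, lexicon)
```

In this example both "Cyclin-dependent kinase 1" and "hCdc2" resolve to the same identifier, while "SDS" is ignored because it is on the black list.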
co-mentioning
counting
within documents
within paragraphs
within sentences
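Counting co-mentions at the three levels could look roughly like this. Each document is represented as a list of paragraphs, each paragraph as a list of sentences, and each sentence as the set of names tagged in it. The weights are hypothetical placeholders, and the additive scheme is an assumption chosen for illustration.

```python
from collections import Counter
from itertools import combinations

# Hypothetical weights: closer co-mentions count for more
WEIGHTS = {"document": 1.0, "paragraph": 2.0, "sentence": 3.0}


def pairs(entities):
    """All unordered pairs of distinct entities."""
    return {frozenset(p) for p in combinations(sorted(entities), 2)}


def co_mention_scores(documents, weights=WEIGHTS):
    """Sum level weights per pair; each level counts once per document."""
    scores = Counter()
    for doc in documents:
        sentence_sets = [sentence for par in doc for sentence in par]
        paragraph_sets = [set().union(*par) for par in doc]
        document_set = set().union(*paragraph_sets)
        levels = {
            "document": [document_set],
            "paragraph": paragraph_sets,
            "sentence": sentence_sets,
        }
        for level, entity_sets in levels.items():
            seen = set()
            for entities in entity_sets:
                seen |= pairs(entities)
            for pair in seen:
                scores[pair] += weights[level]
    return scores


# One toy document: two paragraphs of two sentences each
docs = [[[{"A", "B"}, {"C"}], [{"A"}, {"D"}]]]
scores = co_mention_scores(docs)
```

Here A and B share a sentence and get the highest score, A and C share only a paragraph, and B and D share only the document.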
quality scores
score calibration
cross-species transfer
combine all evidence
Szklarczyk et al., Nucleic Acids Research, 2015
string-db.org
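Once all evidence types are on a common probabilistic scale, one simple way to combine them is to assume the channels are independent and compute the probability that at least one of them is right. This is a sketch under that independence assumption; it leaves out refinements such as correcting for a prior.

```python
from math import prod


def combine_scores(channel_scores):
    """Combine per-channel probabilities as 1 - prod(1 - s_i).

    Assumes the evidence channels are independent (a simplification).
    """
    scores = list(channel_scores)
    for s in scores:
        if not 0.0 <= s <= 1.0:
            raise ValueError(f"score out of range: {s}")
    return 1.0 - prod(1.0 - s for s in scores)
```

Two independent channels that each give 0.5 combine to 0.75, so weak but concordant evidence adds up to a stronger association.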
web resource
download files
REST API
Bioconductor package
Cytoscape App
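As an illustration of the REST route, the sketch below only builds a request URL and does not contact the server. The endpoint layout (`/api/<format>/network`) and the parameter names `identifiers` and `species` are assumptions based on the public API documentation and should be checked against the current docs at string-db.org before use.

```python
from urllib.parse import urlencode


def string_network_url(identifiers, species, output_format="tsv"):
    """Build a network query URL (assumed endpoint layout, see lead-in).

    identifiers -- list of protein or gene names
    species     -- NCBI taxonomy identifier, e.g. 9606 for human
    """
    params = urlencode({
        # Multiple identifiers are joined by a carriage return (%0D)
        "identifiers": "\r".join(identifiers),
        "species": species,
    })
    return f"https://string-db.org/api/{output_format}/network?{params}"


url = string_network_url(["CDK1", "CCNB1"], species=9606)
```

The resulting URL can be fetched with any HTTP client; the download files are the better choice for whole-genome analyses.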
Acknowledgments
Damian Szklarczyk
Michael Kuhn
Andrea Franceschini
Milan Simonovic
Alexander Roth
Sune Pletscher-Frankild
John “Scooter” Morris
Christian von Mering
Peer Bork
Unacknowledgments
Do yourself a favor, don’t fly