data integration and functional association networks

Posted on 10-May-2015

DESCRIPTION

Exploring Modular Protein Architecture, European Molecular Biology Laboratory, Heidelberg, Germany, December 3-5, 2008

TRANSCRIPT

Lars Juhl Jensen

Data integration and functional association networks

if this is your plan

STRING

Jensen, Kuhn et al., Nucleic Acids Research, 2009

data integration

functional associations

Frishman et al., Modern Genome Annotation, 2009

the basis

630 genomes

model organism databases

Ensembl

RefSeq

genomic context methods

gene fusion

Korbel et al., Nature Biotechnology, 2004
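The gene fusion idea is that two proteins encoded separately in one genome, but found fused into a single protein in another genome, are probably functionally associated. A minimal sketch, assuming the local alignments of query proteins onto candidate fused proteins have already been computed; the `Hit` record, `fusion_pairs` and the `max_overlap` cutoff are hypothetical, and the check that the two queries are separate genes in their own genome is left out:

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Hit:
    """Hypothetical record: query protein aligned to a region of a target protein."""
    query: str
    target: str
    start: int  # first aligned residue on the target (inclusive)
    end: int    # last aligned residue on the target (inclusive)


def overlap(a: Hit, b: Hit) -> int:
    """Number of target residues covered by both alignments."""
    return max(0, min(a.end, b.end) - max(a.start, b.start) + 1)


def fusion_pairs(hits, max_overlap=20):
    """Pairs of query proteins that align to different parts of the same target protein."""
    by_target = {}
    for hit in hits:
        by_target.setdefault(hit.target, []).append(hit)
    pairs = set()
    for target_hits in by_target.values():
        for a, b in combinations(target_hits, 2):
            # Distinct queries hitting (mostly) non-overlapping regions suggest a fusion.
            if a.query != b.query and overlap(a, b) <= max_overlap:
                pairs.add(tuple(sorted((a.query, b.query))))
    return pairs
```

For example, `protA` aligning to residues 1 to 250 and `protB` to residues 240 to 600 of the same target overlap by only 11 residues, so the pair is reported.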

conserved neighborhood

Korbel et al., Nature Biotechnology, 2004
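Conserved neighborhood exploits the fact that genes which stay next to each other, on the same strand, across many genomes tend to work together. A minimal sketch, assuming each genome is given as an ordered list of (gene family, strand) tuples; it only counts the genomes in which two families are adjacent. The input format and `neighbor_counts` are hypothetical, and the sketch ignores refinements such as phylogenetic distance between genomes and intergenic distance:

```python
from collections import Counter


def neighbor_counts(genomes):
    """Count, per pair of gene families, the genomes in which they are adjacent on the same strand.

    genomes: mapping genome name -> ordered list of (family, strand) tuples, strand '+' or '-'.
    """
    counts = Counter()
    for genes in genomes.values():
        seen = set()  # count each pair at most once per genome
        for (fam1, strand1), (fam2, strand2) in zip(genes, genes[1:]):
            if strand1 == strand2 and fam1 != fam2:
                seen.add(tuple(sorted((fam1, fam2))))
        counts.update(seen)
    return counts
```

Pairs with high counts across distantly related genomes are the interesting candidates; a pair adjacent in only one genome says little on its own.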

phylogenetic profiles

Korbel et al., Nature Biotechnology, 2004
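Phylogenetic profiles record in which genomes a gene family is present; families that are gained and lost together across genomes are likely to be functionally linked. A minimal sketch using binary presence/absence vectors and Hamming distance as the similarity measure; the input format, `similar_profiles` and the `max_distance` cutoff are assumptions, and published methods may use other measures (for example mutual information):

```python
from itertools import combinations


def hamming(p, q):
    """Number of genomes in which exactly one of the two families is present."""
    if len(p) != len(q):
        raise ValueError("profiles must cover the same genomes")
    return sum(a != b for a, b in zip(p, q))


def similar_profiles(profiles, max_distance=0):
    """Pairs of families whose presence/absence patterns differ in at most max_distance genomes.

    profiles: mapping family -> tuple of 0/1 values, one per genome, in a fixed genome order.
    """
    result = []
    for f1, f2 in combinations(sorted(profiles), 2):
        d = hamming(profiles[f1], profiles[f2])
        if d <= max_distance:
            result.append((f1, f2, d))
    return result
```

Note that families present in every genome all have identical profiles, so such profiles carry no information and would normally be filtered out first.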

primary experimental data

gene coexpression

GEO (Gene Expression Omnibus)
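Coexpression evidence rests on genes whose expression levels rise and fall together across many experiments. A minimal sketch, assuming a table of expression values per gene measured under the same ordered set of conditions and using Pearson correlation; `coexpressed_pairs` and the `min_r` threshold are hypothetical choices, not the scoring used for the network:

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two expression profiles measured under the same conditions."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally long profiles with at least two conditions")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    dx = [a - mx for a in x]
    dy = [b - my for b in y]
    denom = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    if denom == 0:
        raise ValueError("a constant profile has no defined correlation")
    return sum(a * b for a, b in zip(dx, dy)) / denom


def coexpressed_pairs(expression, min_r=0.8):
    """Gene pairs whose profiles correlate at least min_r (hypothetical threshold)."""
    pairs = []
    for g1, g2 in combinations(sorted(expression), 2):
        r = pearson(expression[g1], expression[g2])
        if r >= min_r:
            pairs.append((g1, g2, r))
    return pairs
```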

protein interactions

Jensen & Bork, Science, 2008

BIND (Biomolecular Interaction Network Database)

BioGRID (General Repository for Interaction Datasets)

DIP (Database of Interacting Proteins)

IntAct

MINT (Molecular Interactions Database)

HPRD (Human Protein Reference Database)

PDB (Protein Data Bank)

curated knowledge

complexes

MIPS (Munich Information Center for Protein Sequences)

Gene Ontology

pathways

Letunic & Bork, Trends in Biochemical Sciences, 2008

KEGG (Kyoto Encyclopedia of Genes and Genomes)

MetaCyc

Reactome

PID (NCI-Nature Pathway Interaction Database)

literature mining

MEDLINE

SGD (Saccharomyces Genome Database)

The Interactive Fly

OMIM (Online Mendelian Inheritance in Man)

thesaurus

co-mentioning
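Co-mentioning needs a thesaurus that maps every name and synonym of a gene or protein to one identifier; pairs of identifiers that appear in the same abstract are then counted. A minimal sketch, assuming a case-sensitive whole-word match of each synonym; the identifiers `G1`, `G2`, `G3` and the example sentences are made up for illustration, and raw counts are not the co-occurrence score used by any particular resource:

```python
import re
from collections import Counter
from itertools import combinations


def tag_entities(text, thesaurus):
    """Return the identifiers whose synonyms occur as whole words in text (case-sensitive)."""
    found = set()
    for synonym, identifier in thesaurus.items():
        if re.search(r"\b" + re.escape(synonym) + r"\b", text):
            found.add(identifier)
    return found


def co_mentions(abstracts, thesaurus):
    """Count, for each pair of identifiers, the abstracts that mention both."""
    counts = Counter()
    for text in abstracts:
        ids = sorted(tag_entities(text, thesaurus))
        counts.update(combinations(ids, 2))
    return counts
```

Mapping synonyms such as `GAL4` and `Gal4p` to one identifier is what keeps both spellings from being counted as different genes; a real thesaurus also has to deal with names that collide with ordinary English words.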

NLP (Natural Language Processing)

Gene and protein names

Cue words for entity recognition

Verbs for relation extraction

[nxgene The GAL4 gene]

[nxexpr The expression of [nxgene the cytochrome genes [nxpg CYC1 and CYC7]]] is controlled by [nxpg HAP1]
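The bracketed example marks nested chunks, each opened by a label such as `nxgene`, `nxexpr` or `nxpg`. A minimal sketch that parses this notation into a tree and applies one toy relation-extraction rule for the verb phrase "is controlled by". Reading `nxpg` as a gene/protein-name chunk and `nxexpr` as an expression chunk is an assumption taken from the example, and `extract_regulation` with its `"regulates"` label is hypothetical, not the rules of the actual system:

```python
import re

CONNECTIVES = {"and", "or", ","}


def tokenize(text):
    """Split chunk notation into '[', ']' and word tokens; brackets need not be space-separated."""
    return re.findall(r"\[|\]|[^\s\[\]]+", text)


def parse(text):
    """Parse '[label word ... [label ...]]' notation into nested (label, children) tuples."""
    tokens = tokenize(text)
    pos = 0

    def parse_sequence():
        nonlocal pos
        items = []
        while pos < len(tokens) and tokens[pos] != "]":
            if tokens[pos] == "[":
                if pos + 1 >= len(tokens) or tokens[pos + 1] in ("[", "]"):
                    raise ValueError("chunk without a label")
                label = tokens[pos + 1]
                pos += 2
                children = parse_sequence()
                if pos >= len(tokens):
                    raise ValueError("unclosed chunk: " + label)
                pos += 1  # consume the closing ']'
                items.append((label, children))
            else:
                items.append(tokens[pos])
                pos += 1
        return items

    tree = parse_sequence()
    if pos != len(tokens):
        raise ValueError("unmatched ']'")
    return tree


def words(children):
    """Plain words directly inside a chunk (nested chunks skipped)."""
    return [c for c in children if isinstance(c, str)]


def find_chunks(tree, label):
    """Yield the children of every chunk carrying the given label, at any nesting depth."""
    for item in tree:
        if isinstance(item, tuple):
            item_label, children = item
            if item_label == label:
                yield children
            yield from find_chunks(children, label)


def extract_regulation(tree, verb=("is", "controlled", "by")):
    """Toy rule: [nxexpr ... [nxpg X] ...] <verb> [nxpg Y]  ->  (Y, 'regulates', X)."""
    relations = []
    n = len(verb)
    for i, item in enumerate(tree):
        if not (isinstance(item, tuple) and item[0] == "nxexpr"):
            continue
        if tuple(tree[i + 1:i + 1 + n]) != verb or i + 1 + n >= len(tree):
            continue
        obj = tree[i + 1 + n]
        if not (isinstance(obj, tuple) and obj[0] == "nxpg"):
            continue
        regulators = [w for w in words(obj[1]) if w not in CONNECTIVES]
        for chunk in find_chunks(item[1], "nxpg"):
            for target in words(chunk):
                if target not in CONNECTIVES:
                    relations.extend((reg, "regulates", target) for reg in regulators)
    return relations
```

Applied to the example sentence, the rule yields one relation from HAP1 to each of CYC1 and CYC7, which is why coordinated names inside a chunk ("CYC1 and CYC7") have to be split rather than treated as one entity.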

too easy …

… to be true

many data types

not comparable

different error rates

many sources

different file formats

different gene identifiers

redundancy

spread over 630 genomes

raw quality scores

reproducibility

von Mering et al., Nucleic Acids Research, 2005

intergenic distances

Korbel et al., Nature Biotechnology, 2004

benchmarking

calibrate vs. gold standard

von Mering et al., Nucleic Acids Research, 2005

raw quality scores

probabilistic scores
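Calibrating against a gold standard turns each raw quality score (a correlation, a count, an intergenic distance) into the fraction of benchmark pairs in that score range that are truly associated, which makes the data types comparable. A minimal sketch using fixed score bins; the bin edges, `calibrate`, `to_probability` and the toy data are hypothetical. Because it only looks at which bin a score falls in, it works equally for scores where smaller is better, such as distances. For simplicity every non-gold pair counts as a negative, whereas a careful benchmark would only consider pairs whose proteins are both annotated in the gold standard:

```python
from bisect import bisect_right
from collections import defaultdict


def calibrate(raw_scores, gold_pairs, bin_edges):
    """Map each raw-score bin to the fraction of its pairs found in the gold standard.

    raw_scores: mapping (protein1, protein2) -> raw score on any scale.
    gold_pairs: iterable of associated pairs (order does not matter).
    bin_edges: sorted thresholds; bin k holds scores in [edges[k-1], edges[k]).
    """
    gold = {frozenset(p) for p in gold_pairs}
    totals, positives = defaultdict(int), defaultdict(int)
    for pair, score in raw_scores.items():
        b = bisect_right(bin_edges, score)
        totals[b] += 1
        if frozenset(pair) in gold:
            positives[b] += 1
    return {b: positives[b] / totals[b] for b in totals}


def to_probability(score, calibration, bin_edges):
    """Calibrated probability for a raw score, or None if its bin had no benchmark pairs."""
    return calibration.get(bisect_right(bin_edges, score))
```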

transfer by orthology

von Mering et al., Nucleic Acids Research, 2005
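Transfer by orthology carries an association observed in one organism over to the orthologs of both proteins in another organism. A minimal sketch, assuming probability-like source scores and a precomputed ortholog mapping; the flat `discount` factor is a hypothetical stand-in for whatever confidence penalty the real transfer applies, and STRING's actual transfer scoring is not reproduced here:

```python
def transfer(associations, orthologs, discount=0.8):
    """Project scored associations from a source organism onto a target organism.

    associations: mapping (source_protein1, source_protein2) -> score in [0, 1].
    orthologs: mapping source protein -> list of orthologous proteins in the target organism.
    discount: hypothetical factor < 1 for the reduced confidence after transfer.
    Returns a mapping of target pairs -> best transferred score.
    """
    transferred = {}
    for (p1, p2), score in associations.items():
        for q1 in orthologs.get(p1, []):
            for q2 in orthologs.get(p2, []):
                if q1 == q2:
                    continue  # both proteins map onto the same target protein
                pair = tuple(sorted((q1, q2)))
                transferred[pair] = max(transferred.get(pair, 0.0), score * discount)
    return transferred
```

When several source pairs map onto the same target pair, the sketch keeps the highest transferred score rather than adding them up, so the same evidence is not counted twice.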

two modes

COG mode

von Mering et al., Nucleic Acids Research, 2005

protein mode

von Mering et al., Nucleic Acids Research, 2005
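The two modes differ in the unit a network node stands for: an orthologous group in COG mode, an individual protein in protein mode. A minimal sketch of going from protein-level edges to group-level edges, assuming each protein maps to at most one group and keeping the best score per group pair; the group names and the max rule are hypothetical choices for illustration:

```python
def collapse_to_groups(protein_edges, group_of):
    """Collapse protein-level edges into orthologous-group edges, keeping the best score.

    protein_edges: mapping (protein1, protein2) -> score.
    group_of: mapping protein -> orthologous group identifier.
    Proteins without a group, and edges within a single group, are skipped.
    """
    group_edges = {}
    for (p1, p2), score in protein_edges.items():
        g1, g2 = group_of.get(p1), group_of.get(p2)
        if g1 is None or g2 is None or g1 == g2:
            continue
        pair = tuple(sorted((g1, g2)))
        group_edges[pair] = max(group_edges.get(pair, 0.0), score)
    return group_edges
```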

combine all evidence
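Once every evidence channel is expressed as a calibrated probability, the channels can be merged. A minimal sketch, assuming the channels are independent, of the combination rule S = 1 − ∏(1 − sᵢ): a pair is missed only if every channel misses it. The function name is hypothetical, and the sketch omits any correction for the prior probability of a random association:

```python
import math


def combine(scores):
    """Combine per-channel probabilities assuming independence: S = 1 - prod(1 - s_i)."""
    for s in scores:
        if not 0.0 <= s <= 1.0:
            raise ValueError("scores must be probabilities in [0, 1]")
    return 1.0 - math.prod(1.0 - s for s in scores)
```

For example, two channels that each give 0.5 combine to 0.75, and adding a weak channel can only raise the combined score, never lower it.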

visualize

Frishman et al., Modern Genome Annotation, 2009

related resources

STITCH

protein–chemical network

Reflect

eggNOG

orthologous groups

NetworKIN

Linding, Jensen, Ostheimer et al., Cell, 2007

Acknowledgments

NetworKIN.info

– Rune Linding

– Gerard Ostheimer

– Francesca Diella

– Karen Colwill

– Jing Jin

– Pavel Metalnikov

– Vivian Nguyen

– Adrian Pasculescu

– Jin Gyoon Park

– Leona D. Samson

– Rob Russell

– Peer Bork

– Michael Yaffe

– Tony Pawson

STRING.embl.de

– Michael Kuhn

– Manuel Stark

– Samuel Chaffron

– Chris Creevey

– Jean Muller

– Tobias Doerks

– Philippe Julien

– Alexander Roth

– Milan Simonovic

– Peer Bork

– Christian von Mering

STITCH.embl.de

– Michael Kuhn

– Christian von Mering

– Monica Campillos

– Peer Bork

eggNOG.embl.de

– Philippe Julien

– Michael Kuhn

– Christian von Mering

– Jean Muller

– Tobias Doerks

– Peer Bork

Reflect.ws

– Sean O’Donoghue

– Evangelos Pafilis

– Heiko Horn

– Michael Kuhn

– Peer Bork

– Reinhardt Schneider
