The STRING database - Quality scores for heterogeneous interaction data

The STRING database: Quality scores for heterogeneous interaction data. Lars Juhl Jensen, EMBL Heidelberg


Posted on 27-Jun-2015


DESCRIPTION

Lyon, France, April 23-25, 2007

TRANSCRIPT

Page 1: The STRING database - Quality scores for heterogeneous interaction data

The STRING database

Quality scores for heterogeneous interaction data

Lars Juhl Jensen

EMBL Heidelberg

Page 2: The STRING database - Quality scores for heterogeneous interaction data

data integration

Page 3: The STRING database - Quality scores for heterogeneous interaction data

Jensen et al., Drug Discovery Today: Targets, 2004

Page 4: The STRING database - Quality scores for heterogeneous interaction data

functional interactions

Page 5: The STRING database - Quality scores for heterogeneous interaction data

Genomic neighborhood

Species co-occurrence

Gene fusions

Database imports

Experimental interaction data

Microarray expression data

Literature mining

Page 6: The STRING database - Quality scores for heterogeneous interaction data

373 proteomes

Page 7: The STRING database - Quality scores for heterogeneous interaction data

model organism databases

Page 8: The STRING database - Quality scores for heterogeneous interaction data

Ensembl

Page 9: The STRING database - Quality scores for heterogeneous interaction data

Genome Reviews

Page 10: The STRING database - Quality scores for heterogeneous interaction data

RefSeq

Page 11: The STRING database - Quality scores for heterogeneous interaction data

genomic context methods

Page 12: The STRING database - Quality scores for heterogeneous interaction data

gene fusion

Page 13: The STRING database - Quality scores for heterogeneous interaction data
Page 14: The STRING database - Quality scores for heterogeneous interaction data

gene neighborhood

Page 15: The STRING database - Quality scores for heterogeneous interaction data
Page 16: The STRING database - Quality scores for heterogeneous interaction data

phylogenetic profiles

Page 17: The STRING database - Quality scores for heterogeneous interaction data
Page 18: The STRING database - Quality scores for heterogeneous interaction data

scoring schemes

Page 19: The STRING database - Quality scores for heterogeneous interaction data

benchmarking

Page 20: The STRING database - Quality scores for heterogeneous interaction data

cross-species transfer

Page 21: The STRING database - Quality scores for heterogeneous interaction data

primary experimental data

Page 22: The STRING database - Quality scores for heterogeneous interaction data

many sources

Page 23: The STRING database - Quality scores for heterogeneous interaction data

different formats

Page 24: The STRING database - Quality scores for heterogeneous interaction data

different gene identifiers

Page 25: The STRING database - Quality scores for heterogeneous interaction data

redundancy

Page 26: The STRING database - Quality scores for heterogeneous interaction data

physical protein interactions

Page 27: The STRING database - Quality scores for heterogeneous interaction data

IntAct

Page 28: The STRING database - Quality scores for heterogeneous interaction data

BIND (Biomolecular Interaction Network Database)

Page 29: The STRING database - Quality scores for heterogeneous interaction data

MINT (Molecular Interactions Database)

Page 30: The STRING database - Quality scores for heterogeneous interaction data

DIP (Database of Interacting Proteins)

Page 31: The STRING database - Quality scores for heterogeneous interaction data

GRID (General Repository for Interaction Datasets)

Page 32: The STRING database - Quality scores for heterogeneous interaction data

HPRD (Human Protein Reference Database)

Page 33: The STRING database - Quality scores for heterogeneous interaction data

PSI-MI

Page 34: The STRING database - Quality scores for heterogeneous interaction data

reference proteomes

Page 35: The STRING database - Quality scores for heterogeneous interaction data

merge data by publication
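Read together with the earlier slides on redundancy and multiple source databases, this step suggests pooling records that stem from the same paper before any scoring. A minimal Python sketch of that idea, assuming each record is a hypothetical tuple `(source_db, pubmed_id, protein_a, protein_b)` whose identifiers have already been mapped to a common namespace. The record layout and the name `merge_by_publication` are illustrative and do not come from the talk.

```python
from collections import defaultdict

def merge_by_publication(records):
    """Group interaction records by publication and drop duplicate pairs.

    records: iterable of (source_db, pubmed_id, protein_a, protein_b).
    Returns {pubmed_id: set of unordered protein pairs}.
    """
    merged = defaultdict(set)
    for _source_db, pubmed_id, a, b in records:
        # frozenset makes (A, B) and (B, A) the same interaction
        merged[pubmed_id].add(frozenset((a, b)))
    return dict(merged)
```

With this layout, the same interaction imported from two databases for one paper is counted once, which keeps redundant imports from inflating the evidence.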

Page 36: The STRING database - Quality scores for heterogeneous interaction data

thousands of interactions

Page 37: The STRING database - Quality scores for heterogeneous interaction data

correct interactions

Page 38: The STRING database - Quality scores for heterogeneous interaction data

wrong interactions

Page 39: The STRING database - Quality scores for heterogeneous interaction data

scoring scheme

Page 40: The STRING database - Quality scores for heterogeneous interaction data

complex pull-down

Page 41: The STRING database - Quality scores for heterogeneous interaction data

von Mering et al., Nucleic Acids Research, 2005

Page 42: The STRING database - Quality scores for heterogeneous interaction data

log[(N12·N)/((N1+1)·(N2+1))]
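The formula above can be sketched in Python. The slide does not define its symbols, so the reading used here is an assumption: N12 is the number of purifications containing both proteins, N1 and N2 are the numbers containing each protein, and N is the total number of purifications. The natural logarithm and the function name `pulldown_score` are also assumptions.

```python
import math

def pulldown_score(n12: int, n1: int, n2: int, n: int) -> float:
    """Raw score log[(N12*N) / ((N1+1)*(N2+1))] for one protein pair."""
    if n12 <= 0:
        # log(0) is undefined: pairs that never co-purify get no score
        raise ValueError("n12 must be positive")
    return math.log((n12 * n) / ((n1 + 1) * (n2 + 1)))
```

Under this reading, a pair seen together in 5 of 100 purifications, with each protein found in 9, scores log(500/100) = log 5. The score is positive when the two proteins co-occur more often than their individual frequencies would suggest.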

Page 43: The STRING database - Quality scores for heterogeneous interaction data

yeast two-hybrid

Page 44: The STRING database - Quality scores for heterogeneous interaction data

non-shared interactors

Page 45: The STRING database - Quality scores for heterogeneous interaction data

-log((N1+1)·(N2+1))
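A short Python sketch of this score, assuming (from the preceding slide) that N1 and N2 are the numbers of non-shared interaction partners of the two proteins. The function name `y2h_score` and the choice of natural logarithm are assumptions.

```python
import math

def y2h_score(n1: int, n2: int) -> float:
    """Raw yeast two-hybrid score -log((N1+1)*(N2+1))."""
    if n1 < 0 or n2 < 0:
        raise ValueError("interactor counts must be non-negative")
    return -math.log((n1 + 1) * (n2 + 1))
```

The score is highest (zero) when neither protein has other partners and falls as either protein becomes more promiscuous, so interactions involving "sticky" proteins are penalized.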

Page 46: The STRING database - Quality scores for heterogeneous interaction data

not directly comparable

Page 47: The STRING database - Quality scores for heterogeneous interaction data

calibrate vs. gold standard
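Calibration against a gold standard can be illustrated as follows. This is a simplified stand-in, not the procedure used for STRING: pairs are ranked by raw score, cut into equal-sized bins, and the fraction of gold-standard pairs in each bin is taken as the empirical probability for that score range. The function name, the binning approach, and the boolean gold-standard labels are all assumptions.

```python
def calibration_curve(raw_scores, in_gold_standard, bin_size):
    """Return a list of (mean raw score, fraction in gold standard) per bin.

    raw_scores: raw score for each protein pair.
    in_gold_standard: True if the pair is a known positive, else False.
    """
    ranked = sorted(zip(raw_scores, in_gold_standard),
                    key=lambda pair: pair[0], reverse=True)
    curve = []
    for start in range(0, len(ranked), bin_size):
        chunk = ranked[start:start + bin_size]
        mean_score = sum(score for score, _ in chunk) / len(chunk)
        fraction_true = sum(1 for _, ok in chunk if ok) / len(chunk)
        curve.append((mean_score, fraction_true))
    return curve
```

A smooth function fitted through such points would map any raw score to a probability, which is what makes scores from different assays comparable.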

Page 48: The STRING database - Quality scores for heterogeneous interaction data
Page 49: The STRING database - Quality scores for heterogeneous interaction data

other types of evidence

Page 50: The STRING database - Quality scores for heterogeneous interaction data

co-expression

Page 51: The STRING database - Quality scores for heterogeneous interaction data

GEO (Gene Expression Omnibus)

Page 52: The STRING database - Quality scores for heterogeneous interaction data

species-specific datasets

Page 53: The STRING database - Quality scores for heterogeneous interaction data

correlation coefficient
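For co-expression, the raw score is a correlation coefficient between two expression profiles. The slide does not say which coefficient, so the sketch below assumes Pearson correlation over two equal-length lists of expression values; the function name `pearson` is illustrative.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have equal length of at least 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        # a flat profile has no defined correlation
        raise ValueError("profiles must not be constant")
    return cov / math.sqrt(var_x * var_y)
```

Two genes whose expression rises and falls together across arrays give a value near 1, and opposite behavior gives a value near -1.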

Page 54: The STRING database - Quality scores for heterogeneous interaction data

calibrate vs. gold standard

Page 55: The STRING database - Quality scores for heterogeneous interaction data

directly comparable

Page 56: The STRING database - Quality scores for heterogeneous interaction data

curated knowledge

Page 57: The STRING database - Quality scores for heterogeneous interaction data

many sources

Page 58: The STRING database - Quality scores for heterogeneous interaction data

different formats

Page 59: The STRING database - Quality scores for heterogeneous interaction data

different gene identifiers

Page 60: The STRING database - Quality scores for heterogeneous interaction data

redundancy

Page 61: The STRING database - Quality scores for heterogeneous interaction data

protein complexes

Page 62: The STRING database - Quality scores for heterogeneous interaction data

MIPS (Munich Information Center for Protein Sequences)

Page 63: The STRING database - Quality scores for heterogeneous interaction data

Gene Ontology

Page 64: The STRING database - Quality scores for heterogeneous interaction data

pathway databases

Page 65: The STRING database - Quality scores for heterogeneous interaction data

KEGG (Kyoto Encyclopedia of Genes and Genomes)

Page 66: The STRING database - Quality scores for heterogeneous interaction data

Reactome

Page 67: The STRING database - Quality scores for heterogeneous interaction data

PID (NCI-Nature Pathway Interaction Database)

Page 68: The STRING database - Quality scores for heterogeneous interaction data

STKE (Signal Transduction Knowledge Environment)

Page 69: The STRING database - Quality scores for heterogeneous interaction data

BioPAX

Page 70: The STRING database - Quality scores for heterogeneous interaction data

reference proteomes

Page 71: The STRING database - Quality scores for heterogeneous interaction data

literature mining

Page 72: The STRING database - Quality scores for heterogeneous interaction data

MEDLINE

Page 73: The STRING database - Quality scores for heterogeneous interaction data

SGD (Saccharomyces Genome Database)

Page 74: The STRING database - Quality scores for heterogeneous interaction data

The Interactive Fly

Page 75: The STRING database - Quality scores for heterogeneous interaction data

OMIM (Online Mendelian Inheritance in Man)

Page 76: The STRING database - Quality scores for heterogeneous interaction data

different gene identifiers

Page 77: The STRING database - Quality scores for heterogeneous interaction data

synonyms lists

Page 78: The STRING database - Quality scores for heterogeneous interaction data

black list

Page 79: The STRING database - Quality scores for heterogeneous interaction data

flexible matching
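The three ingredients named on these slides (synonym lists, a black list, flexible matching) can be combined in a small dictionary-based tagger. The sketch below is an assumption about what such a tagger might look like, not the one used in the talk: "flexible matching" is interpreted as ignoring case, hyphens and spaces, and all identifiers, names and function names are hypothetical.

```python
import re

def normalize(name: str) -> str:
    """Collapse orthographic variation: case, whitespace, hyphens, slashes."""
    return re.sub(r"[\s\-_/]+", "", name).lower()

def build_lookup(synonyms, blacklist):
    """Map normalized names to identifiers, skipping black-listed names.

    synonyms: {identifier: [name, ...]}; blacklist: names too ambiguous to tag.
    """
    blocked = {normalize(name) for name in blacklist}
    lookup = {}
    for identifier, names in synonyms.items():
        for name in names:
            key = normalize(name)
            if key not in blocked:
                lookup[key] = identifier
    return lookup

def tag(text, lookup):
    """Return (token, identifier) for every single-word token that matches."""
    tokens = re.findall(r"[A-Za-z0-9\-]+", text)
    return [(tok, lookup[normalize(tok)]) for tok in tokens
            if normalize(tok) in lookup]
```

For instance, with the synonym `Cdc-2` registered for a hypothetical identifier, the token `CDC2` in a sentence still matches, while a black-listed name such as `it` is never tagged. Multi-word names would need a different tokenizer and are left out of this sketch.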

Page 80: The STRING database - Quality scores for heterogeneous interaction data

co-occurrence

Page 81: The STRING database - Quality scores for heterogeneous interaction data

log[(N12·N)/((N1+1)·(N2+1))]
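Applied to text, the same formula needs counts of how often names are mentioned alone and together. A minimal Python sketch, assuming each document (for example an abstract) has already been reduced to the set of gene identifiers it mentions, that N is the number of documents, N1 and N2 the numbers of documents mentioning each gene, and N12 the number mentioning both. These symbol meanings and the function name are assumptions, since the slide does not define them.

```python
import math
from collections import Counter
from itertools import combinations

def cooccurrence_scores(docs):
    """Score every co-mentioned pair with log[(N12*N) / ((N1+1)*(N2+1))].

    docs: iterable of collections of gene identifiers, one per document.
    Returns {(gene_a, gene_b): score} with gene_a < gene_b.
    """
    docs = [sorted(set(names)) for names in docs]
    n = len(docs)
    single = Counter()
    pair = Counter()
    for names in docs:
        single.update(names)                # documents mentioning each gene
        pair.update(combinations(names, 2))  # documents mentioning both
    return {
        (a, b): math.log((n12 * n) / ((single[a] + 1) * (single[b] + 1)))
        for (a, b), n12 in pair.items()
    }
```

Only pairs that are mentioned together at least once receive a score, so the logarithm is always defined.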

Page 82: The STRING database - Quality scores for heterogeneous interaction data

NLP (Natural Language Processing)

Page 83: The STRING database - Quality scores for heterogeneous interaction data

Gene and protein names

Cue words for entity recognition

Verbs for relation extraction

[nxgene The GAL4 gene]

[nxexpr The expression of [nxgene the cytochrome genes [nxpg CYC1 and CYC7]]] is controlled by [nxpg HAP1]

Page 84: The STRING database - Quality scores for heterogeneous interaction data

calibrate vs. gold standard

Page 85: The STRING database - Quality scores for heterogeneous interaction data

directly comparable

Page 86: The STRING database - Quality scores for heterogeneous interaction data

combine all evidence

Page 87: The STRING database - Quality scores for heterogeneous interaction data

spread over many species

Page 88: The STRING database - Quality scores for heterogeneous interaction data

transfer by orthology

Page 89: The STRING database - Quality scores for heterogeneous interaction data

von Mering et al., Nucleic Acids Research, 2005

Page 90: The STRING database - Quality scores for heterogeneous interaction data

two modes

Page 91: The STRING database - Quality scores for heterogeneous interaction data
Page 92: The STRING database - Quality scores for heterogeneous interaction data

orthologous groups

Page 93: The STRING database - Quality scores for heterogeneous interaction data

von Mering et al., Nucleic Acids Research, 2005

Page 94: The STRING database - Quality scores for heterogeneous interaction data
Page 95: The STRING database - Quality scores for heterogeneous interaction data

fuzzy orthology

Page 96: The STRING database - Quality scores for heterogeneous interaction data

von Mering et al., Nucleic Acids Research, 2005

Page 97: The STRING database - Quality scores for heterogeneous interaction data

add probabilistic scores

Page 98: The STRING database - Quality scores for heterogeneous interaction data

P = 1 - (1-P1)·(1-P2)·(1-P3)…
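The combination rule can be written directly in Python. The formula treats the evidence types as independent: the pair is considered unsupported only if every individual line of evidence is wrong. The function name `combine_scores` is an assumption, and each input is taken to be a calibrated probability between 0 and 1.

```python
import math

def combine_scores(probabilities):
    """Combine calibrated scores as P = 1 - (1-P1)*(1-P2)*(1-P3)*..."""
    probabilities = list(probabilities)
    for p in probabilities:
        if not 0.0 <= p <= 1.0:
            raise ValueError("each score must be a probability in [0, 1]")
    return 1.0 - math.prod(1.0 - p for p in probabilities)
```

Two independent lines of evidence with probability 0.5 each combine to 0.75, so the combined score is never lower than the strongest single score.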

Page 99: The STRING database - Quality scores for heterogeneous interaction data

Genomic neighborhood

Species co-occurrence

Gene fusions

Database imports

Experimental interaction data

Microarray expression data

Literature mining

Page 100: The STRING database - Quality scores for heterogeneous interaction data

Acknowledgments

The STRING team

– Christian von Mering

– Michael Kuhn

– Berend Snel

– Martijn Huynen

– Sean Hooper

– Samuel Chaffron

– Julien Lagarde

– Mathilde Foglierini

– Peer Bork

Literature mining project

– Jasmin Saric

– Rossitza Ouzounova

– Isabel Rojas