do not reproduce without permission 1 gerstein.info/talks (c) 2003 1 (c) mark gerstein, 2002, yale,...
TRANSCRIPT
Computational Proteomics of Protein Complexes
Mark B Gerstein, Yale U
Talk at NIH, 2003.04.07
The Interactome: the Next ‘omic Step
Interactome
Proteome
Transcriptome
Genome
The popularity of interactome information
[Figure: citations per year (y-axis, 0 to 450) for 1999 to 2003 (x-axis). Series: Gavin et al. p-p int dataset; Ho et al. p-p int dataset; Uetz et al. p-p int dataset; Ribosome Structure; Spellman et al. Expression Expt.; deRisi et al. Expression Expt.]
Computational Proteomics of Complexes
1. Interactions provide a systematic way of defining protein function on a genomic scale
2. Known complexes provide a benchmark to validate and integrate genome-wide interaction experiments, providing a more accurate interactome
3. Known complexes provide a focus for the integration of (non-interaction) genomic information – e.g. expression data
4. Extrapolating from known complexes, one can predict protein complexes on a genome-scale via integrating experimental interactions and non-interaction information (combining #1 and #2)
Circumscribing Protein Function in terms of Interactions
Understanding Protein Function on a Genomic Scale
• 250 of 650 known on chr. 22 [Dunham et al.]
• >>30K+ Proteins in Entire Human Genome (alt. splicing)
Issues in defining protein function on a genomic scale
• Multi-functionality: 2 functions/protein (also 2 proteins/function)
• Role Conflation: molecular, cellular, phenotypic
• Fun terms… but do they scale?
    • Starry night
    • Sarah (affects female fertility); Sonic; Darkener of apricot & suppressor of white apricot; Redtape, gridlock, roadblock (when mutated, block transport along axons); ROP vs ROM ("Regulator of Copy Number" or RNA-I-II-complex-binding-protein)
• For now, definable aspects of function: interactions, location, enzymatic rxn. [Babbit]
Ontologies for function: Networks, Hierarchies, DAGs
[Figure: two functional hierarchies rooted at "All of SCOP entries". Levels, from root to leaf: General similarity; Functional class similarity; Precise functional similarity]

ENZYME
1 Oxidoreductases
    1.1 Acting on CH-OH
        1.1.1 NAD and NADP acceptor
            1.1.1.1 Alcohol dehydrogenase
            1.1.1.3 Homoserine dehydrogenase
    1.5 Acting on CH-NH
3 Hydrolases
    3.1 Acting on ester bonds
        3.1.1 Carboxylic ester hydrolases
            3.1.1.1 Carboxylesterase
            3.1.1.8 Cholinesterase
    3.4 Acting on peptide bonds

NON-ENZYME
1 Metabolism
    1.1 Carb. metab.
        1.1.1 Polysach. metab.
            1.1.1.1 Glycogen metab.
            1.1.1.2 Starch metab.
    1.2 Nucleotide metab.
3 Cell structure
    3.1 Nucleus
    3.8 Extracel. matrix
        3.8.2 Extracel. matrix glycoprotein
            3.8.2.1 Fibronectin
            3.8.2.2 Tenascin
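The three similarity levels on this slide can be read as the depth at which two hierarchical codes diverge. The following is a minimal sketch under that assumption; the function name `shared_levels` is hypothetical and only the dotted codes come from the slide.

```python
def shared_levels(code_a, code_b):
    """Count how many leading levels two dotted hierarchy codes share."""
    depth = 0
    for a, b in zip(code_a.split("."), code_b.split(".")):
        if a != b:
            break
        depth += 1
    return depth


# Codes taken from the ENZYME branch of the slide.
alcohol_dh = "1.1.1.1"        # Alcohol dehydrogenase
homoserine_dh = "1.1.1.3"     # Homoserine dehydrogenase
carboxylesterase = "3.1.1.1"  # Carboxylesterase

print(shared_levels(alcohol_dh, homoserine_dh))     # 3
print(shared_levels(alcohol_dh, carboxylesterase))  # 0
```

Codes are only comparable within one hierarchy: the ENZYME and NON-ENZYME trees reuse the same numbers for unrelated classes.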
Ontologies for function: Interaction vectors
Lan et al. IEEE (2002) & COSB (2003)
Validating and Integrating Genomic Protein-Protein Interaction Datasets
with Known Complexes
Protein interaction data
• Databases (BIND, DIP, MIPS etc.)
    • literature
• High-throughput datasets
    • in vivo pull down
    • yeast two-hybrid
• Computational predictions
Tangential genomic data
• Expression data
• Phenotypic data
• Localization Data
Combining interaction data
• High-throughput data is less reliable than more careful, smaller-scale experiments
    • Orthogonal datasets
• Combining data increases accuracy and coverage
• How to do this in a quantitative way? How to weight the different data sources?
    • General classification problem (machine learning)
    • Bayesian networks: probabilistic
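One common way to make the weighting quantitative is the naive Bayes form of the posterior odds, shown here as a sketch rather than as the exact formulation used in the talk. The symbols are introduced for illustration: $I$ means "the pair truly interacts", $e_i$ is the outcome reported by data source $i$, and $L_i$ is that source's likelihood ratio.

```latex
\[
\frac{P(I \mid e_1,\dots,e_n)}{P(\lnot I \mid e_1,\dots,e_n)}
  = \frac{P(I)}{P(\lnot I)} \prod_{i=1}^{n} L_i ,
\qquad
L_i = \frac{P(e_i \mid I)}{P(e_i \mid \lnot I)}
\]
```

Each source is weighted by its own $L_i$, which can be estimated against a gold standard. The product form holds only if the sources are conditionally independent given $I$, which is why orthogonal datasets matter.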
Example of data integration: RNA polymerase II
Which subunits interact? -> protein-protein interaction experiments
Compare with Gold Std. structure: Kornberg et al., 2001
Edwards, Kus, Jansen, Greenbaum, Greenblatt, Gerstein, TIG (2002)
Data integration: RNA polymerase II
Interaction experiments before structure was known
Subunit A 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 3 3 3 3 3 3 3 5 5 5 5 5 5 6 6 6 6 6 8 8 8 8 9 9 9 10 10 11
Subunit B 2 3 5 6 8 9 10 11 12 3 5 6 8 9 10 11 12 5 6 8 9 10 11 12 6 8 9 10 11 12 8 9 10 11 12 9 10 11 12 10 11 12 11 12 12
structural contact 1 0 1 1 1 1 0 1 0 1 0 0 0 1 1 0 1 0 0 0 0 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Far western 1 1 1 1 1 0 0 0 0 0 0 1 0 0 0
Cross-linking 1 1 1 1 1 0 1 1 1 1 1 0 1 1 1 1 1 1 0 1
Far western 1 1 1 1 1 1 1 1 1 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Pull-down 1 1 0 1 0 1 0 1 1 0 1 0 1 0 1 1 1 0 1 1 1 1 0 1 0 0 0 0 0 0 1 0 0 0 0
Pull-down 1 1 1 1 0 1 0 1 1 0 1 0 1 0 1 1 1 0 1 1 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0
Pull-down 1 1 1 0 1 0 0 1 0
Far western 1 0 0 0 1 0
Data integration: RNA polymerase II
Integrate using naive Bayes classifier
Subunit A 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 3 3 3 3 3 3 3 5 5 5 5 5 5 6 6 6 6 6 8 8 8 8 9 9 9 10 10 11
Subunit B 2 3 5 6 8 9 10 11 12 3 5 6 8 9 10 11 12 5 6 8 9 10 11 12 6 8 9 10 11 12 8 9 10 11 12 9 10 11 12 10 11 12 11 12 12
structural contact 1 0 1 1 1 1 0 1 0 1 0 0 0 1 1 0 1 0 0 0 0 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Far western 1 1 1 1 1 0 0 0 0 0 0 1 0 0 0
Cross-linking 1 1 1 1 1 0 1 1 1 1 1 0 1 1 1 1 1 1 0 1
Far western 1 1 1 1 1 1 1 1 1 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Pull-down 1 1 0 1 0 1 0 1 1 0 1 0 1 0 1 1 1 0 1 1 1 1 0 1 0 0 0 0 0 0 1 0 0 0 0
Pull-down 1 1 1 1 0 1 0 1 1 0 1 0 1 0 1 1 1 0 1 1 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0
Pull-down 1 1 1 0 1 0 0 1 0
Far western 1 0 0 0 1 0
Combined (Bayesian) 0 1 1 1 1 0 0 0 1 1 1 0 0 0 1 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
= false
= true
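The naive Bayes integration named on this slide can be sketched as follows. This is a minimal illustration and not the talk's implementation: the function names, the pseudocount, and the toy data are all assumptions, and untested pairs (the gaps in the shorter rows above) are represented as `None` and simply skipped.

```python
import math


def fit_likelihood_ratios(experiments, gold, pseudocount=1.0):
    """Per experiment, estimate log-likelihood ratios for a 0 and a 1 call.

    experiments: dict name -> list of 1, 0 or None (None = pair not tested)
    gold: list of 1/0 gold-standard labels (e.g. structural contacts)
    """
    ratios = {}
    for name, calls in experiments.items():
        # counts[label][call]; the pseudocount avoids zero probabilities
        counts = {1: {0: pseudocount, 1: pseudocount},
                  0: {0: pseudocount, 1: pseudocount}}
        for call, label in zip(calls, gold):
            if call is not None:
                counts[label][call] += 1
        ratios[name] = {}
        for call in (0, 1):
            p_given_pos = counts[1][call] / (counts[1][0] + counts[1][1])
            p_given_neg = counts[0][call] / (counts[0][0] + counts[0][1])
            ratios[name][call] = math.log(p_given_pos / p_given_neg)
    return ratios


def posterior(ratios, observed, prior):
    """Posterior probability of interaction; untested experiments are skipped."""
    log_odds = math.log(prior / (1.0 - prior))
    for name, call in observed.items():
        if call is not None:
            log_odds += ratios[name][call]
    return 1.0 / (1.0 + math.exp(-log_odds))


# Hypothetical toy data (NOT the values from the slide).
gold = [1, 1, 1, 0, 0, 0, 0, 0]
experiments = {
    "pull_down":   [1, 1, 0, 1, 0, 0, None, 0],
    "far_western": [1, None, 1, 0, 0, None, 0, 0],
}
ratios = fit_likelihood_ratios(experiments, gold)
prior = sum(gold) / len(gold)

both_positive = posterior(ratios, {"pull_down": 1, "far_western": 1}, prior)
both_negative = posterior(ratios, {"pull_down": 0, "far_western": 0}, prior)
untested = posterior(ratios, {"pull_down": None, "far_western": None}, prior)
print(both_positive, both_negative, untested)
```

A pair with no experimental evidence falls back to the prior. In practice the ratios should be estimated on pairs other than those being scored, for example by cross-validation.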
Data integration: RNA polymerase II
Integrate using naive Bayes classifier
Majority 1 1 1 1 1 0 1 0 1 1 1 0 1 0 1 0 1 1 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Intersection 1 1 1 0 1 0 0 0 1 1 1 0 0 0 1 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Union 1 1 1 1 1 0 1 1 1 1 1 1 1 0 1 1 1 1 1 1 0 1 1 0 1 1 0 1 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0
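The three voting rules in these rows can be sketched as below. The slide does not say how untested pairs or ties were handled, so both choices here are assumptions: untested calls (`None`) are ignored, a pair with no evidence is called negative, and a tie is not a majority.

```python
def combine(calls, rule):
    """Combine one pair's calls (1, 0, or None for untested) under a voting rule."""
    tested = [c for c in calls if c is not None]
    if not tested:
        return 0  # assumption: no evidence counts as "no interaction"
    positives = sum(tested)
    if rule == "union":
        return int(positives >= 1)
    if rule == "intersection":
        return int(positives == len(tested))
    if rule == "majority":
        return int(2 * positives > len(tested))  # assumption: ties are negative
    raise ValueError(f"unknown rule: {rule}")


# Hypothetical calls for three subunit pairs across four experiments.
pairs = {
    ("A", "B"): [1, 1, None, 1],
    ("A", "C"): [1, 0, 0, None],
    ("B", "C"): [None, None, None, None],
}
for pair, calls in pairs.items():
    results = {rule: combine(calls, rule)
               for rule in ("union", "intersection", "majority")}
    print(pair, results)
```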
Data integration: RNA polymerase II
Experiment / method       Subunit pairs covered   Fraction true [%]
Far western               15                      53
Cross linking             20                      65
Far western               30                      77
Pull-down                 35                      57
Pull-down                 35                      66
Pull-down                 9                       44
Far western               6                       50
Combined (Naive Bayes)    45                      80
Union                     45                      60
Intersection              45                      76
Majority                  45                      73
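As a check on the combined rows, "fraction true" can be read as the percentage of the 45 subunit pairs on which a row agrees with the structural-contact row. That reading is an inference, not something the slide states. The 0/1 vectors below are copied from the earlier slides and grouped in fives for readability; the helper names are hypothetical.

```python
from itertools import combinations

# The ten Pol II subunits that appear in the "Subunit A" / "Subunit B" rows.
subunits = [1, 2, 3, 5, 6, 8, 9, 10, 11, 12]
pairs = list(combinations(subunits, 2))  # 45 pairs, in the slides' column order


def parse(row):
    """Turn a whitespace-separated 0/1 row into a list of ints."""
    return [int(x) for x in row.split()]


structural = parse(
    "1 0 1 1 1  1 0 1 0 1  0 0 0 1 1  0 1 0 0 0  0 1 1 1 0"
    "  0 0 0 0 0  0 0 0 0 0  0 0 0 0 0  0 0 0 0 0")

predictions = {
    "Combined (Naive Bayes)": parse(
        "0 1 1 1 1  0 0 0 1 1  1 0 0 0 1  0 1 0 0 0  0 0 1 0 0"
        "  0 0 0 0 0  0 0 0 0 0  0 0 0 0 0  0 0 0 0 0"),
    "Union": parse(
        "1 1 1 1 1  0 1 1 1 1  1 1 1 0 1  1 1 1 1 1  0 1 1 0 1"
        "  1 0 1 0 0  1 0 0 0 0  0 1 0 0 0  0 0 0 0 0"),
    "Intersection": parse(
        "1 1 1 0 1  0 0 0 1 1  1 0 0 0 1  0 1 1 0 0  0 0 0 0 0"
        "  0 0 0 0 0  0 0 0 0 0  0 0 0 0 0  0 0 0 0 0"),
    "Majority": parse(
        "1 1 1 1 1  0 1 0 1 1  1 0 1 0 1  0 1 1 0 0  0 0 1 0 0"
        "  1 0 0 0 0  0 0 0 0 0  0 0 0 0 0  0 0 0 0 0"),
}


def percent_agreement(pred, truth):
    """Percentage of positions where a prediction matches the gold standard."""
    matches = sum(p == t for p, t in zip(pred, truth))
    return round(100 * matches / len(truth))


for name, pred in predictions.items():
    assert len(pred) == len(pairs)
    print(name, percent_agreement(pred, structural))
```

Under this reading the four rows give 80, 60, 76 and 73, matching the percentages in the table.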
Comparison of interaction data sets
[Table not recovered; surviving column headers: Data set, Method]
Comparison of experimental data with gold standards
Positives: 8250 interactions in MIPS complexes
Negatives: ~2.7 M pairs in diff. subcellular compartments
[Figure: the set of experimental “interactions” overlaps the positives (TP) and the negatives (FP)]
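Scoring an experimental set against the two gold standards amounts to two set intersections. A minimal sketch follows, with hypothetical protein names and toy sets rather than the MIPS or localization data. Pairs are stored as `frozenset` so that (A, B) and (B, A) count as the same pair.

```python
def tp_fp(predicted, positives, negatives):
    """Count predicted pairs that land in the gold-standard positive and negative sets."""
    predicted = {frozenset(p) for p in predicted}
    tp = len(predicted & positives)
    fp = len(predicted & negatives)
    return tp, fp


# Hypothetical toy gold standards (not real data).
positives = {frozenset(p) for p in [("A", "B"), ("A", "C"), ("B", "C")]}
negatives = {frozenset(p) for p in [("A", "X"), ("B", "Y"), ("C", "Z"), ("X", "Y")]}

# Hypothetical experimental "interactions"; ("Q", "R") is in neither gold standard.
experiment = [("B", "A"), ("A", "X"), ("C", "B"), ("Q", "R")]

tp, fp = tp_fp(experiment, positives, negatives)
print(tp, fp)  # 2 1
```

Pairs outside both gold standards are left unscored, so TP and FP need not add up to the size of the experimental set.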
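Combining experimental data
[Figure: overlap of the Gavin, Uetz and Ho data sets, each region labeled TP / FP. Values as extracted (adjacent labels are fused): 90/556711/135; 1357/6226; 6/6; 353/21218/6; 15/1]
Jansen et al. JSFG 2002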
Integrating Structural Complexes with Non-interaction Genomic Information:
Using them to Interpret Gene Expression data
Format of Gene Expression Data
[Figure: gene-by-gene matrix with both axes labeled MCM3, MCM6, CDC47, MCM2, CDC46, CDC54, DPB3, CDC45, DPB2, CDC2, CDC7, POL2, HYS2, POL32, DBF4, ORC2, ORC6, ORC5, ORC4, ORC3, ORC1]
[Figure: expression data matrix. Columns: Conditions (e.g. Cancers) or Timepoints, numbered 1 to 15 ….., with class labels A B A A A B B B A B B B B B A ….. Rows: GENES (MCM3, MCM6, CDC47, MCM2, CDC46, CDC54, DPB3, CDC45, DPB2, CDC2, CDC7, POL2, HYS2, POL32, DBF4, ….)]
Expression Correlations Segment Replication Complex into Component Parts
[Figure: gene-by-gene expression correlation matrix over MCM3, MCM6, CDC47, MCM2, CDC46, CDC54, DPB3, CDC45, DPB2, CDC2, CDC7, POL2, HYS2, POL32, DBF4, ORC2, ORC6, ORC5, ORC4, ORC3, ORC1, with blocks labeled MCMs, Polym. & prots., ORC]
Range of Expression Correlations within Complexes
Replication Cplx: Overall .05, ORC .19, MCMs .75, Pol. .45, .75,
Ribosome: Overall .80, Large .80, Small .81
Proteasome: Overall .43, 20S .50, 19S .51
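Figures like these can be obtained as the mean pairwise correlation of the expression profiles within a complex or subcomplex. The sketch below assumes Pearson correlation, which the slide does not specify, and uses hypothetical gene names and made-up profiles.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def mean_pairwise_correlation(profiles):
    """Average Pearson correlation over all gene pairs in one complex."""
    pairs = list(combinations(profiles.values(), 2))
    return sum(pearson(x, y) for x, y in pairs) / len(pairs)


# Hypothetical expression profiles (one value per timepoint); not real data.
tight_complex = {
    "geneA": [1, 2, 3, 4, 5],
    "geneB": [2, 4, 6, 8, 10],
    "geneC": [1, 3, 5, 7, 9],
}
loose_complex = {
    "geneA": [1, 2, 3, 4, 5],
    "geneB": [5, 4, 3, 2, 1],
    "geneC": [1, 2, 3, 4, 5],
}
print(mean_pairwise_correlation(tight_complex))
print(mean_pairwise_correlation(loose_complex))
```

A complex whose subunits are co-expressed scores near 1, while one that mixes differently regulated parts averages out to a low overall value even if each part is internally coherent.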
Protein-Protein Interactions & Expression
[Figure labels: between selected expression timecourses; Cell Cycle CDC28 expt. (Davis); Sets of interactions; (all pairs, control); (from MIPS); (Uetz et al.); Pairwise interactions; (strong interactions in permanent complexes, clearly diff.)]
Permanent v. Transient Complexes
Jansen et al., Genome Research, 2002
Genome-wide prediction of protein complexes based on both high-throughput interaction data and non-interaction, genomic information
Global Network of 3 Different Types of Relationships
Simultaneous 188K
Inverted 63K
Shifted 67K
~313K significant relationships from ~18M possible
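The three relationship types can be illustrated with plain and lagged correlation. This is a simplified stand-in and not the method behind the counts on the slide: the threshold, the maximum lag, and the function names are assumptions, and the profiles are made up.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def classify(x, y, max_lag=2, threshold=0.9):
    """Label a pair of profiles as simultaneous, inverted, shifted, or none."""
    r = pearson(x, y)
    if r >= threshold:
        return "simultaneous"
    if r <= -threshold:
        return "inverted"
    for lag in range(1, max_lag + 1):
        y_lags_x = pearson(x[:-lag], y[lag:])  # y follows x by `lag` timepoints
        x_lags_y = pearson(x[lag:], y[:-lag])  # x follows y by `lag` timepoints
        if max(y_lags_x, x_lags_y) >= threshold:
            return "shifted"
    return "none"


# Hypothetical timecourses (not real data).
base = [0, 1, 0, -1, 0, 1, 0, -1]
scaled = [2 * v for v in base]   # same shape, same timing
flipped = [-v for v in base]     # mirror image
delayed = [-1] + base[:-1]       # same shape, one timepoint later

print(classify(base, scaled))   # simultaneous
print(classify(base, flipped))  # inverted
print(classify(base, delayed))  # shifted
```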
Globally, how well do expression relationships predict known interactions?
Coverage of the 8250 Known Interactions in Complexes Found [MIPS]
CC: 313K relationships from ~18M possible from clustering cell-cycle expt.
                      Coverage   Enrichment (compared to randomized expression relationships)
Random (313K/18M)     ~2%        1x
CC                    42%        24x
Combining Expression Data Sets Increases Coverage & Decreases Noise
Coverage of the 8250 Known Interactions in Complexes Found [MIPS]
CC: 313K relationships from ~18M possible from clustering cell-cycle expt.
KO: 278K relationships from clustering knock-out profiles [Rosetta]
            Coverage   Enrichment (compared to randomized expression relationships)
CC          42%        24x
KO          34%        22x
KO v CC     55%        111x
KO ^ CC     21%        254x
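The CC and KO enrichment values are consistent with dividing coverage by the coverage expected from the same number of randomly chosen pairs. This reading is inferred from the numbers on the slides. The sketch also assumes the KO relationships are drawn from the same ~18M possible pairs, which the slide states only for CC.

```python
def enrichment(coverage, n_relationships, n_possible):
    """Fold enrichment over the coverage expected from the same number of random pairs."""
    random_coverage = n_relationships / n_possible
    return coverage / random_coverage


N_POSSIBLE = 18e6  # "~18M possible" on the slide

cc = enrichment(0.42, 313e3, N_POSSIBLE)  # cell-cycle clustering
ko = enrichment(0.34, 278e3, N_POSSIBLE)  # knock-out clustering
print(round(cc), round(ko))  # 24 22
```

The union and intersection rows cannot be checked the same way because their relationship counts are not given.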
Computational Proteomics of Complexes
1. Interactions provide a systematic way of defining protein function on a genomic scale
2. Known complexes provide a benchmark to validate and integrate genome-wide interaction experiments, providing a more accurate interactome
3. Known complexes provide a focus for the integration of (non-interaction) genomic information – e.g. expression data
4. Extrapolating from known complexes, one can predict protein complexes on a genome-scale via integrating experimental interactions and non-interaction information (combining #1 and #2)
For the Future
• Developing an accurate interactome for the cell, from prediction and through integration of high-throughput information
• Development of statistical approaches to combine and integrate information
• Development of database technologies to store heterogeneous and noisy genome-wide interaction datasets
• A moderate number of structural complexes are very useful as gold standard data
Protein complexes & Structural Genomics
• A computational challenge following from the solution of the parts list
    • Given many monomeric structures produced by structural genomics, predict (or rationalize) the interactome through docking
• Maybe many structures will only be solved as complexes….
Bottlenecks in analysis of all of TargetDB (Interologs)
Acknowledgements
J Qian, R Jansen, A Drawid, C Wilson, D Greenbaum, C Goh, N Lan, H Hegyi, R Das, S Douglas, B Stenger, J Lin, Y Kluger
Collaborators: M Snyder (A Kumar, H Zhu, …), A Edwards, B Kus, J Greenblatt
NIH
GeneCensus.org