1
(c
) M
ark
Ge
rste
in,
20
02
, Ya
le,
bio
info
.mb
b.y
ale
.ed
u
Do not reproduce without permission 1 G
ers
tein
.in
fo/t
alk
s
(c)
20
07
Research @ GersteinLab.org
• Human Genome Annotation (pseudogenes) Characterizing the function of non-coding regions, focusing on
protein fossils and novel transcriptionally active regions(Pseudogene.org + Tiling.GersteinLab.org)
• Molecular Networks Using molecular networks to integrate & mine functional
genomics information and describe protein function on a large-scale (Networks.GersteinLab.org)
• Macromolecular motions Analyzing select populations of 3D-structures in detail, trying to
understand their flexibility in terms of packing (MolMovDB.org)
2
(c
) M
ark
Ge
rste
in,
20
02
, Ya
le,
bio
info
.mb
b.y
ale
.ed
u
Do not reproduce without permission 2 G
ers
tein
.in
fo/t
alk
s
(c)
20
07
Research @ GersteinLab.org
• Human Genome Annotation (pseudogenes) Characterizing the function of non-coding regions, focusing on
protein fossils and novel transcriptionally active regions(Pseudogene.org + Tiling.GersteinLab.org)
• Molecular Networks Using molecular networks to integrate & mine functional
genomics information and describe protein function on a large-scale (Networks.GersteinLab.org)
• Macromolecular motions Analyzing select populations of 3D-structures in detail, trying to
understand their flexibility in terms of packing (MolMovDB.org)
3
(c
) M
ark
Ge
rste
in,
20
02
, Ya
le,
bio
info
.mb
b.y
ale
.ed
u
Do not reproduce without permission 3 L
ec
ture
s.G
ers
tein
La
b.o
rg
(c)
20
07
[IHGSC, Nature 409, 2001][Venter et al. Science 29, 2001]
2001: Most of the genome is not coding (only ~1.2% exon). It consists of elements such as repeats, regulatory regions, non-coding RNAs, origins of replication, pseudogenes, segmental duplications....What do these elements do? How should they be annotated?
4
(c
) M
ark
Ge
rste
in,
20
02
, Ya
le,
bio
info
.mb
b.y
ale
.ed
u
Do not reproduce without permission 4 L
ec
ture
s.G
ers
tein
La
b.o
rg
(c)
20
07
[IHGSC, Nature 409, 2001][ENCODE Consortium, Nature 447, 2007]
2007 : Pilot results from ENCODE Consortium on decoding what the bases do - 1% of Genome (30 Mb in 44 regions)- Tiling Arrays to assay Transcription & Binding- Multi-organism sequencing and alignment- Careful Annotation- Variation Data
SnyderWeissman
Miller
5
(c
) M
ark
Ge
rste
in,
20
02
, Ya
le,
bio
info
.mb
b.y
ale
.ed
u
Do not reproduce without permission 5 L
ec
ture
s.G
ers
tein
La
b.o
rg
(c)
20
07
Identifiable Features of a Pseudogene (RPL21)
Gerstein & Zheng. Sci Am 295: 48 (2006).
6
(c
) M
ark
Ge
rste
in,
20
02
, Ya
le,
bio
info
.mb
b.y
ale
.ed
u
Do not reproduce without permission 6 L
ec
ture
s.G
ers
tein
La
b.o
rg
(c)
20
07
Very Different Distribution of Genes and Pseudogenes in
Different Organisms
E. coli, K-12
E. coli, O157
Yeast
Worm
Fly
Mouse
Human
Cress
Rickettsia
Leprosy
Plague
110
241
101
95
160
1,116
2411.1
3.3
4.6
4.6
5.5
12.2
100.3
131
125
2,932
3,272
-25000 -15000 -5000 5000 15000 25000 35000
-25000 -15000 -5000 5000 15000 25000 35000
No. of Genes
No. of Pseudogenes
Genome Size
Number of Genes / Pseudogenes
0 5000 15000 25000 35000Genome Size [log Mb]
E. coli, K-12
E. coli, O157
Yeast
Worm
Fly
Mouse
Human
Cress
Rickettsia
Leprosy
Plague
110
241
101
95
160
1,116
2411.1
3.3
4.6
4.6
5.5
12.2
100.3
131
125
2,932
3,272
-25000 -15000 -5000 5000 15000 25000 35000
-25000 -15000 -5000 5000 15000 25000 35000
No. of Genes
No. of Pseudogenes
Genome Size
Number of Genes / Pseudogenes
0 5000 15000 25000 35000Genome Size [log Mb]
Zhang & Gerstein (2004) Curr Opin Genet Dev 14: 328 + Harrison & Gerstein (2002) J Mol Biol 318: 1155
7
(c
) M
ark
Ge
rste
in,
20
02
, Ya
le,
bio
info
.mb
b.y
ale
.ed
u
Do not reproduce without permission 7 L
ec
ture
s.G
ers
tein
La
b.o
rg
(c)
20
07
Historyof
Pseudogene
Preservation
Absent
Present with Disablement
Present without Disablement
Zheng et al. (2007) Gen. Res.
Based on alignment from ENCODE MSA
group
representative pseudogenes drawn from 201 totalA B C D E F
8
(c
) M
ark
Ge
rste
in,
20
02
, Ya
le,
bio
info
.mb
b.y
ale
.ed
u
Do not reproduce without permission 8 L
ec
ture
s.G
ers
tein
La
b.o
rg
(c)
20
07
Using phastOdd value to examine neutral evolution of pseudogenes
most goodcandidates
for studying
mutational processes
a few non-proc. G under constraint
Zheng et al. (2007) Gen. Res.
9
(c
) M
ark
Ge
rste
in,
20
02
, Ya
le,
bio
info
.mb
b.y
ale
.ed
u
Do not reproduce without permission 9 L
ec
ture
s.G
ers
tein
La
b.o
rg
(c)
20
07
Ex. Pseudogene
Intersecting Transcript-
ional Evidence
Composite ChIPhit
SpecialG
tracks in browser
diTAG
CAGE
TARs
ChIP-chip
Zheng et al. (2007) Gen. Res.
10
(
c)
Ma
rk G
ers
tein
, 2
00
2, Y
ale
, b
ioin
fo.m
bb
.ya
le.e
du
Do not reproduce without permission 10
Le
ctu
res
.Ge
rste
inL
ab
.org
(c
) 2
007
Research @ GersteinLab.org
• Human Genome Annotation (pseudogenes) Characterizing the function of non-coding regions, focusing on protein
fossils and novel transcriptionally active regions(Pseudogene.org + Tiling.GersteinLab.org)
• Molecular Networks Using molecular networks to integrate & mine functional genomics
information and describe protein function on a large-scale (Networks.GersteinLab.org)
• Macromolecular motions Analyzing select populations of 3D-structures in detail, trying to
understand their flexibility in terms of packing (MolMovDB.org)
11
(c
) M
ark
Ge
rste
in,
20
02
, Ya
le,
bio
info
.mb
b.y
ale
.ed
u
Do not reproduce without permission 11
Ge
rste
in.i
nfo
/ta
lks
(c
) 2
00
7
Toward Systematic Ontologies for Function,
using Networks
All of SCOP entries
1Oxido-
reductases
3Hydrolases
1.1Acting on CH-OH
1.1.1.1 Alcohol dehydrogenase
ENZYME
1.1.1NAD and
NADP acceptor
NON-ENZYME
3.1Acting on
ester bonds
1 Meta-bolism
1.1 Carb.
metab.
3.8 Extracel.
matrix
3.8.2 Extracel.
matrixglyco-protein
1.1.1 Polysach.
metab.
3.8.2.1 Fibro-nectin
General similarity Functional class similarityPrecise functional similarity
3 Cell
structure
1.5Acting on
CH-NH
3.4Acting on
peptide bonds
1.1.1.3Homoserine
dehydrogenase
1.2Nucleotide
metab.
3.1 Nucleus
3.8.2.2Tenascin
1.1.1.1 Glycogenmetab.
1.1.1.2 Starchmetab.
3.1.1.1 Carboxylesterase
3.1.1Carboxylic
ester hydro-lases
3.1.1.8 Cholineesterase
General Networks[Eisenberg et al.]
Hierarchies & DAGs[Enzyme, Bairoch; GO, Ashburner;
MIPS, Mewes, Frishman]
Interaction Vectors [Lan et al, IEEE 90:1848]
12
(
c)
Ma
rk G
ers
tein
, 2
00
2, Y
ale
, b
ioin
fo.m
bb
.ya
le.e
du
Do not reproduce without permission 12
Ge
rste
in.i
nfo
/ta
lks
(c
) 2
00
7
Networks occupy a midway point in terms of level of understanding
1D: Complete Genetic Partslist
~2D: Bio-molecularNetwork
Wiring Diagram
3D: Detailed structural
understanding of cellular machinery
[Jeong et al. Nature, 41:411][Fleischmann et al., Science, 269 :496]
13
(
c)
Ma
rk G
ers
tein
, 2
00
2, Y
ale
, b
ioin
fo.m
bb
.ya
le.e
du
Do not reproduce without permission 13
Ge
rste
in.i
nfo
/ta
lks
(c
) 2
00
7
Networks as a universal language
Disease Spread
[Krebs]
ProteinInteractions
[Barabasi] Social Network
Food Web
Neural Network[Cajal]
ElectronicCircuit
Internet[Burch & Cheswick]
14
(
c)
Ma
rk G
ers
tein
, 2
00
2, Y
ale
, b
ioin
fo.m
bb
.ya
le.e
du
Do not reproduce without permission 14
Ge
rste
in.i
nfo
/ta
lks
(c
) 2
00
7
TopNet – an automated web tool
[Yu et al., NAR (2004); Yip et al. Bioinfo. (2006); Similar tools include Cytoscape.org, Idekar, Sander et al]
(vers. 2 :"TopNet-like
Yale Network Analyzer")
Normal website + Downloaded code (JAVA)+ Web service (SOAP) with Cytoscape plugin
15
(
c)
Ma
rk G
ers
tein
, 2
00
2, Y
ale
, b
ioin
fo.m
bb
.ya
le.e
du
Do not reproduce without permission 15
Ge
rste
in.i
nfo
/ta
lks
(c
) 2
00
7
Target Genes
Transcription Factors
142 transcription factors
3,420 target genes
7,074 regulatory interactions
From integrating data from Snyder et al... TRANSFAC
Yeast Regulatory Network: a platform for integration
16
(
c)
Ma
rk G
ers
tein
, 2
00
2, Y
ale
, b
ioin
fo.m
bb
.ya
le.e
du
Do not reproduce without permission 16
Ge
rste
in.i
nfo
/ta
lks
(c
) 2
00
7
Yeast Regulatory Hierarchy: the Middle-managers Rule
[Yu et al., PNAS (2006)]
17
(
c)
Ma
rk G
ers
tein
, 2
00
2, Y
ale
, b
ioin
fo.m
bb
.ya
le.e
du
Do not reproduce without permission 17
Ge
rste
in.i
nfo
/ta
lks
(c
) 2
00
7
Yeast Network Similar in Structure to Government Hierarchy
with Respect to Middle-managers
18
(
c)
Ma
rk G
ers
tein
, 2
00
2, Y
ale
, b
ioin
fo.m
bb
.ya
le.e
du
Do not reproduce without permission 18
Ge
rste
in.i
nfo
/ta
lks
(c
) 2
00
7
Characteristics of Regulatory Hierarchy: Middle Managers are Information Flow
Bottlenecks
[Yu et al., PNAS (2006)]
19
(
c)
Ma
rk G
ers
tein
, 2
00
2, Y
ale
, b
ioin
fo.m
bb
.ya
le.e
du
Do not reproduce without permission 19
Ge
rste
in.i
nfo
/ta
lks
(c
) 2
00
7
Research @ GersteinLab.org
• Human Genome Annotation (pseudogenes) Characterizing the function of non-coding regions, focusing on
protein fossils and novel transcriptionally active regions(Pseudogene.org + Tiling.GersteinLab.org)
• Molecular Networks Using molecular networks to integrate & mine functional
genomics information and describe protein function on a large-scale (Networks.GersteinLab.org)
• Macromolecular motions Analyzing select populations of 3D-structures in detail, trying to
understand their flexibility in terms of packing (MolMovDB.org)
20
Ge
rste
in.i
nfo
/ta
lks
(c
) 2
00
3
Do not reproduce without permission
Surveying structural flexibility on a proteomic scale
• Questions How do we describe a wide-range of
structural variability in standard terms? Can we develop simple models to
explain constraints on protein flexibility? What information about flexible hinge
location is encoded in sequence?
21
Ge
rste
in.i
nfo
/ta
lks
(c
) 2
00
3
Do not reproduce without permission
MolMovDB.org
22
Ge
rste
in.i
nfo
/ta
lks
(c
) 2
00
3
Do not reproduce without permission
Example "Morph": MBP
2 Known Crystal Structures (endpoints, not necessarily same seq.)
Std. Geometric Stats. (from structure comparison)
Pathway Interpolation
23
Ge
rste
in.i
nfo
/ta
lks
(c
) 2
00
3
Do not reproduce without permission
Interdigitating structure of protein interfaces constrains motion
24
Ge
rste
in.i
nfo
/ta
lks
(c
) 2
00
3
Do not reproduce without permission
Packing Tools - Voronoi software to calculate packing volumes (geometry.molmovdb.org)
25
Ge
rste
in.i
nfo
/ta
lks
(c
) 2
00
3
Do not reproduce without permission
Small Shearing Domain Motions: Molybdenum-binding protein & GAPDH
[Lawson] [Wonacott]
26
Ge
rste
in.i
nfo
/ta
lks
(c
) 2
00
3
Do not reproduce without permission
Proteins With Shear Motions are Often Divided into Layers
GAPDH Hexokinase [Steitz]
27
Ge
rste
in.i
nfo
/ta
lks
(c
) 2
00
3
Do not reproduce without permission
Transferrin: Interdomain Hinges
[Baker]
28
Ge
rste
in.i
nfo
/ta
lks
(c
) 2
00
3
Do not reproduce without permission
Transferrin hinge involves absence of steric constraints (continuously
maintained interfaces), esp. at hinge
29
Ge
rste
in.i
nfo
/ta
lks
(c
) 2
00
3
Do not reproduce without permission
Proteomics Research @ GersteinLab.org
• Macromolecular motions Analyzing select populations of 3D-structures in detail, trying to understand their
flexibility in terms of packing (MolMovDB.org)
• Molecular Networks Using molecular networks to integrate & mine functional genomics information
and describe protein function on a large-scale (Networks.GersteinLab.org)
• Human Genome Annotation (protein fossils) Characterizing the function of non-coding regions, focusing on protein fossils and
novel transcriptionally active regions(Pseudogene.org + Tiling.GersteinLab.org)
30
(
c)
Ma
rk G
ers
tein
, 2
00
2, Y
ale
, b
ioin
fo.m
bb
.ya
le.e
du
Do not reproduce without permission 30
Ge
rste
in.i
nfo
/ta
lks
(c
) 2
00
7
RNAi:Birth of a Field in
the Literature Culmin-ating in the 2006
Nobel
Source:Gerstein & Douglas.
PLoS Comp. Bio. 3:e80 (2007)
PubNet.GersteinLab.org