do not reproduce without permission 1 gerstein.info/talks (c) 2004 1 (c) mark gerstein, 2002, yale,...
TRANSCRIPT
(c) Mark Gerstein, 2002, Yale, bioinfo.mbb.yale.edu. Do not reproduce without permission. Gerstein.info/talks (c) 2004
Annotation of Intergenic Regions of the Human Genome
Mark B Gerstein
Yale (Comp. Bio. & Bioinformatics)
Cistrome 2007, Boston, MA
2007.04.30, 15:20-15:55
Slides downloadable from Lectures.GersteinLab.org
(Please read permissions statement.)
(Genome Annotation Talk without much pgenes, including Tilescope, HMMs, DART, binding sites, and pgene-transcription, All completed comfortably within time.)
zdz © mmvii
[IHGSC, Nature 409, 2001] [Venter et al., Science 291, 2001]
Most of the human genome is not coding sequence
• Mike Snyder & Sherman Weissman
• Tiling of whole chromosomes into small fragments
• Large-scale hybridization to find transcribed regions in unbiased fashion and TF binding sites (via ChIP-chip)
• Careful Computational Annotation
+ENCODE
Overall Aim of Yale Genomics is Comprehensive Intergenic Annotation
• Regulatory regions, repeats, non-coding RNAs, origins of replication, pseudogenes, segmental duplications, unknown elements…
• Specific Results within ENCODE – 1% of human genome (~30Mb in 44 regions)
  – Pseudogenes (Zheng et al., GR)
  – Classification of Novel Transcribed Regions (Rozowsky et al., GR)
  – Characterization of Novel Structured RNAs (Washietl et al., GR)
  – Grouping and Classification of Binding Sites (from ChIP-chip)
  – Med. Scale (~100kb) deserts and islands (Zhang et al., GR)
  – Novel Promoters (Trinklein et al., GR)
  – CNVs and SDs (from hires-aCGH, Korbel et al.)
Outline: Tiling Array Analysis + Annotation Pipelines
• Tools for Scoring Arrays
• Tools for Segmentation and Validation of Arrays
• Results on Clusters of Novel Transcribed Regions
• Results on Clusters of Binding Sites
• Results on Active, Transcribed Pseudogenes
Tiling Arrays Probing Intergenic Activity: Tools
Tilescope 101
▪ It is available at tilescope.gersteinlab.org
▪ It was designed for high-density tiling microarray data analysis.
▪ It is useful
▫ Most existing data processing software was designed for traditional microarrays.
▫ It is flexible: several microarray data processing methods are available.
▫ It is easy to use
• It has a graphical user interface.
• The data analysis process is streamlined.
• It is online software. No need to install.
▫ It is free!
Zhang et al. (2007) Genome Biology
Tilescope: system implementation
▪ Written in Java
▪ Composed of 3 parts: applet, servlet, and pipeline program
[Figure: system diagram with labels Users, Applet, Internet, Server, Servlet, Pipeline]
Zhang et al. (2007) Genome Biology
Tilescope: user interface
Zhang et al. (2007) Genome Biology
Tilescope: data processing
▪ Array data can be normalized by mean, median, quantile, and loess.
▪ Tile scoring generates the signal map and the P-value map.
▪ Feature identification produces ‘hits’.
Zhang et al. (2007) Genome Biology
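The three processing steps can be illustrated with a small sketch. This is not Tilescope's code: the function names, the sliding-window median, the one-sided sign test and the thresholds are all assumptions chosen to keep the example short. It only shows how normalization, a signal map, a P-value map and 'hits' relate to each other.

```python
from math import comb
from statistics import median

def quantile_normalize(columns):
    """Give every array (column) the same intensity distribution."""
    n = len(columns[0])
    sorted_cols = [sorted(c) for c in columns]
    # Reference distribution: mean of the k-th smallest value across arrays.
    reference = [sum(sc[k] for sc in sorted_cols) / len(columns) for k in range(n)]
    normalized = []
    for c in columns:
        order = sorted(range(n), key=lambda i: c[i])
        out = [0.0] * n
        for rank, i in enumerate(order):
            out[i] = reference[rank]
        normalized.append(out)
    return normalized

def sign_test_pvalue(values):
    """One-sided sign test: P(at least k positive values out of n) at p = 0.5."""
    nonzero = [v for v in values if v != 0]
    n = len(nonzero)
    if n == 0:
        return 1.0
    k = sum(1 for v in nonzero if v > 0)
    return sum(comb(n, j) for j in range(k, n + 1)) / 2 ** n

def score_tiles(log_ratios, half_window=1):
    """Return (signal map, P-value map) from a sliding window over the tiles."""
    signal, pvals = [], []
    for i in range(len(log_ratios)):
        window = log_ratios[max(0, i - half_window): i + half_window + 1]
        signal.append(median(window))
        pvals.append(sign_test_pvalue(window))
    return signal, pvals

def call_hits(signal, pvals, min_signal, max_p):
    """Merge consecutive tiles passing both thresholds into (first, last) hits."""
    hits, start = [], None
    for i, (s, p) in enumerate(zip(signal, pvals)):
        passing = s >= min_signal and p <= max_p
        if passing and start is None:
            start = i
        elif not passing and start is not None:
            hits.append((start, i - 1))
            start = None
    if start is not None:
        hits.append((start, len(signal) - 1))
    return hits

# Toy data: two arrays of three probes, then one track of five log-ratios.
normalized = quantile_normalize([[5, 2, 3], [4, 1, 6]])
signal_map, pvalue_map = score_tiles([0.1, 2.0, 1.5, 1.8, -0.2])
hits = call_hits(signal_map, pvalue_map, min_signal=1.0, max_p=0.2)
```

Ties in the intensities are broken by position here, which a production normalizer would handle more carefully.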
HMM Segmentation + Optimal Selection of Regions to Validate
Tiling array analyses: modeling
(unknown)
Goal: identify S = 1 probes based on D
Method: build a model M’ based on D, compute S
Possible performance metric: error rate in predicting the state of a probe, experimental validation, …
(may need pre-processing)
Tiling array analyses: Hidden Markov model
Source: Du et al. (2006) Bioinformatics, 22, 3016-3024.
A model for transcriptional tiling array data
TAR: transcriptionally active region
Source: http://en.wikipedia.org/wiki/Hidden_Markov_model
State transitions in a hidden Markov model (example)
x: hidden states
y: observable outputs
a: transition probabilities
b: output probabilities
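A minimal sketch of how such a model segments probe intensities, assuming a two-state chain (non-TAR, TAR) with Gaussian outputs and decoding by the Viterbi algorithm. The state names follow the slide; the means, standard deviations and transition probabilities are made-up illustrative values, not the parameters of Du et al.

```python
from math import log, pi

STATES = ("non-TAR", "TAR")
EMISSION = {"non-TAR": (0.0, 0.5), "TAR": (2.0, 0.5)}  # hypothetical (mean, sd)
STAY = 0.9                                             # hypothetical self-transition
LOG_START = {s: log(0.5) for s in STATES}
LOG_TRANS = {a: {b: log(STAY if a == b else 1 - STAY) for b in STATES}
             for a in STATES}

def log_emit(state, y):
    """Log density of output y under the Gaussian attached to a hidden state."""
    mean, sd = EMISSION[state]
    return -0.5 * log(2 * pi * sd * sd) - (y - mean) ** 2 / (2 * sd * sd)

def viterbi(obs):
    """Most probable hidden-state path for a non-empty list of outputs."""
    best = {s: LOG_START[s] + log_emit(s, obs[0]) for s in STATES}
    back = []
    for y in obs[1:]:
        prev, best, pointers = best, {}, {}
        for s in STATES:
            # Best predecessor of state s at this probe.
            r = max(STATES, key=lambda q: prev[q] + LOG_TRANS[q][s])
            best[s] = prev[r] + LOG_TRANS[r][s] + log_emit(s, y)
            pointers[s] = r
        back.append(pointers)
    path = [max(STATES, key=lambda s: best[s])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]

intensities = [0.1, -0.2, 1.9, 2.2, 2.1, 0.0]
path = viterbi(intensities)
```

Runs of consecutive TAR calls in `path` would then be reported as transcriptionally active regions.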
Source: Du et al. (2006) Bioinformatics, 22, 3016-3024.
Tiling array analyses: active sampling & supervised learning
• Active sampling: select a small set of sub-regions for validation first
• Supervised learning: use the validation data to train the statistical model
Source: Gerstein et al. (2007) Gen. Res.
Tiling array analyses: Sample sub-region selection
Can we find a good selection scheme?
Source: Gerstein et al. (2007) Gen. Res.
Tiling array analyses: Sample sub-region selection
• Sampling solely based on data D
  Some candidates:
  • Random selection
  • Entropy based
  • KL-divergence based
• Testing the performance of these schemes
  Simulation
  • Why?
    – So that we know S exactly
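One way to read "entropy based" selection is sketched below, next to the random baseline. It assumes each probe already has a posterior probability P(S = 1 | D) from a preliminary model and ranks sub-regions by mean binary entropy, so the most uncertain regions are validated first. The region names, the scoring rule and the function names are illustrative assumptions; the KL-divergence scheme is not shown because the slide does not define it.

```python
import random
from math import log2

def binary_entropy(p):
    """Entropy in bits of a probe whose state is 1 with probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * log2(p) + (1 - p) * log2(1 - p))

def region_entropy(posteriors):
    """Mean per-probe entropy of one candidate sub-region."""
    return sum(binary_entropy(p) for p in posteriors) / len(posteriors)

def select_by_entropy(regions, k):
    """Pick the k sub-regions whose probe states are least certain."""
    ranked = sorted(regions, key=lambda name: region_entropy(regions[name]),
                    reverse=True)
    return ranked[:k]

def select_at_random(regions, k, seed=0):
    """Baseline scheme: k sub-regions drawn uniformly without replacement."""
    return random.Random(seed).sample(sorted(regions), k)

# Hypothetical posteriors P(S = 1 | D) for three candidate sub-regions.
regions = {
    "r1": [0.01, 0.02, 0.99],   # confident calls
    "r2": [0.50, 0.45, 0.60],   # very uncertain
    "r3": [0.90, 0.10, 0.20],   # moderately uncertain
}
chosen = select_by_entropy(regions, k=2)
baseline = select_at_random(regions, k=2)
```

In a simulation, where S is known exactly, the two selections could be compared by the error rate of the model trained on each.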
Tiling array analyses: Simulation results
Source: Du et al. (2006) Bioinformatics, 22, 3016-3024.
Tiling array analyses: Simulation results (cont.)
Source: Du et al. (2006) Bioinformatics, 22, 3016-3024.
Tiling array analyses: back to the real world
• Transcriptional tiling array
  Mainly using gene annotation to train the model
  • Somewhat analogous to using MaxEntropy to select the sample regions
  • The training set is expected to be noisy, but still leads to satisfactory performance
• ChIP-chip tiling array
  Try to guess the signal distribution according to annotation information
• Ideal scenario
  Optimally select a medium-sized set of sample sub-regions
  Do experimental validations to build the model
Tiling array analyses: results on transcriptional data
(ENCODE regions (~30Mb), training set (~7.5Mb), ¼ training set (~1.9Mb, ~0.1M probes))
Source: Du et al. (2006) Bioinformatics, 22, 3016-3024.
Tiling Arrays Probing Intergenic Activity: Classifying Un-annotated Transcription
DART: Classification of Unannotated Transcription
• Large number of novel transcribed regions (TARs / transfrags) detected using tiling microarrays.
• Developed DART: Database of Active Regions and Tools
  Developed a classification procedure for these novel TARs/transfrags
  Database for storing & visualizing various sets of TARs/transfrags
  Associated tools for analyzing these sets
Rozowsky et al. Genome Research (2007, in press)
Set of All TARs
  Exonic TARs
  Pseudo TARs
  Novel TARs
    Intronic: Proximal, Distal
    Intergenic: Proximal, Distal
    ESTs
Rozowsky et al. Gen. Res. (2007, in press)
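The hierarchy above can be read as a decision procedure over genomic intervals. The sketch below is one hedged interpretation: the order of the checks, the use of distance to the nearest exon for proximal versus distal, and the `proximal_bp` cutoff of 5,000 bp are all assumptions for illustration, not values taken from Rozowsky et al.

```python
def overlaps(a, b):
    """True if half-open intervals a = (start, end) and b overlap."""
    return a[0] < b[1] and b[0] < a[1]

def gap(a, b):
    """Distance in bp between two intervals (0 if they overlap or touch)."""
    return max(0, max(a[0], b[0]) - min(a[1], b[1]))

def classify_tar(tar, exons, pseudogenes, ests, genes, proximal_bp=5000):
    """Place one TAR in the hierarchy; proximal_bp is a hypothetical cutoff."""
    if any(overlaps(tar, e) for e in exons):
        return "exonic"
    if any(overlaps(tar, p) for p in pseudogenes):
        return "pseudogene"
    if any(overlaps(tar, e) for e in ests):
        return "novel: EST"
    context = "intronic" if any(overlaps(tar, g) for g in genes) else "intergenic"
    nearest = min((gap(tar, e) for e in exons), default=None)
    near = nearest is not None and nearest <= proximal_bp
    return f"novel: {context} {'proximal' if near else 'distal'}"

# Toy annotation on one chromosome (coordinates are invented).
genes = [(1000, 20000)]
exons = [(1000, 1500), (19000, 20000)]
pseudogenes = [(50000, 51000)]
ests = [(30000, 30500)]
labels = [classify_tar(t, exons, pseudogenes, ests, genes)
          for t in [(1200, 1300), (2000, 2100), (10000, 10100),
                    (22000, 22100), (40000, 40100)]]
```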
Schematic of Classification Procedure
Set of Novel TARs
S1A Filter Novel TARs for Unusual Sequence Composition → Peculiar TARs
S1B Filter Novel TARs for Cross-Hybridization → Cross-Hyb TARs
S2 Assign Novel TARs to Known Genes using Expression Profiles → Gene Assoc. TARs
S3A Cluster into Novel Transcribed Loci using Expression Profiles (EP) → Novel EP Loci
S3B Cluster into Novel Transcribed Loci using Phylogenetic Profiles (PP) → Novel PP Loci
Remaining: Singlet or Ambiguous TARs
(Each resulting set is broken down by location: P D P D E)
DART
Rozowsky et al. Genome Research (in press)
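Step S2 can be pictured as a correlation test between expression profiles. The sketch below assigns a novel TAR to the known gene whose profile it tracks best, provided the Pearson correlation clears a cutoff. The cutoff `min_r`, the profile values and the choice to compare against every gene (rather than only neighbouring ones) are assumptions made for illustration.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # a flat profile carries no correlation information
    return cov / sqrt(vx * vy)

def assign_to_genes(tar_profiles, gene_profiles, min_r=0.9):
    """Map each TAR to its best-correlated gene, or None if below min_r."""
    assignments = {}
    for tar, profile in tar_profiles.items():
        best_gene, best_r = None, min_r
        for gene, gene_profile in gene_profiles.items():
            r = pearson(profile, gene_profile)
            if r >= best_r:
                best_gene, best_r = gene, r
        assignments[tar] = best_gene
    return assignments

# Hypothetical profiles across four conditions.
gene_profiles = {"geneA": [1, 2, 3, 4], "geneB": [4, 3, 2, 1]}
tar_profiles = {"tar1": [2, 4, 6, 8.5], "tar2": [1, 5, 1, 5]}
assignments = assign_to_genes(tar_profiles, gene_profiles)
```

TARs left unassigned here would move on to the clustering steps S3A and S3B.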
Rozowsky et al. Gen. Res. (2007, in press)
TAR clustering
DART: Database & Tools
- Interfaces with UCSC
- Tools use Ensembl API
Rozowsky et al. Genome Research (in press)
Table 1: Locations of all TARs
                               Exonic     Pseudogenes  Unannotated Regions
Size of ENCODE Regions (bp)    1,776,157  144,745      28,077,158
Percentage of all ENCODE       5.9%       0.5%         93.6%
Number of TARs                 3,666      195          6,988
Percentage of all TARs         33.8%      1.8%         64.4%

Locations of Novel TARs
                                   ESTs not   Intronic   Intronic   Intergenic  Intergenic
                                   in Exons   Proximal   Distal     Proximal    Distal
Size of Unannotated Regions (bp)   2,477,910  8,522,559  5,536,879  2,434,101   9,250,454
Percentage of Unannotated Regions  8.8%       30.2%      19.6%      8.6%        32.8%
Number of Novel TARs               1,194      3,006      864        772         1,300
Percentage of all Novel TARs       16.7%      42.1%      12.1%      10.8%       18.2%

Table 2: Sets of Classified Novel TARs
                                                       Number  Percentage
Total                                                  6,988   100.0%
With peculiar sequence composition                     503     7.2%
Assigned to known genes                                955     13.7%
Caused by cross-hybridization                          -       -
In novel transcribed loci using expression profiles    681     9.7%
In novel transcribed loci using phylogenetic profiles  782     11.2%
Rozowsky et al. Genome Research (in press)
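The counts in Table 1 become easier to compare once they are divided by the size of each category. The short calculation below uses only the numbers from the table; the derived quantities (TARs per Mb and the exonic-to-unannotated ratio) are an added illustration, not figures reported on the slide.

```python
# Sizes and TAR counts copied from Table 1.
TABLE1 = {
    "exonic":      {"size_bp": 1_776_157,  "tars": 3_666},
    "pseudogenes": {"size_bp": 144_745,    "tars": 195},
    "unannotated": {"size_bp": 28_077_158, "tars": 6_988},
}

total_tars = sum(row["tars"] for row in TABLE1.values())

def tars_per_mb(row):
    """TAR density normalized by the size of the category."""
    return row["tars"] / row["size_bp"] * 1e6

# Share of all TARs (reproduces the 'Percentage of all TARs' row).
share = {name: 100 * row["tars"] / total_tars for name, row in TABLE1.items()}
# Density per megabase, which corrects for the very different category sizes.
density = {name: tars_per_mb(row) for name, row in TABLE1.items()}
enrichment = density["exonic"] / density["unannotated"]
```

Although about two thirds of TARs fall in unannotated sequence, that sequence is 93.6% of the ENCODE regions, so per megabase the exonic regions are roughly eight times denser in TARs.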
DART classification has been experimentally validated with small-scale RT-PCR and sequencing experiments
Results:
18/46 (39%) confirmed by RT-PCR
4/5 Sequenced Products Map uniquely to correct genomic region
[Gel image: paired + and - lanes for samples A11 to A23 and B1, with ladder (L) at 0.5, 1, 1.5 and 2 Kb]
TAR 1
PCR Sequence   1 ttcttcggaaaagcacatgaactctttggagtctcctgttccacttggtaaatttcctat 60
                 |||||||||||||||||||||||||| |||||||||||||||||||||||||||||||||
Chr21 34,270,569 ttcttcggaaaagcacatgaactcttcggagtctcctgttccacttggtaaatttcctat 34,270,628

PCR Sequence  61 agctccgcactgaaagtccctgctgccctccttcctctgagcttgtggggcccacagatc 120
                 |||  |||||||||||||||||||||||||||||||||||||||||||||||||||||||
Chr21 34,270,629 agccacgcactgaaagtccctgctgccctccttcctctgagcttgtggggcccacagatc 34,270,688

PCR Sequence 121 ccctgctccacttcctgcttcatttcagctgat 153
                 |||||||||||||||||||||||||||||||||
Chr21 34,270,689 ccctgctccacttcctgcttcatttcagctgat 34,270,721

TAR 2
PCR Sequence 154 ggatgacactccctcgttctaataccatctgaatgcctgagcaattacatcttacaacct 213
                 ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Chr21 34,270,898 ggatgacactccctcgttctaataccatctgaatgcctgagcaattacatcttacaacct 34,270,957

PCR Sequence 214 catgaaaaacacagcagcttgtcacgatgaatg 246
                 |||||||||||||||||||||||||||||||||
Chr21 34,270,958 catgaaaaacacagcagcttgtcacgatgaatg 34,270,990
[Schematic labels: Novel TARs, PCR Sequence, Forward Primer (FP), Reverse Primer (RP)]
Rozowsky et al. Genome Research (in press)
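Whether a sequenced product "maps to the correct genomic region" can be checked, in the simplest ungapped case, by counting mismatches against the reference. The sketch below does this for the first 60 bases of the TAR 1 alignment shown on the slide. The function name is hypothetical, and a real analysis would use a gapped aligner rather than position-by-position comparison.

```python
def ungapped_identity(query, subject):
    """Return (fraction identical, 0-based mismatch positions) without gaps."""
    if len(query) != len(subject):
        raise ValueError("sequences must have equal length for an ungapped comparison")
    mismatches = [i for i, (q, s) in enumerate(zip(query, subject)) if q != s]
    return 1 - len(mismatches) / len(query), mismatches

# First alignment block of TAR 1, copied from the slide.
pcr = "ttcttcggaaaagcacatgaactctttggagtctcctgttccacttggtaaatttcctat"
chr21 = "ttcttcggaaaagcacatgaactcttcggagtctcctgttccacttggtaaatttcctat"
identity, mismatch_positions = ungapped_identity(pcr, chr21)
```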
Tiling Arrays Probing Intergenic Activity: Categories Groups of Binding Sites
Transcriptional regulation
▪ Protein-coding genes are transcribed by RNA polymerase II (Pol2) ...
▪ ... Under elaborate regulation by
▫ The binding of a complex set of transcription factors to their regulatory elements
▫ Histone modifications such as acetylation and methylation
▫ Chromatin remodeling
▪ Transcription factor binding sites include
▫ Core promoters
▫ Promoter proximal elements
▫ Other elements such as enhancers, silencers, insulators, and response elements
▪ Transcriptional regulatory elements can be globally mapped by high-throughput experiments such as ChIP-chip or ChIP-PET.
Zhang et al. (2007) Gen. Res.
ENCODE TR study
▪ First concerted effort to systematically identify transcriptional regulatory elements (TREs) in the human genome on a large scale
▫ 105 lists of transcriptional regulatory elements in the ENCODE regions
▫ 29 transcription factors, 9 cell lines, 2 time points
▫ 7 laboratories and 3 different microarray platforms
▪ TFs and their TREs can be studied on various genomic levels.
Zhang et al. (2007) Gen. Res.
Our TRE analysis approach
▪ On an intermediate genomic level, involving 10 ~ 100 kb of DNA with several genes on average.
▪ Try to present the problem and subsequently analyze the data in a consistent and coherent statistical framework.
Zhang et al. (2007) Gen. Res.
Landscape of ENCODE TREs
▪ Positive correlation of the TRE density with both non-exonic conservation and gene density in a genomic region
Zhang et al. (2007) Gen. Res.
Non-random distribution of TREs
▪ TREs are not evenly distributed throughout the ENCODE regions (P < 2.2×10^−16).
▪ The actual TRE distribution follows a power law.
▪ The null distribution is ‘Poissonesque.’
▪ Many genomic subregions with extreme numbers of TREs.
Zhang et al. (2007) Gen. Res.
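A simple way to see the contrast between the observed distribution and a 'Poissonesque' null is to count TREs per fixed window and compare the spread of those counts with what a Poisson process would give. The sketch below computes the variance-to-mean ratio, which is about 1 under a Poisson null and larger when sites cluster. The window size, positions and function names are invented for illustration, and this is not the statistical test used in Zhang et al.

```python
from math import exp, factorial

def window_counts(positions, region_length, window):
    """Count sites per window; positions are 0-based and < region_length."""
    n_windows = -(-region_length // window)  # ceiling division
    counts = [0] * n_windows
    for pos in positions:
        counts[pos // window] += 1
    return counts

def dispersion_index(counts):
    """Sample variance over mean: ~1 for Poisson, > 1 for clustered sites."""
    mean = sum(counts) / len(counts)
    var = sum((c - mean) ** 2 for c in counts) / (len(counts) - 1)
    return var / mean

def poisson_expected_windows(counts):
    """Expected number of windows holding k sites under the Poisson null."""
    n = len(counts)
    lam = sum(counts) / n
    return {k: n * exp(-lam) * lam ** k / factorial(k)
            for k in range(max(counts) + 1)}

# Toy region of 100 bp with ten sites, five of them packed into one window.
positions = [5, 12, 18, 25, 31, 33, 36, 38, 39, 95]
counts = window_counts(positions, region_length=100, window=10)
index = dispersion_index(counts)
expected = poisson_expected_windows(counts)
```

Windows whose counts sit far in the tail of `expected` correspond to the "extreme numbers of TREs" mentioned on the slide.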
Local TRE enrichment and depletion
▪ Hundreds of TRE ‘islands’ and ‘deserts’ are identified in ENCODE regions.
▪ The longest island is composed of 68 various TREs and covers a 35-kb region near the HOXA cluster on chromosome 7.
▪ The entirety of EHD1 on chromosome 11 is covered by TRE islands.
▪ Some of the islands are located in intergenic regions of the genome.
dart.gersteinlab.org/encode/tr/
Zhang et al. (2007) Gen. Res.
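Islands and deserts can be pictured with a gap-based sketch: an island is a run of sites whose neighbours are close together, and a desert is a long stretch with no site at all. The thresholds `max_gap`, `min_sites` and `min_length`, and the toy coordinates, are hypothetical; the published analysis defines enrichment and depletion statistically rather than by fixed cutoffs.

```python
def find_islands(positions, max_gap, min_sites):
    """Runs of sites with consecutive gaps <= max_gap and >= min_sites members."""
    islands, run = [], []
    for pos in sorted(positions):
        if run and pos - run[-1] > max_gap:
            if len(run) >= min_sites:
                islands.append((run[0], run[-1], len(run)))
            run = []
        run.append(pos)
    if len(run) >= min_sites:
        islands.append((run[0], run[-1], len(run)))
    return islands

def find_deserts(positions, region_start, region_end, min_length):
    """Site-free stretches of at least min_length bp within the region."""
    edges = [region_start] + sorted(positions) + [region_end]
    return [(left, right) for left, right in zip(edges, edges[1:])
            if right - left >= min_length]

# Toy TRE midpoints in a 12 kb region.
tre_positions = [100, 150, 220, 5000, 9000, 9050, 9120, 9180]
islands = find_islands(tre_positions, max_gap=100, min_sites=3)
deserts = find_deserts(tre_positions, 0, 12000, min_length=3000)
```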
Tiling Arrays Probing Intergenic Activity: Connecting Intergenic Activity to Pseudogenes
5 Methods of Assignment
• 4 automatic pipelines
  retroFinder, pseudoFinder, PseudoPipe, GIS
  Comparing protein or transcript v genomic DNA, filtering, application of rules
• HAVANA manual
• What is a pseudogene?
  Different criteria
• Conservative approach here
  Can't overlap gene annotation
  Need to have a protein alignment
  201 pseudogenes (in comparison to ~400 genes)
Zheng et al. (2007) Gen. Res.
Overlap of Pseudogenes by 5 Different Methods
Union of 252
Zheng et al. (2007) Gen. Res.
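The relationship between the union of calls and a stricter consensus can be sketched with plain sets. The method names below come from the previous slide, but the pseudogene identifiers are invented, and matching calls by identifier is an assumption: in practice calls from different methods would be matched by genomic overlap.

```python
from collections import Counter

def support_counts(calls_by_method):
    """For each pseudogene ID, count how many methods called it."""
    counts = Counter()
    for ids in calls_by_method.values():
        counts.update(set(ids))
    return counts

def consensus(calls_by_method, min_methods):
    """IDs called by at least min_methods methods (1 gives the union)."""
    counts = support_counts(calls_by_method)
    return {pid for pid, n in counts.items() if n >= min_methods}

# Hypothetical calls; the IDs pg1..pg4 are placeholders.
calls = {
    "retroFinder":  {"pg1", "pg2"},
    "pseudoFinder": {"pg2", "pg3"},
    "PseudoPipe":   {"pg2", "pg3", "pg4"},
    "GIS":          {"pg2"},
    "HAVANA":       {"pg2", "pg3"},
}
union = consensus(calls, min_methods=1)
majority = consensus(calls, min_methods=3)
unanimous = consensus(calls, min_methods=5)
```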
Ex. Pseudogene Intersecting Transcriptional Evidence
Special tracks in browser: diTAG, CAGE, TARs, ChIP-chip
Zheng et al. (2007) Gen. Res.
Intersection of Pseudogenes with Transcriptional Evidence
Excluding TARs (due to cross-hyb issues)
Targeted RACE expts on 160 pseudogenes give 14
Total evidence from sequencing is 38 of 201 (with 5 having cryptic promoters)
Zheng et al. (2007) Gen. Res.
Targeted Transcription Expts.
• RACE expts
  Interrogated 160 pseudogenes (49 non-processed & 111 processed)
  In 51 cases (26 non-processed and 25 processed pseudogenes), could design distinguishing primers (>4 mismatched bp v. parent)
  The resulting data supported transcription from 14 (8 processed and 6 non-processed) of the 160 pseudogenes (9 with pseudogene specific primers)
  These numbers might represent a conservative estimate since a RACEfrag was assigned to its parent gene by default if it could be mapped to both a parent locus and a pseudogene locus.
• RACE expts + sequencing (CAGE, PET, EST and mRNA)
  unambiguous evidence for pseudogene transcription
  All together, these data indicate 38 of 201 pseudogenes being the source of novel RNA transcripts
  5 of these had cryptic promoters (from TR analysis)
Zheng et al. (2007) Gen. Res.
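Two rules from this slide lend themselves to a short sketch: a primer counts as distinguishing when it has more than 4 mismatched bases against the parent gene, and a RACEfrag that maps to both loci goes to the parent by default. The function names and toy sequences are assumptions, and the sketch compares equal-length, pre-aligned primer sites only.

```python
def mismatches(a, b):
    """Number of differing positions between two aligned, equal-length sites."""
    if len(a) != len(b):
        raise ValueError("primer sites must be aligned to equal length")
    return sum(1 for x, y in zip(a, b) if x != y)

def is_distinguishing(pseudogene_site, parent_site, min_mismatches=5):
    """'>4 mismatched bp v. parent' expressed as at least 5 mismatches."""
    return mismatches(pseudogene_site, parent_site) >= min_mismatches

def assign_racefrag(maps_to_parent, maps_to_pseudogene):
    """Conservative rule: ambiguous RACEfrags are credited to the parent gene."""
    if maps_to_parent:
        return "parent"
    if maps_to_pseudogene:
        return "pseudogene"
    return None

# Toy 20-mer primer sites (invented sequences).
pseudogene_site = "ACGTACGTACGTACGTACGT"
parent_five_off = "TCGTTCGTTCGTTCGTTCGT"   # 5 mismatches
parent_four_off = "TCGTTCGTTCGTTCGTACGT"   # 4 mismatches
usable = is_distinguishing(pseudogene_site, parent_five_off)
not_usable = is_distinguishing(pseudogene_site, parent_four_off)
```

The default-to-parent rule is why the 14 supported pseudogenes may be an underestimate.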
Extension to Whole Genome
• 233 Transcribed from ~8000 Processed Pseudogenes
• Evidence for Transcription
  8% RefSeq mRNAs
  32% Unigene consensus sequences
  72% dbEST expressed sequence tags
  32% Oligonucleotide microarray data (extra support)
• Highly decayed
  Fraction with Ka/Ks ≥ 0.5 is 54%
Harrison et al. (2005) NAR
Genes & Pseudogenes
Zheng & Gerstein, TIG (2007)
Genes or Pseudogenes?
Zheng & Gerstein, TIG (2007)
Conclusions
• Tilescope Processing Pipeline
  HMM Segmentation & optimal selection of regions to validate
• DART classification of TARs
  1300 clusters of transcriptionally active regions in ENCODE
• Deserts and Forests of Binding Activity on ~50kb scale
• Pseudogene Activity
  Consensus annotation from automatic pipelines and manual curation gives 201 (~2/3 processed)
  ~19% appear to be transcribed (38/201)
Acknowledgements
P Bertone
J Rozowsky
G Euskirchen
T Royce
S Balasubramanian
J Korbel
A Karpikov
D Zheng
Z Zhang
D Yan
R Sasidharan
O Emanuelsson
J Du
J Rinn
MS
MG
PM
SW
V Stolc
R Martone
P Harrison
pseudogene.org, tiling.gersteinlab.org
N Luscombe
C Bruce
J Chang
N Carriero
N Echols
J Karro
ENCODE Acknowledgements
Adam Frankish, Robert Baertsch, Philipp Kapranov, Alexandre Reymond, Siew Woh Choo, Yontao Lu, France Denoeud, Stylianos Antonarakis, Yijun Ruan, Chia-Lin Wei, Thomas Gingeras, Roderic Guigo, Jennifer Harrow
Sanger, UCSC, GIS, AFFX, Geneva, IMIM
Permissions Statement
This Presentation is copyright Mark Gerstein, Yale University, 2007.
Feel free to use images in it with PROPER acknowledgement (via citation to relevant papers or link to gersteinlab.org).