do not reproduce without permission 1 gerstein.info/talks (c) 2004 1 (c) mark gerstein, 2002, yale,...


Annotation of Intergenic Regions of the Human Genome

Mark B Gerstein, Yale (Comp. Bio. & Bioinformatics)

Cistrome 2007, Boston, MA

2007.04.30, 15:20-15:55

Slides downloadable from Lectures.GersteinLab.org

(Please read permissions statement.)

(Genome Annotation Talk without much pgenes, including Tilescope, HMMs, DART, binding sites, and pgene-transcription, All completed comfortably within time.)

© zdz mmvii

[IHGSC, Nature 409, 2001] [Venter et al., Science 291, 2001]

Most of the human genome is not coding sequence


• Mike Snyder & Sherman Weissman

• Tiling of whole chromosomes into small fragments

• Large-scale hybridization to find transcribed regions in unbiased fashion and TF binding sites (via ChIP-chip)

• Careful Computational Annotation

+ENCODE


Overall Aim of Yale Genomics is Comprehensive Intergenic Annotation

• Regulatory regions, repeats, non-coding RNAs, origins of replication, pseudogenes, segmental duplications, unknown elements….

• Specific Results within ENCODE – 1% of human genome (~30Mb in 44 regions)
  Pseudogenes (Zheng et al., GR)
  Classification of Novel Transcribed Regions (Rozowsky et al., GR)
  Characterization of Novel Structured RNAs (Washietl et al., GR)
  Grouping and Classification of Binding Sites (from ChIP-chip)
  Med. Scale (~100kb) deserts and islands (Zhang et al., GR)
  Novel Promoters (Trinklein et al., GR)
  CNVs and SDs (from hires-aCGH, Korbel et al.)


Outline: Tiling Array Analysis + Annotation Pipelines

• Tools for Scoring Arrays
• Tools for Segmentation and Validation of Arrays
• Results on Clusters of Novel Transcribed Regions
• Results on Clusters of Binding Sites
• Results on Active, Transcribed Pseudogenes


Tiling Arrays Probing Intergenic Activity: Tools


Tilescope 101

▪ It is available at tilescope.gersteinlab.org

▪ It was designed for high-density tiling microarray data analysis.

▪ It is useful
  ▫ Most existing data processing software was designed for traditional microarrays.
  ▫ It is flexible: several microarray data processing methods are available.
  ▫ It is easy to use
    • It has a graphical user interface.
    • The data analysis process is streamlined.
    • It is online software. No need to install.
  ▫ It is free!

Zhang et al. (2007) Genome Biology


Tilescope: system implementation

▪ Written in Java

▪ Composed of 3 parts: applet, servlet, and pipeline program

[Diagram: Users (Applet) <-> Internet <-> Server (Servlet, Pipeline)]



Tilescope: user interface



Tilescope: data processing

▪ Array data can be normalized by mean, median, quantile, and loess.

▪ Tile scoring generates the signal map and the P-value map.

▪ Feature identification produces ‘hits’.
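The normalization and tile-scoring steps above can be sketched roughly as follows. This is a minimal illustration, not Tilescope's implementation: the function names, the window size, and the use of a plain sliding-window median as the tile score are all assumptions made for the example.

```python
from statistics import median


def quantile_normalize(arrays):
    """Force every array to share the same empirical distribution.

    `arrays` is a list of equal-length lists of probe intensities.
    Ties are broken by probe order (a simplification).
    """
    n = len(arrays[0])
    # Sort each array, then average across arrays rank by rank.
    sorted_arrays = [sorted(a) for a in arrays]
    rank_means = [sum(s[r] for s in sorted_arrays) / len(arrays) for r in range(n)]
    normalized = []
    for a in arrays:
        order = sorted(range(n), key=lambda i: a[i])  # probe indices by rank
        out = [0.0] * n
        for rank, probe in enumerate(order):
            out[probe] = rank_means[rank]
        normalized.append(out)
    return normalized


def sliding_median(signal, half_window=1):
    """Score each tile by the median of its neighbourhood (hypothetical scoring)."""
    scores = []
    for i in range(len(signal)):
        lo, hi = max(0, i - half_window), min(len(signal), i + half_window + 1)
        scores.append(median(signal[lo:hi]))
    return scores


# Toy intensities for two arrays over the same four probes.
arrays = [[5.0, 2.0, 3.0, 4.0], [4.0, 1.0, 4.5, 2.0]]
normalized = quantile_normalize(arrays)
signal_map = sliding_median(normalized[0])
print(normalized)
print(signal_map)
```

After quantile normalization both arrays contain the same set of values, only assigned to different probes; a P-value map would need a per-window test on top of this, which is left out here.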



HMM Segmentation + Optimal Selection of Regions to Validate


Tiling array analyses: modeling

(unknown)

Goal: identify S = 1 probes based on D

Method: build a model M’ based on D, compute S

Possible performance metric: error rate in predicting the state of a probe, experimental validation, …

(may need pre-processing)


Tiling array analyses: Hidden Markov model

Source: Du et al. (2006) Bioinformatics, 22, 3016-3024.

A model for transcriptional tiling array data

TAR: transcriptionally active region

Source: http://en.wikipedia.org/wiki/Hidden_Markov_model

State transitions in a hidden Markov model (example)
x: hidden states
y: observable outputs
a: transition probabilities
b: output probabilities
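To make the segmentation idea concrete, here is a minimal Viterbi sketch for a two-state model (TAR versus non-TAR) with Gaussian output probabilities. It is only an illustration of the general technique: the state names, the Gaussian emissions, and every parameter value below are assumptions, not the model of Du et al.

```python
import math

STATES = ("non-TAR", "TAR")


def gaussian_logpdf(x, mean, sd):
    """Log density of a normal distribution, used as the output probability b."""
    return -0.5 * math.log(2 * math.pi * sd * sd) - (x - mean) ** 2 / (2 * sd * sd)


def viterbi(signal, log_start, log_trans, emission_params):
    """Return the most probable hidden-state path for a list of probe intensities."""
    n_states = len(STATES)
    # best[t][s]: log-probability of the best path ending in state s at probe t
    best = [[log_start[s] + gaussian_logpdf(signal[0], *emission_params[s])
             for s in range(n_states)]]
    back = []
    for x in signal[1:]:
        row, ptr = [], []
        for s in range(n_states):
            prev = max(range(n_states), key=lambda p: best[-1][p] + log_trans[p][s])
            row.append(best[-1][prev] + log_trans[prev][s]
                       + gaussian_logpdf(x, *emission_params[s]))
            ptr.append(prev)
        best.append(row)
        back.append(ptr)
    # Trace back from the best final state.
    state = max(range(n_states), key=lambda s: best[-1][s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return [STATES[s] for s in path]


# Illustrative parameters only (not fitted to any real array).
log_start = [math.log(0.9), math.log(0.1)]
log_trans = [[math.log(0.95), math.log(0.05)],   # a: from non-TAR
             [math.log(0.10), math.log(0.90)]]   # a: from TAR
emission_params = [(0.0, 1.0), (3.0, 1.0)]       # b: (mean, sd) per state

signal = [0.1, -0.3, 0.2, 2.9, 3.4, 2.7, 3.1, 0.0, -0.2]
path = viterbi(signal, log_start, log_trans, emission_params)
print(path)
```

The sticky transition probabilities are what turn probe-by-probe calls into contiguous segments: a single noisy probe is not enough to open or close a TAR.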


Tiling array analyses: active sampling & supervised learning

• Active sampling: select a small set of sub-regions for validation first

• Supervised learning: use the validation data to train the statistical model

Source: Gerstein et al. (2007) Gen. Res.
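One way to picture the supervised step: once a set of probes has validated labels, the parameters of a two-state model can be estimated by simple counting. The sketch below assumes a two-state HMM with Gaussian emissions and fully labelled probes; the function name and toy data are hypothetical, and both states must appear as the source of at least one transition for the division to be defined.

```python
from statistics import mean, pstdev


def fit_two_state_hmm(signal, labels):
    """Maximum-likelihood estimates from fully labelled probes.

    `labels[i]` is 0 (inactive) or 1 (active) for probe i, e.g. from validation.
    Returns transition probabilities and per-state (mean, sd) of the signal.
    """
    counts = [[0, 0], [0, 0]]
    for a, b in zip(labels, labels[1:]):
        counts[a][b] += 1  # count each observed transition a -> b
    trans = [[c / sum(row) for c in row] for row in counts]
    emissions = []
    for s in (0, 1):
        values = [x for x, lab in zip(signal, labels) if lab == s]
        emissions.append((mean(values), pstdev(values)))
    return trans, emissions


# Toy validated sub-region: six probes, two of them active.
signal = [0.0, 0.2, 3.0, 3.2, 0.1, -0.1]
labels = [0, 0, 1, 1, 0, 0]
trans, emissions = fit_two_state_hmm(signal, labels)
print(trans)
print(emissions)
```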


Tiling array analyses: Sample sub-region selection

Can we find a good selection scheme?

Source: Gerstein et al. (2007) Gen. Res.


Tiling array analyses: Sample sub-region selection

• Sampling solely based on data D
  Some candidates
  • Random selection
  • Entropy based
  • KL-divergence based

• Testing the performance of these schemes
  Simulation
  • Why? So that we know S exactly
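A rough sketch of two of the candidate schemes, random and entropy-based selection. It assumes that some initial model has already assigned each probe a posterior probability of being active; that input, the fixed non-overlapping sub-regions, and the function names are assumptions for illustration. The KL-divergence scheme is not shown.

```python
import math
import random


def binary_entropy(p):
    """Entropy (in bits) of a Bernoulli(p) state call; 0 when p is 0 or 1."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def select_by_entropy(posteriors, region_size, n_regions):
    """Pick the sub-regions whose probes are, on average, most uncertain."""
    scored = []
    for start in range(0, len(posteriors) - region_size + 1, region_size):
        window = posteriors[start:start + region_size]
        scored.append((sum(binary_entropy(p) for p in window) / region_size, start))
    scored.sort(reverse=True)
    return sorted(start for _, start in scored[:n_regions])


def select_at_random(n_probes, region_size, n_regions, seed=0):
    """Baseline: choose non-overlapping sub-regions uniformly at random."""
    starts = list(range(0, n_probes - region_size + 1, region_size))
    return sorted(random.Random(seed).sample(starts, n_regions))


# Toy posteriors P(S = 1 | D) for eight probes.
posteriors = [0.01, 0.02, 0.5, 0.6, 0.99, 0.98, 0.4, 0.45]
chosen = select_by_entropy(posteriors, region_size=2, n_regions=2)
baseline = select_at_random(len(posteriors), region_size=2, n_regions=2)
print(chosen, baseline)
```

The entropy scheme spends the validation budget where the current calls are least certain, rather than on regions that are already clearly on or off.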


Tiling array analyses: Simulation results



Tiling array analyses: Simulation results (cont.)



Tiling array analyses: back to the real world

• Transcriptional tiling array
  Mainly using gene annotation to train the model
  • Somewhat analogous to using MaxEntropy to select the sample regions
  • The training set is expected to be noisy, but still leads to satisfying performance

• ChIP-chip tiling array
  Try to guess the signal distribution according to annotation information

• Ideal scenario
  Optimally select a medium-sized set of sample sub-regions
  Do experimental validations to build the model


Tiling array analyses: results on transcriptional data (ENCODE regions (~30Mb), training set (~7.5Mb), ¼ training set (~1.9Mb, ~0.1M probes))




Activity: Classifying Un-annotated Transcription


DART: Classification of Unannotated Transcription

• Large number of novel transcribed regions (TARs / transfrags) detected using tiling microarrays.

• Developed DART: Database of Active Regions and Tools
  Developed a classification procedure for these novel TARs/transfrags
  Database for storing & visualizing various sets of TARs/transfrags
  Associated tools for analyzing these sets

Rozowsky et al. Genome Research (2007, in press)


Set of All TARs
  Exonic TARs
  Pseudo TARs
  Novel TARs
    Intronic (Proximal, Distal)
    Intergenic (Proximal, Distal)
    ESTs

Rozowsky et al. Gen. Res. (2007, in press)


Schematic of Classification Procedure

Set of Novel TARs
S1A Filter Novel TARs for Unusual Sequence Composition → Peculiar TARs
S1B Filter Novel TARs for Cross-Hybridization → Cross-Hyb TARs
S2 Assign Novel TARs to Known Genes using Expression Profiles → Gene Assoc. TARs
S3A Cluster into Novel Transcribed Loci using Expression Profiles (EP) → Novel EP Loci
S3B Cluster into Novel Transcribed Loci using Phylogenetic Profiles (PP) → Novel PP Loci
Remainder: Singlet or Ambiguous TARs

Output sets are broken down by location (P D P D E: intronic proximal, intronic distal, intergenic proximal, intergenic distal, ESTs).

DART

Rozowsky et al. Genome Research (in press)


Rozowsky et al. Gen. Res. (2007, in press)

TAR clustering


DART: Database & Tools
- Interfaces with UCSC
- Tools use Ensembl API



Table 1: Locations of all TARs

                              Exonic        Pseudogenes   Unannotated Regions
Size of ENCODE Regions (bp)   1,776,157     144,745       28,077,158
Percentage of all ENCODE      5.9%          0.5%          93.6%
Number of TARs                3,666         195           6,988
Percentage of all TARs        33.8%         1.8%          64.4%

Locations of Novel TARs

                                     ESTs not    Intronic    Intronic    Intergenic  Intergenic
                                     in Exons    Proximal    Distal      Proximal    Distal
Size of Unannotated Regions (bp)     2,477,910   8,522,559   5,536,879   2,434,101   9,250,454
Percentage of Unannotated Regions    8.8%        30.2%       19.6%       8.6%        32.8%
Number of Novel TARs                 1,194       3,006       864         772         1,300
Percentage of all Novel TARs         16.7%       42.1%       12.1%       10.8%       18.2%

Table 2: Sets of Classified Novel TARs

                                                        Number   Percentage
Total                                                   6,988    100.0%
With peculiar sequence composition                      503      7.2%
Assigned to known genes                                 955      13.7%
Caused by cross-hybridization                           -        -
In novel transcribed loci using expression profiles     681      9.7%
In novel transcribed loci using phylogenetic profiles   782      11.2%
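As a quick arithmetic check, the percentage rows of Table 1 can be recomputed from the counts. The helper name is hypothetical; the counts are the ones printed in the table. Note that the five novel-TAR location counts sum to 7,136 rather than 6,988, and the printed percentages match that sum as the denominator.

```python
def percentages(counts):
    """Share of each category, as a percentage of the category total (1 decimal)."""
    total = sum(counts.values())
    return {name: round(100 * n / total, 1) for name, n in counts.items()}


all_tars = {"Exonic": 3666, "Pseudogenes": 195, "Unannotated Regions": 6988}
novel_tars = {
    "ESTs not in Exons": 1194,
    "Intronic Proximal": 3006,
    "Intronic Distal": 864,
    "Intergenic Proximal": 772,
    "Intergenic Distal": 1300,
}

all_pct = percentages(all_tars)
novel_pct = percentages(novel_tars)
print(all_pct)
print(sum(novel_tars.values()), novel_pct)
```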



DART classification has been experimentally validated with small-scale RT-PCR and sequencing experiments

Results:

18/46 (39%) confirmed by RT-PCR

4/5 Sequenced Products Map uniquely to correct genomic region

[Gel image: 14 samples (IDs A11 to A23 and B1), each run as a + and - lane pair, plus a ladder lane (L) with size markers at 0.5, 1, 1.5, and 2 Kb]

TAR 1

PCR Sequence   1 ttcttcggaaaagcacatgaactctttggagtctcctgttccacttggtaaatttcctat 60
                 |||||||||||||||||||||||||| |||||||||||||||||||||||||||||||||
Chr21 34,270,569 ttcttcggaaaagcacatgaactcttcggagtctcctgttccacttggtaaatttcctat 34,270,628

PCR Sequence  61 agctccgcactgaaagtccctgctgccctccttcctctgagcttgtggggcccacagatc 120
                 |||  |||||||||||||||||||||||||||||||||||||||||||||||||||||||
Chr21 34,270,629 agccacgcactgaaagtccctgctgccctccttcctctgagcttgtggggcccacagatc 34,270,688

PCR Sequence 121 ccctgctccacttcctgcttcatttcagctgat 153
                 |||||||||||||||||||||||||||||||||
Chr21 34,270,689 ccctgctccacttcctgcttcatttcagctgat 34,270,721

TAR 2

PCR Sequence 154 ggatgacactccctcgttctaataccatctgaatgcctgagcaattacatcttacaacct 213
                 ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Chr21 34,270,898 ggatgacactccctcgttctaataccatctgaatgcctgagcaattacatcttacaacct 34,270,957

PCR Sequence 214 catgaaaaacacagcagcttgtcacgatgaatg 246
                 |||||||||||||||||||||||||||||||||
Chr21 34,270,958 catgaaaaacacagcagcttgtcacgatgaatg 34,270,990
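The match between a sequenced product and the genome can be summarized by counting identical positions. The sketch below does this for the first 60-base block of the TAR 1 alignment (PCR positions 1 to 60 against Chr21 34,270,569 to 34,270,628); the function name is hypothetical and the comparison assumes a gap-free alignment.

```python
def alignment_identity(query, subject):
    """Return (matches, length, 1-based mismatch positions) for gap-free aligned strings."""
    if len(query) != len(subject):
        raise ValueError("sequences must have equal length")
    mismatches = [i + 1 for i, (q, s) in enumerate(zip(query, subject)) if q != s]
    return len(query) - len(mismatches), len(query), mismatches


pcr = "ttcttcggaaaagcacatgaactctttggagtctcctgttccacttggtaaatttcctat"
chr21 = "ttcttcggaaaagcacatgaactcttcggagtctcctgttccacttggtaaatttcctat"
matches, length, mismatches = alignment_identity(pcr, chr21)
print(f"{matches}/{length} identical; mismatches at {mismatches}")
```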

[Diagram: forward primer (FP) and reverse primer (RP) positioned on the novel TARs, with the PCR sequence between them]




Activity: Categories Groups of Binding Sites


Transcriptional regulation

▪ Protein-coding genes are transcribed by RNA polymerase II (Pol2) ...

▪ ... Under elaborate regulation by
  ▫ The binding of a complex set of transcription factors to their regulatory elements
  ▫ Histone modifications such as acetylation and methylation
  ▫ Chromatin remodeling

▪ Transcription factor binding sites include
  ▫ Core promoters

▫ Promoter proximal elements

▫ Other elements such as enhancers, silencers, insulators, and response elements

▪ Transcriptional regulatory elements can be globally mapped by high-throughput experiments such as ChIP-chip or ChIP-PET.

Zhang et al. (2007) Gen. Res.


ENCODE TR study

▪ First concerted effort to systematically identify TREs in the human genome on a large scale

▫ 105 lists of transcriptional regulatory elements in the ENCODE regions

▫ 29 transcription factors, 9 cell lines, 2 time points

▫ 7 laboratories and 3 different microarray platforms

▪ TFs and their TREs can be studied on various genomic levels.



Our TRE analysis approach

▪ On an intermediate genomic level, involving 10 ~ 100 kb of DNA with several genes on average.

▪ Try to present the problem and subsequently analyze the data in a consistent and coherent statistical framework.



Landscape of ENCODE TREs

▪ Positive correlation of the TRE density with both non-exonic conservation and gene density in a genomic region



Non-random distribution of TREs

▪ TREs are not evenly distributed throughout the ENCODE regions (P < 2.2×10^−16).

▪ The actual TRE distribution is power-law.

▪ The null distribution is ‘Poissonesque.’

▪ Many genomic subregions with extreme numbers of TREs.
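A simple way to see the contrast with a Poisson-like null is the variance-to-mean ratio of per-bin counts. The sketch below is a toy simulation under assumed numbers (region length, bin size, hotspot positions are all made up); it does not reproduce the ENCODE analysis or its power-law fit.

```python
import random
from statistics import mean, pvariance


def bin_counts(positions, region_length, bin_size):
    """Count elements falling in consecutive fixed-size bins."""
    counts = [0] * (region_length // bin_size)
    for pos in positions:
        counts[min(pos // bin_size, len(counts) - 1)] += 1
    return counts


def dispersion_index(counts):
    """Variance-to-mean ratio: about 1 for Poisson-like counts, larger when clustered."""
    return pvariance(counts) / mean(counts)


rng = random.Random(0)
region_length, bin_size, n_sites = 1_000_000, 10_000, 500

# Null model: sites dropped uniformly at random along the region.
uniform = [rng.randrange(region_length) for _ in range(n_sites)]

# Toy clustered model: sites packed around a few hotspots (made-up numbers).
hotspots = [100_000, 400_000, 850_000]
clustered = [min(region_length - 1, max(0, int(rng.gauss(rng.choice(hotspots), 5_000))))
             for _ in range(n_sites)]

d_uniform = dispersion_index(bin_counts(uniform, region_length, bin_size))
d_clustered = dispersion_index(bin_counts(clustered, region_length, bin_size))
print(d_uniform, d_clustered)
```

A ratio far above 1 means many bins hold either far more or far fewer elements than the random expectation, which is the situation the slide describes.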



Local TRE enrichment and depletion

▪ Hundreds of TRE ‘islands’ and ‘deserts’ are identified in ENCODE regions.

▪ The longest island is composed of 68 various TREs and covers a 35-kb region near the HOXA cluster on chromosome 7.

▪ The entirety of EHD1 on chromosome 11 is covered by TRE islands.

▪ Some islands are located in the intergenic regions of the genome.

dart.gersteinlab.org/encode/tr/
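One hedged way to call islands and deserts is to compare the TRE count in each window against a Poisson expectation. This is an illustrative sketch, not the procedure of Zhang et al.: the fixed windows, the significance threshold, and the lack of any multiple-testing correction are simplifying assumptions.

```python
import math


def poisson_sf(k, lam):
    """P(X >= k) for X ~ Poisson(lam)."""
    return 1.0 - sum(math.exp(-lam) * lam ** i / math.factorial(i) for i in range(k))


def poisson_cdf(k, lam):
    """P(X <= k) for X ~ Poisson(lam)."""
    return sum(math.exp(-lam) * lam ** i / math.factorial(i) for i in range(k + 1))


def call_islands_and_deserts(counts, alpha=0.01):
    """Label each window as 'island', 'desert', or 'neutral' against a Poisson null."""
    lam = sum(counts) / len(counts)  # expected TREs per window under the null
    labels = []
    for c in counts:
        if poisson_sf(c, lam) < alpha:
            labels.append("island")   # significantly enriched
        elif poisson_cdf(c, lam) < alpha:
            labels.append("desert")   # significantly depleted
        else:
            labels.append("neutral")
    return labels


# Toy TRE counts in ten consecutive windows.
window_counts = [6, 5, 0, 7, 20, 4, 6, 5, 0, 7]
window_labels = call_islands_and_deserts(window_counts)
print(window_labels)
```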




Activity: Connecting Intergenic Activity to Pseudogenes


5 Methods of Assignment

• 4 automatic pipelines
  retroFinder, pseudoFinder, PseudoPipe, GIS
  Comparing protein or transcript v genomic DNA, filtering, application of rules

• HAVANA manual

• What is a pseudogene?
  Different criteria

• Conservative approach here
  Can't overlap gene annotation
  Need to have a protein alignment
  201 pseudogenes (in comparison to ~400 genes)

Zheng et al. (2007) Gen. Res.


Overlap of Pseudogenes by 5 Different Methods

Union of 252
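Comparing calls from several methods comes down to merging overlapping intervals and recording which methods support each merged locus. The sketch below shows the idea on toy coordinates; the method names are placeholders, and the "at least two methods" filter is only an example, not the consensus rule used for the 201 pseudogenes.

```python
def merge_calls(calls_by_method):
    """Merge overlapping calls from several methods into loci.

    `calls_by_method` maps a method name to a list of (start, end) intervals
    on one chromosome (end exclusive). Returns (start, end, supporting methods).
    """
    tagged = sorted((start, end, method)
                    for method, calls in calls_by_method.items()
                    for start, end in calls)
    loci = []
    for start, end, method in tagged:
        if loci and start < loci[-1][1]:  # overlaps the current locus
            prev_start, prev_end, methods = loci[-1]
            loci[-1] = (prev_start, max(prev_end, end), methods | {method})
        else:
            loci.append((start, end, {method}))
    return loci


# Toy coordinates; method names are placeholders, not real pipeline output.
calls = {
    "method_A": [(100, 200), (500, 650)],
    "method_B": [(150, 260), (900, 1000)],
    "method_C": [(120, 210), (520, 640)],
}
loci = merge_calls(calls)
union_size = len(loci)
supported = [locus for locus in loci if len(locus[2]) >= 2]
print(union_size, len(supported))
```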



Ex. Pseudogene Intersecting Transcriptional Evidence

Special tracks in browser: diTAG, CAGE, TARs, ChIP-chip



Intersection of Pseudogenes with Transcriptional Evidence




Excluding TARs (due to cross-hyb issues)

Targeted RACE expts to 160 pseudogenes give 14

Total Evidence from Sequencing is 38 of 201 (with 5 having cryptic promoters)



Targeted Transcription Expts.

• RACE expts
  Interrogated 160 pseudogenes (49 non-processed & 111 processed)
  In 51 cases (26 non-processed and 25 processed pseudogenes), could design distinguishing primers (>4 mismatched bp v. parent)
  The resulting data supported transcription from 14 (8 processed and 6 non-processed) of the 160 pseudogenes (9 with pseudogene-specific primers)
  These numbers might represent a conservative estimate since a RACEfrag was assigned to its parent gene by default if it could be mapped to both a parent locus and a pseudogene locus.

• RACE expts + sequencing (CAGE, PET, EST and mRNA): unambiguous evidence for pseudogene transcription
  All together, these data indicate 38 of 201 pseudogenes being the source of novel RNA transcripts
  5 of these had cryptic promoters (from TR analysis)
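The "distinguishing primer" criterion (more than 4 mismatched bases against the parent gene) can be checked mechanically. The sketch below uses made-up 20-mers and hypothetical function names, and it assumes the primer and the corresponding parent site are already aligned without gaps.

```python
def count_mismatches(primer, parent_site):
    """Number of positions at which the primer differs from the parent-gene sequence."""
    if len(primer) != len(parent_site):
        raise ValueError("primer and parent site must be the same length")
    return sum(1 for p, q in zip(primer.lower(), parent_site.lower()) if p != q)


def is_distinguishing(primer, parent_site, min_mismatches=5):
    """True when the primer has more than 4 mismatched bases against the parent."""
    return count_mismatches(primer, parent_site) >= min_mismatches


# Made-up sequences for illustration only.
pseudogene_primer = "acgtacgtacgtacgtacgt"
parent_close = "acgaacgtatgtacctacga"    # differs at 4 positions
parent_distant = "acgaacgtatgtacctaaga"  # differs at 5 positions

print(count_mismatches(pseudogene_primer, parent_close),
      is_distinguishing(pseudogene_primer, parent_close))
print(count_mismatches(pseudogene_primer, parent_distant),
      is_distinguishing(pseudogene_primer, parent_distant))
```

With exactly 4 mismatches the primer is not counted as distinguishing, which matches the ">4" wording on the slide.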



Extension to Whole Genome

• 233 Transcribed from ~8000 Processed Pseudogenes

• Evidence for Transcription
  8% Refseq mRNAs
  32% Unigene consensus sequences
  72% dbEST expressed sequence tags
  32% Oligonucleotide microarray data (extra support)

• Highly decayed
  Fraction with Ka/Ks ≥ 0.5 is 54%

Harrison et al. (2005) NAR

Genes & Pseudogenes

Zheng & Gerstein, TIG (2007)

Genes or Pseudogenes?

Zheng & Gerstein, TIG (2007)


Conclusions

• Tilescope Processing Pipeline
  HMM Segmentation & optimal selection of regions to validate

• DART classification of TARs
  1300 clusters of transcriptionally active regions in ENCODE

• Deserts and Forests of Binding Activity on ~50kb scale

• Pseudogene Activity
  Consensus annotation from automatic pipelines and manual curation gives 201 (~2/3 processed)
  ~19% appear to be transcribed (38/201)

Acknowledgements

pseudogene.org, tiling.gersteinlab.org

P Bertone

J Rozowsky

G Euskirchen

T Royce

S Balasubramanian

J Korbel

A Karpikov

D Zheng

Z Zhang

D Yan

R Sasidharan

O Emanuelsson

J Du

J Rinn

MS

MG

PM

SW

V Stolc

R Martone

P Harrison


N Luscombe

C Bruce

J Chang

N Carriero

N Echols

J Karro


ENCODE Acknowledgements

Adam Frankish, Robert Baertsch, Philipp Kapranov, Alexandre Reymond, Siew Woh Choo, Yontao Lu, France Denoeud, Stylianos Antonarakis, Yijun Ruan, Chia-Lin Wei, Thomas Gingeras, Roderic Guigo, Jennifer Harrow

Sanger, UCSC, GIS, AFFX, Geneva, IMIM


Permissions Statement

This Presentation is copyright Mark Gerstein, Yale University, 2007.

Feel free to use images in it with PROPER acknowledgement (via citation to relevant papers or link to gersteinlab.org).
