localization analysis 11/07/07. microarray probes are oligonucleotide sequences with regular spacing...

Localization Analysis 11/07/07

Upload: veronica-garrett

Post on 16-Dec-2015


TRANSCRIPT

Page 1: Localization Analysis 11/07/07. Microarray probes are oligonucleotide sequences with regular spacing covering a whole genomic region. chromosome Tiling

Localization Analysis

11/07/07

Page 2:

• Microarray probes are oligonucleotide sequences with regular spacing covering a whole genomic region.

chromosome

Tiling arrays
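As a rough illustration of "regular spacing covering a whole genomic region", the sketch below cuts a sequence into fixed-length probes at a fixed step. The function name `tile_probes`, the 25-base probe length, the 10-base step, and the toy sequence are assumptions chosen for illustration, not a real array design.

```python
def tile_probes(sequence, probe_len=25, step=10):
    """Return (start, probe) pairs for probes of probe_len bases,
    with consecutive probe starts spaced step bases apart."""
    probes = []
    for start in range(0, len(sequence) - probe_len + 1, step):
        probes.append((start, sequence[start:start + probe_len]))
    return probes


# Hypothetical 80 bp region; a real design would use the genomic sequence.
region = "ACGT" * 20
probes = tile_probes(region, probe_len=25, step=10)
for start, probe in probes:
    print(start, probe)
```

With a step smaller than the probe length the probes overlap; with a larger step there are gaps between probes.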

Page 3:

Tiling Arrays

http://en.wikipedia.org/

Page 4:

Typical applications:

Comparative Genomic Hybridization (aCGH) – copy number variation

RNA analysis: transcript structure, transcript discovery, etc.

Location analysis: nuclease sensitivity

Location analysis: chromatin immunoprecipitation (ChIP)

NOTE: ALL of these things can also be done by deep sequencing, which we will briefly cover towards the end

Page 5:

[Figure: plot of two data series (Series1, Series2); vertical axis from -2.5 to 2.5]

Spike-in experiments – we can find linkers as short as 7 bp

Plot labels: location of labeled PCR product; measured red/green ratio

Page 6:

Experimental Determination of Cross-Hybridization

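Spike in PCR product – a probe with a single genomic target goes from 1 to 1+1 copies, a ratio of (1+1)/1 = 2, while a probe that cross-hybridizes to n targets gives (1+n)/n, which is smaller for any n > 1. So cross-hybridizing (X-hyb) probes will detect less enrichment experimentally.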
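The arithmetic behind the (1+1)/1 > (1+n)/n argument can be checked with a few lines. This is only a sketch of the idealized ratio, assuming every target contributes equally to the signal; the function name `expected_ratio` is made up for this example.

```python
import math


def expected_ratio(n_targets, spiked_copies=1):
    """Idealized spiked/unspiked signal ratio for a probe that hybridizes
    to n_targets loci when spiked_copies extra copies of one locus are added."""
    return (n_targets + spiked_copies) / n_targets


for n in (1, 2, 5):
    ratio = expected_ratio(n)
    print(n, ratio, round(math.log2(ratio), 3))
```

The more loci a probe cross-hybridizes to, the closer its ratio falls toward 1 (log ratio toward 0).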

[Figure: plot of two data series (Series1, Series3) with cross-hybridizing (X-hyb) probes marked; vertical axis from -8 to 6]

Page 7:

Spike-in data

[Figure: two spike-in plots, each showing Series1 and Series2; the first has a vertical axis from -2 to 1.5 and a horizontal axis from 1 to 199, the second a vertical axis from -4 to 2 and a horizontal axis from 1 to 456]

Page 8:

Array CGH Technology

Page 9:

Genome-wide measurement of DNA copy number alteration by array CGH

Pollack J R et al. PNAS 2002;99:12963-12968

©2002 by The National Academy of Sciences

Page 10:

DNA copy number alteration across chromosome 8 by array CGH

Pollack J R et al. PNAS 2002;99:12963-12968

©2002 by The National Academy of Sciences

Page 11:

Typical applications:

Comparative Genomic Hybridization (aCGH) – copy number variation

RNA analysis: transcript structure, transcript discovery, etc.

Location analysis: nuclease sensitivity

Location analysis: chromatin immunoprecipitation (ChIP)

NOTE: ALL of these things can also be done by deep sequencing, which we will briefly cover towards the end

Page 12:

RNA vs genomic

5’ UTR

3’ UTR

Page 13:

Tiling of the Hox loci – mRNA vs. genomic

Page 14:
Page 15:
Page 16:

ZY Xu et al. Nature 000, 1-5 (2009) doi:10.1038/nature07728

Transcript maps.

Page 17:

Typical applications:

Comparative Genomic Hybridization (aCGH) – copy number variation

RNA analysis: transcript structure, transcript discovery, etc.

Location analysis: nuclease sensitivity

Location analysis: chromatin immunoprecipitation (ChIP)

NOTE: ALL of these things can also be done by deep sequencing, which we will briefly cover towards the end

Page 18:

DNaseI HS profiling

Page 19:

DHS profiling identifies promoters, enhancers, and insulators

Page 20:

Isolation of nucleosomal DNA

Page 21:
Page 22:

Typical applications:

Comparative Genomic Hybridization (aCGH) – copy number variation

RNA analysis: transcript structure, transcript discovery, etc.

Location analysis: nuclease sensitivity

Location analysis: chromatin immunoprecipitation (ChIP)

NOTE: ALL of these things can also be done by deep sequencing, which we will briefly cover towards the end

Page 23:

Experimental Protocol

• Step 1: crosslink protein with DNA

• Step 2: sonication (break DNA)

Kim and Ren 2007

Page 24:

Experimental Protocol

• Step 1: crosslink – fix protein with DNA

• Step 2: sonication – break DNA

• Step 3: immuno-precipitation – pull down target protein by specific antibody

Kim and Ren 2007

Page 25:

Experimental Protocol

• Step 1: crosslink – fix protein with DNA

• Step 2: sonication – break DNA

• Step 3: immuno-precipitation – pull down target protein by specific antibody

• Step 4: hybridization – hybridize input and pulled-down DNA on microarray

Kim and Ren 2007

Page 26:

Chromatin Immuno-precipitation

Page 27:

Tiling Array Data

Each TF binding signal is represented by multiple probes.

Need more sophisticated statistical tools.

Kim and Ren 2007

Page 28:

Boyer et al. 2005

Tiling arrays provide high resolution for identifying bound fragments

Overlapping 25-mer fragments

Page 29:

Mapping histone modifications

Page 30:

Chromatin’s primary structure

Page 31:

OK, now what?

•Analysis method strongly depends on how widespread the thing being examined is, and if you have a guess regarding its localization

•CGH: Just look!

•TF ChIP-chip, DHS: peak finding algorithms (BUT BUT BUT).

•RNA, chromatin marks: Hidden Markov Models, aggregation plots

Page 32:

CGH Array Segmentation

• Key idea: Most probe targets have same copy number as their next neighbors

• Can average over neighbors

• Key issue: when is a difference real?

• Recommended Programs:

– DNACopy: solid statistical basis; slow

– StepGram: heuristic; fast
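To make the "when is a difference real?" question concrete, here is a minimal sketch that scans every possible split of a run of log ratios and keeps the one with the largest two-sample t-statistic between the left and right segments. This is a toy single-breakpoint search, not the algorithm used by DNACopy or StepGram; the function name, `min_size`, and the data are assumptions for illustration.

```python
import math
from statistics import mean, variance


def best_breakpoint(log_ratios, min_size=3):
    """Return (index, t) for the split log_ratios[:index] / log_ratios[index:]
    that maximizes |t| between the two segments."""
    best_index, best_t = None, 0.0
    n = len(log_ratios)
    for k in range(min_size, n - min_size + 1):
        left, right = log_ratios[:k], log_ratios[k:]
        se = math.sqrt(variance(left) / len(left) + variance(right) / len(right))
        if se == 0:
            continue  # both segments perfectly flat; t is undefined
        t = (mean(right) - mean(left)) / se
        if abs(t) > abs(best_t):
            best_index, best_t = k, t
    return best_index, best_t


# Toy data: five probes near log ratio 0, then five probes near 1 (a gain).
log_ratios = [0.1, -0.1, 0.0, 0.05, -0.05, 1.0, 1.1, 0.9, 1.05, 0.95]
index, t = best_breakpoint(log_ratios)
print(index, round(t, 1))
```

Because the best split is chosen after looking at all candidates, its t value cannot be read against an ordinary t-table; a permutation test is one common way to judge whether the split is real.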

Page 33:

Methods

• Moving average t-test (Keles et al. 2004)

• HMM (Li et al. 2005; Yuan et al. 2005)

• Tilemap (Ji and Wong 2005)

• MAT (Johnson et al. 2006)

Page 34:

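Keles’ method

• Calculate a two-sample t-statistic at each probe i, where Y2 is the ChIP signal and Y1 is the input signal:

$$T_{i,n} = \frac{\bar{Y}_{2,i} - \bar{Y}_{1,i}}{\sqrt{\hat{\sigma}^2_{1,i}/n_1 + \hat{\sigma}^2_{2,i}/n_2}}$$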

Keles et al. 2004

Page 35:

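Keles’ method

• Calculate a two-sample t-statistic at each probe i, where Y2 is the ChIP signal and Y1 is the input signal:

$$T_{i,n} = \frac{\bar{Y}_{2,i} - \bar{Y}_{1,i}}{\sqrt{\hat{\sigma}^2_{1,i}/n_1 + \hat{\sigma}^2_{2,i}/n_2}}$$

• Moving average scan-statistic over a window of w probes:

$$T^*_{i,n} = \frac{1}{w}\sum_{h=i}^{i+w-1} T_{h,n}$$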
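The two statistics on this slide can be sketched in a few lines: a per-probe two-sample t-statistic comparing ChIP replicates with input replicates, followed by an average over w consecutive probes. The function names and the replicate values below are invented for illustration; this follows the formulas as read from the slide, not code from Keles et al.

```python
import math
from statistics import mean, variance


def probe_t(chip, inp):
    """Two-sample t-statistic for one probe.
    chip and inp are lists of replicate measurements (ChIP = Y2, input = Y1)."""
    se = math.sqrt(variance(inp) / len(inp) + variance(chip) / len(chip))
    return (mean(chip) - mean(inp)) / se


def moving_average_t(t_stats, w):
    """Scan statistic: mean of the per-probe t-statistics T_i .. T_{i+w-1}."""
    return [sum(t_stats[i:i + w]) / w for i in range(len(t_stats) - w + 1)]


# Hypothetical data: 5 probes along the genome, 3 replicates per channel.
chip = [[0.1, -0.1, 0.0], [0.2, 0.0, 0.1], [2.0, 2.2, 1.8], [2.1, 1.9, 2.3], [0.0, 0.1, -0.2]]
inp = [[0.0, 0.1, -0.1], [0.1, -0.1, 0.0], [0.0, 0.2, -0.2], [0.1, -0.1, 0.0], [0.0, 0.1, -0.1]]

t_stats = [probe_t(c, i) for c, i in zip(chip, inp)]
smoothed = moving_average_t(t_stats, w=2)
print([round(t, 2) for t in t_stats])
print([round(t, 2) for t in smoothed])
```

Averaging over neighbors rewards runs of consecutive enriched probes, which is what a sheared ChIP fragment population is expected to produce around a binding site.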

Page 36:

Multiple hypothesis testing

• Multiple hypothesis testing needs to be considered to control false positive error rates.

• What is the null distribution of this statistic?

$$T^*_{i,n} = \frac{1}{w}\sum_{h=i}^{i+w-1} T_{h,n}$$

Page 37:

Multiple hypothesis testing

• Assume $T_{h,n}$ has a t-distribution

• Approximate $T^*_{i,n} = \frac{1}{w}\sum_{h=i}^{i+w-1} T_{h,n}$ by a normal distribution.

• Alternatively, can use a resampling method to estimate the null distribution.
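One possible resampling scheme, sketched below, shuffles the per-probe statistics along the genome, recomputes the moving average, and records the maximum window in each shuffle. Comparing each observed window to that distribution of maxima gives a family-wise p-value, which handles the multiple-testing issue at the same time. This is an illustrative assumption about how resampling could be done, not necessarily the procedure in Keles et al.; the function names, data, and number of permutations are made up.

```python
import random


def moving_average(values, w):
    """Mean of each run of w consecutive values."""
    return [sum(values[i:i + w]) / w for i in range(len(values) - w + 1)]


def permutation_pvalues(t_stats, w, n_perm=1000, seed=0):
    """Family-wise p-value per window: share of shuffles whose largest
    window average is at least as large as the observed window average."""
    rng = random.Random(seed)
    observed = moving_average(t_stats, w)
    shuffled = list(t_stats)
    null_max = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # breaks any spatial clustering of high values
        null_max.append(max(moving_average(shuffled, w)))
    return [(1 + sum(m >= obs for m in null_max)) / (n_perm + 1) for obs in observed]


# Toy data: low-level background with three consecutive high probes at 12..14.
background = [0.1 * ((i * 7) % 5 - 2) for i in range(30)]
t_stats = background[:12] + [6.0, 7.0, 6.0] + background[12:]
pvals = permutation_pvalues(t_stats, w=3)
print(min(pvals), pvals.index(min(pvals)))
```

Shuffling keeps the set of values but destroys their order, so only windows whose high values sit next to each other stand out against the null.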

Page 38:

ChIPOTle: a simple method for identifying ‘bound’ genomic fragments (Buck et al. 2005)

Assumption: a real binding site will have a distribution of bound fragments encapsulating it. Therefore, true positives will likely have multiple, contiguous fragments with high signal.

1. Walk across tiled genomic probes with user-defined window size

2. Calculate mean signal intensity within each window

3. Estimate p-value of binding (Bonferroni-corrected) based on a standard error model or by permuting the dataset.
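The three steps above can be sketched as a sliding-window mean with a Gaussian p-value and a Bonferroni correction. This is not the ChIPOTle code itself: the function name is invented, and estimating the noise standard deviation from the negative log ratios (treated as mirrored around zero) is an assumption made for this example.

```python
import math


def window_pvalues(log_ratios, window=3, sigma=None):
    """Return (start, window_mean, corrected_p) for every window.
    p is a one-sided Gaussian tail probability, Bonferroni-corrected
    by the number of windows tested."""
    if sigma is None:
        # Assumption: negative log ratios are pure noise, symmetric around 0.
        negatives = [x for x in log_ratios if x < 0]
        if not negatives:
            raise ValueError("no negative values; pass sigma explicitly")
        sigma = math.sqrt(sum(x * x for x in negatives) / len(negatives))
    n_windows = len(log_ratios) - window + 1
    results = []
    for start in range(n_windows):
        m = sum(log_ratios[start:start + window]) / window
        z = m / (sigma / math.sqrt(window))       # standard error of a window mean
        p = 0.5 * math.erfc(z / math.sqrt(2))     # P(Z >= z) for a standard normal
        results.append((start, m, min(1.0, p * n_windows)))
    return results


# Toy log2(IP/input) values with an enriched stretch at probes 5..7.
log_ratios = [-0.2, 0.1, -0.1, 0.2, -0.3, 2.0, 2.5, 2.2, 0.1, -0.2, 0.3, -0.1]
results = window_pvalues(log_ratios, window=3)
best = min(results, key=lambda r: r[2])
print(best)
```

The standard error formula treats probes in a window as independent; overlapping probes violate that, which is one reason the slide also mentions permutation.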

Page 39:

BUT:

• Extensive low-affinity transcriptional interactions in the yeast genome

• Amos Tanay

• Genome Research 2006

Page 40:

OK, what about more continuous data like RNA or chromatin marks?

Page 41:

Inferring nucleosomes: HMM

Page 42:
Page 43:

A Hidden Markov Model objectively identifies nucleosome positions

Page 44:

Hidden Markov Models for Identifying Bound Fragments

HMMs are trained on known data to recognize different states (e.g. bound vs. unbound fragments) and the probability of moving between those states

Example: ChIP-chip data from a tiling microarray identifying regions bound to a transcription complex with a known 50bp binding sequence.

You expect that a bound fragment will have high signal on the array and that the bound fragment will be 2-3 probes long.

Once trained, an HMM can be used to identify the ‘hidden’ states in an unknown dataset, based on the known characteristics of each state (‘emission probabilities’) and the probability of moving between states (‘transition probabilities’)

Example: “A hidden Markov model for analyzing ChIP-chip experiments on genome tiling arrays and its application to p53 binding sequences” 2005. Li, Meyer, Liu

Page 45:

Example: ChIP-chip data from a tiling microarray identifying regions bound to a transcription complex with a known 50bp binding sequence.

You expect that a bound fragment will have high signal on the array and that the bound fragment will be 2-3 probes long.

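I = Intensity units > 10,000; i = Intensity units < 10,000

States, in order along the diagram, with their intensity probabilities:

Unbound 25mer: P( I ) = 0.2, P( i ) = 0.8

Bound 25mer: P( I ) = 0.8, P( i ) = 0.2

Bound 25mer: P( I ) = 0.8, P( i ) = 0.2

Bound 25mer: P( I ) = 0.8, P( i ) = 0.2

Arrow labels in the diagram: P = 0.5, P = 0.5, P = 1.0, P = 0, P = 0.7, P = 0.3, P = 1.0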

Page 46:

Example: ChIP-chip data from a tiling microarray identifying regions bound to a transcription complex with a known 50bp binding sequence.

You expect that a bound fragment will have high signal on the array and that the bound fragment will be 2-3 probes long.

States, in order along the diagram: Unbound 25mer, Bound 25mer, Bound 25mer, Bound 25mer

Emission Probabilities

Unbound 25mer: P( I ) = 0.2, P( i ) = 0.8

Bound 25mer: P( I ) = 0.8, P( i ) = 0.2

Bound 25mer: P( I ) = 0.8, P( i ) = 0.2

Bound 25mer: P( I ) = 0.8, P( i ) = 0.2

Transition Probabilities

Arrow labels in the diagram: P = 0.5, P = 0.5, P = 1.0, P = 0, P = 0.7, P = 0.3, P = 1.0

Given the data, the HMM considers many different paths through these states and gives back the most probable one
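The toy model on this slide is small enough to decode by hand, and the Viterbi algorithm does the same thing systematically: it finds the most probable state path for a sequence of high (I) and low (i) intensities. The emission values below are taken from the slide. The assignment of the transition probabilities to specific arrows (Unbound stays or enters the first bound state with 0.5 each, the first bound state always continues, the second continues with 0.7 or exits with 0.3, the third always exits) is a reading of the flattened figure and should be treated as an assumption, as should the choice to start in the unbound state.

```python
import math

STATES = ["U", "B1", "B2", "B3"]  # unbound, then 1st/2nd/3rd bound probe

EMIT = {
    "U": {"I": 0.2, "i": 0.8},
    "B1": {"I": 0.8, "i": 0.2},
    "B2": {"I": 0.8, "i": 0.2},
    "B3": {"I": 0.8, "i": 0.2},
}

# Assumed mapping of the arrow labels to transitions (see text above).
TRANS = {
    "U": {"U": 0.5, "B1": 0.5},
    "B1": {"B2": 1.0},
    "B2": {"B3": 0.7, "U": 0.3},
    "B3": {"U": 1.0},
}

START = {"U": 1.0}  # assumption: the path begins in the unbound state


def log(p):
    """Natural log that maps probability 0 to -infinity."""
    return math.log(p) if p > 0 else float("-inf")


def viterbi(observations):
    """Return the most probable state path for a list of 'I'/'i' symbols."""
    score = {s: log(START.get(s, 0.0)) + log(EMIT[s][observations[0]]) for s in STATES}
    backpointers = []
    for obs in observations[1:]:
        new_score, pointers = {}, {}
        for s in STATES:
            prev = max(STATES, key=lambda r: score[r] + log(TRANS[r].get(s, 0.0)))
            new_score[s] = score[prev] + log(TRANS[prev].get(s, 0.0)) + log(EMIT[s][obs])
            pointers[s] = prev
        backpointers.append(pointers)
        score = new_score
    path = [max(STATES, key=lambda s: score[s])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    return path[::-1]


print(viterbi(list("iiIIIii")))
print(viterbi(list("iIIi")))
```

Under these assumed transitions a bound run can only be two or three probes long, which encodes the expectation stated on the slide.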

Page 47:

Other types and uses of microarrays: aCGH

CGH (comparative genomic hybridization) looks at cytogenetic abnormalities

•genomic DNA hybridized to array

•often uses large clones (e.g., BACs) as array features

Page 48:

Validation of data

There’s no way that all of your microarray data can be validated.

It’s strongly recommended that any key findings be verified by independent means.

Northern blots and quantitative RT-PCR are the typical ways of doing this; real-time, quantitative RT-PCR is generally the method of choice.

Page 49:

Chromatin’s primary structure

Page 50:

One way to turn this 1D trace into 2D is via an “averageogram”
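An averageogram (aggregation plot) can be sketched as follows: cut a window out of the 1D trace around each anchor position (for example each nucleosome-free region), stack the windows as rows of a 2D matrix, and average down the columns. The function name, the anchor positions, and the signal values are hypothetical.

```python
def aggregate_profile(signal, anchors, flank):
    """Stack windows of +/- flank probes around each anchor index.
    Returns (matrix, profile): the aligned windows and their column means.
    Anchors too close to either end of the signal are skipped."""
    matrix = [
        signal[a - flank:a + flank + 1]
        for a in anchors
        if a - flank >= 0 and a + flank < len(signal)
    ]
    if not matrix:
        raise ValueError("no anchor has a complete window")
    profile = [sum(column) / len(matrix) for column in zip(*matrix)]
    return matrix, profile


# Toy trace with two peaks; anchors mark the peak centers.
signal = [0, 1, 5, 1, 0, 0, 2, 6, 2, 0]
matrix, profile = aggregate_profile(signal, anchors=[2, 7], flank=1)
print(matrix)
print(profile)
```

The matrix can be shown as a heat map (one row per locus), and the profile as the average curve relative to the alignment point.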

Page 51:

H4 K16 Acetyl, aligned by NFR

Page 52:
Page 53:

Beyond Transcription

[Figure: two charts, “% nucleosomes (Printed Arrays)” and “% exchange events (Printed Arrays)”, each broken down by category: CDS, TSS3, TSS5, promoter, Null, tRNA, ARS]

Page 54:
Page 55:

Multiple visualizations of tiling data

Page 56:

RNA-Seq

Lockhart and Winzeler 2000

Wang et al. 2009

Page 57:

RNA-Seq

• Whole Transcriptome Shotgun Sequencing

– Sequencing cDNA

– Using NexGen (next-generation sequencing) technology

• Revolutionary Tool for Transcriptomics

– More precise measurements

– Ability to do large scale experiments with little starting material

Page 58:

RNA-Seq Experiment

Wang et al. 2009

Page 59:

Mapping

• Create unique scaffolds

– Harder algorithms with such short reads

Page 60:

Unbiased sequencing of the yeast transcriptome

Yassour M et al. PNAS 2009;106:3264-3269

©2009 by National Academy of Sciences

Page 61:

Mapping

• Place reads onto a known genomic scaffold

– Requires known genome and depends on accuracy of the reference
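As a bare-bones picture of what "placing reads onto a known scaffold" means, the sketch below looks up every exact, forward-strand occurrence of each read in a reference string. Real aligners use indexed data structures and tolerate mismatches and the reverse strand; the function name and the sequences here are invented for illustration.

```python
def map_reads(reference, reads):
    """Return {read: [0-based start positions]} for exact forward-strand matches."""
    hits = {}
    for read in reads:
        positions = []
        start = reference.find(read)
        while start != -1:
            positions.append(start)
            start = reference.find(read, start + 1)  # allow overlapping hits
        hits[read] = positions
    return hits


reference = "ACGTACGTTAGC"           # hypothetical reference sequence
reads = ["ACGT", "TAGC", "GGGG"]     # hypothetical short reads
hits = map_reads(reference, reads)
print(hits)
```

The three toy reads show the three outcomes: a read that maps to more than one place, a read that maps uniquely, and a read that does not map at all (for example because of a sequencing error or an error in the reference).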

http://en.wikipedia.org/

Page 62:

Ab initio assembly of a transcript catalog

Yassour M et al. PNAS 2009;106:3264-3269

©2009 by National Academy of Sciences

Page 63:

Biases

Wang et al. 2009

Page 64:

What the data look like

Page 65:

Superimposing channels

Giresi et al, Genome Res. 10

Page 66:

Experimental Design for Microarrays

There are a number of important experimental design considerations for a microarray experiment:

•technical vs biological replicates

•amplification of RNA

•dye swaps

•reference samples

Page 67:

Experimental Design for Microarrays

Technical vs biological replicates

•technical replicates are repeat hybridizations using the same RNA isolate

•biological replicates use RNA isolated from separate experiments/experimental organisms

Although technical replicates can be useful for reducing variation due to hybridization, imaging, etc., biological replicates are necessary for a properly controlled experiment

Page 68:

Experimental Design for Microarrays

Amplification of RNA

• linear amplification methods can be used to increase the amount of RNA so that microarray experiments can be performed using very small numbers of cells. It’s not clear to what degree this affects results, especially with respect to rare transcripts, but seems to be generally OK if done correctly

Page 69:

Experimental Design for Microarrays

Dye swaps

When using 2-color arrays, it’s important to hybridize replicates using a dye-swap strategy in which the colors (labels) are reversed between the two replicates. This is because there can be biases in hybridization intensity due to which dye is used (even when the sequence is the same).

[Figure: dye-swap diagram showing samples S1 and S2 on two arrays]
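A simple additive model shows why the swap helps. Suppose the measured log ratio on the first array is the true S1/S2 log ratio plus a dye bias, and on the swapped array it is minus the true log ratio plus the same bias. Half the difference then recovers the true value and half the sum recovers the bias. The model, the function name, and the numbers are assumptions for illustration.

```python
def dye_swap_estimate(m_forward, m_swapped):
    """Per-feature estimates from a dye-swap pair.
    m_forward: log ratios with S1 in red and S2 in green.
    m_swapped: log ratios (red/green) with the labels reversed.
    Returns (true_log_ratio, dye_bias) lists under an additive bias model."""
    true_log_ratio = [(f - s) / 2 for f, s in zip(m_forward, m_swapped)]
    dye_bias = [(f + s) / 2 for f, s in zip(m_forward, m_swapped)]
    return true_log_ratio, dye_bias


# Two features with true log ratios +1 and -1 and a dye bias of 0.2.
m_forward = [1.2, -0.8]
m_swapped = [-0.8, 1.2]
true_lr, bias = dye_swap_estimate(m_forward, m_swapped)
print(true_lr, bias)
```

Without the swapped array, the 0.2 bias would be indistinguishable from a small real difference between the samples.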

Page 70:

Experimental Design for Microarrays

Reference samples

•one common strategy is to use a reference sample in one channel on each array. This is usually something that will hybridize to most of the features (e.g., a complex RNA mixture). Using a reference sample allows comparisons to be made between different experimental conditions, as each is compared to the common reference.

[Figure: samples S1, S2, and S3, each hybridized against the reference R on its own array]

compare S1/R vs. S2/R vs. S3/R
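In log space the comparison through a common reference is a subtraction: log2(S1/S2) is estimated as log2(S1/R) minus log2(S2/R), with each term measured on its own array. The sketch below uses invented intensities and a made-up function name, and assumes the reference behaves the same on both arrays.

```python
import math


def indirect_log_ratio(s1, r_on_array1, s2, r_on_array2):
    """Estimate log2(S1/S2) for one feature from two arrays that each
    carry the same reference sample in the other channel."""
    return math.log2(s1 / r_on_array1) - math.log2(s2 / r_on_array2)


# Hypothetical intensities for one feature.
estimate = indirect_log_ratio(s1=800, r_on_array1=200, s2=400, r_on_array2=200)
print(estimate)
```

The price of this flexibility is that each indirect comparison carries the noise of two arrays rather than one.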

Page 71:

Experimental Design for Microarrays

The bottom line is that you should discuss your experimental design with a statistician before going ahead and beginning your experiments. It’s usually too late and too expensive to change the design once you’ve begun!

Page 72:

MIAME (Minimum Information About a Microarray Experiment)

When you publish a microarray experiment, you are expected to make available the following minimal information. This allows others to evaluate your data and compare it to other experimental results:

• EXPERIMENT DESIGN: type, factors, number of arrays, reference sample, qc, database accession (ArrayExpress, GEO)

• SAMPLES USED, PREPARATION AND LABELING

• HYBRIDIZATION PROCEDURES AND PARAMETERS

• MEASUREMENT DATA AND SPECIFICATIONS: quantitations, hardware & software used for scanning and analysis, raw measurements, data selection and transformation procedures, final expression data

• ARRAY DESIGN: platform type, features and locations, manufacturing protocols or commercial p/n