Localization Analysis
11/07/07
• Microarray probes are oligonucleotide sequences with regular spacing covering a whole genomic region.
Tiling Arrays
[Figure: probes tiled along a chromosome. Source: http://en.wikipedia.org/]
Typical applications:
Comparative Genomic Hybridization (aCGH) – copy number variation
RNA analysis: transcript structure, transcript discovery, etc.
Location analysis: nuclease sensitivity
Location analysis: chromatin immunoprecipitation (ChIP)
NOTE: ALL of these things can also be done by deep sequencing, which we will briefly cover towards the end
Spike-in experiments – we can find linkers as short as 7 bp
[Figure: measured red/green ratio vs. location of labeled PCR product; two series]
Experimental Determination of Cross-Hybridization
Spike in PCR product – (1+1)/1 > (1+n)/n for n > 1, so cross-hybridizing (X-hyb) probes will show less enrichment experimentally
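A quick numeric check of this inequality, as a sketch only. It assumes a simple additive model in which a probe matching n genomic targets has baseline signal proportional to n and the spike-in adds one extra unit; the function name `expected_enrichment` is illustrative and not from the slides.

```python
def expected_enrichment(n_targets: int, spike: float = 1.0) -> float:
    """Expected spiked/unspiked signal ratio for a probe that hybridizes to
    n_targets genomic loci (assumed additive model, for illustration only)."""
    return (n_targets + spike) / n_targets

# A unique probe (n = 1) doubles; cross-hybridizing probes approach a ratio of 1.
for n in (1, 2, 5, 10):
    print(n, round(expected_enrichment(n), 2))
```

Under this model the measured enrichment falls toward 1 as the number of cross-hybridizing targets grows, which is why a spike-in can flag such probes.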
[Figure: X-hyb vs. spike-in data]
Array CGH Technology
Genome-wide measurement of DNA copy number alteration by array CGH
Pollack J R et al. PNAS 2002;99:12963-12968
©2002 by The National Academy of Sciences
DNA copy number alteration across chromosome 8 by array CGH
Pollack J R et al. PNAS 2002;99:12963-12968
©2002 by The National Academy of Sciences
RNA vs genomic
5’ UTR
3’ UTR
Tiling of the Hox loci – mRNA vs. genomic
ZY Xu et al. Nature 000, 1-5 (2009) doi:10.1038/nature07728
Transcript maps.
DNaseI HS profiling
DHS profiling identifies promoters, enhancers, and insulators
Isolation of nucleosomal DNA
Experimental Protocol
• Step 1: crosslink – fix protein to DNA
• Step 2: sonication – break DNA into fragments
• Step 3: immunoprecipitation – pull down target protein with a specific antibody
• Step 4: hybridization – hybridize input and pulled-down DNA on microarray
Kim and Ren 2007
Chromatin Immuno-precipitation
Tiling Array Data
Each TF binding signal is represented by multiple probes.
Need more sophisticated statistical tools.
Kim and Ren 2007
Boyer et al. 2005
Tiling arrays provide high resolution for identifying bound fragments
Overlapping 25-mer fragments
Mapping histone modifications
Chromatin’s primary structure
OK, now what?
• Analysis method strongly depends on how widespread the thing being examined is, and on whether you have a guess regarding its localization
• CGH: Just look!
• TF ChIP-chip, DHS: peak finding algorithms (BUT BUT BUT).
• RNA, chromatin marks: Hidden Markov Models, aggregation plots
CGH Array Segmentation
• Key idea: Most probe targets have same copy number as their next neighbors
• Can average over neighbors
• Key issue: when is a difference real?
• Recommended Programs:
  – DNAcopy – Solid statistical basis; slow
  – StepGram – Heuristic; fast
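The key issue, deciding when a difference between neighboring probes is real, can be illustrated with a crude single change-point scan. This is a minimal sketch and not the algorithm used by either recommended program; the function name `best_breakpoint`, the `min_size` parameter, and the toy data are assumptions made for illustration.

```python
import numpy as np

def best_breakpoint(log_ratios, min_size=3):
    """Return (index, score) of the split that best separates the probes into
    two segments with different mean log ratio."""
    x = np.asarray(log_ratios, dtype=float)
    best_idx, best_score = None, 0.0
    for k in range(min_size, len(x) - min_size + 1):
        left, right = x[:k], x[k:]
        # Difference in segment means divided by its standard error
        se = np.sqrt(left.var(ddof=1) / len(left) + right.var(ddof=1) / len(right))
        score = abs(left.mean() - right.mean()) / se if se > 0 else 0.0
        if score > best_score:
            best_idx, best_score = k, score
    return best_idx, best_score

# Toy data: 40 probes at normal copy number, then 20 probes with a
# single-copy gain (log2(3/2) is about 0.58), each with +/-0.1 "noise".
normal = np.tile([0.1, -0.1], 20)
gained = 0.58 + np.tile([0.1, -0.1], 10)
idx, score = best_breakpoint(np.concatenate([normal, gained]))
print("breakpoint before probe", idx, "score", round(score, 1))
```

A full segmentation would apply a scan like this recursively and use a significance threshold (for example from permutation) to decide when to stop splitting.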
Methods
• Moving average t-test (Keles et al. 2004)
• HMM (Li et al. 2005; Yuan et al. 2005)
• Tilemap (Ji and Wong 2005)
• MAT (Johnson et al. 2006)
Keles’ method
• Calculate a two-sample t-statistic for each probe i, comparing the ChIP signal (Y2) with the input signal (Y1):
$$T_{i,n} = \frac{\bar{Y}_{2,i} - \bar{Y}_{1,i}}{\sqrt{\hat{\sigma}^2_{1,i}/n_1 + \hat{\sigma}^2_{2,i}/n_2}}$$
• Moving average scan-statistic:
$$T^*_{i,n} = \frac{1}{w}\sum_{h=i}^{i+w-1} T_{h,n}$$
Keles et al. 2004
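A minimal sketch of the per-probe two-sample t-statistic and its moving average over w consecutive probes. The array layout (replicates in rows, probes in columns), the function names, and the simulated data are assumptions made for illustration, not details taken from Keles et al.

```python
import numpy as np

def probe_t_stats(chip, control):
    """Two-sample t-statistic per probe.
    chip, control: arrays of shape (replicates, probes)."""
    chip = np.asarray(chip, dtype=float)
    control = np.asarray(control, dtype=float)
    n2, n1 = chip.shape[0], control.shape[0]
    diff = chip.mean(axis=0) - control.mean(axis=0)
    se = np.sqrt(control.var(axis=0, ddof=1) / n1 + chip.var(axis=0, ddof=1) / n2)
    return diff / se

def scan_statistic(t, w):
    """Moving average of w consecutive probe statistics (one value per window)."""
    return np.convolve(t, np.ones(w) / w, mode="valid")

# Simulated example: 3 replicates per channel, 200 probes, enrichment at probes 100-105
rng = np.random.default_rng(1)
control = rng.normal(0.0, 1.0, size=(3, 200))
chip = rng.normal(0.0, 1.0, size=(3, 200))
chip[:, 100:106] += 3.0
t = probe_t_stats(chip, control)
t_star = scan_statistic(t, w=5)
print("window with the largest scan statistic starts at probe", int(t_star.argmax()))
```

Averaging over neighboring probes trades resolution for robustness: a single noisy probe rarely produces a high window average, while a real bound fragment raises several adjacent probes together.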
Multiple hypothesis testing
• Multiple hypothesis testing needs to be considered to control false positive error rates.
• What is the null distribution of this statistic?
$$T^*_{i,n} = \frac{1}{w}\sum_{h=i}^{i+w-1} T_{h,n}$$
Multiple hypothesis testing
• Assume $T_{h,n}$ has a t-distribution
• Approximate $T^*_{i,n} = \frac{1}{w}\sum_{h=i}^{i+w-1} T_{h,n}$ by a normal distribution.
• Alternatively, can use a resampling method to estimate the null distribution.
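One possible resampling scheme, sketched under stated assumptions: shuffle the probe-level statistics to destroy their spatial arrangement, recompute the moving average, and use the pooled shuffled windows as an empirical null. This is an illustration only and not necessarily the procedure of Keles et al.; the function names and the simulated data are hypothetical.

```python
import numpy as np

def scan_statistic(t, w):
    """Moving average of w consecutive probe statistics."""
    return np.convolve(t, np.ones(w) / w, mode="valid")

def permutation_pvalues(t, w, n_perm=200, seed=0):
    """Empirical one-sided p-value per window from shuffled probe statistics."""
    rng = np.random.default_rng(seed)
    t = np.asarray(t, dtype=float)
    observed = scan_statistic(t, w)
    null = np.concatenate([scan_statistic(rng.permutation(t), w) for _ in range(n_perm)])
    null.sort()
    # Number of null values >= each observed value; +1 keeps p-values above zero
    n_ge = len(null) - np.searchsorted(null, observed, side="left")
    return (n_ge + 1) / (len(null) + 1)

# Simulated probe statistics with one enriched stretch at probes 100-104
rng = np.random.default_rng(0)
t = rng.normal(0.0, 1.0, 300)
t[100:105] += 5.0
pvals = permutation_pvalues(t, w=5)
bonferroni = np.minimum(pvals * len(pvals), 1.0)  # one simple multiple-testing correction
print("smallest corrected p-value:", bonferroni.min())
```

Because the shuffled data still contain the enriched probes, this null is somewhat conservative; other corrections, such as false discovery rate control, could replace the Bonferroni step.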
ChIPOTle: a simple method for identifying ‘bound’ genomic fragments (Buck et al. 2005)
Assumption: a real binding site will have a distribution of bound fragments encapsulating it. Therefore, true positives will likely have multiple, contiguous fragments with high signal.
1. Walk across tiled genomic probes with user-defined window size
2. Calculate mean signal intensity within each window
3. Estimate p-value of binding (Bonferroni-corrected) based on a standard error model or by permuting the dataset.
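The three steps can be sketched as follows. This is not ChIPOTle's own code: the Gaussian null with a standard deviation estimated from the negative log ratios, the function name `window_pvalues`, and the simulated data are assumptions made for illustration.

```python
import numpy as np
from math import erfc, sqrt

def window_pvalues(log_ratios, w):
    """Sliding-window means with Bonferroni-corrected one-sided p-values."""
    x = np.asarray(log_ratios, dtype=float)
    # Steps 1-2: walk across the probes and average within each window
    means = np.convolve(x, np.ones(w) / w, mode="valid")
    # Assumed error model: null log ratios are Gaussian and symmetric around 0,
    # so their spread is estimated from the negative values only
    neg = x[x < 0]
    sigma = np.sqrt(np.mean(neg ** 2))
    z = means / (sigma / sqrt(w))
    # Step 3: upper-tail normal p-value, then Bonferroni correction
    p = np.array([0.5 * erfc(zi / sqrt(2)) for zi in z])
    return means, np.minimum(p * len(p), 1.0)

# Simulated log ratios with one bound region covering probes 80-83
rng = np.random.default_rng(0)
x = rng.normal(0.0, 0.3, 200)
x[80:84] += 2.0
means, adj_p = window_pvalues(x, w=4)
print("windows called bound at 0.05:", np.flatnonzero(adj_p < 0.05))
```

Overlapping windows are not independent, so the Bonferroni correction here is conservative; permuting the dataset is the alternative the slide mentions.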
BUT:
• Extensive low-affinity transcriptional interactions in the yeast genome
• Amos Tanay
• Genome Research 2006
OK, what about more continuous data like RNA or chromatin marks?
Inferring nucleosomes: HMM
A Hidden Markov Model objectively identifies nucleosome positions
Hidden Markov Models for Identifying Bound Fragments
HMMs are trained on known data to recognize different states (e.g., bound vs. unbound fragments) and the probability of moving between those states
Example: ChIP-chip data from a tiling microarray identifying regions bound to a transcription complex with a known 50bp binding sequence.
You expect that a bound fragment will have high signal on the array and that the bound fragment will be 2-3 probes long.
Once trained, an HMM can be used to identify the ‘hidden’ states in an unknown dataset, based on the known characteristics of each state (‘emission probabilities’) and the probability of moving between states (‘transition probabilities’)
Example: “A hidden Markov model for analyzing ChIP-chip experiments on genome tiling arrays and its application to p53 binding sequences” 2005. Li, Meyer, Liu
[Diagram: four-state HMM with states Unbound 25mer, Bound 25mer, Bound 25mer, Bound 25mer]
I = Intensity units > 10,000; i = Intensity units < 10,000
Emission Probabilities
• Unbound 25mer: P( I ) = 0.2, P( i ) = 0.8
• Bound 25mer (each of the three bound states): P( I ) = 0.8, P( i ) = 0.2
Transition Probabilities (arrow labels, in slide order): P = 0.5, 0.5, 1.0, 0, 0.7, 0.3, 1.0
Given the data, the HMM evaluates many possible assignments of states to probes and gives back the most probable one
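A minimal Viterbi sketch for this four-state example. The emission probabilities are taken from the slide; the assignment of the transition probabilities to specific arrows (unbound to unbound 0.5, unbound to first bound 0.5, first to second bound 1.0, second to third bound 0.7, second bound to unbound 0.3, third bound to unbound 1.0) is an assumption, chosen so that bound fragments are 2-3 probes long. Starting in the unbound state is also an assumption.

```python
import numpy as np

states = ["unbound", "bound1", "bound2", "bound3"]
# Assumed arrow assignment (rows: from-state, columns: to-state)
trans = np.array([
    [0.5, 0.5, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.3, 0.0, 0.0, 0.7],
    [1.0, 0.0, 0.0, 0.0],
])
emit_high = np.array([0.2, 0.8, 0.8, 0.8])  # P(I): intensity > 10,000, per state
start = np.array([1.0, 0.0, 0.0, 0.0])      # assumed: the path starts unbound

def viterbi(obs_high):
    """Most probable state path for a sequence of high/low intensity calls."""
    obs = np.asarray(obs_high, dtype=bool)
    with np.errstate(divide="ignore"):       # log(0) = -inf marks forbidden moves
        log_t, log_s = np.log(trans), np.log(start)
        log_e = np.log(np.where(obs[:, None], emit_high, 1 - emit_high))
    score = log_s + log_e[0]
    back = []
    for t in range(1, len(obs)):
        cand = score[:, None] + log_t        # cand[i, j]: best path ending in i, then i -> j
        back.append(cand.argmax(axis=0))
        score = cand.max(axis=0) + log_e[t]
    path = [int(score.argmax())]
    for ptr in reversed(back):               # trace back from the last probe
        path.append(int(ptr[path[-1]]))
    return [states[s] for s in reversed(path)]

intensities = [3000, 4000, 15000, 20000, 12000, 2500, 5000]
print(viterbi([v > 10000 for v in intensities]))
```

In practice the probabilities would be estimated from training data rather than fixed by hand, and continuous intensities would usually replace the two-level high/low call.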
Other types and uses of microarrays: aCGH
CGH (comparative genomic hybridization) looks at cytogenetic abnormalities
• genomic DNA hybridized to array
• often uses large clones (e.g., BACs) as array features
Validation of data
There’s no way that all of your microarray data can be validated.
It’s strongly recommended that any key findings be verified by independent means.
Northern blots and quantitative RT-PCR are the typical ways of doing this; real-time, quantitative RT-PCR is generally the method of choice.
Chromatin’s primary structure
One way to turn this 1D trace into 2D is via an “averageogram”
H4 K16 Acetyl, aligned by NFR
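A minimal sketch of how such an aggregation plot could be built: cut a fixed window of signal around each alignment point (for example each NFR), stack the windows into a 2D matrix, and average down the columns. The function name, the `flank` parameter, and the toy data are assumptions; strand orientation is ignored here.

```python
import numpy as np

def aggregate_profile(signal, anchors, flank):
    """Stack windows of +/- flank probes around each anchor and average them.
    Anchors too close to either end of the signal are skipped."""
    signal = np.asarray(signal, dtype=float)
    rows = [signal[a - flank:a + flank + 1] for a in anchors
            if a - flank >= 0 and a + flank < len(signal)]
    stacked = np.vstack(rows)          # 2D view: one row per aligned feature
    return stacked, stacked.mean(axis=0)

# Toy trace: a mark that always sits 2 probes downstream of each anchor
signal = np.zeros(100)
anchors = [20, 50, 80]
for a in anchors:
    signal[a + 2] = 1.0
stacked, profile = aggregate_profile(signal, anchors, flank=5)
print(stacked.shape, profile)
```

The stacked matrix can be shown as a heat map, and its column means give the average profile relative to the alignment point.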
Beyond Transcription
[Figure: % nucleosomes and % exchange events (Printed Arrays) by annotation class: CDS, TSS3, TSS5, promoter, Null, tRNA, ARS]
Multiple visualizations of tiling data
RNA-Seq
Lockhart and Winzeler 2000
Wang et al. 2009
RNA-Seq
• Whole Transcriptome Shotgun Sequencing
  – Sequencing cDNA
  – Using NexGen technology
• Revolutionary Tool for Transcriptomics
  – More precise measurements
  – Ability to do large scale experiments with little starting material
RNA-Seq Experiment
Wang et al. 2009
Mapping
• Create unique scaffolds
  – Harder algorithms with such short reads
Unbiased sequencing of the yeast transcriptome
Yassour M et al. PNAS 2009;106:3264-3269
©2009 by National Academy of Sciences
Mapping
• Place reads onto a known genomic scaffold
  – Requires known genome and depends on accuracy of the reference
http://en.wikipedia.org/
Ab initio assembly of a transcript catalog
Yassour M et al. PNAS 2009;106:3264-3269
©2009 by National Academy of Sciences
Biases
Wang et al. 2009
What the data look like
Superimposing channels
Giresi et al, Genome Res. 10
Experimental Design for Microarrays
There are a number of important experimental design considerations for a microarray experiment:
• technical vs biological replicates
• amplification of RNA
• dye swaps
• reference samples
Experimental Design for Microarrays
Technical vs biological replicates
•technical replicates are repeat hybridizations using the same RNA isolate
•biological replicates use RNA isolated from separate experiments/experimental organisms
Although technical replicates can be useful for reducing variation due to hybridization, imaging, etc., biological replicates are necessary for a properly controlled experiment
Experimental Design for Microarrays
Amplification of RNA
• linear amplification methods can be used to increase the amount of RNA so that microarray experiments can be performed using very small numbers of cells. It’s not clear to what degree this affects results, especially with respect to rare transcripts, but seems to be generally OK if done correctly
Experimental Design for Microarrays
Dye swaps
When using 2-color arrays, it’s important to hybridize replicates using a dye-swap strategy in which the colors (labels) are reversed between the two replicates. This is because there can be biases in hybridization intensity due to which dye is used (even when the sequence is the same).
[Diagram: S1 and S2 hybridized on two arrays, with the dye labels reversed on the second]
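A minimal sketch of why the swap helps, assuming the dye bias adds the same offset to the log ratio of a given probe on both arrays. Under that assumption the bias cancels when the two log ratios are combined with opposite signs. The function and variable names are illustrative.

```python
import numpy as np

def dye_swap_estimate(m_forward, m_swapped):
    """Combine log2(Cy5/Cy3) ratios from a dye-swap pair.
    m_forward: S1 labeled Cy5 and S2 labeled Cy3; m_swapped: labels reversed.
    Returns (log2(S1/S2) estimate, dye bias estimate) per probe."""
    m_forward = np.asarray(m_forward, dtype=float)
    m_swapped = np.asarray(m_swapped, dtype=float)
    effect = (m_forward - m_swapped) / 2    # the bias cancels
    dye_bias = (m_forward + m_swapped) / 2  # the true effect cancels
    return effect, dye_bias

# Toy example with a known effect and a probe-specific dye bias
true_effect = np.array([1.0, 0.0, -2.0])
bias = np.array([0.3, 0.3, -0.1])
effect, dye_bias = dye_swap_estimate(true_effect + bias, -true_effect + bias)
print(effect, dye_bias)
```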
Experimental Design for Microarrays
Reference samples
• One common strategy is to use a reference sample in one channel on each array. This is usually something that will hybridize to most of the features (e.g., a complex RNA mixture). Using a reference sample allows comparisons to be made between different experimental conditions, as each is compared to the common reference.
[Diagram: S1, S2, and S3 each hybridized against reference R]
Compare S1/R vs. S2/R vs. S3/R
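A small sketch of the arithmetic behind the reference design: on the log scale the reference cancels, since log2(S1/R) - log2(S2/R) = log2(S1/S2). The intensities and the function name are made up for illustration.

```python
import numpy as np

def compare_via_reference(s1_over_r, s2_over_r):
    """log2(S1/S2) per feature, computed from two arrays that share reference R."""
    return np.log2(s1_over_r) - np.log2(s2_over_r)

# Hypothetical intensities for two features
s1 = np.array([400.0, 100.0])
s2 = np.array([100.0, 100.0])
r = np.array([200.0, 50.0])
print(compare_via_reference(s1 / r, s2 / r))
```

The cancellation only works for features where the reference gives a measurable signal, which is why a reference that hybridizes to most features is preferred.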
Experimental Design for Microarrays
The bottom line is that you should discuss your experimental design with a statistician before going ahead and beginning your experiments. It’s usually too late and too expensive to change the design once you’ve begun!
MIAME (Minimum Information About a Microarray Experiment)
When you publish a microarray experiment, you are expected to make available the following minimal information. This allows others to evaluate your data and compare it to other experimental results:
• EXPERIMENT DESIGN: type, factors, number of arrays, reference sample, qc, database accession (ArrayExpress, GEO)
• SAMPLES USED, PREPARATION AND LABELING
• HYBRIDIZATION PROCEDURES AND PARAMETERS
• MEASUREMENT DATA AND SPECIFICATIONS: quantitations, hardware & software used for scanning and analysis, raw measurements, data selection and transformation procedures, final expression data
• ARRAY DESIGN: platform type, features and locations, manufacturing protocols or commercial p/n