Special Topics in Genomics ChIP-chip and Tiling Arrays

Upload: chinara

Post on 14-Jan-2016


Page 1: Special Topics in Genomics ChIP-chip and Tiling Arrays

Special Topics in Genomics

ChIP-chip and Tiling Arrays

Page 2

Traditional Method for Understanding Transcription Regulation

Very challenging for mammalian genomes

Gene expression microarray analysis

Clustering genes by expression profile

Search for conserved sequence motifs in cluster promoters

Page 3

ChIP-chip Technology

• Chromatin ImmunoPrecipitation + microarray

• Detect genome-wide in vivo location of TF and other DNA-binding proteins

• Can learn the regulatory mechanism of a transcription factor or DNA-binding protein much better and faster

Page 4

Chromatin ImmunoPrecipitation (ChIP)

By Richard Bourgon at UC Berkeley

Page 5

TF/DNA Crosslinking in vivo

By Richard Bourgon at UC Berkeley

Page 6

Sonication (~500bp)

By Richard Bourgon at UC Berkeley

Page 7

TF-specific Antibody

By Richard Bourgon at UC Berkeley

Page 8

Immunoprecipitation

By Richard Bourgon at UC Berkeley

Page 9

Reverse Crosslink and DNA Purification

By Richard Bourgon at UC Berkeley

Page 10

Amplification

By Richard Bourgon at UC Berkeley

Page 11

Genome Tiling Arrays

Platform    # Arrays (human genome)  # Probes / Array  # Total Probes  Probe Length  Probe Resolution                       Price
Affymetrix  7                        6M                42.0M           25mer         35 bp                                  $2,000
Nimblegen   38                       390K              14.8M           50mer         110 bp                                 $30,000
Agilent     21                       244K              5.1M            60mer         300 bp in genes; 500 bp in intergenic  $11,000

By Xiaole Shirley Liu at Harvard

Page 12

Genome Tiling Arrays

• Affymetrix genome tiling microarrays

– Tile the non-repeat regions of the genome

– Chr21/22 tiling (earlier version): 1 million probe pairs (PM & MM) at 35 bp resolution on 3 arrays

– Whole genome: 42 million PM probes on 7 arrays

[Figure labels: Probes, Chromosome]

PM: CGACATTGATTCAAGACTACATACA

MM: CGACATTGATTCTAGACTACATACA

By Xiaole Shirley Liu at Harvard
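
The PM/MM pair on this slide differs only at the central (13th) base: A in PM, T in MM. A minimal Python sketch of deriving an MM probe from a PM probe, assuming the usual convention that the central base is replaced by its complement. The function name `mismatch_probe` is illustrative and not from the slides.

```python
# Watson-Crick complements for the four DNA bases.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def mismatch_probe(pm: str) -> str:
    """Return the MM probe: the PM sequence with its central base complemented."""
    mid = len(pm) // 2  # index 12, i.e. the 13th base of a 25-mer
    return pm[:mid] + COMPLEMENT[pm[mid]] + pm[mid + 1:]


pm = "CGACATTGATTCAAGACTACATACA"  # PM probe shown on the slide
print(mismatch_probe(pm))
```

Applied to the slide's PM sequence, this reproduces the slide's MM sequence.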

Page 13

Chromatin ImmunoPrecipitation (ChIP)

By Richard Bourgon at UC Berkeley

Page 14

ChIP-chip Array Hybridization

• Map high intensity probes back to the genome

• Locate TF binding location

[Figure labels: Probes, Chromosome, ChIP-DNA, Noise]

By Xiaole Shirley Liu at Harvard

Page 15

Identify ChIP-enriched Region

• Controls: sonicated genomic Input DNA

• Often 3 ChIP, 3 Ctrl replicates are needed

[Figure labels: ChIP, Ctrl]

By Xiaole Shirley Liu at Harvard

Page 16

Mann-Whitney U-test for ChIP-region Detection

• Affy TAS, Cawley et al. (Cell 2004):

– Each probe: rank probes (either PM-MM or PM) within [-500bp, +500bp] window

– Check whether sum of ChIP ranks is much smaller

By Xiaole Shirley Liu at Harvard
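
A rough Python sketch of the windowed rank-sum idea on this slide, not the TAS implementation. It assumes one ChIP value and one control value per probe, ranks pooled values so that rank 1 is the largest (so an enriched ChIP sample gives a small rank sum, matching the slide's wording), and uses the normal approximation without a tie correction. The function names and the toy data are made up for illustration.

```python
import math


def rank_sum_z(chip, ctrl):
    """z-score of the ChIP rank sum (rank 1 = largest value), normal approximation."""
    pooled = sorted([(v, "chip") for v in chip] + [(v, "ctrl") for v in ctrl],
                    key=lambda pair: -pair[0])
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        for k in range(i, j + 1):
            ranks[k] = (i + j) / 2 + 1  # average rank for tied values
        i = j + 1
    n1, n2 = len(chip), len(ctrl)
    w = sum(r for r, (_, label) in zip(ranks, pooled) if label == "chip")
    expected = n1 * (n1 + n2 + 1) / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return (w - expected) / sd  # strongly negative: ChIP ranks are small


def window_z(positions, chip, ctrl, center, half_width=500):
    """Rank-sum z-score using only probes within +/- half_width bp of center."""
    inside = [i for i, p in enumerate(positions) if abs(p - center) <= half_width]
    return rank_sum_z([chip[i] for i in inside], [ctrl[i] for i in inside])


positions = [0, 35, 70, 105, 140, 175]       # toy probe coordinates (bp)
chip = [9.1, 9.4, 9.8, 9.6, 9.3, 9.0]        # toy ChIP intensities
ctrl = [8.0, 8.2, 7.9, 8.1, 8.3, 7.8]        # toy control intensities
print(window_z(positions, chip, ctrl, center=70))
```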

Page 17

TileMap (Ji and Wong, Bioinformatics 2005)

STEP 1: Compute a test statistic for each probe to summarize probe level information

STEP 2: Combine probe level test statistics of neighboring probes to help infer binding regions

Page 18

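Probe level test statistic: empirical Bayes approach

Probe: $1, 2, 3, \ldots, I$

Sample Variance (df): $s_1^2, s_2^2, s_3^2, \ldots, s_I^2$

Mean: $\bar{s}^2$ (mean of the $s_i^2$)

Sum of Squares: $S = \sum_i \left[ s_i^2 - \bar{s}^2 \right]^2$

Shrinkage Factor: $\hat{B}$, computed from $df$, $I$, $\bar{s}^2$ and $S$ (explicit formula in Ji and Wong, 2005)

Variance Shrinkage Estimator: $\hat{\sigma}_i^2 = (1 - \hat{B})\, s_i^2 + \hat{B}\, \bar{s}^2$

Variance Estimates: $\hat{\sigma}_1^2, \hat{\sigma}_2^2, \hat{\sigma}_3^2, \ldots, \hat{\sigma}_I^2$

A modified t-statistic: $\tilde{t}_i = \dfrac{\bar{x}_{i1} - \bar{x}_{i2}}{\hat{\sigma}_i \sqrt{1/K_1 + 1/K_2}}$

Probe level test statistics: $\tilde{t}_1, \tilde{t}_2, \tilde{t}_3, \ldots, \tilde{t}_I$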
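
A small Python sketch of the variance shrinkage estimator and modified t-statistic on this slide. Each probe's variance is pulled toward the mean variance across probes, then used in a two-sample t-statistic. The shrinkage factor `B` is passed in as a plain number rather than estimated, and the pooled two-sample variance used for the per-probe sample variance is an assumption. The function names and data are illustrative only.

```python
import math
from statistics import mean


def pooled_variance(x1, x2):
    """Pooled within-group sample variance of two groups (assumed form of s_i^2)."""
    m1, m2 = mean(x1), mean(x2)
    ss = sum((v - m1) ** 2 for v in x1) + sum((v - m2) ** 2 for v in x2)
    return ss / (len(x1) + len(x2) - 2)


def shrunken_t(ip, ct, B):
    """Modified t per probe; ip[i] and ct[i] hold the replicate values of probe i.

    B is the shrinkage factor in [0, 1]; here it is supplied, not estimated.
    """
    s2 = [pooled_variance(a, b) for a, b in zip(ip, ct)]
    s2_bar = mean(s2)  # mean sample variance across probes
    t = []
    for a, b, v in zip(ip, ct, s2):
        sigma2 = (1 - B) * v + B * s2_bar  # variance shrinkage estimator
        se = math.sqrt(sigma2 * (1 / len(a) + 1 / len(b)))
        t.append((mean(a) - mean(b)) / se)
    return t


ip = [[2.0, 2.0], [3.0, 5.0]]  # two probes, two IP replicates each
ct = [[0.0, 0.0], [1.0, 3.0]]  # matching control replicates
print(shrunken_t(ip, ct, B=0.5))
```

The first toy probe has zero sample variance, so an unshrunken t-statistic would divide by zero; borrowing variance from the other probe keeps the statistic finite.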

Page 19

Combining neighboring probes

TileMap (MA)

1. Compute the probe level test statistic t for each probe;

2. Compute a moving average statistic to measure enrichment;

3. Estimate FDR.
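
Step 2 of TileMap (MA) can be pictured with a plain centered moving average over the probe-level statistics. This is a sketch only: the half-window size is a hypothetical parameter, the window is simply truncated at the array edges, and probes are treated as evenly spaced.

```python
def moving_average(t, half_window=2):
    """Centered moving average of probe-level statistics, truncated at the ends."""
    out = []
    for i in range(len(t)):
        lo = max(0, i - half_window)
        hi = min(len(t), i + half_window + 1)
        out.append(sum(t[lo:hi]) / (hi - lo))
    return out


t = [0, 0, 3, 3, 3, 0, 0]  # toy t-statistics with a short enriched stretch
print(moving_average(t, half_window=1))
```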

TileMap (HMM)

1. Compute the probe level test statistic t for each probe;

2. Estimate the distribution of t under H0 and H1;

3. Model t by a Hidden Markov Model, and decode the HMM.
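
For step 3 of TileMap (HMM), decoding can be sketched as a two-state Viterbi pass over the probe-level statistics. Everything numeric here is an assumption for illustration: Gaussian emissions for H0 (background) and H1 (bound), their means and standard deviations, and the transition and prior probabilities. TileMap estimates the H0 and H1 distributions from the data (step 2), which this sketch does not attempt.

```python
import math


def log_normal(x, mu, sd):
    """Log density of a normal distribution."""
    return -0.5 * math.log(2 * math.pi * sd * sd) - (x - mu) ** 2 / (2 * sd * sd)


def viterbi(t, mu=(0.0, 3.0), sd=(1.0, 1.0), stay=(0.99, 0.9), prior=(0.99, 0.01)):
    """Most likely state path (0 = background, 1 = bound) for statistics t."""
    trans = [[math.log(stay[0]), math.log(1 - stay[0])],
             [math.log(1 - stay[1]), math.log(stay[1])]]
    score = [math.log(prior[s]) + log_normal(t[0], mu[s], sd[s]) for s in (0, 1)]
    back = []
    for x in t[1:]:
        step, new = [], []
        for s in (0, 1):
            # Best previous state for arriving in state s at this probe.
            prev = max((0, 1), key=lambda p: score[p] + trans[p][s])
            step.append(prev)
            new.append(score[prev] + trans[prev][s] + log_normal(x, mu[s], sd[s]))
        back.append(step)
        score = new
    state = max((0, 1), key=lambda s: score[s])
    path = [state]
    for step in reversed(back):  # trace the best path backwards
        state = step[state]
        path.append(state)
    return path[::-1]


t = [0.1, -0.3, 0.2, 3.5, 2.9, 3.2, 0.0, -0.1]  # toy probe-level statistics
print(viterbi(t))
```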

Page 20

Shrinking variance increases statistical power

[Figure panels: Mean(X1)-Mean(X2); t-statistic, canonical; t-statistic, variance shrinking; Moving Average]

Page 21

Peak 2 (180bp) transgenics

Neural tube expression Transgenics

Page 22

Comparisons between TileMap and previous methods

cMyc ChIP-chip Data: 6 IP + 6 CT1 + 6 CT2

Gold Standard: Using GTRANS and Keles’ method to analyze all 18 arrays

Test data: 4 arrays, 2 IP vs 2 CT1 (s2r2)

GTRANS or TAS (Kampa et al., 2004)

1. Set a window;

2. Perform a Wilcoxon signed rank test for each window.

Keles et al. (2004)

1. Compute a t-statistic t for each probe (no shrinking, two sample only);

2. Rank probes by a moving average.

TileMap-HMM (Ji & Wong, 2005)

Page 23

Shrinking variance saves money

Using non-shrinking method (Keles’ method) to analyze all probes

Using shrinking method to analyze half of the probes, i.e., reduce information by half

Page 24

MAT (Johnson W.E. et al. PNAS, 2006)

• Model-based Analysis of Tiling arrays for ChIP-chip

• Goal:

– Find ChIP-regions without replicates

– Find ChIP-regions without controls

– Find ChIP-regions without MM probes

– Can analyze data array by array

By Xiaole Shirley Liu at Harvard

Page 25

MAT

• Estimate probe behavior by checking other probes with similar sequence on the same array

• Probe sequence plays a big role in signal value

• Most of the probes in ChIP-chip measure non-specific hybridization

By Xiaole Shirley Liu at Harvard

Page 26

Probe Behavior Model

Baseline on number of Ts

A,C,G at each position of the 25mer

A,C,G,T Count Square

25mer Copy Number along the Genome

By Xiaole Shirley Liu at Harvard
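
The four labelled terms on this slide can be read as covariates in a regression for log probe intensity. A hedged Python sketch of building those covariates for one 25-mer follows. Taking the logarithm of the copy number is an assumption, the function name is made up, and fitting the coefficients (for example by least squares) is not shown.

```python
import math


def probe_features(seq, copy_number):
    """Covariates for one 25-mer, one group per labelled term on the slide."""
    assert len(seq) == 25
    features = [seq.count("T")]                          # baseline on number of Ts
    for base in seq:                                     # A, C, G at each position
        features.extend(1 if base == b else 0 for b in "ACG")
    features.extend(seq.count(b) ** 2 for b in "ACGT")   # A, C, G, T count squared
    features.append(math.log(copy_number))               # copy number term (log assumed)
    return features


x = probe_features("CGACATTGATTCAAGACTACATACA", copy_number=1)
print(len(x))  # 1 + 75 + 4 + 1 covariates
```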

Page 27

Probe Standardization

• Fit the probe model array by array

• Divide array probes into bins (3k probes/bin)

• Background-subtraction and standardization (normalization) on a single array:

$t_i = \dfrac{\log(PM_i) - \hat{m}_i}{s_{i,\,\text{affinity bin}}}$

– $PM_i$: observed probe intensity

– $\hat{m}_i$: model predicted probe intensity

– $s_{i,\,\text{affinity bin}}$: observed probe variability (standard deviation) within each bin

By Xiaole Shirley Liu at Harvard
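
A minimal Python sketch of the per-array standardization on this slide: subtract the model-predicted value from log(PM) and divide by the spread of the probe's bin. Two things are assumptions here rather than statements from the slide: bins are formed by sorting probes on predicted intensity, and the divisor is the sample standard deviation of observed log(PM) in the bin. The function name and numbers are illustrative.

```python
from statistics import stdev


def standardize(log_pm, predicted, bin_size=3000):
    """Return t_i = (log PM_i - predicted_i) / s_bin for every probe."""
    order = sorted(range(len(log_pm)), key=lambda i: predicted[i])
    t = [0.0] * len(log_pm)
    for start in range(0, len(order), bin_size):
        members = order[start:start + bin_size]
        if len(members) < 2:
            continue  # too few probes to estimate a spread; leave t at 0
        s = stdev([log_pm[i] for i in members])
        if s == 0:
            continue  # no spread in this bin; leave t at 0
        for i in members:
            t[i] = (log_pm[i] - predicted[i]) / s
    return t


log_pm = [1.0, 2.0, 3.0, 10.0, 11.0, 15.0]     # toy observed log(PM)
predicted = [1.5, 2.0, 2.5, 10.0, 12.0, 11.0]  # toy model predictions
print(standardize(log_pm, predicted, bin_size=3))
```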

Page 28

Eliminate Normalization

• Probe log(PM) values before and after standardization

• If normalized before model fitting

– Predicted the same ChIP-regions, although with less confidence

By Xiaole Shirley Liu at Harvard

Page 29

ChIP-region Detection

• Window-based MATscore

– ChIP without Ctrl:

$MAT(region) = \sqrt{n_{ChIP}}\; TM(t\text{'s in region})$

– TM: trimmed mean

– Multiple ChIP with multiple Ctrl:

$MAT(region) = \sqrt{n_{ChIP}}\; \left[ TM(t\text{'s in ChIP}) - TM(t\text{'s in Input}) \right]$

– More probes, higher t values in ChIP, less variance (fluctuation): more confident

By Xiaole Shirley Liu at Harvard
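
A sketch of the single-sample (ChIP without Ctrl) window score, read here as the square root of the number of probes in the window times the trimmed mean of their t-values. The trim fraction and window half-width are hypothetical parameters, and the function names are illustrative, not taken from the MAT source code.

```python
import math


def trimmed_mean(values, trim=0.1):
    """Mean after dropping the lowest and highest trim fraction of values."""
    v = sorted(values)
    k = int(len(v) * trim)
    kept = v[k:len(v) - k] if len(v) > 2 * k else v
    return sum(kept) / len(kept)


def mat_score(positions, t, center, half_width=300, trim=0.1):
    """sqrt(n) * trimmed mean of t-values for probes within the window."""
    window = [ti for p, ti in zip(positions, t) if abs(p - center) <= half_width]
    if not window:
        raise ValueError("no probes in window")
    return math.sqrt(len(window)) * trimmed_mean(window, trim)


positions = list(range(0, 350, 35))        # 10 toy probes at 35 bp spacing
t = [0, 0, 0, 4, 4, 4, 4, 0, 0, 0]         # toy standardized t-values
print(mat_score(positions, t, center=157, half_width=70))
```

The square-root factor is what makes a window with more probes score higher for the same average t-value.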

Page 30

Raw probe values at two spike-in regions with concentration 2X

[Figure tracks: ChIP_1 Log(PM); Input_1 Log(PM)]

Sequence-based probe behavior standardization

[Figure tracks: ChIP_1 t-value; Input_1 t-value]

Window-based neighboring probe combination for ChIP-region detection

[Figure tracks: ChIP_1 MATscore; ChIP_1/Input_1 MATscore; 3 Reps ChIP/Input MATscore; both spike-in regions marked 2X]

By Xiaole Shirley Liu at Harvard

Page 31

Statistical Significance of Hits

• P-value and FDR cutoff:

– P-value from MATscore distribution

– Estimate negative peaks under the same P value cutoff

– Regional FDR = #negative_peaks / #positive_peaks

MAT: Quality Control

[Figure labels: Background; Enriched DNA; <1% enriched]

By Xiaole Shirley Liu at Harvard
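
The regional FDR on this slide is a ratio of peak counts. A tiny Python sketch follows, assuming that "the same P value cutoff" can be applied as a symmetric cutoff on peak scores (positive peaks at or above the cutoff, negative peaks at or below its negative). The function name and scores are illustrative.

```python
def regional_fdr(peak_scores, cutoff):
    """Regional FDR = #negative_peaks / #positive_peaks at a symmetric score cutoff."""
    positive = sum(1 for s in peak_scores if s >= cutoff)
    negative = sum(1 for s in peak_scores if s <= -cutoff)
    return negative / positive if positive else float("nan")


scores = [8.1, 6.0, 5.5, 7.2, -5.2, 1.0, -0.3, 9.9]  # toy peak scores
print(regional_fdr(scores, cutoff=5))
```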

Page 32

MAT summary

• Open source Python: http://chip.dfci.harvard.edu/~wli/MAT/

• Runs faster than array scanner

• Can work with single ChIP, multiple ChIP, and multiple ChIP with controls with increasing accuracy

– Use single ChIP on promoter arrays to test antibody and protocol before going whole genome

• Can identify individual failed samples

By Xiaole Shirley Liu at Harvard

Page 33

Benchmark for ChIP-chip Target Detection (Johnson D.S. et al. Genome Research, 2008)

• ENCODE Spike-in experiment: both amplified and un-amplified

• Blind test: Samples hybridized to different tiling arrays, predictions made before the key was released

ChIP: 96 ENCODE clones, 2,4,8,...,256X enrichment + total chromatin DNA

Input: total genomic DNA

Page 34

Comparison of platforms

Page 35

Comparison of algorithms

Combined Johnson D.S. et al. Genome Research 2008 with Ji H. et al. Nature Biotechnology 2008

Page 36

MBR: Microarray Blob Remover

By Xiaole Shirley Liu at Harvard

Page 37

xMAN: eXtreme MApping of oligoNucleotides

• http://chip.dfci.harvard.edu/~wli/xMAN

• xMAN maps ~42 M Affymetrix tiling probes to the newest human genome assembly in less than 6 CPU hours

– BLAST needs 20 CPU years; BLAT needs 55 CPU days

– Probe TCCCAGCACTTTGGGAGGCTGAGGC maps 50,660 times in the genome

• Can map long oligos, and paired tag high throughput sequencing fragments

• Stores the copy number information of every probe

• xMAN filters tiling array probes to ensure one unique probe measurement per 1 kb, which improves peak detection

By Xiaole Shirley Liu at Harvard
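
A toy Python illustration of what "copy number of every probe" means. This is not xMAN's algorithm: it only counts exact forward-strand occurrences of each k-mer in a sequence using a hash table. The function name and the short example sequence are made up.

```python
from collections import Counter


def kmer_counts(genome, k=25):
    """Count exact occurrences of every k-mer on the forward strand."""
    return Counter(genome[i:i + k] for i in range(len(genome) - k + 1))


counts = kmer_counts("ACGTACGTAC", k=4)  # tiny stand-in for a genome
print(counts["ACGT"], counts["TACG"], counts["AAAA"])
```

A probe with a count above 1 would hybridize to several genomic locations, which is the situation the copy number information is meant to flag.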

Page 38

CEAS: Cis-regulatory Element Annotation System

• Data Analysis Button for Biologists

http://ceas.cbi.pku.edu.cn

By Xiaole Shirley Liu at Harvard

Page 39

CisGenome (Ji H. et al. Nature Biotechnology, 2008)

Graphic User Interface

CisGenome Browser

Core Data Analysis Programs

Page 40

Other applications of tiling arrays

• Transcriptome mapping

• MeDIP-chip

• DNase-chip

• Nucleosome localization

• Array CGH and copy number variation