analysing chip-seq data - babraham bioinf...composition / motif analysis •composition –good...
TRANSCRIPT
![Page 2: Analysing ChIP-Seq Data - Babraham Bioinf...Composition / Motif Analysis •Composition –Good place to start, can provide either biological or technical insight –See if hits (up](https://reader033.vdocument.in/reader033/viewer/2022050323/5f7d5f13c2e742693d62675c/html5/thumbnails/2.jpg)
Data Creation and Processing
Starting DNA Fragmented DNA ChIPped DNA
Sequence LibraryFastQ Sequence
FileMapped BAM File
Filtered BAM File Exploration Analysis
![Page 3: Analysing ChIP-Seq Data - Babraham Bioinf...Composition / Motif Analysis •Composition –Good place to start, can provide either biological or technical insight –See if hits (up](https://reader033.vdocument.in/reader033/viewer/2022050323/5f7d5f13c2e742693d62675c/html5/thumbnails/3.jpg)
Steps in Analysis
• Define enriched regions– Based around features– De-novo peak prediction
• Quantitate– Corrections and Normalisation
• Compare– Categorical– Quantitative
![Page 4: Analysing ChIP-Seq Data - Babraham Bioinf...Composition / Motif Analysis •Composition –Good place to start, can provide either biological or technical insight –See if hits (up](https://reader033.vdocument.in/reader033/viewer/2022050323/5f7d5f13c2e742693d62675c/html5/thumbnails/4.jpg)
Defining Regions - Should I peak call?• You need a single set of reference positions for analysis
– Peak calling to define solely from the data
– Feature based measurements if your exploration showed linkage to features
• If exploration showed strong and reasonably complete feature association then this is a good option– No worries about missing weaker peaks
– More complete background (get both enriched and unenriched)
• If no feature linkage then peak call– Only looking at enriched regions
– More difficult to do functional interpretation later on
![Page 5: Analysing ChIP-Seq Data - Babraham Bioinf...Composition / Motif Analysis •Composition –Good place to start, can provide either biological or technical insight –See if hits (up](https://reader033.vdocument.in/reader033/viewer/2022050323/5f7d5f13c2e742693d62675c/html5/thumbnails/5.jpg)
How Peak Callers Work (MACS)
Optimise the starting data
Build a background model
Test sliding windows
Apply per-site adjustment
Report
![Page 6: Analysing ChIP-Seq Data - Babraham Bioinf...Composition / Motif Analysis •Composition –Good place to start, can provide either biological or technical insight –See if hits (up](https://reader033.vdocument.in/reader033/viewer/2022050323/5f7d5f13c2e742693d62675c/html5/thumbnails/6.jpg)
Optimise the starting data
• Correct the for/rev offset
• Deduplicate
![Page 7: Analysing ChIP-Seq Data - Babraham Bioinf...Composition / Motif Analysis •Composition –Good place to start, can provide either biological or technical insight –See if hits (up](https://reader033.vdocument.in/reader033/viewer/2022050323/5f7d5f13c2e742693d62675c/html5/thumbnails/7.jpg)
Build a background model
Observed
Lambda value
![Page 8: Analysing ChIP-Seq Data - Babraham Bioinf...Composition / Motif Analysis •Composition –Good place to start, can provide either biological or technical insight –See if hits (up](https://reader033.vdocument.in/reader033/viewer/2022050323/5f7d5f13c2e742693d62675c/html5/thumbnails/8.jpg)
Build a background model
Lambda value
Model
Critical p-value (n=18)
![Page 9: Analysing ChIP-Seq Data - Babraham Bioinf...Composition / Motif Analysis •Composition –Good place to start, can provide either biological or technical insight –See if hits (up](https://reader033.vdocument.in/reader033/viewer/2022050323/5f7d5f13c2e742693d62675c/html5/thumbnails/9.jpg)
Build a background model
Lambda value
Observed + Model
Critical p-value (n=18)
![Page 10: Analysing ChIP-Seq Data - Babraham Bioinf...Composition / Motif Analysis •Composition –Good place to start, can provide either biological or technical insight –See if hits (up](https://reader033.vdocument.in/reader033/viewer/2022050323/5f7d5f13c2e742693d62675c/html5/thumbnails/10.jpg)
Test Sliding Windows
• Generally use half of the library fragment size
• Windows whose count exceeds the critical value are kept
• Merge adjacent windows over the critical value to form peaks
• Generates candidate (not final) peak set
![Page 11: Analysing ChIP-Seq Data - Babraham Bioinf...Composition / Motif Analysis •Composition –Good place to start, can provide either biological or technical insight –See if hits (up](https://reader033.vdocument.in/reader033/viewer/2022050323/5f7d5f13c2e742693d62675c/html5/thumbnails/11.jpg)
Correct for local variation
Critical value
Generate localised model if input density is higher than the global value
Most pessimistic p-value is kept
![Page 12: Analysing ChIP-Seq Data - Babraham Bioinf...Composition / Motif Analysis •Composition –Good place to start, can provide either biological or technical insight –See if hits (up](https://reader033.vdocument.in/reader033/viewer/2022050323/5f7d5f13c2e742693d62675c/html5/thumbnails/12.jpg)
Broad Peaks• Added in MACS2 – suitable where larger regions with variable
enrichment exist
• Uses two thresholds for enrichment
![Page 13: Analysing ChIP-Seq Data - Babraham Bioinf...Composition / Motif Analysis •Composition –Good place to start, can provide either biological or technical insight –See if hits (up](https://reader033.vdocument.in/reader033/viewer/2022050323/5f7d5f13c2e742693d62675c/html5/thumbnails/13.jpg)
How should you apply peak callers
• Multiple ChIPs (over multiple conditions)
• Multiple Inputs
![Page 14: Analysing ChIP-Seq Data - Babraham Bioinf...Composition / Motif Analysis •Composition –Good place to start, can provide either biological or technical insight –See if hits (up](https://reader033.vdocument.in/reader033/viewer/2022050323/5f7d5f13c2e742693d62675c/html5/thumbnails/14.jpg)
Multiple Inputs
• Input variability generally reflects general trends
– Mapability– Genome Assembly– Fragmentation biases
• Normally best to merge all inputs to one common reference input
![Page 15: Analysing ChIP-Seq Data - Babraham Bioinf...Composition / Motif Analysis •Composition –Good place to start, can provide either biological or technical insight –See if hits (up](https://reader033.vdocument.in/reader033/viewer/2022050323/5f7d5f13c2e742693d62675c/html5/thumbnails/15.jpg)
Multiple ChIPs
WT ChIP 1
WT ChIP 2
KO ChIP 1
KO ChIP 2
BAM Files Peak Sets
WT ChIP 1+
WT ChIP 2+
KO ChIP 1+
KO ChIP 2
Peaks
WT ChIP 1+
WT ChIP 2+
KO ChIP 1+
KO ChIP 2
![Page 16: Analysing ChIP-Seq Data - Babraham Bioinf...Composition / Motif Analysis •Composition –Good place to start, can provide either biological or technical insight –See if hits (up](https://reader033.vdocument.in/reader033/viewer/2022050323/5f7d5f13c2e742693d62675c/html5/thumbnails/16.jpg)
Multiple ChIPs
WT ChIP 1
WT ChIP 2
KO ChIP 1
KO ChIP 2
BAM Files
WT Peaks 1
WT Peaks 2
KO Peaks 1
KO Peaks 2
Peak Sets
WT Peaks 1And
WT Peaks 2
KO Peaks 1And
KO Peaks 2
WT Peaks 1And
WT Peaks 2
Or
KO Peaks 1And
KO Peaks 2
![Page 17: Analysing ChIP-Seq Data - Babraham Bioinf...Composition / Motif Analysis •Composition –Good place to start, can provide either biological or technical insight –See if hits (up](https://reader033.vdocument.in/reader033/viewer/2022050323/5f7d5f13c2e742693d62675c/html5/thumbnails/17.jpg)
Why isn't a peak called
Fewer peaks are called by just sub-sampling the same data
![Page 18: Analysing ChIP-Seq Data - Babraham Bioinf...Composition / Motif Analysis •Composition –Good place to start, can provide either biological or technical insight –See if hits (up](https://reader033.vdocument.in/reader033/viewer/2022050323/5f7d5f13c2e742693d62675c/html5/thumbnails/18.jpg)
Why isn't a peak called
With no input the region around the peak is used to model the background. Broader peaks can be missed
For ATAC data (no input) you should skip the rescoring step
![Page 19: Analysing ChIP-Seq Data - Babraham Bioinf...Composition / Motif Analysis •Composition –Good place to start, can provide either biological or technical insight –See if hits (up](https://reader033.vdocument.in/reader033/viewer/2022050323/5f7d5f13c2e742693d62675c/html5/thumbnails/19.jpg)
Reporting on Peak sets
• Don’t make claims based solely on the number of peaks (“there were more WT peaks than KO peaks” for example)
• Don’t make claims based on regions being peaks in 1 set but not another (there were 465 peaks which were specific to KO)
• It is OK to make statements about overlap (there were 794 peaks which were common to WT and KO)
• You have to address differential enrichment problems quantitatively
![Page 20: Analysing ChIP-Seq Data - Babraham Bioinf...Composition / Motif Analysis •Composition –Good place to start, can provide either biological or technical insight –See if hits (up](https://reader033.vdocument.in/reader033/viewer/2022050323/5f7d5f13c2e742693d62675c/html5/thumbnails/20.jpg)
Quantitating ChIP data for analysis
• Quantitation of ChIP is not a simple problem
• Can start with something simple but in many cases you will need to refine this
• Simple linear, globally corrected counts are a good place to start
![Page 21: Analysing ChIP-Seq Data - Babraham Bioinf...Composition / Motif Analysis •Composition –Good place to start, can provide either biological or technical insight –See if hits (up](https://reader033.vdocument.in/reader033/viewer/2022050323/5f7d5f13c2e742693d62675c/html5/thumbnails/21.jpg)
Normalising to input?
• Do you have significant variability in your input read density
– If not then there's no need to normalise
– Input outliers should be removed, not corrected against
• Do you have ChIP signal which is correlated with the input level?
– Only if you see correlation between ChIP and input should you normalise
– Most of the time this isn't the case, even where the input does vary
![Page 22: Analysing ChIP-Seq Data - Babraham Bioinf...Composition / Motif Analysis •Composition –Good place to start, can provide either biological or technical insight –See if hits (up](https://reader033.vdocument.in/reader033/viewer/2022050323/5f7d5f13c2e742693d62675c/html5/thumbnails/22.jpg)
See if the input has an influence
For truly enriched regions the input level is not predictive of the ChIP level.
Normalising to input would make things worse here.
![Page 23: Analysing ChIP-Seq Data - Babraham Bioinf...Composition / Motif Analysis •Composition –Good place to start, can provide either biological or technical insight –See if hits (up](https://reader033.vdocument.in/reader033/viewer/2022050323/5f7d5f13c2e742693d62675c/html5/thumbnails/23.jpg)
Why not just always do "fold over input"?
• Inputs are generally poorly measured
– Poor coverage compared to ChIP
• Fold change values are more influenced by input than ChIP
• Biases in the input are smaller than enrichment power of the antibody
Region Input ChIP ChIP/Input
Region A 5 200 40
Region B 2 200 100
![Page 24: Analysing ChIP-Seq Data - Babraham Bioinf...Composition / Motif Analysis •Composition –Good place to start, can provide either biological or technical insight –See if hits (up](https://reader033.vdocument.in/reader033/viewer/2022050323/5f7d5f13c2e742693d62675c/html5/thumbnails/24.jpg)
Hits with increased enrichment
Hits with decreased enrichment
![Page 25: Analysing ChIP-Seq Data - Babraham Bioinf...Composition / Motif Analysis •Composition –Good place to start, can provide either biological or technical insight –See if hits (up](https://reader033.vdocument.in/reader033/viewer/2022050323/5f7d5f13c2e742693d62675c/html5/thumbnails/25.jpg)
Evaluating and Normalising Enrichment
Good Enrichment
Worse Enrichment
Similar Enrichment Small Difference Large Difference
![Page 26: Analysing ChIP-Seq Data - Babraham Bioinf...Composition / Motif Analysis •Composition –Good place to start, can provide either biological or technical insight –See if hits (up](https://reader033.vdocument.in/reader033/viewer/2022050323/5f7d5f13c2e742693d62675c/html5/thumbnails/26.jpg)
Evaluating and Normalising Enrichment
No
rmal
ise
d R
ead
Co
un
t
Percentile through data
![Page 27: Analysing ChIP-Seq Data - Babraham Bioinf...Composition / Motif Analysis •Composition –Good place to start, can provide either biological or technical insight –See if hits (up](https://reader033.vdocument.in/reader033/viewer/2022050323/5f7d5f13c2e742693d62675c/html5/thumbnails/27.jpg)
Normalising Enrichment
• Simple– Single point of reference (eg size factor normalisation)– Works for small differences, not for large ones
• Enrichment specific– Two points of reference
• Low percentile to reflect baseline• High percentile to reflect close to saturation• Add to match first, Multiply to match second
• Quick and Dirty– Quantile normalisation to force a common distribution
• Don't normalise the input or use it to calculate distributions!
![Page 28: Analysing ChIP-Seq Data - Babraham Bioinf...Composition / Motif Analysis •Composition –Good place to start, can provide either biological or technical insight –See if hits (up](https://reader033.vdocument.in/reader033/viewer/2022050323/5f7d5f13c2e742693d62675c/html5/thumbnails/28.jpg)
Normalising Enrichment
Size FactorSingle point of comparison
Works well for small differencesInsufficient for large differencesAllows the use of count based stats
Small Difference
Large Difference
Large Difference
Small DifferenceQuantileForces distributions to be identicalCorrects any differences, easy to apply
EnrichmentTwo points of comparison
Corrects for larger differencesNot directly compatible with count based stats
![Page 29: Analysing ChIP-Seq Data - Babraham Bioinf...Composition / Motif Analysis •Composition –Good place to start, can provide either biological or technical insight –See if hits (up](https://reader033.vdocument.in/reader033/viewer/2022050323/5f7d5f13c2e742693d62675c/html5/thumbnails/29.jpg)
Normalising Enrichment
![Page 30: Analysing ChIP-Seq Data - Babraham Bioinf...Composition / Motif Analysis •Composition –Good place to start, can provide either biological or technical insight –See if hits (up](https://reader033.vdocument.in/reader033/viewer/2022050323/5f7d5f13c2e742693d62675c/html5/thumbnails/30.jpg)
Look for systematic enrichment changes (real biology!!)
Use replicates to build a case for a biological rather than technical difference
![Page 31: Analysing ChIP-Seq Data - Babraham Bioinf...Composition / Motif Analysis •Composition –Good place to start, can provide either biological or technical insight –See if hits (up](https://reader033.vdocument.in/reader033/viewer/2022050323/5f7d5f13c2e742693d62675c/html5/thumbnails/31.jpg)
Checking Normalisation
Before Normalisation After Normalisation
![Page 32: Analysing ChIP-Seq Data - Babraham Bioinf...Composition / Motif Analysis •Composition –Good place to start, can provide either biological or technical insight –See if hits (up](https://reader033.vdocument.in/reader033/viewer/2022050323/5f7d5f13c2e742693d62675c/html5/thumbnails/32.jpg)
Differential enrichment analysis
• Needs to be quantitative
• Needs to operate on non-deduplicated data
• Two statistical options
– Count based stats on raw uncorrected counts
• DESeq
• EdgeR
– Continuous quantitation stats on normalised enrichment values
• LIMMA
![Page 33: Analysing ChIP-Seq Data - Babraham Bioinf...Composition / Motif Analysis •Composition –Good place to start, can provide either biological or technical insight –See if hits (up](https://reader033.vdocument.in/reader033/viewer/2022050323/5f7d5f13c2e742693d62675c/html5/thumbnails/33.jpg)
Which statistic to pick?
• If enrichment is roughly similar
– Raw counts, then DESeq/EdgeR
• If there are large differences in enrichment
– Enrichment normalisation
– LIMMA statistics
![Page 34: Analysing ChIP-Seq Data - Babraham Bioinf...Composition / Motif Analysis •Composition –Good place to start, can provide either biological or technical insight –See if hits (up](https://reader033.vdocument.in/reader033/viewer/2022050323/5f7d5f13c2e742693d62675c/html5/thumbnails/34.jpg)
Visualisation of hits
• Map onto scatterplot for simple verification
• Normally makes sense to use log transformed counts
• Look at the data underneath candidates you make specific claims about
![Page 35: Analysing ChIP-Seq Data - Babraham Bioinf...Composition / Motif Analysis •Composition –Good place to start, can provide either biological or technical insight –See if hits (up](https://reader033.vdocument.in/reader033/viewer/2022050323/5f7d5f13c2e742693d62675c/html5/thumbnails/35.jpg)
Hit validation
• Look whether hits make sense
• Look at points which change but were not selected
• Log scale should be used
• Keep the context of non-hits
![Page 36: Analysing ChIP-Seq Data - Babraham Bioinf...Composition / Motif Analysis •Composition –Good place to start, can provide either biological or technical insight –See if hits (up](https://reader033.vdocument.in/reader033/viewer/2022050323/5f7d5f13c2e742693d62675c/html5/thumbnails/36.jpg)
Hit validationDirectionality
• Most ChIP enrichments are not strand-specific
• Should expect to see enrichment on both strands
![Page 37: Analysing ChIP-Seq Data - Babraham Bioinf...Composition / Motif Analysis •Composition –Good place to start, can provide either biological or technical insight –See if hits (up](https://reader033.vdocument.in/reader033/viewer/2022050323/5f7d5f13c2e742693d62675c/html5/thumbnails/37.jpg)
Hit validationHeatmap
• You should be able to see consistency between replicates
![Page 38: Analysing ChIP-Seq Data - Babraham Bioinf...Composition / Motif Analysis •Composition –Good place to start, can provide either biological or technical insight –See if hits (up](https://reader033.vdocument.in/reader033/viewer/2022050323/5f7d5f13c2e742693d62675c/html5/thumbnails/38.jpg)
Experimental Design
![Page 39: Analysing ChIP-Seq Data - Babraham Bioinf...Composition / Motif Analysis •Composition –Good place to start, can provide either biological or technical insight –See if hits (up](https://reader033.vdocument.in/reader033/viewer/2022050323/5f7d5f13c2e742693d62675c/html5/thumbnails/39.jpg)
Experimental Design Considerations
• All normal rules apply– Think about sources of variation
– Don't confound variables
– Think about what batch effects might exist
• Test your antibody well before starting– By far the biggest factor in success
– Good performance on Western / in-situ is not a guarantee, but it's a good start
![Page 40: Analysing ChIP-Seq Data - Babraham Bioinf...Composition / Motif Analysis •Composition –Good place to start, can provide either biological or technical insight –See if hits (up](https://reader033.vdocument.in/reader033/viewer/2022050323/5f7d5f13c2e742693d62675c/html5/thumbnails/40.jpg)
Experimental Design Considerations
• Number of replicates
– Lots of studies use 2 replicates
– Fine for just finding binding sites (motif analysis)
– Not really enough for differential binding
• Huge reliance on 'information sharing'
• No accurate measurement of variance per peak
• Potentially over-predicts differential binding
– Should think about likely levels of variability and make replicates to match
![Page 41: Analysing ChIP-Seq Data - Babraham Bioinf...Composition / Motif Analysis •Composition –Good place to start, can provide either biological or technical insight –See if hits (up](https://reader033.vdocument.in/reader033/viewer/2022050323/5f7d5f13c2e742693d62675c/html5/thumbnails/41.jpg)
Experimental Design Considerations
• Amount of sequencing– Can be difficult to predict
– Depends on• Genome size
• Proportion of genome which is enriched
• Efficiency of enrichment
– ENCODE standard is ~20M reads per sample• Can get away with fewer (K4me3 for example)
• Will need more for some marks (H3 for example)
• Sequencing depth will affect ability to detect changes
![Page 42: Analysing ChIP-Seq Data - Babraham Bioinf...Composition / Motif Analysis •Composition –Good place to start, can provide either biological or technical insight –See if hits (up](https://reader033.vdocument.in/reader033/viewer/2022050323/5f7d5f13c2e742693d62675c/html5/thumbnails/42.jpg)
Experimental Design Considerations
• Type of sequencing
– Single end is fine for most applications
• ATAC-Seq can require paired end for some analyses
– Moderate read length is required
• Can map anywhere in the genome
• 50bp is probably OK. 100bp would be preferable
![Page 43: Analysing ChIP-Seq Data - Babraham Bioinf...Composition / Motif Analysis •Composition –Good place to start, can provide either biological or technical insight –See if hits (up](https://reader033.vdocument.in/reader033/viewer/2022050323/5f7d5f13c2e742693d62675c/html5/thumbnails/43.jpg)
Material for the Course
• All
– Slides
– Exercises
– Data
– Virtual Machine Images
Are available at http://www.bioinformatics.babraham.ac.uk/training.html
![Page 44: Analysing ChIP-Seq Data - Babraham Bioinf...Composition / Motif Analysis •Composition –Good place to start, can provide either biological or technical insight –See if hits (up](https://reader033.vdocument.in/reader033/viewer/2022050323/5f7d5f13c2e742693d62675c/html5/thumbnails/44.jpg)
Downstream Analyses
![Page 45: Analysing ChIP-Seq Data - Babraham Bioinf...Composition / Motif Analysis •Composition –Good place to start, can provide either biological or technical insight –See if hits (up](https://reader033.vdocument.in/reader033/viewer/2022050323/5f7d5f13c2e742693d62675c/html5/thumbnails/45.jpg)
Composition / Motif Analysis
• Composition
– Good place to start, can provide either biological or technical insight
– See if hits (up vs down) cluster based on the underlying sequence composition
• Motifs
– Great for defining putative binding sites
– Interesting to do sensitivity check
– Can do differential motif calling (for hit/non-hit)
![Page 46: Analysing ChIP-Seq Data - Babraham Bioinf...Composition / Motif Analysis •Composition –Good place to start, can provide either biological or technical insight –See if hits (up](https://reader033.vdocument.in/reader033/viewer/2022050323/5f7d5f13c2e742693d62675c/html5/thumbnails/46.jpg)
Compter - composition analysis
www.bioinformatics.babraham.ac.uk/projects/compter
![Page 47: Analysing ChIP-Seq Data - Babraham Bioinf...Composition / Motif Analysis •Composition –Good place to start, can provide either biological or technical insight –See if hits (up](https://reader033.vdocument.in/reader033/viewer/2022050323/5f7d5f13c2e742693d62675c/html5/thumbnails/47.jpg)
MEME - Motif Analysis
![Page 48: Analysing ChIP-Seq Data - Babraham Bioinf...Composition / Motif Analysis •Composition –Good place to start, can provide either biological or technical insight –See if hits (up](https://reader033.vdocument.in/reader033/viewer/2022050323/5f7d5f13c2e742693d62675c/html5/thumbnails/48.jpg)
Gene Ontology / Pathway
• Be careful how you relate hits to genes
– Really need to have a global link between peak positions and genes
– Random positions will give significant GO hits if you just use closest/overlapping genes