Introduction to Chromatin IP – sequencing (ChIP-seq) data analysis
Linköping, 21 April 2016
Agata Smialowska, BILS / NBIS, SciLifeLab, Stockholm University
Introduction to Bioinformatics using NGS data
Chromatin state and gene expression
PEV: position effect variegation in the Drosophila eye (nature.com)
Juxtaposition of eye colour genes with heterochromatin results in the “mottled” eye colouration (red and white).
Proteins that bind heterochromatin act to “spread” the silencing signal by providing a forward feedback loop.
Heterochromatin Protein 1; histone methyltransferase Su(var)3-9; H3K9 methylation
First observed by H. Muller, 1930
Chromatin immunoprecipitation
Figure source: R&D Systems
Applications
General transcription machinery
Applications
Promoter-associated transcription factors
Applications
Distal enhancers
Applications
Histone modifications and variants
Activation states
Co-factors
Workflow of a ChIP-seq study
design study → obtain input chromatin → perform precipitation → construct library → sequence library → bioinformatic analysis
ChIP-seq workflow
Liu, Pott and Huss, BMC Biology 2010
Critical factors
• Antibody selection
• Library cloning and sequencing
• Algorithm for peak detection
• Proper control sample (input chromatin or mock IP)
• Reproducibility in chromatin fragmentation
• Cross-linker choice
• Enough material and biological replicates
Experiment design
• Sound experimental design: replication, randomisation and blocking (R.A. Fisher, 1935)
• In the absence of a proper design, it is essentially impossible to partition biological variation from technical variation
• Sequencing depth: depends on the structure of the signal; cannot be linearly scaled to genome size
• Single- vs. paired-end reads: PE improves read mapping confidence and gives a direct measure of fragment size, which otherwise has to be modelled or estimated
Experiment design
Ideal design:
• Each sample has a matched input
• Input sequenced to a comparable depth as the IP sample
• ≥2 biological replicates for site identification
• ≥3 biological replicates for differential binding
[Figure: schematics of input and ChIP replicate designs (input library/sequencing paired with ChIP replicates), marked ✓ or X; one panel contrasts an under-sequenced input with a well-sequenced input]
Importance of biological replicates
• Technical replicates are generally a waste of time and money
• Many studies do not account for batch effects: i. time, ii. origin
So, if you care about reproducibility:
[Figure: replicate designs marked ✓ or X; sample → libraries → sequencing (technical replicates); samples by origin; experiments repeated over time (experiment 1, experiment 2, experiment 3, …), each followed by libraries, sequencing, etc.]
Importance of sequencing depth
If you need to pool your data, then it is under-sequenced
[Figure: panels comparing pooled data with under-sequenced data (X) and pooled data with actual replicates (✓)]
Sequencing depth depends on data type
Signal types:
• Point-source: transcription factors
• Mixed and broad signal: chromatin remodellers, histone marks, RNA polymerase II
Sequencing depth (human):
• TF: 20 M
• Mixed and broad signal: ? (no clear guidelines for mixed and broad types of peaks)
• H3K4me3: 25 M
• H3K36me3: 35 M
• H3K27me3: 40 M
• H3K9me3: >55 M
Source: The ENCODE consortium; Jung et al, NAR 2014
The ENCODE (Encyclopedia of DNA Elements) Consortium and the Roadmap Epigenomics Consortium are a vast resource of various kinds of functional genomics data (as well as RNA-seq data).
• ChIP – sequencing: introduction from a bioinformatics point of view
• Principles of analysis of ChIP-seq data
• ChIP-seq: downstream analyses
• Resources
• Exercise overview
Chromatin = DNA + proteins
Park, Nature Rev Genetics, 2009
Data analysis
Profile of protein binding sites vs. input
Park, Nature Rev Genetics, 2009
Chromator (Drosophila): a protein that binds methylated histones
Workflow of a ChIP-seq study
design study → obtain input chromatin → perform precipitation → construct library → sequence library → library quality control → filter sequences → align sequences → filter alignments → identify peaks / regions of enrichment → assess data quality → understand the data / results → downstream analyses
Iterative process
Two questions to address
• 1. Did the ChIP part of the ChIP-seq experiment work? Was the enrichment successful?
• 2. Where are the binding sites (of the protein of interest)?
Word of caution!
ChIP-seq experiments are more unpredictable than RNA-seq!
Error sources:
• chromatin structure
• PCR over-amplification
• non-specific antibody
• other things?
ChIP-seq QC: did the ChIP work?
• 1. Inspect the signal (mapped reads, coverage profiles) in a genome browser
• 2. Compute peak-independent quality metrics (cross-correlation, cumulative enrichment)
• 3. Assess replicate consistency (correlations between replicates of the same condition)
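Replicate consistency (point 3) can be illustrated with a correlation of read counts in matching genomic bins. The following is a minimal sketch, assuming reads have already been counted per bin; the counts and the choice of the Pearson coefficient are illustrative assumptions, not something the slides prescribe.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length (>= 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical read counts in the same genomic bins for two replicates.
rep1 = [5, 40, 7, 3, 55, 6]
rep2 = [4, 38, 9, 2, 60, 5]
print(round(pearson(rep1, rep2), 3))
```

Replicates of the same condition would be expected to correlate more strongly with each other than with samples from other conditions.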
• tag density distribution
• reproducibility
• similarity of coverage
• signal at known sites
• …
Spotting inconsistencies:
• confounding factors
• under-sequenced libraries
• …
How do I know my data is of good quality?
Marinov et al, G3 2013
Library complexity
Quality control: tag uniqueness – library complexity metric
Sequence duplication level > 80% (low complexity library)
NRF: non-redundant fraction (of reads): proportion of unique tags / total tags
Less than 20% of reads should be duplicates for 10 million reads sequenced (ENCODE)
FastQC, Babraham Institute
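The NRF definition above can be sketched in a few lines. This assumes each mapped read is reduced to a (chromosome, 5′ position, strand) tuple; that representation and the toy reads are illustrative assumptions.

```python
def non_redundant_fraction(reads):
    """NRF = number of distinct (chrom, position, strand) tags / total reads."""
    reads = list(reads)
    if not reads:
        raise ValueError("no reads given")
    return len(set(reads)) / len(reads)

# Hypothetical mapped reads: two of the five share position and strand.
reads = [
    ("chr1", 100, "+"),
    ("chr1", 100, "+"),   # duplicate of the previous tag
    ("chr1", 100, "-"),   # same position, other strand: a distinct tag
    ("chr1", 250, "+"),
    ("chr2", 100, "+"),
]
nrf = non_redundant_fraction(reads)
print(nrf)  # 4 distinct tags / 5 reads = 0.8
```

Under this definition the duplicate fraction is 1 − NRF, so "less than 20% duplicates" corresponds to an NRF above 0.8.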
How do I know my data is of good quality?
Marinov et al, G3 2013
Objective (i.e. peak-independent) metrics to quantify enrichment in ChIP-seq; for TFs in mammalian systems:
• Normalised Strand Correlation (NSC)
• Relative Strand Correlation (RSC)
Large-scale quality analysis of published ChIP-seq data sets:
• 20% low quality
• 25% intermediate quality
• 30% of inputs have metrics similar to IPs
Strand cross-correlation
Carroll et al, Front Genet 2014
The correlation between the signal of the 5′ ends of reads on the (+) and (-) strands is assessed after successive shifts of the reads on the (+) strand, and the point of maximum correlation between the two strands is used as an estimate of fragment length.
[Figure: cross-correlation (y-axis) plotted against strand shift (x-axis)]
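The shifting procedure described on this slide can be sketched on toy data. The example below assumes per-base counts of read 5′ ends on each strand for one small region; the arrays, the helper names and the use of a Pearson coefficient over the overlapping positions are assumptions for illustration, not the implementation of any particular tool.

```python
import math

def pearson(x, y):
    """Pearson correlation; returns 0.0 if either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx > 0 and sy > 0 else 0.0

def strand_cross_correlation(plus, minus, max_shift):
    """Correlate (+) strand 5' end counts, shifted by d, with (-) strand counts."""
    profile = {}
    for d in range(max_shift + 1):
        # Shifting the (+) reads by d pairs plus[i] with minus[i + d].
        profile[d] = pearson(plus[:len(plus) - d], minus[d:])
    return profile

# Toy region with two binding sites; (-) tags lie 4 bp downstream of (+) tags.
plus = [0, 0, 5, 0, 0, 0, 0, 0, 0, 0, 0, 3, 0, 0, 0, 0, 0, 0, 0, 0]
minus = [0, 0, 0, 0, 0, 0, 5, 0, 0, 0, 0, 0, 0, 0, 0, 3, 0, 0, 0, 0]

profile = strand_cross_correlation(plus, minus, max_shift=8)
estimated_shift = max(profile, key=profile.get)
print(estimated_shift)  # 4
```

On real data the profile is computed genome-wide (or per chromosome) and the shift with the highest correlation serves as the fragment length estimate.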
Strand cross-correlation
Carroll et al, Front Genet 2014
$$\mathrm{NSC} = \frac{\text{Max CC value (fLen)}}{\text{Min CC}} \qquad \mathrm{RSC} = \frac{\text{Max CC} - \text{Min CC}}{\text{Phantom CC} - \text{Min CC}}$$
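The two ratios can be computed directly from a cross-correlation profile. This is a small sketch with made-up values: the profile, the shift chosen as fragment length and the shift chosen for the "phantom" peak (conventionally the read length) are all hypothetical inputs.

```python
def nsc_rsc(profile, fragment_shift, phantom_shift):
    """Return (NSC, RSC) from a {strand_shift: cross_correlation} profile."""
    cc_max = profile[fragment_shift]      # Max CC value at the fragment length
    cc_min = min(profile.values())        # Min CC over all shifts
    cc_phantom = profile[phantom_shift]   # CC at the phantom peak
    if cc_min == 0 or cc_phantom == cc_min:
        raise ValueError("ratios are undefined for this profile")
    nsc = cc_max / cc_min
    rsc = (cc_max - cc_min) / (cc_phantom - cc_min)
    return nsc, rsc

# Hypothetical cross-correlation values keyed by strand shift.
profile = {0: 0.200, 50: 0.210, 100: 0.215, 150: 0.225, 200: 0.212, 500: 0.202}
nsc, rsc = nsc_rsc(profile, fragment_shift=150, phantom_shift=50)
print(round(nsc, 3), round(rsc, 3))  # 1.125 2.5
```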
Cross-correlation plots
[Figure: five cross-correlation plots for ChIP and input samples; x-axis: strand-shift (−500 to 1500), y-axis: cross-correlation]
• ENCFF000OWMed.sorted.1.bam.picard.bam: strand-shift (105,455); NSC=1.14102, RSC=1.06452, Qtag=1
• ENCFF000PET.sorted.1.bam.picard.bam: strand-shift (100,265,245); NSC=1.01443, RSC=0.289702, Qtag=−1
• ENCFF000PMG.sorted.1.bam: strand-shift (130); NSC=1.28071, RSC=0.987276, Qtag=0
• ENCFF000PMJ.sorted.1.bam: strand-shift (125); NSC=1.21367, RSC=1.39752, Qtag=1
• ENCFF000PON.sorted.1.bam.picard.bam: strand-shift (90,200,210); NSC=1.0166, RSC=0.92739, Qtag=0
Panel labels, ChIP: very good enrichment; acceptable enrichment; poor enrichment, possibly undersequenced
Panel labels, input: no clustering (good input); read clustering (bad input)
Cumulative enrichment, aka “fingerprint”, is another metric for successful enrichment
http://deeptools.readthedocs.org
Diaz et al, Genome Biol 2012
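The idea behind a cumulative enrichment curve can be sketched as follows: rank genomic bins by read count and accumulate the fraction of all reads they contain. The bin counts below are invented, and this plain-Python function is only an illustration of the principle, not the deepTools implementation.

```python
def fingerprint(bin_counts):
    """Return (fraction_of_bins, cumulative_fraction_of_reads) points,
    with bins ranked from lowest to highest read count."""
    counts = sorted(bin_counts)
    total = sum(counts)
    if total == 0:
        raise ValueError("no reads in any bin")
    points = []
    running = 0
    for rank, count in enumerate(counts, start=1):
        running += count
        points.append((rank / len(counts), running / total))
    return points

# Hypothetical bin counts: an input-like sample with even coverage and
# a ChIP-like sample where one bin holds most of the reads.
input_like = [10, 10, 10, 10]
chip_like = [1, 1, 1, 37]

print(fingerprint(input_like))  # follows the diagonal
print(fingerprint(chip_like))   # stays low, then rises steeply at the end
```

An evenly covered sample follows the diagonal, while a well-enriched ChIP keeps most of its reads in a small fraction of bins, so its curve stays low and then rises steeply.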
Park, Nature Rev Genetics, 2009
Peak calling
Appropriate methodologies depend on data type
• Punctate signal (transcription factors): SPP, MACS2
• Mixed and broad signal (chromatin remodellers, histone marks, RNA polymerase II): -
This is an active area of algorithm development
Principle of peak detection
Symmetry in reads mapped to opposite DNA strands
Computation of enrichment model
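As an illustration of what an enrichment model can look like, the sketch below scores one candidate window with a Poisson upper-tail probability, using the control to set the expected count. The Poisson model, the function name and the read counts are assumptions made for this example; the slide does not name a specific model, and real peak callers add many refinements.

```python
import math

def poisson_upper_tail(k, lam):
    """P(X >= k) for X ~ Poisson(lam)."""
    if lam <= 0:
        raise ValueError("expected count must be positive")
    term = math.exp(-lam)      # P(X = 0)
    below_k = 0.0
    for i in range(k):
        below_k += term        # add P(X = i)
        term *= lam / (i + 1)  # advance to P(X = i + 1)
    return max(0.0, 1.0 - below_k)

# Hypothetical window: 30 ChIP reads observed where the depth-scaled
# control suggests 5 reads would be expected by chance.
p_value = poisson_upper_tail(30, 5.0)
print(p_value < 1e-10)  # True
```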
Pepke, 2009
Comparison of peak calling algorithms
Wilbanks 2010
Point-source vs. broad peak detection
Wilbanks 2010
• Sequence-specific binding (TFs)
• Distributed binding (histones, RNApol2)
Comparison of enriched regions detected by various algorithms
Jung 2014
55M human
“Hyper-chippable” regions
Carroll et al, Front Genet 2014
• DER – Duke Excluded Regions (11 repeat classes)
• UHS – Ultra High Signal (open chromatin)
• DAC – consensus excluded regions
Reads mapped to these regions should be filtered out prior to peak calling
Tracks available from UCSC for human, mouse, fly and worm
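Filtering reads against excluded regions amounts to an interval-overlap test. The sketch below assumes reads and excluded regions are given as half-open intervals per chromosome; the coordinates and function names are hypothetical, and in practice this step is usually done with an interval tool on BAM/BED files.

```python
def overlaps(start, end, regions):
    """True if the half-open interval [start, end) overlaps any region."""
    return any(start < r_end and r_start < end for r_start, r_end in regions)

def filter_reads(reads, excluded):
    """Keep reads (chrom, start, end) that do not overlap an excluded
    region on the same chromosome."""
    return [r for r in reads if not overlaps(r[1], r[2], excluded.get(r[0], []))]

# Hypothetical excluded region and reads.
excluded = {"chr1": [(1000, 2000)]}
reads = [
    ("chr1", 950, 1001),   # overlaps the region by one base: removed
    ("chr1", 2000, 2050),  # starts where the region ends: kept
    ("chr1", 500, 550),    # kept
    ("chr2", 1500, 1550),  # other chromosome: kept
]
kept = filter_reads(reads, excluded)
print(kept)
```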
Quality considerations
• ChIP-seq quality guidelines from the ENCODE project (relative strand cross-correlation, irreproducible discovery rate)
• Antibody validation
• Appropriate sequencing depth (depending on genome size and peak type). For the human genome and broad-source peaks, a minimum of 40-50 M reads is required.
• Experimental replication
• Fraction of reads in peaks (FRiP) > 1%
• Cross-correlation (correlation of the density of sequences aligned to opposite DNA strands after shifting by the fragment size)
• Experimental verification of known binding sites (and sites not bound as negative controls)
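The FRiP metric listed above is a simple ratio. A minimal sketch, assuming each read is represented by a single position and peaks are half-open intervals per chromosome (both are simplifying assumptions, as are the toy values):

```python
def frip(read_positions, peaks):
    """Fraction of reads whose position falls inside any called peak."""
    if not read_positions:
        raise ValueError("no reads given")
    in_peaks = 0
    for chrom, pos in read_positions:
        if any(start <= pos < end for start, end in peaks.get(chrom, [])):
            in_peaks += 1
    return in_peaks / len(read_positions)

# Hypothetical peaks and read positions.
peaks = {"chr1": [(100, 200)]}
read_positions = [("chr1", 150), ("chr1", 250), ("chr1", 199), ("chr2", 150)]
score = frip(read_positions, peaks)
print(score)  # 2 of 4 reads fall in the peak: 0.5
```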
ChIP-exo: improvement in binding site identification
Rhee and Pugh, Cell 2011
Other functional genomics techniques
Clifford et al, Nature Rev Genet, 2014
ChIP-seq downstream analyses
• Validation (wet lab)
• Downstream analysis
  – Motif discovery
  – Annotation
  – Integration of binding and expression data
  – Integration of various binding datasets
  – Differential binding
Peak annotation
Identification of nearest genomic features
• BEDtools
• BEDops
• PeakAnnotator
• CisGenome
• In R / Bioconductor: ChIPpeakAnno
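Finding the nearest genomic feature reduces to a minimum-distance search. The sketch below assumes features are reduced to single positions (for example transcription start sites) on the same chromosome as the peak; the gene names and coordinates are hypothetical, and the tools listed above additionally handle strand, feature extent and multiple chromosomes.

```python
def nearest_feature(peak_summit, features):
    """Return (name, signed distance) of the feature closest to the summit.

    features: list of (name, position) on the same chromosome as the peak.
    A positive distance means the summit lies at a higher coordinate.
    """
    if not features:
        raise ValueError("no features given")
    name, pos = min(features, key=lambda f: abs(f[1] - peak_summit))
    return name, peak_summit - pos

# Hypothetical transcription start sites on one chromosome.
tss = [("geneA", 1000), ("geneB", 5000)]
print(nearest_feature(1200, tss))  # ('geneA', 200)
print(nearest_feature(4000, tss))  # ('geneB', -1000)
```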
Motif detection
• Enrichment of known sequence motifs (CEAS, Transfac Match, HOMER, RSAT)
• De novo motif detection (MEME, CisFinder, HMS, DREME, ChIPMunk, HOMER, RSAT)
Enrichment of known motifs (HOMER):
Signal visualisation and interpretation
Binding profile of a TF in relation to the transcription start site
Tools: deepTools, ngs.plot, seqMiner
• Clustering
• Heatmaps
• Profiles
• Comparison of different datasets
Differential occupancy
• Use algorithms developed for differential expression: summarise reads mapped in peaks; normalisation; statistical testing; R environment
  – edgeR / csaw
  – DiffBind (implements several normalisation methods)
• Calculate enrichment in sliding windows
  – DROMPA
  – diffReps
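The "summarise reads in peaks, then normalise" steps can be illustrated with a toy calculation. The sketch below scales per-peak counts to counts per million and reports log2 fold changes; the counts are invented, library size is taken as the sum of the listed counts (a simplification), and no statistical testing is performed, which is what packages such as edgeR or DiffBind add.

```python
import math

def cpm(counts):
    """Scale raw per-peak counts to counts per million."""
    total = sum(counts)
    if total == 0:
        raise ValueError("library has no reads")
    return [c * 1e6 / total for c in counts]

def log2_fold_changes(cond_a, cond_b, pseudocount=1.0):
    """log2(B / A) per peak after CPM normalisation."""
    a, b = cpm(cond_a), cpm(cond_b)
    return [math.log2((y + pseudocount) / (x + pseudocount)) for x, y in zip(a, b)]

# Hypothetical read counts in three peaks for two conditions.
cond_a = [100, 300, 600]
cond_b = [200, 200, 600]
lfc = log2_fold_changes(cond_a, cond_b)
print([round(v, 2) for v in lfc])
```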
Where to obtain data?
The ENCODE project
www.encodeproject.org
• Encyclopedia of DNA Elements
• Identification of regulatory DNA elements in the human (and mouse) genome
• 240 human and 55 mouse DNA binding proteins
• 1464 human and 432 mouse samples
• RNA profiling, protein-DNA interaction, chromatin condensation, DNA methylation, …
• 2009 - ongoing
Human ACTB locus as seen in the UCSC Genome Browser
• Gene model
• Alternative transcripts
• Histone modifications
• Chromatin structure
• Transcription factor binding sites
• DNA conservation
• Single nucleotide polymorphisms (SNP)
• Repeats
The Epigenomics Roadmap Project
http://www.roadmapepigenomics.org/
• Reference human epigenomes
• DNA methylation, histone modifications, chromatin accessibility and small RNA transcripts
• Stem cells and primary ex vivo tissues
• 111 tissue and cell types
• 2,804 genome-wide datasets
Further reading
• Impact of artifact removal on ChIP quality metrics in ChIP-seq and ChIP-exo data. Carroll et al, Front. Genet. 2014
• Impact of sequencing depth in ChIP-seq experiments. Jung et al, NAR 2014
• ChIP-seq guidelines and practices of the ENCODE and modENCODE consortia. Landt et al, Genome Res. 2012
• http://genome.ucsc.edu/ENCODE/qualityMetrics.html#definitions
• https://www.encodeproject.org/data-standards
Bioconductor ChIP-seq resources
• General purpose tools
  – Rsubread (read mapping; not ideal for global alignment)
  – Rbowtie (global alignment)
  – GenomicRanges (tools for manipulating range data)
  – Rsamtools (SAM / BAM support)
  – htSeqTools (tools for NGS data; post-alignment QC)
  – chipseq (utilities for ChIP-seq analysis)
  – csaw (a pipeline for ChIP-seq analysis, including statistical analysis of differential occupancy)
• Peak calling
  – SPP
  – BayesPeak (HMM and Bayesian statistics)
  – MOSAiCS (MOdel-based one and two Sample Analysis and Inference for ChIP-Seq)
  – iSeq (hidden Ising models)
  – ChIPseqR (developed to analyse nucleosome positioning data)
• Quality control
  – ChIPQC
• Differential occupancy
  – edgeR
  – DESeq, DESeq2
  – DiffBind (compatible with objects used for ChIPQC; wrapper for DESeq and edgeR DE functions)
• Peak annotation
  – ChIPpeakAnno (annotating peaks with genome context information)
![Page 64: Introduc)on*to*Chroman*IP*– sequencing( ChIP7seq ... · Cross7correlaon*plots* −500 0 500 1000 1500 0.200 0.205 0.210 0.215 0.220 0.225 strand−shift (105,455) cross − correlation](https://reader034.vdocument.in/reader034/viewer/2022042415/5f302595a2733c160157be88/html5/thumbnails/64.jpg)
• ChIP-sequencing: introduction from a bioinformatics point of view
• Principles of analysis of ChIP-seq data
• ChIP-seq: downstream analyses
• Resources
• Exercise overview
![Page 65: Introduc)on*to*Chroman*IP*– sequencing( ChIP7seq ... · Cross7correlaon*plots* −500 0 500 1000 1500 0.200 0.205 0.210 0.215 0.220 0.225 strand−shift (105,455) cross − correlation](https://reader034.vdocument.in/reader034/viewer/2022042415/5f302595a2733c160157be88/html5/thumbnails/65.jpg)
Exercise
1. Quality control
2. Read preprocessing
3. Peak calling
4. Exploratory analysis (sample clustering)
5. Visualisation
6. Statistical analysis of differential occupancy
![Page 66: Introduc)on*to*Chroman*IP*– sequencing( ChIP7seq ... · Cross7correlaon*plots* −500 0 500 1000 1500 0.200 0.205 0.210 0.215 0.220 0.225 strand−shift (105,455) cross − correlation](https://reader034.vdocument.in/reader034/viewer/2022042415/5f302595a2733c160157be88/html5/thumbnails/66.jpg)
Did my ChIP work?
Cross-correlation; cumulative enrichment
[Figure: cross-correlation plot for ENCFF000PED.chr12.bam; x-axis: strand-shift (100); y-axis: cross-correlation; NSC=2.50193, RSC=1.87725, Qtag=2]
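A cross-correlation plot like the one on this slide can be illustrated with a small sketch. It assumes the profile is the Pearson correlation between per-base counts of read 5' ends on the (+) strand and on the (-) strand, evaluated after shifting one strand by k bases. The function names and the toy data are invented for illustration; this is not the code of any specific QC tool.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # correlation is undefined for a constant vector
    return cov / sqrt(var_x * var_y)


def strand_cross_correlation(fwd, rev, max_shift):
    """Correlate fwd[x] with rev[x + k] for every shift k in 0..max_shift."""
    if len(fwd) != len(rev):
        raise ValueError("strand vectors must have the same length")
    profile = {}
    for k in range(max_shift + 1):
        n = len(fwd) - k
        profile[k] = pearson(fwd[:n], rev[k:])
    return profile


# Toy region (hypothetical): three binding sites, with (+) reads starting
# 10 bp upstream and (-) reads starting 10 bp downstream of each site.
region_length = 200
fwd = [0] * region_length
rev = [0] * region_length
for site in (40, 90, 150):
    fwd[site - 10] += 5
    rev[site + 10] += 5

profile = strand_cross_correlation(fwd, rev, max_shift=50)
best_shift = max(profile, key=profile.get)
print(best_shift, round(profile[best_shift], 3))
```

In this toy case the profile peaks at the shift that separates the two strand clusters (20 bp). On real data the peak is broader and is usually read as an estimate of the fragment length.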
![Page 67: Introduc)on*to*Chroman*IP*– sequencing( ChIP7seq ... · Cross7correlaon*plots* −500 0 500 1000 1500 0.200 0.205 0.210 0.215 0.220 0.225 strand−shift (105,455) cross − correlation](https://reader034.vdocument.in/reader034/viewer/2022042415/5f302595a2733c160157be88/html5/thumbnails/67.jpg)
Exploratory analysis
• Clustering of libraries by reads mapped in bins, genome-wide (Spearman)
• Clustering of libraries by reads mapped in peaks (Pearson)
[Figure: two sample-clustering plots; sample labels: HeLa, Sknsh, HepG2, neural]
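The two clustering views differ only in what is counted (genome-wide bins or peaks) and in the correlation measure. The sketch below computes pairwise Pearson and Spearman correlation matrices from per-library count vectors, which is the input a clustering step would use. The library names and counts are invented, and Spearman is implemented here as Pearson on average ranks.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def correlation_matrix(samples, method):
    """Pairwise correlations between all libraries, as a nested dict."""
    names = list(samples)
    return {a: {b: method(samples[a], samples[b]) for b in names} for a in names}


# Hypothetical read counts per bin (or per peak) for three libraries.
counts = {
    "lib_A": [5, 40, 8, 60, 3, 22],
    "lib_B": [6, 80, 9, 130, 2, 30],
    "lib_C": [50, 4, 33, 7, 61, 9],
}

spearman_matrix = correlation_matrix(counts, spearman)
pearson_matrix = correlation_matrix(counts, pearson)
for name, row in spearman_matrix.items():
    print(name, {other: round(value, 2) for other, value in row.items()})
```

Here lib_A and lib_B rank the bins identically, so they would cluster together, while lib_C is anti-correlated with both. Rank-based correlation is less sensitive to a few very high-count bins, which may be one reason to prefer it for genome-wide bins.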
![Page 68: Introduc)on*to*Chroman*IP*– sequencing( ChIP7seq ... · Cross7correlaon*plots* −500 0 500 1000 1500 0.200 0.205 0.210 0.215 0.220 0.225 strand−shift (105,455) cross − correlation](https://reader034.vdocument.in/reader034/viewer/2022042415/5f302595a2733c160157be88/html5/thumbnails/68.jpg)
Binding profile around TSS
![Page 70: Introduc)on*to*Chroman*IP*– sequencing( ChIP7seq ... · Cross7correlaon*plots* −500 0 500 1000 1500 0.200 0.205 0.210 0.215 0.220 0.225 strand−shift (105,455) cross − correlation](https://reader034.vdocument.in/reader034/viewer/2022042415/5f302595a2733c160157be88/html5/thumbnails/70.jpg)
That's all for now, time to do some hands-on work
![Page 71: Introduc)on*to*Chroman*IP*– sequencing( ChIP7seq ... · Cross7correlaon*plots* −500 0 500 1000 1500 0.200 0.205 0.210 0.215 0.220 0.225 strand−shift (105,455) cross − correlation](https://reader034.vdocument.in/reader034/viewer/2022042415/5f302595a2733c160157be88/html5/thumbnails/71.jpg)
![Page 72: Introduc)on*to*Chroman*IP*– sequencing( ChIP7seq ... · Cross7correlaon*plots* −500 0 500 1000 1500 0.200 0.205 0.210 0.215 0.220 0.225 strand−shift (105,455) cross − correlation](https://reader034.vdocument.in/reader034/viewer/2022042415/5f302595a2733c160157be88/html5/thumbnails/72.jpg)
Library quality control and preprocessing
• FastQC / Prinseq
• Trim adapters if any adapter sequences are present in the reads (as determined by the QC)
• In some cases you will observe k-mer enrichment (especially in ChIP-exo data, a newer variation of ChIP-seq). This is not necessarily a problem if sequence duplication levels are low. However, it may indicate low library complexity, a warning sign that the enrichment in ChIP was not successful or that the libraries are over-amplified (often the latter is a consequence of the former).
![Page 73: Introduc)on*to*Chroman*IP*– sequencing( ChIP7seq ... · Cross7correlaon*plots* −500 0 500 1000 1500 0.200 0.205 0.210 0.215 0.220 0.225 strand−shift (105,455) cross − correlation](https://reader034.vdocument.in/reader034/viewer/2022042415/5f302595a2733c160157be88/html5/thumbnails/73.jpg)
Quality control: tag uniqueness (library complexity metric)
• Sequence duplication level > 70%: low-complexity library
• NRF, non-redundant fraction (of reads): number of unique tags / total number of tags
• ENCODE guideline: less than 20% of reads should be duplicates for 10 million reads sequenced (NRF of at least 0.8)
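As a minimal sketch of the NRF definition (unique tags divided by total tags), assuming a tag is identified by its chromosome, 5' position and strand; the function name and the toy tags are invented for illustration:

```python
def non_redundant_fraction(tags):
    """NRF = number of distinct tag positions / total number of tags."""
    tags = list(tags)
    if not tags:
        raise ValueError("no tags given")
    return len(set(tags)) / len(tags)


# Hypothetical mapped tags as (chromosome, 5' position, strand).
tags = [
    ("chr12", 100, "+"),
    ("chr12", 100, "+"),  # duplicate of the first tag
    ("chr12", 100, "-"),  # same position, other strand: counted as distinct
    ("chr12", 250, "+"),
    ("chr12", 300, "-"),
]

nrf = non_redundant_fraction(tags)
print(f"NRF = {nrf:.2f}, duplicates = {1 - nrf:.0%}")
```

With one duplicate among five tags the NRF is 0.8, which sits exactly at the 20% duplicate limit quoted above. Because duplication grows with sequencing depth, the value is only comparable between libraries at a similar read count.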
![Page 74: Introduc)on*to*Chroman*IP*– sequencing( ChIP7seq ... · Cross7correlaon*plots* −500 0 500 1000 1500 0.200 0.205 0.210 0.215 0.220 0.225 strand−shift (105,455) cross − correlation](https://reader034.vdocument.in/reader034/viewer/2022042415/5f302595a2733c160157be88/html5/thumbnails/74.jpg)
Mapping reads to the reference genome
• Choose the right reference: assembly version (the newest is not always the best) and type (primary assembly, or an assembly from individual chromosome sequences plus non-chromosomal contigs; not the top-level assembly). Choose the matching annotation file (GTF, GFF).
• Read mapping: global alignment
• Mappers (= aligners): Bowtie, BWA, BBMap, Novoalign, … (lots of tools are available)
• Visualise data in a genome browser
  – BAM files or tracks (wig, bedGraph, bigWig)
  – Local (IGV) or web-based (UCSC Genome Browser)
  – Data quality assessment
![Page 75: Introduc)on*to*Chroman*IP*– sequencing( ChIP7seq ... · Cross7correlaon*plots* −500 0 500 1000 1500 0.200 0.205 0.210 0.215 0.220 0.225 strand−shift (105,455) cross − correlation](https://reader034.vdocument.in/reader034/viewer/2022042415/5f302595a2733c160157be88/html5/thumbnails/75.jpg)
Cross-correlation profiles, RSC and NSC
• Metrics to quantify the fragment-length signal and the ratio of the fragment-length signal to the read-length signal
• Normalised Cross Correlation (NSC): NSC > 1.1 (higher values indicate more enrichment; 1 = no enrichment)

$$\mathrm{NSC} = \frac{CC(\text{fragment length})}{\min(CC)}$$

• Relative Cross Correlation (RSC), the ChIP-to-artifact signal: RSC > 0.8 (0 = no signal; < 1 low-quality ChIP; > 1 high enrichment)

$$\mathrm{RSC} = \frac{CC(\text{fragment length}) - \min(CC)}{CC(\text{read length}) - \min(CC)}$$

• TFs: fragment lengths are often greater than the size of the DNA binding event, so the distinct clustering of (+) and (-) reads around the site is very apparent
• Broad peaks: this clustering may be more diffuse (fragment length < peak)
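The two metrics can be computed directly from a cross-correlation profile: NSC is CC at the fragment length divided by the minimum CC, and RSC is (CC at the fragment length minus min CC) divided by (CC at the read length minus min CC). The sketch below assumes the profile is a mapping from strand shift to cross-correlation value. The profile values, the 36 bp read length and the function name are invented for illustration.

```python
def nsc_rsc(profile, fragment_shift, read_length_shift):
    """Return (NSC, RSC) from a {strand shift: cross-correlation} profile."""
    min_cc = min(profile.values())
    cc_fragment = profile[fragment_shift]
    cc_read = profile[read_length_shift]
    nsc = cc_fragment / min_cc
    rsc = (cc_fragment - min_cc) / (cc_read - min_cc)
    return nsc, rsc


# Hypothetical profile: a "phantom" peak at the read length (36) and a
# higher peak at the fragment length (100).
profile = {0: 0.12, 36: 0.16, 100: 0.25, 500: 0.11, 1000: 0.10}

nsc, rsc = nsc_rsc(profile, fragment_shift=100, read_length_shift=36)
print(f"NSC = {nsc:.2f}, RSC = {rsc:.2f}")
print("passes thresholds:", nsc > 1.1 and rsc > 0.8)
```

In this toy profile both values come out at 2.5, above the NSC > 1.1 and RSC > 0.8 thresholds. If the read-length peak were higher than the fragment-length peak, RSC would drop below 1, the low-quality range given above.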
![Page 76: Introduc)on*to*Chroman*IP*– sequencing( ChIP7seq ... · Cross7correlaon*plots* −500 0 500 1000 1500 0.200 0.205 0.210 0.215 0.220 0.225 strand−shift (105,455) cross − correlation](https://reader034.vdocument.in/reader034/viewer/2022042415/5f302595a2733c160157be88/html5/thumbnails/76.jpg)
Comparison of peak calling algorithms
[Figure: comparison of peak calling algorithms, from Wilbanks 2010]
[Figure: peak overlap (Ho et al, 2012); annotated values: > 50 % and 20 %]
![Page 77: Introduc)on*to*Chroman*IP*– sequencing( ChIP7seq ... · Cross7correlaon*plots* −500 0 500 1000 1500 0.200 0.205 0.210 0.215 0.220 0.225 strand−shift (105,455) cross − correlation](https://reader034.vdocument.in/reader034/viewer/2022042415/5f302595a2733c160157be88/html5/thumbnails/77.jpg)
Effect of sequencing depth on regions detected by various algorithms
Jung 2014
![Page 78: Introduc)on*to*Chroman*IP*– sequencing( ChIP7seq ... · Cross7correlaon*plots* −500 0 500 1000 1500 0.200 0.205 0.210 0.215 0.220 0.225 strand−shift (105,455) cross − correlation](https://reader034.vdocument.in/reader034/viewer/2022042415/5f302595a2733c160157be88/html5/thumbnails/78.jpg)
Fold enrichment = signal / background
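A minimal sketch of this ratio for a single region, assuming that signal is the ChIP read count and background is the control (input) read count scaled to the ChIP library size. The scaling step, the pseudocount and all names are assumptions made for illustration, not the definition used by any particular peak caller.

```python
def fold_enrichment(chip_count, control_count, chip_total, control_total,
                    pseudocount=1.0):
    """Fold enrichment = signal / background for one genomic region.

    The control count is scaled by the ratio of library sizes so that both
    counts refer to the same sequencing depth. The pseudocount (an
    assumption of this sketch) avoids division by zero in empty regions.
    """
    scale = chip_total / control_total
    signal = chip_count + pseudocount
    background = (control_count + pseudocount) * scale
    return signal / background


# Hypothetical region: 99 ChIP reads vs 9 control reads, equal library sizes.
same_depth = fold_enrichment(99, 9, chip_total=10_000_000,
                             control_total=10_000_000)

# Same region, but the control library was sequenced twice as deeply.
deeper_control = fold_enrichment(99, 19, chip_total=10_000_000,
                                 control_total=20_000_000)

print(same_depth, deeper_control)
```

Both cases give a 10-fold enrichment, which shows why the background has to be scaled to library size before the ratio is taken.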