chip%seq)analysis) · thechipseqera 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 pubmed hits...
TRANSCRIPT
![Page 1: ChIP%seq)analysis) · TheChIPseqera 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 Pubmed hits per year for "ChiP-Seq" 0 100 200 300 400 • JohnsonDS, Mortazavi)A,)Myers)RM,)WoldB(2007](https://reader033.vdocument.in/reader033/viewer/2022050509/5f99e277f1e42a799176935a/html5/thumbnails/1.jpg)
ChIP-‐seq analysis
Morgane Thomas-‐Chollier Samuel Collombet
Computa)onal systems biology -‐ IBENS
Denis Thieffry, Jacques van Helden and Carl Herrmann kindly shared some of their slides.
M2 – Computa8onal analysis of cis-‐regulatory sequences 2014/2015
![Page 2: ChIP%seq)analysis) · TheChIPseqera 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 Pubmed hits per year for "ChiP-Seq" 0 100 200 300 400 • JohnsonDS, Mortazavi)A,)Myers)RM,)WoldB(2007](https://reader033.vdocument.in/reader033/viewer/2022050509/5f99e277f1e42a799176935a/html5/thumbnails/2.jpg)
The ChIP-‐seq era
2005 2006 2007 2008 2009 2010 2011 2012 2013 2014
Pubmed hits per year for "ChiP-Seq"
0
100
200
300
400
• Johnson DS, Mortazavi A, Myers RM, Wold B (2007) Genome-‐wide mapping of in vivo protein– DNA interacTons. Science 316: 1497–1502.
• Barski A, Cuddapah S, Cui K, Roth TY, Schones DE, et al. (2007) High-‐resoluTon profiling of histone methylaTons in the human genome. Cell 129: 823–837.
• Robertson G, Hirst M, Bainbridge M, Bilenky M, Zhao Y, et al. (2007) Genome-‐wide profiles of STAT1 DNA associaTon using chromaTn immu-‐ noprecipitaTon and massively parallel sequenc-‐ ing. Nat Methods 4: 651–657.
• Mikkelsen TS, Ku M, Jaffe DB, Issac B, Lieber-‐ man E, et al. (2007) Genome-‐wide maps of chromaTn state in pluripotent and lineage-‐ commiced cells. Nature 448: 553–560.
![Page 3: ChIP%seq)analysis) · TheChIPseqera 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 Pubmed hits per year for "ChiP-Seq" 0 100 200 300 400 • JohnsonDS, Mortazavi)A,)Myers)RM,)WoldB(2007](https://reader033.vdocument.in/reader033/viewer/2022050509/5f99e277f1e42a799176935a/html5/thumbnails/3.jpg)
ChIP (=Chroma8n Immuno-‐Precipita8on)
in vivo experimental methods to iden8fy binding sites
differences in methods to detect the bound DNA -‐ small-‐scale: PCR / qPCR -‐ large-‐scale:
-‐ microarray = ChIP-‐on-‐chip -‐ sequencing = ChIP-‐seq
h9p://www.chip-‐an)bodies.com/
![Page 4: ChIP%seq)analysis) · TheChIPseqera 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 Pubmed hits per year for "ChiP-Seq" 0 100 200 300 400 • JohnsonDS, Mortazavi)A,)Myers)RM,)WoldB(2007](https://reader033.vdocument.in/reader033/viewer/2022050509/5f99e277f1e42a799176935a/html5/thumbnails/4.jpg)
• find all regions in the genome bound by • a specific transcrip8on factor • histones bearing a specific
modifica8on
• in a given experimental condi.on (cell type, developmental stage,...)
The obtain ChIP-‐seq profiles have different shapes, depending on the targeted protein
Park, Nature reviews 2009
ChIP-‐seq applica8on
![Page 5: ChIP%seq)analysis) · TheChIPseqera 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 Pubmed hits per year for "ChiP-Seq" 0 100 200 300 400 • JohnsonDS, Mortazavi)A,)Myers)RM,)WoldB(2007](https://reader033.vdocument.in/reader033/viewer/2022050509/5f99e277f1e42a799176935a/html5/thumbnails/5.jpg)
Adapted from Bardet et al, Nature Protocols, 2012
treatment
control (input)
Quality check
Processing steps Downstream analyses
Aim of the course : ChIP-‐seq analysis workflow
![Page 6: ChIP%seq)analysis) · TheChIPseqera 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 Pubmed hits per year for "ChiP-Seq" 0 100 200 300 400 • JohnsonDS, Mortazavi)A,)Myers)RM,)WoldB(2007](https://reader033.vdocument.in/reader033/viewer/2022050509/5f99e277f1e42a799176935a/html5/thumbnails/6.jpg)
Control: input
Input (control without IP)
![Page 7: ChIP%seq)analysis) · TheChIPseqera 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 Pubmed hits per year for "ChiP-Seq" 0 100 200 300 400 • JohnsonDS, Mortazavi)A,)Myers)RM,)WoldB(2007](https://reader033.vdocument.in/reader033/viewer/2022050509/5f99e277f1e42a799176935a/html5/thumbnails/7.jpg)
ChIP-seq dataset (=treatment)
background noise
signal
=
+ How do we estimate the noise ?
Modelling noise levels
![Page 8: ChIP%seq)analysis) · TheChIPseqera 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 Pubmed hits per year for "ChiP-Seq" 0 100 200 300 400 • JohnsonDS, Mortazavi)A,)Myers)RM,)WoldB(2007](https://reader033.vdocument.in/reader033/viewer/2022050509/5f99e277f1e42a799176935a/html5/thumbnails/8.jpg)
Modelling noise levels
● noise is not uniform (chromaTn conformaTon, local biases, mappability)
● input dataset is mandatory for reliable local esTmaTon ! (although some algorithms do not require it … :-‐( )
treatment
input ?
![Page 9: ChIP%seq)analysis) · TheChIPseqera 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 Pubmed hits per year for "ChiP-Seq" 0 100 200 300 400 • JohnsonDS, Mortazavi)A,)Myers)RM,)WoldB(2007](https://reader033.vdocument.in/reader033/viewer/2022050509/5f99e277f1e42a799176935a/html5/thumbnails/9.jpg)
From sequence reads to peaks
FASTQ
sequences (reads length 36 / 50 bp, single-‐end) from Illumina
experiment Input FASTQ
Quality check
![Page 10: ChIP%seq)analysis) · TheChIPseqera 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 Pubmed hits per year for "ChiP-Seq" 0 100 200 300 400 • JohnsonDS, Mortazavi)A,)Myers)RM,)WoldB(2007](https://reader033.vdocument.in/reader033/viewer/2022050509/5f99e277f1e42a799176935a/html5/thumbnails/10.jpg)
FASTQ format @SRR002012.1 Oct4:5:1:871:340 GGCGCACTTACACCCTACATCCATTG + IIIIG1?II;IIIII1IIII1%.I7I @SRR002012.2 Oct4:5:1:804:348 GTCTGCATTATCTACCAGCACTTCCC + IIIIIIIII'I2IIIII:)I2II3I0 @SRR002012.3 Oct4:5:1:767:334 GCTGTCTTCCCGCTGTTTTATCCCCC + III8IIIIIII3III6II%II*III3 @SRR002012.4 Oct4:5:1:805:329 GTAGTTTACCTGTTCATATGTTTCTG + IIIIIII9IIIIII?IIIIIIII7II
SSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSS..................................................... ..........................XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX...................... ...............................IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII...................... .................................JJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJ...................... !"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~ | | | | | | 33 59 64 73 104 126 0 40 S - Sanger Phred+33, raw reads typically (0, 40) X - Solexa Solexa+64, raw reads typically (-5, 40) I - Illumina 1.3+ Phred+64, raw reads typically (0, 40) J - Illumina 1.5+ Phred+64, raw reads typically (3, 40) with 0=unused, 1=unused, 2=Read Segment Quality Control Indicator (bold)
adapted from Wikipedia
>SRR002012.1 Oct4:5:1:871:340 GGCGCACTTACACCCTACATCCATTG >SRR002012.2 Oct4:5:1:804:348 GTCTGCATTATCTACCAGCACTTCCC >SRR002012.3 Oct4:5:1:767:334 GCTGTCTTCCCGCTGTTTTATCCCCC >SRR002012.4 Oct4:5:1:805:329 GTAGTTTACCTGTTCATATGTTTCTG
FASTA format 1 read = 4 lines
![Page 11: ChIP%seq)analysis) · TheChIPseqera 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 Pubmed hits per year for "ChiP-Seq" 0 100 200 300 400 • JohnsonDS, Mortazavi)A,)Myers)RM,)WoldB(2007](https://reader033.vdocument.in/reader033/viewer/2022050509/5f99e277f1e42a799176935a/html5/thumbnails/11.jpg)
FASTQ
sequences (reads length 36 / 50 bp, single-‐end) from Illumina
quality check FASTQC
experiment Input FASTQ
From sequence reads to peaks
![Page 12: ChIP%seq)analysis) · TheChIPseqera 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 Pubmed hits per year for "ChiP-Seq" 0 100 200 300 400 • JohnsonDS, Mortazavi)A,)Myers)RM,)WoldB(2007](https://reader033.vdocument.in/reader033/viewer/2022050509/5f99e277f1e42a799176935a/html5/thumbnails/12.jpg)
hcp://www.bioinformaTcs.bbsrc.ac.uk/projects/fastqc/ hcp://bioinfo-‐core.org/index.php/9th_Discussion-‐28_October_2010 hcp://bioinfo.cipf.es/courses/mda11/lib/exe/fetch.php?media=ngs_qc_tutorial_mda_val_2011.pdf
![Page 13: ChIP%seq)analysis) · TheChIPseqera 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 Pubmed hits per year for "ChiP-Seq" 0 100 200 300 400 • JohnsonDS, Mortazavi)A,)Myers)RM,)WoldB(2007](https://reader033.vdocument.in/reader033/viewer/2022050509/5f99e277f1e42a799176935a/html5/thumbnails/13.jpg)
modEncode Kni Drosophila
![Page 14: ChIP%seq)analysis) · TheChIPseqera 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 Pubmed hits per year for "ChiP-Seq" 0 100 200 300 400 • JohnsonDS, Mortazavi)A,)Myers)RM,)WoldB(2007](https://reader033.vdocument.in/reader033/viewer/2022050509/5f99e277f1e42a799176935a/html5/thumbnails/14.jpg)
From sequence reads to peaks
FASTQ
FASTQC
experiment Input FASTQ
FASTQ FASTQ
mapping
BowTe BWA
BED BAM SAM
BED BAM SAM
Source: h9p://trac.seqan.de
![Page 15: ChIP%seq)analysis) · TheChIPseqera 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 Pubmed hits per year for "ChiP-Seq" 0 100 200 300 400 • JohnsonDS, Mortazavi)A,)Myers)RM,)WoldB(2007](https://reader033.vdocument.in/reader033/viewer/2022050509/5f99e277f1e42a799176935a/html5/thumbnails/15.jpg)
From sequence reads to peaks
FASTQ
FASTQC
experiment Input FASTQ
FASTQ FASTQ
mapping
BowTe BWA
BED BAM SAM
BED BAM SAM
visualiza8on
TF
Input
WIG BEDGRAPH
![Page 16: ChIP%seq)analysis) · TheChIPseqera 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 Pubmed hits per year for "ChiP-Seq" 0 100 200 300 400 • JohnsonDS, Mortazavi)A,)Myers)RM,)WoldB(2007](https://reader033.vdocument.in/reader033/viewer/2022050509/5f99e277f1e42a799176935a/html5/thumbnails/16.jpg)
From sequence reads to peaks
FASTQ
FASTQC
experiment Input FASTQ
FASTQ FASTQ
mapping
BowTe BWA
BED BAM SAM
BED BAM SAM
visualiza8on
peak calling MACS
WIG BEDGRAPH
![Page 17: ChIP%seq)analysis) · TheChIPseqera 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 Pubmed hits per year for "ChiP-Seq" 0 100 200 300 400 • JohnsonDS, Mortazavi)A,)Myers)RM,)WoldB(2007](https://reader033.vdocument.in/reader033/viewer/2022050509/5f99e277f1e42a799176935a/html5/thumbnails/17.jpg)
From sequence reads to peaks
FASTQ
FASTQC
experiment Input FASTQ
FASTQ FASTQ
mapping
BowTe BWA
BED BAM SAM
BED BAM SAM
visualiza8on
peak calling MACS
BED
peak list
WIG BEDGRAPH
![Page 18: ChIP%seq)analysis) · TheChIPseqera 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 Pubmed hits per year for "ChiP-Seq" 0 100 200 300 400 • JohnsonDS, Mortazavi)A,)Myers)RM,)WoldB(2007](https://reader033.vdocument.in/reader033/viewer/2022050509/5f99e277f1e42a799176935a/html5/thumbnails/18.jpg)
mapping
peak-‐calling
Valouev Nat Methods (2008), Jothi, NAR (2008)
![Page 19: ChIP%seq)analysis) · TheChIPseqera 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 Pubmed hits per year for "ChiP-Seq" 0 100 200 300 400 • JohnsonDS, Mortazavi)A,)Myers)RM,)WoldB(2007](https://reader033.vdocument.in/reader033/viewer/2022050509/5f99e277f1e42a799176935a/html5/thumbnails/19.jpg)
bimodal enrichment pacern
1 – modelling the read shio size
2 – peak calling
1 : search high-quality paired peaks : separates their forward and reverse reads, and aligns them by the midpoint. The distance between the modes of the forward and reverse peaks in the alignment is defined as d, and MACS shifts all reads by d/2 toward the 3′ ends to better locate the precise binding sites. 2: uses the shift size to search for peaks, Poisson distribution to measure the p-value of each peak, and False Discovery Rate (FDR) calculation using the input data
Two steps strategy :
MACS to IdentifyPeaks from
ChIP-Seq Data
2.14.6
Supplement 34 Current Protocols in Bioinformatics
Table 2.14.4 MACS Output Files
File name Description
FoxA1 model.r An R script to produce a PDF image about theshifting size model based on ChIP-Seq data
FoxA1 peaks.xls A tabular file containing information about calledpeaks
FoxA1 peaks.bed A BED format file containing the peak locations
FoxA1 summits.bed A BED format file containing the summits locationsfor called peaks
FoxA1 negative peaks.xls A tabular file containing information about negativepeaks, which are called by swapping the ChIP-Seqand control channel. The file will only be generatedwhen control data are available.
FoxA1 MACS wiggle A directory containing compressed wiggle formatfiles, which are generated through the pileup ofshifted reads for each chromosome
peak model
forward tagsreverse tagsshifted tags
d =119
!600
0.0
!400 !200 0Distance to the middle
200 400 600
0.1
0.2
0.3
Per
cent
age
0.4
0.5
Figure 2.14.1 Shifting size model for FoxA1 ChIP-Seq data. The red curve represents thedistribution of locations relative to the midpoints for reads from the forward strand, while theblue curve represents the same for reads from the reverse strand. d is determined as the dis-tance between the summits of red and blue curves, and MACS shifts all reads by d/2 towardthe 3! ends to improve the spatial resolution of inferred TF binding sites. The black curveis drawn based on the locations of shifted reads. For the color version of this figure go tohttp://www.currentprotocols.com/protocol/bi0214.
For each peak, the detailed information includes the chromosome name, start po-sition, end position, length of the peak region, summit location related to the peakstart position, number of reads in the peak region, "10*log10 (p-value) for the peakregion (i.e., a value of 100 means a p-value of 1e-10), fold enrichment for this re-gion (compared to the expectation from Poisson distribution with local lambda), andfalse-discovery rate (FDR). FoxA1 peaks.xls is a tabular plain text file, and theuser can open it in Excel and sort or filter using Excel functions (see Table 2.14.5).
Feng, J., Liu, T., & Zhang, Y. (2011). Using MACS to Iden)fy Peaks from ChIP-‐Seq Data, Current Protocols in Bioinforma)cs
From sequence reads to peaks
![Page 20: ChIP%seq)analysis) · TheChIPseqera 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 Pubmed hits per year for "ChiP-Seq" 0 100 200 300 400 • JohnsonDS, Mortazavi)A,)Myers)RM,)WoldB(2007](https://reader033.vdocument.in/reader033/viewer/2022050509/5f99e277f1e42a799176935a/html5/thumbnails/20.jpg)
ChIP-‐seq signal for transcrip8on factors
We expect to see a typical strand asymetry in read densiTes → ChIP peak recogniTon pacern
![Page 21: ChIP%seq)analysis) · TheChIPseqera 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 Pubmed hits per year for "ChiP-Seq" 0 100 200 300 400 • JohnsonDS, Mortazavi)A,)Myers)RM,)WoldB(2007](https://reader033.vdocument.in/reader033/viewer/2022050509/5f99e277f1e42a799176935a/html5/thumbnails/21.jpg)
Tag shiUing
Each tag is shioed by d/2 (i.e. towards the middle of the IP fragment) where d represent the fragment length
![Page 22: ChIP%seq)analysis) · TheChIPseqera 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 Pubmed hits per year for "ChiP-Seq" 0 100 200 300 400 • JohnsonDS, Mortazavi)A,)Myers)RM,)WoldB(2007](https://reader033.vdocument.in/reader033/viewer/2022050509/5f99e277f1e42a799176935a/html5/thumbnails/22.jpg)
From sequence reads to peaks
FASTQ
FASTQC
experiment Input FASTQ
FASTQ FASTQ
mapping
BowTe BWA
BED BAM SAM
BED BAM SAM
visualiza8on
peak calling MACS
BED
peak list
WIG BEDGRAPH
![Page 23: ChIP%seq)analysis) · TheChIPseqera 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 Pubmed hits per year for "ChiP-Seq" 0 100 200 300 400 • JohnsonDS, Mortazavi)A,)Myers)RM,)WoldB(2007](https://reader033.vdocument.in/reader033/viewer/2022050509/5f99e277f1e42a799176935a/html5/thumbnails/23.jpg)
Peak list (BED file)
chr1 145436475 145438649 1478 3206.01 + chr4 50881 52467 19930 3180.67 + chr9 31335610 31336400 26372 3170.26 + chr6 36971531 36973765 22937 3147.85 + chr4 16234642 16236143 20221 3133.43 + chr21 40144820 40146203 17188 3131.68 + chr19 40916830 40918210 13487 3127.46 + chr4 140477689 140479184 20737 3115.67 + chr3 12996108 12998488 18417 3108.55 + chr9 749205 752142 26263 3101.90 + chr1 11628770 11630411 268 3100.00 + chr1 153742611 153744775 1556 3100.00 +
![Page 24: ChIP%seq)analysis) · TheChIPseqera 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 Pubmed hits per year for "ChiP-Seq" 0 100 200 300 400 • JohnsonDS, Mortazavi)A,)Myers)RM,)WoldB(2007](https://reader033.vdocument.in/reader033/viewer/2022050509/5f99e277f1e42a799176935a/html5/thumbnails/24.jpg)
Read mapping programs
• BowTe (BowTe2) • BWA (BWA2) • STAR
• Generally not having a strong influence on the results » Parameters: retain uniquely mapped reads
![Page 25: ChIP%seq)analysis) · TheChIPseqera 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 Pubmed hits per year for "ChiP-Seq" 0 100 200 300 400 • JohnsonDS, Mortazavi)A,)Myers)RM,)WoldB(2007](https://reader033.vdocument.in/reader033/viewer/2022050509/5f99e277f1e42a799176935a/html5/thumbnails/25.jpg)
Peak-‐calling programs
• Strong influence on the called peaks » Many different programs » They do not share the same « default » threshold to retain peaks » The top highest peaks are usually common, but the less obvious peaks
are ooen not shared between different peak callers
Mali Salmon-‐Divon et al, BMC Bioinforma)cs, 2010
![Page 26: ChIP%seq)analysis) · TheChIPseqera 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 Pubmed hits per year for "ChiP-Seq" 0 100 200 300 400 • JohnsonDS, Mortazavi)A,)Myers)RM,)WoldB(2007](https://reader033.vdocument.in/reader033/viewer/2022050509/5f99e277f1e42a799176935a/html5/thumbnails/26.jpg)
Peak calling programs
• To be chosen according to type of expected peaks » TranscripTon factors and « sharp » peaks » ChromaTn marks and « broad peaks »
• Many new programs sTll developped !
![Page 27: ChIP%seq)analysis) · TheChIPseqera 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 Pubmed hits per year for "ChiP-Seq" 0 100 200 300 400 • JohnsonDS, Mortazavi)A,)Myers)RM,)WoldB(2007](https://reader033.vdocument.in/reader033/viewer/2022050509/5f99e277f1e42a799176935a/html5/thumbnails/27.jpg)
Adapted from Bardet et al, Nature Protocols, 2012
treatment
control (input)
Quality check
Processing steps Downstream analyses
Aim of the course : ChIP-‐seq analysis workflow
![Page 28: ChIP%seq)analysis) · TheChIPseqera 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 Pubmed hits per year for "ChiP-Seq" 0 100 200 300 400 • JohnsonDS, Mortazavi)A,)Myers)RM,)WoldB(2007](https://reader033.vdocument.in/reader033/viewer/2022050509/5f99e277f1e42a799176935a/html5/thumbnails/28.jpg)
Aim of the course
2 – Downstream analyses
-‐ visualisa8on -‐ func8onal annota8on of peaks -‐ mo8f discovery in peaks
![Page 29: ChIP%seq)analysis) · TheChIPseqera 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 Pubmed hits per year for "ChiP-Seq" 0 100 200 300 400 • JohnsonDS, Mortazavi)A,)Myers)RM,)WoldB(2007](https://reader033.vdocument.in/reader033/viewer/2022050509/5f99e277f1e42a799176935a/html5/thumbnails/29.jpg)
Visualizing in a genome browser
• Local tools (IGV) » Fast » Ideal for sensiTve datasets
• web-‐based tools (UCSC browser) with custom tracks » Integrated with many other informaTon (conservaTon,…) » Easy to share between collaborators
• File formats » BED => simply defines a region (start-‐end) » WIG, bedgraph => value assigned to each posiTon
![Page 30: ChIP%seq)analysis) · TheChIPseqera 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 Pubmed hits per year for "ChiP-Seq" 0 100 200 300 400 • JohnsonDS, Mortazavi)A,)Myers)RM,)WoldB(2007](https://reader033.vdocument.in/reader033/viewer/2022050509/5f99e277f1e42a799176935a/html5/thumbnails/30.jpg)
Vizualizing in IGV
![Page 31: ChIP%seq)analysis) · TheChIPseqera 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 Pubmed hits per year for "ChiP-Seq" 0 100 200 300 400 • JohnsonDS, Mortazavi)A,)Myers)RM,)WoldB(2007](https://reader033.vdocument.in/reader033/viewer/2022050509/5f99e277f1e42a799176935a/html5/thumbnails/31.jpg)
Distance to closest TSS Distance of the peaks to the closest TSS
Distance to the closest TSS
Num
ber o
f pea
ks
020
4060
8010
0
−5000 −2500 0 2500 5000
![Page 32: ChIP%seq)analysis) · TheChIPseqera 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 Pubmed hits per year for "ChiP-Seq" 0 100 200 300 400 • JohnsonDS, Mortazavi)A,)Myers)RM,)WoldB(2007](https://reader033.vdocument.in/reader033/viewer/2022050509/5f99e277f1e42a799176935a/html5/thumbnails/32.jpg)
Localisa8on of the peaks in the genome
![Page 33: ChIP%seq)analysis) · TheChIPseqera 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 Pubmed hits per year for "ChiP-Seq" 0 100 200 300 400 • JohnsonDS, Mortazavi)A,)Myers)RM,)WoldB(2007](https://reader033.vdocument.in/reader033/viewer/2022050509/5f99e277f1e42a799176935a/html5/thumbnails/33.jpg)
Genes → Regions ← Peaks
● Idea : assign funcTonal annotaTon to genomic regions use staTsTcs to avoid biases
● assign to each gene a regulatory domain basal (-‐5kb/+1kb from TSS) extended (up to nearest
basal region ; max 1Mb)
● each domain is annotated to the funcTonal terms of the corresponding gene → "Func8onal domains"
"GREAT improves functional interpretation of cis-regulatory regions" McLean et al. Nat. Biotech. (2010)
![Page 34: ChIP%seq)analysis) · TheChIPseqera 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 Pubmed hits per year for "ChiP-Seq" 0 100 200 300 400 • JohnsonDS, Mortazavi)A,)Myers)RM,)WoldB(2007](https://reader033.vdocument.in/reader033/viewer/2022050509/5f99e277f1e42a799176935a/html5/thumbnails/34.jpg)
Genes → Regions ← Peaks
"GREAT improves functional interpretation of cis-regulatory regions" McLean et al. Nat. Biotech. (2010)
term A term B
Given that 60% of the genome is annotated to A, would I randomly expect 3 or more peaks to fall into region A ?
Given that 15% of the genome is annotated to B, would I randomly expect 3 or more peaks to fall into region B ?
p = 0.07
p > 0.5
![Page 35: ChIP%seq)analysis) · TheChIPseqera 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 Pubmed hits per year for "ChiP-Seq" 0 100 200 300 400 • JohnsonDS, Mortazavi)A,)Myers)RM,)WoldB(2007](https://reader033.vdocument.in/reader033/viewer/2022050509/5f99e277f1e42a799176935a/html5/thumbnails/35.jpg)
Aim of the course
2 – Downstream analyses
-‐ visualisa8on -‐ func8onal annota8on of peaks -‐ mo8f discovery in peaks
![Page 36: ChIP%seq)analysis) · TheChIPseqera 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 Pubmed hits per year for "ChiP-Seq" 0 100 200 300 400 • JohnsonDS, Mortazavi)A,)Myers)RM,)WoldB(2007](https://reader033.vdocument.in/reader033/viewer/2022050509/5f99e277f1e42a799176935a/html5/thumbnails/36.jpg)
de novo mo8f discovery
target gene
cis-‐regulatory elements
target gene
target gene
transcripTon factor
binding moTf
Problem : How can we model/describe the binding specificity of a given TF ?
![Page 37: ChIP%seq)analysis) · TheChIPseqera 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 Pubmed hits per year for "ChiP-Seq" 0 100 200 300 400 • JohnsonDS, Mortazavi)A,)Myers)RM,)WoldB(2007](https://reader033.vdocument.in/reader033/viewer/2022050509/5f99e277f1e42a799176935a/html5/thumbnails/37.jpg)
• Find excepTonal moTfs based on the sequence only (A priori no knowledge of the moTf to look for)
• Criteria of excepTonality:
- higher/lower frequency than expected by chance (over-‐/under-‐representa8on) - concentraTon at specific posiTons relaTve to some reference coordinate (posi8onal bias)
!"#$%&'$()(**+***&,&
!"#$%#"&'"()"#%
*+),-./'(0%#"&'"()"#%123"(%+4+56+76"8%
!&+-#./012-3&-4&."516/$&'783-#816,&139&:"516/$&'#/62"0$;23)&<-%%$<2-3,&
97#".4"0%4#%":;")$"0%<=>".%/))'.."()"#%
:=.$<0$9&-<</%%$3<$;&'>0?&-%9$%&@1%A-5&#-9$6,&&
B7;$%5$9&'0$;0&;$C
/$3<$;,&
?%97#".4"0&!"#$%&
-<</%%$3<$;&<-#./0$9&4%-#D&&&
@:;")$"0&!"#$%&-<</%%$3<$;&<-#./0$9&
4%-#D&&
!3"/."A)+6%,=>".#%B."&'"()5"#%B./>%$"#$%
#"&'"()"#%
BE&
*%
F"#$%&'$()(**+***G&,&
97#".4"0%/))'.."()"#%;".%25(0/2% @:;")$"0%/))'.."()"#%;".%25(0/2%B/66/25(-%+(%3/>/-"("/'#%>/0"6%
H839-I;&>J&30& H839-I;&>J&30&
.-;82-3& .-;82-3&
-<</%%$3
<$;&
K& K&
C%!"#$%&'$()(**+***&,&
:=.$<0$9&-<</%%$3<$;&83&I?-6$&;$C/$3<$;&
B7;$%5$9&83&I839-
I&
97#".4"0%4#%":;")$"0%<=>".%/))'.."()"#%97#".4"0%/))'."()"#%;".%25(0/2%
de novo mo8f discovery
![Page 38: ChIP%seq)analysis) · TheChIPseqera 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 Pubmed hits per year for "ChiP-Seq" 0 100 200 300 400 • JohnsonDS, Mortazavi)A,)Myers)RM,)WoldB(2007](https://reader033.vdocument.in/reader033/viewer/2022050509/5f99e277f1e42a799176935a/html5/thumbnails/38.jpg)
• Tools already exist for a long Tme !
- MEME (1994) - RSAT oligo-‐analysis (1998) - AlignACE (2000) - Weeder (2001) - MoTfSampler (2001)
Why do we need new approaches for genome-‐wide datasets ?
de novo mo8f discovery
![Page 39: ChIP%seq)analysis) · TheChIPseqera 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 Pubmed hits per year for "ChiP-Seq" 0 100 200 300 400 • JohnsonDS, Mortazavi)A,)Myers)RM,)WoldB(2007](https://reader033.vdocument.in/reader033/viewer/2022050509/5f99e277f1e42a799176935a/html5/thumbnails/39.jpg)
New approaches for ChIP-‐seq datasets
• Size, size, size
-‐ limited numbers of promoters and enhancers -‐ dozens of thousands of peaks !!!!!!
hcp://www.genomequest.com/landing-‐pages/ODI-‐webinar-‐web.html
![Page 40: ChIP%seq)analysis) · TheChIPseqera 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 Pubmed hits per year for "ChiP-Seq" 0 100 200 300 400 • JohnsonDS, Mortazavi)A,)Myers)RM,)WoldB(2007](https://reader033.vdocument.in/reader033/viewer/2022050509/5f99e277f1e42a799176935a/html5/thumbnails/40.jpg)
• Size, size, size
-‐ limited numbers of promoters and enhancers -‐ dozens of thousands of peaks !!!!!!
• the problem is slightly different -‐ promoters: 200-‐2000bp from co-‐regulated genes -‐ peaks: 300bp, posiTonal bias
hcp://www.genomequest.com/landing-‐pages/ODI-‐webinar-‐web.html
!"#$%&'$()(**+***&,&
!"#$%#"&'"()"#%
*+),-./'(0%#"&'"()"#%123"(%+4+56+76"8%
!&+-#./012-3&-4&."516/$&'783-#816,&139&:"516/$&'#/62"0$;23)&<-%%$<2-3,&
97#".4"0%4#%":;")$"0%<=>".%/))'.."()"#%
:=.$<0$9&-<</%%$3<$;&'>0?&-%9$%&@1%A-5&#-9$6,&&
B7;$%5$9&'0$;0&;$C
/$3<$;,&
?%97#".4"0&!"#$%&
-<</%%$3<$;&<-#./0$9&4%-#D&&&
@:;")$"0&!"#$%&-<</%%$3<$;&<-#./0$9&
4%-#D&&
!3"/."A)+6%,=>".#%B."&'"()5"#%B./>%$"#$%
#"&'"()"#%
BE&
*%
F"#$%&'$()(**+***G&,&
97#".4"0%/))'.."()"#%;".%25(0/2% @:;")$"0%/))'.."()"#%;".%25(0/2%B/66/25(-%+(%3/>/-"("/'#%>/0"6%
H839-I;&>J&30& H839-I;&>J&30&
.-;82-3& .-;82-3&
-<</%%$3
<$;&
K& K&
C%!"#$%&'$()(**+***&,&
:=.$<0$9&-<</%%$3<$;&83&I?-6$&;$C/$3<$;&
B7;$%5$9&83&I839-
I&
97#".4"0%4#%":;")$"0%<=>".%/))'.."()"#%97#".4"0%/))'."()"#%;".%25(0/2%
New approaches for ChIP-‐seq datasets
![Page 41: ChIP%seq)analysis) · TheChIPseqera 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 Pubmed hits per year for "ChiP-Seq" 0 100 200 300 400 • JohnsonDS, Mortazavi)A,)Myers)RM,)WoldB(2007](https://reader033.vdocument.in/reader033/viewer/2022050509/5f99e277f1e42a799176935a/html5/thumbnails/41.jpg)
• Size, size, size
-‐ limited numbers of promoters and enhancers -‐ dozens of thousands of peaks !!!!!!
• the problem is slightly different -‐ promoters: 200-‐2000bp from co-‐regulated genes -‐ peaks: 300bp, posiTonal bias
• mo8f analysis: not just for specialists anymore ! -‐ complete user-‐friendly workflows
hcp://www.genomequest.com/landing-‐pages/ODI-‐webinar-‐web.html
!"#$%&'$()(**+***&,&
!"#$%#"&'"()"#%
*+),-./'(0%#"&'"()"#%123"(%+4+56+76"8%
!&+-#./012-3&-4&."516/$&'783-#816,&139&:"516/$&'#/62"0$;23)&<-%%$<2-3,&
97#".4"0%4#%":;")$"0%<=>".%/))'.."()"#%
:=.$<0$9&-<</%%$3<$;&'>0?&-%9$%&@1%A-5&#-9$6,&&
B7;$%5$9&'0$;0&;$C
/$3<$;,&
?%97#".4"0&!"#$%&
-<</%%$3<$;&<-#./0$9&4%-#D&&&
@:;")$"0&!"#$%&-<</%%$3<$;&<-#./0$9&
4%-#D&&
!3"/."A)+6%,=>".#%B."&'"()5"#%B./>%$"#$%
#"&'"()"#%
BE&
*%
F"#$%&'$()(**+***G&,&
97#".4"0%/))'.."()"#%;".%25(0/2% @:;")$"0%/))'.."()"#%;".%25(0/2%B/66/25(-%+(%3/>/-"("/'#%>/0"6%
H839-I;&>J&30& H839-I;&>J&30&
.-;82-3& .-;82-3&
-<</%%$3
<$;&
K& K&
C%!"#$%&'$()(**+***&,&
:=.$<0$9&-<</%%$3<$;&83&I?-6$&;$C/$3<$;&
B7;$%5$9&83&I839-
I&
97#".4"0%4#%":;")$"0%<=>".%/))'.."()"#%97#".4"0%/))'."()"#%;".%25(0/2%
New approaches for ChIP-‐seq datasets
![Page 42: ChIP%seq)analysis) · TheChIPseqera 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 Pubmed hits per year for "ChiP-Seq" 0 100 200 300 400 • JohnsonDS, Mortazavi)A,)Myers)RM,)WoldB(2007](https://reader033.vdocument.in/reader033/viewer/2022050509/5f99e277f1e42a799176935a/html5/thumbnails/42.jpg)
Comparison of tools for ChIP-‐seq
Table 1. Features of software tools used for analyzing motifs in ChIP-seq peak sequences.
Program
peak-m
oti
fs
Ch
ipM
un
k
Co
mp
lete
Mo
tifs
MEM
E-C
hIP
MIC
SA
Gim
meM
oti
fs
Web interface yes yes yes yes no no
Size limitation unrestricted (Web site tested with 22
Mb)
100kb (web site) 500kb (web site) unrestricted, but motif discovery restricted to 600 peaks clipped to
100bp
motif discovery restricted to a few hundred base pairs
-
Stand-alone version yes yes no yes yes yesTasks
peak finding no no no no yes no annotation of peak-flanking genes no no yes no no sequence composition (mono- and di-nucleotides)
yes no no no no
motif discovery yes yes yes yes yes yes enrichment in motifs from databases no no yes yes no enrichment in discovered motifs yes no no no no
peak scoring no no no yes yes no
motif clustering no no no no yes
comparison discovered motifs / motif DB yes no no yes yes
sequence scanning for site prediction yes no no yes no
positional distribution of sites inside peaks yes no yes no yes
visualization in genome browsers yes no yes no no
Motif discovery algorithms RSAT oligo-analysisRSAT dyad-analysis
RSAT position-analysisRSAT local-word-analysis+ in stand-alone version:
MEME ChIPMunk
ChipMunk ChipMunkMEME
Weeder
MEMEDREME
MEME MEMEWeeder
MotifSamplerBioProspector
GademImprobizerMDmodule
TrawlerMoAn
Pattern matching algorithms RSAT matrix-scan-quick no patser MAST + AME (enrichment)
no
Motif comparison algorithm RSAT compare-motifs no STAMP TOMTOM STAMPMotif clustering algorithm STAMPComparison between discovered motifs yes no yes no yesMotif database comparisons JASPAR
UNIPROBEDMMPMM
RegulonDBupload your own database
no JASPARTRANSFAC
JASPARTRANSFACUNIPROBE
FLYREGDPINTERACT
SCPDDMMPMM
and many others
no
Motif sizes variable (multiple word assembly)
user-specified <=25 for MEME<=12 for Weeder
<=13 for ChipMunk
predefined ranges(small, medium,
large, extra-large)Multiple motifs yes no yes yes yesRef (PMID) This article 20736340 21183585 21486936 20375099 21081511
![Page 43: ChIP%seq)analysis) · TheChIPseqera 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 Pubmed hits per year for "ChiP-Seq" 0 100 200 300 400 • JohnsonDS, Mortazavi)A,)Myers)RM,)WoldB(2007](https://reader033.vdocument.in/reader033/viewer/2022050509/5f99e277f1e42a799176935a/html5/thumbnails/43.jpg)
Table 1. Features of software tools used for analyzing motifs in ChIP-seq peak sequences.
Program
peak-m
oti
fs
Ch
ipM
un
k
Co
mp
lete
Mo
tifs
MEM
E-C
hIP
MIC
SA
Gim
meM
oti
fs
Web interface yes yes yes yes no no
Size limitation unrestricted (Web site tested with 22
Mb)
100kb (web site) 500kb (web site) unrestricted, but motif discovery restricted to 600 peaks clipped to
100bp
motif discovery restricted to a few hundred base pairs
-
Stand-alone version yes yes no yes yes yesTasks
peak finding no no no no yes no annotation of peak-flanking genes no no yes no no sequence composition (mono- and di-nucleotides)
yes no no no no
motif discovery yes yes yes yes yes yes enrichment in motifs from databases no no yes yes no enrichment in discovered motifs yes no no no no
peak scoring no no no yes yes no
motif clustering no no no no yes
comparison discovered motifs / motif DB yes no no yes yes
sequence scanning for site prediction yes no no yes no
positional distribution of sites inside peaks yes no yes no yes
visualization in genome browsers yes no yes no no
Motif discovery algorithms RSAT oligo-analysisRSAT dyad-analysis
RSAT position-analysisRSAT local-word-analysis+ in stand-alone version:
MEME ChIPMunk
ChipMunk ChipMunkMEME
Weeder
MEMEDREME
MEME MEMEWeeder
MotifSamplerBioProspector
GademImprobizerMDmodule
TrawlerMoAn
Pattern matching algorithms RSAT matrix-scan-quick no patser MAST + AME (enrichment)
no
Motif comparison algorithm RSAT compare-motifs no STAMP TOMTOM STAMPMotif clustering algorithm STAMPComparison between discovered motifs yes no yes no yesMotif database comparisons JASPAR
UNIPROBEDMMPMM
RegulonDBupload your own database
no JASPARTRANSFAC
JASPARTRANSFACUNIPROBE
FLYREGDPINTERACT
SCPDDMMPMM
and many others
no
Motif sizes variable (multiple word assembly)
user-specified <=25 for MEME<=12 for Weeder
<=13 for ChipMunk
predefined ranges(small, medium,
large, extra-large)Multiple motifs yes no yes yes yesRef (PMID) This article 20736340 21183585 21486936 20375099 21081511
Table 1. Features of software tools used for analyzing motifs in ChIP-seq peak sequences.
Program
peak-m
oti
fs
Ch
ipM
un
k
Co
mp
lete
Mo
tifs
MEM
E-C
hIP
MIC
SA
Gim
meM
oti
fs
Web interface yes yes yes yes no no
Size limitation unrestricted (Web site tested with 22
Mb)
100kb (web site) 500kb (web site) unrestricted, but motif discovery restricted to 600 peaks clipped to
100bp
motif discovery restricted to a few hundred base pairs
-
Stand-alone version yes yes no yes yes yesTasks
peak finding no no no no yes no annotation of peak-flanking genes no no yes no no sequence composition (mono- and di-nucleotides)
yes no no no no
motif discovery yes yes yes yes yes yes enrichment in motifs from databases no no yes yes no enrichment in discovered motifs yes no no no no
peak scoring no no no yes yes no
motif clustering no no no no yes
comparison discovered motifs / motif DB yes no no yes yes
sequence scanning for site prediction yes no no yes no
positional distribution of sites inside peaks yes no yes no yes
visualization in genome browsers yes no yes no no
Motif discovery algorithms RSAT oligo-analysisRSAT dyad-analysis
RSAT position-analysisRSAT local-word-analysis+ in stand-alone version:
MEME ChIPMunk
ChipMunk ChipMunkMEME
Weeder
MEMEDREME
MEME MEMEWeeder
MotifSamplerBioProspector
GademImprobizerMDmodule
TrawlerMoAn
Pattern matching algorithms RSAT matrix-scan-quick no patser MAST + AME (enrichment)
no
Motif comparison algorithm RSAT compare-motifs no STAMP TOMTOM STAMPMotif clustering algorithm STAMPComparison between discovered motifs yes no yes no yesMotif database comparisons JASPAR
UNIPROBEDMMPMM
RegulonDBupload your own database
no JASPARTRANSFAC
JASPARTRANSFACUNIPROBE
FLYREGDPINTERACT
SCPDDMMPMM
and many others
no
Motif sizes variable (multiple word assembly)
user-specified <=25 for MEME<=12 for Weeder
<=13 for ChipMunk
predefined ranges(small, medium,
large, extra-large)Multiple motifs yes no yes yes yesRef (PMID) This article 20736340 21183585 21486936 20375099 21081511
Thomas-‐Chollier, Herrmann, Defrance, Sand, Thieffry, van Helden Nucleic Acids Research, 2012
Comparison of tools for ChIP-‐seq
![Page 44: ChIP%seq)analysis) · TheChIPseqera 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 Pubmed hits per year for "ChiP-Seq" 0 100 200 300 400 • JohnsonDS, Mortazavi)A,)Myers)RM,)WoldB(2007](https://reader033.vdocument.in/reader033/viewer/2022050509/5f99e277f1e42a799176935a/html5/thumbnails/44.jpg)
Table 1. Features of software tools used for analyzing motifs in ChIP-seq peak sequences.
Program
peak-m
oti
fs
Ch
ipM
un
k
Co
mp
lete
Mo
tifs
MEM
E-C
hIP
MIC
SA
Gim
meM
oti
fs
Web interface yes yes yes yes no no
Size limitation unrestricted (Web site tested with 22
Mb)
100kb (web site) 500kb (web site) unrestricted, but motif discovery restricted to 600 peaks clipped to
100bp
motif discovery restricted to a few hundred base pairs
-
Stand-alone version yes yes no yes yes yesTasks
peak finding no no no no yes no annotation of peak-flanking genes no no yes no no sequence composition (mono- and di-nucleotides)
yes no no no no
motif discovery yes yes yes yes yes yes enrichment in motifs from databases no no yes yes no enrichment in discovered motifs yes no no no no
peak scoring no no no yes yes no
motif clustering no no no no yes
comparison discovered motifs / motif DB yes no no yes yes
sequence scanning for site prediction yes no no yes no
positional distribution of sites inside peaks yes no yes no yes
visualization in genome browsers yes no yes no no
Motif discovery algorithms RSAT oligo-analysisRSAT dyad-analysis
RSAT position-analysisRSAT local-word-analysis+ in stand-alone version:
MEME ChIPMunk
ChipMunk ChipMunkMEME
Weeder
MEMEDREME
MEME MEMEWeeder
MotifSamplerBioProspector
GademImprobizerMDmodule
TrawlerMoAn
Pattern matching algorithms RSAT matrix-scan-quick no patser MAST + AME (enrichment)
no
Motif comparison algorithm RSAT compare-motifs no STAMP TOMTOM STAMPMotif clustering algorithm STAMPComparison between discovered motifs yes no yes no yesMotif database comparisons JASPAR
UNIPROBEDMMPMM
RegulonDBupload your own database
no JASPARTRANSFAC
JASPARTRANSFACUNIPROBE
FLYREGDPINTERACT
SCPDDMMPMM
and many others
no
Motif sizes variable (multiple word assembly)
user-specified <=25 for MEME<=12 for Weeder
<=13 for ChipMunk
predefined ranges(small, medium,
large, extra-large)Multiple motifs yes no yes yes yesRef (PMID) This article 20736340 21183585 21486936 20375099 21081511
Table 1. Features of software tools used for analyzing motifs in ChIP-seq peak sequences.
Program
peak-m
oti
fs
Ch
ipM
un
k
Co
mp
lete
Mo
tifs
MEM
E-C
hIP
MIC
SA
Gim
meM
oti
fs
Web interface yes yes yes yes no no
Size limitation unrestricted (Web site tested with 22
Mb)
100kb (web site) 500kb (web site) unrestricted, but motif discovery restricted to 600 peaks clipped to
100bp
motif discovery restricted to a few hundred base pairs
-
Stand-alone version yes yes no yes yes yesTasks
peak finding no no no no yes no annotation of peak-flanking genes no no yes no no sequence composition (mono- and di-nucleotides)
yes no no no no
motif discovery yes yes yes yes yes yes enrichment in motifs from databases no no yes yes no enrichment in discovered motifs yes no no no no
peak scoring no no no yes yes no
motif clustering no no no no yes
comparison discovered motifs / motif DB yes no no yes yes
sequence scanning for site prediction yes no no yes no
positional distribution of sites inside peaks yes no yes no yes
visualization in genome browsers yes no yes no no
Motif discovery algorithms RSAT oligo-analysisRSAT dyad-analysis
RSAT position-analysisRSAT local-word-analysis+ in stand-alone version:
MEME ChIPMunk
ChipMunk ChipMunkMEME
Weeder
MEMEDREME
MEME MEMEWeeder
MotifSamplerBioProspector
GademImprobizerMDmodule
TrawlerMoAn
Pattern matching algorithms RSAT matrix-scan-quick no patser MAST + AME (enrichment)
no
Motif comparison algorithm RSAT compare-motifs no STAMP TOMTOM STAMPMotif clustering algorithm STAMPComparison between discovered motifs yes no yes no yesMotif database comparisons JASPAR
UNIPROBEDMMPMM
RegulonDBupload your own database
no JASPARTRANSFAC
JASPARTRANSFACUNIPROBE
FLYREGDPINTERACT
SCPDDMMPMM
and many others
no
Motif sizes variable (multiple word assembly)
user-specified <=25 for MEME<=12 for Weeder
<=13 for ChipMunk
predefined ranges(small, medium,
large, extra-large)Multiple motifs yes no yes yes yesRef (PMID) This article 20736340 21183585 21486936 20375099 21081511
Thomas-‐Chollier, Herrmann, Defrance, Sand, Thieffry, van Helden Nucleic Acids Research, 2012
Comparison of tools for ChIP-‐seq
![Page 45: ChIP%seq)analysis) · TheChIPseqera 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 Pubmed hits per year for "ChiP-Seq" 0 100 200 300 400 • JohnsonDS, Mortazavi)A,)Myers)RM,)WoldB(2007](https://reader033.vdocument.in/reader033/viewer/2022050509/5f99e277f1e42a799176935a/html5/thumbnails/45.jpg)
• fast and scalable • treat full-‐size datasets
0
500
1000
1500
2000
2500
3000
3500
4000
4500
5000
0 10 20 30 40 50 60 70 80 90 100
Tim
e (s
econ
ds)
sequence size (Mb)
peak-motifs: oligo-analysis-7ntpeak-motifs: dyad-analysispeak-motifs: position-analysis-7ntpeak-motifs: local-words-7ntdremechipmunkmeme
typical ChIP-seq dataset
1h
Thomas-‐Chollier, Herrmann, Defrance, Sand, Thieffry, van Helden Nucleic Acids Research, 2012
size limit of other websites
RSAT peak-‐mo8fs
![Page 46: ChIP%seq)analysis) · TheChIPseqera 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 Pubmed hits per year for "ChiP-Seq" 0 100 200 300 400 • JohnsonDS, Mortazavi)A,)Myers)RM,)WoldB(2007](https://reader033.vdocument.in/reader033/viewer/2022050509/5f99e277f1e42a799176935a/html5/thumbnails/46.jpg)
• fast and scalable • treat full-‐size datasets • complete pipeline
Thomas-‐Chollier, Herrmann, Defrance, Sand, Thieffry, van Helden Nucleic Acids Research, 2012
RSAT peak-‐mo8fs
![Page 47: ChIP%seq)analysis) · TheChIPseqera 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 Pubmed hits per year for "ChiP-Seq" 0 100 200 300 400 • JohnsonDS, Mortazavi)A,)Myers)RM,)WoldB(2007](https://reader033.vdocument.in/reader033/viewer/2022050509/5f99e277f1e42a799176935a/html5/thumbnails/47.jpg)
• fast and scalable • treat full-‐size datasets • complete pipeline • web interface
Thomas-‐Chollier, Defrance, Medina-‐Rivera, Sand, Herrmann, Thieffry, van Helden Nucleic Acids Research, 2011 Medina-‐Rivera, Abreu-‐Goodger, Thomas-‐Chollier, Salgado, Collado-‐Vides, van Helden Nucleic Acids Research, 2011 Sand, Thomas-‐Chollier, van Helden Bioinforma.cs, 2009 Thomas-‐Chollier*, Sand*, Turatsinze, Janky, Defrance, Vervisch, van Helden Nucleic Acids Research, 2008 Sand, Thomas-‐Chollier, Vervisch, van Helden Nature Protocols, 2008 Thomas-‐Chollier*, Turatsinze*, Defrance, van Helden Nature Protocols, 2008 van Helden, Nucleic Acids Research, 2003
Jacques van Helden
within RSAT
RSAT peak-‐mo8fs
![Page 48: ChIP%seq)analysis) · TheChIPseqera 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 Pubmed hits per year for "ChiP-Seq" 0 100 200 300 400 • JohnsonDS, Mortazavi)A,)Myers)RM,)WoldB(2007](https://reader033.vdocument.in/reader033/viewer/2022050509/5f99e277f1e42a799176935a/html5/thumbnails/48.jpg)
![Page 49: ChIP%seq)analysis) · TheChIPseqera 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 Pubmed hits per year for "ChiP-Seq" 0 100 200 300 400 • JohnsonDS, Mortazavi)A,)Myers)RM,)WoldB(2007](https://reader033.vdocument.in/reader033/viewer/2022050509/5f99e277f1e42a799176935a/html5/thumbnails/49.jpg)
RSAT peak-‐mo8fs
• fast and scalable • treat full-‐size datasets • complete pipeline • web interface • accessible to non-‐specialists • using 4 complementary algorithms
0
500
1000
1500
2000
2500
3000
3500
4000
4500
5000
0 10 20 30 40 50 60 70 80 90 100
Tim
e (s
econ
ds)
sequence size (Mb)
peak-motifs: oligo-analysis-7ntpeak-motifs: dyad-analysispeak-motifs: position-analysis-7ntpeak-motifs: local-words-7ntdremechipmunkmeme
- Global over-‐representa8on - oligo-‐analysis - dyad-‐analysis (spaced mo8fs)
- Posi8onal bias
- posi8on-‐analysis - local-‐words
!"#$%&'$()(**+***&,&
!"#$%#"&'"()"#%
*+),-./'(0%#"&'"()"#%123"(%+4+56+76"8%
!&+-#./012-3&-4&."516/$&'783-#816,&139&:"516/$&'#/62"0$;23)&<-%%$<2-3,&
97#".4"0%4#%":;")$"0%<=>".%/))'.."()"#%
:=.$<0$9&-<</%%$3<$;&'>0?&-%9$%&@1%A-5&#-9$6,&&
B7;$%5$9&'0$;0&;$C
/$3<$;,&
?%97#".4"0&!"#$%&
-<</%%$3<$;&<-#./0$9&4%-#D&&&
@:;")$"0&!"#$%&-<</%%$3<$;&<-#./0$9&
4%-#D&&
!3"/."A)+6%,=>".#%B."&'"()5"#%B./>%$"#$%
#"&'"()"#%
BE&
*%
F"#$%&'$()(**+***G&,&
97#".4"0%/))'.."()"#%;".%25(0/2% @:;")$"0%/))'.."()"#%;".%25(0/2%B/66/25(-%+(%3/>/-"("/'#%>/0"6%
H839-I;&>J&30& H839-I;&>J&30&
.-;82-3& .-;82-3&
-<</%%$3
<$;&
K& K&
C%!"#$%&'$()(**+***&,&
:=.$<0$9&-<</%%$3<$;&83&I?-6$&;$C/$3<$;&
B7;$%5$9&83&I839-
I&
97#".4"0%4#%":;")$"0%<=>".%/))'.."()"#%97#".4"0%/))'."()"#%;".%25(0/2%
![Page 50: ChIP%seq)analysis) · TheChIPseqera 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 Pubmed hits per year for "ChiP-Seq" 0 100 200 300 400 • JohnsonDS, Mortazavi)A,)Myers)RM,)WoldB(2007](https://reader033.vdocument.in/reader033/viewer/2022050509/5f99e277f1e42a799176935a/html5/thumbnails/50.jpg)
Mo8f discovery methods: frequency
!"#$%&'$()(**+***&,&
!"#$%#"&'"()"#%
*+),-./'(0%#"&'"()"#%123"(%+4+56+76"8%
!&+-#./012-3&-4&."516/$&'783-#816,&139&:"516/$&'#/62"0$;23)&<-%%$<2-3,&
97#".4"0%4#%":;")$"0%<=>".%/))'.."()"#%
:=.$<0$9&-<</%%$3<$;&'>0?&-%9$%&@1%A-5&#-9$6,&&
B7;$%5$9&'0$;0&;$C
/$3<$;,&
?%97#".4"0&!"#$%&
-<</%%$3<$;&<-#./0$9&4%-#D&&&
@:;")$"0&!"#$%&-<</%%$3<$;&<-#./0$9&
4%-#D&&
!3"/."A)+6%,=>".#%B."&'"()5"#%B./>%$"#$%
#"&'"()"#%
BE&
*%
F"#$%&'$()(**+***G&,&
97#".4"0%/))'.."()"#%;".%25(0/2% @:;")$"0%/))'.."()"#%;".%25(0/2%B/66/25(-%+(%3/>/-"("/'#%>/0"6%
H839-I;&>J&30& H839-I;&>J&30&
.-;82-3& .-;82-3&
-<</%%$3
<$;&
K& K&
C%!"#$%&'$()(**+***&,&
:=.$<0$9&-<</%%$3<$;&83&I?-6$&;$C/$3<$;&
B7;$%5$9&83&I839-
I&
97#".4"0%4#%":;")$"0%<=>".%/))'.."()"#%97#".4"0%/))'."()"#%;".%25(0/2%
oligo-‐analysis dyad-‐analysis (spaced mo8fs)
Thomas-‐Chollier, Darbo, Herrmann, Defrance, Thieffry, van Helden Nature Protocols,2012
![Page 51: ChIP%seq)analysis) · TheChIPseqera 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 Pubmed hits per year for "ChiP-Seq" 0 100 200 300 400 • JohnsonDS, Mortazavi)A,)Myers)RM,)WoldB(2007](https://reader033.vdocument.in/reader033/viewer/2022050509/5f99e277f1e42a799176935a/html5/thumbnails/51.jpg)
!"#$%&'$()(**+***&,&
!"#$%#"&'"()"#%
*+),-./'(0%#"&'"()"#%123"(%+4+56+76"8%
!&+-#./012-3&-4&."516/$&'783-#816,&139&:"516/$&'#/62"0$;23)&<-%%$<2-3,&
97#".4"0%4#%":;")$"0%<=>".%/))'.."()"#%
:=.$<0$9&-<</%%$3<$;&'>0?&-%9$%&@1%A-5&#-9$6,&&
B7;$%5$9&'0$;0&;$C
/$3<$;,&
?%97#".4"0&!"#$%&
-<</%%$3<$;&<-#./0$9&4%-#D&&&
@:;")$"0&!"#$%&-<</%%$3<$;&<-#./0$9&
4%-#D&&
!3"/."A)+6%,=>".#%B."&'"()5"#%B./>%$"#$%
#"&'"()"#%
BE&
*%
F"#$%&'$()(**+***G&,&
97#".4"0%/))'.."()"#%;".%25(0/2% @:;")$"0%/))'.."()"#%;".%25(0/2%B/66/25(-%+(%3/>/-"("/'#%>/0"6%
H839-I;&>J&30& H839-I;&>J&30&
.-;82-3& .-;82-3&
-<</%%$3
<$;&
K& K&
C%!"#$%&'$()(**+***&,&
:=.$<0$9&-<</%%$3<$;&83&I?-6$&;$C/$3<$;&
B7;$%5$9&83&I839-
I&
97#".4"0%4#%":;")$"0%<=>".%/))'.."()"#%97#".4"0%/))'."()"#%;".%25(0/2%
posi8on-‐analysis local-‐words
Mo8f discovery methods: posi8onal bias
Thomas-‐Chollier, Darbo, Herrmann, Defrance, Thieffry, van Helden Nature Protocols,2012
![Page 52: ChIP%seq)analysis) · TheChIPseqera 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 Pubmed hits per year for "ChiP-Seq" 0 100 200 300 400 • JohnsonDS, Mortazavi)A,)Myers)RM,)WoldB(2007](https://reader033.vdocument.in/reader033/viewer/2022050509/5f99e277f1e42a799176935a/html5/thumbnails/52.jpg)
Direct versus indirect binding
• ChIP-‐seq does not necessarily reveal direct binding
Direct binding Indirect binding
• The moTf of the targeted TF is not always found in peaks !
![Page 53: ChIP%seq)analysis) · TheChIPseqera 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 Pubmed hits per year for "ChiP-Seq" 0 100 200 300 400 • JohnsonDS, Mortazavi)A,)Myers)RM,)WoldB(2007](https://reader033.vdocument.in/reader033/viewer/2022050509/5f99e277f1e42a799176935a/html5/thumbnails/53.jpg)
To read further …
• Prac8cal Guidelines for the Comprehensive Analysis of ChIP-‐seq Data » Tim Bailey – PLOS ComputaTonal Biology 9:11 2013
• ChIP–seq and beyond: new and improved methodologies to detect and characterize protein–DNA interac8ons » Terrence S. Furey -‐ Nature Reviews GeneTcs 13, 840-‐852 (December
2012)
• ChIP-‐Seq: advantages and challenges of a maturing technology » Peter J. Park -‐ Nat Rev Genet. 2009 October; 10(10): 669–680
• Computa8on for ChIP-‐seq and RNA-‐seq studies » Shirley Pepke et al -‐ Nature Methods 6, S22 -‐ S32 (2009)