![Page 1: A method for high throughput sequencing data analysis ...iml.univ-mrs.fr/sta/SMPGD2010/slides/SMPGD10-Andrau-slide.pdf · A method for high throughput sequencing data analysis: application](https://reader034.vdocument.in/reader034/viewer/2022042212/5eb5b2678d81b36a10358a1b/html5/thumbnails/1.jpg)
A method for high throughput sequencing data analysis: application for mapping genome-wide protein-DNA binding sites
(ChIPseq)
JC Andrau, Biostat, 15/01/2010
High-throughput sequencing applications
• Protein-DNA interaction
– Mapping of epigenetic marks and identification of regulatory sequences of gene expression (ChIP-seq)
• Genome sequencing
– Human gene mapping
– Qualitative (SNP) and quantitative (amplification) genetic variations
– De novo sequencing of model organisms and pathogens
• Transcriptome (RNA-seq)
– Identification and analysis of non-coding RNAs (miRNA, etc.)
– Monitoring gene expression, covering all the alternative messengers of a given locus, in a variety of contexts
ChIP-seq: Solexa procedure
PCR + size exclusion (gel extraction)
Loading in flowcell and cluster amplification
Image acquisition and base calling
Sequencing and alignment
• Sequencing extremities of DNA fragments
• RAW data files (sequences)
• Aligned against a reference genome
– MAQ
– Solexa…
First steps of data analysis
DNA fragments VS sequences
• Only extremities of DNA fragments are sequenced
• Tag-enriched regions do not coincide with the exact binding site
• In-silico process to elongate the tags
[Figure: tags on the + strand and the - strand on either side of the binding site]
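The in-silico elongation mentioned above can be sketched as follows. This is a minimal illustration, not the pipeline's actual code: the `Tag` type, the `elongate` function and the half-open coordinate convention are all assumptions made for the example, and the fragment length is taken as a given parameter.

```python
from typing import NamedTuple


class Tag(NamedTuple):
    """Hypothetical aligned tag; coordinates are 0-based, end-exclusive."""
    chrom: str
    start: int
    end: int
    strand: str  # "+" or "-"


def elongate(tag: Tag, fragment_length: int) -> Tag:
    """Extend a tag to an assumed fragment length, away from its 5' end."""
    if tag.strand == "+":
        # The 5' end of a + strand tag is its start; extend to the right.
        return Tag(tag.chrom, tag.start, tag.start + fragment_length, "+")
    # The 5' end of a - strand tag is its end; extend to the left,
    # without running past the beginning of the chromosome.
    new_start = max(0, tag.end - fragment_length)
    return Tag(tag.chrom, new_start, tag.end, "-")
```

Tags from both strands then extend towards each other, so the elongated fragments overlap near the binding site.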
Elongation process
[Figure: overlap between the Strand + and Strand - profiles as a function of shifting (bp)]
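One way to read the overlap-versus-shifting plot is as a search for the shift that best superimposes the two strand profiles. The sketch below assumes per-position tag counts for each strand and uses the sum of position-wise minima as the overlap measure. Both the measure and the function names are assumptions for illustration; the slides do not specify them.

```python
import numpy as np


def strand_overlap(plus: np.ndarray, minus: np.ndarray, shift: int) -> float:
    """Overlap after moving the - strand profile `shift` bp to the left."""
    if shift == 0:
        return float(np.minimum(plus, minus).sum())
    # plus[i] is compared with minus[i + shift].
    return float(np.minimum(plus[:-shift], minus[shift:]).sum())


def best_shift(plus: np.ndarray, minus: np.ndarray, max_shift: int):
    """Return the shift in [0, max_shift] with the largest overlap."""
    overlaps = [strand_overlap(plus, minus, s) for s in range(max_shift + 1)]
    return int(np.argmax(overlaps)), overlaps
```

The shift found this way gives an estimate of the distance between the strand-specific peaks, which can inform the elongation length.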
Score per nucleotide
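A score per nucleotide can be computed as the number of elongated fragments covering each position. The sketch below assumes that definition (the slides show it only as figures) and assumes fragments are given as 0-based, end-exclusive `(start, end)` pairs on one chromosome. The function name is hypothetical.

```python
import numpy as np


def score_per_nucleotide(fragments, chrom_length: int) -> np.ndarray:
    """Count, for every position, how many fragments cover it."""
    # Difference array: +1 where a fragment starts, -1 where it ends.
    diff = np.zeros(chrom_length + 1, dtype=np.int64)
    for start, end in fragments:
        start = max(0, start)
        end = min(chrom_length, end)
        if start < end:
            diff[start] += 1
            diff[end] -= 1
    # The running sum of the differences is the coverage.
    return np.cumsum(diff[:-1])
```

The difference-array approach costs one pass over the fragments plus one pass over the chromosome, rather than incrementing every covered position for every fragment.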
Further analysis
Artefacts removal and normalisation
• An input experiment helps to locate regions that are problematic for alignment (duplications, reference genome…)
– No enrichment is expected in the input
– These regions were removed from all datasets
• From the average score over the whole genome, we can estimate the background (BG) level and then rescale all experiments to this level
• The last step subtracts the input from each dataset to reduce the effect of variations and the background in the data
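The three steps above can be sketched on per-nucleotide score arrays. This is an illustration under stated assumptions: artefact regions are already known as `(start, end)` pairs, removed positions are marked as NaN, the background level is the genome-wide mean rescaled to a common `target`, and negative differences are floored at zero. None of these choices is confirmed by the slides, and all function names are hypothetical.

```python
import numpy as np


def mask_regions(scores: np.ndarray, regions) -> np.ndarray:
    """Return a float copy with artefact regions set to NaN."""
    masked = scores.astype(float)
    for start, end in regions:
        masked[start:end] = np.nan
    return masked


def rescale_to_background(scores: np.ndarray, target: float = 1.0) -> np.ndarray:
    """Rescale so the mean over non-masked positions equals `target`."""
    background = np.nanmean(scores)
    if background == 0:
        raise ValueError("cannot rescale a profile whose mean score is zero")
    return scores * (target / background)


def subtract_input(chip: np.ndarray, input_scores: np.ndarray) -> np.ndarray:
    """Subtract the input; flooring at zero is an assumption of this sketch."""
    return np.maximum(chip - input_scores, 0.0)
```

A typical call order would be: mask both the ChIP and input profiles with the same regions, rescale each one, then subtract.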
Pipeline for ChIPseq data Analysis
- ChIP, QCs, sequencing and generation of the original files
- Alignment against a reference genome (Eland)
- Conversion to GFF format in R
- Removal of artefacts and multiple matches
- Elongation of tags, merging of both strands and data binning
- Subtraction of the input or mock data set, data normalisation
- Data analysis and visualisation
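The strand merging and binning step of the pipeline can be sketched as below. The slides name GFF conversion in R for an earlier step, but show no code, so this Python version is only an illustration. It assumes merging means adding the two per-nucleotide profiles and binning means averaging over fixed-size bins, with a trailing partial bin dropped. Function names are hypothetical.

```python
import numpy as np


def merge_strands(plus_scores: np.ndarray, minus_scores: np.ndarray) -> np.ndarray:
    """Merge the two strand profiles by adding them position by position."""
    return plus_scores + minus_scores


def bin_scores(scores, bin_size: int) -> np.ndarray:
    """Average scores over consecutive bins of `bin_size` positions."""
    n_bins = len(scores) // bin_size
    # Positions beyond the last complete bin are dropped (an assumption).
    trimmed = np.asarray(scores[: n_bins * bin_size], dtype=float)
    return trimmed.reshape(n_bins, bin_size).mean(axis=1)
```

Binning reduces the data volume by a factor of `bin_size` at the cost of resolution, so the bin size is a trade-off to tune per experiment.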
ChIPseq and ChIP-on-Chip
CTD phosphorylations and transcription
The CTD is a heptapeptide repeat (Y S P T S P S)n of the largest Pol II subunit, conserved from yeast (26x) to human (52x).
[Figure labels: Recruitment, Initiation, Elongation (productive)]
Core et al, Science 2008
TSS profiling of CTD and S5P overlaps with sense/antisense transcription
[Figure: Pol II binding around TSSs; y-axis: binding level]
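A binding profile around TSSs can be built by averaging the score in a fixed window centred on each TSS. The sketch below assumes a per-nucleotide score array for one chromosome and a list of `(position, strand)` pairs. Windows on the - strand are reversed so that all profiles read in the direction of transcription. The function name and the decision to skip windows that run off the chromosome are assumptions.

```python
import numpy as np


def average_tss_profile(scores: np.ndarray, tss_list, flank: int) -> np.ndarray:
    """Mean score from -flank to +flank around each TSS, strand-aware."""
    windows = []
    for position, strand in tss_list:
        low, high = position - flank, position + flank + 1
        if low < 0 or high > len(scores):
            continue  # skip TSSs too close to a chromosome end
        window = scores[low:high]
        if strand == "-":
            window = window[::-1]  # orient along the direction of transcription
        windows.append(window)
    if not windows:
        raise ValueError("no TSS window fits inside the score array")
    return np.mean(windows, axis=0)
```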
K-means clustering of top 20% Pol II S5-P
[Figure: clusters 1 to 10, grouped by position of the signal: right of TSS, centered, left of TSS]
Clustering indicates several populations of initiating Pol II around TSS
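The clustering step can be sketched with a plain NumPy version of Lloyd's k-means algorithm applied to a matrix with one profile per row. This is an illustration, not the implementation used for the slides: ranking profiles by total signal to select the top 20% is an assumption, as are the function names. The number of clusters is a parameter (the figure shows ten).

```python
import numpy as np


def top_fraction(profiles: np.ndarray, fraction: float = 0.2) -> np.ndarray:
    """Keep the rows with the highest total signal (assumed ranking)."""
    totals = profiles.sum(axis=1)
    n_keep = max(1, int(round(fraction * len(profiles))))
    keep = np.argsort(totals)[::-1][:n_keep]
    return profiles[keep]


def kmeans(data: np.ndarray, k: int, n_iter: int = 100, seed: int = 0):
    """Lloyd's algorithm; returns (labels, centers)."""
    rng = np.random.default_rng(seed)
    # Start from k distinct rows picked at random.
    centers = data[rng.choice(len(data), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        # Squared Euclidean distance of every row to every center.
        dists = ((data[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        new_centers = centers.copy()
        for j in range(k):
            members = data[labels == j]
            if len(members):  # an empty cluster keeps its previous center
                new_centers[j] = members.mean(axis=0)
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers
```

For example, `labels, centers = kmeans(top_fraction(profiles), k=10)` assigns each retained profile to one of ten clusters. Each row of `centers` can then be plotted against position relative to the TSS.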
Many thanks to…
PF lab, CIML Marseille: Romain Fenouil, Fred Koch, Pierre Cauchy, Pierre Ferrier
CNG Evry: Ivo Gut, Marta Gut
GSF Cancer Institute, Munich: Dirk Eick, Martin Heidemann, Corinna Hintermair