EDACC Primary Analysis Pipelines
Cristian Coarfa
Bioinformatics Research Laboratory
Molecular and Human Genetics
Data Levels
Data Types Submitted To EDACC
• ChIP-Seq
• Shotgun Bisulfite Sequencing
  – Methyl-C
• Reduced Representation Bisulfite Sequencing
  – RRBS
• MRE-Seq
• MeDIP-Seq
• Chromatin Accessibility
• Small RNA-Seq
• mRNA-Seq
Read Mapping
• Common processing step for all pipelines
• High throughput
  – Sequence space: Illumina
  – Color space: SOLiD
• Quick and accurate anchoring
• Read length varies from 36 to 76 bp
• Short read aligners
  – 1st generation: Maq, SOAP
    • Ungapped alignment
  – 2nd generation: Bowtie, BWA, SOAP2
    • Trade some sensitivity for speed; good enough for many applications
• Mapping tools need to be
  – Robust to indels
  – Sensitive to a variable number of mismatches
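To make "ungapped alignment" and "variable number of mismatches" concrete, here is a brute-force sketch that slides a read along a reference and keeps every placement within a mismatch budget. It is not how Maq, Bowtie or BWA work internally (they rely on indexes rather than scanning every position). `ungapped_hits` and `is_unique` are hypothetical helpers, and the uniqueness rule (the best mismatch count is reached at exactly one position) is an assumption; the pipelines' "uniquely mapping reads" filters may define it differently.

```python
def hamming(a: str, b: str) -> int:
    """Count mismatched positions between two equal-length strings."""
    return sum(1 for x, y in zip(a, b) if x != y)


def ungapped_hits(read: str, reference: str, max_mismatches: int = 2) -> list:
    """Return (position, mismatches) for every ungapped placement of `read`
    on the forward strand of `reference` with at most `max_mismatches` mismatches."""
    hits = []
    for pos in range(len(reference) - len(read) + 1):
        mm = hamming(read, reference[pos:pos + len(read)])
        if mm <= max_mismatches:
            hits.append((pos, mm))
    return hits


def is_unique(hits: list) -> bool:
    """Assumed rule: a read maps uniquely when its best mismatch count
    occurs at exactly one position."""
    if not hits:
        return False
    best = min(mm for _, mm in hits)
    return sum(1 for _, mm in hits if mm == best) == 1
```

For example, on the reference `ACGTACGTTTGACCA` the read `ACGT` has two perfect placements (positions 0 and 4) and is therefore not unique, while `TTGA` with one allowed mismatch places only at position 8.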
Pash 3.0
• Positional hashing
• Regular read mapping
• Bisulfite sequencing read mapping
• Integrates base-pair variation with epigenetic variation
• SAM output, for easy integration with other analysis tools
• Accuracy without sacrificing efficiency
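The slide only names positional hashing. As a simplified illustration of the general hash-and-vote idea: index every reference k-mer by position, then let each k-mer of the read vote for the alignment start it implies (reference position minus offset in the read). k-mers that agree on the same diagonal accumulate votes, so a read with a few mismatches still anchors where most of its k-mers agree. This is a sketch under those assumptions, not Pash 3.0's actual algorithm; `build_kmer_index`, `diagonal_votes` and `best_anchor` are hypothetical names.

```python
from collections import Counter, defaultdict


def build_kmer_index(reference: str, k: int) -> dict:
    """Map each k-mer of the reference to the list of positions where it starts."""
    index = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return dict(index)


def diagonal_votes(read: str, index: dict, k: int) -> Counter:
    """For each candidate alignment start (diagonal), count the read k-mers supporting it."""
    votes = Counter()
    for offset in range(len(read) - k + 1):
        for ref_pos in index.get(read[offset:offset + k], []):
            votes[ref_pos - offset] += 1
    return votes


def best_anchor(read: str, index: dict, k: int):
    """Return (alignment_start, supporting_kmers) for the best diagonal, or None."""
    votes = diagonal_votes(read, index, k)
    if not votes:
        return None
    return votes.most_common(1)[0]
```

With `k = 3` on the reference `GGGGACGTTAGCCCC`, the read `ACGTTAGC` anchors at position 4 with all six k-mers agreeing; introducing one mismatch (`ACGATAGC`) still anchors at 4, now with three supporting k-mers.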
Bisulfite Sequencing
• Current tools: BSMAP, RMAP-BS, mrsFAST, ZOOM
• Pash 3.0
  – Integrates mutation discovery with base-pair-level methylation discovery
  – Speedup
• General approach
  – Convert C's to T's in reads and/or reference
  – Use mappings, reads and reference to determine methylated sites
• Pash 3
  – Generate and hash all possible k-mers for reads
    • e.g. CTT: CCC, CCT, CTC, CTT
  – Map against forward and reverse-complement chromosome strands
• Greater sensitivity than other tools, without loss of efficiency
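The slide's example (CTT expands to CCC, CCT, CTC, CTT) follows from bisulfite chemistry: an unmethylated C is read as T, so every T in a read may stand for either a T or a C in the reference. A minimal sketch of that expansion, plus the reverse complement needed to search both chromosome strands, assuming reads come from the C-to-T converted forward strand only. The helper names are hypothetical and this is not Pash 3's code.

```python
from itertools import product

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def bisulfite_variants(kmer: str) -> list:
    """All reference k-mers that could yield `kmer` after C-to-T bisulfite conversion:
    each T may come from a genomic T or from an unmethylated (converted) C."""
    choices = [("C", "T") if base == "T" else (base,) for base in kmer]
    return ["".join(bases) for bases in product(*choices)]


def reverse_complement(seq: str) -> str:
    """Reverse complement, for matching against the reverse chromosome strand."""
    return seq.translate(COMPLEMENT)[::-1]
```

The number of variants doubles with every T in the k-mer (a k-mer with four T's has 16 variants), which is why this kind of expansion is practical only for short k-mers.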
Galaxy/Genboree
• Galaxy: developed at Penn State University
• Benefits
  – Rapid deployment tool
  – Share pipelines with others
• Alan Harris, Sriram Raghuram
  – Deployed Galaxy/Genboree
  – Integration with Genboree
    • API for upload/download
    • Adaptors for LFF file format support
    • EDACC XML validation tools
• Sriram Raghuram, Andrew Jackson, Cristian Coarfa
  – Integration with compute clusters
• Arpit Tandon, Sriram Raghuram
  – Deployed analysis tools
http://genboree.org/galaxy
Primary Analysis Pipelines
• Implemented & exposed via Galaxy/Genboree
  – Read mapping
  – Bisulfite sequencing read mapping
  – Peak calling (ChIP-Seq, MeDIP-Seq)
    • MACS (Harvard), FindPeaks (UBC)
  – Chromatin accessibility
    • HotSpot (UW)
  – Small RNA-Seq
• Coming soon
  – mRNA-Seq
  – Expression, alternative splicing
  – Gene fusion
• Typical user interaction
  – Use Galaxy for user input
  – Submit jobs to a cluster
  – Upload results to Genboree
Read Mapping
ChIP-Seq
• Select uniquely mapping reads
• Build read density maps
  – Extend each read to 200 bp along the mapping strand
  – Remove monoclonal (duplicate) reads
  – Generate WIG data
  – Can be visualized in Genboree and UCSC
• Peak calling
  – FindPeaks, MACS
• Interpret peaks
  – Overlap with genomic features of interest: gene promoters, etc.
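A minimal sketch of the density-map steps, assuming reads are given as (chromosome, 0-based leftmost start, read length, strand) and that "monoclonal" copies are reads sharing chromosome, start and strand. Each surviving read is extended to 200 bp in its mapping direction and the per-base depth is written as variableStep WIG, whose positions are 1-based. A production tool would more likely emit binned or `span=` output than one line per base; the function names are hypothetical.

```python
from collections import Counter


def extend_read(start: int, read_len: int, strand: str, extend_to: int = 200):
    """Half-open [begin, end) interval covered by the read after extension along its strand."""
    if strand == "+":
        return start, start + extend_to
    end = start + read_len  # the 5' end of a minus-strand read is its rightmost base
    return max(0, end - extend_to), end


def density_map(reads, extend_to: int = 200) -> dict:
    """Per-chromosome base-level depth from extended, de-duplicated reads."""
    seen = set()
    coverage = {}
    for chrom, start, read_len, strand in reads:
        key = (chrom, start, strand)  # assumed definition of a monoclonal copy
        if key in seen:
            continue  # drop duplicate copies, keep the first
        seen.add(key)
        begin, end = extend_read(start, read_len, strand, extend_to)
        depth = coverage.setdefault(chrom, Counter())
        for pos in range(begin, end):
            depth[pos] += 1
    return coverage


def to_variable_step_wig(coverage: dict) -> str:
    """Render depth as variableStep WIG (1-based positions, space-separated)."""
    lines = []
    for chrom in sorted(coverage):
        lines.append(f"variableStep chrom={chrom}")
        for pos in sorted(coverage[chrom]):
            lines.append(f"{pos + 1} {coverage[chrom][pos]}")
    return "\n".join(lines) + "\n"
```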
MeDIP-Seq
• Select uniquely mapping reads
• Build read density maps
• Determine methylated CpGs
  – FindPeaks
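FindPeaks does the enrichment calling; once peaks are available, relating them to CpGs reduces to finding CpG dinucleotides inside the enriched intervals. A minimal sketch, assuming peaks are 0-based half-open (start, end) intervals on the same sequence and that a CpG counts as inside a peak when its C falls within the interval. This is an illustrative post-processing step, not part of FindPeaks, and the names are hypothetical.

```python
def cpg_positions(seq: str) -> list:
    """0-based positions of the C in every CpG dinucleotide (case-insensitive)."""
    seq = seq.upper()
    return [i for i in range(len(seq) - 1) if seq[i:i + 2] == "CG"]


def cpgs_in_peaks(seq: str, peaks) -> list:
    """CpG positions whose C lies inside any half-open (start, end) peak interval."""
    return [p for p in cpg_positions(seq) if any(start <= p < end for start, end in peaks)]
```

For genome-scale inputs, sorting the peaks and using `bisect` would avoid testing every CpG against every peak.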
Finding methylated CpGs
MeDIP-Seq Signal Visualization
MRE-Seq
• Select uniquely mapping reads
• Determine unmethylated CpGs
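MRE-Seq relies on methylation-sensitive restriction enzymes, which cut only where their recognition site is unmethylated, so read ends that coincide with a site are evidence of an unmethylated CpG. A minimal sketch, assuming a single enzyme with the HpaII site CCGG (CpG starting at offset 1 in the site), read 5' ends already converted to 0-based genomic coordinates, and uniquely mapping reads already selected. Published MRE-Seq protocols use more than one enzyme and score sites more carefully; the names here are hypothetical.

```python
from collections import Counter


def site_lookup(seq: str, site: str = "CCGG", cpg_offset: int = 1) -> dict:
    """Map every base inside a recognition site to the 0-based position of that site's CpG C."""
    seq = seq.upper()
    lookup = {}
    for i in range(len(seq) - len(site) + 1):
        if seq[i:i + len(site)] == site:
            for p in range(i, i + len(site)):
                lookup[p] = i + cpg_offset
    return lookup


def unmethylated_cpg_support(seq: str, read_ends, site: str = "CCGG", cpg_offset: int = 1) -> Counter:
    """Count reads whose 5' end falls in a site; each supports that site's CpG being unmethylated."""
    lookup = site_lookup(seq, site, cpg_offset)
    support = Counter()
    for end in read_ends:
        if end in lookup:
            support[lookup[end]] += 1
    return support
```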
Bisulfite Sequencing
• Shotgun Bisulfite Sequencing
  – Methyl-C
  – Genome-wide
• Reduced Representation Bisulfite Sequencing
  – RRBS
  – Enzyme cocktail
• Map using Pash
• Build methylation maps
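Building a methylation map from aligned bisulfite reads amounts to counting, at each reference cytosine, how often the read kept the C (methylated, protected from conversion) versus showed a T (unmethylated, converted). A minimal sketch, assuming a directional library, reads already placed without gaps (for example by Pash) and given in forward-reference orientation, and minus-strand cytosines reported at the forward-strand G coordinate, where they appear as G (methylated) or A (unmethylated). It also labels each site as CG, CHG or CHH. Function names are hypothetical, and C/T SNPs, which Pash 3.0 aims to integrate, are ignored here.

```python
from collections import defaultdict

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def cytosine_context(ref: str, pos: int, strand: str) -> str:
    """CG, CHG or CHH context of the cytosine at `pos` on `strand`.
    Bases beyond either end of `ref` count as non-G, so edge sites fall to CHH."""
    if strand == "+":
        downstream = ref[pos + 1:pos + 3]
    else:
        # Minus strand read 5'->3': reverse and complement the two forward bases to the left.
        downstream = ref[max(0, pos - 2):pos][::-1].translate(COMPLEMENT)
    if downstream[:1] == "G":
        return "CG"
    if downstream[1:2] == "G":
        return "CHG"
    return "CHH"


def call_methylation(ref: str, reads) -> list:
    """Rows (position, strand, context, methylated, unmethylated, total) from
    ungapped reads given as (0-based start, sequence, strand)."""
    counts = defaultdict(lambda: [0, 0])  # (pos, strand) -> [methylated, unmethylated]
    for start, seq, strand in reads:
        for offset, base in enumerate(seq):
            pos = start + offset
            if pos >= len(ref):
                break
            if strand == "+" and ref[pos] == "C" and base in "CT":
                counts[(pos, "+")][0 if base == "C" else 1] += 1  # C kept = methylated
            elif strand == "-" and ref[pos] == "G" and base in "GA":
                counts[(pos, "-")][0 if base == "G" else 1] += 1  # minus-strand C seen as G/A
    rows = []
    for (pos, strand), (meth, unmeth) in sorted(counts.items()):
        rows.append((pos, strand, cytosine_context(ref, pos, strand), meth, unmeth, meth + unmeth))
    return rows
```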
Bisulfite Sequencing Read Mapping
Methylation Maps
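| Position | Strand | Context (CG/CHG/CHH) | Methylated | Unmethylated | Total reads |
|---|---|---|---|---|---|
| 50100242 | + | CG | 1 | 0 | 1 |
| 50100243 | - | CG | 40 | 11 | 51 |
| 50100250 | + | CG | 1 | 0 | 1 |
| 50100251 | - | CG | 37 | 8 | 46 |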
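The methylation map reports methylated and unmethylated read counts separately for the two cytosines of each CpG (the + strand C, and the - strand C one base to the right). A small sketch that turns such rows into a methylation level, methylated / (methylated + unmethylated), and pools the two strands of each CpG, which is a common but not universal choice. The level is computed from the methylated and unmethylated counts rather than from total reads, since the two need not agree (the last row has 37 + 8 = 45 against a total of 46). Names are hypothetical.

```python
from typing import Optional


def methylation_level(methylated: int, unmethylated: int) -> Optional[float]:
    """Fraction of informative reads showing methylation; None when there is no coverage."""
    informative = methylated + unmethylated
    return methylated / informative if informative else None


def pool_cpg_strands(rows) -> dict:
    """Pool + strand (C at p) and - strand (C at p + 1) counts for each CpG.
    rows: (position, strand, context, methylated, unmethylated); non-CG rows are skipped."""
    pooled = {}
    for pos, strand, context, meth, unmeth in rows:
        if context != "CG":
            continue
        cpg_start = pos if strand == "+" else pos - 1
        counts = pooled.setdefault(cpg_start, [0, 0])
        counts[0] += meth
        counts[1] += unmeth
    return {p: (m, u, methylation_level(m, u)) for p, (m, u) in sorted(pooled.items())}


# Rows copied from the methylation map above (total reads column omitted).
rows = [
    (50100242, "+", "CG", 1, 0),
    (50100243, "-", "CG", 40, 11),
    (50100250, "+", "CG", 1, 0),
    (50100251, "-", "CG", 37, 8),
]
pooled = pool_cpg_strands(rows)
```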
Small RNA-Seq
• Trim adapters
• Map reads onto target genome
  – Up to 100 locations per read
• Interpret
  – Overlap with miRNAs, piRNAs, sno/scaRNAs
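Adapter trimming is needed because small RNAs are typically shorter than the read, so the sequencer reads on into the 3' adapter. A minimal sketch that cuts at the first full adapter occurrence, or else at the longest adapter prefix (at least `min_overlap` bases) sitting at the read's 3' end. It uses exact matching only; dedicated trimmers also tolerate sequencing errors. `EXAMPLE_ADAPTER` is an example value, so substitute the adapter used to build your library; the function name is hypothetical.

```python
EXAMPLE_ADAPTER = "TGGAATTCTCGGGTGCCAAGG"  # example 3' adapter; use your library's actual sequence


def trim_adapter(read: str, adapter: str = EXAMPLE_ADAPTER, min_overlap: int = 6) -> str:
    """Remove a full or partial 3' adapter from a read using exact matching."""
    idx = read.find(adapter)
    if idx != -1:
        return read[:idx]
    # Partial adapter at the very end of the read: try the longest prefix first.
    for n in range(min(len(adapter) - 1, len(read)), max(min_overlap, 1) - 1, -1):
        if read.endswith(adapter[:n]):
            return read[:-n]
    return read
```

Reads that become too short after trimming, or that later map to more than 100 locations, would be dropped in subsequent steps.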
Exercise
• Download the input MeDIP-Seq file from the workshop wiki
• Analyze it using FindPeaks in Galaxy
  – Obtain results in Genboree LFF format
• Upload the results to a Genboree database
• View the results in tabular form
• Find the largest peaks
• Explore them in the Genboree browser
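The "find the largest peaks" step can also be done offline on the downloaded LFF file. A minimal sketch, assuming tab-separated records with the peak score in the 10th column (`score_col=9`) and comment or section lines starting with `#` or `[`; check these assumptions against the Genboree LFF documentation before relying on them. `largest_peaks` is a hypothetical name.

```python
import heapq


def largest_peaks(lines, n: int = 10, score_col: int = 9) -> list:
    """Return the n records with the highest score as (score, fields) tuples."""
    records = []
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith(("#", "[")):
            continue  # blank, comment or section-header line
        fields = line.split("\t")
        if len(fields) <= score_col:
            continue  # too few columns to hold a score
        try:
            score = float(fields[score_col])
        except ValueError:
            continue  # non-numeric score, e.g. a header row
        records.append((score, fields))
    return heapq.nlargest(n, records, key=lambda rec: rec[0])
```

Passing an open file handle with `n=5` returns the five highest-scoring records, whose coordinates can then be looked up in the Genboree browser.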