Tutorial 6
High Throughput Sequencing
HTS tools and analysis
• Review of resequencing pipeline
• Visualization - IGV
• Analysis platform – Galaxy
• Tuning up the pipelines
Review of resequencing pipeline
Demultiplexing
[Figure: a lane is demultiplexed into samples and unknown inserts]
Mapping
[Figure: each sample is mapped to the reference genome]
Examples of mapping parameters:
• Number of mismatches per read
• Scores for mismatches or gaps
Mapping parameters affect the rest of the analysis
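To make these parameters concrete, here is a minimal Python sketch that scores one already-aligned read against the reference and counts its mismatches. The function names and the default scores (match=1, mismatch=-3, gap=-5) are illustrative assumptions, not the settings of any particular mapping program.

```python
def score_alignment(read, ref, match=1, mismatch=-3, gap=-5):
    """Score two equal-length aligned strings; '-' marks a gap.

    Returns (score, mismatches). The default scores are hypothetical.
    """
    if len(read) != len(ref):
        raise ValueError("aligned strings must have the same length")
    score = 0
    mismatches = 0
    for r, g in zip(read, ref):
        if r == "-" or g == "-":
            score += gap          # gap in the read or in the reference
        elif r == g:
            score += match
        else:
            score += mismatch
            mismatches += 1
    return score, mismatches


def passes(read, ref, max_mismatches, **scores):
    """True if the aligned read has at most max_mismatches mismatches."""
    _, mismatches = score_alignment(read, ref, **scores)
    return mismatches <= max_mismatches
```

Changing `max_mismatches` or the penalties changes which reads are kept, and therefore the coverage and variants seen by every later step.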
Removing duplicates and non-unique mappings
[Figure: pipeline so far: Demultiplexing, Mapping, Removing duplicates and non-unique mappings. Two Reference Genome panels are shown with a "?"]
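Resequencing / Exome Pipeline
Coverage profile and variant calling
[Figure: pipeline so far: Demultiplexing, Mapping, Removing duplicates and non-unique mappings, Coverage profile and variant calling. Reads on the reference genome …ACTTCGTCGAAAGG… show a variant base G]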
Variant filtering
[Figure: pipeline so far: Demultiplexing, Mapping, Removing duplicates and non-unique mappings, Coverage profile and variant calling, Variant filtering. Two Reference Genome panels, each …ACTTCGTCGAAAGG…]
Filtering thresholds:
• Frequency >= 20%
• Coverage >= 5
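The two thresholds above can be sketched as a simple filter. This is a minimal illustration, assuming each candidate variant is a dictionary with hypothetical keys `coverage` (reads at the position) and `alt_reads` (reads carrying the variant base); the example calls are made up.

```python
def filter_variants(variants, min_frequency=0.20, min_coverage=5):
    """Keep variants that pass both the coverage and the frequency threshold."""
    kept = []
    for v in variants:
        coverage = v["coverage"]
        if coverage < min_coverage:
            continue  # too few reads to trust the call
        frequency = v["alt_reads"] / coverage
        if frequency >= min_frequency:
            kept.append(v)
    return kept


# Made-up candidate calls, for illustration only.
calls = [
    {"pos": 101, "coverage": 10, "alt_reads": 5},  # 50% at coverage 10
    {"pos": 202, "coverage": 4, "alt_reads": 4},   # 100% but coverage 4
    {"pos": 303, "coverage": 50, "alt_reads": 5},  # 10% at coverage 50
    {"pos": 404, "coverage": 5, "alt_reads": 1},   # exactly 20% at coverage 5
]
passing = [v["pos"] for v in filter_variants(calls)]
print(passing)
```

Positions 101 and 404 pass; 202 fails on coverage and 303 fails on frequency.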
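Genes and known variants
[Figure: pipeline so far: Demultiplexing, Mapping, Removing duplicates and non-unique mappings, Variant calling, Variant filtering, Genes and known variants. Reference genome …ACTTCGTCGAAATG… …GTCCCGTGATACTCCGT… with variant bases G and A, annotated with rs230985 and Gene X]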
Resequencing results
Working with IGV
http://www.broadinstitute.org/igv/
Integrative Genomics Viewer
IGV is a high-performance visualization tool for interactive exploration of large, integrated genomic datasets. It supports a wide variety of data types, including array-based and next-generation sequence data, and genomic annotations.
Genome used for mapping
Name of sample (BAM file)
Lowest resolution of the genome (zoomed out)
Zooming in
Coverage track
Alignment track
Zoomed in until we get to the base pair value
SNP
Hover over the coverage track to see the counts of all bases at a specific position
Can we trust this SNP?
Hover over the alignment track to see details of a specific read
What is the quality of this read and its mapping?
Right-click the alignment track to change how it is displayed
Color reads by strand to verify there is no strand bias
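The same check can be sketched in code: count how many reads supporting the variant base come from each strand. This is a minimal sketch, assuming the reads covering the position are given as hypothetical `(base, strand)` tuples; the 10% cutoff is an arbitrary assumption, not an IGV setting.

```python
def strand_counts(reads, variant_base):
    """Count reads supporting variant_base on the forward and reverse strand.

    reads: iterable of (base, strand) tuples, strand being '+' or '-'.
    """
    forward = sum(1 for base, strand in reads
                  if base == variant_base and strand == "+")
    reverse = sum(1 for base, strand in reads
                  if base == variant_base and strand == "-")
    return forward, reverse


def looks_strand_biased(reads, variant_base, min_minor_fraction=0.1):
    """Flag the variant if the less-represented strand carries fewer than
    min_minor_fraction of the supporting reads (hypothetical cutoff)."""
    forward, reverse = strand_counts(reads, variant_base)
    total = forward + reverse
    if total == 0:
        return False  # no supporting reads, nothing to judge
    return min(forward, reverse) / total < min_minor_fraction
```

A variant supported by reads from only one strand is a candidate artifact and worth inspecting by eye.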
Why and how to work with IGV
Base qualities, comparison between samples
False positive indels
Same mapping statistics – different meaning
What might cause this low percentage of mapping?
The sample contains a high percentage of contamination
The sample is very different from the reference genome
One image is worth a thousand words…
Structural Variations
Large deletion in the sample compared to the reference genome
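A large deletion shows up in the coverage track as a stretch with no reads. A minimal sketch of that idea, assuming per-base coverage is available as a plain list of depths; the function name and the `min_length` default are hypothetical, and coverage alone is only one signal for a deletion.

```python
def zero_coverage_runs(coverage, min_length=3):
    """Return (start, end) index pairs, end exclusive, for every run of
    zero coverage that is at least min_length positions long."""
    runs = []
    start = None
    for i, depth in enumerate(coverage):
        if depth == 0:
            if start is None:
                start = i  # a new zero-coverage run begins here
        else:
            if start is not None and i - start >= min_length:
                runs.append((start, i))
            start = None
    # Handle a run that reaches the end of the region.
    if start is not None and len(coverage) - start >= min_length:
        runs.append((start, len(coverage)))
    return runs
```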
Galaxy
https://main.g2.bx.psu.edu/
Use your account name and password to log in to Galaxy
Uploading data to Galaxy
Use the “eye” icon to view the contents of a file
Mapping, filtering and conversion to BAM
Mapping
Filter SAM file
Convert SAM to BAM
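Outside Galaxy, the filtering step can be sketched directly on SAM text. The flag bits below are defined by the SAM format; the function name and the `min_mapq=20` cutoff, used here as a proxy for unique mapping, are assumptions, and the duplicate bit is only set if an earlier tool marked duplicates.

```python
UNMAPPED = 0x4     # SAM flag bit: read is unmapped
SECONDARY = 0x100  # SAM flag bit: secondary alignment
DUPLICATE = 0x400  # SAM flag bit: PCR or optical duplicate


def filter_sam_lines(lines, min_mapq=20):
    """Yield header lines plus the alignment lines that pass the filter."""
    for line in lines:
        if line.startswith("@"):
            yield line  # keep header lines untouched
            continue
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])   # column 2: FLAG
        mapq = int(fields[4])   # column 5: MAPQ
        if flag & (UNMAPPED | SECONDARY | DUPLICATE):
            continue
        if mapq < min_mapq:
            continue  # low mapping quality, possibly a non-unique mapping
        yield line
```

How mapping quality relates to uniqueness depends on the mapping program, so the cutoff should be checked against that program's documentation.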
Variant calling
Create pileup
Find variants
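The two variant-calling steps can be sketched on toy data: a pileup counts the bases observed at each reference position, and variants are the positions where a non-reference base appears. This is a simplified illustration, assuming ungapped reads given as hypothetical `(start, sequence)` pairs with 0-based starts that lie within the reference; it is not the algorithm used by the Galaxy tools.

```python
from collections import Counter, defaultdict


def build_pileup(reads):
    """Return {position: Counter of observed bases} for ungapped reads."""
    pileup = defaultdict(Counter)
    for start, seq in reads:
        for offset, base in enumerate(seq):
            pileup[start + offset][base] += 1
    return pileup


def find_variants(pileup, reference):
    """Return (position, ref_base, alt_base, alt_count, coverage) tuples
    for every non-reference base seen in the pileup."""
    variants = []
    for pos in sorted(pileup):
        counts = pileup[pos]
        coverage = sum(counts.values())
        ref_base = reference[pos]
        for base, count in sorted(counts.items()):
            if base != ref_base:
                variants.append((pos, ref_base, base, count, coverage))
    return variants


reference = "ACTTCGTCGAAAGG"  # the fragment shown on the pipeline slides
toy_reads = [(6, "TCGAAG"), (8, "GAAGGG"), (9, "AAAGG")]  # made-up reads
found = find_variants(build_pileup(toy_reads), reference)
print(found)
```

Here two of the three reads covering position 11 carry G instead of the reference A, so one candidate variant is reported; the filtering step then decides whether to keep it.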
Tuning up the pipelines
1 mismatch per read
5 mismatches per read
How can mapping parameters affect the results
False positives vs. false negatives
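The effect of the mismatch setting can be sketched by applying two thresholds to the same reads. This is a deliberate simplification: it assumes each read's candidate position is already known (hypothetical `(start, read)` pairs), whereas a real mapper also has to search for that position. The reads are made up.

```python
def mismatches(read, reference, start):
    """Hamming distance between the read and the reference window at start."""
    window = reference[start:start + len(read)]
    if len(window) != len(read):
        raise ValueError("read extends past the end of the reference")
    return sum(1 for r, g in zip(read, window) if r != g)


def mapped_reads(reads, reference, max_mismatches):
    """Keep the (start, read) pairs with at most max_mismatches mismatches."""
    return [(start, read) for start, read in reads
            if mismatches(read, reference, start) <= max_mismatches]


reference = "ACTTCGTCGAAAGG"
reads = [
    (0, "ACTTCGTC"),  # 0 mismatches
    (2, "TTCGTGGA"),  # 1 mismatch
    (4, "CATTGATA"),  # 3 mismatches
]
strict = mapped_reads(reads, reference, max_mismatches=1)
lenient = mapped_reads(reads, reference, max_mismatches=5)
print(len(strict), len(lenient))
```

The strict setting drops the most divergent read and may lose real variants with it; the lenient setting keeps it and may admit reads that do not belong at that position.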
3-base insertion
One pipeline for all projects?
How can you tune your analysis?
Try different programs.
Mapping:
– Change mapping parameters
– Use non-unique mappings
– Don’t filter duplicates
Variants:
– Change variant filtration
– Change variant merging: penetrance, different heredity, low coverage in one individual…
– Look for bigger variants: big insertions/deletions, inversions, copy number variations, etc.