Sequencing Errors and Biases
Biological Sequence Analysis
BNFO 691/602, Spring 2013
Mark Reimers
Outline
• Sequencing errors
• Initiation biases
• Quantification biases
• Are biases consistent across samples?
• Compensating biases
Types of mismatches in Illumina data are profoundly asymmetric and biased
From uniquely mapped tags with a single mismatch
Courtesy Thierry-Mieg
Position of single mismatch in uniquely mapped tags
Courtesy Thierry-Mieg
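The two tabulations described on these slides (type of mismatch, and position of the mismatch within the tag) can be sketched as below. This is a minimal illustration, not the procedure used for the original figures: the function name `single_mismatches`, the input format (read and reference strings already on the same strand) and the toy data are all assumptions.

```python
from collections import Counter


def single_mismatches(pairs):
    """Tabulate substitution type and position for reads with exactly one mismatch.

    pairs: iterable of (read, reference) strings of equal length, assumed to be
    oriented on the same strand (hypothetical input format for this sketch).
    """
    types = Counter()      # keys like "T>A": reference base > read base
    positions = Counter()  # 1-based position of the mismatch within the read
    for read, ref in pairs:
        if len(read) != len(ref):
            continue  # skip pairs that cannot be compared base by base
        diffs = [i for i, (r, g) in enumerate(zip(read, ref)) if r != g]
        if len(diffs) != 1:
            continue  # keep only reads with a single mismatch
        i = diffs[0]
        types[f"{ref[i]}>{read[i]}"] += 1
        positions[i + 1] += 1
    return types, positions


# Toy data, invented for illustration only.
toy_pairs = [
    ("ACGTACGA", "ACGTACGT"),  # T>A at position 8
    ("ACGTACGG", "ACGTACGT"),  # T>G at position 8
    ("ACCTACGT", "ACGTACGT"),  # G>C at position 3
    ("ACGTACGT", "ACGTACGT"),  # perfect match, skipped
    ("TCGTACGA", "ACGTACGT"),  # two mismatches, skipped
]
types, positions = single_mismatches(toy_pairs)
```

Asymmetry would show up as unequal counts for a substitution and its reverse (for example "A>C" versus "C>A"); position bias would show up as counts concentrated at particular read positions.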
Initiation Biases
Nucleotide frequencies versus position for stringently mapped reads.
Hansen K D et al. Nucl. Acids Res. 2010;38:e131-e131
© The Author(s) 2010. Published by Oxford University Press.
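A plot of nucleotide frequency versus read position can be computed along the following lines. This is a sketch under stated assumptions: reads are plain strings over A, C, G, T, the function name `base_frequencies_by_position` is hypothetical, and the toy reads are invented.

```python
def base_frequencies_by_position(reads, n_positions):
    """Return, for each of the first n_positions read positions, the fraction of A, C, G, T."""
    counts = [dict.fromkeys("ACGT", 0) for _ in range(n_positions)]
    for read in reads:
        for i, base in enumerate(read[:n_positions]):
            if base in counts[i]:  # ignore N and other ambiguity codes
                counts[i][base] += 1
    freqs = []
    for c in counts:
        total = sum(c.values())
        freqs.append({b: (c[b] / total if total else 0.0) for b in "ACGT"})
    return freqs


# Toy reads, invented for illustration only.
toy_reads = ["ACGT", "ACGA", "TCGA", "ACTA"]
freqs = base_frequencies_by_position(toy_reads, 4)
```

If read starts were sampled uniformly, the frequencies would be expected to look roughly the same at every position; a systematic difference in the first few positions is the signature of an initiation bias.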
Start Position Bias is Visible in MT-RNA
Start Position Bias is Consistent Across Samples
Counts per start site in lane 1 vs lane 2 (Marioni et al., Genome Res, 2008)
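One simple way to quantify the consistency shown in a lane-versus-lane scatter plot is to count reads per start site in each lane and correlate the two count vectors. The sketch below assumes each lane is given as a list of read start coordinates. The helper names and the toy coordinates are hypothetical, and Pearson correlation on raw counts is chosen here for illustration, not taken from the cited paper.

```python
import math
from collections import Counter


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def start_site_correlation(starts_lane1, starts_lane2):
    """Correlate per-start-site read counts between two lanes."""
    c1, c2 = Counter(starts_lane1), Counter(starts_lane2)
    sites = sorted(set(c1) | set(c2))  # a site absent from one lane counts as 0
    return pearson([c1[s] for s in sites], [c2[s] for s in sites])


# Toy start coordinates, invented for illustration only.
lane1_starts = [100] * 4 + [105] + [110] * 2 + [120]
lane2_starts = [100] * 6 + [105] + [110] * 3 + [120] * 2
r = start_site_correlation(lane1_starts, lane2_starts)
```

A correlation near 1 means the same start sites are over- and under-represented in both lanes, which is what a consistent start position bias predicts.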
Quantification Biases
Consistent Technology-Specific Biases
(a) 25-kb region of chromosome 11 amplified by three long-range PCR products (red rectangles). (b) A heat-map colored matrix displays the correlation of coverage depth across 260 kb of sequence between four samples by three technologies. From Harismendy et al., Genome Biology 2009
Quantitative Biases
• Not all regions are represented equally
• GC-rich regions are represented more
• Independent of GC, some chromosome regions are represented more: euchromatin bias
• Sequence initiation site biases
• ‘Mappability’ biases: some regions won’t have any uniquely mapped tags
GC Bias
• Density of reads depends strongly on GC content of regions
• Most bias seems to come from PCR reaction
• Newer techniques show less bias, but it is still strong
[Figure: number of reads in 1 kb region versus GC content (%) of 1 kb region]
From Dohm et al 2008
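The kind of table behind a reads-versus-GC plot can be sketched as follows: split the reference into 1 kb bins, compute the GC percentage of each bin, and count the reads starting in it. This is an illustration under assumptions, not the method of Dohm et al.: the function names, the use of read start coordinates (0-based) and the toy sequence are all hypothetical.

```python
def gc_percent(seq):
    """GC content of seq as a percentage of its A/C/G/T bases (None if there are none)."""
    seq = seq.upper()
    acgt = sum(seq.count(b) for b in "ACGT")
    if acgt == 0:
        return None
    return 100.0 * (seq.count("G") + seq.count("C")) / acgt


def gc_and_read_counts(reference, read_starts, bin_size=1000):
    """Return one (GC %, read count) pair per complete bin of the reference."""
    n_bins = len(reference) // bin_size
    counts = [0] * n_bins
    for pos in read_starts:  # 0-based start coordinates (assumption)
        b = pos // bin_size
        if 0 <= b < n_bins:
            counts[b] += 1
    return [
        (gc_percent(reference[i * bin_size:(i + 1) * bin_size]), counts[i])
        for i in range(n_bins)
    ]


# Toy reference and read starts, invented for illustration only.
toy_reference = "AT" * 500 + "GC" * 500 + "ACGT" * 250
toy_starts = [10, 20, 1500, 1600, 1700, 2500]
table = gc_and_read_counts(toy_reference, toy_starts)
```

Plotting the read count against the GC percentage for every bin, or averaging the counts within GC strata, gives a curve of the kind shown on this slide.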
GC Bias depends on temperature
• Aird et al (Genome Biology 2011) did systematic tests of effects of various conditions on GC bias
• They provided protocols that reduce GC bias but don’t eliminate it
NB. Log scale
Even Best Protocols have Bias
• GC bias in Illumina reads from a 400-bp fragment library amplified using the standard PCR protocol (Phusion HF, short denaturation) on a fast-ramping thermocycler (red squares), Phusion HF with long denaturation and 2M betaine (black triangles), AccuPrime Taq HiFi with long denaturation and primer extension at 65°C (blue diamonds) or 60°C (purple diamonds)
From Aird et al Genome Biology 2011
Biases Are NOT Consistent
• The plot on the left shows log fold changes between RPKM values from two biological replicates (NA11918, NA12761) from the data of Montgomery et al, Nature 2010
• From Hansen et al 2012
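For reference, the quantity plotted here can be computed as in the sketch below. RPKM is reads per kilobase of transcript per million mapped reads; the log fold change is taken base 2 as an assumption, since the slide does not state the base. The function names and the toy numbers are hypothetical and are not taken from the Montgomery data.

```python
import math


def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads."""
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


def log2_fold_change(value_a, value_b):
    """log2(value_a / value_b); both values must be positive."""
    return math.log2(value_a / value_b)


# Toy numbers, invented for illustration only: one 2 kb gene in two samples.
rpkm_sample1 = rpkm(read_count=100, gene_length_bp=2000, total_mapped_reads=1_000_000)
rpkm_sample2 = rpkm(read_count=400, gene_length_bp=2000, total_mapped_reads=2_000_000)
lfc = log2_fold_change(rpkm_sample1, rpkm_sample2)
```

Genes with an RPKM of zero in either sample need separate handling (filtering or a pseudocount) before taking logs. If technical biases were identical in both samples they would cancel in the ratio, so a trend of the log fold change against a covariate such as GC content suggests sample-specific bias.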