gsnap: fast and snp-tolerant detection of complex variants ... · gsnap: fast and snp-tolerant...
TRANSCRIPT
![Page 1: GSNAP: Fast and SNP-tolerant detection of complex variants ... · GSNAP: Fast and SNP-tolerant detection of complex variants and splic-ing in short reads by Thomas D. Wu and Serban](https://reader030.vdocument.in/reader030/viewer/2022041205/5d59f98f88c99304078bc1e5/html5/thumbnails/1.jpg)
GSNAP: Fast and SNP-tolerant detection of complex variants and splic-ing in short readsby Thomas D. Wu and Serban Nacu
Matt HuskaFreie Universität Berlin
Computational Methods for High-Throughput Omics Data, WS 2011
![Page 2: GSNAP: Fast and SNP-tolerant detection of complex variants ... · GSNAP: Fast and SNP-tolerant detection of complex variants and splic-ing in short reads by Thomas D. Wu and Serban](https://reader030.vdocument.in/reader030/viewer/2022041205/5d59f98f88c99304078bc1e5/html5/thumbnails/2.jpg)
Outline
IntroductionMotivationGSNAP FeaturesExamples of Complex Variant Detection
The AlgorithmSummaryPreprocessingMethod 1: Spanning Set Generation and FilteringMethod 2: Complete Set Generation and FilteringVerification of Candidate RegionsDetecting Insertions and DeletionsDetecting Splice Junctions
ResultsSimulated ReadsTranscriptional ReadsLimitations
Conclusions
,
FU Berlin, GSNAP, Omics WS 2011 2
![Page 3: GSNAP: Fast and SNP-tolerant detection of complex variants ... · GSNAP: Fast and SNP-tolerant detection of complex variants and splic-ing in short reads by Thomas D. Wu and Serban](https://reader030.vdocument.in/reader030/viewer/2022041205/5d59f98f88c99304078bc1e5/html5/thumbnails/3.jpg)
Outline
IntroductionMotivationGSNAP FeaturesExamples of Complex Variant Detection
The AlgorithmSummaryPreprocessingMethod 1: Spanning Set Generation and FilteringMethod 2: Complete Set Generation and FilteringVerification of Candidate RegionsDetecting Insertions and DeletionsDetecting Splice Junctions
ResultsSimulated ReadsTranscriptional ReadsLimitations
Conclusions
,
FU Berlin, GSNAP, Omics WS 2011 2
![Page 4: GSNAP: Fast and SNP-tolerant detection of complex variants ... · GSNAP: Fast and SNP-tolerant detection of complex variants and splic-ing in short reads by Thomas D. Wu and Serban](https://reader030.vdocument.in/reader030/viewer/2022041205/5d59f98f88c99304078bc1e5/html5/thumbnails/4.jpg)
Outline
IntroductionMotivationGSNAP FeaturesExamples of Complex Variant Detection
The AlgorithmSummaryPreprocessingMethod 1: Spanning Set Generation and FilteringMethod 2: Complete Set Generation and FilteringVerification of Candidate RegionsDetecting Insertions and DeletionsDetecting Splice Junctions
ResultsSimulated ReadsTranscriptional ReadsLimitations
Conclusions
,
FU Berlin, GSNAP, Omics WS 2011 2
![Page 5: GSNAP: Fast and SNP-tolerant detection of complex variants ... · GSNAP: Fast and SNP-tolerant detection of complex variants and splic-ing in short reads by Thomas D. Wu and Serban](https://reader030.vdocument.in/reader030/viewer/2022041205/5d59f98f88c99304078bc1e5/html5/thumbnails/5.jpg)
Outline
IntroductionMotivationGSNAP FeaturesExamples of Complex Variant Detection
The AlgorithmSummaryPreprocessingMethod 1: Spanning Set Generation and FilteringMethod 2: Complete Set Generation and FilteringVerification of Candidate RegionsDetecting Insertions and DeletionsDetecting Splice Junctions
ResultsSimulated ReadsTranscriptional ReadsLimitations
Conclusions
,
FU Berlin, GSNAP, Omics WS 2011 2
![Page 6: GSNAP: Fast and SNP-tolerant detection of complex variants ... · GSNAP: Fast and SNP-tolerant detection of complex variants and splic-ing in short reads by Thomas D. Wu and Serban](https://reader030.vdocument.in/reader030/viewer/2022041205/5d59f98f88c99304078bc1e5/html5/thumbnails/6.jpg)
Outline
IntroductionMotivationGSNAP FeaturesExamples of Complex Variant Detection
The AlgorithmSummaryPreprocessingMethod 1: Spanning Set Generation and FilteringMethod 2: Complete Set Generation and FilteringVerification of Candidate RegionsDetecting Insertions and DeletionsDetecting Splice Junctions
ResultsSimulated ReadsTranscriptional ReadsLimitations
Conclusions
,
FU Berlin, GSNAP, Omics WS 2011 3
![Page 7: GSNAP: Fast and SNP-tolerant detection of complex variants ... · GSNAP: Fast and SNP-tolerant detection of complex variants and splic-ing in short reads by Thomas D. Wu and Serban](https://reader030.vdocument.in/reader030/viewer/2022041205/5d59f98f88c99304078bc1e5/html5/thumbnails/7.jpg)
Motivation
É New technology allows us to sequence more reads in shorter time.Increasing at an incredible rate with no signs of slowing down.
É "Why should we be happy with millions of reads, when we canhave. . .
,
FU Berlin, GSNAP, Omics WS 2011 4
![Page 8: GSNAP: Fast and SNP-tolerant detection of complex variants ... · GSNAP: Fast and SNP-tolerant detection of complex variants and splic-ing in short reads by Thomas D. Wu and Serban](https://reader030.vdocument.in/reader030/viewer/2022041205/5d59f98f88c99304078bc1e5/html5/thumbnails/8.jpg)
Motivation
É New technology allows us to sequence more reads in shorter time.Increasing at an incredible rate with no signs of slowing down.
É "Why should we be happy with millions of reads, when we canhave. . .
,
FU Berlin, GSNAP, Omics WS 2011 4
![Page 9: GSNAP: Fast and SNP-tolerant detection of complex variants ... · GSNAP: Fast and SNP-tolerant detection of complex variants and splic-ing in short reads by Thomas D. Wu and Serban](https://reader030.vdocument.in/reader030/viewer/2022041205/5d59f98f88c99304078bc1e5/html5/thumbnails/9.jpg)
Motivation
. . .billions?"
,
FU Berlin, GSNAP, Omics WS 2011 5
![Page 10: GSNAP: Fast and SNP-tolerant detection of complex variants ... · GSNAP: Fast and SNP-tolerant detection of complex variants and splic-ing in short reads by Thomas D. Wu and Serban](https://reader030.vdocument.in/reader030/viewer/2022041205/5d59f98f88c99304078bc1e5/html5/thumbnails/10.jpg)
Motivation: Why do we need another readmapping algorithm?
É The reads themselves are also getting longer. Longer reads = higherprobability for complex variants within a read.
É Current (Feb 2010) read mappers tend to either be very fast (BWA,Bowtie, SOAP2) or sensitive to variants (SOAP)
É GSNAP is intended to be fast and able to handle complex variants
,
FU Berlin, GSNAP, Omics WS 2011 6
![Page 11: GSNAP: Fast and SNP-tolerant detection of complex variants ... · GSNAP: Fast and SNP-tolerant detection of complex variants and splic-ing in short reads by Thomas D. Wu and Serban](https://reader030.vdocument.in/reader030/viewer/2022041205/5d59f98f88c99304078bc1e5/html5/thumbnails/11.jpg)
Motivation: Why do we need another readmapping algorithm?
É The reads themselves are also getting longer. Longer reads = higherprobability for complex variants within a read.
É Current (Feb 2010) read mappers tend to either be very fast (BWA,Bowtie, SOAP2) or sensitive to variants (SOAP)
É GSNAP is intended to be fast and able to handle complex variants
,
FU Berlin, GSNAP, Omics WS 2011 6
![Page 12: GSNAP: Fast and SNP-tolerant detection of complex variants ... · GSNAP: Fast and SNP-tolerant detection of complex variants and splic-ing in short reads by Thomas D. Wu and Serban](https://reader030.vdocument.in/reader030/viewer/2022041205/5d59f98f88c99304078bc1e5/html5/thumbnails/12.jpg)
Motivation: Why do we need another readmapping algorithm?
É The reads themselves are also getting longer. Longer reads = higherprobability for complex variants within a read.
É Current (Feb 2010) read mappers tend to either be very fast (BWA,Bowtie, SOAP2) or sensitive to variants (SOAP)
É GSNAP is intended to be fast and able to handle complex variants
,
FU Berlin, GSNAP, Omics WS 2011 6
![Page 13: GSNAP: Fast and SNP-tolerant detection of complex variants ... · GSNAP: Fast and SNP-tolerant detection of complex variants and splic-ing in short reads by Thomas D. Wu and Serban](https://reader030.vdocument.in/reader030/viewer/2022041205/5d59f98f88c99304078bc1e5/html5/thumbnails/13.jpg)
Outline
IntroductionMotivationGSNAP FeaturesExamples of Complex Variant Detection
The AlgorithmSummaryPreprocessingMethod 1: Spanning Set Generation and FilteringMethod 2: Complete Set Generation and FilteringVerification of Candidate RegionsDetecting Insertions and DeletionsDetecting Splice Junctions
ResultsSimulated ReadsTranscriptional ReadsLimitations
Conclusions
,
FU Berlin, GSNAP, Omics WS 2011 7
![Page 14: GSNAP: Fast and SNP-tolerant detection of complex variants ... · GSNAP: Fast and SNP-tolerant detection of complex variants and splic-ing in short reads by Thomas D. Wu and Serban](https://reader030.vdocument.in/reader030/viewer/2022041205/5d59f98f88c99304078bc1e5/html5/thumbnails/14.jpg)
Key Features of GSNAP
É Can handle short and long insertions and deletions
É Detects short and long distance splicing (includinginterchromosomal)É user-provided database of splice sites (eg. RefSeq)É probabilistic model
É SNP tolerant (given a user-provided database eg. dbSNP)É Can align bisulfite-treated DNA (for studying methlyation state)É Still pretty fast
,
FU Berlin, GSNAP, Omics WS 2011 8
![Page 15: GSNAP: Fast and SNP-tolerant detection of complex variants ... · GSNAP: Fast and SNP-tolerant detection of complex variants and splic-ing in short reads by Thomas D. Wu and Serban](https://reader030.vdocument.in/reader030/viewer/2022041205/5d59f98f88c99304078bc1e5/html5/thumbnails/15.jpg)
Key Features of GSNAP
É Can handle short and long insertions and deletionsÉ Detects short and long distance splicing (including
interchromosomal)
É user-provided database of splice sites (eg. RefSeq)É probabilistic model
É SNP tolerant (given a user-provided database eg. dbSNP)É Can align bisulfite-treated DNA (for studying methlyation state)É Still pretty fast
,
FU Berlin, GSNAP, Omics WS 2011 8
![Page 16: GSNAP: Fast and SNP-tolerant detection of complex variants ... · GSNAP: Fast and SNP-tolerant detection of complex variants and splic-ing in short reads by Thomas D. Wu and Serban](https://reader030.vdocument.in/reader030/viewer/2022041205/5d59f98f88c99304078bc1e5/html5/thumbnails/16.jpg)
Key Features of GSNAP
É Can handle short and long insertions and deletionsÉ Detects short and long distance splicing (including
interchromosomal)É user-provided database of splice sites (eg. RefSeq)
É probabilistic model
É SNP tolerant (given a user-provided database eg. dbSNP)É Can align bisulfite-treated DNA (for studying methlyation state)É Still pretty fast
,
FU Berlin, GSNAP, Omics WS 2011 8
![Page 17: GSNAP: Fast and SNP-tolerant detection of complex variants ... · GSNAP: Fast and SNP-tolerant detection of complex variants and splic-ing in short reads by Thomas D. Wu and Serban](https://reader030.vdocument.in/reader030/viewer/2022041205/5d59f98f88c99304078bc1e5/html5/thumbnails/17.jpg)
Key Features of GSNAP
É Can handle short and long insertions and deletionsÉ Detects short and long distance splicing (including
interchromosomal)É user-provided database of splice sites (eg. RefSeq)É probabilistic model
É SNP tolerant (given a user-provided database eg. dbSNP)É Can align bisulfite-treated DNA (for studying methlyation state)É Still pretty fast
,
FU Berlin, GSNAP, Omics WS 2011 8
![Page 18: GSNAP: Fast and SNP-tolerant detection of complex variants ... · GSNAP: Fast and SNP-tolerant detection of complex variants and splic-ing in short reads by Thomas D. Wu and Serban](https://reader030.vdocument.in/reader030/viewer/2022041205/5d59f98f88c99304078bc1e5/html5/thumbnails/18.jpg)
Key Features of GSNAP
É Can handle short and long insertions and deletionsÉ Detects short and long distance splicing (including
interchromosomal)É user-provided database of splice sites (eg. RefSeq)É probabilistic model
É SNP tolerant (given a user-provided database eg. dbSNP)
É Can align bisulfite-treated DNA (for studying methlyation state)É Still pretty fast
,
FU Berlin, GSNAP, Omics WS 2011 8
![Page 19: GSNAP: Fast and SNP-tolerant detection of complex variants ... · GSNAP: Fast and SNP-tolerant detection of complex variants and splic-ing in short reads by Thomas D. Wu and Serban](https://reader030.vdocument.in/reader030/viewer/2022041205/5d59f98f88c99304078bc1e5/html5/thumbnails/19.jpg)
Key Features of GSNAP
É Can handle short and long insertions and deletionsÉ Detects short and long distance splicing (including
interchromosomal)É user-provided database of splice sites (eg. RefSeq)É probabilistic model
É SNP tolerant (given a user-provided database eg. dbSNP)É Can align bisulfite-treated DNA (for studying methlyation state)
É Still pretty fast
,
FU Berlin, GSNAP, Omics WS 2011 8
![Page 20: GSNAP: Fast and SNP-tolerant detection of complex variants ... · GSNAP: Fast and SNP-tolerant detection of complex variants and splic-ing in short reads by Thomas D. Wu and Serban](https://reader030.vdocument.in/reader030/viewer/2022041205/5d59f98f88c99304078bc1e5/html5/thumbnails/20.jpg)
Key Features of GSNAP
É Can handle short and long insertions and deletionsÉ Detects short and long distance splicing (including
interchromosomal)É user-provided database of splice sites (eg. RefSeq)É probabilistic model
É SNP tolerant (given a user-provided database eg. dbSNP)É Can align bisulfite-treated DNA (for studying methlyation state)É Still pretty fast
,
FU Berlin, GSNAP, Omics WS 2011 8
![Page 21: GSNAP: Fast and SNP-tolerant detection of complex variants ... · GSNAP: Fast and SNP-tolerant detection of complex variants and splic-ing in short reads by Thomas D. Wu and Serban](https://reader030.vdocument.in/reader030/viewer/2022041205/5d59f98f88c99304078bc1e5/html5/thumbnails/21.jpg)
Outline
IntroductionMotivationGSNAP FeaturesExamples of Complex Variant Detection
The AlgorithmSummaryPreprocessingMethod 1: Spanning Set Generation and FilteringMethod 2: Complete Set Generation and FilteringVerification of Candidate RegionsDetecting Insertions and DeletionsDetecting Splice Junctions
ResultsSimulated ReadsTranscriptional ReadsLimitations
Conclusions
,
FU Berlin, GSNAP, Omics WS 2011 9
![Page 22: GSNAP: Fast and SNP-tolerant detection of complex variants ... · GSNAP: Fast and SNP-tolerant detection of complex variants and splic-ing in short reads by Thomas D. Wu and Serban](https://reader030.vdocument.in/reader030/viewer/2022041205/5d59f98f88c99304078bc1e5/html5/thumbnails/22.jpg)
Long deletion of 17nt plus mismatches
É 17nt deletion matching an entry in dbSNP, including mismatches:
,
FU Berlin, GSNAP, Omics WS 2011 10
![Page 23: GSNAP: Fast and SNP-tolerant detection of complex variants ... · GSNAP: Fast and SNP-tolerant detection of complex variants and splic-ing in short reads by Thomas D. Wu and Serban](https://reader030.vdocument.in/reader030/viewer/2022041205/5d59f98f88c99304078bc1e5/html5/thumbnails/23.jpg)
Splicing identified using the probabilisticmodel
É An intron within exon 1 of HOXA9. Is also experimentally supported.
,
FU Berlin, GSNAP, Omics WS 2011 11
![Page 24: GSNAP: Fast and SNP-tolerant detection of complex variants ... · GSNAP: Fast and SNP-tolerant detection of complex variants and splic-ing in short reads by Thomas D. Wu and Serban](https://reader030.vdocument.in/reader030/viewer/2022041205/5d59f98f88c99304078bc1e5/html5/thumbnails/24.jpg)
Splicing identified using a database of knownsplice sites
É Splicing sites identified despite having low probabilistic scores.
,
FU Berlin, GSNAP, Omics WS 2011 12
![Page 25: GSNAP: Fast and SNP-tolerant detection of complex variants ... · GSNAP: Fast and SNP-tolerant detection of complex variants and splic-ing in short reads by Thomas D. Wu and Serban](https://reader030.vdocument.in/reader030/viewer/2022041205/5d59f98f88c99304078bc1e5/html5/thumbnails/25.jpg)
Interchromosomal splicing (gene fusion)
É Splicing between BCAS4 (chr 20) and BCAS3 (chr 17).
,
FU Berlin, GSNAP, Omics WS 2011 13
![Page 26: GSNAP: Fast and SNP-tolerant detection of complex variants ... · GSNAP: Fast and SNP-tolerant detection of complex variants and splic-ing in short reads by Thomas D. Wu and Serban](https://reader030.vdocument.in/reader030/viewer/2022041205/5d59f98f88c99304078bc1e5/html5/thumbnails/26.jpg)
SNP-tolerant alignment
É SNP-tolerance allows both genotypes to align well
,
FU Berlin, GSNAP, Omics WS 2011 14
![Page 27: GSNAP: Fast and SNP-tolerant detection of complex variants ... · GSNAP: Fast and SNP-tolerant detection of complex variants and splic-ing in short reads by Thomas D. Wu and Serban](https://reader030.vdocument.in/reader030/viewer/2022041205/5d59f98f88c99304078bc1e5/html5/thumbnails/27.jpg)
Outline
IntroductionMotivationGSNAP FeaturesExamples of Complex Variant Detection
The AlgorithmSummaryPreprocessingMethod 1: Spanning Set Generation and FilteringMethod 2: Complete Set Generation and FilteringVerification of Candidate RegionsDetecting Insertions and DeletionsDetecting Splice Junctions
ResultsSimulated ReadsTranscriptional ReadsLimitations
Conclusions
,
FU Berlin, GSNAP, Omics WS 2011 15
![Page 28: GSNAP: Fast and SNP-tolerant detection of complex variants ... · GSNAP: Fast and SNP-tolerant detection of complex variants and splic-ing in short reads by Thomas D. Wu and Serban](https://reader030.vdocument.in/reader030/viewer/2022041205/5d59f98f88c99304078bc1e5/html5/thumbnails/28.jpg)
The Algorithm in Brief
É Index the genome using a hash table
É Break up short reads into shorter elements and look each up in hashtable
É Look at resulting position lists for each element and see if theysupport a common target location and have a reasonable number ofmismatches
É Verify the number of mismatches by checking the whole read againstthe reference
,
FU Berlin, GSNAP, Omics WS 2011 16
![Page 29: GSNAP: Fast and SNP-tolerant detection of complex variants ... · GSNAP: Fast and SNP-tolerant detection of complex variants and splic-ing in short reads by Thomas D. Wu and Serban](https://reader030.vdocument.in/reader030/viewer/2022041205/5d59f98f88c99304078bc1e5/html5/thumbnails/29.jpg)
The Algorithm in Brief
É Index the genome using a hash tableÉ Break up short reads into shorter elements and look each up in hash
table
É Look at resulting position lists for each element and see if theysupport a common target location and have a reasonable number ofmismatches
É Verify the number of mismatches by checking the whole read againstthe reference
,
FU Berlin, GSNAP, Omics WS 2011 16
![Page 30: GSNAP: Fast and SNP-tolerant detection of complex variants ... · GSNAP: Fast and SNP-tolerant detection of complex variants and splic-ing in short reads by Thomas D. Wu and Serban](https://reader030.vdocument.in/reader030/viewer/2022041205/5d59f98f88c99304078bc1e5/html5/thumbnails/30.jpg)
The Algorithm in Brief
É Index the genome using a hash tableÉ Break up short reads into shorter elements and look each up in hash
tableÉ Look at resulting position lists for each element and see if they
support a common target location and have a reasonable number ofmismatches
É Verify the number of mismatches by checking the whole read againstthe reference
,
FU Berlin, GSNAP, Omics WS 2011 16
![Page 31: GSNAP: Fast and SNP-tolerant detection of complex variants ... · GSNAP: Fast and SNP-tolerant detection of complex variants and splic-ing in short reads by Thomas D. Wu and Serban](https://reader030.vdocument.in/reader030/viewer/2022041205/5d59f98f88c99304078bc1e5/html5/thumbnails/31.jpg)
The Algorithm in Brief
É Index the genome using a hash tableÉ Break up short reads into shorter elements and look each up in hash
tableÉ Look at resulting position lists for each element and see if they
support a common target location and have a reasonable number ofmismatches
É Verify the number of mismatches by checking the whole read againstthe reference
,
FU Berlin, GSNAP, Omics WS 2011 16
![Page 32: GSNAP: Fast and SNP-tolerant detection of complex variants ... · GSNAP: Fast and SNP-tolerant detection of complex variants and splic-ing in short reads by Thomas D. Wu and Serban](https://reader030.vdocument.in/reader030/viewer/2022041205/5d59f98f88c99304078bc1e5/html5/thumbnails/32.jpg)
Outline
IntroductionMotivationGSNAP FeaturesExamples of Complex Variant Detection
The AlgorithmSummaryPreprocessingMethod 1: Spanning Set Generation and FilteringMethod 2: Complete Set Generation and FilteringVerification of Candidate RegionsDetecting Insertions and DeletionsDetecting Splice Junctions
ResultsSimulated ReadsTranscriptional ReadsLimitations
Conclusions
,
FU Berlin, GSNAP, Omics WS 2011 17
![Page 33: GSNAP: Fast and SNP-tolerant detection of complex variants ... · GSNAP: Fast and SNP-tolerant detection of complex variants and splic-ing in short reads by Thomas D. Wu and Serban](https://reader030.vdocument.in/reader030/viewer/2022041205/5d59f98f88c99304078bc1e5/html5/thumbnails/33.jpg)
Hash Table Space Requirements
É Hashing every overlapping oligomer in the entire reference sequencewould take too much memory (approx. 14GB for the human genome)
É We only hash 12mers every 3nt (approx. 4GB)É Optional SNP-tolerance only adds a small amount to total memory
requirements (3.8 GB → 4.0 GB)É Entire table only needs to be in memory during construction.
Afterwards it is mmap’d and only part is loaded into memory.
,
FU Berlin, GSNAP, Omics WS 2011 18
![Page 34: GSNAP: Fast and SNP-tolerant detection of complex variants ... · GSNAP: Fast and SNP-tolerant detection of complex variants and splic-ing in short reads by Thomas D. Wu and Serban](https://reader030.vdocument.in/reader030/viewer/2022041205/5d59f98f88c99304078bc1e5/html5/thumbnails/34.jpg)
Hash Table Space Requirements
É Hashing every overlapping oligomer in the entire reference sequencewould take too much memory (approx. 14GB for the human genome)
É We only hash 12mers every 3nt (approx. 4GB)
É Optional SNP-tolerance only adds a small amount to total memoryrequirements (3.8 GB → 4.0 GB)
É Entire table only needs to be in memory during construction.Afterwards it is mmap’d and only part is loaded into memory.
,
FU Berlin, GSNAP, Omics WS 2011 18
![Page 35: GSNAP: Fast and SNP-tolerant detection of complex variants ... · GSNAP: Fast and SNP-tolerant detection of complex variants and splic-ing in short reads by Thomas D. Wu and Serban](https://reader030.vdocument.in/reader030/viewer/2022041205/5d59f98f88c99304078bc1e5/html5/thumbnails/35.jpg)
Hash Table Space Requirements
É Hashing every overlapping oligomer in the entire reference sequencewould take too much memory (approx. 14GB for the human genome)
É We only hash 12mers every 3nt (approx. 4GB)É Optional SNP-tolerance only adds a small amount to total memory
requirements (3.8 GB → 4.0 GB)
É Entire table only needs to be in memory during construction.Afterwards it is mmap’d and only part is loaded into memory.
,
FU Berlin, GSNAP, Omics WS 2011 18
![Page 36: GSNAP: Fast and SNP-tolerant detection of complex variants ... · GSNAP: Fast and SNP-tolerant detection of complex variants and splic-ing in short reads by Thomas D. Wu and Serban](https://reader030.vdocument.in/reader030/viewer/2022041205/5d59f98f88c99304078bc1e5/html5/thumbnails/36.jpg)
Hash Table Space Requirements
É Hashing every overlapping oligomer in the entire reference sequencewould take too much memory (approx. 14GB for the human genome)
É We only hash 12mers every 3nt (approx. 4GB)É Optional SNP-tolerance only adds a small amount to total memory
requirements (3.8 GB → 4.0 GB)É Entire table only needs to be in memory during construction.
Afterwards it is mmap’d and only part is loaded into memory.
,
FU Berlin, GSNAP, Omics WS 2011 18
![Page 37: GSNAP: Fast and SNP-tolerant detection of complex variants ... · GSNAP: Fast and SNP-tolerant detection of complex variants and splic-ing in short reads by Thomas D. Wu and Serban](https://reader030.vdocument.in/reader030/viewer/2022041205/5d59f98f88c99304078bc1e5/html5/thumbnails/37.jpg)
Hashing the Reference Genome
,
FU Berlin, GSNAP, Omics WS 2011 19
![Page 38: GSNAP: Fast and SNP-tolerant detection of complex variants ... · GSNAP: Fast and SNP-tolerant detection of complex variants and splic-ing in short reads by Thomas D. Wu and Serban](https://reader030.vdocument.in/reader030/viewer/2022041205/5d59f98f88c99304078bc1e5/html5/thumbnails/38.jpg)
Including SNPs
,
FU Berlin, GSNAP, Omics WS 2011 20
![Page 39: GSNAP: Fast and SNP-tolerant detection of complex variants ... · GSNAP: Fast and SNP-tolerant detection of complex variants and splic-ing in short reads by Thomas D. Wu and Serban](https://reader030.vdocument.in/reader030/viewer/2022041205/5d59f98f88c99304078bc1e5/html5/thumbnails/39.jpg)
Resulting Reference "Space"
,
FU Berlin, GSNAP, Omics WS 2011 21
![Page 40: GSNAP: Fast and SNP-tolerant detection of complex variants ... · GSNAP: Fast and SNP-tolerant detection of complex variants and splic-ing in short reads by Thomas D. Wu and Serban](https://reader030.vdocument.in/reader030/viewer/2022041205/5d59f98f88c99304078bc1e5/html5/thumbnails/40.jpg)
Outline
IntroductionMotivationGSNAP FeaturesExamples of Complex Variant Detection
The AlgorithmSummaryPreprocessingMethod 1: Spanning Set Generation and FilteringMethod 2: Complete Set Generation and FilteringVerification of Candidate RegionsDetecting Insertions and DeletionsDetecting Splice Junctions
ResultsSimulated ReadsTranscriptional ReadsLimitations
Conclusions
,
FU Berlin, GSNAP, Omics WS 2011 22
![Page 41: GSNAP: Fast and SNP-tolerant detection of complex variants ... · GSNAP: Fast and SNP-tolerant detection of complex variants and splic-ing in short reads by Thomas D. Wu and Serban](https://reader030.vdocument.in/reader030/viewer/2022041205/5d59f98f88c99304078bc1e5/html5/thumbnails/41.jpg)
Spanning Set Generation and Filtering
,
FU Berlin, GSNAP, Omics WS 2011 23
![Page 42: GSNAP: Fast and SNP-tolerant detection of complex variants ... · GSNAP: Fast and SNP-tolerant detection of complex variants and splic-ing in short reads by Thomas D. Wu and Serban](https://reader030.vdocument.in/reader030/viewer/2022041205/5d59f98f88c99304078bc1e5/html5/thumbnails/42.jpg)
Spanning Set Generation and Filtering
,
FU Berlin, GSNAP, Omics WS 2011 24
![Page 43: GSNAP: Fast and SNP-tolerant detection of complex variants ... · GSNAP: Fast and SNP-tolerant detection of complex variants and splic-ing in short reads by Thomas D. Wu and Serban](https://reader030.vdocument.in/reader030/viewer/2022041205/5d59f98f88c99304078bc1e5/html5/thumbnails/43.jpg)
Spanning Set Generation and Filtering
,
FU Berlin, GSNAP, Omics WS 2011 25
![Page 44: GSNAP: Fast and SNP-tolerant detection of complex variants ... · GSNAP: Fast and SNP-tolerant detection of complex variants and splic-ing in short reads by Thomas D. Wu and Serban](https://reader030.vdocument.in/reader030/viewer/2022041205/5d59f98f88c99304078bc1e5/html5/thumbnails/44.jpg)
Spanning Set Generation and Filtering
,
FU Berlin, GSNAP, Omics WS 2011 26
![Page 45: GSNAP: Fast and SNP-tolerant detection of complex variants ... · GSNAP: Fast and SNP-tolerant detection of complex variants and splic-ing in short reads by Thomas D. Wu and Serban](https://reader030.vdocument.in/reader030/viewer/2022041205/5d59f98f88c99304078bc1e5/html5/thumbnails/45.jpg)
Spanning Set Generation and Filtering
,
FU Berlin, GSNAP, Omics WS 2011 27
![Page 46: GSNAP: Fast and SNP-tolerant detection of complex variants ... · GSNAP: Fast and SNP-tolerant detection of complex variants and splic-ing in short reads by Thomas D. Wu and Serban](https://reader030.vdocument.in/reader030/viewer/2022041205/5d59f98f88c99304078bc1e5/html5/thumbnails/46.jpg)
Spanning Set Generation and Filtering
,
FU Berlin, GSNAP, Omics WS 2011 28
![Page 47: GSNAP: Fast and SNP-tolerant detection of complex variants ... · GSNAP: Fast and SNP-tolerant detection of complex variants and splic-ing in short reads by Thomas D. Wu and Serban](https://reader030.vdocument.in/reader030/viewer/2022041205/5d59f98f88c99304078bc1e5/html5/thumbnails/47.jpg)
Spanning Set Generation and Filtering
,
FU Berlin, GSNAP, Omics WS 2011 29
![Page 48: GSNAP: Fast and SNP-tolerant detection of complex variants ... · GSNAP: Fast and SNP-tolerant detection of complex variants and splic-ing in short reads by Thomas D. Wu and Serban](https://reader030.vdocument.in/reader030/viewer/2022041205/5d59f98f88c99304078bc1e5/html5/thumbnails/48.jpg)
Spanning Set Generation and Filtering
É Generating elements: used to find supporting candidate locationsÉ Uses a multiway merging procedure (Knuth TAOCP Vol 3)É Slow: linear on the sum of the list lengths. O(l logn) runtime where n is
the number of position lists and l is the sum of their lengths.
É Filtering elements: filter the candidate locations found using thegenerating elementsÉ Fast: uses a binary search. O(log li) for position list i
É Choose the elements with the shortest position lists as generatingelements, and the longer ones as filtering elements.
É They choose K + 2 generating sets, where K is the constraint score (=max number of mismatches)
,
FU Berlin, GSNAP, Omics WS 2011 30
![Page 49: GSNAP: Fast and SNP-tolerant detection of complex variants ... · GSNAP: Fast and SNP-tolerant detection of complex variants and splic-ing in short reads by Thomas D. Wu and Serban](https://reader030.vdocument.in/reader030/viewer/2022041205/5d59f98f88c99304078bc1e5/html5/thumbnails/49.jpg)
Spanning Set Generation and Filtering
É Generating elements: used to find supporting candidate locationsÉ Uses a multiway merging procedure (Knuth TAOCP Vol 3)É Slow: linear on the sum of the list lengths. O(l logn) runtime where n is
the number of position lists and l is the sum of their lengths.É Filtering elements: filter the candidate locations found using the
generating elementsÉ Fast: uses a binary search. O(log li) for position list i
É Choose the elements with the shortest position lists as generatingelements, and the longer ones as filtering elements.
É They choose K + 2 generating sets, where K is the constraint score (=max number of mismatches)
,
FU Berlin, GSNAP, Omics WS 2011 30
![Page 50: GSNAP: Fast and SNP-tolerant detection of complex variants ... · GSNAP: Fast and SNP-tolerant detection of complex variants and splic-ing in short reads by Thomas D. Wu and Serban](https://reader030.vdocument.in/reader030/viewer/2022041205/5d59f98f88c99304078bc1e5/html5/thumbnails/50.jpg)
Spanning Set Generation and Filtering
É Generating elements: used to find supporting candidate locationsÉ Uses a multiway merging procedure (Knuth TAOCP Vol 3)É Slow: linear on the sum of the list lengths. O(l logn) runtime where n is
the number of position lists and l is the sum of their lengths.É Filtering elements: filter the candidate locations found using the
generating elementsÉ Fast: uses a binary search. O(log li) for position list i
É Choose the elements with the shortest position lists as generatingelements, and the longer ones as filtering elements.
É They choose K + 2 generating sets, where K is the constraint score (=max number of mismatches)
,
FU Berlin, GSNAP, Omics WS 2011 30
![Page 51: GSNAP: Fast and SNP-tolerant detection of complex variants ... · GSNAP: Fast and SNP-tolerant detection of complex variants and splic-ing in short reads by Thomas D. Wu and Serban](https://reader030.vdocument.in/reader030/viewer/2022041205/5d59f98f88c99304078bc1e5/html5/thumbnails/51.jpg)
Spanning Set Generation and Filtering
É Generating elements: used to find supporting candidate locationsÉ Uses a multiway merging procedure (Knuth TAOCP Vol 3)É Slow: linear on the sum of the list lengths. O(l logn) runtime where n is
the number of position lists and l is the sum of their lengths.É Filtering elements: filter the candidate locations found using the
generating elementsÉ Fast: uses a binary search. O(log li) for position list i
É Choose the elements with the shortest position lists as generatingelements, and the longer ones as filtering elements.
É They choose K + 2 generating sets, where K is the constraint score (=max number of mismatches)
,
FU Berlin, GSNAP, Omics WS 2011 30
![Page 52: GSNAP: Fast and SNP-tolerant detection of complex variants ... · GSNAP: Fast and SNP-tolerant detection of complex variants and splic-ing in short reads by Thomas D. Wu and Serban](https://reader030.vdocument.in/reader030/viewer/2022041205/5d59f98f88c99304078bc1e5/html5/thumbnails/52.jpg)
Spanning Set Generation and Filtering
,
FU Berlin, GSNAP, Omics WS 2011 31
![Page 53: GSNAP: Fast and SNP-tolerant detection of complex variants ... · GSNAP: Fast and SNP-tolerant detection of complex variants and splic-ing in short reads by Thomas D. Wu and Serban](https://reader030.vdocument.in/reader030/viewer/2022041205/5d59f98f88c99304078bc1e5/html5/thumbnails/53.jpg)
Spanning Set Generation and Filtering
,
FU Berlin, GSNAP, Omics WS 2011 32
![Page 54: GSNAP: Fast and SNP-tolerant detection of complex variants ... · GSNAP: Fast and SNP-tolerant detection of complex variants and splic-ing in short reads by Thomas D. Wu and Serban](https://reader030.vdocument.in/reader030/viewer/2022041205/5d59f98f88c99304078bc1e5/html5/thumbnails/54.jpg)
Spanning Set Generation and Filtering
,
FU Berlin, GSNAP, Omics WS 2011 33
![Page 55: GSNAP: Fast and SNP-tolerant detection of complex variants ... · GSNAP: Fast and SNP-tolerant detection of complex variants and splic-ing in short reads by Thomas D. Wu and Serban](https://reader030.vdocument.in/reader030/viewer/2022041205/5d59f98f88c99304078bc1e5/html5/thumbnails/55.jpg)
Spanning Set Generation and Filtering
É Problem: The spanning set method can only be used to detect alimited number of mismatches.
É If we want to allow K matches, for K > 1, then we need at least(K + 2) generating elements. The read length L limits the number ofelements N, and the spanning set elements are non-overlapping in allthree shifts if L = 10(mod12), so N ≤ b(L+ 2)/12c.
É So we need to satisfy N > (K + 2), and as such we can only apply thespanning set method when K ≤ b(L+ 2)/12c − 2 for L ≥ 34.
É For reads of length 100 (Illumina), the we can allow a maximum of 6mismatches.
É For reads of length 400 (454), the we can allow a maximum of 31mismatches.
É If we want to allow larger numbers of mismatches or the samenumber of mismatches in shorter reads, we need to use anothermethod. . .
,
FU Berlin, GSNAP, Omics WS 2011 34
![Page 56: GSNAP: Fast and SNP-tolerant detection of complex variants ... · GSNAP: Fast and SNP-tolerant detection of complex variants and splic-ing in short reads by Thomas D. Wu and Serban](https://reader030.vdocument.in/reader030/viewer/2022041205/5d59f98f88c99304078bc1e5/html5/thumbnails/56.jpg)
Spanning Set Generation and Filtering
É Problem: The spanning set method can only be used to detect alimited number of mismatches.
É If we want to allow K matches, for K > 1, then we need at least(K + 2) generating elements. The read length L limits the number ofelements N, and the spanning set elements are non-overlapping in allthree shifts if L = 10(mod12), so N ≤ b(L+ 2)/12c.
É So we need to satisfy N > (K + 2), and as such we can only apply thespanning set method when K ≤ b(L+ 2)/12c − 2 for L ≥ 34.
É For reads of length 100 (Illumina), the we can allow a maximum of 6mismatches.
É For reads of length 400 (454), the we can allow a maximum of 31mismatches.
É If we want to allow larger numbers of mismatches or the samenumber of mismatches in shorter reads, we need to use anothermethod. . .
,
FU Berlin, GSNAP, Omics WS 2011 34
![Page 57: GSNAP: Fast and SNP-tolerant detection of complex variants ... · GSNAP: Fast and SNP-tolerant detection of complex variants and splic-ing in short reads by Thomas D. Wu and Serban](https://reader030.vdocument.in/reader030/viewer/2022041205/5d59f98f88c99304078bc1e5/html5/thumbnails/57.jpg)
Spanning Set Generation and Filtering
É Problem: The spanning set method can only be used to detect alimited number of mismatches.
É If we want to allow K matches, for K > 1, then we need at least(K + 2) generating elements. The read length L limits the number ofelements N, and the spanning set elements are non-overlapping in allthree shifts if L = 10(mod12), so N ≤ b(L+ 2)/12c.
É So we need to satisfy N > (K + 2), and as such we can only apply thespanning set method when K ≤ b(L+ 2)/12c − 2 for L ≥ 34.
É For reads of length 100 (Illumina), the we can allow a maximum of 6mismatches.
É For reads of length 400 (454), the we can allow a maximum of 31mismatches.
É If we want to allow larger numbers of mismatches or the samenumber of mismatches in shorter reads, we need to use anothermethod. . .
,
FU Berlin, GSNAP, Omics WS 2011 34
![Page 58: GSNAP: Fast and SNP-tolerant detection of complex variants ... · GSNAP: Fast and SNP-tolerant detection of complex variants and splic-ing in short reads by Thomas D. Wu and Serban](https://reader030.vdocument.in/reader030/viewer/2022041205/5d59f98f88c99304078bc1e5/html5/thumbnails/58.jpg)
Spanning Set Generation and Filtering
É Problem: The spanning set method can only be used to detect alimited number of mismatches.
É If we want to allow K matches, for K > 1, then we need at least(K + 2) generating elements. The read length L limits the number ofelements N, and the spanning set elements are non-overlapping in allthree shifts if L = 10(mod12), so N ≤ b(L+ 2)/12c.
É So we need to satisfy N > (K + 2), and as such we can only apply thespanning set method when K ≤ b(L+ 2)/12c − 2 for L ≥ 34.
É For reads of length 100 (Illumina), the we can allow a maximum of 6mismatches.
É For reads of length 400 (454), the we can allow a maximum of 31mismatches.
É If we want to allow larger numbers of mismatches or the samenumber of mismatches in shorter reads, we need to use anothermethod. . .
,
FU Berlin, GSNAP, Omics WS 2011 34
![Page 59: GSNAP: Fast and SNP-tolerant detection of complex variants ... · GSNAP: Fast and SNP-tolerant detection of complex variants and splic-ing in short reads by Thomas D. Wu and Serban](https://reader030.vdocument.in/reader030/viewer/2022041205/5d59f98f88c99304078bc1e5/html5/thumbnails/59.jpg)
Spanning Set Generation and Filtering
É Problem: The spanning set method can only be used to detect alimited number of mismatches.
É If we want to allow K matches, for K > 1, then we need at least(K + 2) generating elements. The read length L limits the number ofelements N, and the spanning set elements are non-overlapping in allthree shifts if L = 10(mod12), so N ≤ b(L+ 2)/12c.
É So we need to satisfy N > (K + 2), and as such we can only apply thespanning set method when K ≤ b(L+ 2)/12c − 2 for L ≥ 34.
É For reads of length 100 (Illumina), the we can allow a maximum of 6mismatches.
É For reads of length 400 (454), the we can allow a maximum of 31mismatches.
É If we want to allow larger numbers of mismatches or the samenumber of mismatches in shorter reads, we need to use anothermethod. . .
,
FU Berlin, GSNAP, Omics WS 2011 34
![Page 60: GSNAP: Fast and SNP-tolerant detection of complex variants ... · GSNAP: Fast and SNP-tolerant detection of complex variants and splic-ing in short reads by Thomas D. Wu and Serban](https://reader030.vdocument.in/reader030/viewer/2022041205/5d59f98f88c99304078bc1e5/html5/thumbnails/60.jpg)
Outline
IntroductionMotivationGSNAP FeaturesExamples of Complex Variant Detection
The AlgorithmSummaryPreprocessingMethod 1: Spanning Set Generation and FilteringMethod 2: Complete Set Generation and FilteringVerification of Candidate RegionsDetecting Insertions and DeletionsDetecting Splice Junctions
ResultsSimulated ReadsTranscriptional ReadsLimitations
Conclusions
,
FU Berlin, GSNAP, Omics WS 2011 35
![Page 61: GSNAP: Fast and SNP-tolerant detection of complex variants ... · GSNAP: Fast and SNP-tolerant detection of complex variants and splic-ing in short reads by Thomas D. Wu and Serban](https://reader030.vdocument.in/reader030/viewer/2022041205/5d59f98f88c99304078bc1e5/html5/thumbnails/61.jpg)
Complete set generation and filtering
É Uses the complete set of overlapping 12mers.É Works for any number of mismatches as long as read and target have≥ 14 consecutive matches (12mer out of phase by up to two bases)
É Exhaustive for K ≤ bL/14c − 1
,
FU Berlin, GSNAP, Omics WS 2011 36
![Page 62: GSNAP: Fast and SNP-tolerant detection of complex variants ... · GSNAP: Fast and SNP-tolerant detection of complex variants and splic-ing in short reads by Thomas D. Wu and Serban](https://reader030.vdocument.in/reader030/viewer/2022041205/5d59f98f88c99304078bc1e5/html5/thumbnails/62.jpg)
Complete set generation and filtering
É Lower bound on mismatches: b(∆p+ 6)/12cwhere ∆p is the distance between start locations of consecutivesupporting 12mers.
,
FU Berlin, GSNAP, Omics WS 2011 37
![Page 63: GSNAP: Fast and SNP-tolerant detection of complex variants ... · GSNAP: Fast and SNP-tolerant detection of complex variants and splic-ing in short reads by Thomas D. Wu and Serban](https://reader030.vdocument.in/reader030/viewer/2022041205/5d59f98f88c99304078bc1e5/html5/thumbnails/63.jpg)
Complete set generation and filtering
É Lower bound on mismatches: b(∆p+ 6)/12cwhere ∆p is the distance between start locations of consecutivesupporting 12mers.
,
FU Berlin, GSNAP, Omics WS 2011 37
![Page 64: GSNAP: Fast and SNP-tolerant detection of complex variants ... · GSNAP: Fast and SNP-tolerant detection of complex variants and splic-ing in short reads by Thomas D. Wu and Serban](https://reader030.vdocument.in/reader030/viewer/2022041205/5d59f98f88c99304078bc1e5/html5/thumbnails/64.jpg)
Complete set example
,
FU Berlin, GSNAP, Omics WS 2011 38
![Page 65: GSNAP: Fast and SNP-tolerant detection of complex variants ... · GSNAP: Fast and SNP-tolerant detection of complex variants and splic-ing in short reads by Thomas D. Wu and Serban](https://reader030.vdocument.in/reader030/viewer/2022041205/5d59f98f88c99304078bc1e5/html5/thumbnails/65.jpg)
Complete set generation and filtering
,
FU Berlin, GSNAP, Omics WS 2011 39
![Page 66: GSNAP: Fast and SNP-tolerant detection of complex variants ... · GSNAP: Fast and SNP-tolerant detection of complex variants and splic-ing in short reads by Thomas D. Wu and Serban](https://reader030.vdocument.in/reader030/viewer/2022041205/5d59f98f88c99304078bc1e5/html5/thumbnails/66.jpg)
Outline
IntroductionMotivationGSNAP FeaturesExamples of Complex Variant Detection
The AlgorithmSummaryPreprocessingMethod 1: Spanning Set Generation and FilteringMethod 2: Complete Set Generation and FilteringVerification of Candidate RegionsDetecting Insertions and DeletionsDetecting Splice Junctions
ResultsSimulated ReadsTranscriptional ReadsLimitations
Conclusions
,
FU Berlin, GSNAP, Omics WS 2011 40
![Page 67: GSNAP: Fast and SNP-tolerant detection of complex variants ... · GSNAP: Fast and SNP-tolerant detection of complex variants and splic-ing in short reads by Thomas D. Wu and Serban](https://reader030.vdocument.in/reader030/viewer/2022041205/5d59f98f88c99304078bc1e5/html5/thumbnails/67.jpg)
Verification of Candidate Regions
É The spanning set and complete set methods generate candidateregions for which we know a lower bound on the number ofmismatches.
É These regions need to be verified to check the exact number ofmismatches.
,
FU Berlin, GSNAP, Omics WS 2011 41
![Page 68: GSNAP: Fast and SNP-tolerant detection of complex variants ... · GSNAP: Fast and SNP-tolerant detection of complex variants and splic-ing in short reads by Thomas D. Wu and Serban](https://reader030.vdocument.in/reader030/viewer/2022041205/5d59f98f88c99304078bc1e5/html5/thumbnails/68.jpg)
Remember: Resulting Reference "Space"
,
FU Berlin, GSNAP, Omics WS 2011 42
![Page 69: GSNAP: Fast and SNP-tolerant detection of complex variants ... · GSNAP: Fast and SNP-tolerant detection of complex variants and splic-ing in short reads by Thomas D. Wu and Serban](https://reader030.vdocument.in/reader030/viewer/2022041205/5d59f98f88c99304078bc1e5/html5/thumbnails/69.jpg)
Compressed Genome
É Text usually stored as 8 bit characters
É Because we have a reduced alphabet the reference is stored as 3 bitsper character: 2 bits for the nt + a flagÉ Flag in major-allele genome: indicates unknown or ambiguous ntÉ Flag in minor-allele genome: indicates a SNP
,
FU Berlin, GSNAP, Omics WS 2011 43
![Page 70: GSNAP: Fast and SNP-tolerant detection of complex variants ... · GSNAP: Fast and SNP-tolerant detection of complex variants and splic-ing in short reads by Thomas D. Wu and Serban](https://reader030.vdocument.in/reader030/viewer/2022041205/5d59f98f88c99304078bc1e5/html5/thumbnails/70.jpg)
Compressed Genome
É Text usually stored as 8 bit charactersÉ Because we have a reduced alphabet the reference is stored as 3 bits
per character: 2 bits for the nt + a flag
É Flag in major-allele genome: indicates unknown or ambiguous ntÉ Flag in minor-allele genome: indicates a SNP
,
FU Berlin, GSNAP, Omics WS 2011 43
![Page 71: GSNAP: Fast and SNP-tolerant detection of complex variants ... · GSNAP: Fast and SNP-tolerant detection of complex variants and splic-ing in short reads by Thomas D. Wu and Serban](https://reader030.vdocument.in/reader030/viewer/2022041205/5d59f98f88c99304078bc1e5/html5/thumbnails/71.jpg)
Compressed Genome
É Text usually stored as 8 bit charactersÉ Because we have a reduced alphabet the reference is stored as 3 bits
per character: 2 bits for the nt + a flagÉ Flag in major-allele genome: indicates unknown or ambiguous ntÉ Flag in minor-allele genome: indicates a SNP
,
FU Berlin, GSNAP, Omics WS 2011 43
![Page 72: GSNAP: Fast and SNP-tolerant detection of complex variants ... · GSNAP: Fast and SNP-tolerant detection of complex variants and splic-ing in short reads by Thomas D. Wu and Serban](https://reader030.vdocument.in/reader030/viewer/2022041205/5d59f98f88c99304078bc1e5/html5/thumbnails/72.jpg)
Verification of Candidate Regions
É Query sequence converted to the same compressed representationas the reference
É Shifted into position and bitwise XOR combined with the major- andminor-allele genomes separately
É Resulting arrays are bitwise AND’d, so mismatches at a SNP onlyoccur if both alleles do not match
,
FU Berlin, GSNAP, Omics WS 2011 44
![Page 73: GSNAP: Fast and SNP-tolerant detection of complex variants ... · GSNAP: Fast and SNP-tolerant detection of complex variants and splic-ing in short reads by Thomas D. Wu and Serban](https://reader030.vdocument.in/reader030/viewer/2022041205/5d59f98f88c99304078bc1e5/html5/thumbnails/73.jpg)
Outline
IntroductionMotivationGSNAP FeaturesExamples of Complex Variant Detection
The AlgorithmSummaryPreprocessingMethod 1: Spanning Set Generation and FilteringMethod 2: Complete Set Generation and FilteringVerification of Candidate RegionsDetecting Insertions and DeletionsDetecting Splice Junctions
ResultsSimulated ReadsTranscriptional ReadsLimitations
Conclusions
,
FU Berlin, GSNAP, Omics WS 2011 45
![Page 74: GSNAP: Fast and SNP-tolerant detection of complex variants ... · GSNAP: Fast and SNP-tolerant detection of complex variants and splic-ing in short reads by Thomas D. Wu and Serban](https://reader030.vdocument.in/reader030/viewer/2022041205/5d59f98f88c99304078bc1e5/html5/thumbnails/74.jpg)
Detecting Insertions and Deletions
,
FU Berlin, GSNAP, Omics WS 2011 46
![Page 75: GSNAP: Fast and SNP-tolerant detection of complex variants ... · GSNAP: Fast and SNP-tolerant detection of complex variants and splic-ing in short reads by Thomas D. Wu and Serban](https://reader030.vdocument.in/reader030/viewer/2022041205/5d59f98f88c99304078bc1e5/html5/thumbnails/75.jpg)
Detecting Insertions and Deletions
,
FU Berlin, GSNAP, Omics WS 2011 47
![Page 76: GSNAP: Fast and SNP-tolerant detection of complex variants ... · GSNAP: Fast and SNP-tolerant detection of complex variants and splic-ing in short reads by Thomas D. Wu and Serban](https://reader030.vdocument.in/reader030/viewer/2022041205/5d59f98f88c99304078bc1e5/html5/thumbnails/76.jpg)
Detecting Insertions and Deletions
,
FU Berlin, GSNAP, Omics WS 2011 48
![Page 77: GSNAP: Fast and SNP-tolerant detection of complex variants ... · GSNAP: Fast and SNP-tolerant detection of complex variants and splic-ing in short reads by Thomas D. Wu and Serban](https://reader030.vdocument.in/reader030/viewer/2022041205/5d59f98f88c99304078bc1e5/html5/thumbnails/77.jpg)
Detecting Insertions and Deletions
,
FU Berlin, GSNAP, Omics WS 2011 49
![Page 78: GSNAP: Fast and SNP-tolerant detection of complex variants ... · GSNAP: Fast and SNP-tolerant detection of complex variants and splic-ing in short reads by Thomas D. Wu and Serban](https://reader030.vdocument.in/reader030/viewer/2022041205/5d59f98f88c99304078bc1e5/html5/thumbnails/78.jpg)
Outline
IntroductionMotivationGSNAP FeaturesExamples of Complex Variant Detection
The AlgorithmSummaryPreprocessingMethod 1: Spanning Set Generation and FilteringMethod 2: Complete Set Generation and FilteringVerification of Candidate RegionsDetecting Insertions and DeletionsDetecting Splice Junctions
ResultsSimulated ReadsTranscriptional ReadsLimitations
Conclusions
,
FU Berlin, GSNAP, Omics WS 2011 50
![Page 79: GSNAP: Fast and SNP-tolerant detection of complex variants ... · GSNAP: Fast and SNP-tolerant detection of complex variants and splic-ing in short reads by Thomas D. Wu and Serban](https://reader030.vdocument.in/reader030/viewer/2022041205/5d59f98f88c99304078bc1e5/html5/thumbnails/79.jpg)
Detecting Splice Junctions
É Known splice sites: user-provided databaseÉ Novel splice sites: maximum entropy probabilistic model from Yeo
and Burge, 2004
,
FU Berlin, GSNAP, Omics WS 2011 51
![Page 80: GSNAP: Fast and SNP-tolerant detection of complex variants ... · GSNAP: Fast and SNP-tolerant detection of complex variants and splic-ing in short reads by Thomas D. Wu and Serban](https://reader030.vdocument.in/reader030/viewer/2022041205/5d59f98f88c99304078bc1e5/html5/thumbnails/80.jpg)
Detecting Splice Junctions
É Short-distance splice sites are on the same chromosome and < somedistance apart (default: 200,000 nt)
É Method similar to the one we used to find middle deletions earlier. . .
,
FU Berlin, GSNAP, Omics WS 2011 52
![Page 81: GSNAP: Fast and SNP-tolerant detection of complex variants ... · GSNAP: Fast and SNP-tolerant detection of complex variants and splic-ing in short reads by Thomas D. Wu and Serban](https://reader030.vdocument.in/reader030/viewer/2022041205/5d59f98f88c99304078bc1e5/html5/thumbnails/81.jpg)
Detecting Splice Junctions
É Crossover area is then searched for donor or acceptor sites (eitherknown or novel with high probability).
,
FU Berlin, GSNAP, Omics WS 2011 53
![Page 82: GSNAP: Fast and SNP-tolerant detection of complex variants ... · GSNAP: Fast and SNP-tolerant detection of complex variants and splic-ing in short reads by Thomas D. Wu and Serban](https://reader030.vdocument.in/reader030/viewer/2022041205/5d59f98f88c99304078bc1e5/html5/thumbnails/82.jpg)
Detecting Splice Junctions
É Crossover area is then searched for donor or acceptor sites (eitherknown or novel with high probability).
,
FU Berlin, GSNAP, Omics WS 2011 53
![Page 83: GSNAP: Fast and SNP-tolerant detection of complex variants ... · GSNAP: Fast and SNP-tolerant detection of complex variants and splic-ing in short reads by Thomas D. Wu and Serban](https://reader030.vdocument.in/reader030/viewer/2022041205/5d59f98f88c99304078bc1e5/html5/thumbnails/83.jpg)
Detecting Splice Junctions
É Long-distance splice sites can be on different chromosomesÉ Require higher probability scores for novel splice sites than
short-distance splice sitesÉ Candidates with matching breakpoints on the read are matched
,
FU Berlin, GSNAP, Omics WS 2011 54
![Page 84: GSNAP: Fast and SNP-tolerant detection of complex variants ... · GSNAP: Fast and SNP-tolerant detection of complex variants and splic-ing in short reads by Thomas D. Wu and Serban](https://reader030.vdocument.in/reader030/viewer/2022041205/5d59f98f88c99304078bc1e5/html5/thumbnails/84.jpg)
Detecting Splice Junctions
,
FU Berlin, GSNAP, Omics WS 2011 55
![Page 85: GSNAP: Fast and SNP-tolerant detection of complex variants ... · GSNAP: Fast and SNP-tolerant detection of complex variants and splic-ing in short reads by Thomas D. Wu and Serban](https://reader030.vdocument.in/reader030/viewer/2022041205/5d59f98f88c99304078bc1e5/html5/thumbnails/85.jpg)
Detecting Splice Junctions
É If both splice sites can not be found then GSNAP will return one site(a "half-intron")
,
FU Berlin, GSNAP, Omics WS 2011 56
![Page 86: GSNAP: Fast and SNP-tolerant detection of complex variants ... · GSNAP: Fast and SNP-tolerant detection of complex variants and splic-ing in short reads by Thomas D. Wu and Serban](https://reader030.vdocument.in/reader030/viewer/2022041205/5d59f98f88c99304078bc1e5/html5/thumbnails/86.jpg)
Outline
IntroductionMotivationGSNAP FeaturesExamples of Complex Variant Detection
The AlgorithmSummaryPreprocessingMethod 1: Spanning Set Generation and FilteringMethod 2: Complete Set Generation and FilteringVerification of Candidate RegionsDetecting Insertions and DeletionsDetecting Splice Junctions
ResultsSimulated ReadsTranscriptional ReadsLimitations
Conclusions
,
FU Berlin, GSNAP, Omics WS 2011 57
![Page 87: GSNAP: Fast and SNP-tolerant detection of complex variants ... · GSNAP: Fast and SNP-tolerant detection of complex variants and splic-ing in short reads by Thomas D. Wu and Serban](https://reader030.vdocument.in/reader030/viewer/2022041205/5d59f98f88c99304078bc1e5/html5/thumbnails/87.jpg)
Simulated Reads
É Runtime comparison between GSNAP and other alignment tools (for100,000 reads)
É Simulated increasingly complicated variantsÉ exact matches onlyÉ 1 - 3 mismatchesÉ short insertions and deletionsÉ longer insertions and deletions
,
FU Berlin, GSNAP, Omics WS 2011 58
![Page 88: GSNAP: Fast and SNP-tolerant detection of complex variants ... · GSNAP: Fast and SNP-tolerant detection of complex variants and splic-ing in short reads by Thomas D. Wu and Serban](https://reader030.vdocument.in/reader030/viewer/2022041205/5d59f98f88c99304078bc1e5/html5/thumbnails/88.jpg)
Simulated Reads: 36nt
GSNAP BWA Bowtie SOAP2 SOAP MAQ
Variant: Exact
Mapping Software
Run
time
(s)
050
010
0015
0020
0025
0030
00
,
FU Berlin, GSNAP, Omics WS 2011 59
![Page 89: GSNAP: Fast and SNP-tolerant detection of complex variants ... · GSNAP: Fast and SNP-tolerant detection of complex variants and splic-ing in short reads by Thomas D. Wu and Serban](https://reader030.vdocument.in/reader030/viewer/2022041205/5d59f98f88c99304078bc1e5/html5/thumbnails/89.jpg)
Simulated Reads: 36nt
GSNAP BWA Bowtie SOAP2 SOAP MAQ
Variant: 1 mm
Mapping Software
Run
time
(s)
050
010
0015
0020
0025
0030
00
,
FU Berlin, GSNAP, Omics WS 2011 60
![Page 90: GSNAP: Fast and SNP-tolerant detection of complex variants ... · GSNAP: Fast and SNP-tolerant detection of complex variants and splic-ing in short reads by Thomas D. Wu and Serban](https://reader030.vdocument.in/reader030/viewer/2022041205/5d59f98f88c99304078bc1e5/html5/thumbnails/90.jpg)
Simulated Reads: 36nt
GSNAP BWA Bowtie SOAP2 SOAP MAQ
Variant: 2 mm
Mapping Software
Run
time
(s)
050
010
0015
0020
0025
0030
00
,
FU Berlin, GSNAP, Omics WS 2011 61
![Page 91: GSNAP: Fast and SNP-tolerant detection of complex variants ... · GSNAP: Fast and SNP-tolerant detection of complex variants and splic-ing in short reads by Thomas D. Wu and Serban](https://reader030.vdocument.in/reader030/viewer/2022041205/5d59f98f88c99304078bc1e5/html5/thumbnails/91.jpg)
Simulated Reads: 36nt
GSNAP BWA Bowtie SOAP2 SOAP MAQ
Variant: 3 mm
Mapping Software
Run
time
(s)
050
010
0015
0020
0025
0030
00
,
FU Berlin, GSNAP, Omics WS 2011 62
![Page 92: GSNAP: Fast and SNP-tolerant detection of complex variants ... · GSNAP: Fast and SNP-tolerant detection of complex variants and splic-ing in short reads by Thomas D. Wu and Serban](https://reader030.vdocument.in/reader030/viewer/2022041205/5d59f98f88c99304078bc1e5/html5/thumbnails/92.jpg)
Simulated Reads: 36nt
GSNAP BWA Bowtie SOAP2 SOAP MAQ
Variant: Ins (1...3nt)
Mapping Software
Run
time
(s)
050
010
0015
0020
0025
0030
00
,
FU Berlin, GSNAP, Omics WS 2011 63
![Page 93: GSNAP: Fast and SNP-tolerant detection of complex variants ... · GSNAP: Fast and SNP-tolerant detection of complex variants and splic-ing in short reads by Thomas D. Wu and Serban](https://reader030.vdocument.in/reader030/viewer/2022041205/5d59f98f88c99304078bc1e5/html5/thumbnails/93.jpg)
Simulated Reads: 36nt
GSNAP BWA Bowtie SOAP2 SOAP MAQ
Variant: Del (1...3nt)
Mapping Software
Run
time
(s)
050
010
0015
0020
0025
0030
00
,
FU Berlin, GSNAP, Omics WS 2011 64
![Page 94: GSNAP: Fast and SNP-tolerant detection of complex variants ... · GSNAP: Fast and SNP-tolerant detection of complex variants and splic-ing in short reads by Thomas D. Wu and Serban](https://reader030.vdocument.in/reader030/viewer/2022041205/5d59f98f88c99304078bc1e5/html5/thumbnails/94.jpg)
Simulated Reads: 36nt
GSNAP BWA Bowtie SOAP2 SOAP MAQ
Variant: Ins (4...9nt)
Mapping Software
Run
time
(s)
050
010
0015
0020
0025
0030
00
,
FU Berlin, GSNAP, Omics WS 2011 65
![Page 95: GSNAP: Fast and SNP-tolerant detection of complex variants ... · GSNAP: Fast and SNP-tolerant detection of complex variants and splic-ing in short reads by Thomas D. Wu and Serban](https://reader030.vdocument.in/reader030/viewer/2022041205/5d59f98f88c99304078bc1e5/html5/thumbnails/95.jpg)
Simulated Reads: 36nt
GSNAP BWA Bowtie SOAP2 SOAP MAQ
Variant: Del (4...30nt)
Mapping Software
Run
time
(s)
050
010
0015
0020
0025
0030
00
,
FU Berlin, GSNAP, Omics WS 2011 66
![Page 96: GSNAP: Fast and SNP-tolerant detection of complex variants ... · GSNAP: Fast and SNP-tolerant detection of complex variants and splic-ing in short reads by Thomas D. Wu and Serban](https://reader030.vdocument.in/reader030/viewer/2022041205/5d59f98f88c99304078bc1e5/html5/thumbnails/96.jpg)
Simulated Reads: 36nt
Exa
ct
1 m
m
2 m
m
3 m
m
Ins
(1...
3nt)
Del
(1.
..3nt
)
Ins
(4...
9nt)
Del
(4.
..30n
t)
Run
time
(s)
050
010
0015
0020
0025
0030
00
GSNAPBWABowtieSOAP2SOAPMAQ
,
FU Berlin, GSNAP, Omics WS 2011 67
![Page 97: GSNAP: Fast and SNP-tolerant detection of complex variants ... · GSNAP: Fast and SNP-tolerant detection of complex variants and splic-ing in short reads by Thomas D. Wu and Serban](https://reader030.vdocument.in/reader030/viewer/2022041205/5d59f98f88c99304078bc1e5/html5/thumbnails/97.jpg)
Simulated Reads: 70nt
Exa
ct
1 m
m
2 m
m
3 m
m
4 m
m
Ins
(1...
3nt)
Del
(1.
..3nt
)
Ins
(4...
9nt)
Del
(4.
..30n
t)
Run
time
(s)
050
010
0015
0020
0025
0030
00
GSNAPBWABowtieSOAP2SOAPMAQ
,
FU Berlin, GSNAP, Omics WS 2011 68
![Page 98: GSNAP: Fast and SNP-tolerant detection of complex variants ... · GSNAP: Fast and SNP-tolerant detection of complex variants and splic-ing in short reads by Thomas D. Wu and Serban](https://reader030.vdocument.in/reader030/viewer/2022041205/5d59f98f88c99304078bc1e5/html5/thumbnails/98.jpg)
Simulated Reads: 100nt
Exa
ct
1 m
m
2 m
m
3 m
m
4 m
m
5 m
m
Ins
(1...
3nt)
Del
(1.
..3nt
)
Ins
(4...
9nt)
Del
(4.
..30n
t)
Run
time
(s)
050
010
0015
0020
0025
0030
00
GSNAPBWABowtieSOAP2SOAPMAQ
,
FU Berlin, GSNAP, Omics WS 2011 69
![Page 99: GSNAP: Fast and SNP-tolerant detection of complex variants ... · GSNAP: Fast and SNP-tolerant detection of complex variants and splic-ing in short reads by Thomas D. Wu and Serban](https://reader030.vdocument.in/reader030/viewer/2022041205/5d59f98f88c99304078bc1e5/html5/thumbnails/99.jpg)
Simulated Reads: Percent misses (on uniquereads)
É Most algorithms were able to perfectly map unique reads, with thefollowing notable exceptions:
É GSNAP: 12% misses for 36nt reads with 3 mismatchesÉ SOAP: 15% misses for 36nt reads with 3 mismatches, around 5% for
1-3nt indelsÉ BWA: 1-5% misses in 1-3nt indels
,
FU Berlin, GSNAP, Omics WS 2011 70
![Page 100: GSNAP: Fast and SNP-tolerant detection of complex variants ... · GSNAP: Fast and SNP-tolerant detection of complex variants and splic-ing in short reads by Thomas D. Wu and Serban](https://reader030.vdocument.in/reader030/viewer/2022041205/5d59f98f88c99304078bc1e5/html5/thumbnails/100.jpg)
Simulated Reads: Percent misses (on uniquereads)
É Most algorithms were able to perfectly map unique reads, with thefollowing notable exceptions:
É GSNAP: 12% misses for 36nt reads with 3 mismatches
É SOAP: 15% misses for 36nt reads with 3 mismatches, around 5% for1-3nt indels
É BWA: 1-5% misses in 1-3nt indels
,
FU Berlin, GSNAP, Omics WS 2011 70
![Page 101: GSNAP: Fast and SNP-tolerant detection of complex variants ... · GSNAP: Fast and SNP-tolerant detection of complex variants and splic-ing in short reads by Thomas D. Wu and Serban](https://reader030.vdocument.in/reader030/viewer/2022041205/5d59f98f88c99304078bc1e5/html5/thumbnails/101.jpg)
Simulated Reads: Percent misses (on uniquereads)
É Most algorithms were able to perfectly map unique reads, with thefollowing notable exceptions:
É GSNAP: 12% misses for 36nt reads with 3 mismatchesÉ SOAP: 15% misses for 36nt reads with 3 mismatches, around 5% for
1-3nt indels
É BWA: 1-5% misses in 1-3nt indels
,
FU Berlin, GSNAP, Omics WS 2011 70
![Page 102: GSNAP: Fast and SNP-tolerant detection of complex variants ... · GSNAP: Fast and SNP-tolerant detection of complex variants and splic-ing in short reads by Thomas D. Wu and Serban](https://reader030.vdocument.in/reader030/viewer/2022041205/5d59f98f88c99304078bc1e5/html5/thumbnails/102.jpg)
Simulated Reads: Percent misses (on uniquereads)
É Most algorithms were able to perfectly map unique reads, with thefollowing notable exceptions:
É GSNAP: 12% misses for 36nt reads with 3 mismatchesÉ SOAP: 15% misses for 36nt reads with 3 mismatches, around 5% for
1-3nt indelsÉ BWA: 1-5% misses in 1-3nt indels
,
FU Berlin, GSNAP, Omics WS 2011 70
![Page 103: GSNAP: Fast and SNP-tolerant detection of complex variants ... · GSNAP: Fast and SNP-tolerant detection of complex variants and splic-ing in short reads by Thomas D. Wu and Serban](https://reader030.vdocument.in/reader030/viewer/2022041205/5d59f98f88c99304078bc1e5/html5/thumbnails/103.jpg)
Memory Requirements
É GSNAP: should have access to 5 GB of memory, otherwise it will runslowly
É BWA: 2.2 GBÉ Bowtie: 1.1 GB (exact matches) or 2.2 GB (allowing mismatches)É MAQ: 302 MBÉ SOAP: 14 GBÉ SOAP2: unknown ("only provided as a binary and did not have the
required compile time flag")
,
FU Berlin, GSNAP, Omics WS 2011 71
![Page 104: GSNAP: Fast and SNP-tolerant detection of complex variants ... · GSNAP: Fast and SNP-tolerant detection of complex variants and splic-ing in short reads by Thomas D. Wu and Serban](https://reader030.vdocument.in/reader030/viewer/2022041205/5d59f98f88c99304078bc1e5/html5/thumbnails/104.jpg)
Outline
IntroductionMotivationGSNAP FeaturesExamples of Complex Variant Detection
The AlgorithmSummaryPreprocessingMethod 1: Spanning Set Generation and FilteringMethod 2: Complete Set Generation and FilteringVerification of Candidate RegionsDetecting Insertions and DeletionsDetecting Splice Junctions
ResultsSimulated ReadsTranscriptional ReadsLimitations
Conclusions
,
FU Berlin, GSNAP, Omics WS 2011 72
![Page 105: GSNAP: Fast and SNP-tolerant detection of complex variants ... · GSNAP: Fast and SNP-tolerant detection of complex variants and splic-ing in short reads by Thomas D. Wu and Serban](https://reader030.vdocument.in/reader030/viewer/2022041205/5d59f98f88c99304078bc1e5/html5/thumbnails/105.jpg)
Transcriptional Reads
É Only looked at the effect of splicing, indels and SNP tolerance
É Including known splicing information → increased yield approx. 8%É Including SNP tolerance → minor increase in yield (0.5%) but effected
about 8% of alignments
,
FU Berlin, GSNAP, Omics WS 2011 73
![Page 106: GSNAP: Fast and SNP-tolerant detection of complex variants ... · GSNAP: Fast and SNP-tolerant detection of complex variants and splic-ing in short reads by Thomas D. Wu and Serban](https://reader030.vdocument.in/reader030/viewer/2022041205/5d59f98f88c99304078bc1e5/html5/thumbnails/106.jpg)
Transcriptional Reads
É Only looked at the effect of splicing, indels and SNP toleranceÉ Including known splicing information → increased yield approx. 8%
É Including SNP tolerance → minor increase in yield (0.5%) but effectedabout 8% of alignments
,
FU Berlin, GSNAP, Omics WS 2011 73
![Page 107: GSNAP: Fast and SNP-tolerant detection of complex variants ... · GSNAP: Fast and SNP-tolerant detection of complex variants and splic-ing in short reads by Thomas D. Wu and Serban](https://reader030.vdocument.in/reader030/viewer/2022041205/5d59f98f88c99304078bc1e5/html5/thumbnails/107.jpg)
Transcriptional Reads
É Only looked at the effect of splicing, indels and SNP toleranceÉ Including known splicing information → increased yield approx. 8%É Including SNP tolerance → minor increase in yield (0.5%) but effected
about 8% of alignments
,
FU Berlin, GSNAP, Omics WS 2011 73
![Page 108: GSNAP: Fast and SNP-tolerant detection of complex variants ... · GSNAP: Fast and SNP-tolerant detection of complex variants and splic-ing in short reads by Thomas D. Wu and Serban](https://reader030.vdocument.in/reader030/viewer/2022041205/5d59f98f88c99304078bc1e5/html5/thumbnails/108.jpg)
Outline
IntroductionMotivationGSNAP FeaturesExamples of Complex Variant Detection
The AlgorithmSummaryPreprocessingMethod 1: Spanning Set Generation and FilteringMethod 2: Complete Set Generation and FilteringVerification of Candidate RegionsDetecting Insertions and DeletionsDetecting Splice Junctions
ResultsSimulated ReadsTranscriptional ReadsLimitations
Conclusions
,
FU Berlin, GSNAP, Omics WS 2011 74
![Page 109: GSNAP: Fast and SNP-tolerant detection of complex variants ... · GSNAP: Fast and SNP-tolerant detection of complex variants and splic-ing in short reads by Thomas D. Wu and Serban](https://reader030.vdocument.in/reader030/viewer/2022041205/5d59f98f88c99304078bc1e5/html5/thumbnails/109.jpg)
Limitations
É Matching is only exhaustive if there is at least a 14nt contiguousmatch, otherwise it’s a heuristic
É Limited to one indel or splice site per readÉ Does not use read quality scoresÉ Does not work with ABI SOLiD data
,
FU Berlin, GSNAP, Omics WS 2011 75
![Page 110: GSNAP: Fast and SNP-tolerant detection of complex variants ... · GSNAP: Fast and SNP-tolerant detection of complex variants and splic-ing in short reads by Thomas D. Wu and Serban](https://reader030.vdocument.in/reader030/viewer/2022041205/5d59f98f88c99304078bc1e5/html5/thumbnails/110.jpg)
Limitations
É Matching is only exhaustive if there is at least a 14nt contiguousmatch, otherwise it’s a heuristic
É Limited to one indel or splice site per read
É Does not use read quality scoresÉ Does not work with ABI SOLiD data
,
FU Berlin, GSNAP, Omics WS 2011 75
![Page 111: GSNAP: Fast and SNP-tolerant detection of complex variants ... · GSNAP: Fast and SNP-tolerant detection of complex variants and splic-ing in short reads by Thomas D. Wu and Serban](https://reader030.vdocument.in/reader030/viewer/2022041205/5d59f98f88c99304078bc1e5/html5/thumbnails/111.jpg)
Limitations
É Matching is only exhaustive if there is at least a 14nt contiguousmatch, otherwise it’s a heuristic
É Limited to one indel or splice site per readÉ Does not use read quality scores
É Does not work with ABI SOLiD data
,
FU Berlin, GSNAP, Omics WS 2011 75
![Page 112: GSNAP: Fast and SNP-tolerant detection of complex variants ... · GSNAP: Fast and SNP-tolerant detection of complex variants and splic-ing in short reads by Thomas D. Wu and Serban](https://reader030.vdocument.in/reader030/viewer/2022041205/5d59f98f88c99304078bc1e5/html5/thumbnails/112.jpg)
Limitations
É Matching is only exhaustive if there is at least a 14nt contiguousmatch, otherwise it’s a heuristic
É Limited to one indel or splice site per readÉ Does not use read quality scoresÉ Does not work with ABI SOLiD data
,
FU Berlin, GSNAP, Omics WS 2011 75
![Page 113: GSNAP: Fast and SNP-tolerant detection of complex variants ... · GSNAP: Fast and SNP-tolerant detection of complex variants and splic-ing in short reads by Thomas D. Wu and Serban](https://reader030.vdocument.in/reader030/viewer/2022041205/5d59f98f88c99304078bc1e5/html5/thumbnails/113.jpg)
TL;DR
É Comparable to other fast read alignment algorithms in terms ofspeed, but can handle more complex variants and splicing
,
FU Berlin, GSNAP, Omics WS 2011 76
![Page 114: GSNAP: Fast and SNP-tolerant detection of complex variants ... · GSNAP: Fast and SNP-tolerant detection of complex variants and splic-ing in short reads by Thomas D. Wu and Serban](https://reader030.vdocument.in/reader030/viewer/2022041205/5d59f98f88c99304078bc1e5/html5/thumbnails/114.jpg)
For Further Reading I
Knuth D.E.The Art of Computer Programming: Sorting and Searching. Vol 3.Addison-Wesley, 1973
Thomas D. Wu and Serban NacuFast and SNP-tolerant detection of complex variants and splicing inshort readsBioinformatics, 2010 Apr 1;26(7):873-81.
Yeo G and Burge CB.Maximum entropy modeling of short sequence motifs withapplications to RNA splicing signalsJ Comput Biol. 2004;11(2-3):377-94.
,
FU Berlin, GSNAP, Omics WS 2011 77
![Page 115: GSNAP: Fast and SNP-tolerant detection of complex variants ... · GSNAP: Fast and SNP-tolerant detection of complex variants and splic-ing in short reads by Thomas D. Wu and Serban](https://reader030.vdocument.in/reader030/viewer/2022041205/5d59f98f88c99304078bc1e5/html5/thumbnails/115.jpg)
Aligning Paired-End Reads
É Optional
,
FU Berlin, GSNAP, Omics WS 2011 78
![Page 116: GSNAP: Fast and SNP-tolerant detection of complex variants ... · GSNAP: Fast and SNP-tolerant detection of complex variants and splic-ing in short reads by Thomas D. Wu and Serban](https://reader030.vdocument.in/reader030/viewer/2022041205/5d59f98f88c99304078bc1e5/html5/thumbnails/116.jpg)
Bisulfite-converted DNA
É Optional
,
FU Berlin, GSNAP, Omics WS 2011 79