![Page 1: Fuzzypath – Algorithms, Applications and Future Developments Zemin Ning Sequence Assembly and Analysis](https://reader035.vdocument.in/reader035/viewer/2022062309/56649ed15503460f94be08ae/html5/thumbnails/1.jpg)
Fuzzypath – Algorithms, Applications and Future Developments
Zemin Ning
Sequence Assembly and Analysis
![Page 2: Fuzzypath – Algorithms, Applications and Future Developments Zemin Ning Sequence Assembly and Analysis](https://reader035.vdocument.in/reader035/viewer/2022062309/56649ed15503460f94be08ae/html5/thumbnails/2.jpg)
Outline of the Talk:
- Sequence Reconstruction and Euler Path
- Assembly strategy
- Sequence extension using read pairs, base qualities, fuzzy kmers or longer reads
- Repeat junctions
- Installation, data process and running
- Gap5 - visual inspection for mis-assembly errors
- Integration into the Phusion pipeline
![Page 3: Fuzzypath – Algorithms, Applications and Future Developments Zemin Ning Sequence Assembly and Analysis](https://reader035.vdocument.in/reader035/viewer/2022062309/56649ed15503460f94be08ae/html5/thumbnails/3.jpg)
Sequence Repeat Graph
[Figure labels: Sequences; Repeat (x3)]
![Page 4: Fuzzypath – Algorithms, Applications and Future Developments Zemin Ning Sequence Assembly and Analysis](https://reader035.vdocument.in/reader035/viewer/2022062309/56649ed15503460f94be08ae/html5/thumbnails/4.jpg)
Sequence Reconstruction - Hamiltonian path approach
S = (ATGCAGGTCC)
ATG -> TGC -> GCA -> CAG -> AGG -> GGT -> GTC -> TCC
ATG AGG TGC TCC GTC GGT GCA CAG
Vertices: k-tuples from the spectrum shown in red (8);
Edges: overlapping k-tuples (7);
Path: visiting all vertices corresponding to the sequence.
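The Hamiltonian path view can be sketched in a few lines of Python. This is only an illustration of the idea on the slide, not Fuzzypath code: the function name `hamiltonian_reconstruct` is made up here, and the brute-force backtracking search is only practical for toy inputs like this one.

```python
def hamiltonian_reconstruct(kmers):
    """Find an ordering that visits every k-mer exactly once, where each
    consecutive pair overlaps by k-1 bases, and spell out the sequence."""
    kmers = list(kmers)
    n = len(kmers)

    def extend(path, used):
        if len(path) == n:
            return path
        for i, km in enumerate(kmers):
            # An edge exists when the suffix of the last k-mer equals
            # the prefix of the candidate k-mer.
            if not used[i] and path[-1][1:] == km[:-1]:
                used[i] = True
                result = extend(path + [km], used)
                if result:
                    return result
                used[i] = False
        return None

    for i, start in enumerate(kmers):
        used = [False] * n
        used[i] = True
        path = extend([start], used)
        if path:
            # First k-mer in full, then the last base of each later k-mer.
            return path[0] + "".join(km[-1] for km in path[1:])
    return None


# The unordered spectrum from the slide (k = 3).
spectrum = ["ATG", "AGG", "TGC", "TCC", "GTC", "GGT", "GCA", "CAG"]
print(hamiltonian_reconstruct(spectrum))
```

For this spectrum every k-mer has at most one successor, so the path is unique and spells the slide's sequence S.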
![Page 5: Fuzzypath – Algorithms, Applications and Future Developments Zemin Ning Sequence Assembly and Analysis](https://reader035.vdocument.in/reader035/viewer/2022062309/56649ed15503460f94be08ae/html5/thumbnails/5.jpg)
Sequence Reconstruction - Euler path approach
Vertices: correspond to (k-1)-tuples (7);
Edges: correspond to k-tuples from the spectrum (8);
Path: visiting all EDGES corresponding to the sequence.
Graph vertices: AT, GT, CG, CA, GC, TG, GG
ATGCGTGGCA ATGGCGTGCA
ATG -> TGG -> GGC -> GCG -> CGT -> GTG -> TGC -> GCA
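A small Python sketch of the Euler path view follows, assuming the spectrum of eight 3-tuples shown on this slide. The function name `eulerian_sequences` is illustrative only, and exhaustive backtracking is used here for clarity; real assemblers use linear-time Eulerian path algorithms instead.

```python
from collections import Counter, defaultdict


def eulerian_sequences(kmers):
    """Enumerate every sequence spelled by a path that uses each k-mer
    (edge) exactly once in the graph whose vertices are (k-1)-tuples."""
    edges = Counter((km[:-1], km[1:]) for km in kmers)
    succ = defaultdict(set)
    out_deg, in_deg = Counter(), Counter()
    for (u, v), count in edges.items():
        succ[u].add(v)
        out_deg[u] += count
        in_deg[v] += count

    nodes = set(out_deg) | set(in_deg)
    # An open Euler path starts where out-degree exceeds in-degree by one;
    # fall back to every vertex if the graph is balanced (a circuit).
    starts = [v for v in sorted(nodes) if out_deg[v] - in_deg[v] == 1]
    starts = starts or sorted(nodes)

    results = set()

    def walk(node, remaining, spelled):
        if remaining == 0:
            results.add(spelled)
            return
        for nxt in sorted(succ.get(node, ())):
            if edges[(node, nxt)] > 0:
                edges[(node, nxt)] -= 1
                walk(nxt, remaining - 1, spelled + nxt[-1])
                edges[(node, nxt)] += 1

    for start in starts:
        walk(start, len(kmers), start)
    return sorted(results)


spectrum = ["ATG", "TGG", "GGC", "GCG", "CGT", "GTG", "TGC", "GCA"]
print(eulerian_sequences(spectrum))
```

The graph has two vertices with two outgoing edges each (TG and GC), which is why the same spectrum admits two reconstructions, the two sequences listed on the slide.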
![Page 6: Fuzzypath – Algorithms, Applications and Future Developments Zemin Ning Sequence Assembly and Analysis](https://reader035.vdocument.in/reader035/viewer/2022062309/56649ed15503460f94be08ae/html5/thumbnails/6.jpg)
Assembly Strategy
Solexa read assembler to extend short reads to 1-2 kb long reads
Genome/Chromosome
Capillary reads assembler: Phrap/Phusion
Forward-reverse paired reads: 30-75 bp each, known distance ~500 bp
![Page 7: Fuzzypath – Algorithms, Applications and Future Developments Zemin Ning Sequence Assembly and Analysis](https://reader035.vdocument.in/reader035/viewer/2022062309/56649ed15503460f94be08ae/html5/thumbnails/7.jpg)
Kmer Extension & Walk
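The slide itself is a figure, so the following is only a guess at the general idea of a kmer walk, not Fuzzypath's actual procedure: starting from a seed kmer, extend one base at a time while exactly one next kmer is present in the reads, and stop at a dead end or at a branch (a possible repeat junction). The names `count_kmers` and `extend_right`, the `min_count` threshold, and the toy reads are all assumptions made for illustration.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every k-mer occurring in the reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def extend_right(seed, kmer_counts, min_count=1, max_len=10_000):
    """Walk to the right from `seed` (length k >= 2) while the next
    k-mer is unambiguous; stop at a branch, a dead end, or a cycle."""
    k = len(seed)
    contig = seed
    seen = {seed}
    while len(contig) < max_len:
        tail = contig[-(k - 1):]
        candidates = [tail + base for base in "ACGT"
                      if kmer_counts.get(tail + base, 0) >= min_count]
        if len(candidates) != 1:
            break  # dead end (0) or branch (2+), e.g. a repeat junction
        nxt = candidates[0]
        if nxt in seen:
            break  # avoid looping forever inside a repeat
        seen.add(nxt)
        contig += nxt[-1]
    return contig


# Made-up toy reads covering ATGCAGGTCC.
reads = ["ATGCAGG", "GCAGGTC", "AGGTCC"]
print(extend_right("ATG", count_kmers(reads, 3)))

# One extra read creates a branch after ...CAG (AGG vs AGA).
print(extend_right("ATG", count_kmers(reads + ["CAGA"], 3)))
```

Stopping at a branch rather than guessing is what leaves room for the extra evidence discussed in the talk (base quality, read pairs, longer reads) to decide how a junction is crossed.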
![Page 8: Fuzzypath – Algorithms, Applications and Future Developments Zemin Ning Sequence Assembly and Analysis](https://reader035.vdocument.in/reader035/viewer/2022062309/56649ed15503460f94be08ae/html5/thumbnails/8.jpg)
Base Quality to Filter Base Errors
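One plausible way to use base qualities, sketched here under stated assumptions rather than taken from Fuzzypath, is to count a kmer only when every base in it meets a minimum quality. The function name `count_confident_kmers`, the threshold of 20, and the Phred+33 encoding are assumptions; Solexa data of that era often used a different quality offset.

```python
from collections import Counter


def count_confident_kmers(reads, k, min_qual=20, offset=33):
    """Count k-mers whose bases all have quality >= min_qual.

    `reads` is an iterable of (sequence, quality_string) pairs, with
    qualities encoded as ASCII characters (Phred + offset)."""
    counts = Counter()
    for seq, qual in reads:
        scores = [ord(ch) - offset for ch in qual]
        for i in range(len(seq) - k + 1):
            if min(scores[i:i + k]) >= min_qual:
                counts[seq[i:i + k]] += 1
    return counts


# Made-up reads: the second has one low-quality base ('#' = Q2).
reads = [("ATGCA", "IIIII"), ("ATGTA", "III#I")]
print(count_confident_kmers(reads, 3))
```

Here the two kmers that overlap the low-quality base (TGT and GTA) are dropped, so a likely sequencing error does not create a false branch in the kmer walk.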
![Page 9: Fuzzypath – Algorithms, Applications and Future Developments Zemin Ning Sequence Assembly and Analysis](https://reader035.vdocument.in/reader035/viewer/2022062309/56649ed15503460f94be08ae/html5/thumbnails/9.jpg)
Read Pairs in Repeat Junctions
![Page 10: Fuzzypath – Algorithms, Applications and Future Developments Zemin Ning Sequence Assembly and Analysis](https://reader035.vdocument.in/reader035/viewer/2022062309/56649ed15503460f94be08ae/html5/thumbnails/10.jpg)
Means to handle repeats:
- Base quality
- Read pair
- Fuzzy kmers
- Closely related reference
- 454 or Sanger reads
Kmer Extension & Repeat Junctions
Pileup of other reads like 454, Sanger etc at a repeat junction
Consensus
![Page 11: Fuzzypath – Algorithms, Applications and Future Developments Zemin Ning Sequence Assembly and Analysis](https://reader035.vdocument.in/reader035/viewer/2022062309/56649ed15503460f94be08ae/html5/thumbnails/11.jpg)
Handling of Repeat Junctions
![Page 12: Fuzzypath – Algorithms, Applications and Future Developments Zemin Ning Sequence Assembly and Analysis](https://reader035.vdocument.in/reader035/viewer/2022062309/56649ed15503460f94be08ae/html5/thumbnails/12.jpg)
Handling of Single Base Variations
![Page 13: Fuzzypath – Algorithms, Applications and Future Developments Zemin Ning Sequence Assembly and Analysis](https://reader035.vdocument.in/reader035/viewer/2022062309/56649ed15503460f94be08ae/html5/thumbnails/13.jpg)
Fuzzypath Pipeline
![Page 14: Fuzzypath – Algorithms, Applications and Future Developments Zemin Ning Sequence Assembly and Analysis](https://reader035.vdocument.in/reader035/viewer/2022062309/56649ed15503460f94be08ae/html5/thumbnails/14.jpg)
Fuzzypath Read File
![Page 15: Fuzzypath – Algorithms, Applications and Future Developments Zemin Ning Sequence Assembly and Analysis](https://reader035.vdocument.in/reader035/viewer/2022062309/56649ed15503460f94be08ae/html5/thumbnails/15.jpg)
Fuzzypath Fastq File
![Page 16: Fuzzypath – Algorithms, Applications and Future Developments Zemin Ning Sequence Assembly and Analysis](https://reader035.vdocument.in/reader035/viewer/2022062309/56649ed15503460f94be08ae/html5/thumbnails/16.jpg)
Salmonella senftenberg Solexa Assembly from Pair-End Reads
Solexa reads:
- Number of reads: 6,000,000
- Finished genome size: ~4.8 Mbp
- Read length: 2x37 bp
- Estimated read coverage: ~92.5X
- Insert size: 170/50-300 bp
Assembly features - contig stats:

| | Solexa | 454 |
|---|---|---|
| Total number of contigs | 75 | 390 |
| Total bases of contigs | 4.80 Mbp | 4.77 Mbp |
| N50 contig size | 139,353 | 25,702 |
| Largest contig | 395,600 | 62,040 |
| Average contig size | 63,969 | 12,224 |
| Contig coverage on genome | ~99.8% | 99.4% |
| Contig extension errors | 0 | |
| Mis-assembly errors | 0 | 4 |
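The coverage figure on this slide can be checked with simple arithmetic. The sketch below assumes that the 6,000,000 figure counts read pairs (two 37 bp reads each), since that is the reading that reproduces the stated ~92.5X; the function name `read_coverage` is illustrative.

```python
def read_coverage(n_pairs, read_len, genome_size):
    """Average depth = total sequenced bases / genome size,
    with two reads of `read_len` bases per pair."""
    return n_pairs * 2 * read_len / genome_size


# Values from the slide: 6,000,000 pairs (assumed), 2x37 bp, ~4.8 Mbp genome.
coverage = read_coverage(6_000_000, 37, 4_800_000)
print(f"{coverage:.1f}X")
```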
![Page 17: Fuzzypath – Algorithms, Applications and Future Developments Zemin Ning Sequence Assembly and Analysis](https://reader035.vdocument.in/reader035/viewer/2022062309/56649ed15503460f94be08ae/html5/thumbnails/17.jpg)
[Figure labels: maq, ssaha2]
![Page 18: Fuzzypath – Algorithms, Applications and Future Developments Zemin Ning Sequence Assembly and Analysis](https://reader035.vdocument.in/reader035/viewer/2022062309/56649ed15503460f94be08ae/html5/thumbnails/18.jpg)
[Figure labels: maq, ssaha2]
![Page 19: Fuzzypath – Algorithms, Applications and Future Developments Zemin Ning Sequence Assembly and Analysis](https://reader035.vdocument.in/reader035/viewer/2022062309/56649ed15503460f94be08ae/html5/thumbnails/19.jpg)
![Page 20: Fuzzypath – Algorithms, Applications and Future Developments Zemin Ning Sequence Assembly and Analysis](https://reader035.vdocument.in/reader035/viewer/2022062309/56649ed15503460f94be08ae/html5/thumbnails/20.jpg)
[Figure labels: maq, ssaha2]
![Page 21: Fuzzypath – Algorithms, Applications and Future Developments Zemin Ning Sequence Assembly and Analysis](https://reader035.vdocument.in/reader035/viewer/2022062309/56649ed15503460f94be08ae/html5/thumbnails/21.jpg)
[Figure labels: maq, ssaha2]
![Page 22: Fuzzypath – Algorithms, Applications and Future Developments Zemin Ning Sequence Assembly and Analysis](https://reader035.vdocument.in/reader035/viewer/2022062309/56649ed15503460f94be08ae/html5/thumbnails/22.jpg)
New Phusion Assembler
[Pipeline diagram labels: Solexa Reads; Assembly; Reads Group; Data Process; Long Insert Reads; Supercontig; Contigs; PRono; Fuzzypath; Phrap; Velvet; 2x75 or 2x100]
![Page 23: Fuzzypath – Algorithms, Applications and Future Developments Zemin Ning Sequence Assembly and Analysis](https://reader035.vdocument.in/reader035/viewer/2022062309/56649ed15503460f94be08ae/html5/thumbnails/23.jpg)
Human Assembly – COLO-829 Normal Cell
Solexa reads:
- Number of reads: 557 Million
- Finished genome size: 3.0 Gb
- Read length: 2x75 bp
- Estimated read coverage: ~25X
- Insert size: 190/50-300 bp
- Number of reads clustered: 458 Million
Assembly features - contig stats:
- Total number of contigs: 1,040,582
- Total bases of contigs: 2.703 Gb
- N50 contig size: 6,484
- Largest contig: 85,595
- Average contig size: 2,597
- Contig coverage over the genome: ~90%
- Mis-assembly errors: ?
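For readers unfamiliar with the N50 statistic quoted in the contig stats, here is a minimal sketch using the usual definition: the length of the contig at which the running total, taken from the longest contig downwards, first reaches half of all assembled bases. The contig lengths in the example are made up and are not from this assembly.

```python
def n50(lengths):
    """Return the N50 of a list of contig lengths (0 for an empty list)."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0


# Hypothetical contig lengths, for illustration only.
print(n50([100, 80, 60, 40, 20]))
```

Unlike the average contig size, N50 is weighted towards long contigs, which is why the two values reported for the same assembly can differ widely.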
![Page 24: Fuzzypath – Algorithms, Applications and Future Developments Zemin Ning Sequence Assembly and Analysis](https://reader035.vdocument.in/reader035/viewer/2022062309/56649ed15503460f94be08ae/html5/thumbnails/24.jpg)
Acknowledgements:
- Yong Gu
- James Bonfield
- Heng Li
- Hannes Ponstingl
- Daniel Zerbino (EBI)
- Helen Beasley
- Siobhan Whitehead
- Tony Cox