hard assembly
Jan Pačes
Institute of Molecular Genetics AS CR
problems
genomes: high GC content; repetitions (short, with low informational content, or long); polymorphic; "unreadable" sequences; "weird" structures
technologies: nonrandom libraries; wrong sizes; erroneous or chimeric reads
sequencing technologies
ABI (Sanger)
454 (pyrosequencing)
Solexa (reversible terminator)
SOLiD (two-base ligation)
PacBio (SMRT)
example of errors in one technology
http://chevreux.org/mira_ex_454sanger.html
Aird et al. Genome Biology 2011
high GC regions are underrepresented
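To check which parts of a reference or draft assembly are likely to be affected, GC content can be computed in sliding windows. The following is a minimal sketch, not the method used by Aird et al.; the function names, the window and step sizes, and the 0.65 threshold are illustrative assumptions.

```python
def gc_fraction(seq):
    """Return the fraction of G and C among unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    return gc / total if total else 0.0


def windowed_gc(seq, window=100, step=50):
    """Yield (start, gc_fraction) for sliding windows along seq.

    A sequence shorter than the window is reported as a single window.
    """
    last_start = max(len(seq) - window, 0)
    for start in range(0, last_start + 1, step):
        yield start, gc_fraction(seq[start:start + window])


def high_gc_windows(seq, window=100, step=50, threshold=0.65):
    """Return start positions of windows with GC fraction >= threshold.

    The threshold is an arbitrary illustrative value, not one from the slides.
    """
    return [start for start, gc in windowed_gc(seq, window, step) if gc >= threshold]
```

Comparing read depth in the flagged windows against the genome-wide average is one way to see whether high GC regions are in fact underrepresented in a given library.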
Aird et al. Genome Biology 2011
protocol optimization for high GC content
repetitions
scaffold
repetition
repetitions
repetitions recognition
MIRA http://sourceforge.net/projects/mira-assembler/
MaSuRCA http://www.genome.umd.edu/masurca.html
SPAdes http://bioinf.spbau.ru/spades
RepeatMasker http://www.repeatmasker.org/
RepeatModeler (RECON and RepeatScout) http://www.repeatmasker.org/RepeatModeler.html
position-aware assemblers
k-mer distribution
k-mer analysis
Jellyfish: fast, parallel k-mer counting for DNA http://www.cbcb.umd.edu/software/jellyfish/
Quake: a package to correct substitution sequencing errors in experiments with deep coverage http://www.cbcb.umd.edu/software/quake/
khmer: trim off likely erroneous k-mers https://khmer-protocols.readthedocs.org/en/v0.8.2/
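The idea behind a k-mer distribution can be illustrated in a few lines of Python. This is a toy sketch, not how Jellyfish, Quake, or khmer are implemented: it counts k-mers in memory, does not merge a k-mer with its reverse complement, and the function names and the `low`/`high` cutoffs are assumptions chosen for illustration.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every k-mer in the reads, skipping k-mers that contain N."""
    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if "N" not in kmer:
                counts[kmer] += 1
    return counts


def abundance_histogram(kmer_counts):
    """Map each multiplicity to the number of distinct k-mers seen that often."""
    return Counter(kmer_counts.values())


def split_by_abundance(kmer_counts, low, high):
    """Partition k-mers into likely errors, likely repeats, and the rest.

    k-mers seen fewer than `low` times are treated as likely sequencing errors,
    k-mers seen more than `high` times as likely repeats. Both cutoffs are
    hypothetical and would normally be read off the histogram.
    """
    errors = {k for k, c in kmer_counts.items() if c < low}
    repeats = {k for k, c in kmer_counts.items() if c > high}
    rest = set(kmer_counts) - errors - repeats
    return errors, repeats, rest
```

In a real data set the histogram typically shows a peak of rare k-mers caused by errors, a main peak near the sequencing depth, and a tail of high-abundance k-mers from repeats; the cutoffs are placed between these features.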
repetitions
scaffold
repetition
filling gaps
GapCloser (part of SOAPdenovo) http://soap.genomics.org.cn/soapdenovo.html
GapFiller (part of SSPACE) http://www.baseclear.com/lab-products/bioinformatics-tools/gapfiller/
GapFiller http://sourceforge.net/projects/gapfiller/
454 multiplicates (artificially duplicated reads)
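One simple way to spot such multiplicates is to group reads that begin with the same bases. The sketch below assumes, for illustration only, that reads sharing an identical leading prefix are copies of the same fragment; the function names and the `prefix_len` default are hypothetical, and a real filter would also have to tolerate sequencing errors inside the prefix.

```python
from collections import defaultdict


def group_multiplicates(reads, prefix_len=20):
    """Group read names by identical leading prefix.

    `reads` is an iterable of (name, sequence) pairs. Only groups with more
    than one member are returned.
    """
    groups = defaultdict(list)
    for name, seq in reads:
        groups[seq[:prefix_len].upper()].append(name)
    return [names for names in groups.values() if len(names) > 1]


def deduplicate(reads, prefix_len=20):
    """Keep the longest read from each group of reads sharing a prefix."""
    best = {}
    for name, seq in reads:
        key = seq[:prefix_len].upper()
        if key not in best or len(seq) > len(best[key][1]):
            best[key] = (name, seq)
    return list(best.values())
```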
contig coverage by large libraries
Illumina PE and mate-pair libraries
highly polymorphic genomes
scaffold
two copies of polymorphic contigs
polymorphic assembly workflow
normal assembly
condensing alternative contigs
mapping to identify SNPs
"repair" reads
second "polymorphic" assembly
http://www.fishbrowser.org/software/L_RNA_scaffolder
G-quadruplex
AGCGACCCCCCCCCACCACCGCCACCACCACCTCTGCCATTGGCCGCCGCCGCCCCCCCCCCATTAAACCCCCCCACCCCCCCCCGCGCTGCCCCCTCCCCGGTGG
Chicken p53 – coverage from RNAseq data
Coverage > 13,000X
CCCGCCCACCCCCACCCCCACCCGCACCCCCCACTCTCCCACCCCCACCCCCTTTTCTCCCACCCCCTCTTCTCCCACCCCCTTTTCCCCCCCTTCCTCCCCCCACTCCGCCCCCCCCCCGCCCCCTCCCCCCCCCCAGGTGAGGACCCT
Chicken erythropoietin (EPO) – coverage from RNAseq data
Coverage > 500X from RNAseq
(*EPO locus not completed even from 1000X coverage genomic Illumina data!)
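C-rich stretches like the two above are G-rich on the opposite strand, so a scan for potential G-quadruplex motifs has to look at both strands. The sketch below assumes the commonly used pattern of four runs of at least three G separated by loops of 1 to 7 bases; the slides do not specify a pattern, and the function names are illustrative. A match marks only a candidate motif, not a confirmed structure.

```python
import re

# Four G-runs (>= 3 G each) separated by loops of 1-7 bases.
# This pattern is an assumption; other definitions exist.
G4_PATTERN = re.compile(r"(?:G{3,}[ACGT]{1,7}){3}G{3,}")

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_g4_motifs(seq):
    """Return (strand, start, end) for candidate G4 motifs on both strands.

    Coordinates are 0-based, end-exclusive, and always refer to the input
    sequence, including for hits found on the reverse strand.
    """
    seq = seq.upper()
    hits = [("+", m.start(), m.end()) for m in G4_PATTERN.finditer(seq)]
    n = len(seq)
    for m in G4_PATTERN.finditer(reverse_complement(seq)):
        hits.append(("-", n - m.end(), n - m.start()))
    return hits
```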
chicken missing genes
that’s it, thank you
many thanks also to:
Daniel Elleder, Tomáš Hron, Michal Kolář, Hynek Strnad