mcb3895-004 lecture #15 oct 23/14 de novo assemblies using pacbio

Post on 14-Dec-2015

MCB3895-004 Lecture #15, Oct 23/14

De novo assemblies using PacBio

PacBio

• Long read sequencing technology

• High error rate (~13%) threw people off at first
  • What would this be good for?

• Scaffolding was an early focus

• Reads could also be corrected using Illumina data
  • (now obsolete)
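Why a ~13% error rate need not be fatal: if errors are random, a majority vote across many overlapping reads is rarely wrong. The back-of-envelope sketch below is a simplified model, not how any PacBio tool computes accuracy. It assumes each read is independently wrong at a given base with probability p and ignores indels (which dominate real PacBio errors); `majority_error` is a hypothetical name.

```python
from math import comb


def majority_error(p: float, n: int) -> float:
    """Upper bound on the chance that a majority vote over n reads calls a
    base wrongly, if each read is independently wrong with probability p.

    The vote can only fail when at least half of the reads are wrong
    (a tie is counted as a failure), so we sum the binomial tail from there.
    It is an upper bound because wrong reads may also disagree with each other.
    """
    k_min = (n + 1) // 2  # odd n: strict majority wrong; even n: a tie
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(k_min, n + 1))


if __name__ == "__main__":
    for depth in (1, 5, 15, 31):
        print(f"{depth:>3}X  P(vote wrong) <= {majority_error(0.13, depth):.2e}")
```

Under these assumptions the bound drops quickly with depth, which is the intuition behind correcting long reads with other reads rather than discarding them.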

HGAP

• "Hierarchical Genome Assembly Process"

1. Preassembly: corrects the longest reads by mapping shorter reads to them, then quality-trims

2. Assembly: overlap-layout-consensus (OLC) approach

3. Polishing: Quiver derives a consensus from reads mapped to the assembly and uses it to correct the assembly
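Preassembly needs a way to decide which reads count as "longest". One plausible rule, sketched below as an assumption rather than HGAP's actual code, is to pick a length cutoff so that the reads above it alone reach a target seed coverage of the genome. Both function names and the 30X target in the usage note are hypothetical.

```python
from typing import List, Sequence, Tuple


def seed_length_cutoff(read_lengths: Sequence[int], genome_size: int,
                       seed_coverage: float) -> int:
    """Return the largest read length L such that all reads of length >= L
    together contain at least seed_coverage * genome_size bases.
    If even all reads fall short of that target, return the shortest length
    (i.e. every read becomes a seed)."""
    if not read_lengths:
        raise ValueError("no reads given")
    target = seed_coverage * genome_size
    total = 0
    for length in sorted(read_lengths, reverse=True):
        total += length
        if total >= target:
            return length
    return min(read_lengths)


def split_seeds(read_lengths: Sequence[int],
                cutoff: int) -> Tuple[List[int], List[int]]:
    """Seeds (>= cutoff) get corrected; the shorter reads are mapped onto them."""
    seeds = [n for n in read_lengths if n >= cutoff]
    others = [n for n in read_lengths if n < cutoff]
    return seeds, others
```

For an E. coli genome of roughly 4.6 Mb, a call like `seed_length_cutoff(lengths, 4_600_000, 30)` would keep just enough of the longest reads for ~30X of seeds, leaving the rest as the "shorter reads" used for correction.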

Results

• My test gave an impressive 1 contig!
  • High (~60X) coverage, tame dataset

• Known problem: still some SNP errors
  • Can run Quiver again:
    1. Import the assembly as a reference sequence
    2. Perform reference mapping using the same reads vs. the new reference
    3. Quiver will output a new consensus FASTA file incorporating the variants it finds
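The "run Quiver again" loop (map reads to the current assembly, call a consensus, repeat) can be sketched with a toy stand-in. This is not Quiver: Quiver uses a statistical model of PacBio errors and handles indels, while the hypothetical functions below use a brute-force gapless mapper and a per-column majority vote, so they only fix substitution (SNP-like) errors.

```python
from collections import Counter
from typing import List, Tuple


def hamming_map(ref: str, read: str) -> int:
    """Toy mapper: offset where the gapless `read` has the fewest mismatches.
    Real PacBio mappers (e.g. BLASR) allow insertions and deletions."""
    best_offset, best_mm = 0, len(read) + 1
    for off in range(len(ref) - len(read) + 1):
        mm = sum(a != b for a, b in zip(ref[off:off + len(read)], read))
        if mm < best_mm:
            best_offset, best_mm = off, mm
    return best_offset


def polish_once(ref: str, reads: List[str]) -> str:
    """One round: map every read to `ref`, then take each column's majority base."""
    columns = [Counter() for _ in ref]
    for read in reads:
        if len(read) > len(ref):
            continue  # the toy mapper cannot place reads longer than the reference
        off = hamming_map(ref, read)
        for i, base in enumerate(read):
            columns[off + i][base] += 1
    # Uncovered columns keep the draft base.
    return "".join(col.most_common(1)[0][0] if col else ref[i]
                   for i, col in enumerate(columns))


def polish_until_stable(ref: str, reads: List[str],
                        max_rounds: int = 5) -> Tuple[str, int]:
    """Re-map and re-call until the consensus stops changing (or max_rounds)."""
    for rounds in range(1, max_rounds + 1):
        new_ref = polish_once(ref, reads)
        if new_ref == ref:
            return ref, rounds
        ref = new_ref
    return ref, max_rounds
```

The stopping rule (no change between rounds) is one reasonable choice for deciding how many extra polishing passes are worthwhile.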

PacBio chemistries

• PacBio has continually updated both its polymerases and detection chemistry

• Current test data uses P4-C2 chemistry

• P5-C3 gave slightly longer reads, maybe with a bit more error

• FASTQ available for this E. coli dataset: SRR1284073

• Brand new: P6-C4
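Comparing chemistries mostly comes down to read-length statistics. A minimal sketch, assuming an unwrapped 4-line-per-record FASTQ (for example, the SRR1284073 reads saved as `SRR1284073.fastq`, one common way being the SRA Toolkit's `fastq-dump`). The function names are hypothetical.

```python
import io
from typing import Dict, Iterable, Iterator, TextIO


def fastq_lengths(handle: TextIO) -> Iterator[int]:
    """Yield read lengths from a FASTQ with exactly 4 lines per record."""
    while True:
        header = handle.readline()
        if not header:          # end of file
            return
        if not header.strip():  # tolerate blank lines between records
            continue
        if not header.startswith("@"):
            raise ValueError(f"malformed FASTQ header: {header!r}")
        seq = handle.readline().rstrip("\r\n")
        handle.readline()  # '+' separator line
        handle.readline()  # quality line
        yield len(seq)


def length_summary(lengths: Iterable[int]) -> Dict[str, float]:
    """Read count, total bases, mean and longest read length."""
    values = list(lengths)
    if not values:
        return {"reads": 0, "bases": 0, "mean": 0.0, "longest": 0}
    return {"reads": len(values), "bases": sum(values),
            "mean": sum(values) / len(values), "longest": max(values)}


if __name__ == "__main__":
    demo = io.StringIO("@r1\nACGT\n+\nIIII\n@r2\nACGTACGT\n+\nIIIIIIII\n")
    print(length_summary(fastq_lengths(demo)))
```

On a real file, replace the demo with `with open("SRR1284073.fastq") as fh: print(length_summary(fastq_lengths(fh)))`.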

P6-C4

• As per last week

• 10-15 kb read N50

• Slightly better accuracy?

• http://blog.pacificbiosciences.com/2014/10/new-chemistry-boosts-average-read.html
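Read N50 is the length L such that reads of length L or longer contain at least half of all sequenced bases. A minimal sketch of that definition (the function name is just illustrative):

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Length L such that reads of length >= L hold at least half of all bases."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("no lengths given")
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    return ordered[-1]  # not reached: running always reaches the total


if __name__ == "__main__":
    print(n50([15000, 12000, 8000, 3000, 1000]))
```

Because N50 weights by bases rather than by read count, a handful of very long reads can move it a lot, so it pairs well with the mean length when comparing chemistries.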

Other options: hybrid assembly

• It is possible to combine multiple data types

• Goal: exploit the respective strengths of each
  • (of course, they could confound each other too!)

• SPAdes is one of the most flexible assemblers in this regard

• Must have some Illumina data
  • Will accept corrected or uncorrected PacBio reads (and many more, including Oxford Nanopore)
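For the Illumina + uncorrected PacBio case, a SPAdes run takes paired Illumina reads via `-1`/`-2`, PacBio reads via `--pacbio`, and an output directory via `-o`. The sketch below only builds the command as a list (handy for logging it exactly); the file names are placeholders, and how corrected PacBio reads should be supplied depends on the SPAdes version, so check `spades.py --help` before relying on it.

```python
import shlex
from typing import List, Optional


def spades_hybrid_cmd(illumina_r1: str, illumina_r2: str, pacbio_reads: str,
                      out_dir: str, threads: Optional[int] = None) -> List[str]:
    """Build a SPAdes hybrid-assembly command: paired Illumina + PacBio reads."""
    cmd = ["spades.py",
           "-1", illumina_r1, "-2", illumina_r2,  # Illumina read pairs
           "--pacbio", pacbio_reads,             # uncorrected PacBio reads
           "-o", out_dir]
    if threads is not None:
        cmd += ["-t", str(threads)]
    return cmd


if __name__ == "__main__":
    cmd = spades_hybrid_cmd("illumina_R1.fastq", "illumina_R2.fastq",
                            "pacbio_P4C2.fastq", "spades_hybrid")
    print(shlex.join(cmd))  # paste into the lab notebook, or run it with subprocess
```

Keeping the command as a list avoids shell-quoting surprises when it is later passed to `subprocess.run`.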

Assignment #7

• Create 2 E. coli assemblies using PacBio data
  • Use P4-C2 alone and HGAP
  • Use Illumina + P5-C3 uncorrected
  • Use Illumina + P4-C2 uncorrected
  • Use Illumina + P4-C2 corrected
  • Multiple Quiver steps to correct
  • Some other option!

• Hand in:
  • 2 genome assemblies
  • Lab notebook file detailing exact commands
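One way to make sure the lab notebook records the exact commands is to log each command right before running it. A minimal sketch: `run_and_log` and the notebook file name are hypothetical, and it records only the command line, a timestamp and the exit status (program output is not captured).

```python
import datetime
import shlex
import subprocess
from pathlib import Path
from typing import List


def run_and_log(cmd: List[str], notebook: Path) -> int:
    """Append the exact command (timestamped) to a plain-text notebook,
    run it, then append its exit status. Returns the exit status."""
    stamp = datetime.datetime.now().isoformat(timespec="seconds")
    with notebook.open("a") as nb:
        nb.write(f"[{stamp}] $ {shlex.join(cmd)}\n")
    result = subprocess.run(cmd)
    with notebook.open("a") as nb:
        nb.write(f"[{stamp}] exit status: {result.returncode}\n")
    return result.returncode


if __name__ == "__main__":
    run_and_log(["echo", "assembly step goes here"], Path("lab_notebook.txt"))
```

Writing the command before it runs means even a crashed or interrupted step still shows up in the notebook.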