2011-04-26_various-assemblers-presentation
TRANSCRIPT
EBI is an Outstation of the European Molecular Biology Laboratory.
Assembly tools and Visualisation
Matthias Haimel
Assemblers25.04.112
Overview
• Assemblers• ABySS• SOAPdenovo
• Visualisation• Tablet• ABySS-Explorer
• Read mapping• SAM / BAM
• Visualisation• Artemis• IGV - Integrative Genomics Viewer
ABySS - Assembly By Short Sequences
• Genome Sciences Centre, Vancouver• http://www.bcgsc.ca/platform/bioinfo/software/abyss• Open source, BCCA Licence
• de Bruijn graph • Trimming (tip clipping), bubble popping• Use paired-end information: resolve ambiguities between contigs• parallel (use cluster)
• Files• FASTA / FASTQ• SAM / BAM• colour-space
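To make the de Bruijn graph and tip-clipping bullets concrete, here is a minimal Python sketch. It is not ABySS code: the function names and the toy reads are made up for illustration, and it ignores reverse complements and coverage, which a real assembler has to handle.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Map each (k-1)-mer to the list of (k-1)-mers that follow it in some read."""
    graph = defaultdict(list)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            # One edge per k-mer: from its prefix to its suffix.
            graph[kmer[:-1]].append(kmer[1:])
    return graph


def find_tips(graph):
    """Return dead-end nodes: they have an incoming edge but no outgoing one."""
    targets = {t for successors in graph.values() for t in successors}
    return sorted(t for t in targets if t not in graph)


# Toy reads (hypothetical): the second read differs in its last base only.
reads = ["ACGTAC", "ACGTAG"]
graph = build_de_bruijn(reads, k=4)
tips = find_tips(graph)
```

Both reads share the path ACG -> CGT -> GTA and then split into two dead ends. Tip clipping would normally drop the shorter or lower-coverage of such branches; that decision is left out here.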
ABySS
• ABYSS (single end)• e.g. ABYSS -k27 single.fastq -o contigs.fa
• abyss-pe (paired end)• e.g. abyss-pe k=27 n=10 in='read_1.fastq read_2.fastq' name=ecli
• Multiple libraries• ... lib='read1 read2' read1='read1_1.fa read1_2.fa' read2='read2_1.fa read2_2.fa'
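The multiple-library convention is easy to get wrong by hand: lib= lists the library names, and each name then appears again as a parameter holding that library's read files. The Python sketch below only assembles the command string; the helper name abyss_pe_command and the assembly name demo are hypothetical, while k, name and lib are the abyss-pe parameters shown on the slide.

```python
import shlex


def abyss_pe_command(name, k, libraries):
    """Build an abyss-pe command line from {library name: [read files]}."""
    args = ["abyss-pe", f"k={k}", f"name={name}"]
    # lib= lists the library names; each name then gets its own file list.
    args.append("lib=" + shlex.quote(" ".join(libraries)))
    for lib_name, files in libraries.items():
        args.append(f"{lib_name}=" + shlex.quote(" ".join(files)))
    return " ".join(args)


command = abyss_pe_command(
    name="demo",  # hypothetical assembly name
    k=27,
    libraries={
        "read1": ["read1_1.fa", "read1_2.fa"],
        "read2": ["read2_1.fa", "read2_2.fa"],
    },
)
```

Building the string from a dictionary keeps the library names in lib= and the per-library parameters in step with each other.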
SOAPdenovo
• Beijing Genomics Institute (BGI), China• http://soap.genomics.org.cn/soapdenovo.html• Panda genome• Source available
• de Bruijn graph• pre-set k-mer frequency threshold• Bubble removal
• Build scaffold• mapping reads to contigs• gap filling
SOAPdenovo
• Full run • e.g. SOAPdenovo all -s read.config -K 27 -o contigs.fa
• Run sub-steps• pregraph: comparable to velveth• contig: comparable to velvetg• map: map reads to contigs• scaff: scaffolding
• Configuration• Config file input instead of read files• Specify rank, usage (assembly/scaffolding), insert size
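As an illustration of what such a config file can look like, the Python sketch below renders one from a list of libraries. The key names (max_rd_len, avg_ins, asm_flags, rank, q1, q2) are given as recalled from the SOAPdenovo manual and should be checked against the version in use; the function name, the dictionary keys and the file names are hypothetical.

```python
def soapdenovo_config(libraries, max_rd_len=100):
    """Render a SOAPdenovo config from a list of library dicts."""
    lines = [f"max_rd_len={max_rd_len}"]
    for lib in libraries:
        lines += [
            "[LIB]",
            f"avg_ins={lib['insert_size']}",  # average insert size
            f"asm_flags={lib['usage']}",      # 1 = contigs only, 2 = scaffolding only, 3 = both
            f"rank={lib['rank']}",            # order in which libraries are used for scaffolding
            f"q1={lib['reads'][0]}",          # paired FASTQ, read 1
            f"q2={lib['reads'][1]}",          # paired FASTQ, read 2
        ]
    return "\n".join(lines) + "\n"


config = soapdenovo_config([
    {"insert_size": 200, "usage": 3, "rank": 1,
     "reads": ("read_1.fastq", "read_2.fastq")},
])
```

Writing the returned text to read.config would give a file usable with the -s option from the full-run example, assuming the keys match the installed release.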
Visualisation
• Tablet• Lightweight• Easy to use
• Formats• ACE• AFG• BAM• BANK (AMOS)
http://bioinf.scri.ac.uk/tablet/
Visualisation - Velvet
• Tablet• velvetg ... -amos_file yes
• GraphViz• Transform velvet graph into GraphViz format• Contributed by Paul Harrison• <velvet>/contrib/layout/• Velvet -> .dot file (Python script)• .dot -> png (graphviz)
Visualisation
• ABySS-Explorer• Visualizes ABySS assemblies• Interactive graph structure• Filter contigs
http://www.bcgsc.ca/platform/bioinfo/software/abyss-explorer
Assembler - Practical
• Assemblers• ABySS • SOAPdenovo
• Visualisation• Tablet• ABySS-Explorer
Read mapping
• SAM / BAM• Sequence Alignment / Map format (SAM)• Binary form of SAM (BAM)• generic format • Flexible and simple• Compact (BAM)• Allow indexing• Load regions• Support streaming
http://samtools.sourceforge.net/SAM1.pdf
SAM
• Header• File format version information• Sequence dictionary (name/length/..)• Read group (platform/library/...)• Program info
• Body• Alignment information
SAM Header
• '@' followed by record type (two characters)
@HD VN:1.0
@SQ SN:chr20 LN:62435964
@RG ID:L1 PU:SC_1_10 LB:SC_1 SM:NA12891
@RG ID:L2 PU:SC_2_12 LB:SC_2 SM:NA12891
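A short Python sketch of how these header records can be read: each line is a record type followed by TAG:VALUE fields. The fields are assumed to be tab-separated, as in the SAM specification, and the function name parse_sam_header is made up for this example.

```python
def parse_sam_header(lines):
    """Group header lines by record type, e.g. {"SQ": [{"SN": ..., "LN": ...}]}."""
    header = {}
    for line in lines:
        if not line.startswith("@"):
            break  # header lines come first; alignment lines follow
        record_type, *fields = line.rstrip("\n").split("\t")
        if record_type == "@CO":
            continue  # comment lines are free text, not TAG:VALUE pairs
        # Split each field on the first colon only; values may contain colons.
        tags = dict(field.split(":", 1) for field in fields)
        header.setdefault(record_type[1:], []).append(tags)
    return header


header = parse_sam_header([
    "@HD\tVN:1.0",
    "@SQ\tSN:chr20\tLN:62435964",
    "@RG\tID:L1\tPU:SC_1_10\tLB:SC_1\tSM:NA12891",
    "@RG\tID:L2\tPU:SC_2_12\tLB:SC_2\tSM:NA12891",
])
```

The two @RG records end up as a list under "RG", which is what lets an RG:Z: tag on an alignment line be resolved to its library and sample.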
SAM Alignment
• Tab delimited lines
read_28833_29006_6945 99 chr20 28833 20 10M1D25M = 28993 195 \
  AGCT... <<<<... NM:i:1 RG:Z:L1
read_28701_28881_323b 147 chr20 28834 30 35M = 28701 -168 \
  ACCT... <<7;:... MF:i:18 RG:Z:L2
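The second and sixth columns (FLAG and CIGAR) are the least self-explanatory, so here is a small Python sketch that decodes them for the first record on the slide. The flag bit values come from the SAM specification; the helper names are made up, and the sequence and quality strings are left truncated exactly as on the slide.

```python
import re

# Flag bits defined by the SAM specification (subset).
FLAG_BITS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse_strand",
    0x20: "mate_reverse_strand",
    0x40: "first_in_pair",
    0x80: "second_in_pair",
}


def decode_flag(flag):
    """List the names of all bits set in a SAM FLAG value."""
    return [name for bit, name in FLAG_BITS.items() if flag & bit]


def reference_span(cigar):
    """Count reference bases covered: M, D, N, = and X consume the reference."""
    ops = re.findall(r"(\d+)([MIDNSHP=X])", cigar)
    return sum(int(length) for length, op in ops if op in "MDN=X")


record = ("read_28833_29006_6945\t99\tchr20\t28833\t20\t10M1D25M\t=\t28993\t195\t"
          "AGCT...\t<<<<...\tNM:i:1\tRG:Z:L1")
qname, flag, rname, pos, mapq, cigar = record.split("\t")[:6]
flags = decode_flag(int(flag))
end = int(pos) + reference_span(cigar) - 1  # last reference position covered
```

Flag 99 is 1 + 2 + 32 + 64, and flag 147 on the second record is 1 + 2 + 16 + 128, so the two lines are the forward and reverse mates of properly paired reads. The 1D in the CIGAR makes the alignment cover 36 reference bases for a 35-base read.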
Tools
• Mapping Reads• BWA • Bowtie • SSAHA2
• Manipulate SAM/BAM• SAM Tools package• Picard
BWA
• Burrows-Wheeler Alignment Tool• Map (single/paired-end/long) reads to a sequence
• Index database• bwa index -a bwtsw database.fasta
• Align reads• bwa aln database.fasta short_read.fastq > aln_sa.sai
• Generate alignments• bwa sampe database.fasta aln_sa1.sai aln_sa2.sai read1.fq read2.fq > aln.sam
• Long reads• bwa bwasw database.fasta long_read.fastq > aln.sam
SAM tools
• Utilities for SAM format• samtools <command> ...
• Commands:• view: convert SAM <-> BAM• sort: sort a BAM file• index: build a BAM file index• merge: merge multiple BAM files• pileup: print the alignment in pileup format• tview: integrated text alignment viewer
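The BWA and samtools commands chain into one pipeline from raw paired-end reads to an indexed BAM file that a viewer can open. The Python sketch below only lists the commands in order and does not run them. The function name and the output file names are hypothetical, and the samtools sort -o form assumes a samtools 1.x release; the 0.1.x releases current when these slides were made took an output prefix instead (samtools sort in.bam out_prefix).

```python
def mapping_commands(reference, reads_1, reads_2, prefix):
    """Return the shell commands, in order, from paired reads to an indexed BAM."""
    return [
        # 1. Index the reference once.
        f"bwa index -a bwtsw {reference}",
        # 2. Align each read file separately.
        f"bwa aln {reference} {reads_1} > {prefix}_1.sai",
        f"bwa aln {reference} {reads_2} > {prefix}_2.sai",
        # 3. Combine both alignments into paired-end SAM records.
        f"bwa sampe {reference} {prefix}_1.sai {prefix}_2.sai "
        f"{reads_1} {reads_2} > {prefix}.sam",
        # 4. Convert to BAM, sort by coordinate, then index (samtools 1.x syntax).
        f"samtools view -b -o {prefix}.bam {prefix}.sam",
        f"samtools sort -o {prefix}.sorted.bam {prefix}.bam",
        f"samtools index {prefix}.sorted.bam",
    ]


commands = mapping_commands("database.fasta", "read1.fq", "read2.fq", "aln")
```

Sorting has to come before indexing, because the index is built over a coordinate-sorted BAM file.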
Visualisation - Integrative Genomics Viewer
• IGV• Good integration
• Formats• DAS• BAM• GFF• ...
• Tools• Run scripts• Export region• ...
http://www.broadinstitute.org/igv/
Visualisation
• Artemis• Sequence Viewer• Annotation tool
• Formats• EMBL• GENBANK• GFF• FASTA• BAM
http://www.sanger.ac.uk/resources/software/artemis/
Mapping - Practical
• Mapping reads + prepare for visualisation• BWA • samtools
• Visualisation• IGV