2011-04-26_various-assemblers-presentation
EBI is an Outstation of the European Molecular Biology Laboratory. Assembly tools and Visualisation Matthias Haimel

Upload: mhaimel

Post on 10-May-2015

TRANSCRIPT

Page 1: 2011-04-26_various-assemblers-presentation

EBI is an Outstation of the European Molecular Biology Laboratory.

Assembly tools and Visualisation

Matthias Haimel

Page 2: 2011-04-26_various-assemblers-presentation

Assemblers 25.04.11

Overview

• Assemblers
  • ABySS
  • SOAPdenovo

• Visualisation
  • Tablet
  • ABySS-Explorer

• Read mapping
  • SAM / BAM

• Visualisation
  • Artemis
  • IGV - Integrative Genomics Viewer

Page 3: 2011-04-26_various-assemblers-presentation

ABySS Assembly By Short Sequences

• Genome Sciences Centre, Vancouver
  • http://www.bcgsc.ca/platform/bioinfo/software/abyss
  • Open source, BCCA Licence

• de Bruijn graph
  • Trimming (tip clipping), bubble popping
  • Uses paired-end information to resolve ambiguities between contigs
  • Parallel (can run on a cluster)

• Files
  • Fasta / Fastq
  • Sam / Bam
  • Colour-space
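
The de Bruijn graph named on this slide can be illustrated with a minimal sketch. This is not ABySS's implementation: the function names, the toy reads and the simplified notion of a "tip" (any node without an outgoing edge) are assumptions made for illustration only.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Map each (k-1)-mer to the sorted list of (k-1)-mers it links to.

    Every distinct k-mer in the reads contributes one edge from its
    (k-1)-base prefix to its (k-1)-base suffix.
    """
    kmers = set()
    for read in reads:
        for i in range(len(read) - k + 1):
            kmers.add(read[i:i + k])

    graph = defaultdict(list)
    for kmer in sorted(kmers):
        graph[kmer[:-1]].append(kmer[1:])
    return dict(graph)


def find_dead_ends(graph):
    """Return nodes that are reached by an edge but have no outgoing edge.

    Real tip clipping only removes short dead-end branches; this helper
    merely lists the candidates.
    """
    targets = {t for successors in graph.values() for t in successors}
    return sorted(t for t in targets if t not in graph)


# Made-up reads, not real sequencing data.
toy_reads = ["ACGTAC", "CGTACG", "ACGTTT"]
toy_graph = build_de_bruijn(toy_reads, k=4)
print(toy_graph)
print(find_dead_ends(toy_graph))
```

In the toy graph the node CGT has two successors, which is the kind of branch point that paired-end information or bubble popping would have to resolve in a real assembler.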

Page 4: 2011-04-26_various-assemblers-presentation

ABySS

• ABYSS (single end)
  • e.g. ABYSS -k27 single.fastq -o contigs.fa

• abyss-pe (paired end)
  • e.g. abyss-pe k=27 n=10 in='read_1.fastq read_2.fastq' name=ecli

• Multiple libraries
  • ... lib='read1 read2' read1='read1_1.fa read1_2.fa' read2='read2_1.fa read2_2.fa'

Page 5: 2011-04-26_various-assemblers-presentation

SOAPdenovo

• Beijing Genomics Institute (BGI), China
  • http://soap.genomics.org.cn/soapdenovo.html
  • Used for the panda genome
  • Source available

• de Bruijn graph
  • Pre-set k-mer frequency threshold
  • Bubble removal

• Build scaffold
  • Mapping reads to contigs
  • Gap filling
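
The idea behind a k-mer frequency threshold can be sketched as follows: k-mers seen fewer times than the threshold are treated as likely sequencing errors and dropped before the graph is built. This is an illustrative sketch, not SOAPdenovo's code; the function names, the toy reads and the threshold value are assumptions.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every k-mer occurrence across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def filter_kmers(counts, min_count):
    """Keep only k-mers observed at least min_count times."""
    return {kmer: n for kmer, n in counts.items() if n >= min_count}


# Made-up reads: the third read differs in its last base.
toy_reads = ["ACGTA", "ACGTA", "ACGTC"]
counts = count_kmers(toy_reads, k=4)
print(filter_kmers(counts, min_count=2))
```

With a threshold of 2 the k-mer CGTC, which occurs only once, is discarded, while ACGT and CGTA are kept.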

Page 6: 2011-04-26_various-assemblers-presentation

SOAPdenovo

• Full run
  • e.g. SOAPdenovo all -s read.config -K 27 -o contigs.fa

• Run sub steps
  • pregraph: comparable to velveth
  • contig: comparable to velvetg
  • map: map reads to contigs
  • scaff: scaffolding

• Configuration
  • A config file is given as input instead of the read files
  • Specify rank, usage (assembly/scaffolding) and insert size for each library
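
A config file of the kind described above could be generated with a small helper like the one below. The keys (max_rd_len, avg_ins, reverse_seq, asm_flags, rank, q1, q2) follow the SOAPdenovo config format as commonly documented; the helper name, the file names and the numeric values are assumptions for illustration, so check them against the manual of the installed version.

```python
def soap_config(max_read_len, libraries):
    """Return the text of a SOAPdenovo-style config file.

    Each library is a dict with avg_ins (insert size), rank (order in
    which libraries are used for scaffolding), q1/q2 (paired FASTQ
    files) and optionally reverse_seq and asm_flags.
    """
    lines = [f"max_rd_len={max_read_len}"]
    for lib in libraries:
        lines += [
            "[LIB]",
            f"avg_ins={lib['avg_ins']}",
            # 0 = forward-reverse pairs (assumed default for this sketch)
            f"reverse_seq={lib.get('reverse_seq', 0)}",
            # 3 = use the library for both contig assembly and scaffolding
            f"asm_flags={lib.get('asm_flags', 3)}",
            f"rank={lib['rank']}",
            f"q1={lib['q1']}",
            f"q2={lib['q2']}",
        ]
    return "\n".join(lines) + "\n"


# Hypothetical library description; file names are placeholders.
example = soap_config(
    100,
    [{"avg_ins": 200, "rank": 1, "q1": "read_1.fastq", "q2": "read_2.fastq"}],
)
print(example)
```

The printed text can be saved as read.config and passed to the -s option shown in the full-run example.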

Page 7: 2011-04-26_various-assemblers-presentation

Visualisation

• Tablet
  • Lightweight
  • Easy to use

• Formats
  • ACE
  • AFG
  • BAM
  • BANK (AMOS)

http://bioinf.scri.ac.uk/tablet/

Page 8: 2011-04-26_various-assemblers-presentation

Visualisation - Velvet

• Tablet
  • velvetg ... -amos_file yes

• GraphViz
  • Transform the Velvet graph into GraphViz format
  • Contributed by Paul Harrison
  • <velvet>/contrib/layout/
  • Velvet -> .dot file (Python script)
  • .dot -> png (graphviz)
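
To show what the intermediate .dot file looks like, here is a minimal sketch that writes a directed graph in GraphViz DOT syntax. It is not the contributed script from <velvet>/contrib/layout/ and it does not read Velvet's graph file; the function name and the toy graph are assumptions.

```python
def to_dot(graph, name="assembly"):
    """Serialise an adjacency dict {node: [successors]} as a DOT digraph."""
    lines = [f"digraph {name} {{"]
    for source in sorted(graph):
        for target in graph[source]:
            lines.append(f'  "{source}" -> "{target}";')
    lines.append("}")
    return "\n".join(lines) + "\n"


# Made-up contig graph with one branch.
toy_graph = {"contig1": ["contig2", "contig3"], "contig2": ["contig4"]}
print(to_dot(toy_graph))
```

If the output is saved as graph.dot, GraphViz can render it with `dot -Tpng graph.dot -o graph.png`.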

Page 9: 2011-04-26_various-assemblers-presentation

Visualisation

• ABySS-Explorer
  • Visualizes ABySS assemblies
  • Interactive graph structure
  • Filter contigs

http://www.bcgsc.ca/platform/bioinfo/software/abyss-explorer

Page 10: 2011-04-26_various-assemblers-presentation

Assembler - Practical

• Assemblers
  • ABySS
  • SOAPdenovo

• Visualisation
  • Tablet
  • ABySS-Explorer

Page 11: 2011-04-26_various-assemblers-presentation

Read mapping

• SAM / BAM
  • Sequence Alignment / Map format (SAM)
  • Binary form of SAM (BAM)
  • Generic format
  • Flexible and simple
  • Compact (BAM)
  • Allows indexing
  • Load regions
  • Supports streaming

http://samtools.sourceforge.net/SAM1.pdf

Page 12: 2011-04-26_various-assemblers-presentation

SAM

• Header
  • File format version information
  • Sequence dictionary (name/length/..)
  • Read group (platform/library/...)
  • Program info

• Body
  • Alignment information

Page 13: 2011-04-26_various-assemblers-presentation

SAM Header

• '@' followed by record type (two characters)

@HD VN:1.0
@SQ SN:chr20 LN:62435964
@RG ID:L1 PU:SC_1_10 LB:SC_1 SM:NA12891
@RG ID:L2 PU:SC_2_12 LB:SC_2 SM:NA12891
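
Header lines like these can be split into a record type and TAG:VALUE pairs with a few lines of code. The sketch below assumes tab-separated fields, as in real SAM files (the slide shows them with spaces); the function name is an assumption, and in practice a library such as pysam or samtools itself would normally be used.

```python
def parse_sam_header(lines):
    """Return a list of (record_type, tags) tuples for SAM header lines.

    Parsing stops at the first line that does not start with '@',
    because header lines precede the alignments.
    """
    records = []
    for line in lines:
        if not line.startswith("@"):
            break
        fields = line.rstrip("\n").split("\t")
        record_type = fields[0][1:]  # two characters after '@'
        if record_type == "CO":
            # Comment lines hold free text rather than TAG:VALUE pairs.
            tags = {"text": "\t".join(fields[1:])}
        else:
            tags = dict(field.split(":", 1) for field in fields[1:])
        records.append((record_type, tags))
    return records


header = [
    "@HD\tVN:1.0",
    "@SQ\tSN:chr20\tLN:62435964",
    "@RG\tID:L1\tPU:SC_1_10\tLB:SC_1\tSM:NA12891",
    "@RG\tID:L2\tPU:SC_2_12\tLB:SC_2\tSM:NA12891",
]
for record_type, tags in parse_sam_header(header):
    print(record_type, tags)
```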

Page 14: 2011-04-26_various-assemblers-presentation

SAM Alignment

• Tab delimited lines

Page 15: 2011-04-26_various-assemblers-presentation

SAM Alignment

• Tab delimited lines

read_28833_29006_6945 99 chr20 28833 20 10M1D25M = 28993 195 \
  AGCT... <<<<... NM:i:1 RG:Z:L1
read_28701_28881_323b 147 chr20 28834 30 35M = 28701 -168 \
  ACCT... <<7;:... MF:i:18 RG:Z:L2
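
The two alignment lines can be unpacked field by field. The sketch below splits a tab-delimited line into the eleven mandatory SAM columns, decodes the FLAG bits and computes how many reference bases a CIGAR string covers. The column names follow later versions of the SAM specification, the helper names are assumptions, and the sequence and quality strings are shortened placeholders because the slide elides them.

```python
import re

MANDATORY = ["QNAME", "FLAG", "RNAME", "POS", "MAPQ", "CIGAR",
             "RNEXT", "PNEXT", "TLEN", "SEQ", "QUAL"]
INT_FIELDS = {"FLAG", "POS", "MAPQ", "PNEXT", "TLEN"}

FLAG_BITS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse",
    0x20: "mate_reverse",
    0x40: "first_in_pair",
    0x80: "second_in_pair",
}


def parse_alignment(line):
    """Split one SAM alignment line into named mandatory fields plus tags."""
    fields = line.rstrip("\n").split("\t")
    record = dict(zip(MANDATORY, fields[:11]))
    for name in INT_FIELDS:
        record[name] = int(record[name])
    record["TAGS"] = fields[11:]
    return record


def decode_flag(flag):
    """List the names of the FLAG bits that are set."""
    return [name for bit, name in FLAG_BITS.items() if flag & bit]


def reference_span(cigar):
    """Number of reference bases covered (M, D, N, = and X operations)."""
    ops = re.findall(r"(\d+)([MIDNSHP=X])", cigar)
    return sum(int(length) for length, op in ops if op in "MDN=X")


# SEQ and QUAL are placeholders; the slide truncates them with "...".
line = ("read_28833_29006_6945\t99\tchr20\t28833\t20\t10M1D25M\t=\t28993"
        "\t195\tAGCT\t<<<<\tNM:i:1\tRG:Z:L1")
record = parse_alignment(line)
print(decode_flag(record["FLAG"]))
print(reference_span(record["CIGAR"]))
```

FLAG 99 marks the first read of a properly paired fragment whose mate maps to the reverse strand, and 10M1D25M covers 36 reference bases because the deletion consumes one reference position without a read base.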

Page 16: 2011-04-26_various-assemblers-presentation

Tools

• Mapping reads
  • BWA
  • Bowtie
  • SSAHA2

• Manipulate SAM/BAM
  • SAM Tools package
  • Picard

Page 17: 2011-04-26_various-assemblers-presentation

BWA

• Burrows-Wheeler Alignment Tool
  • Map (single/paired-end/long) reads to a sequence

• Index database
  • bwa index -a bwtsw database.fasta

• Align reads
  • bwa aln database.fasta short_read.fastq > aln_sa.sai

• Generate alignments
  • bwa sampe database.fasta aln_sa1.sai aln_sa2.sai read1.fq read2.fq > aln.sam

• Long reads
  • bwa bwasw database.fasta long_read.fastq > aln.sam
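
The paired-end steps above run in a fixed order: index once, run aln once per read file, then combine both .sai files with sampe. A minimal sketch of that order as a list of commands is given below. The function names and the output-file naming scheme are assumptions; the bwa subcommands and arguments are the ones shown on the slide, and nothing is executed unless run_plan is called on a machine where bwa is installed.

```python
import subprocess


def bwa_paired_end_plan(reference, reads1, reads2, prefix):
    """Return (argv, stdout_path) steps for a paired-end bwa aln/sampe run.

    stdout_path is None when the command writes its own output files.
    """
    sai1 = f"{prefix}_sa1.sai"
    sai2 = f"{prefix}_sa2.sai"
    return [
        (["bwa", "index", "-a", "bwtsw", reference], None),
        (["bwa", "aln", reference, reads1], sai1),
        (["bwa", "aln", reference, reads2], sai2),
        (["bwa", "sampe", reference, sai1, sai2, reads1, reads2],
         f"{prefix}.sam"),
    ]


def run_plan(plan):
    """Execute each step, redirecting stdout to a file where requested."""
    for argv, stdout_path in plan:
        if stdout_path is None:
            subprocess.run(argv, check=True)
        else:
            with open(stdout_path, "wb") as out:
                subprocess.run(argv, check=True, stdout=out)


# Only prints the plan; call run_plan(plan) to actually run bwa.
plan = bwa_paired_end_plan("database.fasta", "read1.fq", "read2.fq", "aln")
for argv, stdout_path in plan:
    target = f" > {stdout_path}" if stdout_path else ""
    print(" ".join(argv) + target)
```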

Page 18: 2011-04-26_various-assemblers-presentation

SAM tools

• Utilities for SAM format
  • samtools <command> ...

• Commands:
  • view: SAM <-> BAM
  • sort: sort BAM file
  • index: build BAM file index
  • merge: merge several BAM files
  • pileup: alignment in the pileup format
  • tview: integrated text alignment viewer
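
A typical use of view, sort and index is to turn a SAM file into a sorted, indexed BAM file that a viewer can load by region. The sketch below only builds the command lines. It assumes samtools 1.x option syntax (`-o` for the output file), which differs from the 2011-era form `samtools sort in.bam out_prefix`; the function name and file naming are also assumptions.

```python
def sam_to_indexed_bam_plan(sam_path, prefix):
    """Return the argv lists that convert, sort and index an alignment file.

    Assumes samtools 1.x command-line syntax.
    """
    bam = f"{prefix}.bam"
    sorted_bam = f"{prefix}.sorted.bam"
    return [
        # SAM -> BAM
        ["samtools", "view", "-b", "-o", bam, sam_path],
        # Sort by coordinate so the file can be indexed
        ["samtools", "sort", "-o", sorted_bam, bam],
        # Build the index used for loading regions
        ["samtools", "index", sorted_bam],
    ]


for argv in sam_to_indexed_bam_plan("aln.sam", "aln"):
    print(" ".join(argv))
```

Each list can be passed to subprocess.run(argv, check=True) on a system where samtools is available.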

Page 19: 2011-04-26_various-assemblers-presentation

Visualisation Integrative Genomics Viewer

• IGV
  • Good integration

• Formats
  • DAS
  • BAM
  • GFF
  • ...

• Tools
  • Run scripts
  • Export region
  • ...

http://www.broadinstitute.org/igv/

Page 20: 2011-04-26_various-assemblers-presentation

Visualisation

• Artemis
  • Sequence viewer
  • Annotation tool

• Formats
  • EMBL
  • GENBANK
  • GFF
  • FASTA
  • BAM

http://www.sanger.ac.uk/resources/software/artemis/

Page 21: 2011-04-26_various-assemblers-presentation

Mapping - Practical

• Mapping reads + preparing for visualisation
  • BWA
  • samtools

• Visualisation
  • IGV