Alignment methods
Martijn Vermaat
Department of Human Genetics
Center for Human and Clinical Genetics
Alignment methods
Sequence alignment
Assembly vs alignment
Alignment methods
Common issues
Platform specifics
Software
Metagenomics course 1/28 Thursday, 7 February 2013
Sequence alignment
Identifying regions of similarity in sequences
In NGS
• Recovering original nucleotide sequence
• ... from many short fragments
• ... using a known reference
Sequence alignment
Pairwise alignment
Sequence alignment
Multiple sequence alignment
Sequence alignment
Global vs local alignment
Sequence alignment
Structural alignment
Assembly vs alignment
Assembly
• Memory hungry
• Needs high coverage
Alignment
• Easy to do in parallel
• Restricted by reference sequence
  • highly polymorphic regions
  • large insertions
Alignment methods
Smith-Waterman
• Local-alignment variant of Needleman-Wunsch
• Guaranteed optimal alignment
    −   A   C   A   C   A   C   T   A
−   0   0   0   0   0   0   0   0   0
A   0   2   1   2   1   2   1   0   2
G   0   1   1   1   1   1   1   0   1
C   0   0   3   2   3   2   3   2   1
A   0   2   2   5   4   5   4   3   4
C   0   1   4   4   7   6   7   6   5
A   0   2   3   6   6   9   8   7   8
C   0   1   4   5   8   8  11  10   9
A   0   2   3   6   7  10  10  10  12
match = +2, mismatch = −1, gap penalty = −1
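The score matrix above can be reproduced with a short sketch. This is a minimal illustration that assumes a linear gap penalty and the scores listed on the slide; the function name `smith_waterman_matrix` is made up for this example, and the traceback step that recovers the actual alignment is left out.

```python
def smith_waterman_matrix(a, b, match=2, mismatch=-1, gap=-1):
    """Fill the local-alignment score matrix H.

    a runs along the columns, b along the rows; row 0 and column 0
    stay at zero, as in the matrix on the slide.
    """
    rows, cols = len(b) + 1, len(a) + 1
    H = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if b[i - 1] == a[j - 1] else mismatch)
            up = H[i - 1][j] + gap      # gap in a
            left = H[i][j - 1] + gap    # gap in b
            # Scores never drop below zero: this is what makes it local.
            H[i][j] = max(0, diag, up, left)
    return H


H = smith_waterman_matrix("ACACACTA", "AGCACACA")
best = max(max(row) for row in H)
for row in H:
    print(" ".join(f"{value:2d}" for value in row))
print("best local score:", best)
```

The highest cell (12, bottom right) marks where the best local alignment ends; a traceback from that cell to the first zero would give the alignment itself.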
Alignment methods
2-step alignment
Step 1: Find candidate positions
• Use read seeds
• Hash table-based or Burrows-Wheeler transform-based heuristic
• Balance between speed and accuracy
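As an illustration of step 1, the hash table-based variant might look roughly like the sketch below. It assumes fixed-length, exact-match seeds (k-mers) and simple voting on the implied read start; the function names and the toy reference are hypothetical, and real aligners use more elaborate seeding schemes.

```python
from collections import Counter, defaultdict


def build_kmer_index(reference, k):
    """Hash table from every k-mer in the reference to its start positions."""
    index = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return index


def candidate_positions(read, index, k):
    """Let every seed of the read vote for the read start position it implies."""
    votes = Counter()
    for offset in range(len(read) - k + 1):
        for pos in index.get(read[offset:offset + k], ()):
            start = pos - offset
            if start >= 0:
                votes[start] += 1
    return votes.most_common()  # best-supported candidates first


reference = "ACGTACGTTAGCCGATAGGC"   # toy reference, made up for this example
index = build_kmer_index(reference, 4)
read = reference[8:16]               # error-free read taken from position 8
print(candidate_positions(read, index, 4))
print(candidate_positions("TAGCCTAT", index, 4))  # same read with one mismatch
```

A shorter seed length k tolerates more mismatches but yields more candidate positions to check in step 2, which is the speed versus accuracy trade-off.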
Step 2: Align and report
• Complete alignment with Smith-Waterman
• Evaluate alignment(s)
Common issues
Insertions and deletions (indels)
• Local realignment around indels
• Base Alignment Quality (BAQ)
Common issues
Non-unique alignment
How to report non-unique alignments?
• Discard entirely
• Choose one randomly
• Report all
  • with best quality
  • above some quality
Depends on the tool
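The reporting options can be made concrete with a small sketch. It assumes each candidate alignment of a read is a (position, score) pair with higher scores being better; the function name `report` and the policy names are invented for this example and do not correspond to the options of any particular aligner.

```python
import random


def report(alignments, policy, min_score=None, rng=random):
    """Select which alignments of a single read to report.

    alignments: list of (position, score) tuples, higher score is better.
    """
    if not alignments:
        return []
    best = max(score for _, score in alignments)
    top = [hit for hit in alignments if hit[1] == best]
    if policy == "discard":      # keep the read only if its best hit is unique
        return top if len(top) == 1 else []
    if policy == "random":       # pick one of the equally good hits
        return [rng.choice(top)]
    if policy == "all_best":     # report all hits with the best quality
        return top
    if policy == "above":        # report all hits above some quality
        return [hit for hit in alignments if hit[1] >= min_score]
    raise ValueError(f"unknown policy: {policy}")


hits = [(100, 40), (2500, 40), (9000, 31)]  # toy data
for policy in ("discard", "all_best"):
    print(policy, report(hits, policy))
print("above", report(hits, "above", min_score=30))
```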
Common issues
Structural variation
• Chromosomal translocation
• Inversion
• Large indels
• Copy-number variation
Use specialized tools
Common issues
Split-read mapping
• Allow aligned read to be split
• For example RNA reads on DNA reference
Common issues
Circular alignment
• Circular genome (e.g. bacteria, mitochondria)
• Most aligners assume linear reference
• Trick: extend reference
  • copy first N bases to the end
  • restore alignment to original reference
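The extension trick can be sketched as follows. This is a toy illustration: exact string search (`str.find`) stands in for a real aligner, positions are 0-based, and the helper names are hypothetical. Choosing N as at least the read length minus one is an assumption of the sketch; it ensures that any read spanning the origin fits on the extended reference.

```python
def extend_circular_reference(reference, n):
    """Copy the first n bases to the end of the linearized reference."""
    return reference + reference[:n]


def restore_alignment(start, length, original_length):
    """Map an alignment on the extended reference back to the original.

    Returns a list of (start, length) segments on the original reference;
    an alignment that crosses the origin is split into two segments.
    """
    start %= original_length
    first = min(length, original_length - start)
    segments = [(start, first)]
    if first < length:
        segments.append((0, length - first))
    return segments


genome = "ACGTTGCA"   # toy circular genome
read = "CAAC"         # spans the origin: last two bases + first two bases
extended = extend_circular_reference(genome, len(read) - 1)

print(genome.find(read))     # -1: no hit on the linear reference
hit = extended.find(read)    # found on the extended reference
print(hit, restore_alignment(hit, len(read), len(genome)))
```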
Platform specifics
Paired-end sequencing
• Align reads separately
• Choose from non-unique alignments based on pairing
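How pairing can resolve a non-unique alignment is sketched below. The sketch assumes each mate comes with a list of candidate positions and that the expected distance between mates is known; the function name `pick_pair` and its parameters are hypothetical, and strand orientation, which real aligners also check, is ignored.

```python
from itertools import product


def pick_pair(hits1, hits2, expected_insert, tolerance):
    """Choose the combination of mate positions closest to the expected distance.

    Returns (pos1, pos2), or None if no combination is within tolerance.
    """
    best = None
    for pos1, pos2 in product(hits1, hits2):
        deviation = abs(abs(pos2 - pos1) - expected_insert)
        if deviation <= tolerance and (best is None or deviation < best[0]):
            best = (deviation, pos1, pos2)
    return None if best is None else best[1:]


# Mate 1 aligns equally well at two positions, mate 2 aligns uniquely.
print(pick_pair([1000, 52000], [1310], expected_insert=300, tolerance=50))
```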
Platform specifics
Color-space (or SOLiD) reads
• Used by SOLiD systems (454 and Solexa produce base-space reads)
• Di-nucleotide encoding
• Needs support from alignment software
Platform specifics
Color-space (or SOLiD) reads
Decoding
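Decoding can be illustrated with a short sketch of the di-nucleotide scheme, in which each color encodes the transition between two adjacent bases (A=0, C=1, G=2, T=3; color = XOR of the two base codes). The function names are made up for this example, and the leading base is treated as part of the sequence, whereas in real SOLiD reads it is a known primer base.

```python
BASES = "ACGT"  # index in this string is the 2-bit code of each base


def encode_colors(seq):
    """Encode a base-space sequence as first base + one color per transition."""
    codes = [BASES.index(base) for base in seq]
    colors = (str(x ^ y) for x, y in zip(codes, codes[1:]))
    return seq[0] + "".join(colors)


def decode_colors(cs_read):
    """Decode a color-space read given as a known first base plus colors."""
    code = BASES.index(cs_read[0])
    decoded = [cs_read[0]]
    for color in cs_read[1:]:
        code ^= int(color)   # each color gives the step to the next base
        decoded.append(BASES[code])
    return "".join(decoded)


cs = encode_colors("ATGGCA")
print(cs)                        # A31031
print(decode_colors(cs))         # ATGGCA
print(decode_colors("A32031"))   # ATCCGT: one color error changes all later bases
```

The last line shows why naive decoding before alignment is fragile: a single color error corrupts every base after it, which is one reason the alignment software needs to support color space directly.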
Platform specifics
Error profile
• Homopolymers
• GC-content
• Positional (example shown)
Software
Some popular aligners for NGS
Hash table-based
• ELAND
• MAQ
Burrows-Wheeler Transform-based
• Bowtie
• BWA
Split-read alignment
• TopHat
• GSNAP
• Mosaik
Software
Viewers
• IGV, Savant, Geneious, Tablet
• tview (console-based)
• UCSC Genome Browser, GBrowse (web-based)
Questions?
Acknowledgements:
Jeroen Laros
Bas E. Dutilh