From Sequence to Knowledge
Assembly, Annotation, and Analysis
of Phage Genomes from Genomic
and Metagenomic Data Sets
A helping hand through
The Annotation Bottleneck
Ramy K. Aziz
Workshop presenters
6 Aug 2017 Phage Genomics - Evergreen 2017
Alejandro Reyes (AR)
Ramy Aziz (RA)
Jason Gill (JG)
A bit of history…
• Since 2009, the Genomics Workshop has
become an essential part of the Evergreen
phage meeting
• The challenge always is: how to meet
needs/expectations that are so many and
so diverse, in ~4 hours
• The answer is:
…….
“The analysis bottleneck”
• Observation:
– We generate more data than we can analyze.
– We generate sequence data faster than
we can analyze them.
• Opinion:
– Not all bottlenecks are
created equal!
– It is important to define the question(s)
before working on the answer(s)!
Workshop audience
• Who (how many) among you have:
– annotated at least one phage genome?
– worked on a viral metagenome?
– used the command line (Unix, Linux, Mac
Terminal) for sequence analysis?
• We actually ran an online survey,
and here is what we found …
Quick group activity
Defining the question(s):
• Introduce yourself, your institution, and your
favorite phage
• Do you have a genome sequenced? Planning to?
– Why have you sequenced your phage genome?
– Why do you want to sequence your phage genome?
• What is the single most pressing question you
want to have answered from genome analysis?
What you want … is
- complete
- accurate
[Figure: two panels, "from genome" and "from metagenome", with the labels: Incomplete, frameshift, faulty assembly. Credit: Andrew Kropinski; Credit: Bas Dutilh]
A process of reconstruction
• Experimentally
• Computationally
DNA
“Any phage one can get!”
“eDNA”
TGATTGTGTGTTTGCGCAATGCG
ATGTGTATATATAGTGAGCTTGCCC
GTCTCTCTNNNTCTCTTG
TGATTGGTCTNNNTCTCTTGCGCAATGCG
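The computational side of this reconstruction can be illustrated with a toy example. The sketch below is a minimal greedy overlap merger, not the algorithm of any real assembler; the function names (`overlap`, `greedy_assemble`), the `min_len` parameter, and the example genome and reads are all hypothetical, chosen only for illustration. It ignores sequencing errors, ambiguous bases (N), reverse-complement reads, and repeats, all of which real assemblers must handle.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of a that equals a prefix of b (0 if < min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a[-k:] == b[:k]:
            return k
    return 0


def greedy_assemble(reads, min_len: int = 3):
    """Repeatedly merge the pair of sequences with the longest overlap.

    Toy illustration only: exact matches, forward strand, no error handling.
    Returns the list of contigs left when no overlaps >= min_len remain.
    """
    contigs = list(dict.fromkeys(reads))  # drop exact duplicate reads, keep order
    while True:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            return contigs
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for idx, c in enumerate(contigs)
                   if idx not in (best_i, best_j)] + [merged]


if __name__ == "__main__":
    # Hypothetical 24-bp "genome" cut into three overlapping reads.
    genome = "ATGCGTACGTTAGCCTAGGATCCA"
    reads = [genome[0:10], genome[6:16], genome[12:24]]
    print(greedy_assemble(reads))
```

With error-free reads that overlap unambiguously, the merger recovers a single contig; with short or repetitive overlaps it can produce a faulty assembly, which is one reason assemblies need validation.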
From Sequence to Knowledge
From raw sequence data to genome submission/publication
• Assembly
• Gene finding/ORF calling
• tRNA calling
• Annotation (assigning functions)
• Orienting
• Validation
• Fixing frameshifts
• Introns and inteins
• Subsystem assignment
• Refinement/secondary annotation loop
• Special purpose: toxins, morons, integrases, lifestyle prediction
• Regulatory elements (promoters, terminators)
• Output: files and graphics
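As an illustration of the gene finding/ORF calling step named above, here is a minimal sketch of a naive six-frame ORF scan. It is not the method used by any of the tools covered in this workshop; the function names, the `min_len` threshold, and the choice of ATG/GTG/TTG as start codons are assumptions made for the example. Real gene callers also score coding potential and ribosome binding sites, which this sketch does not attempt.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")
STOP_CODONS = {"TAA", "TAG", "TGA"}
START_CODONS = {"ATG", "GTG", "TTG"}  # assumed set of start codons for this sketch


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string (non-ACGT characters are left unchanged)."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq: str, min_len: int = 30):
    """Naive six-frame ORF scan.

    Returns a list of (strand, start, end, orf_sequence) tuples.
    Coordinates are 0-based, end-exclusive, and relative to the strand
    being scanned (for "-" they index the reverse complement).
    Only ORFs with both a start and an in-frame stop are reported.
    """
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon in START_CODONS:
                    start = i  # first start codon opens the ORF
                elif start is not None and codon in STOP_CODONS:
                    if i + 3 - start >= min_len:
                        orfs.append((strand, start, i + 3, s[start:i + 3]))
                    start = None  # look for the next ORF in this frame
    return orfs


if __name__ == "__main__":
    # Hypothetical sequence: one 36-nt ORF flanked by two bases on each side.
    example = "CC" + "ATG" + "AAA" * 10 + "TAA" + "CC"
    for orf in find_orfs(example):
        print(orf)
```

Because the scan takes the first start codon in each frame, it tends to over-extend genes at the 5' end, which is one reason start sites are revisited during validation and refinement.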
Classification
• The phage sequence space (Lima-Mendez et al.)
• The phage proteomic tree (Edwards & Rohwer)
• New: VIP tree http://www.genome.jp/viptree
This workshop: outline
1. Annotation overview
2. Automated tools for genome annotation:
– PhAnToMe/RAST related tools
– Galaxy/Apollo
3. Tools for metagenome-based analyses
– Assembly
– Functional prediction via protein families
Where to go from here?
• Part I:
General introduction of genome annotation
• Part II:
Two levels
– Level 1: Novices and beginners:
Automated annotation tools
– Level 2: Intermediate to advanced users:
Command-line based tools
Online resources/Slideshare
• Data & links:
– http://egybio.net/tutorial
• Slides
– http://bit.ly/annotation2016
– http://bit.ly/phantome4
– Old tutorials (more detailed, but missing the latest updates):
• Evergreen 2011: http://slidesha.re/phantome1
• http://slidesha.re/phiRAST1 (by Karin Holmfeldt)
• Evergreen 2013: http://bit.ly/phantome2
• Evergreen 2015: http://bit.ly/phantome3