The Opera of PhAnToMe - Version 2.0 (presented at the 21st Biennial Evergreen Phage Meeting)
The Opera of PhAnToMe 2.0
Ramy K. Aziz (@azizrk), Aug 02 2015
opus (Latin) = work (pl. opera)
SEED-based phage database (2009-2013-…)
Phage Genomics Workshop, Evergreen 2015
Phage Genomics - Evergreen 2015
As usual, slides will be made available
• Evergreen 2011 workshop
  – http://slidesha.re/phantome1
  – http://slidesha.re/phiRAST1
• Evergreen 2013 workshop
  – http://bit.ly/phantome2
• This year’s workshop
  – http://bit.ly/phantome3
• Hashtag for the meeting?
  – #Evergreen15
08/02/2015
PRELUDE
The Opera of PhAnToMe 2.0
Aims
• Direct
  – Discuss the theory behind RAST
  – Quickly preview several tools developed under (or under the influence of) the PhAnToMe project
  – Demonstrate online community annotation using SEED
• Indirect
  – PhAnToMe 2.0?
  – Establish community annotation efforts / design courses / crowdsourcing
  – Seek funding? Crowdfunding?
Outline
• Act I. The environment (the SEED)
  – The SEED and the ‘Subsystems Technology’
• Act II. The toolbox (PhAnToMe and sequels)
  – The RAST family
  – PhACTS
  – PhiSPy
  – iVireons
• Act III. The community
  – Online annotation process
  – Annotation summit(s)
  – Course design
$$
Writing proposals, applying for grants
History
NSF-funded, 3-year project (2009-2012) to develop Phage Annotation Tools and Methods (PhAnToMe)
Four centers:
- SDSU, San Diego, CA
- VCU, Richmond, VA
- USF, St. Pete, FL
- UA, Tucson, AZ
http://www.phantome.org
Two years ago…
MAJOR UPDATE
Current status
ACT I. THE ENVIRONMENT
The Opera of PhAnToMe 2.0
I. The Environment: SEED
http://theseed.org
Aziz RK, et al. (2012) PLoS ONE 7(10): e48053. doi:10.1371/journal.pone.0048053
SEED: Main concept
One genome
All genomes
“Subsystems-based technologies were developed in the SEED with the view that the interpretation of one genome can be made more efficient and consistent if hundreds of genomes are simultaneously annotated in one subsystem at a time”
SEED: Main concept
• Protein-based database
  Jargon: PEG = protein-encoding gene
• The subsystems approach
and
• FIGfams: protein families based on
  – sequence similarity
  – chromosomal co-occurrence, gene order, synteny
  – human curation, evidence-based expert assertions
RAST: automated annotation
What is a subsystem?
• “A subset of functional roles studied across genomes”
• A spreadsheet where:
  – each row represents a genome
  – each column represents a functional role / feature / protein
  – different patterns = variants

            Function 1   Function 2   …   Function n
Genome a
Genome b
…
Genome z
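The spreadsheet idea above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the genome names, role names, and gene IDs are made-up toy data, and treating each presence/absence pattern as a "variant" is a simplification of how curators assign variants in the SEED.

```python
from collections import defaultdict

# Hypothetical toy data: genome names, role names, and gene IDs are made up.
# Rows are genomes, columns are functional roles; each cell lists the genes
# (PEGs) assigned to that role, or is empty when the role is missing.
ROLES = ["Function 1", "Function 2", "Function 3"]

subsystem = {
    "Genome a": {"Function 1": ["peg.1"], "Function 2": ["peg.7"], "Function 3": ["peg.9"]},
    "Genome b": {"Function 1": ["peg.3"], "Function 2": [], "Function 3": ["peg.4"]},
    "Genome z": {"Function 1": ["peg.2"], "Function 2": [], "Function 3": ["peg.8"]},
}


def presence_pattern(row, roles=ROLES):
    """Return a tuple of booleans: is each role filled in this genome?"""
    return tuple(bool(row.get(role)) for role in roles)


def group_by_variant(table, roles=ROLES):
    """Group genomes that share the same presence/absence pattern."""
    variants = defaultdict(list)
    for genome, row in table.items():
        variants[presence_pattern(row, roles)].append(genome)
    return dict(variants)


if __name__ == "__main__":
    for pattern, genomes in group_by_variant(subsystem).items():
        print(pattern, genomes)
```

Genomes that land in the same group fill the same set of columns, which is the sense in which "different patterns = variants".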
Advantages of subsystems
Subsystems-based annotation
[Figure: “Annotation” (from genome) vs. “Reconstruction” (from metagenome). Slide labels: complete, accurate; incomplete, frameshift, faulty assembly. Credit: Andrew Kropinski; Credit: Bas Dutilh]
ACT II. THE TOOLBOX
The Opera of PhAnToMe 2.0
II. PhAnToMe ToolBox
http://www.phantome.org
The ToolBox: The RAST family
• (At least) five ways to annotate a genome via RAST:
  – RAST (http://rast.nmpdr.org)
    • annotates online, saves your genome on the server
  – myRAST (local)
    • uses the server, but you can edit offline
  – “PhAST” (http://www.phantome.org/PhageSeed/Phage.cgi?page=phast)
    • optimized gene-calling
  – Use your favorite gene caller, then upload the gbk file to RAST
  – RASTtk (second-generation RAST)
    • modular
    • batch upload
New
http://rast.nmpdr.org
“PhAST”: phage-optimized RAST
http://www.phantome.org/PhageSeed/Phage.cgi?page=phast
RASTtk (RAST toolkit)
The RASTtk Microbial Annotation Pipeline
Pipeline boxes, in slide order:
– FASTA QC
– FASTA to Genome TO
– Call rRNAs
– Call tRNAs
– Call CDSs (Prodigal)
– Call CDSs (Glimmer3)
– Annotate Proteins K-mer v2
– Annotate Proteins K-mer v1
– Call CRISPRs
– Call Phages (PhiSpy)
– Find Repeats
– Export GenBank, GFF3, FASTA
• Green boxes are alternative pipeline steps
• Dashed boxes are optional pipeline steps
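The "alternative" and "optional" steps can be pictured as a pipeline assembled from interchangeable parts. The sketch below is only a toy model: the step names are copied from the slide, but every function is a hypothetical placeholder that records its name and does not call RASTtk. Which boxes are alternative or optional is not recoverable from the transcript, so treating the two CDS callers as alternatives and the prophage step as optional is an assumption.

```python
# Step names come from the slide; every function here is a hypothetical
# placeholder and does not call RASTtk.

def make_step(name):
    """Build a placeholder step that only records its name in the history."""
    def step(genome):
        genome["history"].append(name)
        return genome
    return step


# Assumption: the two "Call CDSs" boxes are alternatives (pick one).
CDS_CALLERS = {
    "prodigal": make_step("Call CDSs (Prodigal)"),
    "glimmer3": make_step("Call CDSs (Glimmer3)"),
}


def build_pipeline(cds_caller="prodigal", call_phages=True):
    """Assemble an ordered list of steps with one CDS caller chosen."""
    steps = [
        make_step("FASTA QC"),
        make_step("FASTA to Genome TO"),
        make_step("Call rRNAs"),
        make_step("Call tRNAs"),
        CDS_CALLERS[cds_caller],
        make_step("Annotate Proteins K-mer v2"),
    ]
    # Assumption: the prophage-calling step is one of the optional boxes.
    if call_phages:
        steps.append(make_step("Call Phages (PhiSpy)"))
    steps.append(make_step("Export GenBank, GFF3, FASTA"))
    return steps


def run_pipeline(steps, genome):
    """Pass the genome object through each step in order."""
    for step in steps:
        genome = step(genome)
    return genome


if __name__ == "__main__":
    result = run_pipeline(build_pipeline("glimmer3", call_phages=False), {"history": []})
    print(" -> ".join(result["history"]))
```

Keeping each step as a function that takes and returns the same genome object is what makes swapping or skipping steps cheap in this model.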
In final development: phi-RASTtk
Pipeline boxes, in slide order:
– FASTA QC
– FASTA to Genome TO
– Call rRNAs
– Call tRNAs
– Call CDSs (Prodigal)
– Call CDSs (GeneMark)
– Annotate Phage Proteins
– Annotate Proteins K-mer v2
– Find Repeats
– Find Toxins
– Export GenBank, GFF3, FASTA
• Green boxes are alternative pipeline steps
• Dashed boxes are optional pipeline steps
RASTtk command-line
RAST video demos available
• Watch on your own:
  – http://tutorial.theseed.org
• Possible tutorial on Tuesday at 3 PM + hands-on application
ACTIVITIES/EXERCISES
After this workshop (1 PM)
Phage Genomics - CeBio 2015
What do you need to annotate your genome?
• A sequenced genome
• Format: FASTA or GenBank (.gbk)
• A RAST username and password
06/02/2015
1. Browse your favorite genome
2. Explore the protein page
• Annotation history
• Annotation clearinghouse
• Evidence
  – similarities
  – literature
• Find your favorite protein
3. Aligning proteins (in context)
• Evidence > Similarities > Align
• Compare region, advanced settings
• Phylogenetic trees
END OF ACTIVITY
Other tools
• PHACTS:
  – classifies and predicts lifestyle
• PhiSpy:
  – finds prophages
• iVireons:
  – predicts phage structural proteins, holins, more to come
The ToolBox: PHACTS
• PHAge Classification Tool Set
• Uses a novel similarity algorithm and a supervised Random Forest classifier to predict whether the lifestyle of a phage, described by its proteome, is virulent or temperate.
• The similarity algorithm builds a training set from phages with known lifestyles; this set, together with the lifestyle annotations, is used to train a Random Forest to classify the lifestyle of a phage.
• PHACTS predictions have had a 99% precision rate.
Kate McNair
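For readers less familiar with the term, precision is the fraction of positive calls that are correct: TP / (TP + FP). The snippet below is a minimal sketch of that definition only. The counts in the example are hypothetical and are not PHACTS results, and which lifestyle counts as the "positive" class is left open here.

```python
def precision(true_positives, false_positives):
    """Fraction of positive predictions that are correct: TP / (TP + FP)."""
    total = true_positives + false_positives
    if total == 0:
        raise ValueError("no positive predictions, precision is undefined")
    return true_positives / total


if __name__ == "__main__":
    # Hypothetical counts chosen only to illustrate the formula.
    print(precision(true_positives=99, false_positives=1))
```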
PHACTS
• http://www.phantome.org/PHACTS/
• Other applications
  – Host prediction: whether a phage infects a Gram-positive or Gram-negative bacterium
  – Taxonomy prediction: a phage’s family
The ToolBox: PhiSpy
Calculate genomic characteristics
• Transcriptional strand orientation
• Customized AT skew
• Customized GC skew
• Protein length
• Abundance of phage words
Classify prophage region
• Random Forest
• Pre-calculated training genome
• Input bacterial genome
• Produce a rank for each gene
Evaluate predicted prophages
• Phage insertion points
• Similarity of phage proteins
Sajia Akhter
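As background for the skew features listed above, here is a sketch of the textbook definitions, GC skew = (G - C) / (G + C) and AT skew = (A - T) / (A + T), computed over sliding windows. The slide says PhiSpy uses "customized" versions that the transcript does not describe, so this shows the standard formula only, not PhiSpy's implementation. The function names are hypothetical.

```python
def skew(seq, plus, minus):
    """Return (plus - minus) / (plus + minus) for two bases counted in seq."""
    seq = seq.upper()
    p, m = seq.count(plus), seq.count(minus)
    return 0.0 if p + m == 0 else (p - m) / (p + m)


def gc_skew(seq):
    """Standard GC skew: (G - C) / (G + C)."""
    return skew(seq, "G", "C")


def at_skew(seq):
    """Standard AT skew: (A - T) / (A + T)."""
    return skew(seq, "A", "T")


def windowed(seq, size, step, fn):
    """Apply fn to each full-length window of seq, moving by step."""
    return [fn(seq[i:i + size]) for i in range(0, len(seq) - size + 1, step)]


if __name__ == "__main__":
    toy = "GGGCATATCCCGAATT"  # made-up sequence for illustration
    print(windowed(toy, size=8, step=4, fn=gc_skew))
    print(windowed(toy, size=8, step=4, fn=at_skew))
```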
PhiSpy
• Performance comparison in 50 complete bacterial genomes

Application     % Identified   % FN   % FP
Prophinder      89%            11%    12%
Phage_finder    82%            18%    1.33%
PhiSpy          94%            6%     0.66%

Sajia Akhter
PhiSpy
• Download: PhiSpy – http://sourceforge.net/projects/phispy
• PhiSpy is on RASTtk
• Ran PhiSpy on 4,335 bacterial genomes
• Predicted 12,826 prophages in 3,203 genomes
  – 9,101 known prophages
  – 3,723 undefined prophages
Sajia Akhter
iVIREONS – http://vdm.sdsu.edu/ivireons
Victor Seguritan
“FAMILIES” OF ANNs
1) General structural proteins:
2) Phage major capsid proteins
3) Phage tail/tail fibers/collar etc.
4) Holins
5) Portals
• Trained with all types of proteins
• Both phages & viruses
iVIREONS – http://vdm.sdsu.edu/ivireons
1
2 Enter User Info
VibrioPhage
3 Upload Sequences
4 View Results
5 Copy Results to a Spreadsheet
Victor Seguritan
iVIREONS – http://vdm.sdsu.edu/ivireons
Stringencies Reported
- Structural 1:1
- MCP 1:1
- MCP 2:1
- MCP 3:1
- MCP 4:1
- MCP 7:1
- MCP 22:1 (lambda)
- Tail 1:1
- Tail 2:1
- Tail 4:1
- Tail 7:1
- Tail 6.6:1 (lambda)
ACT III. THE COMMUNITY
The Opera of PhAnToMe 2.0
SEED allows continuous annotation
[Diagram labels: SEED; RAST; Genomes; Subsystems; SEED Viewer; New Genomes; Subsystems Editor]
SEED allows community annotation
Annotations will improve only if YOU help
Prospects
• Phage annotation “summits”
  – First summit (Jan 2011) was at Biosphere 2, Tucson, AZ
  – A second one?
    • On a summit? (e.g., Bogotá? Mount Sinai?)
    • Red Sea resort in Egypt??
• Pushing for community annotation
  – Undergraduate students (I have about 20 in training)
FINALE
The Opera of PhAnToMe 2.0
Aims
• Direct
  – Discuss the theory behind RAST
  – Quickly preview several tools developed under (or under the influence of) the PhAnToMe project
  – Demonstrate online community annotation using SEED
• Indirect
  – PhAnToMe 2.0?
  – Establish community annotation efforts / design courses / crowdsourcing
  – Seek funding? Crowdfunding?
Acknowledgments
Robert A. Edwards, PhD
• RASTtk and PhiRAST development: Ross Overbeek, Robert Olson, Jim Davis, Gordon Pusch, Terry Disz, Bruce Parrello
• Phage annotators (Phantomers): Bhakti Dwivedi, Mya Breitbart, et al.
• FIG and all SEED annotators: VeronikaV, SvetaG, OlgaV/Z, et al.
Sajia Akhter
$$ & NSF
Acknowledgments
• PHAST
Victor Seguritan
Katelyn McNair
• iVireons
If you use, please cite
• SEED, RAST, myRAST, phiRAST, PHAST:
  – RAST: Aziz et al., BMC Genomics 2008
  – SEED servers: Aziz RK, et al. (2012) PLoS ONE 7(10): e48053.
  – Nucleic Acids Res. 2014 Jan;42(Database issue):D206-14
• Letters of support
Questions?