The Opera of Phantome - Version 2.0 (presented at the 21st Biennial Evergreen Phage Meeting)


Upload: ramy-k-aziz

Post on 15-Aug-2015


Page 1: The Opera of Phantome - Version 2.0 (presented at the 21st Biennial Evergreen Phage Meeting)

The Opera of PhAnToMe 2.0

Ramy K. Aziz (@azizrk)

Aug 02 2015

opus (LT) = work (Pl. opera)

SEED-based phage database (2009-2013-…)

Phage Genomics Workshop, Evergreen 2015

Page 2:

Phage Genomics - Evergreen 2015

As usual, slides will be made available

• Evergreen 2011 workshop
– http://slidesha.re/phantome1
– http://slidesha.re/phiRAST1

• Evergreen 2013 workshop
– http://bit.ly/phantome2

• This year’s workshop
– http://bit.ly/phantome3

• Hashtag for the meeting?
– #Evergreen15

08/02/2015

Page 3:

PRELUDE

The Opera of PhAnToMe 2.0

Page 4:

Aims

• Direct
– Discuss the theory behind RAST
– Quickly preview several tools developed under (or under the influence of) the PhAnToMe project
– Demonstrate online, community annotation using SEED

• Indirect
– PhAnToMe 2.0?
– Establish community annotation efforts / design courses / crowdsourcing
– Seek funding? Crowdfunding?

Page 5:

Outline

• Act I. The environment (the SEED)
– The SEED and the ‘Subsystems Technology’

• Act II. The toolbox (PhAnToMe and sequels)
– The RAST family
– PHACTS
– PhiSpy
– iVireons

• Act III. The community
– Online annotation process
– Annotation summit(s)
– Course design

$$ Writing proposals, applying for grants

Page 6:

History

NSF-funded, 3-year project (09-12) to develop Phage Annotation Tools and Methods

Four centers:
- SDSU, San Diego, CA
- VCU, Richmond, VA
- USF, St. Pete, FL
- UA, Tucson, AZ

Page 7:

Two years ago…

http://www.phantome.org

Page 8:

Current status

MAJOR UPDATE


Page 10:

ACT I. THE ENVIRONMENT

The Opera of PhAnToMe 2.0

Page 11:

I. The Environment: SEED

http://theseed.org

Aziz RK, et al. (2012) PLoS ONE 7(10): e48053. doi:10.1371/journal.pone.0048053


Page 13:

SEED: Main concept

One genome

All genomes

“Subsystems-based technologies were developed in the SEED with the view that the interpretation of one genome can be made more efficient and consistent if hundreds of genomes are simultaneously annotated in one subsystem at a time”

Page 14:

SEED: Main concept

• Protein-based database
Jargon: PEG = protein-encoding gene

• The subsystems approach

and

• FIGfams: protein families based on
– sequence similarity
– chromosomal co-occurrence, gene order, synteny
– human curation, evidence-based expert assertions

Page 15:

RAST: automated annotation

Page 16:

What is a subsystem?

• “A subset of functional roles studied across genomes”
• A spreadsheet where:
– each row represents a genome
– each column represents a functional role / feature / protein
– different patterns = variants

         | Function 1 | Function 2 | … | Function n
Genome a |            |            |   |
Genome b |            |            |   |
Genome z |            |            |   |
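The spreadsheet view of a subsystem can be sketched in a few lines of Python. This is only an illustration: the genome names, functional roles, and PEG identifiers below are hypothetical placeholders, not SEED data, and grouping by presence/absence pattern is a simplified stand-in for how variants are told apart.

```python
from collections import defaultdict

# Hypothetical subsystem: rows are genomes, columns are functional roles.
# Each cell lists PEG identifiers (an empty list means the role was not found).
ROLES = ["Function 1", "Function 2", "Function 3"]

subsystem = {
    "Genome a": {"Function 1": ["peg.1"], "Function 2": ["peg.7"], "Function 3": ["peg.9"]},
    "Genome b": {"Function 1": ["peg.4"], "Function 2": [], "Function 3": ["peg.2"]},
    "Genome z": {"Function 1": ["peg.3"], "Function 2": ["peg.5"], "Function 3": ["peg.8"]},
}


def presence_pattern(row):
    """Return a tuple of booleans: which roles have at least one PEG."""
    return tuple(bool(row.get(role)) for role in ROLES)


def group_by_variant(table):
    """Group genomes that share the same presence/absence pattern."""
    variants = defaultdict(list)
    for genome, row in table.items():
        variants[presence_pattern(row)].append(genome)
    return dict(variants)


variants = group_by_variant(subsystem)
for pattern, genomes in variants.items():
    print(pattern, genomes)
```

Genomes that fill the same columns land in the same group, which is the sense in which "different patterns = variants".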

Page 17:

What is a subsystem?

Page 18:

Advantages of subsystems

Subsystems-based annotation

Page 19:

Annotation (from genome) | Reconstruction (from metagenome)

[Figure labels: incomplete; frameshift; faulty assembly; complete; accurate]

Credit: Andrew Kropinski. Credit: Bas Dutilh.


Page 21:

ACT II. THE TOOLBOX

The Opera of PhAnToMe 2.0

Page 22:

II. PhAnToMe ToolBox

http://www.phantome.org

Page 23:

The ToolBox: The RAST family

• (At least) five ways to annotate a genome via RAST:
– RAST (http://rast.nmpdr.org)
  • annotates online, saves your genome on the server
– myRAST (local)
  • uses the server, but you can edit offline
– “PhAST” (http://www.phantome.org/PhageSeed/Phage.cgi?page=phast)
  • optimized gene-calling
– Use your favorite gene caller, then upload the gbk file to RAST
– RASTtk (second-generation RAST)
  • modular
  • batch upload

[Callout on slide: New]

Page 24:

http://rast.nmpdr.org


Page 26:

“PhAST”: phage-optimized RAST

http://www.phantome.org/PhageSeed/Phage.cgi?page=phast


Page 28:

RASTtk (RAST toolkit)


Page 30:

The RASTtk Microbial Annotation Pipeline

Pipeline steps (flowchart boxes):
– FASTA QC
– FASTA to Genome TO
– Call rRNAs
– Call tRNAs
– Call CDSs: Prodigal
– Call CDSs: Glimmer3
– Annotate Proteins: K-mer v2
– Annotate Proteins: K-mer v1
– Call CRISPRs
– Call Phages (PhiSpy)
– Find Repeats
– Export: GenBank, GFF3, Fasta

• Green boxes are alternative pipeline steps
• Dashed boxes are optional pipeline steps

Page 31:

In final development: phi-RASTtk

Pipeline steps (flowchart boxes):
– FASTA QC
– FASTA to Genome TO
– Call rRNAs
– Call tRNAs
– Call CDSs: Prodigal
– Call CDSs: GeneMark
– Annotate Phage Proteins
– Annotate Proteins: K-mer v2
– Find Repeats
– Find Toxins
– Export: GenBank, GFF3, Fasta

• Green boxes are alternative pipeline steps
• Dashed boxes are optional pipeline steps
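To see at a glance what phi-RASTtk changes relative to RASTtk, the two sets of flowchart boxes can be compared directly. The sketch below uses only the box labels listed on the two pipeline slides; the transcript does not preserve the arrows or which boxes are green or dashed, so the lists record membership only, and the helper `compare_steps` is a hypothetical name introduced here.

```python
# Box labels as listed for the two pipelines. List order is transcription
# order, not a claim about execution order.
RASTTK = [
    "FASTA QC", "FASTA to Genome TO", "Call rRNAs", "Call tRNAs",
    "Call CDSs (Prodigal)", "Call CDSs (Glimmer3)",
    "Annotate Proteins K-mer v2", "Annotate Proteins K-mer v1",
    "Call CRISPRs", "Call Phages (PhiSpy)", "Find Repeats", "Export",
]
PHI_RASTTK = [
    "FASTA QC", "FASTA to Genome TO", "Call rRNAs", "Call tRNAs",
    "Call CDSs (Prodigal)",
    "Call CDSs (GeneMark)",  # assumed spelling of the gene caller's name
    "Annotate Phage Proteins", "Annotate Proteins K-mer v2",
    "Find Repeats", "Find Toxins", "Export",
]


def compare_steps(a, b):
    """Return (shared, only_in_a, only_in_b), preserving list order."""
    in_a, in_b = set(a), set(b)
    shared = [step for step in a if step in in_b]
    only_a = [step for step in a if step not in in_b]
    only_b = [step for step in b if step not in in_a]
    return shared, only_a, only_b


shared, rasttk_only, phi_only = compare_steps(RASTTK, PHI_RASTTK)
print("Shared:", shared)
print("RASTtk only:", rasttk_only)
print("phi-RASTtk only:", phi_only)
```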

Page 32:

RASTtk command-line

Page 33:

RAST video demos available

• Watch on your own:
– http://tutorial.theseed.org

• Possible tutorial on Tuesday at 3 PM + hands-on application

Page 34:

ACTIVITIES/EXERCISES

After this workshop (1 PM)

Page 35:

Phage Genomics - CeBio 2015

What do you need to annotate your genome?

• A sequenced genome
• Format: fasta or genbank (.gbk)
• A RAST username and password

06/02/2015
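Before uploading, it can help to sanity-check that a file really is nucleotide FASTA. The following is a minimal sketch, not RAST's own validation: `check_fasta` is a hypothetical helper, and the accepted alphabet (IUPAC nucleotide codes) is an assumption about what a reasonable input looks like.

```python
import io

VALID_BASES = set("ACGTUNRYKMSWBDHV")  # IUPAC nucleotide codes


def check_fasta(handle):
    """Return (number of records, list of problems) for a nucleotide FASTA stream."""
    records = 0
    problems = []
    seen_header = False
    for lineno, line in enumerate(handle, start=1):
        line = line.strip()
        if not line:
            continue  # blank lines are ignored
        if line.startswith(">"):
            records += 1
            seen_header = True
        elif not seen_header:
            problems.append(f"line {lineno}: sequence before first '>' header")
        elif not set(line.upper()) <= VALID_BASES:
            problems.append(f"line {lineno}: unexpected characters")
    if records == 0:
        problems.append("no '>' header found")
    return records, problems


# Made-up example record
example = io.StringIO(">phage_contig_1\nATGCGTNNAC\nGGCTA\n")
print(check_fasta(example))
```

To check a real file, pass an open file handle instead of the `io.StringIO` example.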

Page 36:

1. Browse your favorite genome


Page 38:

2. Explore the protein page

• Annotation history
• Annotation clearinghouse
• Evidence
– similarities
– literature

Page 39:

2. Explore the protein page

• Find your favorite protein


Page 41:

3. Aligning proteins (in context)

• Evidence > Similarities > Align
• Compare region, advanced settings
• Phylogenetic trees

Page 42:

3. Aligning proteins (in context)

Page 43:

END OF ACTIVITY

Page 44:

Other tools

• PHACTS:
– classifies and predicts lifestyle

• PhiSpy:
– finds prophages

• iVireons:
– predicts phage structural proteins, holins, more to come

Page 45:

The ToolBox: PHACTS

• PHAge Classification Tool Set

• Uses a novel similarity algorithm and a supervised Random Forest classifier to predict whether the lifestyle of a phage, described by its proteome, is virulent or temperate.

• The similarity algorithm creates a training set from phages with known lifestyles and, along with the lifestyle annotation, trains a Random Forest to classify the lifestyle of a phage.

• PHACTS predictions have had a 99% precision rate.

Kate McNair
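For readers unfamiliar with the term, precision in its standard sense is the fraction of positive calls that turn out to be correct, TP / (TP + FP). The sketch below assumes the slide uses the term that way; the counts are hypothetical and chosen only to reproduce a 99% figure, not taken from any PHACTS evaluation.

```python
def precision(true_positives, false_positives):
    """Fraction of positive calls that are correct: TP / (TP + FP)."""
    called = true_positives + false_positives
    if called == 0:
        raise ValueError("no positive predictions to evaluate")
    return true_positives / called


# Hypothetical counts: 100 confident lifestyle calls, 99 of them correct.
print(precision(true_positives=99, false_positives=1))
```

Note that precision says nothing about phages for which no confident call is made; that would be captured by recall.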

Page 46:

PHACTS

• http://www.phantome.org/PHACTS/

• Other applications
– Host prediction: whether a phage infects Gram-positive or Gram-negative bacteria
– Taxonomy prediction: a phage’s Family

Kate McNair

Page 47:

PHACTS

Kate McNair

Page 48:

The ToolBox: PhiSpy

Calculate genomic characteristics
• Transcriptional strand orientation
• Customized AT skew
• Customized GC skew
• Protein length
• Abundance of phage words

Classify prophage region
• Random Forest
• Pre-calculated training genome
• Input bacterial genome
• Produce a rank for each gene

Evaluate predicted prophages
• Phage insertion points
• Similarity of phage proteins

Sajia Akhter
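Two of the characteristics listed for PhiSpy are skew measures. The slide calls them "customized" and does not give the exact formulas, so the sketch below shows only the textbook definitions, (G - C) / (G + C) and (A - T) / (A + T), computed over sliding windows. The function names and the example sequence are made up for illustration.

```python
def skew(seq, plus, minus):
    """(plus - minus) / (plus + minus) for two bases; 0.0 if neither occurs."""
    seq = seq.upper()
    p, m = seq.count(plus), seq.count(minus)
    return (p - m) / (p + m) if (p + m) else 0.0


def gc_skew(seq):
    """Textbook GC skew: (G - C) / (G + C)."""
    return skew(seq, "G", "C")


def at_skew(seq):
    """Textbook AT skew: (A - T) / (A + T)."""
    return skew(seq, "A", "T")


def windowed(seq, size, step, fn):
    """Apply fn to sliding windows along seq; returns (start, value) pairs."""
    return [(i, fn(seq[i:i + size])) for i in range(0, len(seq) - size + 1, step)]


dna = "GGGCATATGCCC"  # made-up sequence
print(gc_skew(dna), at_skew(dna))
print(windowed(dna, size=6, step=3, fn=gc_skew))
```

A sign change in windowed skew along a genome is the kind of local signal that a classifier can use as one feature among several.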

Page 49:

PhiSpy

• Performance comparison in 50 complete bacterial genomes

Application  | % Identified | % FN | % FP
Prophinder   | 89%          | 11%  | 12%
Phage_finder | 82%          | 18%  | 1.33%
PhiSpy       | 94%          | 6%   | 0.66%

Sajia Akhter

Page 50:

PhiSpy

• Download: PhiSpy – http://sourceforge.net/projects/phispy

• PhiSpy is on RASTtk

• Ran PhiSpy on 4,335 bacterial genomes

• Predicted 12,826 prophages in 3,203 genomes
– 9,101 known prophages
– 3,723 undefined prophages

Sajia Akhter
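The counts from the large-scale PhiSpy run translate into a few rates that are easier to remember. This is plain arithmetic on the numbers as stated; the variable names are introduced here for illustration.

```python
genomes_scanned = 4335
genomes_with_prophage = 3203
prophages_predicted = 12826
known, undefined = 9101, 3723  # subcounts as listed; they sum to 12,824

share_with_prophage = genomes_with_prophage / genomes_scanned
mean_per_positive_genome = prophages_predicted / genomes_with_prophage
share_known = known / prophages_predicted

print(f"{share_with_prophage:.1%} of scanned genomes had a predicted prophage")
print(f"{mean_per_positive_genome:.2f} predicted prophages per positive genome")
print(f"{share_known:.1%} of predicted prophages were already known")
```

In other words, roughly three in four genomes carried at least one predicted prophage, with about four predictions per positive genome.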

Page 51:

iVIREONS – http://vdm.sdsu.edu/ivireons

Victor Seguritan

Page 52:

“FAMILIES” OF ANNs

1) General structural proteins:
• Trained with all types of proteins
• Both phages & viruses

2) Phage major capsid proteins

3) Phage tail/tail fibers/collar etc.

4) Holins

5) Portals

Victor Seguritan

Page 53:

1. iVIREONS – http://vdm.sdsu.edu/ivireons

2. Enter User Info (example shown: VibrioPhage, plus an e-mail address)

3. Upload Sequences

Victor Seguritan

Page 54:

4. View Results

5. Copy Results to a Spreadsheet

iVIREONS – http://vdm.sdsu.edu/ivireons

Stringencies Reported
- Structural 1:1
- MCP 1:1
- MCP 2:1
- MCP 3:1
- MCP 4:1
- MCP 7:1
- MCP 22:1 (lambda)
- Tail 1:1
- Tail 2:1
- Tail 4:1
- Tail 7:1
- Tail 6.6:1 (lambda)

Page 55:

ACT III. THE COMMUNITY

The Opera of PhAnToMe 2.0

Page 56:

SEED allows continuous annotation

[Diagram labels: SEED; RAST; Genomes; Subsystems; SEED Viewer; New Genomes; Subsystems Editor]

Page 57:

SEED allows community annotation

Page 58:

Annotations will improve only if YOU help

Page 59:

Prospects

• Phage annotation “summits”
– First summit (Jan 2011) was at Biosphere 2, Tucson, AZ
– A second one?
  • On a summit? (e.g., Bogotá? Mount Sinai?)
  • Red Sea Resort in Egypt??

• Pushing for community annotation
– Undergraduate students (I have about 20 in training)

Page 60:

FINALE

The Opera of PhAnToMe 2.0

Page 61:

Aims

• Direct
– Discuss the theory behind RAST
– Quickly preview several tools developed under (or under the influence of) the PhAnToMe project
– Demonstrate online, community annotation using SEED

• Indirect
– PhAnToMe 2.0?
– Establish community annotation efforts / design courses / crowdsourcing
– Seek funding? Crowdfunding?

Page 62:

Acknowledgments

Robert A. Edwards, PhD

• RASTtk and PhiRAST development: Ross Overbeek, Robert Olson, Jim Davis, Gordon Pusch, Terry Disz, Bruce Parrello

• Phage annotators (Phantomers): Bhakti Dwivedi, Mya Breitbart, et al.

• FIG and all SEED annotators: VeronikaV, SvetaG, OlgaV/Z, et al.

Sajia Akhter

$$ & NSF

Page 63:

Acknowledgments

$$ & NSF

• PHAST
• iVireons

Victor Seguritan
Katelyn McNair

Page 64:

If you use, please cite

• SEED, RAST, myRAST, phiRAST, PHAST:
– RAST: Aziz et al., BMC Genomics 2008
– SEED servers: Aziz RK, et al. (2012) PLoS ONE 7(10): e48053.
– Nucleic Acids Res. 2014 Jan;42(Database issue):D206-14

• Letters of support

Page 65:

Questions?