from sequence to knowledge (phage genomics workshop intro at the 22nd biennial evergreen phage...

Post on 21-Jan-2018

From Sequence to Knowledge

Assembly, Annotation, and Analysis of Phage Genomes from Genomic and Metagenomic Data Sets

A Helping Hand through the Annotation Bottleneck

Ramy K. Aziz

Workshop presenters

6 Aug 2017 Phage Genomics - Evergreen 2017

Alejandro Reyes (AR)

Ramy Aziz (RA)

Jason Gill (JG)

PRELUDE

A bit of history…

• Since 2009, the Genomics Workshop has become an essential part of the Evergreen phage meeting

• The challenge is always how to meet needs and expectations that are so many and so diverse, in ~4 hours

• The answer is:

…….

The 2011 workshop

The 2013 workshop

The 2015 workshop

MOTIVATION

“The analysis bottleneck”

• Observation:

– We generate more data than we can analyze.

– We generate sequence data faster than we can analyze them.

• Opinion:

– Not all bottlenecks are created equal!

– It is important to define the question(s) before working on the answer(s)!

“The analysis bottleneck”

• The Lavigne paradox (2013)

• The Lavigne paradox (2015)

AUDIENCE

Workshop audience

• Who (how many) among you have:

– annotated at least one phage genome?

– worked on a viral metagenome?

– used the command line (Unix, Linux, Mac Terminal) for sequence analysis?

• We actually ran an online survey, and here is what we found …

Quick group activity

Defining the question(s):

• Introduce yourself, your institution, and your favorite phage

• Do you have a genome sequenced? Planning to?

– Why have you sequenced your phage genome?

– Why do you want to sequence your phage genome?

• What is the single most pressing question you want to have answered from genome analysis?

DEFINING THE QUESTION(S)

What you want … is

- complete

- accurate

[Figure: two panels, "from genome" (Credit: Andrew Kropinski) and "from metagenome" (Credit: Bas Dutilh), with the labels "incomplete", "frameshift", and "faulty assembly"]

A process of reconstruction

• Experimentally

• Computationally

[Figure: the same sequence fragments shown for two sources, "DNA" and "eDNA" ("Any phage one can get!")]

TGATTGTGTGTTTGCGCAATGCG

ATGTGTATATATAGTGAGCTTGCCC

GTCTCTCTNNNTCTCTTG

TGATTGGTCTNNNTCTCTTGCGCAATGCG
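To make the idea of computational reconstruction concrete, here is a minimal sketch of greedy overlap merging on toy reads. It is an illustration only and not the method of any assembler covered in this workshop: the function names, the `min_len` threshold, and the example reads are all hypothetical, and real assemblers also have to handle sequencing errors, repeats, and both strands.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b.

    Returns 0 if the best overlap is shorter than min_len.
    """
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the two reads with the longest suffix/prefix overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(contigs) > 1:
        best_n, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i == j:
                    continue
                n = overlap(a, b, min_len)
                if n > best_n:
                    best_n, best_i, best_j = n, i, j
        if best_n == 0:
            break  # nothing overlaps any more: keep separate contigs
        merged = contigs[best_i] + contigs[best_j][best_n:]
        contigs = [c for k, c in enumerate(contigs)
                   if k not in (best_i, best_j)] + [merged]
    return contigs


# Hypothetical toy reads, invented for this example
toy_reads = ["ATGCGTAC", "GTACGTTA", "GTTAGC"]
print(greedy_assemble(toy_reads))
```

Reads that share no overlap of at least `min_len` bases are left as separate contigs, which is a small-scale picture of the "incomplete" outcome mentioned earlier.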

From Sequence to Knowledge

From raw sequence data to genome submission/publication

• Assembly

• Gene finding/ORF calling

• tRNA calling

• Annotation (assigning functions)

• Orienting

• Validation

• Fixing frameshifts

• Introns and inteins

• Subsystem assignment

• Refinement/secondary annotation loop

• Special purpose: toxins, morons, integrases, lifestyle prediction

• Regulatory elements (promoters, terminators)

• Output: files and graphics
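As a rough illustration of the gene finding/ORF calling step, the sketch below scans all six reading frames for stretches that run from an ATG to the next in-frame stop codon. It is a toy under stated assumptions, not the algorithm of any tool in this workshop: it assumes ATG as the only start codon and TAA/TAG/TGA as stops, and the names `find_orfs` and `min_codons` are made up for this example. Real gene callers also consider alternative start codons, ribosome binding sites, and coding statistics.

```python
START = "ATG"                      # assumption: ATG is the only start codon
STOPS = {"TAA", "TAG", "TGA"}      # assumption: standard stop codons
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A, C, G, T, N)."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq, min_codons=3):
    """Return a list of (strand, start, end, orf_sequence) tuples.

    start and end are 0-based, end-exclusive offsets on the strand that was
    scanned ('+' is seq as given, '-' is its reverse complement).
    min_codons counts the start and stop codons as well.
    """
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == START:
                    start = i              # open an ORF at the first ATG
                elif start is not None and codon in STOPS:
                    if (i + 3 - start) // 3 >= min_codons:
                        orfs.append((strand, start, i + 3, s[start:i + 3]))
                    start = None           # close it at the in-frame stop
    return orfs


# Hypothetical toy sequence, invented for this example
for orf in find_orfs("CCATGAAATAGCC"):
    print(orf)
```

An ORF that starts but never reaches a stop codon before the end of the sequence is not reported, which is one way an incomplete assembly can hide a gene from this kind of scan.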

Classification

• The phage sequence space (Lima-Mendez et al.)

• The phage proteomic tree (Edwards & Rohwer)

• New: VIP tree http://www.genome.jp/viptree

Countless tools

This workshop: outline

1. Annotation overview

2. Automated tools for genome annotation:

– PhAnToMe/RAST-related tools

– Galaxy/Apollo

3. Tools for metagenome-based analyses:

– Assembly

– Functional prediction via protein families

Where to go from here?

• Part I: General introduction to genome annotation

• Part II: Two levels

– Level 1: Novices and beginners: automated annotation tools

– Level 2: Intermediate to advanced users: command-line-based tools

Online resources/Slideshare

• Data & links:

– http://egybio.net/tutorial

• Slides:

– http://bit.ly/annotation2016

– http://bit.ly/phantome4

– Old tutorials (more detailed, but missing the latest updates):

• Evergreen 2011: http://slidesha.re/phantome1

• http://slidesha.re/phiRAST1 (by Karin Holmfeldt)

• Evergreen 2013: http://bit.ly/phantome2

• Evergreen 2015: http://bit.ly/phantome3
