ECCMID 2016 - How to build actionable virulome databases

Post on 14-Jan-2017


Assessing virulence from genomic data - which virulome database?

João André Carriço, Microbiology Institute and Instituto de Medicina Molecular, Faculty of Medicine, University of Lisbon
jcarrico@fm.ul.pt | Twitter: @jacarrico

Session SY024: Controversies in interpreting whole genome sequence data
26th ECCMID, Amsterdam, Netherlands, 7-12 April 2016

How can we design actionable virulome databases


What is a virulence factor?

Virulence factors: a class of gene products that
• help pathogens to invade the host and evade specific host defensive mechanisms
• enhance the pathogen's potential to cause disease

What is a virulence factor?

Virulence factors (examples):
• Bacterial toxins (endotoxins and exotoxins)
• Adherence factors (pili)
• Cell surface carbohydrates and proteins that protect a bacterium (streptococcal M protein)
• Hydrolytic enzymes that may contribute to the pathogenicity of the bacterium (hyaluronidase)
• Factors that compete with the host for nutrient uptake (siderophores)

Sources: VFDB / Medical Microbiology, 4th edition (http://www.ncbi.nlm.nih.gov/books/NBK7627/)

Too much –ome will kill you…

[Diagram: Virulome, Core genome, Accessory genome, Mobilome]

“Virulome” databases

• VFDB (http://www.mgc.ac.cn/VFs/main.htm)
• Pathosystems Resource Integration Center (PATRIC) VF (https://www.patricbrc.org/)
• Victors (http://www.phidias.us/victors/)
• PHI-base (http://www.phi-base.org/)
• MvirDB (http://mvirdb.llnl.gov/)

Criteria for choice: databases focused mainly on virulence factors (as defined in the first slide). This excludes antibiotic resistance databases (CARD, ARDB, ARGO, RAC, …).

VFDB

* Created to facilitate the screening of HTS data

Database last update: Tue Feb 23 22:05:25 2016

PATRIC VF (Pathosystems Resource Integration Center)

• 6 NIAID priority genera:
  ▪ Mycobacterium
  ▪ Salmonella
  ▪ Escherichia
  ▪ Shigella
  ▪ Listeria
  ▪ Bartonella
• 1572 VFs
• 1071 articles
• Use of controlled vocabulary
• Integrates VFDB and Victors VF information
• PATRIC supports:
  ▪ Genome annotation
  ▪ Comparative genomics
  ▪ Transcriptomics
  ▪ Pathways
  ▪ Host-pathogen interaction
  ▪ Disease-related information
• Database last update: March 2016

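Victors

• 5177 virulence factors
• 126 pathogens (class: number of species, number of VFs):
  ▪ Gram-positive: 15 species, 1160 VFs
  ▪ Gram-negative: 36 species, 3488 VFs
  ▪ Virus: 54 species, 179 VFs
  ▪ Parasites: 13 species, 105 VFs
  ▪ Fungi: 8 species, 245 VFs
• Last DB update: 27/8/2014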

PHI-base

• Pathogenicity, virulence and effector genes from:
  ▪ Fungal pathogens
  ▪ Oomycete pathogens
  ▪ Bacterial pathogens
• Hosts:
  ▪ Animal
  ▪ Plant
  ▪ Fungal
  ▪ Insect

MvirDB

• Biodefense focused
• Last update: 2007 (?)
• Data still available for download

Greatest strengths

All the databases have:
• manually curated data
• links to the original publication

However, manual curation is also a major caveat, because the process is hard to sustain.

How to use these resources

• Querying the annotation on the website
• Selecting a species of interest and browsing the website
• BLAST query for DNA or protein

How to use these resources

• Download the gene/protein databases and use them as templates for searching your own data
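A minimal sketch of using a downloaded gene set as search templates follows. In practice this step is usually done with an alignment tool such as BLAST. The k-mer containment screen below is a deliberately simplified stand-in, swapped in so that the example runs with only the Python standard library. The function names, the k-mer size, the threshold and the sequences are hypothetical illustration values, not data from any of the databases.

```python
from io import StringIO


def read_fasta(handle):
    """Parse FASTA text from a file-like object into {header_id: sequence}."""
    records, name, chunks = {}, None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks)
            name, chunks = line[1:].split()[0], []
        else:
            chunks.append(line.upper())
    if name is not None:
        records[name] = "".join(chunks)
    return records


def kmers(seq, k):
    """Return the set of all k-length substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def screen(contigs, vf_genes, k=8, min_fraction=0.9):
    """Report VF genes whose k-mers are mostly present in the contigs.

    Toy stand-in for an alignment search: forward strand only, no gaps.
    """
    contig_kmers = set()
    for seq in contigs.values():
        contig_kmers |= kmers(seq, k)
    hits = {}
    for gene, seq in vf_genes.items():
        gene_kmers = kmers(seq, k)
        if not gene_kmers:
            continue
        fraction = len(gene_kmers & contig_kmers) / len(gene_kmers)
        if fraction >= min_fraction:
            hits[gene] = round(fraction, 2)
    return hits


# Hypothetical sequences, invented for illustration only.
VF_FASTA = """>vfA hypothetical virulence gene
ATGGCTAGCTAGGATCCGATCGATTTGACC
>vfB another hypothetical gene
ATGTTTCCCGGGAAATTTCCCGGGAAATAA
"""
CONTIG_FASTA = """>contig1
GGGGATGGCTAGCTAGGATCCGATCGATTTGACCTTTT
"""

vf_genes = read_fasta(StringIO(VF_FASTA))
contigs = read_fasta(StringIO(CONTIG_FASTA))
hits = screen(contigs, vf_genes)
print(hits)
```

Only the gene embedded in the made-up contig is reported. A real screen would also need to handle the reverse strand, mismatches and partial hits, which is what alignment tools provide.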

How to use these resources

MVLST/MLST-v

How to use these resources

With HTS, several core genome / whole genome MLST schemas are becoming available or being developed:
• Neisseria sp.
• Campylobacter sp.
• Staphylococcus aureus
• Legionella pneumophila
• Listeria monocytogenes
• Enterococcus faecium
• Mycobacterium tuberculosis
• Acinetobacter baumannii
• Salmonella enterica
• E. coli
• …

Loci in these schemas can be annotated with / linked to the virulence factor DBs, so that alleles are annotated automatically through these systems.
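The linking step could look roughly like the sketch below. The locus identifiers, gene names, categories and allele numbers are hypothetical placeholders, not entries from any real schema or database. The point is only that once loci are linked to VF records, an allele profile can be annotated without manual lookup.

```python
# Hypothetical link table: schema locus -> virulence factor record.
# In a real setup this would be derived from a VF database download.
LOCUS_TO_VF = {
    "LOCUS_0001": {"gene": "toxA", "category": "toxin", "source": "VF database X"},
    "LOCUS_0042": {"gene": "pilB", "category": "adherence", "source": "VF database Y"},
}


def annotate_profile(profile, locus_to_vf):
    """Attach VF annotation to each called allele in a cg/wgMLST profile.

    profile: {locus_id: allele_number, or None when the locus was not found}
    Returns one dict per VF-linked locus that is present in the isolate.
    """
    annotated = []
    for locus, allele in sorted(profile.items()):
        vf = locus_to_vf.get(locus)
        if vf is None or allele is None:
            continue  # not a VF-linked locus, or locus absent in this isolate
        annotated.append({"locus": locus, "allele": allele, **vf})
    return annotated


# Hypothetical allele profile for one isolate.
isolate_profile = {
    "LOCUS_0001": 7,
    "LOCUS_0002": 3,      # no VF link in the table above
    "LOCUS_0042": None,   # VF-linked locus not detected
}

annotated_rows = annotate_profile(isolate_profile, LOCUS_TO_VF)
for row in annotated_rows:
    print(row["locus"], "allele", row["allele"], row["gene"], row["category"])
```

Keeping the link table separate from the schema means the annotation can be refreshed whenever the VF database is updated, without touching the allele calls.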

SeqSphere+

http://pubmlst.org/
http://bigsdb.web.pasteur.fr/
https://enterobase.warwick.ac.uk/

BioNumerics 7.5

Back to the title

So far we have seen what is available.

How can we design actionable virulome databases?

Actionable: able to be done or acted on; having practical value (New Oxford American Dictionary)

Bioinformatics needs

• Available databases still lack interfaces for programmatic access. RESTful APIs would allow:
  ▪ easy automatic querying from scripts, without the need for web interfaces or downloads
  ▪ database updates by authorized groups (distributed curation effort)

APIs: Application Programming Interfaces
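As a rough illustration of what such programmatic access could look like from a script, consider the sketch below. The base URL, the `/vf` path, the query parameters and the JSON fields are all hypothetical: none of the databases listed is described here as offering such an endpoint. The network call is defined but not executed, and the parsing is shown on a made-up sample response.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Hypothetical endpoint: no such API is described in the source.
BASE_URL = "https://virulome.example.org/api/v1"


def build_query_url(species, category=None):
    """Build the URL for a hypothetical GET /vf?species=...&category=... call."""
    params = {"species": species}
    if category:
        params["category"] = category
    return f"{BASE_URL}/vf?{urlencode(params)}"


def parse_response(payload):
    """Extract (gene, category) pairs from an assumed JSON response body."""
    data = json.loads(payload)
    return [(item["gene"], item["category"]) for item in data["results"]]


def fetch_virulence_factors(species, category=None):
    """Query the (hypothetical) service; needs a real server to work."""
    with urlopen(build_query_url(species, category), timeout=30) as response:
        return parse_response(response.read().decode("utf-8"))


# Made-up response, standing in for what a server might return.
SAMPLE = '{"results": [{"gene": "geneA", "category": "toxin"}]}'

query_url = build_query_url("Listeria monocytogenes", "toxin")
parsed = parse_response(SAMPLE)
print(query_url)
print(parsed)
```

With an interface of this kind, a pipeline could refresh its VF annotations on every run instead of depending on a periodically downloaded snapshot.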

Bioinformatics needs

• Existing DBs reuse each other's datasets without true database interoperability: there is a need for common ontologies (controlled vocabularies already exist but are not used by all).
• Ontologies and computer-readable data formats (JSON-LD or RDF) can allow true database interoperability, letting bioinformaticians extract the targeted information with a single query that reaches multiple databases.
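To make the interoperability idea concrete, here is a small sketch of two records from two different sources expressed as JSON-LD against one shared vocabulary. The vocabulary IRI (`https://example.org/vf-ontology#`), the term names and the records are hypothetical. A real deployment would point to terms from an agreed, published ontology.

```python
import json

# Hypothetical shared vocabulary; example.org stands in for a real ontology.
CONTEXT = {
    "vf": "https://example.org/vf-ontology#",
    "gene": "vf:geneSymbol",
    "category": "vf:virulenceCategory",
    "organism": "vf:organism",
}


def to_jsonld(record_id, gene, category, organism):
    """Wrap one database record as a JSON-LD document."""
    return {
        "@context": CONTEXT,
        "@id": record_id,
        "gene": gene,
        "category": category,
        "organism": organism,
    }


def query(records, category_iri):
    """Return genes from any source whose category matches a shared term."""
    return sorted(r["gene"] for r in records if r["category"]["@id"] == category_iri)


# Two made-up records from two made-up databases using the same term.
TOXIN = {"@id": "vf:Toxin"}
db_a = to_jsonld("https://db-a.example.org/vf/1", "geneA", TOXIN, "Species one")
db_b = to_jsonld("https://db-b.example.org/vf/9", "geneB", TOXIN, "Species two")

toxin_genes = query([db_a, db_b], "vf:Toxin")
print(json.dumps(db_a, indent=2))
print(toxin_genes)
```

Because both sources label the category with the same term, a single filter works across them. Without a shared vocabulary, each database's free-text labels would have to be mapped by hand.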

Controlled vocabularies and Ontologies

Trends Microbiol 17, 279–285 (2009).

Sustainability needs

Major problems of databases

• Manual curation is still a necessity
• Academic model for the sustainability of a resource: lack of funding leads to “dead” databases

Take home messages

• Existing virulome databases provide a wealth of data.
• A large part of the available VF data overlaps between DBs. The overlap largely depends on when each database was last updated and what was included.
• They are always a work in progress, relying heavily on manual curation.
• Novel HTS-based techniques such as cg/wgMLST can use these databases to annotate schemas and provide a much richer picture of VF diversity at the DNA/protein level.

on VF

Acknowledgments

• UMMI Members
• Mário Ramirez
• José Melo-Cristino

EFSA INNUENDO Project (https://sites.google.com/site/innuendocon/):
• Mirko Rossi

FP7 PathoNGenTrace (http://www.patho-ngen-trace.eu/):
• Dag Harmsen (Univ. Muenster)
• Stefan Niemann (Research Center Borstel)
• Keith Jolley, James Bray and Martin Maiden (Univ. Oxford)
• Joerg Rothganger (RIDOM)
• Hannes Pouseele (Applied Maths)

Genome Canada IRIDA project, Integrated Rapid Infectious Disease Analysis (www.irida.ca):
• Franklin Bristow, Thomas Matthews, Aaron Petkau, Morag Graham and Gary Van Domselaar (NML, PHAC)
• Ed Taboada and Peter Kruczkiewicz (Lab Foodborne Zoonoses, PHAC)
• Fiona Brinkman (SFU)
• William Hsiao (BCCDC)
