GalaxyTrakr: Development of an
Accessible Cloud-based
Bioinformatics Platform
James Pettengill
Geneticist, Biostatistics and Bioinformatics Staff
Center for Food Safety and Applied Nutrition
US Food and Drug Administration
Food Safety & High-Throughput Sequencing (HTS)
Institute for Food Safety and Health
May 31, 2018
Outline:
1. Galaxy: a user-friendly interface for bioinformatics
• Introduction to Galaxy
• GalaxyTrakr Overview
• GalaxyTrakr Tools
2. CFSAN cgMLST: rapid screen and clustering of
isolates
• Internal rapid identification of SNP clusters for outbreak
analyses
• Resource for others/industry
What is Galaxy?
Why Galaxy?
- Has a graphical user interface (GUI) so does not require
command line experience
- Active community of developers/users sharing the tools they
have developed or ported to Galaxy*
- Access programs through the Galaxy Tool Shed
Summary of Galaxy on AWS
• Galaxy has an Academic Free License.
– https://galaxyproject.org/
• Installed on a cloud formation cluster master node.
• Submits jobs to compute cluster via Grid Engine.
• Compute clusters are elastic, based on demand.
• Storage is elastic and accessible from multiple master
nodes.
• Two options for installation on AWS:
– https://aws.amazon.com/hpc/cfncluster/ **
– https://galaxyproject.org/cloudman/getting-started/
GalaxyTrakr Tools
• NGS QC and Manipulation
– Trimmomatic, FastQC
• NGS Mapping
– Bowtie2, Short Read Sequence Typing (SRST2), BWA and BWA-MEM, Neptune Signature Discovery
• NGS Assembly
– plasmidSPAdes, SPAdes, QUAST
• NGS Screening and Prediction
– SeqSero v1 and v2, SeqSero Batch Paired-End Reads, sistr_cmd, BTyper, MLST, ABRicate
• Data Input
– Direct from NCBI in Pileup, BAM or FASTA/Q format
– Upload from local computer via secure FTP or via GalaxyTrakr web interface
• Data Output
– Download from GalaxyTrakr web interface
– Download via FTP
• Reference based variant detection
– CFSAN SNP Pipeline
GalaxyTrakr Stats
• Currently 139 active users across 42 different locations
worldwide, adding about 15 users per week
• Over 38,000 jobs processed to date, top users using
over 3,500 hours and 11,000 CPU slots
• Cost with current load is approximately $6,500 a month;
the initial target was $10,000 a month
• Custom software for automated monitoring and
management; less than 1 full-time staff member
manages IT services. Detailed custom dashboard:
http://dash.galaxytrakr.org/
An example with SeqSero:
• Uses whole genome sequence data to predict serotype.
• Useful tool for QA/QC
• Maps reads to database of antigen alleles using BWA in multiple steps.
• Chooses the alleles to which the most reads mapped.
• Uses BLAST to clear up ambiguities.
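The allele-choice step above can be illustrated with a small sketch. This is not SeqSero's actual code: the function name `pick_alleles`, the input format (one `(antigen, allele)` pair per mapped read), and the allele labels are all assumptions made for illustration. It only shows the idea of counting mapped reads per allele, keeping the best-supported one, and flagging ties that a later step (such as the BLAST step mentioned above) would need to resolve.

```python
from collections import Counter, defaultdict


def pick_alleles(read_hits):
    """Pick the best-supported allele for each antigen.

    read_hits: iterable of (antigen, allele) pairs, one per mapped read
    (a hypothetical input format, not SeqSero's real intermediate files).
    Returns {antigen: (allele, read_count, is_ambiguous)}.
    """
    counts = defaultdict(Counter)
    for antigen, allele in read_hits:
        counts[antigen][allele] += 1

    calls = {}
    for antigen, allele_counts in counts.items():
        ranked = allele_counts.most_common()
        best_allele, best_n = ranked[0]
        # A tie for first place is ambiguous and needs a follow-up step.
        tied = len(ranked) > 1 and ranked[1][1] == best_n
        calls[antigen] = (best_allele, best_n, tied)
    return calls


# Illustrative placeholder data: antigen and allele labels are made up.
hits = [
    ("O", "O-4"), ("O", "O-4"), ("O", "O-9"),
    ("H1", "i"), ("H1", "i"), ("H1", "r"), ("H1", "r"),
    ("H2", "1,2"),
]
calls = pick_alleles(hits)
for antigen, (allele, n, ambiguous) in calls.items():
    print(antigen, allele, n, "ambiguous" if ambiguous else "ok")
```

In this toy input the H1 antigen has two alleles with equal read support, so it is flagged as ambiguous rather than called outright.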
Galaxy homepage
Upload your data
Choose your data
Run it
Wait…
View and Download the Results
CFSAN SNP Pipeline
cgMLST: core genome multi-locus sequence type
cgMLST creation
• Collection of reference genomes
• Annotate with PROKKA to give annotated references
• All against all comparison of genes
• Identify single copy core genes
• cgMLST database
cgMLST in practice
• Genome of interest
• Annotation
• Isolate and compare cgMLST loci to determine closest isolates
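The "identify single copy core genes" step of cgMLST creation can be sketched as a simple filter. This is a minimal illustration, not the CFSAN implementation: it assumes the all against all comparison has already grouped genes into families and produced a copy-number table per reference genome. The function name `single_copy_core` and the genome and family names are hypothetical.

```python
def single_copy_core(gene_counts):
    """Return gene families present exactly once in every genome.

    gene_counts: {genome_name: {gene_family: copy_number}}
    (an assumed summary of the all-against-all gene comparison).
    """
    genomes = list(gene_counts.values())
    if not genomes:
        return set()
    # A core family must appear in every genome, so the first genome's
    # families are a sufficient candidate list.
    candidates = set(genomes[0])
    return {
        family
        for family in candidates
        if all(genome.get(family, 0) == 1 for genome in genomes)
    }


# Illustrative placeholder data.
counts = {
    "ref_A": {"fam1": 1, "fam2": 1, "fam3": 2},
    "ref_B": {"fam1": 1, "fam2": 1, "fam3": 1, "fam4": 1},
    "ref_C": {"fam1": 1, "fam3": 1},
}
core_loci = sorted(single_copy_core(counts))
print(core_loci)
```

Here `fam2` is dropped because it is missing from one reference, and `fam3` is dropped because it is duplicated in another, leaving only `fam1` as a candidate cgMLST locus.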
cgMLST has the potential to incorporate many approaches…
• SNP detection/calling (reference)
• Open-reading frame annotation (de-novo); presence/absence and extended MLST
• Whole-chromosome alignment (de-novo)
…providing extreme sensitivity and valuable genotypic and phenotypic information.
cgMLST: rapid screening of GenomeTrakr/Pathogen
database to identify similar isolates
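One common way to screen a database for similar isolates is to compare allele profiles locus by locus. The sketch below assumes that approach; the slides do not state how the CFSAN cgMLST screen computes similarity. The profile format (locus name mapped to an integer allele ID, with `None` for an uncalled locus), the function names, and the isolate names are hypothetical.

```python
def allele_distance(profile_a, profile_b):
    """Count loci called in both profiles whose allele IDs differ.

    Loci that are missing or uncalled (None) in either profile are skipped.
    """
    distance = 0
    for locus, allele_a in profile_a.items():
        allele_b = profile_b.get(locus)
        if allele_a is None or allele_b is None:
            continue
        if allele_a != allele_b:
            distance += 1
    return distance


def closest_isolates(query, database, top_n=3):
    """Rank database isolates by allele distance to the query profile."""
    ranked = sorted(
        (allele_distance(query, profile), name)
        for name, profile in database.items()
    )
    return [(name, dist) for dist, name in ranked[:top_n]]


# Illustrative placeholder profiles.
query = {"locus1": 5, "locus2": 12, "locus3": 7, "locus4": None}
database = {
    "isolate_X": {"locus1": 5, "locus2": 12, "locus3": 7, "locus4": 3},
    "isolate_Y": {"locus1": 5, "locus2": 14, "locus3": 7, "locus4": 3},
    "isolate_Z": {"locus1": 2, "locus2": 14, "locus3": 9, "locus4": 1},
}
nearest = closest_isolates(query, database, top_n=2)
print(nearest)
```

Because each comparison is a single pass over the loci, a screen like this is cheap enough to run against a large database before committing to a full SNP analysis on the closest matches.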
Summary:
1. GalaxyTrakr: a user-friendly interface for bioinformatics
• Open source
• Lots of tools
• Active community and support available
• CFSAN’s GalaxyTrakr may not be suitable for industry as it’s intended for public data – but industry could stand up its own Galaxy instance in-house.
2. CFSAN cgMLST: rapid screen and clustering of isolates
• Internal rapid identification of SNP clusters for outbreak analyses
• Resource for others/industry – requires some bioinformatic expertise
Acknowledgements
GalaxyTrakr
• James Sanders**
• Charles Strittmatter
• Justin Payne
• Errol Strain
• Hugh Rand
cgMLST
• Arthur Pightling