![Page 1: IRIDA: Canada’s federated platform for genomic epidemiology](https://reader035.vdocument.in/reader035/viewer/2022081604/58a0c7c71a28ab6d018b55e3/html5/thumbnails/1.jpg)
IRIDA: Canada’s federated platform for genomic epidemiology
William Hsiao, [email protected]
@wlhsiao
BC Public Health Microbiology and Reference Laboratory and University of British Columbia
ABPHM 2015
![Page 2: IRIDA: Canada’s federated platform for genomic epidemiology](https://reader035.vdocument.in/reader035/viewer/2022081604/58a0c7c71a28ab6d018b55e3/html5/thumbnails/2.jpg)
Genome Canada Bioinformatics Competition: Large-Scale Project
“A Federated Bioinformatics Platform for Public Health Microbial Genomics”
Our Goal
The IRIDA platform (Integrated Rapid Infectious Disease Analysis)
An open-source, standards-compliant, high-quality genomic epidemiology analysis platform to support real-time (food-borne) disease outbreak investigations
www.IRIDA.ca
![Page 3: IRIDA: Canada’s federated platform for genomic epidemiology](https://reader035.vdocument.in/reader035/viewer/2022081604/58a0c7c71a28ab6d018b55e3/html5/thumbnails/3.jpg)
Each year, one in eight Canadians (or four million people)
get sick with a domestically acquired food-borne illness.
![Page 4: IRIDA: Canada’s federated platform for genomic epidemiology](https://reader035.vdocument.in/reader035/viewer/2022081604/58a0c7c71a28ab6d018b55e3/html5/thumbnails/4.jpg)
Partnership among public health agencies and academic institutes to bridge the gaps between advancements in genomic epidemiology and application to real-life and real-time use cases in public health agencies
- Project Team has direct access to state-of-the-art research in academia
- Project Team is directly embedded in user organization
![Page 5: IRIDA: Canada’s federated platform for genomic epidemiology](https://reader035.vdocument.in/reader035/viewer/2022081604/58a0c7c71a28ab6d018b55e3/html5/thumbnails/5.jpg)
Interviews with key personnel to identify barriers to implementing genomic epidemiology in public health agencies
![Page 6: IRIDA: Canada’s federated platform for genomic epidemiology](https://reader035.vdocument.in/reader035/viewer/2022081604/58a0c7c71a28ab6d018b55e3/html5/thumbnails/6.jpg)
GAP 1: PUBLIC HEALTH PERSONNEL LACK TRAINING IN GENOMICS
![Page 7: IRIDA: Canada’s federated platform for genomic epidemiology](https://reader035.vdocument.in/reader035/viewer/2022081604/58a0c7c71a28ab6d018b55e3/html5/thumbnails/7.jpg)
Solution 1a: Build a User Friendly, high quality analysis platform to process genomics data
• A carefully designed and engineered software platform is just the starting point…
[Diagram labels: User Interface; Security; File system; Metadata Storage; Application logic; REST API; Workflow Execution Manager; Continuous Integration; Documentation]
![Page 8: IRIDA: Canada’s federated platform for genomic epidemiology](https://reader035.vdocument.in/reader035/viewer/2022081604/58a0c7c71a28ab6d018b55e3/html5/thumbnails/8.jpg)
Solution 1a: Build a User Friendly, high quality analysis platform to process genomics data
• Easy-to-use interface hiding the technical details
![Page 9: IRIDA: Canada’s federated platform for genomic epidemiology](https://reader035.vdocument.in/reader035/viewer/2022081604/58a0c7c71a28ab6d018b55e3/html5/thumbnails/9.jpg)
Solution 1a: Build a User Friendly, high quality analysis platform to process genomics data
![Page 10: IRIDA: Canada’s federated platform for genomic epidemiology](https://reader035.vdocument.in/reader035/viewer/2022081604/58a0c7c71a28ab6d018b55e3/html5/thumbnails/10.jpg)
Solution 1b: Build Portable and Transparent Pipelines
• Use Galaxy as workflow engine – large community support
• Retooled to address usability, security, and other limitations
• Version Controlled Pipeline Templates
• Input files, parameters, and workflow are sent to IRIDA-specific Galaxy for execution
• Results and provenance information are copied from Galaxy
[Diagram labels: IRIDA UI/DB; REST API; Shared File System; Galaxy (Assembly Tools, Variant Calling Tools, …); Workers]
1. Input files sent to Galaxy
2. Tools executed on Galaxy workers
3. Results downloaded from Galaxy
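The three-step exchange on this slide can be sketched roughly as follows. This is a minimal in-memory illustration, not IRIDA's or Galaxy's actual API: `PipelineTemplate`, `FakeGalaxy`, and `run_pipeline` are hypothetical names, and the "tool" is a stand-in that only reports file sizes. The point is that the template version, parameters, and input checksums travel back with the results as provenance.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class PipelineTemplate:
    """A version-controlled pipeline description (hypothetical structure)."""
    name: str
    version: str
    parameters: dict = field(default_factory=dict)


class FakeGalaxy:
    """Stand-in for an IRIDA-specific Galaxy instance; no real Galaxy calls are made."""

    def __init__(self):
        self.uploads = {}

    def upload(self, filename, content):
        # Step 1: input files are sent to Galaxy.
        self.uploads[filename] = content

    def execute(self, template):
        # Step 2: tools run on Galaxy workers. Here the "tool" just reports file sizes.
        return {name: len(data) for name, data in sorted(self.uploads.items())}


def run_pipeline(galaxy, template, inputs):
    for filename, content in inputs.items():
        galaxy.upload(filename, content)
    results = galaxy.execute(template)
    # Step 3: results and provenance are copied back to the IRIDA side.
    provenance = {
        "pipeline": template.name,
        "version": template.version,
        "parameters": dict(template.parameters),
        "input_sha256": {
            name: hashlib.sha256(data).hexdigest() for name, data in inputs.items()
        },
    }
    return results, provenance


template = PipelineTemplate("assembly", "0.1.0", {"min_coverage": 10})
results, provenance = run_pipeline(FakeGalaxy(), template, {"sample1.fastq": b"ACGT"})
print(results)
print(provenance["pipeline"], provenance["version"])
```

Recording the template version next to the input checksums is one way to make a result reproducible later; the real platform may store provenance differently.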
![Page 11: IRIDA: Canada’s federated platform for genomic epidemiology](https://reader035.vdocument.in/reader035/viewer/2022081604/58a0c7c71a28ab6d018b55e3/html5/thumbnails/11.jpg)
Solution 1c: Start the training NOW!
• Canada’s National Microbiology Laboratory has hosted genomic workshops for partners and collaborators
• IRIDA Project has dedicated funding for hosting workshops in 4Q of 2015 and 2016
• We would like to hear about other training initiatives and share experience and training material
![Page 12: IRIDA: Canada’s federated platform for genomic epidemiology](https://reader035.vdocument.in/reader035/viewer/2022081604/58a0c7c71a28ab6d018b55e3/html5/thumbnails/12.jpg)
GAP 2: INFORMATION SHARING IS INEFFICIENT AND AD-HOC
![Page 13: IRIDA: Canada’s federated platform for genomic epidemiology](https://reader035.vdocument.in/reader035/viewer/2022081604/58a0c7c71a28ab6d018b55e3/html5/thumbnails/13.jpg)
Many Players in surveillance and outbreak – ineffective information sharing
[Diagram labels: Cases; Physicians; Frontline lab; Local public health dept.; Provincial laboratory; Provincial public health dept.; National laboratory; Information; Bioinformatics and Analytical Capacities]
Source: M. Taylor, BCCDC
![Page 14: IRIDA: Canada’s federated platform for genomic epidemiology](https://reader035.vdocument.in/reader035/viewer/2022081604/58a0c7c71a28ab6d018b55e3/html5/thumbnails/14.jpg)
Many Systems used in Reporting Diseases – require data re-entry and re-coding
[Diagram labels: Cases; Physicians; Local laboratory; Local public health dept.; Provincial laboratory; Provincial public health dept.; National laboratory; National Ministry of Health]
[Reporting channels shown: Fax/Electronic; Fax; Phone/Fax; Electronic/Paper; Electronic/Fax/Phone; Mailing of Samples/Fax/Electronic]
Source: M. Taylor, BCCDC
![Page 15: IRIDA: Canada’s federated platform for genomic epidemiology](https://reader035.vdocument.in/reader035/viewer/2022081604/58a0c7c71a28ab6d018b55e3/html5/thumbnails/15.jpg)
IRIDA is designed with these dilemmas in mind
• Solutions:
– 2a: Localized Instance of federated databases
– 2b: Permission Control – authentication/authorization for information sharing
– 2c: User role-based display of information
![Page 16: IRIDA: Canada’s federated platform for genomic epidemiology](https://reader035.vdocument.in/reader035/viewer/2022081604/58a0c7c71a28ab6d018b55e3/html5/thumbnails/16.jpg)
Solution 2a: Local/Cloud Instances and Data Federation
• Data processing capacity pushed to data generating labs
• Allow data sharing securely for enhanced analysis
• Eventually cultivating a culture of openness of data sharing and collaborative development of tools
![Page 17: IRIDA: Canada’s federated platform for genomic epidemiology](https://reader035.vdocument.in/reader035/viewer/2022081604/58a0c7c71a28ab6d018b55e3/html5/thumbnails/17.jpg)
Solution 2b: Security
[Diagram label: Authorization]
• Local authorization per instance.
• Method-level authorization.
• Object-level authorization.
• Allow secure, fine grained and flexible information sharing
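To make the difference between method-level and object-level authorization concrete, here is a minimal sketch. It is not IRIDA's implementation: `User`, `Project`, `requires_role`, and `read_project` are hypothetical names, and membership is modelled as a plain set of user names. The decorator decides whether a caller may invoke the operation at all; the check inside decides whether this particular object is shared with them.

```python
from dataclasses import dataclass, field
from functools import wraps


class AccessDenied(Exception):
    pass


@dataclass
class User:
    name: str
    roles: set = field(default_factory=set)


@dataclass
class Project:
    name: str
    members: set = field(default_factory=set)  # names of users this object is shared with


def requires_role(role):
    """Method-level check: the caller must hold `role` to invoke the function at all."""
    def decorator(func):
        @wraps(func)
        def wrapper(user, *args, **kwargs):
            if role not in user.roles:
                raise AccessDenied(f"{user.name} lacks role {role!r}")
            return func(user, *args, **kwargs)
        return wrapper
    return decorator


@requires_role("analyst")
def read_project(user, project):
    # Object-level check: even an analyst sees only projects shared with them.
    if user.name not in project.members:
        raise AccessDenied(f"{user.name} cannot access {project.name}")
    return f"{project.name} data"


alice = User("alice", {"analyst"})
bob = User("bob", {"analyst"})
outbreak = Project("outbreak-2015", {"alice"})

print(read_project(alice, outbreak))
```

Here `bob` passes the method-level check but fails the object-level one, which is the fine-grained behaviour the slide describes.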
![Page 18: IRIDA: Canada’s federated platform for genomic epidemiology](https://reader035.vdocument.in/reader035/viewer/2022081604/58a0c7c71a28ab6d018b55e3/html5/thumbnails/18.jpg)
Solution 2c: Role-based Dynamic Display driven by Ontology
• Ontologies often lack a content management system (CMS)
• An Interface Model Ontology (IFM) can define a CMS for an ontology
![Page 19: IRIDA: Canada’s federated platform for genomic epidemiology](https://reader035.vdocument.in/reader035/viewer/2022081604/58a0c7c71a28ab6d018b55e3/html5/thumbnails/19.jpg)
IFM Interface View Permissions
Detailed View Restricted View
E.g. User role permissions control visibility and editing of content
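A rough sketch of how role permissions could produce a detailed versus a restricted view is shown below. The structure is an assumption for illustration only and does not reflect the actual IFM: `INTERFACE_MODEL`, `render_view`, and all field and role names are invented.

```python
# Hypothetical interface model: for each metadata field, the roles that may
# view it and the roles that may edit it. Field and role names are invented.
INTERFACE_MODEL = {
    "sample_id": {"view": {"epidemiologist", "lab_technologist", "collaborator"},
                  "edit": {"lab_technologist"}},
    "serotype": {"view": {"epidemiologist", "lab_technologist", "collaborator"},
                 "edit": {"lab_technologist"}},
    "patient_postal_code": {"view": {"epidemiologist"},
                            "edit": {"epidemiologist"}},
}


def render_view(record, role):
    """Return only the fields this role may see, flagged as editable or read-only."""
    view = {}
    for field_name, value in record.items():
        permissions = INTERFACE_MODEL.get(field_name)
        if permissions is None or role not in permissions["view"]:
            continue  # fields with no model entry are hidden by default
        view[field_name] = {"value": value, "editable": role in permissions["edit"]}
    return view


record = {"sample_id": "S-001", "serotype": "Enteritidis", "patient_postal_code": "V5Z"}
detailed = render_view(record, "epidemiologist")
restricted = render_view(record, "collaborator")
print(sorted(detailed))
print(sorted(restricted))
```

Hiding fields that have no model entry is a deliberate default in this sketch, so that newly added metadata is not exposed before someone assigns permissions to it.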
![Page 20: IRIDA: Canada’s federated platform for genomic epidemiology](https://reader035.vdocument.in/reader035/viewer/2022081604/58a0c7c71a28ab6d018b55e3/html5/thumbnails/20.jpg)
GAP 3: INFORMATION REPRESENTATION IS INCONSISTENT
![Page 21: IRIDA: Canada’s federated platform for genomic epidemiology](https://reader035.vdocument.in/reader035/viewer/2022081604/58a0c7c71a28ab6d018b55e3/html5/thumbnails/21.jpg)
Solution 3a: Use Ontology
• Ontology: a way to describe types of entities and relations between them
• Why use ontology
– Ontology is flexible and expandable
– Lower levels of expressivity (e.g. controlled vocabulary, data dictionary) are heavy handed and show low level of compliance and adoption
– Free text is used as an alternative but is not computing friendly
– Ontology and semantic web technologies may be a solution
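As a small illustration of why typed entities and relations are more computing friendly than free text, the sketch below stores a toy is_a hierarchy and answers a category query over it. The terms are invented for the example and are not taken from any real food ontology; `IS_A`, `ancestors`, and `is_a` are hypothetical names.

```python
# Toy ontology: each term maps to its direct parent through an is_a relation.
IS_A = {
    "chicken breast": "poultry meat",
    "poultry meat": "meat product",
    "ground beef": "beef",
    "beef": "meat product",
    "meat product": "food",
    "spinach": "leafy vegetable",
    "leafy vegetable": "food",
}


def ancestors(term):
    """Walk the is_a chain upward and return every broader term."""
    result = []
    while term in IS_A:
        term = IS_A[term]
        result.append(term)
    return result


def is_a(term, category):
    return term == category or category in ancestors(term)


# Exposures reported for three cases, recorded as ontology terms rather than free text.
exposures = ["chicken breast", "ground beef", "spinach"]
meat_cases = [e for e in exposures if is_a(e, "meat product")]
print(meat_cases)  # ['chicken breast', 'ground beef']
```

A free-text field would need string matching to discover that "chicken breast" and "ground beef" are both meat products; with explicit relations the query is a graph walk.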
![Page 22: IRIDA: Canada’s federated platform for genomic epidemiology](https://reader035.vdocument.in/reader035/viewer/2022081604/58a0c7c71a28ab6d018b55e3/html5/thumbnails/22.jpg)
Many Domains of Knowledge are needed to describe an outbreak investigation
Build On, Work With:
• OBI
• TypON
• NGSOnto
• NIAID-GSC-BRC core metadata
• MIxS Ontology
• NCBI Biosample etc
• TRANS – Pathogen Transmission
• EPO
• Exposure Ontology
• Infectious Disease Ontology
• CARD, ARO for AMR
• USDA Nutrient DB
• EFSA Comp. Food Consump. DB
Example gaps to be filled: Expand food ontology; expand CARD AMR data with others.
![Page 23: IRIDA: Canada’s federated platform for genomic epidemiology](https://reader035.vdocument.in/reader035/viewer/2022081604/58a0c7c71a28ab6d018b55e3/html5/thumbnails/23.jpg)
Lab Checklist/Ontology
• Currently finishing a lab/genomics checklist and starting an epidemiology checklist
• Metadata Domains:
– Sample Collection
– Sample Source
– Environmental
– Lab Analytics
– Sequencing Process / QC
– Sequencing Run / QC
– Assembly Process / QC
– Others overlapping with Epi: Demographic / Geographic / etc.
![Page 24: IRIDA: Canada’s federated platform for genomic epidemiology](https://reader035.vdocument.in/reader035/viewer/2022081604/58a0c7c71a28ab6d018b55e3/html5/thumbnails/24.jpg)
GAP 4: GENOMIC DATA INTERPRETATION IS COMPLEX AND TECHNOLOGY IS EVOLVING
![Page 25: IRIDA: Canada’s federated platform for genomic epidemiology](https://reader035.vdocument.in/reader035/viewer/2022081604/58a0c7c71a28ab6d018b55e3/html5/thumbnails/25.jpg)
Solution 4a: Use of QA/QC in IRIDA
• Software Engineering
– High quality software that meets regulatory guidelines
– Open Source product to ensure “white box” testing
– Ontology driven software development
– Follow proper software development cycle
• Data Quality
– Built-in modules to check for input data quality
– Warnings and feedback during pipeline execution to laboratory technologists
– Use of Ontology to check metadata (non-genomic) data quality
• Analytic Tool Quality
– Utilize validation datasets
– Use of abstract pipeline description – with version control
– Periodic analysis of exceptions and boundary cases to assess tool accuracy
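A built-in input quality check of the kind listed under Data Quality might look something like the sketch below. It assumes FASTQ input with Phred+33 quality encoding; the function names and both thresholds are illustrative assumptions, not IRIDA's actual modules or defaults. The check returns plain-language warnings that could be shown to a laboratory technologist.

```python
def mean_phred(quality_line, offset=33):
    """Mean Phred score of one FASTQ quality line (Phred+33 encoding by default)."""
    return sum(ord(ch) - offset for ch in quality_line) / len(quality_line)


def check_reads(fastq_text, min_mean_quality=20, min_length=50):
    """Return human-readable warnings; thresholds are illustrative assumptions."""
    lines = fastq_text.strip().splitlines()
    if len(lines) % 4 != 0:
        raise ValueError("FASTQ records must have exactly four lines each")
    warnings = []
    for i in range(0, len(lines), 4):
        header, sequence, _separator, quality = lines[i:i + 4]
        read_id = header[1:]  # drop the leading '@'
        if not sequence:
            warnings.append(f"{read_id}: empty read")
            continue
        if len(sequence) != len(quality):
            warnings.append(f"{read_id}: sequence and quality lengths differ")
            continue
        if len(sequence) < min_length:
            warnings.append(f"{read_id}: length {len(sequence)} below {min_length}")
        if mean_phred(quality) < min_mean_quality:
            warnings.append(
                f"{read_id}: mean quality {mean_phred(quality):.1f} below {min_mean_quality}"
            )
    return warnings


fastq = "\n".join([
    "@read1", "A" * 60, "+", "I" * 60,      # 'I' encodes Phred 40
    "@read2", "ACGTACGTAC", "+", "#" * 10,  # '#' encodes Phred 2
])
warnings = check_reads(fastq)
for message in warnings:
    print(message)
```

Only `read2` is flagged here, once for length and once for quality; a production check would typically also summarize per-run statistics rather than per-read messages.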
![Page 26: IRIDA: Canada’s federated platform for genomic epidemiology](https://reader035.vdocument.in/reader035/viewer/2022081604/58a0c7c71a28ab6d018b55e3/html5/thumbnails/26.jpg)
Solution 4b: Generation of validation datasets
To Participate, Contact Rene [email protected]
Or Errol Strain [email protected]
http://www.globalmicrobialidentifier.org/Workgroups#work-group-4
![Page 27: IRIDA: Canada’s federated platform for genomic epidemiology](https://reader035.vdocument.in/reader035/viewer/2022081604/58a0c7c71a28ab6d018b55e3/html5/thumbnails/27.jpg)
Solution 4c: Exploratory tools can access certain data via REST API securely
IslandViewer: http://pathogenomics.sfu.ca/islandviewer
Dhillon and Laird et al. 2015, Nucleic Acids Research
GenGIS: http://kiwi.cs.dal.ca/GenGIS
Parks et al. 2013, PLoS One
![Page 28: IRIDA: Canada’s federated platform for genomic epidemiology](https://reader035.vdocument.in/reader035/viewer/2022081604/58a0c7c71a28ab6d018b55e3/html5/thumbnails/28.jpg)
Availability
• Jun 1 2015: IRIDA 1.0 beta Internal Release
– Release to collaborators for installation and full test
• Jul 1 2015: IRIDA 1.0 beta1
– Announce Beta release, download, documentation available on website – www.irida.ca
• Aug 1 2015: IRIDA 1.0 beta2
– Cloud installer, with documentation
– Additional pipelines as available
– Visualization as available
![Page 29: IRIDA: Canada’s federated platform for genomic epidemiology](https://reader035.vdocument.in/reader035/viewer/2022081604/58a0c7c71a28ab6d018b55e3/html5/thumbnails/29.jpg)
Acknowledgements
Project Leaders: Fiona Brinkman – SFU, Will Hsiao – PHMRL, Gary Van Domselaar – NML
University of Lisbon: João Carriço
National Microbiology Laboratory (NML): Franklin Bristow, Aaron Petkau, Thomas Matthews, Josh Adam, Adam Olson, Tarah Lynch, Shaun Tyler, Philip Mabon, Philip Au, Celine Nadon, Matthew Stuart-Edwards, Morag Graham, Chrystal Berry, Lorelee Tschetter, Aleisha Reimer
Laboratory for Foodborne Zoonoses (LFZ): Eduardo Taboada, Peter Kruczkiewicz, Chad Laing, Vic Gannon, Matthew Whiteside, Ross Duncan, Steven Mutschall
Simon Fraser University (SFU): Melanie Courtot, Emma Griffiths, Geoff Winsor, Julie Shay, Matthew Laird, Bhav Dhillon, Raymond Lo
BC Public Health Microbiology & Reference Laboratory (PHMRL) and BC Centre for Disease Control (BCCDC): Judy Isaac-Renton, Patrick Tang, Natalie Prystajecky, Jennifer Gardy, Damion Dooley, Linda Hoang, Kim MacDonald, Yin Chang, Eleni Galanis, Marsha Taylor, Cletus D’Souza, Ana Paccagnella
University of Maryland: Lynn Schriml
Canadian Food Inspection Agency (CFIA): Burton Blais, Catherine Carrillo, Dominic Lambert
Dalhousie University: Rob Beiko, Alex Keddy
McMaster University: Andrew McArthur, Daim Sardar
European Nucleotide Archive: Guy Cochrane, Petra ten Hoopen, Clara Amid
European Food Safety Authority: Leibana Criado Ernesto, Vernazza Francesco, Rizzi Valentina
![Page 30: IRIDA: Canada’s federated platform for genomic epidemiology](https://reader035.vdocument.in/reader035/viewer/2022081604/58a0c7c71a28ab6d018b55e3/html5/thumbnails/30.jpg)
IRIDA Annual General Meeting, Winnipeg, April 8-9, 2015
![Page 31: IRIDA: Canada’s federated platform for genomic epidemiology](https://reader035.vdocument.in/reader035/viewer/2022081604/58a0c7c71a28ab6d018b55e3/html5/thumbnails/31.jpg)
The IRIDA platform (Integrated Rapid Infectious Disease Analysis)
An open-source, standards-compliant, high-quality genomic epidemiology analysis platform to support real-time (food-borne) disease outbreak investigations
Contacts:
[email protected]
@wlhsiao
www.IRIDA.ca