Post on 26-Aug-2018


1: Mathématique et Informatique Appliquées du Génome à l’Environnement (MaIAGE) - Jouy-en-Josas - INRA; 2: Plant Resistance to Pests and Diseases (RPB) – Montpellier – IRD; 3: Génétique Physiologie et Systèmes d’Elevage (GenPhySE) - Castanet Tolosan - INRA; 4: Unité de Nutrition Humaine (UNH) - Clermont-Ferrand - INRA et Université d’Auvergne; 5: Unité de recherche Virologie et Immunologie Moléculaires (VIM) - Jouy-en-Josas - INRA; 6: Plateforme MetaToul-AXIOM - Toulouse – INRA

WHAT’S NEXT

We are currently working on a new use case, “miRDeep2”. It is a pipeline built around the miRDeep2 tool that was never finished. We are installing it on a test Galaxy instance and will then repeat the work done for the other use cases: ensure the wrappers follow the best practices; make sure they can be shared with Conda package dependencies; publish them on the Toolshed; install them on a Galaxy instance; and possibly create a container and organize a training session. For SNiPlay, we are also handling other pipelines, such as the Haplophyles analysis one.

6. Current deliverables

4. Progress

For the past 21 months, we have set up methods for the overall project and applied them mainly to 4 use cases (subprojects).

For each use case we have 4 main deliverables: the packaged tools (wrapper for Galaxy and packages to install dependencies); an instance easily accessible to users; a use case carried out with someone who knows the tool well; and documentation and/or training material ready to use.

DELIVERABLES

5. Project deliverables

The first use case is the “REPET” suite, which detects, annotates and analyses repeats in genomic sequences (it is specifically designed for transposable elements). For this tool we finalized wrappers that allow the two “REPET” pipelines to run on Galaxy (in a lighter way). Difficulties came from the large number of dependencies and the need for an HPC cluster for these pipelines. We succeeded in installing it on Galaxy instances on the IFB cloud and on a URGI virtual machine. We then ran a training session on the URGI Galaxy machine, which allowed us to validate the tool and draw up a list of new features to improve the current version. We completed a second development cycle, delivering a new version of REPET. This version has been used successfully in a second training session.

PROGRESS RESULTS

The good practices have been established by the community to improve the quality of wrapper development. Following these practices provides a harmonized tool architecture, easier tool maintenance and an assurance of tool quality for users. They cover how to build a wrapper with the essential markup, how to distribute tools through Toolshed management, and how to install tools with dependency management.
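As a rough illustration of what checking a wrapper for its essential markup could look like, the Python sketch below parses a wrapper XML file and reports which top-level sections are absent. The list of sections treated as "essential" and the `example_tool` wrapper are assumptions made for this example; they are not taken from the poster or from an official Galaxy checklist.

```python
import xml.etree.ElementTree as ET
from typing import List

# Assumption: an illustrative choice of sections, not an official Galaxy rule set.
ESSENTIAL_SECTIONS = ("requirements", "command", "inputs", "outputs", "tests", "help")


def missing_sections(wrapper_xml: str) -> List[str]:
    """Return the essential top-level sections that the wrapper does not define."""
    root = ET.fromstring(wrapper_xml)
    return [name for name in ESSENTIAL_SECTIONS if root.find(name) is None]


# Hypothetical, deliberately incomplete wrapper used only for demonstration.
EXAMPLE_WRAPPER = """<tool id="example_tool" name="Example tool" version="0.1.0">
    <command>example_tool --input '$input' > '$output'</command>
    <inputs><param name="input" type="data" format="fasta"/></inputs>
    <outputs><data name="output" format="tabular"/></outputs>
</tool>"""

if __name__ == "__main__":
    # Lists the sections still to be written before the wrapper is shared.
    print(missing_sections(EXAMPLE_WRAPPER))
```

In practice, this kind of check is delegated to community tooling rather than written by hand; the sketch only makes the idea concrete.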

3. Tool integration process

Based on these practices, we designed this tool integration process, with two technical deliverable levels.

TECHNOLOGIES

BEST PRACTICES

To support this process, we apply some technologies commonly used by the community:

- Planemo, a software assistant for building Galaxy tools. It eases development by providing wrapper skeletons (templates) and checking the syntax of the XML files. It can also automate tool testing and publication on the Toolshed.

- Conda, a dependency and environment management technology used by Galaxy developers to manage tool packaging. It is now the preferred way to manage dependencies in recent Galaxy versions. There is also a repository for biological tool packages named “Bioconda”.

- Docker, a container manager that provides lightweight virtual machines. The benefit of this technology is that tools can easily be shared in a closed, ready-to-use environment.
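To make the link between a wrapper and its Conda dependency concrete, here is a minimal Python sketch that generates a wrapper skeleton whose `<requirements>` section names a package and version. The function `build_wrapper`, the tool `example_tool` and the package `example_pkg` are hypothetical names chosen for illustration. The sketch assumes that the package named in the requirement is available under the same name in a Conda channel such as Bioconda.

```python
import xml.etree.ElementTree as ET


def build_wrapper(tool_id, name, version, package, package_version, command):
    """Build a minimal Galaxy-style wrapper skeleton as an XML string."""
    tool = ET.Element("tool", id=tool_id, name=name, version=version)

    # The requirement names the package (and version) the tool depends on.
    requirements = ET.SubElement(tool, "requirements")
    requirement = ET.SubElement(
        requirements, "requirement", type="package", version=package_version
    )
    requirement.text = package

    # Command line executed by the tool, plus empty sections left to fill in.
    ET.SubElement(tool, "command").text = command
    ET.SubElement(tool, "inputs")
    ET.SubElement(tool, "outputs")
    ET.SubElement(tool, "help").text = "Describe the tool here."
    return ET.tostring(tool, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical tool and package names, used only for demonstration.
    print(build_wrapper("example_tool", "Example tool", "0.1.0",
                        "example_pkg", "1.0", "example_pkg --version"))
```

Planemo provides its own way of generating such skeletons; this hand-written version only shows which pieces of information the wrapper carries.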

1. Project origin

Accessibility, Reproducibility and Transparency are the guiding principles of the web-based platform Galaxy. These three principles have led the Galaxy community to create good practices for tool integration into the platform. Today, a developer has clear guidelines for wrapping a tool in Galaxy (XML file, packaging).

2. Partner communities

FRENCH COMMUNITY

A.R.T.

GFLS PROJECT

The French community has been among the early users of Galaxy. Several tool wrappers have been developed since 2010. These wrappers have a heterogeneous, non-standardised architecture because the methods took considerable time to stabilize. Once they had stabilized, the French community adopted the Galaxy good practices.

Consequently, the French Galaxy working group of the French Bioinformatics Institute (IFB) established the project “Galaxy For Life Science” (GFLS). This project aims to upgrade and showcase wrappers for several French scientific communities (Plant, Livestock, Microbe). These enhancements are made according to these good practices, tools and methods.

Valentin MARCON valentin.marcon@inra.fr

Institut National de la Recherche Agronomique (INRA)
Unité Mathématiques & Informatique Appliquées du Génome à l'Environnement
Plateforme bioinformatique Migale
Domaine de Vilvert
78352 Jouy-en-Josas

Project timeline: past 21 months, remaining 8 months

Plant Livestock Microbe

Taking tools out of their laboratory: Methods and practices to make Bioinformatics tools accessible through Galaxy

Valentin Marcon∗1, Alexis Dereeper2, Sarah Maman-Haddad3, Melanie Petera4, Luc Jouneau5, Marie Tremblay-Franco6, and Olivier Inizan1

The second use case, “BIOS4BIOL”, was a set of wrappers built by members of a statistical working group at INRA. We standardized the wrappers and created a new one that performs data normalization, a feature common to the old wrappers. We worked with the original developers of the tools and, to facilitate our collaboration, we ran a training session on the versioning tool “git”. We also managed dependencies with Conda packages. The tools are now available in the “Genotool” Galaxy instance and in the Galaxy Toolshed.

The third use case is “SNiPlay”, a tool for SNP detection, management and analysis. It is a pipeline of wrappers that had been published on the Toolshed but needed to be completed. In this project, we managed dependencies with Conda and, for that, created two new Conda packages: the Readseq and sNMF packages are now available through the Bioconda repository. We then updated the tools on the Toolshed. Finally, we created a Docker image that includes the SNiPlay pipeline tools, other complementary tools, a visualization plugin and the ready-to-use workflow.

Olivier INIZAN Olivier.inizan@inra.fr

@valentin_marcon

@OlivierInizan
