Reproducible Bioinformatics Project: A community for reproducible bioinformatics analysis pipelines

Neha Kulkarni¹, Luca Alessandrì¹, Riccardo Panero¹, Maddalena Arigoni¹, Martina Olivero², Francesca Cordero³$, Marco Beccuti³ and Raffaele A Calogero¹$

¹ Dept. of Molecular Biotechnology and Health Sciences, University of Torino, Torino, Italy

² Dept. of Oncology, University of Torino, Candiolo, Italy

³ Dept. of Computer Sciences, University of Torino, Torino, Italy

Neha Kulkarni kulkarnineha220@gmail.com

Luca Alessandrì alessandri.luca1991@gmail.com

Riccardo Panero riccardo.panero@gmail.com

Maddalena Arigoni maddalena.arigoni@unito.it

Martina Olivero martina.olivero@ircc.it

Francesca Cordero fcordero@di.unito.it

Marco Beccuti beccuti@di.unito.it

Raffaele A Calogero raffaele.calogero@unito.it

$ Corresponding author

bioRxiv preprint first posted online Dec. 26, 2017; doi: http://dx.doi.org/10.1101/239947. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. All rights reserved. No reuse allowed without permission.

Abstract

Background: Reproducibility of research is a key element of modern science and is mandatory for any industrial application. It is the ability to replicate an experiment independently of the location and the operator. A study can therefore be considered reproducible only if all the data used are available and the computational analysis workflow is clearly described. However, to reproduce a complex bioinformatics analysis today, the raw data and a list of the tools used in the workflow may not be enough to guarantee the reproducibility of the results. Indeed, different releases of the same tools and/or of the system libraries used by those tools might lead to subtle reproducibility issues.

Results: To address this challenge, we established the Reproducible Bioinformatics Project (RBP), a non-profit and open-source project whose aim is to provide a schema and an infrastructure, based on Docker images and an R package, for obtaining reproducible results in bioinformatics. One or more Docker images are defined for a workflow (typically one for each task), while the workflow implementation is handled via R functions embedded in a package available in a GitHub repository. Thus, a bioinformatician participating in the project first has to integrate her/his workflow modules into Docker image(s), exploiting an Ubuntu Docker image developed ad hoc by RBP to make this task easier. Second, the workflow implementation must be written in R according to an R skeleton function made available by RBP, to guarantee homogeneity and reusability among different RBP functions. Moreover, she/he has to provide an R vignette explaining the package functionality, together with an example dataset that users can run to gain confidence in using the workflow.

Conclusions: The Reproducible Bioinformatics Project provides a general schema and an infrastructure to distribute robust and reproducible workflows. It thus guarantees that final users can consistently repeat any analysis, independently of the UNIX-like architecture used.

Keywords

Reproducible research, docker, whole transcriptome sequencing, miRNA sequencing, ChIP sequencing, community, SNV.

Background

Recently, Baker and Lithgow [1, 2] highlighted the problem of reproducibility in research. Reproducibility problems affect, to different extents, a large portion of the science fields [1]. Since bioinformatics nowadays plays an important role in many biological and medical studies [3], a great effort must be made to make such computational analyses reproducible [4, 5]. Reproducibility issues in bioinformatics might be due to the short half-life of bioinformatics software, the complexity of the pipelines, the uncontrolled effects induced by changes in the system libraries, the incompleteness or imprecision of workflow descriptions, etc. To deal with reproducibility issues in bioinformatics, Sandve [5] suggested ten good-practice rules for the development of a computational workflow (Table 1). A community that fulfills some of the rules suggested by Sandve is the Bioconductor project [6], which provides version control for a large number of genomics/bioinformatics packages. In this way, old releases of any Bioconductor package are kept available to users. However, Bioconductor does not cover all the steps of every possible bioinformatics workflow; e.g. in an RNAseq workflow, fastq trimming and alignment are generally done using tools not implemented in Bioconductor. BaseSpace [7, 8] and Galaxy [9] are examples of, respectively, commercial and open-source solutions that partially fulfill Sandve’s rules. Furthermore, the workflows implemented in such environments cannot be heavily customized; e.g. BaseSpace has strict rules for application submission. Moreover, cloud applications such as BaseSpace have to cope with legal and ethical issues [10]. Galaxy, on the other hand, does not provide standardized metadata to annotate workflows.

Recently, container technology, a lightweight OS-level virtualization, has been explored in bioinformatics to ease the distribution, use and maintenance of bioinformatics software [11-13]. Indeed, since applications and their dependencies are packaged together in the container image, users do not have to download and install all the dependencies required by an application, which avoids all the cases where dependencies are poorly documented or not available at all. Moreover, problems related to version conflicts or updates of the system libraries do not occur, because containers are isolated from the rest of the operating system.

Among the available container platforms, Docker (http://www.docker.com) is becoming the de facto standard environment to quickly compose, create, deploy, scale and oversee containerized applications under Linux. Its strengths are a high degree of portability, which allows users to register and share containers over various hosts in private and public repositories, more effective resource use, and faster deployment compared with other software.

Although Menegidio [13], da Veiga [11] and Kim [12] provided a large collection of bioinformatics instruments based on Docker technology, a community delivering to bioinformaticians a controlled but flexible framework to distribute Docker-based workflows under the umbrella of a reproducibility framework is still missing. Here, we describe the implementation of the Reproducible Bioinformatics Project (RBP, http://reproducible-bioinformatics.org/), which aims to distribute docker-based applications to the bioinformatics community under the reproducibility framework proposed by Sandve [5]. RBP accepts simple docker implementations of bioinformatics software (e.g. a docker image embedding the bwa aligner), implementations of complex pipelines involving multiple docker images (e.g. an RNAseq workflow providing all the steps of an analysis, from quality control of the fastq files to differential expression), as well as demonstrative workflows (i.e. docker images embedding the full bioinformatics workflow used in a publication) intended to make published data reproducible.

Implementation

The Reproducible Bioinformatics Project (RBP) reference web page is reproducible-bioinformatics.org. The project is based on three modules (Figure 1): (i) the docker4seq R package (https://github.com/kendomaniac/docker4seq), (ii) docker images (https://hub.docker.com/u/repbioinfo/), and (iii) 4SeqGUI (https://github.com/mbeccuti/4SeqGUI).

The docker4seq package provides the connection between users and docker containers. Docker4seq is organized in two branches: stable and development. A module (R function(s)/docker container(s)) moves from the development branch to the stable branch when it fulfills the 10 rules suggested by Sandve [5] for good bioinformatics practice (Table 1).

The skeleton.R function in docker4seq provides a prototype for building a docker-controlling function. Acknowledgment of the developer's work is provided within the structure of skeleton.R, which includes a field indicating the developer's affiliation and contact email. The docker image repository docker.io/repbioinfo provides an Ubuntu image as a prototype for creating a docker image compliant with the RBP specifications. Developers are free to use this prototype or to adapt a different Linux docker distribution for their application. Docker images designed by the RBP core developers are located in docker.io/repbioinfo (docker.com), whereas images developed by third parties can be placed in any public-access docker repository. RBP requires that any operation involving R/Bioconductor packages or external software be implemented in a docker container. Only reformatting actions, e.g. table assembly, data reordering, etc., can be handled outside a docker image.
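To illustrate the containment rule, the sketch below shows how a controlling function might assemble a docker run call for an image pinned to an explicit tag. It is a minimal Python sketch under stated assumptions, not the docker4seq implementation (which is written in R): the function name build_docker_command, the image name repbioinfo/example, the tag 2017.01 and the paths are all hypothetical. Only the standard docker run options --rm and -v are used, and the command is built but not executed.

```python
from typing import List, Sequence


def build_docker_command(
    image: str,
    tag: str,
    host_dir: str,
    container_dir: str,
    tool_args: Sequence[str],
) -> List[str]:
    """Assemble a 'docker run' argument list for a pinned image.

    Hypothetical helper: it only builds the command, so the caller
    decides when (and whether) to pass it to subprocess.run().
    """
    if not tag:
        # An untagged image would resolve to 'latest', which can change
        # over time and silently alter the results.
        raise ValueError("an explicit image tag is required")
    return [
        "docker", "run",
        "--rm",                               # remove the container on exit
        "-v", f"{host_dir}:{container_dir}",  # share the data folder
        f"{image}:{tag}",                     # pinned image version
        *tool_args,                           # command run inside the container
    ]


# Hypothetical image, tag and paths, for illustration only.
command = build_docker_command(
    "repbioinfo/example", "2017.01", "/data/run1", "/data", ["echo", "done"]
)
print(" ".join(command))
```

Requiring an explicit tag is the design choice that matters here: the analysis is tied to one image version rather than to whatever the repository currently serves.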

Any new RBP module (R function(s)/docker image(s)) must be accompanied by an explanatory vignette, accessible online as an HTML document, and by a set of test data, also accessible online. Thus, final users are given everything they need to gain confidence in the module's functionality.

Docker images are labelled with the extension YYYY.NN, where YYYY is the year of insertion in the stable version and NN is a progressive number. YYYY changes only if the program(s) implemented in the docker image are updated, because any such update will affect the reproducibility of the workflow. Previous version(s) will remain available in the repository. NN refers to changes in the docker image that do not affect the reproducibility of the workflow.
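The labelling rule can be sketched in a few lines of Python. This is an illustrative sketch, not RBP code: the helper names parse_tag and affects_reproducibility are hypothetical, and the sketch assumes that NN is written with exactly two digits, which the text does not state.

```python
import re
from typing import NamedTuple

# Assumed tag layout: four-digit year, a dot, two-digit progressive number.
TAG_PATTERN = re.compile(r"(\d{4})\.(\d{2})")


class ImageTag(NamedTuple):
    year: int      # YYYY: changes when the embedded program(s) are updated
    revision: int  # NN: changes that leave the workflow results untouched


def parse_tag(tag: str) -> ImageTag:
    """Split a 'YYYY.NN' label into its two components."""
    match = TAG_PATTERN.fullmatch(tag)
    if match is None:
        raise ValueError(f"not a YYYY.NN tag: {tag!r}")
    return ImageTag(int(match.group(1)), int(match.group(2)))


def affects_reproducibility(old_tag: str, new_tag: str) -> bool:
    """True when moving between the two tags may change the results.

    Following the rule in the text, only a change in YYYY signals an
    update of the program(s) implemented in the image.
    """
    return parse_tag(old_tag).year != parse_tag(new_tag).year


print(affects_reproducibility("2017.01", "2017.02"))
print(affects_reproducibility("2017.02", "2018.01"))
```

Under these assumptions the first call reports no effect on reproducibility (only NN changed), while the second reports a possible effect (YYYY changed).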

A new module can be submitted to info@reproducible-bioinformatics.org, and the RBP core team will verify its compliance with Sandve's [5] rules. Once validated, the R functions controlling the new module are inserted in the docker4seq stable release. Partially validated modules are placed in the development branch and moved to the stable one when they comply with Sandve’s rules.
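The branch assignment described above amounts to a checklist over the ten rules of Table 1. The Python sketch below illustrates that logic only; the function name assign_branch is hypothetical, and in practice the verification is carried out by the RBP core team rather than by an automated check.

```python
from typing import Iterable, List, Tuple

# Rule numbers as listed in Table 1 (Sandve et al. [5]).
SANDVE_RULES = tuple(range(1, 11))


def assign_branch(fulfilled_rules: Iterable[int]) -> Tuple[str, List[int]]:
    """Return the target branch and the rules still to be satisfied.

    A module goes to 'stable' only when all ten rules are fulfilled;
    otherwise it stays in 'development'.
    """
    missing = sorted(set(SANDVE_RULES) - set(fulfilled_rules))
    branch = "stable" if not missing else "development"
    return branch, missing


# Hypothetical module that does not yet satisfy rules 6 and 10.
print(assign_branch([1, 2, 3, 4, 5, 7, 8, 9]))
print(assign_branch(range(1, 11)))
```

Returning the list of missing rules alongside the branch tells a submitter what remains to be done before the module can move to stable.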

4SeqGUI is a Java-based graphical interface to the docker4seq functions. It is designed to provide a GUI to users with limited knowledge of R scripting. Currently the GUI embeds only general-purpose workflows, such as the RNAseq, miRNAseq and ChIP-seq workflows.

Results

The stable branch of the docker4seq R package contains all the R functions required to handle all the steps of the RNAseq workflow (Fig. 2A), ChIPseq workflow (Fig. 2B), and miRNAseq workflow (Fig. 2C). Docker4seq also provides a wrapper function for the Illumina bcl2fastq tool, which converts the Illumina sequencer output into demultiplexed fastq files (Fig. 2). The fastq files can then be handled with any of the three workflows. The counts table produced by the RNAseq or miRNAseq workflows can be used for data visualization (pca, principal component analysis function), to evaluate the statistical power of the experiment (experimentPower function), to define the optimal sample size of the experiment for the detection of differentially expressed genes (sampleSize function) and to detect differentially expressed genes/transcripts (wrapperDeseq2 function). Sample size/statistical power estimation and differential expression are calculated via the RnaSeqSampleSize [14] and DESeq2 [15] Bioconductor packages, respectively.

In the development branch, the main effort of the core developers is focused on providing workflows for DNA and RNA somatic variant calling. The DNA variant calling workflow embeds the pre-processing procedure suggested by the GATK best practice (Fig. 3A). RNAseq data preparation for variant calling (Fig. 3C) requires the STAR 2-step procedure [16], which provides significantly increased sensitivity to novel splice junctions. Then, after sorting and duplicate marking, OPOSSUM [17] is used to remove intronic regions and to merge overlapping reads. We have also implemented a specific procedure (Fig. 3B), based on the xenome software [18], to discriminate between human reads and mouse host reads in the sequences produced by the analysis of patient-derived xenografts (PDX, [19]). As part of the somatic variant calling workflow, we are implementing MUTECT 1 and 2 [20] (Fig. 4A) to call somatic variants, as well as PLATYPUS [21] to extract information on joined-sample SNVs (Fig. 4B).

We are also expanding the RNAseq module by adding the reference-free Salmon aligner [22], which uses less memory for the alignment task than STAR while providing similar results [23].

Finally, the HashClone framework (accepted for publication in BMC Bioinformatics), a new suite of bioinformatics tools providing B-cell clonality assessment and minimal residual disease (MRD) monitoring over time from deep sequencing data, was integrated into the docker4seq package. In particular, a parallel version of the standard HashClone workflow (Fig. 5) was developed exploiting the docker architecture.

All the modules described above are implemented in 18 docker images deposited in the Docker Hub (https://hub.docker.com/u/repbioinfo/).

As part of the RBP we have also developed a GUI, 4SeqGUI (https://github.com/mbeccuti/4SeqGUI). The GUI is implemented in Java and can be used to run the whole transcriptome sequencing workflow (Fig. 2A), the ChIP sequencing workflow (Fig. 2B), and the miRNA sequencing workflow (Fig. 2C).

Discussion

Bioinformatics workflows are becoming an essential part of many research papers. However, the absence of clear and well-defined rules on code distribution makes the results of most published research unreproducible [24]. Recently, Almugbel and coworkers [25] described an interesting infrastructure to embed Bioconductor-based packages. However, Bioconductor does not cover all the steps of every possible bioinformatics workflow, and thus provides a limited framework for developing complex pipelines. In contrast, RBP is a new instrument that expands the idea of Almugbel [25], providing a more flexible infrastructure that allows the bioinformatics community to spread its work under the guidance of rules that guarantee inter-laboratory reproducibility and do not limit docker implementations to Bioconductor packages. RBP core developers created frameworks for RNA/miRNA quantification and analysis. A ChIPseq workflow was also developed, and variant calling workflows for DNA and RNA are under active development. A distinctive feature of RBP is the acceptance of demonstrative workflows, i.e. bioinformatics procedures described in a biological/medical paper. A demonstrative workflow is wrapped in a docker image and is supported by a tutorial that describes step by step how the analysis is done, to guarantee the reproducibility of published data.

Availability and requirements

Project name: Reproducible Bioinformatics Project

Project home page: http://reproducible-bioinformatics.org

Operating system: UNIX-like

Programming language: R

Other requirements: docker version 17.05.0-ce or higher

License: GPL.

Declarations

Competing interests

None

Funding

This work has been supported by the EPIGEN FLAG PROJECT

Authors' contributions

NK and LA equally contributed to the development of the miRNA workflow and all the other tools. RP and FC developed the RNAseq workflow and refined the ChIPseq workflow. MA and MO performed application testing. MB and RAC developed the rules to submit tools and workflows to the Reproducible Bioinformatics community. RAC and MB equally supervised the overall work.

Figure captions

Figure 1: Reproducible Bioinformatics Project structure.

Figure 2: Workflows available in the stable branch of docker4seq. A) Whole transcriptome sequencing workflow, B) ChIP sequencing workflow, and C) miRNA sequencing workflow. The names followed by parentheses are the docker4seq functions used to execute the analysis steps. Black indicates elements in common among more than one workflow.

Figure 3: Variant calling workflows under refinement in the development branch of docker4seq. A) SNV calling in DNA workflow. The function snvPreprocessing requires that users provide their own copy of the GATK software, because of Broad Institute license restrictions. This function returns a sorted bam file, with duplicates marked, after GATK indel realignment and quality recalibration. B) Data preprocessing for samples derived from Patient Derived Xenografts (PDX). The xenome function discriminates between the mouse host reads and the human tumor reads; DNA or RNA SNV calling workflows can then be applied. C) SNV calling in RNA workflow. The function star2steps generates a sorted bam, in which duplicates are marked and which is processed by opossum for removal of intronic regions and merging of overlapping reads. The names followed by parentheses are the docker4seq functions used to execute the analysis steps. Black indicates elements in common between more than one workflow.

Figure 4: Variant calling workflows under development in the development branch of docker4seq. A) Somatic SNV detection using GATK MUTECT 1 or 2. B) Platypus-based joint mutation caller. Dashed blocks are not yet implemented.

Figure 5: HashClone pipeline. The HashClone strategy is organized in three steps. The first step (red box) is used to detect k-mers in all patients’ samples. The second step (green box) focuses on the generation of sequence signatures leading to the identification of the set of putative clones present in each of the patients’ samples. The third step (blue box) is used for the characterization and evaluation of the cancer clones.

References

1. Baker M: 1,500 scientists lift the lid on reproducibility. Nature 2016, 533(7604):452-454.

2. Lithgow GJ, Driscoll M, Phillips P: A long journey to reproducible results. Nature 2017, 548(7668):387-388.

3. Searls DB: The roots of bioinformatics. PLoS computational biology 2010, 6(6):e1000809.

4. Kanwal S, Khan FZ, Lonie A, Sinnott RO: Investigating reproducibility and tracking provenance - A genomic workflow case study. BMC bioinformatics 2017, 18(1):337.

5. Sandve GK, Nekrutenko A, Taylor J, Hovig E: Ten simple rules for reproducible computational research. PLoS computational biology 2013, 9(10):e1003285.

6. Gentleman RC, Carey VJ, Bates DM, Bolstad B, Dettling M, Dudoit S, Ellis B, Gautier L, Ge Y, Gentry J et al: Bioconductor: open software development for computational biology and bioinformatics. Genome biology 2004, 5(10):R80.

7. Colombo AR, J. Triche TJ, Ramsingh G: Arkas: Rapid reproducible RNAseq analysis. F1000Res 2017, 6:586.

8. Van Neste C, Gansemans Y, De Coninck D, Van Hoofstat D, Van Criekinge W, Deforce D, Van Nieuwerburgh F: Forensic massively parallel sequencing data analysis tool: Implementation of MyFLq as a standalone web- and Illumina BaseSpace((R))-application. Forensic Sci Int Genet 2015, 15:2-7.

9. Digan W, Countouris H, Barritault M, Baudoin D, Laurent-Puig P, Blons H, Burgun A, Rance B: An Architecture for Genomics Analysis in a Clinical Setting Using Galaxy and Docker. Gigascience 2017.

10. Dove ES, Joly Y, Tasse AM, Public Population Project in G, Society International Steering C, International Cancer Genome Consortium E, Policy C, Knoppers BM: Genomic cloud computing: legal and ethical points to consider. European journal of human genetics: EJHG 2015, 23(10):1271-1278.

11. da Veiga Leprevost F, Gruning BA, Alves Aflitos S, Rost HL, Uszkoreit J, Barsnes H, Vaudel M, Moreno P, Gatto L, Weber J et al: BioContainers: an open-source and community-driven framework for software standardization. Bioinformatics 2017, 33(16):2580-2582.

12. Kim B, Ali T, Lijeron C, Afgan E, Krampis K: Bio-Docklets: virtualization containers for single-step execution of NGS pipelines. Gigascience 2017, 6(8):1-7.

13. Menegidio FB, Jabes DL, Costa de Oliveira R, Nunes LR: Dugong: a Docker image, based on Ubuntu Linux, focused on reproducibility and replicability for bioinformatics analyses. Bioinformatics 2017.

14. Ching T, Huang S, Garmire LX: Power analysis and sample size estimation for RNA-Seq differential expression. RNA 2014, 20(11):1684-1696.

15. Love MI, Huber W, Anders S: Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome biology 2014, 15(12):550.

16. Dobin A, Davis CA, Schlesinger F, Drenkow J, Zaleski C, Jha S, Batut P, Chaisson M, Gingeras TR: STAR: ultrafast universal RNA-seq aligner. Bioinformatics 2013, 29(1):15-21.

17. Oikkonen L, Lise S: Making the most of RNA-seq: Pre-processing sequencing data with Opossum for reliable SNP variant detection. Wellcome Open Res 2017, 2:6.

18. Conway T, Wazny J, Bromage A, Tymms M, Sooraj D, Williams ED, Beresford-Smith B: Xenome--a tool for classifying reads from xenograft samples. Bioinformatics 2012, 28(12):i172-178.

19. Siolas D, Hannon GJ: Patient-derived tumor xenografts: transforming clinical samples into mouse models. Cancer research 2013, 73(17):5315-5319.

20. Cibulskis K, Lawrence MS, Carter SL, Sivachenko A, Jaffe D, Sougnez C, Gabriel S, Meyerson M, Lander ES, Getz G: Sensitive detection of somatic point mutations in impure and heterogeneous cancer samples. Nature biotechnology 2013, 31(3):213-219.

21. Rimmer A, Phan H, Mathieson I, Iqbal Z, Twigg SRF, Consortium WGS, Wilkie AOM, McVean G, Lunter G: Integrating mapping-, assembly- and haplotype-based approaches for calling variants in clinical sequencing applications. Nature genetics 2014, 46(8):912-918.

22. Patro R, Duggal G, Love MI, Irizarry RA, Kingsford C: Salmon provides fast and bias-aware quantification of transcript expression. Nature methods 2017, 14(4):417-419.

23. Zhang C, Zhang B, Lin LL, Zhao S: Evaluation and comparison of computational tools for RNA-seq isoform quantification. BMC genomics 2017, 18(1):583.

24. Hothorn T, Leisch F: Case studies in reproducibility. Briefings in bioinformatics 2011, 12(3):288-300.

25. Almugbel R, Hung LH, Hu J, Almutairy A, Ortogero N, Tamta Y, Yeung KY: Reproducible Bioconductor workflows using browser-based interactive notebooks and containers. J Am Med Inform Assoc 2017.

Tables

Table 1: Good practice bioinformatics rules, derived from Sandve et al. [5]

1 For Every Result, Keep Track of How It Was Produced

2 Avoid Manual Data Manipulation Steps

3 Archive the Exact Versions of All External Programs Used

4 Version Control All Custom Scripts

5 Record All Intermediate Results, When Possible in Standardized Formats

6 For Analyses That Include Randomness, Note Underlying Random Seeds

7 Always Store Raw Data behind Plots

8 Generate Hierarchical Analysis Output, Allowing Layers of Increasing Detail to Be Inspected

9 Connect Textual Statements to Underlying Results

10 Provide Public Access to Scripts, Runs, and Results
