Alan Turing Institute Symposium on Reproducibility for Data-Intensive Research Dickson Poon China Centre, St. Hugh’s College, University of Oxford Wednesday 6 and Thursday 7 April 2016
Wednesday 6 April 2016
Programme    Location
09:00–09:30 Registration and coffee    Outside of the Lecture Theatre
09:30–09:40 Welcome and Introduction; outcomes of the symposium (Lucie Burgess, University of Oxford)    Lecture Theatre
09:40–10:10 What is reproducibility in the setting of computational data analytics? (Prof Carole Goble, Manchester University)    Lecture Theatre
10:10–10:30 Overview of the ATI (Prof Jared Tanner, University of Oxford)    Lecture Theatre
10:30–11:00 Coffee break    Wordsworth Tea Room
11:00–12:45 Session 1 – Data provenance to support reproducibility (Leads: Dr Paolo Missier, University of Newcastle, and Prof Tom Nichols, University of Warwick). Speakers: Prof Luc Moreau, University of Southampton; Prof Dorothy Bishop, University of Oxford; Dr Paolo Missier, University of Newcastle    Lecture Theatre; break-out rooms – see Session format for more details
12:45–13:45 Lunch    Wordsworth Tea Room
13:45–15:30 Session 2 – Computational models and simulations (Lead: Prof Jeremy Gibbons, University of Oxford). Speakers: Dr Nicola Botta, Potsdam Institute for Climate Impact Research; Prof Patrik Jansson, Chalmers University of Technology; Dr Camil Demetrescu, Sapienza University of Rome    Lecture Theatre; break-out rooms – see Session format for more details
15:30–16:00 Coffee break    Wordsworth Tea Room
16:00–17:15 Lightning talks    Lecture Theatre
17:15–17:45 Day 1 reportage, plans for day 2    Lecture Theatre
19:00 Pre-dinner drinks; welcome from Lucie Burgess    At the entrance to St Hugh’s College Hall
19:30 Conference dinner    St Hugh’s College Hall
Thursday 7 April 2016
Programme    Location
08:30 Coffee available    Wordsworth Tea Room
09:00–09:15 Re-cap of day 1 and overview of day 2    Lecture Theatre
09:15–11:00 Session 3 – Reproducibility for real-time big data (Lead: Prof David de Roure, University of Oxford). Speakers: Dr Suzy Moat, University of Warwick; Prof David de Roure, University of Oxford    Lecture Theatre; break-out rooms – see Session format for more details
11:00–11:15 Coffee break    Wordsworth Tea Room
11:15–13:00 Session 4 – Publication of Data-Intensive Research (Leads: Prof Carole Goble, Manchester University; David Crotty, Richard O’Beirne, Oxford University Press). Speakers: Dr Laurie Goodman, GigaScience; Mr Neil Chue Hong, Software Sustainability Institute    Lecture Theatre; break-out rooms – see Session format for more details
13:00–14:00 Lunch    Wordsworth Tea Room
14:00–15:45 Session 5 – Novel architectures and infrastructure to support reproducibility (Leads: Dr Richard Mortier, University of Cambridge, and Dr Adam Farquhar, The British Library). Speakers: Dr Kenji Takeda, Microsoft Research Limited; Dr Richard Mortier, University of Cambridge    Lecture Theatre; break-out rooms – see Session format for more details
15:45–16:30 Day 2 reportage    Lecture Theatre
16:30 End of symposium
BREAK-OUT GROUPS SESSION FORMAT
There will be 3 break-out groups per session, each with a different topic; delegates will choose on the day which group to join.
Questions for small group discussion could include:
• Current landscape, scientific challenges - What are the latest advances in research relating to the theme? What is the state of the art? What are the key research questions and scientific challenges? Where do the greatest gaps lie?
• Disciplinary and inter-disciplinary challenges - What are the data-intensive scientific disciplines that should be brought together; where do they inter-relate to support research in this area?
• Foundational and applied research challenges - What are the applied stakeholder challenges and how should these drive fundamental research? e.g. in health, finance, utilities, engineering, etc.
• Benefits and impact – What are the downstream impacts, e.g. less costly computations, more efficient and widespread data re-use, greater transparency and public trust in science?
• What could an emerging ATI programme look like in this area? What ambitious targets could we achieve within 1 year, 3, 5 years? What is the added value of working within the ATI framework? What partners should the ATI work with in this area?
Session abstracts

Session location, logistics, reportage
Keynotes and session talks take place in the LECTURE THEATRE. Following each session there will be three breakout groups, in the Lecture Theatre, the Louey Seminar Room on the 2nd floor, and the Ho Tim Seminar Room on the 1st floor. Each breakout group will have a different topic. Topics and breakout rooms are scheduled below.
For each breakout group, we invite one delegate to act as a scribe and to note the key points of the discussion in a Google document, for which links are provided below. Other delegates are welcome to add to the Google notes during each session. We will photograph post-it notes and flip-chart outputs for a visual record. The session chairs will feed back the key points in the reportage sessions. The outcome of the symposium will be a white paper, which we will publish online.
If you would like to do some pre-reading on the topics covered in the symposium, a list of references is available in BibTeX, RefWorks and EndNote formats on a shared Google folder here: https://drive.google.com/open?id=0B1EyUglIzGARZjFVa3dLUWExbzg
WEDNESDAY 6 APRIL 2016

Opening keynote - What is Reproducibility? - Professor Carole Goble, University of Manchester
‘When I use a word’, Humpty Dumpty said in rather a scornful tone, ‘it means just what I choose it to mean - neither more nor less.’ [1]. It is the same with Reproducible. Reusable. Repeatable. Recomputable. Replicable. Rerunnable. Regeneratable. Reviewable. It is R* mayhem. Or pride [2]. Does it matter? At least it does for computational science. Different shades of ‘reproducible’ matter in the publishing workflow, depending on whether you are testing for robustness (rerun), defence (repeat), certification (replicate), comparison (reproduce) or transferring between researchers (reuse). Different forms of ‘R’ make different demands on the completeness, depth and portability of research [3].
If we view computational tools (software, scripts) as instruments – ‘datascopes’ rather than ‘telescopes’ or ‘microscopes’ – then we need to be clear when we talk about reproducible computational experiments about whether we are rerunning with the same set up on the same (preserved) instrument (say a virtual machine), or reproducing the instrument to replicate the experiment (say a description of an algorithm recoded) or repairing the instrument so we can reuse it for some other experiment (say replacing a defunct web service or a deprecated library). In this talk Professor Carole Goble will discuss the R* brouhaha and its practical consequences for computational data driven science.
[1] Lewis Carroll, Through the Looking-Glass (1872)
[2] David De Roure, More Rs than Pirates, http://www.scilogs.com/eresearch/more-rs-than-pirates/
[3] Juliana Freire, Philippe Bonnet, Dennis Shasha, Computational reproducibility: state-of-the-art, challenges, and database research opportunities, SIGMOD '12 Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data: 593-596, ACM New York, NY, USA, doi:10.1145/2213836.2213908

Overview of the Alan Turing Institute – Professor Jared Tanner, University of Oxford
Professor Jared Tanner will give an overview of the mission and strategic objectives of the newly-established Alan Turing Institute, and will take questions from delegates. The Institute’s mission is to: undertake data science research at the intersection of computer science, mathematics, statistics and systems engineering; provide technically informed advice to policy makers on the wider implications of algorithms; enable researchers from industry and academia to work together to undertake research with practical applications; and act as a magnet for leaders in academia and industry from around the world to engage with the UK in data science and its applications.
Session 1 – Data provenance to support reproducibility (Wed 6 April, 11:00–12:45)
Link to Google doc for notes:
Group A – https://docs.google.com/document/d/1Ac2J7WVzXfEHeWnexWOQxBiEM3s8c7eO6FFXj-gJGZw/edit?usp=sharing
Group B – https://docs.google.com/document/d/1UMFEtudI8Rs-2VptMZc2_K24qC4gmQaHG0iE7weaa7o/edit?usp=sharing
Group C – https://docs.google.com/document/d/155NKzZ-U9jSygmyZe2yMFdc0P79HC8MksZlx7lsyFuQ/edit?usp=sharing

Session chairs: Dr Paolo Missier, Newcastle University; Professor Tom Nichols, Warwick University

Session abstract
The drive for greater reproducibility demands that every aspect of experimental design, data acquisition, pre-processing, analysis and results generation be trackable. Complete provenance would then allow an independent investigator to understand exactly what was done to the data at each step, and attempt to reproduce the result with either the original (shared) data or a new set of data. In the same spirit, the ability to capture provenance is also important to support the exploration of alternative experimental designs, by making it possible to reason about, and explain, differences in outcomes produced by different versions of an experiment. Realising this potential has been proving difficult, however. The goal of this session is to try and understand the reality of provenance management practices with respect to reproducibility. We begin with three short presentations, which will (1) explore provenance issues in the area of neurosciences, (2) summarise the provenance standards that are available to address these requirements, and (3) provide an overview of experimental tools that leverage those standards to facilitate reproducibility.

Session speakers (in order of talks)

1.1 Professor Luc Moreau, University of Southampton
Provenance for explaining and reproducing past results
The ESRC EBook project aims to offer a multi-modal tool-suite (command line, web-based interactive portal, and interactive workflows) aiding in the use and teaching of statistical analysis techniques with a particular emphasis on their application to social science. Provenance is at the heart of this approach, capturing traces of execution steps, irrespective of their modality. Provenance can also be used as input to a workflow reconstruction component, allowing traces of previously captured steps to be edited as re-executable workflows. In this talk, I will outline the EBook approach and I will illustrate the salient aspects of the provenance PROV model.

1.2 Professor Dorothy Bishop, University of Oxford
Open data: unintended consequences and suggestions for averting them
Open data is typically presented as part of the solution to the reproducibility crisis, but there are situations when it could have quite the opposite effect. Any large, multivariate dataset provides ample opportunities for p-hacking – i.e. digging around in the data to look for ‘interesting’ patterns that give significant results. I will consider three possible ways of averting problems: data access agreements with pre-registered analysis, dividing data into discovery (open) and replication (restricted access) samples, and masked data.

1.3 Dr Paolo Missier, University of Newcastle
As scientific experimental research becomes increasingly data-intensive and data-driven, an ecosystem of technology and tools is slowly emerging to address the need to ensure its outcomes are reproducible. This presentation will briefly explore some of the technology where provenance features as part of the solution. These will include, amongst others, the NoWorkflow and YesWorkflow tools (think yin and yang of scientific programming), as well as a provenance-aware Matlab client developed by the DataONE ‘federated data preservation’ project, as part of its toolkit in support of the Earth Sciences.

Breakout groups
Group A – Lecture Theatre
Topic: What are the motivations, challenges and limitations in the recording and exploitation of provenance of open data?
Chair: Prof Tom Nichols
Scribe: Dr Susanna Sansone

Group B – Louey Seminar Room
Topic: Provenance is only one of the elements of reproducibility. How does it integrate with other pieces of the ‘repro puzzle’?
Chair: Dr Paolo Missier
Scribe: Dr Paolo Missier

Group C – Ho Tim Seminar Room
Topic: What kind of automated reasoning can we perform using provenance? For instance: can we design an algorithm that generates a new interesting experiment using the (detailed) provenance recordings of other experiments?
Chair: Prof Luc Moreau
Scribe: Prof Patrik Jansson
Session 2 – Computational models and simulations (Wed 6 April, 13:45–15:30)
Link to Google doc for notes:
Group A – https://docs.google.com/document/d/1pdNNwu-b1aLmLwdbJWfCSHiG0ehQPFPDdBPPdQt7FHM/edit?usp=sharing
Group B – https://docs.google.com/document/d/1isJ15RwA-KjxFnfMygEER2NlqrFMVN6izdrxqpDDU6s/edit?usp=sharing
Group C – https://docs.google.com/document/d/1lnmQxFoFIzw1spDvJwapTgIaOuOKQLcHO4CQmlVRQE0/edit?usp=sharing

Session chair: Professor Jeremy Gibbons, University of Oxford

Session abstract
This session features introductory presentations on artifact evaluation in computing publications (validating that accompanying code does indeed give the results reported in the paper) and, in the other direction, on computing techniques that help to yield correct computational simulations of abstract models, such as differential equations in an economics paper (domain-specific languages, code generation, program correctness, type safety, verification etc.). For a gentle, accessible introduction to the topic, please see: https://theconversation.com/science-relies-on-computer-modelling-so-what-happens-when-it-goes-wrong-56859

Session speakers (in order of talks)

2.1 Dr Nicola Botta, Potsdam Institute for Climate Impact Research; Professor Patrik Jansson, Chalmers University of Technology
From Numerical Simulations to Rigorous Scientific Advice: It's All About Languages!
We can simulate the evolution of coupled earth system models over thousands of years, create synthetic populations of millions of agents and analyse networks of material and energy flows on very fine scales. But can we interpret and communicate our results in a rigorous, unequivocal way? Can we build upon shared, unambiguous notions of sustainability, stability, avoidability? Can we provide accountable advice to decision making? We argue that, in spite of the amazing growth of available computing power, the gap between scientific computing and rigorous scientific advice has been widening and that computing science can play a crucial role in helping to close this gap.

2.3 Dr Camil Demetrescu, Sapienza University of Rome
Artifact Evaluations for Software Conferences
We describe an evaluation process of artifacts that complement conference publications with supplementary material (software, data etc.). The process has been widely tested in several mainstream conferences in computing since 2011. We report on the motivations, implementation and outcome of this process.

Breakout groups
Group A – Lecture Theatre
Topic: Languages for Scientific Modelling
Chair: Dr Jonathan Cooper
Scribe: Catherine Jones

Group B – Louey Seminar Room
Topic: Domain-specific languages for scientists
Chair: Dr James Cheney
Scribe: Prof Patrik Jansson

Group C – Ho Tim Seminar Room
Topic: Artifact Evaluation
Chair: Dr Camil Demetrescu
Scribe: Dr Craig Anslow
LIGHTNING TALKS (Wed 6 April, 16:00–17:15)

1. Laying the groundwork for biomedical data discovery, sharing and reuse
Speaker: Dr Susanna Assunta Sansone, Associate Director, Oxford e-Research Centre, University of Oxford; Consultant, Nature Publishing Group
The biomedical and healthcare Data Discovery Index Ecosystem (bioCADDIE, https://biocaddie.org) is a cooperative research effort to facilitate data discovery, sharing and reuse via the development of DataMed, a Data Discovery Index prototype for the NIH Big Data to Knowledge Initiative (BD2K, https://datascience.nih.gov). We need to be as transformative and impactful for data as PubMed is for the biomedical literature. This short talk presents the highlights of our journey so far.

2. On the Research Objects framework
Speaker: Professor Carole Goble, School of Computer Science, University of Manchester
To be reproducible includes bundling, along with the narrative, the other stuff: experimental methods, computational codes, data, algorithms, workflows, scripts. Some of the other stuff is hosted remotely and it also has the potential to change under your feet. Folks increasingly refer to ‘Research Objects’ as a general term for ‘stuff that supplements an article or a new currency unit for research’ or ‘something other than a paper’. But infrastructure makers who have to make, support and exchange Research Objects need more than concepts. We need frameworks, metadata specifications, reference implementations, examples. Metaphors. Researchobject.org has defined such an RO framework with reference specifications and implementations. ROs are metadata objects for explicitly describing aggregations or packages of content: boxes of components, and assembling instructions, with a shipping manifest for what is in the box and where it is from. We specify the ontologies needed to construct manifests (aggregation and annotation) and to guide their content (checklists, provenance, versioning, dependencies). The RO container needs to be implemented using off-the-shelf platforms – Zip, BagIt, Docker. I will sketch out the framework and point to some implementations.

3. Provenance in neuroimaging with NIDM Results
Speaker: Professor Thomas Nichols, Head of Neuroimaging Statistics, Department of Statistics & Warwick Manufacturing Group, Warwick University
Functional Magnetic Resonance Imaging (fMRI) is one of the early 'big data' biomedical disciplines, with an individual's raw data occupying ~1GB around 2000, and growing larger with modern acquisition techniques. This large, complex data, however, is routinely reduced to summaries that could be written on a Post-It note: a list of x,y,z coordinates of activation locations. 20 years ago, the need for this distillation was defended as there was no way to share or publish the binary image files. In the modern era, however, there is little justification for not sharing the full and detailed representation of fMRI results. I will discuss a standardization initiative to link and describe all facets of a neuroimaging experiment. The effort is called the Neuroimaging Data Model (NIDM), and is supported by the International Neuroinformatics Coordinating Facility (INCF) Neuroimaging Data Sharing Task Force. After giving an overview of the project, I will focus on NIDM-Results, the portion of the model that represents mass univariate statistical results. This standard approach represents results from the 3 most widely used software packages, covering 80% of citations in fMRI, and facilitates sharing and reuse of results, for meta-analysis in particular.

4. Software Re-use, re-purposing and reproducibility
Speaker: Catherine Jones, Software Engineering Group Leader, Scientific Computing Department, Science and Technology Facilities Council
Catherine will give an overview of the Jisc-funded project Software Reuse, Repurposing & Reproducibility, which examined issues of persistent identification of software and issues around keeping software running through use of virtual machines/Docker.

5. How to make data work for you – a data ecosystem vision by Elsevier
Speaker: Dr Wouter Haak, VP Research Data Management Solutions, Elsevier
How to get an ecosystem going where data sharing benefits all stakeholders, starting with the researcher? ‘Data sharing’ is an often heard phrase, and with the changing funder mandates to encourage open data, an immediate need. We need to look beyond data-compliance. Elsevier is building an open ecosystem for data use and data re-use, and is engaging with the scientific community to make the data work for them.

6. Bayesian uncertainty quantification for assessing model predictivity and reproducibility
Speaker: Dr Matteo Icardi, Warwick Zeeman Lecturer, Warwick University
Computer simulations of physical models are often affected by a certain degree of irreproducibility. This can be caused by the actual irreproducibility of the underlying physical problem, present also in the experiments, or it could be related to a poor mathematical formulation or parametrisation of the model. In other cases, the limited predictivity of a model is simply due to an unbalanced ratio between model complexity and information available to inform the model parameters. The first cause is often referred to as aleatoric uncertainty and the last one as epistemic uncertainty. In this short talk, I will illustrate the potential of combining Bayesian statistical techniques with physical models to obtain quantitative information about these sources of uncertainty. An example related to the simulation of subsurface flows will be presented.

7. The impact of research data publishing: Data sharing stories from Scientific Data
Speaker: Dr Varsha Khodiyar, Data Curation Editor, Nature Publishing Group
Scientific Data (Nature Publishing Group) has been publishing peer-reviewed, open access and multidisciplinary research datasets since May 2014. Reuse of shared data by the research community is well established in many fields. However, in our experience, openly published peer-reviewed data is also used by those outside of the formal research community. We highlight examples of data reuse by both the research and non-research communities.

8. I no longer know what data is
Speaker: Dr William Kilbride, Executive Director, Digital Preservation Coalition
I no longer know what data is. For some time now I have struggled with a meaningful definition. With a background in the humanities, I have never been entirely comfortable with the idea of data without theory: ‘raw data’ seems to me an oxymoron. But I am happy to give logical positivists their space in other disciplines if that is the consensus. So when asked about data, I have ducked the philosophical question and given a softer computing definition that distinguishes data from software and hardware. Data can be shared because it can be packaged and because it is independent of the tools we apply to it. Data has no on-off switch and it is not executable. But my definition no longer works – and it never really has. Practical experience tells me that data includes all sorts of internal dependencies and applications: libraries and services that exist in the space between inert data and active processes. As we hear ever greater rhetoric of data policy and data process and data value in science, so I think we need to be a bit clearer about what is in scope. I am no longer sure what data is. There is more at stake than bits and bytes.

9. OPUS - Keeping Track Of Your Research Data
Speaker: Dr Ripduman Sohan, Research Associate, Digital Technology Group, Cambridge University
EPSRC's open data requirements now mean it is necessary to track the data and metadata involved in the creation of every publication. Manual collection of this information is cumbersome, tedious and error prone, while adding automatic collection is likely to be a domain-specific, time consuming and expensive exercise. At the Cambridge Computer Laboratory we have developed a system called OPUS that assists with the task of data and metadata collection for Linux applications. OPUS seamlessly and transparently monitors the programs run on machines and records a variety of information such as the files that were accessed and the user running the program. This is achieved with negligible performance overhead. OPUS is capable of supporting reproducibility for ad-hoc workflows and individual programs. We have also adapted OPUS to satisfy the EPSRC open data requirements. During the talk we will introduce OPUS, highlight key features and outline our plans for the future.
10. Sustaining computation is key to the future of digital
Speaker: Professor Natasa Milic-Frayling, University of Nottingham / Intact Digital Venture / UNESCO
Digital is essentially computational. Without software we cannot use and experience data, content and programs. Yet a software lifespan is short for economic reasons. As demand for a specific piece of software declines, it is not economically feasible to continue maintaining it. Thus, it soon becomes outdated and obsolete. However, virtualization and emulation are techniques that can provide a technologically and economically sustainable solution to support the rare use of legacy content and data. Currently we can virtualize software applications that go 25 years back. With a concerted effort to support virtualized legacy software services, we can ensure that legacy software remains functional in the far future. The UNESCO PERSIST programme aims to establish an international bank of all software needed to enable access to Digital Heritage.

11. If I can't have the data, can I please have a pointer?
Speaker: Dr Kevin Ashley, Director of the Digital Curation Centre, University of Edinburgh
More and more data collections are being made available via safe havens or other mechanisms that provide users with the means to analyse data within them, but not to see the data itself or to export it to another environment. Whilst necessary, these do not always make it easy for researchers to know exactly what they have run their analysis on. Even those who want to do the right thing with regard to reproducibility may not be given the tools to do so.

12. Reproducible Model Development with the Cardiac Electrophysiology Web Lab
Speaker: Dr Jonathan Cooper, Department of Computer Science, University of Oxford
The promise of systems biology is to use mathematical modelling to elucidate how the behaviour of large systems emerges from interactions of components at lower scales, synthesising experimental data from different scales within quantitative hypotheses (i.e. models), with the ultimate goal of providing a predictive understanding of living systems. One reason for the limited realisation of this promise is the difficulty in relating models to experimental data robustly and reproducibly, being able to easily: test how well a model captures observed behaviours and predicts novel scenarios, and update it to incorporate new data. We are aiming to make the process of producing a new model from experimental data (i.e. model selection, parameterisation and validation) documented, automated and repeatable. Our ideas are being tested in the context of cardiac electrophysiology, arguably the most mature area of systems biology, in which modelling has underpinned many advances in our basic understanding and numerous treatments. The end result will be a community resource for cardiac researchers to develop, reproduce and compare mathematical models of cardiac cell electrophysiology with full confidence in the phylogeny and robustness of those models.

13. Toward Modular Empirical Research
Speaker: Dr Aleksi Aaltonen, Assistant Professor, Warwick Business School
The nature of empirical research varies considerably between academic disciplines. While scholarly specialisation accounts for much of the tremendous advances in modern science, it also hinders practical opportunities for cross-disciplinary collaboration as researchers find it difficult to plug in and interface different ways of doing research. The problem is: how can we make empirical research more modular? Empirical research means the production of a posteriori knowledge, that is, justifying knowledge claims by reference to observation. A necessary part of any empirical study is a process that starts from acquiring, simulating or experimentally generating data about a phenomenon of interest and then proceeds by performing analytical operations with the data. This process can be conceived as a chain of research operations with specific inputs and outputs. Unfortunately, we currently lack tools to describe, model and manage the practical steps in empirical research processes in a way that would be universally acceptable. A simple, formally well-defined approach to modelling the empirical research process could be based, for instance, on graph theory. It would help researchers to think more clearly about their practices and to develop information systems that offload administrative work to digital research infrastructures. An analogue can be found in version control systems such as GitHub or Bitbucket that provide a common infrastructure across various software development projects by offering a platform on which many different developers can develop components for complex projects. A similar research infrastructure would enhance replicability and cross-disciplinary communication, and support new types of collaboration, ultimately leading to more effective scholarship.

14. F1000Research and reproducibility
Speaker: Dr Michaela Torkar, Editorial Director, F1000Research
F1000Research is an open science publishing platform for life scientists, operating a fully transparent post-publication peer review model and encouraging the sharing of all types of studies, including negative and null findings, confirmatory studies and attempts to reproduce previous studies. F1000Research operates a strict open data policy, ensuring that authors include all the source data underlying their figures and tables. Editorial checks and the peer review aim to ensure that sufficient methodological details (and data) are provided to allow others to reproduce the research; for example, authors are asked to add Research Resource Identifiers (RRIDs) in order to unambiguously identify resources used in a study. F1000Research recently launched a channel on Preclinical Reproducibility and Robustness (http://f1000research.com/channels/PRR), specifically for reproducibility studies; the first set of papers includes 3 studies published by Amgen researchers, who could not reproduce previous findings; in all cases, all the data generated in the reproducibility attempts are deposited on the Open Science Framework repository, and the referees’ reports (and their names) have been published alongside the articles.

15. Stepping towards open science with an institutional data repository
Speaker: Robin Rice, Data Librarian, EDINA and Data Library, University of Edinburgh
Edinburgh DataShare (www.ed.ac.uk/is/datashare), an open access institutional data repository, holds over 1,000 datasets from disciplines spanning the University. It is designed as a sustainable solution for those who do not have a more appropriate disciplinary repository option, and is a bulwark of the University’s Research Data Management Policy. In 2015 we received the Data Seal of Approval peer-reviewed trusted digital repository standard. We believe the repository meets the FAIR framework of Findable, Accessible, Interoperable and Re-usable, helping our academics to make their data ‘intelligently open’ on the web (Science as an Open Enterprise, 2012).

16. Project Skye: Bridging Theory and Practice for Scientific Data Curation
Speaker: Dr James Cheney, Reader, University of Edinburgh
This talk provides a brief overview of the Skye project, funded by a five-year €1.99M ERC consolidator grant. Science is increasingly data-driven. Scientific research funders now routinely mandate open publication of publicly-funded research data. Safely reusing such data currently requires labour-intensive curation. Provenance, recording the history and derivation of the data, is critical to reaping the benefits and avoiding the pitfalls of data sharing. There are hundreds of curated scientific databases in biomedicine that need fine-grained provenance; one important example is the IUPHAR/BPS Guide to Pharmacology database (GtoPdb), a pharmacological database developed in Edinburgh. The Skye project will build support for curation into the programming language itself, building on recent research on the Links Web programming language, including advances in language-integrated query, and on provenance and data curation. Links is a strongly-typed language that provides state-of-the-art support for language-integrated query and Web programming. This project will build on Links and other recent language designs for heterogeneous meta-programming to develop a new language, called Skye, that can express modular, reusable curation and provenance techniques. To keep focus on the real needs of scientific databases, Skye will be evaluated in the context of GtoPdb and other scientific database projects. Bridging the gap between curation research and the practices of scientific database curators will catalyse a virtuous cycle that will increase the pace of breakthrough results from data-driven science. For further information on the project, please see: http://homepages.inf.ed.ac.uk/jcheney/group/skye.html#project
THURSDAY7APRIL2016
Session3–ReproducibilityforReal-TimeBigData(Thu7April,09:15-11:00)LinktoGoogledocfornotes:GroupA–https://docs.google.com/document/d/1MQq-4gByirOcpZI7BjSaH6_Evx_IsEpLXcC_PmJt-bE/edit?usp=sharingGroupB–https://docs.google.com/document/d/1FHIC_dDU8jRZbRzuxjXtc0L7e8V_5cXdtooc9BqeWUs/edit?usp=sharingGroupC–https://docs.google.com/document/d/1QOoXmtw_fLTo8ChGLxf4xIO1psFrnSF0URy7UayeOi8/edit?usp=sharing
Sessionchair ProfessorDaviddeRoure,UniversityofOxford
Sessionabstract
Reproducibilityinnewdigitalscholarship–bigger,faster,better?Newareasofscholarshiparecharacterisedbymachinesandpeopleoperatingtogetheratscale:widespreadadoptionofnewtechnologiesleadstomassivedatageneration,whileatthesametimewehavecrowd-scalepersonalengagementwiththedataanditsanalysis.Thisdemocratisationandempowermentleadstoentirelynewsocialprocesses,whichaffordnewopportunitiestoconductscholarship,suchasthroughcitizenscience,andwhichthemselvesdemandscholarlyexamination.Forexample,howdowereproduceexperimentsinsocialmediaanalytics,whichexaminenewsocialprocesses,atthescaleofthepopulation,inrealtime?Thissessionexploresthechangingscholarlylandscapeandthenewchallengesinreproducibility.
Sessionspeakers(inorderoftalks)
3.1DrSuzyMoat,UniversityofWarwick
SensinghumanbehaviourwithonlinedataOureverydayusageoftheInternetgenerateshugeamountsofdataonhowhumansexchangeinformation.Inrecentwork,wehaveinvestigatedwhetherdatafromsourcessuchasGoogle,WikipediaandFlickrcanbeusedtomeasureandevenpredicthumanbehaviourintherealworld.Inthistalk,Iwillgiveanoverviewoftheopportunitiesandchallengesforreproducibilitycreatedbyworkingwiththesenewformsofdata.
3.2ProfessorDaviddeRoure,UniversityofOxford
TheEthicsofAutomation–adystopianviewofourevolvingknowledgeinfrastructureTodayweembracethemethodologicalchallengesofbigandrealtimedata,fromthelargehadroncollidertothelargepeoplecollider,knownassocialmedia.Thisisallausefulrehearsal,buttherealchallengeslieahead.TheInternetofThings,deployedinourcities,cars,homesandbodies,bringsyetmoredata—machinetomachine.Meanwhile,theengagementofthecrowdinanalyticsmaysoonbeoutofitssweetspot,aswegivewaytohumanstrainingthemachinestoprocessatrisingscale.Clearlythefutureisincreasinglyautomated,butwhatdoesthismeanforresearch,andforresearchcommunication?Thistalkwillprovideasocialmachinesperspectiveonourknowledgeinfrastructureandlookatourincreasinglyautomatedfuture,askingwhetheritismeaningfultoautomatereproducibility,andifandhowweshouldkeepthehumanintheloop.
Breakout groups

Group A – Lecture Theatre
Topic: TBC
Chair: Prof David de Roure
Scribe: Dr Jonathan Cooper

Group B – Louey Seminar Room
Topic: Reproducibility in the social sciences: does the arrival of large online data sources make the outlook better or worse?
Chair: Dr Suzy Moat
Scribe: Dr Clare Dyer-Smith and Lucie Burgess

Group C – Ho Tim Seminar Room
Topic: TBC
Chair: Prof Eric Meyer
Scribe:
Session 4 – Publication of Data-Intensive Research (Thu 7 April, 11:15-13:00)
Link to Google doc for notes:
Group A – https://docs.google.com/document/d/1_jHOHQSDG5pvExPWyNZFp70Y_eyDlUYB543i4zKMhbY/edit?usp=sharing
Group B – https://docs.google.com/document/d/1uHB--IyDRAUl8DTfafWBT2moj65s98hBsipbRTe0M24/edit?usp=sharing
Group C – https://docs.google.com/document/d/1XTd-pLfFj0sRvFIcLZfiaMTk039Eh3jLUEwy-QZXXFo/edit?usp=sharing
Session chairs:
Professor Carole Goble, University of Manchester
Dr David Crotty, Oxford University Press
Mr Richard O’Beirne, Oxford University Press
Session abstract
The publication of data-intensive research
As the methods and outputs of research change, what are the issues surrounding the publication of data-intensive research? This session will discuss the role of software with regards to reproducibility, and how this ties to skills, funder policy and publishing, as well as an editor in chief’s view of data citation: what works, what doesn’t, what requires more education, and what is needed to make sure data is as reusable as possible.
Session speakers (in order of talks)
4.1 Dr Laurie Goodman, GigaScience
Overcoming Hurdles to Data Publication
Although data publication is not new (Charles Darwin’s A Naturalist’s Voyage Around the World is a ‘classic’ example), the idea of data being a publishable entity has recently become a prominent part of researcher and publisher conversations. Here, as the start of a discussion on data publication, I will present what the journal GigaScience has been doing with regard to data publication, the need for data publication, and responses from the community.
4.2 Mr Neil Chue Hong, Software Sustainability Institute
There's No Such Thing As Irreproducible Research (Software Credit edition)
How often have we read a story in a newspaper, and despaired over a lack of data to back it up? The availability and accessibility of data, software and other research outputs is fundamental to good research, yet many barriers still exist. And, whilst open data is moving us forward, we risk being stalled by our software. This talk calls out these issues and silos, and examines how we have to change our attitudes to ‘shame’ if research is to survive public criticism.
Breakout groups

Group A – Lecture Theatre
Topic: Overcoming barriers to data publication
Chairs: Dr David Crotty and Dr Laurie Goodman
Scribe: Dr Rob Davidson

Group B – Louey Seminar Room
Topic: Will publication of code alongside data solve the reproducibility problem?
Chairs: Mr Richard O’Beirne and Mr Neil Chue Hong
Scribe:

Group C – Ho Tim Seminar Room
Topic: Three things we can do today: Call to Arms
Chair: Dr Simon Hodson
Scribe:
Session 5 – Novel architectures and infrastructure to support reproducibility (Thu 7 April, 14:00-15:45)
Link to Google doc for notes:
Group A – https://docs.google.com/document/d/1QwJ6MPaFWtoX1vJEVYVmvRrEls9u3p-RIjGVCeVD_68/edit?usp=sharing
Group B – https://docs.google.com/document/d/1wjIoAT8Fk54URHtrgg5natS2b-jquhnADYA5dm3cS0c/edit?usp=sharing
Group C – https://docs.google.com/document/d/1Gv-51ox41mmGF5Ewno4MUMw4VdMBWRv9Fl8WYd2Bj-E/edit?usp=sharing
Session chairs:
Dr Richard Mortier, University of Cambridge
Dr Adam Farquhar, British Library
Session abstract
Future Architectures and Infrastructures
Recent years have seen dramatic advances in computing infrastructure that support reproducibility. Virtual machines, cloud computing, containers all provide means to capture and replicate the environment in which code must run. This session will consider how these infrastructure technologies are being used, and how some of them are currently being developed, before considering what specific new requirements arise from reproducibility for data intensive research.
Session speakers (in order of talks)
5.1 Dr Kenji Takeda, Microsoft Research
Reproducibility and sustainability using cloud computing
We will discuss how researchers around the world are exploring and using cloud computing as a core part of their reproducibility and sustainability plans across many disciplines, and what the future looks like. We will describe how linking scholarly communications across the web also provides exciting opportunities ahead, including through our new Academic Knowledge Graph service - https://www.projectoxford.ai/academic. We will look to offer cloud computing awards to participants of the workshop, and want to make sure everyone can take advantage of this as a potential positive outcome of the event – www.azure4research.com
5.2 Dr Richard Mortier, University of Cambridge
Unikernels: Evolving Containers and Virtual Machines
Virtual Machines and containers have revolutionised software development and system operations. Providing means to capture environmental dependencies is a boon to anyone wishing to reliably reproduce a software environment, whether for devops or science. However, neither is a panacea. A third, more recent option, is now coming to light: unikernels. I will briefly introduce unikernels and indicate some of the ways in which they may be relevant to future development of reproducible data science systems.
Breakout groups

Group A – Lecture Theatre
Topic: Future technical requirements for reproducible data science
Chair: Dr Richard Mortier
Scribe: Dr Paolo Missier

Group B – Louey Seminar Room
Topic: War stories and tall tales: case studies of reproducibility
Chair: Dr Adam Farquhar
Scribe: Mr Brian Matthews

Group C – Ho Tim Seminar Room
Topic: Business structures and commercial pressures
Chair: Dr Kenji Takeda
Scribe:
Organisers and speakers bios
Professor Dorothy Bishop FMedSci, FBA, FRS is a Wellcome Trust Principal Research Fellow and Professor of Developmental Neuropsychology at the University of Oxford, where she heads a programme of research into children’s communication impairments. She is a supernumerary fellow of St John’s College Oxford. Her main interests are in the nature and causes of developmental language impairments, with a particular focus on psycholinguistics, neurobiology and genetics. She is also active in the field of open science and research reproducibility, and she chaired a symposium on reproducibility at the Wellcome Trust last year. As well as publishing in conventional academic outlets, she writes a popular blog with personal reactions to scientific and academic matters (Bishopblog) and tweets as @deevybee.
Dr Nicola Botta is a senior scientist at the Potsdam Institute for Climate Impact Research (PIK). He received a PhD in engineering from ETH Zürich in 1994. He has worked at DLR (national aeronautics and space research centre), Göttingen, at the FU (Freie Universität) Berlin and, since 1998, at PIK. He has published in high-impact journals in computational fluid mechanics, parallel computing, agent-based modelling and program specification. His main research interests are program specification and development and dependently typed languages.
Lucie Burgess is Associate Director for Digital Libraries at the Bodleian Libraries, University of Oxford, and a Senior Research Fellow at Hertford College, Oxford. Lucie leads the Bodleian Digital Library Systems and Services team of 40 staff and is a member of the Bodleian Executive. Lucie is a member of Oxford University’s IT Committee and Digital Strategy committee; is a board member of the Digital Preservation Coalition and is the Jisc representative to the ArXiv.org member advisory board and scientific advisory board. From 2007-2014 Lucie worked at the British Library, where as Head of Strategy and Planning she led the development of the British Library’s 2020 Vision. She also led the UK Legal Deposit libraries’ efforts to extend legal deposit to the digital domain. Lucie has also worked in publishing, business development and strategy at United Business Media, a FTSE-250 information services company, and for Arthur Andersen Business Consulting. Lucie began her career working with the United Nations Framework Convention on Climate Change secretariat in Bonn, Germany. Lucie has a Master’s degree in Physics from Hertford College, University of Oxford.
Mr Neil Chue Hong is the founding Director and PI of the Software Sustainability Institute, a collaboration between the universities of Edinburgh, Manchester, Oxford and Southampton. He enables research software users and developers to drive the continued improvement and impact of research software. From 2007-2010, he was Director of OMII-UK at the University of Southampton, which provided and supported free, open-source software for the UK e-Research community. In addition to sitting on several project advisory committees, he is the Editor-in-Chief of the Journal of Open Research Software, the current Advisory Council chair of the Software Carpentry Foundation, co-author of ‘Best Practices for Scientific Computing’ and ‘An Open Science Peer Review Oath’, and a member of the EPSRC Strategic Advisory Team on e-Infrastructure.
Dr David Crotty is the Editorial Director, Journals Policy for Oxford University Press. He oversees journal policy and contributes to strategy across OUP’s journals program, drives technological innovation, serves as an information officer, and manages a suite of research society-owned journals. David was previously an Executive Editor with Cold Spring Harbor Laboratory Press, creating and editing new science books and journals, and was the Editor in Chief for Cold Spring Harbor Protocols. David received his PhD in Genetics from Columbia University and did developmental neuroscience research at Caltech before moving from the bench to publishing. David has been elected to the STM Association Board and serves on the interim Board of Directors for CHOR Inc., a not-for-profit public-private partnership to increase public access to research. As the Executive Editor of the Society for Scholarly Publishing's Scholarly Kitchen blog, David regularly writes about the intersection of technology and publishing.
Dr Camil Demetrescu conducts research at the intersection of different areas in computing, including programming languages and systems, algorithms and data structures, and software engineering. His research activity focuses on the design of efficient algorithms, tools, and techniques for engineering the performance of software systems, with particular emphasis on performance analytics, incremental algorithms, and data streaming. He has been principal investigator and site coordinator of many research projects. Camil Demetrescu has been a visiting scientist at Microsoft Research, Silicon Valley, at the AT&T Research Laboratories, Florham Park, and at the IT University of Copenhagen. His Ph.D. thesis was awarded one of the two 2002 Prizes of the Italian Chapter of the EATCS for the best dissertations in Theoretical Computer Science. He has served as steering committee member and program committee chair of premier conferences such as the European Symposium on Algorithms. Camil Demetrescu regularly serves on the program and artifact evaluation committees of premier international conferences. He has organized several scientific events at Bertinoro and Dagstuhl and is the general chair of the 30th European Conference on Object-Oriented Programming (ECOOP 2016). He is a member of the editorial board of the "Mathematical Programming Computation" (MPC) journal.
Dr Adam Farquhar is Head of Digital Scholarship at the British Library, where he and his team focus on establishing services for researchers that take full advantage of the possibilities that digital collections and data present across all formats and subjects. He is principal investigator for the British Library Labs project; co-ordinates the THOR project that will provide seamless identifier services for researchers and data; member of the International Image Interoperability Framework (IIIF) Consortium executive committee; Director of the Endangered Archives Programme that works with teams around the globe to preserve archival material that is in danger of destruction, neglect or physical deterioration; President of DataCite, an international association dedicated to making it easier to identify, cite, and reuse scientific data; and founder and Board member of the Open Preservation Foundation. He has been responsible for the Library’s maps, newspaper, photographic, audio and moving image collections. Before joining the Library, he was the principal knowledge management architect for Schlumberger and research scientist at the Stanford Knowledge Systems Laboratory.
Professor Jeremy Gibbons is Professor of Computing in the Department of Computer Science at Oxford University, where he is director of the part-time professional postgraduate Software Engineering Programme, and head of the Programming Languages and Software Engineering research theme. His research interests are in domain-specific languages, functional programming and the mathematics of program construction.
Professor Carole Goble is Professor of Computer Science at the University of Manchester, UK and co-founder of the Software Sustainability Institute UK. For the past 20 years she has led a research and development team working on e-Infrastructure, platforms and tools to enable scientists to share assets of all kinds, interoperate resources and represent knowledge. She has worked in all areas of Science but particularly the Life Sciences. Her systems and services serving the needs of Reproducible Research include: Research Objects (researchobject.org), workflow systems (Apache Taverna www.taverna.org.uk), and sharing platforms (SEEK seek4science.org, myExperiment myexperiment.org, BioCatalogue biocatalogue.org). She is a founding member of the Scholarly Communications organisation Force11.org and serves on international WGs for data and software citation, dataset publishing, workflow interoperability and identifier management. She co-leads the EU RI ELIXIR for life science data interoperability stream and the FAIRDOM (fair-dom.org) initiative for reproducibility of Systems Biology projects. She has given many keynotes on the topic of reproducibility.
Dr Laurie Goodman is the Editor-in-Chief for the international open-access open-data journal GigaScience, co-published by BGI and BioMed Central. Dr. Goodman received her BS and MS from Stanford University in 1986, and PhD in Biochemistry and Molecular Biology from the University of Chicago in 1991. During her graduate work, she published a novel, A Spell of Deceit, with Del Rey Books. She completed a postdoctoral fellowship at the University of Colorado at Boulder then left the bench in 1995 to work as Assistant Editor at Nature Genetics. In 1997, she moved to Cold Spring Harbor Laboratory Press to serve as the Executive Editor of Genome Research and Managing Editor of Learning & Memory. In 2006, she started her own company, Goodman Writing & Editing, which provides a variety of services including manuscript writing seminars and high-level editing of scientific manuscripts, with a specialty in editing manuscripts from non-native English speakers. ORCID ID: 0000-0001-9724-5976.
Professor Patrik Jansson is a Full Professor of Computer Science and Head of the division of Software Technology at Chalmers University of Technology, Sweden. His research area is Software Technology and he has worked on Generic Programming, Functional Programming and Dependent Type Theory. In parallel he has spent a few years building a multi-disciplinary community and research agenda called "Global Systems Science" together with researchers in economics, climate change, risk & resilience, etc. (in addition to computer science and mathematics). Currently he works on Domain Specific Languages in the Centre of excellence for Global Systems Science [1], in the GRACeFUL project [2] and in the DSLs of Math project [3]. Twitter profile [4]: Computer scientist, Haskell hacker, catalyst of research ideas, likes to connect the big picture with formal details, software & language technology advocate.
[1] http://coegss.eu/
[2] https://www.graceful-project.eu/
[3] https://github.com/DSLsofMath/DSLsofMath
[4] https://twitter.com/patrikja
Dr Paolo Missier is Reader in Large-Scale Information Management with the School of Computing Science, Newcastle University, UK. He joined academia in 2004, after a prior career as a Research Scientist at Bell Communications Research, USA (1994-2001), as a Research Fellow at the University of Manchester, School of Computer Science (2004-2011). His current research interests are in large-scale metadata analytics (i.e. the application of predictive data analytics techniques to large corpora of metadata), and data provenance management and analysis in particular. Between 2012 and 2013, Paolo contributed to the W3C Working Group on Provenance on the Web and co-edited the PROV standard. He currently holds a 3-year project grant from EPSRC for the ReComp project. Aimed at building a metadata management infrastructure, ReComp will enable decision-making on the needs and opportunities to re-compute expensive data analytics tasks as their outcomes lose value over time. At Newcastle, Missier is also responsible for the Post-graduate (MSc) module on Big Data Analytics, part of the MSc program in Cloud Computing for Big Data Analytics. Paolo holds a PhD in Computer Science from the University of Manchester, UK (2007), an MSc in Computer Science from University of Houston, Texas, USA (1993) and a BSc and MSc in Computer Science from Università di Udine, Italy (1990).
Dr Suzy Moat is an Associate Professor of Behavioural Science at Warwick Business School, where she co-directs the Data Science Lab. Her research investigates whether data on our usage of the Internet, from sources such as Google, Wikipedia and Flickr, can help us measure and even predict human behaviour in the real world. Moat’s work touches on problems as diverse as linking online behaviour to stock market moves (with Preis, Curme, Stanley, et al.), estimating crowd sizes (with Botta and Preis) and evaluating whether the beauty of the environment we live in might affect our health (with Seresinhe and Preis). The results of her research have been featured by television, radio and press worldwide, by outlets such as CNN, BBC, The Guardian, Wall Street Journal, New Scientist and Wired. With her collaborator and Data Science Lab co-director, Tobias Preis, she recently led an online course on using big data to measure and predict human behaviour which attracted over 15,000 learners. Moat has also acted as an advisor to government and public bodies on the predictive capabilities of big data.
Professor Luc Moreau is Professor of Computer Science and Head of the Web and Internet Science group (WAIS), in the Department of Electronics and Computer Science (ECS) at the University of Southampton. Luc was co-chair of the W3C Provenance Working Group, which resulted in four W3C Recommendations and nine W3C Notes, specifying PROV, a conceptual data model for provenance on the Web, and its serializations in various Web languages. Previously, he initiated the successful Provenance Challenge series, which saw the involvement of over 20 institutions investigating provenance inter-operability in 3 successive challenges, and which resulted in the specification of the community Open Provenance Model (OPM).
Dr Richard Mortier is a member of faculty in the Systems Research Group at the Cambridge University Computer Lab. Past work includes Internet routing, distributed system performance analysis, network management, aesthetic designable machine-readable codes, and home networking. He works in the intersection of systems and networking with human-computer interaction, and is currently focused on how to build user-centric systems infrastructure that enables people to better support themselves in a ubiquitous computing world through Human-Data Interaction.
Professor Thomas Nichols is a Professor of Neuroimaging Statistics and a Wellcome Trust Senior Research Fellow at the University of Warwick, holding a joint position between Warwick Manufacturing Group & the Department of Statistics. He is a statistician with a solitary, 20-year focus on modelling and inference methods for brain imaging research. Before graduate studies, he worked as a programmer and statistician at the University of Pittsburgh's Positron Emission Tomography Facility. He earned his PhD in Statistics at Carnegie Mellon University with cross-training in Cognitive Neuroscience, and in 2000 joined the faculty at the Department of Biostatistics at the University of Michigan. He had a 3-year sojourn in industry, working at GlaxoSmithKline's Clinical Imaging Centre, London, where he developed methods for fMRI clinical trials and imaging genetics studies. In 2009 he received the Wiley Young Investigator Award from the Organization for Human Brain Mapping in recognition of his contributions to statistical modeling & inference of neuroimaging data.
Richard O’Beirne, Digital Strategy Manager, Oxford University Press, has worked in digital publishing since 1994 and with OUP since 2004. A strong advocate for the importance of standards compliance and open collaboration, he represents OUP on a number of publishing industry bodies.
Professor David de Roure PhD FBCS MIMA, is Professor of e-Research at the University of Oxford and Director of the Oxford e-Research Centre. He has strategic responsibility for Digital Humanities at Oxford within TORCH (The Oxford Research Centre in the Humanities), collaborates in Oxford's Web Science laboratory with the Oxford Internet Institute, and is a member of the Oxford Cyber Security Network. For several years he has also held ESRC roles, as director of Digital Social Research and as a strategic advisor in the area of new forms of data and real time analytics. Focused on advancing scholarship using innovative digital methods, David works closely with multiple disciplines including social sciences (studying social machines), digital humanities (computational musicology), computer science (large scale distributed systems, social computing, Internet of Things) and previously sciences and social statistics. He has extensive experience in hypertext, Web, Linked Data, and scientific workflows. Drawing on this broad interdisciplinary background he is a frequent speaker and writer on digital scholarship and the future of scholarly communications.
Dr Kenji Takeda is Solutions Architect and Technical Manager for Microsoft Research. He is currently focussed on Azure for Research, Azure Machine Learning and academic outreach in Europe, Middle East and Africa. He is working with researchers across disciplines to best understand the use of cloud computing to accelerate their research. He has extensive experience in Cloud Computing, High Performance and High Productivity Computing, Data-intensive Science, Scientific Workflows, Scholarly Communication, Engineering and Educational Outreach. He also has research expertise in the areas of aerodynamics, aeroacoustics and flight simulation. He has a passion for developing novel computational approaches to tackle fundamental and applied problems in science and engineering. Kenji advises funding agencies and government bodies on policy and innovation. He is on the editorial board for the Journal of Open Research Software, and steering committees for major research consortia and international conferences. He is an advocate of open science and reproducible research.
Professor Jared Tanner is Professor of the Mathematics of Information in the Mathematics Institute at Oxford University and a Fellow at Exeter College; Oxford University’s liaison director for the Alan Turing Institute; SIAM UKIE Vice President (2011-2013) and Founding Editor-in-Chief of Information and Inference: A Journal of the IMA. Prof Tanner’s research focus is on the design, analysis, and application of numerical algorithms for information-inspired applications in signal and image processing.