goble keynote vivo-scits2014

Post on 10-May-2015


DESCRIPTION

Research Objects for FAIRer Science - Shared Keynote presentation at VIVO and Science of Team Science Joint Conference, 6-8 August 2014, Austin Texas

TRANSCRIPT

Research Objects for FAIRer Science

Professor Carole Goble CBE FREng FBCS
The University of Manchester, UK
carole.goble@manchester.ac.uk

VIVO/SciTS Conferences 6-8 August 2014, Austin, TX

Scientific publications have at least two goals: (i) to announce a result and (ii) to convince readers that the result is correct

…papers in experimental science should describe the results and provide a clear enough protocol to allow successful repetition and extension

Jill Mesirov Accessible Reproducible Research

Science 22 Jan 2010: 327(5964): 415-416 DOI: 10.1126/science.1179653

Virtual Witnessing*

*Leviathan and the Air-Pump: Hobbes, Boyle, and the Experimental Life (1985) Shapin and Schaffer.


Capturing, representing, sharing the information needed to understand how a research result came about.

Context of results
• Inputs, outputs, process…

Context of resources
• Instruments, data, software, people…

“An article about computational science in a scientific publication is not the scholarship itself, it is merely advertising of the scholarship. The actual scholarship is the complete software development environment, [the complete data] and the complete set of instructions which generated the figures.” David Donoho, “Wavelab and Reproducible Research,” 1995

datasets, data collections, standard operating procedures, software, algorithms, configurations, tools and apps, codes, workflows, scripts, code libraries, services, system software infrastructure, compilers, hardware

Morin et al, Shining Light into Black Boxes, Science 13 April 2012: 336(6078) 159-160

Ince et al The case for open computer programs, Nature 482, 2012

“I can’t immediately reproduce the research in my own laboratory. It took an estimated 280 hours for an average user to approximately reproduce the paper.”

Phil Bourne, NIH Big Wig for Data Science

a reproducibility paradox

big, fast, complicated, multi-step, multi-type, multi-field

greater expectations of reproducibility

DIY publishing, greater access

Systems Biology Collaborations

Modelling Cycle

45 organisations 112 organisations

Data

Models

Articles

External Databases

http://www.seek4science.org

Metadata

http://www.isatools.org

Ontology-driven Aggregated Content Infrastructure (Framework) for building Sys Bio Commons

sharing and interlinking multi-stewarded, mixed methods, models, data, samples…

Standards

DCAT, FOAF

Yellow Pages

Careful Sharing Options

Commons

Investigations, Studies, Assays

Towards Interoperable Bioscience Data, Nature Genetics, 2012

Standards, Structure, Interlink

Just Enough Results Model for things produced and used in experiments

Construction data

Validation data

Metabolomics

Mass Spec

Transcriptomics

Proteomics

Fluxomics

Publications

Mix of locally & remotely hosted content

Open Modelling Exchange Format Archive

Wolstencroft et al, Proc ISWC 2013

Just Enough Results Model for stuff in experiments

Common elements

Data type specific elements

Experimentalists, modellers & developers
Cross-site, cross-project collaboration
Knowledge network

Building the System: Building a Cult

TRUST

VISION

SETTING EXPECTATIONS

Drink together
Work together

• Collaboration – Complementarity correlation

• Modellers share more than Experimentalists

• Experimentalists reuse models more than Modellers

• Active enclave sharing

• Public sharing tricky even after publication, bribery and threats

• Data Hugging, Flirting and Voyeurism

• Playground rules apply

• Fluid, transient collaborations > membership mgt pain in a*se

• Shameless exploitation of PI competitiveness & vanity

• PI & Funder leadership

• Pan project spawned collaborations – YES!!!!

• But not necessarily visible to us.

Data discovery

Data assembly, cleaning, and refinement

Ecological Niche Modeling

Statistical analysis

Data collection

Insights

Scholarly Communication & Reporting

Enclosed sea problem (Ready et al., 2010)

Pilumnus hirtellus

Scientific Workflows

BioSTIF

method

instruments and laboratory

materials


Method Matters!

Workflow Commons

"Mapping present and future predicted distribution patterns for a meso-grazer guild in the Baltic Sea" by Sonja Leidenberger et al

1st International Workshop on Social Object Networks (SocialObjects 2011), Boston, October 9th 2011.

Find, Click ‘n’ Go
File ‘n’ Forget

Specialist Curators


Properties: What would you ask a publication if you could?

Identity and Description, Uniqueness, Authenticity: Who are you? Where and when were you born? Who were your parents (creators)?

Review, Reuse, and Repurpose: For which purpose were you conceived, and how have you been used?

Inspection, Visualization, Annotations: What do you have inside?

Representation: How is your content structured?

Access Rights: May I access all your parts?

Adaptability: Which parts can I replace?

Evolution & Versioning, Provenance: What have they done to you? Who and when? Why did they do that?

Quality: Why are you relevant to me? Can I believe what you are saying or trust your results?

Reproducibility: Do you still produce the same results?

Fitness: Are you still working? How could I repair you?

Credit and attribution: How could I thank you? How could I talk about you?

From Manuscripts

to “Research Objects”

A meme

The multi-dimensional paper

Packs


www.datafairport.org

What is a Research Object?

Howard Ratner, STM Innovations Seminar 2012
was: Chair STM Future Labs Committee, CEO EVP Nature Publishing Group
now: Director of Development for CHORUS (Clearinghouse for the Open Research of the United States)

http://www.youtube.com/watch?v=p-W4iLjLTrQ&list=PLC44A300051D052E5

http://www.myexperiment.org/packs/196.html

What The Commons* Is and Is Not

Is Not:
– A database
– Confined to one physical location
– A new large infrastructure
– Owned by any one group

Is:
– A conceptual framework
– Analogous to the Internet
– A collaboratory
– A few shared rules
  • All research objects have unique identifiers
  • All research objects have limited provenance

Philip E. Bourne Ph.D.
Associate Director for Data Science, National Institutes of Health
http://www.slideshare.net/pebourne

*The NIH BD2K Commons Framework: $100 million in 2015

Social Objects

carriers of discourse

http://www.researchobject.org/

A Framework to Bundle and Relate multi-hosted (digital) resources of a scientific experiment or investigation using standard mechanisms & uniform access protocols. Carriers of Research Context

Outputs are first class citizens to be managed, credited and tracked: data, software

Research Objects

Links

• Recording & linking together the components of an experiment

• Linking across experiments.

Preserve, Archive

Reproduce*, Recompute, Reuse, Train & Explain

Exchange, Remix, Fix

* a word that means many things…..

re-compute, replicate, rerun, repeat, re-examine, repurpose, recreate, reuse, restore, reconstruct, review, regenerate, revise, recycle, regenerate the figure, redo

Results may vary

repeat replicate

Drummond C, Replicability is not Reproducibility: Nor is it Good Science, online

Peng RD, Reproducible Research in Computational Science, Science 2 Dec 2011: 1226-1227.

Experiment
Methods (techniques, algorithms, spec. of the steps)
Materials (datasets, parameters, algorithm seeds)

Setup
Instruments (codes, services, scripts, underlying libraries)
Laboratory (sw and hw infrastructure, systems software, integrative platforms)

reuse, reproduce

Executable Research Object

same experiment, same set up, same lab

same experiment, same set up, different lab

same experiment, different set up

different experiment, some of same

Validate

reuse, reproduce, repeat, replicate

http://www.biomedcentral.com/biome/carole-goble-on-reproducible-research-what-it-really-means-how-to-reach-it/
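
A minimal sketch of how the four terms could be told apart, assuming the slide pairs them with the four conditions as follows: repeat (same experiment, same set up, same lab), replicate (same experiment, same set up, different lab), reproduce (same experiment, different set up), reuse (different experiment, some of same). That pairing is a reading of the slide layout, and the `Attempt` class and `classify` function are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Attempt:
    """How a new run relates to the original study (illustrative fields)."""
    same_experiment: bool  # same methods and materials
    same_setup: bool       # same instruments and laboratory configuration
    same_lab: bool         # carried out by the original team


def classify(attempt: Attempt) -> str:
    """Map an attempt onto repeat / replicate / reproduce / reuse.

    Assumed pairing, read off the slide:
      repeat    = same experiment, same set up, same lab
      replicate = same experiment, same set up, different lab
      reproduce = same experiment, different set up
      reuse     = different experiment, some of the same components
    """
    if not attempt.same_experiment:
        return "reuse"
    if not attempt.same_setup:
        return "reproduce"
    return "repeat" if attempt.same_lab else "replicate"


if __name__ == "__main__":
    # An independent group rerunning the same experiment on the same set up.
    print(classify(Attempt(same_experiment=True, same_setup=True, same_lab=False)))
```

The checks run from the loosest condition to the strictest, so the lab only matters once the experiment and set up are both unchanged.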

Design

Execution

Result Analysis

Collection

Publish / Report

Peer Review

Peer Reuse

Modelling

Can I repeat & defend my method?

Can I review / reproduce and compare my results / method with your results / method?

Can I review / replicate and certify your method?

Can I transfer your results into my research and reuse this method?

* Adapted from Mesirov, J. Accessible Reproducible Research Science 327(5964), 415-416 (2010)

Research Report

Prediction

Monitoring

Cleaning

specialist codes libraries, platforms, tools

services

(cloud) hosted services

commodity platforms

data collections
catalogues
software repositories

my data, my process, my codes

integrative frameworks

gateways

data carpentry

http://software-carpentry.org/

Components & Dependencies

• 35 kinds of annotations
• 5 Main Workflows
• 14 Nested Workflows
• 25 Scripts
• 11 Configuration files
• 10 Software dependencies
• 1 Web Service
• Dataset: 90 galaxies observed in 3 bands
• Multiple platforms
• Multiple systems

José Enrique Ruiz (IAA-CSIC)

Galaxy Luminosity Profiling

Executable Instrument Entropy

Zhao, Gomez-Perez, Belhajjame, Klyne, Garcia-Cuesta, Garrido, Hettne, Roos, De Roure and Goble. Why workflows break - Understanding and combating decay in Taverna workflows, 8th Intl Conf e-Science 2012

Mitigate
Detect, Repair
Preserve

Partial replication
Approx. reproduction
Verification
Benchmarks

Executable Instrument Entropy: Prepare to Repair

Reproducibility by Inspection: Read It

Reproducibility by Invocation: Run It

Document Instrument

[Adapted Freire, 2013]

provenance
gather dependencies
capture steps
track & keep results

portability
variability tolerance
preservation
packaging
versioning

open
accessible
available
machine actionable

description
intelligible
machine-readable

[Adapted Freire, 2013]

Authoring
Exec. Papers
Link docs to experiment

Sweave

Provenance Tracking, Versioning

Replay, Record, Repair

Workflows, makefiles

ProvStore


[Adapted Freire, 2013]


host

service

Open Source/Store

Sci as a Service

Integrative fws

Virtual Machines: Recompute, limited installation, Black Box, Byte execution, copies

Descriptive read, White Box, Archived record

Read & Run, Co-location, No installation

Portable Package: White Box, Installation, Archived record

[Adapted Freire, 2013]


ReproZip


No Green Fields No One System

Find, Access, Interop, Reuse
Porting across Platforms
Exchange between Systems
Comparing across Labs

Identity

Description

Packaging

Refer to aggregations and their resource contents

Interpretation: What does it mean? How can I compare with others? How is it linked together and linked to others?

Describe aggregation structure and its constituent parts
Container regardless of host

FAIR RO Core Model

manifest

Uniform and first class handling of diverse types (data, software, workflows…)

FAIR RO Core Model

Identity: DOIs, URIs, Handles, ORCID

Annotation: W3C OAM (Open Annotation Model)

Aggregation: OAI-ORE (OAI Object Reuse and Exchange)

FAIR RO Core Model

Identity: DOIs, URIs, Handles, ORCID. Persistence and resolution, citation.

Annotation: W3C OAM. First class and stand-off.

Aggregation: OAI-ORE. Aggregations, resource maps, proxies.
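
The three parts of the core model (identity, aggregation, annotation) can be pictured as a small data structure. What follows is a sketch under stated assumptions, not the RO model ontology itself: the class names, field names, and example URIs are hypothetical, and nothing here is serialised as OAI-ORE or Open Annotation. It only shows the shape of the idea: an identified object that lists its resources by URI and carries stand-off annotations pointing at them.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Annotation:
    """A stand-off annotation: stored beside its target, not inside it."""
    target: str        # URI of the annotated resource, or of the RO itself
    body: str          # a literal description or the URI of a description
    creator: str = ""  # e.g. an ORCID URI (left empty here)


@dataclass
class ResearchObject:
    identifier: str    # resolvable identity, e.g. a DOI or URI
    aggregates: List[str] = field(default_factory=list)
    annotations: List[Annotation] = field(default_factory=list)

    def aggregate(self, uri: str) -> None:
        """Add a resource by reference; the content may be hosted anywhere."""
        if uri not in self.aggregates:
            self.aggregates.append(uri)

    def annotate(self, target: str, body: str, creator: str = "") -> Annotation:
        """Attach an annotation to the RO or to one of its aggregated resources."""
        if target != self.identifier and target not in self.aggregates:
            raise ValueError(f"{target} is not part of this research object")
        annotation = Annotation(target, body, creator)
        self.annotations.append(annotation)
        return annotation

    def annotations_about(self, target: str) -> List[Annotation]:
        return [a for a in self.annotations if a.target == target]


if __name__ == "__main__":
    # All URIs below are placeholders.
    ro = ResearchObject("https://example.org/ro/1")
    ro.aggregate("https://example.org/data/galaxies.csv")
    ro.aggregate("https://example.org/workflows/profile.t2flow")
    ro.annotate("https://example.org/data/galaxies.csv", "input dataset")
    print(len(ro.aggregates), len(ro.annotations))
```

Keeping annotations separate from the resources is what lets an RO describe content it does not host or control.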

FAIR RO Core Platforms

Identity: DOIs, URIs, Handles, ORCID, Data Citation Implementation

Annotation: W3C OAM

Aggregation: OAI-ORE

Distributed Third Party Tenancy

Alien Store

Aggregation: Carrier of Research Context
• Identifiable, citable, resolvable
• Uniform Management
• Mixed Stewardship
• Decay & Graceful Degrade
• Content & Aggregation Lifecycles
• Annotations
• Manifests, Recipes, Permissions, Discourse

Aggregations
• Dispersed / Encapsulated
• External (linked) / Local
• Mixed types
• Blackboxes
• Virtual / Materialised

Content Resources
• Aggregations themselves
• In many aggregations
• Virtual / Materialised
• Open / Closed

TARDIS: Time and Relative Dimension in SpaceScience

RO Model Ontology

• RO Management
  – Transportation / Access / Citation
  – Id location of RO “container”
  – Provenance of RO & contents
  – Behaviour/lifecycle of RO & contents
  – Policies

• RO Interpretation
  – What the RO and its content mean
  – How they can be compared and validated
  – How they can be used, executed, linked

• Interpretation variations
  – Type (e.g. Workflows)
  – Discipline (e.g. Biology)
  – Task (e.g. Discovery, Execution)
  – Activity (e.g. Experiment)

Progression Levels
Management and Interpretation for Integrated Applications


Checklists

Versioning

Provenance

Dependencies

More Stakeholders & Services
Citation minimum

More specialised detail
Fewer but more specialised stakeholders & services

Annotation Profiles

Depth: how deeply described

Coverage: how much is covered.

Progression levels: Semantic Framework

Checklists

Versioning

Provenance

Dependencies

NISO-JATS

EXPO, ISA, JERM, OBI

MIAME, SBML

GIT

MIM Ontology

PROV, PAV, VoID

Puppet, Docker

Make

PAV

RO Model: roevo, wfprov, wfdesc

SysBio Workflows

DCAT

Annotation Profiles

Depth: how deeply described

Coverage: how much is covered.

Experiment

VIVO-ISF

DC

Checklists, aka Minimum Information Models

• Safety, quality, consistency
• Validation, monitoring
• Common in experimental science
• Checklists defined in terms of the RO model and its annotations
• Services execute against model and an RO’s annotations

Zhao et al., A Checklist-Based Approach for Quality Assessment of Scientific Information, 3rd Int. Workshop on Linked Science, 2013

Minim Checklist Ontology to describe checklists

Must, Should… Cardinalities… Rules…

http://purl.org/net/mim/ns
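
To make the checklist idea concrete, here is a small sketch of a service that runs a checklist against an RO's annotations. It is not the Minim ontology or any Wf4Ever tool: `Requirement`, `evaluate`, and `satisfies` are hypothetical names, annotations are reduced to a plain dictionary from property name to values, and only the "Must / Should" levels and a minimum cardinality are modelled.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Requirement:
    level: str          # "MUST" or "SHOULD"
    prop: str           # annotation property the RO is expected to carry
    min_count: int = 1  # minimum cardinality


def evaluate(checklist: List[Requirement],
             annotations: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Return the unmet requirements, grouped by level."""
    report: Dict[str, List[str]] = {"MUST": [], "SHOULD": []}
    for req in checklist:
        found = len(annotations.get(req.prop, []))
        if found < req.min_count:
            report.setdefault(req.level, []).append(req.prop)
    return report


def satisfies(checklist: List[Requirement],
              annotations: Dict[str, List[str]]) -> bool:
    """An RO passes when no MUST requirement is unmet; SHOULDs are advisory."""
    return not evaluate(checklist, annotations)["MUST"]


if __name__ == "__main__":
    # Hypothetical "runnable workflow" checklist.
    runnable = [
        Requirement("MUST", "workflow-definition"),
        Requirement("MUST", "input-data", min_count=2),
        Requirement("SHOULD", "licence"),
    ]
    ro_annotations = {
        "workflow-definition": ["profile.t2flow"],
        "input-data": ["galaxies.csv"],
    }
    print(evaluate(runnable, ro_annotations))
    print(satisfies(runnable, ro_annotations))
```

Because the checklist is data rather than code, the same evaluator can serve different purposes (citable, runnable, archivable) by swapping the list of requirements.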

Towards Smart Integrated Applications & Mediation

1. Id & Cite fluid things
2. First class citizenship & uniform handling of artifacts
3. Compound
4. Mixed, leaky Containers
5. Span outcomes, evolve outputs, emergence
6. Layered interpretation and management profiles using standards
7. Machine-processable
8. Technology Independent

Bechhofer, Why linked data is not enough for scientists, DOI: 10.1016/j.future.2011.08.004


Research Objects Framework
a systematic approach to representing a different unit of scholarship

“development” view, “logical” view

“process” view, “physical” view

SERVICES, POLICIES

LIFECYCLES, METADATA PROFILES

Let’s Bake Research Objects!

Open Archival Information System Pilot

ROs are “Information Packages”

ROManager, RODL

• A single, transferable object encapsulates description and resources
  – Download, transfer, publish

• ZIP-based format + manifest describes aggregation and annotations
  – Unpack with standard tooling

• JSON-LD for manifest
  – Lightweight linked-data format
  – Use JSON tooling and services
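
The bundle idea above (a ZIP file plus a JSON manifest that lists what is aggregated) can be sketched with nothing but the Python standard library. This is an illustration under assumptions, not a conforming bundle writer: the manifest location `.ro/manifest.json`, the `@context` value, and the key names are modelled loosely on the RO Bundle conventions and have not been validated against the specification, and `write_bundle` and `read_manifest` are hypothetical helper names.

```python
import json
import zipfile
from typing import Dict, Iterable, List

# Assumed manifest location and JSON-LD context (illustrative only).
MANIFEST_PATH = ".ro/manifest.json"
CONTEXT = "https://w3id.org/bundle/context"


def write_bundle(bundle_path: str,
                 files: Dict[str, bytes],
                 external_uris: Iterable[str] = (),
                 annotations: Iterable[dict] = ()) -> dict:
    """Write local files and a manifest describing them into one ZIP file.

    External resources are listed by URI only, so the bundle can mix
    locally packaged and remotely hosted content.
    """
    aggregates: List[dict] = [{"uri": "/" + name} for name in sorted(files)]
    aggregates += [{"uri": uri} for uri in external_uris]
    manifest = {
        "@context": CONTEXT,
        "id": "/",
        "aggregates": aggregates,
        "annotations": list(annotations),
    }
    with zipfile.ZipFile(bundle_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for name in sorted(files):
            zf.writestr(name, files[name])
        zf.writestr(MANIFEST_PATH, json.dumps(manifest, indent=2))
    return manifest


def read_manifest(bundle_path: str) -> dict:
    """Read the manifest back using only standard ZIP and JSON tooling."""
    with zipfile.ZipFile(bundle_path) as zf:
        return json.loads(zf.read(MANIFEST_PATH))
```

Any unzip tool can open the result, and any JSON parser can read the manifest, which is the point of choosing off-the-shelf formats.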

Baking with off the shelf platforms

OMEX archive

bundle

Adobe UCF, ORE, PROV, ODF

• Work with local folder structure.
  – Version: github.
  – Metadata: Local tooling
  – Metadata about aggregation and its resources: “hidden folder”

• Zenodo/figshare pull snapshot from github
  – DOIs for aggregation
  – new DOIs: release cycles
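
As a sketch of the "hidden folder" approach, the function below walks a working folder, records each file with a checksum, and writes the result into a hidden metadata directory alongside the content. The directory name `.ro`, the manifest layout, and the function names `snapshot` and `changed` are assumptions made for illustration. Version control and DOI minting (github, Zenodo, figshare) are outside the sketch.

```python
import hashlib
import json
from pathlib import Path
from typing import List, Union

HIDDEN_DIR = ".ro"  # assumed name for the hidden metadata folder


def snapshot(root: Union[str, Path]) -> dict:
    """Describe every file under `root` and store the description in the hidden folder."""
    root = Path(root)
    entries = []
    for path in sorted(root.rglob("*")):
        rel = path.relative_to(root)
        # Skip directories, our own metadata, and version-control internals.
        if not path.is_file() or rel.parts[0] in (HIDDEN_DIR, ".git"):
            continue
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        entries.append({"path": rel.as_posix(), "sha256": digest})
    manifest = {"aggregates": entries}
    out_dir = root / HIDDEN_DIR
    out_dir.mkdir(exist_ok=True)
    (out_dir / "manifest.json").write_text(json.dumps(manifest, indent=2), encoding="utf-8")
    return manifest


def changed(old: dict, new: dict) -> List[str]:
    """List paths that were added, removed, or modified between two snapshots."""
    before = {e["path"]: e["sha256"] for e in old["aggregates"]}
    after = {e["path"]: e["sha256"] for e in new["aggregates"]}
    return sorted(p for p in before.keys() | after.keys() if before.get(p) != after.get(p))
```

Comparing two snapshots with `changed` gives a cheap signal that a release-worthy change has happened, which is one possible trigger for a new release cycle.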

Baking with off the shelf platforms

http://dx.doi.org/10.6084/m9.figshare.1031591

FARSITE

coded descriptions of clinical study cohorts

an NHS tool to assess the feasibility of gathering a cohort

packages codes, study, and metadata

Home Baking

In the Wild Safari

integrated database and journal

http://www.gigasciencejournal.com

galaxy.cbiit.cuhk.edu.hk [Peter Li]

Nanopub: represents structured data along with its provenance in a single publishable and citable entry

Galaxy workflows: re-enact the analysis

Research Object: aggregates the (digital) resources contributing to findings of (computational) research (results, data and software) as citable compound digital objects

http://isa-tools.github.io/soapdenovo2/
http://sandbox.wf4ever-project.org/portal/ro?ro=http://sandbox.wf4ever-project.org/rodl/ROs/SOAP2denovo2-Aureus/

[Alejandra Gonzalez-Beltran, Philippe Rocca-Serra]

what’s the least we can do? how might ROs be minted and used by science teams?

how might ROs be implemented and used by developer teams?

Standards

Models, Platforms

Id Schemes, Resolution

Light touch, Extensible, Infiltration

Mapping

Making, Curating, Using

Nudging

Sharing

Linking

Infiltration

Embedding into and changing work practices

TOOLS

Citing

Technical Social

Reward

Mixed stewardship

Citation Schemes

Fragility

[Norman Morrison]

(meta)Data Capture Platforms

Process Capture Platforms

Stealthy not Sneaky
to reduce the friction
instrument the world

Incremental
JIT not JIC

Focus on Personal Productivity

not Public Good

Auto-magical

From made reproducible to born reproducible

What’s the least we can do?

Knowledge Turns
• Transportation & Mediation
• Unit of Scholarly Currency
• Context, Comparison
• Distributed: Search, Discover, Index, Harvest, Port

Research Turns
• Release model: Evolution, Emergence, Discourse, Comparison, Historical review
• Forks, Merges & Fixity
• Flow across groups, projects and articles
• Anti-Salami, Threaded Publications

Schopf, Treating Data Like Software: A Case for Production Quality Data, JCDL 2012

Goble, De Roure, Bechhofer, Accelerating Knowledge Turns, I3CK, 2013

Profile Focus
Body of knowledge around methods, workflows, software, data, person, rather than publication.
First class citation, credit and respect

Open Research Practice is (increasingly) like Open Source Software Practice.

(Which we know a lot about)

FAIR research practice benefits from a shared and principled approach for identification, aggregation and annotation of research components of all kinds.

– Using existing standards, vocabularies, frameworks, platforms, infrastructures. Using linked data and semantic interoperability

VIVO - to represent the full context of researchers’ work.

SciTS – to study the research process and research collaboration

http://www.researchobject.org

• Barend Mons
• Sean Bechhofer
• Philip Bourne
• Matthew Gamble
• Raul Palma
• Jun Zhao
• Alan Williams
• Stian Soiland-Reyes
• Paul Groth
• Tim Clark
• Juliana Freire
• Alejandra Gonzalez-Beltran
• Philippe Rocca-Serra
• Ian Cottam

All the members of the Wf4Ever team
iSOCO: Intelligent Software Components S.A., Spain
University of Manchester, School of Computer Science, Manchester, United Kingdom
University of Oxford, Department of Zoology, Oxford, UK
Poznan Supercomputing and Networking Center, Poznan, Poland
IAA: Instituto de Astrofísica de Andalucía, Granada, Spain
Leiden University Medical Centre, Centre for Human and Clinical Genetics, The Netherlands

Colleagues in Manchester’s Information Management Group
RO Advisory Board Members

http://www.researchobject.org
http://www.wf4ever-project.org
