Photon and Neutron Data Infrastructure Brian Matthews Scientific Computing Department Co-Chair, RDA Photon and Neutron Science Interest Group [email protected]

Post on 09-Apr-2018

PanSig

•  To sustain the community of interest for common data issues for the Large-Scale Photon and Neutron Source communities.

•  14 European institutes + ORNL, Argonne, SLAC, ANSTO, ANS (at least)
   –  Facility computing infrastructure providers
   –  But also want to involve user communities
      •  Structural Biology, Materials

•  Meetings: P3, March 2014, Dublin; P4, Sept 2014, Amsterdam

•  Co-chairs:
   –  Amber Boehnlein (SLAC, USA)
   –  Brian Matthews (STFC, UK)
   –  Thomas Proffen (ORNL, USA)
   –  Frank Schluenzen (DESY, DE)

•  Building cross-continental links and consortia

Large-Scale science facilities

The key challenges of the 21st century (energy, global climate, health and security concerns) require studying matter at scales from single atoms (10⁻¹⁰ m) to living cells (10⁻⁶ m) to engineering components (10⁻³ m).

This research requires large-scale research infrastructures that are beyond the capability of any single university or research group.

... to construct and operate a shared data infrastructure for Photon and Neutron laboratories...

Neutron diffraction

X-ray diffraction

High-quality structure refinement

•  Common data catalogue

•  Integration of users' data from different facilities

•  Track provenance of data through analysis stages

•  Deploy standards for long-term curation

•  Support scalability through parallelisation

•  Deploy infrastructure in three different techniques

Open Data Infrastructure (Nov 2011 – Sep 2014)

PaN-Data Integration

Shared Data Policy Framework

Federated User Authentication

Federated Data Catalogue

Common Data Format NeXus

Common data environment, common user experience

Umbrella

Proposals

Once awarded beamtime at ISIS, an entry will be created in ICAT that describes your proposed experiment.

 

Experiment

Data collected from your experiment will be indexed by ICAT (with additional experimental conditions) and made available to your experimental team  

Analysed Data

You will have the capability to upload any desired analysed data and associate it with your experiments.

 

Publication

Using ICAT you will also be able to associate publications to your experiment and even reference data from your publications.
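
The four stages above (proposal, experiment, analysed data, publication) can be sketched as a single catalogue record that accumulates links over time. This is a minimal illustration only: `CatalogueEntry`, `Stage` and the method names are hypothetical and are not part of the ICAT API.

```python
from dataclasses import dataclass, field
from enum import IntEnum


class Stage(IntEnum):
    """Lifecycle stages named on the slide, in order."""
    PROPOSAL = 1
    EXPERIMENT = 2
    ANALYSED_DATA = 3
    PUBLICATION = 4


@dataclass
class CatalogueEntry:
    """Hypothetical stand-in for one experiment's catalogue record."""
    proposal_title: str
    raw_files: list = field(default_factory=list)
    analysed_files: list = field(default_factory=list)
    publications: list = field(default_factory=list)

    def index_raw(self, path: str, conditions: dict) -> None:
        # Raw data is indexed together with its experimental conditions.
        self.raw_files.append({"path": path, "conditions": dict(conditions)})

    def attach_analysed(self, path: str) -> None:
        # Analysed data is uploaded by the user and tied to the same record.
        self.analysed_files.append(path)

    def link_publication(self, reference: str) -> None:
        # A publication reference is associated with the experiment.
        self.publications.append(reference)

    @property
    def stage(self) -> Stage:
        # Report the furthest stage for which the record holds content.
        if self.publications:
            return Stage.PUBLICATION
        if self.analysed_files:
            return Stage.ANALYSED_DATA
        if self.raw_files:
            return Stage.EXPERIMENT
        return Stage.PROPOSAL


# Illustrative values only; file names and the reference are placeholders.
entry = CatalogueEntry(proposal_title="Example ISIS proposal")
entry.index_raw("run_0001.nxs", {"temperature_K": 290.0})
entry.attach_analysed("run_0001_reduced.nxs")
entry.link_publication("placeholder-publication-reference")
print(entry.stage.name)
```

The point of the sketch is that one record, created at proposal time, is the anchor for everything that follows, which is what lets data and publications be linked later.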

 

B-lactoglobulin protein interfacial structure

 

Example ISIS Proposal

 

GEM – High intensity, high resolution neutron diffractometer  

H2-(zeolite) vibrational frequencies vs polarising potential of cations

Central Facility

•  Secure access to user’s data

•  Flexible data searching

•  Scalable and extensible architecture

•  Integration with analysis tools

•  Access to high-performance resources

•  Linking to other scientific outputs

•  Data policy aware

ICAT and CSMD

•  The Core Scientific Meta-Data Model (CSMD) is a study-data oriented model which has been developed at STFC since 2001.

•  It captures high-level information about scientific studies and the data that they produce throughout a facility’s scientific workflow.

•  It is a key aspect of ICAT, a software suite designed to manage the cataloguing of, and (continuous) access to, facilities’ data.

http://www.icatproject.org/mvn/site/icat/4.2.5/icat.core/schema.html

•  Investigation
•  Investigator
•  Topic and Keyword
•  Publication
•  Sample
•  SampleParameter
•  Dataset
•  DatasetParameter
•  Datafile
•  DatafileParameter
•  Parameter
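
The entity list above can be read as a small object model. The sketch below shows one plausible nesting (an investigation holds samples and datasets, a dataset holds datafiles, and each level carries its own parameters). The field names and the exact relationships are assumptions made for illustration; the ICAT schema at the URL above is the authoritative definition.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Parameter:
    """A named, typed measurement or setting (assumed fields)."""
    name: str
    value: float
    units: str = ""


@dataclass
class Datafile:
    name: str
    location: str
    parameters: List[Parameter] = field(default_factory=list)  # DatafileParameter


@dataclass
class Dataset:
    name: str
    datafiles: List[Datafile] = field(default_factory=list)
    parameters: List[Parameter] = field(default_factory=list)  # DatasetParameter


@dataclass
class Sample:
    name: str
    parameters: List[Parameter] = field(default_factory=list)  # SampleParameter


@dataclass
class Investigation:
    title: str
    investigators: List[str] = field(default_factory=list)
    keywords: List[str] = field(default_factory=list)
    publications: List[str] = field(default_factory=list)
    samples: List[Sample] = field(default_factory=list)
    datasets: List[Dataset] = field(default_factory=list)

    def datafile_count(self) -> int:
        # Total number of datafiles across all datasets of this investigation.
        return sum(len(ds.datafiles) for ds in self.datasets)


# Illustrative values only.
inv = Investigation(title="Hypothetical study", investigators=["A. Researcher"])
run = Dataset(name="run-1")
run.datafiles.append(
    Datafile(
        name="run-1.nxs",
        location="/archive/run-1.nxs",
        parameters=[Parameter("temperature", 290.0, "K")],
    )
)
inv.datasets.append(run)
print(inv.datafile_count())
```

Attaching parameters at the sample, dataset and datafile level separately is what allows searches on experimental conditions without opening the files themselves.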

ICAT + Mantid (desktop client)

ICAT Tool Suite and Clients

ICAT APIs

IDS (ICAT Data Service)

ICAT Job Portal

TopCAT (Web Interface to ICATs)

ICAT + DAWN (Eclipse Plugin)

Desktop app

Clusters/HPC

Disk

Tape

Metadata as Middleware

Data Publication

Integrating the Lifecycle

Growing opportunities to build on the data management infrastructure

Extending the integrated data management system across the scientific lifecycle:

From Proposal to Publication

•  Support post-experimental analysis
   -  Image reconstruction with ISIS (IMAT) and DLS:
      •  data from experiment, user access to SCARF
   -  ICAT Job Portal for the LSF Octopus Facility
      •  Integrated job submission

•  Support Data Publication
   -  DOIs issued for Data
   -  Data Preservation

•  Support Publications
   -  STFC Epubs Repository
   -  Linking data and publications to provide a record of science

Crystallise research idea

Seek & gain funding

Perform research & gather data

Analyse collected data

Manage & curate data

Publish results

Post-experimental support

Scan Reconstruct Segment + Quantify

3D mesh + Image based Modelling

Predict + Compare

Some image credit: Avizo, Visualization Sciences Group (VSG)

Data Catalogue

Petabyte Data storage

Parallel File system

HPC CPU+GPU

Visualisation

Infrastructure + Software + Expertise!

•  Tomography: dealing with high data volumes – 200Gb/scan, ~5 TB/day (one experiment at DLS)

•  MX: high data volumes, smaller files, but a lot more experiments

•  Hard to move the data – needs to be handled at the facility?
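
A rough calculation shows why these volumes are hard to move. The sketch below reads the quoted per-scan figure as 200 gigabytes (decimal units; this reading is an assumption) and takes the ~5 TB/day figure at face value. The 1 Gbit/s link is a hypothetical example, not a figure from the talk.

```python
SECONDS_PER_DAY = 24 * 60 * 60

SCAN_BYTES = 200e9    # assumed: 200 gigabytes per scan
DAILY_BYTES = 5e12    # ~5 TB per day for one experiment


def scans_per_day(daily_bytes: float, scan_bytes: float) -> float:
    """Number of scans implied by the daily volume."""
    return daily_bytes / scan_bytes


def sustained_rate_mbit_s(daily_bytes: float) -> float:
    """Average rate needed to keep up with acquisition, in Mbit/s."""
    return daily_bytes * 8 / SECONDS_PER_DAY / 1e6


def transfer_hours(n_bytes: float, link_bit_s: float) -> float:
    """Time to move n_bytes over a fully used link of link_bit_s."""
    return n_bytes * 8 / link_bit_s / 3600


print(f"scans per day:       {scans_per_day(DAILY_BYTES, SCAN_BYTES):.0f}")
print(f"sustained rate:      {sustained_rate_mbit_s(DAILY_BYTES):.0f} Mbit/s")
# Hypothetical 1 Gbit/s link between facility and home institution.
print(f"one day over 1 Gb/s: {transfer_hours(DAILY_BYTES, 1e9):.1f} hours")
```

Under these assumptions a single day of data occupies a dedicated 1 Gbit/s link for roughly eleven hours, which supports processing the data where it is collected.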

ISIS:IMAT

DLS:I12/I13

A Tomography Reconstruction Demo for IMAT

•  In- (ISIS) and post-experiment (ISIS and DLS) data processing.
   –  IMAT is a new neutron imaging instrument on ISIS

•  HPC integration with experiments;
   –  Using SCARF CPU and GPU clusters

•  A tomographic image reconstruction toolbox
   –  With supported algorithms;

•  High throughput image reconstruction framework;
   –  With fast 3D visualisation;

•  An integral component of IMAT’s in-experiment data analysis capability through Mantid (ISIS) and DAWN (DLS)

•  Maximise the science resulting from data collected on facility instruments.

•  Towards a service in 2015/2016

PanDaas

•  Data Analysis as a Service
   –  Led by ESRF
   –  18 institutes worldwide

•  Data reduction and analysis platform for Photon and Neutron analytical facilities

•  Not funded, but a continuing need
   –  Looking to continue.

NFFA-EUROPE

•  Nanoscience Foundries and Fine Analysis
   –  Research and Innovation actions

•  Integrated, distributed research infrastructure
   –  for multidisciplinary research at the nanoscale
   –  from synthesis and nanolithography
   –  nanocharacterization, theoretical modelling and numerical simulation,
   –  coordinated open-access to complementary facilities

•  Information and Data management Repository Platform (IDRP)
   –  CNR-IOM, ESRF, STFC, KIT

•  RDA standardisation

RDA-IG Data Life Cycle Design for Nano-Science

•  Nano-Science comprises experiments, instruments, facilities and theoretical work focusing on studying, modeling and composing structures and phenomena on a nanometer scale.

•  The Interest Group aims
   –  to enable data sharing across facilities and user groups,
   –  resulting in data publications as open data.

•  The RDA Interest Group will establish a Data Life Cycle Lab to foster world-wide communication, coordination, and support for Data Life Cycle Management, Data Analysis, and Data Sharing in Nano-Science. The agenda will include:
   –  To organize and moderate a regular “Project and Information Share”
   –  To provide an open information and discussion platform in close collaboration with RDA’s infrastructure, e.g. providing video shares of the presentations and a forum for discussions.
   –  To inform about, to discuss, and to adopt RDA’s outputs. This includes active communication with other Working and Interest Groups within RDA.
   –  Research and development to support Data Life Cycle Management in Nano-Science and information exchange about ongoing R&D performed in various nano-science facilities.
   –  To foster interoperability by developing metadata standards for the cataloguing, access and exchange of data and associated information describing nano-science experiments.

•  The last point may turn into a Working Group

•  Rainer Stotzka, Brian Matthews, Stefano Cozzini, Andy Gotz