Photon and Neutron Data Infrastructure
Brian Matthews Scientific Computing Department
Co-Chair, RDA Photon and Neutron Science Interest Group
PanSig
• To sustain the community of interest for common data issues for the Large-Scale Photon and Neutron Source communities.
• 14 European institutes + ORNL, Argonne, SLAC, ANSTO, ANS (at least)
– Facility computing infrastructure providers
– But also want to involve user communities
• Structural Biology, Materials
• Meetings: P3, March 2014, Dublin; P4, Sept 2014, Amsterdam
• Co-chairs:
– Amber Boehnlein (SLAC, USA)
– Brian Matthews (STFC, UK)
– Thomas Proffen (ORNL, USA)
– Frank Schluenzen (DESY, DE)
• Building cross-continental links and consortia
Large-Scale science facilities
The key challenges of the 21st century (energy, global climate, health and security concerns) need the study of matter at scales from single atoms (10^-10 m) to living cells (10^-6 m) to engineering components (10^-3 m).
This research requires large scale research infrastructures that are beyond the capability of any single university or research group
... to construct and operate a shared data infrastructure for Photon and Neutron laboratories...
Neutron diffraction
X-ray diffraction
High-quality structure refinement
• Common data catalogue
• Integration of users data from different facilities
• Track provenance of data through analysis stages
• Deploy standards for long-term curation
• Support scalability through parallelisation
• Deploy infrastructure in three different techniques
Open Data Infrastructure (Nov 2011 – Sep 2014)
PaN-Data Integration
Shared Data Policy Framework
Federated User Authentication
Federated Data Catalogue
Common Data Format NeXus
Common data environment, common user experience
Proposals
Once awarded beamtime at ISIS, an entry will be created in ICAT that describes your proposed experiment.
Experiment
Data collected from your experiment will be indexed by ICAT (with additional experimental conditions) and made available to your experimental team
Analysed Data
You will have the capability to upload any desired analysed data and associate it with your experiments.
Publication
Using ICAT you will also be able to associate publications to your experiment and even reference data from your publications.
B-lactoglobulin protein interfacial structure
Example ISIS Proposal
GEM – High intensity, high resolution neutron diffractometer
H2-(zeolite) vibrational frequencies vs polarising potential of cations
Central Facility
• Secure access to user’s data
• Flexible data searching
• Scalable and extensible architecture
• Integration with analysis tools
• Access to high-performance resources
• Linking to other scientific outputs
• Data policy aware
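As a rough illustration of what "secure access" combined with "data policy aware" could mean in practice, the sketch below checks whether a user may read a data record. It is not taken from ICAT or from any facility's policy: the `DataRecord` type, the `can_read` function, and the three-year embargo default are all hypothetical, chosen only to make the idea concrete.

```python
from dataclasses import dataclass
from datetime import date
from typing import FrozenSet


@dataclass(frozen=True)
class DataRecord:
    """Hypothetical catalogue record; the fields are illustrative only."""
    team: FrozenSet[str]      # members of the experimental team
    collected: date           # date the data were collected
    embargo_years: int = 3    # ASSUMPTION: embargo length, not from the slides


def release_date(record: DataRecord) -> date:
    """Date on which the record becomes openly readable."""
    target_year = record.collected.year + record.embargo_years
    try:
        return record.collected.replace(year=target_year)
    except ValueError:
        # 29 February collected in a leap year, released in a non-leap year.
        return date(target_year, 2, 28)


def can_read(record: DataRecord, user: str, today: date) -> bool:
    """Team members can always read; others only after the embargo ends."""
    if user in record.team:
        return True
    return today >= release_date(record)


record = DataRecord(team=frozenset({"alice", "bob"}), collected=date(2014, 3, 1))
team_access = can_read(record, "alice", date(2014, 6, 1))
outsider_early = can_read(record, "carol", date(2014, 6, 1))
outsider_late = can_read(record, "carol", date(2017, 3, 1))
```

Keeping the policy in one small function means the same rule can be applied by every client of the catalogue, rather than being re-implemented in each one.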
ICAT and CSMD
• The Core Scientific Meta-Data Model (CSMD) is a study-data oriented model which has been developed at STFC since 2001.
• It captures high level information about scientific studies and the data that they produce throughout a facility’s scientific workflow.
• It is a key part of ICAT, a software suite designed to manage the cataloguing of, and (continuous) access to, facility data.
http://www.icatproject.org/mvn/site/icat/4.2.5/icat.core/schema.html
• Investigation • Investigator • Topic and Keyword • Publication • Sample • SampleParameter • Dataset • DatasetParameter • Datafile • DatafileParameter • Parameter
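The entity names above suggest a hierarchy in which an investigation groups samples, datasets and publications, and a dataset groups datafiles, each level carrying its own parameters. The sketch below models that reading with plain dataclasses. The entity names come from the list above; the field names and the containment relationships are assumptions made for illustration, not the actual ICAT schema (see the schema link above for the real definition).

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Parameter:
    name: str
    value: float
    units: str = ""


@dataclass
class Datafile:
    name: str
    location: str                              # assumed field: where the file is stored
    parameters: List[Parameter] = field(default_factory=list)


@dataclass
class Dataset:
    name: str
    datafiles: List[Datafile] = field(default_factory=list)
    parameters: List[Parameter] = field(default_factory=list)


@dataclass
class Sample:
    name: str
    parameters: List[Parameter] = field(default_factory=list)


@dataclass
class Publication:
    reference: str


@dataclass
class Investigation:
    title: str
    investigators: List[str] = field(default_factory=list)
    keywords: List[str] = field(default_factory=list)
    samples: List[Sample] = field(default_factory=list)
    datasets: List[Dataset] = field(default_factory=list)
    publications: List[Publication] = field(default_factory=list)

    def datafile_count(self) -> int:
        """Total number of datafiles across all datasets."""
        return sum(len(ds.datafiles) for ds in self.datasets)


# Hypothetical record following the proposal-to-publication stages.
inv = Investigation(title="Example proposal", investigators=["A. Researcher"])
raw = Dataset(name="raw", datafiles=[Datafile("run001.nxs", "/data/run001.nxs")])
raw.parameters.append(Parameter("temperature", 293.0, "K"))
inv.datasets.append(raw)                                     # experiment
inv.datasets.append(
    Dataset(name="analysed", datafiles=[Datafile("fit.dat", "/data/fit.dat")])
)                                                            # analysed data
inv.publications.append(Publication("placeholder reference"))  # publication
```

Here a single `Investigation` object accumulates content as the experiment moves from proposal to publication, which is the sense in which the model is "study-data oriented".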
ICAT + Mantid (desktop client)
ICAT Tool Suite and Clients
ICAT APIs
IDS (ICAT Data Service)
ICAT Job Portal
TopCAT (Web Interface to ICATs)
ICAT + DAWN (Eclipse Plugin)
Desktop app
Clusters/HPC
Disk
Tape
Metadata as Middleware
Integrating the Lifecycle
Growing opportunities to build on the data management infrastructure
Extending the integrated data management system across the scientific lifecycle: from Proposal to Publication
• Support post-experimental analysis
– Image reconstruction with ISIS (IMAT) and DLS: data from experiment, user access to SCARF
– ICAT Job Portal for the LSF Octopus Facility: integrated job submission
• Support Data Publication
– DOIs issued for Data
– Data Preservation
• Support Publications
– STFC Epubs Repository
– Linking data and publications to provide a record of science
Crystallise research idea
Seek & gain funding
Perform research & gather data
Analyse collected data
Manage & curate data
Publish results
Post-experimental support
Scan Reconstruct Segment + Quantify
3D mesh + Image based Modelling
Predict + Compare
Some image credit: Avizo, Visualization Sciences Group (VSG)
Data Catalogue
Petabyte Data storage
Parallel File system
HPC CPU+GPU
Visualisation
Infrastructure + Software + Expertise!
• Tomography: Dealing with high data volumes – 200Gb/scan, ~5 TB/day (one experiment at DLS)
• MX: high data volumes, smaller files, but a lot more experiments
• Hard to move the data – needs to be handled at the facility?
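A back-of-envelope calculation shows why data at these volumes is hard to move. The sketch below takes the figures quoted above and estimates how many scans fit in a day's output and how long a day's data would take to transfer. It assumes that "200Gb" means 200 gigabytes, that units are decimal, and that the link sustains its full nominal rate; the 1 and 10 Gbit/s link speeds are illustrative choices, not figures from the slides.

```python
GB = 10**9    # decimal gigabyte, in bytes
TB = 10**12   # decimal terabyte, in bytes

scan_bytes = 200 * GB    # ASSUMPTION: "200Gb/scan" read as 200 gigabytes
daily_bytes = 5 * TB     # ~5 TB/day for one experiment

# Number of scans that account for one day's data volume.
scans_per_day = daily_bytes / scan_bytes


def transfer_hours(n_bytes: float, link_gbit_per_s: float) -> float:
    """Hours to move n_bytes over a link running at its full nominal rate."""
    bits = n_bytes * 8
    seconds = bits / (link_gbit_per_s * 10**9)
    return seconds / 3600


hours_1g = transfer_hours(daily_bytes, 1.0)     # hypothetical 1 Gbit/s link
hours_10g = transfer_hours(daily_bytes, 10.0)   # hypothetical 10 Gbit/s link

print(f"{scans_per_day:.0f} scans/day")
print(f"{hours_1g:.1f} h at 1 Gbit/s, {hours_10g:.1f} h at 10 Gbit/s")
```

Under these assumptions a single day's data occupies a 1 Gbit/s link for roughly half a day, and real links rarely sustain their nominal rate, which supports handling the data at the facility.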
ISIS:IMAT
DLS:I12/I13
A Tomography Reconstruction Demo for IMAT
• In- (ISIS) and post-experiment (ISIS and DLS) data processing
– IMAT is a new neutron imaging instrument on ISIS
• HPC integration with experiments
– Using SCARF CPU and GPU clusters
• A tomographic image reconstruction toolbox
– With supported algorithms
• High throughput image reconstruction framework
– With fast 3D visualisation
• An integral component of IMAT’s in-experiment data analysis capability through Mantid (ISIS) and DAWN (DLS)
• Maximise the science resulting from Data collected on facility instruments.
• Towards a service in 2015/2016
PanDaas
• Data Analysis as a Service
– Led by ESRF
– 18 institutes worldwide
• Data reduction and analysis platform for Photon and Neutron analytical facilities
• Not funded, but a continuing need
– Looking to continue.
NFFA-EUROPE
• Nanoscience Foundries and Fine Analysis
– Research and Innovation actions
• Integrated, distributed research infrastructure
– for multidisciplinary research at the nanoscale
– from synthesis and nanolithography
– nanocharacterization, theoretical modelling and numerical simulation
– coordinated open-access to complementary facilities
• Information and Data management Repository Platform (IDRP) – CNR-IOM, ESRF, STFC, KIT
• RDA standardisation
RDA-IG Data Life Cycle Design for Nano-Science
• Nano-Science comprises experiments, instruments, facilities and theoretical work focusing on studying, modeling and composing structures and phenomena on a nanometer scale.
• The Interest Group
– to enable data sharing across facilities and user groups,
– resulting in data publications as open data.
• The RDA Interest Group will establish a Data Life Cycle Lab to foster world-wide communication, coordination, and support for Data Life Cycle Management, Data Analysis, and Data Sharing in Nano-Science. The agenda will include:
– To organize and moderate a regular “Project and Information Share”
– To provide an open information and discussion platform in close collaboration with RDA’s infrastructure, e.g. providing video shares of the presentations and a forum for discussions.
– To inform about, to discuss, and to adopt RDA’s outputs. This includes active communication with other Working and Interest Groups within RDA.
– Research and development to support Data Life Cycle Management in Nano-Science and information exchange about ongoing R&D performed in various nano-science facilities.
– To foster interoperability by developing metadata standards for the cataloguing, access and exchange of data and associated information describing nano-science experiments.
• The last point may turn into a Working Group
• Rainer Stotzka, Brian Matthews, Stefano Cozzini, Andy Gotz