The NOMAD (Novel Materials Discovery) Laboratory
A European Centre of Excellence

Industry Update - March 2017 - January 2018

©NOMAD, 2017/2018

Data is a crucial raw material for the 21st century. Since its launch in November 2015, the NOMAD (Novel Materials Discovery) Laboratory Centre of Excellence (CoE) (cf. Fig. 1) has been working to unify and improve the usefulness of computational materials-science data. Unlike other databases in computational materials science, which typically focus on a single computer code and are restricted to a closed research group or consortium, NOMAD has the ambitious goal of serving all important computer codes and developing Big-Data tools and services that benefit the whole community of materials science and engineering, in industry and in academia.

Much of the value of high-throughput calculations is wasted without deeper, Big-Data-driven analyses. This is the extreme-scale computing challenge addressed by NOMAD. NOMAD creates, collects, cleanses, refines, and stores data from computational materials science, and develops tools to mine this Big Data to find patterns, structure, and novel information that could not be discovered from studying smaller data sets. The large volume of data and innovative tools already available from NOMAD will enable researchers in industry and academia to advance materials science, identify new physical phenomena, and help to improve existing products and technologies and develop novel ones.

A recent Nature Editorial, “Not so open data (empty rhetoric over data sharing slows science)”,[1] noted that many scientific fields are resistant to openly sharing data. In computational materials science, there has been a cultural shift in attitudes towards open data, largely fostered by NOMAD. This was also noted in a recent Nature Correspondence: “Open data settled in materials theory”.[2] The NOMAD Repository provides open access not only to results (e.g. energies) but also to the full input and output files of computational materials science. These are provided by all major databases in the field (in particular, AFLOW, OQMD, and Materials Project) and by many individual researchers and groups. In this role, the NOMAD Repository is also listed among the recommended repositories of Nature Scientific Data, where it is the only one for materials science.

The NOMAD Repository is now the largest repository for input and output files of computational materials science worldwide. Currently, it contains more than 44 million total-energy calculations, corresponding to billions of CPU-core hours used at high-performance computer centers worldwide (cf. Fig. 2). NOMAD hosts the data for at least 10 years (for free), and it offers DOIs to make the data citable. Web pages of many codes used in computational materials science acknowledge this service by showing a stamp “supported by NOMAD” on their home page (cf. Fig. 3). Presently, NOMAD is creating a business plan to ensure sustainability of the Repository and related NOMAD services, which include the code-independent Archive, the Encyclopedia, the Big-Data Analytics Toolkit, and the Advanced Graphics.

Figure 1. The overall concept and structure of the NOMAD Laboratory CoE. For details, please visit https://NOMAD-CoE.eu, and watch a 3-minute video: https://youtu.be/yawM2ThVlGw

Figure 2. Codes with more than 100 uploads to the NOMAD Repository, as of January 2018.

[1] https://www.nature.com/news/empty-rhetoric-over-data-sharing-slows-science-1.22133
[2] https://www.nature.com/articles/548523d

As the NOMAD Repository data is generated by many different computer codes, it is heterogeneous and therefore hard to integrate and to use directly for data analytics and extensive comparisons. The NOMAD team has developed ways to convert the existing open-access data of the NOMAD Repository into a common, code-independent format, developing numerous parsers and creating the NOMAD Archive (https://metainfo.nomad-coe.eu/). NOMAD currently supports 30 solid-state physics, quantum chemistry and atomistic simulation codes, yielding an Archive that includes 16.5 TB of compressed raw data, 5.6 TB of compressed extracted metadata, 44 million total-energy calculations, and more than 800 million parsed quantities.
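As a rough illustration of what "code-independent" means in practice, the sketch below normalizes total energies written by two made-up output formats into one common record. The two formats, the field names, and the `parse_code_a` / `parse_code_b` helpers are hypothetical; they are not the NOMAD parsers or the NOMAD metadata schema. Only the Hartree-to-eV conversion factor is a physical constant.

```python
import re

# CODATA 2018 value of one Hartree in electronvolts.
HARTREE_TO_EV = 27.211386245988


def parse_code_a(text):
    """Hypothetical format A reports 'TOTAL ENERGY = <number> Ha'."""
    match = re.search(r"TOTAL ENERGY\s*=\s*(-?\d+(?:\.\d+)?)\s*Ha", text)
    if match is None:
        raise ValueError("no total energy found in format A output")
    energy_ev = float(match.group(1)) * HARTREE_TO_EV
    return {"code": "code_a", "total_energy_eV": energy_ev}


def parse_code_b(text):
    """Hypothetical format B reports 'etot(eV): <number>'."""
    match = re.search(r"etot\(eV\):\s*(-?\d+(?:\.\d+)?)", text)
    if match is None:
        raise ValueError("no total energy found in format B output")
    return {"code": "code_b", "total_energy_eV": float(match.group(1))}


# One parser per supported code; every parser returns the same record layout.
PARSERS = {"code_a": parse_code_a, "code_b": parse_code_b}


def to_common_record(code_name, raw_output):
    """Dispatch to the matching parser and return the common record."""
    if code_name not in PARSERS:
        raise KeyError(f"no parser registered for {code_name!r}")
    return PARSERS[code_name](raw_output)
```

Once every code's output is mapped to the same keys and units, records from different codes can be compared or analysed together without knowing which program produced them.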

In the last year, we have developed a new REST API based on ElasticSearch, the leading open-source solution for querying large datasets, which allows users to fully exploit the growing amounts of data in the Archive. Users can perform detailed queries for specific metadata, calculated quantities and more. Presently, we are moving to a full automation of data handling, from initial upload in the Repository to final user access in the Archive, to ensure a robust process where uploaded data is quickly available for users. The NOMAD team has also helped to build replications of the Repository and Archive at different sites in China and Korea as publicly accessible mirrors.
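To give a feel for the kind of query such an API can support, here is a minimal sketch that builds a request body in the Elasticsearch Query DSL. The `bool`, `filter`, `term` and `range` clauses are standard Elasticsearch constructs, but the field names `code_name` and `total_energy_eV` are assumptions made for illustration; the actual NOMAD endpoints and index schema are not described in this update and are not shown here.

```python
import json


def build_query(code_name, max_energy_ev, size=10):
    """Return an Elasticsearch query body as a plain dict.

    Field names are hypothetical placeholders, not the NOMAD schema.
    """
    return {
        "size": size,  # maximum number of hits to return
        "query": {
            "bool": {
                # 'filter' clauses must all match and are not scored.
                "filter": [
                    {"term": {"code_name": code_name}},
                    {"range": {"total_energy_eV": {"lte": max_energy_ev}}},
                ]
            }
        },
    }


if __name__ == "__main__":
    # Serialize the body as it would be sent in an HTTP request.
    print(json.dumps(build_query("code_a", -100.0), indent=2))
```

Keeping the query as a plain dictionary means it can be inspected, logged, or passed to any HTTP client without committing to a particular client library.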

The NOMAD Encyclopedia (https://encyclopedia-gui.nomad-coe.eu, Fig. 4), which went live at the end of April 2017, is a user-friendly, public access point to the extensive knowledge contained in the NOMAD Archive. Anyone can visit the Encyclopedia and, through its graphical user interface, see, compare, explore, and comprehend computations from international researchers from a materials-oriented point of view. It helps users to understand the structural, mechanical, and thermal behavior of a large variety of materials, as well as their electronic properties, responses to external excitations, and other features. All this data is also freely accessible via a web API. Recently, this API was the basis for the development of materials' fingerprints based on their electronic structure, a functionality that will be included in the Encyclopedia for finding similarities between materials.

Figure 3. Stamp of support found on the web pages of many computer codes.

Figure 4. Snapshot of the NOMAD Encyclopedia


The NOMAD Big-Data Analytics Toolkit (https://analytics-toolkit.nomad-coe.eu) helps users to identify correlations and structure in the Big Data of the Archive. The tools help scientists and engineers to select which materials will be most useful for specific applications, or to predict and identify promising new materials with specific sets of desirable properties that are worth further exploration. As of January 2018, twenty analytics tools are publicly available, complete with tutorials, now searchable and accessible via the newly designed, dynamic homepage. In particular, we have released the first tutorial that shows how to perform analyses on Archive data found through user queries using the query GUI (analytics-toolkit.nomad-coe.eu/nomad-query-gui). The query GUI was recently extended with links to information from the AFLOW API and the SpringerMaterials database, obtained by matching chemical formulas and structure prototypes between the NOMAD Archive and the AFLOW and SpringerMaterials databases. The tutorials cover important topics in materials science, such as predicting the energy difference between crystal structures and assessing crystal-structure stability for a material under different synthetic conditions. Through the tutorials, users can reproduce results using default settings or explore their own settings. They can also access the underlying code to tailor the tools to meet their needs.
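Matching entries across databases by chemical formula and structure prototype can be pictured roughly as follows. This is only a sketch under simple assumptions: formulas are flat strings without parentheses, prototypes are plain text labels, and the entry dictionaries with `id`, `formula` and `prototype` keys are hypothetical. It is not the matching procedure used by NOMAD, AFLOW or SpringerMaterials.

```python
import re
from functools import reduce
from math import gcd


def reduced_formula(formula):
    """Canonical form: elements sorted alphabetically, counts divided by their GCD.

    Assumes a flat formula such as 'Ti2O4' (no parentheses or fractions).
    """
    counts = {}
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] = counts.get(element, 0) + (int(number) if number else 1)
    if not counts:
        raise ValueError(f"cannot parse formula: {formula!r}")
    divisor = reduce(gcd, counts.values())
    parts = []
    for element in sorted(counts):
        count = counts[element] // divisor
        parts.append(element if count == 1 else f"{element}{count}")
    return "".join(parts)


def match_entries(entries_a, entries_b):
    """Pair entry ids that share a reduced formula and a prototype label."""
    index = {}
    for entry in entries_b:
        key = (reduced_formula(entry["formula"]), entry["prototype"])
        index.setdefault(key, []).append(entry["id"])
    matches = []
    for entry in entries_a:
        key = (reduced_formula(entry["formula"]), entry["prototype"])
        for other_id in index.get(key, []):
            matches.append((entry["id"], other_id))
    return matches
```

Reducing both formulas to the same canonical string is what lets 'Ti2O4' in one database line up with 'TiO2' in another; the prototype label then separates different crystal structures of the same composition.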

Seeing helps understanding. Consequently, NOMAD has developed an infrastructure for remote Visualization of the multi-dimensional NOMAD data. Visualization tools allow users to remotely access hardware, software and data deployed at the Max Planck Computing and Data Facility using just a web browser on devices such as laptops and smartphones, independent of their location. This web-based access makes remote visualizations easier to access for a wide range of users. Virtual-reality tools have also been developed to support enhanced training and dissemination. Demos are now available for a range of virtual-reality devices (https://www.nomad-coe.eu/the-project/graphics/VR-prototype) for an array of data, from crystals to molecular dynamics to electron density and more. These tools have seen increased usage by materials scientists outside of NOMAD, with the Atomically Resolved Dynamics Department of the Max Planck Institute for the Structure and Dynamics of Matter, the Bio Application Lab of the Leibniz Supercomputing Centre (Munich) and the Shell IT Centre in Bangalore all setting up HTC Vive installations.

High-Performance Computing Expertise and Hardware enable the NOMAD Laboratory CoE to meet the demands of the Encyclopedia, Big-Data-Analytics Toolkit and Advanced Graphics tools by design and operation of the underlying computing platform for the NOMAD services, as well as application support for both HPC and Big-Data analytics and corresponding workflows. The NOMAD platform consists of a distributed multi-layered storage system connected to scalable computing capacity. The platform is able to use existing computing capacity via virtualization and container technology and is therefore easy to set up at new sites. Through the NOMAD Laboratory CoE, academic and industrial users alike are leveraging European HPC capabilities by gaining access to meaningful, useful presentations of computational materials science data already computed by HPC centers and by using the HPC resources that support delivery of NOMAD tools and services. Users can access all NOMAD services via single sign-on access or anonymously. Local installations are also possible due to the flexible nature of the platform. NOMAD has recently signed a Memorandum of Understanding with the HPC-Europa3 programme (http://www.hpc-europa.org/), which funds international research visits for both academic and industrial projects.

Figure 5. Virtual reality experience, ‘Being an atom’, Shell, India

To demonstrate the usefulness of NOMAD’s data and tools, we are conducting a number of Case Studies (https://www.nomad-coe.eu/industry/industry-interviews-case-studies) in collaboration with industry. For example, we are looking at CO2 conversion to valuable chemicals and fuels with I-deals, coordinator of the Methanol fuel from CO2 (MefCO2) project. Applying the newly developed compressed-sensing and subgroup-discovery approaches, the team has identified physically interpretable ab initio descriptors for the energy and structure of CO2 adsorbed at binary and ternary oxide surfaces. The descriptors include only properties of the involved atomic species, bulk materials, and clean surfaces. We have shown that, contrary to the standard understanding, the O-C-O bending angle does not correlate well with the charge transferred to CO2 for the whole data set. However, the subgroup discovery identified a subset of surfaces for which this correlation is accurate. This subset is characterized by a more ionic character of the bonding between surface cations and oxygen.
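The pattern reported for the CO2 case study, a correlation that is weak over the whole data set but strong inside one subset, can be illustrated with a small numerical sketch. Everything here is an assumption for illustration: the rows are made-up toy numbers, not NOMAD results, and the subset is given by a pre-set flag rather than found by a subgroup-discovery algorithm. The code only compares the Pearson correlation on all rows with the correlation on the flagged rows.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Made-up toy rows: (in_subgroup flag, charge transfer, bending angle in degrees).
ROWS = [
    (True, 0.2, 170.0), (True, 0.4, 160.0), (True, 0.6, 150.0), (True, 0.8, 140.0),
    (False, 0.2, 140.0), (False, 0.4, 165.0), (False, 0.6, 145.0), (False, 0.8, 170.0),
]


def correlation(rows):
    """Correlation between charge transfer and bending angle for the given rows."""
    return pearson([row[1] for row in rows], [row[2] for row in rows])


overall = correlation(ROWS)                            # all rows together
subgroup = correlation([row for row in ROWS if row[0]])  # flagged rows only
```

In this toy set the flagged rows lie on a straight line, so their correlation is -1 by construction, while the correlation over all eight rows is weak. A subgroup-discovery method would have to find such a selector from the descriptors instead of being handed it.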

The data and tools of the NOMAD Laboratory CoE are freely available to anyone wishing to use them. To ensure that NOMAD achieves maximum impact and benefit, we have continued our extensive outreach to industrial and academic end-users. For example, in September 2017, we hosted the Big-Data-driven Materials Science Workshop (http://th.fhi-berlin.mpg.de/meetings/BDMS2017/), bringing together the materials science and data analytics communities. We also hosted NOMAD Summer, a hands-on course on tools for novel-materials discovery, in September 2017, delivering lectures and hands-on sessions using NOMAD tools to 40 participants from industry and academia (Fig. 6). We have conducted numerous Industry Interviews and will continue to seek industry feedback on our tools and services, in particular at our third Industry Meeting in February 2018.

Figure 6. NOMAD Summer participants

This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No. 676580. The materials presented and views expressed here are the responsibility of the author(s) only. The EU Commission takes no responsibility for any use made of the information set out.

https://nomad-coe.eu/ @NoMaDCoE www.facebook.com/nomadCoE

NOMAD Partners