http://www.aiida.net
Davide Campi
Theory and Simulation of Materials, EPFL, Switzerland
Jülich, 10 May 2017
Managing computational scientific research and
accelerating discovery in Materials Science using
AiiDA
Atomistic simulations of molecules and
crystals using quantum-mechanical
first-principles methods
to predict and discover new materials
and materials properties
Nature, November 2014
12 papers on DFT in the
top-100 most cited
papers in the entire
scientific literature
Van Noorden et al., Nature 514, 550 (2014)
Accuracy and
predictive power of
quantum engines
Computing power 1993-2013
[Figure: performance of the top-500 supercomputers (sum, Number 1, Number 500)]
150,000x increase in the past 20 years
1 month (1993) → 10 seconds (2015)
...but did we think about how to manage
simulations?
We should manage
simulations and data
with a
dedicated platform
But how should the
platform be?
Computational science should be...
• reproducible: often not possible from the data reported in papers
• searchable: find existing calculations, reuse and data-mine results
• reliable: results persisted in repositories, automated procedures to reduce errors and verify results
• shareable: community to share results, cross-validate them, and boost scientific discovery
THE ADES MODEL
We have encoded these requirements in the four pillars
for a computational science infrastructure
AiiDA
http://www.aiida.net
MIT License
Early codes by Boris Kozinsky
Mar 2012: First discussions about current framework
Oct 2012: First commit to current AiiDA repository
May 2014: Distributing beta versions to selected groups
Feb 2015: First public open release of AiiDA (0.4.0 on bitbucket + readthedocs)
Apr 2015: Paper on arXiv, AiiDA 0.4.1 released
Oct-Dec 2015: Paper published, AiiDA 0.5.0 released
7 tutorials from October 2014 to 2017 (Lausanne, Kyoto, Zurich, Berlin, Trieste)
What is AiiDA?
1. The core of the code is the AiiDA API: the main Python code and classes
2. The AiiDA Object-Relational Mapper (ORM): stores data, codes and calculations in a database, transparently to the user
3. A daemon to manage interaction with remote computers without user intervention
4. User interaction occurs via the command line tool verdi, the interactive shell or via Python scripts
Automation in AiiDA
G. Pizzi et al., Comp. Mat. Sci 111, 218-230 (2016)
Automation: coupling to data
• Coupling automation to data:
– uniformity of the input data, usage of codes and computers
– full reproducibility of calculations (data is stored first)
Automation: the daemon
A daemon runs in the background
Calculation state
TOSUBMIT
WITHSCHEDULER
RETRIEVED
PARSED
FINISHED
• Process runs in the
background
• Daemon processes
managed using celery
with a django backend,
and supervisor
• Supports direct
connection and various
protocols: ssh, sftp, …
• Schedulers supported:
SLURM, SGE, Torque,
PBSPro, LSF, …
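The calculation states listed above can be read as a linear state machine that the daemon advances in the background. The following is a minimal sketch in plain Python 3, not AiiDA's own implementation: the `CalcState` enum and the helper functions are hypothetical names, and only the five states shown on the slide are modelled.

```python
from enum import Enum


class CalcState(Enum):
    """Simplified calculation states, in the order listed on the slide."""
    TOSUBMIT = 1
    WITHSCHEDULER = 2
    RETRIEVED = 3
    PARSED = 4
    FINISHED = 5


def next_state(state):
    """Return the state that follows `state`; FINISHED is terminal."""
    if state is CalcState.FINISHED:
        return state
    return CalcState(state.value + 1)


def run_to_completion(state=CalcState.TOSUBMIT):
    """Step through all states, as a polling daemon loop might do."""
    history = [state]
    while state is not CalcState.FINISHED:
        state = next_state(state)
        history.append(state)
    return history


state_names = [s.name for s in run_to_completion()]
print(" -> ".join(state_names))
```

In a real daemon each transition would be triggered by an external event (job submitted, files retrieved, output parsed) rather than by a simple loop.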
Abstraction into APIs: a single calculation
Parameter = DataFactory('parameter')
Structure = DataFactory('structure')
code = get_code('quantumespresso-pw@mycluster')
JobCalc = code.new_process()
attrs = {'max_wallclock_seconds': 3600,
         'resources': {"num_machines": 2}}
inp = {}
inp['structure'] = Structure(cif='silicon.cif')
inp['parameters'] = Parameter({
    'CONTROL': {
        'calculation': 'scf',
        'restart_mode': 'from_scratch',
    },
    'SYSTEM': {
        'ecutwfc': 40.,
    }})
f = async(JobCalc, _attributes=attrs, **inp)
print f.job.pk
Choose code and computer
Define all inputs
Take care of running the calculation
through the daemon
Abstraction into APIs: a single calculation
Parameter = DataFactory('parameter')
Structure = DataFactory('structure')
code = get_code('quantumespresso-pw@cluster2')
JobCalc = code.new_process()
attrs = {'max_wallclock_seconds': 3600,
         'resources': {"num_machines": 2}}
inp = {}
inp['structure'] = Structure(cif='silicon.cif')
inp['parameters'] = Parameter({
    'CONTROL': {
        'calculation': 'scf',
        'restart_mode': 'from_scratch',
    },
    'SYSTEM': {
        'ecutwfc': 40.,
    }})
f = submit(JobCalc, _attributes=attrs, **inp)
print f.job.pk
Changing a single line is enough to switch
computer (different cluster,
scheduler, code version, …)
Data gets stored in the DB during
submission
Take care of running the calculation
through the daemon
Data in AiiDA
Storage and provenance
• Inspiration from the open provenance
model
• Any calculation: a function,
converting inputs to outputs:
out1, out2 = F(in1, in2)
• Each object is a node in a graph,
connected by directional labeled links
• Output nodes can be used as inputs
[Figure: calculation node F (calc) with input data nodes in1, in2 and output data nodes out1, out2]
• Calculated properties: result of complex, connected calculations
• How do we store simulations preserving the connected
structure?
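One way to picture the graph described above is a small in-memory structure in which data and calculation nodes are joined by directed, labeled links. This is a sketch under stated assumptions: `ProvenanceGraph`, the second calculation `G` and the node `out3` are hypothetical, introduced only to show an output being reused as an input.

```python
from collections import defaultdict


class ProvenanceGraph:
    """Toy directed graph: nodes have a kind, links have a label."""

    def __init__(self):
        self.kind = {}
        # Maps each target node to its incoming (source, label) pairs.
        self.incoming = defaultdict(list)

    def add_node(self, name, kind):
        self.kind[name] = kind

    def add_link(self, source, target, label):
        self.incoming[target].append((source, label))

    def ancestors(self, name):
        """Return every node that `name` depends on, directly or not."""
        seen = set()
        stack = [name]
        while stack:
            current = stack.pop()
            for source, _label in self.incoming[current]:
                if source not in seen:
                    seen.add(source)
                    stack.append(source)
        return seen


graph = ProvenanceGraph()
for data_node in ("in1", "in2", "out1", "out2", "out3"):
    graph.add_node(data_node, "data")
graph.add_node("F", "calc")
graph.add_node("G", "calc")

# out1, out2 = F(in1, in2)
graph.add_link("in1", "F", "in1")
graph.add_link("in2", "F", "in2")
graph.add_link("F", "out1", "out1")
graph.add_link("F", "out2", "out2")

# An output of F is used as the input of a second calculation G.
graph.add_link("out1", "G", "x")
graph.add_link("G", "out3", "result")

provenance = graph.ancestors("out3")
print(sorted(provenance))
```

Walking the incoming links from any node recovers the full chain of calculations and data that produced it, which is the property the provenance model relies on.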
Data provenance: Directed Acyclic Graphs
The Quantum ESPRESSO example
• The actual nodes required by a calculation depend on the
developer of the plugin
• We developed complete plugins for the Quantum ESPRESSO
codes (pw.x, ph.x, cp.x, postprocessing, ...)
[Figure: input nodes of a Pw Calculation]
• Parameter Data: CONTROL: calculation = 'scf'; SYSTEM: ecutwfc = 40., ecutrho = 320.
• Kpoints Data: MP grid 4x4x1
• Code: hammer4.rcc.psu.edu, /usr/global/qe/5.1/bin/pw.x
• Upf Data: PSLibrary 0.3.1, C.pbe-n-rrkjus_psl.UPF
• Upf Data: PSLibrary 0.3.1, H.pbe-rrkjus_psl.UPF
• Structure Data: cell=..., atom_positions=...
• Resources: 16 cpus, 1h walltime
• ...
The output nodes
[Figure: Pw Calculation (2 nodes, 30 min walltime) with its input nodes (Code, three Upf Data, two Parameter Data, Structure Data) and its output nodes]
Output nodes:
• Parameter Data: energy=-12.4, forces=[1.2,0.12,...], walltime=1h2m
• Structure Data
• Array Data
After parsing, a set of
output nodes is
attached to the
calculation
Accessible via the ResultManager:
c.res.energy, c.res.forces, ...
Data provenance: Directed Acyclic Graphs
Directed Acyclic Graphs:
appropriate representation
of calculations, data
and their provenance
Next questions
What must we store?
How can we store it?
How can we query it?
Saving the DAGs: Nodes and Links
How to represent it… in SQL?
• Each node: row in a SQL table
• Additional data:
• key-value attributes
• Files/folders
• Links also stored in a SQL table
⇒ jobs provenance
G. Pizzi et al., Comp. Mat. Sci
111, 218-230 (2016)
• Multiple backends supported:
• SQL (MySQL, PostgreSQL, …) and
attributes EAV table via Django
• SQL+JSONB (PostgreSQL ≥ 9.4) via
SQLAlchemy
• Easy to extend to other backends
(preliminary benchmarks with graph
DBs like Neo4j and TitanDB)
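The layout described above (one row per node, key-value attributes, and a separate table of links) can be sketched with Python's built-in `sqlite3` module. This is an illustration, not AiiDA's actual schema: the table names, column names, UUID strings and link labels are all assumptions, and SQLite stands in for the MySQL or PostgreSQL backends mentioned on the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE node (id INTEGER PRIMARY KEY, uuid TEXT, type TEXT);
CREATE TABLE attribute (node_id INTEGER, key TEXT, value TEXT);
CREATE TABLE link (input_id INTEGER, output_id INTEGER, label TEXT);
""")

# Each node is one row; the type strings are hypothetical.
conn.executemany("INSERT INTO node VALUES (?, ?, ?)", [
    (1, "uuid-1", "data.structure"),
    (2, "uuid-2", "data.parameter"),
    (3, "uuid-3", "calculation.pw"),
    (4, "uuid-4", "data.parameter"),
])
# Key-value attributes, stored EAV-style (entity, attribute, value).
conn.executemany("INSERT INTO attribute VALUES (?, ?, ?)", [
    (2, "SYSTEM.ecutwfc", "40.0"),
    (4, "energy", "-12.4"),
])
# Directed, labeled links between nodes.
conn.executemany("INSERT INTO link VALUES (?, ?, ?)", [
    (1, 3, "structure"),
    (2, 3, "parameters"),
    (3, 4, "output_parameters"),
])

# Provenance query: which inputs fed the calculation that created node 4?
rows = conn.execute("""
    SELECT inp.type, l_in.label
    FROM link AS l_out
    JOIN link AS l_in ON l_in.output_id = l_out.input_id
    JOIN node AS inp ON inp.id = l_in.input_id
    WHERE l_out.output_id = ?
    ORDER BY l_in.label
""", (4,)).fetchall()
print(rows)
```

Two joins on the link table are enough to step from an output, through its calculation, back to the inputs; deeper provenance needs repeated joins or a precomputed closure table.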
Environment in AiiDA
Low-level pillars User-level pillars
Is AiiDA easy to use?
Does it provide an advantage
over files and custom-made scripts?
Environment in AiiDA: user interaction
results = QueryBuilder(**{
    'path': [
        {'cls': PhCalculation, 'tag': 'ph'},
        {'cls': ParameterData, 'tag': 'param'}],
    'filters': {
        'ph': {'machine': 'piz-daint'},
        'param': {'attributes.min_freq': {'<': -50.0}}},
    'project': {
        'param': ['attributes.min_freq']}
}).iterall()
DB-agnostic query interface for
AiiDA objects and their provenance
Seamless user interaction Extensive documentation + tutorials
http://aiida-core.readthedocs.org/
High-level Python interface
Reusability and modularity: AiiDA plugins
Plugin types:
• Calculation: generation of input files for a given code (Quantum ESPRESSO, Phonopy, ASE, GPAW, Yambo, NWChem, …)
• Data: management of data objects for input/output (files & folders, parameter sets, remote data, structures, pseudos, ...)
• Parser: parsing of code output & generation of new DB nodes (Quantum ESPRESSO, Phonopy, ASE, GPAW, Yambo, NWChem, ...)
• Transport: how to connect to a cluster (local connection, ssh, ...)
• Scheduler: how to interact with the scheduler (PBSPro, Torque, SGE, SLURM, LSF, …)
• Importers & exporters: import structures, ... from external DBs (ICSD, COD, TCOD, MPOD, ...)
Environment in AiiDA: Scientific workflows
• Computed properties: result of complex
sequences of calculations
• Not only storage: also the management and
encoding is important for reproducibility
• We need to encode and to share
“turn-key solutions” for:
– the calculation of materials properties
– the automatic validation of results
Workflows - a ‘Hello World’ example
@wf
def add(a, b):
    return {'sum': a + b}
@wf
def mul(a, b):
    return {'prod': a * b}
@wf
def add_mul_wf(a, b, c):
    return {'result': mul(add(a, b)['sum'], c)['prod']}
final_value = add_mul_wf(a=Int(3), b=Int(4), c=Int(5))['result']
# final_value = (3+4) * 5 = 35
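To make the role of the decorator more concrete, here is a sketch in plain Python 3 of how a decorator can record inputs and outputs of each call. It is not AiiDA's decorator: `tracked` and `call_log` are hypothetical names, plain integers replace the `Int` data nodes, and a list stands in for the database.

```python
import functools

call_log = []  # hypothetical in-memory stand-in for the database


def tracked(func):
    """Record the inputs and outputs of every call to `func`."""
    @functools.wraps(func)
    def wrapper(**inputs):
        outputs = func(**inputs)
        call_log.append({
            "function": func.__name__,
            "inputs": dict(inputs),
            "outputs": dict(outputs),
        })
        return outputs
    return wrapper


@tracked
def add(a, b):
    return {"sum": a + b}


@tracked
def mul(a, b):
    return {"prod": a * b}


@tracked
def add_mul(a, b, c):
    partial = add(a=a, b=b)["sum"]
    return {"result": mul(a=partial, b=c)["prod"]}


final_value = add_mul(a=3, b=4, c=5)["result"]
recorded_calls = [entry["function"] for entry in call_log]
print(final_value, recorded_calls)
```

The inner calls are logged before the outer one because each entry is written when its function returns; a fuller version would also record which call invoked which.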
Workflows features
• Automatic tracking of provenance in the DB from simple Python functions: inputs, outputs and call links are stored by simply adding a decorator
• Both serial and parallel (multithreaded) execution (using async calls and waiting for the result of futures), and combinations of parallel and serial execution
• Checkpointing of workflows using a custom Workflow class (you can shut down your machine while it is waiting for a day-long job, and then continue from where you left off)
• Nesting and reuse of existing workflows
• Description of the inputs and outputs of a workflow for prospective provenance
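The combination of parallel and serial steps mentioned above can be illustrated with the standard library's `concurrent.futures`. This is only a sketch of the pattern (launch independent tasks, wait on their futures, then continue serially); the function names are hypothetical and it does not use AiiDA's workflow engine.

```python
from concurrent.futures import ThreadPoolExecutor


def square(x):
    """Stand-in for an independent calculation."""
    return x * x


def parallel_then_serial(values):
    # Parallel step: submit every task and keep the futures.
    with ThreadPoolExecutor(max_workers=4) as pool:
        futures = [pool.submit(square, v) for v in values]
        # Block until each future has a result, preserving input order.
        squares = [f.result() for f in futures]
    # Serial step: runs only once all parallel results are available.
    return sum(squares)


total = parallel_then_serial([1, 2, 3])
print(total)
```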
A workflow example: phonon dispersions
Main workflow: Structure relaxation → Dynamical matrices → Interatomic force constants → Phonon dispersion
Sub-workflows:
• Structure relaxation: Relaxation #1, Relaxation #2, ..., Relaxation #n, until the structure cell is converged
• Restart management: restart (wall-time exceeded, …); several failure cases handled automatically
• Dynamical matrices: Initialize PH, PH on q-grid, Collect phonons
Single calculations:
• PW vc-relax (one per relaxation step)
• Parallelization: PH on q1, PH on q2, ..., PH on qn
N. Mounet et al.
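The repeated relaxations in the structure-relaxation sub-workflow amount to a loop that reruns the calculation until the cell stops changing. The sketch below assumes a relative-volume convergence criterion and uses a fake relaxation function; `relax_until_converged`, `fake_relaxation` and the numbers are hypothetical and do not reproduce the actual workflow by N. Mounet et al.

```python
def relax_until_converged(run_relaxation, volume, threshold=5e-2,
                          max_iterations=10):
    """Rerun a relaxation until the relative volume change is small."""
    history = [volume]
    for _ in range(max_iterations):
        new_volume = run_relaxation(volume)
        history.append(new_volume)
        if abs(new_volume - volume) / volume < threshold:
            return new_volume, history
        volume = new_volume
    raise RuntimeError("cell volume did not converge")


def fake_relaxation(volume, target=100.0):
    """Hypothetical stand-in: moves the volume halfway to the target."""
    return volume + 0.5 * (target - volume)


final_volume, history = relax_until_converged(fake_relaxation, 140.0)
print(final_volume, history)
```

A production workflow would additionally catch failures such as an exceeded wall-time and resubmit from the last saved structure instead of raising.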
Visualising the outcome
params = {'input': {'kpoints_density': 0.2,
                    'convergence': 'tight'},
          'structure': structure,
          'pseudo_family': pseudo_family,
          'machinename': 'mycluster',
          'pw_input': {'volume_conv_threshold': 5e-2},
          'pw_parameters': {
              'SYSTEM': {'ecutwfc': 30.},
              'ELECTRONS': {'conv_thr': 1.e-10}},
          'ph_input': {'distance_kpoints_in_dispersion': 0.005,
                       'diagonalization': 'cg'}
          }
wf = asyncd(PhBandsWorkflow, **params)
Discovering new 2D materials
N. Mounet et al., arXiv:1611.05234
Discovering new 2D materials
• Structural stability
• Band structure
• Band alignment
• Effective masses
• Carrier mobility
• Topological properties
2D FETs
2D Tunnel-FETs
2D Spin-FETs
2D Topological insulators
2D Conductors
Schottky-Barrier-Free Contacts
Loss-less plasmonic
Effective masses workflow
Topological workflow
Some current applications
• Phonon-phonon scattering in 2D: phonon hydrodynamics in 2D materials
• 1D metallic wires at interfaces: engineering polar discontinuities in 2D
• Functional development: development of a Koopmans' compliant functional
• Neural Network potentials: generating databases for neural network potentials
• Pseudopotential database: creation of a Standard Solid State Pseudopotentials library
• Thermodynamical properties of 2D materials: discovering 2D materials and creating a database of their thermodynamical properties
• Paraelectric-ferroelectric transition in perovskites, optical properties with GW, stability of structures, …
Sharing in AiiDA
• Private AiiDA instances
• UUIDs used to uniquely identify
all data/calculation objects
• Data can be pushed to other
repositories or the outside
world
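Because every object carries a UUID, nodes exported from separate private instances can be merged without identifier clashes. The sketch below shows the idea with the standard `uuid` module; `new_node`, `merge_instances` and the node type strings are hypothetical and do not reflect AiiDA's export format.

```python
import uuid


def new_node(node_type):
    """Create a node record with a globally unique identifier."""
    return {"uuid": str(uuid.uuid4()), "type": node_type}


def merge_instances(*instances):
    """Merge node lists from several instances, keyed by UUID."""
    merged = {}
    for nodes in instances:
        for node in nodes:
            # A node already seen under the same UUID is not duplicated.
            merged.setdefault(node["uuid"], node)
    return merged


shared = new_node("data.structure")
instance_a = [shared, new_node("calculation")]
instance_b = [shared, new_node("data.parameter")]

merged = merge_instances(instance_a, instance_b)
print(len(merged))
```

Integer primary keys would collide between two independent databases, whereas random UUIDs let the shared structure be recognised as a single object.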
...
Materials Cloud: www.materialscloud.org
App-store model
“App-store”-like model for
plugins & workflows, e.g.
• Computers: automatically
set up a new cluster or
supercomputer
• DB importers: load structures
and data from COD, ICSD, …
• Calculations: plugin for your
simulation software
• Turn-key solutions:
workflows to compute a
desired property
• …
The MaterialsCloud AiiDA app-store
• Simulation codes: Quantum ESPRESSO, Yambo, CP2K, Fleur
• Turn-key workflows: Electronic density, Phonons, Optical properties, Transport
• Computing clusters: CSCS (CH), CINECA (IT), BSC (ES), Jülich (DE)
Acknowledgements
The AiiDA team
Giovanni Pizzi (EPFL)
Andrea Cepellotti (EPFL)
Andrius Merkys (Vilnius)
Nicolas Mounet (EPFL)
Boris Kozinsky (BOSCH)
Martin Uhrin (EPFL)
Spyros Zoupanos (EPFL)
Nicola Marzari (EPFL)
Snehal Waychal (EPFL)
Leonid Kahle (EPFL)
Fernando Gargiulo (EPFL)
Rico Häuselmann (EPFL)
Sebastiaan Huber (EPFL)
Conclusions
A computational science platform
adopting the ADES model:
• Automation: automate repetitive tasks via daemon, abstracting into APIs
• Data: reproducibility & provenance, directed acyclic graphs, queries
• Environment: flexible platform; workflows to encode scientists’ knowledge
• Sharing: social ecosystem to encourage interactions
Contacts and info:
Website: http://www.aiida.net
Docs: http://aiida-core.readthedocs.io
Git repo: https://github.com/aiidateam/aiida_core/
@aiidateam
https://www.facebook.com/aiidateam
Standard Solid State Pseudopotentials
I. Castelli, N. Mounet, and N. Marzari, in preparation
http://www.materialscloud.org/sssp
AiiDA: single job submission
[Figure: Pw Calculation (2 nodes, 30 min walltime) with inputs Code (qepw-5.1-hammer, type: PwCalculation) and three Upf Data (Ba.pbe-psl.UPF, Ti.pbe-psl.UPF, O.pbe-psl.UPF)]
# Retrieve code from the DB
code = Code.get(name='pw-localhost')
# Create new cell in DB (read from file)
struc = StructureData(cif='BaTiO3.cif')
# Namelists
parameters = ParameterData({
    'CONTROL': ...,
    'SYSTEM': ...,
    'ELECTRONS': ...
})
# Kpoints
kpoints = KPointsData()
kpoints.set_kpoints_mesh([4,4,4])
# Prepare calculation
calc = code.new_calc(
    max_wallclock_seconds=30*60,  # 30 min
    resources={'num_machines': 2})
calc.label = "Test QE pw.x with AiiDA"
calc.description = "A longer description"
# Set up links
calc.use_structure(struc)
calc.use_parameters(parameters)
calc.use_kpoints(kpoints)
calc.use_pseudos_from_family('pbe-pslibrary031')
# Store all nodes to the DB and submit
calc.store_all()
calc.submit()
[Figure: further input nodes: Kpoints Data (4x4x4 MP grid), Parameter Data (SCF, ecutwf=30, ecutrho=240), Structure Data]
&CONTROL
calculation = 'scf'
outdir = './out/'
prefix = 'aiida'
pseudo_dir = './pseudo/'
restart_mode = 'from_scratch'
verbosity = 'high'
/
&SYSTEM
ecutrho = 200
ecutwfc = 30
ibrav = 0
nat = 5
ntyp = 3
/
&ELECTRONS
conv_thr = 1.d-06
/
ATOMIC_SPECIES
Ba 137.327 ba_pbe_v1.uspp.F.UPF
O 15.9994 o_pbe_v1.2.uspp.F.UPF
Ti 47.867 ti_pbe_v1.uspp.F.UPF
ATOMIC_POSITIONS angstrom
Ti 2.015 2.015 2.015
Ba 0.000 0.000 0.000
O 0.000 2.015 2.015
O 2.015 0.000 2.015
O 2.015 2.015 0.000
K_POINTS automatic
2 2 2 0 0 0
CELL_PARAMETERS angstrom
4.030 0.000 0.000
0.000 4.030 0.000
0.000 0.000 4.030
[Figure: crystal structure with Ba, O and Ti atoms]
Plotting an Equation of State
# Load the group containing the calculations
group = Group.get(name='aiidatutorial_eos')
# Load calculations from this group
calcs = list(group.nodes)
# Create ('alat','energy') pairs to plot
# and sort by 'alat'
data = sorted([
    (c.inp.structure.cell[0][0], c.res.energy)
    for c in calcs])
# Transpose (get a list of all
# 'alat', and a list of all
# 'energies')
x, y = zip(*data)
# Plot
from pylab import *
plot(x, y, 'o', label='data')
xlabel('alat (ang)')
ylabel('energy (eV)')
show()
# Resulting data:
# data = [[3.8, -3704.9718541],
#         [3.81034482759, -3705.0487423],
#         [3.82068965517, -3705.1225978],
#         ...]
[Figure: Pw Calculation node c, with its input Structure Data (c.inp.structure) and its output Parameter Data (c.res)]
Environment in AiiDA: queries
Query tool developed to perform queries using python calls
without any knowledge of DB query languages (such as SQL)
q1 = QueryTool(class=PhCalculation)
q1.filter_attr('machine_name', '=', "piz-daint")
q1.filter_attr('min_frequency', '<', 0., relnode="out.parameters")
q = QueryTool(class=StructureData)
q.filter_rel(child=q1)
results = q.run_query()
• Provides a backend-agnostic query interface for AiiDA objects and their
provenance
• Internally uses Django and SQLAlchemy for conversion to SQL queries
• Queries cached and run only when needed
• Queries can be joined; transitive closure table used transparently
Querying for magnetic systems
Querying for metals
New advanced QueryTool
• Allow queries that
traverse the graph
• Attach arbitrary filters
to any node
• Specify projections, i.e.,
data that you want to
get back from the query
• Express queries in a
JSON-convertible
format (to expose them
via a REST API)
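A query expressed as nested dictionaries can be serialised to JSON, sent over a REST API, and evaluated on the other side. The sketch below shows the round trip with a toy filter evaluator; `OPERATORS`, `matches` and the flat attribute keys are assumptions made for illustration and are far simpler than the real query tool.

```python
import json

# Supported comparison operators for this toy evaluator.
OPERATORS = {
    "<": lambda value, ref: value < ref,
    ">": lambda value, ref: value > ref,
    "==": lambda value, ref: value == ref,
}


def matches(attributes, filters):
    """Return True if `attributes` satisfies every filter condition."""
    for key, conditions in filters.items():
        if key not in attributes:
            return False
        for op, ref in conditions.items():
            if not OPERATORS[op](attributes[key], ref):
                return False
    return True


query = {"filters": {"ecutwfc": {">": 60.0}, "conv_thr": {"<": 1e-6}}}

# Round trip through JSON, as a REST client and server would do.
wire_format = json.dumps(query)
received = json.loads(wire_format)

nodes = [
    {"ecutwfc": 80.0, "conv_thr": 1e-8},
    {"ecutwfc": 40.0, "conv_thr": 1e-8},
    {"ecutwfc": 80.0, "conv_thr": 1e-4},
]
selected = [n for n in nodes if matches(n, received["filters"])]
print(selected)
```

Keeping the query free of Python objects (classes, callables) is what makes it JSON-convertible; operators are therefore encoded as strings.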
[Figure: query graph with nodes Param, MD, Traj, Struc, Relax, Struc2]
Filters on Param:
• SYSTEM.conv_thr < 1e-6
• SYSTEM.ecutwfc > 60.
Filters on Struc:
• 4 atoms
• Contains N and H
• Does not contain S
• (v3)z < 3. or 5. < (v3)z < 7.
Retrieve:
• Param: IONS.temp
• MD: id, time
• Traj: id, num_steps
• Struc: id, name, atoms, cell
• Struc2: id, name
New advanced QueryTool
queryhelp = {
    'path': [
        {'class': 'Param'},
        {'class': 'MD'},
        {'class': 'Traj'},
        {'class': 'Struc', 'input_of': 'MD'},
        {'class': 'Relax', 'input_of': 'Struc'},
        {'class': 'Struc', 'label': 'Struc2', 'input_of': 'Relax'},
    ],
    'project': {
        'Param': ['attributes.IONS.tempw'],
        'MD': ['id', 'time'],
        'Traj': ['id', 'attr.length'],
        'Struc': ['id', 'name', 'attr.atoms', 'attr.cell'],
        'Struc2': ['id', 'name'],
    },
    'filters': {
        'Param': {'and': [
            {'attr.SYSTEM.econvthr': {'<': 1e-6}},
            {'attr.SYSTEM.ecutwfc': {'>': 60.}}]},
        'Struc': [
            {'attr.cell.2.2': {'or': [{'<': 3.0}, {'>': 5., '<': 7.}]}},
            {'attr.atoms': {'contains': ['N', 'H'], '~has_key': 'S', 'of_length': 4}},
        ],
    },
}
MaterialsCloud infrastructure
MaterialsCloud Data nodes list
MaterialsCloud Data nodes metadata
MaterialsCloud Data nodes visualisation