cineca – 12/06/2012 visualization case study

1 Visualization Case Study Crystal Fingerprinting and STM4 for USPEX output analysis Ing. Mario Valle CINECA – 12/06/2012 New CSCS building – computer room

Page 1: CINECA – 12/06/2012 Visualization Case Study

1

Visualization Case Study

Crystal Fingerprinting and STM4 for USPEX output analysis

Ing. Mario Valle

CINECA – 12/06/2012

New CSCS building – computer room

Page 2: CINECA – 12/06/2012 Visualization Case Study

2

Started with postprocessing

Nice images for publications

Movies for conference talks

Francesco Gervasio – ETH Zürich

Data from disjoint subfields

Mario Valle - Visualization Case Study - 12/06/2012

Macromolecules

Crystallography

Help overcome tool inflexibility

For example, there are nice crystallography programs that do not support dynamic data and do not allow customization

Page 3: CINECA – 12/06/2012 Visualization Case Study

3

STM4 is a framework for the development of unusual and enhanced techniques for chemistry visualization

Offer broader set of techniques

Provide enhanced techniques

Available data with the standard isosurface

Isosurface with the new volume interpolator

Page 4: CINECA – 12/06/2012 Visualization Case Study

4

STM3 Gallery

STM3 modules

The LEGO DNA

Page 5: CINECA – 12/06/2012 Visualization Case Study

5

One of the continuing scandals…

Prediction of the stable crystal structure on the basis of only the chemical composition is one of the central problems of condensed matter physics, which for a long time remained unsolved

The ability to solve this problem would open new ways also for the understanding of the behaviour of materials

Idea: copy genetic evolution

Page 6: CINECA – 12/06/2012 Visualization Case Study

6

Evolution of crystal structures

Examples of USPEX predictions

Novel high-pressure phases of CaCO3

Low-energy 3D carbon structure

40-atom cell of MgSiO3 post-perovskite

Page 7: CINECA – 12/06/2012 Visualization Case Study

7

USPEX discovered new materials

Boron at 1 atm: USPEX easily found the complex α-B structure...

...and also discovered the superhard boron γ-B28 phase (Nature)

Prof. A. Oganov

USPEX structure cancer problem

Different colors mean different crystal structures

[Figure labels: "USPEX structure cancer"; "Normal structure cluster generation"; horizontal axis: Generation]

The problem

USPEX is a crystal structure predictor based on an evolutionary algorithm

Each run produces hundreds of putative crystal structures…

…but many of them are identical

So intensive manual labor is needed to prune duplicate structures

Project: develop a (semi)automatic way to extract unique structures from USPEX outputs

Page 8: CINECA – 12/06/2012 Visualization Case Study

8

Proposed solution from High-Dim

Compute unique coordinates

Define distance measure

Add grouping criteria

Space 100-3000 dimensional

Each group describes a distinct structure

Structure “coordinates”

Set of distances for each atom in the unit cell

Distance sets concatenated for all atoms in the structure

agg

(coordinate)

Structure “fingerprint” CrystalFp

Set of distances for each atom in the unit cell

Distance sets concatenated for all atoms in the structure

agg

(coordinate)
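A minimal sketch of the "coordinates" idea described above: one sorted set of interatomic distances per atom, concatenated over all atoms of the structure. It is only an illustration of the slide text, not the CrystalFp implementation. The function names, the restriction to a cubic cell, and the hard cutoff are all assumptions made here to keep the example short.

```python
import itertools
import math


def atom_distance_set(frac, cell_side, atom_index, cutoff):
    """Sorted distances from one atom to every atom image within `cutoff`.

    `frac` holds fractional coordinates; the cell is assumed cubic
    (an assumption of this sketch, not of the original method).
    """
    reach = int(math.ceil(cutoff / cell_side))  # periodic images to scan
    shifts = range(-reach, reach + 1)
    origin = frac[atom_index]
    distances = []
    for j, other in enumerate(frac):
        for shift in itertools.product(shifts, repeat=3):
            if j == atom_index and shift == (0, 0, 0):
                continue  # skip the atom itself
            delta = [(other[k] + shift[k] - origin[k]) * cell_side
                     for k in range(3)]
            d = math.sqrt(sum(c * c for c in delta))
            if d <= cutoff:
                distances.append(d)
    return sorted(distances)


def structure_coordinates(frac, cell_side, cutoff):
    """Concatenate the per-atom distance sets for all atoms in the cell."""
    coords = []
    for i in range(len(frac)):
        coords.extend(atom_distance_set(frac, cell_side, i, cutoff))
    return coords


# Simple cubic lattice, one atom per cell: the six nearest neighbours.
print(structure_coordinates([(0.0, 0.0, 0.0)], cell_side=1.0, cutoff=1.0))
```

The length of the resulting vector grows with the cutoff, which is one way to see why the structure space ends up with hundreds or thousands of dimensions.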

Page 9: CINECA – 12/06/2012 Visualization Case Study

9

Visual design and validation

Built a tool to explore algorithm choices and parameter settings

This tool wraps the classifier library, called CrystalFp, and provides various interactive visual diagnostics to check classifier behavior

It is built inside STM4, the molecular visualization toolkit developed at CSCS

1. Load structures

2. Filter on energy

3. Compute fingerprints

4. Compute distances

5. Group structures

The application interface gives access to all CrystalFp algorithms and their parameters in a clear process workflow
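Steps 2 and 4 of the workflow above can be sketched as follows. This is an illustration under stated assumptions, not the CrystalFp code: the function names are hypothetical, and the cosine distance is chosen here only as one plausible distance measure between fingerprint vectors, since the slide does not name the measure.

```python
import math


def filter_by_energy(energies, window):
    """Step 2: indices of structures within `window` of the lowest energy."""
    lowest = min(energies)
    return [i for i, e in enumerate(energies) if e - lowest <= window]


def cosine_distance(a, b):
    """Assumed distance measure: 0 for parallel vectors, 0.5 for orthogonal.

    Both vectors must be non-zero and of equal length.
    """
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 0.5 * (1.0 - dot / norm)


def distance_matrix(fingerprints):
    """Step 4: full symmetric matrix of pairwise fingerprint distances."""
    n = len(fingerprints)
    return [[cosine_distance(fingerprints[i], fingerprints[j])
             for j in range(n)] for i in range(n)]


# Keep only the structures within 0.5 (energy units) of the minimum.
energies = [-10.0, -9.8, -9.0, -9.5]
print(filter_by_energy(energies, 0.5))
```

Filtering before computing distances keeps the distance matrix small, since its size grows with the square of the number of structures retained.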

STM4 provided an environment that accelerated the implementation

Workflow support

Page 10: CINECA – 12/06/2012 Visualization Case Study

10

Visual diagnostics tools

1. 2D maps

2. Charts

3. Picking for details

4. 2D data export

Various visualization and analysis tools to check and validate the behavior of the CrystalFp algorithms

Visual diagnostics: distance matrix

Distances between structures

Distances ordered by group

Clustering visual diagnostic

DFS grouping

Pseudo SNN (K=1)

Pseudo SNN (K=5)

SNN (K=5)

DFS: Depth-first search of the neighbor nodes

Pseudo SNN: Maintain connection between nodes only if they share at least K neighbors

SNN: As above plus a DBSCAN pass
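The first two grouping criteria can be sketched as below, assuming that two structures are "neighbors" when their distance is under a threshold. The function names and the threshold rule are assumptions of this sketch, the exact way CrystalFp counts shared neighbors may differ, and the DBSCAN pass of the full SNN method is not shown.

```python
def neighbor_sets(dist, threshold):
    """Neighbors of each structure: all others closer than `threshold`."""
    n = len(dist)
    return [{j for j in range(n) if j != i and dist[i][j] <= threshold}
            for i in range(n)]


def prune_by_shared_neighbors(neighbors, k):
    """Pseudo SNN: keep a connection only if both ends share >= k neighbors."""
    return [{j for j in nbrs if len(nbrs & neighbors[j]) >= k}
            for nbrs in neighbors]


def dfs_groups(neighbors):
    """DFS grouping: connected components found by depth-first search."""
    seen = set()
    groups = []
    for start in range(len(neighbors)):
        if start in seen:
            continue
        seen.add(start)
        stack = [start]
        group = []
        while stack:
            node = stack.pop()
            group.append(node)
            for nxt in neighbors[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        groups.append(sorted(group))
    return groups


# Structures 0, 1, 2 are mutually close; structure 3 is close only to 2.
dist = [[0, 1, 1, 5],
        [1, 0, 1, 5],
        [1, 1, 0, 1],
        [5, 5, 1, 0]]
nbrs = neighbor_sets(dist, 1.5)
print(dfs_groups(nbrs))
print(dfs_groups(prune_by_shared_neighbors(nbrs, 1)))
```

In this toy case plain DFS chains all four structures into one group, while the shared-neighbor rule cuts the single weak link and separates structure 3.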

Page 11: CINECA – 12/06/2012 Visualization Case Study

11

Visual diagnostics: scatterplot

Colored by “stress” to detect local minima traps

Colored by group

Diagnostic chart: distances in 2D vs. distances in High‐Dim space

The scatterplot tool in CrystalFp tries to map High‐Dim space points to 2D preserving their relative distances
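A hedged sketch of the two diagnostics mentioned above: a per-point "stress" value and the list of (High-Dim distance, 2D distance) pairs behind the diagnostic chart. The normalized squared-error definition of stress used here is an assumption, and the function names are hypothetical. The slide does not say how CrystalFp defines stress.

```python
import math


def euclidean(p, q):
    """Plain Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def point_stress(target, layout):
    """Per-point stress: squared mismatch between the target (High-Dim)
    distances and the 2D layout distances, normalized per point."""
    n = len(layout)
    stress = []
    for i in range(n):
        num = 0.0
        den = 0.0
        for j in range(n):
            if i == j:
                continue
            num += (target[i][j] - euclidean(layout[i], layout[j])) ** 2
            den += target[i][j] ** 2
        stress.append(num / den if den > 0 else 0.0)
    return stress


def distance_pairs(target, layout):
    """(High-Dim distance, 2D distance) for every pair, for a scatter chart."""
    n = len(layout)
    return [(target[i][j], euclidean(layout[i], layout[j]))
            for i in range(n) for j in range(i + 1, n)]


target = [[0.0, 3.0, 4.0], [3.0, 0.0, 5.0], [4.0, 5.0, 0.0]]
squashed = [(0.0, 0.0), (3.0, 0.0), (0.0, 1.0)]  # third point misplaced
print(point_stress(target, squashed))
```

A perfect mapping puts every pair on the diagonal of the chart and gives zero stress everywhere. Points with high stress are the ones whose 2D position should not be trusted.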

USPEX problem solved: an example

Hydrogen at 600 GPa (16 atoms)

The USPEX run produced 1274 structures

From these, the 794 structures within 0.5 eV of the lowest energy found are selected

Manual analysis to remove duplicated structures from this set: ~20h of work

Using the CrystalFp classifier: ~10min

In the end only 4 unique structures were found: one α-Ga type (top); one Cs-IV (bottom), the ground state (i.e. the lowest-energy structure); and two closely related structures

Page 12: CINECA – 12/06/2012 Visualization Case Study

12

Classifier integration in USPEX

Original USPEX. A lot of identical structures.

USPEX after the classifier integration. No more “structure cancer”!

[Figure: "USPEX structure cancer" plots; horizontal axis: Generation; annotation: "Solved at the root!"]

From the problem solution …

Compute unique coordinates

Define distance measure

Space 100-3000 dimensional

Add grouping criteria

Each group describes a distinct structure

Page 13: CINECA – 12/06/2012 Visualization Case Study

13

… to a new paradigm

Compute unique coordinates

Define distance measure

Space 100-3000 dimensional

High Dim space tools

To look at crystal structures from a novel perspective

Unfold data to lower dimensions

One famous test dataset (right, I said right!) contains points on a rolled sheet that forms a 3D shape called the “Swiss roll” (a superb example on the left)

Multidimensional scaling projects points from high dimensional space to a lower dimensional one preserving distances between points as faithfully as possible 

Sammon mapping to 2D

CCA mapping

CrystalFp multi dim. scaling

The scatterplot tool in CrystalFp implements a Force Directed Placement multidimensional scaling algorithm (here the points are colored by energy)
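A minimal sketch of force-directed placement, assuming the simplest possible variant: every pair of points is joined by a spring whose rest length is the High-Dim distance, and points are moved a small step along the net force until the 2D distances settle. The function name, step size, and iteration count are illustrative choices, not the settings of the CrystalFp scatterplot tool.

```python
import math
import random


def force_directed_mds(target, steps=2000, rate=0.05, seed=0):
    """Place points in 2D so pairwise distances approach `target`.

    Each pair (i, j) acts as a spring with rest length target[i][j].
    """
    rng = random.Random(seed)
    n = len(target)
    pos = [[rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)]
           for _ in range(n)]
    for _ in range(steps):
        for i in range(n):
            fx = fy = 0.0
            for j in range(n):
                if i == j:
                    continue
                dx = pos[j][0] - pos[i][0]
                dy = pos[j][1] - pos[i][1]
                d = math.hypot(dx, dy) or 1e-9  # avoid division by zero
                pull = (d - target[i][j]) / d   # > 0 attracts, < 0 repels
                fx += pull * dx
                fy += pull * dy
            pos[i][0] += rate * fx
            pos[i][1] += rate * fy
    return pos


# Three points whose target distances form a 3-4-5 triangle.
target = [[0.0, 3.0, 4.0], [3.0, 0.0, 5.0], [4.0, 5.0, 0.0]]
for x, y in force_directed_mds(target):
    print(round(x, 3), round(y, 3))
```

Only distances are preserved: the layout may come out rotated, translated, or mirrored from run to run, and with many points it can settle in a local minimum, which is what the stress coloring is meant to reveal.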

Nanocluster data from Dr. Gareth Tribello (USI)

Page 14: CINECA – 12/06/2012 Visualization Case Study

14

Page 15: CINECA – 12/06/2012 Visualization Case Study

15

Study of energy landscapes

A. R. Oganov and M. Valle, How to quantify energy landscapes of solids, The Journal of Chemical Physics, vol. 130, p. 104504, 2009.

Energy landscape of Au8Pd4 system

More complex landscapes

Energy landscape for MgO with 32 atoms/cell

New quantities: quasi-entropy

For each given structure, quasi‐entropy is a measure of disorder and complexity of that structure.

Sstr is better correlated with energy than Steinhardt’s Q6

Si structures developing defects

Page 16: CINECA – 12/06/2012 Visualization Case Study

16

(Totally) unexpected correlations

• We found unexpected correlations between distance and other physical variables

• For example the deceptively simple H2O shows clear correlations and grouping

• This and other datasets motivated us to continue the exploration of the crystal fingerprints’ space…

“And roughly the only mechanism for suggesting questions is exploratory”

A conversation with John W. Tukey and Elizabeth Tukey

Luisa T. Fernholz and Stephan Morgenthaler

Statistical Science, Volume 15, Number 1 (2000), 79-94

[Figure: "CrystalFp – Parametric study"; axes: Atoms per cell, Fingerprint cutoff; annotation: "General law"]

Page 17: CINECA – 12/06/2012 Visualization Case Study

17

Usage: CrystalFp [options] POSCARfile [ENERGIESfile]

-v  --verbose (optional argument)
    Verbose level (if no argument, defaults to 1)

-?  -h  --help (no argument)
    This help

-t  --elements (required argument)
    List of chemical elements

-es  --max-step  --end-step (required argument)
    Last step to load (default: all)

-ss  --start-step (required argument)
    First step to load (default: first)

-et  --energy-per-structure (no argument)
    Energy from file is per structure, not per atom

-e  --energy-threshold (required argument)
    Energy threshold

-r  --threshold-from-min (required argument)
    Threshold from minimum energy

-c  --cutoff-distance (required argument)
    Fingerprint forced cutoff distance

-n  --nano-clusters  --nanoclusters (no argument)
    The structures are nanoclusters, not crystals

-b  --bin-size (required argument)
    Bin size for the pseudo-diffraction methods

-p  --peak-size (required argument)
    Peak smearing size

...
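As an illustration of how the options listed above fit together, the following sketch assembles a command line from a few of them. Only flags shown in the usage text are used. The helper name, the file names, and the format of the `--elements` value are assumptions, and the command is only printed, not executed.

```python
import shlex


def crystalfp_command(poscar, energies=None, elements=None,
                      threshold_from_min=None, cutoff=None):
    """Build an argument list from flags that appear in the usage text."""
    args = ["CrystalFp"]
    if elements is not None:
        # The exact value format is not documented here; passed through as-is.
        args += ["--elements", str(elements)]
    if threshold_from_min is not None:
        args += ["--threshold-from-min", str(threshold_from_min)]
    if cutoff is not None:
        args += ["--cutoff-distance", str(cutoff)]
    args.append(poscar)
    if energies is not None:
        args.append(energies)
    return args


# Hypothetical file names; threshold and cutoff values chosen for illustration.
cmd = crystalfp_command("POSCARS.txt", "energies.txt", elements="H",
                        threshold_from_min=0.5, cutoff=30.0)
print(shlex.join(cmd))
```

Keeping the arguments as a list rather than a single string avoids quoting problems if the command is later passed to `subprocess.run`.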

Page 18: CINECA – 12/06/2012 Visualization Case Study

18

[Figures: three "Depth vs. Order" plots; horizontal axis: Degree of order (F2); vertical axis: Depth]

Page 19: CINECA – 12/06/2012 Visualization Case Study

19

Interesting correlations

Searching for an explanation

Distance vs. dimensionality

GaAs, 8 atoms/cell; cutoff: 3 – 30 Å; dimensionality: 180 – 1800
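The quoted dimensionalities scale linearly with the cutoff (60 components per Å). One reading that reproduces both endpoints is sketched below: one histogram per unordered pair of element types, with a fixed bin width. Both the 0.05 Å bin width and the per-pair layout are inferences made here from the numbers on the slide, not something the slide states.

```python
def fingerprint_dimension(n_species, cutoff, bin_size=0.05):
    """Assumed layout: one distance histogram per unordered species pair."""
    n_pairs = n_species * (n_species + 1) // 2   # e.g. Ga-Ga, Ga-As, As-As
    bins_per_pair = round(cutoff / bin_size)
    return n_pairs * bins_per_pair


# GaAs has two element types; cutoffs are in Å.
print(fingerprint_dimension(2, 3.0))
print(fingerprint_dimension(2, 30.0))
```

Under these assumptions the dimensionality depends on the number of element types and on the cutoff, not on the number of atoms in the cell.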

Page 20: CINECA – 12/06/2012 Visualization Case Study

20

Distance decomposition

GaAs, 8 atoms/cell; cutoff: 30 Å; dimensionality: 1800

But this one?

Structure: Au8Pd4

Cutoff: 30 Å; Dimension: 1800

Synthetic datasets

8 atoms with uniformly distributed random fractional coordinates in a cubic unit cell with 5 Å side
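A synthetic structure of this kind can be generated as in the sketch below. The function names and the use of a seed for reproducibility are choices made for this example.

```python
import random

CELL_SIDE = 5.0  # Å, cubic unit cell as stated above


def random_structure(n_atoms=8, seed=None):
    """Uniformly distributed random fractional coordinates in [0, 1)."""
    rng = random.Random(seed)
    return [tuple(rng.random() for _ in range(3)) for _ in range(n_atoms)]


def to_cartesian(frac, side=CELL_SIDE):
    """Convert fractional coordinates to Å for a cubic cell."""
    return [tuple(c * side for c in atom) for atom in frac]


# A small reproducible dataset of random structures.
dataset = [random_structure(seed=s) for s in range(100)]
print(len(dataset), len(dataset[0]))
```

Because such structures carry no chemistry, any trend that also appears in them is a property of the fingerprint space itself rather than of the material.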

Distance distribution vs. embedding dimension

Intrinsic dimension vs. embedding dimension

Page 21: CINECA – 12/06/2012 Visualization Case Study

21

Lessons learned

From the Modeling side

• Using known concepts in unusual contexts is a source of unexpected insights

• Discoveries happen on the boundaries of disciplines

• “Seeing is believing” and convincing. Then the domain experts become a source of ideas

From the Visual Analysis side

• Quick prototyping and experimentation capabilities are critical (that is, STM4 is a big help)

• No need for fancy visualizations. What is needed are visualizations tuned to the problem at hand

• Data management is critical to keep order in the data exploration

http://mariovalle.name/CrystalFp

http://mariovalle.name/STM4

Going together…

Thank you for your attention!

And don’t forget: [email protected]