Compressed sensing meets symbolic regression: SISSO


TRANSCRIPT

  • Compressed sensing meets symbolic regression: SISSO

    - Part 2 -

    Luca M. Ghiringhelli

    On-line course on Big Data and Artificial Intelligence in Materials Sciences

  • P = c1d1 + c2d2 + … + cndn

    Compressed sensing, not only LASSO

  • Compressed sensing, not only LASSO

    Greedy method: Orthogonal Matching Pursuit (a toy implementation is sketched at the end of this slide)

    P = c1d1 + c2d2 + … + cndn

    [Figure: the property P is first fitted with the single best feature (e.g., d1); the residual Residual1 is then fitted with the remaining features, orthogonalized with respect to the selected one (d1*, d2*), and so on.]

    Limitation of greedy methods: once a feature is selected it is never discarded, so the greedily built model can miss the globally best combination of features.
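    A minimal sketch of Orthogonal Matching Pursuit with NumPy (the synthetic data and variable names are illustrative, not from the slides): at each iteration the feature most correlated with the current residual is added, and the coefficients of all selected features are refitted jointly by least squares.

    import numpy as np

    def omp(D, P, n_nonzero):
        """Greedy Orthogonal Matching Pursuit: P ≈ D @ c with at most n_nonzero non-zero coefficients."""
        residual = P.copy()
        selected = []                          # indices of the chosen features (columns of D)
        for _ in range(n_nonzero):
            # pick the feature most correlated with the current residual
            scores = np.abs(D.T @ residual)
            scores[selected] = -np.inf         # never re-select a feature
            selected.append(int(np.argmax(scores)))
            # refit all selected coefficients jointly (the "orthogonal" step)
            c_sel, *_ = np.linalg.lstsq(D[:, selected], P, rcond=None)
            residual = P - D[:, selected] @ c_sel
        c = np.zeros(D.shape[1])
        c[selected] = c_sel
        return c, selected

    # toy usage with random data
    rng = np.random.default_rng(0)
    D = rng.normal(size=(50, 20))              # 50 samples, 20 candidate features
    c_true = np.zeros(20); c_true[[3, 7]] = [1.5, -2.0]
    P = D @ c_true + 0.01 * rng.normal(size=50)
    c_est, sel = omp(D, P, n_nonzero=2)
    print(sel)                                  # typically recovers features 3 and 7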

  • Compressed sensing: SISSO

    SIS: Sure Independence Screening

    [Diagram: SIS ranks the candidate features by their similarity to the property P and keeps the top-ranked subset S1D; at the next iteration the features are ranked against the 1D residual, yielding the subset S2D.]

    Ouyang et al., PRM 2018, DOI: 10.1103/PhysRevMaterials.2.083802

  • Compressed sensing: SISSO

    SIS: Sure Independence Screening
    SO: Sparsifying Operator

    [Diagram as above: SIS screens the features against P and against the 1D residual, producing the subsets S1D and S2D; SO then finds the exact solution (by enumeration) over the screened subsets.]

    Similarity criterion in the SIS step (a toy comparison follows this slide):
    ● Scalar product (Pearson correlation)
    ● Spearman correlation (captures nonlinear monotonicity)
    ● Mutual information, …
    ● However: the computational cost has to be factored in
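    A minimal sketch, using SciPy and scikit-learn, of ranking candidate features against a target with the three similarity criteria listed above (the synthetic data are only illustrative):

    import numpy as np
    from scipy.stats import pearsonr, spearmanr
    from sklearn.feature_selection import mutual_info_regression

    rng = np.random.default_rng(1)
    D = rng.normal(size=(100, 30))                     # 100 samples, 30 candidate features
    P = np.exp(D[:, 4]) + 0.1 * rng.normal(size=100)   # nonlinear, monotonic in feature 4

    def sis_rank(D, P, score="pearson", k=5):
        """Indices of the k features most similar to P under the chosen criterion."""
        if score == "pearson":
            s = np.array([abs(pearsonr(D[:, j], P)[0]) for j in range(D.shape[1])])
        elif score == "spearman":
            s = np.array([abs(spearmanr(D[:, j], P)[0]) for j in range(D.shape[1])])
        else:                                          # mutual information
            s = mutual_info_regression(D, P)
        return np.argsort(s)[::-1][:k]

    for crit in ("pearson", "spearman", "mi"):
        print(crit, sis_rank(D, P, crit))              # feature 4 should rank near the top in all three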

  • Compressed sensing: SISSO

    SIS: Sure Independence Screening
    SO: Sparsifying Operator

    [Diagram as above.]

    SO: exact solution of the sparse (ℓ0-constrained) least-squares problem, obtained by enumeration over the SIS-selected feature subsets (see the problem sketched below).

    Ouyang et al., PRM 2018, DOI: 10.1103/PhysRevMaterials.2.083802
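    A sketch of the ℓ0 problem that the SO step solves exactly, following the formulation of Ouyang et al., PRM 2018 (the notation here is mine: D_S collects the SIS-selected features, n is the target descriptor dimension):

    \hat{c} \;=\; \underset{c}{\arg\min}\; \lVert P - D_{S}\, c \rVert_2^{2}
    \quad \text{subject to} \quad \lVert c \rVert_{0} \le n ,

    where the ℓ0 "norm" counts the non-zero coefficients; the constraint is enforced exactly by enumerating all n-tuples of features in the screened subset S.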

  • Compressed sensing: SISSO

    SIS: Sure Independence Screening
    SO: Sparsifying Operator

    [Diagram as above.]

    SO: exact solution (by enumeration) over the SIS-selected subsets.

    In practice (a toy implementation is sketched after this slide):
    0. i = 1, S = Ø
    1. Rank the features according to their similarity to Residual_{i-1} (Residual_0 = Property).
    2. Add the first k features to S.
    3. Perform least-squares regression over all i-tuples in S.
    4. The lowest-error model is the i-dimensional SISSO model.
    5. i ← i+1; go to 1.

    P = c1d1 + c2d2 + … + cndn
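    A minimal, illustrative NumPy sketch of the loop listed above (Pearson correlation for the SIS ranking, brute-force enumeration for the SO step; function and variable names are mine, not from the official SISSO code):

    import numpy as np
    from itertools import combinations

    def sisso(D, P, max_dim=2, k=10):
        """Toy SIS+SO loop: D is the (samples x features) matrix, P the property vector."""
        residual = P.copy()
        S = []                                         # indices of SIS-screened features
        models = []
        for i in range(1, max_dim + 1):
            # SIS: rank features by |correlation| with the current residual, keep the top k new ones
            scores = np.abs([np.corrcoef(D[:, j], residual)[0, 1] for j in range(D.shape[1])])
            S += [j for j in np.argsort(scores)[::-1] if j not in S][:k]
            # SO: exhaustive least squares over all i-tuples of the screened features
            best = None
            for tup in combinations(S, i):
                X = D[:, list(tup)]
                c, *_ = np.linalg.lstsq(X, P, rcond=None)
                err = np.linalg.norm(P - X @ c)
                if best is None or err < best[0]:
                    best = (err, tup, c)
            err, tup, c = best
            residual = P - D[:, list(tup)] @ c         # residual of the best i-dimensional model
            models.append((tup, c, err))
        return models                                   # one (features, coefficients, error) per dimension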

  • Predicting crystal structures from the composition

    Octet binaries (NaCl, ZnS, BN, KF, GaAs, CaO, …): rock-salt or zinc-blende structure?

    Learning the relative stability from the properties of the isolated atomic species

    Rock salt: 6-fold coordination, ionic bonding

    Zinc blende: 4-fold coordination, covalent bonding

  • Atomic features

    Example: Sn (tin)

    [Figure: Kohn-Sham (KS) levels in eV of the valence s and valence p (HOMO) orbitals and of the LUMO, together with the radius at the maximum of the corresponding valence orbitals.]

  • Systematic construction of candidates

    Primary features (e.g., Energy1, Energy2, Length1, Length2) are combined by recursively applying unary and binary operators such as x + y, x·y, x / y, | x - y |, x^n, exp(x), exp(-x), ln(x), arctan(x). In the figure the operators act on dimensionally homogeneous pairs, e.g., | Energy1 - Energy2 | and Length1 / Length2.

    [Figure: tree of candidate expressions built from the primary features and the operators above.]

  • Systematic construction of candidates

    P = c1d1 + c2d2 + … + cndn

    Each feature (a column of the matrix) is a tree-represented candidate function, projected onto the training data. The selected descriptor has as its components the features picked by the sparse-recovery algorithm (here, SISSO). A toy construction is sketched below.
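    A minimal sketch, using SymPy and NumPy, of how such tree-represented candidate features can be generated from primary features and evaluated on the training data (the operator set, symbols, and data are illustrative only):

    import numpy as np
    import sympy as sp

    # primary features (symbols) and their values on the training samples
    E1, E2, L1, L2 = sp.symbols("E1 E2 L1 L2")
    data = {E1: np.array([1.0, 2.0, 3.0]),
            E2: np.array([0.5, 1.5, 2.5]),
            L1: np.array([2.0, 2.2, 2.4]),
            L2: np.array([1.0, 1.1, 1.3])}

    # one level of the tree: apply unary and binary operators to the primary features
    unary = [sp.exp, lambda x: x**2, lambda x: 1 / x]
    binary = [lambda x, y: x + y, lambda x, y: x * y,
              lambda x, y: x / y, lambda x, y: sp.Abs(x - y)]

    prims = list(data.keys())
    candidates = [op(x) for op in unary for x in prims]
    candidates += [op(x, y) for op in binary for x in prims for y in prims if x != y]

    # each candidate expression becomes one column of the feature matrix D
    cols = [sp.lambdify(prims, expr, "numpy")(*data.values()) for expr in candidates]
    D = np.column_stack(cols)
    print(D.shape)    # (n_samples, n_candidate_features)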

  • Structure map from SISSO, starting from 7×2 atomic features

    LMG et al., PRL 2015, DOI: 10.1103/PhysRevLett.114.105503
    LMG et al., NJP 2017, DOI: 10.1088/1367-2630/aa57bf

    Predicting crystal structures from the composition

    P = c1d1 + c2d2 + …

  • In SISSO the “hyperparameters” are:

    The level of sparsity, i.e., the number of “activated” features in P = c1d1 + c2d2 + …

    The size of the feature space, determined by the complexity of the tree

    Tuned via cross-validation: iterated random selection of a subset of the data for training, with testing on the left-out set

    Data-driven model complexity

    Ouyang et al. J. Phys. Mater. 2019 DOI: 10.1088/2515-7639/ab077b

  • In SISSO the “hyperparameters” are:

    The level of sparsity, i.e., the number of “activated” features in P = c1d1 + c2d2 + …

    The size of the feature space, determined by the complexity of the tree

    Tuned via cross-validation: iterated random selection of a subset of the data for training, with testing on the left-out set (a toy sketch follows this slide)

    Two levels of the tree, formulas like [example shown on slide]

    Three levels of the tree, formulas like [example shown on slide]

    Data-driven model complexity

    Ouyang et al. J. Phys. Mater. 2019 DOI: 10.1088/2515-7639/ab077b
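    A minimal sketch, reusing the toy sisso() function from the earlier snippet, of selecting the descriptor dimension (level of sparsity) by repeated random train/test splits; the split size and number of repeats are arbitrary choices:

    import numpy as np

    def cv_select_dimension(D, P, max_dim=3, n_splits=20, test_frac=0.2, rng=None):
        """Pick the descriptor dimension with the lowest average test error."""
        rng = rng or np.random.default_rng(0)
        n = len(P)
        errors = np.zeros(max_dim)
        for _ in range(n_splits):
            test = rng.choice(n, size=int(test_frac * n), replace=False)
            train = np.setdiff1d(np.arange(n), test)
            models = sisso(D[train], P[train], max_dim=max_dim)     # toy SISSO loop from above
            for dim, (tup, c, _) in enumerate(models):
                pred = D[np.ix_(test, list(tup))] @ c
                errors[dim] += np.sqrt(np.mean((P[test] - pred) ** 2))
        errors /= n_splits
        return int(np.argmin(errors)) + 1, errors    # best dimension and mean test RMSE per dimension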

  • Compressed-sensing-based model identification: shares concepts with

    ● Regularized regression. But: Massive sparsification.

    ● Dimensionality reduction. But supervised, and yielding sparse, “interpretable” descriptors

    ● Features (basis-set) selection. But: non-greedy solver.

    ● Symbolic regression. But: deterministic solver.

    A few bits of taxonomy for SISSO

  • Compressed-sensing-based model identification: shares concepts with

    ● Regularized regression. But: Massive sparsification.

    ● Dimensionality reduction. But supervised, and yielding sparse, “interpretable” descriptors

    ● Features (basis-set) selection. But: non-greedy solver.

    ● Symbolic regression. But: deterministic solver.

    A few bits of taxonomy for SISSO

    Open challenges of the symbolic regression + compressed sensing approach:
    ● Efficiently include constants and scaling factors in the symbolic tree
    ● Include known physical invariances in the symbolic-tree construction
    ● Include vectors (and tensors) as features. Contractions?

  • Interpretability

    Model interpretability: related to sparse feature selection

    James, Witten, Hastie, Tibshirani, An Introduction to Statistical Learning, Springer (2013)

    [Figure: interpretability vs. flexibility/complexity. From most interpretable / least flexible to least interpretable / most flexible:
    ● Sparsifying methods: LASSO, SISSO, symbolic regression
    ● Linear regression
    ● Kernelized regression
    ● Trees
    ● Forests
    ● Support vector machines
    ● Neural networks]

  • Interpretability

    Model interpretability: related to sparse feature selection

    James, Witten, Hastie, Tibshirani, An Introduction to Statistical Learning, Springer (2013)

    [Figure as above: interpretability vs. flexibility/complexity.]

    In general, with symbolic regression:
    ● If the exact equation is within reach of the searching/optimizing algorithm, it is found. A simple model does not necessarily mean a less accurate one; for other powerful ML methods (kernel regression, regression trees and forests, deep learning), this is not the case.
    ● The few fitting parameters yield stability with respect to noise (low complexity → no overfitting).

  • Interpretability: what it might endow us with

    Legend: x = atomic fraction, IE = ionization energy, χ = electronegativity

  • Interpretability: what it might endow us with

    [Figure: pressure-induced structural transitions of compounds such as HgTe, CdTe, and GaAs: zinc blende (ZB) at standard pressure, with transitions to rock salt (RS) and oI4 phases at 4, 9, and 29 GPa.]

  • Interpretability: what it might endow us with

  • Multi-task learning

  • Application: multi-phase stability diagram. Properties: crystal-structure formation energies.

    [Figure: structure map in the (d1, d2) descriptor plane, showing regions of stability of, e.g., the RS and CsCl structures.]

    Multi-task learning

  • Multi-task learning

    Ouyang et al. J. Phys. Mater. 2019 DOI: 10.1088/2515-7639/ab077b


  • MT-SISSO is remarkably data-parsimonious

    Multi-task learning

    Ouyang et al. J. Phys. Mater. 2019 DOI: 10.1088/2515-7639/ab077b
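    A minimal sketch of the multi-task idea behind MT-SISSO as described in Ouyang et al. 2019: all tasks (here, all crystal structures) share the same selected features, while each task gets its own fit coefficients. The helper below only illustrates a shared-feature SO step over a pre-screened set S; it is not the actual MT-SISSO implementation.

    import numpy as np
    from itertools import combinations

    def mt_so(D_tasks, P_tasks, S, dim):
        """Pick the `dim` shared features (from the screened set S) minimizing the summed
        least-squares error over all tasks; coefficients are fitted separately per task."""
        best = None
        for tup in combinations(S, dim):
            total_err, coeffs = 0.0, []
            for D, P in zip(D_tasks, P_tasks):          # one (D, P) pair per task
                X = D[:, list(tup)]
                c, *_ = np.linalg.lstsq(X, P, rcond=None)
                total_err += np.linalg.norm(P - X @ c) ** 2
                coeffs.append(c)
            if best is None or total_err < best[0]:
                best = (total_err, tup, coeffs)
        return best    # (summed error, shared feature tuple, per-task coefficient vectors)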


