APPLICATION OF DEEP REINFORCEMENT LEARNING
FOR BATTERY DESIGN
A Thesis presented to
the Faculty of the Graduate School
at the University of Missouri
In Partial Fulfillment
of the Requirements for the Degree
Master of Science
by
DONGPENG LIU
Dr. Dong Xu, Thesis Supervisor
JULY 2020
The undersigned, appointed by the Dean of the Graduate School, have examined
the thesis entitled:
APPLICATION OF DEEP REINFORCEMENT LEARNING FOR BATTERY DESIGN
presented by Dongpeng Liu,
a candidate for the degree of Master of Science and hereby certify that, in their
opinion, it is worthy of acceptance.
Dr. Dong Xu
Dr. Jianlin Cheng
Dr. Jian Lin
ACKNOWLEDGMENTS
I would like to thank Dr. Dong Xu for his supportive instruction. He pointed out the direction for me, helped me open up research ideas, and fostered my research taste. Thanks to his guidance, I know how to do a good job and contribute my own strength to society, which not only helped me in my Master's study but also gave me enlightenment for my future career.
I want to thank my colleagues and mentors at Automat Inc. for their professional support in batteries, robotic control and machine learning. Their domain knowledge helped me extend this project when I interned at the company. I also want to thank my parents, who have always supported me and been so good to me. I want to thank my classmates and labmates: we solved many problems together, and my amiable fellows in the lab led me into research.
TABLE OF CONTENTS
ACKNOWLEDGMENTS
LIST OF TABLES
LIST OF FIGURES
ABSTRACT
CHAPTER
1 Introduction
1.1 Background
1.1.1 Problem of material research and development
1.2 Related works
1.3 Machine Learning
1.3.1 Steps of machine learning application project
1.4 Problem formulation of battery recipe generation
1.4.1 Prediction problem
1.4.2 Generation problem
1.5 Materials Artificial Intelligence Robotics-driven System (MARS)
2 Data preprocessing and representation
2.1 Data collection
2.2 Data cleaning
2.3 Data visualization
2.4 Feature engineering
2.4.1 Vector representation
3 Prediction models and experiments
3.1 Modeling
3.1.1 Conductivity prediction model
3.1.2 State-of-the-art methods on structural data prediction
3.2 Implementation
3.3 Evaluation
4 Generation models and experiments
4.1 Models for structural data generation
4.1.1 Formulated as an optimization problem
4.1.2 Markov Decision Process for battery recipe generation
4.2 Model training
4.2.1 Bayesian optimization setting
4.2.2 Training Reinforcement Learning model
4.3 Evaluation
5 Summary
BIBLIOGRAPHY
LIST OF TABLES
Table
2.1 Parameter and target inputs
2.2 Formulation examples of 6 domains, selected from database
2.3 One-hot-like example recipe
3.1 Formulations generated by machine learning
LIST OF FIGURES
Figure
1.1 MARS workflow
1.2 MARS system architecture
1.3 MARS machine learning model
1.4 MARS machine learning flow chart
2.1 Data flow and data QA/QC
2.2 Box plot for each dimension of input
2.3 Conductivity distribution plot
2.4 Feature correlation plot
3.1 Deep neural network architecture
3.2 Scatter plot of LightGBM prediction result and ground truth
3.3 Scatter plot of XGBoost prediction result and ground truth
3.4 Scatter plot of Neural Network prediction result and ground truth
4.1 DDPG Framework
4.2 Generated conductivity result w.r.t. iteration
4.3 Generated conductivity plot
4.4 Bayesian optimization conductivity plot
4.5 Standard deviation of generated recipes
ABSTRACT
Conventional materials research and development is driven mainly by human intuition, labor, and manual decision-making, which is both ineffective and inefficient. Because of the complexity of material design and the magnitude of the experimental and computational work involved, discovering materials with conventional methods usually takes very long development cycles (10-20 years) with enormous labor and cost. To address this challenge, we propose a machine learning framework called the Materials Artificial Intelligence Robotics-driven System (MARS), which aims to reduce these costs with the help of machine learning techniques.
We applied advanced deep learning networks to better predict ionic conductivity. We explored neural network models and tree-based models such as LightGBM. In particular, we made the models more interpretable and identified the relationships between the electrolyte's composition and its ionic conductivity. To search for the optimal conductivity, we developed a deep reinforcement learning (RL) model based on DDPG (Deep Deterministic Policy Gradient) to explore novel recipes that reach much higher conductivity. DDPG proceeds by entering new states through actions: each action at a given state (a one-hot vector representing the selection of electrolyte components) yields a reward computed by the predictor developed in the previous step, from which the critic learns the action value Q. After the compositions that maximize conductivity, voltage stability and modulus have been found, new measurements are conducted to confirm them. The new measurement data are then fed back to improve the prediction model, so the prediction model is continually updated by each round of RL proposals. Once the prediction model has been successfully updated, the whole process iterates. A well-trained DDPG model combines the benefits of both Q-learning and policy-gradient methods. DDPG is faster, simpler, more robust, and able to achieve much higher conductivity than conventional search methods.
Finally, the model was able to provide compositions that lead to higher conductivities than the highest conductivity in the training data. We then generated more training data from these compositions to retrain the prediction model. The generated recipes have been validated by both machine learning metrics and wet-lab experiments. The best generated conductivity (2.51e-3) met our expectations for battery recipes.
Chapter 1
Introduction
1.1 Background
1.1.1 Problem of material research and development
Mainstream materials R&D today is driven mainly by human intuition, labor, and manual decision-making. This is ineffective and inefficient given the complexity of material design and the magnitude of the experimental and computational work. As a result, materials discovery sometimes resembles a “treasure hunt”, with long development cycles (10-20 years) and occasional lucky breakthroughs. Moreover, it is very difficult to solve most of the complicated problems in materials exploration using only first-principles methods (mechanics and statistical mechanics), although such approaches are truly helpful in materials discovery and optimization [1].
Electrolyte materials discovery has attracted technical interest because of its possible applications in various electrochemical devices such as fuel cells and solid-state batteries. Chemical materials R&D spending is estimated at $50 billion per year, yet software technologies currently account for less than 1% of it [2]. The technical challenges that need to be addressed include: (1) achieving parallelized mechanical measurements, which are otherwise usually performed serially; (2) designing sample holders that are compatible with complex formulated samples such as polymer electrolytes; and (3) collecting meaningful data that are interpretable and convertible to mechanical property values.
This work primarily develops a model for battery materials, with the initial focus on polymer electrolytes. The electrolyte is a key component of next-generation lithium-metal batteries that double energy density and halve cost with improved safety [3]. Such batteries are urgently needed, for example in electric vehicles (EVs), where battery performance is a major bottleneck. There is an urgent need to accelerate battery materials innovation toward batteries with long driving range, durability, low cost, safety, and fast charging.
Specifically, the machine learning-based materials discovery workflow combines: (1) initial knowledge-base collection, including parameters (e.g., material compositions and physical properties) and objective functions (e.g., conductivity and durability); (2) AI model training using the knowledge base; (3) experimental design aimed at the optimal solution suggested by the AI model; (4) parallelized experimentation on a high-throughput platform; and (5) knowledge-base updates based on the new results. This process iterates toward the global maximum of material performance. One analogy is the self-driving car, which uses algorithms and automation to make its driving decisions.
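The five-step loop above can be illustrated with a deliberately small Python sketch. Everything in it is a hypothetical stand-in rather than the MARS code: `measure_conductivity` replaces robotic preparation and testing with a synthetic objective, the "model" in step (2) simply returns the best recipe observed so far, and step (3) proposes random perturbations of that recipe.

```python
import random


def measure_conductivity(recipe):
    """Hypothetical stand-in for robotic preparation and testing.

    Returns a synthetic score that peaks at an arbitrary toy optimum.
    """
    toy_optimum = [0.5, 0.4, 0.1]
    return -sum((r - t) ** 2 for r, t in zip(recipe, toy_optimum))


def normalize(recipe):
    """Clip negative fractions to 0 and rescale so the fractions sum to 1."""
    clipped = [max(r, 0.0) for r in recipe]
    total = sum(clipped)
    if total == 0.0:
        return [1.0 / len(recipe)] * len(recipe)
    return [c / total for c in clipped]


def closed_loop(initial_recipes, n_iterations=20, batch_size=4, step=0.05, seed=0):
    rng = random.Random(seed)
    # (1) Initial knowledge base of (recipe, measured objective) pairs.
    knowledge_base = [(r, measure_conductivity(r)) for r in initial_recipes]
    for _ in range(n_iterations):
        # (2) Toy "model": the best recipe observed so far.
        best_recipe = max(knowledge_base, key=lambda item: item[1])[0]
        # (3) Experimental design: perturb the best recipe.
        proposals = [
            normalize([x + rng.uniform(-step, step) for x in best_recipe])
            for _ in range(batch_size)
        ]
        # (4) Run the batch of (simulated) experiments.
        results = [(p, measure_conductivity(p)) for p in proposals]
        # (5) Update the knowledge base with the new results.
        knowledge_base.extend(results)
    best = max(knowledge_base, key=lambda item: item[1])
    return best, knowledge_base
```

Replacing the toy model with a trained predictor and the synthetic objective with real measurements gives the structure described in the text; the sketch itself only guarantees that the knowledge base grows by one batch per iteration and that the best recorded result never gets worse.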
Our contribution is the Materials Artificial Intelligence Robotics-driven System (MARS) together with a method for operating it. MARS includes a machine learning framework, a knowledge database that includes training data, a robotic preparation module, and a robotic testing module. Together, these components accelerate advanced materials and device research and development. MARS is centralized, autonomous, combinatorial, and closed-loop, combining machine learning with robotic high-throughput automation. MARS can be used to discover new high-performance battery materials and to improve existing ones.
1.2 Related works
Several existing robotic and high-throughput systems help design and execute R&D experiments more rapidly and efficiently. There are also artificial intelligence (AI) applications that predict material properties such as lattice thermal conductivity [4] and band gap [5], thereby speeding up the testing of product properties such as battery cycle life. We have witnessed tremendous progress and great benefits from combining machine learning, robotic automation, and materials science [6], both in academia and in industry. However, only a few studies address conductivity prediction with AI. Among them, the most relevant is [7], which employed a multi-layer neural network for only one fixed type of polymer electrolyte. To overcome this limitation on polymer type, we consider employing deep learning with large amounts of data, together with generative models for producing new data.
High-throughput experimentation uses automation to run many experiments in parallel, greatly compressing research time-to-market [8] [9]. In materials science, for example, companies such as Wildcat Discovery [10] and Intermolecular have applied high-throughput robotic tools to prepare large numbers of samples and screen them for properties such as battery coulombic efficiency and dielectric constant. We employ high-throughput experimentation to address challenge (1).
Recently emerged materials informatics approaches aim to accelerate materials discovery with the help of machine learning and big data [11]. As a strong supplement to the first-principles strategy, machine learning uses known data about the properties and descriptors (both computational and experimental parameters) of some materials to find semi-empirical rules, explicitly or implicitly, and uses these rules to predict and evaluate the properties of unknown materials [12]. Conventional machine learning methods have been applied as conductivity classifiers; among them, tree-based methods show promising results over other methods [13]. The authors of [13] further analysed the factors (or nodes) in the tree-based methods, such as Li-salt content or temperature thresholds. However, they did not deeply investigate the continuous space or further optimization within it. Neural networks are inherently suited to continuous problems, and their feasibility for conductivity prediction has been tested [14]. However, that model was trained on only a few samples, whereas real-world materials information provides far more data. We should leverage the greater computing capacity of deep learning models for better prediction.
Many generative algorithms, such as reinforcement learning, have found applications in new areas such as drug development [15] [16], materials discovery [17], and supply chain management [18]. However, research on generating electrochemistry formulations such as battery recipes is relatively scarce. A close application relevant to battery recipes is AI-aided Li-salt structural design [19]. Robotic systems are widely expanding into manufacturing, R&D, and the Internet of Things (IoT) [20] [21]. We combine machine learning and robotic arms to tackle challenges (2) and (3).
1.3 Machine Learning
Machine learning is a family of methods and algorithms that, given plenty of data, use statistical techniques and computer algorithms to find the hidden patterns in these data.
Machine learning studies the construction of algorithms that can learn from and make predictions on data. Such algorithms make data-driven predictions or decisions by building a model from sample inputs. Machine learning is employed in a range of computing tasks where designing and programming explicit algorithms with good performance is difficult or infeasible; example applications include email filtering, network intrusion detection, and computer vision.
1.3.1 Steps of machine learning application project
Machine learning is essentially a computational statistics problem. The development flow includes data preprocessing, modeling, and evaluation, explained as follows:
Data preprocessing: This is the first and most important step of a machine learning project, because model performance depends on having a sufficiently large, high-quality dataset. We apply appropriate processing to the data, such as normalization and vectorization of data features, to enhance their representational ability and to keep the model from becoming overly complex. We also split the dataset into training, validation, and test sets.
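A minimal standard-library sketch of this preprocessing, assuming a purely numeric table stored as a list of rows. The 80/10/10 split fractions and the choice of min-max scaling are illustrative assumptions, not values taken from this thesis.

```python
import random


def train_val_test_split(rows, val_frac=0.1, test_frac=0.1, seed=0):
    """Shuffle once, then cut the rows into training, validation and test sets."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_frac)
    n_val = int(len(shuffled) * val_frac)
    test_rows = shuffled[:n_test]
    val_rows = shuffled[n_test:n_test + n_val]
    train_rows = shuffled[n_test + n_val:]
    return train_rows, val_rows, test_rows


def fit_min_max(rows):
    """Per-column minimum and maximum, computed on the training rows only."""
    columns = list(zip(*rows))
    return [min(c) for c in columns], [max(c) for c in columns]


def apply_min_max(rows, lows, highs):
    """Scale each column to [0, 1] with previously fitted bounds."""
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0 for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]
```

Fitting the scaling bounds on the training set and then reusing them for the validation and test sets keeps information from the held-out data from leaking into training.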
Modeling: We need to formulate our problem in a specific, machine-solvable form. Every supervised machine learning problem has an input “X” made up of features (the “input data”) and a target “Y” that we want the algorithm to predict. Model selection depends on the data type, dataset size, and similar factors; our tabular, categorical data is naturally suited to tree-based methods. We train and fine-tune the model using the data preprocessed in the previous step.
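Tree-based methods such as LightGBM build ensembles of regression trees, and each tree is grown by choosing the split that most reduces squared error. The sketch below shows only that building block, a single-split regression tree (a decision stump) in plain Python; it is a simplified illustration, not LightGBM's histogram-based algorithm.

```python
def fit_stump(X, y):
    """Fit a depth-1 regression tree: one feature, one threshold, two leaf means."""
    best = None
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        # Candidate thresholds lie midway between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            threshold = (lo + hi) / 2
            left = [t for row, t in zip(X, y) if row[j] <= threshold]
            right = [t for row, t in zip(X, y) if row[j] > threshold]
            left_mean = sum(left) / len(left)
            right_mean = sum(right) / len(right)
            sse = (sum((t - left_mean) ** 2 for t in left)
                   + sum((t - right_mean) ** 2 for t in right))
            if best is None or sse < best[0]:
                best = (sse, j, threshold, left_mean, right_mean)
    if best is None:
        # Every feature is constant: no split exists, predict the overall mean.
        mean = sum(y) / len(y)
        return (0, float("inf"), mean, mean)
    _, j, threshold, left_mean, right_mean = best
    return (j, threshold, left_mean, right_mean)


def predict_stump(stump, row):
    j, threshold, left_mean, right_mean = stump
    return left_mean if row[j] <= threshold else right_mean
```

Gradient-boosted libraries repeat this split search recursively and fit each new tree to the residuals of the previous ones.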
Evaluation: Evaluation is the final step of the project, in which we choose the most suitable metric for each output. Since we use multiple models to solve different problems, each problem needs its own criteria. Our prediction model uses a common metric, the mean squared error (MSE), while our generation model is assessed with a domain-specific metric. We evaluate our models on test data and compare the results with those of other models.
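For reference, the MSE is (1/n) Σ (y_i − ŷ_i)² over the n test samples. A direct Python version, included only as an illustration of the formula:

```python
def mean_squared_error(y_true, y_pred):
    """Average of squared differences between targets and predictions."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
```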
Completing the above steps closes one loop. We then analyse the results to decide whether to conduct another experiment-analysis loop.
1.4 Problem formulation of battery recipe generation
Generating new battery recipes is a long-standing and hard-to-meet requirement. The difficulty lies in the large variance of the intrinsic information in structured data, that is, data that fits within fixed fields and columns in relational databases and spreadsheets.
For data generation, there is much research on image, audio, and text generation. By contrast, the generation of structured data is much less studied than its prediction; common practice is to learn from structured data for other goals. For example, researchers apply neural networks to generate text from structured data [22], with potential applications including auto-generated news articles, weather reports, and industry reports. Here we consider generating structured data from source structured data, in other words, generating battery recipes from an electrolyte materials database.
What makes the generation problem complex is that the structured input data may have columns with intricate inter-relationships. Taking house price prediction as an example, the number of family members is associated with the number of rooms, and the location can nearly determine the house price. Moreover, the generated data sometimes must meet a standard or preference. For human face generation, a more natural, human-like face is preferred; for battery recipe generation, a higher conductivity is preferred. Searching for higher conductivity naturally forms an optimization problem, so optimization methods, both constrained and unconstrained, are worth exploring [23]. If we treat the target output (conductivity) as one of the constraints, other feasible machine learning methods include Variational Autoencoders (VAE) [24], Generative Adversarial Networks (GAN) [25], and Reinforcement Learning (RL) [26].
An optimization algorithm describes how to choose a combination of values of x that minimizes (or maximizes) the output y [23] [27]. Almost all machine learning algorithms ultimately come down to maximizing or minimizing the objective function of an optimization problem. For example, in supervised learning we need to find an optimal mapping function f(x) that minimizes the loss function (empirical risk or structural risk) over the training samples. Alternatively, we find an optimal probability density function p(x) that maximizes the log-likelihood of the training samples (maximum likelihood estimation).
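The two objectives above can be written explicitly. In this generic notation (not taken from the thesis), L is a per-sample loss, Ω(f) a complexity penalty with weight λ ≥ 0 (λ = 0 gives empirical risk, λ > 0 structural risk), F and P are the candidate model families, and x_1, ..., x_n are the training samples:

```latex
\begin{aligned}
f^{*} &= \arg\min_{f \in \mathcal{F}} \; \frac{1}{n} \sum_{i=1}^{n} L\bigl(y_i, f(x_i)\bigr) + \lambda\, \Omega(f), \\
\hat{p} &= \arg\max_{p \in \mathcal{P}} \; \sum_{i=1}^{n} \log p(x_i).
\end{aligned}
```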
VAE- and GAN-based methods [28] generate data by estimating and mimicking the distribution of the input data, mostly assuming a Gaussian initial distribution. However, the distributions of some attributes are counterintuitive and unnatural; columns such as polymer type and solvent type can be chosen arbitrarily. In addition, the common way to add a constraint to a GAN is to devise a new loss for additional generator updates, which requires the constraint variable, conductivity, to act as a label or output rather than an input. AlphaGo [29] has demonstrated the capability of reinforcement learning to search a space of about 10^170 for optimal solutions. We believe such algorithms can solve materials problems and greatly enhance performance.
These approaches lead to the same conclusion: an explicit mapping from x to y must be determined before we employ either optimization methods or unsupervised learning methods. We need a function or model that adequately describes the relationship between x and y. Thus, we decompose the whole generation process into a two-step pipeline: first we build and refine the prediction model, then we use it in the subsequent generation model.
1.4.1 Prediction problem
For the prediction model, many structured datasets are available in Kaggle competitions [30] that we can learn from and analyse. Among the winners' solutions, the most common and efficient way to preprocess structured data is feature engineering [31]. Overall, we focused on the prediction model as preparation, then compared different generation models with and without the predictor. Because prediction can be a sub-field quite different from generation, we dedicate a whole chapter to building and refining it. We have successfully run two rounds of sample preparation using our machine learning model for electrolyte materials in lithium-metal batteries. The electrolyte components are encoded as a one-hot vector that serves as input to a regression model (the predictor). The model was trained on over 1000 experimental samples, and the validation results show prediction accuracy at the level of industrial application. The prediction model was then used as the environment of our RL generation model for exploring novel formulations.
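The one-hot encoding of electrolyte components can be sketched as follows. This is a minimal illustration under stated assumptions: the component vocabulary is hypothetical (the thesis's actual component list is not given in this section), and each slot holds the component's weight fraction, a "one-hot-like" variant, rather than a 0/1 flag.

```python
# Hypothetical fixed component vocabulary; the real list comes from the database.
COMPONENTS = ["polymer_A", "polymer_B", "salt_B", "plasticizer_C", "additive_D"]


def encode_recipe(recipe, vocabulary=COMPONENTS):
    """Map {component: wt%} to a fixed-length vector of weight fractions.

    Components absent from the recipe get 0; unknown components are rejected.
    """
    unknown = set(recipe) - set(vocabulary)
    if unknown:
        raise KeyError(f"unknown components: {sorted(unknown)}")
    total = sum(recipe.values())
    if total <= 0:
        raise ValueError("recipe must contain a positive total amount")
    return [recipe.get(name, 0.0) / total for name in vocabulary]
```

A fixed vocabulary guarantees that every recipe maps to a vector of the same length and ordering, which is what a tabular regression model expects as input.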
We believe this process could generalize to other structured data generation problems, which have been little examined in research.
1.4.2 Generation problem
Intuitively, we model recipe generation as a chemical reaction process and formulate it as training a reinforcement learning agent that performs step-wise small additions or removals of components in a chemistry-aware Markov Decision Process (MDP). Here we assume that the chemical reaction process for recipe generation has the Markov property. MDPs are a classical formalization of sequential decision making, in which actions influence not only immediate rewards but also subsequent states, and through those states, future rewards [26]. Thus MDPs involve delayed reward and the need to trade off immediate and delayed reward. They are useful for studying optimization problems, here the optimization of recipe conductivity. We then employ reinforcement learning to solve this MDP.
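The trade-off between immediate and delayed reward is usually expressed through the discounted return G_t = R_{t+1} + γR_{t+2} + γ²R_{t+3} + ⋯, in which a reward received k steps later is weighted by γ^k. A short Python sketch, using γ = 0.9 as in this study:

```python
def discounted_return(rewards, gamma=0.9):
    """Return sum_k gamma**k * rewards[k], accumulated backwards in one pass."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g
```

With γ = 0.9, a reward of 1 received two steps in the future contributes only 0.81 to the return.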
Formally, the MDP M has three components, states, actions, and rewards, M = (S, A, R), each defined as follows:
S = {S_t} is the set of states, whose values can be all possible intermediate and final generated recipes. Each S_t is a tuple of a state and its corresponding time step, denoted (s, t). Here we consider the case of a finite MDP, in which the sets of states, actions, and rewards all have a finite number of elements. All three components are also defined at discrete time steps, each depending on the preceding ones. The initial state S_0 is chosen at random from a combination of our battery material recipe database and previously generated recipes. Episodic MDP modeling requires a terminal state that ends each sequence of states, which forms an episode. We handle the episodic case by limiting the maximum number of time steps T in our tabular, chemical-reaction-based MDP: after T time steps the episode ends and a new episode starts.
A = {A_t} denotes the set of actions, which describe the modification made to the current state (intermediate recipe) at each time step t. The action space has the same form as the state space; the only difference is that actions are micro-scale modifications compared with states. We enforce this because we want to simulate a chemical reaction environment in which each component is added gradually, following suggestions from [32]. Therefore, the action space is also continuous, represented by a distribution over the components.
R_t is the reward function that specifies the reward after reaching state S_t, with discount factor γ. This hyper-parameter is set to 0.9 in our study. In our framework, the state is post-processed into a valid and complete structured form at each step; that is, the component contents must sum to 1 and each must be greater than or equal to 0. Note that in our virtual environment, a reward is given not just at terminal states but after each action step. Both intermediate and final rewards guide the behavior of the reinforcement learning (RL) agent, avoiding the delayed or sparse reward issues that many other reinforcement learning frameworks suffer from [33]. Furthermore, to ensure that the last state is rewarded the most, we use γ to discount the rewards at intermediate states S_t. In addition, our reward function considers the similarity of recipes, to avoid generating many repeated recipes.
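One environment step as described above can be sketched in Python. This is a toy under stated assumptions, not the thesis implementation: `predict_conductivity` is a hypothetical stand-in for the trained predictor (any callable mapping a composition vector to a score), actions are additive changes to the composition, the repeat penalty uses cosine similarity to the final recipes of earlier episodes with a hypothetical weight, and the per-step γ discounting is left to the agent.

```python
import math


def postprocess(composition):
    """Make a recipe valid: clip negative contents to 0, then rescale to sum to 1."""
    clipped = [max(c, 0.0) for c in composition]
    total = sum(clipped)
    if total == 0.0:
        return [1.0 / len(clipped)] * len(clipped)
    return [c / total for c in clipped]


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


class RecipeEnv:
    """Toy episodic environment: state = composition vector, action = small change."""

    def __init__(self, predict_conductivity, initial_state, max_steps=10,
                 similarity_weight=0.1):
        self.predict = predict_conductivity          # hypothetical surrogate predictor
        self.max_steps = max_steps                   # episode length T
        self.similarity_weight = similarity_weight   # hypothetical penalty weight
        self.finished_recipes = []                   # final recipes of past episodes
        self.reset(initial_state)

    def reset(self, initial_state):
        self.state = postprocess(initial_state)
        self.t = 0
        return self.state

    def step(self, action):
        """Apply the action, post-process, and return (state, reward, done)."""
        self.state = postprocess([s + a for s, a in zip(self.state, action)])
        self.t += 1
        # Penalize similarity to recipes produced in earlier episodes.
        penalty = max((cosine_similarity(self.state, r) for r in self.finished_recipes),
                      default=0.0)
        # A reward is returned at every step, not only at the terminal state.
        reward = self.predict(self.state) - self.similarity_weight * penalty
        done = self.t >= self.max_steps
        if done:
            self.finished_recipes.append(self.state)
        return self.state, reward, done
```

Comparing only against finished recipes keeps the penalty from punishing the naturally similar intermediate states within a single episode.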
Reinforcement learning
To solve an MDP, conventional approaches such as Dynamic Programming (DP) and Monte Carlo methods exploit the iterative nature of the MDP problem. Reinforcement learning is an iterative process in which each iteration solves two problems: evaluating the value function of the current policy, and updating the policy according to that value function. Reinforcement learning methods can be viewed as achieving an effect similar to DP with less computation and without assuming a perfect model of the environment. DP methods are generally used for finite MDP problems, in which the sets of states, actions, and rewards are finite. For problems with continuous state and action spaces, exact optimal solutions can be obtained only in special cases.
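The two alternating problems can be made concrete with a small policy-iteration sketch for a finite, deterministic MDP. The four-state chain at the bottom is a hypothetical toy, not the battery-recipe MDP; note that DP needs the `transition` and `reward` functions to be known in advance.

```python
def policy_iteration(states, actions, transition, reward, gamma=0.9, tol=1e-10):
    """Policy iteration for a small, deterministic, finite MDP.

    transition(s, a) returns the next state; reward(s, a) returns the reward.
    """
    policy = {s: actions[0] for s in states}
    values = {s: 0.0 for s in states}
    while True:
        # Policy evaluation: sweep the Bellman expectation equation until stable.
        while True:
            delta = 0.0
            for s in states:
                a = policy[s]
                new_value = reward(s, a) + gamma * values[transition(s, a)]
                delta = max(delta, abs(new_value - values[s]))
                values[s] = new_value
            if delta < tol:
                break
        # Policy improvement: switch to a greedy action only if it is strictly better.
        stable = True
        for s in states:
            q = {a: reward(s, a) + gamma * values[transition(s, a)] for a in actions}
            best = max(q, key=q.get)
            if q[best] > q[policy[s]] + 1e-8:
                policy[s] = best
                stable = False
        if stable:
            return policy, values


# Hypothetical toy chain MDP: states 0..3; "right" moves one step (capped at 3);
# the reward is 1.0 whenever the action lands in state 3.
STATES = [0, 1, 2, 3]
ACTIONS = ["stay", "right"]


def chain_transition(s, a):
    return min(s + 1, 3) if a == "right" else s


def chain_reward(s, a):
    return 1.0 if chain_transition(s, a) == 3 else 0.0
```

Requiring a strictly better action before switching prevents the improvement step from oscillating between tied actions.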
DP-based methods require a model of the environment, while Monte Carlo and TD (Temporal Difference) methods do not. The former are called model-based methods [26] and use a model of the environment for planning, while the latter are model-free methods, which learn from the experience of directly interacting with the environment. When the model is used to improve the policy, the biggest benefit of modeling the environment is better use of prior knowledge and improved learning efficiency. Model-free methods do not try to learn the environment dynamics or the reward function, which saves computation and memory that can be spent on more trials.
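In contrast to DP, which needs the environment's transition and reward functions, a model-free TD method such as Q-learning needs only an observed transition (s, a, r, s′). A minimal tabular update, with a plain dictionary as a hypothetical Q-table and illustrative values for the step size α and γ:

```python
def q_learning_update(q, s, a, r, s_next, actions, alpha=0.1, gamma=0.9):
    """Apply one tabular Q-learning update from a single observed transition.

    q is a dict mapping (state, action) -> value; missing entries count as 0.
    """
    # Off-policy TD target: bootstrap from the greedy action in the next state.
    td_target = r + gamma * max(q.get((s_next, b), 0.0) for b in actions)
    old_value = q.get((s, a), 0.0)
    q[(s, a)] = old_value + alpha * (td_target - old_value)
    return q[(s, a)]
```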
Reinforcement learning algorithms can be divided into three categories: value-based, policy-based, and actor-critic methods. A commonly used value-based algorithm, DQN, has only a value-function network and no policy network, while actor-critic algorithms such as DDPG (Deep Deterministic Policy Gradient) have both a value-function network (the critic) and a policy network (the actor). Like DQN, DDPG is model-free and off-policy and uses deep neural networks for function approximation. However, vanilla DQN can only solve problems with discrete, low-dimensional action spaces, whereas DDPG can solve problems with continuous action spaces by explicitly modeling the action policy.
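For reference, the standard DDPG updates can be written as follows, with critic Q(s, a | θ^Q), deterministic actor μ(s | θ^μ), slowly updated target networks Q′ and μ′, a minibatch of N transitions (s_i, a_i, r_i, s_{i+1}) sampled from a replay buffer, and a soft-update rate τ ≪ 1. These are the generic algorithm's equations; the network sizes and τ used in this thesis are not given in this section.

```latex
\begin{aligned}
y_i &= r_i + \gamma\, Q'\bigl(s_{i+1}, \mu'(s_{i+1} \mid \theta^{\mu'}) \mid \theta^{Q'}\bigr), \\
L(\theta^{Q}) &= \frac{1}{N} \sum_{i=1}^{N} \bigl(y_i - Q(s_i, a_i \mid \theta^{Q})\bigr)^{2}, \\
\nabla_{\theta^{\mu}} J &\approx \frac{1}{N} \sum_{i=1}^{N}
  \nabla_{a} Q(s, a \mid \theta^{Q})\big|_{s = s_i,\, a = \mu(s_i)}
  \, \nabla_{\theta^{\mu}} \mu(s \mid \theta^{\mu})\big|_{s = s_i}, \\
\theta^{Q'} &\leftarrow \tau\, \theta^{Q} + (1 - \tau)\, \theta^{Q'}, \qquad
\theta^{\mu'} \leftarrow \tau\, \theta^{\mu} + (1 - \tau)\, \theta^{\mu'}.
\end{aligned}
```

The critic is trained by regression toward the target y_i, the actor follows the critic's gradient with respect to the action, which is what lets DDPG handle continuous action spaces, and the soft updates keep the targets slowly moving for stability.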
1.5 Materials Artificial Intelligence Robotics-driven System (MARS)
We apply our MARS platform to polymer electrolytes. The electrolyte consists of polymers, lithium salts, plasticizers, and solvents. Experimentally, the formulations recommended by the model are prepared as follows:
1) Weighing the components according to the composition recommended by the model. For example, a formulation consists of polymer A at 50 wt%, lithium salt B at 40 wt%, plasticizer C at 9 wt%, and additive D at 1 wt%. To prepare this formulation, we weigh 5 g of polymer A, 4 g of salt B, 0.9 g of plasticizer C, and 0.1 g of additive D.
2) Dispensing the components of each formulation into a vial. Note that the plasticizer is not added in this step.
3) Dissolving, blending and suspending components in a solvent by magnetic mix-
ing. The ratio between the sum of all components and the solvent is 50:1 to 2:1 by
weight.
4) Removing the solvent by heating.
5) Dispensing the plasticizer(s) onto the polymer electrolyte film and letting them
diffuse homogeneously.
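The weighing arithmetic in step 1 and the solvent ratio in step 3 can be checked with a small helper. The function names are hypothetical, and it is not specified whether the later-added plasticizer counts toward "all components" in the solvent ratio, so the example simply uses the full 10 g batch.

```python
def component_masses(weight_percents, batch_mass_g):
    """Convert a wt% formulation into grams for a batch of the given total mass."""
    total = sum(weight_percents.values())
    if abs(total - 100.0) > 1e-6:
        raise ValueError(f"weight percents sum to {total}, expected 100")
    return {name: batch_mass_g * pct / 100.0 for name, pct in weight_percents.items()}


def solvent_mass_range(components_mass_g, max_ratio=50.0, min_ratio=2.0):
    """Solvent mass bounds for a components:solvent weight ratio from 50:1 to 2:1."""
    return components_mass_g / max_ratio, components_mass_g / min_ratio
```

For the 10 g example formulation this reproduces 5 g, 4 g, 0.9 g and 0.1 g, and a solvent mass between 0.2 g (50:1) and 5 g (2:1).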
Currently, 8 solvents are available that give acceptable solubility. In this step, viscosity is a key processing parameter to optimize. In addition, stock solutions are used to shorten the preparation time. Figure 1.1 outlines the experimental workflow for polymer electrolyte preparation.
Figure 1.1: MARS workflow.
Then, the samples are loaded onto characterization modules. High-throughput
characterization is carried out and data is collected. The data is fed back to the
AI model, for it to learn and improve. The closed loop iterates and converges into
materials that meet our requirements.
Figure 1.2 shows a block diagram of an example MARS system, which includes a machine learning model with an input training database that is constantly updated, so that the model adapts to our actual experimental test results. The system includes a machine learning module, a robotic preparation module, a robotic testing module, a training and tuning module, and a user interface (UI) module. The system communicates with the user device through the user interface generated by the application engine to display output (such as suggested recipes and test results). Machine learning models and databases may further become part of the system. The way the database is deployed may affect retrieval and storage efficiency and/or data security. Taking these factors into account, we use MongoDB [34] as the database.
[Diagram labels: System, User Device, Machine Learning Model Module, Training & Tuning Module, Robotic Preparation Module, UI & API Module, Robotic Testing Module, Training Database, Application Engine, User Interface, Machine Learning Model (Reinforcement Learning, DNN, LightGBM, ...)]
Figure 1.2: MARS architecture diagram, illustrating an exemplary environment in which some embodiments may operate.
The machine learning module, as Figure 1.3 shows, uses a machine learning model to generate one or more suggested recipes. The machine learning module passes the proposed recipes to the robotic preparation module, which mixes and prepares instances of each proposed recipe and stores each prepared instance in an electrochemical module. The robotic preparation module provides the connection between the robotic testing module and the electrochemical module. The robotic testing module performs one or more tests on any stored recipe instance and generates test results. The robotic testing module adds the proposed formulations and test results to the training data, and the training and tuning module can further tune the machine learning model according to them.
[Diagram labels: Machine Learning Model Module (generate), Proposed Recipes, Robotic Preparation Module, Recipe Deposits, Electro-Chem Module, Robotic Testing Module (testing feedback), Training Data]
Figure 1.3: MARS machine learning model diagram, illustrating an exemplary environment in which some embodiments may operate.
As shown in Figure 1.4, MARS receives a request to optimize one or more objective functions for a selected battery component (such as a polymer electrolyte, liquid electrolyte, cathode, or anode). Using machine learning models, MARS generates a number of different formulations of battery materials that optimize at least one objective function. A machine learning model may include a training optimization module that trains the model on training data covering one or more parameter types, for example one or more chemicals, one or more components, one or more physical properties, and one or more processes. Machine learning models can also be trained on training data for one or more objective functions corresponding to these parameter types. The training and tuning module tunes the machine learning models to identify one or more combinations of parameter types that are compatible enough to achieve the desired optimization of one or more objective functions.
Figure 1.4: MARS machine learning flow chart, illustrating an exemplary method that may be performed in some embodiments. (Steps: generate, by a machine learning network, a plurality of proposed different recipes of battery materials optimizing at least one objective function; prepare an instance of at least one of the proposed recipes via a robotic preparation module; deposit the instance into an electrochemical module via the robotic preparation module; execute a plurality of formulation characteristic tests on each deposited instance via a robotic testing module loaded with a plurality of different tests for one or more of the battery materials; update the machine learning model, via the robotic testing module, with a result of at least one of the tests.)
According to various embodiments, the machine learning model is trained on polymer electrolyte formulation components such as polymer, lithium salt, plasticizer, and additive components, where the plasticizer and/or additive components may be optional. Additional training data may include physical properties, such as viscosity, encoded as prediction signals based on the direct relationship between composition (i.e., the formulation) and viscosity; for example, a higher concentration of electrolyte formulation components (without plasticizer components) results in a higher viscosity.
MARS performs multiple formulation tests on each deposited instance in the electrochemical module using the robotic testing module, which is loaded with multiple tests for one or more battery materials. For example, the robotic testing module may have one or more stored experiments and test protocols that are applied to one or more stored instances of proposed recipes. It can apply different experiments and test protocols to different stored instances of a proposed recipe and determine the result of each. Through the robotic testing module, MARS then updates the machine learning model with the result of at least one formulation performance test. For example, the robotic testing module can add the proposed formulation and its corresponding test results to the training data, and the training and tuning module can update and tune the machine learning model based on the updated training data.
It should be understood that the operations of Figure 1.4 can be performed in a different order or executed in parallel. In addition, the operations of the models and methods in this example may run on two or more computers, for example in an online network environment, with some operations on a local computer and others on a remote computer. Further, the operations of the flowchart in Figure 1.4 can be performed iteratively in order to converge to the final formulations.
Chapter 2
Data preprocessing and representation
2.1 Data collection
Using the MARS system design and processes, we plan to search a vast space of materials. This space consists of material and component data, such as chemical structure and particle size. Regarding chemical structure, in the polymer electrolyte use case, four categories of chemical structure are represented in the training data: the polymer, the lithium salt, the additive, and the plasticizer.
Functionally, the polymer conducts lithium ions and provides mechanical integrity to the polymer electrolyte. The lithium salt is the source of lithium ions. The additive improves electrolyte properties, such as ionic conductivity and voltage stability, and battery performance, such as cycle life, safety, and charging rate. The plasticizer makes the polymer flexible and supports ion conduction.
To supplement the chemical list (see Table 2.2), other chemicals that fall into the four categories above but have not been used in polymer electrolytes form a second set, the Chemical Database, which the machine learning algorithms can explore. For example, from the website of the chemical supplier Sigma-Aldrich, we collected about 250 different chemicals, with data such as linear formula, price, SMILES representation, molecular weight, melting point, density, toxicity, and flash point.
For machine learning exploration, the chemicals in the training set (Table 2.2) are also included in the Chemical Database. We keep chemical names and SMILES representations consistent between the two sets, and the Chemical Database uses the same four chemical categories. In this way, the machine learning algorithms know that the Chemical Database contains new chemicals for them to explore. Figure 2.1 depicts this.
Figure 2.1: Data flow chart and data QA/QC.
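A minimal sketch of the consistency rule above, assuming each set is a dictionary keyed by chemical name with `smiles` and `category` fields (our own layout, not the schema used in this work): training-set chemicals are merged into the Chemical Database, a conflicting SMILES string raises an error, and the chemicals left to explore are those not in the training set. The two example SMILES strings (ethylene carbonate and propylene carbonate) are illustrative only.

```python
CATEGORIES = {"polymer", "lithium salt", "additive", "plasticizer"}


def merge_into_chemical_database(training_chemicals, chemical_database):
    """Add training-set chemicals to the Chemical Database.

    Both arguments map chemical name -> {"smiles": str, "category": str}.
    Raises ValueError if a category is unknown or a SMILES string conflicts.
    """
    merged = dict(chemical_database)
    for name, entry in training_chemicals.items():
        if entry["category"] not in CATEGORIES:
            raise ValueError(f"{name}: unknown category {entry['category']!r}")
        existing = merged.get(name)
        if existing is not None and existing["smiles"] != entry["smiles"]:
            raise ValueError(f"{name}: SMILES differs between the two sets")
        merged[name] = entry
    return merged


def unexplored_chemicals(training_chemicals, chemical_database):
    """Names in the Chemical Database that the training set has not used yet."""
    return sorted(set(chemical_database) - set(training_chemicals))
```

For example, merging `{"ethylene carbonate": {"smiles": "C1COC(=O)O1", "category": "plasticizer"}}` into a database holding propylene carbonate leaves propylene carbonate as the only chemical to explore.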
In terms of the particular chemical structures that would work as components of the polymer electrolyte, the guiding principles are as follows:
(1) The polymer needs functional groups that can interact with lithium ions and thus “dissolve” and conduct them. For example, polyethylene oxide is a common polymer for polymer electrolytes: the ether oxygen in polyethylene oxide has lone electron pairs that can be shared with lithium ions. Other suitable functional groups include nitrile, carbonyl, carboxylate ester, fluoride, and amine.
(2) For the plasticizer, chemicals with the following attributes are preferred. It should be able to dissolve salts to a sufficient concentration; in other words, it should have a high dielectric constant. It should be fluid (low viscosity), so that facile ion transport can occur. It should remain inert to all cell components, especially the charged surfaces of the cathode and the anode, during cell operation. It should remain liquid over a wide temperature range; in other words, its melting point (Tm) should be low and its boiling point (Tb) high. It should also be safe (high flash point Tf), nontoxic, and economical.
(3) The lithium salt should completely dissolve and dissociate in the nonaqueous medium, and the solvated ions (especially the lithium cation) should move in the medium with high mobility. The anion should be stable against oxidative decomposition at the cathode and inert to electrolyte solvents. Both the anion and the cation should remain inert toward the other cell components, such as the separator, electrode substrate, and cell packaging materials. The anion should be nontoxic and remain stable against thermally induced reactions with electrolyte solvents and other cell components.
(4) For the additive, there are additives that improve the ion conduction properties of the bulk electrolyte, additives that modify the solid electrolyte interphase (SEI) chemistry, and additives that prevent overcharging of the cells.
Beyond these principles, the machine learning algorithm may discover new polymer electrolyte mechanisms among the available chemicals. In addition, liquid electrolytes in lithium batteries serve as another use case. In this case the polymer is absent, leaving three categories: the lithium salt, the additive, and the plasticizer (usually referred to as the solvent in liquid electrolytes).
2.2 Data cleaning
In the polymer electrolyte use case, the polymer(s) and the lithium salt(s) are essential, and the additive(s) and the plasticizer(s) are optional (note, however, that some of our recipes have an optional polymer). Table 2.1 lists the representative chemical types in the training set. These chemicals have been reported in the literature.
The required objective functions used as inputs to train the model for polymer electrolyte development are ionic conductivity, voltage stability, and Young’s modulus. Table 2.1 shows the complete list of inputs used to train the model for polymer electrolyte development in our current knowledge base, including these required ones. For liquid electrolytes, the list of inputs is similar to Table 2.1, except that the polymer, free-standing, and mechanical parameters do not apply.
Table 2.1: Parameter and objective function inputs to train the model in the use case of polymer electrolyte.
Parameters
Electrolyte - Polymer: Polymer 1 Type; Polymer 1 CAS #; Polymer 1 Vendor; Polymer 1 Product #; Polymer 1 Repeat unit; Polymer 1 Repeat unit MW; Polymer 1 MW; Polymer 1 Mw/Mn; Polymer 1 Density; Polymer 1 Voltage stability range; Polymer 1 wt%; Copolymer Type
Electrolyte - Li Salt: Li Salt 1 Type; Li Salt 1 CAS #; Li Salt 1 Vendor; Li Salt 1 Product #; Li Salt 1 Structure; Li Salt 1 MW; Li Salt 1 wt%; molar ratio and Li salt 2; Li Salt 2 conc. in liquid electrolyte / M; Li Salt 2 Solubility / g/g; Li Salt 2 pH in water
Electrolyte - Plasticizer: Plasticizer 1 Type; Plasticizer 1 CAS #; Plasticizer 1 Vendor; Plasticizer 1 Product #; Plasticizer 1 structure; Plasticizer 1 repeat unit MW; Plasticizer 1 MW; Plasticizer 1 density / g/cm3; Plasticizer 1 wt%
Electrolyte - Additive: Additive 1 Type; Additive 1 CAS #; Additive 1 Vendor; Additive 1 Product #; Additive 1 size / nm; Additive 1 solubility / g/g; Additive 1 structure; Additive 1 MW; Additive 1 wt%
Electrolyte - Process: Formulation Method; Solvent; Synthesis Method
Electrolyte - Formulation properties: Tg1; Tm1; Tpc; Tg2; Tm2; Xc %; Tf; Td; viscosity; color; solubility; surface tension; temperature effect
Formulation objective functions
Electrolyte: Electrolyte Conductivity / S/cm; Electrolyte Conductivity temperature / C; Electrolyte Feedstock Miscibility; Electrolyte Film visual uniformity; Electrolyte Film visual color; Electrolyte Free-standing; Electrolyte Li transference number; Electrolyte Li TN temperature / C; Li diffusion coefficient; Electrolyte Cathodic stability vs Li / V; Electrolyte Anodic stability vs Li / V; Electrolyte Anodic stability temperature; Electrolyte Li depo onset potential / V vs Li; Electrolyte Modulus / MPa; Electrolyte Adhesion; Electrolyte Tensile Strength / MPa; Electrolyte Elongation / %
2.3 Data visualization
Normalization First, we want to know whether the target (conductivity) has a distribution suitable for machine learning models. Because our data were collected from manual experiments and academic reports, the distribution may be highly imbalanced (e.g., a long-tailed distribution, as some dimensions in Figure 2.2 show). Both our target y and our inputs X have imbalance problems to some degree, and feeding imbalanced distributions into a model can reduce its accuracy. We therefore transformed the target y toward a common distribution, such as the normal distribution, which helps machine learning models perform better [35]. The raw conductivity values span a very wide range, from 1e-3 to 1e-11, so they cannot be used directly as a reward. We applied a logarithm transform followed by normalization, which maps the values into the range from zero to one and yields the conductivity score shown in the distribution plot in Figure 2.3.
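The transformation can be sketched as follows. The text specifies a logarithm transform followed by normalization to [0, 1]; the choice of log10 and min-max scaling here is our assumption. The function names are hypothetical, and the fitted `lo` and `hi` would be stored so that predicted scores can be mapped back to conductivity.

```python
import math


def fit_log_minmax(conductivities):
    """Return (lo, hi): min and max of log10(conductivity) on the training data."""
    logs = [math.log10(c) for c in conductivities]
    return min(logs), max(logs)


def to_score(conductivity, lo, hi):
    """Log10 transform followed by min-max scaling to [0, 1] (assumes hi > lo)."""
    return (math.log10(conductivity) - lo) / (hi - lo)


def to_conductivity(score, lo, hi):
    """Inverse transform: map a score in [0, 1] back to a conductivity value."""
    return 10 ** (lo + score * (hi - lo))
```

With training conductivities spanning 1e-11 to 1e-3, a conductivity of 1e-7 maps to a score of 0.5, and `to_conductivity(0.5, lo, hi)` recovers 1e-7.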
Figure 2.2: Box plot for each input dimension. The feature distributions need normalization for easier modeling.
Analysis of numerical data. Beyond our current input features, we also want to disentangle the relations among the input dimensions. Correlation is a commonly used metric for uncovering the relationship between two continuous variables, and some components may react with one another depending on their type and size. We measured the inter-relations of the numerical dimensions; the results are shown in Figure 2.4. They suggest that the conductivity enhancement depends on the filler type and size. The strongest relationships are between 12: ’Solvent 1 wt%’ and 13: ’Solvent 2 wt%’, and between 10: ’Li Salt wt%’ and 11: ’molar ratio of (repeat unit+solvent) and Li salt’.
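A pairwise Pearson correlation screen of the kind summarized in Figure 2.4 can be sketched in plain Python. The feature names and values in the usage example are toy data, the 0.8 threshold for a "strong" relationship is our assumption, and the helper assumes that no feature is constant (zero variance).

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def strongest_pairs(columns, threshold=0.8):
    """Distinct feature pairs whose absolute correlation is at least `threshold`.

    columns maps feature name -> list of values (all lists the same length).
    """
    names = list(columns)
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson(columns[a], columns[b])
            if abs(r) >= threshold:
                pairs.append((a, b, r))
    return pairs
```

With `f1 = [1, 2, 3, 4]`, `f2 = [2, 4, 6, 8]`, and `f3 = [4, 1, 3, 2]`, only the pair `(f1, f2)` is reported (r = 1.0); `f3` correlates with each at r = -0.4.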
We tested some feature combinations in our prediction model, such as the weight percentage of Li salt multiplied by the total weight percentage of solvents, as discussed in a later chapter. Note that we prefer a solid-state battery, which means that the less solvent a recipe contains, the better. Therefore, after this investigation we kept the original one-hot vector, multiplied by the corresponding weight percentages, as described below.
Figure 2.3: Conductivity distribution plot. After transformation, the conductivity score is nearly normally distributed.
2.4 Feature engineering
Table 2.2 lists categories and chemicals of the training data according to some embodiments. According to various embodiments, the training and tuning module trains the machine learning model on any portion(s) of the training data. Following these forms, the machine learning model returns different formulations of multiple battery materials, such as polymer electrolytes, liquid electrolytes, anodes, or cathodes. All inputs are features that the model can learn from. However, some features may be more important than others, so we should emphasize them in the input vector. There may also be inter-related features that are redundant or have special connections among them. Disentangling and handling these connections is the goal of the feature engineering step.
For electrolyte formulations, the machine learning models suggest polymers, plasticizers, lithium salts, and additives, together with a weight percentage for each component. According to various embodiments, the formulations presented in Table 2.2 are polymer-electrolyte formulations proposed by machine learning models in response to input requests to optimize objective functions such as conductivity. That is, each formulation is based on machine learning models that predict the combination of ingredients (polymers, lithium salts, plasticizers, and additives) that converges to the best conductivity.
Figure 2.4: Feature correlation plot. Linear relations among the numerical inputs stand out.
Table 2.2: Formulation examples of 6 domains, selected from the database.
Component            Formulation 1  Formulation 2  Formulation 3  Formulation 4
Polymer A Type       PBP            PBP            LPE16          LPE7
Polymer A wt%        17.4           17.6           16.7           17.5
Polymer B Type       LAPPE1         LAPPE1         LAPPE4         LAPPE1
Polymer B wt%        17.4           17.6           16.7           17.5
Li Salt Type         S2ALP          S2ALP          LBI            LBI
Li Salt wt%          17.4           17.6           16.7           17.5
Plasticizer A Type   DO             DO             Et             Pr
Plasticizer A wt%    17.4           17.6           16.7           17.5
Plasticizer B Type   N              N              D              D
Plasticizer B wt%    12.8           11.9           16.7           12.4
Additive Type        AO             SMMS           LMMS           LMMS
Additive wt%         17.4           17.6           16.7           17.5
The machine learning model determines all of the components. For example, the first formulation comprises 17.4 wt% of polymer A ”PBP”, 17.4 wt% of polymer B ”LAPPE1”, 17.4 wt% of lithium salt ”S2ALP”, 17.4 wt% of plasticizer A ”DO”, 12.8 wt% of plasticizer B ”N”, and 17.4 wt% of additive ”AO”. The second formulation comprises 17.6 wt% of polymer A ”PBP”, 17.6 wt% of polymer B ”LAPPE1”, 17.6 wt% of lithium salt ”S2ALP”, 17.6 wt% of plasticizer A ”DO”, 11.9 wt% of plasticizer B ”N”, and 17.6 wt% of additive ”SMMS”. In such formulations, the polymer content ranges from 2 wt% to 99 wt%, ideally from 50 wt% to 85 wt%. The salt content ranges from 2 wt% to 80 wt%, ideally from 10 wt% to 60 wt%. The plasticizer content ranges from 0 wt% to 90 wt%, preferably from 0 wt% to 30 wt%. The additive content ranges from 0 wt% to 40 wt%, preferably from 0 wt% to 15 wt%.
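The ranges above can be turned into a simple validity check for a proposed formulation. In this sketch, which is our own illustration rather than the thesis's code, the ranges are applied to the total wt% of each category (for example, polymer A plus polymer B), the contents must sum to about 100 wt%, and values outside the preferred sub-range only produce warnings; the 0.5 wt% tolerance is an assumption.

```python
# Category -> ((allowed min, allowed max), (preferred min, preferred max)), in wt%.
WT_RANGES = {
    "polymer": ((2, 99), (50, 85)),
    "lithium_salt": ((2, 80), (10, 60)),
    "plasticizer": ((0, 90), (0, 30)),
    "additive": ((0, 40), (0, 15)),
}


def check_formulation(totals, tolerance=0.5):
    """Return (is_valid, messages) for a dict of category -> total wt%.

    A formulation is invalid if any category is outside its allowed range or
    if the contents do not sum to 100 wt% within `tolerance`.
    """
    warnings = []
    for category, ((lo, hi), (pref_lo, pref_hi)) in WT_RANGES.items():
        value = totals.get(category, 0.0)
        if not lo <= value <= hi:
            return False, [f"{category}: {value} wt% outside {lo}-{hi} wt%"]
        if not pref_lo <= value <= pref_hi:
            warnings.append(f"{category}: {value} wt% outside preferred {pref_lo}-{pref_hi} wt%")
    return abs(sum(totals.values()) - 100.0) <= tolerance, warnings
```

Applied to the first formulation in Table 2.2 (34.8 wt% polymer, 17.4 wt% salt, 30.2 wt% plasticizer, 17.4 wt% additive), the check passes but warns that the polymer, plasticizer, and additive contents lie outside their preferred ranges.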
2.4.1 Vector representation
Our raw data (see Table 2.2) is a mix of categorical and numerical data, so we need a vector representation of the categorical data that matches the model’s input requirements. We adopt a one-hot representation for the chemical components: each chemical corresponds to one element of the full vector, and that element is set to “1” while all others remain “0”. We then multiply this element by the chemical’s weight percentage (wt%). This is repeated for each chemical in the recipe. The result is a weight-percentage-masked vector, which we still call a one-hot vector.
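The encoding can be sketched as follows. The vocabulary is a hypothetical, much smaller stand-in built from the chemical names in Table 2.2 (the full representation has 101 input dimensions), and contents are stored as fractions so that a recipe sums to 1.0, as in Table 2.3.

```python
# Hypothetical toy vocabulary; the real one has 39 polymers, 26 Li salts,
# 19 plasticizers, and 17 additives.
VOCAB = {
    "polymer": ["PBP", "LAPPE1", "LAPPE4", "LPE16", "LPE7"],
    "lithium_salt": ["S2ALP", "LBI"],
    "plasticizer": ["DO", "Et", "Pr", "N", "D"],
    "additive": ["AO", "SMMS", "LMMS"],
}


def index_map(vocab):
    """Assign each (category, chemical) pair a fixed position in the vector."""
    positions, size = {}, 0
    for category, names in vocab.items():
        for name in names:
            positions[(category, name)] = size
            size += 1
    return positions, size


def encode_recipe(recipe, vocab=VOCAB):
    """recipe: list of (category, chemical name, wt%) -> weight-masked one-hot vector."""
    positions, size = index_map(vocab)
    vector = [0.0] * size
    for category, name, wt_percent in recipe:
        # Set the chemical's one-hot position to its content as a fraction of 1.
        vector[positions[(category, name)]] += wt_percent / 100.0
    return vector
```

Encoding the first formulation of Table 2.2 gives a 15-element vector whose first entry (PBP) is 0.174 and whose entries sum to 0.998.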
Table 2.3: One-hot-like example recipe. For each formula, the sum of all contents should be 1.0 (i.e., 100%).
polymer 1 type x content: 0, 0, . . . , 0.25, 0
polymer 2 type x content: 0, . . . , 0.3, 0, 0
Li salt type x content: 0, 0, 0, . . . , 0, 0, . . . , 0.1
solvent 1 type x content: 0, 0.05, . . . , 0, 0, 0, 0
solvent 2 type x content: 0, 0, . . . , 0, 0.15, 0, 0
additive 1 type x content: 0, 0, 0, 0.25, . . . , 0, 0, 0, . . . , 0
The one-hot vector representation has 102 dimensions: 39 for polymers, 26 for lithium salts, 19 for plasticizers, and 17 for additives (101 input dimensions), plus 1 dimension for conductivity as the output. Each one-hot entry in the input is multiplied by the corresponding quantitative content. In the polymer electrolyte use case, the model input therefore carries two parameters: the chemical names of the components and their quantitative contents (that is, wt%). Table 2.2 lists examples in each chemical category, and an example one-hot-like vector is shown in Table 2.3.
In the following chapters, all machine learning models use this vector representation to keep inputs and outputs consistent. In particular, the recipe generator model also yields a 102-dimensional vector. For post-processing of a generated recipe, an inverse transform function projects the 102-dimensional vector back to the simple 6-domain representation shown in Table 2.2.
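A possible inverse transform is sketched below, using a small hypothetical vocabulary as a toy stand-in for the 102-dimensional vector. The rules are our assumptions, not the function used in this work: within each category the largest entries fill the slots of Table 2.2 (polymer A and B, Li salt, plasticizer A and B, additive), entries below `min_content` are dropped, and the kept contents are renormalized to 100 wt%.

```python
# Hypothetical toy vocabulary and the 6 slots ("domains") of Table 2.2.
VOCAB = {
    "polymer": ["PBP", "LAPPE1", "LAPPE4", "LPE16", "LPE7"],
    "lithium_salt": ["S2ALP", "LBI"],
    "plasticizer": ["DO", "Et", "Pr", "N", "D"],
    "additive": ["AO", "SMMS", "LMMS"],
}
SLOTS = {
    "polymer": ["Polymer A", "Polymer B"],
    "lithium_salt": ["Li Salt"],
    "plasticizer": ["Plasticizer A", "Plasticizer B"],
    "additive": ["Additive"],
}


def decode_vector(vector, vocab=VOCAB, slots=SLOTS, min_content=1e-3):
    """Project a weight-masked vector back to {slot: (chemical, wt%)}."""
    start, chosen = 0, []
    for category, names in vocab.items():
        block = vector[start:start + len(names)]
        start += len(names)
        # Largest contents in this category fill its slots in order.
        ranked = sorted(zip(block, names), reverse=True)[:len(slots[category])]
        for slot, (content, name) in zip(slots[category], ranked):
            if content >= min_content:
                chosen.append((slot, name, content))
    total = sum(content for _, _, content in chosen)
    if total == 0:
        return {}
    return {slot: (name, round(100 * content / total, 1)) for slot, name, content in chosen}
```

Decoding the encoded first formulation of Table 2.2 recovers PBP and LAPPE1 at 17.4 wt% each and plasticizer B "N" at 12.8 wt%.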
Chapter 3
Prediction models and experiments
3.1 Modeling
3.1.1 Conductivity prediction model
The machine learning framework can include, but is not limited to: a neural-network-based algorithm, such as an artificial neural network or deep learning model; a robust linear regression algorithm, such as Random Sample Consensus [36], Huber Regression [37], or Bayesian Regression [38]; a tree-based algorithm, such as Classification and Regression Tree, Random Forest [39], Gradient Boosting Machines, and Gradient Boosting Decision Tree [40]; the Naïve Bayes Classifier [41]; and other suitable machine learning algorithms, such as XGBoost [42] and LightGBM [43].
These models capture different aspects of the data. For example, artificial neural networks are adept at capturing spatial features in images, and tree-based algorithms handle categorical data well. Because our chemical recipe data is structural data mixing categorical and numerical values, we investigated XGBoost, LightGBM, and CNN models to find the model that best captures the latent information, as described in the following sections. Their ensemble combinations were also evaluated.
3.1.2 State-of-the-art methods on structural data prediction
Convolutional Neural Network (CNN)
Convolutional Neural Networks (CNNs) have achieved extraordinary performance in many computer vision tasks. This contrasts with the traditional paradigm in computer vision, in which hand-crafted features must be designed for each specific task [44]. By using repeated blocks of neurons in the form of convolutional layers, pooling layers, and fully connected layers, CNNs can learn image feature representations and have outperformed many conventional hand-crafted feature techniques [45]. In this study, we applied a typical CNN architecture, built by stacking mainly convolutional layers, pooling layers, and fully connected layers. These layers work together to yield a conductivity prediction.
Convolutional Layer Convolutional layers are the core operation for extracting features from the input image. The kernels are initialized randomly and then learned with the back-propagation algorithm. The output of this layer is computed as:
$$y^{\ell}_{ij} = \sigma\left(x^{\ell}_{ij}\right) = \sigma\left(\sum_{a=0}^{m-1}\sum_{b=0}^{m-1} \omega_{ab}\, y^{\ell-1}_{(i+a)(j+b)}\right) \tag{3.1}$$
where $x^{\ell}_{ij}$ is the $(i, j)$-th input unit of the $\ell$-th layer, $y^{\ell-1}$ is the output of the previous layer, and $\omega_{ab}$ is the $m \times m$ convolution kernel. The convolution is thus a weighted sum over each $m \times m$ window of the previous layer's output; the bias term of the $\ell$-th layer is omitted. We used the rectified linear unit (ReLU), $\sigma(x) = \max(0, x)$, as the non-linear activation function.
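Eq. (3.1) can be checked with a direct, loop-based implementation of a single-channel layer (valid padding, stride 1, no bias). Like Eq. (3.1) and most deep learning frameworks, it computes a cross-correlation, that is, the kernel is not flipped; `conv2d_relu` is our own illustrative name, not a library function.

```python
def conv2d_relu(inputs, kernel):
    """Eq. (3.1): y[i][j] = ReLU(sum_a sum_b w[a][b] * inputs[i + a][j + b]).

    inputs is an H x W list of lists and kernel an m x m list of lists; the
    output has shape (H - m + 1) x (W - m + 1). The bias term is omitted.
    """
    m = len(kernel)
    height, width = len(inputs), len(inputs[0])
    output = []
    for i in range(height - m + 1):
        row = []
        for j in range(width - m + 1):
            x = sum(kernel[a][b] * inputs[i + a][j + b] for a in range(m) for b in range(m))
            row.append(max(0.0, x))  # ReLU activation
        output.append(row)
    return output
```

For the 3 x 3 input `[[1, 2, 3], [4, 5, 6], [7, 8, 9]]` and kernel `[[0, 1], [1, 0]]`, the output is `[[6, 8], [12, 14]]`; with kernel `[[1, 0], [0, -1]]`, every pre-activation is -4, so ReLU returns all zeros.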
Pooling Layer The pooling layer is responsible for reducing the size of the feature maps. Max-pooling and mean-pooling are the two standard methods used in pooling layers. The primary function of this layer is to reduce the dimensions of the convolutional layers' output and the number of parameters to learn, which helps prevent overfitting. In this study, we chose max-pooling for the pooling layers. Max-pooling picks only the maximum value in each region, whereas mean-pooling computes the region's arithmetic mean. Max-pooling has demonstrated faster convergence and outperformed average pooling and other variants [46].
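The difference between the two pooling methods is easy to see on a small feature map. The sketch below, with our own function names, applies non-overlapping 2 x 2 windows (stride 2) to a single-channel map.

```python
def max_pool2d(feature_map, size=2, stride=2):
    """Keep the maximum value of each size x size window."""
    height, width = len(feature_map), len(feature_map[0])
    return [
        [max(feature_map[i + a][j + b] for a in range(size) for b in range(size))
         for j in range(0, width - size + 1, stride)]
        for i in range(0, height - size + 1, stride)
    ]


def mean_pool2d(feature_map, size=2, stride=2):
    """Average the values of each size x size window."""
    height, width = len(feature_map), len(feature_map[0])
    return [
        [sum(feature_map[i + a][j + b] for a in range(size) for b in range(size)) / (size * size)
         for j in range(0, width - size + 1, stride)]
        for i in range(0, height - size + 1, stride)
    ]
```

On the 4 x 4 map `[[1, 3, 2, 0], [4, 2, 1, 1], [0, 1, 5, 6], [2, 2, 7, 8]]`, max-pooling gives `[[4, 2], [2, 8]]` and mean-pooling gives `[[2.5, 1.0], [1.25, 6.5]]`; both halve each spatial dimension.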
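Fully Connected Layer After flattening, each value in the vector gets a vote through one or more fully connected layers [47]. In this study, a softmax function (σ) is used in the output layer, defined as:
$$y^{\ell}_{i} = \sigma\left(x^{\ell}_{i}\right) = \sigma\left(\sum_{j} w^{\ell-1}_{ji}\, y^{\ell-1}_{j}\right) \tag{3.2}$$
where $x^{\ell}_{i}$ is the score from the fully connected layer, $w^{\ell-1}_{ji}$ is the weight connecting unit $j$ of layer $\ell-1$ to unit $i$ of layer $\ell$, and the softmax is $\sigma(x_i) = e^{x_i} / \sum_k e^{x_k}$. The main function of the softmax layer is to compress the scores into values between zero and one that sum to one.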
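Eq. (3.2) and the softmax it applies can be sketched in plain Python as follows. The function names are ours, and subtracting the maximum score before exponentiating is a standard numerical-stability trick that does not change the result.

```python
import math


def fully_connected(inputs, weights):
    """x_i = sum_j w[j][i] * y_j, the weighted sum inside Eq. (3.2) (bias omitted).

    weights is an n_in x n_out list of lists.
    """
    return [sum(weights[j][i] * inputs[j] for j in range(len(inputs)))
            for i in range(len(weights[0]))]


def softmax(scores):
    """sigma(x_i) = exp(x_i) / sum_k exp(x_k), computed stably."""
    shift = max(scores)
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]
```

With identity weights and inputs `[1.0, 2.0]`, the scores are `[1.0, 2.0]` and the softmax outputs are about `[0.269, 0.731]`, which lie between zero and one and sum to one.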
Tree-based model
Common machine learning algorithms such as neural networks can be trained in mini-batches, so the size of the training data is not limited by memory. GBDT, in contrast, needs to traverse the entire training data many times in each iteration. Loading all of the training data into memory limits its size, while repeatedly reading and writing it from disk is very time-consuming. Especially for industrial-scale data, a plain GBDT implementation cannot meet these requirements. XGBoost and LightGBM (Light Gradient Boosting Machine) are frameworks that implement the GBDT algorithm with the following advantages: faster training, lower memory consumption, better accuracy, and distributed support for fast processing of massive data.
XGBoost XGBoost improves on the gradient boosting machine with system-level optimizations. It adds regularization to prevent overfitting; supports column (feature) subsampling similar to random forests; can find splits either with the exact greedy algorithm or with an approximate algorithm; handles sparse data; uses column blocks for distributed learning; makes full use of the cache; and supports out-of-core computation, which uses resources beyond main memory.
LightGBM Gradient Boosting Decision Tree (GBDT) is a long-established model in machine learning. Its main idea is to iteratively train weak learners (decision trees) to obtain an optimal model; it trains effectively and is relatively resistant to overfitting. GBDT is widely used in industry, for tasks such as click-through rate prediction and search ranking, and is also a powerful tool in data mining competitions such as Kaggle.
XGBoost uses a pre-sorted decision tree algorithm, while LightGBM uses a histogram-based decision tree algorithm. The pre-sorted algorithm needs to compute the split gain for every feature value it traverses, while the histogram algorithm only needs to compute it k times per feature, where k is the number of bins and can be regarded as a constant. The time complexity of split finding is thus reduced from O(#data × #features) to O(k × #features). Instead of the level-wise tree growth strategy used by XGBoost, LightGBM uses a leaf-wise growth strategy with a depth limit. The disadvantage of leaf-wise growth is that it may grow deeper decision trees, resulting in overfitting, which the depth limit helps prevent.
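The histogram idea can be illustrated for a single feature in a squared-loss setting. This is a simplified sketch, not LightGBM's implementation: it uses equal-width bins (LightGBM constructs its bins differently), omits regularization, and treats every sample's hessian as 1, so the gain of a split is G_L^2/n_L + G_R^2/n_R - G^2/n, where G are gradient sums and n sample counts.

```python
def histogram_split(values, gradients, num_bins=4):
    """Best split for one feature using a gradient histogram.

    Returns (gain, threshold); samples with value < threshold go left.
    Returns (0.0, None) if no split improves on the unsplit node.
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / num_bins or 1.0
    grad_sum = [0.0] * num_bins
    count = [0] * num_bins
    for v, g in zip(values, gradients):  # one O(#data) pass builds the histogram
        b = min(int((v - lo) / width), num_bins - 1)
        grad_sum[b] += g
        count[b] += 1
    total_g, total_n = sum(grad_sum), len(values)
    best = (0.0, None)
    left_g, left_n = 0.0, 0
    for b in range(num_bins - 1):  # only k - 1 candidate splits are scanned
        left_g += grad_sum[b]
        left_n += count[b]
        right_n = total_n - left_n
        if left_n == 0 or right_n == 0:
            continue
        right_g = total_g - left_g
        gain = left_g ** 2 / left_n + right_g ** 2 / right_n - total_g ** 2 / total_n
        if gain > best[0]:
            best = (gain, lo + (b + 1) * width)
    return best
```

For values 1 to 8 whose gradients change sign between 4 and 5, the best of the three candidate splits falls at threshold 4.5 with gain 8.0, found without sorting or evaluating every distinct value.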
3.2 Implementation
All models were implemented in the Python programming language. Neural network models were built with the TensorFlow [48] and PyTorch [49] frameworks, and tree-based models were implemented with the Scikit-learn framework [50]. The Pandas toolkit [51] was used for data pre-processing. Because only a few studies have applied machine learning methods to battery recipe data for conductivity, we selected and examined models that may be suitable for this problem. Here we compare various neural network models and tree-based models, as well as their ensemble models. Their parameter settings are listed below.
LightGBM We tested 600 to 750 estimators and found the best performance with 720, along with a maximum tree depth of 7 and 48 leaves. The learning rate was set to 0.05. We also set the bagging and feature subsampling rates to 0.9 to avoid overfitting.
XGBoost We tested maximum tree depths from 5 to 25, with the learning rate set to 0.05. We also conducted a grid search on the maximum depth from 5 to 10 and picked 7 for the following experiments. We also set the subsample rate to 0.85 for XGBoost.
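The settings above can be written as keyword arguments for the scikit-learn-style wrappers `lightgbm.LGBMRegressor` and `xgboost.XGBRegressor`. This is a sketch of how we would express them, not the exact configuration used in this work. Setting `subsample_freq=1` is our assumption, because LightGBM only performs bagging when the bagging frequency is greater than zero; the number of XGBoost estimators is not reported, so it is left at the library default.

```python
# Reported LightGBM settings, as LGBMRegressor keyword arguments.
LGBM_PARAMS = {
    "n_estimators": 720,
    "max_depth": 7,
    "num_leaves": 48,
    "learning_rate": 0.05,
    "subsample": 0.9,         # bagging fraction
    "subsample_freq": 1,      # assumption: bagging is inactive when this is 0
    "colsample_bytree": 0.9,  # feature subsampling fraction
}

# Reported XGBoost settings, as XGBRegressor keyword arguments.
XGB_PARAMS = {
    "max_depth": 7,
    "learning_rate": 0.05,
    "subsample": 0.85,
}
```

With both libraries installed, a model would be created with, for example, `lightgbm.LGBMRegressor(**LGBM_PARAMS).fit(X_train, y_train)`, where `X_train` holds the encoded recipes and `y_train` the conductivity scores.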
Neural Network Model Since our data have a special format and pattern, we designed our neural network model accordingly. The details of the network are shown in Figure 3.1. The first multiplier layer, inspired by DeepFM [52] and the attention mechanism [53], learns feature interactions from the input features. We built a bottleneck 1D convolutional block with a skip connection. Each block contains 3 convolutional layers, each with a window size of 3. The blocks are stacked and followed by batch normalization and a dense layer. The network is optimized with Adam [54] using the MSE loss and a learning rate of 1e-3.
Figure 3.1: Deep neural network architecture (layers: Multiplier, Convolution, Dense, Convolution, Dense). Conv layer denotes the convolutional block. Concatenation operations are omitted.
3.3 Evaluation
We employed statistical metrics to evaluate our prediction models: the root mean square error (RMSE), a commonly used metric for regression tasks, and the Spearman and Pearson correlation coefficients. These metrics help us identify factors behind failed predictions and fine-tune our models during validation.
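The three metrics can be computed as sketched below, in plain Python for transparency; Spearman correlation is computed as the Pearson correlation of average ranks, which handles ties. In practice, `scipy.stats.pearsonr` and `scipy.stats.spearmanr` also return the p-values reported later in this section. The example values are toy numbers.

```python
import math


def rmse(predicted, measured):
    """Root mean square error between predictions and measurements."""
    return math.sqrt(sum((p - m) ** 2 for p, m in zip(predicted, measured)) / len(measured))


def pearson(x, y):
    """Pearson correlation: linear association between x and y."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


def ranks(values):
    """1-based average ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks (ordinal agreement)."""
    return pearson(ranks(x), ranks(y))
```

For predictions `[0.1, 0.4, 0.35, 0.8]` against measurements `[0.12, 0.38, 0.40, 0.75]`, the RMSE is about 0.038 and the Spearman correlation is 0.8, because one adjacent pair is ranked in the wrong order.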
We demonstrated the feasibility of our neural network prediction model. Table 3.1 reports conductivity on the real scale; we also evaluated it on the normalized score scale, which ranges from 0 to 1. For each of the 9 electrolytes studied, the difference between the predicted score and the corresponding measured score was no more than 0.08 (around 1e-2 on the real scale). The smallest difference was 0.01 (around 3e-7 on the real scale).
Table 3.1: Formulations generated by machine learning
with their predicted and measured conductivity scores.
Formulation # Predicted score Measurement score
1 1.12e-03 3.02e-03
2 9.55e-07 5.00e-06
3 1.20e-06 1.55e-06
4 6.02e-06 3.00e-07
5 8.29e-07 3.30e-08
6 7.41e-06 4.05e-05
7 3.23e-04 1.30e-03
8 6.87e-07 3.50e-07
9 3.79e-06 7.22e-05
Figures 3.2, 3.3, and 3.4 show that the prediction models predict conductivity with accuracy at the level needed for industrial application. LightGBM ranks highest among the three models. The Pearson correlation coefficient between LightGBM's predicted and measured conductivities is 0.92 (p-value 1.36e-17), indicating a strong linear relationship. LightGBM also performs well on Spearman correlation, at 0.96 (p-value 1.91e-64), which means the model predicts the correct ordinal (ranking) information. The root mean square error (RMSE) between predicted and measured values is also acceptable, at 0.06. We therefore adopt this prediction model and use it as the environment model in our generation model.
Figure 3.2: Scatter plot of LightGBM, with RMSE of 0.07, std of 0.003, Spearman correlation of 0.91, and Pearson correlation of 0.92.
Figure 3.3: Scatter plot of XGBoost, with RMSE of 0.08, std of 0.005, Spearman correlation of 0.89, and Pearson correlation of 0.91.
Figure 3.4: Scatter plot of our neural network model, with RMSE of 0.08, std of 0.007, Spearman correlation of 0.83, and Pearson correlation of 0.93.
Chapter 4
Generation models and experiments
4.1 Models for structural data generation
4.1.1 Formulated as an optimization problem
An optimization problem is the process of finding an extremum, a common task in data science, so the problem of finding the maximum conductivity can naturally be formulated as an optimization problem. The usual way to find an extremum is to take the derivative, that is, to optimize based on the gradient. This requires that the functional form be known, so that the derivative can be computed, and, to guarantee a global optimum, that the function be convex. In most cases, however, the problem does not satisfy these two conditions. One example is the inversion problem, which determines the parameters (or model parameters) that characterize a problem from observed results and some general principles (or models); in this case, gradient-based optimization cannot be used.
Bayesian optimization Bayesian optimization can be used to solve such inversion problems. Its advantage is that it only needs to sample the function sequentially to estimate its maximum, while keeping the number of required sampling points small. Bayesian optimization applies when the specific functional form is unknown but, given an input x, the output y can be evaluated. A surrogate model, such as Gaussian process regression, is fitted to the sampled points; with enough (x, y) pairs, the overall trend of the function becomes largely known. Bayesian optimization is especially suitable for optimization in small search spaces [55].
4.1.2 Markov Decision Process for battery recipe generation
We model the chemical reaction process for recipe generation by learning a reinforcement learning agent that performs discrete actions of small-size addition or removal in a chemistry-aware Markov Decision Process (MDP). Here we assume that the chemical reaction process for recipe generation has the Markov property. MDPs are a classical formalization of sequential decision making, in which actions influence not only immediate rewards but also subsequent states, and through them future rewards [26]. MDPs therefore involve delayed reward and the need to trade off immediate and delayed reward. They are useful for studying optimization problems, here the problem of optimizing the conductivity of recipes. We then employ reinforcement learning to solve this MDP.
Formally, the MDP $\mathcal{M} = (S, A, R)$ has three components, states, actions and rewards, defined as follows:
$S = \{S_t\}$ is the state, whose value can be any possible intermediate or final generated recipe. Each $S_t$ is a tuple of the recipe and its corresponding time step, denoted as $(s, t)$. Here we consider the case of a finite MDP, in which the sets of states, actions and rewards all have a finite number of elements. All three components are also defined in discrete time, each depending on the preceding component. The initial state $S_0$ is randomly chosen from a combination of our battery material recipe database and the recipes already generated. MDP modeling typically requires a terminal state that ends a series of states, forming an episode. We use the episodic setting by limiting the maximum number of time steps $T$ in our tabular, chemical-reaction-based MDP: after $T$ time steps the episode ends and a new episode starts.
$A = \{A_t\}$ denotes the set of actions that describe the modification made to the current state (intermediate recipe) at each time step $t$. The action space has the same dimensions as the state space; the only difference is that the modifications made by an action are usually micro-scale compared with the state. We enforce this because we want to simulate a chemical reaction environment in which each component is added gradually. Like the state space, the action space is therefore continuous, represented by a distribution over the components.
$R_t$ is the reward function that specifies the reward after reaching state $S_t$, with discount factor $\gamma$, a hyper-parameter set to 0.9 in our study. In our framework, the state is post-processed into a valid and complete composition at each step: the contents of all components must sum to 1, and every component content must be greater than or equal to 0. Note that in our virtual environment, a reward is given not just at terminal states but after each action step. Both intermediate and final rewards guide the behavior of the reinforcement learning (RL) agent, which avoids the delayed or sparse reward issue that many other reinforcement learning frameworks suffer from [33]. Furthermore, to ensure that the last state is rewarded the most, we use $\gamma$ to discount the value of the rewards at state $S_t$. In addition, our reward function considers the similarity of recipes, in order to avoid generating many repeated recipes.
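The two validity constraints and the discounted return can be made concrete in a short Python sketch. It assumes a recipe is a NumPy vector of component fractions and that the return is the usual discounted sum of per-step rewards; the function names are hypothetical and not taken from the thesis code.

```python
import numpy as np

GAMMA = 0.9  # discount factor used in this study

def is_valid_recipe(state, tol=1e-8):
    """Check both constraints: non-negative contents that sum to 1."""
    state = np.asarray(state, dtype=float)
    return bool(np.all(state >= 0.0) and abs(state.sum() - 1.0) <= tol)

def discounted_return(rewards, gamma=GAMMA):
    """G_0 = R_1 + gamma * R_2 + gamma**2 * R_3 + ... for one episode."""
    g = 0.0
    for r in reversed(rewards):  # accumulate backwards: G_t = R_{t+1} + gamma * G_{t+1}
        g = r + gamma * g
    return g
```

Because every step is rewarded, a three-step episode with a reward of 0.8 at each step yields a return of 0.8 × 2.71 = 2.168, while a sparse episode rewarded only at the end, [0, 0, 0.8], yields 0.9² × 0.8 = 0.648.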
Reinforcement learning
To solve an MDP, conventional approaches such as Dynamic Programming (DP) and Monte Carlo methods exploit the iterative nature of the problem. Reinforcement learning is also an iterative process in which each iteration solves two problems: evaluating the value function of a given policy, and updating the policy according to that value function. Reinforcement learning methods can be regarded as achieving an effect similar to DP while weakening the assumption of a known, accurate environment model, or while requiring less computation. DP is generally used for finite MDP problems, where the sets of states, actions and rewards are finite. For problems with continuous state and action spaces, exact optimal solutions can be obtained only in special cases.
DP-based methods require an environment model, while Monte Carlo and TD (Temporal Difference) methods do not. The former are called model-based methods [26] and use a model of the environment for planning, while the latter are model-free methods, which learn from the experience of directly interacting with the environment. When a model is used to improve the policy, the biggest benefit of modeling the environment is that it makes better use of prior knowledge and improves learning efficiency. Model-free methods do not try to learn the environment dynamics and reward function, which saves computation and memory that can be spent on more trials.
Reinforcement learning algorithms can be divided into three categories: value-based, policy-based and actor-critic methods. A commonly used value-based algorithm, DQN, has only a value function network and no policy network. The actor-critic family, which includes the Asynchronous Advantage Actor-Critic framework (A3C) [56], is represented here by DDPG (Deep Deterministic Policy Gradient), which has both a value function network (critic) and a policy network (actor). Like DQN, DDPG is model-free and off-policy and uses deep neural networks for function approximation. However, vanilla DQN can only solve problems with discrete, low-dimensional action spaces, whereas DDPG can solve problems with continuous action spaces by explicitly modeling the action policy.
Environment model
In our RL framework, the chemical environment receives an action $A_t$ from the agent and returns a scalar reward $R_t$ and the next state $S_{t+1}$ to the agent. We define the state $S_t$ as the intermediate generated tabular recipe at time step $t$, which is fully observable by the RL agent. For the task of battery recipe generation, the environment incorporates rules from domain knowledge. Our environment must therefore determine the state transition from $S_t$ to $S_{t+1}$ and evaluate each action to obtain its reward.
Our environment mimics the chemical experiment process, which allows components to be added gradually while the current status of the battery solution is monitored. We therefore model the state transition as a simple addition: when a new action arrives, the new state is the current state plus the action. Noise is included to account for operational errors.
The reward is calculated by an external model, namely the conductivity prediction model built earlier. As discussed in the last chapter, we use our trained LightGBM model as the environment model hereinafter.
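The environment described above can be sketched as a small Python class with a reset/step interface. This is a minimal sketch, not the thesis implementation: `predictor` stands for any callable that maps a recipe vector to a score (for example, a wrapper around the trained LightGBM model), and the noise level `noise_std`, the episode length `max_steps` (T) and the clip-and-rescale post-processing are assumptions.

```python
import numpy as np

class RecipeEnv:
    """Hypothetical recipe environment: additive transitions, predictor-based reward."""

    def __init__(self, predictor, initial_recipes, max_steps=16, noise_std=1e-3, seed=0):
        self.predictor = predictor            # callable: recipe vector -> score
        self.initial_recipes = np.asarray(initial_recipes, dtype=float)
        self.max_steps = max_steps            # episode length T (assumed value)
        self.noise_std = noise_std            # scale of the operation-error noise
        self.rng = np.random.default_rng(seed)

    @staticmethod
    def _postprocess(state):
        # Enforce a valid composition: non-negative contents that sum to 1.
        state = np.clip(state, 0.0, None)
        total = state.sum()
        if total <= 0.0:
            raise ValueError("recipe has no positive component")
        return state / total

    def reset(self):
        # S_0 is drawn at random from the pool of known and generated recipes.
        idx = self.rng.integers(len(self.initial_recipes))
        self.t = 0
        self.state = self._postprocess(self.initial_recipes[idx])
        return self.state

    def step(self, action):
        # Transition: S_{t+1} = S_t + A_t + noise, then post-process.
        noise = self.rng.normal(0.0, self.noise_std, size=self.state.shape)
        self.state = self._postprocess(self.state + np.asarray(action, dtype=float) + noise)
        self.t += 1
        reward = float(self.predictor(self.state))  # a reward after every step
        done = self.t >= self.max_steps             # the episode ends after T steps
        return self.state, reward, done
```

The reset/step interface mirrors common RL toolkits, so the agent code does not need to know that the reward comes from a surrogate predictor rather than a wet-lab measurement.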
Reward design Rather than focusing only on the diversity of recipes, we explore the possibility of generating novel recipes based on the existing knowledge base. We designed a reward function that combines the final property score, namely the conductivity score, with other constraints:
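$$\mathrm{Rew} = \omega(s_t) + \alpha \frac{1}{\sqrt{2\pi} \times 8.2} \, e^{-\frac{1}{2}\left(\frac{\mathrm{temp} - 25}{\sigma}\right)^2} \tag{4.1}$$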
where $\omega(s_t)$ represents the prediction model and $\sigma$ sets the width of the Gaussian temperature term. We include temperature as a constraint because, according to domain knowledge, temperatures that are too high or too low are unfavourable. The model therefore generates more room-temperature recipes while preserving some probability for other temperatures, controlled by the weight $\alpha$.
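The reward formula above translates directly into Python. The sketch assumes that $\sigma = 8.2$ (the constant that appears in the normalization factor), that temperature is in °C, and that $\alpha = 1$ by default; the thesis does not report the value of $\alpha$. `predicted_score` stands for the output of the conductivity predictor $\omega(s_t)$.

```python
import math

SIGMA = 8.2       # assumed width of the temperature term (the 8.2 in the reward formula)
ROOM_TEMP = 25.0  # preferred temperature in degrees C

def temperature_term(temp, sigma=SIGMA, center=ROOM_TEMP):
    """Gaussian density centred at room temperature."""
    z = (temp - center) / sigma
    return math.exp(-0.5 * z * z) / (math.sqrt(2.0 * math.pi) * sigma)

def reward(predicted_score, temp, alpha=1.0):
    """Predicted conductivity score omega(s_t) plus the weighted temperature term."""
    return predicted_score + alpha * temperature_term(temp)
```

The term is symmetric around 25 °C and never negative, so off-room-temperature recipes are not forbidden, only given a smaller bonus; a larger `alpha` strengthens the preference.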
Model design
Figure 4.1: Framework of DDPG model in our approach. [Diagram: actor (online and target policy nets) and critic (online and target critic nets); transitions (s, a, r, s') are stored in a replay buffer; the environment is the conductivity predictor.]
Our model is built on DDPG. The inputs of the critic network are the action and the observation, and its output is the value function estimate $Q(s, a)$. In addition, a neural network known as the actor network approximates the policy function; its input is the observation $s$ and its output is the action $a$. The critic and actor networks are represented as $Q(s, a; \omega)$ and $a = \pi(s; \theta)$, where $\omega$ and $\theta$ denote their parameters. DDPG also uses target copies of both networks, which are updated more slowly than the online networks, to help the parameters converge. The whole network architecture is shown in Fig. 4.1.
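The interfaces of the two networks and the slow target update can be illustrated with a NumPy-only forward pass. This is a sketch under stated assumptions: one hidden ReLU layer of size 512, a softmax output so that the actor's action reads as a distribution over the 102 components, and a soft (Polyak) update rate `tau` chosen for illustration. Training these networks would in practice use a framework with automatic differentiation.

```python
import numpy as np

STATE_DIM = ACTION_DIM = 102  # fixed recipe dimension
HIDDEN = 512                  # hidden size reported in the training settings

def init_mlp(sizes, rng):
    """List of (W, b) pairs for a fully connected network with the given layer sizes."""
    return [(rng.normal(0.0, 1.0 / np.sqrt(m), size=(m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def mlp_forward(params, x):
    """ReLU hidden layers followed by a linear output layer."""
    for W, b in params[:-1]:
        x = np.maximum(x @ W + b, 0.0)
    W, b = params[-1]
    return x @ W + b

def actor(params, s):
    """a = pi(s; theta); the softmax output (an assumption) gives a component distribution."""
    logits = mlp_forward(params, s)
    logits = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(logits)
    return e / e.sum(axis=-1, keepdims=True)

def critic(params, s, a):
    """Q(s, a; omega): one scalar per (observation, action) pair."""
    return mlp_forward(params, np.concatenate([s, a], axis=-1))[..., 0]

def soft_update(target, online, tau=0.005):
    """Slow target update: theta_minus <- tau * theta + (1 - tau) * theta_minus."""
    return [(tau * W + (1.0 - tau) * Wt, tau * b + (1.0 - tau) * bt)
            for (W, b), (Wt, bt) in zip(online, target)]

rng = np.random.default_rng(0)
actor_params = init_mlp([STATE_DIM, HIDDEN, ACTION_DIM], rng)
critic_params = init_mlp([STATE_DIM + ACTION_DIM, HIDDEN, 1], rng)
actor_target = [(W.copy(), b.copy()) for W, b in actor_params]  # target starts as a copy
```

Because the target networks move only a fraction `tau` toward the online networks at each update, the bootstrapped critic target changes slowly, which is the stabilizing effect described above.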
The two networks interact as follows. First, the environment gives an observation; the agent decides on an action based on the output of the actor network (with noise added to the action); the environment receives the action and returns a reward $R$ and a new observation. This process is called a time step of an iteration. At this point we update the critic network according to the reward $R$, and then update the actor network in the direction suggested by the critic. We then move on to the next step and keep iterating until we have trained a good actor network. The goal of recipe generation is thus equivalent to fitting a Q function such that the agent generates an action $a_t$ at state $s_t$ that maximizes the expected future cumulative reward under the policy $a = \pi_\theta(s)$.
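The critic network is used for value function approximation and is updated using gradient descent. Note that both the actor and the critic use target networks in the target

$$\mathrm{target}_t = R_{t+1} + \gamma Q\left(S_{t+1}, \pi(S_{t+1}; \theta^-); \omega^-\right) \tag{4.2}$$

and in the loss function

$$\mathrm{Loss} = \frac{1}{N} \sum_{t=1}^{N} \left(\mathrm{target}_t - Q(S_t, a_t; \omega)\right)^2 \tag{4.3}$$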
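For a sampled mini-batch, Eqs. (4.2) and (4.3) reduce to a few array operations. In this sketch, `q_next_target` holds the target critic's values $Q(S_{t+1}, \pi(S_{t+1}; \theta^-); \omega^-)$ already computed for each transition, and the bootstrap term is dropped for terminal transitions; that terminal handling is a common convention assumed here, not something the thesis states.

```python
import numpy as np

def td_targets(rewards, q_next_target, dones, gamma=0.9):
    """Eq. (4.2) per transition; no bootstrap term when the episode has ended."""
    rewards = np.asarray(rewards, dtype=float)
    q_next_target = np.asarray(q_next_target, dtype=float)
    not_done = 1.0 - np.asarray(dones, dtype=float)
    return rewards + gamma * not_done * q_next_target

def critic_loss(targets, q_pred):
    """Eq. (4.3): mean squared error over the N sampled transitions."""
    diff = np.asarray(targets, dtype=float) - np.asarray(q_pred, dtype=float)
    return float(np.mean(diff ** 2))
```

For instance, two transitions with rewards 1.0 and 0.5, target values 2.0 and 4.0, and the second one terminal give targets 1 + 0.9 × 2.0 = 2.8 and 0.5.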
To evaluate the policy, we need an objective, the policy objective function $J(\theta)$, and we want the policy parameters $\theta$ that optimize it. Differentiating $J(\theta)$ gives the policy gradient $\nabla_\theta J(\theta)$: the policy parameters should be updated in the direction that increases the value function. The Deterministic Policy Gradient Theorem [57] provides a method for updating a deterministic policy. Given the agent's policy $\pi$ and the TD error $\delta$, the critic parameters $\omega$ and the actor parameters $\theta$ are updated as

$$\delta_t = R_{t+1} + \gamma Q\left(S_{t+1}, \pi_\theta(S_{t+1}); \omega\right) - Q(S_t, a_t; \omega) \tag{4.4}$$

$$\omega_{t+1} = \omega_t + \alpha_\omega \, \delta_t \, \nabla_\omega Q(S_t, a_t; \omega) \tag{4.5}$$

$$\theta_{t+1} = \theta_t + \alpha_\theta \, \nabla_\theta \pi_\theta(S_t) \, \nabla_a Q(S_t, a; \omega)\big|_{a = \pi_\theta(S_t)} \tag{4.6}$$

where $\alpha_\omega$ and $\alpha_\theta$ are learning rates, hyper-parameters that control the scale of the updates. The TD error $\delta$ records the difference at the last time step, and this value is kept for the next time step's update.
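A scalar toy example makes the chain rule in Eq. (4.6) explicit. The sketch assumes a hypothetical linear critic $Q(s, a; \omega) = \omega \cdot \phi(s, a)$ with features $\phi(s, a) = (s, a, -a^2)$ and a hypothetical linear policy $\pi_\theta(s) = \theta s$, so that $\nabla_\omega Q = \phi$, $\nabla_\theta \pi_\theta = s$ and $\nabla_a Q = \omega_1 - 2\omega_2 a$. The learning rates are illustrative. In DDPG both functions are neural networks and these gradients come from automatic differentiation.

```python
import numpy as np

def features(s, a):
    """Hypothetical critic features phi(s, a); Q(s, a; omega) = omega . phi(s, a)."""
    return np.array([s, a, -a * a])

def q_value(omega, s, a):
    return float(omega @ features(s, a))

def dq_da(omega, a):
    """dQ/da for these features: omega[1] - 2 * omega[2] * a."""
    return omega[1] - 2.0 * omega[2] * a

def policy(theta, s):
    """Hypothetical linear deterministic policy pi_theta(s) = theta * s."""
    return theta * s

def dpg_step(omega, theta, s, a, r, s_next, gamma=0.9, lr_omega=0.1, lr_theta=0.01):
    """One update following Eqs. (4.4)-(4.6), using the pre-update omega throughout."""
    # (4.4) TD error, bootstrapping with the current policy's next action
    delta = r + gamma * q_value(omega, s_next, policy(theta, s_next)) - q_value(omega, s, a)
    # (4.5) critic step: grad_omega Q(s, a; omega) = phi(s, a)
    new_omega = omega + lr_omega * delta * features(s, a)
    # (4.6) actor step: grad_theta pi = s, and dQ/da is evaluated at a = pi_theta(s)
    new_theta = theta + lr_theta * s * dq_da(omega, policy(theta, s))
    return new_omega, new_theta, delta
```

With $\omega = (0.1, 0.2, 0.3)$, $\theta = 0.4$, $s = 1$, $a = 0.5$, $r = 1$ and $s' = 2$, the TD error is $\delta = 1 + 0.9 \times 0.168 - 0.125 = 1.0262$, and the actor moves slightly because $\nabla_a Q = -0.04$ at $a = \pi_\theta(s) = 0.4$.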
4.2 Model training
4.2.1 Bayesian optimization setting
We implement Bayesian optimization in this project using the bayes_opt toolkit [58]. The parameter search range is set from 0.10 to 0.99. The likelihood function is the default Gaussian. The number of initial points is 400, and the number of iterations is set to 500.
4.2.2 Training Reinforcement Learning model
Model-environment relations
We use experience simulated by a model in place of actual experience in the learning method. In our approach, we build an environment model, but only to simulate the real chemical reaction in the battery. Our environment model sits outside the Reinforcement Learning (RL) model and tries to mimic the outcome of a true wet-lab chemical reaction. Note that in a model-based method the agent can predict which action would be more worthwhile to take, whereas our simulation environment model only calculates the reward of each step. Our model is therefore classified as a model-free method, while the simulation model helps the agent perform better.
Adoption of RL
For the actor network, the output dimension is the dimension of our target artifact, the recipe. For the critic network, the output is 1-dimensional: for each sample, a corresponding y (conductivity) estimate is produced. Their inputs have the fixed recipe dimension (102-D).
We found that the output of the actor network is a dense matrix, whereas in actual experiments we want the generated matrix to be sparse, that is, only a few dimensions have values and the rest are 0. We achieve sparsity by max-out: for each domain, we keep only the largest 1-2 dimensions and set the others to 0. In practice, we implemented the max-out method in two places: in the actor network and in the environment model. For the former, we extended and rewrote the final activation function of the actor network so that argmax is computed in both the forward and backward passes (argmax is not differentiable, so backpropagation here needs to explicitly compute the mean). For the latter, we explicitly check whether the dimensions of the state meet our expectations when applying an action to the state.
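The forward pass of the per-domain max-out can be sketched in NumPy as follows. The thesis does not specify how the 102 dimensions are grouped into domains, so `domains` is a hypothetical input listing the column indices of each domain; entries that belong to no domain are set to 0. The backward-pass treatment described above is not reproduced here.

```python
import numpy as np

def max_out(vector, domains, k=2):
    """Keep the k largest entries inside each domain and zero everything else."""
    vector = np.asarray(vector, dtype=float)
    out = np.zeros_like(vector)
    for idx in domains:
        idx = np.asarray(idx)
        keep = idx[np.argsort(vector[idx])[::-1][:k]]  # indices of the k largest values
        out[keep] = vector[keep]
    return out
```

With `k=1` this reduces to a per-domain argmax; `k=2` corresponds to keeping the largest two dimensions of each domain.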
Another case that requires post-processing is the dense vector output by the actor network. If the dense vector is nearly evenly distributed, it is especially hard for the argmax operation to determine which chemical should be selected from the 102-D vector. In other words, if two elements of the output vector differ only slightly, argmax is likely to pick the wrong one. Besides, by using argmax we already assume that the elements of the generated vector denote the probabilities of each position's chemical; however, after the argmax operation, the wt% of the chemicals is also obtained from the same elements. Our current solution is to use the same highest elements kept by argmax for both probability and wt%, and to normalize them.
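The normalization step, together with a simple check for the near-tie case discussed above, might look like the following. This is a sketch assuming the kept entries are normalized over the whole recipe to sum to 100 wt%; the near-tie check and its `margin` are hypothetical diagnostics, not part of the thesis pipeline.

```python
import numpy as np

def to_weight_percent(kept):
    """Rescale the kept (non-zero) entries so that they sum to 100 wt%."""
    kept = np.asarray(kept, dtype=float)
    total = kept.sum()
    if total <= 0:
        raise ValueError("no component was kept")
    return 100.0 * kept / total

def ambiguous_argmax(vector, margin=0.01):
    """Flag a near-tie between the two largest entries (hypothetical check)."""
    top2 = np.sort(np.asarray(vector, dtype=float))[-2:]
    return bool(top2[1] - top2[0] < margin)
```

Flagged vectors could then be resampled or discarded instead of trusting an argmax that is decided by a very small margin.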
Training setting
The parameters of the RL model are three-fold: environment-related parameters, critic-model-related parameters and actor-model-related parameters. We generally follow the plain DDPG settings but adjust the feature-related parts. Our action and state spaces are numerical spaces of the same dimension. Our environment and problem define a continuing task, so the end condition of the iterations has to be set explicitly (we tested values from 1 to 64).
We applied a learning rate of 1e-4 to both the actor and critic networks, with the batch size set to 32 and the hidden size set to 512. The critic network is optimized with the MSE loss. As in the original version of DDPG, the memory (replay buffer) and noise mechanisms are also included, with a memory capacity of 5000. The default temperature is set to 25 °C.
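The reported settings and the memory mechanism can be collected in a short Python sketch. The field names, the `(s, a, r, s_next, done)` tuple layout and uniform sampling are assumptions in the style of plain DDPG; the numeric values are the ones reported above, plus the discount factor of 0.9 from the MDP definition.

```python
import random
from collections import deque
from dataclasses import dataclass

@dataclass
class TrainConfig:
    actor_lr: float = 1e-4
    critic_lr: float = 1e-4
    batch_size: int = 32
    hidden_size: int = 512
    memory_capacity: int = 5000
    gamma: float = 0.9           # discount factor from the MDP definition
    default_temp: float = 25.0   # degrees C

class ReplayBuffer:
    """Fixed-size memory of transitions; the oldest entries are evicted first."""

    def __init__(self, capacity):
        self.memory = deque(maxlen=capacity)

    def push(self, s, a, r, s_next, done):
        self.memory.append((s, a, r, s_next, done))

    def sample(self, batch_size):
        # Uniform sampling without replacement from the stored transitions.
        return random.sample(list(self.memory), batch_size)

    def __len__(self):
        return len(self.memory)
```

Sampling mini-batches from this memory, rather than training on consecutive steps, breaks the correlation between successive transitions of the same episode.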
4.3 Evaluation
After harvesting the generated recipes, we still have to post-process them. Common processing steps, such as availability filtering and ranking selection, are shared among the generation models we used.
Figure 4.2: Line plot: conductivity of generated recipes versus iterations. Bar plot: number of samples in each conductivity bin.
For Bayesian optimization, we need to take care of the distribution of the vectors. Because Bayesian optimization is prone to getting stuck in local optima, the results may end up clustered, and the vector distributions would then also be similar. We observed that the output vectors sometimes look similar; after a distribution check, we discard these recipes.
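One way to implement such a distribution check is a greedy near-duplicate filter. The sketch below assumes cosine similarity between recipe vectors as the similarity measure and an illustrative threshold of 0.99; the thesis does not specify which check it uses.

```python
import numpy as np

def drop_near_duplicates(recipes, threshold=0.99):
    """Keep a recipe only if its cosine similarity to every kept recipe is below threshold."""
    recipes = np.asarray(recipes, dtype=float)
    units = recipes / np.linalg.norm(recipes, axis=1, keepdims=True)
    kept = []
    for i, u in enumerate(units):
        if all(float(u @ units[j]) < threshold for j in kept):
            kept.append(i)
    return recipes[kept]
```

The filter is greedy, so the first recipe of each cluster survives; sorting the input by predicted conductivity beforehand would keep the best-scoring member of each cluster instead.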
For RL, as discussed in the reward design part, our reward function is devised with recipe conductivity thresholds (e.g. a score above 0.8), where the score is predicted by our environment model. In each episode, 2 recipes were automatically generated, and they all matched the defined rules when using the model without pre-training. Then, as in the common filtering and selection steps, we applied our filter to keep only the available recipes. Finally, these recipes are ranked by their estimated conductivities, and the top recipes are verified by wet-lab experiments.
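The filtering and ranking steps amount to a threshold, an availability test and a sort. In this sketch, `is_available` is a hypothetical callable standing in for the availability filter, and `top_k` is an illustrative number of recipes sent to the wet lab.

```python
def select_candidates(recipes, scores, is_available, threshold=0.8, top_k=5):
    """Keep available recipes scoring above the threshold, ranked from high to low."""
    pool = [(s, r) for r, s in zip(recipes, scores) if s > threshold and is_available(r)]
    pool.sort(key=lambda pair: pair[0], reverse=True)
    return [r for _, r in pool[:top_k]]
```

For example, with scores 0.9, 0.7, 0.95 and 0.85 for recipes A to D, and D marked unavailable, the selection is C followed by A.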
First, we want to verify that the RL model works properly over iterations. Figure 4.2 shows the average conductivity per 100 epochs during model training. The curve fluctuates throughout training and shows no sign of convergence even after 400 epochs; the fluctuation may come from the frequent restarts of episodes. In addition, the figure shows a histogram-like bar plot (red), from which we can see that most generated recipes have a score of around 0.8, and that the numbers of higher- and lower-scoring recipes are similar. Based on this evidence, we consider that the model works on our problem.
Figure 4.3 and Figure 4.4 show representative predicted normalized ionic conductivities of polymer electrolytes generated by the two models. The X axis represents the electrolyte sample number, and the Y axis represents the normalized ionic conductivity. The reinforcement learning model generates and optimizes formulations (i.e. recipes) with ionic conductivity higher than the maximum conductivity represented in the training data (i.e. normalized conductivity greater than 1, or ionic conductivity as high as 3.7 × 10⁻² S/cm). By comparison, our RL model is preferable as a recipe generator.
In addition, we studied how to control the differences among the generated recipes, as shown in Figure 4.5. Inspired by the earth mover's distance [59] between distributions, we also include this mechanism in our loss function for comparison. The standard deviation (std) over iterations indicates that our model did learn from the data while keeping a low generative std.
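For one-dimensional samples, the earth mover's distance has a simple closed form, which the sketch below uses. How the thesis applies the distance inside its loss is not specified; here it only compares two hypothetical samples of equal size, such as the predicted scores of a generated batch and of database recipes.

```python
import numpy as np

def emd_1d(u, v):
    """Earth mover's distance between two equal-size 1-D empirical samples.

    With equal sample sizes and equal mass per point, the optimal transport
    plan matches the sorted values, so the distance is the mean absolute gap.
    """
    u = np.sort(np.asarray(u, dtype=float))
    v = np.sort(np.asarray(v, dtype=float))
    if u.shape != v.shape:
        raise ValueError("this sketch assumes equal sample sizes")
    return float(np.mean(np.abs(u - v)))
```

Shifting every value of a sample by a constant c moves the distance by exactly c, which makes the measure easy to interpret as "how far, on average, the mass must move".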
A limitation of the generative model is the rate at which it generates satisfactory recipes; sometimes many iterations are needed to obtain a recipe worth testing. We can draw a line at y = 1 to see whether a prediction exceeds the maximum score in the database. In this case, the predictor, acting as the environment, is the bottleneck of the generative model. We will devise a metric for this and tune it in the future.
Figure 4.3: Line chart of the generated recipes' transformed conductivity score (Y-axis) with respect to iteration time step (X-axis). Points above the line y = 1 are conductivities larger than the conductivity of every existing recipe in our database.
Figure 4.4: Line chart of the generated recipes' transformed conductivity score (Y-axis) with respect to iteration time step (X-axis). There are no points above the line y = 1; in this situation, the RL model outperformed the Bayesian optimization method.
Figure 4.5: Standard deviation of generated recipes. The std decreases slowly because the generator learns the environment; our model successfully controlled the std.
Chapter 5
Summary
We developed a transformative machine learning framework, MARS, to accelerate advanced material R&D. The framework comprises a machine learning predictor that predicts an objective function from a recipe, and a reinforcement learning model that generates multiple different proposed recipes of battery materials that optimize the objective function. This framework highlights generative machine learning models that use structured input data.
The whole process of our proposed framework includes: predicting with a machine learning model (LightGBM); generating, with a reinforcement learning model (DDPG), multiple different proposed recipes of battery materials by optimizing conductivity and other objective functions based on given recipes; and preparing an instance of at least one of the proposed recipes via a robotic preparation module. The framework is further integrated with a high-throughput robotic platform. Instances of the proposed recipes are prepared and deposited into an electrochemical module by a robotic preparation module. A robotic testing module executes multiple formulation characteristic tests on each deposited recipe instance and updates the database and the machine learning model.
Based on the results of the RL model, we believe that our approach of combining AI and machine learning with robotic high-throughput automation will greatly reduce the cost and time to market for new and improved materials. It can potentially cut the discovery time for new material solutions by a factor of 10, from 10-20 years down to 1-2 years.
We assumed that the tabular information can be accessed directly by the machine learning model, and our experiments on the prediction model support this assumption. Other information, such as chemical formulas in SMILES format, would also help the model. The generated recipes focus only on conductivity; other objectives have not been included yet. We plan to follow the experience of the conductivity project: predicting first, then generating with the extra information. In the future, when multiple objective functions are considered, we can combine them into a single weighted total objective function so that the overall objective is optimized with the same search algorithm. In other cases, a certain objective function may be used as a constraint, for example, when searching for the optimal ionic conductivity given a certain range of Young's modulus. In this case, the search will reject solutions that do not satisfy the constraints.
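Both strategies are easy to express in code. The sketch below is illustrative only: the property names, the weights and the Young's modulus range are hypothetical, and a rejected candidate is signalled by returning None.

```python
def total_objective(objectives, weights):
    """Weighted sum of the objective scores named in `weights`."""
    return sum(w * objectives[name] for name, w in weights.items())

def constrained_score(objectives, weights, constraints):
    """Reject (return None) candidates outside any constraint range, else score them."""
    for name, (low, high) in constraints.items():
        if not low <= objectives[name] <= high:
            return None
    return total_objective(objectives, weights)
```

A hypothetical call such as `constrained_score({"conductivity": 0.9, "youngs_modulus": 1.5}, {"conductivity": 1.0}, {"youngs_modulus": (1.0, 2.0)})` optimizes conductivity alone while treating Young's modulus purely as a constraint; the same search algorithm can consume either score.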
In addition, after a certain number of iterations, we noticed that all the models show little improvement, or even a decline. Researchers working on generative models have recently raised the concern that a generative model may learn to hack the environment model. This phenomenon may therefore be caused by the generative model fooling the predictor, which would also make the generated recipes similar to each other. Our t-SNE analysis also indicates that the recipes are grouped in clusters, while their scores do not follow any obvious pattern. Investigating the interactions between environments and generative models will be our future work.
Bibliography
[1] Wencong Lu, Ruijuan Xiao, Jiong Yang, Hong Li, and Wenqing Zhang. Data
mining-aided materials discovery and optimization. Journal of Materiomics,
3(3):191–201, 2017. High-throughput Experimental and Modeling Research
toward Advanced Batteries.
[2] Propagator Ventures. Why we invested in kebotix, 2018. [Online; posted 8-Nov-
2018].
[3] Christophe Pillot. The rechargeable battery market and main trends 2016–2025.
2017.
[4] Cormac Toher, Jose Plata, Ohad Levy, Maarten Jong, Mark Asta, Marco Buon-
giorno Nardelli, and Stefano Curtarolo. High-throughput computational screen-
ing of thermal conductivity, debye temperature, and gruneisen parameter using
a quasiharmonic debye model. Physical Review B, 90, 11 2014.
[5] Yuan Dong, Chuhan Wu, Chi Zhang, Yingda Liu, Jianlin Cheng, and Jian Lin.
Bandgap prediction by deep learning in configurationally hybridized graphene
and boron nitride. npj Computational Materials, 5:26, 02 2019.
[6] Rama Vasudevan, Kamal Choudhary, Apurva Mehta, Ryan Smith, Gilad Kusne, Francesca Tavazza, Lukas Vlcek, Maxim Ziatdinov, Sergei Kalinin, and Jason Hattrick-Simpers. Materials science in the artificial intelligence age: high-throughput library generation, machine learning, and a pathway from correlations to the underpinning physics. MRS Communications, 9:1–18, 07 2019.
[7] Suriani Ibrahim and Mohd Johan. Conductivity, thermal and neural network
model nanocomposite solid polymer electrolyte s lipf 6 ). International Journal
of Electrochemical Science, 6, 11 2011.
[8] Fang Ren, Logan Ward, Travis Williams, Kevin Laws, Christopher Wolverton,
Jason Hattrick-Simpers, and Apurva Mehta. Accelerated discovery of metallic
glasses through iteration of machine learning and high-throughput experiments.
Science Advances, 4:eaaq1566, 04 2018.
[9] Krishna Rajan. Combinatorial materials sciences: Experimental strategies for
accelerated knowledge discovery. Annual Review of Materials Research, 38:299–
322, 08 2008.
[10] Wildcat Discovery Technologies. Wildcat Discovery Technologies Discloses Fundamental Advances in Rechargeable Battery Materials Technology, 03 2011. https://www.businesswire.com/news/home/20110314005427/en/wildcat-discovery-technologies-discloses-fundamental-advances-rechargeable.
[11] Xiao Wan, Wentao Feng, Yunpeng Wang, Haidong Wang, Xing Zhang,
Chengcheng Deng, and Nuo Yang. Materials discovery and properties prediction
in thermal transport via materials informatics: A mini review. Nano Letters,
19(6):3387–3395, 2019. PMID: 31090428.
[12] Gerbrand Ceder. Opportunities and challenges for first-principles materials de-
sign and applications to li battery materials. MRS bulletin, 35(9):693–701, 2010.
[13] Ao Huang, Yanzhu Huo, Juan Yang, and Guangqiang Li. Computational simu-
lation and prediction on electrical conductivity of oxide-based melts by big data
mining. Materials, 12, 2019.
[14] Tulay Ekemen Keskin, Emre Ozler, Emrah Sander, Muharrem Dugenci, and
Mohammed Ahmed. Prediction of electrical conductivity using ann and mlr: a
case study from turkey. Acta Geophysica, 05 2020.
[15] Antonio Lavecchia. Machine-learning approaches in drug discovery: methods
and applications. Drug Discovery Today, 20(3):318–331, March 2015.
[16] Bowen Tang, Fengming He, Dongpeng Liu, Meijuan Fang, Zhen Wu, and Dong
Xu. Ai-aided design of novel targeted covalent inhibitors against sars-cov-2.
bioRxiv, 2020.
[17] Rafael Gómez-Bombarelli, Jennifer N. Wei, David Duvenaud, José Miguel Hernández-Lobato, Benjamín Sánchez-Lengeling, Dennis Sheberla, Jorge Aguilera-Iparraguirre, Timothy D. Hirzel, Ryan P. Adams, and Alán Aspuru-Guzik. Automatic chemical design using a data-driven continuous representation of molecules. ACS Central Science, 4(2):268–276, January 2018.
[18] Arpan Kar. Machine learning applications in supply chain management. CII Con-
ference on E2E TrimodalSupply chain: Envisioning Collaborative, Cost Centric,
Digital Cognitive Supply Chain, 07 2016.
[19] Kan Hatakeyama-Sato, Toshiki Tezuka, Momoka Umeki, and Kenichi Oyaizu.
Ai-assisted exploration of superionic glass-type li+ conductors with aromatic
structures. Journal of the American Chemical Society, 142(7):3301–3305, 2020.
PMID: 31939282.
[20] M. L. Green, C. L. Choi, J. R. Hattrick-Simpers, A. M. Joshi, I. Takeuchi, S. C. Barron, E. Campo, T. Chiang, S. Empedocles, J. M. Gregoire, A. G. Kusne, J. Martin, A. Mehta, K. Persson, Z. Trautt, J. Van Duren, and A. Zakutayev. Fulfilling the promise of the materials genome initiative with high-throughput experimental methodologies. Applied Physics Reviews, 4(1):011105, March 2017.
[21] Juan J. de Pablo, Nicholas E. Jackson, Michael A. Webb, Long-Qing Chen,
Joel E. Moore, Dane Morgan, Ryan Jacobs, Tresa Pollock, Darrell G. Schlom,
Eric S. Toberer, James Analytis, Ismaila Dabo, Dean M. DeLongchamp, Gre-
gory A. Fiete, Gregory M. Grason, Geoffroy Hautier, Yifei Mo, Krishna Rajan,
Evan J. Reed, Efrain Rodriguez, Vladan Stevanovic, Jin Suntivich, Katsuyo
Thornton, and Ji-Cheng Zhao. New frontiers for the materials genome initiative.
npj Computational Materials, 5(1), April 2019.
[22] Tianyu Liu, Kexiang Wang, Lei Sha, Baobao Chang, and Zhifang Sui. Table-
to-text generation by structure-aware seq2seq learning. In Thirty-Second AAAI
Conference on Artificial Intelligence, 2018.
[23] Nansi Xue, Wenbo Du, Amit Gupta, Wei Shyy, Ann Sastry, and Joaquim Mar-
tins. Optimization of a single lithium-ion battery cell with a gradient-based
algorithm. Journal of The Electrochemical Society, 160:A1071–A1078, 05 2013.
[24] Diederik P Kingma and Max Welling. Auto-encoding variational bayes, 2013.
[25] Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-
Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adver-
sarial nets. In Z. Ghahramani, M. Welling, C. Cortes, N. D. Lawrence, and
K. Q. Weinberger, editors, Advances in Neural Information Processing Systems
27, pages 2672–2680. Curran Associates, Inc., 2014.
[26] Richard S. Sutton and Andrew G. Barto. Introduction to Reinforcement Learn-
ing. MIT Press, Cambridge, MA, USA, 1st edition, 1998.
[27] RE l Perez and K Behdinan. Particle swarm approach for structural design
optimization. Computers & Structures, 85(19-20):1579–1588, 2007.
[28] C. Zhang and Y. Peng. Stacking vae and gan for context-aware text-to-image
generation. In 2018 IEEE Fourth International Conference on Multimedia Big
Data (BigMM), pages 1–5, 2018.
[29] Steffen Holldobler, Sibylle Mohle, and Anna Tigunova. Lessons learned from
alphago. 06 2017.
[30] kaggle, an online community of data scientists and machine learning practition-
ers. https://www.kaggle.com.
[31] mlcourse.ai. mlcourse.ai – open machine learning course, feature engineering and feature selection. https://mlcourse.ai.
[32] Piotr Gromski, Alon Henson, Jaroslaw Granda, and Leroy Cronin. How to ex-
plore chemical space using algorithms and automation. Nature Reviews Chem-
istry, 3, 01 2019.
[33] Mariya Popova, Olexandr Isayev, and Alexander Tropsha. Deep reinforcement
learning for de novo drug design. Science Advances, 4(7):eaap7885, July 2018.
[34] Kyle Banker. MongoDB in Action. Manning Publications Co., USA, 2011.
[35] Andrius Vabalas, Emma Gowen, Ellen Poliakoff, and Alexander J. Casson. Ma-
chine learning algorithm validation with a limited sample size. PLOS ONE,
14(11):1–20, 11 2019.
[36] Martin A. Fischler and Robert C. Bolles. Random sample consensus: A paradigm
for model fitting with applications to image analysis and automated cartography.
Commun. ACM, 24(6):381–395, June 1981.
[37] Peter J. Huber. Robust estimation of a location parameter. Ann. Math. Statist.,
35(1):73–101, 03 1964.
[38] Thomas P. Minka. Bayesian linear regression. Technical report, 3594 Security
Ticket Control, 1999.
[39] Leo Breiman. Random forests. Mach. Learn., 45(1):5–32, October 2001.
[40] Alexey Natekin and Alois Knoll. Gradient boosting machines, a tutorial. Fron-
tiers in Neurorobotics, 7:21, 2013.
[41] I. Rish. An empirical study of the naive bayes classifier. Technical report, 2001.
[42] Tianqi Chen and Carlos Guestrin. Xgboost: A scalable tree boosting system. In
Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge
Discovery and Data Mining, KDD ’16, page 785–794, New York, NY, USA, 2016.
Association for Computing Machinery.
[43] Guolin Ke, Qi Meng, Thomas Finley, Taifeng Wang, Wei Chen, Weidong Ma, Qi-
wei Ye, and Tie-Yan Liu. Lightgbm: A highly efficient gradient boosting decision
tree. In I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vish-
wanathan, and R. Garnett, editors, Advances in Neural Information Processing
Systems 30, pages 3146–3154. Curran Associates, Inc., 2017.
[44] Suraj Srinivas, Ravi Kiran Sarvadevabhatla, Konda Reddy Mopuri, Nikita
Prabhu, Srinivas S. S. Kruthiventi, and R. Venkatesh Babu. A taxonomy of
deep convolutional neural nets for computer vision. Frontiers in Robotics and
AI, 2:36, 2016.
[45] Joseph Walsh, Niall O’ Mahony, Sean Campbell, Anderson Carvalho, Lenka
Krpalkova, Gustavo Velasco-Hernandez, Suman Harapanahalli, and Daniel Ri-
ordan. Deep learning vs. traditional computer vision. 04 2019.
[46] Dominik Scherer, Andreas Muller, and Sven Behnke. Evaluation of pooling
operations in convolutional architectures for object recognition. In Konstantinos
Diamantaras, Wlodek Duch, and Lazaros S. Iliadis, editors, Artificial Neural
Networks – ICANN 2010, pages 92–101, Berlin, Heidelberg, 2010. Springer Berlin
Heidelberg.
[47] Convolutional Neural Networks for Image and Video Processing, Technische Universität München. Layers of a convolutional neural network, 2014.
[48] Martín Abadi, Paul Barham, Jianmin Chen, Zhifeng Chen, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Geoffrey Irving, Michael Isard, et al. Tensorflow: A system for large-scale machine learning. In 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16), pages 265–283, 2016.
[49] Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gre-
gory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga,
Alban Desmaison, Andreas Kopf, Edward Yang, Zachary DeVito, Martin Rai-
son, Alykhan Tejani, Sasank Chilamkurthy, Benoit Steiner, Lu Fang, Junjie Bai,
and Soumith Chintala. Pytorch: An imperative style, high-performance deep
learning library. In H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alché-Buc,
E. Fox, and R. Garnett, editors, Advances in Neural Information Processing
Systems 32, pages 8024–8035. Curran Associates, Inc., 2019.
[50] Fabian Pedregosa, Gael Varoquaux, Alexandre Gramfort, Vincent Michel,
Bertrand Thirion, Olivier Grisel, Mathieu Blondel, Peter Prettenhofer, Ron
Weiss, Vincent Dubourg, et al. Scikit-learn: Machine learning in python. Journal
of machine learning research, 12(Oct):2825–2830, 2011.
[51] Wes McKinney et al. Data structures for statistical computing in python. In
Proceedings of the 9th Python in Science Conference, volume 445, pages 51–56.
Austin, TX, 2010.
[52] Huifeng Guo, Ruiming Tang, Yunming Ye, Zhenguo Li, and Xiuqiang He.
Deepfm: A factorization-machine based neural network for ctr prediction. pages
1725–1731, 08 2017.
[53] Dzmitry Bahdanau, Kyunghyun Cho, and Y. Bengio. Neural machine translation
by jointly learning to align and translate. ArXiv, 1409, 09 2014.
[54] Diederik Kingma and Jimmy Ba. Adam: A method for stochastic optimization.
International Conference on Learning Representations, 12 2014.
[55] Carl Edward Rasmussen and Christopher K. I. Williams. Gaussian Processes
for Machine Learning (Adaptive Computation and Machine Learning). The MIT
Press, 2005.
[56] Volodymyr Mnih, Adria Puigdomenech Badia, Mehdi Mirza, Alex Graves, Tim-
othy Lillicrap, Tim Harley, David Silver, and Koray Kavukcuoglu. Asynchronous
methods for deep reinforcement learning. In International conference on machine
learning, pages 1928–1937, 2016.
[57] David Silver, Guy Lever, Nicolas Heess, Thomas Degris, Daan Wierstra, and
Martin Riedmiller. Deterministic policy gradient algorithms. 2014.
[58] Fernando Nogueira. Bayesian Optimization: Open source constrained global
optimization tool for Python, 2014–.
[59] Wei Zhao, Maxime Peyrard, Fei Liu, Yang Gao, Christian Meyer, and Steffen
Eger. Moverscore: Text generation evaluating with contextualized embeddings
and earth mover distance, 09 2019.