ARTIFICIAL INTELLIGENCE IN CHEMISTRY Blaine Berrington
OVERVIEW
• Provide an overview of synthesis and its challenges
• Discuss past approaches of AI to synthesis
• Discuss solutions that modern AI offers for synthesis
• Synthesis planning
• Reaction prediction
• Summarize other applications of AI in the field of chemistry
SYNTHETIC ORGANIC CHEMISTRY
Synthesis is the process of creating a designated target molecule through controlled, stepwise chemical reactions.
• Used in a huge array of industries from pharmaceuticals and dyes to superconductors and plastics
• Involves a number of problem solving strategies
• Requires meticulous planning and skill to carry out
https://techtransfer.cancer.gov/aboutttc/successstories/taxol
https://en.wikipedia.org/wiki/Paclitaxel
SYNTHESIS
• Intermediate approach
• Direct Associative approach
• Logic-centered approach
• Reduction of chemical complexity
• Formation of a “tree” of paths
• Time consuming
https://www.organic-chemistry.org/totalsynthesis/totsyn04/quinine-woodward-williams.shtm
LOGIC-CENTERED SYNTHESIS
• Perception of structurally important features within a target molecule: functional groups, stereocenters, regional reactivity (instability and sensitivity)
• Reductions of molecular complexity (goal): internal connectivity scission, chain/appendage reduction, functionality removal, stereochemistry simplification, instability removal
• Subgoals: functional group interchange, protecting groups/positional groups, rearrangement
RETROSYNTHESIS OF A NATURAL PRODUCT (PENICILLIN)
https://chemistonthekeys.wordpress.com/2012/03/12/classic-synthesis-i-penicillin-v/
SYNTHESIS OF A NATURAL PRODUCT (PENICILLIN)
https://www.redbubble.com/people/narwhalfire/works/23613992-penicillin-v-total-synthesis
https://www.youtube.com/watch?v=rh0Tn_oPS30
DEVELOPING NEW MOLECULES
• You guess
• Quantitative structure–activity relationship (QSAR) approach
SOFTWARE FOR SYNTHESIS THEN (1960s)
• DENDRAL, LHASA (Corey and Wipke), and SECS (Wipke) were started nearly 50 years ago
• SECS and LHASA were retrosynthesis-oriented programs for determining reaction routes
• DENDRAL was used to characterize unknown molecules from spectral data
• How do computers relate?
RETROSYNTHESIS TO COMPUTERS
LHASA
• Programmed and maintained by Corey as a tool for efficient synthesis design
• Interacts with the chemist to yield reaction pathways of interest
• The chemist inputs the target molecule
• Boundary conditions and goals are defined
• Inverse synthetic operations that satisfy the goals are computed, and unlikely solutions are deleted
• The chemist then assesses the precursors that are output
LHASA
• Graphical Module
• Chemist draws a structure
• Represented as a connection table for atoms and bonds
• Coupled with a list of coordinates for atom positions
• Other representations fall short (line notation)
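The connection table described on this slide can be sketched in a few lines of Python. This is only an illustration of the idea (atoms, bonds, and drawing coordinates stored together), not LHASA's actual data layout; the class name `Molecule` and its methods are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Molecule:
    """Connection table: atoms, bonds, and 2D drawing coordinates."""
    atoms: list = field(default_factory=list)    # element symbols, indexed by atom id
    coords: list = field(default_factory=list)   # (x, y) position of each atom
    bonds: dict = field(default_factory=dict)    # frozenset({i, j}) -> bond order

    def add_atom(self, symbol, x=0.0, y=0.0):
        self.atoms.append(symbol)
        self.coords.append((x, y))
        return len(self.atoms) - 1               # id of the new atom

    def add_bond(self, i, j, order=1):
        self.bonds[frozenset((i, j))] = order

    def neighbors(self, i):
        # All atoms that share a bond with atom i.
        return sorted(j for pair in self.bonds if i in pair for j in pair if j != i)


# Heavy atoms of ethanol, C-C-O, with arbitrary drawing coordinates.
ethanol = Molecule()
c1 = ethanol.add_atom("C", 0.0, 0.0)
c2 = ethanol.add_atom("C", 1.5, 0.0)
o = ethanol.add_atom("O", 2.25, 1.3)
ethanol.add_bond(c1, c2)
ethanol.add_bond(c2, o)
```

Unlike a line notation, this form keeps every atom individually addressable, which is what the later perception and modification steps operate on.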
LHASA
• Perception Module
• Recognizes functional groups, chains, rings, symmetry, redundancy, and related atoms
• For rings: A1 is the origin atom, and a path is grown along the network until it doubles back on itself (An has already appeared in the path as Ai). The ring is then added to a list of rings.
Path: A1, A2, ..., Ai, ..., A(n-1), An (with An = Ai)
Ring sequence: Ai, A(i+1), ..., A(n-1)
*Storing each ring as a set of atoms allows for set operations
https://www.onlinemathlearning.com/union-set.html
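The path-growing idea for ring perception can be sketched as a depth-first search: a path is extended atom by atom, and a ring is recorded whenever the next atom already appears in the path. This is a minimal sketch under that reading of the slide, not LHASA's actual routine; `find_rings` and the toy adjacency dictionary are hypothetical.

```python
def find_rings(adjacency, origin):
    """Grow paths from origin; a ring closes when a path doubles back on itself."""
    rings = set()

    def grow(path):
        current = path[-1]
        previous = path[-2] if len(path) > 1 else None
        for nxt in adjacency[current]:
            if nxt == previous:
                continue                        # do not step straight back along the same bond
            if nxt in path:
                i = path.index(nxt)             # nxt already appeared earlier in the path
                rings.add(frozenset(path[i:]))  # ring = atoms from that point to the path end
            else:
                grow(path + [nxt])

    grow([origin])
    return rings


# Two fused four-membered rings sharing the 2-3 bond (toy example).
fused = {
    0: [1, 3],
    1: [0, 2],
    2: [1, 3, 5],
    3: [0, 2, 4],
    4: [3, 5],
    5: [2, 4],
}
rings = find_rings(fused, origin=0)
ring_a = frozenset({0, 1, 2, 3})
ring_b = frozenset({2, 3, 4, 5})
fusion_atoms = ring_a & ring_b   # set intersection gives the shared atoms
envelope = ring_a | ring_b       # set union gives the outer ring's atoms
```

Because each ring is stored as a set, questions such as "which atoms do two rings share" reduce to ordinary intersection and union. The search enumerates every path, so it is only practical for small graphs.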
LHASA
• Strategy and control module
• Heuristics applied: introduction of reactive functionalities, mechanistic disconnections, transforms that lead to disconnections
• Knowledge-based rules are applied
• Requires a knowledge base of fundamental reactions
LHASA
• Modification module
• Subroutines operating on the connection table are applied to introduce the transforms necessary for generating the precursors
• Making and breaking of bonds, loss/addition of atoms, loss/addition of charge.
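A subroutine of this kind can be sketched as a function that edits a bond table and a charge table. The example below shows one hypothetical operation, a heterolytic disconnection; the function name and the dictionary layout are assumptions for illustration, not LHASA's implementation.

```python
def disconnect(bonds, charges, i, j):
    """Return new tables with the i-j bond order reduced by one.

    The bonding electrons are assigned to atom j, so atom i gains a
    positive charge and atom j a negative one. Inputs are left untouched.
    """
    key = frozenset((i, j))
    if key not in bonds:
        raise ValueError(f"no bond between atoms {i} and {j}")
    new_bonds = dict(bonds)
    new_charges = dict(charges)
    new_bonds[key] -= 1
    if new_bonds[key] == 0:
        del new_bonds[key]                       # the bond is fully broken
    new_charges[i] = new_charges.get(i, 0) + 1   # loss of charge on i
    new_charges[j] = new_charges.get(j, 0) - 1   # gain of charge on j
    return new_bonds, new_charges


# Toy table: atoms 0-1 single bond, atoms 1-2 double bond.
bonds = {frozenset((0, 1)): 1, frozenset((1, 2)): 2}
precursor_bonds, precursor_charges = disconnect(bonds, {}, 0, 1)
```

Returning new tables rather than editing in place keeps the target molecule intact, so several alternative precursors can be generated from the same starting structure.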
LHASA
• Evaluation Module
• Bulk of evaluation is done by the chemist
• The program checks for valence violations, etc.
• Structural simplicity is evaluated (rings, appendages, etc.)
*ring system simplicity
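A valence check of the kind the program performs can be sketched as follows. The valence limits cover only a few neutral atoms and are a deliberate simplification; `MAX_VALENCE` and `valence_violations` are hypothetical names.

```python
# Usual maximum valence for a few neutral atoms (simplified assumption).
MAX_VALENCE = {"H": 1, "C": 4, "N": 3, "O": 2}


def valence_violations(atoms, bonds):
    """Return the ids of atoms whose total bond order exceeds the limit."""
    used = [0] * len(atoms)
    for pair, order in bonds.items():
        for i in pair:
            used[i] += order
    return [i for i, symbol in enumerate(atoms) if used[i] > MAX_VALENCE[symbol]]


# Methane is fine; a carbon with five hydrogens is flagged.
methane_atoms = ["C", "H", "H", "H", "H"]
methane_bonds = {frozenset((0, k)): 1 for k in range(1, 5)}
bad_atoms = ["C", "H", "H", "H", "H", "H"]
bad_bonds = {frozenset((0, k)): 1 for k in range(1, 6)}

methane_result = valence_violations(methane_atoms, methane_bonds)
bad_result = valence_violations(bad_atoms, bad_bonds)
```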
DENDRAL (1965)
• Used a heuristic-based approach to determine molecular structure from spectral data
• Isomers, alcohols, and ketones were problematic
DENDRAL
• Heuristic approach
LHASA
• Problems with LHASA and DENDRAL
• Memory limitations
• Difficult to add new reactions
• Backtracking the rationale of solutions is difficult
• Doesn’t scale well
AI AND CHEMISTRY TODAY
• Synthesis planning
• Prediction of organic reaction outcomes
• Robot chemists
• Chemical property prediction
SYNTHESIS PLANNING
• Deep neural networks trained on fundamental organic chemistry retrosynthesis rules
• The trained networks are run in combination with a Monte Carlo tree search algorithm
• Selection
• Expansion
• Exploration
• Updating
• Reaction prediction indistinguishable from a human’s
SYNTHESIS PLANNING
• Training for fundamental rules of organic synthesis
• Neural networks can be trained to recognize and apply retrosynthetic fundamentals
• Use reaction records as a training basis
SYNTHESIS PLANNING
• Monte Carlo tree search algorithm
• Selection: the child node with the greatest probability of succeeding is selected
• Expansion: successor nodes to the previously selected node are expanded
• Exploration: reinforcement learning is used to make random decisions further down from the child nodes; child nodes are explored at random, assigning a “reward” to each one based on the proximity of its output to the desired solution
• Updating: parent nodes are updated based on the scores of their child nodes; a pathway is then selected, once the nodes’ “reward” states are updated, by choosing a node that satisfies the query
MONTE CARLO SEARCH
PREDICTION OF REACTION OUTCOMES
• Determine reaction outcome based on reactants and conditions
• The term “reaction” is an abstraction
• Prediction is based on three approaches
• Physical laws
• Rule based expert systems
• Inductive machine learning
PREDICTION OF REACTION OUTCOMES
• Rule based expert systems
• Employs heuristics, graph rewrite patterns, and constraints
• Drawbacks
• Large knowledge base
• Not scalable
• Confined in its ability
PREDICTION OF REACTION OUTCOMES
• Physical laws
• Reactions are modeled as minimum-energy paths between stable configurations on a high-dimensional potential energy surface, where saddle points represent transition states.
• The Schrödinger equation cannot be solved exactly for systems of practical size, so approximations are required
PREDICTION OF REACTION OUTCOMES
• Mechanistic
• Easier to predict
• *preferred representation
PREDICTION OF REACTION OUTCOMES
• A novel approach
• Incorporates idealized, graph-based molecular orbitals (MOs)
• Trained on “productive” reactions
• Constructive MO interactions are statistically ranked
• (between electron-filled and unfilled MOs)
PREDICTION OF REACTION OUTCOMES
• Reaction prediction model
• Requires training set of reactions
• Mechanistic construction of MOs
• Ranking of productive mechanisms
PREDICTION OF REACTION OUTCOMES
• Construction of the molecular orbitals
• For a molecule “m” a connection graph is generated: m = G_m(A_m, B_m)
• Vertices A_m represent labeled atoms and edges B_m represent bonds
• Quadruples describing the filled and unfilled orbitals are generated: f := (a, t_f, n_f, c_f)
• Each atom can have multiple MO designations
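The graph and the orbital quadruples can be written down as simple data structures. The sketch below is an assumption-laden illustration: the slide only gives the symbols, so reading t as an orbital type label and n as a neighboring atom is a guess, c is left uninterpreted, and the label strings are hypothetical.

```python
from typing import NamedTuple, Optional


class Orbital(NamedTuple):
    """Quadruple (a, t, n, c) attached to one labeled atom."""
    a: int                # index of the labeled atom in the vertex set
    t: str                # assumed here: an orbital type label
    n: Optional[int]      # assumed here: a neighboring atom, if the orbital is on a bond
    c: Optional[tuple]    # left uninterpreted in this sketch


# Connection graph for chloromethane (heavy atoms only): vertices and edges.
atoms = {0: "C", 1: "Cl"}
bonds = {frozenset((0, 1)): 1}

# One filled and one unfilled orbital; an atom may carry several of each.
filled = [Orbital(a=1, t="lone_pair", n=None, c=None)]
unfilled = [Orbital(a=0, t="sigma_star", n=1, c=None)]

# Candidate interactions pair every filled orbital with every unfilled one.
candidate_pairs = [(f, u) for f in filled for u in unfilled]
```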
PREDICTION OF REACTION OUTCOMES
• Trained neural network: Reaction site filtering
• Reaction explorer system provided training data
• Reactivity label l (l = 1 or 0) is assessed for each (a, c) tuple based on the learned model
• Neural networks use sigmoidal activation functions in a single hidden layer and a single output node
• Gradients on the weights of the neural network are calculated with standard back-propagation
• The network provides a probabilistic prediction of an (a, c) tuple being labeled reactive
• Determines the possible reactions based on electron sources and sinks
• Possibilities involving unfilled unreactive MOs are disregarded
• Orbital interaction computed
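The network architecture named on this slide (one sigmoidal hidden layer, one sigmoidal output node, weights fitted by back-propagation) can be sketched in plain Python. The feature vectors and labels below are made-up toy data, and `SiteFilter` is a hypothetical name; this is not the published model or its feature set.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class SiteFilter:
    """One sigmoidal hidden layer feeding a single sigmoidal output node."""

    def __init__(self, n_inputs, n_hidden=3, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs)]
                   for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def forward(self, x):
        hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
                  for row, b in zip(self.w1, self.b1)]
        p = sigmoid(sum(w * h for w, h in zip(self.w2, hidden)) + self.b2)
        return hidden, p

    def predict(self, x):
        # Probability that the input is labeled reactive (l = 1).
        return self.forward(x)[1]

    def train_step(self, x, label, lr=0.5):
        # Back-propagation for cross-entropy loss on one example.
        hidden, p = self.forward(x)
        d_out = p - label                        # gradient at the output pre-activation
        for j, h in enumerate(hidden):
            d_hidden = d_out * self.w2[j] * h * (1.0 - h)
            self.w2[j] -= lr * d_out * h
            self.b1[j] -= lr * d_hidden
            for i, xi in enumerate(x):
                self.w1[j][i] -= lr * d_hidden * xi
        self.b2 -= lr * d_out


def cross_entropy(model, data):
    total = 0.0
    for x, label in data:
        p = min(max(model.predict(x), 1e-12), 1.0 - 1e-12)
        total -= math.log(p) if label == 1 else math.log(1.0 - p)
    return total / len(data)


# Toy data: the label simply follows the first (made-up) feature.
TOY_DATA = [([0.0, 0.0], 0), ([0.0, 1.0], 0), ([1.0, 0.0], 1), ([1.0, 1.0], 1)]

model = SiteFilter(n_inputs=2)
loss_before = cross_entropy(model, TOY_DATA)
for _ in range(500):
    for x, label in TOY_DATA:
        model.train_step(x, label)
loss_after = cross_entropy(model, TOY_DATA)
```

In use, sites whose predicted probability falls below a chosen threshold would be dropped before any orbital interactions are enumerated, which is what makes the later ranking step tractable.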
PREDICTION OF REACTION OUTCOMES
• Orbital interaction is computed
PREDICTION OF REACTION OUTCOMES
• Orbital interaction ranking
• Training on ordered pairs of productive and unproductive orbital interactions
• A pair of shared-weight artificial neural networks, each with a single sigmoidal hidden layer and a linear output node
• A sigmoidal output layer with fixed weights of +1 and -1
• Yields a ranked set of rational reaction outcomes based on reactants and conditions
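The pairwise ranking architecture can be sketched as a forward pass: one scoring network (sigmoidal hidden layer, linear output) is applied to both members of a pair, and a sigmoid over the difference of the two scores, that is, fixed weights of +1 and -1, gives the probability that the first interaction outranks the second. Training is omitted, the weights are random, and the feature vectors are made up; `OrbitalRanker` is a hypothetical name.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class OrbitalRanker:
    """Shared-weight scorer: the same weights score both members of a pair."""

    def __init__(self, n_inputs, n_hidden=4, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-1.0, 1.0) for _ in range(n_inputs)]
                   for _ in range(n_hidden)]
        self.w2 = [rng.uniform(-1.0, 1.0) for _ in range(n_hidden)]

    def score(self, x):
        hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in self.w1]
        return sum(w * h for w, h in zip(self.w2, hidden))   # linear output node

    def prefer(self, x_a, x_b):
        # Sigmoidal output with fixed weights +1 and -1 on the two scores.
        return sigmoid(1.0 * self.score(x_a) - 1.0 * self.score(x_b))

    def rank(self, candidates):
        # Sorting by the shared score is consistent with every pairwise preference.
        return sorted(candidates, key=self.score, reverse=True)


ranker = OrbitalRanker(n_inputs=3)
a, b, c = [0.2, 0.9, 0.1], [0.8, 0.1, 0.5], [0.5, 0.5, 0.5]   # made-up features
p_ab = ranker.prefer(a, b)
p_ba = ranker.prefer(b, a)
ranked = ranker.rank([a, b, c])
```

Sharing the weights means a single score per interaction is enough at prediction time, so a full ranked list comes from one sort rather than from comparing every pair.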
PREDICTION OF REACTION OUTCOMES
• Conclusion
• Huge amounts of data exist for training
• Scalable
• Accurate
OTHER AI APPLICATIONS IN CHEMISTRY
• Robot chemists
• Property prediction
• Electron density prediction
• A lot more...
FUTURE OF AI IN CHEM
• AI will not replace chemists but will be a tool added to the chemist’s toolkit
• Automated reactions
REFERENCES
• Yanaka, M.; Nakamura, K.; Kurumisawa, A.; Wipke, W. T. Automatic Knowledge Base Building for the Organic Synthesis Design Program (SECS). Tetrahedron Computer Methodology 1990, 3 (6), 359–375.
• Proceedings of the 2019 Workshop on Network Meets AI & ML - NetAI19. 2019.
• Wipke, W.; Ouchi, G. I.; Krishnan, S. Simulation and Evaluation of Chemical Synthesis—SECS: An Application of Artificial Intelligence Techniques. Artificial Intelligence 1978, 11(1-2), 173–193.
• Peiretti, F.; Brunel, J. M. Artificial Intelligence: The Future for Organic Chemistry? ACS Omega 2018, 3 (10), 13263–13266.
• Kayala, M. A.; Azencott, C.-A.; Chen, J. H.; Baldi, P. Learning to Predict Chemical Reactions. Journal of Chemical Information and Modeling 2011, 51 (9), 2209–2222.
QUESTIONS