computational enzyme design - baker lab...viktorya aviyente at bog˘aziÅi university and then...

26
Enzyme Design DOI: 10.1002/anie.201204077 Computational Enzyme Design Gert Kiss, Nihan C ¸ elebi-ȔlÅɒm, Rocco Moretti, David Baker, and K. N. Houk* A ngewandte Chemi e Keywords: active-site design · biomolecular catalysis · non-natural reactions · protein engineering · theozymes . Angewandte Reviews K. N. Houk et al. 5700 www.angewandte.org # 2013 Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim Angew. Chem. Int. Ed. 2013, 52, 5700 – 5725

Upload: others

Post on 28-Jan-2021

0 views

Category:

Documents


0 download

TRANSCRIPT

  • Enzyme DesignDOI: 10.1002/anie.201204077

    Computational Enzyme DesignGert Kiss, Nihan Çelebi-�lÅ�m, Rocco Moretti, David Baker, and K. N. Houk*

    AngewandteChemie

    Keywords:active-site design ·biomolecular catalysis ·non-natural reactions ·protein engineering ·theozymes

    .AngewandteReviews K. N. Houk et al.

    5700 www.angewandte.org � 2013 Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim Angew. Chem. Int. Ed. 2013, 52, 5700 – 5725

    http://www.angewandte.org

  • 1. Introduction

    Life depends on protein catalysts that control andaccelerate reactions involved in metabolism. Over threebillion years of evolution has led to enzymes that can catalyzechemical transformations that are too slow to be measuredunder normal conditions. The most proficient enzymes canaccelerate reactions with turnover rates that occur as rapidlyas the diffusion of reactants to the catalyst. Furthermore,enzymes enhance the catalytic rate of specific reactions andsubstrates. They frequently depend on cofactors or coen-zymes, and are often sensitive to environmental conditions(pH value, temperature, and solvent), yet operate over a largespectrum thereof—making them ideal components for com-plex and tightly regulated metabolic pathways.

    Humans have taken advantage of enzymatic processes foras long as naturally occurring fermentation has been con-trolled to preserve foods, to make bread and cheese, or toproduce alcoholic beverages. However, the word “enzyme”from Ancient Greek “en zýmē” or “in dough/yeast” wasn�tcoined until 1876, when German physiologist Wilhelm K�hnechose to simplify references to “elements that are responsiblefor fermentation processes” by giving them a name.[1] At thetime, the identity of enzymes was speculative, but EmilFischer suggested in 1894 a “lock and key” model to explainthe substrate specificity of enzymes. 32 years later, in 1926,James B. Sumner purified and crystallized urease, and showedthat enzymes are proteins in their own right. In 1946, LinusPauling speculated that enzymes are “closely complementaryin structure to the activated complex for the reactioncatalyzed”,[2] a remarkable statement considering that at thetime “no one [had] succeeded in determining the structure ofany enzyme nor in finding out how the enzyme does its job”.[3]

    The study of structure–function relationships at the atomiclevel continued to remain elusive for another two decades,until the first high-resolution crystal structure of an enzymewas solved and the field of structural biology emerged.[4]

    Since then, a surge of active research related to enzymecatalysis has continued to probe, adjust, and expand ourunderstanding of these seemingly miraculous “nano-machines”. A variety of factors have been proposed to

    explain the observed rate enhancements, and range fromnoncovalent transition-state (TS) stabilization (electrostatic,desolvation, restriction of motion, etc.) to covalent bonding(low energy barrier hydrogen bonds, formation of intermedi-ates, metal–ion interactions, etc.). Researchers have come toembrace Pauling�s hypothesis as a general statement of whatis responsible for the catalytic power of natural enzymes, butthe power of preorganization and chemical catalysis is nowrecognized.[5] The most proficient[6] of these catalysts offer farmore than an active site that is complementary to the TS; theyenter into the reaction by altering the TS and thus change thefree-energy profile from what it is in solution.[7] This“covalent hypothesis” explains why the vast majority ofenzymes can achieve TS binding constants that are orders ofmagnitude beyond what can be expected from noncovalentinteractions.

    Catalytic proficiency is formally the binding constant ofthe complex formed between the enzyme and the transitionstate, and was defined by Wolfenden as Ktx

    �1 = (kcat/KM)/kuncat.

    [6] Remarkably, Ktx�1 spans 21 orders of magnitude (108

    to 1029m�1)[6,8] for enzymes that have been studied todate,[6, 9–12] with an average Ktx

    �1 value of 1016.0�4.0m�1.[13] Thisvalue corresponds to an average DG value for transition-statebinding of 22 kcalmol�1, but can range up to 38 kcalmol�1,much higher than a noncovalent TS binding free energy of15 kcal mol�1.

    Recent developments in computational chemistry and biology havecome together in the “inside-out” approach to enzyme engineering.Proteins have been designed to catalyze reactions not previouslyaccelerated in nature. Some of these proteins fold and act as catalysts,but the success rate is still low. The achievements and limitations of thecurrent technology are highlighted and contrasted to other proteinengineering techniques. On its own, computational “inside-out” designcan lead to the production of catalytically active and selective proteins,but their kinetic performances fall short of natural enzymes. Whencombined with directed evolution, molecular dynamics simulations,and crowd-sourced structure-prediction approaches, however,computational designs can be significantly improved in terms ofbinding, turnover, and thermal stability.

    From the Contents

    1. Introduction 5701

    2. Protein Engineering 5703

    3. The Inside-out Approach toComputational Enzyme Design 5708

    4. Computational EnzymeDesign—Achievements 5711

    5. Challenges in Enzyme Design 5719

    6. Summary and Outlook 5721

    [*] Dr. G. Kiss, Dr. N. Çelebi-�lÅ�m, Prof. Dr. Dr. K. N. HoukDepartment of Chemistry and BiochemistryUniversity of California, Los Angeles607 Charles E. Young Dr. East, Los Angeles CA, 90095 (USA)E-mail: [email protected]

    Dr. R. Moretti, Prof. Dr. D. BakerDepartment of Biochemistry and Howard Hughes Medical InstituteUniversity of Washington, Seattle, WA 98195 (USA)

    Dr. G. KissCurrent address: Department of Chemistry, Stanford University,Stanford, CA 94305 (USA)

    Dr. N. Çelebi-�lÅ�mCurrent address: Yeditepe University, Department of ChemicalEngineering, Istanbul (Turkey)

    Computational Enzyme DesignAngewandte

    Chemie

    5701Angew. Chem. Int. Ed. 2013, 52, 5700 – 5725 � 2013 Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim

  • Chemists imagine the possibility of designing and synthe-sizing molecules with the attributes of enzymes (selective,proficient, “green”, operating in water under ambient con-ditions, nontoxic, and biodegradable). To do so, at leasta subset of the above factors has to be considered, dependingon the target reaction. Furthermore, it can be desirable to tryand unite catalytic turnover with substrate-, stereo-, regio-, orchemoselectivity as well as a tolerance towards organicsolvents, elevated temperatures, and chemical degradation.Many different approaches of this type have been reviewedextensively: these include bioinformatics approaches,[14,15]

    natural evolution based engineering,[16] host–guest and supra-molecular chemistry,[13] directed evolution,[17–21] catalytic anti-bodies,[22–24] organocatalysis,[25] rational structure-based pro-tein engineering,[26] and computational protein design.[27]

    In this Review we describe the computational “inside-out” approach to enzyme design, and the beginnings of whatwe strive to develop into a robust technology to makecatalysts for synthesis, biotechnology, and therapeutics. Theidea behind our computational design strategy is to utilizebiochemical building blocks (amino acids, cofactors, co-

    enzymes, etc.) to produce catalysts for nonbiological process-es that can be made by microbiological techniques. The recentsurge in computational power has spurred an increase in thedevelopment and testing of improved structure predictionand conformational search algorithms. Quantum mechanicalmethods lead to predictions of the arrangements of functionalgroups that maximize the binding and stabilization of thetransition states of the desired reaction. If a protein can bedesigned that will fold into the necessary 3D geometry,catalytic conversion of non-natural chemicals into product(s)should be possible. To avoid having to predict the stability ofnew sequences from scratch, we incorporate the designedactive site into stable protein folds. Furthermore, we adapt asmuch of the active-site components from natural precedent asis possible for the non-natural reaction or substrate.

    We discuss the “inside-out” protocol, highlight approx-imations and bottlenecks, explore examples of successfuldesign, examine achievements and challenges, and presentcases in which variations and additions to the original designprotocol were beneficial. We conclude that molecular dynam-ics (MD) simulations, post-design directed evolution, and

    Gert Kiss studied chemistry in Heidelberg,Germany, carried out research in the lab ofNoah W. Allen at UNCA, and then went onto pursue a PhD with K. N. Houk at UCLA.As a graduate student he was an NIH-CBIfellow, an LLNL Lawrence Scholar, anda recipient of the Stauffer Research Award.He is currently an NIH Simbios postdoctoralfellow with Vijay S. Pande at Stanford.

    Nihan Çelebi-�lÅ�m received a BS inchemistry at BoğaziÅi University, Turkey, andan MS in computational and theoreticalchemistry at Universit� Henri Poincar� inNancy, France. She received her PhD withViktorya Aviyente at BoğaziÅi University andthen carried out postdoctoral research withKendall Houk at UCLA. Recently she joinedthe faculty at Yeditepe University, Turkey.

    Rocco Moretti received his BS in biochem-istry at Worcester Polytechnic Institute, andhis PhD in biochemistry at the University ofWisconsin with Aseem Ansari and Jon Thor-son. He is currently a senior fellow in theresearch group of David Baker at the Uni-versity of Washington.

    David Baker received his PhD with RandySchekman at UC Berkeley and carried outpostdoctoral studies in biophysics with DavidAgard at UCSF. He is currently a Professorof Biochemistry at the University of Wash-ington and an Investigator at the HowardHughes Medical Institute. He is a memberof the National Academy of Sciences andthe American Academy of Sciences.

    K. N. Houk received his PhD with R. B.Woodward at Harvard, and then taught atLouisiana State University and the Univer-sity of Pittsburgh before joining UCLA in1986. He is now the Saul Winstein Chair inOrganic Chemistry. He was Director of theChemistry Division of the National ScienceFoundation from 1988–1990. He isa member of the National Academy ofSciences, the American Academy of Arts andSciences, and the International Academy ofQuantum Molecular Sciences.

    .AngewandteReviews

    K. N. Houk et al.

    5702 www.angewandte.org � 2013 Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim Angew. Chem. Int. Ed. 2013, 52, 5700 – 5725

    http://www.angewandte.org

  • crowd-sourced redesign can lead to improved efficiencies. Webegin by discussing some of the protein engineeringapproaches that have paved the way.

    2. Protein Engineering

    2.1. Catalytic Antibodies

    Following the pioneering work of the research groups ofLerner and Schultz in the mid-1980s,[28, 29] catalytic antibodieshave been produced for a wide range of chemical trans-formations.[24, 30] The concept is based on Pauling�s hypothesisthat enzymes provide an environment complementary instructure and electronic distribution to that of the rate-limiting TS.[2, 31] When challenged with a hapten that resem-bles the key TS characteristics for a given reaction, antibodiesare produced that can bind the hapten and thus also the TS itmimics. Transition-state binding equates to a lowered reactionbarrier and thus to an increased turnover rate compared tothe uncatalyzed reaction in solution.[32] The production ofcatalytic antibodies takes advantage of the rapid rates ofmutation and selection against a specific antigen that is a keycharacteristic of adaptive immune responses. The resultingbinding interactions are specific and can be harnessed tocatalyze non-natural reactions, and also to promote theconversion of non-natural substrates. Quantum mechanicalcomputations are useful for the design of transition-stateanalogues (TSAs) that can serve as haptens for a givenreaction.[33] Janda and co-workers reviewed strategies andchallenges in the development of new haptens.[24]

    Among the reactions that have been catalyzed by anti-bodies are Diels–Alder cycloadditions, acyl transfer reactions,oxy-Cope rearrangements, and cyclizations. Catalytic profi-ciencies range from Ktx

    �1 = 104.6m�1 to Ktx�1 = 108.6m�1, and so

    the transition states of the reactions they catalyze are boundmore strongly than the substrates (KM

    �1 = 103.5�1.0m�1).[13,30]

    Nature�s enzymes, on the other hand, exert massiveKtx

    �1 values, with an average range of 1012 to 1020m�1.Exceptional cases, such as ODCase and alkyl sulfatases,display Ktx

    �1 values of 1024m�1 and 1029m�1, respectively.[8,34]

    Naturally, the catalytic efficiency (kcat/KM) of such enzymes isoften limited only by the diffusion rate of the substrate andranges from 104 to 109m�1 s�1. In comparison, catalytic anti-bodies fall short of this limit by 4 orders of magnitude or more(kcat/KM = 10

    2–105m�1 s�1).[13,24, 34] This has been attributed tovarious factors, including product inhibition,[35–37] lower bind-ing constants,[13] lack of covalent binding and catalysis,[7]

    smaller buried surface area,[13] differences in timescales ofevolution,[22] and inadequacies of the immunoglobulin fold.[22]

    The comparatively low stability of the immunoglobulin foldand high cost of producing antibody catalysts further limittheir applications in industrial settings. Nonetheless, there isincreasing interest in their potential for therapeutic applica-tions, such as neutralizing HIV-1,[38, 39] antibody-directedenzyme prodrug therapy (ADEPT), and the inactivation ofaddictive substances through the antibody-mediated break-down of drug molecules.[40]

    While catalytic antibodies often suffer from productinhibition and can generally not be programmed for elaboratearrays of catalytic functionality, they have provided research-ers with an important toolkit for the study of biocatalyticprocesses. In the following subsections, we highlight examplesand discuss the role of computations in designing andunderstanding antibody catalysis.

    2.1.1. Diels–Alder Reaction

    The first catalytic antibody for a Diels–Alder reaction(1E9) was reported by Auditor and co-workers in 1989.[41] Itcatalyzes the cycloaddition between tetrachlorothiopheneand N-ethyl maleimide with a rate enhancement (kcat/kuncat) of1000m (Scheme 1).[42] The crystal structure reveals a mostly

    hydrophobic binding pocket with a single polar residue(AsnH85). Chen et al. studied 1E9 through a combination ofQM calculations, docking studies, molecular dynamics simu-lations, and a linear interaction energy approach.[43] Theactive site of 1E9 offers high shape complementarity to the TSgeometry and offers electrostatic interactions that favor theTS over the reactants.

    The Diels–Alder cycloaddition provides the opportunityto aim beyond catalytic rate accelerations and towardsachieving stereochemical control. Gouverneur et al. demon-strated this in a spectacular way: The cycloaddition betweentrans-1-N-acylamino-1,3-butadiene and N,N-dimethylacryl-amide affords a mixture of endo and exo stereoisomersunder thermal conditions. QM calculations were used toelucidate the characteristics of stereoisomeric transitionstates and to design transition-state analogues for the endoand exo pathways.[44]

    Antibody 13G5 catalyzes the disfavored exo-Diels–Alderreaction between methyl N-butadienyl carbamate and N,N-dimethylacrylamide (kcat = 1.20 � 10

    �3 min�1, kcat/kuncat =6.9m), and yields a single enantiomer in high enantiomericexcess (Scheme 2).[45] The crystal structure and QM calcu-lations showed that AspH50 and TyrL36 account for most of thecatalytic effect of 13G5, while AsnL91 better stabilizes the

    Scheme 1. Antibody-catalyzed cycloaddition of tetrachlorothiopheneand N-ethyl maleimide.

    Computational Enzyme DesignAngewandte

    Chemie

    5703Angew. Chem. Int. Ed. 2013, 52, 5700 – 5725 � 2013 Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim www.angewandte.org

    http://www.angewandte.org

  • ground state, and slightly retards the reaction.[45, 46] Thisfinding suggests that AsnL91 provides a structural frameworkfor the antibody to orient the substrates rather than havinga catalytic effect.

    The absolute configuration of the product was determinedexperimentally to be exo-(3S,4S). The specificity of thereaction was initially explored by docking the transitionstate into the crystal structure of antibody 13G5.[45] MDrelaxation of the antibody around the frozen TS revealed thatthe catalytic base (AspH50) can be coordinated by one andthree water molecule(s) in the presence of the exo-(3S,4S) andexo-(3R,4R) TS, respectively,[46] and the interaction of thecatalytic AspH50 with the carbamate NH group is significantlyweakened in the exo-(3R,4R) pathway.

    Antibody 10F11 catalyzes a retro-Diels–Alder reactionthat liberates HNO with a kcat/kuncat value of 2500(Scheme 3).[47] Inspection of the crystal structure of 10F11

    suggested good shape complementarity with the TS, andidentified specific active-site residues (Trp, Phe, Ser) thatwere proposed to contribute to catalysis.[48] Density functionaltheory (DFT) calculations were employed on models of theactive site to study the interactions that can stabilize thetransition state.[49]

    Kim et al. reviewed and compared the noncovalentcatalysis of Diels–Alder reactions by cyclodextrins, self-assembling capsules, antibodies, and RNAses, and concluded

    that—unlike enzyme catalysts—none of these hosts providesubstantial specific binding of the transition states.[50]

    2.1.2. Kemp Elimination

    The first catalytic antibody to catalyze the ring opening of5-nitrobenzisoxazole (Kemp elimination) was reported byHilvert and co-workers in 1996 (Scheme 4).[51] 34E4 displays

    a kcat/KM value of 5.5 � 103m�1 s�1 and a kcat/kuncat value of 2.1 �

    104 compared to the uncatalyzed reaction (kuncat = 3.1 �10�5 s�1), and a (kcat/KM)/kOAc� value of 3.4 � 10

    8 comparedto the rate of the acetate-promoted reaction in water (kOAc�=1.6 � 10�5m�1 s�1). Both experimental and computationalinvestigations demonstrated that, similar to other proton-transfer reactions, the Kemp elimination is sensitive to thegeometry in which the substrate and base are aligned.[52,53]

    Furthermore, when a carboxylate functions as the catalyticbase, Kemp elimination reactions are also highly sensitive tothe polarity of the solvent.[54, 55] These features in combinationwith the simplicity of the reaction and the ease with whichprogress can be monitored (UV/Vis), has made the Kempelimination a frequently studied model for base-catalyzedbiochemical transformations.

    2.1.3. Aldol/Retro-Aldol Reaction

    QM calculations on aldol reactions date back to the1980s,[56] and offer early geometric descriptions of thetransition state. Aldolase antibodies ab38C2, ab84G3, andab33F12 were later raised against TS-analogous haptens andcan catalyze aldol and retro-aldol reactions with activitiescomparable to natural aldolases, but with a broader substratescope (Scheme 5).[57–61] These antibodies resemble class Ialdolases that utilize the e-amino group of an active sitelysine residue to form a Schiff base with the substrate. Polarresidues outline the otherwise hydrophobic active site atdistances that range between 5 and 7 � from the e-aminogroup of LysH93. In a QM study, Arn� and Domingoinvestigated the role of some of these residues as potentialgeneral acid catalysts in the C�C bond-formation step.[62]

    2.1.4. Decarboxylation-Catalyzed Ring-Opening Reaction

    The rate of the decarboxylation reaction of 5-nitro-3-carboxybenzisoxazole varies by up to eight orders of magni-tude depending on the solvent polarity.[54, 55] Aprotic polarsolvents promote the reaction by desolvating the carboxylate

    Scheme 2. Antibody-catalyzed disfavored exo-Diels–Alder reaction ofmethyl N-butadienyl carbamate and N,N-dimethylacrylamide.

    Scheme 3. Antibody-catalyzed retro-Diels–Alder reaction.

    Scheme 4. Antibody-catalyzed ring opening of 5-nitrobenzisoxazole.

    .AngewandteReviews

    K. N. Houk et al.

    5704 www.angewandte.org � 2013 Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim Angew. Chem. Int. Ed. 2013, 52, 5700 – 5725

    http://www.angewandte.org

  • reactant and by stabilizing the transition state throughdispersion interactions.[63–66] Antibody 21D8 catalyzes thedecarboxylation of 5-nitro-3-carboxybenzisoxazole[67] by upto 61000-fold over the background reaction in water(Scheme 6).[68]

    QM, MD, and free-energy perturbation (FEP) calcula-tions were performed to explore the origins of catalysisfurther,[69] and showed that partial solvation of the carboxyl-ate group was detrimental to catalysis, but this was counteredby favorable hydrogen-bonding interactions with the isoxa-zole oxygen atom.

    2.1.5. Cyclization of trans-Epoxy Alcohols

    Four different antibodies were raised with two differenthaptens to catalyze the disfavored endo-tet cyclization

    reaction of trans-epoxy alcohols (Scheme 7).[70, 71] Quantummechanical calculations show that the endo-tet transition statehas SN1 character and can be stabilized electrostatically bya carboxylate.[72] The hypothesis was tested with QM modelsthat consisted of motifs using concurrent general acid/basecatalysis.[73, 74] Analogous calculations for 6-exo and 7-endocyclizations predicted the preferential formation of the seven-membered product. This was verified experimentally,[75] and isalso in line with the X-ray structure eventually obtained forantibody Fab 5C8.[71] The active site contains an AspH95–HisL89

    dyad that appears to be poised for acid/base catalysis.

    2.1.6. Hydrolysis of Aromatic Amides and Esters

    Antibody 43C9 catalyzes the hydrolysis of aromaticamides and esters with an unusually efficient kcat/kuncat valueof 2.5 � 105.[30, 76–78] Getzoff and co-workers built a computa-tional homology model of the antibody�s variable region andproposed that ArgL96 functions as the oxyanion hole, whileHisL91 is the catalytic nucleophile.[79] Subsequently, the X-raystructure of 43C9 was solved and supported the predictions,showing that a water-mediated hydrogen-bonding network inthe active site is important for catalysis.[80] Kollman and co-workers later performed QM calculations, MD simulations,and free-energy calculations on 43C9 and proposed a directhydroxide attack as an alternative to the mechanism involvingnucleophilic catalysis by HisL91.[81]

    2.1.7. Chorismate–Prephenate Rearrangement

    The Claisen rearrangement of chorismate to prephenate iscatalyzed by natural chorismate mutase enzymes,[82,83] and bythe catalytic antibodies IF7 and IIF1-2E11.[84, 85] Wiest andHouk employed QM calculations to mimic the active site ofchorismate mutase and proposed that specific hydrogen-bonddonors were responsible for the approximately 200-fold rateacceleration displayed by catalytic antibody IF7.[86]

    2.2. Directed Evolution

    Directed or laboratory evolution has become one of themore mature forms of protein engineering and has found itsway into modern industrial-scale applications.[87] It is a power-

    Scheme 5. Examples of antibody-catalyzed aldol and retro-aldol reac-tions.

    Scheme 6. Antibody-catalyzed decarboxylation reaction of 5-nitro-3-carboxybenzisoxazole.

    Scheme 7. Antibody-catalyzed disfavored endo-tet cyclization reactionsof trans-epoxy alcohols. nd = not determined.

    Computational Enzyme DesignAngewandte

    Chemie

    5705Angew. Chem. Int. Ed. 2013, 52, 5700 – 5725 � 2013 Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim www.angewandte.org

    http://www.angewandte.org

  • ful and commonly used approach to enzyme engineering thatrelies on iterative cycles of mutagenesis and selection.[17,18]

    Examples of its application include improved themostabil-ity,[88] tolerance to organic solvents,[89] strengthened protein–protein interactions,[90] altered substrate promiscuity/specific-ity,[91, 92] enhanced enzymatic activity,[93] and inversion ofenantioselectivity.[94, 95] In the directed evolution of catalyticfunction, a starting gene is mutagenized to create a library ofvariants, which is screened for enzymes with an improvementof the sought-after property (stability, substrate specificity,activity, etc.). Typically the improvements in any one roundare small, and the process is repeated many times.

    Strategies for the construction of libraries include randomwhole-gene error-prone PCR or random mutagenesis (Fig-ure 1a), site saturation or targeted mutagenesis (CASTing,

    ISM)[95] (Figure 1b), and the generation of chimeras throughsequence recombination (Figure 1c).[19] A key strength ofrandom mutagenesis is that no structural or mechanisticinformation about the enzyme is required and that beneficialmutations can be uncovered at unexpected positions distantfrom the active site.[96]

    Site saturation or targeted strategies, on the other hand,focus on certain areas of an enzyme (i.e. the active site) andrequire prior structural or biochemical knowledge about theprotein. Reducing the randomizable sequence space increasesthe probability with which multiple beneficial mutations canbe uncovered within the active site.[97] The approach is ofvalue when dramatic alterations to an enzyme�s function aresought or when improved function depends on a combinationof active-site variations.

    Beneficial mutations within a library can be identified, forexample, through statistical analysis of protein sequence–activity relationships (ProSAR),[98] then combined and incor-porated by gene shuffling.[99] Molecular and functionaldiversity can be further expanded with neutral drift libraries,in which mutations are accumulated that are orthogonal tothe function and stability of the enzyme.[100,101]

    A key challenge for directed evolution is the identificationof individual variants that display the desired improvementsout of a large set of randomized protein sequences.[20]

    Selection-based in vitro techniques, such as mRNA display[102]

    and emulsion-based microfluidic FADS (fluorescence-acti-vated droplet sorter),[103] exhibit substantial throughput.Screening-based techniques that measure substrate or prod-uct concentrations are the most versatile, but are also morelimited in their throughput.[104] Once a genotype–phenotypelink is established, directed evolution can work with allbiologically produced proteins, including those that containnon-natural amino acids or non-natural prosthetic modifica-tions.[105]

    Some experiments have involved completely naive start-ing points, but directed evolution works best for enzymes thatdisplay some level of activity towards the desired reaction ortowards a highly similar one.[19] While the success of directedevolution programs depends on a clear, uphill path from thestarting point to a highly active variant,[105] most proteinsequences do not display the desired initial activity. Thischallenge can be overcome somewhat through neutral driftlibraries and by gradually changing the selective pressurefrom the existing function to the desired one.[20, 21]

    Many attempts have been made to engineer and redesignproteins and enzymes over the past few decades. Those thatmet with success, employed variations of directed evolutionranging from random mutagenesis to semirational or focusedlibrary-generating strategies and sophisticated statisticalselection such as ProSAR in specific cases. Two recentexamples are the asymmetric synthesis of chiral amines forthe industrial production of the type-2 diabetes drug sitaglip-tin (Januvia)[106] and the oxidative desymmetrization of theprochiral amine for the production of the hepatitis C drugBoceprevir.[107] In other cases, computational approachesresulted in significant advances in understanding the mech-anism by which directed evolution can change the enantio-selectivity of an enzyme.[108, 109] In the past 5 years alone, over60 articles were published that reported on enhancing thethermostability, substrate and cofactor specificity, enantiose-lectivity, and reaction rate of natural enzymes. Many of thesewere engineered for applications in asymmetric organicsynthesis, and include transaminases, enoate reductases,esterases, monoamine oxidases, dehalogenases, and aldolases,as well as cytochrome P450s (oxidations and epoxidations)and Baeyer–Villiger monooxygenases. The topic has recentlybeen the subject of several excellent reviews.[87,110–113]

    2.3. Natural Evolution and Enzyme Redesign

    Nature has experimented with ways to generate newcatalytic functions for billions of years. The study of thesestrategies can provide us with insights that can be used tomake educated mutations to native active sites with the goalof eliciting new functions. Enzymes that belong to mechanis-tically diverse superfamilies are valuable starting points forsuch redesign efforts, particularly when they are structurallyconserved among one another. Members of such superfami-lies are frequently also promiscuous and one enzyme oftencatalyzes a number of chemical transformations, albeit atmuch lower rates than its physiological reaction.[16, 114]

    One successful redesign approach has been to enhancepromiscuous functions based on sequence and structure

    Figure 1. Three strategies for creating protein libraries by directedevolution. a) Random mutagenesis across the full sequence. b) Tar-geted mutagenesis that is focused on a specific site. c) Proteinsequence recombination for the replacement of entire segments.Reprinted from Ref. [19], with permission.

    .AngewandteReviews

    K. N. Houk et al.

    5706 www.angewandte.org � 2013 Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim Angew. Chem. Int. Ed. 2013, 52, 5700 – 5725

    http://www.angewandte.org

  • alignments. Here, the redesign is based on a template enzymewith innate activity for the target reaction. Information ona naturally existing enzyme that is known to promote thetarget reaction is then applied for the redesign of the templateenzyme. Fersht and co-workers, for example, compared thesequence of N-acetylneuraminate lyase (NAL) to that of thehomologous dihydrodipicolate synthase (DHDPS) and iden-tified a Leu-Arg mismatch in the active site.[115] TheLeu142Arg mutant was made (along with a number ofstabilizing mutations for the new Arg). This switched theactivity of NAL from its native retro-aldol cleavage (N-acetylneuraminate to pyruvate and N-acetyl-d-mannos-amine) to that of DHDPS (condensation of pyruvate withl-aspartate-b-semialdehyde). The native retro-aldol activityof NAL was abolished, while the rate of the innate DHDPS ofthe NAL was increased eightfold. Similarly, a Leu to Argmutation switched the physiological activity of 4-oxalocrot-onate tautomerase (4-OT) to that of trans-3-chloroacylatedehalogenase (CaaD).[116] Structural studies revealed onlyminor geometric changes. the kcat value of the CaaD activityof the 4-OT was increased 9-fold; kcat/KM increased 50-fold. Asomewhat more ambitious study introduced four mutationsinto the active site of keto-l-gulonate 6-phosphate decarbox-ylase (KGPDC) to increase the rate of its promiscuousactivity for the d-arabinose-hex-3-ulose 6-phosphate synthase(HPS) 170-fold.[117, 118]

    Although similar to the above, the redesign of an enzymetowards a reaction for which it does not possess anypromiscuous activity is a grander challenge. Sequence andstructure alignments with members of the same superfamilyhere too form the basis for redesign. Gerlt and co-workerscombined a rational mutation with directed evolution for theredesign of l-Ala-d,l-Glu epimerase (AEE) and muconatelactonizing ezyme (MLE), respectively.[119] The efforts wereaimed at introducing OSBS (o-succinyl benzoate synthase)activity into AEE and MLE, neither of which showspromiscuity towards the OSBS reaction. The feat wasachieved by altering the substrate specificity: a single muta-tion (Asp-Gly and Glu-Gly) allows AEE and MLE to acceptthe OSBS substrate, which readily reacts with the unchangedcatalytic residues to give o-succinyl benzoate. Ohta and co-workers went a step further and generated a-aryl propionateracemase activity in the homologous aryl malonate decar-boxylase (AMD) by introducing a catalytic acid/base into theactive site (Gly74Cys).[120] In a separate study, the enantiose-lectivity of that same decarboxylase was inverted by usinga double mutant (Gly74Cys, Cys188Ser).[121] The mutant gives(R)-a-thienyl propionate in a yield of 60 % and an enantio-meric excess of 84% ee, but also displays an approximately600-fold lower activity than the wild-type AMD. Randommutagenesis improved the kcat value 10-fold and decreasedthe gap to the wild-type AMD to a 60-fold drop in activity.[122]

    Dunaway-Mariano and co-workers impressively showed thatfunction can be transplanted within the crotonase superfamilyby replacing a His–Asp dyad with a Glu–Glu acid/basepair.[123] Two glutamates were introduced into the 4-CBA-CoA dehalogenase active site and six additional mutationswere necessary to give a fully soluble and stable protein. Witha kcat value of 0.064 s

    �1, the octamutant activity is far below

    that of the wild-type crotonase (kcat = 1000 s�1), but the

    exercise shows that “an entirely new catalytic pathway canbe created at the expense of the pre-existing pathway througha limited number of amino acid substitutions”.

    The redesign of enzyme superfamily members is useful indeciphering the strategies and principles that guide thenatural evolution of catalytic biomolecules. The chemicalversatility that is accessible to the protein engineer by thisroute, however, is limited to the generally narrow range ofreaction types within a superfamily. The crotonase super-family (CS) is a notable exception in which “nature has variedcommon structural features to evolve catalysts for a remark-ably diverse set of reactions”[124] spanning all six classes ofreaction defined in the Enzyme Commission (EC) classifica-tion scheme.

    2.4. Rational and De Novo Protein Design2.4.1. Design and Prediction of Protein Folds

    More drastic engineering approaches that are based ona variety of computational techniques have led to the redesignof entire proteins. Early work in the field focused on theredesign of helical bundles,[125] and employed strategies thataim at generating specific hydrophobic/hydrophilic pat-terns—a primary determinant for the orientation and registerof helical bundles. The approach gave some sense of controlover the formation of a fold without necessitating theprediction of specific side-chain orientations.[126–128] Furtherwork extended the computational design approach to proteinstructures with less-regular geometries.[129] The generalapplicability of computational protocols, such as that ofRosettaDesign, was tested by re-engineering a diverse set ofnine small globular proteins.[130] The computational design ofproteins of complex topology is assisted by techniques such asdead-end elimination and Monte Carlo sampling that canattempt to pack side chains in their minimal energy positions.The scope of computer-based engineering is not limited to theredesign of existing topologies. Kuhlman et al., for example,iterated between sequence design and structure prediction toaccess novel protein folds, and produced the Top7 a/btopology.[131] In contrast to previous design procedures thattreated the backbone as rigid and require a vast conforma-tional space to be sampled, the design of Top7 was possible inpart because of a flexible backbone minimization step in theiterative protocol.

    2.4.2. Protein–Protein Interactions

    Computational approaches have also been employed forthe design of protein–protein interactions. Huang et al.achieved micromolar binding affinities by using a designminimization approach in which the best amino acid identitiesand rotamers were predicted for the protein–protein inter-face.[132] Improved affinities were obtained when the naturallyoccurring protein–protein interfaces were used as a guide.[90]

    Key residues that are thought to account for the bulk of thebinding affinity are chosen as “hot spots” and placed inlocations likely to maximize binding. The rest of the interface

    Computational Enzyme DesignAngewandte

    Chemie

    5707Angew. Chem. Int. Ed. 2013, 52, 5700 – 5725 � 2013 Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim www.angewandte.org

    http://www.angewandte.org

  • is then “filled in” to maximize packing around these keyinteractions and yielded a binding affinity of 130 nm. Thesomewhat more challenging task of designing a single binderto a fixed, biologically relevant partner gave rise to a compu-tational design with a binding affinity of 200 nm.[133] Directedevolution further improved the binding constants to 180 pmand 4 nm. Analysis of the mutations suggest that the computa-tional designs could be improved by accounting for backboneflexibility, as well as improved electrostatic and solvationmodels.[133] More recently, DeGrado and co-workers utilizedtheir computational design approach CHAMP (computedhelical antimembrane protein method)[134] to produce a helicalb peptide that targets a transmembrane helix of the integrinaIIbb3.

    [135] The DeGrado research group further showcased theutility of computational design approaches by generatinghelical protein assemblies along carbon nanotubes.[136]

    2.4.3. DNA Binders

    The design of DNA binders is another interestingdirection of computational protein design. One technique isto combine preexisting DNA binding modules by redesigningthe intermodule interfaces, thereby reducing the problem toa design of protein–protein interactions.[137] More targetedchanges were made in the computational redesign of homingendonucleases that can recognize a single base pair differ-ence.[138,139] The design for recognition of multiple base pairchanges has also been demonstrated.[140] The simultaneousintroduction of multiple adjacent base pair changes provedmore successful than a stepwise combination of mutationsfrom individual base pair changes. The design of suchsequence specificity changes has been used in the case ofthe homing endonuclease I-AniI to probe the role of DNAsequence in binding and catalysis.[141]

    2.4.4. Protein–Ligand Interactions

    Early work on protein–small-molecule binding appearedpromising, with reports of binders for metal,[142,143] lactate,[144]

    serotonin,[144] TNT,[144] and nerve agents.[145] However, doubtsarose when the periplasmic binding proteins designed forlactate, serotonin, TNT, and nerve agents did not show ligandbinding when assayed by isothermal calorimetry (ITC) orNMR spectroscopy.[146] It is thought that the initial reports ofsuccess may have arisen from the reliance on an indirectenvironmentally sensitive fluorescence-based readout. Whilenot a solved problem, some progress has been made onprotein–small-molecule binding. Boas and Harbury appliedcomputational design to periplasmic binding proteins andfound that native site recapitulation required high-resolutionrotamer sampling, continuous minimization, and accurateelectrostatic calculations.[147,148] DeGrado and co-workerswere able to create an a-helical bundle which was able tobind a heme-like cofactor.[148] Recent attempts at redesigninga dipeptide binder, on the other hand, were unsuccessful,presumably because of inadequately accounting for thebinding site flexibility.[149]

    2.4.5. Catalytic Peptides and Proteins

    Early examples for the de novo design of chemicalfunctions include, but are not limited to, the following studies:

    Johnsson et al. designed a metal-free oxaloacetate decar-boxylase (oxaldie) that operates through an imine mechanismby incorporating a reactive amine onto an amphiphilic a-helix.[150] Designed oxaldies catalyze the decarboxylation ofoxaloacetate with a kcat/KM value of 0.63m

    �1 s�1. The rate ofimine formation is found to be three to four orders ofmagnitude larger with oxaldie than with simple aminecatalysts and comparable to catalytic antibodies (103–106).[151]

    Sasaki and Kaiser designed “helichrome”, an artificialhemeprotein, in which four amphiphilic a helices werecovalently tethered to one face of the porphyrin ring tocreate a hydrophobic pocket for substrate binding. The FeIII

    complex of helichrome showed hydrolase activity and con-verted aniline into p-aminophenol in the presence of NADPHwith a kcat/KM value of 1.67m

    �1 s�1.[152]

    Broo et al. designed a hairpin helix–loop–helix motif thatdimerizes to form four-helix bundles and that utilizes histidineresidues to catalyze the acyl-transfer reaction of activatedesters.[153] Rossi et al. inserted two and four copies of theartificial triazacyclononane amino acid into three distincthelix–loop–helix peptides. They generated ZnII binding sitescapable of catalyzing the transesterification of an RNA modelsubstrate up to 380-fold.[154]

    Dutton and co-workers used a tryptophan and a tyrosineradical maquette, a3W

    1 and a3Y1,[155] as models of radical

    enzymes to study how side-chain radicals are generated,controlled, and directed towards catalysis.[156] In more recentwork, Pecoraro and co-workers utilized a3D as a scaffold forthe placement of three cysteine residues that are capable ofbinding the heavy metal ions CdII, HgII, and PbII.[157]

    DeGrado and co-workers described the catalysis of an O2-dependent phenol oxidase reaction by de novo diiron modelproteins based on the four-chain heterotetrameric helicalbundle DFtet.

    [158] The most active variant catalyzes theoxidation of 4-nitrophenyl acetate with a 1000-fold rateenhancement.

    Bolon and Mayo used a “compute and build” strategy toincorporate hydrolase activity onto a catalytically inert E. colithioredoxin scaffold. The resulting PZD2 utilizes a nucleo-philic histidine to promote the hydrolysis of p-nitrophenylacetate 180-fold.[159]

    3. The Inside-out Approach to ComputationalEnzyme Design

    In recent years, computational algorithms have becomeincreasingly reliable for identifying amino acid sequencescompatible with a target tertiary structure. Efforts towardssolving the inverse protein folding problem[160–163] reacheda milestone with the design and successful experimental proofof the structure of the 93-residue a/b protein Top7.[131] Thisshowed that, for an arbitrary fold, it is possible to usecomputational methods to predict sequences that wouldproduce that stable fold. While a great deal remains to be

    .AngewandteReviews

    K. N. Houk et al.

    5708 www.angewandte.org � 2013 Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim Angew. Chem. Int. Ed. 2013, 52, 5700 – 5725

    http://www.angewandte.org

  • done in this area, another great challenge is to createfunctional proteins that can promote non-natural chemicalreactions.

    A collaborative effort between the research groups ofBaker and the Houk has led to the development of an “inside-out” protocol towards this goal (Figure 2). At the core of the

    computational design protocol is a theoretical active site(theozyme, Figure 2, top panel) with the appropriate func-tionality for catalysis. Here, quantum mechanical (QM)calculations are employed to determine the catalytic unitsthat will be most effective at stabilizing the transition state(TS) in a precise geometrical arrangement. Protein scaffoldsare selected from the PDB (http://www.rcsb.org)[164] and areused as templates into which the QM transition-stategeometry is grafted (RosettaMatch,[165] Figure 2, centerpanel). Amino acid residues surrounding the QM theozymeare mutated and optimized to ensure good packing and foldstability, and to complement the geometric and electronicfeatures of the TS (RosettaDesign, Figure 2, bottom panel).

    3.1. Theozymes

    In the first step of the inside-out design protocol, QMcalculations are carried out to generate three-dimensionalarrangements of functional groups that are optimal forstabilizing the TS of the targeted reaction.[166] A theozyme

    (short for theoretical enzyme) is typically constructed from anarray of amino acid side chains and backbone amides, butincorporation of unnatural amino acids and cofactors canfurther expand the chemical space. For a given reaction,a number of distinct theozyme motifs are usually generated,each of which varies in the composition of its functionalgroups. The energy profile of each motif is computed and themagnitude of catalysis is assessed. The theozyme motifs arefurther diversified geometrically by producing an ensemble ofconformations without disrupting the catalytic interactions.

    3.2. Incorporating Theozymes into Protein Scaffolds

    RosettaMatch has been used to search the native activesites of existing protein structures for backbone positions thatcan accommodate the three-dimensional side-chain arrange-ment in a theozyme. The program “matches” the theozymemotif into the pocket by sequentially attaching each sidechain of the theozyme to the backbone of the protein scaffold.Side-chain rotamers are generated for every position in thescaffold active site to which the functional groups of thetheozyme are mapped. An ideal match is then one in whichthe exact three-dimensional geometry of the theozyme can berealized. Deviations from the optimum geometry by just a fewtenths of an Angstrom and single-digit angles can lead toenergetic penalties of up to 5 kcalmol�1, which translates tofour orders of magnitude in terms of the reaction rate (kcat). Inpractice, an ideal match has not yet been obtained for any ofthe designed enzymes; a circumstance that can be attributedto the discrete nature of both the protein backbone and theprimary matching algorithm as well as to the computationalcost associated with the mapping out of conformational space.Matching then quickly becomes a bottleneck in the computa-tional design protocol, particularly when a theozyme invokesthree or more catalytic residues. Hence, an exact searchtypically does not give a single match and it becomesnecessary to assign tolerance values to catalytic distances,angles, and dihedrals. The resulting matches are generallydistorted from the theozyme geometry and necessitate someform of geometric filtering and ranking according to theirtheozyme-likeness. A useful utility for this purpose is EDGE(enzyme design geometry evaluation), which uses geometrichashing to compare theozyme atoms with a target structureand ranks matches based on the summation of their devia-tions.

    SABER (selection of active/binding sites for enzymeredesign), a program developed by Houk and co-workers,offers an alternative to RosettaMatch: instead of placingtheozymes into predefined active sites, SABER searches theProtein Data Bank (PDB) for proteins with the appropriatecatalytic functionality already in place. When a suitable activesite is found, only those amino acid residues need to bemutated that are required to accommodate the new substratein its transition-state geometry. This stands in contrast to theRosettaMatch-based approach, where both the new catalyticfunctionality and the new substrate must be accommodated,generally requiring a larger number of mutations than theSABER-based approach.

    Figure 2. Key steps in the computational inside-out design protocol(shown here for the Kemp elimination): from QM theozyme, to match,to design.

    Computational Enzyme DesignAngewandte

    Chemie

    5709Angew. Chem. Int. Ed. 2013, 52, 5700 – 5725 � 2013 Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim www.angewandte.org

    http://dx.doi.org/10.1016/0014-5793(76)80847-2http://www.angewandte.org

  • 3.3. Active-Site Design

    After the theozyme has been attached to a scaffoldprotein, either by RosettaMatch or by SABER, the Roset-taDesign module is used to restrain catalytic residues and togenerate an optimal sequence/structure for the remainder ofthe active site. Rotamer sampling by Monte Carlo simulatedannealing is used to optimize the identity and position ofactive-site residues, both in terms of their interactions withthe theozyme and also with each other. To further refine theactive site, this rotamer sampling is performed for multiplerounds, interspersed with minimization of the side chains,backbone, substrate conformation, and rigid body position.Throughout the process, the theozyme geometry is enforcedthrough restraints. To ensure that the resultant sequence isintrinsically compatible with the theozyme, rather than beingexternally forced, a last cycle of repacking and minimizationwithout the geometric restraints is commonly run.[167] Ideally,these steps lead to the introduction of amino acid residuesthat add interactions to stabilize the positions of the keycatalytic residues, tune their pKa values, and optimize tran-sition-state binding. In practice, each match that enters theactive-site design stage contains a theozyme that is alreadysignificantly distorted compared to the ideal QM TS geom-etry. RosettaDesign is then tasked with generating the bestpossible stabilization for a geometry that in itself is non-ideal.While RosettaDesign attempts to constrain the design to theideal theozyme, as specified by the geometric restraints,normally even the highest ranked final designs differ quiteconsiderably from the original theozyme geometry. Figure 3illustrates this point with four Kemp elimination designs thatare superimposed onto the catalytic heavy atoms of theirtheozyme. The individual side chains cluster together in theirgeneral three-dimensional arrangements (Figure 3a), but lackthe precise positioning that naturally evolved enzymes displaywithin a catalytic class (Figure 3 b).[168]

    3.4. Filtering, Ranking, and Evaluating Computational Designs

    Prior to the experimental workup, final designs areassessed towards their capability to stabilize the key catalyticresidues. They are ranked on the basis of empirical criteriasuch as Rosetta energy, ligand-binding scores, hydrogenbonding, active-site geometry, and packing scores. Compar-ison with the original scaffold protein plays an important role,as the native context forms a reference for what a well-foldedprotein looks like. Thus far, assessing the quality of finaldesigns has relied heavily on the chemical intuition of thehuman designer for assessing how “enzyme-like” prospectivedesigns are and for capturing properties that are currently notaccounted for by the Rosetta scoring framework. Nature�scatalytic units are generally supported by frameworks ofhydrogen bonds, steric packing, p–p stacking, limited dynam-ics, and limited water accessibility. At present, some of this isimplicitly accounted for through various energy scores thatpenalize poor interactions and reward good ones throughoutthe design and repacking process. The explicit provision ofsupporting interactions for the catalytic unit can be viewed as

    a second, and in a sense more challenging, stage in the designprocess, for which we are only now beginning to establishautomated protocols. Increasingly, tools such as Foldit,EDGE, various in-house scripts, and more rigorous computa-tional tests that probe the dynamics of the systems are beingdeveloped and refined with the goal of maximizing thesuccess rate, particularly as more challenging reactions arepursued.

    The design of Kemp eliminases and retro-aldolases, forexample, was carried out with the first version of Rosetta-Match and RosettaDesign. The scaffold set consisted of only87 proteins, only a discrete matching algorithm was available,and the backbones of the proteins were treated as rigid. Theassessment of designs was performed by manual inspection ofthe optimized final structures. The design of proteins towardscatalysis of a bimolecular Diels–Alder cycloaddition wascarried out with an updated version of Rosetta, using thediscrete matching algorithm against a scaffold set of 227proteins. Final designs were assessed both by manualinspection of the optimized geometries and by moleculardynamics simulations.

    The current collection of Rosetta modules (Rosetta3)extends the scaffold set to the entire PDB, introducesa secondary, nondiscrete matching algorithm, which comple-ments the primary one, and allows a small degree of backboneplasticity in response to a new active-site sequence. MDsimulations were found to be valuable for assessing thestructural integrity of a newly designed active site and for

    Figure 3. Geometric overlay of catalytic atoms. a) Theozyme (black/orange) over four final Rosetta designs in the TIM barrel fold (lightgreen). The catalytic heavy atoms are highlighted as spheres. RMSDvalues of KE designs with the His–Glu/Asp dyad compared totheozyme: 1.2 � for KE70 (second most active), 0.8 � for KE38(inactive), 0.8 � for KE54 (inactive), 0.5 � for KE66 (inactive). b) Cata-lytic triad from esterase crystal structures; RMSD= 0.45 � within thesame fold.[168]

    .AngewandteReviews

    K. N. Houk et al.

    5710 www.angewandte.org � 2013 Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim Angew. Chem. Int. Ed. 2013, 52, 5700 – 5725

    http://www.angewandte.org

  • exposing design flaws that are intractable from static evalua-tions.[169]

    The sequence of a final design generally differs by 10% ormore from that of the wild-type template protein. Dependingon the degree of the perturbation, the packing and hydrogen-bonding interactions within the modified protein are expectedto be less ideal than those of the wild-type scaffold protein.Cycling through repacking and geometry optimization duringthe design process ensures that the overall conformation ofthe new protein is at a minimum of its potential energylandscape. However, neighboring minima may exist (corre-sponding to alternative conformations of side chains andloops) that could have become thermodynamically morefavorable in the design process. The actual conformationalstate of a designed active site might thus differ significantlyfrom that of the computational model—a possibility that canreadily be investigated through molecular dynamics (MD)simulations. MD evaluations are now performed on a routinebasis for finalized designs as a means to pinpoint structuralweaknesses and to guide adjustments in the form of additionaland/or alternative mutations.

    3.5. Experimentation

    Aside from the source of the genes (chemical synthesisversus cloning), experimental validation of computationallydesigned enzymes is much the same as activity measurementsfor any other enzyme.

    3.5.1. Synthesis and Expression

    In the case of the retro-aldolases,[170] the Kemp elimi-nases,[170, 171] and the Diels–Alderases,[172] the final optimizedprotein sequences were sent to a commercial gene synthesiscompany for typical codon optimization and cloning intoa standard His-tagged E. coli expression vector. E. coliBL21(DE3) cells were then transformed with the plasmid,and the gene expressed under conventional IPTG or auto-induction conditions.[173] Soluble protein can thus be obtainedby conventional IMAC purification,[174–176] along with gelfiltration.

    3.5.2. Enzyme Assays

    One potential complication in assaying the activity ofa designed enzyme is the low activity of most of the initialvariants. Assays that can detect slightly above backgroundlevels of activity are thus preferred to identify these weakcatalysts. Such assays are selected on the basis of the targetreaction. The Kemp eliminases of Rçthlisberger et al. and theretro-aldolases of Jiang et al. were designed for a reactionwith a spectrophotometic shift, and continuous monitoring byUV/Vis spectroscopy over the course of over 10 min or 40 h,respectively, allowed for the detection of product formation.In contrast, the Diels–Alderases of Siegel et al. were designedagainst a reaction that was spectrophotometrically silent, soproduct formation was monitored by LC-MS, with time pointstaken over the course of several days. In this case, a chiral LC-

    MS assay allowed for further characterization of the stereo-specificity of the reaction, which showed that the catalyst wasspecific for the product configuration selected at the theo-zyme stage.

    3.5.3. Directed Evolution

    Typically, the initial successful designs have low activity.This low starting activity has been further improved throughmultiple rounds of directed evolution. A combination ofrandom mutagenesis and targeted diversification has yieldedimproved activities. Further computational analysis also fedinto this work, thereby allowing for the selection of potentialmutations (including insertions) to incorporate during therounds of selection. Specific examples are highlighted inSection 4.2.3.

    4. Computational Enzyme Design—Achievements

    4.1. Retro-Aldolases

    The design of a novel retro-aldolase is the first example inwhich the computational inside-out approach was employedto construct a functional active site. The resulting retro-aldolases catalyze the C�C bond breaking in 4-hydroxy-4-(6-methoxy-2-naphthyl)-2-butanone.[170] Analogous to the strat-egy used by type I aldolases, the reaction mechanism invokesa nucleophilic lysine and the formation of an iminiumintermediate (Scheme 8).[177]

    The computational designs are based on four distincttheozymes (Figure 4). They feature a lysine as a Schiff baseand a general acid/base (I: Lys/Asp dyad, II: Tyr, III: His/Aspdyad, IV: H2O) for deprotonation of the b alcohol. Thecharged side chain (Lys-Asp-Lys) mediated proton transfer

    Scheme 8. Steps in the amine-catalyzed retro-aldol reaction of4-hydroxy-4-(6-methoxy-2-naphthyl)-2-butanone.

    Computational Enzyme DesignAngewandte

    Chemie

    5711Angew. Chem. Int. Ed. 2013, 52, 5700 – 5725 � 2013 Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim www.angewandte.org

    http://www.angewandte.org

  • scheme in motif I is analogous to that with d-2-deoxyribose-5-phosphate aldolase.[178] Motifs II, III, and IV mimic the activesites of catalytic antibodies, in which a lysine is placed intoa hydrophobic pocket to lower its pKa value.

    The geometries of the four active-site motifs wereobtained from QM theozyme calculations, which were carriedout for every step along the retro-aldol reaction path. Thetransition-state geometries were then combined to generatea composite active site that carries the geometric informationof the complete reaction profile. The resulting consensustheozymes of the four motifs were further diversified byvarying a) the internal degrees of freedom of the compositetransition state, b) the orientation of the catalytic side chainswith respect to the composite transition state within rangesconsistent with catalysis, and c) the conformations of thecatalytic chains. For each motif, a set of 1013–1018 uniqueactive-site geometries was generated. The hashing algorithmwithin RosettaMatch[165] was used to search for placements ofthese into the binding pockets of 71 protein scaffolds. Around180 000 distinct solutions (matches) were found. Rosetta-Design was subsequently utilized to optimize the active-sitesequence for optimal packing around the composite transitionstate and the catalytic lysine. A total of 72 designs in 10different scaffolds were selected for experimental character-ization. The final selection criteria were based on a) thepredicted binding energy of the transition state, b) the extentto which the catalytic geometry was satisfied, c) the packingaround the active lysine, and d) the consistency of side-chainconformation after side-chain repacking in the presence andabsence of the composite transition state.

    70 of the 72 proteins were soluble when expressed andpurified from Escherichia coli, and a respectable 32 showeddetectable retro-aldolase activity. Product formation wasmonitored with a fluorescence-based assay. The active designsspan five different protein scaffolds (1mw4, 1f5j, 1thf, 1i4n,1a53) from the triose phosphate isomerase (TIM) barrel andjelly-roll folds, and are based on the active-site theozyme

    motifs III and IV. The designs in the relatively open jelly-rollscaffold show simple linear kinetics, whereas the TIM barreldesigns with more enclosed active-site pockets displayedmore complex kinetics—a potential indication of restrictedsubstrate access and/or product release. Two apo structures(the S210A variant of RA22 and the M48K variant of RA61)were solved at 2.2 and 1.9 � resolutions, respectively.[170] Thebackbone geometries and side-chain orientations are inexcellent agreement with those of the designs. Respectablerate enhancements of up to four orders of magnitude wereachieved. However, even the best computational design fallstwo to three orders of magnitude short of the rate enhance-ment (kcat/kuncat) that is achieved by comparable catalyticantibodies.[60, 61] The catalytic efficiencies (kcat/KM) of thedesigns range between 0.02 and 0.74m�1 s�1 and are modest,particularly when compared to those of natural enzymes.

    In an effort to shed light on the performance discrepancyof computationally designed enzymes relative to catalyticantibodies, Ruscio et al.[179] studied the influence of structuralfluctuations of the protein on the active-site preorganizationof RA22 by using molecular dynamics. The authors found thatan alternative orientation of the substrate with respect toHis233 is optimal for the nucleophilic attack by Lys159. Theyfurther note that the His233–Asp53 dyad is disrupted due tothe solvation of Asp53, which in turn provides conformationalflexibility to His233, thus affecting its interaction with thesubstrate. The authors attributed the comparatively lowactivity of RA22 to these dynamic distortions in thedeprotonation step of the reaction.

    Lasilla et al. recently showed that the designed interac-tions of water with Tyr78 and Ser87 in RA61 do notcontribute to catalysis.[180] Activity is instead largely attrib-uted to the nucleophilic character of the catalytic lysine(pKa = 6.8–7.5) and to the favorable interaction energybetween the enzyme and the naphthyl group of the substrate.

    4.2. Kemp Eliminases4.2.1. Computational Designs

    The Kemp elimination (Figure 5a) is a well-studied ring-opening reaction that is initiated by deprotonation of thesubstrate. The reaction serves as a model for the biochemi-cally relevant abstraction of a proton from carbon centers,although it does not have a natural counterpart. The reactionhas become an attractive target for catalyst design, rangingfrom catalytic antibodies[51] to “synzymes”.[181] Most recently,DeGrado and co-workers employed a minimalist designapproach to endow calmodulin with Kemp eliminationactivity.[182]

    The rate of the Kemp elimination depends strongly on themedium when a carboxylate functions as the general base, andrate accelerations of 107 can be achieved by simply placingacetate in a polar aprotic solvent such as acetonitrilecompared to placing it in water.[183] An additional accelerationof 106 can be achieved through precise positioning of thedonor and acceptor for this reaction,[53, 183,184] thereby givinga theoretical limit for the rate enhancement of 1013 for theKemp elimination.

    Figure 4. Retro-aldolase theozyme motifs.

    .AngewandteReviews

    K. N. Houk et al.

    5712 www.angewandte.org � 2013 Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim Angew. Chem. Int. Ed. 2013, 52, 5700 – 5725

    http://www.angewandte.org

  • In the first example of active-site design towards catalysisof an unnatural reaction, Rçthlisberger et al. used the inside-out protocol to produce eight active enzymes that promotethe base-catalyzed ring opening of 5-nitrobenzisoxazole (5-NBZ).[171] Two distinct theozymes (Figure 5b) were employedin the process, and gave rise to catalysts with rate enhance-ments of up to 105. A crystal structure was solved for KE07, anactive design with kcat/KM = 12.2m

    �1 s�1, and superimposedwell on the computational model.

    The kinetic parameters of these eight computationalKemp elimination designs are comparable to those ofcatalytic antibodies. The rate enhancements (kcat/kuncat)range from 103 to 105 and compare well with the kcat/kuncat value of 10

    4 displayed by catalytic antibodies 34E4 and35F10.[51] In terms of substrate binding, on the other hand, thetwo catalytic antibodies outperform the eight computationaldesigns up to 10-fold (antibody KM of 0.6 to 0.1 mm comparedto a range of 4.2 to 0.6 mm for the designs). Three of thecomputational designs were further enhanced by in vitrodirected evolution. The kcat/KM value of KE07 was improved200-fold,[185] that of KE70 over 400-fold,[186] and in the case ofKE59 the kcat/KM value was increased over 2000-fold.

    [187] Thestudies demonstrate how computational protein design can beused to generate enzymes with modest activities that can thenbe further optimized through directed evolution approaches.

    4.2.2. Computational Analyses of De Novo Kemp Designs4.2.2.1. PDDG/PM3 Monte Carlo Study

    Alexandrova et al. described the analysis of the fouractive Kemp elimination designs KE07 (258 residues), KE10(253), KE15 (258), and KE16 (258) by mapping out thereaction coordinate with a semiempirical PDDG/PM3 QM/MM Monte Carlo approach.[188] The computational setupconsisted of 200 residue cutaways of the four designs. Thesemiempirical QM part consisted of the 5-NBZ substrate andthe catalytic base (Glu/Asp). Water molecules were not

    included in the PM3 region. Side-chain motions were sampledwhile the protein backbones were held fixed. The attempt togain insight into what governs the observed activities and toestablish a correlation between the computed and experi-mental barriers was met with limited success. The computedbarriers were plagued by large absolute error bars and bya trend that was opposite to what was found experimentally. Itshould be noted, however, that within the series of fourdesigns that was chosen for this study, the free energies ofactivation (DG�) span a range of merely 0.9 kcal mol�1—toonarrow to be picked up by most modern QM methods.

    4.2.2.2. DFT-Based Approaches

    Density functional theory (DFT) calculations wereemployed to study six active and four inactive Kempelimination designs with free energies of activation (DG�)ranging from 18.3 to 20.6 kcalmol�1 (actives) and to DG��23.2 kcal mol�1 (inactives).[169] Three modeling approacheswere explored, ranging from a minimalistic representation ofthe catalytic units (Figure 6, upper right), to QM on the full

    active sites, and to computations on the entire protein systemsafter a short MD simulation (Figure 6, left) in which the activesite and selected water molecules were treated with QM(Figure 6, bottom right).

    Qualitatively, the full-protein MD-QM/MM approachcompared best to experiment; the computed barriers forinactive designs were significantly higher than those of activedesigns. Aside from the qualitative agreement, however, theapproach shows only a weak correlation with the experimen-tally determined energy barriers (R2 = 0.58), thus indicatingthat significant contributions to catalysis also escape thiscomputational model. A lesson from these studies is thatcomputing energy barriers for base-catalyzed reactions suchas the Kemp elimination, necessitates an explicit treatment ofsolvent molecules and other polar groups as part of the QMcalculations

    Figure 5. The Kemp elimination. a) Reaction scheme of 5-nitrobenz-isoxazole. b) The two theozymes that were employed.

    Figure 6. Modeling approaches for analysis of Kemp designs rangefrom QM on the catalytic unit (top right; with circled backbone heavyatoms frozen) to full enzyme ONIOM QM/MM after 2 ns MD (QMlayer shown as sticks at bottom right). Modified from Ref. [169].

    Computational Enzyme DesignAngewandte

    Chemie

    5713Angew. Chem. Int. Ed. 2013, 52, 5700 – 5725 � 2013 Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim www.angewandte.org

    http://www.angewandte.org

  • 4.2.2.3. Empirical Valence Bond (EVB) Calculations

    Warshel and co-workers used a two-layer EVB approachto evaluate Kemp designs. They applied free energy pertur-bation umbrella sampling (FEP/US) calculations on designsKE07 (and directed evolution variants),[189] KE70, KE59, andHG-2.[190] The EVB setup was calibrated to reproduceab initio calculations of the reaction surface in a solventcage and then applied to obtain free energies of activation(DG�). For many systems, these are in exceptional agreementwith experimentally determined values, yet for others (e.g.KE59, HG-2) the deviations are significant (Table 1).

    4.2.2.4. Active-Site Dynamics from MD Simulations

    Molecular dynamics simulations of 23 Kemp eliminases(14 active, 9 inactive), although of a rather qualitative nature,were more conclusive than previous computational studies.Analysis of the simulation data showed that the failedcomputational designs are unable to maintain essentialactive-site hydrogen bonds.[169] This becomes particularlyclear in the example of the inactive KE38 (Figure 7c).Compared to the catalytic His–Asn contact in the naturallyevolved cathepsin K (Figure 7a) and the catalytic His–Aspdyad in the active KE70 (Figure 7 b), there is no significantpopulation in which the KE38 His–Glu dyad is intact, and Hisalone is too weak a base to deprotonate the substrate on itsown.

    Overall, a disassembly of the designed catalytic contactswas observed to occur through a combination of two factors:excessive solvent accessibility and alternative side-chainpacking, both of which give rise to distinct distributionpatterns. This observation is relevant to rational enzymedesign in general, but particularly for the catalysis of reactionsthat depend on a carboxylate base, as they are usuallysensitive to polar protic solvents such as water. Solventmolecules that come into direct contact with the carboxylateoxygen atoms can significantly reduce their base strength (upto 106 in terms of kcat) and Figure 8a shows this trend fora cross-section of the dataset. On average, the active sites offunctional designs are less hydrated than those of inactivedesigns (Figure 8b), but even the microenvironments of themost active designs are still far from those of naturallyevolved acid/base catalysts such as cathepsin K (outermostright column in Figure 8b).

    Taken together, active designs can be discerned frominactive ones when a multidimensional problem can besimplified to a two-dimensional model (Figure 9).

    What transpires then from this study is that by queryingthe dynamics of a protein–substrate complex in the presenceof explicitly represented solvent molecules, and by askingspecific questions based on chemical intuition, one can gathera wealth of information about the system at hand and relatethat to experimental observables. On this basis, it has becomea useful approach to combine MD-based analyses with the

    Table 1: EVB activation free energies for computational Kemp elimi-nases.[189,190]

    System PDB entry Base DG�exp[a] DG�EVB

    [a]

    HG-2 (S265T) NA Asp127 17.7 18.234E4 antibody 1vol GluH50 17.9 17.3KE59 NA Glu231 18.3[b] 31.7HG-2 3nyd Asp127 18.5[c] 24.3KE70 3npu His16-Asp44 18.5 19.31A53-2 3nyz Glu178 20.0 20.7KE07 2rkx Glu231 20.1 19.5

    [a] In kcalmol�1. [b] Computed with kcat = 0.29 s�1.[171] [c] Computed from

    an extrapolated kcat = 0.22 s�1.[191] NA= not available.

    Figure 7. Angle (q) versus distance (d) scatter plots of the catalyticcontact. a) His–Asn contact of the naturally evolved cathepsin Kcatalytic triad; b) His–Asp contact of the active design KE70; c) His–Glu contact of the inactive design KE38. Data points are from 20 nsMD simulations. Three hydrogen bond categories[192] are outlined withdotted lines. The individual distributions are projected onto the axes.The progression of the catalytic contact from QM theozyme, to finaldesign, and the fully relaxed MD starting geometry is plotted withfilled, half-filled, and empty circles, respectively. Modified fromRef. [169].

    .AngewandteReviews

    K. N. Houk et al.

    5714 www.angewandte.org � 2013 Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim Angew. Chem. Int. Ed. 2013, 52, 5700 – 5725

    http://www.angewandte.org

  • design and refinement of new enzymes and the interpretationof results from directed evolution experiments.

    4.2.3. Directed Evolution of Kemp Eliminases KE07, KE70, andKE59

    Tawfik and co-workers combined directed evolutionmethods with rational design and were able to furtherimprove the catalytic activities of three computationallydesigned Kemp eliminases.

    4.2.3.1. KE07

    Seven rounds of random mutagenesis and selectionresulted in up to eight mutations and a 200-fold increase inthe kcat/KM value compared to the computationally designed“wild-type”-KE07.[185] The improvement resulted from a 2.6-fold lower KM and a 76-fold higher kcat value, which canlargely be attributed to the Ile7Asp mutation adjacent to the

    active site. The Ile7Asp mutation weakens the partial saltbridge between the catalytic Glu101 and Lys222 (Figure 10 a)in a dual fashion: Asp7 directly competes with Glu101 forLys222 and also recruits additional water molecules that candirectly interact with Glu101 (Figure 10b), effectively break-ing up the salt bridge and tuning the pKa value of the catalyticbase.[169,185]

    4.2.3.2. KE70

    A combination of computational optimization and ninerounds of random mutagenesis resulted in a > 400-foldincrease in the kcat/KM value (up to 12-fold lower KM and53-fold higher kcat).

    [186] The improvement was attributed totighter substrate binding, fine-tuned electrostatics (Fig-ure 11a,b), and stabilization of the catalytic dyad in anorientation optimal for catalysis (Figure 11c,d). Progressiverounds of directed evolution cause the “D loop” to becomeless mobile (Figure 11 c) and allow the catalytic dyad residues(His17 and Asp45) to form a stronger hydrogen bond(Figure 11 d).

    The active site of KE70 is based on theozyme II inFigure 5b. The interaction potential near the ideal distance(r0) of the His17–Asp45 contact can be approximated toa harmonic function. Thus, the energetic penalty for devia-tions from r0 is approximately proportional to (Dr)

    2. Further-more, by assuming a simple transition-state model and usingthe Eyring equation, ln(kcat) is proportional to the activationfree energy. The linear relationship between (Dr)2 and ln(kcat)(Figure 11 d) then suggests that the increase in the kcat value ofthe evolved variants results in a large part from the tighteningof the hydrogen bond between His17 and Asp45, as the activesite residues of the more evolved KE70 variants become moreoptimally placed and less mobile.

    In contrast to KE07, beneficial mutations were notexclusive to the second and third shell, but also includedfirst-shell residues.

    4.2.3.3. KE59

    The functional mutations that gave rise to a comparativelyhigh initial activity of this computational Kemp design alsocaused it to be one of the least stable. In contrast to KE07 and

    Figure 8. Water coordination distributions from MD simulations(d

  • KE70, a number of fold-stabilizing consensus mutations hadto be introduced prior to the directed evolution. KE59 wasthen subjected to 16 rounds of directed evolution, whichresulted in a > 2000-fold increase in the kcat/KM value, mostlythrough a significantly increased kcat value.

    [187] The mostproficient variant displayed a KM value of 37 mm, a kcat/KM value of 0.6 � 10

    6 s�1m�1, and a kcat/kuncat value of approx-imately 107—kinetic parameters that approach those ofnatural enzymes.

    KE59 is also the only design in the series that acceptsa variety of benzisoxazoles besides 5-NBZ. In fact, the largestoptimization was achieved for 5,7-dichlorobenzisoxazole,a significantly less activated substrate. The large improvementin the kcat value was attributed mainly to the ability of moreadvanced variants to more effectively exclude bulk solventfrom interacting with the catalytic base (Figure 12a). Sub-

    stituents at the 5- and 7-positions of the substrate were foundto be well-suited for this purpose (Figure 12b). Conversely,Glu230 can adopt an alternative and catalytically suboptimalconformation in which it interacts with the backbone-NHgroup of Ser210 and an average of four water molecules(Figure 12 c).

    4.3. Diels–Alderases

    Siegel et al. describe the inside-out computational designand experimental characterization of enzymes catalyzinga bimolecular Diels–Alder reaction (Figure 13a) with highstereoselectivity and substrate specificity (Figure 13b).[172] Nonaturally occurring enzymes are known that can catalyze thiscycloaddition. The catalytic motif was inspired by previouscatalytic antibody studies, where an Asp, Asn, and Tyr formedthe catalytic arrangement.[44, 46] The catalytic motif hereconsists of a Gln and two Tyr residues positioned such as tobind the bimolecular transition state leading to the 3R,4S-endo product. Two active proteins were produced: DA20 wasdesigned into a b-propeller scaffold and DA42 into the KSIscaffold (Figure 14).

    Computational evaluation methods (QM and MD) helpedrationalize experimental observations and guided adjust-ments to early designs that resulted in improved kineticparameters. A notable example is the development ofDA_20_10 from DA_20_00. Molecular dynamics simulationsof DA_20_00 (kcat = 0.1 h

    �1) show that the catalytic Tyr121can access a noncatalytic conformation in which it binds to thebackbone carbonyl group of residue 271 (Figure 15, red).Increasing the steric bulk at position 272 was proposed tointerfere with this interaction (Figure 15a, blue), allowing

    Figure 11. a) Crystal structure of the “wild-type” KE70 design. b) Crystalstructure of the round 6 variant R6 6/10A. Gly101Ser stabilizes Arg69in an alternative conformation that does not interfere with the Asp45–His17 catalytic dyad. c) Atomic fluctuation profiles from MD simula-tions. The active site residues (circles) and the catalytic dyad (stars)are labeled. Peaks correspond to loops with elevated flexibility. d) Thesquare of the deviation (Dr)2 from the ideal hydrogen-bond distance(1.8 � for this contact) versus �ln(kcat). Modified from Ref. [186].

    Figure 12. Number of water molecules within 3.2 A of either of theGlu230 carboxylate oxygen atoms (from MD simulations), plotteda) over all evolved KE59 variants with available kcat values for 5-nitro-benzisoxazole, and b) over all substrates with available kcat values forvariant R13. Error bars correspond to + /� the standard deviation ofthe MD-based distributions. c) The predominant conformation ofGlu230 as observed in MD simulations with 5-nitro-6-chlorobenzisox-azole (blue, substrate in orange) versus the conformation observed inthe crystal structures (green), shown here for the R13 3/11H variant.Modified from Ref. [187].

    .AngewandteReviews

    K. N. Houk et al.

    5716 www.angewandte.org � 2013 Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim Angew. Chem. Int. Ed. 2013, 52, 5700 – 5725

    http://www.angewandte.org

  • Tyr121 to assume the conformation required for binding andcatalysis. Figure 15b shows an overlay of the active sites ofDA_20_00, DA_20_10, and the QM theozyme to showcasethese observations pictorially. DA_20_10 was characterizedwith a kcat value of 2.1 h

    �1.The crystal structure that was solved for a variant of the

    DA_20 design superimposes well onto the computationaldesign. The catalytic efficiency is comparable to that pre-

    viously achieved by catalytic antibodies with equal or highercatalytic rates, but a relatively weak binding of the dienophile.

    4.4. Iterative Approach to “Inside-Out” Design of Enzymes

    The Houk and Mayo research groups together exploredan iterative variation[191] to the Baker/Houk inside-outapproach.[171] Rather than expressing and characterizinga large number of computationally designed proteins, theefforts were focused on a single template protein (PDB-ID1gor;[193] Figure 16a). As with the Rçthlisberger designs, thenative active site of the template was constructed to comple-ment the transition-state geometry for the Kemp eliminationof 5-NBZ (Figure 5a). Theozyme I (Figure 5b) served as thecatalytic motif, and Phoenix[194] rather than Rosetta was usedfor the design of the active-site in silico. The overall protocolwas similar to that of the Rçthlisberger study and wasvalidated in its utility to generate active Kemp eliminases.However, no activity could be produced with the 1gorscaffold. HG-1 (Figure 16 b), the resulting inactive first-generation design, differs from the wild-type 1gor by sevenmutations and is fully folded under the conditions of theactivity assays, with a secondary structure that is very similarto the wild-type scaffold 1gor. Analysis of the structure anddynamics of HG-1, however, highlighted a number ofproblems. The innately flexible active-site pocket of 1gorcould not be engineered to provide the necessary support forthe theozyme geometry in HG-1 (compare Figure 18 a,b).Additionally it was found that a substantial number of watermolecules can access the active site of HG-1—an observationthat has implications for both the binding of the hydrophobicsubstrate (Figure 18c) and the base strength of Glu237, theintended catalytic residue. Efforts to increase the hydro-phobic character of the HG-1 active site were unsuccessful,and so a more invasive strategy was explored: rather thansearching the RCSB for a protein with a native active site thatis better suited for theozyme I (Figure 5b), the focus of thecomputational design was shifted away from the native activesite and onto a pre-existing small pocket inside the b barrel(shaded area at center-bottom of Figure 16a,b).

    Figure 13. Diels–Alder reaction between 4-carboxybenzyl-trans-1,3-buta-diene-1-carbamate and N,N-dimethylacrylamide (a), which gives onlythe 3R,4S endo product (b). Part (b) is reprinted from Ref. [172].

    Figure 14. Computationally designed Diels–Alderases. a) DA_20_10,b) DA 42 04. Active-site Gln, Tyr, and substrates are in red sticks.c) Active site of DA_20_00 and d) of DA 20_10. Mutations are high-lighted in red (for DA_20_00 compared to the native protein scaffold)and in orange (for DA_20_10 compared to DA_20_00). Parts (b) and(c) are reprinted from Ref. [172].

    Figure 15. a) DA_20_00 shows a narrow distribution at 2 � (hydrogenbond between Tyr121 and the carbonyl group at position 271), whileDA_20_10 shows a wide distribution at 5 � (no hydrogen bond).b) Overlay of the QM transition-state geometry (orange) with equili-brated geometries from MD simulations on DA_20_00 (red) andDA_20_10 (blue). Reprinted from Ref. [172].

    Computational Enzyme DesignAngewandte

    Chemie

    5717Angew. Chem. Int. Ed. 2013, 52, 5700 – 5725 � 2013 Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim www.angewandte.org

    http://www.angewandte.org

  • The resulting HG-2 design differs from the wild-type 1gorby 12 mutations and utilizes Asp127 as the general base(Figure 16 c). The native pocket was deepened by 7 � andtightened at the entrance, effectively generating a new activesite inside the b barrel (Figure 17).

    MD simulations predicted that the new design was active.They showed that HG-2 was capable of stabilizing thetheozyme geometry, that it could limit the influx of watermolecules to the catalytic base, and that it could support the

    catalytic contact between substrate and Asp127 (Figure 18 d).The protein was expressed, kinetically characterized, and itsactivity was confirmed with a kcat/KM value of 123.2m

    �1 s�1,comparable to the 163m�1 s�1 for the most active Rçthlis-berger design, KE59.

    The crystal structure of HG-2 was solved to a resolution of1.2 �. A transition-state analogue (TSA) was cocrystallizedand occupied the active site in two distinct but catalyticallycompetent orientations. The structure validated the dynam-ics-guided computational design, but it also drew attention tothe importance of accounting for alternative substrateorientations in future versions of the enzyme design protocol.Additional single-point mutations were explored with theSer265Thr variant, and further enhanced the kcat/KM value bya factor of three.

    Figure 16. Variation of the active site. a) The unmodified 1gor scaffold,b) design HG-1, and c) design HG-2. The TS model is shown inorange. Reprinted from Ref. [191].

    Figure 17. Active site relocated by 7 �. a) HG-1. b) HG-2. Cutaway viewwith active-site residues in red and the TS model in orange.

    Figure 18. Dynamics of HG-1 (a) and of HG-2 (b). Equidistant snap-shots from MD simulations are shown with the backbone in blue andthe active-site residues in red. The backbone dynamics are of compa-rable magnitude in both HG-1 and HG-2, but side-chain active-sitedynamics differ significantly. c) and d) show angle–distance scatterplots of the catalytic contacts between substrate and base. Modifiedfrom Ref. [191].

    .AngewandteReviews

    K. N. Houk et al.

    5718 www.angewandte.org � 2013 Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim Angew. Chem. Int. Ed. 2013, 52, 5700 – 5725

    http://www.angewandte.org

  • 4.5. Structure Prediction and Design through Crowd Sourcing

    Modern computer algorithms have become very effectiveat approximating the physics that governs molecular inter-actions. A significant challenge that remains, however, is thatof conformational sampling. The free-energy landscapes ofbiomolecules are so vast that navigating them is one of themajor bottlenecks in the study of protein folding, structureprediction, and protein engineering. Various approaches havebeen developed to address this numeric problem over the pastfew years and range from structure prediction (e.g. Rosetta)to simulation (e.g. Markov state models in combination withFolding@home) as well as highly specialized hardware (e.g.Anton). Most recently, crowd-sourced structure predictiondemonstrated its utility as a surprisingly effective addition tostatistical and deterministic search algorithms.

    Cooper et al. introduced Foldit, a graphical user interfaceto some of Rosetta�s functionality, which has the addedcapability to serve as an online multiplayer game.[195] The ideabehind recruiting “homo ludens”—the playing (wo)man—toscientific challenges is based on observations from human-based computing, in which certain tasks—such as shaperecognition—can be performed faster and more efficiently byhumans than by machines. The Foldit study shows that basicspatial recognition, intuition, and decision making can out-compete the stochastic component of conformational searchwhen applied to problems of protein-structure prediction.

    The collaborative nature of the game allows participantsto form groups and to share and evolve their experience andstrategies in the form of “recipes” to more effectivelycompete against other groups. Successful “recipes” get usedand tweaked more often than unsuccessful ones and so aninteresting evolutionary process starts taking place in whichnew and enhanced prediction algorithms are discovered bythe gaming community. The two most popular “recipes”, forexample, encoded an algorithm that turned out to beara striking resemblance to an improvedstructure-prediction method, the devel-opment of which had not been publishedat the time.[196]

    Two recent reports demonstrate theutility of crowd-sourcing through Folditbeyond a proof-of-principle stage.Khatib et al. generated models formolecular replacement with which thelong elusive structure of the M-PMVretroviral protease could be solved,[197]

    and Eiben et al. achieved an 18-foldimproved substrate binding for a previ-ously designed Diels–Alderase throughsubstantial loop redesign (Figure 19).[198]

    Although the evolutionary dynamicsof crowd-sourcing and the applicabilityof nonscientific thought processes toreal-world scientific problems are fasci-nating topics in their own right, thequality of the resulting structure predic-tions strongly depends on how well thescientific objectives can be broken down

    into palatable challenges.[197] Many interesting developmentscan be expected here in the near future.

    5. Challenges in Enzyme Design

    The field of computational chemistry and biology hasexperienced significant advances through the development ofnew algorithms and hardware—but equally important,through an increase and solidificat