a mosaic of monofunctional domains

12
Biochem. J. (1987) 246, 375-386 (Printed in Great Britain) The pentafunctional arom enzyme of Saccharomyces cerevisiae is a mosaic of monofunctional domains Kenneth DUNCAN,* R. Mark EDWARDSt and John R. COGGINS*j *Department of Biochemistry, University of Glasgow, Glasgow G12 8QQ, and tSearle Research and Development, Lane End Road, High Wycombe, Bucks. HP12 4HL, U.K. The nucleotide sequence of the Saccharomyces cerevisiae AROI gene which encodes the arom multifunctional enzyme has been determined. The protein sequence deduced for the pentafunctional arom polypeptide is 1588 amino acids in length and has a calculated Mr of 174555. Functional regions within the polypeptide chain have been identified by comparison with the sequences of the five monofunctional Escherichia coli enzymes whose activities correspond with those of the arom multifunctional enzyme. The observed homologies demonstrate that the arom polypeptide is a mosaic of functional domains and are consistent with the hypothesis that the AROI gene evolved by the linking of ancestral E. coli-like genes. INTRODUCTION Multifunctional enzymes are found in all classes of organisms, including bacteria, higher plants and mam- mals, but they appear to be particularly common on the biosynthetic pathways of the lower eukaryotes (Kirschner & Bisswanger, 1976; Schmincke-Ott & Bisswanger, 1980; Hardie & Coggins, 1986). Most multifunctional enzymes catalyse two or more consecutive reactions on a biosynthetic pathway. In some cases the pathway intermediates are covalently bound to the enzyme, as in the case of the fatty acid synthases (Schweizer, 1986; Hardie & McCarthy, 1986), but more frequently the products of the individual reactions are free to diffuse away from the enzyme surface. The arom multifunctional enzyme (Lambert et al., 1985; Coggins et al., 1985; Coggins & Boocock, 1986) which catalyses the five central steps of the shikimate pathway (see Fig. 1) is an example of this latter type of multifunctional enzyme. One remarkable feature of the arom system is the diversity in the patterns of gene and enzyme organization found in different species. Genetic and biochemical studies have revealed the presence of an 'arom gene cluster' in Neurospora crassa (Giles et al., 1967a; Catcheside et al., 1985), Aspergillus nidulans (Ahmed & Giles, 1969; Charles et al., 1986), Saccharomyces cerevisiae (de Leeuw, 1967; Larimer et al., 1983), Schizosaccharomyces pombe (Strauss, 1979; Nakanishi & Yamamoto, 1984), a number of other fungal and yeast species (Ahmed & Giles, 1969; Bode & Birnbaum, 1981), and in Euglena gracilis (Berlyn et al., 1970). In contrast, the corresponding structural genes for the five central enzymes of the shikimate pathway in Escherichia coli, Salmonella typhimurium and Bacillus subtilis are widely scattered about the genome (Bachmann, 1983; Sanderson & Roth, 1983; Henner & Hoch, 1980) and in the case of E. coli the five enzymes have also been shown to be separable (Berlyn & Giles, 1969; Chaudhuri & Coggins, 1985; Coggins et al., 1985). In plants three of the enzymes of the pathway are separable but two, 3-dehydroquinase and shikimate dehydrogenase, co- purify (Polley, 1978; Koshiba, 1979; Fiedler & Schultz, 1985; Coggins, 1986; Mousdale et al., 1987) and have been shown to occur on a single bifunctional polypeptide chain (Polley, 1978; Fiedler & Schultz, 1985; Mousdale et al., 1987). These observations raise the question of what relationship there is between the prokaryotic mono- functional shikimate pathway enzymes and the multi- functional eukaryotic enzymes. On the basis of limited proteolysis experiments on the N. crassa arom multi- functional enzyme (Smith & Coggins, 1983; Coggins et al., 1985; Coggins & Boocock, 1986) and from knowledge of its subunit molecular mass and the subunit molecular mass of the five corresponding E. coli enzymes we have proposed that the arom protein has a mosaic structure consisting of five autonomous, monofunctional domains each one homologous to the appropriate E. cli enzyme (Coggins et al., 1985; Chaudhuri & Coggins, 1985; Coggins & Boocock, 1986; Hardie & Coggns, 1986). To confirm this hypothesis we set out to determine -tbe complete sequence of the S. cerevisiae arom protein and the five corresponding monofunctional. E. coli enzymes,. While this work was in progress Hawkins and vi1 co-workers reported the partial (Charles et at., 1985) and later the complete (Charles et al., 1986) sequence of the A. nidulans arom multifunctional enzyme and pointed out that it contained a region homologous to E. coli EPSP synthase (Charles et al., 1986; Duncan et al., 19S4). Here we report the complete sequence of the S. cerevisiae arom multifunctional enzyme and compare it with tie sequences, determined in our laboratory, of all five of the corresponding monofunctional E. coli enzymes (Duncan et al., 1984, 1986; Millar et al., 1986; Millar & Coggins, 1986; Anton & Coggins, 1987). The results confirm that the arom polypeptide is a 'mosaic' of five functional domains, each of which is homologous to a mono- functional E. coli polypeptide. Abbreviation used: EPSP, 5-enoylpyruvylshikimate 3-phosphate, t To whom correspondence and reprint requests should be addressed. These sequence data have been submitted to the EMBL/GenBank Data Libraries under the accession number Y003 13. Vol. 246 375

Upload: nguyenque

Post on 05-Feb-2017

220 views

Category:

Documents


3 download

TRANSCRIPT

Page 1: a mosaic of monofunctional domains

Biochem. J. (1987) 246, 375-386 (Printed in Great Britain)

The pentafunctional arom enzyme of Saccharomyces cerevisiae isa mosaic of monofunctional domainsKenneth DUNCAN,* R. Mark EDWARDSt and John R. COGGINS*j*Department of Biochemistry, University of Glasgow, Glasgow G12 8QQ, and tSearle Research and Development,Lane End Road, High Wycombe, Bucks. HP12 4HL, U.K.

The nucleotide sequence of the Saccharomyces cerevisiae AROI gene which encodes the arom multifunctionalenzyme has been determined. The protein sequence deduced for the pentafunctional arom polypeptide is1588 amino acids in length and has a calculated Mr of 174555. Functional regions within the polypeptidechain have been identified by comparison with the sequences of the five monofunctional Escherichia colienzymes whose activities correspond with those of the arom multifunctional enzyme. The observedhomologies demonstrate that the arom polypeptide is a mosaic of functional domains and are consistentwith the hypothesis that the AROI gene evolved by the linking of ancestral E. coli-like genes.

INTRODUCTIONMultifunctional enzymes are found in all classes of

organisms, including bacteria, higher plants and mam-mals, but they appear to be particularly common on thebiosynthetic pathways ofthe lower eukaryotes (Kirschner& Bisswanger, 1976; Schmincke-Ott & Bisswanger, 1980;Hardie & Coggins, 1986). Most multifunctional enzymescatalyse two or more consecutive reactions on abiosynthetic pathway. In some cases the pathwayintermediates are covalently bound to the enzyme, as inthe case of the fatty acid synthases (Schweizer, 1986;Hardie & McCarthy, 1986), but more frequently theproducts of the individual reactions are free to diffuseaway from the enzyme surface. The arom multifunctionalenzyme (Lambert et al., 1985; Coggins et al., 1985;Coggins & Boocock, 1986) which catalyses the fivecentral steps of the shikimate pathway (see Fig. 1) is anexample of this latter type of multifunctional enzyme.One remarkable feature of the arom system is the

diversity in the patterns of gene and enzyme organizationfound in different species. Genetic and biochemicalstudies have revealed the presence of an 'arom genecluster' in Neurospora crassa (Giles et al., 1967a;Catcheside et al., 1985), Aspergillus nidulans (Ahmed &Giles, 1969; Charles et al., 1986), Saccharomycescerevisiae (de Leeuw, 1967; Larimer et al., 1983),Schizosaccharomycespombe (Strauss, 1979; Nakanishi &Yamamoto, 1984), a number of other fungal and yeastspecies (Ahmed & Giles, 1969; Bode & Birnbaum, 1981),and in Euglena gracilis (Berlyn et al., 1970). In contrast,the corresponding structural genes for the five centralenzymes of the shikimate pathway in Escherichia coli,Salmonella typhimurium and Bacillus subtilis are widelyscattered about the genome (Bachmann, 1983; Sanderson& Roth, 1983; Henner & Hoch, 1980) and in the case ofE. coli the five enzymes have also been shown to beseparable (Berlyn & Giles, 1969; Chaudhuri & Coggins,1985; Coggins et al., 1985). In plants three of theenzymes of the pathway are separable but two,

3-dehydroquinase and shikimate dehydrogenase, co-purify (Polley, 1978; Koshiba, 1979; Fiedler & Schultz,1985; Coggins, 1986; Mousdale et al., 1987) and havebeen shown to occur on a single bifunctional polypeptidechain (Polley, 1978; Fiedler & Schultz, 1985; Mousdaleet al., 1987).

These observations raise the question of whatrelationship there is between the prokaryotic mono-functional shikimate pathway enzymes and the multi-functional eukaryotic enzymes. On the basis of limitedproteolysis experiments on the N. crassa arom multi-functional enzyme (Smith & Coggins, 1983; Cogginset al., 1985; Coggins & Boocock, 1986) and fromknowledge of its subunit molecular mass and the subunitmolecular mass of the five corresponding E. coli enzymeswe have proposed that the arom protein has a mosaicstructure consisting of five autonomous, monofunctionaldomains each one homologous to the appropriate E. clienzyme (Coggins et al., 1985; Chaudhuri & Coggins,1985; Coggins & Boocock, 1986; Hardie & Coggns,1986).To confirm this hypothesis we set out to determine -tbe

complete sequence of the S. cerevisiae arom protein andthe five corresponding monofunctional. E. coli enzymes,.While this work was in progress Hawkins and vi1co-workers reported the partial (Charles et at., 1985) andlater the complete (Charles et al., 1986) sequence of theA. nidulans arom multifunctional enzyme and pointedout that it contained a region homologous to E. coliEPSP synthase (Charles et al., 1986; Duncan et al., 19S4).Here we report the complete sequence of the S. cerevisiaearom multifunctional enzyme and compare it with tiesequences, determined in our laboratory, of all five of thecorresponding monofunctional E. coli enzymes (Duncanet al., 1984, 1986; Millar et al., 1986; Millar & Coggins,1986; Anton & Coggins, 1987). The results confirm thatthe arom polypeptide is a 'mosaic' of five functionaldomains, each of which is homologous to a mono-functional E. coli polypeptide.

Abbreviation used: EPSP, 5-enoylpyruvylshikimate 3-phosphate,t To whom correspondence and reprint requests should be addressed.These sequence data have been submitted to the EMBL/GenBank Data Libraries under the accession number Y003 13.

Vol. 246

375

Page 2: a mosaic of monofunctional domains

K. Duncan, R. M. Edwards and J. R. Coggins

GO CHH

HO.

Erytpho

OH co2hrose 4- oJ)sphate QHL0CO 1 O CH2H OH. 2 OH

M2+NAD+

2

HO~ Go-Ks 2

OH3

3-1heptulo.

Phosphoenolpyruvate

CO-2 NADPH NP+

O -OH 4OH

3-Dehydroshikimate

Deoxy-D-arabino-isonic acid-7-phosphate

CO-H AT]

HO OHOH

Shikimate

3-Dehydroquinate

CO-P ADP

OH

Shikimate 3-phosphate

Go-

2

CH2

(E)' co2OH

5-Enolpyruvylshikimate3-phosphate

FMNNADPH

7

2 CH /tPrephenate Phenylalanine

P.AnthranilateoTTryptophano C02- p-Aminobenzoate

OH

Chorismate

Fig. 1. Reactions of the early common pathway of aromadc amino acid biosynthesis

The numbers on the Figure refer to the enzymes of the pathway: (1) 3-deoxy-D-arabino-heptulosonic acid 7-phosphate synthase(EC 4.1.2.15), (2) 3-dehydroquinate synthase (EC 4.6.1.3), (3) 3-dehydroquinase (EC 4.2.1.10), (4) shikimate dehydrogenase (EC1.1.1.25), (5) shikimate kinase (EC 2.7.1.71), (6) 5-enoylpyruvylshikimate 3-phosphate (EPSP) synthase (EC 2.5.1.19, alternativename 3-phosphoshikimate 1-carboxyvinyltransferase), (7) chorismate synthase (EC 4.6.1.4). The arom multifunctional enzymecatalyses reactions 2-6.

MATERIALS AND METHODSMaterials

Restriction enzymes were purchased from a number ofcommercial suppliers and were used in accordance withthe manufacturers' instructions. T4 DNA ligase andE. coli DNA polymerase I were from Bethesda Re-search Laboratories, Paisley, U.K. All reagents forDNA sequencing, including z-[35S]thioATP, werepurchased from Amersham International, Amersham,Bucks., U.K.

Cloning and DNA sequence analysisPlasmid preparations and manipulations were as

described in Maniatis et al. (1982). DNA sequencingmethods have been described previously (Duncan et al.,1984). The paired vectors M13mp8 and M13mp9(Messing & Vieira, 1982) or Ml3mpl8 and Ml3mp19(Norrander et al., 1983) were used.

RESULTS AND DISCUSSION

Nucleotide sequence determination of the ARO1 gene

The characterization of two independently isolatedS. cerevisiae AROJ clones, pFL6 [a derivative ofYpARI (Larimer et al., 1983)] and pME173, has been

described (Duncan et al., 1987). These plasmids, whichby restriction analysis have almost identical genomicinserts, are capable of complementing the auxotrophiclesions in a number of E. coli aromatic pathway mutantstrains, namely aroA, aroB, aroD and aroE strains. Theability of a series of deletion derivatives of pME173 tocomplement the various aromatic pathway mutants leaddirectly to the location on the genomic insert of the'sub-regions' within AROJ, and suggested the likelydirection of transcription by comparison with the knownorder of the activities on the analogous N. crassapolypeptide (Duncan et al., 1987).The nucleotide sequence of pFL6 was determined,

using the M13/dideoxy method (Sanger et al., 1977;Messing & Vieira, 1982; Biggin et al., 1983; Norranderet al., 1983) and the sequence confirmed by sequenceanalysis of the Sau3A fragments between the KpnI andHindIll sites of pME173. The sequencing strategy isoutlined in Fig. 2. Both strands were sequenced in theirentirety and all the restriction sites used to generatefragments for sequencing were overlapped by thesequenced fragments.

Translation of the DNA sequence revealed a singleopen reading frame which is sufficiently long to encodethe arom polypeptide. This 1588 amino acid sequence(shown in Fig. 3) has a calculated Mr of 174555, whichcompares with an estimate by SDS/polyacrylamide-gel

1987

PEP Pi

376

0 0---.<CH 2

Page 3: a mosaic of monofunctional domains

Sequence of Saccharomyces cerevisiae arom multifunctional enzyme

ATG TAG

pF L6

Taql 1-ISau3A 1=1 I _~.IBgllPstiHpaIHinc I1:XbaIPvulEco R I -SstI

BamHI I

pME173/Sau3A I '., * *-l

500 bp

, r-or: ---. L 0 -1aI 11I I I I I I

-F--Iw~~~~~~~~~~~~

I8-

4 ]

4- I--.Il

-JI-I--

4 J,c-

1I

Fig. 2. Sequencing strategy for the yeast ARO1 gene

Restriction sites used for sub-cloning into Ml 3 prior to DNA sequence analysis are shown. Arrows indicate the direction andlength of sequence obtained; *indicates clones resulting from digestion at EcoRI* sites, or blunt-end cloning of sheared DNA.

electrophoresis of 165000 for the corresponding enzymeisolated from Neurospora crassa (Lumsden & Coggins,1977). Although direct evidence in support of theassignment of the N-terminal methionine is lacking, thisparticular ATG codon is favoured for two reasons. Thenext methionine in the open reading frame, Met-104, iswithin the region of the arom polypeptide which ishomologous to the product of the E. coli aroB gene (seebelow). Also, it has been shown that the first AUG codonin eukaryotic mRNA usually serves as the translationinitiation codon (Kozak, 1984); evidence has beenobtained that this codon is the first AUG in the AROJtranscript (Duncan et al., 1987).

Overall comparison of S. cerevisiae ARO1 gene with thefive corresponding aro genes of E. coli and the aromgene of Aspergillus nidulansThe subunit Mr of each of the individual E. coli

enzyme is listed in Table 1. The yeast arom subunit Mrof 174555 corresponds closely to the Mr of 159698 forthe combined E. coli enzymes. In order to confirm thatthe arom protein contained all five E. coli domains thepredicted AROJ coding sequence was compared, in turn,with each of the individual E. coli protein sequencescorresponding to the five activities of the complex. Thiswas carried out using the programs 'BESTFIT' and'GAP' (University of Wisconsin Genetics ComputerGroup Package of DNA sequence analysis programs;Devereux et al., 1984). BESTFIT uses the 'localhomology' algorithm of Smith & Waterman (1981) tofind the best segments of similarity between twosequences. The alignments obtained are illustrated inFig. 4. There are very clear homologies between thesequence of each E. coli enzyme and a region ofcorresponding length in the S. cerevisiae multifunctionalenzyme. The order of the activities on the arom poly-peptide chain is the same as that predicted for N. crassa(Giles et al., 1 967a), and for S. pombe (Strauss,1979; Nakanishi & Yamamoto, 1984), and not as

originally deduced for S. cerevisiae by de Leeuw (1967).The computer programs were also used to align thesequence of the A. nidulans arom polypeptide (Charleset al., 1985, 1986) with the S. cerevisiae sequence (Fig. 5).The order of functional regions is the same for these twofungal multifunctional enzymes, which are more hom-ologous to each other than they are to the five individualE. coli enzymes. The number of position identities andthe percentage homologies for pair-wise combinations ofthe bacterial and fungal enzymes is shown in Table 2. TheS. cerevisiae, A. nidulans and E. coli sequences are allclearly homologous although the exact degree ofhomology varies with the different domains as discussedbelow.

Specific homologies between the functional domainsThe first 392 amino acid residues of the S. cerevisiae

arom polypeptide are homologous with the E. coli aroBgene product, 3-dehydroquinate synthase (Millar &Coggins, 1986). In the alignment shown in Fig. 4 thereis 36% identity between the two sequences. Thedistribution of the homology shows two very highlyconserved sub-domains, consisting of residues 100-213and 258-387 in the S. cerevisiae sequence. One of thesesub-domains includes the flafl nucleotide-binding foldpreviously identified between residues 96 and 126 in theE. coli 3-dehydroquinate synthase sequence (Millar &Coggins, 1986). Linking these two conserved sub-domains there is a region of very low homology (residues214-257 in the S. cerevisiae sesquence) which contain-sonly one conserved residue and where, in the S. cere-visiae sequence, there is a 27 amino acid insertion.The A. nidulans polypeptide contains a shorter insertion(13 amino acids compared with the E. coli sequence) inthis region (Fig. 5) which cannot be essential for enzymeactivity.A sequence of 11 amino acids (residues 393-403)

connects the aroB region to a region homologous withthe E. coli aroA gene product, EPSP synthase. The.

Vol. 246

377

Page 4: a mosaic of monofunctional domains

378 K. Duncan, R. M. Edwards and J. R. Coggins

1 ATGGTGCAGTTAGCCAAAGTCCCAATTCTAGGAAATGATATTATCCACGTFGGGTATAACATTCATGACCATTGGTFGAAACCATAATT 90MetValGlnLeuAlaLyeValProIleLeuGlyAunAupIleIleHilValGlyTyrAunhleHisAspHisLLuValGluThrIleIle

91 AAACATTGTCCTTCTTCGACATACGTTATTTGCAATGATACGAACTTGAGTAAAGTTCCATACTACCAGCAATTAGTCCTGGAATTCAAG 180LyUHlCy.ProSerSerThrTyrValIleCysAsnAspThrAunLeuSerLymValProTyrTyrGlnG1nLeuValLeuGluPheLys

181 GCTFCITTGCCAGAAGGCTCTCGTTTAC 1rACTTATGTTGTTAAACCAGGTGAGACAAGTAAAAGTAGAGAAACCACAGCGCAGCTAGAA 270AlaSerLeuProGluGlySerArgLeuLeuThrTyrValValLysProGlyGluThrSerLyuSerArgGluThrLysAlaGlnLeuGlu

271 GATTATCTTTTAGTGGAAGGATGTACTCGTGATACGGTTATGGTAGCGATCGGTGGTGGTGTTATTGGTGACATGATTGGGTTCGTTGCA 360AupTyrLeuLeuValGluGlyCymThrArgAspThrValMetValAlaIleGlyGlyGlyValIleGlyAspMetIleGlyPheValAla

361 TCTACA1FATGAGAGGTGTTCGTGTTGTCCAAGTACCAACATCCTTATGGCAATGGTCGATFCCTCCATTGGTGGTAACTGCTATT .50SerThrPheMetArgGlyValArgValValGlnValProThrSerLeuLeuAlaMetValAmpSerSerIleGlyGlyLyeThrA alle

451 GACACTCCTCTAGGTAAAAACTTTA1rGGTGCATTrTGGCAACCAAAAnGTCCTTGTAGATATTAAATGGCTAGAAACGTTAGCCAAG 540AmpThrProLeuGlyLyAsAnPheIleGlyAlaPheTrpGlnProLyePheValLeuValAspIleLysTrpLeuGluThrLeuAlaLys

541 AGAGAGTTTATCAATGGGATGGCAGAAGTTATCAAGACTGCTTGTA1TrGGAACGCTGACGAATTTACTAGATTAGAATCAAACGC7TCG 630ArgGluPheIleAunGlyMetAlaGluValIleLysThrAlaCy IleTrpAsnAlaAmpGluPheThrArgLeuGluSerAenAlaSer

631 TrGTrCTTAAATGTTGT'AATGGGGCAAAAAATGTCAAGGTTACCAATCAATTGACAAACGAGATTGACGAGATATCGAATACAGATATT 720LeuPheLeuAunValValAunGlyAlaLyeAunValLysValThrAunGlnLeuThrAunGluIleAupGluIleSerAunThrAspIle

721 GAAGCTATGTTGGATCATACATATAAGTTAGTTCTTGAGAGTATTAAGGTCAAAGCGGAAGTTGTCTCTTCGGATGAACGTGAATCCAGT 810GluAlaMetLeuAspHiuThrTyrLyuLeuValLeuGluSerIleLyuValLysAlaGluValValSerSerAupGluArgGluSerSer

811 CTAAGAAACC1TITGAACTTCGGACATTCTATTGGTCATGCTTATGAAGCTATACTAACCCCACAAGCATTACATGGTGAATGTGTGTCC 900LeuArgAunLeuLeuAsnPheGlyHisSerIleGlyHiuAlaTyrGluAlaIleLeuThrProGlnAlaLeuHiuGlyGluCyuValSer

901 A?TGGTATGGTTAAAGAGGCGGAATTATCCCGTTATTTCGGTATTCTCTCCCCTACCCAAGTTGCACGTCTATCCAAGATTTGGFTTGCC 990IleGlyMetValLyaGluAlaGluLeuSerArgTyrPheGlylleLeuSerProThrGlnValAlaArgLeuSerLyslleLeuValAla

991 TACGGGTTGCCTGTTTCGCCTGATGAGAAATGGTTTAAAGAGCTAACCTTACATAAGAAAACACCATTGGATATCTTATTGAAGAAAATG 1080TyrGlyLeuProValSerProAupGluLyuTrpPheLyuGluLe'iThrLeuHisLy.LyuThrProLeuAepIleLeuLeuLymLysMet

1081 AGTATTGACAAGAAAAACGAGGGTTCCAAAAAGAAGGTGGTCATrmTAGAAAGTATrGGTAAGTGCTATGGTGACTCCGCTCAATTTGTT 1170SerIleAspLyuLyeAunGluGlySerLymLyuLyuValValIleLeuGluSerIleGlyLyuCy.TyrGlyAspSerAlaGlnPh.Val

1171 AGCGATGAAGACCTGAGArmfATTCTAACAGATGAAACCCTCGTTTACCCCTTCAAGGACATCCCTGCTGATCAACAGAAAGTTGTTATC 1260SerA.pGluAepL.uArgPh.IleLeuThrAepGluThrLeuValTyrProPheLyuAmpIleProAiaAspGlnGlnLyuValVallle

1261 CCCCCTGGTTCTAAGTCCATCTCCAATCCTrm AATTCTTGCTGCCCTCGGTGAAGGTCAATGTAAAATCAAGAACTTATTACATTCT 1350ProProGlySerLysSerIleSerAenArgAlaLeuIleLeuAlaAlaLeuGlyGluGlyGlnCyuLyeIleLyeAunLeuLeuHlSer

1351 GATGATACTAAACATATGTTAACCGCTGTTCATGAATTGAAAGGTGCTACGATATCATGGGAAGATAATGGTGAGACGGTAGTGGTGGAA 1440AupAepThrLyuHKtMetLeuThrAlaValHieGluLeuLyuGlyAlaThrIleSerTrpGluAupAunGlyGluThrValValValGlu

1441 GGACATGGTGGTTCCACATTGTCAGCTTGTGCTGACCCCTTATATCTAGGTAATGCAGGTACTGCATCTAGATTTTCGACTTCCTFGGCT 1530GlyHlGlyGlySerThrLeuSerAlaCysAlaAupProLeuTyrLeuGlyAenAlaGlyThrAlaSerArgPheLeuThrSerLeuAla

1531 GCCTrTGGTCAATTCTACTTCAAGCCAAAAGTATATCGTTTTAACTGGTAACGCAAGAATGCAACAAAGACCAATTGCTCC1TTGGTCGAT 1620AlaLeuValAunSerThrSerSerGlnLyeTyrIleValLeuThrGlyA.nAlaAr5MetGlnGlnArgProIleAlaProLeuValAsp

1621 TCTI7GCGTGCTAATGGTACTAAAATTGAGTACTTGAATAATGAAGGTTCCCTGCCAATCAAAGTTTATACTGATTCGGTATTCAAAGGT 1710SerLeuArgAlaAunGlyThrLyeIleGluTyrLeuAunAunGluGlySerLeuProIleLyuValTyrThrAspSerValPheLyuGly

1711 GGTAGAATTGAATTAGCTGCTACAGTTTCTTCTCAGTACGTATCCTCTATCTTGATGTGTGCCCCATACGCTGAAGAACCTGTAACTTTG 1800GlyArgIllGluLeuAlaAlaThrValSerSerGlnTyrValSerSerIleLeuMetCy.AlaProTyrAlaGluGluProValThrLeu

1801 GCTCT-GTTGGTGGTAAGCCAATCTCTAAATTGTACGTCGATATGACAATAAAAATGATGGAAAAATTCGGTATCAATGTTGAAACTTCT 1890AlaLeuValGlyGlyLysProIleSerLyuLeuTyrValAupMetThrIleLyuMetMetGluLysPheGlylleAunValGluThrSer

1891 ACTACAGAACCTTACACTTATTATATTCCAAAGGGACAT'TATATTAACCCATCAGAATACGTCATTGAAAGTGATGCCTCAAGTGCTACA 1980ThrThrGluProTyrThrTyrTyrIleProLy.GlyHisTyrIleAsnProSerGluTyrValIleGluSerAupAlaSerSerAlaThr

1981 TACCCATTLGCC?rCGCCGCAATGACTCGTACTACCGTAACGGTTCCAAACATiGG1rrrGAGTCGTTACAAGGTGATGCCAGATTTGCA 2070TyrProLeuAlaPheAlaAlaMetThrGlyThrThrValThrValProAenIleGlyPheGluSerLeuGlnGlyAupAlaArSPheAla

2071 AGAGATGTC;TTAGACCTATGGGTTCTAAACCTAAACGGCAACT1TCAACTACTGTrTCGGGTCCTCCTGTAGGTAC11TAAAGCCA 2160ArgAepValLeuLysProMetGlyCyeLyuIleThrGlnThrAlaThrSerThrThrVaJSerGlyProProValGlyThrLeuLysPro

2161 TTAAAACATGTIGATATGGAGCCAATGACTGATGCGTFCTTAACTGCATGTGTTGTTGCCGCTATTTCGCACGACAGTGATCCAAATTCT 2250LeuLyHHluValAspMetGluProMetThrAspAlaPh.LeuThrAlaCyuValValAlaAlalleSerHi AupSerA.pProAunSer

2251 GCAAATACAACCACCATTGAAGGTATTGCAAACCAGCGTGTCAAAGAGTGTAACAGAATITFGGCCATGGCTACAGAGCTCGCCAAATTT 2340Al&AsnThrThrThrIleGluGlylleAlaAenGlnArgValLyuGluCysAsnArgIleLeuAlaNetAlaThrGluLeuAlaLyePhe

2341 GGCGTCAAAACTACAGAATTACCAGATGGTATTCAAGTCCATGGTTTAAACTCGATAAAAGATFTGAAGGTTCCTTCCGACTCTFCTGGA 2430GlyValLyuThrThrGluLeuProAspGlylleGlnValHusGlyLeuAsnSerIleLysAspLeuLyUValProSerACpSerSerGly

1987

Page 5: a mosaic of monofunctional domains

Sequence of Saccharomyces cerevisiae arom multifunctional enzyme

2431 CCTGTCGGTGTATGCACATATGATGATCATCGTGTGGCCATGAGTTTCTCGCTTCrTGCAGGAATGGTAAATTCTCAAAATGAACGTGAC 2520ProValGlyValCyuThrTyrAspAspHKiArgValAlaMetSerPheSerLeuLeuAlaGlyMetValAunSerGlnAsnGluArgAsp

2521 GAAGTrGCTAATCCTGTAAGAATACTTGAAAGACATrGTACTGGTAAAACCTGGCCIGGCTGGTGGGATGTG1TACATTCCGAACTAGGT 2610GluValAlaAonProValAr.IleLeuGluArgHliCyuThrGlyLyuThrTrpProGlyTrpTrpAupV&lLeuHisSerGluLeuGly

2611 GCCAAATTAGATGGTGCAGAACCTTTAGAGTGCACATCCAAAAAGAACTCAAAGAAAAGCGTTGTCATTATTGGCATGAGAGCAGCTGGC 2700AlaLyeLeuAupGlyAlaGluProLeuGluCyuThrSerLyLysAmnSerLymLyuSerVlVal1l1le.leGlyMetArgAlaAlaGly

2701 AAAACTACTATAAGTAAATGGTGCGCATCCGCTCTGGGTTACAAATrAGTTGACCTAGACGAGCrITTTGAGCAACAGCATAACAATCAA 2790LyuThrThrIleSerLysTrpCyuAlaSerAlaLeuGlyTyrLyuLeuValAupLeuAupGluLeuPheGluGlnGlnHisAunAmnGln

2791 AGTGTTAAACAATTTGTrGTGGAGAACGGTTGGGAGAAGTTCCGTGAGGAAGAAACAAGAATTTFCAAGGAAGTTATTCAAAATrACGGC 2880SerValLysGlnPheValValGluAunGlyTrpGluLysPheArgGluGluGluThrArgIlePheLysGluValIl eGlnAnTyrGly

2881 GATGATGGATATGTTTTCTCAACAGGTGGCGGTATTGTTGAAAGCGCTGAGTCTAGAAAAGCCTTAAAAGAT m GCCTCATCAGGTGGA 2970AupAspGlyTyrValPheSerThrGlyGlyGlyIleValGluSerAlaGluSerArgLyuAlaLeuLymAepPheAlaSerSerGlyGly

2971 TACGTTTTACACTTACATAGGGATATTGAGGAGACAATTGTCTrTTTACAAAGTGATCCTTCAAGACCTGCCTATGTGGAAGAAATTCGT 3060TyrValLeuHisLeuHiuArgAspIleGluGluThrIleValPheLeuGlnSerAupProSerArSProAlaTyrValGluGluIleArg

3061 GAAGTGGAACAGAAGGGAGGGGTGGTATAAAGAATGCTCAAATTTCTCTTTCTTTGCTCCTCATFGCTCCGCAGAAGCTGAGTTCCAA 3150GluValTrpAunAraArgGluGlyTrpTyrLyuGluCysSerAunPheSerPhePheAlaProHisCysSerAlaGluAlaGluPheGln

3151 GCTCTAAGAAGATCGTTTAGTAAGTACATTGCAACCATTACAGGTGTCAGAGAAATAGAAATTCCAAGCGGAAGATCTGCCTTiGTGTGT 3240AlaLeuAr&ArgSerPheSerLysTyrIleAlaThrIleThrGlyValArgGluIleGluIleProSerGlyArgSerAlaPheValCys

3241 1rAACCFGATGACTTAACTGAACAAACTGAGAATTrGACTCCAATCTGTrATGGTAGTGAGGCTGTAGAGGTCAGAGTAGACCATTTG 3330LeuThrPheAspAupLeuThrGluGlnThrGluAunLeuThrProIleCysTyrGlyCyeGluAlaValGluValArgValAspHisLeu

3331 GCTAATTACTCTGCTGATTTCGTGAGTAAACAGTTATCTATATrGCGTAAAGCCACTGACAGTATTCCTATCATTTTTACTGTGCGAACC 3420AlaAunTyrSerAlaAspPheValSerLyuGlnLeuSerIleLeuArgLyeAlaThrAupSerIleProlleIlePheThrValArgThr

3421 ATGAAGCAAGGTGGCAACrrFCCTGATGAAGAGTTCAAAACCTTGAGAGAGCTATACGATATTGCCTTGAAGAATGGTGTFGAATTCCTT 3510MetLyeGlnGlyGlyAsnPheProAupGluGluPheLyuThrLeuArgGluLeuTyrA.pIleAlaLeuLysAunGlyValGluPheLeu

3511 GACTTAGAACTAACTTTACCTACTGATATCCAATATGAGGTTATTAACAAAAGGGGCAACACCAAGATCATTGGTTCCCATCATGACTTC 3600AupLeuGluLeuThrLeuProThrAspIleGlnTyrGluValIleAsnLysArgGlyAunThrLyuIleIleGlySerHleHisAupPhe

3601 CAAGGATTATACTCCTGGGACGACGCTGAATGGGAAAACAGATTCAATCAAGCGTTAACTCTTGATGTGGATGTTGTAAAATTl;TGGGT 3690GlnGlyLeuTyrSerTrpAupAspAlaGluTrpGluAsnArgPheAunGlnAlaLeuThrLeuAupValAspValValLyuPheValGly

3691 ACGGCTGTTAATTTCGAAGATAATTGAGACTGGAACACTTTAGGGATACACACAAGAATAAGCCTTTAATTGCAGTTAATATGACTTCT 3780ThrAlaValAsnPheGluAspAunLeuArgLeuGluHi PheArgAupThrHisLysAunLysProLeuIleAlaValA.nMetThrSer

3781 AAAGGTAGCATTTCTcGTGATrTGAATAATGATTvAACACCTGTGACATCAGATTTATTGCCTAACTCCGCTGCCCCTGGCCAATTGACA 3870LysGlySerIleSerArgValLeuAsnAunValLeuThrProValThrSerAupLeuLeuProAsnSerAlaAlaProGlyGlnLeuThr

3871 GTAGCACAAATAACAAGATGTATACATCTATGGGAGGTATCGAGCCTAAGGAACTGTTTGTrGTTGGAAAGCCAA7TGGCCACTCTAGA 3960ValAlaGInIleAenLyuMetTyrThrSerMetGlyGlyIleGluProLyuGluLeuPheValValGlyLymProlleGlyHi SerArg

3961 TCGCCAATTTTACATAACACTGGCTATGAAATTTTAGGTTTACCTCACAAAGTCGATAAATTTGAAACTGAATCCGCACAATrGGTGAAA 4050SerProIleLeuHiuAunThrGlyTyrGluIleLeuGlyLeuProHisLysPheAspLyePheGluThrGluSerAlaGlnLeuValLys

4051 GAAAAACTrTTGACGGAAACAAGAACTrrGGCGGTGCTGCAGTCACAATTCCTCTGAAATTAGATATAATGCAGTACATGGATGAATTG 4140GluLyeLeuLeuAspGlyAunLyeAsnPheGlyGlyAlalAaValThrIleProLeuLyaLeuAspIleMetGlnTyrMetAupGluLeu

4141 ACTGATCCTGCTAAAGTTAFFGGTGCTGTAAACACAGTTFATACCAITGGGTAACAAGAAGTTTAAGGGTGATAATACCGACTGGTrAGGT 4230ThrAspAlaAlaLysValIleGlyAlaValAunThrValI leProLeuGlyAunLysLysPheLyuGlyApAunThrAspTrpLeuGly

4231 ATCCGTAATGCCTTAATTAACAATGGCGTTCCCGAATATGTrGGTCATACCGCGMTGGTTATCGGTGCAGGTGGCACTTCTAGAGCC 4320IleArgAmnAlaLeuIleAsnAsnGlyValProGluTyrValGlyHiuThrAlaGlyLeuValIleGlyAlaGlyGlyThrSerArgAla

4321 GCCCTTTACGCCCCAGTTT CACAAAGTCF CAGGACAACTrCGATGAA CATTAAGAGTC^CTT 4410AlaLeuTyrAlaLeuHluSerLeuGlyCysLysLysIlePhellelleAsnArgThrThrSerLyuLeuLysProLeulleGluSerLeu

4411 CCATCTGATTCAACArTArrGGAATAGAGTCCACTAAATCTATAGAAGAGATTAAGGAACACGTTGGCGTTGCTGTCAGCTGTGTACCA 4500ProSerGluPheAmnIleIleGlyIleGluSerThrLyuSerIleGluGluIleLy.GluHiuValGlyValAlaValSerCysValPro

4501 GCCGACAAACCATTAGATGACGAACTT'AAGTAAGCTGGAGAGATTCCGTGAAAGGTGCCCATGCTGCTmrGTACCAACCTTATTG 4590AlaAspLysProLeuAspAupGluLeuLeuSerLyuLeuGluArgPheLeuValLysGlyAlaHisAlaAlaPheValProThrLeuLeu

4591 GAAGCCGCATACAAACCAAGCGTTACTCCCGTTATGACAATrTCACAAGACAAATATCAATGGCACGrrGTCCCTGGATCACAAATGrFA 4680GluAlaAlaTyrLyuProSerValThrProValMetThrIleSerGlnAepLysTyrGlnTrpHiaValValProGlySerGlnMetLeu

4681 GTACACCAAGGTGTAGCTCAGT AGTGGACAGGATTCAGGGCCCT AAGGCCATTTrTGATGCCGTrACGAAAGAGAG 4767

ValHisGInGlyValAl&GlnPheGluLysTrpThrGlyPheLyuGlyProPheLyuAlaIlePheAspAlaValThrLyaGluEnd

Fig. 3. DNA sequence of the AROI coding region, and the corresponding arom protein sequence

Vol. 246

379

Page 6: a mosaic of monofunctional domains

K. Duncan, R. M. Edwards and J. R. Coggins

Table 1. Structure of the five E. coi enzymes which correspond to the S. cerevisiae arom activities

Note that the length of shikimate kinase is reported here as 173 amino acids and is 174 amino acids in the text. The N-terminalmethionine is cleaved post-translationally (Millar et al., 1986).

Pathway Calculated Length Quaternarystep Enzyme activity E. coli gene Mr (amino acids) structure

2

34

5

6

3-Dehydroquinatesynthase3-DehydroquinaseShikimatedehydrogenase

ShikimatekinaseEPSP synthase

Total

aroB

aroDaroE

aroL

aroA

38880

2637729380

18937

46112

159689

362

240272

173

427

1475

Monomer

DimerMonomer

Monomer

Monomer

S. cerevisiae EPSP synthase domain is located betweenamino acids 404 and 866. Of the five functional domainsof the arom multifunctional enzyme this is the best con-served, with 38% identity between the E. coli and S. cere-visiae sequences and 55% identity between the fungalsequences (Table 2). As with the 3-dehydroquinatesynthase domain there are two very well conservedsub-domains separated by a region with no homology. Inthe S. cerevisiae sequence this unconserved region of 51residues (Ile-701 to Thr-753), like the unconserved regionseparating the two 3-dehydroquinate synthase sub-domains, contains an 11-residue insertion compared withthe E. coli sequence. This two sub-domain pattern isillustrated in Fig. 6, which shows the alignment of twobacterial and two fungal EPSP synthase sequences.Much attention has been focused recently on EPSP

synthase since the discovery that the commerciallyimportant herbicide glyphosate (N-phosphonomethyl-glycine) acts on plants by inhibiting this enzyme(Amrhein et al., 1980; Mousdale & Coggins, 1984).Glyphosate is also a potent inhibitor of the N. crassa andE. coli enzymes (Boocock & Coggins, 1983; Lewendon &Coggins, 1983). A glyphosate-insensitive form of EPSPsynthase has been isolated from a Salmonella typhi-murium mutant resistant to glyphosate (Comai et al.,1983) and it has been shown that the only alteration inthe enzyme structure is a Pro to Ser change at position101 in the enzyme sequence (Stalker et al., 1985). Pro-l01is conserved between E. coli and S. typhimurium (Fig. 6),but in both the fungal sequences it is replaced by Phe(position 505 in the S. cerevisiae sequence). This positionfollows a highly conserved region in the bacterial andfungal sequences and precedes a less well conservedsequence which includes a 5-amino-acid insertion in bothfungal sequences (Fig. 6). The absence ofa conserved Proat position 505 in the fungal sequences indicates that thisresidue cannot be an essential feature of glyphosate-sensitive forms of the enzyme.

It has been proposed that the mechanism of EPSPsynthase involves a cysteine residue at the active site(Ganem, 1978). The greater than 98% inactivation ofthe multifunctional N. crassa EPSP synthase by N-ethylmaleimide and the protection against inactivationby this reagent observed in the presence of shikimate3-phosphate and glyphosate are consistent with thissuggestion (M. R. Boocock & J. R. Coggins, unpub-

lished work), but it should be noted that cysteine-directedreagents do not completely inactivate the monofunctionalE. coli (Lewendon, 1984) and Aerobacter aerogenes(Steinrucken & Amrhein, 1984) enzymes. This impliesthat an important cysteine residue is near to but notnecessarily at the active site of the enzyme. There is asingle cysteine which is conserved in all four EPSPsynthase (Cys-853 in the S. cerevisiae sequence, seeFig. 6); further experiments are required to establishthe precise functional role of this residue.The EPSP synthase region is linked to a region

homologous to the E. coli aroL gene product, shikimatekinase II, by a 20-amino-acid sequence (residues 867-886in the S. cerevisiae sequence). Homology with E. colishikimate kinase II (Millar et al., 1986) extends toresidue 1059. The homology found in this region, whichis 23% for the E. coli versus fungal sequences and 4000for the two fungal species, is lower than that found in theEPSP synthase region. Although the overall degree ofhomology between the yeast and E. coli shikimate kinasesequences is rather low, there is one well conserved regionwhich has sequence homology with the 'A' sequence ofthe ATP-binding site of phosphofructokinase andadenylate kinase (Walker et al., 1982). This 'A'sequence, G-X4-G-K-(T)-X6-I/V, occurs between resi-dues 895 and 909 in the S. cerevisiae sequence(corresponding to E. coli shikimate kinase residues9-23); the final residue, conserved between these twospecies, is an alanine rather than the usual isoleucine orvaline. Comparing the S. cerevisiae and A. nidulansshikimate kinase domains, the sequences around this 'A'region of the ATP-binding site are very homologous(9/11 matches). However, the A. nidulans enzyme doesnot have the GKT motif; instead it was GKS. Althoughnot unique in this respect (Midgeley & Murray, 1985),this feature is unusual in that the GKT is highlyconserved over a wide species range and over a widerange of different ATP-utilizing enzymes, and it isconserved between the S. cerevisiae and E. coli shikimatekinases (Millar et al., 1986).

Following the shikimate kinase region is a regionwhich shows homology to the E. coli aroD gene product,3-dehydroquinase (Duncan et al., 1986). In the alignmentshown in Fig. 4 the N-terminal amino acid of E. coli3-dehydroquinase overlaps with the C-terminal aminoacid of E. coli shikimate kinase. The percentage

1987

380

Page 7: a mosaic of monofunctional domains

Sequence of Saccharomyces cerevisiae arom multifunctional enzyme 381

S.CerevIlsae(l)N V Q L A K VMPI[i?Gj. ND0 1 1KNV C V N IN N LVjT I I K N C P S TT VVI C N D TN..l L S K V Pi 1 Q QflV L 1571B.colIarol NSIVT..J. .KRRSYP IT IA SCGLFNI& SLPK!CQ VKLVEIT M. EKT L ALUL~IMY.D K

1 jI aroBS.c.reviuiae K F K A S L P K S R L L T Y V V K PKUT SK-SJ BUT K A Q L K TD YLL . . V K G C T TV N VA ICCGGCVI G-D1N (I1515

S.cerevisiae I F V A S T F MVN V V V V SLLV I I OSL SV L KT ADL L C K D F GV L V 1V L(15K.colliarog V N I I JS.crevimieKGA K GKINfjNiTj ACjJ ADKfT Ii ~ SAll F KENVVNCjNFI AV fK V T N 0LTVND I 0KW (235)B.colI aroS KT JG PA LS A

GVGLG K T Al CLWLK CNA G ANF V Q PJL D A LUD L 0 C

S-.cereRiriLe KC V S IG NVG [N L Ti AF~IOWLr P TD 01 VlAWIILUZSKIMA AL WTF7N .SNGD[NK VF K K L TLW K K T P L (2353)S.cer*lla -1S.cerevistee SNTDIDILL AK N L0 K N K L S K K K V V ILKSIKA C ML OSAtSS F1VN K LSiFr7 ILGTO TAL VIP F K D IQ An (413)

S.Cerovieiae 00I0KV V1 IDP[FKKj I G S ALKlI IL:Mj7 TCKJCKTGI K I. VMDIDW LRTKI I LT DNKTKL VT ISF K 0 NP[(473)K.coll areA K RL PL SG K9LVLN DCQ A

S.cereviuta. C KT V V KG H G C TLS AC ADL

P AG9GQCT ASI T SLA DAJj INS T55 K0 ZY KIL TGM SA WDQIi(4731K.coliaroA STSCKI IUN~~~~~~~JPLHAVIS AL KUI.N LUAL C . SDZILKL JKKJ

__co__aoA_ VrI P LVOSI L KY L MN KGA LAP1KVT VT D S L VF RGSK LA A TV SS 0 VSSI LKS11 (593

S.cerevisla. A K K P VT:LA LVCff- K PLIS C~V TIG ANH K KlFnGINr VT STL TR KPSVTS VIPKGTIVLTGI APS NQI I[i(85331K.coll aroAIe E 2GLs rLG AGTANPLA ALC i TG.WS9 P R LVWK

S.cerevieiae S A S S A V P LA AN ANT CTEfITV TV P NFn K ATD SV FTKEA CK T VTA STI TVJ S CPJ (7913K.coll are IGNACSWL :JL G AKJI Ktj I QU P LI L Q GG tGGJIVJDCG SIVS SQJF.~JKK~J T LC TCOD LI SCT SC

S.cerevielae PVCM V MLK P LKV OKGK PH TO A FL TAIC V V A AKISHG OS P NSATTN VTI K CIAP VKM K CS rIV IM~(6531K.colliaroA KLN IDDNVNNRIPKUDLVSKNPT IL FGVaIA.QKT QFV S QS G Y L V

S.cer.vilaae K SLAITMK LFC AKT AK PD CITr VTVN L NI G1 D LK ASSCi PVC VKCHTV C KS VTQ AN SF L LRS N V (7133)K.coll areAKoSA GK V AK IC H tTVK GRV. I T PKIKla NF A1.1 K I K A NO S N A N C F SL V TR

S.cereviulse PVGTLN NKSD LKVA VNEN[STDK A F LCTA C CVVDAA fl .NSDSDKLCSAAKLDCTTIG A KPQltVK7C TSKKNIS(68 773

S.cerevisia K VVI AV ATLCK GTIS V C AG L CS V K L KVDMLOK LSF P~ N N NC S V KOF VrVA N C KK G K K (847)

K.CelliArOA R2AL AE.V. Y .... ..RTASTJA KLLTFWNN NNC RIVAVLCAPVD VLVMI 3JLAA

S.cerevislae NVFWIQ V pv I RNr7 KTWnGWW:VLn G D G L E C TS KKVAS(1 71)B.colI aroL K 0 .S P. L. C,K, P L KV0K K 0ALUDV S MAN Q A.TK P . .V I S C I S. A .A . I NC .~

7 1 /aroA~reL(14) 1/ao

S.cerevluiae A 0 F V S K 0 L S~~IGKAT05I IIFTVT TTNl QS CKNFWP0CK NF NTQSK V D I A LV K L (1174)

S.cereviuiae T L PT GV K.. V I NKSC TKIICSN F0CLVSA VYKLD DOE A QKV fEOA VVKFVCTA(2)K.coll aroL FT CDPL V K K TV AC VKA VN ANO K V D S NRRFN K TPD A..Q I IQ AL SKTW N IOS F ADI G AL NR

S.cerevlalac V IF KO EL IL NY G D DK N PSOT K N K LI AV NS NTSK AKC D F LS N NG V LT KVT S DL L DN S A P C (1268)K.colliaroL EnS.cerevisiae L V A 01 TSKP Vi-C-D S P Y E IW : Z WKP ICK NS SFS FIL FTCSVAK AL CL A. .L .[RS KFS OKY FA(1341)K.coll aroL A LRP N LLCK.NKl TI D LLVH4JW J~.JI. JFI I D JT F9 A S QV IN GK L A QTLJ VNC SK V

areD/(240) (1)/aroK ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ arL/171(I/soS.crervIluae T KST A 0V KM K KSL LOC K NFV CCL AF ADV TIP L Kf LO T VI N KLTGAACI VIA V NV I P. Nfl K K F (11401)K.coll aroDK LA I OF INTLKG AP I CS CL NVKI TV F K KL K A F RAS A K LFD K S WIAAL C IAV NT L K .~ LC L

S.cerevluiae K C ONT KV L C I RLNATl : I pN N C TPF VVE C nKTA7CN I C A C C KTS S A L V A LN C C Kn K IDL1F LI(11741K.coll aroDK Lj ~ V LLS1DTM EK.ILFTFRS. GGI IST9A CT GL DNIDL9L

S.cerevisla. TTKLKP 1 L P SF.VFNK GmI C I K ST KM D FKIQ KKL V CW V A WV S CV P NOK P LOOKLV fL . K-FVG (15312

S.cerevislae VLKSFLVKCARNAAF HFPTLLKAHA KNKPSVjLFA V NMTSKIn lSKVQV LNVVCSNLVTP CVATSQJiLP 1AA Q[~ 11572)K.coii aroD .LQIKSPTCISV V 0E FyAD RQ K C TK I.I.. JAF K W AGCKES CS KGNGA N11L C WII CA G9AKA JLLSVWCA

S.cerevsliae C F K C P F KAIF0A V TKNGG fEPK11586)-KFIGHIS [-] HNTGYEI GL ....M DKF(31K.colliaroDAK~JLPG KP....VIKQL ....QK A~VF[JN!T FHSKS Q JNL.....SA ]P

aroD/(2401 (Iare/12 roE

Fi.S.ceAminoaci homologiesLV Etwe theDGS. NerevisiAe ArVom TmLtifunctIonal ezMeD:tTDAKVIandNthecorespondin moo FctoaE1c0lienzymes PE GR

Key.aoB,aOAPINDdeyrountesntae arA,FFSAGGKGANEPSP synthase; aroL, shikimate kiae;ao,L-eydounaearER hkmtdehydroenase Nubrsaoe adblwth eune inicteaIno Aci posITiosi h .cvsanyeadihiniiulaeccl e Nzymes, :Arsetvl.GpiN both sEquence mAinaiGheOV mi with Fig. 5. K I MIiRI14

homoogy amongL th NThre species isRsimilarPtotht it apentAdecGApeptideLIisolaed.fro Th 3-deydrofoudCieteisoTshiKimtLKinaESeP IKIEE MMVPdoaiP(abeL).qunaeDctveofth N.crss aro polypeptide112Confirmation that thisEreio oFA th S. QerevSiaEL (S ChaudhurIN &JG R. CoggPAIns unpublished. work)

sequence truly encodes the 3-dehydroquinase activity This peptide had been radiolabelled, by treatmentwas provided by the observation that there is homology with 3-dehydroquinate and NaB3H4, on the lysine

Vol. 246

Page 8: a mosaic of monofunctional domains

K. Duncan, R. M. Edwards and J. R. Coggins

S.c revlse.a Vl L A K V pr-L7 N V G YMN! H ONHL YV:TKT!fl TSFIT I C iTX TN. L 5 K VIFY Y Q L 157)

A.nldulaen JS NPFTK ISIL GIRI1SI IAD FPG LWRN IV A .TIT.L.V!L VTD T NJIG 5I Y

ScIerevIlsa. KF K ~S L PF G L IiTIM1 V K1 GOjiTIS TTilaI Tf 1AJQ L I D L fl. V C.MET liiDT V1 v IR G GTG TV G DIN 11315

A.nidulene K RAW 9 I 1K5R TKADII I LS0N PGRDTV I LGGGVIG

S.cer*vleia. I G V A S6Tl GVRV TI L L AN V D 1I1IG0GK TA!I D T PFL GKMN O-AFi FP K F V L VDMJI wf(175)

A.nlidulans 16 IIW PT KI Y I DL

S.cer*vleia. TA K A V TA Cf VW N A D fla[Li1 Si -71S L F LMN VflM G A K N V K V T N Q L T N I DI 1I (2351

j~~~~~~~~~~~~ K A~~~~~ 1 aIG V TV K TF JG aIl r T AN a F. I T U

S.ceroevtola N T D I MA L D T Y K LVO R I K V -A IDIW L R N L L AIG y 1 [ILT PT]A Lo (2541

R N L L N V H S I C H A I A . I L T F QIHL.11 R L: I r A

.A G L FDK K[W FL H K 5A.wnidulans l A 9LJA R LIG LK V A U5S RI V C UAA Y L PT K D A R I R KILjTJA G151 C S

S.Cer*vislae fIf L K Kr-IS I ID iTT111GWSRTTV fVjI fLu SfTr17K C flGD SfQ F [Fj D aFD L lilF IflT D T L MY~FMF K D I A 14131

UK T RWS V A N D I R5JV V A S I G V A N

S.cerevleiae 0 113K V V I IPGF K SIl NIt ALI L A A TLTG fIQOfK IK N L TiiWDD T K HN -iT MTRV N OiK G !-TF I DiV~ (4731

A.ftidulane S S V I C AIP I S_N _ALJ V L L G SjWT R K LLa .S D Di T VIM LIMN L ALL TJFLS Bigaa

S. cerevielae GI iT V [V-IE GMH GUGS T fla MiC AD0 ~FiFMLLUA GTAl.RFL TIS LRA L V[NUST SnS K Y I L TiG1NA[45l3331A.nldul aneig IV L V V N G K jjG JG L

C .1T EI2a.

S.corevleale. 1I A L V LsflRAliNlG T K IZ Y L N N G S L PI KIV Y TD0 S V [MKJ[T'G fRI KI 1TTIO v s sjI

A.Imidul.ans I G D0L D JAL TjIA N V L PF LNH T S K G AlA ~M~H A~ K

S.cerevelea. P1 VTii LIA CKPIS1K Lrn Di T1I KRW11N NVTMR11 iT Ti"P fTVY FlK IiiM N[Wil5 v TIj9(631A.Imidulan~e P V T L LV K PFI SJ FW I D TT A NJR SIFG DVOW Xi: K T YaML! P

S.coreviala. 1sfTY AFr-AIW4TGTT1VIT GF [IF 10G FAJR LK FHCCK TOUriAI TV fPi (7131

A.nImduItne D A SAJTCYj L : V AJV IIT G.TT.irJC p N IG S A L G R VK[!j P C, CIT V K,0 ElJ~K [

S.c*roviesie V TOLK L K H V D m A IF L T A V V A A I S D S- ID FPM S A T T T I KE G IA QR V K TC N IL[A?Nj7731

A_n_dulan S D GoI RI.aA T S K G G T Lia C V RuFi T SNHRP K T T V S SI IAN RVK CN IKL M

S.cerovlslae A T L A FGVK T T tiL PF[1 I V G L M1K D L VP D S G V Vii1 [FaV 1LLOA G N V (1331A.nldulans K Djr L A K F GCV I C R NJDtD GI. .LI D G I R L R P1 V G V 5V AJF F VL..J.L

S.cer*v1eiae N S Q N E D KV ANM V it1 L a1 N1CM K T GD0VN S LM A K-L D Ar P L I C T S K K N S (8171

A.nidulans S L VT 0 FTL

L KK T TO;J a3L G[CK]K L E K1 P V A A S G D

S.Corevlela* K KrnMV V I IG A KT T I S K V-C A SrA L G K L LF 13Q H N N S V K F V V ET W Kr R (947)

A.nidulane N A I I I G R I A G KS TA N JV SK1A5JNM R FVjD L DJT KL .T V G N T I D I I K WI GI..!.AIM

S.Cerevslae FT MF Kwlg VI N Y DDIG0 V FI T1GR G I F11 A S[a KfA LfK F A S S M V L H L H I T (10031

A.nidulane 5L K±IL KU. T L K RIG V F A G G GV V KjN FP A K L[ LT DY H K T K G V L L L N K K I

S.Cerevlula* VWTF SflFMP RK( V W N rR R 1KG Cf-Y K P H CflA KAK F13QN. .L aS NjS K YIA( O1011

A.nldulans D F L S I 15K .s .P.A. Y.!...5JD0N C V LL2.JK F15JF 13E C N Q Y Y s D A P S CG L A D NR L V

S ov lse oe H LA N S . (1114

A.nldulane D SI

K K K. JF A SL jLF1P R KA D I0 KE V CV CS A V L R V DIL LOK D P A S N N D I S

S.cerev1leao AfiDMF VfiS Kj FL ILURj K A T D S IFI I F T V K[ GINF P DFKFK F K T L ALYIRL KN Vrg F L L1E L (11741.

A.nidulanv V DJ V 11Q LSIF L S

S.corevlslaa T L T I Y .nmINK TNKI F1IIG iF jL Y S!1 WI0 A IK N F M-O rA LIT L 0DV fV F [VnTTM( 12321

KI FI HS NDI P KG A LIS A.N KIFIYC KIL

A.nidulan. AF15 .1AUJ FTK.N KI5.0VI!.±jI.~C11.1 L[W5I _JH~~ JS GVC 13

S. cerevlmlae flV 13 TR CF HV S SI rC IN TL .DKLF Fi (1191)A.nldulann 5S A T L S L I KIP K K F A I F 15]S P IJS S T T P Y L S A I T T PF A W L R T K]

S.coervileI. T [qTlA lV F K L t. N K N F G, A A V T t, lil1D 74( Y D (iTjLlT 0, A A V I A V N

IHP 1. G. N K K F (1401)

A.nldulans C S Lij A15 L T S A A P S V T R S S T S C F S TI LI It R S SE15jI. T S F PC R L A R WL H A SY V

S. -ereulmle* D W 1 I It N A N N r-lY P S1 V 1; H T A (4iT--V] A G GT jS li A A S InC KK[T FV VI N (14581A Nd TeDS W I s FtJ A G V YU P It IF 0 LL... I NH [ J S [ J Y U C 5

S.Cerovm1.e TF LK PFL I Ffi F-%]F NFr -lI( fF sT XFlJI [RF KHV GV A V CV PA'A K D I.fl K (I1312)A.nidulans PI LI N MH V sU F P SI Y15J_Sj V JP S15jF 5JS V V A I T I A I D15T M 5T 5C HN F A Q

S.r-ere ie ce L. F' V 1GJ]M A A V F1 T [I. F A rW7 -K7 1S rI-TTiFP fliT I S FDJK Y QflH- V V rV`GS WL VH rQT]-jV AR F K W T (13/,2)A. n idulans A D E A V K I A P I RjJ t. TI P JIKF LV5y I2 ..

G

S.-erevlelee G F KG("PF KAlI F DA VT (1388)IA. nidulaone S I 1. A C:F 1. T P S N T L S R Y H R V V A P S F N

Fig. 5. Amino acid homologies between the S. cerevisiae and the A. nidulans arom multifunctional enzymes

Numbers indicate amino acid positions in the sequences. Again, gaps in both sequences maintain the format from Fig. 4.

residue which is known to form a imine intermediateduring the enzyme-catalysed reaction (Chaudhuri et al.,1986). This homology predicts that Lys- 1227 in theS. cerevisiae arom polypeptide is the residue involved inimine formation in the 3-dehydroquinase reaction. Thealignment places Lys-170 at the active site of the E. colienzyme; this prediction has also been confirmed by theisolation of an active site peptide (K. Duncan &J. R. Coggins, unpublished work). The mechanism forthe action of 3-dehydroquinase proposed by Walsh

(1979) requires a basic group for proton abstraction.Chaudhuri et al. (1986) have provided evidence that, forboth the E. coli and N. crassa enzymes, this group is theimidazole side chain of a histidine residue. Two histidineresidues are conserved between the E. coli and S. cere-visiae sequences, but only one of these (His-1198) isalso conserved in the 3-dehydroquinase domain of the A.nidulans arom polypeptide (Charles et al., 1985, 1986)(Fig. 5). It is therefore reasonable to propose that this isthe active site histidine residue.-

1987

382

Page 9: a mosaic of monofunctional domains

Sequence of Saccharomyces cerevisiae arom multifunctional enzyme

Table 2. Summary of the homologies found between the five E. coli monofunctional enzymes and the S.multifunctional enzymes

cerevisiae and A. nidulans

Number of residuesNumber of residues conserved conserved in thebetween the E. coli mono- two multifunctional Number of residues

Polypeptide functional enzymes and each enzymes and in the conserved in thechain length arom multifunctional enzyme corresponding functional domains(amino acid) E. coli monofunc- of the two arom

E. coli enzyme residues) S. cerevisiae A. nidulans tional enzyme polypeptide chains

3-Dehydroquinate 362 130 (36%) 128 (36%) 96 (27%) 201 (51%)synthaseEPSP synthase 427 162 (38%) 146 (34%) 127 (30%) 254 (55%)Shikimate kinase 174 39 (23%) 35 (20%) 22 (13%) 68 (40%)3-Dehydroquinase 240 50 (21%) 41 (17%) 30 (13%) 97 (42%)Shikimate 272 68 (25%) 41 (15%) 29 (11%) 75 (27%)dehydrogenase

144A1U@ EVHP..CV^HSS 1 L V L A A L G S G T C R I K N L L H S D D T C VM A L lE aA.nldulans EV N pS cer9vl*l&* L V Y P F K ID I P A1D Q Q K V V I P P G S K S I S N R A L I L A A L G L G Q C K I K N L L H S D D T K N V1EH e L&.coll *ro Nl Z S L T L Q P I A R V D G T I N L P G S KE V S N R A L L L A A L A H G K T V L T N L L D S D D V R H N L N A T A LS.tvh aroA N Z S L T L Q P I AR V D G A I N L P G S K V S N AA LL L A A L A C G K T A L T N L L D SID D V R H M L N A L S A L

A.nidulans G A A T F SW eE E GE V L V V NG KGGI .N L QA S S SPLLY L G NA G T S _FL T T VA TLA NS . S T V D S sS.c-r*vlsl&* K A T I S W Z D N G Z T V V V Z G H G G S T L CA C D P L Y L G N AG T AS R F L T S L AAL V N S T S S Q K Y|IVZ.coll aroA .GV S YT L S AD T R C e I I G N G G PL H A E GA L L F L G N A G TA N R P L A AA L C L . . . . . G S N D I VStvDPh aroA .JI N Y T L S A D R T I C D I TG N G G ALR P G LE L F L G N A G T A N R P L AA A L C L . . . . . G Q N E I V

A.nldulans L T GIN N Q P I G D L V D A T A NV L P L N T S K G R A S L P L K I A A S A G G IMN L A A KV SS QIYS.cerevlslae |L T G|N A R N Q P IEP|L V D S L A NT K E N N G S L|P|I K V Y T D K G G RLIE L A A T|V S S Q|YE.coll aroA L T GIE P|R MER P I IL V DAL RL GG AKIITY LIE QIEN Y PIP. .L R L Q fFITIG GIN V D V D G SV S S QIFS.tyh aroA |L T G|C P|R M K|ER P I G L V DS QGANIDYLQEQAE N Y PNJ. . L R L RLGR JTLGGJDWE V D G S V S S Q F

A.nldulan V S SLL M C|A;PYI;PKP VL R L G K P JT ID T A N R S F G I D V Q K S T T C E H T Y H I P Q G RS.cerevlslae V SSZL M|C AP]YAE|E|P V T LA L GIGIK P I 1S5 1JYID |TI K NINEK F G I N VE T S T T EP Y T Y Y I P K G HE.coll aroA L T A LLNTjA PL A PE . DITIV I R I KIGID L VIS K P Y IDI IITIL N LIIK TIF GIV E I E . N Q H Y Q Q F V V K G G Q SS.typh aroA L T ALL NT A P L AP K. D TI I R[EK JE L VIS K P Y I D I TL N L N K T F G V E I A . N H H Y Q Q F V V K G G Q Q

A.nldulans YV NF Evi S;1AT PLV V T T CT P NIS A L Q G D A R F A V R PGC T v cS.cer*vl9lb* |YI N|P|S 9 Y V I 9 S D A S S A T Y P L A F| AA T|G|T|T VT|V|P N|I G|F E|S|L|Q G D A R F A tV L|K P N G|C K T QE.coll aroA IYIQ S|P|G TIY|L VIE|G|D AS S AISIYIFL AIA^AAI K|G|G|TV|K|V|T G|I GI NISINIQG DIIIR F AI.ID V L EKEN G ATIIIC wS.typh aroA NH S P G IRYLVjG D JSA Y F L GAA A I K JGG TVJKLVT G IK J N1G I R F AJ.D VL2E K GA TW T w

A.nidulans T E T S T T VWG P S D G I L . RA T S KR G Y G T N DR C V P R C F R T G S H L P N E K S Q T T P P V S S G IAN QS.cerevluiae T A T S T T V S G P P V G T L K P L K H V D LE P M T D A F L TA C V V A A I S H D S D P N S ANT T IE GGIIIINIQRE.coll aroA G D D Y I S C[TlR G EL N A I D M D M N H IP D A A M T IA . . . . . . . . . . . T A A L FA KGIT TR LR N IYINWRS.tjh &toA G D t F I A CWR G E L H I D M N N H I P D A A M T I ... . . . . . . . T T A L F A K G JT L R N Y N WI]

A.nidulens K CIC N I KA.K.Dr F V I C R H D G .... . . L C I D G I DR S N LR Q P V G G V F CDD H RIVRS.cerevislae |V K C|C N R I L|A N T E L A K F|G V K T- T|C|L P D GrIQ V H G L N S I K D L KVVS D S S G P V G V C rYD D HN |V|AE.coll aroA IV K EIT D|R|L F A NA T E LREKIVIGIAE VEIEIG HIDIYIIY. RI TIPP E K L N FA I AIT YIND HRINAS.Yshb aroA V K E T D RL FAA E LJR JVGJA E V EE JG H l.JY WJ....... I... R I T P A K L Q H A D I G jN D H RJKA

(8661A.nidulans F S V1L ..... . . . . S L V T P Q TL[ LE K E VG KT W 1G W W D T R Q L F K V KS.cerev1lse NSF S L LRG N V N S Q N E R D E VANP VIKILIE R H|CT|GIKTIWI GWWDVIL .N. . . . . HR.coll aroA C F S L V. . . ....... . L S D T|PV|T|I L|D P KIC TIAIK TIFIPD Y F 9 Q L AI S Q DAS.tYeharO JCF S LV.V. . .. . . . . L S DT| P VTIL| D P KCT K T F PD Y FC QU L R N S T P A

('271

Fig. 6. Amino acid homologies in the EPSP synthase domain in four species, S. cerevisiae, A. nidulans, E. coli and S. typhimurium

Only positions which are conserved in at least three of the sequences are boxed.

In some species of micro-organism there is aninducible catabolic pathway-that allows the utilizationof the plant metabolite quinic acid as a carbon source(Giles et al., 1967b; Giles, 1978). One of these catabolicenzymes is a 3-dehydroquinase, and it has been reportedthat there is no discernible homology between thebiosynthetic 3-dehydroquinase domain of the A. nidulansarom polypeptide and the inducible catabolic 3-dehydroquinases of N. crassa and A. nidulans (Da Silvaetal., 1986). Neither the 3-dehydroquinase domain of theS. cerevisiae arom protein nor the E. coli biosynthetic3-dehydroquinase show any homology with the induciblefungal 3-dehydroquinases, which supports the proposal

that the biosynthetic and degradative 3-dehydroquinasefunctions have arisen independently (Da Silva et al.,1986).The C-terminal region of the arom polypeptide (resi-

dues 1306-1588) is homologous to the E. coli aroE geneproduct, shikimate dehydrogenase (Anton & Coggins,1987). In this case, the homology between E. coliand S. cerevisiae (25%) is higher than that for the shiki-mate kinase and 3-dehydroquinase domains; the A. nidu-lans shikimate dehydrogenase domain however hasdiverged substantially, being only 15% homologous withthe E. coli and 27% homologous with the S. cerevisiaesequences (Table 2). This final domain of the arom

Vol. 246

383

Page 10: a mosaic of monofunctional domains

K. Duncan, R. M. Edwards and J. R. Coggins

A.nidulansA S.cerevisiae

E.coll

[387) 2A S VfA N[IfiAVVfPAPSIEIE H . .GVAHSSNVICAIPPA QQF SJD [L F I TDFETDLTYLF KDVIJPIADQQKVVII PPDCQSA. .. .MES LTLQP I ARVD TINLT

A.nidulansB S.cereviuiae

E.coll

[8651L RQL l..L AR

L F KVQF .KV.KEGifEELEEEVAASGPDHSE G AKJL DG A LE CT S K K

i ;SAA..... .....

(894)RGN [QRA IVYr I

NSKK VV I.MTQPL LI

1032 )A.nidulans E C S N I

c S.cerevisiae E C S N F-E.coli [JV A H I

A.nidulansD s.cereviulae

E.coli

110691Q YY S RD ASP S G L ARRS ED F N R L Q V A JG Q ID S L SSF F A P H C EA E FQ. .LRRSWSKYIATITGVRE III DAT NE PS Q VI S G IRS AL A Q TI NC KrEV TV K D LVI

112661 [1313)R I FGGFM|TPV|SHPS FAA PGQL SATER GLSL E K K F

RV fL VT SD LL N AAPGQLTV AQ N NJY T SMG GWE MJE L G

RLAGEVFGSGGNFWCGK AV R ANL GK... ...... ...METYVF

Fig. 7. Amino acid homologies in the regions which link the functional domains

The linker sequences shown are between: (A) 3-dehydroquinate synthase and EPSP synthase; (B) EPSP synthase and shikimatekinase; (C) shikimate kinase and 3-dehydroquinase; (D) 3-dehydroquinase and shikimate dehydrogenase.

polypeptide chain is connected to the 3-dehydroquinaseregion by a 12-amino-acid peptide (residues 1294-1305 inthe S. cerevisiae arom sequence).Linkage of the domainsThe S. cerevisiae arom polypeptide chain contains

1588 amino acid residues, which is 113 more than thetotal number of amino acid residues found in the fivecorresponding E. coli polypeptide chains (Table 1).Many of these extra amino acids occur in the regionslinking the various domains (Fig. 7). Zalkin et al. (1984)have postulated that connector regions are probablyessential for the structural integrity of multifunctionalproteins, but that their sequence is not important. TheS. cerevisiae-E. coli homologies break down towards theend of the E. coli sequences, making it impossible to sayprecisely where one domain ends and another begins inthe arom sequence and making it difficult to define wherethe connectors begin and end. There are nonetheless fourobvious connector regions linking the five domains of theS. cerevisiae arom polypeptide chian. These connectorregions are characterized by a lack of homology betweenthe E. coli and S. cerevisiae sequences that extends oversome 30-40 residues and in three of the four cases by aninsertion of from 11 to 20 amino acids in the S. cerevisiaesequence (Fig. 7). Secondary structure predictionsfollowing the method ofChou and Fasman indicates thatthese non-homologous connector regions are essentiallydevoid of secondary structure. In the three cases wherethere are insertions the additional amino acids are mainlyhydrophilic.Codon usageThe codon usage of the AROJ gene is shown in Table

3. The pattern resembles that for other S. cerevisiae genesinvolved in amino acid biosynthesis, for example TRP5(Zalkin & Yanofsky, 1982), HIS] (Hinnesbusch & Fink,1983) and HIS4 (Donahue et al., 1982) suggesting thatAROI is expressed at about the same level as these otheramino acid biosynthetic enzymes. Studies on two highlyexpressed S. cerevisiae genes, alcohol dehydrogenase I

.

Table 3., Codon utilizadon in the ARO1 gene

'Term' indicates translation termination codons.

TI' Phe 32 TCT Ser 32TMC Phe 29 TCC Ser 26

TTA Leu 57 TCA Ser 17TTG Leu 44 ICG Ser 14

TAT Tyr 23

TAC Tyr 22

TAA Term -

TAG Term *

T&T Cys 15

TGC Cys 8

T&A Term -

TGG Trp 17

CIT Leu 17 CCT Pro 5 CAT His 27 OCT Arg 16

CIC Leu 4 CC Pro 33 CAC His 11 (GC Arg 0

CrA Leu 15 CCA Pro 31 CAA Glu 34 CGA Arg 1

CMG Leu 9 OCG Pro 0 CAG Glu 10 OGG Arg 0

ATT lle 66 ACT Thr 46 AAT Asn 37 AGT Ser 19

ATC lie 23 ACC Thr 20 AAC Asn 34 AGC Ser 8

ATA Ile 17 ACA Thr 33 AAA Lys 66 AGA Arg 31

ATG Met 31 ACG Thr 9 AAG Lys 46 AGG Arg 5

GIT Val 67 GCT Ala 48 GAT Asp 54 CCGG Gly 73

GTC Val 25 GCC Ala 31 GAC Asp 27 GGG Gly 17

GrA Val 21 GCA Ala 28 GAA Glu 72 GGA Gly 17

GTG Val 18 (=G Ala 6 GAG Glu 38 GM Gly 6

and glyceraldehyde-3-phosphate dehydrogenase, allowedBennetzen & Hall (1982) to identify the 25 preferredcodons which correspond to the most abundantisoacceptor tRNA species of S. cerevisiae. They havederived a codon bias index which quantifies the degree ofthe bias towards these selected codons and whichcorrelates well with the extent of expression of a gene.The codon bias index was calculated forAROJ; the valueobtained (0.25) indicates that is is a moderatelyexpressed gene. This is consistent with the report that theAROM gene of A. nidulans is also expressed at a lowlevel compared with the highly expressed phospho-glycerate kinase gene (Clements & Roberts, 1986).

DISCUSSIONThere is an increasing body of evidence that long

polypeptide chains have evolved by the fusion of smallerpre-existing functional modules (Hardie & Coggins,

1987

384

Page 11: a mosaic of monofunctional domains

Sequence of Saccharomyces cerevisiae arom multifunctional enzyme 385

1986). In some cases, for example the immunoglobulins,the fusions have involved the repetition and diversi-fication of a single structural element, presumablythrough gene duplication followed by divergence (Cush-ley, 1986). In other cases there is evidence that functionswhich in some species are present as separate mono-functional proteins occur in other species as fusedmultifunctional proteins. The amino acid sequences ofthese multifunctional proteins have mosaic structureswith recognizable regions that are closely related to theirmonofunctional counterparts (Hardie & Coggins, 1986).The results presented here for the S. cerevisiae arommultifunctional enzyme demonstrate that its penta-functional polypeptide chain has such a mosaic struc-ture and in this respect is very similar to the A. nidu-lans arom multifunctional enzyme (Charles et al.,1986). The most likely explanation for the origin of thepentafunctional fungal arom polypeptides is that theyhave arisen by the fusion of ancestral E. coli-like genes(Hardie & Coggins, 1986; Charles et al., 1986). Thealternative explanation, that the multifunctional enzymesare more ancient and that the monofunctional bacterialenzymes arose from them by mutational insertion of stopand start codons, cannot however be totally excluded(Hardie & Coggins, 1986).Assuming that the gene fusion hypothesis is correct it

would be expected that at least some of the functionalregions of the arom polypeptide chain would maintain adegree of structural autonomy and that functionaldomains might be isolatable, for example by limitedproteolysis. Although no such studies of the S. cere-visiae arom polypeptide have been reported thedomain structure of the closely related N. crassa arompolypeptide has been studied directly by limitedproteolysis (Smith & Coggins, 1983; Coggins et al., 1985;Coggins & Boocock, 1986). A very stable C-terminaltryptic fragment of Mr 68000 which carries both the3-dehydroquinase and shikimate dehydrogenase acti-vities has been isolated (Smith & Coggins, 1983; Coggins& Boocock, 1986). One particularly interesting propertyof this bifunctional fragnent of the arom polypeptide isthat even after denaturation with 8 M-urea or sodiumdodecyl sulphate it can refold and regain some of itsshikimate dehydrogenase activity (Smith & Coggins,1983; Coggins & Boocock, 1986). This implies that theC-terminal region of the arom polypeptide is a trulyautonomous functional region. Evidence has also beenpresented that expression of the C-terminal region of theA. nidulans AROM gene gives an independently foldingpolypeptide chain carrying 3-dehydroquinase activity(Kinghorn & Hawkins, 1982) and a truncated bi-functional A. nidulans AROM polypeptide carryingEPSP synthase and 3-dehydroquinase activity has beenreported (Charles et al., 1986). The early genetic data forthe N. crassa arom locus, which included the descriptionofmany point mutations lacking a single enzyme activity(Giles et al., 1967a; Rines et al., 1969; Case & Giles,1971) is also consistent with the mosaic model for thearom polypeptide.

Forty years ago Horowitz proposed that biosyntheticpathways, as they occur today, are the result ofretroevolution; that is, they have been progressively builtbackwards from the final metabolite of the pathway(Horowitz, 1945). The mechanism of this processpresumably involved gene duplication followed bydivergence (Horowitz, 1945, 1965) and one would

therefore expect that the evolved proteins would retainsome homology with the ancestral protein at the end ofthe metabolic sequence. Evidence in support of thishypothesis has recently been provided by the demon-stration that two enzymes catalysing successive steps inmethionine biosynthesis in E. coli are homologous(Belfaiza et al., 1986). While the sequence homologiespresented here between the five monofunctional E. colishikimate pathway enzymes and the multifunctionalarom polypeptides imply that the tertiary structures ofthe functional domains are conserved, we have so farbeen unable to identify any homologies, at the primarystructure level, between the five shikimate pathwayenzymes. The question of whether these five enzymeshave common structural features at the tertiary level willhave to await detailed three-dimensional structuralanalysis.

Gaertner and his co-workers have attributed somevery interesting catalytic properties to the N. crassa aromsystem (Gaertner et al., 1970; Welch & Gaertner, 1975,1976). These included 'catalytic facilitation' (Gaertneret al., 1970), 'channelling' (Welch & Gaertner, 1975) and'co-ordinate regulation' (Welch & Gaertner, 1976). It isnow clear that all these experiments were carried out witharom that was not only proteolytically degraded(Gaertner, 1978) but was also seriously deficient in3-dehydroquinate synthase activity (Lambert et al.,1985). Also the kinetic parameters used in the calculationswere very different from those determined more recentlywith well defined preparations of homogeneous enzyme(Lambert et al., 1985; Coggins & Boocock, 1986). At thepresent time we are not aware of any conclusive evidenceof catalytic interactions between the componentenzymes, nor have we obtained any evidence ofco-ordinate activation (G. A. Nimmo, M. R. Boocock,J. M. Lambert & J. R. Coggins, unpublished work). Thislack of evidence for any special catalytic properties forthe arom system has lead us to consider an alternativeadaptive advantage for the occurrence of the penta-functional arom polypeptide chain. By having fiveenzymic functions involved in catalysing five sequentialsteps on a biosynthetic pathway on a single multi-functional polypeptide chain the problem of co-ordinating the expression of the five separate enzymeactivities is avoided. In this connection it is interesting tonote that the turnover numbers of the five arom enzymeactivities for the N. crassa multifunctional enzyme arevery similar (Lambert et al., 1985).We wish to thank Dr. Frank Larimer (Oak Ridge National

Laboratory, U.S.A.) for a gift of plasmid pFL6. The workcarried out in Glasgow was supported by a research grant fromSearle Research and Development.

REFERENCESAhmed, S. I. & Giles, N. H. (1969) J. Bacteriol. 99, 231-237Alton, N. K., Buxton, F. Patel, V., Giles, N. H. & Vapnek, D.

(1982) Proc. Natl. Acad. Sci. U.S.A. 79, 1955-1959Amrhein, N., Schab, J. & Steinrucken, H. C. (1980) Natur-

wissenschaften 67, 356-357Anton, I. A. & Coggins, J. R. (1987) Biochem. J., in the

pressBachmann, B. (1983) Microbiol. Rev. 44, 180-230Belfaiza, J., Parsot, C., Martel, A., Bouthier de la Tour, C.,

Margarita, D., Cohen, G. N. & Saint-Girons, I. (1986) Proc.Natl. Acad. Sci. U.S.A. 83, 867-871

Bennetzen, J. L. & Hall, B. D. (1982) J. Biol. Chem. 257,3026-3031

Vol. 246

Page 12: a mosaic of monofunctional domains

386 K. Duncan, R. M. Edwards and J. R. Coggins

Berlyn, M. B. & Giles, N. H. (1969) J. Bacteriol. 99, 222-230Berlyn, M. B., Ahmed, S. I. & Giles, N. H. (1970) J. Bacteriol.

104, 768-774Biggin, M. D., Gibson, T. J. & Hong, G. F. (1983) Proc. Natl.Acad. Sci. U.S.A. 80, 3963-3965

Bode, R. & Birnbaum, D. (1981) Z. Allg. Mikrobiol. 21,417-422

Boocock, M. R. & Coggins, J. R. (1983) FEBS Lett. 154,127-133

Case, M. E. & Giles, N. H. (1971) Proc. Natl. Acad. Sci. U.S.A.68, 58-62

Catcheside, D. E. A., Storer, P. J. & Klein, B. (1985) Mol. Gen.Genet. 199, 446-451

Charles, I. J., Keyte, J. W., Brammar, W. J. & Hawkins, A. R.(1985) Nucleic Acids Res. 13, 8119-8128

Charles, I. J., Keyte, J. W., Brammar, W. J., Smith, M. &Hawkins, A. R. (1986) Nucleic Acids Res. 14, 2201-2213

Chaudhuri, S. & Coggins, J. R. (1985) Biochem. J. 226,217-223

Chaudhuri, S., Lambert, J. M., McColl, L. A. & Coggins, J. R.(1986) Biochem. J. 239, 699-704

Coggins, J. R. (1986) in Biotechnology and Crop Improvementand Protection (Day, P. R., ed.), pp. 101-110, British CropProtection Council, Croydon

Coggins, J. R. & Boocock, M. R. (1986) in MultifunctionalProteins: Structure and Evolution (Hardie, D. G. &Coggins, J. R., eds.), pp. 259-281, Elsevier, Amsterdam

Coggins, J. R., Boocock, M. R. Campbell, M. S., Chaudhuri,S., Lambert, J. M., Lewendon, A., Mousdale, D. M. &Smith, D. D. S. (1985) Biochem. Soc. Trans. 13, 299-303

Comai, L., Sen, L. C. & Stalker, D. M. (1983) Science 221,370-371

Clements, J. & Roberts, G. F. (1986) Gene 44, 97-105Cushley, W. (1986) in Multifunctional Proteins: Structure and

Evolution (Hardie, D. G. & Coggins, J. R., eds.), pp. 13-53,Elsevier, Amsterdam

Da Silva, A. J. F., Whittington, H., Clements, J., Roberts,C. F. & Hawkins, A. R. (1986) Biochem. J. 240, 481-488

de Leeuw, A. (1967) Genetics 56, 554-555Devereux, J., Haeberli, P. & Smithies, 0. (1984) Nucleic Acids.

Res. 12, 387-395Donahue, T. F., Farnbaugh, P. J. & Fink, G. R. (1982) Gene

18, 47-59Duncan, K., Lewendon, A. & Coggins, J. R. (1984) FEBS Lett.

170, 59-63Duncan, K., Chaudhuri, S., Campbell, M. S. & Coggins, J. R.

(1986) Biochem. J. 238, 475-483Duncan, K., Dacey, S. A., Edwards, R. M. & Coggins, J. R.

(1987) FEBS Lett., in the pressFiedler, E. & Schultz, (1985) Plant Physiol. 79, 212-218Gaertner, F. H., Ericson, M. C. & DeMoss, J. A. (1970) J. Biol.Chem. 245, 595-600

Gaertner, F. H. (1978) in Microenvironments and MetabolicCompartmentation (Srere, P. A. & Estabrook, R. W., eds.),pp. 345-353, Academic Press, New York

Ganem, B. (1978) Tetrahedron 34, 3353-3383Giles, N. H. (1978) Am. Naturalist 112,-641-658Giles, N. H., Case, M. E., Partridge, C. W. H. & Ahmed, S. I.

(1967a) Proc. Natl. Acad. Sci. U.S.A. 58, 1453-1460Giles, N. H., Partridge, C. W. H., Ahmed, S. I. & Case, M. E.

(1-967b)-Proc. Natl;- Acad. Sci. U;S.A;-58, 1930-1937Hardie, D. G. & Coggins, J. R. (1986) in Multifunctional

Proteins: Structure and Evolution (Hardie, D. G. &Coggins, J. R., eds.), pp. 332-334, Elsevier, Amsterdam

Hardie, D. G. & McCarthy, T. (1986) in MultifunctionalProteins: Structure and Evolution (Hardie, D. G. &Coggins, J. R., eds.), pp. 229-258, Elsevier, Amsterdam

Henner, D. J. & Hoch, J. A. (1980) Microbiol. Rev. 44, 57-82Hinnebusch, A. G. & Fink, G. R. (1983) J. Biol. Chem. 258,

5238-5247

Horowitz, N. H. (1945) Proc. Natl. Acad. Sci. U.S.A. 31,153-157

Horowitz, N. H. (1965) in Evolving Genes and Proteins(Bryson, V. & Vogel, H. J., eds.), pp. 15-23, Academic Press,New York

Kinghorn, J. R. & Hawkins, A. R. (1982) Mol. Gen. Genet.186, 145-152

Kirschner, K. & Bisswanger, H. (1976) Annu. Rev. Biochem.45, 143-166

Koshiba, T. (1979) Plant Cell Physiol. 2, 667-670Kozak, M. (1984) Nucleic Acids Res. 12, 857-872Lambert, J. M., Boocock, M. R. & Coggins, J. R. (1985)

Biochem. J. 226, 817-829Larimer, F. W., Morse, C. C., Beck, A. K., Cole, K. W. &

Gaertner, F. H. (1983) Mol. Cell. Biol. 3, 1609-1614Lewendon, A. (1984) Ph.D. Thesis, University of GlasgowLewendon, A. & Coggins, J. R. (1983) Biochem. J. 213,

187-191Lumsden, J. & Coggins, J. R. (1977) Biochem. J. 161, 599-607Maniatis, T., Fritsch, E. F. & Sambrook, J. (1982) Molecular

Cloning: A Laboratory Manual, Cold Spring HarborLaboratory, New York

Messing, J. & Vieira, J. (1982) Gene 19, 269-276Midgley, C. A. & Murray, N. E. (1985) EMBO J. 4, 2695-2703Millar, G. & Coggins, J. R. (1986) FEBS Lett. 200, 11-17Millar, G., Lewendon, A., Hunter, M. & Coggins, J. R. (1986)

Biochem. J. 237, 427-437Miozar, G. F. & Yanofsky, C. (1979) Nature (London) 277,

486-489Mousddle, D. M. & Coggins, J. R. (1984) Planta 160, 78-83Mousdale, D. M., Campbell, M. S. & Coggins, J. R. (1987)

Phytochemistry 26, in the pressNakanishi, N. & Yamamoto, M. (1984) Mol. Gen. Genet. 195,

164-169Norrander, J., Kempe, T. & Messing, J. (1983) Gene 26,

101-106Polley, L. D. (1978) Biochim. Biophys. Acta 526, 259-266Rines, H. W., Case, M. E. & Giles, N. H. (1969) Genetics 61,

789-800Sanderson, K. E. & Roth, J. R. (1983) Microbiol. Rev. 47,

410-553Sanger, F., Nicklen, S. & Coulson, A. R. (1977) Proc. Natl.Acad. Sci. U.S.A. 74, 5463-5467

-Schmincke-Ott, E. & Bisswanger, H. (1980) in MultifunctionalProteins (Bisswanger, H. & Schmincke-Ott, E., eds.), pp. 1-29,Wiley, New York

Schweizer, M. (1986) in Multifunctional Proteins: Structureand Evolution (Hardie, D. G. & Coggins, J. R., eds.), pp.195-227, Elsevier, Amsterdam

Smith, D. D. S. & Coggins, J. R. (1983) Biochem. J. 213,405-415

Smith, T. F. & Waterman, M. S.. (1981) Adv. Appl. Math. 2,482-489

Stalker, D. M., Hiatt, W. R. & Comai, L. (1985) J. Biol. Chem.260, 4724-4728

Steinrucken, H. C. & Amrhein, N. (1984) Eur. J. Biochem. 143,341-349

Strauss, A. (1979) Mol. Gen. Genet. 172, 233-241Walker, J. E., Saraste, M., Runswick, M. J. & Gray, M. J.

(1982) EMBO J. 1, 945-951Walsh,-C. (1979) Enzyme Reaction Mechanisms, pp. 553-556,

Freeman, San FranciscoWelch, G. R. & Gaertner, F. H. (1975) Proc. Natl. Acad. Sci.

U.S.A. 72, 4218-4222Welch, G. R. & Gaertner, F. H. (1976) Arch. Biochem.

Biophys. 172, 476-489Zalkin, H. & Yanofsky, C. (1982) J. Biol. Chem. 257,

1491-1500Zalkin, H., Puluh, J. L., van Cleemput, M., Maoye, W. S. &

Yanofsky, C. (1984) J. Biol. Chem. 259, 3985-3992

Received 5 December 1986/2 April 1987; accepted 13 May 1987

1987